CN112990218A - Optimization method and device of image semantic segmentation model and electronic equipment

Info

Publication number: CN112990218A
Application number: CN202110319553.3A
Authority: CN (China)
Legal status: Pending
Original language: Chinese (zh)
Inventors: 何栋梁, 林天威
Applicant/Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/267 - Segmentation by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation
    • G06F 18/214 - Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks


Abstract

The invention provides an optimization method and device for an image semantic segmentation model, and an electronic device. It relates to the technical field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to image processing scenarios. The scheme is as follows: when optimizing the image semantic segmentation model, a first and a second image semantic segmentation model with the same network structure are used. The cross-entropy losses corresponding to the two models are determined based on annotated images; unlabeled images are then also used to determine other losses corresponding to the two models, so that losses beyond cross-entropy are fully taken into account and the two models can be jointly optimized, effectively improving the accuracy of the resulting image semantic segmentation model.

Description

Optimization method and device of image semantic segmentation model and electronic equipment
Technical Field
The application discloses an optimization method and device for an image semantic segmentation model, and an electronic device. It relates to the technical field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to image processing scenarios.
Background
The main purpose of image semantic segmentation is to predict which object each pixel of an input image belongs to. When training an image semantic segmentation model, annotated images carrying annotation data are generally used as training samples. Because annotated images must be labeled at the pixel level, annotation is labor-intensive and inefficient, whereas unlabeled images without annotation data are easy to obtain.
A problem to be solved by those skilled in the art is therefore how to make full use of unlabeled images, in addition to annotated images, when training an image semantic segmentation model, so as to improve the accuracy of the model.
Disclosure of Invention
The application provides an optimization method and device for an image semantic segmentation model, and an electronic device. By training the image semantic segmentation model on unlabeled images in addition to annotated images, the accuracy of the image semantic segmentation model is effectively improved.
According to a first aspect of the present application, there is provided an optimization method of an image semantic segmentation model, including:
and acquiring an annotated image and an unlabeled image.
Respectively determining cross entropy losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model based on the annotation image; the network structures of the first image semantic segmentation model and the second image semantic segmentation model are the same.
Respectively determining other losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model based on the unlabeled image; wherein the other loss comprises at least one of a loss of consistency, a loss of resistance training, or a loss of stability.
And respectively optimizing the first image semantic segmentation model and the second image semantic segmentation model according to the cross entropy loss and other losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model, and determining a target image semantic segmentation model based on the optimized first image semantic segmentation model and the optimized second image semantic segmentation model.
According to a second aspect of the present application, there is provided an optimization apparatus for an image semantic segmentation model, including:
and the acquisition unit is used for acquiring the marked image and the unmarked image.
The first determining unit is used for respectively determining the cross entropy losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model based on the annotation image; the network structures of the first image semantic segmentation model and the second image semantic segmentation model are the same.
A second determining unit, configured to determine, based on the unlabeled image, other losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model, respectively; wherein the other loss comprises at least one of a loss of consistency, a loss of resistance training, or a loss of stability.
And the optimization unit is used for respectively optimizing the first image semantic segmentation model and the second image semantic segmentation model according to the cross entropy loss and other losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model, and determining a target image semantic segmentation model based on the optimized first image semantic segmentation model and the optimized second image semantic segmentation model.
According to a third aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of optimizing an image semantic segmentation model according to the first aspect.
According to a fourth aspect of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of optimizing an image semantic segmentation model according to the first aspect.
According to a fifth aspect of the present application, there is provided a computer program product comprising a computer program stored in a readable storage medium; at least one processor of an electronic device can read the computer program from the storage medium, and execution of the computer program by the at least one processor causes the electronic device to perform the method of the first aspect.
According to the technical solution of the application, when optimizing the image semantic segmentation model, the cross-entropy losses corresponding to two image semantic segmentation models with the same network structure (the first and the second) are respectively determined based on the annotated image. On top of these cross-entropy losses, the unlabeled image is used to respectively determine other losses corresponding to the two models, so that losses beyond cross-entropy are fully taken into account. Jointly optimizing the two models according to their respective cross-entropy losses and other losses can therefore effectively improve the accuracy of the image semantic segmentation model.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram of a prior art image semantic segmentation model architecture provided by an embodiment of the present application;
FIG. 2 is a flowchart illustrating an optimization method of an image semantic segmentation model according to a first embodiment of the present application;
FIG. 3 is an architectural schematic diagram for determining the cross-entropy loss corresponding to the first image semantic segmentation model according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a method for determining the consistency loss corresponding to an image semantic segmentation model based on an unlabeled image according to a third embodiment of the present disclosure;
FIG. 5 is a flowchart illustrating a method for determining the adversarial training loss corresponding to an image semantic segmentation model based on an unlabeled image according to a fourth embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating a method for determining the stability loss corresponding to the first image semantic segmentation model based on an unlabeled image according to a fifth embodiment of the present disclosure;
FIG. 7 is a flowchart illustrating a method for optimizing an image semantic segmentation model according to a sixth embodiment of the present application;
FIG. 8 is a schematic block diagram of an optimization apparatus 80 for an image semantic segmentation model provided according to a seventh embodiment of the present application;
fig. 9 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A alone, both A and B, or B alone, where A and B may be singular or plural. In the text of the present application, the character "/" generally indicates an "or" relationship between the preceding and following objects.
The optimization method of the image semantic segmentation model provided by the embodiments of the application can be applied to image processing scenarios, in particular to the training of image semantic segmentation models. The main purpose of image semantic segmentation is to predict which object each pixel of an input image belongs to, and the accuracy of pixel identification in the image depends on the accuracy of the image semantic segmentation model. It is therefore very important to optimize the image semantic segmentation model so as to improve its accuracy.
In the prior art, when optimizing an image semantic segmentation model, annotated images carrying annotation data are generally used as training samples. Because annotated images must be labeled at the pixel level, annotation is labor-intensive and inefficient, whereas unlabeled images without annotation data are easy to obtain. A problem to be solved by those skilled in the art is therefore how to make full use of easily obtained unlabeled images, in addition to annotated images, when training an image semantic segmentation model, so as to improve the accuracy of the model.
In order to make full use of easily obtained unlabeled images when training the image semantic segmentation model, a semi-supervised approach can be considered. When optimizing the image semantic segmentation model in a semi-supervised manner, two image semantic segmentation models can be constructed in advance: one is a teacher model T and the other is a student model S. For example, please refer to fig. 1, which is a schematic diagram of an existing image semantic segmentation framework provided in an embodiment of the present application. At any t-th update, the parameters $\theta_S$ of the network corresponding to the student model S are set to the exponential moving average of the parameters $\theta_T$ of the network corresponding to the teacher model T, i.e., at time t:

$$\theta_S^{(t)} = \alpha\,\theta_S^{(t-1)} + (1-\alpha)\,\theta_T^{(t)}$$

where $\alpha$ is the moving-average decay coefficient, while the parameters $\theta_T$ of the teacher network are updated by gradient descent on the training loss function. When optimizing based on the loss function, a data augmentation $\phi$ is constructed, such as rotating the image by 90 degrees. For an input image $(x, y)$ with annotation data and its augmented version $(\phi(x), \phi(y))$, the prediction results of the teacher model T are $T(x)$ and $T(\phi(x))$, respectively. One part of the teacher network's training loss comes from the supervised loss between $T(x)$, $T(\phi(x))$ and $y$, $\phi(y)$; the other part comes from a consistency loss constructed from an input image $x_u$ without annotation data. The physical meaning of the consistency loss is the requirement that the teacher's prediction for $x_u$, transformed afterwards, $\phi(T(x_u))$, be consistent with the student's prediction on the transformed image, $S(\phi(x_u))$. By constructing this consistency loss, the unannotated input image $x_u$ is introduced into the training of the teacher model T.
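As an illustration of the update rule above, the following is a minimal sketch in PyTorch, assuming `student` and `teacher` are two instances of the same network and `alpha` is the moving-average decay coefficient; the names and the value of `alpha` are illustrative, not taken from the patent:

```python
import torch

@torch.no_grad()
def ema_update(student, teacher, alpha=0.99):
    """Set each student parameter to the exponential moving average of
    its previous value and the corresponding current teacher parameter."""
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_s.mul_(alpha).add_(p_t, alpha=1.0 - alpha)
```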
However, in this semi-supervised learning method based on the classical mean-teacher framework, the student model S is the exponential moving average of the teacher model T, so the two models eventually converge to approximately the same result and become homogeneous: the segmentation accuracy of the student model S is constrained by the teacher model T, and the diversity of the student model S is not well exploited. Moreover, the images without annotation data are only used to construct a consistency-loss supervision signal for training the teacher model T. In practice, however, the model's predictions on some pixels of an unannotated image are accurate and reliable, and this can be exploited to further improve the accuracy of the image semantic segmentation model.
Combining the above analysis, the embodiments of the present application provide a semi-supervised semantic segmentation learning method based on an adversarial dual-student network model, which upgrades the classic mean-teacher framework to a dual-student framework. Because the two student models S in the dual-student framework learn almost independently, this framework can better exploit the diversity of the two student models and select the better one as the final image semantic segmentation model, thereby avoiding the mean-teacher framework's tendency to converge to approximately identical results.
Based on this technical concept, the application provides an optimization method for an image semantic segmentation model. When optimizing the model, an annotated image and an unlabeled image are first acquired, and the cross-entropy losses corresponding to a first and a second image semantic segmentation model with the same network structure are respectively determined based on the annotated image. Other losses corresponding to the two models are then respectively determined based on the unlabeled image, where the other losses comprise at least one of a consistency loss, an adversarial training loss, or a stability loss. Finally, the two models are respectively optimized according to their corresponding cross-entropy losses and other losses, and a target image semantic segmentation model is determined based on the optimized first and second models.
The annotated image can be understood as an image with annotation data, and the unlabeled image as an image without annotation data.
For example, in the embodiments of the present application, the other losses may be any one of the consistency loss, the adversarial training loss, and the stability loss, any two of them, or all three, which may be set according to actual needs; the embodiments of the present application impose no further limitation here.
It can be understood that the first and second image semantic segmentation models have the same network structure but different parameter values; for example, they may be two student models S. Since the two models learn almost independently, this network framework avoids the homogenization caused by the mean-teacher framework, so that optimizing the two models based on their corresponding cross-entropy losses and other losses can effectively improve the accuracy of the image semantic segmentation model.
It can be seen that in the embodiments of the application, when optimizing the image semantic segmentation model, the cross-entropy losses corresponding to the two image semantic segmentation models with the same network structure are respectively determined based on the annotated image. On top of these cross-entropy losses, the unlabeled image is used to respectively determine other losses corresponding to the two models, so that losses beyond cross-entropy are fully taken into account. Jointly optimizing the two models according to their respective cross-entropy losses and other losses can therefore effectively improve the accuracy of the image semantic segmentation model.
Hereinafter, the optimization method of the image semantic segmentation model provided by the present application will be described in detail through specific embodiments. It is to be understood that the following detailed description may be combined with other embodiments, and that the same or similar concepts or processes may not be repeated in some embodiments.
Example one
Fig. 2 is a flowchart illustrating an optimization method of an image semantic segmentation model according to a first embodiment of the present application, which may be performed by software and/or a hardware device, for example, a terminal or a server. For example, referring to fig. 2, the method for optimizing the image semantic segmentation model may include:
S201, acquiring an annotated image and an unlabeled image.
For example, when acquiring the annotated image, in one possible implementation, an annotated image with annotation data may be obtained directly from another device; in another possible implementation, an unlabeled image without annotation data may be acquired from another device and presented for annotation, with the annotation data produced by the annotator then attached to it to obtain an annotated image.
Because annotated images must be labeled at the pixel level, annotation is labor-intensive and inefficient, and large quantities of annotated images are difficult to obtain, whereas unlabeled images without annotation data are easy to acquire. To better optimize the image semantic segmentation model, it can therefore be considered to train the model using easily obtained unlabeled images in addition to annotated images; accordingly, besides the annotated images, a large number of unlabeled images can be obtained through the network, and the image semantic segmentation model is trained using both.
After the annotated image and the unlabeled image are acquired respectively, the annotated image and the unlabeled image can be used to train an image semantic segmentation model together, that is, the following steps S202 and S203 are executed:
s202, respectively determining cross entropy losses corresponding to a first image semantic segmentation model and a second image semantic segmentation model based on the annotation image; the network structures of the first image semantic segmentation model and the second image semantic segmentation model are the same.
It can be understood that, when cross entropy losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model are respectively determined, because the cross entropy losses corresponding to the two image semantic segmentation models are determined based on the annotation image, the cross entropy losses corresponding to the two image semantic segmentation models are both supervised losses.
S203, respectively determining other losses corresponding to the first and second image semantic segmentation models based on the unlabeled image; wherein the other losses comprise at least one of a consistency loss, an adversarial training loss, or a stability loss.
For example, the other losses may be any one of the consistency loss, the adversarial training loss, and the stability loss, any two of them, or all three, set according to actual needs; the embodiments of the present application impose no further limitation here.
It can be understood that, when other losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model are respectively determined, since the other losses corresponding to the two image semantic segmentation models are both determined based on the unlabeled image, the other losses corresponding to the two image semantic segmentation models are both unsupervised losses.
After the cross entropy loss corresponding to each of the first image semantic segmentation model and the second image semantic segmentation model is obtained in S202 and other losses corresponding to each of the first image semantic segmentation model and the second image semantic segmentation model are obtained in S203, the following S204 may be performed:
S204, respectively optimizing the first and second image semantic segmentation models according to their corresponding cross-entropy losses and other losses, and determining a target image semantic segmentation model based on the optimized first and second image semantic segmentation models.
The first and second image semantic segmentation models are respectively optimized according to their corresponding cross-entropy losses and other losses, and a target image semantic segmentation model is determined based on the optimized models, for use in subsequent image semantic segmentation processing.
For example, when the target model is determined based on the optimized first and second image semantic segmentation models, in one possible manner the more accurate of the two optimized models may be chosen as the target image semantic segmentation model; in another possible manner, if both optimized models reach an accuracy threshold, either one may be chosen as the target model, as actual needs dictate.
It can be seen that in the embodiments of the application, when optimizing the image semantic segmentation model, the cross-entropy losses corresponding to the two image semantic segmentation models with the same network structure are respectively determined based on the annotated image; on top of these, other losses corresponding to the two models are respectively determined based on the unlabeled image, so that losses beyond cross-entropy are fully taken into account, and jointly optimizing the two models according to their respective cross-entropy losses and other losses can effectively improve the accuracy of the image semantic segmentation model.
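To make the flow of S202 to S204 concrete, the following is a minimal sketch of one joint optimization step for the two models, assuming `y_l` is a per-pixel class-index map and `other_loss_fn` is a hypothetical helper standing in for whichever combination of consistency, adversarial training, and stability losses is used; the per-loss computations are detailed in the later embodiments, and the stability loss of the fifth embodiment actually couples both models, which this simplified helper does not show:

```python
import torch
import torch.nn.functional as F

def dual_student_step(model_a, model_b, opt_a, opt_b, x_l, y_l, x_u, other_loss_fn):
    """One optimization step of the dual-student scheme (S202 to S204):
    supervised cross-entropy on the annotated batch plus other losses on
    the unlabeled batch, applied to each student model separately."""
    for model, opt in ((model_a, opt_a), (model_b, opt_b)):
        opt.zero_grad()
        loss = F.cross_entropy(model(x_l), y_l)   # supervised loss (S202)
        loss = loss + other_loss_fn(model, x_u)   # unsupervised losses (S203)
        loss.backward()                           # optimize this model (S204)
        opt.step()
```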
Based on the embodiment shown in fig. 2, it is explained next how, in S202, the cross-entropy losses corresponding to the first and second image semantic segmentation models are determined based on the annotated image; this is described in detail in the second embodiment below. Since the method for determining the cross-entropy loss of the first model based on the annotated image is the same as that of the second model, determining the cross-entropy loss of either one of the two models based on the annotated image is taken as the example.
Example two
For example, when determining the cross-entropy loss corresponding to an image semantic segmentation model based on the annotated image, the annotated image may be input into the model to obtain a second prediction result corresponding to the annotated image; the cross-entropy loss corresponding to the model is then determined from the second prediction result and the annotation data of the annotated image.
Taking the determination of the cross-entropy loss corresponding to the first image semantic segmentation model (denoted Sa) based on the annotated image as an example, as shown in fig. 3 (an architectural schematic diagram for determining this loss provided in an embodiment of the application), let $x_l$ denote an annotated image and $y_l$ its annotation data. The annotated image $x_l$ is input to the Sa model to obtain the second prediction result $S_a(x_l)$, and the cross-entropy loss of the Sa model is then determined jointly from $y_l$ and $S_a(x_l)$, as in the following Equation 1:

$$\mathcal{L}_{ce}^{S_a} = -\frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{c=1}^{C} y_l^{(i,j,c)} \log S_a(x_l)^{(i,j,c)} \qquad (1)$$

where $\mathcal{L}_{ce}^{S_a}$ denotes the cross-entropy loss corresponding to the first image semantic segmentation model (the Sa model), $(i, j)$ indexes the pixels of the annotated image, $C$ is the number of semantic segmentation label classes, $H$ is the height of the annotated image, $W$ is its width, and $S_a(x_l)$ is the second prediction result obtained by passing the annotated image through the Sa model. The prediction result is an $H \times W \times C$ matrix, and the prediction for each pixel is a C-dimensional vector summing to 1. For example, assume the semantic segmentation label classes are three: cat, dog, and other. If the prediction for a certain pixel is [0.3, 0.05, 0.65], the probabilities that this pixel belongs to cat, dog, and other are 0.3, 0.05, and 0.65, respectively.
Through Equation 1, the cross-entropy losses corresponding to the first and second image semantic segmentation models can be respectively determined, so that on top of the cross-entropy losses obtained from the existing annotated images, the unlabeled images can be fully exploited and losses beyond cross-entropy taken into account; the two models are then jointly optimized according to their respective cross-entropy losses and other losses, effectively improving the accuracy of the image semantic segmentation model.
When optimizing the two image semantic segmentation models according to their respective cross-entropy losses and other losses, taking the case where the other losses include the consistency loss as an example, how to determine in S203 the consistency loss corresponding to each of the first and second image semantic segmentation models based on the unlabeled image is described in detail in the third embodiment, shown in fig. 4, below. Since the method for determining the consistency loss of the first model based on the unlabeled image is the same as that of the second model, determining the consistency loss of either one of the two models based on the unlabeled image is taken as the example.
Example three
Fig. 4 is a flowchart illustrating a method for determining a consistency loss corresponding to an image semantic segmentation model based on an unlabeled image according to a third embodiment of the present application, where the method for determining a consistency loss corresponding to an image semantic segmentation model based on an unlabeled image may also be performed by a software and/or hardware device, for example, the hardware device may be a terminal or a server. For example, referring to fig. 4, the method for determining a consistency loss corresponding to an image semantic segmentation model based on an unlabeled image may include:
S401, inputting an unlabeled image into an image semantic segmentation model to obtain a first prediction result corresponding to the unlabeled image, and applying an augmentation transform to the first prediction result to obtain a first value corresponding to each pixel of the unlabeled image; the image semantic segmentation model is either one of the first and second image semantic segmentation models.
For example, the augmentation transform may be an image rotation or an image warping, and may be set according to actual needs.
Again taking the consistency loss of the first image semantic segmentation model (denoted Sa, as in fig. 3 above) determined based on the unlabeled image as an example, let $x_u$ denote the unlabeled image and $\tau$ the augmentation transform. To determine the consistency loss of the Sa model, on the one hand, $x_u$ is input into the Sa model to obtain the first prediction result, to which $\tau$ is then applied, giving the first value $\tau(S_a(x_u))$ for each pixel of $x_u$. On the other hand, $\tau$ is first applied to $x_u$, and the transformed image is input into the Sa model, giving the second value $S_a(\tau(x_u))$ for each pixel of $x_u$; that is, the following S402 is executed:
S402, applying the augmentation transform to the unlabeled image and inputting the result into the image semantic segmentation model to obtain a second value corresponding to each pixel of the unlabeled image.
It can be understood that S401 and S402 need not be executed in a fixed order: S401 may be executed first and then S402, or S402 first and then S401, or both simultaneously, as actual needs dictate. The embodiment here is described with S401 executed first merely as an example, and no limitation is intended.
S403, determining the consistency loss corresponding to the image semantic segmentation model from the first and second values corresponding to each pixel.
For example, the mean square error corresponding to each pixel may be determined from that pixel's first and second values; this per-pixel mean square error is the consistency loss of the image semantic segmentation model for that pixel. A first sum of the per-pixel mean square errors is then computed, and the ratio of this first sum to a third value is determined as the consistency loss corresponding to the image semantic segmentation model, where the third value is the product of the width and the height of the unlabeled image.
With reference to the description in S401, when determining the consistency loss of the Sa model, the per-pixel mean square error is computed from the first and second values of each pixel, as in Equation 2; after obtaining the per-pixel errors, the ratio of their sum to the third value gives the consistency loss of the Sa model, as in Equation 3:

$$\varepsilon_a(x_u)^{(i,j)} = \left( S_a(\tau(x_u))^{(i,j)} - \tau(S_a(x_u))^{(i,j)} \right)^2 \qquad (2)$$

$$\mathcal{L}_{con}^{S_a} = \frac{1}{h \times w} \sum_{i=1}^{h}\sum_{j=1}^{w} \varepsilon_a(x_u)^{(i,j)} \qquad (3)$$

where $\varepsilon_a(x_u)^{(i,j)}$ denotes the consistency loss of the Sa model for the pixel at $(i, j)$, $S_a(\tau(x_u))^{(i,j)}$ is the second value for that pixel of the unlabeled image $x_u$, $\tau(S_a(x_u))^{(i,j)}$ is its first value, $\mathcal{L}_{con}^{S_a}$ is the consistency loss corresponding to the Sa model, $h$ is the height of the unlabeled image, and $w$ is its width.
It can be seen that, in the embodiments of the application, the unlabeled image is input into the image semantic segmentation model to obtain the first prediction result, which is then augmentation-transformed to obtain the first value for each pixel; the unlabeled image is augmentation-transformed and the result input into the model to obtain the second value for each pixel; and the consistency loss of the model is determined from the first and second values of each pixel. On top of the cross-entropy losses corresponding to the two image semantic segmentation models, the consistency losses of the two models are determined based on the unlabeled image, so that consistency losses beyond cross-entropy are fully taken into account; jointly optimizing the two models according to their respective cross-entropy and consistency losses can thus effectively improve the accuracy of the image semantic segmentation model.
For unlabeled images without annotation data, an adversarial learning approach can be adopted to train a discriminator that judges whether a given segmentation result was predicted by the image semantic segmentation model or is a real, manually annotated result. Once the discriminator is well trained, if a pseudo label predicted by the segmentation model can fool the discriminator with high probability, i.e., the discriminator judges it to be manually annotated, the pseudo label is considered to have high confidence. The adversarial loss is the standard GAN training loss and is mainly used to train the discriminator to output values as close to 1 as possible for manually annotated labels and as close to 0 as possible for labels predicted by the image semantic segmentation model.
In view of this, when optimizing the two image semantic segmentation models according to their respective cross-entropy losses and other losses, the other losses may also include an adversarial training loss. How to determine in S203 the adversarial training loss corresponding to each of the first and second image semantic segmentation models based on the unlabeled image is described in detail in the fourth embodiment, shown in fig. 5, below. Since the method for determining the adversarial training loss of the first model based on the unlabeled image is the same as that of the second model, determining the adversarial training loss of either one of the two models based on the unlabeled image is taken as the example.
Example four
Fig. 5 is a flowchart of a method for determining the adversarial training loss corresponding to an image semantic segmentation model based on an unlabeled image according to the fourth embodiment of the present disclosure. The method may likewise be performed by a software and/or hardware device, for example a terminal or a server. Referring to fig. 5, the method may include:
S501, inputting the channel-wise concatenation of the annotated image and its annotation data into a discriminator to obtain a first discrimination result corresponding to the annotated image and first features of the annotated image at each of the M layers of the discriminator.
S502, inputting the channel-wise concatenation of the unlabeled image and the first prediction result (obtained by feeding the unlabeled image into the image semantic segmentation model) into the discriminator to obtain a second discrimination result corresponding to the unlabeled image and second features of the unlabeled image at each layer.
S503, determining the adversarial training loss corresponding to the image semantic segmentation model from the first discrimination result, the first features of the annotated image at each layer, the second discrimination result, and the second features of the unlabeled image at each layer.
Illustratively, the adversarial training loss corresponding to the image semantic segmentation model comprises a game loss between the discriminator and the segmentation model, a self-training loss corresponding to the segmentation model, and a feature matching loss between the discriminator and the segmentation model. Accordingly, when determining the adversarial training loss from the first discrimination result, the per-layer first features, the second discrimination result, and the per-layer second features: the game loss between the discriminator and the segmentation model can be determined from the first and second discrimination results; the self-training loss can be determined from the second discrimination result; the feature matching loss can be determined from the per-layer first features of the annotated image and the per-layer second features of the unlabeled image; and the adversarial training loss corresponding to the segmentation model is then determined from the game loss, the self-training loss, and the feature matching loss.
For example, taking the game loss between the Da discriminator and the Sa model as an example, the game loss can be determined from the first and second discrimination results, as in the following Equation 4:

$$\mathcal{L}_{game}^{S_a} = \log D_a(x_l \oplus y_l) + \log\left(1 - D_a\left(x_u \oplus S_a(x_u)\right)\right) \qquad (4)$$

where $\mathcal{L}_{game}^{S_a}$ denotes the game loss between the Da discriminator and the Sa model, $D_a(x_l \oplus y_l)$ is the first discrimination result corresponding to the annotated image $x_l$, $D_a(x_u \oplus S_a(x_u))$ is the second discrimination result corresponding to the unlabeled image $x_u$, and $\oplus$ denotes concatenation along the channel dimension.
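A sketch of Equation 4, assuming `disc` is the Da discriminator returning per-pixel values in (0, 1), `y_l` is one-hot along the channel dimension so it can be concatenated with the image, and `pred_u` is $S_a(x_u)$; this is the objective the discriminator maximizes (negate it to obtain a loss to minimize):

```python
import torch

def game_loss(disc, x_l, y_l, x_u, pred_u, eps=1e-8):
    """Equation 4: the discriminator should output values near 1 for the
    annotated pair and near 0 for the predicted pair; inputs are
    concatenated along the channel dimension."""
    d_real = disc(torch.cat([x_l, y_l], dim=1))     # first discrimination result
    d_fake = disc(torch.cat([x_u, pred_u], dim=1))  # second discrimination result
    return torch.log(d_real + eps).mean() + torch.log(1.0 - d_fake + eps).mean()
```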
For example, when determining the self-training loss of the image semantic segmentation model from the second discrimination result, it may be checked whether the discrimination result of a target pixel in the second discrimination result is greater than or equal to a preset threshold. If it is, the label predicted for that pixel is very close to a manual annotation and can be used as a pseudo label, subsequently serving as a supervision signal for training the model; the cross-entropy between the pseudo label of the target pixel and the prediction for that pixel in the first prediction result is then determined as the self-training loss corresponding to the model. If the discrimination results of all pixels in the second discrimination result are below the preset threshold, the self-training loss corresponding to the model is determined to be 0.
Taking the self-training loss corresponding to the Sa model as an example, it can be determined as in the following Equation 5:

$$\mathcal{L}_{st}^{S_a} = -\sum_{i,j} \mathbb{1}\left[ D_a\left(x_u \oplus S_a(x_u)\right)^{(i,j)} \geq \gamma \right] \sum_{c=1}^{C} \hat{y}_u^{(i,j,c)} \log S_a(x_u)^{(i,j,c)} \qquad (5)$$

where $\mathcal{L}_{st}^{S_a}$ denotes the self-training loss corresponding to the Sa model; $\hat{y}_u^{(i,j)}$ is the binarization of the prediction for pixel $(i, j)$ in the first prediction result, with the highest-probability label of each prediction set to 1 and the remaining labels set to 0; $\oplus$ denotes the concatenation operation along the channel dimension; $D_a(x_u \oplus S_a(x_u))^{(i,j)}$ is the discrimination result for pixel $(i, j)$; and $\gamma$ is the preset threshold, whose value can be set according to actual needs.
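A sketch of Equation 5, assuming `disc_out` is the per-pixel second discrimination result of shape (H, W) and `probs_u` the first prediction result of shape (H, W, C); the value of `gamma` is illustrative:

```python
import torch
import torch.nn.functional as F

def self_training_loss(disc_out, probs_u, gamma=0.8, eps=1e-8):
    """Equation 5: pixels whose discrimination result reaches the
    threshold gamma contribute a cross-entropy term between the
    binarized (one-hot) prediction and the prediction itself;
    all other pixels contribute zero."""
    num_classes = probs_u.shape[-1]
    pseudo = F.one_hot(probs_u.argmax(dim=-1), num_classes).float()  # max label -> 1, rest -> 0
    mask = (disc_out >= gamma).float()                               # reliable pixels only
    ce_map = -(pseudo * torch.log(probs_u + eps)).sum(dim=-1)        # per-pixel cross entropy
    return (mask * ce_map).sum()
```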
For example, when determining the feature matching loss between the discriminator and the image semantic segmentation model from the per-layer first features of the annotated image and the per-layer second features of the unlabeled image, the feature matching loss for each layer's features may first be determined from that layer's first and second features; a second sum of these per-layer feature matching losses is then computed, and this second sum is determined as the feature matching loss between the discriminator and the segmentation model. The feature matching loss requires the difference between the feature statistics of predicted labels and of annotated labels to be small.
Taking the feature matching loss between the Da discriminator and the Sa model as an example, it can be determined from the first features of the annotated image $x_l$ at each layer and the second features of the unlabeled image $x_u$ at each layer: the per-layer feature matching losses between Da and Sa are computed and summed, as in the following Equation 6:

$$\mathcal{L}_{fm}^{S_a} = \sum_{k=1}^{M} \left\| f_k^{D_a}(x_l \oplus y_l) - f_k^{D_a}\left(x_u \oplus S_a(x_u)\right) \right\| \qquad (6)$$

where $f_k^{D_a}(\cdot)$ denotes the features at the k-th layer of the Da discriminator, $\mathcal{L}_{fm}^{S_a}$ denotes the feature matching loss between the Da discriminator and the Sa model, $k$ indexes the layers of the Da discriminator, $f_k^{D_a}(x_l \oplus y_l)$ is the first feature of the annotated image $x_l$ at the k-th layer, and $f_k^{D_a}(x_u \oplus S_a(x_u))$ is the second feature of the unlabeled image $x_u$ at the k-th layer. The norm used for the layer-wise difference is not specified in the available text.
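A sketch of Equation 6, assuming `feats_real` and `feats_fake` are lists of the discriminator's per-layer features for the annotated pair and the predicted pair; comparing batch-mean features with an L1 distance is one simple choice of "feature statistics", not a choice fixed by the text above:

```python
def feature_matching_loss(feats_real, feats_fake):
    """Equation 6: sum over the M discriminator layers of the distance
    between the feature statistics of the annotated pair and those of
    the predicted pair."""
    loss = 0.0
    for f_real, f_fake in zip(feats_real, feats_fake):
        # compare layer-wise mean features over the batch dimension
        loss = loss + (f_real.mean(dim=0) - f_fake.mean(dim=0)).abs().sum()
    return loss
```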
After the game loss, self-training loss, and feature matching loss have been determined by the above formulas, the adversarial training loss corresponding to the image semantic segmentation model can be determined from them. For example, a first coefficient corresponding to the self-training loss and a second coefficient corresponding to the feature matching loss may first be determined; the adversarial training loss is then determined as the third sum of the game loss, the product of the self-training loss and the first coefficient, and the product of the feature matching loss and the second coefficient.
Taking the adversarial training loss corresponding to the Sa model as an example, it can be determined as in the following Equation 7:

$$\mathcal{L}_{adv}^{S_a} = \mathcal{L}_{game}^{S_a} + \lambda_{st}\,\mathcal{L}_{st}^{S_a} + \lambda_{fm}\,\mathcal{L}_{fm}^{S_a} \qquad (7)$$

where $\mathcal{L}_{adv}^{S_a}$ denotes the adversarial training loss corresponding to the Sa model, $\mathcal{L}_{game}^{S_a}$ is the game loss between the Da discriminator and the Sa model, $\lambda_{st}$ is the first coefficient corresponding to the self-training loss $\mathcal{L}_{st}^{S_a}$ of the Sa model, and $\lambda_{fm}$ is the second coefficient corresponding to the feature matching loss $\mathcal{L}_{fm}^{S_a}$ between the Da discriminator and the Sa model.
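Putting Equation 7 together, with `lambda_st` and `lambda_fm` as the first and second coefficients; their default values here are illustrative, as the text leaves them to be set as needed:

```python
def adversarial_training_loss(game, st, fm, lambda_st=1.0, lambda_fm=1.0):
    """Equation 7: game loss plus the weighted self-training and
    feature matching losses."""
    return game + lambda_st * st + lambda_fm * fm
```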
It can be seen that, in the embodiments of the application, the concatenation of the annotated image and its annotation data is input to the discriminator to obtain the first discrimination result and the first features of the annotated image at each of the discriminator's M layers; the concatenation of the unlabeled image and the first prediction result obtained from the image semantic segmentation model is input to the discriminator to obtain the second discrimination result and the second features of the unlabeled image at each layer; and the adversarial training loss of the segmentation model is determined from the first discrimination result, the per-layer first features, the second discrimination result, and the per-layer second features. On top of the cross-entropy losses corresponding to the two image semantic segmentation models, the adversarial training losses of the two models are determined based on the unlabeled image, so that adversarial training losses beyond cross-entropy are fully taken into account; jointly optimizing the two models according to their respective cross-entropy and adversarial training losses can thus effectively improve the accuracy of the image semantic segmentation model.
Since the first image semantic segmentation model and the second image semantic segmentation model have the same structure, a stability loss can be constructed between the two models so that they interact with respect to the more reliable prediction results and are both aligned toward those more reliable predictions. Therefore, when the two image semantic segmentation models are optimized according to their respective cross entropy losses and other losses, the other losses may also include a stability loss. The stability loss differs from the cross entropy loss, the consistency loss, and the adversarial training loss in how it is calculated: for the latter three, the calculation processes of the two models are split, so that, for example, the second image semantic segmentation model does not participate when the cross entropy loss, consistency loss, or adversarial training loss corresponding to the first image semantic segmentation model is determined; for the stability loss, the calculation processes are related to each other, so that, for example, the second image semantic segmentation model also participates when the stability loss corresponding to the first image semantic segmentation model is determined.
Next, how to determine, in the above S203, the stability losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model based on the unlabeled image will be described in detail through the fifth embodiment shown in fig. 6 below. It can be understood that, since the method for determining the stability loss corresponding to the first image semantic segmentation model based on the unlabeled image is similar to that for the second image semantic segmentation model, the determination of the stability loss corresponding to the first image semantic segmentation model is described below as an example.
EXAMPLE five
Fig. 6 is a flowchart illustrating a method for determining a stability loss corresponding to the first image semantic segmentation model based on an unlabeled image according to a fifth embodiment of the present application. The method may also be performed by software and/or a hardware device; for example, the hardware device may be a terminal or a server. Referring to fig. 6, the method for determining the stability loss corresponding to the first image semantic segmentation model based on the unlabeled image may include:
S601, determining the stability loss of the first image semantic segmentation model for each frame pixel according to the first-class first numerical value and first-class second numerical value corresponding to each frame pixel obtained through the first image semantic segmentation model, and the second-class first numerical value and second-class second numerical value corresponding to each frame pixel obtained through the second image semantic segmentation model.
It can be understood that, in the embodiment of the present application, in order to distinguish a first numerical value corresponding to each frame pixel obtained by a first image semantic segmentation model from a first numerical value corresponding to each frame pixel obtained by a second image semantic segmentation model, the first numerical value corresponding to each frame pixel obtained by the first image semantic segmentation model is recorded as a first-class first numerical value, and the first numerical value corresponding to each frame pixel obtained by the second image semantic segmentation model is recorded as a second-class first numerical value; similarly, in order to distinguish a second numerical value corresponding to each frame pixel obtained by the first image semantic segmentation model from a second numerical value corresponding to each frame pixel obtained by the second image semantic segmentation model, the second numerical value corresponding to each frame pixel obtained by the first image semantic segmentation model is recorded as a first-class second numerical value, and the second numerical value corresponding to each frame pixel obtained by the second image semantic segmentation model is recorded as a second-class second numerical value.
For example, when determining the stability loss of the first image semantic segmentation model for each frame pixel according to the first-class first numerical value and first-class second numerical value corresponding to each frame pixel obtained through the first image semantic segmentation model and the second-class first numerical value and second-class second numerical value corresponding to each frame pixel obtained through the second image semantic segmentation model, it may be judged, for each frame pixel, whether the first-class first numerical value and first-class second numerical value corresponding to the pixel and the second-class first numerical value and second-class second numerical value corresponding to the pixel meet a preset condition; wherein the preset condition includes: the semantic labels of the first numerical value and the second numerical value of each class are consistent, and at least one of the first numerical value and the second numerical value of each class is greater than a preset threshold.
It is understood that, in the embodiment of the present application, the preset condition may be read as follows. For each frame pixel in the unlabeled image, the first-class first numerical value is obtained by passing the pixel through the first image semantic segmentation model and then through the augmentation transformation, while the first-class second numerical value is obtained by applying the augmentation transformation first and then predicting with the first image semantic segmentation model; the preset condition requires that the semantic labels of these two values be consistent and that at least one of them be greater than the preset threshold. Likewise, the second-class first numerical value is obtained by passing the pixel through the second image semantic segmentation model and then through the augmentation transformation, while the second-class second numerical value is obtained by applying the augmentation transformation first and then predicting with the second image semantic segmentation model; the preset condition requires that the semantic labels of these two values be consistent and that at least one of them be greater than the preset threshold. It can be seen that the preset condition constrains both the first-class first and second numerical values obtained based on the first image semantic segmentation model and the second-class first and second numerical values obtained based on the second image semantic segmentation model.
In one case, if the preset condition is fully met, the stability loss of the first image semantic segmentation model corresponding to the pixel is determined according to the first-class second numerical value, the second-class second numerical value, a first consistency loss of the first image semantic segmentation model corresponding to the pixel, and a second consistency loss of the second image semantic segmentation model corresponding to the pixel. For example, if the first consistency loss is greater than the second consistency loss, the mean square error between the first-class second numerical value and the second-class second numerical value is determined and taken as the stability loss of the first image semantic segmentation model corresponding to the pixel; if the first consistency loss is equal to or less than the second consistency loss, the stability loss of the first image semantic segmentation model corresponding to the pixel is determined to be 0.
In another case, if the preset condition is not met, the stability loss of the first image semantic segmentation model corresponding to the pixel is determined according to the first-class second numerical value, the second-class second numerical value, and a fourth numerical value: the mean square error between the first-class second numerical value and the second-class second numerical value is determined first, and the product of this mean square error and the fourth numerical value is then determined as the stability loss of the first image semantic segmentation model corresponding to the pixel. The fourth numerical value is determined according to the second-class first numerical value and the second-class second numerical value corresponding to the pixel: if their semantic labels are consistent and at least one of the two is greater than the preset threshold, the fourth numerical value is 1; otherwise, it is 0.
Taking the first image semantic segmentation model represented by Sa and the second image semantic segmentation model represented by Sb as an example, when determining the stability loss of the Sa model for each frame pixel, it may first be judged, for each frame pixel of the unlabeled image $x_u$, whether the first-class first numerical value and first-class second numerical value corresponding to the pixel and the second-class first numerical value and second-class second numerical value corresponding to the pixel meet the preset condition. If the semantic labels of the first-class first numerical value and the first-class second numerical value corresponding to the pixel are consistent and at least one of the two is greater than the preset threshold, this is recorded as $r_a = 1$; otherwise it is recorded as $r_a = 0$. Likewise, if the semantic labels of the second-class first numerical value and the second-class second numerical value corresponding to the pixel are consistent and at least one of the two is greater than the preset threshold, this is recorded as $r_b = 1$; otherwise it is recorded as $r_b = 0$. Specifically, the stability loss of the Sa model for the pixel can be determined by the following Equation 8:
$\mathcal{L}_{stab}^{a}(x_u)^{(i,j)} = r_b\left(r_a\left[\varepsilon_a(x_u)^{(i,j)} > \varepsilon_b(x_u)^{(i,j)}\right]_1 + (1 - r_a)\right)L_{mse}(x_u)^{(i,j)}$ (Equation 8)

where $\mathcal{L}_{stab}^{a}(x_u)^{(i,j)}$ denotes the stability loss of the Sa model for the $(i, j)$-th frame pixel of the unlabeled image $x_u$; $\varepsilon_a(x_u)^{(i,j)}$ denotes the consistency loss of the Sa model for the $(i, j)$-th frame pixel and $\varepsilon_b(x_u)^{(i,j)}$ denotes that of the Sb model; $[\cdot]_1$ is an indicator function whose value is 1 if the condition holds and 0 otherwise, so that $[\varepsilon_a(x_u)^{(i,j)} > \varepsilon_b(x_u)^{(i,j)}]_1$ is 1 if $\varepsilon_a(x_u)^{(i,j)}$ is greater than $\varepsilon_b(x_u)^{(i,j)}$ and 0 otherwise; and $L_{mse}(x_u)^{(i,j)}$ is the mean square error between the first-class second numerical value $S_a(\tau(x_u))^{(i,j)}$ obtained based on the Sa model and the second-class second numerical value $S_b(\tau(x_u))^{(i,j)}$ obtained based on the Sb model, which can be expressed by the following Equation 9:

$L_{mse}(x_u)^{(i,j)} = \left(S_a(\tau(x_u))^{(i,j)} - S_b(\tau(x_u))^{(i,j)}\right)^2$ (Equation 9)
After the stability loss of the first image semantic segmentation model for each frame pixel is obtained through calculation, the stability loss corresponding to the first image semantic segmentation model can be determined from these per-pixel stability losses; that is, the following S602 is executed:
S602, determining the stability loss corresponding to the first image semantic segmentation model according to the stability loss of the first image semantic segmentation model for each frame pixel.
For example, when determining the stability loss corresponding to the first image semantic segmentation model according to the stability loss of the first image semantic segmentation model for each frame pixel, a fourth sum of the stability losses of the first image semantic segmentation model for the frame pixels may be calculated first; the ratio of the fourth sum to the third numerical value is then determined as the stability loss corresponding to the first image semantic segmentation model, where the third numerical value is the product of the width and the height of the unlabeled image.
Taking the first image semantic segmentation model represented by Sa as an example, when determining the stability loss corresponding to the Sa model, the ratio of the sum, over the frame pixels of the unlabeled image $x_u$, of the per-pixel stability losses of the Sa model obtained in S601 to the third numerical value may be determined as the stability loss corresponding to the Sa model, as shown in the following Equation 10:

$\mathcal{L}_{stab}^{a} = \frac{1}{W \times H}\sum_{i,j}\mathcal{L}_{stab}^{a}(x_u)^{(i,j)}$ (Equation 10)

where $\mathcal{L}_{stab}^{a}$ denotes the stability loss corresponding to the Sa model, $\mathcal{L}_{stab}^{a}(x_u)^{(i,j)}$ denotes the stability loss of the Sa model for the $(i, j)$-th frame pixel of the unlabeled image $x_u$, and $W$ and $H$ denote the width and the height of the unlabeled image.
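To make the above calculation concrete, the following is a minimal PyTorch sketch of Equations 8-10 for the Sa model. It assumes the four per-pixel value maps and the two per-pixel consistency-loss maps have already been computed; the function and parameter names, the confidence threshold of 0.9, and the averaging of the mean square error over the class channel are illustrative assumptions of this sketch, not details fixed by the present application.

```python
import torch

def stability_loss_sa(p_a: torch.Tensor, p_a_aug: torch.Tensor,
                      p_b: torch.Tensor, p_b_aug: torch.Tensor,
                      eps_a: torch.Tensor, eps_b: torch.Tensor,
                      threshold: float = 0.9) -> torch.Tensor:
    """Stability loss of the Sa model, following Equations 8-10.

    p_a:     first-class first values  (Sa prediction, then augmentation), shape (B, C, H, W)
    p_a_aug: first-class second values (augmentation, then Sa prediction), shape (B, C, H, W)
    p_b, p_b_aug: second-class counterparts obtained from the Sb model
    eps_a, eps_b: per-pixel consistency losses of the Sa and Sb models, shape (B, H, W)
    """
    conf_a1, lbl_a1 = p_a.max(dim=1)       # per-pixel confidence and semantic label
    conf_a2, lbl_a2 = p_a_aug.max(dim=1)
    conf_b1, lbl_b1 = p_b.max(dim=1)
    conf_b2, lbl_b2 = p_b_aug.max(dim=1)

    # r_a / r_b: semantic labels consistent and at least one value above the threshold
    r_a = ((lbl_a1 == lbl_a2) & ((conf_a1 > threshold) | (conf_a2 > threshold))).float()
    r_b = ((lbl_b1 == lbl_b2) & ((conf_b1 > threshold) | (conf_b2 > threshold))).float()

    # Equation 9: per-pixel MSE between the two second-class values
    # (averaged over the class channel -- an assumption of this sketch)
    l_mse = ((p_a_aug - p_b_aug) ** 2).mean(dim=1)

    # Equation 8: keep the MSE only where the Sb prediction is reliable, and,
    # when both predictions are reliable, only where Sb is the more consistent one
    indicator = (eps_a > eps_b).float()
    per_pixel = r_b * (r_a * indicator + (1.0 - r_a)) * l_mse

    # Equation 10: sum over the W x H frame pixels divided by W x H
    # (mean over the spatial dimensions), then averaged over the batch
    return per_pixel.mean(dim=(1, 2)).mean()
```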
It can be seen that, in the embodiment of the present application, the stability loss of the first image semantic segmentation model for each frame pixel may be determined according to the first-class first numerical value and first-class second numerical value obtained through the first image semantic segmentation model and the second-class first numerical value and second-class second numerical value obtained through the second image semantic segmentation model, and the stability loss corresponding to the first image semantic segmentation model is then determined from these per-pixel stability losses. On the basis of the cross entropy losses corresponding to the two image semantic segmentation models, the stability losses corresponding to the two models are determined based on the unlabeled image, so that stability losses beyond the cross entropy loss are fully considered; optimizing the two image semantic segmentation models according to both their cross entropy losses and their stability losses can effectively improve the accuracy of the image semantic segmentation model.
Based on any of the above embodiments, after the cross entropy loss and other losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model are respectively determined, the first image semantic segmentation model and the second image semantic segmentation model can be optimized according to the cross entropy loss and other losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model. Next, how to optimize the first image semantic segmentation model and the second image semantic segmentation model respectively according to the cross entropy loss and other losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model respectively in the above S204 will be described in detail through the following sixth embodiment shown in fig. 7.
EXAMPLE six
Fig. 7 is a flowchart illustrating a method for optimizing an image semantic segmentation model according to a sixth embodiment of the present application. The method may also be performed by software and/or a hardware device; for example, the hardware device may be a terminal or a server. Referring to fig. 7, the method for optimizing the image semantic segmentation model may include:
S701, respectively determining third coefficients corresponding to the other losses of the first image semantic segmentation model and the second image semantic segmentation model.
The third coefficient may be set according to actual needs, and the value of the third coefficient is not specifically limited in this embodiment of the application.
S702, summing, for each model, the products of the other losses and their third coefficients with the corresponding cross entropy loss, and determining the results as the total losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model respectively.
Taking the case where the other losses include the consistency loss, the adversarial training loss, and the stability loss as an example, the total loss corresponding to the first image semantic segmentation model Sa consists of four losses: the cross entropy loss, the consistency loss, the adversarial training loss, and the stability loss. The total loss corresponding to the Sa model can therefore be determined from these four losses, as shown in the following Equation 11:
$L_a = \mathcal{L}_{ce}^{a} + \lambda_1\mathcal{L}_{con}^{a} + \lambda_2\mathcal{L}_{stab}^{a} + \lambda_3\mathcal{L}_{adv}^{a}$ (Equation 11)

where $L_a$ denotes the total loss corresponding to the Sa model; $\mathcal{L}_{ce}^{a}$ denotes the cross entropy loss corresponding to the Sa model; $\mathcal{L}_{con}^{a}$ denotes the consistency loss corresponding to the Sa model, with $\lambda_1$ the corresponding coefficient; $\mathcal{L}_{stab}^{a}$ denotes the stability loss corresponding to the Sa model, with $\lambda_2$ the corresponding coefficient; and $\mathcal{L}_{adv}^{a}$ denotes the adversarial training loss corresponding to the Sa model, with $\lambda_3$ the corresponding coefficient.
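As an illustration of Equation 11, the following minimal sketch combines the four losses for one model; since the coefficients are set according to actual needs, the defaults below are placeholders only.

```python
import torch

def total_loss(ce_loss: torch.Tensor,
               consistency_loss: torch.Tensor,
               stability_loss: torch.Tensor,
               adv_loss: torch.Tensor,
               lambda1: float = 1.0,
               lambda2: float = 1.0,
               lambda3: float = 1.0) -> torch.Tensor:
    # Equation 11: cross entropy loss plus the weighted consistency,
    # stability, and adversarial training losses
    return (ce_loss
            + lambda1 * consistency_loss
            + lambda2 * stability_loss
            + lambda3 * adv_loss)
```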
It can be understood that, in the embodiment of the present application, the total loss corresponding to the second image semantic segmentation model Sb is calculated in a manner similar to that of the total loss corresponding to the first image semantic segmentation model Sa; reference may be made to the above description, and the details are not repeated here.
After the total losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model are respectively calculated, the two models can be optimized according to their respective total losses; that is, the following S703 is executed:
S703, respectively optimizing the first image semantic segmentation model and the second image semantic segmentation model according to the total losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model.
For example, when optimizing the first image semantic segmentation model and the second image semantic segmentation model according to their corresponding total losses, a fifth sum of the total loss corresponding to the first image semantic segmentation model and the total loss corresponding to the second image semantic segmentation model may be calculated first. The first image semantic segmentation model and the second image semantic segmentation model are each optimized according to the fifth sum, yielding two optimized image semantic segmentation models, and a target image semantic segmentation model is then determined from the two optimized models for use in subsequent image semantic segmentation processing.
With reference to the description in S702, when optimizing the Sa model and the Sb model according to their respective total losses, the sum of the total loss corresponding to the Sa model and the total loss corresponding to the Sb model may be calculated first, and the Sa model and the Sb model may then both be optimized according to this sum, as shown in the following Equation 12:

$L = L_a + L_b$ (Equation 12)

where $L$ denotes the sum of the total loss corresponding to the Sa model and the total loss corresponding to the Sb model, $L_a$ denotes the total loss corresponding to the Sa model, and $L_b$ denotes the total loss corresponding to the Sb model.
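The joint update described by Equation 12 can be sketched as follows, assuming a single optimizer holds the parameters of both models; all names are illustrative.

```python
import torch

def joint_optimization_step(loss_a: torch.Tensor,
                            loss_b: torch.Tensor,
                            optimizer: torch.optim.Optimizer) -> float:
    # Equation 12: optimize both models with the fifth sum of the two total losses
    loss = loss_a + loss_b
    optimizer.zero_grad()
    loss.backward()   # gradients reach both models through loss_a and loss_b
    optimizer.step()
    return loss.item()
```

For example, the optimizer might be constructed over the union of the two parameter sets, e.g. torch.optim.SGD(list(sa.parameters()) + list(sb.parameters()), lr=0.01), so that a single backward pass on the sum updates the Sa model and the Sb model together.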
It can be seen that, in the embodiment of the present application, on the basis of the cross entropy losses obtained for the two image semantic segmentation models based on the annotated image, the other losses corresponding to the two models are determined based on the unlabeled image, so that losses beyond the cross entropy loss are fully considered. The first image semantic segmentation model and the second image semantic segmentation model are then each optimized according to their respective total losses, and optimizing the two models together in this way can effectively improve the accuracy of the image semantic segmentation model.
EXAMPLE seven
Fig. 8 is a schematic block diagram of an optimization apparatus 80 of an image semantic segmentation model provided according to a seventh embodiment of the present application. Referring to fig. 8, the optimization apparatus 80 of the image semantic segmentation model may include:
An acquiring unit 801, configured to acquire an annotated image and an unlabeled image.
A first determining unit 802, configured to determine, based on the annotated image, cross entropy losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model respectively; the network structures of the first image semantic segmentation model and the second image semantic segmentation model are the same.
A second determining unit 803, configured to determine, based on the unlabeled image, other losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model, respectively; wherein the other losses comprise at least one of a consistency loss, an adversarial training loss, or a stability loss.
The optimizing unit 804 is configured to optimize the first image semantic segmentation model and the second image semantic segmentation model respectively according to the cross entropy loss and other losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model respectively, and determine the target image semantic segmentation model based on the optimized first image semantic segmentation model and the optimized second image semantic segmentation model.
Optionally, if the other loss includes a consistency loss, the second determining unit 803 includes a first determining module, a second determining module, and a third determining module.
The first determining module is used for inputting the unmarked image into the image semantic segmentation model to obtain a first prediction result corresponding to the unmarked image, and performing augmentation transformation on the first prediction result to obtain a first numerical value corresponding to each frame pixel in the unmarked image.
And the second determining module is used for performing augmentation transformation on the unlabeled image and inputting a transformation result to the image semantic segmentation model to obtain a second numerical value corresponding to each frame of pixel in the unlabeled image.
The third determining module is used for determining consistency loss corresponding to the image semantic segmentation model according to the first numerical value and the second numerical value corresponding to each frame of pixel; the image semantic segmentation model is any one of a first image semantic segmentation model and a second image semantic segmentation model.
Optionally, the third determining module includes a first determining submodule and a second determining submodule.
And the first determining submodule is used for determining the mean square error corresponding to each frame of pixels according to the first numerical value and the second numerical value corresponding to each frame of pixels.
And the second determining submodule is used for determining consistency loss corresponding to the image semantic segmentation model according to the mean square error corresponding to each frame of pixels.
Optionally, the second determining submodule is specifically configured to calculate a first sum of the mean square errors corresponding to the frame pixels; and determine the ratio of the first sum to a third numerical value as the consistency loss corresponding to the image semantic segmentation model; wherein the third numerical value is the product of the width and the height of the unlabeled image.
Optionally, if the other losses include an adversarial training loss, the second determining unit 803 further includes a fourth determining module, a fifth determining module, and a sixth determining module.
And the fourth determining module is used for inputting the cascade operation results of the annotation images and the annotation data of the annotation images into the discriminator to obtain a first discrimination result corresponding to the annotation images and first characteristics of the annotation images in each layer of the M layer of the discriminator.
And the fifth determining module is used for inputting, into the discriminator, the cascade operation result of the unlabeled image and the first prediction result obtained by inputting the unlabeled image into the image semantic segmentation model, to obtain a second discrimination result corresponding to the unlabeled image and a second feature of the unlabeled image in each layer.
A sixth determining module, configured to determine, according to the first discrimination result, the first feature of the labeled image in each layer, the second discrimination result, and the second feature of the unlabeled image in each layer, the adversarial training loss corresponding to the image semantic segmentation model; the image semantic segmentation model is any one of the first image semantic segmentation model and the second image semantic segmentation model.
Optionally, the sixth determining module includes a third determining submodule, a fourth determining submodule, a fifth determining submodule and a sixth determining submodule.
And the third determining submodule is used for determining game loss between the discriminator and the image semantic segmentation model according to the first discrimination result and the second discrimination result.
And the fourth determining submodule is used for determining the self-training loss corresponding to the image semantic segmentation model according to the second discrimination result.
And the fifth determining submodule is used for determining the feature matching loss between the discriminator and the image semantic segmentation model according to the first feature of the labeled image in each layer and the second feature of the unlabeled image in each layer.
And the sixth determining submodule is used for determining the adversarial training loss corresponding to the image semantic segmentation model according to the game loss, the self-training loss and the feature matching loss.
Optionally, the fourth determining submodule is configured to: if the discrimination result of the target pixel in the second discrimination result is greater than or equal to a preset threshold, determine the self-training loss corresponding to the image semantic segmentation model according to a cross entropy loss between the discrimination result of the target pixel and the prediction result corresponding to the target pixel in the first prediction result; and if the discrimination results of all the pixels in the second discrimination result are smaller than the preset threshold, determine that the self-training loss corresponding to the image semantic segmentation model is 0.
Optionally, the fifth determining submodule is specifically configured to determine, according to the first feature of the labeled image in each layer and the second feature of the unlabeled image in each layer, the feature matching loss between the discriminator and the image semantic segmentation model for each layer of features; calculate a second sum of these per-layer feature matching losses; and determine the second sum as the feature matching loss between the discriminator and the image semantic segmentation model.
Optionally, the sixth determining submodule is specifically configured to respectively determine a first coefficient corresponding to the self-training loss and a second coefficient corresponding to the feature matching loss; and determine the third sum of the product of the self-training loss and the first coefficient, the product of the feature matching loss and the second coefficient, and the game loss as the adversarial training loss corresponding to the image semantic segmentation model.
Alternatively, if the other loss includes a stability loss, the second determining unit 803 includes a seventh determining module and an eighth determining module.
And the seventh determining module is used for determining the stability loss of the first image semantic segmentation model corresponding to each frame pixel according to the first type of first numerical value and the first type of second numerical value corresponding to each frame pixel obtained by the first image semantic segmentation model and the second type of first numerical value and the second type of second numerical value corresponding to each frame pixel obtained by the second image semantic segmentation model.
And the eighth determining module is used for determining the stability loss corresponding to the first image semantic segmentation model according to the stability loss corresponding to each frame pixel of the first image semantic segmentation model.
Optionally, the seventh determining module includes a seventh determining sub-module, an eighth determining sub-module, and a ninth determining sub-module.
The seventh determining submodule is used for judging, for each frame pixel, whether the first type first numerical value and first type second numerical value corresponding to the pixel and the second type first numerical value and second type second numerical value corresponding to the pixel meet the preset condition; wherein the preset condition includes: the semantic labels of each type of first numerical value and second numerical value are consistent, and at least one of each type of first numerical value and second numerical value is greater than a preset threshold value.
And the eighth determining submodule is used for determining, if the preset condition is met, the stability loss of the first image semantic segmentation model corresponding to the pixel according to the first type second numerical value, the second type second numerical value, the first consistency loss of the first image semantic segmentation model corresponding to the pixel, and the second consistency loss of the second image semantic segmentation model corresponding to the pixel.
The ninth determining submodule is used for determining the stability loss of the first image semantic segmentation model corresponding to the pixel according to the first type second numerical value, the second type second numerical value and the fourth numerical value if the preset condition is not met; and the fourth numerical value is determined according to the second type first numerical value and the second type second numerical value corresponding to the pixel.
Optionally, the eighth determining sub-module is specifically configured to determine a mean square error between the first class of second numerical values and the second class of second numerical values if the first consistency loss is greater than the second consistency loss, and determine the mean square error as a stability loss corresponding to the first image semantic segmentation model for the pixel; and if the first consistency loss is equal to or less than the second consistency loss, determining that the stability loss of the first image semantic segmentation model corresponding to the pixel is 0.
Optionally, the ninth determining sub-module is specifically configured to determine a mean square error between the first type of second numerical value and the second type of second numerical value; and determining the product of the mean square error and the fourth numerical value as the stability loss of the first image semantic segmentation model corresponding to the pixel.
Optionally, the eighth determining module includes a tenth determining submodule and an eleventh determining submodule.
And the tenth determining submodule is used for calculating a fourth sum of the stability loss corresponding to each frame pixel of the first image semantic segmentation model.
An eleventh determining submodule, configured to determine a ratio of the fourth sum to the third value as a stability loss corresponding to the first image semantic segmentation model; and the third numerical value is the product of the width and the height of the unmarked image.
Optionally, the optimization unit 804 includes a first optimization module, a second optimization module, and a third optimization module.
And the first optimization module is used for respectively determining third coefficients corresponding to the other losses of the first image semantic segmentation model and the second image semantic segmentation model.
And the second optimization module is used for summing, for each model, the products of the other losses and their third coefficients with the corresponding cross entropy loss, to respectively determine the total losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model.
And the third optimization module is used for respectively optimizing the first image semantic segmentation model and the second image semantic segmentation model according to the total loss corresponding to the first image semantic segmentation model and the second image semantic segmentation model.
Optionally, the third optimization module includes a first optimization submodule and a second optimization submodule.
And the first optimization submodule is used for calculating the fifth sum of the total loss corresponding to the first image semantic segmentation model and the total loss corresponding to the second image semantic segmentation model.
And the second optimization submodule is used for respectively optimizing the first image semantic segmentation model and the second image semantic segmentation model according to the fifth sum.
Optionally, the first determining unit 802 includes a ninth determining module and a tenth determining module.
And the ninth determining module is used for inputting the annotated image into the image semantic segmentation model to obtain a second prediction result corresponding to the annotated image.
The tenth determining module is used for determining the cross entropy loss corresponding to the image semantic segmentation model according to the second prediction result and the labeled data of the labeled image; the image semantic segmentation model is any one of a first image semantic segmentation model and a second image semantic segmentation model.
The optimization device 80 for the image semantic segmentation model provided in this embodiment of the application can execute the technical solution of the optimization method for the image semantic segmentation model shown in any one of the above embodiments, and its implementation principle and beneficial effect are similar to those of the optimization method for the image semantic segmentation model, and reference may be made to the implementation principle and beneficial effect of the optimization method for the image semantic segmentation model, which are not described herein again.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
There is also provided, in accordance with an embodiment of the present application, a computer program product, including: a computer program, stored in a readable storage medium, from which at least one processor of the electronic device can read the computer program, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any of the embodiments described above.
Fig. 9 is a schematic block diagram of an electronic device 90 provided in an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the electronic device 90 includes a computing unit 901, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 90 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 90 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 90 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the optimization method of the image semantic segmentation model. For example, in some embodiments, the optimization method of the image semantic segmentation model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 90 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the optimization method of the image semantic segmentation model described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured by any other suitable means (e.g., by means of firmware) to perform the optimization method of the image semantic segmentation model.
Various implementations of the systems and techniques described herein may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the defects of high management difficulty and weak service expansibility in traditional physical host and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (37)

1. A method for optimizing an image semantic segmentation model comprises the following steps:
acquiring an annotated image and an unlabeled image;
respectively determining cross entropy losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model based on the annotation image; the network structures of the first image semantic segmentation model and the second image semantic segmentation model are the same;
respectively determining other losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model based on the unlabeled image; wherein the other loss comprises at least one of a consistency loss, an adversarial training loss, or a stability loss;
and respectively optimizing the first image semantic segmentation model and the second image semantic segmentation model according to the cross entropy loss and other losses corresponding to the first image semantic segmentation model and the second image semantic segmentation model, and determining a target image semantic segmentation model based on the optimized first image semantic segmentation model and the optimized second image semantic segmentation model.
2. The method of claim 1, wherein if the other loss comprises a consistency loss, determining a consistency loss corresponding to an image semantic segmentation model based on the unlabeled image comprises:
inputting the unlabeled image into an image semantic segmentation model to obtain a first prediction result corresponding to the unlabeled image, and performing augmentation transformation on the first prediction result to obtain a first numerical value corresponding to each frame pixel in the unlabeled image;
performing the augmentation transformation on the unlabeled image, and inputting a transformation result into the image semantic segmentation model to obtain a second numerical value corresponding to each frame of pixel in the unlabeled image;
determining consistency loss corresponding to the image semantic segmentation model according to the first numerical value and the second numerical value corresponding to each frame pixel; the image semantic segmentation model is any one of the first image semantic segmentation model and the second image semantic segmentation model.
3. The method of claim 2, wherein the determining the consistency loss corresponding to the image semantic segmentation model according to the first numerical value and the second numerical value corresponding to the pixels of each frame comprises:
determining the mean square error corresponding to each frame pixel according to the first numerical value and the second numerical value corresponding to each frame pixel;
and determining consistency loss corresponding to the image semantic segmentation model according to the mean square error corresponding to each frame pixel.
4. The method of claim 3, wherein the determining the consistency loss corresponding to the image semantic segmentation model according to the mean square error corresponding to the pixels of each frame comprises:
calculating a first sum of mean square deviations corresponding to the pixels of each frame;
determining the ratio of the first sum to a third numerical value as the consistency loss corresponding to the image semantic segmentation model; and the third numerical value is the product of the width and the height of the unlabeled image.
5. The method of any one of claims 1-3, wherein if the other loss comprises an adversarial training loss, determining the adversarial training loss corresponding to an image semantic segmentation model based on the unlabeled image comprises:
inputting the cascade operation results of the labeled image and the labeled data of the labeled image into a discriminator to obtain a first discrimination result corresponding to the labeled image and first characteristics of the labeled image in each layer of the M layer of the discriminator;
inputting, into the discriminator, a cascade operation result of the unlabeled image and a first prediction result obtained by inputting the unlabeled image into the image semantic segmentation model, to obtain a second discrimination result corresponding to the unlabeled image and a second feature of the unlabeled image in each layer;
determining the adversarial training loss corresponding to the image semantic segmentation model according to the first discrimination result, the first feature of the labeled image in each layer, the second discrimination result, and the second feature of the unlabeled image in each layer; the image semantic segmentation model is any one of the first image semantic segmentation model and the second image semantic segmentation model.
6. The method of claim 5, wherein the determining, according to the first discrimination result, the first feature of the labeled image in each layer, the second discrimination result, and the second feature of the unlabeled image in each layer, the adversarial training loss corresponding to the image semantic segmentation model comprises:
determining game loss between the discriminator and the image semantic segmentation model according to the first discrimination result and the second discrimination result;
determining a self-training loss corresponding to the image semantic segmentation model according to the second discrimination result;
determining the feature matching loss between the discriminator and the image semantic segmentation model according to the first feature of the labeled image in each layer and the second feature of the unlabeled image in each layer;
and determining the adversarial training loss corresponding to the image semantic segmentation model according to the game loss, the self-training loss and the feature matching loss.
7. The method of claim 6, wherein the determining the self-training loss corresponding to the image semantic segmentation model according to the second discrimination result comprises:
if the discrimination result of a target pixel in the second discrimination result is greater than or equal to a preset threshold, determining the self-training loss corresponding to the image semantic segmentation model according to a cross entropy loss between the discrimination result of the target pixel and the prediction result corresponding to the target pixel in the first prediction result;
and if the discrimination results of all the pixels in the second discrimination result are smaller than the preset threshold, determining that the self-training loss corresponding to the image semantic segmentation model is 0.
8. The method of claim 6, wherein the determining a feature matching loss between the discriminator and the image semantic segmentation model according to a first feature of the labeled image at the respective layer and a second feature of the unlabeled image at the respective layer comprises:
determining feature matching loss between the discriminator and the image semantic segmentation model aiming at the features of each layer according to the first feature of the labeled image in each layer and the second feature of the unlabeled image in each layer;
calculating a second sum of the discriminator and the image semantic segmentation model for the feature matching loss between the features of each layer;
determining the second sum as a loss of feature matching between the discriminator and the image semantic segmentation model.
9. The method of claim 6, wherein the determining the adversarial training loss corresponding to the image semantic segmentation model according to the game loss, the self-training loss and the feature matching loss comprises:
respectively determining a first coefficient corresponding to the self-training loss and a second coefficient corresponding to the feature matching loss;
and determining the third sum of the product of the self-training loss and the first coefficient, the product of the feature matching loss and the second coefficient, and the game loss as the adversarial training loss corresponding to the image semantic segmentation model.
10. The method according to any one of claims 1-3, wherein, if the other loss includes a stability loss, determining the stability loss corresponding to the first image semantic segmentation model based on the unlabeled image comprises:
determining, for each pixel, the stability loss of the first image semantic segmentation model corresponding to that pixel according to the first-class first value and the first-class second value corresponding to the pixel obtained by the first image semantic segmentation model, and the second-class first value and the second-class second value corresponding to the pixel obtained by the second image semantic segmentation model;
and determining the stability loss corresponding to the first image semantic segmentation model according to the stability losses of the first image semantic segmentation model corresponding to the respective pixels.
11. The method according to claim 10, wherein determining the stability loss of the first image semantic segmentation model corresponding to each pixel according to the first-class first value and the first-class second value corresponding to the pixel obtained by the first image semantic segmentation model, and the second-class first value and the second-class second value corresponding to the pixel obtained by the second image semantic segmentation model comprises:
judging, for each pixel, whether the first-class first value and the first-class second value corresponding to the pixel and the second-class first value and the second-class second value corresponding to the pixel satisfy a preset condition; wherein the preset condition includes: the semantic labels of the first values and the second values of both classes are consistent, and at least one of the first values and the second values of both classes is greater than a preset threshold;
if the preset condition is satisfied, determining the stability loss of the first image semantic segmentation model corresponding to the pixel according to the first-class second value, the second-class second value, a first consistency loss of the first image semantic segmentation model corresponding to the pixel, and a second consistency loss of the second image semantic segmentation model corresponding to the pixel;
and if the preset condition is not satisfied, determining the stability loss of the first image semantic segmentation model corresponding to the pixel according to the first-class second value, the second-class second value, and a fourth value; wherein the fourth value is determined according to the second-class first value and the second-class second value corresponding to the pixel.
12. The method of claim 11, wherein determining the stability loss of the first image semantic segmentation model corresponding to the pixel according to the first-class second value, the second-class second value, the first consistency loss of the first image semantic segmentation model corresponding to the pixel, and the second consistency loss of the second image semantic segmentation model corresponding to the pixel comprises:
if the first consistency loss is greater than the second consistency loss, determining the mean square error between the first-class second value and the second-class second value, and determining the mean square error as the stability loss of the first image semantic segmentation model corresponding to the pixel;
and if the first consistency loss is less than or equal to the second consistency loss, determining that the stability loss of the first image semantic segmentation model corresponding to the pixel is 0.
13. The method of claim 11, wherein determining the stability loss of the first image semantic segmentation model corresponding to the pixel according to the first-class second value, the second-class second value, and the fourth value comprises:
determining the mean square error between the first-class second value and the second-class second value;
and determining the product of the mean square error and the fourth value as the stability loss of the first image semantic segmentation model corresponding to the pixel.
14. The method of claim 10, wherein determining the stability loss corresponding to the first image semantic segmentation model according to the stability losses of the first image semantic segmentation model corresponding to the respective pixels comprises:
calculating a fourth sum of the stability losses of the first image semantic segmentation model corresponding to the respective pixels;
and determining the ratio of the fourth sum to a third value as the stability loss corresponding to the first image semantic segmentation model; wherein the third value is the product of the width and the height of the unlabeled image.
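Claims 10-14 together define the stability loss. The following sketch is one possible reading for a single unlabeled image; the threshold value and the derivation of the fourth value are assumptions:

```python
import torch

def stability_loss(p1_a, p1_b, p2_a, p2_b, cons1, cons2, tau=0.8):
    """p1_a / p1_b: the first model's first-class first and second
    class-probability maps, each (C, H, W); p2_a / p2_b: the same pair
    from the second model; cons1 / cons2: per-pixel consistency losses
    of the two models, each (H, W); tau: assumed preset threshold."""
    conf1_a, lbl1_a = p1_a.max(dim=0)
    conf1_b, lbl1_b = p1_b.max(dim=0)
    conf2_a, lbl2_a = p2_a.max(dim=0)
    conf2_b, lbl2_b = p2_b.max(dim=0)

    # Preset condition (claim 11): consistent semantic labels and at
    # least one confident value among the four.
    labels_ok = (lbl1_a == lbl1_b) & (lbl2_a == lbl2_b) & (lbl1_a == lbl2_a)
    confident = torch.stack(
        [conf1_a, conf1_b, conf2_a, conf2_b]).max(dim=0).values > tau
    cond = labels_ok & confident

    # Per-pixel MSE between the two second values (claims 12 and 13).
    mse = ((p1_b - p2_b) ** 2).mean(dim=0)

    # Condition met: penalize only where model 1 is less consistent (claim 12).
    met = torch.where(cons1 > cons2, mse, torch.zeros_like(mse))
    # Condition not met: weight the MSE by a fourth value derived from the
    # second model's two values (claim 13); this derivation is an assumption.
    fourth = (conf2_a + conf2_b) / 2
    not_met = mse * fourth

    per_pixel = torch.where(cond, met, not_met)
    h, w = per_pixel.shape
    return per_pixel.sum() / (h * w)   # fourth sum / (W * H), claim 14
```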
15. The method according to any one of claims 1-3, wherein optimizing the first image semantic segmentation model and the second image semantic segmentation model according to the cross-entropy losses and the other losses respectively corresponding to the first image semantic segmentation model and the second image semantic segmentation model comprises:
determining third coefficients respectively corresponding to the other losses of the first image semantic segmentation model and the second image semantic segmentation model;
summing, for each model, the product of its other loss and the corresponding third coefficient with its cross-entropy loss, to determine the total losses respectively corresponding to the first image semantic segmentation model and the second image semantic segmentation model;
and optimizing the first image semantic segmentation model and the second image semantic segmentation model respectively according to their respective total losses.
16. The method of claim 15, wherein optimizing the first image semantic segmentation model and the second image semantic segmentation model according to their respective total losses comprises:
calculating a fifth sum of the total loss corresponding to the first image semantic segmentation model and the total loss corresponding to the second image semantic segmentation model;
and optimizing the first image semantic segmentation model and the second image semantic segmentation model respectively according to the fifth sum.
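Claims 15-16 combine the per-model losses; a sketch with an assumed third coefficient:

```python
def total_losses(ce1, other1, ce2, other2, coeff=0.5):
    """Per-model total = cross entropy + c3 * other loss (claim 15); the two
    totals are then summed into the "fifth sum" of claim 16 and optimized
    jointly. The coefficient value is an assumption."""
    total1 = ce1 + coeff * other1
    total2 = ce2 + coeff * other2
    return total1 + total2  # fifth sum

# Assumed usage, with both models' parameters in one optimizer:
#   loss = total_losses(ce1, other1, ce2, other2)
#   optimizer.zero_grad(); loss.backward(); optimizer.step()
```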
17. The method of any one of claims 1-3, wherein determining the cross-entropy loss corresponding to an image semantic segmentation model based on the labeled image comprises:
inputting the labeled image into the image semantic segmentation model to obtain a second prediction result corresponding to the labeled image;
and determining the cross-entropy loss corresponding to the image semantic segmentation model according to the second prediction result and the annotation data of the labeled image; wherein the image semantic segmentation model is either of the first image semantic segmentation model and the second image semantic segmentation model.
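Claim 17 is the standard supervised term; a sketch assuming a model that maps images to per-pixel class logits:

```python
import torch
import torch.nn.functional as F

def supervised_ce_loss(model, labeled_image, annotation):
    """labeled_image: (N, 3, H, W); annotation: (N, H, W) long class indices.
    `model` is assumed to return (N, C, H, W) class logits."""
    logits = model(labeled_image)            # the second prediction result
    return F.cross_entropy(logits, annotation)
```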
18. An apparatus for optimizing an image semantic segmentation model, comprising:
an acquiring unit, configured to acquire a labeled image and an unlabeled image;
a first determining unit, configured to determine, based on the labeled image, cross-entropy losses respectively corresponding to a first image semantic segmentation model and a second image semantic segmentation model; wherein the first image semantic segmentation model and the second image semantic segmentation model have the same network structure;
a second determining unit, configured to determine, based on the unlabeled image, other losses respectively corresponding to the first image semantic segmentation model and the second image semantic segmentation model; wherein the other loss comprises at least one of a consistency loss, an adversarial training loss, or a stability loss;
and an optimization unit, configured to optimize the first image semantic segmentation model and the second image semantic segmentation model respectively according to the cross-entropy losses and the other losses corresponding to the two models, and to determine a target image semantic segmentation model based on the optimized first image semantic segmentation model and the optimized second image semantic segmentation model.
19. The apparatus of claim 18, wherein, if the other loss includes a consistency loss, the second determining unit comprises a first determining module, a second determining module, and a third determining module;
the first determining module is configured to input the unlabeled image into an image semantic segmentation model to obtain a first prediction result corresponding to the unlabeled image, and to perform an augmentation transformation on the first prediction result to obtain a first value corresponding to each pixel in the unlabeled image;
the second determining module is configured to perform the augmentation transformation on the unlabeled image and input the transformation result into the image semantic segmentation model to obtain a second value corresponding to each pixel in the unlabeled image;
and the third determining module is configured to determine the consistency loss corresponding to the image semantic segmentation model according to the first value and the second value corresponding to each pixel; wherein the image semantic segmentation model is either of the first image semantic segmentation model and the second image semantic segmentation model.
20. The apparatus of claim 19, wherein the third determining module comprises a first determining submodule and a second determining submodule;
the first determining submodule is configured to determine the mean square error corresponding to each pixel according to the first value and the second value corresponding to that pixel;
and the second determining submodule is configured to determine the consistency loss corresponding to the image semantic segmentation model according to the mean square errors corresponding to the respective pixels.
21. The apparatus of claim 20, wherein,
the second determining submodule is specifically configured to calculate a first sum of the mean square errors corresponding to the respective pixels, and to determine the ratio of the first sum to a third value as the consistency loss corresponding to the image semantic segmentation model; wherein the third value is the product of the width and the height of the unlabeled image.
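Claims 19-21 describe an augmentation-consistency loss; a sketch assuming a single unlabeled image and a pixel-aligned augmentation such as a horizontal flip:

```python
import torch

def consistency_loss(model, unlabeled, augment):
    """unlabeled: (1, 3, H, W); `augment` is assumed differentiable and
    pixel-aligned, e.g. augment = lambda x: torch.flip(x, dims=[-1])."""
    first = augment(model(unlabeled).softmax(dim=1))   # predict, then transform
    second = model(augment(unlabeled)).softmax(dim=1)  # transform, then predict
    mse = ((first - second) ** 2).mean(dim=1)          # per-pixel MSE, (1, H, W)
    _, h, w = mse.shape
    return mse.sum() / (h * w)                         # first sum / (W * H)
```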
22. The apparatus according to any one of claims 18-20, wherein, if the other loss includes an adversarial training loss, the second determining unit further comprises a fourth determining module, a fifth determining module, and a sixth determining module;
the fourth determining module is configured to input the result of a concatenation operation on the labeled image and the annotation data of the labeled image into a discriminator, to obtain a first discrimination result corresponding to the labeled image and first features of the labeled image at each of the M layers of the discriminator;
the fifth determining module is configured to input the result of a concatenation operation on the unlabeled image and a first prediction result, obtained by inputting the unlabeled image into the image semantic segmentation model, into the discriminator, to obtain a second discrimination result corresponding to the unlabeled image and second features of the unlabeled image at each layer;
and the sixth determining module is configured to determine the adversarial training loss corresponding to the image semantic segmentation model according to the first discrimination result, the first features of the labeled image at each layer, the second discrimination result, and the second features of the unlabeled image at each layer; wherein the image semantic segmentation model is either of the first image semantic segmentation model and the second image semantic segmentation model.
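The concatenation ("cascade") operations of claim 22 can be sketched as channel-wise concatenations; the channel layout is an assumption:

```python
import torch

def discriminator_inputs(labeled, annotation_onehot, unlabeled, pred_probs):
    """labeled / unlabeled: (N, 3, H, W) images; annotation_onehot and
    pred_probs: (N, C, H, W) label and prediction maps (assumed layout)."""
    real_pair = torch.cat([labeled, annotation_onehot], dim=1)  # (N, 3+C, H, W)
    fake_pair = torch.cat([unlabeled, pred_probs], dim=1)       # (N, 3+C, H, W)
    return real_pair, fake_pair
```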
23. The apparatus of claim 22, wherein the sixth determining module comprises a third determining submodule, a fourth determining submodule, a fifth determining submodule, and a sixth determining submodule;
the third determining submodule is configured to determine the game loss between the discriminator and the image semantic segmentation model according to the first discrimination result and the second discrimination result;
the fourth determining submodule is configured to determine the self-training loss corresponding to the image semantic segmentation model according to the second discrimination result;
the fifth determining submodule is configured to determine the feature matching loss between the discriminator and the image semantic segmentation model according to the first features of the labeled image at each layer and the second features of the unlabeled image at each layer;
and the sixth determining submodule is configured to determine the adversarial training loss corresponding to the image semantic segmentation model according to the game loss, the self-training loss, and the feature matching loss.
24. The apparatus of claim 23, wherein,
the fourth determining submodule is specifically configured to: if the discrimination result of a target pixel in the second discrimination result is greater than or equal to a preset threshold, determine the self-training loss corresponding to the image semantic segmentation model according to a cross-entropy loss between the discrimination result of the target pixel and the prediction result corresponding to the target pixel in the first prediction result; and if the discrimination results of all pixels in the second discrimination result are smaller than the preset threshold, determine that the self-training loss corresponding to the image semantic segmentation model is 0.
25. The apparatus of claim 23, wherein,
the fifth determining submodule is specifically configured to: determine, for the features of each layer, a feature matching loss between the discriminator and the image semantic segmentation model according to the first feature of the labeled image at that layer and the second feature of the unlabeled image at that layer; calculate a second sum of the feature matching losses over the features of all layers; and determine the second sum as the feature matching loss between the discriminator and the image semantic segmentation model.
26. The apparatus of claim 25, wherein,
the sixth determining submodule is specifically configured to: determine a first coefficient corresponding to the self-training loss and a second coefficient corresponding to the feature matching loss, respectively; and determine a third sum of the product of the self-training loss and the first coefficient, the product of the feature matching loss and the second coefficient, and the game loss as the adversarial training loss corresponding to the image semantic segmentation model.
27. The apparatus according to any one of claims 18-20, wherein, if the other loss includes a stability loss, the second determining unit comprises a seventh determining module and an eighth determining module;
the seventh determining module is configured to determine, for each pixel, the stability loss of the first image semantic segmentation model corresponding to that pixel according to the first-class first value and the first-class second value corresponding to the pixel obtained by the first image semantic segmentation model, and the second-class first value and the second-class second value corresponding to the pixel obtained by the second image semantic segmentation model;
and the eighth determining module is configured to determine the stability loss corresponding to the first image semantic segmentation model according to the stability losses of the first image semantic segmentation model corresponding to the respective pixels.
28. The apparatus of claim 27, wherein the seventh determining module comprises a seventh determining submodule, an eighth determining submodule, and a ninth determining submodule;
the seventh determining submodule is configured to judge, for each pixel, whether the first-class first value and the first-class second value corresponding to the pixel and the second-class first value and the second-class second value corresponding to the pixel satisfy a preset condition; wherein the preset condition includes: the semantic labels of the first values and the second values of both classes are consistent, and at least one of the first values and the second values of both classes is greater than a preset threshold;
the eighth determining submodule is configured to determine, if the preset condition is satisfied, the stability loss of the first image semantic segmentation model corresponding to the pixel according to the first-class second value, the second-class second value, a first consistency loss of the first image semantic segmentation model corresponding to the pixel, and a second consistency loss of the second image semantic segmentation model corresponding to the pixel;
and the ninth determining submodule is configured to determine, if the preset condition is not satisfied, the stability loss of the first image semantic segmentation model corresponding to the pixel according to the first-class second value, the second-class second value, and a fourth value; wherein the fourth value is determined according to the second-class first value and the second-class second value corresponding to the pixel.
29. The apparatus of claim 28, wherein,
the eighth determining submodule is specifically configured to: if the first consistency loss is greater than the second consistency loss, determine the mean square error between the first-class second value and the second-class second value, and determine the mean square error as the stability loss of the first image semantic segmentation model corresponding to the pixel; and if the first consistency loss is less than or equal to the second consistency loss, determine that the stability loss of the first image semantic segmentation model corresponding to the pixel is 0.
30. The apparatus of claim 28, wherein,
the ninth determining submodule is specifically configured to: determine the mean square error between the first-class second value and the second-class second value; and determine the product of the mean square error and the fourth value as the stability loss of the first image semantic segmentation model corresponding to the pixel.
31. The apparatus of claim 27, wherein the eighth determining module comprises a tenth determining submodule and an eleventh determining submodule;
the tenth determining submodule is configured to calculate a fourth sum of the stability losses of the first image semantic segmentation model corresponding to the respective pixels;
and the eleventh determining submodule is configured to determine the ratio of the fourth sum to a third value as the stability loss corresponding to the first image semantic segmentation model; wherein the third value is the product of the width and the height of the unlabeled image.
32. The apparatus of any one of claims 18-20, wherein the optimization unit comprises a first optimization module, a second optimization module, and a third optimization module;
the first optimization module is configured to determine third coefficients respectively corresponding to the other losses of the first image semantic segmentation model and the second image semantic segmentation model;
the second optimization module is configured to sum, for each model, the product of its other loss and the corresponding third coefficient with its cross-entropy loss, to determine the total losses respectively corresponding to the first image semantic segmentation model and the second image semantic segmentation model;
and the third optimization module is configured to optimize the first image semantic segmentation model and the second image semantic segmentation model respectively according to their respective total losses.
33. The apparatus of claim 32, wherein the third optimization module comprises a first optimization submodule and a second optimization submodule;
the first optimization submodule is configured to calculate a fifth sum of the total loss corresponding to the first image semantic segmentation model and the total loss corresponding to the second image semantic segmentation model;
and the second optimization submodule is configured to optimize the first image semantic segmentation model and the second image semantic segmentation model respectively according to the fifth sum.
34. The apparatus according to any one of claims 18-20, wherein the first determining unit comprises a ninth determining module and a tenth determining module;
the ninth determining module is configured to input the labeled image into an image semantic segmentation model to obtain a second prediction result corresponding to the labeled image;
and the tenth determining module is configured to determine the cross-entropy loss corresponding to the image semantic segmentation model according to the second prediction result and the annotation data of the labeled image; wherein the image semantic segmentation model is either of the first image semantic segmentation model and the second image semantic segmentation model.
35. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for optimizing an image semantic segmentation model according to any one of claims 1-17.
36. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method for optimizing an image semantic segmentation model according to any one of claims 1-17.
37. A computer program product comprising a computer program which, when executed by a processor, implements the method for optimizing an image semantic segmentation model according to any one of claims 1-17.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination