CN114821063A - Semantic segmentation model generation method and device and image processing method - Google Patents

Semantic segmentation model generation method and device and image processing method

Info

Publication number
CN114821063A
CN114821063A
Authority
CN
China
Prior art keywords
neural networks
output
neural network
sample data
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210512775.1A
Other languages
Chinese (zh)
Inventor
何悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210512775.1A priority Critical patent/CN114821063A/en
Publication of CN114821063A publication Critical patent/CN114821063A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a semantic segmentation model generation method and device and an image processing method, relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning and computer vision, and can be applied to scenes such as image processing, object detection and segmentation. The specific implementation scheme is as follows: acquiring sample data, wherein the sample data comprises image data which is not marked and image data which is marked; inputting the sample data into two neural networks respectively to obtain output results of the two neural networks, wherein the output results comprise: an output confidence and a label corresponding to the output confidence; updating network parameters of the two neural networks by using the sample data according to the output results of the two neural networks; and taking the label corresponding to the first neural network as a supervision signal for the output confidence of the second neural network, and taking the label corresponding to the second neural network as a supervision signal for the output confidence of the first neural network, wherein the two neural networks comprise the first neural network and the second neural network.

Description

Semantic segmentation model generation method and device and image processing method
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly to the field of deep learning and computer vision technologies, and can be applied to image processing, object detection, segmentation, and other scenes.
Background
Unlike the image classification task, data labeling for the semantic segmentation task is difficult and costly, since a label must be assigned to every pixel of an image, whereas unlabeled RGB data can be obtained easily, for example by capturing images with a camera. Therefore, how to use a large amount of unlabeled data to improve the performance of a semantic segmentation model has become a key research problem in the field of semi-supervised semantic segmentation.
Disclosure of Invention
The disclosure provides a generation method and device of a semantic segmentation model, an image processing method, an electronic device, a computer program product and a storage medium.
According to an aspect of the present disclosure, a method for generating a semantic segmentation model is provided, including: acquiring sample data, wherein the sample data comprises image data which is not marked and image data which is marked; inputting the sample data into two neural networks respectively to obtain output results of the two neural networks, wherein the output results comprise: an output confidence and a label corresponding to the output confidence; updating network parameters of the two neural networks by using the sample data according to the output results of the two neural networks; and taking the label corresponding to the first neural network as a supervision signal for the output confidence of the second neural network, and taking the label corresponding to the second neural network as a supervision signal for the output confidence of the first neural network, wherein the two neural networks comprise the first neural network and the second neural network.
According to another aspect of the present disclosure, there is provided a method of processing an image, including: acquiring an image to be processed; and inputting the image into a semantic segmentation model for classification processing, wherein the semantic segmentation model is generated by training based on the above method for generating a semantic segmentation model.
According to another aspect of the present disclosure, there is also provided an apparatus for generating a semantic segmentation model, including: a first acquisition module configured to acquire sample data, wherein the sample data comprises image data which is not marked and image data which is marked; an input module configured to input the sample data into two neural networks respectively to obtain output results of the two neural networks, wherein the output results comprise: an output confidence and a label corresponding to the output confidence; a first processing module configured to update network parameters of the two neural networks by using the sample data according to the output results of the two neural networks; and a second processing module configured to take the label corresponding to the first neural network as a supervision signal for the output confidence of the second neural network, and take the label corresponding to the second neural network as a supervision signal for the output confidence of the first neural network, wherein the two neural networks comprise the first neural network and the second neural network.
According to another aspect of the present disclosure, there is provided an image processing apparatus including: a second acquisition module configured to acquire an image to be processed; and a classification module configured to input the image into a semantic segmentation model for classification processing, wherein the semantic segmentation model is generated by training based on the above method for generating a semantic segmentation model.
According to another aspect of the present disclosure, there is also provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the above method for generating a semantic segmentation model and the method for processing an image.
According to still another aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the above method of generating a semantic segmentation model and the above method of processing an image.
According to yet another aspect of the present disclosure, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements the above method of generating a semantic segmentation model and method of processing an image.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a method of generating a semantic segmentation model according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a cross-pseudo-supervised based semi-supervised semantic segmentation algorithm according to an embodiment of the present disclosure;
FIG. 3a is a schematic diagram of a neural network, according to an embodiment of the present disclosure;
FIG. 3b is a schematic diagram of a neural network, according to an embodiment of the present disclosure;
FIG. 3c is a schematic diagram of back propagation in accordance with an embodiment of the present disclosure;
FIG. 4 is a flow chart of a method of processing an image according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an apparatus for generating a semantic segmentation model according to an embodiment of the present disclosure;
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure;
fig. 7 is a block diagram of a device for processing an image according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Before explaining the technical scheme provided by the present disclosure, the related work of the next semi-supervised semantic segmentation is explained.
The work on semi-supervised segmentation can be summarized in two categories: self-training and consistency learning. In general, self-training is an offline process, while consistency learning is performed online.
Self-training is mainly divided into 3 steps. In the first step, a model is trained on the labeled data. In the second step, pseudo labels are generated for the unlabeled data set using the pre-trained model. In the third step, a model is retrained using the ground-truth labels of the labeled data set and the pseudo labels of the unlabeled data set.
The core idea of consistency learning is as follows: the model is encouraged to produce similar outputs for the same sample after different transformations. Here, "transformation" includes Gaussian noise, random rotation, change in color, and the like.
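As an illustrative sketch of this idea (the mean-squared-error distance and the augmentation callable `transform` are assumptions for illustration, not taken from the disclosure), a consistency loss can penalize the divergence between a model's outputs for two randomly transformed views of the same image:

```python
import torch.nn.functional as F

def consistency_loss(model, x, transform):
    # Two random transformations of the same sample x (e.g. Gaussian noise,
    # random rotation, color change); the model should respond similarly.
    p1 = model(transform(x)).softmax(dim=1)
    p2 = model(transform(x)).softmax(dim=1)
    return F.mse_loss(p1, p2)
```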
Currently, the most advanced semi-supervised semantic segmentation algorithm is the Cross Pseudo Supervision (CPS) based semi-supervised semantic segmentation algorithm, which has the best performance on the Cityscapes (city street scenes) and PASCAL VOC 2012 data sets. However, it has the problem that, in an actual data scene, if the unsupervised data and the supervised data are not homologous and have low similarity, cross supervision is very likely to bias the model, so that the semi-supervised data produces no positive value.
The present disclosure improves on the above problem by adding a stability constraint to the cross-supervision loss, i.e., when the outputs of the two mutually supervised networks differ greatly, no loss value is calculated. The technical solutions provided by the embodiments of the present disclosure are explained in detail below.
Fig. 1 is a flowchart of a method for generating a semantic segmentation model according to an embodiment of the present disclosure, as shown in fig. 1, the method includes the following steps:
step S102, sample data is obtained, wherein the sample data comprises image data which is not marked and image data which is marked.
In this step, a neural network model is trained using a large number of unlabeled samples and a small number of labeled sample data.
Semi-supervised learning is a learning mode between supervised learning and unsupervised learning. In supervised learning, the class and class label of a sample are known, and the purpose of learning is to find the link between the features of the sample and the class label. Generally speaking, the greater the number of training samples, the higher the accuracy of the trained machine learning model. However, in many practical problems, labeled sample data is very scarce due to the high cost of manually labeling samples. On the other hand, unlabeled data is easy to collect, and its quantity is often hundreds of times that of labeled data. Semi-supervised learning therefore trains a neural network model using a large number of unlabeled samples and a small amount of labeled sample data, alleviating the problem of insufficient labeled samples.
Step S104, inputting the sample data into the two neural networks respectively to obtain output results of the two neural networks, wherein the output results comprise: an output confidence and a label corresponding to the output confidence.
FIG. 2 is a schematic diagram of a cross-pseudo-supervision-based semi-supervised semantic segmentation algorithm according to an embodiment of the present disclosure. As shown in FIG. 2, the CPS algorithm has a very simple design: two neural networks f(θ₁) and f(θ₂) are used in training. Thus, for the same input image X, there are two different outputs P₁ and P₂. By performing certain operations on P₁ and P₂, the corresponding one-hot labels Y₁ and Y₂ are obtained respectively.
When step S104 is executed, the sample data is input to the neural networks f(θ₁) and f(θ₂) respectively; P₁ and P₂ are the output confidences of the two neural networks, and Y₁ and Y₂ are the one-hot labels corresponding to P₁ and P₂ respectively.
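A minimal PyTorch-style sketch of this step follows. Softmax and argmax are assumed here as the "certain operations" that produce the output confidences and one-hot labels, since the disclosure does not pin the operations down; `model_1`, `model_2` and the batch `x` are assumed context.

```python
import torch

# model_1 and model_2 are the two segmentation networks f(θ1) and f(θ2);
# x is a batch of sample images with shape (N, 3, H, W).
logits_1, logits_2 = model_1(x), model_2(x)

# Output confidences P1 and P2: per-pixel class probabilities.
p1 = logits_1.softmax(dim=1)
p2 = logits_2.softmax(dim=1)

# One-hot labels Y1 and Y2 corresponding to the output confidences.
y1 = torch.nn.functional.one_hot(p1.argmax(dim=1), num_classes=p1.shape[1])
y2 = torch.nn.functional.one_hot(p2.argmax(dim=1), num_classes=p2.shape[1])
```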
And step S106, updating the network parameters of the two neural networks by using the sample data according to the output results of the two neural networks.
As noted above, the CPS algorithm is problematic in that, in an actual data scene, if the unsupervised data and the supervised data are not homologous and have low similarity, cross supervision is highly likely to bias the model, so that the semi-supervised data produces no positive value. The embodiment of the present disclosure improves on this problem by adding a stability constraint to the cross-supervision loss: when the outputs of the two mutually supervised networks differ only slightly, the loss value is calculated; when the outputs of the two mutually supervised networks differ greatly, no loss value is calculated. The loss value is used to update the weight parameters of the two neural networks through back propagation.
When step S106 is executed, the loss value is calculated according to the output results of f(θ₁) and f(θ₂) (including the output confidences and the corresponding one-hot labels).
It should be noted that the loss value is calculated by using a loss function, the loss function is used to measure the degree of inconsistency between the predicted value and the true value of the model, and the smaller the calculation result of the loss function is, the better the robustness of the model is.
And S108, taking the label corresponding to the first neural network as a supervision signal of the output confidence coefficient of the second neural network, and taking the label corresponding to the second neural network as a supervision signal of the output confidence coefficient of the first neural network, wherein the two neural networks comprise the first neural network and the second neural network.
Referring to fig. 2, when step S108 is performed, the two pseudo labels are used as supervision signals, similar to the operation in self-training. For example, Y₁ is used as the supervision for P₂, and Y₂ is used as the supervision for P₁.
By the method, stability constraint is added in cross supervision loss, namely whether a loss value is calculated or not is determined according to output results of two networks which are supervised mutually, so that samples which can be stably predicted can be supervised mutually by the two networks, and unsupervised data can better optimize the performance of a semantic segmentation model. And further, the technical effect of improving the precision of image classification in the process of classifying the images by using the semantic segmentation model can be realized.
According to an alternative embodiment of the present disclosure, the step S106 is executed to update the network parameters of the two neural networks by using the sample data according to the output result of the two neural networks, and the method includes the following steps: and if the labels corresponding to the output confidence degrees of the two neural networks are the same and the output confidence degrees of the two neural networks are both larger than a preset confidence degree threshold value, calculating a loss value of an output result by using a preset loss function, wherein the loss value is used for updating the weight parameters of the two neural networks through back propagation.
If the predicted labels of the two mutually supervised neural networks are the same and both output confidences are greater than the set confidence threshold, the output results of the two mutually supervised neural networks differ only slightly. If this condition is met, the loss value of the output result is calculated using the preset loss function, and the training sample is used to update the network parameters.
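A minimal sketch of this condition follows. This is a hedged illustration, not the disclosure's exact implementation: the threshold value 0.7, the per-pixel application of the constraint, and the use of F.cross_entropy are assumptions.

```python
import torch
import torch.nn.functional as F

def cps_loss_with_stability(logits_1, logits_2, conf_threshold=0.7):
    """Cross-pseudo-supervision loss with a stability constraint (sketch).

    logits_1, logits_2: raw outputs of the two networks, shape (N, C, H, W).
    conf_threshold is a placeholder value; the disclosure only speaks of
    "a preset confidence threshold".
    """
    p1, p2 = logits_1.softmax(dim=1), logits_2.softmax(dim=1)
    conf_1, y1 = p1.max(dim=1)  # output confidence and pseudo label of net 1
    conf_2, y2 = p2.max(dim=1)  # output confidence and pseudo label of net 2

    # Stability constraint: keep only positions where the pseudo labels
    # agree and both output confidences exceed the threshold.
    stable = (y1 == y2) & (conf_1 > conf_threshold) & (conf_2 > conf_threshold)
    if not stable.any():
        # Outputs differ too much everywhere: refuse to compute a loss value.
        return logits_1.new_zeros(())

    # Cross supervision: Y2 supervises P1, and Y1 supervises P2.
    loss_1 = F.cross_entropy(logits_1, y2, reduction="none")
    loss_2 = F.cross_entropy(logits_2, y1, reduction="none")
    return ((loss_1 + loss_2) * stable).sum() / stable.sum()
```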
According to another optional embodiment of the present disclosure, if the labels corresponding to the output confidences of the two neural networks are different, or the output confidence of either of the two neural networks is smaller than the preset confidence threshold, the calculation of the loss value of the output result using the preset loss function is refused.
As an alternative embodiment, when the output results of the two neural networks supervised with each other in fig. 2 are different greatly, the loss value of the output result is not calculated, and the weight parameters of the two neural networks cannot be updated through back propagation by using the loss value of the output result. That is, if the output confidence of the two neural networks corresponds to different labels, or the output confidence of any one of the two neural networks is smaller than the preset confidence threshold, it indicates that the difference between the output results of the two neural networks supervised each other is large, in this case, the loss value of the output result is no longer calculated by using the preset loss function, and the training sample will not be used to update the network parameters.
In another alternative embodiment of the present disclosure, the loss value of the output result is calculated using the preset loss function in the following way: inputting the label and the truth label corresponding to the sample data into a cross entropy loss function; and taking the output result of the cross entropy loss function as the loss value.
The cross entropy loss function (Cross Entropy Loss) is the most commonly used loss function in classification. Cross entropy measures the difference between two probability distributions, and here it measures the difference between the distribution learned by the model and the true distribution.
The cross entropy loss function is calculated as follows:
H(P₁, Q₁) = -∑ᵢ P₁(i) log Q₁(i)
where P₁ is the true distribution and Q₁ is the distribution learned by the model.
For example, for a five-class classification problem, given the true distribution probability P₁ = [1, 0, 0, 0, 0] and the model-learned distribution probability Q₁ = [0.4, 0.3, 0.05, 0.05, 0.2], the cross entropy loss formula above gives:
H(P₁, Q₁) = -(1·log 0.4 + 0·log 0.3 + 0·log 0.05 + 0·log 0.05 + 0·log 0.2) = -log 0.4 ≈ 0.916
If the distribution probability learned by the model is closer to the true distribution, e.g. Q₁ = [0.98, 0.01, 0, 0, 0.01], the formula gives:
H(P₁, Q₁) = -(1·log 0.98 + 0·log 0.01 + 0·log 0 + 0·log 0 + 0·log 0.01) = -log 0.98 ≈ 0.02
from the above calculation results, the smaller the calculation result of the cross entropy loss function is, the smaller the difference between the distribution learned by the learning model and the true distribution is.
In this step, the pseudo label output by the model and the true label corresponding to the sample data are input to the calculation formula of the cross entropy loss function for calculation, and the calculation result is used as the loss value of the model output result.
By the method, the loss value of the output result of the model can be accurately calculated by using the cross entropy loss function.
In other alternative embodiments of the present disclosure, calculating the loss value of the output result using the preset loss function may also be implemented by: inputting the label and the truth label corresponding to the sample data into a target loss function, wherein the target loss function is used to measure the similarity of two sample sets; and taking the output result of the target loss function as the loss value.
In this step, the target loss function is a Dice loss function, which is calculated as follows:
Dice loss = 1 - 2|X ∩ Y| / (|X| + |Y|)
where X represents the predicted values, Y represents the true values, and |X ∩ Y| represents the intersection of the two sets. The smaller the calculation result of the Dice loss function, the more similar the predicted values and the true values.
In this step, the intersection of the pseudo label predicted by the model and the truth label corresponding to the sample data is divided by the total number of pixels; all pixels of one category are considered as a whole, and the proportion of the intersection in the total is calculated, so that the influence of a large number of background pixels is avoided and a better segmentation effect can be obtained.
Through the steps, the loss value of the output result of the model is calculated by using the Dice loss function, so that the problem of sample imbalance can be solved, and the segmentation effect of the semantic segmentation model is improved. And further, the technical effect of improving the precision of image classification in the process of classifying the images by using the semantic segmentation model can be realized.
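A hedged sketch of such a Dice loss over a binary foreground mask follows; the smoothing constant eps and the tensor shapes are implementation assumptions, not from the disclosure.

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """Dice loss sketch: 1 - 2|X ∩ Y| / (|X| + |Y|).

    pred:   predicted foreground probabilities, shape (N, H, W)
    target: binary truth mask, shape (N, H, W)
    """
    pred, target = pred.flatten(1), target.flatten(1).float()
    intersection = (pred * target).sum(dim=1)   # |X ∩ Y| per sample
    total = pred.sum(dim=1) + target.sum(dim=1)  # |X| + |Y| per sample
    return (1 - (2 * intersection + eps) / (total + eps)).mean()
```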
In other alternative embodiments of the present disclosure, after the loss value of the output result is calculated using the preset loss function, the loss value is returned to the two neural networks through back propagation; target hidden layers in the two neural networks are determined according to the loss value, where a target hidden layer is a hidden layer that causes an error between the label and the truth label corresponding to the sample data; and the weight parameters of the target hidden layers are reduced to update the weight parameters of the two neural networks.
Fig. 3a is a schematic diagram of a neural network according to an embodiment of the disclosure, and as shown in fig. 3a, the neural network is actually a function, and there are many weight parameters in the function, which are determined by training or are approximated, and back propagation is a method for determining the weight parameters.
FIG. 3b is a schematic diagram of a neural network according to an embodiment of the present disclosure. As shown in FIG. 3b, each training pass produces an output that deviates from the true value, and an error function (i.e., a loss function) is required to describe this error.
Fig. 3c is a schematic diagram of back propagation according to an embodiment of the present disclosure. As shown in fig. 3c, setting aside the structure of the model and problems with the data itself, it is generally assumed that errors are caused by the weight parameters; to adjust the weights, the errors are "propagated" back layer by layer from the output layer so that their cause can be located, hence the name back propagation. In this process, a gradient descent method is usually used, which in fact amounts to partial derivatives and the chain rule from calculus; the error is reduced by finding an extremum of the error function.
As shown in fig. 3c, the errors are returned by back propagation to the hidden layers H21, H22 and H23; if the error between the predicted value ŷ and the true value y is caused by one of these hidden layers, its weight parameters are lowered. If the error between the predicted value ŷ and the true value y is not caused by these hidden layers, the error continues to propagate backward to the hidden layers H11, H12 and H13, until the hidden layer causing the error between the predicted value ŷ and the true value y is found, and the weight parameters of that hidden layer are then lowered.
In this embodiment, the loss value calculated by using the loss function is "propagated" back from the output layers of the two neural networks supervised each other layer by layer through back propagation, a hidden layer causing an error between a pseudo label and a true label is found, and then the weight parameter of the hidden layer is reduced, thereby realizing updating of the weight parameters of the two neural networks.
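Tying the pieces together, a minimal sketch of this update step (reusing the hypothetical model_1, model_2, x and cps_loss_with_stability from the sketches above; the SGD optimizer and learning rate are assumptions):

```python
import torch

# One optimizer over the parameters of both mutually supervised networks.
optimizer = torch.optim.SGD(
    list(model_1.parameters()) + list(model_2.parameters()), lr=0.01)

loss = cps_loss_with_stability(model_1(x), model_2(x))
optimizer.zero_grad()
loss.backward()   # the loss is "propagated" back layer by layer
optimizer.step()  # weight parameters of the responsible layers are adjusted
```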
As an alternative embodiment, the first neural network and the second neural network have the same structure, and the initialization parameters are different.
Referring to FIG. 2, the two mutually supervised neural networks f(θ₁) and f(θ₂) use the same structure, but are initialized with different initialization parameters.
Before the sample data is input into the two neural networks respectively in step S104, the first neural network and the second neural network need to be initialized randomly twice.
In the disclosed embodiment, the kaiming_normal function in the PyTorch framework is used for the two random initializations, without specific constraints on the initialization distribution.
PyTorch is an open source Python machine learning library used for applications such as natural language processing.
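A sketch of such an initialization follows; build_segmentation_net is a hypothetical constructor standing in for the unspecified network architecture, and restricting the init to Conv2d layers is an illustrative choice.

```python
from torch import nn

def init_weights(module):
    # Kaiming-normal initialization for convolution layers; two separate
    # calls draw different random parameters from the default RNG.
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Same structure, two independent random initializations.
model_1 = build_segmentation_net().apply(init_weights)
model_2 = build_segmentation_net().apply(init_weights)
```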
Fig. 4 is a flowchart of a method for processing an image according to an embodiment of the disclosure, as shown in fig. 4, including the following steps:
step S402, an image to be processed is acquired.
Step S404, inputting the image into a semantic segmentation model for classification processing, wherein the semantic segmentation model is generated based on the above generation method training of the semantic segmentation model.
It should be noted that, reference may be made to the description related to the embodiment shown in fig. 1 for implementation of the embodiment shown in fig. 4, and details are not repeated here.
In the technical scheme of the disclosure, the acquisition, storage, application and the like of the personal information of the related user all accord with the regulations of related laws and regulations, and do not violate the good customs of the public order.
Fig. 5 is a block diagram of a device for generating a semantic segmentation model according to an embodiment of the present disclosure, and as shown in fig. 5, the device includes:
the first obtaining module 50 is configured to obtain sample data, where the sample data includes image data that is not labeled and image data that is labeled.
Semi-supervised learning is a learning mode between supervised learning and unsupervised learning. In supervised learning, the class and class label of a sample are known, and the purpose of learning is to find the link between the features of the sample and the class label. Generally speaking, the greater the number of training samples, the higher the accuracy of the trained machine learning model. However, in many practical problems, labeled sample data is very scarce due to the high cost of manually labeling samples. On the other hand, unlabeled data is easy to collect, and its quantity is often hundreds of times that of labeled data. Semi-supervised learning therefore trains a neural network model using a large number of unlabeled samples and a small amount of labeled sample data, alleviating the problem of insufficient labeled samples.
The input module 52 is configured to input the sample data into the two neural networks respectively to obtain output results of the two neural networks, where the output results include: an output confidence and a label corresponding to the output confidence.
Referring to FIG. 2, the CPS algorithm has a very simple design: two neural networks f(θ₁) and f(θ₂) are used in training. Thus, for the same input image X, there are two different outputs P₁ and P₂. By performing certain operations on P₁ and P₂, the corresponding one-hot labels Y₁ and Y₂ are obtained respectively.
When step S104 is executed, the sample data is input to the neural networks f(θ₁) and f(θ₂) respectively; P₁ and P₂ are the output confidences of the two neural networks, and Y₁ and Y₂ are the one-hot labels corresponding to P₁ and P₂ respectively.
The first processing module 54 is configured to update the network parameters of the two neural networks by using the sample data according to the output results of the two neural networks.
As noted above, the CPS algorithm is problematic in that, in an actual data scene, if the unsupervised data and the supervised data are not homologous and have low similarity, cross supervision is highly likely to bias the model, so that the semi-supervised data produces no positive value. The embodiment of the disclosure improves on this problem by adding a stability constraint to the cross-supervision loss: when the outputs of the two mutually supervised networks differ only slightly, the loss value is calculated; when the outputs of the two mutually supervised networks differ greatly, no loss value is calculated. The loss value is used to update the weight parameters of the two neural networks through back propagation.
The first processing module 54 calculates the loss value according to the output results of f(θ₁) and f(θ₂) (including the output confidences and the corresponding one-hot labels).
It should be noted that the loss value is calculated by using a loss function, the loss function is used to measure the degree of inconsistency between the predicted value and the true value of the model, and the smaller the calculation result of the loss function is, the better the robustness of the model is.
And the second processing module 56 is configured to use the label corresponding to the first neural network as the supervisory signal for the output confidence of the second neural network, and use the label corresponding to the second neural network as the supervisory signal for the output confidence of the first neural network, wherein the two neural networks include the first neural network and the second neural network.
The second processing module 56 performs operations similar to those in self-training, using the two pseudo labels as supervision signals. For example, Y₁ is used as the supervision for P₂, and Y₂ is used as the supervision for P₁.
By adding stable constraint in cross supervision loss, namely calculating a loss value according to output results of two networks supervised with each other, samples which can be stably predicted can be supervised with each other by the two networks, so that unsupervised data can better optimize the performance of a semantic segmentation model. And further, the technical effect of improving the precision of image classification in the process of classifying the images by using the semantic segmentation model can be realized.
According to an optional embodiment of the present disclosure, the first processing module 54 is further configured to, when the labels corresponding to the output confidences of the two neural networks are the same and the output confidences of the two neural networks are both greater than the preset confidence threshold, calculate a loss value of the output result by using a preset loss function, where the loss value is used to update the weight parameters of the two neural networks through back propagation.
According to another alternative embodiment of the present disclosure, the above apparatus further comprises: a third processing module configured to refuse to calculate the loss value of the output result using the preset loss function when the labels corresponding to the output confidences of the two neural networks are different, or the output confidence of either of the two neural networks is smaller than the preset confidence threshold.
As an alternative embodiment of the present disclosure, the first processing module 54 includes: a first input unit configured to input the label and the truth label corresponding to the sample data into a cross entropy loss function; and a first processing unit configured to take the output result of the cross entropy loss function as the loss value.
According to another alternative embodiment of the present disclosure, the first processing module 54 further includes: a second input unit configured to input the label and the truth label corresponding to the sample data into a target loss function, wherein the target loss function is used to measure the similarity of two sample sets; and a second processing unit configured to take the output result of the target loss function as the loss value.
In some optional embodiments of the present application, the apparatus further comprises: a fourth processing module configured to return the loss values to the two neural networks by back propagation; the determining module is set to determine target hidden layers in the two neural networks according to the loss value, wherein the target hidden layers are hidden layers causing errors between the labels and true value labels corresponding to the sample data; and the fifth processing module is set to reduce the weight parameters of the target hidden layer so as to update the weight parameters of the two neural networks.
As an alternative embodiment of the present application, the first neural network and the second neural network have the same structure and different initialization parameters; the above-mentioned device still includes: and the initialization module is set to perform random initialization on the first neural network and the second neural network twice before the sample data is respectively input into the two neural networks.
Referring to FIG. 2, the two mutually supervised neural networks f(θ₁) and f(θ₂) use the same structure, but are initialized with different initialization parameters.
In the disclosed embodiment, the kaiming_normal function in the PyTorch framework is used for the two random initializations, without specific constraints on the initialization distribution.
PyTorch is an open source Python machine learning library used for applications such as natural language processing.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM)602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The calculation unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 601 executes the respective methods and processes described above, such as the generation method of the semantic segmentation model. For example, in some embodiments, the generation method of the semantic segmentation model described above may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the generation method of a semantic segmentation model described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g. by means of firmware) to perform the generation method of the semantic segmentation model.
Fig. 7 is a block diagram of a structure of an image processing apparatus according to an embodiment of the present disclosure, as shown in fig. 7, the apparatus including:
a second acquiring module 70 configured to acquire an image to be processed.
The classification module 72 is configured to input the image into a semantic segmentation model for classification, wherein the semantic segmentation model is generated based on the above generation method of the semantic segmentation model.
It should be noted that, reference may be made to the description related to the embodiment shown in fig. 1 for implementation of the embodiment shown in fig. 7, and details are not repeated here.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server with a combined blockchain.
It should be understood that various forms of the flows shown above, reordering, adding or deleting steps, may be used. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A generation method of a semantic segmentation model comprises the following steps:
acquiring sample data, wherein the sample data comprises image data which is not marked and image data which is marked;
inputting the sample data into two neural networks respectively to obtain output results of the two neural networks, wherein the output results comprise: an output confidence and a label corresponding to the output confidence;
updating network parameters of the two neural networks by using the sample data according to output results of the two neural networks;
and taking the label corresponding to the first neural network as a supervision signal of the output confidence degree of a second neural network, and taking the label corresponding to the second neural network as a supervision signal of the output confidence degree of the first neural network, wherein the two neural networks comprise the first neural network and the second neural network.
2. The method of claim 1, wherein updating network parameters of the two neural networks with the sample data according to output results of the two neural networks comprises:
and if the labels corresponding to the output confidence degrees of the two neural networks are the same and the output confidence degrees of the two neural networks are both larger than a preset confidence degree threshold value, calculating a loss value of the output result by using a preset loss function, wherein the loss value is used for updating the weight parameters of the two neural networks through back propagation.
3. The method of claim 2, wherein the method further comprises:
and if the labels corresponding to the output confidence degrees of the two neural networks are different, or the output confidence degree of any one of the two neural networks is smaller than the preset confidence degree threshold value, refusing to use the preset loss function to calculate the loss value of the output result.
4. The method of claim 2, wherein said calculating a loss value of said output result using a preset loss function comprises:
inputting the label and a truth label corresponding to the sample data into a cross entropy loss function;
and taking the output result of the cross entropy loss function as the loss value.
5. The method of claim 2, wherein said calculating a loss value of said output result using a preset loss function further comprises:
inputting the label and a true value label corresponding to the sample data into a target loss function; wherein the target loss function is used for measuring the similarity of two sample sets;
and taking the output result of the target loss function as the loss value.
6. The method according to claim 4 or 5, wherein after calculating the loss value of the output result using a preset loss function, the method further comprises:
returning the loss values to the two neural networks by back propagation;
determining a target hidden layer in the two neural networks according to the loss value, wherein the target hidden layer is a hidden layer which causes an error between the label and a true label corresponding to the sample data;
and reducing the weight parameters of the target hidden layer to update the weight parameters of the two neural networks.
7. The method of claim 1, wherein,
the first neural network and the second neural network have the same structure and different initialization parameters;
before the inputting the sample data into two neural networks respectively, the method further comprises: performing two random initializations of the first neural network and the second neural network.
8. A method of processing an image, comprising:
acquiring an image to be processed;
inputting the image into a semantic segmentation model for classification processing, wherein the semantic segmentation model is generated based on the generation method of the semantic segmentation model as claimed in any one of claims 1 to 7.
9. An apparatus for generating a semantic segmentation model, comprising:
a first acquisition module configured to acquire sample data, wherein the sample data comprises image data which is not marked and image data which is marked;
an input module, configured to input the sample data into two neural networks respectively, to obtain output results of the two neural networks, where the output results include: an output confidence and a label corresponding to the output confidence;
the first processing module is set to update the network parameters of the two neural networks by using the sample data according to the output results of the two neural networks;
and the second processing module is used for taking the label corresponding to the first neural network as a supervision signal of the output confidence of the second neural network and taking the label corresponding to the second neural network as a supervision signal of the output confidence of the first neural network, wherein the two neural networks comprise the first neural network and the second neural network.
10. The apparatus of claim 9, wherein the first processing module is further configured to calculate a loss value of the output result by using a preset loss function when the output confidences of the two neural networks correspond to the same label and are both greater than a preset confidence threshold, wherein the loss value is used for updating the weight parameters of the two neural networks through back propagation.
11. The apparatus of claim 10, wherein the apparatus further comprises:
and the third processing module is configured to refuse to calculate a loss value of the output result by using the preset loss function when the labels corresponding to the output confidence degrees of the two neural networks are different, or the output confidence degree of any one of the two neural networks is smaller than the preset confidence degree threshold value.
12. The apparatus of claim 10, wherein the first processing module comprises:
a first input unit configured to input the label and a true value label corresponding to the sample data into a cross entropy loss function;
a first processing unit arranged to take an output result of the cross entropy loss function as the loss value.
13. The apparatus of claim 10, wherein the first processing module further comprises:
a second input unit configured to input the label and a true label corresponding to the sample data into a target loss function; wherein the target loss function is used for measuring the similarity of two sample sets;
a second processing unit arranged to take an output result of the target loss function as the loss value.
14. The apparatus of claim 12 or 13, wherein the apparatus further comprises:
a fourth processing module configured to return the loss values to the two neural networks by back propagation;
a determining module configured to determine a target hidden layer in the two neural networks according to the loss value, where the target hidden layer is a hidden layer that causes an error between the label and a true label corresponding to the sample data;
and the fifth processing module is used for reducing the weight parameters of the target hidden layer so as to update the weight parameters of the two neural networks.
15. The apparatus of claim 9, wherein,
the first neural network and the second neural network have the same structure and different initialization parameters;
the device further comprises: and the initialization module is used for randomly initializing the first neural network and the second neural network twice before the sample data is respectively input into the two neural networks.
16. An apparatus for processing an image, comprising:
a second acquisition module configured to acquire an image to be processed;
a classification module configured to input the image into a semantic segmentation model for classification, wherein the semantic segmentation model is generated by training based on the generation method of the semantic segmentation model according to any one of claims 1 to 7.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of generating a semantic segmentation model according to any one of claims 1 to 7 and a method of processing an image according to claim 8.
18. A non-transitory computer readable storage medium storing computer instructions for causing the computer to execute the method of generating a semantic segmentation model according to any one of claims 1 to 7 and the method of processing an image according to claim 8.
19. A computer program product comprising a computer program which, when executed by a processor, implements a method of generating a semantic segmentation model according to any one of claims 1 to 7 and a method of processing an image according to claim 8.
CN202210512775.1A 2022-05-11 2022-05-11 Semantic segmentation model generation method and device and image processing method Pending CN114821063A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210512775.1A CN114821063A (en) 2022-05-11 2022-05-11 Semantic segmentation model generation method and device and image processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210512775.1A CN114821063A (en) 2022-05-11 2022-05-11 Semantic segmentation model generation method and device and image processing method

Publications (1)

Publication Number Publication Date
CN114821063A true CN114821063A (en) 2022-07-29

Family

ID=82512912

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210512775.1A Pending CN114821063A (en) 2022-05-11 2022-05-11 Semantic segmentation model generation method and device and image processing method

Country Status (1)

Country Link
CN (1) CN114821063A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049836A (en) * 2022-08-16 2022-09-13 平安科技(深圳)有限公司 Image segmentation method, device, equipment and storage medium
CN117408974A (en) * 2023-10-26 2024-01-16 广州欧派集成家居有限公司 Automatic detection method, device and storage medium for edge sealing of plate-type integrated furniture
CN117408974B (en) * 2023-10-26 2024-04-26 广州欧派集成家居有限公司 Automatic detection method, device and storage medium for edge sealing of plate-type integrated furniture

Similar Documents

Publication Publication Date Title
CN113553864A (en) Translation model training method and device, electronic equipment and storage medium
CN114821063A (en) Semantic segmentation model generation method and device and image processing method
CN113837308B (en) Knowledge distillation-based model training method and device and electronic equipment
CN113870334B (en) Depth detection method, device, equipment and storage medium
CN113361578A (en) Training method and device of image processing model, electronic equipment and storage medium
CN114494784A (en) Deep learning model training method, image processing method and object recognition method
CN113705628B (en) Determination method and device of pre-training model, electronic equipment and storage medium
CN112949767A (en) Sample image increment, image detection model training and image detection method
CN114419035B (en) Product identification method, model training device and electronic equipment
CN112966744A (en) Model training method, image processing method, device and electronic equipment
CN114187459A (en) Training method and device of target detection model, electronic equipment and storage medium
CN116152833B (en) Training method of form restoration model based on image and form restoration method
CN114549840A (en) Training method of semantic segmentation model and semantic segmentation method and device
CN112580733A (en) Method, device and equipment for training classification model and storage medium
CN115796310A (en) Information recommendation method, information recommendation device, information recommendation model training device, information recommendation equipment and storage medium
CN113869205A (en) Object detection method and device, electronic equipment and storage medium
CN113901998A (en) Model training method, device, equipment, storage medium and detection method
CN115631381A (en) Classification model training method, image classification device and electronic equipment
CN116342164A (en) Target user group positioning method and device, electronic equipment and storage medium
CN115482443A (en) Image feature fusion and model training method, device, equipment and storage medium
CN115861809A (en) Rod detection and training method and device for model thereof, electronic equipment and medium
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN115359322A (en) Target detection model training method, device, equipment and storage medium
CN113806541A (en) Emotion classification method and emotion classification model training method and device
CN114399513A (en) Method and device for training image segmentation model and image segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination