CN117974707A - Training method of image segmentation model, image segmentation method and device
- Publication number: CN117974707A
- Application number: CN202311086209.XA
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06T7/194: Segmentation; Edge detection involving foreground-background segmentation
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/0475: Generative networks
- G06N3/08: Learning methods
- G06N3/094: Adversarial learning
- G06T2207/20081: Training; Learning
- G06T2207/20084: Artificial neural networks [ANN]
Abstract
The application relates to a training method of an image segmentation model, an image segmentation method and an image segmentation device. The training method of the image segmentation model comprises the following steps: inputting the training image into an image segmentation model to be trained to obtain a first image feature and a foreground segmentation predicted image of the training image; obtaining a first loss based on the foreground segmentation label image and the foreground segmentation prediction image of the training image, and obtaining a second loss based on the first image feature and the target image feature; and adjusting model parameters of the image segmentation model based on the first loss and the second loss until convergence conditions are met, so as to obtain the target image segmentation model. By adopting the method, the image segmentation accuracy can be improved.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a training method for an image segmentation model, an image segmentation method and an image segmentation device.
Background
With the development of image processing technology, image segmentation techniques have emerged. Image segmentation refers to dividing an image into a plurality of specific regions having unique properties and presenting an object of interest. One classical task of image segmentation is image foreground segmentation, i.e., distinguishing the foreground from the background in an image. In the related art, images are usually segmented by a trained artificial intelligence model, and the artificial intelligence model is trained for image segmentation on a large number of annotated images. However, image segmentation training based only on the annotated data set cannot effectively guarantee the image segmentation accuracy of the model.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a training method of an image segmentation model, an image segmentation method, and corresponding apparatuses that can improve the accuracy of image segmentation.
The application provides a training method of an image segmentation model. The method comprises the following steps:
inputting a training image into an image segmentation model to be trained to obtain a first image feature and a foreground segmentation predicted image of the training image;
obtaining a first loss based on the foreground segmentation label image and the foreground segmentation predicted image of the training image, and obtaining a second loss based on the first image feature and the target image feature;
And adjusting model parameters of the image segmentation model based on the first loss and the second loss until convergence conditions are met, so as to obtain a target image segmentation model.
The application provides a training device for an image segmentation model. The device comprises:
the model processing module is used for inputting a training image into an image segmentation model to be trained to obtain a first image feature and a foreground segmentation predicted image of the training image;
The loss determination module is used for obtaining a first loss based on the foreground segmentation label image and the foreground segmentation prediction image of the training image and obtaining a second loss based on the first image characteristic and the target image characteristic;
And the model adjustment module is used for adjusting the model parameters of the image segmentation model based on the first loss and the second loss until convergence conditions are met, so as to obtain a target image segmentation model.
The application provides a computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the training method of the image segmentation model when executing the computer program.
The present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the training method of an image segmentation model described above.
The present application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the training method of an image segmentation model described above.
The application provides an image segmentation method. The method comprises the following steps:
Acquiring an image to be segmented;
Inputting the image to be segmented into a target image segmentation model to obtain a foreground segmentation image of the image to be segmented; the target image segmentation model is obtained through training by a training method of the image segmentation model.
The application provides an image segmentation device. The device comprises:
the image acquisition module is used for acquiring an image to be segmented;
The image segmentation module is used for inputting the image to be segmented into a target image segmentation model to obtain a foreground segmentation image of the image to be segmented; the target image segmentation model is obtained through training by a training method of the image segmentation model.
The application provides a computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the image segmentation method when executing the computer program.
The present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above-described image segmentation method.
The present application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the image segmentation method described above.
In the embodiment of the application, a training image is input into an image segmentation model to be trained to obtain a first image feature and a foreground segmentation predicted image of the training image; obtaining a first loss based on the foreground segmentation label image and the foreground segmentation prediction image of the training image, and obtaining a second loss based on the first image feature and the target image feature; and adjusting model parameters of the image segmentation model based on the first loss and the second loss until convergence conditions are met, so as to obtain the target image segmentation model. Acquiring an image to be segmented; and inputting the image to be segmented into a target image segmentation model to obtain a foreground segmentation image of the image to be segmented. In this way, the foreground segmentation predicted image is a foreground segmentation image predicted by the model, the foreground segmentation label image is an accurate foreground segmentation image, the first loss is obtained based on the foreground segmentation predicted image and the foreground segmentation label image of the training image, and the model parameter is adjusted based on the first loss to help the image segmentation model output the foreground segmentation predicted image close to the foreground segmentation label image. The first image feature is an image feature obtained by data processing of the input image by the model, the target image feature is an expected image feature serving as a reference, the second loss is obtained based on the first image feature and the target image feature, and the model parameter is adjusted based on the second loss so as to help improve the image feature processing capacity of the image segmentation model for the input image. Finally, model parameters are adjusted based on the first loss and the second loss, so that model training quality can be improved, an image segmentation model can extract more accurate image features of an input image, the image segmentation model outputs an accurate foreground segmentation image based on the extracted image features, and the target image segmentation model obtained through final training is guaranteed to have higher image segmentation accuracy. The image to be segmented is input into a target image segmentation model, and the target image segmentation model can output an accurate foreground segmentation image.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments are briefly described below. It is obvious that the drawings in the following description are only some embodiments described in the present specification, and that other drawings may be obtained from these drawings without inventive effort by a person skilled in the art.
FIG. 1 is a diagram of a training method of an image segmentation model and an application environment of the image segmentation method in one embodiment;
FIG. 2 is a flow chart of a training method of an image segmentation model in one embodiment;
FIG. 3 is a schematic diagram of an image segmentation model in one embodiment;
FIG. 4 is a schematic diagram of using a StyleGAN model to guide training of a Unet model in one embodiment;
FIG. 5 is a schematic diagram of model training for Unet and StyleGAN models in one embodiment;
FIG. 6 is a flow chart of an image segmentation method in one embodiment;
FIG. 7 is a schematic diagram of an image segmentation method based on Unet models and StyleGAN models in one embodiment;
FIG. 8 is a block diagram of a training apparatus for an image segmentation model in one embodiment;
FIG. 9 is a block diagram showing the structure of an image segmentation apparatus in one embodiment;
FIG. 10 is an internal block diagram of a computer device in one embodiment;
FIG. 11 is an internal structure diagram of a computer device in another embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The training method of the image segmentation model and the image segmentation method provided by the embodiments of the application can be applied to an application environment shown in fig. 1. The terminal 102 communicates with the server 104 via a network. A data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104, or may be located on the cloud or on another server. The terminal 102 may be, but is not limited to, various desktop computers, notebook computers, smart phones, tablet computers, Internet of Things devices, and portable wearable devices, where the Internet of Things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle-mounted devices, and the like. The portable wearable device may be a smart watch, a smart bracelet, a headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster or cloud server composed of a plurality of servers.
The terminal and the server can each be used independently to execute the training method of the image segmentation model and the image segmentation method provided in the embodiments of the present application.
For example, the server inputs the training image into an image segmentation model to be trained, resulting in a first image feature and a foreground segmentation prediction image of the training image. The server obtains a first loss based on the foreground segmentation label image and the foreground segmentation prediction image of the training image, and obtains a second loss based on the first image feature and the target image feature. And the server adjusts model parameters of the image segmentation model based on the first loss and the second loss until convergence conditions are met, so as to obtain the target image segmentation model.
The server acquires an image to be segmented, and inputs the image to be segmented into a target image segmentation model to obtain a foreground segmentation image of the image to be segmented.
The terminal and the server may also cooperate to perform the training method and the image segmentation method of the image segmentation model provided in the embodiments of the present application.
For example, the server acquires a training image and a foreground segmentation label image of the training image from the terminal. And the server inputs the training image into an image segmentation model to be trained to obtain a first image feature and a foreground segmentation predicted image of the training image. The server obtains a first loss based on the foreground segmentation label image and the foreground segmentation prediction image of the training image, and obtains a second loss based on the first image feature and the target image feature. And the server adjusts model parameters of the image segmentation model based on the first loss and the second loss until convergence conditions are met, so as to obtain the target image segmentation model.
Subsequently, the server can help the terminal to carry out image foreground segmentation based on the target image segmentation model, and an image foreground segmentation result is returned to the terminal. The terminal sends the image to be segmented to the server, the server inputs the image to be segmented into the target image segmentation model to obtain a foreground segmentation image of the image to be segmented, and the server returns the foreground segmentation image of the image to be segmented to the terminal.
The server may send the target image segmentation model to the terminal to cause the terminal to perform image foreground segmentation based on the target image segmentation model. The terminal acquires an image to be segmented, and inputs the image to be segmented into a target image segmentation model to obtain a foreground segmentation image of the image to be segmented. Subsequently, the terminal can display the foreground segmented image of the image to be segmented, and can also perform further data processing on the foreground segmented image of the image to be segmented.
In the related art, an image segmentation model is usually directly subjected to supervised training based on an annotated data set, so that the image segmentation model acquires image segmentation capability. However, when model training is performed solely on the annotated data set, the image segmentation accuracy of the finally trained model cannot be guaranteed, and the finally trained model suffers from low image segmentation fineness.
According to the training method of the image segmentation model provided by the embodiment of the application, when the model is trained, the model parameters of the image segmentation model are adjusted based on the first loss and the second loss, so that the training quality of the image segmentation model is improved and the image segmentation accuracy of the finally trained target image segmentation model is improved. The first loss is obtained based on the foreground segmentation label image of the training image and the foreground segmentation predicted image obtained by inputting the training image into the image segmentation model; the second loss is obtained based on the target image feature and the first image feature obtained by inputting the training image into the image segmentation model. The first loss helps the image segmentation model output a foreground segmentation predicted image close to the foreground segmentation label image, and the second loss helps improve the image feature processing capability of the image segmentation model for the input image. By adjusting the model parameters of the image segmentation model based on both losses, the finally trained target image segmentation model can extract more accurate image features of the input image and output an accurate and fine foreground segmentation image based on the extracted image features, thereby ensuring the image segmentation accuracy.
In one embodiment, as shown in fig. 2, a training method of an image segmentation model is provided, and the method is applied to a computer device for illustration, and the computer device may be a terminal or a server. The method can be independently executed by the terminal or the server, and can also be realized through interaction between the terminal and the server. Referring to fig. 2, the training method of the image segmentation model includes the steps of:
step S202, inputting a training image into an image segmentation model to be trained to obtain a first image feature and a foreground segmentation predicted image of the training image.
The training image is an image with known foreground segmentation results and is used for model training. It will be appreciated that the training image is an image comprising a foreground and a background, i.e. the training image is an image before foreground segmentation. The training image has a corresponding foreground segmentation label image, which is used to represent the foreground segmentation result that is correct for the training image. The foreground segmentation label image is an image that correctly distinguishes between foreground and background, i.e., the foreground segmentation label image is a correctly foreground segmented image. The foreground segmented label image is typically obtained by pre-labeling, e.g., by manual labeling.
The image segmentation model is an artificial intelligence model for image foreground segmentation, i.e. for foreground segmentation of an input image. The input data of the image segmentation model is an image, the final output data is a foreground segmentation predicted image, that is, the input data of the image segmentation model is an image to be segmented, and the final output data is an image after foreground segmentation. It will be appreciated that the foreground segmented image is an image that characterizes the result of foreground segmentation, e.g., in the foreground segmented image, the pixel values of the pixels in the background region are a preset value (e.g., 0); in the foreground segmentation image, the pixel value of the pixel in the foreground area is a first preset value, and the pixel value of the pixel in the background area is a second preset value; etc. The foreground segmentation label image is an image that characterizes the correct foreground segmentation result. The foreground segmentation prediction image is an image representing the foreground segmentation result of the model prediction. And inputting the image into an image segmentation model, extracting the characteristics of the input image by the image segmentation model to obtain first image characteristics corresponding to the input image, and classifying the characteristics of the first image characteristics corresponding to the input image by the image segmentation model to obtain a foreground segmentation predicted image corresponding to the input image.
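As an illustrative sketch only (not part of the claimed method), the following PyTorch snippet shows how a predicted foreground probability map could be converted into a foreground segmentation image of the kind described above; the threshold of 0.5 and the preset values 255 and 0 are assumptions made for the example.

```python
import torch

def to_foreground_segmentation_image(prob_map: torch.Tensor,
                                     foreground_value: float = 255.0,
                                     background_value: float = 0.0,
                                     threshold: float = 0.5) -> torch.Tensor:
    """Turn a predicted foreground probability map into a segmentation image in
    which foreground pixels take one preset value and background pixels another."""
    mask = prob_map > threshold
    return torch.where(mask,
                       torch.full_like(prob_map, foreground_value),
                       torch.full_like(prob_map, background_value))
```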
Inputting the training image into an image segmentation model to be trained, extracting features of the training image by the image segmentation model to be trained to obtain first image features of the training image, and classifying the features of the first image features of the training image by the image segmentation model to be trained to obtain a foreground segmentation predicted image of the training image.
Specifically, the computer device may obtain the training image locally or from other devices, the computer device inputs the training image into an image segmentation model to be trained, the image segmentation model to be trained obtains the first image feature and the foreground segmentation predicted image of the training image by performing data processing on the training image, and the image segmentation model to be trained outputs the first image feature and the foreground segmentation predicted image of the training image.
Step S204, a first loss is obtained based on the foreground segmentation label image and the foreground segmentation prediction image of the training image, and a second loss is obtained based on the first image feature and the target image feature.
Wherein the target image feature is a desired image feature serving as a reference. The target image feature may be an image feature associated with the training image. For example, the training image is input into the image segmentation model to be trained to obtain the first image feature of the training image, and the first image feature of the training image is input into a pre-trained image generation model to obtain the target image feature; since the pre-trained image generation model has accurate feature processing capability, the target image feature is a more accurate image feature for the training image. The target image feature may also be an image feature associated with the foreground segmentation label image of the training image. For example, the identity feature of the foreground segmentation label image is extracted to obtain the target image feature, and the target image feature is then the identity feature of the correct foreground segmentation result.
Specifically, the foreground segmentation label image of the training image is used to characterize the correct foreground segmentation result for the training image, the foreground segmentation prediction image of the training image is used to characterize the foreground segmentation result for the model prediction for the training image, the training goal of the model is to hope that the foreground segmentation prediction image is close to the correct foreground segmentation label image, so the first loss is calculated based on the foreground segmentation label image and the foreground segmentation prediction image of the training image, and the first loss can reflect the difference between the foreground segmentation label image and the foreground segmentation prediction image of the training image. For example, calculating a mean square error between a foreground segmentation label image and a foreground segmentation prediction image of a training image results in a first loss; calculating an average absolute value error between a foreground segmentation label image and a foreground segmentation predicted image of the training image to obtain a first loss; calculating root mean square error between a foreground segmentation label image and a foreground segmentation predicted image of the training image to obtain first loss; etc.
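A minimal sketch of the first loss described above, assuming PyTorch and assuming the mean square error variant is chosen (the mean absolute error and root mean square error variants are shown for comparison); tensor names are illustrative.

```python
import torch
import torch.nn.functional as F

def first_loss(foreground_pred: torch.Tensor,
               foreground_label: torch.Tensor,
               kind: str = "mse") -> torch.Tensor:
    """Difference between the foreground segmentation predicted image and the
    foreground segmentation label image of the training image."""
    if kind == "mse":    # mean square error
        return F.mse_loss(foreground_pred, foreground_label)
    if kind == "mae":    # mean absolute value error
        return F.l1_loss(foreground_pred, foreground_label)
    if kind == "rmse":   # root mean square error
        return torch.sqrt(F.mse_loss(foreground_pred, foreground_label))
    raise ValueError(f"unknown loss kind: {kind}")
```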
The target image feature is a desired, reference image feature, the first image feature is an image feature obtained by data processing of the training image by the model, and the second loss is calculated based on the first image feature and the target image feature. For example, feature processing is performed on the first image feature to obtain a second image feature, and a second loss is calculated based on the second image feature and the target image feature, where the second loss may reflect a difference between the second image feature and the target image feature.
Step S206, adjusting model parameters of the image segmentation model based on the first loss and the second loss until convergence conditions are met, so as to obtain the target image segmentation model.
The target image segmentation model refers to an image segmentation model which completes training. Training the image segmentation model to be trained to obtain a target image segmentation model. The target image segmentation model is used for image foreground segmentation, i.e. the target image segmentation model is used for foreground segmentation of the input image.
The convergence condition refers to a condition for judging whether the model reaches convergence, and the convergence condition includes, but is not limited to, at least one of model loss being smaller than a preset loss value, model iteration number being greater than a preset iteration number, or change rate of model loss being smaller than a preset change rate.
In particular, the computer device may derive a model loss based on the first loss and the second loss; for example, the sum of the first loss and the second loss is calculated as the model loss, or the first loss and the second loss are weighted and summed to obtain the model loss, where the loss weights corresponding to the first loss and the second loss can be set according to actual needs; etc. Furthermore, the computer device can back-propagate the model loss to adjust the model parameters of the image segmentation model to be trained, and the target image segmentation model is obtained through repeated model iterations until the convergence condition is met. For example, the model parameters of the image segmentation model may be adjusted based on the model loss by a gradient descent algorithm.
It will be appreciated that the training process of the model is a process of continuously perfecting and optimizing the model parameters through repeated iterations. The computer device adjusts the model parameters of the image segmentation model based on the first loss and the second loss to obtain an intermediate image segmentation model, uses the intermediate image segmentation model as the new image segmentation model to be trained, and returns to the step of acquiring a training image and inputting it into the image segmentation model to be trained for iterative training; the target image segmentation model is obtained through multiple model iterations until the convergence condition is met. It will be appreciated that a plurality of training images may be acquired at a time.
For example, if the preset iteration number is 50, the intermediate image segmentation model obtained by the 51st adjustment is taken as the target image segmentation model.
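The iterative adjustment described above might be realized as in the following hedged sketch. The optimizer (Adam), learning rate, loss weights and convergence thresholds are assumptions; `model` is assumed to return the first image feature, the second-network layer features and the foreground segmentation predicted image, and `first_loss_fn`, `second_loss_fn` and `target_feature_fn` are placeholders for components discussed elsewhere in this description.

```python
import torch

def train(model, dataloader, first_loss_fn, second_loss_fn, target_feature_fn,
          w1=1.0, w2=1.0, lr=1e-4, max_iters=50, loss_eps=1e-4):
    """Adjust model parameters based on a weighted sum of the first and second
    losses until a convergence condition is met."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    prev_loss = None
    for it, (image, label_image) in enumerate(dataloader, start=1):
        first_feat, decoder_feats, pred = model(image)    # assumed model outputs
        target_feats = target_feature_fn(first_feat)      # reference features, e.g. from a frozen generator
        loss = w1 * first_loss_fn(pred, label_image) + w2 * second_loss_fn(decoder_feats, target_feats)
        optimizer.zero_grad()
        loss.backward()     # back-propagate the model loss
        optimizer.step()    # gradient-descent style parameter update
        # convergence: loss small enough, loss change small enough, or iteration budget reached
        if loss.item() < loss_eps or it >= max_iters or \
                (prev_loss is not None and abs(prev_loss - loss.item()) < loss_eps):
            break
        prev_loss = loss.item()
    return model
```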
In one embodiment, different image segmentation models may be trained for different scenes. For example, for a portrait segmentation scene, a pre-segmentation portrait image with a known portrait segmentation result is obtained as a training image, and an image segmentation model to be trained is trained based on the training image, a foreground segmentation label image of the training image and a target image characteristic, so as to obtain a target image segmentation model for portrait segmentation. Aiming at a medical image segmentation scene, acquiring a medical image with a known medical segmentation result as a training image, and training an image segmentation model to be trained based on the training image, a foreground segmentation label image of the training image and a target image characteristic to obtain a target image segmentation model for medical image segmentation. Medical image segmentation refers to the segmentation of a target object from a medical image, e.g. the segmentation of a specific organ site from a medical image.
In the above training method of the image segmentation model, the foreground segmentation predicted image is the foreground segmentation image predicted by the model, and the foreground segmentation label image is an accurate foreground segmentation image. The first loss is obtained based on the foreground segmentation predicted image and the foreground segmentation label image of the training image, and adjusting the model parameters based on the first loss helps the image segmentation model output a foreground segmentation predicted image close to the foreground segmentation label image. The first image feature is an image feature obtained by the model through data processing of the input image, and the target image feature is an expected image feature serving as a reference. The second loss is obtained based on the first image feature and the target image feature, and adjusting the model parameters based on the second loss helps improve the image feature processing capability of the image segmentation model for the input image. Finally, adjusting the model parameters based on both the first loss and the second loss improves the model training quality, enables the image segmentation model to extract more accurate image features of the input image and to output an accurate foreground segmentation image based on the extracted image features, and thus ensures that the finally trained target image segmentation model has higher image segmentation accuracy.
In one embodiment, the image segmentation model comprises a first network and a second network, the first network comprises a plurality of first network layers which are connected in sequence, the second network comprises a plurality of second network layers and output layers which are connected in sequence, the plurality of first network layers and the plurality of second network layers are in one-to-one correspondence, and the last first network layer in the first network is connected with the first second network layer in the second network.
Inputting a training image into an image segmentation model to be trained to obtain a first image feature and a foreground segmentation predicted image of the training image, comprising:
inputting the training image into the first network, and obtaining the data output by the last first network layer to obtain the first image feature; and after the training image sequentially passes through the first network and the second network, obtaining the data output by the output layer to obtain the foreground segmentation predicted image.
Wherein the image segmentation model comprises a first network and a second network. The first network is used for extracting image features of the model input image. The input data of the first network is a model input image, and the output data of the first network is the image characteristics of the model input image. The second network is used to generate a foreground segmented image of the model input image. The input data of the second network is the output data of the first network, and the output data of the second network is the foreground segmentation predicted image of the model input image.
The first network comprises a plurality of first network layers connected in sequence. The sequentially connected first network layers are used for gradually extracting the image features of the input image, and the later a first network layer is, the richer the image information contained in the image features it extracts. The second network comprises a plurality of second network layers and an output layer connected in sequence. That is, the second network includes a plurality of second network layers connected in sequence, and the last second network layer in the second network is connected to the output layer. The sequentially connected second network layers are used for gradually classifying the features, and the later a second network layer is, the finer the image information contained in the image features it outputs. The output layer is used for outputting the foreground segmentation predicted image.
Further, the plurality of first network layers and the plurality of second network layers are in one-to-one correspondence, that is, one first network layer corresponds to one second network layer. The output data of the first network layer may be input to the corresponding second network layer in addition to the next network layer.
Specifically, the computer device inputs the training image into the image segmentation model to be trained. In the image segmentation model, the training image is input into the first network for feature extraction; each first network layer in the first network outputs image features, and the data output by the last first network layer in the first network is used as the first image feature of the training image. The output data of each first network layer in the first network is also input into the second network: the input data of each second network layer in the second network comprises the output data of the previous network layer and the output data of the corresponding first network layer, and the output data of each second network layer is input into the next network layer. The output data of the last second network layer in the second network is input into the output layer, and the output layer outputs the foreground segmentation predicted image.
In one embodiment, a first network layer in the first network is a convolution layer for performing feature extraction, a second network layer in the second network is a deconvolution layer for performing feature classification, and an output layer in the second network is a convolution layer for outputting the foreground segmentation prediction image.
In one embodiment, referring to fig. 3, the image segmentation model includes a first network and a second network. The first network comprises 4 convolution layers, namely convolution layer 1, convolution layer 2, convolution layer 3 and convolution layer 4, each of which is a first network layer. The first network is used for extracting image features of the input image. The convolution layers in the first network are used for performing convolution operations, activation operations and downsampling operations. The convolution operation is used to extract image features, the activation operation may be implemented by an activation function, for example a linear rectification function (ReLU), and the downsampling operation is used to reduce the size of the features. The second network comprises 5 convolution layers, namely convolution layer 5, convolution layer 6, convolution layer 7, convolution layer 8 and convolution layer 9, where convolution layers 5 to 8 are second network layers and convolution layer 9 is the output layer. The second network is used to convert the extracted image features into an output image. The second network layers are used for upsampling operations, splicing operations, convolution operations and activation operations, and the output layer is used for converting the output features of convolution layer 8 into an image. The upsampling operation is used to enlarge the features and restore their size. The splicing operation is used for splicing the upsampled features with the output features of the corresponding convolution layer in the first network. Adding cross-layer connections between the first network and the second network enables the image segmentation model to obtain higher-resolution information, thereby enabling accurate foreground segmentation while maintaining low errors. Convolution layer 1 corresponds to convolution layer 8, convolution layer 2 corresponds to convolution layer 7, convolution layer 3 corresponds to convolution layer 6, and convolution layer 4 corresponds to convolution layer 5. In the first network, features of the image are gradually extracted through multiple convolution and downsampling operations; in the second network, the size of the image is gradually restored through multiple upsampling and convolution operations, and pixel-level classification is performed.
An image is input into the image segmentation model, and the image segmentation model finally outputs a foreground segmentation predicted image corresponding to the image. In the image segmentation model, the model input image is input into the first network, and convolution layer 4 in the first network outputs the first image feature corresponding to the input image. The output data of convolution layer 4 is input into convolution layer 5 in the second network, the output data of each convolution layer in the first network is input into the corresponding convolution layer in the second network, and after data processing by the second network, convolution layer 9 finally outputs the foreground segmentation predicted image.
In the above embodiment, the image segmentation model includes a first network and a second network, where the first network may gradually extract image features corresponding to the input image, so as to help to improve accuracy of image segmentation, and the first network layer in the first network corresponds to the second network layer in the second network one by one, so as to help to further improve accuracy of image segmentation. The training image is input into an image segmentation model, the image segmentation model comprises a first network and a second network, the first network can gradually extract image features corresponding to the input image, output data of the last first network layer in the first network is obtained to serve as first image features of the training image, the first image features can be guaranteed to have enough image information, and subsequent data processing is facilitated. The output layer in the second network ultimately outputs the foreground segmented prediction image of the training image.
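For illustration, a compact PyTorch sketch of a model with the structure of FIG. 3: an encoder ("first network") of four convolution layers with downsampling and a decoder ("second network") of four convolution layers with upsampling, splicing and a final output convolution. The channel widths, max pooling, bilinear upsampling, sigmoid output and the exact points at which encoder outputs are spliced into the decoder are assumptions of the sketch, not limitations of the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentationModel(nn.Module):
    """Encoder ("first network") plus decoder with cross-layer connections ("second network")."""

    def __init__(self, in_ch=3, out_ch=1, widths=(32, 64, 128, 256)):
        super().__init__()
        c1, c2, c3, c4 = widths
        # first network layers: convolution (ReLU and downsampling are applied in forward)
        self.enc1 = nn.Conv2d(in_ch, c1, 3, padding=1)
        self.enc2 = nn.Conv2d(c1, c2, 3, padding=1)
        self.enc3 = nn.Conv2d(c2, c3, 3, padding=1)
        self.enc4 = nn.Conv2d(c3, c4, 3, padding=1)
        # second network layers: convolution applied after upsampling and splicing
        self.dec5 = nn.Conv2d(c4 + c4, c3, 3, padding=1)
        self.dec6 = nn.Conv2d(c3 + c3, c2, 3, padding=1)
        self.dec7 = nn.Conv2d(c2 + c2, c1, 3, padding=1)
        self.dec8 = nn.Conv2d(c1 + c1, c1, 3, padding=1)
        self.out9 = nn.Conv2d(c1, out_ch, 1)   # output layer

    @staticmethod
    def _up_cat(x, skip):
        # upsample to the skip feature's size, then splice along the channel dimension
        x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        return torch.cat([x, skip], dim=1)

    def forward(self, x):
        s1 = F.relu(self.enc1(x));  p1 = F.max_pool2d(s1, 2)
        s2 = F.relu(self.enc2(p1)); p2 = F.max_pool2d(s2, 2)
        s3 = F.relu(self.enc3(p2)); p3 = F.max_pool2d(s3, 2)
        s4 = F.relu(self.enc4(p3)); p4 = F.max_pool2d(s4, 2)   # first image feature
        d5 = F.relu(self.dec5(self._up_cat(p4, s4)))
        d6 = F.relu(self.dec6(self._up_cat(d5, s3)))
        d7 = F.relu(self.dec7(self._up_cat(d6, s2)))
        d8 = F.relu(self.dec8(self._up_cat(d7, s1)))
        pred = torch.sigmoid(self.out9(d8))                    # foreground segmentation predicted image
        return p4, [d5, d6, d7, d8], pred
```

In this sketch, with a 3x256x256 input, the first image feature `p4` has shape 256x16x16 and the predicted image has shape 1x256x256.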
In one embodiment, the training method of the image segmentation model further comprises:
Inputting the first image features into a mapping network in a target image generation model obtained through pre-training to perform feature mapping to obtain intermediate image features; inputting the intermediate image features into a generating network in the target image generating model, and acquiring the image features output by the generating network to obtain target image features.
Wherein the image generation model is an artificial intelligence model for generating an image. The input data of the image generation model is the image characteristics, and the final output data is the image. The target image generation model refers to a pre-trained image generation model.
The image generation model includes a mapping network and a generation network. The mapping network is used for feature mapping. Feature mapping refers to mapping image features into another feature space so as to decouple them. It can be understood that the first image feature is obtained by extracting features from the training image and includes foreground features and background features that are entangled and coupled with each other; performing feature mapping on the first image feature can decouple the foreground features and the background features, so the intermediate image feature obtained by feature mapping is more suitable for subsequent data processing. The generation network is used to generate the image. The generation network comprises a plurality of network layers, the last network layer is used for outputting the generated image, and the other network layers are used for outputting image features.
Specifically, the computer equipment inputs a training image into a first network in an image segmentation model to obtain first image features of the training image, inputs the first image features of the training image into a target image generation model obtained through pre-training, in the target image generation model, firstly inputs the first image features into a mapping network, performs feature mapping on the first image features through the mapping network to obtain intermediate image features, then inputs the intermediate image features into a generation network, performs data processing on the intermediate image features through the generation network, and obtains the image features output by the generation network as target image features.
In one embodiment, the mapping network is a fully connected network, which is a network comprised of a plurality of fully connected layers. The generating network is a convolutional network, which is a network composed of a plurality of convolutional layers.
In one embodiment, the target image generation model is a pre-trained StyleGAN (Style-based Generative Adversarial Network) model. The StyleGAN model includes a mapping network and a generation network: the mapping network maps the input vector into a higher-dimensional vector space to achieve feature decoupling, and the generation network generates an image based on noise and the output vector of the mapping network. In the method, the first image feature is input into the mapping network of the pre-trained StyleGAN model, the mapping network outputs the intermediate image feature, the intermediate image feature together with noise set to 0 is input into the generation network of the pre-trained StyleGAN model, and the output data of target network layers in the generation network is obtained as the target image features.
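The following sketch illustrates how target image features could be collected from a mapping network and a generation network; it uses small stand-in networks rather than the actual StyleGAN implementation, and the pooling of the first image feature to a latent vector, the layer widths and the `noise_scale` argument are assumptions of the example (noise is disabled by setting it to 0, as discussed above).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MappingNetwork(nn.Module):
    """Fully connected network that maps an input feature into another feature
    space so that foreground and background factors decouple."""
    def __init__(self, dim=256, n_layers=4):
        super().__init__()
        layers = []
        for _ in range(n_layers):
            layers += [nn.Linear(dim, dim), nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, w):
        return self.net(w)

class GenerationNetwork(nn.Module):
    """Toy stand-in for a StyleGAN-style synthesis network; the intermediate
    outputs of its layers serve as the target image features."""
    def __init__(self, dim=256, widths=(128, 64, 32, 32), out_ch=3):
        super().__init__()
        self.dim = dim
        self.convs = nn.ModuleList()
        prev = dim
        for width in widths:
            self.convs.append(nn.Conv2d(prev, width, 3, padding=1))
            prev = width
        self.to_img = nn.Conv2d(prev, out_ch, 1)

    def forward(self, style, noise_scale=0.0):
        # broadcast the mapped feature to a small spatial grid, then upsample layer by layer
        x = style.view(style.size(0), self.dim, 1, 1).expand(-1, -1, 4, 4).contiguous()
        feats = []
        for conv in self.convs:
            x = F.interpolate(x, scale_factor=2, mode="nearest")
            x = F.relu(conv(x))
            x = x + noise_scale * torch.randn_like(x)   # noise disabled with noise_scale=0
            feats.append(x)
        return feats, self.to_img(x)

def target_features(first_feature, mapping_net, generation_net):
    """Map the first image feature and collect per-layer generator outputs as targets."""
    w = F.adaptive_avg_pool2d(first_feature, 1).flatten(1)   # assumed: pool to a latent vector
    with torch.no_grad():                                    # the pre-trained generator stays frozen
        feats, _image = generation_net(mapping_net(w), noise_scale=0.0)
    return feats
```

For example, `target_features(torch.randn(1, 256, 16, 16), MappingNetwork(dim=256), GenerationNetwork(dim=256))` would return four feature maps at resolutions 8, 16, 32 and 64 in this sketch.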
In one embodiment, a standard image set is acquired, and an initial image generation model and an initial image discrimination model are subjected to adversarial training based on the standard image set to obtain the target image generation model. Specifically, the image generation model may be a generative model, and the initial image generation model is trained by means of adversarial learning to obtain the target image generation model. In the model training stage, the initial image generation model and the initial image discrimination model perform adversarial learning based on the standard image set: the generator (i.e., the image generation model) tries to make its output images fool the judgment of the discriminator (i.e., the image discrimination model), while the discriminator tries to distinguish the forged images output by the generator from real images; the two counter each other, thereby training the target image generation model and the target image discrimination model. The standard images in the standard image set are real images.
In the above embodiment, the target image generation model is obtained through pre-training and has a strong feature processing capability, so the target image features obtained by inputting the first image feature of the training image into the target image generation model are image features that include more accurate foreground features and background features. Using such target image features as the learning target of the model is beneficial to improving the training quality of the model and improving the foreground segmentation accuracy of the model. The second loss is obtained based on the first image feature and the target image feature, and the model parameters are adjusted based on the second loss, which helps improve the feature processing capability of the image segmentation model and thus the image segmentation accuracy of the model.
In one embodiment, the training method of the image segmentation model further comprises:
Obtaining a standard image, and performing image coding on the standard image to obtain standard image characteristics corresponding to the standard image; inputting standard image features into an initial image generation model to obtain a predicted image; respectively inputting the standard image and the predicted image into an initial image discrimination model to obtain the prediction true and false probabilities respectively corresponding to the standard image and the predicted image; and performing countermeasure learning on the initial image generation model and the initial image discrimination model based on the prediction true and false probabilities respectively corresponding to the standard image and the prediction image to obtain a target image generation model.
The standard image is an image of the same type as the training image and includes a foreground and a background. For example, if the training image is a portrait image, the standard image is also a portrait image; if the training image is a medical image, the standard image is also a medical image. Image coding refers to extracting the image features of an image. For example, image coding may be performed by an artificial intelligence model, such as inputting the standard image into an E4S (editing for swapping) network to obtain the standard image features.
The image generation model may be a generative model, which is obtained by training in the form of adversarial learning, and an image discrimination model is required in the adversarial training. The input data of the image generation model is image features, and the output data is an image. The input data of the image discrimination model is an image, and the output data is a predicted true-false probability. The predicted true-false probability is used to represent the probability that an image is a real image and the probability that an image is a forged image. It will be appreciated that the initial image discrimination model refers to the image discrimination model to be trained.
Adversarial learning refers to learning by letting different models play against each other, thereby training a desired model. The initial image generation model and the initial image discrimination model perform adversarial learning: the goal of the initial image generation model is to generate, based on its input data, images that can pass for real, while the goal of the initial image discrimination model is to distinguish the forged images output by the initial image generation model from real images as far as possible.
Specifically, the target image generation model is trained by means of adversarial learning. The computer device acquires a standard image, performs image coding on the standard image to obtain the standard image features corresponding to the standard image, and inputs the standard image features into the initial image generation model, which outputs a generated predicted image. The standard image and the predicted image are then input into the initial image discrimination model respectively, and the initial image discrimination model outputs the predicted true-false probabilities corresponding to the standard image and the predicted image. Based on these predicted true-false probabilities, the computer device performs adversarial learning on the initial image generation model and the initial image discrimination model to obtain the target image generation model. The initial image generation model and the initial image discrimination model learn against each other and continuously adjust their parameters; the final purpose is that the image generation model fools the image discrimination model as much as possible, so that the image discrimination model cannot judge whether the output image of the image generation model is real or not.
For example, a discrimination loss is obtained based on the predicted true-false probabilities corresponding to the standard image and the predicted image, and the model parameters of the initial image discrimination model are adjusted based on the discrimination loss to obtain an intermediate image discrimination model. A generation loss is obtained based on the predicted true-false probability corresponding to the predicted image, and the model parameters of the initial image generation model are adjusted based on the generation loss to obtain an intermediate image generation model. The intermediate image discrimination model is then used as the new initial image discrimination model, the intermediate image generation model is used as the new initial image generation model, and the process returns to the step of acquiring a standard image for iterative training; the target image generation model is obtained through repeated iterations until the convergence condition is satisfied. It will be appreciated that a plurality of standard images may be acquired at a time.
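A hedged sketch of one round of the adversarial learning described above, assuming binary cross-entropy adversarial losses (the actual losses used for a StyleGAN-style model may differ) and assuming that `encode_image` performs the image coding step; all names are placeholders.

```python
import torch
import torch.nn.functional as F

def adversarial_step(gen, disc, encode_image, standard_image, opt_g, opt_d):
    """One round of adversarial learning between the image generation model (gen)
    and the image discrimination model (disc), using a standard image as the real sample."""
    real = standard_image
    features = encode_image(real)      # image coding of the standard image
    fake = gen(features)               # predicted image from the generation model

    # discrimination loss: tell the standard (real) image apart from the predicted (fake) image
    d_real = disc(real)
    d_fake = disc(fake.detach())
    d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # generation loss: make the predicted image pass as real to the discrimination model
    g_loss = F.binary_cross_entropy_with_logits(disc(fake), torch.ones_like(d_fake))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```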
In the above embodiment, the initial image generation model and the initial image discrimination model undergo adversarial training based on the standard images, so that the trained target image generation model has a strong generation capability and can generate images that pass for real. Training of the initial image segmentation model is then guided based on the target image generation model, which effectively improves the training quality of the image segmentation model and improves the image segmentation accuracy of the finally trained target image segmentation model.
In one embodiment, the generating network includes generating network layers in one-to-one correspondence with the plurality of second network layers, and the target image feature includes image features output by each of the plurality of generating network layers.
Obtaining a second penalty based on the first image feature and the target image feature, comprising:
inputting the first image features into a second network to obtain the image features output by each of a plurality of second network layers; wherein the input of each second network layer comprises the output of the previous network layer and the output of the corresponding first network layer; for each generation network layer, calculating a second sub-loss based on the image features output by the generation network layer and the image features output by the corresponding second network layer; a second loss is derived based on the plurality of second sub-losses.
The first network in the image segmentation model comprises a plurality of first network layers which are sequentially connected, the second network in the image segmentation model comprises a plurality of second network layers which are sequentially connected, and the input of each second network layer comprises the output of the previous network layer and the output of the corresponding first network layer.
The generation network in the image generation model includes generation network layers in one-to-one correspondence with the plurality of second network layers. For example, for the same first image feature, the image feature output by a second network layer and the image feature output by the corresponding generation network layer have the same feature size.
Specifically, the second network in the image segmentation model includes a plurality of second network layers; the first image feature of the training image is input into the second network, and each of the plurality of second network layers in the second network can output image features. The generation network in the image generation model includes generation network layers in one-to-one correspondence with the second network layers; the first image feature of the training image is input into the image generation model, and each generation network layer can output image features.
The output data of each generation network layer and its corresponding second network layer can be fully utilized to calculate the second loss, which improves the model training quality. First, for each generation network layer, a second sub-loss is calculated based on the image features output by the generation network layer and the image features output by the corresponding second network layer; for example, the feature difference between the two sets of image features may be used as the second sub-loss, or the square of that feature difference may be used, and so on. Because the generation network comprises a plurality of generation network layers, a plurality of second sub-losses are calculated, and the second loss is finally obtained based on these second sub-losses, for example, by taking the average value of the second sub-losses as the second loss.
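As a concrete illustration, a minimal sketch of this layer-wise comparison is given below, assuming the matched feature maps have already been collected into two lists; the function and variable names (`layerwise_second_loss`, `seg_feats`, `gen_feats`) are assumptions for illustration, and the mean-squared-error sub-loss is just one of the options mentioned above:

```python
import torch
import torch.nn.functional as F

def layerwise_second_loss(seg_feats, gen_feats):
    # seg_feats: image features output by the second network layers
    # gen_feats: image features output by the corresponding generation network layers
    # Each pair is assumed to have the same feature size.
    sub_losses = [F.mse_loss(s, g) for s, g in zip(seg_feats, gen_feats)]
    # The second loss is taken here as the average of the second sub-losses.
    return torch.stack(sub_losses).mean()
```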
In the above embodiment, for the first image feature of the training image, the image features output by the second network layers of the image segmentation model to be trained are not yet accurate, whereas the image features output by the generation network layers of the pre-trained target image generation model are relatively accurate. The second loss is therefore obtained from the image features output by the generation network layers and the image features output by the corresponding second network layers, and the model parameters are adjusted based on this second loss during training, so that the output of each second network layer approaches the output of the corresponding generation network layer. This improves the feature processing capability of the image segmentation model and ultimately its image segmentation accuracy. Further, different second network layers output image features of different precision, as do different generation network layers. Calculating a second sub-loss for each generation network layer and its corresponding second network layer, deriving the second loss from these sub-losses, and adjusting the model parameters accordingly improves the feature processing capability of the image segmentation model at multiple precisions, and thus its segmentation accuracy and fineness. Viewed over the whole training process, training of the image segmentation model is guided by the pre-trained target image generation model, which improves the training quality and the image segmentation accuracy of the finally trained target image segmentation model.
In a specific embodiment, the method of the present application may be applied in a portrait segmentation scenario. Based on the portrait images before and after segmentation and the pre-trained StyleGAN model, training of the Unet model is guided so as to improve the segmentation accuracy and precision of the Unet model.
As shown in fig. 4, the Unet model is used for image segmentation: its input data is the pre-segmentation image (i.e., the portrait image before segmentation) and its output data is the post-segmentation image (i.e., the portrait image after segmentation). The StyleGAN model is pre-trained in advance on a portrait image set so that it can generate relatively accurate portrait images. The StyleGAN model is used to generate a complete portrait image; its input data is the hidden code output by the Unet model, and its output data is the generated portrait image. The StyleGAN model includes a mapping network and a generating network. The mapping network maps the hidden code, and the generating network, whose input data includes the output data of the mapping network and noise, generates the portrait image. It will be appreciated that the mapping network may also be referred to as a map network; it is a fully connected network comprising a small number of fully connected layers, e.g., 4 fully connected layers. It will also be appreciated that, in order to ensure that the face id in the portrait image generated by the StyleGAN model coincides with the face id in the portrait image input to the Unet model, the noise is set to 0.
The Unet model comprises 9 convolution layers. The pre-segmentation image is input into the Unet model, and the output data of convolution layer 4 of the Unet model is taken as the hidden code. The hidden code is input into the mapping network of the StyleGAN model, and the output data of the mapping network together with noise of value 0 is input into the generating network of the StyleGAN model, which comprises 18 convolution layers. The output data of the 11th, 13th and 15th layers of the StyleGAN model and the output data of the corresponding 6th, 7th and 8th layers of the Unet model are compared to compute a comparison loss that guides the training of the Unet model. It will be appreciated that, in the StyleGAN model, the later the convolution layer, the clearer the image details in its output data. The output data of corresponding layers in the StyleGAN model and the Unet model have the same size. The target network layers in the StyleGAN model (i.e., the generation network layers) are its 11th, 13th and 15th layers.
The loss function of the Unet model is as follows:
Loss=Loss1+Loss2+Loss3+Loss4
Loss1=mse(conv11(x),conv6(x))
Loss2=mse(conv13(x),conv7(x))
Loss3=mse(conv15(x),conv8(x))
Loss4=mse(unet(x),label)
Where Loss represents the total loss, i.e., the target loss. x represents the image input to the Unet model, i.e., the training image (the pre-segmentation image). unet(x) denotes the image finally output by the Unet model when x is input, i.e., the post-segmentation image, the foreground segmentation predicted image. mse denotes the mse loss function (mean square error loss function). label represents the label image corresponding to x, i.e., the pre-annotated post-segmentation image, the foreground segmentation label image. Loss4 represents the first loss; its purpose is to make the output of the Unet model approach the correct portrait segmentation result.
conv6(x) represents the output data of convolution layer 6 of the Unet model, conv7(x) the output data of convolution layer 7, and conv8(x) the output data of convolution layer 8. conv11(x) represents the output data of convolution layer 11 of the StyleGAN model, conv13(x) the output data of convolution layer 13, and conv15(x) the output data of convolution layer 15. Loss1+Loss2+Loss3 represents the second loss; its objective is to make the outputs of the corresponding Unet layers approach the outputs of the StyleGAN layers, because the StyleGAN results are high-definition and well-defined.
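A minimal sketch of this loss, assuming the intermediate feature maps of both models have already been extracted (for example with forward hooks) and are passed in as tensors, is given below; the function name and argument layout are illustrative assumptions:

```python
import torch.nn.functional as F

def unet_guided_loss(unet_feats, stylegan_feats, unet_out, label):
    # unet_feats: outputs of Unet convolution layers 6, 7 and 8 for the input x
    # stylegan_feats: outputs of StyleGAN convolution layers 11, 13 and 15,
    #                 driven by the hidden code taken from Unet convolution layer 4
    loss1 = F.mse_loss(stylegan_feats[0], unet_feats[0])  # conv11(x) vs conv6(x)
    loss2 = F.mse_loss(stylegan_feats[1], unet_feats[1])  # conv13(x) vs conv7(x)
    loss3 = F.mse_loss(stylegan_feats[2], unet_feats[2])  # conv15(x) vs conv8(x)
    loss4 = F.mse_loss(unet_out, label)                   # first loss: unet(x) vs label
    return loss1 + loss2 + loss3 + loss4
```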
After the Unet model is trained, the portrait image to be segmented is input into the Unet model, and the Unet model outputs an accurate and fine portrait segmentation result.
In the above embodiment, the model structure of the Unet model is simpler than that of the StyleGAN model. The pre-trained StyleGAN model, which can generate fine results, is used to guide the training of the Unet model through its intermediate layers, so that the finally trained Unet model can output fine portrait segmentation results.
In one embodiment, the image segmentation model includes a first network and a second network.
Inputting a training image into an image segmentation model to be trained to obtain a first image feature and a foreground segmentation predicted image of the training image, comprising:
inputting the training image into a first network to obtain a first image feature; and inputting the first image characteristic and random noise into a second network to obtain a foreground segmentation predicted image.
Wherein the image segmentation model comprises a first network and a second network. The first network is used for extracting image features of the model input image: its input data is the model input image and its output data is the image features of the model input image. The second network is used to generate a foreground segmented image of the model input image: its input data comprises the output data of the first network and random noise, and its output data is the foreground segmentation predicted image of the model input image. Random noise refers to noise that is randomly generated.
Specifically, the computer device inputs the training image into a first network in the image segmentation model, the first network outputs first image features of the training image, inputs the first image features of the training image and random noise into a second network, and the second network outputs a foreground segmentation predicted image of the training image.
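A minimal sketch of this forward pass is shown below; the names `first_net`, `second_net` and `noise_dim`, and the assumption that the noise is a flat vector per sample, are illustrative rather than prescribed by this embodiment:

```python
import torch

def segmentation_forward(first_net, second_net, training_image, noise_dim=512):
    feats = first_net(training_image)                    # first image features
    noise = torch.randn(training_image.size(0), noise_dim,
                        device=training_image.device)    # random noise
    pred = second_net(feats, noise)                      # foreground segmentation predicted image
    return feats, pred
```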
In one embodiment, the first network comprises a plurality of first network layers connected in sequence. These layers extract the image features of the input image step by step, and the image features extracted by later first network layers carry richer image information.
In one embodiment, the reference model includes a third network and a fourth network, the third network includes a plurality of third network layers connected in sequence, the fourth network includes a plurality of fourth network layers connected in sequence and an output layer, the plurality of third network layers and the plurality of fourth network layers are in one-to-one correspondence, and the last third network layer in the third network is connected to the first fourth network layer in the fourth network. The reference model is used for image foreground segmentation: its input data is an image and its output data is a foreground segmentation predicted image. For the first network of the image segmentation model to be trained, a pre-trained reference model is obtained and its third network is used as the first network; compared with randomly initializing the model parameters of the image segmentation model to be trained, this helps shorten the model training time and improve the model training efficiency. For the second network of the image segmentation model to be trained, an image generation model to be trained is obtained and used as the second network.
It will be appreciated that the pre-trained reference model may be obtained by supervised learning. A segmentation data set is acquired, and the reference model is trained in a supervised manner on this data set to obtain the pre-trained reference model. Specifically, the segmentation data set comprises a plurality of segmentation image pairs, each consisting of a pre-segmentation image and a correct post-segmentation image; the pre-segmentation image serves as the input data of the reference model, and the correct post-segmentation image serves as the label data for the expected output of the reference model. The training goal of the reference model is to make the post-segmentation image it outputs for a pre-segmentation image approach the correct post-segmentation image.
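A minimal sketch of this supervised pre-training loop is given below, assuming the data loader yields (pre-segmentation image, correct post-segmentation image) pairs as tensors; the names and the choice of a mean-squared-error objective are illustrative assumptions, and any standard segmentation loss could be substituted:

```python
import torch
import torch.nn.functional as F

def pretrain_reference_model(reference_model, data_loader, num_epochs=10, lr=1e-4):
    optimizer = torch.optim.Adam(reference_model.parameters(), lr=lr)
    for _ in range(num_epochs):
        for pre_image, correct_post_image in data_loader:
            pred = reference_model(pre_image)            # predicted post-segmentation image
            loss = F.mse_loss(pred, correct_post_image)  # move output toward the correct result
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return reference_model
```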
In the above embodiment, the image segmentation model includes a first network and a second network, the training image is input into the first network to obtain a first image feature, and the first image feature and the random noise are input into the second network to obtain the foreground segmentation predicted image. Random noise can introduce diversity to help the image segmentation model output a detail-rich foreground segmented predicted image.
In one embodiment, deriving the second penalty based on the first image feature and the target image feature comprises:
Extracting a first identity feature of the foreground segmented predicted image based on the foreground segmented predicted image determined from the first image feature; extracting a second identity characteristic of the foreground segmentation label image to obtain a target image characteristic; based on the first identity and the second identity, a second penalty is obtained.
The first identity features are obtained by extracting identity features of foreground segmentation predicted images of the training images. The second identity feature is obtained by extracting the identity feature of the foreground segmentation label image of the training image. The identity of the image is used to indicate the object represented by the foreground portion in the image. For example, for a portrait image, an identity feature is used to indicate the user represented by the foreground portion in the image.
Specifically, the image segmentation model comprises a first network and a second network: the training image is input into the first network to obtain the first image features, and the first image features and random noise are input into the second network to obtain the foreground segmentation predicted image. The random noise fed into the second network introduces diversity, so that the image segmentation model can generate a detailed and fine foreground segmentation predicted image, but the identity information of the generated foreground segmentation predicted image may become inconsistent with that of the foreground segmentation label image. To keep the identity information consistent, the foreground segmentation predicted image and the foreground segmentation label image can be compared in terms of identity. The foreground segmentation label image is the desired, correct foreground segmentation result, while the foreground segmentation predicted image is the foreground segmentation result output by the model. The identity feature of the foreground segmentation predicted image of the training image is extracted as the first identity feature, the identity feature of the foreground segmentation label image of the training image is extracted as the second identity feature, and the second loss is calculated based on the first identity feature and the second identity feature. The second loss can reflect the difference between the first identity feature and the second identity feature.
In one embodiment, the identity of the image may be extracted by an artificial intelligence model.
In one embodiment, deriving the second loss based on the first identity feature and the second identity feature comprises: obtaining the second loss based on the feature similarity between the first identity feature and the second identity feature, where the second loss is negatively correlated with the feature similarity. Specifically, the feature similarity between the first identity feature and the second identity feature is calculated, and the second loss is obtained based on this feature similarity; the larger the feature similarity, the smaller the second loss. The training goal of the model is to reduce the model loss: the smaller the model loss, the more similar the first identity feature and the second identity feature, and the more consistent the identity information between the foreground segmentation predicted image and the foreground segmentation label image.
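For instance, the negative correlation can be realized as one minus the cosine similarity of the two identity features; the following sketch assumes the identity features are already extracted as vectors, and the function name is an illustrative assumption:

```python
import torch.nn.functional as F

def identity_second_loss(first_id_feat, second_id_feat):
    # Cosine similarity between identity features of the predicted and label images.
    sim = F.cosine_similarity(first_id_feat, second_id_feat, dim=-1).mean()
    # The loss shrinks as the features become more similar (negative correlation).
    return 1.0 - sim
```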
In one embodiment, adjusting model parameters of the image segmentation model based on the first loss and the second loss until convergence conditions are met, to obtain a target image segmentation model, comprises: based on the loss weights corresponding to the first loss and the second loss respectively, fusing the first loss and the second loss to obtain model loss; the loss weight corresponding to the first loss is greater than the loss weight corresponding to the second loss; based on the model loss, adjusting model parameters of the image segmentation model until convergence conditions are met, and obtaining the target image segmentation model.
Specifically, when the model parameters are adjusted based on the first loss and the second loss, the first loss and the second loss may be fused by weighting to obtain the model loss, the model loss is back-propagated to adjust the model parameters of the image segmentation model, and the target image segmentation model is obtained through multiple iterations until the convergence condition is satisfied. When the first loss and the second loss are fused, the loss weight corresponding to the first loss is greater than the loss weight corresponding to the second loss, so that the first loss accounts for a larger proportion of the model loss. It will be appreciated that the first loss mainly serves image foreground segmentation while the second loss mainly keeps the identity information consistent; since image foreground segmentation is the primary task, the loss weight of the first loss may be set greater than that of the second loss when the two losses are fused.
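A trivial sketch of this weighted fusion is shown below; the concrete weights (2.0 and 1.0, matching the specific portrait example later in this description) are illustrative assumptions:

```python
def fused_model_loss(first_loss, second_loss, w_first=2.0, w_second=1.0):
    # The weight of the first (segmentation) loss exceeds that of the second (identity) loss.
    return w_first * first_loss + w_second * second_loss
```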
In the above embodiment, the training image is input into the first network of the image segmentation model to obtain the first image features of the training image, and the first image features and random noise are input into the second network to obtain the foreground segmentation predicted image; the random noise introduces diversity and helps the model output a detail-rich foreground segmentation predicted image. On this basis, the second loss is obtained from the first identity feature of the foreground segmentation predicted image and the second identity feature of the foreground segmentation label image. Adjusting the model parameters based on this second loss during training helps the image segmentation model output a foreground segmentation predicted image whose identity information is consistent with that of the foreground segmentation label image, so that the finally trained target image segmentation model can output an accurate and fine foreground segmentation image for the model input image while keeping the identity information unchanged, thereby ensuring image segmentation accuracy.
In one embodiment, the second network includes a mapping layer and a splitting layer.
Inputting the first image feature and random noise into a second network to obtain a foreground segmentation predicted image, comprising:
Inputting the first image features into a mapping layer for feature mapping to obtain intermediate image features; and inputting the intermediate image features and random noise into a segmentation layer for image segmentation to obtain a foreground segmentation predicted image.
Wherein the second network in the image segmentation model comprises a mapping layer and a segmentation layer. The mapping layer is used for feature mapping. The segmentation layer is used for image segmentation and generates a foreground segmented image.
Specifically, the computer device inputs the training image into the first network of the image segmentation model to obtain the first image features of the training image. The second network of the image segmentation model comprises a mapping layer and a segmentation layer: the first image features are first input into the mapping layer for feature mapping to obtain intermediate image features, and the intermediate image features and random noise are then input into the segmentation layer for image segmentation to obtain the foreground segmentation predicted image.
In the above embodiment, the first image features are input into the mapping layer to perform feature mapping, so that the entangled foreground features and background features in the first image features can be decoupled, and the intermediate image features obtained through feature mapping are beneficial to subsequent image segmentation. The intermediate image features and random noise are input into a segmentation layer for image segmentation, so that a relatively fine foreground segmentation predicted image can be obtained.
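The second-network forward pass can thus be sketched as two stages; the names below are illustrative assumptions, and the mapping layer is assumed to behave like the fully connected mapping network of a StyleGAN-style generator:

```python
def second_network_forward(mapping_layer, segmentation_layer, first_feats, noise):
    intermediate = mapping_layer(first_feats)        # feature mapping decouples fg/bg features
    pred = segmentation_layer(intermediate, noise)   # image segmentation with injected noise
    return pred
```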
In a specific embodiment, the method of the present application may be applied in a portrait segmentation scenario. Based on the portrait images before and after segmentation and the intermediate processing result of the pre-trained Unet model, the Unet model and the StyleGAN model are trained so that a more accurate and fine portrait segmentation result is generated through the StyleGAN model.
The Unet model is pre-trained on a segmentation data set of portrait images so that it can generate a relatively accurate hidden code. During pre-training, the input data of the Unet model is the pre-segmentation image (i.e., the portrait image before segmentation) and its output data is the post-segmentation image (i.e., the portrait image after segmentation). As shown in fig. 5, the StyleGAN model is used to generate the post-segmentation image: its input data is the hidden code output by the Unet model and its output data is the generated post-segmentation image. The StyleGAN model includes a mapping network and a generating network. The mapping network maps the hidden code, and the generating network, whose input data includes the output data of the mapping network and noise, generates the post-segmentation image. It will be appreciated that the mapping network may also be referred to as a map network; it is a fully connected network comprising a small number of fully connected layers, e.g., 4 fully connected layers. It can also be understood that, to preserve image details in the post-segmentation image generated by the StyleGAN model, the noise is set to random noise so that the model can output diversified results; because portrait segmentation often handles fine textures such as hairlines poorly, this diversity helps the model achieve the best result.
A segmentation network net (i.e., an image segmentation model) is built based on the Unet model and the StyleGAN model. The convolution layers 1 to 4 in the Unet model are the first network in the split network net, the StyleGAN model is the second network in the split network net, the mapping network in the StyleGAN model is the mapping layer in the second network, and the generation network in the StyleGAN model is the split layer in the second network.
The Unet model comprises 9 convolution layers, the image before segmentation is input into the Unet model, and output data of the convolution layer 4 in the Unet model is obtained as hidden codes. The hidden codes are input into a mapping network in the StyleGAN model, output data of the mapping network and noise are input into a generating network in the StyleGAN model, and a network output segmented image is generated. In order to keep the face id unchanged, the identity of an output image of the StyleGAN model is compared with that of an original image, and training of the segmentation network net is guided.
The loss function of the segmentation network net is as follows:
Loss=2*Loss1+Loss_id
Loss1=mse(net(x),label)
Loss_id=1-cos(arcface(net(x)),arcface(label))
Where Loss represents the total loss, i.e., the target loss. x represents the image input to the segmentation network net, i.e., the image input to the first network of the segmentation network net, namely the training image, the pre-segmentation image. net(x) represents the image finally output by the segmentation network net when x is input, i.e., the image output by the StyleGAN model in the segmentation network net, namely the post-segmentation image, the foreground segmentation predicted image. mse denotes the mse loss function (mean square error loss function). label represents the label image corresponding to x, namely the pre-annotated post-segmentation image, the foreground segmentation label image. Loss1 represents the first loss; its purpose is to make the output of the StyleGAN model approach the correct portrait segmentation result.
arcface(net(x)) represents the identity feature extracted by inputting net(x) into the identity recognition network, namely the identity feature of the foreground segmentation predicted image, the first identity feature. arcface(label) denotes the identity feature extracted by inputting label into the identity recognition network, namely the identity feature of the foreground segmentation label image, the second identity feature. cos represents the cosine function. Loss_id represents the second loss; its purpose is to keep the face id of the images before and after portrait segmentation unchanged.
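A minimal sketch of this loss is shown below, assuming an ArcFace-style face recognition network `id_net` is available to produce identity embeddings; the function name and the use of torch cosine similarity are illustrative assumptions:

```python
import torch.nn.functional as F

def segmentation_net_loss(net_out, label, id_net):
    loss1 = F.mse_loss(net_out, label)  # first loss: output vs foreground segmentation label image
    # Identity features of the predicted and label images from the recognition network.
    sim = F.cosine_similarity(id_net(net_out), id_net(label), dim=-1).mean()
    loss_id = 1.0 - sim                 # second loss: keep the face id unchanged
    return 2.0 * loss1 + loss_id
```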
After the segmentation network net is trained, the portrait image to be segmented is input into the first network, the hidden code output by the first network and random noise are input into the StyleGAN model, and the StyleGAN model outputs an accurate and fine portrait segmentation result.
In the above embodiment, compared with the Unet model, the StyleGAN model can output finer face details. Training the StyleGAN model in combination with the pre-trained Unet model enables the finally trained StyleGAN model to output fine and smooth portrait segmentation results.
In one embodiment, as shown in fig. 6, an image segmentation method is provided. The method is described as applied to a computer device, which may be a terminal or a server. The method can be executed by the terminal or the server alone, or be realized through interaction between the terminal and the server. Referring to fig. 6, the image segmentation method includes the following steps:
step S602, an image to be segmented is acquired.
Step S604, inputting the image to be segmented into a target image segmentation model to obtain a foreground segmentation image of the image to be segmented.
The target image segmentation model is obtained through training by the training method of the image segmentation model.
Specifically, the computer device may obtain the image to be segmented locally or from another device, input the image to be segmented into the target image segmentation model, and the target image segmentation model outputs the foreground segmentation image of the image to be segmented.
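At inference time this reduces to a single forward pass; the following sketch assumes the trained model and a preprocessed image tensor, and the names are illustrative assumptions:

```python
import torch

def segment_image(target_model, image_tensor):
    target_model.eval()
    with torch.no_grad():
        # The model outputs the foreground segmentation image of the image to be segmented.
        return target_model(image_tensor)
```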
The foreground segmented image of the image to be segmented may be further applied in other image processing tasks, for example, the foreground segmented image of the image to be segmented may be used to replace the image background of the image to be segmented. In the video call scene or the video conference scene, the image to be segmented can be a portrait image acquired in real time for a user participating in the video call or the video conference, and the portrait segmentation result output by the target image segmentation model can be used for replacing the video background of the video call or the video conference.
According to the image segmentation method, the target image segmentation model obtained through training by the training method of the image segmentation model has high image segmentation accuracy. The image to be segmented is input into a target image segmentation model, and the target image segmentation model can output an accurate foreground segmentation image.
In a specific embodiment, referring to fig. 7, the method of the present application proposes two image foreground segmentation schemes based on the Unet model and the StyleGAN model. The first scheme is as follows: during model training, training of the Unet model is guided by the portrait images before and after segmentation and the pre-trained StyleGAN model, so as to improve the segmentation accuracy and precision of the Unet model; during model application, the pre-segmentation image is input into the fully trained Unet model, and the Unet model outputs an accurate and fine post-segmentation image. The second scheme is as follows: during model training, the Unet model and the StyleGAN model are trained based on the portrait images before and after segmentation and the intermediate processing result of the pre-trained Unet model, so that a more accurate and fine portrait segmentation result is generated through the StyleGAN model; during model application, the pre-segmentation image is input into the fully trained Unet model, the intermediate processing result of the Unet model and random noise are input into the StyleGAN model, and the StyleGAN model outputs an accurate and fine post-segmentation image.
It will be appreciated that the post-segmentation image may be a binary image, for example an image consisting of 0s and 1s, where 0 represents the background and 1 represents the foreground. The post-segmentation image may also be an image that includes only the foreground.
It will be appreciated that the model structure of the StyleGAN model is more complex than that of the Unet model, and that the image segmentation speed of the first approach is faster when the model is applied. The StyleGAN model has strong image generation capability, and the second scheme has better segmentation effect when the model is applied.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different moments; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with at least part of the other steps or of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides an image segmentation device for realizing the above related image segmentation method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in one or more embodiments of the image segmentation device provided below may refer to the limitation of the image segmentation method hereinabove, and will not be repeated herein.
In one embodiment, as shown in fig. 8, there is provided a training apparatus of an image segmentation model, including: a model processing module 802, a loss determination module 804, and a model adjustment module 806, wherein:
The model processing module 802 is configured to input a training image into an image segmentation model to be trained, and obtain a first image feature and a foreground segmentation prediction image of the training image.
The loss determination module 804 is configured to obtain a first loss based on the foreground segmentation label image and the foreground segmentation prediction image of the training image, and obtain a second loss based on the first image feature and the target image feature.
The model adjustment module 806 is configured to adjust model parameters of the image segmentation model based on the first loss and the second loss until a convergence condition is satisfied, thereby obtaining a target image segmentation model.
In one embodiment, the image segmentation model comprises a first network and a second network, the first network comprises a plurality of first network layers which are connected in sequence, the second network comprises a plurality of second network layers and output layers which are connected in sequence, the plurality of first network layers and the plurality of second network layers are in one-to-one correspondence, and the last first network layer in the first network is connected with the first second network layer in the second network.
Model processing module 802 is also configured to:
inputting the training image into a first network, and obtaining data output by a last first network layer to obtain first image characteristics;
And after the training image sequentially passes through the first network and the second network, acquiring the data output by the output layer to obtain the foreground segmentation predicted image.
In an embodiment, the training device of the image segmentation model is further configured to:
Inputting the first image features into a mapping network in a target image generation model obtained through pre-training to perform feature mapping to obtain intermediate image features;
Inputting the intermediate image features into a generating network in the target image generating model, and acquiring the image features output by the generating network to obtain target image features.
In one embodiment, the generating network includes generating network layers in one-to-one correspondence with the plurality of second network layers, and the target image feature includes image features output by each of the plurality of generating network layers. The loss determination module 804 is further configured to:
Inputting the first image features into a second network to obtain the image features output by each of a plurality of second network layers; wherein the input of each second network layer comprises the output of the previous network layer and the output of the corresponding first network layer;
for each generation network layer, calculating a second sub-loss based on the image features output by the generation network layer and the image features output by the corresponding second network layer;
a second loss is derived based on the plurality of second sub-losses.
In one embodiment, the image segmentation model includes a first network and a second network. Model processing module 802 is also configured to:
Inputting the training image into a first network to obtain a first image feature;
and inputting the first image characteristic and random noise into a second network to obtain a foreground segmentation predicted image.
In one embodiment, the loss determination module 804 is further to:
Extracting a first identity feature of the foreground segmented predicted image based on the foreground segmented predicted image determined from the first image feature;
Extracting a second identity characteristic of the foreground segmentation label image to obtain a target image characteristic;
Based on the first identity and the second identity, a second penalty is obtained.
In one embodiment, the second network includes a mapping layer and a splitting layer. Model processing module 802 is also configured to:
Inputting the first image features into a mapping layer for feature mapping to obtain intermediate image features;
And inputting the intermediate image features and random noise into a segmentation layer for image segmentation to obtain a foreground segmentation predicted image.
In one embodiment, as shown in fig. 9, there is provided an image segmentation apparatus including: an image acquisition module 902 and an image segmentation module 904, wherein:
An image acquisition module 902, configured to acquire an image to be segmented.
The image segmentation module 904 is configured to input an image to be segmented into a target image segmentation model to obtain a foreground segmentation image of the image to be segmented; the target image segmentation model is obtained through training by a training method of the image segmentation model.
According to the above training apparatus of the image segmentation model and the image segmentation apparatus, the foreground segmentation predicted image is the foreground segmentation image predicted by the model and the foreground segmentation label image is the accurate foreground segmentation image; the first loss is obtained based on the foreground segmentation predicted image and the foreground segmentation label image of the training image, and adjusting the model parameters based on the first loss helps the image segmentation model output a foreground segmentation predicted image close to the foreground segmentation label image. The first image feature is the image feature obtained when the model processes the input image, and the target image feature is the expected image feature that serves as a reference; the second loss is obtained based on the first image feature and the target image feature, and adjusting the model parameters based on the second loss helps improve the image feature processing capability of the image segmentation model for the input image. Adjusting the model parameters based on both the first loss and the second loss can therefore improve the model training quality, so that the image segmentation model extracts more accurate image features of the input image and outputs an accurate foreground segmentation image based on these features, ensuring that the finally trained target image segmentation model has high image segmentation accuracy. The image to be segmented is input into the target image segmentation model, and the target image segmentation model can output an accurate foreground segmentation image.
The training device of the image segmentation model and each module in the image segmentation device can be fully or partially realized by software, hardware and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 10. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data such as models, training data of the models and the like. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a training method of an image segmentation model and an image segmentation method.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure may be as shown in fig. 11. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by the processor to implement a training method of an image segmentation model and an image segmentation method. The display unit of the computer device is used to form a visual picture and may be a display screen, a projection device or a virtual reality imaging device; the display screen may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse.
It will be appreciated by persons skilled in the art that the structures shown in FIGS. 10 and 11 are block diagrams of only part of the structures relevant to the present inventive arrangements and do not limit the computer device on which the present inventive arrangements may be implemented; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magneto-resistive random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (PHASE CHANGE Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in various forms such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.
Claims (10)
1. A method of training an image segmentation model, the method comprising:
inputting a training image into an image segmentation model to be trained to obtain a first image feature and a foreground segmentation predicted image of the training image;
obtaining a first loss based on the foreground segmentation label image and the foreground segmentation predicted image of the training image, and obtaining a second loss based on the first image feature and the target image feature;
And adjusting model parameters of the image segmentation model based on the first loss and the second loss until convergence conditions are met, so as to obtain a target image segmentation model.
2. The method of claim 1, wherein the image segmentation model comprises a first network and a second network, the first network comprising a plurality of first network layers connected in sequence, the second network comprising a plurality of second network layers connected in sequence and an output layer, the plurality of first network layers and the plurality of second network layers being in one-to-one correspondence, a last one of the first network layers connecting a first one of the second network layers; inputting a training image into an image segmentation model to be trained to obtain a first image feature and a foreground segmentation predicted image of the training image, wherein the method comprises the following steps:
inputting a training image into the first network, and acquiring data output by the last first network layer to obtain the first image characteristics;
And acquiring data output by the output layer after the training image sequentially passes through the first network and the second network, and obtaining the foreground segmentation predicted image.
3. The method according to claim 2, wherein the method further comprises:
inputting the first image features into a mapping network in a target image generation model obtained through pre-training to perform feature mapping to obtain intermediate image features;
Inputting the intermediate image features into a generating network in the target image generating model, and obtaining the image features output by the generating network to obtain target image features.
4. A method according to claim 3, wherein the generating network comprises generating network layers in one-to-one correspondence with the plurality of second network layers, and the target image features comprise image features output by each of the plurality of generating network layers;
the obtaining a second loss based on the first image feature and the target image feature includes:
inputting the first image features into the second network to obtain the image features output by each of the plurality of second network layers; wherein the input of each second network layer comprises the output of the preceding network layer and the output of the corresponding first network layer;
for each generation network layer, calculating a second sub-loss based on the image features output by the generation network layer and the image features output by the corresponding second network layer;
obtaining a second loss based on a plurality of the second sub-losses.
5. The method of claim 1, wherein the image segmentation model comprises a first network and a second network;
Inputting a training image into an image segmentation model to be trained to obtain a first image feature and a foreground segmentation predicted image of the training image, wherein the method comprises the following steps:
inputting a training image into the first network to obtain the first image characteristics;
and inputting the first image characteristic and random noise into the second network to obtain the foreground segmentation predicted image.
6. The method of claim 5, wherein the deriving a second penalty based on the first image feature and the target image feature comprises:
extracting a first identity feature of a foreground segmented predicted image based on the foreground segmented predicted image determined from the first image feature;
extracting a second identity characteristic of the foreground segmentation label image to obtain the target image characteristic;
And obtaining a second loss based on the first identity and the second identity.
7. The method of claim 6, wherein the second network comprises a mapping layer and a segmentation layer;
the inputting the first image feature and random noise into the second network to obtain the foreground segmentation predicted image comprises:
Inputting the first image features into the mapping layer for feature mapping to obtain intermediate image features;
And inputting the intermediate image features and random noise into the segmentation layer to carry out image segmentation to obtain the foreground segmentation predicted image.
8. An image segmentation method, the method comprising:
Acquiring an image to be segmented;
Inputting the image to be segmented into a target image segmentation model to obtain a foreground segmentation image of the image to be segmented; the target image segmentation model is trained by the training method of the image segmentation model according to any one of claims 1-7.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 8.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202311086209.XA CN117974707A (en) | 2023-08-25 | 2023-08-25 | Training method of image segmentation model, image segmentation method and device
Publications (1)

Publication Number | Publication Date
---|---
CN117974707A true | 2024-05-03
Family

ID=90858612

Family Applications (1): CN202311086209.XA, filed 2023-08-25 (priority date 2023-08-25), patent/CN117974707A/en, status: active Pending
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 