WO2022017025A1 - Image processing method and apparatus, storage medium, and electronic device - Google Patents


Info

Publication number
WO2022017025A1
Authority
WO
WIPO (PCT)
Prior art keywords
module
model
image
segmentation
training
Prior art date
Application number
PCT/CN2021/098905
Other languages
French (fr)
Chinese (zh)
Inventor
刘钰安
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2022017025A1 publication Critical patent/WO2022017025A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/20 Image enhancement or restoration using local operators
    • G06T5/30 Erosion or dilatation, e.g. thinning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/143 Segmentation; Edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • the present application belongs to the field of image technology, and in particular, relates to an image processing method, device, storage medium and electronic device.
  • Image segmentation is a fundamental topic in the field of computer vision. Image segmentation is the technology and process of dividing an image into several specific regions with unique properties and proposing objects of interest. It is a key step from image processing to image analysis.
  • Embodiments of the present application provide an image processing method, apparatus, storage medium, and electronic device, which can improve the accuracy of image segmentation by the electronic device.
  • an embodiment of the present application provides an image processing method, the method comprising:
  • the pre-trained image segmentation model is used to output a segmentation mask of an image
  • the pre-trained image segmentation model includes at least a segmentation module
  • the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer
  • the multiple convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer
  • each of the convolutional network blocks includes a convolutional layer, a batch normalization layer and a nonlinear activation layer;
  • a second image is segmented from the first image according to a segmentation mask corresponding to the first image.
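  • The Conv-BN-ReLU structure of each convolutional network block can be illustrated with a minimal numpy sketch. The kernel size, channel counts, and inference-style normalization (gamma=1, beta=0, per-channel statistics) below are illustrative assumptions; the patent does not specify them:

```python
import numpy as np

def conv2d(x, w, pad=1):
    """Naive 3x3 'same' convolution. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    c_out, c_in, kh, kw = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wdt = x.shape[1], x.shape[2]
    out = np.zeros((c_out, h, wdt))
    for o in range(c_out):
        for i in range(c_in):
            for r in range(kh):
                for c in range(kw):
                    out[o] += w[o, i, r, c] * xp[i, r:r + h, c:c + wdt]
    return out

def batch_norm(x, eps=1e-5):
    """Per-channel normalization over spatial dims (gamma=1, beta=0 assumed)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def conv_block(x, w):
    """One convolutional network block: convolution -> batch norm -> ReLU."""
    return relu(batch_norm(conv2d(x, w)))

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))      # hypothetical 3-channel input patch
w = rng.standard_normal((16, 3, 3, 3))  # hypothetical 16-output-channel kernel
y = conv_block(x, w)
print(y.shape)  # (16, 8, 8)
```

The segmentation module would chain several such blocks and finish with a plain convolutional layer; this sketch only shows one block's forward pass.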
  • an embodiment of the present application provides an image processing apparatus, and the apparatus includes:
  • a first acquisition module configured to acquire a first image
  • the second acquisition module is used to acquire a pre-trained image segmentation model
  • the pre-trained image segmentation model is used to output a segmentation mask of the image
  • the pre-trained image segmentation model includes at least a segmentation module
  • the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer
  • the multiple convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer.
  • each of the convolutional network blocks includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
  • a processing module configured to input the first image into the pre-trained image segmentation model, and output a segmentation mask corresponding to the first image from the pre-trained image segmentation model;
  • a segmentation module configured to segment a second image from the first image according to a segmentation mask corresponding to the first image.
  • embodiments of the present application provide a storage medium on which a computer program is stored, and the computer program, when executed on a computer, causes the computer to execute the image processing method provided by the embodiments of the present application.
  • an embodiment of the present application further provides an electronic device, including a memory and a processor, where the processor is configured to execute the image processing method provided by the embodiment of the present application by invoking a computer program stored in the memory.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 2 is another schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a training model provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a model including a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, and a deep feature supervision module provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of each network block provided by an embodiment of the present application.
  • FIG. 6 is another schematic structural diagram of a training model provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 9 is another schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the execution body of the embodiments of the present application may be an electronic device such as a smartphone or a tablet computer.
  • the application provides an image processing method, the method includes:
  • the pre-trained image segmentation model is used to output a segmentation mask of an image
  • the pre-trained image segmentation model includes at least a segmentation module
  • the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer
  • the multiple convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer
  • each of the convolutional network blocks includes a convolutional layer, a batch normalization layer and a nonlinear activation layer;
  • a second image is segmented from the first image according to a segmentation mask corresponding to the first image.
  • the pre-trained image segmentation model further includes a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module; the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence.
  • the training model used to obtain the pre-trained image segmentation model further includes a deep feature supervision module, and the deep feature supervision module is connected to the feature pyramid module
  • the deep feature supervision module is used to supervise deep features at multiple scales;
  • the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to the training samples
  • the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training samples, N is the number of layers of the feature pyramid;
  • the back-propagation algorithm is performed on the training model to update the model parameters
  • the model training process is repeated over multiple training cycles until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module is fully converged; the model is then saved, and the parameters of the model are left unfrozen.
  • the training model further includes an edge gradient module, and the edge gradient module is configured to provide an edge gradient loss function as one of the loss functions during model training.
  • the edge gradient module is used to calculate the edge gradient map corresponding to the input image in the training sample
  • the edge gradient module is used to calculate the edge probability prediction map corresponding to the fine segmentation mask
  • the edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
  • the deep feature supervision module in the saved model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module is removed, and a segmentation module and an edge gradient module are added.
  • Model training continues based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module and the edge gradient module, including the following processes:
  • the second preliminary segmentation mask and the input image are spliced in the channel dimension and then input to the segmentation module, and the segmentation module outputs the fine segmentation mask;
  • the edge gradient module invokes the Sobel operator included in the edge gradient module to perform corresponding calculations on the input image to obtain a gradient map of the input image;
  • the model obtained after removing the edge gradient module in the model obtained after training is determined as the pre-trained image segmentation model.
  • the method further includes:
  • the preset processing including random cropping and/or normalization processing
  • the inputting the first image into the pre-trained image segmentation model includes: inputting the image obtained after the first image is subjected to the preset processing into the pre-trained image segmentation model.
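  • The preset processing (random cropping and/or normalization) might look like the following numpy sketch. The crop size and the ImageNet-style normalization constants are illustrative assumptions, not values from the patent:

```python
import numpy as np

def random_crop(img, size, rng):
    """Randomly crop an (H, W, C) image to (size, size, C)."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def normalize(img, mean, std):
    """Scale pixel values to [0, 1], then standardize per channel."""
    return (img.astype(np.float32) / 255.0 - mean) / std

rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)
crop = random_crop(image, 224, rng)
x = normalize(crop, mean=np.array([0.485, 0.456, 0.406]),
              std=np.array([0.229, 0.224, 0.225]))
print(crop.shape)  # (224, 224, 3)
```

The normalized crop `x` is what would be fed into the pre-trained image segmentation model.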
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application. The process may include:
  • Image segmentation is a fundamental topic in the field of computer vision.
  • Image segmentation is the technology and process of dividing an image into several specific regions with unique properties and proposing objects of interest. It is a key step from image processing to image analysis.
  • the accuracy of image segmentation by the electronic device is low.
  • the electronic device may acquire the first image first.
  • the first image is an image that needs to be processed by image segmentation.
  • Obtain a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image, the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the multiple convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer.
  • the electronic device can also acquire a pre-trained image segmentation model that has been trained in advance, where the pre-trained image segmentation model can be used to output a segmentation mask of the image.
  • the pre-trained image segmentation model may include at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer (i.e., BN layer), and a nonlinear activation layer (i.e., ReLU layer).
  • the user can pre-train the model according to the requirements, so that the pre-trained image segmentation model can output the segmentation mask required by the user.
  • for example, if the user needs to use the trained model to segment portraits, the pre-trained image segmentation model obtained after training should be a model that can output portrait segmentation masks.
  • if the user needs to use the trained model to segment a specific object (such as a car or a potted plant), then the pre-trained image segmentation model obtained after training should be a model that can output the segmentation mask of that specific object.
  • the electronic device can input the first image into the pre-trained image segmentation model, and the pre-trained image segmentation model outputs the segmentation mask corresponding to the first image.
  • the electronic device may segment the first image to obtain a corresponding image, that is, the second image, according to the segmentation mask corresponding to the first image.
  • the electronic device can segment the corresponding portrait from the first image according to the portrait segmentation mask.
  • the electronic device can acquire the first image and a pre-trained image segmentation model, the pre-trained image segmentation model is used to output a segmentation mask of the image, and the pre-trained image segmentation model at least includes a segmentation module.
  • the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer.
  • the electronic device may input the first image to the pre-trained image segmentation model, and the pre-trained image segmentation model outputs a segmentation mask corresponding to the first image.
  • the electronic device may obtain a second image by segmenting the first image according to the segmentation mask corresponding to the first image. Since the pre-trained image segmentation model includes a segmentation module, the segmentation module includes multiple convolutional network blocks and at least one convolutional layer, the multiple convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolution layer, a BN layer, and a ReLU layer, the electronic device can use the segmentation mask output by the pre-trained image segmentation model to segment the corresponding image from the first image more accurately. That is, the embodiments of the present application can improve the accuracy of image segmentation by the electronic device.
  • FIG. 2 is another schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the training model may include a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, a deep feature supervision module, a segmentation module, and an edge gradient module.
  • the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence.
  • the deep feature supervision module is connected with the feature pyramid module, which is used to supervise the deep features from multiple scales.
  • the edge gradient module is connected with the segmentation module. The edge gradient module is used to provide the edge gradient loss function as one of the loss functions during model training.
  • the multi-scale decoder in the training model outputs the first preliminary segmentation mask corresponding to the training samples
  • the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training samples, where N is the number of layers of the feature pyramid;
  • the back-propagation algorithm is performed on the training model to update the model parameters
  • the model training process is repeated over multiple training cycles until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module is fully converged; the model is then saved, and the parameters of the model are left unfrozen.
  • a model including a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, and a deep feature supervision module may be trained first, and the training may be referred to as the first One stage of training.
  • FIG. 4 is a schematic structural diagram of a model, provided by this embodiment of the application, that includes four modules: a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, and a deep feature supervision module.
  • the input image is input by the multi-scale encoder module, processed by the multi-scale encoder module and transmitted to the feature pyramid module, and then processed by the feature pyramid module and then transmitted to the deep feature supervision module and the multi-scale decoder module respectively.
  • four up-sampling masks can be obtained, such as Mask32, Mask16, Mask8, and Mask4, respectively.
  • the multi-scale decoder module may output a first preliminary segmentation mask (Mask).
  • the backbone network in the multi-scale encoder can be the MobileNetV2 network, which has strong feature extraction ability while remaining relatively lightweight; feature maps of different scales are then extracted from it to form a feature pyramid.
  • the numbers 320, 64, 32, and 24 on the feature pyramid image in the feature pyramid module represent the number of channels, and the numbers 1/4, 1/8, 1/16, and 1/32 represent the down-sampling factors of the resolution relative to the original image.
  • conv represents the convolution processing performed by a convolutional layer
  • up2x represents bilinear-interpolation 2x upsampling processing
  • 4x represents bilinear-interpolation 4x upsampling processing.
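  • The up2x operation (bilinear-interpolation 2x upsampling) can be sketched in numpy as follows. This minimal version uses align-corners-style sampling, which is an assumption for illustration; the patent does not specify the corner-alignment convention:

```python
import numpy as np

def upsample_bilinear(x, factor):
    """Bilinear upsampling of an (H, W) map by an integer factor."""
    h, w = x.shape
    out_h, out_w = h * factor, w * factor
    rows = np.linspace(0, h - 1, out_h)   # sample positions in the source grid
    cols = np.linspace(0, w - 1, out_w)
    r0 = np.floor(rows).astype(int); r1 = np.minimum(r0 + 1, h - 1)
    c0 = np.floor(cols).astype(int); c1 = np.minimum(c0 + 1, w - 1)
    fr = (rows - r0)[:, None]             # fractional offsets
    fc = (cols - c0)[None, :]
    top = x[np.ix_(r0, c0)] * (1 - fc) + x[np.ix_(r0, c1)] * fc
    bot = x[np.ix_(r1, c0)] * (1 - fc) + x[np.ix_(r1, c1)] * fc
    return top * (1 - fr) + bot * fr

x = np.array([[0.0, 1.0],
              [2.0, 3.0]])
y = upsample_bilinear(x, 2)   # the "up2x" operation
print(y.shape)  # (4, 4)
```

The same routine with `factor=4` corresponds to the 4x upsampling used elsewhere in the model.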
  • cgr2x represents a first network block consisting of a convolutional layer, a Group Normalization layer, a ReLU layer, and a bilinear interpolation 2x upsampling layer in sequence.
  • sgr2x represents a second network block consisting of a convolutional layer with the same number of input and output channels, a Group Normalization layer, a ReLU layer, and a bilinear-interpolation 2x upsampling layer in sequence.
  • sgr represents a third network block, which is sgr2x with the bilinear-interpolation 2x upsampling layer removed.
  • FIG. 5 (a) is a schematic structural diagram of the first network block cgr2x, (b) is a structural schematic diagram of the second network block sgr2x, and (c) is a structural schematic diagram of the third network block sgr.
  • the training process of the first stage is described below by taking the training of a portrait segmentation mask as an example.
  • the model used for the first stage training includes a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module and a deep feature supervision module.
  • the electronic device can obtain training samples, and divide the training samples into a test set and a training set in a ratio of 2:8.
  • the electronic device can perform data enhancement processing on the samples in the training set, including random rotation, random left-right flip, random cropping, and Gamma transformation. It should be noted that performing data enhancement processing on the samples in the training set can not only increase the sample data in the training set, but also improve the robustness of the model obtained by training.
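  • Among the data enhancement operations listed, random left-right flipping and the Gamma transformation can be sketched as follows; the flip probability and gamma range are illustrative assumptions, and note that the segmentation label must receive the same geometric transform as the image:

```python
import numpy as np

def augment(img, mask, rng, gamma_range=(0.7, 1.5)):
    """Random left-right flip plus gamma transform; the mask gets the same flip."""
    if rng.random() < 0.5:
        img = img[:, ::-1]      # flip image along the width axis
        mask = mask[:, ::-1]    # keep the label aligned with the image
    gamma = rng.uniform(*gamma_range)
    img = np.clip(img.astype(np.float32) / 255.0, 0, 1) ** gamma * 255.0
    return img.astype(np.uint8), mask

rng = np.random.default_rng(7)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
mask = (rng.random((64, 64)) > 0.5).astype(np.uint8)
aug_img, aug_mask = augment(image, mask, rng)
print(aug_img.shape, aug_mask.shape)  # (64, 64, 3) (64, 64)
```

Photometric operations such as the gamma transform are applied only to the image, since they do not change which pixels belong to the foreground.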
  • the electronic device may acquire an image in a training sample for input to the multi-scale encoder module, such as a third image, and perform preset processing on the third image, wherein the preset processing Random cropping and/or normalization processing may be included.
  • the electronic device can input the image obtained by the preset processing of the third image into the multi-scale encoder, and obtain feature maps whose resolutions are 1/4, 1/8, 1/16, and 1/32 of the third image, respectively, after processing by the multi-scale encoder.
  • the multi-scale encoder includes 5 layers, namely the receiving layer (Layer0), the first layer (Layer1), the second layer (Layer2), the third layer (Layer3), and the fourth layer (Layer4).
  • the receiving layer is used to receive the input third image
  • the first layer, the second layer, the third layer and the fourth layer are respectively used to extract the feature maps of the input image at different scales.
  • the first layer extracts feature maps at one-fourth the resolution of the input image
  • the second layer extracts feature maps at one-eighth the resolution of the input image
  • the third layer extracts feature maps at one-sixteenth the resolution of the input image
  • the fourth layer extracts feature maps at one-thirty-second the resolution of the input image.
  • the electronic device can transmit these feature maps to the feature pyramid module, so as to obtain corresponding feature pyramid images, which are denoted as the third feature pyramid, for example.
  • the resolution of the third feature pyramid increases sequentially from top to bottom; that is, the resolution of the first layer feature map of the third feature pyramid is 1/32 of the third image
  • the resolution of the second layer feature map is 1/16 of the third image
  • the resolution of the third layer feature map is 1/8 of the third image
  • the resolution of the fourth layer feature map is 1/4 of the third image.
  • the number of channels of the first layer feature map to the fourth layer feature map of the third feature pyramid are 320, 64, 32, and 24 in sequence.
  • the electronic device may record the first-level feature map, second-level feature map, third-level feature map, and fourth-level feature map of the third feature pyramid as a1, b1, c1, and d1 in sequence.
  • the electronic device can call the feature pyramid module to process each image in the third feature pyramid into an image with the same number of channels, so as to obtain a fourth feature pyramid composed of images with the same number of channels.
  • specifically, the feature pyramid module can use convolution processing and bilinear-interpolation 2x upsampling processing to process the images in the third feature pyramid into images with a consistent number of channels, resulting in the fourth feature pyramid.
  • that is, the feature pyramid module performs bilinear-interpolation 2x upsampling on the lower-resolution feature maps and fuses them with the higher-resolution feature maps in the third feature pyramid that have the same resolution as the 2x-upsampled lower-resolution maps, thereby processing the number of channels of both images to 128.
  • for example, the feature pyramid module can first perform convolution processing and bilinear-interpolation 2x upsampling processing on feature map a1 to obtain feature map a11, and perform convolution processing on feature map b1 to obtain feature map b11; the feature pyramid module can then add feature map a11 and feature map b11, and perform convolution processing on the result of the addition to obtain feature map b2, whose number of channels is 128.
  • the electronic device can perform convolution processing on the feature map a1 to obtain the feature map a2, and the number of channels of the feature map a2 is 128.
  • similarly, the feature pyramid module can first add feature map a11 and feature map b11 to obtain feature map b12, perform bilinear-interpolation 2x upsampling on feature map b12 to obtain feature map b13, perform convolution processing on feature map c1 to obtain feature map c11, add feature map b13 and feature map c11, and then perform convolution processing on the result to obtain feature map c2, whose number of channels is 128.
  • likewise, the feature pyramid module can first perform convolution processing on feature map d1 to obtain feature map d11, add feature map b13 and feature map c11 to obtain feature map c12, perform bilinear-interpolation 2x upsampling on feature map c12 to obtain feature map c13, add feature map c13 and feature map d11, and then perform convolution processing on the result to obtain feature map d2, whose number of channels is 128.
  • the fourth feature pyramid is composed of feature maps a2, b2, c2, and d2, and their number of channels is 128.
  • the resolutions of feature maps a2, b2, c2, and d2 are 1/32, 1/16, 1/8, and 1/4 of the third image, respectively.
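  • The top-down fusion described above can be sketched roughly in numpy. This is a simplification under stated assumptions: the 1x1 projection weights are random placeholders, nearest-neighbour upsampling stands in for the bilinear interpolation the patent uses, and the convolution applied after each addition is omitted so that only the shapes and the fusion pattern are shown:

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution as a channel projection. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))

def up2x(x):
    """2x upsampling (nearest neighbour here for brevity; the patent uses bilinear)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
# Third-feature-pyramid maps a1..d1 with 320/64/32/24 channels at 1/32..1/4 resolution
a1 = rng.standard_normal((320, 4, 4))
b1 = rng.standard_normal((64, 8, 8))
c1 = rng.standard_normal((32, 16, 16))
d1 = rng.standard_normal((24, 32, 32))
w = {c: rng.standard_normal((128, c)) * 0.05 for c in (320, 64, 32, 24)}

a2 = conv1x1(a1, w[320])              # (128, 4, 4)
b2 = conv1x1(b1, w[64]) + up2x(a2)    # fuse with the upsampled coarser level
c2 = conv1x1(c1, w[32]) + up2x(b2)
d2 = conv1x1(d1, w[24]) + up2x(c2)
print(d2.shape)  # (128, 32, 32)
```

Every level of the resulting fourth feature pyramid ends up with 128 channels, matching the description above.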
  • the electronic device can call the deep feature supervision module and pass the feature maps of each layer of the fourth feature pyramid, from top to bottom, through the four upsampling layers of the deep feature supervision module, upsampling them by factors of 32, 16, 8, and 4 respectively, so as to obtain mask images with the same size as the third image (i.e., deep supervision prediction masks); the four deep supervision prediction masks can be denoted, for example, as Mask32, Mask16, Mask8, and Mask4.
  • the electronic device can call the multi-scale decoder module to perform certain processing on each image in the fourth feature pyramid, so that the resolution of the feature map of each layer in the fourth feature pyramid is equal to 1/4 of that of the third image.
  • the feature map a2 of the first layer of the fourth feature pyramid can be processed by two first network blocks cgr2x and one second network block sgr2x in turn to obtain an image with a resolution of 1/4 of the third image.
  • the feature map b2 of the second layer of the fourth feature pyramid can be sequentially calculated by a first network block cgr2x and a second network block sgr2x to obtain an image with a resolution of 1/4 of the third image.
  • the third layer feature map c2 of the fourth feature pyramid can be calculated by a second network block sgr2x to obtain an image with a resolution of 1/4 of the third image.
  • the fourth layer feature map d2 of the fourth feature pyramid can be calculated by a third network block sgr to obtain an image with a resolution of 1/4 of the third image.
  • the multi-scale decoder module can sequentially perform addition processing, convolution processing, and 4x upsampling processing on the four images with a resolution of 1/4 of the third image obtained in the above-mentioned manner, thereby obtaining a preliminary segmentation mask, which is denoted as the first preliminary segmentation mask.
  • the model can obtain the first label segmentation mask used as the label (Label) of the training model in the training sample. It can be understood that the first labeled segmentation mask is the accurate portrait segmentation mask corresponding to the third image in this training sample.
  • the model (including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module) can respectively calculate the cross-entropy losses between each of the four deep supervision prediction masks Mask32, Mask16, Mask8, and Mask4 output by the deep feature supervision module and the first annotation segmentation mask, as well as the cross-entropy loss between the first preliminary segmentation mask and the first annotation segmentation mask.
  • the model can perform a backpropagation algorithm on the model according to the above-mentioned five cross-entropy losses calculated, and update the parameters of the model.
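  • The five cross-entropy losses can be sketched in numpy. The binary (foreground-probability) formulation below, the mask shapes, and the plain unweighted sum are illustrative assumptions; the patent only states that five cross-entropy losses drive backpropagation:

```python
import numpy as np

def pixel_cross_entropy(probs, label, eps=1e-7):
    """Mean per-pixel binary cross-entropy.
    probs: (H, W) predicted foreground probability, label: (H, W) in {0, 1}."""
    probs = np.clip(probs, eps, 1 - eps)  # avoid log(0)
    return float(-(label * np.log(probs)
                   + (1 - label) * np.log(1 - probs)).mean())

rng = np.random.default_rng(1)
label = (rng.random((32, 32)) > 0.5).astype(np.float64)  # annotation mask
# Mask32, Mask16, Mask8, Mask4 and the first preliminary segmentation mask,
# simulated here as noisy versions of the label
masks = [np.clip(label + rng.normal(0, 0.1, label.shape), 0, 1) for _ in range(5)]
total_loss = sum(pixel_cross_entropy(m, label) for m in masks)
print(total_loss > 0)  # True
```

The total loss over all five predictions is what the backpropagation algorithm would minimize when updating the model parameters.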
  • the electronic device can repeatedly perform the above-mentioned process of training the model (including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module) using the training samples until the loss function of the model is fully converged; the model is then saved, and the parameters of the model are left unfrozen.
  • after training the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module, the electronic device can perform the second stage of training.
  • the deep feature supervision module in the model obtained after the first stage of training can be removed, and a segmentation module and an edge gradient module can be added.
  • the edge gradient module is used to calculate the edge gradient map corresponding to the input image in the training sample
  • the edge gradient module is used to calculate the edge probability prediction map corresponding to the fine segmentation mask
  • the edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
  • the model training (that is, the second stage of training) is continued based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module and the edge gradient module, including the following process:
  • the second preliminary segmentation mask and the input image are spliced in the channel dimension and then input to the segmentation module, and the segmentation module outputs the fine segmentation mask;
  • the edge gradient module invokes the Sobel operator included in the edge gradient module to perform corresponding calculations on the input image to obtain a gradient map of the input image;
  • the model obtained by removing the edge gradient module from the trained model is determined as the pre-trained image segmentation model.
  • the electronic device can obtain the image in the training sample to be input to the multi-scale encoder module (denoted, for example, as the fourth image), and first perform preset processing on the fourth image, where the preset processing may include random cropping and/or normalization processing.
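A minimal sketch of this preset processing might look as follows; the 224×224 crop size and the per-channel zero-mean, unit-variance normalization are illustrative assumptions, since the application does not fix them.

```python
import numpy as np

def preset_process(image, crop_hw=(224, 224), rng=None):
    """Randomly crop an H x W x C uint8 image, then normalize each channel to zero mean, unit variance."""
    rng = rng or np.random.default_rng()
    h, w, _ = image.shape
    ch, cw = crop_hw
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = image[top:top + ch, left:left + cw].astype(np.float32) / 255.0
    mean = crop.mean(axis=(0, 1), keepdims=True)
    std = crop.std(axis=(0, 1), keepdims=True) + 1e-6
    return (crop - mean) / std

img = np.random.default_rng(1).integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
out = preset_process(img)  # shape (224, 224, 3), per-channel normalized
```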
  • the electronic device can input the image obtained by the preset processing of the fourth image into the training model, where it is processed sequentially by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module to obtain a preliminary segmentation mask, denoted for example as the second preliminary segmentation mask.
  • the electronic device can call the training model to perform concat processing on the image obtained after the preset processing of the fourth image and the second preliminary segmentation mask, and transmit the concat-processed image to the segmentation module, where the concat processing stitches the two maps together in the channel dimension.
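The concat processing above is a standard channel-wise concatenation. Assuming a channels-first (CHW) layout and a single-channel preliminary mask (both layout choices are assumptions for illustration), it can be sketched as:

```python
import numpy as np

def concat_channels(image_chw, mask_chw):
    """Stitch an image and a segmentation mask together along the channel dimension (CHW layout)."""
    assert image_chw.shape[1:] == mask_chw.shape[1:], "spatial sizes must match"
    return np.concatenate([image_chw, mask_chw], axis=0)

image = np.zeros((3, 128, 128), dtype=np.float32)  # RGB input image
mask = np.zeros((1, 128, 128), dtype=np.float32)   # single-channel preliminary mask
stacked = concat_channels(image, mask)             # shape (4, 128, 128), fed to the segmentation module
```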
  • each convolutional network block is a network block composed of a convolutional layer, a BN layer and a ReLU layer in sequence.
  • the fine segmentation mask is a two-channel probability prediction map without argmax operation.
  • the electronic device may acquire a second label segmentation mask used as a label (Label) of the training model in the training sample.
  • the second labeled segmentation mask is the precise portrait segmentation mask corresponding to the fourth image in this training sample.
  • the electronic device may input the second preliminary segmentation mask, the second annotation segmentation mask, and the fourth image to the edge gradient module.
  • the fourth image is transmitted to the Sobel operator module of the edge gradient module, and the gradient map of the fourth image is obtained after being processed by the Sobel operator of the Sobel operator module.
  • the edge gradient module calls the dilation-and-erosion module it includes to perform dilation and erosion processing on the second label segmentation mask to obtain an edge mask, which marks the edge of the portrait (that is, an edge mask composed of 0s and 1s).
  • the edge gradient module can multiply the gradient map of the fourth image by the edge mask to obtain the edge gradient map of the fourth image, and multiply the fine segmentation mask by the edge mask to obtain the edge probability prediction map of the fine segmentation mask.
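Putting the last few steps together (Sobel gradient map, dilation-and-erosion edge mask, and the two multiplications), a minimal NumPy sketch of the edge gradient module might look as follows; the 3×3 Sobel kernels and 3×3 structuring element are common defaults, assumed here rather than taken from the application.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """3x3 cross-correlation with zero padding (adequate for a sketch)."""
    p = np.pad(img, 1)
    h, w = img.shape
    out = np.zeros_like(img, dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def sobel_gradient(gray):
    """Gradient magnitude of a single-channel image."""
    gx = filter3x3(gray, SOBEL_X)
    gy = filter3x3(gray, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def dilate(mask):
    p = np.pad(mask, 1)
    h, w = mask.shape
    return np.max([p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)], axis=0)

def erode(mask):
    p = np.pad(mask, 1, constant_values=1.0)
    h, w = mask.shape
    return np.min([p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)], axis=0)

def edge_mask(label_mask):
    """Dilation minus erosion leaves a thin 0/1 band around the mask boundary."""
    return dilate(label_mask) - erode(label_mask)

# Toy example: a filled square stands in for the label portrait mask.
label = np.zeros((16, 16), dtype=np.float32)
label[4:12, 4:12] = 1.0
band = edge_mask(label)       # 0/1 edge mask around the portrait boundary
grad = sobel_gradient(label)  # stand-in for the input image's gradient map
edge_grad = grad * band       # edge gradient map
# A fine segmentation mask would likewise be multiplied by `band`
# to obtain its edge probability prediction map.
```

Because both products are masked by the same thin band, the edge gradient loss compares the prediction to the image only along the portrait boundary, which is where the extra supervision is wanted.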
  • the edge gradient module can calculate the cross-entropy loss and structural similarity loss (SSIM Loss) between the fine segmentation mask and the second annotation segmentation mask, and calculate the edge gradient loss (Edge Gradient Loss) between the edge gradient map of the fourth image and the edge probability prediction map of the fine segmentation mask.
  • the cross-entropy loss, the structural similarity loss, and the edge gradient loss are then summed.
  • the edge gradient module can perform backpropagation on the training model based on the sum of the calculated losses to update the model parameters.
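The second-stage loss just described sums three terms: cross-entropy, structural similarity, and edge gradient. A simplified numerical sketch is below; the single-window global SSIM (the usual formulation slides a local window) and the L1 form of the edge gradient term are illustrative assumptions, since the application treats these as prior-art computations.

```python
import numpy as np

def bce(pred, label, eps=1e-7):
    """Mean pixel-wise binary cross-entropy."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(pred) + (1.0 - label) * np.log(1.0 - pred)))

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM, computed over the whole image as a single window (a simplification)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return float(1.0 - ssim)

def edge_gradient_loss(edge_grad_map, edge_prob_map):
    """L1 distance between the two edge maps (an assumed form of the loss)."""
    return float(np.mean(np.abs(edge_grad_map - edge_prob_map)))

def second_stage_loss(fine_mask, label_mask, edge_grad_map, edge_prob_map):
    return (bce(fine_mask, label_mask)
            + ssim_loss(fine_mask, label_mask)
            + edge_gradient_loss(edge_grad_map, edge_prob_map))

rng = np.random.default_rng(2)
label = (rng.random((32, 32)) > 0.5).astype(np.float32)
fine = np.clip(label + 0.1 * rng.standard_normal((32, 32)), 0.0, 1.0)
total = second_stage_loss(fine, label, rng.random((32, 32)), rng.random((32, 32)))
```

In the real training loop this scalar would be backpropagated through the whole network; the SSIM term is what ties the prediction to the label's image structure, while the edge gradient term supplies extra gradient signal only at the boundary.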
  • the above second-stage training process can be repeated over multiple training cycles until the loss function of the training model fully converges; the model and its parameters are then saved, the edge gradient module is removed from the model obtained after the second-stage training, and the resulting model is determined as the pre-trained image segmentation model.
  • the pre-trained image segmentation model can be obtained by training in the above manner.
  • the third image and the fourth image may also be used without preset processing; in that case, when the pre-trained image segmentation model is applied to output the segmentation mask of an image, the image does not need to be preset-processed before being input to the pre-trained image segmentation model.
  • the calculation methods of the cross-entropy loss, the structural similarity loss, and the edge gradient loss are all calculation methods in the prior art, and thus are not repeated in the embodiments of the present application.
  • this application applies a structural similarity loss to portrait segmentation, so that the segmentation mask is consistent with the label mask in image structure, and additional gradients are provided to train the model and reduce false positive predictions.
  • This application designs an edge gradient module, which encourages the segmentation mask to be consistent with the input image in edge gradient, provides additional gradients for edge features, improves the refined segmentation effect at edges, and reduces false positive predictions.
  • This application adopts a lightweight design and utilizes a lightweight basic network in the multi-scale encoder, achieving a small amount of computation, so the model can be deployed on mobile terminals such as mobile phones.
  • the present application provides a lightweight portrait segmentation model that combines structural similarity and edge gradient.
  • the refinement module (i.e., the segmentation module) and the edge gradient module are applied to the portrait segmentation model simultaneously.
  • the two modules are combined and work together to improve the segmentation effect at edges and improve the accuracy of portrait segmentation.
  • this application adds structural similarity loss and edge gradient loss during model training to provide additional gradients.
  • the structural similarity loss and edge gradient loss can make the model pay more attention to the segmentation effect on the edge, and the additional gradient on the edge can promote the improvement of the edge segmentation effect.
  • This application designs a segmentation module consisting of three convolutional network blocks and one convolutional layer, which improves the segmentation effect while only introducing a small amount of computation.
  • the edge gradient module and deep feature supervision module in this application are removed in the final deployment, so no additional computing resource requirements are added.
  • the segmentation module can be designed to be more complex, for example, various neural networks can be used to implement it, and it only needs to output a segmentation mask at the end.
  • for example, ResNet blocks can be used in the segmentation module.
  • the number of layers of the feature pyramid can be adjusted flexibly depending on the specific data set.
  • the maximum downsampling factor can be 64 times, 32 times, 16 times, etc.; a larger factor captures more high-level feature information.
  • the multi-scale encoder can be implemented using various lightweight basic networks, such as ShuffleNet, MobileNetV3, etc.
  • the flow of the image processing method shown in FIG. 2 may include:
  • the electronic device acquires a first image.
  • the electronic device can use the pre-trained image segmentation model to segment images. For example, the electronic device may acquire the first image first.
  • the electronic device performs preset processing on the first image, where the preset processing includes random cropping and/or normalization processing.
  • the electronic device may perform preset processing on the first image.
  • the preset processing may include random cropping and/or normalization processing.
  • the electronic device obtains a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a portrait segmentation mask of an image, and the pre-trained image segmentation model includes a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, and a segmentation module; the multi-scale encoder module, feature pyramid module, multi-scale decoder module, and segmentation module are connected in sequence; the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer.
  • the electronic device can also acquire a pre-trained image segmentation model, and the pre-trained image segmentation model can be used to output a portrait segmentation mask of an image.
  • the pretrained image segmentation model may include a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, and a segmentation module.
  • the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence; the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer.
  • the user can pre-train the model according to the requirements of portrait segmentation, so that the pre-trained image segmentation model can output the portrait segmentation mask required by the user.
  • the electronic device inputs the image obtained by the preset processing of the first image into a pre-trained image segmentation model, and the pre-trained image segmentation model outputs a portrait segmentation mask corresponding to the first image.
  • the electronic device can input the image obtained by the preset processing of the first image into the pre-trained image segmentation model, which outputs a portrait segmentation mask corresponding to the first image.
  • the electronic device segments the portrait from the first image according to the portrait segmentation mask corresponding to the first image.
  • the electronic device may segment the corresponding portrait from the first image according to the portrait segmentation mask.
  • after segmenting the portrait from the first image, the electronic device performs background blur processing, background replacement processing, or portrait beautification processing on the first image according to the segmented portrait.
  • the electronic device may perform various processing on the first image according to the segmented portrait. For example, the electronic device may perform background blurring processing on the first image according to the segmented portrait, or the electronic device may perform background replacement processing on the first image according to the segmented portrait, or the electronic device may perform portrait beautification processing on the first image according to the segmented portrait.
  • since the portrait segmented by the electronic device from the first image has high precision, the background blurring processing, background replacement processing, portrait beautification processing, and other processing that the electronic device performs on the first image according to the segmented portrait will have a better effect, resulting in better image quality.
  • the image processing apparatus 300 may include: a first acquisition module 301 , a second acquisition module 302 , a processing module 303 , and a segmentation module 304 .
  • a first acquisition module 301 configured to acquire a first image
  • the second acquisition module 302 is configured to acquire a pre-trained image segmentation model, the pre-trained image segmentation model is used to output a segmentation mask of an image, the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each of the convolutional network blocks includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
  • a processing module 303 configured to input the first image into the pre-trained image segmentation model, and output a segmentation mask corresponding to the first image from the pre-trained image segmentation model;
  • a segmentation module 304 configured to segment a second image from the first image according to a segmentation mask corresponding to the first image.
  • the pre-trained image segmentation model further includes a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module; the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence.
  • the training model used to obtain the pre-trained image segmentation model further includes a deep feature supervision module, the deep feature supervision module is connected to the feature pyramid module, and the deep feature supervision module is used to supervise deep features at multiple scales;
  • the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to the training samples
  • the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training samples, where N is the number of layers of the feature pyramid;
  • the back-propagation algorithm is performed on the training model to update the model parameters
  • the model training process is repeated over multiple training cycles until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module fully converges; the model is then saved, and the parameters of the model are not frozen.
  • the training model further includes an edge gradient module, and the edge gradient module is configured to provide an edge gradient loss function as one of the loss functions during model training.
  • the edge gradient module is used to calculate the edge gradient map corresponding to the input image in the training sample
  • the edge gradient module is used to calculate the edge probability prediction map corresponding to the fine segmentation mask
  • the edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
  • the deep feature supervision module in the saved model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module is removed, and a segmentation module and an edge gradient module are added;
  • Model training continues based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module and the edge gradient module, including the following processes:
  • the second preliminary segmentation mask and the input image are concatenated in the channel dimension and then input to the segmentation module, and the segmentation module outputs the fine segmentation mask;
  • the edge gradient module invokes the Sobel operator included in the edge gradient module to perform corresponding calculations on the input image to obtain a gradient map of the input image;
  • the model obtained by removing the edge gradient module from the trained model is determined as the pre-trained image segmentation model.
  • the segmentation module 304 may also be used to:
  • the preset processing including random cropping and/or normalization processing
  • the inputting the first image into the pre-trained image segmentation model includes: inputting the image obtained after the first image is subjected to the preset processing into the pre-trained image segmentation model.
  • An embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed on a computer, causes the computer to execute the process in the image processing method provided in this embodiment.
  • An embodiment of the present application further provides an electronic device, including a memory and a processor, where the processor is configured to execute the process in the image processing method provided by the present embodiment by invoking a computer program stored in the memory.
  • the above-mentioned electronic device may be a mobile terminal such as a tablet computer or a smart phone.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 400 may include components such as a display screen 401, a memory 402, a processor 403, and the like.
  • FIG. 8 does not constitute a limitation on the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • the display screen 401 may be used to display information such as text, images, and the like.
  • Memory 402 may be used to store applications and data.
  • the application program stored in the memory 402 contains executable code.
  • Applications can be composed of various functional modules.
  • the processor 403 executes various functional applications and data processing by executing the application programs stored in the memory 402 .
  • the processor 403 is the control center of the electronic device; it uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the electronic device as a whole.
  • the processor 403 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 403 executes the application programs stored in the memory 402, thereby performing:
  • the pre-trained image segmentation model is used to output a segmentation mask of an image
  • the pre-trained image segmentation model includes at least a segmentation module
  • the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer
  • the multiple convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer
  • each of the convolutional network blocks includes a convolutional layer, a batch normalization layer and a nonlinear activation layer;
  • a second image is segmented from the first image according to a segmentation mask corresponding to the first image.
  • the electronic device 400 may include a display screen 401 , a memory 402 , a processor 403 , a battery 404 , a camera module 405 , a speaker 406 , a microphone 407 and other components.
  • the display screen 401 may be used to display information such as images, text, and the like.
  • Memory 402 may be used to store applications and data.
  • the application program stored in the memory 402 contains executable code.
  • Applications can be composed of various functional modules.
  • the processor 403 executes various functional applications and data processing by executing the application programs stored in the memory 402 .
  • the processor 403 is the control center of the electronic device; it uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the electronic device as a whole.
  • the battery 404 can be used to provide power support for various modules and components of the electronic device, thereby ensuring the normal operation of the electronic device.
  • the camera module 405 can be used to capture images.
  • Speaker 406 may be used to play sound signals.
  • the microphone 407 may be used to collect sound signals in the surrounding environment.
  • the microphone 407 may be used to capture the user's voice commands.
  • the processor 403 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 403 executes the application programs stored in the memory 402, thereby performing:
  • the pre-trained image segmentation model is used to output a segmentation mask of an image
  • the pre-trained image segmentation model includes at least a segmentation module
  • the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, wherein the multiple convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each of the convolutional network blocks includes a convolutional layer, a batch normalization layer and a nonlinear activation layer;
  • a second image is segmented from the first image according to a segmentation mask corresponding to the first image.
  • the pre-trained image segmentation model further includes a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module; the multi-scale encoder module, feature pyramid module, multi-scale decoder module, and segmentation module are connected in sequence.
  • the training model used to obtain the pre-trained image segmentation model further includes a deep feature supervision module, the deep feature supervision module is connected to the feature pyramid module, and the deep feature supervision module is used to supervise deep features at multiple scales;
  • the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to the training samples
  • the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training samples, N is the number of layers of the feature pyramid;
  • the back-propagation algorithm is performed on the training model to update the model parameters
  • the model training process is repeated over multiple training cycles until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module fully converges; the model is then saved, and the parameters of the model are not frozen.
  • the training model further includes an edge gradient module, and the edge gradient module is configured to provide an edge gradient loss function as one of the loss functions during model training.
  • the edge gradient module is used to calculate the edge gradient map corresponding to the input image in the training sample
  • the edge gradient module is used to calculate the edge probability prediction map corresponding to the fine segmentation mask
  • the edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
  • the deep feature supervision module in the saved model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module is removed, and a segmentation module and an edge gradient module are added;
  • Model training continues based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module and the edge gradient module, including the following process:
  • the second preliminary segmentation mask and the input image are concatenated in the channel dimension and then input to the segmentation module, and the segmentation module outputs the fine segmentation mask;
  • the edge gradient module invokes the Sobel operator included in the edge gradient module to perform corresponding calculations on the input image to obtain a gradient map of the input image;
  • the cross-entropy loss and structural similarity loss between the fine segmentation mask and the second annotation segmentation mask, and the edge gradient loss between the edge gradient map and the edge probability prediction map, are summed;
  • the model obtained by removing the edge gradient module from the trained model is determined as the pre-trained image segmentation model.
  • the processor 403 may also execute:
  • the preset processing including random cropping and/or normalization processing
  • the inputting the first image into the pre-trained image segmentation model includes: inputting the image obtained after the preset processing of the first image into the pre-trained image segmentation model.
  • the image processing apparatus provided in the embodiments of the present application and the image processing method in the above embodiments belong to the same concept; any method provided in the image processing method embodiments can be executed on the image processing apparatus, and for the implementation process, reference may be made to the embodiments of the image processing method, which will not be repeated here.
  • the computer program can be stored in a computer-readable storage medium, such as a memory, and executed by at least one processor, and the execution process can include the flow of the embodiments of the image processing method.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), and the like.
  • each functional module may be integrated into one processing chip, or each module may exist physically alone, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software functional modules. If an integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.


Abstract

An image processing method and apparatus, a storage medium, and an electronic device. In the method, a pre-trained image segmentation model comprises at least a segmentation module; multiple convolutional network blocks comprised in the segmentation module are connected in sequence and then connected to at least one convolutional layer comprised in the segmentation module, each convolutional network block comprising a convolutional layer, a batch normalization layer, and a nonlinear activation layer. A first image is input into the pre-trained image segmentation model, which outputs a segmentation mask; a second image is then segmented from the first image according to the segmentation mask.

Description

图像处理方法、装置、存储介质以及电子设备Image processing method, device, storage medium, and electronic device
本申请要求于2020年7月23日提交中国专利局、申请号为202010718338.6、申请名称为“图像处理方法、装置、存储介质以及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application filed on July 23, 2020 with the application number 202010718338.6 and the application name "image processing method, device, storage medium and electronic device", the entire contents of which are incorporated by reference in this application.
技术领域technical field
本申请属于图像技术领域,尤其涉及一种图像处理方法、装置、存储介质及电子设备。The present application belongs to the field of image technology, and in particular, relates to an image processing method, device, storage medium and electronic device.
背景技术Background technique
图像分割是计算机视觉领域的一个基础课题。图像分割就是把图像分成若干个特定的、具有独特性质的区域并提出感兴趣目标的技术和过程。它是由图像处理到图像分析的关键步骤。Image segmentation is a fundamental topic in the field of computer vision. Image segmentation is the technology and process of dividing an image into several specific regions with unique properties and proposing objects of interest. It is a key step from image processing to image analysis.
发明内容SUMMARY OF THE INVENTION
本申请实施例提供一种图像处理方法、装置、存储介质及电子设备,可以提高电子设备对图像进行分割的精度。Embodiments of the present application provide an image processing method, apparatus, storage medium, and electronic device, which can improve the accuracy of image segmentation by the electronic device.
第一方面,本申请实施例提供一种图像处理方法,所述方法包括:In a first aspect, an embodiment of the present application provides an image processing method, the method comprising:
获取第一图像;get the first image;
获取预训练图像分割模型,所述预训练图像分割模型用于输出图像的分割掩模,所述预训练图像分割模型至少包括分割模块,所述分割模块包括多个卷积网络块与至少一个卷积层,所述多个卷积网络块依次连接后再与所述至少一个卷积层连接,每一所述卷积网络块包括卷积层、批归一化层及非线性激活层;Acquire a pre-trained image segmentation model, the pre-trained image segmentation model is used to output a segmentation mask of an image, the pre-trained image segmentation model includes at least a segmentation module, and the segmentation module includes a plurality of convolutional network blocks and at least one volume Layering, the multiple convolutional network blocks are sequentially connected and then connected to the at least one convolutional layer, and each of the convolutional network blocks includes a convolutional layer, a batch normalization layer and a nonlinear activation layer;
将所述第一图像输入至所述预训练图像分割模型,由所述预训练图像分割模型输出所述第一图像对应的分割掩模;inputting the first image into the pre-training image segmentation model, and outputting a segmentation mask corresponding to the first image by the pre-training image segmentation model;
根据所述第一图像对应的分割掩模从所述第一图像中分割出第二图像。A second image is segmented from the first image according to a segmentation mask corresponding to the first image.
In a second aspect, an embodiment of the present application provides an image processing apparatus, the apparatus including:
a first acquisition module, configured to acquire a first image;
a second acquisition module, configured to acquire a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image; the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
a processing module, configured to input the first image into the pre-trained image segmentation model, the pre-trained image segmentation model outputting a segmentation mask corresponding to the first image;
a segmentation module, configured to segment a second image from the first image according to the segmentation mask corresponding to the first image.
In a third aspect, an embodiment of the present application provides a storage medium storing a computer program which, when executed on a computer, causes the computer to perform the image processing method provided by the embodiments of the present application.
In a fourth aspect, an embodiment of the present application further provides an electronic device including a memory and a processor, where the processor is configured to perform the image processing method provided by the embodiments of the present application by invoking a computer program stored in the memory.
Description of Drawings
The technical solutions of the present application and their beneficial effects will become apparent from the following detailed description of specific embodiments of the present application taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
FIG. 2 is another schematic flowchart of an image processing method provided by an embodiment of the present application.
FIG. 3 is a schematic structural diagram of a training model provided by an embodiment of the present application.
FIG. 4 is a schematic structural diagram of a model including a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, and a deep feature supervision module provided by an embodiment of the present application.
FIG. 5 is a schematic structural diagram of each network block provided by an embodiment of the present application.
FIG. 6 is another schematic structural diagram of a training model provided by an embodiment of the present application.
FIG. 7 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
FIG. 9 is another schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
Reference is made to the drawings, in which like reference numerals denote like components. The principles of the present application are illustrated by implementation in a suitable computing environment. The following description is based on the illustrated specific embodiments of the present application and should not be construed as limiting other specific embodiments not detailed herein.
It can be understood that the execution subject of the embodiments of the present application may be an electronic device such as a smartphone or a tablet computer.
The present application provides an image processing method, the method including:
acquiring a first image;
acquiring a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image; the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
inputting the first image into the pre-trained image segmentation model, and outputting, by the pre-trained image segmentation model, a segmentation mask corresponding to the first image;
segmenting a second image from the first image according to the segmentation mask corresponding to the first image.
In one embodiment, the pre-trained image segmentation model further includes a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module, where the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence.
In one embodiment, during model training, the training model used to obtain the pre-trained image segmentation model further includes a deep feature supervision module connected to the feature pyramid module, the deep feature supervision module being used to supervise deep features at multiple scales.
During model training, the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to a training sample, and the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training sample, where N is the number of layers of the feature pyramid;
acquiring a first annotated segmentation mask used as a label in the training sample;
calculating the cross-entropy loss between each of the N deep supervision prediction masks and the first annotated segmentation mask, and calculating the cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask;
performing a back-propagation algorithm on the training model according to the plurality of calculated cross-entropy losses to update the model parameters;
repeating the model training process over multiple training epochs until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module has fully converged, and saving the model without freezing its parameters.
In one embodiment, the training model further includes an edge gradient module, the edge gradient module being used to provide an edge gradient loss function as one of the loss functions during model training.
In one embodiment, the edge gradient module is used to calculate an edge gradient map corresponding to an input image in a training sample;
inputting the input image in the training sample into the training model, and obtaining a second preliminary segmentation mask after sequential processing by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module;
inputting the second preliminary segmentation mask and the input image into the segmentation module, the segmentation module outputting a fine segmentation mask;
the edge gradient module is used to calculate an edge probability prediction map corresponding to the fine segmentation mask;
the edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
In one embodiment, the deep feature supervision module is removed from the saved model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module, and a segmentation module and an edge gradient module are added;
model training then continues based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module, and the edge gradient module, including the following process:
inputting the input image in a training sample into the training model, and obtaining a second preliminary segmentation mask after sequential processing by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module;
concatenating the second preliminary segmentation mask and the input image along the channel dimension and inputting the result into the segmentation module, the segmentation module outputting a fine segmentation mask;
inputting the input image into the edge gradient module, the edge gradient module invoking the Sobel operator it includes to perform the corresponding calculation on the input image to obtain a gradient map of the input image;
acquiring a second annotated segmentation mask used as a label in the training sample;
the edge gradient module invoking the dilation-erosion module it includes to perform dilation and erosion processing on the second annotated segmentation mask to obtain an edge mask;
multiplying the gradient map of the input image by the edge mask to obtain an edge gradient map corresponding to the input image;
multiplying the fine segmentation mask by the edge mask to obtain an edge probability prediction map corresponding to the fine segmentation mask;
calculating the cross-entropy loss and the structural similarity loss between the fine segmentation mask and the second annotated segmentation mask;
calculating the edge gradient loss between the edge gradient map and the edge probability prediction map;
summing the cross-entropy loss and the structural similarity loss between the fine segmentation mask and the second annotated segmentation mask together with the edge gradient loss between the edge gradient map and the edge probability prediction map;
performing a back-propagation algorithm on the training model according to the sum of the calculated losses to update the model parameters;
repeating the model training process over multiple training epochs until the loss function of the model has fully converged, and saving the model and its parameters;
determining the model obtained by removing the edge gradient module from the trained model as the pre-trained image segmentation model.
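The edge-mask pipeline above (Sobel gradient map, dilation/erosion of the label mask, and the two edge-restricted maps) can be sketched in NumPy as follows. This is an illustrative toy implementation: the 3x3 structuring element, the zero-padded "same" convolution, and the use of a simple L1 distance as a stand-in for the patent's edge gradient loss are all assumptions for demonstration.

```python
import numpy as np

def conv2same(img, k):
    """3x3 'same' correlation with zero padding (enough for Sobel on small maps)."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (p[i:i + 3, j:j + 3] * k).sum()
    return out

def sobel_magnitude(img):
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gx = conv2same(img, kx)       # horizontal Sobel response
    gy = conv2same(img, kx.T)     # vertical Sobel response
    return np.hypot(gx, gy)

def dilate(m):
    return (conv2same(m, np.ones((3, 3))) > 0).astype(float)

def erode(m):
    return (conv2same(m, np.ones((3, 3))) == 9).astype(float)

# label mask: a filled square; dilation minus erosion leaves a band around its contour
label = np.zeros((8, 8))
label[2:6, 2:6] = 1.0
edge_mask = dilate(label) - erode(label)

img = label.copy()                 # toy input whose edges coincide with the label
pred = label * 0.9                 # toy "fine segmentation mask" from the model

edge_gradient_map = sobel_magnitude(img) * edge_mask   # target edge gradients
edge_prob_map = pred * edge_mask                       # predicted edge probabilities

# assumed edge gradient loss: L1 between the (normalized) maps on the edge band
edge_grad_loss = np.abs(edge_gradient_map / (edge_gradient_map.max() + 1e-8)
                        - edge_prob_map).mean()
print(edge_mask.sum() > 0)   # True: the band around the square boundary is non-empty
```

Restricting both maps to the dilate-minus-erode band is what makes the loss focus on boundary quality rather than on the mask interior.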
In one embodiment, the method further includes:
performing preset processing on the first image, the preset processing including random cropping and/or normalization;
the inputting of the first image into the pre-trained image segmentation model includes: inputting the image obtained after the first image has undergone the preset processing into the pre-trained image segmentation model.
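The preset processing (random cropping followed by normalization) can be sketched as below. The crop size and the per-channel mean/std values (ImageNet statistics) are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def preset_process(img, crop_hw, mean, std, rng=None):
    """Random-crop an HWC uint8 image to crop_hw, then normalize each channel."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = img.shape[:2]
    ch, cw = crop_hw
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = img[top:top + ch, left:left + cw].astype(np.float32) / 255.0
    return (crop - mean) / std            # per-channel standardization

rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # assumed ImageNet stats
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
out = preset_process(image, (224, 224), mean, std, rng)
print(out.shape)   # (224, 224, 3)
```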
Referring to FIG. 1, FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application. The process may include the following steps.
101. Acquire a first image.
Image segmentation is a fundamental topic in the field of computer vision. It refers to the techniques and processes of dividing an image into a number of specific regions with distinctive properties and extracting the objects of interest, and it is a key step from image processing to image analysis. In the related art, however, the accuracy with which electronic devices segment images is low.
In this embodiment of the present application, for example, the electronic device may first acquire a first image. It can be understood that the first image is the image on which image segmentation is to be performed.
102. Acquire a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image; the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer.
For example, the electronic device may also acquire a pre-trained image segmentation model that has been trained in advance, where the pre-trained image segmentation model can be used to output a segmentation mask of an image. The pre-trained image segmentation model may include at least a segmentation module; the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks being connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block including a convolutional layer, a batch normalization layer (BN layer), and a nonlinear activation layer (ReLU layer).
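The convolution → batch normalization → ReLU structure of one convolutional network block can be sketched in NumPy as follows. This is an illustrative toy implementation: the 3x3 kernel, stride 1, padding 1, and the random weights are assumptions, not values taken from the patent.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: (N, C, H, W); normalize each channel over the batch and spatial dims."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def relu(x):
    return np.maximum(x, 0.0)

def conv_block(x, weight, gamma, beta):
    """One 'convolutional network block': 3x3 conv (stride 1, pad 1) -> BN -> ReLU."""
    n, c_in, h, w = x.shape
    c_out = weight.shape[0]
    xp = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)))
    out = np.zeros((n, c_out, h, w))
    for i in range(h):
        for j in range(w):
            patch = xp[:, :, i:i + 3, j:j + 3]          # (N, C_in, 3, 3)
            out[:, :, i, j] = np.tensordot(patch, weight,
                                           axes=([1, 2, 3], [1, 2, 3]))
    return relu(batch_norm(out, gamma, beta))

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8, 8))
w = rng.standard_normal((4, 3, 3, 3)) * 0.1
y = conv_block(x, w, gamma=np.ones((1, 4, 1, 1)), beta=np.zeros((1, 4, 1, 1)))
print(y.shape)         # (2, 4, 8, 8): spatial size preserved, channels -> 4
print((y >= 0).all())  # True: ReLU output is non-negative
```

Several such blocks connected in sequence, followed by a final convolutional layer, form the segmentation module described above.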
It should be noted that the user can train the model in advance according to requirements, so that the pre-trained image segmentation model outputs the segmentation mask the user needs. For example, if the user needs the trained model for portrait segmentation, the pre-trained image segmentation model obtained after training should be a model that can output portrait segmentation masks. As another example, if the user needs the trained model to segment a specific object (such as a car or a potted plant), the pre-trained image segmentation model obtained after training should be a model that can output segmentation masks of that specific object, and so on.
103. Input the first image into the pre-trained image segmentation model, and output, by the pre-trained image segmentation model, a segmentation mask corresponding to the first image.
For example, after acquiring the first image and the pre-trained image segmentation model, the electronic device may input the first image into the pre-trained image segmentation model, and the pre-trained image segmentation model outputs the segmentation mask corresponding to the first image.
104. Segment a second image from the first image according to the segmentation mask corresponding to the first image.
For example, after obtaining the segmentation mask corresponding to the first image, the electronic device may segment the first image according to that segmentation mask to obtain the corresponding image, that is, the second image.
For example, after obtaining the portrait segmentation mask corresponding to the first image, the electronic device may segment the corresponding portrait from the first image according to the portrait segmentation mask.
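Applying a mask to cut the second image out of the first can be sketched as a simple element-wise multiplication. The threshold of 0.5 for binarizing a probability mask is an assumption for illustration.

```python
import numpy as np

def apply_mask(image, mask, threshold=0.5):
    """Keep pixels where the mask fires; pixels outside the mask become 0."""
    binary = (mask >= threshold).astype(image.dtype)
    return image * binary[..., None]     # broadcast over the channel axis

img = np.arange(2 * 2 * 3).reshape(2, 2, 3)   # tiny 2x2 RGB image
mask = np.array([[0.9, 0.1],
                 [0.8, 0.2]])                 # e.g. probabilities from the model
second = apply_mask(img, mask)
print(second[0, 0].tolist())  # [0, 1, 2] — foreground pixel kept
print(second[0, 1].tolist())  # [0, 0, 0] — background pixel removed
```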
It can be understood that, in this embodiment of the present application, the electronic device can acquire a first image and a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image. The pre-trained image segmentation model includes at least a segmentation module; the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks being connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block including a convolutional layer, a batch normalization layer, and a nonlinear activation layer. The electronic device may then input the first image into the pre-trained image segmentation model, which outputs the segmentation mask corresponding to the first image, and may segment a second image from the first image according to that mask. Because the pre-trained image segmentation model includes a segmentation module in which a plurality of convolutional network blocks (each comprising a convolutional layer, a BN layer, and a ReLU layer) are connected in sequence and then connected to at least one convolutional layer, the electronic device can use the segmentation mask output by the model to segment the corresponding image from the first image more accurately. That is, the embodiments of the present application can improve the accuracy with which an electronic device segments images.
Referring to FIG. 2, FIG. 2 is another schematic flowchart of an image processing method provided by an embodiment of the present application.
The training process of the pre-trained image segmentation model is described first.
In this embodiment of the present application, as shown in FIG. 3, the training model may include a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, a deep feature supervision module, a segmentation module, and an edge gradient module. The multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence. The deep feature supervision module is connected to the feature pyramid module and is used to supervise deep features at multiple scales. The edge gradient module is connected to the segmentation module and is used to provide an edge gradient loss function as one of the loss functions during model training.
During model training, the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to a training sample, and the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training sample, where N is the number of layers of the feature pyramid;
a first annotated segmentation mask used as a label in the training sample is acquired;
the cross-entropy loss between each of the N deep supervision prediction masks and the first annotated segmentation mask is calculated, as is the cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask;
a back-propagation algorithm is performed on the training model according to the plurality of calculated cross-entropy losses to update the model parameters;
the model training process is repeated over multiple training epochs until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module has fully converged, and the model is saved without freezing its parameters.
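The multiple cross-entropy losses of this first training stage (one per deep supervision mask, plus the decoder's own loss) can be sketched as below. This is a NumPy illustration with random stand-in masks; N = 4 matches the four pyramid levels described later, and binary cross-entropy is used as the per-pixel cross-entropy.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy between a predicted mask and the label."""
    p = np.clip(pred, eps, 1 - eps)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p)).mean()

rng = np.random.default_rng(0)
label = rng.integers(0, 2, size=(16, 16)).astype(np.float64)  # first annotated mask
decoder_mask = rng.random((16, 16))                           # first preliminary mask
deep_masks = [rng.random((16, 16)) for _ in range(4)]         # N = 4 pyramid levels

# one loss per deep-supervision mask, plus the decoder's loss; their sum drives backprop
losses = [bce(m, label) for m in deep_masks] + [bce(decoder_mask, label)]
total = sum(losses)
print(len(losses))   # 5
print(total > 0)     # True
```

In an actual training loop, `total` would be the scalar fed to back-propagation to update the parameters of all four modules.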
For example, in this embodiment, the model including the four modules (the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module) may be trained first; this training may be referred to as the first-stage training.
For the first-stage training, refer to FIG. 4, which is a schematic structural diagram of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module provided by this embodiment of the present application.
The input image is fed into the multi-scale encoder module; after processing by the multi-scale encoder module it is passed to the feature pyramid module, and after processing by the feature pyramid module the result is passed to the deep feature supervision module and the multi-scale decoder module, respectively. After the data corresponding to the image is processed by the deep feature supervision module, four upsampled masks at different scales can be obtained, denoted for example Mask32, Mask16, Mask8, and Mask4. The multi-scale decoder module can output the first preliminary segmentation mask. The backbone of the multi-scale encoder can be the MobileNetV2 network, which has strong feature extraction capability while remaining lightweight; feature maps at different scales are then extracted to form the feature pyramid. In the feature pyramid module, the numbers 320, 64, 32, and 24 on the feature pyramid diagram denote the numbers of channels, and 1/4, 1/8, 1/16, and 1/32 denote the downsampled resolutions relative to the original image. "conv" denotes the convolution performed by a convolutional layer, "up2x" denotes bilinear-interpolation 2x upsampling, and "4x" denotes bilinear-interpolation 4x upsampling. "cgr2x" denotes a first network block consisting, in sequence, of a convolutional layer, a Group Normalization layer, a ReLU layer, and a bilinear-interpolation 2x upsampling layer. "sgr2x" denotes a second network block consisting, in sequence, of a convolutional layer with the same number of input and output channels, a Group Normalization layer, a ReLU layer, and a bilinear-interpolation 2x upsampling layer. "sgr" is a third network block, namely sgr2x with the bilinear-interpolation 2x upsampling layer removed. Refer also to FIG. 5, in which (a) is a schematic structural diagram of the first network block cgr2x, (b) of the second network block sgr2x, and (c) of the third network block sgr.
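The bilinear-interpolation 2x upsampling (up2x) used by these blocks can be sketched in NumPy as follows. The align_corners-style sampling grid is an assumption for illustration; deep learning frameworks offer several conventions.

```python
import numpy as np

def up2x_bilinear(x):
    """Bilinear 2x upsampling of a (H, W) map (align_corners=True convention)."""
    h, w = x.shape
    oh, ow = 2 * h, 2 * w
    ys = np.linspace(0, h - 1, oh)            # source row coordinates
    xs = np.linspace(0, w - 1, ow)            # source column coordinates
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    return ((1 - wy) * (1 - wx) * x[np.ix_(y0, x0)]
            + (1 - wy) * wx * x[np.ix_(y0, x1)]
            + wy * (1 - wx) * x[np.ix_(y1, x0)]
            + wy * wx * x[np.ix_(y1, x1)])

m = np.array([[0.0, 1.0],
              [2.0, 3.0]])
u = up2x_bilinear(m)
print(u.shape)              # (4, 4)
print(u[0, 0], u[-1, -1])   # 0.0 3.0 — corner values are preserved
```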
The first-stage training process is described below taking the training of a portrait segmentation mask as an example. The model used for the first-stage training includes the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module.
First, the electronic device can obtain training samples and divide them into a test set and a training set in a ratio of 2:8. The electronic device can apply data augmentation to the samples in the training set, including random rotation, random horizontal flipping, random cropping, and gamma transformation. It should be noted that applying data augmentation to the training set both increases the amount of training data and improves the robustness of the trained model.
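The 2:8 split and two of the augmentations (random horizontal flip and gamma transform) can be sketched as below; the gamma range [0.7, 1.5] and the flip probability of 0.5 are assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(7)
n_samples = 100
indices = rng.permutation(n_samples)
test_idx = indices[:n_samples // 5]     # 20% test
train_idx = indices[n_samples // 5:]    # 80% train — the 2:8 split

def augment(img, rng):
    """Light augmentation: random horizontal flip plus a random gamma transform."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                # random left-right flip
    gamma = rng.uniform(0.7, 1.5)         # assumed gamma range
    return np.clip(img, 0.0, 1.0) ** gamma

sample = rng.random((4, 4))               # toy grayscale image in [0, 1]
aug = augment(sample, rng)
print(len(train_idx), len(test_idx))  # 80 20
print(aug.shape)                      # (4, 4)
```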
During training, for example, the electronic device may take the image in a training sample that is to be input to the multi-scale encoder module, for example a third image, and first apply preset processing to the third image, where the preset processing may include random cropping and/or normalization.
Afterwards, the electronic device can input the image obtained by applying the preset processing to the third image into the multi-scale encoder; after processing by the multi-scale encoder, feature maps with resolutions of 1/4, 1/8, 1/16, and 1/32 of the third image are obtained. For example, as shown in FIG. 4, the multi-scale encoder contains five layers: a receiving layer (Layer0), a first layer (Layer1), a second layer (Layer2), a third layer (Layer3), and a fourth layer (Layer4). The receiving layer receives the input third image, and the first through fourth layers extract feature maps of the input image at different scales. For example, the first layer extracts a feature map at one quarter of the input image's resolution, the second layer at one eighth, the third layer at one sixteenth, and the fourth layer at one thirty-second. The electronic device can pass these feature maps to the feature pyramid module to obtain the corresponding feature pyramid, denoted for example the third feature pyramid. As shown in FIG. 4, the resolution of the third feature pyramid increases from top to bottom: the first-layer feature map of the third feature pyramid has 1/32 the resolution of the third image, the second-layer feature map 1/16, the third-layer feature map 1/8, and the fourth-layer feature map 1/4. Moreover, as shown in FIG. 4, the numbers of channels of the first-layer through fourth-layer feature maps of the third feature pyramid are 320, 64, 32, and 24 in sequence. In this embodiment, for example, the electronic device may denote the first-layer, second-layer, third-layer, and fourth-layer feature maps of the third feature pyramid as a1, b1, c1, and d1, respectively.
Afterwards, the electronic device can invoke the feature pyramid module to process the images in the third feature pyramid into images with a consistent number of channels, thereby obtaining a fourth feature pyramid composed of such images. For each pair of vertically adjacent images in the third feature pyramid, the feature pyramid module can use convolution and bilinear-interpolation 2x upsampling: the lower-resolution feature map, after 2x bilinear upsampling, is merged with the higher-resolution feature map in the third feature pyramid whose resolution matches the upsampled map, so that the channel counts of both images are processed to 128.
For example, for the first-layer feature map a1 and the second-layer feature map b1 in the third feature pyramid, the feature pyramid module may first apply convolution and bilinear-interpolation 2x upsampling to a1, obtaining, say, feature map a11, and apply convolution to b1, obtaining, say, feature map b11; the feature pyramid module may then add a11 and b11 and apply convolution to the sum, obtaining feature map b2 with 128 channels.
In addition, the electronic device can apply convolution to feature map a1 to obtain feature map a2, which has 128 channels.
For the second-layer feature map b1 and the third-layer feature map c1 in the third feature pyramid, the feature pyramid module can first take feature maps a11 and b11 and add them to obtain b12, then apply bilinear-interpolation 2x upsampling to b12 to obtain feature map b13, then apply convolution to c1 to obtain feature map c11, and finally add b13 and c11 and apply convolution to the sum to obtain feature map c2, which has 128 channels.
For the third-layer feature map c1 and the fourth-layer feature map d1 in the third feature pyramid, the feature pyramid module can first apply convolution to d1 to obtain feature map d11, then take the feature map c12 obtained by adding b13 and c11, apply bilinear-interpolation 2x upsampling to c12 to obtain feature map c13, and finally add c13 and d11 and apply convolution to the sum to obtain feature map d2, which has 128 channels.
即,第四特征金字塔有特征图a2、b2、c2、d2构成,它们的通道数均为128,其中,特征图a2、b2、c2、d2的分辨率依次为第三图像的1/32、1/16、1/8、1/4。That is, the fourth feature pyramid is composed of feature maps a2, b2, c2, and d2, and their number of channels is 128. The resolutions of feature maps a2, b2, c2, and d2 are 1/32 of the third image, 1/16, 1/8, 1/4.
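The merge described above can be sketched at the shape level. The NumPy sketch below is an illustration, not the patent's implementation: it assumes a 256×256 third image, uses a random 1×1 channel projection as a stand-in for the convolutions, and uses nearest-neighbour repetition as a stand-in for bilinear 2× upsampling.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, out_ch):
    """Project a (C, H, W) map to out_ch channels with a random 1x1 kernel."""
    w = rng.standard_normal((out_ch, x.shape[0]))
    return np.einsum('oc,chw->ohw', w, x)

def upsample2x(x):
    """2x spatial upsampling (nearest-neighbour stand-in for bilinear)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# Third feature pyramid for a 256x256 input: resolutions 1/32 ... 1/4.
a1 = rng.standard_normal((512, 8, 8))     # top level, lowest resolution
b1 = rng.standard_normal((256, 16, 16))
c1 = rng.standard_normal((128, 32, 32))
d1 = rng.standard_normal((64, 64, 64))

a11 = upsample2x(conv1x1(a1, 128))        # conv + 2x upsample of a1
b11 = conv1x1(b1, 128)                    # conv of b1
b2 = conv1x1(a11 + b11, 128)              # add, then conv -> 128 channels

a2 = conv1x1(a1, 128)                     # top level projected directly

b13 = upsample2x(a11 + b11)               # b12 = a11 + b11, then 2x up
c11 = conv1x1(c1, 128)
c2 = conv1x1(b13 + c11, 128)

c13 = upsample2x(b13 + c11)               # c12 = b13 + c11, then 2x up
d11 = conv1x1(d1, 128)
d2 = conv1x1(c13 + d11, 128)

# Fourth pyramid: 128 channels at 1/32, 1/16, 1/8, 1/4 of the input.
for m, frac in [(a2, 32), (b2, 16), (c2, 8), (d2, 4)]:
    assert m.shape == (128, 256 // frac, 256 // frac)
```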
The electronic device may then invoke the deep feature supervision module and pass the feature maps of the fourth feature pyramid, from top to bottom, through the module's four upsampling layers, which upsample by 32×, 16×, 8×, and 4× respectively, obtaining mask images of the same size as the third image (i.e., deeply supervised prediction masks); these four masks may be denoted Mask32, Mask16, Mask8, and Mask4.
In addition, the electronic device may invoke the multi-scale decoder module to process the images in the fourth feature pyramid so that the feature map at every level has a resolution of 1/4 of the third image. For example, as shown in Figure 4, the first-level feature map a2 of the fourth feature pyramid may pass through two first network blocks cgr2x and one second network block sgr2x in turn to yield an image at 1/4 the resolution of the third image. The second-level feature map b2 may pass through one first network block cgr2x and one second network block sgr2x in turn to yield an image at 1/4 the resolution of the third image. The third-level feature map c2 may pass through one second network block sgr2x to yield an image at 1/4 the resolution of the third image. The fourth-level feature map d2 may pass through one third network block sgr to yield an image at 1/4 the resolution of the third image. The multi-scale decoder module may then apply addition, convolution, and 4× upsampling in turn to the four resulting images, each at 1/4 the resolution of the third image, to obtain a preliminary segmentation mask, denoted here the first preliminary segmentation mask.
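As a sanity check on the block counts above: a level sitting at 1/f of the input resolution needs log2(f/4) successive 2× upsampling blocks to reach 1/4 resolution, which matches two cgr2x plus one sgr2x for a2 (1/32), one cgr2x plus one sgr2x for b2 (1/16), one sgr2x for c2 (1/8), and the non-upsampling sgr block for d2 (1/4).

```python
import math

# Number of 2x-upsampling blocks needed per pyramid level (1/f -> 1/4).
doubling_blocks = {f: int(math.log2(f / 4)) for f in (32, 16, 8, 4)}
assert doubling_blocks == {32: 3, 16: 2, 8: 1, 4: 0}
```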
The model may then obtain the first annotated segmentation mask used as the label of the training sample. It can be understood that the first annotated segmentation mask is the accurate portrait segmentation mask corresponding to the third image in this training sample.
The model (comprising the multi-scale encoder module, feature pyramid module, multi-scale decoder module, and deep feature supervision module) may then compute the cross-entropy loss between each of the four deeply supervised prediction masks Mask32, Mask16, Mask8, and Mask4 output by the deep feature supervision module and the first annotated segmentation mask, as well as the cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask. According to these five computed cross-entropy losses, the backpropagation algorithm is then executed on the model and its parameters are updated.
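The five-term first-stage loss can be illustrated with a generic pixel-wise binary cross-entropy, summed over the four deeply supervised masks and the first preliminary mask. This is a sketch: the patent does not spell out the exact formulation, and the random arrays below stand in for model outputs and the label.

```python
import numpy as np

rng = np.random.default_rng(1)

def bce(pred, label, eps=1e-7):
    """Pixel-wise binary cross-entropy between a probability map and a 0/1 label."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(pred) + (1 - label) * np.log(1 - pred)))

label = (rng.random((64, 64)) > 0.5).astype(float)   # first annotated mask
# Mask32, Mask16, Mask8, Mask4 plus the first preliminary mask: 5 predictions.
preds = [rng.random((64, 64)) for _ in range(5)]

total_loss = sum(bce(p, label) for p in preds)        # drives backpropagation
assert total_loss > 0.0
```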
Over multiple training epochs, the electronic device may repeat the above process of training the model (comprising the multi-scale encoder module, feature pyramid module, multi-scale decoder module, and deep feature supervision module) with the training samples until the model's loss function has fully converged, then save the model without freezing its parameters.
After obtaining the trained model comprising the multi-scale encoder module, feature pyramid module, multi-scale decoder module, and deep feature supervision module, the electronic device may perform the second stage of training. For the second stage, the deep feature supervision module may be removed from the model obtained in the first stage, and a segmentation module and an edge gradient module may be added.
In this embodiment, the edge gradient module is configured to compute the edge gradient map corresponding to the input image of a training sample;
the input image of the training sample is fed into the training model and processed in turn by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module to obtain a second preliminary segmentation mask;
the second preliminary segmentation mask and the input image are fed into the segmentation module, which outputs a fine segmentation mask;
the edge gradient module is configured to compute the edge probability prediction map corresponding to the fine segmentation mask;
the edge gradient loss function provided by the edge gradient module is used to compute the edge gradient loss between the edge gradient map and the edge probability prediction map.
In this embodiment, model training continues (i.e., the second stage of training) on the model comprising the multi-scale encoder module, feature pyramid module, multi-scale decoder module, segmentation module, and edge gradient module, and includes the following flow:
the input image of the training sample is fed into the training model and processed in turn by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module to obtain a second preliminary segmentation mask;
the second preliminary segmentation mask and the input image are concatenated along the channel dimension and fed into the segmentation module, which outputs a fine segmentation mask;
the input image is fed into the edge gradient module, which invokes the Sobel operator it includes to perform the corresponding computation on the input image, obtaining the gradient map of the input image;
the second annotated segmentation mask used as the label of the training sample is obtained;
the edge gradient module invokes the dilation-erosion module it includes to apply dilation and erosion to the second annotated segmentation mask, obtaining an edge mask;
the gradient map of the input image is multiplied by the edge mask to obtain the edge gradient map corresponding to the input image;
the fine segmentation mask is multiplied by the edge mask to obtain the edge probability prediction map corresponding to the fine segmentation mask;
the cross-entropy loss and the structural similarity loss between the fine segmentation mask and the second annotated segmentation mask are computed;
the edge gradient loss between the edge gradient map and the edge probability prediction map is computed;
the cross-entropy loss and structural similarity loss between the fine segmentation mask and the second annotated segmentation mask and the edge gradient loss between the edge gradient map and the edge probability prediction map are summed;
the backpropagation algorithm is executed on the training model according to the computed sum of losses, and the model parameters are updated;
the model training process is repeated over multiple training epochs until the model's loss function has fully converged, and the model and its parameters are saved;
the model obtained by removing the edge gradient module from the model obtained after training is determined to be the pre-trained image segmentation model.
For example, as shown in Figure 6, during the second stage of training the electronic device may obtain the image in the training sample that is to be input to the multi-scale encoder module, denoted here the fourth image, and first apply preset processing to it, where the preset processing may include random cropping and/or normalization.
The electronic device may then feed the preprocessed fourth image into the training model, where it is processed in turn by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module, yielding a preliminary segmentation mask, denoted here the second preliminary segmentation mask.
The electronic device may then invoke the training model to apply concat processing to the preprocessed fourth image and the second preliminary segmentation mask and pass the result to the segmentation module, where concat processing means concatenating the two maps along the channel dimension.
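A minimal illustration of the concat step, assuming a CHW layout in which a 3-channel image and a 1-channel mask join into a 4-channel input for the segmentation module:

```python
import numpy as np

image = np.zeros((3, 128, 128))          # preprocessed fourth image (C, H, W)
prelim_mask = np.zeros((1, 128, 128))    # second preliminary segmentation mask

# Concatenate along the channel dimension (axis 0 in CHW layout).
x = np.concatenate([image, prelim_mask], axis=0)
assert x.shape == (4, 128, 128)
```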
The image input to the segmentation module is processed in turn by three convolutional network blocks and one convolutional layer Conv, after which the fine segmentation mask is output. Each convolutional network block consists of a convolutional layer, a BN layer, and a ReLU layer in sequence. Note that the fine segmentation mask is a two-channel probability prediction map on which no argmax operation has been performed.
The electronic device may then obtain the second annotated segmentation mask used as the label of the training sample. It can be understood that the second annotated segmentation mask is the accurate portrait segmentation mask corresponding to the fourth image in this training sample. The electronic device may input the second preliminary segmentation mask, the second annotated segmentation mask, and the fourth image to the edge gradient module. The fourth image is passed to the Sobel operator submodule of the edge gradient module and, after processing by the Sobel operator, the gradient map of the fourth image is obtained.
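The Sobel step can be sketched as follows. The sketch uses the standard 3×3 Sobel kernels on a grayscale image and combines the two responses into a gradient magnitude map; this is one common convention, not necessarily the patent's exact computation.

```python
import numpy as np

def sobel_gradient(img):
    """Gradient magnitude of a 2D image via 3x3 Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    p = np.pad(img, 1, mode='edge')      # replicate borders
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):                   # explicit 3x3 correlation
        for j in range(3):
            win = p[i:i + h, j:j + w]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    return np.hypot(gx, gy)

img = np.zeros((8, 8))
img[:, 4:] = 1.0                          # vertical step edge
grad = sobel_gradient(img)
assert grad.shape == img.shape
assert grad[:, 3:5].max() > 0             # strong response at the edge
assert grad[:, 0].max() == 0              # flat region gives zero gradient
```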
The edge gradient module then invokes the dilation-erosion module it includes to apply dilation and erosion to the second annotated segmentation mask, obtaining an edge mask, which represents the edge of the portrait (i.e., an edge mask composed of 0s and 1s).
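One common way to realize the dilation-erosion step is to subtract the eroded label mask from the dilated one, leaving a band of 1s along the portrait boundary. The sketch below implements 3×3 morphology directly in NumPy; a real pipeline might use OpenCV's dilate/erode instead, and the square "portrait" is a toy label.

```python
import numpy as np

def shift_stack(m):
    """Stack the 9 shifted views of m under a 3x3 structuring element."""
    p = np.pad(m, 1, mode='edge')
    h, w = m.shape
    return np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def dilate(m):
    return shift_stack(m).max(axis=0)

def erode(m):
    return shift_stack(m).min(axis=0)

label = np.zeros((10, 10))
label[3:7, 3:7] = 1.0                     # toy binary "portrait" label

edge_mask = dilate(label) - erode(label)  # 1s only in a boundary band

assert set(np.unique(edge_mask)) <= {0.0, 1.0}
assert edge_mask[4, 4] == 0.0             # interior pixel: not edge
assert edge_mask[3, 3] == 1.0             # boundary pixel: edge
```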
The edge gradient module may then multiply the gradient map of the fourth image by the edge mask to obtain the edge gradient map of the fourth image, and multiply the fine segmentation mask by the edge mask to obtain the edge probability prediction map of the fine segmentation mask.
The edge gradient module may then compute the cross-entropy loss and the structural similarity loss (SSIM loss) between the fine segmentation mask and the second annotated segmentation mask, and compute the edge gradient loss between the edge gradient map of the fourth image and the edge probability prediction map of the fine segmentation mask. It may then sum these losses: the cross-entropy and structural similarity losses between the fine segmentation mask and the second annotated segmentation mask, plus the edge gradient loss between the edge gradient map of the fourth image and the edge probability prediction map of the fine segmentation mask. According to the computed sum of losses, the edge gradient module may execute the backpropagation algorithm on the training model and update the model parameters.
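The summed second-stage loss can be sketched as below. Since the text defers to standard formulations, a global single-window SSIM and an L1 edge term are used here as plausible stand-ins, and the inputs are random placeholders rather than real model outputs.

```python
import numpy as np

rng = np.random.default_rng(2)

def bce(pred, label, eps=1e-7):
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(label * np.log(pred) + (1 - label) * np.log(1 - pred)))

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM over the whole image (single window, a simplification)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return 1.0 - float(ssim)

def edge_gradient_loss(edge_grad, edge_prob):
    """L1 distance between edge gradient map and edge probability map (stand-in)."""
    return float(np.mean(np.abs(edge_grad - edge_prob)))

fine_mask = rng.random((32, 32))                      # fine segmentation mask
label = (rng.random((32, 32)) > 0.5).astype(float)    # second annotated mask
edge_grad_map = rng.random((32, 32))                  # image gradient * edge mask
edge_prob_map = rng.random((32, 32))                  # fine mask * edge mask

total = (bce(fine_mask, label)
         + ssim_loss(fine_mask, label)
         + edge_gradient_loss(edge_grad_map, edge_prob_map))
assert total > 0.0
```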
The second-stage training process above may be repeated over multiple training epochs until the loss function of the training model has fully converged; the model and its parameters are saved, and the model obtained by removing the edge gradient module from the model produced by the second stage of training is determined to be the pre-trained image segmentation model.
The pre-trained image segmentation model can thus be obtained through the training procedure above.
Note that in other embodiments the preset processing may be omitted for the third and fourth images; in that case, when the pre-trained image segmentation model is applied to output the segmentation mask of an image, the image need not undergo the preset processing before being input to the pre-trained image segmentation model.
In the embodiments of this application, the cross-entropy loss, the structural similarity loss, and the edge gradient loss are all computed in ways known in the prior art, so the details are not repeated here.
Note that this application applies a structural similarity loss to portrait segmentation, which keeps the segmentation mask consistent with the label mask in terms of image structure, provides additional gradients for training the model, and reduces false-positive predictions.
This application designs an edge gradient module that encourages the segmentation mask to remain consistent with the input image in terms of edge gradients, provides additional gradients for edge features, improves the refinement of segmentation at edges, and reduces false-positive predictions.
This application adopts a lightweight design, using a lightweight backbone network in the multi-scale encoder to achieve a small computational cost, and can therefore be deployed on mobile terminals such as mobile phones.
This application provides a lightweight portrait segmentation model that combines structural similarity and edge gradients: the refinement module (i.e., the segmentation module) and the edge gradient module are applied to the portrait segmentation model together, jointly improving segmentation at edges and increasing portrait segmentation accuracy.
In addition to the cross-entropy loss, this application adds a structural similarity loss and an edge gradient loss during model training to provide additional gradients. Compared with the cross-entropy loss, the structural similarity loss and the edge gradient loss better drive the model to focus on segmentation quality at edges, and the additional gradients at edges promote improved edge segmentation.
This application designs a segmentation module consisting of three convolutional network blocks and one convolutional layer, which improves segmentation quality while introducing only a small computational cost. The edge gradient module and the deep feature supervision module of this application are removed at final deployment, so they add no extra computational resource requirements.
In addition, in this application the segmentation module may be designed to be more complex, for example implemented with various neural networks, as long as it ultimately outputs a segmentation mask; for instance, ResNet blocks may be used in the segmentation module.
In this application, the number of levels of the feature pyramid can be adjusted flexibly according to the specific dataset, and the maximum downsampling factor may be 64×, 32×, 16×, and so on; a larger downsampling factor incurs more computation but provides more high-level feature information. The multi-scale encoder may be implemented with various lightweight backbone networks, such as ShuffleNet or MobileNetV3.
The flow of the image processing method shown in Figure 2 may include:
201. The electronic device acquires a first image.
For example, after obtaining the pre-trained image segmentation model through the training procedure described above, the electronic device may use the model to segment images. To do so, the electronic device may first acquire a first image.
202. The electronic device applies preset processing to the first image, where the preset processing includes random cropping and/or normalization.
For example, after acquiring the first image, the electronic device may apply the preset processing, which may include random cropping and/or normalization, to it.
203. The electronic device obtains a pre-trained image segmentation model used to output a portrait segmentation mask of an image. The pre-trained image segmentation model includes a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, and a segmentation module, connected in sequence. The segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer; the convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer.
For example, the electronic device may also obtain a pre-trained image segmentation model that can be used to output a portrait segmentation mask of an image, structured as described above. Note that the user may train the model in advance according to portrait segmentation requirements, so that the pre-trained image segmentation model outputs the portrait segmentation mask the user needs.
204. The electronic device inputs the image obtained by applying the preset processing to the first image into the pre-trained image segmentation model, which outputs the portrait segmentation mask corresponding to the first image.
For example, after obtaining the first image and the pre-trained image segmentation model, the electronic device may feed the preprocessed first image into the pre-trained image segmentation model, which outputs the portrait segmentation mask corresponding to the first image.
205. The electronic device segments the portrait from the first image according to the portrait segmentation mask corresponding to the first image.
For example, after obtaining the portrait segmentation mask corresponding to the first image, the electronic device may segment the corresponding portrait from the first image according to that mask.
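Step 205 amounts to masking the image with the predicted portrait mask. A minimal sketch, assuming an HWC layout and a binary mask of the same height and width as the image:

```python
import numpy as np

image = np.ones((64, 64, 3)) * 0.5            # the first image (H, W, C)
mask = np.zeros((64, 64))
mask[16:48, 16:48] = 1.0                      # portrait segmentation mask

# Broadcast the mask over the colour channels; background pixels become 0.
portrait = image * mask[:, :, None]
assert portrait[32, 32].tolist() == [0.5, 0.5, 0.5]   # inside the portrait
assert portrait[0, 0].tolist() == [0.0, 0.0, 0.0]     # background removed
```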
206. After segmenting the portrait from the first image, the electronic device applies background blurring, background replacement, or portrait beautification to the first image according to the segmented portrait.
For example, after segmenting the portrait from the first image, the electronic device may apply various kinds of processing to the first image according to the segmented portrait, such as background blurring, background replacement, or portrait beautification.
It is easy to understand that, because the portrait segmented from the first image has high accuracy, processing such as background blurring, background replacement, or portrait beautification applied by the electronic device to the first image according to the segmented portrait will be more effective, yielding an image of better quality.
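Background blurring in step 206 can be sketched as blurring the full image and compositing the sharp portrait back in with the mask, i.e. out = mask·image + (1−mask)·blurred. A 3×3 box blur stands in here for a real bokeh kernel, and the inputs are placeholders.

```python
import numpy as np

def box_blur3(img):
    """Simple 3x3 box blur of an (H, W, C) image with replicated borders."""
    p = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode='edge')
    h, w, _ = img.shape
    acc = np.zeros_like(img)
    for i in range(3):
        for j in range(3):
            acc += p[i:i + h, j:j + w]
    return acc / 9.0

image = np.random.default_rng(3).random((32, 32, 3))
mask = np.zeros((32, 32))
mask[8:24, 8:24] = 1.0                        # portrait region

m = mask[:, :, None]
out = m * image + (1 - m) * box_blur3(image)  # sharp portrait, blurred background
assert out.shape == image.shape
assert np.allclose(out[16, 16], image[16, 16])   # portrait stays sharp
```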
Please refer to Figure 7, which is a schematic structural diagram of the image processing apparatus provided by an embodiment of this application. The image processing apparatus 300 may include: a first acquisition module 301, a second acquisition module 302, a processing module 303, and a segmentation module 304.
The first acquisition module 301 is configured to acquire a first image;
the second acquisition module 302 is configured to obtain a pre-trained image segmentation model used to output a segmentation mask of an image, where the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
the processing module 303 is configured to input the first image into the pre-trained image segmentation model, which outputs the segmentation mask corresponding to the first image;
the segmentation module 304 is configured to segment a second image from the first image according to the segmentation mask corresponding to the first image.
In one embodiment, the pre-trained image segmentation model further includes a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module, and the multi-scale encoder module, feature pyramid module, multi-scale decoder module, and segmentation module are connected in sequence.
In one embodiment, during model training, the training model used to obtain the pre-trained image segmentation model further includes a deep feature supervision module connected to the feature pyramid module, the deep feature supervision module being configured to supervise deep features at multiple scales;
during model training, the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to a training sample, and the deep feature supervision module in the training model outputs N deeply supervised prediction masks corresponding to the training sample, where N is the number of levels of the feature pyramid;
the first annotated segmentation mask used as the label of the training sample is obtained;
the cross-entropy loss between each of the N deeply supervised prediction masks and the first annotated segmentation mask is computed, as is the cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask;
the backpropagation algorithm is executed on the training model according to the computed cross-entropy losses, and the model parameters are updated;
the model training process is repeated over multiple training epochs until the loss function of the model comprising the multi-scale encoder module, feature pyramid module, multi-scale decoder module, and deep feature supervision module has fully converged, and the model is saved without freezing its parameters.
在一种实施方式中,训练模型还包括边缘梯度模块,所述边缘梯度模块用于提供边缘梯度损失函数作为模型训练时的其中一个损失函数。In one embodiment, the training model further includes an edge gradient module, and the edge gradient module is configured to provide an edge gradient loss function as one of the loss functions during model training.
在一种实施方式中,所述边缘梯度模块用于计算训练样本中的输入图像对应的边缘梯度图;In one embodiment, the edge gradient module is used to calculate the edge gradient map corresponding to the input image in the training sample;
将训练样本中的输入图像输入至训练模型中,依次经过多尺度编码器模块、特征金字塔模块、多尺度解码器模块的处理得到第二初步分割掩模;Input the input image in the training sample into the training model, and obtain the second preliminary segmentation mask through the processing of the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module in turn;
将所述第二初步分割掩模和所述输入图像输入至分割模块,由分割模块输出精细分割掩模;inputting the second preliminary segmentation mask and the input image to a segmentation module, and the segmentation module outputs a fine segmentation mask;
所述边缘梯度模块用于计算所述精细分割掩模对应的边缘概率预测图;The edge gradient module is used to calculate the edge probability prediction map corresponding to the fine segmentation mask;
所述边缘梯度模块提供的边缘梯度损失函数用于计算所述边缘梯度图和所述边缘概率预测图之间的边缘梯度损失。The edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
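The edge gradient loss function of this embodiment compares the input image's edge gradient map with the edge probability prediction map. The exact distance is not fixed by the text, so the sketch below assumes an L1 (mean absolute difference) form:

```python
import numpy as np

def edge_gradient_loss(edge_gradient_map, edge_prob_map):
    """Mean absolute difference between the input image's edge gradient map and
    the edge probability prediction map derived from the fine segmentation mask
    (the L1 form is an assumption, not specified by the text)."""
    return float(np.mean(np.abs(edge_gradient_map - edge_prob_map)))

# Hypothetical 2x2 maps: identical maps give zero loss.
a = np.array([[0.0, 1.0], [0.5, 0.0]])
b = np.array([[0.0, 0.5], [0.5, 0.0]])
loss_same = edge_gradient_loss(a, a)
loss_diff = edge_gradient_loss(a, b)
```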
在一种实施方式中,将保存的包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、深层特征监督模块的模型中的所述深层特征监督模块移除,并加入分割模块和边缘梯度模块;In one embodiment, the deep feature supervision module is removed from the saved model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module, and a segmentation module and an edge gradient module are added;
基于包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、分割模块和边缘梯度模块的模型继续进行模型训练,包括如下流程:Model training continues based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module and the edge gradient module, including the following processes:
将训练样本中的输入图像输入至训练模型中,依次经过多尺度编码器模块、特征金字塔模块、多尺度解码器模块的处理得到第二初步分割掩模;Input the input image in the training sample into the training model, and obtain the second preliminary segmentation mask through the processing of the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module in turn;
将所述第二初步分割掩模和所述输入图像在通道维度进行拼接后输入至分割模块,由分割模块输出精细分割掩模;the second preliminary segmentation mask and the input image are concatenated along the channel dimension and then input to the segmentation module, which outputs the fine segmentation mask;
将所述输入图像输入至所述边缘梯度模块,由所述边缘梯度模块调用其所包括的索贝尔算子对所述输入图像进行相应的计算,得到所述输入图像的梯度图;Inputting the input image to the edge gradient module, and the edge gradient module invokes the Sobel operator included in the edge gradient module to perform corresponding calculations on the input image to obtain a gradient map of the input image;
获取训练样本中用作标注的第二标注分割掩模;Obtain the second annotation segmentation mask used as annotation in the training sample;
由所述边缘梯度模块调用其所包括的膨胀腐蚀模块对所述第二标注分割掩模进行膨胀腐蚀处理,得到边缘掩模;the edge gradient module invokes its dilation-erosion module to perform dilation and erosion on the second annotated segmentation mask to obtain an edge mask;
将所述输入图像的梯度图与所述边缘掩模相乘,得到所述输入图像对应的边缘梯度图;Multiplying the gradient map of the input image by the edge mask to obtain an edge gradient map corresponding to the input image;
将所述精细分割掩模与所述边缘掩模相乘,得到所述精细分割掩模对应的边缘概率预测图;multiplying the fine segmentation mask by the edge mask to obtain an edge probability prediction map corresponding to the fine segmentation mask;
计算所述精细分割掩模与所述第二标注分割掩模之间的交叉熵损失和结构相似性损失;calculating a cross-entropy loss and a structural similarity loss between the fine segmentation mask and the second annotated segmentation mask;
计算所述边缘梯度图与所述边缘概率预测图之间的边缘梯度损失;calculating the edge gradient loss between the edge gradient map and the edge probability prediction map;
对所述精细分割掩模与所述第二标注分割掩模之间的交叉熵损失和结构相似性损失以及所述边缘梯度图与所述边缘概率预测图之间的边缘梯度损失进行损失求和处理;summing the cross-entropy loss and the structural similarity loss between the fine segmentation mask and the second annotated segmentation mask, and the edge gradient loss between the edge gradient map and the edge probability prediction map;
根据计算得到的损失之和对训练模型执行反向传播算法,更新模型参数;Perform back-propagation algorithm on the training model according to the sum of the calculated losses, and update the model parameters;
在多个训练周期内重复模型训练的过程直至模型的损失函数完全收敛,保存模型及其模型参数;Repeat the process of model training in multiple training cycles until the loss function of the model is completely converged, and save the model and its model parameters;
将训练完成后得到的模型中的边缘梯度模块去除后得到的模型确定为预训练图像分割模型。The model obtained by removing the edge gradient module from the trained model is determined as the pre-trained image segmentation model.
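The edge-gradient branch of the training flow above can be sketched end to end in NumPy. This is an illustrative reconstruction, not the patent's implementation: the 3x3 structuring element, the grayscale input, the L1 form of the edge gradient loss, and the single-window structural similarity are all assumptions.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def conv3x3(img, kernel):
    """Naive 3x3 'same' convolution with zero padding."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * kernel)
    return out

def sobel_gradient_map(img):
    """Gradient magnitude of the input image (the Sobel step of the flow)."""
    return np.sqrt(conv3x3(img, SOBEL_X) ** 2 + conv3x3(img, SOBEL_Y) ** 2)

def dilate(mask):
    """Binary dilation with a 3x3 structuring element (zero padding)."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    return np.max([p[i:i + h, j:j + w] for i in range(3) for j in range(3)], axis=0)

def erode(mask):
    """Binary erosion with a 3x3 structuring element (zero padding)."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    return np.min([p[i:i + h, j:j + w] for i in range(3) for j in range(3)], axis=0)

def edge_mask(annotated_mask):
    """Dilation minus erosion of the annotated mask: a band around the boundary."""
    return dilate(annotated_mask) - erode(annotated_mask)

def bce(pred, target, eps=1e-7):
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def ssim_loss(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM over a single global window (a simplification of patch-based SSIM)."""
    mp, mt = pred.mean(), target.mean()
    cov = ((pred - mp) * (target - mt)).mean()
    ssim = ((2 * mp * mt + c1) * (2 * cov + c2)) / (
        (mp ** 2 + mt ** 2 + c1) * (pred.var() + target.var() + c2))
    return float(1.0 - ssim)

# Toy data: a 7x7 image containing a bright 3x3 square, with a matching label.
image = np.zeros((7, 7))
image[2:5, 2:5] = 1.0
label_mask = image.copy()   # second annotated segmentation mask
fine_mask = image.copy()    # pretend the segmentation module is perfect

band = edge_mask(label_mask)                            # edge mask
edge_gradient_map = sobel_gradient_map(image) * band    # input-image edge gradient map
edge_prob_map = fine_mask * band                        # edge probability prediction map

# Loss summation: cross-entropy + structural similarity + edge gradient (L1 assumed).
edge_grad_loss = float(np.mean(np.abs(edge_gradient_map - edge_prob_map)))
total = bce(fine_mask, label_mask) + ssim_loss(fine_mask, label_mask) + edge_grad_loss
```

In training, this summed scalar would drive backpropagation; the sketch only shows how the three maps and the three loss terms fit together.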
在一种实施方式中,所述分割模块304还可以用于:In one embodiment, the segmentation module 304 may also be used to:
对所述第一图像进行预设处理,所述预设处理包括随机裁剪和/或归一化处理;performing preset processing on the first image, the preset processing including random cropping and/or normalization processing;
所述将所述第一图像输入至所述预训练图像分割模型,包括:将所述第一图像经过所述预设处理后得到的图像输入至所述预训练图像分割模型。The inputting the first image into the pre-trained image segmentation model includes: inputting the image obtained after the first image is subjected to the preset processing into the pre-trained image segmentation model.
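The preset preprocessing above (random cropping and/or normalization) can be sketched as follows; the crop size and the per-channel mean/std scheme are illustrative assumptions, not values fixed by the text:

```python
import numpy as np

def random_crop(img, crop_h, crop_w, rng):
    """Crop a random (crop_h, crop_w) window out of an (H, W, C) image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]

def normalize(img, mean, std):
    """Per-channel normalization: (x - mean) / std."""
    return (img - np.asarray(mean)) / np.asarray(std)

rng = np.random.default_rng(42)
image = rng.random((32, 32, 3))          # hypothetical H x W x C first image
cropped = random_crop(image, 24, 24, rng)
normalized = normalize(cropped, mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25])
```

The preprocessed array `normalized` is what would then be fed to the pre-trained image segmentation model.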
本申请实施例提供一种计算机可读的存储介质,其上存储有计算机程序,当所述计算机程序在计算机上执行时,使得所述计算机执行如本实施例提供的图像处理方法中的流程。An embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed on a computer, causes the computer to execute the process in the image processing method provided in this embodiment.
本申请实施例还提供一种电子设备,包括存储器,处理器,所述处理器通过调用所述存储器中存储的计算机程序,用于执行本实施例提供的图像处理方法中的流程。An embodiment of the present application further provides an electronic device, including a memory and a processor, where the processor is configured to execute the process in the image processing method provided by the present embodiment by invoking a computer program stored in the memory.
例如,上述电子设备可以是诸如平板电脑或者智能手机等移动终端。请参阅图8,图8为本申请实施例提供的电子设备的结构示意图。For example, the above-mentioned electronic device may be a mobile terminal such as a tablet computer or a smart phone. Please refer to FIG. 8 , which is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
该电子设备400可以包括显示屏401、存储器402、处理器403等部件。本领域技术人员可以理解,图8中示出的电子设备结构并不构成对电子设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。The electronic device 400 may include components such as a display screen 401, a memory 402, a processor 403, and the like. Those skilled in the art can understand that the structure of the electronic device shown in FIG. 8 does not constitute a limitation on the electronic device, which may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
显示屏401可以用于显示诸如文字、图像等信息。The display screen 401 may be used to display information such as text, images, and the like.
存储器402可用于存储应用程序和数据。存储器402存储的应用程序中包含有可执行代码。应用程序可以组成各种功能模块。处理器403通过运行存储在存储器402的应用程序,从而执行各种功能应用以及数据处理。 Memory 402 may be used to store applications and data. The application program stored in the memory 402 contains executable code. Applications can be composed of various functional modules. The processor 403 executes various functional applications and data processing by executing the application programs stored in the memory 402 .
处理器403是电子设备的控制中心,利用各种接口和线路连接整个电子设备的各个部分,通过运行或执行存储在存储器402内的应用程序,以及调用存储在存储器402内的数据,执行电子设备的各种功能和处理数据,从而对电子设备进行整体监控。The processor 403 is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 402 and invoking the data stored in the memory 402, thereby monitoring the electronic device as a whole.
在本实施例中,电子设备中的处理器403会按照如下的指令,将一个或一个以上的应用程序的进程对应的可执行代码加载到存储器402中,并由处理器403来运行存储在存储器402中的应用程序,从而执行:In this embodiment, the processor 403 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 403 runs the application programs stored in the memory 402, thereby executing:
获取第一图像;get the first image;
获取预训练图像分割模型,所述预训练图像分割模型用于输出图像的分割掩模,所述预训练图像分割模型至少包括分割模块,所述分割模块包括多个卷积网络块与至少一个卷积层,所述多个卷积网络块依次连接后再与所述至少一个卷积层连接,每一所述卷积网络块包括卷积层、批归一化层及非线性激活层;acquiring a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image; the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
将所述第一图像输入至所述预训练图像分割模型,由所述预训练图像分割模型输出所述第一图像对应的分割掩模;inputting the first image into the pre-training image segmentation model, and outputting a segmentation mask corresponding to the first image by the pre-training image segmentation model;
根据所述第一图像对应的分割掩模从所述第一图像中分割出第二图像。A second image is segmented from the first image according to a segmentation mask corresponding to the first image.
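The segmentation module structure described above (several convolutional network blocks, each a convolution, batch normalization, and nonlinear activation, chained and followed by at least one convolution layer) can be sketched as follows. This is a minimal single-channel NumPy illustration; kernel sizes, channel counts, and the inference-style batch normalization without affine parameters are assumptions:

```python
import numpy as np

def conv3x3(x, kernel):
    """Naive 3x3 'same' convolution (zero padding) on a single-channel map."""
    p = np.pad(x, 1)
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * kernel)
    return out

def batch_norm(x, eps=1e-5):
    """Normalize a feature map to zero mean, unit variance (no learned affine)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):
    """Nonlinear activation layer."""
    return np.maximum(x, 0.0)

def conv_block(x, kernel):
    """One convolutional network block: convolution -> batch norm -> activation."""
    return relu(batch_norm(conv3x3(x, kernel)))

def segmentation_module(x, block_kernels, final_kernel):
    """Several conv blocks connected in sequence, then a final convolution layer."""
    for k in block_kernels:
        x = conv_block(x, k)
    return conv3x3(x, final_kernel)

rng = np.random.default_rng(1)
feature = rng.standard_normal((8, 8))
kernels = [rng.standard_normal((3, 3)) * 0.1 for _ in range(3)]
out = segmentation_module(feature, kernels, rng.standard_normal((3, 3)) * 0.1)
```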
请参阅图9,电子设备400可以包括显示屏401、存储器402、处理器403、电池404、摄像模组405、扬声器406、麦克风407等部件。Referring to FIG. 9 , the electronic device 400 may include a display screen 401 , a memory 402 , a processor 403 , a battery 404 , a camera module 405 , a speaker 406 , a microphone 407 and other components.
显示屏401可以用于显示诸如图像、文字等信息。The display screen 401 may be used to display information such as images, text, and the like.
存储器402可用于存储应用程序和数据。存储器402存储的应用程序中包含有可执行代码。应用程序可以组成各种功能模块。处理器403通过运行存储在存储器402的应用程序,从而执行各种功能应用以及数据处理。 Memory 402 may be used to store applications and data. The application program stored in the memory 402 contains executable code. Applications can be composed of various functional modules. The processor 403 executes various functional applications and data processing by executing the application programs stored in the memory 402 .
处理器403是电子设备的控制中心,利用各种接口和线路连接整个电子设备的各个部分,通过运行或执行存储在存储器402内的应用程序,以及调用存储在存储器402内的数据,执行电子设备的各种功能和处理数据,从而对电子设备进行整体监控。The processor 403 is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 402 and invoking the data stored in the memory 402, thereby monitoring the electronic device as a whole.
电池404可以用于为电子设备的各个模块和部件提供电力支持,从而保证电子设备的正常运行。The battery 404 can be used to provide power support for various modules and components of the electronic device, thereby ensuring the normal operation of the electronic device.
摄像模组405可以用于采集图像。The camera module 405 can be used to capture images.
扬声器406可以用于播放声音信号。 Speaker 406 may be used to play sound signals.
麦克风407可以用于采集周围环境中的声音信号。例如,麦克风407可以用于采集用户的语音指令。The microphone 407 may be used to collect sound signals in the surrounding environment. For example, the microphone 407 may be used to capture the user's voice commands.
在本实施例中,电子设备中的处理器403会按照如下的指令,将一个或一个以上的应用程序的进程对应的可执行代码加载到存储器402中,并由处理器403来运行存储在存储器402中的应用程序,从而执行:In this embodiment, the processor 403 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 403 runs the application programs stored in the memory 402, thereby executing:
获取第一图像;get the first image;
获取预训练图像分割模型,所述预训练图像分割模型用于输出图像的分割掩模,所述预训练图像分割模型至少包括分割模块,所述分割模块包括多个卷积网络块与至少一个卷积层,所述多个卷积网络块依次连接后再与所述至少一个卷积层连接,每一所述卷积网络块包括卷积层、批归一化层及非线性激活层;acquiring a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image; the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
将所述第一图像输入至所述预训练图像分割模型,由所述预训练图像分割模型输出所述第一图像对应的分割掩模;inputting the first image into the pre-training image segmentation model, and outputting a segmentation mask corresponding to the first image by the pre-training image segmentation model;
根据所述第一图像对应的分割掩模从所述第一图像中分割出第二图像。A second image is segmented from the first image according to a segmentation mask corresponding to the first image.
在一种实施方式中,所述预训练图像分割模型还包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块,所述多尺度编码器模块、特征金字塔模块、多尺度解码器模块、分割模块依次连接。In one embodiment, the pre-trained image segmentation model further includes a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module, the multi-scale encoder module, feature pyramid module, multi-scale decoder module, The split modules are connected in sequence.
在一种实施方式中,在进行模型训练时,用于得到所述预训练图像分割模型的训练模型还包括深层特征监督模块,所述深层特征监督模块与所述特征金字塔模块连接,所述深层特征监督模块用于从多个尺度对深层特征进行监督;In one embodiment, during model training, the training model used to obtain the pre-trained image segmentation model further includes a deep feature supervision module, the deep feature supervision module is connected to the feature pyramid module, and the deep feature supervision module is connected to the feature pyramid module. The feature supervision module is used to supervise deep features from multiple scales;
在模型训练时,训练模型中的多尺度解码器输出与训练样本对应的第一初步分割掩模,训练模型中的深层特征监督模块输出与所述训练样本对应的N个深监督预测掩模,N为特征金字塔的层数;During model training, the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to the training samples, and the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training samples, N is the number of layers of the feature pyramid;
获取所述训练样本中用作标注的第一标注分割掩模;obtaining the first label segmentation mask used as label in the training sample;
分别计算所述N个深监督预测掩模中的每一个掩模与所述第一标注分割掩模的交叉熵损失,以及计算所述第一初步分割掩模与所述第一标注分割掩模的交叉熵损失;separately calculating the cross-entropy loss between each of the N deep-supervision prediction masks and the first annotated segmentation mask, and calculating the cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask;
根据计算得到的多个交叉熵损失,对训练模型执行反向传播算法,更新模型参数;According to the calculated multiple cross-entropy losses, the back-propagation algorithm is performed on the training model to update the model parameters;
在多个训练周期内重复模型训练过程直至包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、深层特征监督模块的模型的损失函数完全收敛,保存模型且不冻结模型的参数。The model training process is repeated over multiple training cycles until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module is fully converged, and the model is saved and the parameters of the model are not frozen.
在一种实施方式中,训练模型还包括边缘梯度模块,所述边缘梯度模块用于提供边缘梯度损失函数作为模型训练时的其中一个损失函数。In one embodiment, the training model further includes an edge gradient module, and the edge gradient module is configured to provide an edge gradient loss function as one of the loss functions during model training.
在一种实施方式中,所述边缘梯度模块用于计算训练样本中的输入图像对应的边缘梯度图;In one embodiment, the edge gradient module is used to calculate the edge gradient map corresponding to the input image in the training sample;
将训练样本中的输入图像输入至训练模型中,依次经过多尺度编码器模块、特征金字塔模块、多尺度解码器模块的处理得到第二初步分割掩模;Input the input image in the training sample into the training model, and obtain the second preliminary segmentation mask through the processing of the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module in turn;
将所述第二初步分割掩模和所述输入图像输入至分割模块,由分割模块输出精细分割掩模;inputting the second preliminary segmentation mask and the input image to a segmentation module, and the segmentation module outputs a fine segmentation mask;
所述边缘梯度模块用于计算所述精细分割掩模对应的边缘概率预测图;The edge gradient module is used to calculate the edge probability prediction map corresponding to the fine segmentation mask;
所述边缘梯度模块提供的边缘梯度损失函数用于计算所述边缘梯度图和所述边缘概率预测图之间的边缘梯度损失。The edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
在一种实施方式中,将保存的包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、深层特征监督模块的模型中的所述深层特征监督模块移除,并加入分割模块和边缘梯度模块;In one embodiment, the deep feature supervision module is removed from the saved model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module, and a segmentation module and an edge gradient module are added;
基于包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、分割模块和边缘梯度模块的模型继续进行模型训练,包括如下流程:Model training continues based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module and the edge gradient module, including the following process:
将训练样本中的输入图像输入至训练模型中,依次经过多尺度编码器模块、特征金字塔模块、多尺度解码器模块的处理得到第二初步分割掩模;Input the input image in the training sample into the training model, and obtain the second preliminary segmentation mask through the processing of the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module in turn;
将所述第二初步分割掩模和所述输入图像在通道维度进行拼接后输入至分割模块,由分割模块输出精细分割掩模;the second preliminary segmentation mask and the input image are concatenated along the channel dimension and then input to the segmentation module, which outputs the fine segmentation mask;
将所述输入图像输入至所述边缘梯度模块,由所述边缘梯度模块调用其所包括的索贝尔算子对所述输入图像进行相应的计算,得到所述输入图像的梯度图;Inputting the input image to the edge gradient module, and the edge gradient module invokes the Sobel operator included in the edge gradient module to perform corresponding calculations on the input image to obtain a gradient map of the input image;
获取训练样本中用作标注的第二标注分割掩模;Obtain the second annotation segmentation mask used as annotation in the training sample;
由所述边缘梯度模块调用其所包括的膨胀腐蚀模块对所述第二标注分割掩模进行膨胀腐蚀处理,得到边缘掩模;the edge gradient module invokes its dilation-erosion module to perform dilation and erosion on the second annotated segmentation mask to obtain an edge mask;
将所述输入图像的梯度图与所述边缘掩模相乘,得到所述输入图像对应的边缘梯度图;Multiplying the gradient map of the input image by the edge mask to obtain an edge gradient map corresponding to the input image;
将所述精细分割掩模与所述边缘掩模相乘,得到所述精细分割掩模对应的边缘概率预测图;multiplying the fine segmentation mask by the edge mask to obtain an edge probability prediction map corresponding to the fine segmentation mask;
计算所述精细分割掩模与所述第二标注分割掩模之间的交叉熵损失和结构相似性损失;calculating a cross-entropy loss and a structural similarity loss between the fine segmentation mask and the second annotated segmentation mask;
计算所述边缘梯度图与所述边缘概率预测图之间的边缘梯度损失;calculating the edge gradient loss between the edge gradient map and the edge probability prediction map;
对所述精细分割掩模与所述第二标注分割掩模之间的交叉熵损失和结构相似性损失以及所述边缘梯度图与所述边缘概率预测图之间的边缘梯度损失进行损失求和处理;summing the cross-entropy loss and the structural similarity loss between the fine segmentation mask and the second annotated segmentation mask, and the edge gradient loss between the edge gradient map and the edge probability prediction map;
根据计算得到的损失之和对训练模型执行反向传播算法,更新模型参数;Perform back-propagation algorithm on the training model according to the sum of the calculated losses, and update the model parameters;
在多个训练周期内重复模型训练的过程直至模型的损失函数完全收敛,保存模型及其模型参数;Repeat the process of model training in multiple training cycles until the loss function of the model is completely converged, and save the model and its model parameters;
将训练完成后得到的模型中的边缘梯度模块去除后得到的模型确定为预训练图像分割模型。The model obtained by removing the edge gradient module from the trained model is determined as the pre-trained image segmentation model.
在一种实施方式中,所述处理器403还可以执行:In one embodiment, the processor 403 may also execute:
对所述第一图像进行预设处理,所述预设处理包括随机裁剪和/或归一化处理;performing preset processing on the first image, the preset processing including random cropping and/or normalization processing;
所述将所述第一图像输入至所述预训练图像分割模型,包括:将所述第一图像经过所述预设处理后得到的图像输入至所述预训练图像分割模型。The inputting the first image into the pre-trained image segmentation model includes: inputting the image obtained after the preset processing of the first image into the pre-trained image segmentation model.
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见上文针对图像处理方法的详细描述,此处不再赘述。In the above-mentioned embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in a certain embodiment, reference may be made to the detailed description of the image processing method above, and details are not repeated here.
本申请实施例提供的所述图像处理装置与上文实施例中的图像处理方法属于同一构思,在所述图像处理装置上可以运行所述图像处理方法实施例中提供的任一方法,其具体实现过程详见所述图像处理方法实施例,此处不再赘述。The image processing apparatus provided in the embodiments of the present application and the image processing method in the above embodiments belong to the same concept; any method provided in the image processing method embodiments can be run on the image processing apparatus, and for its specific implementation process, refer to the embodiments of the image processing method, which will not be repeated here.
需要说明的是,对本申请实施例所述图像处理方法而言,本领域普通技术人员可以理解实现本申请实施例所述图像处理方法的全部或部分流程,是可以通过计算机程序来控制相关的硬件来完成,所述计算机程序可存储于一计算机可读取存储介质中,如存储在存储器中,并被至少一个处理器执行,在执行过程中可包括如所述图像处理方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储器(ROM,Read Only Memory)、随机存取记忆体(RAM,Random Access Memory)等。It should be noted that, for the image processing method described in the embodiments of the present application, those of ordinary skill in the art can understand that all or part of the processes for implementing the image processing method described in the embodiments of the present application can be completed by a computer program controlling the relevant hardware. The computer program can be stored in a computer-readable storage medium, for example in a memory, and executed by at least one processor, and the execution process may include the flow of the embodiments of the image processing method. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
对本申请实施例的所述图像处理装置而言,其各功能模块可以集成在一个处理芯片中,也可以是各个模块单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。所述集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中,所述存储介质譬如为只读存储器,磁盘或光盘等。For the image processing apparatus of the embodiments of the present application, its functional modules may be integrated into one processing chip, each module may exist physically alone, or two or more modules may be integrated into one module. The above integrated modules can be implemented in the form of hardware or in the form of software functional modules. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
以上对本申请实施例所提供的一种图像处理方法、装置、存储介质以及电子设备进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。The image processing method, apparatus, storage medium, and electronic device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application; the descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea. Meanwhile, for those skilled in the art, there will be changes in the specific implementations and application scope according to the idea of the present application. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims (20)

  1. 一种图像处理方法,其中,所述方法包括:An image processing method, wherein the method comprises:
    获取第一图像;get the first image;
    获取预训练图像分割模型,所述预训练图像分割模型用于输出图像的分割掩模,所述预训练图像分割模型至少包括分割模块,所述分割模块包括多个卷积网络块与至少一个卷积层,所述多个卷积网络块依次连接后再与所述至少一个卷积层连接,每一所述卷积网络块包括卷积层、批归一化层及非线性激活层;acquiring a pre-trained image segmentation model, where the pre-trained image segmentation model is used to output a segmentation mask of an image; the pre-trained image segmentation model includes at least a segmentation module, the segmentation module includes a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks are connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block includes a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
    将所述第一图像输入至所述预训练图像分割模型,由所述预训练图像分割模型输出所述第一图像对应的分割掩模;inputting the first image into the pre-training image segmentation model, and outputting a segmentation mask corresponding to the first image by the pre-training image segmentation model;
    根据所述第一图像对应的分割掩模从所述第一图像中分割出第二图像。A second image is segmented from the first image according to a segmentation mask corresponding to the first image.
  2. 根据权利要求1所述的图像处理方法,其中,所述预训练图像分割模型还包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块,所述多尺度编码器模块、特征金字塔模块、多尺度解码器模块、分割模块依次连接。The image processing method according to claim 1, wherein the pre-trained image segmentation model further comprises a multi-scale encoder module, a feature pyramid module, a multi-scale decoder module, the multi-scale encoder module, the feature pyramid module, The multi-scale decoder module and the segmentation module are connected in sequence.
  3. 根据权利要求2所述的图像处理方法,其中,在进行模型训练时,用于得到所述预训练图像分割模型的训练模型还包括深层特征监督模块,所述深层特征监督模块与所述特征金字塔模块连接,所述深层特征监督模块用于从多个尺度对深层特征进行监督;The image processing method according to claim 2, wherein, during model training, the training model used to obtain the pre-trained image segmentation model further comprises a deep feature supervision module, the deep feature supervision module and the feature pyramid module connection, the deep feature supervision module is used to supervise the deep features from multiple scales;
    在模型训练时,训练模型中的多尺度解码器输出与训练样本对应的第一初步分割掩模,训练模型中的深层特征监督模块输出与所述训练样本对应的N个深监督预测掩模,N为特征金字塔的层数;During model training, the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to the training samples, and the deep feature supervision module in the training model outputs N deep supervision prediction masks corresponding to the training samples, N is the number of layers of the feature pyramid;
    获取所述训练样本中用作标注的第一标注分割掩模;obtaining the first label segmentation mask used as label in the training sample;
    分别计算所述N个深监督预测掩模中的每一个掩模与所述第一标注分割掩模的交叉熵损失,以及计算所述第一初步分割掩模与所述第一标注分割掩模的交叉熵损失;separately calculating the cross-entropy loss between each of the N deep-supervision prediction masks and the first annotated segmentation mask, and calculating the cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask;
    根据计算得到的多个交叉熵损失,对训练模型执行反向传播算法,更新模型参数;According to the calculated multiple cross-entropy losses, the back-propagation algorithm is performed on the training model to update the model parameters;
    在多个训练周期内重复模型训练过程直至包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、深层特征监督模块的模型的损失函数完全收敛,保存模型且不冻结模型的参数。The model training process is repeated over multiple training cycles until the loss function of the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module is fully converged, the model is saved and the parameters of the model are not frozen.
  4. 根据权利要求3所述的图像处理方法,其中,训练模型还包括边缘梯度模块,所述边缘梯度模块用于提供边缘梯度损失函数作为模型训练时的其中一个损失函数。The image processing method according to claim 3, wherein the training model further comprises an edge gradient module, and the edge gradient module is configured to provide an edge gradient loss function as one of the loss functions during model training.
  5. 根据权利要求4所述的图像处理方法,其中,所述边缘梯度模块用于计算训练样本中的输入图像对应的边缘梯度图;The image processing method according to claim 4, wherein the edge gradient module is used to calculate the edge gradient map corresponding to the input image in the training sample;
    将训练样本中的输入图像输入至训练模型中,依次经过多尺度编码器模块、特征金字塔模块、多尺度解码器模块的处理得到第二初步分割掩模;Input the input image in the training sample into the training model, and obtain the second preliminary segmentation mask through the processing of the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module in turn;
    将所述第二初步分割掩模和所述输入图像输入至分割模块,由分割模块输出精细分割掩模;inputting the second preliminary segmentation mask and the input image to a segmentation module, and the segmentation module outputs a fine segmentation mask;
    所述边缘梯度模块用于计算所述精细分割掩模对应的边缘概率预测图;The edge gradient module is used to calculate the edge probability prediction map corresponding to the fine segmentation mask;
    所述边缘梯度模块提供的边缘梯度损失函数用于计算所述边缘梯度图和所述边缘概率预测图之间的边缘梯度损失。The edge gradient loss function provided by the edge gradient module is used to calculate the edge gradient loss between the edge gradient map and the edge probability prediction map.
  6. 根据权利要求5所述的图像处理方法,其中,将保存的包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、深层特征监督模块的模型中的所述深层特征监督模块移除,并加入分割模块和边缘梯度模块;The image processing method according to claim 5, wherein the deep feature supervision module is removed from the saved model comprising the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module, and a segmentation module and an edge gradient module are added;
    基于包括多尺度编码器模块、特征金字塔模块、多尺度解码器模块、分割模块和边缘梯度模块的模型继续进行模型训练,包括如下流程:Model training continues based on the model including the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module and the edge gradient module, including the following process:
    将训练样本中的输入图像输入至训练模型中,依次经过多尺度编码器模块、特征金字塔模块、多尺度解码器模块的处理得到第二初步分割掩模;Input the input image in the training sample into the training model, and obtain the second preliminary segmentation mask through the processing of the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module in turn;
    将所述第二初步分割掩模和所述输入图像在通道维度进行拼接后输入至分割模块,由分割模块输出精细分割掩模;the second preliminary segmentation mask and the input image are concatenated along the channel dimension and then input to the segmentation module, which outputs the fine segmentation mask;
    将所述输入图像输入至所述边缘梯度模块,由所述边缘梯度模块调用其所包括的索贝尔算子对所述输入图像进行相应的计算,得到所述输入图像的梯度图;inputting the input image to the edge gradient module, where the edge gradient module invokes the Sobel operator it includes to perform the corresponding calculation on the input image to obtain a gradient map of the input image;
    获取训练样本中用作标注的第二标注分割掩模;Obtain the second annotation segmentation mask used as annotation in the training sample;
    由所述边缘梯度模块调用其所包括的膨胀腐蚀模块对所述第二标注分割掩模进行膨胀腐蚀处理,得到边缘掩模;the edge gradient module invokes its dilation-erosion module to perform dilation and erosion on the second annotated segmentation mask to obtain an edge mask;
    将所述输入图像的梯度图与所述边缘掩模相乘,得到所述输入图像对应的边缘梯度图;Multiplying the gradient map of the input image by the edge mask to obtain an edge gradient map corresponding to the input image;
    将所述精细分割掩模与所述边缘掩模相乘,得到所述精细分割掩模对应的边缘概率预测图;multiplying the fine segmentation mask by the edge mask to obtain an edge probability prediction map corresponding to the fine segmentation mask;
    计算所述精细分割掩模与所述第二标注分割掩模之间的交叉熵损失和结构相似性损失;calculating a cross-entropy loss and a structural similarity loss between the fine segmentation mask and the second annotated segmentation mask;
    计算所述边缘梯度图与所述边缘概率预测图之间的边缘梯度损失;calculating the edge gradient loss between the edge gradient map and the edge probability prediction map;
    对所述精细分割掩模与所述第二标注分割掩模之间的交叉熵损失和结构相似性损失以及所述边缘梯度图与所述边缘概率预测图之间的边缘梯度损失进行损失求和处理;summing the cross-entropy loss and the structural similarity loss between the fine segmentation mask and the second annotated segmentation mask, and the edge gradient loss between the edge gradient map and the edge probability prediction map;
    根据计算得到的损失之和对训练模型执行反向传播算法,更新模型参数;Perform back-propagation algorithm on the training model according to the sum of the calculated losses, and update the model parameters;
    在多个训练周期内重复模型训练的过程直至模型的损失函数完全收敛,保存模型及其模型参数;Repeat the process of model training in multiple training cycles until the loss function of the model is completely converged, and save the model and its model parameters;
    将训练完成后得到的模型中的边缘梯度模块去除后得到的模型确定为预训练图像分割模型。The model obtained after removing the edge gradient module in the model obtained after training is determined as the pre-trained image segmentation model.
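The edge-gradient supervision described above (a Sobel gradient map, an edge mask from dilating and eroding the annotated mask, and their products) can be sketched with NumPy/SciPy stand-ins. This is an illustrative reading, not the patent's implementation: the band width and the exact loss formula (mean squared difference over edge pixels is one plausible choice) are assumptions, since the claims do not fix them.

```python
import numpy as np
from scipy import ndimage


def edge_mask(gt_mask, width=2):
    """Dilate and erode the annotated mask; their XOR is a band
    around the object boundary (the 'edge mask')."""
    dil = ndimage.binary_dilation(gt_mask, iterations=width)
    ero = ndimage.binary_erosion(gt_mask, iterations=width)
    return (dil ^ ero).astype(np.float32)


def gradient_map(image):
    """Sobel gradient magnitude of the input image."""
    gx = ndimage.sobel(image, axis=1)
    gy = ndimage.sobel(image, axis=0)
    return np.hypot(gx, gy)


def edge_gradient_loss(image, pred_mask, gt_mask):
    m = edge_mask(gt_mask)
    edge_grad = gradient_map(image) * m  # edge gradient map
    edge_prob = pred_mask * m            # edge probability prediction map
    # Mean squared difference over the edge band (assumed formula).
    denom = m.sum() + 1e-6
    return float((((edge_grad - edge_prob) ** 2) * m).sum() / denom)
```

Only pixels inside the boundary band contribute, which matches the claim's intent of penalizing the prediction specifically where the object edge lies.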
  7. The image processing method according to claim 1, wherein the method further comprises:
    performing preset processing on the first image, the preset processing comprising random cropping and/or normalization;
    wherein inputting the first image into the pre-trained image segmentation model comprises: inputting the image obtained after the first image undergoes the preset processing into the pre-trained image segmentation model.
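The preset processing in claim 7 (random cropping and/or normalization) could look like the following minimal sketch; the crop size and the zero-mean, unit-variance normalization scheme are illustrative assumptions, not values specified by the patent.

```python
import numpy as np


def preset_process(image, crop=224, rng=None):
    """Randomly crop an H x W x C image, then normalize each
    channel to zero mean and unit variance (assumed scheme)."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    patch = image[top:top + crop, left:left + crop].astype(np.float32)
    mean = patch.mean(axis=(0, 1), keepdims=True)
    std = patch.std(axis=(0, 1), keepdims=True) + 1e-6
    return (patch - mean) / std
```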
  8. An image processing apparatus, wherein the apparatus comprises:
    a first acquisition module configured to acquire a first image;
    a second acquisition module configured to acquire a pre-trained image segmentation model, the pre-trained image segmentation model being configured to output a segmentation mask of an image and comprising at least a segmentation module, the segmentation module comprising a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks being connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block comprising a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
    a processing module configured to input the first image into the pre-trained image segmentation model, the pre-trained image segmentation model outputting a segmentation mask corresponding to the first image; and
    a segmentation module configured to segment a second image from the first image according to the segmentation mask corresponding to the first image.
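One plausible PyTorch rendering of the segmentation module's structure (conv blocks of convolution + batch normalization + nonlinear activation, connected in sequence and followed by at least one plain convolutional layer). The channel counts, block count, and the four-channel input (RGB image concatenated with a preliminary mask) are assumptions for illustration.

```python
import torch
import torch.nn as nn


def conv_block(cin, cout):
    # conv + batch norm + nonlinear activation, as in the claim
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )


class SegmentationModule(nn.Module):
    def __init__(self, in_ch=4, mid_ch=32):
        super().__init__()
        # conv blocks connected in sequence ...
        self.blocks = nn.Sequential(
            conv_block(in_ch, mid_ch),
            conv_block(mid_ch, mid_ch),
        )
        # ... then at least one convolutional layer producing the mask
        self.head = nn.Conv2d(mid_ch, 1, kernel_size=1)

    def forward(self, x):
        return torch.sigmoid(self.head(self.blocks(x)))
```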
  9. The image processing apparatus according to claim 8, wherein the pre-trained image segmentation model further comprises a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module, and the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence.
  10. The image processing apparatus according to claim 9, wherein, during model training, the training model used to obtain the pre-trained image segmentation model further comprises a deep feature supervision module connected to the feature pyramid module, the deep feature supervision module being configured to supervise deep features at multiple scales;
    during model training, the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to a training sample, and the deep feature supervision module in the training model outputs N deep-supervision prediction masks corresponding to the training sample, where N is the number of levels of the feature pyramid;
    a first annotated segmentation mask used as the annotation in the training sample is obtained;
    a cross-entropy loss between each of the N deep-supervision prediction masks and the first annotated segmentation mask is calculated, and a cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask is calculated;
    a back-propagation algorithm is performed on the training model according to the calculated cross-entropy losses to update the model parameters; and
    the model training process is repeated over multiple training epochs until the loss function of the model comprising the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module fully converges, and the model is saved without freezing its parameters.
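The deep-supervision signal described in claim 10 (a cross-entropy loss for the preliminary mask and for each of the N deep-supervision prediction masks, all against the same annotation) can be sketched as follows. The binary cross-entropy form and the plain sum are assumptions; the patent does not state weights for the individual terms.

```python
import numpy as np


def bce(pred, target, eps=1e-6):
    """Binary cross-entropy between a predicted probability mask
    and a binary annotation mask."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-(target * np.log(pred)
                   + (1 - target) * np.log(1 - pred)).mean())


def deep_supervision_loss(preliminary, deep_preds, gt):
    """Sum the loss of the preliminary mask and of each of the
    N deep-supervision prediction masks against the annotation."""
    losses = [bce(preliminary, gt)] + [bce(p, gt) for p in deep_preds]
    return sum(losses)
```

The summed scalar is what back-propagation would then be run on to update the model parameters.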
  11. The image processing apparatus according to claim 10, wherein the training model further comprises an edge gradient module configured to provide an edge gradient loss function as one of the loss functions used during model training.
  12. The image processing apparatus according to claim 11, wherein the edge gradient module is configured to calculate an edge gradient map corresponding to an input image in a training sample;
    the input image in the training sample is input into the training model and processed in sequence by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module to obtain a second preliminary segmentation mask;
    the second preliminary segmentation mask and the input image are input into the segmentation module, and the segmentation module outputs a fine segmentation mask;
    the edge gradient module is configured to calculate an edge probability prediction map corresponding to the fine segmentation mask; and
    the edge gradient loss function provided by the edge gradient module is used to calculate an edge gradient loss between the edge gradient map and the edge probability prediction map.
  13. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed on a computer, the computer is caused to perform the method according to claim 1.
  14. An electronic device, comprising a memory and a processor, wherein the processor calls a computer program stored in the memory to perform: acquiring a first image;
    acquiring a pre-trained image segmentation model, the pre-trained image segmentation model being configured to output a segmentation mask of an image and comprising at least a segmentation module, the segmentation module comprising a plurality of convolutional network blocks and at least one convolutional layer, the plurality of convolutional network blocks being connected in sequence and then connected to the at least one convolutional layer, and each convolutional network block comprising a convolutional layer, a batch normalization layer, and a nonlinear activation layer;
    inputting the first image into the pre-trained image segmentation model, the pre-trained image segmentation model outputting a segmentation mask corresponding to the first image; and
    segmenting a second image from the first image according to the segmentation mask corresponding to the first image.
  15. The electronic device according to claim 14, wherein the pre-trained image segmentation model further comprises a multi-scale encoder module, a feature pyramid module, and a multi-scale decoder module, and the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the segmentation module are connected in sequence.
  16. The electronic device according to claim 15, wherein, during model training, the training model used to obtain the pre-trained image segmentation model further comprises a deep feature supervision module connected to the feature pyramid module, the deep feature supervision module being configured to supervise deep features at multiple scales;
    during model training, the multi-scale decoder in the training model outputs a first preliminary segmentation mask corresponding to a training sample, and the deep feature supervision module in the training model outputs N deep-supervision prediction masks corresponding to the training sample, where N is the number of levels of the feature pyramid;
    a first annotated segmentation mask used as the annotation in the training sample is obtained;
    a cross-entropy loss between each of the N deep-supervision prediction masks and the first annotated segmentation mask is calculated, and a cross-entropy loss between the first preliminary segmentation mask and the first annotated segmentation mask is calculated;
    a back-propagation algorithm is performed on the training model according to the calculated cross-entropy losses to update the model parameters; and
    the model training process is repeated over multiple training epochs until the loss function of the model comprising the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module fully converges, and the model is saved without freezing its parameters.
  17. The electronic device according to claim 16, wherein the training model further comprises an edge gradient module configured to provide an edge gradient loss function as one of the loss functions used during model training.
  18. The electronic device according to claim 17, wherein the edge gradient module is configured to calculate an edge gradient map corresponding to an input image in a training sample;
    the input image in the training sample is input into the training model and processed in sequence by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module to obtain a second preliminary segmentation mask;
    the second preliminary segmentation mask and the input image are input into the segmentation module, and the segmentation module outputs a fine segmentation mask;
    the edge gradient module is configured to calculate an edge probability prediction map corresponding to the fine segmentation mask; and
    the edge gradient loss function provided by the edge gradient module is used to calculate an edge gradient loss between the edge gradient map and the edge probability prediction map.
  19. The electronic device according to claim 18, wherein the deep feature supervision module is removed from the saved model comprising the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, and the deep feature supervision module, and the segmentation module and the edge gradient module are added;
    model training continues on the basis of the model comprising the multi-scale encoder module, the feature pyramid module, the multi-scale decoder module, the segmentation module, and the edge gradient module, and comprises the following flow:
    inputting an input image in a training sample into the training model, the input image being processed in sequence by the multi-scale encoder module, the feature pyramid module, and the multi-scale decoder module to obtain a second preliminary segmentation mask;
    concatenating the second preliminary segmentation mask and the input image along the channel dimension and inputting the result into the segmentation module, the segmentation module outputting a fine segmentation mask;
    inputting the input image into the edge gradient module, the edge gradient module invoking the Sobel operator it comprises to perform the corresponding calculation on the input image to obtain a gradient map of the input image;
    obtaining a second annotated segmentation mask used as the annotation in the training sample;
    invoking, by the edge gradient module, the dilation-erosion module it comprises to perform dilation and erosion processing on the second annotated segmentation mask to obtain an edge mask;
    multiplying the gradient map of the input image by the edge mask to obtain an edge gradient map corresponding to the input image;
    multiplying the fine segmentation mask by the edge mask to obtain an edge probability prediction map corresponding to the fine segmentation mask;
    calculating a cross-entropy loss and a structural similarity loss between the fine segmentation mask and the second annotated segmentation mask;
    calculating an edge gradient loss between the edge gradient map and the edge probability prediction map;
    summing the cross-entropy loss and the structural similarity loss between the fine segmentation mask and the second annotated segmentation mask and the edge gradient loss between the edge gradient map and the edge probability prediction map;
    performing a back-propagation algorithm on the training model according to the calculated sum of losses to update the model parameters;
    repeating the model training process over multiple training epochs until the loss function of the model fully converges, and saving the model and its parameters; and
    determining the model obtained by removing the edge gradient module from the trained model as the pre-trained image segmentation model.
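The structural similarity loss that the claims pair with cross-entropy is commonly computed as 1 − SSIM. Below is a global (non-windowed) sketch; windowed SSIM is equally common, and the constants are the usual SSIM defaults, assumed here rather than taken from the patent.

```python
import numpy as np


def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM between two masks, computed globally over the
    whole array (one simple variant of the SSIM loss)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return float(1.0 - ssim)
```

For identical masks the loss is 0; it grows as luminance, contrast, or structure diverge, which is why it complements per-pixel cross-entropy in the summed training objective.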
  20. The electronic device according to claim 14, wherein the processor is further configured to perform:
    performing preset processing on the first image, the preset processing comprising random cropping and/or normalization;
    wherein inputting the first image into the pre-trained image segmentation model comprises: inputting the image obtained after the first image undergoes the preset processing into the pre-trained image segmentation model.
PCT/CN2021/098905 2020-07-23 2021-06-08 Image processing method and apparatus, storage medium, and electronic device WO2022017025A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010718338.6A CN111862127A (en) 2020-07-23 2020-07-23 Image processing method, image processing device, storage medium and electronic equipment
CN202010718338.6 2020-07-23

Publications (1)

Publication Number Publication Date
WO2022017025A1 true WO2022017025A1 (en) 2022-01-27

Family

ID=72950390

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/098905 WO2022017025A1 (en) 2020-07-23 2021-06-08 Image processing method and apparatus, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN111862127A (en)
WO (1) WO2022017025A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114331918A (en) * 2022-03-08 2022-04-12 荣耀终端有限公司 Training method of image enhancement model, image enhancement method and electronic equipment
CN114565526A (en) * 2022-02-23 2022-05-31 杭州电子科技大学 Deep learning image restoration method based on gradient direction and edge guide
CN116206059A (en) * 2023-02-13 2023-06-02 北京医智影科技有限公司 Loss function calculation method and model training method
CN116671919A (en) * 2023-08-02 2023-09-01 电子科技大学 Emotion detection reminding method based on wearable equipment

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402258A (en) * 2020-03-12 2020-07-10 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN111862127A (en) * 2020-07-23 2020-10-30 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN112614101B (en) * 2020-12-17 2024-02-20 广东道氏技术股份有限公司 Polished tile flaw detection method based on multilayer feature extraction and related equipment
CN112580567B (en) * 2020-12-25 2024-04-16 深圳市优必选科技股份有限公司 Model acquisition method, model acquisition device and intelligent equipment
CN112785575B (en) * 2021-01-25 2022-11-18 清华大学 Image processing method, device and storage medium
CN113916897B (en) * 2021-12-15 2022-03-15 武汉三力国创机械设备工程有限公司 Filter element quality detection method based on image processing
CN117710969A (en) * 2024-02-05 2024-03-15 安徽大学 Cell nucleus segmentation and classification method based on deep neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780512A (en) * 2016-11-30 2017-05-31 厦门美图之家科技有限公司 Method of segmenting an image, application, and computing device
US20200058126A1 (en) * 2018-08-17 2020-02-20 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN111402258A (en) * 2020-03-12 2020-07-10 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN111862127A (en) * 2020-07-23 2020-10-30 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886971A (en) * 2019-01-24 2019-06-14 西安交通大学 A kind of image partition method and system based on convolutional neural networks

Also Published As

Publication number Publication date
CN111862127A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
WO2022017025A1 (en) Image processing method and apparatus, storage medium, and electronic device
CN109949255B (en) Image reconstruction method and device
CN110084274B (en) Real-time image semantic segmentation method and system, readable storage medium and terminal
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
CN110852383B (en) Target detection method and device based on attention mechanism deep learning network
CN110490082B (en) Road scene semantic segmentation method capable of effectively fusing neural network features
CN110717851A (en) Image processing method and device, neural network training method and storage medium
CN111091130A (en) Real-time image semantic segmentation method and system based on lightweight convolutional neural network
CN110569851B (en) Real-time semantic segmentation method for gated multi-layer fusion
CN116309648A (en) Medical image segmentation model construction method based on multi-attention fusion
CN111476133B (en) Unmanned driving-oriented foreground and background codec network target extraction method
WO2022213395A1 (en) Light-weighted target detection method and device, and storage medium
CN112989843B (en) Intention recognition method, device, computing equipment and storage medium
Hua et al. Dynamic scene deblurring with continuous cross-layer attention transmission
CN116563550A (en) Landslide interpretation semantic segmentation method, system, device and medium based on mixed attention
CN117496990A (en) Speech denoising method, device, computer equipment and storage medium
CN115471718A (en) Construction and detection method of lightweight significance target detection model based on multi-scale learning
CN116129119A (en) Rapid semantic segmentation network and semantic segmentation method integrating local and global features
CN112529064B (en) Efficient real-time semantic segmentation method
CN114627293A (en) Image matting method based on multi-task learning
CN114119627A (en) High-temperature alloy microstructure image segmentation method and device based on deep learning
CN113496228A (en) Human body semantic segmentation method based on Res2Net, TransUNet and cooperative attention
CN111798385A (en) Image processing method and device, computer readable medium and electronic device
CN114663774B (en) Lightweight salient object detection system and method
CN112132253A (en) 3D motion recognition method and device, computer readable storage medium and equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21847183

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21847183

Country of ref document: EP

Kind code of ref document: A1