WO2021057848A1

WO2021057848A1 - Network training method, image processing method, network, terminal device and medium

Info

Publication number: WO2021057848A1
Application number: PCT/CN2020/117470
Authority: WO
Inventors: 刘钰安
Original assignee: Oppo广东移动通信有限公司
Priority date: 2019-09-29
Filing date: 2020-09-24
Publication date: 2021-04-01
Also published as: CN110660066B; CN110660066A

Abstract

The present application provides a network training method, an image processing method, a network, a terminal device, and a medium. The training method comprise: S1, acquiring sample images containing a target object, a sample mask corresponding to each sample image, and sample edge information corresponding to the sample mask; S2, inputting each sample image to an image segmentation network to obtain a generated mask output by the image segmentation network; S3, inputting the generated mask to a trained edge neural network to obtain generated edge information output by the edge neural network; S4, determining a loss function according to the gap between the sample mask and the generated mask, and the gap between the generated edge information and the sample edge information; S5, adjusting parameters of the image segmentation network, and returning to S2 until the loss function is less than a threshold. The present application enables the contour edge of the target object to be more accurately represented on the mask image output by the image segmentation network.

Description

Network training method, image processing method, network, terminal equipment and medium

Technical field

This application relates to the field of image processing technology, and in particular to an image segmentation network training method, image processing method, image segmentation network, terminal equipment, and computer-readable storage medium.

Background technique

After the user has taken the image, he often wants to change the background in the image (for example, to change the background to an outdoor beach scene, or to change the background to a solid color background for taking ID photos). In order to achieve the above technical effects, the current commonly used method is: use the trained image segmentation network to output a mask that represents the area where the target object (that is, the foreground, such as a portrait) is located, and then use the mask to segment the target object Come out, and then change the image background.

However, the mask output by the current image segmentation network cannot accurately represent the contour edge of the target object, so that the target object cannot be accurately segmented, and the effect of replacing the image background is poor. Therefore, how to enable the mask output by the image segmentation network to more accurately represent the contour edge of the target object is a technical problem that needs to be solved urgently.

Application content

The purpose of the embodiments of this application is to provide an image segmentation network training method, image processing method, image segmentation network, terminal equipment, and computer-readable storage medium, which can make the output of the trained image segmentation network to a certain extent The mask can more accurately represent the contour edge of the target object.

The technical solutions adopted in the embodiments of this application are:

In the first aspect, an image segmentation network training method is provided, which includes steps S101-S105:

S101: Obtain each sample image containing a target object, a sample mask corresponding to each sample image, and sample edge information corresponding to each sample mask, where each sample mask is used to indicate the target in the corresponding sample image The image area where the object is located, each sample edge information is used to indicate the contour edge of the image area where the target object is indicated by the corresponding sample mask;

S102: For each sample image, input the sample image to an image segmentation network, and obtain a generation mask output by the image segmentation network for indicating the area where the target object in the sample image is located;

S103: For each generated mask, input the generated mask to the trained edge neural network to obtain generated edge information output by the edge neural network, and the generated edge information is used to indicate the target object indicated by the generated mask The contour edge of the area;

S104. Determine the loss function of the image segmentation network, where the loss function is used to measure the difference between the sample mask corresponding to each sample image and the generated mask, and the loss function is also used to measure each sample image Corresponding to the gap between the generated edge information and the sample edge information;

S105: Adjust various parameters of the image segmentation network, and then return to execute S102 until the loss function of the image segmentation network is less than a first preset threshold, thereby obtaining a trained image segmentation network.

In the second aspect, an image processing method is provided, including:

Obtain the image to be processed, and input the image to be processed into the trained image segmentation network to obtain the mask corresponding to the image to be processed, wherein the trained image segmentation network adopts a trained edge neural network After training, the trained edge neural network is used to output the contour edge of the area where the target object is indicated by the mask according to the input mask;

Based on the mask corresponding to the image to be processed, the target object contained in the image to be processed is segmented.

In a third aspect, an image segmentation network is provided, and the image segmentation network is obtained by training using the training method described in the first aspect.

In a fourth aspect, a terminal device is provided, including a memory, a processor, and a computer program that is stored in the memory and can run on the processor. When the processor executes the computer program, it implements the first aspect or the second aspect. The steps of the method described in the aspect.

In a fifth aspect, a computer-readable storage medium is provided, and the above-mentioned computer-readable storage medium stores a computer program, and when the above-mentioned computer program is executed by a processor, the steps of the method described in the first aspect or the second aspect are implemented.

In a sixth aspect, a computer program product is provided. The computer program product includes a computer program, and when the computer program is executed by one or more processors, the steps of the method described in the first aspect or the second aspect are implemented.

It can be seen from the above that in the training method provided by this application, when training the image segmentation network, the trained edge neural network will be used to train the image segmentation network.

First, use Figure 1 to describe the trained edge neural network. As shown in Figure 1, the trained edge neural network 001 is input to the edge neural network 001 according to the generated mask 002 indicated by the image area (pure) White area), output generated edge information 003, the edge information is used to indicate the location of the contour edge of the image area, the generated edge information 003 in Figure 1 is presented in the form of an image.

The training method provided by this application includes the following steps: First, for each sample, the sample image is input to the image segmentation network to obtain the generated mask output by the image segmentation network, and the generated mask is input to the training After the edge neural network, the generated edge information output by the edge neural network is obtained; secondly, the loss function of the image segmentation network is determined, and the loss function is positively correlated with the mask gap corresponding to each sample image (a sample image corresponds to The mask gap is the gap between the sample mask corresponding to the sample image and the generated mask), and the loss function and the edge gap corresponding to each sample image are also positively correlated (the edge gap corresponding to a sample image is the The difference between the sample edge information corresponding to the sample image and the generated edge information), and finally, adjust the various parameters of the image segmentation network until the loss function is less than the first preset threshold.

It can be seen that while the above training method ensures that the generated mask output by the image segmentation network is close to the sample mask, it will further ensure that the contour edge of the target object represented in the generated mask output by the image segmentation network is more consistent with the actual contour edge. For approximation, therefore, the mask image output by the image segmentation network provided by this application can more accurately represent the contour edge of the target object.

Description of the drawings

In order to more clearly describe the technical solutions in the embodiments of the present application, the following will briefly introduce the accompanying drawings that need to be used in the embodiments or exemplary technical descriptions. Obviously, the accompanying drawings in the following description are only of the present application. Some examples.

Fig. 1 is a schematic diagram of the working principle of a trained edge neural network provided by the present application;

FIG. 2 is a schematic diagram of a training method of an image segmentation network provided by Embodiment 1 of the present application;

FIG. 3 is a schematic diagram of a sample image, sample mask, and sample edge information provided in Embodiment 1 of the present application;

4 is a schematic structural diagram of an image segmentation network provided by Embodiment 1 of the present application;

FIG. 5 is a schematic diagram of the connection relationship between the image segmentation network provided in the first embodiment of the present application and the trained edge neural network;

FIG. 6 is a schematic diagram of the structure of the edge neural network provided in the first embodiment of the present application;

FIG. 7 is a schematic diagram of another image segmentation network training method provided in Embodiment 2 of the present application;

FIG. 8(a) is a schematic diagram of the training process of the edge segmentation network provided in the second embodiment of the present application;

Fig. 8(b) is a schematic diagram of the training process of the image segmentation network provided in the second embodiment of the present application;

FIG. 9 is a schematic diagram of the work flow of the image processing method provided in the third embodiment of the present application;

FIG. 10 is a schematic structural diagram of an image segmentation network training device provided in the fourth embodiment of the present application;

FIG. 11 is a schematic structural diagram of an image processing apparatus according to Embodiment 5 of the present application;

FIG. 12 is a schematic structural diagram of a terminal device according to Embodiment 6 of the present application.

detailed description

In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are proposed for a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted to avoid unnecessary details from obstructing the description of this application.

The method provided in the embodiments of the present application may be applicable to terminal devices. Illustratively, the terminal devices include, but are not limited to: smart phones, tablet computers, notebooks, desktop computers, cloud servers, and the like.

It should be understood that when used in this specification and the appended claims, the term "comprising" indicates the existence of the described features, wholes, steps, operations, elements and/or components, but does not exclude one or more other features The existence or addition of, whole, step, operation, element, component and/or its collection.

It should be further understood that the term "and/or" used in the specification and appended claims of this application refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations .

As used in this specification and the appended claims, the term "if" can be interpreted as "when" or "once" or "in response to determination" or "in response to detection" depending on the context . Similarly, the phrase "if determined" or "if detected [described condition or event]" can be interpreted as meaning "once determined" or "in response to determination" or "once detected [described condition or event]" depending on the context ]" or "in response to detection of [condition or event described]".

In addition, in the description of this application, the terms "first", "second", etc. are only used to distinguish the description, and cannot be understood as indicating or implying relative importance.

In order to illustrate the technical solutions provided by the present application, detailed descriptions are given below in conjunction with specific drawings and embodiments.

Example one

The following describes the training method of the image segmentation network provided in the first embodiment of the present application. Please refer to FIG. 2. The training method includes:

In step S101, each sample image containing the target object, the sample mask corresponding to each sample image, and the sample edge information corresponding to each sample mask are obtained, where each sample mask is used to indicate the corresponding sample In the image area where the target object is located, each sample edge information is used to indicate the contour edge of the image area where the target object is indicated by the corresponding sample mask.

In the embodiment of the present application, a part of the sample images can be obtained from the data set first, and then the number of sample images used for training the image segmentation network can be expanded in the following ways: mirror inversion, scale scaling and/or scaling of the sample images obtained in advance Gamma changes, etc., so as to increase the number of sample images, so as to obtain each sample image described in step S101.

The sample mask described in this application is a binary image. The method of obtaining the sample edge information corresponding to a certain sample mask in step S101 may be: performing an expansion operation on the sample mask to obtain a mask image after the expansion operation, and combining the mask image after the expansion operation with the By subtracting the sample mask, the edge information of the sample corresponding to the sample mask can be obtained. The edge information of the sample obtained in this way is the same as the sample mask, which is a binary image.

In order to facilitate those skilled in the art to have a more intuitive understanding of the sample image, sample mask, and sample edge information, the following will use FIG. 3 for description. As shown in FIG. 3, an image 201 is a sample image containing a target object (ie, a portrait), an image 202 may be a sample mask corresponding to the sample image 201, and an image 203 may be sample edge information corresponding to the sample mask 201. In addition, those skilled in the art should understand that the above sample edge information is not necessarily a binary image, but can also be other information expression forms, as long as it can reflect the "contour edge of the image area where the target object is indicated by the sample mask". .

In addition, those skilled in the art should be able to understand that the above-mentioned target object may be any subject, such as a portrait, a dog, a cat, etc., and this application does not limit the category of the target object.

In addition, in order to better train the image segmentation network, the image content contained in each sample image should be as different as possible. For example, if the above target object is a portrait, the image content contained in sample image 1 can be a frontal portrait of Xiao Ming. The image content contained in the sample image 2 may be a half-profile portrait of Xiaohong.

In step S102, for each sample image, the sample image is input to the image segmentation network, and a generation mask output by the image segmentation network for indicating the area of the target object in the sample image is obtained.

In the embodiment of the present application, before performing step S102, an image segmentation network needs to be established in advance, and the image segmentation network is used to output a mask corresponding to the image (that is, to generate a mask) according to the input image. The image segmentation network may be CNN (Convolutional Neural Networks, Convolutional Neural Network), or FPN (Feature Pyramid Networks, Feature Pyramid Network), and this application does not limit the specific network structure of the image segmentation network. The image segmentation network using the FPN structure can be specifically referred to in Figure 4.

After the above-mentioned image segmentation network is established, the step S102 is started to train the image segmentation network.

In the training process, each sample image needs to be input to the image segmentation network to obtain each generation mask output by the image segmentation network, where each generation mask corresponds to a sample image. In addition, those skilled in the art can easily understand that the "generating mask" described in this step is the same as the sample mask described in step S101, and may be a binary image.

In step S103, for each generated mask, input the generated mask to the trained edge neural network to obtain the generated edge information output by the edge neural network, and the generated edge information is used to indicate what the generated mask indicates The contour edge of the area where the target object is located.

Before performing this step S103, a trained edge neural network needs to be obtained. The trained edge neural network is used to generate a mask based on the input and output to generate edge information. The generated edge information is used to indicate the input generated mask The contour edge of the area where the indicated target object is located. In the embodiment of the present application, the edge neural network after training may be as shown in FIG. 1. In FIG. 1, after inputting the generated mask shown in 002 into the trained edge neural network shown in 001, the trained edge neural network shown in 001 will output the generated edge information shown in 003.

After the trained edge neural network is obtained, each of the generated masks described in step S102 is input to the trained edge neural network, and each generated edge information output by the trained edge neural network is obtained, where each Each generation edge information corresponds to a generation mask, which is used to represent the contour edge of the image area where the target object indicated by the generation mask is located.

In the embodiment of the present application, in the process of training the image segmentation network, the connection between the image segmentation network and the trained edge neural network is shown in FIG. 5.

In step S104, the loss function of the above-mentioned image segmentation network is determined. The loss function is used to measure the difference between the sample mask and the generated mask corresponding to each sample image, and the loss function is also used to measure each sample image. Respectively correspond to the gap between the generated edge information and the sample edge information.

Those skilled in the art can easily understand that each sample image corresponds to a sample mask, sample edge information, mask generation, and edge information generation. In order to obtain the loss function described in step S104, for each sample image, it is necessary to calculate the gap between the sample mask corresponding to the sample image and the generated mask (for the convenience of subsequent description, it is defined that for a certain sample image, The gap between the sample mask corresponding to the sample image and the generated mask is the mask gap corresponding to the sample image), it is also necessary to calculate the gap between the sample edge information corresponding to the sample image and the generated edge information (for the convenience of subsequent description, the definition of For a sample image, the difference between the sample edge information corresponding to the sample image and the generated edge information is the edge difference corresponding to the sample image).

In step S104, the loss function of the aforementioned image segmentation network needs to be calculated. The loss function and each sample image are used to measure the difference between the sample mask and the generated mask corresponding to each sample image, and the loss function is also used To measure the gap between the generated edge information and the sample edge information corresponding to each sample image, that is, the loss function is positively correlated with the mask gap corresponding to each sample image, and the loss function is related to each sample image. The corresponding marginal gaps are also positively correlated.

In the embodiment of the present application, the calculation process of the aforementioned loss function may be:

Step A. For each sample image, calculate the image difference between the generated mask corresponding to the sample image and the sample mask corresponding to the sample image (that is, it can be regarded as

m1 _i is the pixel value of the i-th pixel of the generated mask, m2 _i is the pixel value of the i-th pixel of the sample mask, and M is the total number of pixels of the generated mask).

Step B. If the above sample edge information and generated edge information are both images, then for each sample image, calculate the image difference between the sample edge information corresponding to the sample image and the generated edge information corresponding to the sample image (calculation of the image difference Refer to step A).

Step C. The image differences obtained in the above step A and the image differences obtained in the image B can be averaged (if the number of sample images is N, the image differences obtained in step A and the image differences obtained in step B can be calculated And then divide by 2N) to get the loss function.

However, the calculation method of the aforementioned loss function is not limited to the aforementioned step A-step C. In the embodiment of the present application, the aforementioned loss function can also be calculated by the following formula (1):

Among them, LOSS ₁ is the loss function of the above image segmentation network, N is the total number of sample images, F1 _{j is} used to measure the gap between the sample mask corresponding to the jth sample image and the generated mask, and F2 _{j is} used to measure the The difference between the sample edge information corresponding to j sample images and the generated edge information,

In the embodiment of the present application, _{the calculation method of F1 j} may be: calculating the cross entropy loss between the sample mask and the generated mask corresponding to the j-th sample image, and the specific formula is as follows:

Among them, M is the total number of pixels in the j-th sample image, _{the value of y ji} is determined according to the sample mask corresponding to the j-th sample image, and y _{ji is} used to indicate the i-th sample image in the j-th sample image. Whether the pixel is in the image area where the target object is located, p _ji is the probability that the i-th pixel in the j-th sample image predicted by the image segmentation network is in the image area where the target object is located, x is the logarithm log Bottom value.

In the embodiment of the present application, _{the value of y ji} is determined according to the sample mask corresponding to the j-th sample image. For example, if the sample mask corresponding to the j-th sample image is in the sample mask, it indicates that the j-th sample image is If i pixels are located in the image area where the target object is located, y _ji can be 1. If the sample mask corresponding to the j-th sample image is in the sample mask, it indicates that the i-th pixel in the j-th sample image is not located where the target object is In the image area, y _ji can be 0. Those skilled in the art should understand that _{the value of y ji} is not limited to 1 and 0, and can also be other values. The value of y _ji is preset, such as 1 or 0.

Those skilled in the art should note that when the sample mask indicates that the i-th pixel is located in the image area where the target object is located, _{the value of y ji} is greater than when the sample mask indicates that the i-th pixel is not located in the image area where the target object is located. The value of y _{ji at time.} That is, if the sample mask indicates that the i-th pixel is located in the image area where the target object is located, y _ji is 1, otherwise, y _ji is 0. Or, if the sample mask indicates that the i-th pixel is located in the image area where the target object is located, y _ji is 2, otherwise, y _ji is 1. Or, if the sample mask indicates that the i-th pixel is located in the image area where the target object is located, y _ji is 0.8, otherwise, y _ji is 0.2.

In the embodiment of the present application, _{the calculation method of F2 j} may be similar to the above formula (2), that is, calculating the cross entropy loss of the sample edge information corresponding to the j-th sample image and the generated edge information.

Or, if the above-mentioned trained edge neural network is formed by cascading A convolutional blocks, and each convolutional block has B convolutional layers, correspondingly, _{the calculation formula of F2 j} can be as follows:

Among them, mask ₁ is the generation mask corresponding to the j-th sample image, mask ₂ is the sample mask corresponding to the j-th sample image, and h _c (mask ₁ ) is when the trained edge neural network input is mask ₁ , the first _{The output of c convolutional blocks, h c} (mask ₂ ) is the output of the c-th convolutional block when the trained edge neural network input is mask ₂ , and λ _c is a constant.

In the above _{calculation formula of F2 j} , when the trained edge neural network input is mask ₂ , the output of the last convolution block can be considered equivalent to the sample edge information. Therefore, the sample edge information can be measured by the above formula (3) The gap with generating edge information.

As shown in Figure 6, the edge neural network can be formed by cascading three convolutional blocks, and each convolutional block is a convolutional layer.

In step S105, it is determined whether the aforementioned loss function is less than the first preset threshold, if so, step S107 is executed, otherwise, step S106 is executed.

In step S106, adjust each parameter of the above-mentioned image segmentation network, and then return to perform step S102.

In step S107, a trained image segmentation network is obtained.

That is, the parameters of the image segmentation network are continuously adjusted until the loss function is less than the first preset threshold. In addition, in the embodiments of the present application, the parameter adjustment method is not specifically limited, and a gradient descent algorithm, a power update algorithm, etc. can be used, and the method used for adjusting the parameters is not limited here.

In addition, in the embodiment of the present application, when the image segmentation network is trained, before the sample image is input to the image segmentation network, the sample image can be preprocessed first, and then the preprocessed sample image can be input to the image Split the network. Wherein, the above-mentioned preprocessing may include: image cropping and/or normalization processing and so on.

After the above step S107, the test set can also be used to evaluate the trained image segmentation network. Wherein, the method of obtaining the test set can be referred to the prior art, and will not be repeated here.

For a single sample image in the test set, the evaluation function can be:

Where X is the image area of the target object indicated by the generated mask output by the image segmentation network after the sample image is input to the trained image segmentation network.

Y is the image area of the target object indicated by the sample mask corresponding to the sample image.

The IoU (Intersection-over-Union) of X and Y is used to evaluate the image segmentation network after training. The closer the value of IoU is to 1, the better the performance of the image segmentation network after training. By evaluating the trained image segmentation network, we can further evaluate whether the performance of the trained image segmentation network meets the requirements. For example, if it is determined that the performance of the trained image segmentation network does not meet the requirements, the training of the trained image segmentation network is continued.

The training method provided in the first embodiment of the application ensures that the generated mask output by the image segmentation network is close to the sample mask, and at the same time, it will further ensure that the contour edge of the target object represented in the generated mask output by the image segmentation network is true The contour edge of is closer. Therefore, the image corresponding to the generated mask output by the image segmentation network provided by this application can more accurately represent the contour edge of the target object.

Example two

The following describes another image segmentation network training method provided in the second embodiment of the present application. Compared with the training method described in the first embodiment, the training method includes the training process of the edge neural network. Please refer to Figure 7. The training method includes:

In step S301, each sample image containing the target object, the sample mask corresponding to each sample image, and the sample edge information corresponding to each sample mask are obtained, where each sample mask is used to indicate the corresponding sample In the image area where the target object is located, each sample edge information is used to indicate the contour edge of the image area where the target object is indicated by the corresponding sample mask.

For the specific implementation process of step S301, please refer to the part of step S101 in the first embodiment, which will not be repeated here.

In step S302, for each sample mask, the sample mask is input to the edge neural network to obtain edge information output by the edge neural network, and the edge information is used to indicate the area where the target object indicated by the sample mask is located Contour edges.

In the embodiment of the present application, the step S302 to the subsequent step S306 are the training process of the edge neural network to obtain the trained edge neural network. Those skilled in the art should be able to understand that the steps S302-S306 are executed before the subsequent step S308, and need not be executed before the step S307.

Before performing step S302, an edge neural network needs to be established in advance, and the edge neural network is used to obtain the contour edge of the area where the target object indicated by the input sample mask is located. As shown in Fig. 6, the edge neural network can be formed by cascading three convolutional layers.

In step S302, each sample mask is input to the edge neural network to obtain each edge information output by the edge neural network, wherein each sample mask corresponds to one edge information output by the edge neural network.

In step S303, the loss function of the edge neural network is determined, and the loss function is used to measure the difference between the sample edge information corresponding to each sample mask and the edge information output by the edge neural network.

In the embodiment of the present application, the specific meaning of step S303 is to determine the loss function of the above-mentioned edge neural network, where the loss function is positively correlated with the edge gap corresponding to each sample mask (the edge gap is corresponding to the sample mask). The difference between the edge information of the sample and the edge information output by the edge neural network after the sample mask is input to the edge neural network).

In the embodiment of the present application, the calculation method of the loss function of the above-mentioned edge neural network may be:

If the edge information of the sample and the edge information output by the edge neural network are both the image shown in 003 in Figure 1, the loss function of the edge neural network may be the difference between the edge information of the sample and the edge information output by the edge neural network. Image difference (the image difference calculation method can be referred to as described in step A in the first embodiment, which will not be repeated here).

In addition, the calculation method of the loss function of the edge neural network may be: for each sample mask, calculate the cross-entropy loss of the corresponding sample edge information and the edge information output by the edge neural network, and then calculate the average. The specific calculation formula is as follows:

Among them, LOSS ₂ is the loss function of the aforementioned edge neural network, and N is the total number of sample masks (it is easy for those skilled in the art to understand that the total number of sample images, sample masks, and sample edge information are all the same, and they are all N ), M is the total number of pixels in the j-th sample mask, _{the value of r ji} is determined according to the sample edge information corresponding to the j-th sample image, and r _{ji is} used to indicate the number of pixels in the j-th sample mask. Whether i pixels are contour edges, q _ji is the probability that the i-th pixel in the j-th sample mask predicted by the edge neural network is the contour edge, and x is the bottom value of the logarithm log.

In the embodiment of the present application, _{the value of r ji} is determined according to the sample edge information corresponding to the jth sample mask. For example, if the sample edge information corresponding to the jth sample mask indicates the jth sample mask If the i-th pixel in the film is a contour edge, r _ji can be 1. If the sample edge information corresponding to the j-th sample mask indicates that the i-th pixel is not a contour edge, then r _ji can be 0 . Those skilled in the art should understand that _{the value of r ji} is not limited to 1 and 0, and may also be other values. The value of r _ji is preset, such as 1 or 0.

Those skilled in the art should be noted that, when the edge of the sample information indicates the i-th pixel as the edge contour, the value of r _ji when the sample is greater than the edge information indicates an i-th pixel is not the value of r _ji edge contour. That is, if the sample edge information indicates that the i-th pixel is a contour edge, r _ji is 1, otherwise, r _ji is 0. Or, if the sample edge information indicates that the i-th pixel is a contour edge, r _ji is 2, otherwise, r _ji is 1. Or, if the sample edge information indicates that the i-th pixel is a contour edge, r _ji is 0.8, otherwise, r _ji is 0.2.

In step S304, it is determined whether the loss function of the aforementioned edge neural network is less than a second preset threshold, if not, step S305 is executed, and if yes, step S306 is executed.

In step S305, adjust each parameter of the above-mentioned edge neural network model, and then return to step S302.

In step S306, a trained edge neural network is obtained.

That is, the parameters of the edge neural network are continuously adjusted until the loss function is less than the second preset threshold. In addition, in the embodiments of the present application, the parameter adjustment method is not specifically limited, and a gradient descent algorithm, a power update algorithm, etc. can be used, and the method used for adjusting the parameters is not limited here.

In step S307, for each sample image, the sample image is input to the image segmentation network, and the generated mask output by the image segmentation network for indicating the area of the target object in the sample image is obtained.

In step S308, for each generated mask, input the generated mask to the trained edge neural network to obtain the generated edge information output by the edge neural network, and the generated edge information is used to indicate what the generated mask indicates The contour edge of the area where the target object is located.

In step S309, the loss function of the above-mentioned image segmentation network is determined. The loss function is used to measure the difference between the sample mask corresponding to each sample image and the generated mask, and the loss function is also used to measure each sample image. Respectively correspond to the gap between the generated edge information and the sample edge information.

In step S310, it is determined whether the aforementioned loss function is less than a first preset threshold, if so, step S312 is executed, otherwise, step S311 is executed.

In step S311, adjust the various parameters of the above-mentioned image segmentation network, and then return to perform step S102.

In step S312, a trained image segmentation network is obtained.

The specific implementation manners of the above steps S307-S312 are completely the same as the specific implementation manners of the steps S102-S107 in the first embodiment. For details, please refer to the description of the first embodiment, which will not be repeated here.

The following briefly describes the training process described in the second embodiment of the present application by using FIG. 8.

As shown in Figure 8(a), the training process of the edge neural network. First, input the sample mask into the edge neural network to obtain the edge information output by the edge neural network. Secondly, the cross-entropy loss is calculated according to the edge neural network and the edge information of each sample. The edge information of the sample is obtained by the expansion operation and the subtraction operation of the sample mask. For details, please refer to the description of the first embodiment, which will not be repeated here. Then, the various cross entropy losses are averaged to obtain the loss function. Finally, continuously adjust the various parameters of the edge neural network until the loss function is small, so as to obtain the edge neural network after training.

After the trained edge neural network is obtained, see Figure 8(b) to realize the training of the image segmentation network.

As shown in Figure 8(b), it is the training process of the image segmentation network. First, input the sample image into the image segmentation network to obtain the generated mask output by the image segmentation network, and calculate the generated mask and sample mask The cross entropy loss. Secondly, the generated mask is input to the trained edge neural network to obtain the output of each convolutional layer, and the sample mask is input to the trained edge neural network to obtain the output of each convolutional layer. Then, according to the calculated cross-entropy loss, the output of each convolutional layer when the generated mask is input, and the output of each convolutional layer when the sample mask is input, the loss function of the image segmentation network is calculated (the specific calculation method can be See the description of Example 1). Finally, the parameters of the image segmentation network are continuously adjusted until the loss function is small, so as to obtain the image segmentation network after training.

Compared with the first embodiment, the training method described in the second embodiment of the present application has an additional training process of the edge neural network, which can make the samples used for training the edge neural network consistent with the samples used for training the image segmentation network Therefore, the accuracy of the edge of the mask output by the image segmentation network can be better measured according to the output result of the edge neural network, so as to better train the image segmentation network.

Example three

The third embodiment of the present application provides an image processing method. Please refer to FIG. 9. The image processing method includes:

In step S401, an image to be processed is obtained, and the image to be processed is input to the trained image segmentation network to obtain a mask corresponding to the image to be processed, wherein the trained image segmentation network uses the trained edge The neural network is trained, and the trained edge neural network is used to output the edge contour of the area where the target object indicated by the mask is located according to the input mask.

Specifically, the trained edge neural network described in this step S401 is a neural network obtained by training using the method described in the first or second embodiment above.

In step S402, the target objects contained in the image to be processed are segmented based on the mask corresponding to the image to be processed.

It is easy for those skilled in the art to understand that after the above step S402, a specific operation of changing the background can also be performed. This operation is in the prior art and will not be repeated here.

The method described in the third embodiment can be a method applied in a terminal device (such as a mobile phone). This method can facilitate the user to replace the background in the image to be processed. This method can accurately segment the target object and more accurately replace the background. Can improve user experience to a certain extent.

Example four

The fourth embodiment of the present application provides a training device for an image segmentation network. For ease of description, only the parts related to the present application are shown. As shown in FIG. 10, the training device 500 includes:

The sample acquisition module 501 is used to acquire each sample image containing the target object, a sample mask corresponding to each sample image, and sample edge information corresponding to each sample mask, wherein each sample mask is used to indicate Corresponding to the image area where the target object is located in the sample image, each sample edge information is used to indicate the contour edge of the image area where the target object indicated by the corresponding sample mask is located.

The generation mask acquisition module 502 is configured to input the sample image to the image segmentation network for each sample image, and obtain the generation mask output by the image segmentation network for indicating the area of the target object in the sample image.

The generated edge acquisition module 503 is used to input the generated mask to the trained edge neural network for each generated mask to obtain the generated edge information output by the edge neural network, and the generated edge information is used to indicate the generated mask. The contour edge of the area where the target object indicated by the film is located.

The loss determination module 504 is used to determine the loss function of the image segmentation network. The loss function is used to measure the difference between the sample mask and the generated mask corresponding to each sample image, and the loss function is also used to Measure the gap between the generated edge information and the sample edge information corresponding to each sample image.

The parameter adjustment module 505 is used to adjust various parameters of the image segmentation network, and then trigger the generation mask acquisition module to continue to perform corresponding steps until the loss function of the image segmentation network is less than the first preset threshold, thereby Get the trained image segmentation network.

Optionally, the aforementioned loss determination module 504 is specifically configured to:

Determine the loss function of the image segmentation network, and the calculation formula of the loss function is:

Among them, LOSS ₁ is the loss function of the image segmentation network, N is the total number of sample images, F1 _{j is} used to measure the gap between the sample mask corresponding to the jth sample image and the generated mask, and F2 _{j is} used to measure The difference between the sample edge information corresponding to the j-th sample image and the generated edge information,

Optionally, the calculation formula of _{F1 j is as follows:}

Further, when the sample mask indicates the i-th pixel of the image region of the target object is located, the value of y _ji is greater than when the sample mask indicates the i-th pixel of the image region of the target object is not located in the y _ji value.

Optionally, the above-mentioned trained edge neural network is formed by cascading A convolutional blocks, and each convolutional block is composed of B convolutional layers.

Correspondingly, the calculation formula for the _{above F2 j is as follows:}

Optionally, the above-mentioned training device further includes an edge neural network training module, and the edge neural network training module includes:

The edge information acquisition unit is used to input the sample mask to the edge neural network for each sample mask to obtain the edge information output by the edge neural network, and the edge information is used to indicate the target object indicated by the sample mask The contour edge of the area.

The edge loss determining unit is used to determine the loss function of the edge neural network, and the loss function is used to measure the difference between the sample edge information corresponding to each sample mask and the edge information output by the edge neural network.

The edge parameter adjustment unit is used to adjust various parameters of the edge neural network, and then trigger the edge information acquisition unit to continue to perform corresponding steps until the loss function value of the edge neural network is less than the second preset threshold, thereby obtaining training After the edge of the neural network.

Optionally, the aforementioned edge loss determining unit is specifically used for:

Determine the loss function of the edge neural network, and the calculation formula of the loss function is:

Among them, LOSS ₂ is the loss function of the edge neural network, N is the total number of sample images, M is the total number of pixels in the j-th sample mask, and _{the value of r ji} is based on the j-th sample image The corresponding sample edge information is determined, r _{ji is} used to indicate whether the i-th pixel in the j-th sample mask is a contour edge, and q _ji is the i-th pixel in the j-th sample mask predicted by the edge neural network The pixel point is the probability of the edge of the contour, and x is the base value of the logarithm log.

Further, when the sample information indicates the i-th edge pixel as an edge contour, the value of r _ji when the sample is greater than the edge information indicates an i-th pixel is not the value of r _ji edge contour.

It should be noted that the information interaction and execution process among the above-mentioned devices/units are based on the same concept as the method embodiment 1 and method embodiment 2 of this application. For specific functions and technical effects, please refer to The corresponding method embodiment part will not be repeated here.

Example five

The fifth embodiment of the present application provides an image processing device. For ease of description, only the parts related to the present application are shown. As shown in FIG. 11, the image processing apparatus 600 includes:

The mask acquisition module 601 is used to acquire the image to be processed, and input the image to be processed into the trained image segmentation network to obtain the mask corresponding to the image to be processed, wherein the trained edge neural network is used After training, the trained edge neural network is used to output the edge contour of the area where the target object indicated by the mask is located according to the input mask (specifically, the trained image segmentation network adopts the method as in the first embodiment or The training method described in the second embodiment is obtained through training).

The target object segmentation module 602 is configured to segment the target object contained in the image to be processed based on the mask corresponding to the image to be processed.

It should be noted that the information interaction and execution process between the above-mentioned devices/units are based on the same concept as the third method of this application. For specific functions and technical effects, please refer to the third part of the third method of the application. , I won’t repeat it here.

Example Six

FIG. 12 is a schematic diagram of a terminal device provided in Embodiment 6 of the present application. As shown in FIG. 12, the terminal device 700 of this embodiment includes a processor 701, a memory 702, and a computer program 703 that is stored in the memory 702 and can run on the processor 701. The above-mentioned processor 701 implements the steps in the above-mentioned method embodiments when the above-mentioned computer program 703 is executed. Alternatively, when the processor 701 executes the computer program 703, the function of each module/unit in the foregoing device embodiments is realized.

Exemplarily, the foregoing computer program 703 may be divided into one or more modules/units, and the foregoing one or more modules/units are stored in the foregoing memory 702 and executed by the foregoing processor 701 to complete the present application. The foregoing one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the foregoing computer program 703 in the foregoing terminal device 700. For example, the aforementioned computer program 703 can be divided into a sample acquisition module, a mask generation module, an edge generation module, a loss determination module, and a parameter adjustment module. The specific functions of each module are as follows:

S101: Obtain each sample image containing a target object, a sample mask corresponding to each sample image, and sample edge information corresponding to each sample mask, where each sample mask is used to indicate the target in the corresponding sample image In the image area where the object is located, the edge information of each sample is used to indicate the contour edge of the image area where the target object is indicated by the corresponding sample mask.

S102: For each sample image, input the sample image to an image segmentation network, and obtain a generation mask output by the image segmentation network for indicating a region where a target object in the sample image is located.

S103: For each generated mask, input the generated mask to the trained edge neural network to obtain generated edge information output by the edge neural network, and the generated edge information is used to indicate the target object indicated by the generated mask The contour edge of the area.

S104. Determine the loss function of the image segmentation network, where the loss function is used to measure the difference between the sample mask corresponding to each sample image and the generated mask, and the loss function is also used to measure each sample image Respectively correspond to the gap between the generated edge information and the sample edge information.

Alternatively, the aforementioned computer program 703 can be divided into a mask acquisition module and a target object segmentation module, and the specific functions of each module are as follows:

Obtain the image to be processed, and input the image to be processed into the trained image segmentation network to obtain the mask corresponding to the image to be processed, wherein the trained image segmentation network adopts the first embodiment or the embodiment The training method described in the second is obtained by training.

The foregoing terminal device may include, but is not limited to, a processor 701 and a memory 702. Those skilled in the art can understand that FIG. 12 is only an example of the terminal device 700, and does not constitute a limitation on the terminal device 700. It may include more or less components than those shown in the figure, or a combination of certain components, or different components. For example, the aforementioned terminal device may also include input and output devices, network access devices, buses, and so on.

The so-called processor 701 may be a central processing unit (Central Processing Unit, CPU), other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.

The foregoing memory 702 may be an internal storage unit of the foregoing terminal device 700, such as a hard disk or a memory of the terminal device 700. The memory 702 may also be an external storage device of the terminal device 700, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, and a flash memory equipped on the terminal device 700. Card (Flash Card), etc. Further, the aforementioned memory 702 may also include both an internal storage unit of the aforementioned terminal device 700 and an external storage device. The above-mentioned memory 702 is used to store the above-mentioned computer program and other programs and data required by the above-mentioned terminal device. The aforementioned memory 702 can also be used to temporarily store data that has been output or will be output.

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, only the division of the above functional units and modules is used as an example. In practical applications, the above functions can be allocated to different functional units and modules as needed. Module completion, that is, the internal structure of the above device is divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist alone physically, or two or more units can be integrated into one unit. The above-mentioned integrated units can be hardware-based Formal realization can also be realized in the form of software functional units. In addition, the specific names of the functional units and modules are only used to facilitate distinguishing from each other, and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the foregoing system, reference may be made to the corresponding process in the foregoing method embodiment, which will not be repeated here.

In the above-mentioned embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail or recorded in an embodiment, reference may be made to related descriptions of other embodiments.

A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.

In the embodiments provided in this application, it should be understood that the disclosed device/terminal device and method may be implemented in other ways. For example, the device/terminal device embodiments described above are only illustrative. For example, the division of the above-mentioned modules or units is only a logical function division, and there may be other division methods in actual implementation, such as multiple units or Components can be combined or integrated into another system, or some features can be omitted or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

If the above-mentioned integrated modules/units are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer readable storage medium. Based on this understanding, this application implements all or part of the processes in the foregoing method embodiments, and can also be completed by instructing relevant hardware through a computer program. The foregoing computer program may be stored in a computer-readable storage medium. When the program is executed by the processor, it can implement the steps of the foregoing method embodiments. Wherein, the above-mentioned computer program includes computer program code, and the above-mentioned computer program code may be in the form of source code, object code, executable file, or some intermediate forms. The above-mentioned computer-readable medium may include: any entity or device capable of carrying the above-mentioned computer program code, recording medium, U disk, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM, Read-Only Memory), random Access memory (RAM, Random Access Memory), electric carrier signal, telecommunications signal, software distribution medium, etc. It should be noted that the content contained in the above-mentioned computer-readable media can be appropriately added or deleted in accordance with the requirements of the legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to the legislation and patent practice, the computer-readable media cannot Including electric carrier signal and telecommunication signal.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, rather than to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still compare the foregoing embodiments. The recorded technical solutions are modified, or some of the technical features are equivalently replaced; these modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the application, and should be included in this Within the scope of protection applied for.

Claims

A method for training an image segmentation network, which is characterized in that it includes:

S101: Obtain each sample image containing a target object, a sample mask corresponding to each sample image, and sample edge information corresponding to each sample mask, where each sample mask is used to indicate the target in the corresponding sample image The image area where the object is located, each sample edge information is used to indicate the contour edge of the image area where the target object is indicated by the corresponding sample mask;

S102: For each sample image, input the sample image to an image segmentation network, and obtain a generation mask output by the image segmentation network for indicating the area where the target object in the sample image is located;

S103: For each generated mask, input the generated mask to the trained edge neural network to obtain generated edge information output by the edge neural network, and the generated edge information is used to indicate the target object indicated by the generated mask The contour edge of the area;

S104. Determine the loss function of the image segmentation network, where the loss function is used to measure the difference between the sample mask corresponding to each sample image and the generated mask, and the loss function is also used to measure each sample image Corresponding to the gap between the generated edge information and the sample edge information;

S105: Adjust various parameters of the image segmentation network, and then return to execute S102 until the loss function of the image segmentation network is less than a first preset threshold, thereby obtaining a trained image segmentation network.
The training method according to claim 1, wherein the determining the loss function of the image segmentation network, the loss function is used to measure the difference between the sample mask and the generated mask corresponding to each sample image, In addition, the loss function is also used to measure the gap between the generated edge information and the sample edge information corresponding to each sample image, including:

Determine the loss function of the image segmentation network, the loss function is positively related to the mask gap corresponding to each sample image, and the loss function is positively related to the edge gap corresponding to each sample image, where each The mask gap corresponding to each sample image is the gap between the sample mask corresponding to the sample image and the generated mask, and the edge gap corresponding to each sample image is the gap between the sample edge information corresponding to the sample image and the generated edge information.
The training method according to claim 2, wherein the loss function of the image segmentation network is determined, and the loss function is positively correlated with the mask gap corresponding to each sample image, and the loss function It is positively related to the edge gap corresponding to each sample image, including:

Determine the loss function of the image segmentation network, and the calculation formula of the loss function is:

Among them, LOSS 1 is the loss function of the image segmentation network, N is the total number of sample images, F1 j is used to measure the gap between the sample mask corresponding to the jth sample image and the generated mask, and F2 j is used to measure The difference between the sample edge information corresponding to the j-th sample image and the generated edge information,
The training method according to claim 3, wherein the calculation formula of F1 j is as follows:

Among them, M is the total number of pixels in the j-th sample image, the value of y ji is determined according to the sample mask corresponding to the j-th sample image, and y ji is used to indicate the i-th sample image in the j-th sample image. Whether the pixel is in the image area where the target object is located, p ji is the probability that the i-th pixel in the j-th sample image predicted by the image segmentation network is in the image area where the target object is located, x is the logarithm log Bottom value

Further, when the sample mask indicates the i-th pixel of the image region of the target object is located, the value of y ji is greater than when the sample mask indicates the i-th pixel of the image region of the target object is not located in the y ji value.
The training method according to claim 1, wherein said determining the loss function of the image segmentation network comprises:

For each sample image, calculate the image difference between the generated mask corresponding to the sample image and the sample mask corresponding to the sample image;

If the foregoing sample edge information and the generated edge information are both images, then for each sample image, calculate the image difference between the sample edge information corresponding to the sample image and the generated edge information corresponding to the sample image;

The image difference of the sample mask and the image difference of the generated edge information are averaged to obtain the loss function of the image segmentation network.
The training method according to claim 3, wherein the trained edge neural network is formed by cascading A convolutional blocks, and each convolutional block is composed of B convolutional layers;

Correspondingly, the calculation formula of F2 j is as follows:

Among them, mask 1 is the generation mask corresponding to the j-th sample image, mask 2 is the sample mask corresponding to the j-th sample image, and h c (mask 1 ) is when the trained edge neural network input is mask 1 , the first The output of c convolutional blocks, h c (mask 2 ) is the output of the c-th convolutional block when the trained edge neural network input is mask 2 , and λ c is a constant.
The training method according to any one of claims 1 to 6, characterized in that, before the step S103, the training method further comprises a training process for the edge neural network, and the training process for the edge neural network as follows:

For each sample mask, input the sample mask to the edge neural network to obtain edge information output by the edge neural network, and the edge information is used to indicate the contour edge of the area where the target object indicated by the sample mask is located;

Determining a loss function of the edge neural network, where the loss function is used to measure the difference between the sample edge information corresponding to each sample mask and the edge information output by the edge neural network;

Adjust each parameter of the edge neural network, and then return to the execution of the steps for each sample mask, input the sample mask to the edge neural network, and obtain the edge information output by the edge neural network and the subsequent steps until all The value of the loss function of the edge neural network is less than the second preset threshold, so that the edge neural network after training is obtained.
The training method according to claim 7, wherein the determining the loss function of the edge neural network, the loss function is used to measure the sample edge information corresponding to each sample mask and the output of the edge neural network The marginal information gaps include:

Determine the loss function of the edge neural network, and the calculation formula of the loss function is:

Among them, LOSS 2 is the loss function of the edge neural network, N is the total number of sample images, M is the total number of pixels in the j-th sample mask, and the value of r ji is based on the j-th sample image The corresponding sample edge information is determined, r ji is used to indicate whether the i-th pixel in the j-th sample mask is a contour edge, and q ji is the i-th pixel in the j-th sample mask predicted by the edge neural network The probability that the pixel is the edge of the contour, x is the base value of the logarithm log;

Further, when the sample information indicates the i-th edge pixel as an edge contour, the value of r ji when the sample is greater than the edge information indicates an i-th pixel is not the value of r ji edge contour.
The training method according to any one of claims 1 to 6, characterized in that, after obtaining the trained image segmentation network, it comprises:

The trained image segmentation network is evaluated according to the sample images of the test set and the evaluation function, where the evaluation function is:

Wherein, X is the image area of the target object indicated by the generation mask output by the image segmentation network after any sample image in the test set is input to the trained image segmentation network;

Y is the image area of the target object indicated by the sample mask corresponding to the sample image of the input image segmentation network after training;

The closer the IoU value is to 1, the better the performance of the image segmentation network after training.
An image processing method, characterized in that it comprises:

Obtain the image to be processed, and input the image to be processed into the trained image segmentation network to obtain the mask corresponding to the image to be processed, wherein the trained image segmentation network adopts a trained edge neural network After training, the trained edge neural network is used to output the contour edge of the area where the target object is indicated by the mask according to the input mask;

Based on the mask corresponding to the image to be processed, the target object contained in the image to be processed is segmented.
10. The image processing method of claim 10, wherein the trained image segmentation network is obtained by training using a trained edge neural network, and the trained edge neural network is used according to the input mask, Output the contour edge of the area where the target object indicated by the mask is located, including:

The trained image segmentation network is obtained by using the training method according to any one of claims 1 to 9.
An image segmentation network, characterized in that the image segmentation network is trained by the training method according to any one of claims 1 to 9.
A terminal device includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the following steps when the processor executes the computer program:

S101: Obtain each sample image containing a target object, a sample mask corresponding to each sample image, and sample edge information corresponding to each sample mask, where each sample mask is used to indicate the target in the corresponding sample image The image area where the object is located, each sample edge information is used to indicate the contour edge of the image area where the target object is indicated by the corresponding sample mask;

S102: For each sample image, input the sample image to an image segmentation network, and obtain a generation mask output by the image segmentation network for indicating the area where the target object in the sample image is located;

S103: For each generated mask, input the generated mask to the trained edge neural network to obtain generated edge information output by the edge neural network, and the generated edge information is used to indicate the target object indicated by the generated mask The contour edge of the area;

S104. Determine the loss function of the image segmentation network, where the loss function is used to measure the difference between the sample mask corresponding to each sample image and the generated mask, and the loss function is also used to measure each sample image Corresponding to the gap between the generated edge information and the sample edge information;

S105: Adjust various parameters of the image segmentation network, and then return to execute S102 until the loss function of the image segmentation network is less than a first preset threshold, thereby obtaining a trained image segmentation network.
The terminal device according to claim 13, wherein when the processor executes the determination of the loss function of the image segmentation network, the method comprises:

Determine the loss function of the image segmentation network, the loss function is positively related to the mask gap corresponding to each sample image, and the loss function is positively related to the edge gap corresponding to each sample image, where each The mask gap corresponding to each sample image is the gap between the sample mask corresponding to the sample image and the generated mask, and the edge gap corresponding to each sample image is the gap between the sample edge information corresponding to the sample image and the generated edge information.
The terminal device according to claim 14, wherein the processor executes the determination of the loss function of the image segmentation network, and the loss function is positively correlated with the mask gap corresponding to each sample image, and , When the loss function is positively correlated with the edge gap corresponding to each sample image, it includes:

Determine the loss function of the image segmentation network, and the calculation formula of the loss function is:

Among them, LOSS 1 is the loss function of the image segmentation network, N is the total number of sample images, F1 j is used to measure the gap between the sample mask corresponding to the jth sample image and the generated mask, and F2 j is used to measure The difference between the sample edge information corresponding to the j-th sample image and the generated edge information,
The terminal device according to claim 15, wherein the calculation formula of F1 j is as follows:

Among them, M is the total number of pixels in the j-th sample image, the value of y ji is determined according to the sample mask corresponding to the j-th sample image, and y ji is used to indicate the i-th sample image in the j-th sample image. Whether the pixel is in the image area where the target object is located, p ji is the probability that the i-th pixel in the j-th sample image predicted by the image segmentation network is in the image area where the target object is located, x is the logarithm log Bottom value

Further, when the sample mask indicates the i-th pixel of the image region of the target object is located, the value of y ji is greater than when the sample mask indicates the i-th pixel of the image region of the target object is not located in the y ji value.
The terminal device according to claim 15, wherein the trained edge neural network is formed by cascading A convolutional blocks, and each convolutional block is composed of B convolutional layers;

Correspondingly, the calculation formula of F2 j is as follows:

Among them, mask 1 is the generation mask corresponding to the j-th sample image, mask 2 is the sample mask corresponding to the j-th sample image, and h c (mask 1 ) is when the trained edge neural network input is mask 1 , the first The output of c convolutional blocks, h c (mask 2 ) is the output of the c-th convolutional block when the trained edge neural network input is mask 2 , and λ c is a constant.
The terminal device according to any one of claims 13 to 17, wherein the execution of the computer program by the processor includes a training process for the edge neural network, and the training process for the edge neural network is as follows:

For each sample mask, input the sample mask to the edge neural network to obtain edge information output by the edge neural network, and the edge information is used to indicate the contour edge of the area where the target object indicated by the sample mask is located;

Determining a loss function of the edge neural network, where the loss function is used to measure the difference between the sample edge information corresponding to each sample mask and the edge information output by the edge neural network;

Adjust each parameter of the edge neural network, and then return to the execution of the steps for each sample mask, input the sample mask to the edge neural network, and obtain the edge information output by the edge neural network and the subsequent steps until all The value of the loss function of the edge neural network is less than the second preset threshold, so that the edge neural network after training is obtained.
The terminal device of claim 18, wherein the processor is executing the determination of the loss function of the edge neural network, and the loss function is used to measure the sample edge information corresponding to each sample mask and The gap of the edge information output by the edge neural network includes:

Determine the loss function of the edge neural network, and the calculation formula of the loss function is:

Among them, LOSS 2 is the loss function of the edge neural network, N is the total number of sample images, M is the total number of pixels in the j-th sample mask, and the value of r ji is based on the j-th sample image The corresponding sample edge information is determined, r ji is used to indicate whether the i-th pixel in the j-th sample mask is a contour edge, and q ji is the i-th pixel in the j-th sample mask predicted by the edge neural network The probability that the pixel is the edge of the contour, and x is the base value of the logarithm log;

Further, when the sample information indicates the i-th edge pixel as an edge contour, the value of r ji when the sample is greater than the edge information indicates an i-th pixel is not the value of r ji edge contour.
A computer-readable storage medium storing a computer program, wherein the computer program implements the steps of the method according to any one of claims 1 to 11 when the computer program is executed by a processor.