CN110660066B - Training method of network, image processing method, network, terminal equipment and medium - Google Patents

Training method of network, image processing method, network, terminal equipment and medium

Info

Publication number
CN110660066B
CN110660066B CN201910931784.2A
Authority
CN
China
Prior art keywords
sample
mask
image
edge
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910931784.2A
Other languages
Chinese (zh)
Other versions
CN110660066A (en)
Inventor
刘钰安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910931784.2A priority Critical patent/CN110660066B/en
Publication of CN110660066A publication Critical patent/CN110660066A/en
Priority to PCT/CN2020/117470 priority patent/WO2021057848A1/en
Application granted granted Critical
Publication of CN110660066B publication Critical patent/CN110660066B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a network training method, an image processing method, a network, terminal equipment and a medium. The training method comprises the following steps: S1, acquiring a sample image containing a target object, a sample mask corresponding to the sample image and sample edge information corresponding to the sample mask; S2, inputting the sample image into an image segmentation network to obtain a generated mask output by the image segmentation network; S3, inputting the generated mask into a trained edge neural network to obtain generated edge information output by the edge neural network; S4, determining a loss function according to the difference between the sample mask and the generated mask and the difference between the generated edge information and the sample edge information; S5, adjusting each parameter of the image segmentation network, and returning to S2 until the loss function is smaller than the threshold value. The method and the device enable the mask output by the image segmentation network to represent the contour edge of the target object more accurately.

Description

Training method of network, image processing method, network, terminal equipment and medium
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a training method of an image segmentation network, an image processing method, an image segmentation network, terminal equipment and a computer readable storage medium.
Background
After capturing an image, the user often wishes to change its background (e.g., change the background to an outdoor beach scene, or change the background to a solid-color background for an ID photo). To achieve this effect, the current common practice is as follows: a trained image segmentation network outputs a mask representing the region where the target object (i.e., the foreground, such as a portrait) is located, and the mask is then used to segment the target object so that the image background can be replaced.
However, the mask currently output by an image segmentation network cannot accurately represent the contour edge of the target object, so the target object cannot be accurately segmented and the background-replacement effect is poor. How to make the mask output by the image segmentation network represent the contour edge of the target object more accurately is therefore a technical problem to be solved.
Disclosure of Invention
In view of this, the embodiments of the present application provide a training method of an image segmentation network, an image processing method, an image segmentation network, a terminal device, and a computer-readable storage medium, which can enable a mask output by the trained image segmentation network to more accurately represent a contour edge of a target object to a certain extent.
A first aspect of an embodiment of the present application provides a training method of an image segmentation network, including steps S101 to S105:
s101, acquiring each sample image containing a target object, a sample mask corresponding to each sample image and sample edge information corresponding to each sample mask, wherein each sample mask is used for indicating an image area where the target object is located in the corresponding sample image, and each sample edge information is used for indicating the contour edge of the image area where the target object indicated by the corresponding sample mask is located;
s102, for each sample image, inputting the sample image into an image segmentation network to obtain a generated mask which is output by the image segmentation network and used for indicating the region where a target object is located in the sample image;
s103, for each generated mask, inputting the generated mask into the trained edge neural network to obtain generated edge information output by the edge neural network, wherein the generated edge information is used for indicating the contour edge of the area where the target object indicated by the generated mask is located;
s104, determining a loss function of the image segmentation network, wherein the loss function is used for measuring the difference between a sample mask and a generated mask corresponding to each sample image, and the loss function is also used for measuring the difference between generated edge information and sample edge information corresponding to each sample image;
S105, adjusting each parameter of the image segmentation network, and then returning to S102 until the loss function of the image segmentation network is smaller than a first preset threshold value, so as to obtain the trained image segmentation network.
A second aspect of an embodiment of the present application provides an image processing method, including:
acquiring an image to be processed, and inputting the image to be processed into a trained image segmentation network to obtain a mask corresponding to the image to be processed, wherein the trained image segmentation network is obtained by training with a trained edge neural network, and the trained edge neural network is used for outputting, according to an input mask, the contour edge of the area where the target object indicated by the mask is located;
and segmenting the target object contained in the image to be processed based on the mask corresponding to the image to be processed.
A third aspect of embodiments of the present application provides an image segmentation network trained using the training method described in the first aspect above.
A fourth aspect of the embodiments of the present application provides a terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to the first or second aspect when executing the computer program.
A fifth aspect of the embodiments of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to the first or second aspect.
A sixth aspect of the present application provides a computer program product comprising a computer program which, when executed by one or more processors, performs the steps of the method according to the first or second aspect.
In view of the above, in the training method provided by the present application, when the image segmentation network is trained, the trained edge neural network is used to train the image segmentation network.
First, a trained edge neural network is described with reference to fig. 1, as shown in fig. 1, the trained edge neural network 001 outputs edge information 003 according to an image area (solid white area) where a target object indicated by a mask 002 input to the edge neural network 001 is located, the edge information is used to indicate a position where a contour edge of the image area is located, and the edge information 003 in fig. 1 is presented in an image form.
The training method provided by the application comprises the following steps: first, for each sample image, the sample image is input into the image segmentation network to obtain a generated mask output by the image segmentation network, and the generated mask is input into the trained edge neural network to obtain generated edge information output by the edge neural network; second, a loss function of the image segmentation network is determined, wherein the loss function is positively correlated with the mask gap corresponding to each sample image (the mask gap of a sample image is the gap between the sample mask and the generated mask corresponding to that sample image) and positively correlated with the edge gap corresponding to each sample image (the edge gap of a sample image is the gap between the sample edge information and the generated edge information corresponding to that sample image); finally, each parameter of the image segmentation network is adjusted until the loss function is smaller than a first preset threshold value.
Therefore, while ensuring that the generated mask output by the image segmentation network approximates the sample mask, the training method further ensures that the contour edge of the target object represented in the generated mask approximates the actual contour edge, so that the mask output by the image segmentation network provided by the application represents the contour edge of the target object more accurately.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the following description will briefly introduce the drawings that are needed in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the present application.
Fig. 1 is a schematic diagram of the working principle of a trained edge neural network provided in the present application;
fig. 2 is a schematic diagram of a training method of an image segmentation network according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a sample image, a sample mask and sample edge information according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an image segmentation network according to a first embodiment of the present disclosure;
fig. 5 is a schematic diagram of a connection relationship between an image segmentation network and a trained edge neural network according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an edge neural network according to an embodiment of the present disclosure;
fig. 7 is a schematic diagram of a training method of another image segmentation network according to the second embodiment of the present application;
fig. 8 is a schematic diagram of a training process of an image segmentation network according to a second embodiment of the present application;
fig. 9 is a schematic workflow diagram of an image processing method according to a third embodiment of the present application;
Fig. 10 is a schematic structural diagram of a training device of an image segmentation network according to a fourth embodiment of the present application;
fig. 11 is a schematic structural view of an image processing apparatus according to a fifth embodiment of the present application;
fig. 12 is a schematic structural diagram of a terminal device according to a sixth embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
The method provided by the embodiment of the application can be applied to a terminal device, and the terminal device includes, but is not limited to: smart phones, tablet computers, notebooks, desktop computers, cloud servers, etc.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrases "if it is determined" or "if [ a described condition or event ] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [ the described condition or event ]" or "in response to detecting [ the described condition or event ]".
In addition, in the description of the present application, the terms "first," "second," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
In order to illustrate the technical solutions described in the present application, the following description is made by specific examples.
Example 1
In the following, a training method of an image segmentation network according to an embodiment of the present application will be described, with reference to fig. 2, where the training method includes:
In step S101, each sample image including the target object, a sample mask corresponding to each sample image, and sample edge information corresponding to each sample mask are obtained, where each sample mask is used to indicate an image area where the target object in the corresponding sample image is located, and each sample edge information is used to indicate a contour edge of the image area where the target object indicated by the corresponding sample mask is located;
In an embodiment of the present application, a portion of the sample images may be acquired from a dataset, and the number of sample images used for training the image segmentation network may then be expanded by performing mirror flipping, scaling and/or gamma transformation on the sample images acquired in advance, thereby increasing the number of sample images and obtaining the sample images described in step S101.
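By way of a non-limiting illustration of the augmentation described above (mirror flipping, scaling and gamma transformation), the following sketch uses OpenCV and NumPy; the function name, scale factor and gamma value are assumptions for illustration only and are not part of the patent.

```python
import cv2
import numpy as np

def augment_sample(image, mask, scale=0.9, gamma=1.2):
    """Expand one (image, mask) pair by mirroring, scaling and gamma change.

    A minimal sketch; the scale and gamma values are illustrative assumptions.
    """
    samples = [(image, mask)]

    # Horizontal mirror flip; the mask must be flipped identically.
    samples.append((cv2.flip(image, 1), cv2.flip(mask, 1)))

    # Isotropic scaling of both image and mask (nearest neighbour keeps the mask binary).
    h, w = image.shape[:2]
    size = (int(w * scale), int(h * scale))
    samples.append((cv2.resize(image, size),
                    cv2.resize(mask, size, interpolation=cv2.INTER_NEAREST)))

    # Gamma transformation changes only the image brightness; the mask is unchanged.
    lut = np.array([255.0 * (i / 255.0) ** gamma for i in range(256)], dtype=np.uint8)
    samples.append((cv2.LUT(image, lut), mask))

    return samples
```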
The sample mask is a binary image. The sample edge information corresponding to a certain sample mask in step S101 may be obtained as follows: perform a dilation operation on the sample mask to obtain a dilated mask image, and then subtract the sample mask from the dilated mask image to obtain the sample edge information corresponding to the sample mask. The sample edge information obtained in this way is, like the sample mask, a binary image.
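A minimal sketch of the dilation-then-subtraction operation described above, assuming the sample mask is an 8-bit binary image (0/255); the 5×5 kernel size is an illustrative assumption.

```python
import cv2
import numpy as np

def edge_from_mask(sample_mask, kernel_size=5):
    """Derive sample edge information from a binary sample mask.

    Dilate the mask, then subtract the original mask, leaving a thin band of
    foreground pixels along the contour edge of the target-object region.
    """
    kernel = np.ones((kernel_size, kernel_size), dtype=np.uint8)
    dilated = cv2.dilate(sample_mask, kernel)
    # The difference is itself a binary image of the same size as the mask.
    return cv2.subtract(dilated, sample_mask)
```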
For a more intuitive understanding of the sample image, sample mask and sample edge information, reference is made to fig. 3. As shown in fig. 3, the image 201 is a sample image containing a target object (i.e., a portrait), the image 202 may be the sample mask corresponding to the sample image 201, and the image 203 may be the sample edge information corresponding to the sample mask 202. In addition, it should be understood by those skilled in the art that the sample edge information is not necessarily a binary image and may take other representations, as long as it can represent the contour edge of the image area where the target object indicated by the sample mask is located.
In addition, it should be understood by those skilled in the art that the target object may be any subject of photographing such as a portrait, a dog, a cat, etc., and the category of the target object is not limited in this application.
In addition, for better training of the image segmentation network, the image content of the sample images should differ as much as possible. For example, if the target object is a portrait, the image content of sample image 1 may be a brightly lit frontal portrait, while the image content of sample image 2 may be a half-length side portrait.
In step S102, for each sample image, inputting the sample image to an image segmentation network, to obtain a generated mask output by the image segmentation network for indicating an area where a target object in the sample image is located;
In this embodiment, before step S102 is executed, an image segmentation network needs to be established in advance; the image segmentation network is used to output a mask corresponding to an input image. The image segmentation network may be a CNN (Convolutional Neural Network) or an FPN (Feature Pyramid Network), and the specific network structure of the image segmentation network is not limited in this application. An image segmentation network using the FPN structure is shown in fig. 4.
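Since the patent does not limit the segmentation network to a particular structure, the sketch below is an assumed minimal encoder-decoder CNN that maps an RGB image to a single-channel foreground-probability mask; it is a stand-in for concreteness only, not the FPN of fig. 4.

```python
import torch
import torch.nn as nn

class SimpleSegNet(nn.Module):
    """Assumed minimal encoder-decoder; outputs per-pixel foreground probability."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        # Sigmoid maps the decoder output to a mask of probabilities in [0, 1].
        return torch.sigmoid(self.decoder(self.encoder(x)))
```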
After the image segmentation network is established, the step S102 is started to train the image segmentation network.
In the training process, each sample image needs to be input into the image segmentation network to obtain the generated masks output by the image segmentation network, where each generated mask corresponds to one sample image. In addition, it will be readily understood by those skilled in the art that the "generated mask" in this step, like the sample mask in step S101, may be a binary image.
In step S103, for each generated mask, inputting the generated mask to the trained edge neural network to obtain generated edge information output by the edge neural network, where the generated edge information is used to indicate a contour edge of an area where the target object indicated by the generated mask is located;
before executing the step S103, a trained edge neural network needs to be acquired, where the trained edge neural network is used to output edge information according to an input mask, where the edge information is used to indicate an outline edge of an area where a target object indicated by the input mask is located. In the embodiment of the present application, the trained edge neural network may output the edge information shown in 003 as shown in fig. 1.
After the trained edge neural network is obtained, each generated mask in step S102 is input to the trained edge neural network, so as to obtain each generated edge information output by the trained edge neural network, where each generated edge information corresponds to a generated mask and is used for representing the contour edge of the image area where the target object indicated by the generated mask is located.
In the embodiment of the present application, in the process of training the image segmentation network, the connection manner between the image segmentation network and the trained edge neural network is shown in fig. 5.
In step S104, determining a loss function of the image segmentation network, where the loss function is used to measure a difference between a sample mask and a generated mask corresponding to each sample image, and the loss function is also used to measure a difference between generated edge information and sample edge information corresponding to each sample image;
one skilled in the art will readily appreciate that each sample image corresponds to one sample mask, sample edge information, a production mask, and production edge information. To obtain the loss function in step S104, for each sample image, a difference between the sample mask corresponding to the sample image and the generated mask needs to be calculated (for convenience of description, a difference between the sample mask corresponding to the sample image and the generated mask is defined as a mask difference corresponding to the sample image for a certain sample image), and a difference between the sample edge information corresponding to the sample image and the generated edge information needs to be calculated (for convenience of description, a difference between the sample edge information corresponding to the sample image and the generated edge information is defined as an edge difference corresponding to the sample image for a certain sample image).
In step S104, the loss function of the image segmentation network needs to be calculated. The loss function is used to measure the difference between the sample mask and the generated mask corresponding to each sample image, and is also used to measure the difference between the generated edge information and the sample edge information corresponding to each sample image, that is: the loss function is positively correlated with the mask gap corresponding to each sample image, and positively correlated with the edge gap corresponding to each sample image.
In the embodiment of the present application, the calculation process of the loss function may be:
Step A, for each sample image, calculating the image difference between the generated mask corresponding to the sample image and the sample mask corresponding to the sample image. The image difference may be (1/M) · Σ_{i=1}^{M} |m1_i − m2_i|, where m1_i is the pixel value of the i-th pixel point of the generated mask, m2_i is the pixel value of the i-th pixel point of the sample mask, and M is the total number of pixel points of the generated mask;
Step B, if the sample edge information and the generated edge information are both images, calculating, for each sample image, the image difference between the sample edge information and the generated edge information corresponding to the sample image (the image difference can be calculated as described in step A);
Step C, averaging the image differences obtained in step A and the image differences obtained in step B (that is, if the number of sample images is N, summing the image differences obtained in steps A and B and dividing by 2N) to obtain the loss function.
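A sketch of steps A–C in PyTorch; the mean-absolute form of the per-image difference follows the definitions of m1_i, m2_i and M above but is an assumption as to the exact formula.

```python
import torch

def image_difference(pred, target):
    """Per-image difference of step A: mean absolute pixel difference."""
    return torch.mean(torch.abs(pred - target))

def loss_steps_abc(gen_masks, sample_masks, gen_edges, sample_edges):
    """Steps A-C: sum the N mask differences and N edge differences, divide by 2N."""
    n = len(gen_masks)
    total = sum(image_difference(m1, m2) for m1, m2 in zip(gen_masks, sample_masks))
    total = total + sum(image_difference(e1, e2) for e1, e2 in zip(gen_edges, sample_edges))
    return total / (2 * n)
```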
However, the calculation method of the loss function is not limited to steps A-C. In the embodiment of the present application, the loss function may also be calculated by the following formula (1):
LOSS_1 = (1/N) · Σ_{j=1}^{N} (F1_j + F2_j)    (1)
where LOSS_1 is the loss function of the image segmentation network, N is the total number of sample images, F1_j is used to measure the difference between the sample mask and the generated mask corresponding to the j-th sample image, and F2_j is used to measure the difference between the sample edge information and the generated edge information corresponding to the j-th sample image.
in the embodiment of the present application, F1 is as described above j The calculation method of (1) can be as follows: calculating a sample corresponding to the jth sample imageThe cross entropy loss of the mask and the generated mask is specifically expressed as follows:
wherein M is the total number of pixel points in the jth sample image, y ji The value of (2) is determined according to the sample mask corresponding to the jth sample image, y ji For indicating whether the ith pixel point in the jth sample image is in the image area where the target object is located, p ji And predicting the probability of the ith pixel point in the image area where the target object is located in the jth sample image for the image segmentation network, wherein x is the bottom value of log.
In the examples of the present application, y ji Is determined from the sample mask corresponding to the jth sample image, such as: if the ith pixel point in the jth sample image is indicated to be located in the image area where the target object is located in the sample mask corresponding to the jth sample image, y ji May be 1, if the ith pixel point in the jth sample image is indicated to be not located in the image area where the target object is located in the sample mask corresponding to the jth sample image, y ji May be 0. As will be appreciated by those skilled in the art, y ji The values of (2) are not limited to 1 and 0, but may be other values. y is ji The value of (2) is preset, for example, 1 or 0.
It should be noted by those skilled in the art that when the sample mask indicates that the ith pixel location is in the image area where the target object is located, y ji The value of (2) is greater than y when the sample mask indicates that the ith pixel point is not located in the image area where the target object is located ji Is a value of (2). That is, if the sample mask indicates the image region where the i-th pixel point is located, y ji 1, otherwise, y ji Is 0; or, if the sample mask indicates that the ith pixel point is located in the image area where the target object is located, y ji 2, otherwise, y ji 1 is shown in the specification; or, if the sample mask indicates that the ith pixel point is located in the image area where the target object is located, y ji 0.8, otherwise, y ji 0.2.
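A sketch of the F1_j term of formula (2) for one sample image, under the assumed choices y_ji ∈ {0, 1} and natural-log base:

```python
import torch

def f1_cross_entropy(p, y, eps=1e-7):
    """Binary cross entropy between predicted foreground probabilities p_ji
    and sample-mask labels y_ji, averaged over the M pixel points of one image."""
    p = p.clamp(eps, 1.0 - eps)  # avoid log(0)
    return -(y * torch.log(p) + (1.0 - y) * torch.log(1.0 - p)).mean()
```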
In the embodiment of the present application, F2_j may be calculated in a manner similar to formula (2), namely: calculating the cross entropy loss between the sample edge information and the generated edge information corresponding to the j-th sample image.
Alternatively, if the trained edge neural network is formed by cascading A convolution blocks, each having B convolution layers, F2_j may be calculated by the following formula (3):
F2_j = Σ_{c=1}^{A} λ_c · ‖ h_c(mask_1) − h_c(mask_2) ‖    (3)
where mask_1 is the generated mask corresponding to the j-th sample image, mask_2 is the sample mask corresponding to the j-th sample image, h_c(mask_1) is the output of the c-th convolution block when the input of the trained edge neural network is mask_1, h_c(mask_2) is the output of the c-th convolution block when the input of the trained edge neural network is mask_2, and λ_c is a constant.
In formula (3) for F2_j, when the input of the trained edge neural network is mask_2, the output of the last convolution block may be regarded as equivalent to the sample edge information; therefore, the difference between the sample edge information and the generated edge information can be measured by formula (3).
As shown in fig. 6, the edge neural network may be formed by cascading 3 convolution blocks, each of which is a convolution layer.
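Under the fig. 6 description (three cascaded convolution blocks, each consisting of one convolution layer), the following is an assumed sketch of the edge neural network together with the F2_j term of formula (3); the channel counts, activation functions, L1 distance and λ_c values are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EdgeNet(nn.Module):
    """Assumed 3-block edge network: each block is a single convolution layer."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True)),
            nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True)),
            nn.Sequential(nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid()),
        ])

    def block_outputs(self, mask):
        """Return h_1(mask), ..., h_A(mask): the output of every convolution block."""
        outputs, x = [], mask
        for block in self.blocks:
            x = block(x)
            outputs.append(x)
        return outputs

    def forward(self, mask):
        # The last block output is the predicted edge map (the edge information).
        return self.block_outputs(mask)[-1]

def f2_feature_difference(edge_net, gen_mask, sample_mask, lambdas=(1.0, 1.0, 1.0)):
    """F2_j of formula (3): weighted differences between the block outputs for the
    generated mask and for the sample mask (L1 distance assumed)."""
    h1 = edge_net.block_outputs(gen_mask)
    h2 = edge_net.block_outputs(sample_mask)
    return sum(lam * torch.mean(torch.abs(a - b)) for lam, a, b in zip(lambdas, h1, h2))
```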
In step S105, it is determined whether the loss function is smaller than a first preset threshold, if yes, step S107 is executed, otherwise step S106 is executed;
in step S106, the parameters of the image segmentation network are adjusted, and then step S102 is executed again;
in step S107, a trained image segmentation network is obtained;
that is, the parameters of the image segmentation network are continuously adjusted until the loss function is less than a first preset threshold. In addition, in the embodiment of the present application, the parameter adjustment manner is not specifically limited, and a gradient descent algorithm, a power update algorithm, and the like may be used, and the method used for adjusting the parameter is not limited here.
In addition, in the embodiment of the present application, when training the image segmentation network, before inputting the sample image to the image segmentation network, the sample image may be preprocessed, and then the preprocessed sample image may be input to the image segmentation network. Wherein, the preprocessing may include: image cropping and/or normalization processes, etc.
After the above step S107, the trained image segmentation network may also be evaluated with a test set. The method for obtaining the test set may refer to the prior art, and will not be described herein.
For a single sample image in the test set, the evaluation function may be:
IoU = |X ∩ Y| / |X ∪ Y|
where X is the image area of the target object indicated by the mask output by the trained image segmentation network after the sample image is input into it, and Y is the image area of the target object indicated by the sample mask corresponding to the sample image.
The trained image segmentation network is evaluated by the IoU (Intersection over Union) of X and Y; an IoU value approaching 1 indicates better performance of the trained image segmentation network.
The training method provided by this embodiment makes the generated mask output by the image segmentation network approach the sample mask and, at the same time, makes the contour edge of the target object represented in the generated mask closer to the real contour edge, so that the mask output by the image segmentation network can represent the contour edge of the target object more accurately.
Example two
The following describes another training method of an image segmentation network provided in the second embodiment of the present application, which includes a training process of an edge neural network compared to the training method described in the first embodiment. Referring to fig. 7, the training method includes:
in step S301, each sample image including the target object, a sample mask corresponding to each sample image, and sample edge information corresponding to each sample mask are obtained, where each sample mask is used to indicate an image area where the target object in the corresponding sample image is located, and each sample edge information is used to indicate a contour edge of the image area where the target object indicated by the corresponding sample mask is located;
the specific implementation process of the step S301 can be referred to in the step S101 of the first embodiment, and will not be described herein.
In step S302, for each sample mask, inputting the sample mask to an edge neural network to obtain edge information output by the edge neural network, where the edge information is used to indicate a contour edge of an area where a target object indicated by the sample mask is located;
in the embodiment of the present application, the step S302 to the subsequent step S306 are training processes of the edge neural network, so as to obtain a trained edge neural network. It will be appreciated by those skilled in the art that this step S302-S306 is performed before the subsequent step S308, and need not be performed before step S307.
Before this step S302 is performed, an edge neural network for acquiring the contour edge of the region where the target object indicated by the input sample mask is located needs to be established in advance. As shown in fig. 6, the edge neural network may be formed by cascading 3 convolutional layers.
In the step S302, each sample mask is input to the edge neural network to obtain each edge information output by the edge neural network, where each sample mask corresponds to one edge information output by the edge neural network.
In step S303, determining a loss function of the edge neural network, where the loss function is used to measure a difference between the edge information of the sample corresponding to each sample mask and the edge information output by the edge neural network;
in the embodiment of the present application, the specific meaning of this step S303 is: and determining a loss function of the edge neural network, wherein the loss function is positively related to an edge gap corresponding to each sample mask (the edge gap is a gap between sample edge information corresponding to the sample mask and edge information output by the edge neural network after the sample mask is input into the edge neural network).
In this embodiment of the present application, the loss function calculation manner of the edge neural network may be:
if the edge information of the sample and the edge information outputted by the edge neural network are both images as shown in fig. 1 at 003, the loss function of the edge neural network may be an image difference between the edge information of the sample and the edge information outputted by the edge neural network (the image difference calculation method may be described in step a of the first embodiment, and will not be repeated here).
In addition, the loss function of the edge neural network may be calculated as follows: for each sample mask, calculate the cross entropy loss between the corresponding sample edge information and the edge information output by the edge neural network, and then take the average. The specific calculation formula is:
LOSS_2 = −(1/N) · Σ_{j=1}^{N} (1/M) · Σ_{i=1}^{M} [ r_ji · log_x(q_ji) + (1 − r_ji) · log_x(1 − q_ji) ]
where LOSS_2 is the loss function of the edge neural network, N is the total number of sample masks (as will be understood by those skilled in the art, the total numbers of sample images, sample masks and pieces of sample edge information are all the same, namely N), M is the total number of pixel points in the j-th sample mask, the value of r_ji is determined according to the sample edge information corresponding to the j-th sample mask and indicates whether the i-th pixel point of the j-th sample mask is a contour edge, q_ji is the probability, predicted by the edge neural network, that the i-th pixel point of the j-th sample mask is a contour edge, and x is the base of the logarithm.
In the embodiment of the present application, the value of r_ji is determined from the sample edge information corresponding to the j-th sample mask. For example, if the sample edge information corresponding to the j-th sample mask indicates that the i-th pixel point is a contour edge, r_ji may be 1; otherwise, r_ji may be 0. Those skilled in the art will appreciate that the values of r_ji are not limited to 1 and 0 and may be other preset values.
It should be noted that when the sample edge information indicates that the i-th pixel point is a contour edge, the value of r_ji is greater than the value of r_ji when the sample edge information indicates that the i-th pixel point is not a contour edge. That is, r_ji may be 1 when the i-th pixel point is a contour edge and 0 otherwise; or 2 and 1, respectively; or 0.8 and 0.2, respectively.
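A sketch of LOSS_2 under the assumed choices r_ji ∈ {0, 1} and natural-log base, where pred_edges are the edge maps output by the edge neural network for each sample mask and sample_edges are the corresponding sample edge information:

```python
import torch

def edge_network_loss(pred_edges, sample_edges, eps=1e-7):
    """LOSS_2: cross entropy between predicted edge probabilities q_ji and edge
    labels r_ji, averaged over pixels, then averaged over the N sample masks."""
    per_mask = []
    for q, r in zip(pred_edges, sample_edges):
        q = q.clamp(eps, 1.0 - eps)  # avoid log(0)
        per_mask.append(-(r * torch.log(q) + (1.0 - r) * torch.log(1.0 - q)).mean())
    return torch.stack(per_mask).mean()
```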
In step S304, determining whether the loss function of the edge neural network is smaller than a second preset threshold, if not, executing step S305, and if so, executing step S306;
in step S305, the parameters of the edge neural network model are adjusted, and then step S302 is executed again;
in step S306, a trained edge neural network is obtained;
that is, the parameters of the edge neural network are continuously adjusted until the loss function is less than a second preset threshold. In addition, in the embodiment of the present application, the parameter adjustment manner is not specifically limited, and a gradient descent algorithm, a power update algorithm, and the like may be used, and the method used for adjusting the parameter is not limited here.
In step S307, for each sample image, the sample image is input to an image segmentation network, so as to obtain a generated mask output by the image segmentation network and used for indicating an area where a target object is located in the sample image;
in step S308, for each generated mask, inputting the generated mask to the trained edge neural network to obtain generated edge information output by the edge neural network, where the generated edge information is used to indicate a contour edge of an area where the target object indicated by the generated mask is located;
In step S309, determining a loss function of the image segmentation network, where the loss function is used to measure a difference between a sample mask and a generated mask corresponding to each sample image, and the loss function is also used to measure a difference between generated edge information and sample edge information corresponding to each sample image;
in step S310, it is determined whether the loss function is smaller than a first preset threshold, if yes, step S312 is executed, otherwise, step S311 is executed;
in step S311, the parameters of the image segmentation network are adjusted, and then step S307 is executed again;
in step S312, a trained image segmentation network is obtained;
the specific implementation of steps S307 to S312 is identical to the specific implementation of steps S102 to S107 in embodiment one, and the description of embodiment one may be referred to herein and will not be repeated.
The training process described in the second embodiment of the present application will be briefly described with reference to fig. 8.
Fig. 8 (a) shows the training process of the edge neural network. First, the sample masks are input into the edge neural network to obtain the edge information output by the edge neural network; second, cross entropy losses are calculated from the edge information output by the edge neural network and the corresponding sample edge information, where the sample edge information is obtained by performing the dilation operation and the subtraction operation on the sample mask, as described in the first embodiment; then, the cross entropy losses are averaged to obtain the loss function; finally, the parameters of the edge neural network are continuously adjusted until the loss function is smaller than the second preset threshold, so as to obtain the trained edge neural network.
After obtaining the trained edge neural network, training of the image segmentation network can be achieved with reference to fig. 8 (b).
As shown in fig. 8 (b), in the training process of the image segmentation network, first, a sample image is input into the image segmentation network to obtain a generated mask output by the image segmentation network, and the cross entropy loss between the generated mask and the sample mask is calculated; second, the generated mask is input into the trained edge neural network to obtain the output of each convolution layer, and the sample mask is also input into the trained edge neural network to obtain the output of each convolution layer; then, the loss function of the image segmentation network is calculated from the cross entropy loss, the outputs of the convolution layers when the input is the generated mask, and the outputs of the convolution layers when the input is the sample mask (for the specific calculation, see the description of the first embodiment); finally, the parameters of the image segmentation network are continuously adjusted until the loss function is smaller than the first preset threshold, so as to obtain the trained image segmentation network.
Compared with the first embodiment, the training method of the second embodiment of the present application additionally includes the training process of the edge neural network. This keeps the samples used for training the edge neural network consistent with the samples used for training the image segmentation network, so that the edge accuracy of the mask output by the image segmentation network can be measured better from the output of the edge neural network, and the image segmentation network can thus be trained better.
Example III
An embodiment III of the present application provides an image processing method, referring to FIG. 9, including:
in step S401, an image to be processed is acquired, and the image to be processed is input into a trained image segmentation network to obtain a mask corresponding to the image to be processed, wherein the trained image segmentation network is obtained by training with a trained edge neural network, and the trained edge neural network is used for outputting, according to an input mask, the contour edge of the area where the target object indicated by the mask is located;
specifically, the trained edge neural network in the step S401 is a neural network trained by the method described in the first or second embodiment;
in step S402, the target object included in the image to be processed is segmented based on the mask corresponding to the image to be processed.
It is easily understood by those skilled in the art that, after the above step S402, a specific operation of replacing the background may also be performed, which is a prior art and is not described herein in detail.
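As a hedged illustration of steps S401–S402 followed by the background replacement mentioned above, the sketch below thresholds the mask and composites the segmented target object onto a new background; the 0.5 threshold and the HxWx3 uint8 array layout are assumptions.

```python
import numpy as np

def replace_background(image, mask, new_background, threshold=0.5):
    """Segment the target object with the mask and paste it onto a new background.

    `mask` holds per-pixel foreground probabilities in [0, 1]; `image` and
    `new_background` are HxWx3 uint8 arrays of the same size."""
    alpha = (mask >= threshold).astype(np.float32)[..., None]
    composite = alpha * image + (1.0 - alpha) * new_background
    return composite.astype(np.uint8)
```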
The method described in the third embodiment may be a method applied to a terminal device (such as a mobile phone), and the method may facilitate the user to replace the background in the image to be processed.
Example IV
The fourth embodiment of the application provides a training device of an image segmentation network. For ease of illustration, only the portions relevant to the present application are shown, as shown in fig. 10, the training device 500 includes:
the sample obtaining module 501 is configured to obtain each sample image including the target object, a sample mask corresponding to each sample image, and sample edge information corresponding to each sample mask, where each sample mask is used to indicate an image area where the target object is located in the corresponding sample image, and each sample edge information is used to indicate a contour edge of the image area where the target object indicated by the corresponding sample mask is located;
the generating mask acquiring module 502 is configured to input, for each sample image, the sample image to an image segmentation network, so as to obtain a generating mask that is output by the image segmentation network and is used for indicating an area where a target object in the sample image is located;
a generated edge obtaining module 503, configured to input, for each generated mask, the generated mask to the trained edge neural network, to obtain generated edge information output by the edge neural network, where the generated edge information is used to indicate a contour edge of an area where the target object indicated by the generated mask is located;
A loss determining module 504, configured to determine a loss function of the image segmentation network, where the loss function is used to measure a difference between a sample mask and a generated mask corresponding to each sample image, and the loss function is further used to measure a difference between generated edge information and sample edge information corresponding to each sample image;
the parameter adjusting module 505 is configured to adjust each parameter of the image segmentation network and then trigger the generated mask acquiring module to continue to execute the corresponding steps until the loss function of the image segmentation network is less than the first preset threshold, thereby obtaining the trained image segmentation network.
Optionally, the loss determination module 504 is specifically configured to:
determining a loss function of the image segmentation network, wherein the loss function is calculated as:
LOSS_1 = (1/N) · Σ_{j=1}^{N} (F1_j + F2_j)    (1)
where LOSS_1 is the loss function of the image segmentation network, N is the total number of sample images, F1_j is used to measure the difference between the sample mask and the generated mask corresponding to the j-th sample image, and F2_j is used to measure the difference between the sample edge information and the generated edge information corresponding to the j-th sample image.
Optionally, the above F1_j is calculated as:
F1_j = −(1/M) · Σ_{i=1}^{M} [ y_ji · log_x(p_ji) + (1 − y_ji) · log_x(1 − p_ji) ]    (2)
where M is the total number of pixel points in the j-th sample image, the value of y_ji is determined according to the sample mask corresponding to the j-th sample image and indicates whether the i-th pixel point of the j-th sample image is located in the image area where the target object is located, p_ji is the probability, predicted by the image segmentation network, that the i-th pixel point of the j-th sample image is located in the image area where the target object is located, and x is the base of the logarithm;
in addition, when the sample mask indicates that the i-th pixel point is located in the image area where the target object is located, the value of y_ji is greater than the value of y_ji when the sample mask indicates that the i-th pixel point is not located in that image area.
Optionally, the trained edge neural network is formed by cascading A convolution blocks, and each convolution block is composed of B convolution layers;
correspondingly, the above F2_j is calculated as:
F2_j = Σ_{c=1}^{A} λ_c · ‖ h_c(mask_1) − h_c(mask_2) ‖    (3)
where mask_1 is the generated mask corresponding to the j-th sample image, mask_2 is the sample mask corresponding to the j-th sample image, h_c(mask_1) is the output of the c-th convolution block when the input of the trained edge neural network is mask_1, h_c(mask_2) is the output of the c-th convolution block when the input of the trained edge neural network is mask_2, and λ_c is a constant.
Optionally, the training device further includes an edge neural network training module, where the edge neural network training module includes:
the edge information acquisition unit is used for inputting each sample mask into the edge neural network to obtain edge information output by the edge neural network, wherein the edge information is used for indicating the contour edge of the area where the target object indicated by the sample mask is located;
the edge loss determining unit is used for determining a loss function of the edge neural network, and the loss function is used for measuring the difference between the sample edge information corresponding to each sample mask and the edge information output by the edge neural network;
and the edge parameter adjusting unit is used for adjusting each parameter of the edge neural network, and then triggering the edge information acquisition unit to continuously execute corresponding steps until the loss function value of the edge neural network is smaller than a second preset threshold value, so that the trained edge neural network is obtained.
Optionally, the edge loss determining unit is specifically configured to:
determining a loss function of the edge neural network, wherein the loss function is calculated as:
LOSS_2 = −(1/N) · Σ_{j=1}^{N} (1/M) · Σ_{i=1}^{M} [ r_ji · log_x(q_ji) + (1 − r_ji) · log_x(1 − q_ji) ]
where LOSS_2 is the loss function of the edge neural network, N is the total number of sample images, M is the total number of pixel points in the j-th sample mask, the value of r_ji is determined according to the sample edge information corresponding to the j-th sample mask and indicates whether the i-th pixel point of the j-th sample mask is a contour edge, q_ji is the probability, predicted by the edge neural network, that the i-th pixel point of the j-th sample mask is a contour edge, and x is the base of the logarithm;
in addition, when the sample edge information indicates that the i-th pixel point is a contour edge, the value of r_ji is greater than the value of r_ji when the sample edge information indicates that the i-th pixel point is not a contour edge.
It should be noted that, because the content of the information interaction and the execution process between the devices/units is based on the same concept as the first embodiment and the second embodiment of the method, specific functions and technical effects thereof may be referred to in the corresponding method embodiment section, and will not be described herein.
Example five
The fifth embodiment of the application provides an image processing device. For convenience of explanation, only a portion related to the present application is shown, and as shown in fig. 11, the image processing apparatus 600 includes:
The mask obtaining module 601 is configured to obtain an image to be processed and input the image to be processed into a trained image segmentation network to obtain a mask corresponding to the image to be processed, where the trained image segmentation network is obtained by training with a trained edge neural network, and the trained edge neural network is configured to output, according to an input mask, the contour edge of the area where the target object indicated by the mask is located (specifically, the trained image segmentation network is obtained by the training method described in the first or second embodiment);
the target object segmentation module 602 is configured to segment a target object included in the image to be processed based on a mask corresponding to the image to be processed.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the third embodiment of the method of the present application, specific functions and technical effects thereof may be found in the third embodiment of the method, which is not described herein.
Example six
Fig. 12 is a schematic diagram of a terminal device provided in a sixth embodiment of the present application. As shown in fig. 12, the terminal device 700 of this embodiment includes: a processor 701, a memory 702, and a computer program 703 stored in the memory 702 and executable on the processor 701. The steps of the various method embodiments described above are implemented when the processor 701 executes the computer program 703. Alternatively, the processor 701, when executing the computer program 703, performs the functions of the modules/units in the above-described apparatus embodiments.
Illustratively, the computer program 703 may be partitioned into one or more modules/units that are stored in the memory 702 and executed by the processor 701 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program 703 in the terminal device 700. For example, the computer program 703 may be divided into a sample acquiring module, a mask acquiring module, an edge acquiring module, a loss determining module, and a parameter adjusting module, where each module specifically functions as follows:
S101, acquiring sample images each containing a target object, a sample mask corresponding to each sample image, and sample edge information corresponding to each sample mask, where each sample mask is used to indicate the image area where the target object is located in the corresponding sample image, and each piece of sample edge information is used to indicate the contour edge of the image area where the target object indicated by the corresponding sample mask is located;
S102, for each sample image, inputting the sample image into an image segmentation network to obtain a generated mask that is output by the image segmentation network and used to indicate the region where the target object is located in the sample image;
S103, for each generated mask, inputting the generated mask into the trained edge neural network to obtain generated edge information output by the edge neural network, where the generated edge information is used to indicate the contour edge of the region where the target object indicated by the generated mask is located;
S104, determining a loss function of the image segmentation network, where the loss function is used to measure the difference between the sample mask and the generated mask corresponding to each sample image, and is also used to measure the difference between the generated edge information and the sample edge information corresponding to each sample image;
S105, adjusting the parameters of the image segmentation network and then returning to S102 until the loss function of the image segmentation network is smaller than a first preset threshold, so as to obtain the trained image segmentation network (an illustrative code sketch of this loop follows these steps).
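The following PyTorch-style sketch illustrates one possible shape of the S101–S105 loop. The network objects, the data loader, the optimizer and learning rate, the use of binary cross-entropy plus an L1 edge term, and the stopping threshold are all illustrative assumptions; in particular, this sketch compares only the final edge outputs rather than the intermediate convolution-block features described elsewhere in this application.

```python
import torch
import torch.nn.functional as F

def train_segmentation_network(seg_net, trained_edge_net, loader,
                               first_threshold=0.05, lr=1e-3, max_epochs=100):
    """Illustrative training loop for steps S101-S105 (hypothetical helper names).

    `loader` yields (sample_image, sample_mask, sample_edge) batches prepared in S101;
    `seg_net` is assumed to output per-pixel probabilities in [0, 1];
    `trained_edge_net` is the already-trained edge neural network and stays frozen.
    """
    optimizer = torch.optim.Adam(seg_net.parameters(), lr=lr)
    trained_edge_net.eval()
    for p in trained_edge_net.parameters():
        p.requires_grad_(False)

    for _ in range(max_epochs):
        running_loss = 0.0
        for sample_image, sample_mask, sample_edge in loader:
            generated_mask = seg_net(sample_image)              # S102: generated mask
            generated_edge = trained_edge_net(generated_mask)   # S103: generated edge information
            # S104: loss measuring the mask difference and the edge difference
            loss = (F.binary_cross_entropy(generated_mask, sample_mask)
                    + F.l1_loss(generated_edge, sample_edge))
            optimizer.zero_grad()
            loss.backward()                                     # S105: adjust the parameters
            optimizer.step()
            running_loss += loss.item()
        if running_loss / len(loader) < first_threshold:        # first preset threshold
            return seg_net
    return seg_net
```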
Alternatively, the computer program 703 may be divided into a mask obtaining module and a target object segmentation module, where each module functions as follows:
acquiring an image to be processed, and inputting the image to be processed into a trained image segmentation network to obtain a mask corresponding to the image to be processed, where the trained image segmentation network is trained by the training method described in the first embodiment or the second embodiment; and
segmenting the target object contained in the image to be processed based on the mask corresponding to the image to be processed.
The terminal device may include, but is not limited to, the processor 701 and the memory 702. Those skilled in the art will appreciate that Fig. 12 is merely an example of the terminal device 700 and does not limit the terminal device 700, which may include more or fewer components than shown, combine certain components, or use different components; for example, the terminal device may also include input and output devices, network access devices, buses, and the like.
The processor 701 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 702 may be an internal storage unit of the terminal device 700, for example, a hard disk or an internal memory of the terminal device 700. The memory 702 may also be an external storage device of the terminal device 700, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 700. Further, the memory 702 may include both an internal storage unit and an external storage device of the terminal device 700. The memory 702 is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the division of the above functional units and modules is illustrated by way of example. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit, and the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units described above is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated modules/units described above are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the processes in the above method embodiments by instructing related hardware through a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of each of the above method embodiments can be implemented. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunications signals according to legislation and patent practice.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or replace some of the technical features with equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be included within the protection scope of the present application.

Claims (8)

1. A method of training an image segmentation network, comprising:
S101, acquiring sample images each containing a target object, a sample mask corresponding to each sample image, and sample edge information corresponding to each sample mask, wherein each sample mask is used for indicating the image area where the target object is located in the corresponding sample image, and each piece of sample edge information is used for indicating the contour edge of the image area where the target object indicated by the corresponding sample mask is located; the sample edge information corresponding to each sample mask is obtained by: performing a dilation operation on the sample mask to obtain a dilated mask image, and subtracting the sample mask from the dilated mask image to obtain the sample edge information corresponding to the sample mask;
S102, for each sample image, inputting the sample image into an image segmentation network to obtain a generated mask which is output by the image segmentation network and used for indicating the region where a target object is located in the sample image;
S103, for each generated mask, inputting the generated mask into the trained edge neural network to obtain generated edge information output by the edge neural network, wherein the generated edge information is used for indicating the contour edge of the region where the target object indicated by the generated mask is located;
S104, determining a loss function of the image segmentation network, wherein the loss function is used for measuring the difference between the sample mask and the generated mask corresponding to each sample image, and the loss function is also used for measuring the difference between the generated edge information and the sample edge information corresponding to each sample image;
wherein determining the loss function of the image segmentation network, the loss function being positively correlated with the mask gap corresponding to each sample image and positively correlated with the edge gap corresponding to each sample image, comprises:
determining the loss function of the image segmentation network, wherein the loss function of the image segmentation network is calculated as follows:

$$\mathrm{LOSS}_1=\sum_{j=1}^{N}\left(F1_j+F2_j\right)$$

wherein LOSS_1 is the loss function of the image segmentation network, N is the total number of sample images, F1_j is used for measuring the difference between the sample mask and the generated mask corresponding to the jth sample image, and F2_j is used for measuring the difference between the sample edge information and the generated edge information corresponding to the jth sample image;
S105, adjusting the parameters of the image segmentation network and then returning to S102 until the loss function of the image segmentation network is smaller than a first preset threshold, so as to obtain the trained image segmentation network;
before the step S103, the training method further includes a training process of an edge neural network, where the training process of the edge neural network is as follows:
for each sample mask, inputting the sample mask into an edge neural network to obtain edge information output by the edge neural network, wherein the edge information is used for indicating the contour edge of the area where the target object indicated by the sample mask is located;
determining a loss function of the edge neural network, wherein the loss function is used for measuring the difference between the sample edge information corresponding to each sample mask and the edge information output by the edge neural network;
adjusting the parameters of the edge neural network, and then returning to perform the step of, for each sample mask, inputting the sample mask into the edge neural network to obtain the edge information output by the edge neural network, together with the subsequent steps, until the loss function value of the edge neural network is smaller than a second preset threshold, so as to obtain the trained edge neural network;
wherein determining the loss function of the edge neural network, the loss function being used for measuring the difference between the sample edge information corresponding to each sample mask and the edge information output by the edge neural network, comprises:
determining the loss function of the edge neural network, wherein the loss function of the edge neural network is calculated as follows:

$$\mathrm{LOSS}_2=-\sum_{j=1}^{N}\sum_{i=1}^{M}\left[r_{ji}\log_x\left(q_{ji}\right)+\left(1-r_{ji}\right)\log_x\left(1-q_{ji}\right)\right]$$

wherein LOSS_2 is the loss function of the edge neural network, N is the total number of sample images, M is the total number of pixel points in the jth sample mask, the value of r_ji is determined according to the sample edge information corresponding to the jth sample image and is used for indicating whether the ith pixel point in the jth sample mask is a contour edge, q_ji is the probability, predicted by the edge neural network, that the ith pixel point in the jth sample mask is a contour edge, and x is the base of the logarithm;
in addition, the value of r_ji when the sample edge information indicates that the ith pixel point is a contour edge is greater than the value of r_ji when the sample edge information indicates that the ith pixel point is not a contour edge.
2. The training method of claim 1, wherein the determining a loss function of the image segmentation network, the loss function being used to measure a difference between a sample mask and a generated mask corresponding to each sample image, and the loss function being further used to measure a difference between generated edge information and sample edge information corresponding to each sample image, comprises:
and determining the loss function of the image segmentation network, wherein the loss function is positively correlated with the mask gap corresponding to each sample image and positively correlated with the edge gap corresponding to each sample image, the mask gap corresponding to each sample image being the gap between the sample mask and the generated mask corresponding to that sample image, and the edge gap corresponding to each sample image being the gap between the sample edge information and the generated edge information corresponding to that sample image.
3. The training method of claim 2, wherein F1_j is calculated as follows:

$$F1_j=-\sum_{i=1}^{M}\left[y_{ji}\log_x\left(p_{ji}\right)+\left(1-y_{ji}\right)\log_x\left(1-p_{ji}\right)\right]$$

wherein M is the total number of pixel points in the jth sample image, the value of y_ji is determined according to the sample mask corresponding to the jth sample image and is used for indicating whether the ith pixel point in the jth sample image is located in the image area where the target object is located, p_ji is the probability, predicted by the image segmentation network, that the ith pixel point in the jth sample image is located in the image area where the target object is located, and x is the base of the logarithm;
in addition, the value of y_ji when the sample mask indicates that the ith pixel point is located in the image area where the target object is located is greater than the value of y_ji when the sample mask indicates that the ith pixel point is not located in that image area.
4. The training method of claim 2, wherein the trained edge neural network is formed by cascading A convolution blocks, each convolution block having B convolution layers;
accordingly, F2_j is calculated as follows:

$$F2_j=\sum_{c=1}^{A}\lambda_c\left\lVert h_c\left(mask_1\right)-h_c\left(mask_2\right)\right\rVert$$

wherein mask_1 is the generated mask corresponding to the jth sample image, mask_2 is the sample mask corresponding to the jth sample image, h_c(mask_1) is the output of the c-th convolution block when the input of the trained edge neural network is mask_1, h_c(mask_2) is the output of the c-th convolution block when the input of the trained edge neural network is mask_2, and λ_c is a constant.
5. An image processing method, comprising:
acquiring an image to be processed, and inputting the image to be processed into a trained image segmentation network to obtain a mask corresponding to the image to be processed, wherein the trained image segmentation network is obtained by training with a trained edge neural network, and the trained edge neural network is used for outputting, according to the input mask, the contour edge of the region where the target object indicated by the mask is located;
segmenting the target object contained in the image to be processed based on the mask corresponding to the image to be processed;
wherein the trained image segmentation network being obtained by training with a trained edge neural network, the trained edge neural network being used for outputting, according to the input mask, the contour edge of the region where the target object indicated by the mask is located, comprises:
the trained image segmentation network is trained by the training method according to any one of claims 1 to 4.
6. An image segmentation network, characterized in that the image segmentation network is trained using the training method according to any one of claims 1 to 4.
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 4 when the computer program is executed.
8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 4.
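By way of illustration only, the following Python sketch shows three computations described in the above claims: the dilation-and-subtraction derivation of sample edge information from a sample mask (claim 1), the per-image cross-entropy term F1_j (claim 3), and a feature-matching form of F2_j computed from the convolution-block outputs of the trained edge neural network (claim 4). The OpenCV/NumPy usage, the 3×3 kernel, the natural-logarithm base, the L1 gap between block outputs, and equal weights λ_c = 1 are all assumptions made for this example and are not fixed by the claims.

```python
import cv2
import numpy as np

# Claim 1: sample edge information = dilated sample mask minus sample mask.
def sample_edge_from_mask(sample_mask: np.ndarray, kernel_size: int = 3) -> np.ndarray:
    """sample_mask: H x W uint8 array with values 0 or 255."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    dilated = cv2.dilate(sample_mask, kernel, iterations=1)   # dilation operation
    return cv2.subtract(dilated, sample_mask)                 # thin band along the contour edge

# Claim 3: F1_j as a pixel-wise cross-entropy between the sample mask values y_ji
# and the probabilities p_ji predicted by the image segmentation network.
def f1_term(y: np.ndarray, p: np.ndarray, eps: float = 1e-7) -> float:
    y = y.reshape(-1).astype(np.float64)
    p = np.clip(p.reshape(-1).astype(np.float64), eps, 1.0 - eps)
    return float(-np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))  # natural log assumed

# Claim 4: F2_j compares the generated mask and the sample mask at the output
# of each convolution block h_c of the trained edge neural network.
def f2_term(blocks, generated_mask: np.ndarray, sample_mask: np.ndarray, lambdas=None) -> float:
    """blocks: ordered list of callables, one per convolution block h_c."""
    if lambdas is None:
        lambdas = [1.0] * len(blocks)                         # lambda_c, taken as equal here
    x1, x2, total = generated_mask, sample_mask, 0.0
    for lam, h_c in zip(lambdas, blocks):
        x1, x2 = h_c(x1), h_c(x2)                             # h_c(mask_1), h_c(mask_2)
        total += lam * float(np.mean(np.abs(x1 - x2)))        # L1 gap between block outputs
    return total
```

A full implementation would then accumulate LOSS_1 as the sum of f1_term and f2_term over the N sample images and compare it against the first preset threshold.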
CN201910931784.2A 2019-09-29 2019-09-29 Training method of network, image processing method, network, terminal equipment and medium Active CN110660066B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910931784.2A CN110660066B (en) 2019-09-29 2019-09-29 Training method of network, image processing method, network, terminal equipment and medium
PCT/CN2020/117470 WO2021057848A1 (en) 2019-09-29 2020-09-24 Network training method, image processing method, network, terminal device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910931784.2A CN110660066B (en) 2019-09-29 2019-09-29 Training method of network, image processing method, network, terminal equipment and medium

Publications (2)

Publication Number Publication Date
CN110660066A CN110660066A (en) 2020-01-07
CN110660066B true CN110660066B (en) 2023-08-04

Family

ID=69039787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910931784.2A Active CN110660066B (en) 2019-09-29 2019-09-29 Training method of network, image processing method, network, terminal equipment and medium

Country Status (2)

Country Link
CN (1) CN110660066B (en)
WO (1) WO2021057848A1 (en)

Also Published As

Publication number Publication date
WO2021057848A1 (en) 2021-04-01
CN110660066A (en) 2020-01-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant