CN111598902B - Image segmentation method, device, electronic equipment and computer readable medium

Info

Publication number: CN111598902B
Application number: CN202010430949.0A
Authority: CN
Inventors: 李华夏
Original assignee: Douyin Vision Co Ltd
Current assignee: Douyin Vision Co Ltd
Priority date: 2020-05-20
Filing date: 2020-05-20
Publication date: 2023-05-30
Anticipated expiration: 2040-05-20
Also published as: CN111598902A

Abstract

Embodiments of the present disclosure provide an image segmentation method, an image segmentation apparatus, an electronic device, and a computer readable medium. The method comprises: a target area determination step of determining target area information in a target frame image, wherein the target area is an area of the target frame image that contains a target object; a detection area determination step of determining a frame image to be processed and determining a detection area of the frame image to be processed based on the target area information; and a loop step of, if the pixel points in the detection area meet a first preset condition, determining a target object area based on the pixel points in the detection area, and cyclically executing the detection area determination step and the loop step until a second preset condition is met. Because the target object is easier to identify within the detection area, the embodiments reduce the complexity of segmenting the target object and improve the segmentation result.

Description

Image segmentation method, device, electronic equipment and computer readable medium

Technical Field

The present disclosure relates to the field of image processing technologies, and in particular, to an image segmentation method, an image segmentation apparatus, an electronic device, and a computer readable medium.

Background

With the continuous development of information technology, images have become essential information carried over networks. As images are used more widely, image processing techniques have become more important, and image segmentation is one of the most important of them.

Image segmentation is a technique that divides an image into regions with distinct features and extracts the target of interest. For frame images in a video, the prior art generally segments the foreground, the background and the target object in every frame image in order to extract the target object. However, if the target object occupies a relatively small proportion of the frame image, it is difficult to identify, so segmenting it is complex and the segmentation result is poor. How to segment the frame images of a video to obtain the target object has therefore become a key problem.

Disclosure of Invention

The present disclosure provides an image segmentation method, apparatus, electronic device, and computer readable medium, which can solve at least one of the above technical problems.

In a first aspect, there is provided an image segmentation method, the method comprising:

a target area determination step: determining target area information in a target frame image, wherein the target area is an area of the target frame image that contains a target object;

a detection area determination step: determining a frame image to be processed, and determining a detection area of the frame image to be processed based on the target area information;

a loop step: if the pixel points in the detection area meet a first preset condition, determining a target object area based on the pixel points in the detection area, and cyclically executing the detection area determination step and the loop step until a second preset condition is met.

In a second aspect, there is provided an image segmentation apparatus comprising:

the target area determining module is used for determining target area information in a target frame image, wherein the target area is an area containing a target object in the target frame image;

the detection area determining module is used for determining a frame image to be processed and determining a detection area of the frame image to be processed based on the target area information;

and the first circulation module is used for determining a target object area based on each pixel point in the detection area when each pixel point in the detection area meets a first preset condition, and performing the operations corresponding to the detection area determination module and the first circulation module in a circulating manner until a second preset condition is met.

In a third aspect, an electronic device is provided, the electronic device comprising:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the operations corresponding to the image segmentation method according to the first aspect.

In a fourth aspect, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the image segmentation method of the first aspect.

The technical solution provided by the present disclosure has the following beneficial effects:

Compared with the prior art, the image segmentation method, apparatus, electronic device and computer readable medium provided by the present disclosure perform a target area determination step of determining target area information in a target frame image, wherein the target area is an area of the target frame image that contains a target object; a detection area determination step of determining a frame image to be processed and determining a detection area of the frame image to be processed based on the target area information; and a loop step of, if the pixel points in the detection area meet a first preset condition, determining a target object area based on the pixel points in the detection area, and cyclically executing the detection area determination step and the loop step until a second preset condition is met. In other words, the detection area of each subsequent frame image is determined from the area containing the target object in the target frame determined in the video to be processed, and the target object area is determined from the pixel points in that detection area.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.

Fig. 1 is a schematic flow chart of an image segmentation method according to an embodiment of the disclosure;

fig. 2 is a schematic structural diagram of an image segmentation apparatus according to an embodiment of the disclosure;

fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure;

fig. 4 is a flowchart illustrating another image segmentation method according to an embodiment of the present disclosure;

fig. 5 is a schematic structural diagram of a segmentation network model according to an embodiment of the disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "including" and variations thereof as used herein are intended to be open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below.

It should be noted that the terms "first," "second," and the like in this disclosure are used merely to distinguish one device, module, or unit from another device, module, or unit, and are not intended to limit the order or interdependence of the functions performed by the devices, modules, or units.

It should be noted that the modifiers "one" and "a plurality of" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise. The names of messages or information exchanged between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

The embodiment of the disclosure provides an image segmentation method, which is executed by an electronic device, wherein the electronic device may be a terminal device or a server. As shown in fig. 1, the method comprises:

step S101, a target area determination step.

Wherein the step of determining the target area comprises: target region information in the target frame image is determined.

The target area is an area containing a target object in the target frame image.

For the embodiment of the present disclosure, the target frame image may be the first frame image in a preset video, or a frame image to be processed in the preset video whose detection area yielded no target object area. The target object may be any object; for example, the target object may be at least one of a person, an animal, and a vehicle, which is not limited in the embodiment of the present disclosure.

For the embodiment of the present disclosure, the target area information may be rectangular area information including the target object in the target frame image, such as coordinate information of at least three position points in the target frame image, or coordinate information of one position point and two side length information in the target frame image; the target area information may also be circular area information of the target object contained in the target frame image, such as coordinate information and radius information of a point in the target frame image; the target area information is not limited to rectangular area information or circular area information, but may be area information of other shapes, and is not limited in the embodiment of the present disclosure.
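As a non-limiting illustration, the rectangular and circular forms of target area information described above might be represented as follows. This is only a sketch: the class names `RectArea` and `CircleArea`, their field names, and the helper `from_points` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RectArea:
    """Rectangle given as one position point plus two side lengths (hypothetical layout)."""
    x: float
    y: float
    width: float
    height: float

    @classmethod
    def from_points(cls, points):
        """Build the same rectangle from three or more corner points (x, y)."""
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        return cls(min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


@dataclass(frozen=True)
class CircleArea:
    """Circle given as the coordinates of one point plus a radius (hypothetical layout)."""
    cx: float
    cy: float
    radius: float


# Three corners are enough to recover an axis-aligned rectangle.
rect = RectArea.from_points([(2, 3), (10, 3), (10, 9)])
```

Either form carries enough information to locate the target area in the frame; other shapes would need their own representation.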

Step S102, a detection area determination step.

Wherein the step of determining the detection area includes: determining a frame image to be processed, and determining a detection area of the frame image to be processed based on the target area information.

For the embodiment of the disclosure, the frame image to be processed may be the frame image following the target frame image in the preset video, or the frame image following the previously processed frame image (that is, the frame image to be processed corresponding to the previous detection area) in the preset video. The detection area is the image information obtained by cropping the frame image to be processed based on the target area information.
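A minimal sketch of the cropping operation follows, assuming that the frame is stored as a list of pixel rows and that the target area information is a rectangle `(x, y, width, height)` in integer pixel coordinates. The function name `crop_detection_area` and the clamping to the frame borders are assumptions made for illustration only.

```python
def crop_detection_area(frame, area):
    """Cut the detection area out of a frame.

    frame: list of rows, each row a list of pixel values.
    area:  (x, y, width, height) rectangle; parts outside the frame are dropped.
    """
    x, y, w, h = area
    frame_h = len(frame)
    frame_w = len(frame[0]) if frame_h else 0
    # Clamp the rectangle to the frame so slicing never wraps around.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_w, x + w), min(frame_h, y + h)
    return [row[x0:x1] for row in frame[y0:y1]]


# A 4 x 5 test frame whose pixel value encodes (row, column) as row*10 + column.
frame = [[r * 10 + c for c in range(5)] for r in range(4)]
inner = crop_detection_area(frame, (1, 1, 2, 2))
clipped = crop_detection_area(frame, (3, 2, 10, 10))
```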

Step S103, a loop step.

Wherein the loop step comprises: if each pixel point in the detection area meets the first preset condition, determining a target object area based on each pixel point in the detection area, and cyclically executing step S102 and step S103 until the second preset condition is met.

For the embodiment of the disclosure, the pixel points in the detection area meet the first preset condition when the connected domain value calculated from the pixel points in the detection area is larger than a preset connected domain threshold. The connected domain value is calculated from the pixel points in a connected region; in general, an image region formed by foreground pixel points that have the same pixel value and are adjacent in position is called a connected region. The second preset condition is met when the frame image to be processed is the last frame image of the preset video.
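The control flow of steps S102 and S103 can be sketched as below. The sketch assumes that step S101 has already produced `target_area`, and it takes the cropping, segmentation and condition check as caller-supplied functions; the names `run_loop`, `crop`, `segment` and `meets_first_condition` are hypothetical. The target area is kept unchanged between frames here, which is one possible reading of the text.

```python
def run_loop(frames, target_area, crop, segment, meets_first_condition):
    """Apply steps S102 and S103 to frames[1:].

    Returns (outputs, stopped_at). outputs maps a frame index to the
    segmentation output of its detection area. stopped_at is the index of the
    first frame whose detection area failed the first preset condition, or
    None when the last frame was reached (second preset condition).
    """
    outputs = {}
    for index in range(1, len(frames)):
        detection_area = crop(frames[index], target_area)  # step S102
        output = segment(detection_area)
        if not meets_first_condition(output):              # check in step S103
            return outputs, index
        outputs[index] = output
    return outputs, None


# Stand-in callables: frames are labels, and "f3" plays a frame without a target.
frames = ["f0", "f1", "f2", "f3"]
outputs, stopped_at = run_loop(
    frames,
    target_area=(0, 0, 1, 1),
    crop=lambda frame, area: frame,
    segment=lambda detection_area: detection_area,
    meets_first_condition=lambda output: output != "f3",
)
```

Returning the index of the failing frame lets a caller decide how to recover, for example by running full-frame detection again on that frame.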

Compared with the prior art, the image segmentation method provided by the embodiment of the disclosure performs a target area determination step of determining target area information in a target frame image, wherein the target area is an area of the target frame image that contains a target object; a detection area determination step of determining a frame image to be processed and determining a detection area of the frame image to be processed based on the target area information; and a loop step of, if the pixel points in the detection area meet the first preset condition, determining a target object area based on the pixel points in the detection area, and cyclically executing the detection area determination step and the loop step until the second preset condition is met. In other words, the detection area of each subsequent frame image is determined from the area containing the target object in the target frame determined in the video to be processed, and the target object area is determined from the pixel points in that detection area.

In another possible implementation manner of the embodiment of the present disclosure, determining the target area information in the target frame image may specifically include:

Step A: determining target area information in a target frame image based on a frame image to be detected.

Step A may specifically include: acquiring a frame image to be detected; performing image segmentation processing on the frame image to be detected through a full-image-based segmentation network model to obtain a segmentation result; and determining the target region information in the target frame image based on the segmentation result.

For the embodiment of the disclosure, the frame image to be detected may also be extracted from a preset video.

For the embodiment of the disclosure, the structure of the full-image-based segmentation network model is shown in fig. 5. The model has a U-shaped structure and is a fully convolutional network. The left side of the model (the left dashed box) performs downsampling operations; fig. 5 shows four of them. Specifically, the image input to the model undergoes two convolution operations (the rightward arrow "→" in the left dashed box indicates a convolution operation) and one downsampling operation (the downward arrow "↓" in the left dashed box indicates downsampling) with a downsampling factor of 2, and so on, for a total of four downsampling operations. The right side of the model (the right dashed box) performs upsampling operations, as shown in fig. 5. Specifically, the image obtained after the fourth downsampling first undergoes two deconvolution operations (the arrow "→" in the right dashed box indicates a deconvolution operation) and one upsampling operation (the arrow "→" in the right dashed box indicates upsampling), again with a factor of 2. The channels of the image obtained by this upsampling operation are then stacked with the channels of the image before the symmetric downsampling operation on the left side (that is, the image pointed to by the arrow "→" in the right dashed box is stacked, along the channel dimension, with the image pointed to by the dashed arrow), and so on. After the fourth upsampling operation, the image undergoes three further deconvolution operations to produce the output result.

For the embodiment of the disclosure, the full-image-based segmentation network model adds shallow-layer features to deeper layers by channel stacking, so that more information from the original image is retained. Note that the embodiment of the present disclosure does not limit the number of upsampling and downsampling operations: fig. 5 uses four upsampling operations and four downsampling operations, and three upsampling operations and three downsampling operations may preferably be used.
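To make the symmetry of the U-shaped structure concrete, the following sketch traces only the spatial size of the feature maps. It assumes that every convolution and deconvolution preserves the spatial size, so that only the factor-2 downsampling and upsampling operations change it; this padding assumption, the function name `trace_u_shape`, and the omission of channel counts are choices made for illustration and are not stated in the disclosure.

```python
def trace_u_shape(size, levels=4, factor=2):
    """Trace spatial sizes through a U-shaped network.

    Returns (down_sizes, skip_pairs). down_sizes[i] is the size after i
    downsampling operations. Each skip pair is (size after an upsampling,
    size before the symmetric downsampling); channel stacking requires the
    two values in a pair to be equal.
    """
    down_sizes = [size]
    for _ in range(levels):
        if down_sizes[-1] % factor:
            raise ValueError("size must be divisible by the factor at every level")
        down_sizes.append(down_sizes[-1] // factor)

    skip_pairs = []
    current = down_sizes[-1]
    for level in range(levels):
        current *= factor
        before_symmetric_down = down_sizes[levels - 1 - level]
        skip_pairs.append((current, before_symmetric_down))
    return down_sizes, skip_pairs


four_levels = trace_u_shape(256, levels=4)
three_levels = trace_u_shape(64, levels=3)
```

Under these assumptions the sizes in each pair always agree, which is what allows the shallow features to be stacked onto the upsampled ones channel by channel.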

In another possible implementation manner of the embodiment of the present disclosure, determining the target region information in the target frame image based on the segmentation result may specifically include: if the segmentation result meets the first preset condition, determining target area information in the frame image to be detected based on the segmentation result, and determining the target area information in the frame image to be detected as the target area information in the target frame image.

For the embodiment of the present disclosure, when it is determined that there is target area information in the frame image to be detected based on the division result, the target area information in the frame image to be detected is determined as the target area information in the target frame image. In an embodiment of the present disclosure, determining, based on a segmentation result, that target region information exists in a frame image to be detected includes: if the segmentation result meets a first preset condition, determining that target area information exists in the frame image to be detected.

The segmentation result meets a first preset condition, namely that the connected domain value calculated by using the segmentation result is larger than a preset connected domain threshold value.

In another possible implementation manner of the embodiment of the present disclosure, determining the target region information in the target frame image based on the segmentation result may specifically include: if the segmentation result does not meet the first preset condition, cyclically taking the next frame image as the frame image to be detected and executing step A until the second preset condition is met or the segmentation result meets the first preset condition; and determining the target area information in the target frame image based on the segmentation result that meets the first preset condition.

For the embodiment of the present disclosure, the next frame image may also be extracted from a preset video, where the next frame image is a frame image to be detected of the next frame. For example, if the segmentation result of the nth frame image (frame image to be detected) does not satisfy the first preset condition, the (n+1) th frame image is taken as the frame image to be detected.

For the embodiment of the disclosure, the segmentation result does not meet the first preset condition when the connected domain value calculated from the segmentation result is smaller than or equal to the preset connected domain threshold; the segmentation result meets the first preset condition when the connected domain value calculated from the segmentation result is larger than the preset connected domain threshold; and the second preset condition is met when the frame image to be detected is the last frame image of the preset video.

For the step of determining the target region information in the target frame image based on the frame image to be detected in the embodiments of the present disclosure, please refer to the description of the foregoing implementation manner, which is not repeated in the embodiments of the present disclosure.

Further, cyclically taking the next frame image as the frame image to be detected and executing step A until the second preset condition is met or the segmentation result meets the first preset condition comprises:

Step C: cyclically replacing the frame image to be detected with a first preset special effect image, taking the next frame image as the frame image to be detected, and executing step A, until the second preset condition is met or the segmentation result meets the first preset condition.

For the embodiment of the disclosure, the first preset special effect image includes at least one of a computer-generated special effect image and a manually photographed special effect image, wherein the first preset special effect image may be an image with any special effect, for example, a snowflake special effect, a bubble special effect, a lightning special effect, a cartoon character special effect, and the like.

For the embodiment of the disclosure, when the segmentation result corresponding to the frame image to be detected does not meet the first preset condition, that is, when the connected domain value calculated from the segmentation result is smaller than or equal to the preset connected domain threshold, no target area information exists in the frame image to be detected. In this case, the first preset special effect image can be obtained and used to replace the frame image to be detected. Any two frame images to be detected in which no target area information exists may be replaced with the same first preset special effect image or with two different first preset special effect images. For example, if no target area information exists in the nth frame image (a frame image to be detected), a snowflake special effect image can be obtained and used to replace the nth frame image; if no target area information exists in the (n+1)th frame image (a frame image to be detected) either, the (n+1)th frame image can be replaced with a snowflake special effect image or, alternatively, with a lightning special effect image.
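Step A and step C together can be sketched as a search over the frames. The function name `search_with_effects` and its parameters are hypothetical; `segment_full` stands in for the full-image-based segmentation network model and `meets_first_condition` for the connected domain check. Cycling through the list of special effect images is only one of the permitted choices, since the text allows the same image or different images to be used.

```python
def search_with_effects(frames, segment_full, meets_first_condition, effect_images):
    """Look for the first frame whose segmentation result meets the first
    preset condition, replacing every earlier frame with a special effect image.

    Returns (index, result, output_frames). index and result are None when the
    last frame is reached without success (second preset condition).
    """
    output_frames = list(frames)
    for index, frame in enumerate(frames):
        result = segment_full(frame)                    # step A
        if meets_first_condition(result):
            return index, result, output_frames
        # Step C: no target in this frame, so substitute a special effect image.
        output_frames[index] = effect_images[index % len(effect_images)]
    return None, None, output_frames


# Stand-in data: "T" is the only frame in which a target can be found.
found = search_with_effects(
    ["a", "b", "T", "d"],
    segment_full=lambda frame: frame,
    meets_first_condition=lambda result: result == "T",
    effect_images=["snowflake", "lightning"],
)
```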

For the embodiment of the present disclosure, the next frame image is taken as the frame image to be detected, and the step a is executed, that is, the step of determining the target area information in the target frame image based on the frame image to be detected is executed, and the description of the foregoing implementation manner is referred to, and will not be repeated in the embodiment of the present disclosure.

In another possible implementation manner of the embodiment of the disclosure, the segmentation result is first probability information that each pixel point in the frame image to be detected belongs to the target object; based on the segmentation result, determining target region information in the target frame image includes: determining an area formed by pixel points with the first probability information larger than a first preset threshold value as a target object area; based on the target object area, target area information is determined.

For the embodiment of the disclosure, a frame image to be detected is input to a full-image-based segmentation network model for image segmentation processing, and a segmentation result is output based on the full-image-based segmentation network model. In the embodiment of the disclosure, when the segmentation result satisfies the first preset condition, the target region information in the target frame image is determined based on the segmentation result.

The segmentation result is first probability information that each pixel point in the frame image to be detected belongs to a target object.

For the embodiment of the present disclosure, the manner of determining whether the segmentation result satisfies the first preset condition includes: determining, based on the first probability information that each pixel point in the frame image to be detected belongs to the target object, at least one pixel point whose first probability information is greater than a first preset threshold; calculating a connected domain value using the determined pixel points; if the connected domain value is greater than the preset connected domain threshold, the segmentation result meets the first preset condition; and if the connected domain value is smaller than or equal to the preset connected domain threshold, the segmentation result does not meet the first preset condition. For example, a connected domain value is calculated using the pixel points whose first probability information is greater than 0.5 (the first preset threshold), and whether the segmentation result satisfies the first preset condition is determined based on the relationship between the connected domain value and 0.75 (the preset connected domain threshold).
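The check can be sketched as follows. The text does not define how the connected domain value is computed, so this sketch makes an explicit assumption: the value is taken to be the share of above-threshold pixel points that lie in the largest 4-connected region, which is one definition compatible with a threshold such as 0.75. The function names are hypothetical.

```python
from collections import deque


def largest_component_size(mask):
    """Size of the largest 4-connected region of True cells in a 2D mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best


def meets_first_condition(probs, prob_threshold=0.5, domain_threshold=0.75):
    """Assumed reading: connected domain value = largest region / all foreground."""
    mask = [[p > prob_threshold for p in row] for row in probs]
    foreground = sum(sum(row) for row in mask)
    if foreground == 0:
        return False
    return largest_component_size(mask) / foreground > domain_threshold


compact = [[0.9, 0.8], [0.7, 0.2]]                               # 3 of 3 pixels connected
scattered = [[0.9, 0.8, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.6]]  # 3 of 4 pixels connected
```

With this definition, `scattered` yields exactly 0.75 and therefore fails, because the condition requires a value strictly greater than the threshold.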

For the embodiment of the present disclosure, an area composed of pixels whose first probability information is greater than a first preset threshold is determined as a target object area, for example, an area composed of pixels whose first probability information is greater than 0.5 (first preset threshold) is determined as a target object area.

Further, determining the target area information based on the target object area specifically includes: determining target object region information based on the target object region, and determining the target area information based on the target object region information. For example, a minimum rectangular frame (i.e., the target object area information) containing a human outline area is determined based on the human outline area (i.e., the target object area), and the minimum rectangular frame is expanded by 25% such that the expanded rectangular frame has the same center as the minimum rectangular frame; the expanded rectangular frame is then the target area information.
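The example above can be sketched as two small functions. The sketch assumes the target object area is given as a binary mask and that "expanded by 25%" means each side length is multiplied by 1.25; both the interpretation and the function names are assumptions made for illustration.

```python
def min_bounding_rect(mask):
    """Smallest axis-aligned rectangle (x, y, width, height) covering all
    non-zero cells of a 2D mask, or None if the mask is empty."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


def expand_rect(rect, ratio=0.25):
    """Scale both side lengths by (1 + ratio) while keeping the same center."""
    x, y, w, h = rect
    new_w, new_h = w * (1 + ratio), h * (1 + ratio)
    cx, cy = x + w / 2, y + h / 2
    return (cx - new_w / 2, cy - new_h / 2, new_w, new_h)


# Foreground pixels at (2, 1) and (5, 4) span a 4 x 4 rectangle.
mask = [[0] * 8 for _ in range(6)]
mask[1][2] = 1
mask[4][5] = 1
tight = min_bounding_rect(mask)
target_area = expand_rect(tight)
```

The margin gives the detection area some room for the target object to move between consecutive frames; in practice the expanded rectangle would also be clipped to the frame.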

Further, after the area composed of pixel points whose first probability information is greater than the first preset threshold is determined as the target object area, the method may further include: determining, based on the target frame image, first probability information that each first pixel point in a non-target object area belongs to the target object; determining the first pixel value corresponding to each first pixel point in a second preset special effect image; and determining a special effect image corresponding to the target frame image based on the first probability information that each first pixel point belongs to the target object, the first pixel value corresponding to each first pixel point, and the target object area in the target frame image.

For the embodiment of the disclosure, an area formed by pixel points with first probability information greater than a first preset threshold is determined as a target object area, that is, it is determined that a target object area exists in a frame image to be detected, and at this time, the frame image to be detected in which the target object area exists corresponds to the target frame image in which the target object area exists.

Further, after determining the target object region in the target frame image, for the non-target object region in the target frame image, first probability information that each first pixel point in the non-target object region belongs to the target object can be determined, a second preset special effect image is acquired, first pixel values corresponding to each first pixel point in the non-target object region are determined based on the second preset special effect image, and first pixel update values of each first pixel point are determined based on the first probability information that each first pixel point belongs to the target object and the first pixel values corresponding to each first pixel point, so that the special effect region corresponding to the non-target object region in the target frame image is determined. The second preset special effect image may be the same as or different from the first preset special effect image, which is not limited in the embodiment of the present disclosure.

For example, suppose a first pixel point 1 and a first pixel point 2 exist in the non-target object region of the target frame image. The first probability information p1 that the first pixel point 1 belongs to the target object is determined based on the target frame image, the first pixel value x1 corresponding to the first pixel point 1 is determined based on the snowflake special effect image, and the first pixel update value (1 - p1) × x1 of the first pixel point 1 is determined based on p1 and x1. Likewise, the first probability information p2 that the first pixel point 2 belongs to the target object is determined based on the target frame image, the first pixel value x2 corresponding to the first pixel point 2 is determined based on the snowflake special effect image, and the first pixel update value (1 - p2) × x2 of the first pixel point 2 is determined based on p2 and x2. The special effect region corresponding to the non-target object region is then determined based on the first pixel update values (1 - p1) × x1 and (1 - p2) × x2.

Further, a special effect image corresponding to the target frame image is determined based on the special effect region corresponding to the non-target object region in the target frame image and the target object region in the target frame image. Namely, according to the mode, the non-target object area in the target frame image is replaced by the special effect area, and the special effect image corresponding to the target frame image is obtained by combining the target object area in the target frame image.
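A minimal sketch of this compositing follows, assuming single-channel images stored as lists of rows and assuming that a pixel point belongs to the non-target object region when its probability does not exceed the first preset threshold. Pixel points in the target object region keep their original value, and every other pixel point receives the update value (1 - p) × x described above. The function name `apply_effect` is hypothetical.

```python
def apply_effect(frame, effect, probs, threshold=0.5):
    """Combine a frame with a special effect image.

    frame, effect, probs: 2D lists of equal shape. probs holds the probability
    that each pixel point belongs to the target object.
    """
    result = []
    for frame_row, effect_row, prob_row in zip(frame, effect, probs):
        row = []
        for pixel, effect_pixel, p in zip(frame_row, effect_row, prob_row):
            if p > threshold:
                row.append(pixel)                   # target object region: keep
            else:
                row.append((1 - p) * effect_pixel)  # first pixel update value
        result.append(row)
    return result


composited = apply_effect(
    frame=[[100, 50]],
    effect=[[200, 80]],
    probs=[[0.75, 0.25]],
)
```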

In another possible implementation manner of the embodiment of the present disclosure, if the pixel points in the detection area do not meet the first preset condition, the method may further include: determining a preset frame image as the frame image to be detected, and cyclically executing step A, step S102 and step S103 until the second preset condition is met.

The preset frame image is a frame image of which the detection area does not meet the first preset condition.

For the embodiment of the present disclosure, determining the preset frame image as the frame image to be detected and cyclically executing step A, step S102 and step S103 until the second preset condition is met may be performed after step S102. For the details of step A, step S102 and step S103, refer to the related descriptions of the foregoing implementation manners, which are not repeated herein.

For the embodiment of the disclosure, the second preset condition is satisfied, that is, the frame image to be detected is the last frame image.
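The interplay between full-frame detection (step A) and detection inside the detection area (steps S102 and S103) can be sketched as a small state machine. The function name `process_video` and the two callables are hypothetical: `detect_full` is assumed to return target area information or `None`, and `detect_in_area` is assumed to return a target object area or `None` when the first preset condition is not met.

```python
def process_video(frames, detect_full, detect_in_area):
    """Record, per frame, which path found the target: "full" (step A),
    "area" (steps S102 and S103), or None when no target was found."""
    handled_by = [None] * len(frames)
    target_area = None
    index = 0
    while index < len(frames):      # stops after the last frame
        if target_area is None:
            target_area = detect_full(frames[index])        # step A
            if target_area is not None:
                handled_by[index] = "full"
        else:
            if detect_in_area(frames[index], target_area) is None:
                # Detection area failed: this frame becomes the preset frame
                # image and is re-examined by step A on the next iteration.
                target_area = None
                continue
            handled_by[index] = "area"
        index += 1
    return handled_by


# Stand-in data: "T" is visible to both paths, "t" only inside a detection
# area, and "-" contains no target at all.
trace = process_video(
    list("Tt-tTt"),
    detect_full=lambda frame: (0, 0, 1, 1) if frame == "T" else None,
    detect_in_area=lambda frame, area: "region" if frame in ("T", "t") else None,
)
```

In this reading the heavier full-frame model runs only on the first frame and after a detection area loses the target.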

In another possible implementation manner of the embodiment of the present disclosure, before determining the target object area based on each pixel point in the detection area, the method may further include: performing segmentation processing on the detection area through a detection-frame-based segmentation network model to obtain second probability information that each pixel point in the detection area belongs to the target object.

The determining the target object area based on each pixel point in the detection area may specifically include: and determining an area formed by pixel points with the second probability information larger than a second preset threshold value as a target object area.

For the embodiment of the present disclosure, the detection-frame-based segmentation network model has the same structure as the full-graph-based segmentation network model; the two differ only in the size of the network model parameters. For details, refer to the above description of the full-graph-based segmentation network model, which is not repeated here.

For the embodiment of the disclosure, the detection area is input to the detection-frame-based segmentation network model for image segmentation processing, and second probability information that each pixel point in the detection area belongs to the target object is output. In the embodiment of the disclosure, when each pixel point in the detection area meets the first preset condition, the target object area is determined based on each pixel point in the detection area.

For the embodiment of the present disclosure, the method for determining whether each pixel point in the detection area meets the first preset condition may include: determining, based on the second probability information that each pixel point in the detection area belongs to the target object, the pixel points whose second probability information is greater than the second preset threshold; and calculating a connected domain value from the determined pixel points. If the connected domain value is greater than the preset connected domain threshold, each pixel point in the detection area meets the first preset condition; if the connected domain value is less than or equal to the preset connected domain threshold, each pixel point in the detection area does not meet the first preset condition. For example, the connected domain value is calculated using the pixel points whose second probability information is greater than 0.5 (the second preset threshold), and whether each pixel point in the detection area meets the first preset condition is determined by comparing the connected domain value with 0.75 (the preset connected domain threshold).
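The check described above might look like the sketch below. The text does not define how the connected domain value is computed, so the sketch makes an explicit assumption: the value is taken to be the fraction of above-threshold pixels that fall inside the largest 4-connected component. The function names `largest_component_ratio` and `meets_first_condition` are hypothetical.

```python
from collections import deque


def largest_component_ratio(prob, threshold=0.5):
    """Assumed connected domain value: share of foreground pixels that lie
    in the largest 4-connected component of the thresholded probability map."""
    h, w = len(prob), len(prob[0])
    fg = [[prob[i][j] > threshold for j in range(w)] for i in range(h)]
    total = sum(row.count(True) for row in fg)
    if total == 0:
        return 0.0
    seen = [[False] * w for _ in range(h)]
    best = 0
    for i in range(h):
        for j in range(w):
            if not fg[i][j] or seen[i][j]:
                continue
            # Breadth-first search over one connected component.
            size = 0
            queue = deque([(i, j)])
            seen[i][j] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and fg[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best / total


def meets_first_condition(prob, threshold=0.5, cc_threshold=0.75):
    """First preset condition: connected domain value strictly above the threshold."""
    return largest_component_ratio(prob, threshold) > cc_threshold
```

With the example thresholds of 0.5 and 0.75, a probability map whose foreground is split into several scattered blobs fails the condition, while one dominated by a single blob passes it.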

For the embodiment of the present disclosure, an area composed of pixels whose second probability information is greater than a second preset threshold is determined as a target object area, for example, an area composed of pixels whose second probability information is greater than 0.5 (second preset threshold) is determined as a target object area.

Further, after determining the region composed of the pixel points whose second probability information is greater than the second preset threshold as the target object region, the method may further include: determining, based on the frame image to be processed, second probability information that each second pixel point in the non-target object area belongs to the target object; determining the second pixel value corresponding to each second pixel point in a third preset special effect image; and determining a special effect image corresponding to the frame image to be processed based on the second probability information that each second pixel point belongs to the target object, the second pixel value corresponding to each second pixel point, and the target object area in the frame image to be processed.

For the embodiment of the disclosure, determining an area composed of pixel points whose second probability information is greater than the second preset threshold as the target object area means that a target object area exists in the detection area, and therefore that a target object area exists in the frame image to be processed.

Further, after determining the target object area in the frame image to be processed, for the non-target object area in the frame image to be processed, second probability information that each second pixel point in the non-target object area belongs to the target object can be determined, a third preset special effect image is obtained, second pixel values corresponding to each second pixel point in the non-target object area are determined based on the third preset special effect image, and second pixel update values of each second pixel point are determined based on second probability information that each second pixel point belongs to the target object and second pixel values corresponding to each second pixel point respectively, so that special effect areas corresponding to the non-target object area in the frame image to be processed are determined. The third preset special effect image may be the same as or different from the first preset special effect image, and may be the same as or different from the second preset special effect image, which is not limited in the embodiment of the present disclosure.

For example, suppose the non-target object region in the frame image to be processed contains second pixel point 3 and second pixel point 4. The second probability information p3 that second pixel point 3 belongs to the target object may be determined based on the frame image to be processed, the second pixel value x3 corresponding to second pixel point 3 may be determined based on the snowflake special effect image, and the second pixel update value (1-p3)×x3 of second pixel point 3 may be determined based on the second probability information p3 and the second pixel value x3. Likewise, the second probability information p4 that second pixel point 4 belongs to the target object may be determined based on the frame image to be processed, the second pixel value x4 corresponding to second pixel point 4 may be determined based on the snowflake special effect image, and the second pixel update value (1-p4)×x4 of second pixel point 4 may be determined based on the second probability information p4 and the second pixel value x4. The special effect region corresponding to the non-target object region may then be determined based on the second pixel update values (1-p3)×x3 and (1-p4)×x4.

Further, a special effect image corresponding to the frame image to be processed is determined based on the special effect area corresponding to the non-target object area in the frame image to be processed and the target object area in the frame image to be processed. Namely, according to the mode, the non-target object area in the frame image to be processed is replaced by the special effect area, and the special effect image corresponding to the frame image to be processed is obtained by combining the target object area in the frame image to be processed.

Another possible implementation of an embodiment of the disclosure, the method may further include: and determining third probability information of each pixel point belonging to the target object based on the first probability information of each pixel point belonging to the target object in the frame image to be detected and the first probability information or the second probability information of the first preset pixel point corresponding to each pixel point.

Each first preset pixel point is a pixel point in a previous frame image of the frame image to be detected.

For the embodiment of the present disclosure, the step of determining the third probability information that each pixel belongs to the target object may be performed before the step of determining the region composed of the pixel whose first probability information is greater than the first preset threshold as the target object region.

The determining, as the target object area, an area composed of pixels whose first probability information is greater than a first preset threshold value may include: and determining an area formed by pixel points with the third probability information larger than a first preset threshold value as a target object area.

For the embodiment of the present disclosure, smoothing between the frame image to be detected and the previous frame image is performed using dense optical flow. Dense optical flow is an image registration method that matches an image point by point. Unlike sparse optical flow, which tracks only a number of feature points on the image, dense optical flow calculates the offset of every point on the image, forming a dense optical flow field. The dense optical flow field enables pixel-level image registration, so the registration result is noticeably better than that of sparse optical flow registration.

For the embodiment of the disclosure, for any pixel point in the frame image to be detected, the calculation formula of the third probability information is as follows:

P₃ = αP₁ + βP₂

where P₃ is the third probability information that any pixel point in the frame image to be detected belongs to the target object, α is the weight coefficient corresponding to that pixel point, P₁ is the first probability information that that pixel point belongs to the target object, β is the weight coefficient corresponding to the first preset pixel point of that pixel point, and P₂ is the first probability information or the second probability information that the first preset pixel point of that pixel point belongs to the target object.
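As a rough illustration of the weighted combination above, the sketch below blends two probability maps pixel by pixel. It assumes the previous-frame probabilities have already been registered to the current frame (for instance through the dense optical flow field), so that each position holds the corresponding first preset pixel point. The weights 0.75 and 0.25 are illustrative values only, since the text does not specify α and β, and the function name `smooth_probability` is hypothetical.

```python
def smooth_probability(p_current, p_previous, alpha=0.75, beta=0.25):
    """Compute P3 = alpha * P1 + beta * P2 for every pixel.

    p_current  : P1, probabilities for the frame image to be detected
    p_previous : P2, probabilities of the corresponding pixels in the
                 previous frame, assumed to be aligned to the current frame
    alpha, beta: weight coefficients (illustrative values, not from the text)
    """
    return [
        [alpha * p1 + beta * p2 for p1, p2 in zip(row_cur, row_prev)]
        for row_cur, row_prev in zip(p_current, p_previous)
    ]


# A pixel at 0.5 in the current frame and 1.0 in the previous frame.
p3 = smooth_probability([[0.5, 0.0]], [[1.0, 0.0]])
```

A pixel whose probability flickers between frames is pulled toward its previous value, which is the smoothing effect the passage describes.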

For the embodiment of the disclosure, if the third probability information meets the first preset condition, an area composed of pixel points whose third probability information is greater than the first preset threshold is determined as the target object area. The method for determining whether the third probability information meets the first preset condition includes: determining, based on the third probability information that each pixel point in the frame image to be detected belongs to the target object, the pixel points whose third probability information is greater than the first preset threshold; and calculating a connected domain value from the determined pixel points. If the connected domain value is greater than the preset connected domain threshold, the third probability information meets the first preset condition; if the connected domain value is less than or equal to the preset connected domain threshold, the third probability information does not meet the first preset condition. For example, a connected domain value is calculated using the pixel points whose third probability information is greater than 0.5 (the first preset threshold), and whether the third probability information meets the first preset condition is determined by comparing the connected domain value with 0.75 (the preset connected domain threshold).

For the embodiment of the present disclosure, an area composed of pixels whose third probability information is greater than a first preset threshold is determined as a target object area, for example, an area composed of pixels whose third probability information is greater than 0.5 (first preset threshold) is determined as a target object area.

In another possible implementation manner of the embodiment of the present disclosure, the image segmentation method may further include: determining fourth probability information corresponding to each pixel point in the detection area based on the second probability information that each pixel point in the detection area belongs to the target object and the first probability information or the second probability information of the second preset pixel point corresponding to each pixel point in the detection area, wherein each second preset pixel point is a pixel point in the previous frame image of the detection area.

For the embodiment of the present disclosure, the step of determining fourth probability information corresponding to each pixel point in the detection area may be performed before the step of determining an area composed of pixel points whose second probability information is greater than a second preset threshold as the target object area.

The determining, as the target object area, an area composed of pixel points where the second probability information is greater than a second preset threshold value may include: and determining an area formed by pixel points with fourth probability information larger than a second preset threshold value as a target object area.

For the embodiment of the present disclosure, smoothing between the detection area and the previous frame image is performed using dense optical flow. Dense optical flow is an image registration method that matches an image point by point. Unlike sparse optical flow, which tracks only a number of feature points on the image, dense optical flow calculates the offset of every point on the image, forming a dense optical flow field. The dense optical flow field enables pixel-level image registration, so the registration result is noticeably better than that of sparse optical flow registration.

For the embodiment of the present disclosure, for any pixel point in the detection area, the calculation formula of the fourth probability information is as follows:

P₃′ = α′P₁′ + β′P₂′

where P₃′ is the fourth probability information that any pixel point in the detection area belongs to the target object, α′ is the weight coefficient corresponding to that pixel point, P₁′ is the second probability information that that pixel point belongs to the target object, β′ is the weight coefficient corresponding to the second preset pixel point of that pixel point, and P₂′ is the first probability information or the second probability information that the second preset pixel point of that pixel point belongs to the target object.

For the embodiment of the disclosure, if the fourth probability information meets the first preset condition, an area composed of pixel points whose fourth probability information is greater than the second preset threshold is determined as the target object area. The method for determining whether the fourth probability information meets the first preset condition includes: determining, based on the fourth probability information that each pixel point in the detection area belongs to the target object, the pixel points whose fourth probability information is greater than the second preset threshold; and calculating a connected domain value from the determined pixel points. If the connected domain value is greater than the preset connected domain threshold, the fourth probability information meets the first preset condition; if the connected domain value is less than or equal to the preset connected domain threshold, the fourth probability information does not meet the first preset condition. For example, a connected domain value is calculated using the pixel points whose fourth probability information is greater than 0.5 (the second preset threshold), and whether the fourth probability information meets the first preset condition is determined by comparing the calculated connected domain value with the preset connected domain threshold (for example, 0.75).

For the embodiment of the present disclosure, an area composed of pixels whose fourth probability information is greater than a second preset threshold is determined as a target object area, for example, an area composed of pixels whose fourth probability information is greater than 0.5 (second preset threshold) is determined as a target object area.

The above describes in detail how image segmentation is performed using the trained full-graph-based segmentation network model and the trained detection-frame-based segmentation network model. In practical application, a preset network model needs to be trained to obtain these models. The training of the full-graph-based segmentation network model and of the detection-frame-based segmentation network model is as follows:

In another possible implementation manner of the embodiment of the present disclosure, before image segmentation processing is performed on the frame image to be detected through the full-graph-based segmentation network model to obtain the segmentation result, the method may further include: acquiring a first training sample; and training a first initial network model based on the first training sample to obtain the full-graph-based segmentation network model.

The first training sample comprises: a plurality of first images, and labeling information indicating whether each pixel point in each first image belongs to the target object.

For the embodiment of the disclosure, a plurality of first images are input into a first initial network model to perform image segmentation processing, the first initial network model outputs probability information that each pixel point in each first image belongs to a target object, and model parameters corresponding to the first initial network model are adjusted by using the probability information that each pixel point in each first image belongs to the target object and labeling information whether each pixel point in each first image belongs to the target object, so that training of the first initial network model by using a first training sample is realized.

For the embodiment of the disclosure, a plurality of first images are input into the first initial network model. The first initial network model may scale each first image proportionally to obtain first images whose size is not smaller than a first target size, and then randomly clip each of these first images to obtain first images of the first target size. In an embodiment of the present disclosure, the first target size is a size that meets the input size requirements of the first initial network model.
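The two preprocessing steps above, proportional scaling followed by random clipping, can be sketched on image sizes alone. The sketch assumes the smallest scale factor that makes both sides reach the target size; the text only requires that the scaled image be no smaller than the target. The function names `scaled_size` and `random_clip_box` and the 512×512 target are hypothetical.

```python
import random


def scaled_size(width, height, target_w, target_h):
    """Size after proportional scaling so that neither side is below the target."""
    scale = max(target_w / width, target_h / height)
    # Rounding may lose a pixel, so clamp each side to the target size.
    new_w = max(target_w, round(width * scale))
    new_h = max(target_h, round(height * scale))
    return new_w, new_h


def random_clip_box(width, height, target_w, target_h, rng=random):
    """Pick a random target-sized window (left, top, right, bottom) in the image."""
    left = rng.randint(0, width - target_w)
    top = rng.randint(0, height - target_h)
    return left, top, left + target_w, top + target_h


w, h = scaled_size(640, 480, 512, 512)  # (683, 512)
box = random_clip_box(w, h, 512, 512, random.Random(0))
```

Because the scale factor is shared by both sides, the aspect ratio of the image is preserved and only the random clip discards content.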

For the embodiment of the present disclosure, the network structure of the first initial network model is consistent with the network structure of the full-graph-based segmentation network model. For details, refer to the related description of the above implementation manners, which is not repeated in the embodiment of the present disclosure.

In another possible implementation manner of the embodiment of the present disclosure, before the detection area is segmented through the detection-frame-based segmentation network model, the method may further include: acquiring a second training sample; and training a second initial network model based on the second training sample to obtain the detection-frame-based segmentation network model.

The second training sample comprises: a plurality of second images, and labeling information indicating whether each pixel point in each second image belongs to the target object.

For the embodiment of the disclosure, a plurality of second images are input to a second initial network model for image segmentation processing, the second initial network model outputs probability information that each pixel point in each second image belongs to a target object, and model parameters corresponding to the second initial network model are adjusted by using the probability information that each pixel point in each second image belongs to the target object and labeling information whether each pixel point in each second image belongs to the target object, so that training of the second initial network model by using second training samples is realized.

Further, acquiring the plurality of second images may include: acquiring a plurality of third images, and determining target area information in each third image; performing expansion processing on the target area information in each third image according to a preset expansion ratio to obtain a target clipping area in each third image; and performing clipping processing on the target clipping area in each third image to obtain the plurality of second images.

For the embodiment of the disclosure, for any third image, a target object area in the third image may be determined, target area information is determined based on the target object area, expansion processing is performed on the target area information according to a preset expansion ratio to obtain a target clipping area, and clipping processing is performed on the target clipping area to obtain a second image. For example, for a third image of 5cm×3cm, the area surrounded by the human body contour (i.e., the target object area) in the third image may be determined, and rectangular area information (i.e., the target area information) is determined based on that area, such that the area in the third image corresponding to the rectangular area information contains the area surrounded by the human body contour. Suppose the rectangular area information consists of the center coordinates (0, 0) and the lengths of the two sides, each 1cm. Expanding the rectangular area information by an expansion ratio of 1.25 yields expanded rectangular area information whose center coordinates are (0, 0) and whose two side lengths are each 1.25cm. The area in the third image corresponding to the expanded rectangular area information is the target clipping area, and clipping this area from the third image yields a second image of 1.25cm×1.25cm.
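The worked example above can be reproduced with a few lines. The sketch represents the target area information as a center plus two side lengths, as in the example; the function names `expand_box` and `box_corners` are hypothetical, and clamping the expanded rectangle to the image borders is left out because the text does not describe it.

```python
def expand_box(cx, cy, width, height, ratio=1.25):
    """Scale both side lengths by `ratio` while keeping the center fixed."""
    return cx, cy, width * ratio, height * ratio


def box_corners(cx, cy, width, height):
    """Convert center and side lengths to (left, top, right, bottom)."""
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)


# Center (0, 0), sides of 1 cm each, expansion ratio 1.25.
expanded = expand_box(0, 0, 1, 1, ratio=1.25)  # (0, 0, 1.25, 1.25)
clip_area = box_corners(*expanded)             # (-0.625, -0.625, 0.625, 0.625)
```

Keeping the center fixed means the extra margin is distributed evenly around the target object, which leaves room for the object to move between frames.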

Further, each second image is labeled to obtain labeling information indicating whether each pixel point in each second image belongs to the target object, thereby obtaining the second training sample.

For the embodiment of the disclosure, a plurality of second images are input to the second initial network model. The second initial network model may scale each second image proportionally to obtain second images whose size is not smaller than a second target size, and then randomly clip each of these second images to obtain second images of the second target size. Because each second image is obtained by clipping a third image based on the target area information in that third image, proportional scaling of the second image by the second initial network model is in effect equivalent to non-proportional scaling of the third image. In an embodiment of the present disclosure, the second target size is a size that meets the input size requirements of the second initial network model.

For the embodiment of the present disclosure, the network structure of the second initial network model is consistent with the network structure of the frame-based segmentation network model, please refer to the related description of the above implementation in detail, which is not repeated in the embodiment of the present disclosure.

In the above method embodiment, the first preset condition is that the connected domain value calculated based on each pixel point is greater than the preset connected domain threshold; the second preset condition is that the frame image is the last frame image of the preset video.

The above embodiments describe the image segmentation method provided by the embodiments of the present disclosure, and the image segmentation method provided by the embodiments of the present disclosure is described below through a specific application scenario. The method is specifically as follows:

another possible implementation of an embodiment of the disclosure, as shown in fig. 4, the image segmentation method may include:

step S401, acquiring a frame image to be detected.

Step S402, image segmentation processing is carried out on the frame image to be detected through a segmentation network model based on a full graph, and a segmentation result is obtained.

If the segmentation result meets the first preset condition, step S403 is executed to determine the region composed of the pixel points whose first probability information is greater than the first preset threshold as the target object region.

If the segmentation result does not meet the first preset condition and does not meet the second preset condition, step S401 and step S402 are executed in a circulating manner until the segmentation result does not meet the first preset condition and meets the second preset condition, or the segmentation result meets the first preset condition.

And if the second preset condition is met, ending the image segmentation method of the embodiment of the disclosure.

If the second preset condition is not satisfied, step S404 is executed to determine the target area information based on the target object area.

Step S405, determining a frame image to be processed, and performing clipping processing on the frame image to be processed based on the target area information to obtain a detection area.

In step S406, the detection area is subjected to segmentation processing by using a segmentation network model based on the detection frame, so as to obtain second probability information that each pixel point in the detection area belongs to the target object.

If each pixel point in the detection area meets the first preset condition, step S407 is executed, the area formed by the pixel points with the second probability information greater than the second preset threshold is determined as the target object area, and if the second preset condition is not met, step S405, step S406 and step S407 are executed in a circulating manner until the second preset condition is met.

If each pixel point in the detection area does not meet the first preset condition, step S408 is executed: the frame image to be processed whose detection area does not meet the first preset condition is obtained and determined as the frame image to be detected, and step S402, step S403, step S404, step S405 and step S406 are executed cyclically until each pixel point in the detection area meets the first preset condition.

Here, meeting the first preset condition means that the connected domain value calculated from the pixel points is greater than the preset connected domain threshold, and not meeting it means that the connected domain value is less than or equal to the preset connected domain threshold. Meeting the second preset condition means that the image is the last frame image, and not meeting it means that the image is not the last frame image.
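The control flow of steps S401 to S408 can be summarized by the sketch below. It is a simplified reading of the flow, not a definitive implementation: the two models, the first preset condition check, the derivation of target area information and the clipping step are passed in as callables, all of which are hypothetical names. The sketch also assumes that the target area information stays unchanged while the detection-frame-based branch keeps succeeding, which this passage does not state either way.

```python
def run_pipeline(frames, full_model, box_model, condition_met, to_box, crop):
    """Return, per frame, which model produced an accepted result.

    full_model(frame)   -> probabilities from the full-graph-based model
    box_model(region)   -> probabilities from the detection-frame-based model
    condition_met(prob) -> True if the first preset condition is met
    to_box(prob)        -> target area information from a segmentation result
    crop(frame, box)    -> detection area clipped from the frame
    """
    results = []
    box = None  # no target area information yet
    for frame in frames:  # running out of frames is the second preset condition
        if box is not None:
            prob = box_model(crop(frame, box))      # S405, S406
            if condition_met(prob):
                results.append(("box", prob))       # S407
                continue
            box = None  # S408: re-detect this frame with the full-graph model
        prob = full_model(frame)                    # S401, S402
        if condition_met(prob):
            box = to_box(prob)                      # S403, S404
            results.append(("full", prob))
        else:
            results.append((None, prob))            # no target object found
    return results
```

The design point illustrated here is that the cheaper detection-frame-based model handles frames for as long as its result passes the connected domain check, and the full-graph-based model is only used to initialize or recover the target area information.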

The relevant descriptions of the embodiments of the present disclosure may be referred to the relevant descriptions of the above embodiments, and the implementation principles thereof are similar, and are not repeated in the embodiments of the present application.

The above-described image segmentation method is specifically described in terms of method steps, and the image segmentation apparatus is described in terms of virtual modules or virtual units, specifically as follows:

The embodiment of the present disclosure provides an image segmentation apparatus. As shown in fig. 2, the image segmentation apparatus 20 may include: a target area determining module 201, a detection area determining module 202, and a first loop module 203, wherein,

The target area determining module 201 is configured to determine target area information in a target frame image.

The detection area determining module 202 is configured to determine a frame image to be processed, and determine a detection area of the frame image to be processed based on the target area information.

The first loop module 203 is configured to determine, when each pixel point in the detection area meets the first preset condition, a target object area based on each pixel point in the detection area, and to cyclically perform the operations corresponding to the detection area determining module 202 and the first loop module 203 until the second preset condition is met.

In another possible implementation manner of the embodiment of the present disclosure, the target area determining module 201 may include an acquisition unit, a segmentation unit, and a determination unit, wherein,

and the acquisition unit is used for acquiring the frame image to be detected.

The segmentation unit is used for carrying out image segmentation processing on the frame image to be detected through a segmentation network model based on a full graph to obtain a segmentation result.

And a determination unit configured to determine target region information in the target frame image based on the segmentation result.

In another possible implementation manner of the embodiment of the present disclosure, the determining unit may be specifically configured to determine, when the segmentation result meets a first preset condition, target area information in the frame image to be detected based on the segmentation result, and determine the target area information in the frame image to be detected as the target area information in the target frame image.

In another possible implementation manner of the embodiment of the present disclosure, the determination unit may be further specifically configured to, when the segmentation result does not meet the first preset condition, take the next frame image as the frame image to be detected and cyclically execute the operations corresponding to the acquisition unit, the segmentation unit, and the determination unit until the second preset condition is met or the segmentation result meets the first preset condition.

The determining unit may be further configured to determine target region information in the target frame image based on the segmentation result satisfying the first preset condition.

In another possible implementation manner of the embodiment of the present disclosure, the determination unit may be further specifically configured to, when the segmentation result does not meet the first preset condition, replace the frame image to be detected with the first preset special effect image, take the next frame image as the frame image to be detected, and cyclically execute the operations corresponding to the acquisition unit, the segmentation unit, and the determination unit until the second preset condition is met or the segmentation result meets the first preset condition.

In another possible implementation manner of the embodiment of the present disclosure, the segmentation result is first probability information that each pixel point in the frame image to be detected belongs to the target object.

The determining unit may be further configured to determine, as the target object area, an area composed of pixels whose first probability information is greater than a first preset threshold.

The determining unit may be further configured to determine the target area information based on the target object area.

In another possible implementation of the disclosed embodiments, the image segmentation apparatus 20 may further include a first processing module, where,

the first processing module is used for determining first probability information of each first pixel point belonging to the target object in the non-target object area based on the target frame image.

The first processing module is further configured to determine first pixel values corresponding to the first pixel points respectively in the second preset special effect image.

The first processing module is further configured to determine a special effect image corresponding to the target frame image based on the first probability information that each first pixel belongs to the target object, the first pixel value corresponding to each first pixel, and the target object area in the target frame image.
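One way the special effect image could be composed is sketched below. The blending rule is an assumption, since this passage does not state a formula: pixels inside the target object area keep their original value, and each pixel outside it is a mix of the frame pixel and the special effect pixel weighted by its probability of belonging to the target object. The function name `compose_effect_image` and the grayscale float representation are hypothetical.

```python
from typing import List

Image = List[List[float]]   # grayscale pixel values (assumed representation)
ProbMap = List[List[float]]
Mask = List[List[bool]]


def compose_effect_image(frame: Image, effect: Image,
                         prob: ProbMap, mask: Mask) -> Image:
    """Blend a frame with a preset special effect image.

    Assumed rule: inside the target object area the frame pixel is kept;
    outside it, out = p * frame + (1 - p) * effect, where p is the
    probability that the pixel belongs to the target object.
    """
    out: Image = []
    for y, row in enumerate(frame):
        out_row = []
        for x, pixel in enumerate(row):
            if mask[y][x]:
                out_row.append(pixel)
            else:
                p = prob[y][x]
                out_row.append(p * pixel + (1.0 - p) * effect[y][x])
        out.append(out_row)
    return out
```

Weighting by probability rather than switching hard at the mask boundary gives a gradual transition at the edge of the target object, which is one plausible reason for using the probabilities of the non-target pixels.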

In another possible implementation manner of the embodiment of the present disclosure, when each pixel point in the detection area does not meet the first preset condition, the image segmentation apparatus 20 may further include a second circulation module, where,

the second circulation module is configured to determine the preset frame image as the frame image to be detected, and to cyclically perform the operations corresponding to the acquisition unit, the segmentation unit, the determination unit, the detection area determination module 202, and the first circulation module 203 until the second preset condition is satisfied.

The preset frame image is a frame image whose detection area does not satisfy a first preset condition.

In another possible implementation of the disclosed embodiments, the image segmentation apparatus 20 may further include a segmentation module, wherein,

the segmentation module is used for carrying out segmentation processing on the detection area through a segmentation network model based on the detection frame to obtain second probability information that each pixel point in the detection area belongs to the target object.

The first loop module 203 may be specifically configured to determine, as the target object area, an area formed by pixels having the second probability information greater than the second preset threshold when determining the target object area based on each pixel in the detection area.

In another possible implementation of the disclosed embodiments, the image segmentation apparatus 20 may further include a second processing module, where,

the second processing module is used for determining second probability information of each second pixel point belonging to the target object in the non-target object area based on the frame image to be processed.

The second processing module is further configured to determine second pixel values corresponding to the second pixel points in the third preset special effect image.

The second processing module is further configured to determine a special effect image corresponding to the frame image to be processed based on second probability information that each second pixel belongs to the target object, second pixel values corresponding to each second pixel, and a target object area in the frame image to be processed.

In another possible implementation of the disclosed embodiments, the image segmentation apparatus 20 may further include a first determination module, wherein,

the first determining module is configured to determine third probability information that each pixel belongs to the target object based on first probability information that each pixel belongs to the target object in the frame image to be detected and first probability information or second probability information of a first preset pixel corresponding to each pixel, where each first preset pixel is a pixel in a previous frame image of the frame image to be detected.

When determining, as the target object area, an area composed of pixels whose first probability information is greater than the first preset threshold, the determining unit may be specifically configured to determine, as the target object area, an area composed of pixels whose third probability information is greater than the first preset threshold.

In another possible implementation of the disclosed embodiments, the image segmentation apparatus 20 may further include a second determination module, where,

the second determining module is configured to determine fourth probability information corresponding to each pixel point in the detection area based on the second probability information that each pixel point in the detection area belongs to the target object and the first probability information or second probability information of the second preset pixel point corresponding to each pixel point in the detection area, where each second preset pixel point is a pixel point in the previous frame image of the detection area.

When determining, as the target object area, an area composed of pixels whose second probability information is greater than the second preset threshold, the first loop module may be specifically configured to determine, as the target object area, an area composed of pixels whose fourth probability information is greater than the second preset threshold.
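The third and fourth probability information both combine the probability of a pixel in the current frame with the probability of the corresponding pixel in the previous frame. The sketch below assumes a simple weighted average; the combination rule and the `weight` parameter are assumptions, as this passage does not specify how the two probabilities are combined, and the function names are hypothetical.

```python
from typing import List

ProbMap = List[List[float]]


def smooth_probabilities(current: ProbMap, previous: ProbMap,
                         weight: float = 0.5) -> ProbMap:
    """Combine current and previous frame probabilities pixel by pixel.

    Assumed rule: result = weight * current + (1 - weight) * previous.
    Both maps must have the same shape.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        [weight * c + (1.0 - weight) * p for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]


def threshold_mask(prob: ProbMap, threshold: float) -> List[List[bool]]:
    """Select the pixels whose smoothed probability is greater than the threshold."""
    return [[p > threshold for p in row] for row in prob]
```

Thresholding the combined probability instead of the raw one damps frame-to-frame flicker in the target object area, because a pixel must score well across consecutive frames to be selected.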

In another possible implementation of the embodiment of the present disclosure, the image segmentation apparatus 20 may further include a first acquisition module and a first training module, where,

the first acquisition module is used for acquiring a first training sample.

the first training module is used for training the first initial network model based on the first training sample to obtain a segmentation network model based on the full graph.

In another possible implementation of the disclosed embodiment, the image segmentation apparatus 20 may further include a second acquisition module and a second training module, where,

the second acquisition module is used for acquiring a second training sample.

the second training module is used for training the second initial network model based on the second training sample to obtain a segmentation network model based on the detection frame.

In another possible implementation manner of the embodiment of the present disclosure, the second obtaining module is specifically configured to obtain a plurality of third images and determine target area information in each third image when obtaining the plurality of second images.

The second acquisition module is specifically further configured to perform expansion processing on the target area information in each third image according to a preset expansion ratio, so as to obtain a target cropping area in each third image.

The second acquisition module is specifically configured to perform a cropping process on the target cropping area in each third image, so as to obtain a plurality of second images.
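The expansion and cropping used to build the second images can be sketched as follows. The sketch assumes that the target area information is an inclusive bounding box and that the preset expansion ratio is the fractional growth of the box width and height, split evenly between the two sides and clamped to the image bounds; both conventions are assumptions, and `expand_box` and `crop` are hypothetical names.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive (assumed format)
Image = List[List[float]]


def expand_box(box: Box, ratio: float, width: int, height: int) -> Box:
    """Grow a box by `ratio` of its size, half on each side, clamped to the image."""
    left, top, right, bottom = box
    dx = int((right - left + 1) * ratio / 2)
    dy = int((bottom - top + 1) * ratio / 2)
    return (
        max(0, left - dx),
        max(0, top - dy),
        min(width - 1, right + dx),
        min(height - 1, bottom + dy),
    )


def crop(image: Image, box: Box) -> Image:
    """Cut the inclusive box out of a row-major image."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

Expanding before cropping keeps some background around the target object, so that training samples resemble the detection areas the detection-frame-based model would see at inference time.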

In another possible implementation manner of the embodiment of the present disclosure, the first preset condition is that a connected domain value calculated based on each pixel point is greater than a preset connected domain threshold.
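A connected domain check of this kind can be sketched as follows. The sketch assumes that the connected domain value is the pixel count of the largest 4-connected region of the target object mask; the disclosure does not define the value in this passage, and the function names are hypothetical.

```python
from collections import deque
from typing import List

Mask = List[List[bool]]


def largest_connected_area(mask: Mask) -> int:
    """Return the pixel count of the largest 4-connected region of True pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    best = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, area)
    return best


def meets_first_condition(mask: Mask, area_threshold: int) -> bool:
    """First preset condition (assumed form): largest region exceeds the threshold."""
    return largest_connected_area(mask) > area_threshold
```

A check of this form rejects frames in which only scattered pixels pass the probability threshold, which would indicate that no coherent target object is present.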

In another possible implementation manner of the embodiment of the disclosure, the second preset condition is the last frame image of the preset video.

For the embodiment of the present disclosure, each of the following pairs may be implemented either as the same module or as two different modules: the first circulation module 203 and the second circulation module; the first determination module and the second determination module; the first acquisition module and the second acquisition module; the first training module and the second training module; and the first processing module and the second processing module. The embodiment of the present disclosure is not limited in this respect.

The image segmentation apparatus 20 in this embodiment may perform the image segmentation method provided in the embodiment of the present disclosure; its implementation principle is similar and is not repeated here.

The embodiment of the disclosure provides an image segmentation apparatus. Compared with the prior art, the embodiment of the disclosure performs a target area determining step: determining target area information in a target frame image, wherein the target area is an area containing a target object in the target frame image; a detection area determining step: determining a frame image to be processed, and determining a detection area of the frame image to be processed based on the target area information; and a cycling step: if each pixel point in the detection area meets the first preset condition, determining a target object area based on each pixel point in the detection area, and cyclically executing the detection area determining step and the cycling step until the second preset condition is met. The detection area of each subsequent frame image is thus determined from the area containing the target object in the target frame determined in the video to be processed, and the target object area is determined from the pixel points in the detection area.
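The overall control flow can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the two segmentation models, the validity check for the first preset condition, and the derivation of the detection area are passed in as hypothetical callables, and the second preset condition is taken to be reaching the last frame.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
Mask = List[List[bool]]
Box = Tuple[int, int, int, int]


def segment_video(
    frames: Sequence[Frame],
    segment_full: Callable[[Frame], Mask],           # full-graph model (hypothetical)
    segment_in_box: Callable[[Frame, Box], Mask],    # detection-frame model (hypothetical)
    is_valid: Callable[[Mask], bool],                # first preset condition (hypothetical)
    detection_box: Callable[[Mask], Box],            # detection area from a mask (hypothetical)
) -> List[Optional[Mask]]:
    """Return one mask per frame, or None where no target object was found."""
    results: List[Optional[Mask]] = []
    box: Optional[Box] = None
    for frame in frames:  # loop ends at the last frame (assumed second preset condition)
        mask: Optional[Mask] = None
        if box is not None:
            # Try the cheaper detection-area segmentation first.
            candidate = segment_in_box(frame, box)
            if is_valid(candidate):
                mask = candidate
        if mask is None:
            # Fall back to full-image segmentation on the same frame.
            candidate = segment_full(frame)
            if is_valid(candidate):
                mask = candidate
        box = detection_box(mask) if mask is not None else None
        results.append(mask)
    return results
```

The fallback branch re-runs full-image segmentation on the same frame whose detection area failed the check, so tracking of the target object can be re-acquired without skipping a frame.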

The image segmentation apparatus of the present disclosure is described above from the viewpoint of a virtual module or a virtual unit, and the electronic device of the present disclosure is described below from the viewpoint of a physical apparatus.

Referring now to fig. 3, a schematic diagram of an electronic device 300 (which may be a terminal device or a server in the above-described method embodiments) suitable for use in implementing embodiments of the present disclosure is shown.

Wherein the electronic device 300 comprises:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to execute the image segmentation method shown in the method embodiments.

The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 3 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.

The electronic device 300 includes: a memory and a processor, where the processor may be referred to as a processing device 301 described below, the memory may include at least one of a Read Only Memory (ROM) 302, a Random Access Memory (RAM) 303, and a storage device 308 described below, as follows:

as shown in fig. 3, the electronic device 300 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various suitable actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage device 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the electronic device 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.

In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via a communication device 309, or installed from a storage device 308, or installed from a ROM 302. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing means 301.

It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.

The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the following steps. A target area determining step: determining target area information in a target frame image, wherein the target area is an area containing a target object in the target frame image. A detection area determining step: determining a frame image to be processed, and determining a detection area of the frame image to be processed based on the target area information. A cycling step: if each pixel point in the detection area meets the first preset condition, determining a target object area based on each pixel point in the detection area, and cyclically executing the detection area determining step and the cycling step until the second preset condition is met.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules or units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The name of a module or a unit does not constitute a limitation of the unit itself in some cases, and for example, the acquisition unit may also be described as "a unit that acquires an image of a frame to be detected".

The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The embodiment of the disclosure provides an electronic device, which includes: a memory and a processor; and at least one program stored in the memory for execution by the processor. When executed by the processor, the program determines the detection area of a subsequent frame image from the area containing the target object in the target frame determined in the video to be processed, and determines the target object area from the pixel points in the detection area. Compared with performing image segmentation on the frame image directly, the target object occupies a larger proportion of the detection area obtained from the area containing the target object in the target frame, so the target object is more easily identified in the detection area, the complexity of segmenting the target object can be reduced, and the segmentation effect is improved.

The electronic device of the present disclosure is described above in terms of a physical device, and the computer-readable medium of the present disclosure is described below in terms of a readable medium.

The disclosed embodiments provide a computer readable medium having a computer program stored thereon which, when run on a computer, causes the computer to perform the corresponding method embodiments described above. Compared with the prior art, the detection area of a subsequent frame image is determined from the area containing the target object in the target frame determined in the video to be processed, and the target object area is determined from the pixel points in the detection area. Compared with segmenting the frame image directly, the target object occupies a larger proportion of the detection area obtained from the area containing the target object in the target frame, so the target object is more easily identified in the detection area, the complexity of segmenting the target object can be reduced, and the segmentation effect is improved.

According to one or more embodiments of the present disclosure, there is provided an image segmentation method including:

According to one or more embodiments of the present disclosure, determining target region information in a target frame image includes:

determining target area information in a target frame image based on the frame image to be detected:

acquiring a frame image to be detected;

carrying out image segmentation processing on the frame image to be detected through a segmentation network model based on a full graph to obtain a segmentation result;

based on the segmentation result, target region information in the target frame image is determined.

According to one or more embodiments of the present disclosure, determining target region information in a target frame image based on a segmentation result includes:

If the segmentation result meets the first preset condition, determining target area information in the frame image to be detected based on the segmentation result, and determining the target area information in the frame image to be detected as target area information in the target frame image.

if the segmentation result does not meet the first preset condition, cyclically taking the next frame image as the frame image to be detected and executing the step of determining the target area information in the target frame image based on the frame image to be detected, until the second preset condition is met or the segmentation result meets the first preset condition;

and determining target area information in the target frame image based on the segmentation result meeting the first preset condition.

According to one or more embodiments of the present disclosure, cyclically taking the next frame image as the frame image to be detected and executing the step of determining the target region information in the target frame image based on the frame image to be detected, until the second preset condition is satisfied or the segmentation result satisfies the first preset condition, includes:

cyclically replacing the frame image to be detected with the first preset special effect image, taking the next frame image as the frame image to be detected, and executing the step of determining the target area information in the target frame image based on the frame image to be detected, until the second preset condition is met or the segmentation result meets the first preset condition.

According to one or more embodiments of the present disclosure, the segmentation result is first probability information that each pixel point in the frame image to be detected belongs to the target object;

based on the segmentation result, determining target region information in the target frame image includes:

determining an area formed by pixel points with the first probability information larger than a first preset threshold value as a target object area;

based on the target object area, target area information is determined.

According to one or more embodiments of the present disclosure, after determining an area composed of pixels whose first probability information is greater than a first preset threshold as a target object area, the method further includes:

determining first probability information of each first pixel point belonging to a target object in a non-target object area based on the target frame image;

determining first pixel values corresponding to the first pixel points in a second preset special effect image;

and determining a special effect image corresponding to the target frame image based on the first probability information that each first pixel point belongs to the target object, the first pixel value corresponding to each first pixel point and the target object area in the target frame image.

According to one or more embodiments of the present disclosure, if each pixel point in the detection area does not meet the first preset condition, the method further includes:

determining a preset frame image as the frame image to be detected, and cyclically executing the step of determining target area information in the target frame image based on the frame image to be detected, the detection area determining step, and the cycling step until the second preset condition is met, wherein the preset frame image is a frame image whose detection area does not meet the first preset condition.

According to one or more embodiments of the present disclosure, before determining the target object region based on the respective pixel points within the detection region, the method further includes:

dividing the detection area through a division network model based on a detection frame to obtain second probability information of each pixel point in the detection area belonging to the target object;

wherein determining the target object area based on each pixel point in the detection area comprises:

and determining an area formed by pixel points with the second probability information larger than a second preset threshold value as a target object area.

According to one or more embodiments of the present disclosure, after determining a region composed of pixels whose second probability information is greater than a second preset threshold as a target object region, the method further includes:

determining second probability information of each second pixel point belonging to the target object in the non-target object area based on the frame image to be processed;

Determining second pixel values corresponding to the second pixel points in a third preset special effect image;

and determining a special effect image corresponding to the frame image to be processed based on the second probability information that each second pixel point belongs to the target object, the second pixel value corresponding to each second pixel point and the target object area in the frame image to be processed.

According to one or more embodiments of the present disclosure, the method further comprises:

determining third probability information of each pixel belonging to the target object based on first probability information of each pixel belonging to the target object in the frame image to be detected and first probability information or second probability information of first preset pixels corresponding to each pixel respectively, wherein each first preset pixel is a pixel in a previous frame image of the frame image to be detected;

wherein determining an area composed of pixel points with the first probability information larger than a first preset threshold value as a target object area includes:

and determining an area formed by pixel points with the third probability information larger than a first preset threshold value as a target object area.

determining fourth probability information corresponding to each pixel point in the detection area based on second probability information of the pixel points in the detection area belonging to the target object and first probability information or second probability information corresponding to each pixel point in the detection area corresponding to a second preset pixel point respectively, wherein each second preset pixel point is a pixel point in a previous frame image of the detection area;

The determining, as the target object area, an area composed of pixel points with second probability information greater than a second preset threshold value includes:

and determining an area formed by pixel points with fourth probability information larger than a second preset threshold value as a target object area.

According to one or more embodiments of the present disclosure, before performing image segmentation processing on the frame image to be detected through the segmentation network model based on the full graph to obtain the segmentation result, the method further includes:

acquiring a first training sample, the first training sample comprising: labeling information of whether each pixel point in each first image belongs to a target object or not;

and training the first initial network model based on the first training sample to obtain a segmentation network model based on the full graph.

According to one or more embodiments of the present disclosure, before performing segmentation processing on the detection area through the segmentation network model based on the detection frame, the method further includes:

obtaining a second training sample, the second training sample comprising: labeling information of whether each pixel point in each second image belongs to the target object or not;

and training the second initial network model based on the second training sample to obtain a segmentation network model based on the detection frame.

According to one or more embodiments of the present disclosure, acquiring a plurality of second images includes:

acquiring a plurality of third images, and determining target area information in each third image;

respectively carrying out expansion processing on target area information in each third image according to a preset expansion ratio to obtain a target cropping area in each third image;

and performing cropping processing on the target cropping area in each third image to obtain a plurality of second images.

According to one or more embodiments of the present disclosure, the first preset condition is that a connected domain value calculated based on each pixel point is greater than a preset connected domain threshold value.

According to one or more embodiments of the present disclosure, the second preset condition is a last frame image of the preset video.
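With both preset conditions stated, the overall per-frame loop might be organized as in the sketch below. It is an illustration under several assumptions: the two segmentation models, the condition check, and the box extraction are passed in as hypothetical callables; the target area information is refreshed from each successful result; and when the first preset condition fails, the next frame is the one handed back to full-image segmentation. None of these names appear in the disclosure.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Box = Tuple[int, int, int, int]
Mask = List[List[bool]]


def segment_video(
    frames: Iterable[object],
    segment_full: Callable[[object], Mask],         # stand-in for the full-image model
    segment_in_box: Callable[[object, Box], Mask],  # stand-in for the detection-frame model
    condition_met: Callable[[Mask], bool],          # stand-in for the first preset condition
    box_from_mask: Callable[[Mask], Box],           # derives target area information
) -> List[Optional[Mask]]:
    """Return one mask per frame, or None where no target object was accepted."""
    results: List[Optional[Mask]] = []
    target_box: Optional[Box] = None
    # Iterating to the end of `frames` plays the role of the second preset
    # condition (the last frame image of the preset video).
    for frame in frames:
        if target_box is None:
            mask = segment_full(frame)
        else:
            mask = segment_in_box(frame, target_box)
        if condition_met(mask):
            target_box = box_from_mask(mask)  # assumed: reuse result for the next frame
            results.append(mask)
        else:
            target_box = None  # assumed fallback: full-image segmentation on the next frame
            results.append(None)
    return results
```

Passing the models in as callables keeps the control flow separate from any particular network, so the loop can be exercised with simple stubs.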

According to one or more embodiments of the present disclosure, there is provided an image segmentation apparatus including:

and a first circulation module, configured to determine a target object area based on each pixel point in the detection area if each pixel point in the detection area meets a first preset condition, and to cyclically perform the operations corresponding to the detection area determination module and the first circulation module until a second preset condition is met.

According to one or more embodiments of the present disclosure, the target area determination module includes an acquisition unit, a segmentation unit, and a determination unit, wherein,

the acquisition unit is used for acquiring the frame image to be detected;

the segmentation unit is used for carrying out image segmentation processing on the frame image to be detected through a segmentation network model based on a full graph to obtain a segmentation result;

According to one or more embodiments of the present disclosure, the determining unit is specifically configured to determine, based on the segmentation result, target area information in the frame image to be detected and determine the target area information in the frame image to be detected as target area information in the target frame image if the segmentation result meets a first preset condition.

According to one or more embodiments of the present disclosure, the determining unit is specifically further configured to, when the segmentation result does not meet the first preset condition, cyclically take the next frame image as the frame image to be detected and execute the operations corresponding to the acquisition unit, the segmentation unit, and the determining unit until a second preset condition is met or the segmentation result meets the first preset condition;

the determining unit is specifically further configured to determine target area information in the target frame image based on the segmentation result that meets the first preset condition.

According to one or more embodiments of the present disclosure, the determining unit may be further configured to, when the segmentation result does not meet the first preset condition, cyclically replace the frame image to be detected with the first preset special effect image, take the next frame image as the frame image to be detected, and execute the operations corresponding to the acquisition unit, the segmentation unit, and the determining unit until the second preset condition is met or the segmentation result meets the first preset condition.

the determining unit is specifically configured to determine an area formed by pixel points with first probability information greater than a first preset threshold as a target object area;

the determining unit is specifically further configured to determine target area information based on the target object area.
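The two operations of the determining unit above can be illustrated with a short sketch. It assumes the first probability information is a row-major grid of floats and that the target area information is the tight bounding box of the target object area. The box representation and the function names are assumptions and are not specified in this passage.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed box layout: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def threshold_mask(prob: Sequence[Sequence[float]], threshold: float) -> List[List[bool]]:
    """Mark pixel points whose probability is strictly greater than the threshold."""
    return [[p > threshold for p in row] for row in prob]


def bounding_box(mask: Sequence[Sequence[bool]]) -> Optional[Box]:
    """Return the tight box around all foreground pixels, or None if there are none."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
```

Returning `None` for an empty mask lets a caller treat "no target object found" separately from a degenerate box.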

In accordance with one or more embodiments of the present disclosure, the image segmentation apparatus may further include a first processing module, wherein,

According to one or more embodiments of the present disclosure, if each pixel point in the detection area does not meet the first preset condition, the image segmentation apparatus further includes a second circulation module, wherein,

the second circulation module is configured to cyclically determine a preset frame image as the frame image to be detected and execute the operations corresponding to the acquisition unit, the segmentation unit, the determining unit, the detection area determination module, and the first circulation module until a second preset condition is met;

In accordance with one or more embodiments of the present disclosure, the image segmentation apparatus further includes a segmentation module, wherein,

the segmentation module is used for carrying out segmentation processing on the detection area through a segmentation network model based on a detection frame to obtain second probability information that each pixel point in the detection area belongs to a target object;

the first circulation module is specifically configured to, when determining the target object area based on each pixel point in the detection area, determine as the target object area an area composed of pixel points whose second probability information is greater than a second preset threshold.

In accordance with one or more embodiments of the present disclosure, the image segmentation apparatus may further include a second processing module, wherein,

In accordance with one or more embodiments of the present disclosure, the image segmentation apparatus further includes a first determination module, wherein,

the first determining module is used for determining third probability information of each pixel belonging to the target object based on first probability information of each pixel belonging to the target object in the frame image to be detected and first probability information or second probability information of first preset pixel corresponding to each pixel respectively, wherein each first preset pixel is a pixel in a previous frame image of the frame image to be detected;

the determining unit, when determining as the target object area an area composed of pixel points whose first probability information is greater than the first preset threshold, is specifically configured to determine as the target object area an area composed of pixel points whose third probability information is greater than the first preset threshold.

In accordance with one or more embodiments of the present disclosure, the image segmentation apparatus further includes a second determination module, wherein,

the second determining module is configured to determine fourth probability information corresponding to each pixel point in the detection area based on second probability information that each pixel point in the detection area belongs to the target object and first probability information or second probability information of the second preset pixel point corresponding to each pixel point in the detection area, wherein each second preset pixel point is a pixel point in a previous frame image of the detection area;

the first circulation module, when determining as the target object area an area composed of pixel points whose second probability information is greater than the second preset threshold, is specifically configured to determine as the target object area an area composed of pixel points whose fourth probability information is greater than the second preset threshold.
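The combination of current-frame and previous-frame probabilities is not given as a formula in this passage. The sketch below assumes a simple weighted average and assumes that each preset pixel point is the pixel at the same coordinates in the previous frame. The function name `fuse_probabilities` and the `weight` parameter are hypothetical.

```python
from typing import List, Sequence


def fuse_probabilities(
    current: Sequence[Sequence[float]],
    previous: Sequence[Sequence[float]],
    weight: float = 0.5,
) -> List[List[float]]:
    """Blend per-pixel probabilities of the current and previous frame.

    `weight` is the share given to the current frame (an assumed parameter);
    both grids must have the same shape.
    """
    return [
        [weight * c + (1.0 - weight) * p for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]


# The fused values are then thresholded exactly like the unfused ones.
fused = fuse_probabilities([[0.8, 0.2]], [[0.4, 0.6]])
target_mask = [[value > 0.5 for value in row] for row in fused]
```

Averaging across frames of this kind tends to damp frame-to-frame flicker in the mask, at the cost of a slight lag when the target object moves quickly.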

According to one or more embodiments of the present disclosure, the image segmentation apparatus further comprises a first acquisition module and a first training module, wherein,

the first acquisition module is configured to acquire a first training sample, the first training sample comprising: a plurality of first images and labeling information indicating whether each pixel point in each first image belongs to the target object;

In accordance with one or more embodiments of the present disclosure, the image segmentation apparatus further includes a second acquisition module and a second training module, wherein,

the second acquisition module is configured to acquire a second training sample, the second training sample comprising: a plurality of second images and labeling information indicating whether each pixel point in each second image belongs to the target object;

According to one or more embodiments of the present disclosure, the second acquiring module is specifically configured to acquire a plurality of third images and determine target area information in each third image when acquiring the plurality of second images.

According to one or more embodiments of the present disclosure, there is provided an electronic device, including: one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the image segmentation method shown in the method embodiments.

According to one or more embodiments of the present disclosure, there is provided a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the image segmentation method shown in the method embodiments.

The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to the specific combinations of features described above, but also covers other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure, for example, embodiments formed by mutually substituting the features described above with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims

1. An image segmentation method, comprising:

determining a detection area: determining a frame image to be processed, and determining a detection area of the frame image to be processed based on the target area information;

a cycling step: if each pixel point in the detection area meets a first preset condition, determining a target object area based on each pixel point in the detection area, and cyclically executing the step of determining the detection area and the cycling step until a second preset condition is met;

wherein after the determining the target object area based on each pixel point in the detection area in the cycling step, the method further includes:

acquiring a third preset special effect image, and determining second pixel values corresponding to second pixel points in the non-target object area based on the third preset special effect image;

determining second pixel updating values of the second pixel points respectively based on second probability information that the second pixel points belong to the target object and second pixel values corresponding to the second pixel points respectively;

determining a special effect area corresponding to the non-target object area based on the second pixel updating value of each second pixel point;

and determining the special effect image corresponding to the frame image to be processed based on the special effect area corresponding to the non-target object area in the frame image to be processed and the target object area in the frame image to be processed.

2. The method according to claim 1, wherein determining target area information in the target frame image includes:

determining target area information in the target frame image based on the frame image to be detected:

acquiring a frame image to be detected;

and determining target area information in the target frame image based on the segmentation result.

3. The method according to claim 2, wherein the determining target region information in the target frame image based on the segmentation result includes:

and if the segmentation result meets the first preset condition, determining target area information in the frame image to be detected based on the segmentation result, and determining the target area information in the frame image to be detected as the target area information in the target frame image.

4. The method according to claim 2, wherein the determining target region information in the target frame image based on the segmentation result includes:

if the segmentation result does not meet the first preset condition, cyclically taking the next frame image as the frame image to be detected and executing the step of determining the target area information in the target frame image based on the frame image to be detected until a second preset condition is met or the segmentation result meets the first preset condition;

5. The method according to claim 4, wherein the cyclically taking the next frame image as the frame image to be detected and executing the step of determining target area information in the target frame image based on the frame image to be detected until a second preset condition is met or the segmentation result meets the first preset condition includes:

cyclically replacing the frame image to be detected with a first preset special effect image, taking the next frame image as the frame image to be detected, and executing the step of determining the target area information in the target frame image based on the frame image to be detected until a second preset condition is met or the segmentation result meets the first preset condition.

6. The method according to any one of claims 2 to 5, wherein the segmentation result is first probability information that each pixel point in the frame image to be detected belongs to the target object;

and determining the target area information based on the target object area.

7. The method according to claim 6, wherein the determining the region composed of the pixels whose first probability information is greater than the first preset threshold as the target object region further includes:

8. The method according to any one of claims 2-5, wherein if each pixel point in the detection area does not meet the first preset condition, the method further comprises:

cyclically executing the step of determining a preset frame image as the frame image to be detected, the step of determining the target area information in the target frame image based on the frame image to be detected, the step of determining the detection area, and the cycling step until a second preset condition is met;

9. The method of claim 1, wherein the determining the target object region based on the respective pixel points within the detection region further comprises:

wherein the determining the target object area based on each pixel point in the detection area includes:

and determining the region formed by the pixel points with the second probability information larger than a second preset threshold value as a target object region.

10. The method of claim 6, wherein the method further comprises:

determining third probability information of each pixel belonging to a target object based on first probability information of each pixel belonging to the target object in a frame image to be detected and first probability information or second probability information of first preset pixels corresponding to each pixel respectively, wherein each first preset pixel is a pixel in a previous frame image of the frame image to be detected;

and determining an area formed by pixel points with the third probability information larger than the first preset threshold value as a target object area.

11. The method according to claim 9, wherein the method further comprises:

determining fourth probability information corresponding to each pixel point in the detection area based on second probability information that each pixel point in the detection area belongs to the target object and first probability information or second probability information of the second preset pixel point corresponding to each pixel point in the detection area, wherein each second preset pixel point is a pixel point in a previous frame image of the detection area;

wherein the determining, as the target object region, of the region composed of the pixel points with the second probability information larger than the second preset threshold value comprises:

and determining an area formed by the pixel points with the fourth probability information larger than the second preset threshold value as a target object area.

12. The method according to claim 2, wherein the image segmentation processing is performed on the frame image to be detected through a segmentation network model based on a full graph to obtain a segmentation result, and further comprising:

obtaining a first training sample, the first training sample comprising: labeling information of whether each pixel point in each first image belongs to a target object or not;

and training the first initial network model based on the first training sample to obtain the segmentation network model based on the full graph.

13. The method of claim 9, wherein the segmenting the detection region by a detection frame based segmented network model further comprises:

and training a second initial network model based on the second training sample to obtain the segmentation network model based on the detection frame.

14. The method of claim 13, wherein acquiring the plurality of second images comprises:

expanding the target area in each third image according to a preset expansion ratio to obtain a target cropping area in each third image;

and cropping the target cropping area out of each third image to obtain the plurality of second images.

15. The method according to any one of claims 1-5, 9, 11-14, wherein the first preset condition is that a connected domain value calculated based on each pixel point is greater than a preset connected domain threshold.

16. The method according to any one of claims 1-5, 9, 11-14, wherein the second preset condition is a last frame image of a preset video.

17. An image segmentation apparatus, comprising:

the first circulation module is used for determining a target object area based on each pixel point in the detection area when each pixel point in the detection area meets a first preset condition, and performing the operations corresponding to the detection area determining module and the first circulation module in a circulation mode until a second preset condition is met;

wherein the determining the target object area based on each pixel point in the detection area in the cycling step further includes:

18. An electronic device, comprising:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the image segmentation method according to any one of claims 1 to 16.

19. A computer readable medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the image segmentation method according to any one of claims 1-16.
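As an illustration of the special effect steps recited in claim 1, the sketch below computes updated values for pixel points in the non-target object area. The claim does not state the update formula, so the sketch assumes a linear blend in which the probability of belonging to the target object weights the original frame value and its complement weights the special effect image value. Single-channel integer pixels and the name `blend_background` are likewise assumptions.

```python
from typing import List, Sequence


def blend_background(
    frame: Sequence[Sequence[int]],
    effect: Sequence[Sequence[int]],
    prob: Sequence[Sequence[float]],
    threshold: float,
) -> List[List[int]]:
    """Keep target object pixels and blend the remaining pixels with an effect image."""
    output: List[List[int]] = []
    for frame_row, effect_row, prob_row in zip(frame, effect, prob):
        row: List[int] = []
        for f, e, p in zip(frame_row, effect_row, prob_row):
            if p > threshold:
                row.append(f)  # pixel point in the target object area: unchanged
            else:
                # Assumed update rule for non-target pixel points: linear blend.
                row.append(round(p * f + (1.0 - p) * e))
        output.append(row)
    return output
```

Weighting by the probability softens the transition at the object boundary, where probabilities typically fall off gradually rather than abruptly.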