CN106875398B - Method, device and terminal for realizing interactive image segmentation - Google Patents

Info

Publication number
CN106875398B
Authority
CN
China
Prior art keywords
pixel
parameter
segmentation
image
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710005328.6A
Other languages
Chinese (zh)
Other versions
CN106875398A (en)
Inventor
梁舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING SHUKE WANGWEI TECHNOLOGY Co.,Ltd.
Original Assignee
Beijing Shuke Wangwei Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shuke Wangwei Technology Co ltd
Priority to CN201710005328.6A
Publication of CN106875398A
Application granted
Publication of CN106875398B

Abstract

A method, a device and a terminal for realizing interactive image segmentation are disclosed. The method comprises the following steps: determining a first adjacent area of a smearing track or a delineating track on the original image as a marked area, determining a second adjacent area as a region of interest, and generating an input mask map for an image segmentation algorithm, in which pixels in the marked area are taken as foreground points and pixels in the region of interest but outside the marked area are taken as background points; determining a first segmentation parameter of each pixel according to the color map and the mask map, determining a second segmentation parameter of each pixel according to the depth map and the mask map, and fusing the two segmentation parameters; mapping the fused segmentation parameters of each pixel into an undirected graph, running a minimum cut-maximum flow (min-cut/max-flow) algorithm to obtain a finely segmented mask map, and segmenting the image corresponding to the foreground points in the finely segmented mask map from the color map. The method can shorten the running time of the algorithm and, by combining the depth information of the image, improve the image segmentation effect.

Description

Method, device and terminal for realizing interactive image segmentation
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a terminal for implementing interactive image segmentation.
Background
Image segmentation refers to dividing a planar image into a plurality of mutually disjoint regions according to characteristics such as color, texture and shape, and is a basic practical technology in the field of image processing. Conventional image segmentation techniques include threshold-based, edge-based, region-based, energy-functional-based and graph-theory-based segmentation methods. Among the graph-theory-based methods, the GraphCut algorithm and its improved version, the GrabCut algorithm, are well known.
The GraphCut algorithm and its improved version, the GrabCut algorithm, are interactive image segmentation methods based on region labeling. The GraphCut algorithm is based on a Markov Random Field (MRF) energy minimization framework, and has the advantage that it can combine various kinds of theoretical knowledge to obtain a globally optimal solution. The GrabCut algorithm improves on the GraphCut algorithm: a mask map is generated by marking foreground points (points on the target object to be extracted) and background points on an original image; a Gaussian Mixture Model (GMM) is established for the foreground and background color spaces by using the original image and the mask map; energy minimization is completed by an iterative algorithm that evolves during GMM parameter learning and estimation; the foreground points and background points in the image are thereby determined; and the target image consisting of the foreground pixels is extracted from the original image.
When the GrabCut algorithm is used for image segmentation on a mobile phone, strict requirements on how the user marks the image are generally not imposed, in order to reduce the complexity of interaction. As a result, when the user marks only a few foreground points, the number of iterations may be large and the running time of the algorithm long, which degrades the user experience. On the other hand, the GrabCut algorithm in the related art performs image segmentation based on a color image only, so when the color features of the target object to be extracted are not distinctive, the segmentation result is not ideal.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method, a device and a terminal for realizing interactive image segmentation, which can shorten the running time of an algorithm and improve the image segmentation effect by combining the depth information of an image.
The embodiment of the invention provides a method for realizing interactive image segmentation, which comprises the following steps:
after detecting a smearing track or a delineating track on an original image, determining a first adjacent area of the smearing track or the delineating track as a marked area, and determining a second adjacent area of the smearing track or the delineating track as a region of interest, wherein the region of interest comprises the marked area; generating an input mask map for an image segmentation algorithm: all pixels in the marked area are used as foreground points in the mask map, and pixels in the region of interest but outside the marked area are used as background points in the mask map;
the method comprises the steps of obtaining a color image containing color information of a target object and a depth image containing depth information of the target object, determining a first segmentation parameter of each pixel on a mask image according to the color image and the mask image, and determining a second segmentation parameter of each pixel on the mask image according to the depth image and the mask image, wherein the first segmentation parameter and the second segmentation parameter are used for representing the probability that a pixel is judged as a foreground point or a background point and the numerical difference between the pixel and an adjacent pixel; fusing the first segmentation parameters with the second segmentation parameters;
constructing an undirected graph, mapping the fused segmentation parameters of each pixel in the mask map into the undirected graph, and processing the undirected graph according to a minimum cut-maximum flow (min-cut/max-flow) algorithm to obtain a finely segmented mask map;
and segmenting an image corresponding to the foreground point in the mask image after fine segmentation from the color image.
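The mask-generation step above can be sketched as follows. This is a minimal illustration, not the patented implementation: the square neighbourhood, the radii `mark_radius` and `roi_radius`, and the label values `FOREGROUND`, `BACKGROUND` and `OUTSIDE` are all assumptions, since the text does not say how the first and second adjacent areas are sized.

```python
FOREGROUND, BACKGROUND, OUTSIDE = 1, 0, 255  # hypothetical label values


def neighbourhood(stroke_pixels, radius, height, width):
    """Return every pixel within `radius` (square window) of any stroke pixel."""
    area = set()
    for (y, x) in stroke_pixels:
        for ny in range(max(0, y - radius), min(height, y + radius + 1)):
            for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                area.add((ny, nx))
    return area


def build_input_mask(stroke_pixels, height, width, mark_radius=1, roi_radius=4):
    """Build the input mask: marked area -> foreground, rest of the ROI -> background."""
    marked = neighbourhood(stroke_pixels, mark_radius, height, width)  # first adjacent area
    roi = neighbourhood(stroke_pixels, roi_radius, height, width)      # second adjacent area
    mask = [[OUTSIDE] * width for _ in range(height)]
    for (y, x) in roi:
        mask[y][x] = BACKGROUND
    for (y, x) in marked:  # written last, so the marked area stays foreground
        mask[y][x] = FOREGROUND
    return mask


mask = build_input_mask([(5, 5)], height=11, width=11)
```

Because the larger radius always covers the smaller one, the region of interest contains the marked area, as the method requires; pixels labelled `OUTSIDE` are simply never passed to the later steps.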
Optionally, determining a first segmentation parameter of each pixel on the mask map according to the color map and the mask map comprises: determining a first region item segmentation parameter of each pixel on the mask map according to the color map and the mask map:
performing Gaussian Mixture Model (GMM) calculation according to the EM (expectation-maximization) method, wherein the EM method comprises a step E and a step M; executing step E and step M iteratively, and stopping the iteration after a convergence condition is reached; determining the classification of the pixel obtained in the last execution of step M as the classification of the pixel, and determining the maximum probability value P_max of the pixel belonging to a cluster, obtained in the last execution of step M, as the first region item segmentation parameter of the pixel, wherein the first region item segmentation parameter is the probability that the pixel is judged to be a foreground point or a background point based on the color map;
wherein step E and step M respectively comprise the following processing:
step E: clustering pixels of the same classification into one or more clusters according to the color values of the pixels on the mask map and the positional relation among the pixels, and determining a GMM model of each cluster; wherein the classification of a pixel is foreground point or background point, and the classification of a cluster is foreground point cluster or background point cluster;
step M: determining the probability of each pixel belonging to each cluster according to the GMM model of each cluster, and, for any pixel, determining the classification of the pixel according to the cluster corresponding to the maximum probability value P_max of the pixel.
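The alternation of step E and step M described above can be illustrated with a deliberately reduced sketch. The assumptions are significant: each pixel is represented by a single scalar value rather than a color plus position, each class has exactly one Gaussian cluster, and convergence means "the labels stopped changing". The function names and the `min_var` floor are hypothetical. The step names follow the text above, which differs from the usual textbook labelling of EM.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def region_terms(values, labels, max_iter=20, min_var=1e-3):
    """values: per-pixel scalars; labels: initial 1 (foreground) / 0 (background).

    Returns (labels, p_max), where p_max[i] is the largest cluster-membership
    probability of pixel i after the final step M.
    """
    labels = list(labels)
    p_max = [0.0] * len(values)
    for _ in range(max_iter):
        # Step E (as named in the text): fit one Gaussian per class cluster.
        params = {}
        for cls in (0, 1):
            members = [v for v, l in zip(values, labels) if l == cls]
            if not members:
                continue
            mean = sum(members) / len(members)
            var = max(sum((v - mean) ** 2 for v in members) / len(members), min_var)
            params[cls] = (mean, var, len(members) / len(values))
        # Step M (as named in the text): membership probabilities, keep the maximum.
        new_labels = []
        for i, v in enumerate(values):
            scores = {c: w * gaussian_pdf(v, m, s) for c, (m, s, w) in params.items()}
            total = sum(scores.values()) or 1.0
            best = max(scores, key=scores.get)
            new_labels.append(best)
            p_max[i] = scores[best] / total
        if new_labels == labels:  # assumed convergence condition
            break
        labels = new_labels
    return labels, p_max


# Two pixels start with the wrong label and are corrected by the iteration.
labels, p_max = region_terms([10, 12, 11, 200, 205, 198, 13, 202],
                             [1, 1, 1, 0, 0, 0, 0, 1])
```

The same routine applies unchanged to depth values, which is how the second region item segmentation parameter would be obtained under these assumptions.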
Optionally, determining a second segmentation parameter of each pixel on the mask map according to the depth map and the mask map, further includes: determining a second region item segmentation parameter of each pixel on the mask map according to the depth map and the mask map:
performing Gaussian Mixture Model (GMM) calculation according to the EM (expectation-maximization) method, wherein the EM method comprises a step E and a step M; executing step E and step M iteratively, and stopping the iteration after a convergence condition is reached; determining the classification of the pixel obtained in the last execution of step M as the classification of the pixel, and determining the maximum probability value P_max of the pixel belonging to a cluster, obtained in the last execution of step M, as the second region item segmentation parameter of the pixel, wherein the second region item segmentation parameter is the probability that the pixel is judged to be a foreground point or a background point based on the depth map;
wherein step E and step M respectively comprise the following processing:
step E: clustering pixels of the same classification into one or more clusters according to the depth values of the pixels on the mask map and the positional relation among the pixels, and determining a GMM model of each cluster; wherein the classification of a pixel is foreground point or background point, and the classification of a cluster is foreground point cluster or background point cluster;
step M: determining the probability of each pixel belonging to each cluster according to the GMM model of each cluster, and, for any pixel, determining the classification of the pixel according to the cluster corresponding to the maximum probability value P_max of the pixel.
Optionally, determining a first segmentation parameter for each pixel on the mask map according to the color map and the mask map, further comprising: determining a first boundary item segmentation parameter of each pixel on the mask image according to the color image and the mask image:
determining a first boundary item segmentation parameter of the pixel according to the color difference between the pixel and an adjacent pixel;
for any pixel, accumulating the absolute values of the numerical differences between the pixel and each adjacent pixel on the three RGB color channels, and then normalizing the accumulated sum to obtain a normalized accumulated sum as the first boundary item segmentation parameter of the pixel.
Optionally, determining a second segmentation parameter of each pixel on the mask map according to the depth map and the mask map, further includes: determining a second boundary term segmentation parameter of each pixel on the mask map according to the depth map and the mask map:
determining a second boundary item segmentation parameter of the pixel according to the depth value difference of the pixel and the adjacent pixel;
and for any pixel, accumulating the absolute values of the depth value difference values between the pixel and each adjacent pixel, and then normalizing the accumulated sum to obtain the normalized accumulated sum as a second boundary item segmentation parameter of the pixel.
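Both boundary items described above follow the same pattern (accumulate absolute differences to the adjacent pixels, then normalize), so one sketch can cover the color and depth cases. The 4-neighbourhood and the normalization by the image-wide maximum are assumptions; the text does not specify either. A pixel is a tuple of channel values: three for RGB, one for depth.

```python
def boundary_terms(image):
    """image: rows of pixels, each pixel a tuple of channel values.

    Returns a same-shaped grid of boundary item values scaled to [0, 1].
    """
    h, w = len(image), len(image[0])
    sums = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # 4-connected neighbours (an assumption; 8-connected would also fit the text)
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    sums[y][x] += sum(abs(a - b)
                                      for a, b in zip(image[y][x], image[ny][nx]))
    peak = max(max(row) for row in sums) or 1.0  # avoid dividing by zero on flat images
    return [[s / peak for s in row] for row in sums]


depth_row = [[(0,), (0,), (10,)]]       # single-channel depth values
print(boundary_terms(depth_row))        # [[0.0, 1.0, 1.0]]
```

Pixels on either side of the depth step receive the maximum value, while the pixel in the flat area receives zero.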
Optionally, fusing the first segmentation parameter with the second segmentation parameter includes: fusing the first region item segmentation parameter with the second region item segmentation parameter:
for any pixel, multiplying the first region item segmentation parameter by a weight (1-a) to obtain an adjusted first region item segmentation parameter, and multiplying the second region item segmentation parameter by the weight a to obtain an adjusted second region item segmentation parameter;
if the pixel classification indicated by the first region item segmentation parameter is the same as the pixel classification indicated by the second region item segmentation parameter, taking the sum of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as a fused region item segmentation parameter;
if the pixel classification indicated by the first region item segmentation parameter is different from the pixel classification indicated by the second region item segmentation parameter, taking the pixel classification indicated by the larger value of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as the final classification of the pixel, and taking the absolute value of the difference value of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as the fused region item segmentation parameter.
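The two cases of the region item fusion rule can be written as one small function. The class labels `"fg"` and `"bg"` and the function name are hypothetical, and the tie-break when the two adjusted values are equal (color wins) is an assumption, because the text does not cover that case.

```python
def fuse_region(p_color, cls_color, p_depth, cls_depth, a):
    """Fuse the color-based and depth-based region items of one pixel.

    Returns (final_class, fused_region_item).
    """
    adj_color = (1.0 - a) * p_color   # adjusted first region item
    adj_depth = a * p_depth           # adjusted second region item
    if cls_color == cls_depth:
        # Both sources agree: the evidence adds up.
        return cls_color, adj_color + adj_depth
    # Sources disagree: the larger adjusted value decides, and the difference remains.
    final_cls = cls_color if adj_color >= adj_depth else cls_depth
    return final_cls, abs(adj_color - adj_depth)


print(fuse_region(0.8, "fg", 0.6, "fg", a=0.25))  # agreement
print(fuse_region(0.9, "fg", 0.6, "bg", a=0.5))   # disagreement
```

When the sources disagree, the fused value shrinks toward zero, which expresses lower confidence in the final classification.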
Optionally, fusing the first segmentation parameter and the second segmentation parameter, further comprising: fusing the first boundary item segmentation parameter with the second boundary item segmentation parameter:
multiplying the first boundary item segmentation parameter by a weight (1-a) to obtain an adjusted first boundary item segmentation parameter, multiplying the second boundary item segmentation parameter by a weight a to obtain an adjusted second boundary item segmentation parameter, and then adding the adjusted first boundary item segmentation parameter and the adjusted second boundary item segmentation parameter to obtain a fused boundary item segmentation parameter of the pixel; a is greater than or equal to 0 and less than or equal to 1.
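Written as a formula, the boundary item fusion above is a convex combination. The symbols B_color, B_depth and B_fused are notation introduced here for the first, second and fused boundary item segmentation parameters; they do not appear in the original text.

```latex
B_{\text{fused}} = (1 - a)\, B_{\text{color}} + a\, B_{\text{depth}}, \qquad 0 \le a \le 1
```

With a = 0 the result depends on the color map only, and with a = 1 on the depth map only. For example, a = 0.25, B_color = 0.8 and B_depth = 0.4 give B_fused = 0.6 + 0.1 = 0.7.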
Optionally, the weight a is determined according to a self-evaluation parameter k1 and a consistency parameter k2: the product of the self-evaluation parameter k1 and the consistency parameter k2 is taken as the weight a;
the self-evaluation parameter k1 is determined in the following manner: determining how near or far the shooting distance corresponding to the pixel is according to the depth value of the pixel, and setting the self-evaluation parameter k1 accordingly, wherein the shorter the shooting distance, the larger the self-evaluation parameter k1; k1 is greater than or equal to 0 and less than or equal to 1;
wherein the consistency parameter k2 is determined in the following manner:
setting a consistency parameter k2 as a first constant if the first boundary term segmentation parameter is equal to the second boundary term segmentation parameter;
if the first boundary item segmentation parameter and the second boundary item segmentation parameter are not equal, setting the consistency parameter k2 as the first constant when both parameters are greater than a threshold or both are less than the threshold, and setting the consistency parameter k2 as a second constant otherwise; the first constant is greater than the second constant; the first constant is greater than 0 and less than or equal to 1, and the second constant is greater than 0 and less than 1.
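The weight a = k1 * k2 can be sketched as below. The linear ramp used for k1, the `max_depth` range, the threshold of 0.5 and the constants 1.0 and 0.5 are all illustrative assumptions; the text only requires that k1 grow as the shooting distance shrinks and that the first constant exceed the second.

```python
def self_evaluation_k1(depth, max_depth):
    """Closer pixels get a larger k1 in [0, 1] (linear ramp is an assumption)."""
    clipped = min(max(depth, 0.0), max_depth)
    return 1.0 - clipped / max_depth


def consistency_k2(b_color, b_depth, threshold=0.5, first=1.0, second=0.5):
    """first constant when the two boundary items agree, second constant otherwise."""
    if b_color == b_depth:
        return first
    same_side = ((b_color > threshold and b_depth > threshold) or
                 (b_color < threshold and b_depth < threshold))
    return first if same_side else second


def depth_weight(depth, max_depth, b_color, b_depth):
    """Weight a applied to the depth-based parameters; (1 - a) goes to color."""
    return self_evaluation_k1(depth, max_depth) * consistency_k2(b_color, b_depth)


print(depth_weight(1.0, 4.0, 0.8, 0.9))  # 0.75: near pixel, consistent boundary items
print(depth_weight(1.0, 4.0, 0.8, 0.2))  # 0.375: near pixel, inconsistent boundary items
```

Since both factors lie in [0, 1], the product stays in [0, 1], matching the stated range of a.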
Optionally, constructing an undirected graph and mapping the fused segmentation parameters of each pixel in the mask graph into the undirected graph, including:
constructing an undirected graph, and arranging two suspension points Q_0 and Q_1 outside the plane of the undirected graph, wherein the suspension point Q_0 is a virtual foreground point and the suspension point Q_1 is a virtual background point; establishing mapping points of all pixels of the mask map on the plane of the undirected graph, establishing a connecting line between the mapping point of each foreground point and the suspension point Q_0, and establishing a connecting line between the mapping point of each background point and the suspension point Q_1;
for any pixel P_i in the mask map, using the fused region item segmentation parameter of the pixel P_i as the weight of the mapping point P_i' in the undirected graph, and using the fused boundary item segmentation parameter of the pixel P_i as the weight of the connecting line between the mapping point P_i' and the suspension point Q_0 or Q_1.
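One way to hold the graph described above in memory is a pair of dictionaries, one for mapping-point weights and one for connecting-line weights. This sketch assumes the fused region item is stored as the weight of the mapping point itself and the fused boundary item as the weight of its line to Q0 or Q1; the flat pixel indexing and the node names are hypothetical.

```python
def build_graph(mask, region, boundary):
    """mask[i] is 1 (foreground) or 0 (background) for pixel i.

    region[i] / boundary[i] are the fused region and boundary items of pixel i.
    Returns (node_weight, edge_weight).
    """
    node_weight = {}
    edge_weight = {}
    for i, cls in enumerate(mask):
        node = f"P{i}'"                          # mapping point of pixel P_i
        terminal = "Q0" if cls == 1 else "Q1"    # virtual foreground / background point
        node_weight[node] = region[i]
        edge_weight[(node, terminal)] = boundary[i]
    return node_weight, edge_weight


nodes, edges = build_graph(mask=[1, 0], region=[0.9, 0.7], boundary=[0.2, 0.4])
print(nodes)  # {"P0'": 0.9, "P1'": 0.7}
print(edges)  # {("P0'", 'Q0'): 0.2, ("P1'", 'Q1'): 0.4}
```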
Optionally, processing the undirected graph according to a min-cut/max-flow algorithm to obtain a finely segmented mask map comprises:
iteratively executing the following steps C and D, stopping the iterative process after the iterative operation reaches a convergence condition, and taking each pixel in the foreground point set Q as a foreground point in a mask image after fine segmentation;
wherein, step C and step D include the following treatment respectively:
step C: classifying a part of the pixels in the undirected graph as foreground points of the same class as the suspension point Q_0, the pixels classified as foreground points forming a foreground point set Q;
step D: calculating the weight sum of the foreground point set Q, wherein the weight sum is the sum of the weights of all foreground points in the foreground point set Q plus the sum of the weights of the connecting lines between all foreground points in the foreground point set Q and the suspension point Q_0;
and the convergence condition is that the sum of the weights of the foreground point set Q is smaller than a threshold value and the change tends to be stable.
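For orientation, the sketch below shows a textbook min-cut/max-flow computation (Edmonds-Karp, i.e. breadth-first augmenting paths) on a tiny graph. It is not the iterative step C / step D procedure described above. It also uses the classic graph-cut layout, in which lines to the terminals and lines between neighbouring pixels both carry capacities, and that layout is an assumption that differs from the weight mapping in the text. Node names and capacities are made up.

```python
from collections import deque


def min_cut_source_side(capacity, source, sink):
    """capacity: {(u, v): c} directed capacities. Returns (max_flow, source_side_nodes)."""
    residual, adj = {}, {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)       # reverse edge for undoing flow
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def reachable():
        """Breadth-first search over edges with remaining capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if sink not in parent:
            # No augmenting path left: nodes still reachable form the source side.
            return flow, set(parent)
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:         # find the tightest edge on the path
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = sink
        while parent[v] is not None:         # push flow along the path
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck


# Q0 = virtual foreground (source), Q1 = virtual background (sink), A and B = pixels.
caps = {("Q0", "A"): 9, ("A", "Q1"): 1, ("Q0", "B"): 2, ("B", "Q1"): 8,
        ("A", "B"): 1, ("B", "A"): 1}
flow, foreground_side = min_cut_source_side(caps, "Q0", "Q1")
print(flow, sorted(foreground_side))  # 4 ['A', 'Q0']
```

Pixels that end up on the source side of the cut (here A) would be the foreground points of the refined mask; pixel B, which is tied more strongly to Q1, falls on the background side.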
Optionally, the image segmentation algorithm is a GrabCut algorithm.
The embodiment of the invention also provides a device for realizing interactive image segmentation, which comprises:
a preprocessing module, configured to, after a smearing track or a delineating track on an original image is detected, determine a first adjacent area of the smearing track or the delineating track as a marked area and determine a second adjacent area of the smearing track or the delineating track as a region of interest, wherein the region of interest comprises the marked area; and to generate an input mask map for an image segmentation algorithm: all pixels in the marked area are used as foreground points in the mask map, and pixels in the region of interest but outside the marked area are used as background points in the mask map;
the segmentation parameter calculation and fusion module is used for acquiring a color image containing color information of a target object and a depth image containing depth information of the target object, determining a first segmentation parameter of each pixel on the mask image according to the color image and the mask image, and determining a second segmentation parameter of each pixel on the mask image according to the depth image and the mask image, wherein the first segmentation parameter and the second segmentation parameter are used for representing the probability that the pixel is judged as a foreground point or a background point and the numerical difference between the pixel and an adjacent pixel; fusing the first segmentation parameters with the second segmentation parameters;
the mask map adjusting module is used for constructing an undirected graph, mapping the fused segmentation parameters of each pixel in the mask map into the undirected graph, and processing the undirected graph according to a minimum cut-maximum flow (min-cut/max-flow) algorithm to obtain a finely segmented mask map;
and the output module is used for segmenting an image corresponding to the foreground point in the mask image after the fine segmentation from the color image.
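The output step amounts to keeping the color pixels whose mask entry is a foreground point. A minimal sketch follows, assuming the mask uses 1 for foreground and that removed pixels are replaced by a `fill` color (both are assumptions; a real implementation might instead write an alpha channel).

```python
def extract_foreground(color, mask, fill=(0, 0, 0)):
    """Keep color pixels where mask == 1; replace everything else with `fill`."""
    return [[px if m == 1 else fill for px, m in zip(color_row, mask_row)]
            for color_row, mask_row in zip(color, mask)]


color = [[(200, 10, 10), (10, 200, 10)],
         [(10, 10, 200), (90, 90, 90)]]
mask = [[1, 0],
        [0, 1]]
print(extract_foreground(color, mask))
# [[(200, 10, 10), (0, 0, 0)], [(0, 0, 0), (90, 90, 90)]]
```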
Optionally, the segmentation parameter calculation and fusion module is configured to determine a first segmentation parameter of each pixel on the mask map according to the color map and the mask map in the following manner: determining a first region item segmentation parameter of each pixel on the mask map according to the color map and the mask map:
performing Gaussian Mixture Model (GMM) calculation according to the EM (expectation-maximization) method, wherein the EM method comprises a step E and a step M; executing step E and step M iteratively, and stopping the iteration after a convergence condition is reached; determining the classification of the pixel obtained in the last execution of step M as the classification of the pixel, and determining the maximum probability value P_max of the pixel belonging to a cluster, obtained in the last execution of step M, as the first region item segmentation parameter of the pixel, wherein the first region item segmentation parameter is the probability that the pixel is judged to be a foreground point or a background point based on the color map;
wherein step E and step M respectively comprise the following processing:
step E: clustering pixels of the same classification into one or more clusters according to the color values of the pixels on the mask map and the positional relation among the pixels, and determining a GMM model of each cluster; wherein the classification of a pixel is foreground point or background point, and the classification of a cluster is foreground point cluster or background point cluster;
step M: determining the probability of each pixel belonging to each cluster according to the GMM model of each cluster, and, for any pixel, determining the classification of the pixel according to the cluster corresponding to the maximum probability value P_max of the pixel.
Optionally, the segmentation parameter calculation and fusion module is further configured to determine a second segmentation parameter of each pixel on the mask map according to the depth map and the mask map in the following manner: determining a second region item segmentation parameter of each pixel on the mask map according to the depth map and the mask map:
performing Gaussian Mixture Model (GMM) calculation according to the EM (expectation-maximization) method, wherein the EM method comprises a step E and a step M; executing step E and step M iteratively, and stopping the iteration after a convergence condition is reached; determining the classification of the pixel obtained in the last execution of step M as the classification of the pixel, and determining the maximum probability value P_max of the pixel belonging to a cluster, obtained in the last execution of step M, as the second region item segmentation parameter of the pixel, wherein the second region item segmentation parameter is the probability that the pixel is judged to be a foreground point or a background point based on the depth map;
wherein step E and step M respectively comprise the following processing:
step E: clustering pixels of the same classification into one or more clusters according to the depth values of the pixels on the mask map and the positional relation among the pixels, and determining a GMM model of each cluster; wherein the classification of a pixel is foreground point or background point, and the classification of a cluster is foreground point cluster or background point cluster;
step M: determining the probability of each pixel belonging to each cluster according to the GMM model of each cluster, and, for any pixel, determining the classification of the pixel according to the cluster corresponding to the maximum probability value P_max of the pixel.
Optionally, the segmentation parameter calculation and fusion module is further configured to determine a first segmentation parameter of each pixel on the mask map according to the color map and the mask map in the following manner: determining a first boundary item segmentation parameter of each pixel on the mask image according to the color image and the mask image:
determining a first boundary item segmentation parameter of the pixel according to the color difference between the pixel and an adjacent pixel;
for any pixel, accumulating the absolute values of the numerical differences between the pixel and each adjacent pixel on the three RGB color channels, and then normalizing the accumulated sum to obtain a normalized accumulated sum as the first boundary item segmentation parameter of the pixel.
Optionally, the segmentation parameter calculation and fusion module is further configured to determine a second segmentation parameter of each pixel on the mask map according to the depth map and the mask map in the following manner: determining a second boundary term segmentation parameter of each pixel on the mask map according to the depth map and the mask map:
determining a second boundary item segmentation parameter of the pixel according to the depth value difference of the pixel and the adjacent pixel;
and for any pixel, accumulating the absolute values of the depth value difference values between the pixel and each adjacent pixel, and then normalizing the accumulated sum to obtain the normalized accumulated sum as a second boundary item segmentation parameter of the pixel.
Optionally, the segmentation parameter calculation and fusion module is configured to fuse the first segmentation parameter and the second segmentation parameter in the following manner: fusing the first region item segmentation parameter with the second region item segmentation parameter:
for any pixel, multiplying the first region item segmentation parameter by a weight (1-a) to obtain an adjusted first region item segmentation parameter, and multiplying the second region item segmentation parameter by the weight a to obtain an adjusted second region item segmentation parameter;
if the pixel classification indicated by the first region item segmentation parameter is the same as the pixel classification indicated by the second region item segmentation parameter, taking the sum of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as a fused region item segmentation parameter;
if the pixel classification indicated by the first region item segmentation parameter is different from the pixel classification indicated by the second region item segmentation parameter, taking the pixel classification indicated by the larger value of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as the final classification of the pixel, and taking the absolute value of the difference value of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as the fused region item segmentation parameter.
Optionally, the segmentation parameter calculation and fusion module is further configured to fuse the first segmentation parameter and the second segmentation parameter in the following manner: fusing the first boundary item segmentation parameter with the second boundary item segmentation parameter:
multiplying the first boundary item segmentation parameter by a weight (1-a) to obtain an adjusted first boundary item segmentation parameter, multiplying the second boundary item segmentation parameter by a weight a to obtain an adjusted second boundary item segmentation parameter, and then adding the adjusted first boundary item segmentation parameter and the adjusted second boundary item segmentation parameter to obtain a fused boundary item segmentation parameter of the pixel; a is greater than or equal to 0 and less than or equal to 1.
Optionally, the weight a is determined according to a self-evaluation parameter k1 and a consistency parameter k2: the product of the self-evaluation parameter k1 and the consistency parameter k2 is taken as the weight a;
the self-evaluation parameter k1 is determined in the following manner: determining how near or far the shooting distance corresponding to the pixel is according to the depth value of the pixel, and setting the self-evaluation parameter k1 accordingly, wherein the shorter the shooting distance, the larger the self-evaluation parameter k1; k1 is greater than or equal to 0 and less than or equal to 1;
wherein the consistency parameter k2 is determined in the following manner:
setting a consistency parameter k2 as a first constant if the first boundary term segmentation parameter is equal to the second boundary term segmentation parameter;
if the first boundary item segmentation parameter and the second boundary item segmentation parameter are not equal, setting the consistency parameter k2 as the first constant when both parameters are greater than a threshold or both are less than the threshold, and setting the consistency parameter k2 as a second constant otherwise; the first constant is greater than the second constant; the first constant is greater than 0 and less than or equal to 1, and the second constant is greater than 0 and less than 1.
Optionally, the mask map adjusting module is configured to construct an undirected graph and map the fused segmentation parameters of each pixel in the mask map into the undirected graph in the following manner:
constructing an undirected graph, and arranging two suspension points Q_0 and Q_1 outside the plane of the undirected graph, wherein the suspension point Q_0 is a virtual foreground point and the suspension point Q_1 is a virtual background point; establishing mapping points of all pixels of the mask map on the plane of the undirected graph, establishing a connecting line between the mapping point of each foreground point and the suspension point Q_0, and establishing a connecting line between the mapping point of each background point and the suspension point Q_1;
for any pixel P_i in the mask map, using the fused region item segmentation parameter of the pixel P_i as the weight of the mapping point P_i' in the undirected graph, and using the fused boundary item segmentation parameter of the pixel P_i as the weight of the connecting line between the mapping point P_i' and the suspension point Q_0 or Q_1.
Optionally, the mask map adjusting module is configured to process the undirected graph according to a min-cut/max-flow algorithm in the following manner to obtain a finely segmented mask map:
iteratively executing the following steps C and D, stopping the iterative process after the iterative operation reaches a convergence condition, and taking each pixel in the foreground point set Q as a foreground point in a mask image after fine segmentation;
wherein step C and step D respectively include the following processing:
step C: dividing part of the pixels in the undirected graph into foreground points of the same kind as the suspension point Q0, the pixels divided into foreground points forming a foreground point set Q;
step D: calculating the weight sum of the foreground point set Q, wherein the weight sum is the sum of the weights of all foreground points in the foreground point set Q plus the sum of the weights of the lines between all foreground points in the foreground point set Q and the suspension point Q0;
and the convergence condition is that the sum of the weights of the foreground point set Q is smaller than a threshold value and the change tends to be stable.
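Step D and the convergence condition may be illustrated by the following sketch. How step C selects the pixels is not specified in this passage and is therefore not shown. The `tolerance` parameter, used here to decide that the change "tends to be stable", and both function names are assumptions.

```python
def weight_sum(foreground_set, node_weight, q0_edge_weight):
    """Step D: node weights of Q plus weights of the lines from Q to Q0."""
    return sum(node_weight[p] + q0_edge_weight[p] for p in foreground_set)

def has_converged(history, threshold, tolerance=1e-3):
    """Check the convergence condition on the list of weight sums so far.

    Converged when the latest weight sum is below the threshold and differs
    from the previous one by at most `tolerance` (a hypothetical stability
    criterion).
    """
    if len(history) < 2:
        return False
    return history[-1] < threshold and abs(history[-1] - history[-2]) <= tolerance
```

In an iteration loop, the weight sum computed after each execution of step C would be appended to `history`, and the loop would stop once `has_converged` returns true.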
Optionally, the image segmentation algorithm is a GrabCut algorithm.
The embodiment of the invention also provides a terminal which comprises the device for realizing the interactive image segmentation.
According to the method, the device and the terminal for realizing interactive image segmentation, a first adjacent area of a smearing track or a delineating track on an original image is determined as a mark area, a second adjacent area of the smearing track or the delineating track is determined as an interested area, and the interested area comprises the mark area; generating an input mask map for an image segmentation algorithm: all pixels in the marked area are used as foreground points in the mask image, and pixels outside the marked area in the interested area are used as background points in the mask image; the method comprises the steps of obtaining a color image containing color information of a target object and a depth image containing depth information of the target object, determining a first segmentation parameter of each pixel on a mask image according to the color image and the mask image, and determining a second segmentation parameter of each pixel on the mask image according to the depth image and the mask image, wherein the first segmentation parameter and the second segmentation parameter are used for representing the probability that a pixel is judged as a foreground point or a background point and the numerical difference between the pixel and an adjacent pixel; fusing the first segmentation parameters with the second segmentation parameters; constructing an undirected graph, mapping the fused segmentation parameters of each pixel in the mask graph into the undirected graph, and processing the undirected graph according to a minimum segmentation-maximum flow algorithm to obtain a finely segmented mask graph; and segmenting an image corresponding to the foreground point in the mask image after fine segmentation from the color image. 
The technical scheme can, through image preprocessing, increase the number of foreground points and reduce the number of background points marked for the image segmentation algorithm, thereby shortening the running time of the image segmentation algorithm. In addition, segmentation parameters of each pixel are calculated separately based on the depth map and the color map and are then fused, and image segmentation is performed using the fused segmentation parameters, so that the depth information of the image is combined to improve the image segmentation effect.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of an optional mobile terminal for implementing various embodiments of the present invention;
FIG. 2 is a diagram of a wireless communication system for the mobile terminal shown in FIG. 1;
FIG. 3 is a flowchart of a method for implementing interactive image segmentation according to embodiment 1 of the present invention;
FIG. 4 is a schematic diagram of an apparatus for implementing interactive image segmentation according to embodiment 2 of the present invention;
FIG. 5-a is a schematic diagram of an original image and a smear track of a user in application example 1 of the present invention;
FIG. 5-b is a schematic diagram of a mark region (circumscribed rectangle expansion) and a region of interest generated by a smear track in application example 1 of the present invention;
FIG. 5-c is a schematic view of a mask map generated from a mark region and a region of interest in application example 1 of the present invention;
FIG. 5-d is a schematic view of a depth map in application example 1 of the present invention;
FIG. 5-e is a schematic diagram of an undirected graph in application example 1 of the present invention;
FIG. 5-f is a schematic view of a mask map after fine segmentation in application example 1 of the present invention;
FIG. 5-g is a schematic diagram of the target object segmented in application example 1 of the present invention;
FIG. 6-a is a schematic diagram of a marking region (seed growth) and a region of interest generated by a smear track in application example 2 of the present invention;
FIG. 6-b-1 is a schematic diagram of the initial active point and its peripheral neighborhood in the seed growth process in application example 2 of the present invention;
FIG. 6-b-2 is a schematic diagram of a sub-region grown from an initial activity point in the seed growth process in application example 2 of the present invention;
FIG. 6-b-3 is a schematic diagram of the area grown jointly from the initial active point and the new active points in the seed growth process in application example 2 of the present invention;
FIG. 6-c is a schematic view of a mask map generated from a mark region and a region of interest in application example 2 of the present invention;
FIG. 7-a is a schematic diagram of an original image and a user-traced trajectory in application example 3 of the present invention;
FIG. 7-b is a schematic diagram of a mark region (closed by a patch) and a region of interest generated by a sketched track in an application example 3 of the present invention;
FIG. 7-c is a schematic diagram of a mask map generated from the mark region and the region of interest in application example 3 of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The technical solution of the present invention will be described in more detail with reference to the accompanying drawings and examples.
A mobile terminal implementing various embodiments of the present application will now be described with reference to the accompanying drawings. In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.
The mobile terminal may be implemented in various forms. For example, the terminal described in the present invention may include a mobile terminal such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a navigation device, and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. In the following, it is assumed that the terminal is a mobile terminal. However, it will be understood by those skilled in the art that the configuration according to the embodiment of the present invention can also be applied to a fixed type terminal, except for elements used specifically for mobile purposes.
Fig. 1 is a schematic diagram of a hardware structure of an optional mobile terminal for implementing various embodiments of the present application.
The mobile terminal 100 may include a wireless communication unit 110, an A/V (audio/video) input unit 120, a user input unit 130, a sensing unit 140, an output unit 150, a memory 160, an interface unit 170, a controller 180, and a power supply unit 190, etc.
Fig. 1 illustrates the mobile terminal 100 having various components, but it is to be understood that not all illustrated components are required to be implemented. More or fewer components may alternatively be implemented. The elements of the mobile terminal 100 will be described in detail below.
The wireless communication unit 110 may generally include one or more components that allow radio communication between the mobile terminal 100 and a wireless communication system or network. For example, the wireless communication unit 110 may include at least one of a broadcast receiving module 111, a mobile communication module 112, a wireless internet module 113, a short-range communication module 114, and a location information module 115.
The broadcast receiving module 111 receives a broadcast signal and/or broadcast associated information from an external broadcast management server via a broadcast channel. The broadcast channel may include a satellite channel and/or a terrestrial channel. The broadcast management server may be a server that generates and transmits a broadcast signal and/or broadcast associated information or a server that receives a previously generated broadcast signal and/or broadcast associated information and transmits it to a terminal. The broadcast signal may include a TV broadcast signal, a radio broadcast signal, a data broadcast signal, and the like. Also, the broadcast signal may further include a broadcast signal combined with a TV or radio broadcast signal. The broadcast associated information may also be provided via a mobile communication network, and in this case, the broadcast associated information may be received by the mobile communication module 112. The broadcast associated information may exist in various forms, for example, in the form of an Electronic Program Guide (EPG) of Digital Multimedia Broadcasting (DMB), an Electronic Service Guide (ESG) of digital video broadcasting-handheld (DVB-H), and the like. The broadcast receiving module 111 may receive signals broadcast by various types of broadcasting systems. In particular, the broadcast receiving module 111 may receive digital broadcasts by using a digital broadcasting system such as digital multimedia broadcasting-terrestrial (DMB-T), digital multimedia broadcasting-satellite (DMB-S), digital video broadcasting-handheld (DVB-H), media forward link only (MediaFLO®), integrated services digital broadcasting-terrestrial (ISDB-T), and the like. The broadcast receiving module 111 may be constructed to be suitable for various broadcasting systems that provide broadcast signals as well as the above-mentioned digital broadcasting systems.
The broadcast signal and/or broadcast associated information received via the broadcast receiving module 111 may be stored in the memory 160 (or other type of storage medium).
The mobile communication module 112 transmits and/or receives radio signals to and/or from at least one of a base station (e.g., access point, node B, etc.), an external terminal, and a server. Such radio signals may include voice call signals, video call signals, or various types of data transmitted and/or received according to text and/or multimedia messages.
The wireless internet module 113 supports wireless internet access of the mobile terminal. The module may be internally or externally coupled to the terminal. The wireless internet access technology to which the module relates may include WLAN (wireless LAN) (Wi-Fi), WiBro (wireless broadband), WiMAX (worldwide interoperability for microwave access), HSDPA (high speed downlink packet access), and the like.
The short-range communication module 114 is a module for supporting short-range communication. Some examples of short-range communication technologies include Bluetooth™, Radio Frequency Identification (RFID), infrared data association (IrDA), Ultra Wideband (UWB), ZigBee™, and the like.
The location information module 115 is a module for checking or acquiring location information of the mobile terminal. A typical example of the location information module 115 is a GPS (global positioning system). According to the current technology, the GPS calculates distance information and accurate time information from three or more satellites and applies triangulation to the calculated information, thereby accurately calculating three-dimensional current location information according to longitude, latitude, and altitude. Currently, a method for calculating position and time information uses three satellites and corrects an error of the calculated position and time information by using another satellite. In addition, the GPS can calculate speed information by continuously calculating current position information in real time.
The A/V input unit 120 is used to receive an audio or video signal. The A/V input unit 120 may include a camera 121 and a microphone 122. The camera 121 processes image data of still pictures or video obtained by an image capturing apparatus in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 151. The image frames processed by the camera 121 may be stored in the memory 160 (or other storage medium) or transmitted via the wireless communication unit 110, and two or more cameras 121 may be provided according to the construction of the mobile terminal 100. The microphone 122 may receive sounds in a phone call mode, a recording mode, a voice recognition mode, or the like, and is capable of processing such sounds into audio data. In the phone call mode, the processed audio (voice) data may be converted into a format transmittable to a mobile communication base station via the mobile communication module 112 and output. The microphone 122 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the course of receiving and transmitting audio signals.
The user input unit 130 may generate key input data to control various operations of the mobile terminal 100 according to a command input by a user. The user input unit 130 allows a user to input various types of information, and may include a keyboard, dome sheet, touch pad (e.g., a touch-sensitive member that detects changes in resistance, pressure, capacitance, and the like due to being touched), scroll wheel, joystick, and the like. In particular, when the touch pad is superimposed on the display unit 151 in the form of a layer, a touch screen may be formed.
The sensing unit 140 detects a current state of the mobile terminal 100 (e.g., an open or closed state of the mobile terminal 100), a position of the mobile terminal 100, presence or absence of contact (i.e., touch input) by a user with the mobile terminal 100, an orientation of the mobile terminal 100, acceleration or deceleration movement and direction of the mobile terminal 100, and the like, and generates a command or signal for controlling an operation of the mobile terminal 100. For example, when the mobile terminal 100 is implemented as a slide-type mobile phone, the sensing unit 140 may sense whether the slide-type phone is opened or closed. In addition, the sensing unit 140 can detect whether the power supply unit 190 supplies power or whether the interface unit 170 is coupled with an external device. The sensing unit 140 may include a proximity sensor 141.
The interface unit 170 serves as an interface through which at least one external device is connected to the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The identification module may store various information for authenticating a user using the mobile terminal 100 and may include a User Identity Module (UIM), a Subscriber Identity Module (SIM), a Universal Subscriber Identity Module (USIM), and the like. In addition, a device having an identification module (hereinafter, referred to as an "identification device") may take the form of a smart card, and thus, the identification device may be connected with the mobile terminal 100 via a port or other connection means. The interface unit 170 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the mobile terminal 100 or may be used to transmit data between the mobile terminal 100 and the external device.
In addition, when the mobile terminal 100 is connected with an external cradle, the interface unit 170 may serve as a path through which power is supplied from the cradle to the mobile terminal 100 or may serve as a path through which various command signals input from the cradle are transmitted to the mobile terminal 100. Various command signals or power input from the cradle may be used as a signal for identifying whether the mobile terminal 100 is accurately mounted on the cradle.
The output unit 150 is configured to provide output signals (e.g., audio signals, video signals, alarm signals, vibration signals, etc.) in a visual, audio, and/or tactile manner. The output unit 150 may include a display unit 151, an audio output module 152, an alarm unit 153, and the like.
The display unit 151 may display information processed in the mobile terminal 100. For example, when the mobile terminal 100 is in a phone call mode, the display unit 151 may display a User Interface (UI) or a Graphical User Interface (GUI) related to a call or other communication (e.g., text messaging, multimedia file downloading, etc.). When the mobile terminal 100 is in a video call mode or an image capturing mode, the display unit 151 may display a captured image and/or a received image, a UI or GUI showing a video or an image and related functions, and the like.
Meanwhile, when the display unit 151 and the touch pad are overlapped with each other in the form of a layer to form a touch screen, the display unit 151 may serve as an input device and an output device. The display unit 151 may include at least one of a Liquid Crystal Display (LCD), a thin film transistor LCD (TFT-LCD), an Organic Light Emitting Diode (OLED) display, a flexible display, a three-dimensional (3D) display, and the like. Some of these displays may be configured to be transparent to allow a user to view from the outside, which may be referred to as transparent displays, and a typical transparent display may be, for example, a TOLED (transparent organic light emitting diode) display or the like. Depending on the particular desired implementation, mobile terminal 100 may include two or more display units (or other display devices), for example, mobile terminal 100 may include an external display unit (not shown) and an internal display unit (not shown). The touch screen may be used to detect a touch input pressure as well as a touch input position and a touch input area.
The audio output module 152 may convert audio data received by the wireless communication unit 110 or stored in the memory 160 into an audio signal and output as sound when the mobile terminal 100 is in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like. Also, the audio output module 152 may provide audio output related to a specific function performed by the mobile terminal 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output module 152 may include a speaker, a buzzer, and the like.
The alarm unit 153 may provide an output to notify the mobile terminal 100 of the occurrence of an event. Typical events may include call reception, message reception, key signal input, touch input, and the like. In addition to audio or video output, the alarm unit 153 may provide output in different ways to notify the occurrence of an event. For example, the alarm unit 153 may provide an output in the form of vibration, and when a call, a message, or some other incoming communication is received, the alarm unit 153 may provide a tactile output (i.e., vibration) to inform the user thereof. By providing such a tactile output, the user can recognize the occurrence of various events even when the user's mobile phone is in the user's pocket. The alarm unit 153 may also provide an output notifying the occurrence of an event via the display unit 151 or the audio output module 152.
The memory 160 may store software programs and the like for processing and controlling operations performed by the controller 180, or may temporarily store data (e.g., a phonebook, messages, still images, videos, and the like) that has been or will be output. Also, the memory 160 may store data regarding various ways of vibration and audio signals output when a touch is applied to the touch screen.
The memory 160 may include at least one type of storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. Also, the mobile terminal 100 may cooperate with a network storage device that performs a storage function of the memory 160 through a network connection.
The controller 180 generally controls the overall operation of the mobile terminal. For example, the controller 180 performs control and processing related to voice calls, data communications, video calls, and the like. In addition, the controller 180 may include a multimedia module 181 for reproducing (or playing back) multimedia data, and the multimedia module 181 may be constructed within the controller 180 or may be constructed separately from the controller 180. The controller 180 may perform a pattern recognition process to recognize a handwriting input or a picture drawing input performed on the touch screen as a character or an image.
The power supply unit 190 receives external power or internal power and provides appropriate power required to operate various elements and components under the control of the controller 180.
The various embodiments described herein may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, an electronic unit designed to perform the functions described herein, and in some cases, such embodiments may be implemented in the controller 180. For a software implementation, the implementation such as a process or a function may be implemented with a separate software module that allows performing at least one function or operation. The software codes may be implemented by software applications (or programs) written in any suitable programming language, which may be stored in the memory 160 and executed by the controller 180.
Up to this point, the mobile terminal 100 has been described in terms of its functionality. In addition, the mobile terminal 100 in the embodiment of the present invention may be a mobile terminal such as a folder type, a bar type, a swing type, a slide type, and other various types, and is not limited herein.
The mobile terminal 100 as shown in fig. 1 may be configured to operate with communication systems such as wired and wireless communication systems and satellite-based communication systems that transmit data via frames or packets.
A communication system in which a mobile terminal according to the present invention is operable will now be described with reference to fig. 2.
Such communication systems may use different air interfaces and/or physical layers. For example, the air interface used by the communication system includes, for example, Frequency Division Multiple Access (FDMA), Time Division Multiple Access (TDMA), Code Division Multiple Access (CDMA), and Universal Mobile Telecommunications System (UMTS) (in particular, Long Term Evolution (LTE)), global system for mobile communications (GSM), and the like. By way of non-limiting example, the following description relates to a CDMA communication system, but such teachings are equally applicable to other types of systems.
Referring to fig. 2, the CDMA wireless communication system may include a plurality of mobile terminals 100, a plurality of Base Stations (BSs) 270, Base Station Controllers (BSCs) 275, and a Mobile Switching Center (MSC) 280. The MSC 280 is configured to interface with a Public Switched Telephone Network (PSTN) 290. The MSC 280 is also configured to interface with the BSCs 275, which may be coupled to the base stations 270 via backhaul lines. The backhaul lines may be constructed according to any of several known interfaces, which may include, for example, E1/T1, Asynchronous Transfer Mode (ATM), Internet Protocol (IP), point-to-point protocol (PPP), frame relay, high-rate digital subscriber line (HDSL), Asymmetric Digital Subscriber Line (ADSL), or other types of digital subscriber lines (xDSL). It will be understood that a system as shown in fig. 2 may include multiple BSCs 275.
Each BS 270 may serve one or more sectors (or regions), each sector being covered by an omnidirectional antenna or by an antenna pointing in a particular direction radially away from the BS 270. Alternatively, each sector may be covered by two or more antennas for diversity reception. Each BS 270 may be configured to support multiple frequency allocations, with each frequency allocation having a particular frequency spectrum (e.g., 1.25MHz, 5MHz, etc.).
The intersection of a sector and a frequency allocation may be referred to as a CDMA channel. The BS 270 may also be referred to as a Base Transceiver Subsystem (BTS) or other equivalent terminology. In such a case, the term "base station" may be used to generically refer to a single BSC 275 and at least one BS 270. The base stations may also be referred to as "cell sites". Alternatively, each sector of a particular BS 270 may be referred to as a cell site.
As shown in fig. 2, a Broadcast Transmitter (BT) 295 transmits a broadcast signal to the mobile terminals 100 operating within the system. A broadcast receiving module 111 as shown in fig. 1 is provided at the mobile terminal 100 to receive the broadcast signal transmitted by the BT 295. In fig. 2, several Global Positioning System (GPS) satellites 300 are shown. The satellites 300 assist in locating at least one of the plurality of mobile terminals 100.
In fig. 2, a plurality of satellites 300 are depicted, but it is understood that useful positioning information may be obtained with any number of satellites. The location information module 115 (e.g., GPS) as shown in fig. 1 is generally configured to cooperate with the satellites 300 to obtain desired positioning information. Other techniques that can track the location of the mobile terminal may be used instead of or in addition to GPS tracking techniques. In addition, at least one GPS satellite 300 may selectively or additionally process satellite DMB transmission.
As a typical operation of the wireless communication system, the BS 270 receives reverse link signals from various mobile terminals 100. The mobile terminal 100 is generally engaged in conversations, messaging, and other types of communications. Each reverse link signal received by a particular base station is processed within that particular BS 270. The obtained data is forwarded to the associated BSC 275. The BSC provides call resource allocation and mobility management functions including coordination of soft handoff procedures between BSs 270. The BSCs 275 also route the received data to the MSC 280, which provides additional routing services for interfacing with the PSTN 290. Similarly, the PSTN 290 interfaces with the MSC 280, the MSC interfaces with the BSCs 275, and the BSCs 275 accordingly control the BS 270 to transmit forward link signals to the mobile terminal 100.
Based on the above mobile terminal hardware structure and communication system, various embodiments of the method of the present application are proposed.
As shown in fig. 3, an embodiment of the present invention provides a method for implementing interactive image segmentation, including:
S310, after detecting a smearing track or a delineating track on an original image, determining a first adjacent area of the smearing track or the delineating track as a mark region, and determining a second adjacent area of the smearing track or the delineating track as a region of interest, wherein the region of interest comprises the mark region; generating an input mask map for an image segmentation algorithm: all pixels in the mark region are used as foreground points in the mask map, and pixels in the region of interest outside the mark region are used as background points in the mask map;
S320, acquiring a color image containing color information of a target object and a depth image containing depth information of the target object, determining a first segmentation parameter of each pixel on the mask map according to the color image and the mask map, and determining a second segmentation parameter of each pixel on the mask map according to the depth image and the mask map, wherein the first segmentation parameter and the second segmentation parameter are used for representing the probability that the pixel is judged as a foreground point or a background point and the numerical difference between the pixel and an adjacent pixel; fusing the first segmentation parameter with the second segmentation parameter;
S330, constructing an undirected graph, mapping the fused segmentation parameters of each pixel in the mask map into the undirected graph, and processing the undirected graph according to a minimum segmentation-maximum flow (min-cut/max-flow) algorithm to obtain a finely segmented mask map;
S340, segmenting an image corresponding to the foreground points in the finely segmented mask map from the color image.
The method may further comprise the following features:
smearing and delineation are two different marking modes;
generally, a smearing track is a mark made in an inner region of a target object, and a delineating track is a mark made along the outer contour of the target object;
the mask map is a mark image generated after foreground and background distinguishing is carried out on part or all of the pixels of an image, and each pixel on the mask map is marked as a foreground point or a background point.
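The generation of the input mask map in step S310 may be sketched as follows. The label values, the use of `None` for pixels outside the region of interest, and the representation of regions as sets of (x, y) coordinates are assumptions made for illustration.

```python
FOREGROUND, BACKGROUND, UNMARKED = 1, 0, None  # hypothetical labels

def build_input_mask(width, height, mark_region, roi):
    """Build the input mask map as a list of rows.

    mark_region, roi: sets of (x, y) pixel coordinates, with the region of
    interest expected to contain the mark region.
    """
    mask = [[UNMARKED] * width for _ in range(height)]
    for (x, y) in roi | mark_region:
        # Pixels in the mark region are foreground points; the remaining
        # pixels of the region of interest are background points.
        mask[y][x] = FOREGROUND if (x, y) in mark_region else BACKGROUND
    return mask
```

Because only the pixels of the region of interest are labeled, the segmentation algorithm in such a sketch would operate on fewer pixels than the full image.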
In this embodiment, the image segmentation algorithm is the GrabCut algorithm.
In one embodiment, determining a first contiguous region of the smear track as a marker region and a second contiguous region of the smear track as a region of interest comprises:
constructing a minimum circumscribed rectangle of the smearing track, and expanding the minimum circumscribed rectangle according to the length L of the smearing track to form a marking area;
applying a rectangular template for the smearing track, generating a Region of Interest (ROI) containing the smearing track according to the rectangular template, and if the Region of Interest does not contain the mark Region, expanding the Region of Interest to contain the mark Region; or performing expansion treatment on the marked region to form a region of interest.
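The check that the region of interest contains the mark region, and the expansion applied when it does not, may be sketched as follows. Rectangles are represented here as hypothetical tuples (x_min, y_min, x_max, y_max), and expanding to the smallest rectangle covering both regions is one possible choice among others.

```python
def rect_contains(outer, inner):
    """Return True when rectangle `inner` lies entirely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def expand_roi_to_contain(roi, mark):
    """Expand the region of interest only if it does not contain the mark region."""
    if rect_contains(roi, mark):
        return roi
    # Smallest axis-aligned rectangle covering both rectangles.
    return (min(roi[0], mark[0]), min(roi[1], mark[1]),
            max(roi[2], mark[2]), max(roi[3], mark[3]))
```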
Wherein, the minimum circumscribed rectangle of the smearing track is constructed, and the minimum circumscribed rectangle is expanded according to the length L of the smearing track to form a mark area, and the method comprises the following steps:
determining a maximum abscissa value x_max, a minimum abscissa value x_min, a maximum ordinate value y_max and a minimum ordinate value y_min according to the abscissa and ordinate values of all pixels on the smearing track;
constructing a minimum circumscribed rectangle of the smearing track, wherein the coordinates of the four vertexes of the minimum circumscribed rectangle are, in sequence: (x_min, y_max), (x_min, y_min), (x_max, y_max), (x_max, y_min);
expanding the minimum circumscribed rectangle according to the length L of the smearing track to form a marking area, wherein the coordinates of the four vertexes of the marking area are: (x_min - a, y_max + a), (x_min - a, y_min - a), (x_max + a, y_max + a), (x_max + a, y_min - a); wherein a is an adjustable coefficient, a = L × b, and b is a constant; b may be an empirical value.
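The rectangle expansion above can be sketched in a few lines of Python. This is only an illustration: the function name `marking_area`, the default value of `b`, and the way the track length L is approximated (summed distance between consecutive track points) are assumptions, not details given in the text.

```python
import math

def marking_area(track, b=0.1):
    """Return (x_min - a, y_min - a, x_max + a, y_max + a) for a smearing track.

    track: list of (x, y) pixel coordinates in drawing order.
    b: the constant from a = L * b (the default here is an arbitrary placeholder).
    """
    xs = [p[0] for p in track]
    ys = [p[1] for p in track]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Assumption: L is approximated by the polyline length of the track.
    length = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    a = length * b
    # The four vertexes of the marking area follow from these two corners.
    return (x_min - a, y_min - a, x_max + a, y_max + a)

print(marking_area([(0, 0), (3, 4), (3, 9)], b=0.5))
```

A real implementation would presumably also clamp the result to the image bounds; that step is omitted here.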
In one embodiment, determining a first contiguous region of the smear track as a marker region and a second contiguous region of the smear track as a region of interest comprises:
acquiring edge gradient information of an original image, selecting partial pixels from the pixels of the smearing track as seeds, respectively growing each seed outwards to form a sub-region according to the edge gradient information of the image, combining the sub-regions formed by the outward growth of all the seeds to form an extended region, and taking the extended region as a mark region;
applying a geometric shape template according to the shape characteristics of the smearing track, generating an interested area containing the smearing track according to the geometric shape template, and if the interested area does not contain the marking area, expanding the interested area to contain the marking area; or performing expansion treatment on the marked region to form a region of interest.
wherein the geometric shape template comprises: a rectangular template, a polygonal template, or an elliptical template;
wherein, each seed grows outwards to form a sub-region according to the edge gradient information of the image, and the following steps A-F are carried out on each seed:
Step A: taking the seed as a starting point, setting an energy value for the starting point, and marking the starting point as an active point;
Step B: judging whether a pixel marked as an active point exists; if so, executing step C, otherwise, executing step F;
Step C: for any pixel A marked as an active point, judging whether an unchecked pixel B exists among the four adjacent points above, below, to the left of and to the right of the pixel A; if so, executing step D, otherwise, executing step E;
Step D: for any unchecked pixel B, if the pixel B meets the growth condition, marking the pixel B as a new active point, setting the energy value of the pixel B to the difference obtained by subtracting the edge gradient value of the pixel B from the energy value of the pixel A, and returning to step C; if the pixel B does not meet the growth condition, returning to step C; wherein the growth condition is: the edge gradient value of the pixel B is less than or equal to the energy value of the pixel A;
Step E: judging that the pixel A has finished growing, removing the active point mark of the pixel A, classifying the pixel A into a foreground point set, and returning to step B;
Step F: forming a sub-region from all pixels in the foreground point set, the sub-region being the one formed by the outward growth of the seed according to the edge gradient information of the image.
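Steps A-F can be illustrated with the following minimal Python sketch. The function name `grow_seed`, the data layout (a 2D list of gradient values) and the rule that a pixel which fails the growth condition once is never re-examined are assumptions made for the example.

```python
from collections import deque

def grow_seed(gradient, seed, seed_energy):
    """Grow one seed outwards; returns the set of (row, col) pixels in its sub-region."""
    rows, cols = len(gradient), len(gradient[0])
    energy = {seed: seed_energy}   # Step A: the seed gets an initial energy value
    checked = {seed}               # pixels that have already been examined
    active = deque([seed])         # Step A: the seed is the first active point
    foreground = set()
    while active:                  # Step B: continue while active points exist
        a = active.popleft()       # Step C: take an active pixel A
        r, c = a
        for b in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= b[0] < rows and 0 <= b[1] < cols
            if not inside or b in checked:
                continue
            checked.add(b)
            g = gradient[b[0]][b[1]]
            if g <= energy[a]:     # Step D: growth condition
                energy[b] = energy[a] - g
                active.append(b)
        foreground.add(a)          # Step E: A has no unchecked neighbours left
    return foreground              # Step F: the sub-region grown from this seed

# A strong edge (gradient 9) in column 2 stops the growth.
grad = [[0, 0, 9, 0],
        [0, 0, 9, 0],
        [0, 0, 9, 0]]
print(sorted(grow_seed(grad, (1, 0), 5)))
```

Because the energy shrinks by the gradient of every pixel crossed, growth stops at strong edges and also fades out gradually over textured areas.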
In this embodiment, determining the first adjacent area of the delineating track as a mark area and the second adjacent area of the delineating track as an area of interest includes:
when the delineating track is closed, determining the area enclosed by the delineating track as the marking area; when the delineating track is not closed, performing closing processing on the delineating track; if the closing is successful, determining the area enclosed by the closed delineating track as the marking area, and if the closing is unsuccessful, performing expansion processing on the delineating track and determining the expanded area as the marking area;
applying a geometric shape template according to the shape characteristics of the delineating track, generating an interested area containing the delineating track according to the geometric shape template, and if the interested area does not contain the marking area, expanding the interested area to contain the marking area; or performing expansion treatment on the marked region to form a region of interest;
wherein the geometric shape template comprises: a rectangular template or an elliptical template;
wherein, performing closing processing on the delineating track comprises:
if the distance between the starting point and the end point of the delineating track is greater than or equal to a threshold value, acquiring the edge lines in the area between the starting point and the end point of the delineating track, and superimposing the edge lines on the delineating track; if the delineating track with the edge lines superimposed forms a closed area, judging that the closing is successful, and otherwise judging that the closing fails;
if the distance between the starting point and the end point of the delineating track is smaller than the threshold value, connecting the starting point and the end point with a line segment to complete the closing of the delineating track.
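The distance test that selects between the two closing strategies might look like the sketch below. Only the line-segment branch is implemented; the edge-line branch depends on an edge detector the text does not specify, so the function simply reports that the track could not be closed this way. The name `close_track` and the return convention are assumptions.

```python
import math

def close_track(track, threshold):
    """Try to close a delineating track by joining its end point to its start point.

    track: list of (x, y) points in drawing order.
    Returns (points, closed), where closed is True when the track is already
    closed or the gap between start and end is below the threshold.
    """
    start, end = track[0], track[-1]
    if start == end:
        return list(track), True
    if math.dist(start, end) < threshold:
        # Appending the start point represents the connecting line segment.
        return list(track) + [start], True
    # Gap too large: the caller would try edge-line superposition next.
    return list(track), False

print(close_track([(0, 0), (4, 0), (4, 3), (1, 1)], threshold=2.0))
```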
In this embodiment, determining the first segmentation parameter of each pixel on the mask map according to the color map and the mask map includes: determining a first region item segmentation parameter of each pixel on the mask map according to the color map and the mask map:
performing Gaussian Mixture Model (GMM) calculation according to an EM (Expectation-Maximization) method, wherein the EM method comprises an E step and an M step; iteratively executing the E step and the M step, and stopping the iterative process after the iterative operation reaches a convergence condition; determining the classification of the pixel obtained by the last M step as the classification of the pixel, and determining the maximum probability value P_max of the pixel belonging to a cluster, obtained by the last M step, as the first region item segmentation parameter of the pixel, wherein the first region item segmentation parameter is the probability that the pixel is judged to be a foreground point or a background point based on the color map;
wherein, the step E and the step M respectively comprise the following processing:
Step E: clustering pixels of the same classification into one or more clusters according to the color values of the pixels on the mask map and the positional relationship among the pixels, and determining a GMM model of each cluster; wherein the classification of a pixel is foreground point or background point; the classification of a cluster is foreground point cluster or background point cluster;
Step M: determining the probability of each pixel belonging to each cluster according to the GMM model of each cluster, and, for any pixel, determining the classification of the pixel according to the cluster corresponding to the maximum probability value P_max of the pixel;
in this embodiment, determining the second segmentation parameter of each pixel on the mask map according to the depth map and the mask map further includes: determining a second region item segmentation parameter of each pixel on the mask map according to the depth map and the mask map:
performing Gaussian Mixture Model (GMM) calculation according to an EM (Expectation-Maximization) method, wherein the EM method comprises an E step and an M step; iteratively executing the E step and the M step, and stopping the iterative process after the iterative operation reaches a convergence condition; determining the classification of the pixel obtained by the last M step as the classification of the pixel, and determining the maximum probability value P_max of the pixel belonging to a cluster, obtained by the last M step, as the second region item segmentation parameter of the pixel, wherein the second region item segmentation parameter is the probability that the pixel is judged to be a foreground point or a background point based on the depth map;
wherein, the step E and the step M respectively comprise the following processing:
Step E: clustering pixels of the same classification into one or more clusters according to the depth values of the pixels on the mask map and the positional relationship among the pixels, and determining a GMM model of each cluster; wherein the classification of a pixel is foreground point or background point; the classification of a cluster is foreground point cluster or background point cluster;
Step M: determining the probability of each pixel belonging to each cluster according to the GMM model of each cluster, and, for any pixel, determining the classification of the pixel according to the cluster corresponding to the maximum probability value P_max of the pixel;
optionally, the convergence condition of the EM method may be: stopping the iteration process when the number of times of iteration operation reaches a threshold value;
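As a rough illustration of the iteration described above, the sketch below runs the two steps on scalar depth values with a fixed iteration count as the convergence condition. It is heavily simplified and rests on several assumptions: one Gaussian per class instead of several clusters, no use of pixel positions, a variance floor `min_var`, and the names `region_term` and `gaussian_pdf`. The step labels follow the wording of the text rather than textbook EM terminology.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def region_term(values, labels, iterations=5, min_var=1.0):
    """values: one scalar (e.g. depth) per pixel; labels: initial 'fg'/'bg' per pixel.

    Returns (labels, p_max): the final classification and, per pixel, the
    normalised probability of the winning cluster.
    """
    labels = list(labels)
    p_max = [0.0] * len(values)
    for _ in range(iterations):            # convergence: fixed iteration count
        # "Step E" in the text: build one model per class from its current pixels.
        models = {}
        for cls in ("fg", "bg"):
            members = [v for v, l in zip(values, labels) if l == cls]
            if not members:
                continue
            mean = sum(members) / len(members)
            var = sum((v - mean) ** 2 for v in members) / len(members)
            models[cls] = (mean, max(var, min_var), len(members) / len(values))
        # "Step M" in the text: probability per cluster, classify by the maximum.
        for i, v in enumerate(values):
            scores = {c: w * gaussian_pdf(v, m, s) for c, (m, s, w) in models.items()}
            total = sum(scores.values())
            best = max(scores, key=scores.get)
            labels[i] = best
            p_max[i] = scores[best] / total if total > 0 else 1.0 / len(scores)
    return labels, p_max

depth = [10, 12, 11, 200, 205, 198, 12]
initial = ["fg", "fg", "fg", "bg", "bg", "bg", "bg"]   # last pixel starts as background
print(region_term(depth, initial)[0])
```

The same routine could be applied to color values by replacing the scalar Gaussian with a multivariate one.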
in this embodiment, determining a first segmentation parameter for each pixel on the mask map according to the color map and the mask map further includes: determining a first boundary item segmentation parameter of each pixel on the mask image according to the color image and the mask image:
determining a first boundary item segmentation parameter of the pixel according to the color difference between the pixel and an adjacent pixel;
accumulating the absolute values of the numerical difference values between the pixel and each adjacent pixel on the RGB three color channels for any pixel, and then normalizing the accumulated sum to obtain a normalized accumulated sum as a first boundary item segmentation parameter of the pixel;
the value ranges of the numerical values on the three RGB color channels are as follows: 0 to 255;
in this embodiment, determining the second segmentation parameter of each pixel on the mask map according to the depth map and the mask map further includes: determining a second boundary term segmentation parameter of each pixel on the mask map according to the depth map and the mask map:
determining a second boundary item segmentation parameter of the pixel according to the depth value difference of the pixel and the adjacent pixel;
accumulating the absolute values of the depth value difference values between any one pixel and each adjacent pixel, and then normalizing the accumulated sum to obtain a normalized accumulated sum as a second boundary item segmentation parameter of the pixel;
wherein, the numerical range of the depth value may be: 0 to 255;
optionally, the pixels adjacent to a pixel may be the 8 pixels surrounding that pixel.
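The boundary item computation for both the color map and the depth map can be sketched with one routine. The text does not say how the accumulated sum is normalized; dividing by the largest possible sum (number of neighbours × number of channels × 255) is an assumption, as are the function name `boundary_term` and the tuple-per-pixel data layout.

```python
def boundary_term(image, max_value=255):
    """image: 2D list whose entries are tuples of channel values
    (three values for an RGB pixel, one value for a depth pixel).
    Returns a 2D list of boundary item parameters in [0, 1]."""
    rows, cols = len(image), len(image[0])
    channels = len(image[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                        # Accumulate absolute differences over all channels.
                        total += sum(abs(p - q) for p, q in zip(image[r][c], image[nr][nc]))
                        count += 1
            # Assumed normalisation: divide by the maximum possible sum.
            out[r][c] = total / (count * channels * max_value) if count else 0.0
    return out

depth = [[(0,), (0,)],
         [(0,), (255,)]]
print(boundary_term(depth))
```

Border pixels have fewer than 8 neighbours, so the sketch normalizes by the number of neighbours actually present.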
In this embodiment, fusing the first segmentation parameter and the second segmentation parameter includes: fusing the first region item segmentation parameter with the second region item segmentation parameter:
for any pixel, multiplying the first region item segmentation parameter by a weight (1-a) to obtain an adjusted first region item segmentation parameter, and multiplying the second region item segmentation parameter by the weight a to obtain an adjusted second region item segmentation parameter;
if the pixel classification indicated by the first region item segmentation parameter is the same as the pixel classification indicated by the second region item segmentation parameter, taking the sum of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as a fused region item segmentation parameter;
if the pixel classification indicated by the first region item segmentation parameter is different from the pixel classification indicated by the second region item segmentation parameter, taking the pixel classification indicated by the larger value of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as the final classification of the pixel, and taking the absolute value of the difference value of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as the fused region item segmentation parameter;
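The two fusion cases for the region item can be written compactly as below. The function name `fuse_region` and the tie-breaking rule (the color-based classification wins when the two adjusted values are equal) are assumptions for illustration.

```python
def fuse_region(label_color, p_color, label_depth, p_depth, a):
    """Fuse the region item parameters of one pixel.

    label_*: classification ('fg' or 'bg') indicated by each parameter.
    p_*: the region item parameters; a: the depth weight, 0 <= a <= 1.
    Returns (final_label, fused_parameter).
    """
    adj_color = (1 - a) * p_color
    adj_depth = a * p_depth
    if label_color == label_depth:
        # Both cues agree: their adjusted values reinforce each other.
        return label_color, adj_color + adj_depth
    # The cues disagree: the stronger one decides, weakened by the other.
    label = label_color if adj_color >= adj_depth else label_depth
    return label, abs(adj_color - adj_depth)

print(fuse_region("fg", 0.8, "fg", 0.4, a=0.25))
print(fuse_region("fg", 0.8, "bg", 0.4, a=0.25))
```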
in this embodiment, fusing the first segmentation parameter and the second segmentation parameter further includes: fusing the first boundary item segmentation parameter with the second boundary item segmentation parameter:
multiplying the first boundary item segmentation parameter by a weight (1-a) to obtain an adjusted first boundary item segmentation parameter, multiplying the second boundary item segmentation parameter by a weight a to obtain an adjusted second boundary item segmentation parameter, and then adding the adjusted first boundary item segmentation parameter and the adjusted second boundary item segmentation parameter to obtain a fused boundary item segmentation parameter of the pixel; a is greater than or equal to 0 and less than or equal to 1;
in the present embodiment, the weight a is determined according to a self-evaluation parameter k1 and a consistency parameter k2: the product of the self-evaluation parameter k1 and the consistency parameter k2 is taken as the weight a;
the self-evaluation parameter k1 is determined in the following manner: determining how near or far the shooting distance corresponding to the pixel is according to the depth value of the pixel, and setting the self-evaluation parameter k1 accordingly, wherein the shorter the shooting distance, the larger the self-evaluation parameter k1; k1 is greater than or equal to 0 and less than or equal to 1;
wherein the consistency parameter k2 is determined in the following manner:
setting a consistency parameter k2 as a first constant if the first boundary term segmentation parameter is equal to the second boundary term segmentation parameter;
if the first boundary item segmentation parameter and the second boundary item segmentation parameter are not equal, setting the consistency parameter k2 to the first constant when both parameters are greater than a threshold or both are less than the threshold, and setting the consistency parameter k2 to a second constant otherwise; the first constant is greater than the second constant; the first constant is greater than 0 and less than or equal to 1, and the second constant is greater than 0 and less than 1;
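A short sketch of how the weight a and the fused boundary item could be computed for one pixel follows. The concrete choices are assumptions: a linear mapping from depth value to k1 (larger depth value taken to mean a longer shooting distance), a threshold of 0.5, and the constants 1.0 and 0.5 for k2. The text only constrains the ranges and the ordering of these values.

```python
def self_evaluation_k1(depth, max_depth=255):
    # Assumption: depth 0 is nearest, max_depth is farthest; nearer gives larger k1.
    return 1.0 - depth / max_depth

def consistency_k2(b_color, b_depth, threshold=0.5, first=1.0, second=0.5):
    if b_color == b_depth:
        return first
    both_above = b_color > threshold and b_depth > threshold
    both_below = b_color < threshold and b_depth < threshold
    return first if (both_above or both_below) else second

def fuse_boundary(b_color, b_depth, depth):
    """Returns (a, fused boundary item parameter) for one pixel."""
    a = self_evaluation_k1(depth) * consistency_k2(b_color, b_depth)
    return a, (1 - a) * b_color + a * b_depth

print(fuse_boundary(0.8, 0.6, depth=0))    # near pixel, consistent cues
print(fuse_boundary(0.8, 0.2, depth=255))  # far pixel, inconsistent cues
```

With these choices the depth cue dominates for near pixels whose color and depth boundaries agree, and is ignored for pixels at the far end of the depth range.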
in this embodiment, constructing an undirected graph and mapping the merged segmentation parameters of each pixel in the mask graph into the undirected graph includes:
constructing an undirected graph, and arranging two suspension points Q_0 and Q_1 outside the plane of the undirected graph, the suspension point Q_0 being a virtual foreground point and the suspension point Q_1 being a virtual background point; establishing mapping points of all pixels of the mask map on the plane of the undirected graph, establishing a line between the mapping point of each foreground point and the suspension point Q_0, and establishing a line between the mapping point of each background point and the suspension point Q_1;
for any pixel P_i in the mask map, taking the fused region item segmentation parameter of the pixel P_i as the weight of the mapping point P_i' in the undirected graph, and taking the fused boundary item segmentation parameter of the pixel P_i as the weight of the line between the mapping point P_i' and the suspension point Q_0 or Q_1.
In this embodiment, processing the undirected graph according to a min-cut/max-flow algorithm to obtain a finely segmented mask map includes:
iteratively executing the following steps C and D, stopping the iterative process after the iterative operation reaches a convergence condition, and taking each pixel in the foreground point set Q as a foreground point in the finely segmented mask map;
wherein, step C and step D respectively include the following processing:
Step C: dividing a portion of the pixels in the undirected graph into foreground points of the same class as the suspension point Q_0, the pixels divided into foreground points forming the foreground point set Q;
Step D: calculating the weight sum of the foreground point set Q, wherein the weight sum is the sum of the weights of all foreground points in the foreground point set Q plus the sum of the weights of the lines between all foreground points in the foreground point set Q and the suspension point Q_0;
wherein the convergence condition is that the weight sum of the foreground point set Q is smaller than a threshold value and its change tends to be stable.
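The weight sum of step D and the convergence test can be sketched as follows. This covers only the bookkeeping, not the partitioning of step C; the dictionary layout, the function names and the tolerance used to decide that the change "tends to be stable" are assumptions.

```python
def foreground_weight_sum(node_weight, link_weight, foreground):
    """node_weight[p]: weight of mapping point p (fused region item parameter).
    link_weight[p]: weight of the line between p and the suspension point Q_0
    (fused boundary item parameter).
    foreground: the current foreground point set Q."""
    return sum(node_weight[p] + link_weight[p] for p in foreground)

def converged(history, threshold, tolerance=1e-3):
    """history: weight sums from successive iterations, most recent last."""
    if len(history) < 2:
        return False
    stable = abs(history[-1] - history[-2]) < tolerance
    return history[-1] < threshold and stable

node_w = {"p1": 0.5, "p2": 0.25, "p3": 0.75}
link_w = {"p1": 0.25, "p2": 0.5, "p3": 0.125}
print(foreground_weight_sum(node_w, link_w, {"p1", "p2"}))
print(converged([2.0, 1.5, 1.5], threshold=2.0))
```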
In the related art, the foreground points in the input mask map of the image segmentation algorithm are marked manually by the user, and all other pixels of the original image are marked as background points. Because few foreground points are marked and the input mask map is large, the image segmentation algorithm needs more iterations to distinguish foreground points from background points, and its running time is long. With the method provided by the embodiment of the invention, the number of foreground points marked in the input mask map is expanded automatically by generating the marking area, and the number of background points is reduced by using the region of interest in place of the whole original image; both reduce the number of iterations the image segmentation algorithm needs to distinguish foreground points from background points, and significantly shorten its running time. In addition, the technical solution of the embodiment of the invention calculates the segmentation parameters of each pixel separately from the depth map and the color map, fuses the two sets of parameters, and performs image segmentation using the fused segmentation parameters.
As shown in fig. 4, an embodiment of the present invention provides an apparatus for implementing interactive image segmentation, including:
the preprocessing module 401 is configured to, after detecting a smearing track or a delineating track on an original image, determine a first adjacent area of the smearing track or the delineating track as a mark area, and determine a second adjacent area of the smearing track or the delineating track as an area of interest, where the area of interest includes the mark area; generating an input mask map for an image segmentation algorithm: all pixels in the marked area are used as foreground points in the mask image, and pixels outside the marked area in the interested area are used as background points in the mask image;
a segmentation parameter calculation and fusion module 402, configured to obtain a color map including color information of a target object and a depth map including depth information of the target object, determine a first segmentation parameter of each pixel on the mask map according to the color map and the mask map, and determine a second segmentation parameter of each pixel on the mask map according to the depth map and the mask map, where the first segmentation parameter and the second segmentation parameter are used to indicate a probability that a pixel is determined as a foreground point or a background point and a numerical difference between the pixel and an adjacent pixel; fusing the first segmentation parameters with the second segmentation parameters;
the mask map adjusting module 403 is configured to construct an undirected graph, map the fused segmentation parameters of each pixel in the mask map into the undirected graph, and process the undirected graph according to a min-cut/max-flow algorithm to obtain a finely segmented mask map;
an output module 404, configured to segment an image corresponding to a foreground point in the mask map after the fine segmentation from the color map;
the apparatus may also include the following features:
wherein, the smearing and the delineation are two different marking modes;
generally, a smearing trajectory is a marking made in an inner region of a target object, and a delineating trajectory is a marking made along an outer contour of the target object;
the mask image is a mark image generated after foreground and background distinguishing is carried out on part or all pixels of one image, and each pixel on the mask image is marked as a foreground point or a background point.
In this embodiment, the image segmentation algorithm is the GrabCut algorithm.
In one embodiment, the preprocessing module is configured to determine a first adjacent area of the smearing track as the marking area and a second adjacent area of the smearing track as the region of interest by:
constructing a minimum circumscribed rectangle of the smearing track, and expanding the minimum circumscribed rectangle according to the length L of the smearing track to form a marking area;
applying a rectangular template for the smearing track, generating a Region of Interest (ROI) containing the smearing track according to the rectangular template, and if the Region of Interest does not contain the mark Region, expanding the Region of Interest to contain the mark Region; or performing expansion treatment on the marked region to form a region of interest.
Wherein, the minimum circumscribed rectangle of the smearing track is constructed, and the minimum circumscribed rectangle is expanded according to the length L of the smearing track to form a mark area, and the method comprises the following steps:
determining a maximum abscissa value x_max, a minimum abscissa value x_min, a maximum ordinate value y_max and a minimum ordinate value y_min according to the abscissa and ordinate values of all pixels on the smearing track;
constructing a minimum circumscribed rectangle of the smearing track, wherein the coordinates of the four vertexes of the minimum circumscribed rectangle are, in sequence: (x_min, y_max), (x_min, y_min), (x_max, y_max), (x_max, y_min);
expanding the minimum circumscribed rectangle according to the length L of the smearing track to form a marking area, wherein the coordinates of the four vertexes of the marking area are: (x_min - a, y_max + a), (x_min - a, y_min - a), (x_max + a, y_max + a), (x_max + a, y_min - a); wherein a is an adjustable coefficient, a = L × b, and b is a constant; b may be an empirical value.
In one embodiment, the preprocessing module is configured to determine a first adjacent region of the smearing track as a marker region and a second adjacent region of the smearing track as a region of interest by:
acquiring edge gradient information of an original image, selecting partial pixels from the pixels of the smearing track as seeds, respectively growing each seed outwards to form a sub-region according to the edge gradient information of the image, combining the sub-regions formed by the outward growth of all the seeds to form an extended region, and taking the extended region as a mark region;
applying a geometric shape template according to the shape characteristics of the smearing track, generating an interested area containing the smearing track according to the geometric shape template, and if the interested area does not contain the marking area, expanding the interested area to contain the marking area; or performing expansion treatment on the marked region to form a region of interest.
wherein the geometric shape template comprises: a rectangular template, a polygonal template, or an elliptical template;
wherein, each seed grows outwards to form a sub-region according to the edge gradient information of the image, and the following steps A-F are carried out on each seed:
Step A: taking the seed as a starting point, setting an energy value for the starting point, and marking the starting point as an active point;
Step B: judging whether a pixel marked as an active point exists; if so, executing step C, otherwise, executing step F;
Step C: for any pixel A marked as an active point, judging whether an unchecked pixel B exists among the four adjacent points above, below, to the left of and to the right of the pixel A; if so, executing step D, otherwise, executing step E;
Step D: for any unchecked pixel B, if the pixel B meets the growth condition, marking the pixel B as a new active point, setting the energy value of the pixel B to the difference obtained by subtracting the edge gradient value of the pixel B from the energy value of the pixel A, and returning to step C; if the pixel B does not meet the growth condition, returning to step C; wherein the growth condition is: the edge gradient value of the pixel B is less than or equal to the energy value of the pixel A;
Step E: judging that the pixel A has finished growing, removing the active point mark of the pixel A, classifying the pixel A into a foreground point set, and returning to step B;
Step F: forming a sub-region from all pixels in the foreground point set, the sub-region being the one formed by the outward growth of the seed according to the edge gradient information of the image.
In this embodiment, the preprocessing module is configured to determine a first adjacent area of the delineating track as a mark area and a second adjacent area of the delineating track as an area of interest by using the following method:
when the delineating track is closed, determining the area enclosed by the delineating track as the marking area; when the delineating track is not closed, performing closing processing on the delineating track; if the closing is successful, determining the area enclosed by the closed delineating track as the marking area, and if the closing is unsuccessful, performing expansion processing on the delineating track and determining the expanded area as the marking area;
applying a geometric shape template according to the shape characteristics of the delineating track, generating an interested area containing the delineating track according to the geometric shape template, and if the interested area does not contain the marking area, expanding the interested area to contain the marking area; or performing expansion treatment on the marked region to form a region of interest;
wherein the geometric shape template comprises: a rectangular template or an elliptical template;
wherein, performing closing processing on the delineating track comprises:
if the distance between the starting point and the end point of the delineating track is greater than or equal to a threshold value, acquiring the edge lines in the area between the starting point and the end point of the delineating track, and superimposing the edge lines on the delineating track; if the delineating track with the edge lines superimposed forms a closed area, judging that the closing is successful, and otherwise judging that the closing fails;
if the distance between the starting point and the end point of the delineating track is smaller than the threshold value, connecting the starting point and the end point with a line segment to complete the closing of the delineating track.
In this embodiment, the segmentation parameter calculation and fusion module is configured to determine the first segmentation parameter of each pixel on the mask map according to the color map and the mask map in the following manner: determining a first region item segmentation parameter of each pixel on the mask map according to the color map and the mask map:
performing Gaussian Mixture Model (GMM) calculation according to an EM (Expectation-Maximization) method, wherein the EM method comprises an E step and an M step; iteratively executing the E step and the M step, and stopping the iterative process after the iterative operation reaches a convergence condition; determining the classification of the pixel obtained by the last M step as the classification of the pixel, and determining the maximum probability value P_max of the pixel belonging to a cluster, obtained by the last M step, as the first region item segmentation parameter of the pixel, wherein the first region item segmentation parameter is the probability that the pixel is judged to be a foreground point or a background point based on the color map;
wherein, the step E and the step M respectively comprise the following processing:
Step E: clustering pixels of the same classification into one or more clusters according to the color values of the pixels on the mask map and the positional relationship among the pixels, and determining a GMM model of each cluster; wherein the classification of a pixel is foreground point or background point; the classification of a cluster is foreground point cluster or background point cluster;
Step M: determining the probability of each pixel belonging to each cluster according to the GMM model of each cluster, and, for any pixel, determining the classification of the pixel according to the cluster corresponding to the maximum probability value P_max of the pixel;
in this embodiment, the segmentation parameter calculation and fusion module is further configured to determine the second segmentation parameter of each pixel on the mask map according to the depth map and the mask map in the following manner: determining a second region item segmentation parameter of each pixel on the mask map according to the depth map and the mask map:
performing Gaussian Mixture Model (GMM) calculation according to an EM (Expectation-Maximization) method, wherein the EM method comprises an E step and an M step; iteratively executing the E step and the M step, and stopping the iterative process after the iterative operation reaches a convergence condition; determining the classification of the pixel obtained by the last M step as the classification of the pixel, and determining the maximum probability value P_max of the pixel belonging to a cluster, obtained by the last M step, as the second region item segmentation parameter of the pixel, wherein the second region item segmentation parameter is the probability that the pixel is judged to be a foreground point or a background point based on the depth map;
wherein, the step E and the step M respectively comprise the following processing:
Step E: clustering pixels of the same classification into one or more clusters according to the depth values of the pixels on the mask map and the positional relationship among the pixels, and determining a GMM model of each cluster; wherein the classification of a pixel is foreground point or background point; the classification of a cluster is foreground point cluster or background point cluster;
Step M: determining the probability of each pixel belonging to each cluster according to the GMM model of each cluster, and, for any pixel, determining the classification of the pixel according to the cluster corresponding to the maximum probability value P_max of the pixel;
optionally, the convergence condition of the EM method may be: stopping the iteration process when the number of times of iteration operation reaches a threshold value;
in this embodiment, the segmentation parameter calculation and fusion module is further configured to determine the first segmentation parameter of each pixel on the mask map according to the color map and the mask map in the following manner: determining a first boundary item segmentation parameter of each pixel on the mask image according to the color image and the mask image:
determining a first boundary item segmentation parameter of the pixel according to the color difference between the pixel and an adjacent pixel;
accumulating the absolute values of the numerical difference values between the pixel and each adjacent pixel on the RGB three color channels for any pixel, and then normalizing the accumulated sum to obtain a normalized accumulated sum as a first boundary item segmentation parameter of the pixel;
the value ranges of the numerical values on the three RGB color channels are as follows: 0 to 255;
in this embodiment, the segmentation parameter calculation and fusion module is further configured to determine the second segmentation parameter of each pixel on the mask map according to the depth map and the mask map in the following manner: determining a second boundary term segmentation parameter of each pixel on the mask map according to the depth map and the mask map:
determining a second boundary item segmentation parameter of the pixel according to the depth value difference of the pixel and the adjacent pixel;
accumulating the absolute values of the depth value difference values between any one pixel and each adjacent pixel, and then normalizing the accumulated sum to obtain a normalized accumulated sum as a second boundary item segmentation parameter of the pixel;
wherein, the numerical range of the depth value may be: 0 to 255;
optionally, the pixels adjacent to a pixel may be the 8 pixels surrounding that pixel.
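The boundary item calculation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `boundary_term`, the nested-list image layout, and the normalization (dividing by the largest possible accumulated sum) are assumptions, since the text only states that the accumulated sum is normalized.

```python
def boundary_term(channels, row, col, max_value=255):
    """Normalized sum of absolute differences between a pixel and its 8 neighbors.

    channels: list of 2-D lists, one per channel
              (three for an RGB color map, one for a depth map).
    """
    height, width = len(channels[0]), len(channels[0][0])
    total = 0
    neighbor_count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the pixel itself
            r, c = row + dr, col + dc
            if 0 <= r < height and 0 <= c < width:
                for channel in channels:
                    total += abs(channel[row][col] - channel[r][c])
                neighbor_count += 1
    if neighbor_count == 0:
        return 0.0
    # Assumed normalization: divide by the largest possible accumulated sum,
    # so the result lies in [0, 1].
    return total / (neighbor_count * len(channels) * max_value)
```

Passing `[r_map, g_map, b_map]` would give a first (color) boundary item parameter and `[depth_map]` a second (depth) one; pixels on the image border simply use the neighbors that exist.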
In this embodiment, the segmentation parameter calculation and fusion module is configured to fuse the first segmentation parameter and the second segmentation parameter in the following manner: fusing the first region item segmentation parameter with the second region item segmentation parameter:
for any pixel, multiplying the first region item segmentation parameter by a weight (1-a) to obtain an adjusted first region item segmentation parameter, and multiplying the second region item segmentation parameter by the weight a to obtain an adjusted second region item segmentation parameter;
if the pixel classification indicated by the first region item segmentation parameter is the same as the pixel classification indicated by the second region item segmentation parameter, taking the sum of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as a fused region item segmentation parameter;
if the pixel classification indicated by the first region item segmentation parameter is different from the pixel classification indicated by the second region item segmentation parameter, taking the pixel classification indicated by the larger value of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as the final classification of the pixel, and taking the absolute value of the difference value of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as the fused region item segmentation parameter;
in this embodiment, the segmentation parameter calculation and fusion module is further configured to fuse the first segmentation parameter and the second segmentation parameter in the following manner: fusing the first boundary item segmentation parameter with the second boundary item segmentation parameter:
multiplying the first boundary item segmentation parameter by a weight (1-a) to obtain an adjusted first boundary item segmentation parameter, multiplying the second boundary item segmentation parameter by a weight a to obtain an adjusted second boundary item segmentation parameter, and then adding the adjusted first boundary item segmentation parameter and the adjusted second boundary item segmentation parameter to obtain a fused boundary item segmentation parameter of the pixel; a is greater than or equal to 0 and less than or equal to 1;
in the present embodiment, the weight a is determined according to a self-evaluation parameter k1 and a consistency parameter k2: taking the product of the self-evaluation parameter k1 and the consistency parameter k2 as the weight a;
the self-evaluation parameter k1 is determined in the following manner: determining, according to the depth value of the pixel, how near or far the shooting distance corresponding to the pixel is, and setting the self-evaluation parameter k1 accordingly, wherein the shorter the shooting distance, the larger the self-evaluation parameter k1 is set; k1 is greater than or equal to 0 and less than or equal to 1;
wherein the consistency parameter k2 is determined in the following manner:
setting a consistency parameter k2 as a first constant if the first boundary term segmentation parameter is equal to the second boundary term segmentation parameter;
if the first boundary item segmentation parameter and the second boundary item segmentation parameter are not equal, setting the consistency parameter k2 as the first constant when both parameters are greater than a threshold or both are less than the threshold; setting the consistency parameter k2 as a second constant when the two parameters are neither both greater than nor both less than the threshold; the first constant is greater than the second constant; the first constant is greater than 0 and less than or equal to 1, and the second constant is greater than 0 and less than 1;
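The weight determination and boundary item fusion described above can be sketched as follows. All concrete values are hypothetical: the text fixes only the ranges of k1, the first constant and the second constant, so the linear mapping in `self_evaluation_k1`, the threshold 0.5 and the constants 1.0 and 0.5 are assumptions chosen for illustration. The sketch also assumes that a larger depth value means a shorter shooting distance.

```python
def self_evaluation_k1(depth_value, max_depth=255):
    # Assumption: larger depth values correspond to a shorter shooting
    # distance, and k1 grows linearly with them; k1 stays within [0, 1].
    return depth_value / max_depth


def consistency_k2(b_color, b_depth, threshold=0.5,
                   first_constant=1.0, second_constant=0.5):
    # Threshold and both constants are hypothetical values within the
    # ranges given in the text (first_constant > second_constant).
    if b_color == b_depth:
        return first_constant
    both_above = b_color > threshold and b_depth > threshold
    both_below = b_color < threshold and b_depth < threshold
    return first_constant if (both_above or both_below) else second_constant


def fuse_boundary(b_color, b_depth, depth_value):
    """Return (fused boundary item parameter, weight a) for one pixel."""
    a = self_evaluation_k1(depth_value) * consistency_k2(b_color, b_depth)
    fused = (1 - a) * b_color + a * b_depth
    return fused, a
```

With these assumed settings, a pixel whose depth value is 0 gets a = 0, so the fused value falls back to the color-based parameter alone.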
in this embodiment, the mask map adjusting module is configured to construct an undirected graph and map the merged segmentation parameters of each pixel in the mask map into the undirected graph in the following manner:
constructing an undirected graph, and arranging two suspension points Q_0 and Q_1 outside the plane of the undirected graph, the suspension point Q_0 being a virtual foreground point and the suspension point Q_1 being a virtual background point; establishing, on the plane of the undirected graph, a mapping point for each pixel on the mask map; establishing a connecting line between the mapping point of each foreground point and the suspension point Q_0, and a connecting line between the mapping point of each background point and the suspension point Q_1;
for any pixel P_i in the mask map, the fused region item segmentation parameter of the pixel P_i is used as the weight of its mapping point P_i' in the undirected graph, and the fused boundary item segmentation parameter of the pixel P_i is used as the weight of the connecting line between the mapping point P_i' and the suspension point Q_0 or Q_1.
In this embodiment, the mask map adjusting module is configured to process the undirected graph according to a min-cut/max-flow algorithm in the following manner to obtain a finely segmented mask map:
iteratively executing the following steps C and D, stopping the iterative process after the iterative operation reaches a convergence condition, and taking each pixel in the foreground point set Q as a foreground point in a mask image after fine segmentation;
wherein, step C and step D include the following treatment respectively:
step C: classifying a part of the pixels in the undirected graph as foreground points of the same class as the suspension point Q_0, the pixels classified as foreground points forming the foreground point set Q;
step D: calculating the weight sum of the foreground point set Q, wherein the weight sum is the sum of the weights of all foreground points in the foreground point set Q plus the sum of the weights of the connecting lines between all foreground points in the foreground point set Q and the suspension point Q_0;
and the convergence condition is that the sum of the weights of the foreground point set Q is smaller than a threshold value and the change tends to be stable.
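The weight sum of step D and the convergence check can be sketched as follows. This covers only the bookkeeping of steps C and D as worded here, not a full min-cut/max-flow solver; the dictionary layout, the function names and the `tolerance` used to decide that the change "tends to be stable" are assumptions.

```python
def weight_sum(foreground_set, node_weight, link_weight):
    """Step D: node weights plus weights of the lines to Q_0, summed over Q.

    node_weight[p]: fused region item parameter of pixel p
    link_weight[p]: fused boundary item parameter of pixel p (line to Q_0)
    """
    return sum(node_weight[p] + link_weight[p] for p in foreground_set)


def has_converged(history, threshold, tolerance=1e-3):
    # Assumed reading of the convergence condition: the latest weight sum
    # is below the threshold and differs little from the previous one.
    if len(history) < 2:
        return False
    return history[-1] < threshold and abs(history[-1] - history[-2]) < tolerance
```

A driver loop would repeat step C (choose a candidate set Q), append `weight_sum(...)` to `history`, and stop once `has_converged` returns true.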
According to the method provided by the embodiment of the invention, generating the marking area automatically expands the number of foreground points marked in the input mask map of the image segmentation algorithm, and generating the region of interest to replace the whole original image reduces the number of background points marked; this reduces the number of iterations the image segmentation algorithm needs to distinguish foreground points from background points and remarkably shortens its running time. On the other hand, the technical scheme of the embodiment of the invention calculates the segmentation parameters of each pixel based on the depth map and the color map respectively, fuses the parameters, and performs image segmentation using the fused segmentation parameters.
Example 3
The embodiment of the invention also provides a terminal which comprises the device for realizing the interactive image segmentation.
Application example 1
In this application example, the method for realizing interactive image segmentation comprises the following steps:
step S501, detecting that a user selects to mark a target object in a smearing mode;
for example, two keys for marking are provided on the interface, one is "smearing", and the other is "outlining", and if the user clicks the "smearing" key, the smearing trajectory is preprocessed.
Step S502, detecting that a user paints on an original image;
for example, as shown in fig. 5-a, the user has painted on an original image, which is a color map, and the target object is a "stapler";
step S503, constructing a minimum circumscribed rectangle of the smearing track, and expanding the minimum circumscribed rectangle according to the length L of the smearing track to form a marking area; applying a rectangular template for the smearing track, and generating an interested area containing the smearing track according to the rectangular template; if the region of interest does not contain the marker region, expanding the region of interest to contain the marker region;
wherein, the minimum circumscribed rectangle of the smearing track is constructed, and the minimum circumscribed rectangle is expanded according to the length L of the smearing track to form a mark area, and the method comprises the following steps:
determining a maximum abscissa value x_max, a minimum abscissa value x_min, a maximum ordinate value y_max and a minimum ordinate value y_min according to the abscissa and ordinate values of all pixels on the smearing track;
constructing a minimum circumscribed rectangle of the smearing track, wherein the coordinates of the four vertexes of the minimum circumscribed rectangle are, in sequence: (x_min, y_max), (x_min, y_min), (x_max, y_max), (x_max, y_min);
expanding the minimum circumscribed rectangle according to the length L of the smearing track to form a marking area, wherein the coordinates of the four vertexes of the marking area are: (x_min - a, y_max + a), (x_min - a, y_min - a), (x_max + a, y_max + a), (x_max + a, y_min - a); wherein a is an adjustable coefficient, a = L × b, and b is a constant;
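The rectangle construction above can be sketched as follows. The function name `marking_area`, the default value of the constant b, and the way the track length L is measured (sum of distances between consecutive track points) are assumptions; the text does not specify them.

```python
import math


def marking_area(track, b=0.1):
    """Expand the minimum circumscribed rectangle of a smearing track.

    track: list of (x, y) pixel coordinates along the smearing track.
    Returns (x_min - a, y_min - a, x_max + a, y_max + a) with a = L * b.
    """
    xs = [x for x, _ in track]
    ys = [y for _, y in track]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Assumed definition of L: polyline length through consecutive points.
    length = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    a = length * b  # b = 0.1 is a hypothetical constant
    return (x_min - a, y_min - a, x_max + a, y_max + a)
```

A practical version would probably also clip the result to the image bounds, which the text does not discuss.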
for example, as shown in fig. 5-b, the marking area corresponding to the smearing track is a first rectangular area containing the smearing track, the region of interest is a second rectangular area containing the marking area, and the border of the region of interest is indicated by a dashed line.
Step S504, generating an input mask map of an image segmentation algorithm (GrabCut algorithm): all pixels in the marking area are used as foreground points in the mask map, and pixels in the region of interest that lie outside the marking area are used as background points in the mask map.
For example, as shown in fig. 5-c, the rectangular dark region containing the smear track is a mark region, which is a foreground point block in the mask map (input mask map); the edge of the region of interest is represented by a dashed box, inside which the part excluding the marked area (foreground dot block) is the background dot block in the mask map.
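The mask generation of step S504 can be sketched as follows for rectangular areas. The label values `FOREGROUND`, `BACKGROUND` and `OUTSIDE` and the rectangle tuple layout are hypothetical; an actual GrabCut implementation defines its own mask label values.

```python
FOREGROUND, BACKGROUND, OUTSIDE = 1, 0, -1  # hypothetical label values


def build_input_mask(width, height, marking_rect, roi_rect):
    """Build an input mask from a marking area and a region of interest.

    Rectangles are (x0, y0, x1, y1) with inclusive bounds.
    """
    def inside(rect, x, y):
        x0, y0, x1, y1 = rect
        return x0 <= x <= x1 and y0 <= y <= y1

    mask = [[OUTSIDE] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if inside(marking_rect, x, y):
                mask[y][x] = FOREGROUND   # pixel in the marking area
            elif inside(roi_rect, x, y):
                mask[y][x] = BACKGROUND   # in the ROI but outside the marking area
    return mask
```

Pixels outside the region of interest keep the `OUTSIDE` label and would not take part in the segmentation, which is what shrinks the problem size.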
Step S505, a color image and a depth image containing depth information of a target object are obtained;
as shown in fig. 5-a, the original image that the user smears is a color map;
as shown in fig. 5-f, the depth map is a map containing depth information and has the same size as the color map; in the depth map, darker parts are farther from the camera and lighter parts are closer.
Step S506, determining a first segmentation parameter of each pixel on the mask map according to the color map and the mask map, and determining a second segmentation parameter of each pixel on the mask map according to the depth map and the mask map, wherein the first segmentation parameter and the second segmentation parameter are used for representing the probability that a pixel is judged as a foreground point or a background point and the numerical difference between the pixel and an adjacent pixel; fusing the first segmentation parameters with the second segmentation parameters;
for any pixel on the mask image, the segmentation parameters of the pixel comprise region term segmentation parameters and boundary term segmentation parameters; the region item segmentation parameter of the pixel refers to the probability that the pixel is judged as a foreground point or a background point; the boundary term segmentation parameter of the pixel refers to the numerical difference between the pixel and the adjacent pixel;
wherein, a first area item segmentation parameter of each pixel on the mask image is determined according to the color image and the mask image:
performing Gaussian mixture model (GMM) calculation according to the EM (expectation-maximization) method, wherein the EM method comprises an E step and an M step; iteratively executing the E step and the M step, and stopping the iteration after the number of iterations reaches a specified number; taking the classification of the pixel obtained in the last M step as the classification of the pixel, and taking the maximum probability value P_max of the pixel belonging to a cluster, obtained in the last M step, as the first region item segmentation parameter of the pixel, wherein the first region item segmentation parameter is the probability that the pixel is judged to be a foreground point or a background point based on the color map;
wherein, the step E and the step M respectively comprise the following processing:
step E: clustering the pixels of the same type into one or more clusters according to the color values of the pixels on the mask map and the position relation among the pixels, and determining a GMM model of each cluster; wherein the classification of the pixels comprises foreground points or background points; the cluster classification comprises foreground point clusters or background point clusters;
step M: determining the probability of each pixel belonging to each cluster according to the GMM model of each cluster, and, for any pixel, determining the classification of the pixel according to the cluster corresponding to the maximum probability value P_max of the pixel;
wherein the color value of each pixel may be an RGB value;
wherein, according to the depth map and the mask map, determining a second region item segmentation parameter of each pixel on the mask map:
performing Gaussian mixture model (GMM) calculation according to the EM (expectation-maximization) method, wherein the EM method comprises an E step and an M step; iteratively executing the E step and the M step, and stopping the iteration after the number of iterations reaches a specified number; taking the classification of the pixel obtained in the last M step as the classification of the pixel, and taking the maximum probability value P_max of the pixel belonging to a cluster, obtained in the last M step, as the second region item segmentation parameter of the pixel, wherein the second region item segmentation parameter is the probability that the pixel is judged to be a foreground point or a background point based on the depth map;
wherein, the step E and the step M respectively comprise the following processing:
step E: clustering the pixels of the same type into one or more clusters according to the depth values of the pixels on the mask map and the position relation among the pixels, and determining a GMM model of each cluster; wherein the classification of the pixels comprises foreground points or background points; the cluster classification comprises foreground point clusters or background point clusters;
step M: determining the probability of each pixel belonging to each cluster according to the GMM model of each cluster, and, for any pixel, determining the classification of the pixel according to the cluster corresponding to the maximum probability value P_max of the pixel;
determining a first boundary term segmentation parameter of the pixel according to the color difference between the pixel and an adjacent pixel, wherein the determining comprises the following steps: accumulating the absolute values of the numerical difference values between the pixel and each adjacent pixel on the RGB three color channels for any pixel, and then normalizing the accumulated sum to obtain a normalized accumulated sum as a first boundary item segmentation parameter of the pixel;
the value ranges of the numerical values on the three RGB color channels are as follows: 0 to 255;
wherein, determining a second boundary item segmentation parameter of the pixel according to the depth value difference between the pixel and the adjacent pixel comprises: accumulating the absolute values of the depth value difference values between any one pixel and each adjacent pixel, and then normalizing the accumulated sum to obtain a normalized accumulated sum as a second boundary item segmentation parameter of the pixel;
wherein, the numerical range of the depth value may be: 0 to 255;
optionally, the pixels adjacent to a pixel may be the 8 pixels surrounding that pixel.
The fusing of the first region item segmentation parameters determined based on the color image and the second region item segmentation parameters determined based on the depth image comprises the following steps:
for any pixel, multiplying the first region item segmentation parameter by a weight (1-a) to obtain an adjusted first region item segmentation parameter, and multiplying the second region item segmentation parameter by the weight a to obtain an adjusted second region item segmentation parameter;
if the pixel classification indicated by the first region item segmentation parameter is the same as the pixel classification indicated by the second region item segmentation parameter, taking the sum of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as a fused region item segmentation parameter;
if the pixel classification indicated by the first region item segmentation parameter is different from the pixel classification indicated by the second region item segmentation parameter, taking the pixel classification indicated by the larger value of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as the final classification of the pixel, and taking the absolute value of the difference value of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as the fused region item segmentation parameter;
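The region item fusion rule above can be sketched as follows. The function name, the string labels and the tie-breaking choice (the color-based classification wins when the two adjusted values are equal) are assumptions; the weight `a` is taken as given here.

```python
def fuse_region(p_color, label_color, p_depth, label_depth, a):
    """Fuse color-based and depth-based region item parameters of one pixel.

    p_color, p_depth: probabilities from the color map and the depth map.
    label_color, label_depth: classifications they indicate ("fg" or "bg").
    Returns (final classification, fused region item parameter).
    """
    adjusted_color = (1 - a) * p_color
    adjusted_depth = a * p_depth
    if label_color == label_depth:
        # Both maps agree: add the adjusted parameters.
        return label_color, adjusted_color + adjusted_depth
    # The maps disagree: the larger adjusted value decides the class
    # (ties go to the color map, which is an assumption), and the fused
    # parameter is the absolute difference.
    label = label_color if adjusted_color >= adjusted_depth else label_depth
    return label, abs(adjusted_color - adjusted_depth)
```

The disagreement branch lowers the fused probability, so conflicting evidence from the two maps leaves the pixel less strongly tied to either class.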
the fusing of the first boundary item segmentation parameter determined based on the color image and the second boundary item segmentation parameter determined based on the depth image comprises the following steps:
multiplying the first boundary item segmentation parameter by a weight (1-a) to obtain an adjusted first boundary item segmentation parameter, multiplying the second boundary item segmentation parameter by a weight a to obtain an adjusted second boundary item segmentation parameter, and then adding the adjusted first boundary item segmentation parameter and the adjusted second boundary item segmentation parameter to obtain a fused boundary item segmentation parameter of the pixel; a is greater than or equal to 0 and less than or equal to 1;
wherein the weight a is determined according to a self-evaluation parameter k1 and a consistency parameter k2: taking the product of the self-evaluation parameter k1 and the consistency parameter k2 as the weight a;
the self-evaluation parameter k1 is determined in the following manner: determining, according to the depth value of the pixel, how near or far the shooting distance corresponding to the pixel is, and setting the self-evaluation parameter k1 accordingly, wherein the shorter the shooting distance, the larger the self-evaluation parameter k1 is set; k1 is greater than or equal to 0 and less than or equal to 1;
wherein the consistency parameter k2 is determined in the following manner:
setting a consistency parameter k2 as a first constant if the first boundary term segmentation parameter is equal to the second boundary term segmentation parameter;
if the first boundary item segmentation parameter and the second boundary item segmentation parameter are not equal, setting the consistency parameter k2 as the first constant when both parameters are greater than a threshold or both are less than the threshold; setting the consistency parameter k2 as a second constant when the two parameters are neither both greater than nor both less than the threshold; the first constant is greater than the second constant; the first constant is greater than 0 and less than or equal to 1, and the second constant is greater than 0 and less than 1;
step S507, constructing an undirected graph, mapping the fused segmentation parameters of each pixel in the mask map into the undirected graph, and processing the undirected graph according to a min-cut/max-flow algorithm to obtain a finely segmented mask map;
wherein, the undirected graph is shown in fig. 5-g, and two suspension points Q_0 and Q_1 are arranged outside the plane of the undirected graph, the suspension point Q_0 being a virtual foreground point and the suspension point Q_1 being a virtual background point; a mapping point is established on the plane of the undirected graph for each pixel on the mask map; a connecting line is established between the mapping point of each foreground point and the suspension point Q_0, and between the mapping point of each background point and the suspension point Q_1;
for any pixel P_i in the mask map, the fused region item segmentation parameter of the pixel P_i is used as the weight of its mapping point P_i' in the undirected graph, and the fused boundary item segmentation parameter of the pixel P_i is used as the weight of the connecting line between the mapping point P_i' and the suspension point Q_0 or Q_1.
Running the min-cut/max-flow algorithm on the undirected graph to obtain the finely segmented mask map comprises the following steps:
iteratively executing the following steps C and D, stopping the iterative process after the iterative operation reaches a convergence condition, and taking each pixel in the foreground point set Q as a foreground point in a mask image after fine segmentation;
wherein, step C and step D include the following treatment respectively:
step C: classifying a part of the pixels in the undirected graph as foreground points of the same class as the suspension point Q_0, the pixels classified as foreground points forming the foreground point set Q;
step D: calculating the weight sum of the foreground point set Q, wherein the weight sum is the sum of the weights of all foreground points in the foreground point set Q plus the sum of the weights of the connecting lines between all foreground points in the foreground point set Q and the suspension point Q_0;
and the convergence condition is that the sum of the weights of the foreground point set Q is smaller than a threshold value and the change tends to be stable.
The finely segmented mask map is shown in fig. 5-h; the boundary between foreground points and background points in the finely segmented mask map is clearer and finer than in the original mask map.
Step S508, segmenting an image corresponding to the foreground point in the mask map after the fine segmentation from the color map.
Wherein the target object is segmented from the original color map based on the finely segmented mask map, the segmented "stapler" image being shown in fig. 5-i.
This application example determines a first rectangular adjacent area, obtained by expanding the smearing track, as the marking area and marks all pixels in the marking area as foreground points, which automatically expands the number of foreground points marked in the input mask map of the image segmentation algorithm; it determines a second rectangular adjacent area, obtained by expanding the smearing track, as the region of interest and marks the pixels in the region of interest that lie outside the marking area as background points, which reduces the number of background points marked in the input mask map of the image segmentation algorithm. On the other hand, the technical solution of the present application example can calculate the segmentation parameters of each pixel based on the depth map and the color map respectively and perform parameter fusion, and perform image segmentation using the fused segmentation parameters, so that the image segmentation effect can be improved compared with the image segmentation performed only by using the color map in the related art.
Application example 2
In this application example, the method for realizing interactive image segmentation comprises the following steps:
step S601, detecting that a user selects to mark a target object in a smearing mode;
for example, two keys for marking are provided on the interface, one is "smearing", and the other is "outlining", and if the user clicks the "smearing" key, the smearing trajectory is preprocessed.
Step S602, detecting that a user paints on an original image;
for example, as shown in fig. 5-a, the user has painted on the original image, and the target object is a "stapler";
step S603, acquiring edge gradient information of an original image, selecting partial pixels from the pixels of the smearing track as seeds, enabling each seed to respectively grow outwards to form a sub-region according to the edge gradient information of the image, combining the sub-regions formed by all the seeds growing outwards to form an extended region, and enabling the extended region to serve as a mark region; applying a geometric shape template according to the shape characteristics of the smearing track, and generating an interested area containing the smearing track according to the geometric shape template; if the region of interest does not contain the marker region, expanding the region of interest to contain the marker region;
wherein, as shown in fig. 6-a, a seed growing method can be adopted to grow a marking region from the smearing track; applying a rectangular template to generate an interested area containing the smearing track;
wherein, each seed grows outwards to form a sub-region according to the edge gradient information of the image, and the following steps A-F are carried out on each seed:
step A: taking the seeds as a starting point, setting an energy value for the starting point, and marking the starting point as an active point;
step B: judging whether a pixel marked as an active point exists; if so, executing step C, otherwise executing step F;
step C: for any pixel A marked as an active point, judging whether an unchecked pixel B exists among the four adjacent points above, below, to the left of and to the right of the pixel A; if so, executing step D, otherwise executing step E;
step D: for any pixel B which is not checked, if the pixel B meets the growth condition, marking the pixel B as a new active point, setting the energy value of the pixel B as a difference value obtained by subtracting the edge gradient value of the pixel B from the energy value of the pixel A, and returning to the step C; if the pixel B does not meet the growth condition, returning to the step C; wherein the growth conditions are: the edge gradient value of the pixel B is less than or equal to the energy value of the pixel A;
step E: judging that the pixel A finishes growing, removing the active point mark of the pixel A, classifying the pixel A into a foreground point set, and returning to the step B;
step F: forming a sub-region from all pixels in the foreground point set, the sub-region being the region formed by the seed growing outwards according to the edge gradient information of the image;
the growth of a seed is described as follows:
as shown in fig. 6-b-1, in the edge gradient map, "steep 2" represents that the edge gradient value of the pixel is 2, the gradient value is large and belongs to a steep class, and "gentle 1" represents that the edge gradient value of the pixel is 1, the gradient value is small and belongs to a gentle class.
The seed is taken as the starting point O and marked as an active point (active points are indicated by a mark in the figure); the energy value of the starting point O may be set to 4. The energy value set for the starting point O affects the size of the grown region: the larger the energy value, the larger the grown region.
As shown in fig. 6-b-2, when the starting point O grows to the right, the edge gradient value of the right adjacent pixel is 1, which satisfies the growth condition, so the first pixel to the right of the starting point O is marked as a new active point, and its energy value is the difference (3) obtained by subtracting the edge gradient value (1) of that adjacent pixel from the energy value (4) of the starting point O. In the same way, the adjacent pixels above, below, to the left of and to the right of the starting point O are checked one by one, and each adjacent pixel that satisfies the growth condition becomes a new active point; here all four adjacent pixels of the starting point O satisfy the growth condition and are marked as new active points. After the starting point O has been checked, its active point mark is removed.
Each newly marked active point is then expanded outwards by the same growth method as the starting point O. A schematic of the sub-region after all active points have stopped growing is shown in fig. 6-b-3.
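Steps A to F of the seed growth can be sketched as follows. The function name and the grid layout are hypothetical, and two details are assumptions: active points are processed in first-in, first-out order, and a pixel counts as "checked" once it has been accepted as an active point, so a pixel rejected from one neighbor may still be reached from another.

```python
from collections import deque


def grow_from_seed(gradient, seed, start_energy):
    """Grow one sub-region from a seed using edge gradient values.

    gradient: 2-D list of edge gradient values; seed: (row, col).
    Returns the set of pixels in the grown sub-region.
    """
    height, width = len(gradient), len(gradient[0])
    energy = {seed: start_energy}      # step A: seed gets the start energy
    active = deque([seed])             # step A: seed is the first active point
    foreground = set()
    while active:                      # step B: any active point left?
        a = active.popleft()
        row, col = a
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # step C
            b = (row + dr, col + dc)
            if not (0 <= b[0] < height and 0 <= b[1] < width):
                continue
            if b in energy:            # already accepted (assumed "checked")
                continue
            g = gradient[b[0]][b[1]]
            if g <= energy[a]:         # step D: growth condition
                energy[b] = energy[a] - g
                active.append(b)
        foreground.add(a)              # step E: A has finished growing
    return foreground                  # step F: the sub-region
```

With a start energy of 4 and a neighbor whose gradient is 1, the neighbor receives energy 3, matching the worked example above; the marking area would be the union of the sub-regions grown from all seeds.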
Step S604, generating an input mask map of an image segmentation algorithm (GrabCut algorithm): all pixels in the mark region are used as foreground points in the mask map, and pixels in the region of interest outside the mark region are used as background points in the mask map.
For example, as shown in fig. 6-c, the irregular dark region containing the smearing track is the mark region, which forms the foreground point block in the mask map (input mask map); the edge of the region of interest is shown as a dashed box, and the part inside the dashed box that excludes the mark region (the foreground point block) is the background point block in the mask map.
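The mask generation of step S604 can be illustrated with a short Python sketch. The label values, the function name `build_input_mask` and the `UNKNOWN` label for pixels outside the region of interest are assumptions made for illustration; the text only specifies which pixels become foreground points and which become background points.

```python
FOREGROUND, BACKGROUND, UNKNOWN = 1, 0, -1   # hypothetical label values

def build_input_mask(height, width, mark_region, roi):
    """Build an input mask map for the segmentation algorithm.

    mark_region: set of (row, col) pixels in the mark region.
    roi: (top, left, bottom, right) of the rectangular region of interest,
         with inclusive bounds.
    Pixels outside the region of interest are left as UNKNOWN (an assumption).
    """
    top, left, bottom, right = roi
    mask = [[UNKNOWN] * width for _ in range(height)]
    for r in range(max(top, 0), min(bottom, height - 1) + 1):
        for c in range(max(left, 0), min(right, width - 1) + 1):
            mask[r][c] = FOREGROUND if (r, c) in mark_region else BACKGROUND
    return mask

mask = build_input_mask(4, 5, mark_region={(1, 1), (1, 2)}, roi=(0, 0, 2, 3))
```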
Step S605, a color image and a depth image containing depth information of a target object are obtained;
wherein the original image smeared by the user is the color map; the depth map is an image containing depth information and has the same size as the color map; in the depth map, darker parts correspond to content shot from farther away, and lighter parts correspond to content shot from closer.
Step S606, determining a first segmentation parameter of each pixel on the mask image according to the color image and the mask image, and determining a second segmentation parameter of each pixel on the mask image according to the depth image and the mask image, wherein the first segmentation parameter and the second segmentation parameter are used for representing the probability that a pixel is judged as a foreground point or a background point and the numerical difference between the pixel and an adjacent pixel; fusing the first segmentation parameters with the second segmentation parameters;
specifically, the method for calculating the first segmentation parameter and the second segmentation parameter of each pixel on the mask map and fusing the two segmentation parameters is the same as the related method described in step S506 of application example 1;
step S607, constructing an undirected graph, mapping the fused segmentation parameters of each pixel in the mask map into the undirected graph, and processing the undirected graph according to a minimum cut-maximum flow algorithm to obtain a finely segmented mask map;
the specific method for constructing the undirected graph and mapping the segmentation parameters into it, and the method for processing the undirected graph according to the minimum cut-maximum flow algorithm, are the same as the related methods described in step S507 of application example 1;
compared with the initial mask map, the boundary between the foreground points and the background points in the finely segmented mask map is clearer and finer.
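For orientation, the following Python sketch shows a minimum cut-maximum flow computation on a tiny three-pixel graph. It uses the conventional graph-cut layout (region values on the links to the virtual points Q0 and Q1, boundary values on the links between neighbouring pixels) and the Edmonds-Karp method; the mapping of step S507 is not reproduced in this passage, so the layout, the function name `min_cut_source_side` and all capacity values are illustrative assumptions.

```python
from collections import deque

def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp maximum flow; returns the nodes on the source side of a minimum cut."""
    # residual[u][v] is the remaining capacity of the directed edge u -> v
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)   # reverse edge

    def reachable_with_parents():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_with_parents()
        if sink not in parent:
            return set(parent)          # no augmenting path is left
        # Find the bottleneck along the augmenting path, then push flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u

# Q0: virtual foreground point, Q1: virtual background point (illustrative values).
capacity = {
    "Q0": {"p0": 9, "p1": 6, "p2": 1},
    "p0": {"Q1": 1, "p1": 5},
    "p1": {"Q1": 4, "p0": 5, "p2": 1},
    "p2": {"Q1": 9, "p1": 1},
}
foreground = min_cut_source_side(capacity, "Q0", "Q1") - {"Q0"}
```

In this toy graph the pixels p0 and p1 stay connected to Q0 and are labeled foreground, while p2 is separated by the cut and falls to the background side.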
Step S608, segmenting an image corresponding to the foreground point in the mask map after the fine segmentation from the color map.
Wherein the target object "stapler" can be segmented from the original color map based on the finely segmented mask map.
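Step S608 amounts to keeping the color pixels whose mask label is a foreground point. A minimal Python sketch follows, assuming a foreground label of 1 and a black fill for all other pixels; both are illustrative choices and `extract_foreground` is a hypothetical name.

```python
def extract_foreground(color, mask, foreground=1, fill=(0, 0, 0)):
    """Keep colour pixels whose mask label is foreground; replace the rest with fill."""
    return [
        [pixel if label == foreground else fill
         for pixel, label in zip(color_row, mask_row)]
        for color_row, mask_row in zip(color, mask)
    ]

color = [[(10, 20, 30), (40, 50, 60)],
         [(70, 80, 90), (15, 25, 35)]]
fine_mask = [[1, 0],
             [0, 1]]
cutout = extract_foreground(color, fine_mask)
```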
In this application example, the adjacent region grown from the smearing track as the seed is determined as the mark region, and all pixels in the mark region are marked as foreground points, which automatically expands the number of foreground points marked in the input mask map of the image segmentation algorithm. The rectangular adjacent region of the smearing track is determined as the region of interest, and the pixels in the region of interest outside the mark region are marked as background points, which reduces the number of background points marked in the input mask map. Together, these steps reduce the number of iterations the image segmentation algorithm needs to distinguish foreground points from background points and significantly reduce its running time. In addition, the technical solution of this application example calculates the segmentation parameters of each pixel from the depth map and from the color map separately, fuses the two sets of parameters, and performs image segmentation using the fused segmentation parameters, so the segmentation effect can be improved compared with related-art image segmentation that uses only the color map.
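As a hedged illustration of how a per-pixel boundary item can be computed from the color map and from the depth map, the sketch below follows the rule stated in claim 1: accumulate the absolute differences between a pixel and each adjacent pixel, then normalize the sum. The 4-neighbourhood, the 8-bit value range and the normalization by the largest possible sum are assumptions; the function name `boundary_term` is hypothetical.

```python
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))   # assumed 4-neighbourhood

def boundary_term(image, r, c, max_value=255):
    """Normalized sum of absolute differences between a pixel and its neighbours.

    image[r][c] is a tuple of channel values: (R, G, B) for the colour map,
    or a 1-tuple (d,) for the depth map.
    """
    rows, cols = len(image), len(image[0])
    total, count = 0, 0
    for dr, dc in NEIGHBOURS:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            total += sum(abs(a - b) for a, b in zip(image[r][c], image[nr][nc]))
            count += 1
    channels = len(image[r][c])
    # Assumed normalization: divide by the largest sum the neighbours could give.
    return total / (max_value * channels * count) if count else 0.0

color_map = [[(0, 0, 0), (255, 255, 255)]]
depth_map = [[(10,), (10,)],
             [(10,), (10,)]]
first_boundary = boundary_term(color_map, 0, 0)    # from the colour map
second_boundary = boundary_term(depth_map, 0, 0)   # from the depth map
```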
Application example 3
In this application example, the user outlines a target object of interest on the original image, and the image is preprocessed by the following method:
step S701, detecting that a user selects to mark a target object in a sketching mode;
for example, two keys for marking are provided on the interface, one is "smearing" and the other is "outlining", and if the user clicks the "outlining" key, the outlining trajectory is preprocessed.
Step S702, detecting that a user outlines on an original image;
for example, as shown in fig. 7-a, the user has outlined on the original image, and the target object is a "stapler";
step S703, when the delineation track is closed, determining the area enclosed by the delineation track as the mark area; when the delineation track is not closed, closing the delineation track, and if the closing succeeds, determining the area enclosed by the closed delineation track as the mark area, or if the closing fails, performing expansion processing on the delineation track and determining the expanded area as the mark area; applying a geometric shape template according to the shape characteristics of the delineation track, and generating a region of interest containing the delineation track according to the geometric shape template; if the region of interest does not contain the mark area, expanding the region of interest to contain the mark area;
as shown in fig. 7-b, if the distance between the starting point and the end point of the delineation track is smaller than a threshold, a line segment is drawn between the starting point and the end point to close the delineation track, and the area enclosed by the closed delineation track is determined as the mark area; a rectangular template is used to generate the region of interest containing the delineation track;
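The closing test and the rectangular template can be sketched in Python as follows. This is a sketch under stated assumptions: the track is taken to be an ordered list of (x, y) points, the threshold value and the optional margin are free parameters, and the function names `close_track` and `bounding_rectangle` are hypothetical. The fallback expansion processing for a track that cannot be closed is not shown.

```python
import math

def close_track(points, threshold):
    """Close a delineation track if its end points are nearer than threshold.

    Returns the closed list of points, or None when closing fails.
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    if math.hypot(x1 - x0, y1 - y0) < threshold:
        # Connect the end point back to the starting point with a line segment.
        return points if points[0] == points[-1] else points + [points[0]]
    return None

def bounding_rectangle(points, margin=0):
    """Rectangular region of interest (x_min, y_min, x_max, y_max) around the track."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)

track = [(0, 0), (10, 0), (10, 10), (1, 1)]
closed = close_track(track, threshold=5)
not_closed = close_track(track, threshold=1)
roi = bounding_rectangle(track, margin=2)
```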
step S704, generating an input mask map of the image segmentation algorithm (GrabCut algorithm): all pixels in the mark area are used as foreground points in the input mask map of the GrabCut algorithm, and pixels in the region of interest outside the mark area are marked as background points in the input mask map of the GrabCut algorithm;
for example, as shown in fig. 7-c, the irregular dark area bounded by the delineation track and the closing line segment is the mark area, which forms the foreground point block in the mask map (input mask map); the edge of the region of interest is shown as a dashed box, and the part inside the dashed box that excludes the mark area (the foreground point block) is the background point block in the mask map;
step S705, a color image and a depth image containing depth information of a target object are obtained;
wherein the original image outlined by the user is the color map; the depth map is an image containing depth information and has the same size as the color map; in the depth map, darker parts correspond to content shot from farther away, and lighter parts correspond to content shot from closer.
Step S706, determining a first segmentation parameter of each pixel on the mask map according to the color map and the mask map, and determining a second segmentation parameter of each pixel on the mask map according to the depth map and the mask map, wherein the first segmentation parameter and the second segmentation parameter are used for representing the probability that a pixel is judged as a foreground point or a background point and the numerical difference between the pixel and an adjacent pixel; fusing the first segmentation parameters with the second segmentation parameters;
specifically, the method for calculating the first segmentation parameter and the second segmentation parameter of each pixel on the mask map and fusing the two segmentation parameters is the same as the related method described in step S506 of application example 1;
step S707, constructing an undirected graph, mapping the fused segmentation parameters of each pixel in the mask map into the undirected graph, and processing the undirected graph according to a minimum cut-maximum flow algorithm to obtain a finely segmented mask map;
the specific method for constructing the undirected graph and mapping the segmentation parameters into it, and the method for processing the undirected graph according to the minimum cut-maximum flow algorithm, are the same as the related methods described in step S507 of application example 1;
compared with the initial mask map, the boundary between the foreground points and the background points in the finely segmented mask map is clearer and finer.
Step S708, segmenting an image corresponding to the foreground point in the mask map after the fine segmentation from the color map.
Wherein the target object "stapler" can be segmented from the original color map based on the finely segmented mask map.
In this application example, the area enclosed by the closed delineation track is determined as the mark area, and all pixels in the mark area are marked as foreground points, which automatically expands the number of foreground points marked in the input mask map of the image segmentation algorithm. The rectangular adjacent region of the delineation track is determined as the region of interest, and the pixels in the region of interest outside the mark area are marked as background points, which reduces the number of background points marked in the input mask map. Together, these steps reduce the number of iterations the image segmentation algorithm needs to distinguish foreground points from background points and significantly reduce its running time. In addition, the technical solution of this application example calculates the segmentation parameters of each pixel from the depth map and from the color map separately, fuses the two sets of parameters, and performs image segmentation using the fused segmentation parameters, so the segmentation effect can be improved compared with related-art image segmentation that uses only the color map.
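The parameter fusion mentioned above can be illustrated with the rules set out in claims 4 and 5: the color-based value is weighted by 1-a, the depth-based value by a, and the region items are combined differently depending on whether the two classifications agree. The sketch below is illustrative only; the function names are hypothetical, and the tie-breaking choice when the two adjusted values are equal (the color-based class wins) is an assumption, since the claims do not cover that case.

```python
def fuse_region_terms(p_color, class_color, p_depth, class_depth, a):
    """Fuse the colour-based and depth-based region items of one pixel.

    Returns (fused region item, final classification).
    """
    adj_color = (1 - a) * p_color   # adjusted first region item
    adj_depth = a * p_depth         # adjusted second region item
    if class_color == class_depth:
        return adj_color + adj_depth, class_color
    # Classifications differ: the larger adjusted value decides the class
    # (ties go to the colour-based class, which is an assumption).
    final = class_color if adj_color >= adj_depth else class_depth
    return abs(adj_color - adj_depth), final

def fuse_boundary_terms(b_color, b_depth, a):
    """Weighted sum of the colour-based and depth-based boundary items."""
    return (1 - a) * b_color + a * b_depth

agree = fuse_region_terms(0.8, "foreground", 0.6, "foreground", a=0.25)
differ = fuse_region_terms(0.8, "foreground", 0.9, "background", a=0.5)
boundary = fuse_boundary_terms(0.2, 0.6, a=0.5)
```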
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method of implementing interactive image segmentation, comprising:
after detecting a smearing track or a delineating track on an original image, determining a first adjacent area of the smearing track or the delineating track as a marking area, and determining a second adjacent area of the smearing track or the delineating track as an area of interest, wherein the area of interest comprises the marking area; generating an input mask map for an image segmentation algorithm: all pixels in the marked area are used as foreground points in the mask image, and pixels outside the marked area in the interested area are used as background points in the mask image;
the method comprises the steps of obtaining a color image containing color information of a target object and a depth image containing depth information of the target object, determining a first segmentation parameter of each pixel on a mask image according to the color image and the mask image, and determining a second segmentation parameter of each pixel on the mask image according to the depth image and the mask image, wherein the first segmentation parameter and the second segmentation parameter are used for representing the probability that a pixel is judged as a foreground point or a background point and the numerical difference between the pixel and an adjacent pixel; fusing the first segmentation parameters with the second segmentation parameters, wherein determining the first segmentation parameters of each pixel on the mask map according to the color map and the mask map comprises: determining a first boundary item segmentation parameter of each pixel on the mask image according to the color image and the mask image:
determining a first boundary item segmentation parameter of the pixel according to the color difference between the pixel and an adjacent pixel;
accumulating the absolute values of the numerical difference values between the pixel and each adjacent pixel on the RGB three color channels for any pixel, and then normalizing the accumulated sum to obtain a normalized accumulated sum as a first boundary item segmentation parameter of the pixel;
determining second segmentation parameters of each pixel on the mask map according to the depth map and the mask map, wherein the second segmentation parameters comprise: determining a second boundary term segmentation parameter of each pixel on the mask map according to the depth map and the mask map:
determining a second boundary item segmentation parameter of the pixel according to the depth value difference of the pixel and the adjacent pixel;
accumulating the absolute values of the depth value difference values between any one pixel and each adjacent pixel, and then normalizing the accumulated sum to obtain a normalized accumulated sum as a second boundary item segmentation parameter of the pixel;
constructing an undirected graph, mapping the fused segmentation parameters of each pixel in the mask map into the undirected graph, and processing the undirected graph according to a minimum cut-maximum flow algorithm to obtain a finely segmented mask map;
and segmenting an image corresponding to the foreground point in the mask image after fine segmentation from the color image.
2. The method of claim 1, wherein:
determining a first segmentation parameter for each pixel on the mask map according to the color map and the mask map, further comprising: determining a first area item segmentation parameter of each pixel on the mask image according to the color image and the mask image:
performing Gaussian Mixture Model (GMM) calculation according to an EM (expectation-maximization) method, wherein the EM method comprises an E step and an M step; iterating the E step and the M step, and stopping the iterative process after the iteration reaches a convergence condition; determining the classification of the pixel obtained by the last M step as the classification of the pixel, and determining the maximum probability value Pmax that the pixel belongs to a cluster, obtained by the last M step, as the first area item segmentation parameter of the pixel, wherein the first area item segmentation parameter is the probability that the pixel is judged to be a foreground point or a background point based on the color map;
wherein, the step E and the step M respectively comprise the following processing:
step E: clustering pixels of the same class into one or more clusters according to the color values of the pixels on the mask map and the positional relation among the pixels, and determining a GMM model of each cluster; wherein the classification of the pixels comprises foreground points or background points; the cluster classification comprises foreground point clusters or background point clusters;
step M: determining the probability that each pixel belongs to each cluster according to the GMM model of each cluster, and, for any pixel, determining the classification of the pixel according to the cluster corresponding to the maximum probability value Pmax of the pixel.
3. The method of claim 2, wherein:
determining a second segmentation parameter for each pixel on the mask map from the depth map and mask map, further comprising: determining a second region item segmentation parameter of each pixel on the mask map according to the depth map and the mask map:
performing Gaussian Mixture Model (GMM) calculation according to an EM (expectation-maximization) method, wherein the EM method comprises an E step and an M step; iterating the E step and the M step, and stopping the iterative process after the iteration reaches a convergence condition; determining the classification of the pixel obtained by the last M step as the classification of the pixel, and determining the maximum probability value Pmax that the pixel belongs to a cluster, obtained by the last M step, as the second region item segmentation parameter of the pixel, wherein the second region item segmentation parameter is the probability that the pixel is judged to be a foreground point or a background point based on the depth map;
wherein, the step E and the step M respectively comprise the following processing:
step E: clustering pixels of the same class into one or more clusters according to the depth values of the pixels on the mask map and the positional relation among the pixels, and determining a GMM model of each cluster; wherein the classification of the pixels comprises foreground points or background points; the cluster classification comprises foreground point clusters or background point clusters;
step M: determining the probability that each pixel belongs to each cluster according to the GMM model of each cluster, and, for any pixel, determining the classification of the pixel according to the cluster corresponding to the maximum probability value Pmax of the pixel.
4. The method of claim 3, wherein:
fusing the first segmentation parameters with the second segmentation parameters, including: fusing the first region item segmentation parameter with the second region item segmentation parameter:
for any pixel, multiplying the first area item segmentation parameter by a weight 1-a to obtain an adjusted first area item segmentation parameter, and multiplying the second area item segmentation parameter by a weight a to obtain an adjusted second area item segmentation parameter;
if the pixel classification indicated by the first region item segmentation parameter is the same as the pixel classification indicated by the second region item segmentation parameter, taking the sum of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as a fused region item segmentation parameter;
if the pixel classification indicated by the first region item segmentation parameter is different from the pixel classification indicated by the second region item segmentation parameter, taking the pixel classification indicated by the larger value of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as the final classification of the pixel, and taking the absolute value of the difference value of the adjusted first region item segmentation parameter and the adjusted second region item segmentation parameter as the fused region item segmentation parameter.
5. The method of claim 4, wherein:
fusing the first segmentation parameters with the second segmentation parameters, further comprising: fusing the first boundary item segmentation parameter with the second boundary item segmentation parameter:
multiplying the first boundary item segmentation parameter by a weight 1-a to obtain an adjusted first boundary item segmentation parameter, multiplying the second boundary item segmentation parameter by a weight a to obtain an adjusted second boundary item segmentation parameter, and then adding the adjusted first boundary item segmentation parameter and the adjusted second boundary item segmentation parameter to obtain a fused boundary item segmentation parameter of the pixel; a is greater than or equal to 0 and less than or equal to 1.
6. The method of claim 5, wherein:
the weight a is determined according to a self-evaluation parameter k1 and a consistency parameter k2: taking the product of the self-evaluation parameter k1 and the consistency parameter k2 as the weight a;
the self-evaluation parameter k1 is determined in the following manner: determining how near or far the shooting distance corresponding to the pixel is according to the depth value of the pixel, and setting the self-evaluation parameter k1 according to that shooting distance, wherein the shorter the shooting distance, the larger the self-evaluation parameter k1 is set; k1 is greater than or equal to 0 and less than or equal to 1;
wherein the consistency parameter k2 is determined in the following manner:
setting a consistency parameter k2 as a first constant if the first boundary term segmentation parameter is equal to the second boundary term segmentation parameter;
if the first boundary item partition parameter and the second boundary item partition parameter are not equal, setting a consistency parameter k2 as a first constant when the first boundary item partition parameter and the second boundary item partition parameter are simultaneously greater than a threshold or simultaneously less than a threshold; when the first boundary item segmentation parameter and the second boundary item segmentation parameter are not larger than a threshold value or not smaller than the threshold value at the same time, setting a consistency parameter k2 as a second constant; the first constant is greater than the second constant; the first constant is greater than 0 and less than or equal to 1, and the second constant is greater than 0 and less than 1.
7. The method of claim 1, wherein:
constructing an undirected graph and mapping the fused segmentation parameters of each pixel in the mask graph into the undirected graph, comprising:
constructing an undirected graph, and arranging two suspension points Q0 and Q1 outside the plane of the undirected graph, the suspension point Q0 being a virtual foreground point and the suspension point Q1 being a virtual background point; establishing, on the plane of the undirected graph, a mapping point for each pixel on the mask map, establishing a connecting line between the mapping point of each foreground point and the suspension point Q0, and establishing a connecting line between the mapping point of each background point and the suspension point Q1;
for any pixel Pi in the mask map, the fused region item segmentation parameter of the pixel Pi is used as the weight of the mapping point P'i in the undirected graph, and the fused boundary item segmentation parameter of the pixel Pi is used as the weight of the connecting line between the mapping point P'i and the suspension point Q0 or Q1.
8. The method of claim 7, wherein:
processing the undirected graph according to a minimum cut-maximum flow algorithm to obtain a finely divided mask graph, comprising:
iteratively executing the following steps C and D, stopping the iterative process after the iterative operation reaches a convergence condition, and taking each pixel in the foreground point set Q as a foreground point in a mask image after fine segmentation;
wherein, step C and step D include the following treatment respectively:
step C: classifying some of the pixels in the undirected graph as foreground points of the same class as the suspension point Q0, the pixels classified as foreground points forming a foreground point set Q;
step D: calculating the weight sum of the foreground point set Q, wherein the weight sum is the sum of the weights of all foreground points in the foreground point set Q plus the sum of the weights of the connecting lines between all foreground points in the foreground point set Q and the suspension point Q0;
and the convergence condition is that the sum of the weights of the foreground point set Q is smaller than a threshold value and the change tends to be stable.
9. An apparatus for enabling interactive image segmentation, comprising:
the device comprises a preprocessing module, a marking module and a judging module, wherein the preprocessing module is used for determining a first adjacent area of a smearing track or a delineating track as a marking area and determining a second adjacent area of the smearing track or the delineating track as an interesting area after the smearing track or the delineating track on an original image is detected, and the interesting area comprises the marking area; generating an input mask map for an image segmentation algorithm: all pixels in the marked area are used as foreground points in the mask image, and pixels outside the marked area in the interested area are used as background points in the mask image;
the segmentation parameter calculation and fusion module is used for acquiring a color image containing color information of a target object and a depth image containing depth information of the target object, determining a first segmentation parameter of each pixel on the mask image according to the color image and the mask image, and determining a second segmentation parameter of each pixel on the mask image according to the depth image and the mask image, wherein the first segmentation parameter and the second segmentation parameter are used for representing the probability that the pixel is judged as a foreground point or a background point and the numerical difference between the pixel and an adjacent pixel; fusing the first segmentation parameters with the second segmentation parameters, wherein determining the first segmentation parameters of each pixel on the mask map according to the color map and the mask map comprises: determining a first boundary item segmentation parameter of each pixel on the mask image according to the color image and the mask image:
determining a first boundary item segmentation parameter of the pixel according to the color difference between the pixel and an adjacent pixel;
accumulating the absolute values of the numerical difference values between the pixel and each adjacent pixel on the RGB three color channels for any pixel, and then normalizing the accumulated sum to obtain a normalized accumulated sum as a first boundary item segmentation parameter of the pixel;
determining second segmentation parameters of each pixel on the mask map according to the depth map and the mask map, wherein the second segmentation parameters comprise: determining a second boundary term segmentation parameter of each pixel on the mask map according to the depth map and the mask map:
determining a second boundary item segmentation parameter of the pixel according to the depth value difference of the pixel and the adjacent pixel;
accumulating the absolute values of the depth value difference values between any one pixel and each adjacent pixel, and then normalizing the accumulated sum to obtain a normalized accumulated sum as a second boundary item segmentation parameter of the pixel;
the mask map adjusting module is used for constructing an undirected graph, mapping the fused segmentation parameters of each pixel in the mask map into the undirected graph, and processing the undirected graph according to a minimum cut-maximum flow algorithm to obtain a finely segmented mask map;
and the output module is used for segmenting an image corresponding to the foreground point in the mask image after the fine segmentation from the color image.
10. A terminal comprising the apparatus for implementing interactive image segmentation as claimed in claim 9.
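By way of illustration only, the weighting rule of claim 6 (a = k1 × k2) can be sketched in Python as follows. The claims do not fix how k1 is derived from the depth value, the threshold, or the two constants, so the linear mapping of an 8-bit depth value (lighter means closer), the threshold of 0.5 and the constants 1.0 and 0.5 are all assumptions, and the function names are hypothetical.

```python
def self_evaluation(depth_value, max_depth=255):
    """k1 in [0, 1]; assumed linear in an 8-bit depth value where lighter means closer."""
    return max(0.0, min(1.0, depth_value / max_depth))

def consistency(b_color, b_depth, threshold=0.5, first=1.0, second=0.5):
    """k2: the first constant when the two boundary items agree, else the second.

    threshold, first and second are illustrative values (first > second).
    """
    if b_color == b_depth:
        return first
    same_side = (b_color > threshold and b_depth > threshold) or \
                (b_color < threshold and b_depth < threshold)
    return first if same_side else second

def fusion_weight(depth_value, b_color, b_depth):
    """Weight a applied to the depth-based parameters (1 - a goes to colour)."""
    return self_evaluation(depth_value) * consistency(b_color, b_depth)

near_consistent = fusion_weight(255, 0.8, 0.9)     # close pixel, items agree
near_inconsistent = fusion_weight(255, 0.2, 0.9)   # close pixel, items disagree
far_pixel = fusion_weight(0, 0.3, 0.3)             # farthest pixel
```

Under these assumptions a nearby pixel whose color-based and depth-based boundary items agree relies fully on depth, while a distant pixel falls back to the color-based parameters.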
CN201710005328.6A 2017-01-04 2017-01-04 Method, device and terminal for realizing interactive image segmentation Active CN106875398B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710005328.6A CN106875398B (en) 2017-01-04 2017-01-04 Method, device and terminal for realizing interactive image segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710005328.6A CN106875398B (en) 2017-01-04 2017-01-04 Method, device and terminal for realizing interactive image segmentation

Publications (2)

Publication Number Publication Date
CN106875398A CN106875398A (en) 2017-06-20
CN106875398B true CN106875398B (en) 2020-06-19

Family

ID=59165514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710005328.6A Active CN106875398B (en) 2017-01-04 2017-01-04 Method, device and terminal for realizing interactive image segmentation

Country Status (1)

Country Link
CN (1) CN106875398B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11164319B2 (en) 2018-12-20 2021-11-02 Smith & Nephew, Inc. Machine learning feature vector generator using depth image foreground attributes
US11109586B2 (en) * 2019-11-13 2021-09-07 Bird Control Group, Bv System and methods for automated wildlife detection, monitoring and control

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104820990A (en) * 2015-05-15 2015-08-05 北京理工大学 Interactive-type image-cutting system
CN104992445B (en) * 2015-07-20 2017-10-20 河北大学 A kind of automatic division method of CT images pulmonary parenchyma
CN105701799B (en) * 2015-12-31 2018-10-30 东软集团股份有限公司 Divide pulmonary vascular method and apparatus from lung's mask image

Also Published As

Publication number Publication date
CN106875398A (en) 2017-06-20

Similar Documents

Publication Publication Date Title
CN106846345B (en) Method, device and terminal for realizing interactive image segmentation
CN106886999B (en) Method, device and terminal for realizing interactive image segmentation
CN106651867B (en) Method, device and terminal for realizing interactive image segmentation
CN106898003B (en) Method, device and terminal for realizing interactive image segmentation
CN107145839B (en) Fingerprint image completion simulation method and system
CN106875399B (en) Method, device and terminal for realizing interactive image segmentation
CN106898005B (en) Method, device and terminal for realizing interactive image segmentation
CN105354838A (en) Method and terminal for acquiring depth information of weak texture region in image
CN106846323B (en) Method, device and terminal for realizing interactive image segmentation
CN106778887B (en) Terminal and method for determining sentence mark sequence based on conditional random field
CN106898004B (en) Preprocessing method, device and terminal for realizing interactive image segmentation
CN106780516B (en) Method, device and terminal for realizing interactive image segmentation
CN106791119B (en) Photo processing method and device and terminal
CN106875398B (en) Method, device and terminal for realizing interactive image segmentation
CN106875397B (en) Method, device and terminal for realizing interactive image segmentation
CN106898002B (en) Method, device and terminal for realizing interactive image segmentation
CN106887009B (en) Method, device and terminal for realizing interactive image segmentation
CN105554285B (en) Processing method for taking person photo and intelligent mobile terminal
CN106873981B (en) Icon processing method and device and terminal
CN106887007B (en) Method, device and terminal for realizing interactive image segmentation
CN106780517B (en) Method, device and terminal for realizing interactive image segmentation
CN106846333B (en) Method, device and terminal for realizing interactive image segmentation
CN106898006B (en) Preprocessing method, device and terminal for realizing interactive image segmentation
CN106887008B (en) Method, device and terminal for realizing interactive image segmentation
CN106843649B (en) Icon processing method and device and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TA01 Transfer of patent application right

Effective date of registration: 20200529

Address after: Room 601, floor 6, building 1, Lane ditch 1, Haidian District, Beijing 100000

Applicant after: BEIJING SHUKE WANGWEI TECHNOLOGY Co.,Ltd.

Address before: 518000 Guangdong Province, Shenzhen high tech Zone of Nanshan District City, No. 9018 North Central Avenue's innovation building A, 6-8 layer, 10-11 layer, B layer, C District 6-10 District 6 floor

Applicant before: NUBIA TECHNOLOGY Co.,Ltd.