CN105046689B - An interactive stereo image fast segmentation method based on a multi-level graph structure
- Publication number: CN105046689B (application CN201510354774.9A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20228—Disparity calculation for image-based rendering
Abstract
An interactive stereo image fast segmentation method based on a multi-level graph structure. A stereo image pair is input and a disparity map is first obtained by a stereo matching algorithm. Part of the foreground and background is specified in either the left or right original image. From the specified parts, prior statistical models of the foreground and background color and disparity distributions are built with CUDA parallel computing. A coarse, smaller-scale image is obtained by Gaussian filtering and down-sampling the original image, and the coarse image is combined with the original image to form a multi-level graph structure. In view of the complex segmentation models and low computational efficiency of current stereo image segmentation, the invention explores a new segmentation method within the theoretical framework of disparity-map-based synchronous stereo segmentation. It strives to simplify the complexity of the model, process the computation-intensive tasks in parallel, increase the stereo image segmentation speed, and achieve real-time segmentation of common-size stereo images.
Description
Technical Field
The invention belongs to the intersection of image processing, computer graphics, computer vision and related fields, and relates to an interactive stereo image fast segmentation method based on a multi-level graph structure.
Background
In recent years 3D technology has developed rapidly; from 3D television to 3D film, there is an urgent need for 3D content creation and 3D editing tools. Interactive stereo image segmentation is one of the important tasks and a key processing step for many applications, such as object recognition, tracking, image classification, image editing and image reconstruction. At present, stereo image segmentation is applied in practice to the segmentation and analysis of organs in medical images, object tracking, scene understanding and similar tasks. The efficiency of stereo image segmentation is therefore an important research direction.
Compared with single-image segmentation, intelligent interactive stereo image segmentation started late. Current segmentation methods face two main challenges: accuracy and speed. The two are in tension, and it is difficult to strike a good balance between them. Much effort has gone into improving accuracy. Price et al., in "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs" (ICCV 2011), improve the accuracy of stereo image segmentation by exploiting the disparity information between the stereo pair. The method integrates the color, gradient and disparity of each pixel into the traditional graph cut framework and obtains a boundary-optimized stereo segmentation by solving a maximum flow. Although the segmentation is accurate, the constructed model contains a huge number of edges and nodes, so the computation is complex and the efficiency low. Current segmentation algorithms improve speed by changing the specific implementation of the graph cut algorithm; given the large number of pixels and the complex edge structure of a stereo image, changing only the implementation cannot solve the problem fundamentally. Meanwhile, stereo image segmentation contains many computation-intensive, single-instruction multiple-data tasks. Traditional methods do not exploit the fact that these tasks can be executed in parallel; processing them serially consumes a great deal of time and keeps the segmentation efficiency low.
Disclosure of Invention
Aiming at the problems of complex segmentation models and low computational efficiency in current stereo image segmentation, the invention explores a new segmentation method within the theoretical framework of disparity-map-based synchronous stereo segmentation. The complexity of the model is reduced, the computation-intensive tasks are processed in parallel, the stereo image segmentation speed is increased, and the goal of segmenting common-size stereo images in real time is achieved.
To achieve this aim, the technical scheme of the invention is as follows. First, a stereo image pair is input and a disparity map is obtained by a stereo matching algorithm. Part of the foreground and background is specified in either the left or right original image. According to the specified parts, prior statistical models of the foreground and background color and disparity distributions are established with CUDA parallel computing. A coarse, smaller-scale image is obtained by Gaussian filtering and down-sampling the original image, and the coarse image and the original image together form a multi-level graph structure. On this basis, the color, gradient and disparity constraints in the multi-level graph structure are formalized within the graph cut framework and an energy function is constructed. To improve efficiency, the graph construction is also processed with CUDA parallel computing. The global optimization result of the multi-level graph is solved with a max-flow/min-cut algorithm. Boundary pixels with larger errors are then collected, and these boundary pixels are locally optimized with the traditional graph cut formulation. The results of the global processing and the local optimization are fused to form the final segmentation result. If the result is not yet ideal, the user can continue to mark the erroneous regions in the image until an ideal result is obtained.
Compared with the prior art, the invention has the following advantages. By constructing a stereo image segmentation model based on a multi-level graph structure, the edge complexity is reduced and the processing speed is markedly improved. At the same time, several computation-intensive single-instruction multiple-data tasks are processed in parallel with CUDA, saving a large amount of time. Experiments show that, for the same amount of user interaction, the method significantly increases the segmentation speed while the segmentation accuracy and consistency change only slightly.
Drawings
FIG. 1 is a flow chart of a method according to the present invention;
FIG. 2 shows experimental results of an application example of the present invention: (a) and (b) are the input left and right images; (c) and (d) are the segmentation results of the method in "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs"; (e) and (f) are the segmentation results of the invention. The user input for the two methods is shown in (c) and (e), where the first line marks the foreground and the second line marks the background. The segmentation accuracy and segmentation time of both methods are also given. The laptop used for the test in this embodiment is configured as follows: CPU, Intel(R) Pentium(R) CPU B950 @ 2.10 GHz; GPU, NVIDIA GeForce GT 540M.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
The process of the invention is shown in fig. 1, and specifically comprises the following steps:
step one, matching a stereo image.
A stereo image pair I = {I_l, I_r} is read in, where I_l and I_r denote the left and right images respectively. The disparity maps of the left and right images, denoted D_l and D_r, are computed by a stereo matching algorithm, namely the algorithm proposed by Felzenszwalb et al. in "Efficient Belief Propagation for Early Vision" (CVPR 2004).
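The following sketch illustrates step one in Python. The patent relies on the belief-propagation matcher of Felzenszwalb et al.; OpenCV's block matcher is substituted here purely as a stand-in, and the file names are placeholders.

```python
# Sketch of step one: obtain disparity maps D_l and D_r for a stereo pair.
# The embodiment uses the belief-propagation matcher of Felzenszwalb et al. (CVPR 2004);
# OpenCV's StereoBM is used here only as an illustrative stand-in.
import cv2
import numpy as np

left = cv2.imread("left.png")    # placeholder file names
right = cv2.imread("right.png")
gray_l = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
gray_r = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)

matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
# StereoBM returns fixed-point disparities scaled by 16.
D_l = matcher.compute(gray_l, gray_r).astype(np.float32) / 16.0
# Right-image disparity via the usual flip-and-swap trick.
D_r = cv2.flip(matcher.compute(cv2.flip(gray_r, 1), cv2.flip(gray_l, 1)), 1).astype(np.float32) / 16.0
```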
Step two, adding foreground and background cues.
The user specifies part of the foreground and background in either image through the designed interface. The invention follows an approach similar to that of Price et al. in "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs" (ICCV 2011): foreground and background pixels are specified by drawing lines of different colors on the image with an input device such as a mouse, touch screen or stylus. As shown in fig. 2(e), the pixels covered by the first line belong to the foreground and the pixels covered by the second line belong to the background. The subsequent steps of the invention do not depend on the particular way the foreground and background pixels are specified; other methods can be used.
Step three, establishing color and disparity prior models of the foreground and background.
F denotes the foreground pixel set specified by the user and B the background pixel set. The foreground and background color and disparity priors can be expressed as GMMs, histograms, or sets of clusters. The invention adopts the cluster form, obtaining the clusters by statistics over the color and disparity of the corresponding pixel sets. To increase the processing speed, a CUDA-parallel Kmeans algorithm is used to cluster the color values and the disparity values of the pixels in F and B respectively. The color model is processed as follows: each thread processes one pixel, computes the distance from that pixel to all foreground and background clusters, selects the nearest one, and assigns the pixel to the corresponding cluster. This yields N_c foreground color clusters C_n^F and M_c background color clusters C_m^B, which represent the color distribution statistical models of the foreground and background. The disparity values of the pixels in F and B are clustered in the same way, yielding N_d foreground disparity clusters and M_d background disparity clusters, which represent the disparity distribution statistical models of the foreground and background. In this embodiment N_c = M_c = 64 and N_d = M_d = 16.
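The following Python sketch illustrates the clustering of step three. The patent performs the nearest-cluster assignment with one CUDA thread per pixel; here the same computation is written serially with numpy, the scribble data are random stand-ins, and the plain Lloyd iteration is only one possible K-means implementation.

```python
# Sketch of step three: build the foreground/background color and disparity priors
# as cluster centers. The argmin line is the per-pixel work that the CUDA kernel
# performs with one thread per pixel.
import numpy as np

def kmeans(samples, k, iters=10, seed=0):
    """Plain Lloyd K-means; samples is (n, d)."""
    rng = np.random.default_rng(seed)
    centers = samples[rng.choice(len(samples), k, replace=False)]
    for _ in range(iters):
        # distance of every sample to every center -> nearest-cluster assignment
        d2 = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = samples[labels == j].mean(0)
    return centers

# Random stand-ins for the colors/disparities of the user-scribbled F and B pixels.
rng = np.random.default_rng(1)
F_colors, B_colors = rng.random((500, 3)), rng.random((500, 3))
F_disp, B_disp = rng.random(500) * 64, rng.random(500) * 64

C_F = kmeans(F_colors.astype(np.float64), 64)                 # foreground color clusters C_n^F
C_B = kmeans(B_colors.astype(np.float64), 64)                 # background color clusters C_m^B
D_F = kmeans(F_disp.reshape(-1, 1).astype(np.float64), 16)    # foreground disparity clusters
D_B = kmeans(B_disp.reshape(-1, 1).astype(np.float64), 16)    # background disparity clusters
```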
Step four, global optimization based on a multi-level graph structure;
Because the foreground and the background each form relatively compact distributions in the image, pixels inside the foreground or inside the background differ little from one another, while pixels at the boundary differ strongly. Exploiting this property, all pixels in a neighborhood are represented by one representative pixel of that region. The representative pixels are obtained by Gaussian filtering and down-sampling, which yields a coarse image of smaller scale. The coarse image is fused with the original image to form a multi-level graph structure, and global processing is carried out on this multi-level model. The original stereo pair is written I = {I_l, I_r} and the coarse stereo pair I^τ = {I^{l,τ}, I^{r,τ}}, where I_l, I^{l,τ} and I_r, I^{r,τ} denote the left and right images respectively. The original and coarse stereo images are jointly represented as an undirected graph G = <V, E>, where V is the node set and E the edge set; each vertex of G corresponds to one pixel of I or I^τ. Interactive stereo image fast segmentation assigns each pixel p_i a label x_i under the constraint of the input strokes, with x_i ∈ {1, 0} representing foreground and background respectively. The edges of G comprise the edges connecting each pixel to the source and sink, the edges between adjacent pixels within an image, the edges between corresponding points of the stereo pair determined by the disparity map, and the edges between coarse-layer nodes and their parent and child nodes in the original image; p_i^τ denotes a coarse-layer pixel. Since the coarse layer is obtained by down-sampling the original layer, one p_i^τ represents the N_l*N_l region of pixels of I before sampling; in this embodiment N_l = 3.
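A minimal sketch of how the coarse layer and the parent-child correspondence of step four can be produced is given below; the Gaussian sigma and the choice of the block center as the representative pixel are assumptions of this sketch, not values fixed by the patent.

```python
# Sketch of the multi-level structure: Gaussian-filter the original image,
# down-sample by N_l = 3, and record which coarse pixel each original pixel belongs to.
import numpy as np
from scipy.ndimage import gaussian_filter

N_l = 3

def coarse_layer(img):
    """img: (H, W, 3) float array -> coarse image of size roughly (H//N_l, W//N_l, 3)."""
    smoothed = np.stack([gaussian_filter(img[..., c], sigma=1.0) for c in range(3)], axis=-1)
    # keep one representative pixel (the block center) per N_l x N_l block
    return smoothed[N_l // 2::N_l, N_l // 2::N_l]

def parent_of(y, x):
    """Parent-child correspondence: original pixel (y, x) -> coarse pixel index."""
    return y // N_l, x // N_l
```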
The stereo image fast segmentation problem based on the multi-level graph structure is formulated as the optimization of the following target energy function:

E(X) = w_{unary} \sum_{p_i^\tau \in I^\tau} E_{unary}(p_i^\tau) + w_{intra} \sum_{(p_i^\tau, p_j^\tau) \in N_{intra}} E_{intra}(p_i^\tau, p_j^\tau) + w_{inter} \sum_{(p_i^{l,\tau}, p_i^{r,\tau}) \in N_{inter}} E_{inter}(p_i^{l,\tau}, p_i^{r,\tau}) + w_{paternity} \sum_{(p_i^\tau, p_{i,j}) \in N_{paternity}} E_{paternity}(p_i^\tau, p_{i,j})    (1)

where E_{unary}(p_i^\tau) is the unary term (also called the data term), measuring how similar the color and disparity of a coarse-layer pixel are to the foreground and background color and disparity statistical models; the higher the similarity, the larger its value. E_{intra}(p_i^\tau, p_j^\tau) is the binary term within a coarse-layer image, reflecting the difference between each coarse-layer pixel and its four-neighborhood; N_{intra} denotes the set of all adjacency relations among pixels of the left and right coarse images. The larger the difference, the smaller this term, and by the principle of the graph cut algorithm the neighboring pixels then tend to take different labels. E_{inter}(p_i^{l,\tau}, p_i^{r,\tau}) is the binary term between the coarse images, defined by the matching of corresponding points; the better the match, the larger the term. N_{inter} denotes the set of correspondences between left and right coarse-layer pixels.
E_{paternity}(p_i^\tau, p_{i,j}) is the binary constraint between the coarse layer and the original image, representing the similarity of parent and child nodes: the smaller the difference between parent and child, the larger the value and the less likely the boundary passes between them. N_{paternity} denotes the set of parent-child correspondences. w_{unary}, w_{intra}, w_{inter} and w_{paternity} are weights balancing the energy terms; here w_{unary} = 1, w_{intra} = 4000, w_{inter} = 8000, w_{paternity} = 1000000.
(1) Defining unary constraint terms
The unary constraint term comprises a color unary term and a disparity unary term and is defined as:

E_{unary}(p_i^\tau) = w_c (1 - P_c(x_i^\tau \mid c_i^\tau)) + w_d (1 - P_d(x_i^\tau \mid d_i^\tau))    (2)

where P_c(x_i^\tau \mid c_i^\tau) is the probability that pixel p_i^\tau with color c_i^\tau takes the foreground or background label; since a higher probability should give a smaller energy, 1 - P_c is used as the color unary term. Likewise, P_d(x_i^\tau \mid d_i^\tau) is the probability that a pixel with disparity d_i^\tau takes the foreground or background label, and 1 - P_d is used as the disparity unary term. w_c and w_d are the weights of the color and disparity influence, with w_c + w_d = 1.
The invention represents the foreground and background color and disparity models as clusters: N_c foreground color clusters C_n^F, M_c background color clusters C_m^B, N_d foreground disparity clusters and M_d background disparity clusters. The computation of the unary term is given below.
The color unary term is computed with a CUDA-based parallel method. The color values of all pixels are transferred from the CPU to the GPU, where all pixels are processed in parallel: each thread handles one unmarked pixel, and the threads independently and simultaneously compute the distance from the pixel color to the cluster centers of the foreground and background color models and find the minimum distance. The minimum distance describes how similar the pixel color is to the foreground and background colors: the smaller the distance to the foreground or background color, the closer the color, and by graph cut theory the more the pixel tends to take the foreground or background label. After all threads finish, the per-pixel results are transferred back from the GPU to the CPU, where the detailed graph construction is carried out. The mathematical form of the color unary term is

1 - P_c(x_i^\tau \mid c_i^\tau) = \begin{cases} \frac{s_i^{\min}}{s_i^{\min} + t_i^{\min}}, & x_i^\tau = 1 \\ \frac{t_i^{\min}}{s_i^{\min} + t_i^{\min}}, & x_i^\tau = 0 \end{cases}    (3)

where s_i^{\min} and t_i^{\min} are the minimum distances from the color c_i^\tau of pixel p_i^\tau to the foreground and background color cluster centers respectively:

s_i^{\min} = \min_n \| c_i^\tau - C_n^F \|^2, \quad n = 1, \ldots, N_c
t_i^{\min} = \min_m \| c_i^\tau - C_m^B \|^2, \quad m = 1, \ldots, M_c
The disparity unary term is computed by the same procedure as the color unary term.
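The following numpy sketch evaluates the color unary term of formulas (2)-(3) for all coarse-layer pixels at once; it mirrors, serially, the per-thread distance computation described above, and the split of the normalized cost between the foreground and background labels follows the reconstruction of formula (3).

```python
# Sketch of the color unary term: minimum squared distance of each coarse pixel
# to the foreground / background color clusters, turned into normalized data costs.
import numpy as np

def color_unary(coarse_img, C_F, C_B):
    """coarse_img: (h, w, 3); C_F, C_B: (k, 3) cluster centers from step three.
    Returns (cost_fg, cost_bg), each (h, w): 1 - P_c for labels x = 1 and x = 0."""
    pix = coarse_img.reshape(-1, 1, 3)
    s_min = ((pix - C_F[None]) ** 2).sum(-1).min(1)   # distance to nearest foreground cluster
    t_min = ((pix - C_B[None]) ** 2).sum(-1).min(1)   # distance to nearest background cluster
    denom = s_min + t_min + 1e-12
    cost_fg = (s_min / denom).reshape(coarse_img.shape[:2])  # 1 - P_c(x=1 | c)
    cost_bg = (t_min / denom).reshape(coarse_img.shape[:2])  # 1 - P_c(x=0 | c)
    return cost_fg, cost_bg
```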
(2) defining intra-image binary constraint terms
The intra-image binary constraint term E_{intra}(p_i^\tau, p_j^\tau) comprises two factors describing the color change and the disparity change around a pixel, i.e. the color gradient and the disparity gradient:

E_{intra}(p_i^\tau, p_j^\tau) = f_c(p_i^\tau, p_j^\tau) \, f_d(p_i^\tau, p_j^\tau) \, | x_i^\tau - x_j^\tau |    (4)

where f_c(p_i^\tau, p_j^\tau) measures the color similarity between adjacent pixels: the closer the colors, the larger its value, and by the principle of the graph cut algorithm the less likely the boundary passes between the two pixels. f_d(p_i^\tau, p_j^\tau) measures the disparity similarity of pixel p_i^\tau relative to its neighbor p_j^\tau: the closer the disparities, the larger its value, and the less likely the two take different labels. To reduce errors introduced by the disparity, the disparity used in this term is the coarse-layer disparity obtained by Gaussian filtering and down-sampling the disparity information. The two factors are defined as

f_c(p_i^\tau, p_j^\tau) = \frac{1}{\| c_i^\tau - c_j^\tau \|^2 + 1}, \quad (p_i^\tau, p_j^\tau) \in N_{intra}    (5)

f_d(p_i^\tau, p_j^\tau) = \frac{1}{\| d_i^\tau - d_j^\tau \|^2 + 1}, \quad (p_i^\tau, p_j^\tau) \in N_{intra}    (6)
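A short sketch of the intra-image term is given below: it computes the weight f_c * f_d of formulas (5)-(6) for every horizontal and vertical neighbor pair of the coarse layer; restricting the enumeration to right and bottom neighbors is just a compact way of visiting each four-neighborhood edge once.

```python
# Sketch of the intra-image binary term (formulas (4)-(6)): the n-link weight between
# a coarse pixel and its right/bottom neighbor is f_c * f_d, which is large when
# neighboring colors and disparities are similar, so the cut avoids such edges.
import numpy as np

def intra_weights(coarse_img, coarse_disp):
    """coarse_img: (h, w, 3); coarse_disp: (h, w).
    Returns horizontal and vertical neighbor weights f_c * f_d."""
    def weights(axis):
        dc = np.diff(coarse_img, axis=axis)        # color difference to the next pixel
        dd = np.diff(coarse_disp, axis=axis)       # disparity difference to the next pixel
        f_c = 1.0 / ((dc ** 2).sum(-1) + 1.0)      # formula (5)
        f_d = 1.0 / (dd ** 2 + 1.0)                # formula (6)
        return f_c * f_d
    return weights(axis=1), weights(axis=0)        # horizontal, vertical n-links
```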
(3) defining inter-image binary constraint terms
The inter-image binary term constrains corresponding pixels of the two images to take the same label and is defined as

E_{inter}(p_i^{l,\tau}, p_i^{r,\tau}) = \frac{C(p_i^{l,\tau}, p_i^{r,\tau}) + C(p_i^{r,\tau}, p_i^{l,\tau})}{2} \, | x_i^{l,\tau} - x_j^{r,\tau} |    (7)

where C is the probability that p_i^{l,\tau} and p_i^{r,\tau} are corresponding points in the stereo image; it is an asymmetric function:

C(p_i^{l,\tau}, p_i^{r,\tau}) = P(x_i^{l,\tau} \mid M(p_i^{l,\tau}) = p_j^{r,\tau}, x_j^{r,\tau}) \, P(M(p_i^{l,\tau}) = p_j^{r,\tau})    (8)

P(M(p_i^{l,\tau}) = p_j^{r,\tau}) is the probability distribution function of corresponding points, determined from the disparity map; the function M(p_i^{l,\tau}) = p_j^{r,\tau} denotes that p_j^{r,\tau} is the point on the right coarse layer corresponding to the left coarse-layer pixel p_i^{l,\tau}, the correspondence being determined from the original disparity map. P(M(p_i^{l,\tau}) = p_j^{r,\tau}) adopts a consistent Delta function, defined as

P(M(p_i^{l,\tau}) = p_j^{r,\tau}) = \begin{cases} 1, & | p_i^{l,\tau} - p_j^{r,\tau} | = d_i^l \ \text{and} \ | p_j^{r,\tau} - p_i^{l,\tau} | = d_j^r \\ 0, & \text{otherwise} \end{cases}    (9)

where d_i^l is the disparity value of the left coarse-layer pixel p_i^{l,\tau} with respect to its corresponding point p_j^{r,\tau} in the right image, and d_j^r is the disparity of the right coarse-layer pixel p_j^{r,\tau} with respect to its corresponding point in the left image. To better determine the left-right pixel correspondences, the raw disparity map is used without further processing.

In formula (8), P(x_i^{l,\tau} \mid M(p_i^{l,\tau}) = p_j^{r,\tau}, x_j^{r,\tau}) is the probability that p_i^{l,\tau} and p_j^{r,\tau} have similar colors; with completely accurate disparity the correspondence would be exact, but existing disparity computation methods contain errors, so in order to better determine the left-right correspondence the disparity term is abandoned and only the color term is used, in the following form:

P(x_i^{l,\tau} \mid M(p_i^{l,\tau}) = p_j^{r,\tau}, x_j^{r,\tau}) = \frac{1}{\| c_i^{l,\tau} - c_j^{r,\tau} \|^2 + 1}    (10)

where c_i^{l,\tau} is the color value of the left coarse-layer pixel p_i^{l,\tau} and c_j^{r,\tau} is the color value of its corresponding point p_j^{r,\tau} on the right coarse layer.
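The sketch below evaluates the inter-image weight of formulas (7)-(10) for a single left coarse-layer pixel. The convention that a left pixel (y, x) with disparity d corresponds to the right pixel (y, x - d) is an assumption of this sketch; the patent only states that the raw disparity map is used.

```python
# Sketch of the inter-image term for one left coarse pixel: consistency check of
# formula (9), then the color weight of formula (10). With the symmetric color term
# and a consistent correspondence, the two directions of formula (7) coincide.
import numpy as np

def inter_weight(y, x, left_img, right_img, disp_l, disp_r):
    """Returns (weight, right_coordinates). The weight is 0 when the consistent
    Delta test of formula (9) fails or the correspondence falls outside the image."""
    d_l = int(round(float(disp_l[y, x])))
    xr = x - d_l                                   # assumed left-to-right correspondence
    if xr < 0 or xr >= right_img.shape[1]:
        return 0.0, None
    d_r = int(round(float(disp_r[y, xr])))
    if d_r != d_l:                                 # consistent Delta function, formula (9)
        return 0.0, None
    diff = left_img[y, x].astype(float) - right_img[y, xr].astype(float)
    c = 1.0 / (float((diff ** 2).sum()) + 1.0)     # color probability, formula (10)
    return c, (y, xr)
```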
(4) defining parent-child constraint relationship between upper and lower layers
The final segmentation result must be expressed at the pixel layer. To transfer the processing result of the coarse layer to the pixel layer while keeping parent and child pixels of the upper and lower layers consistent, the parent-child constraint between the layers is defined as

E_{paternity}(p_i^\tau, p_{i,j}) = \infty, \quad (p_i^\tau, p_{i,j}) \in N_{paternity}    (11)

E_{paternity}(p_i^\tau, p_{i,j}) represents the similarity between parent and child pixels of the upper and lower layers. Since a coarse-layer pixel represents all pixels of an N_l*N_l region of the original pixel layer, the label of a coarse-layer pixel p_i^\tau stands for the labels of all pixels of the corresponding region, and therefore the edge weight between parent and child pixels is defined as infinity. Edges between non-parent-child pixels are not considered.
(5) Solving for the minimum of the energy function
For the parent-child constraint between the upper and lower layers, the weight is defined as infinity, so the edges between parents and children can never be cut and the label of a parent node can be passed directly to its children. Since computing the parent-child edges explicitly would consume a large amount of memory and increase the computation time, these edges are not computed in detail during the optimization. The energy function defined in formula (1) is minimized with a graph cut algorithm, for example the max-flow/min-cut algorithm proposed by Yuri Boykov et al. in "An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision" (IEEE Transactions on PAMI, 2004), which yields the optimal labeling, i.e. the coarse-layer segmentation result. The pixel-layer labels of the region corresponding to each coarse-layer pixel are then set directly from that pixel's label. In this way the segmentation speed is markedly improved with essentially unchanged accuracy. Because the coarse-layer labels are passed directly to the pixel layer, large errors remain at boundary pixels whose neighborhoods differ strongly. To improve the segmentation accuracy, the points with large errors at the boundary are collected and optimized locally.
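As a toy illustration of how the energy terms become an s-t min-cut problem, the sketch below labels two coarse pixels with networkx's generic minimum_cut; this stands in for the Boykov-Kolmogorov solver cited above, and the capacities are arbitrary example values.

```python
# Toy illustration of the optimization: unary costs become terminal-edge capacities,
# intra-image weights become n-link capacities, and one minimum cut yields the labels.
import networkx as nx

G = nx.DiGraph()
# two coarse pixels p0, p1: capacity to the source = cost of the background label,
# capacity to the sink = cost of the foreground label
G.add_edge("s", "p0", capacity=0.9); G.add_edge("p0", "t", capacity=0.1)
G.add_edge("s", "p1", capacity=0.2); G.add_edge("p1", "t", capacity=0.8)
# intra-image n-link weighted by f_c * f_d (both directions)
G.add_edge("p0", "p1", capacity=0.3); G.add_edge("p1", "p0", capacity=0.3)

cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
labels = {p: 1 if p in source_side else 0 for p in ("p0", "p1")}   # 1 = foreground
print(cut_value, labels)   # expected: p0 labeled foreground, p1 labeled background
```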
Step five, local optimization at the boundary of the original image
The global optimization of step four yields a coarse segmentation boundary. Because a coarse-layer pixel p_i^\tau corresponds to an N_l*N_l set of pixels of the original pixel layer, its label x_i^\tau is transferred directly to that N_l*N_l region; in this embodiment N_l = 3. At the boundary, however, neighboring pixels differ strongly, and assigning the coarse-layer label to all pixels of the region introduces large errors. A separate local optimization is therefore performed at the boundary.
Before the local optimization, the local boundary information is collected. The coarse segmentation boundary is first split into upper, lower, left and right boundaries. The upper and lower boundaries are then extended by N_l pixels above and below the boundary line, and the left and right boundaries by N_l pixels to the left and right of the boundary line; in this embodiment N_l = 3. The collected boundary pixels are locally optimized with the traditional graph cut formulation. The local optimization operates on the pixel layer, and because the disparity computation contains errors the disparity information is discarded here. The global processing guarantees the consistency of the stereo segmentation, whereas the local optimization acts on local pixels only; the left and right images are therefore optimized locally and independently. Let I_e be the collected local image to be processed. The local energy function, formula (12), combines a unary term and an intra-image binary term over the boundary pixels with weights w_unary and w_intra.
the similarity between the pixels at the boundary and the front and background color models is represented by a unary item, namely a data item, and the larger the similarity is, the larger the value is.The similarity of the neighborhood pixels is represented by a binary term, namely a smoothing term, and the more similar the two terms are, the smaller the value is. The less likely the boundary will pass through both.Representing the union of all adjacent relations in the boundary map. Wherein, wunary+wintra=1
A meta-item is specifically defined as follows:
the optimization at the boundary is a locally precise optimization, and errors should be reduced as much as possible, so that the unary term only adopts the color term. The specific calculation of the unary item is the same as the calculation of the unary item color in the global optimization.
The binary term also uses only the color term in order to reduce errors. The specific definition is as follows:
after the local energy function is well defined, optimizing the local energy function, namely an equation (12), by adopting the maximum flow/minimum cut optimization algorithm mentioned in the fourth step to obtain an optimal marking result, namely a cutting result; the results of the four simultaneous segmentation steps are fused to form the segmentation result of the whole image pair.
Step six, interaction
If the segmentation result is not satisfactory, return to step two and continue to add foreground and background strokes; each additional stroke triggers a complete segmentation pass. Further segmentation is carried out on the basis of the previous one until a satisfactory result is obtained.
"StereoCut: the method in the dependent InteractiveObject Selection in the Stereo Image papers "is a comparison object, and the effectiveness of the method of the present invention will be described. Both methods use a consistent Delta function (equation (9)) as the probability distribution function between corresponding points. Figure 2 shows a comparison of the effects. Fig. 2(a) and (b) show the input left and right images. (c) And (d) is the result of segmentation by the Stereocut method; FIGS. 2(e), (f) are the segmentation results of the present invention; the following two columns give the accuracy of the segmentation of the two methods and the total time of the segmentation. The specific definition of accuracy (denoted by a) is as follows:
wherein
Wherein N isLAnd NrRespectively representing the total number of pixels of the left and right images,label (0 or 1) of ith pixel in left image after division, correspondingAnd (4) a label of j th pixel of the right image after segmentation.Respectively represent the true values of the left and right diagrams,the difference between the label and the true value of a certain pixel of the left image is reflected. Function fAIs a function of the difference, which is 1 if the difference is 0, and is 0 otherwise. It can be seen from the formula (15) that the ratio of the total number of non-differences from the true value in a single image to the image size is the segmentation accuracy, and the segmentation accuracy of the stereo image is the average value of the accuracy of the left and right images.
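A minimal numpy sketch of the accuracy measure described above (per-view agreement with the ground truth, averaged over the two views):

```python
# Sketch of the accuracy measure: the fraction of pixels whose label equals the
# ground truth, averaged over the left and right views.
import numpy as np

def stereo_accuracy(x_l, x_r, gt_l, gt_r):
    """x_l, x_r: 0/1 label maps produced by the segmentation; gt_l, gt_r: ground truth."""
    a_l = np.mean(x_l == gt_l)   # per-view accuracy of the left image
    a_r = np.mean(x_r == gt_r)   # per-view accuracy of the right image
    return 0.5 * (a_l + a_r)
```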
The user input for the two methods is shown in figs. (c) and (e) respectively, where the first line, drawn inside the object, marks the foreground and the second line, drawn outside the object, marks the background. Comparing figs. (c), (d) with figs. (e), (f), together with the reported times and accuracy values of the two methods, it can be seen that, for the same amount of user interaction, the method markedly increases the segmentation speed with only a small change in segmentation accuracy.
Claims (1)
1. An interactive stereo image fast segmentation method based on a multi-level graph structure, characterized in that: first, a stereo image pair is input and a disparity map is obtained by a stereo matching algorithm; part of the foreground and background is specified in either the left or right original image; according to the specified parts, prior statistical models of the foreground and background color and disparity distributions are established with CUDA parallel computing; the original image is Gaussian-filtered and down-sampled to obtain a coarse, smaller-scale image, and the coarse image and the original image form a multi-level graph structure; on this basis, the color, gradient and disparity constraints in the multi-level graph structure are formalized within the graph cut framework and an energy function is constructed; to improve efficiency, the graph construction is processed with CUDA parallel computing; the global optimization result of the multi-level graph is solved with a max-flow/min-cut algorithm; boundary pixels with large errors are then collected, and the collected boundary pixels are locally optimized with the traditional graph cut formulation; the results of the global processing and the local optimization are fused to form the final segmentation result; if the user has not obtained an ideal result, the erroneous regions in the image are marked again until an ideal result is obtained;
the method is characterized in that: the method specifically comprises the following steps:
step one, matching a stereo image;
reading in a stereo image pair I = {I_l, I_r}, where I_l and I_r denote the left and right images respectively; computing the disparity maps of the left and right images, denoted D_l and D_r, with a stereo matching algorithm;
step two, adding foreground and background cues;
a user designates a part of front and background in any one of the images through a designed interface; using a mouse, a touch screen or a handwriting pen input device to designate a front part and a background pixel by drawing lines with different colors on an image; pixels covered by the first line belong to the foreground, and pixels covered by the second line belong to the background; the subsequent steps of the method have no limitation on the method for specifying the front and background pixels used in the steps, and other methods can be used;
step three, establishing prior models of the color and disparity of the foreground and background;
F denotes the foreground pixel set specified by the user and B the background pixel set; the foreground and background color and disparity priors can be expressed as GMMs, histograms or sets of clusters; the method adopts the cluster form and obtains the clusters by statistics over the color and disparity of the corresponding pixel sets; to increase the processing speed, a CUDA-parallel Kmeans algorithm is used to cluster the color values and the disparity values of the pixels in F and B respectively; the color model is processed as follows: each thread processes one pixel, computes the distance from the pixel to all foreground and background clusters, selects the nearest one, and assigns the pixel to the corresponding cluster; this yields N_c foreground color clusters C_n^F and M_c background color clusters C_m^B, which represent the color distribution statistical models of the foreground and background; the disparity values of the pixels in F and B are clustered in the same way, yielding N_d foreground disparity clusters and M_d background disparity clusters, which represent the disparity distribution statistical models of the foreground and background; in this embodiment N_c = M_c = 64 and N_d = M_d = 16;
Step four, global optimization based on a multi-level graph structure;
the foreground and the background each form relatively compact distributions in the image, i.e. pixels inside the foreground or background differ little while pixels at the boundary differ strongly; using this property, all pixels in a neighborhood are represented by one representative pixel of the region; the method obtains the representative pixels by Gaussian filtering and down-sampling, which yields a coarse image of small scale; the coarse image and the original image are fused to form a multi-level graph structure, and global processing is carried out on this multi-level model; the original stereo pair is denoted I = {I_l, I_r} and the coarse stereo pair I^τ = {I^{l,τ}, I^{r,τ}}, where I_l, I^{l,τ} and I_r, I^{r,τ} denote the left and right images respectively; the original and coarse stereo images are jointly represented as an undirected graph G = <V, E>, where V is the node set and E the edge set; each vertex of G corresponds to one pixel of I or I^τ; interactive stereo image fast segmentation assigns each pixel p_i a label x_i under the constraint of the input strokes, with x_i ∈ {1, 0} representing foreground and background respectively; the edges of G comprise the edges connecting each pixel to the source and sink, the edges between adjacent pixels within an image, the edges between corresponding points of the stereo images determined by the disparity map, and the edges between coarse-layer nodes and the parent and child nodes of the original image, p_i^τ being a coarse-layer pixel; since the coarse layer is obtained by down-sampling the original layer, one p_i^τ represents the N_l*N_l region of pixels of I before sampling; in this embodiment N_l = 3;
The stereo image fast segmentation problem based on the multi-hierarchy graph structure is solved and defined as the optimization problem of the following target energy function:
E(X) = w_{unary} \sum_{p_i^\tau \in I^\tau} E_{unary}(p_i^\tau) + w_{intra} \sum_{(p_i^\tau, p_j^\tau) \in N_{intra}} E_{intra}(p_i^\tau, p_j^\tau) + w_{inter} \sum_{(p_i^{l,\tau}, p_i^{r,\tau}) \in N_{inter}} E_{inter}(p_i^{l,\tau}, p_i^{r,\tau}) + w_{paternity} \sum_{(p_i^\tau, p_{i,j}) \in N_{paternity}} E_{paternity}(p_i^\tau, p_{i,j})    (1)
where E_{unary}(p_i^τ) is the unary term (also called the data term), measuring how similar the color and disparity of a coarse-layer pixel are to the foreground and background color and disparity statistical models; the higher the similarity, the larger its value; E_{intra}(p_i^τ, p_j^τ) is the binary term within a coarse-layer image, reflecting the difference between each coarse-layer pixel and its four-neighborhood, and N_{intra} denotes the set of all adjacency relations among pixels of the left and right coarse images; the larger the difference, the smaller E_{intra}, and by the principle of the graph cut algorithm the neighboring pixels then tend to take different labels; E_{inter}(p_i^{l,τ}, p_i^{r,τ}) is the binary term between the coarse images, defined by the matching of corresponding points, and is larger the better the match; N_{inter} denotes the set of correspondences between the left and right coarse-layer pixels; E_{paternity}(p_i^τ, p_{i,j}) is the binary constraint between the coarse layer and the original image, representing the similarity of parent and child nodes: the smaller the difference between parent and child, the larger the value and the less likely the boundary passes between them; N_{paternity} denotes the set of parent-child correspondences; w_{unary}, w_{intra}, w_{inter}, w_{paternity} are weights balancing the energy terms; in the method w_{unary} = 1, w_{intra} = 4000, w_{inter} = 8000, w_{paternity} = 1000000;
(1) Defining unary constraint terms
The univariate constraint item comprises a color univariate item and a parallax univariate item, and is defined as follows:
E_{unary}(p_i^\tau) = w_c (1 - P_c(x_i^\tau \mid c_i^\tau)) + w_d (1 - P_d(x_i^\tau \mid d_i^\tau))    (2)
where P_c(x_i^τ | c_i^τ) is the probability that pixel p_i^τ with color c_i^τ takes the foreground or background label; since a higher probability should give a smaller energy, 1 - P_c is taken as the color unary term; likewise, P_d(x_i^τ | d_i^τ) is the probability that a pixel with disparity d_i^τ takes the foreground or background label, and 1 - P_d is taken as the disparity unary term; w_c and w_d are the weights of the color and disparity influence, with w_c + w_d = 1;
The method represents the color and parallax models of the front and background in cluster-like form, including NcIndividual foreground color clusterMcIndividual background color clusterNdIndividual prospect looks atPoor clusterMdIndividual background parallax clusterGiving a calculation method of the unary item;
the color unary is calculated as follows: the method adopts a parallel method based on CUDA to calculate; transmitting the color values of all pixels at the CPU end to the GPU end; in the GPU, all pixels are processed in parallel; each thread represents an unmarked pixel; the threads are mutually independent, and all the threads simultaneously calculate the distance from the pixel color to the cluster center of the foreground color model and the background color model to find out the minimum distance; describing the similarity of the pixel color with the front and background colors by using the minimum distance; the smaller the distance from the foreground or background color is, the closer the color is, and according to the graph cut theory, the more the pixel tends to select the foreground or background label; after all threads are finished, transmitting the solving result of each pixel of the GPU end to the CPU end, and carrying out a detailed image building process at the CPU end; the mathematical form of the color unary term is described as:
1 - P_c(x_i^\tau \mid c_i^\tau) = \begin{cases} \frac{s_i^{\min}}{s_i^{\min} + t_i^{\min}}, & x_i^\tau = 1 \\ \frac{t_i^{\min}}{s_i^{\min} + t_i^{\min}}, & x_i^\tau = 0 \end{cases}    (3)
where s_i^{min} and t_i^{min} are the minimum distances from the color c_i^τ of pixel p_i^τ to the foreground and background color cluster centers respectively, expressed as:
s_i^{\min} = \min_n \| c_i^\tau - C_n^F \|^2, \quad n = 1, \ldots, N_c
t_i^{\min} = \min_m \| c_i^\tau - C_m^B \|^2, \quad m = 1, \ldots, M_c
the parallax unary item and the color unary item are calculated in the same process;
(2) defining intra-image binary constraint terms
the intra-image binary constraint term E_{intra}(p_i^τ, p_j^τ) comprises two factors, describing respectively the color change and the disparity change around a pixel, i.e. the color gradient and the disparity gradient, and is defined as follows:
E_{intra}(p_i^\tau, p_j^\tau) = f_c(p_i^\tau, p_j^\tau) \, f_d(p_i^\tau, p_j^\tau) \, | x_i^\tau - x_j^\tau |    (4)
where f_c(p_i^τ, p_j^τ) represents the color similarity between adjacent pixels: the closer the colors, the larger its value, and by the principle of the graph cut algorithm the less likely the boundary passes between the two pixels; f_d(p_i^τ, p_j^τ) represents the disparity similarity of pixel p_i^τ relative to its neighboring pixel p_j^τ: the closer the disparities, the larger its value, and the less likely the two take different labels; to reduce errors caused by the disparity, this step uses the coarse-layer disparity information obtained by Gaussian filtering and down-sampling; the two factors are defined as follows:
f_c(p_i^\tau, p_j^\tau) = \frac{1}{\| c_i^\tau - c_j^\tau \|^2 + 1}, \quad (p_i^\tau, p_j^\tau) \in N_{intra}    (5)
f_d(p_i^\tau, p_j^\tau) = \frac{1}{\| d_i^\tau - d_j^\tau \|^2 + 1}, \quad (p_i^\tau, p_j^\tau) \in N_{intra}    (6)
(3) defining inter-image binary constraint terms
The binary term among the images restricts the corresponding pixels among the images to take the same label, and the definition is as follows:
E_{inter}(p_i^{l,\tau}, p_i^{r,\tau}) = \frac{C(p_i^{l,\tau}, p_i^{r,\tau}) + C(p_i^{r,\tau}, p_i^{l,\tau})}{2} \, | x_i^{l,\tau} - x_j^{r,\tau} |    (7)
where C represents the probability that p_i^{l,τ} and p_i^{r,τ} are corresponding points in the stereo image; it is an asymmetric function:
C(p_i^{l,\tau}, p_i^{r,\tau}) = P(x_i^{l,\tau} \mid M(p_i^{l,\tau}) = p_j^{r,\tau}, x_j^{r,\tau}) \, P(M(p_i^{l,\tau}) = p_j^{r,\tau})    (8)
P(M(p_i^{l,τ}) = p_j^{r,τ}) is the probability distribution function of corresponding points, determined from the disparity map; the function M(p_i^{l,τ}) = p_j^{r,τ} denotes that p_j^{r,τ} is the corresponding point on the right coarse layer of the left coarse-layer pixel p_i^{l,τ}, the correspondence being determined from the original disparity map; P(M(p_i^{l,τ}) = p_j^{r,τ}) adopts a consistent Delta function, defined as follows;
P(M(p_i^{l,\tau}) = p_j^{r,\tau}) = \begin{cases} 1, & | p_i^{l,\tau} - p_j^{r,\tau} | = d_i^l \ \text{and} \ | p_j^{r,\tau} - p_i^{l,\tau} | = d_j^r \\ 0, & \text{otherwise} \end{cases}    (9)
where d_i^l is the disparity value of the left coarse-layer pixel p_i^{l,τ} with respect to its corresponding point p_j^{r,τ} in the right image, and d_j^r is the disparity of the right coarse-layer pixel p_j^{r,τ} with respect to its corresponding point in the left image; in order to better determine the left-right pixel correspondences, the disparity of the raw disparity map is used;
in formula (8), P(x_i^{l,τ} | M(p_i^{l,τ}) = p_j^{r,τ}, x_j^{r,τ}) represents the probability that p_i^{l,τ} and p_j^{r,τ} have similar colors; with completely accurate disparity the correspondence would be exact, but existing disparity computation methods contain errors, so in order to better determine the left-right correspondence the disparity term is abandoned; using only the color term, it takes the following form:
P(x_i^{l,\tau} \mid M(p_i^{l,\tau}) = p_j^{r,\tau}, x_j^{r,\tau}) = \frac{1}{\| c_i^{l,\tau} - c_j^{r,\tau} \|^2 + 1}    (10)
where $c_i^{l,\tau}$ is the color value of the left coarse-layer pixel $p_i^{l,\tau}$, and $c_j^{r,\tau}$ is the color value of its corresponding right coarse-layer point $p_j^{r,\tau}$;
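A small sketch of how formulas (8) and (10) combine into the inter-image edge weight, assuming colors are given as numpy vectors and reusing the `correspondence_prior` helper above (a sketch under those assumptions, not a definitive implementation):

```python
import numpy as np

def color_likelihood(c_left, c_right):
    """Color term of formula (10): inverse of the squared color distance plus one."""
    diff = np.asarray(c_left, dtype=float) - np.asarray(c_right, dtype=float)
    return 1.0 / (float(np.dot(diff, diff)) + 1.0)

def inter_image_weight(c_left, c_right, x_left, x_right, disp_left, disp_right):
    """Edge weight C of formula (8): color likelihood times the consistency prior."""
    return color_likelihood(c_left, c_right) * correspondence_prior(
        x_left, x_right, disp_left, disp_right)
```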
(4) defining parent-child constraint relationship between upper and lower layers
The final image segmentation result is expressed at the pixel layer; to transfer the coarse-layer result to the pixel layer while keeping parent and child pixels consistent between the upper and lower layers, the parent-child constraint between the two layers is defined as:
$$E_{\mathrm{paternity}}\left(p_i^{\tau},\, p_{i,j}\right) = \infty, \qquad \left(p_i^{\tau},\, p_{i,j}\right) \in N_{\mathrm{paternity}} \tag{11}$$
$E_{\mathrm{paternity}}$ represents the similarity between parent and child pixels of the upper and lower layers; since each coarse-layer pixel represents all pixels of an $N_l \times N_l$ region of the original pixel layer, the label of the coarse-layer pixel $p_i^{\tau}$ stands for the labels of all pixels in the corresponding pixel-layer region, so the edge weight between parent and child pixels is defined as infinity; edges between non-parent-child pixels are not considered;
(5) solving for the minimum of the energy function
Because the parent-child constraint between the upper and lower layers is defined as infinity, edges between parents and children can never be cut, and the label of a parent node can be transmitted directly to its child nodes; explicitly computing the parent-child edges would consume a large amount of memory and increase computation time, so in the optimization these edges are not constructed in detail; the energy function defined above (formula (1)) is optimized with a graph-cut algorithm to obtain the optimal labeling, i.e., the coarse-layer segmentation result; the pixel labels of each corresponding pixel-layer region are then determined directly from the coarse-layer pixel labels; in this way the segmentation speed is markedly improved without changing the accuracy; however, because coarse-layer labels are passed directly to the pixel layer, large errors arise at boundary pixels whose neighborhoods differ strongly; to improve segmentation accuracy, the boundary points with larger errors are collected and a local optimization is carried out;
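Transmitting coarse-layer labels directly to the pixel layer amounts to replicating each coarse label over its $N_l \times N_l$ block of child pixels; a minimal numpy sketch under that assumption (array names are illustrative):

```python
import numpy as np

def propagate_coarse_labels(coarse_labels, n_l):
    """Replicate every coarse-layer label over its N_l x N_l block of child pixels."""
    # np.kron with a block of ones tiles each entry into an n_l x n_l patch.
    return np.kron(coarse_labels, np.ones((n_l, n_l), dtype=coarse_labels.dtype))
```

If the original image size is not an exact multiple of $N_l$, the propagated label map would additionally have to be cropped or padded to the pixel-layer resolution.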
Step five: local optimization at the boundary of the original image
A coarse segmentation boundary is obtained through the global optimization of step four; since each coarse-layer pixel corresponds to an $N_l \times N_l$ set of pixels of the original pixel layer, its label is transferred directly to that $N_l \times N_l$ region of the pixel layer; at the boundary, however, neighborhood pixels differ strongly, and directly assigning the coarse-layer label to all pixels of the region introduces a large error; therefore a separate local optimization is performed at the boundary;
Before local optimization, the local boundary information is collected; first, the obtained coarse segmentation boundary is divided into upper, lower, left, and right boundaries; the upper and lower boundaries are then extended by $N_l$ pixels above and below the boundary line, and the left and right boundaries by $N_l$ pixels to its left and right; in the present invention $N_l = 3$; local optimization is performed on the collected boundary pixels using the classical graph-cut theory (a sketch of this band collection and of the local optimization is given after formula (14)); the local optimization works on the pixel layer, and disparity information is discarded because of the errors in disparity calculation; the global processing guarantees the consistency of the stereo-image segmentation, whereas the local optimization handles only local pixels, so it is carried out on the left and right images independently; let $I^{e}$ be the collected local image to be processed; the local energy function is defined as:
$$E^{e}(X) = w_{\mathrm{unary}} \sum_{p_i \in I^{e}} E^{e}_{\mathrm{unary}}(p_i) + w_{\mathrm{intra}} \sum_{(p_i, p_j) \in N^{e}_{\mathrm{intra}}} E^{e}_{\mathrm{intra}}(p_i, p_j) \tag{12}$$
$E^{e}_{\mathrm{unary}}$, the unary (data) term, represents the similarity between a boundary pixel and the foreground/background color models; the larger the similarity, the larger the value; $E^{e}_{\mathrm{intra}}$, the binary (smoothness) term, represents the similarity of neighboring pixels: the more similar two neighbors are, the larger the cost of separating them, and the less likely the boundary is to pass between them; $N^{e}_{\mathrm{intra}}$ denotes the set of all neighborhood connections in the boundary map; the unary term is specifically defined as follows:
$$E^{e}_{\mathrm{unary}}(p_i) = P(x_i \mid c_i) = \frac{P(c_i \mid x_i)}{P(c_i \mid x_i = 1) + P(c_i \mid x_i = 0)} \tag{13}$$
The optimization at the boundary is an accurate local optimization intended to reduce errors as much as possible, so the unary term uses only the color term; its specific calculation is the same as that of the color-based unary term in the global optimization;
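A minimal sketch of the normalized unary term of formula (13), assuming the foreground and background color-model likelihoods $P(c_i \mid x_i = 1)$ and $P(c_i \mid x_i = 0)$ have already been evaluated (for example from the foreground/background statistical models built from the user strokes); the zero-denominator fallback is an added safeguard, not part of the patent text:

```python
def unary_term(p_fg, p_bg, label):
    """Normalized unary term of formula (13).

    p_fg: likelihood of the pixel color under the foreground model, P(c_i | x_i = 1).
    p_bg: likelihood of the pixel color under the background model, P(c_i | x_i = 0).
    label: 1 for foreground, 0 for background.
    """
    denom = p_fg + p_bg
    if denom == 0.0:          # both models assign zero likelihood; fall back to 0.5
        return 0.5
    return (p_fg if label == 1 else p_bg) / denom
```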
To reduce errors, the binary term likewise uses only a color term; it is specifically defined as follows:
$$E^{e}_{\mathrm{intra}}(p_i, p_j) = \frac{1}{\left\lVert c_i - c_j \right\rVert^{2} + 1}\,\left| x_i - x_j \right|, \qquad (p_i, p_j) \in N^{e}_{\mathrm{intra}} \tag{14}$$
Once the local energy function is defined, it (formula (12)) is optimized with the max-flow/min-cut algorithm mentioned in step four to obtain the optimal labeling, i.e., the local segmentation result; this result is then fused with the segmentation result of step four to form the segmentation of the whole stereo image pair;
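To make step five concrete, the sketch below first collects a boundary band of width $N_l$ and then minimizes the local energy (12)-(14) with a max-flow/min-cut solver; it assumes the PyMaxflow and scipy packages, a per-pixel foreground probability map `prob_fg` computed as in formula (13), uses isotropic dilation instead of the per-direction boundary extension described above, and is an illustrative approximation rather than the patent's exact implementation:

```python
import numpy as np
import maxflow                               # PyMaxflow (assumed dependency)
from scipy.ndimage import binary_dilation    # assumed dependency

def boundary_band(labels, n_l=3):
    """Mask of pixels within n_l pixels of the coarse segmentation boundary."""
    boundary = np.zeros_like(labels, dtype=bool)
    boundary[:, :-1] |= labels[:, :-1] != labels[:, 1:]   # label changes between horizontal neighbours
    boundary[:-1, :] |= labels[:-1, :] != labels[1:, :]   # label changes between vertical neighbours
    return binary_dilation(boundary, iterations=n_l)

def refine_boundary(image, labels, prob_fg, n_l=3, w_unary=1.0, w_intra=1.0):
    """Local graph-cut refinement of the band around the coarse boundary."""
    band = boundary_band(labels, n_l)
    h, w = labels.shape
    ids = np.full((h, w), -1, dtype=int)
    ids[band] = np.arange(int(band.sum()))

    g = maxflow.Graph[float]()
    nodes = g.add_nodes(int(band.sum()))

    ys, xs = np.nonzero(band)
    for y, x in zip(ys, xs):
        i = ids[y, x]
        # t-links: source capacity = cost of labelling background,
        # sink capacity = cost of labelling foreground (data term, cf. formula (13)).
        g.add_tedge(nodes[i], w_unary * prob_fg[y, x], w_unary * (1.0 - prob_fg[y, x]))
        # n-links toward right and bottom neighbours (smoothness term, cf. formula (14)).
        for dy, dx in ((0, 1), (1, 0)):
            ny, nx = y + dy, x + dx
            if ny < h and nx < w and band[ny, nx]:
                diff = image[y, x].astype(float) - image[ny, nx].astype(float)
                wgt = w_intra / (float(np.dot(diff, diff)) + 1.0)
                g.add_edge(nodes[i], nodes[ids[ny, nx]], wgt, wgt)

    g.maxflow()
    refined = labels.copy()
    for y, x in zip(ys, xs):
        # Nodes on the source side are labelled foreground (1), sink side background (0).
        refined[y, x] = 1 if g.get_segment(nodes[ids[y, x]]) == 0 else 0
    return refined
```

In this sketch the refinement is run on the left and right images independently, matching the independence of the local step described above.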
Step six: interaction
If the segmentation result is unsatisfactory, return to step two and continue adding foreground and background cues; each added stroke triggers a complete segmentation pass; further refinement proceeds on the basis of the previous segmentation until a satisfactory result is obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510354774.9A CN105046689B (en) | 2015-06-24 | 2015-06-24 | A kind of interactive stereo-picture fast partition method based on multi-level graph structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105046689A CN105046689A (en) | 2015-11-11 |
CN105046689B true CN105046689B (en) | 2017-12-15 |
Family
ID=54453207
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||