CN104091336B - Stereoscopic image synchronous segmentation method based on dense disparity map - Google Patents
- Publication number
- CN104091336B (application CN201410328103.0A)
- Authority
- CN
- China
- Legal status: Active
Abstract
The invention relates to a method for synchronously segmenting a stereoscopic image pair based on a dense disparity map. First, a pair of stereo images is input and disparity maps are computed with a stereo matching algorithm. The user then marks part of the foreground and background in one of the two images with brush strokes. From the marked pixels, prior statistical models of the foreground and background color distributions, and of the foreground and background disparity distributions, are built. On this basis, color, gradient, and disparity constraints are formalized within the graph-cut framework to construct an energy function, which is minimized with a max-flow/min-cut algorithm. If the result is unsatisfactory, the user can continue to mark erroneous regions until the desired result is obtained. Because both the disparity distribution model and the disparity variation model use disparity statistics rather than raw per-pixel disparities, the influence of disparity estimation errors is effectively avoided; compared with existing methods, the segmentation results of the proposed method are more accurate.
Description
Technical Field
The invention belongs to the intersecting fields of computer vision, computer graphics, and image processing, and relates to a method for synchronously segmenting stereoscopic images based on dense disparity maps.
Background
The growing use of stereoscopic images in many fields creates an urgent demand for intelligent processing of such data. Interactive intelligent segmentation of stereo images is one important task: the user specifies only a small amount of foreground and background on one image of a stereo pair, and the method automatically completes synchronized segmentation of both images. The quality of the segmentation determines the accuracy of detection, recognition, classification, and tracking in video surveillance applications. Segmented foreground objects can serve as input for 3D model reconstruction, removing background interference during the reconstruction process. Segmentation algorithms and tools can also help ordinary users edit everyday photographs taken with stereo cameras, and help film and television producers perform post-production editing of stereoscopic TV and movies, for example removing unwanted objects, compositing foreground objects into a new background, or copying and pasting foreground objects.
Interactive segmentation of a single image is relatively mature, and some methods have reached practical application, such as the Quick Selection tool in Photoshop CS3. Compared with single-image segmentation, intelligent interactive segmentation of stereo images started later. The basic framework of existing stereo image segmentation methods is as follows. First, a disparity map is obtained through a stereo matching algorithm. Each pixel value in the disparity map represents the offset, in the matching image, of the corresponding pixel of the reference image (one of the two pre-selected images); that is, given a stereo pair and the disparity map of the left image, the pixel in the right image corresponding to each left-image pixel can be found. After the disparity map is obtained, disparity cues, together with the color and gradient cues commonly used in single-image segmentation, are formalized into an energy function, and the segmentation problem is solved by optimizing this function. The quality of the disparity map therefore has a strong influence on the segmentation result. However, disparity maps produced by existing stereo matching methods contain many errors, and existing disparity-based stereo segmentation methods, such as "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs" published by Price et al. at ICCV 2011, directly formalize the correspondences determined by the disparity map in the energy function, which easily leads to segmentation errors and degrades the intelligence of the segmentation.
Summary of the Invention
In view of the limitations of current disparity-map-based stereo segmentation methods in how they use disparity, the invention explores a new segmentation method within the theoretical framework of disparity-map-based synchronized stereo segmentation, striving to reduce the impact of matching errors on the segmentation result and thereby to make the segmentation process more intelligent.
To achieve this goal, the technical solution of the invention is as follows. After the user inputs a pair of stereo images, the method automatically computes disparity maps with a stereo matching algorithm. The user then marks part of the foreground and background with brush strokes in one of the two images, and prior statistical models of the foreground and background color distributions, and of the foreground and background disparity distributions, are automatically built from the marked pixels. On this basis, color, gradient, and disparity constraints are formalized within the graph-cut framework to construct an energy function, which is finally minimized with a max-flow/min-cut algorithm. If the result is unsatisfactory, the user can continue to mark erroneous regions until the desired result is obtained.
Compared with the prior art, the invention has the following advantages. Based on the disparity map, it builds statistical models of the foreground and background disparity distributions and mathematically formalizes the variation of disparity within the image; these are combined with traditional constraint terms to construct the energy function, whose minimum is found by a graph-cut algorithm to produce the segmentation. Because both the disparity distribution model and the variation model are disparity statistics, the influence of disparity estimation errors is effectively avoided. Experiments show that, with the same amount of user interaction, the segmentation results of the proposed method are more accurate than those of existing methods.
Brief Description of the Drawings
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 shows experimental results of an application example of the invention: (a) and (b) are the input left and right images; (c) and (d) are results computed with the method of "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs" (Price et al., ICCV 2011); (e) and (f) are the segmentation results of the present invention. The user inputs used by the two methods are shown in (c) and (e): solid strokes inside the target object mark the foreground, and dashed strokes outside the target region mark the background.
Detailed Description
The present invention is further described below in conjunction with the accompanying drawings and specific embodiments.
The flow of the invention, shown in Fig. 1, comprises the following steps.
Step 1: match the stereo images.
Read in a pair of stereo images I = {I_l, I_r}, where I_l and I_r denote the left and right images, respectively. A stereo matching algorithm is used to compute the disparity maps of the left and right images, denoted D_l and D_r. Any stereo matching algorithm may be used, for example the one proposed by Felzenszwalb et al. in "Efficient Belief Propagation for Early Vision" (CVPR 2004).
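The matching step can be sketched with a tiny sum-of-absolute-differences block matcher. This is an illustrative stand-in, not the belief-propagation method cited above; the function name and the synthetic image pair are assumptions made for the sketch.

```python
import numpy as np

def block_match_disparity(left, right, max_disp=8, half=1):
    """Naive SAD block matching: for each left-image pixel, find the horizontal
    offset d that best aligns a small window in the right image.  A minimal
    stand-in for the global matcher (belief propagation) cited in the text."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    pad_l = np.pad(left.astype(np.float64), half, mode="edge")
    pad_r = np.pad(right.astype(np.float64), half, mode="edge")
    for y in range(h):
        for x in range(w):
            best_d, best_cost = 0, np.inf
            wl = pad_l[y:y + 2 * half + 1, x:x + 2 * half + 1]
            for d in range(min(max_disp, x) + 1):   # correspondence lies to the left
                wr = pad_r[y:y + 2 * half + 1, x - d:x - d + 2 * half + 1]
                cost = np.abs(wl - wr).sum()        # sum of absolute differences
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

# Synthetic pair: the right image is the left image shifted 2 px to the left,
# so the true disparity of interior pixels is 2.
rng = np.random.default_rng(0)
left = rng.integers(0, 255, size=(12, 20))
right = np.roll(left, -2, axis=1)
d = block_match_disparity(left, right, max_disp=5)
```

A real pipeline would replace this O(h·w·d) loop with a global method; the sketch only shows the disparity-as-horizontal-offset convention used in the rest of the description.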
Step 2: add foreground and background cues.
Through the designed interface, the user specifies part of the foreground and background in either image. The embodiment adopts an approach similar to that used in "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs" (Price et al., ICCV 2011): using an input device such as a mouse, touch screen, or stylus, the user draws strokes of different colors on the image to mark foreground and background pixels. As shown in Fig. 2(e), pixels covered by the red strokes belong to the foreground and pixels covered by the blue strokes belong to the background. The subsequent steps do not depend on this particular marking scheme; other ways of specifying foreground and background pixels may also be used.
Step 3: build prior color and disparity models of the foreground and background.
Let F denote the set of user-marked foreground pixels and B the set of user-marked background pixels. The prior color and disparity models of the foreground and background can be expressed as GMMs, histograms, or sets of clusters, obtained by fitting or aggregating the colors of the corresponding pixel sets. The embodiment applies K-means clustering to the color values of the pixels in F and in B, yielding N_c foreground color clusters and M_c background color clusters, which serve as the statistical models of the foreground and background color distributions. In the same way, the disparity values of the pixels in F and B are clustered, yielding N_d foreground disparity clusters and M_d background disparity clusters, which serve as the statistical models of the foreground and background disparity distributions. The suggested setting is N_c = M_c = N_d = M_d = 64.
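The clustering of Step 3 can be sketched with a plain Lloyd's K-means. The function `kmeans_centers`, the toy stroke colors, and k = 2 (the text suggests 64 in practice) are illustrative assumptions.

```python
import numpy as np

def kmeans_centers(samples, k, iters=10):
    """Plain Lloyd's K-means returning cluster centers only, with centers
    initialized from the first k samples; a minimal stand-in for the
    clustering step that builds the color/disparity prior models."""
    samples = np.asarray(samples, dtype=np.float64)
    centers = samples[:k].copy()
    for _ in range(iters):
        # assign every sample to its nearest center ...
        dist = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # ... and move each non-empty cluster's center to its members' mean
        for j in range(k):
            if np.any(labels == j):
                centers[j] = samples[labels == j].mean(axis=0)
    return centers

# Hypothetical stroke data: RGB colors of user-marked foreground pixels.
F_colors = [[250, 10, 10], [10, 10, 240], [245, 20, 15], [5, 15, 250]]
fg_color_model = kmeans_centers(F_colors, k=2)   # foreground color clusters
```

The same routine would be run on B's colors and on the disparity values of F and B to obtain the four cluster sets named in the text.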
Step 4: define the energy function.
The stereo image pair I = {I_l, I_r}, comprising the left image I_l and the right image I_r, can be represented as an undirected graph G = <ν, ε>, where ν is the set of nodes of G and ε the set of edges. Each vertex of G corresponds to a pixel of I. All pixels not in F or B form the set U. Interactive synchronized stereo segmentation assigns, under the constraints of the input strokes, a label x_i to each pixel p_i in U, with x_i ∈ {1, 0} denoting foreground and background, respectively. The edges of G comprise the edges connecting neighboring pixels within each image, and the edges connecting corresponding points of the two images as determined by the disparity maps.
The stereo image synchronized segmentation problem above is formulated as minimization of the following target energy function:

E(X) = λ_Unary Σ_{p_i ∈ U} f_Unary(p_i, x_i) + λ_Intra Σ_{(p_i, p_j) ∈ N_Intra} f_Intra(p_i, p_j)·[x_i ≠ x_j] + λ_Inter Σ_{(p_i, p_j) ∈ C_Inter} f_Inter(p_i, p_j)·[x_i ≠ x_j]    (1)
Here f_Unary(p_i, x_i) is the unary term (also called the data term), measuring the similarity between the color and disparity of pixel p_i and the foreground/background color and disparity statistical models; the higher the similarity between the pixel and the model of label x_i, the smaller f_Unary(p_i, x_i). f_Intra(p_i, p_j) is the intra-plane binary term, reflecting the difference between every pixel of I and its neighbors (4- or 8-neighborhood); N_Intra is the set of all adjacency relations within the left and right images. The larger the difference between two neighboring pixels, the smaller this term, so by the principle of the graph-cut algorithm the two pixels then tend to take different labels. f_Inter(p_i, p_j) is the inter-plane binary term, which encodes the matching of corresponding points: the higher the matching degree, the larger the term; C_Inter is the set of left-right pixel correspondences. λ_Unary, λ_Intra, and λ_Inter are weights balancing the energy terms.
(1) Define the unary term
The unary term consists of a color part and a disparity part, defined as follows:
f_Unary(p_i, x_i) = λ_c (1 − P_c(x_i | c_i)) + λ_d (1 − P_d(x_i | d_i))    (2)
Here P_c(x_i | c_i) is the probability that pixel p_i takes the foreground or background label x_i given its color c_i. Since a larger probability should yield a smaller energy, 1 − P_c is used as the color unary term. Similarly, P_d(x_i | d_i) is the probability that p_i takes label x_i given its disparity value d_i, and 1 − P_d is used as the disparity unary term. λ_c and λ_d are the weights of the color and disparity contributions, with λ_c + λ_d = 1.
The invention represents the foreground and background color and disparity models as clusters (N_c foreground color clusters, M_c background color clusters, N_d foreground disparity clusters, and M_d background disparity clusters) and computes the unary term as follows.
The color unary term is computed as follows. The color of each unlabeled pixel is compared with the foreground and background color clusters, and the minimum distance to the cluster centers is found; this distance describes the similarity between the pixel color and the foreground or background colors. The smaller the distance to the foreground (or background) colors, the more similar the colors, and by graph-cut theory the more the pixel tends to take the foreground (or background) label. The color unary term is described mathematically as:

P_c(x_i = 1 | c_i) = d_B(c_i) / (d_F(c_i) + d_B(c_i)),    P_c(x_i = 0 | c_i) = 1 − P_c(x_i = 1 | c_i)    (3)

where d_F(c_i) and d_B(c_i) denote the minimum distances from the color c_i of pixel p_i to the foreground and background color cluster centers, respectively:

d_F(c_i) = min_{1 ≤ n ≤ N_c} || c_i − K_n^F ||,    d_B(c_i) = min_{1 ≤ m ≤ M_c} || c_i − K_m^B ||
The disparity unary term is computed in the same way as the color unary term, using the disparity clusters.
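The minimum-distance rule above can be sketched as follows. Normalizing the two minimum distances into probabilities is one plausible reading of Eq. (3), an assumption rather than the patent's exact formula; the single-cluster models are toy data.

```python
import numpy as np

def color_unary(c, fg_centers, bg_centers):
    """Unary color costs (cost_fg, cost_bg) for a pixel color c, following the
    minimum-cluster-distance rule of Step 4(1)."""
    c = np.asarray(c, dtype=np.float64)
    d_f = min(np.linalg.norm(c - np.asarray(k, dtype=np.float64)) for k in fg_centers)
    d_b = min(np.linalg.norm(c - np.asarray(k, dtype=np.float64)) for k in bg_centers)
    if d_f + d_b == 0.0:          # degenerate case: both distances are zero
        return 0.5, 0.5
    p_fg = d_b / (d_f + d_b)      # close to the foreground model => high P(fg)
    p_bg = d_f / (d_f + d_b)
    return 1.0 - p_fg, 1.0 - p_bg  # energy costs: 1 - P(label), as in Eq. (2)

# A reddish pixel against a hypothetical red foreground / blue background model:
cost_fg, cost_bg = color_unary([240, 20, 20], [[250, 10, 10]], [[10, 10, 250]])
```

The disparity unary term would reuse the same routine on scalar disparity values and the disparity cluster centers.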
(2) Define the intra-image binary term
The intra-image binary term f_Intra(p_i, p_j) consists of two factors describing, respectively, the color variation (color gradient) and the disparity variation (disparity gradient) around a pixel, defined as follows:
f_Intra(p_i, p_j) = f_c(p_i, p_j) · f_d(p_i, p_j)    (4)
Here f_c(p_i, p_j) measures the color similarity between neighboring pixels: the more similar the colors, the larger its value, and by the principle of the graph-cut algorithm the less likely the boundary is to pass between the two pixels. f_d(p_i, p_j) measures the disparity similarity of pixel p_i with respect to its neighbor p_j; again, the closer the disparities, the larger the value and the less likely the boundary passes between the two. The suggested forms of the two factors are:

f_c(p_i, p_j) = 1 / (1 + || c_i − c_j ||)    (5)

f_d(p_i, p_j) = 1 / (1 + | d_i − d_j |)    (6)

where c_i, c_j and d_i, d_j are the colors and disparities of the two neighboring pixels.
The two factors may also take other forms, such as the exponential form used in "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs" (Price et al., ICCV 2011).
In practice, disparity estimates contain errors, and using them directly would introduce those errors into the segmentation. The remedy proposed by the invention is to replace the pairwise disparity difference with the disparity variation of a local region. Let S_j denote the region containing p_i; the disparity variation within S_j is measured by the variance var(S_j). Equation (6) then becomes:

f_d(p_i, p_j) = 1 / (1 + var(S_j))    (7)
where S_j = A(p_i), and the function A(p_i) returns the region containing pixel p_i. Image regions can be obtained by an over-segmentation method, or by dividing the image into a set of small square blocks in advance.
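The region-variance idea can be sketched with fixed square blocks (one of the two region choices named above; an over-segmentation could be substituted). The reciprocal form 1/(1 + var) follows Eq. (7) as reconstructed and should be treated as an assumption.

```python
import numpy as np

def region_variance_fd(disp, block=4):
    """Per-pixel disparity smoothness factor f_d = 1/(1 + var(S_j)): every
    pixel inherits the disparity variance of its square block S_j, so isolated
    per-pixel disparity errors inside a flat region do not create edges."""
    h, w = disp.shape
    fd = np.empty((h, w), dtype=np.float64)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            region = disp[by:by + block, bx:bx + block].astype(np.float64)
            # every pixel of the region shares the region's variance statistic
            fd[by:by + block, bx:bx + block] = 1.0 / (1.0 + region.var())
    return fd

disp = np.zeros((8, 8))
disp[:, 3:] = 5.0              # a depth discontinuity crossing the left blocks
fd = region_variance_fd(disp, block=4)
```

Blocks straddling the discontinuity get a small f_d (a cheap boundary), while flat blocks get f_d = 1 (an expensive boundary), which is the intended behavior of the term.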
(3) Define the inter-image binary term
The inter-image binary term constrains corresponding pixels of the stereo pair to take the same label. It is defined as:

f_Inter(p_i^l, p_j^r) = C(p_i^l, p_j^r)    (8)

where C expresses the likelihood that pixels p_i^l and p_j^r of the stereo pair are corresponding points; it is an asymmetric function:

C(p_i^l, p_j^r) = P(p_i^l, p_j^r) · g(p_i^l, p_j^r)    (9)
P(p_i^l, p_j^r) is the probability distribution function, based on the disparity map, of p_i^l and p_j^r being corresponding points. The function W(p_i^l) gives the corresponding point of left-image pixel p_i^l in the right image, the correspondence being determined by the disparity map. "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs" (Price et al., ICCV 2011) mentions five ways of defining this distribution: the uniform distribution, the Delta function, the probability density function, the consistent Delta function, and the consistent probability density function. Because the uniform distribution, probability density function, and consistent probability density function are computationally expensive, the invention suggests using the Delta function or the consistent Delta function, defined as follows. The Delta function uses only the disparity map of a single image. Let {d_l} be the disparity set of the left image; the Delta function is defined as:

P(p_i^l, p_j^r) = δ( p_j^r − W(p_i^l) )    (10)

where d_l(p_i^l) ∈ {d_l} is the disparity between pixel p_i^l in the left image and its corresponding point p_j^r in the right image, so that W(p_i^l) = p_i^l − d_l(p_i^l). Let {d_r} be the disparity set of the right image; the consistent Delta function additionally requires the right-to-left correspondence to agree:

P(p_i^l, p_j^r) = δ( p_j^r − W(p_i^l) ) · δ( p_i^l − (p_j^r + d_r(p_j^r)) )    (11)

where d_r(p_j^r) is the disparity between pixel p_j^r in the right image and its corresponding point in the left image.
The factor g in Eq. (9) denotes the color similarity between p_i^l and p_j^r. If the disparities were perfectly accurate, corresponding pixels would have identical colors; since current disparity estimation methods contain errors, the suggested form is:

g(p_i^l, p_j^r) = 1 / (1 + || c_i^l − c_j^r ||)    (12)

where c_i^l is the color value of left-image pixel p_i^l and c_j^r is the color value of its corresponding point p_j^r in the right image.
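The consistent Delta test of Eq. (11) can be sketched on a single scanline. The sign convention x_r = x_l − d and the function name are assumptions for the sketch.

```python
import numpy as np

def consistent_delta(x_l, x_r, d_left, d_right, y):
    """Consistent Delta function: left-image column x_l and right-image column
    x_r on scanline y are accepted as corresponding only if the left disparity
    maps x_l onto x_r AND the right disparity maps x_r back onto x_l."""
    forward = (x_r == x_l - d_left[y, x_l])    # left -> right agreement
    backward = (x_l == x_r + d_right[y, x_r])  # right -> left agreement
    return 1.0 if (forward and backward) else 0.0

# Hypothetical constant-disparity scanline: everything is shifted by 2 px.
d_left = np.full((1, 8), 2)
d_right = np.full((1, 8), 2)
```

Only the mutually consistent pairs receive a nonzero P, so an inter-image edge (and hence the f_Inter penalty) is created only for left-right consistent correspondences.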
Step 5: minimize the energy function.
The invention uses a graph-cut algorithm, for example the max-flow/min-cut algorithm proposed by Yuri Boykov et al. in "An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision" (IEEE Transactions on PAMI, 2004), to minimize the energy function defined by Eq. (1) and obtain the optimal labeling, i.e., the segmentation result. If the user is not satisfied with the result, the process returns to Step 2 to add more foreground/background strokes; each added stroke triggers a complete segmentation pass.
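The minimization step can be illustrated on a toy two-pixel graph with a plain Edmonds-Karp max-flow solver, a minimal stand-in for the Boykov-Kolmogorov algorithm cited above; the capacities are made-up numbers standing for unary and pairwise costs.

```python
from collections import deque

def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp max-flow on a dict-of-dicts capacity graph; returns the
    set of nodes on the source side of a minimum cut (= foreground labels)."""
    r = {u: dict(v) for u, v in cap.items()}     # residual capacities
    for u in cap:
        for v in cap[u]:
            r.setdefault(v, {}).setdefault(u, 0)  # reverse residual edges
    while True:
        parent = {s: None}                        # BFS for a shortest augmenting path
        frontier = deque([s])
        while frontier and t not in parent:
            u = frontier.popleft()
            for v, c in r[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    frontier.append(v)
        if t not in parent:
            break
        path, v = [], t                           # walk back from t to s
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(r[u][v] for u, v in path)
        for u, v in path:                         # augment along the path
            r[u][v] -= bottleneck
            r[v][u] += bottleneck
    side, frontier = {s}, deque([s])              # source side = residual-reachable
    while frontier:
        u = frontier.popleft()
        for v, c in r[u].items():
            if c > 0 and v not in side:
                side.add(v)
                frontier.append(v)
    return side

# Toy graph: two pixels p, q; terminal edges encode unary costs, the p-q edges
# encode the pairwise (smoothness) cost.  Source side = foreground.
cap = {
    "s": {"p": 8, "q": 2},    # cost of labeling p (resp. q) background
    "p": {"q": 3, "t": 1},    # p-q smoothness; cost of labeling p foreground
    "q": {"p": 3, "t": 9},
    "t": {},
}
fg = max_flow_min_cut(cap, "s", "t")
```

Here p ends up on the source (foreground) side and q on the sink side, matching the cheaper cut; a real implementation would build one node per pixel of both images with the Eq. (1) weights.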
The effectiveness of the proposed method is illustrated by comparison with the method of "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs" (Price et al., ICCV 2011). Both methods use the consistent Delta function (Eq. (11)) as the probability distribution function between corresponding points. Fig. 2 compares the results: Figs. 2(a) and (b) are the input left and right images; (c) and (d) are the results of the StereoCut method; (e) and (f) are the segmentation results of the present invention. In the four result images, the closed curve around the target object (a street lamp) marks the boundary of the segmented object. The user inputs used by the two methods are shown in (c) and (e), respectively: solid strokes inside the object mark the foreground, and dashed strokes outside the object mark the background. Comparing (c), (d) with (e), (f) shows that, with the same amount of interaction, the present method achieves a better segmentation; the comparison method would require the user to specify more foreground and background to obtain comparable results. The proposed method is therefore more intelligent than the comparison method.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410328103.0A CN104091336B (en) | 2014-07-10 | 2014-07-10 | Stereoscopic image synchronous segmentation method based on dense disparity map |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104091336A CN104091336A (en) | 2014-10-08 |
CN104091336B true CN104091336B (en) | 2017-05-17 |
Legal Events

Code | Title
---|---
C06 / PB01 | Publication
C10 / SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
OL01 | Intention to license declared
EE01 | Entry into force of recordation of patent licensing contract

Open-license recordations (assignor: Beijing University of Technology; application publication date 20141008; granted publication date 20170517):
- Beijing Fu Chain Technology Co.,Ltd., contract no. X2024980040825, recorded 20241225
- Beijing Feiwang Technology Co.,Ltd., contract no. X2024980041938, recorded 20241226
- Beijing Yaote Xiaohong Technology Co.,Ltd., contract no. X2024980042145, recorded 20241226
- Beijing Longxin Shengguang Technology Co.,Ltd., contract no. X2024980042731, recorded 20241227
- Beijing Juchuan Yingcai Technology Co.,Ltd., contract no. X2024980043270, recorded 20241231