CN105046689B - Fast interactive stereo image segmentation method based on a multi-level graph structure - Google Patents
Fast interactive stereo image segmentation method based on a multi-level graph structure
- Publication number
- CN105046689B CN105046689B CN201510354774.9A CN201510354774A CN105046689B CN 105046689 B CN105046689 B CN 105046689B CN 201510354774 A CN201510354774 A CN 201510354774A CN 105046689 B CN105046689 B CN 105046689B
- Authority
- CN
- China
- Prior art keywords
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20228—Disparity calculation for image-based rendering
Landscapes
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
A fast interactive stereo image segmentation method based on a multi-level graph structure. First, a pair of stereo images is input and a disparity map is computed by a stereo matching algorithm. Partial foreground and background regions are marked in either the left or the right image. From the marked pixels, prior statistical models of the foreground and background colors and disparity distributions are built using CUDA parallel computing. Gaussian filtering and downsampling of the original images yield smaller coarse-scale images, which are combined with the original images to form a multi-level graph structure. Existing stereo image segmentation suffers from complex segmentation models and low computational efficiency. Within the theoretical framework of disparity-map-based simultaneous stereo segmentation, the invention explores a new segmentation method that simplifies the model, processes computation-intensive tasks in parallel, raises segmentation speed, and achieves real-time segmentation of common-size stereo images.
Description
Technical Field
The invention belongs to the intersecting fields of image processing, computer graphics and computer vision, and relates to a fast interactive stereo image segmentation method based on a multi-level graph structure.
Background Art
In recent years 3D technology has developed rapidly, from 3D television to 3D cinema, creating an urgent need for 3D content creation and 3D editing tools. Interactive stereo image segmentation is one of the important tasks: it is a key processing step in many applications such as object recognition and tracking, image classification, image editing and image reconstruction. Stereo image segmentation is already applied in practice, for example to the segmentation and analysis of organs in medical images, object tracking, and scene understanding. The efficiency of stereo image segmentation has therefore become an important research direction.
Compared with single-image segmentation, intelligent interactive stereo image segmentation started late. Current segmentation methods face two main challenges: accuracy and speed. The two pull in opposite directions, and a good balance between them is hard to achieve. Much effort has gone into improving accuracy. In "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs" (ICCV 2011), Price et al. use the disparity information between a stereo pair to improve the accuracy of stereo segmentation. Their method folds the color, gradient and disparity of every pixel into the traditional graph-cut framework and obtains the optimized stereo boundary by solving a maximum flow. Although its accuracy is high, the segmentation graph it builds has a huge number of nodes and edges, so the computation is complex and inefficient. Most current algorithms raise segmentation speed by changing the implementation details of the graph-cut algorithm; for stereo images, with their large pixel counts and complex edge structure, this cannot solve the problem at its root. At the same time, stereo segmentation contains many computation-intensive single-instruction-multiple-data tasks. Traditional methods do not exploit their inherent parallelism: processing them serially wastes a great deal of time and keeps segmentation slow.
Summary of the Invention
Existing stereo image segmentation suffers from complex segmentation models and low computational efficiency. Within the theoretical framework of disparity-map-based simultaneous stereo segmentation, the invention explores a new segmentation method that simplifies the model, processes computation-intensive tasks in parallel, raises segmentation speed, and achieves real-time segmentation of common-size stereo images.
To achieve this goal, the technical solution of the invention is as follows. First, a pair of stereo images is input and a disparity map is computed by a stereo matching algorithm. Partial foreground and background regions are marked in either the left or the right image. From the marked pixels, prior statistical models of the foreground and background colors and disparity distributions are built using CUDA parallel computing. Gaussian filtering and downsampling of the original images yield smaller coarse-scale images, which together with the original images form a multi-level graph structure. On this basis, the color, gradient and disparity constraints of the multi-level graph are formalized within the graph-cut framework and an energy function is constructed. To improve efficiency, the graph-construction process is likewise parallelized with CUDA. The maximum-flow/minimum-cut algorithm is applied to obtain the globally optimal labeling of the multi-level graph. Pixels with large errors near the boundary are then collected and locally optimized using traditional graph-cut theory. The global and local results are fused into the final segmentation. If the user is not satisfied, the erroneous regions can be marked again until the desired result is obtained.
Compared with the prior art, the invention has the following advantages. By building a stereo segmentation model on a multi-level graph structure, it simplifies the edge structure and markedly increases processing speed. In addition, computation-intensive single-instruction-multiple-data tasks are processed in parallel with CUDA, saving a large amount of time. Experiments show that, for the same amount of user interaction, the method can significantly increase segmentation speed while segmentation accuracy and consistency change little.
Brief Description of the Drawings
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 shows experimental results of an application example of the invention: (a), (b) are the input left and right images; (c), (d) are the segmentation results of the method of Price et al., "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs" (ICCV 2011); (e), (f) are the segmentation results of the invention. The user input used by the two methods is shown in (c) and (e), where the first line marks the foreground and the second line marks the background. The accuracy and running time of both methods are also given. The notebook computer used for testing in this embodiment has an Intel(R) Pentium(R) CPU B950 @ 2.10 GHz and an NVIDIA GeForce GT 540M GPU.
Detailed Description
The invention is further described below with reference to the drawings and specific embodiments.
The flow of the invention is shown in Fig. 1 and comprises the following steps:
Step 1: Match the stereo images.
Read in a pair of stereo images I = {I_l, I_r}, where I_l and I_r denote the left and right images respectively. Compute the disparity maps of the left and right images, denoted D_l and D_r, with a stereo matching algorithm. The stereo matching algorithm used is the one proposed by Felzenszwalb et al. in "Efficient Belief Propagation for Early Vision" (CVPR 2004).
Step 2: Add foreground and background cues.
Through the designed interface, the user marks partial foreground and background regions in either image. The embodiment adopts an approach similar to that of Price et al. in "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs" (ICCV 2011): with an input device such as a mouse, touch screen or stylus, strokes of different colors are drawn on the image to mark some foreground and background pixels. As shown in Fig. 2(e), pixels covered by the first line belong to the foreground and pixels covered by the second line belong to the background. The subsequent steps place no restriction on how the foreground and background pixels are specified; other methods may also be used.
Step 3: Build the prior color and disparity models of the foreground and background.
Let F denote the set of user-marked foreground pixels and B the set of user-marked background pixels. The prior color and disparity models of the foreground and background may be expressed as GMMs, histograms, or sets of clusters. The invention uses the cluster form, obtained by clustering the colors and disparities of the corresponding pixel sets. To increase speed, a CUDA-parallel k-means algorithm clusters the color values and disparity values of the pixels in F and B separately. For the color model the procedure is: each thread handles one pixel, computes its distance to all foreground and background cluster centers, selects the nearest one, and assigns the pixel to that cluster. This yields N_c foreground color clusters and M_c background color clusters, which form the statistical models of the foreground and background color distributions. In the same way, the disparity values of the pixels in F and B are clustered into N_d foreground disparity clusters and M_d background disparity clusters, the statistical models of the foreground and background disparity distributions. In this embodiment N_c = M_c = 64 and N_d = M_d = 16.
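As an illustration, the cluster-model construction can be sketched as follows. This is a minimal CPU stand-in for the CUDA kernel described above, written with vectorized NumPy in place of per-thread execution; the deterministic seeding and iteration count are assumptions, not the patent's implementation.

```python
import numpy as np

def build_cluster_model(pixels, n_clusters, n_iters=20):
    """Cluster scribble-pixel feature vectors (colors or disparities) with
    a vectorized k-means; each row of `pixels` plays the role of one CUDA
    thread's pixel. Deterministic seeding along the sorted feature range
    replaces random initialization for reproducibility (an assumption)."""
    pixels = np.asarray(pixels, dtype=float)
    order = np.argsort(pixels.sum(axis=1))
    seed_idx = np.linspace(0, len(pixels) - 1, n_clusters).astype(int)
    centers = pixels[order][seed_idx].copy()
    for _ in range(n_iters):
        # each "thread": distance of one pixel to every center, keep nearest
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = pixels[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers
```

The same routine serves both the color model (3-D vectors, 64 clusters per side) and the disparity model (1-D values, 16 clusters per side).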
Step 4: Global optimization based on the multi-level graph structure.
Within an image, the foreground and background distributions are each relatively compact: pixel differences inside the foreground and inside the background are small, while differences at the boundary are large. Exploiting this property, a representative pixel is used to stand for all pixels of its neighborhood. The method obtains the representative pixels by Gaussian filtering and downsampling, producing smaller coarse-scale images. The coarse images are fused with the original images to form a multi-level graph structure, on which the global optimization is carried out. Denote the original stereo pair by I = {I_l, I_r} and the coarse stereo pair by I_τ = {I_{l,τ}, I_{r,τ}}, where the subscripts l and r indicate the left and right images. The original and coarse stereo images are jointly represented as an undirected graph G = <ν, ε>, where ν is the set of nodes of G and ε the set of edges; each vertex of G corresponds to one pixel of the stereo images I and I_τ. Fast interactive stereo segmentation assigns, under the constraint of the input strokes, a label x_i ∈ {1, 0} to every pixel p_i of the original stereo pair, denoting foreground and background respectively. The edges of G comprise the edges connecting each pixel to the source and sink, the edges between adjacent pixels within an image, and the edges between corresponding points of the stereo pair determined by the disparity map; they also include the edges between parent and child nodes of the coarse layer and the original images. Since the coarse layer is obtained by downsampling the original layer, one coarse-layer pixel represents the pixels of an N_l × N_l region of the original image I before sampling; in this embodiment N_l = 3.
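The coarse-layer construction (Gaussian filtering, downsampling, and the parent-child mapping between layers) can be sketched as follows; the kernel width and the grayscale input are simplifying assumptions.

```python
import numpy as np

def build_coarse_layer(img, n_l=3, sigma=1.0):
    """Gaussian-filter then downsample `img` (H x W, grayscale for brevity)
    so that each coarse pixel represents an n_l x n_l block of the original,
    as in the multi-level graph construction."""
    # separable Gaussian kernel
    r = int(3 * sigma)
    xs = np.arange(-r, r + 1)
    k = np.exp(-xs**2 / (2 * sigma**2))
    k /= k.sum()
    pad = np.pad(img.astype(float), r, mode='edge')
    # horizontal pass, then vertical pass
    blur = np.apply_along_axis(lambda row: np.convolve(row, k, 'valid'), 1, pad)
    blur = np.apply_along_axis(lambda col: np.convolve(col, k, 'valid'), 0, blur)
    # keep every n_l-th pixel of the smoothed image as the coarse layer
    return blur[::n_l, ::n_l]

def parent_of(y, x, n_l=3):
    """Coarse-layer parent of original pixel (y, x): the pixel whose
    n_l x n_l block contains it."""
    return (y // n_l, x // n_l)
```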
Solving the above fast stereo segmentation problem on the multi-level graph structure is formulated as minimizing the following target energy function:
The first term is the unary (data) term, expressing the similarity of the color and disparity of a coarse-layer pixel to the foreground and background color and disparity statistical models; the higher the similarity, the larger the value. The second is the intra-image binary term of a coarse image, reflecting the difference between every coarse-layer pixel and its four neighbors, where Ν_intra denotes the set of all adjacency relations among the pixels of the left and right coarse images; the larger the difference, the smaller the term and, by the principle of the graph-cut algorithm, the more the neighboring pixels tend to take different labels. The third is the binary term between the coarse images, defined by the matching of corresponding points: the higher the matching degree, the larger the term, where Ν_inter denotes the set of correspondences between left and right coarse-layer pixels.
The fourth is the binary constraint between the coarse layer and the original image, expressing the similarity of parent and child nodes: the smaller their difference, the larger the value and the less likely the boundary passes between them. Ν_paternity denotes the set of parent-child correspondences. The weights w_unary, w_intra, w_inter and w_paternity balance the energy terms; w_unary = 1, w_intra = 4000, w_inter = 8000, w_paternity = 1000000.
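The displayed formula (1) is not reproduced in this text. From the term and weight definitions above, the energy plausibly has the usual weighted-sum graph-cut form sketched below; the symbol names E_unary, E_intra, E_inter and E_paternity are reconstructions, not the original notation.

```latex
E(X) = w_{unary} \sum_{p \in \nu_{\tau}} E_{unary}(x_p)
     + w_{intra} \sum_{(p,q) \in \mathcal{N}_{intra}} E_{intra}(x_p, x_q)
     + w_{inter} \sum_{(p,q) \in \mathcal{N}_{inter}} E_{inter}(x_p, x_q)
     + w_{paternity} \sum_{(p,c) \in \mathcal{N}_{paternity}} E_{paternity}(x_p, x_c)
\tag{1}
```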
(1) Define the unary constraint term
The unary constraint term comprises a color unary term and a disparity unary term, defined as follows:
where P_c denotes the probability that the color of a given pixel takes the foreground or background label; since a larger probability should give a smaller energy, 1 − P_c is taken as the color unary term. Likewise, P_d denotes the probability that the disparity value of a given pixel takes the foreground or background label, and 1 − P_d is taken as the disparity unary term. w_c and w_d are the influence weights of color and disparity, with w_c + w_d = 1.
The method represents the foreground and background color and disparity models as clusters: N_c foreground color clusters, M_c background color clusters, N_d foreground disparity clusters and M_d background disparity clusters. The computation of the unary term is given below.
The color unary term is computed with a CUDA-based parallel method. The color values of all pixels are transferred from the CPU to the GPU. On the GPU all pixels are processed in parallel, each thread representing one unlabeled pixel. The threads are mutually independent; every thread simultaneously computes the distances from its pixel color to the cluster centers of the foreground and background color models and finds the minimum. This minimum distance describes the similarity between the pixel color and the foreground or background colors: the smaller the distance, the more similar the color and, by graph-cut theory, the more the pixel tends toward the corresponding foreground or background label. When all threads finish, the per-pixel results are transferred back to the CPU, where the detailed graph construction takes place. The mathematical form of the color unary term is:
where the two quantities denote the minimum distances from the color of a pixel to the cluster centers of the foreground and background color models, respectively.
The disparity unary term is computed in the same way as the color unary term.
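A plausible reading of the unary computation is sketched below: each pixel's minimum distance to the foreground and background cluster centers is converted into foreground and background costs. The normalization d_B / (d_F + d_B) is an assumption; the patent elides the exact expressions.

```python
import numpy as np

def color_unary(colors, fg_centers, bg_centers):
    """Vectorized stand-in for the per-pixel CUDA kernel: for each pixel
    color, find the minimum distance to the foreground and the background
    cluster centers, then turn the two distances into foreground/background
    unary costs. The d_B/(d_F+d_B) normalization is assumed, not the
    patent's exact formula."""
    d_f = np.linalg.norm(colors[:, None, :] - fg_centers[None], axis=2).min(axis=1)
    d_b = np.linalg.norm(colors[:, None, :] - bg_centers[None], axis=2).min(axis=1)
    p_fg = d_b / (d_f + d_b + 1e-12)   # near a FG center -> high FG probability
    return 1.0 - p_fg, p_fg            # cost of labeling FG, cost of labeling BG
```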
(2) Define the intra-image binary constraint term
The intra-image binary constraint term contains two parts describing the color change and the disparity change around a pixel, i.e. the color gradient and the disparity gradient, defined as follows:
The first part expresses the similarity of colors between adjacent pixels: the closer the colors, the larger the value and, by the principle of the graph-cut algorithm, the smaller the probability that the boundary passes between them. The second part expresses the similarity of a pixel's disparity to that of its adjacent pixel: the closer the two disparities, the larger the value and the smaller the probability that the two take different labels. To reduce the error introduced by the disparity, the disparity used in this term is the coarse-layer disparity obtained by Gaussian filtering and downsampling. The two parts are defined as follows:
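The displayed definitions are not reproduced in this text. The exponential contrast form below is the usual graph-cut choice and is offered only as a hedged sketch of one neighboring pair's weight; the sigma values and the weighted sum of the two parts are assumptions.

```python
import math

def intra_weight(c_p, c_q, d_p, d_q, sigma_c=10.0, sigma_d=2.0, w_c=0.5, w_d=0.5):
    """Sketch of the intra-image binary term for one neighboring pair.
    The contrast form exp(-diff^2 / (2*sigma^2)) is the customary graph-cut
    choice and is assumed here; the patent elides the formula. Similar
    colors/disparities give a large weight, discouraging a cut between
    the two pixels."""
    dc = sum((a - b) ** 2 for a, b in zip(c_p, c_q))
    color_term = math.exp(-dc / (2 * sigma_c ** 2))
    disp_term = math.exp(-(d_p - d_q) ** 2 / (2 * sigma_d ** 2))
    return w_c * color_term + w_d * disp_term
```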
(3) Define the inter-image binary constraint term
The inter-image binary term constrains corresponding pixels across the two images to take the same label, and is defined as follows:
where C expresses the likelihood that two pixels of the stereo pair are corresponding points and is an asymmetric function:
The first factor is the probability distribution function, determined from the disparity map, that two pixels are corresponding points; it maps a left coarse-layer pixel to its corresponding point on the right coarse layer, the correspondence being decided by the original disparity map. A consistent Delta function is adopted, defined as follows:
where the first quantity is the disparity of a pixel in the left coarse layer with respect to its corresponding point in the right image, and the second is the disparity of a pixel in the right coarse layer with respect to its corresponding point in the left image. To determine the left-right pixel correspondence more reliably, the disparities of the unprocessed original disparity map are used here.
The second factor in formula (8) expresses the probability that the colors of the two pixels are similar. On its own it would suffice only if the disparity were perfectly accurate; since current disparity computation contains errors, the disparity term is discarded in order to determine the left-right correspondence more reliably, and only the color term is used, in the following form:
where the first quantity is the color value of the left coarse-layer pixel and the second is the value of its corresponding point in the right coarse layer.
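The correspondence test can be sketched as follows, assuming the usual rectified-pair convention that a left pixel (x, y) with disparity d maps to (x − d, y); the exact tolerance of the consistent Delta function is not given in the text, and exact equality is assumed.

```python
def corresponding_point(x, y, disp_left):
    """Left-to-right correspondence from the disparity map: a left pixel
    (x, y) with disparity d maps to (x - d, y) in the right image
    (the rectified-pair convention, assumed here)."""
    return x - disp_left[y][x], y

def consistent_delta(x, y, disp_left, disp_right):
    """Sketch of the 'consistent Delta function': the pair is accepted as
    corresponding points only if the left and right disparities agree
    (a left-right consistency check; exact equality is an assumption)."""
    xr, yr = corresponding_point(x, y, disp_left)
    if not (0 <= xr < len(disp_right[0])):
        return 0
    return 1 if disp_left[y][x] == disp_right[yr][xr] else 0
```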
(4) Define the parent-child constraint between the layers
The final segmentation result must be expressed at the pixel layer. To propagate the coarse-layer result to the pixel layer while keeping parent and child pixels consistent between the upper and lower layers, the parent-child constraint between the layers is defined as:
This term expresses the similarity between parent and child pixels of the upper and lower layers. Since a coarse-layer pixel represents all pixels of an N_l × N_l region of the original pixel layer, its label stands for the labels of all pixels of the corresponding region; the edge weight between parent and child pixels is therefore defined as infinite. Edges between pixels that are not in a parent-child relation are not considered.
(5) Minimize the energy function
Because the parent-child constraint is defined as infinite, a parent-child edge is never cut and the label of the parent node passes directly to its children. Computing the parent-child edges explicitly would consume a large amount of memory and add computation time, so in the actual optimization they are not constructed in detail. A graph-cut algorithm, for example the maximum-flow/minimum-cut algorithm proposed by Yuri Boykov et al. in "An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision" (IEEE Transactions on PAMI, 2004), is applied to minimize the energy function defined by the invention (formula (1)), yielding the optimal labeling, i.e. the coarse-layer segmentation. The labels of the coarse-layer pixels then directly determine the labels of the corresponding regions of the pixel layer. This significantly increases segmentation speed while leaving accuracy unchanged. However, because the coarse-layer labels are passed directly to the pixel layer, boundary pixels whose neighborhoods differ strongly incur larger errors. To improve accuracy, the points with large errors at the boundary are collected and locally optimized.
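For illustration, a didactic Edmonds-Karp max-flow/min-cut solver is sketched below in place of the Boykov-Kolmogorov algorithm cited above; after the flow saturates, the nodes still reachable from the source form the source (foreground) side of the minimum cut.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow on an adjacency-matrix graph `cap`; a small
    didactic stand-in for the min-cut/max-flow solver cited in the text.
    Returns the reachability list of the residual graph: True entries lie
    on the source side of the minimum cut (foreground label)."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    while True:
        # BFS for an augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break
        # bottleneck along the path, then augment
        v, bottleneck = t, float('inf')
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
    # min-cut side: nodes reachable from s in the final residual graph
    seen = [False] * n
    seen[s] = True
    q = deque([s])
    while q:
        u = q.popleft()
        for v in range(n):
            if not seen[v] and cap[u][v] - flow[u][v] > 0:
                seen[v] = True
                q.append(v)
    return seen
```

As a toy usage, a graph with source 0, two pixel nodes 1 and 2, and sink 3, where node 1 has a strong source (foreground) link and node 2 a strong sink link, is cut between the two pixels.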
Step 5: Local optimization at the boundary based on the original images.
The global optimization of step 4 yields a rough segmentation boundary. Since a coarse-layer pixel corresponds to the set of pixels of an N_l × N_l region of the original pixel layer, its label is passed directly to that N_l × N_l region; in this embodiment N_l = 3. At the boundary, where neighboring pixels differ strongly, assigning the coarse-pixel label directly to all pixels of the region would introduce a large error. A separate local optimization is therefore performed at the boundary.
Before the local optimization, the local boundary information is collected. The rough segmentation boundary is first divided into two parts: the upper and lower boundaries and the left and right boundaries. The upper and lower boundaries are then expanded by N_l pixels above and below the boundary line, and the left and right boundaries by N_l pixels to its left and right; in this embodiment N_l = 3. The collected boundary pixels are locally optimized with traditional graph-cut theory. The local optimization works at the pixel layer; because disparity computation contains errors, disparity information is not used here. The global stage already guarantees the consistency of the stereo segmentation, and the local optimization only treats local pixels, so it is performed on the left and right images independently and simultaneously. Let I_e be the collected local image to be processed. The local energy function is defined as:
E<sub>unary</sub> is the unary term, i.e. the data term: it measures the similarity between a boundary pixel and the foreground/background colour models, and the greater the similarity, the greater its value. E<sub>intra</sub> is the binary term, i.e. the smoothness term: it measures the similarity of neighbouring pixels; the more similar the two are, the smaller its value, and the less likely the boundary is to pass between them. N denotes the set of all adjacency relations in the boundary region, and the weights satisfy w<sub>unary</sub> + w<sub>intra</sub> = 1.
The unary term is defined as follows:
The optimization at the boundary is a precise local optimization and should minimize error as far as possible; the unary term therefore uses only the colour term. It is computed in exactly the same way as the colour component of the unary term in the global optimization.
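The global colour model itself is not reproduced in this excerpt; as an illustrative stand-in, a normalized colour histogram can score boundary pixels against the foreground and background models. This is a sketch assuming NumPy, with colours in [0, 1] and all names hypothetical:

```python
import numpy as np

def unary_color_term(pixel_colors, fg_hist, bg_hist, bins=16, eps=1e-6):
    """Data term: negative log-probability of each pixel's colour under the
    foreground / background colour histograms."""
    idx = np.clip((pixel_colors * bins).astype(int), 0, bins - 1)
    p_fg = fg_hist[idx[:, 0], idx[:, 1], idx[:, 2]]
    p_bg = bg_hist[idx[:, 0], idx[:, 1], idx[:, 2]]
    return -np.log(p_fg + eps), -np.log(p_bg + eps)

# Toy models: foreground is red, background is blue.
fg_hist = np.zeros((16, 16, 16)); fg_hist[15, 0, 0] = 1.0
bg_hist = np.zeros((16, 16, 16)); bg_hist[0, 0, 15] = 1.0
pixels = np.array([[0.99, 0.01, 0.01],   # reddish pixel
                   [0.01, 0.01, 0.99]])  # bluish pixel
cost_fg, cost_bg = unary_color_term(pixels, fg_hist, bg_hist)
```

A reddish pixel is cheap to label foreground and expensive to label background, and vice versa for a bluish pixel.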
To reduce error, the binary term likewise uses only the colour term. It is defined as follows:
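The exact colour smoothness formula is not reproduced in this excerpt; a widely used contrast-sensitive form in graph-cut segmentation is exp(−‖c<sub>p</sub> − c<sub>q</sub>‖² / 2σ²), which makes cutting between similarly coloured neighbours expensive. The σ value below is an assumption:

```python
import numpy as np

def pairwise_color_term(c_p, c_q, sigma=10.0):
    """Contrast-sensitive smoothness weight between two neighbouring pixels:
    large for similar colours (cut is expensive), small across strong edges."""
    d2 = float(np.sum((np.asarray(c_p, float) - np.asarray(c_q, float)) ** 2))
    return np.exp(-d2 / (2.0 * sigma ** 2))

similar  = pairwise_color_term([100, 100, 100], [101, 100, 100])
contrast = pairwise_color_term([100, 100, 100], [200, 30, 10])
```

With this choice the min-cut preferentially follows strong colour edges, which is exactly where the true object boundary is expected.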
Once the local energy function is defined, the maximum-flow/minimum-cut optimization algorithm mentioned in Step 4 is used to minimize it, i.e. formula (12), yielding the optimal labelling, that is, the local segmentation result. This is fused with the segmentation result of Step 4 to form the segmentation of the whole image pair.
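Practical graph-cut systems typically use the Boykov–Kolmogorov solver; purely as an illustration of the max-flow/min-cut principle, here is a plain Edmonds–Karp max-flow on a toy graph with a source (foreground terminal), a sink (background terminal) and two pixel nodes. The graph layout and capacities are invented for illustration:

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp max-flow; its value equals the min s-t cut."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return total
        # Find the bottleneck capacity along the path, then push it.
        bottleneck = float('inf')
        v = t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, capacity[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck

# Nodes: 0 = source, 1 and 2 = pixel nodes, 3 = sink.
# Terminal edges encode unary costs; the 1<->2 edges encode smoothness.
cap = [[0, 5, 1, 0],
       [0, 0, 2, 1],
       [0, 2, 0, 5],
       [0, 0, 0, 0]]
cut_value = max_flow(cap, 0, 3)
```

The cut value here is 4: pixel 1 stays on the source (foreground) side and pixel 2 on the sink side, which is the labelling of minimum total energy.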
Step 6: Interaction
If the segmentation result is unsatisfactory, return to Step 2 and add further foreground/background strokes; each added stroke triggers a complete segmentation pass. Segmentation is refined on the basis of the existing result until a satisfactory result is obtained.
The effectiveness of the method of the present invention is demonstrated by comparison with the method of Price et al., "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs", published at ICCV 2011. Both methods use the same Delta function (formula (9)) as the probability distribution function between corresponding points. Figure 2 compares the results. Figures 2(a) and (b) are the input left and right images; (c) and (d) show the StereoCut segmentation; Figures 2(e) and (f) show the segmentation of the present invention. The two columns below give the segmentation accuracy and the total segmentation time of each method. The accuracy rate (denoted A) is defined as follows:
A = (1/2) · [ (1/N<sub>L</sub>) Σ<sub>i=1..N<sub>L</sub></sub> f<sub>A</sub>(|l<sub>i</sub><sup>L</sup> − g<sub>i</sub><sup>L</sup>|) + (1/N<sub>r</sub>) Σ<sub>j=1..N<sub>r</sub></sub> f<sub>A</sub>(|l<sub>j</sub><sup>R</sup> − g<sub>j</sub><sup>R</sup>|) ]   (15)

where N<sub>L</sub> and N<sub>r</sub> are the total numbers of pixels in the left and right images, l<sub>i</sub><sup>L</sup> is the label (0 or 1) of the i-th pixel of the left image after segmentation, and correspondingly l<sub>j</sub><sup>R</sup> is the label of the j-th pixel of the right image. g<sup>L</sup> and g<sup>R</sup> are the ground-truth labels of the left and right images, so |l<sub>i</sub><sup>L</sup> − g<sub>i</sub><sup>L</sup>| is the difference between the label of a left-image pixel and its ground truth. The function f<sub>A</sub> maps a difference of 0 to 1 and any other difference to 0. From formula (15), the accuracy of a single image is the ratio of the number of pixels agreeing with the ground truth to the image size, and the segmentation accuracy of the stereo pair is the average of the left and right accuracies.
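The accuracy measure above can be computed directly from the label maps. A minimal NumPy sketch (label maps and ground truths are toy data):

```python
import numpy as np

def stereo_accuracy(left_labels, right_labels, left_gt, right_gt):
    """Per-image accuracy is the fraction of pixels whose label equals the
    ground truth; the stereo accuracy averages the left and right images."""
    a_left = np.mean(left_labels == left_gt)
    a_right = np.mean(right_labels == right_gt)
    return 0.5 * (a_left + a_right)

left     = np.array([[0, 1], [1, 1]])
left_gt  = np.array([[0, 1], [0, 1]])   # one mismatch out of four pixels
right    = np.array([[1, 1], [0, 0]])
right_gt = np.array([[1, 1], [0, 0]])   # perfect match
acc = stereo_accuracy(left, right, left_gt, right_gt)
```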
The user strokes supplied to the two methods are shown in Figures (c) and (e) respectively: the first stroke, inside the object, marks the foreground, and the second stroke, outside the object, marks the background. Comparing Figures (c), (d) with (e), (f), together with the reported computation times and accuracies of the two methods, shows that under the same amount of interaction, the present method significantly speeds up image segmentation while the segmentation accuracy changes little.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510354774.9A CN105046689B (en) | 2015-06-24 | 2015-06-24 | A kind of interactive stereo-picture fast partition method based on multi-level graph structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105046689A CN105046689A (en) | 2015-11-11 |
CN105046689B true CN105046689B (en) | 2017-12-15 |
Family
ID=54453207
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510354774.9A Active CN105046689B (en) | 2015-06-24 | 2015-06-24 | A kind of interactive stereo-picture fast partition method based on multi-level graph structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105046689B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106203447B (en) * | 2016-07-06 | 2019-12-06 | 华东理工大学 | Foreground target extraction method based on pixel inheritance |
CN106408531A (en) * | 2016-09-09 | 2017-02-15 | 四川大学 | GPU acceleration-based hierarchical adaptive three-dimensional reconstruction method |
CN106887009B (en) * | 2017-01-04 | 2020-01-03 | 深圳市赛维电商股份有限公司 | Method, device and terminal for realizing interactive image segmentation |
CN109615600B (en) * | 2018-12-12 | 2023-03-31 | 南昌工程学院 | Color image segmentation method of self-adaptive hierarchical histogram |
CN110110594B (en) * | 2019-03-28 | 2021-06-22 | 广州广电运通金融电子股份有限公司 | Product distribution identification method and device |
CN110428506B (en) * | 2019-08-09 | 2023-04-25 | 成都景中教育软件有限公司 | Method for realizing dynamic geometric three-dimensional graph cutting based on parameters |
CN110751668B (en) * | 2019-09-30 | 2022-12-27 | 北京迈格威科技有限公司 | Image processing method, device, terminal, electronic equipment and readable storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103310452A (en) * | 2013-06-17 | 2013-09-18 | 北京工业大学 | Method for segmenting images by aid of automatic weight selection |
CN104091336A (en) * | 2014-07-10 | 2014-10-08 | 北京工业大学 | Stereoscopic image synchronous segmentation method based on dense disparity map |
CN104166988A (en) * | 2014-07-10 | 2014-11-26 | 北京工业大学 | Sparse matching information fusion-based three-dimensional picture synchronization segmentation method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7720282B2 (en) * | 2005-08-02 | 2010-05-18 | Microsoft Corporation | Stereo image segmentation |
- 2015-06-24 CN CN201510354774.9A patent/CN105046689B/en active Active
Non-Patent Citations (1)
Title |
---|
Stereo matching algorithm based on image segmentation; Yan Ke et al.; Journal of Computer Applications (《计算机应用》); 2011-01-31; Vol. 31, No. 1; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105046689B (en) | A kind of interactive stereo-picture fast partition method based on multi-level graph structure | |
CN109559320B (en) | Method and system for implementing visual SLAM semantic mapping function based on dilated convolutional deep neural network | |
Zhang et al. | Learning signed distance field for multi-view surface reconstruction | |
Wei et al. | Superpixel hierarchy | |
Ochmann et al. | Automatic reconstruction of fully volumetric 3D building models from oriented point clouds | |
Liu et al. | Local similarity pattern and cost self-reassembling for deep stereo matching networks | |
CN108038905B (en) | A kind of Object reconstruction method based on super-pixel | |
Papon et al. | Voxel cloud connectivity segmentation-supervoxels for point clouds | |
CN104809187B (en) | A kind of indoor scene semanteme marking method based on RGB D data | |
CN104599275B (en) | The RGB-D scene understanding methods of imparametrization based on probability graph model | |
CN103984953B (en) | Semantic segmentation method based on multiple features fusion Yu the street view image of Boosting decision forests | |
CN103413347B (en) | Based on the extraction method of monocular image depth map that prospect background merges | |
CN110163239B (en) | Weak supervision image semantic segmentation method based on super-pixel and conditional random field | |
CN115035260B (en) | A method for constructing three-dimensional semantic maps for indoor mobile robots | |
CN109544677A (en) | Indoor scene main structure method for reconstructing and system based on depth image key frame | |
CN109887021B (en) | Stereo matching method based on cross-scale random walk | |
CN103530882B (en) | Improved image segmentation method based on picture and color texture features | |
CN114926699B (en) | Method, device, medium and terminal for semantic classification of indoor 3D point cloud | |
CN104123417B (en) | A Method of Image Segmentation Based on Cluster Fusion | |
CN104091336B (en) | Stereoscopic image synchronous segmentation method based on dense disparity map | |
CN105809672A (en) | Super pixels and structure constraint based image's multiple targets synchronous segmentation method | |
CN108629783A (en) | Image partition method, system and medium based on the search of characteristics of image density peaks | |
CN109255833A (en) | Based on semantic priori and the wide baseline densification method for reconstructing three-dimensional scene of gradual optimization | |
CN105809651A (en) | Image saliency detection method based on edge non-similarity comparison | |
CN104166988B (en) | A kind of stereo sync dividing method for incorporating sparse match information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | |
OL01 | Intention to license declared | |
EE01 | Entry into force of recordation of patent licensing contract | |
Application publication date: 20151111 Assignee: Beijing Fu Chain Technology Co.,Ltd. Assignor: Beijing University of Technology Contract record no.: X2024980040815 Denomination of invention: A fast interactive stereo image segmentation method based on multi-level graph structure Granted publication date: 20171215 License type: Open License Record date: 20241225 Application publication date: 20151111 Assignee: BEIJING ASIAINFO DATA CO.,LTD. Assignor: Beijing University of Technology Contract record no.: X2024980040487 Denomination of invention: A fast interactive stereo image segmentation method based on multi-level graph structure Granted publication date: 20171215 License type: Open License Record date: 20241223 |
|
EE01 | Entry into force of recordation of patent licensing contract | |
Application publication date: 20151111 Assignee: Beijing Yaote Xiaohong Technology Co.,Ltd. Assignor: Beijing University of Technology Contract record no.: X2024980042137 Denomination of invention: A fast interactive stereo image segmentation method based on multi-level graph structure Granted publication date: 20171215 License type: Open License Record date: 20241226 Application publication date: 20151111 Assignee: Beijing Feiwang Technology Co.,Ltd. Assignor: Beijing University of Technology Contract record no.: X2024980041920 Denomination of invention: A fast interactive stereo image segmentation method based on multi-level graph structure Granted publication date: 20171215 License type: Open License Record date: 20241226 |
|
EE01 | Entry into force of recordation of patent licensing contract | |
Application publication date: 20151111 Assignee: Beijing Longxin Shengguang Technology Co.,Ltd. Assignor: Beijing University of Technology Contract record no.: X2024980042724 Denomination of invention: A fast interactive stereo image segmentation method based on multi-level graph structure Granted publication date: 20171215 License type: Open License Record date: 20241227 Application publication date: 20151111 Assignee: Beijing Juchuan Yingcai Technology Co.,Ltd. Assignor: Beijing University of Technology Contract record no.: X2024980043262 Denomination of invention: A fast interactive stereo image segmentation method based on multi-level graph structure Granted publication date: 20171215 License type: Open License Record date: 20241227 |