CN105046689B - Fast interactive stereo image segmentation method based on a multi-level graph structure - Google Patents
Fast interactive stereo image segmentation method based on a multi-level graph structure
- Publication number
- CN105046689B CN105046689B CN201510354774.9A CN201510354774A CN105046689B CN 105046689 B CN105046689 B CN 105046689B CN 201510354774 A CN201510354774 A CN 201510354774A CN 105046689 B CN105046689 B CN 105046689B
- Authority
- CN
- China
- Prior art keywords
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20228—Disparity calculation for image-based rendering
Landscapes
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
A fast interactive stereo image segmentation method based on a multi-level graph structure. First, a pair of stereo images is input and a disparity map is computed by a stereo matching algorithm. Partial foreground and background regions are marked in either the left or the right image. From the marked pixels, prior statistical models of the foreground and background colors and disparity distributions are built using CUDA parallel computing. Gaussian filtering and downsampling of the original images yield smaller coarse-scale images, which are combined with the original images to form a multi-level graph structure. Existing stereo image segmentation suffers from complex segmentation models and low computational efficiency. Within the theoretical framework of disparity-map-based simultaneous stereo segmentation, the invention explores a new segmentation method that simplifies the model, processes computation-intensive tasks in parallel, raises segmentation speed, and achieves real-time segmentation of common-size stereo images.
Description
Technical Field
The invention belongs to the intersecting fields of image processing, computer graphics and computer vision, and relates to a fast interactive stereo image segmentation method based on a multi-level graph structure.
Background Art
In recent years 3D technology has developed rapidly, from 3D television to 3D cinema, creating an urgent need for 3D content creation and 3D editing tools. Interactive stereo image segmentation is one of the important tasks: it is a key processing step in many applications such as object recognition and tracking, image classification, image editing and image reconstruction. Stereo image segmentation is already applied in practice, for example to the segmentation and analysis of organs in medical images, object tracking, and scene understanding. The efficiency of stereo image segmentation has therefore become an important research direction.
Compared with single-image segmentation, intelligent interactive stereo image segmentation started late. Current segmentation methods face two main challenges: accuracy and speed. The two pull in opposite directions, and a good balance between them is hard to achieve. Much effort has gone into improving accuracy. In "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs" (ICCV 2011), Price et al. use the disparity information between a stereo pair to improve the accuracy of stereo segmentation. Their method folds the color, gradient and disparity of every pixel into the traditional graph-cut framework and obtains the optimized stereo boundary by solving a maximum flow. Although its accuracy is high, the segmentation graph it builds has a huge number of nodes and edges, so the computation is complex and inefficient. Most current algorithms raise segmentation speed by changing the implementation details of the graph-cut algorithm; for stereo images, with their large pixel counts and complex edge structure, this cannot solve the problem at its root. At the same time, stereo segmentation contains many computation-intensive single-instruction-multiple-data tasks. Traditional methods do not exploit their inherent parallelism: processing them serially wastes a great deal of time and keeps segmentation slow.
Summary of the Invention
Existing stereo image segmentation suffers from complex segmentation models and low computational efficiency. Within the theoretical framework of disparity-map-based simultaneous stereo segmentation, the invention explores a new segmentation method that simplifies the model, processes computation-intensive tasks in parallel, raises segmentation speed, and achieves real-time segmentation of common-size stereo images.
To achieve this goal, the technical solution of the invention is as follows. First, a pair of stereo images is input and a disparity map is computed by a stereo matching algorithm. Partial foreground and background regions are marked in either the left or the right image. From the marked pixels, prior statistical models of the foreground and background colors and disparity distributions are built using CUDA parallel computing. Gaussian filtering and downsampling of the original images yield smaller coarse-scale images, which together with the original images form a multi-level graph structure. On this basis, the color, gradient and disparity constraints of the multi-level graph are formalized within the graph-cut framework and an energy function is constructed. To improve efficiency, the graph-construction process is likewise parallelized with CUDA. The maximum-flow/minimum-cut algorithm is applied to obtain the globally optimal labeling of the multi-level graph. Pixels with large errors near the boundary are then collected and locally optimized using traditional graph-cut theory. The global and local results are fused into the final segmentation. If the user is not satisfied, the erroneous regions can be marked again until the desired result is obtained.
Compared with the prior art, the invention has the following advantages. By building a stereo segmentation model on a multi-level graph structure, it simplifies the edge structure and markedly increases processing speed. In addition, computation-intensive single-instruction-multiple-data tasks are processed in parallel with CUDA, saving a large amount of time. Experiments show that, for the same amount of user interaction, the method can significantly increase segmentation speed while segmentation accuracy and consistency change little.
Brief Description of the Drawings
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 shows experimental results of an application example of the invention: (a), (b) are the input left and right images; (c), (d) are the segmentation results of the method of Price et al., "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs" (ICCV 2011); (e), (f) are the segmentation results of the invention. The user input used by the two methods is shown in (c) and (e), where the first line marks the foreground and the second line marks the background. The accuracy and running time of both methods are also given. The notebook computer used for testing in this embodiment has an Intel(R) Pentium(R) CPU B950 @ 2.10 GHz and an NVIDIA GeForce GT 540M GPU.
Detailed Description
The invention is further described below with reference to the drawings and specific embodiments.
The flow of the invention is shown in Fig. 1 and comprises the following steps:
Step 1: Match the stereo images.
Read in a pair of stereo images I = {I_l, I_r}, where I_l and I_r denote the left and right images respectively. Compute the disparity maps of the left and right images, denoted D_l and D_r, with a stereo matching algorithm. The stereo matching algorithm used is the one proposed by Felzenszwalb et al. in "Efficient Belief Propagation for Early Vision" (CVPR 2004).
Step 2: Add foreground and background cues.
Through the designed interface, the user marks partial foreground and background regions in either image. The embodiment adopts an approach similar to that of Price et al. in "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs" (ICCV 2011): with an input device such as a mouse, touch screen or stylus, strokes of different colors are drawn on the image to mark some foreground and background pixels. As shown in Fig. 2(e), pixels covered by the first line belong to the foreground and pixels covered by the second line belong to the background. The subsequent steps place no restriction on how the foreground and background pixels are specified; other methods may also be used.
Step 3: Build the prior color and disparity models of the foreground and background.
Let F denote the set of user-marked foreground pixels and B the set of user-marked background pixels. The prior color and disparity models of the foreground and background may be expressed as GMMs, histograms, or sets of clusters. The invention uses the cluster form, obtained by clustering the colors and disparities of the corresponding pixel sets. To increase speed, a CUDA-parallel k-means algorithm clusters the color values and disparity values of the pixels in F and B separately. For the color model the procedure is: each thread handles one pixel, computes its distance to all foreground and background cluster centers, selects the nearest one, and assigns the pixel to that cluster. This yields N_c foreground color clusters and M_c background color clusters, which form the statistical models of the foreground and background color distributions. In the same way, the disparity values of the pixels in F and B are clustered into N_d foreground disparity clusters and M_d background disparity clusters, the statistical models of the foreground and background disparity distributions. In this embodiment N_c = M_c = 64 and N_d = M_d = 16.
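As an illustration, the cluster-model construction can be sketched as follows. This is a minimal CPU stand-in for the CUDA kernel described above, written with vectorized NumPy in place of per-thread execution; the deterministic seeding and iteration count are assumptions, not the patent's implementation.

```python
import numpy as np

def build_cluster_model(pixels, n_clusters, n_iters=20):
    """Cluster scribble-pixel feature vectors (colors or disparities) with
    a vectorized k-means; each row of `pixels` plays the role of one CUDA
    thread's pixel. Deterministic seeding along the sorted feature range
    replaces random initialization for reproducibility (an assumption)."""
    pixels = np.asarray(pixels, dtype=float)
    order = np.argsort(pixels.sum(axis=1))
    seed_idx = np.linspace(0, len(pixels) - 1, n_clusters).astype(int)
    centers = pixels[order][seed_idx].copy()
    for _ in range(n_iters):
        # each "thread": distance of one pixel to every center, keep nearest
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = pixels[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers
```

The same routine serves both the color model (3-D vectors, 64 clusters per side) and the disparity model (1-D values, 16 clusters per side).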
Step 4: Global optimization based on the multi-level graph structure.
Within an image, the foreground and background distributions are each relatively compact: pixel differences inside the foreground and inside the background are small, while differences at the boundary are large. Exploiting this property, a representative pixel is used to stand for all pixels of its neighborhood. The method obtains the representative pixels by Gaussian filtering and downsampling, producing smaller coarse-scale images. The coarse images are fused with the original images to form a multi-level graph structure, on which the global optimization is carried out. Denote the original stereo pair by I = {I_l, I_r} and the coarse stereo pair by I_τ = {I_{l,τ}, I_{r,τ}}, where the subscripts l and r indicate the left and right images. The original and coarse stereo images are jointly represented as an undirected graph G = <ν, ε>, where ν is the set of nodes of G and ε the set of edges; each vertex of G corresponds to one pixel of the stereo images I and I_τ. Fast interactive stereo segmentation assigns, under the constraint of the input strokes, a label x_i ∈ {1, 0} to every pixel p_i of the original stereo pair, denoting foreground and background respectively. The edges of G comprise the edges connecting each pixel to the source and sink, the edges between adjacent pixels within an image, and the edges between corresponding points of the stereo pair determined by the disparity map; they also include the edges between parent and child nodes of the coarse layer and the original images. Since the coarse layer is obtained by downsampling the original layer, one coarse-layer pixel represents the pixels of an N_l × N_l region of the original image I before sampling; in this embodiment N_l = 3.
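The coarse-layer construction (Gaussian filtering, downsampling, and the parent-child mapping between layers) can be sketched as follows; the kernel width and the grayscale input are simplifying assumptions.

```python
import numpy as np

def build_coarse_layer(img, n_l=3, sigma=1.0):
    """Gaussian-filter then downsample `img` (H x W, grayscale for brevity)
    so that each coarse pixel represents an n_l x n_l block of the original,
    as in the multi-level graph construction."""
    # separable Gaussian kernel
    r = int(3 * sigma)
    xs = np.arange(-r, r + 1)
    k = np.exp(-xs**2 / (2 * sigma**2))
    k /= k.sum()
    pad = np.pad(img.astype(float), r, mode='edge')
    # horizontal pass, then vertical pass
    blur = np.apply_along_axis(lambda row: np.convolve(row, k, 'valid'), 1, pad)
    blur = np.apply_along_axis(lambda col: np.convolve(col, k, 'valid'), 0, blur)
    # keep every n_l-th pixel of the smoothed image as the coarse layer
    return blur[::n_l, ::n_l]

def parent_of(y, x, n_l=3):
    """Coarse-layer parent of original pixel (y, x): the pixel whose
    n_l x n_l block contains it."""
    return (y // n_l, x // n_l)
```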
Solving the above fast stereo segmentation problem on the multi-level graph structure is formulated as minimizing the following target energy function:
The first term is the unary (data) term, expressing the similarity of the color and disparity of a coarse-layer pixel to the foreground and background color and disparity statistical models; the higher the similarity, the larger the value. The second is the intra-image binary term of a coarse image, reflecting the difference between every coarse-layer pixel and its four neighbors, where Ν_intra denotes the set of all adjacency relations among the pixels of the left and right coarse images; the larger the difference, the smaller the term and, by the principle of the graph-cut algorithm, the more the neighboring pixels tend to take different labels. The third is the binary term between the coarse images, defined by the matching of corresponding points: the higher the matching degree, the larger the term, where Ν_inter denotes the set of correspondences between left and right coarse-layer pixels.
The fourth is the binary constraint between the coarse layer and the original image, expressing the similarity of parent and child nodes: the smaller their difference, the larger the value and the less likely the boundary passes between them. Ν_paternity denotes the set of parent-child correspondences. The weights w_unary, w_intra, w_inter and w_paternity balance the energy terms; w_unary = 1, w_intra = 4000, w_inter = 8000, w_paternity = 1000000.
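The displayed formula (1) is not reproduced in this text. From the term and weight definitions above, the energy plausibly has the usual weighted-sum graph-cut form sketched below; the symbol names E_unary, E_intra, E_inter and E_paternity are reconstructions, not the original notation.

```latex
E(X) = w_{unary} \sum_{p \in \nu_{\tau}} E_{unary}(x_p)
     + w_{intra} \sum_{(p,q) \in \mathcal{N}_{intra}} E_{intra}(x_p, x_q)
     + w_{inter} \sum_{(p,q) \in \mathcal{N}_{inter}} E_{inter}(x_p, x_q)
     + w_{paternity} \sum_{(p,c) \in \mathcal{N}_{paternity}} E_{paternity}(x_p, x_c)
\tag{1}
```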
(1) Define the unary constraint term
The unary constraint term comprises a color unary term and a disparity unary term, defined as follows:
where P_c denotes the probability that the color of a given pixel takes the foreground or background label; since a larger probability should give a smaller energy, 1 − P_c is taken as the color unary term. Likewise, P_d denotes the probability that the disparity value of a given pixel takes the foreground or background label, and 1 − P_d is taken as the disparity unary term. w_c and w_d are the influence weights of color and disparity, with w_c + w_d = 1.
The method represents the foreground and background color and disparity models as clusters: N_c foreground color clusters, M_c background color clusters, N_d foreground disparity clusters and M_d background disparity clusters. The computation of the unary term is given below.
The color unary term is computed with a CUDA-based parallel method. The color values of all pixels are transferred from the CPU to the GPU. On the GPU all pixels are processed in parallel, each thread representing one unlabeled pixel. The threads are mutually independent; every thread simultaneously computes the distances from its pixel color to the cluster centers of the foreground and background color models and finds the minimum. This minimum distance describes the similarity between the pixel color and the foreground or background colors: the smaller the distance, the more similar the color and, by graph-cut theory, the more the pixel tends toward the corresponding foreground or background label. When all threads finish, the per-pixel results are transferred back to the CPU, where the detailed graph construction takes place. The mathematical form of the color unary term is:
where the two quantities denote the minimum distances from the color of a pixel to the cluster centers of the foreground and background color models, respectively.
The disparity unary term is computed in the same way as the color unary term.
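A plausible reading of the unary computation is sketched below: each pixel's minimum distance to the foreground and background cluster centers is converted into foreground and background costs. The normalization d_B / (d_F + d_B) is an assumption; the patent elides the exact expressions.

```python
import numpy as np

def color_unary(colors, fg_centers, bg_centers):
    """Vectorized stand-in for the per-pixel CUDA kernel: for each pixel
    color, find the minimum distance to the foreground and the background
    cluster centers, then turn the two distances into foreground/background
    unary costs. The d_B/(d_F+d_B) normalization is assumed, not the
    patent's exact formula."""
    d_f = np.linalg.norm(colors[:, None, :] - fg_centers[None], axis=2).min(axis=1)
    d_b = np.linalg.norm(colors[:, None, :] - bg_centers[None], axis=2).min(axis=1)
    p_fg = d_b / (d_f + d_b + 1e-12)   # near a FG center -> high FG probability
    return 1.0 - p_fg, p_fg            # cost of labeling FG, cost of labeling BG
```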
(2) Define the intra-image binary constraint term
The intra-image binary constraint term contains two parts describing the color change and the disparity change around a pixel, i.e. the color gradient and the disparity gradient, defined as follows:
The first part expresses the similarity of colors between adjacent pixels: the closer the colors, the larger the value and, by the principle of the graph-cut algorithm, the smaller the probability that the boundary passes between them. The second part expresses the similarity of a pixel's disparity to that of its adjacent pixel: the closer the two disparities, the larger the value and the smaller the probability that the two take different labels. To reduce the error introduced by the disparity, the disparity used in this term is the coarse-layer disparity obtained by Gaussian filtering and downsampling. The two parts are defined as follows:
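The displayed definitions are not reproduced in this text. The exponential contrast form below is the usual graph-cut choice and is offered only as a hedged sketch of one neighboring pair's weight; the sigma values and the weighted sum of the two parts are assumptions.

```python
import math

def intra_weight(c_p, c_q, d_p, d_q, sigma_c=10.0, sigma_d=2.0, w_c=0.5, w_d=0.5):
    """Sketch of the intra-image binary term for one neighboring pair.
    The contrast form exp(-diff^2 / (2*sigma^2)) is the customary graph-cut
    choice and is assumed here; the patent elides the formula. Similar
    colors/disparities give a large weight, discouraging a cut between
    the two pixels."""
    dc = sum((a - b) ** 2 for a, b in zip(c_p, c_q))
    color_term = math.exp(-dc / (2 * sigma_c ** 2))
    disp_term = math.exp(-(d_p - d_q) ** 2 / (2 * sigma_d ** 2))
    return w_c * color_term + w_d * disp_term
```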
(3) Define the inter-image binary constraint term
The inter-image binary term constrains corresponding pixels across the two images to take the same label, and is defined as follows:
where C expresses the likelihood that two pixels of the stereo pair are corresponding points and is an asymmetric function:
The first factor is the probability distribution function, determined from the disparity map, that two pixels are corresponding points; it maps a left coarse-layer pixel to its corresponding point on the right coarse layer, the correspondence being decided by the original disparity map. A consistent Delta function is adopted, defined as follows:
where the first quantity is the disparity of a pixel in the left coarse layer with respect to its corresponding point in the right image, and the second is the disparity of a pixel in the right coarse layer with respect to its corresponding point in the left image. To determine the left-right pixel correspondence more reliably, the disparities of the unprocessed original disparity map are used here.
The second factor in formula (8) expresses the probability that the colors of the two pixels are similar. On its own it would suffice only if the disparity were perfectly accurate; since current disparity computation contains errors, the disparity term is discarded in order to determine the left-right correspondence more reliably, and only the color term is used, in the following form:
where the first quantity is the color value of the left coarse-layer pixel and the second is the value of its corresponding point in the right coarse layer.
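The correspondence test can be sketched as follows, assuming the usual rectified-pair convention that a left pixel (x, y) with disparity d maps to (x − d, y); the exact tolerance of the consistent Delta function is not given in the text, and exact equality is assumed.

```python
def corresponding_point(x, y, disp_left):
    """Left-to-right correspondence from the disparity map: a left pixel
    (x, y) with disparity d maps to (x - d, y) in the right image
    (the rectified-pair convention, assumed here)."""
    return x - disp_left[y][x], y

def consistent_delta(x, y, disp_left, disp_right):
    """Sketch of the 'consistent Delta function': the pair is accepted as
    corresponding points only if the left and right disparities agree
    (a left-right consistency check; exact equality is an assumption)."""
    xr, yr = corresponding_point(x, y, disp_left)
    if not (0 <= xr < len(disp_right[0])):
        return 0
    return 1 if disp_left[y][x] == disp_right[yr][xr] else 0
```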
(4) Define the parent-child constraint between the layers
The final segmentation result must be expressed at the pixel layer. To propagate the coarse-layer result to the pixel layer while keeping parent and child pixels consistent between the upper and lower layers, the parent-child constraint between the layers is defined as:
This term expresses the similarity between parent and child pixels of the upper and lower layers. Since a coarse-layer pixel represents all pixels of an N_l × N_l region of the original pixel layer, its label stands for the labels of all pixels of the corresponding region; the edge weight between parent and child pixels is therefore defined as infinite. Edges between pixels that are not in a parent-child relation are not considered.
(5) Minimize the energy function
Because the parent-child constraint is defined as infinite, a parent-child edge is never cut and the label of the parent node passes directly to its children. Computing the parent-child edges explicitly would consume a large amount of memory and add computation time, so in the actual optimization they are not constructed in detail. A graph-cut algorithm, for example the maximum-flow/minimum-cut algorithm proposed by Yuri Boykov et al. in "An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision" (IEEE Transactions on PAMI, 2004), is applied to minimize the energy function defined by the invention (formula (1)), yielding the optimal labeling, i.e. the coarse-layer segmentation. The labels of the coarse-layer pixels then directly determine the labels of the corresponding regions of the pixel layer. This significantly increases segmentation speed while leaving accuracy unchanged. However, because the coarse-layer labels are passed directly to the pixel layer, boundary pixels whose neighborhoods differ strongly incur larger errors. To improve accuracy, the points with large errors at the boundary are collected and locally optimized.
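For illustration, a didactic Edmonds-Karp max-flow/min-cut solver is sketched below in place of the Boykov-Kolmogorov algorithm cited above; after the flow saturates, the nodes still reachable from the source form the source (foreground) side of the minimum cut.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow on an adjacency-matrix graph `cap`; a small
    didactic stand-in for the min-cut/max-flow solver cited in the text.
    Returns the reachability list of the residual graph: True entries lie
    on the source side of the minimum cut (foreground label)."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    while True:
        # BFS for an augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break
        # bottleneck along the path, then augment
        v, bottleneck = t, float('inf')
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
    # min-cut side: nodes reachable from s in the final residual graph
    seen = [False] * n
    seen[s] = True
    q = deque([s])
    while q:
        u = q.popleft()
        for v in range(n):
            if not seen[v] and cap[u][v] - flow[u][v] > 0:
                seen[v] = True
                q.append(v)
    return seen
```

As a toy usage, a graph with source 0, two pixel nodes 1 and 2, and sink 3, where node 1 has a strong source (foreground) link and node 2 a strong sink link, is cut between the two pixels.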
Step 5: Local optimization at the boundary based on the original images.
The global optimization of step 4 yields a rough segmentation boundary. Since a coarse-layer pixel corresponds to the set of pixels of an N_l × N_l region of the original pixel layer, its label is passed directly to that N_l × N_l region; in this embodiment N_l = 3. At the boundary, where neighboring pixels differ strongly, assigning the coarse-pixel label directly to all pixels of the region would introduce a large error. A separate local optimization is therefore performed at the boundary.
Before the local optimization, the local boundary information is collected. The rough segmentation boundary is first divided into two parts: the upper and lower boundaries and the left and right boundaries. The upper and lower boundaries are then expanded by N_l pixels above and below the boundary line, and the left and right boundaries by N_l pixels to its left and right; in this embodiment N_l = 3. The collected boundary pixels are locally optimized with traditional graph-cut theory. The local optimization works at the pixel layer; because disparity computation contains errors, disparity information is not used here. The global stage already guarantees the consistency of the stereo segmentation, and the local optimization only treats local pixels, so it is performed on the left and right images independently and simultaneously. Let I_e be the collected local image to be processed. The local energy function is defined as:
E<sub>unary</sub> is the unary term, i.e. the data term: it measures the similarity between a boundary pixel and the foreground/background colour models, and the greater the similarity, the greater its value. E<sub>intra</sub> is the binary term, i.e. the smoothness term: it measures the similarity of neighbouring pixels; the more similar the two are, the smaller its value, and the less likely the boundary is to pass between them. N denotes the set of all adjacency relations in the boundary region, and the weights satisfy w<sub>unary</sub> + w<sub>intra</sub> = 1.
The unary term is defined as follows:
The optimization at the boundary is a precise local optimization and should minimize error as far as possible; the unary term therefore uses only the colour term. It is computed in exactly the same way as the colour component of the unary term in the global optimization.
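The global colour model itself is not reproduced in this excerpt; as an illustrative stand-in, a normalized colour histogram can score boundary pixels against the foreground and background models. This is a sketch assuming NumPy, with colours in [0, 1] and all names hypothetical:

```python
import numpy as np

def unary_color_term(pixel_colors, fg_hist, bg_hist, bins=16, eps=1e-6):
    """Data term: negative log-probability of each pixel's colour under the
    foreground / background colour histograms."""
    idx = np.clip((pixel_colors * bins).astype(int), 0, bins - 1)
    p_fg = fg_hist[idx[:, 0], idx[:, 1], idx[:, 2]]
    p_bg = bg_hist[idx[:, 0], idx[:, 1], idx[:, 2]]
    return -np.log(p_fg + eps), -np.log(p_bg + eps)

# Toy models: foreground is red, background is blue.
fg_hist = np.zeros((16, 16, 16)); fg_hist[15, 0, 0] = 1.0
bg_hist = np.zeros((16, 16, 16)); bg_hist[0, 0, 15] = 1.0
pixels = np.array([[0.99, 0.01, 0.01],   # reddish pixel
                   [0.01, 0.01, 0.99]])  # bluish pixel
cost_fg, cost_bg = unary_color_term(pixels, fg_hist, bg_hist)
```

A reddish pixel is cheap to label foreground and expensive to label background, and vice versa for a bluish pixel.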
To reduce error, the binary term likewise uses only the colour term. It is defined as follows:
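The exact colour smoothness formula is not reproduced in this excerpt; a widely used contrast-sensitive form in graph-cut segmentation is exp(−‖c<sub>p</sub> − c<sub>q</sub>‖² / 2σ²), which makes cutting between similarly coloured neighbours expensive. The σ value below is an assumption:

```python
import numpy as np

def pairwise_color_term(c_p, c_q, sigma=10.0):
    """Contrast-sensitive smoothness weight between two neighbouring pixels:
    large for similar colours (cut is expensive), small across strong edges."""
    d2 = float(np.sum((np.asarray(c_p, float) - np.asarray(c_q, float)) ** 2))
    return np.exp(-d2 / (2.0 * sigma ** 2))

similar  = pairwise_color_term([100, 100, 100], [101, 100, 100])
contrast = pairwise_color_term([100, 100, 100], [200, 30, 10])
```

With this choice the min-cut preferentially follows strong colour edges, which is exactly where the true object boundary is expected.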
Once the local energy function is defined, the maximum-flow/minimum-cut optimization algorithm mentioned in Step 4 is used to minimize it, i.e. formula (12), yielding the optimal labelling, that is, the local segmentation result. This is fused with the segmentation result of Step 4 to form the segmentation of the whole image pair.
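Practical graph-cut systems typically use the Boykov–Kolmogorov solver; purely as an illustration of the max-flow/min-cut principle, here is a plain Edmonds–Karp max-flow on a toy graph with a source (foreground terminal), a sink (background terminal) and two pixel nodes. The graph layout and capacities are invented for illustration:

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp max-flow; its value equals the min s-t cut."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return total
        # Find the bottleneck capacity along the path, then push it.
        bottleneck = float('inf')
        v = t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, capacity[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck

# Nodes: 0 = source, 1 and 2 = pixel nodes, 3 = sink.
# Terminal edges encode unary costs; the 1<->2 edges encode smoothness.
cap = [[0, 5, 1, 0],
       [0, 0, 2, 1],
       [0, 2, 0, 5],
       [0, 0, 0, 0]]
cut_value = max_flow(cap, 0, 3)
```

The cut value here is 4: pixel 1 stays on the source (foreground) side and pixel 2 on the sink side, which is the labelling of minimum total energy.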
Step 6: Interaction
If the segmentation result is unsatisfactory, return to Step 2 and add further foreground/background strokes; each added stroke triggers a complete segmentation pass. Segmentation is refined on the basis of the existing result until a satisfactory result is obtained.
The effectiveness of the method of the present invention is demonstrated by comparison with the method of Price et al., "StereoCut: Consistent Interactive Object Selection in Stereo Image Pairs", published at ICCV 2011. Both methods use the same Delta function (formula (9)) as the probability distribution function between corresponding points. Figure 2 compares the results. Figures 2(a) and (b) are the input left and right images; (c) and (d) show the StereoCut segmentation; Figures 2(e) and (f) show the segmentation of the present invention. The two columns below give the segmentation accuracy and the total segmentation time of each method. The accuracy rate (denoted A) is defined as follows:
A = (1/2) · [ (1/N<sub>L</sub>) Σ<sub>i=1..N<sub>L</sub></sub> f<sub>A</sub>(|l<sub>i</sub><sup>L</sup> − g<sub>i</sub><sup>L</sup>|) + (1/N<sub>r</sub>) Σ<sub>j=1..N<sub>r</sub></sub> f<sub>A</sub>(|l<sub>j</sub><sup>R</sup> − g<sub>j</sub><sup>R</sup>|) ]   (15)

where N<sub>L</sub> and N<sub>r</sub> are the total numbers of pixels in the left and right images, l<sub>i</sub><sup>L</sup> is the label (0 or 1) of the i-th pixel of the left image after segmentation, and correspondingly l<sub>j</sub><sup>R</sup> is the label of the j-th pixel of the right image. g<sup>L</sup> and g<sup>R</sup> are the ground-truth labels of the left and right images, so |l<sub>i</sub><sup>L</sup> − g<sub>i</sub><sup>L</sup>| is the difference between the label of a left-image pixel and its ground truth. The function f<sub>A</sub> maps a difference of 0 to 1 and any other difference to 0. From formula (15), the accuracy of a single image is the ratio of the number of pixels agreeing with the ground truth to the image size, and the segmentation accuracy of the stereo pair is the average of the left and right accuracies.
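The accuracy measure above can be computed directly from the label maps. A minimal NumPy sketch (label maps and ground truths are toy data):

```python
import numpy as np

def stereo_accuracy(left_labels, right_labels, left_gt, right_gt):
    """Per-image accuracy is the fraction of pixels whose label equals the
    ground truth; the stereo accuracy averages the left and right images."""
    a_left = np.mean(left_labels == left_gt)
    a_right = np.mean(right_labels == right_gt)
    return 0.5 * (a_left + a_right)

left     = np.array([[0, 1], [1, 1]])
left_gt  = np.array([[0, 1], [0, 1]])   # one mismatch out of four pixels
right    = np.array([[1, 1], [0, 0]])
right_gt = np.array([[1, 1], [0, 0]])   # perfect match
acc = stereo_accuracy(left, right, left_gt, right_gt)
```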
The user strokes supplied to the two methods are shown in Figures (c) and (e) respectively: the first stroke, inside the object, marks the foreground, and the second stroke, outside the object, marks the background. Comparing Figures (c), (d) with (e), (f), together with the reported computation times and accuracies of the two methods, shows that under the same amount of interaction, the present method significantly speeds up image segmentation while the segmentation accuracy changes little.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510354774.9A CN105046689B (en) | 2015-06-24 | 2015-06-24 | A kind of interactive stereo-picture fast partition method based on multi-level graph structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105046689A CN105046689A (en) | 2015-11-11 |
CN105046689B true CN105046689B (en) | 2017-12-15 |
Family
ID=54453207
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510354774.9A Active CN105046689B (en) | 2015-06-24 | 2015-06-24 | A kind of interactive stereo-picture fast partition method based on multi-level graph structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105046689B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106203447B (en) * | 2016-07-06 | 2019-12-06 | 华东理工大学 | Foreground target extraction method based on pixel inheritance |
CN106408531A (en) * | 2016-09-09 | 2017-02-15 | 四川大学 | GPU acceleration-based hierarchical adaptive three-dimensional reconstruction method |
CN106887009B (en) * | 2017-01-04 | 2020-01-03 | 深圳市赛维电商股份有限公司 | Method, device and terminal for realizing interactive image segmentation |
CN109615600B (en) * | 2018-12-12 | 2023-03-31 | 南昌工程学院 | Color image segmentation method of self-adaptive hierarchical histogram |
CN110110594B (en) * | 2019-03-28 | 2021-06-22 | 广州广电运通金融电子股份有限公司 | Product distribution identification method and device |
CN110428506B (en) * | 2019-08-09 | 2023-04-25 | 成都景中教育软件有限公司 | Method for realizing dynamic geometric three-dimensional graph cutting based on parameters |
CN110751668B (en) * | 2019-09-30 | 2022-12-27 | 北京迈格威科技有限公司 | Image processing method, device, terminal, electronic equipment and readable storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103310452A (en) * | 2013-06-17 | 2013-09-18 | 北京工业大学 | Method for segmenting images by aid of automatic weight selection |
CN104091336A (en) * | 2014-07-10 | 2014-10-08 | 北京工业大学 | Stereoscopic image synchronous segmentation method based on dense disparity map |
CN104166988A (en) * | 2014-07-10 | 2014-11-26 | 北京工业大学 | Sparse matching information fusion-based three-dimensional picture synchronization segmentation method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7720282B2 (en) * | 2005-08-02 | 2010-05-18 | Microsoft Corporation | Stereo image segmentation |
- 2015-06-24 CN CN201510354774.9A patent/CN105046689B/en active Active
Non-Patent Citations (1)
Title |
---|
Stereo matching algorithm based on image segmentation; Yan Ke et al.; Journal of Computer Applications (《计算机应用》); 2011-01-31; Vol. 31, No. 1; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105046689B (en) | A kind of interactive stereo-picture fast partition method based on multi-level graph structure | |
CN109559320B (en) | Method and system for implementing visual SLAM semantic mapping function based on dilated convolutional deep neural network | |
Zhang et al. | Learning signed distance field for multi-view surface reconstruction | |
Wei et al. | Superpixel hierarchy | |
Ochmann et al. | Automatic reconstruction of fully volumetric 3D building models from oriented point clouds | |
Liu et al. | Local similarity pattern and cost self-reassembling for deep stereo matching networks | |
CN108038905B (en) | A kind of Object reconstruction method based on super-pixel | |
Papon et al. | Voxel cloud connectivity segmentation-supervoxels for point clouds | |
CN104809187B (en) | A kind of indoor scene semanteme marking method based on RGB D data | |
CN104599275B (en) | The RGB-D scene understanding methods of imparametrization based on probability graph model | |
CN103984953B (en) | Semantic segmentation method based on multiple features fusion Yu the street view image of Boosting decision forests | |
CN103413347B (en) | Based on the extraction method of monocular image depth map that prospect background merges | |
CN110163239B (en) | Weak supervision image semantic segmentation method based on super-pixel and conditional random field | |
CN115035260B (en) | A method for constructing three-dimensional semantic maps for indoor mobile robots | |
CN109544677A (en) | Indoor scene main structure method for reconstructing and system based on depth image key frame | |
CN109887021B (en) | Stereo matching method based on cross-scale random walk | |
CN103530882B (en) | Improved image segmentation method based on picture and color texture features | |
CN114926699B (en) | Method, device, medium and terminal for semantic classification of indoor 3D point cloud | |
CN104123417B (en) | A Method of Image Segmentation Based on Cluster Fusion | |
CN104091336B (en) | Stereoscopic image synchronous segmentation method based on dense disparity map | |
CN105809672A (en) | Super pixels and structure constraint based image's multiple targets synchronous segmentation method | |
CN108629783A (en) | Image partition method, system and medium based on the search of characteristics of image density peaks | |
CN109255833A (en) | Based on semantic priori and the wide baseline densification method for reconstructing three-dimensional scene of gradual optimization | |
CN105809651A (en) | Image saliency detection method based on edge non-similarity comparison | |
CN104166988B (en) | A kind of stereo sync dividing method for incorporating sparse match information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | |
OL01 | Intention to license declared | |
EE01 | Entry into force of recordation of patent licensing contract | |
Application publication date: 20151111 Assignee: Beijing Fu Chain Technology Co.,Ltd. Assignor: Beijing University of Technology Contract record no.: X2024980040815 Denomination of invention: A fast interactive stereo image segmentation method based on multi-level graph structure Granted publication date: 20171215 License type: Open License Record date: 20241225 Application publication date: 20151111 Assignee: BEIJING ASIAINFO DATA CO.,LTD. Assignor: Beijing University of Technology Contract record no.: X2024980040487 Denomination of invention: A fast interactive stereo image segmentation method based on multi-level graph structure Granted publication date: 20171215 License type: Open License Record date: 20241223 |
|
EE01 | Entry into force of recordation of patent licensing contract | |
Application publication date: 20151111 Assignee: Beijing Yaote Xiaohong Technology Co.,Ltd. Assignor: Beijing University of Technology Contract record no.: X2024980042137 Denomination of invention: A fast interactive stereo image segmentation method based on multi-level graph structure Granted publication date: 20171215 License type: Open License Record date: 20241226 Application publication date: 20151111 Assignee: Beijing Feiwang Technology Co.,Ltd. Assignor: Beijing University of Technology Contract record no.: X2024980041920 Denomination of invention: A fast interactive stereo image segmentation method based on multi-level graph structure Granted publication date: 20171215 License type: Open License Record date: 20241226 |
|
EE01 | Entry into force of recordation of patent licensing contract | |
Application publication date: 20151111 Assignee: Beijing Longxin Shengguang Technology Co.,Ltd. Assignor: Beijing University of Technology Contract record no.: X2024980042724 Denomination of invention: A fast interactive stereo image segmentation method based on multi-level graph structure Granted publication date: 20171215 License type: Open License Record date: 20241227 Application publication date: 20151111 Assignee: Beijing Juchuan Yingcai Technology Co.,Ltd. Assignor: Beijing University of Technology Contract record no.: X2024980043262 Denomination of invention: A fast interactive stereo image segmentation method based on multi-level graph structure Granted publication date: 20171215 License type: Open License Record date: 20241227 |