CN105069774B - Object segmentation method based on multiple-instance learning and graph-cut optimization - Google Patents

Object segmentation method based on multiple-instance learning and graph-cut optimization

Info

Publication number
CN105069774B
CN105069774B (application CN201510375307.4A)
Authority
CN
China
Prior art keywords
matrix
gradient
brightness
image
color
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510375307.4A
Other languages
Chinese (zh)
Other versions
CN105069774A (en)
Inventor
赵祥模
刘占文
高涛
安毅生
王润民
徐志刚
张立成
周洲
刘慧琪
闵海根
穆柯楠
李强
杨楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changan University
Original Assignee
Changan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changan University
Priority to CN201510375307.4A
Publication of CN105069774A
Application granted
Publication of CN105069774B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an object segmentation method based on multiple-instance learning and graph-cut optimization. Step 1: build a saliency model from training images using multiple-instance learning, and use the model to predict the bags and instances of a test image, obtaining the saliency detection result of the test image. Step 2: introduce the saliency detection result of the test image into a graph-cut framework, optimize the framework according to the instance feature vectors and the bag labels, and solve for a suboptimal solution of the graph-cut optimization to obtain an accurate segmentation of the target. The invention builds the saliency detection model by multiple-instance learning so that it adapts to a specific class of images, uses the saliency detection results in a graph-theoretic image segmentation method to guide the segmentation, optimizes the graph-cut model framework, and solves it with an agglomerative hierarchical clustering algorithm, so that the segmentation results better match semantically perceived output and an accurate object segmentation is obtained.

Description

Object Segmentation Method Based on Multiple-Instance Learning and Graph-Cut Optimization

Technical Field

The invention belongs to the field of image processing and relates to an image segmentation method, specifically an object segmentation method based on multiple-instance learning and graph-cut optimization.

Background Art

Image object segmentation is an important research direction in computer vision and an essential foundation for applications such as visual inspection, tracking, and recognition; segmentation quality largely determines the performance of the whole vision system. Yet, for lack of a deep understanding of the human visual system, image segmentation remains a classic open problem in computer vision. The human visual system selectively attends to the main content of an observed scene while ignoring secondary content. This selective attention mechanism makes efficient information processing possible and has inspired computer vision researchers to approach the problem from the perspective of attention, so image segmentation models with human visual characteristics are becoming a new research focus in the field.

Saliency detection from the computer vision perspective divides mainly into bottom-up and top-down methods. Most current saliency detection is based on unsupervised models, which suffer from a lack of learning ability, saliency computations that poorly reflect the visual attention mechanism, and insufficient adaptability and robustness for specific classes of images. Meanwhile, object segmentation that relies solely on a cost-function-based graph-cut algorithm suffers from high computational complexity, low segmentation efficiency, and poor local segmentation accuracy.

Summary of the Invention

In view of the above deficiencies of the prior art, the object of the present invention is to propose an object segmentation method based on multiple-instance learning and graph-cut optimization. A saliency detection model is built by multiple-instance learning so that it adapts to a specific class of images; the saliency detection results are used in a graph-theoretic image segmentation method to guide the segmentation; the graph-cut model framework and related steps are optimized; and an agglomerative hierarchical clustering algorithm is adopted to solve the graph-cut optimization, so that the segmentation results better match semantically perceived output and an accurate object segmentation is obtained.

To achieve the above object, the present invention adopts the following technical solution:

An object segmentation method based on multiple-instance learning and graph-cut optimization, comprising the following steps:

Step 1: build a saliency model from the training images using multiple-instance learning, and use the model to predict the bags and instances of a test image, obtaining the saliency detection result of the test image; specifically:

Step 11: preprocess the training images and extract the brightness-gradient and color-gradient features;

Step 12: introduce multiple-instance learning into image saliency detection to obtain the saliency detection result of the test image.

Step 2: introduce the saliency detection result of the test image into a graph-cut framework, optimize the framework according to the instance feature vectors and the bag labels, and solve for a suboptimal solution of the graph-cut optimization, obtaining an accurate segmentation of the target.

Further, preprocessing the training images and extracting the brightness-gradient and color-gradient features in step 11 specifically comprises:

Step 111: convert the color space of the training image and quantize its components, obtaining the normalized brightness component L and color components a and b;

Step 112: compute the brightness gradient of every pixel in the matrix of the brightness component L;

Step 113: compute the color gradient of every pixel in the matrices of color components a and b, respectively.

Further, step 111 is specifically as follows:

First, gamma correction is applied to the training image to adjust its color components nonlinearly, and the image is converted from the RGB color space to the Lab color space. The brightness component L and the two color components a and b in the Lab color space are then normalized, yielding the normalized brightness component L and color components a and b.

Further, step 112 specifically comprises steps A-D:

A. Construct weight matrices Wights<> at 3 scales;

B. Construct index map matrices Slice_map<> at 3 scales. Each scale's index map matrix Slice_map<> has the same dimensions as the weight matrix Wights<> of that scale. Lines along 8 directions (0°, 22.5°, 45°, 67.5°, 90°, 112.5°, 135°, 157.5°) divide the matrix into 16 sectors; every element of a sector takes that sector's number, 0 through 15;

C. Multiply each index map matrix Slice_map<> element-wise with the weight matrix Wights<> of the corresponding scale to obtain the matrix of that scale, i.e., the neighborhood gradient operator;

D. Using the neighborhood gradient operator, compute the brightness gradient of a given pixel in the matrix of brightness component L.

Further, step A is specifically as follows:

Construct weight matrices Wights<> at 3 scales. Each weight matrix Wights<> is a square matrix with 2r+1 rows and columns. Its elements are either 0 or 1: the elements equal to 1 lie within the disk of radius r centered on the matrix's central element (r+1, r+1), forming the inscribed circle of the square matrix, and all remaining elements are 0. The 3 scales are r=3, r=5, and r=10.

Further, step D is specifically as follows:

① For a given scale, center the neighborhood gradient operator of that scale on a pixel of the brightness-component-L matrix obtained in step 111, and take the element-wise product of the operator with each brightness value in the pixel's neighborhood, obtaining the neighborhood matrix Neibor<>. Take the vertical (90°) line as the dividing line, splitting the operator's disk into a left half (sectors 0 through 7) and a right half (sectors 8 through 15). The elements of Neibor<> in each half-disk form a histogram, which is normalized; the two histograms are denoted Slice_hist1<> and Slice_hist2<>. H1 denotes the histogram of the left half-disk and H2 that of the right half-disk; i is the histogram bin index, defined over [0, 24], i.e., the brightness range;

② Compute the difference between the two normalized histograms with the chi-square distance of Eq. (1); this gives the brightness gradient of the pixel in the vertical direction at that scale.

After the vertical brightness gradient at one scale has been computed, take the lines along each of the other directions in turn as the dividing line to obtain the brightness gradients in all other directions at that scale; then compute, in the same manner as step D, the brightness gradients in all directions at the other scales. Once the brightness gradients of the pixel in all directions at all scales are available, its final brightness gradient is computed by Eq. (2):

f(x, y, r, n_ori; r = 3, 5, 10; n_ori = 1, 2, ..., 8) → BrightnessGradient(x, y)    (2)

where f is a mapping function, (x, y) is any pixel to be evaluated, r is the selected scale, and n_ori is the selected direction; BrightnessGradient(x, y) is the final brightness gradient of pixel (x, y). The rule of f is: for each direction, take the maximum brightness gradient over the 3 scales as that direction's gradient value, then sum the gradients of the 8 directions to obtain the final brightness gradient of pixel (x, y).

Further, in step 113 the color gradient is computed analogously to the brightness gradient, except that the color-gradient features are the color gradients of the two color components a and b, and the 3 selected scales are r=5, r=10, and r=20; the corresponding weight and index map matrices therefore have sizes 11×11, 21×21, and 41×41. The color gradients of the two color components are computed with the same method as the brightness gradient, yielding the final color gradient of every pixel in the matrices of color components a and b.

Further, introducing multiple-instance learning into image saliency detection in step 12 to obtain the saliency detection result of the test image specifically comprises steps 121 and 122:

Step 121: using the brightness- and color-gradient features obtained by the method of step 11, train on the training set with the multiple-instance learning EMDD algorithm, obtaining a trained saliency detection model;

Step 122: feed the test image into the trained saliency detection model to obtain the saliency detection result of the test image.

Further, step 2 specifically comprises the following steps:

Step 21: take the saliency detection result of the image from step 1 as the input of the graph-cut algorithm; construct the weight function of Eq. (3) from the saliency labels of the bags and the instance feature vectors; and obtain the optimized graph-cut cost function of Eq. (4);

In Eq. (3), w_ij is the visual-feature similarity of the regions corresponding to bags i and j; Salien(i) and Salien(j) are the normalized saliency values of regions i and j; σ is a sensitivity parameter regulating visual-feature differences, taking values from 10 to 20; the similarity weight of region i with itself is 0; the similarity matrix W = {w_ij} is a symmetric matrix with zero diagonal and w_ij ∈ [0, 1]; f_i and f_j are the instance feature vectors of bags i and j, i.e., the image's brightness-gradient and color-gradient feature vectors are combined into the 3-dimensional vector Mixvector_i = {BrightnessGradient_i, ColorGradient_i}, so that Sim(f_i, f_j) = ||Mixvector_i − Mixvector_j||_2. In the graph-cut framework of Eq. (4), D is an N-dimensional diagonal matrix; U = {U_1, U_2, ..., U_i, ..., U_j, ..., U_N} is the segmentation state vector, each component U_i representing the segmentation state of region i; the numerator of Eq. (4) measures the visual similarity between regions i and j, and the denominator measures the visual similarity within region i;

Step 22: solve for the segmentation state vector corresponding to the minimum eigenvalue of R(U); this gives the optimal segmentation result of the image.

Brief Description of the Drawings

Figure 1 is the flow chart of the method of the present invention.

Figure 2 compares segmentation results of several methods on the test images. Sub-figures (a-1) to (a-4) are the original test images; (b-1) to (b-4) are the segmentation results of the spectral segmentation algorithm based on multi-scale graph decomposition; (c-1) to (c-4) are the results of directly applying the agglomerative hierarchical clustering algorithm; (d-1) to (d-4) are the results of the method of the present invention.

Figure 3 is a schematic of the left/right partition of the disk.

Figure 4 is a schematic of the H1 and H2 histograms.

Figure 5 is a schematic of changing the direction of the disk's dividing line.

The present invention is further explained below in conjunction with the drawings and specific embodiments.

Detailed Description

As shown in Figure 1, the object segmentation method based on multiple-instance learning and graph-cut optimization of the present invention specifically comprises the following steps:

Step 1: build a saliency model from the training images using multiple-instance learning, and use the model to predict the bags and instances of a test image, obtaining the saliency detection result of the test image;

Step 2: introduce the saliency of the test image into a graph-cut framework, optimize the framework according to the instance feature vectors and the bag labels, and solve for a suboptimal solution of the graph-cut optimization with an agglomerative hierarchical clustering algorithm, obtaining an accurate segmentation of the target.

Further, step 1 comprises steps 11 and 12:

Step 11: preprocess the training images and extract the brightness-gradient and color-gradient features;

Step 12: introduce multiple-instance learning into image saliency detection to obtain the saliency detection result of the test image.

Further, preprocessing the training images and extracting the brightness-gradient and color-gradient features in step 11 specifically comprises steps 111 to 113:

Step 111: convert the color space of the training image and quantize its components, obtaining the normalized brightness component L and color components a and b. Specifically:

First, gamma correction is applied to the training image to adjust its color components nonlinearly, and the image is converted from the RGB color space to the Lab color space. The brightness component L and the two color components a and b in the Lab color space are then normalized, yielding the normalized brightness component L and color components a and b.
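
A minimal sketch of this preprocessing in Python, assuming scikit-image for the color conversion; the gamma value and the min-max normalization are assumptions, since the patent states only that a gamma correction and a per-channel normalization are applied:

```python
import numpy as np
from skimage import color

def preprocess_lab(rgb, gamma=2.2):
    """Step 111 sketch: gamma-correct, convert RGB -> Lab, normalize channels."""
    rgb = rgb.astype(np.float64) / 255.0
    rgb = rgb ** (1.0 / gamma)      # non-linear adjustment of the color components
    lab = color.rgb2lab(rgb)        # L in [0, 100], a/b roughly in [-128, 127]
    norm = lambda c: (c - c.min()) / (c.max() - c.min() + 1e-12)
    return norm(lab[..., 0]), norm(lab[..., 1]), norm(lab[..., 2])
```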

Step 112: compute the brightness gradient of every pixel in the matrix of the brightness component L. This comprises steps A-D:

A. Construct weight matrices Wights<> at 3 scales, as follows:

Construct weight matrices Wights<> at 3 scales. Each weight matrix Wights<> is a square matrix with 2r+1 rows and columns. Its elements are either 0 or 1: the elements equal to 1 lie within the disk of radius r centered on the matrix's central element (r+1, r+1), forming the inscribed circle of the square matrix, and all remaining elements are 0. In the present invention the 3 scales are r=3, r=5, and r=10, and the corresponding weight matrices Wights<> are as follows:
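
The three matrices were printed as figures in the original and did not survive extraction; this sketch (illustrative Python, not the patent's code) reproduces the construction just described:

```python
import numpy as np

def weight_matrix(r):
    """Disk-shaped weight matrix Wights<> of size (2r+1) x (2r+1): entries are 1
    inside the inscribed disk of radius r centered on the middle element, else 0."""
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return (x * x + y * y <= r * r).astype(np.uint8)

# The three brightness-gradient scales used here:
W3, W5, W10 = weight_matrix(3), weight_matrix(5), weight_matrix(10)
```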

B. Construct index map matrices Slice_map<> at 3 scales. Each scale's index map matrix Slice_map<> has the same dimensions as the weight matrix Wights<> of that scale, i.e., each Slice_map<> is also a square matrix with 2r+1 rows and columns. Lines along 8 directions (0°, 22.5°, 45°, 67.5°, 90°, 112.5°, 135°, 157.5°) divide the matrix into 16 sectors; every element of a sector takes that sector's number, 0 through 15. The purpose of the index map matrix Slice_map<> is fast lookup of the sectors. In the present invention the 3 index map matrices Slice_map<> are as follows:
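
The three Slice_map<> matrices were likewise figures. A sketch of the construction follows; the sector-numbering convention (which wedge receives index 0, and the rotation direction) is an assumption, since only the 16-way angular division is described:

```python
import numpy as np

def slice_map(r):
    """Index map Slice_map<>: 16 angular sectors of 22.5 degrees, numbered 0-15,
    so that the sector of any neighborhood element can be looked up directly."""
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    ang = np.arctan2(y, x) % (2 * np.pi)          # angle in [0, 2*pi)
    return (ang // (np.pi / 8)).astype(np.int8)   # 16 sectors of pi/8 each
```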

C. Multiply each index map matrix Slice_map<> element-wise with the weight matrix Wights<> of the corresponding scale to obtain the matrix of that scale, i.e., the neighborhood gradient operator. The neighborhood gradient operators at the 3 scales are as follows:
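
Read literally, the operator is the element-wise product of the two matrices at each scale, as in the one-line sketch below (note that under this literal reading sector 0 becomes indistinguishable from the zeroed exterior, so an implementation may prefer to keep the mask and the map separate):

```python
def gradient_operator(r):
    # Element-wise product of the index map and the disk weights at scale r,
    # per step C; relies on slice_map() and weight_matrix() sketched above.
    return slice_map(r) * weight_matrix(r)
```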

D. Using the neighborhood gradient operator, compute the brightness gradient of a given pixel in the matrix of brightness component L, as follows:

① For a given scale, center the neighborhood gradient operator of that scale on a pixel of the brightness-component-L matrix obtained in step 111, and take the element-wise product of the operator with each brightness value in the pixel's neighborhood, obtaining the neighborhood matrix Neibor<>. Take the vertical (90°) line as the dividing line, splitting the operator's disk into a left half (sectors 0 through 7) and a right half (sectors 8 through 15). The elements of Neibor<> in each half-disk form a histogram, which is normalized; the two histograms are denoted Slice_hist1<> and Slice_hist2<>, as shown in Figure 4. H1 denotes the histogram of the left half-disk and H2 that of the right half-disk; i is the histogram bin index, defined over [0, 24], i.e., the brightness range.

② Compute the difference between the two normalized histograms with the chi-square distance of Eq. (1); this gives the brightness gradient of the pixel in the vertical direction at that scale.
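
Equation (1) was rendered as a figure in the original and did not survive extraction. Assuming the patent uses the standard chi-square distance between two normalized histograms (the text names the measure explicitly), it reads:

\chi^2(H_1, H_2) = \frac{1}{2} \sum_{i=0}^{24} \frac{\left(H_1(i) - H_2(i)\right)^2}{H_1(i) + H_2(i)} \qquad (1)

with terms whose denominator H_1(i) + H_2(i) = 0 skipped.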

After the vertical brightness gradient at one scale has been computed, take the lines along each of the other directions in turn as the dividing line, as shown in Figure 5, to obtain the brightness gradients in all other directions at that scale; then compute, in the same manner as step D, the brightness gradients in all directions at the other scales. Once the brightness gradients of the pixel in all directions at all scales are available, its final brightness gradient is computed by Eq. (2):

f(x, y, r, n_ori; r = 3, 5, 10; n_ori = 1, 2, ..., 8) → BrightnessGradient(x, y)    (2)

where f is a mapping function, (x, y) is any pixel to be evaluated, r is the selected scale, and n_ori is the selected direction; BrightnessGradient(x, y) is the final brightness gradient of pixel (x, y). The rule of f is: for each direction, take the maximum brightness gradient over the 3 scales as that direction's gradient value, then sum the gradients of the 8 directions to obtain the final brightness gradient of pixel (x, y).
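
A compact reading of Eq. (2), assuming the per-scale, per-direction responses have already been stacked into a single array:

```python
import numpy as np

def final_brightness_gradient(grad):
    """Combine gradients as the mapping f of Eq. (2). `grad` is assumed to have
    shape (3, 8, H, W): 3 scales (r = 3, 5, 10) by 8 directions. For each
    direction take the maximum over the scales, then sum the 8 directions."""
    return grad.max(axis=0).sum(axis=0)
```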

Step 113: compute the color gradient of every pixel in the matrices of color components a and b, respectively. Specifically:

The color gradient is computed analogously to the brightness gradient, except that the color-gradient features are computed for the two color components, i.e., components a and b of the Lab color space, and that the 3 selected scales are r=5, r=10, and r=20; the corresponding weight and index map matrices therefore have sizes 11×11, 21×21, and 41×41. The color gradients of the two color components are computed with the same method as the brightness gradient, yielding the final color gradient of every pixel in the matrices of color components a and b.

Further, introducing multiple-instance learning into image saliency detection in step 12 to obtain the saliency detection result of the test image specifically comprises steps 121 and 122:

Step 121: using the brightness- and color-gradient features obtained by the method of step 11, train on the training set with the multiple-instance learning EMDD algorithm, obtaining a trained saliency detection model. The specific steps are as follows:

First, the training image is over-segmented into regions such that each region contains at least 200 pixels. Each region is treated as a bag and sampled at random; the pixels sampled from a region are treated as instances, and the corresponding brightness-gradient and color-gradient feature vectors are extracted as the sampled instance feature vectors. From these, a classifier is trained with the multiple-instance learning EMDD algorithm, yielding a trained saliency detection model.
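
A sketch of the bag construction in Python; felzenszwalb with min_size=200 stands in for the unnamed over-segmentation method (the patent asks only that every region keep at least 200 pixels), `features` is assumed to be the per-pixel (H, W, 3) stack of the brightness gradient and the two color gradients, and the EMDD training itself (e.g. via a MIL library) is not shown:

```python
import numpy as np
from skimage.segmentation import felzenszwalb

def build_bags(image, features, n_samples=20, seed=0):
    """Over-segment, treat each region as a bag, sample pixels as instances."""
    rng = np.random.default_rng(seed)
    labels = felzenszwalb(image, scale=100, sigma=0.8, min_size=200)
    bags = []
    for region in np.unique(labels):
        ys, xs = np.nonzero(labels == region)
        pick = rng.choice(len(ys), size=min(n_samples, len(ys)), replace=False)
        bags.append(features[ys[pick], xs[pick]])  # instances = sampled feature vectors
    return labels, bags
```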

Step 122: feed the test image into the trained saliency detection model to obtain the saliency detection result of the test image.

For each test image, preprocess it with the same procedure as step 11 to obtain the brightness-gradient and color-gradient features; then over-segment the test image into regions such that each region contains at least 200 pixels. Each region is treated as a bag and sampled at random; the sampled pixels are treated as instances, and the corresponding brightness-gradient and color-gradient feature vectors are extracted as the sampled instance feature vectors. With the trained saliency detection model from step 121, the salient instance feature vectors and the saliency of each bag are obtained, yielding the saliency detection result of the test image.

Further, step 2 specifically comprises the following steps:

Step 21: take the saliency detection result of the image from step 1 as the input of the graph-cut algorithm; construct the weight function of Eq. (3) from the saliency labels of the bags and the instance feature vectors; and obtain the optimized graph-cut cost function of Eq. (4);

In Eq. (3), w_ij is the visual-feature similarity of the regions corresponding to bags i and j; Salien(i) and Salien(j) are the normalized saliency values of regions i and j; σ is a sensitivity parameter regulating visual-feature differences, taking values from 10 to 20; the similarity weight of region i with itself is 0; the similarity matrix W = {w_ij} is a symmetric matrix with zero diagonal and w_ij ∈ [0, 1]; f_i and f_j are the instance feature vectors of bags i and j, i.e., the image's brightness-gradient and color-gradient feature vectors are combined into the 3-dimensional vector Mixvector_i = {BrightnessGradient_i, ColorGradient_i}, so that Sim(f_i, f_j) = ||Mixvector_i − Mixvector_j||_2. In the graph-cut framework of Eq. (4), D is an N-dimensional diagonal matrix; U = {U_1, U_2, ..., U_i, ..., U_j, ..., U_N} is the segmentation state vector, each component U_i representing the segmentation state of region i; the numerator of Eq. (4) measures the visual similarity between regions i and j, and the denominator measures the visual similarity within region i;
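
Equations (3) and (4) were figures in the original. The following is a reconstruction consistent with the surrounding description (a Gaussian-type affinity with zero diagonal, and a normalized-cut-style Rayleigh quotient whose minimum eigenvalue is sought in step 22), not necessarily the patent's exact formulas; the diagonal of D is likewise an assumption:

w_{ij} = \exp\left(-\frac{\lvert \mathrm{Salien}(i) - \mathrm{Salien}(j) \rvert \cdot \mathrm{Sim}(f_i, f_j)}{\sigma^2}\right) \; (i \neq j), \qquad w_{ii} = 0 \qquad (3)

R(U) = \frac{U^{\mathsf{T}} (D - W) U}{U^{\mathsf{T}} D U}, \qquad D = \mathrm{diag}\Big(\sum\nolimits_j w_{ij}\Big) \qquad (4)

Under this reading, minimizing R(U) is the generalized eigenvalue problem (D − W)U = λDU, which matches step 22's search for the segmentation state vector of the minimum eigenvalue.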

Step 22: with the agglomerative hierarchical clustering algorithm, solve for the segmentation state vector corresponding to the minimum eigenvalue of R(U); this gives the optimal segmentation result of the image.

Here, the agglomerative hierarchical clustering algorithm refers to steps 2 and 3 of the method of patent application No. 201210257591.1.
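
As an illustration of the clustering step (a generic agglomerative two-way merge, not a reproduction of steps 2 and 3 of application No. 201210257591.1), assuming W is the symmetric affinity matrix of Eq. (3):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def two_way_cut(W):
    """Agglomeratively merge regions by affinity until two clusters remain."""
    D = 1.0 - W                      # turn affinities into distances
    np.fill_diagonal(D, 0.0)
    iu = np.triu_indices_from(D, k=1)
    Z = linkage(D[iu], method='average')           # condensed distance vector
    return fcluster(Z, t=2, criterion='maxclust')  # labels 1/2: target vs background
```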

Experimental Validation

To verify the effectiveness of the method of the present invention, the database built by Achanta et al. was used: 300 of its images were selected as training images and the remaining 700 as test images for algorithm validation. Part of the experimental results are shown in Figure 2, which gives, for the test images, the segmentation results of the spectral segmentation algorithm based on multi-scale graph decomposition, of the method of patent application No. 201210257591.1, and of the method of the present invention, explained as follows:

In Figure 2, sub-figures (a-1) to (a-4) are the original images; (b-1) to (b-4) are the results of the spectral segmentation algorithm based on multi-scale graph decomposition; (c-1) to (c-4) are the results of the method of patent application No. 201210257591.1; and (d-1) to (d-4) are the results of the method of the present invention. The experiments show that when the background is relatively complex, the spectral segmentation algorithm based on multi-scale graph decomposition suffers severe misclassification and incomplete target segmentation, while both the method of application No. 201210257591.1 and the present method segment well. When the background is simple and differs markedly from the target, as in original images (a-1) and (a-2), all three methods can segment a fairly complete target. When the target-background boundary transitions slowly with minimal difference, as in (a-3) and (a-4), all three methods leave the target incompletely segmented to varying degrees, but the present method and the method of application No. 201210257591.1 segment better, and the present method segments more finely at the boundary between a barely distinguishable target and its background, producing a more accurate segmentation of the salient target. The method of application No. 201210257591.1 takes the original image as input and coarsens from the pixel level; although pixel-level images are fine-grained, for reasons of computational cost its graph-cut weight function, built directly on brightness and color features, considers only gray-level differences. The present method, combined with multiple-instance learning, quickly obtains the salient-region labels of the image, and the instance feature vectors in each bag contain both the low-level visual features reflecting the target and the mid/high-level features of the target contour; because the full feature set of the image is considered from the start of coarsening, a more accurate basis is provided for the subsequent segmentation, so good results are still obtained when the target-background boundary transitions slowly with minimal difference. For most test images, the number of hierarchical iterations of the present method is lower than that of the method of application No. 201210257591.1, greatly reducing the computational load and time complexity.

Claims (8)

1. An object segmentation method based on multiple-instance learning and graph-cut optimization, characterized by comprising the following steps:

Step 1: build a saliency model from the training images using multiple-instance learning, and use the model to predict the bags and instances of a test image, obtaining the saliency detection result of the test image; specifically:

Step 11: preprocess the training images and extract the brightness-gradient and color-gradient features;

Step 12: introduce multiple-instance learning into image saliency detection to obtain the saliency detection result of the test image;

Step 2: introduce the saliency detection result of the test image into a graph-cut framework, optimize the framework according to the instance feature vectors and the bag labels, and solve for a suboptimal solution of the graph-cut optimization, obtaining an accurate segmentation of the target;

wherein step 2 specifically comprises the following steps:

Step 21: take the saliency detection result of the image from step 1 as the input of the graph-cut algorithm; construct the weight function of Eq. (3) from the saliency labels of the bags and the instance feature vectors; and obtain the optimized graph-cut cost function of Eq. (4);

in Eq. (3), w_ij is the visual-feature similarity of the regions corresponding to bags i and j; Salien(i) and Salien(j) are the normalized saliency values of regions i and j; σ is a sensitivity parameter regulating visual-feature differences, taking values from 10 to 20; the similarity weight of region i with itself is 0; the similarity matrix W = {w_ij} is a symmetric matrix with zero diagonal and w_ij ∈ [0, 1]; f_i and f_j are the instance feature vectors of bags i and j, i.e., the image's brightness-gradient and color-gradient feature vectors are combined into the 3-dimensional vector Mixvector_i = {BrightnessGradient_i, ColorGradient_i}, so that Sim(f_i, f_j) = ||Mixvector_i − Mixvector_j||_2; in the graph-cut framework of Eq. (4), D is an N-dimensional diagonal matrix; U = {U_1, U_2, ..., U_i, ..., U_j, ..., U_N} is the segmentation state vector, each component U_i representing the segmentation state of region i; the numerator of Eq. (4) measures the visual similarity between regions i and j, and the denominator measures the visual similarity within region i;

Step 22: solve for the segmentation state vector corresponding to the minimum eigenvalue of R(U), which gives the optimal segmentation result of the image.

2. The object segmentation method based on multiple-instance learning and graph-cut optimization of claim 1, characterized in that preprocessing the training images and extracting the brightness-gradient and color-gradient features in step 11 specifically comprises:

Step 111: convert the color space of the training image and quantize its components, obtaining the normalized brightness component L and color components a and b;

Step 112: compute the brightness gradient of every pixel in the matrix of the brightness component L;

Step 113: compute the color gradient of every pixel in the matrices of color components a and b, respectively.

3. The object segmentation method based on multiple-instance learning and graph-cut optimization of claim 2, characterized in that step 111 is specifically as follows: first, gamma correction is applied to the training image to adjust its color components nonlinearly, and the image is converted from the RGB color space to the Lab color space; the brightness component L and the two color components a and b in the Lab color space are then normalized, yielding the normalized brightness component L and color components a and b.

4. The object segmentation method based on multiple-instance learning and graph-cut optimization of claim 2, characterized in that step 112 specifically comprises steps A-D:

A. construct weight matrices Wights<> at 3 scales;

B. construct index map matrices Slice_map<> at 3 scales; each scale's index map matrix Slice_map<> has the same dimensions as the weight matrix Wights<> of that scale; lines along 8 directions (0°, 22.5°, 45°, 67.5°, 90°, 112.5°, 135°, 157.5°) divide the matrix into 16 sectors, and every element of a sector takes that sector's number, 0 through 15;

C. multiply each index map matrix Slice_map<> element-wise with the weight matrix Wights<> of the corresponding scale to obtain the matrix of that scale, i.e., the neighborhood gradient operator;

D. using the neighborhood gradient operator, compute the brightness gradient of a given pixel in the matrix of brightness component L.

5. The object segmentation method based on multiple-instance learning and graph-cut optimization of claim 4, characterized in that step A is specifically as follows: construct weight matrices Wights<> at 3 scales; each weight matrix Wights<> is a square matrix with 2r+1 rows and columns; its elements are either 0 or 1, the elements equal to 1 lying within the disk of radius r centered on the central element (r+1, r+1) and forming the inscribed circle of the square matrix, with all remaining elements 0; the 3 scales are r=3, r=5, and r=10.

6. The object segmentation method based on multiple-instance learning and graph-cut optimization of claim 4, characterized in that step D is specifically as follows:

① for a given scale, center the neighborhood gradient operator of that scale on a pixel of the brightness-component-L matrix obtained in step 111, and take the element-wise product of the operator with each brightness value in the pixel's neighborhood, obtaining the neighborhood matrix Neibor<>; take the vertical (90°) line as the dividing line, splitting the operator's disk into a left half (sectors 0 through 7) and a right half (sectors 8 through 15); the elements of Neibor<> in each half-disk form a histogram, which is normalized, the two histograms being denoted Slice_hist1<> and Slice_hist2<>; H1 denotes the histogram of the left half-disk and H2 that of the right half-disk; i is the histogram bin index, defined over [0, 24], i.e., the brightness range;

② compute the difference between the two normalized histograms with the chi-square distance of Eq. (1), giving the brightness gradient of the pixel in the vertical direction at that scale;

after the vertical brightness gradient at one scale has been computed, take the lines along each of the other directions in turn as the dividing line to obtain the brightness gradients in all other directions at that scale; then compute, in the same manner as step D, the brightness gradients in all directions at the other scales; once the brightness gradients of the pixel in all directions at all scales are available, its final brightness gradient is computed by Eq. (2):

f(x, y, r, n_ori; r = 3, 5, 10; n_ori = 1, 2, ..., 8) → BrightnessGradient(x, y)    (2)

where f is a mapping function, (x, y) is any pixel to be evaluated, r is the selected scale, and n_ori is the selected direction; BrightnessGradient(x, y) is the final brightness gradient of pixel (x, y); the rule of f is: for each direction, take the maximum brightness gradient over the 3 scales as that direction's gradient value, then sum the gradients of the 8 directions to obtain the final brightness gradient of pixel (x, y).

7. The object segmentation method based on multiple-instance learning and graph-cut optimization of claim 2, characterized in that in step 113 the color gradient is computed analogously to the brightness gradient, except that the color-gradient features are the color gradients of the two color components a and b, and the 3 selected scales are r=5, r=10, and r=20; the corresponding weight and index map matrices therefore have sizes 11×11, 21×21, and 41×41; the color gradients of the two color components are computed with the same method as the brightness gradient, yielding the final color gradient of every pixel in the matrices of color components a and b.

8. The object segmentation method based on multiple-instance learning and graph-cut optimization of claim 1, characterized in that introducing multiple-instance learning into image saliency detection in step 12 to obtain the saliency detection result of the test image specifically comprises steps 121 and 122:

Step 121: using the brightness- and color-gradient features obtained by the method of step 11, train on the training set with the multiple-instance learning EMDD algorithm, obtaining a trained saliency detection model;

Step 122: feed the test image into the trained saliency detection model to obtain the saliency detection result of the test image.
CN201510375307.4A 2015-06-30 2015-06-30 Object segmentation method based on multiple-instance learning and graph-cut optimization Active CN105069774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510375307.4A CN105069774B (en) 2015-06-30 2015-06-30 Object segmentation method based on multiple-instance learning and graph-cut optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510375307.4A CN105069774B (en) 2015-06-30 2015-06-30 Object segmentation method based on multiple-instance learning and graph-cut optimization

Publications (2)

Publication Number Publication Date
CN105069774A CN105069774A (en) 2015-11-18
CN105069774B true CN105069774B (en) 2017-11-10

Family

ID=54499132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510375307.4A Active CN105069774B (en) 2015-06-30 2015-06-30 Object segmentation method based on multiple-instance learning and graph-cut optimization

Country Status (1)

Country Link
CN (1) CN105069774B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760886B (en) * 2016-02-23 2019-04-12 北京联合大学 Multi-object segmentation method for image scenes based on object recognition and saliency detection
CN105894517B (en) * 2016-04-22 2019-05-07 北京理工大学 A method and system for liver segmentation in CT images based on feature learning
CN105956563B (en) * 2016-05-06 2019-04-16 西安工程大学 Method for face annotation in news images based on multiple-instance learning
EP3472750A4 (en) 2016-06-15 2020-01-22 Nokia Technologies Oy Methods, systems and devices for feature extraction and object detection
CN108625844B (en) * 2017-03-17 2024-06-25 中国石油化工集团有限公司 Gamma calibration and testing device
CN110555122B (en) * 2019-07-19 2022-12-23 中国地质大学(武汉) Vectorization method for building floor-plan walls based on segmented rectangles
CN112164068B (en) * 2020-07-17 2023-07-28 中国人民解放军63892部队 Adaptive extraction method for targets and interference regions in warning-radar PPI images
CN113159026A (en) * 2021-03-31 2021-07-23 北京百度网讯科技有限公司 Image processing method, image processing apparatus, electronic device, and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7672987B2 (en) * 2005-05-25 2010-03-02 Siemens Corporate Research, Inc. System and method for integration of medical information
US8249366B2 (en) * 2008-06-16 2012-08-21 Microsoft Corporation Multi-label multi-instance learning for image classification

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102509084A (en) * 2011-11-18 2012-06-20 中国科学院自动化研究所 Method for identifying horror video scenes based on multiple-instance learning
CN103198321A (en) * 2013-03-27 2013-07-10 中国科学院苏州生物医学工程技术研究所 Retina image processing method and device
CN103996189A (en) * 2014-05-05 2014-08-20 小米科技有限责任公司 Image segmentation method and device
CN104134217A (en) * 2014-07-29 2014-11-05 中国科学院自动化研究所 Video salient object segmentation method based on super voxel graph cut
CN104217225A (en) * 2014-09-02 2014-12-17 中国科学院自动化研究所 A visual target detection and labeling method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Global contrast based salient region detection; Ming-Ming Cheng; Computer Vision & Pattern Recognition; 2015-03-31; vol. 37, no. 3, pp. 409-416 *
Research on image contour detection algorithms based on multiple features; Xu Ke; China Masters' Theses Full-text Database, Information Science and Technology; 2015-01-15; sections 3.1, 3.3, 3.4 *
Image saliency analysis based on multiple-instance learning; Xie Yantao; China Masters' Theses Full-text Database, Information Science and Technology; 2015-02-15; section 3.2, Figures 3-2 and 3-10 *
Research on image segmentation methods based on visual saliency models and graph cuts; Liu Hongcai; China Masters' Theses Full-text Database, Information Science and Technology; 2014-06-15; section 3.1, Figure 3.1 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111340A (en) * 2019-04-28 2019-08-09 南开大学 Weakly supervised instance segmentation method based on multi-channel cuts

Also Published As

Publication number Publication date
CN105069774A (en) 2015-11-18

Similar Documents

Publication Publication Date Title
CN105069774B (en) Object segmentation method based on multiple-instance learning and graph-cut optimization
WO2021003951A1 (en) Hyperspectral image classification method based on label-constrained elastic network graph model
CN105701508B (en) Global-local optimal model and saliency detection algorithm based on multi-stage convolutional neural networks
CN104966085B (en) Remote sensing image region-of-interest detection method based on fusion of multiple salient features
CN103810503B (en) Deep-learning-based method for detecting salient regions in natural images
CN109949317A (en) A stepwise adversarial-learning approach for semi-supervised image instance segmentation
CN103413151B (en) Hyperspectral image classification method based on graph-regularized low-rank representation dimensionality reduction
CN112784869B (en) Fine-grained image recognition method based on attention perception and adversarial learning
CN104392241B (en) Head pose estimation method based on mixture regression
CN106156777B (en) Text image detection method and device
CN107944020A (en) Face image retrieval method and device, computer apparatus, and storage medium
Lin et al. Representing and recognizing objects with massive local image patches
CN102663401B (en) Image feature extraction and description method
CN103440478B (en) Face detection method based on HOG features
CN107346420A (en) Text detection and localization method in natural scenes based on deep learning
CN111860171A (en) Method and system for detecting irregularly shaped targets in large-scale remote sensing images
CN104599275A (en) Non-parametric RGB-D scene understanding method based on probabilistic graphical models
CN105469047A (en) Chinese text detection method and system based on unsupervised learning and a deep learning network
CN110163239A (en) Weakly supervised image semantic segmentation method based on superpixels and conditional random fields
CN105608454A (en) Text detection method and system based on a text-structure-component detection neural network
CN107123130B (en) Kernel correlation filtering target tracking method based on superpixels and hybrid hashing
CN110532946A (en) Method for identifying axle types of green-channel vehicles based on convolutional neural networks
CN105825502A (en) Weakly supervised image parsing method based on saliency-guided dictionary learning
CN104636761A (en) Image semantic annotation method based on hierarchical segmentation
CN106446890A (en) Candidate region extraction method based on window scoring and superpixel segmentation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant