CN107103326B - Collaborative significance detection method based on super-pixel clustering - Google Patents

Collaborative significance detection method based on super-pixel clustering Download PDF

Info

Publication number
CN107103326B
CN107103326B CN201710283829.0A CN201710283829A
Authority
CN
China
Prior art keywords
image
superpixel
clustering
super
significance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710283829.0A
Other languages
Chinese (zh)
Other versions
CN107103326A (en
Inventor
刘纯平
朱桂墘
季怡
邢腾飞
万晓依
王大木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN201710283829.0A priority Critical patent/CN107103326B/en
Publication of CN107103326A publication Critical patent/CN107103326A/en
Application granted granted Critical
Publication of CN107103326B publication Critical patent/CN107103326B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a cooperative significance detection method based on superpixel clustering. A superpixel pyramid is constructed and superpixel blocks are used in place of individual pixel points, which accelerates the cooperative significance computation; at the same time, the pyramid provides feature information at different scales and preserves the accuracy of the boundary of the cooperative salient target. On this basis, the superpixel blocks are further grouped by a clustering method, which further reduces the cooperative significance computation time. Finally, the cooperative significance map and the significance map are fused to obtain the final cooperative significance map, ensuring the accuracy of the cooperative salient target. The boundary contour of the salient target obtained by the method is located more accurately, and the method has advantages in both time and accuracy.

Description

Collaborative significance detection method based on super-pixel clustering
Technical Field
The invention relates to an image detection processing method, in particular to an image collaborative saliency detection method, which is used for detecting a common saliency region in a plurality of images.
Background
Saliency detection is the rapid detection of objects of interest in an image or video by simulating the visual attention mechanism of the human eye, while the purpose of cooperative saliency detection is to detect regions of the same or similar saliency in multiple images or videos. The method has wide application value in many fields, such as collaborative segmentation, video foreground detection, image retrieval, target tracking and the like. In recent years, with the rapid development of technologies such as internet and multimedia, a collaborative saliency detection technology for searching for the same or similar saliency objects from multiple images or multiple videos has become a new requirement, and the saliency of the background or the interference in each image can be better suppressed by effectively utilizing the advantages of the multiple images. Multiple related images within a group of images contain more abundant and useful information than a single image. However, the processing efficiency of the multi-image cooperative saliency detection also puts forward a higher requirement than that of single-image saliency detection, and an accurate and quick multi-image cooperative saliency detection method is urgently needed.
In the prior art, there are mainly two types of cooperative significance detection methods. The first detects the saliency of each single image and then computes the cooperative significance on that basis. This approach places high demands on the single-image saliency detection: if a salient region is located incorrectly, the cooperative salient region will inevitably be mislocated as well. Moreover, if clustering decisions are made for every pixel of the original images, the time consumed by clustering keeps growing as the number of images increases. The second type is learning-based cooperative significance detection, which treats the problem as a classification problem for each pixel or region. These methods first provide an initial estimate using an existing unsupervised saliency detection method and then design an iterative self-learning scheme to learn the appearance of the co-salient objects and gradually refine the co-saliency map. However, the huge computational load of the deep learning framework runs counter to the research purpose of cooperative significance detection, so further optimization of the time complexity is required.
The superpixel segmentation method groups pixels by the similarity of their features, so that redundant information in an image can be captured and the complexity of subsequent image processing tasks is greatly reduced. Applying superpixel segmentation to cooperative significance detection can therefore reduce the computation time. However, current superpixel segmentation methods, when combined into cooperative saliency detection, are not content-aware and do not segment at multiple scales, so the boundary contour of a salient object is difficult to locate accurately.
Therefore, how to keep high accuracy and ensure low calculation cost is a problem which needs to be solved urgently by cooperative significance detection at present.
Disclosure of Invention
The invention aims to provide a cooperative significance detection method based on super-pixel clustering, which reduces the calculation time while ensuring the significance detection accuracy so as to adapt to the requirement of cooperative significance detection.
In order to achieve this purpose, the invention provides a cooperative significance detection method based on superpixel clustering: a superpixel pyramid is constructed and superpixel blocks are used in place of individual pixel points, which accelerates the cooperative significance computation; the pyramid also provides feature information at different scales and preserves the accuracy of the boundary of the cooperative salient target.
The technical scheme is as follows: a collaborative significance detection method based on super-pixel clustering comprises the following steps:
(1) constructing a super-pixel pyramid image:
(1a) inputting original image group data, and constructing a three-layer Gaussian pyramid, wherein a first layer image is obtained by performing Gaussian smoothing on an original image, a second layer image is obtained by performing Gaussian smoothing after downsampling on the first layer image, and a third layer image is obtained by performing Gaussian smoothing after downsampling on the second layer image;
(1b) performing superpixel segmentation on each layer of the image by using a content-aware superpixel segmentation method; for an image of width width and height height pixels, the number of superpixel blocks is set dynamically according to the image size as
Figure BDA0001280232320000021
thereby obtaining the superpixel pyramid image;
The content-aware superpixel segmentation method (MSLIC) is described in: Liu Y J, Yu C, Yu M J, et al. Manifold SLIC: a fast method to compute content-sensitive superpixels. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
(2) Calculating a single saliency map:
(2a) calculating a weakly significant map
The N superpixel blocks obtained by superpixel segmentation are expressed as: {c_i}, i = 1, 2, ..., N. The superpixels located in the edge region are assumed to be background and are expressed as: {n_j}, j = 1, 2, ..., N_B, where N_B is the number of superpixels located in the image edge region. The features of each superpixel block are then computed: dark channel value, center prior weight, and color features;
the dark channel calculation formula is as follows:
Figure BDA0001280232320000031
wherein
Figure BDA0001280232320000032
denotes the number of pixels covered by region c_i, and S_d(p) is the function computing the dark channel value, as follows:
Figure BDA0001280232320000033
wherein I_ch(q) represents the color value of pixel q within the corresponding channel;
the weak significance detection model is realized by the following formula:
Figure BDA0001280232320000034
wherein d_k(c_i, n_j) denotes the Euclidean distance between regions c_i and n_j in the feature corresponding to k; the three features are RGB (F_1), CIELab (F_2) and LBP (F_3); g(c_i) is the center prior weight, computed from the normalized spatial distance between the center of superpixel c_i and the center of the image;
obtaining a significant value of each region according to the formula (3), and assigning the significant value of each region to all pixels in the region to obtain a weak significant image;
(2b) training strong significance detection model
Taking a plurality of single-kernel, single-feature support vector machine classifiers as weak classifiers, and obtaining a strong classifier through repeated iterative learning with the Adaboost boosting method;
The linear sum of the kernel functions of the multiple single-kernel, single-feature support vector machines
Figure BDA0001280232320000041
is expressed as:
Figure BDA0001280232320000042
where r denotes a training sample and r_i the i-th sample; the training sample set is derived from the weak saliency map of the previous step; β_m denotes the weight of the corresponding kernel function and M denotes the number of weak classifiers; a strong classifier is obtained through multiple iterations, applied to all test samples of the current image, and finally used to predict the single saliency map;
(3) superpixel block clustering
Selecting RGB, CIELab and Gabor characteristics as clustering characteristics, and clustering all superpixel blocks of all images in the group, wherein the clustering method is a K-Means clustering method;
(4) calculating synergistic significance
Clustering over the M images yields K classes, denoted
Figure BDA0001280232320000043
with cluster centers denoted
Figure BDA0001280232320000044
The cooperative significance is described by computing a contrast measure, a repetition rate measure and a position measure, and the weak cooperative significance map is obtained as the product of the 3 measures;
(5) fusing:
performing multi-scale fusion; and multiplying the weak saliency map and the weak synergetic saliency map to obtain a fused saliency map.
In the above technical solution, in the step (1a), during the down-sampling, the pixels in the x and y directions of the image are respectively adjusted to be half of the original size.
In step (3), the number of clusters is K = min(max(2 × N_img, 10), 30).
In the step (3), the method for acquiring the Gabor characteristics comprises the following steps: firstly, Gabor characteristics in 8 directions are obtained, the bandwidth is 1, then the 8 direction characteristics are combined to be used as the Gabor characteristics, and the combination method is linear addition combination.
In the step (4),
the contrast measure computes the feature distance between a class and all other classes; for class C_k the contrast measure is calculated as follows:
Figure BDA0001280232320000045
where N is the total number of superpixel blocks over all images and N_i is the number of pixels in cluster C_i;
the position measure computes, for all superpixel blocks of each class, the normalized distance to the image center; the formula is as follows:
Figure BDA0001280232320000051
wherein n_k denotes the number of superpixel blocks of class k, N_j is the number of superpixel blocks on the j-th image,
Figure BDA0001280232320000052
is the position of the center point of the i-th superpixel block of the class on the j-th image, o_j is the center position of the j-th image,
Figure BDA0001280232320000053
refers to the class to which superpixel
Figure BDA0001280232320000054
belongs, and δ(·) is a Kronecker function that returns 1 if the two numbers are equal and 0 otherwise;
the repetition rate measure is computed from the distribution of the superpixel blocks of a class over all images; a histogram with M bins,
Figure BDA0001280232320000055
is used to describe the repetition rate measure of cluster C_k over the M images:
Figure BDA0001280232320000056
Figure BDA0001280232320000057
in the step (5), the multi-scale fusion adopts a weighted fusion method, and the results of the 1 st layer, the 2 nd layer and the 3 rd layer are fused with the weight of 5: 3: 2.
Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:
1. the invention provides a multi-scale saliency detection method based on content perception, which is used for fusing an obtained saliency map and a weak synergy saliency map to obtain a final synergy saliency region.
2. According to the method, the images are segmented using a content-aware superpixel segmentation method and a superpixel pyramid is constructed; all images are clustered at each scale so that every pixel gains global relevance, the cooperative significance is calculated from the contrast measure, position measure and repetition rate measure, and the resulting weak cooperative significance map is fused with the significance map to obtain the final cooperative significance map. Experimental results show that the proposed detection method has advantages in both time and accuracy.
Drawings
FIG. 1 is a method framework diagram of an embodiment of the invention;
FIG. 2 is a comparison graph of significance detection on Image Pairs in the examples;
FIG. 3 is a comparison graph of the synergistic significance detection on Image Pairs in the examples;
FIG. 4 is a comparison of the 4 significance methods in the examples;
FIG. 5 is a PR plot on an iCoseg dataset for 3 synergistic significance detection methods;
fig. 6 is a PR map of 3 synergistic significance detection methods on an Image Pairs dataset.
Detailed Description
The invention is further described with reference to the following figures and examples:
the first embodiment is as follows: a collaborative significance detection method based on super-pixel clustering is provided, and the framework is shown in figure 1. By constructing the superpixel pyramid, the method tries to use the superpixel blocks to replace common pixel points, so that the cooperative significance calculation is accelerated, and meanwhile, the construction of the superpixel pyramid can obtain feature information on different scales, so that the accuracy of the boundary of the cooperative significant target is ensured. In addition, the super-pixel blocks are further classified by using a clustering method, and the calculation time of the cooperative significance is further accelerated. And finally, a final cooperative saliency map is obtained by using a method of fusing the cooperative saliency map and the saliency map, so that the accuracy of the cooperative saliency target is ensured. The method comprises the following specific steps: constructing a super-pixel pyramid, calculating a single saliency map, clustering super-pixel blocks, and calculating the cooperative saliency and fusion.
1. Constructing a super-pixel pyramid
The super-pixel segmentation can segment an image into a plurality of super-pixel blocks, and the super-pixel blocks have similar characteristic expressions, so that the super-pixel blocks are used for replacing common pixel points to perform significance calculation or collaborative significance calculation, not only can calculation errors not be caused, but also the calculation speed can be increased. The super-pixel pyramid construction method comprises the following 2 steps:
(1) constructing a Gaussian pyramid
The Scale-Invariant Feature Transform (SIFT) algorithm constructs a Gaussian pyramid to acquire feature information at different scales, and this strategy has been well validated. The present method therefore constructs a three-layer Gaussian pyramid for each image. The first-layer image is obtained by applying Gaussian smoothing to the original image, which removes some noise so that more accurate boundaries can be obtained in the subsequent superpixel segmentation stage. The second layer is obtained by downsampling the first layer and then applying Gaussian smoothing, and the third layer is obtained by downsampling the second layer and applying Gaussian smoothing; at each downsampling the image is reduced to half of its original size.
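As an illustration, the three-layer pyramid described above can be sketched with OpenCV; the 5×5 kernel and the σ value below are assumptions, since the patent does not state the smoothing parameters.

```python
import cv2

def build_gaussian_pyramid(image, levels=3):
    """Minimal sketch: smooth the original image, then repeatedly halve the
    resolution and smooth again (kernel size and sigma are assumed)."""
    pyramid = []
    current = cv2.GaussianBlur(image, (5, 5), 1.0)  # layer 1: smoothed original
    pyramid.append(current)
    for _ in range(levels - 1):
        # halve the x and y resolution, then apply Gaussian smoothing again
        current = cv2.resize(current, (current.shape[1] // 2, current.shape[0] // 2),
                             interpolation=cv2.INTER_LINEAR)
        current = cv2.GaussianBlur(current, (5, 5), 1.0)
        pyramid.append(current)
    return pyramid
```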
(2) Superpixel segmentation
The content-aware superpixel segmentation method is used to perform superpixel segmentation on each layer of the image. Compared with ordinary superpixel segmentation methods it has two advantages: content awareness and computation speed. MSLIC is a superpixel segmentation method proposed at CVPR 2016; it yields more detailed regions where the image carries more information and segments faster than other methods, so MSLIC is selected as the superpixel method. The number of superpixel blocks is set dynamically according to the size of the image as
Figure BDA0001280232320000071
where width and height are the width and height of the image; multiple experiments show that this choice of the number of superpixel blocks gives a good trade-off between time and precision. A superpixel pyramid is thus obtained.
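MSLIC itself is not widely packaged, so the sketch below uses scikit-image's plain SLIC purely as a stand-in to illustrate segmenting each pyramid layer with a size-dependent superpixel count; the `n_from_size` rule is a placeholder, since the patent's actual formula is reproduced only as an image.

```python
from skimage.segmentation import slic

def n_from_size(width, height, pixels_per_superpixel=600):
    # Placeholder rule: the patent sets the superpixel count dynamically from
    # width and height, but its exact formula is shown only as an image.
    return max(50, (width * height) // pixels_per_superpixel)

def segment_pyramid(pyramid):
    """Segment each pyramid layer into superpixels (SLIC used as a stand-in for MSLIC)."""
    labels_per_layer = []
    for layer in pyramid:
        h, w = layer.shape[:2]
        labels = slic(layer, n_segments=n_from_size(w, h), compactness=10, start_label=0)
        labels_per_layer.append(labels)
    return labels_per_layer
```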
2. Computing a single saliency map
A single saliency map is computed with a content-aware multi-scale saliency detection method. The content-aware superpixel segmentation yields more superpixel blocks in regions with more content, which helps when computing the cooperative significance at different positions within a region, and the segmentation itself is fast. In addition, the superpixel pyramid provides multi-scale information that strongly guides the cooperative significance computation: the saliency of each layer is computed independently and the multi-layer saliency maps are finally fused. The process of computing the saliency map of each layer is divided into the following steps:
(1) calculating a weakly significant map
The N superpixel blocks obtained by superpixel segmentation are expressed as {c_i}, i = 1, 2, ..., N. The superpixels in the edge region are assumed to be background and expressed as {n_j}, j = 1, 2, ..., N_B, where N_B is the number of superpixels located at the edge region of the image. The features of each superpixel block are then computed: dark channel value, center prior weight and color features.
Dark channels are mainly produced by colored or dark objects and shadows, which Tong et al. found to be indicative of salient objects as well. The calculation formula is as follows:
Figure BDA0001280232320000072
wherein
Figure BDA0001280232320000073
denotes the number of pixels covered by region c_i, and S_d(p) is the function computing the dark channel value, as follows:
Figure BDA0001280232320000074
wherein I_ch(q) represents the color value of pixel q within the corresponding channel.
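For concreteness, a per-superpixel dark channel value can be sketched as below using the standard dark-channel computation (per-pixel minimum over the color channels within a local window, as in He et al.'s dark channel prior) averaged over each superpixel; the window size is an assumption, and the exact form of Eqs. (1)-(2) above is given only as formula images.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel_per_superpixel(image_rgb, labels, window=7):
    """Sketch: per-pixel dark channel = min over RGB within a local window,
    then averaged within every superpixel (window size is an assumption)."""
    img = image_rgb.astype(np.float64) / 255.0
    per_pixel_min = img.min(axis=2)                     # min over color channels
    dark = minimum_filter(per_pixel_min, size=window)   # min over the local window
    n_sp = labels.max() + 1
    values = np.zeros(n_sp)
    for i in range(n_sp):
        values[i] = dark[labels == i].mean()            # average within superpixel c_i
    return values
```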
The weak significance detection model is realized by the following formula:
Figure BDA0001280232320000081
wherein d_k(c_i, n_j) denotes the Euclidean distance between regions c_i and n_j in the feature corresponding to k; the three features are F_1 (RGB), F_2 (CIELab) and F_3 (LBP). Furthermore, g(c_i) is the center prior weight, computed from the normalized spatial distance between the center of superpixel c_i and the center of the image. The saliency value of each region is obtained from equation (3) and assigned to all pixels in the region.
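The sketch below is one plausible reading of the weak saliency score: the center prior weight multiplied by the mean feature distance from each superpixel to the boundary (background) superpixels, plus an optional dark-channel term. The exact combination in Eq. (3) is shown only as a formula image, so the Gaussian form of g and the additive weighting are assumptions.

```python
import numpy as np

def weak_saliency(features, boundary_idx, centers, image_center, sigma=0.5, dark=None):
    """features: dict {name: (N, d) array} of RGB, CIELab and LBP descriptors per superpixel.
    boundary_idx: indices of superpixels touching the image border (assumed background).
    centers: (N, 2) normalized superpixel centroids; image_center: (2,) normalized center."""
    n_sp = centers.shape[0]
    # center prior g(c_i): closer to the image center -> larger weight (assumed Gaussian form)
    g = np.exp(-np.sum((centers - image_center) ** 2, axis=1) / (2 * sigma ** 2))
    contrast = np.zeros(n_sp)
    for feat in features.values():
        bg = feat[boundary_idx]                                  # background descriptors
        dist = np.linalg.norm(feat[:, None, :] - bg[None, :, :], axis=2)
        contrast += dist.mean(axis=1)                            # mean distance to background
    score = g * contrast
    if dark is not None:
        score = score + g * dark                                 # assumed additive dark-channel term
    return (score - score.min()) / (score.max() - score.min() + 1e-12)
```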
(2) Training strong significance detection model
The invention takes a plurality of single-kernel, single-feature SVM (Support Vector Machine) classifiers as weak classifiers and obtains a strong classifier through repeated iterative learning with the Adaboost boosting method. The linear sum of the multiple SVM kernel functions
Figure BDA0001280232320000082
can be expressed as:
Figure BDA0001280232320000083
where r denotes a training sample and r_i the i-th sample; the training sample set is derived from the weak saliency map of the previous step. β_m denotes the weight of the corresponding kernel function and M denotes the number of weak classifiers; experiments show that M = 12 (3 features × 4 kernel functions) gives the best effect. A strong classifier is obtained through multiple iterations, can be applied directly to all test samples of the current image, and finally predicts the single saliency map.
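A minimal sketch of this boosting step follows, assuming confident superpixels from the weak map provide the positive and negative training labels: discrete AdaBoost over single-kernel SVM weak learners, one per (feature set, kernel) pair. The four kernel types listed are assumptions (the patent states only that 4 kernel functions are used), and the sample-selection step is not shown.

```python
import numpy as np
from sklearn.svm import SVC

def train_strong_classifier(feature_sets, y, kernels=("linear", "rbf", "poly", "sigmoid")):
    """feature_sets: list of (n_samples, d) arrays (e.g. RGB, CIELab, LBP descriptors).
    y: labels in {0, 1} taken from confident regions of the weak saliency map.
    Returns a list of (classifier, feature_index, beta) triples (discrete AdaBoost)."""
    n = len(y)
    w = np.full(n, 1.0 / n)            # sample weights
    y_pm = 2 * y - 1                   # labels mapped to {-1, +1}
    ensemble = []
    for f_idx, X in enumerate(feature_sets):
        for kernel in kernels:         # 3 features x 4 kernels = 12 weak classifiers
            clf = SVC(kernel=kernel, gamma="scale")
            clf.fit(X, y, sample_weight=w)
            pred = clf.predict(X)
            err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
            beta = 0.5 * np.log((1 - err) / err)          # weak-classifier weight
            w *= np.exp(-beta * y_pm * (2 * pred - 1))    # re-weight the samples
            w /= w.sum()
            ensemble.append((clf, f_idx, beta))
    return ensemble

def predict_strong(ensemble, feature_sets):
    """Weighted vote of the weak classifiers, squashed to a [0, 1] saliency score."""
    score = sum(beta * (2 * clf.predict(feature_sets[f_idx]) - 1)
                for clf, f_idx, beta in ensemble)
    return 1.0 / (1.0 + np.exp(-score))
```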
3. Superpixel block clustering
RGB, CIELab and Gabor features are selected as clustering features, giving 7 dimensions in total. The two color features are used because the inventors found that the two color spaces play complementary roles in saliency detection. The Gabor feature is obtained as follows: Gabor features are extracted in 8 orientations with bandwidth 1 and then combined into a single texture feature by simple linear additive combination. To obtain global relevance, the method clusters all superpixel blocks of all images in the group, so that superpixel blocks with similar features fall into the same class; the K-Means clustering method is used. For the number of clusters, to avoid the clustering result being degraded by too many or too few clusters, K is computed as K = min(max(2 × N_img, 10), 30).
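A sketch of this grouping step: stack the 7-dimensional descriptors (mean RGB, mean CIELab, and the linearly summed 8-orientation Gabor response) of every superpixel from every image in the group and run K-Means with K = min(max(2·N_img, 10), 30). Here N_img is read as the number of images in the group, and the Gabor frequency is an assumption (the patent only fixes 8 orientations and bandwidth 1).

```python
import numpy as np
from sklearn.cluster import KMeans
from skimage.filters import gabor
from skimage.color import rgb2lab, rgb2gray

def superpixel_descriptors(image_rgb, labels):
    """7-D descriptor per superpixel: mean RGB (3) + mean CIELab (3) + Gabor (1)."""
    gray = rgb2gray(image_rgb)
    # linear sum of real Gabor responses over 8 orientations, bandwidth 1 (frequency assumed)
    gabor_sum = sum(gabor(gray, frequency=0.2, theta=t, bandwidth=1)[0]
                    for t in np.linspace(0, np.pi, 8, endpoint=False))
    lab = rgb2lab(image_rgb)
    n_sp = labels.max() + 1
    desc = np.zeros((n_sp, 7))
    for i in range(n_sp):
        m = labels == i
        desc[i, :3] = image_rgb[m].reshape(-1, 3).mean(axis=0)
        desc[i, 3:6] = lab[m].reshape(-1, 3).mean(axis=0)
        desc[i, 6] = gabor_sum[m].mean()
    return desc

def cluster_group(descriptor_list, n_images):
    K = min(max(2 * n_images, 10), 30)          # number of clusters per the patent's rule
    all_desc = np.vstack(descriptor_list)       # superpixels from every image in the group
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(all_desc)
    return km.labels_, km.cluster_centers_
```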
4. Calculating synergistic significance
Clustering over the M images yields K classes, denoted
Figure BDA0001280232320000091
with cluster centers denoted
Figure BDA0001280232320000092
Each class may contain superpixel blocks from different images; when superpixel blocks from different images are clustered together, this shows that they are globally related, but they are not necessarily cooperative salient regions: they may be background regions or other related regions. Next, the cooperative significance of each class needs to be calculated; the method describes the cooperative significance by computing a contrast measure, a repetition rate measure and a position measure. The contrast measure and the position measure are used to find the foreground region of an image, since saliency detection research shows that foreground regions are characterized by high contrast. The repetition rate measure describes how often a class appears repeatedly across multiple images and is an important global property reflecting cooperative significance: intuitively, the more evenly a class is distributed across the images, the more cooperatively salient it is. The weak cooperative saliency map is obtained as the product of these 3 measures.
The contrast measure computes the feature distance between a class and all other classes; for class C_k the contrast measure is calculated as follows:
Figure BDA0001280232320000093
where N is the total number of superpixel blocks over all images and N_i is the number of pixels in cluster C_i.
The position measure computes, for all superpixel blocks of each class, the normalized distance to the image center; the formula is as follows:
Figure BDA0001280232320000094
wherein n_k denotes the number of superpixel blocks of class k, N_j is the number of superpixel blocks on the j-th image,
Figure BDA0001280232320000095
is the position of the center point of the i-th superpixel block of the class on the j-th image, o_j is the center position of the j-th image,
Figure BDA0001280232320000096
refers to the class to which superpixel
Figure BDA0001280232320000097
belongs, and δ(·) is a Kronecker function, which returns 1 if the two numbers are equal and 0 otherwise.
The method expresses the repetition rate measure through the distribution of the superpixel blocks of a class over all images. A histogram with M bins,
Figure BDA0001280232320000098
is used to describe the repetition rate measure of cluster C_k over the M images:
Figure BDA0001280232320000101
Figure BDA0001280232320000102
in order to avoid errors caused by a denominator of 0, a 1 is added.
5. Fusion
The fusion comprises multi-scale fusion and fusion of the saliency map with the cooperative saliency map. Multi-scale fusion combines information from different scales and yields more accurate salient target regions and cooperative salient regions. Since information at different scales carries different weight, a weighted fusion method is adopted: the results of layers 1, 2 and 3 are fused with weights 5:3:2. To further refine the cooperative salient region, the noise regions of the weak cooperative saliency map are removed by multiplying the saliency map with the weak cooperative saliency map.
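In code form the fusion step is a 5:3:2 weighted sum of the per-layer results followed by a pixel-wise product with the saliency map; upsampling the coarser layers back to the full resolution before summing is an assumption about how the layers are aligned.

```python
import numpy as np
import cv2

def fuse(layer_maps, saliency_map, weights=(5, 3, 2)):
    """layer_maps: per-layer weak co-saliency maps (layer 1 at full resolution).
    saliency_map: the single-image saliency map at full resolution, values in [0, 1]."""
    h, w = layer_maps[0].shape
    acc = np.zeros((h, w), dtype=np.float64)
    for m, wt in zip(layer_maps, weights):
        up = cv2.resize(m.astype(np.float64), (w, h))   # bring coarser layers to full size
        acc += wt * up
    acc /= sum(weights)
    fused = acc * saliency_map                           # multiply to suppress noise regions
    return (fused - fused.min()) / (fused.max() - fused.min() + 1e-12)
```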
The method of this example was evaluated on the following datasets: MSRA, Image Pairs and iCoseg. MSRA is currently the largest image library with pixel-level ground truth in the saliency detection field, and verification on this library shows that the saliency detection of the invention has certain advantages. The latter two datasets are the two most commonly used datasets for cooperative significance detection: each group in the Image Pairs dataset contains only two images, while each group in the iCoseg dataset has 5 to 20 images, which places higher demands on computation. Experimental hardware environment: Windows 7, Core i7 processor with a clock frequency of 3.4 GHz and 8 GB of memory. The code execution environment is Matlab 2013a.
1. Qualitative analysis
The algorithm of the present invention is compared with CCD (Cluster-based Co-saliency Detection, 2013), BL (Bootstrap Learning, 2015) and ELD (Encoded Low-level Distance map, 2016); the results are shown in FIG. 2. It can be seen that the method of this embodiment finds the salient object well and eliminates noise regions, finally keeping a cleaner salient region, which greatly helps later in removing noise from the weak cooperative saliency map.
Among existing cooperative significance detection methods, source code was found only for CCD and CMIP (Co-saliency Model of Image Pairs, 2011), and CMIP can only detect the cooperative significance of a pair of images; therefore only these two methods are compared with the proposed method on Image Pairs. FIG. 3 contains two groups of images: row 1 is the input images, row 2 the ground-truth maps, rows 3 and 4 the CCD and CMIP results, and row 5 the method of the present invention. It can be seen that the method of the invention is closest to the ground-truth maps.
2. Quantitative assessment
The method of the present invention is evaluated here using the precision-recall (P-R) curve. The process of computing the P-R curve is also known as fixed-threshold segmentation. First, the saliency map is quantized to [0, 255] and a threshold is set every 5 gray levels; the saliency map to be evaluated is segmented at these thresholds to obtain 52 binary maps. With reference to the ground-truth maps, the average precision and recall over all images in the tested database are computed, giving 52 P-R value pairs from which the P-R curve is drawn. FIG. 4 compares the 4 saliency methods; it can be seen that this method performs best among the 4, though the improvement over the BL method is small.
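The fixed-threshold evaluation reads directly as code: threshold the 8-bit saliency map every 5 gray levels (52 thresholds over [0, 255]), compare each binary map to the ground truth, and average precision and recall over the dataset.

```python
import numpy as np

def pr_curve(saliency_maps, ground_truths):
    """saliency_maps: list of uint8 maps in [0, 255]; ground_truths: list of binary masks.
    Returns 52 precision and recall values averaged over the dataset."""
    thresholds = np.arange(0, 256, 5)            # 52 fixed thresholds
    precisions, recalls = [], []
    for t in thresholds:
        p_vals, r_vals = [], []
        for sal, gt in zip(saliency_maps, ground_truths):
            pred = sal >= t
            tp = np.logical_and(pred, gt).sum()
            p_vals.append(tp / (pred.sum() + 1e-12))
            r_vals.append(tp / (gt.sum() + 1e-12))
        precisions.append(np.mean(p_vals))
        recalls.append(np.mean(r_vals))
    return np.array(precisions), np.array(recalls)
```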
FIG. 5 is a PR diagram of the 3 cooperative significance detection methods on the iCoseg dataset, where Ours-SLIC and Ours-MSLIC are the proposed cooperative significance detection model using SLIC (Simple Linear Iterative Clustering) and MSLIC (Manifold SLIC), respectively, as the superpixel segmentation algorithm; MSLIC is the content-aware superpixel segmentation algorithm. The figure shows that the content-aware superpixel segmentation algorithm improves the cooperative significance detection results. FIG. 6 is a PR diagram of the 3 cooperative significance detection methods on the Image Pairs dataset, from which it can be seen that the proposed model has a clear advantage over the other two models.
Another advantage of the present invention is the speed of computing the cooperative significance. As the following table shows, the method of the present invention computes the cooperative significance on Image Pairs and iCoseg the fastest. Timings reported in other papers are not compared directly, because the other methods do not provide source code and the published figures were obtained in different environments.
Average time (s) to compute the cooperative significance per image:

Dataset       CMIP    CCD     This embodiment
Image Pairs   0.58    0.43    0.32
iCoseg        --      1.07    0.65

Claims (6)

1. A collaborative significance detection method based on super-pixel clustering is characterized by comprising the following steps:
(1) constructing a super-pixel pyramid image:
(1a) inputting original image group data, and constructing a three-layer Gaussian pyramid, wherein a first layer image is obtained by performing Gaussian smoothing on an original image, a second layer image is obtained by performing Gaussian smoothing after downsampling on the first layer image, and a third layer image is obtained by performing Gaussian smoothing after downsampling on the second layer image;
(1b) performing superpixel segmentation on each layer of the image by using a content-aware superpixel segmentation method, wherein, for an image of width width and height height pixels, the number of superpixel blocks is set dynamically according to the image size as
Figure DEST_PATH_IMAGE002
thereby obtaining the superpixel pyramid image;
(2) calculating a single saliency map:
(2a) calculating a weakly significant map
The N superpixel blocks obtained by superpixel segmentation are expressed as:
Figure DEST_PATH_IMAGE004
the superpixels located in the edge region are assumed to be background and expressed as:
Figure DEST_PATH_IMAGE006
wherein
Figure DEST_PATH_IMAGE008
is the number of superpixels located in the image edge region; the features of each superpixel block are then respectively calculated: dark channel values, center prior weights and color features;
the dark channel calculation formula is as follows:
Figure DEST_PATH_IMAGE010
(1)
wherein
Figure DEST_PATH_IMAGE012
denotes the number of pixels covered by superpixel block
Figure DEST_PATH_IMAGE014
and
Figure DEST_PATH_IMAGE016
is the function computing the dark channel value, as follows:
Figure DEST_PATH_IMAGE018
(2)
wherein
Figure DEST_PATH_IMAGE020
Representing a color value of q within a corresponding channel;
the weak significance detection model is realized by the following formula:
Figure DEST_PATH_IMAGE022
(3)
wherein
Figure DEST_PATH_IMAGE024
denotes the Euclidean distance, in the feature corresponding to k, between superpixel blocks
Figure DEST_PATH_IMAGE026
and
Figure DEST_PATH_IMAGE028
where F_1 is the RGB feature, F_2 the CIELab feature and F_3 the LBP feature;
Figure DEST_PATH_IMAGE030
is the center prior weight, calculated from the normalized spatial distance between the center of superpixel block
Figure 245155DEST_PATH_IMAGE026
and the center of the image;
obtaining a significant value of each region according to the formula (3), and assigning the significant value of each region to all pixels in the region to obtain a weak significant image;
(2b) training strong significance detection model
Taking a plurality of single-kernel, single-feature support vector machine classifiers as weak classifiers, and obtaining a strong classifier through repeated iterative learning with the Adaboost boosting method;
The linear sum of the kernel functions of the multiple single-kernel, single-feature support vector machines
Figure DEST_PATH_IMAGE032
is expressed as:
Figure DEST_PATH_IMAGE034
(4)
wherein r represents a training sample,
Figure DEST_PATH_IMAGE036
is the i-th sample, the training sample set being obtained from the weak saliency map of the previous step,
Figure DEST_PATH_IMAGE038
represents the weight of the corresponding kernel function, and M represents the number of weak classifiers; a strong classifier is obtained through multiple iterations, applied to all test samples of the current image, and finally used to predict the single saliency map;
(3) superpixel block clustering
Selecting RGB, CIELab and Gabor characteristics as clustering characteristics, and clustering all superpixel blocks of all images in the group, wherein the clustering method is a K-Means clustering method;
(4) calculating synergistic significance
Clustering the M images to obtain K classes, and recording the K classes
Figure DEST_PATH_IMAGE040
Cluster center is marked as
Figure DEST_PATH_IMAGE042
Describing the cooperative significance by calculating contrast measure, repetition rate measure and position measure, and obtaining a weak cooperative significance map according to the product of 3 measures;
(5) fusing:
performing multi-scale fusion; and multiplying the weak saliency map and the weak synergetic saliency map to obtain a fused saliency map.
2. The method of claim 1, wherein the method comprises: in the step (1a), during the down-sampling, the pixels in the x and y directions of the image are respectively adjusted to be half of the original size.
3. The method of claim 1, wherein the method comprises: in step (3), the number of clusters
Figure DEST_PATH_IMAGE044
4. The method of claim 1, wherein the method comprises: in the step (3), the method for acquiring the Gabor characteristics comprises the following steps: firstly, Gabor characteristics in 8 directions are obtained, the bandwidth is 1, then the 8 direction characteristics are combined to be used as the Gabor characteristics, and the combination method is linear addition combination.
5. The method of claim 3, wherein the method comprises: in the step (4),
the contrast measure calculates the feature distance between a class
Figure DEST_PATH_IMAGE046
and all other classes; the contrast measure calculation formula is as follows:
Figure DEST_PATH_IMAGE048
(5)
where N is the total number of superpixel blocks over all images and
Figure DEST_PATH_IMAGE050
is the number of pixels in cluster
Figure DEST_PATH_IMAGE052
;
the position measure calculates the normalization of the distance to the center point for all superpixel blocks of each class, the formula is as follows:
Figure DEST_PATH_IMAGE054
(6)
wherein
Figure DEST_PATH_IMAGE056
denotes the number of superpixel blocks of class k, N_j is the number of superpixel blocks on the j-th image,
Figure DEST_PATH_IMAGE058
is the position of the center point of the i-th superpixel block of the class on the j-th image,
Figure DEST_PATH_IMAGE060
is the center position of the j-th image,
Figure DEST_PATH_IMAGE062
refers to the class to which superpixel
Figure DEST_PATH_IMAGE064
belongs, and
Figure DEST_PATH_IMAGE066
is a Kronecker function, which returns 1 if the two numbers are equal and 0 otherwise;
the distribution of the superpixel blocks in a category on all images is calculated by using the repetition rate measure, and the histogram of M bins is adopted
Figure DEST_PATH_IMAGE068
to describe the repetition rate measure of cluster
Figure DEST_PATH_IMAGE070
over the M images:
Figure DEST_PATH_IMAGE072
(7)
Figure DEST_PATH_IMAGE074
(8)。
6. the method of claim 1, wherein the method comprises: in the step (5), the multi-scale fusion adopts a weighted fusion method, and the results of the 1 st layer, the 2 nd layer and the 3 rd layer are fused with the weight of 5: 3: 2.
CN201710283829.0A 2017-04-26 2017-04-26 Collaborative significance detection method based on super-pixel clustering Active CN107103326B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710283829.0A CN107103326B (en) 2017-04-26 2017-04-26 Collaborative significance detection method based on super-pixel clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710283829.0A CN107103326B (en) 2017-04-26 2017-04-26 Collaborative significance detection method based on super-pixel clustering

Publications (2)

Publication Number Publication Date
CN107103326A CN107103326A (en) 2017-08-29
CN107103326B true CN107103326B (en) 2020-06-02

Family

ID=59656563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710283829.0A Active CN107103326B (en) 2017-04-26 2017-04-26 Collaborative significance detection method based on super-pixel clustering

Country Status (1)

Country Link
CN (1) CN107103326B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909078B (en) * 2017-10-11 2021-04-16 天津大学 Inter-graph significance detection method
CN108009549B (en) * 2017-11-02 2021-06-04 天津大学 Iterative collaborative significance detection method
US10789678B2 (en) * 2018-05-01 2020-09-29 Nvidia Corp. Superpixel sampling networks
CN108717546A (en) * 2018-05-14 2018-10-30 天津大学 The cooperation detection algorithm without single notable residual error based on color characteristic enhancing
CN108875798B (en) * 2018-05-29 2022-06-24 电子科技大学 Super-pixel-level feature extraction method based on spatial pyramid pooling
CN108961220B (en) * 2018-06-14 2022-07-12 上海大学 Image collaborative saliency detection method based on multilayer convolution feature fusion
CN109614872A (en) * 2018-11-13 2019-04-12 中国科学院遥感与数字地球研究所 One kind being based on improved BL model built-up areas extracting method
CN109741293A (en) * 2018-11-20 2019-05-10 武汉科技大学 Conspicuousness detection method and device
CN109934103A (en) * 2019-01-29 2019-06-25 杭州电子科技大学 Method based on obvious object in dark channel prior and region covariance detection image
CN110148097B (en) * 2019-05-27 2021-06-29 电子科技大学 Color correction method of cataract image
CN110443809A (en) * 2019-07-12 2019-11-12 太原科技大学 Structure sensitive property color images super-pixel method with boundary constraint
CN110751153B (en) * 2019-09-19 2023-08-01 北京工业大学 Semantic annotation method for indoor scene RGB-D image
CN110765948A (en) * 2019-10-24 2020-02-07 长沙品先信息技术有限公司 Target detection and identification method and system based on unmanned aerial vehicle
CN111325810B (en) * 2020-02-20 2024-02-27 广东三维家信息科技有限公司 Color matching method and device and electronic equipment
CN113240689A (en) * 2021-06-01 2021-08-10 安徽建筑大学 Method for rapidly extracting flood disaster area
CN113901929A (en) * 2021-10-13 2022-01-07 河北汉光重工有限责任公司 Dynamic target detection and identification method and device based on significance

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809672A (en) * 2016-03-03 2016-07-27 浙江大学 Super pixels and structure constraint based image's multiple targets synchronous segmentation method
CN106296681A (en) * 2016-08-09 2017-01-04 西安电子科技大学 Cooperative Study significance detection method based on dual pathways low-rank decomposition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8675966B2 (en) * 2011-09-29 2014-03-18 Hewlett-Packard Development Company, L.P. System and method for saliency map generation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809672A (en) * 2016-03-03 2016-07-27 浙江大学 Super pixels and structure constraint based image's multiple targets synchronous segmentation method
CN106296681A (en) * 2016-08-09 2017-01-04 西安电子科技大学 Cooperative Study significance detection method based on dual pathways low-rank decomposition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Saliency and co-saliency detection by low-rank multiscale fusion; Rui Huang et al.; 2015 IEEE International Conference on Multimedia and Expo; 2015-08-06; full text *
Research on the sPCA-based co-saliency detection method and its applications; Shen Ningmin; China Master's Theses Full-text Database, Information Science and Technology; 2017-03-15; Vol. 2017, No. 03; pp. I138-5569 *

Also Published As

Publication number Publication date
CN107103326A (en) 2017-08-29

Similar Documents

Publication Publication Date Title
CN107103326B (en) Collaborative significance detection method based on super-pixel clustering
Rocco et al. Efficient neighbourhood consensus networks via submanifold sparse convolutions
CN106547880B (en) Multi-dimensional geographic scene identification method fusing geographic area knowledge
Lu et al. Dense and sparse reconstruction error based saliency descriptor
Deng et al. Saliency detection via a multiple self-weighted graph-based manifold ranking
Cheng et al. HFS: Hierarchical feature selection for efficient image segmentation
Kim et al. Color–texture segmentation using unsupervised graph cuts
Zhu et al. A multisize superpixel approach for salient object detection based on multivariate normal distribution estimation
Tian et al. Learning complementary saliency priors for foreground object segmentation in complex scenes
Zhang et al. Saliency detection via local structure propagation
Zhang et al. Revisiting graph construction for fast image segmentation
Liu et al. Study of human action recognition based on improved spatio-temporal features
Bourouis et al. Color object segmentation and tracking using flexible statistical model and level-set
Liao et al. Multi-scale saliency features fusion model for person re-identification
Zhou et al. Semantic image segmentation using low-level features and contextual cues
Rahman et al. Contextual-based top-down saliency feature weighting for target detection
CN113139540B (en) Backboard detection method and equipment
Singh et al. A hybrid approach using color spatial variance and novel object position prior for salient object detection
Duan et al. Bio-inspired visual attention model and saliency guided object segmentation
Chen et al. An image splicing localization algorithm based on SLIC and image features
Liu et al. Multi-scale cross-layer fusion and center position network for pedestrian detection
Choudhury et al. Human detection using orientation shape histogram and coocurrence textures
Cai et al. Prominent edge detection with deep metric expression and multi-scale features
Chen et al. Dynamic image segmentation algorithm in 3D descriptions of remote sensing images
Shen et al. BSFCoS: Block and sparse principal component analysis-based fast co-saliency detection method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant