Detailed Description
A method for detecting a salient region in a depth image comprises the following steps:
step 1, for each pixel I_k in the depth image I, separately extracting the gradient features (G_a^k, G_b^k);
In a further embodiment, extracting the gradient feature of the depth map specifically includes the following steps:
step 1-1, traversing all pixel points of the depth image I to obtain the gradient vector of each pixel point I_k, k = 1, 2, …, N, where N is the total number of pixels; the gradient vector (dr_k, dc_k) is calculated as follows:
dr_k = (dep(r+1, c) − dep(r−1, c)) / 2 (1)
dc_k = (dep(r, c+1) − dep(r, c−1)) / 2 (2)
wherein r and c correspond to the row and column of the image coordinates, and dep(r, c) denotes the depth value at the r-th row and c-th column of the depth image I;
step 1-2, traversing all pixel points to obtain the gradient feature of each point; the gradient feature (G_a^k, G_b^k) of pixel I_k is computed from its gradient vector (dr_k, dc_k), wherein ε is a constant greater than zero and Maximum denotes the maximum value of G_a and G_b.
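The gradient vectors of formulas (1) and (2) can be sketched in NumPy as follows; the zero-valued border handling and the function name are illustrative assumptions, since the method text does not fix them:

```python
import numpy as np

def depth_gradients(dep):
    """Central-difference gradient vectors (dr_k, dc_k) per formulas (1), (2).

    Border pixels are left at zero, an assumption the method text does not fix.
    """
    dep = dep.astype(np.float64)
    dr = np.zeros_like(dep)
    dc = np.zeros_like(dep)
    dr[1:-1, :] = (dep[2:, :] - dep[:-2, :]) / 2.0  # (dep(r+1,c) - dep(r-1,c)) / 2
    dc[:, 1:-1] = (dep[:, 2:] - dep[:, :-2]) / 2.0  # (dep(r,c+1) - dep(r,c-1)) / 2
    return dr, dc
```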
Step 2, calculating an initial saliency value S(I_k) for each pixel by a global-contrast calculation according to the gradient features extracted in step 1, obtaining an initial saliency map of the same resolution;
in a further embodiment, the initial saliency value S(I_k) of each pixel is calculated by the global-contrast method according to the gradient features, obtaining an initial saliency map of the same resolution, specifically comprising the following steps:
step 2-1, normalizing the two elements of all gradient feature vectors to the interval [0, 255] and rounding to integers, so that the normalized gradient feature of pixel I_k is (a_k, b_k); the gradient feature values of all pixel points therefore correspond to integers in [0, 255], i.e. at most 256 different values, denoted a_j for the first element, and likewise b_j for the second element;
step 2-2, obtaining, according to the global-contrast calculation mode, the saliency value corresponding to each feature value a_j:
S(a_j) = Σ_{i=1}^{n} f_i · D(a_j, a_i)
where n = 256 is the total number of feature values extracted from the depth image, f_i represents the probability of a_i occurring in the image, and D(a_j, a_i) is a distance metric function of the two features; for the feature values b_j, the corresponding saliency values S(b_j) are obtained in the same way;
step 2-3, pixels with the same feature value have the same saliency value; for pixel I_k, if its feature value is (a_k, b_k), the initial saliency value of that pixel is:
S(I_k) = w_a · S(a_k) + w_b · S(b_k)
wherein w_a and w_b are weight parameters; for each pixel, its saliency value is obtained according to its feature value, thereby obtaining the initial saliency map at full resolution.
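Steps 2-1 to 2-3 admit a short NumPy sketch; taking D(·,·) as the absolute difference is an assumption, since the text only calls it a distance metric function:

```python
import numpy as np

def global_contrast_saliency(feature, n=256):
    """Quantize one feature channel to [0, 255] (step 2-1) and assign each
    value the global-contrast saliency S(a_j) = sum_i f_i * D(a_j, a_i)
    (step 2-2), with D taken as the absolute difference (an assumption)."""
    fmin, fmax = float(feature.min()), float(feature.max())
    q = np.rint((feature - fmin) / max(fmax - fmin, 1e-12) * (n - 1)).astype(np.int64)
    f = np.bincount(q.ravel(), minlength=n) / q.size       # occurrence probability f_j
    values = np.arange(n, dtype=np.float64)
    s_per_value = np.abs(values[:, None] - values[None, :]) @ f   # S(a_j)
    return s_per_value[q]                                  # per-pixel lookup (step 2-3)

# initial saliency map, combining both channels with the weights of step 2-3:
# S = w_a * global_contrast_saliency(G_a) + w_b * global_contrast_saliency(G_b)
```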
Step 3, detecting peaks and valleys using the statistical histogram features of the depth map;
in a further embodiment, peak-valley detection is performed using the statistical histogram features of the depth map, as follows:
step 3-1, dividing the depth values of all pixels in the depth map into 256 intervals, and counting the number of pixels whose depth values fall within each interval to obtain a statistical histogram;
step 3-2, calculating the derivative of the histogram statistics to obtain the growth rate at each position of the histogram abscissa, forming a vector α = {α_1, α_2, …, α_256};
step 3-3, taking the sign λ_αi of each α_i, i = 1, 2, …, 256, and forming them in order into a vector λ_α = {λ_α1, λ_α2, …, λ_α256}; the sign λ_αi of α_i is given by:
λ_αi = 1 if α_i > 0, λ_αi = 0 if α_i = 0, λ_αi = −1 if α_i < 0 (8)
step 3-4, performing mean filtering on the vector λ_α, and applying the operation of step 3-3 to the filtered result to obtain a new sign sequence λ_β = {λ_β1, λ_β2, …, λ_β256};
step 3-5, performing jump detection on the vector λ_β by template matching; there are 4 types of jumps: for [1, −1] and [1, 0, −1], the jump position is a peak position P_p; for [−1, 1] and [−1, 0, 1], the jump position corresponds to a valley position P_t.
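A minimal sketch of steps 3-1 to 3-5; the first-difference derivative, the mean-filter window of 5, and the treatment of the [1, 0, −1]-type jumps as sign changes across runs of zeros are assumptions:

```python
import numpy as np

def detect_peaks_valleys(dep, bins=256, win=5):
    """Histogram peak/valley detection (steps 3-1 to 3-5)."""
    hist, edges = np.histogram(dep.ravel(), bins=bins)                # step 3-1
    alpha = np.diff(hist.astype(np.float64), prepend=hist[0])         # step 3-2: growth rate
    lam = np.sign(alpha)                                              # step 3-3: formula (8)
    lam = np.sign(np.convolve(lam, np.ones(win) / win, mode='same'))  # step 3-4
    peaks, valleys = [], []
    i = 0
    while i < len(lam) - 1:                                           # step 3-5: jump detection
        j = i + 1
        while j < len(lam) and lam[j] == 0:                           # skip intervening zeros
            j += 1
        if j < len(lam):
            if lam[i] == 1 and lam[j] == -1:
                peaks.append(edges[i + 1])                            # peak position P_p
            elif lam[i] == -1 and lam[j] == 1:
                valleys.append(edges[i + 1])                          # valley position P_t
        i = j if j > i + 1 else i + 1
    return peaks, valleys
```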
Step 4, estimating a Zero Parallax Area (ZPA) of the depth map;
in a further embodiment, the specific steps of estimating the zero parallax area ZPA of the depth image are as follows:
step 4-1, calculating the median dep_mid of the depth values in the depth image;
step 4-2, taking the median as the center, the region within a distance of σH in front of and behind the center is the ZPA of the scene:
ZPA = [dep_mid − σ·H, dep_mid + σ·H] (9)
in formula (9), H is the depth of field (DOF) of the scene, and σ is a scale parameter.
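A sketch of formula (9); taking the depth of field H as the span of the depth values is an assumption:

```python
import numpy as np

def zero_parallax_area(dep, sigma=0.1):
    """ZPA per formula (9): an interval of half-width sigma * H around the median depth."""
    mid = float(np.median(dep))
    H = float(dep.max() - dep.min())   # depth of field of the scene (assumed as the depth span)
    return mid - sigma * H, mid + sigma * H
```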
Step 5, dividing the background region and the foreground region in the depth image, adjusting the initial saliency map obtained in step 2 accordingly, and suppressing the saliency values of the background region to obtain an improved saliency map.
In a further embodiment, the improved saliency map is obtained by the following steps:
step 5-1, determining the depth value corresponding to the peak/valley position that lies behind the zero parallax area ZPA and is closest to it, namely the final threshold T of the background estimate:
T = min(p), s.t. p ∈ {P_p, P_t} and p > ZPA (10)
step 5-2, in the depth image, taking the region whose depth values are greater than the background threshold T as the background part and the part whose depth values are less than T as the foreground region, thereby determining whether the pixel at the corresponding position of the saliency map belongs to the background or the foreground; the saliency values of the background part of the saliency map are suppressed while those of the foreground part are retained, giving the improved saliency map; in the suppression formula for the background part, dep_k is the depth value of pixel I_k in the depth image, S(I_k) is the initial saliency value of the background part, and S′(I_k) is the suppressed saliency value of the background part.
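Steps 5-1 and 5-2 can be sketched as follows; the exponential decay stands in for the suppression formula, whose exact form is not reproduced in the text, and is therefore only an assumed placeholder with the same qualitative behavior (stronger suppression farther behind T):

```python
import numpy as np

def suppress_background(sal, dep, peaks, valleys, zpa_far):
    """Background threshold T per formula (10), then background suppression.

    The exponential decay below is an assumption standing in for the
    patent's suppression formula, which is not reproduced in the text.
    """
    behind = [p for p in list(peaks) + list(valleys) if p > zpa_far]
    T = min(behind) if behind else zpa_far               # step 5-1, formula (10)
    out = sal.astype(np.float64)
    bg = dep > T                                         # step 5-2: background split
    span = max(float(dep.max()) - T, 1e-12)
    out[bg] = out[bg] * np.exp(-(dep[bg] - T) / span)    # S'(I_k), suppressed
    return out, T
```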
Step 6, performing superpixel segmentation on the original image using a superpixel segmentation algorithm, and then optimizing the saliency map obtained in step 5 to obtain the final salient region;
in a further embodiment, the saliency map of step 5 is optimized based on the superpixels as follows:
step 6-1, initializing the cluster centers: setting the number of superpixels to C and taking the length s = sqrt(N/C) in the two-dimensional space as the interval, periodically sampling the depth image, taking each sampling point as an initial cluster center, setting the category labels of the initial-cluster-center pixels to 1, 2, …, C, setting the category labels of all non-center pixels to −1 and their distances to the cluster centers to infinity, where N is the total number of pixels in the whole depth image;
step 6-2, for each cluster center I_c, c = 1, …, C, calculating the distance between the cluster center and each pixel point I_i, i = 1, 2, …, 2s × 2s, within the 2s × 2s neighborhood search range of the cluster center; the distance is computed from the depth difference and the spatial distance, wherein dep_c is the depth value of the cluster-center pixel I_c, u_c and v_c are its abscissa and ordinate in the image, dep_i is the depth value of pixel I_i, u_i and v_i are its abscissa and ordinate in the image, and m is the compactness adjustment parameter of the superpixel;
each non-center pixel may be reached by several surrounding cluster centers during the search; the cluster center with the minimum distance value is taken as the cluster center of that pixel, and the pixel is given the same category label as that cluster center, yielding a superpixel segmentation result;
step 6-3, calculating the mean depth value and the mean horizontal and vertical coordinates of the pixels within each superpixel, taking these means as the new cluster center of the superpixel, and repeating step 6-2 until the cluster center to which each pixel belongs no longer changes;
step 6-4, counting the number of pixels contained in each superpixel; when this number is smaller than a set minimum value e, merging the superpixel with the adjacent superpixel whose coordinate position is nearest; after merging, all superpixels R_c, c = 1, 2, …, C′ are obtained, where C′ ≤ C;
step 6-5, optimizing the saliency result obtained in step 5 according to the superpixels R_c, i.e. if I_k ∈ R_c, the final saliency value S″(I_k) of pixel I_k is:
S″(I_k) = (1 / |R_c|) · Σ_{I_i ∈ R_c} S′(I_i)
wherein |R_c| is the number of pixels contained in superpixel R_c.
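Steps 6-1 to 6-5 can be sketched as a SLIC-style clustering on (depth, row, column); the exact distance formula is not reproduced in the text, so the combination below of depth difference and spatial distance scaled by m/s is an assumption, and the small-superpixel merging of step 6-4 is omitted for brevity:

```python
import numpy as np

def depth_slic(dep, C=1600, m=40.0, iters=10):
    """SLIC-style superpixels on a depth map (steps 6-1 to 6-3, simplified)."""
    h, w = dep.shape
    s = max(int(round(np.sqrt(h * w / C))), 1)          # step 6-1: grid interval
    rows = np.arange(s // 2, h, s)
    cols = np.arange(s // 2, w, s)
    centers = np.array([[dep[r, c], r, c] for r in rows for c in cols], dtype=np.float64)
    labels = -np.ones((h, w), dtype=np.int64)           # non-center pixels: label -1
    rr, cc = np.mgrid[0:h, 0:w]
    for _ in range(iters):
        dist = np.full((h, w), np.inf)                  # distances initialized to infinity
        for k, (d0, r0, c0) in enumerate(centers):      # step 6-2: 2s x 2s search window
            r1, r2 = int(max(r0 - s, 0)), int(min(r0 + s + 1, h))
            c1, c2 = int(max(c0 - s, 0)), int(min(c0 + s + 1, w))
            d_dep = np.abs(dep[r1:r2, c1:c2] - d0)      # depth difference
            d_xy = np.hypot(rr[r1:r2, c1:c2] - r0, cc[r1:r2, c1:c2] - c0)
            D = np.sqrt(d_dep ** 2 + (d_xy * m / s) ** 2)   # assumed SLIC-like distance
            closer = D < dist[r1:r2, c1:c2]
            dist[r1:r2, c1:c2][closer] = D[closer]
            labels[r1:r2, c1:c2][closer] = k
        for k in range(len(centers)):                   # step 6-3: update cluster centers
            mask = labels == k
            if mask.any():
                centers[k] = [dep[mask].mean(), rr[mask].mean(), cc[mask].mean()]
    return labels

def refine_saliency(sal, labels):
    """Step 6-5: replace each pixel's saliency with its superpixel mean."""
    out = np.zeros_like(sal, dtype=np.float64)
    for k in np.unique(labels):
        mask = labels == k
        out[mask] = sal[mask].mean()
    return out
```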
The present invention is further illustrated by the following specific examples.
Example 1
As shown in fig. 1, a method for detecting a salient region in a depth image includes the following steps:
step 1, for a depth image I, separately extracting the gradient features (G_a^k, G_b^k) of each pixel I_k in the image;
The original image is shown in fig. 2 (a), and the depth image is shown in fig. 2 (b);
step 1-1, traversing all pixel points of the depth image I to obtain the gradient vector of each pixel point I_k, k = 1, 2, …, N, where N is the total number of pixels; the gradient vector (dr_k, dc_k) is calculated as follows:
dr_k = (dep(r+1, c) − dep(r−1, c)) / 2 (1)
dc_k = (dep(r, c+1) − dep(r, c−1)) / 2 (2)
wherein r and c correspond to the row and column of the image coordinates, and dep(r, c) denotes the depth value at the r-th row and c-th column of the depth image I;
step 1-2, traversing all pixel points to obtain the gradient feature (G_a^k, G_b^k) of each point, wherein ε is 0.02 and Maximum is the maximum value of G_a and G_b, which is 600 in this example;
step 2, calculating the initial saliency value S(I_k) of each pixel by the global-contrast calculation method according to the gradient features of step 1, obtaining an initial saliency map of the same resolution, specifically comprising the following steps:
step 2-1, normalizing the two elements of all gradient feature vectors to the interval [0, 255] and rounding to integers, so that the normalized gradient feature of pixel I_k is (a_k, b_k); the gradient feature values of all pixel points therefore correspond to integers in [0, 255], i.e. at most 256 different values, denoted a_j for the first element, and likewise b_j for the second element;
step 2-2, obtaining, according to the global-contrast calculation mode, the saliency value corresponding to each feature value a_j:
S(a_j) = Σ_{i=1}^{n} f_i · D(a_j, a_i)
where n = 256 is the total number of feature values extracted from the depth image, f_i represents the probability of a_i occurring in the image, and D(a_j, a_i) is a distance metric function of the two features; for the feature values b_j, the corresponding saliency values S(b_j) are obtained in the same way;
step 2-3, pixels with the same feature value have the same saliency value; for pixel I_k, if its feature value is (a_k, b_k), the initial saliency value of that pixel is:
S(I_k) = w_a · S(a_k) + w_b · S(b_k)
wherein w_a and w_b are weight parameters, both set to 0.5; for each pixel, its saliency value is obtained according to its feature value, thereby obtaining the initial saliency map at full resolution;
step 3, detecting peaks and valleys using the statistical histogram features of the depth map, comprising the following steps:
step 3-1, dividing the depth values of all pixels in the depth map into 256 intervals, and counting the number of pixels whose depth values fall within each interval to obtain a statistical histogram;
step 3-2, calculating the derivative of the histogram statistics to obtain the growth rate at each position of the histogram abscissa, forming a vector α = {α_1, α_2, …, α_256};
step 3-3, taking the sign λ_αi of each α_i according to formula (8), and forming them in order into the vector λ_α = {λ_α1, λ_α2, …, λ_α256};
step 3-4, performing mean filtering on the vector λ_α, and applying the operation of step 3-3 once to the filtered result to obtain a new sign sequence λ_β = {λ_β1, λ_β2, …, λ_β256};
step 3-5, performing jump detection on the vector λ_β by template matching; there are 4 types of jumps: for [1, −1] and [1, 0, −1], the jump position is a peak position P_p; for [−1, 1] and [−1, 0, 1], the jump position corresponds to a valley position P_t.
Step 4, estimating a Zero Parallax Area (ZPA) of the depth image, which comprises the following steps:
step 4-1, calculating the median dep_mid of the depth values in the depth image;
step 4-2, taking the median as the center, the region within a distance of σH in front of and behind the center is determined as the ZPA of the scene:
ZPA = [dep_mid − σ·H, dep_mid + σ·H] (9)
in formula (9), H is the depth of field (DOF) of the scene, and the scale parameter σ is 0.1.
Step 5, dividing the background region and the foreground region in the depth image, adjusting the saliency map obtained in step 2 according to the background region, and suppressing the saliency values of the background region to obtain an improved saliency map, as follows:
step 5-1, determining the depth value corresponding to the peak/valley position that lies behind the zero parallax area ZPA and is closest to it, namely the final threshold T of the background estimate:
T = min(p), s.t. p ∈ {P_p, P_t} and p > ZPA (10)
step 5-2, in the depth image, taking the region whose depth values are greater than the background threshold T as the background part and the part whose depth values are less than T as the foreground region, thereby determining whether the pixel at the corresponding position of the saliency map belongs to the background or the foreground; the saliency values of the background part of the saliency map are suppressed while those of the foreground part are retained, giving the improved saliency map; in the suppression formula for the background part, dep_k is the depth value of pixel I_k in the depth image, S(I_k) is the initial saliency value of the background part, and S′(I_k) is the suppressed saliency value of the background part.
Step 6, further refining the result by the superpixel-division-based method to obtain the final salient-region detection result, as follows:
step 6-1, initializing the cluster centers: setting the number of superpixels in the whole depth image to C = 1600 and taking the length s = sqrt(N/C) in the two-dimensional space as the interval, periodically sampling the depth image, taking each sampling point as an initial cluster center, setting the category labels of the initial-cluster-center pixels to 1, 2, …, C, setting the category labels of all non-center pixels to −1 and their distances to the cluster centers to infinity, where N is the total number of pixels in the whole depth image; for a typical depth image with a resolution of 640 × 480, the corresponding interval length is s = sqrt(640 × 480 / 1600) ≈ 14;
step 6-2, for each cluster center I_c, c = 1, …, C, calculating the distance between the cluster center and each pixel point I_i, i = 1, 2, …, 2s × 2s, within its 28 × 28 neighborhood search range; the distance is computed from the depth difference and the spatial distance, wherein dep_c is the depth value of the cluster-center pixel I_c, u_c and v_c are its abscissa and ordinate in the image, dep_i is the depth value of pixel I_i, u_i and v_i are its abscissa and ordinate in the image, and the compactness adjustment parameter m of the superpixel is set to 40;
each non-center pixel may be reached by several surrounding cluster centers during the search; the cluster center with the minimum distance value is taken as the cluster center of that pixel, and the pixel is given the same category label as that cluster center, yielding a superpixel segmentation result;
step 6-3, calculating the mean depth value and the mean horizontal and vertical coordinates of the pixels within each superpixel, taking these means as the new cluster center of the superpixel, and repeating step 6-2 until the cluster center to which each pixel belongs no longer changes; in this example, 10 iterations give a satisfactory result on most pictures, so 10 iterations are used;
step 6-4, setting the minimum number e of pixels contained in a superpixel to 20, and merging any morphological region smaller than e with its neighborhood; after merging, all superpixels R_c, c = 1, 2, …, C′ are obtained, where C′ ≤ C;
step 6-5, optimizing the saliency result obtained in step 5 according to the superpixels R_c, i.e. if I_k ∈ R_c, the final saliency value S″(I_k) of pixel I_k is:
S″(I_k) = (1 / |R_c|) · Σ_{I_i ∈ R_c} S′(I_i)
wherein |R_c| is the number of pixels contained in superpixel R_c. The result is shown in fig. 2 (c): the closer a region in the figure is to white, the higher its saliency value; the closer to black, the lower its saliency value.
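Composing the sketches above with the parameter values of this example; the input filename is a placeholder, and |dr| and |dc| stand in for the gradient features G_a and G_b, whose exact formula is not reproduced here:

```python
import numpy as np

dep = np.load('depth_640x480.npy')                  # placeholder depth-map input
dr, dc = depth_gradients(dep)                       # step 1
s_a = global_contrast_saliency(np.abs(dr))          # stand-in for G_a (eps = 0.02 omitted)
s_b = global_contrast_saliency(np.abs(dc))          # stand-in for G_b
sal = 0.5 * s_a + 0.5 * s_b                         # step 2: w_a = w_b = 0.5
peaks, valleys = detect_peaks_valleys(dep)          # step 3
_, zpa_far = zero_parallax_area(dep, sigma=0.1)     # step 4
sal, T = suppress_background(sal, dep, peaks, valleys, zpa_far)   # step 5
labels = depth_slic(dep, C=1600, m=40.0, iters=10)  # step 6 (e = 20 merging omitted)
sal_final = refine_saliency(sal, labels)            # full-resolution saliency map
```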
The invention adopts the gradient features of the depth values as the basis of the global-contrast calculation, which reduces noise interference, is unaffected by changes in the range of depth values, and improves the accuracy of the detection result; the background-region division method effectively suppresses the saliency of the background and highlights the target region, improving the stability of the detection result; the saliency of the foreground part is optimized by superpixel division, so that a full-resolution saliency map is computed, providing reliable salient regions for object detection, object recognition, scene understanding and the like and improving the ability to acquire regions of interest in an image.