CN109887008B - Method, device and equipment for parallax stereo matching based on forward and backward smoothing and O (1) complexity - Google Patents
Abstract
The invention discloses a disparity stereo matching method based on forward and backward smoothing with O(1) complexity. The left-eye and right-eye images are each smoothed in the forward and backward directions, and a cost function combining color and gradient information is constructed and evaluated. In the cost aggregation stage, a minimum spanning tree is built for each smoothed image and the cost function values are aggregated over it; an initial disparity is obtained with a WTA (winner-takes-all) strategy; left-right consistency detection separates stable from unstable points and yields an initial disparity confidence; holes at the unstable points are filled to obtain an initial disparity map; a mixed weight is derived from the color information of the left image together with the initial disparity map, and the confidence values are aggregated over a horizontal tree structure based on the initial disparity confidence and the mixed weight. Belief propagation on the aggregated confidence then yields the optimal disparity estimate and the final dense disparity map. The invention effectively improves the accuracy and efficiency of stereo matching.
Description
Technical Field
The invention belongs to the technical field of image processing and relates to a disparity stereo matching method based on forward and backward smoothing and O(1) complexity.
Background
The stereo matching algorithm has a wide application range in computer vision, such as 3D reconstruction, image focusing, etc., but still has many challenging problems. The main work of stereo matching is to find corresponding image point pairs in an image, and the method comprises the following four steps: matching cost calculation, cost aggregation, disparity calculation and disparity refinement. Algorithms are generally classified into global and local algorithms.
The objective of a global algorithm is to minimize an energy function for the matching problem consisting of a data term and a smoothness term; when the disparity values of neighboring nodes differ greatly, the smoothness term acts as a penalty. Global algorithms mainly include dynamic programming, belief propagation, and graph cuts. Global methods are more robust in textureless regions, less sensitive to noise, and produce more accurate disparity maps. However, such methods are computationally complex and not suitable for real-time applications.
Compared with global methods, local algorithms are sensitive to noise and less accurate, but they consume less time and are highly efficient. The difficulty of local algorithms lies in the choice of cost function and window. Conventional cost functions include mutual information, absolute difference (AD), squared difference (SD), the Census transform, and the like. Common local windows include cross windows, adaptive windows, and the like.
The non-local stereo matching algorithm based on the Minimum Spanning Tree (MST) carries out cost aggregation in the whole image without the limitation of a window, all pixel points can support corresponding weights of other points, the accuracy is higher than that of a local algorithm, and the operation efficiency is higher than that of a global algorithm.
At present, much research focuses on the matching cost computation and cost aggregation stages, while the disparity refinement stage remains a difficulty of stereo matching algorithms because of its high complexity. Traditional disparity refinement consists of left-right consistency detection, hole filling, and median filtering, steps which update the cost values of unstable points and improve matching accuracy. In a dynamic programming algorithm, the complexity per pixel is O(d), where d is the disparity range. In MST-based disparity refinement, the cost value of an unstable point is updated by propagating the values of stable points to it, and the per-pixel complexity is likewise a relatively high O(d). In addition, even a small amount of noise in the image degrades matching accuracy.
Disclosure of Invention
The purpose of the invention is as follows: to solve the problems of the prior art, the invention discloses a disparity stereo matching method based on forward and backward smoothing and O(1) complexity, which reduces computational complexity and effectively improves the accuracy and efficiency of matching.
The technical scheme of the invention is as follows:
a parallax stereo matching method based on forward and backward smoothing and O (1) complexity comprises the following steps:
(1) respectively carrying out forward and backward smoothing treatment on the left eye image and the right eye image;
(2) constructing a cost function based on the color and gradient information of the smoothed left eye image and the smoothed right eye image, and calculating a cost function value;
(3) constructing a minimum spanning tree for the smoothed left eye image and the smoothed right eye image, and performing cost aggregation on the cost function values to generate cost aggregation values;
(4) obtaining a disparity map by adopting a WTA strategy, judging stable points and unstable points through left-right consistency detection, obtaining initial disparity confidence, and filling holes in the unstable points to obtain an initial disparity map;
(5) combining the color information of the smoothed left eye image and the initial parallax image to obtain a mixed weight, and performing confidence aggregation on the initial parallax confidence by adopting a horizontal tree structure based on the initial parallax confidence and the mixed weight to obtain a confidence aggregation value;
(6) in the disparity value update stage, performing belief propagation on the confidence aggregation values according to the minimum spanning tree generated in step (3) to obtain the optimal disparity estimate and the dense disparity map.
The step (1) comprises the following steps:
the smoothing process of each pixel point in the left eye image and the right eye image is updated by scanning the pixel points on the horizontal tree structure, each pixel point is taken as a root node, the forward and backward smoothing is carried out by taking the RGB three-channel image as input, and the smoothing processing formula is as shown in formula (1):
Ĩ_i(u, v) represents the pixel value of the input image at pixel point (u, v) under channel i after smoothing;

wherein I_i(u, v) is the pixel value of pixel point (u, v) of the input image under channel i, and the pixel value of pixel point (u, v) is updated by forward or backward iteration:

∇_r I_i(u, v) = I_i(u, v) − I_i(u, v − r)

wherein the constant λ is used to adjust the smoothing speed; ∇_r I_i(u, v) is the difference between pixel point (u, v) of the input image under channel i and its adjacent pixel point in direction r; (u, v − r) is the pixel point preceding pixel point (u, v) in the horizontal propagation direction; f and b represent the forward and backward directions respectively; ω is a constant.
In order to improve the efficiency of the algorithm operation, the forward and backward smoothing process comprises the following steps:
S1, traverse each row of the input image from the leftmost node to the rightmost node in turn, storing the result of the forward smoothing in an array;

S2, in the reverse direction, traverse each row of the input image from the rightmost node to the leftmost node in turn, storing the result of the backward smoothing in an array; the smoothing result is obtained as formula (3):

Ĩ_i represents the smoothed image matrix under channel i, and I_i represents the original image under channel i; formula (3) is the matrix form of the data.
the forward and backward smoothing keeps the real depth edge information of the image while inhibiting the background noise, and the intensity value is updated through the forward and backward smoothing, so that the high texture area of the image is inhibited, and the final matching precision is improved.
The step (2) comprises the following steps:
(201) in order to avoid mismatching between pixels with the same gray level but different color information in the image, RGB three-channel information is adopted instead of single gray-level information; let any pixel point p in the left-eye image be (x, y), let the disparity value corresponding to pixel point p = (x, y) be d (the disparity map is a matrix whose elements are disparity values, so the disparity map and a disparity value differ in that one is the whole and the other the value at a specific point), and let the matching point of pixel point p in the right-eye image be pd = (x − d, y); the color information C_AD(p, d) and the gradient information C_Grad(p, d) are expressed as:

wherein C_AD(p, d) represents the color information of pixel point p at disparity value d, and C_Grad(p, d) represents the gradient information of pixel point p at disparity value d; I_i^L(p) is the pixel value of pixel point p of the left-eye image under channel i, and I_i^R(pd) represents the pixel value of pixel point pd of the right-eye image under channel i; ∇_x I_i^L(p) and ∇_y I_i^L(p) represent the gradients in the x and y directions of pixel point p of the left-eye image under channel i, and ∇_x I_i^R(pd) and ∇_y I_i^R(pd) represent the gradients in the x and y directions of pixel point pd of the right-eye image under channel i;
(202) the constructed cost function is as follows:
C(p, d) = w_1·C_AD(p, d) + w_2·C_Grad(p, d)    (5)

wherein w_1 and w_2 are the weights of the color information and the gradient information respectively, with w_1 + w_2 = 1;

C(p, d) is the cost function of pixel point p at disparity value d, and the cost function values are calculated from it.
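Steps (201)–(202) can be sketched as follows. Since formula (4) is not reproduced in the text, the exact (possibly truncated) forms of C_AD and C_Grad are assumptions: here C_AD is the mean absolute color difference over the three channels and C_Grad the mean absolute difference of x/y gradients, combined per formula (5); w_1 = 0.2 follows the value the embodiment gives elsewhere.

```python
import numpy as np

def matching_cost(left, right, d, w1=0.2, w2=0.8):
    """C(p, d) = w1*C_AD(p, d) + w2*C_Grad(p, d) for every pixel of the
    left image at one disparity d, with pd = (x - d, y) in the right image.
    The untruncated AD/gradient forms are assumptions (formula (4) missing)."""
    left = left.astype(float)
    right = right.astype(float)
    H, W, C = left.shape
    shifted = np.zeros((H, W, C))
    if d > 0:
        shifted[:, d:, :] = right[:, :-d, :]   # align right-image point pd
    else:
        shifted[:] = right
    # C_AD: mean absolute colour difference over the three channels
    c_ad = np.abs(left - shifted).mean(axis=2)
    # C_Grad: mean absolute difference of x- and y-direction gradients
    def grads(img):
        gx = np.diff(img, axis=1, append=img[:, -1:, :])
        gy = np.diff(img, axis=0, append=img[-1:, :, :])
        return gx, gy
    lgx, lgy = grads(left)
    rgx, rgy = grads(shifted)
    c_grad = 0.5 * (np.abs(lgx - rgx) + np.abs(lgy - rgy)).mean(axis=2)
    return w1 * c_ad + w2 * c_grad
```

Evaluating this for every d in the disparity range produces the cost volume that the aggregation stage consumes.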
The step (3) specifically comprises the following steps:
the cost aggregation value based on the minimum spanning tree is the sum of the cost function values multiplied by their corresponding weights, as in formula (6):

C_d^A(p) = Σ_q S(p, q)·C_d(q)    (6)

wherein C_d(q) is the cost function value of pixel point q at disparity value d, q being any pixel point in the input image; C_d^A(p) represents the cost aggregation value of pixel point p at disparity value d (the superscript A denotes Aggregation), and S(p, q) is the similarity function of pixel point p and pixel point q, representing their similarity:

S(p, q) = exp(−D(p, q)/σ)    (7)

D(p, q) represents the distance between pixel point p and pixel point q, and σ is a constant used to adjust the similarity between two pixel points. In a textureless region the value of each pixel point is essentially the same: the color differences are very small but not 0, which causes the small-weight accumulation problem, i.e. many small edge weights accumulate continuously along the aggregation path and sum to a high weight inside the textureless region. To suppress this problem, the invention proposes an improved weight function, shown in formula (8):

m, n represent adjacent pixel points in the image; the weight w(m, n) of the adjacent pixel points is taken from the largest pixel value among the three RGB channels, and D(p, q) is the sum of the weights w(m, n) accumulated along the path, i.e. the distance between pixel point p and pixel point q is the sum of the weights of the adjacent pixel points on the path.
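A minimal sketch of building the minimum spanning tree over the 4-connected pixel grid with Kruskal's algorithm and union-find. Since formula (8) is not reproduced, the improved weight is assumed here to be the largest per-channel absolute color difference, following the text's reference to the largest value among the three RGB channels; that reading is an assumption.

```python
import numpy as np

def build_mst(img):
    """Kruskal MST over the 4-connected pixel grid of an H x W x 3 image.

    Assumed edge weight: w(m, n) = max_i |I_i(m) - I_i(n)| over the RGB
    channels. D(p, q) is then the sum of w along the unique tree path."""
    H, W, _ = img.shape
    n = H * W
    edges = []
    for y in range(H):
        for x in range(W):
            p = y * W + x
            if x + 1 < W:   # horizontal neighbour
                edges.append((np.abs(img[y, x] - img[y, x + 1]).max(), p, p + 1))
            if y + 1 < H:   # vertical neighbour
                edges.append((np.abs(img[y, x] - img[y + 1, x]).max(), p, p + W))
    edges.sort(key=lambda e: e[0])
    parent = list(range(n))
    def find(a):  # union-find with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    mst = []
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            mst.append((w, a, b))
    return mst  # exactly n - 1 edges spanning every pixel
```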
The step (4) specifically comprises the following steps:
(401) obtaining the disparity map of the left-eye image and the disparity map of the right-eye image with a WTA (winner-takes-all) strategy;
(402) performing left-right consistency detection on the disparity map of the left eye image and the disparity map of the right eye image to divide pixel points into stable points and unstable points;
(403) the initial disparity confidence reflects the probability that the initial disparity value is correct: a pixel point whose disparity agrees with that of the pixel points in its neighborhood has a larger disparity confidence, and the disparity confidence values are set according to stable points and unstable points;

let B be the disparity confidence of the disparity map, formula (9):

B(p) = 1 if p is a stable point, and 0.1 otherwise    (9)

wherein p is any pixel point in the input image (left-eye image or right-eye image); if p is a stable point, the probability of its disparity value being correct is 1, otherwise it is 0.1; B(p) represents the disparity confidence of pixel point p in the initial disparity map;
(404) filling holes at the unstable points: for an unstable point p (an occluded point), search in the horizontal direction for the first stable point (non-occluded point) on the left and on the right, denoted p_left and p_right respectively; the disparity value d(p) of the unstable point p is the smaller of the disparity values of p_left and p_right, i.e.

d(p) = min(d(p_left), d(p_right))    (10)

After the hole filling is completed, the initial disparity map D_init is obtained.
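Step (4) — WTA, left-right consistency detection, and formula-(10) hole filling — can be sketched as follows; the 1-pixel consistency tolerance is a common choice, not stated in the text.

```python
import numpy as np

def wta(cost_volume):
    """cost_volume[d, y, x] -> disparity map via winner-takes-all."""
    return np.argmin(cost_volume, axis=0)

def lr_check(d_left, d_right, tol=1):
    """A pixel is stable if the right-image disparity at its matching
    point agrees within tol (tolerance value is an assumption)."""
    H, W = d_left.shape
    stable = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            xr = x - d_left[y, x]          # matching column in the right image
            if 0 <= xr < W and abs(d_left[y, x] - d_right[y, xr]) <= tol:
                stable[y, x] = True
    return stable

def fill_holes(disp, stable):
    """Formula (10): d(p) = min(d(p_left), d(p_right)) from the nearest
    stable points on each side of an unstable point."""
    out = disp.copy()
    H, W = disp.shape
    for y in range(H):
        for x in range(W):
            if stable[y, x]:
                continue
            cands = []
            for xl in range(x - 1, -1, -1):      # nearest stable point on the left
                if stable[y, xl]:
                    cands.append(disp[y, xl]); break
            for xr in range(x + 1, W):           # nearest stable point on the right
                if stable[y, xr]:
                    cands.append(disp[y, xr]); break
            if cands:
                out[y, x] = min(cands)
    return out
```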
The step (5) specifically comprises the following steps:
(501) establishing a new mixed weight function based on the initial disparity map and the smoothed left-eye image, as in formula (11):

w_H(m, n) represents the mixed weight of the edge connecting the adjacent pixel point m and pixel point n, and w_H(n, m) represents the mixed weight of the edge connecting the adjacent pixel point n and pixel point m; the subscript H denotes hybrid (mixed); D_init(m) and D_init(n) refer to the initial disparity values of pixel point m and pixel point n respectively, and I_i(m) and I_i(n) represent the pixel values of pixel point m and pixel point n under channel i;

pixel point m and pixel point n are two adjacent pixel points in the image, and α is a weight that balances the information of the initial disparity map against the information of the smoothed image pixel points (here α = 0.5);

S_H(p, q) denotes the mixed similarity function of points p and q, the subscript H denoting hybrid; D_H(p, q) represents the mixed weights w_H(m, n) accumulated along the path from pixel point p to pixel point q; σ_H is a constant of the mixed similarity function, used to adjust the similarity between two pixel points.
(502) confidence aggregation is performed on the initial disparity confidence using a horizontal tree structure; the aggregation proceeds in two passes, left-to-right and right-to-left, and the aggregated confidence of a pixel point is:

wherein p is a pixel point in the image; the superscript LR denotes aggregation from left to right and RL from right to left; pl denotes the pixel point preceding pixel point p and pr the pixel point following it; S_H(p, q) represents the mixed similarity of the adjacent pixel points p and q; B_d^LR(p) represents the confidence aggregation value of pixel point p at disparity value d accumulated from left to right along the horizontal tree; B(p), obtained in formula (9), represents the disparity confidence of point p; B(pr) represents the disparity confidence aggregation value of the pixel point following it, and S_H(p, pr) the mixed similarity of pixel point p and the following point pr; B_d^A(p) represents the average of the confidence aggregation value accumulated from left to right, the confidence aggregation value accumulated from right to left, and the disparity confidence of pixel point p.
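Formula (12) is not reproduced, so the sketch below follows the textual description: a left-to-right recursion B^LR(p) = B(p) + S_H(p, pl)·B^LR(pl), the symmetric right-to-left recursion, and the average of the two passes with B(p) itself. The exact combination is an assumed reading.

```python
import numpy as np

def horizontal_confidence(B, S):
    """Horizontal-tree confidence aggregation over each image row.

    B[y, x] : disparity confidence (1.0 stable / 0.1 unstable).
    S[y, x] : mixed similarity S_H between pixel x and pixel x + 1
              (representation is an assumption)."""
    H, W = B.shape
    lr = B.astype(float).copy()
    rl = B.astype(float).copy()
    for y in range(H):
        for x in range(1, W):            # B^LR(p) = B(p) + S_H(p, pl) * B^LR(pl)
            lr[y, x] += S[y, x - 1] * lr[y, x - 1]
        for x in range(W - 2, -1, -1):   # B^RL(p) = B(p) + S_H(p, pr) * B^RL(pr)
            rl[y, x] += S[y, x] * rl[y, x + 1]
    return (lr + rl + B) / 3.0           # average of the three values, per the text
```

Each pixel is touched twice regardless of the disparity range, which is consistent with the O(1)-per-pixel refinement the invention claims.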
The step (6) specifically comprises the following steps:
(601) in the disparity updating phase, namely after confidence aggregation, according to the minimum spanning tree (weight is constructed by color information) established in the step (3), confidence propagation is carried out on the confidence aggregation value based on the minimum spanning tree, and the method comprises the following steps:
(6a) aggregation from leaf node to root node, i.e.:
wherein Ch(p) represents the child nodes of pixel point p, and B^↑(p) represents the value of pixel point p after confidence is propagated from the leaf nodes to the root node; the leaf-to-root propagation value of pixel point p therefore comprises the confidence aggregation value of pixel point p itself plus the sum, over all subtrees hanging from pixel point p, of each subtree's value multiplied by the corresponding edge weight;
(6b) aggregation from the root node to the leaf nodes, i.e.:
wherein Pr(p) represents the parent node of pixel point p; the propagated value is the confidence aggregation value of pixel point p after belief propagation from the root node to the leaf nodes;

(602) for any pixel point q, S(p, q) represents the similarity of the color information of points p and q in the minimum spanning tree, and the confidence aggregation value B^A(q) measures the size of the region in the neighborhood of q that is similar in both color and disparity information, so S(p, q)·B^A(q) denotes the probability that p and q have the same disparity; when this probability is maximal, the disparity value d(p) of p is the optimal disparity estimate of point q, and the maximal S(p, q)·B^A(q) is the probability of the optimal disparity estimate, which is obtained by propagating the confidence aggregation values over the minimum spanning tree; the belief propagation value of node p is denoted B_Pro(p), and the optimal disparity estimate is denoted by the disparity propagation D_Pro(p); for each node p:
the process is to find the optimal disparity estimation of the unstable point from the stable points, so as to update the disparity value of the unstable point and obtain the final dense disparity map.
A disparity stereo matching device based on forward and backward smoothing and O (1) complexity, comprising:
the device comprises a smoothing processing module, a cost function construction module, a cost aggregation module, a disparity map acquisition module, a confidence aggregation module and a confidence propagation module;
the smoothing module is used for respectively carrying out forward smoothing and backward smoothing on the left eye image and the right eye image;
the cost function construction module constructs a cost function based on the color and gradient information of the smoothed left eye image and the smoothed right eye image and calculates a cost function value;
the cost aggregation module constructs a minimum spanning tree for the smoothed left eye image and the smoothed right eye image, and carries out cost aggregation on the cost function value to generate a cost aggregation value;
the parallax image acquisition module obtains a parallax image by adopting a WTA strategy, judges stable points and unstable points through left-right consistency detection, obtains initial parallax confidence, and fills holes in the unstable points to obtain an initial parallax image;
the confidence aggregation module is combined with the color information of the smoothed left eye image and the initial parallax image to obtain a mixed weight, and based on the initial parallax confidence and the mixed weight, confidence aggregation is carried out on the initial parallax confidence by adopting a horizontal tree structure to obtain a confidence aggregation value;
and in the parallax value updating stage, the belief propagation module performs belief propagation on the belief aggregation value according to the minimum spanning tree to obtain the optimal parallax estimation and obtain a dense parallax image.
The smoothing performed by the smoothing module specifically comprises the following steps:

the smoothing of each pixel point in the left-eye image and the right-eye image is updated by scanning the pixel points on the horizontal tree structure; each pixel point is taken as a root node, the forward and backward smoothing takes the RGB three-channel image as input, and the smoothing formula is as in formula (1):
Ĩ_i(u, v) represents the pixel value of the input image at pixel point (u, v) under channel i after smoothing;

wherein I_i(u, v) is the pixel value of pixel point (u, v) of the input image under channel i, and the pixel value of pixel point (u, v) is updated by forward or backward iteration:

∇_r I_i(u, v) = I_i(u, v) − I_i(u, v − r)

wherein the constant λ is used to adjust the smoothing speed; ∇_r I_i(u, v) is the difference between pixel point (u, v) of the input image under channel i and its adjacent pixel point in direction r; (u, v − r) is the pixel point preceding pixel point (u, v) in the horizontal propagation direction; f and b represent the forward and backward directions respectively; ω is a constant.
The processing process of the cost function construction module specifically comprises the following steps:
(201) in order to avoid mismatching between pixels with the same gray level but different color information in the image, RGB three-channel information is adopted instead of single gray-level information; let any pixel point p in the left-eye image be (x, y), let the disparity value corresponding to pixel point p = (x, y) be d (the disparity map is a matrix whose elements are disparity values, so the disparity map and a disparity value differ in that one is the whole and the other the value at a specific point), and let the matching point of pixel point p in the right-eye image be pd = (x − d, y); the color information C_AD(p, d) and the gradient information C_Grad(p, d) are expressed by formula (4):

wherein C_AD(p, d) represents the color information of pixel point p at disparity value d, and C_Grad(p, d) represents the gradient information of pixel point p at disparity value d; I_i^L(p) is the pixel value of pixel point p of the left-eye image under channel i, and I_i^R(pd) represents the pixel value of pixel point pd of the right-eye image under channel i; ∇_x I_i^L(p) and ∇_y I_i^L(p) represent the gradients in the x and y directions of pixel point p of the left-eye image under channel i, and ∇_x I_i^R(pd) and ∇_y I_i^R(pd) represent the gradients in the x and y directions of pixel point pd of the right-eye image under channel i;
(202) the constructed cost function is as follows:
C(p, d) = w_1·C_AD(p, d) + w_2·C_Grad(p, d)    (5)

wherein w_1 and w_2 are the weights of the color information and the gradient information respectively, with w_1 + w_2 = 1; in this embodiment w_1 = 0.2;

C(p, d) is the cost function of pixel point p at disparity value d, and the cost function values are calculated from it;
the processing process of the cost aggregation module specifically comprises the following steps:
the cost aggregation value based on the minimum spanning tree is the sum of the cost function values multiplied by their corresponding weights, as in formula (6):

C_d^A(p) = Σ_q S(p, q)·C_d(q)    (6)

wherein C_d(q) is the cost function value of pixel point q at disparity value d, q being any pixel point in the input image; C_d^A(p) represents the cost aggregation value of pixel point p at disparity value d (the superscript A denotes Aggregation), and S(p, q) is the similarity function of pixel point p and pixel point q, representing their similarity:

S(p, q) = exp(−D(p, q)/σ)    (7)

D(p, q) represents the distance between pixel point p and pixel point q, and σ is a constant used to adjust the similarity between two pixel points; the invention provides an improved weight function, shown in formula (8):

m, n represent adjacent pixel points in the image; the weight w(m, n) of the adjacent pixel points is taken from the largest pixel value among the three RGB channels, and D(p, q) is the sum of the weights w(m, n) accumulated along the path, i.e. the distance between pixel point p and pixel point q is the sum of the weights of the adjacent pixel points on the path;
the processing process of the disparity map acquisition module specifically comprises the following steps:
(401) obtaining the disparity map of the left-eye image and the disparity map of the right-eye image with a WTA (winner-takes-all) strategy;
(402) performing left-right consistency detection on the disparity map of the left eye image and the disparity map of the right eye image, and dividing pixel points into stable points and unstable points;
(403) the initial disparity confidence reflects the probability that the initial disparity value is correct: a pixel point whose disparity agrees with that of the pixel points in its neighborhood has a larger disparity confidence, and the disparity confidence values are set according to stable points and unstable points;

let B be the disparity confidence of the disparity map, i.e. formula (9):

B(p) = 1 if p is a stable point, and 0.1 otherwise    (9)

wherein p is any pixel point in the input image (left-eye image or right-eye image); if p is a stable point, the probability of its disparity value being correct is 1, otherwise it is 0.1; B(p) represents the disparity confidence of pixel point p in the initial disparity map;
(404) filling holes at the unstable points: for an unstable point p (an occluded point), search in the horizontal direction for the first stable point (non-occluded point) on the left and on the right, denoted p_left and p_right respectively; the disparity value of the unstable point p is the smaller of the disparity values of p_left and p_right, i.e.

d(p) = min(d(p_left), d(p_right))    (10)

After the hole filling is completed, the initial disparity map D_init is obtained;
The confidence aggregation module processing process specifically comprises the following steps:
(501) establishing a new mixed weight function w_H(m, n) based on the initial disparity map and the smoothed left-eye image, represented by formula (11):

w_H(m, n) represents the mixed weight of the edge connecting the adjacent pixel point m and pixel point n, and w_H(n, m) represents the mixed weight of the edge connecting the adjacent pixel point n and pixel point m; the subscript H denotes hybrid (mixed); D_init(m) and D_init(n) refer to the initial disparity values of pixel point m and pixel point n respectively, and I_i(m) and I_i(n) represent the pixel values of pixel point m and pixel point n under channel i;

pixel point m and pixel point n are two adjacent pixel points in the image, and α represents a weight that balances the information of the initial disparity map against the information of the smoothed image pixel points;

S_H(p, q) denotes the mixed similarity function of points p and q, the subscript H denoting hybrid; D_H(p, q) represents the mixed weights w_H(m, n) accumulated along the path from pixel point p to pixel point q; σ_H is a constant of the mixed similarity function, used to adjust the similarity between two pixel points;
(502) performing confidence aggregation on the initial disparity confidence using a horizontal tree structure; the aggregation proceeds in two passes, left-to-right and right-to-left, and the aggregated confidence values of the pixel points are:

wherein p is a pixel point in the image; the superscript LR denotes aggregation from left to right and RL from right to left; pl denotes the pixel point preceding pixel point p and pr the pixel point following it; S_H(p, q) represents the mixed similarity of adjacent pixel points; B_d^LR(p) represents the confidence aggregation value of pixel point p at disparity value d accumulated from left to right along the horizontal tree; B(p), obtained in formula (9), represents the disparity confidence of pixel point p; B(pr) represents the disparity confidence aggregation value of the pixel point following it, and S_H(p, pr) the mixed similarity of pixel point p and the following point pr; B_d^A(p) represents the average of the confidence aggregation value accumulated from left to right, the confidence aggregation value accumulated from right to left, and the disparity confidence of pixel point p.
The belief propagation module processing process specifically comprises the following steps:
(601) in the disparity updating phase, namely after confidence aggregation, according to the minimum spanning tree (weight is constructed by color information) established in the step (3), confidence propagation is carried out on the confidence aggregation value based on the minimum spanning tree, and the method comprises the following steps:
(6a) aggregation from leaf node to root node, i.e.:
wherein Ch(p) represents the child nodes of pixel point p, and B^↑(p) represents the value of pixel point p after confidence is propagated from the leaf nodes to the root node; the leaf-to-root propagation value of pixel point p therefore comprises the confidence aggregation value of pixel point p itself plus the sum, over all subtrees hanging from pixel point p, of each subtree's value multiplied by the corresponding edge weight;
(6b) from the root node to the leaf nodes, the aggregation is:
wherein Pr(p) represents the parent node of pixel point p; the propagated value is the confidence aggregation value of pixel point p after belief propagation from the root node to the leaf nodes;

(602) for any pixel point q, S(p, q) represents the similarity of the color information of points p and q in the minimum spanning tree, and the confidence aggregation value B^A(q) measures the size of the region in the neighborhood of q that is similar in both color and disparity information, so S(p, q)·B^A(q) denotes the probability that p and q have the same disparity; when this probability is maximal, the disparity value d(p) of p is the optimal disparity estimate of point q, and the maximal S(p, q)·B^A(q) is the probability of the optimal disparity estimate, which is obtained by propagating the confidence aggregation values over the minimum spanning tree; the belief propagation value is therefore denoted B_Pro(p), and the optimal disparity estimate is denoted by the disparity propagation D_Pro(p); for each node p:
the process is to find the optimal disparity estimation of the unstable point from the stable points, so as to update the disparity value of the unstable point and obtain the final dense disparity map.
A computing device comprising one or more processors, a memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the forward and backward smoothing and O(1) complexity disparity stereo matching method.
The beneficial effects of the invention include:
The invention discloses a stereo matching method. First, the left and right eye images are each smoothed in the forward and backward directions, a cost function is constructed by combining colour and gradient information, and cost values are computed. Then a minimum spanning tree is constructed for each smoothed image and the cost function values are aggregated. An initial disparity is obtained with a WTA strategy, stable and unstable points are distinguished by left-right consistency detection, the initial disparity confidence is obtained, and holes at the unstable points are filled to yield an initial disparity map. Next, combined with the mixed weights, the confidence values are aggregated using a horizontal tree structure. Finally, belief propagation is performed on the confidence aggregation values over the minimum spanning tree constructed in the cost aggregation stage to obtain the optimal disparity estimates and hence a dense disparity map. The method overcomes the low matching accuracy caused by noise and the high computational complexity of the disparity refinement stage in existing binocular stereo matching techniques;
In the preprocessing stage, the invention applies forward and backward smoothing to the left and right images respectively, which removes noise from the original images while preserving image edge information and effectively improves the accuracy of the disparity map. In the cost aggregation stage, a non-local method based on minimum-spanning-tree aggregation is adopted, which is more accurate than local algorithms and more efficient than global algorithms. In the disparity refinement stage the computational complexity is O(1) per pixel, which greatly reduces the overall complexity and, especially for high-resolution images, effectively improves matching efficiency.
Drawings
The invention is further explained below with reference to the figures and examples;
FIG. 1 is a flow chart of the disparity stereo matching method based on forward-backward smoothing and O(1) complexity according to the present invention;
FIG. 2 illustrates the smoothing process of the present invention with pixel point p as the root node;
FIG. 3a is a bottom-up traversal process of the cost aggregation process of the present invention based on a minimum spanning tree;
FIG. 3b is a top-down traversal process of the cost aggregation process based on the minimum spanning tree according to the present invention;
FIG. 4 is a confidence aggregation process based on a horizontal tree structure in accordance with the present invention.
Detailed Description
The present invention is described in further detail with reference to the following embodiments, which are illustrative only and do not limit the scope of the invention. The embodiments are provided so that the objectives, technical means, creation features, working procedures and effects of the invention are easy to understand.
As shown in fig. 1, a disparity stereo matching method based on forward and backward smoothing and O (1) complexity includes the following steps:
(1) respectively carrying out forward and backward smoothing treatment on the left eye image and the right eye image;
(2) constructing a cost function based on the color and gradient information of the smoothed left eye image and the smoothed right eye image and calculating a cost function value;
(3) constructing a minimum spanning tree for the smoothed left eye image and the smoothed right eye image, and performing cost aggregation on the cost function values to generate cost aggregation values;
(4) obtaining a disparity map by adopting a WTA strategy, judging a stable point and an unstable point through left-right consistency detection, obtaining initial disparity confidence, and filling holes in the unstable point to obtain an initial disparity map;
(5) combining the colour information of the smoothed left eye image with the initial disparity map to obtain a mixed weight (given by formula (11)), and, based on the initial disparity confidence and the mixed weight, performing confidence aggregation on the initial disparity confidence using a horizontal tree structure to obtain a confidence aggregation value;
(6) in the disparity value updating stage, performing belief propagation on the confidence aggregation values according to the minimum spanning tree generated in step (3) to obtain the optimal disparity estimates and a dense disparity map.
The step (1) comprises the following steps:
as shown in fig. 2, the smoothing of each pixel point in the left and right eye images is updated by scanning the pixel points on a horizontal tree structure, with each pixel point taken as a root node; forward and backward smoothing takes the RGB three-channel image as input, and the smoothing formula is formula (1):
the left-hand side of formula (1) represents the pixel value of pixel point (u, v) of the input image under channel i after smoothing;
where I_i(u, v) is the pixel value of pixel point (u, v) of the input image under channel i, and the forward or backward iteratively updated pixel value of pixel point (u, v) under channel i is given by formula (2):
where the constant λ adjusts the smoothing speed (λ = 0.2 in the present invention); ∇_r I_i(u, v) = I_i(u, v) − I_i(u, v−r) is the difference between pixel point (u, v) of the input image under channel i and its neighbouring pixel point in direction r; (u, v−r) is the pixel point preceding (u, v) in the horizontal propagation direction; f and b denote the forward and backward directions respectively; ω is a constant, which may be fixed or based on a noise estimate (ω = 0.1 here). When adjacent pixel points differ greatly, especially in highly textured regions, the exponential term is small, so the contribution between the pixels is also small and the depth information of image edges is effectively preserved.
To improve the efficiency of the algorithm, the forward and backward smoothing proceeds as follows:
S1, pass from the leftmost node to the rightmost node of each row of the input image in turn, storing the forward smoothing results in an array;
S2, in the reverse direction, pass from the rightmost node to the leftmost node of each row, storing the backward smoothing results in an array; the smoothing result is then obtained by formula (3):
where the left-hand side represents the smoothed image matrix under channel i and I_i represents the original image under channel i; equation (3) is the matrix form of the data.
Forward and backward smoothing suppresses background noise while preserving the true depth-edge information of the image; updating the intensity values in this way suppresses highly textured regions of the image and improves the final matching accuracy.
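The per-row forward and backward passes (S1, S2) can be sketched as follows. Since the patent's equations (1) to (3) survive only as images, the recursive update below is a hypothetical reconstruction from the surrounding text (λ = 0.2, ω = 0.1, exponential weighting of the neighbour difference), with intensities assumed normalised to [0, 1]:

```python
import numpy as np

def fb_smooth_row(row, lam=0.2, omega=0.1):
    """Forward-backward recursive smoothing of one scan line.

    Hypothetical reconstruction of formulas (1)-(3): the exact update is
    not recoverable from the translated text, so this sketch assumes
        S_f(u) = I(u) + lam * exp(-|I(u) - I(u-1)| / omega) * (S_f(u-1) - I(u))
    and averages the forward and backward passes. Intensities are assumed
    normalised to [0, 1] (omega = 0.1 would be tiny on a 0-255 scale).
    """
    n = len(row)
    fwd = np.array(row, dtype=float)
    bwd = np.array(row, dtype=float)
    for u in range(1, n):                       # S1: leftmost -> rightmost
        w = lam * np.exp(-abs(row[u] - row[u - 1]) / omega)
        fwd[u] = row[u] + w * (fwd[u - 1] - row[u])
    for u in range(n - 2, -1, -1):              # S2: rightmost -> leftmost
        w = lam * np.exp(-abs(row[u] - row[u + 1]) / omega)
        bwd[u] = row[u] + w * (bwd[u + 1] - row[u])
    return (fwd + bwd) / 2.0                    # combine passes, cf. formula (3)
```

On a flat row the filter is the identity, while across a strong edge the exponential term vanishes, so the edge survives untouched, which matches the edge-preserving behaviour the text describes.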
The step (2) comprises the following steps:
(201) To avoid mismatches between pixels that have the same grey level but different colour information, RGB three-channel information is used instead of single-channel grey-level information. Let any pixel point p = (x, y) in the left eye image have disparity value d (the disparity map is a matrix whose elements are disparity values: the map is the whole, while a disparity value is the value at a specific point); the matching point of p in the right eye image is then pd = (x−d, y). The colour information C_AD(p, d) and gradient information C_Grad(p, d) are given by formula (4):
where C_AD(p, d) represents the colour information of pixel point p when the disparity value is d, and C_Grad(p, d) its gradient information; I_i^L(p) is the pixel value of pixel point p of the left eye image under channel i, and I_i^R(pd) the pixel value of pixel point pd of the right eye image under channel i; ∇_x I_i^L(p) and ∇_y I_i^L(p) represent the gradients of pixel point p of the left eye image in the x and y directions under channel i, and ∇_x I_i^R(pd) and ∇_y I_i^R(pd) the corresponding gradients of pixel point pd of the right eye image;
(202) the constructed cost function is as follows:
C(p, d) = w_1·C_AD(p, d) + w_2·C_Grad(p, d)    (5)
where w_1 and w_2 are the weights of the colour information and the gradient information respectively, with w_1 + w_2 = 1; in this embodiment w_1 = 0.2;
C(p, d) is the cost function of pixel point p when the disparity value is d, and the cost function value is calculated from it.
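A minimal per-pixel sketch of the cost of formula (5) might look as follows; the exact form of the colour and gradient terms in formula (4) is not reproduced in the translated text, so the channel averaging and the forward-difference gradients below are assumptions:

```python
import numpy as np

def matching_cost(left, right, x, y, d, w1=0.2):
    """Per-pixel cost C(p, d) = w1*C_AD + w2*C_Grad  (formula (5)).

    left/right: float arrays of shape (H, W, 3), RGB. C_AD averages the
    absolute colour difference over the three channels; C_Grad compares
    forward-difference gradients in x and y. Both are assumptions, since
    the patent's formula (4) is an image in the translated text.
    """
    w2 = 1.0 - w1                       # w1 + w2 = 1 as in the embodiment
    p = left[y, x]
    q = right[y, x - d]                 # matching point pd = (x - d, y)
    c_ad = np.mean(np.abs(p - q))       # colour (absolute-difference) term
    # forward-difference gradients per channel in x and y
    gx_l = left[y, x + 1] - p
    gy_l = left[y + 1, x] - p
    gx_r = right[y, x - d + 1] - q
    gy_r = right[y + 1, x - d] - q
    c_grad = np.mean(np.abs(gx_l - gx_r) + np.abs(gy_l - gy_r))
    return w1 * c_ad + w2 * c_grad
```

For identical left and right images and d = 0 the cost is exactly zero, as one would expect of any sane matching cost.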
The step (3) specifically comprises the following steps:
The cost aggregation value based on the minimum spanning tree is the sum of the cost function values multiplied by their corresponding weights, formula (6):
where C_d(q) is the cost function value of pixel point q (any pixel point in the input image) when the disparity value is d; C_d^A(p) represents the cost aggregation value of pixel point p when the disparity value is d (the superscript A denotes aggregation); S(p, q) is the similarity function of pixel points p and q, representing the similarity between them, formula (7):
where D(p, q) represents the distance between pixel points p and q, and σ is a constant used to adjust the similarity between two pixel points. In texture-less regions the values of the pixel points are essentially the same; the colour differences are very small but not zero, which causes a small-weight accumulation problem: many small edge weights accumulate continuously along the aggregation path and grow into a high weight within the texture-less region. To suppress this problem the invention proposes an improved weight function, formula (8):
where m and n denote adjacent pixel points in the image; the weight w(m, n) of the edge between adjacent pixels is built from the largest of the RGB three-channel values; D(p, q) in formula (7) is the sum of the weights w(m, n) accumulated along the path, i.e. the distance between pixel points p and q is the sum of the edge weights of the adjacent pixel points on the path.
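The similarity of formula (7) and a colour edge weight in the spirit of formula (8) can be sketched as below; reading "the largest pixel value in the three RGB channels" as the largest per-channel difference between the two adjacent pixels is an assumption, since the formula itself is an image:

```python
import numpy as np

def edge_weight(m_rgb, n_rgb):
    """w(m, n): colour weight of the edge between adjacent pixel points.

    Assumed form: the maximum absolute per-channel difference over the
    three RGB channels (the translated text only says the weight is
    built from 'the largest pixel value in the three channels of RGB').
    """
    return max(abs(float(a) - float(b)) for a, b in zip(m_rgb, n_rgb))

def similarity(path_weights, sigma=0.1):
    """S(p, q) = exp(-D(p, q) / sigma), formula (7), where D(p, q) is
    the sum of the edge weights w(m, n) along the tree path p -> q."""
    return np.exp(-sum(path_weights) / sigma)
```

The similarity is 1 for a pixel compared with itself (empty path) and decays as edge weights accumulate along the path, which is exactly the small-weight accumulation behaviour the improved weight function is meant to control.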
The step (4) specifically comprises the following steps:
(401) obtaining the disparity maps of the left and right eye images using the WTA (Winner Takes All) strategy;
(402) performing left-right consistency detection on the disparity maps of the left and right eye images to divide the pixel points into stable points and unstable points:
if the disparity value d_L(p) of pixel point p in the left eye image equals the disparity value of the corresponding point in the right eye image, i.e. d_L(p) = d_R(p − d_L(p)), then p is considered a stable point; otherwise it is considered an unstable point.
(403) The initial disparity confidence reflects the probability that the initial disparity value is correct: if the pixels in the neighbourhood of a pixel point have similar disparity values and colour information, the pixel point has a larger disparity confidence. The disparity confidence is set according to whether the point is stable or unstable;
let B be the disparity confidence of the disparity map, formula (9):
where p is any pixel point in the input image (left or right eye image); if p is a stable point the probability is 1, otherwise the probability that the pixel point carries a correct disparity value is 0.1; B(p) denotes the disparity confidence of pixel point p in the initial disparity map;
(404) hole filling at the unstable points: for an unstable point p (occlusion point), search in the horizontal direction for the first stable point (non-occlusion point) on each side, denoted p_left and p_right; the disparity value d(p) of the unstable point p is the smaller of the disparity values of p_left and p_right, i.e.
d(p) = min(d(p_left), d(p_right))    (10)
After hole filling is completed, the initial disparity map D_init is obtained.
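Steps (402) to (404) on a single scan line can be sketched as follows; this is a hypothetical 1-D illustration of the consistency check and hole filling, not the patent's exact implementation:

```python
import numpy as np

def fill_unstable(d_left, d_right):
    """Left-right consistency check (402) and hole filling (404) on a row.

    A pixel p is stable when d_L(p) == d_R(p - d_L(p)); every unstable
    pixel receives the smaller of the first stable disparities found to
    its left and right, formula (10).
    """
    n = len(d_left)
    stable = np.zeros(n, dtype=bool)
    for x in range(n):
        xr = x - d_left[x]                      # corresponding right-image column
        stable[x] = 0 <= xr < n and d_left[x] == d_right[xr]
    filled = np.array(d_left)
    for x in range(n):
        if stable[x]:
            continue
        left_val = next((filled[i] for i in range(x - 1, -1, -1) if stable[i]), None)
        right_val = next((filled[i] for i in range(x + 1, n) if stable[i]), None)
        cands = [v for v in (left_val, right_val) if v is not None]
        if cands:
            filled[x] = min(cands)              # d(p) = min(d(p_left), d(p_right))
    return filled, stable
```

Taking the minimum biases occluded pixels toward the background disparity, which is the usual justification for formula (10): occlusions belong to the farther surface.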
The step (5) specifically comprises the following steps:
(501) a new mixed weight function is established based on the initial disparity map and the smoothed left eye image, formula (11):
where w_H(m, n) represents the mixed weight of the edge connecting adjacent pixel points m and n (and w_H(n, m) that of the edge connecting n and m); the subscript H stands for hybrid; D_init(m) and D_init(n) are the initial disparity values of pixel points m and n, and I_i(m) and I_i(n) their pixel values under channel i;
pixel points m and n are two adjacent pixel points in the image, and α is a weight that balances the information of the initial disparity map against that of the smoothed image pixels (α = 0.5);
S_H(p, q) denotes the mixed similarity function of points p and q; D_H(p, q) represents the mixed weights w_H(m, n) accumulated along the path from pixel point p to pixel point q; σ_H is a constant of the mixed similarity function used to adjust the similarity between two pixel points;
(502) confidence aggregation is performed on the initial disparity confidence using a horizontal tree structure; the aggregation runs once from left to right and once from right to left, and the confidence aggregation value of each pixel point is:
where p is a pixel point in the image; the superscripts LR and RL denote aggregation from left to right and from right to left respectively; pl denotes the pixel point preceding p and pr the one following it; S_H(p, q) denotes the mixed similarity between adjacent pixel points p and q; B_A^LR(p) represents the confidence aggregation value of pixel point p accumulated from left to right along the horizontal tree when the disparity value is d; B(p), obtained in formula (9), is the disparity confidence of the pixel point; B_A(pr) denotes the disparity confidence aggregation value of the pixel point following p, and S_H(p, pr) the mixed similarity between pixel point p and the following point pr; the final value combines the left-to-right aggregate, the right-to-left aggregate and the disparity confidence of pixel point p.
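The two directional passes of step (502) can be sketched as below. Because the patent's combination formula is an image, the final combination B_LR + B_RL − B (the usual correction for the doubly counted centre term in two-pass aggregation) is an assumption; the translated text speaks instead of combining the two directional aggregates with the confidence of the pixel itself:

```python
import numpy as np

def aggregate_confidence(b, s):
    """Horizontal-tree confidence aggregation (step 502) on one row.

    b[x]: initial disparity confidence B(p) of pixel x (formula (9)).
    s[x]: mixed similarity S_H between pixel x and pixel x + 1.
    Recursions assumed from the translated text:
        B_LR(x) = B(x) + S_H(x-1, x) * B_LR(x-1)
        B_RL(x) = B(x) + S_H(x, x+1) * B_RL(x+1)
    The passes are combined as B_LR + B_RL - B (each pass already
    contains B(x)); the patent's own combination rule is not recoverable.
    """
    n = len(b)
    lr = np.array(b, dtype=float)
    rl = np.array(b, dtype=float)
    for x in range(1, n):                   # left -> right pass
        lr[x] = b[x] + s[x - 1] * lr[x - 1]
    for x in range(n - 2, -1, -1):          # right -> left pass
        rl[x] = b[x] + s[x] * rl[x + 1]
    return lr + rl - np.asarray(b, dtype=float)
```

With zero similarity everywhere the aggregation degenerates to the raw confidences, and with full similarity every pixel collects the confidence of the whole row, which is the intended behaviour of a two-pass linear aggregation.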
The step (6) specifically comprises the following steps:
(601) In the disparity updating stage, i.e. after confidence aggregation, belief propagation is performed on the confidence aggregation values over the minimum spanning tree established in step (3) (whose edge weights are built from colour information), as follows:
(6a) as shown in fig. 3a, aggregation from the leaf nodes to the root node, i.e.:
where Ch(p) denotes the child nodes of pixel point p; the leaf-to-root propagation value of pixel point p comprises its own confidence aggregation value plus, for each subtree hanging from p, that subtree's propagated value multiplied by the corresponding edge weight;
(6b) as shown in fig. 3b, aggregation from the root node to the leaf nodes, i.e.:
where Pr(p) denotes the parent node of pixel point p; this pass propagates the confidence aggregation value of pixel point p from the root node down to the leaf nodes;
(602) As shown in fig. 4, for any pixel point q, S(p, q) denotes the colour-information similarity of points p and q in the minimum spanning tree, and the confidence aggregation value B_A(q) measures the size of the region in the neighbourhood of q that is similar in both colour and disparity information, so S(p, q)·B_A(q) is the probability that p and q have the same disparity. When this product is maximal, the disparity value of p is the optimal disparity estimate of point q, and the maximal product is the probability of that estimate; since this probability is obtained by propagating the confidence aggregation values over the minimum spanning tree, the belief propagation value of node p is B_Pro(p), and the optimal disparity estimate is defined as the disparity propagation value D_Pro(p). For each node p:
This process finds the optimal disparity estimate of each unstable point from the stable points, thereby updating the disparity values of the unstable points and obtaining the final dense disparity map.
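The two tree passes (6a)/(6b) can be sketched as follows; the root-to-leaf combination uses the standard non-local aggregation identity B_pro(p) = S·B_pro(Pr(p)) + (1 − S²)·B_up(p), which is an assumption here, since the patent's own formulas survive only as images:

```python
import numpy as np

def tree_propagate(parent, sim, b_agg, order):
    """Belief propagation over a minimum spanning tree (steps 6a/6b).

    parent[p]: parent index of node p (root has parent == -1).
    sim[p]   : similarity S(p, Pr(p)) on the edge from p to its parent.
    b_agg[p] : confidence aggregation value B_A(p).
    order    : node indices in root-first (topological) order.
    The root-to-leaf rule B_pro(p) = S*B_pro(Pr) + (1 - S^2)*B_up(p) is
    the standard non-local aggregation identity, assumed rather than
    taken from the patent.
    """
    b_up = np.array(b_agg, dtype=float)
    for p in reversed(order):               # (6a) leaf -> root
        if parent[p] >= 0:
            b_up[parent[p]] += sim[p] * b_up[p]
    b_pro = b_up.copy()
    for p in order:                         # (6b) root -> leaf
        pr = parent[p]
        if pr >= 0:
            b_pro[p] = sim[p] * b_pro[pr] + (1.0 - sim[p] ** 2) * b_up[p]
    return b_pro
```

After both passes every node holds the similarity-weighted sum over the whole tree, so each pixel's update costs a constant number of operations, which is the O(1)-per-pixel property the invention claims.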
A disparity stereo matching device based on forward and backward smoothing and O (1) complexity, comprising: the device comprises a smoothing processing module, a cost function construction module, a cost aggregation module, a disparity map acquisition module, a confidence aggregation module and a confidence propagation module;
The smoothing module performs forward and backward smoothing on the left and right eye images respectively. The cost function construction module constructs a cost function based on the colour and gradient information of the smoothed left and right eye images and calculates the cost function values. The cost aggregation module constructs a minimum spanning tree for each smoothed image and aggregates the cost function values to generate the cost aggregation values. The disparity map acquisition module obtains a disparity map using the WTA strategy, distinguishes stable and unstable points by left-right consistency detection, obtains the initial disparity confidence, and fills holes at the unstable points to obtain the initial disparity map. The confidence aggregation module combines the colour information of the smoothed left eye image with the initial disparity map to obtain the mixed weights and, based on the initial disparity confidence and the mixed weights, aggregates the initial disparity confidence using a horizontal tree structure to obtain the confidence aggregation values. In the disparity value updating stage, the belief propagation module performs belief propagation on the confidence aggregation values according to the minimum spanning tree to obtain the optimal disparity estimates and a dense disparity map.
The smoothing module specifically performs the following steps:
the smoothing of each pixel point in the left and right eye images is updated by scanning the pixel points on a horizontal tree structure, with each pixel point taken as a root node; forward and backward smoothing takes the RGB three-channel image as input, and the smoothing formula is formula (1):
the left-hand side of formula (1) represents the smoothed pixel value of pixel point (u, v) of the input image under channel i;
where I_i(u, v) is the pixel value of pixel point (u, v) of the input image under channel i, and the forward or backward iteratively updated pixel value of pixel point (u, v) under channel i is given by formula (2):
∇_r I_i(u, v) = I_i(u, v) − I_i(u, v−r)
where the constant λ adjusts the smoothing speed (λ = 0.2 in the present invention); ∇_r I_i(u, v) is the difference between pixel point (u, v) of the input image under channel i and its neighbouring pixel point in direction r; (u, v−r) is the pixel point preceding (u, v) in the horizontal propagation direction; f and b denote the forward and backward directions respectively; ω is a constant, which may be fixed or based on a noise estimate (ω = 0.1 here). When adjacent pixel points differ greatly, especially in highly textured regions, the exponential term is small, so the contribution between the pixels is also small and the depth information of image edges is effectively preserved.
Forward and backward smoothing suppresses background noise while preserving the true depth-edge information of the image; updating the intensity values in this way suppresses highly textured regions of the image and improves the final matching accuracy;
the processing process of the cost function construction module specifically comprises the following steps:
(201) To avoid mismatches between pixels that have the same grey level but different colour information, RGB three-channel information is used instead of single-channel grey-level information. Let any pixel point p = (x, y) in the left eye image have disparity value d (the disparity map is a matrix whose elements are disparity values: the map is the whole, while a disparity value is the value at a specific point); the matching point of p in the right eye image is then pd = (x−d, y). The colour information C_AD(p, d) and gradient information C_Grad(p, d) are given by formula (4):
where C_AD(p, d) represents the colour information of pixel point p when the disparity value is d, and C_Grad(p, d) its gradient information; I_i^L(p) is the pixel value of pixel point p of the left eye image under channel i, and I_i^R(pd) the pixel value of pixel point pd of the right eye image under channel i; ∇_x I_i^L(p) and ∇_y I_i^L(p) represent the gradients of pixel point p of the left eye image in the x and y directions under channel i, and ∇_x I_i^R(pd) and ∇_y I_i^R(pd) the corresponding gradients of pixel point pd of the right eye image;
(202) the constructed cost function is as follows:
C(p, d) = w_1·C_AD(p, d) + w_2·C_Grad(p, d)    (5)
where w_1 and w_2 are the weights of the colour information and the gradient information respectively, with w_1 + w_2 = 1; in this embodiment w_1 = 0.2;
C(p, d) is the cost function of pixel point p when the disparity value is d, and the cost function value is calculated from it;
the processing process of the cost aggregation module specifically comprises the following steps:
The cost aggregation value based on the minimum spanning tree is the sum of the cost function values multiplied by their corresponding weights, formula (6):
where C_d(q) is the cost function value of pixel point q (any pixel point in the input image) when the disparity value is d; C_d^A(p) represents the cost aggregation value of pixel point p when the disparity value is d (the superscript A denotes aggregation); S(p, q) is the similarity function of pixel points p and q, representing the similarity between them, formula (7):
where D(p, q) represents the distance between pixel points p and q, and σ is a constant used to adjust the similarity between two pixel points. In texture-less regions the values of the pixel points are essentially the same; the colour differences are very small but not zero, which causes a small-weight accumulation problem: many small edge weights accumulate continuously along the aggregation path and grow into a high weight within the texture-less region. To suppress this problem the invention proposes an improved weight function, formula (8):
where m and n denote adjacent pixel points in the image; the weight w(m, n) of the edge between adjacent pixels is built from the largest of the RGB three-channel values; D(p, q) in formula (7) is the sum of the weights w(m, n) accumulated along the path, i.e. the distance between pixel points p and q is the sum of the edge weights of the adjacent pixel points on the path;
the processing process of the disparity map acquisition module specifically comprises the following steps:
(401) obtaining the disparity maps of the left and right eye images using the WTA (Winner Takes All) strategy;
(402) performing left-right consistency detection on the disparity maps of the left and right eye images to divide the pixel points into stable points and unstable points:
if the disparity value d_L(p) of pixel point p in the left eye image equals the disparity value of the corresponding point in the right eye image, i.e. d_L(p) = d_R(p − d_L(p)), then p is considered a stable point; otherwise it is considered an unstable point.
(403) The initial disparity confidence reflects the probability that the initial disparity value is correct: if the pixels in the neighbourhood of a pixel point have similar disparity values and colour information, the pixel point has a larger disparity confidence. The disparity confidence is set according to whether the point is stable or unstable;
let B be the disparity confidence of the disparity map, formula (9):
where p is any pixel point in the input image (left or right eye image); if p is a stable point the probability is 1, otherwise the probability that the pixel point carries a correct disparity value is 0.1; B(p) denotes the disparity confidence of pixel point p in the initial disparity map;
(404) hole filling at the unstable points: for an unstable point p (occlusion point), search in the horizontal direction for the first stable point (non-occlusion point) on each side, denoted p_left and p_right; the disparity value of the unstable point p is the smaller of the disparity values of p_left and p_right, i.e.
d(p) = min(d(p_left), d(p_right))    (10)
After hole filling is completed, the initial disparity map D_init is obtained;
The processing process of the confidence aggregation module specifically comprises the following steps:
(501) a new mixed weight function w_H(m, n) is established based on the initial disparity map and the smoothed left eye image, formula (11):
where w_H(m, n) represents the mixed weight of the edge connecting adjacent pixel points m and n (and w_H(n, m) that of the edge connecting n and m); the subscript H stands for hybrid; D_init(m) and D_init(n) are the initial disparity values of pixel points m and n, and I_i(m) and I_i(n) their pixel values under channel i;
pixel points m and n are two adjacent pixel points in the image, and α is a weight that balances the information of the initial disparity map against that of the smoothed image pixels (α = 0.5);
S_H(p, q) denotes the mixed similarity function of points p and q; D_H(p, q) represents the mixed weights w_H(m, n) accumulated along the path from pixel point p to pixel point q; σ_H is a constant of the mixed similarity function used to adjust the similarity between two pixel points;
(502) confidence aggregation is performed on the initial disparity confidence using a horizontal tree structure; the aggregation runs once from left to right and once from right to left, and the confidence aggregation value of each pixel point is as follows:
the belief propagation module processing process specifically comprises the following steps:
(601) In the disparity updating stage, i.e. after confidence aggregation, belief propagation is performed on the confidence aggregation values over the minimum spanning tree established in step (3) (whose edge weights are built from colour information), as follows:
(6a) as shown in fig. 3a, aggregation from the leaf nodes to the root node, i.e.:
where Ch(p) denotes the child nodes of pixel point p; the leaf-to-root propagation value of pixel point p comprises its own confidence aggregation value plus, for each subtree hanging from p, that subtree's propagated value multiplied by the corresponding edge weight;
(6b) as shown in fig. 3b, aggregation from the root node to the leaf nodes, i.e.:
where Pr(p) denotes the parent node of pixel point p; this pass propagates the confidence aggregation value of pixel point p from the root node down to the leaf nodes;
(602) For any pixel point q, S(p, q) denotes the colour-information similarity of points p and q in the minimum spanning tree, as shown in fig. 4, and the confidence aggregation value B_A(q) measures the size of the region in the neighbourhood of q that is similar in both colour and disparity information, so S(p, q)·B_A(q) is the probability that p and q have the same disparity. When this product is maximal, the disparity value d(p) of p is the optimal disparity estimate of point q, and the maximal product is the probability of that estimate; since this probability is obtained by propagating the confidence aggregation values over the minimum spanning tree, the belief propagation value of node p is B_Pro(p), and the optimal disparity estimate is defined as the disparity propagation value D_Pro(p). For each node p:
This process finds the optimal disparity estimate of each unstable point from the stable points, thereby updating the disparity values of the unstable points and obtaining the final dense disparity map.
A computing device, comprising:
one or more processors, a memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the disparity stereo matching method based on forward-backward smoothing and O(1) complexity.
For the non-local stereo matching algorithm based on the minimum spanning tree, the disparity refinement stage obtains new cost values through left-right consistency detection, performs cost aggregation over each disparity level with the minimum-spanning-tree method so that the disparity values of stable points are propagated to unstable points, thereby updating the disparity values, and then computes the final disparity with a Winner Takes All (WTA) strategy. Over the whole disparity refinement process, each pixel point requires 2 additions and 3 multiplications per disparity level, hence 2N additions and 3N multiplications in total, where N is the disparity range; the computational complexity per pixel is therefore O(N).
The stereo matching method based on forward and backward smoothing and O(1)-complexity disparity refinement provided by the invention needs only 4 additions and 3 multiplications per pixel in the confidence aggregation part. In the belief propagation stage, each pixel needs 2 additions and 3 multiplications, giving 6 additions and 6 multiplications in total. Therefore, the computational complexity for any pixel is O(1), which greatly reduces the computation and improves the matching efficiency.
Those skilled in the art may make modifications and variations to the invention without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to encompass them.
Claims (10)
1. A parallax stereo matching method based on forward and backward smoothing and O(1) complexity, comprising the following steps:
(1) respectively carrying out forward and backward smoothing treatment on the left eye image and the right eye image;
(2) constructing a cost function based on the color and gradient information of the smoothed left eye image and the smoothed right eye image, and calculating a cost function value;
(3) constructing a minimum spanning tree for the smoothed left eye image and the smoothed right eye image, and performing cost aggregation on the cost function values to generate cost aggregation values;
(4) obtaining a disparity map by adopting a WTA strategy, judging a stable point and an unstable point through left-right consistency detection, obtaining initial disparity confidence, and filling holes in the unstable point to obtain an initial disparity map;
(5) combining the color information of the smoothed left eye image and the initial parallax image to obtain a mixed weight, and performing confidence aggregation on the initial parallax confidence by adopting a horizontal tree structure based on the initial parallax confidence and the mixed weight to obtain a confidence aggregation value;
(6) in the disparity value updating stage, performing belief propagation on the confidence aggregation values according to the minimum spanning tree generated in step (3) to obtain the optimal disparity estimation and a dense disparity map.
2. The method for disparity stereo matching based on forward and backward smoothing and O(1) complexity as claimed in claim 1, wherein:
the step (1) specifically comprises the following steps:
the smoothing of each pixel point in the left eye image and the right eye image is updated by scanning the pixel points on a horizontal tree structure: each pixel point is taken as a root node, and the forward and backward smoothing is performed with the RGB three-channel image as input; the smoothing formula is given in formula (1):
I_i^{f/b}(u, v) represents the pixel value of pixel point (u, v) of the input image under channel i after smoothing;
where I_i(u, v) is the pixel value of pixel point (u, v) of the input image under channel i, and I_i^{f/b}(u, v) denotes the pixel value after a forward or backward iterative update of pixel point (u, v) under channel i:
where the constant λ is used to adjust the smoothing speed, ∇_r I_i(u, v) is the difference between pixel point (u, v) of the input image under channel i and its adjacent pixel point in direction r, (u, v − r) is the previous pixel point of pixel point (u, v) in the horizontal propagation direction, f and b denote the forward and backward directions respectively, and ω is a constant;
the forward and backward smoothing process includes the steps of:
S1, pass in turn from the leftmost node to the rightmost node of each line of the input image, and store the result of the forward smoothing in the array I_i^LR;
S2, in the reverse direction, pass in turn from the rightmost node to the leftmost node of each line of the input image, and store the result of the backward smoothing in the array I_i^RL; the smoothing result is then obtained as in formula (3):
I_i^new = (I_i^LR + I_i^RL + I_i)/3, i ∈ {R, G, B} (3)
where I_i^new represents the smoothed image matrix under channel i and I_i represents the original image under channel i; formula (3) is written in matrix form.
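By way of illustration (not part of the claims), the per-row passes S1–S2 and the averaging of formula (3) can be sketched as follows. The recursive update itself is an assumption, since formulas (1)–(2) are not reproduced in this text: the sketch pulls the running value toward the previous pixel with strength lam·exp(−|difference|/omega), and the parameter values lam and omega are illustrative.

```python
import numpy as np

def fb_smooth_channel(chan, lam=0.5, omega=10.0):
    """Forward-backward smoothing of one channel, row by row.

    Assumed exponentially weighted update (formulas (1)-(2) are not
    reproduced in the source text); the final result follows formula (3):
    the average of the forward pass I_LR, the backward pass I_RL, and the
    original channel.
    """
    chan = chan.astype(np.float64)
    h, w = chan.shape
    i_lr = chan.copy()                       # S1: left-to-right pass
    for v in range(1, w):
        a = lam * np.exp(-np.abs(chan[:, v] - chan[:, v - 1]) / omega)
        i_lr[:, v] = (1 - a) * chan[:, v] + a * i_lr[:, v - 1]
    i_rl = chan.copy()                       # S2: right-to-left pass
    for v in range(w - 2, -1, -1):
        a = lam * np.exp(-np.abs(chan[:, v] - chan[:, v + 1]) / omega)
        i_rl[:, v] = (1 - a) * chan[:, v] + a * i_rl[:, v + 1]
    return (i_lr + i_rl + chan) / 3.0        # formula (3)
```

Applied per channel i ∈ {R, G, B}, a flat region is left unchanged while noisy rows are attenuated, which is the edge-aware behavior the smoothing step aims at.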
3. The method for disparity stereo matching based on forward and backward smoothing and O(1) complexity as claimed in claim 1, wherein:
the step (2) specifically comprises the following steps:
(201) replacing the single gray-scale information with RGB three-channel information; let any pixel point p in the left eye image be (x, y), let the disparity value corresponding to pixel point p be d, and let the matching point corresponding to p in the right eye image be pd = (x − d, y); the color information C_AD(p, d) and the gradient information C_Grad(p, d) are expressed as:
where C_AD(p, d) represents the color information of pixel point p when the disparity value is d, and C_Grad(p, d) represents the gradient information of pixel point p when the disparity value is d; I_i^L(p) is the pixel value of pixel point p of the left eye image under channel i, and I_i^R(pd) is the pixel value of pixel point pd of the right eye image under channel i; ∇_x I_i^L(p) and ∇_y I_i^L(p) represent the gradients in the x and y directions of pixel point p of the left eye image under channel i, and ∇_x I_i^R(pd) and ∇_y I_i^R(pd) represent the gradients in the x and y directions of pixel point pd of the right eye image under channel i;
(202) the constructed cost function is as follows:
C(p, d) = w_1·C_AD(p, d) + w_2·C_Grad(p, d) (5)
where w_1 and w_2 are the weights of the color information and the gradient information respectively, with w_1 + w_2 = 1;
C(p, d) is the cost function of pixel point p when the disparity value is d, and the cost function value is calculated based on this cost function.
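As an illustrative sketch (not part of the claims), the cost of formula (5) can be computed as a cost volume over all disparities. Formula (4) is not reproduced in this text, so the exact forms are assumptions: here C_AD is taken as the mean absolute color difference over the three channels and C_Grad as the absolute difference of grayscale x/y gradients, and the weights w1, w2 (with w1 + w2 = 1) are illustrative values, not the patent's.

```python
import numpy as np

def cost_volume(left, right, d_max, w1=0.11, w2=0.89):
    """AD + gradient matching cost C(p, d) = w1*C_AD + w2*C_Grad (formula 5).

    left, right : H x W x 3 images; pixel p = (x, y) in the left view is
    compared against pd = (x - d, y) in the right view for d = 0..d_max.
    """
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w, _ = left.shape
    gyl, gxl = np.gradient(left.mean(axis=2))   # y- then x-gradient
    gyr, gxr = np.gradient(right.mean(axis=2))
    vol = np.full((h, w, d_max + 1), np.inf)
    for d in range(d_max + 1):
        c_ad = np.abs(left[:, d:] - right[:, :w - d]).mean(axis=2)   # C_AD
        c_grad = (np.abs(gxl[:, d:] - gxr[:, :w - d])
                  + np.abs(gyl[:, d:] - gyr[:, :w - d]))             # C_Grad
        vol[:, d:, d] = w1 * c_ad + w2 * c_grad
    return vol
```

On a synthetic pair where the left view is the right view shifted by 3 pixels, a per-pixel argmin over d (the WTA step of later claims) recovers the shift in the interior of the image.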
4. The method for disparity stereo matching based on forward and backward smoothing and O(1) complexity as claimed in claim 1, wherein:
the step (3) specifically comprises the following steps:
the cost aggregation value based on the minimum spanning tree is the sum of the cost function values multiplied by their corresponding weights, as expressed in formula (6):
where C_d(q) is the cost function value of pixel point q when the disparity value is d, q being any pixel point in the input image; C_A(p, d) denotes the cost aggregation value of pixel point p when the disparity value is d, and S(p, q), the similarity function of pixel points p and q, expresses the similarity between them;
the improved weight function is shown in formula (8):
where m and n denote adjacent pixel points in the image; the weight w(m, n) of an edge between adjacent pixel points is computed from the maximum pixel value difference over the three RGB channels; D(p, q) is the sum of the weights w(m, n) accumulated along the path, i.e., the distance between pixel point p and pixel point q is the sum of the weights of the adjacent pixel points on the path.
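To make formula (6) concrete (this sketch is not part of the claims), the brute-force version below aggregates costs along a single chain of pixels, i.e., one image row, instead of a full minimum spanning tree, so the path between p and q is simply the interval between them. The exponential form S(p, q) = exp(−D(p, q)/σ) is assumed from the distance/σ description elsewhere in the claims; a real implementation builds the MST of the whole image and uses two linear-time passes rather than this O(n²) loop.

```python
import numpy as np

def aggregate_chain(cost, edge_w, sigma=0.1):
    """Non-local aggregation C_A(p, d) = sum_q S(p, q) * C_d(q) (formula 6)
    on a chain of n pixels.

    cost   : n x D matrix of cost function values C_d(q)
    edge_w : n-1 edge weights w(m, n) between adjacent pixels
    """
    pos = np.concatenate(([0.0], np.cumsum(edge_w)))   # cumulative position
    d_mat = np.abs(pos[:, None] - pos[None, :])        # D(p, q): path weight sum
    s_mat = np.exp(-d_mat / sigma)                     # assumed similarity S(p, q)
    return s_mat @ cost                                # formula (6)
```

With very large edge weights S collapses to the identity (no aggregation); with near-zero weights every pixel receives the plain sum of all costs, the two limiting behaviors one expects from the similarity function.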
5. The method for disparity stereo matching based on forward and backward smoothing and O(1) complexity as claimed in claim 1, wherein: the step (4) specifically comprises the following steps:
(401) obtaining a disparity map of a left eye image and a disparity map of a right eye image by adopting a WTA strategy;
(402) performing left-right consistency detection on the disparity map of the left eye image and the disparity map of the right eye image to divide pixel points into stable points and unstable points;
(403) the initial disparity confidence reflects the probability that the initial disparity value is correct: if the pixels in the neighborhood of a pixel point have similar disparity values and color information, the pixel point has a larger disparity confidence; the disparity confidence value is set based on the stable and unstable points;
let B be the disparity confidence of the disparity map:
where p is any pixel point in the input image; if p is a stable point the probability is 1, otherwise the probability that the pixel point has a correct disparity value is 0.1; B(p) denotes the disparity confidence of pixel point p in the initial disparity map;
(404) hole filling of unstable points: for an unstable point p, the first stable point on each side is found in the horizontal direction, denoted p_left and p_right; the disparity value d(p) of the unstable point p is the smaller of the disparity values of p_left and p_right, i.e.
d(p) = min(d(p_left), d(p_right)) (10)
After hole filling is completed, the initial disparity map D_init is obtained.
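Steps (401)–(404) can be sketched as follows (illustrative only, not part of the claims). The left-right check marks a pixel stable when the right-view disparity at its matched column agrees with the left-view disparity; border columns are handled here by clipping, which is an assumption of this sketch.

```python
import numpy as np

def wta_lr_fill(cost_l, cost_r):
    """(401) WTA disparities, (402) left-right consistency, (403) initial
    confidence B per formula (9) (1 for stable, 0.1 for unstable), and
    (404) horizontal hole filling per formula (10)."""
    disp_l = cost_l.argmin(axis=2)            # (401) WTA, left view
    disp_r = cost_r.argmin(axis=2)            # (401) WTA, right view
    h, w = disp_l.shape
    xs = np.tile(np.arange(w), (h, 1))
    xr = np.clip(xs - disp_l, 0, w - 1)       # matched column in right view
    stable = disp_l == np.take_along_axis(disp_r, xr, axis=1)   # (402)
    conf = np.where(stable, 1.0, 0.1)         # (403), formula (9)
    filled = disp_l.astype(float)
    for y in range(h):                        # (404), formula (10)
        for x in np.flatnonzero(~stable[y]):
            cand = []
            left_idx = np.flatnonzero(stable[y, :x])
            if left_idx.size:
                cand.append(disp_l[y, left_idx[-1]])          # d(p_left)
            right_idx = np.flatnonzero(stable[y, x + 1:])
            if right_idx.size:
                cand.append(disp_l[y, x + 1 + right_idx[0]])  # d(p_right)
            if cand:
                filled[y, x] = min(cand)      # smaller of the two, eq. (10)
    return filled, conf, stable
```

Corrupting one pixel of the right-view cost volume makes exactly the left-view pixel that maps onto it unstable; its hole is then filled from the nearest stable neighbors.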
6. The method for disparity stereo matching based on forward and backward smoothing and O(1) complexity as claimed in claim 1, wherein:
the step (5) specifically comprises the following steps:
(501) establishing a new mixed weight function based on the initial disparity map and the smoothed left eye image, as in formula (11):
w_H(m, n) represents the mixed weight of the edge connecting adjacent pixel point m and pixel point n, and w_H(n, m) represents the mixed weight of the edge connecting pixel point n and pixel point m; D_init(m) and D_init(n) are the initial disparity values of pixel point m and pixel point n respectively, and I_i(m), I_i(n) are the pixel values of pixel point m and pixel point n under channel i;
pixel point m and pixel point n are two adjacent pixel points in the image, and α is the weight balancing the information of the initial disparity map against the information of the smoothed image pixels;
S_H(p, q) represents the mixed similarity function of pixel point p and pixel point q, the subscript H denoting the mixture; D_H(p, q) is the distance accumulated along the path from pixel point p to pixel point q under the mixed weight w_H(m, n); σ_H is a constant of the mixed similarity function;
(502) performing confidence aggregation on the initial disparity confidence with a horizontal tree structure, the aggregation process being split into a left-to-right pass and a right-to-left pass; the confidence aggregation value of an aggregated pixel point is:
where p is a pixel point in the image; the superscript LR indicates that the aggregation direction is from left to right and RL that it is from right to left; pl denotes the previous pixel point of pixel point p and pr the next pixel point of pixel point p; S_H(p, q) represents the mixed similarity of adjacent pixel points p and q; B^LR(p) denotes the confidence aggregation value of pixel point p accumulated from left to right based on the horizontal tree when the disparity value is d; B(p), obtained in formula (9), is the disparity confidence of point p; B(pr) is the disparity confidence aggregation value of the next pixel point, and S_H(p, pr) is the mixed similarity of pixel point p and the next point pr; B_A(p) denotes the average of the confidence aggregation value accumulated from left to right, the confidence aggregation value accumulated from right to left, and the disparity confidence value of pixel point p.
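A sketch of step (5) follows (illustrative, not part of the claims). Since formulas (11)–(12) are not reproduced in this text, the exact forms are assumptions: the mixed weight is taken as w_H(m, n) = α·|D_init(m) − D_init(n)| + (1 − α)·max_i |I_i(m) − I_i(n)| for horizontally adjacent pixels, S_H = exp(−w_H/σ_H), and, following the claim's wording, the result is the average of the two recursive passes and B(p).

```python
import numpy as np

def confidence_aggregate(b, disp_init, img, alpha=0.5, sigma_h=0.1):
    """Horizontal-tree confidence aggregation of the initial confidence b.

    b         : H x W initial disparity confidence (formula (9))
    disp_init : H x W initial disparity map D_init
    img       : H x W x 3 smoothed left image
    """
    h, w = b.shape
    # assumed mixed weight of each horizontal edge (x, x+1), formula (11)
    wh = (alpha * np.abs(np.diff(disp_init, axis=1))
          + (1 - alpha) * np.abs(np.diff(img, axis=1)).max(axis=2))
    s = np.exp(-wh / sigma_h)            # S_H of each horizontal edge
    b_lr = b.astype(float).copy()
    for x in range(1, w):                # accumulate left to right (B^LR)
        b_lr[:, x] += s[:, x - 1] * b_lr[:, x - 1]
    b_rl = b.astype(float).copy()
    for x in range(w - 2, -1, -1):       # accumulate right to left (B^RL)
        b_rl[:, x] += s[:, x] * b_rl[:, x + 1]
    return (b_lr + b_rl + b) / 3.0       # B_A(p): average of the three
```

On a uniform region the aggregation spreads confidence along the whole row; across a large disparity jump the edge similarity vanishes and the confidence stays local, which is the edge-stopping behavior the mixed weight is meant to provide.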
7. The method for disparity stereo matching based on forward and backward smoothing and O(1) complexity as claimed in claim 6, wherein:
the step (6) specifically comprises the following steps:
(601) in the disparity updating stage, i.e. after confidence aggregation, performing belief propagation on the confidence aggregation values over the minimum spanning tree established in step (3), comprising the following steps:
(6a) aggregation from the leaf nodes to the root node, i.e.:
where Ch(p) represents the child nodes of pixel point p, and B_A^↑(p) denotes the confidence aggregation value of pixel point p after leaf-to-root propagation; therefore,
the belief propagation value of pixel point p comprises its own confidence aggregation value plus the sum, over all subtrees below pixel point p, of their propagated values multiplied by the corresponding edge weights;
(6b) aggregation from the root node to the leaf nodes:
where Pr(p) represents the parent node of pixel point p, and B_Pro(p) is the result of propagating the confidence aggregation value of pixel point p from the root node to the leaf nodes;
(602) for any pixel point q, S(p, q) represents the similarity of the color information of points p and q in the minimum spanning tree, and the confidence aggregation value B_A(q) measures the size of the area in the neighborhood of q that is similar in both color and disparity information, so S(p, q)·B_A(q) is the probability that pixel points p and q have the same disparity; when S(p, q)·B_A(q) attains its maximum, the disparity value d(p) of p is the optimal disparity estimation of point q, and the maximum is the probability of that estimation, obtained by propagating the confidence aggregation values over the minimum spanning tree; the belief propagation of node p is defined as B_Pro(p), and the optimal disparity estimation as the disparity propagation D_Pro(p); for each node p:
here the pixel point q is an unstable point, the pixel point p is a stable point, and I represents the whole input image.
8. A disparity stereo matching device based on forward and backward smoothing and O(1) complexity, comprising:
the device comprises a smoothing processing module, a cost function construction module, a cost aggregation module, a disparity map acquisition module, a confidence aggregation module and a confidence propagation module;
the smoothing module is used for respectively carrying out forward smoothing and backward smoothing on the left eye image and the right eye image;
the cost function construction module constructs a cost function based on the color and gradient information of the smoothed left eye image and the smoothed right eye image and calculates a cost function value;
the cost aggregation module constructs a minimum spanning tree for the smoothed left eye image and the smoothed right eye image, and carries out cost aggregation on the cost function value to generate a cost aggregation value;
the parallax image acquisition module obtains a parallax image by adopting a WTA strategy, judges stable points and unstable points through left-right consistency detection, obtains initial parallax confidence, and fills holes in the unstable points to obtain an initial parallax image;
the confidence aggregation module is combined with the color information of the smoothed left eye image and the initial parallax image to obtain a mixed weight, and based on the initial parallax confidence and the mixed weight, confidence aggregation is carried out on the initial parallax confidence by adopting a horizontal tree structure to obtain a confidence aggregation value;
and in the parallax value updating stage, the belief propagation module performs belief propagation on the belief aggregation value according to the minimum spanning tree to obtain the optimal parallax estimation and obtain a dense parallax image.
9. The apparatus for parallax stereo matching based on forward-backward smoothing and O(1) complexity according to claim 8, wherein
the smoothing processing of the smoothing module specifically comprises the following steps:
the smoothing of each pixel point in the left eye image and the right eye image is updated by scanning the pixel points on a horizontal tree structure: each pixel point is taken as a root node, and the forward and backward smoothing is performed with the RGB three-channel image as input; the smoothing formula is given in formula (1):
I_i^{f/b}(u, v) represents the pixel value of pixel point (u, v) of the input image under channel i after smoothing;
where I_i(u, v) is the pixel value of pixel point (u, v) of the input image under channel i, and I_i^{f/b}(u, v) denotes the pixel value after a forward or backward iterative update of pixel point (u, v) under channel i:
where the constant λ is used to adjust the smoothing speed, ∇_r I_i(u, v) is the difference between pixel point (u, v) of the input image under channel i and its adjacent pixel point in direction r, (u, v − r) is the previous pixel point of pixel point (u, v) in the horizontal propagation direction, f and b denote the forward and backward directions respectively, and ω is a constant;
the processing process of the cost function construction module specifically comprises the following steps:
(201) replacing the single gray-scale information with RGB three-channel information; let any pixel point p in the left eye image be (x, y), let the disparity value corresponding to pixel point p be d, and let the matching point corresponding to p in the right eye image be pd = (x − d, y); the color information C_AD(p, d) and the gradient information C_Grad(p, d) are given by formula (4):
where C_AD(p, d) represents the color information of pixel point p when the disparity value is d, and C_Grad(p, d) represents the gradient information of pixel point p when the disparity value is d; I_i^L(p) is the pixel value of pixel point p of the left eye image under channel i, and I_i^R(pd) is the pixel value of pixel point pd of the right eye image under channel i; ∇_x I_i^L(p) and ∇_y I_i^L(p) represent the gradients in the x and y directions of pixel point p of the left eye image under channel i, and ∇_x I_i^R(pd) and ∇_y I_i^R(pd) represent the gradients in the x and y directions of pixel point pd of the right eye image under channel i;
(202) the constructed cost function is as follows:
C(p, d) = w_1·C_AD(p, d) + w_2·C_Grad(p, d) (5)
where w_1 and w_2 are the weights of the color information and the gradient information respectively, with w_1 + w_2 = 1;
C(p, d) is the cost function of pixel point p when the disparity value is d, and the cost function value is calculated based on this cost function;
the processing procedure of the cost aggregation module specifically comprises the following steps:
the cost aggregation value based on the minimum spanning tree is the sum of the cost function values multiplied by their corresponding weights, as expressed in formula (6):
where C_d(q) is the cost function value of pixel point q when the disparity value is d, q being any pixel point in the input image; C_A(p, d) denotes the cost aggregation value of pixel point p when the disparity value is d, and S(p, q), the similarity function of pixel points p and q, expresses the similarity between them;
D(p, q) represents the distance between pixel point p and pixel point q, and σ is a constant used to adjust the similarity between the two pixel points; the improved weight function is shown in formula (8):
where m and n denote adjacent pixel points in the image; the weight w(m, n) of an edge between adjacent pixel points is computed from the maximum pixel value difference over the three RGB channels; D(p, q) is the sum of the weights w(m, n) accumulated along the path, i.e., the distance between pixel point p and pixel point q is the sum of the weights of the adjacent pixel points on the path;
the processing process of the disparity map acquisition module specifically comprises the following steps:
(401) obtaining a disparity map of a left eye image and a disparity map of a right eye image by adopting a WTA strategy;
(402) performing left-right consistency detection on the disparity map of the left eye image and the disparity map of the right eye image to divide pixel points into stable points and unstable points;
(403) the initial disparity confidence reflects the probability that the initial disparity value is correct: if the pixels in the neighborhood of a pixel point have similar disparity values and color information, the pixel point has a larger disparity confidence; the disparity confidence value is set based on the stable and unstable points;
let B be the disparity confidence of the disparity map, i.e.:
where p is any pixel point in the input image; if p is a stable point the probability is 1, otherwise the probability that the pixel point has a correct disparity value is 0.1; B(p) denotes the disparity confidence of pixel point p in the initial disparity map;
(404) hole filling of unstable points: for an unstable point p, the first stable point on each side is found in the horizontal direction, denoted p_left and p_right; the disparity value d(p) of the unstable point p is the smaller of the disparity values of p_left and p_right:
d(p) = min(d(p_left), d(p_right)) (10)
after hole filling is completed, the initial disparity map D_init is obtained;
The processing process of the confidence aggregation module specifically comprises the following steps:
(501) establishing a new mixed weight function w_H(m, n) based on the initial disparity map and the smoothed left eye image, as represented by formula (11):
w_H(m, n) represents the mixed weight of the edge connecting adjacent pixel point m and pixel point n, and w_H(n, m) represents the mixed weight of the edge connecting pixel point n and pixel point m; D_init(m) and D_init(n) are the initial disparity values of pixel point m and pixel point n respectively, and I_i(m), I_i(n) are the pixel values of pixel point m and pixel point n under channel i;
pixel point m and pixel point n are two adjacent pixel points in the image, and α is the weight balancing the information of the initial disparity map against the information of the smoothed image pixels;
S_H(p, q) represents the mixed similarity function of pixel point p and pixel point q, the subscript H denoting the mixture; D_H(p, q) is the distance accumulated along the path from pixel point p to pixel point q under the mixed weight w_H(m, n); σ_H is a constant of the mixed similarity function, used to adjust the similarity between two pixel points;
(502) performing confidence aggregation on the initial disparity confidence with a horizontal tree structure, the aggregation process being split into a left-to-right pass and a right-to-left pass; the confidence aggregation value of an aggregated pixel point is:
where p is a pixel point in the image; the superscript LR indicates that the aggregation direction is from left to right and RL that it is from right to left; pl denotes the previous pixel point of pixel point p and pr the next pixel point of pixel point p; S_H(p, q) represents the mixed similarity of adjacent pixel points; B^LR(p) denotes the confidence aggregation value of pixel point p accumulated from left to right based on the horizontal tree when the disparity value is d; B(p), obtained in formula (9), is the disparity confidence of point p; B(pr) is the disparity confidence aggregation value of the next pixel point, and S_H(p, pr) is the mixed similarity of pixel point p and the next point pr; B_A(p) denotes the average of the confidence aggregation value accumulated from left to right, the confidence aggregation value accumulated from right to left, and the disparity confidence value of pixel point p;
the belief propagation module processing process specifically comprises the following steps:
(601) in the disparity updating phase, namely after confidence aggregation, according to the minimum spanning tree established in the step (3), performing confidence propagation on confidence aggregation values based on the minimum spanning tree, and comprising the following steps of:
(6a) aggregation from leaf node to root node, i.e.:
wherein Ch (p) represents a child node of the pixel point p,the confidence aggregation value of the pixel point p is expressed as a value after the confidence propagation is carried out from the leaf node to the root node, so the confidence propagation value of the pixel point p comprises the confidence aggregation value of the pixel point p and the sum of the multiplication side weights of all subtrees from the pixel point p;
(6b) from the root node to the leaf nodes, the aggregation is:
wherein, Pr (p) represents the father node of the pixel point p;performing belief propagation on the belief aggregation value of the pixel point p from the root node to the leaf node;
(602) for any pixel point q, S (p, q) represents the similarity of color information of points p and q in the minimum spanning tree, and the confidence aggregation value B A (q) is the area size similar in both color and disparity information in the neighborhood of q, so S (p, q) B A (q) means the probability that p and q have the same disparity; when the temperature is higher than the set temperatureThen, the disparity value d (p) of p is the optimal disparity estimation of q points,is the probability of the best disparity estimate, I denotes the whole input image, which is propagated from the confidence cluster values of the minimum spanning tree, so it is defined as confidence propagationOptimal disparity estimation is defined as disparity propagationFor each node p:
the process is to find the optimal disparity estimation of the unstable point from the stable points, so as to update the disparity value of the unstable point and obtain the final dense disparity map.
10. A computing device, comprising:
one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811016383.6A CN109887008B (en) | 2018-08-31 | 2018-08-31 | Method, device and equipment for parallax stereo matching based on forward and backward smoothing and O (1) complexity |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109887008A CN109887008A (en) | 2019-06-14 |
CN109887008B true CN109887008B (en) | 2022-09-13 |
Family
ID=66924833
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811016383.6A Active CN109887008B (en) | 2018-08-31 | 2018-08-31 | Method, device and equipment for parallax stereo matching based on forward and backward smoothing and O (1) complexity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109887008B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110610503B (en) * | 2019-08-21 | 2023-10-27 | 河海大学常州校区 | Three-dimensional information recovery method for electric knife switch based on three-dimensional matching |
CN111242999B (en) * | 2020-01-10 | 2022-09-20 | 大连理工大学 | Parallax estimation optimization method based on up-sampling and accurate re-matching |
CN111415305A (en) * | 2020-03-10 | 2020-07-14 | 桂林电子科技大学 | Method for recovering three-dimensional scene, computer-readable storage medium and unmanned aerial vehicle |
CN111432194B (en) * | 2020-03-11 | 2021-07-23 | 北京迈格威科技有限公司 | Disparity map hole filling method and device, electronic equipment and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105761270B (en) * | 2016-03-15 | 2018-11-27 | 杭州电子科技大学 | A kind of tree-shaped filtering solid matching method based on EP point range conversion |
CN106504276B (en) * | 2016-10-25 | 2019-02-19 | 桂林电子科技大学 | Non local solid matching method |
CN107274448B (en) * | 2017-07-11 | 2023-05-23 | 江南大学 | Variable weight cost aggregation stereo matching algorithm based on horizontal tree structure |
2018-08-31: CN application CN201811016383.6A filed (granted as patent CN109887008B, status: Active)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109887008B (en) | Method, device and equipment for parallax stereo matching based on forward and backward smoothing and O (1) complexity | |
Dai et al. | Mvs2: Deep unsupervised multi-view stereo with multi-view symmetry | |
Pham et al. | Domain transformation-based efficient cost aggregation for local stereo matching | |
Foi | Noise estimation and removal in MR imaging: The variance-stabilization approach | |
CN103236082B (en) | Towards the accurate three-dimensional rebuilding method of two-dimensional video of catching static scene | |
Çiğla et al. | Efficient edge-preserving stereo matching | |
CN107578430B (en) | Stereo matching method based on self-adaptive weight and local entropy | |
CN103310421B (en) | The quick stereo matching process right for high-definition image and disparity map acquisition methods | |
CN106875443B (en) | The whole pixel search method and device of 3-dimensional digital speckle based on grayscale restraint | |
CN101625768A (en) | Three-dimensional human face reconstruction method based on stereoscopic vision | |
CN109146946B (en) | Image non-local stereo matching method | |
CN103702098A (en) | In-depth extracting method of three-viewpoint stereoscopic video restrained by time-space domain | |
CN111223059B (en) | Robust depth map structure reconstruction and denoising method based on guide filter | |
CN103996201A (en) | Stereo matching method based on improved gradient and adaptive window | |
CN103268604B (en) | Binocular video depth map acquiring method | |
CN108629809B (en) | Accurate and efficient stereo matching method | |
CN106408596A (en) | Edge-based local stereo matching method | |
CN104318576A (en) | Super-pixel-level image global matching method | |
Yang | Local smoothness enforced cost volume regularization for fast stereo correspondence | |
CN102740096A (en) | Space-time combination based dynamic scene stereo video matching method | |
CN112734822A (en) | Stereo matching algorithm based on infrared and visible light images | |
CN108805841B (en) | Depth map recovery and viewpoint synthesis optimization method based on color map guide | |
CN103413332B (en) | Based on the image partition method of two passage Texture Segmentation active contour models | |
CN109816781B (en) | Multi-view solid geometry method based on image detail and structure enhancement | |
CN103489183B (en) | A kind of sectional perspective matching process split based on edge with seed point |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||