CN112991421A - Robot vision stereo matching method - Google Patents

Robot vision stereo matching method Download PDF

Info

Publication number
CN112991421A
Authority
CN
China
Prior art keywords
pixel
window
gradient
representing
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110304658.1A
Other languages
Chinese (zh)
Other versions
CN112991421B (en)
Inventor
王耀南
安果维
毛建旭
朱青
张辉
曾凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202110304658.1A
Publication of CN112991421A
Application granted
Publication of CN112991421B
Active
Anticipated expiration

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a robot visual stereo matching method. First, in the cost-calculation stage, a fused SAD and MCT matching cost preserves performance on weak-texture and repetitive-texture regions of the image while accounting for both the correlation and the global character of the pixels in a window, avoiding the introduction of noise. Second, in the cost-aggregation stage, an adaptive window whose size and direction change with the image gradient is introduced; this gradient-driven window fully exploits the gradient information of the image (growing in regions of mild gradient and shrinking in regions of severe gradient) so that image edges are preserved to the maximum extent, guided filtering is performed within the adaptive window to aggregate the cost, and a multi-scale aggregation method is applied on this basis to obtain a better aggregation result. Finally, disparity calculation and disparity optimization yield the optimal disparity result. The method achieves a high degree of robot visual stereo matching.

Description

Robot vision stereo matching method
Technical Field
The invention belongs to the technical field of visual perception for industrial and mobile robots, and particularly relates to a robot visual stereo matching method based on improved cost calculation and gradient-adaptive-window multi-scale aggregation.
Background
Stereo matching is an extremely critical step in stereo vision; matching precision and speed are the key factors constraining the application and development of stereo vision, and the technology is widely applied in photogrammetry, three-dimensional reconstruction, virtual reality, autonomous driving, mobile robots, mobile vehicles, and Mars and lunar rovers. The precision of stereo matching directly affects the precision of depth estimation and terrain three-dimensional reconstruction for a mobile robot, plays a key role in improving the precision of stereo-vision-based visual navigation, and directly determines whether the robot's visual perception task can be completed.
Stereo matching algorithms can be divided into global and local algorithms according to the optimization method. A global algorithm establishes an energy function over the whole image, optimizes it to obtain a cost value for each pixel, and then performs cost calculation and disparity optimization to obtain the final disparity map. Global algorithms are accurate but computationally complex and slow, which greatly limits them in practical scenes. A local algorithm comprises four steps: cost calculation, cost aggregation, disparity calculation and disparity optimization. The purpose of cost calculation is to measure the correlation of a pixel pair to be matched. Xing Mei proposed combining the absolute difference (AD) of pixel values with the census transform in "On Building an Accurate Stereo Matching System on Graphics Hardware", improving matching in weak-texture and repetitive-texture regions; however, AD reflects only single-pixel correlation and easily introduces noise, while the traditional census transform compares each pixel in a window only with the central pixel and ignores the global character of the whole window. Cost aggregation is the most important step of a local stereo matching algorithm and can essentially be regarded as filtering the initial matching cost. Traditional methods aggregate with box or Gaussian filters, which cannot protect the edge information in the image well. Pauline Tan proposed aggregating cost with a guided filter in "Stereo Disparity through Cost Aggregation with Guided Filter", improving edge preservation, but the guided filtering is performed over the whole image and cannot obtain good results in regions of discontinuous depth and large gradient change. Kang Zhang first proposed cost aggregation through cross-scale fusion in "Cross-Scale Cost Aggregation for Stereo Matching", but the aggregation effect at a single scale remains mediocre.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and provide a robot vision stereo matching method. First, in the cost-calculation part, a fused SAD and MCT matching cost accounts for both the correlation and the global character of window pixels while avoiding the introduction of noise. Second, in the cost-aggregation stage, a gradient-adaptive window fully exploits the gradient information of the image so that image edges are preserved to the maximum extent, and a multi-scale aggregation method is further applied on this basis to obtain a better aggregation result. Finally, disparity calculation and disparity optimization yield the optimal disparity result. The method achieves a high degree of robot visual stereo matching.
The purpose of the invention is realized by the following technical scheme: the robot visual stereo matching method comprises the following steps:
step 1, obtaining a binocular image to be matched after distortion correction and stereo correction;
step 2, improving the traditional gradient-based matching cost calculation mode, and fusing the gradients in the x and y directions;
step 3, fusing the SAD and MCT matching costs, wherein SAD denotes the sum of the absolute gray-level differences of all pixels in the neighborhood of the pixel to be matched, and MCT denotes an improved census transform in which the center pixel of the support window in the binocular images to be matched is replaced by the pixel mean;
step 4, the improved gradient-based matching cost calculation mode and the SAD and MCT matching cost calculation mode are fused again to obtain a final matching cost calculation mode;
step 5, down-sampling the binocular image to be matched to generate an image pyramid;
step 6, generating, on the image at each scale of the generated image pyramid, an adaptive window whose size changes based on the gradient;
step 7, obtaining the disparity space image corresponding to the image at each scale through the matching cost calculation of step 4, sliding the adaptive window of step 6 over each disparity space image, and performing guided filtering within the adaptive window, which constitutes the cost aggregation at each scale;
step 8, performing multi-scale aggregation on the per-scale cost aggregation results obtained in step 7 to obtain the final cost aggregation result;
step 9, applying the winner-takes-all (WTA) method to the cost aggregation result of step 8 to obtain the disparity value of each pixel;
and step 10, optimizing the obtained disparity values with adaptive-weight median filtering and left-right consistency detection to obtain the final disparity result, which is the final stereo matching result.
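For orientation, the sketch below shows the overall shape of such a local matching pipeline in Python, with deliberately simplified internals (a plain absolute-difference cost, box-window aggregation, WTA). It is an illustrative stand-in only: the patent replaces the cost with the gradient/SAD/MCT fusion of steps 2-4 and the box window with gradient-adaptive guided filtering plus multi-scale aggregation. The function name and parameter values are hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def simple_stereo(left, right, max_disp, win=5):
    """Toy local pipeline: AD cost -> box aggregation -> WTA disparity."""
    h, w = left.shape
    cost = np.full((max_disp, h, w), 1e3)            # large cost for invalid columns
    for d in range(max_disp):
        # absolute difference between left pixel x and right pixel x - d
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
    agg = uniform_filter(cost, size=(1, win, win))   # box aggregation per disparity
    return np.argmin(agg, axis=0)                    # step 9 analogue: WTA

# minimal usage: a random image and a copy shifted by 3 pixels
left = np.random.rand(32, 48)
right = np.roll(left, -3, axis=1)
disp = simple_stereo(left, right, max_disp=8)        # interior values cluster near 3
```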
As a further improvement, step 2 is embodied as follows. Let the known gradients in the x and y directions be ∇_x I and ∇_y I, where ∇_x denotes the gradient operation in the x direction, ∇_y denotes the gradient operation in the y direction, and the pixel value is taken as I = G_R + G_G + G_B, with G_R, G_G and G_B the values of a pixel in the R, G and B channels of a three-channel image. The improved gradient-based matching cost is then
C_g(p,d) = (1 - α)·min(|∇_x I_l(p) - ∇_x I_r(p-d)|, τ_1) + α·min(|∇_y I_l(p) - ∇_y I_r(p-d)|, τ_2)
where p denotes a pixel, d denotes a disparity value, α denotes the proportion of the y-direction gradient in the gradient cost and is a set value, ∇_x I_l(p) denotes the x-direction gradient of pixel p in the left image, ∇_x I_r(p-d) denotes the x-direction gradient of pixel p-d in the right image, ∇_y I_l(p) denotes the y-direction gradient of pixel p in the left image, ∇_y I_r(p-d) denotes the y-direction gradient of pixel p-d in the right image, and τ_1, τ_2 denote set truncation values.
As a further improvement, the SAD matching cost in step 3 is calculated as
C_SAD(p,d) = Σ_{q∈N_P} |I_l(q) - I_r(q-d)|
where I_l(q) denotes the sum of the three channel values of pixel q in the left image, I_r(q-d) denotes the sum of the three channel values of pixel q-d in the right image, and N_P denotes a neighborhood centered on pixel p. The matching cost of the MCT is calculated as C_mcent(p,d) = Hamming(C_cl(p), C_cr(p-d)), where Hamming denotes the Hamming distance, obtained by XOR-ing C_cl(p) with C_cr(p-d) and counting the number of 1 bits in the result; C_cl(p) denotes the bit string obtained by transforming pixel p in the left image, and C_cr(p-d) denotes the bit string obtained by transforming pixel p-d in the right image.
As a further improvement, the MCT matching cost is calculated as follows.
First, each neighborhood pixel is compared with the center value to obtain a Boolean value, and the Boolean values are mapped to a bit string, the center value being the mean of all pixels in the neighborhood window:
C_c(p) = ⊗_{q∈N_P} ξ(Ī(p), I(q))
where C_c(p) denotes the bit string obtained by transforming pixel p, ⊗ denotes bitwise concatenation, N_P denotes a neighborhood of p, ξ(Ī(p), I(q)) ∈ {0,1} denotes the Boolean result of comparing I(q) with the mean, Ī(p) denotes the mean of all pixels in the neighborhood, and I(q) denotes the value of pixel q, taken as the sum of its three channel values.
Then the MCT matching cost is obtained as the Hamming distance between the two bit strings, i.e., the number of corresponding bits in which they differ; specifically, the two bit strings are XOR-ed and the number of 1 bits in the result is counted, giving
C_mcent(p,d) = Hamming(C_cl(p), C_cr(p-d)).
As a further improvement, the final matching cost of step 4 combines the three costs, each weighted by its own control parameter (the combined formula appears only as an image in the original): λ_SAD denotes the control parameter of the SAD matching cost, λ_mcent denotes the control parameter of the MCT matching cost, and λ_g denotes the control parameter of the improved gradient-based matching cost; λ_SAD, λ_mcent and λ_g are all set values.
As a further improvement, step 6 is embodied as follows:
step a: compute the gradients g_x and g_y of the binocular images to be matched in the horizontal and vertical directions, taking g_x(i,j) and g_y(i,j) as the x- and y-direction gradients of the pixel at (i,j). The direction of the initial smoothing window centered on pixel p is θ_0(i,j), calculated as θ_0(i,j) = arctan(g_y(i,j)/g_x(i,j)), where i denotes the abscissa of pixel p and j its ordinate. The size of the initial window is set as w_0(i,j) and H_0(i,j), where w_0(i,j) denotes the width of the initial window and H_0(i,j) its height (the initialization formula appears only as an image in the original), and a denotes the maximum square window size of the smoothing window;
step b: compute the algebraic sums of the gradients in the horizontal and vertical directions within the window,
S_x(i,j) = Σ_{(k,l)∈L} g_x(i+k, j+l),   S_y(i,j) = Σ_{(k,l)∈L} g_y(i+k, j+l),
where k denotes a horizontal offset, l denotes a vertical offset, g_x(i+k, j+l) denotes the horizontal gradient of the pixel at (i+k, j+l), g_y(i+k, j+l) denotes its vertical gradient, and L denotes the adaptive window; the window direction is then updated from the algebraic sums as θ_m(i,j) = arctan(S_y(i,j)/S_x(i,j));
step c: compute the sums of the absolute gradient values in the horizontal and vertical directions within the window,
A_x(i,j) = Σ_{(k,l)∈L} |g_x(i+k, j+l)|,   A_y(i,j) = Σ_{(k,l)∈L} |g_y(i+k, j+l)|,
and update the window size from these sums (the size-update formula, which shrinks the window as the absolute gradient sums grow, appears only as an image in the original), where w_m(i,j) denotes the horizontal size of the adaptive window, H_m(i,j) its vertical size, and θ_m(i,j) its direction;
step d: when the adaptive window size converges, that is, w_{m+1}(i,j) = w_m(i,j) and H_{m+1}(i,j) = H_m(i,j), updating of the window size and direction stops, where w_{m+1}(i,j) and H_{m+1}(i,j) denote the horizontal and vertical sizes of the adaptive window at the next iteration.
As a further improvement, step 7 performs guided filtering within the adaptive window, implemented as follows.
First, the energy-function optimization model of the guided-filtering algorithm is established:
E(a_k, b_k) = Σ_{i∈N_k} [ (a_k I_i + b_k - P_i)² + ε a_k² ]
where a_k and b_k denote the linear coefficients of the guided filter, I_i denotes the input (guidance) image, P_i denotes the image to be filtered, i and k denote pixel indices, ε a_k² is the regularization term placed in the energy function to prevent a_k from becoming too large, and N_k denotes the adaptive support window of pixel k.
Second, minimizing the energy function yields the linear coefficients of the adaptive-window guided filter:
a_k = ( (1/|N_k|) Σ_{i∈N_k} I_i P_i - μ_k P̄_k ) / (σ_k² + ε),   b_k = P̄_k - a_k μ_k
where |N_k| denotes the number of pixels in N_k, μ_k denotes the mean of I_i in the adaptive window, P̄_k denotes the mean of the image to be filtered in the adaptive window, and σ_k denotes the standard deviation of I_i.
Finally, the cost obtained after filtering is
Q_i = a_k I_i + b_k
where the output Q_i has a local linear relationship with I_i in the window centered on pixel k.
As a further improvement, the final cost aggregation result Ĉ obtained in step 8 combines the aggregated costs of all scales through a coefficient matrix (the closed-form expression appears only as an image in the original), where Ĉ denotes the final aggregated cost, S denotes the number of downsampling layers, A denotes the coefficient matrix in the solving process, s indexes each specific layer, and C^0 denotes the matching cost matrix of layer 0.
According to the robot visual stereo matching method provided by the invention, first, in the cost-calculation part, the fused SAD and MCT matching cost accounts for both the correlation and the global character of the pixels in a window while preserving performance on weak-texture and repetitive-texture regions, avoiding the introduction of noise. Second, in the cost-aggregation stage, an adaptive window whose size and direction change with the image gradient is introduced; this gradient-driven window fully exploits the gradient information of the image (growing in regions of mild gradient and shrinking in regions of severe gradient) so that image edges are preserved to the maximum extent, guided filtering is performed within the adaptive window to aggregate the cost, and a multi-scale aggregation method is applied on this basis to obtain a better aggregation result. Finally, disparity calculation and disparity optimization yield the optimal disparity result. The method achieves a high degree of robot visual stereo matching.
Drawings
The invention is further illustrated by the accompanying drawings; the embodiments shown in the drawings do not limit the invention in any way, and a person skilled in the art can obtain other drawings from the following drawings without inventive effort.
Fig. 1 is a flowchart of a robot visual stereo matching method.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings and specific embodiments, and it is to be noted that the embodiments and features of the embodiments of the present application can be combined with each other without conflict.
Fig. 1 is a diagram illustrating a robot visual stereo matching method according to an embodiment of the present invention. Referring to fig. 1, the robot vision stereo matching method includes the following steps:
step 1, obtaining a binocular image to be matched after distortion correction and stereo correction;
step 2, improving the traditional gradient-based matching cost calculation and fusing the gradients in the x and y directions; this step is embodied as follows: let the known gradients in the x and y directions be ∇_x I and ∇_y I, where ∇_x denotes the gradient operation in the x direction, ∇_y denotes the gradient operation in the y direction, and the pixel value is taken as I = G_R + G_G + G_B, with G_R, G_G and G_B the values of a pixel in the R, G and B channels of a three-channel image; the improved gradient-based matching cost is then
C_g(p,d) = (1 - α)·min(|∇_x I_l(p) - ∇_x I_r(p-d)|, τ_1) + α·min(|∇_y I_l(p) - ∇_y I_r(p-d)|, τ_2)
where p denotes a pixel, d denotes a disparity value, α denotes the proportion of the y-direction gradient in the gradient cost and is a set value, ∇_x I_l(p) denotes the x-direction gradient of pixel p in the left image, ∇_x I_r(p-d) denotes the x-direction gradient of pixel p-d in the right image, ∇_y I_l(p) denotes the y-direction gradient of pixel p in the left image, ∇_y I_r(p-d) denotes the y-direction gradient of pixel p-d in the right image, and τ_1, τ_2 denote set truncation values.
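As a concrete illustration of step 2, the sketch below computes a truncated gradient cost for one disparity, taking the intensity as I = G_R + G_G + G_B as stated above and central differences (numpy.gradient) for the gradient operator. Since the patent's exact formula appears only as an image in the source, the (1 - α)/α weighting follows the stated roles of α, τ_1 and τ_2 and should be read as an assumption.

```python
import numpy as np

def gradient_cost(left, right, d, alpha=0.5, tau1=2.0, tau2=2.0):
    """Truncated gradient matching cost C_g(p, d) for a single disparity d.

    left, right: HxWx3 arrays; intensity I = R + G + B as in the text.
    alpha weights the y-direction term; tau1/tau2 truncate the x/y terms."""
    Il = left.sum(axis=2).astype(np.float64)
    Ir = right.sum(axis=2).astype(np.float64)
    gxl, gyl = np.gradient(Il, axis=1), np.gradient(Il, axis=0)
    gxr, gyr = np.gradient(Ir, axis=1), np.gradient(Ir, axis=0)
    # align right-image gradients so column x holds the value from x - d
    # (wrap-around at the left border is ignored for brevity)
    gxr_d, gyr_d = np.roll(gxr, d, axis=1), np.roll(gyr, d, axis=1)
    cx = np.minimum(np.abs(gxl - gxr_d), tau1)   # truncated x-gradient difference
    cy = np.minimum(np.abs(gyl - gyr_d), tau2)   # truncated y-gradient difference
    return (1.0 - alpha) * cx + alpha * cy
```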
Step 3, fusing the SAD (sum of absolute differences) and MCT (mean census transform) matching costs, wherein SAD denotes the sum of the absolute gray-level differences of all pixels in the neighborhood of the pixel to be matched, and MCT denotes an improved census transform in which the center pixel of the support window in the binocular images to be matched is replaced by the pixel mean;
specifically, the SAD matching cost in this step is calculated as
C_SAD(p,d) = Σ_{q∈N_P} |I_l(q) - I_r(q-d)|
where I_l(q) denotes the sum of the three channel values of pixel q in the left image, I_r(q-d) denotes the sum of the three channel values of pixel q-d in the right image, and N_P denotes a neighborhood centered on pixel p. The MCT matching cost is calculated as follows: first, each neighborhood pixel is compared with the center value to obtain a Boolean value, and the Boolean values are mapped to a bit string, the center value being the mean of all pixels in the neighborhood window:
C_c(p) = ⊗_{q∈N_P} ξ(Ī(p), I(q))
where C_c(p) denotes the bit string obtained by transforming pixel p, ⊗ denotes bitwise concatenation, N_P denotes a neighborhood of p, ξ(Ī(p), I(q)) ∈ {0,1} denotes the Boolean result of comparing I(q) with the mean, Ī(p) denotes the mean of all pixels in the neighborhood, and I(q) denotes the value of pixel q, taken as the sum of its three channel values. Then the MCT matching cost is obtained as the Hamming distance between the two bit strings, i.e., the number of corresponding bits in which they differ; specifically, the two bit strings are XOR-ed and the number of 1 bits in the result is counted, giving C_mcent(p,d) = Hamming(C_cl(p), C_cr(p-d)), where C_cl(p) denotes the bit string obtained by transforming pixel p in the left image and C_cr(p-d) denotes the bit string obtained by transforming pixel p-d in the right image.
Step 4, fusing the improved gradient-based matching cost with the SAD and MCT matching costs again to obtain the final matching cost, in which λ_SAD denotes the control parameter of the SAD cost, λ_mcent denotes the control parameter of the MCT cost, and λ_g denotes the control parameter of the improved gradient cost, all set values (the combined formula appears only as an image in the original);
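The combined formula of step 4 appears only as an image in the source. A common choice in AD-Census-style methods, on which this patent builds, is the robust exponential mapping ρ(c, λ) = 1 - exp(-c/λ) summed over the terms; the sketch below assumes that form, and the λ values are purely illustrative.

```python
import numpy as np

def robust(c, lam):
    """rho(c, lambda) = 1 - exp(-c / lambda): maps each cost into [0, 1)."""
    return 1.0 - np.exp(-c / lam)

def fused_cost(c_sad, c_mct, c_grad, lam_sad=10.0, lam_mct=30.0, lam_g=2.0):
    # assumed AD-Census-style fusion; each lambda balances one cost term
    return robust(c_sad, lam_sad) + robust(c_mct, lam_mct) + robust(c_grad, lam_g)
```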
step 5, down-sampling the binocular image to be matched to generate an image pyramid;
step 6, generating, on the image at each scale of the generated image pyramid, an adaptive window whose size changes based on the gradient; it should be noted that the size, shape and orientation of the window vary with the image structure information. The specific steps for generating the adaptive window are as follows:
step a: compute the gradients g_x and g_y of the binocular images to be matched in the horizontal and vertical directions, taking g_x(i,j) and g_y(i,j) as the x- and y-direction gradients of the pixel at (i,j). The direction of the initial smoothing window centered on pixel p is θ_0(i,j), calculated as θ_0(i,j) = arctan(g_y(i,j)/g_x(i,j)), where i denotes the abscissa of pixel p and j its ordinate. The size of the initial window is set as w_0(i,j) and H_0(i,j), where w_0(i,j) denotes the width of the initial window and H_0(i,j) its height (the initialization formula appears only as an image in the original), and a denotes the maximum square window size of the smoothing window;
step b: compute the algebraic sums of the gradients in the horizontal and vertical directions within the window,
S_x(i,j) = Σ_{(k,l)∈L} g_x(i+k, j+l),   S_y(i,j) = Σ_{(k,l)∈L} g_y(i+k, j+l),
where k denotes a horizontal offset, l denotes a vertical offset, g_x(i+k, j+l) denotes the horizontal gradient of the pixel at (i+k, j+l), g_y(i+k, j+l) denotes its vertical gradient, and L denotes the adaptive window; the window direction is then updated from the algebraic sums as θ_m(i,j) = arctan(S_y(i,j)/S_x(i,j));
step c: compute the sums of the absolute gradient values in the horizontal and vertical directions within the window,
A_x(i,j) = Σ_{(k,l)∈L} |g_x(i+k, j+l)|,   A_y(i,j) = Σ_{(k,l)∈L} |g_y(i+k, j+l)|,
and update the window size from these sums (the size-update formula, which shrinks the window as the absolute gradient sums grow, appears only as an image in the original), where w_m(i,j) denotes the horizontal size of the adaptive window, H_m(i,j) its vertical size, and θ_m(i,j) its direction;
step d: when the adaptive window size converges, that is, w_{m+1}(i,j) = w_m(i,j) and H_{m+1}(i,j) = H_m(i,j), updating of the window size and direction stops, where w_{m+1}(i,j) and H_{m+1}(i,j) denote the horizontal and vertical sizes of the adaptive window at the next iteration.
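The sketch below illustrates the iteration of steps a-d at a single pixel. Because the initialization and size-update formulas appear only as images in the source, the shrink rule used here (dimensions inversely related to the mean absolute gradient, starting from the maximum square size a) is an explicit assumption; only the control flow (direction from the algebraic sums, size from the absolute sums, stop on convergence) is taken from the text.

```python
import numpy as np

def adaptive_window(gx, gy, i, j, a=15, iters=10):
    """Gradient-adaptive window at pixel (i, j); returns (width, height, theta)."""
    h, w = gx.shape
    wd, ht = a, a                                    # assumed square initial window
    theta = np.arctan2(gy[i, j], gx[i, j])           # initial direction (robust arctan)
    for _ in range(iters):
        y0, y1 = max(i - ht // 2, 0), min(i + ht // 2 + 1, h)
        x0, x1 = max(j - wd // 2, 0), min(j + wd // 2 + 1, w)
        sx, sy = gx[y0:y1, x0:x1].sum(), gy[y0:y1, x0:x1].sum()
        theta = np.arctan2(sy, sx)                   # step b: direction update
        ax = np.abs(gx[y0:y1, x0:x1]).mean()         # step c: mean |gx| in window
        ay = np.abs(gy[y0:y1, x0:x1]).mean()
        new_wd = max(3, int(round(a / (1.0 + ax))))  # assumed shrink rule
        new_ht = max(3, int(round(a / (1.0 + ay))))
        if new_wd == wd and new_ht == ht:            # step d: size converged
            break
        wd, ht = new_wd, new_ht
    return wd, ht, theta
```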
Step 7, obtaining the disparity space image corresponding to the image at each scale through the matching cost calculation of step 4, sliding the adaptive window obtained in step 6 over each disparity space image, and performing guided filtering within the adaptive window, which constitutes the cost aggregation at each scale; in other words, the adaptive window is simply a window within which guided filtering is performed, its target being the result of the preceding matching cost calculation (a disparity space image); the image at each scale yields its own disparity space image, each is filtered with adaptive-window guided filtering and the results are then combined, and the adaptive-window guided filtering of each disparity space image is the cost aggregation.
Specifically, in this step, the energy-function optimization model of the guided-filtering algorithm is first established:
E(a_k, b_k) = Σ_{i∈N_k} [ (a_k I_i + b_k - P_i)² + ε a_k² ]
where a_k and b_k denote the linear coefficients of the guided filter, I_i denotes the input (guidance) image, P_i denotes the image to be filtered, i and k denote pixel indices, ε a_k² is the regularization term placed in the energy function to prevent a_k from becoming too large, and N_k denotes the adaptive support window of pixel k.
Second, minimizing the energy function yields the linear coefficients of the adaptive-window guided filter:
a_k = ( (1/|N_k|) Σ_{i∈N_k} I_i P_i - μ_k P̄_k ) / (σ_k² + ε),   b_k = P̄_k - a_k μ_k
where |N_k| denotes the number of pixels in N_k, μ_k denotes the mean of I_i in the adaptive window, P̄_k denotes the mean of the image to be filtered in the adaptive window, and σ_k denotes the standard deviation of I_i.
Finally, the cost obtained after filtering is
Q_i = a_k I_i + b_k
where the output Q_i has a local linear relationship with I_i in the window centered on pixel k.
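The formulas above are the standard guided-filter solution (He et al.). The sketch below implements them with a fixed square window via box filters for brevity, whereas the patent applies the same formulas inside the gradient-adaptive window of step 6; P is one disparity slice of the cost volume and I the reference image.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, P, win=9, eps=1e-3):
    """Guided filtering of cost slice P with guidance image I."""
    mean = lambda x: uniform_filter(x.astype(np.float64), size=win)
    mu, Pbar = mean(I), mean(P)                 # mu_k and mean of P in the window
    var = mean(I * I) - mu * mu                 # sigma_k^2
    a = (mean(I * P) - mu * Pbar) / (var + eps) # linear coefficient a_k
    b = Pbar - a * mu                           # linear coefficient b_k
    # average the coefficients over all windows covering each pixel (He et al.)
    return mean(a) * I + mean(b)                # output Q_i = a*I_i + b
```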
Step 8, performing multi-scale aggregation on the per-scale cost aggregation results of step 7 to obtain the final cost aggregation result Ĉ, which combines the aggregated costs of all scales through a coefficient matrix (the closed-form expression appears only as an image in the original); here S denotes the number of downsampling layers, A denotes the coefficient matrix in the solving process, s indexes each specific layer, and C^0 denotes the matching cost matrix of layer 0. It should be noted that, since cost aggregation has already been performed on the image at each scale, this step is called multi-scale aggregation because it combines the results obtained at every scale.
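The sketch below covers steps 5 and 8 together: building pyramid levels by 2x2 averaging and combining per-scale aggregated cost volumes. The patent's closed-form combination through the coefficient matrix A appears only as an image in the source, so the simple upsample-and-average rule here is an assumed stand-in; it also assumes each scale's volume covers the same disparity range.

```python
import numpy as np

def downsample(img):
    """Half-resolution pyramid level by 2x2 averaging (step 5)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w]
    return 0.25 * (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2])

def multiscale_aggregate(volumes):
    """Combine per-scale cost volumes (volumes[s]: (D, H/2^s, W/2^s)) at scale 0."""
    D, H, W = volumes[0].shape
    acc = np.zeros((D, H, W))
    for s, vol in enumerate(volumes):
        f = 2 ** s
        up = vol.repeat(f, axis=1).repeat(f, axis=2)       # nearest-neighbor upsample
        up = np.pad(up, ((0, 0), (0, H - up.shape[1]), (0, W - up.shape[2])),
                    mode='edge')                           # pad odd-size remainders
        acc += up
    return acc / len(volumes)                              # assumed equal weighting

# usage with dummy volumes over 3 scales and 8 disparities
vols = [np.random.rand(8, 64 // 2**s, 96 // 2**s) for s in range(3)]
fused = multiscale_aggregate(vols)
```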
Step 9, applying the winner-takes-all (WTA) method to the cost aggregation result of step 8, that is, among the cost values of a pixel under all candidate disparities, selecting the disparity corresponding to the minimum cost as the optimal disparity, to obtain the disparity value of each pixel;
and step 10, optimizing the obtained disparity values with adaptive-weight median filtering and left-right consistency detection to obtain the final disparity result, which is the final stereo matching result.
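A sketch of step 9 and the left-right consistency part of step 10 (the adaptive-weight median filtering is omitted); the invalidation-by-NaN convention is illustrative.

```python
import numpy as np

def wta(cost_volume):
    """Step 9: winner-takes-all, the disparity of minimum aggregated cost."""
    return np.argmin(cost_volume, axis=0)

def lr_check(disp_l, disp_r, thresh=1):
    """Step 10 (part): keep a left disparity only if the right view maps back
    to (almost) the same value; failures are marked invalid with NaN."""
    h, w = disp_l.shape
    xs = np.tile(np.arange(w), (h, 1))
    xr = np.clip(xs - disp_l, 0, w - 1)            # matching column in the right view
    back = np.take_along_axis(disp_r, xr, axis=1)  # right view's disparity there
    out = disp_l.astype(np.float64)
    out[np.abs(disp_l - back) > thresh] = np.nan   # occlusions / mismatches
    return out
```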
In summary, the robot vision stereo matching method of the invention specifically comprises: (1) obtaining the binocular images to be matched after distortion correction and stereo correction; (2) improving the traditional gradient-based matching cost and performing a reasonable, effective normalized fusion of the x- and y-direction gradients; (3) fusing the SAD cost (the sum of the absolute gray-level differences of all pixels in the neighborhood of the pixel to be matched) with the improved census transform MCT, in which the center pixel of the support window is replaced by the pixel mean; (4) fusing the improved gradient-based cost with the SAD and MCT costs again to obtain the final matching cost; (5) downsampling the binocular images to generate an image pyramid; (6) generating, on the image at each scale, an adaptive window whose size is controlled automatically by the gradient; (7) performing guided filtering within each window; (8) aggregating the results of the different pyramid scales to obtain the final cost aggregation result; (9) applying the winner-takes-all (WTA) method to the cost aggregation result of step 8 to obtain the disparity value of each pixel; (10) optimizing the result of the disparity calculation with adaptive-weight median filtering and left-right consistency detection to obtain the final stereo matching result. Through this process, a gradient-adaptive window is introduced into the cost aggregation of stereo matching: the window size changes with the gradient, larger where the gradient varies gently and smaller where it varies sharply, and guided-filter cost aggregation is performed within the adaptive window. Compared with guided-filter aggregation over the whole image, this places more emphasis on local computation, weakens the influence of error-prone regions such as depth discontinuities, and increases the influence of regions where stereo matching works well, such as those with smooth depth change.
Therefore, compared with the prior art, the invention has the following advantages:
(1) It effectively reduces the introduction of noise during cost calculation and improves stereo matching precision. The invention improves the traditional AD-Census cost calculation (the combination of the absolute difference AD and the census transform) and proposes the fused SAD and MCT matching cost, combining the neighborhood-window pixel-sum cost with a census-transform cost whose center value is the neighborhood mean. It thus considers pixel correlation while also accounting for the other pixels of the neighborhood window, giving it a global character that weakens the influence of noise during calculation.
(2) It obtains better results in regions of discontinuous depth and rich texture change, improving stereo matching precision.
In the description above, numerous specific details are set forth to provide a thorough understanding of the present invention; however, the present invention may be practiced in other ways than those described herein, and the scope of the invention is therefore not limited by the specific embodiments disclosed.
In conclusion, although the present invention has been described with reference to preferred embodiments, changes and modifications made by those skilled in the art without departing from the spirit and scope of the present invention shall fall within its scope of protection.

Claims (8)

1. A robot vision stereo matching method is characterized by comprising the following steps:
step 1, obtaining a binocular image to be matched after distortion correction and stereo correction;
step 2, improving the traditional gradient-based matching cost calculation mode, and fusing the gradients in the x and y directions;
step 3, fusing the SAD and MCT matching costs, wherein SAD denotes the sum of the absolute gray-level differences of all pixels in the neighborhood of the pixel to be matched in the binocular images, and MCT denotes an improved census transform in which the center pixel of the support window in the binocular images to be matched is replaced by the pixel mean;
step 4, the improved gradient-based matching cost calculation mode and the SAD and MCT matching cost calculation mode are fused again to obtain a final matching cost calculation mode;
step 5, down-sampling the binocular image to be matched to generate an image pyramid;
step 6, generating, on the image at each scale of the generated image pyramid, an adaptive window whose size changes based on the gradient;
step 7, obtaining the disparity space image corresponding to the image at each scale through the matching cost calculation of step 4, sliding the adaptive window of step 6 over each disparity space image, and performing guided filtering within the adaptive window, which constitutes the cost aggregation at each scale;
step 8, performing multi-scale aggregation on the per-scale cost aggregation results obtained in step 7 to obtain the final cost aggregation result;
step 9, applying the winner-takes-all (WTA) method to the cost aggregation result of step 8 to obtain the disparity value of each pixel;
and step 10, optimizing the obtained disparity values with adaptive-weight median filtering and left-right consistency detection to obtain the final disparity result, which is the final stereo matching result.
2. The robot vision stereo matching method according to claim 1, wherein step 2 is embodied as follows:
the known gradients in the x and y directions are ∇_x I and ∇_y I, where ∇_x denotes the gradient operation in the x direction, ∇_y denotes the gradient operation in the y direction, and the pixel value is taken as I = G_R + G_G + G_B, with G_R, G_G and G_B the values of a pixel in the R, G and B channels of a three-channel image; the improved gradient-based matching cost is then
C_g(p,d) = (1 - α)·min(|∇_x I_l(p) - ∇_x I_r(p-d)|, τ_1) + α·min(|∇_y I_l(p) - ∇_y I_r(p-d)|, τ_2)
where p denotes a pixel, d denotes a disparity value, α denotes the proportion of the y-direction gradient in the gradient cost and is a set value, ∇_x I_l(p) denotes the x-direction gradient of pixel p in the left image, ∇_x I_r(p-d) denotes the x-direction gradient of pixel p-d in the right image, ∇_y I_l(p) denotes the y-direction gradient of pixel p in the left image, ∇_y I_r(p-d) denotes the y-direction gradient of pixel p-d in the right image, and τ_1 and τ_2 denote set truncation values.
3. The robot visual stereo matching method according to claim 2, wherein the SAD matching cost in step 3 is calculated as
C_SAD(p,d) = Σ_{q∈N_P} |I_l(q) - I_r(q-d)|
where I_l(q) denotes the sum of the three channel values of pixel q in the left image, I_r(q-d) denotes the sum of the three channel values of pixel q-d in the right image, and N_P denotes a neighborhood centered on pixel p; the matching cost of the MCT is calculated as C_mcent(p,d) = Hamming(C_cl(p), C_cr(p-d)), where Hamming denotes the Hamming distance, obtained by XOR-ing C_cl(p) with C_cr(p-d) and counting the number of 1 bits in the result, C_cl(p) denotes the bit string obtained by transforming pixel p in the left image, and C_cr(p-d) denotes the bit string obtained by transforming pixel p-d in the right image.
4. The robot visual stereo matching method according to claim 3, wherein the MCT matching cost is calculated as follows:
first, each neighborhood pixel is compared with the center value to obtain a Boolean value, and the Boolean values are mapped to a bit string, the center value being the mean of all pixels in the neighborhood window:
C_c(p) = ⊗_{q∈N_P} ξ(Ī(p), I(q))
where C_c(p) denotes the bit string obtained by transforming pixel p, ⊗ denotes bitwise concatenation, N_P denotes a neighborhood of p, ξ(Ī(p), I(q)) ∈ {0,1} denotes the Boolean result of comparing I(q) with the mean, Ī(p) denotes the mean of all pixels in the neighborhood, and I(q) denotes the value of pixel q, taken as the sum of its three channel values;
then the MCT matching cost is obtained as the Hamming distance between the two bit strings, i.e., the number of corresponding bits in which they differ; specifically, the two bit strings are XOR-ed and the number of 1 bits in the result is counted, giving
C_mcent(p,d) = Hamming(C_cl(p), C_cr(p-d)).
5. The robot visual stereo matching method according to claim 4, wherein the final matching cost obtained in step 4 combines the three costs, each weighted by its own control parameter (the combined formula appears only as an image in the original): λ_SAD denotes the control parameter of the SAD matching cost, λ_mcent denotes the control parameter of the MCT matching cost, λ_g denotes the control parameter of the improved gradient-based matching cost, and λ_SAD, λ_mcent and λ_g are all set values.
6. The robot vision stereo matching method according to claim 5, wherein step 6 is embodied as follows:
step a: compute the gradients g_x and g_y of the binocular images to be matched in the horizontal and vertical directions, taking g_x(i,j) and g_y(i,j) as the x- and y-direction gradients of the pixel at (i,j); the direction of the initial smoothing window centered on pixel p is θ_0(i,j), calculated as θ_0(i,j) = arctan(g_y(i,j)/g_x(i,j)), where i denotes the abscissa of pixel p and j its ordinate; the size of the initial window is set as w_0(i,j) and H_0(i,j), where w_0(i,j) denotes the width of the initial window and H_0(i,j) its height (the initialization formula appears only as an image in the original), and a denotes the maximum square window size of the smoothing window;
step b: compute the algebraic sums of the gradients in the horizontal and vertical directions within the window,
S_x(i,j) = Σ_{(k,l)∈L} g_x(i+k, j+l),   S_y(i,j) = Σ_{(k,l)∈L} g_y(i+k, j+l),
where k denotes a horizontal offset, l denotes a vertical offset, g_x(i+k, j+l) denotes the horizontal gradient of the pixel at (i+k, j+l), g_y(i+k, j+l) denotes its vertical gradient, and L denotes the adaptive window; the window direction is then updated from the algebraic sums as θ_m(i,j) = arctan(S_y(i,j)/S_x(i,j));
step c: compute the sums of the absolute gradient values in the horizontal and vertical directions within the window,
A_x(i,j) = Σ_{(k,l)∈L} |g_x(i+k, j+l)|,   A_y(i,j) = Σ_{(k,l)∈L} |g_y(i+k, j+l)|,
and update the window size from these sums (the size-update formula, which shrinks the window as the absolute gradient sums grow, appears only as an image in the original), where w_m(i,j) denotes the horizontal size of the adaptive window, H_m(i,j) its vertical size, and θ_m(i,j) its direction;
step d: when the adaptive window size converges, that is, w_{m+1}(i,j) = w_m(i,j) and H_{m+1}(i,j) = H_m(i,j), updating of the window size and direction stops, where w_{m+1}(i,j) and H_{m+1}(i,j) denote the horizontal and vertical sizes of the adaptive window at the next iteration.
7. The robot vision stereo matching method according to claim 6, wherein step 7 performs guided filtering within the adaptive window, implemented as follows:
first, the energy-function optimization model of the guided-filtering algorithm is established:
E(a_k, b_k) = Σ_{i∈N_k} [ (a_k I_i + b_k - P_i)² + ε a_k² ]
where a_k and b_k denote the linear coefficients of the guided filter, I_i denotes the input (guidance) image, P_i denotes the image to be filtered, i and k denote pixel indices, ε a_k² is the regularization term placed in the energy function to prevent a_k from becoming too large, and N_k denotes the adaptive support window of pixel k;
second, minimizing the energy function yields the linear coefficients of the adaptive-window guided filter:
a_k = ( (1/|N_k|) Σ_{i∈N_k} I_i P_i - μ_k P̄_k ) / (σ_k² + ε),   b_k = P̄_k - a_k μ_k
where |N_k| denotes the number of pixels in N_k, μ_k denotes the mean of I_i in the adaptive window, P̄_k denotes the mean of the image to be filtered in the adaptive window, and σ_k denotes the standard deviation of I_i;
finally, the cost obtained after filtering is
Q_i = a_k I_i + b_k
where the output Q_i has a local linear relationship with I_i in the window centered on pixel k.
8. The robot visual stereo matching method according to claim 7, wherein the final cost aggregation result Ĉ obtained in step 8 combines the aggregated costs of all scales through a coefficient matrix (the closed-form expression appears only as an image in the original), where Ĉ denotes the final aggregated cost, S denotes the number of downsampling layers, A denotes the coefficient matrix in the solving process, s indexes each specific layer, and C^0 denotes the matching cost matrix of layer 0.
CN202110304658.1A 2021-03-23 2021-03-23 Robot vision stereo matching method Active CN112991421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110304658.1A CN112991421B (en) 2021-03-23 2021-03-23 Robot vision stereo matching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110304658.1A CN112991421B (en) 2021-03-23 2021-03-23 Robot vision stereo matching method

Publications (2)

Publication Number Publication Date
CN112991421A true CN112991421A (en) 2021-06-18
CN112991421B CN112991421B (en) 2023-08-08

Family

ID=76334333

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110304658.1A Active CN112991421B (en) 2021-03-23 2021-03-23 Robot vision stereo matching method

Country Status (1)

Country Link
CN (1) CN112991421B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822915A (en) * 2021-07-30 2021-12-21 济宁安泰矿山设备制造有限公司 Image stereo matching method for intelligent pump cavity endoscope fault diagnosis
CN116071415A (en) * 2023-02-08 2023-05-05 淮阴工学院 Stereo matching method based on improved Census algorithm

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013126114A (en) * 2011-12-14 2013-06-24 Samsung Yokohama Research Institute Co Ltd Stereo image processing method and stereo image processing apparatus
US20130259360A1 (en) * 2012-03-27 2013-10-03 Fujitsu Limited Method and system for stereo correspondence
CN103440653A (en) * 2013-08-27 2013-12-11 北京航空航天大学 Binocular vision stereo matching method
CN110473217A (en) * 2019-07-25 2019-11-19 沈阳工业大学 A kind of binocular solid matching process based on Census transformation
CN112102382A (en) * 2020-09-16 2020-12-18 北京邮电大学 Electromechanical equipment visual information stereo matching algorithm based on multi-scale transformation and ADcensus-JWGF

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013126114A (en) * 2011-12-14 2013-06-24 Samsung Yokohama Research Institute Co Ltd Stereo image processing method and stereo image processing apparatus
US20130259360A1 (en) * 2012-03-27 2013-10-03 Fujitsu Limited Method and system for stereo correspondence
CN103440653A (en) * 2013-08-27 2013-12-11 北京航空航天大学 Binocular vision stereo matching method
CN110473217A (en) * 2019-07-25 2019-11-19 沈阳工业大学 A kind of binocular solid matching process based on Census transformation
CN112102382A (en) * 2020-09-16 2020-12-18 北京邮电大学 Electromechanical equipment visual information stereo matching algorithm based on multi-scale transformation and ADcensus-JWGF

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王云峰; 吴炜; 余小亮; 王安然: "Binocular stereo matching based on adaptive-weight AD-Census transform" (基于自适应权重AD-Census变换的双目立体匹配), 工程科学与技术 (Advanced Engineering Sciences), no. 04

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822915A (en) * 2021-07-30 2021-12-21 济宁安泰矿山设备制造有限公司 Image stereo matching method for intelligent pump cavity endoscope fault diagnosis
CN116071415A (en) * 2023-02-08 2023-05-05 淮阴工学院 Stereo matching method based on improved Census algorithm
CN116071415B (en) * 2023-02-08 2023-12-01 淮阴工学院 Stereo matching method based on improved Census algorithm

Also Published As

Publication number Publication date
CN112991421B (en) 2023-08-08

Similar Documents

Publication Publication Date Title
CN106780590A (en) The acquisition methods and system of a kind of depth map
CN107578430B (en) Stereo matching method based on self-adaptive weight and local entropy
CN104616286B (en) Quick semi-automatic multi views depth restorative procedure
CN102184540B (en) Sub-pixel level stereo matching method based on scale space
CN109887021B (en) Cross-scale-based random walk stereo matching method
CN104318576B (en) Super-pixel-level image global matching method
CN112991421A (en) Robot vision stereo matching method
CN111105452B (en) Binocular vision-based high-low resolution fusion stereo matching method
CN106408596A (en) Edge-based local stereo matching method
CN102740096A (en) Space-time combination based dynamic scene stereo video matching method
CN112435267B (en) Disparity map calculation method for high-resolution urban satellite stereo image
CN107945222A (en) A kind of new Stereo matching cost calculates and parallax post-processing approach
CN112287824A (en) Binocular vision-based three-dimensional target detection method, device and system
CN115601406A (en) Local stereo matching method based on fusion cost calculation and weighted guide filtering
CN104980726B (en) A kind of binocular video solid matching method of associated movement vector
CN113034681B (en) Three-dimensional reconstruction method and device for spatial plane relation constraint
CN107274448B (en) Variable weight cost aggregation stereo matching algorithm based on horizontal tree structure
CN113344989B (en) NCC and Census minimum spanning tree aerial image binocular stereo matching method
CN113674415B (en) Method for jointly manufacturing continuous and hollow-free DSM (digital image) by utilizing high-resolution seventh image and resource third image
CN109816711B (en) Stereo matching method adopting adaptive structure
CN110910438B (en) High-speed stereo matching algorithm for ultrahigh-resolution binocular image
CN113850293A (en) Positioning method based on multi-source data and direction prior joint optimization
Sandström et al. Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians
CN114187208B (en) Semi-global stereo matching method based on fusion cost and self-adaptive penalty term coefficient
CN117078982B (en) Deep learning-based large-dip-angle stereoscopic image alignment dense feature matching method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant