CN112991421A - Robot vision stereo matching method - Google Patents

Robot vision stereo matching method Download PDF

Info

Publication number
CN112991421A
Authority
CN
China
Prior art keywords
pixel
window
gradient
representing
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110304658.1A
Other languages
Chinese (zh)
Other versions
CN112991421B (en)
Inventor
王耀南
安果维
毛建旭
朱青
张辉
曾凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202110304658.1A
Publication of CN112991421A
Application granted
Publication of CN112991421B
Active
Anticipated expiration

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a robot visual stereo matching method. First, in the cost-calculation stage, a fused SAD and MCT matching cost preserves performance on weak-texture and repetitive-texture regions of the image while accounting for both the correlation and the global character of the pixels in a window, avoiding the introduction of noise. Second, in the cost-aggregation stage, an adaptive window whose size and direction change with the image gradient is introduced; this gradient-driven window fully exploits the gradient information of the image (growing in regions of mild gradient and shrinking in regions of severe gradient) so that image edges are preserved to the maximum extent, guided filtering is performed within the adaptive window to aggregate the cost, and a multi-scale aggregation method is applied on this basis to obtain a better aggregation result. Finally, disparity calculation and disparity optimization yield the optimal disparity result. The method achieves a high degree of robot visual stereo matching.

Description

Robot vision stereo matching method
Technical Field
The invention belongs to the technical field of visual perception for industrial and mobile robots, and particularly relates to a robot visual stereo matching method based on improved cost calculation and gradient-adaptive-window multi-scale aggregation.
Background
Stereo matching is an extremely critical step in stereo vision; matching precision and speed are the key factors constraining the application and development of stereo vision, and the technology is widely applied in photogrammetry, three-dimensional reconstruction, virtual reality, autonomous driving, mobile robots, mobile vehicles, and Mars and lunar rovers. The precision of stereo matching directly affects the precision of depth estimation and terrain three-dimensional reconstruction for a mobile robot, plays a key role in improving the precision of stereo-vision-based visual navigation, and directly determines whether the robot's visual perception task can be completed.
Stereo matching algorithms can be divided into global and local algorithms according to the optimization method. A global algorithm establishes an energy function over the whole image, optimizes it to obtain a cost value for each pixel, and then performs cost calculation and disparity optimization to obtain the final disparity map. Global algorithms are accurate but computationally complex and slow, which greatly limits them in practical scenes. A local algorithm comprises four steps: cost calculation, cost aggregation, disparity calculation and disparity optimization. The purpose of cost calculation is to measure the correlation of a pixel pair to be matched. Xing Mei proposed combining the absolute difference (AD) of pixel values with the census transform in "On Building an Accurate Stereo Matching System on Graphics Hardware", improving matching in weak-texture and repetitive-texture regions; however, AD reflects only single-pixel correlation and easily introduces noise, while the traditional census transform compares each pixel in a window only with the central pixel and ignores the global character of the whole window. Cost aggregation is the most important step of a local stereo matching algorithm and can essentially be regarded as filtering the initial matching cost. Traditional methods aggregate with box or Gaussian filters, which cannot protect the edge information in the image well. Pauline Tan proposed aggregating cost with a guided filter in "Stereo Disparity through Cost Aggregation with Guided Filter", improving edge preservation, but the guided filtering is performed over the whole image and cannot obtain good results in regions of discontinuous depth and large gradient change. Kang Zhang first proposed cost aggregation through cross-scale fusion in "Cross-Scale Cost Aggregation for Stereo Matching", but the aggregation effect at a single scale remains mediocre.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and provide a robot vision stereo matching method. First, in the cost-calculation part, a fused SAD and MCT matching cost accounts for both the correlation and the global character of window pixels while avoiding the introduction of noise. Second, in the cost-aggregation stage, a gradient-adaptive window fully exploits the gradient information of the image so that image edges are preserved to the maximum extent, and a multi-scale aggregation method is further applied on this basis to obtain a better aggregation result. Finally, disparity calculation and disparity optimization yield the optimal disparity result. The method achieves a high degree of robot visual stereo matching.
The purpose of the invention is realized by the following technical scheme: the robot visual stereo matching method comprises the following steps:
step 1, obtaining a binocular image to be matched after distortion correction and stereo correction;
step 2, improving the traditional gradient-based matching cost calculation mode, and fusing the gradients in the x and y directions;
step 3, fusing the SAD and MCT matching costs, wherein SAD denotes the sum of the absolute gray-level differences of all pixels in the neighborhood of the pixel to be matched, and MCT denotes an improved census transform in which the center pixel of the support window in the binocular images to be matched is replaced by the pixel mean;
step 4, the improved gradient-based matching cost calculation mode and the SAD and MCT matching cost calculation mode are fused again to obtain a final matching cost calculation mode;
step 5, down-sampling the binocular image to be matched to generate an image pyramid;
step 6, generating, on the image at each scale of the generated image pyramid, an adaptive window whose size changes based on the gradient;
step 7, obtaining the disparity space image corresponding to the image at each scale through the matching cost calculation of step 4, sliding the adaptive window of step 6 over each disparity space image, and performing guided filtering within the adaptive window, which constitutes the cost aggregation at each scale;
step 8, performing multi-scale aggregation on the per-scale cost aggregation results obtained in step 7 to obtain the final cost aggregation result;
step 9, applying the winner-takes-all (WTA) method to the cost aggregation result of step 8 to obtain the disparity value of each pixel;
and step 10, optimizing the obtained disparity values with adaptive-weight median filtering and left-right consistency detection to obtain the final disparity result, which is the final stereo matching result.
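For orientation, the sketch below shows the overall shape of such a local matching pipeline in Python, with deliberately simplified internals (a plain absolute-difference cost, box-window aggregation, WTA). It is an illustrative stand-in only: the patent replaces the cost with the gradient/SAD/MCT fusion of steps 2-4 and the box window with gradient-adaptive guided filtering plus multi-scale aggregation. The function name and parameter values are hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def simple_stereo(left, right, max_disp, win=5):
    """Toy local pipeline: AD cost -> box aggregation -> WTA disparity."""
    h, w = left.shape
    cost = np.full((max_disp, h, w), 1e3)            # large cost for invalid columns
    for d in range(max_disp):
        # absolute difference between left pixel x and right pixel x - d
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
    agg = uniform_filter(cost, size=(1, win, win))   # box aggregation per disparity
    return np.argmin(agg, axis=0)                    # step 9 analogue: WTA

# minimal usage: a random image and a copy shifted by 3 pixels
left = np.random.rand(32, 48)
right = np.roll(left, -3, axis=1)
disp = simple_stereo(left, right, max_disp=8)        # interior values cluster near 3
```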
As a further improvement, step 2 is embodied as follows. Let the known gradients in the x and y directions be ∇_x I and ∇_y I, where ∇_x denotes the gradient operation in the x direction, ∇_y denotes the gradient operation in the y direction, and the pixel value is taken as I = G_R + G_G + G_B, with G_R, G_G and G_B the values of a pixel in the R, G and B channels of a three-channel image. The improved gradient-based matching cost is then
C_g(p,d) = (1 - α)·min(|∇_x I_l(p) - ∇_x I_r(p-d)|, τ_1) + α·min(|∇_y I_l(p) - ∇_y I_r(p-d)|, τ_2)
where p denotes a pixel, d denotes a disparity value, α denotes the proportion of the y-direction gradient in the gradient cost and is a set value, ∇_x I_l(p) denotes the x-direction gradient of pixel p in the left image, ∇_x I_r(p-d) denotes the x-direction gradient of pixel p-d in the right image, ∇_y I_l(p) denotes the y-direction gradient of pixel p in the left image, ∇_y I_r(p-d) denotes the y-direction gradient of pixel p-d in the right image, and τ_1, τ_2 denote set truncation values.
As a further improvement, the SAD matching cost in step 3 is calculated as
C_SAD(p,d) = Σ_{q∈N_P} |I_l(q) - I_r(q-d)|
where I_l(q) denotes the sum of the three channel values of pixel q in the left image, I_r(q-d) denotes the sum of the three channel values of pixel q-d in the right image, and N_P denotes a neighborhood centered on pixel p. The matching cost of the MCT is calculated as C_mcent(p,d) = Hamming(C_cl(p), C_cr(p-d)), where Hamming denotes the Hamming distance, obtained by XOR-ing C_cl(p) with C_cr(p-d) and counting the number of 1 bits in the result; C_cl(p) denotes the bit string obtained by transforming pixel p in the left image, and C_cr(p-d) denotes the bit string obtained by transforming pixel p-d in the right image.
As a further improvement, the MCT matching cost is calculated as follows.
First, each neighborhood pixel is compared with the center value to obtain a Boolean value, and the Boolean values are mapped to a bit string, the center value being the mean of all pixels in the neighborhood window:
C_c(p) = ⊗_{q∈N_P} ξ(Ī(p), I(q))
where C_c(p) denotes the bit string obtained by transforming pixel p, ⊗ denotes bitwise concatenation, N_P denotes a neighborhood of p, ξ(Ī(p), I(q)) ∈ {0,1} denotes the Boolean result of comparing I(q) with the mean, Ī(p) denotes the mean of all pixels in the neighborhood, and I(q) denotes the value of pixel q, taken as the sum of its three channel values.
Then the MCT matching cost is obtained as the Hamming distance between the two bit strings, i.e., the number of corresponding bits in which they differ; specifically, the two bit strings are XOR-ed and the number of 1 bits in the result is counted, giving
C_mcent(p,d) = Hamming(C_cl(p), C_cr(p-d)).
As a further improvement, the final matching cost of step 4 combines the three costs, each weighted by its own control parameter (the combined formula appears only as an image in the original): λ_SAD denotes the control parameter of the SAD matching cost, λ_mcent denotes the control parameter of the MCT matching cost, and λ_g denotes the control parameter of the improved gradient-based matching cost; λ_SAD, λ_mcent and λ_g are all set values.
As a further improvement, step 6 is embodied as follows:
step a: compute the gradients g_x and g_y of the binocular images to be matched in the horizontal and vertical directions, taking g_x(i,j) and g_y(i,j) as the x- and y-direction gradients of the pixel at (i,j). The direction of the initial smoothing window centered on pixel p is θ_0(i,j), calculated as θ_0(i,j) = arctan(g_y(i,j)/g_x(i,j)), where i denotes the abscissa of pixel p and j its ordinate. The size of the initial window is set as w_0(i,j) and H_0(i,j), where w_0(i,j) denotes the width of the initial window and H_0(i,j) its height (the initialization formula appears only as an image in the original), and a denotes the maximum square window size of the smoothing window;
step b: compute the algebraic sums of the gradients in the horizontal and vertical directions within the window,
S_x(i,j) = Σ_{(k,l)∈L} g_x(i+k, j+l),   S_y(i,j) = Σ_{(k,l)∈L} g_y(i+k, j+l),
where k denotes a horizontal offset, l denotes a vertical offset, g_x(i+k, j+l) denotes the horizontal gradient of the pixel at (i+k, j+l), g_y(i+k, j+l) denotes its vertical gradient, and L denotes the adaptive window; the window direction is then updated from the algebraic sums as θ_m(i,j) = arctan(S_y(i,j)/S_x(i,j));
step c: compute the sums of the absolute gradient values in the horizontal and vertical directions within the window,
A_x(i,j) = Σ_{(k,l)∈L} |g_x(i+k, j+l)|,   A_y(i,j) = Σ_{(k,l)∈L} |g_y(i+k, j+l)|,
and update the window size from these sums (the size-update formula, which shrinks the window as the absolute gradient sums grow, appears only as an image in the original), where w_m(i,j) denotes the horizontal size of the adaptive window, H_m(i,j) its vertical size, and θ_m(i,j) its direction;
step d: when the adaptive window size converges, that is, w_{m+1}(i,j) = w_m(i,j) and H_{m+1}(i,j) = H_m(i,j), updating of the window size and direction stops, where w_{m+1}(i,j) and H_{m+1}(i,j) denote the horizontal and vertical sizes of the adaptive window at the next iteration.
As a further improvement, step 7 performs guided filtering within the adaptive window, implemented as follows.
First, the energy-function optimization model of the guided-filtering algorithm is established:
E(a_k, b_k) = Σ_{i∈N_k} [ (a_k I_i + b_k - P_i)² + ε a_k² ]
where a_k and b_k denote the linear coefficients of the guided filter, I_i denotes the input (guidance) image, P_i denotes the image to be filtered, i and k denote pixel indices, ε a_k² is the regularization term placed in the energy function to prevent a_k from becoming too large, and N_k denotes the adaptive support window of pixel k.
Second, minimizing the energy function yields the linear coefficients of the adaptive-window guided filter:
a_k = ( (1/|N_k|) Σ_{i∈N_k} I_i P_i - μ_k P̄_k ) / (σ_k² + ε),   b_k = P̄_k - a_k μ_k
where |N_k| denotes the number of pixels in N_k, μ_k denotes the mean of I_i in the adaptive window, P̄_k denotes the mean of the image to be filtered in the adaptive window, and σ_k denotes the standard deviation of I_i.
Finally, the cost obtained after filtering is
Q_i = a_k I_i + b_k
where the output Q_i has a local linear relationship with I_i in the window centered on pixel k.
As a further improvement, the final cost aggregation result Ĉ obtained in step 8 combines the aggregated costs of all scales through a coefficient matrix (the closed-form expression appears only as an image in the original), where Ĉ denotes the final aggregated cost, S denotes the number of downsampling layers, A denotes the coefficient matrix in the solving process, s indexes each specific layer, and C^0 denotes the matching cost matrix of layer 0.
According to the robot visual stereo matching method provided by the invention, first, in the cost-calculation part, the fused SAD and MCT matching cost accounts for both the correlation and the global character of the pixels in a window while preserving performance on weak-texture and repetitive-texture regions, avoiding the introduction of noise. Second, in the cost-aggregation stage, an adaptive window whose size and direction change with the image gradient is introduced; this gradient-driven window fully exploits the gradient information of the image (growing in regions of mild gradient and shrinking in regions of severe gradient) so that image edges are preserved to the maximum extent, guided filtering is performed within the adaptive window to aggregate the cost, and a multi-scale aggregation method is applied on this basis to obtain a better aggregation result. Finally, disparity calculation and disparity optimization yield the optimal disparity result. The method achieves a high degree of robot visual stereo matching.
Drawings
The invention is further illustrated by the accompanying drawings; the embodiments shown in the drawings do not limit the invention in any way, and a person skilled in the art can obtain other drawings from the following drawings without inventive effort.
Fig. 1 is a flowchart of a robot visual stereo matching method.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings and specific embodiments, and it is to be noted that the embodiments and features of the embodiments of the present application can be combined with each other without conflict.
Fig. 1 is a diagram illustrating a robot visual stereo matching method according to an embodiment of the present invention. Referring to fig. 1, the robot vision stereo matching method includes the following steps:
step 1, obtaining a binocular image to be matched after distortion correction and stereo correction;
step 2, improving the traditional gradient-based matching cost calculation and fusing the gradients in the x and y directions; this step is embodied as follows: let the known gradients in the x and y directions be ∇_x I and ∇_y I, where ∇_x denotes the gradient operation in the x direction, ∇_y denotes the gradient operation in the y direction, and the pixel value is taken as I = G_R + G_G + G_B, with G_R, G_G and G_B the values of a pixel in the R, G and B channels of a three-channel image; the improved gradient-based matching cost is then
C_g(p,d) = (1 - α)·min(|∇_x I_l(p) - ∇_x I_r(p-d)|, τ_1) + α·min(|∇_y I_l(p) - ∇_y I_r(p-d)|, τ_2)
where p denotes a pixel, d denotes a disparity value, α denotes the proportion of the y-direction gradient in the gradient cost and is a set value, ∇_x I_l(p) denotes the x-direction gradient of pixel p in the left image, ∇_x I_r(p-d) denotes the x-direction gradient of pixel p-d in the right image, ∇_y I_l(p) denotes the y-direction gradient of pixel p in the left image, ∇_y I_r(p-d) denotes the y-direction gradient of pixel p-d in the right image, and τ_1, τ_2 denote set truncation values.
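As a concrete illustration of step 2, the sketch below computes a truncated gradient cost for one disparity, taking the intensity as I = G_R + G_G + G_B as stated above and central differences (numpy.gradient) for the gradient operator. Since the patent's exact formula appears only as an image in the source, the (1 - α)/α weighting follows the stated roles of α, τ_1 and τ_2 and should be read as an assumption.

```python
import numpy as np

def gradient_cost(left, right, d, alpha=0.5, tau1=2.0, tau2=2.0):
    """Truncated gradient matching cost C_g(p, d) for a single disparity d.

    left, right: HxWx3 arrays; intensity I = R + G + B as in the text.
    alpha weights the y-direction term; tau1/tau2 truncate the x/y terms."""
    Il = left.sum(axis=2).astype(np.float64)
    Ir = right.sum(axis=2).astype(np.float64)
    gxl, gyl = np.gradient(Il, axis=1), np.gradient(Il, axis=0)
    gxr, gyr = np.gradient(Ir, axis=1), np.gradient(Ir, axis=0)
    # align right-image gradients so column x holds the value from x - d
    # (wrap-around at the left border is ignored for brevity)
    gxr_d, gyr_d = np.roll(gxr, d, axis=1), np.roll(gyr, d, axis=1)
    cx = np.minimum(np.abs(gxl - gxr_d), tau1)   # truncated x-gradient difference
    cy = np.minimum(np.abs(gyl - gyr_d), tau2)   # truncated y-gradient difference
    return (1.0 - alpha) * cx + alpha * cy
```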
Step 3, fusing the SAD (sum of absolute differences) and MCT (mean census transform) matching costs, wherein SAD denotes the sum of the absolute gray-level differences of all pixels in the neighborhood of the pixel to be matched, and MCT denotes an improved census transform in which the center pixel of the support window in the binocular images to be matched is replaced by the pixel mean;
specifically, the SAD matching cost in this step is calculated as
C_SAD(p,d) = Σ_{q∈N_P} |I_l(q) - I_r(q-d)|
where I_l(q) denotes the sum of the three channel values of pixel q in the left image, I_r(q-d) denotes the sum of the three channel values of pixel q-d in the right image, and N_P denotes a neighborhood centered on pixel p. The MCT matching cost is calculated as follows: first, each neighborhood pixel is compared with the center value to obtain a Boolean value, and the Boolean values are mapped to a bit string, the center value being the mean of all pixels in the neighborhood window:
C_c(p) = ⊗_{q∈N_P} ξ(Ī(p), I(q))
where C_c(p) denotes the bit string obtained by transforming pixel p, ⊗ denotes bitwise concatenation, N_P denotes a neighborhood of p, ξ(Ī(p), I(q)) ∈ {0,1} denotes the Boolean result of comparing I(q) with the mean, Ī(p) denotes the mean of all pixels in the neighborhood, and I(q) denotes the value of pixel q, taken as the sum of its three channel values. Then the MCT matching cost is obtained as the Hamming distance between the two bit strings, i.e., the number of corresponding bits in which they differ; specifically, the two bit strings are XOR-ed and the number of 1 bits in the result is counted, giving C_mcent(p,d) = Hamming(C_cl(p), C_cr(p-d)), where C_cl(p) denotes the bit string obtained by transforming pixel p in the left image and C_cr(p-d) denotes the bit string obtained by transforming pixel p-d in the right image.
Step 4, fusing the improved gradient-based matching cost with the SAD and MCT matching costs again to obtain the final matching cost, in which λ_SAD denotes the control parameter of the SAD cost, λ_mcent denotes the control parameter of the MCT cost, and λ_g denotes the control parameter of the improved gradient cost, all set values (the combined formula appears only as an image in the original);
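The combined formula of step 4 appears only as an image in the source. A common choice in AD-Census-style methods, on which this patent builds, is the robust exponential mapping ρ(c, λ) = 1 - exp(-c/λ) summed over the terms; the sketch below assumes that form, and the λ values are purely illustrative.

```python
import numpy as np

def robust(c, lam):
    """rho(c, lambda) = 1 - exp(-c / lambda): maps each cost into [0, 1)."""
    return 1.0 - np.exp(-c / lam)

def fused_cost(c_sad, c_mct, c_grad, lam_sad=10.0, lam_mct=30.0, lam_g=2.0):
    # assumed AD-Census-style fusion; each lambda balances one cost term
    return robust(c_sad, lam_sad) + robust(c_mct, lam_mct) + robust(c_grad, lam_g)
```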
step 5, down-sampling the binocular image to be matched to generate an image pyramid;
step 6, generating, on the image at each scale of the generated image pyramid, an adaptive window whose size changes based on the gradient; it should be noted that the size, shape and orientation of the window vary with the image structure information. The specific steps for generating the adaptive window are as follows:
step a: compute the gradients g_x and g_y of the binocular images to be matched in the horizontal and vertical directions, taking g_x(i,j) and g_y(i,j) as the x- and y-direction gradients of the pixel at (i,j). The direction of the initial smoothing window centered on pixel p is θ_0(i,j), calculated as θ_0(i,j) = arctan(g_y(i,j)/g_x(i,j)), where i denotes the abscissa of pixel p and j its ordinate. The size of the initial window is set as w_0(i,j) and H_0(i,j), where w_0(i,j) denotes the width of the initial window and H_0(i,j) its height (the initialization formula appears only as an image in the original), and a denotes the maximum square window size of the smoothing window;
step b: compute the algebraic sums of the gradients in the horizontal and vertical directions within the window,
S_x(i,j) = Σ_{(k,l)∈L} g_x(i+k, j+l),   S_y(i,j) = Σ_{(k,l)∈L} g_y(i+k, j+l),
where k denotes a horizontal offset, l denotes a vertical offset, g_x(i+k, j+l) denotes the horizontal gradient of the pixel at (i+k, j+l), g_y(i+k, j+l) denotes its vertical gradient, and L denotes the adaptive window; the window direction is then updated from the algebraic sums as θ_m(i,j) = arctan(S_y(i,j)/S_x(i,j));
step c: compute the sums of the absolute gradient values in the horizontal and vertical directions within the window,
A_x(i,j) = Σ_{(k,l)∈L} |g_x(i+k, j+l)|,   A_y(i,j) = Σ_{(k,l)∈L} |g_y(i+k, j+l)|,
and update the window size from these sums (the size-update formula, which shrinks the window as the absolute gradient sums grow, appears only as an image in the original), where w_m(i,j) denotes the horizontal size of the adaptive window, H_m(i,j) its vertical size, and θ_m(i,j) its direction;
step d: when the adaptive window size converges, that is, w_{m+1}(i,j) = w_m(i,j) and H_{m+1}(i,j) = H_m(i,j), updating of the window size and direction stops, where w_{m+1}(i,j) and H_{m+1}(i,j) denote the horizontal and vertical sizes of the adaptive window at the next iteration.
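The sketch below illustrates the iteration of steps a-d at a single pixel. Because the initialization and size-update formulas appear only as images in the source, the shrink rule used here (dimensions inversely related to the mean absolute gradient, starting from the maximum square size a) is an explicit assumption; only the control flow (direction from the algebraic sums, size from the absolute sums, stop on convergence) is taken from the text.

```python
import numpy as np

def adaptive_window(gx, gy, i, j, a=15, iters=10):
    """Gradient-adaptive window at pixel (i, j); returns (width, height, theta)."""
    h, w = gx.shape
    wd, ht = a, a                                    # assumed square initial window
    theta = np.arctan2(gy[i, j], gx[i, j])           # initial direction (robust arctan)
    for _ in range(iters):
        y0, y1 = max(i - ht // 2, 0), min(i + ht // 2 + 1, h)
        x0, x1 = max(j - wd // 2, 0), min(j + wd // 2 + 1, w)
        sx, sy = gx[y0:y1, x0:x1].sum(), gy[y0:y1, x0:x1].sum()
        theta = np.arctan2(sy, sx)                   # step b: direction update
        ax = np.abs(gx[y0:y1, x0:x1]).mean()         # step c: mean |gx| in window
        ay = np.abs(gy[y0:y1, x0:x1]).mean()
        new_wd = max(3, int(round(a / (1.0 + ax))))  # assumed shrink rule
        new_ht = max(3, int(round(a / (1.0 + ay))))
        if new_wd == wd and new_ht == ht:            # step d: size converged
            break
        wd, ht = new_wd, new_ht
    return wd, ht, theta
```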
Step 7, obtaining the disparity space image corresponding to the image at each scale through the matching cost calculation of step 4, sliding the adaptive window obtained in step 6 over each disparity space image, and performing guided filtering within the adaptive window, which constitutes the cost aggregation at each scale; in other words, the adaptive window is simply a window within which guided filtering is performed, its target being the result of the preceding matching cost calculation (a disparity space image); the image at each scale yields its own disparity space image, each is filtered with adaptive-window guided filtering and the results are then combined, and the adaptive-window guided filtering of each disparity space image is the cost aggregation.
Specifically, in this step, the energy-function optimization model of the guided-filtering algorithm is first established:
E(a_k, b_k) = Σ_{i∈N_k} [ (a_k I_i + b_k - P_i)² + ε a_k² ]
where a_k and b_k denote the linear coefficients of the guided filter, I_i denotes the input (guidance) image, P_i denotes the image to be filtered, i and k denote pixel indices, ε a_k² is the regularization term placed in the energy function to prevent a_k from becoming too large, and N_k denotes the adaptive support window of pixel k.
Second, minimizing the energy function yields the linear coefficients of the adaptive-window guided filter:
a_k = ( (1/|N_k|) Σ_{i∈N_k} I_i P_i - μ_k P̄_k ) / (σ_k² + ε),   b_k = P̄_k - a_k μ_k
where |N_k| denotes the number of pixels in N_k, μ_k denotes the mean of I_i in the adaptive window, P̄_k denotes the mean of the image to be filtered in the adaptive window, and σ_k denotes the standard deviation of I_i.
Finally, the cost obtained after filtering is
Q_i = a_k I_i + b_k
where the output Q_i has a local linear relationship with I_i in the window centered on pixel k.
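The formulas above are the standard guided-filter solution (He et al.). The sketch below implements them with a fixed square window via box filters for brevity, whereas the patent applies the same formulas inside the gradient-adaptive window of step 6; P is one disparity slice of the cost volume and I the reference image.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, P, win=9, eps=1e-3):
    """Guided filtering of cost slice P with guidance image I."""
    mean = lambda x: uniform_filter(x.astype(np.float64), size=win)
    mu, Pbar = mean(I), mean(P)                 # mu_k and mean of P in the window
    var = mean(I * I) - mu * mu                 # sigma_k^2
    a = (mean(I * P) - mu * Pbar) / (var + eps) # linear coefficient a_k
    b = Pbar - a * mu                           # linear coefficient b_k
    # average the coefficients over all windows covering each pixel (He et al.)
    return mean(a) * I + mean(b)                # output Q_i = a*I_i + b
```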
Step 8, performing multi-scale aggregation on the per-scale cost aggregation results of step 7 to obtain the final cost aggregation result Ĉ, which combines the aggregated costs of all scales through a coefficient matrix (the closed-form expression appears only as an image in the original); here S denotes the number of downsampling layers, A denotes the coefficient matrix in the solving process, s indexes each specific layer, and C^0 denotes the matching cost matrix of layer 0. It should be noted that, since cost aggregation has already been performed on the image at each scale, this step is called multi-scale aggregation because it combines the results obtained at every scale.
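The sketch below covers steps 5 and 8 together: building pyramid levels by 2x2 averaging and combining per-scale aggregated cost volumes. The patent's closed-form combination through the coefficient matrix A appears only as an image in the source, so the simple upsample-and-average rule here is an assumed stand-in; it also assumes each scale's volume covers the same disparity range.

```python
import numpy as np

def downsample(img):
    """Half-resolution pyramid level by 2x2 averaging (step 5)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w]
    return 0.25 * (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2])

def multiscale_aggregate(volumes):
    """Combine per-scale cost volumes (volumes[s]: (D, H/2^s, W/2^s)) at scale 0."""
    D, H, W = volumes[0].shape
    acc = np.zeros((D, H, W))
    for s, vol in enumerate(volumes):
        f = 2 ** s
        up = vol.repeat(f, axis=1).repeat(f, axis=2)       # nearest-neighbor upsample
        up = np.pad(up, ((0, 0), (0, H - up.shape[1]), (0, W - up.shape[2])),
                    mode='edge')                           # pad odd-size remainders
        acc += up
    return acc / len(volumes)                              # assumed equal weighting

# usage with dummy volumes over 3 scales and 8 disparities
vols = [np.random.rand(8, 64 // 2**s, 96 // 2**s) for s in range(3)]
fused = multiscale_aggregate(vols)
```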
Step 9, applying the winner-takes-all (WTA) method to the cost aggregation result of step 8, that is, among the cost values of a pixel under all candidate disparities, selecting the disparity corresponding to the minimum cost as the optimal disparity, to obtain the disparity value of each pixel;
and step 10, optimizing the obtained disparity values with adaptive-weight median filtering and left-right consistency detection to obtain the final disparity result, which is the final stereo matching result.
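A sketch of step 9 and the left-right consistency part of step 10 (the adaptive-weight median filtering is omitted); the invalidation-by-NaN convention is illustrative.

```python
import numpy as np

def wta(cost_volume):
    """Step 9: winner-takes-all, the disparity of minimum aggregated cost."""
    return np.argmin(cost_volume, axis=0)

def lr_check(disp_l, disp_r, thresh=1):
    """Step 10 (part): keep a left disparity only if the right view maps back
    to (almost) the same value; failures are marked invalid with NaN."""
    h, w = disp_l.shape
    xs = np.tile(np.arange(w), (h, 1))
    xr = np.clip(xs - disp_l, 0, w - 1)            # matching column in the right view
    back = np.take_along_axis(disp_r, xr, axis=1)  # right view's disparity there
    out = disp_l.astype(np.float64)
    out[np.abs(disp_l - back) > thresh] = np.nan   # occlusions / mismatches
    return out
```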
In summary, the robot vision stereo matching method of the invention specifically comprises: (1) obtaining the binocular images to be matched after distortion correction and stereo correction; (2) improving the traditional gradient-based matching cost and performing a reasonable, effective normalized fusion of the x- and y-direction gradients; (3) fusing the SAD cost (the sum of the absolute gray-level differences of all pixels in the neighborhood of the pixel to be matched) with the improved census transform MCT, in which the center pixel of the support window is replaced by the pixel mean; (4) fusing the improved gradient-based cost with the SAD and MCT costs again to obtain the final matching cost; (5) downsampling the binocular images to generate an image pyramid; (6) generating, on the image at each scale, an adaptive window whose size is controlled automatically by the gradient; (7) performing guided filtering within each window; (8) aggregating the results of the different pyramid scales to obtain the final cost aggregation result; (9) applying the winner-takes-all (WTA) method to the cost aggregation result of step 8 to obtain the disparity value of each pixel; (10) optimizing the result of the disparity calculation with adaptive-weight median filtering and left-right consistency detection to obtain the final stereo matching result. Through this process, a gradient-adaptive window is introduced into the cost aggregation of stereo matching: the window size changes with the gradient, larger where the gradient varies gently and smaller where it varies sharply, and guided-filter cost aggregation is performed within the adaptive window. Compared with guided-filter aggregation over the whole image, this places more emphasis on local computation, weakens the influence of error-prone regions such as depth discontinuities, and increases the influence of regions where stereo matching works well, such as those with smooth depth change.
Therefore, compared with the prior art, the invention has the following advantages:
(1) It effectively reduces the introduction of noise during cost calculation and improves stereo matching precision. The invention improves the traditional AD-Census cost calculation (the combination of the absolute difference AD and the census transform) and proposes the fused SAD and MCT matching cost, combining the neighborhood-window pixel-sum cost with a census-transform cost whose center value is the neighborhood mean. It thus considers pixel correlation while also accounting for the other pixels of the neighborhood window, giving it a global character that weakens the influence of noise during calculation.
(2) It obtains better results in regions of discontinuous depth and rich texture change, improving stereo matching precision.
In the description above, numerous specific details are set forth to provide a thorough understanding of the present invention; however, the present invention may be practiced in other ways than those described herein, and the scope of the invention is therefore not limited by the specific embodiments disclosed.
In conclusion, although the present invention has been described with reference to preferred embodiments, changes and modifications made by those skilled in the art without departing from the spirit and scope of the present invention shall fall within its scope of protection.

Claims (8)

1. A robot vision stereo matching method is characterized by comprising the following steps:
step 1, obtaining a binocular image to be matched after distortion correction and stereo correction;
step 2, improving the traditional gradient-based matching cost calculation mode, and fusing the gradients in the x and y directions;
step 3, fusing the SAD and MCT matching costs, wherein SAD denotes the sum of the absolute gray-level differences of all pixels in the neighborhood of the pixel to be matched in the binocular images, and MCT denotes an improved census transform in which the center pixel of the support window in the binocular images to be matched is replaced by the pixel mean;
step 4, the improved gradient-based matching cost calculation mode and the SAD and MCT matching cost calculation mode are fused again to obtain a final matching cost calculation mode;
step 5, down-sampling the binocular image to be matched to generate an image pyramid;
step 6, generating, on the image at each scale of the generated image pyramid, an adaptive window whose size changes based on the gradient;
step 7, obtaining the disparity space image corresponding to the image at each scale through the matching cost calculation of step 4, sliding the adaptive window of step 6 over each disparity space image, and performing guided filtering within the adaptive window, which constitutes the cost aggregation at each scale;
step 8, performing multi-scale aggregation on the per-scale cost aggregation results obtained in step 7 to obtain the final cost aggregation result;
step 9, applying the winner-takes-all (WTA) method to the cost aggregation result of step 8 to obtain the disparity value of each pixel;
and step 10, optimizing the obtained disparity values with adaptive-weight median filtering and left-right consistency detection to obtain the final disparity result, which is the final stereo matching result.
2. The robot vision stereo matching method according to claim 1, wherein step 2 is embodied as follows:
the known gradients in the x and y directions are ∇_x I and ∇_y I, where ∇_x denotes the gradient operation in the x direction, ∇_y denotes the gradient operation in the y direction, and the pixel value is taken as I = G_R + G_G + G_B, with G_R, G_G and G_B the values of a pixel in the R, G and B channels of a three-channel image; the improved gradient-based matching cost is then
C_g(p,d) = (1 - α)·min(|∇_x I_l(p) - ∇_x I_r(p-d)|, τ_1) + α·min(|∇_y I_l(p) - ∇_y I_r(p-d)|, τ_2)
where p denotes a pixel, d denotes a disparity value, α denotes the proportion of the y-direction gradient in the gradient cost and is a set value, ∇_x I_l(p) denotes the x-direction gradient of pixel p in the left image, ∇_x I_r(p-d) denotes the x-direction gradient of pixel p-d in the right image, ∇_y I_l(p) denotes the y-direction gradient of pixel p in the left image, ∇_y I_r(p-d) denotes the y-direction gradient of pixel p-d in the right image, and τ_1 and τ_2 denote set truncation values.
3. The robot visual stereo matching method according to claim 2, wherein the SAD matching cost in step 3 is calculated as
C_SAD(p,d) = Σ_{q∈N_P} |I_l(q) - I_r(q-d)|
where I_l(q) denotes the sum of the three channel values of pixel q in the left image, I_r(q-d) denotes the sum of the three channel values of pixel q-d in the right image, and N_P denotes a neighborhood centered on pixel p; the matching cost of the MCT is calculated as C_mcent(p,d) = Hamming(C_cl(p), C_cr(p-d)), where Hamming denotes the Hamming distance, obtained by XOR-ing C_cl(p) with C_cr(p-d) and counting the number of 1 bits in the result, C_cl(p) denotes the bit string obtained by transforming pixel p in the left image, and C_cr(p-d) denotes the bit string obtained by transforming pixel p-d in the right image.
4. The robot visual stereo matching method according to claim 3, wherein the MCT matching cost is calculated as follows:
first, each neighborhood pixel is compared with the center value to obtain a Boolean value, and the Boolean values are mapped to a bit string, the center value being the mean of all pixels in the neighborhood window:
C_c(p) = ⊗_{q∈N_P} ξ(Ī(p), I(q))
where C_c(p) denotes the bit string obtained by transforming pixel p, ⊗ denotes bitwise concatenation, N_P denotes a neighborhood of p, ξ(Ī(p), I(q)) ∈ {0,1} denotes the Boolean result of comparing I(q) with the mean, Ī(p) denotes the mean of all pixels in the neighborhood, and I(q) denotes the value of pixel q, taken as the sum of its three channel values;
then the MCT matching cost is obtained as the Hamming distance between the two bit strings, i.e., the number of corresponding bits in which they differ; specifically, the two bit strings are XOR-ed and the number of 1 bits in the result is counted, giving
C_mcent(p,d) = Hamming(C_cl(p), C_cr(p-d)).
5. The robot visual stereo matching method according to claim 4, wherein the final matching cost obtained in step 4 combines the three costs, each weighted by its own control parameter (the combined formula appears only as an image in the original): λ_SAD denotes the control parameter of the SAD matching cost, λ_mcent denotes the control parameter of the MCT matching cost, λ_g denotes the control parameter of the improved gradient-based matching cost, and λ_SAD, λ_mcent and λ_g are all set values.
6. The robot vision stereo matching method according to claim 5, wherein step 6 is embodied as follows:
step a: compute the gradients g_x and g_y of the binocular images to be matched in the horizontal and vertical directions, taking g_x(i,j) and g_y(i,j) as the x- and y-direction gradients of the pixel at (i,j); the direction of the initial smoothing window centered on pixel p is θ_0(i,j), calculated as θ_0(i,j) = arctan(g_y(i,j)/g_x(i,j)), where i denotes the abscissa of pixel p and j its ordinate; the size of the initial window is set as w_0(i,j) and H_0(i,j), where w_0(i,j) denotes the width of the initial window and H_0(i,j) its height (the initialization formula appears only as an image in the original), and a denotes the maximum square window size of the smoothing window;
step b: compute the algebraic sums of the gradients in the horizontal and vertical directions within the window,
S_x(i,j) = Σ_{(k,l)∈L} g_x(i+k, j+l),   S_y(i,j) = Σ_{(k,l)∈L} g_y(i+k, j+l),
where k denotes a horizontal offset, l denotes a vertical offset, g_x(i+k, j+l) denotes the horizontal gradient of the pixel at (i+k, j+l), g_y(i+k, j+l) denotes its vertical gradient, and L denotes the adaptive window; the window direction is then updated from the algebraic sums as θ_m(i,j) = arctan(S_y(i,j)/S_x(i,j));
step c: compute the sums of the absolute gradient values in the horizontal and vertical directions within the window,
A_x(i,j) = Σ_{(k,l)∈L} |g_x(i+k, j+l)|,   A_y(i,j) = Σ_{(k,l)∈L} |g_y(i+k, j+l)|,
and update the window size from these sums (the size-update formula, which shrinks the window as the absolute gradient sums grow, appears only as an image in the original), where w_m(i,j) denotes the horizontal size of the adaptive window, H_m(i,j) its vertical size, and θ_m(i,j) its direction;
step d: when the adaptive window size converges, that is, w_{m+1}(i,j) = w_m(i,j) and H_{m+1}(i,j) = H_m(i,j), updating of the window size and direction stops, where w_{m+1}(i,j) and H_{m+1}(i,j) denote the horizontal and vertical sizes of the adaptive window at the next iteration.
7. The robot vision stereo matching method according to claim 6, wherein step 7 performs guided filtering within the adaptive window, implemented as follows:
first, the energy-function optimization model of the guided-filtering algorithm is established:
E(a_k, b_k) = Σ_{i∈N_k} [ (a_k I_i + b_k - P_i)² + ε a_k² ]
where a_k and b_k denote the linear coefficients of the guided filter, I_i denotes the input (guidance) image, P_i denotes the image to be filtered, i and k denote pixel indices, ε a_k² is the regularization term placed in the energy function to prevent a_k from becoming too large, and N_k denotes the adaptive support window of pixel k;
second, minimizing the energy function yields the linear coefficients of the adaptive-window guided filter:
a_k = ( (1/|N_k|) Σ_{i∈N_k} I_i P_i - μ_k P̄_k ) / (σ_k² + ε),   b_k = P̄_k - a_k μ_k
where |N_k| denotes the number of pixels in N_k, μ_k denotes the mean of I_i in the adaptive window, P̄_k denotes the mean of the image to be filtered in the adaptive window, and σ_k denotes the standard deviation of I_i;
finally, the cost obtained after filtering is
Q_i = a_k I_i + b_k
where the output Q_i has a local linear relationship with I_i in the window centered on pixel k.
8. The robot visual stereo matching method according to claim 7, wherein the final cost aggregation result Ĉ obtained in step 8 combines the aggregated costs of all scales through a coefficient matrix (the closed-form expression appears only as an image in the original), where Ĉ denotes the final aggregated cost, S denotes the number of downsampling layers, A denotes the coefficient matrix in the solving process, s indexes each specific layer, and C^0 denotes the matching cost matrix of layer 0.
CN202110304658.1A 2021-03-23 2021-03-23 Robot vision stereo matching method Active CN112991421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110304658.1A CN112991421B (en) 2021-03-23 2021-03-23 Robot vision stereo matching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110304658.1A CN112991421B (en) 2021-03-23 2021-03-23 Robot vision stereo matching method

Publications (2)

Publication Number Publication Date
CN112991421A true CN112991421A (en) 2021-06-18
CN112991421B CN112991421B (en) 2023-08-08

Family

ID=76334333

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110304658.1A Active CN112991421B (en) 2021-03-23 2021-03-23 Robot vision stereo matching method

Country Status (1)

Country Link
CN (1) CN112991421B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822915A (en) * 2021-07-30 2021-12-21 济宁安泰矿山设备制造有限公司 Image stereo matching method for intelligent pump cavity endoscope fault diagnosis
CN116071415A (en) * 2023-02-08 2023-05-05 淮阴工学院 Stereo matching method based on improved Census algorithm

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013126114A (en) * 2011-12-14 2013-06-24 Samsung Yokohama Research Institute Co Ltd Stereo image processing method and stereo image processing apparatus
US20130259360A1 (en) * 2012-03-27 2013-10-03 Fujitsu Limited Method and system for stereo correspondence
CN103440653A (en) * 2013-08-27 2013-12-11 北京航空航天大学 Binocular vision stereo matching method
CN110473217A (en) * 2019-07-25 2019-11-19 沈阳工业大学 A kind of binocular solid matching process based on Census transformation
CN112102382A (en) * 2020-09-16 2020-12-18 北京邮电大学 Electromechanical equipment visual information stereo matching algorithm based on multi-scale transformation and ADcensus-JWGF

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013126114A (en) * 2011-12-14 2013-06-24 Samsung Yokohama Research Institute Co Ltd Stereo image processing method and stereo image processing apparatus
US20130259360A1 (en) * 2012-03-27 2013-10-03 Fujitsu Limited Method and system for stereo correspondence
CN103440653A (en) * 2013-08-27 2013-12-11 北京航空航天大学 Binocular vision stereo matching method
CN110473217A (en) * 2019-07-25 2019-11-19 沈阳工业大学 A kind of binocular solid matching process based on Census transformation
CN112102382A (en) * 2020-09-16 2020-12-18 北京邮电大学 Electromechanical equipment visual information stereo matching algorithm based on multi-scale transformation and ADcensus-JWGF

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王云峰; 吴炜; 余小亮; 王安然: "Binocular stereo matching based on adaptive-weight AD-Census transform" (基于自适应权重AD-Census变换的双目立体匹配), 工程科学与技术 (Advanced Engineering Sciences), no. 04

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822915A (en) * 2021-07-30 2021-12-21 济宁安泰矿山设备制造有限公司 Image stereo matching method for intelligent pump cavity endoscope fault diagnosis
CN116071415A (en) * 2023-02-08 2023-05-05 淮阴工学院 Stereo matching method based on improved Census algorithm
CN116071415B (en) * 2023-02-08 2023-12-01 淮阴工学院 Stereo matching method based on improved Census algorithm

Also Published As

Publication number Publication date
CN112991421B (en) 2023-08-08

Similar Documents

Publication Publication Date Title
CN106780590A (en) The acquisition methods and system of a kind of depth map
CN107578430B (en) Stereo matching method based on self-adaptive weight and local entropy
CN104616286B (en) Quick semi-automatic multi views depth restorative procedure
CN102184540B (en) Sub-pixel level stereo matching method based on scale space
CN109887021B (en) Cross-scale-based random walk stereo matching method
CN104318576B (en) Super-pixel-level image global matching method
CN112991421A (en) Robot vision stereo matching method
CN111105452B (en) Binocular vision-based high-low resolution fusion stereo matching method
CN106408596A (en) Edge-based local stereo matching method
CN102740096A (en) Space-time combination based dynamic scene stereo video matching method
CN112435267B (en) Disparity map calculation method for high-resolution urban satellite stereo image
CN107945222A (en) A kind of new Stereo matching cost calculates and parallax post-processing approach
CN112287824A (en) Binocular vision-based three-dimensional target detection method, device and system
CN115601406A (en) Local stereo matching method based on fusion cost calculation and weighted guide filtering
CN104980726B (en) A kind of binocular video solid matching method of associated movement vector
CN113034681B (en) Three-dimensional reconstruction method and device for spatial plane relation constraint
CN107274448B (en) Variable weight cost aggregation stereo matching algorithm based on horizontal tree structure
CN113344989B (en) NCC and Census minimum spanning tree aerial image binocular stereo matching method
CN113674415B (en) Method for jointly manufacturing continuous and hollow-free DSM (digital image) by utilizing high-resolution seventh image and resource third image
CN109816711B (en) Stereo matching method adopting adaptive structure
CN110910438B (en) High-speed stereo matching algorithm for ultrahigh-resolution binocular image
CN113850293A (en) Positioning method based on multi-source data and direction prior joint optimization
Sandström et al. Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians
CN114187208B (en) Semi-global stereo matching method based on fusion cost and self-adaptive penalty term coefficient
CN117078982B (en) Deep learning-based large-dip-angle stereoscopic image alignment dense feature matching method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant