CN111062900B - Binocular disparity map enhancement method based on confidence fusion - Google Patents
Binocular disparity map enhancement method based on confidence fusion
- Publication number: CN111062900B (application CN201911148748.5A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G06T5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06N3/045 — Combinations of networks (neural network architectures)
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20221 — Image fusion; image merging
Abstract
The invention discloses a binocular disparity map enhancement method based on confidence fusion, which comprises the following steps: S1, placing the picture directly in front of the center of the binocular camera and shooting the target with the binocular camera to obtain a left image a and a right image b of the binocular pair; S2, respectively performing stereo rectification and distortion correction on the left image a and the right image b to obtain a rectified left image a1 and a rectified right image b1; S3, performing disparity estimation on the left image a1 and the right image b1 with a convolutional neural network algorithm to obtain a dense disparity map a2, then performing disparity map optimization on the dense disparity map a2 to obtain an optimized disparity map I_c; S4, performing disparity estimation on the rectified left image a1 and right image b1 with an improved optical flow estimation algorithm to obtain a disparity map I_O; S5, fusing the obtained disparity map I_O and disparity map I_c according to their confidence levels to obtain the final disparity map I(u).
Description
[ technical field ]
The invention belongs to the technical field of stereo matching in image processing, and particularly relates to a binocular disparity map enhancement method based on confidence fusion.
[ background of the invention ]
Binocular stereo vision is an important form of machine vision: three-dimensional information about an object is recovered by combining two images according to the parallax principle. Stereo matching is the main technical means of obtaining three-dimensional object information from two-dimensional image information; it refers to finding, for any point in the spatial scene, its projections onto two or more camera image planes, called corresponding points, and thereby establishing the correspondences between the left and right image planes. The process of finding these stereo corresponding points is called disparity estimation, so disparity estimation is an important basis for depth estimation.
Traditional disparity estimation methods generally use a large matching window to handle low-texture and depth-discontinuous regions, which blurs important features; poorly textured and blurred surfaces then cannot be matched consistently, so the disparity values contain noise-induced errors.
[ summary of the invention ]
The invention aims to provide a binocular disparity map enhancement method based on confidence fusion, for solving the problems that in the prior art the accuracy of disparity values in low-texture and depth-discontinuous regions cannot be improved and high-quality disparity maps cannot be generated.
The invention adopts the following technical scheme: a binocular disparity map enhancement method based on confidence fusion comprises the following steps:
s1, placing the picture in front of the center of the binocular camera, and shooting a target by using the binocular camera to obtain a left picture a and a right picture b of the binocular picture; the camera coordinate system takes the center of the camera as an origin, x points to the right of the camera, y points to the upper part of the camera, and z points to the front of the camera;
s2, respectively performing stereo rectification and distortion correction on the left image a and the right image b to obtain a rectified left image a1 and a rectified right image b1;
s3, performing disparity estimation on the left image a1 and the right image b1 with a convolutional neural network algorithm to obtain a dense disparity map a2;
performing disparity map optimization on the dense disparity map a2 to obtain an optimized disparity map I_c;
S4, performing disparity estimation on the rectified left image a1 and right image b1 with an improved optical flow estimation algorithm to obtain a disparity map I_O;
S5, fusing the obtained disparity map I_O and disparity map I_c according to their confidence levels to obtain the final disparity map I(u).
Further, the specific method adopted in step S2 is as follows: making the left image a and the right image b coplanar; aligning the image rows of the left image a and the right image b; and rectifying the left image a and the right image b to obtain the stereo-rectified output matrix, i.e. the rectified left image a1 and right image b1.
Further, the specific method of step S3 is:
step 3.1, performing weight assignment on the left image a1 and the right image b1 with a convolutional neural network, where every layer of the network applies normalization and a ReLU activation; the feature maps produced by passing the left image a1 and the right image b1 through the convolutional neural network are sent to a correlation layer to compute matching cost, and an initial disparity map is obtained after the three basic steps of cost calculation, cost aggregation, and disparity prediction;
step 3.2, performing left-right verification on the generated initial disparity map and updating the network weights, with the update rule following the gradient descent method;
step 3.3, repeating step 3.1 and step 3.2: after the network weights are updated, the initial disparity map is predicted again, and the high-confidence pixels are then selected for further training;
step 3.4, after training, smoothing the predicted disparity map with a median filter to obtain a dense disparity map a2;
step 3.5, filling the points with edge disparity value 0 in the dense disparity map a2: the disparity information with value 0 is filled in by an image fusion method, and the fusion yields the optimized disparity map I_c(x, y).
Further, the specific process of step S4 is:
step 4.1, obtaining the first and second derivatives at each point of images a1 and b1 according to the optical flow equation, assuming that the pixels in a fixed window share the same motion and that the displacement is small; for the left image a1, define u_a as the motion speed of a pixel in the x-axis direction and v_a as its motion speed in the y-axis direction, where I_x, I_y, and I_t are the derivatives of the image with respect to the x direction, the y direction, and time, respectively; in the right image b1, define u_b as the motion speed of a pixel in the x-axis direction and v_b as its motion speed in the y-axis direction, obtaining the derivatives L_x, L_y, and L_t of the pixels in image b1 with respect to the x direction, the y direction, and time;
step 4.2, obtaining the eigenvalues of the Hessian matrix and calculating its condition number;
if the condition number is between 0.91 and 0.99, the pixel is a reliable point;
if the condition number is not between 0.91 and 0.99, the pixel is an unreliable point and is rejected;
step 4.3, removing the unreliable points with the Hessian matrix and taking the reciprocal of the condition number of each point as that point's weight;
step 4.4, solving for the optical flow fields [u v]^T and [u' v']^T by the weighted least squares method; the obtained optical flow field approximates the disparity values, giving the disparity map I_O.
Further, the specific process of S5 is as follows:
step 5.1, for a position (x_i, y_i) in the disparity maps I_c and I_O, selecting a fusion window W of a certain size in both disparity maps I_c and I_O and calculating the confidence levels according to the SNR confidence measure, computed as
s_i = |d_li - d_ri|,
where d_li is the disparity value obtained with the reference image as the primary image, and d_ri is the disparity value obtained with the image to be matched as the primary image;
step 5.2, comparing the confidence levels of the two disparity maps at (x_i, y_i) and selecting the disparity with the higher confidence as the final disparity; from the weight and confidence level of each disparity in the fusion window, the total confidence level c_i(x, y, d_i) is obtained,
where the weight is computed from d_i(x_c, y_c), the disparity value of the central pixel of the fusion window, and d_i(x, y), the disparity values of the pixels in the fusion window; the final disparity is defined as
D = argmax[c_i(x, y, d_i)],
and processing according to the above steps yields the final fused disparity map I(u).
The invention has the following beneficial effects: for the problems in stereo matching of low-texture and depth-discontinuous images, stereo disparity is approximated by horizontal optical flow, and fusing the convolutional-neural-network disparity map with the improved optical flow image improves the accuracy of disparity values in low-texture and depth-discontinuous regions and generates a high-quality disparity map.
[ description of the drawings ]
Fig. 1 is a flow chart of a binocular disparity map enhancement method based on confidence fusion according to the present invention;
FIG. 2(a) is a left image captured by a binocular camera in the embodiment;
FIG. 2(b) is the right image captured by the binocular camera in the embodiment;
FIG. 2(c) is the resulting disparity map of the improved LK optical flow algorithm in an embodiment;
FIG. 2(d) is the resulting disparity map of the modified convolutional neural network in an embodiment;
fig. 2(e) is a disparity map obtained by using the binocular disparity map enhancement method based on confidence fusion according to the embodiment of the present invention;
fig. 2(f) is a disparity map obtained by using the SGBM method in the embodiment.
[ detailed description ]
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a binocular disparity map enhancement method based on confidence fusion, which comprises the following steps of:
s1, placing the picture in front of the center of the binocular camera, and shooting a target by using the binocular camera to obtain a left picture a and a right picture b of the binocular picture; the camera coordinate system takes the center of the camera as an origin, x points to the right of the camera, y points to the upper of the camera, and z points to the front of the camera.
S2, respectively performing stereo rectification and distortion correction on the left image a and the right image b to obtain a rectified left image a1 and a rectified right image b1; in the distortion correction process, projection transformation matrices are obtained for the two calibrated cameras so that, going from the world coordinate system to the camera coordinate system, the images captured by the two cameras become coplanar and row-aligned, realizing epipolar alignment;
the specific method comprises the following steps:
step 2.1, making the left image a and the right image b coplanar: taking the upper-left corner of the image as the origin, with the image rows as the x axis and the image columns as the y axis, the focal planes of the two images are made coplanar; the camera rotation matrix R is split into component rotations r_l and r_r applied to the left and right cameras respectively, so that the two image planes become coplanar;
step 2.2, aligning the image rows of the left image a and the right image b: first a rotation matrix R_e = [e_1, e_2, e_3]^T is created from the direction of the translation vector, where e_1 = T / ||T||, T = [T_x, T_y, T_z], with T_x, T_y, and T_z the translation components in the x, y, and z directions; e_2 is a unit vector orthogonal to e_1 (conventionally e_2 = [-T_y, T_x, 0]^T / sqrt(T_x^2 + T_y^2)), and e_3 = e_1 × e_2; the row-alignment matrices are then obtained as
R_l = R_e r_l, R_r = R_e r_r;
step 2.3, rectifying the left image a and the right image b to obtain the stereo-rectified output matrix; in its standard form the reprojection matrix is
Q = [[1, 0, 0, -c_x], [0, 1, 0, -c_y], [0, 0, 0, f], [0, 0, -1/T_x, (c_x - c'_x)/T_x]],
where the Q matrix realizes the conversion between the world coordinate system and the pixel coordinate system (applied as Q · [x, y, d, 1]^T), d is the disparity, f is the focal length, c_x and c_y are the horizontal and vertical offsets of the origin of the left image a, and c'_x denotes the principal point of the right image b; when c_x = c'_x, the rectification is correct;
according to the above steps, it is finally checked whether corresponding points in the two images lie on the same epipolar line; the two rectified images obtained in this way pass the verification, giving the rectified left image a1 and right image b1;
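The row-alignment construction of step 2.2 can be sketched in a few lines of numpy. This is a minimal illustration, not the patent's implementation: `row_alignment_rotation` is a hypothetical helper name, and the choice e_2 ∝ [-T_y, T_x, 0] is an assumed reconstruction of the elided formula.

```python
import numpy as np

def row_alignment_rotation(T):
    """Build R_e = [e1, e2, e3]^T, which rotates the baseline (the
    translation vector T between the two cameras) onto the x-axis so
    that epipolar lines become horizontal image rows (step 2.2)."""
    T = np.asarray(T, dtype=float)
    e1 = T / np.linalg.norm(T)          # e1 = T / ||T||
    # assumed form of the elided e2: orthogonal to e1, zero z-component
    e2 = np.array([-T[1], T[0], 0.0])
    e2 /= np.linalg.norm(e2)
    e3 = np.cross(e1, e2)               # e3 = e1 x e2
    return np.vstack([e1, e2, e3])

# Example: a mostly-horizontal baseline
Re = row_alignment_rotation([0.12, 0.01, 0.0])
# R_e must be a proper rotation: orthonormal with determinant +1
assert np.allclose(Re @ Re.T, np.eye(3), atol=1e-12)
assert np.isclose(np.linalg.det(Re), 1.0)
```

The full row-alignment matrices R_l = R_e r_l and R_r = R_e r_r would then combine this with the half-rotations r_l, r_r from step 2.1.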
s3, performing disparity estimation on the left image a1 and the right image b1 with a convolutional neural network algorithm to obtain a dense disparity map a2; performing disparity map optimization on the dense disparity map a2 to obtain an optimized disparity map I_c;
The specific method comprises the following steps:
step 3.1, performing weight assignment on the left image a1 and the right image b1 with a convolutional neural network, where every layer of the network applies normalization and a ReLU activation;
the feature maps produced by passing the left image a1 and the right image b1 through the convolutional neural network are sent to a correlation layer to compute matching cost, and an initial disparity map is obtained after the three basic steps of cost calculation, cost aggregation, and disparity prediction;
step 3.2, performing left-right verification on the generated initial disparity map and updating the network weights, with the update rule following the gradient descent method; in standard form the update is w ← w - α ∂J(w, β; a1, b1)/∂w (and likewise for β),
where J(w, β; a1, b1) is the cost function, P_{w,β}(·) denotes the predicted values of the network for the left image a1 and the right image b1, w and β are the weight parameters of the neural network, and α is the learning rate;
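The update rule of step 3.2 can be shown in isolation with a stand-in cost: the real J(w, β; a1, b1) depends on the network's disparity predictions, so this sketch substitutes an assumed quadratic J(w) = ||w - w*||^2 purely to illustrate the descent step.

```python
import numpy as np

# Sketch of the weight update w <- w - alpha * dJ/dw from step 3.2,
# using a quadratic stand-in cost J(w) = ||w - w_star||^2.
w_star = np.array([1.0, -2.0])   # hypothetical optimum of the stand-in cost
w = np.zeros(2)                  # initial weights
alpha = 0.1                      # learning rate

for _ in range(200):
    grad = 2.0 * (w - w_star)    # dJ/dw for the quadratic stand-in
    w = w - alpha * grad         # gradient-descent update

assert np.allclose(w, w_star, atol=1e-6)
```

With alpha = 0.1 the error contracts by a factor 0.8 per step, so 200 iterations drive the weights to the stand-in optimum.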
step 3.3, repeating step 3.1 and step 3.2: after the network weights are updated, the initial disparity map is predicted again, and the high-confidence pixels are then selected for further training;
step 3.4, after training, smoothing the predicted disparity map with a median filter to preserve edge information, obtaining a dense disparity map a2;
step 3.5, filling the points with edge disparity value 0 in the dense disparity map a2: the disparity information with value 0 is filled in by an image fusion method, and the fusion yields the optimized disparity map I_c(x, y);
The fusion mode is as follows:
wherein, Ic(x, y) is the optimized disparity map; i isa1(x,y)、Ib1(x, y) respectively representing a disparity map obtained by matching the reference image with the adjacent left and right images; δ represents a threshold value; t is a translation vector in the camera's external reference matrix;
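The left-right verification that gates the high-confidence pixels (steps 3.2-3.3) can be sketched as follows. This is an assumed, minimal formulation: the function name and the tolerance of one pixel are illustrative choices, not the patent's exact criterion.

```python
import numpy as np

def left_right_check(disp_left, disp_right, tol=1.0):
    """A pixel (y, x) of the left disparity map is marked high-confidence
    when the right map, sampled at the matched column x - d, predicts
    (nearly) the same disparity."""
    h, w = disp_left.shape
    confident = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            d = int(round(disp_left[y, x]))
            xr = x - d  # matched column in the right view
            if 0 <= xr < w and abs(disp_left[y, x] - disp_right[y, xr]) <= tol:
                confident[y, x] = True
    return confident

# Toy example: constant disparity of 2, consistent between the two views
dl = np.full((4, 8), 2.0)
dr = np.full((4, 8), 2.0)
mask = left_right_check(dl, dr)
assert mask[:, 2:].all()      # matches inside the image pass the check
assert not mask[:, :2].any()  # matches falling outside the image are rejected
```

Pixels failing the check would be the value-0 holes that step 3.5 later fills by image fusion.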
s4, carrying out parallax estimation on the stereogram corrected left image a1 and right image b1 by adopting an improved optical flow estimation algorithm to obtain a parallax image IO;
The specific process is as follows:
step 4.1: obtaining the first and second derivatives at each point of images a1 and b1 according to the optical flow equation, assuming that the pixels in a fixed window share the same motion and that the displacement is small; for the left image a1, define u_a as the motion speed of a pixel in the x-axis direction and v_a as its motion speed in the y-axis direction, where I_x, I_y, and I_t are the derivatives of the image with respect to the x direction, the y direction, and time, respectively; in the right image b1, define u_b as the motion speed of a pixel in the x-axis direction and v_b as its motion speed in the y-axis direction, from which the derivatives L_x, L_y, and L_t of the pixels in image b1 with respect to the x direction, the y direction, and time can be obtained;
according to the basic optical flow equation:
I_x u + I_y v + I_t = 0 (4),
differentiating formula (4) with respect to x and y gives the window system of equations (5)-(6), whose normal equations define the Hessian matrix, which in the standard Lucas-Kanade form is
H_K = [[Σ I_x^2, Σ I_x I_y], [Σ I_x I_y, Σ I_y^2]] (7);
step 4.2: obtaining the eigenvalues of the Hessian matrix and calculating its condition number from the ratio of the eigenvalues λ_max and λ_min, as in formula (8),
where λ_max and λ_min are respectively the maximum and minimum eigenvalues of the Hessian matrix H_K; the reliability of the solution of system (6) can be judged by formula (8): if the condition number is large, the corresponding H_K matrix is close to rank-deficient and the calculated optical flow is unreliable; if the condition number is close to 1, the H_K matrix is well conditioned and the solution of system (6) is robust;
therefore, if the condition number is between 0.91 and 0.99, the pixel is a reliable point;
if the condition number is not between 0.91 and 0.99, the pixel is an unreliable point and is rejected;
step 4.3: setting the weights of the points in the image:
unreliable points are eliminated with the Hessian matrix, and the reciprocal of the condition number of each remaining point is taken as its weight, as in formula (9),
w_i = 1 / κ_i (9);
step 4.4: solving for the optical flow fields [u v]^T and [u' v']^T by the weighted least squares method, as in formulas (10) and (11);
after the above processing, the obtained optical flow field approximates the disparity values, giving the disparity map I_O.
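The per-window computation of steps 4.1-4.4 can be sketched with numpy. This is a minimal illustration under stated assumptions: the per-point weighting of step 4.3 is omitted (uniform weights), the function name is hypothetical, and the synthetic window is built so the ground-truth flow is known.

```python
import numpy as np

def lk_window_flow(Ix, Iy, It):
    """Solve the optical-flow constraint Ix*u + Iy*v + It = 0 over one
    fixed window by least squares (step 4.4), and rate the solution by
    the condition number of the 2x2 Hessian H_K (step 4.2)."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # stacked gradients
    b = -It.ravel()
    H = A.T @ A                                     # Hessian H_K, formula (7)
    lam = np.linalg.eigvalsh(H)                     # ascending eigenvalues
    kappa = lam[-1] / lam[0]                        # ratio lambda_max/lambda_min
    uv = np.linalg.solve(H, A.T @ b)                # [u, v]^T
    return uv, kappa

# Synthetic 5x5 window with gradients varying in both directions (so the
# window is well conditioned), moving with ground-truth flow (1.5, -0.5)
ys, xs = np.mgrid[0:5, 0:5].astype(float)
Ix, Iy = xs - 2.0, ys - 2.0
It = -(Ix * 1.5 + Iy * (-0.5))
uv, kappa = lk_window_flow(Ix, Iy, It)
assert np.allclose(uv, [1.5, -0.5])
assert np.isclose(kappa, 1.0)  # eigenvalue ratio near 1: a reliable point
```

A textured window keeps the eigenvalue ratio near 1; a window with near-constant gradients (the aperture problem) would make H_K nearly rank-deficient and the point unreliable, exactly the case step 4.2 rejects.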
S5, fusing the obtained disparity map I_O and disparity map I_c according to their confidence levels to obtain the final disparity map I(u).
The specific process is as follows:
step 5.1, for a position (x_i, y_i) in the disparity maps I_c and I_O, selecting a fusion window W of a certain size in both disparity maps I_c and I_O and calculating the confidence levels according to the SNR confidence measure, computed as:
s_i = |d_li - d_ri| (11),
where d_li is the disparity value obtained with the reference image as the primary image, and d_ri is the disparity value obtained with the image to be matched as the primary image;
step 5.2, comparing the confidence levels of the two disparity maps at (x_i, y_i) and selecting the disparity with the higher confidence as the final disparity; from the weight and confidence level of each disparity in the fusion window, the total confidence level c_i(x, y, d_i) is obtained,
where the weight is computed from d_i(x_c, y_c), the disparity value of the central pixel of the fusion window, and d_i(x, y), the disparity values of the pixels in the fusion window; the final disparity is defined as follows:
D = argmax[c_i(x, y, d_i)],
and processing according to the above steps yields the final fused disparity map I(u); the obtained disparity map alleviates the problems of detail loss and heavy noise.
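The per-pixel selection of step 5.2 can be sketched as follows, assuming the confidence levels c_i have already been computed for both maps (the function name and toy values are illustrative, not from the patent).

```python
import numpy as np

def fuse_by_confidence(disp_cnn, conf_cnn, disp_flow, conf_flow):
    """Per-pixel fusion (step 5.2): at each position keep the disparity
    whose confidence level is higher, D = argmax over the two candidates."""
    take_cnn = conf_cnn >= conf_flow
    return np.where(take_cnn, disp_cnn, disp_flow)

# Toy example: two 2x2 disparity maps with per-pixel confidences
d_c = np.array([[5.0, 5.0], [5.0, 5.0]])  # CNN disparity map I_c
d_o = np.array([[7.0, 7.0], [7.0, 7.0]])  # optical-flow disparity map I_O
c_c = np.array([[0.9, 0.2], [0.8, 0.1]])
c_o = np.array([[0.3, 0.6], [0.4, 0.7]])
fused = fuse_by_confidence(d_c, c_c, d_o, c_o)
assert (fused == np.array([[5.0, 7.0], [5.0, 7.0]])).all()
```

Each output pixel comes from whichever source map was more confident there, which is the argmax rule D = argmax[c_i(x, y, d_i)] applied pointwise.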
The convolutional neural network has certain shortcomings in understanding image features: although it alleviates the problem of insufficient edge information, the resulting disparity map still suffers from loss of detail. The optical flow estimation algorithm, on the other hand, not only handles the motion information of objects in the image well but is also sensitive to other details in the image; however, a single optical flow algorithm produces ghosting when estimating image disparity. Therefore, to resolve both the ghosting and the missing-detail problems of the disparity map, the two algorithms are fused so that their strengths compensate for each other's weaknesses, and the disparity map is obtained with the fusion algorithm. For the problems in stereo matching of low-texture and depth-discontinuous images, the invention approximates stereo disparity with horizontal optical flow, and by fusing the convolutional-neural-network disparity map with the improved optical flow image it improves the accuracy of disparity values in low-texture and depth-discontinuous regions and generates a high-quality disparity map.
The convolutional neural network is currently a popular algorithm for binocular disparity estimation, and many researchers have innovated on the network structure to improve training performance and accelerate convergence. However, many approaches fail to balance training performance against convergence speed; for example, increasing the number of feature extraction layers of the original network brings a larger computational scale. To obtain a high-quality disparity map in a simpler and more convenient way, the method of the invention takes a different route: rather than being constrained by neural network design, it combines the improved convolutional neural network with the LK optical flow estimation algorithm, using the LK optical flow estimation to compensate for the loss of detail caused by the shortage of convolutional neural network training samples while still ensuring fast convergence of the network. The difficulty of the binocular disparity map enhancement method based on confidence fusion lies in choosing how to fuse the two disparity maps; after trying multiple fusion methods and many experimental verifications, the selected fusion preserves the effective information of both disparity maps and achieves a good final disparity map.
Example:
in order to verify the effectiveness of the method, a Tsukuba image is selected, and the traditional method SGBM is compared with the binocular disparity map enhancement method based on confidence fusion.
First, the input left Tsukuba image of fig. 2(a) and right Tsukuba image of fig. 2(b) are rectified according to the method of step S2 above; disparity estimation is then performed on the rectified left and right Tsukuba images with the improved convolutional neural network method described in step S3, using the KITTI Stereo 2012 data set as the training set and the Tsukuba pair as the test set; the weights are updated by optimizing the cost function through the convolutional layers, finally yielding the disparity map I_c shown in fig. 2(d). The left and right Tsukuba images rectified by the method of step S2 are processed by the LK optical flow estimation algorithm of step S4 to obtain the optical flow field, giving the disparity map I_O shown in fig. 2(c). The disparity map I_c and the disparity map I_O are then fused according to the confidence levels as in step S5, yielding the disparity map I(u) shown in fig. 2(e). Meanwhile, on the same Tsukuba pair, disparity estimation with the traditional SGBM (semi-global block matching) algorithm gives the disparity map shown in fig. 2(f).
Based on the proposed method, a set of experiments was performed, and performance was compared by the bad pixel rate: the percentage of pixels in the disparity map whose disparity values differ from the ground truth, relative to the total number of pixels. According to the experimental data, the bad pixel rate of the final disparity result is 3.4% (Tsukuba pair), a reduction of 2.08% compared with the traditional SGBM (semi-global block matching) method. As the experimental result in fig. 2(e) shows, the edge regions have no missing information and the contours in the image show no ghosting.
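The bad-pixel-rate measure used in this comparison can be sketched as follows; the convention that ground-truth value 0 marks an invalid pixel, and the threshold tau = 1.0, are assumptions for illustration.

```python
import numpy as np

def bad_pixel_rate(disp, disp_gt, tau=1.0):
    """Percentage of pixels whose disparity deviates from ground truth
    by more than tau, over the pixels that have ground truth."""
    valid = disp_gt > 0                    # assumed: 0 means no ground truth
    bad = np.abs(disp[valid] - disp_gt[valid]) > tau
    return 100.0 * bad.mean()

gt = np.array([[10.0, 10.0, 0.0], [10.0, 10.0, 10.0]])
est = np.array([[10.2, 13.0, 5.0], [9.5, 10.0, 10.8]])
# 5 valid pixels; only the one off by 3.0 exceeds tau = 1.0 -> 20%
assert np.isclose(bad_pixel_rate(est, gt), 20.0)
```

The reported 3.4% versus SGBM figures correspond to this kind of thresholded error count over the whole Tsukuba disparity map.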
Claims (4)
1. A binocular disparity map enhancement method based on confidence fusion is characterized by comprising the following steps:
s1, placing the picture in front of the center of the binocular camera, and shooting a target by using the binocular camera to obtain a left picture a and a right picture b of the binocular picture; the camera coordinate system takes the center of the camera as an origin, x points to the right of the camera, y points to the upper part of the camera, and z points to the front of the camera;
s2, respectively performing stereo rectification and distortion correction on the left image a and the right image b to obtain a rectified left image a1 and a rectified right image b1;
s3, performing disparity estimation on the left image a1 and the right image b1 with a convolutional neural network algorithm to obtain a dense disparity map a2;
performing disparity map optimization on the dense disparity map a2 to obtain an optimized disparity map I_c;
S4, performing disparity estimation on the rectified left image a1 and right image b1 with an improved optical flow estimation algorithm to obtain a disparity map I_O;
The specific process of step S4 is as follows:
step 4.1, obtaining the first and second derivatives at each point of images a1 and b1 according to the optical flow equation; the pixels in a fixed window share the same motion and the displacement is small; for the left image a1, define u_a as the motion speed of a pixel in the x-axis direction and v_a as its motion speed in the y-axis direction, where I_x, I_y, and I_t are the derivatives of the image with respect to the x direction, the y direction, and time, respectively; in the right image b1, define u_b as the motion speed of a pixel in the x-axis direction and v_b as its motion speed in the y-axis direction, obtaining the derivatives L_x, L_y, and L_t of the pixels in image b1 with respect to the x direction, the y direction, and time;
step 4.2, obtaining the eigenvalues of the Hessian matrix and calculating its condition number;
if the condition number is between 0.91 and 0.99, the pixel is a reliable point;
if the condition number is not between 0.91 and 0.99, the pixel is an unreliable point and is rejected;
step 4.3, removing the unreliable points with the Hessian matrix and taking the reciprocal of the condition number of each point as that point's weight;
step 4.4, solving for the optical flow fields [u v]^T and [u' v']^T by the weighted least squares method; the obtained optical flow field approximates the disparity values, giving the disparity map I_O;
S5, fusing the obtained disparity map I_O and disparity map I_c according to their confidence levels to obtain the final disparity map I(u).
2. The binocular disparity map enhancement method based on confidence fusion according to claim 1, wherein the specific method of step S2 is as follows: making the left image a and the right image b coplanar; aligning the image rows of the left image a and the right image b; and rectifying the left image a and the right image b to obtain the stereo-rectified output matrix, i.e. the rectified left image a1 and right image b1.
3. The binocular disparity map enhancement method based on confidence fusion according to claim 1 or 2, wherein the specific method of the step S3 is as follows:
step 3.1, assigning weights to the left image a1 and the right image b1 by using a convolutional neural network, wherein each layer of the neural network applies normalization and a ReLU activation function; the feature maps obtained after the left image a1 and the right image b1 pass through the convolutional neural network are sent to a correlation layer to calculate the matching cost, and an initial disparity map is obtained after the three basic processes of cost calculation, cost aggregation and disparity prediction;
step 3.2, performing left-right verification on the generated initial disparity map and updating the network weights, the update rule following a gradient descent method;
step 3.3, repeating step 3.1 and step 3.2; after updating the network weights, continuing to predict the initial disparity map, and then selecting the pixels with high confidence to continue training;
step 3.4, after the training is finished, smoothing the predicted disparity map with a median filter to obtain a dense disparity map a2;
step 3.5, filling the points whose edge disparity value is 0 in the dense disparity map a2, filling the disparity information whose value is 0 by an image fusion method, and fusing to obtain an optimized disparity map I_c.
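The pipeline of steps 3.1 to 3.3 (cost calculation, cost aggregation, disparity prediction, then a left-right check) can be sketched without the CNN by substituting a plain SAD cost volume for the learned correlation layer; the window radius, disparity range, and consistency tolerance below are illustrative assumptions.

```python
import numpy as np

def box_sum(a, r):
    # aggregate matching cost over a (2r+1)x(2r+1) window (cost aggregation)
    out = np.zeros_like(a)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
    return out

def wta_disparity(left, right, max_d, r=1, view="left"):
    # SAD cost volume + winner-take-all disparity prediction; a hand-crafted
    # stand-in for the patent's CNN correlation layer
    H, W = left.shape
    cost = np.full((max_d + 1, H, W), np.inf)
    for d in range(max_d + 1):
        diff = box_sum(np.abs(left[:, d:] - right[:, :W - d]), r)
        if view == "left":
            cost[d, :, d:] = diff
        else:
            cost[d, :, :W - d] = diff
    return np.argmin(cost, axis=0)

def lr_check(d_left, d_right, tol=1):
    # step 3.2's left-right verification: the left disparity at x must agree
    # with the right disparity at x - d_left(x)
    H, W = d_left.shape
    xs = np.tile(np.arange(W), (H, 1))
    xr = np.clip(xs - d_left, 0, W - 1)
    d_back = np.take_along_axis(d_right, xr, axis=1)
    return np.abs(d_left - d_back) <= tol

# synthetic stereo pair with a constant true disparity of 3 pixels
rng = np.random.default_rng(1)
left = rng.random((30, 50))
right = np.roll(left, -3, axis=1)          # right(x) = left(x + 3)
d_left = wta_disparity(left, right, 8, view="left")
d_right = wta_disparity(left, right, 8, view="right")
valid = lr_check(d_left, d_right)
```

In the patent, pixels passing this check with high confidence are fed back as training targets (step 3.3); here the mask simply marks where the two views agree.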
4. The binocular disparity map enhancement method based on confidence fusion according to claim 1 or 2, wherein the specific process of S5 is as follows:
step 5.1, for a certain position (x_i, y_i) in the disparity map I_c and the disparity map I_O, selecting a fusion window W of a certain size in the two disparity maps I_c and I_O, and calculating the confidence levels according to an SNR confidence measure method, the confidence being calculated as

s_i = |d_li - d_ri|,

wherein d_li is the disparity value obtained with the reference image as the primary image, and d_ri is the disparity value obtained with the image to be matched as the primary image;
step 5.2, comparing the confidence levels of the two disparity maps at (x_i, y_i) and selecting the disparity with the higher confidence as the final disparity; according to the weight and the confidence level of each disparity in the fusion window, the total confidence c_i(x, y, d_i) can be obtained,
wherein the weight is determined by d_i(x_c, y_c), the disparity value of the center pixel of the fusion window, and d_i(x, y), the disparity values of the pixels in the fusion window; the final disparity is defined as follows:

D = arg max[c_i(x, y, d_i)],

and processing according to the above steps yields the final fused disparity map I(u).
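The fusion of step S5 can be sketched as follows: per pixel, each disparity map's pointwise confidence is derived from the left-right difference s_i = |d_li - d_ri|, aggregated over the fusion window with weights based on agreement with the window's center disparity, and the candidate with the larger total confidence wins. The patent's exact weight formula is not reproduced in the text above, so the 1/(1 + |Δd|) weighting below is an assumption, as are the window radius and the synthetic inputs.

```python
import numpy as np

def window_confidence(d, s, r=2):
    """Total confidence per pixel: sum over the fusion window of
    weight * pointwise confidence, where pointwise confidence is 1/(1+s)
    (s = |d_l - d_r|, step 5.1) and the weight favours pixels whose
    disparity agrees with the window centre (assumed form)."""
    conf = 1.0 / (1.0 + s)
    total = np.zeros_like(conf)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            d_n = np.roll(np.roll(d, dy, axis=0), dx, axis=1)
            c_n = np.roll(np.roll(conf, dy, axis=0), dx, axis=1)
            w = 1.0 / (1.0 + np.abs(d_n - d))   # assumed weight form
            total += w * c_n
    return total

def fuse(d_cnn, s_cnn, d_flow, s_flow, r=2):
    # step 5.2: D = argmax of total confidence over the two candidates
    c1 = window_confidence(d_cnn, s_cnn, r)
    c2 = window_confidence(d_flow, s_flow, r)
    return np.where(c1 >= c2, d_cnn, d_flow)

# synthetic maps: the CNN map is correct on the left half, the optical-flow
# map on the right half; fusion should recover the true value 5 everywhere
H, W = 20, 40
d_cnn = np.full((H, W), 5.0); d_cnn[:, 20:] = 9.0
s_cnn = np.zeros((H, W));     s_cnn[:, 20:] = 4.0
d_flow = np.full((H, W), 9.0); d_flow[:, 20:] = 5.0
s_flow = np.full((H, W), 4.0); s_flow[:, 20:] = 0.0
fused = fuse(d_cnn, s_cnn, d_flow, s_flow)
```

The windowed aggregation rewards a candidate that is both self-consistent across views (small s) and locally smooth, which is why it outperforms a pure per-pixel argmax near noisy regions.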
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911148748.5A CN111062900B (en) | 2019-11-21 | 2019-11-21 | Binocular disparity map enhancement method based on confidence fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111062900A CN111062900A (en) | 2020-04-24 |
CN111062900B true CN111062900B (en) | 2021-02-12 |
Family
ID=70298636
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911148748.5A Active CN111062900B (en) | 2019-11-21 | 2019-11-21 | Binocular disparity map enhancement method based on confidence fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111062900B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113436119B (en) * | 2021-08-25 | 2021-12-28 | 上海海栎创科技股份有限公司 | Binocular mobile phone preview real-time parallax image calculation method based on optical flow alignment |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2722816A3 (en) * | 2012-10-18 | 2017-04-19 | Thomson Licensing | Spatio-temporal confidence maps |
CN104021525B (en) * | 2014-05-30 | 2017-02-08 | 西安交通大学 | Background repairing method of road scene video image sequence |
CN105956597A (en) * | 2016-05-04 | 2016-09-21 | 浙江大学 | Binocular stereo matching method based on convolution neural network |
CN106600583B (en) * | 2016-12-07 | 2019-11-01 | 西安电子科技大学 | Parallax picture capturing method based on end-to-end neural network |
US10733714B2 (en) * | 2017-11-09 | 2020-08-04 | Samsung Electronics Co., Ltd | Method and apparatus for video super resolution using convolutional neural network with two-stage motion compensation |
CN108520537B (en) * | 2018-03-29 | 2020-02-18 | 电子科技大学 | Binocular depth acquisition method based on luminosity parallax |
CN108876836B (en) * | 2018-03-29 | 2021-08-27 | 北京旷视科技有限公司 | Depth estimation method, device and system and computer readable storage medium |
US10878590B2 (en) * | 2018-05-25 | 2020-12-29 | Microsoft Technology Licensing, Llc | Fusing disparity proposals in stereo matching |
CN109377530B (en) * | 2018-11-30 | 2021-07-27 | 天津大学 | Binocular depth estimation method based on depth neural network |
CN110096950B (en) * | 2019-03-20 | 2023-04-07 | 西北大学 | Multi-feature fusion behavior identification method based on key frame |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230305572A1 (en) * | 2021-12-29 | 2023-09-28 | China University Of Mining And Technology | Method for drivable area detection and autonomous obstacle avoidance of unmanned haulage equipment in deep confined spaces |
US11880208B2 (en) * | 2021-12-29 | 2024-01-23 | China University Of Mining And Technology | Method for drivable area detection and autonomous obstacle avoidance of unmanned haulage equipment in deep confined spaces |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102665086B (en) | Method for obtaining parallax by using region-based local stereo matching | |
CN108470370B (en) | Method for jointly acquiring three-dimensional color point cloud by external camera of three-dimensional laser scanner | |
CN106408513B (en) | Depth map super resolution ratio reconstruction method | |
AU2016355215B2 (en) | Methods and systems for large-scale determination of RGBD camera poses | |
CN105513064B (en) | A kind of solid matching method based on image segmentation and adaptive weighting | |
CN108564617A (en) | Three-dimensional rebuilding method, device, VR cameras and the panorama camera of more mesh cameras | |
CN109859105B (en) | Non-parameter image natural splicing method | |
CN110853151A (en) | Three-dimensional point set recovery method based on video | |
CN106023230B (en) | A kind of dense matching method of suitable deformation pattern | |
CN110910431B (en) | Multi-view three-dimensional point set recovery method based on monocular camera | |
CN113256698B (en) | Monocular 3D reconstruction method with depth prediction | |
CN106780573B (en) | A kind of method and system of panorama sketch characteristic matching precision optimizing | |
CN103426190B (en) | The method and system of image reconstruction | |
CN111062900B (en) | Binocular disparity map enhancement method based on confidence fusion | |
CN111047709A (en) | Binocular vision naked eye 3D image generation method | |
CN110517309A (en) | A kind of monocular depth information acquisition method based on convolutional neural networks | |
CN116310131A (en) | Three-dimensional reconstruction method considering multi-view fusion strategy | |
CN115601406A (en) | Local stereo matching method based on fusion cost calculation and weighted guide filtering | |
CN114332125A (en) | Point cloud reconstruction method and device, electronic equipment and storage medium | |
CN117237546A (en) | Three-dimensional profile reconstruction method and system for material-adding component based on light field imaging | |
CN114401391B (en) | Virtual viewpoint generation method and device | |
CN108924434B (en) | Three-dimensional high dynamic range image synthesis method based on exposure transformation | |
CN113077401A (en) | Method for stereo correction based on viewpoint synthesis technology of novel network | |
US20220068018A1 (en) | Method for 3d reconstruction of an object | |
CN111369435A (en) | Color image depth up-sampling method and system based on self-adaptive stable model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||