CN115880344A - Binocular stereo matching data set parallax truth value acquisition method - Google Patents


Info

Publication number
CN115880344A
CN115880344A (application CN202211448064.9A)
Authority
CN
China
Prior art keywords
camera
binocular
depth
data set
parallax
Prior art date
Legal status
Pending
Application number
CN202211448064.9A
Other languages
Chinese (zh)
Inventor
应义斌
王清玉
周鸣川
刘炜
娄明照
蒋焕煜
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University (ZJU)
Priority to CN202211448064.9A
Publication of CN115880344A
Legal status: Pending

Landscapes

  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention discloses a method for acquiring ground-truth disparity for binocular stereo matching data sets. The left and right cameras of a binocular camera acquire the left and right views of a scene, and a structured light depth camera acquires the scene's depth map; a joint camera calibration yields the intrinsic and extrinsic parameters of each camera, from which the relative extrinsics between the depth camera and the binocular left camera are computed. Using the intrinsics, extrinsics, and depth values, the pixel coordinates of every pixel of the depth map in the left-view pixel coordinate system are computed, registering the depth map to the left view. The depth values of the depth map are then converted into disparity values using the intrinsics obtained by binocular calibration of the binocular camera, generating the disparity map that serves as the ground truth of the data set. The invention provides a method for acquiring ground truth for the binocular stereo matching task and constructing data sets; the resulting data sets can be used for transfer learning and fine-tuning of deep learning models, ultimately enabling binocular three-dimensional reconstruction of specific scenes.

Description

Binocular stereo matching data set parallax truth value acquisition method
Technical Field
The invention relates to a method for acquiring disparity parameters for binocular data sets, and in particular to a method for constructing binocular stereo matching data sets for plant binocular three-dimensional reconstruction and phenotype measurement in the field of agricultural engineering.
Background
With the development of artificial intelligence and robot vision technology in recent years, deep-learning-based stereo matching methods have far surpassed traditional algorithms on public binocular benchmarks. However, training a deep learning model requires a large data set as support, and acquiring the ground truth (disparity map) of a binocular stereo matching data set remains a major problem in academia.
To address this problem, some researchers acquire disparity maps with computer vision simulation software, but models trained on such virtual data sets generalize poorly; for scenes such as outdoor reconstruction and autonomous driving, others use depth values acquired by LiDAR as ground truth and register them with the binocular left camera to generate disparity maps, but the resulting disparity maps have large errors and low density; for indoor reconstruction scenes, still others build structured light measurement systems from a projector and a binocular camera and use depth values obtained by encoding and decoding as ground truth, but such systems are complex, time-consuming, and labor-intensive, and are not suited to automatically and rapidly constructing large-scale data sets.
Disclosure of Invention
In view of these problems, the invention aims to provide a fast, semi-automatic method for acquiring ground-truth disparity for binocular stereo matching data sets, so as to construct large-scale, high-quality binocular stereo matching data sets for specific applications and scenes, which can be used for transfer learning and fine-tuning of supervised deep learning models to achieve high-precision depth perception based on binocular vision.
To this end, the invention provides a method that uses a structured light depth camera and a binocular camera to acquire high-precision, high-density ground-truth disparity, enabling fast, automatic construction of large-scale binocular stereo matching data sets and supporting transfer learning and fine-tuning of deep-learning stereo matching models in a specified scene to realize binocular reconstruction.
To achieve the above purpose, the invention adopts the following technical scheme, whose implementation steps comprise camera joint calibration, image registration, and parallax calculation with parallax map generation.
A structured light depth camera and a binocular camera respectively acquire images of a specified scene; these images, together with the parameters of the binocular camera's left camera and of the structured light depth camera, are further processed to obtain the ground-truth disparity of a binocular stereo matching data set for the specified scene.
The innovation of the invention lies in addressing the difficulty of accurately obtaining ground-truth disparity for conventional binocular image data: the high-precision depth map of the structured light depth camera is registered with the image acquired by the left camera and converted into a disparity map, thereby accurately acquiring the ground-truth disparity of the binocular stereo matching data set.
Step a, build an imaging platform with a binocular camera and a structured light depth camera; the imaging platform constitutes the data set construction system. Adjust the relative pose of the two cameras until it is suitable, and keep it unchanged throughout the subsequent calibration, image registration, and data set construction.
The binocular camera adopts a ZED binocular camera;
the structured light depth camera adopts a Mech-Mind high-precision structured light depth camera.
Step b, obtain the intrinsic and extrinsic parameters of the binocular camera's left camera and of the structured light depth camera as calibration results, using a checkerboard calibration board and Zhang Zhengyou's calibration method, and then compute the relative extrinsic parameters between the left camera and the structured light depth camera, comprising a rotation matrix and a translation matrix;
step c, capture a specified scene with the left and right cameras of the binocular camera to obtain its left and right views, and capture the same scene with the structured light depth camera to obtain its depth map;
step d, using the calibration results and the relative extrinsics, traverse each pixel in the depth map and compute its pixel coordinates in the left view for registration; that is, establish the correspondence between each pixel of the depth map and the left view, realizing image registration;
step e, convert the depth values of the depth map into disparity values using the binocular camera's intrinsic calibration results and generate the disparity map; apply size normalization (e.g., post-processing operations such as cropping) to the generated disparity map and to the left and right views originally acquired by the binocular camera; take the size-normalized left and right views as the binocular views of the binocular stereo matching data set, i.e., the input signals, and the size-normalized disparity map as the ground-truth disparity of the data set, i.e., the supervision signal.
This process is repeated continuously to construct a large-scale binocular reconstruction data set.
Step d comprises the following sub-steps:
first, convert the coordinates of the current pixel in the depth map's pixel coordinate system into coordinates in the depth camera's coordinate system, using the pixel's depth value in the depth map and the depth camera's intrinsic matrix;
second, convert the coordinates of the current pixel in the depth camera's coordinate system into coordinates in the coordinate system of the binocular camera's left camera, using the relative extrinsics obtained in step b;
finally, convert the coordinates of the current pixel in the binocular left camera's coordinate system into coordinates in the left view's pixel coordinate system, using the pixel's depth value and the binocular left camera's intrinsic matrix.
In step e, the binocular camera's intrinsic calibration results comprise the focal length and the baseline distance, and the disparity value is computed by the following conversion:

$$d_i = \frac{b_{ZED} \cdot f_{ZED}}{z_i^{ZED}}$$

wherein $b_{ZED}$ denotes the baseline distance of the binocular camera, $f_{ZED}$ denotes the focal length of the binocular camera, $z_i^{ZED}$ denotes the depth value of the i-th pixel in the depth map, and $d_i$ is the converted disparity value.
The invention has the beneficial effects that:
the invention generates the parallax truth value of the binocular stereo matching data set by a semi-automatic means through the proposed camera combined calibration and image registration mode.
The method for constructing the binocular stereo matching data set obtains the parallax truth value construction data set, can greatly reduce the cost of manpower and material resources in the construction process of the binocular stereo matching data set, and can ensure that a large-scale stereo matching data set is quickly constructed in a real scene.
Compared with data sets constructed by other methods, the real disparity map of the data set constructed by the method has the advantages of high precision, high density and the like, can be used for transfer learning and fine tuning of a supervised deep learning model in specific tasks and requirements, and realizes high-precision binocular stereo matching, three-dimensional reconstruction and robot depth perception in specific scenes (such as plant three-dimensional reconstruction and phenotype measurement in the field of agricultural engineering or robot indoor grabbing operation in the field of robots).
Drawings
FIG. 1 is the general flow diagram of the method;
FIG. 2 is a schematic diagram of the image registration process of the invention;
FIG. 3 is a schematic diagram of the imaging platform setup;
FIG. 4 shows a data set (comprising left views, right views, and ground-truth disparity) constructed with this method, taking plants (seedlings of spinach, tomato, pepper, and pumpkin) as an example;
FIG. 5 shows the performance of representative stereo matching algorithms (traditional matching algorithms BM and SGM; deep learning algorithms PSMNet and GwcNet) on a test data set.
In the figures: 1, ZED binocular camera; 2, Mech-Mind high-precision structured light depth camera; 3, camera connecting piece.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The hardware of the invention comprises a ZED binocular camera, a Mech-Mind structured light depth camera, camera connecting pieces, and an imaging platform. Because the structured light depth camera differs from the binocular camera in parameters such as resolution and field of view, its depth map cannot be used directly as ground truth for the binocular camera; the pixel correspondence between the depth map and the image acquired by the binocular left camera must therefore be computed.
On this imaging platform, the intrinsics and extrinsics of the binocular camera's left camera and of the structured light depth camera are first obtained with a checkerboard calibration board and Zhang Zhengyou's calibration method, and the relative extrinsics between the two cameras (comprising a rotation matrix and a translation matrix) are computed from these parameters. Each pixel in the pixel coordinate system of the structured light camera's depth map is then traversed, its coordinates in the pixel coordinate system of the binocular camera's left camera are solved, and the pixel's depth value is converted into the corresponding disparity value using the binocular camera's intrinsics, forming the disparity map. The disparity map, together with the subsequently cropped left and right views acquired by the binocular camera, constitutes a data set for the binocular stereo matching task.
As shown in fig. 1 and fig. 2, the method of the invention comprises three steps: camera joint calibration, image registration, and parallax calculation with parallax map generation, wherein:
1) Camera joint calibration:
First, the binocular camera and the structured light depth camera each acquire a view of a scene containing the checkerboard calibration board. Zhang Zhengyou's calibration method yields the extrinsic matrices of the structured light depth camera and of the binocular camera's left camera in the world coordinate system (comprising rotation matrices $R_{mech}$, $R_{ZED}$ and translation vectors $t_{mech}$, $t_{ZED}$), together with the respective intrinsic matrices $K_{mech}$, $K_{ZED}$. From these extrinsics, the relative rotation matrix $R_{mech \to ZED}$ and translation vector $t_{mech \to ZED}$ from the camera coordinate system of the structured light depth camera to that of the binocular camera's left camera are obtained:

$$R_{mech \to ZED} = R_{ZED} \, (R_{mech})^{-1}$$

$$t_{mech \to ZED} = t_{ZED} - R_{ZED} \, (R_{mech})^{-1} \, t_{mech}$$

wherein $R_{mech}$ and $R_{ZED}$ denote the rotation matrices of the Mech-Mind structured light depth camera and of the ZED binocular camera, respectively.
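This computation can be illustrated with a short sketch (a minimal illustration under the notation above; the function name and the use of OpenCV's Rodrigues conversion are assumptions for illustration, not part of the patent):

```python
import cv2
import numpy as np

def relative_extrinsics(rvec_mech, tvec_mech, rvec_zed, tvec_zed):
    """R_mech->ZED and t_mech->ZED from each camera's world-frame extrinsics,
    e.g. the rvecs/tvecs returned by cv2.calibrateCamera for a shared
    checkerboard view."""
    R_mech, _ = cv2.Rodrigues(rvec_mech)  # rotation vector -> 3x3 matrix
    R_zed, _ = cv2.Rodrigues(rvec_zed)
    R_rel = R_zed @ R_mech.T              # R_mech is orthonormal, so inverse == transpose
    t_rel = tvec_zed.reshape(3) - R_rel @ tvec_mech.reshape(3)
    return R_rel, t_rel
```

Both cameras must observe the checkerboard in the same world pose so that their two extrinsics share one world coordinate system.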
2) Image registration:
referring to the steps shown in fig. 2, the key point of obtaining the real disparity map is to calculate the corresponding position of each pixel point in the depth map under the binocular camera left view pixel coordinate system. For this, all the pixels in each row and each column in the depth map need to be traversed for calculation.
Here, the ith pixel in the depth map is taken as an example for description.
First, using the depth value of the ith pixel
Figure BDA0003950243040000041
With internal reference of a structured light depth camera>
Figure BDA0003950243040000042
Figure BDA0003950243040000043
Pixel coordinates of ith pixel of depth map acquired by structured light depth camera
Figure BDA0003950243040000044
Conversion into coordinates in the camera coordinate system of a structured light depth camera @>
Figure BDA0003950243040000045
/>
Figure BDA0003950243040000046
Wherein the content of the first and second substances,
Figure BDA0003950243040000047
respectively representing the coordinate values of the ith pixel on the X, Y and Z axes of the camera coordinate system of the structured light depth camera, f x,mech 、f y,mech Denotes the focal length, u, of the structured light depth camera in the X and Y directions, respectively 0,mech 、v 0,mech Respectively represents the main point pixel coordinate values of the structured light depth camera in the u and v directions>
Figure BDA0003950243040000051
And coordinate values of the ith pixel in the pixel coordinate systems u and v of the structured light depth camera are respectively shown.
Next, using the relative conversion relation R between the two cameras obtained in step 1) mech→ZED And t mech→ZED Coordinates in the camera coordinate system of the depth camera may be determined
Figure BDA0003950243040000052
Coordinate under camera coordinates converted into binocular left camera
Figure BDA0003950243040000053
Figure BDA0003950243040000054
Wherein the content of the first and second substances,
Figure BDA0003950243040000055
and coordinate values of the ith pixel on X, Y and Z axes of a camera coordinate system of a left camera of the binocular camera are respectively represented.
Finally, utilizing the internal reference of the left camera of the binocular camera obtained by calibration in the step 1)
Figure BDA0003950243040000056
Figure BDA0003950243040000057
Depth value of corresponding pixel/>
Figure BDA0003950243040000058
Calculating the pixel coordinate of the pixel point under the field of view of the binocular left camera->
Figure BDA0003950243040000059
Figure BDA00039502430400000510
Wherein f is x,ZED 、f y,ZED Denotes focal lengths, u, of the left camera of the binocular camera in X and Y directions, respectively 0,ZED 、u 0,ZED Respectively representing principal point pixel coordinate values of a left camera of the binocular camera in u and v directions,
Figure BDA00039502430400000511
and coordinate values of the ith pixel in the pixel coordinate systems u and v of the left camera of the binocular camera are respectively represented.
The coordinates of the ith pixel in the pixel coordinate system of the depth camera may then be determined
Figure BDA00039502430400000512
Conversion into coordinates in the pixel coordinate system of a binocular left camera>
Figure BDA00039502430400000513
By traversing each pixel in the depth map, the pixel coordinates of each pixel in the depth map in the binocular left view can be obtained.
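This per-pixel traversal can be vectorized; the following sketch (array names, the nearest-pixel rounding, and the masking of zero-depth readings are illustrative assumptions) back-projects every valid depth-map pixel into 3D, transforms it into the left camera's frame, and reprojects it onto the left view's pixel grid:

```python
import numpy as np

def register_depth_to_left_view(depth, K_mech, K_zed, R_rel, t_rel, left_shape):
    """Warp a structured-light depth map onto the binocular left camera's
    pixel grid, returning a left-view-aligned (sparse) depth image."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.astype(np.float64)
    valid = z > 0  # zero depth = missing structured-light reading

    # Back-project depth-map pixels into the depth camera's 3D frame.
    x = (u - K_mech[0, 2]) * z / K_mech[0, 0]
    y = (v - K_mech[1, 2]) * z / K_mech[1, 1]
    pts = np.stack([x[valid], y[valid], z[valid]])      # 3 x N

    # Rigid transform into the binocular left camera's frame.
    pts = R_rel @ pts + t_rel.reshape(3, 1)

    # Project with the left camera's intrinsics.
    u_l = np.round(K_zed[0, 0] * pts[0] / pts[2] + K_zed[0, 2]).astype(int)
    v_l = np.round(K_zed[1, 1] * pts[1] / pts[2] + K_zed[1, 2]).astype(int)

    # Scatter into the left view's grid (no z-buffering: later points overwrite).
    out = np.zeros(left_shape)
    inside = (u_l >= 0) & (u_l < left_shape[1]) & (v_l >= 0) & (v_l < left_shape[0])
    out[v_l[inside], u_l[inside]] = pts[2][inside]      # depth in the left frame
    return out
```

Here `K_mech` and `K_zed` are the 3x3 intrinsic matrices and `R_rel`, `t_rel` the relative extrinsics from step 1); handling occlusions would require z-buffering, omitted here for brevity.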
3) Parallax calculation and parallax map generation:
before the operation of the step is carried out, the ZeD binocular camera is subjected to additional binocular calibration again by using the chessboard calibration plate and Zhang Zhen calibration method, and the internal parameters (including binocular baseline distance b) of the ZED binocular camera are obtained ZED Focal length f of camera ZED )。
Followed by internal reference to binocular camera according to binocular vision (bag)Including baseline distance and focal length), the depth value of the corresponding pixel point can be determined
Figure BDA00039502430400000514
Converted into a disparity value d i
Figure BDA0003950243040000061
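The conversion is a masked per-pixel division; a minimal sketch under the same assumptions as above (treating zero depth as an invalid measurement is an assumption, not specified by the patent):

```python
import numpy as np

def depth_to_disparity(depth_left, baseline_zed, focal_zed):
    """d_i = b_ZED * f_ZED / z_i^ZED, applied per pixel of the registered map."""
    disparity = np.zeros_like(depth_left, dtype=np.float64)
    valid = depth_left > 0
    disparity[valid] = baseline_zed * focal_zed / depth_left[valid]
    return disparity
```

With the baseline in the same length unit as the depth values and the focal length in pixels, the resulting disparity is expressed in pixels.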
Finally, because the field of view of the structured light depth camera is smaller than that of the binocular camera, the edges of the left and right views acquired by the binocular camera are cropped so that their field of view matches that of the disparity map. The cropped left and right views serve as the input of the binocular stereo matching data set, and the registered disparity map serves as the ground truth of the data set, providing supervision during deep learning model training.
The plant binocular reconstruction data set constructed by the method of the invention is shown in fig. 4; the prediction results of a deep learning model trained on this data set are shown in fig. 5.
The invention thus provides an ingenious method for acquiring high-precision, high-density ground-truth disparity for the binocular stereo matching task and for constructing data sets; the resulting data sets can be used for transfer learning and fine-tuning of deep learning models, finally achieving binocular three-dimensional reconstruction of specific scenes (e.g., robotic grasping, plant phenotype measurement).

Claims (4)

1. A binocular stereo matching data set parallax truth value acquisition method, characterized by comprising the following steps:
a structured light depth camera and a binocular camera respectively acquire images of a specified scene, and these images, together with the parameters of the binocular camera's left camera and of the structured light depth camera, are further processed to obtain the ground-truth disparity of a binocular stereo matching data set for the specified scene.
2. The binocular stereo matching data set parallax truth value acquisition method according to claim 1, characterized in that the method specifically comprises the following steps:
step a, building an imaging platform with a binocular camera and a structured light depth camera;
step b, obtaining the intrinsic and extrinsic parameters of the binocular camera's left camera and of the structured light depth camera as calibration results, using a checkerboard calibration board and Zhang Zhengyou's calibration method, and then computing the relative extrinsic parameters between the left camera and the structured light depth camera, comprising a rotation matrix and a translation matrix;
step c, capturing a specified scene with the binocular camera to obtain its left and right views, and capturing the same scene with the structured light depth camera to obtain its depth map;
step d, using the calibration results and the relative extrinsics, traversing each pixel in the depth map and computing its pixel coordinates in the left view for registration;
step e, converting the depth values of the depth map into disparity values using the binocular camera's intrinsic parameters to generate the disparity map, applying size normalization to the generated disparity map and to the left and right views originally acquired by the binocular camera, taking the size-normalized left and right views as the binocular views of the binocular stereo matching data set, and taking the size-normalized disparity map as the ground-truth disparity of the binocular stereo matching data set.
3. The binocular stereo matching data set parallax truth value acquisition method according to claim 2, characterized in that step d comprises the following sub-steps:
first, converting the coordinates of the current pixel in the depth map's pixel coordinate system into coordinates in the depth camera's coordinate system, using the pixel's depth value in the depth map and the depth camera's intrinsic matrix;
second, converting the coordinates of the current pixel in the depth camera's coordinate system into coordinates in the coordinate system of the binocular camera's left camera, using the relative extrinsics obtained in step b;
finally, converting the coordinates of the current pixel in the binocular left camera's coordinate system into coordinates in the left view's pixel coordinate system, using the pixel's depth value and the binocular left camera's intrinsic matrix.
4. The binocular stereo matching data set parallax truth value acquisition method according to claim 2, characterized in that in step e, the intrinsic parameters of the binocular camera comprise the focal length and the baseline distance, and the disparity value is computed by the following conversion:

$$d_i = \frac{b_{ZED} \cdot f_{ZED}}{z_i^{ZED}}$$

wherein $b_{ZED}$ denotes the baseline distance of the binocular camera, $f_{ZED}$ denotes the focal length of the binocular camera, $z_i^{ZED}$ denotes the depth value of the i-th pixel in the depth map, and $d_i$ is the converted disparity value.
CN202211448064.9A 2022-11-18 2022-11-18 Binocular stereo matching data set parallax truth value acquisition method Pending CN115880344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211448064.9A CN115880344A (en) 2022-11-18 2022-11-18 Binocular stereo matching data set parallax truth value acquisition method


Publications (1)

Publication Number Publication Date
CN115880344A 2023-03-31

Family

ID=85760227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211448064.9A Pending CN115880344A (en) 2022-11-18 2022-11-18 Binocular stereo matching data set parallax truth value acquisition method

Country Status (1)

Country Link
CN (1) CN115880344A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869167A (en) * 2016-03-30 2016-08-17 天津大学 High-resolution depth map acquisition method based on active and passive fusion
CN113222945A (en) * 2021-05-19 2021-08-06 西安电子科技大学 Depth information measuring method based on binocular event camera
CN115035247A (en) * 2022-06-06 2022-09-09 中国计量大学 Mars scene binocular data set generation method based on virtual reality

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘娇丽; 李素梅; 李永达; 刘富岩: "High-resolution depth acquisition based on the fusion of TOF and stereo matching" (基于TOF与立体匹配相融合的高分辨率深度获取), Information Technology (信息技术), no. 12, 25 December 2016 *
徐晟: "Research and Implementation of Depth Perception Technology Based on Binocular Stereo Vision" (基于双目立体视觉的深度感知技术研究及实现), South China University of Technology (华南理工大学), 15 September 2018, page 5 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116866522A (en) * 2023-07-11 2023-10-10 广州市图威信息技术服务有限公司 Remote monitoring method
CN116866522B (en) * 2023-07-11 2024-05-17 广州市图威信息技术服务有限公司 Remote monitoring method
CN117372647A (en) * 2023-10-26 2024-01-09 天宫开物(深圳)科技有限公司 Rapid construction method and system of three-dimensional model for building
CN117315033A (en) * 2023-11-29 2023-12-29 上海仙工智能科技有限公司 Neural network-based identification positioning method and system and storage medium
CN117315033B (en) * 2023-11-29 2024-03-19 上海仙工智能科技有限公司 Neural network-based identification positioning method and system and storage medium
CN117456124A (en) * 2023-12-26 2024-01-26 浙江大学 Dense SLAM method based on back-to-back binocular fisheye camera
CN117456124B (en) * 2023-12-26 2024-03-26 浙江大学 Dense SLAM method based on back-to-back binocular fisheye camera

Similar Documents

Publication Publication Date Title
CN109615652B (en) Depth information acquisition method and device
CN115880344A (en) Binocular stereo matching data set parallax truth value acquisition method
Teller et al. Calibrated, registered images of an extended urban area
CN109919911B (en) Mobile three-dimensional reconstruction method based on multi-view photometric stereo
CN111028155B (en) Parallax image splicing method based on multiple pairs of binocular cameras
CN114399554B (en) Calibration method and system of multi-camera system
CN104537707B (en) Image space type stereoscopic vision moves real-time measurement system online
CN109712232B (en) Object surface contour three-dimensional imaging method based on light field
CN113129430B (en) Underwater three-dimensional reconstruction method based on binocular structured light
CN104463969B (en) A kind of method for building up of the model of geographical photo to aviation tilt
CN108053373A (en) One kind is based on deep learning model fisheye image correcting method
WO2024045632A1 (en) Binocular vision and imu-based underwater scene three-dimensional reconstruction method, and device
CN109920000B (en) Multi-camera cooperation-based dead-corner-free augmented reality method
CN111461963B (en) Fisheye image stitching method and device
WO2020237492A1 (en) Three-dimensional reconstruction method, device, apparatus, and storage medium
CN104794713A (en) Greenhouse crop digital-imaging method based on ARM and binocular vision
CN114066983A (en) Intelligent supplementary scanning method based on two-axis rotary table and computer readable storage medium
CN114283203A (en) Calibration method and system of multi-camera system
CN112634379B (en) Three-dimensional positioning measurement method based on mixed vision field light field
CN105374067A (en) Three-dimensional reconstruction method based on PAL cameras and reconstruction system thereof
CN116129037B (en) Visual touch sensor, three-dimensional reconstruction method, system, equipment and storage medium thereof
CN111854636A (en) Multi-camera array three-dimensional detection system and method
CN113724337A (en) Camera dynamic external parameter calibration method and device without depending on holder angle
CN111429571A (en) Rapid stereo matching method based on spatio-temporal image information joint correlation
CN114935316B (en) Standard depth image generation method based on optical tracking and monocular vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination