CN112750155B - Panoramic depth estimation method based on convolutional neural network - Google Patents

Panoramic depth estimation method based on convolutional neural network Download PDF

Info

Publication number
CN112750155B
CN112750155B CN202110053166.XA
Authority
CN
China
Prior art keywords
panoramic
depth
image
neural network
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110053166.XA
Other languages
Chinese (zh)
Other versions
CN112750155A (en)
Inventor
何炳蔚
邓清康
胡誉生
张立伟
陈彦杰
林立雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202110053166.XA priority Critical patent/CN112750155B/en
Publication of CN112750155A publication Critical patent/CN112750155A/en
Application granted granted Critical
Publication of CN112750155B publication Critical patent/CN112750155B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a panoramic depth estimation method based on a convolutional neural network, which comprises the following steps: step S1, collecting RGB images, depth images and point cloud data of an outdoor environment, and stitching the RGB images and depth images into panoramic images according to the cylindrical projection principle; step S2, constructing a convolutional neural network model and training it on the panoramic images to obtain a trained convolutional neural network model; and step S3, inputting the panoramic image to be estimated into the trained convolutional neural network model to obtain a dense panoramic depth prediction image. The method can adjust and optimize the local details of the panoramic image, thereby estimating a dense and accurate panoramic depth image.

Description

Panoramic depth estimation method based on convolutional neural network
Technical Field
The invention belongs to the field of image recognition and artificial intelligence, and particularly relates to a panoramic depth estimation method based on a convolutional neural network.
Background
Depth estimation is one of the basic tasks in computer vision. With the development of computer technology, deep learning has made a series of breakthroughs in the field of computer vision. Accurate 3D perception has always been desirable because it plays a crucial role in numerous robotics and computer vision tasks, such as autonomous driving, localization and mapping, path planning and 3D reconstruction. Various techniques have been proposed to obtain depth estimates, but each of them has drawbacks. For example, an RGB-D camera is only suitable for short-range depth acquisition; a 3D LIDAR provides only sparse point cloud depth information; and a stereo camera cannot produce reliable depth estimates in areas with uniform appearance or large variations in illumination. A further common limitation is that, for any single camera, the field of view of the depth estimate is bounded by the camera's field of view.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a panoramic depth estimation method based on a convolutional neural network which is capable of estimating a dense and accurate panoramic depth image by fusing a panoramic image and LIDAR depth information in a cascade manner and by adjusting and optimizing the local details of the panoramic image with the proposed PDCBN (panoramic depth conditional regularization) network layer.
In order to achieve the purpose, the invention adopts the following technical scheme:
a panoramic depth estimation method based on a convolutional neural network comprises the following steps:
step S1, collecting RGB images, depth images and point cloud data of the outdoor environment, and splicing the RGB images and the depth images into a panoramic image according to the cylindrical projection principle;
step S2, constructing a convolutional neural network model, and training a panoramic image based on the convolutional neural network model to obtain a trained convolutional neural network model;
and step S3, inputting the panoramic image to be detected into the trained convolutional neural network model to obtain a dense panoramic depth prediction image.
Further, in step S1, the open city simulator Carla is used to collect the RGB images, depth images and point cloud data of the outdoor environment.
Further, the step S1 is specifically:
step 1-1: loading a plurality of RGB cameras, a plurality of depth cameras and a 64-line LIDAR on a data acquisition vehicle of an open city simulator Carla, wherein the depth cameras correspond to the RGB cameras to form a 360-degree panoramic view, and controlling the data acquisition vehicle in the Carla to acquire RGB images, depth images and point cloud data in an outdoor environment;
step 1-2: based on the principle of cylindrical projection, performing cylindrical projection on each RGB image and depth image, and stitching the RGB images and the depth image into a panoramic image according to the overlapped area after the cylindrical projection;
step 1-3: cropping the stitched image into a panoramic image of a preset proportion.
Further, the step S1-2 specifically includes: setting a single image as a quadrilateral ABCD which represents a plane to be processed, and changing the single image into a curved surface EFGE1F1G1 after cylindrical projection;
assuming that the original image width is w, the height is h, and the camera view field angle is α, the camera focal length f is expressed as:
f=w/(2*tan(α/2)) (1)
Let the position of a pixel point on the image be (x, y); its coordinates after cylindrical projection are (x1, y1):
x1=f*arctan((x-w/2)/f)+f*α/2 (2)
y1=f*(y-h/2)/sqrt(f^2+(x-w/2)^2)+h/2 (3)
After the images are projected onto the cylindrical surface, they are stitched into a panoramic image according to the overlapping field-of-view angle θ between each camera and its left and right adjacent cameras.
Further, the step S2 is specifically:
step 2-1, constructing a convolutional neural network model to be trained, and taking a panoramic RGB image and a sparse panoramic depth image projected from a LIDAR as input;
and 2-2, calculating a loss function loss value by using a back propagation algorithm, reducing errors through iterative calculation, and performing parameter learning to enable a predicted value to approach a true value so as to obtain a trained convolutional neural network model.
Further, each layer of the convolutional neural network model sequentially performs convolution, panoramic depth conditional regularization, activation and pooling operations, using the ReLU activation function:
f(x)=max(0,x)
Let the size of the input fused feature F be C × H × W and the mini-batch be λ=[F_i]; the panoramic depth conditional regularization network layer is defined as:
F'_(i,c,h,w)=α_(i,c,h,w)*(F_(i,c,h,w)-μ_c)/sqrt(σ_c^2+ε)+β_(i,c,h,w)
wherein μ_c and σ_c^2 are the mean and variance of channel c over the mini-batch, ε is a small constant for numerical stability, and α_(i,c,h,w), β_(i,c,h,w) are learnable parameters.
The LIDAR depth information is set as a function of the projected sparse depth values; since the new parameters generated depend on the LIDAR depth information, the network layer is named PDCBN (the defining formulas are provided as images in the original document).
Different positions have different pixel values, so the mapping is carried out pixel by pixel; the functions s_c and y_c are set for this purpose (their definitions are likewise provided as images in the original document).
Under the functions s_c and y_c, for a given point in the panoramic LIDAR depth image: if a LIDAR measurement projects onto that point of the panoramic image, the point is considered valid and its depth value is enhanced or suppressed by the PDCBN network layer; otherwise, the point is an invalid point and normal BN network layer processing is used.
Further, the convolutional neural network model is trained with the absolute error between the depth ground truth d_tru and the per-pixel prediction d_pre, and the loss is averaged over the N valid pixels of the sparse LIDAR depth ground truth; the loss function is defined as:
Loss=(1/N)*Σ|d_pre-d_tru|
where the sum runs over the N valid pixels.
compared with the prior art, the invention has the following beneficial effects:
the invention can adjust and optimize the local details of the panoramic image according to the proposed PDCBN (panoramic depth condition regularization) network layer by cascading and fusing the panoramic image and LIDAR depth information, thereby estimating the dense and accurate panoramic depth image.
Drawings
FIG. 1 is a schematic overall flow diagram of an embodiment of the present invention;
FIG. 2 is a schematic view of the principle of cylindrical projection in the present invention;
FIG. 3 is a schematic view of a cylindrical projection of the present invention: (a) is a top view of the cylindrical projection, (b) is a side view of the cylindrical projection;
fig. 4 is an example of stitching effect of a panoramic image after cylindrical projection in the present invention: (a) is a panoramic RGB image, (b) is a true panoramic depth image;
FIG. 5 is a schematic diagram of the convolutional neural network structure of the present invention;
fig. 6 is an example of a qualitative result of panoramic depth estimation according to the present invention, which is a panoramic RGB image, a sparse panoramic depth image projected from a LIDAR, an estimated panoramic depth image, a real panoramic depth image, and a panoramic depth error image in sequence from top to bottom.
Detailed Description
The invention is further explained by the following embodiments in conjunction with the drawings.
Referring to fig. 1, the present invention provides a panoramic depth estimation method based on a convolutional neural network, which includes the following steps:
step S1: collecting RGB images, depth images and point cloud data of an outdoor environment through the open city simulator Carla, stitching the RGB images and the depth images into panoramic images according to the cylindrical projection principle, and dividing the generated data set into a training set and a test set (90%/10%);
step S2: constructing a convolutional neural network model to be trained, taking a panoramic RGB image and a sparse panoramic depth image projected from LIDAR as input, calculating a loss function loss value by using a back propagation algorithm, reducing errors through iterative computation, and performing parameter learning to enable a predicted value to approach a true value, thereby obtaining an optimal weight model of the convolutional neural network;
step S3: loading the weight model trained in step S2, and inputting the divided panoramic test set into the convolutional neural network model for panoramic depth estimation to obtain dense panoramic depth prediction images.
In this embodiment, the specific processing procedure of step S1 is as follows:
1) Five RGB cameras, five depth cameras and a 64-line LIDAR are mounted on a data acquisition vehicle of the open city simulator Carla. Each depth camera corresponds to an RGB camera, the field-of-view angle of each camera is 90 degrees, and the cameras are mounted at 72-degree rotation intervals to form a 360-degree panoramic view. The data acquisition vehicle is then operated in Carla to acquire RGB images, depth images and point cloud data of the outdoor environment;
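The sensor rig described in this step can be reproduced with the CARLA Python API roughly as in the sketch below. It is a minimal illustration under stated assumptions: the host/port, per-camera resolution, mounting heights and output paths are illustrative choices, not values taken from the patent.

import carla

# Minimal sketch (assumptions: host/port, resolutions, mounting heights, output
# paths). Five RGB + five depth cameras at 72-degree yaw steps and a 64-line
# LIDAR are attached to an autopilot vehicle, mirroring the rig described above.
client = carla.Client('localhost', 2000)
world = client.get_world()
bp_lib = world.get_blueprint_library()

vehicle_bp = bp_lib.filter('vehicle.*')[0]
vehicle = world.spawn_actor(vehicle_bp, world.get_map().get_spawn_points()[0])
vehicle.set_autopilot(True)

sensors = []
for i in range(5):                                   # 5 camera pairs, 72 degrees apart
    yaw = 72.0 * i
    for sensor_type in ('sensor.camera.rgb', 'sensor.camera.depth'):
        bp = bp_lib.find(sensor_type)
        bp.set_attribute('image_size_x', '512')      # per-camera resolution (assumed)
        bp.set_attribute('image_size_y', '256')
        bp.set_attribute('fov', '90')                # 90-degree field of view
        tf = carla.Transform(carla.Location(x=0.0, z=2.0), carla.Rotation(yaw=yaw))
        cam = world.spawn_actor(bp, tf, attach_to=vehicle)
        cam.listen(lambda img, tag='{}_{}'.format(sensor_type, i):
                   img.save_to_disk('out/{}/{:06d}.png'.format(tag, img.frame)))
        sensors.append(cam)

lidar_bp = bp_lib.find('sensor.lidar.ray_cast')
lidar_bp.set_attribute('channels', '64')             # 64-line LIDAR
lidar = world.spawn_actor(lidar_bp, carla.Transform(carla.Location(z=2.5)),
                          attach_to=vehicle)
lidar.listen(lambda pc: pc.save_to_disk('out/lidar/{:06d}.ply'.format(pc.frame)))
sensors.append(lidar)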
2) Fig. 2 is a schematic diagram illustrating the principle of cylindrical projection of an image. The image plane is a quadrilateral ABCD, representing the plane to be processed; after cylindrical projection it becomes the curved surface EFGE1F1G1. Suppose a pixel point on the image is located at (x, y); after cylindrical projection it is located at (x1, y1);
Fig. 3(a) is a top view of the cylindrical projection principle, from which the projection transformation of the x value of a pixel can be deduced:
f=w/(2*tan(α/2)) (1)
tan(θ)=(x-w/2)/f (2)
x1=f*θ+w1/2 (3)
wherein w represents the width of the image, i.e. the length of the line segment AB, θ represents the included angle between Ox and the line segment OW, f represents the focal length of the camera, α represents the field-of-view angle of the camera (90 degrees here), w1 represents the width of the image after the cylindrical projection transformation, and x1 represents the position of x after cylindrical projection.
Substituting formulas (1) and (2) into formula (3), with w1=f*α the width of the projected image, gives the position x1 of the point x after the projection transformation:
x1=f*arctan((x-w/2)/f)+f*α/2 (4)
Fig. 3(b) is a side view of the cylindrical projection principle, from which the projection transformation of the y value of a pixel can be deduced from the similarity of triangles:
(y1-h1/2)/(y-h/2)=cos(θ) (5)
wherein h represents the height of the image, i.e. the length of the line segment BC, f represents the focal length of the camera, θ represents the included angle between Ox and the line segment OW, h1 represents the height of the image after the cylindrical projection transformation, and y1 represents the position of y after cylindrical projection. In addition:
cos(θ)=f/sqrt(f^2+(x-w/2)^2) (6)
Therefore, substituting formula (6) into formula (5), with h1=h, gives the position y1 of the point y after the projection transformation:
y1=f*(y-h/2)/sqrt(f^2+(x-w/2)^2)+h/2 (7)
The cylindrical projection of the images is carried out according to the above calculation; after the cylindrical projection, the panoramic image is stitched according to the 9-degree overlapping field-of-view angle between each camera and its left and right adjacent cameras. An example of the panoramic image stitching effect is shown in Fig. 4.
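The per-pixel mapping derived above can be implemented directly. The following NumPy sketch warps one 90-degree-FOV image onto the cylinder by inverse mapping (for every cylinder pixel, the corresponding source pixel is sampled); the nearest-neighbour sampling and the returned validity mask are implementation choices, not details from the patent.

import numpy as np

def cylindrical_warp(img, fov_deg=90.0):
    """Project a single pinhole image onto a cylinder (inverse-mapping sketch).

    Implements the mapping
        x1 = f*arctan((x - w/2)/f) + f*arctan(w/(2*f))
        y1 = f*(y - h/2)/sqrt(f**2 + (x - w/2)**2) + h/2
    by sampling the source image for every cylinder pixel.
    """
    h, w = img.shape[:2]
    f = w / (2.0 * np.tan(np.radians(fov_deg) / 2.0))          # formula (1)
    w_cyl = int(np.ceil(2.0 * f * np.arctan(w / (2.0 * f))))   # projected width

    # Output (cylinder) pixel grid.
    y1, x1 = np.mgrid[0:h, 0:w_cyl].astype(np.float64)

    # Invert the forward mapping: cylinder coordinates -> source coordinates.
    theta = (x1 - w_cyl / 2.0) / f                 # angle of each cylinder column
    x = f * np.tan(theta) + w / 2.0
    r = np.sqrt(f ** 2 + (x - w / 2.0) ** 2)
    y = (y1 - h / 2.0) * r / f + h / 2.0

    # Nearest-neighbour sampling with an out-of-bounds mask (bilinear sampling
    # would be smoother; nearest keeps the sketch short).
    xi = np.round(x).astype(int)
    yi = np.round(y).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.zeros((h, w_cyl) + img.shape[2:], dtype=img.dtype)
    out[valid] = img[yi[valid], xi[valid]]
    return out, valid

Adjacent warped images can then be aligned within their overlapping field-of-view margins and blended to form the panorama.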
3) The stitched images are cropped to 256 × 3840 panoramic images, forming a complete data set of the outdoor environment; 90% of the images form the training set and 10% the test set.
Fig. 5 shows the convolutional neural network structure of the present invention, and step 2 is to construct the convolutional neural network model to be trained according to the proposed convolutional neural network structure.
In this embodiment, the convolutional neural network for panoramic depth estimation is an end-to-end deep learning framework that takes a panoramic color image and a sparse panoramic depth image projected from the LIDAR as inputs and outputs a dense panoramic depth image.
The entire network consists mainly of two paths: a coarse-scale network path and a PDCBN optimization path. The coarse-scale network path first fuses the input panoramic color image and the sparse LIDAR depth image to predict the panoramic depth of the scene at the global level. The PDCBN optimization path then adaptively adjusts and optimizes local details according to the LIDAR depth values.
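For illustration, this two-path idea can be sketched in PyTorch as below. The channel widths, kernel sizes and the residual way of combining the two paths are assumptions made for a compact example (the patent's actual architecture is defined by Fig. 5), and ordinary BatchNorm2d is used as a stand-in for the PDCBN layer discussed in the following steps.

import torch
import torch.nn as nn

class CoarseToFinePanoDepth(nn.Module):
    """Two-path sketch: a coarse-scale encoder-decoder fused with a
    LIDAR-conditioned refinement path. All layer sizes are illustrative."""

    def __init__(self, ch=32):
        super().__init__()
        # Coarse-scale path: panoramic RGB (3 ch) + sparse LIDAR depth (1 ch).
        self.coarse = nn.Sequential(
            nn.Conv2d(4, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1),
        )
        # Refinement path; BatchNorm2d stands in for the PDCBN layer here.
        self.refine = nn.Sequential(
            nn.Conv2d(2, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)   # cascade fusion of the inputs
        coarse = self.coarse(x)                     # global panoramic depth
        residual = self.refine(torch.cat([coarse, sparse_depth], dim=1))
        return coarse + residual                    # locally refined depth

# Example forward pass on a small panorama (the real resolution is 256 x 3840).
model = CoarseToFinePanoDepth()
rgb = torch.randn(1, 3, 256, 512)
lidar = torch.zeros(1, 1, 256, 512)
pred = model(rgb, lidar)                            # -> (1, 1, 256, 512)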
Thus, the neural network model can compile global predictions and incorporate finer details, thereby generating dense and accurate panoramic depth images. The specific processing process of the network comprises the following steps:
1) the network structure cascade fuses the input panoramic RGB image and the sparse panoramic depth image projected from the LIDAR.
2) Each layer of the network performs convolution, panoramic depth conditional regularization (PDCBN), activation and pooling operations in turn, using the ReLU activation function:
f(x)=max(0,x)
Let the size of the input fused feature F be C × H × W and the mini-batch be λ=[F_i]. The proposed panoramic depth conditional regularization network layer is then defined as:
F'_(i,c,h,w)=α_(i,c,h,w)*(F_(i,c,h,w)-μ_c)/sqrt(σ_c^2+ε)+β_(i,c,h,w)
where μ_c and σ_c^2 are the mean and variance of channel c over the mini-batch, ε is a small constant for numerical stability, and α_(i,c,h,w), β_(i,c,h,w) are learnable parameters. When the PDCBN is applied during model training, the new parameters α and β are generated by mapping the LIDAR depth information, which is set as a function of the projected sparse depth values (the defining formula is provided as an image in the original document). Since the generated parameters depend on the LIDAR depth information, the network layer is named PDCBN (the mapping formula is likewise provided as an image in the original document).
Different positions have different pixel values, so the mapping is carried out pixel by pixel; moreover, the LIDAR depth information is discontinuous, so a strategy is needed to handle the depth values contained in the sparse depth image. This problem is solved by setting the functions s_c and y_c (their definitions are provided as images in the original document).
Under the functions s_c and y_c, for a given point in the panoramic LIDAR depth image: if a LIDAR measurement projects onto that point of the panoramic image, the point is considered valid and its depth value is enhanced or suppressed by the PDCBN network layer; otherwise, the point is an invalid point and normal BN network layer processing is used. This completes the pixel-by-pixel mapping of the panoramic LIDAR depth, so that the local depth values are adaptively adjusted and optimized.
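A minimal PyTorch sketch of a conditional normalization layer in the spirit of PDCBN is given below. The 1×1 convolutions standing in for the mappings s_c and y_c, and the convention that a zero value in the sparse depth map marks an invalid pixel, are assumptions; the patent gives the exact definitions only as image formulas, so this is an illustrative reconstruction rather than the patent's layer.

import torch
import torch.nn as nn

class PDCBNSketch(nn.Module):
    """Conditional batch-norm sketch: BN statistics as usual, but the affine
    scale/shift at valid LIDAR pixels is produced from the sparse depth map."""

    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, eps=eps, affine=False)
        # Ordinary BN affine parameters, used at invalid (no-LIDAR) pixels.
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        # Depth-conditioned scale/shift: 1x1 convs stand in for s_c and y_c.
        self.s_c = nn.Conv2d(1, channels, kernel_size=1)
        self.y_c = nn.Conv2d(1, channels, kernel_size=1)

    def forward(self, feat, sparse_depth):
        # sparse_depth: (B, 1, H, W); zeros where no LIDAR point projects.
        normed = self.bn(feat)                     # (F - mu_c)/sqrt(var_c + eps)
        valid = (sparse_depth > 0).float()         # 1 at valid LIDAR pixels
        alpha = self.s_c(sparse_depth)             # per-pixel scale
        beta = self.y_c(sparse_depth)              # per-pixel shift
        cond = alpha * normed + beta               # depth-conditioned branch
        plain = self.gamma * normed + self.beta    # ordinary BN branch
        return valid * cond + (1.0 - valid) * plain

# Usage on a small panoramic feature map with fake sparse LIDAR hits.
layer = PDCBNSketch(channels=32)
feat = torch.randn(2, 32, 64, 480)
depth = torch.zeros(2, 1, 64, 480)
depth[:, :, ::8, ::16] = 10.0
out = layer(feat, depth)                           # same shape as feat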
3) A loss function is set in the constructed convolutional neural network model to be trained. The absolute error between the depth ground truth d_tru and the per-pixel prediction d_pre is used, and the loss is averaged over the N valid pixels of the sparse LIDAR depth ground truth. The loss function is defined as:
Loss=(1/N)*Σ|d_pre-d_tru|
where the sum runs over the N valid pixels.
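The masked absolute-error loss described above can be written in a few lines of PyTorch; the convention that a ground-truth value of 0 marks a pixel without LIDAR depth is an assumption.

import torch

def masked_l1_loss(d_pre, d_tru):
    """Mean absolute error over the N pixels where ground-truth depth exists."""
    valid = d_tru > 0                       # assumption: 0 marks missing depth
    n = valid.sum().clamp(min=1)            # avoid division by zero
    return (d_pre[valid] - d_tru[valid]).abs().sum() / n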
In step 3, the test data are input into the weight model trained in step 2. The resolution of the input panoramic images is 256 × 3840; the output dense panoramic depth images recover fine details and keep good depth reconstruction quality even in the deeper parts of the scene. A completed panoramic depth image is shown in Fig. 6.
Preferably, in this embodiment, evaluation indicators are set for the predicted panoramic depth images, specifically: root mean square error (RMSE), mean absolute error (MAE), root mean square error of the inverse depth (iRMSE) and mean absolute error of the inverse depth (iMAE), of which RMSE is the most important indicator.
RMSE=sqrt((1/N)*Σ(d_pre-d_tru)^2)
MAE=(1/N)*Σ|d_pre-d_tru|
iRMSE=sqrt((1/N)*Σ(1/d_pre-1/d_tru)^2)
iMAE=(1/N)*Σ|1/d_pre-1/d_tru|
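Under the same valid-pixel convention, the four indicators can be computed as in the following sketch; the clamping of the predictions before taking the inverse depth is an added safeguard, not a detail from the patent.

import torch

def depth_metrics(d_pre, d_tru):
    """RMSE, MAE, iRMSE and iMAE over valid ground-truth pixels (sketch)."""
    valid = d_tru > 0                              # assumption: 0 = no ground truth
    pred = d_pre[valid].clamp(min=1e-6)            # guard the inverse-depth terms
    gt = d_tru[valid]
    err = pred - gt
    ierr = 1.0 / pred - 1.0 / gt                   # inverse-depth error
    return {
        'RMSE': err.pow(2).mean().sqrt().item(),
        'MAE': err.abs().mean().item(),
        'iRMSE': ierr.pow(2).mean().sqrt().item(),
        'iMAE': ierr.abs().mean().item(),
    }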
Error evaluation is carried out on the predicted panoramic depth images according to the above four evaluation indicators; the experimental error results are shown in Table 1.
Table 1: Error evaluation (the table values are provided as an image in the original document)
Preferably, in this embodiment, the operating system used to train the neural network model is Ubuntu 16.04, the graphics card is an NVIDIA Tesla M40, and the PyTorch 1.0 deep learning framework and the Python 3.5 programming language are used. In the actual training process, 20 training loops are performed with the RMSProp optimizer, the parameter alpha set to 0.9 and the learning rate set to 0.001.
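This training configuration maps onto a short PyTorch loop such as the sketch below; CoarseToFinePanoDepth and masked_l1_loss refer to the earlier sketches, and the single dummy batch stands in for the real CARLA panoramic training set.

import torch

# Training sketch: RMSProp with alpha = 0.9, learning rate 0.001, 20 loops.
model = CoarseToFinePanoDepth()
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001, alpha=0.9)

dummy_batch = (torch.randn(1, 3, 256, 512),        # panoramic RGB
               torch.zeros(1, 1, 256, 512),        # sparse LIDAR depth
               torch.rand(1, 1, 256, 512) * 50)    # ground-truth depth
train_loader = [dummy_batch]                       # stand-in for the real loader

for epoch in range(20):
    for rgb, sparse_depth, gt_depth in train_loader:
        pred = model(rgb, sparse_depth)
        loss = masked_l1_loss(pred, gt_depth)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()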
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.

Claims (6)

1. A panoramic depth estimation method based on a convolutional neural network is characterized by comprising the following steps:
step S1, collecting RGB images, depth images and LIDAR projected sparse panoramic depth images of the outdoor environment, and splicing the RGB images and the depth images into panoramic images according to the cylindrical projection principle;
step S2, constructing a convolutional neural network model, and training a panoramic image based on the convolutional neural network model to obtain a trained convolutional neural network model;
step S3, inputting the panoramic image to be tested into the trained convolutional neural network model to obtain a dense panoramic depth prediction image;
each layer of the convolutional neural network model sequentially performs convolution, panoramic depth conditional regularization, activation and pooling operations, using the ReLU activation function:
f(x)=max(0,x)
let the size of the input panoramic image be C × H × W and the mini-batch be λ=[F_i]; the panoramic depth conditional regularization network layer is defined as:
F'_(i,c,h,w)=α_(i,c,h,w)*(F_(i,c,h,w)-μ_c)/sqrt(σ_c^2+ε)+β_(i,c,h,w)
wherein μ_c and σ_c^2 are the mean and variance of channel c over the mini-batch, ε is a small constant for numerical stability, and α_(i,c,h,w), β_(i,c,h,w) are learnable parameters;
the LIDAR depth information is set as a function of the projected sparse depth values; since the new parameters generated depend on the LIDAR depth information, the network layer is named PDCBN (the defining formulas are provided as images in the original document);
different positions have different pixel values, and the mapping is carried out pixel by pixel; the functions s_c and y_c are set for this purpose (their definitions are likewise provided as images in the original document);
under the functions s_c and y_c, for a given point in the sparse panoramic depth image of the LIDAR projection: if a LIDAR measurement projects onto that point of the panoramic image, the point is considered valid and its depth value is enhanced or suppressed by the PDCBN network layer; otherwise, the point is an invalid point and normal BN network layer processing is used.
2. The convolutional neural network-based panoramic depth estimation method of claim 1, wherein in step S1 the open city simulator Carla is used to acquire RGB images, depth images and LIDAR-projected sparse panoramic depth images of the outdoor environment.
3. The panoramic depth estimation method based on the convolutional neural network as claimed in claim 2, wherein the step S1 specifically comprises:
step 1-1: loading a plurality of RGB cameras, a plurality of depth cameras and a 64-line LIDAR on a data acquisition vehicle of an open city simulator Carla, wherein the depth cameras correspond to the RGB cameras to form a 360-degree panoramic view, controlling the data acquisition vehicle in the Carla, and acquiring RGB images, depth images and sparse panoramic depth images projected by the LIDAR in an outdoor environment;
step 1-2: based on the principle of cylindrical projection, carrying out cylindrical projection on each RGB image and depth image, and splicing into panoramic images according to the overlapped areas after cylindrical projection;
step 1-3: cropping the stitched image into a panoramic image of a preset proportion.
4. The panoramic depth estimation method based on the convolutional neural network as claimed in claim 3, wherein the step 1-2 is specifically: setting a single image as a quadrilateral ABCD which represents a plane to be processed, and changing the single image into a curved surface EFGE1F1G1 after cylindrical projection;
assuming that the original image width is w, the height is h, and the camera view field angle is α, the camera focal length f is expressed as:
f=w/(2*tan(α/2)) (1)
the position of a pixel point on the image is (x, y), and its coordinates after cylindrical projection are (x1, y1):
x1=f*arctan((x-w/2)/f)+f*α/2 (2)
y1=f*(y-h/2)/sqrt(f^2+(x-w/2)^2)+h/2 (3)
After the images are projected onto the cylindrical surface, the panoramic image is stitched according to the overlapping field-of-view angle θ between each camera and its left and right adjacent cameras.
5. The panoramic depth estimation method based on the convolutional neural network of claim 1, wherein the step S2 specifically comprises:
step 2-1, constructing a convolutional neural network model to be trained, and taking a panoramic image and a sparse panoramic depth image projected from a LIDAR as input;
and 2-2, calculating a loss function loss value by using a back propagation algorithm, reducing errors through iterative calculation, and performing parameter learning to enable a predicted value to approach a true value so as to obtain a trained convolutional neural network model.
6. The convolutional neural network-based panoramic depth estimation method of claim 5, wherein the convolutional neural network model is trained with the absolute error between the depth ground truth d_tru and the per-pixel prediction d_pre, and the loss is averaged over the N valid pixels of the LIDAR sparse depth ground truth;
the loss function is defined as:
Loss=(1/N)*Σ|d_pre-d_tru|
where the sum runs over the N valid pixels.
CN202110053166.XA 2021-01-15 2021-01-15 Panoramic depth estimation method based on convolutional neural network Active CN112750155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110053166.XA CN112750155B (en) 2021-01-15 2021-01-15 Panoramic depth estimation method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110053166.XA CN112750155B (en) 2021-01-15 2021-01-15 Panoramic depth estimation method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN112750155A CN112750155A (en) 2021-05-04
CN112750155B true CN112750155B (en) 2022-07-01

Family

ID=75652024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110053166.XA Active CN112750155B (en) 2021-01-15 2021-01-15 Panoramic depth estimation method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN112750155B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113592932A (en) * 2021-06-28 2021-11-02 北京百度网讯科技有限公司 Training method and device for deep completion network, electronic equipment and storage medium
CN116866723B (en) * 2023-09-04 2023-12-26 广东力创信息技术有限公司 Pipeline safety real-time monitoring and early warning system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678721A (en) * 2014-11-20 2016-06-15 深圳英飞拓科技股份有限公司 Method and device for smoothing seams of panoramic stitched image
CN106981080A (en) * 2017-02-24 2017-07-25 东华大学 Night unmanned vehicle scene depth method of estimation based on infrared image and radar data
CN107767456A (en) * 2017-09-22 2018-03-06 福州大学 A kind of object dimensional method for reconstructing based on RGB D cameras
CN109215067A (en) * 2017-07-03 2019-01-15 百度(美国)有限责任公司 High-resolution 3-D point cloud is generated based on CNN and CRF model
CN110910437A (en) * 2019-11-07 2020-03-24 大连理工大学 Depth prediction method for complex indoor scene

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474160B2 (en) * 2017-07-03 2019-11-12 Baidu Usa Llc High resolution 3D point clouds generation from downsampled low resolution LIDAR 3D point clouds and camera images
WO2020140047A1 (en) * 2018-12-28 2020-07-02 Nvidia Corporation Distance to obstacle detection in autonomous machine applications

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678721A (en) * 2014-11-20 2016-06-15 深圳英飞拓科技股份有限公司 Method and device for smoothing seams of panoramic stitched image
CN106981080A (en) * 2017-02-24 2017-07-25 东华大学 Night unmanned vehicle scene depth method of estimation based on infrared image and radar data
CN109215067A (en) * 2017-07-03 2019-01-15 百度(美国)有限责任公司 High-resolution 3-D point cloud is generated based on CNN and CRF model
CN107767456A (en) * 2017-09-22 2018-03-06 福州大学 A kind of object dimensional method for reconstructing based on RGB D cameras
CN110910437A (en) * 2019-11-07 2020-03-24 大连理工大学 Depth prediction method for complex indoor scene

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Convolutional Neural Network-Based Regression for Depth Prediction in Digital Holography; Tomoyoshi Shimobaba et al.; 2018 IEEE 27th International Symposium on Industrial Electronics (ISIE); 20180615; full text *
Multi-view 3D Reconstruction Based on Deep Learning; Han Jing; China Masters' Theses Full-text Database (Information Science and Technology); 20200615; full text *
3D Scene Understanding for Railway Line Videos; Wu Jie; China Masters' Theses Full-text Database (Information Science and Technology); 20160715; full text *

Also Published As

Publication number Publication date
CN112750155A (en) 2021-05-04

Similar Documents

Publication Publication Date Title
WO2020207166A1 (en) Object detection method and apparatus, electronic device, and storage medium
US11514644B2 (en) Automated roof surface measurement from combined aerial LiDAR data and imagery
CN111325794A (en) Visual simultaneous localization and map construction method based on depth convolution self-encoder
CN104330074B (en) Intelligent surveying and mapping platform and realizing method thereof
CN108734120A (en) Mark method, apparatus, equipment and the computer readable storage medium of image
CN103985133B (en) Search method and system for optimal splicing lines among images based on graph-cut energy optimization
CN113674416B (en) Three-dimensional map construction method and device, electronic equipment and storage medium
CN113330486A (en) Depth estimation
CN112750155B (en) Panoramic depth estimation method based on convolutional neural network
CN112505065A (en) Method for detecting surface defects of large part by indoor unmanned aerial vehicle
CN109255833A (en) Based on semantic priori and the wide baseline densification method for reconstructing three-dimensional scene of gradual optimization
CN113393577B (en) Oblique photography terrain reconstruction method
CN115953535A (en) Three-dimensional reconstruction method and device, computing equipment and storage medium
CN110260866A (en) A kind of robot localization and barrier-avoiding method of view-based access control model sensor
Hu et al. Deep-learning assisted high-resolution binocular stereo depth reconstruction
CN116797742A (en) Three-dimensional reconstruction method and system for indoor scene
CN112862736A (en) Real-time three-dimensional reconstruction and optimization method based on points
CN116563493A (en) Model training method based on three-dimensional reconstruction, three-dimensional reconstruction method and device
Hu et al. Geometric feature enhanced line segment extraction from large-scale point clouds with hierarchical topological optimization
CN116385660A (en) Indoor single view scene semantic reconstruction method and system
Mihajlovic et al. Deepsurfels: Learning online appearance fusion
Chen et al. I2D-Loc: Camera localization via image to lidar depth flow
Zeng et al. Surface reconstruction by propagating 3d stereo data in multiple 2d images
CN117456136A (en) Digital twin scene intelligent generation method based on multi-mode visual recognition
Cao et al. Quantifying visual environment by semantic segmentation using deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant