CN114445475A - Depth completion method for sparse depth map, computer device, and storage medium - Google Patents

Depth completion method for sparse depth map, computer device, and storage medium

Info

Publication number
CN114445475A
CN114445475A (application CN202210074048.1A)
Authority
CN
China
Prior art keywords
depth map
map
depth
feature
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210074048.1A
Other languages
Chinese (zh)
Inventor
郭裕兰
杜沛峰
胡俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Sun Yat Sen University
Sun Yat Sen University Shenzhen Campus
Original Assignee
Sun Yat Sen University
Sun Yat Sen University Shenzhen Campus
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University, Sun Yat Sen University Shenzhen Campus filed Critical Sun Yat Sen University
Priority to CN202210074048.1A priority Critical patent/CN114445475A/en
Publication of CN114445475A publication Critical patent/CN114445475A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a depth completion method for a sparse depth map, a computer device and a storage medium. Training the neural network used in the depth completion method comprises: obtaining a color image and a depth map truth value; sampling the depth map truth value equidistantly to obtain a sparse depth map; extracting a multi-scale feature map from the color image and the sparse depth map; regressing the multi-scale feature map to obtain an initial depth map; calculating the pixel correlation; performing multiple rounds of iterative filtering on the initial depth map to obtain a dense depth map; and performing multiple rounds of iterative processing on the dense depth map to train the image consistency optimization module. By training the neural network, the invention improves the prediction of pixel depth at the boundaries of the dense depth map by the image consistency optimization module, so that the module can predict the pixel depths of a dense depth map from the dense depth map itself and the neural network acquires the capability of depth completion. The invention is widely applicable in the technical field of image processing.

Description

Depth completion method for sparse depth map, computer device, and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a depth completion method of a sparse depth map, a computer device and a storage medium.
Background
Depth perception is a basic task of three-dimensional vision. Through depth perception, the distance between the object point represented by each pixel in an image and a certain reference plane can be obtained, providing important control parameters for applications such as automatic driving, robotics and augmented reality. Depth perception technology has developed rapidly in recent years, but obtaining high-precision, high-resolution depth maps at low cost remains a challenging task: depth sensors with low cost and low power consumption can only obtain low-resolution and very sparse depth maps in which the depth information of a large number of pixels is missing, whereas practical applications require dense depth maps.
Interpretation of terms:
depth map: refers to an image having the distance (depth) from the image collector to each point in the scene as a pixel value;
sparse depth map: a depth map in which only some of the pixels have definite depth values; compared with a dense depth map, the proportion of pixels with definite depth values among all pixels of a sparse depth map is low;
dense depth map: a depth map in which some or all of the pixels have definite depth values; compared with a sparse depth map, the proportion of pixels with definite depth values among all pixels of a dense depth map is higher.
Disclosure of Invention
The invention aims to provide a depth completion method for a sparse depth map, a computer device and a storage medium, so as to solve at least one technical problem, namely that the sparse depth maps obtained by current depth perception technology cannot meet the requirement of practical applications for dense depth maps.
In one aspect, an embodiment of the present invention includes a depth completion method for a sparse depth map, the depth completion method comprising:
training a neural network; the neural network comprises a first coding network, a first decoding network, a pixel correlation calculation module, a pixel correlation optimization module and an image consistency optimization module;
acquiring an image to be processed;
performing depth completion on the image to be processed by using the trained neural network;
the training of the neural network comprises:
acquiring a color image;
acquiring a depth map true value corresponding to the color image;
equidistant sampling is carried out on the depth map truth value, and a sparse depth map is obtained;
the first coding network extracts a multi-scale feature map from the color image and the sparse depth map; the multi-scale feature map comprises feature information of a plurality of different scales;
the first decoding network regresses the multi-scale feature map to obtain an initial depth map;
the pixel correlation calculation module calculates pixel correlation according to the multi-scale feature map; the pixel correlation represents the degree of correlation between each pixel and adjacent pixels in the color image;
the pixel correlation optimization module carries out multiple iterative filtering on the initial depth map to obtain a dense depth map;
the image consistency optimization module executes a plurality of rounds of iterative processing processes; in one round of iterative processing, determining a residual map of the iterative processing according to a target depth map and the color image, determining a processing result of the iterative processing according to the target depth map and the residual map, determining a loss function value according to the processing result of the iterative processing and the target depth map, and adjusting parameters of the image consistency optimization module, the first coding network, the first decoding network and the pixel correlation calculation module according to the loss function value or ending the training process; when the iteration processing procedure of the current round is the first iteration processing procedure, the target depth map is the dense depth map, and when the iteration processing procedure of the current round is not the first iteration processing procedure, the target depth map is the processing result of the previous iteration processing procedure.
Further, the equidistant sampling of the depth map truth values to obtain a sparse depth map includes:
establishing a grid; distances between any two adjacent grid points in the grid are equal;
taking the position of the depth map truth value corresponding to each grid point as a reference point;
and sampling the depth map truth value by taking the reference point in the depth map truth value and/or a point with a distance between the reference point and the depth map truth value not exceeding a distance threshold as a sampling point to obtain the sparse depth map.
Further, the extracting the multi-scale feature map from the color image and the sparse depth map includes:
coding the color image to obtain first characteristic coding information;
coding the sparse depth map to obtain second feature coding information; the first feature encoding information and the second feature encoding information are in the same feature space;
fusing the first feature coding information and the second feature coding information to obtain third feature coding information;
and coding the third feature coding information to obtain the multi-scale feature map.
Further, the calculating pixel relevance according to the multi-scale feature map comprises:
expanding the feature information of each scale in the multi-scale feature map to the same size as the color image;
acquiring the channel dimension of the color image corresponding to the characteristic information of each scale;
splicing the expanded feature information of each scale together according to corresponding channel dimensions to obtain a spliced feature map;
carrying out multilayer convolution processing on the spliced feature map to obtain a multidimensional tensor; and taking the multi-dimensional tensor as the pixel correlation.
Further, the determining a processing result of the iterative processing procedure in this round according to the target depth map and the residual map includes:
and taking the sum of the target depth map and the residual map of the iteration processing process of the current round as a processing result of the iteration processing process of the current round.
Further, the determining a loss function value according to the processing result of the iterative processing procedure in the current round includes:
for the processing result of the iterative processing procedure of the t-th round, determining the loss function value according to the formula

$$\mathrm{Loss}=\lambda_1\,\ell\!\left(D_1, D_{gt}\right)+\sum_{n=1}^{t}\lambda_2^{(n)}\,\ell\!\left(\hat{D}_n, D_{gt}\right)$$

wherein Loss is the loss function value, $D_1$ is the dense depth map, $D_{gt}$ is the depth map truth value, $\hat{D}_n$ is the processing result of the n-th round of iterative processing, $\lambda_1$ and $\lambda_2^{(n)}$ are weight coefficients, and $\ell$ is a loss function.
Further, the loss function is any one of the following functions or a combination of several functions: an L1 function, an L2 function, a SmoothL1 function, or a Berhu function.
Further, the performing depth completion on the image to be processed by using the trained neural network includes:
inputting the image to be processed into a trained neural network for processing;
and acquiring a processing result of the neural network.
In another aspect, an embodiment of the present invention further includes a computer apparatus, including a memory and a processor, where the memory is used to store at least one program, and the processor is used to load the at least one program to perform the depth completion method in the embodiment.
In another aspect, the present invention further includes a storage medium in which a processor-executable program is stored, the processor-executable program being configured to perform the depth completion method in the embodiments when executed by a processor.
The invention has the beneficial effects that: in the depth completion method of this embodiment, the image consistency optimization module in the neural network is trained, which improves its prediction of pixel depth at the boundaries of the dense depth map and gives it the capability of predicting the pixel depths of a dense depth map from the dense depth map itself. In each round of iterative processing during training, the processing result obtained contains more accurate depth information than the result of the previous round, so the result produced by the image consistency optimization module contains more accurate depth information than the input dense depth map; this is equivalent to obtaining a dense depth map finer than the one input to the module, so that the neural network acquires the capability of depth completion. By using the trained neural network in the image processing method, depth completion can be performed on the image to be processed, thereby obtaining a dense depth map relative to the image to be processed.
Drawings
FIG. 1 is a flow chart of training a neural network according to an embodiment;
FIG. 2 is a schematic diagram of training a neural network in an embodiment;
FIG. 3 is a diagram illustrating the construction of a grid in an embodiment;
FIG. 4 is a schematic diagram illustrating equidistant sampling of depth map truth values using the reference points themselves as sampling points in the embodiment;
fig. 5 is a schematic diagram of equidistant sampling of depth map truth values with non-reference points as sampling points in the embodiment.
Detailed Description
In this embodiment, the depth completion method for the sparse depth map includes the following steps:
S1. training a neural network;
S2. acquiring an image to be processed;
S3. carrying out depth completion on the image to be processed by using the trained neural network.
In step S1, the neural network to be trained includes a sampling module, a first coding network, a first decoding network, a pixel correlation calculation module, a pixel correlation optimization module, and an image consistency optimization module. Training gives the neural network the capability of depth completion, so before the depth completion method is explained, the neural network training method, that is, step S1, is explained first.
Referring to fig. 1, the neural network training method, i.e., the process of training the neural network in step S1, includes the following steps:
P1. acquiring a color image;
P2. acquiring a depth map truth value corresponding to the color image;
P3. the sampling module carries out equidistant sampling on the depth map truth value to obtain a sparse depth map;
P4. the first coding network extracts a multi-scale feature map from the color image and the sparse depth map; the multi-scale feature map comprises feature information of a plurality of different scales;
P5. the first decoding network regresses the multi-scale feature map to obtain an initial depth map;
P6. the pixel correlation calculation module calculates the pixel correlation according to the multi-scale feature map; the pixel correlation represents the degree of correlation between each pixel and adjacent pixels in the color image;
P7. the pixel correlation optimization module carries out multiple rounds of iterative filtering on the initial depth map to obtain a dense depth map;
P8. the image consistency optimization module executes a plurality of rounds of iterative processing; in one round of iterative processing, a residual map of the current round is determined according to a target depth map and the color image, the processing result of the current round is determined according to the target depth map and the residual map, a loss function value is determined according to the processing result of the current round and the target depth map, and according to the loss function value the parameters of the image consistency optimization module, the first coding network, the first decoding network and the pixel correlation calculation module are adjusted or the training process is ended; when the current round is the first round of iterative processing, the target depth map is the dense depth map, and when the current round is not the first round, the target depth map is the processing result of the previous round of iterative processing.
The principle of the steps P1-P8 is shown in FIG. 2.
Before step P1 is performed, a pre-constructed RGB-D data set with dense depth values may be prepared; each color image in the data set corresponds to a depth map truth value describing the depth values of some or all of the pixels in that color image. In steps P1 and P2, a color image and the corresponding depth map truth value are read from the RGB-D data set.
In step P3, the sampling module performs equidistant sampling on the depth map truth values to obtain a sparse depth map. When the step P3 is executed, the sampling module specifically executes the following steps:
P301. establishing a grid; the distance between any two adjacent grid points in the grid is equal;
P302. taking the position in the depth map truth value corresponding to each grid point as a reference point;
P303. sampling the depth map truth value by taking the reference points in the depth map truth value, or points whose distance from a reference point does not exceed a distance threshold, as sampling points, so as to obtain the sparse depth map.
The grid created in step P301 has the property that the distance between any two adjacent grid points is equal, where "two adjacent grid points" refers to two grid points whose connecting line segment contains no other grid point. The equilateral triangular grid shown in FIG. 3 is an example of such a grid: it is composed of a plurality of equilateral triangles, each vertex of which can be regarded as a grid point, and the distances between adjacent grid points are equal. In step P302, the grid lines are removed and only the grid points are kept, giving the result shown in FIG. 4; that is, the reference points in the depth map truth value are determined, and these reference points correspond to the vertices of the equilateral triangular grid, so the distances between any two adjacent reference points are equal. In step P303, the depth map truth value may be sampled directly at the reference points shown in FIG. 4, which implements equidistant sampling; sampling at each sampling point yields the depth value of the depth map truth value at that position, and the depth values collected at the sampling points are combined into the sparse depth map of this embodiment. Alternatively, in step P303 the reference points themselves need not be used as sampling points. For example, referring to FIG. 5, points A1, B and C are all reference points forming an equilateral triangle ΔA1BC with side length exactly L, while point A2 is not a reference point and lies at distance ΔL from point A1. If a relative length threshold R_max (expressed as a percentage) is set, then when R_max is small enough (e.g. R_max less than 10%) and ΔL/L < R_max, the deviation caused by ΔL can be ignored, i.e. ΔA2BC can be regarded as approximately equilateral; in this case, taking the reference points B and C and the non-reference point A2 as sampling points realizes approximately equidistant sampling.
For a grid formed of equilateral triangles, the number of sampling points is related to the side length of the triangles: with the size of the depth map truth value unchanged and the whole truth value sampled, the smaller the side length of the equilateral triangles, the larger the number of sampling points. The number of sampling points can therefore be determined first, the side length of each equilateral triangle determined from that number, and the equilateral triangular grid then established.
When step P3 is executed, the position of the reference point may also be determined in other ways, for example, a grid composed of a plurality of equally large circles may be established, two adjacent circles are in a tangent relationship, and equidistant sampling or approximately equidistant sampling may also be implemented by using the position of the center of each circle as the reference point.
The sparse depth map obtained by equidistant or approximately equidistant sampling is used in the subsequent training of the neural network, which helps improve the generalization of the training; compared with randomly determined sampling points in particular, the improvement in generalization is more pronounced. When the distances between adjacent sampling points are not strictly equal but deviate slightly, the sampling is approximately equidistant and yields a generalization benefit close to that of strictly equidistant sampling.
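For illustration only, the following is a minimal sketch of the equidistant (or approximately equidistant) grid sampling described above, applied to a dense depth map truth value stored as a NumPy array. The triangular-grid layout, the zero-as-invalid convention and all function and variable names are assumptions of this sketch, not the patent's reference implementation.

```python
import numpy as np

def equidistant_sample(depth_gt: np.ndarray, num_points: int) -> np.ndarray:
    """Sample a dense depth map truth value on an (approximately) equidistant
    triangular grid and return a sparse depth map of the same size.
    Assumption: pixels without a valid ground-truth depth are stored as 0.
    """
    h, w = depth_gt.shape
    # Choose the triangle side length so that roughly `num_points` grid points
    # fall inside the image: each point of a triangular lattice covers an area
    # of about sqrt(3)/2 * side^2 pixels.
    side = np.sqrt(h * w / (num_points * np.sqrt(3) / 2.0))
    row_step = side * np.sqrt(3) / 2.0          # vertical spacing between rows

    sparse = np.zeros_like(depth_gt)
    for i, y in enumerate(np.arange(0, h, row_step)):
        # Every other row is shifted by half a side length (triangular layout),
        # so rounding to the nearest pixel gives approximately equidistant points.
        x_offset = 0.0 if i % 2 == 0 else side / 2.0
        for x in np.arange(x_offset, w, side):
            r, c = int(round(y)), int(round(x))
            if r < h and c < w and depth_gt[r, c] > 0:
                sparse[r, c] = depth_gt[r, c]   # copy the depth at the grid point
    return sparse

# Usage (hypothetical): sparse = equidistant_sample(depth_gt, num_points=500)
```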
The first coding network and the first decoding network in steps P4 and P5 may be independent networks, or may be parts of the same coding-decoding network.
In step P4, the first coding network processes the color image obtained in step P1 and the sparse depth map obtained in step P3, and extracts a multi-scale feature map. The first coding network, when performing step P4, may specifically perform the following steps:
P401. coding the color image to obtain first feature coding information;
P402. coding the sparse depth map to obtain second feature coding information;
P403. fusing the first feature coding information and the second feature coding information to obtain third feature coding information;
P404. coding the third feature coding information to obtain the multi-scale feature map.
In this embodiment, the first coding network includes a plurality of convolutional layers. When step P401 is executed, the first coding network codes the color image through one part of the convolutional layers to obtain the first feature coding information; when step P402 is executed, it codes the sparse depth map through another part of the convolutional layers to obtain the second feature coding information, and the first feature coding information and the second feature coding information are encoded into the same feature space. In step P403, the first coding network fuses the first feature coding information and the second feature coding information, and the result is the third feature coding information. In step P404, the first coding network codes the third feature coding information to obtain the multi-scale feature map.
The multi-scale feature map obtained by the first coding network through steps P401-P404 contains multiple pieces of feature information extracted from the color image and the sparse depth map, and these pieces of feature information correspond to different scales.
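As one possible reading of steps P401-P404, the sketch below shows a dual-branch encoder that codes the color image and the sparse depth map separately, fuses the two codes into the same feature space, and then produces feature information at several scales. The layer counts, channel widths and class names are illustrative assumptions rather than the patent's network.

```python
import torch
import torch.nn as nn

class DualBranchEncoder(nn.Module):
    """Minimal first-coding-network sketch: two convolutional branches whose
    outputs live in the same feature space, fused and encoded into a
    multi-scale feature map (a list of features at different resolutions)."""
    def __init__(self, base_ch: int = 32, num_scales: int = 4):
        super().__init__()
        self.rgb_branch = nn.Sequential(nn.Conv2d(3, base_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.depth_branch = nn.Sequential(nn.Conv2d(1, base_ch, 3, padding=1), nn.ReLU(inplace=True))
        # Fuse the two codes (concatenate + 1x1 conv) into a shared feature space.
        self.fuse = nn.Conv2d(2 * base_ch, base_ch, 1)
        # Each stage halves the resolution, producing one scale of the pyramid.
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(base_ch * 2**i, base_ch * 2**(i + 1), 3, stride=2, padding=1),
                nn.ReLU(inplace=True))
            for i in range(num_scales)
        ])

    def forward(self, rgb, sparse_depth):
        f_rgb = self.rgb_branch(rgb)               # first feature coding information
        f_depth = self.depth_branch(sparse_depth)  # second feature coding information
        feat = self.fuse(torch.cat([f_rgb, f_depth], dim=1))  # third feature coding information
        pyramid = []
        for stage in self.stages:                  # multi-scale feature map
            feat = stage(feat)
            pyramid.append(feat)
        return pyramid
```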
In step P5, the first decoding network performs regression on the multi-scale feature map obtained in step P4 to obtain an initial depth map. Specifically, the first decoding network is composed of multiple deconvolution layers and gradually restores the multi-scale feature map to the size of the color image, regressing the initial depth map D_0 in the process.
In step P6, the pixel relevance computation module computes pixel relevance from the multi-scale feature map.
The basis for performing step P6 is the local consistency of the depth value distribution. In a depth map obtained by shooting a real scene, the depth value distribution of most regions is locally consistent; that is, for any pixel, its depth value is more similar to the depth values of pixels in its neighborhood than to those of pixels in non-adjacent regions. Applying this principle, long-range information (for example, the depth information of pixels in non-adjacent regions) is invalid for a given pixel in the depth map, whereas local depth information is similar or consistent.
Since the multi-scale feature map extracted above encodes neighborhood pixel information at different receptive field sizes, in this embodiment, based on the principle of local depth consistency, the pixel correlation calculation module may specifically execute the following steps when performing step P6:
P601. expanding the feature information of each scale in the multi-scale feature map to the same size as the color image;
P602. acquiring the channel dimension of the color image corresponding to the feature information of each scale;
P603. splicing the expanded feature information of each scale together according to the corresponding channel dimensions to obtain a spliced feature map;
P604. carrying out multi-layer convolution processing on the spliced feature map to obtain a multi-dimensional tensor; the multi-dimensional tensor is taken as the pixel correlation.
In step P601, the feature information of each scale in the multi-scale feature map is expanded to the same size as the color image. Specifically, the original small-scale feature information may be replicated multiple times and assembled into an image with the same resolution as the color image, or it may be expanded to that resolution by interpolation methods such as nearest-neighbor interpolation. In the course of executing step P601, the diffusion of the multi-scale feature map is naturally confined to local regions.
In step P602, channel dimensions of the color image corresponding to the feature information of each scale are obtained, and in step P603, the expanded multi-scale feature maps are spliced together according to the corresponding channel dimensions to obtain a spliced feature map.
In step P604, the multi-layer convolution processing is performed on the stitched feature map obtained in step P603, so as to obtain a multi-dimensional tensor. The multidimensional tensor obtained in step P604 is the pixel correlation obtained by performing step P6.
A specific example of steps P601-P604 is as follows. Assume the size of the original image, i.e. the color image, is H × W, and the multi-scale feature map has N layers, i.e. it contains N pieces of feature information at different scales, the n-th piece having size C_n × H_n × W_n, where C_n is the number of channels of the n-th layer of feature information. Each layer of feature information is expanded to the original image size by some interpolation method, so the expanded n-th layer of feature information has size C_n × H × W, n = 1, 2, …, N. The expanded feature information is spliced together along the channel dimension to obtain a feature map of size (C_1 + C_2 + … + C_N) × H × W, which is then passed through multiple convolutional layers to output the pixel correlation. Taking a 3 × 3 neighborhood as an example, the relationship between the central pixel and the 8 pixels in its neighborhood needs to be measured, so executing steps P601-P604 yields an 8 × H × W multi-dimensional tensor, which represents the correlation between every pixel of the whole color image and its neighborhood pixels, i.e. it can be used as the pixel correlation obtained in step P6.
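Following the example above, a sketch of the pixel correlation calculation module might look like the following: each scale is expanded to H × W, the scales are spliced along the channel dimension, and a multi-layer convolution outputs an 8 × H × W correlation tensor for the 3 × 3 neighborhood. The interpolation mode, layer widths and class name are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelCorrelation(nn.Module):
    """Sketch of the pixel correlation calculation module: expand each scale of
    the feature pyramid to the color-image resolution, concatenate along the
    channel dimension, and predict one correlation value per neighbor (8 for a
    3x3 neighborhood) at every pixel."""
    def __init__(self, pyramid_channels, hidden_ch: int = 64, neighbors: int = 8):
        super().__init__()
        in_ch = sum(pyramid_channels)
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, hidden_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden_ch, neighbors, 3, padding=1),
        )

    def forward(self, pyramid, image_size):
        h, w = image_size
        # P601: expand every scale to H x W (nearest-neighbor keeps diffusion local).
        expanded = [F.interpolate(f, size=(h, w), mode="nearest") for f in pyramid]
        # P602/P603: splice along the channel dimension.
        stacked = torch.cat(expanded, dim=1)
        # P604: multi-layer convolution -> 8 x H x W correlation tensor per sample.
        return self.head(stacked)
```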
In step P7, the pixel correlation optimization module uses the pixel correlation calculated by the pixel correlation calculation module in step P6 to guide the optimization of the initial depth map. Specifically, the pixel correlation optimization module normalizes the 8 × H × W pixel correlation to construct H × W position-dependent convolution kernels, one for each of the H × W pixels; the initial depth map is filtered with this set of kernels over multiple iterations, and a dense depth map D_1 is output.
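Step P7 can be read as a spatial-propagation style refinement. The sketch below normalizes the 8-neighbor correlations into position-dependent 3 × 3 kernels (the center weight making each kernel sum to one) and filters the initial depth map for a fixed number of iterations; the normalization scheme, the iteration count and the function name are assumptions, not the patent's exact procedure.

```python
import torch
import torch.nn.functional as F

def correlation_guided_filtering(d0: torch.Tensor, corr: torch.Tensor,
                                 iterations: int = 6) -> torch.Tensor:
    """Sketch of the pixel correlation optimization module.
    d0:   initial depth map, shape (B, 1, H, W)
    corr: pixel correlation, shape (B, 8, H, W) for the 3x3 neighborhood
    """
    # Normalize the 8 neighbor weights; with the implicit center weight added,
    # each position-dependent kernel sums to 1 (an assumed normalization).
    weights = torch.softmax(corr, dim=1) * 0.5         # neighbors share half the mass
    center = 1.0 - weights.sum(dim=1, keepdim=True)    # remaining mass stays at the pixel
    d = d0
    for _ in range(iterations):
        # Gather the 8 neighbors of every pixel via an unfold of the 3x3 window.
        patches = F.unfold(d, kernel_size=3, padding=1)             # (B, 9, H*W)
        patches = patches.view(d.size(0), 9, d.size(2), d.size(3))
        neighbors = torch.cat([patches[:, :4], patches[:, 5:]], dim=1)  # drop the center
        d = center * d + (weights * neighbors).sum(dim=1, keepdim=True)
    return d
```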
In step P8, the image consistency optimization module performs multiple rounds of iterative processing on the color image obtained in step P1 and the dense depth map obtained in step P7.
When step P8 is executed, the image processed by the image consistency optimization module in each round, other than the color image, is referred to as the target depth map of that round. Specifically, in the first round of iterative processing the target depth map is the dense depth map obtained in step P7, and in every round after the first the target depth map is the processing result of the previous round of iterative processing. The rounds of iterative processing performed by the image consistency optimization module can therefore be described as follows:
In the first round of iterative processing, the image consistency optimization module determines the residual map of the current (first) round according to the dense depth map obtained in step P7 and the color image obtained in step P1; it determines the processing result of the current (first) round according to the dense depth map and the residual map of the current round, determines a loss function value according to that processing result, and, according to the loss function value, either adjusts the parameters of the image consistency optimization module, the first coding network, the first decoding network and the pixel correlation calculation module (and then executes the second round of iterative processing) or ends the whole training process.

In the t-th round of iterative processing (t > 1), the image consistency optimization module determines the residual map of the current (t-th) round according to the processing result obtained in the (t-1)-th round and the color image obtained in step P1; it determines the processing result of the current (t-th) round according to the processing result of the (t-1)-th round and the residual map of the current round, determines a loss function value according to that processing result, and, according to the loss function value, either adjusts the parameters of the image consistency optimization module, the first coding network, the first decoding network and the pixel correlation calculation module (and then executes the (t+1)-th round of iterative processing) or ends the whole training process.

Since each round of iterative processing follows the same principle, and the first round differs from the others only in that it processes the dense depth map obtained in step P7 rather than the processing result of the previous round, the whole iterative procedure can be described by the t-th round (t > 1).
In the t-th round of iterative processing, the image consistency optimization module executes the following steps:

P801. According to the processing result $\hat{D}_{t-1}$ obtained in the (t-1)-th round of iterative processing and the color image obtained in step P1, determine the residual map $\Delta D_t$ of the current (t-th) round.

Specifically, the image consistency optimization module comprises a second coding-decoding network, which is a recurrent coding-decoding network. When step P801 is executed, the processing result $\hat{D}_{t-1}$ of the (t-1)-th round and the color image obtained in step P1 are input into the second coding-decoding network, which outputs the residual map $\Delta D_t$; the residual map $\Delta D_t$ represents the amount by which the depth values of $\hat{D}_{t-1}$ need to be adjusted for $\hat{D}_{t-1}$ to become consistent with the color image.

P802. According to the processing result $\hat{D}_{t-1}$ of the (t-1)-th round and the residual map $\Delta D_t$ of the current (t-th) round, determine the processing result $\hat{D}_t$ of the current round, i.e. $\hat{D}_t = \hat{D}_{t-1} + \Delta D_t$.
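A minimal sketch of the residual update of steps P801-P802 follows; the second (recurrent) coding-decoding network is abbreviated to a small convolutional block purely for illustration, and the class and parameter names are assumptions.

```python
import torch
import torch.nn as nn

class ImageConsistencyRefiner(nn.Module):
    """Sketch of one round of the image consistency optimization module:
    predict a residual map from (current depth, color image) and add it."""
    def __init__(self, hidden_ch: int = 32):
        super().__init__()
        # Stand-in for the second (recurrent) coding-decoding network.
        self.residual_net = nn.Sequential(
            nn.Conv2d(4, hidden_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden_ch, hidden_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden_ch, 1, 3, padding=1),
        )

    def forward(self, depth_prev: torch.Tensor, rgb: torch.Tensor) -> torch.Tensor:
        # Residual map of the current round, predicted from depth + color image.
        delta = self.residual_net(torch.cat([depth_prev, rgb], dim=1))
        # Processing result of the current round: previous result plus residual.
        return depth_prev + delta
```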
P803. Determine a loss function value according to the processing result $\hat{D}_t$ of the current (t-th) round of iterative processing. Specifically, the loss function value is determined according to the formula

$$\mathrm{Loss}=\lambda_1\,\ell\!\left(D_1, D_{gt}\right)+\sum_{n=1}^{t}\lambda_2^{(n)}\,\ell\!\left(\hat{D}_n, D_{gt}\right)$$

wherein Loss is the loss function value, $D_1$ is the dense depth map obtained in step P7, $D_{gt}$ is the depth map truth value obtained in step P2, $\hat{D}_n$ (n ≤ t) is the processing result of the n-th round of iterative processing, $\lambda_1$ and $\lambda_2^{(n)}$ are weight coefficients, and $\ell$ is a loss function. Specifically, $\ell$ may be an L1 function, an L2 function, a SmoothL1 function, a Berhu function, or a combination of these; for example, $\ell$ may be the L1 function, the L2 function, or a combination of the L1 function and the Berhu function.
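Assuming the summed form of the loss reconstructed above, the loss computation over t rounds might be sketched as follows; supervising only pixels with a valid ground-truth depth, the per-round weights and the L1 base loss are illustrative assumptions.

```python
import torch

def completion_loss(dense_d1, results, depth_gt, lambda1=1.0, lambda2=(0.5, 0.7, 1.0)):
    """Sketch of the training loss.
    dense_d1: dense depth map D_1 from the pixel correlation optimization module
    results:  list of per-round results [D_hat_1, ..., D_hat_t] from the consistency module
    depth_gt: depth map truth value D_gt (0 marks pixels without a ground-truth depth)
    lambda2:  assumed per-round weights; must provide at least len(results) entries
    """
    valid = (depth_gt > 0).float()                       # supervise only valid pixels
    def l1(pred):
        return (valid * (pred - depth_gt).abs()).sum() / valid.sum().clamp(min=1.0)

    loss = lambda1 * l1(dense_d1)                        # term for the dense depth map D_1
    for n, pred in enumerate(results):
        loss = loss + lambda2[n] * l1(pred)              # weight for the n-th round result
    return loss
```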
P804. Set a loss value threshold and compare the loss function value determined in step P803 with it. If the loss function value is smaller than the threshold, end the iterative processing and do not execute the (t+1)-th and subsequent rounds, i.e. end the whole training process for the neural network; if the loss function value is not smaller than the threshold, execute the (t+1)-th round of iterative processing.
Training the image consistency optimization module through steps P801-P804 improves its prediction of pixel depth at the boundaries of the dense depth map and gives it the capability of predicting the pixel depths of a dense depth map from the dense depth map itself. In each round of iterative processing during training, the processing result obtained contains more accurate depth information than the result of the previous round, so the result produced by the image consistency optimization module contains more accurate depth information than the dense depth map; this is equivalent to obtaining a dense depth map finer than the one input to the image consistency optimization module. Specifically, the training method for the neural network in this embodiment further has the following advantages:
(1) The method keeps the distribution of the sparse depth map consistent across the image during sampling, so that the diffusion range of the depth values learned by the neural network is stable; compared with the sparse depth maps obtained by conventional random sampling, this effectively improves the generalization of the neural network. The method also uses sparse depth maps of variable density during training, which improves the adaptability of the neural network to sparse inputs of different densities in the training stage and offers better robustness than the fixed-density sparse depth maps commonly used in previous training;
(2) a pixel correlation calculation method based on local consistency. The invention explicitly introduces the local consistency constraint of depth distribution in a real scene when calculating the pixel correlation, and effectively improves the measurement precision of the pixel correlation. Experimental results show that the optimization effect of the pixel correlation calculated based on the method on the initial depth map is superior to that of other methods.
(3) A depth map optimization method based on image consistency. The depth map predicted by the prior art often has the problem of fuzzy depth edge prediction, and the method adopts an optimization method based on image consistency to modify the depth map by maximally utilizing image information, so that the prediction effect of a scene fine structure can be effectively improved.
In this embodiment, the neural network trained through steps P1-P8 is applied to the processing of the sparse depth map, and steps S2 and S3 in the depth completion method of the sparse depth map may be performed.
S1. training a neural network;
S2. acquiring an image to be processed;
S3. carrying out depth completion on the image to be processed by using the trained neural network.
The image to be processed obtained in step S2 belongs to a sparse depth map with respect to the image obtained after depth completion by the neural network.
In step S3, the image to be processed, which is a sparse depth map, is input into the neural network; the neural network predicts the missing depth information of the sparse depth map, and its output is a dense depth map relative to the sparse depth map, thereby achieving the effect of depth completion.
The neural network used in steps S2-S3 may be obtained from the neural network trained in steps P1-P8 with the sampling module removed; that is, the neural network used in steps S2-S3 may consist only of the first coding network, the first decoding network, the pixel correlation calculation module, the pixel correlation optimization module and the image consistency optimization module. In step S3, after the image to be processed is input into the neural network, the neural network may perform the following steps based on the principle of steps P1-P8 (a minimal end-to-end sketch is given after the step list below):
S301. acquiring an image to be processed;
S302. acquiring a depth map truth value corresponding to the image to be processed;
S303. the sampling module carries out equidistant sampling on the depth map truth value to obtain a sparse depth map;
S304. the first coding network extracts a multi-scale feature map from the image to be processed and the sparse depth map; the multi-scale feature map comprises feature information of a plurality of different scales;
S305. the first decoding network regresses the multi-scale feature map to obtain an initial depth map;
S306. the pixel correlation calculation module calculates the pixel correlation according to the multi-scale feature map; the pixel correlation represents the degree of correlation between each pixel and adjacent pixels in the image to be processed;
S307. the pixel correlation optimization module carries out multiple rounds of iterative filtering on the initial depth map to obtain a dense depth map;
S308. the second coding-decoding network in the image consistency optimization module performs coding prediction on the dense depth map of step S307 and outputs a depth map more accurate than that dense depth map, thereby realizing depth completion of the image to be processed.
The depth completion method for the sparse depth map in this embodiment may be implemented by writing a computer program that carries out the method and writing that program into a computer device or a storage medium; when the program is read and run, it executes the depth completion method of this embodiment, achieving the same technical effect as described in the embodiment.
It should be noted that, unless otherwise specified, when a feature is referred to as being "fixed" or "connected" to another feature, it may be directly fixed or connected to the other feature or indirectly fixed or connected to the other feature. Furthermore, the descriptions of upper, lower, left, right, etc. used in the present disclosure are only relative to the mutual positional relationship of the constituent parts of the present disclosure in the drawings. As used in this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. In addition, unless defined otherwise, all technical and scientific terms used in this example have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in the description of the embodiments herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this embodiment, the term "and/or" includes any combination of one or more of the associated listed items.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element of the same type from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present disclosure. The use of any and all examples, or exemplary language ("e.g.," such as "or the like") provided with this embodiment is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed.
It should be recognized that embodiments of the present invention can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, operations of processes described in this embodiment can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described in this embodiment (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described in this embodiment includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein.
A computer program can be applied to input data to perform the functions described in the present embodiment to convert the input data to generate output data that is stored to a non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
The above description is only a preferred embodiment of the present invention, and the present invention is not limited to the above embodiment, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention as long as the technical effects of the present invention are achieved by the same means. The invention is capable of other modifications and variations in its technical solution and/or its implementation, within the scope of protection of the invention.

Claims (10)

1. A depth completion method for a sparse depth map, the depth completion method for the sparse depth map comprising:
training a neural network; the neural network comprises a first coding network, a first decoding network, a pixel correlation calculation module, a pixel correlation optimization module and an image consistency optimization module;
acquiring an image to be processed;
performing depth completion on the image to be processed by using the trained neural network;
the training of the neural network comprises:
acquiring a color image;
acquiring a depth map true value corresponding to the color image;
equidistant sampling is carried out on the depth map truth value, and a sparse depth map is obtained;
the first coding network extracts a multi-scale feature map from the color image and the sparse depth map; the multi-scale feature map comprises feature information of a plurality of different scales;
the first decoding network regresses the multi-scale feature map to obtain an initial depth map;
the pixel correlation calculation module calculates pixel correlation according to the multi-scale feature map; the pixel correlation represents the degree of correlation between each pixel and adjacent pixels in the color image;
the pixel correlation optimization module carries out multiple iterative filtering on the initial depth map to obtain a dense depth map;
the image consistency optimization module executes a plurality of rounds of iterative processing processes; in one round of iterative processing, determining a residual map of the iterative processing according to a target depth map and the color image, determining a processing result of the iterative processing according to the target depth map and the residual map, determining a loss function value according to the processing result of the iterative processing and the target depth map, and adjusting parameters of the image consistency optimization module, the first coding network, the first decoding network and the pixel correlation calculation module according to the loss function value or ending the training process; when the iteration processing procedure of the current round is the first iteration processing procedure, the target depth map is the dense depth map, and when the iteration processing procedure of the current round is not the first iteration processing procedure, the target depth map is the processing result of the previous iteration processing procedure.
2. The depth completion method according to claim 1, wherein the equidistant sampling of the depth map truth values to obtain a sparse depth map comprises:
establishing a grid; distances between any two adjacent grid points in the grid are equal;
taking the position of the depth map truth value corresponding to each grid point as a reference point;
and sampling the depth map truth value by taking the reference point in the depth map truth value and/or a point with a distance between the reference point and the depth map truth value not exceeding a distance threshold as a sampling point to obtain the sparse depth map.
3. The depth completion method according to claim 1, wherein the extracting a multi-scale feature map from the color image and the sparse depth map comprises:
coding the color image to obtain first characteristic coding information;
coding the sparse depth map to obtain second feature coding information; the first feature encoding information and the second feature encoding information are in the same feature space;
fusing the first feature coding information and the second feature coding information to obtain third feature coding information;
and coding the third feature coding information to obtain the multi-scale feature map.
4. The depth completion method of claim 1, wherein the calculating pixel correlations from the multi-scale feature map comprises:
expanding the feature information of each scale in the multi-scale feature map to the same size as the color image;
acquiring the channel dimension of the color image corresponding to the characteristic information of each scale;
splicing the expanded feature information of each scale together according to corresponding channel dimensions to obtain a spliced feature map;
carrying out multilayer convolution processing on the spliced characteristic diagram to obtain a multidimensional tensor; and taking the multi-dimensional tensor as the pixel correlation.
5. The method according to claim 1, wherein the determining a processing result of the iterative processing procedure in the current round according to the target depth map and the residual map comprises:
and taking the sum of the target depth map and the residual map of the iterative processing process in the current round as a processing result of the iterative processing process in the current round.
6. The method of claim 1, wherein determining the loss function value according to the processing result of the iterative processing procedure comprises:
for the processing result of the iterative processing procedure of the t-th round, according to a formula
Figure RE-FDA0003489131930000021
Determining a loss function value; wherein Loss is the Loss function value, D1For the dense depth map, DgtFor the true values of the depth map,
Figure RE-FDA0003489131930000022
for the processing result of the iterative processing procedure of the nth round, λ1And
Figure RE-FDA0003489131930000023
in order to be the weight coefficient,
Figure RE-FDA0003489131930000024
is a loss function.
7. The method of claim 6, wherein the loss function is any one of the following functions or a combination of several functions: an L1 function, an L2 function, a SmoothL1 function, or a Berhu function.
8. The method according to any one of claims 1 to 7, wherein the performing depth completion on the image to be processed by using the trained neural network comprises:
inputting the image to be processed into a trained neural network for processing;
and acquiring a processing result of the neural network.
9. A computer apparatus comprising a memory for storing at least one program and a processor for loading the at least one program to perform the depth completion method of any one of claims 1-8.
10. A storage medium having stored therein a processor-executable program, wherein the processor-executable program, when executed by a processor, is configured to perform the depth completion method of any one of claims 1 to 8.
CN202210074048.1A 2022-01-21 2022-01-21 Depth completion method for sparse depth map, computer device, and storage medium Pending CN114445475A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210074048.1A CN114445475A (en) 2022-01-21 2022-01-21 Depth completion method for sparse depth map, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210074048.1A CN114445475A (en) 2022-01-21 2022-01-21 Depth completion method for sparse depth map, computer device, and storage medium

Publications (1)

Publication Number Publication Date
CN114445475A true CN114445475A (en) 2022-05-06

Family

ID=81369350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210074048.1A Pending CN114445475A (en) 2022-01-21 2022-01-21 Depth completion method for sparse depth map, computer device, and storage medium

Country Status (1)

Country Link
CN (1) CN114445475A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272709A (en) * 2022-07-29 2022-11-01 梅卡曼德(北京)机器人科技有限公司 Training method, device, equipment, medium and product of deep completion model
CN115272709B (en) * 2022-07-29 2023-08-15 梅卡曼德(北京)机器人科技有限公司 Training method, device, equipment and medium of depth completion model
CN117351310A (en) * 2023-09-28 2024-01-05 山东大学 Multi-mode 3D target detection method and system based on depth completion
CN117351310B (en) * 2023-09-28 2024-03-12 山东大学 Multi-mode 3D target detection method and system based on depth completion

Similar Documents

Publication Publication Date Title
CN110033003B (en) Image segmentation method and image processing device
CN108846473B (en) Light field depth estimation method based on direction and scale self-adaptive convolutional neural network
KR102693803B1 (en) Generation of 3D object models from 2D images
CN110443883B (en) Plane three-dimensional reconstruction method for single color picture based on droplock
CN114445475A (en) Depth completion method for sparse depth map, computer device, and storage medium
CN111047516A (en) Image processing method, image processing device, computer equipment and storage medium
CN112991413A (en) Self-supervision depth estimation method and system
KR20180136720A (en) Image processing apparatus and method using multiple-channel feature map
CN114556421A (en) Scene representation using image processing
CN111242026B (en) Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN118202391A (en) Neural radiation field-generating modeling of object classes from a single two-dimensional view
CN110246148A (en) The conspicuousness detection method of multi-modal depth information fusion and attention study
CN115147598A (en) Target detection segmentation method and device, intelligent terminal and storage medium
CN111524232A (en) Three-dimensional modeling method and device and server
CN116258859A (en) Semantic segmentation method, semantic segmentation device, electronic equipment and storage medium
CN118314154A (en) Nuclear magnetic resonance image segmentation method, device, equipment, storage medium and program product
CN114078149A (en) Image estimation method, electronic equipment and storage medium
KR20210018114A (en) Cross-domain metric learning system and method
CN115630660B (en) Barcode positioning method and device based on convolutional neural network
JP7113674B2 (en) Information processing device and information processing method
Johannsen et al. Occlusion-aware depth estimation using sparse light field coding
CN114615505A (en) Point cloud attribute compression method and device based on depth entropy coding and storage medium
CN112613460B (en) Face generation model building method and face generation method
JP2024521816A (en) Unrestricted image stabilization
CN113902933A (en) Ground segmentation network model training method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240126

Address after: 518107 Room 501, building 3, Herun Jiayuan, Huaxia Road, Guangming Street, Guangming New District, Shenzhen City, Guangdong Province

Applicant after: Shenzhen, Zhongshan University

Country or region after: China

Applicant after: SUN YAT-SEN University

Applicant after: National University of Defense Technology

Address before: 518107 Room 501, building 3, Herun Jiayuan, Huaxia Road, Guangming Street, Guangming New District, Shenzhen City, Guangdong Province

Applicant before: Shenzhen, Zhongshan University

Country or region before: China

Applicant before: SUN YAT-SEN University

TA01 Transfer of patent application right