CN113763447B - Method for completing depth map, electronic device and storage medium - Google Patents

Method for completing depth map, electronic device and storage medium

Info

Publication number
CN113763447B
CN113763447B
Authority
CN
China
Prior art keywords
map
depth map
module
residual
color
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110974023.2A
Other languages
Chinese (zh)
Other versions
CN113763447A (en)
Inventor
季栋
薛远
曹天宇
王亚运
李绪琴
户磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Dilusense Technology Co Ltd
Original Assignee
Hefei Dilusense Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Dilusense Technology Co Ltd filed Critical Hefei Dilusense Technology Co Ltd
Priority to CN202110974023.2A priority Critical patent/CN113763447B/en
Publication of CN113763447A publication Critical patent/CN113763447A/en
Application granted granted Critical
Publication of CN113763447B publication Critical patent/CN113763447B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention relates to the field of image processing, and discloses a depth map completion method, an electronic device and a storage medium. In some embodiments of the present application, a method for completing a depth map includes: acquiring a sparse depth map and a color map corresponding to the sparse depth map; and inputting the sparse depth map and the corresponding color map into a depth map completion model to obtain a completed dense depth map. The depth map completion model is obtained by supervised training based on the semi-dense depth sample map corresponding to each sparse depth sample map in the training data set. According to the technical scheme provided by the embodiments of the present application, a sparse depth map can be completed into a dense depth map, and the engineering cost is reduced.

Description

Method for completing depth map, electronic device and storage medium
Technical Field
The embodiment of the invention relates to the field of image processing, in particular to a depth map completion method, electronic equipment and a storage medium.
Background
Depth perception and measurement technologies are increasingly widely applied in fields such as unmanned aerial vehicles, autonomous driving and robotics. In these rapidly developing emerging technology areas, sensors occupy a very important position: they are the bridge for information interaction between a computer and the outside world. A sensor transmits the captured external environment information to the computer, and the computer evaluates this information and carries out a series of planning and decision-making, for example, sending action instructions to a robot.
For outdoor open scenes, a high-quality depth map is particularly important for the computer's decision-making because the external environment is complex. Generally, the distance that needs to be perceived in an open scene is large and can reach tens or even hundreds of meters, so a high-quality sensor becomes indispensable. However, commonly used high-quality lidar systems are very expensive, and the sampling points they provide are sparse, which hinders the development of fields such as autonomous driving to a certain extent. Therefore, a depth map completion algorithm is needed to complete the sparse depth map into a dense depth map, so as to reduce the engineering cost.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a depth map completion method, an electronic device, and a storage medium, which can complete a sparse depth map into a dense depth map, thereby reducing engineering cost.
To solve the foregoing technical problem, in a first aspect, an embodiment of the present invention provides a method for completing a depth map, including: acquiring a sparse depth map and a color map corresponding to the sparse depth map; inputting the sparse depth map and the corresponding color map into a depth map completion model to obtain a completed dense depth map; the depth map completion model is obtained by performing supervised training on a semi-dense depth sample map corresponding to each sparse depth sample map in the training data set.
In a second aspect, an embodiment of the present invention provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for completing a depth map as mentioned in the above embodiments.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method for completing a depth map mentioned in the foregoing embodiments.
Compared with the prior art, in the embodiments of the present application the depth map completion model is trained based on semi-dense depth sample maps, so dense depth maps do not need to be labeled manually, which greatly reduces cost. The sparse depth map is completed into a dense depth map using the color map corresponding to the sparse depth map, so that the dense depth map can be used in subsequent outdoor-scene applications such as autonomous driving and unmanned aerial vehicles, and low-cost radars can replace expensive high-quality radars, greatly reducing cost.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, which are not to be construed as limiting the embodiments; elements having the same reference numerals represent like elements throughout, and the drawings are not drawn to scale unless otherwise specified.
FIG. 1 is a flow chart of a method of completing a depth map in an embodiment of the present application;
FIG. 2 is a diagram illustrating a depth map completion model in an embodiment of the present application;
FIG. 3 is a schematic diagram of a depth map completion model in another embodiment of the present application;
FIG. 4 is a flow chart of a method of completing a depth map in another embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the various embodiments in order to provide a better understanding of the present application; however, the technical solutions claimed in the present application can be implemented without these technical details and with various changes and modifications based on the following embodiments. The following division into embodiments is for convenience of description and should not constitute any limitation on the specific implementation of the present invention; the embodiments may be combined with and refer to each other where no contradiction arises.
In the description of the present disclosure, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present disclosure, "a plurality" means two or more unless otherwise specified.
In the embodiment of the present application, the method for completing a depth map as shown in fig. 1, which is executed by an electronic device, may be used for completing a depth map of an outdoor scene, and includes the following steps.
Step 101: and acquiring a sparse depth map and a color map corresponding to the sparse depth map.
Step 102: inputting the sparse depth map and the corresponding color map into a depth map completion model to obtain a completed dense depth map; the depth map completion model is obtained by performing supervised training on a semi-dense depth sample map corresponding to each sparse depth sample map in a training data set.
In the embodiment of the application, the depth map completion model is supervised and trained based on the semi-dense depth sample map, so a dense depth map does not need to be labeled manually, which greatly reduces the cost. The sparse depth map is completed into a dense depth map using the color map corresponding to the sparse depth map, so that the dense depth map can be used in subsequent outdoor-scene applications such as autonomous driving and unmanned aerial vehicles, and low-cost radars can replace expensive high-quality radars, greatly reducing the cost.
At present, a commonly used way of obtaining a dense depth map is as follows: an existing dense depth map is degraded, usually by random sampling, to obtain a sparse depth map; the sparse depth map is input into a neural network to extract features; and the original dense depth map is used as the supervision signal. Through this fully supervised training, the sparse depth map is completed to obtain a dense depth map. However, this approach has the following problems:
1. A sparse depth map simulated by random sampling differs to some extent from the sparse depth map obtained by a lidar. Because the sampling of a sensor depends on its mechanical structure, a sparse depth map obtained by artificial degradation often has strong randomness and cannot simulate the sparse depth map obtained by radar sampling.
2. In real outdoor scenes, a dense depth map often cannot be obtained by radar, and manual labeling would consume a great deal of time and resources.
3. This approach reduces depth map completion to a low-level visual enhancement problem, ignoring other feature information available in the scene image and the geometric structure information of the vision system. It also places high requirements on the original data set, since a dense depth map needs to be prepared in advance, which limits the application and popularization of the algorithm.
Based on this, the present embodiment provides a method for completing a depth map, which trains the depth map completion model based on semi-dense depth sample maps, so that no dense depth map needs to be labeled manually, greatly reducing the cost. The sparse depth map is completed into a dense depth map using its corresponding color map, so that the dense depth map can be used in subsequent outdoor-scene applications such as autonomous driving and unmanned aerial vehicles, and low-cost radars can replace expensive high-quality radars, greatly reducing the cost. Here, a semi-dense depth map is a depth map between a sparse depth map and a dense depth map, i.e., a depth map with only a small number of missing depth values. A dense depth map is one in which depth information is available for almost all points, whereas a sparse depth map is one in which depth information is missing for a large number of points. That is, the area of effectively recovered depth in the sparse depth map < the area of effectively recovered depth in the semi-dense depth map < the area of effectively recovered depth in the dense depth map.
In one embodiment, the semi-dense depth sample map corresponding to a sparse depth sample map is obtained from the color map corresponding to that sparse depth sample map. Specifically, when sampling outdoor scenes, a lidar cannot acquire a dense depth map because of the limitations of its sampling mode and the long range of the scene, and the area of effectively recovered depth in the sampled sparse depth map typically accounts for only 30%-60% of the whole scene; compared with a dense depth map recovered pixel by pixel, the sparse depth map therefore has a large proportion of missing depth values. A semi-dense depth map data set for the scene images thus needs to be built, and the semi-dense depth map may be obtained from the color map. Because feature points cannot be found where occlusion occurs in the scene, where the scene contains textureless or repetitive-texture regions such as the sky, a smooth desktop or a water surface, or where the scene image is too dark or too bright, depth values cannot be computed for some pixels, yielding a semi-dense depth map. In this embodiment, SGM or another depth recovery algorithm may be used to perform depth computation on the color map to obtain the semi-dense depth map.
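As a rough illustration of how such a semi-dense supervision map might be produced, the sketch below applies OpenCV's semi-global matching (StereoSGBM) to a rectified stereo color pair and converts the valid disparities to depth; the file names, matcher parameters, focal length and baseline are placeholders, and this embodiment does not fix the exact recovery algorithm or its settings.

```python
# Minimal sketch: building a semi-dense depth sample map with SGM (OpenCV).
# Assumes a rectified stereo color pair and known focal length / baseline;
# all file names and parameter values below are illustrative placeholders.
import cv2
import numpy as np

left = cv2.imread("left_color.png", cv2.IMREAD_GRAYSCALE)    # hypothetical paths
right = cv2.imread("right_color.png", cv2.IMREAD_GRAYSCALE)

sgm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,       # must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,
    P2=32 * 5 * 5,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2,
)

# StereoSGBM returns fixed-point disparity scaled by 16.
disparity = sgm.compute(left, right).astype(np.float32) / 16.0

fx, baseline = 721.5, 0.54    # placeholder focal length (px) and baseline (m)
valid = disparity > 0         # occluded / textureless / saturated pixels stay invalid
semi_dense = np.zeros_like(disparity)
semi_dense[valid] = fx * baseline / disparity[valid]

print("valid depth ratio:", valid.mean())   # well below 100%, i.e. semi-dense
```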
In one embodiment, the depth map completion model comprises a pre-trained first feature extraction submodel, a pre-trained second feature extraction submodel and a fusion submodel. Inputting the sparse depth map and the corresponding color map into the depth map completion model to obtain a completed dense depth map includes: inputting the sparse depth map into the first feature extraction submodel to obtain a first feature map, where the first feature extraction submodel comprises a first residual pyramid module and a first pyramid pooling module; inputting the color map corresponding to the sparse depth map into the pre-trained second feature extraction submodel to obtain a second feature map, where the second feature extraction submodel comprises a second residual pyramid module and a second pyramid pooling module; and inputting the first feature map and the second feature map into the fusion submodel to obtain the dense depth map. The fusion submodel splices the first feature map and the second feature map to obtain a fusion feature map that fuses the spatial feature information of the corresponding color map with the feature information of the sparse depth map, and obtains the dense depth map from the fusion feature map. Specifically, as the sparse depth map passes through the first residual pyramid module, its resolution is gradually reduced; after it passes through the first pyramid pooling module, a channel-dimension splicing operation is performed with the output of the second pyramid pooling module in the second feature extraction submodel to obtain a fusion feature map combining the feature information of the color map and of the sparse depth map. Because the fusion feature map combines the feature information of the depth map with the spatial feature information of the color map, the dense depth map obtained from it is more accurate.
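For concreteness, the channel-dimension splicing that produces the fusion feature map can be sketched in PyTorch as follows; the tensor shapes and channel counts are illustrative assumptions rather than values fixed by this embodiment.

```python
# Minimal sketch of the channel-dimension splicing (Concat) that produces the
# fusion feature map; shapes and channel counts are illustrative assumptions.
import torch

# Outputs of the first (depth branch) and second (color branch) pyramid pooling
# modules, assumed to share spatial resolution (here 1/8 of the input).
depth_feat = torch.randn(1, 64, 32, 104)   # from the sparse depth map branch
color_feat = torch.randn(1, 64, 32, 104)   # from the color map branch

fused = torch.cat([depth_feat, color_feat], dim=1)   # concat along channels
print(fused.shape)   # torch.Size([1, 128, 32, 104])
```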
In one embodiment, the depth map completion model is schematically illustrated in FIG. 2. During training, the sparse depth sample map is input into a first feature extraction submodel 201, the color sample map is input into a second feature extraction submodel 202, and a dense depth map is predicted based on the output of the first feature extraction submodel 201 and the output of the second feature extraction submodel 202. In this process, supervised learning is performed using the semi-dense depth map. In the training phase, the loss function L_D between the finally predicted dense depth map and the semi-dense depth map is defined as:
formula a: L_D = (1/N) * Σ_{i=1}^{N} H( D_i^semi , f(C_i, D_i^sparse) )
where N denotes the total number of samples, D_i^semi denotes the semi-dense depth map, C_i denotes the input color map, D_i^sparse denotes the sparse depth map, and f(C_i, D_i^sparse) denotes the dense depth map predicted by the depth map completion model. H is the Huber loss, which is defined as follows:
formula b: H(ŷ, y) = (1/2) * (ŷ - y)^2 if |ŷ - y| <= δ; otherwise H(ŷ, y) = δ * |ŷ - y| - (1/2) * δ^2
where y denotes the semi-dense depth map, ŷ denotes the predicted dense depth map, and δ denotes the Huber threshold.
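A minimal PyTorch sketch of this semi-dense supervision is given below; because the semi-dense map still contains missing values, the Huber term is evaluated only where ground-truth depth exists, and both this masking convention and the delta value are assumptions of the sketch.

```python
# Minimal sketch of the semi-dense supervised loss L_D (formula a / b), assuming
# depth value 0 marks a missing pixel in the semi-dense map and delta = 1.0.
import torch
import torch.nn as nn

huber = nn.HuberLoss(reduction="none", delta=1.0)

def depth_completion_loss(pred_dense, semi_dense):
    # pred_dense, semi_dense: (N, 1, H, W) tensors
    valid = semi_dense > 0                       # supervise only observed pixels
    per_pixel = huber(pred_dense, semi_dense)    # Huber loss H per pixel
    return per_pixel[valid].mean()               # average over valid pixels and batch

# usage: loss = depth_completion_loss(model(color, sparse_depth), semi_dense_gt)
```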
In one embodiment, obtaining feature information of a sparse depth map and spatial feature information of a color map corresponding to the sparse depth map includes: inputting the sparse depth map into a pre-trained first feature extraction sub-model to obtain a first feature map; the first feature extraction submodel comprises a first residual pyramid module and a first pyramid pooling module; inputting the color image corresponding to the sparse depth map into a pre-trained second feature extraction sub-model to obtain a second feature map; the second feature extraction submodel comprises a second residual pyramid module and a second pyramid pooling module.
The following illustrates the construction of the second feature extraction submodel using the second residual pyramid module and the second pyramid pooling module.
The color map is fed into the second feature extraction submodel, which comprises two branches. The first branch contains a pre-trained image classification module, which may be a residual network (Resnet101) model; the second branch contains a second residual Pyramid Module (shown in a dashed frame) and a second Pyramid Pooling Module (PPM). Regarding the first branch: for a neural network, the features extracted by shallow convolutional layers are more global, while, as the number of layers increases, the extracted information reflects more of the local feature expression in the image. The Resnet101 model used in this embodiment is an image classification module trained on a large data set. Before the second feature extraction submodel is trained, the Resnet101 model has already been trained, and its parameters are fixed and remain unchanged; when the color map is fed into the Resnet101 model, only forward propagation is performed, without any update of its parameters. The second branch consists of the second residual pyramid module and the second pyramid pooling module, where the second residual pyramid module is composed of second residual units (Res blocks). After the color image passes through each group of second residual units, its size is halved. In this embodiment, the second residual pyramid module is taken, as an example, to comprise three cascaded groups of second residual units, whose feature maps have sizes of 1/2, 1/4 and 1/8 of the original image resolution, respectively. Because the skip-connection design peculiar to residual networks benefits gradient propagation and model convergence, in this embodiment the first residual pyramid module and the second residual pyramid module are built from the basic units of a residual network, so that the multi-scale information of the image can be used effectively and the features are exploited more fully.
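A minimal sketch of such a residual pyramid is given below, with three cascaded residual units each halving the spatial resolution; the channel widths and the internal layout of the Res block are illustrative assumptions.

```python
# Minimal sketch of a residual pyramid built from residual units (Res blocks);
# each unit halves the spatial size (1/2, 1/4, 1/8 of the input resolution).
# Channel widths and block internals are illustrative assumptions.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection on the skip path so the shortcut matches shape
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.skip(x))

class ResidualPyramid(nn.Module):
    def __init__(self, in_ch=3, widths=(32, 64, 128)):
        super().__init__()
        chs = (in_ch,) + widths
        self.units = nn.ModuleList(
            ResBlock(chs[i], chs[i + 1]) for i in range(len(widths))
        )

    def forward(self, x):
        feats = []
        for unit in self.units:          # resolutions: 1/2, 1/4, 1/8
            x = unit(x)
            feats.append(x)
        return feats                     # multi-scale feature maps

color = torch.randn(1, 3, 256, 832)
print([f.shape[-2:] for f in ResidualPyramid()(color)])  # [128,416], [64,208], [32,104]
```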
For the depth completion task, the second pyramid pooling module is used in order to better exploit the spatial feature information of the color image. In the second pyramid pooling module, the input data is first pooled to obtain three groups of feature maps at different scales, convolution is performed in each branch, the results are then upsampled to feature maps of the same scale, and finally a channel-dimension splicing operation produces the output. Through pooling at different scales followed by the convolution and splicing operations, the second pyramid pooling module preserves the global context information of the color image well and enhances the representational capability of the features.
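A pyramid pooling module in this spirit can be sketched as follows: the input is pooled at three scales, convolved per branch, upsampled back to the input size and spliced along the channel dimension; the bin sizes and channel counts are assumptions of the sketch.

```python
# Minimal sketch of a pyramid pooling module (PPM): pool at three scales,
# convolve each branch, upsample to the input size, and concatenate along
# channels. Bin sizes (1, 2, 4) and channel counts are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch=128, branch_ch=32, bins=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),                       # pool to b x b
                nn.Conv2d(in_ch, branch_ch, 1, bias=False),    # per-branch conv
                nn.ReLU(inplace=True),
            )
            for b in bins
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = [x]
        for branch in self.branches:
            y = branch(x)
            # upsample back to the input resolution before splicing
            outs.append(F.interpolate(y, size=(h, w), mode="bilinear", align_corners=False))
        return torch.cat(outs, dim=1)      # channel-dimension concatenation

feat = torch.randn(1, 128, 32, 104)
print(PyramidPooling()(feat).shape)        # torch.Size([1, 224, 32, 104])
```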
It should be noted that, as can be understood by those skilled in the art, the first feature extraction submodel and the second feature extraction submodel have similar structures, and are not described in detail in this embodiment.
It should be noted that, as can be understood by those skilled in the art, the number of residual error units in the first residual error pyramid module and the second residual error pyramid module may be set according to needs, and the embodiment is not limited.
In one embodiment, the second feature extraction submodel further comprises an image classification module, and the training process of the second feature extraction submodel comprises: respectively inputting each color sample image into an image classification module to obtain a color feature image of each color sample image; using the color characteristic map of each color sample map as supervision data of the color sample map; the second feature extraction submodel is trained using the color sample maps and the supervised data of the color sample maps based on perceptual loss.
Specifically, in order to make the feature representation of the extracted color image sufficiently strong, a perceptual loss is introduced. The feature map of the color map extracted by the Resnet101 model in the first branch is taken, and for the second residual pyramid module in the second branch, the feature map output after the last group of second residual units is taken; the perceptual loss then compares these two groups of feature maps at the feature level so as to supervise the feature extraction capability of the residual pyramid module. In this embodiment, the perceptual loss may be the pixel-level Euclidean distance between the feature maps, and may be defined as:
formula c: L_P = (1 / (C * H * W)) * ||F_i(x) - Φ_j(x)||_2^2
where C denotes the number of channels of the feature maps extracted to compute the perceptual loss, H denotes the height of these feature maps, W denotes their width, F_i(x) denotes the feature map of the i-th second residual unit in the second residual pyramid module, Φ_j(x) denotes the feature map of the j-th layer of the Resnet101 model, and x denotes the input color map.
It should be noted that, as will be understood by those skilled in the art, the Resnet101 model introduced in this embodiment is used in the model training stage, and in the model testing or verifying stage, the color map is sent to the second branch of the second feature extraction model, without using the first branch.
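A minimal sketch of this perceptual supervision is given below, comparing a pyramid feature map with an intermediate Resnet101 feature map through a mean squared (Euclidean) distance; the choice of Resnet101 cut point, the 1x1 projection used to align channel counts, and the torchvision weights API are assumptions of the sketch rather than details fixed by this embodiment.

```python
# Minimal sketch of the perceptual loss (formula c): compare a feature map from
# the second residual pyramid with an intermediate Resnet101 feature map using a
# per-element Euclidean (MSE) distance. The chosen Resnet101 cut point, the 1x1
# projection matching channel counts, and the torchvision>=0.13 weights API are
# assumptions of this sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet101

backbone = resnet101(weights="IMAGENET1K_V1").eval()
for p in backbone.parameters():           # pretrained, frozen classification branch
    p.requires_grad_(False)

# conv1/bn1/relu/maxpool + layer1 + layer2 -> feature map at 1/8 resolution
resnet_feat = nn.Sequential(*list(backbone.children())[:6])

def perceptual_loss(pyramid_feat, color, proj):
    """pyramid_feat: output of the last second residual unit (1/8 resolution);
    proj: hypothetical 1x1 conv aligning its channels with the Resnet feature."""
    with torch.no_grad():
        target = resnet_feat(color)                        # Phi_j(x)
    pred = proj(pyramid_feat)                              # F_i(x), channel-aligned
    if pred.shape[-2:] != target.shape[-2:]:
        pred = F.interpolate(pred, size=target.shape[-2:], mode="bilinear",
                             align_corners=False)
    return F.mse_loss(pred, target)                        # mean over C*H*W

color = torch.randn(1, 3, 256, 832)
pyr_feat = torch.randn(1, 128, 32, 104)
proj = nn.Conv2d(128, 512, 1)                              # layer2 of Resnet101 has 512 ch
print(perceptual_loss(pyr_feat, color, proj).item())
```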
In one embodiment, the first residual pyramid module comprises T sequentially connected first residual units, and the second residual pyramid module comprises T sequentially connected second residual units. In the first residual pyramid module, the input data of the (i+1)-th first residual unit is the data obtained by splicing the output data of the i-th first residual unit with the output data of the i-th second residual unit, where T is a positive integer greater than 1 and i is a positive integer less than T. In order to effectively exploit the global and local feature information of the color map, this embodiment adds a jump cascade operation from the second residual pyramid module in the second feature extraction submodel to the first residual pyramid module in the first feature extraction submodel. Specifically, the feature map of a second residual unit in the second residual pyramid module is sent to the corresponding first residual unit in the first residual pyramid module, and a channel-dimension concatenation operation (Concatenation, Concat for short) is performed.
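One way to wire this jump cascade is sketched below: the (i+1)-th first residual unit receives the channel-dimension concatenation of the i-th depth-branch output and the i-th color-branch output; plain stride-2 convolution blocks stand in for the residual units, and all channel widths are illustrative assumptions.

```python
# Minimal sketch of the jump cascade from the second (color) residual pyramid to
# the first (depth) residual pyramid: the input of the (i+1)-th first residual
# unit is Concat(output of i-th first unit, output of i-th second unit).
# Simple stride-2 conv blocks stand in for the residual units; channel widths
# are illustrative assumptions.
import torch
import torch.nn as nn

def unit(in_ch, out_ch):           # stand-in for one residual unit (halves H, W)
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                         nn.ReLU(inplace=True))

color_units = nn.ModuleList([unit(3, 32), unit(32, 64), unit(64, 128)])
# depth unit i+1 takes depth channels + color channels from level i
depth_units = nn.ModuleList([unit(1, 32), unit(32 + 32, 64), unit(64 + 64, 128)])

color, sparse = torch.randn(1, 3, 256, 832), torch.randn(1, 1, 256, 832)
c_feats, x = [], color
for u in color_units:                              # second residual pyramid (color branch)
    x = u(x)
    c_feats.append(x)

d_feats, y = [], sparse
for i, u in enumerate(depth_units):                # first residual pyramid (depth branch)
    y = u(y)
    d_feats.append(y)
    if i + 1 < len(depth_units):
        y = torch.cat([y, c_feats[i]], dim=1)      # channel-dimension Concat (jump cascade)

print([f.shape for f in d_feats])                  # 1/2, 1/4, 1/8 resolutions
```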
In one embodiment, obtaining the dense depth map from the fusion feature map comprises: acquiring the output data of each first residual unit in the first residual pyramid module; obtaining a confidence characterization map of the sparse depth map based on the output data of each first residual unit; and obtaining the dense depth map from the fusion feature map and the confidence characterization map. In particular, for the sparse depth map of an outdoor scene, the information available in the depth map is very limited because many depth values are missing. In addition, the sparse depth map contains many noise points, which is very unfavorable for depth map completion; in particular, traditional interpolation methods often produce large deviations when completing the depth values of noise points. In order to judge the reliability of the depth value points more accurately and improve the accuracy of depth value completion, this embodiment computes the pixel-wise confidence of the scene area to obtain a confidence characterization map of the sparse depth map, and then fuses the feature map with the confidence characterization map to obtain the dense depth map.
It is worth mentioning that the confidence characterization map is obtained by computing the confidence of the pixel points in the scene; obtaining the dense depth map based on the confidence characterization map reduces the influence of noise points, and this evaluation of pixel reliability enhances the accuracy of depth completion.
Optionally, the fusion submodel comprises a first processing module, and obtaining the confidence characterization map of the sparse depth map based on the output data of each first residual unit comprises: inputting the output data of each first residual unit into the first processing module to obtain the confidence characterization map. The first processing module sequentially performs upsampling, splicing and convolution on the output data of each first residual unit, and obtains the confidence characterization map through logistic regression.
Optionally, the fusion submodel further comprises a second processing module, a convolution module and a dot product module, and obtaining the dense depth map from the fusion feature map and the confidence characterization map comprises: inputting the fusion feature map into the convolution module; inputting the intermediate output of each convolution layer in the convolution module into the second processing module to obtain a fusion output map, where the second processing module sequentially performs upsampling, splicing and convolution on the intermediate outputs of the convolution layers in the convolution module; and inputting the fusion output map and the confidence characterization map into the dot product module, which performs point multiplication of the fusion output map with the confidence characterization map to obtain the dense depth map.
Specifically, in this embodiment the depth map completion model is schematically illustrated in FIG. 3. The depth map completion model includes a first feature extraction submodel 301, a second feature extraction submodel 302 and a fusion submodel 303. The sparse depth map is input to the first feature extraction submodel 301 and the color map is input to the second feature extraction submodel 302. The first feature extraction submodel 301 includes a first residual pyramid module 3011 and a first pyramid pooling module 3012; the second feature extraction submodel 302 includes a second residual pyramid module 3021, a second pyramid pooling module 3022 and an image classification module 3023, and the perceptual loss is computed between the color feature map 3041 output by the image classification module 3023 and the predicted feature map 3042 output by the second residual pyramid module 3021 in order to train the second feature extraction submodel 302 with supervision. The fusion submodel 303 includes a splicing module 3031, a convolution module 3032, a first processing module 3033, a second processing module 3034 and a dot product module 3035. The splicing module 3031 splices the first feature map and the second feature map to obtain a spliced map 3043, on which the convolution module 3032 performs convolution. The first processing module 3033 performs upsampling, splicing and convolution on the intermediate output of each group of first residual units of the first residual pyramid module 3011, and uses Softmax to regress a confidence characterization map 3044 (confidence map) with the same resolution as the original sparse depth map. Similarly, the second processing module 3034 performs upsampling, splicing and convolution on the intermediate outputs of the stacked convolution layers in the convolution module 3032 to obtain a fusion output map. The dot product module 3035 performs point multiplication of the fusion output map with the confidence characterization map 3044 to obtain the final predicted dense depth map 3045 (dense depth). The point multiplication of the fusion output map with the confidence characterization map is computed as:
formula d: d out =D p (i,j)*e C(i,j)
Wherein C (i, j) represents a confidence token map, D p (i, j) denotes fusionOutput diagram, D out Representing the final computed dense depth map. The formula d is a pixel-by-pixel multiplication operation, and (i, j) represents the coordinate position of the pixel.
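A minimal sketch of this confidence-weighted fusion is given below: the multi-scale depth-branch features are upsampled, spliced and convolved into a confidence map via Softmax, the spliced fusion feature map is convolved into the fusion output map D_p, and the final dense depth map is D_p(i, j) * e^(C(i, j)) pixel by pixel; the channel counts, the two-channel Softmax parameterisation of the confidence head, and the simplified handling of intermediate convolution outputs are assumptions of the sketch.

```python
# Minimal sketch of the fusion head (formula d): build a confidence map C from
# multi-scale depth-branch features (upsample -> concat -> conv -> Softmax),
# build the fusion output map D_p from the stacked convolutions, then compute
# D_out(i, j) = D_p(i, j) * exp(C(i, j)) pixel by pixel. Channel counts and the
# two-channel Softmax parameterisation of the confidence head are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHead(nn.Module):
    def __init__(self, pyramid_chs=(32, 64, 128), fused_ch=224):
        super().__init__()
        # first processing module: confidence characterization map
        self.conf_conv = nn.Conv2d(sum(pyramid_chs), 2, 3, padding=1)
        # convolution module + second processing module: fusion output map D_p
        self.conv = nn.Sequential(nn.Conv2d(fused_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
                                  nn.Conv2d(64, 1, 3, padding=1))

    def forward(self, fused_feat, depth_pyramid_feats, out_size):
        ups = [F.interpolate(f, size=out_size, mode="bilinear", align_corners=False)
               for f in depth_pyramid_feats]                  # upsample each scale
        conf = torch.softmax(self.conf_conv(torch.cat(ups, dim=1)), dim=1)[:, :1]
        d_p = F.interpolate(self.conv(fused_feat), size=out_size,
                            mode="bilinear", align_corners=False)
        return d_p * torch.exp(conf)                          # formula d, pixel-wise

head = FusionHead()
fused = torch.randn(1, 224, 32, 104)                          # spliced feature map
pyr = [torch.randn(1, 32, 128, 416), torch.randn(1, 64, 64, 208), torch.randn(1, 128, 32, 104)]
print(head(fused, pyr, out_size=(256, 832)).shape)            # torch.Size([1, 1, 256, 832])
```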
It should be noted that, as can be understood by those skilled in the art, the depth map completion method described in this embodiment is suitable for sparse depth map completion in outdoor scenes, and it can also be used in indoor scenes or other situations where the depth map is sparse, i.e., it has a certain transferability.
The above embodiments can be combined with and refer to each other; for example, the following embodiment is an example obtained after such combination, but the combinations are not limited thereto, and the embodiments may be arbitrarily combined into new embodiments as long as no contradiction arises.
In one embodiment, fig. 4 shows a method for completing a depth map performed by an electronic device, which includes the following steps.
Step 401: and acquiring a sparse depth map and a color map corresponding to the sparse depth map.
Step 402: and inputting the sparse depth map into a first feature extraction submodel to obtain a first feature map. The first feature extraction submodel comprises a first residual pyramid module and a first pyramid pooling module.
Step 403: and inputting the color image corresponding to the sparse depth image into a pre-trained second feature extraction sub-model to obtain a second feature image. The second feature extraction submodel comprises a second residual pyramid module and a second pyramid pooling module.
Step 404: and inputting the first feature map and the second feature map into the fusion sub-model to obtain a dense depth map. The fusion sub-model is used for splicing the first characteristic graph and the second characteristic graph to obtain a fusion characteristic graph fusing the space characteristic information of the corresponding color graph and the characteristic information of the sparse depth graph; and obtaining a dense depth map according to the fused feature map. The first feature extraction submodel, the second feature extraction submodel and the fusion submodel are used for carrying out supervision training on the basis of semi-dense depth sample maps corresponding to the sparse depth sample maps in the training data set. The semi-dense depth sample map corresponding to the sparse depth sample map is obtained according to the color map corresponding to the sparse depth sample map.
The steps of the above methods are divided only for clarity of description; in implementation they may be merged into one step or a step may be split into multiple steps, and such variants fall within the protection scope of this patent as long as they include the same logical relationship. Adding insignificant modifications to the algorithm or process, or introducing insignificant design changes, without changing its core design, is also within the scope of this patent.
An embodiment of the present application further provides an electronic device, as shown in fig. 5, including: at least one processor 501; and a memory 502 communicatively coupled to the at least one processor 501; wherein the memory stores instructions executable by the at least one processor 501, the instructions being executable by the at least one processor 501 to enable the at least one processor 501 to perform the above-described method embodiments.
The memory 502 and the processor 501 are coupled by a bus, which may include any number of interconnected buses and bridges that couple one or more of the various circuits of the processor 501 and the memory 502 together. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 501 is transmitted over a wireless medium through an antenna, which further receives the data and transmits the data to the processor 501.
The processor 501 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 502 may be used to store data used by processor 501 in performing operations.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.
That is, as can be understood by those skilled in the art, all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (7)

1. A method for completing a depth map, comprising:
acquiring a sparse depth map and a color map corresponding to the sparse depth map;
inputting the sparse depth map and the corresponding color map into a depth map completion model to obtain a completed dense depth map; the depth map completion model is obtained by performing supervised training on a semi-dense depth sample map corresponding to each sparse depth sample map in a training data set, and comprises a pre-trained first feature extraction submodel, a pre-trained second feature extraction submodel and a fusion submodel;
inputting the sparse depth map and the corresponding color map into a depth map completion model to obtain a completed dense depth map, including:
inputting the sparse depth map into the first feature extraction submodel to obtain a first feature map, inputting the color map corresponding to the sparse depth map into the second feature extraction submodel to obtain a second feature map, and performing splicing operation on the first feature map and the second feature map through the fusion submodel to obtain a fusion feature map;
performing convolution operation on the fusion characteristic graph by adopting a convolution module in the fusion sub-model, and sequentially performing up-sampling processing, splicing processing and convolution processing on intermediate output of the stacked convolution layers in the convolution module to obtain a fusion output graph;
acquiring a confidence coefficient representation map of the sparse depth map based on the output data of each first residual unit of a first residual pyramid module in the first feature extraction submodel;
and performing point multiplication on the fusion output image and the confidence coefficient characterization image by using a point multiplication module in the fusion sub-model to obtain the dense depth image.
2. The method of completing a depth map according to claim 1, wherein the semi-dense depth sample map corresponding to the sparse depth sample map is obtained from the color map corresponding to the sparse depth sample map.
3. The method according to claim 1, wherein the first residual pyramid module comprises T first residual units connected in sequence, the second feature extraction submodel comprises a second residual pyramid module and a second pyramid pooling module, and the second residual pyramid module comprises T second residual units connected in sequence; in the first residual pyramid module, the input data of the (i + 1) th first residual unit is data obtained by splicing the output data of the ith first residual unit and the output data of the ith second residual unit; wherein T is a positive integer greater than 1, and i is a positive integer less than T.
4. The method of completing a depth map according to claim 1, wherein the fusion submodel comprises a first processing module;
the obtaining of the confidence characterization map of the sparse depth map based on the output data of each first residual unit of the first residual pyramid module in the first feature extraction submodel includes:
inputting the output data of each first residual error unit into a first processing module to obtain the confidence coefficient representation diagram; the first processing module sequentially performs up-sampling processing, splicing processing and convolution processing on output data of each first residual error unit, and obtains the confidence coefficient representation diagram through logistic regression processing.
5. The method of completing a depth map according to claim 3, wherein the second feature extraction submodel further comprises an image classification module, and the training process of the second feature extraction submodel comprises:
respectively inputting each color sample image into an image classification module to obtain a color feature image of each color sample image;
using the color feature map of each color sample map as supervision data of the color sample map;
training the second feature extraction submodel using each of the color sample maps and supervised data for each of the color sample maps based on perceptual loss.
6. An electronic device, comprising: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of completing a depth map of any one of claims 1 to 5.
7. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the method of completing a depth map according to any one of claims 1 to 5.
CN202110974023.2A 2021-08-24 2021-08-24 Method for completing depth map, electronic device and storage medium Active CN113763447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110974023.2A CN113763447B (en) 2021-08-24 2021-08-24 Method for completing depth map, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110974023.2A CN113763447B (en) 2021-08-24 2021-08-24 Method for completing depth map, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN113763447A CN113763447A (en) 2021-12-07
CN113763447B (en) 2022-08-26

Family

ID=78791113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110974023.2A Active CN113763447B (en) 2021-08-24 2021-08-24 Method for completing depth map, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113763447B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272709B (en) * 2022-07-29 2023-08-15 梅卡曼德(北京)机器人科技有限公司 Training method, device, equipment and medium of depth completion model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108175402A (en) * 2017-12-26 2018-06-19 智慧康源(厦门)科技有限公司 The intelligent identification Method of electrocardiogram (ECG) data based on residual error network
CN112233160A (en) * 2020-10-15 2021-01-15 杭州知路科技有限公司 Binocular camera-based real-time depth and confidence degree prediction method

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510535B (en) * 2018-03-14 2020-04-24 大连理工大学 High-quality depth estimation method based on depth prediction and enhancer network
US10839543B2 (en) * 2019-02-26 2020-11-17 Baidu Usa Llc Systems and methods for depth estimation using convolutional spatial propagation networks
WO2021013334A1 (en) * 2019-07-22 2021-01-28 Toyota Motor Europe Depth maps prediction system and training method for such a system
CN110415284B (en) * 2019-07-31 2022-04-19 中国科学技术大学 Method and device for obtaining depth map of single-view color image
CN110517306B (en) * 2019-08-30 2023-07-28 的卢技术有限公司 Binocular depth vision estimation method and system based on deep learning
US11315266B2 (en) * 2019-12-16 2022-04-26 Robert Bosch Gmbh Self-supervised depth estimation method and system
CN112001960B (en) * 2020-08-25 2022-09-30 中国人民解放军91550部队 Monocular image depth estimation method based on multi-scale residual error pyramid attention network model
CN112001914B (en) * 2020-08-31 2024-03-01 三星(中国)半导体有限公司 Depth image complement method and device
CN112348870B (en) * 2020-11-06 2022-09-30 大连理工大学 Significance target detection method based on residual error fusion
CN112330729B (en) * 2020-11-27 2024-01-12 中国科学院深圳先进技术研究院 Image depth prediction method, device, terminal equipment and readable storage medium
CN112560875B (en) * 2020-12-25 2023-07-28 北京百度网讯科技有限公司 Depth information complement model training method, device, equipment and storage medium
CN112541482B (en) * 2020-12-25 2024-04-02 北京百度网讯科技有限公司 Depth information complement model training method, device, equipment and storage medium
CN112861729B (en) * 2021-02-08 2022-07-08 浙江大学 Real-time depth completion method based on pseudo-depth map guidance
CN113256546A (en) * 2021-05-24 2021-08-13 浙江大学 Depth map completion method based on color map guidance

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108175402A (en) * 2017-12-26 2018-06-19 智慧康源(厦门)科技有限公司 The intelligent identification Method of electrocardiogram (ECG) data based on residual error network
CN112233160A (en) * 2020-10-15 2021-01-15 杭州知路科技有限公司 Binocular camera-based real-time depth and confidence degree prediction method

Also Published As

Publication number Publication date
CN113763447A (en) 2021-12-07

Similar Documents

Publication Publication Date Title
CN111507927B (en) Method and device for integrating images and point cloud images in neural network
CN109086668B (en) Unmanned aerial vehicle remote sensing image road information extraction method based on multi-scale generation countermeasure network
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
CN109784283B (en) Remote sensing image target extraction method based on scene recognition task
CN109063569B (en) Semantic level change detection method based on remote sensing image
CN110414526B (en) Training method, training device, server and storage medium for semantic segmentation network
CN113936139B (en) Scene aerial view reconstruction method and system combining visual depth information and semantic segmentation
CN110009637B (en) Remote sensing image segmentation network based on tree structure
CN114758337B (en) Semantic instance reconstruction method, device, equipment and medium
CN116484971A (en) Automatic driving perception self-learning method and device for vehicle and electronic equipment
CN111444923A (en) Image semantic segmentation method and device under natural scene
CN113763447B (en) Method for completing depth map, electronic device and storage medium
CN115661767A (en) Image front vehicle target identification method based on convolutional neural network
CN113724388B (en) High-precision map generation method, device, equipment and storage medium
CN114529890A (en) State detection method and device, electronic equipment and storage medium
Wofk et al. Monocular Visual-Inertial Depth Estimation
CN114495089A (en) Three-dimensional target detection method based on multi-scale heterogeneous characteristic self-adaptive fusion
CN116861262B (en) Perception model training method and device, electronic equipment and storage medium
CN110751061B (en) SAR image recognition method, device, equipment and storage medium based on SAR network
CN112597996A (en) Task-driven natural scene-based traffic sign significance detection method
CN112085001A (en) Tunnel recognition model and method based on multi-scale edge feature detection
CN116258756B (en) Self-supervision monocular depth estimation method and system
CN112215766A (en) Image defogging method integrating image restoration and image enhancement and convolution network thereof
CN117036607A (en) Automatic driving scene data generation method and system based on implicit neural rendering
CN113971764B (en) Remote sensing image small target detection method based on improvement YOLOv3

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220510

Address after: 230091 room 611-217, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, high tech Zone, Hefei, Anhui Province

Applicant after: Hefei lushenshi Technology Co.,Ltd.

Address before: 100083 room 3032, North B, bungalow, building 2, A5 Xueyuan Road, Haidian District, Beijing

Applicant before: BEIJING DILUSENSE TECHNOLOGY CO.,LTD.

Applicant before: Hefei lushenshi Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant