CN113763447B - Method for completing depth map, electronic device and storage medium - Google Patents

Method for completing depth map, electronic device and storage medium

Info

Publication number
CN113763447B
CN113763447B
Authority
CN
China
Prior art keywords
map
depth map
module
residual
color
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110974023.2A
Other languages
Chinese (zh)
Other versions
CN113763447A (en)
Inventor
季栋
薛远
曹天宇
王亚运
李绪琴
户磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Dilusense Technology Co Ltd
Original Assignee
Hefei Dilusense Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Dilusense Technology Co Ltd filed Critical Hefei Dilusense Technology Co Ltd
Priority to CN202110974023.2A priority Critical patent/CN113763447B/en
Publication of CN113763447A publication Critical patent/CN113763447A/en
Application granted granted Critical
Publication of CN113763447B publication Critical patent/CN113763447B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention relates to the field of image processing, and discloses a depth map completion method, an electronic device and a storage medium. In some embodiments of the present application, a method for completing a depth map includes: acquiring a sparse depth map and a color map corresponding to the sparse depth map; and inputting the sparse depth map and the corresponding color map into a depth map completion model to obtain a completed dense depth map. The depth map completion model is obtained by supervised training based on the semi-dense depth sample map corresponding to each sparse depth sample map in the training data set. According to the technical scheme provided by the embodiments of the present application, a sparse depth map can be completed into a dense depth map, and the engineering cost is reduced.

Description

Method for completing depth map, electronic device and storage medium
Technical Field
The embodiment of the invention relates to the field of image processing, in particular to a depth map completion method, electronic equipment and a storage medium.
Background
Depth perception and measurement technologies are increasingly widely applied in fields such as unmanned aerial vehicles, autonomous driving and robotics. In these rapidly developing emerging technology areas, sensors occupy a very important position: they are the bridge for information interaction between a computer and the outside world. A sensor transmits the captured external environment information to the computer, and the computer evaluates this information and carries out a series of planning and decision-making, for example, sending action instructions to a robot.
For outdoor open scenes, a high-quality depth map is particularly important for the computer's decision-making because the external environment is complex. Generally, the distance that needs to be perceived in an open scene is large and can reach tens or even hundreds of meters, so a high-quality sensor becomes indispensable. However, commonly used high-quality lidar systems are very expensive, and the sampling points they provide are sparse, which hinders the development of fields such as autonomous driving to a certain extent. Therefore, a depth map completion algorithm is needed to complete the sparse depth map into a dense depth map, so as to reduce the engineering cost.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a depth map completion method, an electronic device, and a storage medium, which can complete a sparse depth map into a dense depth map, thereby reducing engineering cost.
To solve the foregoing technical problem, in a first aspect, an embodiment of the present invention provides a method for completing a depth map, including: acquiring a sparse depth map and a color map corresponding to the sparse depth map; inputting the sparse depth map and the corresponding color map into a depth map completion model to obtain a completed dense depth map; the depth map completion model is obtained by performing supervised training on a semi-dense depth sample map corresponding to each sparse depth sample map in the training data set.
In a second aspect, an embodiment of the present invention provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for completing a depth map as mentioned in the above embodiments.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method for completing a depth map mentioned in the foregoing embodiments.
Compared with the prior art, in the embodiments of the present application the depth map completion model is trained based on semi-dense depth sample maps, so dense depth maps do not need to be labeled manually, which greatly reduces cost. The sparse depth map is completed into a dense depth map using the color map corresponding to the sparse depth map, so that the dense depth map can be used in subsequent outdoor-scene applications such as autonomous driving and unmanned aerial vehicles, and low-cost radars can replace expensive high-quality radars, greatly reducing cost.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, which are not to be construed as limiting the embodiments; elements having the same reference numerals represent like elements throughout, and the drawings are not drawn to scale unless otherwise specified.
FIG. 1 is a flow chart of a method of completing a depth map in an embodiment of the present application;
FIG. 2 is a diagram illustrating a depth map completion model in an embodiment of the present application;
FIG. 3 is a schematic diagram of a depth map completion model in another embodiment of the present application;
FIG. 4 is a flow chart of a method of completing a depth map in another embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the various embodiments in order to provide a better understanding of the present application; however, the technical solutions claimed in the present application can be implemented without these technical details and with various changes and modifications based on the following embodiments. The following division into embodiments is for convenience of description and should not constitute any limitation on the specific implementation of the present invention; the embodiments may be combined with and refer to each other where no contradiction arises.
In the description of the present disclosure, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present disclosure, "a plurality" means two or more unless otherwise specified.
In the embodiment of the present application, the method for completing a depth map as shown in fig. 1, which is executed by an electronic device, may be used for completing a depth map of an outdoor scene, and includes the following steps.
Step 101: and acquiring a sparse depth map and a color map corresponding to the sparse depth map.
Step 102: inputting the sparse depth map and the corresponding color map into a depth map completion model to obtain a completed dense depth map; the depth map completion model is obtained by performing supervised training on a semi-dense depth sample map corresponding to each sparse depth sample map in a training data set.
In the embodiment of the application, the depth map completion model is supervised and trained based on the semi-dense depth sample map, so a dense depth map does not need to be labeled manually, which greatly reduces the cost. The sparse depth map is completed into a dense depth map using the color map corresponding to the sparse depth map, so that the dense depth map can be used in subsequent outdoor-scene applications such as autonomous driving and unmanned aerial vehicles, and low-cost radars can replace expensive high-quality radars, greatly reducing the cost.
At present, a commonly used way of obtaining a dense depth map is as follows: an existing dense depth map is degraded, usually by random sampling, to obtain a sparse depth map; the sparse depth map is input into a neural network to extract features; and the original dense depth map is used as the supervision signal. Through this fully supervised training, the sparse depth map is completed to obtain a dense depth map. However, this approach has the following problems:
1. A sparse depth map simulated by random sampling differs to some extent from the sparse depth map obtained by a lidar. Because the sampling of a sensor depends on its mechanical structure, a sparse depth map obtained by artificial degradation often has strong randomness and cannot simulate the sparse depth map obtained by radar sampling.
2. In real outdoor scenes, a dense depth map often cannot be obtained by radar, and manual labeling would consume a great deal of time and resources.
3. This approach reduces depth map completion to a low-level visual enhancement problem, ignoring other feature information available in the scene image and the geometric structure information of the vision system. It also places high requirements on the original data set, since a dense depth map needs to be prepared in advance, which limits the application and popularization of the algorithm.
Based on this, the present embodiment provides a method for completing a depth map, which trains the depth map completion model based on semi-dense depth sample maps, so that no dense depth map needs to be labeled manually, greatly reducing the cost. The sparse depth map is completed into a dense depth map using its corresponding color map, so that the dense depth map can be used in subsequent outdoor-scene applications such as autonomous driving and unmanned aerial vehicles, and low-cost radars can replace expensive high-quality radars, greatly reducing the cost. Here, a semi-dense depth map is a depth map between a sparse depth map and a dense depth map, i.e., a depth map with only a small number of missing depth values. A dense depth map is one in which depth information is available for almost all points, whereas a sparse depth map is one in which depth information is missing for a large number of points. That is, the area of effectively recovered depth in the sparse depth map < the area of effectively recovered depth in the semi-dense depth map < the area of effectively recovered depth in the dense depth map.
In one embodiment, the semi-dense depth sample map corresponding to a sparse depth sample map is obtained from the color map corresponding to that sparse depth sample map. Specifically, when sampling outdoor scenes, a lidar cannot acquire a dense depth map because of the limitations of its sampling mode and the long range of the scene, and the area of effectively recovered depth in the sampled sparse depth map typically accounts for only 30%-60% of the whole scene; compared with a dense depth map recovered pixel by pixel, the sparse depth map therefore has a large proportion of missing depth values. A semi-dense depth map data set for the scene images thus needs to be built, and the semi-dense depth map may be obtained from the color map. Because feature points cannot be found where occlusion occurs in the scene, where the scene contains textureless or repetitive-texture regions such as the sky, a smooth desktop or a water surface, or where the scene image is too dark or too bright, depth values cannot be computed for some pixels, yielding a semi-dense depth map. In this embodiment, SGM or another depth recovery algorithm may be used to perform depth computation on the color map to obtain the semi-dense depth map.
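As a rough illustration of how such a semi-dense supervision map might be produced, the sketch below applies OpenCV's semi-global matching (StereoSGBM) to a rectified stereo color pair and converts the valid disparities to depth; the file names, matcher parameters, focal length and baseline are placeholders, and this embodiment does not fix the exact recovery algorithm or its settings.

```python
# Minimal sketch: building a semi-dense depth sample map with SGM (OpenCV).
# Assumes a rectified stereo color pair and known focal length / baseline;
# all file names and parameter values below are illustrative placeholders.
import cv2
import numpy as np

left = cv2.imread("left_color.png", cv2.IMREAD_GRAYSCALE)    # hypothetical paths
right = cv2.imread("right_color.png", cv2.IMREAD_GRAYSCALE)

sgm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,       # must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,
    P2=32 * 5 * 5,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2,
)

# StereoSGBM returns fixed-point disparity scaled by 16.
disparity = sgm.compute(left, right).astype(np.float32) / 16.0

fx, baseline = 721.5, 0.54    # placeholder focal length (px) and baseline (m)
valid = disparity > 0         # occluded / textureless / saturated pixels stay invalid
semi_dense = np.zeros_like(disparity)
semi_dense[valid] = fx * baseline / disparity[valid]

print("valid depth ratio:", valid.mean())   # well below 100%, i.e. semi-dense
```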
In one embodiment, the depth map completion model comprises a pre-trained first feature extraction submodel, a pre-trained second feature extraction submodel and a fusion submodel. Inputting the sparse depth map and the corresponding color map into the depth map completion model to obtain a completed dense depth map includes: inputting the sparse depth map into the first feature extraction submodel to obtain a first feature map, where the first feature extraction submodel comprises a first residual pyramid module and a first pyramid pooling module; inputting the color map corresponding to the sparse depth map into the pre-trained second feature extraction submodel to obtain a second feature map, where the second feature extraction submodel comprises a second residual pyramid module and a second pyramid pooling module; and inputting the first feature map and the second feature map into the fusion submodel to obtain the dense depth map. The fusion submodel splices the first feature map and the second feature map to obtain a fusion feature map that fuses the spatial feature information of the corresponding color map with the feature information of the sparse depth map, and obtains the dense depth map from the fusion feature map. Specifically, as the sparse depth map passes through the first residual pyramid module, its resolution is gradually reduced; after it passes through the first pyramid pooling module, a channel-dimension splicing operation is performed with the output of the second pyramid pooling module in the second feature extraction submodel to obtain a fusion feature map combining the feature information of the color map and of the sparse depth map. Because the fusion feature map combines the feature information of the depth map with the spatial feature information of the color map, the dense depth map obtained from it is more accurate.
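For concreteness, the channel-dimension splicing that produces the fusion feature map can be sketched in PyTorch as follows; the tensor shapes and channel counts are illustrative assumptions rather than values fixed by this embodiment.

```python
# Minimal sketch of the channel-dimension splicing (Concat) that produces the
# fusion feature map; shapes and channel counts are illustrative assumptions.
import torch

# Outputs of the first (depth branch) and second (color branch) pyramid pooling
# modules, assumed to share spatial resolution (here 1/8 of the input).
depth_feat = torch.randn(1, 64, 32, 104)   # from the sparse depth map branch
color_feat = torch.randn(1, 64, 32, 104)   # from the color map branch

fused = torch.cat([depth_feat, color_feat], dim=1)   # concat along channels
print(fused.shape)   # torch.Size([1, 128, 32, 104])
```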
In one embodiment, the depth map completion model is schematically illustrated in FIG. 2. During training, the sparse depth sample map is input into a first feature extraction submodel 201, the color sample map is input into a second feature extraction submodel 202, and a dense depth map is predicted based on the output of the first feature extraction submodel 201 and the output of the second feature extraction submodel 202. In this process, supervised learning is performed using the semi-dense depth map. In the training phase, the loss function L_D between the finally predicted dense depth map and the semi-dense depth map is defined as:
formula a: L_D = (1/N) * Σ_{i=1}^{N} H( D_i^semi , f(C_i, D_i^sparse) )
where N denotes the total number of samples, D_i^semi denotes the semi-dense depth map, C_i denotes the input color map, D_i^sparse denotes the sparse depth map, and f(C_i, D_i^sparse) denotes the dense depth map predicted by the depth map completion model. H is the Huber loss, which is defined as follows:
formula b: H(ŷ, y) = (1/2) * (ŷ - y)^2 if |ŷ - y| <= δ; otherwise H(ŷ, y) = δ * |ŷ - y| - (1/2) * δ^2
where y denotes the semi-dense depth map, ŷ denotes the predicted dense depth map, and δ denotes the Huber threshold.
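A minimal PyTorch sketch of this semi-dense supervision is given below; because the semi-dense map still contains missing values, the Huber term is evaluated only where ground-truth depth exists, and both this masking convention and the delta value are assumptions of the sketch.

```python
# Minimal sketch of the semi-dense supervised loss L_D (formula a / b), assuming
# depth value 0 marks a missing pixel in the semi-dense map and delta = 1.0.
import torch
import torch.nn as nn

huber = nn.HuberLoss(reduction="none", delta=1.0)

def depth_completion_loss(pred_dense, semi_dense):
    # pred_dense, semi_dense: (N, 1, H, W) tensors
    valid = semi_dense > 0                       # supervise only observed pixels
    per_pixel = huber(pred_dense, semi_dense)    # Huber loss H per pixel
    return per_pixel[valid].mean()               # average over valid pixels and batch

# usage: loss = depth_completion_loss(model(color, sparse_depth), semi_dense_gt)
```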
In one embodiment, obtaining feature information of a sparse depth map and spatial feature information of a color map corresponding to the sparse depth map includes: inputting the sparse depth map into a pre-trained first feature extraction sub-model to obtain a first feature map; the first feature extraction submodel comprises a first residual pyramid module and a first pyramid pooling module; inputting the color image corresponding to the sparse depth map into a pre-trained second feature extraction sub-model to obtain a second feature map; the second feature extraction submodel comprises a second residual pyramid module and a second pyramid pooling module.
The following illustrates the construction of the second feature extraction submodel using the second residual pyramid module and the second pyramid pooling module.
The color map is fed into the second feature extraction submodel, which comprises two branches. The first branch contains a pre-trained image classification module, which may be a residual network (Resnet101) model; the second branch contains a second residual Pyramid Module (shown in a dashed frame) and a second Pyramid Pooling Module (PPM). Regarding the first branch: for a neural network, the features extracted by shallow convolutional layers are more global, while, as the number of layers increases, the extracted information reflects more of the local feature expression in the image. The Resnet101 model used in this embodiment is an image classification module trained on a large data set. Before the second feature extraction submodel is trained, the Resnet101 model has already been trained, and its parameters are fixed and remain unchanged; when the color map is fed into the Resnet101 model, only forward propagation is performed, without any update of its parameters. The second branch consists of the second residual pyramid module and the second pyramid pooling module, where the second residual pyramid module is composed of second residual units (Res blocks). After the color image passes through each group of second residual units, its size is halved. In this embodiment, the second residual pyramid module is taken, as an example, to comprise three cascaded groups of second residual units, whose feature maps have sizes of 1/2, 1/4 and 1/8 of the original image resolution, respectively. Because the skip-connection design peculiar to residual networks benefits gradient propagation and model convergence, in this embodiment the first residual pyramid module and the second residual pyramid module are built from the basic units of a residual network, so that the multi-scale information of the image can be used effectively and the features are exploited more fully.
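A minimal sketch of such a residual pyramid is given below, with three cascaded residual units each halving the spatial resolution; the channel widths and the internal layout of the Res block are illustrative assumptions.

```python
# Minimal sketch of a residual pyramid built from residual units (Res blocks);
# each unit halves the spatial size (1/2, 1/4, 1/8 of the input resolution).
# Channel widths and block internals are illustrative assumptions.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection on the skip path so the shortcut matches shape
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.skip(x))

class ResidualPyramid(nn.Module):
    def __init__(self, in_ch=3, widths=(32, 64, 128)):
        super().__init__()
        chs = (in_ch,) + widths
        self.units = nn.ModuleList(
            ResBlock(chs[i], chs[i + 1]) for i in range(len(widths))
        )

    def forward(self, x):
        feats = []
        for unit in self.units:          # resolutions: 1/2, 1/4, 1/8
            x = unit(x)
            feats.append(x)
        return feats                     # multi-scale feature maps

color = torch.randn(1, 3, 256, 832)
print([f.shape[-2:] for f in ResidualPyramid()(color)])  # [128,416], [64,208], [32,104]
```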
For the depth completion task, the second pyramid pooling module is used in order to better exploit the spatial feature information of the color image. In the second pyramid pooling module, the input data is first pooled to obtain three groups of feature maps at different scales, convolution is performed in each branch, the results are then upsampled to feature maps of the same scale, and finally a channel-dimension splicing operation produces the output. Through pooling at different scales followed by the convolution and splicing operations, the second pyramid pooling module preserves the global context information of the color image well and enhances the representational capability of the features.
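A pyramid pooling module in this spirit can be sketched as follows: the input is pooled at three scales, convolved per branch, upsampled back to the input size and spliced along the channel dimension; the bin sizes and channel counts are assumptions of the sketch.

```python
# Minimal sketch of a pyramid pooling module (PPM): pool at three scales,
# convolve each branch, upsample to the input size, and concatenate along
# channels. Bin sizes (1, 2, 4) and channel counts are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch=128, branch_ch=32, bins=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),                       # pool to b x b
                nn.Conv2d(in_ch, branch_ch, 1, bias=False),    # per-branch conv
                nn.ReLU(inplace=True),
            )
            for b in bins
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = [x]
        for branch in self.branches:
            y = branch(x)
            # upsample back to the input resolution before splicing
            outs.append(F.interpolate(y, size=(h, w), mode="bilinear", align_corners=False))
        return torch.cat(outs, dim=1)      # channel-dimension concatenation

feat = torch.randn(1, 128, 32, 104)
print(PyramidPooling()(feat).shape)        # torch.Size([1, 224, 32, 104])
```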
It should be noted that, as can be understood by those skilled in the art, the first feature extraction submodel and the second feature extraction submodel have similar structures, and are not described in detail in this embodiment.
It should be noted that, as can be understood by those skilled in the art, the number of residual error units in the first residual error pyramid module and the second residual error pyramid module may be set according to needs, and the embodiment is not limited.
In one embodiment, the second feature extraction submodel further comprises an image classification module, and the training process of the second feature extraction submodel comprises: respectively inputting each color sample image into an image classification module to obtain a color feature image of each color sample image; using the color characteristic map of each color sample map as supervision data of the color sample map; the second feature extraction submodel is trained using the color sample maps and the supervised data of the color sample maps based on perceptual loss.
Specifically, in order to make the feature representation of the extracted color image sufficiently strong, a perceptual loss is introduced. The feature map of the color map extracted by the Resnet101 model in the first branch is taken, and for the second residual pyramid module in the second branch, the feature map output after the last group of second residual units is taken; the perceptual loss then compares these two groups of feature maps at the feature level so as to supervise the feature extraction capability of the residual pyramid module. In this embodiment, the perceptual loss may be the pixel-level Euclidean distance between the feature maps, and may be defined as:
formula c: L_P = (1 / (C * H * W)) * ||F_i(x) - Φ_j(x)||_2^2
where C denotes the number of channels of the feature maps extracted to compute the perceptual loss, H denotes the height of these feature maps, W denotes their width, F_i(x) denotes the feature map of the i-th second residual unit in the second residual pyramid module, Φ_j(x) denotes the feature map of the j-th layer of the Resnet101 model, and x denotes the input color map.
It should be noted that, as will be understood by those skilled in the art, the Resnet101 model introduced in this embodiment is used in the model training stage, and in the model testing or verifying stage, the color map is sent to the second branch of the second feature extraction model, without using the first branch.
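A minimal sketch of this perceptual supervision is given below, comparing a pyramid feature map with an intermediate Resnet101 feature map through a mean squared (Euclidean) distance; the choice of Resnet101 cut point, the 1x1 projection used to align channel counts, and the torchvision weights API are assumptions of the sketch rather than details fixed by this embodiment.

```python
# Minimal sketch of the perceptual loss (formula c): compare a feature map from
# the second residual pyramid with an intermediate Resnet101 feature map using a
# per-element Euclidean (MSE) distance. The chosen Resnet101 cut point, the 1x1
# projection matching channel counts, and the torchvision>=0.13 weights API are
# assumptions of this sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet101

backbone = resnet101(weights="IMAGENET1K_V1").eval()
for p in backbone.parameters():           # pretrained, frozen classification branch
    p.requires_grad_(False)

# conv1/bn1/relu/maxpool + layer1 + layer2 -> feature map at 1/8 resolution
resnet_feat = nn.Sequential(*list(backbone.children())[:6])

def perceptual_loss(pyramid_feat, color, proj):
    """pyramid_feat: output of the last second residual unit (1/8 resolution);
    proj: hypothetical 1x1 conv aligning its channels with the Resnet feature."""
    with torch.no_grad():
        target = resnet_feat(color)                        # Phi_j(x)
    pred = proj(pyramid_feat)                              # F_i(x), channel-aligned
    if pred.shape[-2:] != target.shape[-2:]:
        pred = F.interpolate(pred, size=target.shape[-2:], mode="bilinear",
                             align_corners=False)
    return F.mse_loss(pred, target)                        # mean over C*H*W

color = torch.randn(1, 3, 256, 832)
pyr_feat = torch.randn(1, 128, 32, 104)
proj = nn.Conv2d(128, 512, 1)                              # layer2 of Resnet101 has 512 ch
print(perceptual_loss(pyr_feat, color, proj).item())
```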
In one embodiment, the first residual pyramid module comprises T sequentially connected first residual units, and the second residual pyramid module comprises T sequentially connected second residual units. In the first residual pyramid module, the input data of the (i+1)-th first residual unit is the data obtained by splicing the output data of the i-th first residual unit with the output data of the i-th second residual unit, where T is a positive integer greater than 1 and i is a positive integer less than T. In order to effectively exploit the global and local feature information of the color map, this embodiment adds a jump cascade operation from the second residual pyramid module in the second feature extraction submodel to the first residual pyramid module in the first feature extraction submodel. Specifically, the feature map of a second residual unit in the second residual pyramid module is sent to the corresponding first residual unit in the first residual pyramid module, and a channel-dimension concatenation operation (Concatenation, Concat for short) is performed.
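One way to wire this jump cascade is sketched below: the (i+1)-th first residual unit receives the channel-dimension concatenation of the i-th depth-branch output and the i-th color-branch output; plain stride-2 convolution blocks stand in for the residual units, and all channel widths are illustrative assumptions.

```python
# Minimal sketch of the jump cascade from the second (color) residual pyramid to
# the first (depth) residual pyramid: the input of the (i+1)-th first residual
# unit is Concat(output of i-th first unit, output of i-th second unit).
# Simple stride-2 conv blocks stand in for the residual units; channel widths
# are illustrative assumptions.
import torch
import torch.nn as nn

def unit(in_ch, out_ch):           # stand-in for one residual unit (halves H, W)
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                         nn.ReLU(inplace=True))

color_units = nn.ModuleList([unit(3, 32), unit(32, 64), unit(64, 128)])
# depth unit i+1 takes depth channels + color channels from level i
depth_units = nn.ModuleList([unit(1, 32), unit(32 + 32, 64), unit(64 + 64, 128)])

color, sparse = torch.randn(1, 3, 256, 832), torch.randn(1, 1, 256, 832)
c_feats, x = [], color
for u in color_units:                              # second residual pyramid (color branch)
    x = u(x)
    c_feats.append(x)

d_feats, y = [], sparse
for i, u in enumerate(depth_units):                # first residual pyramid (depth branch)
    y = u(y)
    d_feats.append(y)
    if i + 1 < len(depth_units):
        y = torch.cat([y, c_feats[i]], dim=1)      # channel-dimension Concat (jump cascade)

print([f.shape for f in d_feats])                  # 1/2, 1/4, 1/8 resolutions
```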
In one embodiment, obtaining the dense depth map from the fusion feature map comprises: acquiring the output data of each first residual unit in the first residual pyramid module; obtaining a confidence characterization map of the sparse depth map based on the output data of each first residual unit; and obtaining the dense depth map from the fusion feature map and the confidence characterization map. In particular, for the sparse depth map of an outdoor scene, the information available in the depth map is very limited because many depth values are missing. In addition, the sparse depth map contains many noise points, which is very unfavorable for depth map completion; in particular, traditional interpolation methods often produce large deviations when completing the depth values of noise points. In order to judge the reliability of the depth value points more accurately and improve the accuracy of depth value completion, this embodiment computes the pixel-wise confidence of the scene area to obtain a confidence characterization map of the sparse depth map, and then fuses the feature map with the confidence characterization map to obtain the dense depth map.
It is worth mentioning that the confidence characterization map is obtained by computing the confidence of the pixel points in the scene; obtaining the dense depth map based on the confidence characterization map reduces the influence of noise points, and this evaluation of pixel reliability enhances the accuracy of depth completion.
Optionally, the fusion submodel comprises a first processing module, and obtaining the confidence characterization map of the sparse depth map based on the output data of each first residual unit comprises: inputting the output data of each first residual unit into the first processing module to obtain the confidence characterization map. The first processing module sequentially performs upsampling, splicing and convolution on the output data of each first residual unit, and obtains the confidence characterization map through logistic regression.
Optionally, the fusion submodel further comprises a second processing module, a convolution module and a dot product module, and obtaining the dense depth map from the fusion feature map and the confidence characterization map comprises: inputting the fusion feature map into the convolution module; inputting the intermediate output of each convolution layer in the convolution module into the second processing module to obtain a fusion output map, where the second processing module sequentially performs upsampling, splicing and convolution on the intermediate outputs of the convolution layers in the convolution module; and inputting the fusion output map and the confidence characterization map into the dot product module, which performs point multiplication of the fusion output map with the confidence characterization map to obtain the dense depth map.
Specifically, in this embodiment the depth map completion model is schematically illustrated in FIG. 3. The depth map completion model includes a first feature extraction submodel 301, a second feature extraction submodel 302 and a fusion submodel 303. The sparse depth map is input to the first feature extraction submodel 301 and the color map is input to the second feature extraction submodel 302. The first feature extraction submodel 301 includes a first residual pyramid module 3011 and a first pyramid pooling module 3012; the second feature extraction submodel 302 includes a second residual pyramid module 3021, a second pyramid pooling module 3022 and an image classification module 3023, and the perceptual loss is computed between the color feature map 3041 output by the image classification module 3023 and the predicted feature map 3042 output by the second residual pyramid module 3021 in order to train the second feature extraction submodel 302 with supervision. The fusion submodel 303 includes a splicing module 3031, a convolution module 3032, a first processing module 3033, a second processing module 3034 and a dot product module 3035. The splicing module 3031 splices the first feature map and the second feature map to obtain a spliced map 3043, on which the convolution module 3032 performs convolution. The first processing module 3033 performs upsampling, splicing and convolution on the intermediate output of each group of first residual units of the first residual pyramid module 3011, and uses Softmax to regress a confidence characterization map 3044 (confidence map) with the same resolution as the original sparse depth map. Similarly, the second processing module 3034 performs upsampling, splicing and convolution on the intermediate outputs of the stacked convolution layers in the convolution module 3032 to obtain a fusion output map. The dot product module 3035 performs point multiplication of the fusion output map with the confidence characterization map 3044 to obtain the final predicted dense depth map 3045 (dense depth). The point multiplication of the fusion output map with the confidence characterization map is computed as:
formula d: d out =D p (i,j)*e C(i,j)
Wherein C (i, j) represents a confidence token map, D p (i, j) denotes fusionOutput diagram, D out Representing the final computed dense depth map. The formula d is a pixel-by-pixel multiplication operation, and (i, j) represents the coordinate position of the pixel.
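A minimal sketch of this confidence-weighted fusion is given below: the multi-scale depth-branch features are upsampled, spliced and convolved into a confidence map via Softmax, the spliced fusion feature map is convolved into the fusion output map D_p, and the final dense depth map is D_p(i, j) * e^(C(i, j)) pixel by pixel; the channel counts, the two-channel Softmax parameterisation of the confidence head, and the simplified handling of intermediate convolution outputs are assumptions of the sketch.

```python
# Minimal sketch of the fusion head (formula d): build a confidence map C from
# multi-scale depth-branch features (upsample -> concat -> conv -> Softmax),
# build the fusion output map D_p from the stacked convolutions, then compute
# D_out(i, j) = D_p(i, j) * exp(C(i, j)) pixel by pixel. Channel counts and the
# two-channel Softmax parameterisation of the confidence head are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHead(nn.Module):
    def __init__(self, pyramid_chs=(32, 64, 128), fused_ch=224):
        super().__init__()
        # first processing module: confidence characterization map
        self.conf_conv = nn.Conv2d(sum(pyramid_chs), 2, 3, padding=1)
        # convolution module + second processing module: fusion output map D_p
        self.conv = nn.Sequential(nn.Conv2d(fused_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
                                  nn.Conv2d(64, 1, 3, padding=1))

    def forward(self, fused_feat, depth_pyramid_feats, out_size):
        ups = [F.interpolate(f, size=out_size, mode="bilinear", align_corners=False)
               for f in depth_pyramid_feats]                  # upsample each scale
        conf = torch.softmax(self.conf_conv(torch.cat(ups, dim=1)), dim=1)[:, :1]
        d_p = F.interpolate(self.conv(fused_feat), size=out_size,
                            mode="bilinear", align_corners=False)
        return d_p * torch.exp(conf)                          # formula d, pixel-wise

head = FusionHead()
fused = torch.randn(1, 224, 32, 104)                          # spliced feature map
pyr = [torch.randn(1, 32, 128, 416), torch.randn(1, 64, 64, 208), torch.randn(1, 128, 32, 104)]
print(head(fused, pyr, out_size=(256, 832)).shape)            # torch.Size([1, 1, 256, 832])
```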
It should be noted that, as can be understood by those skilled in the art, the depth map completion method described in this embodiment is suitable for sparse depth map completion in outdoor scenes, and it can also be used in indoor scenes or other situations where the depth map is sparse, i.e., it has a certain transferability.
The above embodiments can be combined with and refer to each other; for example, the following embodiment is an example obtained after such combination, but the combinations are not limited thereto, and the embodiments may be arbitrarily combined into new embodiments as long as no contradiction arises.
In one embodiment, fig. 4 shows a method for completing a depth map performed by an electronic device, which includes the following steps.
Step 401: and acquiring a sparse depth map and a color map corresponding to the sparse depth map.
Step 402: and inputting the sparse depth map into a first feature extraction submodel to obtain a first feature map. The first feature extraction submodel comprises a first residual pyramid module and a first pyramid pooling module.
Step 403: and inputting the color image corresponding to the sparse depth image into a pre-trained second feature extraction sub-model to obtain a second feature image. The second feature extraction submodel comprises a second residual pyramid module and a second pyramid pooling module.
Step 404: and inputting the first feature map and the second feature map into the fusion sub-model to obtain a dense depth map. The fusion sub-model is used for splicing the first characteristic graph and the second characteristic graph to obtain a fusion characteristic graph fusing the space characteristic information of the corresponding color graph and the characteristic information of the sparse depth graph; and obtaining a dense depth map according to the fused feature map. The first feature extraction submodel, the second feature extraction submodel and the fusion submodel are used for carrying out supervision training on the basis of semi-dense depth sample maps corresponding to the sparse depth sample maps in the training data set. The semi-dense depth sample map corresponding to the sparse depth sample map is obtained according to the color map corresponding to the sparse depth sample map.
The steps of the above methods are divided only for clarity of description; in implementation they may be merged into one step or a step may be split into multiple steps, and such variants fall within the protection scope of this patent as long as they include the same logical relationship. Adding insignificant modifications to the algorithm or process, or introducing insignificant design changes, without changing its core design, is also within the scope of this patent.
An embodiment of the present application further provides an electronic device, as shown in fig. 5, including: at least one processor 501; and a memory 502 communicatively coupled to the at least one processor 501; wherein the memory stores instructions executable by the at least one processor 501, the instructions being executable by the at least one processor 501 to enable the at least one processor 501 to perform the above-described method embodiments.
The memory 502 and the processor 501 are coupled by a bus, which may include any number of interconnected buses and bridges that couple one or more of the various circuits of the processor 501 and the memory 502 together. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 501 is transmitted over a wireless medium through an antenna, which further receives the data and transmits the data to the processor 501.
The processor 501 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 502 may be used to store data used by processor 501 in performing operations.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.
That is, as can be understood by those skilled in the art, all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (7)

1. A method for completing a depth map, comprising:
acquiring a sparse depth map and a color map corresponding to the sparse depth map;
inputting the sparse depth map and the corresponding color map into a depth map completion model to obtain a completed dense depth map; the depth map completion model is obtained by performing supervised training on a semi-dense depth sample map corresponding to each sparse depth sample map in a training data set, and comprises a pre-trained first feature extraction submodel, a pre-trained second feature extraction submodel and a fusion submodel;
inputting the sparse depth map and the corresponding color map into a depth map completion model to obtain a completed dense depth map, including:
inputting the sparse depth map into the first feature extraction submodel to obtain a first feature map, inputting the color map corresponding to the sparse depth map into the second feature extraction submodel to obtain a second feature map, and performing splicing operation on the first feature map and the second feature map through the fusion submodel to obtain a fusion feature map;
performing convolution operation on the fusion characteristic graph by adopting a convolution module in the fusion sub-model, and sequentially performing up-sampling processing, splicing processing and convolution processing on intermediate output of the stacked convolution layers in the convolution module to obtain a fusion output graph;
acquiring a confidence coefficient representation map of the sparse depth map based on the output data of each first residual unit of a first residual pyramid module in the first feature extraction submodel;
and performing point multiplication on the fusion output image and the confidence coefficient characterization image by using a point multiplication module in the fusion sub-model to obtain the dense depth image.
2. The method of completing a depth map according to claim 1, wherein the semi-dense depth sample map corresponding to the sparse depth sample map is obtained from the color map corresponding to the sparse depth sample map.
3. The method according to claim 1, wherein the first residual pyramid module comprises T first residual units connected in sequence, the second feature extraction submodel comprises a second residual pyramid module and a second pyramid pooling module, and the second residual pyramid module comprises T second residual units connected in sequence; in the first residual pyramid module, the input data of the (i + 1) th first residual unit is data obtained by splicing the output data of the ith first residual unit and the output data of the ith second residual unit; wherein T is a positive integer greater than 1, and i is a positive integer less than T.
4. The method of completing a depth map according to claim 1, wherein the fusion submodel comprises a first processing module;
the obtaining of the confidence characterization map of the sparse depth map based on the output data of each first residual unit of the first residual pyramid module in the first feature extraction submodel includes:
inputting the output data of each first residual error unit into a first processing module to obtain the confidence coefficient representation diagram; the first processing module sequentially performs up-sampling processing, splicing processing and convolution processing on output data of each first residual error unit, and obtains the confidence coefficient representation diagram through logistic regression processing.
5. The method of completing a depth map according to claim 3, wherein the second feature extraction submodel further comprises an image classification module, and the training process of the second feature extraction submodel comprises:
respectively inputting each color sample image into an image classification module to obtain a color feature image of each color sample image;
using the color feature map of each color sample map as supervision data of the color sample map;
training the second feature extraction submodel using each of the color sample maps and supervised data for each of the color sample maps based on perceptual loss.
6. An electronic device, comprising: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of completing a depth map of any one of claims 1 to 5.
7. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the method of completing a depth map according to any one of claims 1 to 5.
CN202110974023.2A 2021-08-24 2021-08-24 Method for completing depth map, electronic device and storage medium Active CN113763447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110974023.2A CN113763447B (en) 2021-08-24 2021-08-24 Method for completing depth map, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110974023.2A CN113763447B (en) 2021-08-24 2021-08-24 Method for completing depth map, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN113763447A CN113763447A (en) 2021-12-07
CN113763447B (en) 2022-08-26

Family

ID=78791113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110974023.2A Active CN113763447B (en) 2021-08-24 2021-08-24 Method for completing depth map, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113763447B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272709B (en) * 2022-07-29 2023-08-15 梅卡曼德(北京)机器人科技有限公司 Training method, device, equipment and medium of depth completion model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108175402A (en) * 2017-12-26 2018-06-19 智慧康源(厦门)科技有限公司 The intelligent identification Method of electrocardiogram (ECG) data based on residual error network
CN112233160A (en) * 2020-10-15 2021-01-15 杭州知路科技有限公司 Binocular camera-based real-time depth and confidence degree prediction method

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510535B (en) * 2018-03-14 2020-04-24 大连理工大学 High-quality depth estimation method based on depth prediction and enhancer network
US10839543B2 (en) * 2019-02-26 2020-11-17 Baidu Usa Llc Systems and methods for depth estimation using convolutional spatial propagation networks
WO2021013334A1 (en) * 2019-07-22 2021-01-28 Toyota Motor Europe Depth maps prediction system and training method for such a system
CN110415284B (en) * 2019-07-31 2022-04-19 中国科学技术大学 Method and device for obtaining depth map of single-view color image
CN110517306B (en) * 2019-08-30 2023-07-28 的卢技术有限公司 Binocular depth vision estimation method and system based on deep learning
US11315266B2 (en) * 2019-12-16 2022-04-26 Robert Bosch Gmbh Self-supervised depth estimation method and system
CN112001960B (en) * 2020-08-25 2022-09-30 中国人民解放军91550部队 Monocular image depth estimation method based on multi-scale residual error pyramid attention network model
CN112001914B (en) * 2020-08-31 2024-03-01 三星(中国)半导体有限公司 Depth image complement method and device
CN112348870B (en) * 2020-11-06 2022-09-30 大连理工大学 Significance target detection method based on residual error fusion
CN112330729B (en) * 2020-11-27 2024-01-12 中国科学院深圳先进技术研究院 Image depth prediction method, device, terminal equipment and readable storage medium
CN112560875B (en) * 2020-12-25 2023-07-28 北京百度网讯科技有限公司 Depth information complement model training method, device, equipment and storage medium
CN112541482B (en) * 2020-12-25 2024-04-02 北京百度网讯科技有限公司 Depth information complement model training method, device, equipment and storage medium
CN112861729B (en) * 2021-02-08 2022-07-08 浙江大学 Real-time depth completion method based on pseudo-depth map guidance
CN113256546A (en) * 2021-05-24 2021-08-13 浙江大学 Depth map completion method based on color map guidance

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108175402A (en) * 2017-12-26 2018-06-19 智慧康源(厦门)科技有限公司 The intelligent identification Method of electrocardiogram (ECG) data based on residual error network
CN112233160A (en) * 2020-10-15 2021-01-15 杭州知路科技有限公司 Binocular camera-based real-time depth and confidence degree prediction method

Also Published As

Publication number Publication date
CN113763447A (en) 2021-12-07

Similar Documents

Publication Publication Date Title
CN111507927B (en) Method and device for integrating images and point cloud images in neural network
CN109086668B (en) Unmanned aerial vehicle remote sensing image road information extraction method based on multi-scale generation countermeasure network
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
CN109784283B (en) Remote sensing image target extraction method based on scene recognition task
CN109063569B (en) Semantic level change detection method based on remote sensing image
CN110414526B (en) Training method, training device, server and storage medium for semantic segmentation network
CN113936139B (en) Scene aerial view reconstruction method and system combining visual depth information and semantic segmentation
CN110009637B (en) Remote sensing image segmentation network based on tree structure
CN114758337B (en) Semantic instance reconstruction method, device, equipment and medium
CN116484971A (en) Automatic driving perception self-learning method and device for vehicle and electronic equipment
CN111444923A (en) Image semantic segmentation method and device under natural scene
CN113763447B (en) Method for completing depth map, electronic device and storage medium
CN115661767A (en) Image front vehicle target identification method based on convolutional neural network
CN113724388B (en) High-precision map generation method, device, equipment and storage medium
CN114529890A (en) State detection method and device, electronic equipment and storage medium
Wofk et al. Monocular Visual-Inertial Depth Estimation
CN114495089A (en) Three-dimensional target detection method based on multi-scale heterogeneous characteristic self-adaptive fusion
CN116861262B (en) Perception model training method and device, electronic equipment and storage medium
CN110751061B (en) SAR image recognition method, device, equipment and storage medium based on SAR network
CN112597996A (en) Task-driven natural scene-based traffic sign significance detection method
CN112085001A (en) Tunnel recognition model and method based on multi-scale edge feature detection
CN116258756B (en) Self-supervision monocular depth estimation method and system
CN112215766A (en) Image defogging method integrating image restoration and image enhancement and convolution network thereof
CN117036607A (en) Automatic driving scene data generation method and system based on implicit neural rendering
CN113971764B (en) Remote sensing image small target detection method based on improvement YOLOv3

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220510

Address after: 230091 room 611-217, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, high tech Zone, Hefei, Anhui Province

Applicant after: Hefei lushenshi Technology Co.,Ltd.

Address before: 100083 room 3032, North B, bungalow, building 2, A5 Xueyuan Road, Haidian District, Beijing

Applicant before: BEIJING DILUSENSE TECHNOLOGY CO.,LTD.

Applicant before: Hefei lushenshi Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant