CN115496788A - Depth completion method using a spatial-domain propagation post-processing module - Google Patents
Depth completion method using a spatial-domain propagation post-processing module
- Publication number
- CN115496788A (application number CN202211215408.1A)
- Authority
- CN
- China
- Prior art keywords
- depth
- post-processing module
- map
- propagation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention discloses a depth completion method using a spatial-domain propagation post-processing module. The method first determines a depth estimation network model; then determines a spatial-domain propagation post-processing module; then trains the depth estimation network with the spatial-domain propagation post-processing module attached; and finally performs depth completion with the trained model. Compared with a conventional monocular depth estimation and completion network, the method adds an extra post-processing stage, so the model makes fuller use of the precise sparse depth information from the LiDAR and produces a more accurate depth completion result.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a depth completion method using a spatial-domain propagation post-processing module, aimed at depth sensing systems.
Background
In recent years, with the rapid growth of computer vision applications, depth estimation from a single image, i.e., predicting the distance of each pixel from the camera, has become an important problem. It has wide application in fields such as augmented reality, unmanned aerial vehicle control, autonomous driving, and motion planning. To obtain reliable depth predictions, information from various sensors is used, such as RGB cameras, radar, LiDAR, and ultrasonic sensors. Depth sensors such as LiDAR can produce accurate depth measurements at high frequencies. However, due to hardware limitations such as the number of scan channels, the acquired depth maps are typically sparse. The task of estimating dense depth information from such sparse depth values is referred to as depth completion.
An affinity matrix is a generic matrix that encodes the distance or similarity between two points in space. In computer vision tasks, it can be viewed as a weighted graph that treats each pixel as a node and connects each pair of pixels by an edge, where the edge weights reflect task-specific pairwise similarity. For example, for low-level visual tasks such as image filtering, affinity values should capture low-level coherence of color and texture; for mid- and high-level visual tasks such as image matting and segmentation, the affinity measure should preserve semantic-level pairwise similarity. Using an affinity matrix, the sparse depths acquired by a LiDAR sensor can be propagated to surrounding pixels. The spatial-domain propagation post-processing module therefore combines the learned affinity matrix with the LiDAR data to post-process the depth completion result, achieving more accurate depth estimation.
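For intuition, a fixed color-based affinity can be computed directly from pixel values. The sketch below uses a Gaussian kernel on RGB distance; the function name, the bandwidth `sigma`, and the kernel itself are illustrative assumptions, since in the method of this invention the affinity values are learned by the network rather than hand-crafted.

```python
import math

def gaussian_affinity(pixel_a, pixel_b, sigma=10.0):
    """Affinity of two RGB pixels: exp(-||color difference||^2 / (2 * sigma^2)).

    A hand-crafted low-level affinity in the spirit of the image-filtering
    example above; NOT the learned affinity used by the network here.
    """
    dist_sq = sum((ca - cb) ** 2 for ca, cb in zip(pixel_a, pixel_b))
    return math.exp(-dist_sq / (2.0 * sigma ** 2))

# Identical pixels get affinity 1; very different pixels approach 0.
same = gaussian_affinity((128, 64, 32), (128, 64, 32))
far = gaussian_affinity((0, 0, 0), (255, 255, 255))
```

Affinities of this kind, arranged over a pixel's neighbors, form exactly the row of the affinity matrix that a propagation step consumes.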
Disclosure of Invention
The present invention is motivated by the fact that, with the rapid growth of computer vision applications, depth sensing and estimation is critical in a wide range of engineering applications. However, existing depth sensors, including LiDAR, structured-light depth sensors, and stereo cameras, all have limitations. For example, top-of-the-line 3D LiDAR is costly (up to $75,000 per unit) and provides only sparse measurements of distant objects. Structured-light depth sensors (e.g., Kinect) are sensitive to sunlight, power-hungry, and short-ranged. How to reduce the cost of depth estimation while improving its accuracy is therefore a problem worth studying.
To address the shortcomings of the prior art, the invention provides a depth completion method using a spatial-domain propagation post-processing module. For the monocular depth completion problem, the method optimizes a conventional monocular depth estimation network: to cooperate with the subsequent spatial-domain propagation post-processing module, the depth estimation network additionally outputs an affinity map (affinity matrix). That is, the output of the conventional monocular depth estimation network is expanded from 1 channel to 9 channels, where 1 channel is the depth map and the remaining 8 channels represent the affinity map. The spatial-domain propagation post-processing module can then refine the estimate by combining the affinity information learned by the network with the sparse depth map. This post-processing makes the estimated result more accurate; since only an ordinary LiDAR is needed to acquire the sparse depth information, combined with an ordinary RGB camera, the cost of depth estimation can be effectively reduced.
A depth completion method using a spatial-domain propagation post-processing module comprises the following steps:
step 1, determining a depth estimation network model;
step 2, determining a spatial-domain propagation post-processing module;
step 3, training the depth estimation network with the spatial-domain propagation post-processing module attached;
step 4, performing depth completion with the trained model.
Further, the specific method of step 1 is as follows.
The encoder uses a ResNet-50 neural network and the decoder uses standard upsampling; each residual block passes its feature values to the corresponding upsampling layer, i.e., a skip connection is added. Together these two parts form a depth estimation network that receives an RGB image from the camera and a sparse depth map from the LiDAR, and outputs an initial depth map and an affinity map.
Further, the specific method of step 2 is as follows.
The spatial-domain propagation post-processing module adopts a linear propagation scheme: according to the affinity map output by the network model determined in step 1, the depth value of each pixel in the initial depth map is propagated to its neighborhood, and finally the depth values at the corresponding positions are replaced with those of the sparse depth map. The propagation process is implemented as a recursive convolution operation.
Given the depth map $D_0$ output by the depth estimation network, the convolution transformation with kernel size $k$ at each iteration is

$$D_{i,j,t+1} = \sum_{a,b=-\lfloor k/2\rfloor}^{\lfloor k/2\rfloor} \kappa_{i,j}(a,b)\odot D_{i-a,\,j-b,\,t}$$

where the transformation kernel $\kappa_{i,j}$ is the output of the spatial-domain propagation post-processing module and is spatially dependent on the input image; $t$ denotes the $t$-th iteration; $\kappa_{i,j}$ is a normalized transformation kernel; $i$ and $j$ denote the $i$-th row and $j$-th column of the image; $a$ and $b$ denote the relative position within $\kappa_{i,j}$ (e.g., $a=0,b=0$ is the center of the kernel and $a=1,b=1$ is its lower-right corner); accordingly, $D_{i,j,t+1}$ is the depth value at row $i$, column $j$ of the depth map at iteration $t+1$, $D_{i-a,j-b,t}$ is the depth value at row $i-a$, column $j-b$ at iteration $t$, and $\odot$ denotes element-wise multiplication. The kernel size $k$ is set to 3 here; it is chosen odd so that the neighborhood around each pixel $(i,j)$ is symmetric. The weights of the convolution kernel are determined by the affinity map from step 1 and, to make the model converge stably, are normalized to the interval $(-1,1)$. Then, to guarantee that the post-processed depth keeps the same values at the pixels covered by the sparse depth map, the following replacement is performed:

$$D_{i,j,t+1} \leftarrow (1-m_{i,j})\,D_{i,j,t+1} + m_{i,j}\,D^{s}_{i,j}$$

where $m_{i,j}$ is an indicator function used to determine whether the pixel was acquired by the LiDAR: its value is 1 if so and 0 otherwise, and $D^{s}_{i,j}$ is the depth value at the corresponding position of the LiDAR-acquired sparse depth map.
Further, the specific method of step 3 is as follows.
The training platform is PyTorch. The NYU v2 indoor dataset and the KITTI outdoor dataset are used to train the depth estimation network with the spatial-domain propagation post-processing module attached. The weights of the ResNet-50 model are initialized with results pre-trained on the ImageNet dataset; the optimizer is stochastic gradient descent (SGD); the batch size is set to 12 and the number of training epochs to 40; the learning rate is initialized to 0.01 and reduced by 20% every 10 epochs; the weight decay is set to 0.00001 for regularization; and 500 pixels are sampled from the raw data to simulate the LiDAR sampling pattern during training.
Further, the specific method of step 4 is as follows.
The trained model receives two inputs: an RGB video image from the camera and sparse depth information from the LiDAR. The depth estimation network first produces an initial dense depth map and the corresponding affinity map, and the spatial-domain propagation post-processing module then produces the final dense depth map.
The invention has the following beneficial effects:
compared with a conventional monocular depth estimation and completion network, the method adds an extra post-processing stage, so the model makes fuller use of the precise sparse depth information from the LiDAR and produces a more accurate depth completion result.
Drawings
FIG. 1 is a diagram of the overall network model of the present invention;
FIG. 2 is a schematic diagram of the spatial-domain propagation operation in the post-processing part of the invention.
Detailed Description
The present invention will be described in detail with reference to the following embodiments.
The depth completion method using the spatial-domain propagation post-processing network is implemented in the following steps.
Step 1, determining a depth estimation neural network model
The encoder uses the classic ResNet-50 network and the decoder uses standard upsampling; each residual block passes its feature values to the corresponding upsampling layer, i.e., a skip connection is added. The overall model is shown in FIG. 1. In ResNet-50, the input first passes through a convolution stage and then through 4 residual stages, after which classification is normally completed by an average pooling layer, a fully connected layer, and a classification function. Here the final fully connected layer and the classification part are removed and followed by 4 upsampling layers, implemented with the bilinear interpolation of the PyTorch platform. A skip connection is added between each pair of encoding and decoding layers.
The depth estimation network receives RGB maps from the cameras and sparse depth maps from LiDAR and outputs an initial depth map and an affinity map.
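As a minimal sketch of the 9-channel output described earlier (1 depth channel plus 8 affinity channels), the hypothetical helper below splits a flattened, channel-major network output into its two parts. The flat-list layout and the function name are assumptions made for illustration only.

```python
def split_network_output(output, height, width):
    """Split a flattened 9-channel output (channel-major, an assumed layout)
    into the 1-channel initial depth map and the 8-channel affinity map."""
    plane = height * width
    assert len(output) == 9 * plane, "expected 9 channels"
    depth = output[:plane]  # channel 0: the initial depth map
    affinity = [output[(c + 1) * plane:(c + 2) * plane] for c in range(8)]
    return depth, affinity

# Tiny 2x3 example with recognizable values.
h, w = 2, 3
out = list(range(9 * h * w))
depth, affinity = split_network_output(out, h, w)
```

In a real PyTorch model this split would simply be a slice over the channel dimension of a `(9, H, W)` tensor.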
Step 2, determining a spatial-domain propagation post-processing module
The spatial-domain propagation post-processing module propagates (diffuses) the depth value of each pixel in the initial depth map to its neighborhood in a linear propagation scheme, according to the affinity map output by the network model determined in step 1, and finally replaces the depth values at the corresponding positions with those of the sparse depth map. The propagation process is implemented as a recursive convolution operation; convolution is used in practice because it can be implemented efficiently through image vectorization and is therefore suitable for real-time depth estimation. As shown in FIG. 2, spatial-domain propagation is similar to an anisotropic diffusion process, and the affinity map that guides the diffusion is learned by the depth estimation network to refine the output depth map. Given the depth map $D_0$ output by the depth estimation network, the convolution transformation with kernel size $k$ at each iteration $t$ is

$$D_{i,j,t+1} = \sum_{a,b=-\lfloor k/2\rfloor}^{\lfloor k/2\rfloor} \kappa_{i,j}(a,b)\odot D_{i-a,\,j-b,\,t}$$

where the transformation kernel $\kappa_{i,j}$ is the output of the spatial-domain propagation module and is spatially dependent on the input image; $\kappa_{i,j}$ is a normalized transformation kernel; $i$ and $j$ denote the $i$-th row and $j$-th column of the image; $a$ and $b$ denote the relative position within $\kappa_{i,j}$ (e.g., $a=0,b=0$ is the center of the kernel and $a=1,b=1$ is its lower-right corner); and $\odot$ denotes element-wise multiplication. The kernel size $k$ is set to 3 here; it is chosen odd so that the neighborhood around each pixel $(i,j)$ is symmetric. To make the model converge stably, the kernel weights are normalized to the interval $(-1,1)$. Then, to guarantee that the post-processed depth keeps the same values at the pixels covered by the sparse depth map, the following replacement is performed:

$$D_{i,j,t+1} \leftarrow (1-m_{i,j})\,D_{i,j,t+1} + m_{i,j}\,D^{s}_{i,j}$$

where $m_{i,j}$ is an indicator function used to determine whether the pixel was acquired by the LiDAR: its value is 1 if so and 0 otherwise, and $D^{s}_{i,j}$ is the depth value at the corresponding position of the LiDAR-acquired sparse depth map.
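A minimal sketch of one propagation iteration plus the sparse-depth replacement, on plain Python lists and a 3 × 3 kernel. The absolute-sum normalization (which keeps each weight inside (-1, 1)) and the zero contribution from out-of-bounds neighbors are assumptions; the text only states that the kernel weights are normalized to the (-1, 1) interval.

```python
def normalize_kernel(raw):
    """Normalize a 3x3 kernel by its absolute sum so each weight lies in (-1, 1).
    Absolute-sum normalization is an assumed concrete choice."""
    total = sum(abs(v) for row in raw for v in row) or 1.0
    return [[v / total for v in row] for row in raw]

def propagate_step(depth, kernels, sparse, mask):
    """One recursive-convolution iteration (k = 3), then replace pixels
    where mask[i][j] == 1 with the LiDAR sparse depth."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            kappa = normalize_kernel(kernels[i][j])  # per-pixel learned kernel
            acc = 0.0
            for a in (-1, 0, 1):
                for b in (-1, 0, 1):
                    ii, jj = i - a, j - b
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kappa[a + 1][b + 1] * depth[ii][jj]
            # Keep the LiDAR measurement where one exists (m_ij == 1).
            out[i][j] = sparse[i][j] if mask[i][j] else acc
    return out

# 2x2 toy example: uniform positive kernels average the valid neighborhood.
d = [[1.0, 2.0], [3.0, 4.0]]
k = [[[[1.0] * 3 for _ in range(3)] for _ in range(2)] for _ in range(2)]
m = [[0, 1], [0, 0]]
s = [[0.0, 2.5], [0.0, 0.0]]
refined = propagate_step(d, k, s, m)
```

In the real module this step is iterated several times, and in practice it is vectorized as a convolution rather than written with Python loops.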
Step 3, model training
The training platform is PyTorch. The NYU v2 indoor dataset and the KITTI outdoor dataset are used to train the whole network model.
The NYU v2 dataset contains RGB and depth images collected from 464 different indoor scenes. The invention uses the official data split: 249 scenes for training, from which 50K image samples are taken, and 654 labeled images for testing. The original images are 640 × 480; they are downsampled to half size and center-cropped, so the final network input size is 304 × 228.
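The NYU v2 input-size arithmetic above can be checked with a small helper; the function name and the floor-division choice for centering the crop are assumptions.

```python
def nyu_crop_box(orig_w=640, orig_h=480, crop_w=304, crop_h=228):
    """After downsampling the 640x480 frame to half size (320x240),
    return the (left, top, right, bottom) center-crop box that yields
    the 304x228 network input."""
    half_w, half_h = orig_w // 2, orig_h // 2
    left = (half_w - crop_w) // 2
    top = (half_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h

box = nyu_crop_box()  # crop box inside the 320x240 downsampled frame
```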
The KITTI dataset consists of 22 data sequences measured by cameras and LiDAR (a depth sensor that also produces sparse depth maps). Half of the sequences are used for training and the other half for testing. All 46K images in the training sequences are used for training, and 3200 images are randomly drawn from the test sequences for testing. Since the top area of each image contains no depth, the invention uses the bottom 912 × 228 region as the network input.
The weights of the ResNet-50 encoder of the network model are initialized with results pre-trained on the ImageNet dataset; the optimizer is stochastic gradient descent (SGD); the batch size is set to 12 and the number of training epochs to 40; the learning rate is initialized to 0.01 and reduced by 20% every 10 epochs; the weight decay is set to 0.00001 for regularization; and 500 pixels are sampled from the raw data to simulate the LiDAR sampling pattern during training.
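The schedule and the sparse-input simulation described above can be sketched as follows. The step-decay formula (multiply by 0.8 every 10 epochs) and the uniform random choice of the 500 simulated LiDAR pixels are assumed interpretations of the text.

```python
import random

def learning_rate(epoch, base_lr=0.01, drop=0.2, every=10):
    """SGD learning rate: initialized to 0.01, reduced by 20% every 10 epochs."""
    return base_lr * (1.0 - drop) ** (epoch // every)

def simulate_lidar(dense_depth, n_points=500, seed=0):
    """Pick n_points pixels of a dense ground-truth depth map (a flat list)
    to mimic the sparse LiDAR input used during training."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(dense_depth)), n_points)
    sparse = [0.0] * len(dense_depth)
    for i in chosen:
        sparse[i] = dense_depth[i]
    return sparse

lr_start, lr_epoch10 = learning_rate(0), learning_rate(10)
sparse = simulate_lidar([1.0] * 10000, n_points=500)
```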
Step 4, using the trained model
The trained model receives two inputs: an RGB video image from the camera and sparse depth information from the LiDAR. The depth estimation network first produces an initial dense depth map and the corresponding affinity map, and the spatial-domain propagation post-processing module then produces the final dense depth map.
Claims (5)
1. A depth completion method using a spatial-domain propagation post-processing module, characterized by comprising the following steps:
step 1, determining a depth estimation network model;
step 2, determining a spatial-domain propagation post-processing module;
step 3, training the depth estimation network with the spatial-domain propagation post-processing module attached;
step 4, performing depth completion with the trained model.
2. The depth completion method using a spatial-domain propagation post-processing module according to claim 1, characterized in that the specific method of step 1 is as follows:
the encoder uses a ResNet-50 neural network and the decoder uses standard upsampling; each residual block passes its feature values to the corresponding upsampling layer, i.e., a skip connection is added; together these two parts form a depth estimation network that receives an RGB image from the camera and a sparse depth map from the LiDAR, and outputs an initial depth map and an affinity map.
3. The depth completion method using a spatial-domain propagation post-processing module according to claim 2, characterized in that the specific method of step 2 is as follows:
the spatial-domain propagation post-processing module adopts a linear propagation scheme, propagates the depth value of each pixel in the initial depth map to its neighborhood according to the affinity map output by the network model determined in step 1, and finally replaces the depth values at the corresponding positions with those of the sparse depth map; the propagation process is implemented as a recursive convolution operation;
given the depth map $D_0$ output by the depth estimation network, the convolution transformation with kernel size $k$ at each iteration is

$$D_{i,j,t+1} = \sum_{a,b=-\lfloor k/2\rfloor}^{\lfloor k/2\rfloor} \kappa_{i,j}(a,b)\odot D_{i-a,\,j-b,\,t}$$

where the transformation kernel $\kappa_{i,j}$ is the output of the spatial-domain propagation post-processing module and is spatially dependent on the input image; $t$ denotes the $t$-th iteration; $\kappa_{i,j}$ is a normalized transformation kernel; $i$ and $j$ denote the $i$-th row and $j$-th column of the image; $a$ and $b$ denote the relative position within $\kappa_{i,j}$; accordingly, $D_{i,j,t+1}$ denotes the depth value at row $i$, column $j$ of the depth map at iteration $t+1$, $D_{i-a,j-b,t}$ denotes the depth value at row $i-a$, column $j-b$ at iteration $t$, and $\odot$ denotes element-wise multiplication; the kernel size $k$ is set to 3 here and is chosen odd so that the neighborhood around each pixel $(i,j)$ is symmetric; the weights of the convolution kernel are determined by the affinity map of step 1 and, to make the model converge stably, are normalized to the interval $(-1,1)$; then, to guarantee that the post-processed depth keeps the same values at the pixels covered by the sparse depth map, the following replacement is performed:

$$D_{i,j,t+1} \leftarrow (1-m_{i,j})\,D_{i,j,t+1} + m_{i,j}\,D^{s}_{i,j}$$

where $m_{i,j}$ is an indicator that is 1 if pixel $(i,j)$ was acquired by the LiDAR and 0 otherwise, and $D^{s}_{i,j}$ is the depth value at the corresponding position of the LiDAR-acquired sparse depth map.
4. The depth completion method using a spatial-domain propagation post-processing module according to claim 3, characterized in that the specific method of step 3 is as follows:
the training platform is PyTorch; the NYU v2 indoor dataset and the KITTI outdoor dataset are used to train the depth estimation network with the spatial-domain propagation post-processing module attached; the weights of the ResNet-50 network model are initialized with results pre-trained on the ImageNet dataset; the optimizer is stochastic gradient descent (SGD); the batch size is set to 12 and the number of training epochs to 40; the learning rate is initialized to 0.01 and reduced by 20% every 10 epochs; the weight decay is set to 0.00001 for regularization; and 500 pixels are sampled from the raw data to simulate the LiDAR sampling pattern during training.
5. The depth completion method using a spatial-domain propagation post-processing module according to claim 4, characterized in that the specific method of step 4 is as follows:
the trained model receives two inputs: an RGB video image from the camera and sparse depth information from the LiDAR; the depth estimation network first produces an initial dense depth map and the corresponding affinity map, and the spatial-domain propagation post-processing module then produces the final dense depth map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211215408.1A CN115496788A (en) | 2022-09-30 | 2022-09-30 | Depth completion method using a spatial-domain propagation post-processing module
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211215408.1A CN115496788A (en) | 2022-09-30 | 2022-09-30 | Depth completion method using a spatial-domain propagation post-processing module
Publications (1)
Publication Number | Publication Date |
---|---|
CN115496788A true CN115496788A (en) | 2022-12-20 |
Family
ID=84473276
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211215408.1A Withdrawn CN115496788A (en) | 2022-09-30 | 2022-09-30 | Depth completion method using a spatial-domain propagation post-processing module
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115496788A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117953029A (en) * | 2024-03-27 | 2024-04-30 | 北京科技大学 | General depth map completion method and device based on depth information propagation |
CN117953029B (en) * | 2024-03-27 | 2024-06-07 | 北京科技大学 | General depth map completion method and device based on depth information propagation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10353271B2 (en) | Depth estimation method for monocular image based on multi-scale CNN and continuous CRF | |
CN111625608B (en) | Method and system for generating electronic map according to remote sensing image based on GAN model | |
CN114565655B (en) | Depth estimation method and device based on pyramid segmentation attention | |
CN110910437B (en) | Depth prediction method for complex indoor scene | |
CN115311186B (en) | Cross-scale attention confrontation fusion method and terminal for infrared and visible light images | |
CN110399820B (en) | Visual recognition analysis method for roadside scene of highway | |
CN113554032B (en) | Remote sensing image segmentation method based on multi-path parallel network of high perception | |
CN111797920B (en) | Remote sensing extraction method and system for depth network impervious surface with gate control feature fusion | |
CN115908772A (en) | Target detection method and system based on Transformer and fusion attention mechanism | |
CN112329662B (en) | Multi-view saliency estimation method based on unsupervised learning | |
CN111652273A (en) | Deep learning-based RGB-D image classification method | |
CN115240079A (en) | Multi-source remote sensing image depth feature fusion matching method | |
CN115861756A (en) | Earth background small target identification method based on cascade combination network | |
CN117132759A (en) | Saliency target detection method based on multiband visual image perception and fusion | |
CN115496788A (en) | Depth completion method using a spatial-domain propagation post-processing module | |
CN117809016A (en) | Cloud layer polarization removal orientation method based on deep learning | |
CN118212127A (en) | Misregistration-based physical instruction generation type hyperspectral super-resolution countermeasure method | |
CN117036442A (en) | Robust monocular depth completion method, system and storage medium | |
CN115797684A (en) | Infrared small target detection method and system based on context information | |
CN115482257A (en) | Motion estimation method integrating deep learning characteristic optical flow and binocular vision | |
CN113435243B (en) | Hyperspectral true downsampling fuzzy kernel estimation method | |
CN109697695A (en) | The ultra-low resolution thermal infrared images interpolation algorithm of visible images guidance | |
CN115223033A (en) | Synthetic aperture sonar image target classification method and system | |
CN114091519A (en) | Shielded pedestrian re-identification method based on multi-granularity shielding perception | |
CN116958800A (en) | Remote sensing image change detection method based on hierarchical attention residual unet++ |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20221220 |