CN115496788A - Depth completion method using spatial propagation post-processing module - Google Patents

Depth completion method using spatial propagation post-processing module

Info

Publication number
CN115496788A
Authority
CN
China
Prior art keywords
depth
post-processing module
map
propagation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202211215408.1A
Other languages
Chinese (zh)
Inventor
颜成钢
杨智文
张杰华
李亮
陈楚翘
高宇涵
胡冀
孙垚棋
王鸿奎
朱尊杰
殷海兵
张继勇
李宗鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202211215408.1A
Publication of CN115496788A
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a depth completion method using a spatial propagation post-processing module. The method first determines a depth estimation network model; then determines a spatial propagation post-processing module; then trains the depth estimation network with the spatial propagation post-processing module attached; and finally performs depth completion with the trained model. Compared with a conventional monocular depth estimation and completion network, the method adds an extra post-processing stage, so the model makes fuller use of the accurate sparse depth information from the LiDAR, yielding a more accurate depth completion result.

Description

Depth completion method using spatial propagation post-processing module
Technical Field
The invention belongs to the field of computer vision and particularly relates to a depth completion method using a spatial propagation post-processing module, aimed at depth sensing systems.
Background
In recent years, with the rapid growth of computer vision applications, depth estimation from a single image, i.e., predicting the distance of each pixel from the camera, has become an important problem. It has wide application in fields such as augmented reality, unmanned aerial vehicle control, autonomous driving, and motion planning. To obtain reliable depth predictions, information from various sensors is used, such as RGB cameras, radar, LiDAR, and ultrasonic sensors. Depth sensors such as LiDAR produce accurate depth measurements at high frequency. However, due to hardware limitations such as the number of scan channels, the acquired depth is typically sparse. The task of estimating dense depth from given sparse depth values is referred to as depth completion.
An affinity matrix is a generic matrix that encodes the distance or similarity between two points in space. In computer vision tasks, it can be viewed as a weighted graph that treats each pixel as a node and connects each pair of pixels by an edge; the weight on an edge reflects the similarity of the pixel pair for the task at hand. For low-level vision tasks such as image filtering, the affinity values should capture the low-level coherence of color and texture; for mid- and high-level vision tasks such as image matting and segmentation, the affinity measure should preserve semantic-level pairwise similarity. The sparse depth acquired by a LiDAR sensor can be propagated to surrounding pixels using an affinity matrix. The spatial propagation post-processing module therefore post-processes the depth completion result by combining the learned affinity matrix with the LiDAR data, yielding a more accurate depth estimate.
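As a purely illustrative sketch of this idea (not the patented method, where the affinities are learned by the network rather than hand-crafted), the following builds 8-neighbor affinities from color similarity and uses them for one propagation step; all names and parameters are hypothetical.

```python
# Illustrative only: a hand-crafted 8-neighbor affinity built from color
# similarity, used for one linear propagation step on a depth map.
import numpy as np

def color_affinity_step(depth, rgb, sigma=10.0):
    """One propagation step where each pixel takes a weighted average of its
    8 neighbors, weighted by color similarity (hypothetical example)."""
    H, W = depth.shape
    out = depth.copy()
    offsets = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1) if (a, b) != (0, 0)]
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            weights, values = [], []
            for a, b in offsets:
                diff = rgb[i, j] - rgb[i - a, j - b]
                w = np.exp(-np.dot(diff, diff) / (2 * sigma ** 2))  # color similarity
                weights.append(w)
                values.append(depth[i - a, j - b])
            weights = np.array(weights) / (np.sum(weights) + 1e-8)  # normalize
            out[i, j] = np.dot(weights, values)
    return out
```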
Disclosure of Invention
The present invention starts from the observation that, with the rapid growth of computer vision applications, depth sensing and estimation is critical in a wide range of engineering applications. However, existing depth sensors, including LiDAR, structured-light depth sensors, and stereo cameras, each have their own limitations. For example, top-of-the-line 3D LiDAR is costly (up to $75,000 per unit) and provides only sparse measurements for distant objects. Structured-light depth sensors (e.g., Kinect) are sensitive to sunlight, consume considerable power, and have short ranging distances. How to reduce the cost of depth estimation while improving its accuracy is therefore a problem worth addressing.
To address the deficiencies of the prior art, the invention provides a depth completion method using a spatial propagation post-processing module. For the monocular depth completion problem, the method optimizes the conventional monocular depth estimation network. To cooperate with the subsequent spatial propagation post-processing module, the depth estimation network is made to additionally output an affinity map (affinity matrix): the output of the conventional monocular depth estimation network is increased from 1 channel to 9 channels, where 1 channel serves as the depth map and the other 8 channels represent the affinity map. The spatial propagation post-processing module can then refine the estimate by combining the affinity information learned by the network with the sparse depth map. This post-processing makes the estimate more accurate, and because only an ordinary LiDAR is used to acquire sparse depth information, combined with an ordinary RGB camera, the cost of depth estimation can be effectively reduced.
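A minimal sketch of this 9-channel output convention, assuming a PyTorch batch × channel × height × width tensor layout; variable names are illustrative.

```python
import torch

# network_output: (B, 9, H, W) tensor produced by the depth estimation network
network_output = torch.randn(2, 9, 228, 304)

initial_depth = network_output[:, :1]   # 1 channel: the initial depth map
affinity_map  = network_output[:, 1:]   # 8 channels: affinities to the 8 neighbors
```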
A depth completion method using a spatial propagation post-processing module comprises the following specific steps:
Step 1: determine a depth estimation network model;
Step 2: determine a spatial propagation post-processing module;
Step 3: train the depth estimation network with the spatial propagation post-processing module attached;
Step 4: perform depth completion with the trained model.
Further, the specific method of step 1 is as follows.
The encoder uses a ResNet-50 neural network and the decoder uses standard upsampling; each residual block passes its feature values to the corresponding upsampling layer, i.e., skip connections are added. These two parts constitute a depth estimation network that receives the RGB map from the camera and the sparse depth map from the LiDAR, and outputs an initial depth map and an affinity map.
Further, the specific method of step 2 is as follows.
The spatial propagation post-processing module adopts a linear propagation scheme: according to the affinity map output by the network model determined in step 1, the depth value of each pixel in the initial depth map is propagated to its neighborhood, and finally the depth values at the corresponding positions are replaced by those of the sparse depth map. The propagation process is implemented with recursive convolution operations.
Given the depth map $D_0$ output by the depth estimation network, the convolution transformation with kernel size k applied at each iteration is

$$D_{i,j,t+1} = \kappa_{i,j}(0,0)\odot D_{i,j,t} + \sum_{a,b,\,(a,b)\neq(0,0)} \kappa_{i,j}(a,b)\odot D_{i-a,j-b,t}$$

where the transformation kernel $\hat{\kappa}_{i,j}$ is the output of the spatial propagation post-processing module and is spatially dependent on the input image; t denotes the t-th iteration; $\kappa_{i,j}$ is the normalized transformation kernel; i and j denote the i-th row and j-th column of the image; a and b denote the relative position within $\kappa_{i,j}$ (e.g., a = 0, b = 0 is the center of the kernel and a = 1, b = 1 is its lower-right corner); $D_{i,j,t+1}$ is the depth value at row i, column j of the depth map at iteration t+1; $D_{i-a,j-b,t}$ is the depth value at row i-a, column j-b of the depth map at iteration t; and $\odot$ denotes element-wise multiplication. The kernel size k is set to 3 here and is chosen odd so that the neighborhood around each pixel (i, j) is symmetric. The weights of the convolution kernel are given by the affinity map of step 1; to make the model converge stably, the kernel weights are normalized to the (-1, 1) interval. Then, to ensure that the post-processed depth keeps the same values at the pixels present in the sparse depth map, the following replacement is applied:

$$D_{i,j,t+1} = (1 - m_{i,j})\, D_{i,j,t+1} + m_{i,j}\, D^{s}_{i,j}$$

where $m_{i,j}$ is an indicator function that is 1 if the pixel was acquired by the LiDAR and 0 otherwise, and $D^{s}_{i,j}$ is the depth value at the corresponding position of the LiDAR-acquired sparse depth map.
Further, the specific method of step 3 is as follows.
The training platform is PyTorch. The NYU v2 indoor dataset and the KITTI outdoor dataset are used to train the depth estimation network with the spatial propagation post-processing module attached. The weights of the ResNet-50 network model are initialized from results pre-trained on the ImageNet dataset; the optimizer is stochastic gradient descent (SGD) with the batch size set to 12 and the number of iterations set to 40; the learning rate is initialized to 0.01 and reduced by 20% every 10 iteration cycles; the weight decay is set to 0.00001 for regularization; and 500 pixels are sampled from the raw data to simulate the LiDAR sampling for training.
Further, the specific method of step 4 is as follows.
The trained complete model receives two inputs: RGB video images from the camera and sparse depth information from the LiDAR. An initial dense depth map and the corresponding affinity map are first obtained through the depth estimation network, and the final dense depth map is then obtained through the spatial propagation post-processing module.
The invention has the following beneficial effects:
Compared with a conventional monocular depth estimation and completion network, the method adds an extra post-processing stage, so that the model makes fuller use of the accurate sparse depth information from the LiDAR, yielding a more accurate depth completion result.
Drawings
FIG. 1 is a diagram of an overall network model of the present invention;
FIG. 2 is a schematic representation of the spatial propagation operation of the post-processing portion of the present invention;
Detailed Description
The present invention will be described in detail with reference to the following embodiments.
The depth completion method using the spatial propagation post-processing network is implemented according to the following steps.
Step 1: determining the depth estimation neural network model
The encoder uses the classical ResNet-50 network model and the decoder uses standard upsampling; each residual block passes its feature values to the corresponding upsampling layer, i.e., skip connections are added. The overall model is shown in FIG. 1. In ResNet-50, the input first passes through a convolution operation and then through 4 different residual blocks, and classification is finally completed by an average pooling layer, a fully connected layer, and a classification function. Here the last fully connected layer and the classification part are removed and replaced by 4 upsampling layers. Upsampling is implemented with the bilinear interpolation of the PyTorch platform, and a skip connection is added between each pair of encoding and decoding layers.
The depth estimation network receives the RGB map from the camera and the sparse depth map from the LiDAR, and outputs an initial depth map and an affinity map.
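The following is a minimal sketch of such an encoder-decoder in PyTorch, assuming a torchvision ResNet-50 backbone; the way the sparse depth is fed into the network (here concatenated with the RGB image as a fourth input channel), the decoder channel widths, and all other unstated details are illustrative assumptions rather than the patented implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class DepthCompletionNet(nn.Module):
    """Sketch: ResNet-50 encoder, bilinear-upsampling decoder with skip
    connections, 9-channel output (1 depth + 8 affinity channels)."""
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)  # ImageNet weights in practice
        # Accept 4 input channels: RGB (3) + sparse depth (1) -- an assumption.
        self.stem = nn.Sequential(
            nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False),
            backbone.bn1, backbone.relu, backbone.maxpool)
        self.enc1, self.enc2 = backbone.layer1, backbone.layer2   # 256, 512 channels
        self.enc3, self.enc4 = backbone.layer3, backbone.layer4   # 1024, 2048 channels
        # Decoder: 4 upsampling stages, each fused with the encoder skip connection.
        self.dec4 = nn.Conv2d(2048 + 1024, 512, 3, padding=1)
        self.dec3 = nn.Conv2d(512 + 512, 256, 3, padding=1)
        self.dec2 = nn.Conv2d(256 + 256, 128, 3, padding=1)
        self.dec1 = nn.Conv2d(128, 64, 3, padding=1)
        self.head = nn.Conv2d(64, 9, 1)    # 1 depth channel + 8 affinity channels

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)
        x0 = self.stem(x)
        x1 = self.enc1(x0); x2 = self.enc2(x1)
        x3 = self.enc3(x2); x4 = self.enc4(x3)
        up = lambda t, ref: F.interpolate(t, size=ref.shape[-2:], mode='bilinear',
                                          align_corners=False)
        d = F.relu(self.dec4(torch.cat([up(x4, x3), x3], dim=1)))
        d = F.relu(self.dec3(torch.cat([up(d, x2), x2], dim=1)))
        d = F.relu(self.dec2(torch.cat([up(d, x1), x1], dim=1)))
        d = F.relu(self.dec1(up(d, x)))     # back to the input resolution
        out = self.head(d)
        return out[:, :1], out[:, 1:]       # initial depth map, affinity map

depth, affinity = DepthCompletionNet()(torch.randn(1, 3, 228, 304),
                                        torch.randn(1, 1, 228, 304))
```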
Step 2: determining the spatial propagation post-processing module
The spatial propagation post-processing module adopts linear propagation: according to the affinity map output by the network model determined in step 1, the depth value of each pixel in the initial depth map is propagated (diffused) to its neighborhood, and finally the depth values at the corresponding positions are replaced by those of the sparse depth map. The propagation process is implemented with recursive convolution operations. Convolution is used in practice because it can be implemented efficiently through image vectorization and is therefore suitable for real-time depth estimation. As shown in FIG. 2, spatial propagation is similar to an anisotropic diffusion process, and the affinity map used for diffusion is learned by the depth estimation network model to guide the refinement of the output depth map. Given the depth map $D_0$ output by the depth estimation network, the convolution transformation with kernel size k applied at each iteration t is

$$D_{i,j,t+1} = \kappa_{i,j}(0,0)\odot D_{i,j,t} + \sum_{a,b,\,(a,b)\neq(0,0)} \kappa_{i,j}(a,b)\odot D_{i-a,j-b,t}$$

where the transformation kernel $\hat{\kappa}_{i,j}$ is the output of the spatial propagation module and is spatially dependent on the input image; $\kappa_{i,j}$ is the normalized transformation kernel; i and j denote the i-th row and j-th column of the image; a and b denote the relative position within $\kappa_{i,j}$ (e.g., a = 0, b = 0 is the center of the kernel and a = 1, b = 1 is its lower-right corner); and $\odot$ denotes element-wise multiplication. The kernel size k is set to 3 here and is chosen odd so that the context around each pixel (i, j) is symmetric. To make the model converge stably, the kernel weights are normalized to the (-1, 1) interval. Then, to ensure that the post-processed depth keeps the same values at the pixels present in the sparse depth map, the following replacement is applied:

$$D_{i,j,t+1} = (1 - m_{i,j})\, D_{i,j,t+1} + m_{i,j}\, D^{s}_{i,j}$$

where $m_{i,j}$ is an indicator function that is 1 if the pixel was acquired by the LiDAR and 0 otherwise, and $D^{s}$ is the sparse depth map acquired by the LiDAR.
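The following is a minimal PyTorch sketch of one such propagation iteration, assuming the 8 affinity channels are normalized by the sum of their absolute values (a common choice consistent with the (-1, 1) normalization described above); tensor layouts and names are assumptions, not a transcription of the patented implementation.

```python
import torch
import torch.nn.functional as F

def cspn_step(depth, affinity, sparse_depth, mask):
    """One spatial-propagation iteration.
    depth:        (B, 1, H, W) current depth estimate D_t
    affinity:     (B, 8, H, W) raw neighbor weights from the network
    sparse_depth: (B, 1, H, W) LiDAR sparse depth D^s
    mask:         (B, 1, H, W) m_{i,j}: 1 where a LiDAR measurement exists
    """
    # Normalize the 8 neighbor weights so their absolute values sum to (at most) 1,
    # and give the remaining weight to the center pixel (for stable convergence).
    kappa = affinity / (affinity.abs().sum(dim=1, keepdim=True) + 1e-8)
    center = 1.0 - kappa.sum(dim=1, keepdim=True)

    # Gather the 8 shifted copies of the depth map (k = 3 neighborhood).
    pad = F.pad(depth, (1, 1, 1, 1), mode='replicate')
    offsets = [(a, b) for a in (0, 1, 2) for b in (0, 1, 2) if (a, b) != (1, 1)]
    neighbors = torch.cat(
        [pad[:, :, a:a + depth.shape[2], b:b + depth.shape[3]] for a, b in offsets],
        dim=1)                                              # (B, 8, H, W)

    new_depth = center * depth + (kappa * neighbors).sum(dim=1, keepdim=True)
    # Replacement step: keep the LiDAR measurements where they exist.
    return (1 - mask) * new_depth + mask * sparse_depth

# Usage: iterate a fixed number of times starting from the network's D_0, e.g.
# for _ in range(num_iterations):
#     depth = cspn_step(depth, affinity, sparse_depth, mask)
```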
Step 3: model training
The training platform is PyTorch. The NYU v2 indoor dataset and the KITTI outdoor dataset are used to train the whole network model.
The NYU v2 dataset consists of RGB and depth images collected from 464 different indoor scenes. The official data split is used in this invention: 249 scenes are used for training, and 50K image samples are taken from the training set. Testing uses the 654 labeled images. The original image size is 640 × 480; images are downsampled to half size and center-cropped, and the final network input size is 304 × 228.
The KITTI dataset consists of 22 sequences measured by cameras and a LiDAR (a depth sensor that likewise produces sparse depth maps). Half of the sequences are used for training and the other half for testing. All 46K images in the training sequences are used for training, and 3200 images are randomly drawn from the test sequences for testing. In addition, since the top area of the images contains no depth, the invention uses the bottom 912 × 228 crop of each image as the network input.
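A small sketch of the input preprocessing described above; the exact crop placement (e.g., centering the crops horizontally) is an assumption.

```python
import torch
import torch.nn.functional as F

def preprocess_nyu(img):
    """NYU v2: 640x480 -> downsample to half size -> center crop to 304x228."""
    img = F.interpolate(img, scale_factor=0.5, mode='bilinear', align_corners=False)
    _, _, h, w = img.shape                       # 240 x 320 after downsampling
    top, left = (h - 228) // 2, (w - 304) // 2
    return img[:, :, top:top + 228, left:left + 304]

def preprocess_kitti(img):
    """KITTI: keep a bottom 912x228 region, where depth is available."""
    _, _, h, w = img.shape
    left = (w - 912) // 2                        # horizontal placement assumed
    return img[:, :, h - 228:h, left:left + 912]
```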
The weights of the ResNet-50 model in the encoder of the network are initialized from results pre-trained on the ImageNet dataset; the optimizer is stochastic gradient descent (SGD) with the batch size set to 12 and the number of iterations set to 40; the learning rate is initialized to 0.01 and reduced by 20% every 10 iteration cycles; the weight decay is set to 0.00001 for regularization; and 500 pixels are sampled from the raw data to simulate the LiDAR sampling for training.
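A minimal sketch of this training configuration, reusing the DepthCompletionNet and cspn_step sketches above; the momentum value, the loss function, and the data loader are assumptions, since the description does not specify them.

```python
import torch

model = DepthCompletionNet()          # encoder weights assumed ImageNet-initialized
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-5)   # momentum is assumed
# Reduce the learning rate by 20% every 10 iteration cycles.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.8)

def sample_sparse_depth(dense_depth, num_points=500):
    """Simulate LiDAR by keeping 500 random pixels of the ground-truth depth."""
    mask = torch.zeros_like(dense_depth)
    b, _, h, w = dense_depth.shape
    for i in range(b):
        idx = torch.randperm(h * w)[:num_points]
        mask.view(b, -1)[i, idx] = 1.0
    return dense_depth * mask, mask

# Training loop outline (data loader, loss, and iteration count are placeholders):
# for epoch in range(40):
#     for rgb, gt_depth in train_loader:
#         sparse, mask = sample_sparse_depth(gt_depth)
#         depth0, affinity = model(rgb, sparse)
#         for _ in range(num_iterations):
#             depth0 = cspn_step(depth0, affinity, sparse, mask)
#         loss = torch.nn.functional.l1_loss(depth0, gt_depth)   # assumed loss
#         optimizer.zero_grad(); loss.backward(); optimizer.step()
#     scheduler.step()
```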
Step 4: using the model
The trained complete model receives two inputs: RGB video images from the camera and sparse depth information from the LiDAR. An initial dense depth map and the corresponding affinity map are first obtained through the depth estimation network, and the final dense depth map is then obtained through the spatial propagation post-processing module.
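Putting the pieces together, a hedged end-to-end inference sketch reusing the earlier sketches; the number of propagation iterations is an assumption, as it is not stated in the description.

```python
import torch

@torch.no_grad()
def complete_depth(model, rgb, sparse_depth, num_iterations=24):
    """rgb: (B, 3, H, W); sparse_depth: (B, 1, H, W) with zeros where no LiDAR point."""
    mask = (sparse_depth > 0).float()
    depth, affinity = model(rgb, sparse_depth)       # initial depth + affinity map
    for _ in range(num_iterations):                  # spatial propagation post-processing
        depth = cspn_step(depth, affinity, sparse_depth, mask)
    return depth
```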

Claims (5)

1. A depth completion method using a spatial propagation post-processing module, characterized by comprising the following steps:
Step 1: determine a depth estimation network model;
Step 2: determine a spatial propagation post-processing module;
Step 3: train the depth estimation network with the spatial propagation post-processing module attached;
Step 4: perform depth completion with the trained model.
2. The depth completion method using a spatial propagation post-processing module according to claim 1, wherein the specific method of step 1 is as follows:
the encoder part uses a ResNet50 neural network, the decoder part uses standard upsampling, and each residual block needs to carry out numerical value transmission with a corresponding upsampling layer, namely skip connection is added; these two parts constitute a depth estimation network that receives the RGB map from the camera and the sparse depth map from the LiDAR and outputs an initial depth map and an affinity map.
3. The depth completion method using a spatial propagation post-processing module according to claim 2, wherein the specific method of step 2 is as follows:
the spatial domain propagation post-processing module adopts a linear propagation mode, propagates the depth value of each pixel in the initial depth map to the periphery according to the affinity map output by the network model determined in the step 1, and finally replaces the depth value of the corresponding position by the sparse depth map; and the process of propagation is realized by using a recursive convolution operation mode;
given the depth map $D_0$ output by the depth estimation network, the convolution transformation with kernel size k applied at each iteration is

$$D_{i,j,t+1} = \kappa_{i,j}(0,0)\odot D_{i,j,t} + \sum_{a,b,\,(a,b)\neq(0,0)} \kappa_{i,j}(a,b)\odot D_{i-a,j-b,t}$$

wherein the transformation kernel $\hat{\kappa}_{i,j}$ is the output of the spatial propagation post-processing module and is spatially dependent on the input image, t denotes the t-th iteration, $\kappa_{i,j}$ is the normalized transformation kernel, i and j denote the i-th row and j-th column of the image, a and b denote the relative position within $\kappa_{i,j}$, $D_{i,j,t+1}$ denotes the depth value at row i, column j of the depth map at iteration t+1, $D_{i-a,j-b,t}$ denotes the depth value at row i-a, column j-b of the depth map at iteration t, and $\odot$ denotes element-wise multiplication; the kernel size k is set to 3 here and is chosen odd so that the neighborhood around each pixel (i, j) is symmetric; the weights of the convolution kernel are given by the affinity map of step 1, and to make the model converge stably, the kernel weights are normalized to the (-1, 1) interval; then, to ensure that the post-processed depth keeps the same values at the pixels present in the sparse depth map, the following replacement is applied:

$$D_{i,j,t+1} = (1 - m_{i,j})\, D_{i,j,t+1} + m_{i,j}\, D^{s}_{i,j}$$

wherein $m_{i,j}$ is an indicator function used to determine whether the pixel was acquired by the LiDAR, taking the value 1 if so and 0 otherwise, and $D^{s}_{i,j}$ denotes the depth value at the corresponding position of the LiDAR-acquired sparse depth map.
4. The depth completion method using a spatial propagation post-processing module according to claim 3, wherein the specific method of step 3 is as follows:
the training platform adopts a Pythrch; respectively adopting an NYU v2 indoor data set and a KITTI outdoor data set to train the depth estimation network added with the airspace propagation post-processing module; wherein the weight of the ResNet-50 network model is initialized using results pre-trained on the ImageNet dataset; the optimizer adopts a random gradient descent SGD optimizer, the batch size is set to be 12, and the iteration number is set to be 40; the learning rate was initialized to 0.01 and reduced by 20% every 10 iteration cycles; weight decay is set to 0.00001 for regularization; 500 pixel points are collected on the original data to simulate the LiDAR sampling effect for training.
5. The depth completion method using a spatial propagation post-processing module according to claim 4, wherein the specific method of step 4 is as follows:
the trained complete model needs to receive two parts of input: RGB video images from a camera and sparse depth information from LiDAR; firstly, an initial dense depth map and a corresponding affinity map are obtained through a depth estimation network, and a final dense depth map is obtained through a spatial domain propagation post-processing module.
CN202211215408.1A 2022-09-30 2022-09-30 Depth completion method using spatial propagation post-processing module Withdrawn CN115496788A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211215408.1A CN115496788A (en) 2022-09-30 2022-09-30 Depth completion method using spatial propagation post-processing module

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211215408.1A CN115496788A (en) 2022-09-30 2022-09-30 Depth completion method using spatial propagation post-processing module

Publications (1)

Publication Number Publication Date
CN115496788A true CN115496788A (en) 2022-12-20

Family

ID=84473276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211215408.1A Withdrawn CN115496788A (en) 2022-09-30 2022-09-30 Deep completion method using airspace propagation post-processing module

Country Status (1)

Country Link
CN (1) CN115496788A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117953029A (en) * 2024-03-27 2024-04-30 北京科技大学 General depth map completion method and device based on depth information propagation
CN117953029B (en) * 2024-03-27 2024-06-07 北京科技大学 General depth map completion method and device based on depth information propagation

Similar Documents

Publication Publication Date Title
US10353271B2 (en) Depth estimation method for monocular image based on multi-scale CNN and continuous CRF
CN111625608B (en) Method and system for generating electronic map according to remote sensing image based on GAN model
CN114565655B (en) Depth estimation method and device based on pyramid segmentation attention
CN110910437B (en) Depth prediction method for complex indoor scene
CN115311186B (en) Cross-scale attention confrontation fusion method and terminal for infrared and visible light images
CN110399820B (en) Visual recognition analysis method for roadside scene of highway
CN113554032B (en) Remote sensing image segmentation method based on multi-path parallel network of high perception
CN111797920B (en) Remote sensing extraction method and system for depth network impervious surface with gate control feature fusion
CN115908772A (en) Target detection method and system based on Transformer and fusion attention mechanism
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
CN111652273A (en) Deep learning-based RGB-D image classification method
CN115240079A (en) Multi-source remote sensing image depth feature fusion matching method
CN115861756A (en) Earth background small target identification method based on cascade combination network
CN117132759A (en) Saliency target detection method based on multiband visual image perception and fusion
CN115496788A (en) 2022-12-20 Depth completion method using spatial propagation post-processing module
CN117809016A (en) Cloud layer polarization removal orientation method based on deep learning
CN118212127A (en) Misregistration-based physical instruction generation type hyperspectral super-resolution countermeasure method
CN117036442A (en) Robust monocular depth completion method, system and storage medium
CN115797684A (en) Infrared small target detection method and system based on context information
CN115482257A (en) Motion estimation method integrating deep learning characteristic optical flow and binocular vision
CN113435243B (en) Hyperspectral true downsampling fuzzy kernel estimation method
CN109697695A (en) The ultra-low resolution thermal infrared images interpolation algorithm of visible images guidance
CN115223033A (en) Synthetic aperture sonar image target classification method and system
CN114091519A (en) Shielded pedestrian re-identification method based on multi-granularity shielding perception
CN116958800A (en) Remote sensing image change detection method based on hierarchical attention residual unet++

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221220