CN116468768B - Scene depth completion method based on conditional variation self-encoder and geometric guidance - Google Patents

Scene depth completion method based on conditional variation self-encoder and geometric guidance

Info

Publication number
CN116468768B
CN116468768B
Authority
CN
China
Prior art keywords
depth
map
point cloud
depth map
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310422520.0A
Other languages
Chinese (zh)
Other versions
CN116468768A (en)
Inventor
魏明强
吴鹏
燕雪峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202310422520.0A priority Critical patent/CN116468768B/en
Publication of CN116468768A publication Critical patent/CN116468768A/en
Application granted granted Critical
Publication of CN116468768B publication Critical patent/CN116468768B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/521Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01BMEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00Measuring arrangements characterised by the use of optical techniques
    • G01B11/24Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/86Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88Lidar systems specially adapted for specific applications
    • G01S17/89Lidar systems specially adapted for specific applications for mapping or imaging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • G06T2207/10044Radar image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Optics & Photonics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a scene depth completion method based on a conditional variation self-encoder and geometric guidance, which comprises the following steps: acquiring a color image, a sparse depth map and a dense depth map in an automatic driving scene; designing a conditional variation self-encoder with a prior network and a posterior network, inputting the color image and the sparse depth map into the prior network to extract features, and inputting the color image, the sparse depth map and the dense depth map into the posterior network to extract features; and converting the sparse depth map into a point cloud using the camera intrinsic parameters (focal length and optical center coordinates), extracting geometric spatial features with a point cloud up-sampling model, and mapping the geometric spatial features back onto the sparse depth map. The invention overcomes the problem that the data acquired by a LiDAR are too sparse, so that a low-cost LiDAR with fewer beams can obtain more accurate and dense depth information, providing a cost-effective solution for industries that need accurate, dense depth data, such as automatic driving and robot environment perception.

Description

Scene depth completion method based on conditional variation self-encoder and geometric guidance
Technical Field
The invention relates to the technical field of depth map completion, in particular to a scene depth completion method based on a conditional variation self-encoder and geometric guidance.
Background
Human perception, understanding, and experience of the surrounding environment rely on visually acquired three-dimensional scene information. Computer vision imitates this behavior by using various sensors as visual organs to acquire scene information and thereby recognize and understand the scene; among this information, depth plays a key role in fields such as robotics, automatic driving, and augmented reality. In automatic driving, the distances between the ego vehicle and other vehicles, pedestrians, obstacles and the like must be perceived throughout the drive, and fully automatic Level 5 driving requires ranging capability accurate to the centimeter. Currently, LiDAR is the primary active distance sensor in automatic driving. Compared with the two-dimensional RGB image acquired by a color camera, the depth map acquired by a LiDAR (the depth map and the point cloud can be converted into each other through the camera intrinsic parameters) provides accurate depth distances, so the positions of 3D targets in the surrounding environment can be perceived precisely. However, a single LiDAR can only emit a limited number of laser beams (16, 32 or 64 lines) in the vertical direction, so the acquired point cloud is extremely sparse (pixels with valid depth values account for only about 5% of the color image), which seriously affects downstream tasks such as 3D target detection and three-dimensional environment perception.
Disclosure of Invention
The invention aims to provide a scene depth completion method based on a conditional variation self-encoder and geometric guidance, so as to address the key problems of data sparsity and missing measurements caused by existing depth imaging devices such as LiDAR.
In order to achieve the above purpose, the present invention provides the following technical solutions: a scene depth completion method based on conditional variation self-encoder and geometric guidance, comprising the steps of:
acquiring a color image, a sparse depth map and a dense depth map in an automatic driving scene;
designing a conditional variation self-encoder with a prior network and a posterior network, inputting the color image and the sparse depth map into the prior network to extract features, and inputting the color image, the sparse depth map and the dense depth map into the posterior network to extract features;
converting the sparse depth map into a point cloud using the camera intrinsic parameters (focal length and optical center), extracting geometric spatial features with a point cloud up-sampling model, and mapping the geometric spatial features back onto the sparse depth map;
fusing the image features and the point cloud features with a dynamic graph message propagation module;
generating a preliminary depth completion map with a U-shaped encoder-decoder based on a residual network;
and inputting the preliminarily predicted completion depth map into a confidence uncertainty estimation module to achieve the final depth completion optimization.
Preferably, the acquiring of the color image and the sparse depth map in the automatic driving scene includes:
capturing a color image and a sparse depth map in an automatic driving scene using a color camera and a LiDAR;
converting the sparse depth map into a dense depth map with the Sparsity Invariant CNNs algorithm, to be used as the real label for auxiliary training.
Preferably, the designing of the conditional variation self-encoder with a prior network and a posterior network, inputting the color image and the sparse depth map into the prior network to extract features, and inputting the color image, the sparse depth map and the dense depth map into the posterior network to extract features, includes:
designing, based on feature extraction modules of the ResNet structure, a prior network and a posterior network with identical structures as the conditional variation self-encoder;
inputting the color image and the sparse depth map into the prior network to extract the last-layer feature map Prior, inputting the color image, the sparse depth map and the real label into the posterior network to extract the last-layer feature map Posterior, then computing the mean and variance of the Prior and Posterior feature maps respectively to obtain the probability distributions D1 and D2 of the respective features, and supervising the loss between the distributions D1 and D2 with the Kullback-Leibler divergence loss function, so that the prior network learns the real-label features of the posterior network.
Preferably, the converting of the sparse depth map into a point cloud using the camera intrinsic parameters, extracting geometric spatial features with a point cloud up-sampling model, and mapping them back onto the sparse depth map includes:
converting the sparse depth image pixels (u_i, v_i) from the pixel coordinate system into the camera coordinate system to obtain the point cloud coordinates (x_i, y_i, z_i), forming the sparse point cloud data S:
x_i = (u_i − c_x) · d_i / f_x, y_i = (v_i − c_y) · d_i / f_y, z_i = d_i,
wherein (c_x, c_y) are the optical center coordinates of the camera, f_x and f_y are the focal lengths of the camera along the x-axis and y-axis, and d_i is the depth value at (u_i, v_i); for the real-label depth map, a dense label point cloud S1 is also generated using the above formula;
randomly sampling the point cloud several times to obtain point cloud sets of different sizes, and, for each point cloud set, aggregating the 16 nearest points around each point with the KNN nearest-neighbour algorithm and inputting these 16 points into a geometry-aware neural network to extract the local geometric feature of that point;
adding the sparse point cloud feature extracted for each point to the original point cloud coordinates (x_i, y_i, z_i) to obtain the point cloud encoding feature Q, inputting Q into a four-fold up-sampling multi-layer perceptron network to obtain the predicted dense point cloud S2, and computing the loss between the true dense point cloud S1 and the predicted dense point cloud S2 with the Chamfer Distance loss function; the CD loss is calculated as:
CD(S1, S2) = Σ_{x∈S1} min_{y∈S2} ||x − y|| + Σ_{y∈S2} min_{x∈S1} ||y − x||,
wherein the first term is the sum of minimum distances from any point x of S1 to S2, and the second term is the sum of minimum distances from any point y of S2 to S1.
Preferably, the fusing of the image features and the point cloud features with the dynamic graph message propagation module includes:
designing two encoding networks with the same structure, each encoder consisting of a 5-layer ResNet module; inputting the color image and the sparse depth map into the RGB-branch encoder to extract five feature maps of different scales, L1, L2, L3, L4 and L5, and inputting the point cloud feature Q and the sparse depth map into the point-cloud-branch encoder to extract five feature maps of different scales, P1, P2, P3, P4 and P5;
for L1, L2, L3, L4 and L5, obtaining pixels with different receptive fields through dilated convolution, and using deformable convolution to explore the coordinate offset of each pixel so as to dynamically aggregate the strongly correlated surrounding feature values of each pixel, obtaining the features T1, T2, T3, T4 and T5;
adding the dynamic-graph-enriched features T1, T2, T3, T4 and T5 to the point cloud encoding feature maps P1, P2, P3, P4 and P5 to obtain point cloud feature maps M1, M2, M3, M4 and M5 containing both semantic and geometric information.
Preferably, the generating of the preliminary depth completion map with a U-shaped encoder-decoder based on a residual network includes:
designing a multi-scale decoder structure corresponding to the encoder structure, forming an encoder-decoder network with a U-Net structure;
inputting the feature maps L1, L2, L3, L4 and L5 generated by the RGB branch into the U-Net to predict the first coarse depth completion map Depth1 and the confidence map C1;
inputting the feature maps M1, M2, M3, M4 and M5 generated by the point cloud branch into the U-Net to predict the second coarse depth completion map Depth2 and the confidence map C2.
Preferably, the inputting of the preliminarily predicted completion depth map into the confidence uncertainty estimation module to achieve the final depth completion optimization includes:
adding the generated confidence maps C1 and C2 to obtain a feature map C, performing uncertainty prediction on the feature map C with a Softmax function, and predicting the uncertainty proportions F1 and F2 of each confidence map pixel by pixel;
multiplying the uncertainty maps F1 and F2 with the coarse depth maps Depth1 and Depth2 to obtain the final optimized depth completion map.
Compared with the prior art, the invention has the beneficial effects that:
the feature distribution in the true dense depth map is learned by utilizing the condition variation self-encoder, the color image and the sparse depth map are guided to generate more valuable depth features, the space structural features under different modes are captured by utilizing the point cloud features of the three-dimensional space, the geometric perception capability of a network is enhanced, auxiliary information is provided for predicting more accurate depth values, and the dynamic map message propagation module skillfully fuses the features between the color image and the point cloud to realize high-precision depth complement prediction, so that the problem that data acquired by a laser radar is too sparse can be overcome, the low-cost laser radar with fewer wire harnesses can obtain more accurate dense depth information, and a cost-effective solution is provided for industries needing accurate dense depth data such as automatic driving, robot environment perception and the like.
Drawings
FIG. 1 is a flow chart of a scene depth completion method based on a conditional variation self-encoder and geometric guidance provided by an embodiment of the invention;
FIG. 2 is a depth completion result diagram of the scene depth completion method based on a conditional variation self-encoder and geometric guidance according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The main execution body of the method in this embodiment is a terminal, and the terminal may be a device such as a mobile phone, a tablet computer, a PDA, a notebook or a desktop, but of course, may be another device with a similar function, and this embodiment is not limited thereto.
Referring to fig. 1 and 2, the present invention provides a scene depth completion method based on a conditional variation self-encoder and geometric guidance, which is applied to automatic driving scene depth completion, and includes:
step S1, a color image, a sparse depth map and a dense depth map in an automatic driving scene are obtained.
Specifically, step S1 further includes the following steps:
s101, capturing a color image and a sparse depth map in an automatic driving scene by using a color camera and a laser radar;
s102, changing the sparse depth map into a dense depth map by using a Sparsity Invariant CNNs algorithm as real tag auxiliary training.
The automatic driving vehicle is mainly equipped with a color camera and a LiDAR, which acquire RGB images and depth maps respectively; in addition, the method requires a completed depth map to be generated as the training label. The specific steps are as follows:
capturing color images and depth maps in the automatic driving scene using a color camera and a Velodyne HDL-64E LiDAR; converting the sparse depth map into a dense depth map with the Sparsity Invariant CNNs algorithm to serve as the real label.
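For readability, the label-densification step can be pictured as a stack of mask-aware (sparsity-invariant) convolutions. The sketch below is only an illustration of that idea under assumed layer sizes; it is not the exact Sparsity Invariant CNNs network used to generate the training labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseConv(nn.Module):
    """Sparsity-invariant convolution: features are convolved only where the
    validity mask is 1, then renormalized by the number of valid pixels."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=pad, bias=False)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # fixed all-ones kernel that counts valid pixels inside each window
        self.register_buffer("ones", torch.ones(1, 1, kernel_size, kernel_size))
        self.pad = pad

    def forward(self, x, mask):
        # x: (B, C, H, W) sparse depth features; mask: (B, 1, H, W) float, 1 = valid
        num = self.conv(x * mask)
        den = F.conv2d(mask, self.ones, padding=self.pad).clamp(min=1e-5)
        out = num / den + self.bias.view(1, -1, 1, 1)
        # propagate validity: a window is valid if it contains at least one valid pixel
        new_mask = F.max_pool2d(mask, kernel_size=2 * self.pad + 1, stride=1, padding=self.pad)
        return out, new_mask
```

Stacking several such layers and regressing a one-channel dense map from the final features gives a rough stand-in for the densification procedure.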
And S2, designing a conditional variation self-encoder with a prior network and a posterior network, inputting the color image and the sparse depth map into the prior network to extract features, and inputting the color image, the sparse depth map and the dense depth map into the posterior network to extract features.
Specifically, step S2 further includes the following steps:
S201, designing, based on feature extraction modules of the ResNet structure, a prior network and a posterior network with identical structures as the conditional variation self-encoder;
S202, inputting the color image and the sparse depth map into the prior network to extract the last-layer feature map Prior, inputting the color image, the sparse depth map and the real label into the posterior network to extract the last-layer feature map Posterior, then computing the mean and variance of the Prior and Posterior feature maps respectively to obtain the probability distributions D1 and D2 of the respective features, and supervising the loss between the distributions D1 and D2 with the Kullback-Leibler divergence loss function, so that the prior network learns the real-label features of the posterior network.
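As an illustration of step S202, the sketch below estimates per-channel means and variances of the two last-layer feature maps and evaluates the closed-form KL divergence between the resulting diagonal Gaussians; the tensor layout and the KL direction are assumptions, since the patent text does not fix them.

```python
import torch

def gaussian_kl(mu_q, var_q, mu_p, var_p, eps=1e-6):
    """Closed-form KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians."""
    var_q = var_q.clamp(min=eps)
    var_p = var_p.clamp(min=eps)
    kl = 0.5 * (torch.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.mean()

def cvae_kl_loss(prior_feat, posterior_feat):
    # prior_feat, posterior_feat: (B, C, H, W) last-layer feature maps
    # Per-channel statistics define the distributions D1 (prior) and D2 (posterior).
    mu_p = prior_feat.flatten(2).mean(-1)
    var_p = prior_feat.flatten(2).var(-1)
    mu_q = posterior_feat.flatten(2).mean(-1)
    var_q = posterior_feat.flatten(2).var(-1)
    return gaussian_kl(mu_q, var_q, mu_p, var_p)
```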
And S3, converting the sparse depth map into a point cloud using the camera intrinsic parameters, extracting geometric spatial features with a point cloud up-sampling model, and mapping the geometric spatial features back onto the sparse depth map.
Specifically, step S3 further includes the following steps:
S301, converting the sparse depth image pixels (u_i, v_i) from the pixel coordinate system into the camera coordinate system to obtain the point cloud coordinates (x_i, y_i, z_i), forming the sparse point cloud data S:
x_i = (u_i − c_x) · d_i / f_x, y_i = (v_i − c_y) · d_i / f_y, z_i = d_i,
wherein (c_x, c_y) are the optical center coordinates of the camera, f_x and f_y are the focal lengths of the camera along the x-axis and y-axis, and d_i is the depth value at (u_i, v_i); for the real-label depth map, a dense label point cloud S1 is also generated using the above formula (a back-projection sketch is given after step S303);
S302, randomly sampling the point cloud several times to obtain point cloud sets of different sizes, and, for each point cloud set, aggregating the 16 nearest points around each point with the KNN nearest-neighbour algorithm and inputting these 16 points into a geometry-aware neural network to extract the local geometric feature of that point;
S303, adding the sparse point cloud feature extracted for each point to the original point cloud coordinates (x_i, y_i, z_i) to obtain the point cloud encoding feature Q, inputting Q into a four-fold up-sampling multi-layer perceptron network to obtain the predicted dense point cloud S2, and computing the loss between the true dense point cloud S1 and the predicted dense point cloud S2 with the Chamfer Distance loss function; the CD loss is calculated as:
CD(S1, S2) = Σ_{x∈S1} min_{y∈S2} ||x − y|| + Σ_{y∈S2} min_{x∈S1} ||y − x||,
wherein the first term is the sum of minimum distances from any point x of S1 to S2, and the second term is the sum of minimum distances from any point y of S2 to S1.
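Two short sketches of steps S301 and S303. The first back-projects a depth map into a camera-frame point cloud following the intrinsics defined above (the function name and the valid-pixel masking are illustrative):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a (H, W) depth map into an (N, 3) camera-frame point cloud,
    keeping only pixels with a valid (non-zero) depth value."""
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    valid = depth > 0
    d = depth[valid]
    x = (u[valid] - cx) * d / fx
    y = (v[valid] - cy) * d / fy
    z = d
    return np.stack([x, y, z], axis=1)
```

The second evaluates the two-term Chamfer Distance with brute-force pairwise distances; it matches the formula above but is not an efficient implementation for large clouds:

```python
import torch

def chamfer_distance(s1, s2):
    """Two-sided Chamfer distance between point clouds s1: (N, 3) and s2: (M, 3)."""
    d = torch.cdist(s1, s2)            # (N, M) pairwise Euclidean distances
    term1 = d.min(dim=1).values.sum()  # each point of S1 to its nearest point in S2
    term2 = d.min(dim=0).values.sum()  # each point of S2 to its nearest point in S1
    return term1 + term2
```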
And S4, fusing the image features and the point cloud features with a dynamic graph message propagation module.
Specifically, step S4 further includes the following steps:
S401, designing two encoding networks with the same structure, each encoder consisting of a 5-layer ResNet module; inputting the color image and the sparse depth map into the RGB-branch encoder to extract five feature maps of different scales, L1, L2, L3, L4 and L5, and inputting the point cloud feature Q and the sparse depth map into the point-cloud-branch encoder to extract five feature maps of different scales, P1, P2, P3, P4 and P5;
S402, for L1, L2, L3, L4 and L5, obtaining pixels with different receptive fields through dilated convolution, and using deformable convolution to explore the coordinate offset of each pixel so as to dynamically aggregate the strongly correlated surrounding feature values of each pixel, obtaining the features T1, T2, T3, T4 and T5;
S403, adding the dynamic-graph-enriched features T1, T2, T3, T4 and T5 to the point cloud encoding feature maps P1, P2, P3, P4 and P5 to obtain point cloud feature maps M1, M2, M3, M4 and M5 containing both semantic and geometric information.
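A single-scale sketch of the dynamic graph message propagation of steps S401–S403: a dilated convolution enlarges the receptive field of the RGB feature map Lk, a deformable convolution aggregates the surrounding positions selected by learned offsets to give Tk, and Tk is added to the point cloud feature map Pk to give Mk. Channel counts, the dilation rate, and the use of torchvision's DeformConv2d are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DynamicGraphFusion(nn.Module):
    """Fuse one RGB feature map L_k with the point-cloud feature map P_k of the same scale."""
    def __init__(self, channels, dilation=2):
        super().__init__()
        # dilated conv: larger receptive field at unchanged resolution
        self.dilated = nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation)
        # 2 * 3 * 3 offset values per location for a 3x3 deformable kernel
        self.offset = nn.Conv2d(channels, 2 * 3 * 3, 3, padding=1)
        self.deform = DeformConv2d(channels, channels, 3, padding=1)

    def forward(self, l_k, p_k):
        x = torch.relu(self.dilated(l_k))
        offsets = self.offset(x)       # predicted per-pixel sampling offsets
        t_k = self.deform(x, offsets)  # dynamically aggregated feature T_k
        return t_k + p_k               # point-cloud map enriched with semantics: M_k
```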
And S5, generating a preliminary depth completion map with a U-shaped encoder-decoder based on a residual network.
Specifically, step S5 further includes the following steps:
S501, designing a multi-scale decoder structure corresponding to the encoder structure in step S4, forming an encoder-decoder network with a U-Net structure;
S502, inputting the feature maps L1, L2, L3, L4 and L5 generated by the RGB branch into the U-Net to predict the first coarse depth completion map Depth1 and the confidence map C1;
S503, inputting the feature maps M1, M2, M3, M4 and M5 generated by the point cloud branch into the U-Net to predict the second coarse depth completion map Depth2 and the confidence map C2.
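For steps S502 and S503, each U-Net branch can end in a small head that outputs a one-channel coarse depth map together with a one-channel confidence map; the sketch below shows only that output head (layer widths are assumptions, and the full encoder-decoder is omitted):

```python
import torch.nn as nn

class DepthConfidenceHead(nn.Module):
    """Final layer of each U-Net branch: one coarse depth map plus one confidence map."""
    def __init__(self, in_ch):
        super().__init__()
        self.depth = nn.Conv2d(in_ch, 1, kernel_size=3, padding=1)
        self.conf = nn.Conv2d(in_ch, 1, kernel_size=3, padding=1)

    def forward(self, feat):
        return self.depth(feat), self.conf(feat)
```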
And S6, inputting the preliminarily predicted completion depth map into a confidence uncertainty estimation module to achieve the final depth completion optimization.
Specifically, step S6 further includes the following steps:
S601, adding the confidence maps C1 and C2 generated in step S5 to obtain a feature map C, performing uncertainty prediction on the feature map C with a Softmax function, and predicting the uncertainty proportions F1 and F2 of each confidence map pixel by pixel;
S602, multiplying the uncertainty maps F1 and F2 with the coarse depth maps Depth1 and Depth2 to obtain the final optimized depth completion map.
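Steps S601 and S602 can be read as a per-pixel Softmax over the two confidence maps followed by a weighted combination of the two coarse predictions; a minimal sketch under that reading (stacking the confidence maps before the Softmax is an assumption):

```python
import torch

def fuse_depths(depth1, depth2, conf1, conf2):
    """Per-pixel softmax over the two confidence maps, then weighted fusion.
    All inputs have shape (B, 1, H, W)."""
    weights = torch.softmax(torch.cat([conf1, conf2], dim=1), dim=1)  # (B, 2, H, W)
    f1, f2 = weights[:, 0:1], weights[:, 1:2]
    return f1 * depth1 + f2 * depth2
```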
In this embodiment, the conditional variation self-encoder first learns the feature distribution of the true dense depth map and guides the color image and the sparse depth map to generate more valuable depth features; secondly, the point cloud features of the three-dimensional space capture spatial structural features under different modalities, enhancing the geometric perception capability of the network and providing auxiliary information for predicting more accurate depth values; finally, the dynamic graph message propagation module fuses the features of the color image and the point cloud to achieve high-precision depth completion prediction.
In addition, it should be noted that the combination of the technical features described in the present invention is not limited to the combination described in the claims or the combination described in the specific embodiments, and all the technical features described in the present invention may be freely combined or combined in any manner unless contradiction occurs between them.
It should be noted that the above-mentioned embodiments are merely examples of the present invention, and it is obvious that the present invention is not limited to the above-mentioned embodiments, and many similar variations are possible. All modifications attainable or obvious from the present disclosure set forth herein should be deemed to be within the scope of the present disclosure.
The foregoing is merely illustrative of the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. The scene depth completion method based on the conditional variation self-encoder and the geometric guidance is characterized by comprising the following steps of:
acquiring a color image, a sparse depth map and a dense depth map in an automatic driving scene;
designing a conditional variation self-encoder with a prior network and a posterior network, inputting the color image and the sparse depth map into the prior network to extract features, and inputting the color image, the sparse depth map and the dense depth map into the posterior network to extract features;
converting the sparse depth map into a point cloud using the camera intrinsic parameters, extracting geometric spatial features with a point cloud up-sampling model, and mapping the geometric spatial features back onto the sparse depth map;
fusing the image features and the point cloud features with a dynamic graph message propagation module, the specific steps of the fusion being as follows: designing two encoding networks with the same structure, each encoder consisting of a 5-layer ResNet module; inputting the color image and the sparse depth map into the RGB-branch encoder to extract five feature maps of different scales, L1, L2, L3, L4 and L5, and inputting the point cloud feature Q and the sparse depth map into the point-cloud-branch encoder to extract five feature maps of different scales, P1, P2, P3, P4 and P5; for L1, L2, L3, L4 and L5, obtaining pixels with different receptive fields through dilated convolution, and using deformable convolution to explore the coordinate offset of each pixel so as to dynamically aggregate the strongly correlated surrounding feature values of each pixel, obtaining the features T1, T2, T3, T4 and T5; adding the dynamic-graph-enriched features T1, T2, T3, T4 and T5 to the point cloud encoding feature maps P1, P2, P3, P4 and P5 to obtain point cloud feature maps M1, M2, M3, M4 and M5 containing both semantic and geometric information;
generating a preliminary depth completion map with a U-shaped encoder-decoder based on a residual network;
and inputting the preliminarily predicted completion depth map into a confidence uncertainty estimation module to achieve the final depth completion optimization.
2. The scene depth completion method based on a conditional variation self-encoder and geometric guidance according to claim 1, wherein the acquiring of the color image and the sparse depth map in the automatic driving scene comprises:
capturing a color image and a sparse depth map in an automatic driving scene using a color camera and a LiDAR;
converting the sparse depth map into a dense depth map with the Sparsity Invariant CNNs algorithm to serve as the real label for auxiliary training.
3. The scene depth completion method based on a conditional variation self-encoder and geometric guidance according to claim 2, wherein the designing of the conditional variation self-encoder with a prior network and a posterior network, inputting the color image and the sparse depth map into the prior network to extract features, and inputting the color image, the sparse depth map and the dense depth map into the posterior network to extract features, comprises:
designing, based on feature extraction modules of the ResNet structure, a prior network and a posterior network with identical structures as the conditional variation self-encoder;
inputting the color image and the sparse depth map into the prior network to extract the last-layer feature map Prior, inputting the color image, the sparse depth map and the real label into the posterior network to extract the last-layer feature map Posterior, then computing the mean and variance of the Prior and Posterior feature maps respectively to obtain the probability distributions D1 and D2 of the respective features, and supervising the loss between the distributions D1 and D2 with the Kullback-Leibler divergence loss function, so that the prior network learns the real-label features of the posterior network.
4. The scene depth completion method based on a conditional variation self-encoder and geometric guidance according to claim 3, wherein the converting of the sparse depth map into a point cloud using the camera intrinsic parameters, extracting geometric spatial features with a point cloud up-sampling model, and mapping them back onto the sparse depth map comprises:
converting the sparse depth image pixels (u_i, v_i) from the pixel coordinate system into the camera coordinate system to obtain the point cloud coordinates (x_i, y_i, z_i), forming the sparse point cloud data S:
x_i = (u_i − c_x) · d_i / f_x, y_i = (v_i − c_y) · d_i / f_y, z_i = d_i,
wherein (c_x, c_y) are the optical center coordinates of the camera, f_x and f_y are the focal lengths of the camera along the x-axis and y-axis, and d_i is the depth value at (u_i, v_i); for the real-label depth map, a dense label point cloud S1 is also generated using the above formula;
randomly sampling the point cloud several times to obtain point cloud sets of different sizes, and, for each point cloud set, aggregating the 16 nearest points around each point with the KNN nearest-neighbour algorithm and inputting these 16 points into a geometry-aware neural network to extract the local geometric feature of that point;
adding the sparse point cloud feature extracted for each point to the original point cloud coordinates (x_i, y_i, z_i) to obtain the point cloud encoding feature Q, inputting Q into a four-fold up-sampling multi-layer perceptron network to obtain the predicted dense point cloud S2, and computing the loss between the true dense point cloud S1 and the predicted dense point cloud S2 with the Chamfer Distance loss function, the CD loss being calculated as:
CD(S1, S2) = Σ_{x∈S1} min_{y∈S2} ||x − y|| + Σ_{y∈S2} min_{x∈S1} ||y − x||,
wherein the first term is the sum of minimum distances from any point x of S1 to S2, and the second term is the sum of minimum distances from any point y of S2 to S1.
5. The scene depth completion method based on a conditional variation self-encoder and geometric guidance according to claim 1, wherein the generating of the preliminary depth completion map with a U-shaped encoder-decoder based on a residual network comprises:
designing a multi-scale decoder structure corresponding to the encoder structure, forming an encoder-decoder network with a U-Net structure;
inputting the feature maps L1, L2, L3, L4 and L5 generated by the RGB branch into the U-Net to predict the first coarse depth completion map Depth1 and the confidence map C1;
inputting the feature maps M1, M2, M3, M4 and M5 generated by the point cloud branch into the U-Net to predict the second coarse depth completion map Depth2 and the confidence map C2.
6. The scene depth completion method based on a conditional variation self-encoder and geometric guidance according to claim 1, wherein the inputting of the preliminarily predicted completion depth map into the confidence uncertainty estimation module to achieve the final depth completion optimization comprises:
adding the generated confidence maps C1 and C2 to obtain a feature map C, performing uncertainty prediction on the feature map C with a Softmax function, and predicting the uncertainty proportions F1 and F2 of each confidence map pixel by pixel;
multiplying the uncertainty maps F1 and F2 with the coarse depth maps Depth1 and Depth2 to obtain the final optimized depth completion map.
CN202310422520.0A 2023-04-20 2023-04-20 Scene depth completion method based on conditional variation self-encoder and geometric guidance Active CN116468768B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310422520.0A CN116468768B (en) 2023-04-20 2023-04-20 Scene depth completion method based on conditional variation self-encoder and geometric guidance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310422520.0A CN116468768B (en) 2023-04-20 2023-04-20 Scene depth completion method based on conditional variation self-encoder and geometric guidance

Publications (2)

Publication Number Publication Date
CN116468768A CN116468768A (en) 2023-07-21
CN116468768B true CN116468768B (en) 2023-10-17

Family

ID=87183885

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310422520.0A Active CN116468768B (en) 2023-04-20 2023-04-20 Scene depth completion method based on conditional variation self-encoder and geometric guidance

Country Status (1)

Country Link
CN (1) CN116468768B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351310B (en) * 2023-09-28 2024-03-12 山东大学 Multi-mode 3D target detection method and system based on depth completion

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767294A (en) * 2021-01-14 2021-05-07 Oppo广东移动通信有限公司 Depth image enhancement method and device, electronic equipment and storage medium
CN112861729A (en) * 2021-02-08 2021-05-28 浙江大学 Real-time depth completion method based on pseudo-depth map guidance
WO2022045495A1 (en) * 2020-08-25 2022-03-03 Samsung Electronics Co., Ltd. Methods for depth map reconstruction and electronic computing device for implementing the same
CN114998406A (en) * 2022-07-14 2022-09-02 武汉图科智能科技有限公司 Self-supervision multi-view depth estimation method and device
CN115423978A (en) * 2022-08-30 2022-12-02 西北工业大学 Image laser data fusion method based on deep learning and used for building reconstruction

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11315266B2 (en) * 2019-12-16 2022-04-26 Robert Bosch Gmbh Self-supervised depth estimation method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022045495A1 (en) * 2020-08-25 2022-03-03 Samsung Electronics Co., Ltd. Methods for depth map reconstruction and electronic computing device for implementing the same
CN112767294A (en) * 2021-01-14 2021-05-07 Oppo广东移动通信有限公司 Depth image enhancement method and device, electronic equipment and storage medium
CN112861729A (en) * 2021-02-08 2021-05-28 浙江大学 Real-time depth completion method based on pseudo-depth map guidance
CN114998406A (en) * 2022-07-14 2022-09-02 武汉图科智能科技有限公司 Self-supervision multi-view depth estimation method and device
CN115423978A (en) * 2022-08-30 2022-12-02 西北工业大学 Image laser data fusion method based on deep learning and used for building reconstruction

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Depth Completion Auto-Encoder; Kaiyue Lu et al.; 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW); full text *
Sparse-to-Dense Multi-Encoder Shape Completion of Unstructured Point Cloud; Yanjun Peng et al.; IEEE Access, Vol. 8; full text *
周云成; 邓寒冰; 许童羽; 苗腾; 吴琼. Unsupervised depth estimation model for tomato plant images based on dense auto-encoders. Transactions of the Chinese Society of Agricultural Engineering, 2020, No. 11; full text. *
王东敏; 彭永胜; 李永乐. Depth image acquisition method fusing vision and laser point clouds. Journal of Military Transportation University, 2017, No. 10; full text. *
Research on robust and intelligent multi-source fusion SLAM technology; 左星星; China Doctoral Dissertations Full-text Database (Information Science and Technology); full text *

Also Published As

Publication number Publication date
CN116468768A (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN111563923B (en) Method for obtaining dense depth map and related device
CN109377530B (en) Binocular depth estimation method based on depth neural network
WO2019223382A1 (en) Method for estimating monocular depth, apparatus and device therefor, and storage medium
CN110853075B (en) Visual tracking positioning method based on dense point cloud and synthetic view
US11455806B2 (en) System and method for free space estimation
CN116468768B (en) Scene depth completion method based on conditional variation self-encoder and geometric guidance
CN113284173B (en) End-to-end scene flow and pose joint learning method based on false laser radar
CN113421217A (en) Method and device for detecting travelable area
CN113012191B (en) Laser mileage calculation method based on point cloud multi-view projection graph
CN116740488B (en) Training method and device for feature extraction model for visual positioning
CN112270701A (en) Packet distance network-based parallax prediction method, system and storage medium
CN116703996A (en) Monocular three-dimensional target detection algorithm based on instance-level self-adaptive depth estimation
US20230377180A1 (en) Systems and methods for neural implicit scene representation with dense, uncertainty-aware monocular depth constraints
CN116485892A (en) Six-degree-of-freedom pose estimation method for weak texture object
CN116129234A (en) Attention-based 4D millimeter wave radar and vision fusion method
KR102299902B1 (en) Apparatus for providing augmented reality and method therefor
CN115239559A (en) Depth map super-resolution method and system for fusion view synthesis
CN115330935A (en) Three-dimensional reconstruction method and system based on deep learning
CN115222815A (en) Obstacle distance detection method, obstacle distance detection device, computer device, and storage medium
CN114155406A (en) Pose estimation method based on region-level feature fusion
US10896333B2 (en) Method and device for aiding the navigation of a vehicle
Ren et al. T-UNet: A novel TC-based point cloud super-resolution model for mechanical lidar
CN117422629B (en) Instance-aware monocular semantic scene completion method, medium and device
WO2024042704A1 (en) Learning device, image processing device, learning method, image processing method, and computer program
CN117523547B (en) Three-dimensional scene semantic perception method, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant