CN116468768B - Scene depth completion method based on conditional variation self-encoder and geometric guidance - Google Patents
Classifications
- G06T7/521: Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
- G01B11/24: Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
- G01S17/86: Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
- G01S17/89: Lidar systems specially adapted for mapping or imaging
- G06F18/253: Fusion techniques of extracted features
- G06N3/0455: Auto-encoder networks; encoder-decoder networks
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06T2207/10024: Color image
- G06T2207/10032: Satellite or aerial image; remote sensing
- G06T2207/10044: Radar image
- G06T2207/20084: Artificial neural networks [ANN]
- Y02T10/40: Engine management systems
Abstract
The invention discloses a scene depth completion method based on a conditional variational autoencoder and geometric guidance, comprising the following steps: acquiring a color image, a sparse depth map and a dense depth map in an autonomous driving scene; designing a conditional variational autoencoder with a prior network and a posterior network, inputting the color image and the sparse depth map into the prior network to extract features, and inputting the color image, the sparse depth map and the dense depth map into the posterior network to extract features; and converting the sparse depth map into a point cloud using the camera intrinsics, namely the focal lengths and optical center coordinates, extracting geometric spatial features with a point cloud up-sampling model, and mapping them back onto the sparse depth map. The invention alleviates the problem that the data acquired by a LiDAR is too sparse, so that a low-cost LiDAR with fewer beams can obtain more accurate and dense depth information, providing a cost-effective solution for industries that need accurate, dense depth data, such as autonomous driving and robotic environment perception.
Description
Technical Field
The invention relates to the technical field of depth map completion, and in particular to a scene depth completion method based on a conditional variational autoencoder and geometric guidance.
Background
Human perception, understanding and experience of the surrounding environment rely on visually acquired three-dimensional scene information. Computer vision mimics this behavior, using various sensors as visual organs to acquire scene information and thereby recognize and understand the scene; depth information plays a key role in fields such as robotics, autonomous driving and augmented reality. In autonomous driving, a vehicle must perceive the distances to other vehicles, pedestrians and obstacles while driving, and fully autonomous Level 5 driving requires ranging accurate to centimeters. Currently, LiDAR is the primary active distance sensor in autonomous driving. Compared with the two-dimensional RGB image acquired by a color camera, the depth map acquired by a LiDAR (the depth map and the point cloud can be converted into each other through the camera intrinsics) provides accurate depth, so the positions of 3D targets in the surrounding environment can be perceived precisely. However, a single LiDAR can only emit a limited number of laser beams in the vertical direction (16, 32 or 64 lines), so the acquired point cloud is extremely sparse (pixels with valid depth values account for only about 5% of the color image), which severely affects downstream tasks such as 3D object detection and three-dimensional environment perception.
Disclosure of Invention
The invention aims to provide a scene depth completion method based on a conditional variational autoencoder and geometric guidance, so as to solve the key problems of sparse and missing data produced by existing depth imaging devices such as LiDAR.
In order to achieve the above purpose, the present invention provides the following technical solution: a scene depth completion method based on a conditional variational autoencoder and geometric guidance, comprising the following steps:
acquiring a color image, a sparse depth map and a dense depth map in an autonomous driving scene;
designing a conditional variational autoencoder with a prior network and a posterior network, inputting the color image and the sparse depth map into the prior network to extract features, and inputting the color image, the sparse depth map and the dense depth map into the posterior network to extract features;
converting the sparse depth map into a point cloud using the camera intrinsics (focal lengths and optical center), extracting geometric spatial features with a point cloud up-sampling model, and mapping the geometric spatial features back onto the sparse depth map;
fusing the image features and the point cloud features with a dynamic graph message propagation module;
generating a preliminary depth completion map with a residual-network-based U-shaped encoder-decoder;
inputting the preliminarily predicted completion depth map into a confidence-uncertainty estimation module to obtain the final optimized depth completion.
Preferably, the acquiring of the color image and the sparse depth map in the autonomous driving scene includes:
capturing a color image and a sparse depth map in an autonomous driving scene using a color camera and a LiDAR;
converting the sparse depth map into a dense depth map with the Sparsity Invariant CNNs algorithm, to serve as the ground-truth label for auxiliary training.
Preferably, the designing of the conditional variational autoencoder with a prior network and a posterior network, inputting the color image and the sparse depth map into the prior network to extract features, and inputting the color image, the sparse depth map and the dense depth map into the posterior network to extract features, includes:
designing, based on a feature extraction module with a ResNet structure, a prior network and a posterior network of identical structure as the conditional variational autoencoder;
inputting the color image and the sparse depth map into the prior network to extract the last-layer feature map Prior, inputting the color image, the sparse depth map and the ground-truth label into the posterior network to extract the last-layer feature map Posterior, then computing the mean and variance of the Prior and Posterior feature maps respectively to obtain the probability distributions D1 and D2 of the respective features, and supervising the divergence between D1 and D2 with the Kullback-Leibler divergence loss function, so that the prior network learns the ground-truth label features captured by the posterior network.
Preferably, the converting of the sparse depth map into a point cloud using the camera intrinsics, extracting geometric spatial features with a point cloud up-sampling model, and mapping them back onto the sparse depth map, includes:
converting each sparse depth image pixel (u_i, v_i) from the pixel coordinate system into the camera coordinate system to obtain the point cloud coordinates (x_i, y_i, z_i), forming the sparse point cloud data S:
x_i = (u_i - c_x) * d_i / f_x,  y_i = (v_i - c_y) * d_i / f_y,  z_i = d_i
where (c_x, c_y) are the optical center coordinates of the camera, f_x and f_y are the focal lengths of the camera along the x-axis and y-axis, and d_i is the depth value at (u_i, v_i); for the ground-truth depth map, a dense label point cloud S1 is generated with the same formula;
randomly sampling the point cloud several times to obtain point cloud sets of different sizes; for each point cloud set, aggregating the 16 nearest points around each point with the KNN nearest-neighbor algorithm and feeding them into a geometry-aware neural network to extract the local geometric features of that point;
adding the sparse point cloud features extracted for each point to the original point cloud coordinates (x_i, y_i, z_i) to obtain the point cloud encoding feature Q, inputting Q into a four-fold up-sampling multi-layer perceptron network to obtain the predicted dense point cloud S2, and computing the loss between the ground-truth dense point cloud S1 and the predicted dense point cloud S2 with the Chamfer Distance (CD) loss function:
L_CD = Σ_{x∈S1} min_{y∈S2} ||x - y||² + Σ_{y∈S2} min_{x∈S1} ||x - y||²
where the first term is the sum of the minimum distances from every point x in S1 to S2, and the second term is the sum of the minimum distances from every point y in S2 to S1.
Preferably, the fusing of the image features and the point cloud features with the dynamic graph message propagation module includes:
designing two encoding networks of identical structure, each encoder consisting of 5 ResNet stages; inputting the color image and the sparse depth map into the RGB-branch encoder to extract five feature maps of different scales, L1, L2, L3, L4 and L5; inputting the point cloud feature Q and the sparse depth map into the point-cloud-branch encoder to extract five feature maps of different scales, P1, P2, P3, P4 and P5;
for L1, L2, L3, L4 and L5, obtaining pixels with different receptive fields through dilated convolution, and using deformable convolution to learn a coordinate offset for each pixel so as to dynamically aggregate the strongly correlated surrounding feature values of each pixel, yielding features T1, T2, T3, T4 and T5;
adding the dynamic-graph features T1, T2, T3, T4 and T5 to the point cloud encoding feature maps P1, P2, P3, P4 and P5 to obtain point cloud feature maps M1, M2, M3, M4 and M5 containing both semantic and geometric information.
Preferably, the generating of the preliminary depth completion map with the residual-network-based U-shaped encoder-decoder includes:
designing a multi-scale decoder structure corresponding to the encoder structure, forming an encoder-decoder network with a U-Net structure;
inputting the feature maps L1, L2, L3, L4 and L5 generated by the RGB branch into the U-Net to predict the first coarse depth completion map Depth1 and its confidence map C1;
inputting the feature maps M1, M2, M3, M4 and M5 generated by the point cloud branch into the U-Net to predict the second coarse depth completion map Depth2 and its confidence map C2.
Preferably, the inputting of the preliminarily predicted completion depth map into the confidence-uncertainty estimation module to obtain the final optimized depth completion includes:
adding the generated confidence maps C1 and C2 to obtain the feature map C, performing uncertainty prediction on C with a Softmax function, and predicting the uncertainty proportions F1 and F2 of each confidence map pixel by pixel;
multiplying the uncertainty maps F1 and F2 with the coarse depth maps Depth1 and Depth2 respectively to obtain the final optimized depth completion map.
Compared with the prior art, the invention has the following beneficial effects:
The conditional variational autoencoder learns the feature distribution of the true dense depth map and guides the color image and the sparse depth map to generate more valuable depth features; the point cloud features of the three-dimensional space capture spatial structural features across modalities, strengthening the geometric perception of the network and providing auxiliary information for predicting more accurate depth values; and the dynamic graph message propagation module fuses the features of the color image and the point cloud to achieve high-precision depth completion. The method thus alleviates the problem that the data acquired by a LiDAR is too sparse, enables a low-cost LiDAR with fewer beams to obtain more accurate and dense depth information, and provides a cost-effective solution for industries that need accurate, dense depth data, such as autonomous driving and robotic environment perception.
Drawings
FIG. 1 is a flow chart of the scene depth completion method based on a conditional variational autoencoder and geometric guidance provided by an embodiment of the invention;
FIG. 2 is a depth completion result of the scene depth completion method based on a conditional variational autoencoder and geometric guidance provided by an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The method in this embodiment is executed by a terminal, which may be a mobile phone, a tablet computer, a PDA, a notebook or a desktop computer, or of course another device with similar functionality; this embodiment is not limited thereto.
Referring to fig. 1 and 2, the present invention provides a scene depth completion method based on a conditional variational autoencoder and geometric guidance, applied to autonomous driving scene depth completion, which includes:
Step S1: acquire a color image, a sparse depth map and a dense depth map in an autonomous driving scene.
Specifically, step S1 further includes the following steps:
S101, capturing a color image and a sparse depth map in an autonomous driving scene using a color camera and a LiDAR;
S102, converting the sparse depth map into a dense depth map with the Sparsity Invariant CNNs algorithm, to serve as the ground-truth label for auxiliary training.
An autonomous vehicle is mainly equipped with a color camera and a LiDAR, which acquire RGB images and depth maps respectively; the method additionally requires a completed depth map as the training label. The specific steps are as follows:
capture color images and depth maps in the autonomous driving scene using a color camera and a Velodyne HDL-64E LiDAR; convert the sparse depth map into a dense depth map with the Sparsity Invariant CNNs algorithm to serve as the ground-truth label.
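The densification in S102 relies on the core idea of Sparsity Invariant CNNs: normalize each convolution window by the number of valid depth pixels it contains, so missing measurements do not dilute the output. A minimal single-step sketch in NumPy, with a fixed averaging kernel standing in for the learned weights of the actual algorithm:

```python
import numpy as np

def sparsity_invariant_conv(depth, mask, k=3):
    """One normalized-convolution step: average only the valid depth
    values inside each k x k window, then propagate the validity mask."""
    pad = k // 2
    d = np.pad(depth * mask, pad)           # zero out invalid pixels first
    m = np.pad(mask.astype(float), pad)
    out = np.zeros_like(depth, dtype=float)
    new_mask = np.zeros_like(depth, dtype=float)
    h, w = depth.shape
    for i in range(h):
        for j in range(w):
            win_d = d[i:i + k, j:j + k].sum()
            win_m = m[i:i + k, j:j + k].sum()
            if win_m > 0:                    # at least one valid neighbour
                out[i, j] = win_d / win_m    # mask-normalized average
                new_mask[i, j] = 1.0
    return out, new_mask

# A 3x3 map with a single valid measurement propagates it to all neighbours.
depth = np.zeros((3, 3)); depth[1, 1] = 2.0
mask = np.zeros((3, 3)); mask[1, 1] = 1.0
dense, valid = sparsity_invariant_conv(depth, mask)
```

Stacking several such steps (with learned kernels in the real method) progressively fills the depth map while tracking which outputs are supported by actual measurements.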
Step S2: design a conditional variational autoencoder with a prior network and a posterior network, input the color image and the sparse depth map into the prior network to extract features, and input the color image, the sparse depth map and the dense depth map into the posterior network to extract features.
Specifically, step S2 further includes the following steps:
S201, designing, based on a feature extraction module with a ResNet structure, a prior network and a posterior network of identical structure as the conditional variational autoencoder;
S202, inputting the color image and the sparse depth map into the prior network to extract the last-layer feature map Prior, inputting the color image, the sparse depth map and the ground-truth label into the posterior network to extract the last-layer feature map Posterior, then computing the mean and variance of the Prior and Posterior feature maps respectively to obtain the probability distributions D1 and D2 of the respective features, and supervising the divergence between D1 and D2 with the Kullback-Leibler divergence loss function, so that the prior network learns the ground-truth label features captured by the posterior network.
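When D1 and D2 are modeled as diagonal Gaussians from the computed means and variances, the Kullback-Leibler term in S202 has a closed form. A sketch under that assumption (the mean/log-variance parameterization and the direction of the divergence are not stated in the source):

```python
import math

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) for diagonal Gaussians, summed over dimensions:
    0.5 * (log(sp^2/sq^2) + (sq^2 + (mq - mp)^2) / sp^2 - 1) per dim."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl

# Identical distributions give zero divergence; shifting the mean by 1
# with unit variance gives 0.5.
same = kl_diag_gaussians([0.0], [0.0], [0.0], [0.0])
shifted = kl_diag_gaussians([1.0], [0.0], [0.0], [0.0])
```

Minimizing this quantity pulls the prior network's distribution toward the posterior's, which is how the prior branch absorbs label information it never sees directly at test time.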
Step S3: convert the sparse depth map into a point cloud using the camera intrinsics, extract geometric spatial features with a point cloud up-sampling model, and map them back onto the sparse depth map.
Specifically, step S3 further includes the following steps:
S301, converting each sparse depth image pixel (u_i, v_i) from the pixel coordinate system into the camera coordinate system to obtain the point cloud coordinates (x_i, y_i, z_i), forming the sparse point cloud data S:
x_i = (u_i - c_x) * d_i / f_x,  y_i = (v_i - c_y) * d_i / f_y,  z_i = d_i
where (c_x, c_y) are the optical center coordinates of the camera, f_x and f_y are the focal lengths of the camera along the x-axis and y-axis, and d_i is the depth value at (u_i, v_i); for the ground-truth depth map, a dense label point cloud S1 is generated with the same formula.
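The back-projection of S301 can be sketched directly from the intrinsics; the toy depth map and intrinsic values below are illustrative only:

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project valid pixels (u, v, d) to camera-space points:
    x = (u - cx) * d / fx, y = (v - cy) * d / fy, z = d."""
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d > 0:                        # skip pixels with no LiDAR return
                points.append(((u - cx) * d / fx, (v - cy) * d / fy, d))
    return points

# One valid pixel at (u=4, v=3) with depth 2.0; toy intrinsics fx=fy=2, cx=cy=1.
pts = depth_to_points([[0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 2.0]], fx=2.0, fy=2.0, cx=1.0, cy=1.0)
```

Running the same routine on the ground-truth depth map yields the dense label point cloud S1 mentioned above.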
S302, randomly sampling the point cloud several times to obtain point cloud sets of different sizes; for each point cloud set, aggregating the 16 nearest points around each point with the KNN nearest-neighbor algorithm and feeding them into a geometry-aware neural network to extract the local geometric features of that point;
S303, adding the sparse point cloud features extracted for each point to the original point cloud coordinates (x_i, y_i, z_i) to obtain the point cloud encoding feature Q, inputting Q into a four-fold up-sampling multi-layer perceptron network to obtain the predicted dense point cloud S2, and computing the loss between the ground-truth dense point cloud S1 and the predicted dense point cloud S2 with the Chamfer Distance (CD) loss function:
L_CD = Σ_{x∈S1} min_{y∈S2} ||x - y||² + Σ_{y∈S2} min_{x∈S1} ||x - y||²
where the first term is the sum of the minimum distances from every point x in S1 to S2, and the second term is the sum of the minimum distances from every point y in S2 to S1.
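The Chamfer Distance supervision of S303 can be sketched as a symmetric sum of nearest-neighbour squared distances. A brute-force O(|S1|·|S2|) version for illustration (practical implementations use KD-trees or GPU kernels, and some conventions average per set instead of summing):

```python
def chamfer_distance(s1, s2):
    """Symmetric Chamfer Distance between two point sets: for each point,
    the squared distance to its nearest neighbour in the other set, summed
    in both directions."""
    def sq(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    d12 = sum(min(sq(x, y) for y in s2) for x in s1)   # S1 -> S2 term
    d21 = sum(min(sq(y, x) for x in s1) for y in s2)   # S2 -> S1 term
    return d12 + d21

s1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # toy "ground-truth" set
s2 = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]   # toy "predicted" set
cd = chamfer_distance(s1, s2)
```

The two-sided form penalizes both missing structure (points of S1 far from S2) and spurious structure (points of S2 far from S1).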
Step S4: fuse the image features and the point cloud features with the dynamic graph message propagation module.
Specifically, step S4 further includes the following steps:
S401, designing two encoding networks of identical structure, each encoder consisting of 5 ResNet stages; inputting the color image and the sparse depth map into the RGB-branch encoder to extract five feature maps of different scales, L1, L2, L3, L4 and L5; inputting the point cloud feature Q and the sparse depth map into the point-cloud-branch encoder to extract five feature maps of different scales, P1, P2, P3, P4 and P5;
S402, for L1, L2, L3, L4 and L5, obtaining pixels with different receptive fields through dilated convolution, and using deformable convolution to learn a coordinate offset for each pixel so as to dynamically aggregate the strongly correlated surrounding feature values of each pixel, yielding features T1, T2, T3, T4 and T5;
S403, adding the dynamic-graph features T1, T2, T3, T4 and T5 to the point cloud encoding feature maps P1, P2, P3, P4 and P5 to obtain point cloud feature maps M1, M2, M3, M4 and M5 containing both semantic and geometric information.
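The dynamic aggregation of S402 can be illustrated with a toy stand-in in which the sampling offsets and their weights are fixed constants; in the actual module they are produced by the learned offset branch of the deformable convolution, so everything below is an illustrative assumption:

```python
def dynamic_aggregate(feat, offsets, weights):
    """Aggregate each pixel with features sampled at offset positions.
    Here `offsets` and `weights` are given constants standing in for the
    per-pixel outputs of a deformable-convolution offset branch."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for (di, dj), wt in zip(offsets, weights):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:   # ignore out-of-image samples
                    acc += wt * feat[ni][nj]
            out[i][j] = acc
    return out

feat = [[1.0, 2.0], [3.0, 4.0]]
# Centre tap plus a dilated tap one pixel to the right, equally weighted.
agg = dynamic_aggregate(feat, offsets=[(0, 0), (0, 1)], weights=[0.5, 0.5])
```

Changing the offset list changes each pixel's effective receptive field, which is the property the dilated and deformable convolutions of S402 exploit.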
Step S5: generate a preliminary depth completion map with the residual-network-based U-shaped encoder-decoder.
Specifically, step S5 further includes the following steps:
S501, designing a multi-scale decoder structure corresponding to the encoder structure of step S4, forming an encoder-decoder network with a U-Net structure;
S502, inputting the feature maps L1, L2, L3, L4 and L5 generated by the RGB branch into the U-Net to predict the first coarse depth completion map Depth1 and its confidence map C1;
S503, inputting the feature maps M1, M2, M3, M4 and M5 generated by the point cloud branch into the U-Net to predict the second coarse depth completion map Depth2 and its confidence map C2.
Step S6: input the preliminarily predicted completion depth map into the confidence-uncertainty estimation module to obtain the final optimized depth completion.
Specifically, step S6 further includes the following steps:
S601, adding the confidence maps C1 and C2 generated in step S5 to obtain the feature map C, performing uncertainty prediction on C with a Softmax function, and predicting the uncertainty proportions F1 and F2 of each confidence map pixel by pixel;
S602, multiplying the uncertainty maps F1 and F2 with the coarse depth maps Depth1 and Depth2 respectively to obtain the final optimized depth completion map.
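S601 and S602 amount to a per-pixel softmax over the two confidence maps followed by a confidence-weighted blend of the two coarse predictions. A sketch under the assumption that the two products of S602 are summed into one map (the source does not state how they are merged):

```python
import math

def fuse_depths(d1, d2, c1, c2):
    """Per-pixel softmax over the two confidence values, then a
    confidence-weighted blend of the two coarse depth predictions."""
    out = []
    for a, b, ca, cb in zip(d1, d2, c1, c2):
        ea, eb = math.exp(ca), math.exp(cb)
        f1, f2 = ea / (ea + eb), eb / (ea + eb)   # proportions F1, F2
        out.append(f1 * a + f2 * b)
    return out

# Equal confidence yields a simple average of the two branch predictions.
fused = fuse_depths([2.0], [4.0], [0.0], [0.0])
```

Because F1 + F2 = 1 at every pixel, the fused depth always lies between the two branch predictions, with the more confident branch dominating.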
In this embodiment, the conditional variational autoencoder first learns the feature distribution of the true dense depth map and guides the color image and the sparse depth map to generate more valuable depth features; next, the point cloud features of the three-dimensional space capture spatial structural features across modalities, strengthening the geometric perception of the network and providing auxiliary information for predicting more accurate depth values; finally, the dynamic graph message propagation module fuses the features of the color image and the point cloud to achieve high-precision depth completion.
In addition, it should be noted that the combination of the technical features described in the present invention is not limited to the combination described in the claims or the combination described in the specific embodiments, and all the technical features described in the present invention may be freely combined or combined in any manner unless contradiction occurs between them.
It should be noted that the above-mentioned embodiments are merely examples of the present invention, and it is obvious that the present invention is not limited to the above-mentioned embodiments, and many similar variations are possible. All modifications attainable or obvious from the present disclosure set forth herein should be deemed to be within the scope of the present disclosure.
The foregoing is merely illustrative of the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (6)
1. A scene depth completion method based on a conditional variational autoencoder and geometric guidance, characterized by comprising the following steps:
acquiring a color image, a sparse depth map and a dense depth map in an autonomous driving scene;
designing a conditional variational autoencoder with a prior network and a posterior network, inputting the color image and the sparse depth map into the prior network to extract features, and inputting the color image, the sparse depth map and the dense depth map into the posterior network to extract features;
converting the sparse depth map into a point cloud using the camera intrinsics (focal lengths and optical center), extracting geometric spatial features with a point cloud up-sampling model, and mapping the geometric spatial features back onto the sparse depth map;
fusing the image features and the point cloud features with a dynamic graph message propagation module, the specific steps of the fusion being: designing two encoding networks of identical structure, each encoder consisting of 5 ResNet stages; inputting the color image and the sparse depth map into the RGB-branch encoder to extract five feature maps of different scales, L1, L2, L3, L4 and L5; inputting the point cloud feature Q and the sparse depth map into the point-cloud-branch encoder to extract five feature maps of different scales, P1, P2, P3, P4 and P5; for L1, L2, L3, L4 and L5, obtaining pixels with different receptive fields through dilated convolution, and using deformable convolution to learn a coordinate offset for each pixel so as to dynamically aggregate the strongly correlated surrounding feature values of each pixel, yielding features T1, T2, T3, T4 and T5; adding the dynamic-graph features T1, T2, T3, T4 and T5 to the point cloud encoding feature maps P1, P2, P3, P4 and P5 to obtain point cloud feature maps M1, M2, M3, M4 and M5 containing both semantic and geometric information;
generating a preliminary depth completion map with a residual-network-based U-shaped encoder-decoder;
inputting the preliminarily predicted completion depth map into a confidence-uncertainty estimation module to obtain the final optimized depth completion.
2. The scene depth completion method based on conditional variance self-encoder and geometric guidance of claim 1, wherein the acquiring of color image and sparse depth map in an autopilot scene comprises:
capturing a color image and a sparse depth map in an autopilot scene using a color camera and a lidar;
the sparse depth map is changed into a dense depth map by using a Sparsity Invariant CNNs algorithm to serve as a real tag to assist training.
3. The scene depth completion method based on conditional variance self-encoder and geometric guidance according to claim 2, wherein the designing the conditional variance self-encoder with a priori network and a posterior network, inputting the color image and sparse depth map into the a priori network to extract features, and then inputting the color image, sparse depth map and dense depth map into the posterior network to extract features, comprises:
based on a feature extraction module with a ResNet structure, designing a priori network and a posterior network with the same structure as the conditional variational auto-encoder;
inputting the color image and the sparse depth map into the priori network to extract the last-layer feature map Prior, and inputting the color image, the sparse depth map and the real label into the posterior network to extract the last-layer feature map Posterior; then computing the mean and variance of the Prior and Posterior feature maps respectively to obtain the probability distributions D1 and D2 of the respective features, and supervising the loss between the distributions D1 and D2 by using a Kullback-Leibler divergence loss function, so that the priori network learns the real-label features of the posterior network.
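The supervision step in claim 3 reduces each feature map to a probability distribution and penalises the Kullback-Leibler divergence between them. A minimal sketch, assuming each distribution is a diagonal Gaussian obtained from per-channel spatial statistics (the claim does not fix this exact reduction, so both the pooling and the function names are assumptions):

```python
import torch

def gaussian_params(feat):
    """Collapse a feature map (N, C, H, W) to a per-channel diagonal
    Gaussian: mean and variance over the spatial dimensions."""
    mu = feat.mean(dim=(2, 3))
    var = feat.var(dim=(2, 3)) + 1e-8  # guard against zero variance
    return mu, var

def kl_divergence(mu_p, var_p, mu_q, var_q):
    """KL(D1 || D2) for diagonal Gaussians, summed over channels and
    averaged over the batch; used to pull the prior network's
    distribution toward the posterior network's."""
    return 0.5 * (torch.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1).sum(dim=1).mean()
```

As a sanity check, the divergence of a distribution against itself is zero, and it grows as the prior's statistics drift from the posterior's.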
4. The scene depth completion method based on a conditional variational auto-encoder and geometric guidance according to claim 3, wherein the converting of the sparse depth map into a point cloud by using the camera intrinsic parameters, i.e. the optical center and focal lengths, the extracting of geometric spatial features with a point cloud up-sampling model, and the mapping back onto the sparse depth map, comprises:
converting the sparse depth image pixels (ui, vi) from the pixel coordinate system into the camera coordinate system to obtain the point cloud coordinates (xi, yi, zi), forming the sparse point cloud data S:

xi = (ui − cx)·di / fx,  yi = (vi − cy)·di / fy,  zi = di

wherein (cx, cy) is the optical center coordinate of the camera, fx and fy are the focal lengths of the camera along the x-axis and y-axis, and di is the depth value at position (ui, vi); for the real label depth map, a dense label point cloud S1 is also generated by using the above formula;
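The pinhole back-projection above can be sketched directly; `backproject` is an illustrative name, and the sketch assumes depth values of zero mark missing pixels:

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project a sparse depth map (H, W) to a camera-frame point
    cloud using the pinhole model: x=(u-cx)d/fx, y=(v-cy)d/fy, z=d.
    Only pixels with a positive depth produce a point."""
    v, u = np.nonzero(depth > 0)          # valid pixel coordinates
    d = depth[v, u]                       # their depth values d_i
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    return np.stack([x, y, d], axis=1)    # (N, 3) point cloud S
```

Applying the same function to the real label depth map yields the dense label point cloud S1.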
randomly sampling the point cloud a plurality of times to obtain point cloud sets of different sizes; for each point cloud set, aggregating the 16 nearest points around each point by using the KNN nearest-neighbor algorithm, and inputting the 16 nearest points into a geometric perception neural network to extract the local geometric features of that point;
adding the sparse point cloud features extracted for each point to the original point cloud coordinates (xi, yi, zi) to obtain the point cloud coding feature Q; inputting Q into a four-fold up-sampling multi-layer perceptron network to obtain the predicted dense point cloud S2, and computing the loss between the true dense point cloud S1 and the predicted dense point cloud S2 by using the Chamfer Distance loss function; the specific calculation formula of the CD loss is:

L_CD(S1, S2) = Σ_{x∈S1} min_{y∈S2} ‖x − y‖² + Σ_{y∈S2} min_{x∈S1} ‖y − x‖²

wherein the first term represents the sum of the minimum distances from any point x in S1 to S2, and the second term represents the sum of the minimum distances from any point y in S2 to S1.
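The two terms of the CD loss can be sketched with a brute-force pairwise distance matrix; using the squared Euclidean distance is an assumption (a common Chamfer convention), since the claim does not fix the exact norm:

```python
import numpy as np

def chamfer_distance(s1, s2):
    """Chamfer Distance between point sets s1 (N, 3) and s2 (M, 3):
    for each point in one set, the squared distance to its nearest
    point in the other set, summed over both sets."""
    # (N, M) matrix of pairwise squared distances
    d2 = ((s1[:, None, :] - s2[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum() + d2.min(axis=0).sum()
```

The O(N·M) matrix is fine for a sketch; a KD-tree nearest-neighbour query would replace it at the point counts produced by four-fold up-sampling.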
5. The scene depth completion method based on a conditional variational auto-encoder and geometric guidance according to claim 1, wherein the generating of the preliminary depth completion map by using the residual-network-based U-shaped encoder-decoder comprises:
designing a corresponding multi-scale decoder structure according to the encoder structure, to form an encoder-decoder network with a U-Net structure;
inputting the feature maps L1, L2, L3, L4 and L5 generated by the RGB branch into the U-Net to predict the first coarse depth completion map Depth1 and the confidence map C1;
inputting the feature maps M1, M2, M3, M4 and M5 generated by the point cloud branch into the U-Net to predict the second coarse depth completion map Depth2 and the confidence map C2.
6. The scene depth completion method based on a conditional variational auto-encoder and geometric guidance according to claim 1, wherein the inputting of the preliminarily predicted completion depth map into the confidence uncertainty estimation module to realize the final depth completion optimization comprises:
adding the generated confidence maps C1 and C2 to obtain a feature map C, performing uncertainty prediction on the feature map C by using a Softmax function, and predicting the uncertainty proportions F1 and F2 of each confidence map pixel by pixel;
multiplying the uncertainty maps F1 and F2 by the coarse depth maps Depth1 and Depth2 respectively, to obtain the final optimized depth completion map.
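The two steps of claim 6 can be sketched as a per-pixel Softmax over the stacked confidence maps followed by a weighted sum of the two coarse completions. The exact way the Softmax is applied is an assumption (the claim only names the function), and `fuse_depths` is an illustrative name:

```python
import torch
import torch.nn.functional as F

def fuse_depths(depth1, depth2, conf1, conf2):
    """Sketch of the confidence-based fusion step: Softmax over the
    two confidence maps gives per-pixel weights F1 and F2 that sum
    to 1, and the final depth is the weighted sum of the two coarse
    depth completion maps."""
    w = F.softmax(torch.stack([conf1, conf2], dim=0), dim=0)  # F1, F2
    return w[0] * depth1 + w[1] * depth2
```

With equal confidences the fusion degenerates to a plain average; as one branch's confidence grows, its depth prediction dominates that pixel.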
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310422520.0A CN116468768B (en) | 2023-04-20 | 2023-04-20 | Scene depth completion method based on conditional variation self-encoder and geometric guidance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116468768A CN116468768A (en) | 2023-07-21 |
CN116468768B true CN116468768B (en) | 2023-10-17 |
Family
ID=87183885
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310422520.0A Active CN116468768B (en) | 2023-04-20 | 2023-04-20 | Scene depth completion method based on conditional variation self-encoder and geometric guidance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116468768B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117351310B (en) * | 2023-09-28 | 2024-03-12 | 山东大学 | Multi-mode 3D target detection method and system based on depth completion |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112767294A (en) * | 2021-01-14 | 2021-05-07 | Oppo广东移动通信有限公司 | Depth image enhancement method and device, electronic equipment and storage medium |
CN112861729A (en) * | 2021-02-08 | 2021-05-28 | 浙江大学 | Real-time depth completion method based on pseudo-depth map guidance |
WO2022045495A1 (en) * | 2020-08-25 | 2022-03-03 | Samsung Electronics Co., Ltd. | Methods for depth map reconstruction and electronic computing device for implementing the same |
CN114998406A (en) * | 2022-07-14 | 2022-09-02 | 武汉图科智能科技有限公司 | Self-supervision multi-view depth estimation method and device |
CN115423978A (en) * | 2022-08-30 | 2022-12-02 | 西北工业大学 | Image laser data fusion method based on deep learning and used for building reconstruction |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11315266B2 (en) * | 2019-12-16 | 2022-04-26 | Robert Bosch Gmbh | Self-supervised depth estimation method and system |
Non-Patent Citations (5)
Title |
---|
Depth Completion Auto-Encoder; Kaiyue Lu et al.; 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW); full text *
Sparse-to-Dense Multi-Encoder Shape Completion of Unstructured Point Cloud; Yanjun Peng et al.; IEEE Access, vol. 8; full text *
Unsupervised depth estimation model for tomato plant images based on dense auto-encoders; Zhou Yuncheng, Deng Hanbing, Xu Tongyu, Miao Teng, Wu Qiong; Transactions of the Chinese Society of Agricultural Engineering, 2020, No. 11; full text *
Depth image acquisition method based on fusion of vision and laser point clouds; Wang Dongmin, Peng Yongsheng, Li Yongle; Journal of Military Transportation University, 2017, No. 10; full text *
Research on robust and intelligent multi-source fusion SLAM technology; Zuo Xingxing; China Doctoral Dissertations Full-text Database (Information Science and Technology); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111563923B (en) | Method for obtaining dense depth map and related device | |
CN109377530B (en) | Binocular depth estimation method based on depth neural network | |
WO2019223382A1 (en) | Method for estimating monocular depth, apparatus and device therefor, and storage medium | |
CN110853075B (en) | Visual tracking positioning method based on dense point cloud and synthetic view | |
US11455806B2 (en) | System and method for free space estimation | |
CN116468768B (en) | Scene depth completion method based on conditional variation self-encoder and geometric guidance | |
CN113284173B (en) | End-to-end scene flow and pose joint learning method based on false laser radar | |
CN113421217A (en) | Method and device for detecting travelable area | |
CN113012191B (en) | Laser mileage calculation method based on point cloud multi-view projection graph | |
CN116740488B (en) | Training method and device for feature extraction model for visual positioning | |
CN112270701A (en) | Packet distance network-based parallax prediction method, system and storage medium | |
CN116703996A (en) | Monocular three-dimensional target detection algorithm based on instance-level self-adaptive depth estimation | |
US20230377180A1 (en) | Systems and methods for neural implicit scene representation with dense, uncertainty-aware monocular depth constraints | |
CN116485892A (en) | Six-degree-of-freedom pose estimation method for weak texture object | |
CN116129234A (en) | Attention-based 4D millimeter wave radar and vision fusion method | |
KR102299902B1 (en) | Apparatus for providing augmented reality and method therefor | |
CN115239559A (en) | Depth map super-resolution method and system for fusion view synthesis | |
CN115330935A (en) | Three-dimensional reconstruction method and system based on deep learning | |
CN115222815A (en) | Obstacle distance detection method, obstacle distance detection device, computer device, and storage medium | |
CN114155406A (en) | Pose estimation method based on region-level feature fusion | |
US10896333B2 (en) | Method and device for aiding the navigation of a vehicle | |
Ren et al. | T-UNet: A novel TC-based point cloud super-resolution model for mechanical lidar | |
CN117422629B (en) | Instance-aware monocular semantic scene completion method, medium and device | |
WO2024042704A1 (en) | Learning device, image processing device, learning method, image processing method, and computer program | |
CN117523547B (en) | Three-dimensional scene semantic perception method, system, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||