CN116862965A - Depth completion method based on sparse representation - Google Patents
Depth completion method based on sparse representation
- Publication number
- CN116862965A
- Authority
- CN
- China
- Prior art keywords
- sampling
- depth map
- depth
- map
- uncertainty
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a depth completion method based on sparse representation, belonging to the technical field of image processing. The invention designs an adaptive sampling scheme that captures important sampling points, which helps the network reconstruct a denser depth map. The method comprises the following steps: S1, inputting an RGB image into a sampling network, which outputs an uncertainty map; S2, obtaining a sampled sparse depth map through a sampling process based on the uncertainty map; S3, building a reconstruction neural network, inputting the RGB image and the sampled sparse depth map into the reconstruction network for training, and recovering a dense depth map.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a depth completion method based on sparse representation.
Background
In recent years, with the continuing development of computer vision, obtaining reliable depth information has become important for autonomous driving, robotics, augmented reality, and related fields. In real scenes, depth information is usually acquired with an RGB-D depth camera or a LiDAR. For LiDAR, hardware limitations mean that the depth map obtained by projecting the scanned point cloud onto the image plane is sparse, whereas real applications need denser depth maps in order to perceive the environment. The process of deriving a dense depth map from a sparse depth map is called depth completion.
With the rapid development of deep learning, many researchers have studied the depth completion task in recent years. Existing work falls into two main categories: methods that take only the sparse depth map as input to a neural network and reconstruct a dense depth map, and methods that use the rich semantic information of the RGB image as guidance to complete the sparse depth map. In practical applications, RGB-guided depth completion is the more popular approach. In general, the RGB image and the sparse depth map are fed directly into the network; if a good reconstruction can be obtained from relatively few sampling points, cost is reduced and cost-effectiveness improves. Previous researchers have typically sampled the real depth map at random to obtain sparse samples, but samples obtained this way do not reflect the edge information of objects in the real depth map well, so the reconstruction results contain flaws.
In view of the above, the invention provides a depth completion method based on sparse representation.
Disclosure of Invention
The invention aims to provide a depth completion method based on sparse representation that solves the problems described in the Background.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A depth completion method based on sparse representation, specifically comprising the following steps:
S1, selecting RGB images from a plurality of public depth completion datasets, and inputting the RGB images into a sampling network to output an uncertainty map;
S2, obtaining a sampled sparse depth map through a sampling process based on the uncertainty map;
S3, designing a reconstruction neural network structure, building an image reconstruction network, inputting the RGB image together with the sampled sparse depth map obtained in S2 into the reconstruction network for training, and recovering a dense depth map.
Preferably, the sampling network in S1 adopts a U-Net-type structure and consists of an encoder and a decoder;
the encoder comprises four residual modules, each consisting of a feature extraction and downsampling module and a feature holding module; the feature extraction and downsampling module extracts high-dimensional features of the image and downsamples it, and the feature holding module further deepens the feature map without loss of resolution;
the decoder consists of four receptive-field-guided upsampling modules;
skip connections are arranged between the encoder and the decoder to further promote feature fusion, and an uncertainty map is finally output.
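As a rough illustration of the encoder-decoder layout described above, the following sketch traces feature-map sizes through four downsampling stages and four upsampling stages. The factor-of-2 downsampling (as produced by a 3×3 convolution with stride 2 and padding 1), the rule that each upsampling stage restores the size of its matching skip feature, the input size, and the function names are all assumptions made for illustration; the source does not state them.

```python
def encoder_shapes(height, width, stages=4):
    """Feature-map sizes after each assumed stride-2 downsampling stage."""
    shapes = [(height, width)]
    for _ in range(stages):
        h, w = shapes[-1]
        # A 3x3 convolution with stride 2 and padding 1 yields ceil(size / 2).
        shapes.append(((h + 1) // 2, (w + 1) // 2))
    return shapes


def decoder_shapes(enc_shapes):
    """Sizes produced by the upsampling stages, each matched to its skip feature."""
    # Drop the bottleneck and walk back up through the encoder levels.
    return list(reversed(enc_shapes[:-1]))


enc = encoder_shapes(228, 304)   # hypothetical input size (height, width)
dec = decoder_shapes(enc)
for level, shape in enumerate(enc):
    print("encoder level", level, shape)
for stage, shape in enumerate(dec, start=1):
    print("decoder stage", stage, shape)
```

Matching each decoder stage to the size of its skip feature avoids off-by-one mismatches when a dimension is odd (57 rows become 29, and plain doubling would give 58).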
Preferably, the uncertainty map encodes the sampling logic: high uncertainty at a pixel means a low probability that the pixel is sampled. Based on this, S2 specifically includes the following:
S2.1, assuming that the size of the depth map $D$ is $m \times n$, the elements of the binarized sampling mask $M$ of the depth map $D$ are defined as:
$$M_{i,j} = \begin{cases} 1, & \text{with probability } p_{i,j} \\ 0, & \text{with probability } 1 - p_{i,j} \end{cases}$$
where $p_{i,j}$ is the sampling probability of the pixel at position $(i, j)$;
the sampling process of the depth map $D$ is defined as:
$$S = D \odot M$$
where $S$ is the sparse depth map obtained by sampling the depth map $D$, and $\odot$ denotes the pixel-wise product;
S2.2, assuming that the size of the uncertainty map $P$ is also $m \times n$, a binary sampling mask $M'$ is generated from the uncertainty map following S2.1: first, an $m \times n$ random matrix $R$ is generated whose elements are random values between 0 and 1, denoted $r_{i,j} \in [0, 1]$;
S2.3, the sampling probability of each element (pixel) of the uncertainty map is denoted $p'_{i,j}$, with $p'_{i,j} \in [0, 1]$; $r_{i,j}$ is compared with $p'_{i,j}$, and the binary mask is set to 1 at positions satisfying $r_{i,j} \le p'_{i,j}$ and to 0 otherwise, expressed as:
$$M'_{i,j} = \begin{cases} 1, & r_{i,j} \le p'_{i,j} \\ 0, & \text{otherwise} \end{cases}$$
S2.4, based on S2.1 to S2.3, the sampling process of the uncertainty map $P$ is defined as:
$$S' = P \odot M'$$
where $S'$ is the sparse depth map obtained after the uncertainty map $P$ is sampled, and $\odot$ denotes the pixel-wise product.
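The mask generation in S2.2 and S2.3 can be sketched in a few lines. This is a minimal illustration in plain Python, not the patent's implementation: the function names, the toy probability map, and the toy depth values are hypothetical, and the pixel-wise product is shown applied to a depth map as in the first sampling formula.

```python
import random


def sample_mask(prob_map, rng):
    """Binary mask with 1 where a fresh random value r satisfies r <= p'."""
    mask = []
    for row in prob_map:
        mask_row = []
        for p in row:
            r = rng.random()                 # one entry of the random matrix R
            mask_row.append(1 if r <= p else 0)
        mask.append(mask_row)
    return mask


def apply_mask(image, mask):
    """Pixel-wise product of an image with a binary mask."""
    return [[v * m for v, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


rng = random.Random(0)                       # fixed seed, for repeatability only
prob_map = [[0.0, 1.0, 0.5],                 # hypothetical sampling probabilities p'
            [1.0, 0.0, 0.25]]
depth = [[1.2, 3.4, 2.0],                    # hypothetical depth values
         [0.8, 5.1, 4.4]]
mask = sample_mask(prob_map, rng)
sparse = apply_mask(depth, mask)
print(mask)
print(sparse)
```

Because each pixel is kept independently with probability $p'_{i,j}$, the expected number of sampled pixels equals the sum of all $p'_{i,j}$, which is how the probability map controls the sampling budget.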
Preferably, the reconstruction network in S3 consists of a multi-scale convolution module, a parallel dual-stream encoder-decoder structure, and an adaptive fusion mechanism, and realizes dense depth map reconstruction through cross-channel information interaction;
the loss function design of the sampling part in the reconstruction network training process specifically comprises the following:
(1) L2 loss function $l_{prob}$: $l_{prob}$ is used to supervise the generated uncertainty map, and the specific function is expressed as:
where the operator symbol in the formula denotes the Sobel gradient operation;
(2) regularization loss function $l_{reg}$: $l_{reg}$ is used to constrain the training process, and the specific function is expressed as:
where $N$ is the total number of available depth image pixels, and $s$ is the number of sampling points;
(3) total loss function of the sampling part $l_{sample}$: the specific function is expressed as:
$$l_{sample} = l_{prob} + \alpha\, l_{reg}$$
where $\alpha$ is a weighting coefficient;
the loss function design of the reconstruction part in the reconstruction network training process specifically comprises the following:
(1) gradient loss term $l_{grad}$, based on the L1 loss between the reconstructed depth map $D^*$ and the true depth map $D$: $l_{grad}$ is used to reduce the error in the computed depth gradients, and the specific function is expressed as:
(2) surface normal loss function $l_{norm}$: $l_{norm}$ is used to further refine local details, and the specific function is expressed as:
where $\langle \cdot, \cdot \rangle$ denotes the inner product of vectors;
(3) total loss function of the reconstruction part $l_{rec}$: the specific function is expressed as:
$$l_{rec} = w_1\left[l_1(D^*_{low}, D) + l_{grad}(D^*_{low}, D) + l_{norm}(D^*_{low}, D)\right] + w_2\, l_1(D^*_{up}, D) + l_1(D^*, D) + w_3\, l_{grad}(D^*, D) + w_4\, l_{norm}(D^*, D)$$
where $w_1$, $w_2$, $w_3$, and $w_4$ are the weighting coefficients of the different terms;
combining the loss function designs of the sampling part and the reconstruction part gives the total loss function $l_{final}$ of the reconstruction network:
$$l_{final} = l_{rec} + \beta\, l_{sample}$$
where $\beta$ is a weighting coefficient.
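The two weighted sums above compose as follows. The sketch only shows how the sampling and reconstruction terms combine; the individual loss values and the weights `alpha` and `beta` are hypothetical numbers, since the source does not give the coefficient values.

```python
def sample_loss(l_prob, l_reg, alpha):
    """Sampling-part total: l_sample = l_prob + alpha * l_reg."""
    return l_prob + alpha * l_reg


def final_loss(l_rec, l_sample, beta):
    """Overall total: l_final = l_rec + beta * l_sample."""
    return l_rec + beta * l_sample


# Hypothetical per-batch loss values and weights, for illustration only.
l_sample = sample_loss(l_prob=0.5, l_reg=0.25, alpha=2.0)
l_final = final_loss(l_rec=1.0, l_sample=l_sample, beta=0.5)
print(l_sample, l_final)
```

Keeping the sampling term behind a single weight makes it straightforward to trade off sampling supervision against reconstruction quality during tuning.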
Compared with the prior art, the depth completion method based on sparse representation provided by the invention has the following beneficial effects:
(1) The invention substantially improves the sampling strategy: whereas conventional random sampling cannot effectively capture depth information at object edges, the proposed sampling makes effective use of sampling points in edge regions.
(2) The method provided by the invention can reconstruct a denser depth map, which opens up a new approach to reducing equipment cost in practical application scenarios.
Drawings
FIG. 1 is an overall block diagram of a depth completion method based on sparse representation;
FIG. 2 shows example results of Example 1 of the present invention; from left to right, the columns show the input RGB image, the randomly sampled points, the uncertainty map, the adaptively sampled points, the reconstructed dense depth map, and the original ground-truth depth map.
Detailed Description
The embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention.
The invention uses the indoor dataset NYU Depth V2 and the outdoor dataset KITTI as experimental datasets. The NYU Depth V2 dataset consists of video sequences of indoor scenes captured by the RGB-D camera of a Microsoft Kinect, and comprises about 50,000 indoor RGB-D image pairs collected in 464 different indoor scenes. The invention uses the official dataset split, with 249 scenes for training and the remaining 215 for testing. The RGB-D image pairs are first downsampled from the original 640 × 480 resolution to 320 × 240. Since the boundaries of the original depth maps contain no measurements, only the 304 × 228 center-cropped region is evaluated. In the KITTI dataset, the training set contains 85,898 RGB-D image pairs, the validation set contains 1,000 RGB-D image pairs, and another 1,000 frames are used as the test set. The dataset provides RGB images and aligned sparse depth maps obtained by projecting 3D LiDAR points onto the corresponding image frames, where the color image and the depth image have the same resolution of 352 × 1216. The raw 64-line LiDAR depth map has about 5% valid pixels, and the ground-truth semi-dense depth map has about 15% valid pixels. Since there are invalid pixels at the upper boundary of the depth map, images are cropped to 256 × 1216 for the training and testing stages. Specific examples are as follows.
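The crop sizes quoted above translate into pixel boxes as in the sketch below. The helper names are hypothetical, and two conventions are assumed rather than stated in the source: the NYU Depth V2 sizes are read as width × height and the KITTI sizes as height × width, and the KITTI crop is taken from the bottom of the image because the invalid pixels are described as lying at the upper boundary.

```python
def center_crop_box(width, height, crop_w, crop_h):
    """Return (left, top, right, bottom) of a centered crop."""
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


def bottom_crop_box(width, height, crop_h):
    """Return (left, top, right, bottom) of a full-width crop that drops the top rows."""
    return 0, height - crop_h, width, height


# NYU Depth V2: 320 x 240 after downsampling, evaluated on a 304 x 228 center crop.
nyu_box = center_crop_box(320, 240, 304, 228)
# KITTI: 352 rows x 1216 columns, cropped to 256 rows (assumed to keep the bottom rows).
kitti_box = bottom_crop_box(1216, 352, 256)
print(nyu_box)    # (8, 6, 312, 234)
print(kitti_box)  # (0, 96, 1216, 352)
```

Under these assumptions the NYU crop trims 8 columns and 6 rows from each side, and the KITTI crop discards the top 96 rows.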
Example 1:
Referring to FIGS. 1 and 2, the invention provides a depth completion method based on sparse representation, which specifically comprises the following steps:
S1, selecting RGB images from a plurality of public depth completion datasets, and inputting the RGB images into a sampling network to output an uncertainty map;
the sampling network adopts a U-Net-type structure and consists of an encoder and a decoder;
the encoder comprises four residual modules, each consisting of a feature extraction and downsampling module and a feature holding module; the feature extraction and downsampling module extracts high-dimensional features of the image and downsamples it, and the feature holding module further deepens the feature map without loss of resolution; these modules mainly use 1×1 and 3×3 convolutions with the corresponding batch normalization and ReLU activation functions;
the decoder consists of four receptive-field-guided upsampling modules;
skip connections are arranged between the encoder and the decoder to further promote feature fusion, and an uncertainty map is finally output;
S2, obtaining a sampled sparse depth map through a sampling process based on the uncertainty map;
the uncertainty map encodes the sampling logic: high uncertainty at a pixel means a low probability that the pixel is sampled. Based on this, S2 specifically includes the following:
S2.1, assuming that the size of the depth map $D$ is $m \times n$, the elements of the binarized sampling mask $M$ of the depth map $D$ are defined as:
$$M_{i,j} = \begin{cases} 1, & \text{with probability } p_{i,j} \\ 0, & \text{with probability } 1 - p_{i,j} \end{cases}$$
where $p_{i,j}$ is the sampling probability of the pixel at position $(i, j)$;
the sampling process of the depth map $D$ is defined as:
$$S = D \odot M$$
where $S$ is the sparse depth map obtained by sampling the depth map $D$, and $\odot$ denotes the pixel-wise product;
S2.2, assuming that the size of the uncertainty map $P$ is also $m \times n$, a binary sampling mask $M'$ is generated from the uncertainty map following S2.1: first, an $m \times n$ random matrix $R$ is generated whose elements are random values between 0 and 1, denoted $r_{i,j} \in [0, 1]$;
S2.3, the sampling probability of each element (pixel) of the uncertainty map is denoted $p'_{i,j}$, with $p'_{i,j} \in [0, 1]$; $r_{i,j}$ is compared with $p'_{i,j}$, and the binary mask is set to 1 at positions satisfying $r_{i,j} \le p'_{i,j}$ and to 0 otherwise, expressed as:
$$M'_{i,j} = \begin{cases} 1, & r_{i,j} \le p'_{i,j} \\ 0, & \text{otherwise} \end{cases}$$
S2.4, based on S2.1 to S2.3, the sampling process of the uncertainty map $P$ is defined as:
$$S' = P \odot M'$$
where $S'$ is the sparse depth map obtained after the uncertainty map $P$ is sampled, and $\odot$ denotes the pixel-wise product;
S3, designing a reconstruction neural network structure, building an image reconstruction network, inputting the RGB image together with the sampled sparse depth map obtained in S2 into the reconstruction network for training, and recovering a dense depth map. The multi-scale convolution module mainly fuses convolution kernels of sizes 1 and 3, using both element-wise addition and concatenation along the channel dimension, so that features are extracted more effectively. The encoder of the parallel dual-stream encoder-decoder network is mainly a pre-trained MobileNetV3-Large, whose main advantages are that it is lightweight while still extracting effective features; the decoder also uses receptive-field-guided upsampling modules. A channel attention mechanism is employed between the encoder and the decoder to further promote feature fusion. Finally, the feature weights of the two streams are computed through a sigmoid activation function to obtain the reconstructed depth map.
The sampled sparse depth map is concatenated with the RGB image along the channel dimension, and the resulting 4-channel image is input into the reconstruction network;
the reconstruction network consists of a multi-scale convolution module, a parallel dual-stream encoder-decoder structure, and an adaptive fusion mechanism, and realizes dense depth map reconstruction through cross-channel information interaction;
the loss function design of the sampling part in the reconstruction network training process specifically comprises the following:
(1) L2 loss function $l_{prob}$: $l_{prob}$ is used to supervise the generated uncertainty map, and the specific function is expressed as:
where the operator symbol in the formula denotes the Sobel gradient operation;
(2) regularization loss function $l_{reg}$: $l_{reg}$ is used to constrain the training process, and the specific function is expressed as:
where $N$ is the total number of available depth image pixels, and $s$ is the number of sampling points;
(3) total loss function of the sampling part $l_{sample}$: the specific function is expressed as:
$$l_{sample} = l_{prob} + \alpha\, l_{reg}$$
where $\alpha$ is a weighting coefficient;
the loss function design of the reconstruction part in the reconstruction network training process specifically comprises the following:
(1) gradient loss term $l_{grad}$, based on the L1 loss between the reconstructed depth map $D^*$ and the true depth map $D$: $l_{grad}$ is used to reduce the error in the computed depth gradients, and the specific function is expressed as:
(2) surface normal loss function $l_{norm}$: $l_{norm}$ is used to further refine local details, and the specific function is expressed as:
where $\langle \cdot, \cdot \rangle$ denotes the inner product of vectors;
(3) total loss function of the reconstruction part $l_{rec}$: the specific function is expressed as:
$$l_{rec} = w_1\left[l_1(D^*_{low}, D) + l_{grad}(D^*_{low}, D) + l_{norm}(D^*_{low}, D)\right] + w_2\, l_1(D^*_{up}, D) + l_1(D^*, D) + w_3\, l_{grad}(D^*, D) + w_4\, l_{norm}(D^*, D)$$
where $w_1$, $w_2$, $w_3$, and $w_4$ are the weighting coefficients of the different terms;
combining the loss function designs of the sampling part and the reconstruction part gives the total loss function $l_{final}$ of the reconstruction network:
$$l_{final} = l_{rec} + \beta\, l_{sample}$$
where $\beta$ is a weighting coefficient.
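The exact gradient loss formula is not reproduced in the text, so the following is only a sketch of one common formulation: the mean absolute difference between Sobel gradients of the predicted and true depth maps. The use of the Sobel operator for this term, the restriction to interior pixels, the averaging, and all function names are assumptions, not the patent's definition.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(img, kernel):
    """Cross-correlate img with a 3x3 kernel over interior pixels only."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(1, h - 1):
        row = []
        for j in range(1, w - 1):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    acc += kernel[di][dj] * img[i + di - 1][j + dj - 1]
            row.append(acc)
        out.append(row)
    return out


def gradient_l1_loss(pred, target):
    """Mean absolute difference of horizontal and vertical Sobel gradients."""
    total, count = 0.0, 0
    for kernel in (SOBEL_X, SOBEL_Y):
        grad_pred = filter3x3(pred, kernel)
        grad_true = filter3x3(target, kernel)
        for row_p, row_t in zip(grad_pred, grad_true):
            for a, b in zip(row_p, row_t):
                total += abs(a - b)
                count += 1
    return total / count


target = [[0.0, 1.0, 2.0]] * 3          # toy depth ramp rising left to right
pred = [[0.0, 0.0, 0.0]] * 3            # toy flat prediction
print(filter3x3(target, SOBEL_X))       # [[8.0]]
print(gradient_l1_loss(pred, target))   # 4.0
```

A flat prediction is penalised on the ramp because its horizontal gradient is zero where the target's is not, which is the kind of edge error a gradient term is meant to expose.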
In the invention, the Adam optimizer is used with parameters $\beta_1 = 0.9$ and $\beta_2 = 0.999$, and the weight decay is set to $10^{-5}$. The initial learning rate is set to 0.001. The model is trained with the deep learning framework PyTorch for a total of 20 epochs over the whole training set, and every 2 epochs the learning rate is reduced to 80% of its previous value. In addition, to prevent overfitting while improving overall model performance, two data augmentation strategies are used to increase the diversity of the training data:
random horizontal flip: both the color image and the depth image are flipped horizontally with 50% probability;
random channel swap: the three RGB channels of the color image are randomly swapped with 50% probability.
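The learning-rate schedule and the two augmentations described above can be sketched in plain Python as follows. The function names, the nested-list image layout, and the reading of the schedule as a step decay (multiply by 0.8 after every 2 epochs, with epochs counted from 0) are assumptions made for illustration.

```python
import random


def learning_rate(epoch, base_lr=0.001, decay=0.8, step=2):
    """Learning rate for a 0-indexed epoch under an assumed step decay."""
    return base_lr * decay ** (epoch // step)


def augment(rgb, depth, rng):
    """rgb: list of three channels (each a list of rows); depth: list of rows."""
    if rng.random() < 0.5:
        # Random horizontal flip, applied to the color and depth images together.
        rgb = [[row[::-1] for row in channel] for channel in rgb]
        depth = [row[::-1] for row in depth]
    if rng.random() < 0.5:
        # Random channel swap, applied to the color image only.
        order = [0, 1, 2]
        rng.shuffle(order)
        rgb = [rgb[k] for k in order]
    return rgb, depth


rgb = [[[1, 2]], [[3, 4]], [[5, 6]]]     # toy 1 x 2 image with three channels
depth = [[7, 8]]
out_rgb, out_depth = augment(rgb, depth, random.Random(1))
print([learning_rate(e) for e in range(6)])
print(out_rgb, out_depth)
```

In PyTorch, a schedule of this shape is typically expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.8)`; whether the original training code used that scheduler is not stated in the source.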
Example 2:
Based on Example 1, with the following difference:
the invention selects 10 advanced methods for comparison, trained on the KITTI dataset: ACMNet, Sparse-to-Dense, CSPN, DeepLIDAR, NConv-CNN, MSG-CHN, GuideNet, Uncertainty, PENet, and AdaptiveLIDAR. Results for 256 and 512 sampling points are given in Tables 1 and 2, respectively.
Table 1: Comparison of results with 256 sampling points
Table 2: Comparison of results with 512 sampling points
Method | RMSE | MAE | iRMSE | iMAE | REL | $\delta_{1.25}$ |
---|---|---|---|---|---|---|
ACMNet | 3417.34 | 1413.75 | 15.54 | 7.91 | 0.086 | 90.4 |
Sparse-to-Dense | 2151.12 | 659.15 | 4.41 | 2.29 | 0.033 | 98.2 |
CSPN | 1828.93 | 506.39 | 4.04 | 2.16 | 0.023 | 99.0 |
DeepLIDAR | 1735.29 | 543.62 | 3.96 | 1.97 | 0.025 | 98.9 |
NConv-CNN | 1973.10 | 508.64 | 4.27 | 2.14 | 0.025 | 98.7 |
MSG-CHN | 1862.38 | 591.95 | 4.13 | 2.14 | 0.029 | 98.6 |
GuideNet | 1787.59 | 554.37 | 3.98 | 2.06 | 0.028 | 99.0 |
Uncertainty | 1771.60 | 568.18 | 4.08 | 2.16 | 0.026 | 98.8 |
PENet | 1842.54 | 597.16 | 4.31 | 2.29 | 0.030 | 98.5 |
AdaptiveLIDAR | 1789.41 | 590.62 | 3.92 | 1.89 | 0.027 | 98.7 |
Ours | 1346.79 | 446.57 | 4.70 | 2.15 | 0.025 | 99.2 |
Tables 1 and 2 show the quantitative comparison on the RMSE, MAE, iRMSE, REL, and $\delta_{1.25}$ metrics, where smaller is better for RMSE, MAE, iRMSE, and REL, and larger is better for $\delta_{1.25}$. As the tables show, the method of the invention achieves the best results on the key metrics RMSE, MAE, and $\delta_{1.25}$.
The foregoing is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical scheme and inventive concept, shall fall within the scope of protection of the present invention.
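The metrics named above have standard definitions in the depth estimation literature, and the sketch below computes them for flat lists of predicted and ground-truth depths. The function name, the validity rule (ground truth greater than zero), and the requirement that predictions be strictly positive are illustrative assumptions; the source does not state the units or the evaluation protocol behind the table.

```python
import math


def depth_metrics(pred, gt):
    """Compute standard depth errors over pixels with valid ground truth (gt > 0)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    n = len(pairs)
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    mae = sum(abs(p - g) for p, g in pairs) / n
    # Inverse-depth errors; predictions are assumed to be strictly positive.
    irmse = math.sqrt(sum((1 / p - 1 / g) ** 2 for p, g in pairs) / n)
    imae = sum(abs(1 / p - 1 / g) for p, g in pairs) / n
    rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Percentage of pixels whose ratio to the ground truth is within a factor of 1.25.
    delta = 100.0 * sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {"RMSE": rmse, "MAE": mae, "iRMSE": irmse,
            "iMAE": imae, "REL": rel, "delta_1.25": delta}


# Toy values: the third pixel has no ground truth and is ignored.
metrics = depth_metrics(pred=[2.0, 4.0, 9.0], gt=[2.0, 5.0, 0.0])
print(metrics)
```

Reporting the threshold accuracy as a percentage matches the scale of the last column in the table, while REL stays a fraction.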
Claims (4)
1. A depth completion method based on sparse representation, characterized by comprising the following steps:
S1, selecting RGB images from a plurality of public depth completion datasets, and inputting the RGB images into a sampling network to output an uncertainty map;
S2, obtaining a sampled sparse depth map through a sampling process based on the uncertainty map;
S3, designing a reconstruction neural network structure, building an image reconstruction network, inputting the RGB image together with the sampled sparse depth map obtained in S2 into the reconstruction network for training, and recovering a dense depth map.
2. The depth completion method based on sparse representation of claim 1, wherein the sampling network in S1 adopts a U-Net-type structure and consists of an encoder and a decoder;
the encoder comprises four residual modules, each consisting of a feature extraction and downsampling module and a feature holding module; the feature extraction and downsampling module extracts high-dimensional features of the image and downsamples it, and the feature holding module further deepens the feature map without loss of resolution;
the decoder consists of four receptive-field-guided upsampling modules;
skip connections are arranged between the encoder and the decoder to further promote feature fusion, and an uncertainty map is finally output.
3. The depth completion method based on sparse representation of claim 1, wherein the uncertainty map encodes the sampling logic, high uncertainty at a pixel meaning a low probability that the pixel is sampled; based on this, S2 specifically includes the following:
S2.1, assuming that the size of the depth map $D$ is $m \times n$, the elements of the binarized sampling mask $M$ of the depth map $D$ are defined as:
$$M_{i,j} = \begin{cases} 1, & \text{with probability } p_{i,j} \\ 0, & \text{with probability } 1 - p_{i,j} \end{cases}$$
where $p_{i,j}$ is the sampling probability of the pixel at position $(i, j)$;
the sampling process of the depth map $D$ is defined as:
$$S = D \odot M$$
where $S$ is the sparse depth map obtained by sampling the depth map $D$, and $\odot$ denotes the pixel-wise product;
S2.2, assuming that the size of the uncertainty map $P$ is also $m \times n$, a binary sampling mask $M'$ is generated from the uncertainty map following S2.1: first, an $m \times n$ random matrix $R$ is generated whose elements are random values between 0 and 1, denoted $r_{i,j} \in [0, 1]$;
S2.3, the sampling probability of each element (pixel) of the uncertainty map is denoted $p'_{i,j}$, with $p'_{i,j} \in [0, 1]$; $r_{i,j}$ is compared with $p'_{i,j}$, and the binary mask is set to 1 at positions satisfying $r_{i,j} \le p'_{i,j}$ and to 0 otherwise, expressed as:
$$M'_{i,j} = \begin{cases} 1, & r_{i,j} \le p'_{i,j} \\ 0, & \text{otherwise} \end{cases}$$
S2.4, based on S2.1 to S2.3, the sampling process of the uncertainty map $P$ is defined as:
$$S' = P \odot M'$$
where $S'$ is the sparse depth map obtained after the uncertainty map $P$ is sampled, and $\odot$ denotes the pixel-wise product.
4. The depth completion method based on sparse representation of claim 1, wherein the reconstruction network in S3 consists of a multi-scale convolution module, a parallel dual-stream encoder-decoder structure, and an adaptive fusion mechanism, and realizes dense depth map reconstruction through cross-channel information interaction;
the loss function design of the sampling part in the reconstruction network training process specifically comprises the following:
(1) L2 loss function $l_{prob}$: $l_{prob}$ is used to supervise the generated uncertainty map, and the specific function is expressed as:
where the operator symbol in the formula denotes the Sobel gradient operation;
(2) regularization loss function $l_{reg}$: $l_{reg}$ is used to constrain the training process, and the specific function is expressed as:
where $N$ is the total number of available depth image pixels, and $s$ is the number of sampling points;
(3) total loss function of the sampling part $l_{sample}$: the specific function is expressed as:
$$l_{sample} = l_{prob} + \alpha\, l_{reg}$$
where $\alpha$ is a weighting coefficient;
the loss function design of the reconstruction part in the reconstruction network training process specifically comprises the following:
(1) gradient loss term $l_{grad}$, based on the L1 loss between the reconstructed depth map $D^*$ and the true depth map $D$: $l_{grad}$ is used to reduce the error in the computed depth gradients, and the specific function is expressed as:
(2) surface normal loss function $l_{norm}$: $l_{norm}$ is used to further refine local details, and the specific function is expressed as:
where $\langle \cdot, \cdot \rangle$ denotes the inner product of vectors;
(3) total loss function of the reconstruction part $l_{rec}$: the specific function is expressed as:
$$l_{rec} = w_1\left[l_1(D^*_{low}, D) + l_{grad}(D^*_{low}, D) + l_{norm}(D^*_{low}, D)\right] + w_2\, l_1(D^*_{up}, D) + l_1(D^*, D) + w_3\, l_{grad}(D^*, D) + w_4\, l_{norm}(D^*, D)$$
where $w_1$, $w_2$, $w_3$, and $w_4$ are the weighting coefficients of the different terms;
combining the loss function designs of the sampling part and the reconstruction part gives the total loss function $l_{final}$ of the reconstruction network:
$$l_{final} = l_{rec} + \beta\, l_{sample}$$
where $\beta$ is a weighting coefficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310836476.8A CN116862965A (en) | 2023-07-08 | 2023-07-08 | Depth completion method based on sparse representation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310836476.8A CN116862965A (en) | 2023-07-08 | 2023-07-08 | Depth completion method based on sparse representation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116862965A true CN116862965A (en) | 2023-10-10 |
Family
ID=88229944
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310836476.8A Pending CN116862965A (en) | 2023-07-08 | 2023-07-08 | Depth completion method based on sparse representation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116862965A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||