CN113570658A - Monocular video depth estimation method based on depth convolutional network - Google Patents
- Publication number: CN113570658A (application CN202110648477.0A)
- Authority: CN (China)
- Prior art keywords: network, depth, error, sub, estimation
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/70 — Image analysis; determining position or orientation of objects or cameras
- G06N3/045 — Neural networks; combinations of networks
- G06N3/088 — Learning methods; non-supervised learning, e.g. competitive learning
- G06T5/70 — Image enhancement or restoration; denoising; smoothing
- G06T2207/10016 — Image acquisition modality; video; image sequence
- G06T2207/20221 — Special algorithmic details; image fusion; image merging
Abstract
The invention belongs to the technical field of video processing and discloses a monocular video depth estimation method based on a deep convolutional network, comprising the following steps: acquiring training data and a monocular video to be tested; constructing a depth estimation network model comprising a depth prediction sub-network and a camera pose estimation sub-network, where the decoder comprises an up-sampling module and a dense atrous pyramid module; jointly training the depth prediction sub-network and the camera pose estimation sub-network on the training data, iteratively updating the network parameters of the two sub-networks with a loss function; and estimating a depth map of the monocular video to be tested. The invention exploits more spatial information of the original image and effectively improves the accuracy of depth prediction.
Description
Technical Field
The invention belongs to the technical field of video processing and further relates to a monocular video depth estimation method based on a deep convolutional network, which can be used for three-dimensional reconstruction, robot navigation, and automatic driving.
Background
Depth estimation is indispensable in many important tasks, such as three-dimensional reconstruction, automatic driving, and robot navigation. The binocular depth estimation algorithm is currently the most common one: it estimates depth by imitating the human eyes, using the parallax between pictures of different viewpoints taken by a stereo camera or multiple cameras. However, binocular depth estimation suffers from many problems, such as high computational complexity, the difficulty of acquiring binocular pictures, and difficult matching in low-texture regions. Single-view pictures are far easier to acquire than multi-view pictures. The monocular depth estimation algorithm obtains depth from a picture or video shot by a single camera and can therefore greatly reduce both cost and the difficulty of data acquisition.
In addition, in the depth estimation problem, ground-truth depth is very expensive to acquire; images are usually labeled by capturing depth information with a light sensor (indoors) or a LiDAR (outdoors). Unsupervised depth estimation methods based on video sequences treat the depth prediction of a video sequence as an intermediate step of synthesizing images between adjacent frames, so no ground-truth depth is required for training.
The paper "Unsupervised Learning of Depth and Ego-Motion from Video" (The IEEE Conference on Computer Vision and Pattern Recognition, 2017) by Zhou T., Brown M., Snavely N., and Lowe D. discloses an unsupervised video depth estimation algorithm based on deep learning. The algorithm needs no ground-truth depth and predicts depth from the multi-view matching relations between frames of a video sequence. To address the scale inconsistency of previous work, it introduces a geometric consistency constraint and, on that basis, a self-discovered mask module, which resolves the scale inconsistency between frames of the output depth maps and achieves higher depth prediction accuracy.
However, it still has the following shortcomings: the network does not make full use of multi-scale feature fusion to improve depth prediction accuracy, and the limited feature reuse of the backbone network prevents image features from being fully extracted.
Disclosure of Invention
Aiming at the above problems in the prior art, the invention provides a monocular video depth estimation method based on a deep convolutional network, which improves the accuracy of the resulting depth map by exploiting a deep convolutional network structure.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme.
The monocular video depth estimation method based on the deep convolutional network comprises the following steps:
Step 1, acquiring training data and a monocular video to be tested;
wherein the training data comprise an RGB video sequence I = {I_t | 0 ≤ t ≤ T, t ∈ Z} and the corresponding ground-truth depth image sequence D = {D_t | 0 ≤ t ≤ T, t ∈ Z}, where Z denotes the set of time indices, I_t the RGB image at time t, and D_t the ground-truth depth image at time t;
Step 2, constructing a depth estimation network model comprising a depth prediction sub-network and a camera pose estimation sub-network, wherein the depth prediction sub-network is an encoder-decoder network whose encoder is a densely connected deep convolutional network and whose decoder comprises an up-sampling module and a dense atrous pyramid module, and the camera pose estimation sub-network is a deep convolutional neural network;
Step 3, jointly training the depth prediction sub-network and the camera pose estimation sub-network on the training data, iteratively updating the network parameters of the two sub-networks with a loss function, to obtain a trained depth prediction sub-network;
wherein the loss function comprises the image reconstruction error L_p, the scale consistency error L_GC, and the smoothing-term error L_s;
Step 4, inputting the monocular video to be tested into the trained depth prediction sub-network to obtain the predicted depth map.
Compared with the prior art, the invention has the following beneficial effects:
Because the constructed depth prediction sub-network has a densely connected deep structure and a multi-scale pyramid feature fusion module, it can extract more image information. This overcomes the shortcomings of the prior art, which performs depth prediction using only skip connections for multi-scale information and whose feature extraction network cannot reuse features. More spatial information of the original image is thus exploited, and the accuracy of depth prediction is effectively improved.
Drawings
The invention is described in further detail below with reference to the figures and specific embodiments.
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a diagram of the deep convolutional network architecture of the present invention;
FIG. 3 is an RGB image of adjacent frames input in an embodiment of the present invention;
FIG. 4 is an output depth map of adjacent frame images obtained using the present invention;
FIG. 5 is a schematic diagram of the image reconstruction process of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to examples, but it will be understood by those skilled in the art that the following examples are only illustrative of the present invention and should not be construed as limiting the scope of the present invention.
Referring to fig. 1, the monocular video depth estimation method based on the depth convolutional network provided by the present invention includes the following steps:
wherein the training data comprise an RGB video sequence I = {I_t | 0 ≤ t ≤ T, t ∈ Z} and the corresponding ground-truth depth image sequence D = {D_t | 0 ≤ t ≤ T, t ∈ Z}, where Z denotes the set of time indices, I_t the RGB image at time t, and D_t the ground-truth depth image at time t;
In this embodiment, the RGB image sequences and 3D LiDAR point cloud data of the KITTI data set are randomly divided into a training set and a test set. The samples in the test set serve as the monocular videos to be tested.
Randomly sample from the training set to obtain the RGB images I_t, I_{t-1} of two adjacent frames at times t and t-1, and recover the corresponding ground-truth depth maps D_t, D_{t-1} at times t and t-1 from the 3D LiDAR point cloud data.
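Recovering a ground-truth depth map from the point cloud amounts to projecting each 3D LiDAR point into the image plane with the camera intrinsics K and keeping its depth. A minimal sketch, assuming the points are already transformed into the camera frame (the intrinsics and point values are illustrative, not from the patent):

```python
import numpy as np

def lidar_to_depth(points_cam, K, H, W):
    """Project 3-D points (camera coordinates, z pointing forward)
    into an H x W sparse depth map; the nearest point wins per pixel."""
    depth = np.zeros((H, W))
    z = points_cam[:, 2]
    keep = z > 0                              # discard points behind the camera
    uvw = K @ points_cam[keep].T              # 3 x N homogeneous pixel coordinates
    u = np.round(uvw[0] / uvw[2]).astype(int)
    v = np.round(uvw[1] / uvw[2]).astype(int)
    zk = z[keep]
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    for ui, vi, zi in zip(u[inside], v[inside], zk[inside]):
        if depth[vi, ui] == 0 or zi < depth[vi, ui]:
            depth[vi, ui] = zi                # keep the nearest surface
    return depth
```

In practice the KITTI development kit supplies the LiDAR-to-camera extrinsics and rectified intrinsics that this sketch assumes have already been applied.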
Specifically, the structure of the depth estimation network model is shown in Fig. 2:
the depth prediction sub-network is a self-coding network, and the encoder is a densely connected depth convolution network DenseNet; the main body of the decoder is image up-sampling, and a dense hole pyramid module DenseASPP is additionally introduced to perform multi-scale feature fusion. RGB image I of two adjacent framest,It-1As the input of the depth prediction sub-network, as shown in FIG. 3, the network output is the corresponding depth prediction graphAs shown in FIG. 4, wherein ItAndthe subscript t of (a) represents the time t,the upper mark of (A) represents that the depth prediction network predicts the result and the depth true value D obtained by the sensortAnd (5) distinguishing.
The camera pose estimation sub-network is a deep convolutional network whose input is the RGB images I_t, I_{t-1} of two adjacent frames and whose output is the camera motion matrix T_{t→t-1} from time t to time t-1.
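The patent does not specify how T_{t→t-1} is parameterized; pose networks of this kind commonly regress a 6-DoF vector (three translation and three axis-angle rotation components) that is then assembled into the 4×4 motion matrix. A sketch under that assumption, using the Rodrigues formula:

```python
import numpy as np

def pose_vec_to_mat(vec):
    """vec = [tx, ty, tz, rx, ry, rz] -> 4x4 rigid transform.
    The last three entries are an axis-angle rotation."""
    t, r = vec[:3], vec[3:]
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        R = np.eye(3)
    else:
        k = r / theta
        K = np.array([[0, -k[2], k[1]],
                      [k[2], 0, -k[0]],
                      [-k[1], k[0], 0]])  # skew-symmetric cross-product matrix
        # Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
        R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T
```

A zero vector yields the identity transform, i.e. no camera motion between the two frames.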
Step 3, performing joint training on the depth prediction sub-network and the camera pose estimation sub-network by using training data, and performing iterative updating on network parameters of the two sub-networks by using a loss function to obtain a trained depth prediction sub-network;
wherein the loss function comprises the image reconstruction error L_p, the scale consistency error L_GC, and the smoothing-term error L_s;
(3.1) Randomly sample from a Gaussian distribution with mean 0 and variance 0.01, and use the sampled array as the initialization parameters of the depth estimation network model;
(3.2) Input the RGB images I_t, I_{t-1} of two adjacent frames into the depth prediction sub-network and the camera pose estimation sub-network respectively, and then compute the mask weight, the scale consistency error, the image reconstruction error, and the smoothing regularization error;
(3.3) Jointly train the depth prediction sub-network and the camera pose estimation sub-network by minimizing the overall error, so that the depth prediction sub-network can output a high-precision depth map;
(3.4) Iteratively update all parameters of the depth prediction sub-network and the camera pose estimation sub-network obtained in step (3.3) with mini-batch stochastic gradient descent until the model converges, completing the optimization of the network model.
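The mini-batch stochastic gradient descent of step (3.4) can be illustrated on a toy least-squares problem; the learning rate, batch size, and epoch count below are illustrative, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2))          # toy training inputs
w_true = np.array([2.0, -1.0])
y = X @ w_true                         # noiseless targets

w = np.zeros(2)
lr, batch = 0.1, 32
for epoch in range(200):
    idx = rng.permutation(len(X))      # reshuffle each epoch
    for s in range(0, len(X), batch):  # one parameter update per mini-batch
        b = idx[s:s + batch]
        grad = X[b].T @ (X[b] @ w - y[b]) / len(b)  # gradient of 0.5 * MSE
        w -= lr * grad
```

The same loop structure applies to the two sub-networks, with `w` replaced by all network parameters and the gradient supplied by backpropagation of the overall loss.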
The loss function mainly comprises the image reconstruction error L_p, the scale consistency error L_GC, and the smoothing-term error L_s. During image reconstruction, moving objects between adjacent frames, occluded regions, and other hard-to-explain pixels often degrade reconstruction quality. These pixels therefore need to be detected first and then given a lower weight; the step of detecting such pixels is called the mask module, and its implementation is given in (3.2a).
(3.2a) From the output D̂_t of the depth prediction sub-network at time t and the camera motion matrix T_{t→t-1} output by the camera pose estimation sub-network, a depth map D̂_t^{t-1} under the camera view at time t-1 can be reconstructed. Taking the normalized difference between D̂_t^{t-1} and the output D̂_{t-1} of the depth prediction sub-network at time t-1 gives the depth prediction error D_diff(p) at pixel p:

D_diff(p) = |D̂_t^{t-1}(p) − D̂_{t-1}(p)| / (D̂_t^{t-1}(p) + D̂_{t-1}(p))

In the formula, p denotes a pixel, and D_diff(p) takes values in [0, 1]. At moving objects, occluded regions, and other hard-to-explain pixels, D_diff(p) is larger, close to 1; at pixels not belonging to these categories, D_diff(p) is smaller, close to 0. To give the pixels with large D_diff(p) a lower weight, the mask weight M(p) at pixel p is calculated as follows:
M(p) = 1 − D_diff(p)
This weight is applied to the image reconstruction error in step (3.2c).
(3.2b) Taking the mean of the pixel depth prediction error D_diff(p) over the whole image gives the scale consistency error:

L_GC = (1 / num(V)) Σ_{p∈V} D_diff(p)

where V is the set of valid pixels of the whole image and num(V) denotes the number of valid pixels.
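Steps (3.2a)-(3.2b) reduce to a few array operations once the warped depth map is available. A minimal sketch over dense depth maps, assuming valid pixels are those with positive depth in both maps:

```python
import numpy as np

def mask_and_gc(d_warp, d_pred):
    """d_warp: depth of frame t warped into view t-1; d_pred: prediction at t-1.
    Returns the mask weight M(p) and the scale consistency error L_GC."""
    valid = (d_warp > 0) & (d_pred > 0)
    d_diff = np.zeros_like(d_warp)
    d_diff[valid] = (np.abs(d_warp[valid] - d_pred[valid])
                     / (d_warp[valid] + d_pred[valid]))  # normalized difference, in [0, 1]
    M = 1.0 - d_diff                                     # mask weight M(p)
    L_gc = d_diff[valid].mean()                          # mean over valid pixels
    return M, L_gc
```

Consistent depth maps give L_GC = 0 and a mask of all ones; the larger the inconsistency, the lower the weight a pixel receives.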
(3.2c) As shown in Fig. 5, the image reconstruction proceeds as follows: combining the RGB image I_t at time t, the predicted depth map D̂_t, and the camera motion matrix T_{t→t-1}, the RGB image Î_{t-1} at time t-1 can be reconstructed. Besides the gray-value error, the image reconstruction error also introduces the structural similarity error SSIM. Combining the mask weight M(p) obtained in step (3.2a), the image reconstruction error is:

L_p = (1 / num(V)) Σ_{p∈V} M(p) · ( λ_i · |I_{t-1}(p) − Î_{t-1}(p)| + λ_s · (1 − SSIM(p)) / 2 )

where λ_i = 0.15 and λ_s = 0.85. The term on the left of the plus sign is the absolute-value error of image reconstruction, and SSIM(p) on the right is the structural similarity error between the two images at time t-1.
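The reconstruction of Î_{t-1} in Fig. 5 is an inverse warp: every pixel of the target view is back-projected with its predicted depth, moved by the camera motion, and re-projected into the source image. A simplified grayscale sketch with nearest-neighbour sampling (real implementations use differentiable bilinear sampling):

```python
import numpy as np

def inverse_warp(I_src, D_tgt, T, K):
    """Reconstruct the target view from a source image.
    I_src: source image (H, W); D_tgt: predicted depth of the target view (H, W);
    T: 4x4 camera motion matrix target -> source; K: 3x3 intrinsics."""
    H, W = D_tgt.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    # back-project target pixels to 3-D camera coordinates
    cam = np.linalg.inv(K) @ pix * D_tgt.reshape(1, -1)
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    # move the points into the source camera frame and project
    src = K @ (T @ cam_h)[:3]
    zs = src[2]
    with np.errstate(divide="ignore", invalid="ignore"):
        us = np.round(src[0] / zs).astype(int)
        vs = np.round(src[1] / zs).astype(int)
    valid = (zs > 0) & (us >= 0) & (us < W) & (vs >= 0) & (vs < H)
    out = np.zeros_like(I_src, dtype=float)
    out.reshape(-1)[valid] = I_src[vs[valid], us[valid]]  # nearest-neighbour sample
    return out
```

With the identity motion and unit depth the warp is a no-op, which makes the geometry easy to check.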
SSIM (structural similarity) is an index for measuring the similarity of two images; it was first proposed by the Laboratory for Image and Video Engineering at the University of Texas at Austin.
Given two images x and y, their structural similarity is computed as follows:

SSIM(x, y) = ((2 μ_x μ_y + c_1)(2 σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)(σ_x² + σ_y² + c_2))

where μ_x is the mean of x, μ_y the mean of y, σ_x² the variance of x, σ_y² the variance of y, and σ_xy the covariance of x and y. c_1 = (k_1·L)² and c_2 = (k_2·L)² are constants used to maintain stability, L is the dynamic range of the pixel values, k_1 = 0.01, and k_2 = 0.03. The more similar the two images are, the closer the SSIM value is to 1.
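The formula above transcribes directly into code. The sketch below computes a single global SSIM over the whole image pair; practical implementations average it over local windows:

```python
import numpy as np

def ssim(x, y, L=1.0, k1=0.01, k2=0.03):
    """Global SSIM between two images with pixel values in [0, L]."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()  # covariance of the two images
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

An image compared with itself scores exactly 1, and any dissimilar pair scores lower.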
(3.2d) To address noise and the vanishing gradients of low-texture regions, a smoothing-term error is introduced:

L_s = Σ_p e^(−|∇I_t(p)|) · |∇D̂_t(p)|

where ∇I_t(p) is the gradient of the input RGB image at pixel p and ∇D̂_t(p) is the gradient of the depth map at pixel p.
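The exact weighting is not spelled out in the text; the sketch below uses the common edge-aware form in which the factor e^(−|∇I|) down-weights depth gradients at image edges, penalizing depth variation only where the image itself is smooth:

```python
import numpy as np

def smoothness_loss(depth, image):
    """Edge-aware smoothness: depth gradients weighted by exp(-|image gradient|)."""
    dDx = np.abs(np.diff(depth, axis=1))          # horizontal depth gradient
    dDy = np.abs(np.diff(depth, axis=0))          # vertical depth gradient
    wx = np.exp(-np.abs(np.diff(image, axis=1)))  # low weight across image edges
    wy = np.exp(-np.abs(np.diff(image, axis=0)))
    return (wx * dDx).mean() + (wy * dDy).mean()
```

A constant depth map incurs zero loss regardless of the image, while depth variation in flat image regions is penalized at full weight.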
Taking the weighted sum of the image reconstruction error L_p, the scale consistency error L_GC, and the smoothing-term error L_s gives the overall loss function:

L = α·L_p + β·L_s + γ·L_GC

where α = 1.0, β = 0.1, and γ = 0.5; α, β, and γ denote the weights of the corresponding errors, each taking a value in [0, 1].
The network model is trained and optimized by minimizing the loss function, i.e. the overall error L.
(4.1) Input a single RGB picture of the test sample into the depth prediction sub-network and output the corresponding normalized depth map.
(4.2) Calibrate the output normalized depth map with the actual physical scale to obtain the final predicted depth map.
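The patent does not specify the calibration procedure; a common choice for scale-ambiguous monocular predictions is median scaling against sparse ground truth (e.g. projected LiDAR depths), sketched here as an assumption:

```python
import numpy as np

def median_scale(pred, gt_sparse):
    """Scale a normalized depth prediction to physical units using the median ratio.
    pred:      predicted (scale-ambiguous) depth map, positive everywhere
    gt_sparse: ground-truth depths at valid pixels, 0 where unknown"""
    valid = gt_sparse > 0
    scale = np.median(gt_sparse[valid]) / np.median(pred[valid])
    return pred * scale
```

The median ratio is robust to outlier pixels, which is why it is preferred over a mean ratio in standard monocular depth evaluation protocols.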
Simulation experiment
The effectiveness of the invention is verified by the following simulation experiments.
1. Simulation conditions are as follows:
The simulation tests of the invention were carried out in a Linux environment with a Tesla P4 GPU. The pictures were divided into a training set of 5240 pictures, a verification set of 2070 pictures, and a test set of 200 pictures.
2. Simulation content:
TABLE 1. Comparison of the prediction accuracy of the proposed method and the conventional SC-SfMLearner

| Estimation method | SqRel | RMSE | RMSE_log |
|---|---|---|---|
| SC-SfMLearner | 0.1834 | 6.8903 | 0.2630 |
| The invention | 0.1751 | 6.4451 | 0.2496 |
From the results in Table 1 it can be seen that, compared with the existing SC-SfMLearner depth prediction method, the relative squared error SqRel, the root-mean-square error RMSE, and the root-mean-square logarithmic error RMSE_log of the proposed method are all smaller, which demonstrates its effectiveness.
Although the present invention has been described in detail in this specification with reference to specific embodiments and illustrative embodiments, it will be apparent to those skilled in the art that modifications and improvements can be made thereto based on the present invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.
Claims (6)
1. The monocular video depth estimation method based on the depth convolutional network is characterized by comprising the following steps of:
step 1, acquiring training data and a monocular video to be tested;
wherein the training data comprise an RGB video sequence I = {I_t | 0 ≤ t ≤ T, t ∈ Z} and the corresponding ground-truth depth image sequence D = {D_t | 0 ≤ t ≤ T, t ∈ Z}, where Z denotes the set of time indices, I_t the RGB image at time t, and D_t the ground-truth depth image at time t;
step 2, a depth estimation network model is built, the depth estimation network model comprises a depth prediction sub-network and a camera pose estimation sub-network, the depth prediction sub-network is a self-coding network and comprises an encoder and a decoder, the encoder is a densely connected depth convolution network, and the decoder comprises an up-sampling module and a dense hole pyramid module; the camera pose estimation sub-network is a deep convolutional neural network;
step 3, performing joint training on the depth prediction sub-network and the camera pose estimation sub-network by using training data, and performing iterative updating on network parameters of the two sub-networks by using a loss function to obtain a trained depth prediction sub-network;
wherein the loss function comprises the image reconstruction error L_p, the scale consistency error L_GC, and the smoothing-term error L_s;
Step 4, inputting the monocular video to be tested into the trained depth prediction sub-network, and outputting a normalized depth prediction image; and calibrating the output normalized depth map according to the actual physical scale to obtain a final predicted depth map.
2. The method of claim 1, wherein the encoder is the densely connected deep convolutional network DenseNet; the body of the decoder performs image up-sampling, and the dense atrous pyramid module DenseASPP is additionally introduced for multi-scale feature fusion.
3. The method for monocular video depth estimation based on depth convolutional network of claim 1, wherein the joint training of the depth prediction subnetwork and the camera pose estimation subnetwork is performed by using training data, and the specific process is as follows:
(3.1) randomly initializing network parameters of the depth estimation network model;
(3.2) inputting the RGB images I_t, I_{t-1} of two adjacent frames into the depth prediction sub-network and the camera pose estimation sub-network respectively, and then computing the mask weight, the scale consistency error, the image reconstruction error, and the smoothing regularization error;
(3.3) jointly training the depth prediction sub-network and the camera pose estimation sub-network by minimizing the overall error, so that the depth prediction sub-network can output a high-precision depth map;
and (3.4) iteratively updating all network parameters in the depth prediction sub-network and the camera pose estimation sub-network obtained in the step (3.3) by using a batch random gradient descent method until the model converges, and finishing the optimization of the network model.
4. The method as claimed in claim 3, wherein random samples are drawn from a Gaussian distribution with mean 0 and variance 0.01, and the randomly sampled array is used as the initialization parameters of the depth estimation network model.
5. The method for monocular video depth estimation based on depth convolutional network of claim 3, wherein the mask weight, scale consistency error, image reconstruction error and smooth regularization term error of each sub-network are calculated respectively, and the specific steps are as follows:
(3.2a) from the output map D̂_t of the depth prediction sub-network at time t and the camera motion matrix T_{t→t-1} output by the camera pose estimation sub-network, reconstructing the depth map D̂_t^{t-1} under the camera view at time t-1, and then taking the normalized difference between D̂_t^{t-1} and the output D̂_{t-1} of the depth prediction sub-network at time t-1 to obtain the depth prediction error D_diff(p) at pixel p:

D_diff(p) = |D̂_t^{t-1}(p) − D̂_{t-1}(p)| / (D̂_t^{t-1}(p) + D̂_{t-1}(p))

in the formula, D_diff(p) is the depth prediction error of pixel p, and its value lies in [0, 1];
to give the pixels with large D_diff(p) a lower weight, the mask weight M(p) at pixel p is calculated as follows:
M(p) = 1 − D_diff(p);
(3.2b) taking the mean of the pixel depth prediction error D_diff(p) over the whole image to obtain the scale consistency error:

L_GC = (1 / num(V)) Σ_{p∈V} D_diff(p)

wherein V is the set of valid pixels of the whole image and num(V) denotes the number of valid pixels;
(3.2c) combining the RGB image I_t at time t, the predicted depth map D̂_t, and the camera motion matrix T_{t→t-1} to reconstruct the RGB image Î_{t-1} at time t-1, wherein the error terms in this process comprise the gray-value error of image reconstruction and the structural similarity error SSIM, and the image reconstruction error formula is:

L_p = (1 / num(V)) Σ_{p∈V} M(p) · ( λ_i · |I_{t-1}(p) − Î_{t-1}(p)| + λ_s · (1 − SSIM(p)) / 2 )

wherein λ_i and λ_s are weight parameters whose sum is 1; Î_{t-1}(p) denotes the gray value of pixel p in the reconstructed RGB image Î_{t-1} at time t-1; and SSIM(p) is the structural similarity error at pixel p between the two images Î_{t-1} and I_{t-1};
(3.2d) the smoothing-term error is calculated as follows:

L_s = Σ_p e^(−|∇I_t(p)|) · |∇D̂_t(p)|

wherein ∇I_t(p) is the gradient of the input RGB image at pixel p and ∇D̂_t(p) is the gradient of the depth map at pixel p.
6. The method of claim 5, wherein the overall error is the weighted sum of the image reconstruction error L_p, the scale consistency error L_GC, and the smoothing-term error L_s:

L = α·L_p + β·L_s + γ·L_GC

wherein α, β, and γ respectively denote the weights of the corresponding errors, each taking a value in [0, 1].
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110648477.0A CN113570658A (en) | 2021-06-10 | 2021-06-10 | Monocular video depth estimation method based on depth convolutional network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113570658A true CN113570658A (en) | 2021-10-29 |
Family
ID=78161933
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110648477.0A Pending CN113570658A (en) | 2021-06-10 | 2021-06-10 | Monocular video depth estimation method based on depth convolutional network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113570658A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114627351A (en) * | 2022-02-18 | 2022-06-14 | 电子科技大学 | Fusion depth estimation method based on vision and millimeter wave radar |
CN114998411A (en) * | 2022-04-29 | 2022-09-02 | 中国科学院上海微系统与信息技术研究所 | Self-supervision monocular depth estimation method and device combined with space-time enhanced luminosity loss |
CN115272438A (en) * | 2022-08-19 | 2022-11-01 | 中国矿业大学 | High-precision monocular depth estimation system and method for three-dimensional scene reconstruction |
WO2023155043A1 (en) * | 2022-02-15 | 2023-08-24 | 中国科学院深圳先进技术研究院 | Historical information-based scene depth reasoning method and apparatus, and electronic device |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108510535A (en) * | 2018-03-14 | 2018-09-07 | 大连理工大学 | A kind of high quality depth estimation method based on depth prediction and enhancing sub-network |
CN109741383A (en) * | 2018-12-26 | 2019-05-10 | 西安电子科技大学 | Picture depth estimating system and method based on empty convolution sum semi-supervised learning |
WO2019223382A1 (en) * | 2018-05-22 | 2019-11-28 | 深圳市商汤科技有限公司 | Method for estimating monocular depth, apparatus and device therefor, and storage medium |
CN111311685A (en) * | 2020-05-12 | 2020-06-19 | 中国人民解放军国防科技大学 | Motion scene reconstruction unsupervised method based on IMU/monocular image |
CN111369608A (en) * | 2020-05-29 | 2020-07-03 | 南京晓庄学院 | Visual odometer method based on image depth estimation |
CN111739078A (en) * | 2020-06-15 | 2020-10-02 | 大连理工大学 | Monocular unsupervised depth estimation method based on context attention mechanism |
CN111860386A (en) * | 2020-07-27 | 2020-10-30 | 山东大学 | Video semantic segmentation method based on ConvLSTM convolutional neural network |
WO2021013334A1 (en) * | 2019-07-22 | 2021-01-28 | Toyota Motor Europe | Depth maps prediction system and training method for such a system |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108510535A (en) * | 2018-03-14 | 2018-09-07 | 大连理工大学 | A kind of high quality depth estimation method based on depth prediction and enhancing sub-network |
WO2019174378A1 (en) * | 2018-03-14 | 2019-09-19 | 大连理工大学 | High-quality depth estimation method based on depth prediction and enhancement sub-networks |
WO2019223382A1 (en) * | 2018-05-22 | 2019-11-28 | 深圳市商汤科技有限公司 | Method for estimating monocular depth, apparatus and device therefor, and storage medium |
CN109741383A (en) * | 2018-12-26 | 2019-05-10 | 西安电子科技大学 | Picture depth estimating system and method based on empty convolution sum semi-supervised learning |
WO2021013334A1 (en) * | 2019-07-22 | 2021-01-28 | Toyota Motor Europe | Depth maps prediction system and training method for such a system |
CN111311685A (en) * | 2020-05-12 | 2020-06-19 | 中国人民解放军国防科技大学 | Motion scene reconstruction unsupervised method based on IMU/monocular image |
CN111369608A (en) * | 2020-05-29 | 2020-07-03 | 南京晓庄学院 | Visual odometer method based on image depth estimation |
CN111739078A (en) * | 2020-06-15 | 2020-10-02 | 大连理工大学 | Monocular unsupervised depth estimation method based on context attention mechanism |
CN111860386A (en) * | 2020-07-27 | 2020-10-30 | 山东大学 | Video semantic segmentation method based on ConvLSTM convolutional neural network |
Non-Patent Citations (2)

| Title |
|---|
| Cen Shijie; He Yuanlie; Chen Xiaocong: "Monocular depth estimation combining attention and unsupervised deep learning", Journal of Guangdong University of Technology, no. 04 |
| Wang Xinsheng; Zhang Guiling: "Monocular depth estimation based on convolutional neural networks", Computer Engineering and Applications, no. 13 |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |