CN111783582A - Unsupervised monocular depth estimation algorithm based on deep learning - Google Patents
Unsupervised monocular depth estimation algorithm based on deep learning
- Publication number
- CN111783582A CN111783582A CN202010571133.XA CN202010571133A CN111783582A CN 111783582 A CN111783582 A CN 111783582A CN 202010571133 A CN202010571133 A CN 202010571133A CN 111783582 A CN111783582 A CN 111783582A
- Authority
- CN
- China
- Prior art keywords
- image
- depth
- loss
- network
- optical flow
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
Abstract
The invention discloses an unsupervised monocular depth estimation algorithm based on deep learning, which detects moving objects in a scene by comparing the optical flow generated by camera motion with the full optical flow, and thereby improves the depth estimation performance of the algorithm.
Description
Technical Field
The invention relates to a monocular depth estimation algorithm, in particular to an unsupervised monocular depth estimation algorithm based on deep learning.
Background
Computer vision simulates the human visual function through a computer, enabling the computer to perceive a real three-dimensional scene from two-dimensional planar images in a human-like way, including understanding and recognizing information in the scene such as content, motion, and structure. However, techniques based on two-dimensional images suffer from inherent drawbacks, since planar images lose the depth information of three-dimensional space during imaging. How to reconstruct the three-dimensional information of a scene from a single image or multiple images, i.e. depth estimation, is therefore a very important fundamental subject of current research in the field of computer vision. Depth refers to the distance between a point in the scene and the plane in which the camera lies; the depth information corresponding to an image can be described by a depth image, in which the gray value of each pixel represents the distance between the corresponding scene point and the camera. As research has progressed, depth estimation technology has gradually been applied to fields such as intelligent robotics, intelligent healthcare, autonomous driving, object detection and tracking, face recognition, and 3D video production, and has great social and economic value.
According to the number of viewpoints of the scene images, depth estimation algorithms can be classified into multi-view, binocular and monocular methods. Compared with the former two, a monocular image lacks rich spatial structure information, making monocular estimation the most difficult of the three. However, depth estimation from a monocular image is convenient, low in cost and closest to practical application requirements, so it has high research value and is a hotspot in the current field of depth estimation.
Most conventional depth estimation methods estimate image depth directly from visual cues. However, traditional methods impose strict conditions of use and are computationally expensive. In recent years, deep learning has developed rapidly, and image depth estimation methods combined with deep learning have begun to attract the attention of researchers at home and abroad. Monocular depth estimation algorithms based on deep learning can be classified as supervised or unsupervised according to whether ground-truth depth labels are used. Supervised methods take single images as training data, treat depth estimation as a dense regression task, and fit depth values with a convolutional neural network. The disadvantage of this approach is apparent: it relies on a large amount of labeled data, and obtaining the corresponding depth labels is expensive. Unsupervised methods draw inspiration from traditional motion-based methods, use continuous image sequences as training data, and infer the three-dimensional structure of the scene from the motion of the camera. However, such methods must assume that the only motion in the scene is that of the camera, i.e. they neglect moving objects such as vehicles and pedestrians. When many moving objects are present in the scene, the prediction accuracy of such methods degrades greatly.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an algorithm that estimates monocular image depth in an unsupervised manner, without depending on depth labels.
The technical scheme is as follows: an unsupervised monocular depth estimation algorithm based on deep learning comprises the following steps:
Step 1: process a video shot by a monocular camera to obtain an image sequence of length N, taking the intermediate frame in the sequence as the target image I_t and the remaining frames as the source images I_s;
Step 2: input the target image I_t obtained in step 1 into the constructed depth network DepthNet to obtain the depth image D_t; input the tensor formed by concatenating the target image I_t and the source image I_s along the channel dimension into the constructed camera pose network PoseNet to obtain the camera pose transformation T_t→s; from the depth image D_t and the camera pose transformation T_t→s, solve for the rigid optical flow f_rig caused by the rigid motion of the camera; then reconstruct the image Î_t^rig and calculate the depth smoothing loss L_ds;
Step 3: input the image sequence obtained in step 1 into the constructed optical flow network FlowNet to obtain the full optical flow f_full caused by both camera motion and the independent motion of objects; based on the full optical flow f_full, reconstruct the image Î_t^full and calculate the reconstruction loss L_r^full and the adversarial loss L_adv;
Step 4: compare the rigid optical flow f_rig obtained in step 2 with the full optical flow f_full obtained in step 3 to obtain the moving target mask M_o; based on the moving target mask M_o, calculate the optical flow consistency loss L_fc and the rigid reconstruction loss L_r^rig;
Step 5: based on the adversarial loss L_adv, the optical flow consistency loss L_fc, the rigid reconstruction loss L_r^rig, the reconstruction loss L_r^full and the depth smoothing loss L_ds, construct the loss function L_total; iterate until the loss function L_total converges, obtaining the trained depth network DepthNet, camera pose network PoseNet and optical flow network FlowNet;
Step 6: input the images to be estimated into the trained depth network DepthNet, camera pose network PoseNet and optical flow network FlowNet respectively, obtaining unsupervised estimates of the corresponding image depth, camera pose and motion optical flow.
Further, the depth network DepthNet in step 2 is a fully convolutional network comprising an encoder and a decoder, wherein the encoder and the decoder are connected by cross-layer connections.
Further, in step 2, solving for the rigid optical flow f_rig caused by the rigid motion of the camera from the depth image D_t and the camera pose transformation T_t→s comprises the following steps:
the projected coordinates p_s of a pixel of the target image in the source image I_s are calculated according to formula (1):

p_s ~ K · T_t→s · D_t(p_t) · K⁻¹ · p_t    (1)

where p_t is the homogeneous coordinate of the pixel in the target image I_t and K is the camera intrinsic matrix;

the optical flow at the pixel is calculated according to formula (2):

f_rig(p_t) = p_s − p_t    (2)
Further, the depth smoothing loss L_ds in step 2 is calculated according to formula (3):

L_ds = Σ_{p_t} |∇_x D_t(p_t)| · e^(−|∇_x I_t(p_t)|) + |∇_y D_t(p_t)| · e^(−|∇_y I_t(p_t)|)    (3)

where ∇_x and ∇_y denote the horizontal and vertical gradients, respectively, and p_t is the homogeneous coordinate of a pixel in the target image I_t.
Further, the optical flow network FlowNet in step 3 is a generative adversarial network comprising a generator and a discriminator; the generator takes the tensor formed by concatenating the target image I_t and the source image I_s along the channel dimension as input and outputs the full optical flow f_full; the discriminator takes the target image I_t and the reconstructed image Î_t^full as input, treats the target image I_t as a real image and the reconstructed image Î_t^full as a generated image, and outputs a probability value representing the probability that its input is a real image.
Further, the structure of the generator is consistent with that of the depth network DepthNet.
Further, in step 3, the reconstruction loss L_r^full is calculated according to formula (4):

L_r^full = Σ_{p_t ∈ M_full} [ w · (1 − SSIM(I_t(p_t), Î_t^full(p_t)))/2 + (1 − w) · |I_t(p_t) − Î_t^full(p_t)| ]    (4)

where SSIM denotes the structural similarity index, w is a weighting parameter, and M_full is the valid mask corresponding to the full optical flow f_full.
Further, in step 3, the adversarial loss L_adv is calculated according to formula (5):

L_adv = E_{I∼X}[log D(I)] + E_{Î∼X̂}[log(1 − D(Î))]    (5)

where G and D denote the generator and the discriminator, respectively, I and X denote the real image and its data distribution, and Î and X̂ denote the generated image and its data distribution.
Further, in step 4, the moving target mask M_o is obtained according to formula (6):

M_o(p_t) = 1(|f_full(p_t) − f_rig(p_t)| > α)    (6)

where 1(·) is an indicator function and α is a threshold;

the optical flow consistency loss L_fc is obtained according to formula (7):

L_fc = Σ_{p_t} (1 − M_o(p_t)) · |f_full(p_t) − f_rig(p_t)|    (7)
Further, the loss function L_total in step 5 is expressed as:

L_total = λ_adv·L_adv + λ_ds·L_ds + λ_r·L_r^rig + λ_f·L_r^full + λ_fc·L_fc    (8)

where λ_adv, λ_ds, λ_r, λ_f and λ_fc are the weights of the corresponding loss terms.
Advantageous effects: compared with the prior art, the invention has the following advantages:
1. The method detects moving objects in the scene by comparing the difference between the optical flow generated by camera motion and the full optical flow, and thereby improves the depth estimation performance of the algorithm;
2. By detecting dynamic objects in the scene, the method effectively enhances the accuracy and robustness of the algorithm;
3. The method uses video shot by a monocular camera as training data and requires no expensive depth labels; by modeling moving objects, it greatly reduces their influence on the unsupervised method, so that the algorithm achieves good results in the monocular depth estimation, camera pose prediction and optical flow estimation tasks;
4. The adversarial loss introduced by the generative adversarial network structure significantly improves the accuracy of the optical flow prediction.
Drawings
FIG. 1 is a schematic view of a model structure;
FIG. 2 is the generative adversarial network structure of the optical flow network FlowNet;
FIG. 4 shows, from top to bottom, examples of the target image I_t, the source image I_s, and the rigid optical flow f_rig;
FIG. 5 shows, from top to bottom, examples of the target image I_t, the reconstructed image Î_t^rig, and the corresponding valid mask M_rig;
FIG. 6 shows, from top to bottom, examples of the target image I_t, the source image I_s, and the full optical flow f_full;
FIG. 7 shows, from top to bottom, examples of the rigid optical flow f_rig, the full optical flow f_full, and the moving target mask M_o;
Detailed Description
The technical solution of the present invention will be further explained with reference to the accompanying drawings and examples.
Referring to fig. 1, the algorithm model of the invention is composed of a depth network DepthNet, a camera pose network PoseNet and an optical flow network FlowNet. The depth network DepthNet outputs a depth image with the same resolution as the monocular input image, with depth values represented by gray levels; the camera pose network PoseNet estimates the pose transformation of the camera in three-dimensional space between adjacent frames; the optical flow network FlowNet estimates the full optical flow between adjacent frames. FIG. 2 shows the generative adversarial network structure of the optical flow network FlowNet.
Based on this model, the invention designs an unsupervised monocular depth estimation algorithm based on deep learning, which detects moving objects in the scene by comparing the difference between the optical flow generated by camera motion and the full optical flow, and thereby improves the depth estimation performance of the algorithm. Without any training labels, the method achieves unsupervised estimation of the depth image, camera pose and motion optical flow from video shot by a moving monocular camera, with excellent prediction accuracy on all three tasks.
The method specifically comprises the following steps:
Step 1: a video shot by a monocular camera is used as the training set; with the camera intrinsic parameters K known, the video is processed into a series of image sequences of length N = 3, which serve as the input data of the model; within each sequence the intermediate frame is taken as the target image I_t and the remaining frames as the source images I_s;
Step 2: construct the depth network DepthNet and the camera pose network PoseNet; given the input from step 1, they output the depth image D_t and the camera pose transformation T_t→s, respectively; solve for the rigid optical flow f_rig caused by the rigid motion of the camera and the corresponding valid mask M_rig; then reconstruct the image Î_t^rig and calculate the depth smoothing loss L_ds;
The structure of the depth network DepthNet is described as follows: DepthNet is a fully convolutional network with an encoder-decoder architecture and cross-layer connections between the encoder and decoder. The encoder consists of 7 pairs of convolutional layers with strides of 2 and 1, respectively, and with 32, 64, 128, 256, 512 and 512 convolution kernels; the decoder consists of a series of successive deconvolution and convolution layers and finally outputs, as shown in FIG. 3, a grayscale image D_t with the same resolution as the input target image I_t, where the gray level represents the depth value at each pixel. All convolution kernels are of size 3, except for the first 2 pairs of convolutional layers of the encoder, whose kernel sizes are 7 and 5. All layers except the final output layer use the LeakyReLU activation function and batch normalization.
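As an illustrative, non-limiting sketch of such an encoder-decoder in PyTorch (skip connections are omitted for brevity; the seventh encoder pair's channel count and the Softplus output head are assumptions, since the description lists six kernel numbers for seven pairs):

```python
import torch
import torch.nn as nn

class DepthNet(nn.Module):
    """Minimal sketch of the encoder-decoder DepthNet described above."""
    def __init__(self):
        super().__init__()
        chans = [32, 64, 128, 256, 512, 512, 512]
        ksizes = [7, 5, 3, 3, 3, 3, 3]           # kernel sizes 7 and 5 for the first 2 pairs
        enc, in_c = [], 3
        for c, k in zip(chans, ksizes):          # 7 pairs: stride-2 then stride-1 conv
            enc += [self._conv(in_c, c, k, 2), self._conv(c, c, k, 1)]
            in_c = c
        self.encoder = nn.Sequential(*enc)
        dec = []
        for c in reversed(chans[:-1]):           # successive deconvolution layers
            dec += [nn.ConvTranspose2d(in_c, c, 4, stride=2, padding=1),
                    nn.LeakyReLU(0.1, inplace=True)]
            in_c = c
        dec += [nn.ConvTranspose2d(in_c, 32, 4, stride=2, padding=1),
                nn.LeakyReLU(0.1, inplace=True),
                nn.Conv2d(32, 1, 3, padding=1),
                nn.Softplus()]                   # 1-channel positive depth map
        self.decoder = nn.Sequential(*dec)

    @staticmethod
    def _conv(in_c, out_c, k, stride):
        return nn.Sequential(
            nn.Conv2d(in_c, out_c, k, stride=stride, padding=k // 2),
            nn.BatchNorm2d(out_c),               # batch norm + LeakyReLU on all but the output layer
            nn.LeakyReLU(0.1, inplace=True))

    def forward(self, img):                      # img: (B, 3, H, W), H and W divisible by 128
        return self.decoder(self.encoder(img))   # depth image at the input resolution
```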
The camera pose network PoseNet structure is described as follows: PoseNet consists of 7 convolutional layers with 16, 32, 64, 128, 256 and 256 convolution kernels and a convolution stride of 2; except for the first 2 layers, whose kernel sizes are 7 and 5, all kernel sizes are 3. PoseNet takes the tensor formed by concatenating the target image I_t and the source image I_s along the channel dimension as input, and finally outputs the camera pose transformation T_t→s through a 6-channel 1 × 1 convolutional layer. T_t→s represents the rigid motion of the camera in space from the target image I_t to the source image I_s, comprising 3 Euler angles and 3 translation components.
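An analogous non-limiting PoseNet sketch; the channel count of the seventh layer and the per-source-image output layout are assumptions:

```python
import torch
import torch.nn as nn

class PoseNet(nn.Module):
    """Sketch of PoseNet: 7 stride-2 convolutional layers followed by a
    1x1 convolution producing 3 Euler angles and 3 translations per source."""
    def __init__(self, n_src=2):
        super().__init__()
        self.n_src = n_src
        chans = [16, 32, 64, 128, 256, 256, 256]
        ksizes = [7, 5, 3, 3, 3, 3, 3]
        layers, in_c = [], 3 * (1 + n_src)       # target + sources, channel-concatenated
        for c, k in zip(chans, ksizes):
            layers += [nn.Conv2d(in_c, c, k, stride=2, padding=k // 2),
                       nn.LeakyReLU(0.1, inplace=True)]
            in_c = c
        self.features = nn.Sequential(*layers)
        self.pose_head = nn.Conv2d(in_c, 6 * n_src, 1)   # 6-DoF pose per source image

    def forward(self, imgs):                     # imgs: (B, 3*(1+n_src), H, W)
        x = self.pose_head(self.features(imgs))
        x = x.mean(dim=[2, 3])                   # global average over spatial positions
        return x.view(-1, self.n_src, 6)         # 3 Euler angles + 3 translations each
```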
The construction of the rigid optical flow f_rig and the reconstruction of the image Î_t^rig are described as follows: let p_t be the homogeneous coordinate of a pixel in the target image I_t; combining the depth image D_t and the camera pose transformation T_t→s, the projected coordinate p_s of this pixel in the source image I_s can be obtained:

p_s ~ K · T_t→s · D_t(p_t) · K⁻¹ · p_t    (1)

The optical flow at the pixel can then be found as:

f_rig(p_t) = p_s − p_t    (2)
This optical flow represents the change in position of the same pixel between the target image and the source image. As shown in FIG. 4, from top to bottom are examples of the target image I_t, the source image I_s and the rigid optical flow f_rig.
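Equations (1) and (2) may be sketched as follows, assuming the pose transformation has already been converted from Euler angles and translations into a 4 × 4 matrix; all function and variable names are illustrative:

```python
import torch

def rigid_flow(depth, T, K):
    """Sketch of equations (1)-(2): project target pixels into the source view
    with depth and camera pose, then take the coordinate difference as the
    rigid optical flow. depth: (B,1,H,W); T: (B,4,4) target->source pose;
    K: (B,3,3) camera intrinsics."""
    B, _, H, W = depth.shape
    dt, dev = depth.dtype, depth.device
    # Homogeneous pixel grid p_t = (u, v, 1).
    v, u = torch.meshgrid(torch.arange(H, device=dev, dtype=dt),
                          torch.arange(W, device=dev, dtype=dt), indexing="ij")
    p_t = torch.stack([u, v, torch.ones_like(u)], 0).view(1, 3, -1).expand(B, -1, -1)
    # Back-project: X = D(p_t) * K^{-1} p_t, then move into the source frame.
    cam = torch.linalg.inv(K) @ p_t * depth.view(B, 1, -1)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=dev, dtype=dt)], 1)
    src = (T @ cam_h)[:, :3]
    # Perspective projection back to pixel coordinates: p_s ~ K * X_src.
    pix = K @ src
    p_s = pix[:, :2] / pix[:, 2:3].clamp(min=1e-6)
    flow = (p_s - p_t[:, :2]).view(B, 2, H, W)   # equation (2)
    return flow, p_s.view(B, 2, H, W)
```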
Since p_s may exceed the image boundary, a corresponding valid mask M_rig needs to be established. When reconstructing the image Î_t^rig, because p_s is continuous-valued, Î_t^rig(p_t) is computed by bilinear interpolation of the 4 pixels surrounding p_s in the source image I_s, thereby reconstructing Î_t^rig. As shown in FIG. 5, from top to bottom are examples of the target image I_t, the reconstructed image Î_t^rig and the corresponding valid mask M_rig.
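A non-limiting sketch of the bilinear reconstruction and the valid mask, using torch.nn.functional.grid_sample as one possible implementation:

```python
import torch
import torch.nn.functional as F

def warp_and_mask(src_img, p_s):
    """Sample the source image at the continuous projected coordinates p_s by
    bilinear interpolation and build the valid mask marking projections that
    stay inside the image. src_img: (B,3,H,W); p_s: (B,2,H,W) pixel coords."""
    B, _, H, W = src_img.shape
    # grid_sample expects coordinates normalized to [-1, 1].
    gx = 2.0 * p_s[:, 0] / (W - 1) - 1.0
    gy = 2.0 * p_s[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1)                        # (B,H,W,2)
    recon = F.grid_sample(src_img, grid, mode="bilinear",
                          padding_mode="zeros", align_corners=True)
    # Valid where the projected coordinate lies inside the image boundary.
    valid = ((gx.abs() <= 1) & (gy.abs() <= 1)).unsqueeze(1).float()
    return recon, valid
```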
The depth smoothing loss L_ds is calculated as:

L_ds = Σ_{p_t} |∇_x D_t(p_t)| · e^(−|∇_x I_t(p_t)|) + |∇_y D_t(p_t)| · e^(−|∇_y I_t(p_t)|)    (3)

where ∇_x and ∇_y denote the horizontal and vertical gradients, respectively. The depth smoothing loss L_ds allows large depth changes at object contours in the depth image while keeping the remaining regions as smooth as possible.
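Equation (3) might be implemented as in the following sketch; the edge-aware weighting by image gradients reflects the stated goal of allowing sharp depth changes at object contours:

```python
import torch

def depth_smoothing_loss(depth, img):
    """Sketch of equation (3): edge-aware smoothness. Depth gradients are
    penalized less where the image itself has strong gradients.
    depth: (B,1,H,W); img: (B,3,H,W)."""
    dx_d = (depth[:, :, :, 1:] - depth[:, :, :, :-1]).abs()
    dy_d = (depth[:, :, 1:, :] - depth[:, :, :-1, :]).abs()
    dx_i = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean(1, keepdim=True)
    dy_i = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean(1, keepdim=True)
    return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()
```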
Step 3: construct the optical flow network FlowNet; given the input from step 1, it outputs the full optical flow f_full caused by both camera motion and the independent motion of objects; solve for the corresponding valid mask M_full, then reconstruct the image Î_t^full and calculate the reconstruction loss L_r^full and the adversarial loss L_adv;
The structure of the optical flow network FlowNet is described as follows: FlowNet is a generative adversarial network, as shown in FIG. 2, composed of a generator and a discriminator. The generator takes the tensor formed by concatenating the target image I_t and the source image I_s along the channel dimension as input and outputs the full optical flow f_full shown in FIG. 6; this optical flow is caused by both the camera motion and the independent motion of objects. The structure of the generator is identical to that of the depth network DepthNet, except that its final output layer has 2 channels. Following the procedure of step 2, the image Î_t^full can be reconstructed from the full optical flow f_full and the corresponding valid mask M_full constructed. The reconstruction loss L_r^full is calculated as follows:

L_r^full = Σ_{p_t ∈ M_full} [ w · (1 − SSIM(I_t(p_t), Î_t^full(p_t)))/2 + (1 − w) · |I_t(p_t) − Î_t^full(p_t)| ]    (4)

where SSIM denotes the structural similarity index and the parameter w is set to 0.85. Theoretically, if the depth estimation and camera pose estimation were error-free, Î_t^full and I_t would be identical within the valid mask M_full, and the reconstruction loss would be zero. The discriminator takes I_t and Î_t^full as input, treats I_t as the real image and Î_t^full as the generated image, and outputs a probability value representing the probability that the corresponding input image is a real image. The structure of the discriminator is similar to PoseNet: it consists of 7 convolutional layers, followed by global average pooling and a sigmoid activation function to produce the final output.
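A non-limiting sketch of equation (4); the 3 × 3 average-pooling SSIM is a common simplified implementation and an assumption here:

```python
import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified SSIM over 3x3 local windows (an implementation choice)."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sig_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sig_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sig_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sig_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sig_x + sig_y + C2)
    return (num / den).clamp(0, 1)

def reconstruction_loss(target, recon, valid, w=0.85):
    """Sketch of equation (4): SSIM + L1 photometric error inside the valid mask."""
    err = w * (1 - ssim(target, recon)) / 2 + (1 - w) * (target - recon).abs()
    err = err.mean(1, keepdim=True)                  # average over color channels
    return (err * valid).sum() / valid.sum().clamp(min=1.0)
```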
The adversarial loss L_adv is given by:

L_adv = E_{I∼X}[log D(I)] + E_{Î∼X̂}[log(1 − D(Î))]    (5)

where G and D denote the generator and the discriminator, respectively, I and X are the real image and its data distribution, and Î and X̂ are the generated image and its data distribution.
Step 4: compare the rigid optical flow f_rig with the full optical flow f_full; their difference is used to detect moving objects and output the moving target mask M_o; compute the optical flow consistency loss L_fc and, in combination with step 2, the rigid reconstruction loss L_r^rig. The moving target mask is obtained as:

M_o(p_t) = 1(|f_full(p_t) − f_rig(p_t)| > α)    (6)

where 1(·) is the indicator function and the threshold α is set to 7. Theoretically, if f_rig and f_full are estimated without error, the difference between the two optical flows should be large at moving objects, while the two optical flow values should be exactly equal on the static background. As shown in FIG. 7, from top to bottom are examples of the rigid optical flow f_rig, the full optical flow f_full and the moving target mask M_o.
The optical flow consistency loss L_fc is:

L_fc = Σ_{p_t} (1 − M_o(p_t)) · |f_full(p_t) − f_rig(p_t)|    (7)

This loss encourages f_rig and f_full to be as equal as possible on the static background.
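Equations (6) and (7) may be sketched together as follows; penalizing the L1 flow difference over the static background is one reading of the description:

```python
import torch

def flow_consistency(flow_full, flow_rig, alpha=7.0):
    """Sketch of equations (6)-(7): pixels where full and rigid flow disagree
    by more than alpha are marked as moving objects; the consistency loss
    pulls the two flows together on the remaining static background.
    flow_full, flow_rig: (B,2,H,W)."""
    diff = (flow_full - flow_rig).norm(dim=1, keepdim=True)      # per-pixel flow difference
    moving_mask = (diff > alpha).float()                         # equation (6)
    static = 1.0 - moving_mask
    l_fc = (static * diff).sum() / static.sum().clamp(min=1.0)   # equation (7)
    return moving_mask, l_fc
```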
Step 5: based on the adversarial loss L_adv, the optical flow consistency loss L_fc, the rigid reconstruction loss L_r^rig, the reconstruction loss L_r^full and the depth smoothing loss L_ds, construct the loss function L_total; minimize L_total with the Adam optimizer until convergence, obtaining the trained depth network DepthNet, camera pose network PoseNet and optical flow network FlowNet;
The final loss function L_total is:

L_total = λ_adv·L_adv + λ_ds·L_ds + λ_r·L_r^rig + λ_f·L_r^full + λ_fc·L_fc    (8)

where λ_adv, λ_ds, λ_r, λ_f and λ_fc are the weights of the corresponding loss terms, set to 0.005, 1, 10, 1 and 0.01, respectively. The parameters β_1 and β_2 of the Adam optimizer are 0.9 and 0.999, respectively. During model training, the initial learning rate is 0.0002 and the batch size is set to 8.
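A non-limiting sketch of equation (8) and the stated optimizer settings; the pairing of each weight with its loss term follows the order given above:

```python
import torch

def total_loss(l_adv, l_ds, l_r_rig, l_r_full, l_fc,
               lam_adv=0.005, lam_ds=1.0, lam_r=10.0, lam_f=1.0, lam_fc=0.01):
    """Sketch of equation (8): weighted sum of the five loss terms."""
    return (lam_adv * l_adv + lam_ds * l_ds + lam_r * l_r_rig
            + lam_f * l_r_full + lam_fc * l_fc)

def make_optimizer(params):
    """Adam with the stated hyper-parameters: lr 0.0002, betas (0.9, 0.999)."""
    return torch.optim.Adam(params, lr=2e-4, betas=(0.9, 0.999))
```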
Step 6: input the images to be estimated into the trained depth network DepthNet, camera pose network PoseNet and optical flow network FlowNet respectively, obtaining unsupervised estimates of the corresponding image depth, camera pose and motion optical flow.
Claims (10)
1. An unsupervised monocular depth estimation algorithm based on deep learning, characterized in that it comprises the following steps:
Step 1: process a video shot by a monocular camera to obtain an image sequence of length N, taking the intermediate frame in the sequence as the target image I_t and the remaining frames as the source images I_s;
Step 2: input the target image I_t obtained in step 1 into the constructed depth network DepthNet to obtain the depth image D_t; input the tensor formed by concatenating the target image I_t and the source image I_s along the channel dimension into the constructed camera pose network PoseNet to obtain the camera pose transformation T_t→s; from the depth image D_t and the camera pose transformation T_t→s, solve for the rigid optical flow f_rig caused by the rigid motion of the camera; then reconstruct the image Î_t^rig and calculate the depth smoothing loss L_ds;
Step 3: input the image sequence obtained in step 1 into the constructed optical flow network FlowNet to obtain the full optical flow f_full caused by both camera motion and the independent motion of objects; based on the full optical flow f_full, reconstruct the image Î_t^full and calculate the reconstruction loss L_r^full and the adversarial loss L_adv;
Step 4: compare the rigid optical flow f_rig obtained in step 2 with the full optical flow f_full obtained in step 3 to obtain the moving target mask M_o; based on the moving target mask M_o, calculate the optical flow consistency loss L_fc and the rigid reconstruction loss L_r^rig;
Step 5: based on the adversarial loss L_adv, the optical flow consistency loss L_fc, the rigid reconstruction loss L_r^rig, the reconstruction loss L_r^full and the depth smoothing loss L_ds, construct the loss function L_total; iterate until the loss function L_total converges, obtaining the trained depth network DepthNet, camera pose network PoseNet and optical flow network FlowNet;
Step 6: input the images to be estimated into the trained depth network DepthNet, camera pose network PoseNet and optical flow network FlowNet respectively, obtaining unsupervised estimates of the corresponding image depth, camera pose and motion optical flow.
2. The unsupervised monocular depth estimation algorithm based on deep learning of claim 1, wherein: the depth network DepthNet in step 2 is a fully convolutional network comprising an encoder and a decoder, wherein the encoder and the decoder are connected by cross-layer connections.
3. The unsupervised monocular depth estimation algorithm based on deep learning of claim 1, wherein solving for the rigid optical flow f_rig caused by the rigid motion of the camera from the depth image D_t and the camera pose transformation T_t→s in step 2 comprises the following steps:
the projected coordinates p_s of a pixel of the target image in the source image I_s are calculated according to formula (1):

p_s ~ K · T_t→s · D_t(p_t) · K⁻¹ · p_t    (1)

where p_t is the homogeneous coordinate of the pixel in the target image I_t and K is the camera intrinsic matrix;

the optical flow at the pixel is calculated according to formula (2):

f_rig(p_t) = p_s − p_t    (2)
4. The unsupervised monocular depth estimation algorithm based on deep learning according to claim 1 or 3, wherein the depth smoothing loss L_ds in step 2 is calculated according to formula (3):

L_ds = Σ_{p_t} |∇_x D_t(p_t)| · e^(−|∇_x I_t(p_t)|) + |∇_y D_t(p_t)| · e^(−|∇_y I_t(p_t)|)    (3)

where ∇_x and ∇_y denote the horizontal and vertical gradients, respectively.
5. The unsupervised monocular depth estimation algorithm based on deep learning of claim 1, wherein: the optical flow network FlowNet in step 3 is a generative adversarial network comprising a generator and a discriminator; the generator takes the tensor formed by concatenating the target image I_t and the source image I_s along the channel dimension as input and outputs the full optical flow f_full; the discriminator takes the target image I_t and the reconstructed image Î_t^full as input, treats the target image I_t as a real image and the reconstructed image Î_t^full as a generated image, and outputs a probability value representing the probability that its input is a real image.
6. The unsupervised monocular depth estimation algorithm based on deep learning of claim 5, wherein: the structure of the generator is consistent with that of the depth network DepthNet.
7. The unsupervised monocular depth estimation algorithm based on deep learning of claim 1, wherein in step 3 the reconstruction loss L_r^full is calculated according to formula (4):

L_r^full = Σ_{p_t ∈ M_full} [ w · (1 − SSIM(I_t(p_t), Î_t^full(p_t)))/2 + (1 − w) · |I_t(p_t) − Î_t^full(p_t)| ]    (4)

where SSIM denotes the structural similarity index, w is a weighting parameter, and M_full is the valid mask corresponding to the full optical flow f_full.
8. The unsupervised monocular depth estimation algorithm based on deep learning of claim 5, wherein in step 3 the adversarial loss L_adv is calculated according to formula (5):

L_adv = E_{I∼X}[log D(I)] + E_{Î∼X̂}[log(1 − D(Î))]    (5)

where G and D denote the generator and the discriminator, respectively, I and X denote the real image and its data distribution, and Î and X̂ denote the generated image and its data distribution.
9. The unsupervised monocular depth estimation algorithm based on deep learning of claim 1, wherein in step 4 the moving target mask M_o is obtained according to formula (6):

M_o(p_t) = 1(|f_full(p_t) − f_rig(p_t)| > α)    (6)

where 1(·) is an indicator function and α is a threshold;

the optical flow consistency loss L_fc is obtained according to formula (7):

L_fc = Σ_{p_t} (1 − M_o(p_t)) · |f_full(p_t) − f_rig(p_t)|    (7)

10. The unsupervised monocular depth estimation algorithm based on deep learning of claim 1, wherein the loss function L_total in step 5 is expressed as:

L_total = λ_adv·L_adv + λ_ds·L_ds + λ_r·L_r^rig + λ_f·L_r^full + λ_fc·L_fc    (8)

where λ_adv, λ_ds, λ_r, λ_f and λ_fc are the weights of the corresponding loss terms.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010571133.XA CN111783582A (en) | 2020-06-22 | 2020-06-22 | Unsupervised monocular depth estimation algorithm based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010571133.XA CN111783582A (en) | 2020-06-22 | 2020-06-22 | Unsupervised monocular depth estimation algorithm based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111783582A true CN111783582A (en) | 2020-10-16 |
Family
ID=72756281
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010571133.XA Pending CN111783582A (en) | 2020-06-22 | 2020-06-22 | Unsupervised monocular depth estimation algorithm based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111783582A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109522828A (en) * | 2018-11-01 | 2019-03-26 | 上海科技大学 | A kind of accident detection method and system, storage medium and terminal |
CN109977847A (en) * | 2019-03-22 | 2019-07-05 | 北京市商汤科技开发有限公司 | Image generating method and device, electronic equipment and storage medium |
CN110705376A (en) * | 2019-09-11 | 2020-01-17 | 南京邮电大学 | Abnormal behavior detection method based on generative countermeasure network |
Non-Patent Citations (2)
Title |
---|
GAO HAOSHENG, TENG WANG: "Unsupervised Learning of Monocular Depth from Videos", 2019 Chinese Automation Congress (CAC) *
WEI-SHENG LAI et al.: "Semi-Supervised Learning for Optical Flow with Generative Adversarial Networks", 31st Conference on Neural Information Processing Systems (NIPS 2017) *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112344922A (en) * | 2020-10-26 | 2021-02-09 | 中国科学院自动化研究所 | Monocular vision odometer positioning method and system |
CN112396657A (en) * | 2020-11-25 | 2021-02-23 | 河北工程大学 | Neural network-based depth pose estimation method and device and terminal equipment |
CN113160294A (en) * | 2021-03-31 | 2021-07-23 | 中国科学院深圳先进技术研究院 | Image scene depth estimation method and device, terminal equipment and storage medium |
CN113139990A (en) * | 2021-05-08 | 2021-07-20 | 电子科技大学 | Depth grid stream robust image alignment method based on content perception |
CN113379821A (en) * | 2021-06-23 | 2021-09-10 | 武汉大学 | Stable monocular video depth estimation method based on deep learning |
CN113313732A (en) * | 2021-06-25 | 2021-08-27 | 南京航空航天大学 | Forward-looking scene depth estimation method based on self-supervision learning |
CN113724155A (en) * | 2021-08-05 | 2021-11-30 | 中山大学 | Self-boosting learning method, device and equipment for self-supervision monocular depth estimation |
CN113724155B (en) * | 2021-08-05 | 2023-09-05 | 中山大学 | Self-lifting learning method, device and equipment for self-supervision monocular depth estimation |
CN114066987A (en) * | 2022-01-12 | 2022-02-18 | 深圳佑驾创新科技有限公司 | Camera pose estimation method, device, equipment and storage medium |
WO2023178951A1 (en) * | 2022-03-25 | 2023-09-28 | 上海商汤智能科技有限公司 | Image analysis method and apparatus, model training method and apparatus, and device, medium and program |
CN114998411A (en) * | 2022-04-29 | 2022-09-02 | 中国科学院上海微系统与信息技术研究所 | Self-supervision monocular depth estimation method and device combined with space-time enhanced luminosity loss |
CN114998411B (en) * | 2022-04-29 | 2024-01-09 | 中国科学院上海微系统与信息技术研究所 | Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss |
CN116164770A (en) * | 2023-04-23 | 2023-05-26 | 禾多科技(北京)有限公司 | Path planning method, path planning device, electronic equipment and computer readable medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111783582A (en) | Unsupervised monocular depth estimation algorithm based on deep learning | |
Zhai et al. | Optical flow and scene flow estimation: A survey | |
CN111325794B (en) | Visual simultaneous localization and map construction method based on depth convolution self-encoder | |
CN109377530B (en) | Binocular depth estimation method based on depth neural network | |
Lv et al. | Learning rigidity in dynamic scenes with a moving camera for 3d motion field estimation | |
Yan et al. | Ddrnet: Depth map denoising and refinement for consumer depth cameras using cascaded cnns | |
JP7177062B2 (en) | Depth Prediction from Image Data Using Statistical Model | |
WO2019174377A1 (en) | Monocular camera-based three-dimensional scene dense reconstruction method | |
CN111105432A (en) | Unsupervised end-to-end driving environment perception method based on deep learning | |
CN115187638B (en) | Unsupervised monocular depth estimation method based on optical flow mask | |
Wang et al. | Depth estimation of video sequences with perceptual losses | |
CN113850900A (en) | Method and system for recovering depth map based on image and geometric clue in three-dimensional reconstruction | |
CN114996814A (en) | Furniture design system based on deep learning and three-dimensional reconstruction | |
Song et al. | Depth estimation from a single image using guided deep network | |
Liu et al. | A survey on deep learning methods for scene flow estimation | |
Feng et al. | Deep depth estimation on 360 images with a double quaternion loss | |
JP2024510230A (en) | Multi-view neural human prediction using implicitly differentiable renderer for facial expression, body pose shape and clothing performance capture | |
Wang et al. | Recurrent neural network for learning densedepth and ego-motion from video | |
Durasov et al. | Double refinement network for efficient monocular depth estimation | |
Basak et al. | Monocular depth estimation using encoder-decoder architecture and transfer learning from single RGB image | |
Kashyap et al. | Sparse representations for object-and ego-motion estimations in dynamic scenes | |
CN113436254A (en) | Cascade decoupling pose estimation method | |
Hou et al. | Joint learning of image deblurring and depth estimation through adversarial multi-task network | |
Rabby et al. | Beyondpixels: A comprehensive review of the evolution of neural radiance fields | |
Khan et al. | Towards monocular neural facial depth estimation: Past, present, and future |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20201016 |