CN111028282A - Unsupervised pose and depth calculation method and system - Google Patents

Unsupervised pose and depth calculation method and system

Info

Publication number
CN111028282A
CN111028282A (application CN201911196111.3A)
Authority
CN
China
Prior art keywords
pose
depth
image
module
error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911196111.3A
Other languages
Chinese (zh)
Inventor
蔡行
张兰清
李承远
王璐瑶
李宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Original Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Institute of Information Technology AIIT of Peking University, Hangzhou Weiming Information Technology Co Ltd filed Critical Advanced Institute of Information Technology AIIT of Peking University
Priority to CN201911196111.3A (CN111028282A)
Priority to CN202010281576.5A (CN111325784A)
Publication of CN111028282A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an unsupervised pose and depth calculation method and system. The method mainly employs the following modules: a pose prediction network model TNet, a depth estimation network model DMNet, a visual reconstruction model V, and an error loss function module. It calculates a forward-motion relative pose and a backward-motion relative pose, computes a depth estimation result and the corresponding depth for each image, sums a reconstruction error, a smoothness error and a twin consistency error to obtain a loss function, and iteratively updates the models until the loss function converges; finally, the camera relative pose and the predicted depth map are computed with the trained TNet and DMNet models.

Description

Unsupervised pose and depth calculation method and system
Technical Field
The invention belongs to the fields of SLAM (Simultaneous Localization and Mapping) and SfM (Structure from Motion), and particularly relates to an unsupervised pose and depth calculation method and system.
Background
In recent years, monocular dense depth estimation and visual odometry (VO) algorithms based on deep learning have developed rapidly; both are key modules of SfM and SLAM systems. Studies have shown that VO and depth estimation based on supervised deep learning achieve good performance in many challenging environments and mitigate problems such as scale drift. However, in practical applications it is difficult and expensive to obtain enough data with ground-truth labels to train these supervised models. In contrast, unsupervised approaches have the great advantage that they only require unlabeled video sequences.
Unsupervised deep models for depth and pose estimation typically employ two modules: one predicts the depth map and the other estimates the relative camera pose. The source image is then warped to the target view using the estimated depth map and pose, and the models are trained end-to-end with a photometric error loss as the optimization target. However, the prior art rarely considers the following key issues: VO is inherently sequential, yet temporal information is ignored; autonomous-driving data sets contain motion in essentially a single direction, so the resulting models can only handle motion in one direction and do not exploit the constraints between forward and backward motion. Existing models also pay little attention to model complexity: their parameter counts are large, which makes them difficult to deploy in practical VO applications.
Disclosure of Invention
The working principle of the invention is as follows: a twin pose network model uses ConvLSTM to learn the temporal information of the data, the depth estimation network is improved, and DispMNet (a disparity network based on MobileNet) is proposed, so that the pose and depth estimation accuracy reach a higher level.
In order to solve the above problems, the present invention provides an unsupervised pose and depth calculation method and system.
The technical scheme adopted by the invention is as follows:
an unsupervised pose and depth calculation method comprises a pose network model TNet, a depth network model DNet, an image visual reconstruction model V and a loss function, and comprises the following steps:
S1, preparing a monocular video data set;
S2, extracting consecutive images from the monocular video data set of step S1, sequentially inputting adjacent images into the pose network model TNet to obtain a common feature F between the images, and feeding the feature F to the pose prediction modules of TNet to obtain a forward-motion relative pose and a backward-motion relative pose respectively;
S3, inputting the consecutive images of step S2 into the depth network model DNet and obtaining, through forward propagation, a depth estimation result, i.e. the depth corresponding to each image;
S4, inputting the consecutive images, the forward-motion relative pose, the backward-motion relative pose and the per-image depth of step S2 into the image visual reconstruction model V to obtain warped images;
S5, calculating the reconstruction error between the warped images and the consecutive images of step S2, calculating the smoothness error of the depth estimation result, and calculating the twin consistency error;
S6, summing the reconstruction error, the smoothness error and the twin consistency error to obtain the loss function, performing back propagation, and iterating the update until the loss function converges;
and S7, prediction: using the trained pose network model TNet and depth network model DNet, computing the camera relative pose and the predicted depth map respectively by forward propagation.
A brand-new twin module is adopted to process the forward and backward motion of a video sequence simultaneously, and the forward and backward motions are constrained by a temporal consistency error term under the inverse-consistency constraint, which greatly improves pose estimation accuracy. By adopting the DispMNet model based on the MobileNet structure, the number of parameters is reduced by 37% while the depth estimation accuracy of the model is improved.
Further, the calculation formula of the reconstruction error Lreprojection between the warped image in step S5 and the consecutive images in step S2 is:
Lreprojection=α*Lphotometric+(1-α)*Lssim
where Lphotometric is the photometric error, Lssim is the inter-image similarity, and α is the weight coefficient.
Further, Lphotometric is:
Lphotometric = (1/L) * Σt ||It - Is||1
where It is the consecutive image, Is is the warped image, and L is the number of consecutive images minus 1.
Further, Lssim is:
Lssim = (1/L) * Σt (1 - SSIM(It, Is)) / 2
where It is a consecutive image and Is is a warped image.
Further, the twin consistency error Ltwin in step S6 is:
Ltwin = (1/L) * Σ ||Tn-m * Tm-n - I||
where I is the identity matrix, L is the number of consecutive images minus 1, and T is a relative pose.
Further, the loss function in step S6 is:
LTotal=Lreprojection+β*LSmooth+γ*LTwin
where Lreprojection is the reconstruction error, Lsmooth is the smoothness error of the depth estimation result, and β and γ are weight coefficients.
Further, the loss function in step S6 is optimized using the Adam optimization method.
A system for unsupervised pose and depth calculation comprises a pose network module TNet, a depth network module DNet, an image visual reconstruction module V and a loss function module; the pose network module TNet performs pose estimation, the depth network module DNet performs depth estimation, the image visual reconstruction module V performs image projection, and the pose network module TNet and the depth network module DNet are constrained through the loss function module.
Preferably, the module TNet comprises an encoder and a twin module; the encoder comprises convolutional layers and activation functions, the twin module comprises two pose prediction modules of identical structure, and each pose prediction module comprises a ConvLSTM layer and a convolutional layer. The module DNet comprises an encoder and a decoder; the encoder comprises convolutional layers and depthwise (Dwise) convolutions, and the decoder comprises deconvolution layers, convolutional layers and depthwise convolutions.
Compared with the prior art, the invention has the following advantages and effects:
1. A novel unsupervised framework for monocular visual odometry and depth estimation is provided; the pose network model of the framework uses ConvLSTM to learn the temporal information of the data, improving pose estimation accuracy.
2. The pose network adopts a brand-new twin module that processes the forward and backward motion of a video sequence simultaneously, and constrains the forward and backward motions with a temporal consistency error term under the inverse-consistency constraint, greatly improving pose estimation accuracy.
3. The DispMNet model based on the MobileNet structure is proposed; the number of parameters is reduced by 37% while the depth estimation accuracy of the model is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention.
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a block diagram of a model TNet of the present invention;
FIG. 3 is a block diagram of a model DMNet of the present invention;
FIG. 4 is a comparison of the depth map results of the present invention with the ground truth and the SfmLearner algorithm;
FIG. 5 is a comparison of pose estimation results of the present invention with other algorithms;
FIG. 6 is a comparison of depth estimation results of the present invention with other algorithms.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1:
as shown in figs. 1-6, an unsupervised pose and depth calculation method mainly employs the following modules: a pose prediction network model TNet, a depth estimation network model DMNet, a visual reconstruction model V, and an error loss function module. The TNet model comprises an encoder and a twin module. The encoder comprises 7 convolutional layers, each followed by an activation function, with convolution kernel sizes of 7, 5, 3 and 3 respectively; the twin module comprises two sub-network modules of identical structure, used respectively for pose prediction in forward motion and backward motion, and each sub-module consists of a ConvLSTM layer and a convolutional layer Conv with kernel size 1. DMNet consists of three parts: an encoder, a decoder and connection (skip) layers. The encoder consists of 7 convolution modules, each of which specifically comprises: a convolutional layer (kernel size 1x1, ReLU activation), a depthwise convolution Dwise (3x3, ReLU), a convolutional layer (1x1, ReLU), a Dwise (3x3, ReLU), and a convolutional layer (1x1, ReLU). The decoder consists of 6 deconvolution modules, each of which specifically comprises: a deconvolution layer (kernel size 3x3, ReLU), a convolutional layer (1x1, ReLU), a Dwise (3x3, ReLU), and a convolutional layer (1x1, ReLU). The connection layers pass shallow network features to the back-end decoder and concatenate them with the back-end features.
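To make the encoder module concrete, the following is a minimal PyTorch sketch of one depthwise-separable ("Dwise") convolution module following the layer order listed above. The channel widths, the stride placement and the class name DWBlock are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class DWBlock(nn.Module):
    """One DMNet-style encoder module: 1x1 conv -> 3x3 depthwise -> 1x1 conv
    -> 3x3 depthwise -> 1x1 conv, each followed by ReLU (layer order as listed
    in the description; channel sizes and stride are assumptions)."""
    def __init__(self, in_ch, hid_ch, out_ch, stride=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hid_ch, 1), nn.ReLU(inplace=True),
            # groups=hid_ch makes the 3x3 convolution depthwise ("Dwise")
            nn.Conv2d(hid_ch, hid_ch, 3, stride=stride, padding=1, groups=hid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(hid_ch, hid_ch, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hid_ch, hid_ch, 3, padding=1, groups=hid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(hid_ch, out_ch, 1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Example: downsample a 3-channel image to 32 feature channels.
# feat = DWBlock(3, 32, 32)(torch.randn(1, 3, 128, 416))
```

Replacing dense 3x3 convolutions with depthwise convolutions plus 1x1 projections is what drives the parameter reduction claimed for the MobileNet-style depth network.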
Step 1: obtain monocular video sequences, such as the KITTI autonomous-driving data set, the EuRoC data set, the TUM data set and the Oxford data set.
Step 2: each time, a video segment V with a fixed number of frames is taken and adjacent frame pairs are input into the pose network in sequence. For example, if the video segment V is 5 frames long, the adjacent frame pairs (t0 and t1, t1 and t2, t2 and t3, t3 and t4) are input into the network, yielding 4 groups of features F1, F2, F3, F4, each common to one pair of frames. The 4 feature groups pass independently through the two pose prediction modules of the TNet twin module; either sub-module can be designated for forward pose prediction, with the other used for backward pose prediction. For the forward module, the features are fed in the order F1 to F4, giving the pairwise relative pose predictions of the forward motion: T0-1, T1-2, T2-3, T3-4. For the backward module, the features are fed in the order F4 to F1, giving the relative poses of the backward motion: T4-3, T3-2, T2-1, T1-0.
As another example, for a video segment V of length 3 frames, the adjacent frame pairs (t0 and t1, t1 and t2) are input into the network, yielding 2 groups of features F1, F2 common to the frame pairs. The 2 feature groups pass independently through the two pose prediction modules of the TNet module. For the forward module, the features are fed in the order F1 to F2, giving the pairwise relative pose predictions of the forward motion: T0-1, T1-2. For the backward module, the features are fed in the order F2 to F1, giving the relative poses of the backward motion: T2-1, T1-0.
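A hedged PyTorch sketch of the twin pose prediction idea follows: a minimal ConvLSTM cell and a pose head of identical structure, where one instance is fed the features F1..F4 in forward order and the other is fed F4..F1. The hidden width, the 6-DoF pose output and the global average pooling are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell (assumed form; the patent only names ConvLSTM)."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, c

class PoseHead(nn.Module):
    """One sub-module of the twin: a ConvLSTM layer followed by a 1x1 convolution
    that regresses a 6-DoF pose (3 translation + 3 rotation) per frame pair."""
    def __init__(self, feat_ch, hid_ch=256):
        super().__init__()
        self.cell = ConvLSTMCell(feat_ch, hid_ch)
        self.pose = nn.Conv2d(hid_ch, 6, 1)

    def forward(self, feats):                     # feats: list of [B, C, H, W] tensors F1..FL
        B, _, H, W = feats[0].shape
        h = feats[0].new_zeros(B, self.cell.hid_ch, H, W)
        c = h.clone()
        poses = []
        for f in feats:                           # recurrence keeps temporal context across pairs
            h, c = self.cell(f, (h, c))
            poses.append(self.pose(h).mean(dim=(2, 3)))   # global average -> [B, 6]
        return poses

# Twin module: two heads of identical structure, one fed F1..F4 (forward motion),
# the other fed F4..F1 (backward motion).
# forward_head, backward_head = PoseHead(512), PoseHead(512)
# fwd_poses = forward_head(feats)                # T0-1 ... T3-4
# bwd_poses = backward_head(feats[::-1])         # T4-3 ... T1-0
```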
Step 3: for the video segment V, each frame Ii (i = 0, 1, 2, ...) is input into the depth estimation network separately, and the single-frame depth estimation result is obtained by forward propagation through the network; each image corresponds to a depth Di (i = 0, 1, 2, ...). For example, if the length of the video segment V is 5 frames, i = 0, 1, 2, 3, 4.
Step 4: for the image segment V, using the relative poses Tn-m and Tm-n between pairs of frames (n = 0, 1, 2, ...; m = n + 1) and the depth Di of each frame, the warped images I' are obtained through the visual reconstruction module using formula 1, where I' comprises forward warped images and backward warped images. For example, if the length of the video segment V is 5 frames, n = 0, 1, 2, 3 and m = 1, 2, 3, 4.
ps ~ K * Tt→s * Dt(pt) * K^(-1) * pt    (1)
where pt is the pixel coordinate, K is the camera intrinsic matrix, Dt is the predicted depth map, and Tt→s is the predicted relative pose.
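Below is a minimal PyTorch sketch of the view reconstruction of formula 1, assuming a batched 4x4 homogeneous pose matrix Tt→s, a batched intrinsic matrix K and bilinear sampling via grid_sample; the function name and tensor layout are illustrative choices, not the patent's code.

```python
import torch
import torch.nn.functional as F

def warp_source_to_target(src_img, depth_t, T_t2s, K):
    """Inverse-warp the source image into the target view using the target depth
    and relative pose, following ps ~ K * T(t->s) * Dt(pt) * K^-1 * pt.
    src_img: [B,3,H,W], depth_t: [B,1,H,W], T_t2s: [B,4,4], K: [B,3,3]."""
    B, _, H, W = src_img.shape
    device, dtype = src_img.device, src_img.dtype
    # Pixel grid in homogeneous coordinates, shape [B, 3, H*W].
    ys, xs = torch.meshgrid(torch.arange(H, device=device, dtype=dtype),
                            torch.arange(W, device=device, dtype=dtype),
                            indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).view(1, 3, -1).expand(B, -1, -1)
    # Back-project to 3-D points of the target camera: Dt(pt) * K^-1 * pt.
    cam = torch.linalg.inv(K) @ pix * depth_t.view(B, 1, -1)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device, dtype=dtype)], dim=1)
    # Transform into the source frame and project with the intrinsics.
    proj = K @ (T_t2s @ cam_h)[:, :3, :]
    uv = proj[:, :2, :] / proj[:, 2:3, :].clamp(min=1e-6)
    # Normalise to [-1, 1] and bilinearly sample the source image.
    u = 2.0 * uv[:, 0, :] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1, :] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
    return F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)
```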
Step 5: compare the images I in the image segment V with the warped images I' obtained in step 4 pixel by pixel, and calculate the reconstruction error between them using formula 2,
Lreprojection=α*Lphotometric+(1-α)*Lssim(2)
where Lphotometric is the photometric error, calculated by formula 3; Lssim is the inter-image similarity, calculated by formula 4; and α is a weight coefficient in the range 0 to 1, e.g. 0.85;
Lphotometric = (1/L) * Σt ||It - Is||1    (3)
Lssim = (1/L) * Σt (1 - SSIM(It, Is)) / 2    (4)
where It is a consecutive image, Is is a warped image, and L is the number of consecutive images minus 1; for example, if the length of the video segment V is 5 frames, L = 4;
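A hedged PyTorch sketch of formula 2 is given below. The patent describes Lssim as an inter-image similarity; here it is implemented as the common (1 - SSIM)/2 dissimilarity with a 3x3 average-pooling window so that the term can be minimized directly — both the window size and that choice are assumptions.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified SSIM map computed with 3x3 average pooling."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return (num / den).clamp(0, 1)

def reprojection_loss(I_t, I_warped, alpha=0.85):
    """Lreprojection = alpha * Lphotometric + (1 - alpha) * Lssim (formula 2)."""
    l_photo = (I_t - I_warped).abs().mean()               # mean absolute photometric error
    l_ssim = ((1.0 - ssim(I_t, I_warped)) / 2.0).mean()   # SSIM-based dissimilarity term
    return alpha * l_photo + (1.0 - alpha) * l_ssim
```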
The smoothness error of the predicted depth map is also calculated, and the twin consistency error is calculated using formula 5,
Ltwin = (1/L) * Σ ||Tn-m * Tm-n - I||    (5)
where I is the identity matrix, L is the number of consecutive images minus 1, T is a pose transformation matrix, and ideally Tn-m * Tm-n = I (n = 0, 1, 2, ...; m = n + 1). For example, if the length of the video segment V is 5 frames, then n = 0, 1, 2, 3, m = 1, 2, 3, 4, and L = 4.
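A short sketch of the twin consistency error, under the assumption that the forward and backward relative poses are expressed as 4x4 transformation matrices; the Frobenius norm is an assumed choice, since formula 5 is characterized here only by the constraint Tn-m * Tm-n = I.

```python
import torch

def twin_consistency_loss(fwd_poses, bwd_poses):
    """Ltwin penalises the deviation of Tn-m * Tm-n from the identity (formula 5).
    fwd_poses / bwd_poses: lists of [B, 4, 4] matrices, aligned so that
    bwd_poses[k] is the reverse-direction pose of fwd_poses[k]."""
    eye = torch.eye(4, device=fwd_poses[0].device, dtype=fwd_poses[0].dtype)
    loss = 0.0
    for T_fwd, T_bwd in zip(fwd_poses, bwd_poses):
        loss = loss + torch.linalg.matrix_norm(T_fwd @ T_bwd - eye).mean()
    return loss / len(fwd_poses)
```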
Step 6: sum the reconstruction error, the smoothness error and the twin consistency error obtained in step 5 using formula 6 to obtain the final loss function.
LTotal=Lreprojection+β*LSmooth+γ*LTwin(6)
where Lreprojection is the reconstruction error calculated in step 5, Lsmooth is the smoothness error of the depth estimation result, and β and γ are weight coefficients in the range 0 to 1, for example β = 0.85 and γ = 0.5.
Back propagation is then performed using the Adam optimization method, and the parameter values of all modules in the framework are updated iteratively until the loss function converges, which completes the training stage of the method.
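Putting steps 2-6 together, the following hedged sketch shows one training iteration, reusing the helper sketches above (warp_source_to_target, reprojection_loss, twin_consistency_loss); the smoothness_loss form, the assumption that TNet returns 4x4 pose matrices, and the learning rate are illustrative, not the patent's implementation.

```python
import torch

def smoothness_loss(depth, img):
    """Edge-aware first-order smoothness term (assumed form; the patent does not
    spell out the smoothness error)."""
    dd_x = (depth[:, :, :, :-1] - depth[:, :, :, 1:]).abs()
    dd_y = (depth[:, :, :-1, :] - depth[:, :, 1:, :]).abs()
    di_x = (img[:, :, :, :-1] - img[:, :, :, 1:]).abs().mean(1, keepdim=True)
    di_y = (img[:, :, :-1, :] - img[:, :, 1:, :]).abs().mean(1, keepdim=True)
    return (dd_x * torch.exp(-di_x)).mean() + (dd_y * torch.exp(-di_y)).mean()

def train_step(frames, tnet, dnet, optimizer, K, beta=0.85, gamma=0.5):
    """One iteration of step 6: sum the three error terms of formula 6 and
    back-propagate with Adam. `frames` is a list of consecutive images
    [B, 3, H, W]; `tnet` is assumed to return forward/backward poses as 4x4
    matrices (the conversion from 6-DoF vectors is omitted here)."""
    fwd_poses, bwd_poses = tnet(frames)                       # step 2
    depths = [dnet(f) for f in frames]                        # step 3
    l_rep, l_smooth = 0.0, 0.0
    for i in range(len(frames) - 1):
        warped = warp_source_to_target(frames[i + 1], depths[i], fwd_poses[i], K)  # step 4, formula 1
        l_rep = l_rep + reprojection_loss(frames[i], warped)                        # formula 2
        l_smooth = l_smooth + smoothness_loss(depths[i], frames[i])
    l_twin = twin_consistency_loss(fwd_poses, bwd_poses)      # formula 5
    total = l_rep + beta * l_smooth + gamma * l_twin          # formula 6
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()

# optimizer = torch.optim.Adam(list(tnet.parameters()) + list(dnet.parameters()), lr=1e-4)
```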
Step 7: in the testing stage, a test data set is prepared. For the pose estimation task, a pair of source images is input, and the TNet network trained in steps 1 to 6 computes the relative camera pose between the two frames by forward propagation to obtain the prediction result. For the depth estimation task, a single-frame image is input to the trained DMNet module, and the predicted depth map is obtained by forward propagation through the network.
As shown in fig. 5, the pose estimation results of this algorithm are compared with other algorithms; on video sequences 09-10, the results of this algorithm are the most accurate. As shown in fig. 6, the depth estimation results of this algorithm are compared with other algorithms; in terms of the error metrics (Abs Rel absolute relative error, Sq Rel squared relative error, RMSE, and RMSE log) and the accuracy metrics, this algorithm achieves the best performance.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (9)

1. An unsupervised pose and depth calculation method is characterized by comprising a pose network model TNet, a depth network model DNet, an image visual reconstruction model V and a loss function, and comprises the following steps:
S1, preparing a monocular video data set;
S2, extracting consecutive images from the monocular video data set of step S1, sequentially inputting adjacent images into the pose network model TNet to obtain a common feature F between the images, and feeding the feature F to the pose prediction modules of TNet to obtain a forward-motion relative pose and a backward-motion relative pose respectively;
S3, inputting the consecutive images of step S2 into the depth network model DNet and obtaining, through forward propagation, a depth estimation result, i.e. the depth corresponding to each image;
S4, inputting the consecutive images, the forward-motion relative pose, the backward-motion relative pose and the per-image depth of step S2 into the image visual reconstruction model V to obtain warped images;
S5, calculating the reconstruction error between the warped images and the consecutive images of step S2, calculating the smoothness error of the depth estimation result, and calculating the twin consistency error;
S6, summing the reconstruction error, the smoothness error and the twin consistency error to obtain the loss function, performing back propagation, and iterating the update until the loss function converges;
and S7, prediction: using the trained pose network model TNet and depth network model DNet, computing the camera relative pose and the predicted depth map respectively by forward propagation.
2. The unsupervised pose and depth calculation method according to claim 1, wherein the calculation formula of the reconstruction error between the warped image in step S5 and the consecutive images in step S2 is:
Lreprojection=α*Lphotometric+(1-α)*Lssim
where Lphotometric is the photometric error, Lssim is the inter-image similarity, and α is the weight coefficient.
3. The unsupervised pose and depth calculation method of claim 2, wherein Lphotometric is:
Lphotometric = (1/L) * Σt ||It - Is||1
where It is the consecutive image, Is is the warped image, and L is the number of consecutive images minus 1.
4. The unsupervised pose and depth calculation method of claim 2, wherein the Lssim is:
Lssim = (1/L) * Σt (1 - SSIM(It, Is)) / 2
where It is a consecutive image and Is is a warped image.
5. The unsupervised pose and depth calculation method according to claim 1, wherein the twin consistency error in step S6 is:
Ltwin = (1/L) * Σ ||Tn-m * Tm-n - I||
where I is the identity matrix, L is the number of consecutive images minus 1, and T is a pose transformation matrix.
6. The unsupervised pose and depth calculation method according to claim 5, wherein the loss function in step S6 is:
LTotal=LReconstruction+β*LSmooth+γ*LTwin
where LReconstruction is the reconstruction error, LSmooth is the smoothness error of the depth estimation result, and β and γ are weight coefficients.
7. The unsupervised pose and depth calculation method of claim 1, wherein the loss function in step S6 is optimized using the Adam optimization method.
8. A system for unsupervised pose and depth calculation, characterized by comprising a pose network module TNet, a depth network module DNet, an image visual reconstruction module V and a loss function module; the pose network module TNet performs pose estimation, the depth network module DNet performs depth estimation, the image visual reconstruction module V performs image projection, and the pose network module TNet and the depth network module DNet are constrained through the loss function module.
9. The system of unsupervised pose and depth calculation of claim 8, wherein the module TNet comprises an encoder and a twin module, the encoder comprising convolutional layers and activation functions, the twin module comprising two pose prediction modules of identical structure, each pose prediction module comprising a ConvLSTM layer and a convolutional layer; the module DNet comprises an encoder comprising convolutional layers and depthwise (Dwise) convolutions, and a decoder comprising deconvolution layers, convolutional layers and depthwise convolutions.
CN201911196111.3A 2019-11-29 2019-11-29 Unsupervised pose and depth calculation method and system Pending CN111028282A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911196111.3A CN111028282A (en) 2019-11-29 2019-11-29 Unsupervised pose and depth calculation method and system
CN202010281576.5A CN111325784A (en) 2019-11-29 2020-04-10 Unsupervised pose and depth calculation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911196111.3A CN111028282A (en) 2019-11-29 2019-11-29 Unsupervised pose and depth calculation method and system

Publications (1)

Publication Number Publication Date
CN111028282A true CN111028282A (en) 2020-04-17

Family

ID=70207039

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201911196111.3A Pending CN111028282A (en) 2019-11-29 2019-11-29 Unsupervised pose and depth calculation method and system
CN202010281576.5A Pending CN111325784A (en) 2019-11-29 2020-04-10 Unsupervised pose and depth calculation method and system

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202010281576.5A Pending CN111325784A (en) 2019-11-29 2020-04-10 Unsupervised pose and depth calculation method and system

Country Status (1)

Country Link
CN (2) CN111028282A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476835A (en) * 2020-05-21 2020-07-31 中国科学院自动化研究所 Unsupervised depth prediction method, system and device for consistency of multi-view images
CN111950599A (en) * 2020-07-20 2020-11-17 重庆邮电大学 Dense visual odometer method for fusing edge information in dynamic environment
CN112052626A (en) * 2020-08-14 2020-12-08 杭州未名信科科技有限公司 Automatic neural network design system and method
CN112053393A (en) * 2020-10-19 2020-12-08 北京深睿博联科技有限责任公司 Image depth estimation method and device
CN113240722A (en) * 2021-04-28 2021-08-10 浙江大学 Self-supervision depth estimation method based on multi-frame attention
WO2021218282A1 (en) * 2020-04-28 2021-11-04 深圳市商汤科技有限公司 Scene depth prediction method and apparatus, camera motion prediction method and apparatus, device, medium, and program

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359363A (en) * 2022-01-11 2022-04-15 浙江大学 Video consistency depth estimation method and device based on deep learning

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11313684B2 (en) * 2016-03-28 2022-04-26 Sri International Collaborative navigation and mapping
CN108427920B (en) * 2018-02-26 2021-10-15 杭州电子科技大学 Edge-sea defense target detection method based on deep learning
CN109145743A (en) * 2018-07-19 2019-01-04 叶涵 A kind of image-recognizing method and device based on deep learning
CN109472830A (en) * 2018-09-28 2019-03-15 中山大学 A kind of monocular visual positioning method based on unsupervised learning
CN109798888B (en) * 2019-03-15 2021-09-17 京东方科技集团股份有限公司 Posture determination device and method for mobile equipment and visual odometer
CN110473164B (en) * 2019-05-31 2021-10-15 北京理工大学 Image aesthetic quality evaluation method based on attention mechanism
CN110287849B (en) * 2019-06-20 2022-01-07 北京工业大学 Lightweight depth network image target detection method suitable for raspberry pi

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021218282A1 (en) * 2020-04-28 2021-11-04 深圳市商汤科技有限公司 Scene depth prediction method and apparatus, camera motion prediction method and apparatus, device, medium, and program
CN111476835A (en) * 2020-05-21 2020-07-31 中国科学院自动化研究所 Unsupervised depth prediction method, system and device for consistency of multi-view images
CN111950599A (en) * 2020-07-20 2020-11-17 重庆邮电大学 Dense visual odometer method for fusing edge information in dynamic environment
CN111950599B (en) * 2020-07-20 2022-07-01 重庆邮电大学 Dense visual odometer method for fusing edge information in dynamic environment
CN112052626A (en) * 2020-08-14 2020-12-08 杭州未名信科科技有限公司 Automatic neural network design system and method
CN112052626B (en) * 2020-08-14 2024-01-19 杭州未名信科科技有限公司 Automatic design system and method for neural network
CN112053393A (en) * 2020-10-19 2020-12-08 北京深睿博联科技有限责任公司 Image depth estimation method and device
CN113240722A (en) * 2021-04-28 2021-08-10 浙江大学 Self-supervision depth estimation method based on multi-frame attention
CN113240722B (en) * 2021-04-28 2022-07-15 浙江大学 Self-supervision depth estimation method based on multi-frame attention

Also Published As

Publication number Publication date
CN111325784A (en) 2020-06-23


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200417