CN111028282A - Unsupervised pose and depth calculation method and system - Google Patents
- Publication number
- CN111028282A (application CN201911196111.3A)
- Authority
- CN
- China
- Prior art keywords
- pose
- depth
- image
- module
- error
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000004364 calculation method Methods 0.000 title claims abstract description 20
- 230000033001 locomotion Effects 0.000 claims abstract description 24
- 230000000007 visual effect Effects 0.000 claims abstract description 11
- 238000009499 grossing Methods 0.000 claims abstract description 8
- 239000011159 matrix material Substances 0.000 claims description 6
- 230000004913 activation Effects 0.000 claims description 4
- 238000005457 optimization Methods 0.000 claims description 4
- 230000009466 transformation Effects 0.000 claims description 3
- 230000001902 propagating effect Effects 0.000 claims description 2
- 238000010276 construction Methods 0.000 claims 1
- 230000006870 function Effects 0.000 description 13
- 238000000034 method Methods 0.000 description 7
- 230000008901 benefit Effects 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 238000012360 testing method Methods 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 230000015556 catabolic process Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000006731 degradation reaction Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000004807 localization Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 238000012549 training Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an unsupervised pose and depth calculation method and system. The method mainly employs the following modules: a pose prediction network model TNet, a depth estimation network model DMNet, a visual reconstruction model V and an error loss function module. It calculates a forward-motion relative pose and a backward-motion relative pose, computes the depth estimation result of each image and the corresponding image depth, sums the reconstruction error, the smoothing error and the twin consistency error to obtain a loss function, and updates the networks iteratively until the loss function converges; finally, the relative camera pose and the predicted depth map are calculated with the trained models TNet and DMNet.
Description
Technical Field
The invention belongs to the fields of SLAM (Simultaneous Localization and Mapping) and SfM (Structure from Motion), and particularly relates to an unsupervised pose and depth calculation method and system.
Background
In recent years, monocular dense depth estimation based on deep learning and visual odometry (VO) algorithms have developed rapidly, and they are also key modules of SfM and SLAM systems. Studies have shown that VO and depth estimation based on supervised deep learning achieve good performance in many challenging environments and mitigate problems such as scale drift. However, in practical applications it is difficult and expensive to obtain enough data with ground-truth labels to train these supervised models. In contrast, unsupervised approaches have the great advantage of requiring only unlabelled video sequences.
Deep unsupervised models for depth and pose estimation typically employ two modules: one predicts the depth map and the other estimates the relative camera pose. The source image is then projected onto the target image using the estimated depth map and pose, and the models are trained end-to-end with the photometric error as the optimization target. However, the prior art rarely considers the following key problems: VO is inherently sequential, yet its temporal nature is ignored; driving datasets contain essentially a single motion direction, so existing models can only handle motion in one direction and do not exploit the constraints between forward and backward motion. Existing models also pay little attention to model complexity; their large parameter counts make them difficult to apply in practical VO scenarios.
Disclosure of Invention
The working principle of the invention is as follows: a twin pose network model and ConvLSTM are used to learn the temporal information of the data, the depth estimation network is improved, and DispmNet (a disparity estimation network based on MobileNet) is proposed, so that the pose and depth estimation accuracy reaches a higher level.
In order to solve the above problems, the present invention provides an unsupervised pose and depth calculation method and system.
The technical scheme adopted by the invention is as follows:
an unsupervised pose and depth calculation method comprises a pose network model TNet, a depth network model DNet, an image visual reconstruction model V and a loss function, and comprises the following steps:
S1, preparing a monocular video data set;
S2, extracting continuous images from the monocular video data set of step S1, sequentially inputting adjacent images into the pose network model TNet to obtain a common feature F between the images, and feeding the feature F through TNet to obtain a forward-motion relative pose and a backward-motion relative pose respectively;
S3, inputting the continuous images of step S2 into the depth network model DNet, and obtaining the depth estimation result of the images and the corresponding image depths through forward propagation;
S4, inputting the continuous images of step S2, the forward-motion relative pose, the backward-motion relative pose and the corresponding image depths into the image visual reconstruction model V to obtain warped images;
S5, calculating the reconstruction error between the warped images and the continuous images of step S2, calculating the smoothing error of the depth estimation result, and calculating the twin consistency error;
S6, summing the reconstruction error, the smoothing error and the twin consistency error to obtain the loss function, performing back propagation, and updating iteratively until the loss function converges;
S7, in the prediction stage, performing forward propagation with the pose network model TNet and the depth network model DNet respectively to calculate the relative camera pose and the predicted depth map.
A novel twin module is adopted to process the forward and backward motion of a video sequence simultaneously, and a temporal consistency error term under the reversal consistency constraint is used to constrain the forward and backward motion, which greatly improves the pose estimation accuracy. By adopting the DispmNet model based on the MobileNet structure, the number of parameters is reduced by 37% while the depth estimation accuracy of the model is improved.
Further, the calculation formula of the reconstruction error Lreprojection between the warped image in step S5 and the continuous images in step S2 is:
Lreprojection=α*Lphotometric+(1-α)*Lssim
where Lphotometric is the photometric error, Lssim is the inter-image similarity, and α is the weight coefficient.
Further, Lphotometric is:
where It is the continuous image, Is is the warped image, and L is the number of continuous images minus 1.
Further, Lssim is:
where It is a continuous image and Is is a warped image.
Further, the twin consistency error Ltwin in step S6 is:
wherein I is an identity matrix, L is the number of continuous images minus 1, and T is a relative pose.
Further, the loss function in step S6 is:
LTotal=Lreprojection+β*LSmooth+γ*LTwin
where Lreprojection is the reconstruction error, LSmooth is the smoothing error of the depth estimation result, and β and γ are weight coefficients.
Further, the loss function in step S6 is optimized using the Adam method.
A system for unsupervised pose and depth calculation comprises a pose network module TNet, a depth network module DNet, an image visual reconstruction module V and a loss function module; the pose network module TNet performs pose estimation, the depth network module DNet performs depth estimation, the image visual reconstruction module V performs image projection, and the pose network module TNet and the depth network module DNet are constrained through the loss function module.
Preferably, the module TNet comprises an encoder and a twin module; the encoder comprises convolutional layers and activation functions, the twin module comprises two pose prediction modules with the same structure, and each pose prediction module comprises a ConvLstm layer and a convolutional layer. The module DNet comprises an encoder and a decoder; the encoder comprises convolutional layers and Dwise layers, and the decoder comprises deconvolutional layers, convolutional layers and Dwise layers.
Compared with the prior art, the invention has the following advantages and effects:
1. A novel unsupervised framework for monocular visual odometry and depth estimation is provided; the pose network model of the framework uses ConvLSTM to learn the temporal information of the data, which improves the pose estimation accuracy.
2. The pose network adopts a novel twin module that processes the forward and backward motion of a video sequence simultaneously, and a temporal consistency error term under the reversal consistency constraint is used to constrain the forward and backward motion, greatly improving the pose estimation accuracy.
3. The DispmNet model based on the MobileNet structure is proposed; the number of parameters is reduced by 37% while the depth estimation accuracy of the model is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention.
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a block diagram of a model TNet of the present invention;
FIG. 3 is a block diagram of a model DMNet of the present invention;
FIG. 4 is a comparison of the depth map results of the present invention with the ground truth and the SfmLearner algorithm;
FIG. 5 is a comparison of pose estimation results of the present invention with other algorithms;
FIG. 6 is a comparison of depth estimation results of the present invention with other algorithms.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1:
As shown in FIGS. 1-6, an unsupervised pose and depth calculation method mainly employs the following modules: a pose prediction network model TNet, a depth estimation network model DMNet, a visual reconstruction model V and an error loss function module. The TNet model comprises an encoder and a twin module. The encoder comprises 7 convolutional layers, each followed by an activation function, with convolution kernel sizes of 7, 5, 3 and 3 respectively; the twin module comprises two sub-network modules with the same structure, used respectively for pose prediction in the forward motion and in the backward motion, and each sub-module consists of a ConvLstm layer and a convolutional layer Conv with kernel size 1. DMNet comprises three parts: an encoder, a decoder and a connection layer. The encoder consists of 7 convolution modules, each of which specifically contains: a convolutional layer (kernel size 1x1, ReLU activation), Dwise (3x3, ReLU), a convolutional layer (1x1, ReLU), Dwise (3x3, ReLU), and a convolutional layer (1x1, ReLU). The decoder consists of 6 deconvolution modules, each of which specifically contains: a deconvolutional layer (kernel size 3x3, ReLU), a convolutional layer (1x1, ReLU), Dwise (3x3, ReLU), and a convolutional layer (1x1, ReLU). The connection layer passes shallow network features to the back-end decoder, where they are concatenated with the back-end features.
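For concreteness, the following is a minimal PyTorch-style sketch of one DMNet encoder convolution module and one decoder deconvolution module as listed above (1x1 convolutions interleaved with 3x3 depthwise "Dwise" convolutions, each followed by ReLU). The class and helper names, channel widths, strides and padding are illustrative assumptions rather than the patent's reference implementation.

```python
import torch.nn as nn

def dwise(ch, stride=1):
    # 3x3 depthwise convolution ("Dwise") followed by ReLU.
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, stride=stride, padding=1, groups=ch),
        nn.ReLU(inplace=True))

def pw(in_ch, out_ch):
    # 1x1 (pointwise) convolution followed by ReLU.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 1), nn.ReLU(inplace=True))

class DMNetEncoderBlock(nn.Module):
    """Encoder module: conv 1x1 -> Dwise 3x3 -> conv 1x1 -> Dwise 3x3 -> conv 1x1,
    each followed by ReLU; channel widths and stride placement are assumptions."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=2):
        super().__init__()
        self.net = nn.Sequential(
            pw(in_ch, mid_ch), dwise(mid_ch),
            pw(mid_ch, mid_ch), dwise(mid_ch, stride),
            pw(mid_ch, out_ch))

    def forward(self, x):
        return self.net(x)

class DMNetDecoderBlock(nn.Module):
    """Decoder module: deconv 3x3 -> conv 1x1 -> Dwise 3x3 -> conv 1x1, each with ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(in_ch, out_ch, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(inplace=True),
            pw(out_ch, out_ch), dwise(out_ch), pw(out_ch, out_ch))

    def forward(self, x):
        return self.net(x)
```

In a full model, the connection layer would concatenate the corresponding shallow encoder feature with the decoder input before each decoder block, in the usual skip-connection fashion.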
Step 1, obtaining monocular video sequences, such as the KITTI autonomous driving dataset, the EuRoc dataset, the TUM dataset and the Oxford dataset.
Step 2, for example, for a video segment V with a length of 3 frames, the two pairs of adjacent frames (t0 and t1, t1 and t2) are input into the network, and 2 groups of features F1, F2 that are common to the two frames of each pair are obtained. The 2 feature groups independently pass through the two pose prediction modules of the TNet twin module: for the forward module, the features are ordered from F1 to F2, and the relative pose prediction results of the forward motion between the two frames are obtained, T0-1 and T1-2; for the backward module, the features are ordered from F2 to F1, and the relative poses of the backward motion are obtained, T2-1 and T1-0.
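A hedged sketch of the twin module's data flow described above follows: two structurally identical pose prediction branches, each a ConvLSTM cell followed by a kernel-size-1 convolution, with the forward branch consuming the shared features in order F1 to F2 and the backward branch consuming them in order F2 to F1. The ConvLSTM cell is written out because it is not a stock PyTorch layer; the hidden width, the 6-DoF pose parameterisation and the global averaging of the pose head are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell; the usual i, f, o, g gate layout is assumed."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class PosePredictor(nn.Module):
    """One branch of the twin module: a ConvLSTM cell followed by a 1x1 convolution
    that regresses a relative pose; the 6-DoF output (3 translation + 3 rotation)
    and the global average pooling are assumptions."""
    def __init__(self, feat_ch, hid_ch=256):
        super().__init__()
        self.cell = ConvLSTMCell(feat_ch, hid_ch)
        self.head = nn.Conv2d(hid_ch, 6, 1)

    def forward(self, feats):
        # feats: list of shared features F1, F2, ... ordered in the direction of motion.
        b, _, hgt, wid = feats[0].shape
        h = feats[0].new_zeros(b, self.cell.hid_ch, hgt, wid)
        c = h.clone()
        poses = []
        for f in feats:
            h, c = self.cell(f, (h, c))
            poses.append(self.head(h).mean(dim=(2, 3)))  # pool to a 6-vector per pair
        return poses

class TwinModule(nn.Module):
    """Two structurally identical branches: the forward branch consumes F1 -> F2,
    the backward branch consumes the reversed order F2 -> F1."""
    def __init__(self, feat_ch):
        super().__init__()
        self.forward_branch = PosePredictor(feat_ch)
        self.backward_branch = PosePredictor(feat_ch)

    def forward(self, feats):
        return self.forward_branch(feats), self.backward_branch(feats[::-1])
```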
Step 3, for the video clip V, each frame Ii (i = 0, 1, 2, ...) is input into the depth estimation network separately, and the depth estimation result of the single frame is obtained by forward propagation through the network; each image corresponds to a depth Di (i = 0, 1, 2, ...). For example, if the length of the video clip V is 5 frames, then i = 0, 1, 2, 3, 4.
Step 4, using the image segment V, the relative poses Tn-m and Tm-n between two frames (n = 0, 1, 2, ...; m = n + 1) and the depth Di of each frame are combined, and warped images I' are obtained through the visual reconstruction module using formula 1, where I' comprises forward warped images and backward warped images. For example, if the length of the video segment V is 5 frames, then n = 0, 1, 2, 3 and m = 1, 2, 3, 4.
Wherein Pt is the pixel coordinate, K is the camera intrinsic matrix, Dt is the predicted depth map, and Tt→s is the predicted relative pose.
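The following is a sketch of how the visual reconstruction module V could implement the projective warping of formula 1, which is given in the original only as a figure: target pixels Pt are back-projected with the predicted depth Dt and the intrinsics K, transformed with the predicted pose Tt→s, re-projected into the source view, and the source image is bilinearly sampled to form the warped image I'. The tensor shapes and the use of grid_sample are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def warp_source_to_target(src_img, depth_t, T_t2s, K):
    """src_img: (B, 3, H, W), depth_t: (B, 1, H, W),
    T_t2s: (B, 4, 4) homogeneous pose matrix, K: (B, 3, 3) intrinsics."""
    b, _, h, w = src_img.shape
    device = src_img.device
    # Homogeneous pixel grid of the target image, shape (B, 3, H*W).
    ys, xs = torch.meshgrid(torch.arange(h, device=device, dtype=torch.float32),
                            torch.arange(w, device=device, dtype=torch.float32),
                            indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).view(1, 3, -1).expand(b, -1, -1)

    # Back-project: X = Dt * K^-1 * Pt, then transform into the source frame with Tt->s.
    cam = torch.inverse(K) @ pix * depth_t.view(b, 1, -1)
    cam_h = torch.cat([cam, torch.ones(b, 1, h * w, device=device)], dim=1)
    src_cam = (T_t2s @ cam_h)[:, :3, :]

    # Project into source pixel coordinates and normalise to [-1, 1] for grid_sample.
    src_pix = K @ src_cam
    src_pix = src_pix[:, :2, :] / (src_pix[:, 2:3, :] + 1e-7)
    grid_x = 2.0 * src_pix[:, 0, :] / (w - 1) - 1.0
    grid_y = 2.0 * src_pix[:, 1, :] / (h - 1) - 1.0
    grid = torch.stack([grid_x, grid_y], dim=-1).view(b, h, w, 2)

    # Bilinear sampling of the source image gives the warped image I'.
    return F.grid_sample(src_img, grid, align_corners=True)
```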
Step 5, the images I in the image segment V are compared pixel by pixel with the warped images I' obtained in step 4, and the reconstruction error between them is calculated using formula 2:
Lreprojection = α*Lphotometric + (1-α)*Lssim    (2)
wherein Lphotometric is the photometric error, calculated by formula 3; Lssim is the inter-image similarity, calculated by formula 4; and α is a weight coefficient with a value range of 0-1, for example 0.85.
wherein It is a continuous image, Is is a warped image, and L is the number of consecutive images minus 1 (i.e., L = i - 1); for example, if the length of the video segment V is 5 frames, L = 4.
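Because formulas 3 and 4 appear only as figures, the sketch below assumes the common conventions for them: an L1 photometric error averaged over the L image pairs, and an SSIM-based dissimilarity term of the form (1 - SSIM)/2 computed over 3x3 windows. Formula 2 is then the α-weighted combination stated above (α = 0.85 in the example); all of these specific choices are assumptions.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Standard SSIM over 3x3 local windows; window size and constants are conventional choices.
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return num / den

def reconstruction_loss(targets, warped, alpha=0.85):
    """Formula 2: Lreprojection = alpha * Lphotometric + (1 - alpha) * Lssim,
    averaged over the L pairs of (continuous image It, warped image Is)."""
    l_photo, l_ssim = 0.0, 0.0
    for i_t, i_s in zip(targets, warped):
        l_photo = l_photo + (i_t - i_s).abs().mean()                 # assumed L1 photometric term
        l_ssim = l_ssim + ((1.0 - ssim(i_t, i_s)) / 2.0).mean()      # assumed SSIM dissimilarity
    n = len(targets)
    return alpha * l_photo / n + (1.0 - alpha) * l_ssim / n
```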
A smoothing error of the predicted depth map is also calculated.
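The patent does not give the smoothing-error formula, so the sketch below uses a common edge-aware first-order smoothness penalty on the predicted depth, down-weighted at image edges; this particular form is an assumption.

```python
import torch

def smooth_loss(depths, images):
    """Edge-aware first-order depth smoothness, averaged over the clip (assumed form)."""
    loss = 0.0
    for d, img in zip(depths, images):
        dx_d = (d[:, :, :, :-1] - d[:, :, :, 1:]).abs()
        dy_d = (d[:, :, :-1, :] - d[:, :, 1:, :]).abs()
        dx_i = (img[:, :, :, :-1] - img[:, :, :, 1:]).abs().mean(1, keepdim=True)
        dy_i = (img[:, :, :-1, :] - img[:, :, 1:, :]).abs().mean(1, keepdim=True)
        # Image gradients gate the depth-gradient penalty so depth edges are preserved.
        loss = loss + (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()
    return loss / len(depths)
```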
the twin consistency error is calculated using equation 5,
wherein I is an identity matrix, L is the number of continuous images minus 1 (i.e., L = i - 1), T is a pose transformation matrix, and Tn-m * Tm-n = I (n = 0, 1, 2, ...; m = n + 1). For example, if the length of the video segment V is 5 frames, then n = 0, 1, 2, 3, m = 1, 2, 3, 4, and L = 4.
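A sketch of the twin consistency term of formula 5 under the relation Tn-m * Tm-n = I stated above: the composition of each forward pose with its paired backward pose is penalised for deviating from the identity matrix I. The inputs are assumed to be homogeneous 4x4 transformation matrices (a 6-DoF pose output would first be converted to such a matrix), and the mean absolute deviation used as the matrix norm is an assumption.

```python
import torch

def twin_consistency_loss(fwd_poses, bwd_poses):
    """fwd_poses[k] and bwd_poses[k] are (B, 4, 4) matrices for opposite motions
    of the same frame pair (e.g. T0-1 paired with T1-0)."""
    eye = torch.eye(4, device=fwd_poses[0].device).expand_as(fwd_poses[0])
    loss = 0.0
    for t_fwd, t_bwd in zip(fwd_poses, bwd_poses):
        loss = loss + (t_fwd @ t_bwd - eye).abs().mean()   # deviation from Tn-m * Tm-n = I
    return loss / len(fwd_poses)
```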
Step 6, the reconstruction error, the smoothing error and the twin consistency error obtained in step 5 are summed using formula 6 to obtain the final loss function.
LTotal = Lreprojection + β*LSmooth + γ*LTwin    (6)
Wherein Lreprojection is the reconstruction error calculated in step 5, LSmooth is the smoothing error of the depth estimation result, and β and γ are weight coefficients in the range 0-1; for example, β is 0.85 and γ is 0.5.
Back propagation is then performed using the Adam optimization method, and the parameter values of all modules in the framework are updated iteratively until the loss function converges; this completes the training stage of the method.
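Putting the pieces together, a hedged sketch of the training stage (formula 6 plus Adam back-propagation) is shown below. It assumes the loss sketches given earlier are in scope; the learning rate, the loader interface and the helper names relative_poses and warp_clip are illustrative assumptions, not the patent's implementation.

```python
import torch

def train(tnet, dmnet, loader, epochs, beta=0.85, gamma=0.5, lr=1e-4):
    """Formula 6 combines the three error terms; Adam updates all module parameters."""
    optim = torch.optim.Adam(list(tnet.parameters()) + list(dmnet.parameters()), lr=lr)
    for _ in range(epochs):
        for clip in loader:                                   # clip: list of consecutive frames
            fwd, bwd = relative_poses(tnet, clip)             # step 2 (assumed wrapper)
            depths = [dmnet(frame) for frame in clip]         # step 3
            warped = warp_clip(clip, fwd, bwd, depths)        # step 4, formula 1 (assumed wrapper)
            l_total = (reconstruction_loss(clip, warped)      # formula 2
                       + beta * smooth_loss(depths, clip)
                       + gamma * twin_consistency_loss(fwd, bwd))   # formula 6
            optim.zero_grad()
            l_total.backward()                                # back propagation
            optim.step()
```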
Step 7, in the testing stage, a test data set is prepared. For the pose estimation task, a pair of source images is input, and the TNet network trained in steps 1 to 6 computes the relative camera pose between the two frames by forward propagation to obtain the prediction result. For the depth estimation task, a single-frame image is input into the trained DMNet module, and the predicted depth map is obtained through forward propagation of the network.
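A minimal sketch of this test stage: the trained TNet is run forward on a pair of adjacent frames to obtain the relative camera pose, and the trained DMNet is run forward on a single frame to obtain the predicted depth map. The exact input packing expected by the trained networks is an assumption.

```python
import torch

@torch.no_grad()
def predict(tnet, dmnet, img_pair, single_img):
    """Forward passes only; no losses or gradients are needed at test time."""
    tnet.eval()
    dmnet.eval()
    rel_pose = tnet(img_pair)       # relative camera pose between the two frames
    depth_map = dmnet(single_img)   # dense depth prediction for one frame
    return rel_pose, depth_map
```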
As shown in FIG. 5, the pose estimation results of the present algorithm are compared with other algorithms; the results on video sequences 09-10 show that the present algorithm is the most accurate. As shown in FIG. 6, the depth estimation results of the present algorithm are compared with other algorithms; in terms of the error metrics abs rel (absolute relative error), sq rel (squared relative error), RMSE (root mean square error) and RMSE log (logarithmic root mean square error), and the accuracy metrics, the present algorithm achieves the best results.
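For reference, the error metrics named in FIG. 6 are commonly computed as below; these conventional definitions are assumed here, and any depth clipping or median scaling used in the patent's evaluation is omitted.

```python
import torch

def depth_error_metrics(pred, gt):
    """Standard monocular-depth error metrics: abs rel, sq rel, RMSE, RMSE log."""
    valid = gt > 0                      # evaluate only where ground-truth depth exists
    pred, gt = pred[valid], gt[valid]
    abs_rel = ((pred - gt).abs() / gt).mean()
    sq_rel = ((pred - gt) ** 2 / gt).mean()
    rmse = torch.sqrt(((pred - gt) ** 2).mean())
    rmse_log = torch.sqrt(((torch.log(pred) - torch.log(gt)) ** 2).mean())
    return {"abs_rel": abs_rel.item(), "sq_rel": sq_rel.item(),
            "rmse": rmse.item(), "rmse_log": rmse_log.item()}
```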
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (9)
1. An unsupervised pose and depth calculation method is characterized by comprising a pose network model TNet, a depth network model DNet, an image visual reconstruction model V and a loss function, and comprises the following steps:
S1, preparing a monocular video data set;
S2, extracting continuous images from the monocular video data set of step S1, sequentially inputting adjacent images into the pose network model TNet to obtain a common feature F between the images, and feeding the feature F through TNet to obtain a forward-motion relative pose and a backward-motion relative pose respectively;
S3, inputting the continuous images of step S2 into the depth network model DNet, and obtaining the depth estimation result of the images and the corresponding image depths through forward propagation;
S4, inputting the continuous images of step S2, the forward-motion relative pose, the backward-motion relative pose and the corresponding image depths into the image visual reconstruction model V to obtain warped images;
S5, calculating the reconstruction error between the warped images and the continuous images of step S2, calculating the smoothing error of the depth estimation result, and calculating the twin consistency error;
S6, summing the reconstruction error, the smoothing error and the twin consistency error to obtain the loss function, performing back propagation, and updating iteratively until the loss function converges;
S7, in the prediction stage, performing forward propagation with the pose network model TNet and the depth network model DNet respectively to calculate the relative camera pose and the predicted depth map.
2. The unsupervised pose and depth calculation method according to claim 1, wherein the calculation formula of the reconstruction error between the warped image in step S5 and the continuous image in step S2 is:
Lreprojection=α*Lphotometric+(1-α)*Lssim
where Lphotometric is the photometric error, Lssim is the inter-image similarity, and α is the weight coefficient.
6. The unsupervised pose and depth calculation method according to claim 5, wherein the loss function in step S6 is:
LTotal=LReconstruction+β*LSmooth+γ*LTwin
wherein LReconstruction is the reconstruction error, LSmooth is the smoothing error of the depth estimation result, and β and γ are weight coefficients.
7. The unsupervised pose and depth calculation method of claim 1, wherein the loss function in step S6 is optimized using the Adam method.
8. A system for unsupervised pose and depth calculation, characterized by comprising a pose network module TNet, a depth network module DNet, an image visual reconstruction module V and a loss function module; the pose network module TNet performs pose estimation, the depth network module DNet performs depth estimation, the image visual reconstruction module V performs image projection, and the pose network module TNet and the depth network module DNet are constrained through the loss function module.
9. The system of unsupervised pose and depth calculation of claim 8, wherein the module TNet comprises an encoder and a twin module, the encoder comprising convolutional layers and activation functions, the twin module comprising two pose prediction modules of identical structure, each pose prediction module comprising a ConvLstm layer and a convolutional layer; the module DNet comprises an encoder and a decoder, the encoder comprising convolutional layers and Dwise layers, and the decoder comprising deconvolutional layers, convolutional layers and Dwise layers.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911196111.3A CN111028282A (en) | 2019-11-29 | 2019-11-29 | Unsupervised pose and depth calculation method and system |
CN202010281576.5A CN111325784A (en) | 2019-11-29 | 2020-04-10 | Unsupervised pose and depth calculation method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911196111.3A CN111028282A (en) | 2019-11-29 | 2019-11-29 | Unsupervised pose and depth calculation method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111028282A true CN111028282A (en) | 2020-04-17 |
Family
ID=70207039
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911196111.3A Pending CN111028282A (en) | 2019-11-29 | 2019-11-29 | Unsupervised pose and depth calculation method and system |
CN202010281576.5A Pending CN111325784A (en) | 2019-11-29 | 2020-04-10 | Unsupervised pose and depth calculation method and system |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010281576.5A Pending CN111325784A (en) | 2019-11-29 | 2020-04-10 | Unsupervised pose and depth calculation method and system |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN111028282A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111476835A (en) * | 2020-05-21 | 2020-07-31 | 中国科学院自动化研究所 | Unsupervised depth prediction method, system and device for consistency of multi-view images |
CN111950599A (en) * | 2020-07-20 | 2020-11-17 | 重庆邮电大学 | Dense visual odometer method for fusing edge information in dynamic environment |
CN112053393A (en) * | 2020-10-19 | 2020-12-08 | 北京深睿博联科技有限责任公司 | Image depth estimation method and device |
CN112052626A (en) * | 2020-08-14 | 2020-12-08 | 杭州未名信科科技有限公司 | Automatic neural network design system and method |
CN113240722A (en) * | 2021-04-28 | 2021-08-10 | 浙江大学 | Self-supervision depth estimation method based on multi-frame attention |
CN113298860A (en) * | 2020-12-14 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Data processing method and device, electronic equipment and storage medium |
WO2021218282A1 (en) * | 2020-04-28 | 2021-11-04 | 深圳市商汤科技有限公司 | Scene depth prediction method and apparatus, camera motion prediction method and apparatus, device, medium, and program |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114359363B (en) * | 2022-01-11 | 2024-06-18 | 浙江大学 | Video consistency depth estimation method and device based on depth learning |
WO2024098240A1 (en) * | 2022-11-08 | 2024-05-16 | 中国科学院深圳先进技术研究院 | Gastrointestinal endoscopy visual reconstruction navigation system and method |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11313684B2 (en) * | 2016-03-28 | 2022-04-26 | Sri International | Collaborative navigation and mapping |
CN108427920B (en) * | 2018-02-26 | 2021-10-15 | 杭州电子科技大学 | Edge-sea defense target detection method based on deep learning |
CN109145743A (en) * | 2018-07-19 | 2019-01-04 | 叶涵 | A kind of image-recognizing method and device based on deep learning |
CN109472830A (en) * | 2018-09-28 | 2019-03-15 | 中山大学 | A kind of monocular visual positioning method based on unsupervised learning |
CN109798888B (en) * | 2019-03-15 | 2021-09-17 | 京东方科技集团股份有限公司 | Posture determination device and method for mobile equipment and visual odometer |
CN110473164B (en) * | 2019-05-31 | 2021-10-15 | 北京理工大学 | Image aesthetic quality evaluation method based on attention mechanism |
CN110287849B (en) * | 2019-06-20 | 2022-01-07 | 北京工业大学 | Lightweight depth network image target detection method suitable for raspberry pi |
-
2019
- 2019-11-29 CN CN201911196111.3A patent/CN111028282A/en active Pending
-
2020
- 2020-04-10 CN CN202010281576.5A patent/CN111325784A/en active Pending
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021218282A1 (en) * | 2020-04-28 | 2021-11-04 | 深圳市商汤科技有限公司 | Scene depth prediction method and apparatus, camera motion prediction method and apparatus, device, medium, and program |
CN111476835A (en) * | 2020-05-21 | 2020-07-31 | 中国科学院自动化研究所 | Unsupervised depth prediction method, system and device for consistency of multi-view images |
CN111950599A (en) * | 2020-07-20 | 2020-11-17 | 重庆邮电大学 | Dense visual odometer method for fusing edge information in dynamic environment |
CN111950599B (en) * | 2020-07-20 | 2022-07-01 | 重庆邮电大学 | Dense visual odometer method for fusing edge information in dynamic environment |
CN112052626A (en) * | 2020-08-14 | 2020-12-08 | 杭州未名信科科技有限公司 | Automatic neural network design system and method |
CN112052626B (en) * | 2020-08-14 | 2024-01-19 | 杭州未名信科科技有限公司 | Automatic design system and method for neural network |
CN112053393A (en) * | 2020-10-19 | 2020-12-08 | 北京深睿博联科技有限责任公司 | Image depth estimation method and device |
CN113298860A (en) * | 2020-12-14 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Data processing method and device, electronic equipment and storage medium |
CN113240722A (en) * | 2021-04-28 | 2021-08-10 | 浙江大学 | Self-supervision depth estimation method based on multi-frame attention |
CN113240722B (en) * | 2021-04-28 | 2022-07-15 | 浙江大学 | Self-supervision depth estimation method based on multi-frame attention |
Also Published As
Publication number | Publication date |
---|---|
CN111325784A (en) | 2020-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111325784A (en) | Unsupervised pose and depth calculation method and system | |
CN110490928B (en) | Camera attitude estimation method based on deep neural network | |
CN110782490B (en) | Video depth map estimation method and device with space-time consistency | |
CN109754417B (en) | System and method for unsupervised learning of geometry from images | |
US10553026B2 (en) | Dense visual SLAM with probabilistic surfel map | |
JP6861249B2 (en) | How to Train a Convolutional Recurrent Neural Network, and How to Semantic Segmentation of Input Video Using a Trained Convolutional Recurrent Neural Network | |
CN107292912B (en) | Optical flow estimation method based on multi-scale corresponding structured learning | |
CA3010163A1 (en) | Method and apparatus for joint image processing and perception | |
CN108986166A (en) | A kind of monocular vision mileage prediction technique and odometer based on semi-supervised learning | |
CN108491763B (en) | Unsupervised training method and device for three-dimensional scene recognition network and storage medium | |
JP2006260527A (en) | Image matching method and image interpolation method using same | |
CN109272493A (en) | A kind of monocular vision odometer method based on recursive convolution neural network | |
CN113177882A (en) | Single-frame image super-resolution processing method based on diffusion model | |
CN111354030B (en) | Method for generating unsupervised monocular image depth map embedded into SENet unit | |
CN110942484B (en) | Camera self-motion estimation method based on occlusion perception and feature pyramid matching | |
CN109389156B (en) | Training method and device of image positioning model and image positioning method | |
CN110610486A (en) | Monocular image depth estimation method and device | |
CN115187638A (en) | Unsupervised monocular depth estimation method based on optical flow mask | |
Wang et al. | Adversarial learning for joint optimization of depth and ego-motion | |
Adarve et al. | A filter formulation for computing real time optical flow | |
CN111833400B (en) | Camera pose positioning method | |
CN115346207A (en) | Method for detecting three-dimensional target in two-dimensional image based on example structure correlation | |
Wang et al. | Unsupervised learning of accurate camera pose and depth from video sequences with Kalman filter | |
CN114743105A (en) | Depth privilege visual odometer method based on cross-modal knowledge distillation | |
CN110782480A (en) | Infrared pedestrian tracking method based on online template prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20200417 |