CN111260680B - RGBD camera-based unsupervised pose estimation network construction method - Google Patents

RGBD camera-based unsupervised pose estimation network construction method

Info

Publication number
CN111260680B
Authority
CN
China
Prior art keywords
network
sequence
pose
image
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010034081.2A
Other languages
Chinese (zh)
Other versions
CN111260680A (en)
Inventor
杨宇翔
潘耀辉
高明煜
何志伟
黄继业
董哲康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202010034081.2A priority Critical patent/CN111260680B/en
Publication of CN111260680A publication Critical patent/CN111260680A/en
Application granted granted Critical
Publication of CN111260680B publication Critical patent/CN111260680B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an RGBD camera-based unsupervised pose estimation network construction method. Estimating camera motion from images is a major research topic for current vision-based mobile robots. Traditional methods tend to fail in environments with low texture, complex geometric structure, changing illumination or occlusion, and most deep-learning-based methods require additional supervision data, which complicates the work and increases the cost. The proposed convolutional-neural-network method makes up for the shortcomings of the traditional methods: it exploits the distance information of the depth image, combines it with traditional geometric knowledge, and uses forward-sequence and reverse-sequence inputs to add constraints, so that the network can accurately estimate the pose of the camera.

Description

RGBD camera-based unsupervised pose estimation network construction method
Technical Field
The invention belongs to the field of computer vision, and particularly relates to an RGBD camera-based unsupervised pose estimation network construction method.
Background
Simultaneous Localization And Mapping (SLAM) is an important research direction of machine vision. Through the development of the last 30 years, SLAM and related technologies have become research hotspots in the fields of robotics, image processing, deep learning, structure from motion, augmented reality and the like. Recovering structure and motion from consecutive frames, i.e. Structure from Motion (SFM), is a major focus. Although conventional SFM methods are effective in many cases, they rely on accurate images and are prone to failure in environments with low texture, complex geometry, changing illumination or occlusion.
To address this problem, and with the development of deep learning in recent years, several deep-learning-based methods have been proposed and applied to various stages of the conventional SFM pipeline. Because these methods are trained on large data sets with accurate external supervision, inter-frame estimation in a fixed scene becomes more accurate, making up for the shortcomings of the traditional methods. However, external supervision data, particularly ground-truth values of the relative motion between frames, are not easy to obtain and require additional sensors such as an IMU or GPS, which makes the task more complicated and increases costs.
Depth cameras have been widely used in SLAM research in recent years, since accurate color images and corresponding depth maps can be acquired easily. The color image carries rich feature information, from which a neural network can learn feature representations and perceive the feature correlation between different frames, while the depth map provides the distance information of objects. Fusing the distance information of the depth map with deep learning and combining it with geometric knowledge provides a new idea for unsupervised networks.
Disclosure of Invention
The invention aims to overcome the defects of the traditional methods and the limitation of supervised deep learning, and provides an unsupervised pose estimation network construction method based on an RGBD camera. The method not only uses deep learning to learn the inter-frame transformation relationship, but also combines traditional geometric knowledge: the inter-frame transformation generated by the network and the distance information of the depth map are used to guide the network towards more accurate results, thereby achieving an unsupervised effect, and constraints are added through a forward-sequence and reverse-sequence network so that the estimation of the network is more accurate. The method comprises the following specific steps:
Step (1): obtaining color images and depth images of the same scene with an RGB-D camera
An RGB-D camera is used to acquire continuous color images and the corresponding continuous depth images, each of resolution H × W, where H and W are the height and width of the image respectively. The color image I_t at time t and its adjacent images I_{t-1} and I_{t+1} are selected; each color image has three RGB channels, and the three color images I_{t-1}, I_t, I_{t+1} are spliced into a 9-channel sequence I_feature^0, where "feature" denotes a feature map obtained after a convolutional layer and 0 denotes the 0-th convolution operation;
Step (2): learning the inter-frame structural relationship with the pose network
The pose network is composed of a convolutional neural network, with a ReLU activation layer after each convolutional layer. The 9-channel sequence I_feature^0 first passes through a convolutional layer with a 7 × 7 kernel, then a convolutional layer with a 5 × 5 kernel, and then five convolutional layers with 3 × 3 kernels, giving a 256-channel feature map; a further convolutional layer with a 1 × 1 kernel then reduces the dimensionality to 12 channels. Finally, the H and W dimensions are averaged down to a single number per channel, giving a 12-dimensional vector. This vector is divided into two groups of 6-dimensional numbers, denoted T_{t→t-1} and T_{t→t+1} respectively. For T_{t→t-1}, the first three numbers represent the displacement from the camera coordinate system of I_t to that of I_{t-1}, and the last three numbers represent, as Euler angles, the rotation from the coordinate system of I_t to that of I_{t-1}; T_{t→t+1} is expressed in the same way;
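For illustration, a minimal PyTorch sketch of such a pose network is given below. The kernel sizes, the 9-channel input, the 1 × 1 reduction to 12 channels and the spatial averaging follow the description above; the class name, the intermediate channel widths and the strides are assumptions, since the patent does not specify them.

```python
import torch
import torch.nn as nn

class PoseNet(nn.Module):
    """Sketch of the pose network of step (2): a 7x7 conv, a 5x5 conv, five 3x3 convs
    (ReLU after each), a 1x1 conv down to 12 channels, and a global average over H and W.
    Channel widths and strides are assumptions."""
    def __init__(self):
        super().__init__()
        widths  = [16, 32, 64, 128, 256, 256, 256]   # assumed; only the final 256 is stated
        kernels = [7, 5, 3, 3, 3, 3, 3]
        layers, in_ch = [], 9                         # 9-channel spliced input (I_{t-1}, I_t, I_{t+1})
        for out_ch, k in zip(widths, kernels):
            layers += [nn.Conv2d(in_ch, out_ch, k, stride=2, padding=k // 2),
                       nn.ReLU(inplace=True)]
            in_ch = out_ch
        self.encoder = nn.Sequential(*layers)
        self.pred = nn.Conv2d(in_ch, 12, kernel_size=1)   # 12 = 2 poses x 6 DoF

    def forward(self, x):                      # x: (B, 9, H, W)
        out = self.pred(self.encoder(x))       # (B, 12, h, w)
        pose = out.mean(dim=[2, 3])            # average over the spatial dimensions -> (B, 12)
        return pose[:, :6], pose[:, 6:]        # T_{t->t-1}, T_{t->t+1}: (tx, ty, tz, Euler angles)

# usage sketch:
# net = PoseNet()
# splice = torch.cat([I_prev, I_t, I_next], dim=1)   # each frame (B, 3, H, W)
# T_to_prev, T_to_next = net(splice)
```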
Step (3): completing self-supervision by using the inter-frame camera pose relationship combined with the distance information of the depth map and geometric knowledge:
for images
I_t and I_{t+1}, the depth map corresponding to I_t is D_t, and the transformation from I_t to the image I_{t+1} at time t+1 is T_{t→t+1}. For a pixel p_t at a certain point of the image I_t, the corresponding value on the depth map D_t is the depth D_t(p_t). From the camera projection model and the inter-frame triangular relationship, the pixel p_{t+1} on I_{t+1} that corresponds to p_t satisfies the relationship:

p_{t+1} ~ K · T_{t→t+1} · D_t(p_t) · K^{-1} · p_t        (1)

where K is the intrinsic matrix of the camera. According to formula (1), each pixel of I_t is mapped to the corresponding position in I_{t+1}, giving the projected coordinates p_{t+1} that correspond to each pixel value of I_t. Then, according to the pixel values of I_{t+1} and the positions of the original pixels, a differentiable bilinear sampling interpolation is used to obtain the synthesized view Î_t corresponding to I_t, where each pixel value of the synthesized view is not a simple mapping of I_{t+1}(p_{t+1}) but is obtained by the differentiable bilinear sampling interpolation as a weighted sum of the four pixels around p_{t+1}:

Î_t(p_t) = Σ_{i,j} w^{ij} · I_{t+1}(p_{t+1}^{ij})        (2)

where i = top or bottom and j = left or right index the four pixels p_{t+1}^{ij} surrounding p_{t+1}, and w^{ij} denotes the weight of each of the four pixels, with Σ w^{ij} = 1. After the view Î_t is synthesized, self-supervision is formed between it and the original view I_t over the two frames, giving the loss function:

L_vs = Σ_p | I_t(p) − Î_t(p) |        (3)

Therefore, by synthesizing a new image from the depth map and constructing a photometric error, self-supervision is achieved without external supervision;
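The view synthesis of step (3) can be sketched in PyTorch as below. The helper assumes the 6-dimensional pose has already been converted into a 4 × 4 transformation matrix (a conversion the patent implies but does not spell out), and the tensor shapes are assumptions; torch.nn.functional.grid_sample performs exactly the differentiable bilinear sampling of formula (2).

```python
import torch
import torch.nn.functional as F

def synthesize_view(I_next, D_t, T_t_to_next, K):
    """Warp I_{t+1} into the frame of I_t using the depth D_t and the pose T_{t->t+1}.
    I_next: (B,3,H,W), D_t: (B,1,H,W), T_t_to_next: (B,4,4), K: (B,3,3)."""
    B, _, H, W = I_next.shape
    dev = I_next.device
    # homogeneous pixel grid p_t = (u, v, 1)
    v, u = torch.meshgrid(torch.arange(H, device=dev), torch.arange(W, device=dev), indexing="ij")
    p_t = torch.stack([u, v, torch.ones_like(u)], dim=0).float().reshape(1, 3, -1).expand(B, -1, -1)

    # formula (1): back-project with D_t(p_t) * K^{-1} p_t, transform by T, re-project with K
    cam = torch.inverse(K) @ p_t * D_t.reshape(B, 1, -1)              # 3-D points in frame t
    cam_h = torch.cat([cam, torch.ones_like(cam[:, :1])], dim=1)      # homogeneous (B, 4, H*W)
    proj = K @ (T_t_to_next @ cam_h)[:, :3, :]                        # projected into frame t+1
    p_next = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)               # pixel coordinates (B, 2, H*W)

    # formula (2): differentiable bilinear sampling of the four neighbouring pixels
    grid = torch.stack([2 * p_next[:, 0] / (W - 1) - 1,
                        2 * p_next[:, 1] / (H - 1) - 1], dim=-1).reshape(B, H, W, 2)
    return F.grid_sample(I_next, grid, mode="bilinear", align_corners=True)

# photometric self-supervision, formula (3):
# I_t_hat = synthesize_view(I_next, D_t, T_t_to_next, K)
# loss_vs = (I_t - I_t_hat).abs().mean()
```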
Step (4): preventing corruption of the network training gradient with a mask network
The geometric knowledge used in the previous step assumes preconditions such as the absence of dynamic objects and occluding objects in the image; a mask network is therefore introduced to prevent the network training from being suppressed. The mask network shares the first five convolutional layers with the pose network and is trained together with it; by upsampling through four 4 × 4 convolutional layers followed by one 3 × 3 convolutional layer, the mask I_t^mask corresponding to a sequence is obtained, with a mask value P_t^mask(p) for each pixel p. The loss function over two frames then changes from formula (3) to:

L_vs = Σ_p P_t^mask(p) · | I_t(p) − Î_t(p) |        (4)
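A sketch of the masked photometric loss of formula (4), assuming the mask is a per-pixel weight in [0, 1] produced by the mask branch (the function name and tensor shapes are illustrative):

```python
import torch

def masked_photometric_loss(I_t, I_t_hat, mask):
    """Formula (4): weight the per-pixel photometric error by the predicted mask.
    I_t, I_t_hat: (B, 3, H, W); mask: (B, 1, H, W), values assumed in [0, 1]."""
    return (mask * (I_t - I_t_hat).abs()).mean()
```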
And (5): adding constraints through an inverse sequence network enables the network to more accurately estimate relative pose between frames
When the forward-sequence images are used as input, the input sequence is (I_{t-1}, I_t, I_{t+1}); the image input of the reverse-sequence network is (I_{t+1}, I_t, I_{t-1}). A good pose estimation network should be able to estimate the pose relationship between frames not only when the image sequence is input in forward order but also when it is input in reverse order, which adds a constraint. For a sequence of three images, the poses obtained by the network for the forward sequence are T_{t→t-1} and T_{t→t+1}, and the poses obtained for the reverse sequence are T'_{t→t+1} and T'_{t→t-1}. Ideally,

T_{t→t-1} = T'_{t→t-1},   T_{t→t+1} = T'_{t→t+1}

However, the network estimate always contains an error, and this error is used to add a constraint, giving the following loss function:

L_pose = ‖ t − t' ‖ + ω · ‖ r − r' ‖        (5)

where t denotes the displacement estimated by the network for the forward-sequence input, t' the displacement estimated for the reverse-sequence input, r the rotation estimated for the forward-sequence input, r' the rotation estimated for the reverse-sequence input, and ω represents the weight;
therefore, the pose network is trained by adding constraints, so that the network has the capability of accurately estimating the relative motion between frames.
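A sketch of the forward/reverse consistency constraint of step (5), reusing the PoseNet sketch above; splitting each 6-dimensional output into displacement (first three values) and Euler-angle rotation (last three) follows the description, while the L1 form of the distance is an assumption:

```python
import torch

def forward_reverse_consistency(pose_net, I_prev, I_t, I_next, omega=1.0):
    """Formula (5): penalise disagreement between the poses estimated from the
    forward-sequence and reverse-sequence inputs (same centre frame I_t)."""
    fwd = torch.cat([I_prev, I_t, I_next], dim=1)      # forward splice, (B, 9, H, W)
    rev = torch.cat([I_next, I_t, I_prev], dim=1)      # reverse splice
    T_prev_f, T_next_f = pose_net(fwd)                 # each (B, 6): [displacement | Euler angles]
    T_next_r, T_prev_r = pose_net(rev)                 # neighbour order is swapped for the reverse input
    loss = 0.0
    for f, r in [(T_prev_f, T_prev_r), (T_next_f, T_next_r)]:
        loss = loss + (f[:, :3] - r[:, :3]).abs().mean()           # displacement term
        loss = loss + omega * (f[:, 3:] - r[:, 3:]).abs().mean()   # rotation term, weighted by omega
    return loss
```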
The invention has the following beneficial effects: deep learning is used to find the association between adjacent frames from the feature information of the color images; the distance information provided by the depth images is exploited and combined with traditional geometric methods, so that the network avoids cumbersome external supervision and achieves unsupervised learning; and constraints are added through the forward-sequence and reverse-sequence network, so that the camera motion is estimated more accurately.
Drawings
FIG. 1 is a single-sequence flow chart of the present invention;
FIG. 2 illustrates the image reconstruction process;
FIG. 3 illustrates the combined forward-sequence and reverse-sequence network.
Detailed Description
The invention is further described below with reference to the accompanying drawings; the method comprises the following steps:
step (1): obtaining same scene color image and depth image by RGB-D camera
An RGB-D camera is used to acquire continuous color images and the corresponding continuous depth images, each of resolution H × W, where H and W are the height and width of the image respectively. The color image I_t at time t and its adjacent images I_{t-1} and I_{t+1} are selected; each color image has three RGB channels, and the three color images I_{t-1}, I_t, I_{t+1} are spliced into a 9-channel sequence I_feature^0 (where "feature" denotes a feature map obtained after a convolutional layer and 0 denotes the 0-th convolution operation).
Step (2): learning of inter-frame structure relationship based on convolutional neural network
The pose network is mainly composed of a convolutional neural network, with a ReLU activation layer after each convolutional layer. The 9-channel sequence I_feature^0 first passes through a convolutional layer with a 7 × 7 kernel, then a convolutional layer with a 5 × 5 kernel, and then five convolutional layers with 3 × 3 kernels, giving a 256-channel feature map; a further convolutional layer with a 1 × 1 kernel then reduces the dimensionality to 12 channels. Finally, the H and W dimensions are averaged down to a single number per channel, giving a 12-dimensional vector. This vector is divided into two groups of 6-dimensional numbers, denoted T_{t→t-1} and T_{t→t+1} respectively. For T_{t→t-1}, the first three numbers represent the displacement from the camera coordinate system of I_t to that of I_{t-1}, and the last three numbers represent, as Euler angles, the rotation from the coordinate system of I_t to that of I_{t-1}; T_{t→t+1} is expressed in the same way.
Step (3): completing self-supervision by using the inter-frame camera pose relationship combined with the distance information of the depth map and geometric knowledge: for the images
I_t and I_{t+1}, the depth map corresponding to I_t is D_t, and the transformation from I_t to the image I_{t+1} at time t+1 is T_{t→t+1}. For a pixel p_t at a certain point of the image I_t, the corresponding value on the depth map D_t is the depth D_t(p_t). From the camera projection model and the inter-frame triangular relationship, the pixel p_{t+1} on I_{t+1} that corresponds to p_t satisfies the relationship:

p_{t+1} ~ K · T_{t→t+1} · D_t(p_t) · K^{-1} · p_t        (1)

where K is the intrinsic matrix of the camera. According to formula (1), each pixel of I_t is easily mapped to the corresponding position in I_{t+1}, giving the projected coordinates p_{t+1} that correspond to each pixel value of I_t. Then, according to the pixel values of I_{t+1} and the positions of the original pixels, a differentiable bilinear sampling interpolation is used to obtain the synthesized view Î_t corresponding to I_t, where each pixel value of the synthesized view is not a simple mapping of I_{t+1}(p_{t+1}) but is obtained by the differentiable bilinear sampling interpolation as a weighted sum of the four pixels around p_{t+1}, as shown in FIG. 2 of the accompanying drawings:

Î_t(p_t) = Σ_{i,j} w^{ij} · I_{t+1}(p_{t+1}^{ij})        (2)

where i = top or bottom and j = left or right index the four pixels p_{t+1}^{ij} surrounding p_{t+1}, and w^{ij} denotes the weight of each of the four pixels, with Σ w^{ij} = 1. After the view Î_t is synthesized, self-supervision is formed between it and the original view I_t over the two frames, giving the loss function:

L_vs = Σ_p | I_t(p) − Î_t(p) |        (3)

Therefore, by synthesizing a new image from the depth map and constructing a photometric error, self-supervision is achieved without external supervision.
Step (4): the mask network is used to prevent the network training gradient from being corrupted. Since the geometric knowledge used in the previous step assumes preconditions such as the absence of dynamic objects and occluding objects in the image, the mask network is introduced to prevent the network training from being suppressed. The mask network is trained together with the pose network; by upsampling through four 4 × 4 convolutional layers followed by one 3 × 3 convolutional layer, it produces the mask
I_t^mask corresponding to a sequence; for each pixel p the corresponding mask value is P_t^mask(p), and the loss function over two frames then changes from formula (3) to:

L_vs = Σ_p P_t^mask(p) · | I_t(p) − Î_t(p) |        (4)
Step (5): a reverse-sequence network, whose structure is shown in FIG. 3, is used to add constraints so that the network estimates the relative pose between frames more accurately. When the forward-sequence images are used as input, the input sequence is
(I_{t-1}, I_t, I_{t+1}), and the image input of the reverse-sequence network is (I_{t+1}, I_t, I_{t-1}). A good pose estimation network should be able to estimate the pose relationship between frames not only when the image sequence is input in forward order but also when it is input in reverse order, which adds a constraint. For a sequence of three pictures, the poses obtained by the network for the forward sequence are T_{t→t-1} and T_{t→t+1}, and the poses obtained for the reverse sequence are T'_{t→t+1} and T'_{t→t-1}. Ideally,

T_{t→t-1} = T'_{t→t-1},   T_{t→t+1} = T'_{t→t+1}

However, the network estimate always contains an error, and this error is used to add a constraint, giving the following loss function:

L_pose = ‖ t − t' ‖ + ω · ‖ r − r' ‖        (5)

where t denotes the displacement estimated by the network for the forward-sequence input, t' the displacement estimated for the reverse-sequence input, r the rotation estimated for the forward-sequence input, r' the rotation estimated for the reverse-sequence input, and ω represents the weight.
Therefore, the pose network is trained by adding constraint, so that the network has the capability of accurately estimating the relative motion between frames.

Claims (1)

1. An RGBD camera-based unsupervised pose estimation network construction method is characterized by comprising the following specific steps:
step (1): obtaining co-scene color images and depth images using RGB-D camera
an RGB-D camera is used to acquire continuous color images and the corresponding continuous depth images, each of resolution H × W, where H and W are the height and width of the image respectively; the color image I_t at time t and its adjacent images I_{t-1} and I_{t+1} are selected; each color image has three RGB channels, and the three color images I_{t-1}, I_t, I_{t+1} are spliced into a 9-channel sequence I_feature^0, where "feature" refers to a feature map obtained after convolution and 0 refers to the 0-th convolution operation;
step (2): pose network based learning of inter-frame structural relationships
The pose network is formed by a convolutional neural network, with a ReLU activation layer after each convolutional layer; the 9-channel sequence I_feature^0 first passes through a convolutional layer with a 7 × 7 kernel, then a convolutional layer with a 5 × 5 kernel, and then five convolutional layers with 3 × 3 kernels, giving a 256-channel feature map; a further convolutional layer with a 1 × 1 kernel then reduces the dimensionality to 12 channels; finally, the H and W dimensions are averaged down to a single number per channel, giving a 12-dimensional vector; this vector is divided into two groups of 6-dimensional numbers, denoted T_{t→t-1} and T_{t→t+1} respectively; for T_{t→t-1}, the first three numbers represent the displacement from the camera coordinate system of I_t to that of I_{t-1}, and the last three numbers represent, as Euler angles, the rotation from the coordinate system of I_t to that of I_{t-1}; T_{t→t+1} is expressed in the same way;
step (3): completing self-supervision by using the inter-frame camera pose relationship combined with the distance information of the depth map and geometric knowledge:
for images
I_t and I_{t+1}, the depth map corresponding to I_t is D_t, and the transformation from I_t to the image I_{t+1} at time t+1 is T_{t→t+1}; for a pixel p_t at a certain point of the image I_t, the corresponding value on the depth map D_t is the depth D_t(p_t); from the camera projection model and the inter-frame triangular relationship, the pixel p_{t+1} on I_{t+1} that corresponds to p_t satisfies the relationship:

p_{t+1} ~ K · T_{t→t+1} · D_t(p_t) · K^{-1} · p_t        (1)

where K is the intrinsic matrix of the camera; according to formula (1), each pixel of I_t is mapped to the corresponding position in I_{t+1}, giving the projected coordinates p_{t+1} that correspond to each pixel value of I_t; then, according to the pixel values of I_{t+1} and the positions of the original pixels, a differentiable bilinear sampling interpolation is used to obtain the synthesized view Î_t corresponding to I_t, where each pixel value of the synthesized view is not a simple mapping of I_{t+1}(p_{t+1}) but is obtained by the differentiable bilinear sampling interpolation as a weighted sum of the four pixels around p_{t+1}:

Î_t(p_t) = Σ_{i,j} w^{ij} · I_{t+1}(p_{t+1}^{ij})        (2)

where i = top or bottom and j = left or right index the four pixels p_{t+1}^{ij} surrounding p_{t+1}, and w^{ij} denotes the weight of each of the four pixels, with Σ w^{ij} = 1; after the view Î_t is synthesized, self-supervision is formed between it and the original view I_t over the two frames, giving the loss function:

L_vs = Σ_p | I_t(p) − Î_t(p) |        (3)

therefore, by synthesizing a new image from the depth map and constructing a photometric error, self-supervision is achieved without external supervision;
step (4): preventing corruption of the network training gradient with a mask network
the mask network shares the first five convolutional layers with the pose network and is trained together with it; upsampling through four 4 × 4 convolutional layers followed by one 3 × 3 convolutional layer is used to obtain the mask I_t^mask corresponding to a sequence, with a mask value P_t^mask(p) for each pixel p; the loss function over two frames then changes from formula (3) to:

L_vs = Σ_p P_t^mask(p) · | I_t(p) − Î_t(p) |        (4)
And (5): adding constraints through an inverse sequence network enables the network to more accurately estimate relative pose between frames
when the forward-sequence images are used as input, the input sequence is (I_{t-1}, I_t, I_{t+1}), and the image input of the reverse-sequence network is (I_{t+1}, I_t, I_{t-1}); a good pose estimation network should be able to estimate the pose relationship between frames not only when the image sequence is input in forward order but also when it is input in reverse order, which adds a constraint; for a sequence of three pictures, the poses obtained by the network for the forward sequence are T_{t→t-1} and T_{t→t+1}, and the poses obtained for the reverse sequence are T'_{t→t+1} and T'_{t→t-1}; ideally,

T_{t→t-1} = T'_{t→t-1},   T_{t→t+1} = T'_{t→t+1}

however, the network estimate always contains an error, and this error is used to add a constraint, giving the following loss function:

L_pose = ‖ t − t' ‖ + ω · ‖ r − r' ‖        (5)

where t denotes the displacement estimated by the network for the forward-sequence input, t' the displacement estimated for the reverse-sequence input, r the rotation estimated for the forward-sequence input, r' the rotation estimated for the reverse-sequence input, and ω represents the weight;
therefore, the pose network is trained by adding constraint, so that the network has the capability of accurately estimating the relative motion between frames.
CN202010034081.2A 2020-01-13 2020-01-13 RGBD camera-based unsupervised pose estimation network construction method Active CN111260680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010034081.2A CN111260680B (en) 2020-01-13 2020-01-13 RGBD camera-based unsupervised pose estimation network construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010034081.2A CN111260680B (en) 2020-01-13 2020-01-13 RGBD camera-based unsupervised pose estimation network construction method

Publications (2)

Publication Number Publication Date
CN111260680A CN111260680A (en) 2020-06-09
CN111260680B true CN111260680B (en) 2023-01-03

Family

ID=70954018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010034081.2A Active CN111260680B (en) 2020-01-13 2020-01-13 RGBD camera-based unsupervised pose estimation network construction method

Country Status (1)

Country Link
CN (1) CN111260680B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739078B (en) * 2020-06-15 2022-11-18 大连理工大学 Monocular unsupervised depth estimation method based on context attention mechanism
CN112489128A (en) * 2020-12-14 2021-03-12 南通大学 RGB-D indoor unmanned aerial vehicle positioning implementation method based on unsupervised deep learning
CN113888629A (en) * 2021-10-28 2022-01-04 浙江大学 RGBD camera-based rapid object three-dimensional pose estimation method
CN114998411B (en) * 2022-04-29 2024-01-09 中国科学院上海微系统与信息技术研究所 Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106658023A (en) * 2016-12-21 2017-05-10 山东大学 End-to-end visual odometer and method based on deep learning
CN110490928A (en) * 2019-07-05 2019-11-22 天津大学 A kind of camera Attitude estimation method based on deep neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111915663B * 2016-09-15 2024-04-30 Google LLC Image depth prediction neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106658023A (en) * 2016-12-21 2017-05-10 山东大学 End-to-end visual odometer and method based on deep learning
CN110490928A (en) * 2019-07-05 2019-11-22 天津大学 A kind of camera Attitude estimation method based on deep neural network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A Positioning System Based on Monocular Vision for Industrial Robots;Mingyu Gao et al.;《IEEE Xplore》;20161103;全文 *
A Target Detection System for Mobile Robot Based On Single Shot Multibox Detector Neural Network;Yujie Du;《IEEE Xplore》;20190530;全文 *
Circular Trajectory Planning with Pose Control for Six-DOF Manipulator;Jincan Li et al.;《IEEE Xplore》;20190621;全文 *
深度学习实时多人姿态估计与跟踪 (Real-time multi-person pose estimation and tracking with deep learning); 许忠雄 et al.; 《中国电子科学研究院学报》; 2018-08-20 (Issue 04); full text *
融合扩张卷积网络与SLAM的无监督单目深度估计 (Unsupervised monocular depth estimation fusing dilated convolutional networks and SLAM); 戴仁月 et al.; 《激光与光电子学进展》; 2019-09-02 (Issue 06); full text *

Also Published As

Publication number Publication date
CN111260680A (en) 2020-06-09

Similar Documents

Publication Publication Date Title
CN111260680B (en) RGBD camera-based unsupervised pose estimation network construction method
CN110782490B (en) Video depth map estimation method and device with space-time consistency
CN110490928B (en) Camera attitude estimation method based on deep neural network
CN109377530B (en) Binocular depth estimation method based on depth neural network
CN108765479A (en) Using deep learning to monocular view estimation of Depth optimization method in video sequence
CN111105432B (en) Unsupervised end-to-end driving environment perception method based on deep learning
CN111354030B (en) Method for generating unsupervised monocular image depth map embedded into SENet unit
CN108986166A (en) A kind of monocular vision mileage prediction technique and odometer based on semi-supervised learning
CN115187638B (en) Unsupervised monocular depth estimation method based on optical flow mask
CN113554032B (en) Remote sensing image segmentation method based on multi-path parallel network of high perception
CN109272493A (en) A kind of monocular vision odometer method based on recursive convolution neural network
CN112767467B (en) Double-image depth estimation method based on self-supervision deep learning
CN110992414B (en) Indoor monocular scene depth estimation method based on convolutional neural network
CN113077505A (en) Optimization method of monocular depth estimation network based on contrast learning
CN113283525A (en) Image matching method based on deep learning
CN115883764A (en) Underwater high-speed video frame interpolation method and system based on data cooperation
CN115035171A (en) Self-supervision monocular depth estimation method based on self-attention-guidance feature fusion
CN112241959A (en) Attention mechanism generation semantic segmentation method based on superpixels
CN113610912B (en) System and method for estimating monocular depth of low-resolution image in three-dimensional scene reconstruction
CN112419411B (en) Realization method of vision odometer based on convolutional neural network and optical flow characteristics
Berenguel-Baeta et al. Fredsnet: Joint monocular depth and semantic segmentation with fast fourier convolutions from single panoramas
Liu et al. Towards better data exploitation in self-supervised monocular depth estimation
CN112132880A (en) Real-time dense depth estimation method based on sparse measurement and monocular RGB (red, green and blue) image
CN112164078B (en) RGB-D multi-scale semantic segmentation method based on encoder-decoder
CN117197229B (en) Multi-stage estimation monocular vision odometer method based on brightness alignment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant