CN106658023B - End-to-end visual odometer and method based on deep learning - Google Patents

End-to-end visual odometer and method based on deep learning Download PDF

Info

Publication number
CN106658023B
CN106658023B (application CN201611191845.9A)
Authority
CN
China
Prior art keywords
network
optical flow
inter-frame
training
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611191845.9A
Other languages
Chinese (zh)
Other versions
CN106658023A (en)
Inventor
刘国良
罗勇
田国会
赵洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN201611191845.9A priority Critical patent/CN106658023B/en
Publication of CN106658023A publication Critical patent/CN106658023A/en
Application granted
Publication of CN106658023B publication Critical patent/CN106658023B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C22/00Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers

Abstract

The invention discloses an end-to-end visual odometer and method based on deep learning, comprising a cascaded optical flow network and an inter-frame estimation network. The optical flow network takes consecutive frames of an image sequence in a data set, uses the optical flow endpoint error between the output optical flow vectors and the ground truth as its loss function, and after network training outputs the generated optical flow. The inter-frame estimation network takes the optical flow image as input, constructs its loss function from the distance between the six-degree-of-freedom output pose vector and the ground truth, and is trained iteratively to perform inter-frame estimation. In the present invention, the optical flow network module and the inter-frame estimation network module are trained separately on different input-output data; the two are then cascaded into an end-to-end visual odometer module for further deep training and parameter optimization. This hierarchical training method can greatly reduce training time and improve training efficiency.

Description

End-to-end visual odometer and method based on deep learning
Technical field
The present invention relates to an end-to-end visual odometer and method based on deep learning.
Background technique
Visual odometry is a method by which a robot estimates its displacement using visual sensors; it is a fundamental technology for higher-level tasks such as robot localization, map building, obstacle avoidance, and path planning.
Traditional visual odometry mainly estimates the robot's inter-frame pose from the spatial-geometric relationships of visual features between frames, and is therefore also called inter-frame estimation. Features fall into two classes, sparse and dense, corresponding respectively to local and global representations of image information. Traditional features must be chosen or computed by hand, so the resulting image representation carries a degree of subjectivity and limitation; moreover, the accuracy of feature matching is severely constrained under illumination changes, motion blur, and texture-poor scenes, which degrades estimation accuracy.
Summary of the invention
To solve the above problems, the present invention proposes an end-to-end visual odometer and method based on deep learning. The invention uses an end-to-end inter-frame-estimation deep neural network to produce inter-frame estimates directly from raw images. Compared with conventional methods, this technique requires no manual feature extraction or optical flow images, no construction of feature descriptors, no inter-frame feature matching, and no complex geometric computation.
To achieve the goals above, the present invention adopts the following technical scheme:
An end-to-end visual odometer based on deep learning comprises a cascaded optical flow network and an inter-frame estimation network. The optical flow network takes consecutive frames of an image sequence in a data set, uses the optical flow endpoint error between the output optical flow vectors and the ground truth as its loss function, and after network training outputs the generated optical flow image. The inter-frame estimation network takes the optical flow image as input, constructs its loss function from the distance between the six-degree-of-freedom output pose vector and the ground truth, and is trained iteratively to perform inter-frame estimation.
The optical flow network and the inter-frame estimation network are trained with a hierarchical training method.
The optical flow network is a convolutional neural network.
The optical flow network takes consecutive-frame images as input, uses the optical flow endpoint error between the output optical flow vectors and the ground truth as its loss function, and is trained to generate an optical flow image from the input consecutive frames.
The inter-frame estimation network takes the optical flow image as input; training on the whole optical flow image is divided into training on the global optical flow map and training on multiple local sub-flow images; the features output by both are finally combined and fed to a fully connected layer, completing the optical-flow-based inter-frame estimation network.
The inter-frame estimation network is trained on the KITTI data set.
The inter-frame estimation network is trained on synthetic data.
An end-to-end visual odometry estimation method based on deep learning: from consecutive frames of an image sequence in a data set, the optical flow endpoint error between the output optical flow vectors and the ground truth is used as the loss function; after network training, an optical flow image is generated; from the optical flow image, a loss function is constructed from the distance between the six-degree-of-freedom output pose vector and the ground truth, and the network is trained iteratively to perform inter-frame estimation.
The optical flow network module and the inter-frame estimation network module are trained separately on different input-output data; the two are finally cascaded for further deep training and parameter optimization.
The beneficial effects of the invention are:
(1) Compared with conventional methods, the present invention requires no manual selection or computation of features, eliminating the error-prone feature-matching process as well as complex geometric computation, making it intuitive and simple;
(2) The hierarchical deep-neural-network training method proposed by the present invention allows the optical flow network and the inter-frame estimation network to be trained in parallel, improving training speed;
(3) The application of the optical flow network in the present invention speeds up optical flow computation, improving the real-time performance of the algorithm;
(4) The present invention trains the optical flow network module and the inter-frame estimation network module separately on different input-output data, then cascades the two into an end-to-end visual odometer module for further deep training and parameter optimization. This hierarchical training method can greatly reduce training time and improve training efficiency.
Detailed description of the invention
Fig. 1 is a schematic diagram of the system structure of the invention;
Fig. 2 is a schematic diagram of the optical flow network of the invention, based on convolutional neural networks;
Fig. 3 is a schematic diagram of the inter-frame estimation network of the invention.
Specific embodiments:
The invention is further described below with reference to the accompanying drawings and embodiments.
An end-to-end inter-frame-estimation deep neural network produces inter-frame estimates directly from raw images, forming a modular visual odometer. Compared with conventional methods, this technique requires no manual feature extraction or optical flow images, no construction of feature descriptors, no inter-frame feature matching, and no complex geometric computation.
As shown in Fig. 1, the odometer of the invention comprises two sub-modules: an optical flow network module and an inter-frame estimation network module. The two modules use a hierarchical training method: each is trained separately on different input-output data, and the two are then cascaded into an end-to-end visual odometer module for further deep training and parameter optimization. This hierarchical training method can greatly reduce training time and improve training efficiency, one of the advantages of deep neural networks. The specific steps are as follows:
Construction of the optical flow network: the optical flow network may consist of convolutional neural networks (CNNs) and is trained on real or synthetic data. It takes consecutive-frame images as input and uses the optical flow endpoint error (EPE) between the output optical flow vectors and the ground truth as the loss function, training the network to map input consecutive frames to generated optical flow.
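The endpoint error (EPE) named above has a standard definition: the Euclidean distance between the predicted and reference flow vector at each pixel, averaged over the image. A minimal NumPy sketch of that loss — the function name `epe_loss` and the array shapes are our own illustration, not the patent's notation:

```python
import numpy as np

def epe_loss(flow_pred, flow_gt):
    """Average endpoint error between two optical flow fields.

    flow_pred, flow_gt: arrays of shape (H, W, 2) holding per-pixel
    (u, v) displacement vectors.
    """
    diff = flow_pred - flow_gt
    # Euclidean distance between predicted and reference vector, per pixel
    per_pixel = np.sqrt((diff ** 2).sum(axis=-1))
    return float(per_pixel.mean())

# Identical flows give zero error; a uniform 1-pixel horizontal
# offset gives an EPE of exactly 1.
gt = np.zeros((4, 4, 2))
pred = gt.copy()
pred[..., 0] += 1.0
print(epe_loss(gt, gt))    # 0.0
print(epe_loss(pred, gt))  # 1.0
```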
As shown in Fig. 2, frame i and frame i+1 are fed into CNNs separately, producing their respective image feature representations; the feature representations of the two frames are combined and fed into a deeper CNN; up-convolution layers then restore the resolution lost to the CNN's pooling operations, and a dense, per-pixel global optical flow map is output.
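The patent does not give layer counts or kernel sizes, so the following sketch only illustrates the resolution bookkeeping of such an encoder-decoder: strided convolutions shrink the feature maps, and stride-2 up-convolutions restore them for the dense per-pixel output. All concrete values (384-pixel side, 3x3 convolutions, 4x4 up-convolutions) are illustrative assumptions:

```python
def conv_out(size, kernel=3, stride=2, pad=1):
    """Spatial size after a strided convolution (integer floor)."""
    return (size + 2 * pad - kernel) // stride + 1

def upconv_out(size, kernel=4, stride=2, pad=1):
    """Spatial size after a stride-2 up-convolution (transposed conv)."""
    return (size - 1) * stride - 2 * pad + kernel

# Three stride-2 convolutions shrink a 384-pixel side to 48;
# three up-convolutions restore the full resolution for dense flow.
s = 384
for _ in range(3):
    s = conv_out(s)
print(s)  # 48
for _ in range(3):
    s = upconv_out(s)
print(s)  # 384
```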
Construction of the inter-frame estimation network: this network takes the optical flow image as input, constructs its loss function from the distance between the six-degree-of-freedom output pose vector and the ground truth, and is trained iteratively. Fig. 3 illustrates how local optical flow images and the global optical flow image are trained separately and their networks combined to complete optical-flow-based inter-frame estimation. The KITTI data set or synthetic data may be used to train this network, with the input optical flow computed by a traditional optical flow algorithm.
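The patent specifies only "the distance between the six-degree-of-freedom output pose vector and the ground truth" as this network's loss; the plain Euclidean (L2) distance below is one natural reading of that. A NumPy sketch, with the function name and the (tx, ty, tz, rx, ry, rz) vector layout as our own assumptions:

```python
import numpy as np

def pose_distance_loss(pose_pred, pose_gt):
    """Euclidean distance between a predicted and a reference
    six-DoF pose vector (tx, ty, tz, rx, ry, rz).

    Plain L2 distance is an assumption: the patent only states that
    the loss is the distance between pose vector and ground truth.
    """
    pose_pred = np.asarray(pose_pred, dtype=float)
    pose_gt = np.asarray(pose_gt, dtype=float)
    return float(np.linalg.norm(pose_pred - pose_gt))

gt = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
pred = [3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
print(pose_distance_loss(pred, gt))  # 5.0
```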
When building the inter-frame estimation module, the global optical flow map is first divided into multiple local flow sub-images; the global map and the local sub-images are then fed into CNNs separately, yielding local and global optical flow feature representations. The local and global features are combined and fed to a fully connected layer, which outputs the inter-frame estimate expressed as a six-degree-of-freedom pose vector.
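The division of the global flow map into local sub-images is simple array slicing. The patent does not state how many sub-images are used; the 2x2 grid below is an illustrative assumption:

```python
import numpy as np

def split_flow_map(flow, grid=2):
    """Split an (H, W, 2) global flow map into grid*grid local sub-maps.

    grid=2 (four sub-images) is an assumption for illustration only.
    """
    H, W, _ = flow.shape
    h, w = H // grid, W // grid
    return [flow[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(grid) for c in range(grid)]

flow = np.zeros((8, 8, 2))
subs = split_flow_map(flow)
print(len(subs), subs[0].shape)  # 4 (4, 4, 2)
```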
The training process can be divided into three stages: first, the local flow sub-images are used as input and the inter-frame estimate as output to train the network; second, the global optical flow map is used as input and the inter-frame estimate as output to train the network; finally, the local sub-images and the global map are used together as input, with the inter-frame estimate as output, to train the network further.
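The three stages above amount to a staged schedule over which inputs are active. The patent does not describe the trainer itself, so in this sketch the training step is a stub and the stage names are our own:

```python
# Stage schedule: which flow inputs feed the network at each stage.
STAGES = [
    ("local",  ["local_subimages"]),                  # stage 1: sub-images only
    ("global", ["global_flow"]),                      # stage 2: global map only
    ("joint",  ["local_subimages", "global_flow"]),   # stage 3: both together
]

def run_schedule(train_step):
    """Run the three training stages in order, delegating to train_step."""
    for name, inputs in STAGES:
        train_step(name, inputs)

# Stub train_step that just records what it was asked to train on.
log = []
run_schedule(lambda name, inputs: log.append((name, tuple(inputs))))
print(log)
```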
Realizing the end-to-end visual odometer: the trained optical flow network and the optical-flow-based inter-frame estimation network are cascaded; consecutive frames of the image sequence in the data set serve as the input of the whole network; a loss function is constructed from the distance between the six-degree-of-freedom output vector and the ground truth; and iterative training optimizes the parameters, realizing a fast, accurate, and robust end-to-end visual odometer.
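Functionally, the cascade is just composition: the optical flow network maps a frame pair to a flow image, and the inter-frame network maps that flow image to a six-DoF pose. A sketch with toy stand-ins for the trained CNN modules (the stub "networks" and all names here are our own, not the patent's):

```python
def cascade(flow_net, pose_net):
    """Compose a flow network and a pose network into one odometer."""
    def odometer(frame_a, frame_b):
        flow = flow_net(frame_a, frame_b)   # frame pair -> flow image
        return pose_net(flow)               # flow image -> 6-DoF pose
    return odometer

# Toy stand-ins for the trained modules, operating on 1-D "frames".
flow_net = lambda a, b: [bi - ai for ai, bi in zip(a, b)]    # "flow" = difference
pose_net = lambda flow: [sum(flow) / len(flow)] + [0.0] * 5  # crude pose guess

vo = cascade(flow_net, pose_net)
print(vo([1.0, 2.0], [2.0, 4.0]))  # [1.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```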
Although the specific embodiments of the invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the invention. Those skilled in the art should understand that, on the basis of the technical solutions of the invention, various modifications or variations that can be made without creative effort still fall within the scope of protection of the invention.

Claims (7)

1. An end-to-end visual odometer based on deep learning, characterized by comprising a cascaded optical flow network and an inter-frame estimation network, wherein the optical flow network takes consecutive-frame images of an image sequence in a data set as input, uses the optical flow endpoint error between the output optical flow vectors and the ground truth as its loss function, and after network training outputs the generated optical flow image; the inter-frame estimation network takes the optical flow image as input, constructs a loss function from the distance between the six-degree-of-freedom output pose vector and the ground truth, and is trained iteratively to perform inter-frame estimation;
after the optical flow network and the inter-frame estimation network are cascaded, deeper training further optimizes the parameters.
2. The end-to-end visual odometer based on deep learning according to claim 1, characterized in that the optical flow network and the inter-frame estimation network are trained with a hierarchical training method.
3. The end-to-end visual odometer based on deep learning according to claim 1, characterized in that the optical flow network is a convolutional neural network.
4. The end-to-end visual odometer based on deep learning according to claim 1, characterized in that:
the inter-frame estimation network takes the optical flow image as input; training on the whole optical flow image is divided into training on the global optical flow map and training on multiple local sub-flow images; the features output by both are finally combined and fed to a fully connected layer, completing the optical-flow-based inter-frame estimation network.
5. The end-to-end visual odometer based on deep learning according to claim 1, characterized in that the inter-frame estimation network is trained on the KITTI data set.
6. The end-to-end visual odometer based on deep learning according to claim 1, characterized in that the inter-frame estimation network is trained on synthetic data.
7. An end-to-end visual odometry estimation method based on deep learning, characterized in that: consecutive-frame images of an image sequence in a data set are taken as input; the optical flow endpoint error between the output optical flow vectors and the ground truth is used as the loss function; after network training, an optical flow image is generated; from the optical flow image, a loss function is constructed from the distance between the six-degree-of-freedom output pose vector and the ground truth, and the network is trained iteratively to perform inter-frame estimation;
the optical flow network module and the inter-frame estimation network module are trained separately on different input-output data, and the two are finally cascaded for further deep training and parameter optimization.
CN201611191845.9A 2016-12-21 2016-12-21 End-to-end visual odometer and method based on deep learning Active CN106658023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611191845.9A CN106658023B (en) 2016-12-21 2016-12-21 End-to-end visual odometer and method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611191845.9A CN106658023B (en) 2016-12-21 2016-12-21 End-to-end visual odometer and method based on deep learning

Publications (2)

Publication Number Publication Date
CN106658023A CN106658023A (en) 2017-05-10
CN106658023B true CN106658023B (en) 2019-12-03

Family

ID=58833548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611191845.9A Active CN106658023B (en) 2016-12-21 2016-12-21 End-to-end visual odometer and method based on deep learning

Country Status (1)

Country Link
CN (1) CN106658023B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11856181B2 (en) 2017-09-28 2023-12-26 Lg Electronics Inc. Method and device for transmitting or receiving 6DoF video using stitching and re-projection related metadata

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107289967B (en) * 2017-08-17 2023-06-09 珠海一微半导体股份有限公司 Separable optical odometer and mobile robot
CN107527358B (en) * 2017-08-23 2020-05-12 北京图森智途科技有限公司 Dense optical flow estimation method and device
CN109785376B (en) * 2017-11-15 2023-02-28 富士通株式会社 Training method of depth estimation device, depth estimation device and storage medium
CN107909602A (en) * 2017-12-08 2018-04-13 长沙全度影像科技有限公司 Moving boundary estimation method based on deep learning
CN108122249A (en) * 2017-12-20 2018-06-05 长沙全度影像科技有限公司 Optical flow estimation method based on a GAN deep learning model
CN109978924A (en) * 2017-12-27 2019-07-05 长沙学院 Monocular-based visual odometry method and system
CN108303094A (en) * 2018-01-31 2018-07-20 深圳市拓灵者科技有限公司 Positioning and navigation system based on multi-vision-sensor fusion array, and positioning and navigation method thereof
CN108648216B (en) * 2018-04-19 2020-10-09 长沙学院 Visual odometer implementation method and system based on optical flow and deep learning
CN108881952B (en) * 2018-07-02 2021-09-14 上海商汤智能科技有限公司 Video generation method and device, electronic equipment and storage medium
CN109272493A (en) * 2018-08-28 2019-01-25 中国人民解放军火箭军工程大学 Monocular visual odometry method based on a recurrent convolutional neural network
CN109656134A (en) * 2018-12-07 2019-04-19 电子科技大学 End-to-end decision-making method for intelligent vehicles based on a spatio-temporal joint recurrent neural network
CN109708658B (en) * 2019-01-14 2020-11-24 浙江大学 Visual odometer method based on convolutional neural network
CN111627051B (en) 2019-02-27 2023-12-15 中强光电股份有限公司 Electronic device and method for estimating optical flow
CN110335337B (en) * 2019-04-28 2021-11-05 厦门大学 Method for generating a visual odometer based on an end-to-end semi-supervised generative adversarial network
CN110111366B (en) * 2019-05-06 2021-04-30 北京理工大学 End-to-end optical flow estimation method based on multistage loss
CN110310299B (en) * 2019-07-03 2021-11-19 北京字节跳动网络技术有限公司 Method and apparatus for training optical flow network, and method and apparatus for processing image
CN110378936B (en) * 2019-07-30 2021-11-05 北京字节跳动网络技术有限公司 Optical flow calculation method and device and electronic equipment
CN110599542A (en) * 2019-08-30 2019-12-20 北京影谱科技股份有限公司 Method and device for geometric-region-oriented adaptive VSLAM local mapping
CN112648997A (en) * 2019-10-10 2021-04-13 成都鼎桥通信技术有限公司 Method and system for positioning based on multitask network model
CN111192312B (en) * 2019-12-04 2023-12-26 中广核工程有限公司 Depth image acquisition method, device, equipment and medium based on deep learning
CN111127557B (en) * 2019-12-13 2022-12-13 中国电子科技集团公司第二十研究所 Visual SLAM front-end attitude estimation method based on deep learning
CN111260680B (en) * 2020-01-13 2023-01-03 杭州电子科技大学 RGBD camera-based unsupervised pose estimation network construction method
CN111539988B (en) * 2020-04-15 2024-04-09 京东方科技集团股份有限公司 Visual odometer implementation method and device and electronic equipment
CN111833400B (en) * 2020-06-10 2023-07-28 广东工业大学 Camera pose positioning method
CN112344922B (en) * 2020-10-26 2022-10-21 中国科学院自动化研究所 Monocular vision odometer positioning method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9761008B2 (en) * 2014-05-08 2017-09-12 The Trustees Of The University Of Pennsylvania Methods, systems, and computer readable media for visual odometry using rigid structures identified by antipodal transform
US9427874B1 (en) * 2014-08-25 2016-08-30 Google Inc. Methods and systems for providing landmarks to facilitate robot localization and visual odometry
US20160349379A1 (en) * 2015-05-28 2016-12-01 Alberto Daniel Lacaze Inertial navigation unit enhaced with atomic clock

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Costante, Gabriele, et al., "Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation", IEEE Robotics and Automation Letters, 2016-01-31; main text section 3, pp. 20-22, figs. 3-4 *
Dosovitskiy, Alexey, et al., "FlowNet: Learning Optical Flow with Convolutional Networks", IEEE International Conference on Computer Vision, 2015-12-31; abstract, figs. 1-3, paragraphs 2579-2764 *
Brox, Thomas, et al., "High Accuracy Optical Flow Estimation Based on a Theory for Warping", European Conference on Computer Vision, 2004-05-31; entire document *


Also Published As

Publication number Publication date
CN106658023A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
CN106658023B (en) End-to-end visual odometer and method based on deep learning
CN101625768B (en) Three-dimensional human face reconstruction method based on stereoscopic vision
CN106600583B (en) Parallax image acquisition method based on an end-to-end neural network
CN104661010B (en) Method and device for establishing three-dimensional model
CN103003846B (en) Joint region display device, joint region detection device, joint region membership degree calculation device, joint-like region membership degree calculation device, and joint region display method
CN104408760B (en) High-precision virtual assembly system algorithm based on binocular vision
CN111340868B (en) Unmanned underwater vehicle autonomous decision control method based on visual depth estimation
CN105144196A (en) Method and device for calculating a camera or object pose
CN108986166A (en) Monocular visual odometry prediction method and odometer based on semi-supervised learning
CN106780592A (en) Kinect depth reconstruction algorithm based on camera motion and image shading
CN106296812A (en) Simultaneous localization and mapping method
CN105225269A (en) Motion-based object modeling system
CN106780543A (en) Dual-frame depth and motion estimation method based on convolutional neural networks
CN103413352A (en) Scene three-dimensional reconstruction method based on RGBD multi-sensor fusion
CN101976455A (en) Color image three-dimensional reconstruction method based on stereo matching
CN109272493A (en) Monocular visual odometry method based on a recurrent convolutional neural network
CN106780631A (en) Robot loop-closure detection method based on deep learning
CN104123747A (en) Method and system for multimode touch three-dimensional modeling
Aliakbarian et al. Flag: Flow-based 3d avatar generation from sparse observations
CN109708654A (en) Path planning method and path planning system
CN110264526A (en) Scene depth and camera pose solving method based on deep learning
CN104966320A (en) Method for automatically generating camouflage patterns based on third-order Bézier curves
Liu et al. Atvio: Attention guided visual-inertial odometry
Liao et al. Maptrv2: An end-to-end framework for online vectorized hd map construction
Carvalho et al. Long-term prediction of motion trajectories using path homology clusters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant