CN106658023A - End-to-end visual odometer and method based on deep learning - Google Patents

End-to-end visual odometer and method based on deep learning

Info

Publication number
CN106658023A
Authority
CN
China
Prior art keywords
network
training
optical flow
inter-frame
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611191845.9A
Other languages
Chinese (zh)
Other versions
CN106658023B (en)
Inventor
刘国良
罗勇
田国会
赵洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN201611191845.9A priority Critical patent/CN106658023B/en
Publication of CN106658023A publication Critical patent/CN106658023A/en
Application granted granted Critical
Publication of CN106658023B publication Critical patent/CN106658023B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C22/00: Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an end-to-end visual odometry system and method based on deep learning. The system comprises a cascaded optical flow network and an inter-frame estimation network. Given adjacent frames of an image sequence in a dataset, the optical flow network takes the endpoint error (EPE) between the output optical flow vectors and the reference data as its loss function, is trained, and outputs the generated optical flow. The inter-frame estimation network takes the optical flow image as input, builds a loss function from the distance between the six-degree-of-freedom output pose vector and the reference data, and after iterative training performs inter-frame estimation. The optical flow network module and the inter-frame estimation network module are first trained separately on different input/output data, then cascaded to form an end-to-end visual odometry module and trained further in depth to optimize the parameters. This hierarchical training method dramatically reduces training time and improves training efficiency.

Description

An end-to-end visual odometry system and method based on deep learning
Technical field
The present invention relates to an end-to-end visual odometry system and method based on deep learning.
Background technology
Visual odometry is the method by which a robot estimates its displacement using vision sensors. It is a basic technology underlying higher-level tasks such as robot localization, map building, obstacle avoidance, and path planning.
Traditional visual odometry relies primarily on the spatial geometric relations between inter-frame visual features to estimate the robot's inter-frame pose, and is therefore also called inter-frame estimation. Features fall into two classes, sparse and dense, corresponding respectively to local and global representations of image information. Traditional features must be selected or computed manually, so the resulting image representation carries a degree of subjectivity and limitation. Such methods also depend on the accuracy of feature matching and cope poorly with illumination changes, motion blur, and weakly textured scenes, which limits their estimation accuracy.
The content of the invention
To solve the above problems, the present invention proposes an end-to-end visual odometry system and method based on deep learning. The invention uses an end-to-end inter-frame-estimation deep neural network to output inter-frame estimates directly from the original images. Compared with conventional methods, the technique requires no manual feature extraction or optical flow images, no construction of feature descriptors, no inter-frame feature matching, and no complex geometric operations.
To achieve these goals, the present invention adopts the following technical scheme:
An end-to-end visual odometry system based on deep learning comprises a cascaded optical flow network and an inter-frame estimation network. Given adjacent frames of an image sequence in a dataset, the optical flow network takes the optical flow endpoint error between the output optical flow vectors and the reference data as its loss function; after network training it outputs the generated optical flow image. The inter-frame estimation network takes the optical flow image as input, builds a loss function based on the distance between the six-degree-of-freedom output pose vector and the reference data, iteratively trains the network, and performs inter-frame estimation.
The optical flow network and the inter-frame estimation network are trained with a hierarchical training method.
The optical flow network is a convolutional neural network.
The optical flow network takes consecutive adjacent frames as input and, using the optical flow endpoint error between the output optical flow vectors and the reference data as the loss function, is trained to generate an optical flow image from the input consecutive frames.
The inter-frame estimation network takes the optical flow image as input and divides training into global training on the whole optical flow image and local training on multiple optical flow sub-images; the features output by both are finally combined and fed to a fully connected layer, completing the optical-flow-based inter-frame estimation network.
The inter-frame estimation network is trained on the KITTI dataset.
The inter-frame estimation network is trained on synthetic data.
An end-to-end visual odometry estimation method based on deep learning: given adjacent frames of an image sequence in a dataset, take the optical flow endpoint error between the output optical flow vectors and the reference data as the loss function and train the network to generate an optical flow image; then, from the optical flow image, build a loss function based on the distance between the six-degree-of-freedom output pose vector and the reference data, iteratively train the network, and perform inter-frame estimation.
The optical flow network module and the inter-frame estimation network module are trained separately on different input/output data, then cascaded and trained further in depth to optimize the parameters.
The beneficial effects of the present invention are as follows:
(1) Compared with conventional methods, the present invention needs no manual selection or computation of features, eliminating the error-prone feature matching process and all complex geometric operations; it is therefore direct and simple.
(2) The hierarchical deep neural network training method proposed by the present invention allows the optical flow network and the inter-frame estimation network to be trained in parallel, improving training speed.
(3) The application of the optical flow network in the present invention increases the speed of optical flow computation, improving the real-time performance of the algorithm.
(4) The present invention trains the optical flow network module and the inter-frame estimation network module separately on different input/output data, then cascades them into an end-to-end visual odometry module and trains it further in depth to optimize the parameters. This hierarchical training method greatly reduces training time and improves training efficiency.
Description of the drawings
Fig. 1 is a schematic diagram of the system structure of the present invention;
Fig. 2 is a schematic diagram of the optical flow network of the present invention, based on a convolutional neural network;
Fig. 3 is a schematic diagram of the inter-frame estimation network of the present invention.
Specific embodiment:
The invention is further described below in conjunction with the accompanying drawings and embodiments.
The end-to-end inter-frame-estimation deep neural network technique outputs inter-frame estimates directly from the original images and forms a modular visual odometry system. Compared with conventional methods, the technique requires no manual feature extraction or optical flow images, no construction of feature descriptors, no inter-frame feature matching, and no complex geometric operations.
As shown in Fig. 1, the odometry of the present invention comprises two submodules: an optical flow network module and an inter-frame estimation network module. The two modules adopt a hierarchical training method: they are trained separately on different input/output data, then cascaded into an end-to-end visual odometry module and trained further in depth to optimize the parameters. This hierarchical training method greatly reduces training time and improves training efficiency, which is one of the advantages of deep neural networks. The concrete steps are as follows:
Construction of the optical flow network: the optical flow network can be built from convolutional neural networks (CNNs) and trained on real or synthetic data. Taking consecutive adjacent frames as input, and the optical flow endpoint error (EPE) between the output optical flow vectors and the reference data as the loss function, the network is trained to generate optical flow from the input consecutive frames.
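As a concrete illustration of the loss described above, the endpoint error can be computed as the per-pixel Euclidean distance between predicted and reference flow vectors, averaged over the image. The following NumPy sketch is illustrative only; the function name and the (H, W, 2) array layout are assumptions, not part of the patent:

```python
import numpy as np

def epe_loss(flow_pred, flow_gt):
    """Mean endpoint error (EPE) between a predicted and a reference
    optical flow field. Both arrays have shape (H, W, 2), holding a
    (u, v) displacement per pixel; the endpoint error at each pixel is
    the Euclidean distance between the two flow vectors."""
    diff = flow_pred - flow_gt
    per_pixel = np.sqrt(np.sum(diff ** 2, axis=-1))
    return float(per_pixel.mean())

# Identical fields give zero loss; a uniform (3, 4) offset gives EPE 5.
gt = np.zeros((8, 8, 2))
pred = gt + np.array([3.0, 4.0])
print(epe_loss(gt, gt))    # 0.0
print(epe_loss(pred, gt))  # 5.0
```

In a real training loop this quantity would be minimized by gradient descent over the CNN parameters; the sketch only shows the loss itself.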
As shown in Fig. 2, the i-th frame and the (i+1)-th frame are each fed into a CNN to produce their respective image feature representations. The feature representations of the two frames are combined and fed into deeper CNN layers. An up-convolutional network then restores the resolution lost in the pooling operations of the CNN and outputs a dense, per-pixel global optical flow image.
Construction of the inter-frame estimation network: this network takes the optical flow image as input, builds a loss function from the distance between the six-degree-of-freedom output pose vector and the reference data, and is trained iteratively. Fig. 3 illustrates how local and global optical flow images are used to train the combined network that completes the optical-flow-based inter-frame estimation. This process can use the KITTI dataset or synthetic data to train the network, with the input optical flow computed by a traditional optical flow algorithm.
When building the inter-frame estimation module, the global optical flow image is first divided into multiple local optical flow sub-images. The global optical flow image and the local sub-images are then each fed into a CNN, yielding local and global optical flow feature representations. These local and global features are combined and fed into a fully connected layer, which outputs the inter-frame estimate as a six-degree-of-freedom pose vector.
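The division of a global flow image into local sub-images can be sketched as a simple grid split. The patent does not specify the grid size or ordering; the 2x2 non-overlapping grid and the function name below are illustrative assumptions:

```python
import numpy as np

def split_into_subimages(flow, rows=2, cols=2):
    """Split a global optical-flow image of shape (H, W, 2) into a list
    of rows*cols non-overlapping local sub-images, in row-major order.
    H and W are assumed divisible by rows and cols respectively."""
    h, w, _ = flow.shape
    sh, sw = h // rows, w // cols
    return [flow[i * sh:(i + 1) * sh, j * sw:(j + 1) * sw]
            for i in range(rows) for j in range(cols)]

flow = np.arange(4 * 6 * 2, dtype=float).reshape(4, 6, 2)
subs = split_into_subimages(flow)
print(len(subs), subs[0].shape)  # 4 (2, 3, 2)
```

Each sub-image and the full image would then be encoded by separate CNN branches before their features are concatenated for the fully connected layer.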
The training process is divided into three stages: first, the network is trained with the local optical flow sub-images as input and the inter-frame estimate as output; next, with the global optical flow image as input and the inter-frame estimate as output; finally, with the local sub-images and the global optical flow image simultaneously as input and the inter-frame estimate as output, the network is trained further.
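The three-stage curriculum above can be expressed as a simple schedule generator. This is only a scheduling skeleton, assuming a fixed number of epochs per stage; stage names, epoch counts, and the input labels are illustrative, and the actual optimizer steps are omitted:

```python
def hierarchical_schedule(epochs_per_stage=10):
    """Yield (stage_name, epoch, inputs) tuples for the three-stage
    curriculum: local sub-images first, then the global flow image,
    then both together for joint fine-tuning."""
    stages = [
        ("local",  ("sub_images",)),
        ("global", ("global_flow",)),
        ("joint",  ("sub_images", "global_flow")),
    ]
    for name, inputs in stages:
        for epoch in range(epochs_per_stage):
            yield name, epoch, inputs

schedule = list(hierarchical_schedule(epochs_per_stage=2))
print(schedule[0])   # ('local', 0, ('sub_images',))
print(len(schedule)) # 6
```

A training driver would iterate over this schedule, selecting which network inputs to feed at each step.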
Realizing the end-to-end visual odometry: the trained optical flow network and the optical-flow-based inter-frame estimation network are cascaded. With adjacent frames of the image sequence in the dataset as the input to the whole network, a loss function is constructed from the distance between the six-degree-of-freedom output vector and the reference data, and the parameters are optimized by iterative training, realizing a fast, accurate, and robust end-to-end visual odometry.
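The distance-based pose loss used in this end-to-end stage can be sketched as the Euclidean distance between two six-degree-of-freedom vectors. The (tx, ty, tz, rx, ry, rz) layout and the optional rotation weight are illustrative assumptions; the patent only specifies a distance between the output pose vector and the reference data:

```python
import numpy as np

def pose_loss(pose_pred, pose_gt, rot_weight=1.0):
    """Euclidean distance between two 6-DoF pose vectors, assumed laid
    out as (tx, ty, tz, rx, ry, rz). rot_weight optionally re-scales
    the rotation terms, whose units differ from the translation terms."""
    diff = np.asarray(pose_pred, dtype=float) - np.asarray(pose_gt, dtype=float)
    diff[3:] *= rot_weight
    return float(np.linalg.norm(diff))

gt = np.zeros(6)
pred = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(pose_loss(pred, gt))  # 1.0
print(pose_loss(gt, gt))    # 0.0
```

During end-to-end fine-tuning, this loss would be backpropagated through both the inter-frame estimation network and the optical flow network of the cascade.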
Although the specific embodiments of the present invention have been described above in conjunction with the accompanying drawings, they do not limit the scope of protection of the present invention. Those of ordinary skill in the art should understand that, on the basis of the technical scheme of the present invention, modifications or variations that can be made without creative effort still fall within the scope of protection of the present invention.

Claims (9)

1. An end-to-end visual odometry system based on deep learning, characterized in that it comprises a cascaded optical flow network and an inter-frame estimation network: given adjacent frames of an image sequence in a dataset, the optical flow network takes the optical flow endpoint error between the output optical flow vectors and the reference data as its loss function and, after network training, outputs the generated optical flow image; the inter-frame estimation network takes the optical flow image as input, builds a loss function based on the distance between the six-degree-of-freedom output pose vector and the reference data, iteratively trains the network, and performs inter-frame estimation.
2. The end-to-end visual odometry system based on deep learning of claim 1, characterized in that the optical flow network and the inter-frame estimation network are trained with a hierarchical training method.
3. The end-to-end visual odometry system based on deep learning of claim 1, characterized in that the optical flow network is a convolutional neural network.
4. The end-to-end visual odometry system based on deep learning of claim 1, characterized in that the optical flow network takes consecutive adjacent frames as input and, using the optical flow endpoint error between the output optical flow vectors and the reference data as the loss function, is trained to generate an optical flow image from the input consecutive frames.
5. The end-to-end visual odometry system based on deep learning of claim 1, characterized in that the inter-frame estimation network takes the optical flow image as input and divides training into global training on the whole optical flow image and local training on multiple optical flow sub-images; the features output by both are finally combined and fed to a fully connected layer, completing the optical-flow-based inter-frame estimation network.
6. The end-to-end visual odometry system based on deep learning of claim 1, characterized in that the inter-frame estimation network is trained on the KITTI dataset.
7. The end-to-end visual odometry system based on deep learning of claim 1, characterized in that the inter-frame estimation network is trained on synthetic data.
8. An end-to-end visual odometry estimation method based on deep learning, characterized in that: given adjacent frames of an image sequence in a dataset, the optical flow endpoint error between the output optical flow vectors and the reference data is taken as the loss function and, after network training, an optical flow image is generated; from the optical flow image, a loss function is built based on the distance between the six-degree-of-freedom output pose vector and the reference data, the network is iteratively trained, and inter-frame estimation is performed.
9. The end-to-end visual odometry estimation method based on deep learning of claim 8, characterized in that the optical flow network module and the inter-frame estimation network module are trained separately on different input/output data, then cascaded and trained further in depth to optimize the parameters.
CN201611191845.9A 2016-12-21 2016-12-21 A kind of end-to-end visual odometry and method based on deep learning Active CN106658023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611191845.9A CN106658023B (en) 2016-12-21 2016-12-21 A kind of end-to-end visual odometry and method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611191845.9A CN106658023B (en) 2016-12-21 2016-12-21 A kind of end-to-end visual odometry and method based on deep learning

Publications (2)

Publication Number Publication Date
CN106658023A true CN106658023A (en) 2017-05-10
CN106658023B CN106658023B (en) 2019-12-03

Family

ID=58833548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611191845.9A Active CN106658023B (en) 2016-12-21 2016-12-21 A kind of end-to-end visual odometry and method based on deep learning

Country Status (1)

Country Link
CN (1) CN106658023B (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107289967A (en) * 2017-08-17 2017-10-24 珠海市微半导体有限公司 Detachable optics odometer and mobile robot
CN107527358A (en) * 2017-08-23 2017-12-29 北京图森未来科技有限公司 A kind of dense optical flow method of estimation and device
CN107909602A (en) * 2017-12-08 2018-04-13 长沙全度影像科技有限公司 A kind of moving boundaries method of estimation based on deep learning
CN108122249A (en) * 2017-12-20 2018-06-05 长沙全度影像科技有限公司 A kind of light stream method of estimation based on GAN network depth learning models
CN108303094A (en) * 2018-01-31 2018-07-20 深圳市拓灵者科技有限公司 The Position Fixing Navigation System and its positioning navigation method of array are merged based on multiple vision sensor
CN108648216A (en) * 2018-04-19 2018-10-12 长沙学院 A kind of visual odometry method and system based on light stream and deep learning
CN108881952A (en) * 2018-07-02 2018-11-23 上海商汤智能科技有限公司 Video generation method and device, electronic equipment and storage medium
CN109272493A (en) * 2018-08-28 2019-01-25 中国人民解放军火箭军工程大学 A kind of monocular vision odometer method based on recursive convolution neural network
CN109656134A (en) * 2018-12-07 2019-04-19 电子科技大学 A kind of end-to-end decision-making technique of intelligent vehicle based on space-time joint recurrent neural network
CN109708658A (en) * 2019-01-14 2019-05-03 浙江大学 A kind of visual odometry method based on convolutional neural networks
CN109785376A (en) * 2017-11-15 2019-05-21 富士通株式会社 Training method, estimation of Depth equipment and the storage medium of estimation of Depth device
CN109978924A (en) * 2017-12-27 2019-07-05 长沙学院 A kind of visual odometry method and system based on monocular
CN110111366A (en) * 2019-05-06 2019-08-09 北京理工大学 A kind of end-to-end light stream estimation method based on multistage loss amount
CN110310299A (en) * 2019-07-03 2019-10-08 北京字节跳动网络技术有限公司 Method and apparatus for training light stream network and handling image
CN110335337A (en) * 2019-04-28 2019-10-15 厦门大学 A method of based on the end-to-end semi-supervised visual odometry for generating confrontation network
CN110378936A (en) * 2019-07-30 2019-10-25 北京字节跳动网络技术有限公司 Optical flow computation method, apparatus and electronic equipment
CN110599542A (en) * 2019-08-30 2019-12-20 北京影谱科技股份有限公司 Method and device for local mapping of adaptive VSLAM (virtual local area model) facing to geometric area
CN111127557A (en) * 2019-12-13 2020-05-08 中国电子科技集团公司第二十研究所 Visual SLAM front-end attitude estimation method based on deep learning
CN111164969A (en) * 2017-09-28 2020-05-15 Lg电子株式会社 Method and apparatus for transmitting or receiving 6DOF video using stitching and re-projection related metadata
CN111192312A (en) * 2019-12-04 2020-05-22 中广核工程有限公司 Depth image acquisition method, device, equipment and medium based on deep learning
CN111260680A (en) * 2020-01-13 2020-06-09 杭州电子科技大学 RGBD camera-based unsupervised pose estimation network construction method
CN111539988A (en) * 2020-04-15 2020-08-14 京东方科技集团股份有限公司 Visual odometer implementation method and device and electronic equipment
CN111627051A (en) * 2019-02-27 2020-09-04 中强光电股份有限公司 Electronic device and method for estimating optical flow
CN111833400A (en) * 2020-06-10 2020-10-27 广东工业大学 Camera position and posture positioning method
CN112344922A (en) * 2020-10-26 2021-02-09 中国科学院自动化研究所 Monocular vision odometer positioning method and system
CN112648997A (en) * 2019-10-10 2021-04-13 成都鼎桥通信技术有限公司 Method and system for positioning based on multitask network model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160267678A1 (en) * 2014-05-08 2016-09-15 The Trustees Of The University Of Pennsylvania Methods, systems, and computer readable media for visual odometry using rigid structures identified by antipodal transform
US20160339587A1 (en) * 2014-08-25 2016-11-24 Google Inc. Methods And Systems For Providing Landmarks To Facilitate Robot Localization And Visual Odometry
US20160349379A1 (en) * 2015-05-28 2016-12-01 Alberto Daniel Lacaze Inertial navigation unit enhanced with atomic clock


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ALEXEY DOSOVITSKIY 等: "《FlowNet: Learning Optical Flow with Convolutional Networks》", 《IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION》 *
GABRIELE COSTANTE 等: "《Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation》", 《IEEE ROBOTICS AND AUTOMATION LETTERS》 *
THOMAS BROX等: "《High Accuracy Optical Flow Estimation Based on a Theory for Warping》", 《IEEE CONFERENCE ON EUROPEAN CONFERENCE ON COMPUTER VISION》 *

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107289967B (en) * 2017-08-17 2023-06-09 珠海一微半导体股份有限公司 Separable optical odometer and mobile robot
CN107289967A (en) * 2017-08-17 2017-10-24 珠海市微半导体有限公司 Detachable optics odometer and mobile robot
CN107527358A (en) * 2017-08-23 2017-12-29 北京图森未来科技有限公司 A kind of dense optical flow method of estimation and device
CN111164969A (en) * 2017-09-28 2020-05-15 Lg电子株式会社 Method and apparatus for transmitting or receiving 6DOF video using stitching and re-projection related metadata
CN111164969B (en) * 2017-09-28 2021-11-02 Lg电子株式会社 Method and apparatus for transmitting or receiving 6DOF video using stitching and re-projection related metadata
CN109785376B (en) * 2017-11-15 2023-02-28 富士通株式会社 Training method of depth estimation device, depth estimation device and storage medium
CN109785376A (en) * 2017-11-15 2019-05-21 富士通株式会社 Training method, estimation of Depth equipment and the storage medium of estimation of Depth device
CN107909602A (en) * 2017-12-08 2018-04-13 长沙全度影像科技有限公司 A kind of moving boundaries method of estimation based on deep learning
CN108122249A (en) * 2017-12-20 2018-06-05 长沙全度影像科技有限公司 A kind of light stream method of estimation based on GAN network depth learning models
CN109978924A (en) * 2017-12-27 2019-07-05 长沙学院 A kind of visual odometry method and system based on monocular
CN108303094A (en) * 2018-01-31 2018-07-20 深圳市拓灵者科技有限公司 The Position Fixing Navigation System and its positioning navigation method of array are merged based on multiple vision sensor
CN108648216A (en) * 2018-04-19 2018-10-12 长沙学院 A kind of visual odometry method and system based on light stream and deep learning
CN108881952B (en) * 2018-07-02 2021-09-14 上海商汤智能科技有限公司 Video generation method and device, electronic equipment and storage medium
CN108881952A (en) * 2018-07-02 2018-11-23 上海商汤智能科技有限公司 Video generation method and device, electronic equipment and storage medium
CN109272493A (en) * 2018-08-28 2019-01-25 中国人民解放军火箭军工程大学 A kind of monocular vision odometer method based on recursive convolution neural network
CN109656134A (en) * 2018-12-07 2019-04-19 电子科技大学 A kind of end-to-end decision-making technique of intelligent vehicle based on space-time joint recurrent neural network
CN109708658A (en) * 2019-01-14 2019-05-03 浙江大学 A kind of visual odometry method based on convolutional neural networks
CN111627051A (en) * 2019-02-27 2020-09-04 中强光电股份有限公司 Electronic device and method for estimating optical flow
CN111627051B (en) * 2019-02-27 2023-12-15 中强光电股份有限公司 Electronic device and method for estimating optical flow
US11532090B2 (en) 2019-02-27 2022-12-20 Coretronic Corporation Electronic device and method for estimating optical flow
TWI725398B (en) * 2019-02-27 2021-04-21 中強光電股份有限公司 Electronic device and method for estimating optical flow
CN110335337B (en) * 2019-04-28 2021-11-05 厦门大学 Method for generating visual odometer of antagonistic network based on end-to-end semi-supervision
CN110335337A (en) * 2019-04-28 2019-10-15 厦门大学 A method of based on the end-to-end semi-supervised visual odometry for generating confrontation network
CN110111366A (en) * 2019-05-06 2019-08-09 北京理工大学 A kind of end-to-end light stream estimation method based on multistage loss amount
CN110310299A (en) * 2019-07-03 2019-10-08 北京字节跳动网络技术有限公司 Method and apparatus for training light stream network and handling image
CN110378936A (en) * 2019-07-30 2019-10-25 北京字节跳动网络技术有限公司 Optical flow computation method, apparatus and electronic equipment
CN110378936B (en) * 2019-07-30 2021-11-05 北京字节跳动网络技术有限公司 Optical flow calculation method and device and electronic equipment
CN110599542A (en) * 2019-08-30 2019-12-20 北京影谱科技股份有限公司 Method and device for local mapping of adaptive VSLAM (virtual local area model) facing to geometric area
CN112648997A (en) * 2019-10-10 2021-04-13 成都鼎桥通信技术有限公司 Method and system for positioning based on multitask network model
CN111192312A (en) * 2019-12-04 2020-05-22 中广核工程有限公司 Depth image acquisition method, device, equipment and medium based on deep learning
CN111192312B (en) * 2019-12-04 2023-12-26 中广核工程有限公司 Depth image acquisition method, device, equipment and medium based on deep learning
CN111127557A (en) * 2019-12-13 2020-05-08 中国电子科技集团公司第二十研究所 Visual SLAM front-end attitude estimation method based on deep learning
CN111260680B (en) * 2020-01-13 2023-01-03 杭州电子科技大学 RGBD camera-based unsupervised pose estimation network construction method
CN111260680A (en) * 2020-01-13 2020-06-09 杭州电子科技大学 RGBD camera-based unsupervised pose estimation network construction method
CN111539988A (en) * 2020-04-15 2020-08-14 京东方科技集团股份有限公司 Visual odometer implementation method and device and electronic equipment
CN111539988B (en) * 2020-04-15 2024-04-09 京东方科技集团股份有限公司 Visual odometer implementation method and device and electronic equipment
CN111833400A (en) * 2020-06-10 2020-10-27 广东工业大学 Camera position and posture positioning method
CN111833400B (en) * 2020-06-10 2023-07-28 广东工业大学 Camera pose positioning method
CN112344922A (en) * 2020-10-26 2021-02-09 中国科学院自动化研究所 Monocular vision odometer positioning method and system

Also Published As

Publication number Publication date
CN106658023B (en) 2019-12-03

Similar Documents

Publication Publication Date Title
CN106658023A (en) End-to-end visual odometer and method based on deep learning
Eldesokey et al. Confidence propagation through cnns for guided sparse depth regression
Costante et al. Exploring representation learning with cnns for frame-to-frame ego-motion estimation
CN109756690B (en) Light-weight video interpolation method based on feature-level optical flow
CN108876814B (en) Method for generating attitude flow image
CN106780592A (en) Kinect depth reconstruction algorithms based on camera motion and image light and shade
CN111640173A (en) Cloud rendering method and system for home-based roaming animation based on specific path
KR20030062313A (en) Image conversion and encoding techniques
CN108986166A (en) A kind of monocular vision mileage prediction technique and odometer based on semi-supervised learning
CN104299245B (en) Augmented reality tracking based on neutral net
CN108491763B (en) Unsupervised training method and device for three-dimensional scene recognition network and storage medium
CN111783582A (en) Unsupervised monocular depth estimation algorithm based on deep learning
CN106407891A (en) Target matching method based on convolutional neural network and device
CN113822284B (en) RGBD image semantic segmentation method based on boundary attention
CN110223382B (en) Single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning
CN111311664B (en) Combined unsupervised estimation method and system for depth, pose and scene flow
Mascaro et al. Diffuser: Multi-view 2d-to-3d label diffusion for semantic scene segmentation
CN110097574A (en) A kind of real-time pose estimation method of known rigid body
CN111127522A (en) Monocular camera-based depth optical flow prediction method, device, equipment and medium
CN112801064A (en) Model training method, electronic device and storage medium
KR20180086548A (en) Gesture recognition method and system for user interaction
Yue et al. Semi-supervised monocular depth estimation based on semantic supervision
CN107909602A (en) A kind of moving boundaries method of estimation based on deep learning
CN104616338A (en) Two-dimensional animation-based time-space consistent variable speed interpolation method
CN102724530B (en) Three-dimensional method for plane videos based on feedback control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant