CN106658023B - End-to-end visual odometer and method based on deep learning - Google Patents

End-to-end visual odometer and method based on deep learning Download PDF

Info

Publication number
CN106658023B
CN106658023B (application CN201611191845.9A)
Authority
CN
China
Prior art keywords
network
optical flow
inter-frame
training
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611191845.9A
Other languages
Chinese (zh)
Other versions
CN106658023A (en)
Inventor
刘国良
罗勇
田国会
赵洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN201611191845.9A priority Critical patent/CN106658023B/en
Publication of CN106658023A publication Critical patent/CN106658023A/en
Application granted
Publication of CN106658023B publication Critical patent/CN106658023B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C22/00Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers

Abstract

The invention discloses an end-to-end visual odometer and method based on deep learning, comprising a cascaded optical flow network and an inter-frame estimation network. The optical flow network takes consecutive frames of an image sequence in a data set, uses the optical flow endpoint error between the output optical flow vectors and the ground truth as its loss function, and after network training outputs the generated optical flow. The inter-frame estimation network takes the optical flow image as input, constructs its loss function from the distance between the six-degree-of-freedom output pose vector and the ground truth, and is trained iteratively to perform inter-frame estimation. In the present invention, the optical flow network module and the inter-frame estimation network module are trained separately on different input-output data; the two are then cascaded into an end-to-end visual odometer module for further deep training and parameter optimization. This hierarchical training method can greatly reduce training time and improve training efficiency.

Description

End-to-end visual odometer and method based on deep learning
Technical field
The present invention relates to an end-to-end visual odometer and method based on deep learning.
Background technique
Visual odometry is a method by which a robot estimates its displacement using visual sensors; it is a fundamental technology for higher-level tasks such as robot localization, map building, obstacle avoidance, and path planning.
Traditional visual odometry mainly estimates the robot's inter-frame pose from the spatial-geometric relationships of visual features between frames, and is therefore also called inter-frame estimation. Features fall into two classes, sparse and dense, corresponding respectively to local and global representations of image information. Traditional features must be chosen or computed by hand, so the resulting image representation carries a degree of subjectivity and limitation; moreover, the accuracy of feature matching is severely constrained under illumination changes, motion blur, and texture-poor scenes, which degrades estimation accuracy.
Summary of the invention
To solve the above problems, the present invention proposes an end-to-end visual odometer and method based on deep learning. The invention uses an end-to-end inter-frame-estimation deep neural network to produce inter-frame estimates directly from raw images. Compared with conventional methods, this technique requires no manual feature extraction or optical flow images, no construction of feature descriptors, no inter-frame feature matching, and no complex geometric computation.
To achieve the goals above, the present invention adopts the following technical scheme:
An end-to-end visual odometer based on deep learning comprises a cascaded optical flow network and an inter-frame estimation network. The optical flow network takes consecutive frames of an image sequence in a data set, uses the optical flow endpoint error between the output optical flow vectors and the ground truth as its loss function, and after network training outputs the generated optical flow image. The inter-frame estimation network takes the optical flow image as input, constructs its loss function from the distance between the six-degree-of-freedom output pose vector and the ground truth, and is trained iteratively to perform inter-frame estimation.
The optical flow network and the inter-frame estimation network are trained with a hierarchical training method.
The optical flow network is a convolutional neural network.
The optical flow network takes consecutive-frame images as input, uses the optical flow endpoint error between the output optical flow vectors and the ground truth as its loss function, and is trained to generate an optical flow image from the input consecutive frames.
The inter-frame estimation network takes the optical flow image as input; training on the whole optical flow image is divided into training on the global optical flow map and training on multiple local sub-flow images; the features output by both are finally combined and fed to a fully connected layer, completing the optical-flow-based inter-frame estimation network.
The inter-frame estimation network is trained on the KITTI data set.
The inter-frame estimation network is trained on synthetic data.
An end-to-end visual odometry estimation method based on deep learning: from consecutive frames of an image sequence in a data set, the optical flow endpoint error between the output optical flow vectors and the ground truth is used as the loss function; after network training, an optical flow image is generated; from the optical flow image, a loss function is constructed from the distance between the six-degree-of-freedom output pose vector and the ground truth, and the network is trained iteratively to perform inter-frame estimation.
The optical flow network module and the inter-frame estimation network module are trained separately on different input-output data; the two are finally cascaded for further deep training and parameter optimization.
The beneficial effects of the invention are:
(1) Compared with conventional methods, the present invention requires no manual selection or computation of features, eliminating the error-prone feature-matching process as well as complex geometric computation, making it intuitive and simple;
(2) The hierarchical deep-neural-network training method proposed by the present invention allows the optical flow network and the inter-frame estimation network to be trained in parallel, improving training speed;
(3) The application of the optical flow network in the present invention speeds up optical flow computation, improving the real-time performance of the algorithm;
(4) The present invention trains the optical flow network module and the inter-frame estimation network module separately on different input-output data, then cascades the two into an end-to-end visual odometer module for further deep training and parameter optimization. This hierarchical training method can greatly reduce training time and improve training efficiency.
Detailed description of the invention
Fig. 1 is a schematic diagram of the system structure of the invention;
Fig. 2 is a schematic diagram of the optical flow network of the invention, based on convolutional neural networks;
Fig. 3 is a schematic diagram of the inter-frame estimation network of the invention.
Specific embodiments:
The invention is further described below with reference to the accompanying drawings and embodiments.
An end-to-end inter-frame-estimation deep neural network produces inter-frame estimates directly from raw images, forming a modular visual odometer. Compared with conventional methods, this technique requires no manual feature extraction or optical flow images, no construction of feature descriptors, no inter-frame feature matching, and no complex geometric computation.
As shown in Fig. 1, the odometer of the invention comprises two sub-modules: an optical flow network module and an inter-frame estimation network module. The two modules use a hierarchical training method: each is trained separately on different input-output data, and the two are then cascaded into an end-to-end visual odometer module for further deep training and parameter optimization. This hierarchical training method can greatly reduce training time and improve training efficiency, one of the advantages of deep neural networks. The specific steps are as follows:
Construction of the optical flow network: the optical flow network may consist of convolutional neural networks (CNNs) and is trained on real or synthetic data. It takes consecutive-frame images as input and uses the optical flow endpoint error (EPE) between the output optical flow vectors and the ground truth as the loss function, training the network to map input consecutive frames to generated optical flow.
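The endpoint error (EPE) named above has a standard definition: the Euclidean distance between the predicted and reference flow vector at each pixel, averaged over the image. A minimal NumPy sketch of that loss — the function name `epe_loss` and the array shapes are our own illustration, not the patent's notation:

```python
import numpy as np

def epe_loss(flow_pred, flow_gt):
    """Average endpoint error between two optical flow fields.

    flow_pred, flow_gt: arrays of shape (H, W, 2) holding per-pixel
    (u, v) displacement vectors.
    """
    diff = flow_pred - flow_gt
    # Euclidean distance between predicted and reference vector, per pixel
    per_pixel = np.sqrt((diff ** 2).sum(axis=-1))
    return float(per_pixel.mean())

# Identical flows give zero error; a uniform 1-pixel horizontal
# offset gives an EPE of exactly 1.
gt = np.zeros((4, 4, 2))
pred = gt.copy()
pred[..., 0] += 1.0
print(epe_loss(gt, gt))    # 0.0
print(epe_loss(pred, gt))  # 1.0
```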
As shown in Fig. 2, frame i and frame i+1 are fed into CNNs separately, producing their respective image feature representations; the feature representations of the two frames are combined and fed into a deeper CNN; up-convolution layers then restore the resolution lost to the CNN's pooling operations, and a dense, per-pixel global optical flow map is output.
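The patent does not give layer counts or kernel sizes, so the following sketch only illustrates the resolution bookkeeping of such an encoder-decoder: strided convolutions shrink the feature maps, and stride-2 up-convolutions restore them for the dense per-pixel output. All concrete values (384-pixel side, 3x3 convolutions, 4x4 up-convolutions) are illustrative assumptions:

```python
def conv_out(size, kernel=3, stride=2, pad=1):
    """Spatial size after a strided convolution (integer floor)."""
    return (size + 2 * pad - kernel) // stride + 1

def upconv_out(size, kernel=4, stride=2, pad=1):
    """Spatial size after a stride-2 up-convolution (transposed conv)."""
    return (size - 1) * stride - 2 * pad + kernel

# Three stride-2 convolutions shrink a 384-pixel side to 48;
# three up-convolutions restore the full resolution for dense flow.
s = 384
for _ in range(3):
    s = conv_out(s)
print(s)  # 48
for _ in range(3):
    s = upconv_out(s)
print(s)  # 384
```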
Construction of the inter-frame estimation network: this network takes the optical flow image as input, constructs its loss function from the distance between the six-degree-of-freedom output pose vector and the ground truth, and is trained iteratively. Fig. 3 illustrates how local optical flow images and the global optical flow image are trained separately and their networks combined to complete optical-flow-based inter-frame estimation. The KITTI data set or synthetic data may be used to train this network, with the input optical flow computed by a traditional optical flow algorithm.
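The patent specifies only "the distance between the six-degree-of-freedom output pose vector and the ground truth" as this network's loss; the plain Euclidean (L2) distance below is one natural reading of that. A NumPy sketch, with the function name and the (tx, ty, tz, rx, ry, rz) vector layout as our own assumptions:

```python
import numpy as np

def pose_distance_loss(pose_pred, pose_gt):
    """Euclidean distance between a predicted and a reference
    six-DoF pose vector (tx, ty, tz, rx, ry, rz).

    Plain L2 distance is an assumption: the patent only states that
    the loss is the distance between pose vector and ground truth.
    """
    pose_pred = np.asarray(pose_pred, dtype=float)
    pose_gt = np.asarray(pose_gt, dtype=float)
    return float(np.linalg.norm(pose_pred - pose_gt))

gt = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
pred = [3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
print(pose_distance_loss(pred, gt))  # 5.0
```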
When building the inter-frame estimation module, the global optical flow map is first divided into multiple local flow sub-images; the global map and the local sub-images are then fed into CNNs separately, yielding local and global optical flow feature representations. The local and global features are combined and fed to a fully connected layer, which outputs the inter-frame estimate expressed as a six-degree-of-freedom pose vector.
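The division of the global flow map into local sub-images is simple array slicing. The patent does not state how many sub-images are used; the 2x2 grid below is an illustrative assumption:

```python
import numpy as np

def split_flow_map(flow, grid=2):
    """Split an (H, W, 2) global flow map into grid*grid local sub-maps.

    grid=2 (four sub-images) is an assumption for illustration only.
    """
    H, W, _ = flow.shape
    h, w = H // grid, W // grid
    return [flow[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(grid) for c in range(grid)]

flow = np.zeros((8, 8, 2))
subs = split_flow_map(flow)
print(len(subs), subs[0].shape)  # 4 (4, 4, 2)
```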
The training process can be divided into three stages: first, the local flow sub-images are used as input and the inter-frame estimate as output to train the network; second, the global optical flow map is used as input and the inter-frame estimate as output to train the network; finally, the local sub-images and the global map are used together as input, with the inter-frame estimate as output, to train the network further.
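The three stages above amount to a staged schedule over which inputs are active. The patent does not describe the trainer itself, so in this sketch the training step is a stub and the stage names are our own:

```python
# Stage schedule: which flow inputs feed the network at each stage.
STAGES = [
    ("local",  ["local_subimages"]),                  # stage 1: sub-images only
    ("global", ["global_flow"]),                      # stage 2: global map only
    ("joint",  ["local_subimages", "global_flow"]),   # stage 3: both together
]

def run_schedule(train_step):
    """Run the three training stages in order, delegating to train_step."""
    for name, inputs in STAGES:
        train_step(name, inputs)

# Stub train_step that just records what it was asked to train on.
log = []
run_schedule(lambda name, inputs: log.append((name, tuple(inputs))))
print(log)
```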
Realizing the end-to-end visual odometer: the trained optical flow network and the optical-flow-based inter-frame estimation network are cascaded; consecutive frames of the image sequence in the data set serve as the input of the whole network; a loss function is constructed from the distance between the six-degree-of-freedom output vector and the ground truth; and iterative training optimizes the parameters, realizing a fast, accurate, and robust end-to-end visual odometer.
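Functionally, the cascade is just composition: the optical flow network maps a frame pair to a flow image, and the inter-frame network maps that flow image to a six-DoF pose. A sketch with toy stand-ins for the trained CNN modules (the stub "networks" and all names here are our own, not the patent's):

```python
def cascade(flow_net, pose_net):
    """Compose a flow network and a pose network into one odometer."""
    def odometer(frame_a, frame_b):
        flow = flow_net(frame_a, frame_b)   # frame pair -> flow image
        return pose_net(flow)               # flow image -> 6-DoF pose
    return odometer

# Toy stand-ins for the trained modules, operating on 1-D "frames".
flow_net = lambda a, b: [bi - ai for ai, bi in zip(a, b)]    # "flow" = difference
pose_net = lambda flow: [sum(flow) / len(flow)] + [0.0] * 5  # crude pose guess

vo = cascade(flow_net, pose_net)
print(vo([1.0, 2.0], [2.0, 4.0]))  # [1.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```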
Although the specific embodiments of the invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the invention. Those skilled in the art should understand that, on the basis of the technical solutions of the invention, various modifications or variations that can be made without creative effort still fall within the scope of protection of the invention.

Claims (7)

1. An end-to-end visual odometer based on deep learning, characterized by comprising a cascaded optical flow network and an inter-frame estimation network, wherein the optical flow network takes consecutive-frame images of an image sequence in a data set as input, uses the optical flow endpoint error between the output optical flow vectors and the ground truth as its loss function, and after network training outputs the generated optical flow image; the inter-frame estimation network takes the optical flow image as input, constructs a loss function from the distance between the six-degree-of-freedom output pose vector and the ground truth, and is trained iteratively to perform inter-frame estimation;
after the optical flow network and the inter-frame estimation network are cascaded, deeper training further optimizes the parameters.
2. The end-to-end visual odometer based on deep learning according to claim 1, characterized in that the optical flow network and the inter-frame estimation network are trained with a hierarchical training method.
3. The end-to-end visual odometer based on deep learning according to claim 1, characterized in that the optical flow network is a convolutional neural network.
4. The end-to-end visual odometer based on deep learning according to claim 1, characterized in that:
the inter-frame estimation network takes the optical flow image as input; training on the whole optical flow image is divided into training on the global optical flow map and training on multiple local sub-flow images; the features output by both are finally combined and fed to a fully connected layer, completing the optical-flow-based inter-frame estimation network.
5. The end-to-end visual odometer based on deep learning according to claim 1, characterized in that the inter-frame estimation network is trained on the KITTI data set.
6. The end-to-end visual odometer based on deep learning according to claim 1, characterized in that the inter-frame estimation network is trained on synthetic data.
7. An end-to-end visual odometry estimation method based on deep learning, characterized in that: consecutive-frame images of an image sequence in a data set are taken as input; the optical flow endpoint error between the output optical flow vectors and the ground truth is used as the loss function; after network training, an optical flow image is generated; from the optical flow image, a loss function is constructed from the distance between the six-degree-of-freedom output pose vector and the ground truth, and the network is trained iteratively to perform inter-frame estimation;
the optical flow network module and the inter-frame estimation network module are trained separately on different input-output data, and the two are finally cascaded for further deep training and parameter optimization.
CN201611191845.9A 2016-12-21 2016-12-21 End-to-end visual odometer and method based on deep learning Active CN106658023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611191845.9A CN106658023B (en) 2016-12-21 2016-12-21 End-to-end visual odometer and method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611191845.9A CN106658023B (en) 2016-12-21 2016-12-21 End-to-end visual odometer and method based on deep learning

Publications (2)

Publication Number Publication Date
CN106658023A CN106658023A (en) 2017-05-10
CN106658023B true CN106658023B (en) 2019-12-03

Family

ID=58833548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611191845.9A Active CN106658023B (en) 2016-12-21 2016-12-21 End-to-end visual odometer and method based on deep learning

Country Status (1)

Country Link
CN (1) CN106658023B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11856181B2 (en) 2017-09-28 2023-12-26 Lg Electronics Inc. Method and device for transmitting or receiving 6DoF video using stitching and re-projection related metadata

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107289967B (en) * 2017-08-17 2023-06-09 珠海一微半导体股份有限公司 Separable optical odometer and mobile robot
CN107527358B (en) * 2017-08-23 2020-05-12 北京图森智途科技有限公司 Dense optical flow estimation method and device
CN109785376B (en) * 2017-11-15 2023-02-28 富士通株式会社 Training method of depth estimation device, depth estimation device and storage medium
CN107909602A (en) * 2017-12-08 2018-04-13 长沙全度影像科技有限公司 Moving boundary estimation method based on deep learning
CN108122249A (en) * 2017-12-20 2018-06-05 长沙全度影像科技有限公司 Optical flow estimation method based on a GAN deep learning model
CN109978924A (en) * 2017-12-27 2019-07-05 长沙学院 Monocular-based visual odometry method and system
CN108303094A (en) * 2018-01-31 2018-07-20 深圳市拓灵者科技有限公司 Positioning and navigation system based on multi-vision-sensor fusion array, and positioning and navigation method thereof
CN108648216B (en) * 2018-04-19 2020-10-09 长沙学院 Visual odometer implementation method and system based on optical flow and deep learning
CN108881952B (en) * 2018-07-02 2021-09-14 上海商汤智能科技有限公司 Video generation method and device, electronic equipment and storage medium
CN109272493A (en) * 2018-08-28 2019-01-25 中国人民解放军火箭军工程大学 Monocular visual odometry method based on a recurrent convolutional neural network
CN109656134A (en) * 2018-12-07 2019-04-19 电子科技大学 End-to-end decision-making method for intelligent vehicles based on a spatio-temporal joint recurrent neural network
CN109708658B (en) * 2019-01-14 2020-11-24 浙江大学 Visual odometer method based on convolutional neural network
CN111627051B (en) 2019-02-27 2023-12-15 中强光电股份有限公司 Electronic device and method for estimating optical flow
CN110335337B (en) * 2019-04-28 2021-11-05 厦门大学 Method for generating a visual odometer based on an end-to-end semi-supervised generative adversarial network
CN110111366B (en) * 2019-05-06 2021-04-30 北京理工大学 End-to-end optical flow estimation method based on multistage loss
CN110310299B (en) * 2019-07-03 2021-11-19 北京字节跳动网络技术有限公司 Method and apparatus for training optical flow network, and method and apparatus for processing image
CN110378936B (en) * 2019-07-30 2021-11-05 北京字节跳动网络技术有限公司 Optical flow calculation method and device and electronic equipment
CN110599542A (en) * 2019-08-30 2019-12-20 北京影谱科技股份有限公司 Method and device for geometric-region-oriented adaptive VSLAM local mapping
CN112648997A (en) * 2019-10-10 2021-04-13 成都鼎桥通信技术有限公司 Method and system for positioning based on multitask network model
CN111192312B (en) * 2019-12-04 2023-12-26 中广核工程有限公司 Depth image acquisition method, device, equipment and medium based on deep learning
CN111127557B (en) * 2019-12-13 2022-12-13 中国电子科技集团公司第二十研究所 Visual SLAM front-end attitude estimation method based on deep learning
CN111260680B (en) * 2020-01-13 2023-01-03 杭州电子科技大学 RGBD camera-based unsupervised pose estimation network construction method
CN111539988B (en) * 2020-04-15 2024-04-09 京东方科技集团股份有限公司 Visual odometer implementation method and device and electronic equipment
CN111833400B (en) * 2020-06-10 2023-07-28 广东工业大学 Camera pose positioning method
CN112344922B (en) * 2020-10-26 2022-10-21 中国科学院自动化研究所 Monocular vision odometer positioning method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9761008B2 (en) * 2014-05-08 2017-09-12 The Trustees Of The University Of Pennsylvania Methods, systems, and computer readable media for visual odometry using rigid structures identified by antipodal transform
US9427874B1 (en) * 2014-08-25 2016-08-30 Google Inc. Methods and systems for providing landmarks to facilitate robot localization and visual odometry
US20160349379A1 (en) * 2015-05-28 2016-12-01 Alberto Daniel Lacaze Inertial navigation unit enhaced with atomic clock

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Costante, Gabriele, et al., "Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation", IEEE Robotics and Automation Letters, 2016-01-31; main text section 3, pp. 20-22, figs. 3-4 *
Dosovitskiy, Alexey, et al., "FlowNet: Learning Optical Flow with Convolutional Networks", IEEE International Conference on Computer Vision, 2015-12-31; abstract, figs. 1-3, paragraphs 2579-2764 *
Brox, Thomas, et al., "High Accuracy Optical Flow Estimation Based on a Theory for Warping", European Conference on Computer Vision, 2004-05-31; entire document *


Also Published As

Publication number Publication date
CN106658023A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
CN106658023B (en) End-to-end visual odometer and method based on deep learning
CN101625768B (en) Three-dimensional human face reconstruction method based on stereoscopic vision
CN106600583B (en) Parallax image acquisition method based on an end-to-end neural network
CN104661010B (en) Method and device for establishing three-dimensional model
CN103003846B (en) Joint region display device, joint region detection device, joint region membership degree calculation device, joint-like region membership degree calculation device, and joint region display method
CN104408760B (en) High-precision virtual assembly system algorithm based on binocular vision
CN111340868B (en) Unmanned underwater vehicle autonomous decision control method based on visual depth estimation
CN105144196A (en) Method and device for calculating a camera or object pose
CN108986166A (en) Monocular visual odometry prediction method and odometer based on semi-supervised learning
CN106780592A (en) Kinect depth reconstruction algorithm based on camera motion and image shading
CN106296812A (en) Simultaneous localization and mapping method
CN105225269A (en) Motion-based object modeling system
CN106780543A (en) Dual-frame depth and motion estimation method based on convolutional neural networks
CN103413352A (en) Scene three-dimensional reconstruction method based on RGBD multi-sensor fusion
CN101976455A (en) Color image three-dimensional reconstruction method based on stereo matching
CN109272493A (en) Monocular visual odometry method based on a recurrent convolutional neural network
CN106780631A (en) Robot loop-closure detection method based on deep learning
CN104123747A (en) Method and system for multimode touch three-dimensional modeling
Aliakbarian et al. Flag: Flow-based 3d avatar generation from sparse observations
CN109708654A (en) Path planning method and path planning system
CN110264526A (en) Scene depth and camera pose solving method based on deep learning
CN104966320A (en) Method for automatically generating camouflage patterns based on third-order Bézier curves
Liu et al. Atvio: Attention guided visual-inertial odometry
Liao et al. Maptrv2: An end-to-end framework for online vectorized hd map construction
Carvalho et al. Long-term prediction of motion trajectories using path homology clusters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant