CN111739144A - Method and device for simultaneously positioning and mapping based on depth feature optical flow - Google Patents

Method and device for simultaneously positioning and mapping based on depth feature optical flow

Info

Publication number
CN111739144A
Authority
CN
China
Prior art keywords
image
optical flow
characteristic
feature
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010565428.6A
Other languages
Chinese (zh)
Inventor
向坤
陶文源
闫野
唐荣富
陶雨薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202010565428.6A
Publication of CN111739144A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 — 3D [Three Dimensional] image rendering
    • G06T15/50 — Lighting effects
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods
    • G06N3/084 — Backpropagation, e.g. using gradient descent
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 — Image analysis
    • G06T7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85 — Stereo camera calibration
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/10 — Image acquisition modality
    • G06T2207/10004 — Still image; Photographic image
    • G06T2207/10012 — Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for simultaneous localization and mapping based on depth feature optical flow. The method comprises the following steps: estimating the feature optical flow information between two frames of images according to a feature optical flow mapping model; designing a feature-optical-flow-based visual odometer used for image feature extraction, feature tracking, pose estimation of the camera-carrying body, and recovery of three-dimensional image features; when tracking by the visual odometer fails, completing the re-matching of visual features through a relocation technique; detecting whether the body has moved to a previously visited region, thereby achieving pose optimization at a certain scale; iteratively optimizing the three-dimensional image feature information and body pose information provided by the visual odometer so as to reduce errors; and constructing a three-dimensional map of the environment using the back-end-optimized three-dimensional image feature information and body pose information. The device comprises a memory and a processor; the processor implements the method steps when executing a program stored on the memory. The present invention provides a robust and accurate solution.

Description

Method and device for simultaneously positioning and mapping based on depth feature optical flow
Technical Field
The invention relates to the field of computing vision and deep learning, in particular to a method and a device for simultaneously positioning and mapping based on a depth feature optical flow.
Background
Optical flow is essentially the change in brightness of pixel points that results when the motion of objects in a three-dimensional scene is projected onto a two-dimensional image plane. The optical flow method is an image motion analysis technique developed in the field of computer vision and is an important research topic in machine vision. Optical-flow-based motion analysis is the basis of many visual tasks.
Conventional optical flow methods mainly include the Horn & Schunck (HS) and Lucas & Kanade (LK) methods. Both are based on the same basic assumption: the brightness (gray value) of the same pixel point in two adjacent frames is unchanged. Denote by I(x, y, t) the gray value of the pixel at (x, y) at time t, and suppose that at time t + dt it has moved to (x + dx, y + dy). Since the gray value is unchanged, I(x + dx, y + dy, t + dt) = I(x, y, t).
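As a concrete illustration of the LK method in practice (not part of the claimed invention; the file names below are placeholders), the pyramidal Lucas-Kanade tracker provided by OpenCV can be invoked as follows:

import cv2
import numpy as np

# Two consecutive grayscale frames (file names are placeholders)
prev_img = cv2.imread("frame_0.png", cv2.IMREAD_GRAYSCALE)
next_img = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE)

# Detect corners in the first frame, then track them with pyramidal Lucas-Kanade
prev_pts = cv2.goodFeaturesToTrack(prev_img, maxCorners=200, qualityLevel=0.01, minDistance=10)
next_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_img, next_img, prev_pts, None)

# Keep only the points whose tracking succeeded (status == 1)
good_prev = prev_pts[status.ravel() == 1]
good_next = next_pts[status.ravel() == 1]
print("tracked", len(good_next), "points")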
Simultaneous Localization and Mapping (SLAM) is a technique in which a body carrying specific sensors starts to move without any prior information about the environment, builds a model of the environment during motion, localizes itself according to the pose estimate and the map, and incrementally builds the map on that basis. When the sensor is mainly a camera, this is called visual SLAM.
In visual SLAM, the system extracts visual feature information from two adjacent frames, completes feature tracking and matching through the feature optical flow between the images, and estimates the pose transformation of the camera through an established geometric model.
In real-world environments, when the acquired image sequence undergoes obvious illumination changes or when objects in the three-dimensional scene move drastically, conventional optical flow estimation has large errors, which affects the accuracy of visual feature tracking and matching.
Disclosure of Invention
The invention provides a method and a device for simultaneous localization and mapping based on depth feature optical flow. A deep neural network model is designed that takes adjacent frame images as input and maps them to depth feature optical flow information, providing a robust and accurate solution to the problem of simultaneous localization and mapping based on feature optical flow. The details are described below:
a method for simultaneous localization and mapping based on depth-feature optical flow, the method comprising:
estimating the feature optical flow information between two frames of images according to a feature optical flow mapping model;
designing a feature-optical-flow-based visual odometer used for image feature extraction, feature tracking, pose estimation of the camera-carrying body, and recovery of three-dimensional image features;
when tracking by the visual odometer fails, completing the re-matching of visual features through a relocation technique;
detecting whether the body has moved to a previously visited region, thereby achieving pose optimization at a certain scale;
iteratively optimizing the three-dimensional image feature information and body pose information provided by the visual odometer so as to reduce errors;
and constructing a three-dimensional map of the environment using the back-end-optimized three-dimensional image feature information and body pose information.
Wherein, the design of the feature-optical-flow-based visual odometer is specifically as follows:
S1.1, collecting a large data-set sample of image data and ground-truth feature optical flow, and dividing it into a training set and a test set;
S1.2, establishing an image pyramid {I_t^l}, in which the I_t^{l+1} layer is downscaled by a factor of 2 relative to the I_t^l layer, and performing optical flow feature extraction and iterative optimization at each pyramid level;
S1.3, performing 2× upsampling on the estimated optical flow f^{l+1} and, combining it with I_2^l, performing an image warping operation to obtain the warped image Ĩ_2^l;
S1.4, performing a correlation calculation between the obtained I_1^l and Ĩ_2^l to obtain cv^l;
S1.5, establishing an optical flow estimation local network whose inputs are the level-l pyramid image I_1^l, the image correlation value cv^l, and the upsampled optical flow up_2(f^{l+1}) of level l+1; first a 5-layer convolutional network is established whose output is a preliminary feature optical flow w^l, then a sixth convolutional layer is established to output a refined feature optical flow v^l;
S1.6, establishing an optical flow optimization local network whose inputs are the preliminary feature optical flow w^l and the refined feature optical flow v^l; first a 5-layer convolutional network is established, then a sixth convolutional layer with 2 convolution kernels is established, outputting the optimized optical flow f^l;
S1.7, performing operations S1.3-S1.6 iteratively at each pyramid level; the optical flow map output at level 0 is the obtained feature optical flow f of the two frames;
S1.8, training the feature optical flow mapping model with the training set, and performing error detection with the test set;
S1.9, for the first frame image I_1 of the test image sequence, defining the distance between feature points as d pixels and the number of extracted feature points as n;
S1.10, inputting the first and second frame images I_1 and I_2 into the optical flow network to predict the feature optical flow map f_1 between the two frames;
S1.11, the feature optical flow map f_1 is a two-channel image: the value stored at coordinate (x, y) in the first channel is the x-direction displacement u of the point of image I_1 at (x, y), and the value stored at coordinate (x, y) in the second channel is the y-direction displacement v of that point;
S1.12, analyzing the feature optical flow map f_1 and calculating the corresponding feature points p_i^2 on the second frame image I_2;
S1.13, detecting the tracked feature points p_i^2 and marking invalid points;
S1.14, for the successfully tracked point pairs (p_i^1, p_i^2), solving the corresponding spatial 3D points, which are defined as map points;
S1.15, setting the body pose of the first frame image I_1 as the reference pose T_1;
S1.16, solving the body pose T_2 at the second frame image I_2 through the correspondence between map points and image feature points;
S1.17, performing a threshold judgment on the feature point pairs (p_i^1, p_i^2); when the number of feature points is less than the threshold n, extracting Fast feature points again on the second frame image I_2 and replenishing the number of feature points to n;
S1.18, inputting the next frame image I_j; inputting images I_{j-1} and I_j into the feature optical flow mapping model to predict the feature optical flow map f_{j-1} between the two frames;
S1.19, judging whether the last frame has been reached; if not, repeating steps S1.12-S1.19.
Further, the completion of visual feature re-matching through the relocation technique is specifically:
setting the I_j frame as the starting frame and setting the body pose corresponding to the I_j frame as the reference pose T_j; on the basis of the I_j starting frame, inputting the next frame image and repeating the tracking process of the visual odometer.
Wherein, detecting whether the body has moved to a previously visited region and achieving pose optimization at a certain scale is specifically:
for any input image frame I_j and its corresponding feature points, extracting the ORB descriptor of each feature point;
setting a descriptor matching threshold; after tracking any image frame I_l, comparing the descriptors corresponding to its feature points with the descriptors of each preceding frame image I_k, computing the Hamming distance dis, and defining loop detection as successful when dis < th2, i.e. I_l and I_k correspond to the same position of the body in space; pose optimization is then performed with a nonlinear optimization algorithm.
Further, the iterative optimization of the three-dimensional image feature information and body pose information provided by the visual odometer to reduce errors is specifically:
taking the map points of the two obtained adjacent frame images I_{j-1} and I_j and the pose T_j as the optimization input;
reprojecting the map points onto image I_j to obtain the reprojected image points, and computing the error between the reprojected image points and the original feature points as the optimization quantity;
optimizing the map points and the pose T_j by minimizing the optimization quantity with a nonlinear optimization algorithm.
An apparatus for simultaneous localization and mapping based on depth feature optical flow, the apparatus comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the above method steps when executing the program.
The technical scheme provided by the invention has the beneficial effects that:
1) the specific deep neural network is used for extracting the optical flow characteristics, so that the robustness on the conditions of illumination change, unobvious texture and the like is better;
2) the optical flow information is used for constructing a simultaneous positioning and mapping system, the calculation cost is low, and the real-time performance is good.
Drawings
FIG. 1 is an architectural diagram of a method for simultaneous localization and mapping based on a characteristic optical flow;
FIG. 2 is an optical flow network model;
FIG. 3 is a schematic diagram of an Image pyramid (Image pyramid);
FIG. 4 is a flow estimator local network architecture in a network model;
FIG. 5 is a context layer local network architecture in a network model;
fig. 6 is a schematic structural diagram of a visual odometer module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
A method for simultaneous localization and mapping based on depth-feature optical flow, see fig. 1 and 2, the method comprising the steps of:
firstly, constructing a characteristic optical flow mapping model based on a deep neural network
In a first aspect of the present invention, a deep neural network-based feature optical flow mapping model is provided, which includes the following steps:
(1) collecting large samples of image data and characteristic light stream information, and dividing the large samples into a training set and a test set;
the number of the large samples is set according to the needs in practical application, which is not described in detail in the embodiments of the present invention.
(2) Designing an image pyramid network for extracting depth features between two frames of images;
(3) designing a depth convolution network to carry out iterative optimization on the extracted depth features;
(4) and training a feature network mapping model from the image to the optical flow information based on the result of the iterative optimization by combining a training set and a test set.
Estimating the characteristic optical flow information between two frames of images through a characteristic optical flow mapping model, and simultaneously positioning and mapping
In the second aspect of the invention, the characteristic optical flow information between two frames of images is estimated through the characteristic optical flow mapping model constructed in the first step, and further the positioning and mapping are carried out simultaneously, and the method comprises the following steps:
(1) developing a feature-based optical flow visual odometer module for feature extraction of images, feature tracking between the images, pose estimation of a camera body and three-dimensional feature recovery of the images;
(2) developing a repositioning module, and completing the re-matching of the visual characteristics through a repositioning technology when the tracking of the visual odometer fails;
(3) developing a closed loop detection module, detecting whether the main body moves to a region which is passed by before, and realizing pose optimization on a certain scale;
(4) a back-end optimization module is developed to perform iterative optimization on the image three-dimensional characteristic information and the main body pose information provided by the visual odometer, so that errors are reduced;
(5) and the map building module is used for building a three-dimensional map of the environment by using the image three-dimensional characteristic information subjected to the back-end optimization and the pose information of the main body.
Example 2
The scheme of example 1 is further described with reference to fig. 1 to 6 and specific calculation formulas, which are described in detail below:
in step S1, the optical flow network shown in fig. 2 is constructed, and further, the visual odometer module is constructed, the key steps are as follows:
s1.1, collecting a large data set sample of image data and a real value of a characteristic light stream, and dividing the large data set sample into a training set and a testing set;
S1.2, establishing an image pyramid (shown in FIG. 3) {I_t^l} (where t denotes the frame and l denotes the pyramid level), in which the I_t^0 layer is the original image; convolution with stride 2 is applied between adjacent levels, so that the I_t^{l+1} layer is downscaled by a factor of 2 relative to the I_t^l layer. A 7-level image pyramid is established, optical flow features are extracted and iteratively optimized at each pyramid level, and leaky ReLU is used as the activation function;
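The patent text fixes only the structure of this step (7 levels, stride-2 convolutions, leaky ReLU); the following PyTorch sketch illustrates that structure, with the channel counts chosen arbitrarily as assumptions:

import torch
import torch.nn as nn

class FeaturePyramid(nn.Module):
    """7-level pyramid: level 0 is the input image, each following level halves the resolution."""
    def __init__(self, levels=7, channels=(3, 16, 32, 64, 96, 128, 196)):  # channel counts are assumptions
        super().__init__()
        self.stages = nn.ModuleList()
        for i in range(levels - 1):
            self.stages.append(nn.Sequential(
                nn.Conv2d(channels[i], channels[i + 1], kernel_size=3, stride=2, padding=1),
                nn.LeakyReLU(0.1),
                nn.Conv2d(channels[i + 1], channels[i + 1], kernel_size=3, stride=1, padding=1),
                nn.LeakyReLU(0.1),
            ))

    def forward(self, img):
        feats = [img]                       # level 0: the original image
        for stage in self.stages:
            feats.append(stage(feats[-1]))  # each stage downscales by 2
        return feats                        # [I^0, I^1, ..., I^6]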
S1.3, for the optical flow f^{l+1} estimated at level l+1 of the image pyramid, performing 2× upsampling and then combining it with I_2^l in an image warping (warping layer) operation: Ĩ_2^l(x) = I_2^l(x + up_2(f^{l+1})(x)), where x is the coordinate of each point in the image I_2^l, to obtain Ĩ_2^l. For the top pyramid level, up_2(f^{l+1}) is 0;
S1.4, performing a correlation calculation (correlation layer) between the obtained I_1^l and Ĩ_2^l to obtain cv^l:
cv^l(x_1, x_2) = (1/N) (I_1^l(x_1))^T Ĩ_2^l(x_2),
where N is the length of the feature vector and x_2 takes values in the range x_{2x} ∈ [x_{1x} − d, x_{1x} + d], x_{2y} ∈ [x_{1y} − d, x_{1y} + d] (d is generally set to 4).
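A sketch of this correlation layer over the (2d+1)×(2d+1) search window; averaging the per-channel products (as the normalization by N) is an assumption:

import torch
import torch.nn.functional as F

def cost_volume(feat1, feat2_warped, d=4):
    """Correlation / cost volume between feat1 and the warped feat2 over a (2d+1)^2 search window.
    feat1, feat2_warped: (B, C, H, W). Returns (B, (2d+1)**2, H, W)."""
    b, c, h, w = feat1.shape
    feat2_pad = F.pad(feat2_warped, (d, d, d, d))
    cost = []
    for dy in range(2 * d + 1):
        for dx in range(2 * d + 1):
            shifted = feat2_pad[:, :, dy:dy + h, dx:dx + w]
            # normalized inner product over channels (normalization by C is an assumption)
            cost.append((feat1 * shifted).mean(dim=1, keepdim=True))
    return torch.cat(cost, dim=1)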
S1.5, establishing the optical flow estimation local network shown in FIG. 4, whose inputs are the level-l pyramid image I_1^l, the image correlation value cv^l, and the upsampled optical flow up_2(f^{l+1}) of level l+1. First, a 5-layer convolutional network is established with the numbers of convolution kernels set to (128, 128, 96, 64, 32) in turn and leaky ReLU as the activation function, outputting a preliminary feature optical flow w^l. Then a sixth convolutional layer with 2 convolution kernels is established, outputting a refined feature optical flow v^l;
S1.6, establishing the optical flow optimization local network shown in FIG. 5, whose inputs are the preliminary feature optical flow w^l and the refined feature optical flow v^l. First, a 5-layer convolutional network is established with the numbers of convolution kernels set to (128, 128, 96, 64, 32) in turn and leaky ReLU as the activation function, then a sixth convolutional layer with 2 convolution kernels is established, outputting the optimized optical flow f^l;
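A minimal sketch of the two local networks under the stated channel counts (128, 128, 96, 64, 32) and the 2-channel outputs; kernel sizes, dilations, and the residual connection in the refiner are assumptions not fixed by the text:

import torch
import torch.nn as nn

def conv(cin, cout, k=3, dilation=1):
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, padding=dilation * (k // 2), dilation=dilation),
        nn.LeakyReLU(0.1),
    )

class FlowEstimator(nn.Module):
    """5 conv layers (128,128,96,64,32) producing w^l, plus a 2-channel conv producing v^l."""
    def __init__(self, cin):
        super().__init__()
        self.body = nn.Sequential(conv(cin, 128), conv(128, 128), conv(128, 96),
                                  conv(96, 64), conv(64, 32))
        self.head = nn.Conv2d(32, 2, 3, padding=1)

    def forward(self, x):
        w = self.body(x)      # preliminary feature optical flow w^l (32 channels here)
        v = self.head(w)      # refined 2-channel feature optical flow v^l
        return w, v

class FlowRefiner(nn.Module):
    """Refinement network: 5 conv layers (128,128,96,64,32) plus a 2-channel output f^l."""
    def __init__(self, cin):
        super().__init__()
        self.body = nn.Sequential(conv(cin, 128), conv(128, 128), conv(128, 96),
                                  conv(96, 64), conv(64, 32))
        self.head = nn.Conv2d(32, 2, 3, padding=1)

    def forward(self, w, v):
        # residual refinement of v^l is an assumption
        return v + self.head(self.body(torch.cat([w, v], dim=1)))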
S1.7, performing operations S1.3-S1.6 iteratively at each pyramid level; the optical flow map output at level 0 is the obtained feature optical flow f of the two frames;
S1.8, training the feature optical flow mapping model with the training set, performing error detection with the test set after the best fitting effect is achieved, and then building the visual odometer shown in FIG. 6;
S1.9, for the first frame image I_1 of the test image sequence, invoking the OpenCV library to extract Fast[1] feature points p_i^1, limiting the distance between feature points to d pixels and the number of extracted feature points to n, so as to ensure that the feature points are sufficient in number and uniformly distributed (d is constrained by the size of the input image, and n is generally 200);
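One possible realization of this step (the greedy enforcement of the minimum distance d is an assumption, since OpenCV's FAST detector does not itself take a spacing parameter):

import cv2
import numpy as np

def detect_fast_features(img_gray, n=200, d=20):
    """Detect FAST corners, then greedily keep at most n corners that are at least d pixels apart."""
    fast = cv2.FastFeatureDetector_create(threshold=20)
    kps = fast.detect(img_gray, None)
    kps = sorted(kps, key=lambda k: k.response, reverse=True)    # strongest corners first
    kept = []
    for kp in kps:
        p = np.array(kp.pt)
        if all(np.linalg.norm(p - q) >= d for q in kept):
            kept.append(p)
        if len(kept) == n:
            break
    return np.array(kept, dtype=np.float32)                      # (m, 2) pixel coordinates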
S1.10, inputting the first frame image I_1 and the second frame image I_2 into the optical flow network to predict the feature optical flow map f_1 between the two frames;
S1.11, the feature optical flow map f_1 is a two-channel image: the value stored at coordinate (x, y) in the first channel is the x-direction displacement u of the point of image I_1 at (x, y), and the value stored at coordinate (x, y) in the second channel is the y-direction displacement v of that point;
S1.12, analyzing the feature optical flow map f_1 and calculating the corresponding feature points p_i^2 on the second frame image I_2, where p_i^2 = p_i^1 + f_1(p_i^1);
S1.13, invoking the OpenCV library random sample consensus algorithm (RANSAC)[2] to check the tracked feature points p_i^2; the algorithm marks the invalid points in the sample, i.e. the tracking-failure points, which are then removed;
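An illustrative sketch of steps S1.12-S1.13; the nearest-pixel flow lookup and the use of a RANSAC-fitted fundamental matrix as the outlier test are assumptions, since the text only states that RANSAC marks and removes tracking-failure points:

import cv2
import numpy as np

def track_with_flow(pts1, flow, img_shape):
    """pts1: (m, 2) points in frame 1; flow: (H, W, 2) with channels (u, v). Returns tracked pts2."""
    h, w = img_shape
    xs = np.clip(pts1[:, 0].round().astype(int), 0, w - 1)
    ys = np.clip(pts1[:, 1].round().astype(int), 0, h - 1)
    disp = flow[ys, xs]                  # per-point displacement (u, v)
    return pts1 + disp

def filter_with_ransac(pts1, pts2):
    """Reject tracking failures as outliers of a RANSAC-fitted fundamental matrix."""
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    if mask is None:
        return pts1, pts2
    inliers = mask.ravel() == 1
    return pts1[inliers], pts2[inliers]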
S1.14, for the successfully tracked point pairs (p_i^1, p_i^2), calling the OpenCV library triangulation algorithm[3] to obtain the corresponding spatial 3D points, which are defined as the map points {m_i | i = 1, ..., n};
S1.15, setting the body pose of the first frame image I_1 as the reference pose T_1;
S1.16, from the correspondence between map points and image feature points {m_i ↔ p_i^2}, solving the body pose T_2 at the second frame image I_2 by calling the OpenCV library PnP algorithm[4];
S1.17, performing a threshold judgment on the feature point pairs (p_i^1, p_i^2); when the number of feature points is less than the threshold n, extracting Fast feature points again on the second frame image I_2 and replenishing the number of feature points to n;
S1.18, inputting the next frame image I_j; inputting images I_{j-1} and I_j into the feature optical flow mapping model to predict the feature optical flow map f_{j-1} between the two frames;
And S1.19, judging whether the last frame is reached, ending the tracking when the last frame is reached, and otherwise, repeating the steps S1.12-S1.19.
In step S2, a relocation module based on image feature matching detection is constructed, and the key steps are as follows:
S2.1, setting a feature point matching threshold th1; in the visual odometer, when the number of feature points tracked between adjacent frame images I_{j-1} and I_j is smaller than the threshold th1, tracking is considered to have failed and relocation is performed;
S2.2, setting the I_j frame as the starting frame, and setting the body pose corresponding to the I_j frame as the reference pose T_j;
S2.3, on the basis of the I_j starting frame, inputting the next frame image and repeating the tracking process of steps S1.10-S1.19 of the visual odometer.
In step S3, a loop detection module based on an image feature matching algorithm is constructed; the key steps are as follows:
S3.1, for any input image frame I_j and its corresponding feature points p_i^j, calling the OpenCV library to extract the ORB descriptor[5] d_i^j of each feature point in image I_j;
S3.2, setting a descriptor matching degree threshold th 2;
S3.3, after tracking to any image frame I_l, comparing the descriptors d_i^l corresponding to its feature points with the descriptors of each preceding frame image I_k and computing the Hamming distance dis; loop detection is defined as successful when dis < th2, i.e. I_l and I_k correspond to the same position of the body in space, and pose optimization is performed using a nonlinear optimization algorithm[6] (Ceres).
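A sketch of steps S3.1-S3.3; the frame-level decision rule (requiring a minimum ratio of descriptors whose Hamming distance falls below th2) is an assumption, since the text only specifies the per-descriptor test dis < th2:

import cv2
import numpy as np

orb = cv2.ORB_create()
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def describe(img_gray, keypoints_xy):
    """Compute ORB descriptors at the already-tracked feature point locations."""
    kps = [cv2.KeyPoint(float(x), float(y), 31) for x, y in keypoints_xy]
    kps, desc = orb.compute(img_gray, kps)
    return desc

def loop_detected(desc_l, desc_k, th2=50, min_ratio=0.3):
    """Frames I_l and I_k are declared a loop if enough descriptors match with Hamming distance < th2."""
    if desc_l is None or desc_k is None:
        return False
    matches = bf.match(desc_l, desc_k)
    good = [m for m in matches if m.distance < th2]
    return len(good) > min_ratio * min(len(desc_l), len(desc_k))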
In step S4, a back-end optimization module based on a nonlinear optimization algorithm is constructed, which includes the following key steps:
S4.1, taking the map points {m_i | i = 1, ..., n} of the two obtained adjacent frame images I_{j-1} and I_j and the pose T_j as the optimization input;
S4.2, reprojecting the map points {m_i | i = 1, ..., n} onto image I_j to obtain the reprojected image points p̂_i^j, and computing the error e_i = p̂_i^j − p_i^j between the reprojected points and the original feature points p_i^j as the optimization quantity;
s4.3, using a non-linear optimization algorithm, by minimizing
Figure BDA0002547471950000078
Optimized map points { mi1,. n } and pose Tj
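A minimal sketch of this step, using SciPy's least_squares in place of the Ceres solver cited in the text; the pinhole projection via cv2.projectPoints and the Rodrigues-vector pose parameterization are assumptions:

import numpy as np
import cv2
from scipy.optimize import least_squares

def residuals(params, pts2d, K):
    """params = [rvec(3), tvec(3), m_1(3), ..., m_n(3)]; returns stacked reprojection errors e_i."""
    rvec, tvec = params[:3], params[3:6]
    map_points = params[6:].reshape(-1, 3)
    proj, _ = cv2.projectPoints(map_points, rvec, tvec, K, None)
    return (proj.reshape(-1, 2) - pts2d).ravel()

def optimize_frame(rvec0, tvec0, map_points0, pts2d, K):
    """Jointly refine the pose T_j and the map points by minimizing the reprojection error."""
    x0 = np.hstack([rvec0.ravel(), tvec0.ravel(), map_points0.ravel()])
    result = least_squares(residuals, x0, args=(pts2d, K))
    rvec, tvec = result.x[:3], result.x[3:6]
    return rvec, tvec, result.x[6:].reshape(-1, 3)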
In step S5, a mapping module is constructed, which includes the following key steps:
S5.1, obtaining the optimized map points {m_i | i = 1, ..., n} and pose T_j information of each image;
S5.2, calling the drawing library Pangolin[7] to draw the pose T_j with the corresponding camera model and to draw the spatial map points.
Reference to the literature
[1] E. Rosten, R. Porter and T. Drummond, "Faster and Better: A Machine Learning Approach to Corner Detection," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 105-119, Jan. 2010, doi: 10.1109/TPAMI.2008.275.
[2] Nister D. Preemptive RANSAC for Live Structure and Motion Estimation[C]//Proceedings Ninth IEEE International Conference on Computer Vision. IEEE, 2008.
[3] OpenCV Online documentation: https://docs.opencv.org/3.4/d0/dbd/group__triangulation.html
[4] OpenCV Online documentation: https://docs.opencv.org/3.4/d9/d0c/group__calib3d.html#ga549c2075fac14829ff4a58bc931c033
[5] Rublee E, Rabaud V, Konolige K, et al. ORB: An efficient alternative to SIFT or SURF[C]//International Conference on Computer Vision. IEEE, 2012.
[6] Ceres Solver: http://www.ceres-solver.org
[7] Pangolin: https://github.com/stevenlovegrove/Pangolin
In the embodiment of the present invention, except for the specific description of the model of each device, the model of other devices is not limited, as long as the device can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (6)

1. A method for simultaneous localization and mapping based on depth-feature optical flow, the method comprising:
estimating the feature optical flow information between two frames of images according to a feature optical flow mapping model;
designing a feature-optical-flow-based visual odometer used for image feature extraction, feature tracking, pose estimation of the camera-carrying body, and recovery of three-dimensional image features;
when tracking by the visual odometer fails, completing the re-matching of visual features through a relocation technique;
detecting whether the body has moved to a previously visited region, thereby achieving pose optimization at a certain scale;
iteratively optimizing the three-dimensional image feature information and body pose information provided by the visual odometer so as to reduce errors;
and constructing a three-dimensional map of the environment using the back-end-optimized three-dimensional image feature information and body pose information.
2. The method for simultaneous localization and mapping based on depth feature optical flow according to claim 1, wherein the design of the feature-optical-flow-based visual odometer is specifically:
S1.1, collecting a large data-set sample of image data and ground-truth feature optical flow, and dividing it into a training set and a test set;
S1.2, establishing an image pyramid {I_t^l}, in which the I_t^{l+1} layer is downscaled by a factor of 2 relative to the I_t^l layer, and performing optical flow feature extraction and iterative optimization at each pyramid level;
S1.3, performing 2× upsampling on the estimated optical flow f^{l+1} and, combining it with I_2^l, performing an image warping operation to obtain the warped image Ĩ_2^l;
S1.4, performing a correlation calculation between the obtained I_1^l and Ĩ_2^l to obtain cv^l;
S1.5, establishing an optical flow estimation local network whose inputs are the level-l pyramid image I_1^l, the image correlation value cv^l, and the upsampled optical flow up_2(f^{l+1}) of level l+1; first a 5-layer convolutional network is established whose output is a preliminary feature optical flow w^l, then a sixth convolutional layer is established to output a refined feature optical flow v^l;
S1.6, establishing an optical flow optimization local network whose inputs are the preliminary feature optical flow w^l and the refined feature optical flow v^l; first a 5-layer convolutional network is established, then a sixth convolutional layer with 2 convolution kernels is established, outputting the optimized optical flow f^l;
S1.7, performing operations S1.3-S1.6 iteratively at each pyramid level; the optical flow map output at level 0 is the obtained feature optical flow f of the two frames;
S1.8, training the feature optical flow mapping model with the training set, and performing error detection with the test set;
S1.9, for the first frame image I_1 of the test image sequence, defining the distance between feature points as d pixels and the number of extracted feature points as n;
S1.10, inputting the first and second frame images I_1 and I_2 into the optical flow network to predict the feature optical flow map f_1 between the two frames;
S1.11, the feature optical flow map f_1 is a two-channel image: the value stored at coordinate (x, y) in the first channel is the x-direction displacement u of the point of image I_1 at (x, y), and the value stored at coordinate (x, y) in the second channel is the y-direction displacement v of that point;
S1.12, analyzing the feature optical flow map f_1 and calculating the corresponding feature points p_i^2 on the second frame image I_2;
S1.13, detecting the tracked feature points p_i^2 and marking invalid points;
S1.14, for the successfully tracked point pairs (p_i^1, p_i^2), solving the corresponding spatial 3D points, which are defined as map points;
S1.15, setting the body pose of the first frame image I_1 as the reference pose T_1;
S1.16, solving the body pose T_2 at the second frame image I_2 through the correspondence between map points and image feature points;
S1.17, performing a threshold judgment on the feature point pairs (p_i^1, p_i^2); when the number of feature points is less than the threshold n, extracting Fast feature points again on the second frame image I_2 and replenishing the number of feature points to n;
S1.18, inputting the next frame image I_j; inputting images I_{j-1} and I_j into the feature optical flow mapping model to predict the feature optical flow map f_{j-1} between the two frames;
S1.19, judging whether the last frame has been reached; if not, repeating steps S1.12-S1.19.
3. The method for simultaneous localization and mapping based on depth feature optical flow according to claim 1, wherein the re-matching of visual features through the relocation technique is specifically:
setting the I_j frame as the starting frame and setting the body pose corresponding to the I_j frame as the reference pose T_j; on the basis of the I_j starting frame, inputting the next frame image and repeating the tracking process of the visual odometer.
4. The method for simultaneous localization and mapping based on depth feature optical flow according to claim 1, wherein detecting whether the body has moved to a previously visited region and achieving pose optimization at a certain scale is specifically:
for any input image frame I_j and its corresponding feature points, extracting the ORB descriptor of each feature point;
setting a descriptor matching threshold; after tracking any image frame I_l, comparing the descriptors corresponding to its feature points with the descriptors of each preceding frame image I_k, computing the Hamming distance dis, and defining loop detection as successful when dis < th2, i.e. I_l and I_k correspond to the same position of the body in space; pose optimization is then performed with a nonlinear optimization algorithm.
5. The method for simultaneous localization and mapping based on depth feature optical flow according to claim 1, wherein the iterative optimization of the three-dimensional image feature information and body pose information provided by the visual odometer to reduce errors is specifically:
taking the map points of the two obtained adjacent frame images I_{j-1} and I_j and the pose T_j as the optimization input;
reprojecting the map points onto image I_j to obtain the reprojected image points, and computing the error between the reprojected image points and the original feature points as the optimization quantity;
optimizing the map points and the pose T_j by minimizing the optimization quantity with a nonlinear optimization algorithm.
6. An apparatus for simultaneous localization and mapping based on depth feature optical flow, the apparatus comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the method steps of claim 1 are implemented when the processor executes the program.
CN202010565428.6A 2020-06-19 2020-06-19 Method and device for simultaneously positioning and mapping based on depth feature optical flow Pending CN111739144A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010565428.6A CN111739144A (en) 2020-06-19 2020-06-19 Method and device for simultaneously positioning and mapping based on depth feature optical flow

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010565428.6A CN111739144A (en) 2020-06-19 2020-06-19 Method and device for simultaneously positioning and mapping based on depth feature optical flow

Publications (1)

Publication Number Publication Date
CN111739144A true CN111739144A (en) 2020-10-02

Family

ID=72650338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010565428.6A Pending CN111739144A (en) 2020-06-19 2020-06-19 Method and device for simultaneously positioning and mapping based on depth feature optical flow

Country Status (1)

Country Link
CN (1) CN111739144A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132871A (en) * 2020-08-05 2020-12-25 天津(滨海)人工智能军民融合创新中心 Visual feature point tracking method and device based on feature optical flow information, storage medium and terminal
CN112967228A (en) * 2021-02-02 2021-06-15 中国科学院上海微系统与信息技术研究所 Method and device for determining target optical flow information, electronic equipment and storage medium
CN113052750A (en) * 2021-03-31 2021-06-29 广东工业大学 Accelerator and accelerator for task tracking in VSLAM system
CN113066103A (en) * 2021-03-18 2021-07-02 鹏城实验室 Camera interframe motion determining method
CN113724379A (en) * 2021-07-08 2021-11-30 中国科学院空天信息创新研究院 Three-dimensional reconstruction method, device, equipment and storage medium
US12002253B2 (en) 2021-11-29 2024-06-04 Automotive Research & Testing Center Feature point integration positioning system, feature point integration positioning method and non-transitory computer-readable memory

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025668A (en) * 2017-03-30 2017-08-08 华南理工大学 A kind of design method of the visual odometry based on depth camera
CN110108258A (en) * 2019-04-09 2019-08-09 南京航空航天大学 A kind of monocular vision odometer localization method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025668A (en) * 2017-03-30 2017-08-08 华南理工大学 A kind of design method of the visual odometry based on depth camera
CN110108258A (en) * 2019-04-09 2019-08-09 南京航空航天大学 A kind of monocular vision odometer localization method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DEQING SUN et al.: "PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume", IEEE, page 3 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132871A (en) * 2020-08-05 2020-12-25 天津(滨海)人工智能军民融合创新中心 Visual feature point tracking method and device based on feature optical flow information, storage medium and terminal
CN112132871B (en) * 2020-08-05 2022-12-06 天津(滨海)人工智能军民融合创新中心 Visual feature point tracking method and device based on feature optical flow information, storage medium and terminal
CN112967228A (en) * 2021-02-02 2021-06-15 中国科学院上海微系统与信息技术研究所 Method and device for determining target optical flow information, electronic equipment and storage medium
CN112967228B (en) * 2021-02-02 2024-04-26 中国科学院上海微系统与信息技术研究所 Determination method and device of target optical flow information, electronic equipment and storage medium
CN113066103A (en) * 2021-03-18 2021-07-02 鹏城实验室 Camera interframe motion determining method
CN113066103B (en) * 2021-03-18 2023-02-21 鹏城实验室 Camera interframe motion determining method
CN113052750A (en) * 2021-03-31 2021-06-29 广东工业大学 Accelerator and accelerator for task tracking in VSLAM system
CN113724379A (en) * 2021-07-08 2021-11-30 中国科学院空天信息创新研究院 Three-dimensional reconstruction method, device, equipment and storage medium
CN113724379B (en) * 2021-07-08 2022-06-17 中国科学院空天信息创新研究院 Three-dimensional reconstruction method and device for fusing image and laser point cloud
US12002253B2 (en) 2021-11-29 2024-06-04 Automotive Research & Testing Center Feature point integration positioning system, feature point integration positioning method and non-transitory computer-readable memory

Similar Documents

Publication Publication Date Title
CN108764048B (en) Face key point detection method and device
US11762475B2 (en) AR scenario-based gesture interaction method, storage medium, and communication terminal
US11030525B2 (en) Systems and methods for deep localization and segmentation with a 3D semantic map
CN111739144A (en) Method and device for simultaneously positioning and mapping based on depth feature optical flow
CN110108258B (en) Monocular vision odometer positioning method
US9418480B2 (en) Systems and methods for 3D pose estimation
US20210350560A1 (en) Depth estimation
Peng et al. Model and context‐driven building extraction in dense urban aerial images
CN103854283A (en) Mobile augmented reality tracking registration method based on online study
Taketomi et al. Real-time and accurate extrinsic camera parameter estimation using feature landmark database for augmented reality
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
CN113807361B (en) Neural network, target detection method, neural network training method and related products
CN111914756A (en) Video data processing method and device
CN116524062B (en) Diffusion model-based 2D human body posture estimation method
CN110910375A (en) Detection model training method, device, equipment and medium based on semi-supervised learning
CN110827320A (en) Target tracking method and device based on time sequence prediction
Gao et al. A general deep learning based framework for 3D reconstruction from multi-view stereo satellite images
CN114494150A (en) Design method of monocular vision odometer based on semi-direct method
Liu et al. D-vpnet: A network for real-time dominant vanishing point detection in natural scenes
Chen et al. I2D-Loc: Camera localization via image to lidar depth flow
Xu et al. 6d-diff: A keypoint diffusion framework for 6d object pose estimation
CN114862866A (en) Calibration plate detection method and device, computer equipment and storage medium
CN114663917A (en) Multi-view-angle-based multi-person three-dimensional human body pose estimation method and device
CN114639013A (en) Remote sensing image airplane target detection and identification method based on improved Orient RCNN model
Kourbane et al. Skeleton-aware multi-scale heatmap regression for 2D hand pose estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination