CN108053445A - Feature-fused RGB-D camera motion estimation method - Google Patents
Feature-fused RGB-D camera motion estimation method
- Publication number
- CN108053445A CN108053445A CN201711297500.6A CN201711297500A CN108053445A CN 108053445 A CN108053445 A CN 108053445A CN 201711297500 A CN201711297500 A CN 201711297500A CN 108053445 A CN108053445 A CN 108053445A
- Authority
- CN
- China
- Prior art date: 2017-12-08
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; G06T7/00—Image analysis
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/10—Segmentation; Edge detection
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
Abstract
The invention discloses a feature-fused RGB-D camera motion estimation method. The method first extracts two-dimensional point and line features from the RGB image, and back-projects the two-dimensional features to three dimensions using the depth information in the D image. The error of each three-dimensional point is then constructed from the RGB measurement error and the depth measurement error, and the uncertainty of each line is measured by calculating the Mahalanobis distance between the three-dimensional projections of the two-dimensional line sample points and the estimated three-dimensional line. Finally, the three-dimensional point and three-dimensional line feature matches of two adjacent frames are fused, and the motion of the RGB-D camera is computed by maximum likelihood estimation using the uncertainty information. By fusing line features, which are insensitive to illumination changes, and constructing a reasonable error model of the system, the invention improves the robustness and accuracy of camera motion estimation.
Description
Technical Field
The invention belongs to the technical field of machine vision, and particularly relates to a feature fusion RGB-D camera motion estimation method.
Background
In recent years, with the rapid development of image processing technology and the emergence of various vision sensors, vision-based mobile robots have received increasing attention. Compared with lidar, millimeter-wave radar and similar sensors, vision sensors can acquire richer environmental information while also reducing cost. Visual odometry (VO) estimates the motion of a camera, or of a body rigidly attached to it (e.g., an automobile, a human or a mobile robot), using only visual sensors. It is a sub-problem of visual simultaneous localization and mapping (VSLAM) and a core problem in realizing autonomous navigation of mobile robots. Commonly used visual sensors include monocular cameras, binocular cameras, panoramic cameras and RGB-D cameras. RGB-D cameras, such as the Microsoft Kinect, PrimeSense sensors and the ASUS Xtion PRO Live, among others, have attracted considerable interest not only because of their light weight and low cost, but more importantly because such cameras provide color and depth information simultaneously. The depth information resolves the scale ambiguity of monocular cameras, and because an RGB-D camera measures depth physically, it avoids the heavy depth-computation process required by a binocular camera.
At present, mainstream camera motion estimation methods are roughly classified into feature-point methods (based on SIFT, SURF, ORB, etc.) and direct methods. Direct methods are sensitive to illumination changes, while feature-point methods are easily affected by an uneven distribution of features; in both cases the robustness of the algorithm is poor and application requirements cannot be met.
Disclosure of Invention
The invention aims to solve the problems that existing camera motion estimation methods are easily influenced by illumination and suffer from large noise caused by uneven feature distribution. A robust RGB-D camera motion estimation method based on point and line feature fusion is provided.
In order to achieve the technical purpose, the invention adopts the technical scheme that the RGB-D camera motion estimation method with feature fusion comprises the following steps:
step one, extracting two-dimensional point and line features: on an RGB image, extracting two-dimensional feature points and two-dimensional feature lines using a feature point detection algorithm and a line segment detection algorithm respectively;
step two, performing feature back projection to the three dimensions and performing uncertainty analysis: combining depth information, utilizing a pinhole camera model to perform back projection on the extracted two-dimensional feature points and two-dimensional feature straight lines to three dimensions, and performing uncertainty analysis on the three-dimensional points and the three-dimensional straight lines under the assumption of Gaussian noise distribution;
step three, feature matching: for the point features, calculating feature point detection algorithm descriptors and matching the point features of two consecutive frames; calculating and matching the mean-standard deviation line descriptors (MSLD) of the line features; then removing mismatches using a random sample consensus algorithm;
step four, estimating the motion of the camera: carrying out maximum likelihood estimation of the camera motion using the uncertainty information, and solving the objective function of the problem with the Levenberg-Marquardt algorithm to obtain the pose of the camera.
Step one of the feature-fused RGB-D camera motion estimation method comprises the following steps:
for one RGB image, two-dimensional feature points are obtained through a feature point detection algorithm and two-dimensional feature lines are obtained through a line segment detection algorithm, giving a feature set $\{p_i, l_j \mid i = 1, 2, \ldots,\; j = 1, 2, \ldots\}$, where $p_i = [u_i, v_i]^T$ is a two-dimensional point with pixel coordinates $[u_i, v_i]^T$, and $a_j$ and $b_j$ are the two end points of the two-dimensional line $l_j$.
The second step of the feature-fused RGB-D camera motion estimation method comprises the following steps:
step 1, three-dimensional point characteristic and uncertainty analysis:
The two-dimensional point $p$ in the image is back-projected to a three-dimensional point $P$ through the pinhole camera model:

$$P = \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} (u - c_u)\, d / f_c \\ (v - c_v)\, d / f_c \\ d \end{bmatrix} \qquad (1)$$

where $(u, v)$ are the pixel coordinates of the two-dimensional point $p$, $d$ is the depth value corresponding to $p$, $[c_u, c_v]^T$ is the optical center of the camera, and $f_c$ is the focal length;
the noise at the two-dimensional point $p$ is assumed to be zero-mean Gaussian with covariance $\Sigma_p = \delta I_2$, where $I_2$ is the 2-dimensional identity matrix and $\delta$ is the noise variance; the noise of the depth value $d$ of the two-dimensional point $p$ is a quadratic function of the measured value, i.e. $\delta_d = c_1 d^2 + c_2 d + c_3$, with constant coefficients $c_1$, $c_2$ and $c_3$ obtained through experiments; the uncertainty of the three-dimensional point is then

$$\Sigma_P = J_P \,\mathrm{cov}\!\left(\begin{bmatrix} p \\ d \end{bmatrix}\right) J_P^T \qquad (2)$$

where $\mathrm{cov}([p^T, d]^T)$ is the covariance matrix of the measurement noise and $J_P$ is the Jacobian of equation (1) with respect to $(u, v, d)$;
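For reference, the Jacobian $J_P$ in equation (2) can be written out explicitly by differentiating equation (1) with respect to $(u, v, d)$; this expanded form is a derivation sketch and is not reproduced in the original text:

$$J_P = \frac{\partial P}{\partial (u, v, d)} = \begin{bmatrix} d/f_c & 0 & (u - c_u)/f_c \\ 0 & d/f_c & (v - c_v)/f_c \\ 0 & 0 & 1 \end{bmatrix}$$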
step 2, three-dimensional straight line characteristic and uncertainty analysis:
Enough points are sampled on the two-dimensional line, points with abnormal depth values are discarded, and the three-dimensional coordinates of the remaining points are calculated according to equation (1). A random sample consensus algorithm is then used to remove the outliers that appear, due to depth noise, after the two-dimensional line is back-projected to three dimensions, and a fitted three-dimensional line equation is obtained. The corresponding three-dimensional line is represented by $L = [A^T, B^T]^T$, where $A$ and $B$ are two three-dimensional points on the line;
the uncertainty of the three-dimensional line is measured by calculating the Mahalanobis distance between the three-dimensional points corresponding to the two-dimensional sample points and the estimated three-dimensional line $L$;
The Mahalanobis distance from a three-dimensional point $P = [x, y, z]^T$ to a three-dimensional line $L = [A^T, B^T]^T$ is defined as

$$d_{MAH}(P, L) = \min_{Q \in L} \sqrt{(P - Q)^T \Sigma_P^{-1} (P - Q)} \qquad (3)$$

where $Q \in L$ is an arbitrary point on the line $L$. With $Q = A + \lambda (B - A)$, the minimization becomes a univariate quadratic problem in $\lambda$ whose minimizer gives the optimal estimate $Q^*$. Defining

$$\delta(P - L) = P - Q^* \qquad (4)$$

the Mahalanobis distance from the three-dimensional point $P$ to the three-dimensional line $L$ is

$$d_{MAH}(P, L) = \sqrt{\delta(P - L)^T \Sigma_P^{-1}\, \delta(P - L)} \qquad (5)$$
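The closed form of $Q^*$ is not written out above; minimizing the quadratic in $\lambda$ obtained by substituting $Q = A + \lambda(B - A)$ into equation (3) gives (a derivation sketch):

$$\lambda^* = \frac{(B - A)^T \Sigma_P^{-1} (P - A)}{(B - A)^T \Sigma_P^{-1} (B - A)}, \qquad Q^* = A + \lambda^* (B - A)$$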
After the random sample consensus processing, the three-dimensional line consists of a set of three-dimensional points $\{P_i,\; i = 1, \ldots, n_L\}$, where $P_1$ and $P_{n_L}$ represent the two end points of the line. The error function of the three-dimensional line is defined as

$$w(L) = \begin{bmatrix} P_1 - A \\ \delta(P_2, L) \\ \vdots \\ \delta(P_{n_L - 1}, L) \\ P_{n_L} - B \end{bmatrix} \qquad (6)$$
The maximum likelihood estimate $L^*$ of the three-dimensional line $L$ is obtained by minimizing

$$L^* = \min_L\; w(L)^T \Sigma_w^{-1} w(L) \qquad (7)$$

where $\Sigma_w$ is the covariance of $w(L)$, and $\Sigma_P$ and $\delta(P - L)$ are calculated by equations (2) and (4) respectively; the nonlinear minimization problem of equation (7) is solved with the Levenberg-Marquardt algorithm, and the uncertainty of the optimized three-dimensional line is

$$\Sigma_L = (J_w^T \Sigma_w^{-1} J_w)^{-1} \qquad (8)$$

where $J_w$ is the Jacobian of $w(L)$ with respect to $L$.
Step three of the feature-fused RGB-D camera motion estimation method comprises the following steps:
for the point features, SURF descriptors are calculated, and the point features of two adjacent frames are matched through the similarity between the descriptors; for line features, mean-standard deviation line descriptors (MSLD) are calculated and then matched.
Step four of the feature-fused RGB-D camera motion estimation method comprises the following steps:
$T$ represents the motion transformation between two adjacent frames, with $T(X) = RX + t$, where $R$ is the rotation matrix and $t$ is the displacement vector; the rotation and displacement are parameterized by a six-dimensional vector $\xi$;
Let $\{(P_i, P_i'),\; i = 1, \ldots, n\}$ be the set of matched three-dimensional point pairs of two adjacent frames $F$ and $F'$, and let $\{(L_j, L_j'),\; j = 1, \ldots, m\}$ be the set of matched three-dimensional line pairs of the two frames. If the real physical point corresponding to the point match $(P_i, P_i')$ is the three-dimensional point $X_i$ in the local coordinates of the previous frame, and the real physical line corresponding to the line match $(L_j, L_j')$ is $Y_j$, then the variables to be optimized in the camera odometry system are the camera motion $\xi$ together with the physical points $X_i$ and the physical lines $Y_j$, collectively denoted $\Delta$.
The error function of the system includes the errors of the point features and the errors of the line features, i.e.

$$f(\Delta) = \begin{bmatrix} f_P \\ f_L \end{bmatrix} \qquad (9)$$

where the error of the point features is

$$f_P = \begin{bmatrix} X_1 - P_1 \\ \vdots \\ X_n - P_n \\ T(X_1) - P_1' \\ \vdots \\ T(X_n) - P_n' \end{bmatrix},$$

the error of the line features is

$$f_L = \begin{bmatrix} \eta(L_1 - Y_1) \\ \vdots \\ \eta(L_m - Y_m) \\ \eta(L_1' - T(Y_1)) \\ \vdots \\ \eta(L_m' - T(Y_m)) \end{bmatrix},$$

and the error of a single line is $\eta(L_i - Y_i) = [\delta(A_i, Y_i)^T, \delta(B_i, Y_i)^T]^T$, where $A_i$ and $B_i$ are the end points of $L_i$ and $\delta(\cdot\,, Y_i)$ is defined as in equation (4) with respect to the line $Y_i$.
The goal is to find the variable $\Delta$ that minimizes the system error of the camera odometry system, i.e.

$$\min_{\Delta}\; f(\Delta)^T \Sigma_f^{-1} f(\Delta) \qquad (10)$$
where $\Sigma_f = \mathrm{diag}(\Sigma_P, \Sigma_L)$. Equation (10) is solved with the Levenberg-Marquardt algorithm to obtain the initial pose of the camera motion; a pose graph is then constructed from the initial pose information, and the motion trajectory of the camera is optimized at the SLAM back end through a graph optimization method.
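The patent does not spell out the form of this back-end graph optimization; as an illustrative sketch only, a typical pose-graph objective over camera poses $T_i \in SE(3)$, with the frame-to-frame estimates from equation (10) providing relative-motion measurements $\hat T_{ij}$, can be written as

$$\min_{\{T_i\}} \sum_{(i,j)} \left\| \log\!\left(\hat T_{ij}^{-1}\, T_i^{-1} T_j\right)^{\vee} \right\|^2_{\Sigma_{ij}}$$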
In the feature-fused RGB-D camera motion estimation method, the points with abnormal depth values discarded in step 2 are the points with null depth values.
The invention has the technical effects that:
(1) Line features are insensitive to illumination changes, and the method combines point features and line features, thereby greatly improving the robustness of motion estimation. In indoor environments, line features are particularly rich and very easy to extract, which reduces the complexity of the algorithm.
(2) The invention carries out uncertainty analysis on the three-dimensional points and three-dimensional lines, performs maximum likelihood estimation of the camera motion using the uncertainty information, obtains the camera pose through optimization, and provides an effective and reasonable initial value for back-end optimization.
Drawings
FIG. 1 is a block diagram of a feature fused RGB-D camera motion estimation system.
FIG. 2 is an exemplary diagram of a two-dimensional straight-line backprojection into three-dimensional space.
Fig. 3 is a pose diagram of the camera.
Detailed Description
Embodiments of the present invention will be further described with reference to the accompanying drawings.
The invention provides an RGB-D camera motion estimation method based on point and line feature fusion; the overall system block diagram is shown in FIG. 1, and the method comprises the following steps:
s1, two-dimensional feature extraction: the system respectively extracts two-dimensional point features and two-dimensional straight line features by using the input RGB image.
In the invention, a set of two-dimensional point features is obtained by the SURF (Speeded Up Robust Features) feature point detection algorithm, and two-dimensional line features are obtained by the LSD (Line Segment Detector) line segment detection algorithm. For an RGB image, the feature set is $\{p_i, l_j \mid i = 1, 2, \ldots,\; j = 1, 2, \ldots\}$, where $p_i = [u_i, v_i]^T$ is a two-dimensional point with pixel coordinates $[u_i, v_i]^T$, and $a_j$ and $b_j$ are the two end points of the two-dimensional line $l_j$.
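By way of illustration only, a minimal OpenCV-based sketch of this extraction step might look as follows; the function names come from OpenCV's Python bindings (SURF requires a contrib build with the nonfree module, and the LSD detector is missing from some OpenCV releases), and the patent does not prescribe any particular library or parameter values:

```python
import cv2

def extract_2d_features(rgb):
    """Extract 2D point and line features from an RGB image (illustrative sketch)."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)

    # SURF keypoints and descriptors (needs opencv-contrib built with nonfree support)
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    keypoints, descriptors = surf.detectAndCompute(gray, None)

    # LSD line segments; reshape so each row is one segment [x1, y1, x2, y2]
    lsd = cv2.createLineSegmentDetector()
    lines = lsd.detect(gray)[0].reshape(-1, 4)

    return keypoints, descriptors, lines
```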
S2, three-dimensional feature acquisition and uncertainty analysis: the two-dimensional features are back-projected to three dimensions, uncertainty analysis is carried out on the three-dimensional points and three-dimensional lines respectively, and the estimation of the three-dimensional features is optimized.
(1) Three-dimensional point feature and uncertainty analysis
Two-dimensional point features are back-projected to three dimensions through the pinhole camera model. Assuming that a two-dimensional point $p = [u, v]^T$ in the image has depth value $d$, its corresponding three-dimensional point $P$ is

$$P = \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} (u - c_u)\, d / f_c \\ (v - c_v)\, d / f_c \\ d \end{bmatrix} \qquad (1)$$

where $[c_u, c_v]^T$ is the optical center of the camera and $f_c$ is the focal length.
By equation (1), a two-dimensional point feature can be back-projected to three-dimensional space to obtain a three-dimensional point feature (as shown in FIG. 2). The RGB-D camera has relatively large measurement errors, and the noise of the acquired three-dimensional point cloud mainly comes from the RGB measurement errors and the depth measurement errors.
Suppose that the noise at the two-dimensional point $p$ is a zero-mean Gaussian with covariance $\Sigma_p = \delta I_2$, where $I_2$ is the 2-dimensional identity matrix and $\delta$ is the noise variance. The noise of the depth value $d$ is a quadratic function of the measured value, i.e. $\delta_d = c_1 d^2 + c_2 d + c_3$, with constant coefficients $c_1$, $c_2$ and $c_3$ obtained through experiments. Assuming that the measurement error of the two-dimensional point $p$ and the error of the depth value $d$ are independent of each other, the uncertainty of the three-dimensional point is:
$$\Sigma_P = J_P \,\mathrm{cov}\!\left(\begin{bmatrix} p \\ d \end{bmatrix}\right) J_P^T \qquad (2)$$

where $\mathrm{cov}([p^T, d]^T)$ is the covariance matrix of the measurement noise and $J_P$ is the Jacobian of equation (1).
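A minimal numerical sketch of this back-projection and first-order uncertainty propagation is given below; the intrinsics `fc`, `cu`, `cv`, the pixel-noise variance `delta` and the depth-noise coefficients `c` are illustrative placeholders (the patent obtains $c_1, c_2, c_3$ experimentally and gives no values), and treating $\delta_d$ directly as a variance is an assumption:

```python
import numpy as np

def backproject_with_cov(u, v, d, fc, cu, cv, delta=1.0, c=(2.8e-5, 0.0, 0.0)):
    """Back-project pixel (u, v) with depth d and propagate its covariance (eqs. (1)-(2))."""
    # Equation (1): pinhole back-projection
    P = np.array([(u - cu) * d / fc, (v - cv) * d / fc, d])

    # Depth noise as a quadratic function of the measured depth: delta_d = c1*d^2 + c2*d + c3
    delta_d = c[0] * d ** 2 + c[1] * d + c[2]

    # Covariance of the raw measurements (u, v, d); delta and delta_d treated as variances here
    cov_meas = np.diag([delta, delta, delta_d])

    # Jacobian of equation (1) with respect to (u, v, d)
    J = np.array([[d / fc, 0.0, (u - cu) / fc],
                  [0.0, d / fc, (v - cv) / fc],
                  [0.0, 0.0, 1.0]])

    # Equation (2): first-order propagation Sigma_P = J * cov * J^T
    return P, J @ cov_meas @ J.T
```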
(2) Three-dimensional straight line feature and uncertainty analysis
The projection of a straight line in three-dimensional space to two dimensions is still a straight line. Therefore, the invention first detects lines in the two-dimensional RGB image and then back-projects them to three-dimensional space. The idea of back-projection is to sample enough points on the two-dimensional line, calculate the three-dimensional coordinates of these points according to equation (1), and discard the points with invalid depth values, as shown in FIG. 2. Due to depth noise, outliers (shown as small circles in FIG. 2) appear after a two-dimensional line is back-projected to three dimensions; a random sample consensus algorithm (RANSAC) is used to remove these outliers and obtain a fitted three-dimensional line equation. We use $L = [A^T, B^T]^T$ to represent the corresponding three-dimensional line, where $A$ and $B$ are two three-dimensional points on the line.
The uncertainty of the line is measured by calculating the Mahalanobis distance between the three-dimensional points corresponding to the two-dimensional sample points and the estimated three-dimensional line $L$.
The Mahalanobis distance from a three-dimensional point $P = [x, y, z]^T$ to a three-dimensional line $L = [A^T, B^T]^T$ is defined as

$$d_{MAH}(P, L) = \min_{Q \in L} \sqrt{(P - Q)^T \Sigma_P^{-1} (P - Q)} \qquad (3)$$

where $Q \in L$ is an arbitrary point on the line $L$. Writing $Q = A + \lambda (B - A)$, the minimization problem of equation (3) becomes a univariate quadratic minimization in $\lambda$, whose solution gives the optimal estimate $Q^*$. Defining

$$\delta(P - L) = P - Q^* \qquad (4)$$

the Mahalanobis distance from the three-dimensional point $P$ to the three-dimensional line $L$ can be written as

$$d_{MAH}(P, L) = \sqrt{\delta(P - L)^T \Sigma_P^{-1}\, \delta(P - L)} \qquad (5)$$
After RANSAC processing, the three-dimensional line consists of a set of three-dimensional points $\{P_i,\; i = 1, \ldots, n_L\}$, where $P_1$ and $P_{n_L}$ represent the two end points of the line. The error function of the three-dimensional line is

$$w(L) = \begin{bmatrix} P_1 - A \\ \delta(P_2, L) \\ \vdots \\ \delta(P_{n_L - 1}, L) \\ P_{n_L} - B \end{bmatrix} \qquad (6)$$
The maximum likelihood estimate $L^*$ of the three-dimensional line $L$ is obtained by minimizing

$$L^* = \min_L\; w(L)^T \Sigma_w^{-1} w(L) \qquad (7)$$

where $\Sigma_w$ is the covariance of $w(L)$, and $\Sigma_P$ and $\delta(P - L)$ are calculated by equations (2) and (4) respectively. The nonlinear minimization problem of equation (7) is solved with the Levenberg-Marquardt algorithm, and the uncertainty of the optimized three-dimensional line is

$$\Sigma_L = (J_w^T \Sigma_w^{-1} J_w)^{-1} \qquad (8)$$

where $J_w$ is the Jacobian of $w(L)$ with respect to $L$.
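A sketch of the point-to-line Mahalanobis distance of equations (3)-(5) and of a simple RANSAC fit of a three-dimensional line to back-projected samples is given below; the inlier threshold, iteration count and the Euclidean residual used inside the RANSAC loop are illustrative assumptions not specified in the patent:

```python
import numpy as np

def mahalanobis_point_line(P, A, B, Sigma_P):
    """Mahalanobis distance from 3D point P to the line through A and B (eqs. (3)-(5))."""
    Si = np.linalg.inv(Sigma_P)
    u = B - A
    lam = (u @ Si @ (P - A)) / (u @ Si @ u)    # closed-form minimiser over the line parameter
    delta = P - (A + lam * u)                  # eq. (4): delta(P - L) = P - Q*
    return float(np.sqrt(delta @ Si @ delta))  # eq. (5)

def ransac_line_fit(points, threshold=0.02, iters=200, seed=0):
    """Fit a 3D line (two end points) to noisy samples with a basic RANSAC loop."""
    rng = np.random.default_rng(seed)
    best_inliers, best_ab = None, None
    for _ in range(iters):
        A, B = points[rng.choice(len(points), size=2, replace=False)]
        # Euclidean point-to-line distance used as the RANSAC residual (simplification)
        d = np.linalg.norm(np.cross(points - A, B - A), axis=1) / np.linalg.norm(B - A)
        inliers = d < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_ab = inliers, (A, B)
    return best_ab, points[best_inliers]
```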
s3, feature matching of two adjacent frames: including point feature matching and straight line feature matching.
For point features, the invention calculates SURF descriptors thereof, and matches the point features of two adjacent frames of images through the similarity between the descriptors. For straight line features, MSLD descriptors are computed and then matched.
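As an illustration of the point-feature part of this step only (the MSLD line descriptor is not part of standard OpenCV and is omitted), the sketch below matches SURF descriptors with a ratio test and then rejects mismatches with a RANSAC step; the patent's mismatch removal applies RANSAC to the matched features themselves, whereas this sketch substitutes a fundamental-matrix RANSAC on the 2D correspondences, and the ratio and RANSAC parameters are assumptions:

```python
import cv2
import numpy as np

def match_point_features(kpts1, desc1, kpts2, desc2, ratio=0.7):
    """Match SURF descriptors of two frames, then reject mismatches with RANSAC."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    raw = matcher.knnMatch(desc1, desc2, k=2)

    # Lowe-style ratio test keeps only distinctive matches
    good = [p[0] for p in raw if len(p) == 2 and p[0].distance < ratio * p[1].distance]

    pts1 = np.float32([kpts1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kpts2[m.trainIdx].pt for m in good])

    # RANSAC on the fundamental matrix removes geometrically inconsistent matches
    _, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    if mask is None:
        return pts1, pts2
    keep = mask.ravel().astype(bool)
    return pts1[keep], pts2[keep]
```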
S4, motion estimation: the invention estimates the motion pose of the camera by using the point characteristics and the straight line characteristics.
The motion transformation between two adjacent frames is represented by $T(X) = RX + t$, where $R$ is the rotation matrix and $t$ is the displacement vector; the rotation and displacement are parameterized by a six-dimensional vector $\xi$.
Let $\{(P_i, P_i'),\; i = 1, \ldots, n\}$ be the set of matched three-dimensional point pairs of two adjacent frames $F$ and $F'$, and let $\{(L_j, L_j'),\; j = 1, \ldots, m\}$ be the set of matched three-dimensional line pairs of the two frames. If the real physical point corresponding to the point match $(P_i, P_i')$ is the three-dimensional point $X_i$ in the local coordinates of the previous frame, and the real physical line corresponding to the line match $(L_j, L_j')$ is $Y_j$, then the variables to be optimized in the camera odometry system are the camera motion $\xi$ together with the physical points $X_i$ and the physical lines $Y_j$, collectively denoted $\Delta$.
The error function of the system includes the errors of the point features and the errors of the line features, i.e.

$$f(\Delta) = \begin{bmatrix} f_P \\ f_L \end{bmatrix} \qquad (9)$$

where the error of the point features is

$$f_P = \begin{bmatrix} X_1 - P_1 \\ \vdots \\ X_n - P_n \\ T(X_1) - P_1' \\ \vdots \\ T(X_n) - P_n' \end{bmatrix},$$

the error of the line features is

$$f_L = \begin{bmatrix} \eta(L_1 - Y_1) \\ \vdots \\ \eta(L_m - Y_m) \\ \eta(L_1' - T(Y_1)) \\ \vdots \\ \eta(L_m' - T(Y_m)) \end{bmatrix},$$

and the error of a single line is $\eta(L_i - Y_i) = [\delta(A_i, Y_i)^T, \delta(B_i, Y_i)^T]^T$, where $A_i$ and $B_i$ are the end points of $L_i$ and $\delta(\cdot\,, Y_i)$ is defined as in equation (4) with respect to the line $Y_i$.
setting the target to find the variable Δ that minimizes the system error of the camera odometer system, i.e.
where $\Sigma_f = \mathrm{diag}(\Sigma_P, \Sigma_L)$. The invention solves equation (10) with the Levenberg-Marquardt (LM) algorithm to obtain the pose of the camera motion. The pose graph of the camera is shown in FIG. 3. A pose graph is constructed from the estimated initial pose information, and the trajectory of the camera is optimized through a graph optimization method.
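The full maximum-likelihood problem of equation (10) optimizes the physical points $X_i$ and lines $Y_j$ together with $\xi$ and whitens the residuals with $\Sigma_f$. As a simplified illustration only, the sketch below drops the uncertainty weighting and the structure variables, aligns matched 3D points and line end points directly between the two frames, and solves for the six-dimensional motion $\xi$ with SciPy's Levenberg-Marquardt solver; the rotation-vector parameterization, the endpoint-based line residual and the weight `w_line` are assumptions, not the patent's exact formulation:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def transform(xi, X):
    """Apply the motion T(X) = R X + t, with xi = (rotation vector, translation)."""
    R = Rotation.from_rotvec(xi[:3]).as_matrix()
    return X @ R.T + xi[3:]

def residuals(xi, pts1, pts2, lines1, lines2, w_line=1.0):
    """Stacked point and line residuals, a simplified stand-in for f(Delta) in eq. (9)."""
    r = [(transform(xi, pts1) - pts2).ravel()]          # T(P_i) - P_i' for matched points
    for (A1, B1), (A2, B2) in zip(lines1, lines2):
        TA, TB = transform(xi, np.vstack([A1, B1]))     # transformed line end points
        r.append(w_line * np.concatenate([TA - A2, TB - B2]))
    return np.concatenate(r)

def estimate_motion(pts1, pts2, lines1, lines2):
    """Levenberg-Marquardt estimate of the 6-DoF motion xi from matched 3D features."""
    sol = least_squares(residuals, np.zeros(6),
                        args=(pts1, pts2, lines1, lines2), method="lm")
    return sol.x
```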
Claims (6)
1. A feature-fused RGB-D camera motion estimation method is characterized by comprising the following steps:
step one, extracting two-dimensional point and line features: on an RGB image, extracting two-dimensional feature points and two-dimensional feature lines using a feature point detection algorithm and a line segment detection algorithm respectively;
step two, performing feature back projection to the three dimensions and performing uncertainty analysis: combining depth information, utilizing a pinhole camera model to perform back projection on the extracted two-dimensional feature points and two-dimensional feature straight lines to three dimensions, and performing uncertainty analysis on the three-dimensional points and the three-dimensional straight lines under the assumption of Gaussian noise distribution;
step three, feature matching: for the point features, calculating feature point detection algorithm descriptors and matching the point features of two consecutive frames; calculating and matching the mean-standard deviation line descriptors (MSLD) of the line features; then removing mismatches using a random sample consensus algorithm;
step four, estimating the motion of the camera: carrying out maximum likelihood estimation of the camera motion using the uncertainty information, and solving the objective function of the problem with the Levenberg-Marquardt algorithm to obtain the pose of the camera.
2. The method as claimed in claim 1, wherein the step one comprises the steps of:
for one RGB image, two-dimensional feature points are obtained through a feature point detection algorithm and two-dimensional feature lines are obtained through a line segment detection algorithm, giving a feature set $\{p_i, l_j \mid i = 1, 2, \ldots,\; j = 1, 2, \ldots\}$, where $p_i = [u_i, v_i]^T$ is a two-dimensional point with pixel coordinates $[u_i, v_i]^T$, and $a_j$ and $b_j$ are the two end points of the two-dimensional line $l_j$.
3. The feature-fused RGB-D camera motion estimation method as claimed in claim 1, wherein the second step comprises the steps of:
step 1, three-dimensional point characteristic and uncertainty analysis:
the two-dimensional point $p$ in the image is back-projected to a three-dimensional point $P$ through the pinhole camera model:

$$P = \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} (u - c_u)\, d / f_c \\ (v - c_v)\, d / f_c \\ d \end{bmatrix} \qquad (1)$$

where $(u, v)$ are the pixel coordinates of the two-dimensional point $p$, $d$ is the depth value corresponding to $p$, $[c_u, c_v]^T$ is the optical center of the camera, and $f_c$ is the focal length;
the noise at the two-dimensional point $p$ is assumed to be zero-mean Gaussian with covariance $\Sigma_p = \delta I_2$, where $I_2$ is the 2-dimensional identity matrix and $\delta$ is the noise variance; the noise of the depth value $d$ of the two-dimensional point $p$ is a quadratic function of the measured value, i.e. $\delta_d = c_1 d^2 + c_2 d + c_3$, with constant coefficients $c_1$, $c_2$ and $c_3$ obtained through experiments; the uncertainty of the three-dimensional point is then

$$\Sigma_P = J_P \,\mathrm{cov}\!\left(\begin{bmatrix} p \\ d \end{bmatrix}\right) J_P^T \qquad (2)$$

where $\mathrm{cov}([p^T, d]^T)$ is the covariance matrix of the measurement noise and $J_P$ is the Jacobian of equation (1);
step 2, three-dimensional straight line characteristic and uncertainty analysis:
enough points are sampled on the two-dimensional line, points with abnormal depth values are discarded, and the three-dimensional coordinates of the remaining points are calculated according to equation (1); a random sample consensus algorithm is then used to remove the outliers that appear, due to depth noise, after the two-dimensional line is back-projected to three dimensions, and a fitted three-dimensional line equation is obtained; the corresponding three-dimensional line is represented by $L = [A^T, B^T]^T$, where $A$ and $B$ are two three-dimensional points on the line;
the uncertainty of the three-dimensional line is measured by calculating the Mahalanobis distance between the three-dimensional points corresponding to the two-dimensional sample points and the estimated three-dimensional line $L$;
the Mahalanobis distance from a three-dimensional point $P = [x, y, z]^T$ to a three-dimensional line $L = [A^T, B^T]^T$ is defined as

$$d_{MAH}(P, L) = \min_{Q \in L} \sqrt{(P - Q)^T \Sigma_P^{-1} (P - Q)} \qquad (3)$$

where $Q \in L$ is an arbitrary point on the line $L$; with $Q = A + \lambda (B - A)$, the optimal estimate of $Q$ is $Q^*$; defining

$$\delta(P - L) = P - Q^* \qquad (4)$$

the Mahalanobis distance from the three-dimensional point $P$ to the three-dimensional line $L$ is

$$d_{MAH}(P, L) = \sqrt{\delta(P - L)^T \Sigma_P^{-1}\, \delta(P - L)} \qquad (5)$$
after the random sample consensus processing, the three-dimensional line consists of a set of three-dimensional points $\{P_i,\; i = 1, \ldots, n_L\}$, where $P_1$ and $P_{n_L}$ represent the two end points of the line; the error function of the three-dimensional line is

$$w(L) = \begin{bmatrix} P_1 - A \\ \delta(P_2, L) \\ \vdots \\ \delta(P_{n_L - 1}, L) \\ P_{n_L} - B \end{bmatrix} \qquad (6)$$
the maximum likelihood estimate $L^*$ of the three-dimensional line $L$ is obtained by minimizing

$$L^* = \min_L\; w(L)^T \Sigma_w^{-1} w(L) \qquad (7)$$

where $\Sigma_w$ is the covariance of $w(L)$, and $\Sigma_P$ and $\delta(P - L)$ are calculated by equations (2) and (4) respectively; the nonlinear minimization problem of equation (7) is solved with the Levenberg-Marquardt algorithm, and the uncertainty of the optimized three-dimensional line is

$$\Sigma_L = (J_w^T \Sigma_w^{-1} J_w)^{-1} \qquad (8)$$

where $J_w$ is the Jacobian of $w(L)$ with respect to $L$.
4. The feature-fused RGB-D camera motion estimation method according to claim 1, wherein said step three comprises the steps of:
for the point features, SURF descriptors of the point features are calculated, and the point features of two adjacent frames of images are matched through the similarity between the descriptors; for line features, mean-standard deviation line descriptors (MSLD) are calculated and then matched.
5. The feature-fused RGB-D camera motion estimation method according to claim 1, wherein said step four includes the steps of:
$T$ represents the motion transformation between two adjacent frames, with $T(X) = RX + t$, where $R$ is the rotation matrix and $t$ is the displacement vector; the rotation and displacement are parameterized by a six-dimensional vector $\xi$;
let $\{(P_i, P_i'),\; i = 1, \ldots, n\}$ be the set of matched three-dimensional point pairs of two adjacent frames $F$ and $F'$, and let $\{(L_j, L_j'),\; j = 1, \ldots, m\}$ be the set of matched three-dimensional line pairs of the two frames; if the real physical point corresponding to the point match $(P_i, P_i')$ is the three-dimensional point $X_i$ in the local coordinates of the previous frame, and the real physical line corresponding to the line match $(L_j, L_j')$ is $Y_j$, then the variables to be optimized in the camera odometry system are the camera motion $\xi$ together with the physical points $X_i$ and the physical lines $Y_j$, collectively denoted $\Delta$;
the error function of the system includes the errors of the point features and the errors of the line features, i.e.

$$f(\Delta) = \begin{bmatrix} f_P \\ f_L \end{bmatrix} \qquad (9)$$

where the error of the point features is

$$f_P = \begin{bmatrix} X_1 - P_1 \\ \vdots \\ X_n - P_n \\ T(X_1) - P_1' \\ \vdots \\ T(X_n) - P_n' \end{bmatrix},$$

the error of the line features is

$$f_L = \begin{bmatrix} \eta(L_1 - Y_1) \\ \vdots \\ \eta(L_m - Y_m) \\ \eta(L_1' - T(Y_1)) \\ \vdots \\ \eta(L_m' - T(Y_m)) \end{bmatrix},$$

and the error of a single line is $\eta(L_i - Y_i) = [\delta(A_i, Y_i)^T, \delta(B_i, Y_i)^T]^T$, where $A_i$ and $B_i$ are the end points of $L_i$ and $\delta(\cdot\,, Y_i)$ is defined as in equation (4) with respect to the line $Y_i$;
the goal is to find the variable $\Delta$ that minimizes the system error of the camera odometry system, i.e.

$$\min_{\Delta}\; f(\Delta)^T \Sigma_f^{-1} f(\Delta) \qquad (10)$$
where $\Sigma_f = \mathrm{diag}(\Sigma_P, \Sigma_L)$; equation (10) is solved with the Levenberg-Marquardt algorithm to obtain the initial pose of the camera motion; a pose graph is constructed from the initial pose information of the camera motion, and the motion trajectory of the camera is optimized at the SLAM back end through a graph optimization method.
6. The method as claimed in claim 3, wherein in step 2 the points with abnormal depth values that are discarded are the points with null depth values.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711297500.6A CN108053445A (en) | 2017-12-08 | 2017-12-08 | The RGB-D camera motion methods of estimation of Fusion Features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711297500.6A CN108053445A (en) | 2017-12-08 | 2017-12-08 | The RGB-D camera motion methods of estimation of Fusion Features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108053445A true CN108053445A (en) | 2018-05-18 |
Family
ID=62123166
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711297500.6A Pending CN108053445A (en) | 2017-12-08 | 2017-12-08 | The RGB-D camera motion methods of estimation of Fusion Features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108053445A (en) |
- 2017-12-08: CN application CN201711297500.6A filed; published as CN108053445A (status: Pending)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102692236A (en) * | 2012-05-16 | 2012-09-26 | 浙江大学 | Visual milemeter method based on RGB-D camera |
EP2910187A1 (en) * | 2014-02-24 | 2015-08-26 | Université de Strasbourg (Etablissement Public National à Caractère Scientifique, Culturel et Professionnel) | Automatic multimodal real-time tracking of moving instruments for image plane alignment inside a MRI scanner |
CN104851094A (en) * | 2015-05-14 | 2015-08-19 | 西安电子科技大学 | Improved method of RGB-D-based SLAM algorithm |
CN105783913A (en) * | 2016-03-08 | 2016-07-20 | 中山大学 | SLAM device integrating multiple vehicle-mounted sensors and control method of device |
CN105938619A (en) * | 2016-04-11 | 2016-09-14 | 中国矿业大学 | Visual odometer realization method based on fusion of RGB and depth information |
CN106127739A (en) * | 2016-06-16 | 2016-11-16 | 华东交通大学 | A kind of RGB D SLAM method of combination monocular vision |
CN106780592A (en) * | 2016-06-30 | 2017-05-31 | 华南理工大学 | Kinect depth reconstruction algorithms based on camera motion and image light and shade |
Non-Patent Citations (2)
Title |
---|
YAN LU et al.: "Robust RGB-D Odometry Using Point and Line Features", 2015 IEEE International Conference on Computer Vision (ICCV) *
WANG Yalong et al.: "Design of a three-dimensional visual odometer based on Kinect" (基于Kinect的三维视觉里程计的设计), Journal of Computer Applications (计算机应用) *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109993747A (en) * | 2019-03-22 | 2019-07-09 | 上海理工大学 | Merge the rapid image matching method of dotted line feature |
CN110162038A (en) * | 2019-05-07 | 2019-08-23 | 杭州迦智科技有限公司 | Control method for movement, device, storage medium and processor |
CN110517301A (en) * | 2019-07-22 | 2019-11-29 | 杭州电子科技大学 | The method of feature is effectively matched under a kind of quick camera motion |
CN110517301B (en) * | 2019-07-22 | 2022-04-01 | 杭州电子科技大学 | Method for effectively matching features under rapid camera motion |
CN110807799A (en) * | 2019-09-29 | 2020-02-18 | 哈尔滨工程大学 | Line feature visual odometer method combining depth map inference |
CN110807799B (en) * | 2019-09-29 | 2023-05-30 | 哈尔滨工程大学 | Line feature visual odometer method combined with depth map inference |
CN111047620A (en) * | 2019-11-15 | 2020-04-21 | 广东工业大学 | Unmanned aerial vehicle visual odometer method based on depth point-line characteristics |
CN114800504A (en) * | 2022-04-26 | 2022-07-29 | 平安普惠企业管理有限公司 | Robot posture analysis method, device, equipment and storage medium |
CN117351140A (en) * | 2023-09-15 | 2024-01-05 | 中国科学院自动化研究所 | Three-dimensional reconstruction method, device and equipment integrating panoramic camera and laser radar |
CN117351140B (en) * | 2023-09-15 | 2024-04-05 | 中国科学院自动化研究所 | Three-dimensional reconstruction method, device and equipment integrating panoramic camera and laser radar |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180518