CN104732518B - PTAM improvement method based on intelligent robot ground features - Google Patents
PTAM improvement method based on intelligent robot ground features
- Publication number
- CN104732518B (application CN201510025810.7A / CN201510025810A)
- Authority
- CN
- China
- Prior art keywords
- camera
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Links
- 238000000034 method Methods 0.000 title claims abstract description 74
- 238000012937 correction Methods 0.000 claims abstract description 16
- 238000001514 detection method Methods 0.000 claims abstract description 13
- 238000013507 mapping Methods 0.000 claims abstract description 13
- 230000008569 process Effects 0.000 claims description 31
- 239000011159 matrix material Substances 0.000 claims description 19
- 239000013598 vector Substances 0.000 claims description 13
- 238000013519 translation Methods 0.000 claims description 9
- 238000005457 optimization Methods 0.000 claims description 8
- 230000009466 transformation Effects 0.000 claims description 8
- 238000006073 displacement reaction Methods 0.000 claims description 4
- 230000000694 effects Effects 0.000 claims description 4
- 238000005259 measurement Methods 0.000 claims description 4
- 230000006872 improvement Effects 0.000 claims description 3
- 238000000354 decomposition reaction Methods 0.000 claims description 2
- 238000011156 evaluation Methods 0.000 claims description 2
- 238000004519 manufacturing process Methods 0.000 claims description 2
- 230000009467 reduction Effects 0.000 claims description 2
- 230000000717 retained effect Effects 0.000 claims description 2
- 238000004364 calculation method Methods 0.000 claims 1
- 230000000007 visual effect Effects 0.000 claims 1
- 238000005516 engineering process Methods 0.000 description 3
- 230000007613 environmental effect Effects 0.000 description 2
- 230000003190 augmentative effect Effects 0.000 description 1
- 230000007423 decrease Effects 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 238000009434 installation Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
Landscapes
- Studio Devices (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
A PTAM improvement method based on intelligent robot ground features. First, parameter correction is completed, including parameter definition and camera calibration. Then the current environment texture information is acquired with a camera, a four-layer Gaussian image pyramid is constructed, the feature information in the current image is extracted with the FAST corner detection algorithm, and data association between corner features is established to obtain a pose estimation model. In the initial mapping stage, two key frames are acquired and the camera is mounted on the mobile robot. During initialization the mobile robot starts to move while the camera captures corner information in the current scene and establishes associations. After the sparse three-dimensional map has been initialized, the key frames are updated, a sub-pixel-precision mapping relation between feature points is established with epipolar search and block matching, and accurate relocalization of the camera is realized in combination with the pose estimation model. Finally, the matched points are projected into space, completing the creation of the current global environment three-dimensional map.
Description
Technical Field
The invention belongs to the field of robot vision, and relates to improvement of a PTAM algorithm based on ground features.
Background
As robots and humans interact ever more closely, intelligent robot technologies have received great attention. Simultaneous Localization and Mapping (SLAM) for mobile robots is one of the most popular positioning technologies for current intelligent mobile robots. It is essentially a motion estimation problem: internal and external data acquired by sensors are used to compute the position of the mobile robot at a given moment and to build the corresponding map model. Vision-based SLAM belongs to the research field of visual measurement, and a vision sensor has unique advantages: small volume, light weight, low price, convenient installation, and very rich extracted external information. These advantages further promote the study and application of vision-based SLAM. Application No. 201310678018.2 provides a feature detection operator using SURF for monocular vision SLAM, which innovates and improves on three aspects, namely interest point detection, SURF descriptor generation and SURF point matching, and provides a SURF feature matching method based on region growing to complete simultaneous localization and map creation for a robot; in the SURF descriptor matching process a descriptor is only compared with the descriptors most likely to match it, so the number of comparisons in the monocular vision SLAM problem is significantly reduced and the matching speed is improved. However, that method still cannot establish a metric map or provide three-dimensional feature information, and the robustness and real-time performance of the algorithm remain to be improved.
Structure from Motion (SFM) is a technique that simultaneously solves for the camera's motion trajectory and a model of the scene structure. As a representative SFM method, PTAM (Parallel Tracking and Mapping) was first proposed by G. Klein et al. in the field of Augmented Reality (AR) in 2007 to solve the environment recognition problem in real time. However, the unmodified PTAM algorithm still cannot establish a metric map, and the motion of the camera is strictly constrained.
Disclosure of Invention
To address these problems, the invention provides an improved PTAM algorithm based on ground features. With ground features added, PTAM simultaneously estimates the current pose of a monocular camera and the three-dimensional space points, detects and updates feature points with the FAST corner method, and continuously refines the poses of the camera and of the three-dimensional feature points with optimal local and global bundle adjustment, thereby achieving high-precision positioning and map creation.
The invention adopts the following technical scheme:
Firstly, parameter correction is completed, comprising parameter definition and camera correction; features in the key frames are tracked, the pose is initialized, and the plane parameters are estimated on this basis. Then the current environment texture information is acquired with a camera, a four-layer Gaussian image pyramid is constructed, the feature information in the current image is extracted with the FAST corner detection algorithm, and data association among the corner features is established to obtain a pose estimation model. In the initial mapping stage two key frames are acquired, the camera is mounted on a mobile robot, and the initialization process is started after the camera has been calibrated with the ground. During initialization the mobile robot starts to move while the camera captures corner information in the current scene and establishes associations; the pose estimation model gives an initial estimate of the current pose, and a pose estimation method based on ground features is then used to acquire an accurate estimate of the current pose. After the initialization of the three-dimensional sparse map, the key frames are updated, the sub-pixel-precision mapping relation of the feature points is established with epipolar search and block matching, and accurate relocalization of the camera is realized in combination with the pose estimation model. Finally, the matched points are projected into space to complete the creation of the current global environment three-dimensional map. Meanwhile, global and local bundle adjustment is used to complete the consistency optimization of the algorithm.
The method specifically comprises the following steps:
step 1, parameter correction
Step 1.1, parameter definition
(1) Robot pose
A world coordinate system is established at the starting position of the mobile robot, and the pose of the robot during its motion is represented as:

$$T_r = \begin{pmatrix} R_r & t_r \\ 0 & 1 \end{pmatrix},\qquad R_r = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},\qquad t_r = \begin{pmatrix} x \\ y \\ 0 \end{pmatrix} \tag{1}$$

From the above formula, the pose of the mobile robot can be described by x = (x, y, θ), where x and y represent the translation of the mobile robot along the x-axis and y-axis directions respectively, and θ is the rotation angle of the mobile robot about the z-axis.
In the world coordinate system, the pose of the camera can be expressed as:

$$T_c = \begin{pmatrix} R_c & t_c \\ 0 & 1 \end{pmatrix} \tag{2}$$

where T_c ∈ SE(3), SE(3) is the special rigid-transformation group, R_c is a 3×3 rotation matrix, and t_c is a 3×1 translation vector. This pose matrix establishes the correspondence between a point p_w in the world coordinate system and the same point p_c in the camera coordinate system, i.e. p_c = T_c p_w. In the present study, as shown in Fig. 2, the camera is fixed on the mobile robot, so there is a fixed rotation-translation relationship T_cr between the camera and the robot. During the motion of the robot, if the pose increment of the robot over a certain time interval is T_rs, the corresponding pose increment of the camera is T_cs, and then:
$$T_{rs} = T_{rc} T_{cs} T_{cr} \tag{3}$$
where T_rc = T_cr^{-1}.
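As an illustrative aid only (not part of the original disclosure), the following Python sketch shows how the pose representation of equation (1) and the increment relation of equation (3) can be composed with NumPy; the numeric values of T_cr and T_cs are placeholder assumptions.

```python
import numpy as np

def robot_pose(x, y, theta):
    """4x4 homogeneous pose T_r built from the planar pose (x, y, theta) as in equation (1)."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:3, :3] = np.array([[ c, s, 0],
                          [-s, c, 0],
                          [ 0, 0, 1]])
    T[:3, 3] = [x, y, 0.0]
    return T

# Placeholder extrinsic calibration (camera-robot transform) and camera pose increment
T_cr = np.eye(4)                      # assumed known from the ground calibration step
T_cs = robot_pose(0.10, 0.02, 0.05)   # made-up increment, for illustration only
T_rc = np.linalg.inv(T_cr)

# Equation (3): robot pose increment induced by the camera pose increment
T_rs = T_rc @ T_cs @ T_cr
print(T_rs)
```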
(2) ground plane calibration parameters
The camera is fixed on the mobile robot, and the calibration target is laid flat on the ground. The camera establishes the pose relationship between the camera and the robot by acquiring image information containing the calibration target. T_cp ∈ SE(3) is the transformation matrix between the camera and the calibration target, R_cp and t_cp denote the rotation and translation components of T_cp respectively, and the inverse transformation is T_pc = T_cp^{-1}. Assume that the coordinate origin of the robot coincides with the camera and that the robot x-axis is oriented in the same direction as the camera z-axis. In the target coordinate system, the x-direction vector of the robot can be represented by the first two components of the 3rd row of T_cp. The transformation relationship between the robot coordinate system and the target coordinate system can then be expressed as:

$$p_r = T_{rp}\, p_p,\qquad T_{rp} = \begin{pmatrix} \cos\gamma & \sin\gamma & 0 & x_r \\ -\sin\gamma & \cos\gamma & 0 & y_r \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{4}$$
where γ is the rotation angle between the robot coordinate system and the target coordinate system, and (x_r, y_r), the coordinates of the robot origin in the target coordinate system, can be determined from the translation vector between the camera and the target. The transformation between the robot and the camera can therefore be expressed as T_cr = T_cp T_pr, with T_pr = T_rp^{-1}. From the pose relationship between the camera and the target plane, the ground plane equation in the camera coordinate system can be obtained:
$$(0,0,1,0)\, T_{pc}\, p_c = 0 \tag{5}$$
where p_c is a three-dimensional point in the camera coordinate system. The ground plane parameters can thus be represented as (n^T, d), with n = (0, 0, 1) R_pc and d = t_pc(3), the third component of the translation of T_pc.
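The calibration steps above can be summarized in the short Python sketch below. It is a hypothetical illustration under the stated assumptions (robot origin coincident with the camera, robot x-axis aligned with the camera z-axis); in particular, reading the robot origin (x_r, y_r) directly from the camera-target translation is an assumption of this sketch, not a statement of the patented procedure.

```python
import numpy as np

def ground_calibration(T_cp):
    """From the camera-to-target transform T_cp (4x4), derive the robot-camera
    transform T_cr (equation (4)) and the ground plane parameters (n, d) of
    equation (5) in the camera coordinate system."""
    T_pc = np.linalg.inv(T_cp)
    R_pc, t_pc = T_pc[:3, :3], T_pc[:3, 3]

    # Robot x-direction in the target frame: first two components of row 3 of T_cp
    dx, dy = T_cp[2, 0], T_cp[2, 1]
    gamma = np.arctan2(dy, dx)                 # rotation between robot and target frames
    x_r, y_r = t_pc[0], t_pc[1]                # robot origin in the target frame (assumed)

    c, s = np.cos(gamma), np.sin(gamma)
    T_rp = np.eye(4)
    T_rp[:3, :3] = np.array([[ c, s, 0], [-s, c, 0], [0, 0, 1]])
    T_rp[:2, 3] = [x_r, y_r]

    T_cr = T_cp @ np.linalg.inv(T_rp)          # T_cr = T_cp * T_pr

    # Ground plane in the camera frame: n = (0, 0, 1) R_pc, d = t_pc(3)
    n = np.array([0.0, 0.0, 1.0]) @ R_pc
    d = t_pc[2]
    return T_cr, n, d
```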
Step 1.2, Camera calibration
Distortion exists because of lens manufacturing; it is mainly divided into radial distortion caused by the lens shape and tangential distortion caused by the camera assembly process. Correction of the monocular camera is realized with an FOV model, the correction model being shown in Fig. 3. This model is the wide-angle camera de-distortion method proposed by Devernay and Faugeras, with the following mathematical description:

$$u_d = \begin{bmatrix} u_0 \\ v_0 \end{bmatrix} + \begin{bmatrix} f_u & 0 \\ 0 & f_v \end{bmatrix} \frac{r_d}{r_u}\, x_u,\qquad r_d = \frac{1}{\omega}\arctan\!\left(2 r_u \tan\frac{\omega}{2}\right),\qquad r_u = \frac{\tan(r_d\,\omega)}{2\tan\frac{\omega}{2}} \tag{6}$$
In the formula, x_u is the normalized pixel coordinate, u_d is the distorted pixel coordinate, ω is the distortion coefficient, and r_d, r_u are the distances from the normalized coordinates to the center of the coordinate system before and after correction, respectively.
Using this camera model, the image pixel coordinates are mapped to the normalized coordinate plane, and image distortion correction is realized in combination with the camera intrinsic parameter matrix K, namely:

$$u = K x_u \tag{7}$$
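To make the FOV correction concrete, here is a minimal Python sketch of equations (6)-(7); the intrinsic parameters and the distortion coefficient ω are made-up example values, not the ones used in the method.

```python
import numpy as np

# Hypothetical intrinsics and FOV distortion coefficient, for illustration only
fu, fv, u0, v0 = 500.0, 500.0, 320.0, 240.0
omega = 0.9
K = np.array([[fu, 0, u0], [0, fv, v0], [0, 0, 1.0]])

def undistort_point(ud):
    """Map a distorted pixel ud to an undistorted pixel u = K x_u using the FOV
    model of equation (6): r_u = tan(r_d * omega) / (2 * tan(omega / 2))."""
    xd = np.array([(ud[0] - u0) / fu, (ud[1] - v0) / fv])   # normalized distorted coords
    rd = np.linalg.norm(xd)
    if rd < 1e-9:
        xu = xd
    else:
        ru = np.tan(rd * omega) / (2.0 * np.tan(omega / 2.0))
        xu = xd * (ru / rd)
    u = K @ np.array([xu[0], xu[1], 1.0])                   # equation (7): u = K x_u
    return u[:2]

print(undistort_point(np.array([400.0, 300.0])))
```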
step 2, initialization based on ground features
Step 2.1, feature tracking
FAST corner detection is a simple and fast corner detection algorithm proposed by Edward Rosten and Tom Drummond. The algorithm judges whether a pixel is a corner by using the difference between the gray values of the pixel and of neighboring pixels within a certain neighborhood (usually a circular region), as given by the following formula:

$$N = \sum_{s \in c(u_f)} \left| I(s) - I(u_f) \right| \tag{8}$$
where u_f is a candidate corner, s is any point in the neighborhood of the candidate corner, c(u_f) denotes the neighborhood of u_f, and I(s) is the image gray value at s. If N exceeds the threshold condition, the candidate point u_f is accepted as a corner.
Before a camera tracking thread, a system acquires current environment texture information by using a camera, constructs a four-layer Gaussian image pyramid, adopts a corner detection algorithm of FAST, establishes data correlation among corner features in a block search mode, predicts the corresponding relation between a three-dimensional point and a feature point in a current frame by using a projection relation, and searches a fixed area near the position to acquire an accurate matching relation.
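A minimal sketch of the pyramid-based FAST front end described above, using OpenCV's FAST detector as a stand-in for the corner detector; the threshold value and the mapping of keypoints back to the base resolution are illustrative assumptions.

```python
import cv2

def pyramid_fast_corners(gray, levels=4, fast_threshold=20):
    """Detect FAST corners on a four-level Gaussian image pyramid and return
    (x, y, level) tuples in full-resolution coordinates."""
    fast = cv2.FastFeatureDetector_create(threshold=fast_threshold,
                                          nonmaxSuppression=True)
    corners = []
    img = gray
    for level in range(levels):
        kps = fast.detect(img, None)
        scale = 2 ** level
        corners.extend([(kp.pt[0] * scale, kp.pt[1] * scale, level) for kp in kps])
        img = cv2.pyrDown(img)      # next (coarser) Gaussian pyramid level
    return corners

# Usage sketch:
# gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
# pts = pyramid_fast_corners(gray)
```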
Step 2.2, pose initialization
In the moving process of the robot, ground information is acquired through a camera, and tracking of corner features is achieved through a FAST algorithm. According to the data association information, a homography relation of the plane points can be established, namely:
$$\pi(H u_r) = u_p \tag{9}$$
where π(p) is the projection function, with p = (x, y, z) and π(p) = (x/z, y/z, 1), and u_r, u_p are the corresponding feature points in the reference frame and the current frame, respectively. From the camera motion T_c and the ground plane equation parameters (n^T, d), the homography of the plane points can be established:
$$H = K\, T_c\, (I \mid n_d)^T K^{-1} \tag{10}$$
where n_d = n/d and K is the camera intrinsic parameter matrix. Substituting formula (3) into formula (10), the homography relationship between the robot pose and the ground features can be described as:
$$H = K\, T_{cr} T_{rs} T_{rc}\, (I \mid n_d)^T K^{-1} \tag{11}$$
by utilizing the homography relation of the image feature points, a pose estimation model can be established:
where r(x) = π(H(x) u_ref) - u_p is the homography error function, i.e. the projection error function, and H(x) is the homography matrix. Under the small-displacement assumption, the first-order Taylor expansion of the error function r is:

$$r(\xi) = r(0) + \left.\frac{\partial r}{\partial \xi}\right|_{\xi=0} \xi \tag{13}$$
In this case, the extreme-value condition of the energy function is:

$$\frac{\partial f}{\partial \xi} = \sum J^T r = 0 \tag{14}$$
namely:

$$\sum_p J^T \big(r(0) + J\xi\big) = 0,\qquad \xi = -\Big(\sum_p J^T J\Big)^{-1} \sum_p J^T r(0) \tag{15}$$
wherein,the pose estimation of the mobile robot can be obtained by iteratively solving the above formula. In order to further improve the positioning accuracy, a robustness weight is introduced to construct a pose estimation model based on M-animation, and at the moment, an energy function is expressed as:
where w(r) is a robust Tukey weighting function, and the extreme-value condition of the energy function is:

$$\xi = -\Big(\sum_p J^T w(r) J\Big)^{-1} \sum_p J^T w(r)\, r(0) \tag{17}$$
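The iteration of equations (15)-(17) is an iteratively reweighted Gauss-Newton scheme. The sketch below is illustrative only: it assumes the caller supplies functions that stack the per-point homography errors r and their Jacobians J, and the Tukey tuning constant and median-based scale normalization are conventional choices added here, not values taken from the patent.

```python
import numpy as np

def tukey_weight(r, c=4.685):
    """Tukey biweight: w(r) = (1 - (r/c)^2)^2 for |r| < c, else 0 (expects an array)."""
    a = np.abs(r) / c
    w = (1.0 - a**2) ** 2
    w[a >= 1.0] = 0.0
    return w

def robust_pose_update(residual_fn, jacobian_fn, xi0, iters=10):
    """Reweighted Gauss-Newton step of equation (17):
    xi = -(sum J^T w J)^{-1} sum J^T w r."""
    xi = xi0.copy()
    for _ in range(iters):
        r = residual_fn(xi)                 # stacked projection errors r(xi)
        J = jacobian_fn(xi)                 # stacked Jacobians dr/dxi
        w = tukey_weight(r / (np.median(np.abs(r)) + 1e-9))   # scale-normalized residuals
        JTw = J.T * w                       # broadcast weights over the columns of J^T
        delta = -np.linalg.solve(JTw @ J, JTw @ r)
        xi = xi + delta
        if np.linalg.norm(delta) < 1e-8:
            break
    return xi
```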
step 2.3, plane parameter estimation
The robot achieves positioning by using the homography relation based on ground features, but the positioning accuracy is hard to guarantee under the influence of non-ground features, so the invention adopts a fast non-ground-feature removal method. One option is to construct a normal estimation model from the cross product of parallel vectors; however, experiments show that corner detection carries a certain error, so the ideal cross-product conditions cannot be met and such a normal estimation model adapts poorly. Another option is to use the projection error r as the criterion for determining non-ground feature information, but its discrimination decreases significantly as the distance between the non-ground features and the observation point decreases. In contrast, the present method, starting from the initial pose estimate, derives the normal estimation model directly from the homography matrix and judges whether a corner belongs to the ground features by the difference between the normal estimate and the parameters of the ground plane equation. From the plane homography relation, an estimation model of the plane parameter n_d can be established:

$$f(n_d) = \sum_p \| r(n_d) \|^2 \tag{18}$$
where r(n_d) = π(H(n_d) u_ref) - u_p is the corresponding projection error function. Similarly to the pose estimation solution, on the basis of the small-displacement assumption a solving model for equation (18) can be built from the first-order Taylor expansion of the error function.
The plane parameters can then be obtained by iteratively solving equation (18):

$$n_d = n_d + \hat{n},\qquad \hat{n} = -\Big(\sum_p J_n^T J_n\Big)^{-1} \sum_p J_n^T r(n_d) \tag{19}$$
where J_n = ∂r/∂n_d is the Jacobian of the projection error with respect to the plane parameter.
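Analogously, the plane-parameter refinement of equations (18)-(19) and the ground/non-ground decision can be sketched as follows; the residual and Jacobian callbacks and the rejection threshold are assumptions of this illustration.

```python
import numpy as np

def refine_plane_parameter(nd0, residual_fn, jacobian_fn, iters=5):
    """Gauss-Newton refinement of the plane parameter n_d = n/d per equation (19):
    n_d <- n_d - (sum J_n^T J_n)^{-1} sum J_n^T r(n_d)."""
    nd = nd0.copy()
    for _ in range(iters):
        r = residual_fn(nd)       # stacked projection errors r(n_d)
        Jn = jacobian_fn(nd)      # stacked Jacobians dr/dn_d
        step = -np.linalg.solve(Jn.T @ Jn, Jn.T @ r)
        nd = nd + step
    return nd

def is_ground_feature(nd_est, nd_plane, tol=0.05):
    """Classify a corner as a ground feature if its estimated n_d is close to the
    calibrated ground plane parameters (the tolerance is an illustrative value)."""
    return np.linalg.norm(nd_est - nd_plane) < tol
```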
step 3, pose estimation
The main function of the pose estimation module is to keep the camera pose updated in real time. The specific implementation process is as follows. After the initialization work is finished, a pose estimation model is established from the projection error, with the mathematical description:

$$\xi = \arg\min_{\xi} \sum_j \mathrm{Obj}\!\left(\frac{|e_j|}{\sigma_j},\, \sigma_T\right) \tag{20}$$
where e_j is the projection error, Obj(·, σ_T) is the Tukey biweight objective function, σ_T is the unbiased estimate of the feature-point matching standard deviation, and μ is the six-tuple description vector of the camera pose.
For the failure cases that occur during tracking, a feature observation ratio is proposed to evaluate the tracking quality of each image frame. If the ratio is lower than the set threshold, the tracking quality is considered poor; tracking continues, but the system does not update this frame into the map. If the feature observation ratio stays below the set threshold over multiple consecutive frames, the system is considered to be in a 'tracking lost' state, and the tracking process is re-initialized. Through these steps the pose is estimated accurately and the tracking process proceeds normally.
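A minimal sketch of the feature-observation-ratio check described above; the threshold and the number of consecutive poor frames that triggers the 'tracking lost' state are illustrative values, since the text only states that thresholds are set.

```python
class TrackingQualityMonitor:
    """Track the feature observation ratio and decide between GOOD / POOR / LOST."""

    def __init__(self, ratio_threshold=0.3, max_bad_frames=5):
        self.ratio_threshold = ratio_threshold
        self.max_bad_frames = max_bad_frames
        self.bad_frames = 0

    def update(self, n_observed, n_predicted):
        ratio = n_observed / max(n_predicted, 1)
        if ratio >= self.ratio_threshold:
            self.bad_frames = 0
            return "GOOD"            # frame may be promoted to a key frame
        self.bad_frames += 1
        if self.bad_frames >= self.max_bad_frames:
            return "LOST"            # re-initialize the tracking process
        return "POOR"                # keep tracking, but do not add the frame to the map
```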
Step 4, map creation
Step 4.1, map initialization
The system utilizes the ground features and a standard stereo camera algorithm model to establish a current environment initialization map. In the process of initializing the map, the FAST corner matching relation in the image is utilized, the RANSAC algorithm is combined, an initial pose estimation method based on ground features is adopted, a triangulation method is used for calculating the three-dimensional coordinates of the current feature point, the initial point is determined to be the origin of a global world coordinate system, and the initialization of the three-dimensional sparse map is completed.
In the initial mapping stage, the specific process of acquiring the two key frames is as follows. The camera is mounted on the mobile robot, and the initialization process is started after the camera has been calibrated with respect to the ground. During initialization the mobile robot starts to move while the camera captures corner information in the current scene and establishes associations, and an initial estimate of the current pose is obtained by solving formula (12). When the initial estimate exceeds a set threshold d_0, the pose estimation method based on ground features described in step 2.2 is used to acquire an accurate estimate of the current pose.
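The two-view initialization can be sketched with standard OpenCV building blocks: RANSAC over the FAST matches followed by triangulation from the two key-frame poses. The function below is a simplified illustration; in particular, the way the key-frame poses are supplied is an assumption, since in the method they come from the ground-feature pose estimation of step 2.2.

```python
import cv2
import numpy as np

def initialize_map(pts_ref, pts_cur, K, T_ref, T_cur):
    """Triangulate matched corners from the two initial key frames.
    T_ref, T_cur are assumed 4x4 world-to-camera transforms; RANSAC on a
    homography (planar ground assumption) is used only to reject outlier matches."""
    pts_ref = np.asarray(pts_ref, dtype=np.float64)
    pts_cur = np.asarray(pts_cur, dtype=np.float64)

    _, inliers = cv2.findHomography(pts_ref, pts_cur, cv2.RANSAC, 3.0)
    mask = inliers.ravel().astype(bool)

    P_ref = K @ T_ref[:3, :]
    P_cur = K @ T_cur[:3, :]
    pts4d = cv2.triangulatePoints(P_ref, P_cur, pts_ref[mask].T, pts_cur[mask].T)
    pts3d = (pts4d[:3] / pts4d[3]).T      # homogeneous -> Euclidean map points
    return pts3d
```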
Step 4.2, keyframe update and epipolar line search
When the map has just been initialized it contains only two key frames and can describe only a small neighboring volume of space, so new key frames and map features must be added once the camera leaves the initial position. The key-frame adding process is executed automatically when the number of image frames since the current key frame exceeds the threshold condition, the camera tracking quality is good, and the distance between the camera and the nearest map key point meets the distance condition. Due to real-time constraints the tracking system may measure only a subset of the visible features in each frame, so the mapping thread needs to revisit and measure the remaining map features.
During key-frame updating, the system first performs Shi-Tomasi evaluation on all corners in the newly added key frame to obtain the corner information with salient features; salient points near feature points that have already been observed successfully are discarded, and the retained salient points serve as candidate map nodes. Because a new map node needs known depth information that cannot be obtained from a single image, the key frame closest to the new one is selected as a second view, and the sub-pixel-precision mapping relation of the feature points is established with epipolar search and block matching. In the second view a zero-mean SSD score is adopted, and accurate relocalization of the camera is achieved in combination with the pose estimation model. Finally, the matched points are projected into space to generate the current global environment three-dimensional map.
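The zero-mean SSD block matching along the epipolar line can be illustrated as below; generating the candidate positions along the epipolar line (from the assumed depth range) is left to the caller, and the patch half-size is an assumption of this sketch.

```python
import numpy as np

def zero_mean_ssd(patch_a, patch_b):
    """Zero-mean SSD score between two equally sized image patches."""
    a = patch_a.astype(np.float64) - patch_a.mean()
    b = patch_b.astype(np.float64) - patch_b.mean()
    return np.sum((a - b) ** 2)

def epipolar_block_match(img_ref, img_cur, ref_uv, candidate_uvs, half=4):
    """Among candidate positions sampled along the epipolar line in the second
    view, return the one whose patch best matches the reference patch."""
    x, y = int(ref_uv[0]), int(ref_uv[1])
    ref_patch = img_ref[y - half:y + half + 1, x - half:x + half + 1]

    best_uv, best_score = None, np.inf
    for (u, v) in candidate_uvs:
        u, v = int(u), int(v)
        patch = img_cur[v - half:v + half + 1, u - half:u + half + 1]
        if patch.shape != ref_patch.shape:
            continue                       # skip candidates too close to the image border
        score = zero_mean_ssd(ref_patch, patch)
        if score < best_score:
            best_uv, best_score = (u, v), score
    return best_uv, best_score
```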
Step 4.3, optimization
Global consistency optimization of the current map is realized with an optimal local and global LM (Levenberg-Marquardt) bundle adjustment algorithm to improve robustness. The mathematical description of the bundle adjustment algorithm is as follows:

$$\big\{\{\mu_2 \ldots \mu_N\},\, \{p'_1 \ldots p'_M\}\big\} = \arg\min_{\{\{\mu\},\{p\}\}} \sum_{i \in X \cup Y}\ \sum_{j \in S_i} \mathrm{Obj}\!\left(\frac{|e_{ji}|}{\sigma_{ji}},\, \sigma_T\right) \tag{21}$$
where σ_ji is the unbiased estimate of the matching standard deviation of the j-th FAST feature point in the i-th key frame, μ_i is the pose six-tuple description vector of the i-th key frame, and p_i are points in the global map.
The robust cost function in (21) is minimized by continual bundle adjustment. Global bundle adjustment optimizes the poses of all key frames and the positions of all map points while exploiting the inherent sparsity of the structure-from-motion problem, so that the complexity of the overall matrix decomposition is reduced by a cubic factor, from O((N+M)^3) down to O(N^3). However, in most estimation cases the computation is dominated by the outer products that must be generated, and the computational cost grows significantly as the map grows.
To address this problem, local bundle adjustment is introduced, in which only the poses of a subset of the key frames are adjusted; the objective function then becomes:

$$\big\{\{\mu_x \in X\},\, \{p'_z \in Z\}\big\} = \arg\min_{\{\{\mu\},\{p\}\}} \sum_{i \in X \cup Y}\ \sum_{j \in Z \cap S_i} \mathrm{Obj}(i, j) \tag{22}$$
x is a set to be adjusted of the series key frames, Y is a fixed frame set, and Z is a map point subset. Set X consists of the latest key frame and the 4 closest key frames in the map, and all the viewpoints in the map of key frames constitute set Z, and finally Y contains all the key frames obtained from the measurement data in Z. Therefore, the local clustering adjustment optimizes the pose of the latest and latest key frame, updates all points in the map, and meanwhile, the complexity is remarkably reduced and does not exceed O (NM).
Compared with the original PTAM algorithm, the invention improves the algorithm by adding ground features and realizes the following advantages:
(1) the initialization process is improved, and the strict limitation of the motion of the camera by the PTAM algorithm is effectively removed.
(2) The relatively accurate metric map can be effectively established, and reliable guarantee is provided for further environmental measurement.
Drawings
FIG. 1 is a flow chart of a method according to the present invention;
FIG. 2 camera calibration model;
FIG. 3 is a schematic view of a planar parameter calibration.
Detailed Description
The invention is further described in detail below with reference to the accompanying drawings.
The flow chart of the PTAM improved algorithm based on ground features is shown in Fig. 1; the method specifically comprises the following steps:
step 1, parameter correction
Step 1.1, parameter definition
And establishing a robot pose representation according to the relation between the robot coordinate system and the world coordinate system, and determining ground plane calibration parameters according to the pose relation between the camera and the target plane.
Step 1.2, Camera calibration
And correcting the monocular camera by adopting an FOV model, mapping the image pixel coordinates to a normalized coordinate plane, and simultaneously combining a parameter matrix K in the camera to realize image distortion correction.
Step 2, initialization based on ground features
Step 2.1, feature tracking
And acquiring environment texture information, extracting features and establishing corner feature association.
Step 2.2, pose initialization
And establishing a homography relation and a pose initialization estimation model to realize pose initialization.
Step 2.3, plane parameter estimation
Step 3, pose estimation
On the basis of completed initialization, the initial pose estimate of the camera is obtained from the ground-feature-based pose estimation model. Then, starting from this initial camera pose, PTAM establishes a sub-pixel-precision matching relation for corner features in the image pyramid by epipolar search, and applies the pose estimation model again to achieve accurate relocalization of the camera.
Step 4, map creation
Step 4.1, map initialization
The system establishes the current environment initialization map using the ground features and a standard stereo camera algorithm model.
Step 4.2, keyframe update and epipolar line search
And adding a new key frame, establishing a sub-pixel precision mapping relation of the feature points by using an epipolar search and block matching method, and realizing accurate relocation of the camera by combining a pose estimation model. And finally, projecting the matching points to a space to complete the creation of the current global environment three-dimensional map.
Step 4.3, optimization
And the global and local clustering adjustment is utilized to realize the consistency optimization of the map.
Claims (1)
1. A PTAM improvement method based on intelligent robot ground features is characterized in that:
firstly, parameter correction is completed, which comprises parameter definition and camera correction; features in the key frames are tracked, the pose is initialized, and the plane parameters are estimated on this basis; then the current environment texture information is acquired with a camera, a four-layer Gaussian image pyramid is constructed, the feature information in the current image is extracted with the FAST corner detection algorithm, and data association among the corner features is established to obtain a pose estimation model; in the initial mapping stage two key frames are acquired, the camera is mounted on a mobile robot, and the initialization process is started after the camera has been calibrated with the ground; during initialization the mobile robot starts to move while the camera captures corner information in the current scene and establishes associations, the pose estimation model gives an initial estimate of the current pose, and a pose estimation method based on ground features is then used to acquire an accurate estimate of the current pose; after the initialization of the three-dimensional sparse map, the key frames are updated, the sub-pixel-precision mapping relation of the feature points is established with epipolar search and block matching, and accurate relocalization of the camera is realized in combination with the pose estimation model; finally, the matched points are projected into space to complete the creation of the current global environment three-dimensional map; meanwhile, the consistency optimization of the algorithm is completed with global and local bundle adjustment;
the method specifically comprises the following steps:
step 1, parameter correction
Step 1.1, parameter definition
(1) Robot pose
A world coordinate system is established at the starting position of the mobile robot, and the pose of the mobile robot during its motion is expressed as:
$$T_r = \begin{pmatrix} R_r & t_r \\ 0 & 1 \end{pmatrix},\qquad R_r = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},\qquad t_r = \begin{pmatrix} x \\ y \\ 0 \end{pmatrix} \tag{1}$$
From the above formula it can be seen that the pose of the mobile robot is described by x = (x, y, θ), where x and y represent the translation of the mobile robot along the x-axis and y-axis directions respectively, and θ is the rotation angle of the mobile robot about the z-axis;
in the world coordinate system, the pose of the camera is expressed as:
$$T_c = \begin{pmatrix} R_c & t_c \\ 0 & 1 \end{pmatrix} \tag{2}$$
where T_c ∈ SE(3), SE(3) is the special rigid-transformation group, R_c is a 3×3 rotation matrix, and t_c is a 3×1 translation vector; this pose matrix establishes the correspondence between a point p_w in the world coordinate system and the same point p_c in the camera coordinate system, i.e. p_c = T_c p_w; in the method, the camera is fixed on the mobile robot and there is a fixed rotation-translation relationship T_cr between the camera and the robot; during the motion of the robot, if the pose increment of the robot over a certain time interval is T_rs, the corresponding pose increment of the camera is T_cs, and then:
$$T_{rs} = T_{rc} T_{cs} T_{cr} \tag{3}$$
where T_rc = T_cr^{-1};
(2) ground plane calibration parameters
The camera is fixed on the mobile robot, and the calibration target is laid flat on the ground; the camera establishes the pose relationship between the camera and the robot by acquiring image information containing the calibration target; T_cp ∈ SE(3) is the transformation matrix between the camera and the calibration target, R_cp and t_cp denote the rotation and translation components of T_cp respectively, and the inverse transformation is T_pc = T_cp^{-1}; assuming that the coordinate origin of the robot coincides with the camera and that the robot x-axis is oriented in the same direction as the camera z-axis, in the target coordinate system the x-direction vector of the robot is represented by the first two components of the 3rd row of T_cp; at this time, the transformation relationship between the robot coordinate system and the target coordinate system is expressed as:
$$p_r = T_{rp}\, p_p,\qquad T_{rp} = \begin{pmatrix} \cos\gamma & \sin\gamma & 0 & x_r \\ -\sin\gamma & \cos\gamma & 0 & y_r \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{4}$$
where γ is the rotation angle between the robot coordinate system and the target coordinate system, and (x_r, y_r), the coordinates of the robot origin in the target coordinate system, are determined from the translation vector between the camera and the target; the transformation between the robot and the camera is therefore denoted T_cr = T_cp T_pr, with T_pr = T_rp^{-1}; from the pose relationship between the camera and the target plane, the ground plane equation in the camera coordinate system is acquired:
$$(0,0,1,0)\, T_{pc}\, p_c = 0 \tag{5}$$
where p_c is a three-dimensional point in the camera coordinate system; the ground plane parameters are expressed as (n^T, d), with n = (0, 0, 1) R_pc and d = t_pc(3);
Step 1.2, Camera calibration
Distortion exists because of lens manufacturing; it is mainly divided into radial distortion caused by the lens shape and tangential distortion caused by the camera assembly process; the method adopts an FOV model to realize the correction of the monocular camera; the model is the wide-angle camera de-distortion method proposed by Devernay and Faugeras, with the following mathematical description:
$$u_d = \begin{bmatrix} u_0 \\ v_0 \end{bmatrix} + \begin{bmatrix} f_u & 0 \\ 0 & f_v \end{bmatrix} \frac{r_d}{r_u}\, x_u,\qquad r_d = \frac{1}{\omega}\arctan\!\left(2 r_u \tan\frac{\omega}{2}\right),\qquad r_u = \frac{\tan(r_d\,\omega)}{2\tan\frac{\omega}{2}} \tag{6}$$
In the formula, x_u is the normalized pixel coordinate, u_d is the distorted pixel coordinate, ω is the distortion coefficient, and r_d, r_u are the distances from the normalized coordinates to the center of the coordinate system before and after correction, respectively;
Using the camera model, the image pixel coordinates are mapped to the normalized coordinate plane, and image distortion correction is realized in combination with the camera intrinsic parameter matrix K, namely:
$$u = K x_u \tag{7}$$
step 2, initialization based on ground features
Step 2.1, feature tracking
FAST corner detection is a simple and fast corner detection algorithm proposed by Edward Rosten and Tom Drummond; the algorithm judges whether a pixel is a corner by using the difference between the gray values of the pixel and of neighboring pixels within a certain neighborhood, as given by the following formula:
$$N = \sum_{s \in c(u_f)} \left| I(s) - I(u_f) \right| \tag{8}$$
where u_f is a candidate corner, s is any point in the neighborhood of the candidate corner, c(u_f) denotes the neighborhood of u_f, and I(s) is the image gray value at s; if N exceeds the threshold condition, the candidate point u_f is accepted as a corner;
before a camera tracking thread, a system acquires current environment texture information by using a camera, constructs a four-layer Gaussian image pyramid, adopts a corner detection algorithm of FAST, establishes data correlation among corner features in a block search mode, predicts the corresponding relation between a three-dimensional point and a feature point in a current frame by using a projection relation, and searches a fixed area near the position to acquire an accurate matching relation;
step 2.2, pose initialization
In the moving process of the robot, acquiring ground information through a camera, and tracking corner features by using a FAST algorithm; according to the data association information, establishing a homography relation of the plane points, namely:
$$\pi(H u_r) = u_p \tag{9}$$
where π(p) is the projection function, with p = (x, y, z) and π(p) = (x/z, y/z, 1), and u_r, u_p are the corresponding feature points in the reference frame and the current frame, respectively; from the camera motion T_c and the ground plane equation parameters (n^T, d), the homography relation of the plane points is established:
$$H = K\, T_c\, (I \mid n_d)^T K^{-1} \tag{10}$$
where n_d = n/d and K is the camera intrinsic parameter matrix; substituting formula (3) into formula (10), the homography relationship between the robot pose and the ground features is described as:
$$H = K\, T_{cr} T_{rs} T_{rc}\, (I \mid n_d)^T K^{-1} \tag{11}$$
establishing a pose estimation model by using the homography relation of the image feature points:
$$x = \arg\min_x f(x),\qquad f(x) = \sum_p \| r(x) \|^2 \tag{12}$$
where r(x) = π(H(x) u_ref) - u_p is the homography error function, i.e. the projection error function, and H(x) is the homography matrix; under the small-displacement assumption, the first-order Taylor expansion of the error function r is:
$$r(\xi) = r(0) + \left.\frac{\partial r}{\partial \xi}\right|_{\xi=0} \xi \tag{13}$$
in this case, the extreme condition of the energy function is:
$$\frac{\partial f}{\partial \xi} = \sum J^T r = 0 \tag{14}$$
namely:
$$\sum_p J^T \big(r(0) + J\xi\big) = 0,\qquad \xi = -\Big(\sum_p J^T J\Big)^{-1} \sum_p J^T r(0) \tag{15}$$
where J = ∂r/∂ξ is the Jacobian of the error function with respect to the pose increment ξ; the pose estimate of the mobile robot is obtained by iteratively solving the above formula; to further improve the positioning accuracy, robustness weights are introduced to construct an M-estimation-based pose estimation model, and the energy function is then expressed as:
$$f(x) = \sum_p w(r)\, \| r \|^2 \tag{16}$$
where w(r) is a robust Tukey weighting function, and the extreme-value condition of the energy function is:
$$\xi = -\Big(\sum_p J^T w(r) J\Big)^{-1} \sum_p J^T w(r)\, r(0) \tag{17}$$
step 2.3, plane parameter estimation
The robot achieves positioning by using the homography relation based on ground features, but the positioning accuracy is hard to guarantee under the influence of non-ground features, so a fast non-ground-feature removal method is adopted; one option is to construct a normal estimation model from the cross product of parallel vectors, but because corner detection carries a certain error the ideal cross-product conditions cannot be met and such a normal estimation model adapts poorly; another option is to use the projection error r as the criterion for determining non-ground feature information, but its discrimination decreases significantly as the distance between the non-ground features and the observation point decreases; in contrast, the present method, starting from the initial pose estimate, derives the normal estimation model directly from the homography matrix and judges whether a corner belongs to the ground features by the difference between the normal estimate and the parameters of the ground plane equation; from the plane homography relation, an estimation model of the plane parameter n_d is established:
$$f(n_d) = \sum_p \| r(n_d) \|^2 \tag{18}$$
where r(n_d) = π(H(n_d) u_ref) - u_p is the corresponding projection error function; similarly to the pose estimation solution, on the basis of the small-displacement assumption a solving model for equation (18) is built from the first-order Taylor expansion of the error function;
at this time, the plane parameters are obtained by iterative equation (18):
$$n_d = n_d + \hat{n},\qquad \hat{n} = -\Big(\sum_p J_n^T J_n\Big)^{-1} \sum_p J_n^T r(n_d) \tag{19}$$
where J_n = ∂r/∂n_d is the Jacobian of the projection error with respect to the plane parameter;
step 3, pose estimation
The pose estimation module has the main function of completing the real-time update of the pose of the camera; the specific implementation process is as follows; after the initialization work is finished, a pose estimation model is established according to the projection error, and the mathematical description is as follows:
$$\xi = \arg\min_{\xi} \sum_j \mathrm{Obj}\!\left(\frac{|e_j|}{\sigma_j},\, \sigma_T\right) \tag{20}$$
where e_j is the projection error, Obj(·, σ_T) is the Tukey biweight objective function, σ_T is the unbiased estimate of the feature-point matching standard deviation, and μ is the six-tuple description vector of the camera pose;
For the failure cases that occur during tracking, a feature observation ratio is proposed to evaluate the tracking quality of each image frame; if the ratio is lower than the set threshold, the tracking quality is considered poor, tracking continues, but the system does not update this frame into the map; if the feature observation ratio stays below the set threshold over multiple consecutive frames, the system is considered to be in a 'tracking lost' state, and the tracking process is re-initialized; through these steps the pose is estimated accurately and the tracking process proceeds normally;
step 4, map creation
Step 4.1, map initialization
The system utilizes the ground characteristics and a standard stereo camera algorithm model to establish a current environment initialization map; in the process of initializing the map, utilizing a FAST corner matching relation in an image, combining a RANSAC algorithm, adopting an initial pose estimation method based on ground features, calculating three-dimensional coordinates of a current feature point by using a triangulation method, and determining an initial point as an origin of a global world coordinate system to complete initialization of the three-dimensional sparse map;
In the initial mapping stage, the specific process of acquiring the two key frames is as follows: the camera is mounted on a mobile robot, and the initialization process is started after the camera has been calibrated with respect to the ground; during initialization the mobile robot starts to move while the camera captures corner information in the current scene and establishes associations, and an initial estimate of the current pose is obtained by solving formula (12); when the initial estimate exceeds a set threshold d_0, the pose estimation method based on ground features of step 2.2 is used to acquire an accurate estimate of the current pose;
step 4.2, keyframe update and epipolar line search
When the map has just been initialized it contains only two key frames and can describe only a small neighboring volume of space, so new key frames and map features must be added once the camera leaves the initial position; the key-frame adding process is executed automatically when the number of image frames since the current key frame exceeds the threshold condition, the camera tracking quality is good, and the distance between the camera and the nearest map key point meets the distance condition; due to real-time constraints the tracking system may measure only a subset of the visible features in each frame, so the mapping thread needs to revisit and measure the remaining map features;
During key-frame updating, the system first performs Shi-Tomasi evaluation on all corners in the newly added key frame to obtain the corner information with salient features; salient points near feature points that have already been observed successfully are discarded, and the retained salient points serve as candidate map nodes; because a new map node needs known depth information that cannot be obtained from a single image, the key frame closest to the new one is selected as a second view, and the sub-pixel-precision mapping relation of the feature points is established with epipolar search and block matching; in the second view a zero-mean SSD score is adopted, and accurate relocalization of the camera is realized in combination with the pose estimation model; finally, the matched points are projected into space to generate the current global environment three-dimensional map;
step 4.3, optimization
Global consistency optimization of the current map is realized with an optimal local and global LM bundle adjustment algorithm to improve robustness; the mathematical description of the bundle adjustment algorithm is as follows:
$$\big\{\{\mu_2 \ldots \mu_N\},\, \{p'_1 \ldots p'_M\}\big\} = \arg\min_{\{\{\mu\},\{p\}\}} \sum_{i \in X \cup Y}\ \sum_{j \in S_i} \mathrm{Obj}\!\left(\frac{|e_{ji}|}{\sigma_{ji}},\, \sigma_T\right) \tag{21}$$
where σ_ji is the unbiased estimate of the matching standard deviation of the j-th FAST feature point in the i-th key frame, μ_i is the pose six-tuple description vector of the i-th key frame, and p_i are points in the global map;
The robust cost function in formula (21) is minimized by continual bundle adjustment; global bundle adjustment optimizes the poses of all key frames and the positions of all map points while exploiting the inherent sparsity of the structure-from-motion problem, so that the complexity of the overall matrix decomposition is reduced by a cubic factor, from O((N+M)^3) down to O(N^3); however, in most estimation cases the computation is dominated by the outer products that must be generated, and the computational cost grows significantly as the map grows;
To address this problem, local bundle adjustment is introduced, in which only the poses of a subset of the key frames are adjusted; the objective function then becomes:
$$\big\{\{\mu_x \in X\},\, \{p'_z \in Z\}\big\} = \arg\min_{\{\{\mu\},\{p\}\}} \sum_{i \in X \cup Y}\ \sum_{j \in Z \cap S_i} \mathrm{Obj}(i, j) \tag{22}$$
where X is the set of key frames to be adjusted, Y is the set of fixed key frames, and Z is the subset of map points; the set X consists of the latest key frame and the 4 key frames closest to it in the map, all map points visible in the key frames of X constitute the set Z, and Y contains all key frames from which measurements of points in Z were obtained; local bundle adjustment therefore optimizes the poses of the newest key frames and updates the associated map points, while the complexity is significantly reduced and does not exceed O(NM).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510025810.7A CN104732518B (en) | 2015-01-19 | 2015-01-19 | A kind of PTAM improved methods based on intelligent robot terrain surface specifications |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104732518A CN104732518A (en) | 2015-06-24 |
CN104732518B true CN104732518B (en) | 2017-09-01 |
Family
ID=53456386
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510025810.7A Expired - Fee Related CN104732518B (en) | 2015-01-19 | 2015-01-19 | A kind of PTAM improved methods based on intelligent robot terrain surface specifications |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104732518B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109102525A (en) * | 2018-07-19 | 2018-12-28 | 浙江工业大学 | A kind of mobile robot follow-up control method based on the estimation of adaptive pose |
CN109993798A (en) * | 2019-04-09 | 2019-07-09 | 上海肇观电子科技有限公司 | Method, equipment and the storage medium of multi-cam detection motion profile |
Families Citing this family (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6575325B2 (en) * | 2015-11-27 | 2019-09-18 | 富士通株式会社 | Camera position / orientation estimation apparatus, camera position / orientation estimation method, and camera position / orientation estimation program |
CN105678754B (en) * | 2015-12-31 | 2018-08-07 | 西北工业大学 | A kind of unmanned plane real-time map method for reconstructing |
EP3488384A1 (en) * | 2016-07-21 | 2019-05-29 | Mobileye Vision Technologies Ltd. | Crowdsourcing and distributing a sparse map, and lane measurements for autonomous vehicle navigation |
CN106570904B (en) * | 2016-10-25 | 2019-04-09 | 大连理工大学 | A kind of multiple target relative pose recognition methods based on Xtion camera |
CN106548486B (en) * | 2016-11-01 | 2024-02-27 | 浙江大学 | Unmanned vehicle position tracking method based on sparse visual feature map |
CN106600641B (en) * | 2016-12-07 | 2020-02-11 | 常州工学院 | Embedded visual servo control method based on multi-feature fusion |
CN106885574B (en) * | 2017-02-15 | 2020-02-07 | 北京大学深圳研究生院 | Monocular vision robot synchronous positioning and map construction method based on re-tracking strategy |
CN107358624B (en) * | 2017-06-06 | 2020-01-07 | 武汉几古几古科技有限公司 | Monocular dense instant positioning and map reconstruction method |
CN107272454A (en) * | 2017-06-19 | 2017-10-20 | 中国人民解放军国防科学技术大学 | A kind of real time human-machine interaction method based on virtual reality |
CN107291093A (en) * | 2017-07-04 | 2017-10-24 | 西北工业大学 | Unmanned plane Autonomous landing regional selection method under view-based access control model SLAM complex environment |
CN107687850B (en) * | 2017-07-26 | 2021-04-23 | 哈尔滨工业大学深圳研究生院 | Unmanned aerial vehicle pose estimation method based on vision and inertia measurement unit |
CN107677279B (en) * | 2017-09-26 | 2020-04-24 | 上海思岚科技有限公司 | Method and system for positioning and establishing image |
CN107767450B (en) * | 2017-10-31 | 2021-05-28 | 南京维睛视空信息科技有限公司 | Spark-slam-based real-time graph establishing method |
CN107808395B (en) * | 2017-10-31 | 2020-12-04 | 南京维睛视空信息科技有限公司 | Indoor positioning method based on SLAM |
CN107869989B (en) * | 2017-11-06 | 2020-02-07 | 东北大学 | Positioning method and system based on visual inertial navigation information fusion |
CN107885871A (en) * | 2017-11-24 | 2018-04-06 | 南京华捷艾米软件科技有限公司 | Synchronous superposition method, system, interactive system based on cloud computing |
CN107862720B (en) * | 2017-11-24 | 2020-05-22 | 北京华捷艾米科技有限公司 | Pose optimization method and pose optimization system based on multi-map fusion |
CN108780577A (en) * | 2017-11-30 | 2018-11-09 | 深圳市大疆创新科技有限公司 | Image processing method and equipment |
CN108090958B (en) * | 2017-12-06 | 2021-08-27 | 上海阅面网络科技有限公司 | Robot synchronous positioning and map building method and system |
CN108109169B (en) * | 2017-12-12 | 2021-12-14 | 深圳市神州云海智能科技有限公司 | Pose estimation method and device based on rectangular identifier and robot |
CN108171787A (en) * | 2017-12-18 | 2018-06-15 | 桂林电子科技大学 | A kind of three-dimensional rebuilding method based on the detection of ORB features |
CN108253962A (en) * | 2017-12-18 | 2018-07-06 | 中北智杰科技(北京)有限公司 | New energy pilotless automobile localization method under a kind of low light environment |
CN108133496B (en) * | 2017-12-22 | 2021-11-26 | 北京工业大学 | Dense map creation method based on g2o and random fern algorithm |
CN108615244B (en) * | 2018-03-27 | 2019-11-15 | 中国地质大学(武汉) | A kind of image depth estimation method and system based on CNN and depth filter |
CN110389349B (en) * | 2018-04-17 | 2021-08-17 | 北京京东尚科信息技术有限公司 | Positioning method and device |
CN110264509B (en) * | 2018-04-27 | 2022-10-14 | 腾讯科技(深圳)有限公司 | Method, apparatus, and storage medium for determining pose of image capturing device |
CN108765326A (en) * | 2018-05-18 | 2018-11-06 | 南京大学 | A kind of synchronous superposition method and device |
CN110660098B (en) * | 2018-06-28 | 2022-08-12 | 北京京东叁佰陆拾度电子商务有限公司 | Positioning method and device based on monocular vision |
CN109141393B (en) * | 2018-07-02 | 2020-12-08 | 北京百度网讯科技有限公司 | Relocation method, relocation apparatus and storage medium |
CN109100731B (en) * | 2018-07-17 | 2022-11-11 | 重庆大学 | Mobile robot positioning method based on laser radar scanning matching algorithm |
US10953545B2 (en) * | 2018-08-13 | 2021-03-23 | Beijing Jingdong Shangke Information Technology Co., Ltd. | System and method for autonomous navigation using visual sparse map |
CN110858403B (en) * | 2018-08-22 | 2022-09-27 | 杭州萤石软件有限公司 | Method for determining scale factor in monocular vision reconstruction and mobile robot |
CN109215059B (en) * | 2018-10-16 | 2021-06-29 | 西安建筑科技大学 | Local data association method for tracking moving vehicle in aerial video |
US10839556B2 (en) * | 2018-10-23 | 2020-11-17 | Microsoft Technology Licensing, Llc | Camera pose estimation using obfuscated features |
CN109544629B (en) | 2018-11-29 | 2021-03-23 | 南京人工智能高等研究院有限公司 | Camera position and posture determining method and device and electronic equipment |
CN109341707B (en) * | 2018-12-03 | 2022-04-08 | 南开大学 | Method for constructing three-dimensional map of mobile robot in unknown environment |
CN109741306B (en) * | 2018-12-26 | 2021-07-06 | 北京石油化工学院 | Image processing method applied to dangerous chemical storehouse stacking |
CN109815847B (en) * | 2018-12-30 | 2020-12-01 | 中国电子科技集团公司信息科学研究院 | Visual SLAM method based on semantic constraint |
CN111568305B (en) * | 2019-02-18 | 2023-02-17 | 北京奇虎科技有限公司 | Method and device for processing relocation of sweeping robot and electronic equipment |
CN110044358B (en) * | 2019-04-29 | 2020-10-02 | 清华大学 | Mobile robot positioning method based on field line characteristics |
CN110223354A (en) * | 2019-04-30 | 2019-09-10 | 惠州市德赛西威汽车电子股份有限公司 | A kind of Camera Self-Calibration method based on SFM three-dimensional reconstruction |
CN111982133B (en) * | 2019-05-23 | 2023-01-31 | 北京地平线机器人技术研发有限公司 | Method and device for positioning vehicle based on high-precision map and electronic equipment |
US11580724B2 (en) | 2019-07-23 | 2023-02-14 | Toyota Research Institute, Inc. | Virtual teach and repeat mobile manipulation system |
CN110648363B (en) * | 2019-09-16 | 2024-10-18 | 腾讯科技(深圳)有限公司 | Camera gesture determining method and device, storage medium and electronic equipment |
CN110992487B (en) * | 2019-12-10 | 2020-09-29 | 南京航空航天大学 | Rapid three-dimensional map reconstruction device and reconstruction method for hand-held airplane fuel tank |
CN113129366B (en) * | 2020-01-10 | 2024-04-30 | 北京字节跳动网络技术有限公司 | Monocular SLAM initialization method and device and electronic equipment |
CN111121774B (en) * | 2020-01-14 | 2021-04-06 | 上海曼恒数字技术股份有限公司 | Infrared positioning camera capable of detecting self posture in real time |
CN111102984A (en) * | 2020-01-21 | 2020-05-05 | 中南民族大学 | Visualized mountain road navigation method, device, equipment and storage medium |
CN111230886A (en) * | 2020-03-05 | 2020-06-05 | 浙江理工大学 | Industrial robot kinematics parameter calibration system and method with tolerance capability |
CN111157009A (en) * | 2020-04-02 | 2020-05-15 | 深圳司南数据服务有限公司 | Indoor positioning navigation creating method and system based on mobile terminal and AR (augmented reality) intelligence |
CN111780763B (en) * | 2020-06-30 | 2022-05-06 | 杭州海康机器人技术有限公司 | Visual positioning method and device based on visual map |
CN112116651B (en) * | 2020-08-12 | 2023-04-07 | 天津(滨海)人工智能军民融合创新中心 | Ground target positioning method and system based on monocular vision of unmanned aerial vehicle |
CN112348893B (en) * | 2020-10-30 | 2021-11-19 | 珠海一微半导体股份有限公司 | Local point cloud map construction method and visual robot |
WO2022134057A1 (en) * | 2020-12-25 | 2022-06-30 | Intel Corporation | Re-localization of robot |
CN112783995B (en) * | 2020-12-31 | 2022-06-03 | 杭州海康机器人技术有限公司 | V-SLAM map checking method, device and equipment |
CN113156948B (en) * | 2021-04-19 | 2022-06-28 | 浙江工业大学 | Ground and obstacle distinguishing and identifying method of two-wheeled self-balancing robot |
CN115248043A (en) * | 2021-04-28 | 2022-10-28 | 广东博智林机器人有限公司 | Map optimization method and device, robot and readable storage medium |
CN113211449B (en) * | 2021-06-10 | 2022-06-17 | 哈工大机器人集团股份有限公司 | Robot calibration method and system based on path analysis |
CN114581517A (en) * | 2022-02-10 | 2022-06-03 | 北京工业大学 | Improved VINS method for complex illumination environment |
CN115451920B (en) * | 2022-10-27 | 2023-03-14 | 南京航空航天大学 | Relative pose measurement method for unmanned autonomous landing |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002013141A1 (en) * | 2000-08-09 | 2002-02-14 | Dynamic Digital Depth Research Pty Ltd | Image conversion and encoding techniques |
JP5034852B2 (en) * | 2007-10-09 | 2012-09-26 | 富士通株式会社 | Image measuring method and image measuring apparatus |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102435188A (en) * | 2011-09-15 | 2012-05-02 | 南京航空航天大学 | Monocular vision/inertia fully autonomous navigation method for indoor environment |
CN102519481A (en) * | 2011-12-29 | 2012-06-27 | 中国科学院自动化研究所 | Implementation method of binocular vision speedometer |
CN104050671A (en) * | 2014-06-25 | 2014-09-17 | 西安石油大学 | Oil well pipe sleeve damage image edge detection method based on grey correlation analysis and Zernike moment |
Non-Patent Citations (5)
Title |
---|
Mobile Robot 3D Map Building Based on Laser Ranging and Stereovision; Jia Songmin et al.; Proceedings of the 2011 IEEE International Conference on Mechatronics and Automation; 2011-08-10; pp. 1774-1779 *
Mobile Robot Localization and Map Building Based on Laser Ranging and PTAM; Sheng Jinbo et al.; Proceedings of the 2011 IEEE International Conference on Mechatronics and Automation; 2011-08-10; pp. 1015-1020 *
Map stitching for multi-robot systems based on hierarchical topology under the RTM framework; Jia Songmin et al.; Robot; 2013-05-31; Vol. 35, No. 3; pp. 292-298 *
3D SLAM for mobile robots based on an RGB-D camera; Jia Songmin et al.; Journal of Huazhong University of Science and Technology (Natural Science Edition); 2014-01-31; Vol. 42, No. 1; pp. 103-109 *
Monocular vision 3D reconstruction method based on a variational model; Jia Songmin et al.; Acta Optica Sinica; 2014-04-30; Vol. 34, No. 4; pp. 0415002-1 to 0415002-7 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109102525A (en) * | 2018-07-19 | 2018-12-28 | 浙江工业大学 | A kind of mobile robot follow-up control method based on the estimation of adaptive pose |
CN109102525B (en) * | 2018-07-19 | 2021-06-18 | 浙江工业大学 | Mobile robot following control method based on self-adaptive posture estimation |
CN109993798A (en) * | 2019-04-09 | 2019-07-09 | 上海肇观电子科技有限公司 | Method, equipment and the storage medium of multi-cam detection motion profile |
CN109993798B (en) * | 2019-04-09 | 2021-05-28 | 上海肇观电子科技有限公司 | Method and equipment for detecting motion trail by multiple cameras and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN104732518A (en) | 2015-06-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104732518B (en) | A kind of PTAM improved methods based on intelligent robot terrain surface specifications | |
CN109166149B (en) | Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU | |
CN109029433B (en) | Method for calibrating external parameters and time sequence based on vision and inertial navigation fusion SLAM on mobile platform | |
CN112634451B (en) | Outdoor large-scene three-dimensional mapping method integrating multiple sensors | |
CN106127739B (en) | Monocular vision combined RGB-D SLAM method | |
CN110807809B (en) | Light-weight monocular vision positioning method based on point-line characteristics and depth filter | |
CN109579825B (en) | Robot positioning system and method based on binocular vision and convolutional neural network | |
CN112304307A (en) | Positioning method and device based on multi-sensor fusion and storage medium | |
CN111862673B (en) | Parking lot vehicle self-positioning and map construction method based on top view | |
CN108229416B (en) | Robot SLAM method based on semantic segmentation technology | |
CN110223348A (en) | Robot scene adaptive bit orientation estimation method based on RGB-D camera | |
CN108010081B (en) | RGB-D visual odometer method based on Census transformation and local graph optimization | |
CN113658337B (en) | Multi-mode odometer method based on rut lines | |
CN112734841B (en) | Method for realizing positioning by using wheel type odometer-IMU and monocular camera | |
CN107680133A (en) | A kind of mobile robot visual SLAM methods based on improvement closed loop detection algorithm | |
CN110726406A (en) | Improved nonlinear optimization monocular inertial navigation SLAM method | |
CN111354043A (en) | Three-dimensional attitude estimation method and device based on multi-sensor fusion | |
CN104281148A (en) | Mobile robot autonomous navigation method based on binocular stereoscopic vision | |
CN112484746B (en) | Monocular vision auxiliary laser radar odometer method based on ground plane | |
CN111998862B (en) | BNN-based dense binocular SLAM method | |
CN112669354A (en) | Multi-camera motion state estimation method based on vehicle incomplete constraint | |
CN112802096A (en) | Device and method for realizing real-time positioning and mapping | |
CN116449384A (en) | Radar inertial tight coupling positioning mapping method based on solid-state laser radar | |
CN112767546B (en) | Binocular image-based visual map generation method for mobile robot | |
CN116128966A (en) | Semantic positioning method based on environmental object |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2017-09-01; Termination date: 2020-01-19 |