WO2023184968A1 - Structured scene visual SLAM method based on point line surface features


Info

Publication number: WO2023184968A1
Authority: WO (WIPO PCT)
Application number: PCT/CN2022/128826
Prior art keywords: features, point, line, plane, coordinate system
Other languages: French (fr), Chinese (zh)
Inventor: 裴海龙; 翁卓荣
Original Assignee: 华南理工大学 (South China University of Technology)
Application filed by 华南理工大学
Publication of WO2023184968A1

Classifications

    • G06T 7/73 — Image analysis: determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/97 — Image analysis: determining parameters from multiple pictures
    • G06T 2207/10024 — Image acquisition modality: color image
    • G06T 2207/10028 — Image acquisition modality: range image; depth image; 3D point clouds
    • Y02T 10/40 — Climate change mitigation technologies related to transportation: engine management systems


Abstract

Disclosed is a structured scene visual SLAM method based on point, line and plane features. The method comprises: first, inputting a color image and the corresponding depth image, extracting point, line and plane features from the images, and performing feature matching; then using the plane normal vectors to detect a Manhattan world coordinate system and, if a Manhattan world coordinate system exists and has already been observed in the Manhattan world map, solving for the camera attitude from it and tracking the point, line and plane features to estimate the translation, otherwise tracking the point, line and plane features to estimate the full pose; then judging whether the current frame is a key frame and, if so, inserting it into the local map; subsequently maintaining the map information and performing joint optimization on the current key frame, its adjacent key frames and the three-dimensional features; and finally performing loop closure detection and, if a loop closure frame is detected, closing the loop and performing global optimization. The present invention provides a visual SLAM method with high precision and strong robustness, and solves the problem of reduced precision and even system failure that arises when visual SLAM relies only on point features in low-texture structured scenes.

Description

A structured scene visual SLAM method based on point, line and plane features

Technical field

The invention belongs to the technical field of simultaneous localization and mapping (SLAM) for robots, and specifically relates to a structured scene visual SLAM method based on point, line and plane features.
Background art

In recent years, intelligent vehicle positioning systems have been used more and more widely in urban transportation. Traditional positioning methods, such as outdoor positioning systems based on GNSS technology, are already very mature. However, high-performance real-time positioning remains difficult in indoor environments where GNSS signals are blocked. Although indoor positioning systems based on wireless signals, such as Bluetooth, WiFi, UWB and RFID, have made great progress, their equipment deployment costs are high and they are easily disturbed indoors by occlusion and multipath effects, so they are difficult to use effectively for indoor positioning and mapping.

To address these problems, simultaneous localization and mapping (SLAM) methods have been proposed. SLAM is a technique for localizing a platform in an unknown environment while building a map at the same time, and is mainly applied to mobile robots, autonomous driving, virtual reality and augmented reality. Mature laser-based SLAM systems such as Cartographer, Hector mapping and GMapping have been proposed for indoor environments. However, since lidar is still relatively expensive, low-cost visual sensors such as monocular, stereo and depth cameras are generally preferred; such systems take images as input and output the camera trajectory together with a reconstructed three-dimensional map, and methods such as DSO use cameras to achieve high-precision indoor positioning. Existing vision-based SLAM systems, however, rely heavily on the extraction of low-level point features from images. For example, the state-of-the-art point-feature-based ORB-SLAM2 (ORB-SLAM2: An Open Source SLAM System for Monocular, Stereo and RGB-D Cameras) estimates the pose only from point features: although it achieves high-precision positioning in texture-rich environments, point features are hard to extract and unstable in man-made indoor environments with sparse texture and large illumination changes, so its pose estimation accuracy degrades and tracking may even fail. Such man-made indoor environments are rich in regular geometric structures such as walls, floors and ceilings; structural features represented by lines and planes are easier to obtain there than point features and are less affected by illumination changes. It is therefore particularly important to use visual information to find structural features suitable for SLAM positioning in indoor, geometrically structured scenes that lack texture. By exploiting these indoor geometric structural elements, the present invention proposes a structured scene visual SLAM method based on point, line and plane features, which solves the problem of low pose estimation accuracy and poor robustness in man-made structured indoor scenes with little texture and changing illumination.
Summary of the invention

In view of the above problems in the prior art, the purpose of the present invention is to provide a structured scene visual SLAM method based on point, line and plane features, so as to solve the problem that current visual SLAM technology suffers from low pose estimation accuracy and poor robustness in man-made structured indoor scenes with little texture and changing illumination.

The present invention is realized through at least one of the following technical solutions.

A structured scene visual SLAM method based on point, line and plane features, including the following steps:

S1. Input a color image, extract point features and line features from the color image and perform feature matching.

S2. Input the corresponding depth image, convert it into an organized point cloud structure, extract the image planes, and then match the extracted image planes with the map planes.

S3. Detect whether a Manhattan world coordinate system exists in the image. If it exists and the coordinate system has been observed in a historical key frame in the Manhattan world map, estimate the camera attitude from the extracted Manhattan world coordinate system and track the point, line and plane features to optimize the camera translation; otherwise, track the point, line and plane features to optimize the full camera pose.

S4. Judge whether the current frame is a key frame. If it is, add it to the local map and update the three-dimensional points, three-dimensional lines and the Manhattan world map; jointly optimize the current key frame and its adjacent key frames, optimizing the camera poses and the three-dimensional point and line features, and remove some outliers and redundant key frames.

S5. Perform loop closure detection on the key frames. If a loop closure is detected, close the loop and perform global optimization to reduce the accumulated error.
Further, step S1 is specifically: input a color image; first use the ORB algorithm to extract point features and match them according to their descriptors, then remove mismatches between point features with the PROSAC method; afterwards use the EDLine algorithm to extract line features, merge fragmented line segments using distance, angle and descriptor information as the screening criteria, and match the line features according to their LBD descriptors.
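A minimal sketch of this point-feature step using OpenCV is shown below; it is an illustration, not the implementation of the invention. ORB extraction and brute-force Hamming matching are standard OpenCV calls, and geometric outlier rejection is shown here with a RANSAC fundamental-matrix fit (the patent prefers PROSAC, which may be substituted where available). All function and variable names are illustrative.

```python
import cv2
import numpy as np

def extract_and_match_points(img1, img2, n_features=1000):
    """ORB point features + descriptor matching + robust outlier rejection."""
    orb = cv2.ORB_create(nfeatures=n_features)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Hamming-distance brute-force matching on the binary ORB descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Reject mismatches with a robust epipolar-geometry fit
    # (RANSAC here; PROSAC can be substituted in the same place).
    _, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    if inlier_mask is None:
        return kp1, kp2, matches
    inliers = [m for m, keep in zip(matches, inlier_mask.ravel()) if keep]
    return kp1, kp2, inliers
```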
Further, step S2 is specifically: first, compute the corresponding three-dimensional point for every pixel with a valid depth, forming an organized point cloud structure; next, use a hierarchical clustering algorithm to fuse small planar patches into larger planes and refine the segmentation of the fused planes; finally, for each segmented plane compute its three-dimensional center point P_C = [X_C, Y_C, Z_C]^T and its unit normal vector n = [n_x, n_y, n_z]^T, where X_C, Y_C, Z_C are the three-dimensional coordinates of the center point P_C and n_x, n_y, n_z are the components of the normal vector n. The distance from the camera optical center to the plane is then d = -n^T P_C (so that every point X on the plane satisfies n^T X + d = 0), and the plane feature is expressed as π = [n^T, d]^T. The planes in the image are then projected into the world coordinate system, and each projected plane is matched against the map planes according to the angle between their normals and the difference between their distances from the origin of the world coordinate system.
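As an illustration, the plane parameters described above can be computed from a segmented set of 3D points as sketched below (plain NumPy). The sign convention n^T X + d = 0 and the orientation flip are assumptions consistent with the distance definition above, not something stated explicitly in the original.

```python
import numpy as np

def plane_parameters(points_3d):
    """points_3d: (N, 3) array of points belonging to one segmented plane.
    Returns the centre P_C, the unit normal n and the plane vector pi = [n, d]."""
    P_C = points_3d.mean(axis=0)                      # 3D centre point
    # Unit normal = eigenvector of the covariance with the smallest eigenvalue.
    cov = np.cov((points_3d - P_C).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    n = eigvecs[:, 0]
    d = -float(n @ P_C)                               # distance term so that n.X + d = 0
    if d < 0:                                         # keep a consistent plane orientation
        n, d = -n, -d
    return P_C, n, np.append(n, d)                    # pi = [n_x, n_y, n_z, d]^T
```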
Further, step S3 is specifically:

S31. Traverse the normal vectors of the planes extracted in step S2 to find whether there is a combination of three mutually orthogonal normal vectors; if only two mutually orthogonal normal vectors exist, the third direction is obtained as the cross product of the two mutually orthogonal normal vectors.
S32. If the normal vector combination sought in step S31 exists, query the Manhattan world map to check whether mutually orthogonal planes consistent with this set of orthogonal normal vectors have been observed in a historical key frame. If so, they constitute a Manhattan world coordinate system; take the Manhattan world coordinate system formed by the normal vector combination whose corresponding planes contain the largest total number of points, and compute the rotation matrix from this Manhattan world coordinate system to the camera coordinate system of the current frame as:

R^{c_i}_{m_k} = [n_1, n_2, n_3]

where n_1, n_2, n_3 are the mutually orthogonal normal vectors that were found, c_i is the id of the current frame, and m_k is the id of the Manhattan world coordinate system. Apply an SVD decomposition to R^{c_i}_{m_k} to obtain the orthogonalized rotation matrix \hat{R}^{c_i}_{m_k} from the Manhattan world coordinate system to the camera coordinate system. The rotation matrix from the current camera coordinate system to the world coordinate system is then:

R^{w}_{c_i} = (R^{c_j}_{w})^T · R^{c_j}_{m_k} · (\hat{R}^{c_i}_{m_k})^T

where c_j is the id of the historical key frame that was found, R^{c_j}_{m_k} is the rotation matrix from the Manhattan world coordinate system m_k to the camera coordinate system at frame c_j, and R^{c_j}_{w} is the rotation matrix from the world coordinate system to the camera coordinate system at frame c_j.
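The following NumPy sketch illustrates the SVD orthogonalization and the composition of rotations described above. The matrix names follow the notation in the text; the composition order is inferred from the coordinate-frame definitions (the exact equation is given as an image in the original), so treat this as an assumption rather than the definitive form.

```python
import numpy as np

def manhattan_rotation(n1, n2, n3):
    """Orthogonalise the stacked normals [n1 n2 n3] with an SVD to get the rotation
    from the Manhattan world frame m_k to the current camera frame c_i."""
    M = np.column_stack([n1, n2, n3])        # approximately orthogonal plane normals
    U, _, Vt = np.linalg.svd(M)
    R_ci_mk = U @ Vt                          # nearest rotation matrix to M
    if np.linalg.det(R_ci_mk) < 0:            # guard against a reflection
        U[:, -1] *= -1
        R_ci_mk = U @ Vt
    return R_ci_mk

def camera_to_world_rotation(R_ci_mk, R_cj_mk, R_cj_w):
    """Compose the rotation from the current camera frame c_i to the world frame,
    via the key frame c_j in which the same Manhattan frame m_k was observed."""
    # chain: camera_i -> m_k -> camera_j -> world
    return R_cj_w.T @ R_cj_mk @ R_ci_mk.T
```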
The point, line and plane features are then tracked to optimize the camera translation t. The error model e_t for the translation is of the form:

e_t = Σ Φ_p(p) · ρ_p(e_p^T Λ_p e_p) + Σ Φ_l(l) · ρ_l(e_l^T Λ_l e_l) + Σ Φ_π(π) · ρ_π(e_π^T Λ_π e_π)

where e_p, e_l, e_π are the reprojection errors of the point, line and plane features respectively, with the specific forms:

e_p = p - (K R_cw P_w + t_cw)
e_l = l^T (K R_cw P_L + t_cw)
e_π = q(π_c) - q(T_cw^{-T} π_w)

Here K is the camera intrinsic matrix, R_cw is the rotation matrix from the world coordinate system to the camera coordinate system, t_cw is the translation from the world coordinate system to the camera coordinate system, and T_cw is the pose transformation matrix from the world coordinate system to the camera coordinate system; p is the pixel coordinate of a point feature detected in the frame and P_w is the corresponding three-dimensional map point; l is a line feature detected in the image and L is the corresponding three-dimensional line, with P_L a three-dimensional endpoint of that line; q(π) is the minimal parameterization of the plane feature π = [n_x, n_y, n_z, d]^T, where n_x, n_y, n_z are the components of the normal vector of plane π and d is the distance from the camera optical center to plane π; π_c is the plane detected in the frame and π_w is the corresponding map plane. Λ_p, Λ_l, Λ_π are the information matrices of the point, line and plane features, ρ_p, ρ_l, ρ_π are the Huber robust kernel functions of the point, line and plane features, and Φ_p(p), Φ_l(l), Φ_π(π) are the confidence coefficients of the point, line and plane features (their explicit formulas are given as images in the original), in which n_p, n_l, n_π are the numbers of times the corresponding point, line and plane features have been observed, t_p, t_l, t_π are the weight coefficients of the point, line and plane features, level_i is the level of the ORB pyramid in which the point feature lies, α is a weight coefficient with α ∈ [0.5, 1], and θ_i is the angle between the camera viewing ray in frame i and the map line.
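The sketch below evaluates the three residual terms defined above in a normalised form. It is illustrative only: the line residual is written as the algebraic distance of a projected endpoint to the normalised 2D line, and the plane residual uses a spherical parameterisation q(·); both conventions are assumptions where the original formulas appear as images.

```python
import numpy as np

def point_residual(p, P_w, K, R_cw, t_cw):
    """e_p: observed pixel p minus the projection of the map point P_w."""
    uvw = K @ (R_cw @ P_w + t_cw)
    return p - uvw[:2] / uvw[2]

def line_residual(l, P_L, K, R_cw, t_cw):
    """e_l: algebraic distance of a projected line endpoint P_L to the 2D line
    l = [a, b, c], with l normalised so that a^2 + b^2 = 1."""
    uvw = K @ (R_cw @ P_L + t_cw)
    uv1 = np.array([uvw[0] / uvw[2], uvw[1] / uvw[2], 1.0])
    return l @ uv1

def plane_residual(pi_c, pi_w, T_cw):
    """e_pi: difference of minimal plane parameterisations q(.) between the detected
    plane and the map plane transformed into the camera frame."""
    pi_w_in_c = np.linalg.inv(T_cw).T @ pi_w          # planes transform as pi' = T^{-T} pi

    def q(pi):
        scale = np.linalg.norm(pi[:3])
        n, d = pi[:3] / scale, pi[3] / scale
        return np.array([np.arctan2(n[1], n[0]),              # azimuth
                         np.arcsin(np.clip(n[2], -1.0, 1.0)), # elevation
                         d])

    return q(pi_c) - q(pi_w_in_c)
```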
S33. If the Manhattan world coordinate system sought in step S31 does not exist, or no corresponding normal vector combination has been observed in the historical key frames in the Manhattan world map, then track the point, line and plane features to optimize the full camera pose R, t. The camera pose error model e_{R,t} is:

e_{R,t} = Σ Φ_p(p) · ρ_p(e_p^T Λ_p e_p) + Σ Φ_l(l) · ρ_l(e_l^T Λ_l e_l) + Σ Φ_π(π) · ρ_π(e_π^T Λ_π e_π)

with the same reprojection errors, confidence coefficients, information matrices and Huber robust kernels as above, but with both the rotation R and the translation t as optimization variables.
Further, step S4 is specifically:

S41. Judge whether the current frame should be set as a key frame according to the number of matched point and line feature pairs, whether a new plane has been detected, the feature tracking situation and the key frames already in the local map; if it is a key frame, add it to the local map, otherwise return to step S1.

S42. For a newly inserted key frame, update the covisibility graph and the spanning tree and add the new key frame node; remove point, line and plane features that have not been continuously and reliably observed since they were created, according to observation consistency; for point and line features of the new key frame that have no match, back-project them using the depth information to generate new map points and map lines and insert them into the map; record the association between combinations of perpendicular planes and the current key frame according to the perpendicular plane relations in the image, and update the Manhattan world map.

S43. After the complete map update, jointly optimize the current key frame and the key frames associated with it, removing outliers during the optimization so as to optimize the camera poses as far as possible. The optimization variables are the camera poses R, t of the related key frames and the three-dimensional feature parameters P, L, and the reprojection error e used in the optimization is:

e = Σ Φ_p(p) · ρ_p(e_p^T Λ_p e_p) + Σ Φ_l(l) · ρ_l(e_l^T Λ_l e_l)

where e_p and e_l are the reprojection errors of the point and line features, Φ_p(p) and Φ_l(l) are the confidence coefficients of the point and line features, Λ_p and Λ_l are the information matrices of the point and line features, and ρ_p and ρ_l are the Huber robust kernel functions of the point and line features.

S44. Remove key frames whose features overlap with those of other key frames by more than 90%.
Further, the conditions for judging a key frame are specifically as follows (a code sketch is given after the list):

(1) more than 10 frames have been processed since the last key frame was inserted and the local mapping thread is currently idle, in which case the frame is judged to be a key frame;

(2) more than 20 frames have been processed since the last key frame was inserted, in which case the frame is judged to be a key frame;

(3) the total number of matched point, line and plane features in the current image is not less than 20, otherwise the frame cannot be used as a key frame;

(4) the features tracked in the current image amount to less than 90% of the features tracked by the most recent key frame, in which case the frame is judged to be a key frame;

(5) a new plane is extracted from the image, in which case the frame is judged to be a key frame.
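A minimal sketch of this decision rule is given below; the argument names and the order in which the conditions are evaluated are illustrative assumptions.

```python
def is_keyframe(frames_since_last_kf, mapping_idle, n_matched_features,
                tracked_ratio_vs_last_kf, new_plane_detected):
    """Keyframe decision following the five conditions listed above (illustrative)."""
    if n_matched_features < 20:                        # condition (3): enough matched features
        return False
    if new_plane_detected:                             # condition (5): a new plane was extracted
        return True
    if tracked_ratio_vs_last_kf < 0.9:                 # condition (4): tracking has decayed
        return True
    if frames_since_last_kf > 20:                      # condition (2): long gap since last keyframe
        return True
    if frames_since_last_kf > 10 and mapping_idle:     # condition (1): shorter gap, mapper idle
        return True
    return False
```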
Further, step S5 is specifically: based on a point-line feature dictionary model, first look up the word types in the dictionary K-ary tree for the descriptors of all point and line features of the current key frame and compute the weight of each word, obtaining the word vector of the current key frame; compute the preselected similarity between the word vector of the current key frame and the word vectors of the other key frames to obtain the frame-to-frame similarity, and determine the loop closure frame of the current key frame from this similarity; finally, build the pose graph of all key frames inside the loop and perform global pose graph optimization to reduce the accumulated error.
Further, the preselected similarity is:

s(v_c, v_o) [formula given as an image in the original]

where v_c is the word vector of the current key frame and v_o is the word vector of another key frame. The word vector v has the specific form:

v = {(w_1, η_1), (w_2, η_2), …, (w_k, η_k)}

where w_i is the i-th word in the visual dictionary and η_i is the word weight of w_i, computed as:

η_i = (n_i / n) · log(N / N_i)

where n_i is the number of features in the image belonging to word w_i, n is the total number of point and line features in the image, N is the number of all features in the key frame database, and N_i is the total number of point and line features in the database belonging to word w_i.
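The sketch below computes the TF-IDF word weight as defined above and a bag-of-words similarity. The similarity formula in the original is given only as an image, so the commonly used L1-based score is assumed here purely for illustration.

```python
import numpy as np

def word_weight(n_i, n, N, N_i):
    """TF-IDF weight of word w_i: frequency in the image times inverse frequency
    in the key-frame database (matching the definitions of n_i, n, N, N_i above)."""
    return (n_i / n) * np.log(N / N_i)

def bow_similarity(v_c, v_o):
    """Similarity of two bag-of-words vectors given as dense weight vectors over the
    vocabulary. The exact score in the original is an image; the usual L1 score is assumed."""
    a = v_c / np.linalg.norm(v_c, ord=1)
    b = v_o / np.linalg.norm(v_o, ord=1)
    return 1.0 - 0.5 * np.abs(a - b).sum()
```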
Further, step S1 may also be implemented as follows: input a color image; first use the ORB algorithm to extract point features and match them according to their descriptors, then remove mismatches between point features with the RANSAC method; afterwards use the EDLine algorithm to extract line features, merge fragmented line segments using distance, angle and descriptor information as the screening criteria, and match the line features according to their LBD descriptors.

Further, step S1 may also be implemented as follows: input a color image; first use the LSD algorithm to extract the line features, match the point features according to their descriptors and remove mismatches between point features with the RANSAC method; merge the fragmented line segments using distance, angle and descriptor information as the screening criteria, and match the line features according to their LBD descriptors.
Compared with the prior art, the present invention has the following significant beneficial effects:

(1) The present invention alleviates the problems of low positioning accuracy and poor robustness of visual SLAM systems that rely only on point features, and achieves satisfactory positioning and tracking in indoor man-made structured scenes with low texture and large illumination changes.

(2) By exploiting the structural features of indoor scenes and relying on the Manhattan world assumption, the present invention extracts a Manhattan world coordinate system, computes the absolute rotation from the Manhattan world coordinate system to the camera coordinate system, and from it directly obtains the rotation from the world coordinate system to the camera coordinate system, which greatly reduces the accumulated drift between frames and improves the tracking performance and accuracy.
Description of the drawings

Figure 1 is a flow chart of a structured scene visual SLAM method based on point, line and plane features according to an embodiment;

Figure 2 is a comparison of the trajectories estimated by the present invention on the TUM fr3_str_notext_far data sequence;

Figure 3 is a comparison of the x, y, z translations estimated by the present invention on the TUM fr3_str_notext_far data sequence;

Figure 4 is a comparison of the three-axis Euler-angle attitudes estimated by the present invention on the TUM fr3_str_notext_far data sequence;

Figure 5 is the sparse three-dimensional feature map reconstructed by the present invention on the TUM fr3_str_notext_far data sequence;

Figure 6 is the planar mesh map reconstructed by the present invention on the TUM fr3_str_notext_far data sequence.
Detailed description of the embodiments

The technical solution of the present invention is described in further detail below with reference to specific embodiments.

As shown in Figure 1, the structured scene visual SLAM method based on point, line and plane features provided by the present invention includes the following steps:

Step S1. Input a color image; first use the ORB algorithm to extract point features and match them according to their descriptors, then remove mismatches between point features with the PROSAC method (in this embodiment the RANSAC method can also be used to remove mismatches between point features); next use the EDLine algorithm to extract line features, merge fragmented line segments using endpoint distance, angle and descriptor information as the screening criteria, and match the line features according to their LBD descriptors.
Step S2. Input the depth image; first compute the corresponding three-dimensional point for every pixel with valid depth, forming an organized point cloud structure; then quickly extract planes from the resulting point cloud using a hierarchical clustering algorithm; finally match the planes in the image against the map planes according to the plane parameters. The specific steps are:
Step S21. Input the depth image and compute the corresponding three-dimensional point for every pixel with valid depth, forming an organized point cloud structure that is convenient for the subsequent algorithms, as sketched below.
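A minimal back-projection sketch for this step is given below; the pinhole intrinsics fx, fy, cx, cy and the millimetre depth scale are assumptions about the sensor, not values stated in the original.

```python
import numpy as np

def depth_to_organized_cloud(depth, fx, fy, cx, cy, depth_scale=1000.0):
    """Back-project a depth image into an organised (H, W, 3) point cloud in the
    camera frame; pixels with invalid depth are set to NaN."""
    h, w = depth.shape
    z = depth.astype(np.float64) / depth_scale          # metres (assumes millimetre depth)
    z[z <= 0] = np.nan                                   # mark invalid depth
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.dstack([x, y, z])                          # keeps the image organisation
```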
Step S22. Initialize the graph model: divide the point cloud of the image evenly into 10×10 blocks, each block corresponding to a node of the graph; a node and its connecting edges are removed from the graph when the mean squared plane-fitting error of the node is above a set threshold, when the node contains missing depth data or depth-discontinuous points, or when the node lies on the boundary between two planes.

Step S23. Apply hierarchical clustering: first build a min-heap data structure so that the pair of nodes whose fusion has the smallest mean squared error can be found efficiently; then recompute the plane-fitting mean squared error after fusion and find the two nodes with the smallest error. If this error exceeds a preset threshold, a segmented plane node has been found and is extracted from the graph; otherwise the two nodes are fused and re-inserted into the graph, and the min-heap is updated. The previous two operations are repeated until all nodes have been taken out of the graph. A simplified sketch of this merging loop is given after step S24.

Step S24. Further refine the plane segmentation: the jagged boundaries produced at the plane edges are optimized by eroding the boundary regions; unused data points are assigned to the nearest surrounding plane; over-segmented planes are clustered hierarchically again on a much smaller graph.
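The sketch below is a greatly simplified version of the min-heap merging loop of step S23: it ignores block adjacency and the special node-removal rules of step S22, and the MSE threshold is an arbitrary placeholder. It only illustrates the heap-driven "merge while the fused plane fit stays good" idea.

```python
import heapq
import numpy as np

def plane_mse(points):
    """Mean squared distance of a point set to its best-fit plane."""
    q = points - points.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(q.T))
    n = vecs[:, 0]                                   # normal = smallest-eigenvalue direction
    return float(np.mean((q @ n) ** 2))

def merge_blocks(blocks, neighbours, mse_threshold=1e-4):
    """blocks: dict {id: (N, 3) points}; neighbours: iterable of (i, j) block pairs."""
    version = {i: 0 for i in blocks}
    heap = []
    for i, j in neighbours:
        heapq.heappush(heap, (plane_mse(np.vstack([blocks[i], blocks[j]])),
                              i, j, version[i], version[j]))
    while heap:
        mse, i, j, vi, vj = heapq.heappop(heap)
        if i not in blocks or j not in blocks:
            continue                                 # one side was already consumed
        if version[i] != vi or version[j] != vj:
            continue                                 # stale entry, block changed since push
        if mse > mse_threshold:
            break                                    # remaining merges would break planarity
        blocks[i] = np.vstack([blocks[i], blocks[j]])    # fuse j into i
        del blocks[j]
        version[i] += 1
        for k in blocks:
            if k != i:
                heapq.heappush(heap, (plane_mse(np.vstack([blocks[i], blocks[k]])),
                                      i, k, version[i], version[k]))
    return list(blocks.values())
```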
Step S25. For each segmented plane, compute its center point P_C = [X_C, Y_C, Z_C]^T and its unit normal vector n = [n_x, n_y, n_z]^T, where X_C, Y_C, Z_C are the three-dimensional coordinates of the center point P_C and n_x, n_y, n_z are the components of the normal vector n; the distance from the camera optical center to the plane is then d = -n^T P_C, and the plane feature is expressed as π = [n^T, d]^T.
Step S26. Project the planes in the image into the world coordinate system and match each projected plane against the map planes according to the angle between their normals and the difference between their distances from the origin of the world coordinate system, for example as sketched below.
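A minimal sketch of this matching criterion follows; the angle and distance thresholds are illustrative assumptions, not values from the original.

```python
import numpy as np

def match_plane(pi_obs_w, map_planes, angle_thresh_deg=10.0, dist_thresh=0.1):
    """Match an observed plane (already expressed in world coordinates) against the map
    planes using the normal angle and the difference of origin-to-plane distances."""
    n_obs, d_obs = pi_obs_w[:3], pi_obs_w[3]
    best, best_angle = None, angle_thresh_deg
    for idx, pi_map in enumerate(map_planes):
        n_map, d_map = pi_map[:3], pi_map[3]
        angle = np.degrees(np.arccos(np.clip(abs(n_obs @ n_map), -1.0, 1.0)))
        if angle < best_angle and abs(abs(d_obs) - abs(d_map)) < dist_thresh:
            best, best_angle = idx, angle
    return best                      # index of the matched map plane, or None
```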
Step S3. Detect whether a Manhattan world coordinate system exists in the image. If it exists and has been observed in a historical key frame in the Manhattan world map, compute the camera rotation matrix from the extracted Manhattan world coordinate system and track the point, line and plane features to optimize the camera translation; otherwise track the point, line and plane features to optimize the full camera pose. The specific steps are:

Step S31. Traverse the normal vectors corresponding to the planes extracted in step S24 to find whether there is a combination of three mutually orthogonal normal vectors, or a combination of two mutually orthogonal normal vectors whose third direction is obtained as the cross product of the two mutually orthogonal normal vectors.
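The search over plane normals can be sketched as below; the orthogonality tolerance of 5 degrees is an assumption for illustration.

```python
import numpy as np

def find_orthogonal_normals(normals, tol_deg=5.0):
    """Search the detected plane normals for three mutually orthogonal directions;
    if only two orthogonal normals are found, complete the triple by a cross product."""
    cos_tol = np.cos(np.radians(90.0 - tol_deg))       # |n_a . n_b| below this => orthogonal
    n = len(normals)
    for a in range(n):
        for b in range(a + 1, n):
            if abs(normals[a] @ normals[b]) > cos_tol:
                continue                                # pair not orthogonal enough
            # try to complete the triple with a third detected normal
            for c in range(n):
                if c in (a, b):
                    continue
                if (abs(normals[a] @ normals[c]) < cos_tol and
                        abs(normals[b] @ normals[c]) < cos_tol):
                    return normals[a], normals[b], normals[c]
            # otherwise synthesise the third axis from the cross product
            n3 = np.cross(normals[a], normals[b])
            return normals[a], normals[b], n3 / np.linalg.norm(n3)
    return None
```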
Step S32. If such a combination exists, query the Manhattan world map to check whether a historical key frame contains three mutually orthogonal planes consistent with this set of orthogonal normal vectors. If so, they constitute a Manhattan world coordinate system; take the Manhattan world coordinate system formed by the normal vector combination whose corresponding planes contain the largest total number of points, and compute the rotation matrix from this Manhattan world coordinate system to the camera coordinate system of the current frame as:

R^{c_i}_{m_k} = [n_1, n_2, n_3]

where n_1, n_2, n_3 are the mutually orthogonal normal vectors that were found, c_i is the id of the current frame, and m_k is the id of the Manhattan world coordinate system. Because of sensor noise, n_1, n_2, n_3 are not exactly orthogonal and R^{c_i}_{m_k} is not an orthogonal matrix, so an SVD decomposition is applied to R^{c_i}_{m_k} to obtain the orthogonalized rotation matrix \hat{R}^{c_i}_{m_k} from the Manhattan world coordinate system to the camera coordinate system. The rotation matrix from the current camera coordinate system to the world coordinate system is then:

R^{w}_{c_i} = (R^{c_j}_{w})^T · R^{c_j}_{m_k} · (\hat{R}^{c_i}_{m_k})^T

where c_j is the id of the historical key frame that was found, R^{c_j}_{m_k} is the rotation matrix from the Manhattan world coordinate system m_k to the camera coordinate system at frame c_j, and R^{c_j}_{w} is the rotation matrix from the world coordinate system to the camera coordinate system at frame c_j.
The point, line and plane features are then tracked to optimize the camera translation t, with the translation error model e_t:

e_t = Σ Φ_p(p) · ρ_p(e_p^T Λ_p e_p) + Σ Φ_l(l) · ρ_l(e_l^T Λ_l e_l) + Σ Φ_π(π) · ρ_π(e_π^T Λ_π e_π)

where e_p, e_l, e_π are the reprojection errors of the point, line and plane features respectively, with the specific forms:

e_p = p - (K R_cw P_w + t_cw)
e_l = l^T (K R_cw P_L + t_cw)
e_π = q(π_c) - q(T_cw^{-T} π_w)

Here K is the camera intrinsic matrix, R_cw is the rotation matrix from the world coordinate system to the camera coordinate system, t_cw is the translation from the world coordinate system to the camera coordinate system, and T_cw is the pose transformation matrix from the world coordinate system to the camera coordinate system; p is the pixel coordinate of a point feature detected in the frame and P_w is the corresponding three-dimensional map point; l is a line feature detected in the frame and L is the corresponding three-dimensional line, with P_L a three-dimensional endpoint of that line; q(π) is the plane feature parameterization adopted by the present invention, with π = [n_x, n_y, n_z, d]^T, where n_x, n_y, n_z are the components of the normal vector of plane π and d is the distance from the camera optical center to plane π; π_c is the plane detected in the frame and π_w is the corresponding map plane. Λ_p, Λ_l, Λ_π are the information matrices of the point, line and plane features, ρ_p, ρ_l, ρ_π are the Huber robust kernel functions of the point, line and plane features, and Φ_p(p), Φ_l(l), Φ_π(π) are the confidence coefficients of the point, line and plane features (their explicit formulas are given as images in the original), in which n_p, n_l, n_π are the numbers of times the corresponding point, line and plane features have been observed, t_p, t_l, t_π are the weight coefficients of the point, line and plane features, level_i is the level of the ORB pyramid in which the point feature lies, α is a weight coefficient with α ∈ [0.5, 1], and θ_i is the angle between the camera viewing ray in frame i and the map line.
In this embodiment the plane feature is represented with the spherical-coordinate parameterization q(π), where π = [n_x, n_y, n_z, d]^T, n_x, n_y, n_z are the components of the normal vector of plane π and d is the distance from the camera optical center to plane π; alternatively, the plane feature can also be represented with the unit-quaternion parameterization or with the closest-point parameterization.
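A sketch of such a spherical-coordinate parameterization and its inverse is given below; the exact angle convention used by the patent is shown only as an image in the original, so the azimuth/elevation choice here is an assumption.

```python
import numpy as np

def plane_to_spherical(pi):
    """Minimal spherical parameterisation q(pi) = [azimuth, elevation, d] of a plane
    pi = [n_x, n_y, n_z, d]^T with unit normal n."""
    n, d = pi[:3], pi[3]
    azimuth = np.arctan2(n[1], n[0])
    elevation = np.arcsin(np.clip(n[2], -1.0, 1.0))
    return np.array([azimuth, elevation, d])

def spherical_to_plane(q):
    """Inverse mapping back to the homogeneous plane vector."""
    azimuth, elevation, d = q
    n = np.array([np.cos(elevation) * np.cos(azimuth),
                  np.cos(elevation) * np.sin(azimuth),
                  np.sin(elevation)])
    return np.append(n, d)
```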
Step S33. If the Manhattan world coordinate system sought in step S31 does not exist, or no corresponding normal vector combination has been observed in the historical key frames in the Manhattan world map, then track the point, line and plane features to optimize the full camera pose R, t, with the camera pose error model e_{R,t}:

e_{R,t} = Σ Φ_p(p) · ρ_p(e_p^T Λ_p e_p) + Σ Φ_l(l) · ρ_l(e_l^T Λ_l e_l) + Σ Φ_π(π) · ρ_π(e_π^T Λ_π e_π)

where e_p, e_l, e_π are the reprojection errors of the point, line and plane features, Φ_p(p), Φ_l(l), Φ_π(π) are the confidence coefficients of the point, line and plane features, Λ_p, Λ_l, Λ_π are the information matrices of the point, line and plane features, and ρ_p, ρ_l, ρ_π are the Huber robust kernel functions of the point, line and plane features.
Step S4. Judge whether the current frame is a key frame. If it is, add it to the local map and update the three-dimensional points, three-dimensional lines and the Manhattan world map; jointly optimize the current key frame and its adjacent key frames, optimizing the camera poses and the three-dimensional point and line features, and remove some outliers and redundant key frames. The specific steps are:

Step S41. Judge whether the frame is a key frame; if so, add it to the local map, otherwise return to step S1 and continue with the next image. The judging conditions are: (1) more than 10 frames have been processed since the last key frame was inserted and the local mapping thread is currently idle, in which case the frame is judged to be a key frame; (2) more than 20 frames have been processed since the last key frame was inserted, in which case the frame is judged to be a key frame; (3) the total number of matched point, line and plane features in the current image is not less than 20, otherwise the frame cannot be used as a key frame; (4) the features tracked in the current image amount to less than 90% of the features tracked by the most recent key frame, in which case the frame is judged to be a key frame; (5) a new plane is extracted from the image, in which case the frame is judged to be a key frame.

Step S42. For a newly inserted key frame, first update the covisibility graph and the spanning tree and add the new key frame node; then remove from the local map the point and line features whose three-dimensional features have been observed fewer than three times; for point and line features of the new key frame that have no match, back-project them using the depth information to generate new map points and map lines and insert them into the map; finally, record the association between combinations of perpendicular planes and the current key frame according to the perpendicular plane relations in the image, and update the Manhattan world map.
Step S43. After the complete map update, jointly optimize the current key frame and the key frames associated with it; the main optimization variables are the camera poses and the three-dimensional feature parameters of the related key frames, while the plane parameters are not optimized here. After 10 optimization iterations, outliers whose reprojection error is too large are removed and the iterative optimization is continued; this is repeated 4 times so as to remove as many outliers as possible and to optimize the camera poses R, t and the three-dimensional feature information P, L to the greatest extent (an illustrative sketch of this loop is given after step S44). The reprojection error e used in the optimization is:

e = Σ Φ_p(p) · ρ_p(e_p^T Λ_p e_p) + Σ Φ_l(l) · ρ_l(e_l^T Λ_l e_l)

where e_p and e_l are the reprojection errors of the point and line features, Φ_p(p) and Φ_l(l) are the confidence coefficients of the point and line features, Λ_p and Λ_l are the information matrices of the point and line features, and ρ_p and ρ_l are the Huber robust kernel functions of the point and line features.

Step S44. Remove key frames whose features overlap with those of other key frames by more than 90%, so as to keep the covisibility graph compact.
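The outer optimize-and-reject loop of step S43 can be sketched as below. This is only an illustration under stated assumptions: it uses a generic SciPy solver rather than the graph optimizer of an actual SLAM system, each observation is represented by a hypothetical residual callable, and the chi-square style threshold is a placeholder.

```python
import numpy as np
from scipy.optimize import least_squares

def refine_with_outlier_rejection(residual_fns, x0, rounds=4, inner_evals=10,
                                  outlier_thresh=5.991):
    """residual_fns: list of callables, one per observation, each mapping the packed
    variable vector x to that observation's residual vector (point or line term).
    After each optimisation round, observations whose squared residual exceeds the
    threshold are disabled and the optimisation is repeated."""
    active = np.ones(len(residual_fns), dtype=bool)
    x = np.asarray(x0, dtype=float)

    def stacked(xv):
        # residuals of the observations still enabled
        return np.concatenate([f(xv) for f, on in zip(residual_fns, active) if on])

    for _ in range(rounds):
        if not active.any():
            break
        sol = least_squares(stacked, x, loss="huber", max_nfev=inner_evals)
        x = sol.x
        sq = np.array([float(f(x) @ f(x)) for f in residual_fns])
        active = sq < outlier_thresh          # keep only inliers for the next round
    return x, active
```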
Step S5. Perform loop closure detection and global optimization on the key frames. The specific steps are:

Step S51. Based on a point-line feature dictionary model, first look up the word types in the dictionary K-ary tree for the descriptors of all point and line features of the current key frame and compute the weight of each word, obtaining the word vector of the current key frame.
Step S52. Compute the preselected similarity s between the word vector of the current key frame and the word vectors of the other key frames, and select the key frame with the highest similarity that also exceeds a set threshold as the loop closure frame of the current key frame. The preselected similarity between the word vector of the current key frame and that of another key frame is:

s(v_c, v_o) [formula given as an image in the original]

where v_c is the word vector of the current key frame and v_o is the word vector of another key frame. The word vector v has the specific form:

v = {(w_1, η_1), (w_2, η_2), …, (w_k, η_k)}

where w_i is the i-th word in the visual dictionary and η_i is the word weight of w_i, computed as:

η_i = (n_i / n) · log(N / N_i)

where n_i is the number of features in the image belonging to word w_i, n is the total number of point and line features in the image, N is the number of all features in the key frame database, and N_i is the total number of point and line features in the database belonging to word w_i.
Step S53. Build the pose graph of all key frames inside the loop, in which the vertices are the poses of the key frames and the edges are the co-visible point or line features between key frames, and perform global pose graph optimization.
Step S6. Output the results: save the poses, draw the path, and save and display the map synchronously in real time. Figure 2 shows a comparison of the trajectories estimated by the present invention on the TUM fr3_str_notext_far data sequence; Figures 3(a), 3(b) and 3(c) show a comparison of the x, y, z translations estimated by the present invention on the TUM fr3_str_notext_far data sequence; Figures 4(a), 4(b) and 4(c) show a comparison of the x, y, z Euler-angle attitudes estimated by the present invention on the TUM fr3_str_notext_far data sequence; Figure 5 shows the sparse three-dimensional feature map reconstructed by the present invention on the TUM fr3_str_notext_far data sequence; and Figure 6 shows the planar mesh map reconstructed by the present invention on the TUM fr3_str_notext_far data sequence.
Embodiment 2

A structured scene visual SLAM method based on point, line and plane features includes the following steps:

S1. Input a color image, extract point features and line features from the color image and perform feature matching; in this embodiment the EDLine algorithm is used to extract the line features, but the LSD algorithm can also be used, and the fragmented line segments extracted by the LSD algorithm are likewise merged using endpoint distance, angle and descriptor information as the screening criteria.

S2. Input the depth image, convert it into an organized point cloud structure, extract the image planes, and then match the extracted image planes with the map planes; in this embodiment, besides using the angle between the normals and the difference between the distances from the origin of the world coordinate system to the planes as the plane matching criterion, the angle between the normals together with whether the two planes have an overlapping region can also be used as the matching criterion.

S3. Detect whether a Manhattan world coordinate system exists in the image. If it exists and has been observed in a historical key frame in the Manhattan world map, estimate the camera attitude from the extracted Manhattan world coordinate system and track the point, line and plane features to optimize the camera translation; otherwise track the point, line and plane features to optimize the full camera pose.

S4. Judge whether the current frame is a key frame. If it is, add it to the local map and update the three-dimensional points, three-dimensional lines and the Manhattan world map; jointly optimize the current key frame and its adjacent key frames, optimizing the camera poses and the three-dimensional point and line features, and remove some outliers and redundant key frames.

S5. Perform loop closure detection on the key frames. If a loop closure is detected, close the loop and perform global optimization to reduce the accumulated error.
Embodiment 3

A structured scene visual SLAM method based on point, line and plane features includes the following steps:

S1. Input a color image, extract point features and line features from the color image and perform feature matching; in this embodiment the LSD algorithm is used to extract the line features, and the fragmented line segments extracted by the LSD algorithm are likewise merged using endpoint distance, angle and descriptor information as the screening criteria.

S2. Input the depth image, convert it into an organized point cloud structure, extract the image planes, and then match the extracted image planes with the map planes; in this embodiment, besides using the angle between the normals and the difference between the distances from the origin of the world coordinate system to the planes as the plane matching criterion, the angle between the normals together with whether the two planes have an overlapping region can also be used as the matching criterion.

S3. Detect whether a Manhattan world coordinate system exists in the image. If it exists and has been observed in a historical key frame in the Manhattan world map, estimate the camera attitude from the extracted Manhattan world coordinate system and track the point, line and plane features to optimize the camera translation; otherwise track the point, line and plane features to optimize the full camera pose.

S4. Judge whether the current frame is a key frame. If it is, add it to the local map and update the three-dimensional points, three-dimensional lines and the Manhattan world map; jointly optimize the current key frame and its adjacent key frames, optimizing the camera poses and the three-dimensional point and line features, and remove some outliers and redundant key frames.

S5. Perform loop closure detection on the key frames. If a loop closure is detected, close the loop and perform global optimization to reduce the accumulated error.
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. Without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and modifications according to the present invention, and all such corresponding changes and modifications fall within the protection scope of the appended claims of the present invention.

Claims (10)

  1. A structured scene visual SLAM method based on point, line and plane features, characterized by including the following steps:

    S1. inputting a color image, extracting point features and line features from the color image and performing feature matching;

    S2. inputting the depth image, converting it into an organized point cloud structure, extracting the image planes, and then matching the extracted image planes with the map planes;

    S3. detecting whether a Manhattan world coordinate system exists in the image and, if it exists and has been observed in a historical key frame in the Manhattan world map, estimating the camera attitude from the extracted Manhattan world coordinate system and tracking the point, line and plane features to optimize the camera translation, otherwise tracking the point, line and plane features to optimize the full camera pose;

    S4. judging whether the current frame is a key frame and, if it is, adding it to the local map and updating the three-dimensional points, three-dimensional lines and the Manhattan world map, jointly optimizing the current key frame and its adjacent key frames, optimizing the camera poses and the three-dimensional point and line features, and removing some outliers and redundant key frames;

    S5. performing loop closure detection on the key frames and, if a loop closure is detected, closing the loop and performing global optimization to reduce the accumulated error.
  2. The structured scene visual SLAM method based on point, line and plane features according to claim 1, characterized in that step S1 specifically comprises: inputting a color image; first extracting point features with the ORB algorithm and matching them by their descriptors, then removing mismatches between point features with the PROSAC method; afterwards extracting line features with the EDLine algorithm, merging fragmented line segments using distance, angle and descriptor information as screening criteria, and matching the line features by their LBD descriptors.
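A minimal sketch of the point-feature part of this front end using OpenCV is given below. The RANSAC fundamental-matrix check stands in for the PROSAC rejection step named in the claim (OpenCV does not expose PROSAC for this purpose), and the feature count and thresholds are illustrative assumptions, not the patent's settings.

```python
import cv2
import numpy as np

def match_orb_points(img1, img2, n_features=1000):
    """Extract ORB point features, match by descriptor, and reject outliers.

    RANSAC on the fundamental matrix is used here in place of the PROSAC
    step described in the claim; all thresholds are illustrative only.
    """
    orb = cv2.ORB_create(nfeatures=n_features)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Hamming-distance brute-force matching with cross-check
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Geometric verification: keep only matches consistent with epipolar geometry
    _, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    inliers = [m for m, ok in zip(matches, mask.ravel()) if ok]
    return kp1, kp2, inliers
```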
  3. The structured scene visual SLAM method based on point, line and plane features according to claim 1, characterized in that step S2 specifically comprises: first computing the corresponding 3D point for every pixel with valid depth to form an organized point cloud structure; then merging small planar patches into larger planes with a hierarchical clustering algorithm and refining the segmentation of the merged planes; finally, for each segmented plane, computing the 3D center point P_C = [X_C Y_C Z_C]^T of the plane and its unit normal vector n = [n_x n_y n_z]^T, where X_C, Y_C, Z_C are the 3D coordinates of the center point P_C and n_x, n_y, n_z are the components of the normal vector n; the distance from the camera optical center to the plane is then d = −n^T P_C (so that n^T P + d = 0 for points P on the plane), and the plane feature is expressed as π = [n^T d]^T; the planes in the image are then projected into the world coordinate system and matched against the map planes according to the angle between their normals and the difference between their distances from the world origin.
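The plane-parameter computation and the normal-angle/origin-distance matching test can be sketched in NumPy as follows. The thresholds (10° and 0.1 m) are illustrative assumptions, not values taken from the patent, and the PCA-based normal fit stands in for whatever plane fitting the hierarchical clustering stage uses.

```python
import numpy as np

def plane_parameters(points):
    """Fit pi = [n, d] to a patch of 3D points (N x 3).

    The normal is the smallest principal component of the patch, and d is
    chosen so that n . P + d = 0 for points P on the plane.
    """
    center = points.mean(axis=0)                  # plane center P_C
    _, _, vt = np.linalg.svd(points - center)     # PCA via SVD
    n = vt[-1] / np.linalg.norm(vt[-1])           # unit normal
    d = -float(n @ center)                        # signed distance term
    return n, d

def planes_match(n1, d1, n2, d2, max_angle_deg=10.0, max_dist=0.1):
    """Associate two planes (assumed expressed in the same frame) by the
    angle between their normals and the difference of their origin distances."""
    cos_angle = abs(float(n1 @ n2))               # ignore normal orientation
    angle_ok = cos_angle > np.cos(np.deg2rad(max_angle_deg))
    dist_ok = abs(abs(d1) - abs(d2)) < max_dist
    return angle_ok and dist_ok
```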
  4. The structured scene visual SLAM method based on point, line and plane features according to claim 1, characterized in that step S3 specifically comprises:
    S31. Traversing the normal vectors of the planes extracted in step S2 to check whether a combination of three mutually orthogonal normal vectors exists; if only two mutually orthogonal normal vectors are found, the third normal vector is obtained as the cross product of the two orthogonal normal vectors;
    S32. If the normal vector combination sought in step S31 exists, querying the Manhattan world map to check whether mutually orthogonal planes consistent with this set of orthogonal normal vectors have been observed in a historical keyframe; if so, they constitute a Manhattan world coordinate system. Taking the Manhattan world coordinate system built from the normal vector combination whose corresponding planes contain the largest total number of points, the rotation matrix from that Manhattan world coordinate system to the camera coordinate system of the current frame is
    R_{c_i m_k} = [n_1  n_2  n_3]
    where n_1, n_2, n_3 are the mutually orthogonal normal vectors found, c_i is the id of the current frame and m_k is the id of the Manhattan world coordinate system; applying an SVD decomposition to R_{c_i m_k} yields the orthogonalized rotation matrix R̂_{c_i m_k} from the Manhattan world coordinate system to the camera coordinate system, and the rotation matrix from the camera coordinate system to the world coordinate system is then
    R_{w c_i} = (R_{c_j w})^T R_{c_j m_k} (R̂_{c_i m_k})^T
    where c_j is the id of the historical keyframe found, R_{c_j m_k} is the rotation matrix from the Manhattan world coordinate system m_k to the camera coordinate system at frame c_j, and R_{c_j w} is the rotation matrix from the world coordinate system to the camera coordinate system at frame c_j;
    point features, line features and plane features are then tracked to optimize the camera translation t; the translation error model e_t combines the reprojection errors of the tracked features, weighted by their confidence coefficients, information matrices and Huber robust kernels:
    e_t = Σ_p Φ_p(p) ρ_p(e_p^T Λ_p e_p) + Σ_l Φ_l(l) ρ_l(e_l^T Λ_l e_l) + Σ_π Φ_π(π) ρ_π(e_π^T Λ_π e_π)
    where e_p, e_l and e_π are the reprojection errors of the point, line and plane features respectively, with
    e_p = p − (K R_cw P_w + t_cw)
    e_l = l^T (K R_cw P_L + t_cw)
    and e_π comparing, in the plane feature parameter expression form, the plane π_c detected in the current frame with the associated map plane π_w transformed into the camera frame by the pose T_cw;
    here K is the camera intrinsic matrix, R_cw is the rotation matrix from the world coordinate system to the camera coordinate system, t_cw is the translation from the world coordinate system to the camera coordinate system, and T_cw is the pose transformation matrix from the world coordinate system to the camera coordinate system; p is the pixel coordinate of a point feature detected in the frame and P_w is the 3D point corresponding to that point feature; l is a line feature detected in the image, L is the 3D line corresponding to that line feature, and P_L is a 3D endpoint of the 3D line; the plane feature parameters are expressed as π = [n_x n_y n_z d]^T, where n_x, n_y, n_z are the components of the normal vector of the plane π and d is the distance from the camera optical center to the plane π; π_c is the plane detected in the frame and π_w is the map plane associated with it; Λ_p, Λ_l, Λ_π are the information matrices of the point, line and plane features, ρ_p, ρ_l, ρ_π are the Huber robust kernel functions of the point, line and plane features, and Φ_p(p), Φ_l(l), Φ_π(π) are the confidence coefficients of the point, line and plane features, which depend on n_p, n_l, n_π, the numbers of times the corresponding point, line and plane have been observed, on the weight coefficients t_p, t_l, t_π of the point, line and plane features, on level_i, the level of the ORB pyramid at which the point feature was extracted, on the weight coefficient α ∈ [0.5, 1], and on θ_i, the angle between the camera viewing ray of frame i and the map line;
    S33. If the Manhattan world coordinate system sought in step S31 does not exist, or no corresponding normal vector combination has been observed in the historical keyframes of the Manhattan world map, point features, line features and plane features are tracked to optimize the camera pose R, t; the camera pose error model e_{R,t} takes the same form as e_t but is minimized over both the rotation R and the translation t.
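A small NumPy sketch of the Manhattan-frame rotation step in S32 is given below. Projecting the stacked normal matrix onto the nearest rotation via SVD (U Vᵀ, with a determinant sign correction) is one standard way to realize the orthogonalization the claim describes; it is shown here as an illustrative assumption, not as the patent's exact procedure.

```python
import numpy as np

def manhattan_rotation(n1, n2, n3):
    """Build the (approximately orthogonal) Manhattan-to-camera matrix from
    three plane normals and project it onto SO(3) with an SVD."""
    r_raw = np.column_stack([n1, n2, n3])   # R_{c_i m_k} before orthogonalization
    u, _, vt = np.linalg.svd(r_raw)
    r_hat = u @ vt                          # nearest orthogonal matrix
    if np.linalg.det(r_hat) < 0:            # keep a proper rotation (det = +1)
        u[:, -1] *= -1
        r_hat = u @ vt
    return r_hat

def camera_to_world_rotation(r_cj_w, r_cj_mk, r_hat_ci_mk):
    """Chain the historical keyframe's rotations to obtain the current
    camera-to-world rotation: R_{w c_i} = R_{c_j w}^T R_{c_j m_k} R̂_{c_i m_k}^T."""
    return r_cj_w.T @ r_cj_mk @ r_hat_ci_mk.T
```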
  5. The structured scene visual SLAM method based on point, line and plane features according to claim 1, characterized in that step S4 specifically comprises:
    S41. Deciding whether the frame is set as a keyframe according to the number of matched point and line feature pairs, whether a new plane has been detected, the feature tracking status and the keyframe situation in the local map; if it is a keyframe, adding it to the local map, otherwise returning to step S1;
    S42. For the newly inserted keyframe, updating the covisibility graph and the spanning tree and adding the new keyframe node; removing, based on observation consistency, the point, line and plane features that have not been continuously and reliably observed since their creation; back-projecting, using the depth information, the point and line features of the new keyframe that have no match, so as to generate new map points and map lines and insert them into the map; recording, according to the perpendicular relations between planes in the image, the association between vertical plane combinations and the current keyframe, and updating the Manhattan world map;
    S43. After the complete map update, jointly optimizing the current keyframe and the keyframes associated with it, and removing outliers during the optimization so as to refine the camera poses as far as possible; the optimization variables are the camera poses R, t of the keyframes involved and the 3D feature parameters P, L, and the reprojection error e used in the optimization combines the point and line terms weighted by their confidence coefficients, information matrices and Huber robust kernels:
    e = Σ_p Φ_p(p) ρ_p(e_p^T Λ_p e_p) + Σ_l Φ_l(l) ρ_l(e_l^T Λ_l e_l)
    where e_p and e_l are the reprojection errors of the point and line features, Φ_p(p) and Φ_l(l) are their confidence coefficients, Λ_p and Λ_l are the information matrices of the point and line features, and ρ_p and ρ_l are the Huber robust kernel functions of the point and line features;
    S44. Removing keyframes whose feature overlap with other keyframes exceeds 90%.
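As an illustration of the robustified, confidence-weighted residual terms that claims 4 and 5 describe, the helper below evaluates one point term Φ_p(p)·ρ_p(e_pᵀ Λ_p e_p) with a Huber kernel. The Huber threshold is an illustrative choice, and the confidence value and information matrix are passed in as inputs because their exact forms are defined by the patent's own formulas.

```python
import numpy as np

def huber(x, delta=1.0):
    """Huber robust kernel applied to a squared error x (delta is illustrative)."""
    return x if x <= delta**2 else 2.0 * delta * np.sqrt(x) - delta**2

def point_term(e_p, info_p, confidence_p):
    """One confidence-weighted, robustified point residual term:
    Phi_p(p) * rho_p(e_p^T Lambda_p e_p)."""
    quad = float(e_p @ info_p @ e_p)   # e_p^T Lambda_p e_p
    return confidence_p * huber(quad)
```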
  6. The structured scene visual SLAM method based on point, line and plane features according to claim 5, characterized in that the conditions for judging a keyframe are specifically:
    (1) if more than 10 frames have been processed since the last keyframe was inserted and the local mapping thread is currently idle, the frame is judged to be a keyframe;
    (2) if more than 20 frames have been processed since the last keyframe was inserted, the frame is judged to be a keyframe;
    (3) the total number of matched point, line and plane features in the current image must be no less than 20, otherwise the frame cannot be used as a keyframe;
    (4) if the features tracked in the current image are fewer than 90% of the features tracked by the most recent keyframe, the frame is judged to be a keyframe;
    (5) if a new plane is extracted from the image, the frame is judged to be a keyframe.
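The keyframe test in claim 6 is simple enough to sketch directly; the helper below is an illustrative composition of the five conditions, with hypothetical field names standing in for the tracker's internal state.

```python
from dataclasses import dataclass

@dataclass
class TrackingState:
    frames_since_last_keyframe: int   # frames processed since last KF insertion
    mapping_thread_idle: bool         # local mapping thread has no pending work
    matched_features: int             # matched points + lines + planes in this frame
    tracked_features: int             # features tracked in the current image
    reference_kf_features: int        # features tracked by the most recent keyframe
    new_plane_detected: bool          # a plane not yet in the map was extracted

def is_keyframe(s: TrackingState) -> bool:
    # Condition (3) is a hard requirement: too few matches disqualifies the frame.
    if s.matched_features < 20:
        return False
    cond1 = s.frames_since_last_keyframe > 10 and s.mapping_thread_idle
    cond2 = s.frames_since_last_keyframe > 20
    cond4 = s.tracked_features < 0.9 * s.reference_kf_features
    cond5 = s.new_plane_detected
    return cond1 or cond2 or cond4 or cond5
```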
  7. The structured scene visual SLAM method based on point, line and plane features according to claim 1, characterized in that step S5 specifically comprises: based on a point-line feature dictionary model, first looking up, for the descriptors of all point and line features of the current keyframe, the corresponding word types in the dictionary K-ary tree, computing the weight of each word, and obtaining the word vector of the current keyframe; computing the preselected similarity between the word vector of the current keyframe and the word vectors of other keyframes to obtain the frame-to-frame similarity, and selecting the loop closure frame of the current keyframe according to this similarity; constructing the pose graph of all keyframes inside the loop and performing global pose graph optimization to reduce the accumulated error.
  8. The structured scene visual SLAM method based on point, line and plane features according to claim 1, characterized in that the preselected similarity is computed between the word vector v_c of the current keyframe and the word vector v_o of another keyframe, where the word vector v has the specific form
    v = {(w_1, η_1), (w_2, η_2), …, (w_i, η_i)}
    where w_i is the i-th word in the visual dictionary and η_i is the word weight of w_i, computed as
    η_i = (n_i / n) · log(N / N_i)
    where n_i is the number of features in the image belonging to word w_i, n is the total number of point and line features in the image, N is the number of all features in the keyframe database, and N_i is the total number of point and line features in the database belonging to word w_i.
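A brief sketch of the bag-of-words bookkeeping in claims 7 and 8 is shown below. The TF-IDF weight follows the variables the claim defines; the L1-normalized similarity score is a DBoW-style illustrative choice and should not be read as the patent's own preselected similarity formula.

```python
import numpy as np

def word_weight(n_i, n, N, N_i):
    """TF-IDF weight eta_i for word w_i: term frequency in the image times
    inverse document frequency over the keyframe database."""
    return (n_i / n) * np.log(N / N_i)

def bow_similarity(v_c, v_o):
    """L1 similarity between two bag-of-words vectors (dicts word_id -> weight).

    This DBoW-style score is an illustrative assumption; the patent defines
    its own preselected similarity between v_c and v_o.
    """
    def normalize(v):
        s = sum(abs(w) for w in v.values())
        return {k: w / s for k, w in v.items()} if s > 0 else v
    a, b = normalize(v_c), normalize(v_o)
    words = set(a) | set(b)
    l1 = sum(abs(a.get(w, 0.0) - b.get(w, 0.0)) for w in words)
    return 1.0 - 0.5 * l1
```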
  9. The structured scene visual SLAM method based on point, line and plane features according to claim 1, characterized in that step S1 specifically comprises: inputting a color image; first extracting point features with the ORB algorithm and matching them by their descriptors, then removing mismatches between point features with the RANSAC method; afterwards extracting line features with the EDLine algorithm, merging fragmented line segments using distance, angle and descriptor information as screening criteria, and matching the line features by their LBD descriptors.
  10. The structured scene visual SLAM method based on point, line and plane features according to claim 1, characterized in that step S1 specifically comprises: inputting a color image; first extracting line features with the LSD algorithm, matching point features by their descriptors, and then removing mismatches between point features with the RANSAC method; afterwards extracting line features with the EDLine algorithm, merging fragmented line segments using distance, angle and descriptor information as screening criteria, and matching the line features by their LBD descriptors.
PCT/CN2022/128826 2022-04-02 2022-10-31 Structured scene visual slam method based on point line surface features WO2023184968A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210346890.6A CN114862949A (en) 2022-04-02 2022-04-02 Structured scene vision SLAM method based on point, line and surface characteristics
CN202210346890.6 2022-04-02

Publications (1)

Publication Number Publication Date
WO2023184968A1 (en)

Family

ID=82629058

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/128826 WO2023184968A1 (en) 2022-04-02 2022-10-31 Structured scene visual slam method based on point line surface features

Country Status (2)

Country Link
CN (1) CN114862949A (en)
WO (1) WO2023184968A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114862949A (en) * 2022-04-02 2022-08-05 华南理工大学 Structured scene vision SLAM method based on point, line and surface characteristics
CN115376051B (en) * 2022-10-25 2023-03-24 杭州华橙软件技术有限公司 Key frame management method and device, SLAM method and electronic equipment
CN115830110B (en) * 2022-10-26 2024-01-02 北京城市网邻信息技术有限公司 Instant positioning and map construction method and device, terminal equipment and storage medium
CN115578620B (en) * 2022-10-28 2023-07-18 北京理工大学 Point-line-plane multidimensional feature-visible light fusion slam method
CN115601434B (en) * 2022-12-12 2023-03-07 安徽蔚来智驾科技有限公司 Loop detection method, computer device, computer-readable storage medium and vehicle
CN116468786B (en) * 2022-12-16 2023-12-26 中国海洋大学 Semantic SLAM method based on point-line combination and oriented to dynamic environment
CN117611677A (en) * 2024-01-23 2024-02-27 北京理工大学 Robot positioning method based on target detection and structural characteristics

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190114777A1 (en) * 2017-10-18 2019-04-18 Tata Consultancy Services Limited Systems and methods for edge points based monocular visual slam
CN108682027A (en) * 2018-05-11 2018-10-19 北京华捷艾米科技有限公司 VSLAM realization method and systems based on point, line Fusion Features
CN114241050A (en) * 2021-12-20 2022-03-25 东南大学 Camera pose optimization method based on Manhattan world hypothesis and factor graph
CN114862949A (en) * 2022-04-02 2022-08-05 华南理工大学 Structured scene vision SLAM method based on point, line and surface characteristics

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649536A (en) * 2024-01-29 2024-03-05 华东交通大学 Visual synchronous positioning and mapping method for fusing dot line and line structural features
CN117649536B (en) * 2024-01-29 2024-04-16 华东交通大学 Visual synchronous positioning and mapping method for fusing dot line and line structural features
CN117649495A (en) * 2024-01-30 2024-03-05 山东大学 Indoor three-dimensional point cloud map generation method and system based on point cloud descriptor matching

Also Published As

Publication number Publication date
CN114862949A (en) 2022-08-05

Similar Documents

Publication Publication Date Title
WO2023184968A1 (en) Structured scene visual slam method based on point line surface features
CN108090958B (en) Robot synchronous positioning and map building method and system
CN112258618B (en) Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
US20230260151A1 (en) Simultaneous Localization and Mapping Method, Device, System and Storage Medium
CN107025668B (en) Design method of visual odometer based on depth camera
CN112132893B (en) Visual SLAM method suitable for indoor dynamic environment
WO2019057179A1 (en) Visual slam method and apparatus based on point and line characteristic
Huang Review on LiDAR-based SLAM techniques
CN103646391A (en) Real-time camera tracking method for dynamically-changed scene
CN107818598B (en) Three-dimensional point cloud map fusion method based on visual correction
CN113658337B (en) Multi-mode odometer method based on rut lines
Zhang et al. Hand-held monocular SLAM based on line segments
CN112085849A (en) Real-time iterative three-dimensional modeling method and system based on aerial video stream and readable medium
CN111998862A (en) Dense binocular SLAM method based on BNN
CN116449384A (en) Radar inertial tight coupling positioning mapping method based on solid-state laser radar
Koch et al. Wide-area egomotion estimation from known 3d structure
Zhu et al. A review of 6d object pose estimation
Zhang et al. Stereo plane slam based on intersecting lines
CN115965686A (en) Semi-direct visual positioning method integrating point-line characteristics
Zhang LILO: A Novel Lidar–IMU SLAM System With Loop Optimization
Zhao et al. Visual SLAM combining lines and structural regularities: Towards robust localization
CN113781525A (en) Three-dimensional target tracking algorithm research based on original CAD model
CN117253003A (en) Indoor RGB-D SLAM method integrating direct method and point-plane characteristic method
WO2023030062A1 (en) Flight control method and apparatus for unmanned aerial vehicle, and device, medium and program
Sun et al. A multisensor-based tightly coupled integrated navigation system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22934786

Country of ref document: EP

Kind code of ref document: A1