CN111445526A - Estimation method and estimation device for pose between image frames and storage medium - Google Patents


Info

Publication number
CN111445526A
CN111445526A CN202010321620.0A
Authority
CN
China
Prior art keywords
frame
current frame
pose
image
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010321620.0A
Other languages
Chinese (zh)
Other versions
CN111445526B (en)
Inventor
张涛
李少朋
杨新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Huayun Intelligent Technology Co ltd
Tsinghua University
Original Assignee
Ningbo Huayun Intelligent Technology Co ltd
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Huayun Intelligent Technology Co ltd, Tsinghua University filed Critical Ningbo Huayun Intelligent Technology Co ltd
Priority to CN202010321620.0A priority Critical patent/CN111445526B/en
Publication of CN111445526A publication Critical patent/CN111445526A/en
Application granted granted Critical
Publication of CN111445526B publication Critical patent/CN111445526B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an estimation method, an estimation device and a storage medium for the pose between image frames. Image frames are first received; each image frame is then taken as a current frame and its previous frame as a reference frame, and the current frame is tracked sequentially in the reference frame and in a local map generated based on the reference frame. Finally, in response to successful tracking, a current frame satisfying preset conditions is determined to be a key frame, image feature points are extracted from the key frame, and the optimal pose between key frames is calculated based on the image feature points. By distinguishing key frames from non-key frames and extracting image feature points only from key frames, the method improves both the efficiency and the precision of pose optimization.

Description

Estimation method and estimation device for pose between image frames and storage medium
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to an estimation method, an estimation apparatus, and a storage medium for the pose between image frames.
Background
SLAM (Simultaneous Localization and Mapping) solves the problem of reconstructing the three-dimensional structure of an unknown environment in real time while positioning the robot itself, and can present information more efficiently and intuitively than traditional text, images, video and other media.
Visual SLAM techniques can be divided into feature-point-based methods and direct methods. The feature point method extracts salient image features in each image, matches feature points across consecutive frames using invariant feature descriptors, robustly recovers the camera pose and scene structure using epipolar geometry, and completes bundle adjustment and pose optimization based on minimizing reprojection errors using the associated features. The extracted salient features can also be aggregated to describe the whole image for loop detection. However, the extraction of image feature points and their association and matching are relatively cumbersome and time-consuming.
Disclosure of Invention
The embodiments of the present application provide an estimation method for the pose between image frames, to solve the problems of inaccurate and inefficient pose estimation between image frames.
The method comprises the following steps:
receiving an image frame;
taking each image frame as a current frame, taking the previous frame of the current frame as a reference frame, and sequentially tracking the current frame in the reference frame and in a local map generated based on the reference frame;
and responding to the successful tracking, determining the current frame meeting the preset conditions as a key frame, extracting image feature points from the key frame, and calculating the optimal pose between the key frames based on the image feature points.
Optionally, when tracking the current frame in the reference frame is successful, acquiring initial poses of the current frame and the reference frame;
projecting the map points tracked in the reference frame to the current frame according to the initial pose, and calculating pixel errors between image blocks where the map points are projected in the current frame and image blocks where the matching points corresponding to the current frame are located;
and taking the relative pose after the pixel error is minimized as the optimal pose between the current frame and the reference frame.
Optionally, the three-dimensional map points of the local map corresponding to at least one image frame before the current frame are all projected onto the current frame, and a search is performed near each of the resulting projection points:
selecting the projection point which is closest to the gray value of the matching point in the current frame from the searched projection points, and calculating the pixel error between the selected projection point and the matching point corresponding to the current frame;
and taking the relative pose after the pixel error is minimized as the optimal pose between the current frame and the local map.
Optionally, when tracking the current frame in the reference frame fails, image feature points matching at least one map point in the previous key frame are extracted from the current frame, the relative pose between the current frame and the previous key frame is calculated, and the error is minimized to complete pose tracking.
Optionally, when tracking the current frame in the local map fails, image feature points matching at least one map point in the local map are extracted from the current frame, the relative pose between the current frame and the local map is calculated, and the error is minimized to complete pose tracking.
Optionally, the preset conditions include: no key frame has been selected within a preset number of consecutive image frames, and/or the number of map points tracked in the reference frame is less than a preset threshold.
Optionally, image feature points matching at least one three-dimensional map point in the local map are extracted from at least one key frame, and the coordinates of the matched three-dimensional map points are optimized based on the optimal pose obtained after the pixel error between the image feature points and the three-dimensional map points is minimized;
the image feature points of the current key frame are compared with those of at least one previously determined key frame; a candidate loop frame is determined from matches whose similarity exceeds a similarity threshold, and the candidate loop frame is determined to be a loop frame when the candidate loop frame and its adjacent image frames remain similar to the previously determined key frame and its adjacent image frames for more than a preset number of consecutive frames.
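The loop-frame confirmation described in this option can be sketched as a simple run-length check; the similarity values, the threshold, and the function name are illustrative assumptions, not the patent's actual implementation.

```python
# Hypothetical sketch: a candidate only becomes a loop frame when it and its
# neighbouring frames stay similar to an earlier key frame and its neighbours
# for at least a preset number of consecutive frames.

def confirm_loop(similarities, sim_threshold, min_consecutive):
    """similarities[i]: similarity between candidate neighbour i and the
    matched earlier key frame's neighbour i."""
    run = 0
    for s in similarities:
        run = run + 1 if s > sim_threshold else 0  # reset on any mismatch
        if run >= min_consecutive:
            return True
    return False
```

A single dissimilar neighbour resets the run, which is what makes the check robust against one-off appearance matches.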
In another embodiment of the present invention, there is provided an apparatus for estimating a pose between image frames, the apparatus including:
a receiving module for receiving the image frame;
the tracking module is used for taking the image frame of each frame as a current frame, taking the previous frame of the current frame as a reference frame, and sequentially tracking the current frame in the reference frame and a local map generated based on the reference frame according to the current frame;
and the construction module is used for responding to the successful tracking, determining the current frame meeting the preset conditions as a key frame, extracting image feature points from the key frame, and calculating the optimal pose between the key frames based on the image feature points.
In another embodiment of the present invention, there is provided a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of the above-described method of estimating the pose between image frames.
In another embodiment of the present invention, there is provided a terminal device including a processor for executing each step in the above-described method of estimating a pose between image frames.
Based on the embodiment, firstly, image frames are received, secondly, each image frame is used as a current frame, a previous frame of the current frame is used as a reference frame, the current frame is sequentially tracked in the reference frame and a local map generated based on the reference frame according to the current frame, and finally, in response to successful tracking, the current frame meeting the preset conditions is determined to be a key frame, image feature points are extracted from the key frame, and the optimal pose between the key frames is calculated based on the image feature points. According to the method and the device, the key frames and the non-key frames are distinguished, and the image feature points are extracted only from the key frames, so that the optimization efficiency and the optimization precision of pose optimization are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flowchart illustrating a method for estimating a pose between image frames according to an embodiment 100 of the present application;
FIG. 2 is a schematic diagram illustrating a flow of tracking non-key frames by the direct method in the SLAM algorithm, and optimizing and detecting closed loops in key frames by the feature point extraction method in the SLAM algorithm, according to an embodiment 200 of the present application;
fig. 3 is a schematic diagram illustrating a specific flow of a method for estimating a pose between image frames according to an embodiment 300 of the present application;
fig. 4 shows a schematic diagram of an apparatus for estimating a pose between image frames according to an embodiment 400 of the present application;
fig. 5 shows a schematic diagram of a terminal device provided in embodiment 500 of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
A complete SLAM framework generally comprises a tracking front end, an optimization back end, loop detection and map reconstruction. The tracking front end, namely the visual odometer, is responsible for preliminarily estimating the motion between image frames and the positions of landmarks; the back end is responsible for receiving the pose information measured by the visual odometer at different moments and computing a maximum a posteriori estimate; loop detection is responsible for judging whether the robot has returned to a previous position and performing loop closure to correct the accumulated estimation error; and map reconstruction is responsible for constructing a map adapted to the task requirements from the camera trajectory and images. Based on the problems in the prior art, the embodiments of the present application provide an estimation method for the pose between image frames, mainly applicable to the field of computer vision. By distinguishing key frames from non-key frames, using the feature extraction method of the SLAM algorithm on key frames and the direct method on non-key frames, the method improves the construction precision of the three-dimensional map and saves construction time. The detailed technical solution of the present application is explained below with reference to specific embodiments. Fig. 1 is a schematic flowchart of a method for estimating the pose between image frames provided in embodiment 100 of the present application, which comprises the following steps:
s11, receiving the image frame.
In this step, the received image frames are acquired by an image acquisition device. In particular, the image acquisition device may be a camera, a video camera or a Virtual Reality (VR) device.
And S12, taking each frame of image as a current frame, taking a previous frame of the current frame as a reference frame, and tracking the current frame in the reference frame and the local map generated based on the reference frame according to the current frame.
In this step, for ease of presentation, the embodiments of the present application take each received frame as the current frame and the frame preceding it as the reference frame. Upon receipt of each current frame, the current frame is first tracked in the reference frame using the direct method of the SLAM algorithm, and after this tracking succeeds, it continues to be tracked in a local map generated based on the reference frame.
Further, the relative pose of the current frame obtained only from the reference frame may not be sufficiently accurate. The tracking thread therefore tracks the current frame in the local map to obtain more matched map points that should be tracked and optimized. The local map is formed by initialization from a plurality of reference frames. Specifically, using the relative pose determined by tracking the current frame in the reference frame, pixels in the current frame are matched with three-dimensional map points in the local map: the three-dimensional map points of the local map corresponding to the previous image frames are all projected onto the current frame, and within each projection area the projection point closest in gray value to the matching point in the current frame is selected. By minimizing the photometric error between the projection points and the current matching points, the corresponding feature positions in the current frame are obtained, so that the relative pose of the image acquisition device corresponding to the current frame can be further optimized.
And S13, responding to the tracking success, determining the current frame meeting the preset conditions as a key frame, extracting image feature points from the key frame, and calculating the optimal pose between the key frames based on the image feature points.
In this step, when the current frame has been successfully tracked in the reference frame and in the local map generated based on the reference frame in sequence, it is further determined whether the current frame is a key frame satisfying preset conditions. The preset conditions include, but are not limited to, four types: no key frame has been selected for more than a preset number (for example, 20) of consecutive frames; the local optimization thread is in an idle state; the number of tracked map points is smaller than a preset value, such as 50; or the number of image feature points the current frame tracks from the previous key frame is smaller than a preset proportion, such as 90%, of their total number. The feature point extraction method of the SLAM algorithm is then used, only on key frames, to extract Oriented FAST and Rotated BRIEF (ORB) features as image feature points, and the optimal pose between key frames is calculated based on these image feature points.
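The key-frame test just described can be sketched as a simple predicate. The thresholds (20 frames, 50 map points, 90%) are the examples given in the text; the parameter names are assumptions for illustration.

```python
# Hypothetical sketch of the key-frame decision. A frame becomes a key frame
# if ANY of the four preset conditions from the text holds.

def is_keyframe(frames_since_last_kf, local_opt_idle,
                tracked_map_points, tracked_vs_last_kf_ratio,
                max_gap=20, min_points=50, min_ratio=0.9):
    if frames_since_last_kf > max_gap:        # no key frame for 20+ frames
        return True
    if local_opt_idle:                        # local optimization thread idle
        return True
    if tracked_map_points < min_points:       # too few tracked map points
        return True
    if tracked_vs_last_kf_ratio < min_ratio:  # <90% of last key frame's features
        return True
    return False
```

Because the conditions are combined with "or", a quiet local-optimization thread alone is enough to admit a new key frame in this sketch.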
As described above, based on the above-described embodiment, image frames are received first, each frame of image is taken as a current frame, and a previous frame of the current frame is taken as a reference frame, and the current frame is tracked in the reference frame and the local map generated based on the reference frame in sequence according to the current frame, and finally, in response to the success of the tracking, the current frame satisfying the preset condition is determined as a key frame, image feature points are extracted from the key frame, and the optimal pose between the key frames is calculated based on the image feature points. According to the method and the device, the key frames and the non-key frames are distinguished, and the image feature points are extracted only from the key frames, so that the optimization efficiency and the optimization precision of pose optimization are improved.
As shown in FIG. 2, a schematic diagram of the flow of tracking non-key frames by the direct method in the SLAM algorithm, and of optimization and loop detection on key frames by the feature point extraction method in the SLAM algorithm, is shown for the present embodiment 200. Whether the current frame is a key frame is determined according to the scene change; image point features are extracted in key frames, and the relative pose calculation between key frames is completed by the feature point extraction method, while fast positioning is completed by the direct method in non-key frames. The extracted features are Oriented FAST and Rotated BRIEF (ORB) features: oriented FAST corners are selected as image feature point coordinates and described by a 256-dimensional rotated BRIEF descriptor. Loop detection is handled by a separate loop detection thread running in parallel, which is responsible for detecting loops among the key frames and correcting the accumulated error of the image frames once a loop is detected.
Fig. 3 is a schematic diagram illustrating a specific flow of a method for estimating a pose between image frames according to an embodiment 300 of the present application. The method disclosed by the embodiment of the application mainly comprises four parts: monocular initialization, initial positioning according to the reference frame, positioning according to the local map, and image frame management. FIG. 3 depicts the whole process of the tracking thread after initialization is completed. When an image frame is received, the reference frame is first tracked by the direct method to determine the pose T_i of the current frame (i is an integer greater than or equal to 1 and denotes the i-th current frame). If the tracking succeeds, the direct method continues to be used to track the local map and fine-tune the relative pose of the current frame. If the tracking fails, the image feature points of the current frame are extracted, and the feature point extraction method is used to track the last key frame T_{k-1} (k is an integer greater than or equal to 1 and denotes the index of the current key frame), after which the local map is tracked. If tracking the local map fails, feature points are extracted and the local map is tracked by the feature point extraction method. Variables such as the velocity and the local map are then updated, it is determined whether the current frame is a key frame, and if it is, image features are extracted and passed to the local optimization thread. In summary, the detailed process of the specific flow is as follows:
s301, receiving an image frame.
Here, each frame image frame is taken as a current frame, and a previous frame of the current frame is taken as a reference frame.
S302, initializing and acquiring initial poses of the current frame and the reference frame, and determining a local map.
Here, the goal of monocular initialization is to compute the relative pose between two image frames and triangulate an initial set of three-dimensional map points for tracking the subsequent image frames. During initialization, ORB features are extracted from every frame, and two geometric models are calculated in parallel: a homography matrix H under the planar-scene assumption and a fundamental matrix F under the non-planar-scene assumption. One model is then selected by a heuristic, and the initial pose is solved by the method corresponding to that model. The specific steps are as follows:
(1) Selecting the initialization frames: two consecutive image frames whose number of matched ORB feature points is larger than 100 are selected from the received image frames as initialization frames, which avoids initializing under low-texture or poorly illuminated conditions.
(2) Two models are computed in parallel: the homography matrix H is solved by direct linear transformation and the fundamental matrix F by the eight-point method, both within Random Sample Consensus (RANSAC) schemes with the same number of iterations.
(3) Model selection: if the scene is planar, nearly planar, or has low parallax, it can be explained by a homography H. The fundamental matrix F can also be solved in this case, but the model is not well constrained and the motion estimated from it has a large error. In contrast, in a non-planar scene the fundamental matrix F should be selected to estimate the motion. The criterion is as follows:
R_H = S_H / (S_H + S_F)   (Equation 1)

wherein S_H and S_F are the scores of the homography model and the fundamental matrix model, respectively.
If R_H > 0.45, the homography matrix is selected; otherwise, the fundamental matrix is selected.
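The model-selection heuristic above can be sketched in a few lines. Following the ORB-SLAM convention, the inputs are assumed to be the RANSAC support scores of the two models; the function name and the return format are illustrative assumptions, while the 0.45 threshold is the one stated in the text.

```python
# Hypothetical sketch of the heuristic model choice between homography (H)
# and fundamental matrix (F), given their RANSAC scores S_H and S_F.

def select_model(score_H, score_F, threshold=0.45):
    r_h = score_H / (score_H + score_F)   # R_H ratio
    model = "homography" if r_h > threshold else "fundamental"
    return model, r_h
```

A high R_H means most of the matching support is explained by a plane, so the homography initialization path is taken.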
(4) Solving the relative pose and structure: during the solution of the relative pose, if neither model is clearly better than the other, initialization is abandoned and the process returns to step (1); this operation ensures the robustness of the scene at initialization. When the fundamental matrix model is selected, the essential matrix E is further solved from the camera intrinsic matrix K, as shown in Equation 2:
E = KᵀFK   (Equation 2)
Since E = t^∧R (that is, the essential matrix equals the skew-symmetric matrix of the translation t multiplied by the rotation R), four candidate pose solutions can be recovered from the essential matrix E, and the correct solution is then selected according to whether the depths of the map points in front of the camera are positive.
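The recovery of the four candidate pose pairs from E = t^∧R mentioned above can be sketched with a standard SVD-based decomposition; the function name is an assumption, and the cheirality (positive-depth) test that picks the correct pair is only noted in a comment.

```python
import numpy as np

# Sketch of recovering the four candidate (R, t) pairs from an essential
# matrix. The correct pair would then be chosen by triangulating points and
# keeping the solution with positive depth in both cameras.

def decompose_essential(E):
    U, _, Vt = np.linalg.svd(E)
    # enforce proper rotations (determinant +1)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 1.]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]                      # translation known only up to scale
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```

Note that t is recoverable only up to scale, which is why monocular initialization also fixes the map scale arbitrarily.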
(5) Global Bundle Adjustment: full BA (global bundle adjustment, in which the poses of the image acquisition device and the map structure are optimized simultaneously) is then executed to fine-tune the camera poses and the map structure.
The above is the step of monocular initialization, after the initialization is completed, the relative poses of the initial two frames are determined, and a local map which can be used for tracking the subsequent frame is generated by initializing a part of the image frames (such as ten image frames from the initialization).
S303, the current frame is tracked in the reference frame.
Here, the tracking thread sequentially tracks the current frame in the reference frame and the local map generated based on the reference frame according to the current frame.
And S304, calculating the optimal poses of the current frame and the reference frame.
Here, when tracking of the current frame in the reference frame succeeds, the initial poses of the current frame and the reference frame are acquired. Specifically, after successful initialization or successful reference frame tracking, the pose T_{i,i-1} of the current frame relative to the reference frame is given the initial value T_{i,i-1} = T_{i-1,i-2} according to a constant-velocity motion model. Because no image feature points are extracted from the current frame, the pose between adjacent frames is optimized from this initial value using the photometric error:
T_{i,i-1} = argmin_{T_{i,i-1}} Σ_{u∈R} ½ ‖δI(T_{i,i-1}, u)‖²   (Equation 3)

wherein the photometric residual is

δI(T_{i,i-1}, u) = I_i( π( T_{i,i-1} · π⁻¹(u, d_u) ) ) − I_{i−1}(u),

u is a corner pixel position in the (i−1)-th frame image, R is the image region from which corners were collected, T_{i,i-1} ∈ SE(3) is the 6-degree-of-freedom pose represented as a Lie group element, δI is the photometric error, π is the camera projection function, and d_u is the depth value corresponding to pixel u in frame i−1.
Further, the map points tracked in the reference frame are projected to the current frame according to the initial pose, and the pixel error between the image block around each projected map point in the current frame and the image block around the corresponding matching point in the current frame is calculated. The relative pose that minimizes this pixel error is taken as the optimal pose between the current frame and the reference frame. The patch corresponding to each pixel in the optimization may contain 8 pixels, with the bottom-right pixel omitted; using 8 pixels facilitates SSE-accelerated computation on the processor, and this patch model achieves a good balance between speed and precision.
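The 8-pixel patch cost described above can be sketched as follows. The patch is a 3×3 block with the bottom-right pixel omitted, as in the text; integer pixel positions and direct array indexing are simplifying assumptions (a real implementation would interpolate sub-pixel positions).

```python
import numpy as np

# Illustrative sketch of the 8-pixel patch photometric cost used for
# direct image alignment.

# offsets of the 8 patch pixels: a 3x3 block minus the bottom-right corner
PATCH = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
         if not (dy == 1 and dx == 1)]

def patch_photometric_error(img_ref, img_cur, u_ref, u_cur):
    """Sum of squared intensity differences over the 8-pixel patch."""
    err = 0.0
    for dy, dx in PATCH:
        r = float(img_ref[u_ref[0] + dy, u_ref[1] + dx])
        c = float(img_cur[u_cur[0] + dy, u_cur[1] + dx])
        err += (r - c) ** 2
    return err
```

Dropping one corner keeps the patch at exactly 8 values, a width that maps cleanly onto SIMD registers.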
S305, when the tracking fails, extracting image feature points in the current frame to complete the pose tracking of the current frame in the reference frame.
Here, when tracking of the current frame in the reference frame fails, image feature points matching at least one map point in the previous key frame are extracted from the current frame, the relative pose between the current frame and the previous key frame is calculated, and the error is minimized to complete pose tracking. Specifically, after matching is completed, pose tracking is completed by minimizing the reprojection error relative to the previous key frame k−1, as shown in Equation 4:
T_{i,k−1} = argmin_{T_{i,k−1}} Σ_j ‖u′_j − π(T_{i,k−1} · p_j)‖²   (Equation 4)

wherein p_j is the three-dimensional coordinate, in that frame's camera coordinate system, of the j-th image feature point extracted from key frame k−1, and u′_j is the point among the image feature points extracted from the current frame that matches the j-th image feature point of the key frame.
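A minimal sketch of evaluating the reprojection cost of formula 4 for a candidate pose follows, assuming a standard pinhole projection π with intrinsic matrix K; all names and numeric values are illustrative assumptions.

```python
import numpy as np

# Sketch: project the 3-D points p_j through a candidate pose (R, t) and a
# pinhole intrinsic matrix K, and sum squared distances to the matched
# pixels u'_j. Minimizing this cost over (R, t) gives the relative pose.

def reprojection_error(points_3d, matches_2d, R, t, K):
    total = 0.0
    for p, u in zip(points_3d, matches_2d):
        q = K @ (R @ p + t)       # transform into the current frame, project
        proj = q[:2] / q[2]       # pinhole projection pi(.)
        total += float(np.sum((u - proj) ** 2))
    return total
```

In a full system this cost would be fed to a nonlinear least-squares solver over SE(3), typically with a robust kernel on each residual.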
And S306, performing pose optimization on the current frame.
S307, tracking the current frame in the local map.
In the step, after the pose estimation of the current frame based on the reference frame is completed, the local map is continuously tracked, three-dimensional map points corresponding to a plurality of key frames are maintained in the local map, and the constraint can be further increased and the pose estimation precision can be improved by tracking the three-dimensional map points of more frames.
And S308, calculating the optimal poses of the current frame and the local map.
Here, the three-dimensional map points of the local map corresponding to at least one image frame before the current frame are all projected to the current frame, and a search is performed near each projection point: the projection point closest in gray value to the matching point in the current frame is selected among the searched candidates, and the pixel error between the selected point and the corresponding matching point in the current frame is calculated; the relative pose that minimizes this pixel error is taken as the optimal pose between the current frame and the local map. Specifically, the local map points are projected to the current frame, and the pose is fine-tuned again through the photometric error, as shown in Equation 5:
Figure BDA0002461650990000082
wherein p isj,k-1Is a three-dimensional map point u under a world coordinate systemjTo extract pjPixel of (b), k is ujThe key frame where it is located. Furthermore, the pose accuracy needs to be further improved through pixel block matching and projection error optimization. Since the image feature points are not extracted in the current frame, the matching of the pixel patch needs to be completed through the pixel error of the patch. And projecting the local map points to the current frame by taking the fine-tuned pose as an initial value, and respectively searching matched patches in the local map points near the projection points. After matching is completed, the pose is further optimized by using a reprojection error model, as shown in formula 6:
Figure BDA0002461650990000091
wherein u'jFor searched pjThe matching point of (2).
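As an illustration of formula 6, the sketch below performs Gauss-Newton minimization of the reprojection error for a pose simplified to pure translation (rotation held at identity), with a numerical Jacobian. The pinhole model, the function names, and the translation-only restriction are illustrative assumptions, not details of the patent itself.

```python
import numpy as np

def project(K, t, pts_w):
    # Pinhole projection of world points after a translation-only pose
    # (rotation assumed to be identity for this sketch).
    pts_c = pts_w + t
    uv = (K @ pts_c.T).T
    return uv[:, :2] / uv[:, 2:3]

def refine_pose_translation(K, pts_w, matches, t0, iters=20):
    """Gauss-Newton minimization of the reprojection error of formula 6,
    restricted to the 3 translation parameters for clarity."""
    t = np.asarray(t0, dtype=float).copy()
    for _ in range(iters):
        r = (project(K, t, pts_w) - matches).ravel()   # stacked residuals
        # Numerical Jacobian of the residuals w.r.t. the translation.
        J = np.zeros((r.size, 3))
        eps = 1e-6
        for i in range(3):
            dt = np.zeros(3)
            dt[i] = eps
            J[:, i] = ((project(K, t + dt, pts_w) - matches).ravel() - r) / eps
        t -= np.linalg.solve(J.T @ J, J.T @ r)         # normal-equations step
    return t
```

In a full implementation the pose would live in SE(3) and be updated on the manifold (e.g. via Lie-algebra increments), and a robust kernel would down-weight bad patch matches.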
S309, when the tracking fails, extracting image feature points in the current frame to complete the pose tracking of the current frame in the local map.
Here, when tracking of the current frame in the local map fails, image feature points matching at least one map point in the local map are extracted from the current frame, the relative pose between the current frame and the local map is calculated, and the error is minimized over the relative pose to complete pose tracking. Specifically, if the local map is not tracked, ORB features are extracted from the current frame to complete matching with the local map, and pose optimization is then carried out according to formula 6. A cache mechanism is adopted in the management of the local map: map points that tracked well in the previous frame are cached and preferentially projected to the current frame, and if the cached map points are insufficient, other map points are supplemented. This makes tracking of the local map more efficient.
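The caching strategy described above might be sketched as follows; the class and method names, the capacity, and the eviction policy are assumptions for illustration only.

```python
from collections import OrderedDict

class LocalMapCache:
    """Cache of local-map points that tracked well in the previous frame;
    they are projected to the current frame first, and other map points
    are supplemented only when the cache does not provide enough."""

    def __init__(self, capacity=200):
        self.capacity = capacity
        self.cache = OrderedDict()            # point_id -> map point

    def note_success(self, pid, point):
        # Remember a point that tracked well, most recent at the end.
        self.cache[pid] = point
        self.cache.move_to_end(pid)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the stalest entry

    def candidates(self, all_points, needed):
        # Cached points first, then other local-map points as a supplement.
        picked = list(self.cache.values())[:needed]
        if len(picked) < needed:
            extras = [p for pid, p in all_points.items()
                      if pid not in self.cache]
            picked += extras[:needed - len(picked)]
        return picked
```

A real system would key the cache on observation quality (e.g. photometric residual) rather than recency alone, but the supplement-on-shortfall logic is the same.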
S310, judging whether the current frame meets the preset condition of the key frame.
Here, whether the current frame is selected as a key frame is determined mainly according to the motion amplitude and the scene at the time; the main criteria refer to ORB-SLAM. The current frame is selected as a key frame if one of the following conditions is satisfied: no key frame has been selected within more than a preset number of consecutive image frames, the preset number being, for example, 20 frames; and/or the proportion of map points tracked from the reference frame is less than a preset threshold, for example, 90%.
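The two criteria above reduce to a simple predicate. The thresholds of 20 frames and 90% are the examples given in the text; the function name and signature are assumed for illustration.

```python
def is_keyframe(frames_since_last_kf, tracked_ratio,
                max_gap=20, min_tracked_ratio=0.9):
    # Condition 1: no key frame has been selected for more than
    # max_gap consecutive image frames.
    if frames_since_last_kf > max_gap:
        return True
    # Condition 2: the proportion of the reference frame's map points
    # still tracked has fallen below the threshold.
    if tracked_ratio < min_tracked_ratio:
        return True
    return False
```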
S311, local optimization of the three-dimensional map is carried out based on the key frame and the local map.
Specifically, the main function of the local optimization thread is to perform local Bundle Adjustment (Local BA) on the nearest m key frames and the corresponding local map. The poses of the key frames and the three-dimensional coordinates of the map points are taken as optimization variables, which improves the pose and map accuracy simultaneously. Unstable map points and key frames are eliminated according to the tracking quality and visibility of the map points in the key frames, and new map points are triangulated.
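A sketch of how the Local BA problem might be assembled from the description above; the data layout, the observation-count culling rule, and the threshold `min_obs` are illustrative assumptions rather than details given in the patent.

```python
def select_local_ba_problem(keyframes, map_points, m=7, min_obs=3):
    """Pick the newest m key frames and the map points they observe as
    optimization variables; flag rarely observed points for culling."""
    active_kfs = keyframes[-m:]
    active_ids = {kf["id"] for kf in active_kfs}
    variables, cull = [], []
    for pid, obs_kf_ids in map_points.items():
        if not any(k in active_ids for k in obs_kf_ids):
            continue                      # point not seen in the BA window
        if len(obs_kf_ids) < min_obs:
            cull.append(pid)              # unstable: observed too rarely
        else:
            variables.append(pid)         # optimized jointly with the poses
    return active_kfs, variables, cull
```

The returned `variables` and the m key-frame poses would then be passed to a nonlinear least-squares solver minimizing the reprojection error of formula 6 over all observations in the window.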
S312, carrying out loop detection on the three-dimensional map.
Here, the image feature points of the current key frame are compared for similarity with the image feature points of at least one previously determined key frame; a key frame whose similarity is greater than the similarity threshold is determined as a candidate loop frame, and when the number of consecutive similar matches between the candidate loop frame and its adjacent image frames and the at least one previously determined key frame and its adjacent image frames is greater than a preset number, the candidate loop frame is determined as a loop frame.
Specifically, the loop detection thread mainly completes similar key frame determination, similarity transformation solving, loop fusion, and essential graph optimization, and specifically includes the following steps:
(1) Loop detection: the similarity between images is calculated by the Bag-of-Visual-Words method. First, the similarity between the current key frame and its co-visible key frames (those sharing more than 30 co-visible map points) is calculated and the minimum value is stored; any frame whose similarity exceeds this value is determined as a candidate loop frame. The current frame and its co-visible key frames are then matched with the candidate loop frame and its adjacent frames, and if the number of consecutive similar frames is greater than a preset value, for example 3, the candidate frame is determined as a loop frame.
(2) Calculating a similarity transformation: since ORB matching can be performed between the current key frame and the loop frame, a matching relationship is also established between their respective map points; thus, the similarity transformation between the two key frames can be optimized.
(3) Loop fusion: after the similarity pose transformation is solved, the positions and attitudes of the current key frame and the surrounding key frames are adjusted so that the two ends of the loop are basically aligned, and then the matched map points between the current key frame and the loop frame are fused.
(4) Loop closing: the poses of all key frames in the loop are optimized according to an essential graph (a graph in which edges are established between key frames sharing more than 100 co-visible map points), and the loop error is distributed evenly over the corresponding key frames.
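Step (1) can be sketched as follows, using cosine similarity between bag-of-visual-words vectors. The final confirmation over adjacent frames (requiring, e.g., 3 consecutive similar frames) is omitted for brevity, and all names here are illustrative assumptions.

```python
import numpy as np

def bow_similarity(v1, v2):
    # Cosine similarity between two bag-of-visual-words vectors.
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def loop_candidates(current_vec, covis_vecs, earlier_kf_vecs):
    """The minimum similarity between the current key frame and its
    co-visible key frames sets the threshold; any earlier key frame
    scoring above it becomes a candidate loop frame."""
    s_min = min(bow_similarity(current_vec, v) for v in covis_vecs)
    return [kf_id for kf_id, v in earlier_kf_vecs.items()
            if bow_similarity(current_vec, v) > s_min]
```

Systems such as ORB-SLAM use a hierarchical vocabulary score rather than raw cosine similarity, but the thresholding logic is the same.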
The above are the main steps of the loop detection thread; since ORB feature points are extracted from the key frames, SVL has the loop detection capability.
The method for estimating the pose between image frames is implemented based on the above steps. Image point features are extracted from the key frames, and feature matching between key frames is completed according to the feature descriptors; non-key frames no longer extract or match image point features, and pose tracking and positioning are completed through sparse image alignment. The time consumed by feature extraction and matching is thus avoided in non-key frames, while the ORB features at the key frames give the scheme high accuracy and loop detection capability.
Based on the same inventive concept, the embodiment 400 of the present application further provides an apparatus for constructing a three-dimensional map, where, as shown in fig. 4, the apparatus includes:
a receiving module 41, configured to receive an image frame;
the tracking module 42 is configured to use each frame of image as a current frame, use a previous frame of the current frame as a reference frame, and track the current frame in the reference frame and the local map generated based on the reference frame in sequence according to the current frame;
and the constructing module 43 is configured to, in response to the tracking success, determine a current frame meeting a preset condition as a key frame, extract image feature points from the key frame, and calculate an optimal pose between the key frames based on the image feature points.
In this embodiment, specific functions and interaction manners of the receiving module 41, the tracking module 42 and the constructing module 43 may refer to the record of the embodiment corresponding to fig. 1, and are not described herein again.
As shown in fig. 5, another embodiment 500 of the present application further provides a terminal device including a processor 501, where the processor 501 is configured to execute the steps of the above method for estimating the pose between image frames. As can also be seen from fig. 5, the terminal device provided by the above embodiment further includes a non-transitory computer-readable storage medium 502 having stored thereon a computer program which, when executed by the processor 501, performs the steps of the above method for estimating the pose between image frames. In practice, the terminal device may be one or more computers, as long as it includes the computer-readable medium and the processor.
In practical applications, the computer readable medium may be included in the apparatus/device/system described in the above embodiments, or may exist separately without being assembled into the apparatus/device/system.
According to embodiments disclosed herein, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example and without limitation: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing, without limiting the scope of the present disclosure. In the embodiments disclosed herein, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The flowchart and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments disclosed herein. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or couplings of the features recited in the various embodiments and/or claims of the present application can be made, even if such combinations or couplings are not explicitly recited in the present application. In particular, the features recited in the various embodiments and/or claims of the present application may be combined and/or coupled in various ways without departing from the spirit and teachings of the present application; all such combinations fall within the scope of the present disclosure.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present application, and are used for illustrating the technical solutions of the present application, but not limiting the same, and the scope of the present application is not limited thereto, and although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can still change or easily conceive of the technical solutions described in the foregoing embodiments or equivalent replacement of some technical features thereof within the technical scope disclosed in the present application; such changes, variations and substitutions do not depart from the spirit and scope of the exemplary embodiments of the present application and are intended to be covered by the appended claims. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for estimating a pose between image frames, comprising:
receiving an image frame;
taking each frame of the image frame as a current frame, taking a previous frame of the current frame as a reference frame, and sequentially tracking the current frame in the reference frame and a local map generated based on the reference frame according to the current frame;
and responding to the successful tracking, determining the current frame meeting the preset conditions as a key frame, extracting image feature points from the key frame, and calculating the optimal pose between the key frames based on the image feature points.
2. The estimation method according to claim 1, wherein between the step of tracking the current frame in the reference frame and the local map generated based on the reference frame in turn and the step of determining that the current frame satisfying a preset condition is a key frame, the method further comprises:
when the current frame is successfully tracked in the reference frame, acquiring initial poses of the current frame and the reference frame;
projecting the map points tracked in the reference frame to the current frame according to the initial pose, and calculating pixel errors between image blocks where the map points are projected in the current frame and image blocks where the matching points corresponding to the current frame are located;
and taking the relative pose after the pixel error is minimized as the optimal pose between the current frame and the reference frame.
3. The estimation method according to claim 2, wherein after the step of taking the relative pose after minimizing the pixel error as the optimal pose between the current frame and the reference frame, the method further comprises:
projecting three-dimensional map points, corresponding to a local map, of at least one image frame before the current frame to the current frame, and searching for the projection points corresponding to the three-dimensional map points in the local map near the at least one projection point respectively:
selecting the projection point which is closest to the gray value of the matching point in the current frame from the searched projection points, and calculating the pixel error between the selected projection point and the matching point corresponding to the current frame;
and taking the relative pose after the pixel error is minimized as the optimal pose between the current frame and the local map.
4. The estimation method according to claim 2, wherein between the step of tracking the current frame in the reference frame and the local map generated based on the reference frame in turn and the step of determining that the current frame satisfying a preset condition is a key frame, the method further comprises:
when tracking the current frame in the reference frame fails, extracting the image feature point matched with at least one map point in the key frame of the previous frame in the current frame, calculating the relative pose between the current frame and the key frame of the previous frame, and minimizing the relative pose to complete pose tracking.
5. The estimation method according to claim 3, characterized in that after the step of taking the relative pose after minimizing the pixel error as the optimal pose between the current frame and the reference frame, the method further comprises:
when tracking of the current frame in the local map fails, extracting the image feature point matched with at least one map point in the local map from the current frame, calculating a relative pose between the current frame and the local map, and minimizing the relative pose to complete pose tracking.
6. The estimation method according to claim 1, wherein the step of determining the current frame satisfying a preset condition as a key frame comprises:
no key frame has been selected within more than a preset number of consecutive image frames, and/or the number of the map points tracked in the reference frame is less than a preset threshold value.
7. The estimation method according to claim 3, wherein after the step of calculating optimal poses between the keyframes based on the image feature points, the method further comprises:
extracting the image feature points matched with at least one three-dimensional map point in the local map from at least one key frame, and performing pose optimization on the coordinates of the matched three-dimensional map points on the basis of the optimal pose between the image feature points and the three-dimensional map points after the pixel error is minimized;
comparing the image feature points of the current key frame with the image feature points of at least one key frame which is determined before, determining a candidate loop-back frame from the current key frame which is greater than a similarity threshold, and determining the candidate loop-back frame as a loop-back frame when the number of the candidate loop-back frame and the adjacent image frame thereof which are continuously similar to the at least one key frame and the adjacent image frame thereof which are determined before is greater than a preset number.
8. An apparatus for constructing a three-dimensional map, characterized in that the apparatus comprises:
a receiving module for receiving the image frame;
the tracking module is used for taking the image frame of each frame as a current frame, taking the previous frame of the current frame as a reference frame, and sequentially tracking the current frame in the reference frame and a local map generated based on the reference frame according to the current frame;
and the construction module is used for responding to the successful tracking, determining the current frame meeting the preset conditions as a key frame, extracting image feature points from the key frame, and calculating the optimal pose between the key frames based on the image feature points.
9. A non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of a method of estimating pose between image frames as claimed in any one of claims 1 to 7.
10. A terminal device characterized by comprising a processor for executing each step in a method of estimating a pose between image frames according to any one of claims 1 to 7.
CN202010321620.0A 2020-04-22 2020-04-22 Method, device and storage medium for estimating pose of image frame Active CN111445526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010321620.0A CN111445526B (en) 2020-04-22 2020-04-22 Method, device and storage medium for estimating pose of image frame

Publications (2)

Publication Number Publication Date
CN111445526A true CN111445526A (en) 2020-07-24
CN111445526B CN111445526B (en) 2023-08-04

Family

ID=71653532



Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108648240A (en) * 2018-05-11 2018-10-12 东南大学 Based on a non-overlapping visual field camera posture scaling method for cloud characteristics map registration
US20190114777A1 (en) * 2017-10-18 2019-04-18 Tata Consultancy Services Limited Systems and methods for edge points based monocular visual slam
US10346949B1 (en) * 2016-05-27 2019-07-09 Augmented Pixels, Inc. Image registration
CN110310326A (en) * 2019-06-28 2019-10-08 北京百度网讯科技有限公司 A kind of pose data processing method, device, terminal and computer readable storage medium
CN110866496A (en) * 2019-11-14 2020-03-06 合肥工业大学 Robot positioning and mapping method and device based on depth image


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SHAOPENG LI et al.: "Metric Learning for Patch-Based 3-D Image Registration", IEEE Transactions on Automation Science and Engineering *
SHAO-PENG LI et al.: "Semi-direct monocular visual and visual-inertial SLAM with loop closure detection", Robotics and Autonomous Systems *
ZHANG Guoliang et al.: "Binocular visual odometry considering multiple pose estimation constraints", Control and Decision *
机器人事业: "ORB-SLAM: a Versatile and Accurate Monocular SLAM System, paper notes", HTTP://ZHEHANGT.GITHUB.IO/2017/04/20/SLAM/ORBSLAM/ORBSLAMPAPER *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115980A (en) * 2020-08-25 2020-12-22 西北工业大学 Binocular vision odometer design method based on optical flow tracking and point line feature matching
CN112025709A (en) * 2020-08-31 2020-12-04 东南大学 Mobile robot positioning system and method based on vehicle-mounted camera vision
CN112270710A (en) * 2020-11-16 2021-01-26 Oppo广东移动通信有限公司 Pose determination method, pose determination device, storage medium, and electronic apparatus
CN112288816A (en) * 2020-11-16 2021-01-29 Oppo广东移动通信有限公司 Pose optimization method, pose optimization device, storage medium and electronic equipment
CN112288816B (en) * 2020-11-16 2024-05-17 Oppo广东移动通信有限公司 Pose optimization method, pose optimization device, storage medium and electronic equipment
CN112270710B (en) * 2020-11-16 2023-12-19 Oppo广东移动通信有限公司 Pose determining method, pose determining device, storage medium and electronic equipment
CN112651997A (en) * 2020-12-29 2021-04-13 咪咕文化科技有限公司 Map construction method, electronic device, and storage medium
CN112651997B (en) * 2020-12-29 2024-04-12 咪咕文化科技有限公司 Map construction method, electronic device and storage medium
CN112734839A (en) * 2020-12-31 2021-04-30 浙江大学 Monocular vision SLAM initialization method for improving robustness
CN112884838B (en) * 2021-03-16 2022-11-15 重庆大学 Robot autonomous positioning method
CN112884838A (en) * 2021-03-16 2021-06-01 重庆大学 Robot autonomous positioning method
CN113034582A (en) * 2021-03-25 2021-06-25 浙江商汤科技开发有限公司 Pose optimization device and method, electronic device and computer readable storage medium
CN113361400A (en) * 2021-06-04 2021-09-07 清远华奥光电仪器有限公司 Head posture estimation method and device and storage medium
CN113393505B (en) * 2021-06-25 2023-11-03 浙江商汤科技开发有限公司 Image registration method, visual positioning method, related device and equipment
CN113393505A (en) * 2021-06-25 2021-09-14 浙江商汤科技开发有限公司 Image registration method, visual positioning method, related device and equipment
CN113624222A (en) * 2021-07-30 2021-11-09 深圳市优必选科技股份有限公司 Map updating method, robot and readable storage medium
CN114399532A (en) * 2022-01-06 2022-04-26 广东汇天航空航天科技有限公司 Camera position and posture determining method and device
CN114549612A (en) * 2022-02-25 2022-05-27 北京百度网讯科技有限公司 Model training and image processing method, device, equipment and storage medium
CN115371661A (en) * 2022-08-12 2022-11-22 深圳市优必选科技股份有限公司 Robot, and method, device and storage medium for establishing image of robot

Also Published As

Publication number Publication date
CN111445526B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
CN111445526B (en) Method, device and storage medium for estimating pose of image frame
CN110555901B (en) Method, device, equipment and storage medium for positioning and mapping dynamic and static scenes
CN110322500B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
Mur-Artal et al. ORB-SLAM: a versatile and accurate monocular SLAM system
CN111862296B (en) Three-dimensional reconstruction method, three-dimensional reconstruction device, three-dimensional reconstruction system, model training method and storage medium
CN111707281B (en) SLAM system based on luminosity information and ORB characteristics
CN110533587A (en) A kind of SLAM method of view-based access control model prior information and map recovery
CN109544636A (en) A kind of quick monocular vision odometer navigation locating method of fusion feature point method and direct method
CN111462207A (en) RGB-D simultaneous positioning and map creation method integrating direct method and feature method
US20130335529A1 (en) Camera pose estimation apparatus and method for augmented reality imaging
Prankl et al. RGB-D object modelling for object recognition and tracking
CN108776976B (en) Method, system and storage medium for simultaneously positioning and establishing image
CN112001859B (en) Face image restoration method and system
JP2011008687A (en) Image processor
JP2007183256A (en) Image processing device and method therefor
CN110349212B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
US20150098607A1 (en) Deformable Surface Tracking in Augmented Reality Applications
CN111928842B (en) Monocular vision based SLAM positioning method and related device
CN112418288A (en) GMS and motion detection-based dynamic vision SLAM method
CN113763466B (en) Loop detection method and device, electronic equipment and storage medium
CN111829522B (en) Instant positioning and map construction method, computer equipment and device
Xue et al. Fisheye distortion rectification from deep straight lines
CN111951158B (en) Unmanned aerial vehicle aerial image splicing interruption recovery method, device and storage medium
CN113610967B (en) Three-dimensional point detection method, three-dimensional point detection device, electronic equipment and storage medium
CN112270748B (en) Three-dimensional reconstruction method and device based on image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant