CN112785705B - Pose acquisition method and device and mobile equipment

Pose acquisition method and device and mobile equipment

Info

Publication number
CN112785705B
Authority
CN
China
Prior art keywords: current frame, frame image, matched, points, image
Prior art date
Legal status
Active
Application number
CN202110082125.3A
Other languages
Chinese (zh)
Other versions
CN112785705A (en)
Inventor
秦家虎
刘晨昕
余雷
王帅
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN202110082125.3A
Publication of CN112785705A
Application granted
Publication of CN112785705B
Legal status: Active
Anticipated expiration


Classifications

    • G06T17/05: Three dimensional [3D] modelling; geographic models
    • G06T19/00: Manipulating 3D models or images for computer graphics
    • G06T7/33: Image registration using feature-based methods
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods


Abstract

The application discloses a pose acquisition method, a pose acquisition device and a mobile device. The pose acquisition method comprises the following steps: obtaining a current frame image; obtaining the feature points in the current frame image that have matching feature points in the history images contained in a sliding window; respectively obtaining a first pixel model for each matched feature point in the current frame image; respectively comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result, wherein the model comparison result represents whether the space point corresponding to the matched feature point in the current frame image belongs to a moving object, and the second pixel model is the pixel model of the feature point in the history image that matches the feature point in the current frame image; screening out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points whose corresponding space points do not belong to a moving object; and obtaining the current pose of the mobile device according to the reprojection errors corresponding to the target feature points.

Description

Pose acquisition method and device and mobile equipment
Technical Field
The application relates to the technical field of smart home devices, and in particular to a pose acquisition method, a pose acquisition device, and a mobile device.
Background
Simultaneous Localization and Mapping (SLAM) has become a hot topic in current robotics research. According to the sensors used, SLAM can be divided into laser SLAM and visual SLAM. Visual SLAM estimates the pose of the robot and reconstructs the surrounding environment from a continuous sequence of images captured during motion. Specifically, a robot carrying a specific sensor builds an environment model during motion, without any prior environment information, while estimating its own motion pose.
Current SLAM positioning algorithms are based on the assumption that the environment is static or quasi-static, i.e. that the whole scene does not move. However, most real robot operating scenes are dynamic, and objects that move relative to the scene interfere with pose acquisition, resulting in low positioning accuracy.
Therefore, a technical solution capable of improving the positioning accuracy of SLAM is needed.
Disclosure of Invention
In view of the above, the present application provides a pose acquisition method, a pose acquisition device, and a mobile device, so as to solve the technical problem in the prior art of low positioning accuracy of mobile devices in dynamic scenes.
The application provides a pose acquisition method, which comprises the following steps:
obtaining a current frame image, wherein the current frame image is an image acquired by an image acquisition device on the mobile device;
obtaining the matched feature points in the current frame image, wherein the matched feature points in the current frame image are feature points that have matching feature points in the history images contained in a sliding window corresponding to the current frame image; the sliding window contains multiple frames of history images, and the history images are key frame images preceding the current frame image;
respectively obtaining a first pixel model for each matched feature point in the current frame image, wherein the first pixel model corresponds to a plurality of neighbor feature points of the matched feature point in the current frame image and further carries a direction vector of the first pixel model and the reprojection error of the matched feature point in the current frame image;
respectively comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result, wherein the model comparison result represents whether the space point corresponding to the matched feature point in the current frame image belongs to a moving object; the second pixel model is the pixel model of the feature point in the history image that matches the feature point in the current frame image, corresponds to a plurality of neighbor feature points of the matched feature point in the history image, and further carries a direction vector of the second pixel model and the mean value of the reprojection errors of the matched feature point in the history image;
screening out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points whose corresponding space points in the current frame image do not belong to a moving object;
and obtaining the current pose of the mobile device according to the reprojection errors corresponding to the target feature points in the current frame image.
According to the above method, preferably, the obtaining the current pose of the mobile device according to the reprojection error corresponding to the target feature point in the current frame image includes:
obtaining a weight value corresponding to the target feature point according to the depth value of the space point corresponding to the target feature point;
obtaining, according to the weight value corresponding to the target feature point, the pose transformation matrix that minimizes the reprojection error of the target feature point, wherein the reprojection error is obtained from the three-dimensional coordinate value corresponding to the target feature point in the history image and the two-dimensional coordinate value of the target feature point in the current frame image;
and obtaining the current pose of the mobile equipment according to the pose transformation matrix.
In the above method, preferably, the respectively obtaining a first pixel model for each matched feature point in the current frame image includes:
taking each matched feature point in the current frame image as the center, obtaining a plurality of neighbor feature points of that matched feature point;
wherein the depth value of each neighbor feature point differs from that of the matched feature point, the neighbor feature points are feature points whose depth values are larger than a target depth and whose distances to the matched feature point satisfy a distance sorting rule, and the target depth is related to the mean depth value of the pixel points neighboring the matched feature point in the current frame image;
and establishing the first pixel model of the matched feature point at least according to the plurality of neighbor feature points, so that the first pixel model corresponds to the plurality of neighbor feature points of the matched feature point and further carries the direction vector of the first pixel model and the reprojection error of the matched feature point in the current frame image.
In the above method, preferably, the direction vector of the first pixel model is obtained from the three-dimensional coordinate value of the spatial center point of the neighbor feature points in the first pixel model and the three-dimensional coordinate value of the space point corresponding to the matched feature point in the current frame image, and the reprojection error of the matched feature point in the current frame image is obtained based on the history images.
In the above method, preferably, the respectively comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result includes:
obtaining the number of matched neighbor feature points between the first pixel model and the corresponding second pixel model;
obtaining the modulus of the difference between the direction vector of the first pixel model and the direction vector of the corresponding second pixel model;
judging whether the reprojection error, in the first pixel model, of the matched feature point in the current frame image is smaller than or equal to the mean value, in the second pixel model, of the reprojection errors of the matching feature point in the history image, so as to obtain a judgment result; the mean value is the mean of the accumulated reprojection errors of the matching feature point over the history images in the sliding window;
and obtaining the model comparison result according to the number, the modulus and the judgment result.
The above method, preferably, further comprises:
obtaining an update request for a pixel model of a feature point in the history image;
obtaining a newly added image in the sliding window;
extracting feature points from the newly added image to obtain the feature points in the newly added image;
matching the feature points in the newly added image with the feature points in the history images in the sliding window, to obtain matching feature points in the newly added image that match feature points in the history images and non-matching feature points in the newly added image that do not;
updating, according to the matching feature points, the pixel models of the feature points in the history images that they match;
and obtaining the pixel models of the non-matching feature points, so that when the newly added image serves as a history image in the sliding window, the pixel models of the non-matching feature points serve as the pixel models of the new feature points in that history image.
In the above method, preferably, after the target feature point in the current frame image is screened according to the model comparison result, before the current pose of the mobile device is obtained according to the reprojection error corresponding to the target feature point in the current frame image, the method further includes:
screening out, from the target feature points in the current frame image, the target feature points that satisfy the epipolar constraint with the feature points in the history images.
In the above method, preferably, the obtaining the matched feature points in the current frame image includes:
extracting the feature points in the current frame image;
and matching the feature points extracted from the current frame image with the feature points extracted from the history images, to obtain the feature points in the current frame image that match feature points in the history images.
The application also provides a pose acquisition device, comprising:
an image obtaining unit, configured to obtain a current frame image, wherein the current frame image is an image acquired by an image acquisition device on the mobile device;
a feature point processing unit, configured to obtain the matched feature points in the current frame image, wherein the matched feature points in the current frame image are feature points that have matching feature points in the history images contained in a sliding window corresponding to the current frame image; the sliding window contains multiple frames of history images, and the history images are key frame images preceding the current frame image;
a model building unit, configured to respectively obtain a first pixel model for each matched feature point in the current frame image, wherein the first pixel model corresponds to a plurality of neighbor feature points of the matched feature point in the current frame image and further carries a direction vector of the first pixel model and the reprojection error of the matched feature point in the current frame image;
a model comparison unit, configured to respectively compare the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result, the model comparison result representing whether the space point corresponding to the matched feature point in the current frame image belongs to a moving object; the second pixel model is the pixel model of the feature point in the history image that matches the feature point in the current frame image, corresponds to a plurality of neighbor feature points of the matched feature point in the history image, and further carries a direction vector of the second pixel model and the mean value of the reprojection errors of the matched feature point in the history image;
a feature point screening unit, configured to screen out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points whose corresponding space points in the current frame image do not belong to a moving object;
and a pose obtaining unit, configured to obtain the current pose of the mobile device according to the reprojection errors corresponding to the target feature points in the current frame image.
The application also provides a mobile device comprising:
a memory for storing an application program and data generated by the operation of the application program;
a processor, configured to execute the application program to implement:
obtaining a current frame image, wherein the current frame image is an image acquired by an image acquisition device on the mobile device;
obtaining the matched feature points in the current frame image, wherein the matched feature points in the current frame image are feature points that have matching feature points in the history images contained in a sliding window corresponding to the current frame image; the sliding window contains multiple frames of history images, and the history images are key frame images preceding the current frame image;
respectively obtaining a first pixel model for each matched feature point in the current frame image, wherein the first pixel model corresponds to a plurality of neighbor feature points of the matched feature point in the current frame image and further carries a direction vector of the first pixel model and the reprojection error of the matched feature point in the current frame image;
respectively comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result, wherein the model comparison result represents whether the space point corresponding to the matched feature point in the current frame image belongs to a moving object; the second pixel model is the pixel model of the feature point in the history image that matches the feature point in the current frame image, corresponds to a plurality of neighbor feature points of the matched feature point in the history image, and further carries a direction vector of the second pixel model and the mean value of the reprojection errors of the matched feature point in the history image;
screening out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points whose corresponding space points in the current frame image do not belong to a moving object;
and obtaining the current pose of the mobile device according to the reprojection errors corresponding to the target feature points in the current frame image.
According to the pose acquisition method, the pose acquisition device and the mobile device disclosed in the application, after the current frame image is acquired by the image acquisition device on the mobile device, the feature points in the current frame image that match feature points in the history images contained in the preceding sliding window are obtained. After the pixel models corresponding to the plurality of neighbor feature points of these matched feature points are built, they are compared with the pixel models of the feature points in the history images contained in the sliding window, and the target feature points whose corresponding space points do not belong to a moving object can then be screened out of the current frame image according to the model comparison results. On this basis, the current pose of the mobile device can be obtained according to the reprojection errors of the target feature points. The pixel models of the matched feature points in consecutive images are thus used to reject feature points belonging to moving objects, so that the mobile device is positioned using only the screened feature points that do not belong to moving objects. Interference of moving-object feature points with pose acquisition is thereby avoided, and the positioning accuracy of the mobile device in dynamic scenes is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a pose acquisition method according to a first embodiment of the present application;
FIGS. 2-6 are respectively exemplary diagrams of embodiments of the present application;
fig. 7 to 8 are partial flowcharts of a pose acquisition method according to a first embodiment of the present application;
FIG. 9 is another exemplary diagram of an embodiment of the present application;
FIGS. 10-11 are respectively a flowchart of another part of a pose acquisition method according to the first embodiment of the present application;
fig. 12 is another flowchart of a pose acquisition method according to the first embodiment of the present application;
fig. 13 is a schematic structural diagram of a pose acquisition device according to a second embodiment of the present application;
fig. 14 is another schematic structural diagram of a pose acquisition device according to the second embodiment of the present application;
fig. 15 is a schematic structural diagram of a mobile device according to a third embodiment of the present application;
Fig. 16 to 19 are respectively exemplary diagrams of embodiments of the present application applicable to a mobile robot.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Referring to fig. 1, a flowchart of an implementation of a pose acquisition method according to the first embodiment of the present application is shown. The method is applicable to an electronic device capable of processing images, such as a mobile device equipped with an image acquisition device (for example, a mobile robot), or a device connected to the mobile device, such as a computer or a server. The technical solution in this embodiment is used to improve the accuracy of pose acquisition for the mobile device.
Specifically, the method in this embodiment may include the following steps:
step 101: and obtaining the current frame image.
The current frame image is the latest image acquired by an image acquisition device on the mobile device. The image acquisition device may be a camera, a video recorder or a similar device that captures images within the acquisition range of the mobile device. As shown in fig. 2, a camera is mounted at the top of the mobile robot, and while the mobile robot moves, the camera captures images of the area in front of it.
It should be noted that, in this embodiment, the current frame image acquired by the image acquisition device may be obtained through a connection interface with the image acquisition device, or the current frame image acquired by the image acquisition device transmitted by the mobile device may be obtained through a communication interface with the mobile device.
In addition, the current frame image in this embodiment can be understood as an image that needs to be processed at the current time.
Step 102: and obtaining the matched characteristic points in the current frame image.
The matched feature points in the current frame image obtained in step 102 are the feature points in the current frame image that match feature points in a history image, where the history images are the images in the sliding window preceding the current frame image. As shown in fig. 3, after the history images are collected, one or more frames of history images form the sliding window corresponding to the current frame image and are stored in an image library. On this basis, after the current frame image is obtained, the stored history images in the sliding window can be read from the image library. A feature point in the current frame image matches a feature point in a history image when the two feature points correspond to the same space point in the map.
It should be noted that, in this embodiment, the history images added to the sliding window are key frame images among the images acquired by the image acquisition device. A key frame is a representative image frame, which can be understood as an image frame that captures a key action in the movement or change of a character or object, such as an I frame.
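As an illustration only, a minimal sketch of one way such a sliding window of key frames could be kept in memory is given below; the window size, the KeyFrame fields and the names used here are assumptions, not taken from the patent:

```python
from collections import deque
from dataclasses import dataclass, field

import numpy as np


@dataclass
class KeyFrame:
    """One history (key frame) image kept in the sliding window."""
    image: np.ndarray                                   # grayscale or RGB key frame image
    depth: np.ndarray                                   # aligned depth image used to recover 3D points
    keypoints: list = field(default_factory=list)       # extracted feature points
    descriptors: np.ndarray = None                      # e.g. ORB descriptors
    pixel_models: dict = field(default_factory=dict)    # feature point id -> second pixel model


class SlidingWindow:
    """Fixed-length buffer of the most recent key frames (the size is an assumed parameter)."""

    def __init__(self, max_size: int = 10):
        self.frames = deque(maxlen=max_size)

    def add_keyframe(self, kf: KeyFrame) -> None:
        # the oldest key frame is dropped automatically once the window is full
        self.frames.append(kf)

    def latest(self) -> KeyFrame:
        return self.frames[-1]
```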
In one implementation, step 102, obtaining the feature points in the current frame image that match feature points in the history images, may be implemented as follows:
First, the feature points in the current frame image are extracted. Feature points are representative points in an image and constitute another digital representation of the image information. The feature points in this embodiment may be ORB (Oriented FAST and Rotated BRIEF) features of the current frame image, consisting of key points and descriptors, where the key points may be FAST corner points with scale and rotation invariance, and the descriptors may be binary BRIEF descriptors. In a specific implementation, a preset feature extraction algorithm may be used to extract the feature points in the current frame image, so as to obtain the feature points that satisfy the extraction condition. To further improve accuracy, the current frame image may first be divided into a number of small grid cells, and feature points are then extracted per cell; this increases the uniformity of the extraction so that the feature points are evenly distributed over the whole field of view, and the extracted feature points then constitute the feature points of the current frame image.
Then, the feature points extracted from the current frame image are matched with the feature points extracted from the history images. The feature points in the history images are obtained using the same feature extraction method as for the current frame image.
Specifically, a preset matching algorithm may be used to match the feature points in the current frame image with the feature points in the history images, for example by matching their ORB descriptors. To further improve the matching rate, a matching algorithm such as the DBoW2 bag-of-words technique may be used to accelerate matching between the feature points in the current frame image and the features in the history images, so as to obtain the matched feature points in the current frame image, i.e. feature points that correspond to the same space points as feature points in the history images. As shown in fig. 4, each matched feature point in the current frame image has a matching feature point in a history image.
Specifically, the feature points in the current frame image may first be matched against the feature points extracted from the key frame image in the sliding window that is closest to the current frame image. If the number of feature points in the current frame image matched in this way does not reach the required number, the feature points in the current frame image are then matched against the feature points of all key frame images in the sliding window until the required number is met.
Further, to improve reliability, it is determined whether the number of matched feature points in the current frame image reaches a preset required number. If it does not, the matching algorithm may be adjusted, for example by relaxing its matching condition, and the feature points extracted from the current frame image are matched again against the feature points of the history images in the sliding window using the relaxed algorithm, for example accelerated matching with the DBoW2 bag-of-words technique under the relaxed condition. This increases the number of matched feature points in the current frame image until the required number is satisfied.
Since the history images in the sliding window are key frame images, matching the features of the current frame image always uses the feature points of the key frames in the sliding window; the matched feature points in the current frame image are therefore those matched to feature points of the key frames in the sliding window.
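The patent does not name a particular library; the following sketch assumes OpenCV and shows grid-based ORB extraction plus brute-force Hamming matching with a ratio test. The grid size, feature counts and ratio value are illustrative assumptions, and DBoW2-style bag-of-words acceleration is not shown:

```python
import cv2
import numpy as np


def extract_orb_grid(image: np.ndarray, rows: int = 4, cols: int = 4, per_cell: int = 50):
    """Detect ORB feature points cell by cell so they spread over the whole field of view."""
    orb = cv2.ORB_create(nfeatures=per_cell)
    h, w = image.shape[:2]
    keypoints = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cell_kps = orb.detect(image[y0:y1, x0:x1], None)
            # shift the cell-local keypoints back to full-image coordinates
            keypoints.extend(cv2.KeyPoint(kp.pt[0] + x0, kp.pt[1] + y0, kp.size)
                             for kp in cell_kps)
    # compute the binary BRIEF descriptors for all keypoints on the full image
    keypoints, descriptors = orb.compute(image, keypoints)
    return keypoints, descriptors


def match_features(desc_cur: np.ndarray, desc_hist: np.ndarray, ratio: float = 0.75):
    """Brute-force Hamming matching with Lowe's ratio test (the ratio value is an assumption)."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(desc_cur, desc_hist, k=2)
    good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return good  # each match links a current-frame feature point to a history-image one
```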
Further, after the feature points in the current frame image that match feature points in the history images are obtained in step 102, reprojection errors may be computed from these matched feature points, and the current pose of the mobile device may be estimated preliminarily from the computed reprojection errors. For example, the depth values of the space points corresponding to the matching feature points in the history images are obtained from the depth images corresponding to those history images, giving the three-dimensional coordinate values of the space points; these three-dimensional coordinates are transformed into the current coordinate system of the current frame image using an initial pose transformation matrix and projected to two-dimensional coordinates; the errors between these projected coordinates and the two-dimensional coordinates of the corresponding matched feature points in the current frame image are then computed, and pose estimation is performed on the resulting reprojection errors, so that the current pose of the mobile device is preliminarily estimated.
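A hedged sketch of this reprojection-error computation for one matched point pair is given below, assuming a pinhole camera model with intrinsics K and a 4x4 initial pose transformation matrix; the function names are illustrative:

```python
import numpy as np


def back_project(u: float, v: float, depth: float, K: np.ndarray) -> np.ndarray:
    """Recover the 3D space point of pixel (u, v) from its depth value and intrinsics K."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])


def project(P: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project a 3D point expressed in the current camera frame to 2D pixel coordinates."""
    p = K @ P
    return p[:2] / p[2]


def reprojection_error(p_hist_uv, depth_hist, p_cur_uv, T_init, K) -> float:
    """Geometric error of a history feature point reprojected into the current frame image."""
    P_hist = back_project(p_hist_uv[0], p_hist_uv[1], depth_hist, K)  # 3D point from history frame
    P_cur = T_init[:3, :3] @ P_hist + T_init[:3, 3]                   # transform with the initial pose
    return float(np.linalg.norm(project(P_cur, K) - np.asarray(p_cur_uv)))
```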
Step 103: and respectively obtaining a first pixel model of the matched characteristic points in each current frame image.
The first pixel model may also be referred to as a feature point model. The first pixel model corresponds to a plurality of neighbor feature points, a direction vector and a reprojection error of the matched feature point in the current frame image, and the depth values of the neighbor feature points differ from the depth value of the matched feature point itself. That is, in this embodiment one first pixel model is built for each matched feature point in the current frame image, and the first pixel model of each matched feature point is built from a plurality of neighbor feature points whose depth values differ from that of the matched feature point; the neighbor feature points themselves may or may not share the same depth values. On this basis, the first pixel model of a matched feature point in the current frame image may be obtained as follows:
firstly, a plurality of neighbor feature points of the feature points matched in the current frame image are obtained, and the neighbor feature points are distributed around the feature points matched in the current frame image, as shown in fig. 5;
then, the redundant neighbor feature points with repeated depth values are removed from these neighbor feature points;
and then, the first pixel model of the matched feature point in the current frame image is established from the remaining neighbor feature points.
Specifically, the first pixel model may include the feature data of the plurality of neighbor feature points, such as their pixel values and coordinate values, and further includes the direction vector of the first pixel model and the reprojection error of the matched feature point in the current frame image. The direction vector is obtained from the three-dimensional coordinate value of the spatial center point of the neighbor feature points in the first pixel model and the three-dimensional coordinate value of the space point corresponding to the matched feature point in the current frame image. The reprojection error of the matched feature point in the current frame image is obtained from a history image, such as a history image in the sliding window; specifically, it is the geometric error obtained when the 3D space point from the history image is projected into the current frame image. For example, the depth value of the space point corresponding to the matching feature point in the history image is obtained from the depth image corresponding to that history image, giving the three-dimensional coordinate value of the space point; this three-dimensional coordinate is transformed into the current coordinate system of the current frame image using the initial pose transformation matrix and projected to a two-dimensional coordinate; the error between this coordinate and the two-dimensional coordinate of the corresponding matched feature point in the current frame image is then computed.
Specifically, the first pixel model may be written as {p'_1, p'_2, p'_3, ..., p'_N, n', r'}, where p'_1, ..., p'_N are the neighbor feature points of the matched feature point p' in the current frame image whose depths in space differ from that of p'. The direction vector is denoted n' and is given by n' = P_c - P_p', where P_c is the three-dimensional coordinate of the spatial center point corresponding to the neighbor feature points of p' and P_p' = (x, y, z) is the three-dimensional coordinate of the space point corresponding to the matched feature point in the current frame image; these three-dimensional coordinates can be obtained by mapping the feature points in the current frame image, together with their depth values in the depth image, into space. The reprojection error of the matched feature point p' in the current frame image is denoted r', and N is the number of neighbor feature points of p', which may be a preset value determined from empirical data.
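One possible way to hold such a pixel model in code is sketched below; the field names, and the choice of orienting the direction vector from the feature point's space point toward the neighbors' spatial center, are assumptions consistent with the description above:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PixelModel:
    """Pixel model of one matched feature point: N neighbor feature points whose depth
    differs from the center point, a direction vector, and a reprojection error term."""
    neighbors_uv: np.ndarray        # (N, 2) pixel coordinates of the neighbor feature points
    neighbor_points3d: np.ndarray   # (N, 3) space points of the neighbor feature points
    direction: np.ndarray           # n': vector from the feature point's space point to the
                                    # spatial center of the neighbors' space points (assumed orientation)
    reproj_error: float             # r': reprojection error (first model) or its mean (second model)


def build_pixel_model(point3d: np.ndarray, neighbor_points3d: np.ndarray,
                      neighbors_uv: np.ndarray, reproj_error: float) -> PixelModel:
    center = neighbor_points3d.mean(axis=0)        # spatial center point of the neighbors
    return PixelModel(neighbors_uv=neighbors_uv,
                      neighbor_points3d=neighbor_points3d,
                      direction=center - point3d,
                      reproj_error=reproj_error)
```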
It should be noted that, since the matched feature points obtained in step 102 are the feature points in the current frame image that match feature points in the history images in the sliding window, the first pixel model is the pixel model of feature points that have been screened from the original feature points of the current frame image according to the feature points of the preceding history images. In addition, corresponding pixel models also need to be built for the other feature points in the current frame image, i.e. those without matching feature points in the history images; these models are built in the same way as the first pixel model, and they are mainly used when updating the second pixel models of the feature points in the history images, as described in detail below. The first pixel model of a matched feature point in the current frame image is thus built on the premise that the feature point matches a feature point in a history image, so in this embodiment the first pixel model is built on the basis of the history images.
Step 104: and respectively comparing the first pixel model of the matched characteristic points in each current frame image with the corresponding second pixel model to obtain a model comparison result.
The model comparison result represents whether the space point corresponding to the matched feature point in the current frame image belongs to a moving object. The second pixel model is the pixel model of a feature point in the history image; it corresponds to a plurality of neighbor feature points and a direction vector of that feature point, together with the mean value of its reprojection errors, and the depth values of the neighbor feature points differ from the depth value of the feature point in the history image.
It should be noted that the second pixel model of a matched feature point in a history image may be built in the same way as the first pixel model, which is not described again here. For a sliding window containing at least two frames of history images, the second pixel models of the matched feature points in a later frame are built on the basis of the preceding frame. The feature points of the first frame in the sliding window may be preset through initialization, so the pixel models of the feature points in the first frame are built from the initialized preset feature points. Each time a new image is added to the sliding window, the second pixel models of the matched feature points in the earlier frames in the window are updated using the pixel models of the feature points extracted from the newly added image.
Of course, the content of the second pixel model is consistent with that of the first pixel model: the first pixel model includes the feature data of a plurality of neighbor feature points of the feature point in the current frame image, together with the direction vector of the first pixel model and the reprojection error of the feature point in the current frame image; the second pixel model includes the feature data of a plurality of neighbor feature points of the feature point in the history image, together with the direction vector of the second pixel model and the mean value of the reprojection errors of the feature point in the history image.
Based on this, the first pixel model of each matched feature point in the current frame image is compared with the second pixel model of the feature point in the history image that it matches, i.e. a one-to-one comparison is performed, as shown in fig. 6, so as to obtain a model comparison result for each matched feature point in the current frame image.
Specifically, comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model may mean comparing each of their items, namely the plurality of neighbor feature points, the direction vectors and the reprojection errors, so as to obtain a comparison result for each item; the model comparison result for the matched feature point is then obtained from these item-wise comparison results.
Step 105: and screening out target feature points in the current frame image according to the model comparison result.
The target feature points are the feature points in the current frame image whose corresponding space points do not belong to a moving object.
Specifically, if the difference between a matched feature point in the current frame image and the feature point it matches in the history image exceeds a certain standard, the matched feature point has moved relative to the history feature point, and it can be determined that the space point corresponding to the matched feature point belongs to a moving object; in that case the model comparison result indicates that the space point belongs to a moving object. If the difference does not exceed that standard, the matched feature point can be regarded as stationary relative to the history feature point, and the corresponding space point does not belong to a moving object. On this basis, the feature points whose space points do not belong to moving objects, i.e. the target feature points, can be screened out of the current frame image.
Step 106: and obtaining the current pose of the mobile equipment according to the reprojection error corresponding to the target feature point in the current frame image.
Specifically, the reprojection error may be minimized to obtain the pose transformation matrix for the target feature points at the minimum of the reprojection error, where the reprojection error is obtained from the three-dimensional coordinate value of the space point corresponding to the feature point in the history image that matches the target feature point and the two-dimensional coordinate value of the target feature point in the current frame image; the current pose of the mobile device can then be obtained from this pose transformation matrix.
Furthermore, the preliminarily estimated current pose of the mobile device can be taken as the initial value of the optimization; by minimizing the reprojection errors corresponding to the target feature points in the current frame image, the pose transformation matrix at the minimum of the reprojection error is obtained, and a more accurate current pose of the mobile device is then obtained from this pose transformation matrix.
As can be seen from the above, in the pose acquisition method provided in the first embodiment of the present application, after the current frame image is acquired by the image acquisition device on the mobile device, the feature points in the current frame image that match feature points in the history images contained in the preceding sliding window are obtained; after the pixel models corresponding to the plurality of neighbor feature points of these matched feature points are built, they are compared with the pixel models of the feature points in the history images contained in the sliding window; the target feature points whose corresponding space points do not belong to a moving object can then be screened out of the current frame image according to the model comparison results, and the current pose of the mobile device is obtained from the reprojection errors of the target feature points. Thus, in this embodiment, the pixel models of the matched feature points in consecutive images are used to reject the feature points belonging to moving objects, so that the mobile device is positioned using the screened feature points that do not belong to moving objects; interference of moving-object feature points with pose acquisition is avoided, and the positioning accuracy of the mobile device in dynamic scenes is improved.
In order to further improve the positioning accuracy, in step 106, when the current pose of the mobile device is obtained according to the reprojection error corresponding to the target feature point in the current frame image, the method may specifically be implemented by the following steps, as shown in fig. 7:
step 701: and obtaining a weight value corresponding to the target feature point according to the depth value of the space point corresponding to the target feature point.
The depth value of the space point corresponding to a target feature point may be obtained from the depth image corresponding to the current frame image in which the target feature point lies, and the weight value is then derived from this depth value. For example, the weight value of the i-th target feature point is denoted ω_i and is computed from a scale constant λ and d_i, where d_i is the depth value, in the depth image corresponding to the current frame image, of the space point corresponding to the feature point in the history image that matches the i-th target feature point. The reason is that the moving objects that interfere with positioning are generally closer to the mobile device, while the displacement produced on the pixel plane in a short time by a farther moving object is relatively small; therefore, in this embodiment a weight value is assigned to each target feature point based on the depth value of its corresponding space point, which improves the accuracy of the subsequent pose acquisition.
Step 702: and obtaining a pose transformation matrix corresponding to the target feature point when the re-projection error of the target feature point is minimum according to the weight value corresponding to the target feature point.
The reprojection error is obtained from the three-dimensional coordinate value corresponding to the target feature point in the history image and the two-dimensional coordinate value of the target feature point in the current frame image. For example, the depth value of the space point corresponding to the target feature point in the history image is obtained from the depth image corresponding to that history image, giving the three-dimensional coordinate value of the space point; this three-dimensional coordinate is transformed into the current coordinate system of the current frame image using the initial pose transformation matrix and projected to a two-dimensional coordinate; the error between this coordinate and the two-dimensional coordinate of the corresponding matched feature point in the current frame image gives the reprojection error. On this basis, the reprojection errors are weighted by the weight values of the corresponding target feature points, so as to obtain the optimal pose transformation matrix.
Step 703: and obtaining the current pose of the mobile equipment according to the pose transformation matrix.
In this embodiment, a preset positioning algorithm may be used to calculate the current pose of the mobile device according to the pose transformation matrix.
Specifically, in this embodiment, the depth values of the feature points in the history images that match the target feature points may be obtained from the depth images corresponding to those history images, and the three-dimensional coordinate values in space of those matching feature points can then be recovered. On this basis, the three-dimensional coordinate value of the matching feature point in the history image and the two-dimensional coordinate value of the corresponding target feature point in the current frame image form a three-dimensional-two-dimensional point pair, i.e. a 3D-2D point pair, and a reprojection error equation with the corresponding pose transformation matrix can be constructed as

T* = argmin_T Σ_{i=1...n} ω_i F( || u_i - π(T P_i) ||^2 )

where F(·) is the Huber kernel function: when feature points are mismatched the reprojection error becomes large, and the kernel function limits the growth of the error; since the Huber kernel is smooth, it is easy to differentiate, which facilitates the minimization of the reprojection error equation. π(·) is the projection transformation based on the image acquisition device, such as the camera model, which maps 3D coordinates in the camera coordinate system to 2D pixel coordinates in the camera image. i is the index of the 3D-2D point pairs, of which there are n in total. u_i = [x_i, y_i]^T is the two-dimensional coordinate of the 2D point of the pair in the pixel plane, i.e. in the current frame image, and P_i = [X_i, Y_i, Z_i]^T is the three-dimensional coordinate in space of the 3D point of the pair, i.e. of the corresponding space point in the history image.
Based on this, the most suitable pose transformation matrix is obtained by minimizing the reprojection error function. Specifically, the reprojection error function may be solved by nonlinear optimization, for example with the EPnP algorithm, to obtain the pose transformation matrix. With the preliminarily estimated current pose of the mobile device as the initial value, the pose transformation matrix then yields a more accurate current pose of the mobile device.
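A sketch of the weighted, Huber-robustified minimization is given below, using SciPy's robust least squares as a stand-in for the EPnP-plus-nonlinear-optimization pipeline described here; applying the weights as sqrt(w) residual scaling and the f_scale value are assumptions:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def estimate_pose(points3d, points2d, weights, K, rvec0, t0):
    """Minimise the weighted, robustified reprojection error over the pose (R, t).

    points3d: (n, 3) space points recovered from the history images
    points2d: (n, 2) pixel coordinates of the target feature points in the current frame image
    weights : (n,)   depth-based weights ω_i for each 3D-2D point pair
    rvec0,t0: initial pose, e.g. the preliminarily estimated current pose
    """
    sw = np.sqrt(weights)

    def residuals(x):
        R = Rotation.from_rotvec(x[:3]).as_matrix()
        t = x[3:]
        P_cam = points3d @ R.T + t              # transform into the current camera frame
        proj = P_cam @ K.T
        proj = proj[:, :2] / proj[:, 2:3]       # perspective projection to pixel coordinates
        return (sw[:, None] * (points2d - proj)).ravel()

    x0 = np.concatenate([rvec0, t0])
    # loss='huber' plays the role of the Huber kernel F(.) that bounds mismatched points
    res = least_squares(residuals, x0, loss='huber', f_scale=1.0)
    R = Rotation.from_rotvec(res.x[:3]).as_matrix()
    return R, res.x[3:]
```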
Further, in order to improve the accuracy of the first pixel model, the neighbor feature points that best represent the character of the matched feature point need to be screened out of its neighbor feature points. Based on this, when the first pixel model of each matched feature point in the current frame image is obtained in step 103, the following steps may be used, as shown in fig. 8:
step 801: and respectively taking the matched characteristic points in each current frame image as the center to obtain a plurality of neighbor characteristic points of the matched characteristic points in the current frame image.
The depth value of each neighbor feature point of the matched feature point differs from the depth value of the matched feature point taken as the center. The neighbor feature points of a matched feature point are the feature points whose depth values are larger than the target depth and whose distances to the matched feature point satisfy the distance sorting rule, and the target depth is related to the mean depth value of the 24 neighborhood pixel points of the matched feature point in the current frame image.
Taking N neighbor feature points as an example, the specific screening method is as follows:
firstly, taking each matched feature point in the current frame image as the center, the pixel points in its 24 neighborhood or 8 neighborhood are obtained, as shown in fig. 9;
then, the mean of the depth values of the pixel points in the 24 neighborhood (or 8 neighborhood) of the matched feature point p' in the current frame image is computed, denoted d_avg, for example d_avg = (1/24) Σ_{i=1...24} d_i, where d_i is the depth value of the i-th pixel point in the 24 neighborhood; on this basis, the feature points whose depth values are larger than the target depth, e.g. larger than d_avg, are selected;
then, the distance, for example the Euclidean distance, between the matched feature point p' in the current frame image and each candidate neighbor feature point is computed, and the feature points whose depth values are larger than d_avg and whose distances to p', sorted from small to large, are among the first N are taken as the neighbor feature points used to establish the first pixel model.
This screening scheme excludes the case where a neighbor feature point belongs to the same rigid body, i.e. the same object, as the matched feature point p' and would otherwise be added to the first pixel model. Pixel points with similar depths are most likely located on the same rigid body, and even when the motion changes, the relationship between such points does not change much; therefore, removing the neighbor feature points that are likely to belong to the same rigid body as the matched feature point improves the accuracy of the first pixel model.
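A sketch of this neighbor-selection rule follows, under the assumption that the target depth is the mean depth of the 24-neighborhood pixels; N, the patch size and the zero-depth handling are illustrative choices:

```python
import numpy as np


def select_neighbors(center_uv, center_depth, candidates_uv, candidates_depth,
                     depth_image, N: int = 8, patch: int = 2):
    """Pick the N neighbor feature points used to build a pixel model.

    candidates_* : pixel coordinates / depths of the other feature points in the frame.
    patch=2 gives the 5x5 block around the center, i.e. its 24-neighborhood pixels.
    """
    u, v = int(round(center_uv[0])), int(round(center_uv[1]))
    block = depth_image[max(v - patch, 0):v + patch + 1, max(u - patch, 0):u + patch + 1]
    valid = block[block > 0]
    target_depth = valid.mean() if valid.size else center_depth  # mean depth of the neighborhood

    # keep candidates deeper than the target depth and with a depth different from the
    # center point (they are unlikely to lie on the same rigid body)
    mask = (candidates_depth > target_depth) & (candidates_depth != center_depth)
    kept_uv = np.asarray(candidates_uv)[mask]

    # sort the remaining points by Euclidean distance to the center and keep the first N
    dists = np.linalg.norm(kept_uv - np.asarray(center_uv), axis=1)
    return kept_uv[np.argsort(dists)[:N]]
```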
Step 802: and establishing a first pixel model of the matched feature points in the current frame image at least according to the plurality of neighbor feature points of the matched feature points in the current frame image.
Specifically, in this embodiment, a first pixel model may be established according to feature data of the plurality of neighboring feature points, where the first pixel model may include feature data of each of the plurality of neighboring feature points, and further include direction vectors corresponding to the plurality of neighboring feature points and a reprojection error of a feature point matched in the current frame image.
Further, in step 104, when the first pixel model of the feature point matched in each current frame image is compared with the corresponding second pixel model, the following steps may be specifically implemented, as shown in fig. 10:
step 1001: and obtaining the number of the matched neighbor feature points in the first pixel model and the corresponding second pixel model.
Matched neighbor feature points are those for which the difference between a neighbor feature point of the first pixel model and the corresponding neighbor feature point of the second pixel model is smaller than or equal to a difference threshold. Taking the first pixel model {p'_1, p'_2, p'_3, ..., p'_N, n', r'} of the matched feature point p' in the current frame image and the second pixel model {p_1, p_2, p_3, ..., p_N, n, r_avg} of the matching feature point p in the history image as an example, each neighbor feature point of the first pixel model may be compared with the corresponding neighbor feature point of the second pixel model to obtain one or more difference values in dimensions such as distance, pixel value and depth value. When the difference value is smaller than the corresponding difference threshold, the neighbor feature point of the first pixel model and the corresponding neighbor feature point of the second pixel model are considered similar, i.e. matched; if the difference value is greater than or equal to the corresponding difference threshold, they are considered unmatched.
Based on this, in this embodiment, the number of matched neighbor feature points is obtained, that is, the number of neighbor feature points in the first pixel model whose difference values from the corresponding neighbor feature points in the second pixel model are smaller than or equal to the difference threshold.
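The counting of matched neighbor feature points can be sketched as follows, assuming each model stores, for every neighbor, its distance to the center point, pixel value and depth; the concrete thresholds are placeholders for the "difference thresholds" mentioned in the text.

```python
def count_matched_neighbors(model_cur, model_hist, thr_dist=3.0, thr_pix=15.0, thr_depth=0.2):
    """Count neighbor pairs whose per-dimension differences stay within the thresholds.

    model_cur / model_hist : first / second pixel model; each neighbor is assumed to
                             store its distance to the center point, pixel value and depth.
    """
    count = 0
    for q_cur, q_hist in zip(model_cur["neighbors"], model_hist["neighbors"]):
        d_dist  = abs(q_cur["dist_to_p"] - q_hist["dist_to_p"])
        d_pix   = abs(q_cur["intensity"] - q_hist["intensity"])
        d_depth = abs(q_cur["depth"]     - q_hist["depth"])
        if d_dist <= thr_dist and d_pix <= thr_pix and d_depth <= thr_depth:
            count += 1
    return count
```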
Step 1002: a modulo length of a difference between the direction vector in the first pixel model and its corresponding direction vector in the second pixel model is obtained.
In the case where a moving object moves perpendicular to the camera plane, the similarity between the neighbor feature points in the two pixel models can be extremely high, i.e. the difference values are extremely low. Therefore, in order to further improve accuracy, a comparison condition on the direction vectors is added in this embodiment; based on this, the modulo length of the difference between the direction vectors of the two pixel models, i.e. ||n − n'||_2, needs to be obtained, where n is the direction vector in the second pixel model and n' is the direction vector in the first pixel model.
Step 1003: and judging whether the re-projection error of the matched characteristic points in the current frame image in the first pixel model is smaller than or equal to the average value of the re-projection errors corresponding to the matched characteristic points in the historical image in the second pixel model, so as to obtain a judging result.
The average value is the average of the cumulative reprojection errors of the matched feature points in the history images within the sliding window.
It should be noted that, in order to judge, through the accumulation of reprojection errors, the possibility that the feature point is a dynamic point, the average value in this embodiment is calculated as r̄_p = (1/m)·Σ_{i=1}^{m} r_p^i, where m is the number of history images in the sliding window and r_p^i is the reprojection error of the matched feature point p in the i-th frame history image in the sliding window. Based on this, in this embodiment, the reprojection error r_p' of the feature point matched in the current frame image in the first pixel model is compared with the mean value r̄_p, thereby obtaining a judgment result of whether the mean value is larger than the reprojection error of the feature point matched in the current frame image in the first pixel model.
The execution order among step 1001, step 1002 and step 1003 may be changed; for example, step 1002 may be executed first and the other steps afterwards, or the three steps may be executed simultaneously. The different technical solutions thus obtained belong to the same inventive concept and all fall within the protection scope of the present application.
Step 1004: and obtaining a model comparison result according to the number, the module length and the judgment result.
In the case that the number exceeds a preset value, the modulo length is smaller than a preset value and the reprojection error is smaller than the mean value, the obtained model comparison result indicates that the spatial point corresponding to the matched feature point in the current frame image does not belong to a moving object, i.e. it is a static point.
That is, at least in the case that the number exceeds a preset value such as 3 or 5, in order to further improve accuracy, it is further determined whether the modulo length of the direction vector difference is within a certain range, e.g. less than a preset value δ, and whether the reprojection error is less than the mean value, so as to obtain the model comparison result.
Taking as an example obtaining the model comparison result according to whether the number exceeds the preset value, whether the modulo length of the direction vector difference is within a certain range, and whether the reprojection error is smaller than the mean value:

if the number exceeds the preset value, the modulo length of the direction vector difference is within the certain range, and the reprojection error is smaller than the corresponding mean value in the sliding window, a model comparison result indicating that the spatial point corresponding to the matched feature point in the current frame image is a static point, i.e. does not belong to a moving object, can be obtained;

if the number does not exceed the preset value, or the modulo length of the direction vector difference exceeds the preset range, or the reprojection error is larger than the corresponding mean value in the sliding window, a model comparison result determining that the spatial point corresponding to the matched feature point in the current frame image is a dynamic point, i.e. belongs to a moving object, can be obtained.
That is, among the above three conditions: the number exceeds the preset value; the modulo length of the direction vector difference is within a certain range, i.e. ||n − n'||_2 < δ; and the reprojection error is smaller than the corresponding mean value in the sliding window, i.e. r_p' ≤ r̄_p. Only when all three conditions are met can the spatial point corresponding to the matched feature point in the current frame image be determined to be a static point; as long as one condition is not met, the spatial point corresponding to the matched feature point in the current frame image is determined to be a dynamic point.
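A compact sketch of this three-condition judgment is given below; the preset number, the threshold δ and the model field names are assumptions for the example.

```python
import numpy as np

def is_static_point(model_cur, model_hist, n_matched, min_matches=3, delta=0.1):
    """Apply the three judgment conditions (thresholds and field names illustrative).

    model_cur  : first pixel model of the matched feature point in the current frame.
    model_hist : second pixel model of the matched feature point in the history image;
                 assumed to store the cumulative reprojection error and the number of
                 frames over which it was accumulated.
    n_matched  : number of matched neighbor feature points between the two models.
    """
    # Condition 1: enough neighbor feature points are matched.
    cond_count = n_matched > min_matches
    # Condition 2: the direction vectors stay close, i.e. ||n - n'||_2 < delta.
    cond_dir = np.linalg.norm(model_hist["direction"] - model_cur["direction"]) < delta
    # Condition 3: the current reprojection error does not exceed the sliding-window mean.
    mean_err = model_hist["reproj_error_sum"] / model_hist["num_frames"]
    cond_err = model_cur["reproj_error"] <= mean_err
    # The point is judged static only when all three conditions hold simultaneously.
    return cond_count and cond_dir and cond_err
```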
In a specific implementation, each feature point in the history images within the sliding window has a pixel model. In this embodiment, the pixel model of the feature point in the history image that matches the feature point in the current frame image is recorded as the second pixel model, so as to distinguish it from the first pixel model of the matched feature point in the current frame image and from the pixel models of the feature points that have no matched feature point.
Based on this, further, in order to increase the diversity of the feature points in the pixel model of each feature point in the history images within the sliding window, and to ensure that the feature points in the pixel models generated by old images in the sliding window are not easily replaced, the pixel models in the history images may be updated in this embodiment, as specifically shown in fig. 11:
step 1101: an update request for a pixel model of a feature point in a history image is obtained.
The update request may be generated automatically or by a user operation, and the update request may include an identifier of one or more feature points in the history image that needs to be updated.
Step 1102: a newly added image within the sliding window is obtained.
The newly added image may be the current frame image which has been subjected to pose positioning processing and is the key frame image, and at this time, the current frame image is already added as a history image into the sliding window and is recorded as the newly added image. It should be noted that, in the case that the current frame image is a key frame image, the current frame image may be added to the sliding window.
It should be noted that the newly added image is a key frame image that has undergone positioning processing and that is added to the sliding window as a history image when a new image is obtained as the new current frame image.
Step 1103: and extracting the characteristic points of the newly added image to obtain the characteristic points in the newly added image.
Various feature points can be extracted from the newly added image by a feature extraction algorithm or the like. Each feature point in the newly added image may have a matched feature point in the history image, for example, a feature point in the history image whose descriptor matches that of the feature point in the newly added image, or it may have no matched feature point in the history image.
Step 1104: and matching the characteristic points in the newly added image with the characteristic points in the historical image in the sliding window to obtain matching characteristic points matched with the characteristic points in the historical image in the newly added image and non-matching characteristic points not matched with the characteristic points in the historical image in the newly added image.
The matching feature points refer to feature points in the newly added image, wherein the feature points are matched with feature points in the historical image, and the non-matching feature points refer to feature points in the newly added image, wherein the feature points are not matched with the feature points in the historical image.
Specifically, in this embodiment, feature points in the newly added image may be matched with feature point ORB descriptors in the history image in the sliding window, so as to obtain matched feature points and non-matched feature points.
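For illustration, a simple OpenCV sketch of this descriptor matching step is given below; brute-force Hamming matching over ORB descriptors is used here, and the distance threshold separating matched from non-matched feature points is an assumption.

```python
import cv2

def match_orb_descriptors(desc_new, desc_hist, max_hamming=50):
    """Split the new image's feature points into matched and non-matched ones.

    desc_new, desc_hist : uint8 ORB descriptor matrices of the newly added image and of
                          a history image in the sliding window.
    max_hamming         : illustrative Hamming-distance threshold for accepting a match.
    """
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = [m for m in matcher.match(desc_new, desc_hist) if m.distance <= max_hamming]
    matched_idx = {m.queryIdx for m in matches}
    # Feature points of the new image with no counterpart become non-matching feature points.
    unmatched_idx = sorted(set(range(len(desc_new))) - matched_idx)
    return matches, unmatched_idx
```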
Step 1105: and updating the pixel model of the feature points matched with the matched feature points in the historical image according to the matched feature points.
Specifically, in this embodiment, a pixel model may be first established for the matching feature points, where the pixel model of the matching feature points corresponds to a plurality of neighboring feature points, direction vectors, and reprojection errors of the matching feature points, and then, in this embodiment, the neighboring feature points in the pixel model of the matching feature points are used to update the pixel model of the feature points matched with the matching feature points in the history image.
For example, for the second pixel model M(p) of the feature point p in the history image, if the matching feature point p_F is found in the newly added image F within the sliding window, and this feature point has a corresponding pixel model M(p_F), each point in M(p) is given an equal update probability of 1/N; thereafter, the feature point in M(p_F) with the minimum distance, e.g. Euclidean distance, to p_F can be selected to update the second pixel model M(p).
Based on this, after the pixel feature points in the second pixel model are updated, the direction vector and the re-projection error in the second pixel model can be recalculated. And the updated second pixel model can be used for model comparison, so that a more accurate model comparison result is obtained, and the accuracy of pose acquisition is improved.
Step 1106: the pixel model of the non-matching feature point is obtained such that when the newly added image is taken as a history image in the sliding window, the pixel model of the non-matching feature point is taken as the pixel model of the new feature point in the history image.
That is, after the newly added image is added to the sliding window, the newly added image is treated as a history image in the sliding window, so the pixel model of a non-matching feature point in the newly added image is taken directly as the pixel model of a feature point in the history image. Based on this, when a new current frame image is used to position the current pose of the mobile device, the feature points extracted from the new current frame image are matched with the feature points extracted from the history images, including the newly added image, in the sliding window. After the feature points matched with the feature points in the history images are obtained in the new current frame image, the pixel models of the matched feature points in the new current frame image are compared with the pixel models of the corresponding feature points in the history images, the target feature points, i.e. the static points, in the new current frame image are screened out, and the current pose of the mobile device is then accurately estimated based on the reprojection errors corresponding to these static points.
It should be noted that the execution order of step 1105 and step 1106 is not limited to the order shown in the drawings; for example, step 1106 may be executed first and then step 1105, or the two steps may be executed simultaneously. The different technical solutions formed by the different execution orders of step 1105 and step 1106 belong to the same inventive concept and all fall within the protection scope of the present application.
In order to further improve accuracy, after the target feature point in the current frame image is screened out according to the model comparison result in step 105 in this embodiment, before the current pose of the mobile device is obtained in step 106 according to the reprojection error corresponding to the target feature point in the current frame image, the method in this embodiment may further include the following steps, as shown in fig. 12:
step 107: and screening out target characteristic points which meet epipolar constraint with the characteristic points in the historical image from the target characteristic points in the current frame image, and then executing step 106.
That is, considering that there may be missed detection or feature points for which no match was found in the previous processing, a multi-view geometric constraint is introduced in the method for further processing: the feature points that do not satisfy the epipolar constraint are removed from the target feature points, and only the target feature points that satisfy the epipolar constraint are retained, thereby improving the accuracy of subsequent pose acquisition.
Referring to fig. 13, a schematic structural diagram of a pose acquisition device according to a second embodiment of the present application may be configured in an electronic device capable of processing an image, such as a mobile device with an image acquisition device, such as a mobile robot, or may be a device connected to the mobile device, such as a computer or a server. The technical scheme in the embodiment is used for improving the accuracy of pose acquisition of the mobile equipment.
Specifically, the apparatus in this embodiment may include the following units:
an image obtaining unit 1301, configured to obtain a current frame image, where the current frame image is an image acquired by an image acquisition device on a mobile device;
a feature point processing unit 1302, configured to obtain feature points matched in the current frame image, where the feature points matched in the current frame image are feature points with matched feature points in a history image included in a sliding window corresponding to the current frame image; the sliding window comprises a plurality of frames of historical images, wherein the historical images are key frame images before the current frame image;
a model building unit 1303, configured to obtain a first pixel model of a feature point matched in each current frame image, where the first pixel model corresponds to a plurality of neighboring feature points of the feature point matched in the current frame image, and the first pixel model further has a direction vector of the first pixel model and a re-projection error of the feature point matched in the current frame image;
A model comparison unit 1304, configured to compare a first pixel model of the feature points matched in each current frame image with a corresponding second pixel model, so as to obtain a model comparison result, where the model comparison result characterizes whether a spatial point corresponding to the feature points matched in the current frame image belongs to a moving object; the second pixel model is a pixel model of a feature point matched with the current frame image in the historical image, the second pixel model corresponds to a plurality of neighbor feature points of the feature point matched with the historical image, and the second pixel model also has a direction vector of the second pixel model and a mean value of reprojection errors of the feature point matched with the historical image;
a feature point screening unit 1305, configured to screen out a target feature point in the current frame image according to the model comparison result, where the target feature point is a feature point that a corresponding spatial point in the current frame image does not belong to a moving object;
and a pose obtaining unit 1306, configured to obtain a current pose of the mobile device according to a reprojection error corresponding to the target feature point in the current frame image.
As can be seen from the foregoing, in the pose acquisition device provided in this embodiment, after a current frame image is collected by the image acquisition device on the mobile device, the feature points in the current frame image that match feature points in the history images contained in the preceding sliding window are obtained. After the pixel models corresponding to the plurality of neighbor feature points of these matched feature points are obtained, each is compared with the pixel model of the corresponding feature point in the history images contained in the sliding window, and according to the model comparison result, the target feature points whose corresponding spatial points do not belong to a moving object can be screened out of the current frame image; based on this, the current pose of the mobile device can be obtained according to the reprojection errors of the target feature points. Therefore, in this embodiment, the pixel models of the matched feature points in the preceding and current images are used to reject the feature points belonging to moving objects, so that the mobile device is positioned according to the screened feature points that do not belong to moving objects; interference of moving-object feature points with pose acquisition can thus be avoided, and the positioning accuracy of the mobile device in dynamic scenes is improved.
In one implementation, the pose obtaining unit 1306 is specifically configured to: obtaining a weight value corresponding to the target feature point according to the depth value of the space point corresponding to the target feature point; obtaining a pose transformation matrix corresponding to the target feature point when the reprojection error of the target feature point is minimum according to the weight value corresponding to the target feature point, wherein the reprojection error is obtained according to the three-dimensional coordinate value corresponding to the target feature point in the historical image and the two-dimensional coordinate value of the target feature point in the current frame image; and obtaining the current pose of the mobile equipment according to the pose transformation matrix.
In one implementation, the model building unit 1303 is specifically configured to: respectively taking the matched feature points in each current frame image as the center to obtain a plurality of neighbor feature points of the matched feature points in the current frame image; the method comprises the steps that a feature point matched in a current frame image and a neighbor feature point corresponding to the feature point are feature points with depth values larger than target depth, and the distance between the feature point matched in the current frame image and the feature point meets a distance sorting rule, wherein the target depth is related to the depth average value of neighbor pixel points of the feature point matched in the current frame image; and establishing a first pixel model of the matched feature points in the current frame image at least according to the plurality of neighbor feature points of the matched feature points in the current frame image, so that the first pixel model corresponds to the plurality of neighbor feature points of the matched feature points in the current frame image and also has a direction vector of the first pixel model and a re-projection error of the matched feature points in the current frame image.
The direction vector of the first pixel model is obtained based on the three-dimensional coordinate value of the spatial center point corresponding to the neighbor feature point in the first pixel model and the three-dimensional coordinate value of the spatial point corresponding to the feature point matched in the current frame image, and the reprojection error of the feature point matched in the current frame image is obtained based on the historical image.
Based on this, the model comparison unit 1304 is specifically configured to: obtain the number of the neighbor feature points matched in the first pixel model and the corresponding second pixel model; obtain a modulo length of a difference between the direction vector in the first pixel model and the direction vector in the corresponding second pixel model; judge whether the reprojection error of the matched feature point in the current frame image in the first pixel model is smaller than or equal to the mean value of the reprojection errors corresponding to the matched feature point in the history image in the second pixel model, so as to obtain a judgment result, where the mean value is the mean of the cumulative reprojection errors of the feature points matched in the history images in the sliding window; and obtain a model comparison result according to the number, the modulo length and the judgment result.
In one implementation, the apparatus in this embodiment may further include the following units, as shown in fig. 14:
a model updating unit 1307 for: obtaining an update request for a pixel model of a feature point in the history image; obtaining a newly added image in the sliding window; extracting feature points of the newly added image to obtain feature points in the newly added image; matching the characteristic points in the newly added image with the characteristic points in the historical image in the sliding window to obtain matching characteristic points matched with the characteristic points in the historical image in the newly added image and non-matching characteristic points not matched with the characteristic points in the historical image in the newly added image; updating a pixel model of the feature points matched with the matching feature points in the historical image according to the matching feature points; and obtaining the pixel model of the non-matching characteristic point, so that when the newly added image is used as a historical image in the sliding window, the pixel model of the non-matching characteristic point is used as the pixel model of the new characteristic point in the historical image.
In one implementation, before obtaining the current pose of the mobile device according to the reprojection error corresponding to the target feature point in the current frame image, the feature point filtering unit 1305 is further configured to: and screening out target characteristic points which meet epipolar constraint with the characteristic points in the historical image from the target characteristic points in the current frame image.
In one implementation, the feature point processing unit 1302 is specifically configured to: extracting characteristic points in the current frame image; and matching the characteristic points extracted from the current frame image with the characteristic points extracted from the historical image to obtain the characteristic points matched with the characteristic points in the historical image in the current frame image.
In one implementation, the feature point processing unit 1302 is further configured to: and carrying out re-projection error calculation based on the matched characteristic points in the current frame image, and primarily estimating the current pose of the mobile equipment based on the calculated re-projection error.
It should be noted that, the specific implementation of each unit in this embodiment may refer to the corresponding content in the foregoing and will not be described in detail herein.
Referring to fig. 15, a schematic structural diagram of a mobile device according to a third embodiment of the present application may be an electronic device capable of processing an image, such as a mobile device with an image capturing device, such as a mobile robot, or may be a device connected to the mobile device, such as a computer or a server. The technical scheme in the embodiment is used for improving the accuracy of pose acquisition of the mobile equipment.
Specifically, the mobile device in this embodiment at least includes the following structures:
a memory 1501 for storing an application program and data generated by the operation of the application program;
a processor 1502 for executing the application program to implement:
obtaining a current frame image, wherein the current frame image is an image acquired by an image acquisition device on mobile equipment;
obtaining the matched characteristic points in the current frame image, wherein the matched characteristic points in the current frame image are characteristic points with matched characteristic points in a historical image contained in a sliding window corresponding to the current frame image; the sliding window comprises a plurality of frames of historical images, wherein the historical images are key frame images before the current frame image;
respectively obtaining a first pixel model of the matched feature points in each current frame image, wherein the first pixel model corresponds to a plurality of neighbor feature points of the matched feature points in the current frame image, and the first pixel model also has a direction vector of the first pixel model and a re-projection error of the matched feature points in the current frame image;
respectively comparing the first pixel model of the matched characteristic points in each current frame image with the corresponding second pixel model to obtain a model comparison result, wherein the model comparison result represents whether the spatial points corresponding to the matched characteristic points in the current frame image belong to a moving object or not; the second pixel model is a pixel model of a feature point matched with the current frame image in the historical image, the second pixel model corresponds to a plurality of neighbor feature points of the feature point matched with the historical image, and the second pixel model also has a direction vector of the second pixel model and a mean value of reprojection errors of the feature point matched with the historical image;
Screening out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points of which the corresponding space points in the current frame image do not belong to a moving object;
and obtaining the current pose of the mobile equipment according to the reprojection error corresponding to the target feature point in the current frame image.
As can be seen from the above technical solution, in the mobile device according to the third embodiment of the present application, after the current frame image is acquired by the image acquisition device on the mobile device, the feature points in the current frame image, which are matched with the feature points in the history image included in the previous sliding window, are acquired, and then after the pixel models corresponding to the multiple neighbor feature points of the feature points in the current frame image are respectively obtained, the pixel model is compared with the pixel model of the feature points in the history image included in the sliding window, and then the target feature points, which do not belong to the moving object, of the corresponding space points in the current frame image can be screened out according to the model comparison result, based on this, the current pose of the mobile device can be obtained according to the reprojection errors of the target feature points. Therefore, in the embodiment, the pixel models of the matched characteristic points in the front and rear images are utilized to reject the characteristic points belonging to the moving object, so that the mobile device is positioned according to the screened characteristic points not belonging to the moving object, interference of the characteristic points of the moving object on pose acquisition can be avoided, and positioning accuracy of the mobile device in a dynamic scene is improved.
Under the condition that the electronic equipment is mobile equipment, the electronic equipment further comprises an image acquisition device such as a camera.
It should be noted that, the specific implementation of the processor in this embodiment may refer to the corresponding content in the foregoing and will not be described in detail herein.
Taking the positioning of a robot as an example, the following describes in detail the implementation of SLAM according to the technical scheme of the present application:
first, the inventors of the present application found that the following drawbacks exist in the robot positioning process using SLAM:
in the adopted visual SLAM framework, it is necessary to assume that the environment is static, or that static features occupy the majority of the scene (a quasi-static scene). A visual SLAM system based on the direct method or the feature point method has difficulty coping with the interference of moving objects; during SLAM operation, if the solution of the camera pose relies entirely on minimizing an energy function or on the random sample consensus (RANSAC) method, then when static feature points do not occupy the majority of the whole image, or occupy only a small part, the correct camera motion model is often difficult to obtain within limited iterative calculations;
the inventors of the present application continued to test the positioning scheme based on the above drawbacks, and also found that: for SLAM algorithm based on inertial measurement unit IMU (Inertial Measurement Unit) and solving dynamic environment, the method does not solve the problem of dynamic environment from essence, but only provides a motion prior for a system, increases the accuracy of feature point matching, and can cause a large number of mismatching in scenes with larger environmental similarity; the additional sensor can increase the complexity of a positioning algorithm and the hardware cost of the sensor;
Since a predefined dynamic object is not necessarily a moving object, blindly excluding predefined dynamic objects easily causes the information in the image to be greatly reduced; meanwhile, adopting current semantic dynamic SLAM to identify and reject dynamic objects makes its applicability worse in scenes where feature points are rare, and introducing a deep learning network places high requirements on the computing power of the system: on a hardware platform without a graphics processing unit GPU (Graphics Processing Unit), the algorithm can hardly achieve real-time performance, while adding a GPU increases the hardware cost, so that product application is greatly limited. On the other hand, a target detection algorithm based on deep learning needs a large amount of data for pre-training, which to a certain extent limits the exploration of unknown environments by robots based on visual SLAM positioning algorithms;
the inventors of the present application also found that: for an algorithm based on motion segmentation, the motion of a camera and a rigid dynamic object in a scene can be estimated at the same time, and the method does not depend on semantic information of images, and performs motion estimation by clustering points with the same motion into a motion model, so that different motion rigid bodies in the dynamic scene are segmented. However, the method has higher requirements on conditions of a scene, and under the condition that the depth of a measurable 3D point in the scene is too small, the error of motion estimation of a camera and a dynamic object is larger, and due to inaccuracy of cluster segmentation, a certain cluster can contain a plurality of motion rigid bodies.
Therefore, in order to solve the problem that the accuracy of the traditional visual SLAM positioning algorithm is reduced in a dynamic environment, the inventor of the application provides a method for eliminating a dynamic region and abnormal feature points in a scene based on a feature point modeling and multi-view geometry method. The method takes the visual SLAM as a main body, a model is built in a sliding window and is used as a characteristic point, the characteristic point corresponding to a dynamic object in a scene on an image is effectively removed by comparing the model with a current frame, and the robust estimation of the camera pose under the dynamic environment is realized. The method comprises the following steps:
in connection with the technical framework of the present application shown in fig. 16, the present application can be divided into five parts: calculating a rough pose transformation matrix, dynamic region judgment, feature point modeling, judging outliers by epipolar constraint, and accurate pose estimation based on minimizing the reprojection error. These five parts are described in detail below:
1) Computing pose estimates based on minimizing reprojection errors
The pose estimation flow is shown in fig. 17. The pose estimation in the present application adopts a coarse-to-fine approach. The pose estimation in this step is the rough pose estimation; its core is to calculate the camera motion from 3D-2D point pairs by constructing a reprojection error. The steps and principles of the algorithm are as follows:
(1) Feature points are extracted from the current frame image. Feature points are representative points in an image and are another digital representation of the image information. In the invention, we extract ORB features from the image, which consist of key points and descriptors. The key point is called "Oriented FAST", a modified FAST corner with scale and rotation invariance; the descriptor is the binary BRIEF descriptor. In the feature extraction process, feature points may become concentrated in one region because the texture is rich somewhere in the image, which makes the pose estimation tend toward that local motion. To improve system accuracy, we increase the uniformity of feature extraction: the image is divided into small squares of size S, feature points are extracted in each square, and the image features are thus distributed over the whole image as much as possible.
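A possible way to realize this uniform extraction with OpenCV is sketched below; the grid size and the per-cell feature budget are assumptions, not values taken from the application.

```python
import cv2

def extract_orb_uniform(gray, grid=8, per_cell=30):
    """Extract ORB key points cell by cell so that features spread over the whole image.

    gray     : grayscale input image.
    grid     : the image is split into grid x grid cells (the cell size S is assumed square).
    per_cell : maximum number of key points kept per cell (illustrative).
    """
    orb = cv2.ORB_create(nfeatures=per_cell)
    h, w = gray.shape
    keypoints = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            for kp in orb.detect(gray[y0:y1, x0:x1], None):
                # Shift the key point back to full-image coordinates.
                keypoints.append(cv2.KeyPoint(kp.pt[0] + x0, kp.pt[1] + y0, kp.size,
                                              kp.angle, kp.response, kp.octave, kp.class_id))
    # Compute the binary (BRIEF-style) descriptors on the full image.
    keypoints, descriptors = orb.compute(gray, keypoints)
    return keypoints, descriptors
```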
(2) ORB descriptor matching is carried out between the feature points extracted from the current frame and the corresponding feature points in the last key frame, and the DBOW2 bag-of-words technique is adopted to accelerate the matching. After the matching is finished, it is judged whether the number of matched feature points reaches the required number; if so, the next step of constructing the reprojection error is carried out directly and the camera pose is estimated. If the matching condition is not met, the matching condition is relaxed, and the DBOW2 bag-of-words technique is used again to accelerate the matching between the feature points extracted from the current frame and the local map points constructed by the previous frame.
(3) The depth values of the feature points of the previous key frame are obtained using the depth image data input from the sensor at the previous key frame, and the 3D coordinates of those feature points are recovered. The previous 3D points and the 2D pixel points of the current frame form 3D-2D point pairs, and a reprojection error equation of the form T* = argmin_T Σ_i ω_i·F(||u_i − π(T·P_i)||²) is constructed,

where ω_i is a predefined weight whose calculation formula depends on a scale constant λ and the depth value d_i of the 3D point in the i-th feature point pair. Adding this weight is a novel angle and is considered from two aspects: in an indoor scene, moving objects tend to be located close to the robot; and the displacement on the pixel plane caused by the movement of a farther 3D point within a short time is negligible. F(·) is a Huber kernel function; when feature points are mismatched the reprojection error is large, and the kernel function limits the growth of the reprojection error, while the Huber kernel is smooth and easy to differentiate, which facilitates the solution of the reprojection error equation. π(·) is the projective transformation based on the camera model, which converts 3D coordinates in the camera coordinate system into 2D pixel coordinates in the camera image. u_i = [u_i, v_i]^T are the coordinates of the 2D point of the 3D-2D point pair in the pixel plane, and P_i = [X_i, Y_i, Z_i]^T are the coordinates of the 3D point of the pair in space. Finally, a nonlinear optimization solution is carried out through the EPnP algorithm to obtain the preliminarily estimated current pose of the robot.
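A hedged sketch of such a depth-weighted, Huber-robust reprojection minimization is shown below using SciPy; since the exact weight formula is not reproduced here, the weight w_i is a stand-in that merely preserves the idea that farther points receive larger weight, and the optimizer refines an initial pose rather than implementing EPnP itself.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def refine_pose(pts3d, pts2d, depths, K, rvec0, tvec0):
    """Depth-weighted, Huber-robust reprojection minimization (sketch).

    pts3d  : Nx3 3D points recovered from the previous key frame.
    pts2d  : Nx2 matched 2D pixel points in the current frame.
    depths : depth d_i of each 3D point, used only to build the weights.
    K      : 3x3 camera intrinsic matrix.
    rvec0, tvec0 : initial pose, e.g. from cv2.solvePnP with flags=cv2.SOLVEPNP_EPNP.
    The weight w_i = sqrt(d_i / d_max) is an assumption standing in for the patent's
    weight formula; it only keeps the idea that farther points get larger weight.
    """
    w = np.sqrt(depths / depths.max())

    def residuals(x):
        rvec, tvec = x[:3], x[3:]
        proj, _ = cv2.projectPoints(pts3d, rvec, tvec, K, None)  # pi(T * P_i)
        err = (proj.reshape(-1, 2) - pts2d) * w[:, None]         # weighted pixel residuals
        return err.ravel()

    x0 = np.hstack([np.ravel(rvec0), np.ravel(tvec0)])
    # loss='huber' plays the role of the Huber kernel F(.) in the text.
    res = least_squares(residuals, x0, loss="huber", f_scale=1.0)
    return res.x[:3], res.x[3:]
```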
2) Feature point modeling
This step is the core of the present application. It mainly establishes a staged model for the feature points, i.e. the pixel model described above; it is called staged because the model is replaced and updated within the sliding window. The flowchart is shown in fig. 18. The purpose of establishing the model is to judge whether a feature point is dynamic by comparing the feature point of the current frame with the matched feature point model and examining the difference between them.
The model is built and initialized as follows: first, for each feature point p, i.e. a feature point in the history image, a pixel model M(p), i.e. the second pixel model, is established, where M(p) = {p_1, p_2, ..., p_N, n, r_p}; p_1, ..., p_N are the selected neighbor feature points whose depth in space is inconsistent with that of the feature point p, and N is the number of selected neighbor feature points. The selection of the neighbor feature points is not random. First, the average of the depth values of the 24-neighborhood pixels of the feature point p is calculated, d̄ = (1/24)·Σ_{i=1}^{24} d_i. Then, the Euclidean distances between the feature point p and the other feature points are calculated, and among the feature points whose depth values are larger than d̄, the first N feature points sorted by Euclidean distance to p from small to large are taken as the depth-inconsistent neighbor feature points.
This step is designed to exclude feature points belonging to the same rigid body as the feature point p from being added to the model: when the depths are close, the pixel points are most likely on the same rigid body, and even if the motion changes, the relationship between them does not change much. n is the direction vector of the model, which is the difference between the position of the spatial center point of the 3D map points corresponding to all the neighbor feature points in the model and the position (x, y, z) of the 3D map point corresponding to the feature point p. r_p is the cumulative sum of the reprojection errors, r_p = Σ_{i=1}^{m} r_p^i, which is set to 0 at initialization, where m is the number of frames in the sliding window. The model of this step is built from previous frames, i.e. frames within the sliding window; it is initialized by the first frame and then updated by subsequent frames.
3) Dynamic feature point determination and model update
Before judging the feature points of the current frame, a pixel feature point model, i.e. the first pixel model, also needs to be built for the valid feature points of the current frame (i.e. the feature points that have matched feature points in the history image). Similar to the steps above, for a feature point p' in the current frame image, its N depth-inconsistent neighbor feature points are first searched, then the direction vector n' of the model is calculated, and finally the reprojection error r_p' calculated in the first step is added. The inventors of the present application set three judgment conditions to determine whether a certain feature point of the current frame is static; if the following three conditions are satisfied at the same time, the feature point is a static point:
(1) Using the result of the feature point matching in step 1), the feature point model (i.e. the second pixel model) matched with the feature point p' of the current frame is found among the feature point pixel models, and the neighbor feature points in that feature point model are matched with the neighbor feature points {p'_1, p'_2, p'_3, ..., p'_N} of the current frame feature point p'. If at least a preset number N_th of these neighbor feature points are matched successfully, the condition is satisfied.
(2) Considering that when a dynamic object moves perpendicular to the camera plane, the depth-inconsistent neighbor feature points in the feature point pixel model and the N depth-inconsistent neighbor feature points of the current frame feature point p' may be extremely similar, a direction vector comparison condition is further added: if the modulo length of the difference between the direction vector n in the feature point model and the direction vector n' in the model of the current frame feature point p' is within a certain range, the requirement for the feature point to be static is met: ||n − n'||_2 < δ.
(3) This term can be regarded as an additional constraint, mainly comparing the likelihood that a feature point is a dynamic point through the accumulation of reprojection errors, where the reprojection error calculation uses the result of the first step. The mean is expressed as r̄_p = (1/m)·Σ_{i=1}^{m} r_p^i, where m is the number of frames in the sliding window; it is the average of the accumulated reprojection errors of the point p in the feature point model within the sliding window. If the reprojection error of the current frame feature point p' is smaller than this mean, it can be believed more firmly that the feature point p' is a static point, and the judgment condition is satisfied.
Further, as the robot acquires images, it takes the processed and positioned current frame image as a history image, e.g. moves it into the sliding window, and at the same time acquires a new current frame image. Based on this, when the history images in the sliding window are updated, the model of a static feature point p in the history images in the sliding window, i.e. the second pixel model, is updated. In the present application, a random update strategy is adopted, mainly to ensure that the model pixel points generated by old frames in the sliding window are not easily replaced, so as to increase the diversity of the pixel feature points in the model:
first, for the feature point model M(p) of a feature point p in the sliding window, if a matched feature point p_F is found among the feature points of the current frame F and that feature point is judged to be a static feature point, a feature point model M(p_F) is established for it. Based on this, each point in the model M(p) of the feature point p in the sliding window is given an equal update probability, that is, every point has the same probability of being updated, namely 1/N.
Next, whether the model M(p_F) is used to update the feature point model may first be decided by an update probability, that is, the model M(p_F) updates the model of the feature point p only with probability η. When the model M(p_F) does update the model of the feature point p, the pixel feature point in M(p_F) with the smallest Euclidean distance to p_F is selected to update the model M(p). Likewise, after the pixel feature points in the model M(p) are updated, the direction vector n of the model can be recalculated and the reprojection error r_p can be updated. For the feature points of the current frame for which no matching point is found in the sliding window, the present application adds them to the feature point models according to the initialization step, to facilitate the next round of feature point matching and model comparison.
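The random update strategy can be sketched as follows; the probability η, the model field names and the helper structure are illustrative assumptions.

```python
import random
import numpy as np

def maybe_update_model(M_p, p_F, M_pF, eta=0.3, recompute=None):
    """Randomly update the model M(p) of a static feature point (a sketch; eta and the
    field layout are assumptions, not values fixed by the application)."""
    # M(p_F) updates the model of p only with probability eta.
    if random.random() > eta:
        return
    # Every neighbor slot in M(p) has the same 1/N chance of being the one replaced.
    idx = random.randrange(len(M_p["neighbors"]))
    # Replace it with the neighbor of M(p_F) closest (Euclidean) to p_F in the image.
    closest = min(M_pF["neighbors"],
                  key=lambda q: np.hypot(q["uv"][0] - p_F[0], q["uv"][1] - p_F[1]))
    M_p["neighbors"][idx] = closest
    # Afterwards the direction vector n and reprojection error r_p of M(p) are recomputed.
    if recompute is not None:
        recompute(M_p)
```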
4) Judging outliers by epipolar constraint
After dynamic point removal (screening) with the feature point models, and considering that there may be missed detections or feature points for which no matching point was found in the feature point models in the previous step, the present application handles them by introducing the constraints of multi-view geometry. The fact underlying the present application is that dynamic features violate the standard constraints of multi-view geometry that hold in a static scene, as shown in fig. 19.
Here x_1, x_2 and x'_2 are three points in pixel coordinates. If the 3D point P_1 is a static point, then x_1 and x_2 are a pair of matching points and satisfy the equation x_2^T·F·x_1 = 0, where the fundamental matrix F can be obtained from the camera rotation matrix R and translation vector t. l_1 and l_2 are called epipolar lines, and this geometric constraint is also called the epipolar constraint. If the 3D point P_1 is a dynamic point, i.e. it moves to the point P_2 at the next moment, then the projection of P_2 on the camera plane becomes x'_2; obviously x_1 and x'_2 do not satisfy the epipolar constraint, i.e. x'_2^T·F·x_1 ≠ 0 and is far greater than 0. The present application uses this multi-view constraint to further reject the missed dynamic 3D points.
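For illustration, the epipolar residual test can be sketched as follows; the residual threshold is an assumption, and F is taken as given, e.g. composed from the estimated R, t and the camera intrinsics.

```python
import numpy as np

def epipolar_inliers(pts1, pts2, F, thresh=1.0):
    """Keep matches whose epipolar residual |x2^T F x1| is small (threshold illustrative).

    pts1, pts2 : Nx2 matched pixel points in the earlier and current frames.
    F          : 3x3 fundamental matrix.
    """
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([np.asarray(pts1, float), ones])  # homogeneous pixel coordinates
    x2 = np.hstack([np.asarray(pts2, float), ones])
    # For static points x2^T F x1 should be (near) zero; dynamic points violate this.
    residual = np.abs(np.sum(x2 * (F @ x1.T).T, axis=1))
    return residual < thresh
```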
5) Accurate pose estimation
In this step, the feature points of the current frame surviving the previous steps, i.e. the target feature points, together with the static feature points in the local map, are locally optimized, and the pose estimate of the current frame is refined by minimizing the reprojection error function. The reprojection error function takes the same form as the one constructed above, where P_i is the 3D point in the local map matched with a static feature point of the current frame, and u_i are the coordinates of that static feature point of the current frame in the pixel plane.
Therefore, the inventors of the present application propose the concept of modeling feature points when dealing with the positioning problem in dynamic environments, and eliminate dynamic points by comparing the feature points with their models. When the reprojection error function is calculated, a pre-weighting concept is introduced: the error values are pre-weighted using the idea that a far dynamic point cannot produce a large displacement in the pixel plane across consecutive frames, and that a static point is more likely to be part of the distant background.
Furthermore, the application provides a modeling method using depth inconsistent feature points, the idea is to model the feature points by using surrounding environment information instead of modeling the surrounding information of pixels, and the dynamic and static feature points can be effectively distinguished by judging the three conditions of the model and the feature points to be judged. In the method, a sliding window method is adopted to judge a plurality of continuous frames, and the accumulated information is used for judging the characteristic points, so that the defect of unobvious movement between two short frames is avoided, and the accuracy of static and dynamic distinction of the characteristic points is effectively improved.
Therefore, in the method, different from the existing dynamic processing environment mode, a characteristic point modeling mode is adopted, a pixel model is established by utilizing information of characteristic points in time and space, pixels are represented by adopting depth-inconsistent neighbor characteristic points in space, a random updating strategy is adopted in time, and the amount of past information contained in the model can be increased. And three conditions are set to judge the characteristic points, only the characteristic points which simultaneously meet the three conditions can be judged to be static points, and then the characteristic points are utilized to accurately estimate the pose. Furthermore, the method for utilizing the information of the surrounding environment of the pixels is different from a method for pre-training and motion segmentation by deep learning, does not need to know the environment of a scene in advance, does not need to segment different motion rigid bodies in the scene, reduces the requirement on the level of hardware, and is a SLAM algorithm for processing the dynamic environment with strong practicability.
It should be noted that, in terms of the sensor, the method for screening the static feature points in the application is not only applicable to the RGB-D camera, but also applicable to the monocular and binocular cameras, and only the RGB-D camera can directly obtain the depth information of the pixel points from the sensor, and no additional algorithm is needed for calculation. Moreover, the pixel modeling method used in the application models the feature points based on the characteristic that the surrounding pixel feature points of the dynamic feature points also change, and if other schemes can be found to distinguish the same and different places of the dynamic and static feature points along with the change of time, the method is still within the protection scope of the application. In addition, the method is based on an algorithm of visual SLAM, but is also based on environmental characteristic point sampling, but is also applicable to SLAM algorithm using a laser sensor, only other characteristic points for modeling the laser point are needed to be found, then static and dynamic characteristic point judgment conditions are set according to the characteristics of the sampling points obtained by the laser sensor, dynamic and static characteristic points are distinguished, and the pose of the robot is estimated by using the static points.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The pose acquisition method is characterized by comprising the following steps of:
obtaining a current frame image, wherein the current frame image is an image acquired by an image acquisition device on mobile equipment;
obtaining the matched characteristic points in the current frame image, wherein the matched characteristic points in the current frame image are characteristic points with matched characteristic points in a historical image contained in a sliding window corresponding to the current frame image; the sliding window comprises a plurality of frames of historical images, wherein the historical images are key frame images before the current frame image;
respectively obtaining a first pixel model of the matched feature points in each current frame image, wherein the first pixel model corresponds to a plurality of neighbor feature points of the matched feature points in the current frame image, the first pixel model also has a direction vector of the first pixel model and a re-projection error of the matched feature points in the current frame image, and the first pixel model is established based on the matched feature points in the current frame image according to the rest neighbor feature points after redundant neighbor feature points with the same depth values in the plurality of neighbor feature points are removed;
Respectively comparing the first pixel model of the matched characteristic points in each current frame image with the corresponding second pixel model to obtain a model comparison result, wherein the model comparison result represents whether the spatial points corresponding to the matched characteristic points in the current frame image belong to a moving object or not; the second pixel model is a pixel model of a feature point in the historical image, which is matched with the feature point matched with the current frame image, the second pixel model corresponds to a plurality of neighbor feature points of the feature point matched with the historical image, the second pixel model also has a mean value of a direction vector of the second pixel model and a re-projection error of the feature point matched with the historical image, the comparison result is a comparison result of each neighbor feature point, direction vector and re-projection error of the feature point matched with each current frame image, which are respectively compared with a corresponding second pixel model in each neighbor point, direction vector and re-projection error, so as to obtain a comparison result of each neighbor feature point, direction vector and re-projection error corresponding to the feature point matched with each current frame image;
Screening out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points of which the corresponding space points in the current frame image do not belong to a moving object;
and obtaining the current pose of the mobile equipment according to the reprojection error corresponding to the target feature point in the current frame image.
2. The method of claim 1, wherein obtaining the current pose of the mobile device according to the reprojection error corresponding to the target feature point in the current frame image comprises:
obtaining a weight value corresponding to the target feature point according to the depth value of the space point corresponding to the target feature point;
obtaining a pose transformation matrix corresponding to the target feature point when the reprojection error of the target feature point is minimum according to the weight value corresponding to the target feature point, wherein the reprojection error is obtained according to the three-dimensional coordinate value corresponding to the target feature point in the historical image and the two-dimensional coordinate value of the target feature point in the current frame image;
and obtaining the current pose of the mobile equipment according to the pose transformation matrix.
3. The method according to claim 1, wherein obtaining the first pixel model of each matched feature point in the current frame image includes:
respectively taking each matched feature point in the current frame image as a center, obtaining a plurality of neighbor feature points of the matched feature point in the current frame image;
wherein the depth value of the matched feature point in the current frame image differs from the depth values of its corresponding neighbor feature points, the neighbor feature points of the matched feature point are feature points whose depth values are greater than a target depth and whose distances to the matched feature point satisfy a distance sorting rule, and the target depth is related to the average depth of the neighboring pixels of the matched feature point in the current frame image;
and establishing the first pixel model of the matched feature point in the current frame image at least according to the plurality of neighbor feature points of the matched feature point, so that the first pixel model corresponds to the plurality of neighbor feature points of the matched feature point in the current frame image and further has a direction vector of the first pixel model and a re-projection error of the matched feature point in the current frame image.
4. The method according to claim 3, wherein the direction vector of the first pixel model is obtained based on the three-dimensional coordinates of the spatial center point of the neighbor feature points in the first pixel model and the three-dimensional coordinates of the spatial point corresponding to the matched feature point in the current frame image, and the re-projection error of the matched feature point in the current frame image is obtained based on the historical image.
5. The method according to claim 3 or 4, wherein comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result includes:
obtaining the number of neighbor feature points that match between the first pixel model and the corresponding second pixel model;
obtaining the modulus of the difference between the direction vector in the first pixel model and the direction vector in the corresponding second pixel model;
judging whether the re-projection error of the matched feature point in the current frame image in the first pixel model is less than or equal to the mean value of the re-projection errors of the matched feature point in the historical image in the second pixel model, so as to obtain a judgment result; the mean value is the mean of the re-projection errors accumulated for the matched feature point over the historical images in the sliding window;
and obtaining the model comparison result according to the number, the modulus and the judgment result.
6. The method as recited in claim 5, further comprising:
obtaining an update request for a pixel model of a feature point in the historical image;
obtaining a newly added image in the sliding window;
extracting feature points from the newly added image to obtain the feature points in the newly added image;
matching the feature points in the newly added image with the feature points in the historical images in the sliding window, to obtain matching feature points in the newly added image that match feature points in the historical images and non-matching feature points in the newly added image that do not match any feature point in the historical images;
updating, according to the matching feature points, the pixel models of the feature points in the historical images that match the matching feature points;
and obtaining the pixel models of the non-matching feature points, so that when the newly added image is used as a historical image in the sliding window, the pixel models of the non-matching feature points serve as the pixel models of the new feature points in that historical image.
7. The method according to claim 1, wherein after screening out the target feature points in the current frame image according to the model comparison result, before obtaining the current pose of the mobile device according to the reprojection errors corresponding to the target feature points in the current frame image, the method further comprises:
and screening, from the target feature points in the current frame image, the target feature points that satisfy the epipolar constraint with the feature points in the historical image.
8. The method according to claim 1, wherein obtaining the matched feature points in the current frame image comprises:
extracting feature points from the current frame image;
and matching the feature points extracted from the current frame image with the feature points extracted from the historical image, to obtain the feature points in the current frame image that match feature points in the historical image.
9. A pose acquisition device, characterized by comprising:
the image acquisition unit is used for acquiring a current frame image, wherein the current frame image is an image acquired by an image acquisition device on the mobile device;
the feature point processing unit is used for obtaining the matched feature points in the current frame image, wherein the matched feature points in the current frame image are feature points that have matching feature points in the historical images contained in a sliding window corresponding to the current frame image; the sliding window contains a plurality of frames of historical images, and the historical images are key frame images preceding the current frame image;
the model building unit is used for respectively obtaining a first pixel model of each matched feature point in the current frame image, wherein the first pixel model corresponds to a plurality of neighbor feature points of the matched feature point in the current frame image, the first pixel model further has a direction vector of the first pixel model and a re-projection error of the matched feature point in the current frame image, and the first pixel model is established, based on the matched feature point in the current frame image, from the neighbor feature points remaining after redundant neighbor feature points having the same depth value among the plurality of neighbor feature points are removed;
the model comparison unit is used for respectively comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result, wherein the model comparison result represents whether the spatial point corresponding to the matched feature point in the current frame image belongs to a moving object; the second pixel model is the pixel model of the feature point in the historical image that matches the matched feature point in the current frame image, the second pixel model corresponds to a plurality of neighbor feature points of that matched feature point in the historical image, and the second pixel model further has a direction vector of the second pixel model and a mean value of the re-projection errors of the matched feature point in the historical image; the model comparison result is obtained by respectively comparing the neighbor feature points, the direction vector and the re-projection error in the first pixel model of each matched feature point in the current frame image with the neighbor feature points, the direction vector and the mean re-projection error in the corresponding second pixel model;
the feature point screening unit is used for screening out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points whose corresponding spatial points do not belong to a moving object;
and the pose obtaining unit is used for obtaining the current pose of the mobile device according to the re-projection errors corresponding to the target feature points in the current frame image.
10. A mobile device, comprising:
a memory for storing an application program and data generated by the operation of the application program;
a processor for executing the application program to realize:
obtaining a current frame image, wherein the current frame image is an image acquired by an image acquisition device on the mobile device;
obtaining the matched feature points in the current frame image, wherein the matched feature points in the current frame image are feature points that have matching feature points in the historical images contained in a sliding window corresponding to the current frame image; the sliding window contains a plurality of frames of historical images, and the historical images are key frame images preceding the current frame image;
respectively obtaining a first pixel model of each matched feature point in the current frame image, wherein the first pixel model corresponds to a plurality of neighbor feature points of the matched feature point in the current frame image, the first pixel model further has a direction vector of the first pixel model and a re-projection error of the matched feature point in the current frame image, and the first pixel model is established, based on the matched feature point in the current frame image, from the neighbor feature points remaining after redundant neighbor feature points having the same depth value among the plurality of neighbor feature points are removed;
respectively comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result, wherein the model comparison result represents whether the spatial point corresponding to the matched feature point in the current frame image belongs to a moving object; the second pixel model is the pixel model of the feature point in the historical image that matches the matched feature point in the current frame image, the second pixel model corresponds to a plurality of neighbor feature points of that matched feature point in the historical image, and the second pixel model further has a direction vector of the second pixel model and a mean value of the re-projection errors of the matched feature point in the historical image; the model comparison result is obtained by respectively comparing the neighbor feature points, the direction vector and the re-projection error in the first pixel model of each matched feature point in the current frame image with the neighbor feature points, the direction vector and the mean re-projection error in the corresponding second pixel model;
screening out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points whose corresponding spatial points do not belong to a moving object; and obtaining the current pose of the mobile device according to the re-projection errors corresponding to the target feature points in the current frame image.
CN202110082125.3A 2021-01-21 2021-01-21 Pose acquisition method and device and mobile equipment Active CN112785705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110082125.3A CN112785705B (en) 2021-01-21 2021-01-21 Pose acquisition method and device and mobile equipment

Publications (2)

Publication Number Publication Date
CN112785705A (en) 2021-05-11
CN112785705B (en) 2024-02-09

Family

ID=75758237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110082125.3A Active CN112785705B (en) 2021-01-21 2021-01-21 Pose acquisition method and device and mobile equipment

Country Status (1)

Country Link
CN (1) CN112785705B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222940B (en) * 2021-05-17 2022-07-12 哈尔滨工业大学 Method for automatically grabbing workpiece by robot based on RGB-D image and CAD model
CN113361365B (en) * 2021-05-27 2023-06-23 浙江商汤科技开发有限公司 Positioning method, positioning device, positioning equipment and storage medium
CN113847907B (en) * 2021-09-29 2024-09-13 深圳市慧鲤科技有限公司 Positioning method and device, equipment and storage medium
CN113762289A (en) * 2021-09-30 2021-12-07 广州理工学院 Image matching system based on ORB algorithm and matching method thereof
CN114310908B (en) * 2022-01-25 2023-10-24 深圳市优必选科技股份有限公司 Robot control method, robot control device and robot
CN114998433A (en) * 2022-05-31 2022-09-02 Oppo广东移动通信有限公司 Pose calculation method and device, storage medium and electronic equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN111780764A (en) * 2020-06-30 2020-10-16 杭州海康机器人技术有限公司 Visual positioning method and device based on visual map
WO2020259248A1 (en) * 2019-06-28 2020-12-30 Oppo广东移动通信有限公司 Depth information-based pose determination method and device, medium, and electronic apparatus

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10706582B2 (en) * 2012-09-17 2020-07-07 Nec Corporation Real-time monocular structure from motion

Non-Patent Citations (1)

Title
艾青林, 余杰, 胡克用, 陈琦. Implementation of robot SLAM based on the ORB key-frame matching algorithm. 机电工程 (Journal of Mechanical & Electrical Engineering), 2016, (05), full text. *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant