CN115718303A - A fusion relocalization method of camera and solid-state lidar for dynamic environment - Google Patents

A fusion relocalization method of camera and solid-state lidar for dynamic environment

Info

Publication number
CN115718303A
CN115718303A (Application CN202211040009.6A)
Authority
CN
China
Prior art keywords
dynamic
camera
point
pose
feature points
Prior art date
Legal status
Pending
Application number
CN202211040009.6A
Other languages
Chinese (zh)
Inventor
禹鑫燚
郑万财
张震
胡佳南
欧林林
蔡传烩
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202211040009.6A
Publication of CN115718303A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

A camera and solid-state lidar fusion relocalization method for dynamic environments comprises the following steps: obtaining prior dynamic objects through YOLOv5 deep-learning object detection; extracting image feature points and computing a dynamic probability for each feature point by combining epipolar geometry with a Bayesian formula, rejecting dynamic feature points and keeping static ones; representing and updating a bag of words with the DBoW algorithm, where each image carries the pose computed by the solid-state lidar, the camera pose is obtained through extrinsic calibration, and the static point cloud of the frame is recorded; and designing a two-stage registration scheme to improve system robustness: during relocalization, the feature points of the current frame are first compared for similarity with previously stored feature points, and if too few feature points match, the point clouds of the candidate frames are back-projected and registered with ICP, further improving robustness. The invention combines vision and lidar to mitigate the drawbacks of relocalization in dynamic environments.

Description

A fusion relocalization method of camera and solid-state lidar for dynamic environments

Technical Field

The invention relates to the field of global robot localization, and in particular to a camera and solid-state lidar fusion relocalization method for dynamic environments.

Background Art

Global localization is a prerequisite for truly autonomous robot motion. A robot that can quickly and accurately determine its own position after being started at any time and in any place is far more applicable and practical; moreover, the ability to quickly re-establish its current position after localization is lost gives robot localization long-term stability and reliability.

Over the past three decades, with the rapid development of computer science and sensors, simultaneous localization and mapping (SLAM) has become an indispensable technology in many fields such as robotics, autonomous driving, and augmented reality (AR). Thanks to the development of computer vision, visual SLAM (V-SLAM) has attracted the attention of many researchers and companies because of its low cost, low power consumption, and ability to provide rich scene information. Although it achieves excellent results in static environments, performance degradation and insufficient robustness in dynamic environments have become the main obstacles to its practical application. For example, patent CN110533722A proposes a fast robot relocalization method and system based on a visual dictionary, and patent CN110796683A proposes a relocalization method combining visual features with laser SLAM. Vision is far more sensitive to dynamic objects than lidar-type sensors: relocalizing with pure vision in a static environment is a low-cost approach, but in a real environment it fails, because feature points on dynamic objects (e.g. pedestrians and vehicles) are included in the computation, which simply injects interference into the system. Although CN110796683A also records the pose in the visual dictionary DBow and designs some strategies, this still does not help against the interference of dynamic objects. Among relocalization methods that fuse lidar and camera, patent CN112596064A proposes an integrated indoor-robot global localization method fusing laser and vision, which builds a laser keyframe retrieval library and a visual keyframe retrieval library at the same time. Although robustness is improved, in a real dynamic environment moving objects cause keyframes to record wrong feature points, and once a large number of feature points have changed, relocalization may fail.

Current global localization methods for robots have the following defects and shortcomings. 1) Existing methods build a visual localization map and a laser localization map separately, and then align the two maps to unify the results of the two global localization methods. However, this is sensitive to the calibration accuracy of the two sensors, and building two maps separately consumes considerable resources. 2) If only lidar is used for relocalization, point cloud registration is time-consuming, and scenes with repetitive structure or little structure cause relocalization to fail. Some researchers provide an initial pose so that the robot only searches for matches within that range, but the approximate position of the robot may be unknown, and such human intervention prevents a fully autonomous robot. 3) In real environments dynamic objects are unavoidable; they have an immeasurable impact on relocalization and even on the whole SLAM system, and they are one of the main problems hindering the further development of SLAM.

Summary of the Invention

To address the above shortcomings of the prior art, the present invention provides a camera and solid-state lidar fusion relocalization method for dynamic environments. It can localize in dynamic real-world environments while effectively combining solid-state lidar and visual information, greatly improves the robustness of robot localization in dynamic environments, and relocalizes quickly and accurately without human intervention. Solid-state lidar is cheaper than spinning lidar and produces dense point clouds, albeit with a small FOV.

To achieve the above objective, the present invention provides the following technical solution: a camera and solid-state lidar fusion relocalization method for dynamic environments, comprising the following steps:

S1: Calibrate the extrinsic parameters of the solid-state lidar and the camera to obtain the camera-to-lidar coordinate transformation matrix $T_{Lc}$, where c denotes the camera coordinate frame and L the lidar coordinate frame; calibrate the camera, time-synchronize the camera and the lidar, remove image distortion, and obtain the undistorted image;
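For illustration, the undistortion step of S1 can be done with OpenCV as in the minimal sketch below; the intrinsic matrix, distortion coefficients, and file name are placeholder assumptions standing in for the real calibration output.

```python
import cv2
import numpy as np

# Placeholder intrinsics and distortion coefficients from the camera calibration of S1.
K = np.array([[615.0, 0.0, 320.0],
              [0.0, 615.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.array([0.10, -0.05, 0.001, 0.0005, 0.0])  # k1, k2, p1, p2, k3

img = cv2.imread("frame.png")
undistorted = cv2.undistort(img, K, dist)  # undistorted image used by the later steps
```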

S2: Use the pruned YOLOv5 to detect dynamic objects in the image, and feed the resulting bounding boxes of the dynamic objects into the visual assistance module;

The above-mentioned YOLOv5 pruning is performed as follows:

On the basis of the YOLOv5 framework, pruning is performed by applying L1 regularization to the scaling factors in batch normalization (BN) to obtain a lightweight network; this is easy to implement and requires no change to the existing architecture.

$$\hat{z} = \frac{z_{in} - \mu_B}{\sqrt{\sigma_B^{2} + \epsilon}}, \qquad z_{out} = \gamma \hat{z} + \beta \tag{1}$$

Eq. (1) is the computation of the BN (Batch Normalization) layer, where $z_{in}$ and $z_{out}$ are the input and output of the BN layer, B is the current mini-batch, $\mu_B$ and $\sigma_B$ are the mean and standard deviation of the input activations over B, and $\gamma$ and $\beta$ are trainable affine parameters (scale and shift). The activation magnitude $z_{out}$ of each channel is positively correlated with its coefficient $\gamma$: if $\gamma$ is close to zero, the activations are also very small, which allows channels (or neurons) whose $\gamma$ approaches zero to be removed. Normally, however, after training a network the BN coefficients follow approximately a normal distribution, so very few channels have $\gamma$ near zero and pruning is not possible. The BN scaling factors are therefore pushed towards zero with L1 regularization so that insignificant channels (or neurons) can be identified, since each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully connected layer).

$$L = \sum_{(x,y)} l\big(f(x, W),\, y\big) + \lambda \sum_{\gamma \in \Gamma} g(\gamma) \tag{2}$$

The first half of Eq. (2) is the standard CNN loss, and the second half is the constraint on the added regularization coefficients, where $\lambda$ is a balance factor that can be tuned for the dataset, $W$ denotes the trainable weights, $(x, y)$ denote the training input and target, and $g(s)=|s|$ is the L1 regularization. The gradient issue introduced by the L1 norm can be avoided by substituting smooth-L1.
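As a minimal PyTorch-style sketch of the sparsity term in Eq. (2), the snippet below adds an L1 penalty on the BN scale factors γ to the usual detection loss; the function name, the value of λ, and the training-loop variables are illustrative assumptions rather than the actual YOLOv5 training code.

```python
import torch.nn as nn

def bn_l1_penalty(model: nn.Module, lam: float = 1e-4):
    """Sparsity term of Eq. (2): lam * sum over all BN scale factors gamma of |gamma|."""
    gammas = [m.weight.abs().sum() for m in model.modules()
              if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d))]
    return lam * sum(gammas)

# Inside the training loop (model, loss_fn, images, targets, optimizer assumed to exist):
#   loss = loss_fn(model(images), targets) + bn_l1_penalty(model)
#   loss.backward()
#   optimizer.step()
```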

S3: After the visual assistance module extracts the feature points of the image, it computes a dynamic probability for each feature point; points with large probability values are treated as interference points and rejected. To reduce the influence of dynamic objects on laser SLAM localization and mapping, the point cloud of the current frame is projected onto the image, the dynamic-object points inside the bounding boxes obtained in S2 are clustered, and they are removed during mapping.

The dynamic probability computation is as follows:

S31: Extract ORB feature points

S311: If the grey values $I(x)$ of more than N points around a point P in the image differ from the grey value $I(p)$ of P by more than a threshold $\varepsilon$, P is regarded as a target corner, expressed as:

$$N = \sum_{x \in \operatorname{circle}(P)} \mathbb{1}\big(\,|I(x) - I(p)| > \varepsilon\,\big) \tag{3}$$

S312: Keep the feature points rotation-invariant by computing the centroid, and build an image pyramid by scaling the original image sequence by a fixed ratio to keep the feature points scale-invariant; at the same time, a quadtree-based homogenization algorithm is used to distribute the feature points evenly over the image;

S313: Compute the BRIEF descriptor, which describes the information around a feature point as a binary string;
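The ORB extraction of S31 can be sketched with OpenCV as below; note that OpenCV's ORB builds the scale pyramid and computes orientation and BRIEF descriptors, but it does not perform the quadtree homogenization of S312, and the parameter values and file name are illustrative.

```python
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# 8-level pyramid with scale factor 1.2 for scale invariance; fastThreshold plays
# the role of the threshold epsilon in Eq. (3).
orb = cv2.ORB_create(nfeatures=1000, scaleFactor=1.2, nlevels=8, fastThreshold=20)
keypoints, descriptors = orb.detectAndCompute(img, None)  # 256-bit binary descriptors
```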

S32: First, compute the fundamental matrix of the epipolar geometry from all points outside the bounding boxes detected by YOLO. The fundamental matrix maps a feature point of the next frame to its corresponding search region in the current frame, i.e. the epipolar line. Let P be a static point, P' a dynamic point, and D the distance from the dynamic point to the epipolar line. Let $p_1$, $p_2$ denote the matched points of the previous frame and the current frame, $P_1$, $P_2$ their homogeneous forms, and $u$, $v$ the pixel coordinates of the feature points in the image.

$$P_1 = [u_1, v_1, 1], \qquad P_2 = [u_2, v_2, 1] \tag{4}$$

$$p_1 = [u_1, v_1], \qquad p_2 = [u_2, v_2] \tag{5}$$

The epipolar line is denoted by $I$ and is computed as:

$$I = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = F\, P_1^{T} \tag{6}$$

where X, Y, Z are the components of the epipolar line vector and F is the fundamental matrix.

$$D = \frac{\left| P_2\, F\, P_1^{T} \right|}{\sqrt{X^{2} + Y^{2}}} \tag{7}$$

Eq. (7) says that a matched point from the previous frame, after the fundamental-matrix transform, falls in the current frame at distance D from the epipolar line $I$. For a stable feature point, in the ideal case it would lie exactly on the epipolar line $I$ after the transform, i.e. the point-to-line distance would be zero. Because of moving objects, each feature point in the image is not strictly constrained to lie on its corresponding epipolar line. The distance is therefore converted into a probability: the farther a point is from its epipolar line, the more likely it is to be moving;
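A compact OpenCV/NumPy sketch of this geometric check is given below: it estimates the fundamental matrix F with RANSAC from matches outside the YOLO boxes and returns the point-to-epipolar-line distance D of Eq. (7) for every match. The array names and RANSAC parameters are assumptions.

```python
import cv2
import numpy as np

def epipolar_distances(pts_prev: np.ndarray, pts_curr: np.ndarray):
    """pts_prev, pts_curr: Nx2 matched pixel coordinates (p1, p2 of Eq. (5))."""
    F, _ = cv2.findFundamentalMat(pts_prev, pts_curr, cv2.FM_RANSAC, 1.0, 0.99)
    ones = np.ones((len(pts_prev), 1))
    P1 = np.hstack([pts_prev, ones])           # homogeneous points of Eq. (4)
    P2 = np.hstack([pts_curr, ones])
    lines = (F @ P1.T).T                       # epipolar lines I = F P1^T, Eq. (6)
    num = np.abs(np.sum(P2 * lines, axis=1))   # |P2 F P1^T|
    den = np.sqrt(lines[:, 0] ** 2 + lines[:, 1] ** 2)
    return num / den, F                        # distances D of Eq. (7), plus F
```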

S33: Motion is a continuous action, and processing a single frame in isolation is unstable, so Bayes' theorem is used: the idea of exploiting past and current information to evaluate and eliminate dynamic keypoints. It is assumed that the initial dynamic probability of each point is ω, that the distance from a matched keypoint to its corresponding epipolar line follows a Gaussian distribution, and that the probability iteration satisfies the Markov property.

S331: Compute the moving probability of a keypoint with the normal probability density function, defined as:

$$P\!\left(D_i \mid c_{p_i}\right) = \frac{1}{\sqrt{2\pi}\,\delta}\, \exp\!\left(-\frac{D_i^{2}}{2\delta^{2}}\right) \tag{8}$$

A standard normal distribution is used: δ denotes the standard deviation with value 1 and the expectation is zero, where $c_{p_i}$ indicates whether the i-th point preliminarily screened by the epipolar-geometry step is static or dynamic, expressed as:

[Equation (9): the definition of the indicator $c_{p_i}$ (static or dynamic) from the preliminary epipolar-geometry screening; the formula image was not extracted.]

S332: In the present invention, the probability propagation formula for a point is:

[Equations (10) and (11): the probability propagation and update expressions; the formula images were not extracted.]

where one update probability comes from the geometric model and the other comes from within the YOLO bounding box (the symbol images were not extracted).

S333: Here ω is a weight representing the confidence placed in the epipolar-geometry model and in the bounding box detected by YOLO, respectively. ω is computed as:

[Equation (12): the weight ω computed from $N_c$ and $N_s$; the formula image was not extracted.]

where $N_c$ is the number of outliers computed by the epipolar-geometry model and $N_s$ is the number of points contained in the YOLO bounding box. When the probability exceeds $\omega_1$, the point is judged dynamic and is removed, no longer taking part in the feature matching for relocalization.
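Since the exact forms of Eqs. (10)–(12) could not be recovered from the extracted text, the sketch below only illustrates one plausible weighted, Markov-style update consistent with the description (a geometry cue weighted by ω against a YOLO-box cue, with the point discarded above ω₁); every name and the fusion rule itself are assumptions, not the patent's formulas.

```python
import numpy as np

def update_dynamic_probability(prev_prob, d_epipolar, in_yolo_box,
                               n_outliers, n_box_points,
                               delta=1.0, dyn_threshold=0.75):
    """Illustrative fusion of the epipolar-geometry cue and the YOLO-box cue.

    prev_prob:   dynamic probability carried over from the previous frame (Markov assumption)
    d_epipolar:  distance D of the point to its epipolar line, Eq. (7)
    in_yolo_box: True if the point lies inside a detected dynamic-object box
    """
    # Geometry cue: the farther from the epipolar line, the more likely the point moved.
    p_geom = 1.0 - np.exp(-d_epipolar ** 2 / (2.0 * delta ** 2))
    # Detector cue: points inside a dynamic-object box are treated as suspicious.
    p_yolo = 0.9 if in_yolo_box else 0.1
    # Relative confidence of the two cues (assumed form, cf. Nc and Ns in the text).
    w = n_outliers / max(n_outliers + n_box_points, 1)
    fused = w * p_geom + (1.0 - w) * p_yolo
    # Simple temporal smoothing standing in for the Bayesian propagation of Eq. (10).
    prob = 0.5 * prev_prob + 0.5 * fused
    return prob, prob > dyn_threshold  # second value: judged dynamic, remove the point
```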

S4: The solid-state lidar builds the environment map with a conventional SLAM method and computes the robot pose $T_{wL}$; the camera pose in the world frame, $T_{wc}$, is then deduced from the extrinsics;

S5: The camera continuously collects images while moving and selects keyframes; the DBow algorithm then records the keyframes from which dynamic feature points have been removed and updates the bag of words, the pose information is incorporated, and the point cloud outside the dynamic boxes projected onto the image is also stored;

S51: The camera continuously collects images while moving and selects keyframes;

[Keyframe-selection criterion; the formula image was not extracted.]

S52: The DBow algorithm records the keyframes from which dynamic feature points have been removed and updates the bag of words; the camera pose is derived according to Eq. (13) from the pose computed by laser SLAM and is recorded, where $T_{wc}$ and $T_{wL}$ are the poses of the camera and of the lidar in the world frame, respectively, and $T_{Lc}$ is the camera-to-lidar extrinsic transformation matrix:

$$T_{wc} = T_{wL}\, T_{Lc} \tag{13}$$
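Eq. (13) is a plain composition of homogeneous transforms; a short NumPy sketch is shown below, with identity matrices as placeholders for the real poses and extrinsics.

```python
import numpy as np

T_wL = np.eye(4)    # lidar pose in the world frame, from laser SLAM (placeholder)
T_Lc = np.eye(4)    # camera-to-lidar extrinsic from the calibration in S1 (placeholder)

T_wc = T_wL @ T_Lc  # Eq. (13): camera pose in the world frame
```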

S54: Project the point cloud of the solid-state lidar from the same frame onto the camera image plane using the intrinsics and extrinsics;

S55: Align the detection boxes computed by YOLOv5 with the solid-state lidar point cloud projected onto the camera plane, then cluster the dynamic objects, extract them, and remove them. At the same time, the remaining point cloud projected onto the image is saved in image form.
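A NumPy sketch of the projection used in S54–S55 is given below; it moves lidar points into the camera frame with the inverse extrinsic and projects them with the intrinsics, so that points falling inside a YOLO box can be clustered and removed while the rest are stored. The function name and the depth cutoff are assumptions.

```python
import numpy as np

def project_lidar_to_image(points_L: np.ndarray, T_Lc: np.ndarray, K: np.ndarray):
    """Project lidar points (Nx3, lidar frame) onto the camera image plane.

    T_Lc is the camera-to-lidar extrinsic, so points are moved into the camera
    frame with its inverse; K is the 3x3 intrinsic matrix.
    """
    T_cL = np.linalg.inv(T_Lc)
    pts_h = np.hstack([points_L, np.ones((len(points_L), 1))])  # homogeneous coordinates
    pts_c = (T_cL @ pts_h.T).T[:, :3]
    in_front = pts_c[:, 2] > 0.1                                # keep points in front of the camera
    uv = (K @ pts_c[in_front].T).T
    uv = uv[:, :2] / uv[:, 2:3]                                 # pixel coordinates
    return uv, in_front
```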

S6: When the robot starts relocalization, it first performs the feature point extraction of S3 and computes similarity between the constructed bag of words and the keyframes collected in S5. If the number of matched feature points is smaller than the threshold $\sigma_1$ but larger than the threshold $\sigma_2$, point cloud registration is performed on the candidate frames matched in the previous step: the point clouds stored in the candidate frames are back-projected into 3D space and registered against the current point cloud with ICP to compute scores, and the candidate with the highest total score gives the coarse pose of the robot;

S61: Compute the DBow vector of the current feature points and compute its similarity with the DBow vectors in the history;
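The similarity score itself can be illustrated with the L1-based bag-of-words score commonly used with DBoW-style vocabularies; the sketch below is a stand-in for the library call, and the keyframe-database fields mentioned in the comments are assumptions.

```python
import numpy as np

def bow_similarity(v1: np.ndarray, v2: np.ndarray) -> float:
    """L1-based score in [0, 1]: identical normalized BoW vectors score 1, disjoint ones 0."""
    a = v1 / max(np.abs(v1).sum(), 1e-12)
    b = v2 / max(np.abs(v2).sum(), 1e-12)
    return 1.0 - 0.5 * np.abs(a - b).sum()

# current_vec: BoW vector of the current frame; each stored keyframe also carries the
# camera pose and its static point cloud (S5), so the best-scoring entries become the
# candidate frames used in S62/S63.
# candidates = sorted(keyframe_db, key=lambda kf: bow_similarity(current_vec, kf.vec), reverse=True)
```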

S62: If the matched points satisfy the coarse pose estimation, proceed to the next step;

S63: If the number of matched points is smaller than the threshold $\sigma_1$ but larger than the threshold $\sigma_2$, the point clouds recorded in the candidate frames computed in S62 are back-projected into 3D space; the point cloud of the current frame is registered against them in turn with ICP, the best candidate frame is computed, and the coarse pose of the robot is obtained;
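The second-stage check can be sketched with Open3D's point-to-point ICP as below, assuming Open3D is available; the back-projected candidate points, the correspondence distance, and the use of fitness as the ranking score are illustrative choices.

```python
import numpy as np
import open3d as o3d

def icp_score(candidate_points: np.ndarray, current_points: np.ndarray,
              max_corr_dist: float = 0.5):
    """Register a candidate frame's back-projected cloud against the current scan
    and return the ICP fitness used to rank candidates, plus the aligned transform."""
    src = o3d.geometry.PointCloud()
    src.points = o3d.utility.Vector3dVector(candidate_points)
    dst = o3d.geometry.PointCloud()
    dst.points = o3d.utility.Vector3dVector(current_points)
    result = o3d.pipelines.registration.registration_icp(
        src, dst, max_corr_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.fitness, result.transformation  # highest score -> coarse robot pose
```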

S7: The matching in S6 yields a coarse robot pose, which is then fed into the solid-state lidar preprocessing module for local point cloud matching to obtain an accurate robot pose.

The present invention obtains prior dynamic objects through YOLOv5 deep-learning object detection; extracts feature points from the image, computes the dynamic probability of each feature point in probabilistic form by combining epipolar geometry with the Bayes formula, rejects dynamic feature points and keeps static ones; represents and updates the bag of words with the DBOW algorithm, where each image carries the pose computed by the solid-state lidar, the camera pose obtained through extrinsic calibration, and the recorded point cloud of the static part of the frame; and designs a two-stage registration scheme to improve system robustness: during relocalization, the feature points of the current frame are first compared for similarity with the previously stored feature points, and when too few feature points match, the point clouds of the candidate frames are back-projected and registered with ICP, further improving robustness. The invention combines vision and laser to mitigate the drawbacks of relocalization in dynamic environments.

Compared with the prior art, the beneficial effects of the present invention are:

The camera and solid-state lidar fusion relocalization method for dynamic environments of the present invention solves the problem that traditional laser SLAM relocalization fails in dynamic environments, and also realizes autonomous robot relocalization without human intervention. At the same time, it achieves fast and accurate relocalization in real dynamic environments and improves the robustness of traditional SLAM in dynamic environments.

Brief Description of the Drawings

Fig. 1 is a flowchart of the method of the present invention.

Fig. 2 is the pruning flowchart of the present invention.

Fig. 3 is a schematic diagram of the epipolar geometry of the present invention.

Fig. 4 is a schematic diagram of the relationship between probability and degree of motion in the present invention.

Fig. 5 is a schematic diagram of the effect after dynamic feature points are removed in the present invention.

Fig. 6 is a schematic diagram of the effect after dynamic feature point removal on the Technical University of Munich dataset.

Fig. 7 is a schematic diagram before relocalization in the present invention.

Fig. 8 is a schematic diagram after relocalization in the present invention.

Detailed Description of the Embodiments

To make the purpose, technical solution and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings. It should be understood that the embodiments described here are only intended to explain the present invention, not to limit it.

The present invention is a camera and solid-state lidar fusion relocalization method for dynamic environments, whose flowchart is shown in Fig. 1, comprising the following steps.

S1: The extrinsic calibration of the solid-state lidar and the camera uses the Livox Camera Calib algorithm of the University of Hong Kong to obtain the transformation matrix $T_{Lc}$ from the camera coordinate frame to the lidar coordinate frame; the camera is calibrated with the calibration software that comes with the Realsense D455; camera and lidar are time-synchronized, image distortion is removed, and the undistorted image is obtained;

S2: The YOLOv5 pruning pipeline is shown in Fig. 2. The pruned YOLOv5 is used to detect dynamic objects in the image, and the resulting bounding boxes of the dynamic objects are fed into the visual assistance module;

S3: After the visual assistance module extracts the feature points of the image, it computes the dynamic probability of each feature point; points with probability greater than $\omega_1 = 0.75$ are treated as interference points and rejected. To reduce the influence of dynamic objects on laser SLAM localization and mapping, the point cloud of the current frame is projected onto the image, the dynamic-object points inside the bounding boxes obtained in the previous step are clustered, and they are removed during mapping. Experiments were carried out both on public datasets and on our own dataset. Fig. 5 shows the effect on our own dataset, where the green points are the stable static points. Fig. 6 shows the validation on the well-known Technical University of Munich dataset: the probabilistic formulation keeps some points inside the YOLO boxes, further improving the robustness of the system; in the figure the red points are the feature points retained inside the boxes, and the green points are the stable feature points outside the boxes.

S31: First put the feature points outside the YOLOv5 boxes into a set P, then match the points of P against the previous frame and put the corresponding matches into a set M; the F matrix is obtained by applying the epipolar constraint to the paired points of M and P:

$$P_2 \, F \, P_1^{T} = 0$$

S32: Use Eq. (7) to compute the distance D from every feature point to its epipolar line:

$$D = \frac{\left| P_2\, F\, P_1^{T} \right|}{\sqrt{X^{2} + Y^{2}}} \tag{7}$$

S33: Assuming that the point-to-epipolar-line distance follows a standard normal distribution, substitute the D computed in S32 into Eq. (8) to obtain the aggregate dynamic probability of the point;

S34: Compute the weight ω, which represents the confidence placed in the epipolar-geometry model and in the bounding box detected by YOLO, respectively. ω is computed as:

[Equation (12): the weight ω computed from $N_c$ and $N_s$; the formula image was not extracted.]

where $N_c$ is the number of outliers computed by the epipolar-geometry model and $N_s$ is the number of points contained in the YOLO bounding box.

[Formula image not extracted.]

S35: Use past and current information to evaluate and eliminate dynamic keypoints, further strengthening the effectiveness of the dynamic-point screening. The geometric probability of each point and the probability of the points inside the YOLOv5 boxes are initialized to ω = 0.5, and the probability is iterated with Eq. (10); if the resulting probability is greater than $\omega_1 = 0.75$, the point is added to the dynamic feature point set S.

S4: The solid-state lidar builds the environment map with a conventional SLAM method and computes the robot pose $T_{wL}$; the camera pose in the world frame, $T_{wc}$, is deduced from the extrinsics;

S5: The camera continuously collects images while moving and selects keyframes; the DBow algorithm then records the keyframes from which dynamic feature points have been removed and updates the bag of words, the pose information computed in S4 is incorporated, and the point cloud outside the dynamic boxes projected onto the image is also stored;

S51: The camera continuously collects images while moving and selects keyframes;

[Keyframe-selection criterion; the formula image was not extracted.]

S52: The DBow algorithm records the keyframes from which dynamic feature points have been removed and updates the bag of words; the camera pose is derived according to Eq. (13) from the pose computed by laser SLAM and is recorded, where $T_{wc}$ and $T_{wL}$ are the poses of the camera and of the lidar in the world frame, respectively, and $T_{Lc}$ is the camera-to-lidar extrinsic transformation matrix:

$$T_{wc} = T_{wL}\, T_{Lc} \tag{13}$$

S54: Project the point cloud of the solid-state lidar from the same frame onto the camera image plane using the intrinsics and extrinsics;

S55: Align the detection boxes computed by YOLOv5 with the solid-state lidar point cloud projected onto the camera plane, then cluster the dynamic objects, extract them, and remove them. At the same time, the remaining point cloud projected onto the image is saved in image form.

S6: When the robot starts relocalization, it first performs the feature point extraction of S3 and computes similarity between the constructed bag of words and the keyframes collected in S5. If the number of matched feature points is smaller than the threshold $\sigma_1 = 500$ but larger than the threshold $\sigma_2 = 200$, the candidate frames matched to the current feature points are processed: their stored point clouds are back-projected into 3D space and registered against the current point cloud with ICP to compute scores, and the candidate with the highest total score gives the coarse pose of the robot;

S61: Compute the DBow vector of the current feature points and compute its similarity with the DBow vectors in the history;

S62: If the matched points satisfy the coarse pose estimation, proceed to the next step;

S63: If the number of matched points is smaller than the threshold $\sigma_1 = 500$ but larger than the threshold $\sigma_2 = 200$, the point clouds recorded in the candidate frames computed in S62 are back-projected into 3D space; the point cloud of the current frame is registered against them in turn with ICP, the best candidate frame is computed, and the coarse pose of the robot is obtained;

S7: The matching in S6 yields a coarse robot pose, which is then fed into the solid-state lidar preprocessing module for local point cloud matching to obtain an accurate robot pose.

Figs. 7 and 8 can be compared to verify the effect and speed of relocalization. Fig. 7 shows the state just before relocalization: the red points on the right do not overlap the white map points. In Fig. 8 the red point cloud of the current frame has successfully overlapped the white map points. From the WallTime shown below the two figures it can be seen that relocalization takes only about 80 ms. Moreover, the map was built while being disturbed by dynamic objects, and relocalization still succeeds, which fully demonstrates the reliability and practicality of the present invention.

It should be emphasized that the embodiments described in the present invention are illustrative rather than restrictive; the present invention therefore includes, and is not limited to, the examples described in the detailed description, and other embodiments derived by those skilled in the art from the technical solution of the present invention likewise fall within the protection scope of the present invention.

Claims (7)

1. A camera and solid-state lidar fusion relocalization method for dynamic environments, characterized in that the method comprises:

S1: calibrating the extrinsic parameters of the solid-state lidar and the camera to obtain the camera-to-lidar coordinate transformation matrix $T_{Lc}$, where c denotes the camera coordinate frame and L the lidar coordinate frame; calibrating the camera, time-synchronizing the camera and the lidar, removing image distortion, and obtaining the undistorted image;

S2: using the pruned YOLOv5 to detect dynamic objects in the image, and feeding the resulting bounding boxes of the dynamic objects into the visual assistance module;

S3: after the visual assistance module extracts the feature points of the image, computing a dynamic probability for each feature point, treating points with large probability values as interference points and rejecting them; to reduce the influence of dynamic objects on laser SLAM localization and mapping, projecting the point cloud of the current frame onto the image, clustering the dynamic-object points inside the bounding boxes obtained in S2, and removing them during mapping;

S4: building the environment map with the solid-state lidar using a conventional SLAM method and computing the robot pose $T_{wL}$; deducing the camera pose in the world frame, $T_{wc}$, from the extrinsics;

S5: continuously collecting images while the camera moves and selecting keyframes; recording, with the DBow algorithm, the keyframes from which dynamic feature points have been removed, updating the bag of words, incorporating the pose information, and storing the point cloud outside the dynamic boxes projected onto the image;

S6: when the robot starts relocalization, first performing the feature point extraction of S3 and computing similarity between the constructed bag of words and the keyframes collected in S5; if the number of matched feature points is smaller than a threshold $\sigma_1$ but larger than a threshold $\sigma_2$, performing point cloud registration on the candidate frames matched in the previous step: the point clouds stored in the candidate frames are back-projected into 3D space and registered against the current point cloud with ICP to compute scores, and the candidate with the highest total score gives the coarse pose of the robot;

S7: obtaining a coarse robot pose from the matching in S6, and feeding this coarse pose into the solid-state lidar preprocessing module for local point cloud matching to obtain an accurate robot pose.
2. The camera and solid-state lidar fusion relocalization method for dynamic environments according to claim 1, characterized in that the YOLOv5 pruning in S2 is as follows:

on the basis of the YOLOv5 framework, pruning is performed by applying L1 regularization to the scaling factors in batch normalization (BN) to obtain a lightweight network, which is easy to implement and requires no change to the existing architecture;

$$\hat{z} = \frac{z_{in} - \mu_B}{\sqrt{\sigma_B^{2} + \epsilon}}, \qquad z_{out} = \gamma \hat{z} + \beta \tag{1}$$

Eq. (1) is the computation of the BN (Batch Normalization) layer, where $z_{in}$ and $z_{out}$ are the input and output of the BN layer, B is the current mini-batch, $\mu_B$ and $\sigma_B$ are the mean and standard deviation of the input activations over B, and $\gamma$ and $\beta$ are trainable affine parameters (scale and shift); the activation magnitude $z_{out}$ of each channel is positively correlated with its coefficient $\gamma$: if $\gamma$ is close to zero, the activations are also very small, which allows channels (or neurons) whose $\gamma$ approaches zero to be removed; normally, however, after training a network the BN coefficients follow approximately a normal distribution, so very few channels have $\gamma$ near zero and pruning is not possible; the BN scaling factors are therefore pushed towards zero with L1 regularization so that insignificant channels (or neurons) can be identified, since each scaling factor corresponds to a specific convolutional channel (or a neuron in a fully connected layer);

$$L = \sum_{(x,y)} l\big(f(x, W),\, y\big) + \lambda \sum_{\gamma \in \Gamma} g(\gamma) \tag{2}$$

the first half of Eq. (2) is the standard CNN loss, and the second half is the constraint on the added regularization coefficients, where $\lambda$ is a balance factor that can be tuned for the dataset, $W$ denotes the trainable weights, $(x, y)$ denote the training input and target, and $g(s)=|s|$ is the L1 regularization; the gradient issue introduced by the L1 norm can be avoided by substituting smooth-L1.
3. The camera and solid-state lidar fusion relocalization method for dynamic environments according to claim 1, characterized in that S3 is as follows:

S31: extract ORB feature points;

S311: if the grey values $I(x)$ of more than N points around a point P in the image differ from the grey value $I(p)$ of P by more than a threshold $\varepsilon$, P is regarded as a target corner, expressed as:

$$N = \sum_{x \in \operatorname{circle}(P)} \mathbb{1}\big(\,|I(x) - I(p)| > \varepsilon\,\big) \tag{3}$$

S312: keep the feature points rotation-invariant by computing the centroid, and build an image pyramid by scaling the original image sequence by a fixed ratio to keep the feature points scale-invariant; at the same time, a quadtree-based homogenization algorithm is used to distribute the feature points evenly over the image;

S313: compute the BRIEF descriptor, which describes the information around a feature point as a binary string;

S32: first, compute the fundamental matrix of the epipolar geometry from all points outside the bounding boxes detected by YOLO; the fundamental matrix maps a feature point of the next frame to its corresponding search region in the current frame, i.e. the epipolar line; let P be a static point, P' a dynamic point, and D the distance from the dynamic point to the epipolar line; let $p_1$, $p_2$ denote the matched points of the previous frame and the current frame, $P_1$, $P_2$ their homogeneous forms, and $u$, $v$ the pixel coordinates of the feature points in the image;

$$P_1 = [u_1, v_1, 1], \qquad P_2 = [u_2, v_2, 1] \tag{4}$$

$$p_1 = [u_1, v_1], \qquad p_2 = [u_2, v_2] \tag{5}$$

the epipolar line is denoted by $I$ and is computed as:

$$I = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = F\, P_1^{T} \tag{6}$$

where X, Y, Z are the components of the epipolar line vector and F is the fundamental matrix;

$$D = \frac{\left| P_2\, F\, P_1^{T} \right|}{\sqrt{X^{2} + Y^{2}}} \tag{7}$$

Eq. (7) says that a matched point from the previous frame, after the fundamental-matrix transform, falls in the current frame at distance D from the epipolar line $I$; for a stable feature point, in the ideal case it would lie exactly on the epipolar line $I$ after the transform, i.e. the point-to-line distance would be zero; because of moving objects, each feature point in the image is not strictly constrained to lie on its corresponding epipolar line; the distance is converted into a probability: the farther a point is from its epipolar line, the more likely it is to be moving;

S33: motion is a continuous action and processing a single frame in isolation is unstable, so Bayes' theorem is used: the idea of exploiting past and current information to evaluate and eliminate dynamic keypoints; it is assumed that the initial dynamic probability of each point is ω, that the distance from a matched keypoint to its corresponding epipolar line follows a Gaussian distribution, and that the probability iteration satisfies the Markov property;

S331: compute the moving probability of a keypoint with the normal probability density function, defined as:

$$P\!\left(D_i \mid c_{p_i}\right) = \frac{1}{\sqrt{2\pi}\,\delta}\, \exp\!\left(-\frac{D_i^{2}}{2\delta^{2}}\right) \tag{8}$$

a standard normal distribution is used: δ denotes the standard deviation with value 1 and the expectation is zero, where $c_{p_i}$ indicates whether the i-th point preliminarily screened by the epipolar-geometry step is static or dynamic, expressed as:

[Equation (9): the definition of the indicator $c_{p_i}$; the formula image was not extracted.]

S332: in the present invention, the probability propagation formula for a point is:

[Equations (10) and (11): the probability propagation and update expressions; the formula images were not extracted.]

where one update probability comes from the geometric model and the other comes from within the YOLO bounding box;

S333: ω is a weight representing the confidence placed in the epipolar-geometry model and in the bounding box detected by YOLO, respectively; ω is computed as:

[Equation (12): the weight ω computed from $N_c$ and $N_s$; the formula image was not extracted.]

where $N_c$ is the number of outliers computed by the epipolar-geometry model and $N_s$ is the number of points contained in the YOLO bounding box; when the probability exceeds $\omega_1$, the point is judged dynamic and is removed, no longer taking part in the feature matching for relocalization.
4. The camera and solid-state lidar fusion relocalization method for dynamic environments according to claim 1, characterized in that the solid-state lidar builds the environment map with a conventional SLAM method and computes the robot pose $T_{wL}$, and the camera pose in the world frame, $T_{wc}$, is deduced from the extrinsics.
5. The camera and solid-state lidar fusion relocalization method for dynamic environments according to claim 1, characterized in that the camera and lidar information in S5 is stored as follows:

S51: the camera continuously collects images while moving and selects keyframes;

[Keyframe-selection criterion; the formula image was not extracted.]

S52: the DBow algorithm records the keyframes from which dynamic feature points have been removed and updates the bag of words; the camera pose is derived according to Eq. (13) from the pose computed by laser SLAM and is recorded, where $T_{wc}$ and $T_{wL}$ are the poses of the camera and of the lidar in the world frame, respectively, and $T_{Lc}$ is the camera-to-lidar extrinsic transformation matrix;

S54: project the point cloud of the solid-state lidar from the same frame onto the camera image plane using the intrinsics and extrinsics;

S55: align the detection boxes computed by YOLOv5 with the solid-state lidar point cloud projected onto the camera plane, then cluster the person points, extract them and remove them; at the same time, the remaining point cloud projected onto the image is saved in image form.
6. The camera and solid-state lidar fusion relocalization method for dynamic environments according to claim 1, characterized in that the relocalization matching in S6 is as follows: S61: compute the DBow vector of the current feature points and compute its similarity with the DBow vectors in the history; S62: if the matched points satisfy the coarse pose estimation, proceed to the next step; S63: if the number of matched points is smaller than the threshold $\sigma_1$ but larger than the threshold $\sigma_2$, the point clouds recorded in the candidate frames computed in S62 are back-projected into 3D space; the point cloud of the current frame is registered against them in turn with ICP, the best candidate frame is computed, and the coarse pose of the robot is obtained. 7. The camera and solid-state lidar fusion relocalization method for dynamic environments according to claim 1, characterized in that the relocalization is specifically: the matching in S6 yields a coarse robot pose, which is then fed into the solid-state lidar preprocessing module for local point cloud matching to obtain an accurate robot pose.
CN202211040009.6A 2022-08-29 2022-08-29 A fusion relocalization method of camera and solid-state lidar for dynamic environment Pending CN115718303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211040009.6A CN115718303A (en) 2022-08-29 2022-08-29 A fusion relocalization method of camera and solid-state lidar for dynamic environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211040009.6A CN115718303A (en) 2022-08-29 2022-08-29 A fusion relocalization method of camera and solid-state lidar for dynamic environment

Publications (1)

Publication Number Publication Date
CN115718303A 2023-02-28

Family

ID=85254854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211040009.6A Pending CN115718303A (en) 2022-08-29 2022-08-29 A fusion relocalization method of camera and solid-state lidar for dynamic environment

Country Status (1)

Country Link
CN (1) CN115718303A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117911271A (en) * 2024-01-22 2024-04-19 哈尔滨工业大学(威海) A method and system for fast point cloud removal of dynamic obstacles based on YOLO
CN117911271B (en) * 2024-01-22 2024-11-15 哈尔滨工业大学(威海) A method and system for fast point cloud removal of dynamic obstacles based on YOLO
CN119152211A (en) * 2024-11-13 2024-12-17 浙江工业大学 Multi-sensor fusion AGV dynamic target elimination method and device based on semantic segmentation

Similar Documents

Publication Publication Date Title
CN107392964B (en) The indoor SLAM method combined based on indoor characteristic point and structure lines
CN105843223B (en) A kind of mobile robot three-dimensional based on space bag of words builds figure and barrier-avoiding method
US12087028B2 (en) Lifted semantic graph embedding for omnidirectional place recognition
CN112784873B (en) Semantic map construction method and device
CN110322511A (en) A kind of semantic SLAM method and system based on object and plane characteristic
CN113658337B (en) Multi-mode odometer method based on rut lines
CN112560648B (en) SLAM method based on RGB-D image
CN111260661A (en) A visual semantic SLAM system and method based on neural network technology
CN115718303A (en) A fusion relocalization method of camera and solid-state lidar for dynamic environment
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
CN109063549A (en) High-resolution based on deep neural network is taken photo by plane video moving object detection method
CN114140527A (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN116772820A (en) Local refinement mapping system and method based on SLAM and semantic segmentation
CN114036969B (en) 3D human body action recognition algorithm under multi-view condition
CN116129118A (en) Semantic Segmentation Method of Laser LiDAR Point Cloud in Urban Scene Based on Graph Convolution
CN117893693A (en) Dense SLAM three-dimensional scene reconstruction method and device
CN118521653B (en) Positioning and mapping method and system based on fusion of LiDAR and inertial measurement in complex scenes
CN113570713B (en) A semantic map construction method and device for dynamic environments
Tan et al. A Review of Deep Learning-Based LiDAR and Camera Extrinsic Calibration
CN113160315B (en) Semantic environment map representation method based on dual quadric surface mathematical model
CN112268564B (en) Unmanned aerial vehicle landing space position and attitude end-to-end estimation method
CN109785419A (en) UAV Video image based on ORBSLAM orients the method with sparse cloud of generation in real time
Bojanić et al. A review of rigid 3D registration methods
CN118225096A (en) Multi-sensor SLAM method based on dynamic feature point elimination and loop detection
CN117253003A (en) Indoor RGB-D SLAM method integrating direct method and point-plane characteristic method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination