CN110349250A - A three-dimensional reconstruction method for indoor dynamic scenes based on an RGBD camera - Google Patents

A three-dimensional reconstruction method for indoor dynamic scenes based on an RGBD camera

Info

Publication number
CN110349250A
CN110349250A
Authority
CN
China
Prior art keywords
dynamic
point
frame
camera
convolutional neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910572096.1A
Other languages
Chinese (zh)
Other versions
CN110349250B (en)
Inventor
林斌
曹权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201910572096.1A priority Critical patent/CN110349250B/en
Publication of CN110349250A publication Critical patent/CN110349250A/en
Application granted granted Critical
Publication of CN110349250B publication Critical patent/CN110349250B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a three-dimensional reconstruction method for indoor dynamic scenes based on an RGBD camera, comprising the steps of calibrating the RGBD camera, collecting scene images, extracting feature points, detecting and eliminating dynamic points by combining a convolutional neural network with the multi-view geometry method, re-tracking, inserting new keyframes, local map optimization, and loop closure detection. The method effectively solves the problems of inaccurate camera pose estimation and ghosting of dynamic objects in three-dimensional reconstruction of dynamic scenes. The invention also optimizes the running time of the dynamic region detection step, so that real-time reconstruction can be achieved. Measured by the absolute trajectory error of the camera on the TUM dataset, the root mean square error is reduced from 0.752 to 0.025.

Description

A Three-Dimensional Reconstruction Method for Indoor Dynamic Scenes Based on an RGBD Camera

Technical Field

The invention belongs to the field of three-dimensional imaging, and specifically relates to a three-dimensional reconstruction method for indoor dynamic scenes based on an RGBD camera.

Background Art

With the advances in computer technology in recent years, AR and VR have gradually become active research areas, with important applications in smart homes and virtual shopping; how to reconstruct the surrounding scene effectively is one of the key research directions. With the progress of artificial intelligence, autonomous driving and autonomous drone flight likewise require solving the problems of constructing the surrounding scene and localizing within it.

The emergence of SLAM (Simultaneous Localization and Mapping) technology addresses these problems well. A SLAM system uses sensors mounted on the platform (such as the monocular or stereo cameras of visual SLAM) to obtain information about the environment, estimates its own position and attitude from that information, and builds a map as required.

Meanwhile, in recent years deep neural networks have shown striking performance in image recognition and segmentation: networks such as AlexNet, GoogLeNet, VGG, and ResNet have repeatedly raised the accuracy records on object recognition datasets. Semantic SLAM systems that combine SLAM with deep learning have therefore seen new development. A semantic SLAM system can recognize the objects in a scene while reconstructing it. Applied to robotics, it improves a robot's understanding of its surroundings and makes more complex tasks possible; applied to autonomous driving, it enables functions such as active obstacle avoidance and hazard warning. Semantic SLAM thus has broad application prospects.

Traditional SLAM methods solve localization and mapping well in static scenes, but in scenes containing dynamic objects their accuracy degrades badly, because they cannot distinguish dynamic objects from static ones and treat both identically. In reality, the motion of feature points extracted from dynamic objects is inconsistent with that of feature points on the static background, which severely corrupts camera localization and hence the mapping result. Eliminating dynamic objects is also necessary in practice: in the path planning and navigation of a sweeping robot, for example, failing to remove dynamic objects such as people, dogs, and cats will bias the robot's navigation path.

The invention solves the problems of inaccurate trajectory estimation and ghosting during map reconstruction that SLAM systems suffer in dynamic scenes. It adds a convolutional neural network to the SLAM system, uses the network to segment objects, and judges dynamic points using prior knowledge together with the multi-view geometry method, thereby eliminating the influence of dynamic points on the scene.

Summary of the Invention

The purpose of the invention is to address the problems of existing three-dimensional reconstruction methods in dynamic scenes by proposing a reconstruction method based on an RGBD camera and a convolutional neural network: a depth camera and a color camera are used to reconstruct indoor scenes in three dimensions, and semantic segmentation is used to eliminate the influence of dynamic objects on the final reconstruction.

The invention is realized through the following technical solution:

The invention discloses a three-dimensional reconstruction method based on an RGBD camera and a convolutional neural network. The apparatus used by the method mainly comprises an RGBD camera and a PC with a GPU. The specific steps are as follows:

1) Calibrate the RGBD camera: obtain the intrinsic parameters of the color camera and of the depth camera of the RGBD camera, as well as the transformation matrix between the depth camera and the color camera;

2) Collect scene images: each frame comprises a color image and a depth image; the SDK of the RGBD camera is used to align the color image with the depth image;

3) Extract feature points: the ORB feature algorithm is used to extract feature points from the color image;

4) Dynamic point detection and elimination: dynamic points are eliminated by combining a convolutional neural network with the multi-view geometry method;

5) Re-tracking: after dynamic points have been further eliminated with the multi-view geometry method, the velocity model and the reference-frame model are used again to estimate the pose of the current frame; once an initial pose is obtained, more accurate tracking is performed within the local map;

6) Insert a new keyframe: a new keyframe is inserted when more than 20 frames have passed since the last keyframe insertion, the current frame tracks fewer than 50 map points, or the overlap ratio between the current frame and the reference frame falls below 90%;

7) Local map optimization: bundle adjustment (BA) is performed on the newly added keyframe, all of its co-visible keyframes, and the map points in those co-visible keyframes;

8) Loop closure detection: the bag-of-words model of the current keyframe is used to determine whether it forms a closed loop with a previous frame; if the closed-loop consistency check passes, the similarity transformation between the loop frames is computed and a closed-loop correction is applied to the adjacent frames.

As a further improvement, in step 4) of the invention, dynamic point detection and elimination comprises the following steps:

a) Image segmentation and recognition: a retrained SegNet network (a semantic segmentation network based on a convolutional neural network) segments the color image to obtain a label for every pixel; regions labelled as people or other animals are treated as dynamic regions, the feature points inside them are treated as dynamic points, and these dynamic points are eliminated;

b) Tracking: the remaining feature points of the current frame are matched against the previous frame or the previous reference frame, and the velocity model or the reference-frame model is used to estimate an initial pose of the current frame;

c) Judging dynamic points with the multi-view geometry method: a spatial feature point is judged by comparing the depth estimated when the point is projected onto the current frame with the depth measured at that point; when the difference exceeds a certain threshold, the point is considered dynamic;

d) For the dynamic points obtained by the multi-view geometry method, a region-growing operation is performed on the depth image to obtain a dynamic region;

e) The two dynamic regions obtained by the convolutional neural network and by the multi-view geometry method are fused; the fusion is performed by taking the union of the two regions.

f) As a further improvement, to increase detection speed, dynamic region detection is performed once every 5 frames, i.e., steps a)-e) above are repeated every 5 frames.

As a further improvement, the judgment principle in step c) of the invention is as follows: projecting the coordinates of the spatial point X onto the current frame yields the image point x′, and its depth z_proj can be estimated at the same time. If a dynamic object occludes X, the actually measured depth z′ will be smaller than z_proj; therefore, when Δz = z_proj − z′ of a point exceeds a threshold τ, the point is classified as dynamic.

As a further improvement, the value of τ in the invention is 0.5 m.

As a further improvement, the invention further comprises, after step 6) and before step 7), the following step: the frame is passed to the mapping thread, which is responsible for generating a point cloud map from the camera pose, the color image, and the depth image.

As a further improvement, the RGBD camera of the invention is a Kinect V2 (Microsoft's second-generation 3D somatosensory camera).

The beneficial effects of the invention are as follows:

The invention proposes a three-dimensional reconstruction method combined with semantic segmentation that effectively solves the problems of inaccurate camera pose estimation and dynamic-object ghosting in dynamic scenes. The SegNet network segments the image, prior knowledge excludes points belonging to dynamic regions, the multi-view geometry method then identifies the remaining dynamic points in the image, and finally all static points are used for camera pose estimation and three-dimensional scene reconstruction. The invention optimizes the running time of the dynamic region detection step and can achieve real-time reconstruction. Measured by the absolute trajectory error of the camera on the TUM dataset, the root mean square error is reduced from 0.752 to 0.025.

Brief Description of the Drawings

Fig. 1 is a schematic flow chart of the system of the invention;

Fig. 2 is a schematic diagram of the principle of judging dynamic points with the multi-view geometry method.

Detailed Description

The invention discloses an indoor three-dimensional reconstruction method based on an RGBD camera, with an emphasis on handling dynamic objects in indoor scenes. The apparatus mainly consists of an RGBD camera and a PC with a GPU; the RGBD camera is a Kinect. Fig. 1 is a schematic flow chart of the system. The specific steps are as follows:

Calibrate the Kinect camera to obtain the intrinsic parameters of the Kinect's color camera and depth camera, as well as the transformation matrix between the depth camera and the color camera.

Collect scene images: each frame comprises a color image and a depth image; the Kinect SDK is used to align the color image with the depth image.

Extract feature points: the ORB feature algorithm is used to extract feature points from the color image.

Dynamic point elimination: the invention eliminates dynamic points by combining a convolutional neural network with the multi-view geometry method. The specific procedure is as follows:

1. The retrained SegNet network segments the color image to obtain a label for every pixel; regions labelled as people or other animals are treated as dynamic regions, the feature points inside them are treated as dynamic points, and these dynamic points are eliminated.

2. Tracking: the remaining feature points of the current frame are matched against the previous frame or the previous reference frame, and the velocity model or the reference-frame model is used to estimate an initial pose of the current frame.

3. Judging dynamic points with the multi-view geometry method: a spatial feature point is judged by comparing the depth estimated when the point is projected onto the current frame with the depth measured at that point; when the difference exceeds a certain threshold, the point is considered dynamic.

4. For the dynamic points obtained by the multi-view geometry method, a region-growing operation is performed on the depth image to obtain a dynamic region.

5. The two dynamic regions obtained by the convolutional neural network and by the multi-view geometry method are fused: the fusion is performed by taking the union of the two regions.

6. To increase detection speed, dynamic region detection is performed once every 5 frames.

Re-tracking: after the dynamic points are eliminated, the velocity model and the reference-frame model are used again to estimate the pose of the current frame; once an initial pose is obtained, more accurate tracking is performed within the local map.
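For illustration, a minimal sketch of the constant-velocity model behind this initial estimate, assuming poses are represented as 4x4 homogeneous world-to-camera transforms (the representation and names are assumptions, not taken from the patent):

```python
import numpy as np

def predict_pose(T_prev: np.ndarray, T_prev2: np.ndarray) -> np.ndarray:
    """Constant-velocity motion model: re-apply the last inter-frame motion
    to the last pose to predict the current camera pose.
    T_prev, T_prev2: 4x4 poses of the previous two frames."""
    velocity = T_prev @ np.linalg.inv(T_prev2)  # motion between the last two frames
    return velocity @ T_prev                    # predicted pose of the current frame
```

The predicted pose then seeds the feature-point matching against the last frame or reference frame before refinement in the local map.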

Insert a new keyframe: a new keyframe is inserted when more than 20 frames have passed since the last keyframe insertion, the current frame tracks fewer than 50 map points, or the overlap ratio between the current frame and the reference frame falls below 90%.

The frame is then passed to the mapping thread, which is responsible for generating a point cloud map from the camera pose, the color image, and the depth image.
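Generating the point cloud amounts to back-projecting each aligned depth/color pair through the camera intrinsics and the estimated pose. A minimal sketch, assuming millimetre raw depth (typical for Kinect-style sensors, but an assumption here) and a camera-to-world pose T_wc:

```python
import numpy as np

def depth_to_point_cloud(depth, color, K, T_wc, depth_scale=1000.0):
    """Back-project an aligned depth/color pair into a world-frame point cloud.
    depth: HxW raw depth image; color: HxWx3 image; K: 3x3 intrinsics;
    T_wc: 4x4 camera-to-world pose; depth_scale converts raw units to metres."""
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32) / depth_scale
    valid = z > 0                                  # pixels with a depth reading
    x = (u - cx) * z / fx                          # pinhole back-projection
    y = (v - cy) * z / fy
    pts_cam = np.stack([x[valid], y[valid], z[valid],
                        np.ones(valid.sum())], axis=1)
    pts_world = (T_wc @ pts_cam.T).T[:, :3]        # transform into the world frame
    return pts_world, color[valid]                 # per-point position and RGB
```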

Local map optimization: for each newly added keyframe, find its co-visible keyframes and the map points observable from those keyframes, and perform BA optimization.

Loop closure detection: the bag-of-words model of the current keyframe is used to judge whether it can form a closed loop with a previous frame; if the closed-loop consistency check passes, the similarity transformation between the loop frames is computed and a closed-loop correction is applied to the adjacent frames.

The technical solution of the invention is further described below through a specific embodiment:

As a preferred embodiment, a Kinect V2 is used as the RGBD camera. The specific steps are as follows:

Step 1: Calibrate the Kinect V2 with a checkerboard; the calibration algorithm yields the intrinsic parameters of the Kinect's color camera and depth camera, as well as the transformation matrix R, T between the depth camera and the color camera.
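For illustration, a minimal OpenCV sketch of checkerboard calibration for one of the two cameras; the board dimensions, square size, and image directory below are assumptions, and the depth-to-color transform R, T would be obtained analogously, e.g. with cv2.stereoCalibrate on corners seen by both cameras:

```python
import glob
import cv2
import numpy as np

# Checkerboard with 9x6 inner corners and 25 mm squares (assumed values).
pattern, square = (9, 6), 0.025
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, img_pts, size = [], [], None
for path in sorted(glob.glob("calib/*.png")):      # captured calibration shots
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_pts.append(objp)
        img_pts.append(corners)

# K is the 3x3 intrinsic matrix, dist the distortion coefficients.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
```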

Step 2: The Kinect can be hand-held or mounted on a mobile robot to capture the indoor scene. Each frame captured by the Kinect comprises a color image and a depth image, and the Kinect SDK must be used to align the color image with the depth image.

Step 3: Track every input frame; this step includes the elimination of dynamic points and the initial tracking of the camera pose.

1. Extract feature points: the ORB feature algorithm is used to extract feature points from the color image.
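A minimal OpenCV sketch of this step (the feature budget and file path are assumptions):

```python
import cv2

color_image = cv2.imread("frame.png")         # one captured color frame (illustrative path)
gray = cv2.cvtColor(color_image, cv2.COLOR_BGR2GRAY)
orb = cv2.ORB_create(nfeatures=1000)          # assumed feature budget
keypoints, descriptors = orb.detectAndCompute(gray, None)
```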

2. Dynamic point detection and elimination:

a) Image segmentation and recognition: the retrained SegNet network segments the color image to obtain a label for every pixel; regions labelled as people or other animals are treated as dynamic regions, the feature points inside them are treated as dynamic points, and these dynamic points are eliminated.
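For illustration, a sketch of the elimination step, assuming the network's output is already available as a per-pixel label map and that the set of class ids treated as dynamic is known (both are assumptions; the ids depend on the retrained SegNet):

```python
import numpy as np

DYNAMIC_LABELS = {15}  # e.g. a "person" class id; actual ids depend on the trained network

def remove_dynamic_keypoints(keypoints, descriptors, label_map):
    """Drop feature points that fall inside segments labelled as dynamic.
    label_map: HxW integer array of per-pixel class labels from the network."""
    keep = []
    for i, kp in enumerate(keypoints):
        u, v = int(round(kp.pt[0])), int(round(kp.pt[1]))
        if label_map[v, u] not in DYNAMIC_LABELS:
            keep.append(i)
    return [keypoints[i] for i in keep], descriptors[keep]
```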

b) Tracking: the remaining feature points of the current frame are matched against the previous frame or the previous reference frame, and the velocity model or the reference-frame model is used to estimate an initial pose of the current frame.

c) Judging dynamic points with the multi-view geometry method: Fig. 2 illustrates the principle. The initial pose estimate obtained in the previous step is used to judge whether each feature point is dynamic, as follows: projecting the coordinates of the spatial point X onto the current frame yields the image point x′, and its depth z_proj can be estimated at the same time. If a dynamic object occludes X, the actually measured depth z′ will be smaller than z_proj, as shown in Fig. 2. Therefore, when Δz = z_proj − z′ of a point exceeds a threshold τ, the point is classified as dynamic. After many experiments, τ is set to 0.5 m.
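A direct sketch of this test; only τ = 0.5 m comes from the patent, while the millimetre depth scale and variable names are assumptions:

```python
import numpy as np

TAU = 0.5  # occlusion threshold in metres, as chosen in the patent

def is_dynamic(X_world, T_cw, K, depth_image, depth_scale=1000.0):
    """Classify a map point as dynamic by comparing its projected depth z_proj
    with the depth z' actually measured at the projected pixel."""
    X_cam = (T_cw @ np.append(X_world, 1.0))[:3]   # point in the current camera frame
    z_proj = X_cam[2]                              # depth predicted by geometry
    if z_proj <= 0:
        return False                               # behind the camera: no decision
    uvw = K @ X_cam
    u, v = int(round(uvw[0] / uvw[2])), int(round(uvw[1] / uvw[2]))
    h, w = depth_image.shape
    if not (0 <= u < w and 0 <= v < h):
        return False                               # projects outside the image
    z_meas = depth_image[v, u] / depth_scale       # measured depth z'
    return z_meas > 0 and (z_proj - z_meas) > TAU  # dynamic if delta_z exceeds tau
```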

d) For the dynamic points obtained by the multi-view geometry method, a region-growing operation is performed on the depth image to obtain a dynamic region.

e) The two dynamic regions obtained by the convolutional neural network and by the multi-view geometry method are fused: the fusion is performed by taking the union of the two regions.
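Steps d) and e) can be sketched as a depth-guided flood fill followed by a mask union; the 4-neighbour connectivity and the depth tolerance below are assumptions:

```python
import numpy as np
from collections import deque

def grow_region(depth, seed, tol=0.05):
    """Flood-fill from a dynamic seed pixel (v, u), absorbing 4-neighbours
    whose depth differs by less than `tol` metres; depth is assumed to be
    an HxW float array already converted to metres."""
    h, w = depth.shape
    mask = np.zeros((h, w), bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        v, u = queue.popleft()
        for dv, du in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nv, nu = v + dv, u + du
            if (0 <= nv < h and 0 <= nu < w and not mask[nv, nu]
                    and abs(depth[nv, nu] - depth[v, u]) < tol):
                mask[nv, nu] = True
                queue.append((nv, nu))
    return mask

def fuse_masks(segnet_mask, geometry_mask):
    """Step e): the final dynamic area is the union of the two detections."""
    return segnet_mask | geometry_mask
```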

f) To increase detection speed, dynamic region detection is performed once every 5 frames. Frames on which dynamic region detection is not run are matched against frames for which detection has already been performed, and the matched feature points that fall outside the dynamic region are used for camera pose estimation.

3. Re-tracking: after dynamic points have been further eliminated with the multi-view geometry method, the velocity model and the reference-frame model are used again to estimate the pose of the current frame; once an initial pose is obtained, more accurate tracking is performed within the local map.

4. Insert a new keyframe: a new keyframe is inserted when more than 20 frames have passed since the last keyframe insertion, the current frame tracks fewer than 50 map points, or the overlap ratio between the current frame and the reference frame falls below 90%.
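A sketch of this keyframe policy, reading the three criteria as alternatives so that any one of them triggers insertion (that reading is an interpretation of the patent text):

```python
def need_new_keyframe(frames_since_kf: int,
                      tracked_map_points: int,
                      overlap_ratio: float) -> bool:
    """Keyframe decision: overlap_ratio is the fraction of the current
    frame's tracked points shared with the reference keyframe."""
    return (frames_since_kf > 20
            or tracked_map_points < 50
            or overlap_ratio < 0.90)
```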

Step 4: Local map optimization: BA optimization is performed on the newly added keyframe, all of its co-visible keyframes, and the map points in those co-visible keyframes.

Let the observation equation be z = h(ξ, p), where ξ is the Lie-algebra representation of the camera pose, p is the world coordinate of a landmark point, and the observation z is the pixel coordinate z = [u_s, v_s]^T. The projection error is then e = z − h(ξ, p). BA optimization must take the co-visible keyframes and the map points in those keyframes into account, so the total error term is written as

$$\min_{\xi,\,p}\ \frac{1}{2}\sum_{i}\sum_{j}\left\|e_{ij}\right\|^{2},$$

that is, the above error term is minimized, where e_ij denotes the error of observing the j-th feature point from the i-th pose.
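For illustration, a sketch of the residual e = z − h(ξ, p) under a pinhole model, using an explicit rotation R and translation t in place of the Lie-algebra pose ξ (a simplification; a real BA solver would optimize over ξ with a sparse Levenberg-Marquardt backend):

```python
import numpy as np

def reprojection_error(z, X_world, R, t, K):
    """e = z - h(xi, p): observed pixel minus the pinhole projection of the
    landmark X_world under the current pose estimate (R, t)."""
    X_cam = R @ X_world + t
    proj = K @ X_cam
    h_xp = proj[:2] / proj[2]   # predicted pixel [u_s, v_s]
    return z - h_xp

# BA minimises 0.5 * sum over i, j of ||e_ij||^2 across all co-visible
# keyframes i and map points j observed by them.
```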

Step 5: Loop closure detection: the bag-of-words model of the current keyframe is used to determine whether it forms a closed loop with a previous frame; if the closed-loop consistency check passes, the similarity transformation between the loop frames is computed and a closed-loop correction is applied to the adjacent frames.
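A generic sketch of the bag-of-words comparison behind loop-candidate selection: cosine similarity between two BoW vectors, with a threshold gating the candidate. A real system would score against a trained visual vocabulary (e.g. a DBoW-style tree), which is not shown here:

```python
import numpy as np

def bow_similarity(v1: np.ndarray, v2: np.ndarray) -> float:
    """Cosine similarity of two bag-of-words vectors; a keyframe pair whose
    score clears a threshold becomes a loop-closure candidate."""
    v1 = v1 / (np.linalg.norm(v1) + 1e-12)
    v2 = v2 / (np.linalg.norm(v2) + 1e-12)
    return float(v1 @ v2)
```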

The final reconstruction result of the invention is that the static indoor scene is fully restored and a colored point cloud map of the indoor scene is output, in which the ghosting caused by dynamic objects such as people is effectively eliminated.

The above description does not limit the invention. It should be pointed out that those of ordinary skill in the art may make various changes, modifications, additions, or substitutions without departing from the essential scope of the invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the invention.

Claims (6)

1. A three-dimensional reconstruction method based on an RGBD camera and a convolutional neural network, characterized in that the apparatus used by the three-dimensional reconstruction method mainly comprises an RGBD camera and a PC with a GPU, the specific steps being as follows:
1) calibrating the RGBD camera: obtaining the intrinsic parameters of the color camera of the RGBD camera and the intrinsic parameters of the depth camera, as well as the transformation matrix between the depth camera and the color camera;
2) collecting scene images: each frame comprising a color image and a depth image, the SDK of the RGBD camera being used to align the color image with the depth image;
3) extracting feature points: extracting the feature points in the color image using the ORB feature algorithm;
4) dynamic point detection and elimination: eliminating dynamic points by combining a convolutional neural network with the multi-view geometry method;
5) re-tracking: after dynamic points are further eliminated with the multi-view geometry method, re-using the velocity model and the reference-frame model to estimate the pose of the current frame, and performing more accurate tracking in the local map after the initial pose is obtained;
6) inserting a new keyframe: a new keyframe being inserted when more than 20 frames have passed since the last keyframe insertion, the current frame tracks fewer than 50 map points, or the overlap ratio between the current frame and the reference frame is less than 90%;
7) local map optimization: performing BA optimization on the newly added keyframe, all co-visible keyframes of the newly added keyframe, and the map points in all of the co-visible keyframes;
8) loop closure detection: using the bag-of-words model of the current keyframe to judge whether it forms a closed loop with a previous frame, and if the closed-loop consistency check passes, computing the similarity transformation between the loop frames and applying a closed-loop correction to the adjacent frames.
2. The three-dimensional reconstruction method based on an RGBD camera and a convolutional neural network according to claim 1, characterized in that in step 4), dynamic point detection and elimination comprises the following steps:
A) image segmentation and recognition: segmenting the color image using a retrained SegNet network (a semantic segmentation network based on a convolutional neural network) to obtain the label of each pixel, the regions of people and other animals obtained belonging to dynamic regions and the feature points therein being dynamic points, these dynamic points being eliminated;
B) tracking: matching the remaining feature points of the current frame against the previous frame or the previous reference frame, and estimating an initial pose of the current frame using the velocity model or the reference-frame model;
C) judging dynamic points with the multi-view geometry method: judging by comparing the depth estimated when a spatial feature point is projected onto the current frame with the depth measured at that point, the point being considered dynamic when the difference exceeds a certain threshold;
D) performing, for the dynamic points obtained by the multi-view geometry method, a region-growing operation on the depth image to obtain a dynamic region;
E) fusing the two dynamic regions obtained by the convolutional neural network and by the multi-view geometry method, the fusion being performed by taking the union of the two regions;
F) as a further improvement, in order to increase detection speed, performing dynamic region detection once every 5 frames, i.e., repeating steps A)-E) above every 5 frames.
3. The three-dimensional reconstruction method based on an RGBD camera and a convolutional neural network according to claim 2, characterized in that the judgment principle in step C) is as follows: projecting the coordinates of the spatial point X onto the current frame yields the image point x′, and its depth z_proj can be estimated at the same time; if a dynamic object occludes X, the actually measured depth z′ will be smaller than z_proj; therefore, when Δz = z_proj − z′ of a point exceeds a threshold τ, the point is classified as a dynamic point.
4. The three-dimensional reconstruction method based on an RGBD camera and a convolutional neural network according to claim 3, characterized in that the value of τ is 0.5 m.
5. The three-dimensional reconstruction method based on an RGBD camera and a convolutional neural network according to claim 1, characterized in that after step 6) and before step 7) the method further comprises the following step: sending the frame to the mapping thread, the mapping thread being responsible for generating a point cloud map from the camera pose, the color image, and the depth image.
6. The three-dimensional reconstruction method based on an RGBD camera and a convolutional neural network according to any one of claims 1 to 5, characterized in that the RGBD camera is a Kinect V2 (Microsoft's second-generation 3D somatosensory camera).
CN201910572096.1A 2019-06-28 2019-06-28 RGBD camera-based three-dimensional reconstruction method for indoor dynamic scene Active CN110349250B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910572096.1A CN110349250B (en) 2019-06-28 2019-06-28 RGBD camera-based three-dimensional reconstruction method for indoor dynamic scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910572096.1A CN110349250B (en) 2019-06-28 2019-06-28 RGBD camera-based three-dimensional reconstruction method for indoor dynamic scene

Publications (2)

Publication Number Publication Date
CN110349250A true CN110349250A (en) 2019-10-18
CN110349250B CN110349250B (en) 2020-12-22

Family

ID=68177197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910572096.1A Active CN110349250B (en) 2019-06-28 2019-06-28 RGBD camera-based three-dimensional reconstruction method for indoor dynamic scene

Country Status (1)

Country Link
CN (1) CN110349250B (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161318A (en) * 2019-12-30 2020-05-15 广东工业大学 Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching
CN111160107A (en) * 2019-12-05 2020-05-15 东南大学 A dynamic region detection method based on feature matching
CN111402336A (en) * 2020-03-23 2020-07-10 中国科学院自动化研究所 Semantic S L AM-based dynamic environment camera pose estimation and semantic map construction method
CN111390975A (en) * 2020-04-27 2020-07-10 浙江库科自动化科技有限公司 Inspection intelligent robot with air pipe removing function and inspection method thereof
CN111462179A (en) * 2020-03-26 2020-07-28 北京百度网讯科技有限公司 Three-dimensional object tracking method and device and electronic equipment
CN111709982A (en) * 2020-05-22 2020-09-25 浙江四点灵机器人股份有限公司 Three-dimensional reconstruction method for dynamic environment
CN111724439A (en) * 2019-11-29 2020-09-29 中国科学院上海微系统与信息技术研究所 A visual positioning method and device in a dynamic scene
CN111739037A (en) * 2020-07-31 2020-10-02 之江实验室 A Semantic Segmentation Method for RGB-D Images of Indoor Scenes
CN111914832A (en) * 2020-06-03 2020-11-10 华南理工大学 A SLAM method for RGB-D cameras in dynamic scenes
CN112037268A (en) * 2020-09-02 2020-12-04 中国科学技术大学 Environment sensing method based on probability transfer model in dynamic scene
CN112037261A (en) * 2020-09-03 2020-12-04 北京华捷艾米科技有限公司 A kind of image dynamic feature removal method and device
CN112101160A (en) * 2020-09-04 2020-12-18 浙江大学 A Binocular Semantic SLAM Method for Autonomous Driving Scenarios
CN112220444A (en) * 2019-11-20 2021-01-15 北京健康有益科技有限公司 Pupil distance measuring method and device based on depth camera
CN112258618A (en) * 2020-11-04 2021-01-22 中国科学院空天信息创新研究院 Semantic mapping and localization method based on fusion of prior laser point cloud and depth map
CN112435262A (en) * 2020-11-27 2021-03-02 广东电网有限责任公司肇庆供电局 Dynamic environment information detection method based on semantic segmentation network and multi-view geometry
CN112530014A (en) * 2020-12-18 2021-03-19 北京理工大学重庆创新中心 Multi-unmanned aerial vehicle indoor scene three-dimensional reconstruction method and device
CN112651357A (en) * 2020-12-30 2021-04-13 浙江商汤科技开发有限公司 Segmentation method of target object in image, three-dimensional reconstruction method and related device
CN112767409A (en) * 2019-11-05 2021-05-07 珠海格力电器股份有限公司 Image processing method and device before positioning, storage medium and computer equipment
CN112802053A (en) * 2021-01-27 2021-05-14 广东工业大学 Dynamic object detection method for dense mapping in dynamic environment
CN112802186A (en) * 2021-01-27 2021-05-14 清华大学 Dynamic scene real-time three-dimensional reconstruction method based on binarization characteristic coding matching
CN112907677A (en) * 2019-12-04 2021-06-04 杭州海康威视数字技术股份有限公司 Camera calibration method and device for single-frame image and storage medium
CN113256663A (en) * 2021-05-13 2021-08-13 深圳亿嘉和科技研发有限公司 Dynamic feature eliminating method and angular point detection algorithm thereof
CN113447014A (en) * 2021-08-30 2021-09-28 深圳市大道智创科技有限公司 Indoor mobile robot, mapping method, positioning method, and mapping positioning device
CN113673524A (en) * 2021-07-05 2021-11-19 北京物资学院 Method and device for removing dynamic characteristic points of warehouse semi-structured environment
CN114565656A (en) * 2022-02-10 2022-05-31 北京箩筐时空数据技术有限公司 Camera pose prediction method and device, storage medium and computer equipment
WO2022110514A1 (en) * 2020-11-27 2022-06-02 叠境数字科技(上海)有限公司 Image interpolation method and apparatus employing rgb-d image and multi-camera system
CN114723672A (en) * 2022-03-09 2022-07-08 杭州易现先进科技有限公司 Method, system, device and medium for three-dimensional reconstruction data acquisition and verification
CN114972656A (en) * 2022-06-23 2022-08-30 安徽工业大学 Dynamic scene vision SLAM optimization method based on semantic segmentation network
CN116206068A (en) * 2023-04-28 2023-06-02 北京科技大学 Three-dimensional driving scene generation and construction method and device based on real data set
CN116258817A (en) * 2023-02-16 2023-06-13 浙江大学 A method and system for constructing an autonomous driving digital twin scene based on multi-view 3D reconstruction
CN116452776A (en) * 2023-06-19 2023-07-18 国网浙江省电力有限公司湖州供电公司 Low-carbon substation scene reconstruction method based on vision synchronous positioning and mapping system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049929A (en) * 2012-11-20 2013-04-17 浙江大学 Multi-camera dynamic scene 3D (three-dimensional) rebuilding method based on joint optimization
US20130182894A1 (en) * 2012-01-18 2013-07-18 Samsung Electronics Co., Ltd. Method and apparatus for camera tracking
CN107193279A (en) * 2017-05-09 2017-09-22 复旦大学 Robot localization and map structuring system based on monocular vision and IMU information
CN107833236A (en) * 2017-10-31 2018-03-23 中国科学院电子学研究所 Semantic vision positioning system and method are combined under a kind of dynamic environment
CN108596974A (en) * 2018-04-04 2018-09-28 清华大学 Dynamic scene robot localization builds drawing system and method
KR20180112622A (en) * 2017-04-04 2018-10-12 엘지전자 주식회사 Method of configuring position based on identification of fixed object and moving object and robot implementing thereof
CN109387204A (en) * 2018-09-26 2019-02-26 东北大学 The synchronous positioning of the mobile robot of dynamic environment and patterning process in faced chamber

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130182894A1 (en) * 2012-01-18 2013-07-18 Samsung Electronics Co., Ltd. Method and apparatus for camera tracking
CN103049929A (en) * 2012-11-20 2013-04-17 浙江大学 Multi-camera dynamic scene 3D (three-dimensional) rebuilding method based on joint optimization
KR20180112622A (en) * 2017-04-04 2018-10-12 엘지전자 주식회사 Method of configuring position based on identification of fixed object and moving object and robot implementing thereof
CN107193279A (en) * 2017-05-09 2017-09-22 复旦大学 Robot localization and map structuring system based on monocular vision and IMU information
CN107833236A (en) * 2017-10-31 2018-03-23 中国科学院电子学研究所 Semantic vision positioning system and method are combined under a kind of dynamic environment
CN108596974A (en) * 2018-04-04 2018-09-28 清华大学 Dynamic scene robot localization builds drawing system and method
CN109387204A (en) * 2018-09-26 2019-02-26 东北大学 The synchronous positioning of the mobile robot of dynamic environment and patterning process in faced chamber

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANBO ZHANG, ET AL: "Semantic Segmentation based Dense RGB-D SLAM in Dynamic Environments", 2019 3rd International Conference on Artificial Intelligence, Automation and Control Technologies (AIACT 2019) *
DAI JUTING: "Research on 3D Semantic Surface Reconstruction of Large-Scale Scenes Based on RGB-D Video Sequences", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767409A (en) * 2019-11-05 2021-05-07 珠海格力电器股份有限公司 Image processing method and device before positioning, storage medium and computer equipment
CN112220444A (en) * 2019-11-20 2021-01-15 北京健康有益科技有限公司 Pupil distance measuring method and device based on depth camera
CN112220444B (en) * 2019-11-20 2021-06-29 北京健康有益科技有限公司 Pupil distance measuring method and device based on depth camera
CN111724439A (en) * 2019-11-29 2020-09-29 中国科学院上海微系统与信息技术研究所 A visual positioning method and device in a dynamic scene
CN111724439B (en) * 2019-11-29 2024-05-17 中国科学院上海微系统与信息技术研究所 Visual positioning method and device under dynamic scene
CN112907677A (en) * 2019-12-04 2021-06-04 杭州海康威视数字技术股份有限公司 Camera calibration method and device for single-frame image and storage medium
CN112907677B (en) * 2019-12-04 2023-07-25 杭州海康威视数字技术股份有限公司 Camera calibration method and device for single-frame image and storage medium
CN111160107A (en) * 2019-12-05 2020-05-15 东南大学 A dynamic region detection method based on feature matching
CN111161318A (en) * 2019-12-30 2020-05-15 广东工业大学 Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching
CN111402336B (en) * 2020-03-23 2024-03-12 中国科学院自动化研究所 Semantic SLAM-based dynamic environment camera pose estimation and semantic map construction method
CN111402336A (en) * 2020-03-23 2020-07-10 中国科学院自动化研究所 Semantic S L AM-based dynamic environment camera pose estimation and semantic map construction method
CN111462179B (en) * 2020-03-26 2023-06-27 北京百度网讯科技有限公司 Three-dimensional object tracking method and device and electronic equipment
CN111462179A (en) * 2020-03-26 2020-07-28 北京百度网讯科技有限公司 Three-dimensional object tracking method and device and electronic equipment
CN111390975A (en) * 2020-04-27 2020-07-10 浙江库科自动化科技有限公司 Inspection intelligent robot with air pipe removing function and inspection method thereof
CN111709982A (en) * 2020-05-22 2020-09-25 浙江四点灵机器人股份有限公司 Three-dimensional reconstruction method for dynamic environment
CN111914832A (en) * 2020-06-03 2020-11-10 华南理工大学 A SLAM method for RGB-D cameras in dynamic scenes
CN111914832B (en) * 2020-06-03 2023-06-13 华南理工大学 SLAM method of RGB-D camera under dynamic scene
CN111739037B (en) * 2020-07-31 2020-12-01 之江实验室 A Semantic Segmentation Method for RGB-D Images of Indoor Scenes
CN111739037A (en) * 2020-07-31 2020-10-02 之江实验室 A Semantic Segmentation Method for RGB-D Images of Indoor Scenes
CN112037268A (en) * 2020-09-02 2020-12-04 中国科学技术大学 Environment sensing method based on probability transfer model in dynamic scene
CN112037268B (en) * 2020-09-02 2022-09-02 中国科学技术大学 Environment sensing method based on probability transfer model in dynamic scene
CN112037261A (en) * 2020-09-03 2020-12-04 北京华捷艾米科技有限公司 A kind of image dynamic feature removal method and device
CN112101160A (en) * 2020-09-04 2020-12-18 浙江大学 A Binocular Semantic SLAM Method for Autonomous Driving Scenarios
CN112101160B (en) * 2020-09-04 2024-01-05 浙江大学 Binocular semantic SLAM method for automatic driving scene
CN112258618B (en) * 2020-11-04 2021-05-14 中国科学院空天信息创新研究院 Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
CN112258618A (en) * 2020-11-04 2021-01-22 中国科学院空天信息创新研究院 Semantic mapping and localization method based on fusion of prior laser point cloud and depth map
WO2022110514A1 (en) * 2020-11-27 2022-06-02 叠境数字科技(上海)有限公司 Image interpolation method and apparatus employing rgb-d image and multi-camera system
CN112435262A (en) * 2020-11-27 2021-03-02 广东电网有限责任公司肇庆供电局 Dynamic environment information detection method based on semantic segmentation network and multi-view geometry
CN112530014A (en) * 2020-12-18 2021-03-19 北京理工大学重庆创新中心 Multi-unmanned aerial vehicle indoor scene three-dimensional reconstruction method and device
CN112530014B (en) * 2020-12-18 2023-07-25 北京理工大学重庆创新中心 Three-dimensional reconstruction method and device for indoor scene of multiple unmanned aerial vehicles
CN112651357B (en) * 2020-12-30 2024-05-24 浙江商汤科技开发有限公司 Method for segmenting target object in image, three-dimensional reconstruction method and related device
CN112651357A (en) * 2020-12-30 2021-04-13 浙江商汤科技开发有限公司 Segmentation method of target object in image, three-dimensional reconstruction method and related device
CN112802186A (en) * 2021-01-27 2021-05-14 清华大学 Dynamic scene real-time three-dimensional reconstruction method based on binarization characteristic coding matching
CN112802186B (en) * 2021-01-27 2022-06-24 清华大学 Real-time 3D reconstruction method of dynamic scene based on binary feature code matching
CN112802053A (en) * 2021-01-27 2021-05-14 广东工业大学 Dynamic object detection method for dense mapping in dynamic environment
CN113256663A (en) * 2021-05-13 2021-08-13 深圳亿嘉和科技研发有限公司 Dynamic feature eliminating method and angular point detection algorithm thereof
CN113673524A (en) * 2021-07-05 2021-11-19 北京物资学院 Method and device for removing dynamic characteristic points of warehouse semi-structured environment
CN113447014A (en) * 2021-08-30 2021-09-28 深圳市大道智创科技有限公司 Indoor mobile robot, mapping method, positioning method, and mapping positioning device
CN114565656A (en) * 2022-02-10 2022-05-31 北京箩筐时空数据技术有限公司 Camera pose prediction method and device, storage medium and computer equipment
CN114723672A (en) * 2022-03-09 2022-07-08 杭州易现先进科技有限公司 Method, system, device and medium for three-dimensional reconstruction data acquisition and verification
CN114723672B (en) * 2022-03-09 2024-08-20 杭州易现先进科技有限公司 Method, system, device and medium for three-dimensional reconstruction data acquisition and verification
CN114972656B (en) * 2022-06-23 2024-08-16 安徽工业大学 A dynamic scene visual SLAM optimization method based on semantic segmentation network
CN114972656A (en) * 2022-06-23 2022-08-30 安徽工业大学 Dynamic scene vision SLAM optimization method based on semantic segmentation network
CN116258817B (en) * 2023-02-16 2024-01-30 浙江大学 A method and system for constructing autonomous driving digital twin scenes based on multi-view three-dimensional reconstruction
CN116258817A (en) * 2023-02-16 2023-06-13 浙江大学 A method and system for constructing an autonomous driving digital twin scene based on multi-view 3D reconstruction
CN116206068A (en) * 2023-04-28 2023-06-02 北京科技大学 Three-dimensional driving scene generation and construction method and device based on real data set
CN116452776A (en) * 2023-06-19 2023-07-18 国网浙江省电力有限公司湖州供电公司 Low-carbon substation scene reconstruction method based on vision synchronous positioning and mapping system
CN116452776B (en) * 2023-06-19 2023-10-20 国网浙江省电力有限公司湖州供电公司 Low-carbon substation scene reconstruction method based on visual simultaneous positioning and mapping system

Also Published As

Publication number Publication date
CN110349250B (en) 2020-12-22

Similar Documents

Publication Publication Date Title
CN110349250B (en) RGBD camera-based three-dimensional reconstruction method for indoor dynamic scene
CN114384920B (en) Dynamic obstacle avoidance method based on real-time construction of local grid map
CN111583369B (en) Laser SLAM method based on facial line angular point feature extraction
CN113345018B (en) Laser monocular vision fusion positioning mapping method in dynamic scene
CN109597087B (en) A 3D object detection method based on point cloud data
CN114782626B (en) Substation scene mapping and positioning optimization method based on laser and vision fusion
Xiao et al. Monocular vehicle self-localization method based on compact semantic map
CN112734765A (en) Mobile robot positioning method, system and medium based on example segmentation and multi-sensor fusion
Murali et al. Utilizing semantic visual landmarks for precise vehicle navigation
Sujiwo et al. Monocular vision-based localization using ORB-SLAM with LIDAR-aided mapping in real-world robot challenge
CN108648274A (en) A kind of cognition point cloud map creation system of vision SLAM
CN112734841A (en) Method for realizing positioning by using wheel type odometer-IMU and monocular camera
CN114964236A (en) Mapping and vehicle positioning system and method for underground parking lot environment
CN111998862A (en) Dense binocular SLAM method based on BNN
Hayakawa et al. Ego-motion and surrounding vehicle state estimation using a monocular camera
CN115830070A (en) Infrared laser fusion positioning method for inspection robot of traction substation
CN118603077A (en) A quadruped robot inspection map construction system and method based on multi-sensor fusion
Tian et al. Vision-based mapping of lane semantics and topology for intelligent vehicles
CN119440090A (en) Autonomous obstacle avoidance method, system, device and storage medium for inspection drone
CN113720323B (en) Monocular vision inertial navigation SLAM method and device based on point-line feature fusion
Sujiwo et al. Localization based on multiple visual-metric maps
CN111696147B (en) Depth estimation method based on improved YOLOv3 model
CN117635651A (en) A dynamic environment SLAM method based on YOLOv8 instance segmentation
CN116630561A (en) Outdoor visual positioning and mapping method based on quadric surface initialization of robustness
CN115482282A (en) Dynamic SLAM method with multi-target tracking capability in autonomous driving scenarios

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant