CN112418288A - GMS and motion detection-based dynamic vision SLAM method - Google Patents

GMS and motion detection-based dynamic vision SLAM method

Info

Publication number
CN112418288A
CN112418288A
Authority
CN
China
Prior art keywords
frame
key frame
map
gms
image
Prior art date
Legal status
Granted
Application number
CN202011282866.8A
Other languages
Chinese (zh)
Other versions
CN112418288B (en)
Inventor
姚剑
卓德胜
程军豪
龚烨
涂静敏
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202011282866.8A
Publication of CN112418288A
Application granted
Publication of CN112418288B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 Geographic models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery

Abstract

The invention belongs to the technical field of visual spatial positioning and discloses a dynamic visual SLAM method based on GMS and motion detection, comprising the following steps: initializing the SLAM system by combining GMS with a sliding window to obtain an initial map; realizing tracking and positioning of the SLAM system in combination with GMS; performing loop detection and global optimization, and constructing a static point cloud map from the RGB-D images; and outputting the pose trajectory after data processing is finished, together with the static point cloud map built from the RGB-D images. The invention solves the problem of poor SLAM tracking and positioning performance in dynamic environments in the prior art. The method effectively eliminates the influence of dynamic features and integrates this capability into each functional module of the SLAM system, solving visual SLAM positioning and mapping in dynamic scenes with good real-time performance and high positioning accuracy.

Description

GMS and motion detection-based dynamic vision SLAM method
Technical Field
The invention relates to the technical field of visual space positioning, in particular to a dynamic visual SLAM method based on GMS and motion detection.
Background
Simultaneous localization and mapping (SLAM) is a core technology for realizing the functions of intelligent mobile robots. SLAM incrementally builds a map model of the current environment from the data stream acquired by a laser or visual sensor, and performs positioning while constructing the map.
However, neither the feature-point method nor the direct method can handle the problems caused by dynamic objects, which are common in real scenes. Most current visual SLAM methods still rest on the assumption that the observed environment is static. Moving objects in the environment create many wrong data associations, or cause previously tracked map points to be lost through occlusion; this severely affects camera pose estimation, introduces positioning errors into the whole visual SLAM system, and compromises the robot's subsequent tasks. Research on improving the positioning accuracy and mapping quality of visual SLAM in dynamic environments is therefore of great significance.
In recent years, many researchers have begun to study the visual SLAM problem in dynamic scenes. For example, methods based on foreground/background initialization have the advantage of being able to track temporarily stopped dynamic objects, but they require predefined information about the background or objects and are less effective when the environment contains many moving objects. Methods based on geometric constraints have a clear mathematical foundation and low computational cost, but residuals caused by moving objects are difficult to distinguish from residuals caused by mismatches, and the initial camera pose must be known in advance. Optical flow methods can process rich information, but they rest on the brightness-constancy assumption and are sensitive to illumination changes. Deep learning based methods have the best overall effect, but parameter tuning is complex and real-time performance cannot be guaranteed.
Disclosure of Invention
The invention provides a dynamic visual SLAM method based on GMS and motion detection, solving the problem of poor SLAM tracking and positioning performance in dynamic environments in the prior art.
The invention provides a dynamic vision SLAM method based on GMS and motion detection, which comprises the following steps:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
step 4, outputting a pose track after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
Preferably, in step 1, the initializing operation corresponding to the monocular image includes: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; solving the initial frame camera pose; and obtaining an initial map through feature triangulation.
Preferably, the initialization key frames are selected with the sliding window as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a first threshold, the first frame image is determined to be the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until the threshold requirement is met and the first initialization key frame is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the first threshold it is added to the sliding window, otherwise it is discarded; the above operations are repeated, so that the n-th initialization key frame F_n brings the window to its minimum size n, and processing of added key frames continues until the m-th key frame F_m is obtained, at which point the window reaches its maximum size m.
Preferably, in step 1, the initialization operation corresponding to the binocular image and the RGB-D image includes: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; and obtaining an initial map through feature triangulation.
Preferably, the initialization key frames are selected with the sliding window as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a second threshold, the first frame image is determined to be the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until a key frame meeting the threshold requirement is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the second threshold it is added to the sliding window, otherwise it is discarded; the above operations are repeated, so that the n-th initialization key frame F_n brings the window to its minimum size n, and processing of added key frames continues until the m-th key frame F_m is obtained, at which point the window reaches its maximum size m.
Preferably, the step 2 comprises:
2.1, tracking any image information of monocular, binocular or RGB-D type in an SLAM system;
2.2, tracking the local map, completing the fused tracking of the tracking data and the local map data, and deciding whether to generate a key frame;
wherein, the step 2.1 specifically comprises the following substeps:
step 2.1.1, the SLAM system enters a constant-speed tracking model, the pose transformation of the previous frame is used as the initial pose of the current frame, and map points in the reference frame are projected to the current frame to complete 3D-2D data association;
step 2.1.2, a 3D-2D matching check is performed to obtain the final number of matches between map points and feature points; if the number of matches is greater than or equal to a third threshold, jump directly to step 2.2; otherwise enter step 2.1.3 for reference frame tracking;
step 2.1.3, the SLAM system enters a reference frame tracking model; the pose transformation of the previous frame is used as the initial pose of the current frame; through feature matching between the current frame and the reference key frame, map points corresponding to feature points in the reference key frame are associated with the matching feature points of the current frame; GMS is adopted to match the feature points and to identify and eliminate dynamic feature point pairs, forming static 3D-2D data associations; the BoW bag-of-words method is used to accelerate matching, and the pose is finally solved by minimizing the reprojection error with BA optimization; a 3D-2D matching check is then performed to obtain the final number of matches between map points and feature points; if the number of matches is greater than or equal to the third threshold, jump directly to step 2.2, otherwise enter step 2.1.4 for relocation;
step 2.1.4, the SLAM system enters a relocation model; candidate key frames having a co-visibility relationship with the current frame are computed via BoW; the current frame is feature-matched against the candidate key frames using GMS grid-based motion statistics; when the number of matched feature points exceeds a fourth threshold, the pose of the current frame is estimated by combining the PnP and RANSAC algorithms, and the map points of the local map corresponding to the current frame are obtained with the BA optimization algorithm; if the number of map points of the current frame is greater than a fifth threshold, the relocation succeeds.
Preferably, the step 2.2 specifically comprises the following substeps:
2.2.1, the SLAM system enters a local map tracking model, and a local map is established for tracking by utilizing the initial pose of the current frame;
and 2.2.2, after the local map tracking is completed, judging the current frame, creating a key frame, performing GMS feature matching on the current key frame and a reference key frame to obtain the static feature of the current frame, and triangularizing to create map points.
Preferably, the constructing of the static point cloud map in step 3 comprises: judging and screening the features; and fusing the 3D points by aggregating the depth maps.
Preferably, in step 3, the feature judgment and screening are implemented as follows:
motion detection is performed on the key frames: the reference key frame and the current key frame are input into a trained YOLO v3 network to segment the potential motion region S_ref of the reference frame and the potential motion region S_cur of the current frame; p_r is a feature point in the reference key frame and p_c is the pixel point obtained by projecting it into the current key frame, the depth of the projected point p_c in the corresponding depth map being Z; the projected depth value Z_proj corresponding to p_c is calculated according to the camera model, giving the depth difference:

ΔZ = Z_proj − Z

The angle α between the viewing rays from p_r and p_c to the corresponding spatial point p_w is also calculated; when α and ΔZ meet the preset conditions, the pixel point p_c is judged to be a dynamic point and is screened out of the current frame; all points in the current key frame are processed according to this method to finally obtain the static feature point set.
Preferably, in step 3, the fusion of the 3D points from the aggregated depth maps is implemented as follows:
the single-frame static point cloud map constructed from each key frame is denoted C_k, and the camera pose relative to the world coordinate system at the moment corresponding to that key frame is denoted T_k; the point clouds of all key frames are transformed into the world coordinate system to construct the global static point cloud map W, expressed as:

W = ∪_{k=1}^{N} T_k C_k

where N represents the total number of key frames.
One or more technical schemes provided by the invention at least have the following technical effects or advantages:
Firstly, the SLAM system is initialized by combining GMS with a sliding window to obtain an initial map; then tracking and positioning of the SLAM system are realized in combination with GMS; next, loop detection and global optimization are carried out, and a static point cloud map is constructed from the RGB-D images; finally, the pose trajectory is output after data processing is finished, together with the static point cloud map built from the RGB-D images. By combining GMS (grid-based motion statistics) with motion detection, the visual SLAM method provided by the invention, compared with existing processing methods, effectively eliminates the influence of dynamic features and integrates this capability into each functional module of the SLAM system; it solves visual SLAM positioning and mapping in dynamic scenes, with good real-time performance and high positioning accuracy. In addition, whereas traditional feature-based SLAM systems mostly form sparse maps, the method adds a static mapping thread for the RGB-D data type; the resulting dense map reflects more environmental characteristics and detail than a sparse map.
Drawings
Fig. 1 is an overall flowchart of a dynamic visual SLAM method based on GMS and motion detection according to an embodiment of the present invention.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
The embodiment provides a dynamic visual SLAM method based on GMS and motion detection, which comprises the following steps:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
step 4, outputting a pose trajectory after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
The present invention is further described below.
The embodiment provides a dynamic visual SLAM method based on GMS and motion detection, which specifically includes the following steps, with reference to fig. 1:
step 1: and initializing the SLAM system by combining the GMS and the sliding window to obtain an initial map.
That is, GMS and the sliding window are combined to initialize the SLAM system while eliminating the influence of dynamic features in the operating scene.
Corresponding initialization operations are completed for the different data types (monocular, binocular and RGB-D): monocular initialization follows step 1.1, while initialization of the binocular and RGB-D types jumps directly to step 1.2.
Step 1.1: monocular initialization is mainly divided into four stages: in the first stage, a sliding window selects the initial frames; in the second stage, GMS matches the feature points; in the third stage, the initial-frame camera pose is solved; in the fourth stage, feature triangulation yields the initial map.
Step 1.1.1: select the initialization key frames with a sliding window. First, the first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a first threshold (for example, 100), the frame is confirmed as the first initialization key frame F_1; otherwise the next frame is read, until an initialization key frame meeting the condition is obtained. A sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m. The purpose of the sliding window is to pair the initial key frame F_1 with a key frame F_n or F_m separated by a certain time interval: feature matching combined with the robust GMS algorithm then rejects dynamic features and yields relatively pure feature points, a more thorough rejection than applying GMS to dynamic mismatches between adjacent key frames. Subsequent image frames are then processed: the next image frame is read as the current frame; if its feature point count is greater than the first threshold it is added to the sliding window, otherwise it is discarded. These operations are repeated: the n-th initialization key frame F_n brings the window to its minimum size n, and processing continues until the m-th key frame F_m is obtained, at which point the window reaches its maximum size m.
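The window-filling logic above can be sketched as follows (a minimal Python sketch using OpenCV ORB features; the threshold of 100 follows the example in the text, while `read_next_image` and the values of n and m are illustrative assumptions):

```python
import cv2

def collect_init_window(read_next_image, min_feats=100, n=5, m=10):
    """Fill the initialization sliding window with feature-rich key frames.

    read_next_image: hypothetical callable yielding grayscale frames (None at end).
    min_feats: the first threshold on the feature count (e.g. 100).
    n, m: minimum / maximum window sizes; the values here are illustrative.
    """
    orb = cv2.ORB_create(nfeatures=1000)
    window = []  # sliding window of (keypoints, descriptors) per accepted frame
    while len(window) < m:
        frame = read_next_image()
        if frame is None:
            break  # input exhausted
        kps, des = orb.detectAndCompute(frame, None)
        if des is not None and len(kps) > min_feats:
            window.append((kps, des))  # accept as initialization key frame
        # frames below the threshold are simply discarded
    return window if len(window) >= n else None  # need at least the minimum size n
```

Initialization then tries to match F_1 against F_n (or later frames up to F_m), as described in the following steps.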
Step 1.1.2: mismatched point pairs on dynamic objects are eliminated with the GMS matching algorithm. Feature matching is performed between the features extracted from the initial frames F_1 and F_n of step 1.1.1, yielding a set of matching points, and the image is divided into G grids (generally 20 × 20). For each feature point X_i of the first initialization key frame F_1 whose matching point falls in a grid cell of F_n with a high match count, a 3 × 3 grid neighborhood is established by expanding around the grid cells of the feature point and of its matching point; the corresponding matching point sets within this neighborhood are denoted {a_k, b_k}, k = 1, 2, …, 9, and a composite feature-matching score S_i is computed from these corresponding point sets according to the formula:
S_i = Σ_{k=1}^{K} |X_{a_k b_k}|,  K = 9

where |X_{a_k b_k}| is the number of matching points in the neighborhood cell pair {a_k, b_k}. The composite score S_i is compared with the threshold t to judge the accuracy of a feature match. The threshold t is set as:
t = α √n

where α is set to 0.6 and n is the average number of feature points in each of the K (i.e., 9) neighborhood grids. When S_i > t, the feature point is judged to be a correct match; otherwise it is a wrong match. Wrong matching pairs between the initialization key frames are deleted according to this rule, yielding the static matching point set {p_c, p_r}.
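A sketch of the scoring rule above, assuming the matches have already been binned into grid cells (the `cell_matches` structure and the per-cell mean feature count are assumptions of this sketch; opencv-contrib also ships a ready-made implementation as cv2.xfeatures2d.matchGMS):

```python
import math

def gms_score(cell_matches, i, j, grid_w):
    """S_i: matches summed over the 3x3 neighborhoods of cell i in F_1 and
    cell j in F_n (K = 9 cell pairs). cell_matches maps a cell pair
    (cell index in F_1, cell index in F_n) to its match count."""
    offsets = [-grid_w - 1, -grid_w, -grid_w + 1, -1, 0, 1,
               grid_w - 1, grid_w, grid_w + 1]
    return sum(cell_matches.get((i + o, j + o), 0) for o in offsets)

def is_correct_match(score_si, mean_feats_per_cell, alpha=0.6):
    """GMS decision: accept when S_i > t, with threshold t = alpha * sqrt(n)."""
    t = alpha * math.sqrt(mean_feats_per_cell)
    return score_si > t
```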
Step 1.1.3: solving the camera pose. Using the static matching point set {p_c, p_r} obtained in the previous step, the homography matrix H_cr and the fundamental matrix F_cr of the motion are computed respectively. The normalized-plane coordinates {x_c, x_r} are obtained from the matching points, and a direct linear transformation is established for the homography matrix H:

x_c = H_cr x_r
The homography matrix H is solved with a RANSAC-based normalized 4-point algorithm. For the fundamental matrix F, an equation is established from the epipolar geometric constraint:

x_c^T F_cr x_r = 0
based on RANSAC, the basis matrix is solved using a normalized 8-point algorithm. The RANSAC algorithm can eliminate outliers to a certain degree, and simultaneously calculates a matrix score R aiming at a homography matrix H and a basic matrix FH
Figure BDA0002781361860000065
SHAnd SFModel scores for homography matrix and basis matrix, respectively, if threshold RH>And 0.45, calculating the pose of the camera by using the H matrix through SVD, and otherwise, calculating the essential matrix E by using the F matrix, and then calculating the pose by using the SVD. Finally, checking the pose to obtain the optimal solution, and if the initialized pose does not meet the initialization requirement, taking the next frame and the first initialization key frame F1Repeating the above operations as an initial frame until a frame F at the maximum size m of the sliding windowmIf the initialization is still unsuccessful, the sliding window is moved backward as a whole, and the first frame of the sliding window is taken as the first initialization key frame F1And continuing the above operations until the initial pose is obtained.
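As a sketch of this dual-model selection (OpenCV's RANSAC estimators stand in for the normalized 4-point and 8-point solvers; the inlier-count scores below are a simplification of full symmetric-transfer-error scores, the 0.45 ratio follows the text, and the sketch assumes enough well-distributed matches for both estimators to succeed):

```python
import cv2
import numpy as np

def initialize_pose(pts_r, pts_c, K):
    """Choose between homography and fundamental matrix, then recover pose.

    pts_r, pts_c: Nx2 arrays of static matched points in the reference and
    current frames. K: 3x3 camera intrinsic matrix.
    """
    H, h_mask = cv2.findHomography(pts_r, pts_c, cv2.RANSAC, 3.0)
    F, f_mask = cv2.findFundamentalMat(pts_r, pts_c, cv2.FM_RANSAC, 3.0, 0.99)
    # Simplified model scores S_H, S_F from RANSAC inlier counts
    S_H = int(h_mask.sum()) if h_mask is not None else 0
    S_F = int(f_mask.sum()) if f_mask is not None else 0
    R_H = S_H / max(S_H + S_F, 1)
    if R_H > 0.45:
        # Decompose H into candidate (R, t) solutions; the pose check
        # described in the text would then pick the valid one.
        _, Rs, ts, _ = cv2.decomposeHomographyMat(H, K)
        return Rs, ts
    # Otherwise go through the essential matrix E = K^T F K
    E = K.T @ F @ K
    _, R, t, _ = cv2.recoverPose(E, pts_r, pts_c, K)
    return [R], [t]
```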
Step 1.1.4: triangulating the features to obtain the initial map. The normalized-plane coordinate pair {x_c, x_r} satisfies the geometric relations:

z_c x_c = T_cw P_w,  z_r x_r = T_rw P_w

where z_c and z_r are the Z-axis coordinates (i.e., the depth information) in the corresponding camera coordinate systems, T_cw and T_rw are the pose transformations from the world coordinate system to the current and reference key frames, and P_w is the corresponding 3D point coordinate. Left-multiplying each relation by the skew-symmetric matrix of its normalized point (whose cross product with the point itself is zero) gives:

x_c^∧ T_cw P_w = 0,  x_r^∧ T_rw P_w = 0

Rearranging into a single linear system:

[ x_c^∧ T_cw ; x_r^∧ T_rw ] P_w = 0

SVD is applied to this system to finally obtain the 3D point coordinate P_w. After the matching pairs {p_c, p_r} have been triangulated, the initial map is obtained and the initialization operation is complete.
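A minimal linear-triangulation sketch matching the system above; each view contributes two independent rows, and the SVD null vector gives the homogeneous P_w:

```python
import numpy as np

def triangulate_point(x_c, x_r, T_cw, T_rw):
    """Triangulate one 3D point from normalized-plane coordinates.

    x_c, x_r: normalized coordinates (u, v) in the current / reference frame.
    T_cw, T_rw: 3x4 pose matrices [R|t] from world to each camera.
    Returns P_w as a 3-vector.
    """
    A = np.vstack([
        x_c[0] * T_cw[2] - T_cw[0],   # u_c * (row 3) - (row 1)
        x_c[1] * T_cw[2] - T_cw[1],   # v_c * (row 3) - (row 2)
        x_r[0] * T_rw[2] - T_rw[0],
        x_r[1] * T_rw[2] - T_rw[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    P_h = Vt[-1]                      # homogeneous solution of A P = 0
    return P_h[:3] / P_h[3]
```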
Step 1.2: the initialization of the binocular/RGB-D types is relatively simple and can be divided into three stages: in the first stage, a sliding window selects the initial frames; in the second stage, GMS feature screening is completed; in the third stage, feature triangulation yields the initial map.
Step 1.2.1: this step is similar to step 1.1.1. The first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a second threshold (e.g., 500), the frame is determined to be the first initialization key frame F_1; otherwise the next frame image is read until an initialization key frame meeting the condition is obtained. A sliding window with minimum size n and maximum size m is established on the basis of the first initialization key frame F_1. Subsequent image frames are processed: the next image frame is read as the current frame; if its feature count is greater than the second threshold it is added to the sliding window, otherwise it is discarded. These operations are repeated: the n-th initialization key frame F_n brings the window to its minimum size n, and processing continues until the m-th key frame F_m is obtained, at which point the window reaches its maximum size m.
Step 1.2.2: GMS screens the matching point pairs. This step is identical to step 1.1.2: the matching point pairs between the initial frames F_1 and F_n are screened, wrong matches are deleted, and the static matching point set {p_c, p_r} is obtained.
Step 1.2.3: this stage completes feature triangulation. The RGB-D case is simpler: the depth information of the correct matching points in F_n is recovered by combining them with the depth map, and a static initial map is created. For the binocular case, binocular triangulation is performed on the correct matching points of F_n. For any static matching pair p_L(u_L, v_L) and p_R(u_R, v_R) between the left and right frames, the corresponding 3D point P is recovered from the stereo geometry:

z = f · b / d,  d = u_L − u_R

where z is the depth information of P, b is the stereo baseline, f is the camera focal length, and d is the disparity between the two images. Processing in this way finally yields the initial map information.
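The binocular depth recovery is then a one-line computation per static match (a sketch assuming rectified stereo; the focal length and baseline below are hypothetical calibration values):

```python
def stereo_depth(u_left, u_right, f, b):
    """Depth from rectified stereo: z = f * b / d, with disparity d = u_L - u_R."""
    d = u_left - u_right
    if d <= 0:
        return None                   # no valid disparity for this match
    return f * b / d

# Example with hypothetical calibration: f = 718.856 px, baseline b = 0.54 m
z = stereo_depth(400.0, 392.5, 718.856, 0.54)   # about 51.8 m
```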
Step 2: tracking and positioning of the SLAM system are realized in combination with GMS.
When the system has been initialized successfully, it enters the tracking thread with the initialization key frame F_1. When a new image frame is received, the camera pose is estimated using the reference frame tracking model or the constant-speed tracking model; the local key frames and local map points are then updated, the local map points are reprojected onto the current frame, a graph optimization model is established, and the pose is further optimized. Step 2 is mainly divided into two parts: tracking and local map creation, and loop detection and global optimization of the system.
Step 2.1: the tracking and mapping implementation in this step inherits the basic idea of ORB-SLAM. Image information entering the system is tracked using two different models: the reference frame tracking model and the constant-speed tracking model.
Step 2.1.1: the SLAM system first enters the constant-speed tracking model. The pose transformation of the previous frame is used as the initial pose ξ of the current frame, and the map points in the reference frame are projected to the current frame to form 3D-2D data associations; the relationship is:

ξ* = argmin_ξ (1/2) Σ_{i=1}^{n} || u_i − (1/s_i) K exp(ξ^∧) P_i ||²

where the Lie algebra element ξ^∧ represents the camera pose, u_i are the pixel coordinates of the observed points, s_i is the scale (depth) information, K is the camera intrinsic matrix, P_i are the 3D coordinates of the spatial points, and ξ* is the camera pose to be optimized. BA optimization then minimizes the reprojection error to refine the pose.
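The constant-speed (constant-velocity) prediction itself is a composition of the last inter-frame motion with the last pose; a sketch with 4×4 homogeneous world-to-camera transforms:

```python
import numpy as np

def predict_pose_const_velocity(T_prev, T_prev2):
    """Predict the current camera pose from the two previous poses.

    T_prev, T_prev2: 4x4 world-to-camera transforms of frames k-1 and k-2.
    Velocity is the last inter-frame motion: V = T_{k-1} * T_{k-2}^{-1}.
    """
    V = T_prev @ np.linalg.inv(T_prev2)   # last relative motion
    return V @ T_prev                      # initial pose guess for frame k
```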
Step 2.1.2: the 3D-2D matching point filtering and rejection check of the traditional ORB-SLAM system is adopted to obtain the final number of matches between map points and feature points. If the number of matches is greater than or equal to a third threshold (e.g., ≥ 10), jump directly to step 2.2; otherwise enter step 2.1.3 for reference frame tracking.
Step 2.1.3: the SLAM system enters the reference frame tracking module. The pose transformation of the previous frame is used as the initial pose of the current frame. Through feature matching between the current frame and the reference key frame, map points corresponding to feature points in the reference key frame are associated with the matching feature points of the current frame; GMS is adopted to match the feature points and to identify and eliminate dynamic feature point pairs, forming static 3D-2D data associations. The BoW bag-of-words method accelerates the matching, and the pose is finally solved by minimizing the reprojection error with BA optimization. The 3D-2D matching check of step 2.1.2 is then performed; if the number of matches is greater than or equal to the third threshold, jump directly to step 2.2, otherwise enter step 2.1.4 for relocation.
Step 2.1.4: the SLAM system enters the relocation model. When many moving objects are present in the scene, static feature points are insufficient, or the camera moves too fast, pose tracking is lost and the system enters the relocation module. Similar to ORB-SLAM2, candidate key frames having a co-visibility relationship with the current frame are computed via BoW. The current frame is then feature-matched against the candidate key frames using GMS grid-based motion statistics. When the number of matched feature points exceeds a fourth threshold (for example, 30), the pose of the current frame is estimated by combining the PnP and RANSAC algorithms, and the map points of the local map corresponding to the current frame are obtained with BA optimization. If the number of map points of the current frame is greater than a fifth threshold (e.g., 50), relocation succeeds and the system proceeds to step 2.2.
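A sketch of the relocation pose estimation with OpenCV's PnP-RANSAC solver; the thresholds of 30 matches and 50 map points follow the example values in the text, while the reprojection-error and iteration settings are assumptions of this sketch:

```python
import cv2
import numpy as np

def relocalize(pts3d, pts2d, K, min_matches=30, min_map_points=50):
    """Estimate the current-frame pose from 3D-2D matches with PnP + RANSAC.

    pts3d: Nx3 map points from the candidate key frame.
    pts2d: Nx2 matched pixel coordinates in the current frame.
    """
    if len(pts3d) <= min_matches:
        return None                          # not enough GMS matches
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts3d, np.float64), np.asarray(pts2d, np.float64),
        K, None, reprojectionError=5.0, iterationsCount=100)
    if not ok or inliers is None or len(inliers) <= min_map_points:
        return None                          # relocation failed
    return rvec, tvec                        # pose, to be refined by BA
```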
Step 2.2: after inter-frame tracking is completed, the local map is tracked and the tracking data is fused with the local map data, after which the system decides whether to generate a key frame.
Step 2.2.1: in the local map tracking model, a local map is established for tracking using the initial pose of the current frame. First, the set of key frames observing the map points of the current frame is taken as the first-level connected key frames; frames sharing a high co-visibility with these, together with parent and child key frames, jointly form the local key frames. The map points in the local key frames are then taken as local map points; those local map points observable from the current frame are reprojected onto the current frame, a graph optimization model with more observation edges is established, and the pose of the current frame is further optimized.
Step 2.2.2: creation of key frames. After local map tracking is completed, the current frame is judged to decide whether it should be packaged as a key frame, using the following preset conditions: the ratio of the number of feature points tracked by the current frame to the total number of feature points of the reference frame is less than a certain threshold; the number of key frames waiting to be processed in the local mapping thread is no more than 3; and at least 8 frames have elapsed since the last key frame was inserted. If the preset conditions are not met, the system returns to the tracking thread to track the next image frame; otherwise a key frame is created, and GMS feature matching between the current key frame and the reference key frame yields the static features of the current frame for triangulating and creating map points. The single-frame static point cloud map constructed from each key frame is denoted C_k.
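The key-frame decision reduces to a simple predicate over the stated conditions (the 3-key-frame queue bound and the 8-frame minimum interval follow the text; the tracking-ratio threshold of 0.9 is a hypothetical value):

```python
def should_create_keyframe(tracked_feats, ref_total_feats, frames_since_kf,
                           kf_queue_len, ratio_thresh=0.9):
    """Decide whether the current frame becomes a key frame."""
    weak_tracking = tracked_feats < ratio_thresh * ref_total_feats
    mapping_idle = kf_queue_len <= 3        # local mapping not overloaded
    enough_gap = frames_since_kf >= 8       # minimum interval of 8 frames
    return weak_tracking and mapping_idle and enough_gap
```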
Step 3: loop detection and global optimization are carried out, and a static point cloud map is constructed from the RGB-D images.
Step 3.1: loop detection and global optimization. For a new key frame, the co-visibility relationships with all key frames are still computed, and key frames that have a co-visibility relationship but are not directly connected are taken as candidate key frames. The GMS algorithm performs feature matching between the candidate key frames and the current key frame, and the pose is solved and optimized by combining the PnP and RANSAC algorithms (7 degrees of freedom for the monocular camera, including scale; 6 degrees of freedom for the binocular and RGB-D cameras). After the closed-loop key frame is determined, the poses of all key frames in the local map are optimized using the pose graph model. Finally, all key frames and map points of the whole map are optimized with the BA algorithm.
Step 3.2: static map construction. This step targets RGB-D data and is divided into two stages: first, the features are judged and screened; second, the depth maps are aggregated to complete the fusion of the 3D points.
Step 3.2.1: motion detection is performed on key frames. The reference key frame KF_ref and the current key frame KF_cur are input into a trained YOLO v3 network to segment the potential motion region S_ref of the reference key frame and S_cur of the current key frame. Let p_r be a feature point in the reference key frame and p_c the pixel point obtained by projecting it into the current frame, with depth Z at the projected point in the corresponding depth map. The projected depth value Z_proj can be computed from the camera model, giving the depth difference:

ΔZ = Z_proj − Z

At the same time, the angle α between the viewing rays from p_r and p_c to the corresponding spatial point p_w is computed. When α < 30° and ΔZ > T_z, where T_z is a depth threshold generally set to 1, the point p_c is judged to be a dynamic point and is filtered out of the current frame. All points of KF_cur are processed in this way to finally obtain the static feature point set.
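A sketch of the per-point dynamic check combining both criteria (α < 30° and ΔZ > T_z); the 4×4 world-to-camera transforms are assumed inputs of this sketch:

```python
import numpy as np

def is_dynamic_point(p_w, T_cw, T_rw, z_measured, T_z=1.0, max_angle_deg=30.0):
    """Flag a point as dynamic from the depth difference and viewing-ray angle.

    p_w: 3D point in world coordinates.
    T_cw, T_rw: 4x4 world-to-camera transforms of the current / reference key frame.
    z_measured: depth Z read from the current depth map at the projection p_c.
    """
    # Projected depth of p_w in the current camera (camera-model prediction)
    z_proj = (T_cw @ np.append(p_w, 1.0))[2]
    delta_z = z_proj - z_measured
    # Angle between the viewing rays from both camera centers to p_w
    c_cur = -np.linalg.inv(T_cw[:3, :3]) @ T_cw[:3, 3]   # current camera center
    c_ref = -np.linalg.inv(T_rw[:3, :3]) @ T_rw[:3, 3]   # reference camera center
    v1, v2 = p_w - c_ref, p_w - c_cur
    cos_a = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    alpha = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
    return alpha < max_angle_deg and delta_z > T_z
```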
Step 3.2.2: the single-frame static point cloud map constructed from each key frame is C_k, with pose T_k relative to the world coordinate system. The point clouds of all frames are transformed into the world coordinate system to construct the global static point cloud map W:

W = ∪_{k=1}^{N} T_k C_k

where N, the total number of key frames, is the number of key frames stitched into the global map. This completes the static point cloud map.
Step 4: the pose trajectory is output after data processing is finished, and the static point cloud map is output according to the RGB-D images.
The dynamic visual SLAM method based on GMS and motion detection provided by the embodiment of the invention achieves at least the following technical effects:
(1) By combining GMS with motion detection, the method effectively eliminates the influence of dynamic features and integrates this capability into each functional module of the SLAM system; it solves visual SLAM positioning and mapping in dynamic scenes, with good real-time performance and high positioning accuracy.
(2) Whereas traditional feature-based SLAM systems mostly form sparse maps, the method adds RGB-D dense mapping; the resulting dense map reflects more environmental characteristics and detail than a sparse map.
(3) The method adds a sliding-window mechanism in the initialization process to build a static initial map free of dynamic-object feature points. Because the time interval and motion amplitude between adjacent key frames are small, dynamic feature points are difficult to identify in the prior art; matching key frames separated by a certain time interval via the sliding window distinguishes and removes dynamic points better, yields more thoroughly static feature points, and improves the quality of the initial map points.
(4) GMS is added to the initialization process; since GMS can judge dynamic feature points on the basis of matched feature point pairs, motion information is better eliminated.
(5) The invention segments potential motion regions with a YOLO v3 network and uses the projected point depth and the viewing-ray angle to judge whether a region is dynamic; if so, the feature points of the corresponding region are removed, finally obtaining the static feature points and better eliminating motion information.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to examples, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, all of which should be covered by the claims of the present invention.

Claims (10)

1. A GMS and motion detection based dynamic visual SLAM method, comprising the steps of:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
step 4, outputting a pose track after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
2. The GMS and motion detection based dynamic visual SLAM method according to claim 1, wherein the initialization operation for monocular image correspondence in step 1 comprises: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; solving the initial frame camera pose; and obtaining an initial map through feature triangulation.
3. The GMS and motion detection based dynamic visual SLAM method of claim 2, wherein the initialization key frames are selected with the sliding window as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a first threshold, the first frame image is determined to be the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until the threshold requirement is met and the first initialization key frame is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the first threshold it is added to the sliding window, otherwise it is discarded; the above operations are repeated, so that the n-th initialization key frame F_n brings the window to its minimum size n, and processing of added key frames continues until the m-th key frame F_m is obtained, at which point the window reaches its maximum size m.
4. The GMS and motion detection based dynamic vision SLAM method according to claim 1, wherein the initialization operation corresponding to the binocular image and the RGB-D image in step 1 comprises: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; and obtaining an initial map through feature triangulation.
5. The GMS and motion detection based dynamic visual SLAM method of claim 4, wherein the initialization key frames are selected with the sliding window as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a second threshold, the first frame image is determined to be the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until a key frame meeting the threshold requirement is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the second threshold it is added to the sliding window, otherwise it is discarded; the above operations are repeated, so that the n-th initialization key frame F_n brings the window to its minimum size n, and processing of added key frames continues until the m-th key frame F_m is obtained, at which point the window reaches its maximum size m.
6. The GMS and motion detection based dynamic visual SLAM method of claim 1, wherein said step 2 comprises:
2.1, tracking any image information of monocular, binocular or RGB-D type in an SLAM system;
2.2, tracking the local map, completing the fused tracking of the tracking data and the local map data, and deciding whether to generate a key frame;
wherein, the step 2.1 specifically comprises the following substeps:
step 2.1.1, the SLAM system enters a constant-speed tracking model, the pose transformation of the previous frame is used as the initial pose of the current frame, and map points in the reference frame are projected to the current frame to complete 3D-2D data association;
step 2.1.2, a 3D-2D matching check is performed to obtain the final number of matches between map points and feature points; if the number of matches is greater than or equal to a third threshold, jump directly to step 2.2; otherwise enter step 2.1.3 for reference frame tracking;
step 2.1.3, the SLAM system enters a reference frame tracking model; the pose transformation of the previous frame is used as the initial pose of the current frame; through feature matching between the current frame and the reference key frame, map points corresponding to feature points in the reference key frame are associated with the matching feature points of the current frame; GMS is adopted to match the feature points and to identify and eliminate dynamic feature point pairs, forming static 3D-2D data associations; the BoW bag-of-words method is used to accelerate matching, and the pose is finally solved by minimizing the reprojection error with BA optimization; a 3D-2D matching check is then performed to obtain the final number of matches between map points and feature points; if the number of matches is greater than or equal to the third threshold, jump directly to step 2.2, otherwise enter step 2.1.4 for relocation;
step 2.1.4, the SLAM system enters a relocation model; candidate key frames having a co-visibility relationship with the current frame are computed via BoW; the current frame is feature-matched against the candidate key frames using GMS grid-based motion statistics; when the number of matched feature points exceeds a fourth threshold, the pose of the current frame is estimated by combining the PnP and RANSAC algorithms, and the map points of the local map corresponding to the current frame are obtained with the BA optimization algorithm; if the number of map points of the current frame is greater than a fifth threshold, the relocation succeeds.
7. The GMS and motion detection based dynamic visual SLAM method according to claim 6, characterized in that said step 2.2 comprises in particular the following sub-steps:
2.2.1, the SLAM system enters a local map tracking model, and a local map is established for tracking by utilizing the initial pose of the current frame;
and 2.2.2, after the local map tracking is completed, judging the current frame, creating a key frame, performing GMS feature matching on the current key frame and a reference key frame to obtain the static feature of the current frame, and triangularizing to create map points.
8. The GMS and motion detection based dynamic visual SLAM method of claim 1, wherein the constructing of the static point cloud map in step 3 comprises: judging and screening the features; and fusing the 3D points by aggregating the depth maps.
9. The GMS and motion detection based dynamic visual SLAM method according to claim 8, wherein in step 3 the feature judgment and screening are implemented as follows:
motion detection is performed on the key frames: the reference key frame and the current key frame are input into a trained YOLO v3 network to segment the potential motion region S_ref of the reference frame and the potential motion region S_cur of the current frame; p_r is a feature point in the reference key frame and p_c is the pixel point obtained by projecting it into the current key frame, the depth of the projected point p_c in the corresponding depth map being Z; the projected depth value Z_proj corresponding to p_c is calculated according to the camera model, giving the depth difference:

ΔZ = Z_proj − Z

the angle α between the viewing rays from p_r and p_c to the corresponding spatial point p_w is calculated; when α and ΔZ meet the preset conditions, the pixel point p_c is judged to be a dynamic point and is screened out of the current frame; all points in the current key frame are processed according to this method to finally obtain the static feature point set.
10. The GMS and motion detection based dynamic visual SLAM method according to claim 8, wherein in step 3 the fusion of the 3D points from the aggregated depth maps is implemented as follows:
the single-frame static point cloud map constructed from each key frame is denoted C_k, and the camera pose relative to the world coordinate system at the moment corresponding to that key frame is denoted T_k; the point clouds of all key frames are transformed into the world coordinate system to construct the global static point cloud map W, expressed as:

W = ∪_{k=1}^{N} T_k C_k

where N represents the total number of key frames.
CN202011282866.8A 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method Active CN112418288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011282866.8A CN112418288B (en) 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011282866.8A CN112418288B (en) 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method

Publications (2)

Publication Number Publication Date
CN112418288A true CN112418288A (en) 2021-02-26
CN112418288B CN112418288B (en) 2023-02-03

Family

ID=74831325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011282866.8A Active CN112418288B (en) 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method

Country Status (1)

Country Link
CN (1) CN112418288B (en)



Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056643A (en) * 2016-04-27 2016-10-26 武汉大学 Point cloud based indoor dynamic scene SLAM (Simultaneous Location and Mapping) method and system
CN107392964A (en) * 2017-07-07 2017-11-24 武汉大学 The indoor SLAM methods combined based on indoor characteristic point and structure lines
CN107871327A (en) * 2017-10-23 2018-04-03 武汉大学 The monocular camera pose estimation of feature based dotted line and optimization method and system
CN107917710A (en) * 2017-11-08 2018-04-17 武汉大学 A kind of positioning in real time of the interior based on single line laser and three-dimensional map construction method
CN109558790A (en) * 2018-10-09 2019-04-02 中国电子科技集团公司电子科学研究院 A kind of pedestrian target detection method, apparatus and system
CN109974743A (en) * 2019-03-14 2019-07-05 中山大学 A kind of RGB-D visual odometry optimized based on GMS characteristic matching and sliding window pose figure
CN110009732A (en) * 2019-04-11 2019-07-12 司岚光电科技(苏州)有限公司 Based on GMS characteristic matching towards complicated large scale scene three-dimensional reconstruction method
CN110807377A (en) * 2019-10-17 2020-02-18 浙江大华技术股份有限公司 Target tracking and intrusion detection method, device and storage medium
CN111161318A (en) * 2019-12-30 2020-05-15 广东工业大学 Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching
CN111708042A (en) * 2020-05-09 2020-09-25 汕头大学 Robot method and system for pedestrian trajectory prediction and following
CN111797688A (en) * 2020-06-02 2020-10-20 武汉大学 Visual SLAM method based on optical flow and semantic segmentation
CN111833358A (en) * 2020-06-26 2020-10-27 中国人民解放军32802部队 Semantic segmentation method and system based on 3D-YOLO

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Guihua Liu, Weilin Zeng, Bo Feng and Feng Xu, "DMS-SLAM: A General Visual SLAM System for Dynamic Scenes with Multiple Sensors", Sensors *
Junhao Cheng, Zhi Wang, Hongyan Zhou, Li Li and Jian Yao, "DM-SLAM: A Feature-Based SLAM System for Rigid Dynamic Scenes", IJGI *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282088A (en) * 2021-05-21 2021-08-20 潍柴动力股份有限公司 Unmanned driving method, device and equipment of engineering vehicle, storage medium and engineering vehicle
CN113506325A (en) * 2021-07-15 2021-10-15 清华大学 Image processing method and device, electronic equipment and storage medium
CN113506325B (en) * 2021-07-15 2024-04-12 清华大学 Image processing method and device, electronic equipment and storage medium
CN113781574A (en) * 2021-07-19 2021-12-10 长春理工大学 Method for removing dynamic points of binocular catadioptric panoramic system
CN113781574B (en) * 2021-07-19 2024-04-12 长春理工大学 Dynamic point removing method for binocular refraction and reflection panoramic system
CN115830110A (en) * 2022-10-26 2023-03-21 北京城市网邻信息技术有限公司 Instant positioning and map construction method and device, terminal equipment and storage medium
CN115830110B (en) * 2022-10-26 2024-01-02 北京城市网邻信息技术有限公司 Instant positioning and map construction method and device, terminal equipment and storage medium
CN115567658A (en) * 2022-12-05 2023-01-03 泉州艾奇科技有限公司 Method and device for keeping image not deflecting and visual earpick
CN116299500A (en) * 2022-12-14 2023-06-23 江苏集萃清联智控科技有限公司 Laser SLAM positioning method and device integrating target detection and tracking
CN116299500B (en) * 2022-12-14 2024-03-15 江苏集萃清联智控科技有限公司 Laser SLAM positioning method and device integrating target detection and tracking

Also Published As

Publication number Publication date
CN112418288B (en) 2023-02-03

Similar Documents

Publication Publication Date Title
CN112418288B (en) GMS and motion detection-based dynamic vision SLAM method
CN111968129B (en) Instant positioning and map construction system and method with semantic perception
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN109544636B (en) Rapid monocular vision odometer navigation positioning method integrating feature point method and direct method
CN107025668B (en) Design method of visual odometer based on depth camera
CN108682027A (en) VSLAM realization method and systems based on point, line Fusion Features
CN109509230A (en) A kind of SLAM method applied to more camera lens combined type panorama cameras
CN108537848B (en) Two-stage pose optimization estimation method for indoor scene reconstruction
CN106920259B (en) positioning method and system
CN112785702A (en) SLAM method based on tight coupling of 2D laser radar and binocular camera
CN110807809B (en) Light-weight monocular vision positioning method based on point-line characteristics and depth filter
CN110009732B (en) GMS feature matching-based three-dimensional reconstruction method for complex large-scale scene
Tang et al. ESTHER: Joint camera self-calibration and automatic radial distortion correction from tracking of walking humans
US11367195B2 (en) Image segmentation method, image segmentation apparatus, image segmentation device
CN113108771B (en) Movement pose estimation method based on closed-loop direct sparse visual odometer
CN110766024B (en) Deep learning-based visual odometer feature point extraction method and visual odometer
CN110827353B (en) Robot positioning method based on monocular camera assistance
US11880964B2 (en) Light field based reflection removal
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
CN110599545A (en) Feature-based dense map construction system
CN112767546B (en) Binocular image-based visual map generation method for mobile robot
CN116449384A (en) Radar inertial tight coupling positioning mapping method based on solid-state laser radar
CN107330980A (en) A kind of virtual furnishings arrangement system based on no marks thing
CN113658337A (en) Multi-mode odometer method based on rut lines

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant