CN112418288B - GMS and motion detection-based dynamic vision SLAM method - Google Patents


Info

Publication number
CN112418288B
Authority
CN
China
Prior art keywords
frame
key frame
map
gms
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011282866.8A
Other languages
Chinese (zh)
Other versions
CN112418288A (en)
Inventor
姚剑 (Yao Jian)
卓德胜 (Zhuo Desheng)
程军豪 (Cheng Junhao)
龚烨 (Gong Ye)
涂静敏 (Tu Jingmin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University (WHU)
Priority to CN202011282866.8A
Publication of CN112418288A
Application granted
Publication of CN112418288B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 Geographic models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Remote Sensing (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of visual spatial positioning, and discloses a dynamic visual SLAM method based on GMS (Grid-based Motion Statistics) and motion detection, which comprises the following steps: initializing the SLAM system by combining GMS and a sliding window to obtain an initial map; realizing tracking and positioning of the SLAM system in combination with GMS; carrying out loop detection and global optimization; constructing a static point cloud map according to the RGB-D images; and, after data processing is finished, outputting the pose trajectory and the static point cloud map built from the RGB-D images. The invention solves the problem of poor SLAM tracking and positioning performance in dynamic environments in the prior art. The method effectively eliminates the influence of dynamic features, integrates this capability into each functional module of the SLAM system, solves the problems of visual SLAM positioning and mapping in dynamic scenes, and offers good real-time performance and high positioning accuracy.

Description

GMS and motion detection-based dynamic vision SLAM method
Technical Field
The invention relates to the technical field of visual space positioning, in particular to a dynamic visual SLAM method based on GMS and motion detection.
Background
Simultaneous localization and mapping (SLAM) is a core technology for realizing the functions of intelligent mobile robots. SLAM incrementally and dynamically constructs a map model of the current environment from the data stream acquired by a laser or visual sensor, and performs positioning during the map construction.
However, neither the feature-point method nor the direct method can handle the problems caused by dynamic objects, which are common in real scenes. Most current visual SLAM methods still rest on the assumption that the observed environment is static: moving objects in the environment create many erroneous data associations, or cause previously tracked map points to be lost through occlusion, which severely degrades the camera pose estimation, corrupts the positioning of the whole visual SLAM system, and affects the robot's subsequent tasks. Studying the positioning accuracy and mapping quality of visual SLAM in dynamic environments is therefore of great significance.
In recent years, many scholars have begun to study the visual SLAM problem in dynamic scenes. For example, the main advantage of methods based on foreground-background initialization is the ability to track temporarily stopped dynamic objects, but they require predefined information about the background or the objects and are less effective when the environment contains many moving objects. Methods based on geometric constraints have a clear mathematical theory and low computational cost, but struggle to distinguish residuals caused by moving objects from residuals caused by mismatches, and require the initial camera pose to be known in advance. Optical-flow methods can process rich information but, resting on the brightness-constancy assumption, are sensitive to illumination changes. Deep-learning-based methods have the best overall effect, but parameter tuning is complex and real-time performance cannot be guaranteed.
Disclosure of Invention
The invention provides a dynamic visual SLAM method based on GMS and motion detection, and solves the problem of poor SLAM tracking and positioning effects in a dynamic environment in the prior art.
The invention provides a dynamic vision SLAM method based on GMS and motion detection, which comprises the following steps:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
step 4, outputting a pose track after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
Preferably, in step 1, the initializing operation corresponding to the monocular image includes: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; solving the initial frame camera pose; and obtaining an initial map through feature triangulation.
Preferably, the selection of the initialization key frame using a sliding window is implemented as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a first threshold value, the image is taken as the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until the threshold requirement is met and the first initialization key frame is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the first threshold value it is added to the sliding window, otherwise it is discarded; the above operations are repeated until the nth initialization key frame F_n is added and the window reaches the minimum size n, and key frames continue to be added until the mth key frame F_m is obtained, at which point the window reaches the maximum size m.
Preferably, in step 1, the initialization operation corresponding to the binocular image and the RGB-D image includes: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; and obtaining an initial map through feature triangulation.
Preferably, the selection of the initialization key frame using a sliding window is implemented as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a second threshold value, the image is taken as the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until the threshold requirement is met and the key frame is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the second threshold value it is added to the sliding window, otherwise it is discarded; the above operations are repeated until the nth initialization key frame F_n is added and the window reaches the minimum size n, and key frames continue to be added until the mth key frame F_m is obtained, at which point the window reaches the maximum size m.
Preferably, the step 2 comprises:
2.1, tracking any image information of monocular, binocular or RGB-D type in an SLAM system;
2.2, tracking the local map, completing fused tracking of the tracking data and the local map data, and deciding whether to create a key frame;
wherein, the step 2.1 specifically comprises the following substeps:
step 2.1.1, the SLAM system enters a constant-velocity tracking model, the pose transformation of the previous frame is used as the initial pose of the current frame, and map points in the reference frame are projected into the current frame to complete the 3D-2D data association;
step 2.1.2, a 3D-2D matching check is performed to obtain the final number of matches between map points and feature points; if the number of matches is greater than or equal to a third threshold value, step 2.2 is jumped to directly; otherwise step 2.1.3 is entered to track the reference frame;
step 2.1.3, the SLAM system enters a reference-frame tracking model; the pose transformation of the previous frame is used as the initial pose of the current frame, and through feature matching between the current frame and the reference key frame, map points corresponding to feature points in the reference key frame are associated to the matching feature points of the current frame; GMS is adopted for the feature matching to identify and eliminate dynamic feature point pairs, forming static 3D-2D data associations; the BoW (bag-of-words) method is used to accelerate matching, and finally the pose is solved by BA optimization minimizing the reprojection error; a 3D-2D matching check is performed to obtain the final number of matches between map points and feature points, and if it is greater than or equal to the third threshold value, step 2.2 is jumped to directly, otherwise step 2.1.4 is entered for relocalization;
step 2.1.4, the SLAM system enters a relocalization model; candidate key frames having a co-visibility relation with the current frame are calculated through BoW, and feature matching between the current frame and the candidate key frames is carried out using GMS grid motion statistics; when the number of matched feature points is greater than a fourth threshold value, the pose of the current frame is estimated by combining the PnP algorithm and the RANSAC algorithm, and map points of the current frame corresponding to the local map are obtained using the BA optimization algorithm; if the number of map points of the current frame is greater than a fifth threshold value, the relocalization is successful.
Preferably, the step 2.2 specifically comprises the following substeps:
2.2.1, the SLAM system enters a local map tracking model, and a local map is established for tracking by utilizing the initial pose of the current frame;
and 2.2.2, after the local map tracking is finished, the current frame is evaluated and a key frame is created when warranted; GMS feature matching is performed between the current key frame and a reference key frame to obtain the static features of the current frame, which are triangulated to create map points.
Preferably, the constructing of the static point cloud map in step 3 comprises: judging and screening the features; and aggregating the depth maps to complete the fusion of the 3D points.
Preferably, in step 3, the judging and screening of the features is implemented as follows:
motion detection is performed on the key frames: the reference key frame and the current key frame are input into a trained YOLO v3 network to segment the potential motion region S_ref of the reference frame and the potential motion region S_cur of the current frame; p_r is a feature point in the reference key frame and p_c the pixel it projects to in the current key frame; the depth information of the projected point p_c in the corresponding depth map is Z, and the projected depth value Z_proj corresponding to p_c is calculated according to the camera model, giving the depth difference:

ΔZ = Z_proj − Z

The angle α between the viewing rays from p_r and p_c to the corresponding spatial point p_w is calculated; when α and ΔZ meet the preset conditions, the pixel p_c is judged to be a dynamic point and is screened out of the current frame; all points in the current key frame are processed according to this method, finally obtaining the static feature point set.
Preferably, in step 3, the fusion of the 3D points from the aggregated depth maps is implemented as follows:
the single-frame static point cloud map constructed from each key frame is denoted C_k, and the pose of the camera relative to the world coordinate system at the moment corresponding to the key frame is denoted T_k; the point clouds of all key frames are converted into the world coordinate system to construct the global static point cloud map W, expressed as:

W = ⋃_{k=1}^{N} T_k · C_k

where N represents the total number of key frames in the set.
One or more technical schemes provided by the invention at least have the following technical effects or advantages:
in the method, firstly, an SLAM system is initialized by combining GMS and a sliding window to obtain an initial map; then, tracking and positioning of the SLAM system are realized by combining GMS; then, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image; finally, outputting a pose track after data processing is finished; and outputting a static point cloud map according to the RGB-D image. The visual SLAM method provided by the invention combines GMS (grid motion statistics) and motion detection, and compared with the existing traditional processing method, the visual SLAM method can well eliminate the influence of dynamic characteristics and integrate the dynamic characteristics into each functional module of an SLAM system, solves the problems of visual SLAM positioning and drawing in a dynamic scene, and has good real-time performance and higher positioning precision. In addition, compared with the traditional characteristic-based SLAM system which forms mostly sparse maps, the method adds static map threads based on RGB-D data types, and the formed dense maps can embody more environmental characteristics and more environmental detail information compared with the sparse maps.
Drawings
Fig. 1 is an overall flowchart of a dynamic visual SLAM method based on GMS and motion detection according to an embodiment of the present invention.
Detailed Description
In order to better understand the technical solution, it is described in detail below with reference to the drawings and specific embodiments.
The embodiment provides a dynamic visual SLAM method based on GMS and motion detection, which comprises the following steps:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
step 4, outputting a pose track after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
The present invention is further described below.
The embodiment provides a dynamic visual SLAM method based on GMS and motion detection, which specifically includes the following steps, with reference to fig. 1:
step 1: and initializing the SLAM system by combining the GMS and the sliding window to obtain an initial map.
Namely, the GMS and the sliding window are combined to carry out initialization operation on the SLAM system, and dynamic characteristic influence in an operation scene is eliminated.
Corresponding initialization operations are completed for the different data types (monocular, binocular, and RGB-D): monocular initialization is step 1.1, while initialization of the binocular and RGB-D types jumps directly to step 1.2.
Step 1.1: monocular initialization is mainly divided into four stages, wherein in the first stage, a sliding window is utilized to select an initial frame; in the second stage, GMS is adopted to match the feature points; in the third stage, solving the initial frame camera pose; and triangularizing the features at the fourth stage to obtain an initial map.
Step 1.1.1: Initialization key frames are selected with a sliding window. The first frame image is read, its feature points are extracted and counted; if the number of feature points exceeds a first threshold (e.g. 100), the frame becomes the first initialization key frame F_1; otherwise the next frame is read until a qualifying initialization key frame is obtained. A sliding window is established on the basis of F_1, with minimum window size n and maximum window size m. The purpose of the sliding window is to obtain an initial key frame F_1 and a key frame F_n or F_m separated by a certain time interval, so that feature matching with the robust GMS algorithm can reject dynamic features and yield relatively pure feature points; this achieves a more thorough rejection than applying GMS to dynamic mismatches between adjacent key frames. Subsequent image frames are then processed: the next image frame is read as the current frame, and if its feature count exceeds the first threshold it is added to the sliding window, otherwise it is discarded. This is repeated until the nth initialization key frame F_n is added and the window reaches its minimum size n, and processing continues until the mth key frame F_m is obtained, at which point the window reaches its maximum size m.
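As an illustrative sketch only (not the patented implementation), the sliding-window key-frame selection of step 1.1.1 might look as follows in Python, assuming OpenCV's ORB extractor and treating the threshold and window sizes as example values:

```python
import cv2

def select_init_keyframes(frames, min_feats=100, n=3, m=8):
    """Sliding-window selection of initialization key frames (sketch of step 1.1.1).

    frames    : iterable of grayscale images
    min_feats : first threshold on the feature count (100 is the example value)
    n, m      : minimum and maximum sliding-window sizes (illustrative values)
    """
    orb = cv2.ORB_create(nfeatures=1000)
    window = []  # the sliding window of (image, keypoints, descriptors)
    for img in frames:
        kps, des = orb.detectAndCompute(img, None)
        if des is None or len(kps) <= min_feats:
            continue  # ignore frames with too few features
        window.append((img, kps, des))
        if len(window) >= m:
            break  # window reached its maximum size m
    # F_1 is window[0]; F_n is window[n-1] once the minimum size is reached
    return window
```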
Step 1.1.2: Mismatched point pairs on dynamic objects are rejected with the GMS matching algorithm. Feature matching is performed on the features extracted from the initial frames F_1 and F_n of step 1.1.1 to obtain a matching point set, and each image is divided into G grid cells (typically 20 × 20). For each feature point X_i of the first initialization key frame F_1, the grid cell of F_n that receives the most of its candidate matches is taken as the matching cell; a 3 × 3 grid neighborhood is built with the feature-point cell and the matching cell as centers, and the corresponding matching point sets in this grid neighborhood are denoted {a_k, b_k}, k = 1, 2, ..., K. From these corresponding point sets a composite feature-matching score S_i is computed by the following formula:
S_i = Σ_{k=1}^{K} |χ_{a_k b_k}|

where K = 9 and |χ_{a_k b_k}| is the number of matching points in the neighborhood cell pair (a_k, b_k). The composite score S_i is compared with a threshold t to judge whether a feature match is correct. The threshold t is set as:
t = α · √n

where α is set to 0.6 and n is the average number of feature points in each of the K (i.e. 9) neighborhood grid cells. When the score satisfies S_i > t, the feature point is judged a correct matching point; otherwise it is judged an incorrect matching point. Matching point pairs between the initial key frames are pruned by this rule, yielding the static matching point set {p_c, p_r}.
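A minimal sketch of the GMS scoring rule above, assuming the matches have already been binned into the 3 × 3 grid-cell-pair neighborhood (the grid bookkeeping of a full GMS implementation is omitted):

```python
import math

def gms_accept(cell_pair_counts, avg_feats_per_cell, alpha=0.6):
    """Decide whether a match is correct from its 3x3 grid neighborhood (sketch of step 1.1.2).

    cell_pair_counts   : list of 9 ints, |chi_{a_k b_k}| for each neighborhood cell pair
    avg_feats_per_cell : average number of features per neighborhood cell (n in the text)
    alpha              : scale factor of the threshold (0.6 in the patent text)
    """
    s_i = sum(cell_pair_counts)                 # composite score S_i = sum_k |chi_{a_k b_k}|
    t = alpha * math.sqrt(avg_feats_per_cell)   # threshold t = alpha * sqrt(n)
    return s_i > t                              # S_i > t  ->  correct match
```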
Step 1.1.3: Solving the camera pose. Using the static matching point set {p_c, p_r} obtained in the previous step, the homography matrix H_cr and the fundamental matrix F_cr of the motion are computed in parallel. The normalized-plane coordinates {x_c, x_r} are computed from the matching points, and for the homography matrix H a direct linear transformation system is established:

x_c = H_cr · x_r

The homography matrix H is solved with a RANSAC-based normalized 4-point algorithm. For the fundamental matrix F, an equation is established from the epipolar geometric constraint:

x_c^T · F_cr · x_r = 0

and the fundamental matrix is solved with a RANSAC-based normalized 8-point algorithm. RANSAC rejects outliers to a certain degree; meanwhile a score ratio R_H is computed from the homography matrix H and the fundamental matrix F:

R_H = S_H / (S_H + S_F)

where S_H and S_F are the model scores of the homography matrix and the fundamental matrix respectively. If R_H > 0.45, the camera pose is computed from the H matrix via SVD; otherwise the essential matrix E is computed from the F matrix and the pose is then recovered via SVD. Finally the pose is checked to obtain the optimal solution. If the initialization pose does not meet the requirements, the next frame and the first initialization key frame F_1 are taken as the initial pair and the above operations are repeated, up to the frame F_m at the maximum window size m; if initialization is still unsuccessful, the sliding window is shifted backwards as a whole, its first frame is taken as the new first initialization key frame F_1, and the above operations continue until an initial pose is obtained.
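An illustrative sketch of this model selection using OpenCV's RANSAC estimators; the inlier counts stand in for the model scores S_H and S_F (an assumption, since the patent does not define the scores), and error handling is reduced to a bare minimum:

```python
import cv2
import numpy as np

def initial_pose(pts_r, pts_c, K, rh_thresh=0.45):
    """Choose between homography and fundamental matrix for the initial pose (sketch of step 1.1.3).

    pts_r, pts_c : Nx2 float32 arrays of matched static points {p_r, p_c}
    K            : 3x3 camera intrinsic matrix
    """
    H, h_mask = cv2.findHomography(pts_r, pts_c, cv2.RANSAC, 3.0)
    F, f_mask = cv2.findFundamentalMat(pts_r, pts_c, cv2.FM_RANSAC, 3.0, 0.99)
    if h_mask is None or f_mask is None:
        return None                   # estimation failed; caller retries with another frame pair
    s_h, s_f = int(h_mask.sum()), int(f_mask.sum())  # inlier counts stand in for S_H, S_F
    r_h = s_h / (s_h + s_f)           # R_H = S_H / (S_H + S_F)
    if r_h > rh_thresh:
        # planar or low-parallax scene: decompose H into candidate poses
        _, Rs, ts, _ = cv2.decomposeHomographyMat(H, K)
        return Rs[0], ts[0]           # the patent checks all candidates and keeps the best
    E = K.T @ F @ K                   # essential matrix from the fundamental matrix
    _, R, t, _ = cv2.recoverPose(E, pts_r, pts_c, K)
    return R, t
```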
Step 1.1.4: Feature triangulation to obtain the initial map. The normalized-plane coordinate pair {x_c, x_r} satisfies the geometric relations:

z_c · x_c = T_cw · P_w,    z_r · x_r = T_rw · P_w

where z_c, z_r are the Z-axis coordinates (i.e. depth information) in the corresponding camera coordinate systems, T_cw and T_rw are the pose transformations from the world coordinate system to the current key frame and the reference key frame, and P_w is the corresponding 3D point coordinate. Taking the cross product of each equation with its normalized-plane point eliminates the depth and gives:

x_c^∧ · T_cw · P_w = 0,    x_r^∧ · T_rw · P_w = 0

which can be arranged into a single homogeneous linear system:

A · P_w = 0,    A = [ x_c^∧ T_cw ; x_r^∧ T_rw ]

Performing SVD on this system finally yields the 3D point coordinate P_w. Once the triangulation of {p_c, p_r} is completed, the initial map is obtained and the initialization operation is done.
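A sketch of the DLT triangulation just described, with T_cw and T_rw assumed to be 3 × 4 world-to-camera pose matrices acting on homogeneous world points:

```python
import numpy as np

def skew(x):
    """Cross-product (hat) matrix x^ of a 3-vector."""
    return np.array([[0, -x[2], x[1]],
                     [x[2], 0, -x[0]],
                     [-x[1], x[0], 0]])

def triangulate(x_c, x_r, T_cw, T_rw):
    """DLT triangulation of one point pair (sketch of step 1.1.4).

    x_c, x_r   : normalized-plane coordinates (3-vectors with z = 1)
    T_cw, T_rw : 3x4 world-to-camera pose matrices
    """
    # x^ (T P_w) = 0 for both views; stack into A P_w = 0
    A = np.vstack([skew(x_c) @ T_cw,
                   skew(x_r) @ T_rw])
    _, _, Vt = np.linalg.svd(A)
    P_h = Vt[-1]                 # null vector = homogeneous 3D point
    return P_h[:3] / P_h[3]      # dehomogenize to P_w
```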
Step 1.2: the initialization of binocular/RGB-D is relatively simple and can be divided into three stages, wherein in the first stage, an initial frame is selected by using a sliding window; in the second stage, GMS feature screening is completed; and the third stage is used for completing feature triangulation to obtain an initial map.
Step 1.2.1: This step is similar to step 1.1.1. The first frame image is read, its feature points are extracted and counted; if the count exceeds a second threshold (e.g. 500), the frame becomes the first initialization key frame F_1; otherwise the next frame is read until a qualifying initialization key frame is obtained. A sliding window is established on the basis of F_1, with minimum window size n and maximum window size m. Subsequent image frames are processed: the next frame is read as the current frame and added to the sliding window if its feature count exceeds the second threshold, otherwise it is discarded. This is repeated until the nth initialization key frame F_n is added and the window reaches its minimum size n, and then until the mth key frame F_m is obtained and the window reaches its maximum size m.
Step 1.2.2: GMS screens the matching point pairs; this step is identical to step 1.1.2. The matching point pairs between the initial frames F_1 and F_n are finally pruned to obtain the static matching point set {p_c, p_r}.
Step 1.2.3: This stage completes the feature triangulation. For the RGB-D type it is simpler: at F_n the correct matching points are combined with the depth map to restore depth information and create the static initial map. For the binocular case, binocular triangulation is performed on the correct matching points of F_n: for any static matching pair p_L(u_L, v_L) and p_R(u_R, v_R) between the left and right frames, the corresponding 3D point P is obtained from the binocular geometry as follows:

z = f · b / d,    d = u_L − u_R

where z is the depth information of P, b the binocular baseline, f the camera focal length, and d the disparity between the two images. Processing in this way finally yields the initial map information.
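The stereo depth recovery reduces to one formula; a sketch with the focal length, baseline, and matched pixel columns assumed given:

```python
def stereo_depth(u_left, u_right, f, b):
    """Depth from binocular disparity (sketch of step 1.2.3): z = f*b/d, d = u_L - u_R."""
    d = u_left - u_right          # disparity between the rectified left/right images
    if d <= 0:
        return None               # no valid depth for non-positive disparity
    return f * b / d
```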
Step 2: Tracking and positioning of the SLAM system are realized in combination with GMS.
When the system is successfully initialized, it enters the tracking thread with the initialization key frame F_1. When a new image frame is received, the camera pose is estimated using the reference-frame tracking model or the constant-velocity tracking model; the local key frames and local map points are then updated, the local map points are re-projected into the current frame, a graph optimization model is established, and the pose is further optimized. Step 2 is mainly divided into two parts, tracking and local-map creation, followed by the system's loop detection and global optimization.
Step 2.1: The tracking and mapping implementation of this step inherits the basic idea of ORB-SLAM. Image information entering the system is tracked in different modes using the reference-frame tracking model and the constant-velocity tracking model.
Step 2.1.1: The SLAM system first enters the constant-velocity tracking model; the pose transformation of the previous frame is used as the initial pose ξ of the current frame, and map points in the reference frame are projected into the current frame to complete the 3D-2D data association, which obeys:

ξ* = argmin_ξ Σ_i ‖ u_i − (1/s_i) · K · exp(ξ^) · P_i ‖²₂

where the Lie algebra element ξ^ represents the camera pose, u_i is the pixel coordinate of an observed point, s_i the scale information, K the intrinsic matrix, P_i the 3-dimensional coordinate of the spatial point, and ξ* the camera pose to be optimized. BA optimization then minimizes this reprojection error to refine the pose.
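A sketch of the reprojection residual being minimized; the Lie-algebra pose exp(ξ^) is represented here by its rotation R and translation t, and the surrounding Gauss-Newton/BA solver is omitted:

```python
import numpy as np

def reproj_residual(u, P_w, R, t, K):
    """Reprojection error ||u - (1/s) K (R P_w + t)||_2 for one observation (cf. step 2.1.1).

    u    : observed pixel (2-vector)
    P_w  : 3D map point in world coordinates
    R, t : current camera pose estimate (exp(xi^) split into rotation and translation)
    K    : 3x3 intrinsic matrix
    """
    P_c = R @ P_w + t             # transform the map point into the camera frame
    uv = K @ P_c                  # project; uv[2] is the scale s_i (the depth)
    return u - uv[:2] / uv[2]     # 2D residual to be minimized over the pose
```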
Step 2.1.2: The 3D-2D matching-point filtering and rejection check of the traditional ORB-SLAM system is adopted to obtain the final number of matches between map points and feature points. If the number of matches is greater than or equal to a third threshold (e.g. matches >= 10), jump directly to step 2.2; otherwise enter step 2.1.3 to track the reference frame.
Step 2.1.3: The SLAM system enters the reference-frame tracking model. The pose transformation of the previous frame is used as the initial pose of the current frame; through feature matching between the current frame and the reference key frame, map points corresponding to feature points in the reference key frame are associated to the matching feature points of the current frame. GMS is adopted for this feature matching to identify and remove dynamic feature point pairs, forming static 3D-2D data associations; the BoW (bag-of-words) method is used to accelerate matching, and finally the pose is solved by BA optimization minimizing the reprojection error. A 3D-2D matching check is performed as in step 2.1.2; if the number of matches is greater than or equal to the third threshold, jump directly to step 2.2, otherwise enter step 2.1.4 for relocalization.
Step 2.1.4: The SLAM system enters the relocalization model. When there are many moving objects in the scene, static feature points are insufficient, or the camera moves too fast, pose tracking is lost and the relocalization module is entered. Similar to ORB-SLAM2, candidate key frames having a co-visibility relation with the current frame are computed via BoW. The current frame is then feature-matched against the candidate key frames using GMS grid motion statistics. When the number of matched feature points exceeds a fourth threshold (e.g. 30), the pose of the current frame is estimated by combining the PnP and RANSAC algorithms, and map points of the current frame corresponding to the local map are obtained with the BA optimization algorithm. If the number of map points of the current frame is greater than a fifth threshold (e.g. 50), relocalization succeeds and step 2.2 follows.
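A sketch of the PnP + RANSAC pose estimation used in relocalization, via OpenCV's solvePnPRansac; the match threshold mirrors the example value in the text and the reprojection error bound is an assumed value:

```python
import cv2
import numpy as np

def relocalize(map_pts, img_pts, K, min_matches=30):
    """Estimate the current-frame pose from 3D-2D matches (sketch of step 2.1.4).

    map_pts : Nx3 array of map points from the candidate key frames
    img_pts : Nx2 array of matched pixel coordinates in the current frame
    K       : 3x3 intrinsic matrix
    """
    if len(map_pts) <= min_matches:   # fourth threshold on GMS matches (e.g. 30)
        return None
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(map_pts, np.float32), np.asarray(img_pts, np.float32),
        K, None, reprojectionError=4.0)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)        # rotation vector -> rotation matrix
    return R, tvec                    # initial pose, refined afterwards by BA
```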
Step 2.2: After inter-frame tracking is completed, the local map is tracked, fusing the tracking data with the local-map data; it is then judged whether to generate a key frame.
Step 2.2.1: In the local-map tracking model, a local map is established for tracking using the initial pose of the current frame. First, the set of key frames observing map points of the current frame is taken as the first-level connected key frames; frames with high co-visibility with these first-level key frames, together with parent and child frames, jointly form the local key frames. Map points in the local key frames are then used as local map points; those local map points observable from the current frame are re-projected into the current frame, a graph optimization model with more observation edges is established, and the pose of the current frame is further optimized.
Step 2.2.2: Key-frame creation. After local-map tracking is completed, the current frame is evaluated to decide whether it should be packaged as a key frame, using the following preset conditions: the ratio of the number of feature points tracked in the current frame to the total number of feature points of the reference frame is below a certain threshold; no more than 3 key frames are waiting to be processed in the local mapping thread; and at least 8 frames have passed since the last key frame was inserted. If the preset conditions are not met, control returns to the tracking thread to track the next image frame; otherwise a key frame is created, and GMS feature matching between the current key frame and the reference key frame yields the static features of the current frame for triangulating new map points. The single-frame static point cloud map constructed from each key frame is denoted C_k.
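The three key-frame conditions amount to a small predicate; a sketch in which ratio_thresh is an assumed value, since the patent leaves this threshold unspecified:

```python
def should_create_keyframe(tracked, ref_total, queued_kfs, frames_since_kf,
                           ratio_thresh=0.9, max_queue=3, min_interval=8):
    """Key-frame decision (sketch of step 2.2.2); ratio_thresh is an assumed value."""
    return (tracked / ref_total < ratio_thresh    # tracking of the reference frame is decaying
            and queued_kfs <= max_queue           # local mapping thread is not backed up
            and frames_since_kf >= min_interval)  # at least 8 frames since the last key frame
```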
Step 3: Loop detection and global optimization are performed, and a static point cloud map is constructed according to the RGB-D images.
Step 3.1: Loop detection and global optimization. For a new key frame, the co-visibility relations of all key frames are still computed, and key frames that have a co-visibility relation with it but are not directly connected are taken as candidate key frames. Feature matching between the candidate key frames and the current key frame is performed with the GMS algorithm, and the pose is solved and optimized by combining the PnP and RANSAC algorithms (the monocular camera pose has 7 degrees of freedom; the binocular and RGB-D camera poses have 6 degrees of freedom). After the closed-loop key frame is determined, the poses of all key frames in the local map are optimized with a pose-graph model. Finally, all key frames and map points of the whole local map are optimized with the BA algorithm.
Step 3.2: Static mapping. This step targets the RGB-D data type and is divided into two stages: first, the features are judged and screened; second, the depth maps are aggregated to complete the fusion of the 3D points.
Step 3.2.1: Motion detection is performed on the key frames. The reference key frame KF_ref and the current key frame KF_cur are input into a trained YOLO v3 network to segment the potential motion region S_ref of the reference key frame and S_cur of the current key frame. Let p_r be a feature point in the reference key frame and p_c the pixel it projects to in the current frame; the depth information of the projected point in the corresponding depth map is Z, and the corresponding projected depth value Z_proj can be computed from the camera model, giving the depth difference:
ΔZ = Z_proj − Z
simultaneous calculation of p r And p c With corresponding spatial point p w Angle α therebetween when<=30 ° and Δ Z > T z Wherein T is z Is a depth threshold value, is generally set to 1, and the point p is judged according to the condition of the above formula c For a dynamic point, the point is filtered from the current frame. KF (potassium fluoride) cur All the points are processed according to the method to finally obtain a static characteristic point set.
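A sketch of this per-point motion test; the projection into the current frame and the depth-map lookup are assumed already done, the YOLO v3 region test is reduced to a boolean, and α is interpreted as the angle between the two viewing rays to p_w (an assumption about the patent's geometry):

```python
import numpy as np

def is_dynamic(p_w, c_ref, c_cur, z_proj, z_meas,
               in_motion_region, angle_max_deg=30.0, t_z=1.0):
    """Dynamic-point test (sketch of step 3.2.1).

    p_w              : 3D point in world coordinates
    c_ref, c_cur     : reference / current camera centers in world coordinates
    z_proj           : projected depth Z_proj of p_w in the current frame (camera model)
    z_meas           : depth Z read from the current depth map at the projected pixel
    in_motion_region : True if the pixel lies inside a YOLO v3 potential motion region
    """
    if not in_motion_region:
        return False
    v_r, v_c = p_w - c_ref, p_w - c_cur
    cos_a = v_r @ v_c / (np.linalg.norm(v_r) * np.linalg.norm(v_c))
    alpha = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))  # viewing-ray angle
    dz = z_proj - z_meas                                      # depth difference dZ
    return alpha <= angle_max_deg and dz > t_z                # dynamic if both hold
```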
Step 3.2.2: The single-frame static point cloud map constructed from each key frame is C_k, and its camera pose relative to the world coordinate system is T_k. The point clouds of all key frames are converted into the world coordinate system to construct the global static point cloud map W, namely:

W = ⋃_{k=1}^{N} T_k · C_k

where N is the total number of key frames in the set, i.e. the number of key frames constituting the global map; splicing them completes the static point cloud map.
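A sketch of this point-cloud fusion, assuming each C_k is an array of points in its camera frame and each T_k a 4 × 4 camera-to-world transform:

```python
import numpy as np

def fuse_clouds(clouds, poses):
    """Fuse per-key-frame static clouds into a global map W (sketch of step 3.2.2).

    clouds : list of (M_k, 3) arrays, single-frame static point clouds C_k
    poses  : list of 4x4 camera-to-world transforms T_k
    """
    world_pts = []
    for C_k, T_k in zip(clouds, poses):
        homo = np.hstack([C_k, np.ones((len(C_k), 1))])  # to homogeneous coordinates
        world_pts.append((homo @ T_k.T)[:, :3])          # transform into the world frame
    return np.vstack(world_pts)                          # W = union over k of T_k C_k
```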
Step 4: After data processing is finished, the pose trajectory is output, together with the static point cloud map built from the RGB-D images.
The dynamic visual SLAM method based on GMS and motion detection provided by the embodiment of the invention has at least the following technical effects:
(1) Combining GMS and motion detection effectively eliminates the influence of dynamic features, and this capability is integrated into each functional module of the SLAM system; the method solves the problems of visual SLAM positioning and mapping in dynamic scenes while offering good real-time performance and high positioning accuracy.
(2) Whereas traditional feature-based SLAM systems mostly form sparse maps, the method adds a static-mapping thread for the RGB-D data type; the resulting dense map can embody more environmental characteristics and detail than a sparse map.
(3) The method adds a sliding-window mechanism to the initialization process to establish a static initial map free of dynamic-object feature points. Compared with the prior art, where the small time interval and motion amplitude between adjacent key frames make dynamic feature points difficult to identify, the sliding window matches key frames separated by a certain time interval, so dynamic points are better distinguished and removed, more thoroughly static feature points are obtained, and the quality of the initial map points is improved.
(4) The GMS is added in the initialization process, and the GMS can judge the dynamic feature points on the basis of the matched feature point pairs, so that the motion information can be better eliminated.
(5) According to the invention, the potential motion regions can be segmented with a YOLO v3 network, and the projected-point depth and the viewing-ray angle are used to judge whether a potential motion region is a dynamic region; if it is, the feature points of the corresponding region are removed, finally yielding the static feature points and better eliminating the motion information.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to examples, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (7)

1. A GMS and motion detection based dynamic visual SLAM method, comprising the steps of:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
the constructing of the static point cloud map comprises: judging and screening the features; and aggregating the depth maps to complete the fusion of 3D points;
the judging and screening of the features is implemented as follows:
motion detection is performed on the key frames: the reference key frame and the current key frame are input into a trained YOLO v3 network to segment the potential motion region S_ref of the reference frame and the potential motion region S_cur of the current frame; p_r is a feature point in the reference key frame and p_c the pixel it projects to in the current key frame; the depth information of the projected point p_c in the corresponding depth map is Z, and the projected depth value Z_proj corresponding to p_c is calculated according to the camera model, giving the depth difference:

ΔZ = Z_proj − Z

the angle α between the viewing rays from p_r and p_c to the corresponding spatial point p_w is calculated; when α and ΔZ meet the preset conditions, the pixel p_c is judged to be a dynamic point and is screened out of the current frame; all points in the current key frame are processed according to this method, finally obtaining the static feature point set;
the fusion of the 3D points from the aggregated depth maps is implemented as follows:
the single-frame static point cloud map constructed from each key frame is denoted C_k, and the pose of the camera relative to the world coordinate system at the moment corresponding to the key frame is denoted T_k; the point clouds of all key frames are converted into the world coordinate system to construct the global static point cloud map W, expressed as:

W = ⋃_{k=1}^{N} T_k · C_k

wherein N represents the total number of key frames in the set;
step 4, outputting a pose track after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
2. The GMS and motion detection based dynamic visual SLAM method according to claim 1, wherein in step 1, the initializing operation of monocular image correspondence comprises: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; solving the initial frame camera pose; and obtaining an initial map through feature triangularization.
3. The GMS and motion detection based dynamic visual SLAM method of claim 2, wherein the selection of the initialization key frame using a sliding window is implemented as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a first threshold value, the image is taken as the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until the threshold requirement is met and the first initialization key frame is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the first threshold value it is added to the sliding window, otherwise it is discarded; the above operations are repeated until the nth initialization key frame F_n is added and the window reaches the minimum size n, and key frames continue to be added until the mth key frame F_m is obtained, at which point the window reaches the maximum size m.
4. The GMS and motion detection based dynamic vision SLAM method according to claim 1, wherein the initialization operation corresponding to the binocular image and the RGB-D image in step 1 comprises: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; and obtaining an initial map through feature triangulation.
5. The GMS and motion detection based dynamic visual SLAM method of claim 4, wherein the selection of the initialization key frame using a sliding window is implemented as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a second threshold value, the image is taken as the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until the threshold requirement is met and the key frame is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the second threshold value it is added to the sliding window, otherwise it is discarded; the above operations are repeated until the nth initialization key frame F_n is added and the window reaches the minimum size n, and key frames continue to be added until the mth key frame F_m is obtained, at which point the window reaches the maximum size m.
6. The GMS and motion detection based dynamic visual SLAM method according to claim 1, wherein said step 2 includes:
2.1, tracking any image information of monocular, binocular or RGB-D type in an SLAM system;
2.2, tracking the local map, completing fused tracking of the tracking data and the local map data, and deciding whether to create a key frame;
wherein, the step 2.1 specifically comprises the following substeps:
step 2.1.1, the SLAM system enters a constant-velocity tracking model, the pose transformation of the previous frame is used as the initial pose of the current frame, and map points in the reference frame are projected into the current frame to complete the 3D-2D data association;
step 2.1.2, a 3D-2D matching check is performed to obtain the final number of matches between map points and feature points, and if the number of matches is greater than or equal to a third threshold value, step 2.2 is jumped to directly; otherwise step 2.1.3 is entered to track the reference frame;
step 2.1.3, the SLAM system enters a reference-frame tracking model; the pose transformation of the previous frame is used as the initial pose of the current frame, and through feature matching between the current frame and the reference key frame, map points corresponding to feature points in the reference key frame are associated to the matching feature points of the current frame; GMS is adopted for the feature matching to identify and remove dynamic feature point pairs, forming static 3D-2D data associations; the BoW (bag-of-words) method is used to accelerate matching, and finally the pose is solved by BA optimization minimizing the reprojection error; a 3D-2D matching check is performed to obtain the final number of matches between map points and feature points, and if it is greater than or equal to the third threshold value, step 2.2 is jumped to directly, otherwise step 2.1.4 is entered for relocalization;
step 2.1.4, the SLAM system enters a relocalization model; candidate key frames having a co-visibility relation with the current frame are calculated through BoW, and feature matching between the current frame and the candidate key frames is carried out using GMS grid motion statistics; when the number of matched feature points is greater than a fourth threshold value, the pose of the current frame is estimated by combining the PnP algorithm and the RANSAC algorithm, and map points of the current frame corresponding to the local map are obtained using the BA optimization algorithm; and if the number of map points of the current frame is greater than a fifth threshold value, the relocalization is successful.
7. The GMS and motion detection based dynamic visual SLAM method according to claim 6, characterised in that said step 2.2 comprises in particular the sub-steps of:
2.2.1, the SLAM system enters a local map tracking model, and a local map is established for tracking by using the initial pose of the current frame;
and 2.2.2, after the local map tracking is completed, the current frame is evaluated and a key frame is created when warranted; GMS feature matching is performed between the current key frame and a reference key frame to obtain the static features of the current frame, which are triangulated to create map points.
CN202011282866.8A 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method Active CN112418288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011282866.8A CN112418288B (en) 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011282866.8A CN112418288B (en) 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method

Publications (2)

Publication Number Publication Date
CN112418288A CN112418288A (en) 2021-02-26
CN112418288B (en) 2023-02-03

Family

ID=74831325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011282866.8A Active CN112418288B (en) 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method

Country Status (1)

Country Link
CN (1) CN112418288B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282088A (en) * 2021-05-21 2021-08-20 潍柴动力股份有限公司 Unmanned driving method, device and equipment of engineering vehicle, storage medium and engineering vehicle
CN113506325B (en) * 2021-07-15 2024-04-12 清华大学 Image processing method and device, electronic equipment and storage medium
CN113781574B (en) * 2021-07-19 2024-04-12 长春理工大学 Dynamic point removing method for binocular refraction and reflection panoramic system
CN115830110B (en) * 2022-10-26 2024-01-02 北京城市网邻信息技术有限公司 Instant positioning and map construction method and device, terminal equipment and storage medium
CN115567658B (en) * 2022-12-05 2023-02-28 泉州艾奇科技有限公司 Method and device for keeping image not deflecting and visual earpick
CN116299500B (en) * 2022-12-14 2024-03-15 江苏集萃清联智控科技有限公司 Laser SLAM positioning method and device integrating target detection and tracking

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056643A (en) * 2016-04-27 2016-10-26 武汉大学 Point cloud based indoor dynamic scene SLAM (Simultaneous Location and Mapping) method and system
CN107392964A (en) * 2017-07-07 2017-11-24 武汉大学 The indoor SLAM methods combined based on indoor characteristic point and structure lines
CN107871327A (en) * 2017-10-23 2018-04-03 武汉大学 The monocular camera pose estimation of feature based dotted line and optimization method and system
CN107917710A (en) * 2017-11-08 2018-04-17 武汉大学 A kind of positioning in real time of the interior based on single line laser and three-dimensional map construction method
CN109558790A (en) * 2018-10-09 2019-04-02 中国电子科技集团公司电子科学研究院 A kind of pedestrian target detection method, apparatus and system
CN109974743A (en) * 2019-03-14 2019-07-05 中山大学 A kind of RGB-D visual odometry optimized based on GMS characteristic matching and sliding window pose figure
CN110009732A (en) * 2019-04-11 2019-07-12 司岚光电科技(苏州)有限公司 Based on GMS characteristic matching towards complicated large scale scene three-dimensional reconstruction method
CN110807377A (en) * 2019-10-17 2020-02-18 浙江大华技术股份有限公司 Target tracking and intrusion detection method, device and storage medium
CN111161318A (en) * 2019-12-30 2020-05-15 广东工业大学 Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching
CN111708042A (en) * 2020-05-09 2020-09-25 汕头大学 Robot method and system for pedestrian trajectory prediction and following
CN111797688A (en) * 2020-06-02 2020-10-20 武汉大学 Visual SLAM method based on optical flow and semantic segmentation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111833358A (en) * 2020-06-26 2020-10-27 中国人民解放军32802部队 Semantic segmentation method and system based on 3D-YOLO


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"DM-SLAM: A Feature-Based SLAM System for Rigid Dynamic Scenes";Junhao Cheng,Zhi Wang,Hongyan Zhou,Li Li and Jian Yao;《IJGI》;20200327;第9卷;1-18页 *
"DMS-SLAM: A General Visual SLAM System for Dynamic Scenes with Multiple Sensors";Guihua Liu,Weilin Zeng,Bo Feng and Feng Xu;《Sensors》;20190827;第19卷;1-20页 *
Guihua Liu,Weilin Zeng,Bo Feng and Feng Xu."DMS-SLAM: A General Visual SLAM System for Dynamic Scenes with Multiple Sensors".《Sensors》.2019,第19卷第1-20页. *

Also Published As

Publication number Publication date
CN112418288A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
CN112418288B (en) GMS and motion detection-based dynamic vision SLAM method
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN111968129B (en) Instant positioning and map construction system and method with semantic perception
CN109544636B (en) Rapid monocular vision odometer navigation positioning method integrating feature point method and direct method
Mur-Artal et al. ORB-SLAM: a versatile and accurate monocular SLAM system
CN109509230A (en) A kind of SLAM method applied to more camera lens combined type panorama cameras
CN106920259B (en) positioning method and system
CN108682027A (en) VSLAM realization method and systems based on point, line Fusion Features
Kang et al. Detection and tracking of moving objects from a moving platform in presence of strong parallax
US20200257862A1 (en) Natural language understanding for visual tagging
Kim et al. Direct semi-dense SLAM for rolling shutter cameras
CN110533720B (en) Semantic SLAM system and method based on joint constraint
CN113108771B (en) Movement pose estimation method based on closed-loop direct sparse visual odometer
CN110766024B (en) Deep learning-based visual odometer feature point extraction method and visual odometer
CN110570453A (en) Visual odometer method based on binocular vision and closed-loop tracking characteristics
CN111445526A (en) Estimation method and estimation device for pose between image frames and storage medium
CN110827353B (en) Robot positioning method based on monocular camera assistance
Concha et al. Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping.
US20210225038A1 (en) Visual object history
US11880964B2 (en) Light field based reflection removal
JP2018113021A (en) Information processing apparatus and method for controlling the same, and program
CN116449384A (en) Radar inertial tight coupling positioning mapping method based on solid-state laser radar
CN116468786B (en) Semantic SLAM method based on point-line combination and oriented to dynamic environment
CN115147344A (en) Three-dimensional detection and tracking method for parts in augmented reality assisted automobile maintenance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant