CN112418288B - GMS and motion detection-based dynamic vision SLAM method - Google Patents
Publication number: CN112418288B (application CN202011282866.8A)
Authority: CN (China)
Legal status: Active
Abstract
The invention belongs to the technical field of visual spatial positioning and discloses a dynamic visual SLAM method based on GMS and motion detection, comprising the following steps: initializing the SLAM system by combining GMS with a sliding window to obtain an initial map; realizing tracking and positioning of the SLAM system in combination with GMS; performing loop detection and global optimization, and constructing a static point cloud map from the RGB-D images; and, after data processing is finished, outputting the pose trajectory together with the static point cloud map built from the RGB-D images. The invention solves the problem of poor SLAM tracking and positioning in dynamic environments in the prior art. The method effectively eliminates the influence of dynamic features and integrates this capability into each functional module of the SLAM system, solving visual SLAM positioning and mapping in dynamic scenes with good real-time performance and high positioning accuracy.
Description
Technical Field
The invention relates to the technical field of visual space positioning, in particular to a dynamic visual SLAM method based on GMS and motion detection.
Background
Simultaneous localization and mapping (SLAM) is a core technology for realizing the functions of intelligent mobile robots. SLAM incrementally builds a map model of the current environment from the data stream acquired by a laser or visual sensor, and performs positioning during map construction.
However, neither the feature-point method nor the direct method can handle the problems caused by dynamic objects, which are common in real scenes. Most current visual SLAM methods still rest on the assumption that the observed environment is static. Moving objects in the environment produce many wrong data associations, or cause previously tracked map points to be lost through occlusion; this seriously degrades camera pose estimation, makes the positioning of the whole visual SLAM system erroneous, and affects the robot's subsequent tasks. Studying the positioning accuracy and mapping quality of visual SLAM in dynamic environments is therefore of great significance.
In recent years, many researchers have begun to study the visual SLAM problem in dynamic scenes. Methods based on foreground-background initialization can track temporarily stopped dynamic objects, but require predefined information about the background or the objects, and degrade when the environment contains many moving objects. Methods based on geometric constraints have a clear mathematical theory and low computational cost, but struggle to distinguish residuals caused by moving objects from residuals caused by mismatches, and require the initial camera pose to be known in advance. Optical-flow methods can exploit rich information, but rest on the brightness-constancy assumption and are therefore sensitive to changes in illumination. Deep-learning-based methods have the best overall effect, but parameter tuning is complex and real-time performance cannot be guaranteed.
Disclosure of Invention
The invention provides a dynamic visual SLAM method based on GMS and motion detection, and solves the problem of poor SLAM tracking and positioning effects in a dynamic environment in the prior art.
The invention provides a dynamic vision SLAM method based on GMS and motion detection, which comprises the following steps:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
step 4, outputting a pose track after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
Preferably, in step 1, the initializing operation corresponding to the monocular image includes: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; solving the initial frame camera pose; and obtaining an initial map through feature triangulation.
Preferably, selecting the initialization key frames with a sliding window is implemented as follows: reading the first frame image, extracting its feature points and counting them; if the number of feature points is greater than a first threshold, the image is taken as the first initialization key frame F1; otherwise the frame is ignored, and feature extraction and threshold verification are performed on subsequent images until the threshold requirement is met and the first initialization key frame is obtained; establishing a sliding window on the basis of the first initialization key frame F1, with minimum window size n and maximum window size m; reading the next image frame as the current frame and, if its number of feature points is greater than the first threshold, adding it to the sliding window, otherwise discarding it; repeating the above operation until the nth initialization key frame Fn is added and the window reaches its minimum size n, then continuing to add key frames until the mth key frame Fm is obtained and the window reaches its maximum size m.
Preferably, in step 1, the initialization operation corresponding to the binocular image and the RGB-D image includes: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; and obtaining an initial map through feature triangulation.
Preferably, selecting the initialization key frames with a sliding window is implemented as follows: reading the first frame image, extracting its feature points and counting them; if the number of feature points is greater than a second threshold, the image is taken as the first initialization key frame F1; otherwise the frame is ignored, and feature extraction and threshold verification are performed on subsequent images until a key frame meeting the threshold requirement is obtained; establishing a sliding window on the basis of the first initialization key frame F1, with minimum window size n; reading the next image frame as the current frame and, if its number of feature points is greater than the second threshold, adding it to the sliding window, otherwise discarding it; repeating the above operations until the nth initialization key frame Fn is added and the window reaches its minimum size n, then continuing to add key frames until the mth key frame Fm is obtained and the window reaches its maximum size m.
Preferably, the step 2 comprises:
2.1, tracking any image information of monocular, binocular or RGB-D type in an SLAM system;
2.2, tracking the local map, completing fused tracking of the tracking data and the local map data, and judging whether to generate a key frame;
wherein, the step 2.1 specifically comprises the following substeps:
step 2.1.1, the SLAM system enters a constant-speed tracking model, the pose transformation of the previous frame is used as the initial pose of the current frame, and map points in the reference frame are projected to the current frame to complete 3D-2D data association;
step 2.1.2, performing a 3D-2D matching check to obtain the number of matches between the final map points and the feature points; if the number of matches is greater than or equal to a third threshold, jumping directly to step 2.2; otherwise, proceeding to step 2.1.3 to track the reference frame;
step 2.1.3, the SLAM system enters the reference-frame tracking model, the pose transformation of the previous frame is used as the initial pose of the current frame, and, through feature matching between the current frame and the reference key frame, the map points corresponding to the feature points in the reference key frame are associated with the matching feature points of the current frame; GMS is adopted to match the feature points and reject dynamic feature-point pairs, forming static 3D-2D data associations; a BoW bag-of-words method is used to accelerate matching, and finally the pose is solved through BA optimization minimizing the reprojection error; a 3D-2D matching check is then performed to obtain the number of matches between the final map points and the feature points; if the number of matches is greater than or equal to the third threshold, jump directly to step 2.2, otherwise enter step 2.1.4 for relocalization;
step 2.1.4, the SLAM system enters the relocalization model; candidate key frames having a co-visibility relationship with the current frame are computed through BoW, and feature matching between the current frame and the candidate key frames is carried out using GMS grid motion statistics; when the number of matched feature points is greater than a fourth threshold, the pose of the current frame is estimated by combining the PnP and RANSAC algorithms, and a BA optimization algorithm is used to obtain the map points of the current frame corresponding to the local map; if the number of map points of the current frame is greater than a fifth threshold, relocalization is successful.
Preferably, the step 2.2 specifically comprises the following substeps:
2.2.1, the SLAM system enters a local map tracking model, and a local map is established for tracking by utilizing the initial pose of the current frame;
and 2.2.2, after the local map tracking is finished, distinguishing the current frame, creating a key frame, performing GMS feature matching on the current key frame and a reference key frame to obtain the static features of the current frame, and triangularizing to create map points.
Preferably, constructing the static point cloud map in step 3 includes: judging and screening the features; and integrating the depth maps to complete the fusion of the 3D points.
Preferably, in the step 3, the specific implementation manner of the judging and screening the features is as follows:
Perform motion detection on the key frames: input the reference key frame and the current key frame into a trained YOLO v3 network to segment the potential motion region S_ref of the reference frame and the potential motion region S_cur of the current frame. Let p_r be a feature point in the reference key frame and p_c its projection into the current key frame. The depth of the projected point p_c in the corresponding depth map is Z; the projected depth value Z_proj of p_c is calculated from the camera model, giving the depth difference:
ΔZ = Z_proj − Z
The angle α between the rays from p_r and p_c to the corresponding spatial point p_w is then calculated; when α and ΔZ meet the preset conditions, the pixel point p_c is judged to be a dynamic point and screened out of the current frame. All points in the current key frame are processed in this way to finally obtain the static feature point set.
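The depth-consistency screening described above can be sketched as follows. This is an illustrative toy: the YOLO v3 segmentation stage is omitted, the exact combination of the α and ΔZ conditions is not specified in the text, and the thresholds and function names here are invented.

```python
# Sketch of the dynamic-point screening of step 3: a projected point whose
# measured depth Z deviates strongly from the projected depth Z_proj, or
# whose viewing-ray angle alpha is large, is treated as dynamic.
# Thresholds are illustrative, not from the patent.

DEPTH_THRESH = 0.2   # metres; |dZ| above this marks a dynamic point (assumed)
ANGLE_THRESH = 0.5   # radians; alpha above this marks a dynamic point (assumed)

def is_dynamic(Z_proj, Z, alpha,
               depth_thresh=DEPTH_THRESH, angle_thresh=ANGLE_THRESH):
    """Depth difference dZ = Z_proj - Z, combined with the ray angle."""
    dZ = Z_proj - Z
    return abs(dZ) > depth_thresh or alpha > angle_thresh

def static_points(points):
    """points: iterable of (Z_proj, Z, alpha) tuples; keep only static ones."""
    return [p for p in points if not is_dynamic(*p)]

pts = [(4.00, 4.05, 0.1),   # consistent depth, small angle -> static
       (4.00, 2.50, 0.1),   # measured depth much closer -> dynamic
       (4.00, 4.02, 0.9)]   # large viewing-ray angle -> dynamic
print(static_points(pts))
```

In the real pipeline the (Z_proj, Z, α) triples would be computed per feature from the camera model and the depth map, and only points also falling outside the segmented motion regions would be kept.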
Preferably, in step 3, the fusion of the 3D points from the collected depth maps is implemented as follows:
The single-frame static point cloud map constructed from each key frame is denoted C_k, and the pose of the camera relative to the world coordinate system at the moment of that key frame is denoted T_k. The point clouds of all key frames are converted into the world coordinate system to construct the global static point cloud map W, expressed as:
W = ∪_{k=1}^{N} T_k C_k
where N represents the total number of key frames in the set.
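The map fusion W = ∪ T_k C_k can be sketched with numpy. This is a minimal sketch under the assumption that T_k is a 4 × 4 camera-to-world homogeneous transform; the voxel filtering or deduplication a real dense-mapping thread would apply is omitted.

```python
import numpy as np

def fuse_point_clouds(clouds, poses):
    """W = union over k of T_k * C_k: transform each key frame's static
    cloud C_k (N_k x 3 array) into the world frame with its camera-to-world
    pose T_k (4x4 homogeneous matrix) and concatenate the results."""
    world_points = []
    for C_k, T_k in zip(clouds, poses):
        homo = np.hstack([C_k, np.ones((len(C_k), 1))])   # to homogeneous
        world_points.append((homo @ T_k.T)[:, :3])        # apply T_k, drop w
    return np.vstack(world_points)

# Two single-point clouds; the second camera is shifted 1 m along x.
T0 = np.eye(4)
T1 = np.eye(4); T1[0, 3] = 1.0
W = fuse_point_clouds([np.array([[0.0, 0.0, 2.0]]),
                       np.array([[0.0, 0.0, 2.0]])], [T0, T1])
print(W.tolist())
```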
One or more technical schemes provided by the invention at least have the following technical effects or advantages:
In the method, the SLAM system is first initialized by combining GMS with a sliding window to obtain an initial map; tracking and positioning of the SLAM system are then realized in combination with GMS; loop detection and global optimization are then carried out, and a static point cloud map is constructed from the RGB-D images; finally, after data processing is finished, the pose trajectory is output together with the static point cloud map built from the RGB-D images. Compared with existing conventional processing methods, the proposed visual SLAM method, which combines GMS (grid-based motion statistics) with motion detection, effectively eliminates the influence of dynamic features and integrates this capability into each functional module of the SLAM system, solving visual SLAM positioning and mapping in dynamic scenes with good real-time performance and high positioning accuracy. In addition, whereas traditional feature-based SLAM systems mostly produce sparse maps, the method adds a static-map thread for the RGB-D data type; the resulting dense map captures more environmental characteristics and more environmental detail than a sparse map.
Drawings
Fig. 1 is an overall flowchart of a dynamic visual SLAM method based on GMS and motion detection according to an embodiment of the present invention.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
The embodiment provides a dynamic visual SLAM method based on GMS and motion detection, which comprises the following steps:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
step 4, outputting a pose trajectory after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
The present invention is further described below.
The embodiment provides a dynamic visual SLAM method based on GMS and motion detection, which specifically includes the following steps, with reference to fig. 1:
step 1: and initializing the SLAM system by combining the GMS and the sliding window to obtain an initial map.
Namely, the GMS and the sliding window are combined to carry out initialization operation on the SLAM system, and dynamic characteristic influence in an operation scene is eliminated.
The corresponding initialization operation is completed for each of the monocular, binocular and RGB-D data types: monocular initialization follows step 1.1, while binocular and RGB-D initialization jumps directly to step 1.2.
Step 1.1: monocular initialization is mainly divided into four stages, wherein in the first stage, a sliding window is utilized to select an initial frame; in the second stage, GMS is adopted to match the feature points; in the third stage, solving the initial frame camera pose; and triangularizing the features at the fourth stage to obtain an initial map.
Step 1.1.1: select initialization key frames with a sliding window. First read the first frame image, extract its feature points and count them; if the number of feature points is greater than a first threshold (e.g. 100), the frame becomes the first initialization key frame F1; otherwise read the next frame until a qualifying initialization key frame is obtained. Establish a sliding window on the basis of the first initialization key frame F1, with minimum window size n and maximum window size m. The sliding window is set so that the initial key frame F1 and key frame Fn (or Fm) are separated by a certain interval in time; the aim is to obtain relatively pure feature points by rejecting dynamic features with the robust GMS matching algorithm during feature matching, which removes dynamic mismatches more thoroughly than applying GMS only between adjacent key frames. Then process the subsequent image frames: read the next frame as the current frame; if its number of feature points is greater than the first threshold, add it to the sliding window, otherwise discard it. Repeat these operations until the nth initialization key frame Fn brings the window to its minimum size n, then continue adding key frames until the mth key frame Fm is obtained, at which point the window reaches its maximum size m.
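The key-frame selection loop above can be sketched in a few lines. This is an illustrative toy: the function name, the example feature counts, and the window sizes n = 3, m = 6 are invented; the patent only suggests a first threshold on the order of 100 features.

```python
# Sketch of the sliding-window key-frame selection of step 1.1.1.
FIRST_THRESHOLD = 100   # minimum feature count to accept a frame (per the text)
MIN_WINDOW = 3          # n: minimum window size (illustrative)
MAX_WINDOW = 6          # m: maximum window size (illustrative)

def select_initial_keyframes(feature_counts):
    """Return indices of accepted key frames, filling the sliding window
    from the first qualifying frame F1 up to the maximum size m."""
    window = []
    for idx, count in enumerate(feature_counts):
        if count <= FIRST_THRESHOLD:
            continue                    # ignore frames with too few features
        window.append(idx)              # frame joins the sliding window
        if len(window) == MAX_WINDOW:   # window reached maximum size m
            break
    return window

# Example: frames 1 and 4 are rejected for having too few features.
counts = [150, 40, 200, 180, 90, 170, 160, 155]
print(select_initial_keyframes(counts))
```

In the real system the counts would come from ORB feature extraction on each incoming frame; here they are supplied directly.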
Step 1.1.2: eliminate mismatched point pairs on dynamic objects with the GMS matching algorithm. Match the features extracted from the initial frames F1 and Fn of step 1.1.1 to obtain a matching point set, and divide each image into G grids (typically 20 × 20). For each feature point X_i of the first initialization key frame F1, its matching point falls in some grid of Fn; take the grid containing the feature point and the grid containing its matching point as centers and expand each to a 3 × 3 grid neighborhood. Record the corresponding matching point sets in these grid neighborhoods as {a_k, b_k}, k = 1, 2, ..., 9, and compute the composite feature-matching score S_i:
S_i = Σ_{k=1}^{K} |χ_{a_k b_k}|
where K = 9 and |χ_{a_k b_k}| is the number of matching points between the neighborhood cells a_k and b_k. The composite score S_i is compared with a threshold t to judge whether the feature match is correct. The threshold is set as:
t = α √n
where α is set to 0.6 and n is the average number of feature points over the K (i.e. 9) neighborhood grids. When S_i > t, the feature point is judged a correct match; otherwise it is a mismatch. Pruning the matching point pairs between the initial key frames by this rule yields the static matching point set {p_c, p_r}.
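The grid score and threshold above can be sketched as follows. This is an illustrative toy implementation of the scoring rule only (function names and the example counts are invented); the grid assignment of matches is assumed to have been done already.

```python
import math

ALPHA = 0.6  # as given in the text

def gms_score(neighborhood_match_counts):
    """S_i: total number of matches over the 3x3 grid-neighborhood
    pairs {a_k, b_k}, k = 1..9."""
    return sum(neighborhood_match_counts)

def gms_threshold(avg_features_per_cell):
    """t = alpha * sqrt(n), n the mean feature count per neighborhood cell."""
    return ALPHA * math.sqrt(avg_features_per_cell)

def is_correct_match(neighborhood_match_counts, avg_features_per_cell):
    """S_i > t: a match supported by its neighborhood is kept."""
    return gms_score(neighborhood_match_counts) > gms_threshold(avg_features_per_cell)

# A match supported by many neighboring matches passes; an isolated one fails.
print(is_correct_match([3, 2, 4, 1, 5, 2, 3, 1, 2], avg_features_per_cell=25))
print(is_correct_match([0, 0, 1, 0, 0, 0, 0, 0, 0], avg_features_per_cell=25))
```

For production use, OpenCV's contrib modules ship a `matchGMS` implementation of grid-based motion statistics that performs the full grid assignment and scoring.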
Step 1.1.3: solve the camera pose. Using the static matching point set {p_c, p_r} obtained in the previous step, compute in parallel the homography matrix H_cr and the fundamental matrix F_cr of the motion. From the matching points, compute the normalized-plane coordinates {x_c, x_r} and establish the direct linear transformation for the homography matrix H:
x_c = H_cr x_r
Solve the homography matrix H with a RANSAC-based normalized 4-point algorithm. For the fundamental matrix F, establish the equation from the epipolar geometric constraint:
x_c^T F_cr x_r = 0
and solve it with a RANSAC-based normalized 8-point algorithm. The RANSAC algorithm eliminates outliers to a certain degree. At the same time, compute a model-selection score R_H from the homography matrix H and the fundamental matrix F:
R_H = S_H / (S_H + S_F)
where S_H and S_F are the model scores of the homography matrix and the fundamental matrix, respectively. If R_H > 0.45, recover the camera pose from the H matrix by SVD; otherwise compute the essential matrix E from the F matrix and then recover the pose by SVD. Finally, check the pose to obtain the optimal solution. If the initialized pose does not meet the initialization requirement, take the next frame together with the first initialization key frame F1 as the initial frames and repeat the above operations, up to the frame Fm at the maximum window size m. If initialization is still unsuccessful, shift the sliding window backward as a whole, take its first frame as the first initialization key frame F1, and continue the above operations until an initial pose is obtained.
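The direct linear transformation for the homography can be illustrated with a minimal numpy sketch. This omits the RANSAC loop and the normalization that the text prescribes for robustness, and the function name and test values are invented.

```python
import numpy as np

def homography_dlt(x_r, x_c):
    """Direct linear transformation: estimate H with x_c ~ H x_r from
    >= 4 point correspondences given as (N, 2) arrays.  Each pair yields
    two linear equations in the 9 entries of H; the least-squares
    solution is the right singular vector of the smallest singular value."""
    A = []
    for (xr, yr), (xc, yc) in zip(x_r, x_c):
        A.append([-xr, -yr, -1, 0, 0, 0, xc * xr, xc * yr, xc])
        A.append([0, 0, 0, -xr, -yr, -1, yc * xr, yc * yr, yc])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]          # fix the scale ambiguity

# Round-trip check against a known homography (values illustrative).
H_true = np.array([[1.1, 0.02, 5.0], [0.01, 0.95, -3.0], [1e-4, 2e-4, 1.0]])
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 2.0]])
proj = np.array([H_true @ [x, y, 1.0] for x, y in pts])
proj = proj[:, :2] / proj[:, 2:3]
H_est = homography_dlt(pts, proj)
print(np.allclose(H_est, H_true, atol=1e-6))
```

In practice, OpenCV's `findHomography` and `findFundamentalMat` with the RANSAC flag cover this stage, including outlier rejection.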
Step 1.1.4: triangulate the features to obtain the initial map. The normalized-plane coordinate pairs {x_c, x_r} satisfy the geometric relations:
z_c x_c = T_cw P_w,  z_r x_r = T_rw P_w
where z_c, z_r are the Z-axis coordinates (i.e. depth information) in the corresponding camera coordinate systems, T_cw and T_rw are the pose transformations from the world coordinate system to the current key frame and the reference key frame, and P_w is the corresponding 3D point coordinate. Taking the cross product of each equation with its normalized-plane point gives:
x_c × (T_cw P_w) = 0,  x_r × (T_rw P_w) = 0
Rearranging yields a homogeneous linear system in P_w; performing SVD on it finally gives the 3D point coordinate P_w. After completing this triangulation for {p_c, p_r}, the initial map is obtained and the initialization operation is finished.
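The cross-product constraints and SVD solve of step 1.1.4 can be sketched as a minimal linear triangulation. The poses are assumed to be 3 × 4 [R|t] matrices and the example values are invented.

```python
import numpy as np

def triangulate(x_c, x_r, T_cw, T_rw):
    """Linear triangulation from x x (T P_w) = 0, solved by SVD.
    x_c, x_r: normalized-plane coordinates (u, v) in each view;
    T_cw, T_rw: 3x4 world-to-camera pose matrices."""
    def rows(x, T):
        u, v = x
        return [u * T[2] - T[0],        # u*(r3 . P) - r1 . P = 0
                v * T[2] - T[1]]        # v*(r3 . P) - r2 . P = 0
    A = np.array(rows(x_c, T_cw) + rows(x_r, T_rw))
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1]                          # null vector = homogeneous P_w
    return P[:3] / P[3]                 # dehomogenize

# Two views: identity pose and a 1 m baseline along x.
T_cw = np.hstack([np.eye(3), np.zeros((3, 1))])
T_rw = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
P_true = np.array([0.3, -0.2, 4.0])
x_c = P_true[:2] / P_true[2]
x_r = (P_true + np.array([-1.0, 0.0, 0.0]))[:2] / P_true[2]
print(np.allclose(triangulate(x_c, x_r, T_cw, T_rw), P_true))
```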
Step 1.2: the initialization of binocular/RGB-D is relatively simple and can be divided into three stages, wherein in the first stage, an initial frame is selected by using a sliding window; in the second stage, GMS feature screening is completed; and the third stage is used for completing feature triangulation to obtain an initial map.
Step 1.2.1: this step is similar to step 1.1.1. Read the first frame image, extract its feature points and count them; if the number of feature points is greater than a second threshold (e.g. 500), the frame becomes the first initialization key frame F1; otherwise read the next frame until an initialization key frame meeting the condition is obtained. Establish a sliding window on the basis of the first initialization key frame F1, with minimum window size n and maximum window size m. Process the subsequent image frames: read the next frame as the current frame; if its feature count is greater than the second threshold, add it to the sliding window, otherwise discard it. Repeat these operations until the nth initialization key frame Fn brings the window to its minimum size n, then continue adding key frames until the mth key frame Fm is obtained, at which point the window reaches its maximum size m.
Step 1.2.2: GMS screens the matching point pairs; this step is identical to step 1.1.2. The matching point pairs between the initial frames F1 and Fn are finally pruned to obtain the static matching point set {p_c, p_r}.
Step 1.2.3: this stage completes feature triangulation. The RGB-D type is simpler: for the correct matching points in Fn, recover the depth information from the depth map to create the static initial map. For binocular, perform binocular triangulation on the correct matching points in Fn: for any static matching pair p_L(u_L, v_L) and p_R(u_R, v_R) between the left and right frames, the corresponding 3D point P follows from the binocular geometry:
z = f · b / d,  d = u_L − u_R
where z represents the depth of P, b represents the binocular baseline, f is the camera focal length, and d is the disparity between the two images. Processing in this way finally yields the initial map information.
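The binocular depth relation z = f·b/d can be checked with a small sketch; all numeric values and the function name are illustrative, not from the patent.

```python
# Binocular depth recovery of step 1.2.3: depth from disparity, then
# pinhole back-projection of the left-image pixel to a 3D point.
def stereo_point(uL, vL, uR, f, b, cx, cy):
    """Back-project a left-image pixel (uL, vL) with right-image match
    column uR to a 3D point in the left camera frame."""
    d = uL - uR                      # disparity between the two images
    z = f * b / d                    # depth from the binocular geometry
    x = (uL - cx) * z / f            # pinhole back-projection
    y = (vL - cy) * z / f
    return (x, y, z)

# f = 500 px, baseline b = 0.1 m, principal point (320, 240): a point
# with 10 px disparity lies at depth 500 * 0.1 / 10 = 5 m.
print(stereo_point(uL=345.0, vL=240.0, uR=335.0, f=500.0, b=0.1, cx=320.0, cy=240.0))
```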
Step 2: and tracking and positioning of the SLAM system are realized by combining GMS.
When the system is initialized successfully, it enters the tracking thread with the initialization key frame F1. When a new image frame is received, the camera pose is estimated with the reference-frame tracking model or the constant-velocity tracking model; then the local key frames and local map points are updated, the local map points are re-projected into the current frame, a graph optimization model is established, and the pose is further optimized. Step 2 divides into two main parts, tracking and local map creation, followed by the system's loop detection and global optimization.
Step 2.1: the tracking and mapping implementation in this step inherits the basic idea of ORB-SLAM. Image information entering the system is tracked in different ways by the reference-frame tracking model and the constant-velocity tracking model.
Step 2.1.1: the SLAM system first enters the constant-velocity tracking model, takes the pose transformation of the previous frame as the initial pose ξ of the current frame, and projects the map points of the reference frame into the current frame to complete the 3D-2D data association:
s_i u_i = K exp(ξ^) P_i
where the Lie algebra element ξ^ represents the camera pose, u_i is the pixel coordinate of an observed point, s_i is the scale information, K is the camera intrinsic matrix, P_i is the 3D coordinate of the spatial point, and ξ* denotes the camera pose to be optimized. BA optimization then minimizes the reprojection error to optimize the pose:
ξ* = argmin_ξ Σ_i ‖ u_i − (1/s_i) K exp(ξ^) P_i ‖²
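The projection model s_i u_i = K exp(ξ^) P_i used for the 3D-2D association can be sketched as follows. For simplicity the pose is passed as a 4 × 4 matrix rather than a Lie-algebra increment, and K and the points are illustrative.

```python
import numpy as np

def project(K, T_cw, P_w):
    """s * u = K * (T_cw * P_w): project world points (N x 3) to pixel
    coordinates with intrinsics K (3x3) and world-to-camera pose T_cw
    (4x4 homogeneous matrix)."""
    homo = np.hstack([P_w, np.ones((len(P_w), 1))])
    cam = (homo @ T_cw.T)[:, :3]          # points in the camera frame
    uvs = cam @ K.T                       # s_i * u_i (homogeneous pixels)
    return uvs[:, :2] / uvs[:, 2:3]       # divide out the scale s_i

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P_w = np.array([[0.0, 0.0, 5.0], [0.5, -0.5, 2.0]])
print(project(K, np.eye(4), P_w).tolist())
```

BA optimization would then adjust the pose so that these projections minimize the reprojection error against the observed pixels.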
Step 2.1.2: the 3D-2D matching-point filtering and rejection check of the traditional ORB-SLAM system is adopted to obtain the number of matches between the final map points and the feature points. If the number of matches is greater than or equal to a third threshold (e.g. matches >= 10), jump directly to step 2.2; otherwise enter step 2.1.3 to track the reference frame.
Step 2.1.3: the SLAM system enters the reference frame tracking model. The pose transformation of the previous frame is used as the initial pose of the current frame, feature matching between the current frame and the reference key frame is performed, and the map points corresponding to the feature points in the reference key frame are associated with the matching feature points of the current frame. GMS is adopted for the feature-point matching, dynamic feature point pairs are removed, and static 3D-2D data associations are formed; a BoW bag-of-words method is used to accelerate the matching, and finally the pose is solved by minimizing the reprojection error through BA optimization. With reference to step 2.1.2, the 3D-2D matching check is performed: if the matching number is greater than or equal to the third threshold, jump directly to step 2.2; otherwise, enter step 2.1.4 for repositioning.
Step 2.1.4: the SLAM system enters the repositioning model. When there are many moving objects in the scene, static feature points are insufficient, or the camera moves too fast, pose tracking is lost and the system enters the repositioning module. Similar to ORB-SLAM2, candidate key frames having a co-visibility relationship with the current frame are computed here through BoW. The current frame is then feature-matched with the candidate key frames using GMS grid-based motion statistics. When the number of matched feature points is greater than a fourth threshold (for example, 30), the pose of the current frame is estimated by combining the PNP algorithm and the RANSAC algorithm, and the map points of the current frame corresponding to the local map are obtained using the BA optimization algorithm. If the number of map points of the current frame is greater than a fifth threshold (for example, 50), the repositioning is successful, and step 2.2 is entered.
Step 2.2: after the inter-frame tracking is completed, the local map is tracked, the fusion of the tracking data and the local map data is completed, and then whether to generate a key frame is judged.
Step 2.2.1: in the local map tracking model, a local map is established for tracking using the initial pose of the current frame. First, the set of key frames that observe the map points of the current frame is taken as the first-level connected key frames; frames with high co-visibility with the first-level connected key frames, together with parent and child key frames, form the local key frames. Then the map points in the local key frames are used as local map points; those local map points observable from the current frame are re-projected onto the current frame, a map optimization model with more observation edges is established, and the pose of the current frame is further optimized.
Step 2.2.2: key frame creation. After local map tracking is completed, the current frame is examined to judge whether it should be packaged as a key frame. The preset conditions adopted are: the ratio of the number of feature points tracked by the current frame to the total number of feature points of the reference frame is less than a certain threshold; the number of key frames waiting for processing in the local mapping thread is not more than 3; and the interval from the last key frame is at least 8 frames. If the preset conditions are not met, the tracking thread returns to track the next image frame; otherwise a key frame is created, and GMS feature matching between the current key frame and the reference key frame is performed to obtain the static features of the current frame for triangulated creation of map points. The single-frame static point cloud map constructed from each key frame is C_k.
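The key-frame decision above can be condensed into a single predicate (a sketch; the 0.9 tracking ratio is an assumed value, since the text only specifies "a certain threshold"):

```python
def need_new_keyframe(n_tracked, n_ref_features, n_queued_keyframes,
                      frames_since_last_kf, ratio_thresh=0.9,
                      max_queue=3, min_interval=8):
    """Return True when the current frame should be promoted to a key
    frame, following the three preset conditions of step 2.2.2."""
    return (n_tracked < ratio_thresh * n_ref_features  # tracking has decayed
            and n_queued_keyframes <= max_queue        # mapping thread not overloaded
            and frames_since_last_kf >= min_interval)  # minimum key-frame spacing
```

For example, a frame that tracks 50 of the reference frame's 100 features, with an idle mapping queue and 10 frames since the last key frame, would be promoted.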
Step 3: loop detection and global optimization are carried out, and a static point cloud map is constructed according to the RGB-D images.
Step 3.1: loop detection and global optimization. For a new key frame, the co-visibility relationships with all key frames are still computed, and key frames that have a co-visibility relationship with the current key frame but are not directly connected are taken as candidate key frames. Feature matching is performed between the candidate key frames and the current key frame using the GMS algorithm, and the pose is solved and optimized by combining the PNP algorithm and the RANSAC algorithm (the pose of a monocular camera has 7 degrees of freedom; the poses of binocular and RGB-D cameras have 6 degrees of freedom). After the closed-loop key frame is determined, the poses of all key frames in the local map are optimized using the pose graph model. Finally, all key frames and map points of the whole local map are optimized using the BA algorithm.
Step 3.2: static map building. This step is directed at the RGB-D data set and is divided into two stages: first, the features are judged and screened; second, the depth maps are aggregated to complete the fusion of the 3D points.
Step 3.2.1: motion detection is performed on the key frames. The reference key frame KF_ref and the current key frame KF_cur are input into a trained YOLO v3 network to segment the potential motion region S_ref of the reference key frame and the potential motion region S_cur of the current key frame. Let p_r be a feature point in the reference key frame, p_c the pixel point obtained by projecting it into the current frame, and Z the depth information of the depth map at the projected point. The corresponding projected depth value Z_proj can be calculated according to the camera model, giving the depth difference:
ΔZ = Z_proj − Z
Simultaneously, the angle α between p_r and p_c with respect to the corresponding spatial point p_w is calculated. When α <= 30° and ΔZ > T_z, where T_z is a depth threshold generally set to 1, the point p_c is judged to be a dynamic point according to the above condition, and the point is filtered out of the current frame. All points of KF_cur are processed in this way to finally obtain the static feature point set.
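The depth-difference and view-angle test can be sketched as follows (a sketch under the assumptions that α is the angle between the viewing rays to the spatial point p_w from the two camera centres, and that Z_proj is approximated by the ray length; the helper name is illustrative):

```python
import numpy as np

def is_dynamic(p_w, c_ref, c_cur, z_measured, T_z=1.0, alpha_max_deg=30.0):
    """Depth/angle test of step 3.2.1.
    p_w: 3D point in world coordinates; c_ref, c_cur: camera centres of
    the reference and current key frames; z_measured: depth-map reading
    Z at the projected pixel p_c."""
    z_proj = np.linalg.norm(p_w - c_cur)   # projected depth Z_proj (sketch: ray length)
    dz = z_proj - z_measured               # ΔZ = Z_proj - Z
    v1, v2 = p_w - c_ref, p_w - c_cur      # viewing rays from the two centres
    cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    alpha = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
    # α <= 30° and ΔZ > T_z  →  the point has moved: mark it dynamic
    return bool(alpha <= alpha_max_deg and dz > T_z)
```

A point whose measured depth is much smaller than its projected depth, seen from nearly the same direction, is flagged and removed from the static set.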
Step 3.2.2: the single-frame static point cloud map constructed from each key frame is C_k, and its pose with respect to the world coordinate system is T_k. The point clouds of all frames are converted into the world coordinate system to construct the global static point cloud map W, namely:

W = ⋃_{k=1}^{N} T_k · C_k
and finally splicing the total number of the N key frame sets, namely the number of the key frames representing the global map to complete the static point cloud map.
Step 4: after the data processing is finished, the pose trajectory is output, and the static point cloud map is output according to the RGB-D images.
The dynamic visual SLAM method based on GMS and motion detection provided by the embodiment of the invention at least comprises the following technical effects:
(1) The method of combining GMS and motion detection can effectively eliminate the influence of dynamic features and is integrated into each functional module of the SLAM system; it solves the problems of visual SLAM positioning and mapping in dynamic scenes, with good real-time performance and high positioning accuracy.
(2) Compared with the mostly sparse maps formed by traditional feature-based SLAM systems, an RGB-D dense map is added; the resulting dense map can embody more environmental characteristics and more environmental detail information than a sparse map.
(3) The method adds a sliding-window mechanism in the initialization process to establish a static initial map without dynamic-object feature points. Compared with the prior art, in which the time interval and motion amplitude between adjacent key frames are small and dynamic feature points are therefore difficult to identify, the sliding window allows key frames separated by a certain time interval to be matched, so dynamic points can be better distinguished and removed, more thoroughly static feature points are obtained, and the quality of the initial map points is improved.
(4) The GMS is added in the initialization process, and the GMS can judge the dynamic feature points on the basis of the matched feature point pairs, so that the motion information can be better eliminated.
(5) According to the invention, the potential motion region can be segmented using a YOLO v3 network, and the projected-point depth and the rotation angle are used to judge whether it is a dynamic region; if so, the feature points of the corresponding region are removed to finally obtain the static feature points, so that the motion information is better removed.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to examples, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.
Claims (7)
1. A GMS and motion detection based dynamic visual SLAM method, comprising the steps of:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
the constructing of the static point cloud map comprises: judging and screening the characteristics; the depth map is collected to complete the fusion of 3D points;
the specific implementation manner of judging and screening the features is as follows:
performing motion detection on the key frames, and inputting the reference key frame and the current key frame into a trained YOLO v3 network to segment the potential motion region S_ref of the reference key frame and the potential motion region S_cur of the current key frame; letting p_r be a feature point in the reference key frame, p_c the pixel point obtained by projecting it into the current key frame, and Z the depth information of the corresponding depth map at the projected point p_c; calculating the projected depth value Z_proj corresponding to p_c according to the camera model, and obtaining a depth difference value:
ΔZ = Z_proj − Z
calculating the angle α between p_r and p_c with respect to the corresponding spatial point p_w; when α and ΔZ meet the preset condition, judging the pixel point p_c to be a dynamic point and screening it out of the current frame; processing all points in the current key frame according to this method to finally obtain the static feature point set;
the specific implementation manner of aggregating the depth maps to complete the fusion of the 3D points is as follows:
recording the single-frame static point cloud map constructed from each key frame as C_k, and the pose of the camera relative to the world coordinate system at the moment corresponding to the key frame as T_k; converting the point clouds of all key frames into the world coordinate system, and constructing the global static point cloud map W, expressed as follows:

W = ⋃_{k=1}^{N} T_k · C_k
wherein N represents the total number of key frame sets;
step 4, outputting a pose track after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
2. The GMS and motion detection based dynamic visual SLAM method according to claim 1, wherein in step 1, the initialization operation corresponding to the monocular image comprises: selecting initialization key frames using a sliding window; matching the feature points by adopting GMS; solving the initial-frame camera pose; and obtaining an initial map through feature triangulation.
3. The GMS and motion detection based dynamic visual SLAM method of claim 2, wherein the specific implementation of the initialization key frame selection using the sliding window is: reading a first frame image, extracting feature points of the image, and calculating the number of the feature points; if the number of the feature points is greater than a first threshold, determining that the image is the first initialization key frame F_1; otherwise, ignoring the frame image, and performing feature extraction and threshold verification on subsequent images until the threshold requirement is met to obtain the first initialization key frame; establishing a sliding window on the basis of the first initialization key frame F_1, wherein the minimum window size of the sliding window is n and the maximum window size is m; reading the next image frame as the current frame; if the number of feature points of the current frame is greater than the first threshold, adding the current frame into the sliding window, otherwise discarding it; repeating the above operations and processing key frames until the nth initialization key frame F_n is added and the window reaches the minimum size n, and then continuing to process and add key frames until the mth key frame F_m is obtained, at which point the window reaches the maximum size m.
4. The GMS and motion detection based dynamic vision SLAM method according to claim 1, wherein the initialization operation corresponding to the binocular image and the RGB-D image in step 1 comprises: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; and obtaining an initial map through feature triangulation.
5. The GMS and motion detection based dynamic visual SLAM method of claim 4, wherein the specific implementation of the initialization key frame selection using the sliding window is as follows: reading a first frame image, extracting feature points of the image, and calculating the number of the feature points; if the number of the feature points is greater than a second threshold, determining that the first frame image is the first initialization key frame F_1; otherwise, ignoring the frame image, and performing feature extraction and threshold verification on subsequent images until a key frame meeting the threshold requirement is obtained; establishing a sliding window on the basis of the first initialization key frame F_1, wherein the minimum window size of the sliding window is n; reading the next image frame as the current frame; if the number of feature points of the current frame is greater than the second threshold, adding the current frame into the sliding window, otherwise discarding it; repeating the above operations and processing key frames until the nth initialization key frame F_n is added and the window reaches the minimum size n, and then continuing to process and add key frames until the mth key frame F_m is obtained, at which point the window reaches the maximum size m.
6. The GMS and motion detection based dynamic visual SLAM method according to claim 1, wherein said step 2 includes:
2.1, tracking any image information of monocular, binocular or RGB-D type in an SLAM system;
2.2, tracking the local map, completing the fusion tracking of the tracking data and the local map data, and judging whether to generate a key frame;
wherein, the step 2.1 specifically comprises the following substeps:
step 2.1.1, the SLAM system enters a constant-speed tracking model, the pose transformation of the previous frame is used as the initial pose of the current frame, and map points in the reference frame are projected to the current frame to complete 3D-2D data association;
step 2.1.2, performing the 3D-2D matching check to obtain the matching number between the final map points and the feature points, and jumping directly to step 2.2 if the matching number is greater than or equal to a third threshold; otherwise, entering step 2.1.3 to track the reference frame;
step 2.1.3, the SLAM system enters the reference frame tracking model; the pose transformation of the previous frame is used as the initial pose of the current frame; through feature matching between the current frame and the reference key frame, the map points corresponding to the feature points in the reference key frame are associated with the matching feature points of the current frame; GMS is adopted to perform the feature-point matching and remove dynamic feature point pairs, forming static 3D-2D data associations; a BoW bag-of-words method is used to accelerate the matching, and finally the pose is solved by minimizing the reprojection error through BA optimization; the 3D-2D matching check is performed to obtain the matching number between the final map points and the feature points; if the matching number is greater than or equal to the third threshold, jump directly to step 2.2, otherwise enter step 2.1.4 for repositioning;
step 2.1.4, the SLAM system enters the repositioning model; candidate key frames having a co-visibility relationship with the current frame are computed through BoW, and feature matching between the current frame and the candidate key frames is performed using GMS grid-based motion statistics; when the number of matched feature points is greater than a fourth threshold, the pose of the current frame is estimated by combining the PNP algorithm and the RANSAC algorithm, and the map points of the current frame corresponding to the local map are obtained using the BA optimization algorithm; if the number of map points of the current frame is greater than a fifth threshold, the repositioning is successful.
7. The GMS and motion detection based dynamic visual SLAM method according to claim 6, characterised in that said step 2.2 comprises in particular the sub-steps of:
2.2.1, the SLAM system enters a local map tracking model, and a local map is established for tracking by using the initial pose of the current frame;
and 2.2.2, after the local map tracking is completed, judging the current frame, creating a key frame, performing GMS feature matching on the current key frame and a reference key frame to obtain the static feature of the current frame, and triangularizing to create map points.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011282866.8A CN112418288B (en) | 2020-11-17 | 2020-11-17 | GMS and motion detection-based dynamic vision SLAM method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112418288A CN112418288A (en) | 2021-02-26 |
CN112418288B true CN112418288B (en) | 2023-02-03 |
Family
ID=74831325
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113282088A (en) * | 2021-05-21 | 2021-08-20 | 潍柴动力股份有限公司 | Unmanned driving method, device and equipment of engineering vehicle, storage medium and engineering vehicle |
CN113506325B (en) * | 2021-07-15 | 2024-04-12 | 清华大学 | Image processing method and device, electronic equipment and storage medium |
CN113781574B (en) * | 2021-07-19 | 2024-04-12 | 长春理工大学 | Dynamic point removing method for binocular refraction and reflection panoramic system |
CN115830110B (en) * | 2022-10-26 | 2024-01-02 | 北京城市网邻信息技术有限公司 | Instant positioning and map construction method and device, terminal equipment and storage medium |
CN115567658B (en) * | 2022-12-05 | 2023-02-28 | 泉州艾奇科技有限公司 | Method and device for keeping image not deflecting and visual earpick |
CN116299500B (en) * | 2022-12-14 | 2024-03-15 | 江苏集萃清联智控科技有限公司 | Laser SLAM positioning method and device integrating target detection and tracking |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106056643A (en) * | 2016-04-27 | 2016-10-26 | 武汉大学 | Point cloud based indoor dynamic scene SLAM (Simultaneous Location and Mapping) method and system |
CN107392964A (en) * | 2017-07-07 | 2017-11-24 | 武汉大学 | The indoor SLAM methods combined based on indoor characteristic point and structure lines |
CN107871327A (en) * | 2017-10-23 | 2018-04-03 | 武汉大学 | The monocular camera pose estimation of feature based dotted line and optimization method and system |
CN107917710A (en) * | 2017-11-08 | 2018-04-17 | 武汉大学 | A kind of positioning in real time of the interior based on single line laser and three-dimensional map construction method |
CN109558790A (en) * | 2018-10-09 | 2019-04-02 | 中国电子科技集团公司电子科学研究院 | A kind of pedestrian target detection method, apparatus and system |
CN109974743A (en) * | 2019-03-14 | 2019-07-05 | 中山大学 | A kind of RGB-D visual odometry optimized based on GMS characteristic matching and sliding window pose figure |
CN110009732A (en) * | 2019-04-11 | 2019-07-12 | 司岚光电科技(苏州)有限公司 | Based on GMS characteristic matching towards complicated large scale scene three-dimensional reconstruction method |
CN110807377A (en) * | 2019-10-17 | 2020-02-18 | 浙江大华技术股份有限公司 | Target tracking and intrusion detection method, device and storage medium |
CN111161318A (en) * | 2019-12-30 | 2020-05-15 | 广东工业大学 | Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching |
CN111708042A (en) * | 2020-05-09 | 2020-09-25 | 汕头大学 | Robot method and system for pedestrian trajectory prediction and following |
CN111797688A (en) * | 2020-06-02 | 2020-10-20 | 武汉大学 | Visual SLAM method based on optical flow and semantic segmentation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111833358A (en) * | 2020-06-26 | 2020-10-27 | 中国人民解放军32802部队 | Semantic segmentation method and system based on 3D-YOLO |
Non-Patent Citations (3)
Title |
---|
"DM-SLAM: A Feature-Based SLAM System for Rigid Dynamic Scenes";Junhao Cheng,Zhi Wang,Hongyan Zhou,Li Li and Jian Yao;《IJGI》;20200327;第9卷;1-18页 * |
"DMS-SLAM: A General Visual SLAM System for Dynamic Scenes with Multiple Sensors";Guihua Liu,Weilin Zeng,Bo Feng and Feng Xu;《Sensors》;20190827;第19卷;1-20页 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||