CN112418288B - GMS and motion detection-based dynamic vision SLAM method - Google Patents


Info

Publication number
CN112418288B
Authority
CN
China
Prior art keywords
frame
key frame
map
gms
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011282866.8A
Other languages
Chinese (zh)
Other versions
CN112418288A (en)
Inventor
姚剑 (Yao Jian)
卓德胜 (Zhuo Desheng)
程军豪 (Cheng Junhao)
龚烨 (Gong Ye)
涂静敏 (Tu Jingmin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University (WHU)
Priority to CN202011282866.8A
Publication of CN112418288A
Application granted
Publication of CN112418288B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 Geographic models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Remote Sensing (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of visual spatial positioning, and discloses a dynamic visual SLAM method based on GMS (Grid-based Motion Statistics) and motion detection, which comprises the following steps: initializing the SLAM system by combining GMS and a sliding window to obtain an initial map; realizing tracking and positioning of the SLAM system in combination with GMS; carrying out loop detection and global optimization; constructing a static point cloud map according to the RGB-D images; and, after data processing is finished, outputting the pose trajectory and the static point cloud map built from the RGB-D images. The invention solves the problem of poor SLAM tracking and positioning performance in dynamic environments in the prior art. The method effectively eliminates the influence of dynamic features, integrates this capability into each functional module of the SLAM system, solves the problems of visual SLAM positioning and mapping in dynamic scenes, and offers good real-time performance and high positioning accuracy.

Description

GMS and motion detection-based dynamic vision SLAM method
Technical Field
The invention relates to the technical field of visual space positioning, in particular to a dynamic visual SLAM method based on GMS and motion detection.
Background
Simultaneous localization and mapping (SLAM) is a core technology for realizing the functions of intelligent mobile robots. SLAM incrementally and dynamically constructs a map model of the current environment from the data stream acquired by a laser or visual sensor, and performs positioning during the map construction.
However, neither the feature-point method nor the direct method can handle the problems caused by dynamic objects, which are common in real scenes. Most current visual SLAM methods still rest on the assumption that the observed environment is static: moving objects in the environment create many erroneous data associations, or cause previously tracked map points to be lost through occlusion, which severely degrades the camera pose estimation, corrupts the positioning of the whole visual SLAM system, and affects the robot's subsequent tasks. Studying the positioning accuracy and mapping quality of visual SLAM in dynamic environments is therefore of great significance.
In recent years, many scholars have begun to study the visual SLAM problem in dynamic scenes. For example, the main advantage of methods based on foreground-background initialization is the ability to track temporarily stopped dynamic objects, but they require predefined information about the background or the objects and are less effective when the environment contains many moving objects. Methods based on geometric constraints have a clear mathematical theory and low computational cost, but struggle to distinguish residuals caused by moving objects from residuals caused by mismatches, and require the initial camera pose to be known in advance. Optical-flow methods can process rich information but, resting on the brightness-constancy assumption, are sensitive to illumination changes. Deep-learning-based methods have the best overall effect, but parameter tuning is complex and real-time performance cannot be guaranteed.
Disclosure of Invention
The invention provides a dynamic visual SLAM method based on GMS and motion detection, and solves the problem of poor SLAM tracking and positioning effects in a dynamic environment in the prior art.
The invention provides a dynamic vision SLAM method based on GMS and motion detection, which comprises the following steps:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
step 4, outputting a pose track after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
Preferably, in step 1, the initializing operation corresponding to the monocular image includes: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; solving the initial frame camera pose; and obtaining an initial map through feature triangulation.
Preferably, the selection of the initialization key frame using a sliding window is implemented as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a first threshold value, the image is taken as the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until the threshold requirement is met and the first initialization key frame is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the first threshold value it is added to the sliding window, otherwise it is discarded; the above operations are repeated until the nth initialization key frame F_n is added and the window reaches the minimum size n, and key frames continue to be added until the mth key frame F_m is obtained, at which point the window reaches the maximum size m.
Preferably, in step 1, the initialization operation corresponding to the binocular image and the RGB-D image includes: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; and obtaining an initial map through feature triangulation.
Preferably, the selection of the initialization key frame using a sliding window is implemented as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a second threshold value, the image is taken as the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until the threshold requirement is met and the key frame is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the second threshold value it is added to the sliding window, otherwise it is discarded; the above operations are repeated until the nth initialization key frame F_n is added and the window reaches the minimum size n, and key frames continue to be added until the mth key frame F_m is obtained, at which point the window reaches the maximum size m.
Preferably, the step 2 comprises:
2.1, tracking any image information of monocular, binocular or RGB-D type in an SLAM system;
2.2, tracking the local map, completing fused tracking of the tracking data and the local map data, and deciding whether to create a key frame;
wherein, the step 2.1 specifically comprises the following substeps:
step 2.1.1, the SLAM system enters a constant-velocity tracking model, the pose transformation of the previous frame is used as the initial pose of the current frame, and map points in the reference frame are projected into the current frame to complete the 3D-2D data association;
step 2.1.2, a 3D-2D matching check is performed to obtain the final number of matches between map points and feature points; if the number of matches is greater than or equal to a third threshold value, step 2.2 is jumped to directly; otherwise step 2.1.3 is entered to track the reference frame;
step 2.1.3, the SLAM system enters a reference-frame tracking model; the pose transformation of the previous frame is used as the initial pose of the current frame, and through feature matching between the current frame and the reference key frame, map points corresponding to feature points in the reference key frame are associated to the matching feature points of the current frame; GMS is adopted for the feature matching to identify and eliminate dynamic feature point pairs, forming static 3D-2D data associations; the BoW (bag-of-words) method is used to accelerate matching, and finally the pose is solved by BA optimization minimizing the reprojection error; a 3D-2D matching check is performed to obtain the final number of matches between map points and feature points, and if it is greater than or equal to the third threshold value, step 2.2 is jumped to directly, otherwise step 2.1.4 is entered for relocalization;
step 2.1.4, the SLAM system enters a relocalization model; candidate key frames having a co-visibility relation with the current frame are calculated through BoW, and feature matching between the current frame and the candidate key frames is carried out using GMS grid motion statistics; when the number of matched feature points is greater than a fourth threshold value, the pose of the current frame is estimated by combining the PnP algorithm and the RANSAC algorithm, and map points of the current frame corresponding to the local map are obtained using the BA optimization algorithm; if the number of map points of the current frame is greater than a fifth threshold value, the relocalization is successful.
Preferably, the step 2.2 specifically comprises the following substeps:
2.2.1, the SLAM system enters a local map tracking model, and a local map is established for tracking by utilizing the initial pose of the current frame;
and 2.2.2, after the local map tracking is finished, the current frame is evaluated and a key frame is created when warranted; GMS feature matching is performed between the current key frame and a reference key frame to obtain the static features of the current frame, which are triangulated to create map points.
Preferably, the constructing of the static point cloud map in step 3 comprises: judging and screening the features; and aggregating the depth maps to complete the fusion of the 3D points.
Preferably, in step 3, the judging and screening of the features is implemented as follows:
motion detection is performed on the key frames: the reference key frame and the current key frame are input into a trained YOLO v3 network to segment the potential motion region S_ref of the reference frame and the potential motion region S_cur of the current frame; p_r is a feature point in the reference key frame and p_c the pixel it projects to in the current key frame; the depth information of the projected point p_c in the corresponding depth map is Z, and the projected depth value Z_proj corresponding to p_c is calculated according to the camera model, giving the depth difference:

ΔZ = Z_proj − Z

The angle α between the viewing rays from p_r and p_c to the corresponding spatial point p_w is calculated; when α and ΔZ meet the preset conditions, the pixel p_c is judged to be a dynamic point and is screened out of the current frame; all points in the current key frame are processed according to this method, finally obtaining the static feature point set.
Preferably, in step 3, the fusion of the 3D points from the aggregated depth maps is implemented as follows:
the single-frame static point cloud map constructed from each key frame is denoted C_k, and the pose of the camera relative to the world coordinate system at the moment corresponding to the key frame is denoted T_k; the point clouds of all key frames are converted into the world coordinate system to construct the global static point cloud map W, expressed as:

W = ⋃_{k=1}^{N} T_k · C_k

where N represents the total number of key frames in the set.
One or more technical schemes provided by the invention at least have the following technical effects or advantages:
in the method, firstly, an SLAM system is initialized by combining GMS and a sliding window to obtain an initial map; then, tracking and positioning of the SLAM system are realized by combining GMS; then, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image; finally, outputting a pose track after data processing is finished; and outputting a static point cloud map according to the RGB-D image. The visual SLAM method provided by the invention combines GMS (grid motion statistics) and motion detection, and compared with the existing traditional processing method, the visual SLAM method can well eliminate the influence of dynamic characteristics and integrate the dynamic characteristics into each functional module of an SLAM system, solves the problems of visual SLAM positioning and drawing in a dynamic scene, and has good real-time performance and higher positioning precision. In addition, compared with the traditional characteristic-based SLAM system which forms mostly sparse maps, the method adds static map threads based on RGB-D data types, and the formed dense maps can embody more environmental characteristics and more environmental detail information compared with the sparse maps.
Drawings
Fig. 1 is an overall flowchart of a dynamic visual SLAM method based on GMS and motion detection according to an embodiment of the present invention.
Detailed Description
In order to better understand the technical solution, it is described in detail below with reference to the drawings and specific embodiments.
The embodiment provides a dynamic visual SLAM method based on GMS and motion detection, which comprises the following steps:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
step 4, outputting a pose track after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
The present invention is further described below.
The embodiment provides a dynamic visual SLAM method based on GMS and motion detection, which specifically includes the following steps, with reference to fig. 1:
step 1: and initializing the SLAM system by combining the GMS and the sliding window to obtain an initial map.
Namely, the GMS and the sliding window are combined to carry out initialization operation on the SLAM system, and dynamic characteristic influence in an operation scene is eliminated.
Corresponding initialization operations are completed for the different data types (monocular, binocular, and RGB-D): monocular initialization is step 1.1, while initialization of the binocular and RGB-D types jumps directly to step 1.2.
Step 1.1: monocular initialization is mainly divided into four stages, wherein in the first stage, a sliding window is utilized to select an initial frame; in the second stage, GMS is adopted to match the feature points; in the third stage, solving the initial frame camera pose; and triangularizing the features at the fourth stage to obtain an initial map.
Step 1.1.1: Initialization key frames are selected with a sliding window. The first frame image is read, its feature points are extracted and counted; if the number of feature points exceeds a first threshold (e.g. 100), the frame becomes the first initialization key frame F_1; otherwise the next frame is read until a qualifying initialization key frame is obtained. A sliding window is established on the basis of F_1, with minimum window size n and maximum window size m. The purpose of the sliding window is to obtain an initial key frame F_1 and a key frame F_n or F_m separated by a certain time interval, so that feature matching with the robust GMS algorithm can reject dynamic features and yield relatively pure feature points; this achieves a more thorough rejection than applying GMS to dynamic mismatches between adjacent key frames. Subsequent image frames are then processed: the next image frame is read as the current frame, and if its feature count exceeds the first threshold it is added to the sliding window, otherwise it is discarded. This is repeated until the nth initialization key frame F_n is added and the window reaches its minimum size n, and processing continues until the mth key frame F_m is obtained, at which point the window reaches its maximum size m.
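As an illustrative sketch only (not the patented implementation), the sliding-window key-frame selection of step 1.1.1 might look as follows in Python, assuming OpenCV's ORB extractor and treating the threshold and window sizes as example values:

```python
import cv2

def select_init_keyframes(frames, min_feats=100, n=3, m=8):
    """Sliding-window selection of initialization key frames (sketch of step 1.1.1).

    frames    : iterable of grayscale images
    min_feats : first threshold on the feature count (100 is the example value)
    n, m      : minimum and maximum sliding-window sizes (illustrative values)
    """
    orb = cv2.ORB_create(nfeatures=1000)
    window = []  # the sliding window of (image, keypoints, descriptors)
    for img in frames:
        kps, des = orb.detectAndCompute(img, None)
        if des is None or len(kps) <= min_feats:
            continue  # ignore frames with too few features
        window.append((img, kps, des))
        if len(window) >= m:
            break  # window reached its maximum size m
    # F_1 is window[0]; F_n is window[n-1] once the minimum size is reached
    return window
```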
Step 1.1.2: Mismatched point pairs on dynamic objects are rejected with the GMS matching algorithm. Feature matching is performed on the features extracted from the initial frames F_1 and F_n of step 1.1.1 to obtain a matching point set, and each image is divided into G grid cells (typically 20 × 20). For each feature point X_i of the first initialization key frame F_1, the grid cell of F_n that receives the most of its candidate matches is taken as the matching cell; a 3 × 3 grid neighborhood is built with the feature-point cell and the matching cell as centers, and the corresponding matching point sets in this grid neighborhood are denoted {a_k, b_k}, k = 1, 2, ..., K. From these corresponding point sets a composite feature-matching score S_i is computed by the following formula:
S_i = Σ_{k=1}^{K} |χ_{a_k b_k}|

where K = 9 and |χ_{a_k b_k}| is the number of matching points in the neighborhood cell pair (a_k, b_k). The composite score S_i is compared with a threshold t to judge whether a feature match is correct. The threshold t is set as:
t = α · √n

where α is set to 0.6 and n is the average number of feature points in each of the K (i.e. 9) neighborhood grid cells. When the score satisfies S_i > t, the feature point is judged a correct matching point; otherwise it is judged an incorrect matching point. Matching point pairs between the initial key frames are pruned by this rule, yielding the static matching point set {p_c, p_r}.
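A minimal sketch of the GMS scoring rule above, assuming the matches have already been binned into the 3 × 3 grid-cell-pair neighborhood (the grid bookkeeping of a full GMS implementation is omitted):

```python
import math

def gms_accept(cell_pair_counts, avg_feats_per_cell, alpha=0.6):
    """Decide whether a match is correct from its 3x3 grid neighborhood (sketch of step 1.1.2).

    cell_pair_counts   : list of 9 ints, |chi_{a_k b_k}| for each neighborhood cell pair
    avg_feats_per_cell : average number of features per neighborhood cell (n in the text)
    alpha              : scale factor of the threshold (0.6 in the patent text)
    """
    s_i = sum(cell_pair_counts)                 # composite score S_i = sum_k |chi_{a_k b_k}|
    t = alpha * math.sqrt(avg_feats_per_cell)   # threshold t = alpha * sqrt(n)
    return s_i > t                              # S_i > t  ->  correct match
```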
Step 1.1.3: Solving the camera pose. Using the static matching point set {p_c, p_r} obtained in the previous step, the homography matrix H_cr and the fundamental matrix F_cr of the motion are computed in parallel. The normalized-plane coordinates {x_c, x_r} are computed from the matching points, and for the homography matrix H a direct linear transformation system is established:

x_c = H_cr · x_r

The homography matrix H is solved with a RANSAC-based normalized 4-point algorithm. For the fundamental matrix F, an equation is established from the epipolar geometric constraint:

x_c^T · F_cr · x_r = 0

and the fundamental matrix is solved with a RANSAC-based normalized 8-point algorithm. RANSAC rejects outliers to a certain degree; meanwhile a score ratio R_H is computed from the homography matrix H and the fundamental matrix F:

R_H = S_H / (S_H + S_F)

where S_H and S_F are the model scores of the homography matrix and the fundamental matrix respectively. If R_H > 0.45, the camera pose is computed from the H matrix via SVD; otherwise the essential matrix E is computed from the F matrix and the pose is then recovered via SVD. Finally the pose is checked to obtain the optimal solution. If the initialization pose does not meet the requirements, the next frame and the first initialization key frame F_1 are taken as the initial pair and the above operations are repeated, up to the frame F_m at the maximum window size m; if initialization is still unsuccessful, the sliding window is shifted backwards as a whole, its first frame is taken as the new first initialization key frame F_1, and the above operations continue until an initial pose is obtained.
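An illustrative sketch of this model selection using OpenCV's RANSAC estimators; the inlier counts stand in for the model scores S_H and S_F (an assumption, since the patent does not define the scores), and error handling is reduced to a bare minimum:

```python
import cv2
import numpy as np

def initial_pose(pts_r, pts_c, K, rh_thresh=0.45):
    """Choose between homography and fundamental matrix for the initial pose (sketch of step 1.1.3).

    pts_r, pts_c : Nx2 float32 arrays of matched static points {p_r, p_c}
    K            : 3x3 camera intrinsic matrix
    """
    H, h_mask = cv2.findHomography(pts_r, pts_c, cv2.RANSAC, 3.0)
    F, f_mask = cv2.findFundamentalMat(pts_r, pts_c, cv2.FM_RANSAC, 3.0, 0.99)
    if h_mask is None or f_mask is None:
        return None                   # estimation failed; caller retries with another frame pair
    s_h, s_f = int(h_mask.sum()), int(f_mask.sum())  # inlier counts stand in for S_H, S_F
    r_h = s_h / (s_h + s_f)           # R_H = S_H / (S_H + S_F)
    if r_h > rh_thresh:
        # planar or low-parallax scene: decompose H into candidate poses
        _, Rs, ts, _ = cv2.decomposeHomographyMat(H, K)
        return Rs[0], ts[0]           # the patent checks all candidates and keeps the best
    E = K.T @ F @ K                   # essential matrix from the fundamental matrix
    _, R, t, _ = cv2.recoverPose(E, pts_r, pts_c, K)
    return R, t
```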
Step 1.1.4: Feature triangulation to obtain the initial map. The normalized-plane coordinate pair {x_c, x_r} satisfies the geometric relations:

z_c · x_c = T_cw · P_w,    z_r · x_r = T_rw · P_w

where z_c, z_r are the Z-axis coordinates (i.e. depth information) in the corresponding camera coordinate systems, T_cw and T_rw are the pose transformations from the world coordinate system to the current key frame and the reference key frame, and P_w is the corresponding 3D point coordinate. Taking the cross product of each equation with its normalized-plane point eliminates the depth and gives:

x_c^∧ · T_cw · P_w = 0,    x_r^∧ · T_rw · P_w = 0

which can be arranged into a single homogeneous linear system:

A · P_w = 0,    A = [ x_c^∧ T_cw ; x_r^∧ T_rw ]

Performing SVD on this system finally yields the 3D point coordinate P_w. Once the triangulation of {p_c, p_r} is completed, the initial map is obtained and the initialization operation is done.
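A sketch of the DLT triangulation just described, with T_cw and T_rw assumed to be 3 × 4 world-to-camera pose matrices acting on homogeneous world points:

```python
import numpy as np

def skew(x):
    """Cross-product (hat) matrix x^ of a 3-vector."""
    return np.array([[0, -x[2], x[1]],
                     [x[2], 0, -x[0]],
                     [-x[1], x[0], 0]])

def triangulate(x_c, x_r, T_cw, T_rw):
    """DLT triangulation of one point pair (sketch of step 1.1.4).

    x_c, x_r   : normalized-plane coordinates (3-vectors with z = 1)
    T_cw, T_rw : 3x4 world-to-camera pose matrices
    """
    # x^ (T P_w) = 0 for both views; stack into A P_w = 0
    A = np.vstack([skew(x_c) @ T_cw,
                   skew(x_r) @ T_rw])
    _, _, Vt = np.linalg.svd(A)
    P_h = Vt[-1]                 # null vector = homogeneous 3D point
    return P_h[:3] / P_h[3]      # dehomogenize to P_w
```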
Step 1.2: the initialization of binocular/RGB-D is relatively simple and can be divided into three stages, wherein in the first stage, an initial frame is selected by using a sliding window; in the second stage, GMS feature screening is completed; and the third stage is used for completing feature triangulation to obtain an initial map.
Step 1.2.1: This step is similar to step 1.1.1. The first frame image is read, its feature points are extracted and counted; if the count exceeds a second threshold (e.g. 500), the frame becomes the first initialization key frame F_1; otherwise the next frame is read until a qualifying initialization key frame is obtained. A sliding window is established on the basis of F_1, with minimum window size n and maximum window size m. Subsequent image frames are processed: the next frame is read as the current frame and added to the sliding window if its feature count exceeds the second threshold, otherwise it is discarded. This is repeated until the nth initialization key frame F_n is added and the window reaches its minimum size n, and then until the mth key frame F_m is obtained and the window reaches its maximum size m.
Step 1.2.2: GMS screens the matching point pairs; this step is identical to step 1.1.2. The matching point pairs between the initial frames F_1 and F_n are finally pruned to obtain the static matching point set {p_c, p_r}.
Step 1.2.3: This stage completes the feature triangulation. For the RGB-D type it is simpler: at F_n the correct matching points are combined with the depth map to restore depth information and create the static initial map. For the binocular case, binocular triangulation is performed on the correct matching points of F_n: for any static matching pair p_L(u_L, v_L) and p_R(u_R, v_R) between the left and right frames, the corresponding 3D point P is obtained from the binocular geometry as follows:

z = f · b / d,    d = u_L − u_R

where z is the depth information of P, b the binocular baseline, f the camera focal length, and d the disparity between the two images. Processing in this way finally yields the initial map information.
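The stereo depth recovery reduces to one formula; a sketch with the focal length, baseline, and matched pixel columns assumed given:

```python
def stereo_depth(u_left, u_right, f, b):
    """Depth from binocular disparity (sketch of step 1.2.3): z = f*b/d, d = u_L - u_R."""
    d = u_left - u_right          # disparity between the rectified left/right images
    if d <= 0:
        return None               # no valid depth for non-positive disparity
    return f * b / d
```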
Step 2: Tracking and positioning of the SLAM system are realized in combination with GMS.
When the system is successfully initialized, it enters the tracking thread with the initialization key frame F_1. When a new image frame is received, the camera pose is estimated using the reference-frame tracking model or the constant-velocity tracking model; the local key frames and local map points are then updated, the local map points are re-projected into the current frame, a graph optimization model is established, and the pose is further optimized. Step 2 is mainly divided into two parts, tracking and local-map creation, followed by the system's loop detection and global optimization.
Step 2.1: The tracking and mapping implementation of this step inherits the basic idea of ORB-SLAM. Image information entering the system is tracked in different modes using the reference-frame tracking model and the constant-velocity tracking model.
Step 2.1.1: The SLAM system first enters the constant-velocity tracking model; the pose transformation of the previous frame is used as the initial pose ξ of the current frame, and map points in the reference frame are projected into the current frame to complete the 3D-2D data association, which obeys:

ξ* = argmin_ξ Σ_i ‖ u_i − (1/s_i) · K · exp(ξ^) · P_i ‖²₂

where the Lie algebra element ξ^ represents the camera pose, u_i is the pixel coordinate of an observed point, s_i the scale information, K the intrinsic matrix, P_i the 3-dimensional coordinate of the spatial point, and ξ* the camera pose to be optimized. BA optimization then minimizes this reprojection error to refine the pose.
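A sketch of the reprojection residual being minimized; the Lie-algebra pose exp(ξ^) is represented here by its rotation R and translation t, and the surrounding Gauss-Newton/BA solver is omitted:

```python
import numpy as np

def reproj_residual(u, P_w, R, t, K):
    """Reprojection error ||u - (1/s) K (R P_w + t)||_2 for one observation (cf. step 2.1.1).

    u    : observed pixel (2-vector)
    P_w  : 3D map point in world coordinates
    R, t : current camera pose estimate (exp(xi^) split into rotation and translation)
    K    : 3x3 intrinsic matrix
    """
    P_c = R @ P_w + t             # transform the map point into the camera frame
    uv = K @ P_c                  # project; uv[2] is the scale s_i (the depth)
    return u - uv[:2] / uv[2]     # 2D residual to be minimized over the pose
```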
Step 2.1.2: The 3D-2D matching-point filtering and rejection check of the traditional ORB-SLAM system is adopted to obtain the final number of matches between map points and feature points. If the number of matches is greater than or equal to a third threshold (e.g. matches >= 10), jump directly to step 2.2; otherwise enter step 2.1.3 to track the reference frame.
Step 2.1.3: The SLAM system enters the reference-frame tracking model. The pose transformation of the previous frame is used as the initial pose of the current frame; through feature matching between the current frame and the reference key frame, map points corresponding to feature points in the reference key frame are associated to the matching feature points of the current frame. GMS is adopted for this feature matching to identify and remove dynamic feature point pairs, forming static 3D-2D data associations; the BoW (bag-of-words) method is used to accelerate matching, and finally the pose is solved by BA optimization minimizing the reprojection error. A 3D-2D matching check is performed as in step 2.1.2; if the number of matches is greater than or equal to the third threshold, jump directly to step 2.2, otherwise enter step 2.1.4 for relocalization.
Step 2.1.4: The SLAM system enters the relocalization model. When there are many moving objects in the scene, static feature points are insufficient, or the camera moves too fast, pose tracking is lost and the relocalization module is entered. Similar to ORB-SLAM2, candidate key frames having a co-visibility relation with the current frame are computed via BoW. The current frame is then feature-matched against the candidate key frames using GMS grid motion statistics. When the number of matched feature points exceeds a fourth threshold (e.g. 30), the pose of the current frame is estimated by combining the PnP and RANSAC algorithms, and map points of the current frame corresponding to the local map are obtained with the BA optimization algorithm. If the number of map points of the current frame is greater than a fifth threshold (e.g. 50), relocalization succeeds and step 2.2 follows.
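A sketch of the PnP + RANSAC pose estimation used in relocalization, via OpenCV's solvePnPRansac; the match threshold mirrors the example value in the text and the reprojection error bound is an assumed value:

```python
import cv2
import numpy as np

def relocalize(map_pts, img_pts, K, min_matches=30):
    """Estimate the current-frame pose from 3D-2D matches (sketch of step 2.1.4).

    map_pts : Nx3 array of map points from the candidate key frames
    img_pts : Nx2 array of matched pixel coordinates in the current frame
    K       : 3x3 intrinsic matrix
    """
    if len(map_pts) <= min_matches:   # fourth threshold on GMS matches (e.g. 30)
        return None
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(map_pts, np.float32), np.asarray(img_pts, np.float32),
        K, None, reprojectionError=4.0)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)        # rotation vector -> rotation matrix
    return R, tvec                    # initial pose, refined afterwards by BA
```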
Step 2.2: After inter-frame tracking is completed, the local map is tracked, fusing the tracking data with the local-map data; it is then judged whether to generate a key frame.
Step 2.2.1: In the local-map tracking model, a local map is established for tracking using the initial pose of the current frame. First, the set of key frames observing map points of the current frame is taken as the first-level connected key frames; frames with high co-visibility with these first-level key frames, together with parent and child frames, jointly form the local key frames. Map points in the local key frames are then used as local map points; those local map points observable from the current frame are re-projected into the current frame, a graph optimization model with more observation edges is established, and the pose of the current frame is further optimized.
Step 2.2.2: Key-frame creation. After local-map tracking is completed, the current frame is evaluated to decide whether it should be packaged as a key frame, using the following preset conditions: the ratio of the number of feature points tracked in the current frame to the total number of feature points of the reference frame is below a certain threshold; no more than 3 key frames are waiting to be processed in the local mapping thread; and at least 8 frames have passed since the last key frame was inserted. If the preset conditions are not met, control returns to the tracking thread to track the next image frame; otherwise a key frame is created, and GMS feature matching between the current key frame and the reference key frame yields the static features of the current frame for triangulating new map points. The single-frame static point cloud map constructed from each key frame is denoted C_k.
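The three key-frame conditions amount to a small predicate; a sketch in which ratio_thresh is an assumed value, since the patent leaves this threshold unspecified:

```python
def should_create_keyframe(tracked, ref_total, queued_kfs, frames_since_kf,
                           ratio_thresh=0.9, max_queue=3, min_interval=8):
    """Key-frame decision (sketch of step 2.2.2); ratio_thresh is an assumed value."""
    return (tracked / ref_total < ratio_thresh    # tracking of the reference frame is decaying
            and queued_kfs <= max_queue           # local mapping thread is not backed up
            and frames_since_kf >= min_interval)  # at least 8 frames since the last key frame
```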
Step 3: Loop detection and global optimization are performed, and a static point cloud map is constructed according to the RGB-D images.
Step 3.1: Loop detection and global optimization. For a new key frame, the co-visibility relations of all key frames are still computed, and key frames that have a co-visibility relation with it but are not directly connected are taken as candidate key frames. Feature matching between the candidate key frames and the current key frame is performed with the GMS algorithm, and the pose is solved and optimized by combining the PnP and RANSAC algorithms (the monocular camera pose has 7 degrees of freedom; the binocular and RGB-D camera poses have 6 degrees of freedom). After the closed-loop key frame is determined, the poses of all key frames in the local map are optimized with a pose-graph model. Finally, all key frames and map points of the whole local map are optimized with the BA algorithm.
Step 3.2: Static mapping. This step targets the RGB-D data type and is divided into two stages: first, the features are judged and screened; second, the depth maps are aggregated to complete the fusion of the 3D points.
Step 3.2.1: Motion detection is performed on the key frames. The reference key frame KF_ref and the current key frame KF_cur are input into a trained YOLO v3 network to segment the potential motion region S_ref of the reference key frame and S_cur of the current key frame. Let p_r be a feature point in the reference key frame and p_c the pixel it projects to in the current frame; the depth information of the projected point in the corresponding depth map is Z, and the corresponding projected depth value Z_proj can be computed from the camera model, giving the depth difference:
ΔZ = Z_proj − Z
simultaneous calculation of p r And p c With corresponding spatial point p w Angle α therebetween when<=30 ° and Δ Z > T z Wherein T is z Is a depth threshold value, is generally set to 1, and the point p is judged according to the condition of the above formula c For a dynamic point, the point is filtered from the current frame. KF (potassium fluoride) cur All the points are processed according to the method to finally obtain a static characteristic point set.
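A sketch of this per-point motion test; the projection into the current frame and the depth-map lookup are assumed already done, the YOLO v3 region test is reduced to a boolean, and α is interpreted as the angle between the two viewing rays to p_w (an assumption about the patent's geometry):

```python
import numpy as np

def is_dynamic(p_w, c_ref, c_cur, z_proj, z_meas,
               in_motion_region, angle_max_deg=30.0, t_z=1.0):
    """Dynamic-point test (sketch of step 3.2.1).

    p_w              : 3D point in world coordinates
    c_ref, c_cur     : reference / current camera centers in world coordinates
    z_proj           : projected depth Z_proj of p_w in the current frame (camera model)
    z_meas           : depth Z read from the current depth map at the projected pixel
    in_motion_region : True if the pixel lies inside a YOLO v3 potential motion region
    """
    if not in_motion_region:
        return False
    v_r, v_c = p_w - c_ref, p_w - c_cur
    cos_a = v_r @ v_c / (np.linalg.norm(v_r) * np.linalg.norm(v_c))
    alpha = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))  # viewing-ray angle
    dz = z_proj - z_meas                                      # depth difference dZ
    return alpha <= angle_max_deg and dz > t_z                # dynamic if both hold
```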
Step 3.2.2: The single-frame static point cloud map constructed from each key frame is C_k, and its camera pose relative to the world coordinate system is T_k. The point clouds of all key frames are converted into the world coordinate system to construct the global static point cloud map W, namely:

W = ⋃_{k=1}^{N} T_k · C_k

where N is the total number of key frames in the set, i.e. the number of key frames constituting the global map; splicing them completes the static point cloud map.
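A sketch of this point-cloud fusion, assuming each C_k is an array of points in its camera frame and each T_k a 4 × 4 camera-to-world transform:

```python
import numpy as np

def fuse_clouds(clouds, poses):
    """Fuse per-key-frame static clouds into a global map W (sketch of step 3.2.2).

    clouds : list of (M_k, 3) arrays, single-frame static point clouds C_k
    poses  : list of 4x4 camera-to-world transforms T_k
    """
    world_pts = []
    for C_k, T_k in zip(clouds, poses):
        homo = np.hstack([C_k, np.ones((len(C_k), 1))])  # to homogeneous coordinates
        world_pts.append((homo @ T_k.T)[:, :3])          # transform into the world frame
    return np.vstack(world_pts)                          # W = union over k of T_k C_k
```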
Step 4: After data processing is finished, the pose trajectory is output, together with the static point cloud map built from the RGB-D images.
The dynamic visual SLAM method based on GMS and motion detection provided by the embodiment of the invention has at least the following technical effects:
(1) Combining GMS and motion detection effectively eliminates the influence of dynamic features, and this capability is integrated into each functional module of the SLAM system; the method solves the problems of visual SLAM positioning and mapping in dynamic scenes while offering good real-time performance and high positioning accuracy.
(2) Whereas traditional feature-based SLAM systems mostly form sparse maps, the method adds a static-mapping thread for the RGB-D data type; the resulting dense map can embody more environmental characteristics and detail than a sparse map.
(3) The method adds a sliding-window mechanism to the initialization process to establish a static initial map free of dynamic-object feature points. Compared with the prior art, where the small time interval and motion amplitude between adjacent key frames make dynamic feature points difficult to identify, the sliding window matches key frames separated by a certain time interval, so dynamic points are better distinguished and removed, more thoroughly static feature points are obtained, and the quality of the initial map points is improved.
(4) The GMS is added in the initialization process, and the GMS can judge the dynamic feature points on the basis of the matched feature point pairs, so that the motion information can be better eliminated.
(5) According to the invention, the potential motion regions can be segmented with a YOLO v3 network, and the projected-point depth and the viewing-ray angle are used to judge whether a potential motion region is a dynamic region; if it is, the feature points of the corresponding region are removed, finally yielding the static feature points and better eliminating the motion information.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to examples, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (7)

1. A GMS and motion detection based dynamic visual SLAM method, comprising the steps of:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
the constructing of the static point cloud map comprises: judging and screening the features; and aggregating the depth maps to complete the fusion of 3D points;
the judging and screening of the features is implemented as follows:
motion detection is performed on the key frames: the reference key frame and the current key frame are input into a trained YOLO v3 network to segment the potential motion region S_ref of the reference frame and the potential motion region S_cur of the current frame; p_r is a feature point in the reference key frame and p_c the pixel it projects to in the current key frame; the depth information of the projected point p_c in the corresponding depth map is Z, and the projected depth value Z_proj corresponding to p_c is calculated according to the camera model, giving the depth difference:

ΔZ = Z_proj − Z

the angle α between the viewing rays from p_r and p_c to the corresponding spatial point p_w is calculated; when α and ΔZ meet the preset conditions, the pixel p_c is judged to be a dynamic point and is screened out of the current frame; all points in the current key frame are processed according to this method, finally obtaining the static feature point set;
the fusion of the 3D points from the aggregated depth maps is implemented as follows:
the single-frame static point cloud map constructed from each key frame is denoted C_k, and the pose of the camera relative to the world coordinate system at the moment corresponding to the key frame is denoted T_k; the point clouds of all key frames are converted into the world coordinate system to construct the global static point cloud map W, expressed as:

W = ⋃_{k=1}^{N} T_k · C_k

wherein N represents the total number of key frames in the set;
step 4, outputting a pose track after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
2. The GMS and motion detection based dynamic visual SLAM method according to claim 1, wherein in step 1, the initializing operation of monocular image correspondence comprises: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; solving the initial frame camera pose; and obtaining an initial map through feature triangularization.
3. The GMS and motion detection based dynamic visual SLAM method of claim 2, wherein the selection of the initialization key frame using a sliding window is implemented as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a first threshold value, the image is taken as the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until the threshold requirement is met and the first initialization key frame is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the first threshold value it is added to the sliding window, otherwise it is discarded; the above operations are repeated until the nth initialization key frame F_n is added and the window reaches the minimum size n, and key frames continue to be added until the mth key frame F_m is obtained, at which point the window reaches the maximum size m.
4. The GMS and motion detection based dynamic vision SLAM method according to claim 1, wherein the initialization operation corresponding to the binocular image and the RGB-D image in step 1 comprises: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; and obtaining an initial map through feature triangulation.
5. The GMS and motion detection based dynamic visual SLAM method of claim 4, wherein the selection of the initialization key frame using a sliding window is implemented as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a second threshold value, the image is taken as the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until the threshold requirement is met and the key frame is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the second threshold value it is added to the sliding window, otherwise it is discarded; the above operations are repeated until the nth initialization key frame F_n is added and the window reaches the minimum size n, and key frames continue to be added until the mth key frame F_m is obtained, at which point the window reaches the maximum size m.
6. The GMS and motion detection based dynamic visual SLAM method according to claim 1, wherein said step 2 includes:
2.1, tracking any image information of monocular, binocular or RGB-D type in an SLAM system;
2.2, tracking the local map, completing fused tracking of the tracking data and the local map data, and deciding whether to create a key frame;
wherein, the step 2.1 specifically comprises the following substeps:
step 2.1.1, the SLAM system enters a constant-velocity tracking model, the pose transformation of the previous frame is used as the initial pose of the current frame, and map points in the reference frame are projected into the current frame to complete the 3D-2D data association;
step 2.1.2, a 3D-2D matching check is performed to obtain the final number of matches between map points and feature points, and if the number of matches is greater than or equal to a third threshold value, step 2.2 is jumped to directly; otherwise step 2.1.3 is entered to track the reference frame;
step 2.1.3, the SLAM system enters a reference-frame tracking model; the pose transformation of the previous frame is used as the initial pose of the current frame, and through feature matching between the current frame and the reference key frame, map points corresponding to feature points in the reference key frame are associated to the matching feature points of the current frame; GMS is adopted for the feature matching to identify and remove dynamic feature point pairs, forming static 3D-2D data associations; the BoW (bag-of-words) method is used to accelerate matching, and finally the pose is solved by BA optimization minimizing the reprojection error; a 3D-2D matching check is performed to obtain the final number of matches between map points and feature points, and if it is greater than or equal to the third threshold value, step 2.2 is jumped to directly, otherwise step 2.1.4 is entered for relocalization;
step 2.1.4, the SLAM system enters a relocalization model; candidate key frames having a co-visibility relation with the current frame are calculated through BoW, and feature matching between the current frame and the candidate key frames is carried out using GMS grid motion statistics; when the number of matched feature points is greater than a fourth threshold value, the pose of the current frame is estimated by combining the PnP algorithm and the RANSAC algorithm, and map points of the current frame corresponding to the local map are obtained using the BA optimization algorithm; and if the number of map points of the current frame is greater than a fifth threshold value, the relocalization is successful.
7. The GMS and motion detection based dynamic visual SLAM method according to claim 6, characterised in that said step 2.2 comprises in particular the sub-steps of:
2.2.1, the SLAM system enters a local map tracking model, and a local map is established for tracking by using the initial pose of the current frame;
and 2.2.2, after the local map tracking is completed, the current frame is evaluated and a key frame is created when warranted; GMS feature matching is performed between the current key frame and a reference key frame to obtain the static features of the current frame, which are triangulated to create map points.
CN202011282866.8A 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method Active CN112418288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011282866.8A CN112418288B (en) 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011282866.8A CN112418288B (en) 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method

Publications (2)

Publication Number Publication Date
CN112418288A CN112418288A (en) 2021-02-26
CN112418288B (en) 2023-02-03

Family

ID=74831325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011282866.8A Active CN112418288B (en) 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method

Country Status (1)

Country Link
CN (1) CN112418288B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282088A (en) * 2021-05-21 2021-08-20 潍柴动力股份有限公司 Unmanned driving method, device and equipment of engineering vehicle, storage medium and engineering vehicle
CN113506325B (en) * 2021-07-15 2024-04-12 清华大学 Image processing method and device, electronic equipment and storage medium
CN113781574B (en) * 2021-07-19 2024-04-12 长春理工大学 Dynamic point removing method for binocular refraction and reflection panoramic system
CN115830110B (en) * 2022-10-26 2024-01-02 北京城市网邻信息技术有限公司 Instant positioning and map construction method and device, terminal equipment and storage medium
CN115567658B (en) * 2022-12-05 2023-02-28 泉州艾奇科技有限公司 Method and device for keeping image not deflecting and visual earpick
CN116299500B (en) * 2022-12-14 2024-03-15 江苏集萃清联智控科技有限公司 Laser SLAM positioning method and device integrating target detection and tracking

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056643A (en) * 2016-04-27 2016-10-26 武汉大学 Point cloud based indoor dynamic scene SLAM (Simultaneous Location and Mapping) method and system
CN107392964A (en) * 2017-07-07 2017-11-24 武汉大学 The indoor SLAM methods combined based on indoor characteristic point and structure lines
CN107871327A (en) * 2017-10-23 2018-04-03 武汉大学 The monocular camera pose estimation of feature based dotted line and optimization method and system
CN107917710A (en) * 2017-11-08 2018-04-17 武汉大学 A kind of positioning in real time of the interior based on single line laser and three-dimensional map construction method
CN109558790A (en) * 2018-10-09 2019-04-02 中国电子科技集团公司电子科学研究院 A kind of pedestrian target detection method, apparatus and system
CN109974743A (en) * 2019-03-14 2019-07-05 中山大学 A kind of RGB-D visual odometry optimized based on GMS characteristic matching and sliding window pose figure
CN110009732A (en) * 2019-04-11 2019-07-12 司岚光电科技(苏州)有限公司 Based on GMS characteristic matching towards complicated large scale scene three-dimensional reconstruction method
CN110807377A (en) * 2019-10-17 2020-02-18 浙江大华技术股份有限公司 Target tracking and intrusion detection method, device and storage medium
CN111161318A (en) * 2019-12-30 2020-05-15 广东工业大学 Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching
CN111708042A (en) * 2020-05-09 2020-09-25 汕头大学 Robot method and system for pedestrian trajectory prediction and following
CN111797688A (en) * 2020-06-02 2020-10-20 武汉大学 Visual SLAM method based on optical flow and semantic segmentation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111833358A (en) * 2020-06-26 2020-10-27 中国人民解放军32802部队 Semantic segmentation method and system based on 3D-YOLO


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"DM-SLAM: A Feature-Based SLAM System for Rigid Dynamic Scenes";Junhao Cheng,Zhi Wang,Hongyan Zhou,Li Li and Jian Yao;《IJGI》;20200327;第9卷;1-18页 *
"DMS-SLAM: A General Visual SLAM System for Dynamic Scenes with Multiple Sensors";Guihua Liu,Weilin Zeng,Bo Feng and Feng Xu;《Sensors》;20190827;第19卷;1-20页 *
Guihua Liu,Weilin Zeng,Bo Feng and Feng Xu."DMS-SLAM: A General Visual SLAM System for Dynamic Scenes with Multiple Sensors".《Sensors》.2019,第19卷第1-20页. *

Also Published As

Publication number Publication date
CN112418288A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
CN112418288B (en) GMS and motion detection-based dynamic vision SLAM method
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN111968129B (en) Instant positioning and map construction system and method with semantic perception
CN109544636B (en) Rapid monocular vision odometer navigation positioning method integrating feature point method and direct method
Mur-Artal et al. ORB-SLAM: a versatile and accurate monocular SLAM system
CN109509230A (en) A kind of SLAM method applied to more camera lens combined type panorama cameras
CN106920259B (en) positioning method and system
CN108682027A (en) VSLAM realization method and systems based on point, line Fusion Features
Kang et al. Detection and tracking of moving objects from a moving platform in presence of strong parallax
US20200257862A1 (en) Natural language understanding for visual tagging
Kim et al. Direct semi-dense SLAM for rolling shutter cameras
CN110533720B (en) Semantic SLAM system and method based on joint constraint
CN113108771B (en) Movement pose estimation method based on closed-loop direct sparse visual odometer
CN110766024B (en) Deep learning-based visual odometer feature point extraction method and visual odometer
CN110570453A (en) Visual odometer method based on binocular vision and closed-loop tracking characteristics
CN111445526A (en) Estimation method and estimation device for pose between image frames and storage medium
CN110827353B (en) Robot positioning method based on monocular camera assistance
Concha et al. Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping.
US20210225038A1 (en) Visual object history
US11880964B2 (en) Light field based reflection removal
JP2018113021A (en) Information processing apparatus and method for controlling the same, and program
CN116449384A (en) Radar inertial tight coupling positioning mapping method based on solid-state laser radar
CN116468786B (en) Semantic SLAM method based on point-line combination and oriented to dynamic environment
CN115147344A (en) Three-dimensional detection and tracking method for parts in augmented reality assisted automobile maintenance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant