CN112418288A - GMS and motion detection-based dynamic vision SLAM method - Google Patents

GMS and motion detection-based dynamic vision SLAM method

Info

Publication number
CN112418288A
CN112418288A
Authority
CN
China
Prior art keywords
frame
key frame
map
gms
image
Prior art date
Legal status
Granted
Application number
CN202011282866.8A
Other languages
Chinese (zh)
Other versions
CN112418288B (en)
Inventor
姚剑
卓德胜
程军豪
龚烨
涂静敏
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202011282866.8A
Publication of CN112418288A
Application granted
Publication of CN112418288B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 Geographic models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery

Abstract

The invention belongs to the technical field of visual spatial positioning and discloses a dynamic visual SLAM method based on GMS and motion detection, comprising the following steps: initializing the SLAM system by combining GMS with a sliding window to obtain an initial map; realizing tracking and positioning of the SLAM system in combination with GMS; performing loop detection and global optimization, and constructing a static point cloud map from the RGB-D images; and outputting the pose trajectory after data processing is finished, together with the static point cloud map built from the RGB-D images. The invention solves the problem of poor SLAM tracking and positioning performance in dynamic environments in the prior art. The method effectively eliminates the influence of dynamic features and integrates this capability into each functional module of the SLAM system, solving visual SLAM positioning and mapping in dynamic scenes with good real-time performance and high positioning accuracy.

Description

GMS and motion detection-based dynamic vision SLAM method
Technical Field
The invention relates to the technical field of visual space positioning, in particular to a dynamic visual SLAM method based on GMS and motion detection.
Background
Simultaneous localization and mapping (SLAM) is a core technology for realizing the functions of intelligent mobile robots. SLAM incrementally builds a map model of the current environment from the data stream acquired by a laser or visual sensor, and performs positioning while constructing the map.
However, neither the feature-point method nor the direct method can handle the problems caused by dynamic objects, which are common in real scenes. Most current visual SLAM methods still rest on the assumption that the observed environment is static. Moving objects in the environment create many wrong data associations, or cause previously tracked map points to be lost through occlusion; this severely affects camera pose estimation, introduces positioning errors into the whole visual SLAM system, and compromises the robot's subsequent tasks. Research on improving the positioning accuracy and mapping quality of visual SLAM in dynamic environments is therefore of great significance.
In recent years, many researchers have begun to study the visual SLAM problem in dynamic scenes. For example, methods based on foreground/background initialization have the advantage of being able to track temporarily stopped dynamic objects, but they require predefined information about the background or objects and are less effective when the environment contains many moving objects. Methods based on geometric constraints have a clear mathematical foundation and low computational cost, but residuals caused by moving objects are difficult to distinguish from residuals caused by mismatches, and the initial camera pose must be known in advance. Optical flow methods can process rich information, but they rest on the brightness-constancy assumption and are sensitive to illumination changes. Deep learning based methods have the best overall effect, but parameter tuning is complex and real-time performance cannot be guaranteed.
Disclosure of Invention
The invention provides a dynamic visual SLAM method based on GMS and motion detection, solving the problem of poor SLAM tracking and positioning performance in dynamic environments in the prior art.
The invention provides a dynamic vision SLAM method based on GMS and motion detection, which comprises the following steps:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
step 4, outputting a pose track after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
Preferably, in step 1, the initializing operation corresponding to the monocular image includes: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; solving the initial frame camera pose; and obtaining an initial map through feature triangulation.
Preferably, the initialization key frames are selected with the sliding window as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a first threshold, the first frame image is determined to be the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until the threshold requirement is met and the first initialization key frame is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the first threshold it is added to the sliding window, otherwise it is discarded; the above operations are repeated, so that the n-th initialization key frame F_n brings the window to its minimum size n, and processing of added key frames continues until the m-th key frame F_m is obtained, at which point the window reaches its maximum size m.
Preferably, in step 1, the initialization operation corresponding to the binocular image and the RGB-D image includes: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; and obtaining an initial map through feature triangulation.
Preferably, the initialization key frames are selected with the sliding window as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a second threshold, the first frame image is determined to be the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until a key frame meeting the threshold requirement is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the second threshold it is added to the sliding window, otherwise it is discarded; the above operations are repeated, so that the n-th initialization key frame F_n brings the window to its minimum size n, and processing of added key frames continues until the m-th key frame F_m is obtained, at which point the window reaches its maximum size m.
Preferably, the step 2 comprises:
2.1, tracking any image information of monocular, binocular or RGB-D type in an SLAM system;
2.2, tracking the local map, completing the fused tracking of the tracking data and the local map data, and deciding whether to generate a key frame;
wherein, the step 2.1 specifically comprises the following substeps:
step 2.1.1, the SLAM system enters a constant-speed tracking model, the pose transformation of the previous frame is used as the initial pose of the current frame, and map points in the reference frame are projected to the current frame to complete 3D-2D data association;
step 2.1.2, a 3D-2D matching check is performed to obtain the final number of matches between map points and feature points; if the number of matches is greater than or equal to a third threshold, jump directly to step 2.2; otherwise enter step 2.1.3 for reference frame tracking;
step 2.1.3, the SLAM system enters a reference frame tracking model; the pose transformation of the previous frame is used as the initial pose of the current frame; through feature matching between the current frame and the reference key frame, map points corresponding to feature points in the reference key frame are associated with the matching feature points of the current frame; GMS is adopted to match the feature points and to identify and eliminate dynamic feature point pairs, forming static 3D-2D data associations; the BoW bag-of-words method is used to accelerate matching, and the pose is finally solved by minimizing the reprojection error with BA optimization; a 3D-2D matching check is then performed to obtain the final number of matches between map points and feature points; if the number of matches is greater than or equal to the third threshold, jump directly to step 2.2, otherwise enter step 2.1.4 for relocation;
step 2.1.4, the SLAM system enters a relocation model; candidate key frames having a co-visibility relationship with the current frame are computed via BoW; the current frame is feature-matched against the candidate key frames using GMS grid-based motion statistics; when the number of matched feature points exceeds a fourth threshold, the pose of the current frame is estimated by combining the PnP and RANSAC algorithms, and the map points of the local map corresponding to the current frame are obtained with the BA optimization algorithm; if the number of map points of the current frame is greater than a fifth threshold, the relocation succeeds.
Preferably, the step 2.2 specifically comprises the following substeps:
2.2.1, the SLAM system enters a local map tracking model, and a local map is established for tracking by utilizing the initial pose of the current frame;
and 2.2.2, after the local map tracking is completed, judging the current frame, creating a key frame, performing GMS feature matching on the current key frame and a reference key frame to obtain the static feature of the current frame, and triangularizing to create map points.
Preferably, the constructing of the static point cloud map in step 3 comprises: judging and screening the features; and fusing the 3D points by aggregating the depth maps.
Preferably, in step 3, the feature judgment and screening are implemented as follows:
motion detection is performed on the key frames: the reference key frame and the current key frame are input into a trained YOLO v3 network to segment the potential motion region S_ref of the reference frame and the potential motion region S_cur of the current frame; p_r is a feature point in the reference key frame and p_c is the pixel point obtained by projecting it into the current key frame, the depth of the projected point p_c in the corresponding depth map being Z; the projected depth value Z_proj corresponding to p_c is calculated according to the camera model, giving the depth difference:

ΔZ = Z_proj − Z

The angle α between the viewing rays from p_r and p_c to the corresponding spatial point p_w is also calculated; when α and ΔZ meet the preset conditions, the pixel point p_c is judged to be a dynamic point and is screened out of the current frame; all points in the current key frame are processed according to this method to finally obtain the static feature point set.
Preferably, in step 3, the fusion of the 3D points from the aggregated depth maps is implemented as follows:
the single-frame static point cloud map constructed from each key frame is denoted C_k, and the camera pose relative to the world coordinate system at the moment corresponding to that key frame is denoted T_k; the point clouds of all key frames are transformed into the world coordinate system to construct the global static point cloud map W, expressed as:

W = ∪_{k=1}^{N} T_k C_k

where N represents the total number of key frames.
One or more technical schemes provided by the invention at least have the following technical effects or advantages:
Firstly, the SLAM system is initialized by combining GMS with a sliding window to obtain an initial map; then tracking and positioning of the SLAM system are realized in combination with GMS; next, loop detection and global optimization are carried out, and a static point cloud map is constructed from the RGB-D images; finally, the pose trajectory is output after data processing is finished, together with the static point cloud map built from the RGB-D images. By combining GMS (grid-based motion statistics) with motion detection, the visual SLAM method provided by the invention, compared with existing processing methods, effectively eliminates the influence of dynamic features and integrates this capability into each functional module of the SLAM system; it solves visual SLAM positioning and mapping in dynamic scenes, with good real-time performance and high positioning accuracy. In addition, whereas traditional feature-based SLAM systems mostly form sparse maps, the method adds a static mapping thread for the RGB-D data type; the resulting dense map reflects more environmental characteristics and detail than a sparse map.
Drawings
Fig. 1 is an overall flowchart of a dynamic visual SLAM method based on GMS and motion detection according to an embodiment of the present invention.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
The embodiment provides a dynamic visual SLAM method based on GMS and motion detection, which comprises the following steps:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
step 4, outputting a pose trajectory after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
The present invention is further described below.
The embodiment provides a dynamic visual SLAM method based on GMS and motion detection, which specifically includes the following steps, with reference to fig. 1:
step 1: and initializing the SLAM system by combining the GMS and the sliding window to obtain an initial map.
That is, GMS and the sliding window are combined to initialize the SLAM system while eliminating the influence of dynamic features in the operating scene.
Corresponding initialization operations are completed for the different data types (monocular, binocular and RGB-D): monocular initialization follows step 1.1, while initialization of the binocular and RGB-D types jumps directly to step 1.2.
Step 1.1: monocular initialization is mainly divided into four stages: in the first stage, a sliding window selects the initial frames; in the second stage, GMS matches the feature points; in the third stage, the initial-frame camera pose is solved; in the fourth stage, feature triangulation yields the initial map.
Step 1.1.1: select the initialization key frames with a sliding window. First, the first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a first threshold (for example, 100), the frame is confirmed as the first initialization key frame F_1; otherwise the next frame is read, until an initialization key frame meeting the condition is obtained. A sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m. The purpose of the sliding window is to pair the initial key frame F_1 with a key frame F_n or F_m separated by a certain time interval: feature matching combined with the robust GMS algorithm then rejects dynamic features and yields relatively pure feature points, a more thorough rejection than applying GMS to dynamic mismatches between adjacent key frames. Subsequent image frames are then processed: the next image frame is read as the current frame; if its feature point count is greater than the first threshold it is added to the sliding window, otherwise it is discarded. These operations are repeated: the n-th initialization key frame F_n brings the window to its minimum size n, and processing continues until the m-th key frame F_m is obtained, at which point the window reaches its maximum size m.
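The window-filling logic above can be sketched as follows (a minimal Python sketch using OpenCV ORB features; the threshold of 100 follows the example in the text, while `read_next_image` and the values of n and m are illustrative assumptions):

```python
import cv2

def collect_init_window(read_next_image, min_feats=100, n=5, m=10):
    """Fill the initialization sliding window with feature-rich key frames.

    read_next_image: hypothetical callable yielding grayscale frames (None at end).
    min_feats: the first threshold on the feature count (e.g. 100).
    n, m: minimum / maximum window sizes; the values here are illustrative.
    """
    orb = cv2.ORB_create(nfeatures=1000)
    window = []  # sliding window of (keypoints, descriptors) per accepted frame
    while len(window) < m:
        frame = read_next_image()
        if frame is None:
            break  # input exhausted
        kps, des = orb.detectAndCompute(frame, None)
        if des is not None and len(kps) > min_feats:
            window.append((kps, des))  # accept as initialization key frame
        # frames below the threshold are simply discarded
    return window if len(window) >= n else None  # need at least the minimum size n
```

Initialization then tries to match F_1 against F_n (or later frames up to F_m), as described in the following steps.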
Step 1.1.2: mismatched point pairs on dynamic objects are eliminated with the GMS matching algorithm. Feature matching is performed between the features extracted from the initial frames F_1 and F_n of step 1.1.1, yielding a set of matching points, and the image is divided into G grids (generally 20 × 20). For each feature point X_i of the first initialization key frame F_1 whose matching point falls in a grid cell of F_n with a high match count, a 3 × 3 grid neighborhood is established by expanding around the grid cells of the feature point and of its matching point; the corresponding matching point sets within this neighborhood are denoted {a_k, b_k}, k = 1, 2, …, 9, and a composite feature-matching score S_i is computed from these corresponding point sets according to the formula:
S_i = Σ_{k=1}^{K} |X_{a_k b_k}|,  K = 9

where |X_{a_k b_k}| is the number of matching points in the neighborhood cell pair {a_k, b_k}. The composite score S_i is compared with the threshold t to judge the accuracy of a feature match. The threshold t is set as:
t = α √n

where α is set to 0.6 and n is the average number of feature points in each of the K (i.e., 9) neighborhood grids. When S_i > t, the feature point is judged to be a correct match; otherwise it is a wrong match. Wrong matching pairs between the initialization key frames are deleted according to this rule, yielding the static matching point set {p_c, p_r}.
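A sketch of the scoring rule above, assuming the matches have already been binned into grid cells (the `cell_matches` structure and the per-cell mean feature count are assumptions of this sketch; opencv-contrib also ships a ready-made implementation as cv2.xfeatures2d.matchGMS):

```python
import math

def gms_score(cell_matches, i, j, grid_w):
    """S_i: matches summed over the 3x3 neighborhoods of cell i in F_1 and
    cell j in F_n (K = 9 cell pairs). cell_matches maps a cell pair
    (cell index in F_1, cell index in F_n) to its match count."""
    offsets = [-grid_w - 1, -grid_w, -grid_w + 1, -1, 0, 1,
               grid_w - 1, grid_w, grid_w + 1]
    return sum(cell_matches.get((i + o, j + o), 0) for o in offsets)

def is_correct_match(score_si, mean_feats_per_cell, alpha=0.6):
    """GMS decision: accept when S_i > t, with threshold t = alpha * sqrt(n)."""
    t = alpha * math.sqrt(mean_feats_per_cell)
    return score_si > t
```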
Step 1.1.3: solving the camera pose. Using the static matching point set {p_c, p_r} obtained in the previous step, the homography matrix H_cr and the fundamental matrix F_cr of the motion are computed respectively. The normalized-plane coordinates {x_c, x_r} are obtained from the matching points, and a direct linear transformation is established for the homography matrix H:

x_c = H_cr x_r
The homography matrix H is solved with a RANSAC-based normalized 4-point algorithm. For the fundamental matrix F, an equation is established from the epipolar geometric constraint:

x_c^T F_cr x_r = 0
based on RANSAC, the basis matrix is solved using a normalized 8-point algorithm. The RANSAC algorithm can eliminate outliers to a certain degree, and simultaneously calculates a matrix score R aiming at a homography matrix H and a basic matrix FH
Figure BDA0002781361860000065
SHAnd SFModel scores for homography matrix and basis matrix, respectively, if threshold RH>And 0.45, calculating the pose of the camera by using the H matrix through SVD, and otherwise, calculating the essential matrix E by using the F matrix, and then calculating the pose by using the SVD. Finally, checking the pose to obtain the optimal solution, and if the initialized pose does not meet the initialization requirement, taking the next frame and the first initialization key frame F1Repeating the above operations as an initial frame until a frame F at the maximum size m of the sliding windowmIf the initialization is still unsuccessful, the sliding window is moved backward as a whole, and the first frame of the sliding window is taken as the first initialization key frame F1And continuing the above operations until the initial pose is obtained.
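As a sketch of this dual-model selection (OpenCV's RANSAC estimators stand in for the normalized 4-point and 8-point solvers; the inlier-count scores below are a simplification of full symmetric-transfer-error scores, the 0.45 ratio follows the text, and the sketch assumes enough well-distributed matches for both estimators to succeed):

```python
import cv2
import numpy as np

def initialize_pose(pts_r, pts_c, K):
    """Choose between homography and fundamental matrix, then recover pose.

    pts_r, pts_c: Nx2 arrays of static matched points in the reference and
    current frames. K: 3x3 camera intrinsic matrix.
    """
    H, h_mask = cv2.findHomography(pts_r, pts_c, cv2.RANSAC, 3.0)
    F, f_mask = cv2.findFundamentalMat(pts_r, pts_c, cv2.FM_RANSAC, 3.0, 0.99)
    # Simplified model scores S_H, S_F from RANSAC inlier counts
    S_H = int(h_mask.sum()) if h_mask is not None else 0
    S_F = int(f_mask.sum()) if f_mask is not None else 0
    R_H = S_H / max(S_H + S_F, 1)
    if R_H > 0.45:
        # Decompose H into candidate (R, t) solutions; the pose check
        # described in the text would then pick the valid one.
        _, Rs, ts, _ = cv2.decomposeHomographyMat(H, K)
        return Rs, ts
    # Otherwise go through the essential matrix E = K^T F K
    E = K.T @ F @ K
    _, R, t, _ = cv2.recoverPose(E, pts_r, pts_c, K)
    return [R], [t]
```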
Step 1.1.4: triangulating the features to obtain the initial map. The normalized-plane coordinate pair {x_c, x_r} satisfies the geometric relations:

z_c x_c = T_cw P_w,  z_r x_r = T_rw P_w

where z_c and z_r are the Z-axis coordinates (i.e., the depth information) in the corresponding camera coordinate systems, T_cw and T_rw are the pose transformations from the world coordinate system to the current and reference key frames, and P_w is the corresponding 3D point coordinate. Left-multiplying each relation by the skew-symmetric matrix of its normalized point (whose cross product with the point itself is zero) gives:

x_c^∧ T_cw P_w = 0,  x_r^∧ T_rw P_w = 0

Rearranging into a single linear system:

[ x_c^∧ T_cw ; x_r^∧ T_rw ] P_w = 0

SVD is applied to this system to finally obtain the 3D point coordinate P_w. After the matching pairs {p_c, p_r} have been triangulated, the initial map is obtained and the initialization operation is complete.
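A minimal linear-triangulation sketch matching the system above; each view contributes two independent rows, and the SVD null vector gives the homogeneous P_w:

```python
import numpy as np

def triangulate_point(x_c, x_r, T_cw, T_rw):
    """Triangulate one 3D point from normalized-plane coordinates.

    x_c, x_r: normalized coordinates (u, v) in the current / reference frame.
    T_cw, T_rw: 3x4 pose matrices [R|t] from world to each camera.
    Returns P_w as a 3-vector.
    """
    A = np.vstack([
        x_c[0] * T_cw[2] - T_cw[0],   # u_c * (row 3) - (row 1)
        x_c[1] * T_cw[2] - T_cw[1],   # v_c * (row 3) - (row 2)
        x_r[0] * T_rw[2] - T_rw[0],
        x_r[1] * T_rw[2] - T_rw[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    P_h = Vt[-1]                      # homogeneous solution of A P = 0
    return P_h[:3] / P_h[3]
```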
Step 1.2: the initialization of the binocular/RGB-D types is relatively simple and can be divided into three stages: in the first stage, a sliding window selects the initial frames; in the second stage, GMS feature screening is completed; in the third stage, feature triangulation yields the initial map.
Step 1.2.1: this step is similar to step 1.1.1. The first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a second threshold (e.g., 500), the frame is determined to be the first initialization key frame F_1; otherwise the next frame image is read until an initialization key frame meeting the condition is obtained. A sliding window with minimum size n and maximum size m is established on the basis of the first initialization key frame F_1. Subsequent image frames are processed: the next image frame is read as the current frame; if its feature count is greater than the second threshold it is added to the sliding window, otherwise it is discarded. These operations are repeated: the n-th initialization key frame F_n brings the window to its minimum size n, and processing continues until the m-th key frame F_m is obtained, at which point the window reaches its maximum size m.
Step 1.2.2: GMS screens the matching point pairs. This step is identical to step 1.1.2: the matching point pairs between the initial frames F_1 and F_n are screened, wrong matches are deleted, and the static matching point set {p_c, p_r} is obtained.
Step 1.2.3: this stage completes feature triangulation. The RGB-D case is simpler: the depth information of the correct matching points in F_n is recovered by combining them with the depth map, and a static initial map is created. For the binocular case, binocular triangulation is performed on the correct matching points of F_n. For any static matching pair p_L(u_L, v_L) and p_R(u_R, v_R) between the left and right frames, the corresponding 3D point P is recovered from the stereo geometry:

z = f · b / d,  d = u_L − u_R

where z is the depth information of P, b is the stereo baseline, f is the camera focal length, and d is the disparity between the two images. Processing in this way finally yields the initial map information.
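The binocular depth recovery is then a one-line computation per static match (a sketch assuming rectified stereo; the focal length and baseline below are hypothetical calibration values):

```python
def stereo_depth(u_left, u_right, f, b):
    """Depth from rectified stereo: z = f * b / d, with disparity d = u_L - u_R."""
    d = u_left - u_right
    if d <= 0:
        return None                   # no valid disparity for this match
    return f * b / d

# Example with hypothetical calibration: f = 718.856 px, baseline b = 0.54 m
z = stereo_depth(400.0, 392.5, 718.856, 0.54)   # about 51.8 m
```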
Step 2: tracking and positioning of the SLAM system are realized in combination with GMS.
When the system has been initialized successfully, it enters the tracking thread with the initialization key frame F_1. When a new image frame is received, the camera pose is estimated using the reference frame tracking model or the constant-speed tracking model; the local key frames and local map points are then updated, the local map points are reprojected onto the current frame, a graph optimization model is established, and the pose is further optimized. Step 2 is mainly divided into two parts: tracking and local map creation, and loop detection and global optimization of the system.
Step 2.1: the tracking and mapping implementation in this step inherits the basic idea of ORB-SLAM. Image information entering the system is tracked using two different models: the reference frame tracking model and the constant-speed tracking model.
Step 2.1.1: the SLAM system first enters the constant-speed tracking model. The pose transformation of the previous frame is used as the initial pose ξ of the current frame, and the map points in the reference frame are projected to the current frame to form 3D-2D data associations; the relationship is:

ξ* = argmin_ξ (1/2) Σ_{i=1}^{n} || u_i − (1/s_i) K exp(ξ^∧) P_i ||²

where the Lie algebra element ξ^∧ represents the camera pose, u_i are the pixel coordinates of the observed points, s_i is the scale (depth) information, K is the camera intrinsic matrix, P_i are the 3D coordinates of the spatial points, and ξ* is the camera pose to be optimized. BA optimization then minimizes the reprojection error to refine the pose.
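The constant-speed (constant-velocity) prediction itself is a composition of the last inter-frame motion with the last pose; a sketch with 4×4 homogeneous world-to-camera transforms:

```python
import numpy as np

def predict_pose_const_velocity(T_prev, T_prev2):
    """Predict the current camera pose from the two previous poses.

    T_prev, T_prev2: 4x4 world-to-camera transforms of frames k-1 and k-2.
    Velocity is the last inter-frame motion: V = T_{k-1} * T_{k-2}^{-1}.
    """
    V = T_prev @ np.linalg.inv(T_prev2)   # last relative motion
    return V @ T_prev                      # initial pose guess for frame k
```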
Step 2.1.2: the 3D-2D matching point filtering and rejection check of the traditional ORB-SLAM system is adopted to obtain the final number of matches between map points and feature points. If the number of matches is greater than or equal to a third threshold (e.g., ≥ 10), jump directly to step 2.2; otherwise enter step 2.1.3 for reference frame tracking.
Step 2.1.3: the SLAM system enters the reference frame tracking module. The pose transformation of the previous frame is used as the initial pose of the current frame. Through feature matching between the current frame and the reference key frame, map points corresponding to feature points in the reference key frame are associated with the matching feature points of the current frame; GMS is adopted to match the feature points and to identify and eliminate dynamic feature point pairs, forming static 3D-2D data associations. The BoW bag-of-words method accelerates the matching, and the pose is finally solved by minimizing the reprojection error with BA optimization. The 3D-2D matching check of step 2.1.2 is then performed; if the number of matches is greater than or equal to the third threshold, jump directly to step 2.2, otherwise enter step 2.1.4 for relocation.
Step 2.1.4: the SLAM system enters the relocation model. When many moving objects are present in the scene, static feature points are insufficient, or the camera moves too fast, pose tracking is lost and the system enters the relocation module. Similar to ORB-SLAM2, candidate key frames having a co-visibility relationship with the current frame are computed via BoW. The current frame is then feature-matched against the candidate key frames using GMS grid-based motion statistics. When the number of matched feature points exceeds a fourth threshold (for example, 30), the pose of the current frame is estimated by combining the PnP and RANSAC algorithms, and the map points of the local map corresponding to the current frame are obtained with BA optimization. If the number of map points of the current frame is greater than a fifth threshold (e.g., 50), relocation succeeds and the system proceeds to step 2.2.
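A sketch of the relocation pose estimation with OpenCV's PnP-RANSAC solver; the thresholds of 30 matches and 50 map points follow the example values in the text, while the reprojection-error and iteration settings are assumptions of this sketch:

```python
import cv2
import numpy as np

def relocalize(pts3d, pts2d, K, min_matches=30, min_map_points=50):
    """Estimate the current-frame pose from 3D-2D matches with PnP + RANSAC.

    pts3d: Nx3 map points from the candidate key frame.
    pts2d: Nx2 matched pixel coordinates in the current frame.
    """
    if len(pts3d) <= min_matches:
        return None                          # not enough GMS matches
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts3d, np.float64), np.asarray(pts2d, np.float64),
        K, None, reprojectionError=5.0, iterationsCount=100)
    if not ok or inliers is None or len(inliers) <= min_map_points:
        return None                          # relocation failed
    return rvec, tvec                        # pose, to be refined by BA
```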
Step 2.2: after inter-frame tracking is completed, the local map is tracked and the tracking data is fused with the local map data, after which the system decides whether to generate a key frame.
Step 2.2.1: in the local map tracking model, a local map is established for tracking using the initial pose of the current frame. First, the set of key frames observing the map points of the current frame is taken as the first-level connected key frames; frames sharing a high co-visibility with these, together with parent and child key frames, jointly form the local key frames. The map points in the local key frames are then taken as local map points; those local map points observable from the current frame are reprojected onto the current frame, a graph optimization model with more observation edges is established, and the pose of the current frame is further optimized.
Step 2.2.2: creation of key frames. After local map tracking is completed, the current frame is judged to decide whether it should be packaged as a key frame, using the following preset conditions: the ratio of the number of feature points tracked by the current frame to the total number of feature points of the reference frame is less than a certain threshold; the number of key frames waiting to be processed in the local mapping thread is no more than 3; and at least 8 frames have elapsed since the last key frame was inserted. If the preset conditions are not met, the system returns to the tracking thread to track the next image frame; otherwise a key frame is created, and GMS feature matching between the current key frame and the reference key frame yields the static features of the current frame for triangulating and creating map points. The single-frame static point cloud map constructed from each key frame is denoted C_k.
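The key-frame decision reduces to a simple predicate over the stated conditions (the 3-key-frame queue bound and the 8-frame minimum interval follow the text; the tracking-ratio threshold of 0.9 is a hypothetical value):

```python
def should_create_keyframe(tracked_feats, ref_total_feats, frames_since_kf,
                           kf_queue_len, ratio_thresh=0.9):
    """Decide whether the current frame becomes a key frame."""
    weak_tracking = tracked_feats < ratio_thresh * ref_total_feats
    mapping_idle = kf_queue_len <= 3        # local mapping not overloaded
    enough_gap = frames_since_kf >= 8       # minimum interval of 8 frames
    return weak_tracking and mapping_idle and enough_gap
```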
Step 3: loop detection and global optimization are carried out, and a static point cloud map is constructed from the RGB-D images.
Step 3.1: loop detection and global optimization. For a new key frame, the co-visibility relationships with all key frames are still computed, and key frames that have a co-visibility relationship but are not directly connected are taken as candidate key frames. The GMS algorithm performs feature matching between the candidate key frames and the current key frame, and the pose is solved and optimized by combining the PnP and RANSAC algorithms (7 degrees of freedom for the monocular camera, including scale; 6 degrees of freedom for the binocular and RGB-D cameras). After the closed-loop key frame is determined, the poses of all key frames in the local map are optimized using the pose graph model. Finally, all key frames and map points of the whole map are optimized with the BA algorithm.
Step 3.2: static map construction. This step targets RGB-D data and is divided into two stages: first, the features are judged and screened; second, the depth maps are aggregated to complete the fusion of the 3D points.
Step 3.2.1: motion detection is performed on key frames. The reference key frame KF_ref and the current key frame KF_cur are input into a trained YOLO v3 network to segment the potential motion region S_ref of the reference key frame and S_cur of the current key frame. Let p_r be a feature point in the reference key frame and p_c the pixel point obtained by projecting it into the current frame, with depth Z at the projected point in the corresponding depth map. The projected depth value Z_proj can be computed from the camera model, giving the depth difference:

ΔZ = Z_proj − Z

At the same time, the angle α between the viewing rays from p_r and p_c to the corresponding spatial point p_w is computed. When α < 30° and ΔZ > T_z, where T_z is a depth threshold generally set to 1, the point p_c is judged to be a dynamic point and is filtered out of the current frame. All points of KF_cur are processed in this way to finally obtain the static feature point set.
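A sketch of the per-point dynamic check combining both criteria (α < 30° and ΔZ > T_z); the 4×4 world-to-camera transforms are assumed inputs of this sketch:

```python
import numpy as np

def is_dynamic_point(p_w, T_cw, T_rw, z_measured, T_z=1.0, max_angle_deg=30.0):
    """Flag a point as dynamic from the depth difference and viewing-ray angle.

    p_w: 3D point in world coordinates.
    T_cw, T_rw: 4x4 world-to-camera transforms of the current / reference key frame.
    z_measured: depth Z read from the current depth map at the projection p_c.
    """
    # Projected depth of p_w in the current camera (camera-model prediction)
    z_proj = (T_cw @ np.append(p_w, 1.0))[2]
    delta_z = z_proj - z_measured
    # Angle between the viewing rays from both camera centers to p_w
    c_cur = -np.linalg.inv(T_cw[:3, :3]) @ T_cw[:3, 3]   # current camera center
    c_ref = -np.linalg.inv(T_rw[:3, :3]) @ T_rw[:3, 3]   # reference camera center
    v1, v2 = p_w - c_ref, p_w - c_cur
    cos_a = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    alpha = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
    return alpha < max_angle_deg and delta_z > T_z
```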
Step 3.2.2: the single-frame static point cloud map constructed from each key frame is C_k, with pose T_k relative to the world coordinate system. The point clouds of all frames are transformed into the world coordinate system to construct the global static point cloud map W:

W = ∪_{k=1}^{N} T_k C_k

where N, the total number of key frames, is the number of key frames stitched into the global map. This completes the static point cloud map.
Step 4: the pose trajectory is output after data processing is finished, and the static point cloud map is output according to the RGB-D images.
The dynamic visual SLAM method based on GMS and motion detection provided by the embodiment of the invention achieves at least the following technical effects:
(1) By combining GMS with motion detection, the method effectively eliminates the influence of dynamic features and integrates this capability into each functional module of the SLAM system; it solves visual SLAM positioning and mapping in dynamic scenes, with good real-time performance and high positioning accuracy.
(2) Whereas traditional feature-based SLAM systems mostly form sparse maps, the method adds RGB-D dense mapping; the resulting dense map reflects more environmental characteristics and detail than a sparse map.
(3) The method adds a sliding-window mechanism in the initialization process to build a static initial map free of dynamic-object feature points. Because the time interval and motion amplitude between adjacent key frames are small, dynamic feature points are difficult to identify in the prior art; matching key frames separated by a certain time interval via the sliding window distinguishes and removes dynamic points better, yields more thoroughly static feature points, and improves the quality of the initial map points.
(4) GMS is added to the initialization process; since GMS can judge dynamic feature points on the basis of matched feature point pairs, motion information is better eliminated.
(5) The invention segments potential motion regions with a YOLO v3 network and uses the projected point depth and the viewing-ray angle to judge whether a region is dynamic; if so, the feature points of the corresponding region are removed, finally obtaining the static feature points and better eliminating motion information.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to examples, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, all of which should be covered by the claims of the present invention.

Claims (10)

1. A GMS and motion detection based dynamic visual SLAM method, comprising the steps of:
step 1, initializing an SLAM system by combining GMS and a sliding window to obtain an initial map;
step 2, tracking and positioning of the SLAM system are realized by combining GMS;
step 3, loop detection and global optimization are carried out; constructing a static point cloud map according to the RGB-D image;
step 4, outputting a pose track after data processing is finished; and outputting a static point cloud map according to the RGB-D image.
2. The GMS and motion detection based dynamic visual SLAM method according to claim 1, wherein the initialization operation for monocular image correspondence in step 1 comprises: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; solving the initial frame camera pose; and obtaining an initial map through feature triangulation.
3. The GMS and motion detection based dynamic visual SLAM method of claim 2, wherein the initialization key frames are selected with the sliding window as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a first threshold, the first frame image is determined to be the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until the threshold requirement is met and the first initialization key frame is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the first threshold it is added to the sliding window, otherwise it is discarded; the above operations are repeated, so that the n-th initialization key frame F_n brings the window to its minimum size n, and processing of added key frames continues until the m-th key frame F_m is obtained, at which point the window reaches its maximum size m.
4. The GMS and motion detection based dynamic vision SLAM method according to claim 1, wherein the initialization operation corresponding to the binocular image and the RGB-D image in step 1 comprises: selecting an initialization key frame by using a sliding window; matching the feature points by adopting GMS; and obtaining an initial map through feature triangulation.
5. The GMS and motion detection based dynamic visual SLAM method of claim 4, wherein the initialization key frames are selected with the sliding window as follows: a first frame image is read, its feature points are extracted and counted; if the number of feature points is greater than a second threshold, the first frame image is determined to be the first initialization key frame F_1; otherwise the frame image is ignored, and feature extraction and threshold verification are performed on subsequent images until a key frame meeting the threshold requirement is obtained; a sliding window is established on the basis of the first initialization key frame F_1, with minimum window size n and maximum window size m; the next image frame is read as the current frame, and if its number of feature points is greater than the second threshold it is added to the sliding window, otherwise it is discarded; the above operations are repeated, so that the n-th initialization key frame F_n brings the window to its minimum size n, and processing of added key frames continues until the m-th key frame F_m is obtained, at which point the window reaches its maximum size m.
6. The GMS and motion detection based dynamic visual SLAM method of claim 1, wherein said step 2 comprises:
2.1, tracking any image information of monocular, binocular or RGB-D type in an SLAM system;
2.2, tracking the local map, completing the fused tracking of the tracking data and the local map data, and deciding whether to generate a key frame;
wherein, the step 2.1 specifically comprises the following substeps:
step 2.1.1, the SLAM system enters a constant-speed tracking model, the pose transformation of the previous frame is used as the initial pose of the current frame, and map points in the reference frame are projected to the current frame to complete 3D-2D data association;
step 2.1.2, a 3D-2D matching check is performed to obtain the final number of matches between map points and feature points; if the number of matches is greater than or equal to a third threshold, jump directly to step 2.2; otherwise enter step 2.1.3 for reference frame tracking;
step 2.1.3, the SLAM system enters a reference frame tracking model; the pose transformation of the previous frame is used as the initial pose of the current frame; through feature matching between the current frame and the reference key frame, map points corresponding to feature points in the reference key frame are associated with the matching feature points of the current frame; GMS is adopted to match the feature points and to identify and eliminate dynamic feature point pairs, forming static 3D-2D data associations; the BoW bag-of-words method is used to accelerate matching, and the pose is finally solved by minimizing the reprojection error with BA optimization; a 3D-2D matching check is then performed to obtain the final number of matches between map points and feature points; if the number of matches is greater than or equal to the third threshold, jump directly to step 2.2, otherwise enter step 2.1.4 for relocation;
step 2.1.4, the SLAM system enters a relocation model; candidate key frames having a co-visibility relationship with the current frame are computed via BoW; the current frame is feature-matched against the candidate key frames using GMS grid-based motion statistics; when the number of matched feature points exceeds a fourth threshold, the pose of the current frame is estimated by combining the PnP and RANSAC algorithms, and the map points of the local map corresponding to the current frame are obtained with the BA optimization algorithm; if the number of map points of the current frame is greater than a fifth threshold, the relocation succeeds.
7. The GMS and motion detection based dynamic visual SLAM method according to claim 6, characterized in that said step 2.2 comprises in particular the following sub-steps:
2.2.1, the SLAM system enters a local map tracking model, and a local map is established for tracking by utilizing the initial pose of the current frame;
and 2.2.2, after the local map tracking is completed, judging the current frame, creating a key frame, performing GMS feature matching on the current key frame and a reference key frame to obtain the static feature of the current frame, and triangularizing to create map points.
8. The GMS and motion detection based dynamic visual SLAM method of claim 1, wherein the constructing of the static point cloud map in step 3 comprises: judging and screening the features; and fusing the 3D points by aggregating the depth maps.
9. The GMS and motion detection based dynamic visual SLAM method according to claim 8, wherein in step 3 the feature judgment and screening are implemented as follows:
motion detection is performed on the key frames: the reference key frame and the current key frame are input into a trained YOLO v3 network to segment the potential motion region S_ref of the reference frame and the potential motion region S_cur of the current frame; p_r is a feature point in the reference key frame and p_c is the pixel point obtained by projecting it into the current key frame, the depth of the projected point p_c in the corresponding depth map being Z; the projected depth value Z_proj corresponding to p_c is calculated according to the camera model, giving the depth difference:

ΔZ = Z_proj − Z

the angle α between the viewing rays from p_r and p_c to the corresponding spatial point p_w is calculated; when α and ΔZ meet the preset conditions, the pixel point p_c is judged to be a dynamic point and is screened out of the current frame; all points in the current key frame are processed according to this method to finally obtain the static feature point set.
10. The GMS and motion detection based dynamic visual SLAM method according to claim 8, wherein in step 3 the fusion of the 3D points from the aggregated depth maps is implemented as follows:
the single-frame static point cloud map constructed from each key frame is denoted C_k, and the camera pose relative to the world coordinate system at the moment corresponding to that key frame is denoted T_k; the point clouds of all key frames are transformed into the world coordinate system to construct the global static point cloud map W, expressed as:

W = ∪_{k=1}^{N} T_k C_k

where N represents the total number of key frames.
CN202011282866.8A 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method Active CN112418288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011282866.8A CN112418288B (en) 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011282866.8A CN112418288B (en) 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method

Publications (2)

Publication Number Publication Date
CN112418288A true CN112418288A (en) 2021-02-26
CN112418288B CN112418288B (en) 2023-02-03

Family

ID=74831325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011282866.8A Active CN112418288B (en) 2020-11-17 2020-11-17 GMS and motion detection-based dynamic vision SLAM method

Country Status (1)

Country Link
CN (1) CN112418288B (en)



Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056643A (en) * 2016-04-27 2016-10-26 武汉大学 Point cloud based indoor dynamic scene SLAM (Simultaneous Location and Mapping) method and system
CN107392964A (en) * 2017-07-07 2017-11-24 武汉大学 The indoor SLAM methods combined based on indoor characteristic point and structure lines
CN107871327A (en) * 2017-10-23 2018-04-03 武汉大学 The monocular camera pose estimation of feature based dotted line and optimization method and system
CN107917710A (en) * 2017-11-08 2018-04-17 武汉大学 A kind of positioning in real time of the interior based on single line laser and three-dimensional map construction method
CN109558790A (en) * 2018-10-09 2019-04-02 中国电子科技集团公司电子科学研究院 A kind of pedestrian target detection method, apparatus and system
CN109974743A (en) * 2019-03-14 2019-07-05 中山大学 A kind of RGB-D visual odometry optimized based on GMS characteristic matching and sliding window pose figure
CN110009732A (en) * 2019-04-11 2019-07-12 司岚光电科技(苏州)有限公司 Based on GMS characteristic matching towards complicated large scale scene three-dimensional reconstruction method
CN110807377A (en) * 2019-10-17 2020-02-18 浙江大华技术股份有限公司 Target tracking and intrusion detection method, device and storage medium
CN111161318A (en) * 2019-12-30 2020-05-15 广东工业大学 Dynamic scene SLAM method based on YOLO algorithm and GMS feature matching
CN111708042A (en) * 2020-05-09 2020-09-25 汕头大学 Robot method and system for pedestrian trajectory prediction and following
CN111797688A (en) * 2020-06-02 2020-10-20 武汉大学 Visual SLAM method based on optical flow and semantic segmentation
CN111833358A (en) * 2020-06-26 2020-10-27 中国人民解放军32802部队 Semantic segmentation method and system based on 3D-YOLO

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Guihua Liu, Weilin Zeng, Bo Feng and Feng Xu, "DMS-SLAM: A General Visual SLAM System for Dynamic Scenes with Multiple Sensors", Sensors *
Junhao Cheng, Zhi Wang, Hongyan Zhou, Li Li and Jian Yao, "DM-SLAM: A Feature-Based SLAM System for Rigid Dynamic Scenes", IJGI *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282088A (en) * 2021-05-21 2021-08-20 潍柴动力股份有限公司 Unmanned driving method, device and equipment of engineering vehicle, storage medium and engineering vehicle
CN113506325A (en) * 2021-07-15 2021-10-15 清华大学 Image processing method and device, electronic equipment and storage medium
CN113506325B (en) * 2021-07-15 2024-04-12 清华大学 Image processing method and device, electronic equipment and storage medium
CN113781574A (en) * 2021-07-19 2021-12-10 长春理工大学 Method for removing dynamic points of binocular catadioptric panoramic system
CN113781574B (en) * 2021-07-19 2024-04-12 长春理工大学 Dynamic point removing method for binocular refraction and reflection panoramic system
CN115830110A (en) * 2022-10-26 2023-03-21 北京城市网邻信息技术有限公司 Instant positioning and map construction method and device, terminal equipment and storage medium
CN115830110B (en) * 2022-10-26 2024-01-02 北京城市网邻信息技术有限公司 Instant positioning and map construction method and device, terminal equipment and storage medium
CN115567658A (en) * 2022-12-05 2023-01-03 泉州艾奇科技有限公司 Method and device for keeping image not deflecting and visual earpick
CN116299500A (en) * 2022-12-14 2023-06-23 江苏集萃清联智控科技有限公司 Laser SLAM positioning method and device integrating target detection and tracking
CN116299500B (en) * 2022-12-14 2024-03-15 江苏集萃清联智控科技有限公司 Laser SLAM positioning method and device integrating target detection and tracking

Also Published As

Publication number Publication date
CN112418288B (en) 2023-02-03

Similar Documents

Publication Publication Date Title
CN112418288B (en) GMS and motion detection-based dynamic vision SLAM method
CN111968129B (en) Instant positioning and map construction system and method with semantic perception
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN109544636B (en) Rapid monocular vision odometer navigation positioning method integrating feature point method and direct method
CN107025668B (en) Design method of visual odometer based on depth camera
CN108682027A (en) VSLAM realization method and systems based on point, line Fusion Features
CN109509230A (en) A kind of SLAM method applied to more camera lens combined type panorama cameras
CN108537848B (en) Two-stage pose optimization estimation method for indoor scene reconstruction
CN106920259B (en) positioning method and system
CN112785702A (en) SLAM method based on tight coupling of 2D laser radar and binocular camera
CN110807809B (en) Light-weight monocular vision positioning method based on point-line characteristics and depth filter
CN110009732B (en) GMS feature matching-based three-dimensional reconstruction method for complex large-scale scene
Tang et al. ESTHER: Joint camera self-calibration and automatic radial distortion correction from tracking of walking humans
US11367195B2 (en) Image segmentation method, image segmentation apparatus, image segmentation device
CN113108771B (en) Movement pose estimation method based on closed-loop direct sparse visual odometer
CN110766024B (en) Deep learning-based visual odometer feature point extraction method and visual odometer
CN110827353B (en) Robot positioning method based on monocular camera assistance
US11880964B2 (en) Light field based reflection removal
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
CN110599545A (en) Feature-based dense map construction system
CN112767546B (en) Binocular image-based visual map generation method for mobile robot
CN116449384A (en) Radar inertial tight coupling positioning mapping method based on solid-state laser radar
CN107330980A (en) A kind of virtual furnishings arrangement system based on no marks thing
CN113658337A (en) Multi-mode odometer method based on rut lines

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant