CN116429087A - Visual SLAM method suitable for dynamic environment - Google Patents


Info

Publication number
CN116429087A
CN116429087A (application CN202310387172.8A)
Authority
CN
China
Prior art keywords
points, dynamic, point, algorithm, visual SLAM
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310387172.8A
Other languages
Chinese (zh)
Inventor
黎萍 (Li Ping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China Zhongshan Institute
Original Assignee
University of Electronic Science and Technology of China Zhongshan Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China Zhongshan Institute filed Critical University of Electronic Science and Technology of China Zhongshan Institute
Priority to CN202310387172.8A
Publication of CN116429087A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/38 Electronic maps specially adapted for navigation; Updating thereof
    • G01C21/3804 Creation or updating of map data
    • G01C21/3833 Creation or updating of map data characterised by the source of data
    • G01C21/3837 Data obtained from a single source


Abstract

The invention discloses a visual SLAM method suitable for dynamic environments. The method comprises four main threads: a tracking thread, a local mapping thread, a closed-loop detection thread and a dense mapping thread. To improve the adaptability of the visual SLAM system in dynamic environments, guarantee the quantity and quality of extracted feature points and improve system robustness, the deep-learning algorithm YOLOv4 and a dynamic-feature-point detection algorithm based on geometric constraints are added at the front end of the SLAM system: YOLOv4 first identifies dynamic objects, and the geometric constraints then further remove dynamic points. This avoids the influence of dynamic objects on feature extraction in the visual SLAM method, improves the mapping and localization accuracy of visual SLAM in dynamic environments, and effectively improves the robustness of the visual SLAM system.

Description

Visual SLAM method suitable for dynamic environment
Technical Field
The invention belongs to the field of visual SLAM (Simultaneous Localization and Mapping), and particularly relates to a visual SLAM method suitable for a dynamic environment.
Background
Current visual SLAM systems achieve excellent pose estimation and localization accuracy in static scenes. In dynamic scenes, however, a large number of moving objects interfere with the system's feature point extraction: the system usually treats feature points on dynamic objects as ordinary static points, which introduces large errors and makes pose estimation and map construction inaccurate. How to introduce data association appropriate to dynamic environments into SLAM systems, so as to achieve accurate localization and dense mapping, has therefore become a hotspot of current research.
If there are few dynamic objects in the environment, a visual SLAM system using a filtering method such as RANSAC (Random Sample Consensus) can still perform the tasks its designers expect. The ORB-SLAM2 system adopts RANSAC: because the motion of dynamic points differs greatly from that of static points, the filter can identify dynamic points as outliers and remove them. However, when there are many dynamic objects in the environment, the model's ability to detect outliers degrades sharply; dynamic and static points can no longer be distinguished accurately, and serious errors appear in the subsequent pose estimation. One solution in dynamic visual SLAM environments is to discard the moving points entirely when estimating the camera pose, which requires an algorithm that can accurately detect dynamic objects. Such algorithms currently fall into two main categories: deep-learning algorithms and geometric algorithms.
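To illustrate the inlier/outlier split that RANSAC performs, the following NumPy sketch fits a toy 2-D translation model to matched feature points; the translation model, thresholds and data are simplified stand-ins for the fundamental-matrix estimation an actual visual SLAM front end performs:

```python
import numpy as np

def ransac_translation(src, dst, iters=200, thresh=1.0, seed=0):
    """Toy RANSAC: fit a 2-D translation between matched points and
    split the matches into inliers (static-like) and outliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(src))       # minimal sample: one match
        t = dst[i] - src[i]              # candidate translation model
        err = np.linalg.norm(src + t - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

# Eight consistent ("static") matches displaced by (5, 3),
# plus two inconsistent ("dynamic") matches.
src = np.array([[i, i] for i in range(8)], dtype=float)
dst = src + np.array([5.0, 3.0])
src = np.vstack([src, [[100.0, 100.0], [50.0, 60.0]]])
dst = np.vstack([dst, [[130.0, 90.0], [10.0, 10.0]]])
mask = ransac_translation(src, dst)
print(mask.sum())   # 8: only the consistent matches survive as inliers
```

The same pattern, with a fundamental matrix in place of the translation and the epipolar distance as the error, is what degrades when dynamic points dominate the sample pool.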
On the deep-learning side, a network first obtains a class label for each object, and the label is then used to judge whether the object is dynamic.
FlowFusion extends StaticFusion with segmentation and discrimination of dynamic point clouds, using the PWC-Net network to identify and then reject dynamic objects. Riazuelo et al. use object detection to reject feature points on dynamic objects, reducing the influence of pedestrians on pose estimation; however, their algorithm assumes the only dynamic objects are people, and since dynamic objects in real life are varied, it no longer applies in those varied situations. Xu et al. use the Mask R-CNN semantic-segmentation algorithm, which retains enough static feature points while filtering dynamic ones, but the model's huge computational cost makes the final system far from real-time. Bescós et al. propose the DynaSLAM system, which supports monocular, stereo and RGB-D cameras: with a stereo camera or other sensors, Mask R-CNN first segments the dynamic objects and only the static regions are used for mapping; for RGB-D cameras the system adds a step that judges whether an object is genuinely dynamic. Its greatest advantage is that it can build a more complete map, because instead of simply removing dynamic objects from the image it completes the occluded areas using information from other viewpoints. Wang et al. propose segmenting objects on the depth map, which slightly improves the system's efficiency. All of these algorithms rely on semantic segmentation, however, so their real-time performance is low.
Among geometric approaches to handling dynamic objects, Tan et al. propose the RANSAC-based RDSLAM algorithm, which can eliminate dynamic points that differ clearly from static points and also provides a strategy for updating keyframes online, so that frames dominated by motion can be replaced; but when dynamic points outnumber static points, its localization accuracy drops sharply. Dai et al. triangulate the feature points in an image frame into many triangles, judge from the change in length of the connecting edges across two images whether two points belong to the same target, directly remove the edges that do not, and take the region formed by the remaining triangles, thereby rejecting the dynamic target. Sun et al. propose a moving-object removal method that obtains the contour of the moving object from sparse optical flow and then segments the object further, but it assumes the camera never moves, so the method clearly needs improvement before it can be applied to SLAM systems.
The above information disclosed in the background section is only for enhancement of understanding of the background of the invention and therefore it may contain information that does not form the prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
The invention aims to improve the robustness of a SLAM system in dynamic environments and to avoid the influence of dynamic objects on mapping and localization. It provides a visual SLAM method suitable for dynamic environments that improves the SLAM system's robustness to dynamic environments while retaining its excellent mapping and loop-detection capabilities under ordinary conditions.
In order to achieve the above purpose, the present invention is realized by the following technical scheme:
a visual SLAM method adapted to dynamic environments, characterized by comprising four main threads: a tracking thread, a local mapping thread, a closed-loop detection thread and a dense mapping thread, and specifically comprising the following steps:
A. tracking thread: the SLAM system receives images from the camera; a deep-learning YOLOv4 network first identifies common dynamic objects such as people in the environment; after ORB features are extracted, dynamic feature points are eliminated using the epipolar geometric constraint; the thread outputs the camera pose corresponding to each frame of image for localization, performs local map tracking, selects keyframes and passes them to the local mapping thread and the dense mapping thread;
B. local mapping thread: receiving the keyframes output by the tracking thread, completing keyframe insertion and generating new map points; then optimizing with local bundle adjustment, and finally screening the inserted keyframes to remove redundant keyframes;
C. closed-loop detection thread: mainly comprising two processes, loop detection and loop correction, wherein loop detection first uses a bag of words to detect loop keyframes and then performs a similarity transformation with the Sim(3) algorithm, and loop correction performs loop fusion and optimizes the essential graph;
D. dense mapping thread: constructing a dense map with the PCL (Point Cloud Library); a static dense map is constructed from keyframes with dynamic points removed; because the obtained point cloud information is usually noisy and contains much redundant information, outliers are removed with the statistical filtering method of the PCL library and redundant point cloud information is removed with voxel filtering.
A visual SLAM method adapted to dynamic environments as described above, characterized in that: to improve the adaptability of the visual SLAM system in dynamic environments, guarantee the quantity and quality of extracted feature points and improve system robustness, the deep-learning algorithm YOLOv4 and a dynamic-feature-point detection algorithm based on geometric constraints are added at the front end of the SLAM system; YOLOv4 first identifies dynamic objects, and the geometric constraints then further remove dynamic points, as follows:
first, matching points in the image are screened with the RANSAC algorithm and wrong matches are removed; the remaining feature points are used to compute the fundamental matrix F and to solve for R and t;
the RANSAC algorithm divides the data to be processed into inliers and outliers, where the inliers are the points for which the model is expected to hold, i.e. valid feature points, excluding points on dynamic objects;
secondly, from the epipolar geometric relation between the pixel points of the two matched points and the solved R and t, the specific coordinates X1 and X2 of the space point P in the two camera coordinate systems are recovered;
third, check whether the recovered X1 and X2 satisfy the formula X2 = R·X1 + t; if they do, the space point P is a static point; if not, the judgment continues: X1 (or X2) is projected into the next frame image to obtain the pixel point p6, 3×3 pixel blocks centred at the matched pixel p5 and at p6 are constructed and denoted A and B respectively, and the degree of correlation between A and B is expressed by the normalized cross-correlation factor as follows:
S(A,B)_NCC = Σ_{i,j}[A(i,j) - Ā][B(i,j) - B̄] / √( Σ_{i,j}[A(i,j) - Ā]² · Σ_{i,j}[B(i,j) - B̄]² )

where Ā and B̄ denote the mean pixel values of blocks A and B.
The threshold is set to 0.9. If S(A,B)_NCC is greater than 0.9, the two points are considered similar and the space point P is judged to be a static point; otherwise they are dissimilar and P is judged to be a dynamic point.
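The NCC decision above, with its 0.9 threshold, can be sketched in a few lines of NumPy; the 3×3 sample blocks below are invented for illustration:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equal-size pixel blocks."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

A = np.array([[10, 12, 11], [13, 15, 14], [9, 11, 10]], dtype=float)
B = A + 2.0   # same texture, uniformly brighter: NCC is exactly 1.0
C = np.array([[50, 3, 40], [1, 60, 2], [45, 0, 55]], dtype=float)

print(ncc(A, B) > 0.9)   # True  -> blocks similar, point judged static
print(ncc(A, C) > 0.9)   # False -> blocks dissimilar, point judged dynamic
```

Because the block means are subtracted, the measure is insensitive to uniform brightness changes between frames, which is why it suits patch comparison across consecutive images.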
The visual SLAM method adapted to dynamic environment as described above is characterized in that the specific steps of the RANSAC algorithm are as follows:
(1) Randomly select four sample data (ensuring that they are not collinear) from the feature-point data set obtained by matching in the visual SLAM system, calculate the transformation matrix H, and denote the model M;
(2) Traverse all feature points using the transformation matrix H and calculate the projection error of each feature point against model M; if the error of a feature point P is smaller than the set error threshold, add P to the inlier set;
(3) Compare the currently obtained model M' with the previous model M and keep the model with more points in its inlier set;
(4) Repeat the above three steps until iteration finishes (i.e. the number of iterations reaches a preset value).
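As a sketch of step (2) above, the projection-error test that decides inlier membership can be written as follows, assuming a known candidate homography H for one iteration; the synthetic H, the point set and the 2-pixel threshold are illustrative choices, not values from the patent:

```python
import numpy as np

def project(H, pts):
    """Apply a 3x3 homography to Nx2 points (with homogeneous divide)."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

def count_inliers(H, src, dst, thresh=2.0):
    """Step (2): projection error of every match under candidate model H."""
    err = np.linalg.norm(project(H, src) - dst, axis=1)
    return err < thresh

# Synthetic ground-truth homography: in-plane rotation plus translation.
theta = np.deg2rad(10)
H = np.array([[np.cos(theta), -np.sin(theta),  4.0],
              [np.sin(theta),  np.cos(theta), -1.0],
              [0.0,            0.0,            1.0]])
src = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]], dtype=float)
dst = project(H, src)
dst[-1] += [25.0, 25.0]   # one match displaced: a "dynamic" point
mask = count_inliers(H, src, dst)
print(mask.tolist())      # [True, True, True, True, False]
```

Repeating this over many random four-point samples and keeping the H with the largest inlier set is exactly the loop described in steps (1)-(4).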
Compared with the prior art, the invention has the following advantages:
the visual SLAM method suitable for dynamic environments adds YOLOv4 and a dynamic-point detection algorithm based on geometric features at the front end of the SLAM system: YOLOv4 first identifies dynamic objects, and geometric constraints then further reject dynamic points. This avoids the influence of dynamic objects on feature extraction in the visual SLAM method, improves the mapping and localization accuracy of visual SLAM in dynamic environments, and effectively improves the robustness of the visual SLAM system.
Drawings
FIG. 1 is a block diagram of a visual SLAM method of the present invention adapted to a dynamic environment;
FIG. 2 is a graph of the geometry of a dynamic point under multiple camera coordinate systems;
FIG. 3 is a flow chart of a dynamic point detection algorithm based on geometric constraints of the present invention;
FIG. 4 is a comparative graph of feature point extraction experiment results;
FIG. 5 is a comparison of the absolute trajectory error of ORB-SLAM2 and the algorithm of the present invention on the fr3_sitting_static sequence;
FIG. 6 is a comparison of the absolute trajectory error of ORB-SLAM2 and the algorithm of the present invention on the fr3_walking_halfsphere sequence;
FIG. 7 is a comparison of the absolute trajectory error of ORB-SLAM2 and the algorithm of the present invention on the fr3_walking_static sequence;
FIG. 8 is a comparison of the absolute trajectory error of ORB-SLAM2 and the algorithm of the present invention on the fr3_walking_xyz sequence;
FIG. 9 is the first comparison of ORB-SLAM2 and the algorithm of the present invention on dynamic-sequence APE (Absolute Pose Error);
FIG. 10 is the second comparison of ORB-SLAM2 and the algorithm of the present invention on dynamic-sequence APE (Absolute Pose Error);
FIG. 11 is the third comparison of ORB-SLAM2 and the algorithm of the present invention on dynamic-sequence APE (Absolute Pose Error);
FIG. 12 is the fourth comparison of ORB-SLAM2 and the algorithm of the present invention on dynamic-sequence APE (Absolute Pose Error);
FIG. 13 is a dense point cloud built by ORB-SLAM2;
FIG. 14 is a dense point cloud built by the algorithm of the present invention.
Detailed Description
The technical features of the present invention are described in further detail below with reference to the accompanying drawings so that those skilled in the art can understand the features.
As shown in fig. 1, the visual SLAM method of the present invention adapted to a dynamic environment is divided into four main threads: a tracking thread, a local mapping thread, a closed-loop detection thread and a dense mapping thread, as follows:
1. Tracking thread: the SLAM system receives images from the camera; a deep-learning YOLOv4 network first identifies common dynamic objects such as people in the environment; after ORB features are extracted, dynamic feature points are eliminated using the epipolar geometric constraint; the thread outputs the camera pose corresponding to each frame of image for localization, performs local map tracking, selects keyframes and passes them to the local mapping thread and the dense mapping thread.
2. Local mapping thread: receives the keyframes output by the tracking thread, completes keyframe insertion and generates new map points; then optimizes with local bundle adjustment (BA), and finally screens the inserted keyframes to remove redundant ones.
3. Closed-loop detection thread: mainly comprises two processes, loop detection and loop correction. Loop detection first uses a bag of words to detect loop keyframes and then computes a similarity transformation with the Sim(3) algorithm; loop correction performs loop fusion and optimizes the essential graph.
4. Dense mapping thread: builds a dense map with the PCL (Point Cloud Library). A static dense map is constructed from keyframes with dynamic points removed; because the obtained point cloud is usually noisy and contains much redundant information, outliers are removed with the statistical filtering method of the PCL library and redundant point cloud information is removed with voxel filtering.
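The two PCL filters named above can be approximated with a short NumPy sketch; this is an illustrative stand-in for PCL's StatisticalOutlierRemoval and VoxelGrid, and the neighbour count, standard-deviation ratio and voxel size below are arbitrary demonstration parameters, not the patent's settings:

```python
import numpy as np

def statistical_outlier_removal(points, k=5, std_ratio=1.0):
    """Drop points whose mean distance to their k nearest neighbours is more
    than std_ratio standard deviations above the global mean distance."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    knn = np.sort(d, axis=1)[:, 1:k + 1]      # skip the zero self-distance
    mean_d = knn.mean(axis=1)
    keep = mean_d < mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]

def voxel_downsample(points, voxel=0.05):
    """Keep one representative point (the first seen) per occupied voxel."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(idx)]

rng = np.random.default_rng(1)
cloud = rng.uniform(0.0, 1.0, (200, 3))           # dense points in a unit cube
cloud = np.vstack([cloud, [[10.0, 10.0, 10.0]]])  # one far-away noise point
cloud = statistical_outlier_removal(cloud)        # removes the noise point
cloud = voxel_downsample(cloud, voxel=0.2)        # thins redundant points
print(cloud.shape)
```

The brute-force distance matrix is only suitable for small clouds; PCL uses a k-d tree for the neighbour search, which is what makes the same idea practical on full keyframe point clouds.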
The geometrical relationship of the dynamic point under a plurality of camera coordinate systems is shown in fig. 2, and the dynamic point detection algorithm based on geometrical constraint is shown in fig. 3.
First, matching points in the image are screened with the RANSAC algorithm and wrong matches are removed; the remaining feature points are used to compute the fundamental matrix F and to solve for R and t.
The RANSAC algorithm divides the data to be processed into inliers and outliers: the inliers are the points for which the model is expected to hold, i.e. valid feature points excluding those on dynamic objects, while the outliers are invalid data, i.e. the points on dynamic objects. In the visual SLAM system, the RANSAC algorithm specifically comprises the following steps:
(1) Randomly select four sample data (ensuring that they are not collinear) from the feature-point data set obtained by matching in the visual SLAM system, calculate the transformation matrix H, and denote the model M;
(2) Traverse all feature points using the transformation matrix H and calculate the projection error of each feature point against model M; if the error of a feature point P is smaller than the set error threshold, add P to the inlier set;
(3) Compare the currently obtained model M' with the previous model M and keep the model with more points in its inlier set;
(4) Repeat the above three steps until iteration finishes (i.e. the number of iterations reaches a preset value).
Secondly, from the epipolar geometric relation between the pixel points of the two matched points and the solved R and t, the specific coordinates X1 and X2 of the space point P in the two camera coordinate systems are recovered.
Third, check whether the recovered X1 and X2 satisfy the formula X2 = R·X1 + t. If they do, the space point P is a static point; if not, the judgment continues: X1 (or X2) is projected into the next frame image to obtain the pixel point p6, 3×3 pixel blocks centred at the matched pixel p5 and at p6 are constructed and denoted A and B respectively, and the degree of correlation between A and B is expressed by the normalized cross-correlation factor as follows:
S(A,B)_NCC = Σ_{i,j}[A(i,j) - Ā][B(i,j) - B̄] / √( Σ_{i,j}[A(i,j) - Ā]² · Σ_{i,j}[B(i,j) - B̄]² )

where Ā and B̄ denote the mean pixel values of blocks A and B.
The threshold is set to 0.9. If S(A,B)_NCC is greater than 0.9, the two points are considered similar and the space point P is judged to be a static point; otherwise they are dissimilar and P is judged to be a dynamic point.
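The third step's consistency check can be sketched as follows, assuming the reconstructed static-point relation X2 = R·X1 + t; the camera motion R, t, the test point and the tolerance are hypothetical values for illustration:

```python
import numpy as np

def is_static(X1, X2, R, t, tol=0.01):
    """A point is static if its second-frame coordinates agree with the
    rigid camera motion applied to its first-frame coordinates."""
    return np.linalg.norm(R @ X1 + t - X2) < tol

# Hypothetical camera motion between frames: small rotation about Z plus a shift.
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t = np.array([0.2, 0.0, 0.05])

X1 = np.array([1.0, 2.0, 5.0])
X2_static = R @ X1 + t                  # consistent with the camera motion
X2_moving = X2_static + [0.5, 0.0, 0.0] # the point itself moved as well

print(is_static(X1, X2_static, R, t))   # True  -> static point
print(is_static(X1, X2_moving, R, t))   # False -> candidate dynamic point
```

In practice the recovered coordinates carry triangulation noise, which is why a point failing this test is not rejected outright but passed on to the NCC patch comparison above.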
The algorithm of the application is demonstrated by the following experiments and comparisons:
To verify the effectiveness of the algorithm herein, test experiments were carried out on five dynamic sequences of the fr3 dynamic dataset in the TUM dataset of the Technical University of Munich: fr3_walking_xyz, fr3_walking_halfsphere, fr3_walking_static, fr3_sitting_static and fr3_sitting_xyz, each sequence being tested 5 times. These five sequences fall into two classes. "walking" denotes high dynamics: people walk back and forth with a large range of motion and can occupy 1/3 or even 1/2 of the frame, which is certainly a great challenge for the robustness of a visual SLAM system. "sitting" denotes low dynamics: the person mainly sits in a chair with little limb movement. The suffix in the name describes the camera motion, e.g. "xyz" means moving along the three coordinate axes, "halfsphere" means moving along a hemisphere, and "static" means the camera is stationary. The experimental environment is a Dell Precision 7820 Tower computer with an Intel Xeon Silver 4210R processor and an NVIDIA Quadro P2200 (4G) graphics card, running Ubuntu 16.04.
1. Feature point extraction experiment
The result of feature-point extraction after dynamic objects are identified by YOLOv4 alone is shown on the left of FIG. 4. Because YOLOv4 can misjudge people who are incomplete or blurred in the image, ORB-SLAM2 combined only with YOLOv4 does not remove all feature points on the dynamic object, so many feature points remain on the pedestrian's body; these cause subsequent pose-estimation errors and make the accuracy of the built map drop rapidly. The improved visual front end combines YOLOv4 with geometric constraints to remove dynamic points; the processed image is shown on the right of FIG. 4. Even when the image is blurred by pedestrian motion, the invention removes the feature points on the pedestrian well, and the subsequent system can estimate the pose accurately from the valid static feature points.
2. Positioning accuracy test experiment
The localization accuracy of ORB-SLAM2 and of the algorithm of the invention was compared on four dynamic sequences (covering low dynamics and high dynamics). The absolute trajectory errors of the two methods on the four sequences are shown in FIGS. 5 to 8: the absolute pose error is plotted on the left, and the colour bar on the right encodes the error magnitude, increasing from bottom to top as the colour changes from dark blue to dark red. On the fr3_sitting_static sequence the algorithm improves little over ORB-SLAM2, because this is a low-dynamic sequence on which ORB-SLAM2 is already robust. On the remaining three high-dynamic sequences the difference between the two algorithms is large: on fr3_walking_halfsphere, fr3_walking_static and fr3_walking_xyz the algorithm's error is smaller and its trajectory closer to the real trajectory, while ORB-SLAM2's result deviates much more, mainly because ORB-SLAM2 cannot handle scenes with many dynamic feature points and only stays robust in static environments. The dynamic-sequence APE (Absolute Pose Error) test results are shown in FIGS. 9 to 12; APE evaluates the global consistency of a SLAM trajectory and includes indices such as absolute error and root mean square error. It can be seen intuitively that on the static sequence the error fluctuation of the two algorithms is essentially similar, while on the dynamic sequences ORB-SLAM2's error fluctuates strongly and the improved algorithm's fluctuation is clearly and consistently smaller, so the improved algorithm outperforms ORB-SLAM2.
The absolute trajectory error (ATE) and relative pose error (RPE) of the two algorithms on the four sequences are shown in Tables 1, 2 and 3; a smaller rmse value means a smaller error. The rmse rows of the three tables show that the algorithm deviates less from the real trajectory than ORB-SLAM2, and the gap is most obvious on the dynamic sequences, indicating that the improved algorithm handles interference in dynamic environments well. RPE measures the error between the true and estimated relative pose of two frames separated by a fixed time interval.
TABLE 1 Absolute trajectory error (ATE) test results (m)
[Table 1 is provided as an image in the original publication]
TABLE 2 Relative pose error (RPE) translation test results (m)
[Table 2 is provided as an image in the original publication]
TABLE 3 Relative pose error (RPE) rotation angle test results (deg)
[Table 3 is provided as an image in the original publication]
To show the merits of the improved algorithm more intuitively, the rmse improvement of the improved algorithm's absolute trajectory error (ATE) relative to the original ORB-SLAM2 is computed with the following formula:
α = (m - n) / m × 100%
where α represents the improvement rate, m is the root mean square error obtained by ORB-SLAM2, and n is the root mean square error obtained by the improved algorithm herein. The calculation results are shown in Table 4: the improved algorithm of the invention has an obvious advantage in high-dynamic scenes, with an improvement rate above 90%, and copes better with dynamic environments.
TABLE 4 RMSE comparison of absolute trajectory error (ATE) between the two algorithms

Sequence name        ORB-SLAM2 (m)   Algorithm herein (m)   Improvement (%)
sitting_static       0.0091          0.0077                 15.38
sitting_xyz          0.0091          0.0078                 14.29
walking_halfsphere   0.7757          0.0507                 93.46
walking_static       0.2813          0.0099                 96.48
walking_xyz          1.0007          0.0167                 98.33
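The improvement column of Table 4 can be reproduced directly from the two rmse columns with the formula α = (m - n)/m × 100%:

```python
# rmse values from Table 4: (ORB-SLAM2, improved algorithm) per sequence
pairs = {
    "sitting_static":     (0.0091, 0.0077),
    "sitting_xyz":        (0.0091, 0.0078),
    "walking_halfsphere": (0.7757, 0.0507),
    "walking_static":     (0.2813, 0.0099),
    "walking_xyz":        (1.0007, 0.0167),
}
for name, (m, n) in pairs.items():
    alpha = (m - n) / m * 100   # improvement rate from the formula above
    print(f"{name}: {alpha:.2f}%")   # matches the table's last column
```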
In dynamic SLAM system research there are many excellent improved algorithms, such as DS-SLAM [Yu C, Liu Z X, Liu X J, et al. DS-SLAM: a semantic visual SLAM towards dynamic environments [C]. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018: 1168-1174] and DynaSLAM [Bescós B, Fácil J M, Civera J, et al. DynaSLAM: tracking, mapping, and inpainting in dynamic scenes [J]. IEEE Robotics and Automation Letters, 2018, 3(4): 4076-4083]. To verify the advancement of the algorithm herein, and because each algorithm's experimental configuration differs, the improvement rate of the absolute trajectory error rmse of these two algorithms is compared with that of the algorithm herein; the data are taken from the published values in the corresponding literature, as shown in Table 5.
TABLE 5 Absolute trajectory error RMSE improvement rate of each dynamic SLAM algorithm

Sequence name        DS-SLAM   DynaSLAM   Algorithm herein
sitting_static       25.94%    -          15.38%
walking_halfsphere   93.76%    92.88%     93.46%
walking_static       97.91%    93.33%     96.48%
walking_xyz          96.71%    96.73%     98.33%
As can be seen from Table 5, the algorithm of the invention performs best on the high-dynamic sequence walking_xyz and is better overall than DynaSLAM, though slightly behind the overall best DS-SLAM algorithm. The comparison with these two excellent dynamic SLAM algorithms shows that the improved algorithm reduces errors well and improves the system's localization accuracy.
3. Dense map building test
The experiment was performed with the high-dynamic sequences walking_halfsphere and walking_xyz. First, dense point-cloud maps were built for both without removing dynamic objects (i.e. with the original ORB-SLAM2 algorithm); the result is shown in FIG. 13. Because the moving people are not removed, the point cloud map is stitched incorrectly and contains a large number of ghosts, which degrades the mapping accuracy of the SLAM system.
The algorithm of the invention was then used to eliminate the dynamic objects in the environment; the dense point-cloud map it builds is shown in FIG. 14. In contrast to the dense point cloud of FIG. 13 produced by the ORB-SLAM2 algorithm, the point cloud of the dynamic objects in FIG. 14 is filtered out: the many ghosts caused by pedestrians are gone, and the scene occluded by pedestrians is completely restored. The comparison shows that the algorithm handles dynamic objects in the environment well and improves the robustness of the SLAM system.
The above merely describes preferred embodiments of the present invention and is not intended to limit its scope or spirit; various modifications and improvements of the technical solutions of the present invention made by those skilled in the art without departing from its design concept shall fall within its protection scope.

Claims (3)

1. A visual SLAM method adapted to a dynamic environment, characterized by comprising four main threads: a tracking thread, a local mapping thread, a loop closure detection thread and a dense mapping thread, and specifically comprising the following steps:
A. tracking thread: the SLAM system receives images from the camera; a deep-learning YOLOv4 network first identifies common dynamic objects in the environment, such as people; after ORB features are extracted, dynamic feature points are eliminated using the epipolar geometric constraint; the camera pose corresponding to each frame is output for positioning, local map tracking is performed, and keyframes are selected and passed to the local mapping thread and the dense mapping thread;
B. local mapping thread: receives the keyframes output by the tracking thread, inserts them, and generates new map points; then performs local bundle adjustment, and finally screens the inserted keyframes to remove redundant ones;
C. loop closure detection thread: mainly comprises two processes, loop detection and loop correction; loop detection first uses a bag of words to detect loop keyframes and then computes a similarity transformation with the sim3 algorithm; loop correction performs loop fusion and optimizes the essential graph;
D. dense mapping thread: constructs a dense map with the PCL point cloud library, building a static dense map from keyframes whose dynamic points have been removed; since the resulting point cloud is often noisy and contains redundant information, outliers are removed with the statistical filtering method in the PCL library and redundant point cloud information is removed with voxel filtering.
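The two filters named in step D can be sketched as follows (an illustrative pure-Python stand-in for PCL's statistical outlier removal and voxel-grid classes; the neighbour count, standard-deviation multiplier, and leaf size below are assumed demo values, not the patent's parameters): statistical filtering drops points whose mean nearest-neighbour distance is abnormally large, and voxel filtering keeps one representative point per voxel.

```python
# Sketch of step D's point-cloud cleanup: statistical outlier removal
# followed by voxel-grid downsampling (pure-Python stand-ins for the PCL
# StatisticalOutlierRemoval and VoxelGrid filters).
import math

def statistical_filter(points, k=2, std_mul=1.0):
    """Drop points whose mean distance to their k nearest neighbours exceeds
    (mean + std_mul * stddev) computed over the whole cloud."""
    def mean_knn_dist(p):
        dists = sorted(math.dist(p, q) for q in points if q is not p)
        return sum(dists[:k]) / k
    d = [mean_knn_dist(p) for p in points]
    mu = sum(d) / len(d)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in d) / len(d))
    return [p for p, x in zip(points, d) if x <= mu + std_mul * sigma]

def voxel_filter(points, leaf=0.05):
    """Keep one representative point (the first seen) per voxel of size leaf."""
    seen = {}
    for p in points:
        key = tuple(int(c // leaf) for c in p)
        seen.setdefault(key, p)
    return list(seen.values())

cloud = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.0, 0.01, 0.0),
         (5.0, 5.0, 5.0)]                      # last point is a lone outlier
inliers = statistical_filter(cloud, k=2, std_mul=1.0)
print(len(inliers))                            # outlier removed
print(len(voxel_filter(inliers, leaf=0.05)))   # close points merged to one voxel
```

The same two-stage order as in the claim is used: noise is removed first, then the surviving cloud is decimated so the dense map stays compact.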
2. The visual SLAM method adapted to a dynamic environment according to claim 1, wherein a deep-learning algorithm, YOLOv4, and a dynamic feature point detection algorithm based on geometric constraints are added at the front end of the SLAM system: YOLOv4 first identifies dynamic objects, and the geometric constraint then further removes dynamic points, as follows:
firstly, screening the matching points in the image with the RANSAC algorithm and removing wrong matches, then calculating the fundamental matrix F from the remaining feature points and solving for R and t;
the RANSAC algorithm divides the data into inliers and outliers: the inliers are points for which the model is expected to hold, i.e., valid feature points excluding points on dynamic objects, while the outliers are invalid data, i.e., points on dynamic objects;
secondly, recovering the coordinates X1 and X2 of the space point P in the two camera coordinate systems from the epipolar geometric relation between the pixels of the matched point pair and the solved R and t;
thirdly, checking whether the recovered X1 and X2 satisfy X2 = R·X1 + t; if so, the space point P is a static point; if not, the judgment continues: X2 (or X1) is projected into the next frame image to obtain a pixel point p6, 3 x 3 pixel blocks centered at p5 and p6 are constructed, the corresponding blocks are denoted A and B, and the degree of correlation between A and B is expressed by the normalized cross-correlation factor as follows:
S(A,B)_NCC = Σ_{i,j} [A(i,j) − Ā][B(i,j) − B̄] / √( Σ_{i,j} [A(i,j) − Ā]² · Σ_{i,j} [B(i,j) − B̄]² )

where Ā and B̄ are the mean pixel values of blocks A and B;
setting the threshold to 0.9: if S(A,B)_NCC is larger than 0.9, the two points are considered similar and the space point P is judged to be a static point; otherwise, they are dissimilar and the space point P is judged to be a dynamic point.
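The geometric test of claim 2 can be sketched as follows (a minimal illustration, not the patent's implementation; the tolerance value and helper names are assumptions): first check whether X2 is consistent with R·X1 + t, and only on failure fall back to the NCC comparison of the 3 x 3 blocks A and B against the 0.9 threshold.

```python
# Sketch of the claim-2 dynamic-point test: a rigid-motion consistency
# check, with a normalized cross-correlation (NCC) fallback on 3x3 blocks.
import math

def ncc(A, B):
    """Normalized cross-correlation of two equal-size pixel blocks."""
    a = [x for row in A for x in row]
    b = [x for row in B for x in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

def is_static(X1, X2, R, t, A=None, B=None, tol=1e-3, ncc_thresh=0.9):
    """True if X2 = R*X1 + t holds (static point); otherwise compare
    patches A and B with NCC against the 0.9 threshold from the claim."""
    pred = [sum(R[i][j] * X1[j] for j in range(3)) + t[i] for i in range(3)]
    if math.dist(pred, X2) < tol:
        return True
    return A is not None and B is not None and ncc(A, B) > ncc_thresh

I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # identity rotation for the demo
t = [0.0, 0.0, 1.0]
print(is_static([1, 2, 3], [1, 2, 4], I, t))   # motion-consistent -> True
patchA = [[10, 20, 10], [20, 40, 20], [10, 20, 10]]
patchB = [[11, 21, 11], [21, 41, 21], [11, 21, 11]]   # same pattern, offset
print(is_static([1, 2, 3], [5, 5, 5], I, t, patchA, patchB))  # NCC rescue -> True
```

Because NCC subtracts the block means, patchB (patchA plus a uniform brightness offset) still correlates perfectly, which is exactly why the claim uses the normalized form rather than a raw pixel difference.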
3. The visual SLAM method adapted to a dynamic environment according to claim 2, characterized in that the RANSAC algorithm comprises the following specific steps:
firstly, randomly selecting four sample data points from the feature point data set obtained by matching in the visual SLAM system, calculating the transformation matrix H, and denoting the model as M;
secondly, traversing all the feature points with the transformation matrix H, calculating the projection error between each feature point and the model M, and adding a feature point P to the inlier set if its error is smaller than the set error threshold;
thirdly, comparing the currently obtained model M' with the previous model M and keeping the model with more points in its inlier set;
fourthly, repeating the above three steps until the iterations are finished.
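The four RANSAC steps above can be sketched with the following loop (a simplified illustration: to stay self-contained it fits a 2D line from two samples instead of the 4-point homography H of the claim; the iteration count, threshold, and seed are assumed demo values):

```python
# Sketch of the claim-3 RANSAC loop. The claim fits a homography H from
# four matched points; a 2D line (2 samples) stands in as the model here
# so the example needs no linear-algebra dependency.
import random

def ransac_line(points, iters=200, thresh=0.1, seed=0):
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):                           # step 4: repeat
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # step 1: random samples
        if x1 == x2:
            continue                                 # degenerate pair
        m = (y2 - y1) / (x2 - x1)                    # candidate model M': y = m*x + c
        c = y1 - m * x1
        inliers = [(x, y) for x, y in points         # step 2: score every point
                   if abs(y - (m * x + c)) < thresh]
        if len(inliers) > len(best_inliers):         # step 3: keep larger inlier set
            best_model, best_inliers = (m, c), inliers
    return best_model, best_inliers

# 10 points on y = 2x + 1 plus two gross outliers (simulated bad matches).
pts = [(x, 2 * x + 1) for x in range(10)] + [(3, 9.0), (7, 1.0)]
model, inliers = ransac_line(pts)
print(len(inliers))  # the 10 collinear points; the outliers are rejected
```

The rejected outliers play the role of the points on dynamic objects in claim 2: they never dominate the vote, so the model M is estimated only from the static (inlier) structure.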
CN202310387172.8A 2023-04-12 2023-04-12 Visual SLAM method suitable for dynamic environment Pending CN116429087A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310387172.8A CN116429087A (en) 2023-04-12 2023-04-12 Visual SLAM method suitable for dynamic environment

Publications (1)

Publication Number Publication Date
CN116429087A true CN116429087A (en) 2023-07-14

Family

ID=87082780

Country Status (1)

Country Link
CN (1) CN116429087A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117553808A (en) * 2024-01-12 2024-02-13 中国民用航空飞行学院 Deep learning-based robot positioning navigation method, device, equipment and medium
CN117553808B (en) * 2024-01-12 2024-04-16 中国民用航空飞行学院 Deep learning-based robot positioning navigation method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination