CN110378997B - ORB-SLAM 2-based dynamic scene mapping and positioning method - Google Patents


Info

Publication number
CN110378997B
CN110378997B (granted from application CN201910481714.1A)
Authority
CN
China
Prior art keywords
dynamic
pixel
key frame
pixels
pose
Prior art date
Legal status (assumed; not a legal conclusion)
Active
Application number
CN201910481714.1A
Other languages
Chinese (zh)
Other versions
CN110378997A (en)
Inventor
龙土志
蔡述庭
李丹
杨家兵
董海涛
Current Assignee (the listed assignees may be inaccurate)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201910481714.1A
Publication of CN110378997A publication Critical patent/CN110378997A/en
Application granted granted Critical
Publication of CN110378997B publication Critical patent/CN110378997B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05: Geographic models

Abstract

The invention discloses a dynamic scene mapping and positioning method based on ORB-SLAM2, comprising a local map tracking process, a dynamic pixel eliminating process, a sparse mapping process, a closed-loop detection process and an octree map building process. The method has a dynamic pixel elimination function: by combining a target detection method with the depth image of each new key frame, it quickly detects moving objects in the camera's image information and constructs a clean static-background octree map in a complex dynamic environment.

Description

ORB-SLAM 2-based dynamic scene mapping and positioning method
Technical Field
The invention relates to the technical field of robot simultaneous localization and mapping (SLAM), in particular to a dynamic scene mapping and positioning method based on ORB-SLAM2.
Background
SLAM (simultaneous localization and mapping) has long been a popular topic in computer vision and robotics and has attracted the attention of many high-tech companies. SLAM builds a map of an unknown environment while localizing the sensor within that map in real time. The frameworks of modern visual SLAM systems, such as ORB-SLAM2 and LSD-SLAM, are very mature; most state-of-the-art visual SLAM (V-SLAM) systems offer high-precision localization, but most of them assume a static operating environment, which limits their application.
For establishing a static map in a dynamic scene, the existing algorithms have drawbacks: DynaSLAM cannot eliminate the pixels of predefined objects in real time, nor can it eliminate the pixels of undefined objects or of undetected parts of predefined objects; the StaticFusion system likewise cannot eliminate undefined objects or parts of predefined objects. None of these algorithms can build a clean static map in real time in a complex dynamic environment.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a dynamic scene mapping and positioning method based on ORB-SLAM2 which has a dynamic pixel elimination function: it rapidly detects moving objects in the camera's image information by combining a target detection method with the depth image of each new key frame, and constructs a clean static-background octree map in a complex dynamic environment.
In order to realize the task, the invention adopts the following technical scheme:
a dynamic scene mapping and positioning method based on ORB-SLAM2 comprises the following steps:
step 1, local map tracking
Initializing the pose of a camera by using image information captured by the camera carried by the robot, taking the first frame image captured by the camera as a key frame during initialization; after the initial pose is obtained, tracking the local map so as to optimize the pose of the camera and generate new key frames;
step 2, dynamic pixel elimination
Detecting a predefined dynamic object in the color image of the new key frame using a target detection algorithm, and then identifying dynamic pixels in combination with the depth image of the new key frame; dynamic pixels detected by the two methods are removed;
step 3, sparse mapping
For the key frames with the dynamic pixels removed, the robot poses of the key frames are optimized, new map points are added, and the quality and scale of a key frame set are maintained;
step 4, closed loop detection
Performing closed loop detection on each new key frame, and once closed loops are detected, performing pose graph optimization;
step 5, constructing an octree map
Dividing map points into voxels by using an octree and storing the voxels through an octree structure to construct an octree map; the occupation probability of each voxel is calculated to judge whether the voxel is occupied, and occupied voxels are visualized in the octree map.
Further, tracking the local map to optimize the camera pose and generate new keyframes includes:
the local map refers to the 3D points observed by key frames whose distance and viewing angle are close to the current frame; more matched 3D points are obtained through re-projection, so that the camera pose is optimized and new key frames are generated with minimum error:
projecting the 3D points on the local map to the current frame to obtain 3D-2D feature matching;
limiting the area for searching 2D matching points in the current frame to reduce mismatching; then comparing each pixel in the current frame with the position obtained by projecting its matched 3D point according to the currently estimated camera pose, using the resulting errors to construct a least-squares problem, and minimizing it to find the best camera pose for positioning;
and judging whether a new key frame needs to be generated or not according to preset conditions.
Further, the step 2 of identifying dynamic pixels in combination with the depth image of the new key frame includes:
projecting the pixels remaining after the target detection algorithm has removed predefined objects into world coordinates to create 3D points; dividing the 3D points into a plurality of clusters, and uniformly selecting M pixels from each cluster; each selected pixel is projected into the N key frames nearest to the new key frame and compared, to detect whether it is a dynamic pixel:
back-projecting pixel u to the 3D point p_w in world coordinates, using the depth in the depth image of the key frame and the robot pose of the new key frame;
projecting the 3D point p_w onto the color image of the jth key frame near the new key frame;
if the pixel u' of the jth key frame has a valid depth value z' in the corresponding depth image, back-projecting u' to the 3D point p_w' in world coordinates;
comparing the distance d between p_w' and p_w with the set threshold d_mth to determine whether pixel u is dynamic:
searching the pixels in a square around u' so that d takes its minimum value d_min; if d_min is greater than the threshold d_mth, the pixel u is preliminarily judged to be dynamic, otherwise it is preliminarily judged to be static.
Further, assume that among the preliminary judgment results of pixel u over all nearby key frames, the number of static results is N_s, the number of dynamic results is N_d, and the number of invalid results is N_i; the final attribute of pixel u is as follows:
if N_s ≥ N_d and N_s > 0, pixel u is a static pixel;
if N_d > N_s and N_d > 0, pixel u is a dynamic pixel;
if N_s = N_d = 0, pixel u is invalid.
Further, the dynamic pixels detected by the two methods are removed; after dynamic pixels are identified in combination with the depth image of the new key frame, the removal method is as follows:
among the M pixels uniformly selected from one cluster, assume the number of static pixels is N_s', the number of dynamic pixels is N_d', and the number of invalid pixels is N_i'; the final attribute of the cluster is as follows:
if N_s' ≥ N_d', the cluster is a static cluster and is retained;
if N_d' > N_s', the cluster is a dynamic cluster and is removed.
Further, the optimizing the robot pose of the key frame, adding new map points, and maintaining the quality and scale of the key frame set includes:
calculating the BoW (bag-of-words) vector of the current key frame, and updating the map points of the current key frame;
optimizing the pose of the robot by using sliding-window local BA, where the object being optimized is the pose of the current frame;
redundant key frames are detected and culled: if 90% of the points observed on a key frame can be observed by at least three other key frames, the key frame is considered redundant and deleted.
Further, the determining whether a voxel is occupied by calculating an occupation probability of the voxel, and if so, visualizing in an octree map includes:
let Z_t denote the observation of voxel n at time t, and let L(n|Z_1:t) denote the log-odds value of voxel n up to time t; then at time t+1 the log-odds of voxel n is given by:
L(n|Z_1:t+1) = L(n|Z_1:t) + L(n|Z_t+1)    (Formula 8)
if voxel n is observed at time t+1, L(n|Z_t+1) is τ, otherwise it is 0; the increment τ is a predetermined value;
defining p ∈ [0,1] as the probability that a voxel is occupied and l ∈ R as the log-odds value, l is computed by the logit transform:
l = logit(p) = log(p / (1 − p))    (Formula 9)
the inverse transform of the above equation is:
p = logit⁻¹(l) = exp(l) / (1 + exp(l)) = 1 / (1 + exp(−l))    (Formula 10)
the occupancy probability p of the voxel is calculated by substituting the log-odds value into Formula 10; only if p is greater than a predetermined threshold is the voxel n considered occupied and visualized in the octree map.
The invention has the following technical characteristics:
1. Fast operation
The algorithm uses the CornerNet-Squeeze target detection algorithm to detect dynamic objects and the Mini Batch K-Means variant of K-Means to cluster the depth information of the images, which is faster than existing algorithms. CornerNet-Squeeze needs only 34 ms to process a picture (test environment: 1080 Ti GPU + Intel Core i7-7700K CPU), which is faster than algorithms such as YOLOv3. For clustering problems with more than ten thousand points, Mini Batch K-Means is about three times faster than standard K-Means with little difference in performance.
In addition, the octree map is established, so that the updating time of the map is shortened.
2. A very clean static map can be established in a complex environment
The algorithm combines the target detection method with the log-odds voxel update of the octree map to detect and eliminate dynamic pixels.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of the local BA optimization process;
FIG. 3 is an octree map model.
Detailed Description
ORB-SLAM2 is a complete SLAM solution for monocular, stereo and RGB-D cameras. It supports map reuse, loop detection and relocalization. However, it assumes the operating environment is static, which limits its application.
The proposed algorithm builds on the ORB-SLAM2 algorithm and can quickly establish a clean static octree map in a dynamic environment in real time. It mainly comprises five steps: local map tracking, dynamic pixel culling, sparse mapping, closed-loop detection and octree map creation; the overall flow chart is shown in Fig. 1. The details are as follows:
step 1, local map tracking
Initializing the pose of the camera by using image information captured by the camera carried by the robot, taking the first frame image captured by the camera as a key frame during initialization; after the initial pose is obtained, tracking the local map so as to optimize the pose of the camera and generate new key frames. The specific steps are as follows:
step 1.1, extracting ORB feature points from the original RGB-D image information (color image and depth image) captured by a Kinect2 and matching them; the camera pose is tracked and initialized, i.e. positioned, by trying in turn the constant-velocity motion model, reference key frame tracking and relocalization. At initialization the first frame is set as a key frame; this step 1.1 is the same as in ORB-SLAM2.
Step 1.2, after the initial pose is obtained, tracking a local map, wherein the local map refers to 3D points observed by a key frame (local key frame) of a current frame (a picture currently shot by a camera) at a distance and a visual angle (the distance is set to be 4m, and the angle is set to be 1 rad); more matched 3D points are obtained through re-projection, so that the camera pose is optimized with the minimum error:
(1) Definitions:
Transformation matrix from the camera coordinate system c to the robot coordinate system r:
T_rc ∈ SE(3)
Transformation matrix from the robot coordinate system r to the world coordinate system w (i.e. the pose of the robot):
T_wr ∈ SE(3)
Projection relation from a 3D point P_c associated with a frame C to the 2D point u on that frame, where K is the camera intrinsic matrix and z is the depth of P_c:
u = π(P_c) = (1/z) · K · P_c
Back-projection from a 2D point u on a frame C and its corresponding depth z to the 3D point P_c, where ũ is the homogeneous form of u:
P_c = π⁻¹(u, z) = z · K⁻¹ · ũ
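To make these relations concrete, the following minimal Python sketch implements the projection and back-projection helpers (NumPy only; the function names and the conventions that K is the 3×3 intrinsic matrix and poses are 4×4 homogeneous matrices are illustrative assumptions, not the patent's code):

```python
import numpy as np

def project(K, P_c):
    """Project a 3D point P_c in the camera frame to a 2D pixel: u = (1/z) K P_c."""
    z = P_c[2]
    uv = (K @ P_c) / z
    return uv[:2], z

def backproject(K, u, z):
    """Back-project pixel u with depth z to a camera-frame 3D point: P_c = z K^-1 u~."""
    u_h = np.array([u[0], u[1], 1.0])          # homogeneous pixel coordinates
    return z * (np.linalg.inv(K) @ u_h)

def transform(T, P):
    """Apply a 4x4 rigid transform T (e.g. T_wr @ T_rc) to a 3D point P."""
    return (T @ np.array([P[0], P[1], P[2], 1.0]))[:3]
```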
(2) Carrying out reprojection to obtain feature matching:
Let the robot pose (i.e. the pose of the current frame) be T_wr, and let {P_w^i} denote the 3D points of the local map. Projecting these points onto the current frame yields the 3D-2D feature matches:
u_i' = π(T_rc⁻¹ · T_wr⁻¹ · P_w^i)
(3) Optimizing the pose of the camera:
In a dynamic scene there are many mismatches among the feature matches. To address this, the invention reduces mismatching by limiting the area (a circle with radius set to 3 pixels) in which 2D matching points (i.e. pixels) are searched in the current frame. Each pixel u_i in the current frame is then compared with the position u_i' obtained by projecting its matched 3D point according to the currently estimated camera pose; the resulting errors form a least-squares problem (Formula 3), which is minimized to find the best camera pose for positioning:
T_wr* = argmin_{T_wr} Σ_i ‖u_i − π(T_rc⁻¹ · T_wr⁻¹ · P_w^i)‖²    (Formula 3)
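A hedged sketch of this least-squares pose refinement, assuming the helpers above; it parameterizes the world-to-camera pose as a rotation vector plus translation and uses SciPy's generic robust solver in place of the g2o-style optimizer a real ORB-SLAM2 pipeline would use:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(x, K, pts_w, pix):
    """Stack the reprojection errors u_i - pi(T(x) P_w^i) for pose params x = (rotvec, t)."""
    R = Rotation.from_rotvec(x[:3]).as_matrix()
    t = x[3:]
    res = []
    for P_w, u in zip(pts_w, pix):
        P_c = R @ P_w + t                      # world -> camera with the candidate pose
        uv = (K @ P_c) / P_c[2]                # pinhole projection
        res.extend(u - uv[:2])
    return np.asarray(res)

def refine_pose(x0, K, pts_w, pix):
    """Minimize the total squared reprojection error (Formula 3), robustified with a Huber loss."""
    sol = least_squares(residuals, x0, args=(K, np.asarray(pts_w), np.asarray(pix)),
                        loss="huber")
    return sol.x                               # refined (rotvec, t)
```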
(4) Finally, judge whether a new key frame needs to be generated according to preset conditions; these preset conditions are the same as in the ORB-SLAM2 algorithm.
Step 2, dynamic pixel elimination
In constructing static maps in dynamic scenes, the identification and deletion of dynamic pixels is most critical. Since only the key frames are used to construct the octree map, only the newly selected key frames (new key frames) from the previous step are subjected to dynamic pixel culling.
The method first detects predefined dynamic objects in the color image of the new key frame using a target detection method, and then identifies dynamic pixels in combination with the depth image of the new key frame; the dynamic pixels detected by both methods are removed. The steps are as follows:
step 2.1, for predefined objects such as people, tables and chairs, this scheme detects predefined dynamic objects in the color image of the new key frame with the CornerNet-Squeeze target detection algorithm from CornerNet-Lite; if a dynamic object is detected, its pixels are removed.
The CornerNet-Squeeze target detection algorithm needs only 34 ms to process a picture (test environment: 1080 Ti GPU + Intel Core i7-7700K CPU), which is faster than algorithms such as YOLOv3.
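As an illustration of step 2.1, the sketch below masks out the pixels of detected predefined dynamic objects. `detect_objects` is a hypothetical stand-in for any detector such as CornerNet-Squeeze (its real API is not reproduced here) and is assumed to return (class name, bounding box) pairs:

```python
import numpy as np

PREDEFINED_DYNAMIC = {"person", "chair", "table"}   # illustrative class list

def remove_predefined_dynamic(color, depth, detect_objects):
    """Return a boolean mask where True marks pixels kept after object removal."""
    keep = np.ones(depth.shape, dtype=bool)
    for cls, (x0, y0, x1, y1) in detect_objects(color):
        if cls in PREDEFINED_DYNAMIC:
            keep[y0:y1, x0:x1] = False              # cull the detected object's pixels
    return keep
```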
Step 2.2, for some undefined dynamic objects such as books, boxes and the like or parts of predefined objects which can not be detected by the target detection algorithm, such as human hands, the scheme detects dynamic pixels on the color picture processed by the previous step by combining the depth image of the new key frame through the following method:
2.2.1 Project the pixels remaining after the target detection algorithm has removed predefined objects into world coordinates to create 3D points.
2.2.2 Divide the 3D points into clusters using a clustering algorithm
The 3D points are divided into a plurality of clusters; the number of clusters k is determined from the number of 3D points s_p: k = s_p / n_pt, where n_pt is the adjustable average number of points per cluster and s_p represents the size of the point cloud. M pixels are uniformly selected from each cluster.
Because the number of pixels is extremely large and clustering must be as fast as possible, this scheme uses the Mini Batch K-Means variant of K-Means. Reducing n_pt guarantees a better approximation but increases the computational burden; this scheme sets n_pt to 6000 to balance computational cost and accuracy.
Because this scheme focuses on removing dynamic pixels and building a static map, without tracking dynamic objects, each cluster is assumed to be a rigid body, meaning that pixels in the same cluster share the same motion attributes; therefore the scheme only needs to detect which clusters are dynamic. To speed up dynamic-cluster detection, M = 100 pixels are uniformly selected from each cluster.
In the following steps, the dynamic/static attributes of the selected pixels are judged; if a cluster contains more dynamic pixels than static ones, it is judged dynamic and removed, otherwise it is judged static and retained.
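A minimal sketch of steps 2.2.1-2.2.2 with scikit-learn's MiniBatchKMeans, using n_pt = 6000 and M = 100 as in the text; `points_w` is assumed to be an (s_p, 3) array of back-projected 3D points, and the uniform selection is approximated by uniform random sampling:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def cluster_and_sample(points_w, n_pt=6000, M=100, seed=0):
    """Cluster 3D points into k = s_p / n_pt clusters and sample M pixels per cluster."""
    s_p = len(points_w)
    k = max(1, s_p // n_pt)                       # k = s_p / n_pt
    labels = MiniBatchKMeans(n_clusters=k, random_state=seed).fit_predict(points_w)
    rng = np.random.default_rng(seed)
    samples = {}
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        samples[c] = rng.choice(idx, size=min(M, len(idx)), replace=False)
    return labels, samples                        # cluster labels and sampled pixel indices
```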
2.2.4 determining whether a pixel is a dynamic pixel
This scheme detects whether a pixel is dynamic by projecting each of the M pixels selected in step 2.2.2 into the N = 6 key frames nearest to the new key frame and comparing the results. The specific steps are as follows:
(1) Using the depth z in the depth image of the new key frame and the robot pose T_wr^n of the new key frame, back-project pixel u to the 3D point p_w in world coordinates:
p_w = T_wr^n · T_rc · (z · K⁻¹ · ũ)
(2) Project the 3D point p_w onto the color image of the jth key frame near the new key frame:
u' = π(T_rc⁻¹ · (T_wr^j)⁻¹ · p_w)
where T_wr^j is the robot pose of the jth key frame near the new key frame.
(3) If the pixel u' of the jth key frame has a valid depth value z' in the corresponding depth image, back-project u' to the 3D point p_w' in world coordinates:
p_w' = T_wr^j · T_rc · (z' · K⁻¹ · ũ')
(4) Determine whether pixel u is dynamic by comparing the distance d between p_w' and p_w with a set threshold d_mth:
Since the depth image and the pose of a key frame both contain errors, and u' may not be exactly the pixel corresponding to u, this scheme searches a square around u' (side length S empirically set to 10 pixels) for the pixel that gives the minimum distance d_min. If d_min is greater than the threshold d_mth (d_mth is set to increase linearly with the depth value z'), the pixel u is preliminarily judged to be dynamic, otherwise it is preliminarily judged to be static. If no valid depth value is found in the square search area, or u' falls outside the image, the pixel u is judged to be invalid.
Considering that the result from a single key frame is not reliable enough and may be invalid, this scheme applies the preliminary judgment procedure (1)-(4) to all key frames near the new key frame (this embodiment selects 6 key frames), and the final attribute of pixel u is determined by voting: assume that among the preliminary judgment results of pixel u over all nearby key frames, the number of static results is N_s, the number of dynamic results is N_d, and the number of invalid results is N_i; then the final attribute of pixel u is as follows:
if N_s ≥ N_d and N_s > 0, pixel u is a static pixel;
if N_d > N_s and N_d > 0, pixel u is a dynamic pixel;
if N_s = N_d = 0, pixel u is invalid.
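Putting steps (1)-(4) and the vote together, a hedged sketch of the per-pixel check; it reuses the projection helpers sketched in step 1, assumes each key frame object carries T_wr, T_rc and a depth array, and uses illustrative coefficients a, b for the depth-dependent threshold d_mth = a + b·z:

```python
import numpy as np

def classify_pixel(u, z, kf_new, neighbors, K, S=10, a=0.04, b=0.01):
    """Vote pixel u static / dynamic / invalid over the N nearest key frames."""
    # Step (1): back-project u into world coordinates with the new key frame's pose.
    p_w = transform(kf_new.T_wr @ kf_new.T_rc, backproject(K, u, z))
    n_s = n_d = 0
    for kf in neighbors:                          # the N = 6 nearest key frames
        # Step (2): project p_w into key frame j.
        T_cw = np.linalg.inv(kf.T_wr @ kf.T_rc)
        uv, _ = project(K, transform(T_cw, p_w))
        d_min, h = np.inf, S // 2
        for du in range(-h, h + 1):               # square search around u'
            for dv in range(-h, h + 1):
                x, y = int(uv[0]) + du, int(uv[1]) + dv
                if not (0 <= y < kf.depth.shape[0] and 0 <= x < kf.depth.shape[1]):
                    continue
                z2 = kf.depth[y, x]
                if z2 <= 0:
                    continue                      # no valid depth at this candidate
                # Step (3): back-project the candidate pixel of key frame j.
                p_w2 = transform(kf.T_wr @ kf.T_rc, backproject(K, (x, y), z2))
                d_min = min(d_min, np.linalg.norm(p_w2 - p_w))
        if not np.isfinite(d_min):
            continue                              # invalid result for this key frame
        # Step (4): compare d_min with the depth-dependent threshold d_mth.
        if d_min > a + b * z:
            n_d += 1                              # preliminary: dynamic
        else:
            n_s += 1                              # preliminary: static
    if n_s >= n_d and n_s > 0:
        return "static"
    if n_d > n_s and n_d > 0:
        return "dynamic"
    return "invalid"
```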
2.2.5 Determining whether a cluster is dynamic
This step uses the same voting method as the previous step to determine the cluster attribute. Among the M pixels uniformly selected from one cluster, assume the number of static pixels is N_s', the number of dynamic pixels is N_d', and the number of invalid pixels is N_i'; the final attribute of the cluster is as follows:
if N_s' ≥ N_d', the cluster is a static cluster and is retained;
if N_d' > N_s', the cluster is a dynamic cluster and is removed.
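The cluster-level vote of step 2.2.5 then reduces to counting the per-pixel results over the M sampled pixels (a sketch; `classify` is the per-pixel function sketched above):

```python
def classify_cluster(sampled_pixels, classify):
    """Keep a cluster only if its static votes reach its dynamic votes (N_s' >= N_d')."""
    votes = [classify(px) for px in sampled_pixels]
    n_s, n_d = votes.count("static"), votes.count("dynamic")
    return "static" if n_s >= n_d else "dynamic"  # dynamic clusters are removed
```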
Step 3, sparse mapping
The main purpose of sparse mapping is to receive and process the key frames from which dynamic pixels have been removed: optimize the robot pose of each key frame, add new map points, and maintain the quality and scale of the key-frame set. The specific steps are as follows:
step 3.1, processing newly introduced key frame
Calculate the BoW (bag-of-words) vector of the current key frame and update the map points of the current key frame;
step 3.2, local BA
Sliding-window local BA is applied to optimize the pose of the robot; the optimization framework is shown in Fig. 2, and the object being optimized is the pose of the current frame. The optimization involves:
(1) all the poses of the key frames connected with the current key frame in the sliding window (to balance time and accuracy, this scheme sets the window size n to 7);
(2) the black map points in Fig. 2, created before the key frames in the sliding window;
(3) the white map points in Fig. 2, created after the key frames in the sliding window; these are already fixed and are not treated as variables.
After local BA optimization, the pose of the new key frame is optimized and new map points are created, which will be used to construct the octree map.
Step 3.3, local Key frame screening
In order to control the reconstruction density and the complexity of BA optimization, the step also comprises a process of detecting redundant key frames and removing the redundant key frames; as can be seen from the reprojection, if 90% of the pixels on a key frame can be observed by more than three arbitrary key frames, it is considered as a redundant key frame and deleted.
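A sketch of this redundancy rule, assuming each key frame stores the set of map points it observes (the data layout is an illustrative assumption):

```python
def is_redundant(kf, other_keyframes, ratio=0.9, min_observers=3):
    """Redundant if >= 90% of kf's observed points are seen by >= 3 other key frames."""
    if not kf.points:
        return False
    covered = sum(
        1 for p in kf.points
        if sum(p in o.points for o in other_keyframes) >= min_observers
    )
    return covered / len(kf.points) >= ratio
```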
Step 4, closed loop detection
Closed-loop detection is performed on each new key frame using the bag-of-words model DBoW2. Once a closed loop is detected, pose graph optimization is performed; this process is the same as in the ORB-SLAM2 algorithm, and the specific pose graph optimization belongs to the prior art, so it is not described again.
Step 5, constructing octree map
The map points created and optimized in the previous steps are divided into voxels (small cubes) and stored through an octree structure to construct the octree map; the occupation probability of each voxel is calculated to judge whether it is occupied, and occupied voxels are visualized in the octree map.
As shown in fig. 3, constructing the octree map involves continuously updating whether each voxel is occupied. Unlike map points that are simply 0 (free) or 1 (occupied), the octree map uses a probability to indicate whether a voxel is occupied; this is described below using log-odds values. The specific steps are as follows:
step 5.1, let Z_t denote the observation of voxel n at time t (obtainable by reprojection), and let L(n|Z_1:t) denote the log-odds value of voxel n up to time t; then at time t+1, the log-odds of voxel n is given by:
L(n|Z_1:t+1) = L(n|Z_1:t) + L(n|Z_t+1)    (Formula 8)
If voxel n is observed at time t+1, L(n|Z_t+1) is τ, otherwise it is 0; the increment τ is a predetermined value. The formula indicates that the log-odds of a voxel increases when it is repeatedly observed to be occupied, and decreases otherwise.
Step 5.2, defining p ∈ [0,1] to represent the probability of voxel being occupied, and l ∈ R to represent the probability logarithm value, l can be calculated by the logit transformation:
Figure BDA0002084065270000101
the above equation is inversely transformed into:
Figure BDA0002084065270000102
calculating the occupancy probability p of the voxel by substituting the probability logarithm value obtained in the last step into the formula 10; only if the probability of occupancy p is greater than a predetermined threshold, the voxel n is considered occupied and will be visualized in the octree graph.
In other words, voxels that have been observed as occupied many times are considered stably occupied; in this way the scheme can deal well with the mapping problem in dynamic environments. Under complex conditions, the octree map helps to strengthen the elimination of dynamic pixels and minimize the influence of dynamic objects.
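A minimal sketch of the log-odds bookkeeping of Formulas 8-10; the increment τ and the occupancy threshold are illustrative values, and a real map would store one log-odds value per octree voxel:

```python
import math

TAU = 0.85            # increment per positive observation (illustrative)
P_OCC = 0.7           # occupancy probability threshold (illustrative)

def update_log_odds(l, observed, tau=TAU):
    """Formula 8: add tau when the voxel is observed occupied, otherwise add 0."""
    return l + (tau if observed else 0.0)

def logit(p):
    """Formula 9: l = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def occupancy(l):
    """Formula 10: p = 1 / (1 + exp(-l))."""
    return 1.0 / (1.0 + math.exp(-l))

def is_occupied(l, threshold=P_OCC):
    """Visualize a voxel only when its occupancy probability exceeds the threshold."""
    return occupancy(l) > threshold
```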

Claims (7)

1. A dynamic scene mapping and positioning method based on ORB-SLAM2 is characterized by comprising the following steps:
step 1, local map tracking
Initializing the pose of a camera by using image information captured by the camera carried by the robot, taking the first frame image captured by the camera as a key frame during initialization; after the initial pose is obtained, tracking the local map so as to optimize the pose of the camera and generate new key frames;
step 2, dynamic pixel elimination
Detecting a predefined dynamic object in the color image of the new key frame using a target detection algorithm, and then identifying dynamic pixels in combination with the depth image of the new key frame; dynamic pixels detected by the two methods are removed;
step 3, sparse mapping
For the key frames with the dynamic pixels removed, the robot poses of the key frames are optimized, new map points are added, and the quality and scale of a key frame set are maintained;
step 4, closed loop detection
Performing closed loop detection on each new key frame, and once closed loops are detected, performing pose graph optimization;
step 5, constructing an octree map
Dividing map points into voxels by using an octree and storing the voxels through an octree structure to construct an octree map; judging whether each voxel is occupied by calculating its occupation probability, and if occupied, visualizing it in the octree map.
2. The ORB-SLAM2 based dynamic scene mapping and localization method of claim 1, wherein tracking the local map to optimize camera pose and generate new keyframes comprises:
the local map refers to the 3D points observed by key frames within 4 m distance and 1 rad viewing angle of the current frame; more matched 3D points are obtained through re-projection, so that the camera pose is optimized and new key frames are generated with minimum error:
projecting the 3D points on the local map to the current frame to obtain 3D-2D feature matching;
limiting the area for searching 2D matching points in the current frame to reduce mismatching; then comparing each pixel in the current frame with the position obtained by projecting its matched 3D point according to the currently estimated camera pose, using the resulting errors to construct a least-squares problem, and minimizing it to find the best camera pose for positioning;
and judging whether a new key frame needs to be generated or not according to preset conditions.
3. The ORB-SLAM2 based dynamic scene mapping and localization method of claim 1, wherein the step 2 of identifying dynamic pixels in combination with the depth image of the new key frame comprises:
projecting the pixels remaining after the target detection algorithm has removed predefined objects into world coordinates to create 3D points; dividing the 3D points into a plurality of clusters, and uniformly selecting M pixels from each cluster; each selected pixel is projected into the N key frames nearest to the new key frame and compared, to detect whether it is a dynamic pixel:
back-projecting pixel u to the 3D point p_w in world coordinates, using the depth in the depth image of the key frame and the robot pose of the new key frame;
projecting the 3D point p_w onto the color image of the jth key frame near the new key frame;
if the pixel u' of the jth key frame has a valid depth value z' in the corresponding depth image, back-projecting u' to the 3D point p_w' in world coordinates;
comparing the distance d between p_w' and p_w with the set threshold d_mth to determine whether pixel u is dynamic:
searching the pixels in a square around u' so that d takes its minimum value d_min; if d_min is greater than the threshold d_mth, the pixel u is preliminarily judged to be dynamic, otherwise it is preliminarily judged to be static.
4. The ORB-SLAM 2-based dynamic scene mapping and localization method of claim 3, wherein, assuming that among the preliminary judgment results of pixel u over all nearby key frames the number of static results is N_s, the number of dynamic results is N_d, and the number of invalid results is N_i, the final attribute of pixel u is as follows:
if N_s ≥ N_d and N_s > 0, pixel u is a static pixel;
if N_d > N_s and N_d > 0, pixel u is a dynamic pixel;
if N_s = N_d = 0, pixel u is invalid.
5. The ORB-SLAM 2-based dynamic scene mapping and positioning method of claim 4, wherein the dynamic pixels detected by both methods are removed, and the removal method after identifying dynamic pixels in combination with the depth image of the new key frame comprises:
among the M pixels uniformly selected from one cluster, assuming the number of static pixels is N_s', the number of dynamic pixels is N_d', and the number of invalid pixels is N_i', the final attribute of the cluster is as follows:
if N_s' ≥ N_d', the cluster is a static cluster and is retained;
if N_d' > N_s', the cluster is a dynamic cluster and is removed.
6. The ORB-SLAM 2-based dynamic scene mapping and localization method of claim 1, wherein optimizing keyframe robot pose, adding new map points, maintaining keyframe set quality and scale, comprises:
calculating the bag-of-words (BoW) vector of the current key frame, and updating the map points of the current key frame;
optimizing the pose of the robot by using sliding-window local bundle adjustment (BA), where the object being optimized is the pose of the current frame;
redundant key frames are detected and culled: if 90% of the points observed on a key frame can be observed by at least three other key frames, the key frame is considered redundant and deleted.
7. The ORB-SLAM 2-based dynamic scene mapping and localization method according to claim 1, wherein said determining whether voxels are occupied by calculating their occupancy probability, if occupied then visualizing in an octree map, comprises:
letting Z_t denote the observation of voxel n at time t and L(n|Z_1:t) denote the log-odds value of voxel n up to time t, at time t+1 the log-odds of voxel n is given by:
L(n|Z_1:t+1) = L(n|Z_1:t) + L(n|Z_t+1)    (Formula 8)
if voxel n is observed at time t+1, L(n|Z_t+1) is τ, otherwise it is 0; the increment τ is a predetermined value;
defining p ∈ [0,1] as the probability that a voxel is occupied and l ∈ R as the log-odds value, l is calculated by the logit transform:
l = logit(p) = log(p / (1 − p))    (Formula 9)
the inverse transform of the above equation is:
p = logit⁻¹(l) = exp(l) / (1 + exp(l)) = 1 / (1 + exp(−l))    (Formula 10)
the occupancy probability p of the voxel is calculated by substituting the log-odds value into Formula 10; only if p is greater than a predetermined threshold is the voxel n considered occupied and visualized in the octree map.
CN201910481714.1A 2019-06-04 2019-06-04 ORB-SLAM 2-based dynamic scene mapping and positioning method Active CN110378997B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910481714.1A CN110378997B (en) 2019-06-04 2019-06-04 ORB-SLAM 2-based dynamic scene mapping and positioning method

Publications (2)

Publication Number Publication Date
CN110378997A CN110378997A (en) 2019-10-25
CN110378997B (en) 2023-01-20

Family

ID=68249727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910481714.1A Active CN110378997B (en) 2019-06-04 2019-06-04 ORB-SLAM 2-based dynamic scene mapping and positioning method

Country Status (1)

Country Link
CN (1) CN110378997B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827305B (en) * 2019-10-30 2021-06-08 中山大学 Semantic segmentation and visual SLAM tight coupling method oriented to dynamic environment
CN110930519B (en) * 2019-11-14 2023-06-20 华南智能机器人创新研究院 Semantic ORB-SLAM sensing method and device based on environment understanding
CN111178342B (en) * 2020-04-10 2020-07-07 浙江欣奕华智能科技有限公司 Pose graph optimization method, device, equipment and medium
WO2021207999A1 (en) * 2020-04-16 2021-10-21 华为技术有限公司 Vehicle positioning method and apparatus, and positioning map layer generation method and apparatus
CN111539305B (en) * 2020-04-20 2024-03-12 肇庆小鹏汽车有限公司 Map construction method and system, vehicle and storage medium
CN111709982B (en) * 2020-05-22 2022-08-26 浙江四点灵机器人股份有限公司 Three-dimensional reconstruction method for dynamic environment
CN111914832B (en) * 2020-06-03 2023-06-13 华南理工大学 SLAM method of RGB-D camera under dynamic scene
CN111862162B (en) * 2020-07-31 2021-06-11 湖北亿咖通科技有限公司 Loop detection method and system, readable storage medium and electronic device
CN112884835A (en) * 2020-09-17 2021-06-01 中国人民解放军陆军工程大学 Visual SLAM method for target detection based on deep learning
CN112802053B (en) * 2021-01-27 2023-04-11 广东工业大学 Dynamic object detection method for dense mapping in dynamic environment
CN113547501B (en) * 2021-07-29 2022-10-28 中国科学技术大学 SLAM-based mobile mechanical arm cart task planning and control method
CN114529672A (en) * 2022-02-15 2022-05-24 国网山东省电力公司建设公司 Three-dimensional reconstruction method and system for transformer substation construction site
CN117596367A (en) * 2024-01-19 2024-02-23 安徽协创物联网技术有限公司 Low-power-consumption video monitoring camera and control method thereof

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105856230A (en) * 2016-05-06 2016-08-17 简燕梅 ORB key frame closed-loop detection SLAM method capable of improving consistency of position and pose of robot
CN108596974A (en) * 2018-04-04 2018-09-28 清华大学 Dynamic scene robot localization builds drawing system and method
CN109631855A (en) * 2019-01-25 2019-04-16 西安电子科技大学 High-precision vehicle positioning method based on ORB-SLAM

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ORB_SLAM2 optimized by semantic segmentation in dynamic scenes (一种动态场景下语义分割优化的ORB_SLAM2); Wang Zhaodong et al.; Journal of Dalian Maritime University (大连海事大学学报); 2018-12-21 (No. 04); full text *

Also Published As

Publication number Publication date
CN110378997A (en) 2019-10-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant