CN110378997A

CN110378997A - A kind of dynamic scene based on ORB-SLAM2 builds figure and localization method

Info

Publication number: CN110378997A
Application number: CN201910481714.1A
Authority: CN
Inventors: 龙土志; 蔡述庭; 李丹; 杨家兵; 董海涛
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2019-06-04
Filing date: 2019-06-04
Publication date: 2019-10-25
Anticipated expiration: 2039-06-04
Also published as: CN110378997B

Abstract

The invention discloses a kind of dynamic scenes based on ORB-SLAM2 to build figure and localization method, including local map tracking process, dynamic pixel reject process, sparse mapping process, closed loop detection process and building Octree map process；This method has the function of dynamic pixel rejecting, combines the depth image of new key frame to be quickly detected mobile object in the image information of camera by object detection method and constructs a clean static background Octree map in complicated dynamic environment.

Description

A kind of dynamic scene based on ORB-SLAM2 builds figure and localization method

Technical field

It is synchronized the present invention relates to robot and builds figure and positioning slam technical field, and in particular to one kind is based on orb-slam 2 Dynamic scene build figure and localization method.

Background technique

SLAM (while positioning and map reconstruction) is always computer vision and the hot topic of robot field, while The concern of many high-tech companies is attracted.SLAM technology is a map to be established in unknown environment and can be on ground Positioning in real time in figure.The frame of modern times visualization SLAM system is highly developed, such as ORB-SLAM2, LSD-SLAM；It is most advanced Visual synchronization positioning and mapping (V-SLAM) system there is high-precision positioning function, but these most of system postulations operation Environment is static, to limit their application.

It is directed in dynamic scene and establishes static map, existing algorithm has its deficiency, such as DynaSLAM cannot In real time, DynaSLAM only rejects the pixel of predefined object, and for undefined object or only predefined object A part of then can not reject, StaticFusion system can not reject people static for a long time, and algorithm above all cannot be Complicated dynamic environment establishes a clean static map in real time.

Summary of the invention

For above-mentioned problems of the prior art, it is dynamic based on ORB-SLAM2 that the object of the present invention is to provide a kind of State scene builds figure and localization method, and this method has the function of dynamic pixel rejecting, newly crucial by object detection method set The depth image of frame is quickly detected mobile object in the image information of camera and constructs one in complicated dynamic environment Clean static background Octree map.

In order to realize above-mentioned task, the invention adopts the following technical scheme:

A kind of dynamic scene based on ORB-SLAM2 builds figure and localization method, comprising the following steps:

Step 1, local map tracks

Camera pose is initialized using camera captured image information entrained by robot；By phase when initialization The first frame image of machine capture is as key frame；After obtaining initial pose, local map is tracked, to optimize phase seat in the plane Appearance and the new key frame of production；

Step 2, dynamic pixel is rejected

Predefined dynamic object is detected in the color image of new key frame using algorithm of target detection, is closed then in conjunction with new The depth image of key frame identifies dynamic pixel；The dynamic pixel successively detected by both methods is all removed；

Step 3, sparse mapping

For eliminating the key frame of dynamic pixel, optimizes the robot pose of key frame, increase new point map, safeguard The quality and scale of key frame set；

Step 4, closed loop detects

Closed loop detection is carried out to each new key frame, once detecting closed loop, carries out the optimization of pose figure；

Step 5, Octree map is constructed

Point map is divided into voxel using Octree, and these voxels are stored by octree structure, with eight fork of building Set map；Judge whether voxel is occupied by calculating the occupation probability of voxel, is carried out visually if occupying in Octree figure Change.

Further, described that local map is tracked, thus optimize camera pose and produce new key frame, Include:

The local map refers to the point of 3D observed by the key frame of distance and visual angle close to present frame；Pass through weight Projection obtains more matched 3D points, so that minimal error optimizes camera pose and generates new key frame:

3D point in local map is projected on present frame, 3D-2D characteristic matching is obtained；

The region of the search 2D match point of present frame is limited in reduce error hiding, then by the pixel and 3D in present frame Point constructs Least Square Solution according to the error that the position that the camera pose currently estimated is projected compares, It is minimized, best camera pose is then looked for, to be positioned；

It is determined whether to generate new key frame according to preset condition.

Further, the depth image of the new key frame of combination described in step 2 identifies dynamic pixel, comprising:

Get off to create 3D to world coordinates the predefined remaining pixel projection of object is rejected by algorithm of target detection Point；3D point is divided into multiple clusters, M pixel is selected uniformly at from each cluster；For each pixel, by pixel projection to from It is for no dynamic pixel that the new nearest N frame key frame of key frame, which is compared to detection pixel:

Using the robot pose of depth and new key frame in the depth image of key frame by pixel u back projection to the world 3D point p under coordinate^w；

By 3D point p^wIt projects on the color image of j-th of key frame near new key frame；

If there are effective depth value z ', pixel u ' on corresponding depth image by the pixel u ' of j-th of key frame 3D point p under back projection to world coordinates^w′；

By by p^w′And p^wThe distance between d and setting threshold value d_mthCompare to judge whether pixel u is dynamic:

The pixel in the square around u ' is searched for, so that d is minimized d_min；If d_minGreater than threshold value d_mth, then tentatively Judge that pixel u is judged as static state, otherwise tentatively judges that it is dynamic.

Further, it is assumed that in all preliminary judging results of key frame nearby, the quantity of static result is pixel u N_S, the quantity of dynamic result is N_d, the quantity of null result is N_i, the final attribute of pixel u is as follows:

If (N_S≥N_d,N_S> 0), then pixel u is static pixels；

If (N_d≥N_s,N_d> 0), then pixel u is dynamic pixel；

If (N_S=N_d=0), then pixel u is invalid.

Further, the successive dynamic pixel detected by both methods is all removed, wherein combining new The depth image of key frame is come the method after identifying dynamic pixel, rejected are as follows:

In a cluster in M pixel of uniform design, it is assumed that static pixels number is N_s', dynamic pixel number is N_d', in vain Pixel number is N_i', the final attribute of cluster is as follows:

If (N_S'≥N_d'), then the cluster is static cluster, retains the cluster；

If (N_d'≥N_s'), then the cluster is Dynamic Cluster, is rejected.

Further, the robot pose of the optimization key frame increases new point map, maintenance key frame set Quality and scale, comprising:

The Bow vector for calculating current key frame, updates the point map of current key frame；

The optimization to robot pose is carried out using sliding window part BA, optimization object is the pose in present frame；

Detection redundancy key frames are simultaneously rejected, if 90% pixel can be exceeded three any key frames sights on key frame It observes, is then considered as redundancy key frames, and be deleted.

Further, the occupation probability by calculating voxel judges whether voxel is occupied, if occupying Octree figure is visualized, comprising:

If Z_tIndicate voxel n time t observation as a result, probability logarithm of the voxel n since the time is t be L (n | Z_1:t), then when the time is t+1, the probability logarithm of voxel n is obtained by following formula:

L(n|Z_1:t+1)=L (n | Z_1:t-1)+L(n|Z_1:t) formula 8

If voxel n is observed in time t, and L (n | Z_t) it is τ, it is otherwise 0；Increment τ is predetermined value；

Defining the general and l ∈ R that p ∈ [0,1] indicates that voxel is occupied indicates probability logarithm, and l is converted by logit and counted It obtains:

Above equation inverse transformation are as follows:

The occupation probability p of voxel is calculated by the way that probability logarithm is substituted into formula 10；Only when acquistion probability p is greater than predetermined threshold When value, just think that voxel n is occupied and will visualize in Octree figure.

The present invention has following technical characterstic:

1. rapidity

The algorithm is detected dynamic object using CornerNet-Squeeze algorithm of target detection and uses k-means Variant Mini Batch K-Means clustering algorithm clusters the depth information of image, and the algorithm than there is now is fast.Because CornerNet-Squeeze algorithm of target detection, which handles a picture, only needs 34ms, will be fast than YOLOv3 scheduling algorithm, and test is hard Part environment are as follows: 1080ti GPU+Intel Core i7-7700k CPU.It is greater than 10,000 clustering method, k- for big data Fast three times of k-means itself of Batch K-Means ratio of means variant Mini, performance is also not much different.

In addition, the foundation of Octree map also shortens the renewal time of map.

2. can establish very clean static map in complex environment

The algorithm combining target detection method and the probability logarithm of Octree map update voxel mode to detect simultaneously Reject dynamic pixel.

Detailed description of the invention

Fig. 1 is the flow chart of the method for the present invention；

Fig. 2 is part BA optimization process figure；

Fig. 3 is Octree cartographic model.

Specific embodiment

ORB-SLAM2 is to be based on monocular, the SLAM scheme of the complete set of binocular and RGB-D camera.It can be realized ground Figure reuses, the function of winding detection and repositioning.But it is assumed that operating environment be it is static, limit its application.

Inventive algorithm is proposed on the basis of ORB-SLAM2 algorithm, can be real-time, quickly in dynamic environment It is middle to establish a clean static Octree map, mainly it is made of five steps: local map tracks, dynamic pixel is rejected, Sparse mapping, closed loop detection and creation Octree map, overall flow figure are as shown in Figure 1.Particular content is as follows:

Step 1, local map tracks

Camera pose is initialized using camera captured image information entrained by robot；By phase when initialization The first frame image of machine capture is as key frame；After obtaining initial pose, local map is tracked, to optimize phase seat in the plane Appearance and the new key frame of production；Specific step is as follows:

Step 1.1, the original RGB image information (including color image and depth image) captured by Kinect2 is mentioned Take ORB characteristic point and matched, then pass sequentially through motor pattern, critical mode and reset bit pattern these three modes come with Track and initialization camera pose, that is, position；When initialization, first frame is set as key frame, this step 1.1 and ORB-SLAM2 mono- Sample.

Step 1.2, after obtaining initial pose, local map is tracked, local map refer to distance and visual angle ( Distance is set as 4m, and angle is set as within the scope of 1rad) close to the key frame of present frame (photo of camera current shooting), (part is crucial Frame) observed by 3D point；More matched 3D points are obtained by re-projection, so that minimal error optimizes camera pose:

(1) it defines:

From camera coordinates system c to the transformation matrix of robot coordinate system r:

Transformation matrix (i.e. the pose of robot) from robot coordinate system r to world coordinate system w:

From the corresponding 3D point P of a frame picture C_cProject to the projection relation of the 2D point u on this picture:

Combine its corresponding depth information z back projection to the corresponding 3D point P of this picture from the 2D point u on a frame picture C_cInstead Projection relation:

(2) re-projection obtains characteristic matching:

If robot pose (i.e. present frame pose) isBy the 3D point in local mapIt projects on present frame, then The characteristic matching of 3D-2D can be obtained:

(3) optimize camera pose:

Under dynamic scene, characteristic matching can have a large amount of error hiding, and in order to solve this problem, the present invention is using limit The region (circle that radius is set as 3 pixels) in the search 2D match point (i.e. pixel) of present frame has been made to reduce error hiding. Then by the pixel u in present frame_iThe position u projected with 3D point according to the camera pose currently estimated_i' (formula 3) phase The error construction Least Square Solution compared, minimizes it, best camera pose is then looked for, to be determined Position:

(4) it is finally determined whether to generate new key frame according to preset condition；This preset condition and orb-slam2 algorithm Equally.

Step 2, dynamic pixel is rejected

Static map is constructed in dynamic scene, the identification and deletion of dynamic pixel are most criticals.Because only that crucial Frame is for constructing Octree map, so only reject to the key frame (new key frame) that previous step is newly selected herein dynamic State pixel.

The step uses object detection method to detect predefined dynamic object in the color image of new key frame first, so Dynamic pixel is identified in conjunction with the depth image of new key frame afterwards；The dynamic pixel successively detected by both methods all by It rejects.Steps are as follows:

Step 2.1, for example firstly for predefined object: people, desk, chair etc., this programme can pass through CornerNet-Squeeze algorithm of target detection in CornerNet-Lite is come predetermined in the color image to new key frame Adopted dynamic object is detected, if detecting dynamic object, the pixel of dynamic object is rejected.

CornerNet-Squeeze algorithm of target detection, which handles a picture, only needs 34ms, will than YOLOv3 scheduling algorithm Fastly, hardware environment is tested are as follows: 1080ti GPU+Intel Core i7-7700k CPU.

Step 2.2, for there are the one of the either predefined object such as some undefined dynamic objects such as books, chest Part fails to be detected to obtain such as the hand of people by algorithm of target detection, and this programme is passing through previous step by the following method Depth image on processed color image in conjunction with new key frame carries out detection dynamic pixel:

2.2.1 get off wound the predefined remaining pixel projection of object is rejected by algorithm of target detection to world coordinates Build 3D point.

2.2.2 3D point is divided into several clusters using clustering algorithm

3D point is divided into multiple clusters；The quantity k of cluster is according to the quantity s of 3D point_pIt determines: k=s_p/n_pt, n_ptIt adjusts The averagely counting of cluster, s_pThe size for indicating point cloud, is selected uniformly at M pixel from each cluster.

Since the high number and cluster speed needs of pixel are as fast as possible, the change of K-means is used in this programme Body clustering method Mini Batch K-Means method.Wherein, n is reduced_ptIt can guarantee better approximation, but also will increase calculating Burden, this programme is by n_pt6000 are set as with EQUILIBRIUM CALCULATION FOR PROCESS consumption and precision.

Because this programme, which is absorbed in, removes dynamic pixel and building static map without tracking dynamic object, it is assumed that poly- Class is rigid body, it means that the pixel movement properties having the same in same cluster；Therefore, this programme only needs which is detected Cluster is Dynamic Cluster；In order to accelerate dynamic clustering detection process, this programme equably selects wherein M=100 for each cluster Pixel.

In the step of below, the dynamic selected and static attribute are judged；If dynamic pixel is more than static pixels, it is determined that The cluster is otherwise to be determined as then being retained for static state dynamically to be rejected.

2.2.4 judge whether pixel is dynamic pixel

This programme passes through M pixel projection selecting step 2.2.2 to the N=6 frame key frame nearest from new key frame (near new key frame) is compared to whether detection pixel is for dynamic pixel.Specific step is as follows:

(1) using the robot pose of depth z and new key frame n in the depth image of key frameIt throws pixel u is counter 3D point p under shadow to world coordinates^w:

(2) by 3D point p^wIt projects on the color image of j-th of key frame near new key frame:

WhereinIt is the robot pose of j-th of key frame near key frame.

(3) if there are effective depth value z ', pixels on corresponding depth image by the pixel u ' of j-th of key frame 3D point p under u ' back projection to world coordinates^w′:

(4) by by p^w′And p^wThe distance between d and setting threshold value d_mthCompare to judge whether pixel u is dynamic:

Because the depth image and posture of key frame have error, u ' may not be pixel corresponding with u, so this programme By the pixel in the square (rule of thumb setting 10 pixels for square side length S) around search u ', so that d takes most Small value d_min；If d_minGreater than threshold value d_mth(threshold value d_mthIt is set as and depth value z ' linear increase), then tentatively judge that pixel u sentences Break to be static, otherwise tentatively judges that it is dynamic；Other situations are to can not find effective depth in moving-square search region Value or u exceed the range of image, and in this case, pixel u is judged as in vain.

Less reliable in view of the result of a key frame and there may be null result, this programme is by above-mentioned preliminary judgement Process (1)-(4) are applied to all key frames (the present embodiment selects 6 frame key frames) nearby of new key frame, finally, pixel u is most Whole situation is determined by voting: assuming that pixel u is in all preliminary judging results of key frame nearby, the quantity of static result is N_S, the quantity of dynamic result is N_d, the quantity of null result is N_i, the final attribute of pixel u is as follows:

If (N_S≥N_d,N_S> 0), then pixel u is static pixels；

If (N_d≥N_s,N_d> 0), then pixel u is dynamic pixel；

If (N_S=N_d=0), then pixel u is invalid.

2.2.5 judge whether a cluster is dynamic

The step uses the voting method of previous step also to determine the attribute of cluster；The M picture of uniform design in a cluster In element, it is assumed that static pixels number is N_s', dynamic pixel number is N_d', inactive pixels number is N_i', the final attribute of cluster is as follows:

If (N_S'≥N_d'), then the cluster is static cluster, retains the cluster；

If (N_d'≥N_s'), then the cluster is Dynamic Cluster, is rejected.

Step 3, sparse mapping

Sparse mapping main purpose is the key frame for receiving processing by rejecting dynamic pixel, optimizes the machine of this key frame People's pose increases new point map, safeguards the quality and scale of key frame set.Specific step is as follows:

Step 3.1, the key frame newly introduced is handled

Step 3.2, local BA

The optimization to robot pose is carried out using sliding window part BA, Optimization Framework is as shown in Fig. 2, optimization object is Pose in present frame participates in having for optimization:

(1) pose in all key frames being connected in sliding window with current key frame；For equilibration time and essence N in sliding window is set as 7 by degree, this programme；

(2) the two black point maps created before the key frame in sliding window；

(3) the white point map of two created after the key frame in sliding window, the two white point maps are not made It has been fixed for variable.After local BA optimization, the pose of new key frame is optimized, and creates new point map, The new key frame will be used to construct Octree map.

Step 3.3, local key frame screening

In order to control the complexity for rebuilding density and BA optimization, which further includes the mistake for detecting redundancy key frames and rejecting Journey；By re-projection it is found that being recognized if 90% pixel can be exceeded three any key frames and observe on key frame To be redundancy key frames, and it is deleted.

Step 4, closed loop detects

Closed loop detection is carried out to each new key frame using bag of words Dbow2.Once detecting closed loop, position will be carried out The optimization of appearance figure；As ORB-SLAM2 algorithm, the specific pose figure optimization process category prior art repeats no more this process.

Step 5, Octree map is constructed

Point map using Octree by being created above and by optimization is divided into voxel (or small cube), and passes through eight forks These voxels of storage of data structure, to construct Octree map；Judge whether voxel is occupied by calculating the occupation probability of voxel According to being visualized in Octree figure as occupied.

As shown in figure 3, building Octree map is constantly to go to update whether these voxels are occupied；Octree map makes Indicate whether voxel is occupied with the form of probability, only using rather than point map 0 indicates blank, and 1 indicates to be occupied；It uses below Probability logarithm method describes.Specific step is as follows:

Step 5.1, if Z_tIndicate observation result (by re-projection it can be concluded that result) of the voxel n in time t, voxel n Probability logarithm since the time is t be L (n | Z_1:t), then when the time is t+1, the probability logarithm of voxel n is by following Formula can obtain:

L(n|Z_1:t+1)=L (n | Z_1:t-1)+L(n|Z_1:t) formula 8

If voxel n is observed in time t, and L (n | Z_t) it is τ, it is otherwise 0；Increment τ is predetermined value.The formula table Show that the probability logarithm of the voxel when repeated observation is occupied to voxel will increase, otherwise reduces.

Step 5.2, defining the general and l ∈ R that p ∈ [0,1] indicates that voxel is occupied indicates probability logarithm, and l can pass through Logit transformation calculations obtain:

Above equation inverse transformation are as follows:

Formula 10 is substituted by probability logarithm obtained in the previous step to calculate the occupation probability p of voxel；Only as acquistion probability p When greater than predetermined threshold, just think that voxel n is occupied and will visualize in Octree figure.

In other words, being observed the voxel repeatedly occupied is considered as the stable voxel that occupies, by this method, This programme can handle the map structuring problem in dynamic environment well.In complex situations, Octree map helps reinforcement Rejecting to dynamic pixel minimizes the influence of dynamic object.

Claims

1. a kind of dynamic scene based on ORB-SLAM2 builds figure and localization method, which comprises the following steps:

Step 1, local map tracks

Camera pose is initialized using camera captured image information entrained by robot；Camera is caught when initialization The first frame image obtained is as key frame；After obtaining initial pose, local map is tracked, thus optimize camera pose with And the key frame that production is new；

Step 2, dynamic pixel is rejected

Predefined dynamic object is detected in the color image of new key frame using algorithm of target detection, then in conjunction with new key frame Depth image identify dynamic pixel；The dynamic pixel successively detected by both methods is all removed；

Step 3, sparse mapping

For eliminating the key frame of dynamic pixel, optimizes the robot pose of key frame, increase new point map, maintenance is crucial The quality and scale of frame set；

Step 4, closed loop detects

Step 5, Octree map is constructed

Point map is divided into voxel using Octree, and these voxels are stored by octree structure, with constructing Octree Figure；Judge whether voxel is occupied by calculating the occupation probability of voxel, as occupied, is visualized in Octree figure.

2. the dynamic scene based on ORB-SLAM2 builds figure and localization method as described in claim 1, which is characterized in that described Local map is tracked, to optimize camera pose and the new key frame of production, comprising:

The local map refers to the point of 3D observed by the key frame of distance and visual angle close to present frame；Pass through re-projection More matched 3D points are obtained, so that minimal error optimizes camera pose and generates new key frame:

The region of the search 2D match point of present frame is limited in reduce error hiding, then presses the pixel in present frame with 3D Least Square Solution is constructed according to the error that the position that the camera pose currently estimated is projected compares, makes it It minimizes, best camera pose is then looked for, to be positioned；

3. the dynamic scene based on ORB-SLAM2 builds figure and localization method as described in claim 1, which is characterized in that step 2 The depth image of the new key frame of the combination identifies dynamic pixel, comprising:

Get off to create 3D point to world coordinates the predefined remaining pixel projection of object is rejected by algorithm of target detection；It will 3D point is divided into multiple clusters, and M pixel is selected uniformly at from each cluster；For each pixel, pixel projection is closed to from new It is for no dynamic pixel that the nearest N frame key frame of key frame, which is compared to detection pixel:

Using the robot pose of depth and new key frame in the depth image of key frame by pixel u back projection to world coordinates Under 3D point p^w；

If there are effective depth value z ', pixel u ' instead to throw on corresponding depth image by the pixel u ' of j-th of key frame 3D point p under shadow to world coordinates^w′；

The pixel in the square around u ' is searched for, so that d is minimized d_min；If d_minGreater than threshold value d_mth, then preliminary judgement Pixel u is judged as static, otherwise tentatively judges that it is dynamic.

4. the dynamic scene based on ORB-SLAM2 builds figure and localization method as claimed in claim 3, which is characterized in that assuming that For pixel u in all preliminary judging results of key frame nearby, the quantity of static result is N_S, the quantity of dynamic result is N_d, nothing The quantity for imitating result is N_i, the final attribute of pixel u is as follows:

If (N_S≥N_d,N_S> 0), then pixel u is static pixels；

If (N_d≥N_s,N_d> 0), then pixel u is dynamic pixel；

If (N_S=N_d=0), then pixel u is invalid.

5. the dynamic scene based on ORB-SLAM2 builds figure and localization method as claimed in claim 4, which is characterized in that described The successive dynamic pixel detected by both methods be all removed, wherein the depth image in conjunction with new key frame identifies After dynamic pixel, the method rejected are as follows:

In a cluster in M pixel of uniform design, it is assumed that static pixels number is N_s', dynamic pixel number is N_d', inactive pixels Number is N_i', the final attribute of cluster is as follows:

If (N_S'≥N_d'), then the cluster is static cluster, retains the cluster；

If (N_d'≥N_s'), then the cluster is Dynamic Cluster, is rejected.

6. the dynamic scene based on ORB-SLAM2 builds figure and localization method as described in claim 1, which is characterized in that described Optimization key frame robot pose, increase new point map, safeguard the quality and scale of key frame set, comprising:

Detection redundancy key frames are simultaneously rejected, if 90% pixel can be exceeded three any key frame observations on key frame It arrives, is then considered as redundancy key frames, and be deleted.

7. the dynamic scene based on ORB-SLAM2 builds figure and localization method as described in claim 1, which is characterized in that described The occupation probability by calculating voxel judge whether voxel is occupied, as occupied, visualized, wrapped in Octree figure It includes:

If Z_tIndicate voxel n time t observation as a result, probability logarithm of the voxel n since the time is t be L (n | Z_1:t), so Afterwards when the time is t+1, the probability logarithm of voxel n is obtained by following formula:

L(n|Z_1:t+1)=L (n | Z_1:t-1)+L(n|Z_1:t) formula 8

Defining the general and l ∈ R that p ∈ [0,1] indicates that voxel is occupied indicates probability logarithm, and l is obtained by logit transformation calculations It arrives:

Above equation inverse transformation are as follows:

The occupation probability p of voxel is calculated by the way that probability logarithm is substituted into formula 10；Only when acquistion probability p is greater than predetermined threshold When, just think that voxel n is occupied and will visualize in Octree figure.