CN115143958A - Multi-sensor fusion SLAM method based on GPU acceleration - Google Patents

Multi-sensor fusion SLAM method based on GPU acceleration

Info

Publication number
CN115143958A
Authority
CN
China
Prior art keywords
gpu
point cloud
cpu
factor
laser radar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210459576.9A
Other languages
Chinese (zh)
Inventor
徐锋
明吉花
张文凯
崔云轩
郭中远
黄军杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology
Priority to CN202210459576.9A
Publication of CN115143958A
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G01C21/1652: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments with ranging devices, e.g. LIDAR or RADAR
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20: Instruments for performing navigational calculations

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The invention develops a GPU-acceleration-based multi-sensor fusion SLAM method comprising four modules: a GPU-accelerated laser radar/IMU tightly coupled front-end odometry, factor-graph-based back-end optimization, point cloud map creation, and GPU-accelerated visual loop closure detection. The front-end odometry is built on a voxel-based, GPU-accelerated point cloud registration method; the back-end factors comprise an IMU pre-integration factor, a laser radar matching cost factor and a visual loop closure factor; point cloud map creation uses a key frame mechanism to stitch a globally consistent point cloud map; in the loop closure detection module, the camera scale is corrected by the laser radar, the feature detection part is accelerated on the GPU, and a loop detected by vision is further confirmed by a laser radar key frame. The invention effectively combines the advantages of a high-frequency, tightly coupled laser radar/IMU odometry with simple, fast and accurate visual loop detection, avoids a huge computational load on the CPU, and improves the real-time performance of the system.

Description

Multi-sensor fusion SLAM method based on GPU acceleration
Technical Field
The invention relates to the technical field of simultaneous localization and mapping (SLAM) for robots, and in particular to a GPU-acceleration-based multi-sensor fusion SLAM method.
Background
Simultaneous Localization and Mapping (SLAM) is a key technology for autonomous robot navigation. In recent years, multi-sensor fusion SLAM methods have attracted continuous attention and research and are widely applied. Although multi-sensor fusion improves the stability, accuracy and robustness of SLAM, it also demands more computing power as the amount of data grows, yet in practice there is a gap between the sensing capability and the actual information-processing capability of the system. Meanwhile, the continued development of graphics and image processing has placed ever higher demands on computation speed, and the graphics processing unit (GPU), with its powerful parallel computing capability, has attracted attention in many technical fields. The conflict between the real-time requirements of SLAM and its huge data volume urgently needs to be resolved, and the development of the GPU provides a natural fit for doing so.
SLAM depends heavily on accurate and efficient state estimation, so front-end data matching is critical: if this part can be completed accurately and efficiently, it provides an important guarantee for map accuracy and overall real-time performance. Besides the front end, back-end pose optimization and loop closure detection also play very important roles, especially in scenes with few or highly repetitive features, where the completeness and richness of the information captured by the sensors matter greatly; the accuracy and efficiency of loop closure detection are therefore worth improving further, so that loop information can be captured in time and the state estimate updated accurately, which is of great significance for SLAM. Existing methods can only accelerate one of these aspects rather than both at once, and their fusion schemes have shortcomings; fusion based on factor graph optimization yields a more accurate robot state and is the current research trend in multi-sensor fusion. Therefore, how to accelerate a factor-graph-based, multi-sensor fusion SLAM system so as to meet higher-level application requirements is a direction worth studying.
Disclosure of Invention
In order to perform SLAM accurately and in real time, the invention provides a GPU-acceleration-based multi-sensor fusion SLAM method, characterized by comprising a GPU-accelerated laser radar/IMU tightly coupled front-end odometry module, a factor-graph-based back-end optimization module, a point cloud map creation module and a GPU-accelerated visual loop closure detection module:
The GPU-accelerated laser radar/IMU tightly coupled front-end odometry module performs GPU-accelerated point cloud matching in the following steps:
Step 1: Down-sampling: the host side (CPU) reads the laser radar data, performs a voxel grid down-sampling operation on the radar point cloud, divides all points into different voxels, and removes points that are far from the grid or obviously erroneous;
Step 2: Motion compensation: the host side (CPU) uses IMU motion prediction to compensate for the point cloud distortion produced by motion within a radar frame, projecting every point back to the start time of the laser radar frame;
Step 3: Voxelize the target points: the host side (CPU) sets the previous frame of point cloud data as the target point cloud and stores it in a container, searches with a KD-tree for the neighboring points within radius r of each target point and stores them in the container, and copies the target point cloud from the host side (CPU) to the device side (GPU); each target point corresponds to a voxel index, and the mean and covariance are initialized (this step is skipped when not initializing);
Step 4: Voxelize the source points: the host side (CPU) sets the current frame of point cloud data as the source point cloud and stores it in a container, searches with a KD-tree for the points within radius r of the target points and stores them in the container, and copies the source point cloud from the host side (CPU) to the device side (GPU); each source point corresponds to a voxel index, and the mean and covariance are initialized;
Step 5: Compute the mean and covariance: the host side (CPU) designs a kernel function according to the device conditions and the number of points, sets the kernel width, and sets the number of threads equal to the number of voxels; the device side (GPU) is driven to model the point clouds as Gaussian distributions according to the voxel indices and to compute the mean and covariance of the source and target point clouds;
Step 6: Compute the relative transformation: the device side (GPU) computes the correspondences between voxels, uses maximum likelihood estimation to obtain the maximum a posteriori estimate, performs iterative optimization with the Gauss-Newton method, weights the error terms by the number of points in each voxel, and obtains the information matrix from the voxel covariance matrices; the iterations continue until the target cost function converges, yielding the corresponding transformation matrix T, which is transferred back to the host side (CPU);
Step 7: Update the point cloud: set the current frame point cloud as the new target point cloud and wait for the next computation.
The factor-graph-based back-end optimization module constructs a sliding window comprising an IMU pre-integration factor, a laser radar matching cost factor and a visual loop closure factor. When a new sensor state is added, a fixed-lag smoothing method is used to optimize the factor graph, the updated pose state is fed back to the GPU-accelerated laser radar/IMU tightly coupled front-end odometry module for correction, and the marginalized key frames are used to construct the point cloud map.
The GPU-accelerated visual loop closure detection module first extracts FAST corner features with GPU acceleration, and then performs dictionary query, similarity scoring, group matching, temporal consistency checking and structural consistency checking. When a loop is confirmed, the loop closure factor is added to the factor-graph-based back-end optimization module to execute a global optimization.
Optionally, in the factor-graph-based back-end optimization module, the number of sensor state nodes in the constructed sliding window factor graph is fixed to 20. The constraints between the optimization factors are: the constraint between a marginalized key frame and the latest frame is a unary edge, while the other factors, namely the IMU pre-integration factor, the laser radar matching cost factor and the visual loop closure factor, are binary edges.
Optionally, the GPU-accelerated visual loop closure detection module extracts FAST corner features with GPU acceleration in the following steps:
Step a: Acquire an image pyramid: the host side (CPU) sub-samples the camera input image to obtain an image pyramid;
Step b: Divide into rectangular units: the host side (CPU) divides each resolution level of the image into rectangular units of fixed width and height and copies them to the device side (GPU), with each rectangular block corresponding to one thread block;
Step c: Determine the region of interest: the device side (GPU) computes FAST corners within a warp, comparing the brightness of each selected pixel against enough pixels in its surrounding neighborhood to determine the region of interest (ROI);
Step d: Corner response function evaluation: for each pixel of the region of interest (ROI) in each rectangular block, the device side (GPU) performs a coarse corner response function (CCRF) evaluation followed by a corner response function (CRF) evaluation;
Step e: Non-maximum suppression: the device side (GPU) divides the corner response image into a regular grid of cells, assigns one thread block per cell, uses several warps to process each cell, and performs non-maximum suppression;
Step f: The CPU reads the FAST corners: all warps in a thread block write their maximum results (CRF score and pixel position) to designated shared memory, and the first thread in the block selects the maximum result for each cell and writes it to global memory; FAST corner detection for the whole image is then complete, and the results are transferred to the host side (CPU).
Optionally, in step c of the GPU-accelerated FAST corner feature extraction within the GPU-accelerated visual loop closure detection module, the FAST corner is computed within a warp on the device side (GPU): the 16 pixels on the circle of radius 3 around the candidate pixel are selected, and the results of comparing against these 16 pixels are packed into a bit array used as the index of a lookup table. All 2^16 = 65536 possible outcomes are computed in advance; each outcome is binary, so the results are stored as 65536 bits (8 KB), using 4-byte integers that each hold 32 entries.
Optionally, in step e of the GPU-accelerated FAST corner feature extraction within the GPU-accelerated visual loop closure detection module, several warps are used for non-maximum suppression, in the following steps:
Step (1): The warp reads the first row of the cell, with each thread fetching the response of one pixel;
Step (2): The whole warp starts Moore-neighborhood suppression; each thread checks whether its response is the maximum within its Moore neighborhood, and non-maximum threads suppress their responses;
Step (3): The warp moves to the next row and the operations of steps (1) and (2) are repeated until the whole cell grid has been processed. The first thread in the block then selects the maximum result for the cell and writes it to global memory, completing the processing of the cell.
In general, compared with the prior art, the technical scheme of the invention can achieve the following beneficial effects:
(1) The invention provides a GPU-acceleration-based SLAM method fusing a 3D laser radar, an IMU and vision. The laser radar and IMU are tightly coupled in the front end and back end, while a vision sensor is used in the loop closure detection part: visual image texture is richer, visual loop detection is simpler and faster than laser-radar-based detection, and loop detection capability is stronger in scenes with few or repetitive features. The three sensors are combined effectively, avoiding the complexity of tightly coupling the data of all three while still achieving a considerable effect;
(2) Compared with the common feature-based line and plane point matching approaches, the method uses more points to compute the registration error, so the matching accuracy is higher; the parallel structure of the GPU is exploited to accelerate the computation, improving the real-time performance of point cloud matching;
(3) A vision sensor is used for loop closure detection, with the feature detection part deployed on the GPU for acceleration; once a loop is identified by vision, it is confirmed again by a laser radar key frame, meeting the requirement of accurate and fast loop detection.
Drawings
Fig. 1 is a general block diagram of a SLAM method of multi-sensor fusion based on GPU acceleration according to the present invention.
FIG. 2 is a laser radar point cloud matching acceleration strategy diagram of the SLAM method based on GPU acceleration and multi-sensor fusion.
FIG. 3 is a sliding window factor graph of the SLAM method based on GPU acceleration and multi-sensor fusion.
Fig. 4 is a diagram illustrating a visual loop detection acceleration strategy of a SLAM method based on GPU-accelerated multi-sensor fusion according to the present invention.
FIG. 5 is a flow chart of visual loopback detection of the SLAM method of multi-sensor fusion based on GPU acceleration according to the present invention.
Detailed Description
To make the objects, advantages and technical solutions of the present invention more clear and concrete, the following detailed description is made with reference to specific embodiments of the present invention and the accompanying drawings.
Fig. 1 is a general block diagram of one embodiment of the GPU-acceleration-based multi-sensor fusion SLAM method of the present invention. The sensors include a 3D laser radar, an IMU and a monocular camera, and the method includes a GPU-accelerated laser radar/IMU tightly coupled front-end odometry module, a factor-graph-based back-end optimization module, a point cloud map creation module and a GPU-accelerated visual loop closure detection module.
The GPU-acceleration-based multi-sensor fusion SLAM method takes the IMU as the body coordinate system and defines the sensor state in the form of equation (1) [the equation image is not reproduced in the source], in which the noise terms are the IMU linear acceleration and angular velocity measurement noise.
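Because the equation image for (1) is not reproduced, the following LaTeX block records a typical tightly coupled lidar/IMU body state as a hedged stand-in; the exact composition of the state in the invention may differ.

% Assumed typical body-frame state for a tightly coupled lidar/IMU system
% (stand-in for equation (1), not the invention's exact definition).
\[
\mathbf{x} = \big[\, \mathbf{R} \in SO(3),\;\; \mathbf{p} \in \mathbb{R}^3,\;\;
\mathbf{v} \in \mathbb{R}^3,\;\; \mathbf{b}_a \in \mathbb{R}^3,\;\; \mathbf{b}_g \in \mathbb{R}^3 \,\big]
\]
% R: orientation, p: position, v: velocity, b_a, b_g: accelerometer and gyroscope biases;
% the accelerometer and gyroscope measurements are additionally corrupted by the white
% noises n_a and n_g referred to in the text.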
FIG. 2 is the laser radar point cloud matching acceleration strategy diagram of the GPU-acceleration-based multi-sensor fusion SLAM method, which comprises the following steps:
Step 1: Down-sampling: read the laser radar data and perform a voxel grid down-sampling operation on the radar point cloud; specifically, a voxel container of fixed size (0.1 m in this embodiment) is set, all points are divided into different voxels, and points that are far from the grid or obviously erroneous are removed;
Step 2: Motion compensation: IMU motion prediction is used to compensate for the point cloud distortion produced by motion within a radar frame. The IMU measurements are first integrated between the scan start time and the scan end time of a laser frame, and the IMU integration result is then linearly interpolated at the timestamp of each laser measurement point, so that every point is projected back to the start time of the laser frame. The IMU-predicted velocity, position and rotation at each time instant are computed as in equation (2) [the equation image is not reproduced in the source], in which the noise terms are the white noise of the IMU measurements. IMU pre-integration then gives the relative motion between two radar timestamps; the velocity, position and rotation measurements between the i-th and j-th instants are computed as in equation (3) [the equation image is not reproduced in the source], in which the remaining terms are the white noise of the pre-integration process.
Step 3: Voxelize the target points: the host side (CPU) sets the previous frame of point cloud data as the target point cloud and stores it in a container, searches with a KD-tree for the neighboring points within radius r of each target point and stores them in the container, and copies the target point cloud from the host side (CPU) to the device side (GPU); each target point corresponds to a voxel index, and the mean and covariance are initialized (this step is skipped when not initializing);
Step 4: Voxelize the source points: the host side (CPU) sets the current frame of point cloud data as the source point cloud and stores it in a container, searches with a KD-tree for the points within radius r of the target points and stores them in the container, and copies the source point cloud from the host side (CPU) to the device side (GPU); each source point corresponds to a voxel index, and the mean and covariance are initialized;
Step 5: Compute the mean and covariance: the host side (CPU) designs a kernel function according to the device conditions and the number of points, sets the kernel width, and sets the number of threads equal to the number of voxels; the device side (GPU) is driven to model each input point cloud as a Gaussian distribution N(μ, Σ) according to the voxel indices and to compute the mean μ and covariance Σ, where the covariance of a point is computed from its k neighboring points within radius r;
Step 6: Compute the relative transformation: the device side (GPU) computes the correspondences between voxels and the distances between each source point and the neighboring points within radius r of the several matched target points. Maximum likelihood estimation over the distance residuals of equation (4) [the equation image is not reproduced in the source] yields the maximum a posteriori estimate; Gauss-Newton iterations are used for optimization, the error terms are weighted by the number of points in each voxel, and the information matrix is obtained from the voxel covariance matrices. The iterations continue until the cost function of equation (5) [the equation image is not reproduced in the source] converges, giving the transformation matrix T with the highest confidence, which is finally transferred back to the host side (CPU). In equation (4), s denotes a source point, t a target point, and j indexes the matched target points; in equation (5), i indexes the neighboring points.
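Because the equation images for (2)-(5) above are not reproduced, the following LaTeX block records, as hedged stand-ins rather than the invention's exact formulas, the standard forms these quantities usually take in lidar-inertial odometry and voxelized-GICP-style registration.

% Assumed standard IMU propagation over a step \Delta t (stand-in for equation (2)),
% with accelerometer/gyroscope biases b_a, b_g and white measurement noises n_a, n_g.
\[
\begin{aligned}
\mathbf{v}_{t+\Delta t} &= \mathbf{v}_t + \mathbf{g}\,\Delta t
  + \mathbf{R}_t\,(\hat{\mathbf{a}}_t - \mathbf{b}_a - \mathbf{n}_a)\,\Delta t \\
\mathbf{p}_{t+\Delta t} &= \mathbf{p}_t + \mathbf{v}_t\,\Delta t + \tfrac{1}{2}\,\mathbf{g}\,\Delta t^2
  + \tfrac{1}{2}\,\mathbf{R}_t\,(\hat{\mathbf{a}}_t - \mathbf{b}_a - \mathbf{n}_a)\,\Delta t^2 \\
\mathbf{R}_{t+\Delta t} &= \mathbf{R}_t\,\mathrm{Exp}\!\big((\hat{\boldsymbol{\omega}}_t - \mathbf{b}_g - \mathbf{n}_g)\,\Delta t\big)
\end{aligned}
\]
% Assumed standard pre-integrated relative motion between instants i and j (stand-in for equation (3)).
\[
\begin{aligned}
\Delta\mathbf{v}_{ij} &= \mathbf{R}_i^{\top}\,(\mathbf{v}_j - \mathbf{v}_i - \mathbf{g}\,\Delta t_{ij}) \\
\Delta\mathbf{p}_{ij} &= \mathbf{R}_i^{\top}\,\big(\mathbf{p}_j - \mathbf{p}_i - \mathbf{v}_i\,\Delta t_{ij} - \tfrac{1}{2}\,\mathbf{g}\,\Delta t_{ij}^2\big) \\
\Delta\mathbf{R}_{ij} &= \mathbf{R}_i^{\top}\,\mathbf{R}_j
\end{aligned}
\]
% Assumed voxelized-GICP-style distance residual between a target voxel mean and a
% transformed source mean (stand-in for equation (4)); s marks source, t marks target.
\[
\mathbf{d}_j = \boldsymbol{\mu}^{t}_{j} - \mathbf{T}\,\boldsymbol{\mu}^{s}_{j}
\]
% Assumed weighted cost over all matched voxels (stand-in for equation (5)); N_j is the
% number of points in voxel j and the bracketed term acts as the information matrix.
\[
\mathbf{T}^{*} = \arg\min_{\mathbf{T}} \sum_{j} N_j\,
  \mathbf{d}_j^{\top}\,\big(\boldsymbol{\Sigma}^{t}_{j} + \mathbf{T}\,\boldsymbol{\Sigma}^{s}_{j}\,\mathbf{T}^{\top}\big)^{-1}\mathbf{d}_j
\]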
Fig. 3 is the back-end sliding window factor graph of the GPU-acceleration-based multi-sensor fusion SLAM method of the present invention, which includes an IMU pre-integration factor, a laser radar matching cost factor and a visual loop closure factor. The IMU pre-integration factor is given by equation (3), the laser radar matching cost factor by equation (5), and the visual loop closure factor by equation (6). Specifically, a factor graph with a fixed window size of 20 state nodes is constructed, and the constraints among the optimization factors are: the constraint between a marginalized key frame and the latest frame is a unary edge, while the other factors, namely the IMU pre-integration factor, the laser radar matching cost factor and the visual loop closure factor, are binary edges. When a new sensor state is added, the factor graph is optimized and old frames are marginalized out using a fixed-lag smoothing method; the marginalized frames are managed by a key frame mechanism so that the key frames are spatially well distributed and keep sufficient overlap with the latest frame. An overlap-rate threshold is set to manage the key frames: if the overlap rate is below the threshold (90% in this embodiment) the frame is set as a key frame, otherwise the frame is discarded. The key frames marginalized out of the factor graph serve as initial estimates of the sensor state and are sent to the point cloud map creation module, which continuously receives them, obtains the transformations between key frames, and stitches them into a globally consistent point cloud map.
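As a small host-side C++ sketch of the overlap-based key frame rule described above (the names, the way the overlap ratio is obtained, and the container are assumptions for illustration, not the invention's code):

#include <vector>

// Hypothetical key frame record (pose, point cloud and timestamp omitted for brevity).
struct KeyFrame { /* pose, point cloud, timestamp ... */ };

// A frame marginalized out of the sliding window becomes a key frame only if its
// overlap with the latest key frame is below the threshold (90% in this embodiment);
// otherwise it is discarded, keeping key frames spatially well distributed while
// preserving sufficient overlap with the latest frame.
bool acceptAsKeyFrame(double overlapWithLatestKeyFrame, double threshold = 0.9)
{
    return overlapWithLatestKeyFrame < threshold;
}

int main()
{
    std::vector<KeyFrame> keyFrames;
    double overlap = 0.72;                // example value from some overlap check
    if (acceptAsKeyFrame(overlap)) {
        keyFrames.push_back(KeyFrame{});  // kept: forwarded to the point cloud map module
    }
    return 0;
}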
Fig. 4 is the visual loop closure detection acceleration strategy diagram of the GPU-acceleration-based multi-sensor fusion SLAM method. In the GPU-accelerated visual loop closure detection module, the accelerated part is FAST corner feature detection, with the following specific steps:
Step a: Acquire an image pyramid: the host side (CPU) sub-samples the camera input image to obtain an image pyramid;
Step b: Divide into rectangular units: the host side (CPU) divides each resolution level of the image into rectangular units of fixed width and height and copies them to the device side (GPU), with each rectangular block corresponding to one thread block;
Step c: Determine the region of interest: the device side (GPU) computes FAST corners within a warp. The selected pixel is compared in brightness against the 16 pixels on the circle of radius 3 around it, and the 16 comparison results are packed into a bit array used as the index of a lookup table. All 2^16 = 65536 possible outcomes are computed in advance; each outcome is binary, so the results are stored as 65536 bits (8 KB), using 4-byte integers that each hold 32 entries. A hit of the index in this cached table determines the region of interest (ROI);
Step d: Corner response function evaluation: for each pixel of the region of interest (ROI) in each rectangular block, the device side (GPU) performs a coarse corner response function (CCRF) evaluation followed by a corner response function (CRF) evaluation;
Step e: Non-maximum suppression: the device side (GPU) divides the corner response map into a regular grid of cells and assigns one thread block per cell; several warps process each cell to perform non-maximum suppression. A warp reads the first row of the cell, each thread fetching the response of one pixel; the whole warp then starts Moore-neighborhood suppression, each thread checking whether its response is the maximum within its Moore neighborhood and non-maximum threads suppressing their responses; the warp then moves to the next row and these operations are repeated until the whole cell grid has been processed;
Step f: The CPU reads the FAST corners: all warps in a thread block write their maximum results (CRF score and pixel position) to designated shared memory, and the first thread in the block selects the maximum result for each cell and writes it to global memory; FAST corner detection for the whole image is then complete, and the results are transferred to the host side (CPU).
Fig. 5 is the visual loop closure detection flow chart of the GPU-acceleration-based multi-sensor fusion SLAM method. Visual loop closure detection is implemented by first constructing a visual odometry: the visual odometry estimates the transformation matrix by the feature point method, and its scale is corrected by the laser radar once every 20 frames. The loop closure detection module then performs detection visually in a bag-of-words manner. The specific steps of the GPU-accelerated visual loop closure detection are as follows:
Step 1: Feature extraction: FAST features are extracted as in the acceleration strategy of Fig. 4, and the results are transferred to the host side (CPU); the BRIEF descriptors of the feature points are computed and converted into words (the bag-of-words dictionary tree has been created in advance), the words of the current frame are determined, and the bag-of-words vector of the current frame is formed;
Step 2: Dictionary query: find the key frames that share words with the current frame;
Step 3: Similarity scoring: compare the difference between the bag-of-words vectors using the score of equation (6) [the equation image is not reproduced in the source], in which the quantities are the current-frame (source) bag-of-words vector, the target bag-of-words vector and the number of features N; if the similarity between the two frames' bag-of-words vectors is below the threshold (90% in this embodiment), jump to step 7;
Step 4: Group matching: frames that are close in time are combined into a group, and only one frame of each group is matched during matching. If the group matching fails, jump to step 7;
Step 5: Temporal consistency check: for a detected group match, check whether k consecutive group matches have been established previously; the highest-scoring group match is considered a loop. If the temporal consistency check fails, jump to step 7;
Step 6: Structural consistency check: further confirm that the bag-of-words vectors shared by the current frame and the loop frame are consistent in structural order. If the structural consistency check fails, jump to step 7;
Step 7: End this round of detection: if a loop has been detected, output the loop frame; otherwise end;
After a loop is detected, it is further confirmed using the laser radar key frame with the same timestamp. Once the loop is confirmed, the loop closure factor is added to the back-end factor graph and one global optimization is executed, simultaneously optimizing the sensor pose states and the point cloud map.

Claims (5)

1. A GPU-acceleration-based multi-sensor fusion SLAM method, characterized by comprising a GPU-accelerated laser radar/IMU tightly coupled front-end odometry module, a factor-graph-based back-end optimization module, a point cloud map creation module and a GPU-accelerated visual loop closure detection module:
the GPU-accelerated laser radar/IMU tightly coupled front-end odometry module performs GPU-accelerated point cloud matching in the following steps:
Step 1: Down-sampling: the host side (CPU) reads the laser radar data, performs a voxel grid down-sampling operation on the radar point cloud, divides all points into different voxels, and removes points that are far from the grid or obviously erroneous;
Step 2: Motion compensation: the host side (CPU) uses IMU motion prediction to compensate for the point cloud distortion produced by motion within a radar frame, projecting every point back to the start time of the laser radar frame;
Step 3: Voxelize the target points: the host side (CPU) sets the previous frame of point cloud data as the target point cloud and stores it in a container, searches with a KD-tree for the neighboring points within radius r of each target point and stores them in the container, and copies the target point cloud from the host side (CPU) to the device side (GPU); each target point corresponds to a voxel index, and the mean and covariance are initialized (this step is skipped when not initializing);
Step 4: Voxelize the source points: the host side (CPU) sets the current frame of point cloud data as the source point cloud and stores it in a container, searches with a KD-tree for the points within radius r of the target points and stores them in the container, and copies the source point cloud from the host side (CPU) to the device side (GPU); each source point corresponds to a voxel index, and the mean and covariance are initialized;
Step 5: Compute the mean and covariance: the host side (CPU) designs a kernel function according to the device conditions and the number of points, sets the kernel width, and sets the number of threads equal to the number of voxels; the device side (GPU) is driven to model the point clouds as Gaussian distributions according to the voxel indices and to compute the mean and covariance of the source and target point clouds;
Step 6: Compute the relative transformation: the device side (GPU) computes the correspondences between voxels, uses maximum likelihood estimation to obtain the maximum a posteriori estimate, performs iterative optimization with the Gauss-Newton method, weights the error terms by the number of points in each voxel, and obtains the information matrix from the voxel covariance matrices; the iterations continue until the target cost function converges, yielding the corresponding transformation matrix T, which is transferred back to the host side (CPU);
Step 7: Update the point cloud: set the current frame point cloud as the new target point cloud and wait for the next computation;
the factor-graph-based back-end optimization module constructs a sliding window comprising an IMU pre-integration factor, a laser radar matching cost factor and a visual loop closure factor; when a new sensor state is added, a fixed-lag smoothing method is used to optimize the factor graph, the updated pose state is fed back to the GPU-accelerated laser radar/IMU tightly coupled front-end odometry module for correction, and the marginalized key frames are used to construct the point cloud map; the GPU-accelerated visual loop closure detection module first extracts FAST corner features with GPU acceleration, and then performs dictionary query, similarity scoring, group matching, temporal consistency checking and structural consistency checking; when a loop is confirmed, the loop closure factor is added to the factor-graph-based back-end optimization module to execute a global optimization.
2. The GPU-acceleration-based multi-sensor fusion SLAM method of claim 1, wherein, in the factor-graph-based back-end optimization module, the number of sensor state nodes in the constructed sliding window factor graph is fixed to 20, and the constraints between the optimization factors are: the constraint between a marginalized key frame and the latest frame is a unary edge, while the other factors, namely the IMU pre-integration factor, the laser radar matching cost factor and the visual loop closure factor, are binary edges.
3. The GPU-acceleration-based multi-sensor fusion SLAM method of claim 1, wherein the GPU-accelerated FAST corner feature extraction in the GPU-accelerated visual loop closure detection module comprises the following steps:
Step a: Acquire an image pyramid: the host side (CPU) sub-samples the camera input image to obtain an image pyramid;
Step b: Divide into rectangular units: the host side (CPU) divides each resolution level of the image into rectangular units of fixed width and height and copies them to the device side (GPU), with each rectangular block corresponding to one thread block;
Step c: Determine the region of interest: the device side (GPU) computes FAST corners within a warp, comparing the brightness of each selected pixel against enough pixels in its surrounding neighborhood to determine the region of interest (ROI);
Step d: Corner response function evaluation: for each pixel of the region of interest (ROI) in each rectangular block, the device side (GPU) performs a coarse corner response function (CCRF) evaluation followed by a corner response function (CRF) evaluation;
Step e: Non-maximum suppression: the device side (GPU) divides the corner response image into a regular grid of cells, assigns one thread block per cell, uses several warps to process each cell, and performs non-maximum suppression;
Step f: The CPU reads the FAST corners: all warps in a thread block write their maximum results (CRF score and pixel position) to designated shared memory, and the first thread in the block selects the maximum result for each cell and writes it to global memory; FAST corner detection for the whole image is then complete, and the results are transferred to the host side (CPU).
4. The GPU-acceleration-based multi-sensor fusion SLAM method of claim 3, wherein step c computes the FAST corner within a warp on the device side (GPU): the 16 pixels on the circle of radius 3 around the candidate pixel are first selected, and the brightness comparison results against these 16 pixels are stored as a bit array used as the index of a lookup table; all 2^16 = 65536 possible outcomes are computed in advance, each outcome is binary, and the results are stored as 65536 bits (8 KB), using 4-byte integers that each hold 32 entries.
5. The GPU-acceleration-based multi-sensor fusion SLAM method of claim 3, wherein step e uses several warps for non-maximum suppression, in the following steps:
Step (1): The warp reads the first row of the cell, with each thread fetching the response of one pixel;
Step (2): The whole warp starts Moore-neighborhood suppression; each thread checks whether its response is the maximum within its Moore neighborhood, and non-maximum threads suppress their responses;
Step (3): The warp moves to the next row and the operations of steps (1) and (2) are repeated until the whole cell grid has been processed; the first thread in the block then selects the maximum result for the cell and writes it to global memory, completing the processing of the cell.
CN202210459576.9A 2022-04-28 2022-04-28 Multi-sensor fusion SLAM method based on GPU acceleration Pending CN115143958A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210459576.9A CN115143958A (en) 2022-04-28 2022-04-28 Multi-sensor fusion SLAM method based on GPU acceleration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210459576.9A CN115143958A (en) 2022-04-28 2022-04-28 Multi-sensor fusion SLAM method based on GPU acceleration

Publications (1)

Publication Number Publication Date
CN115143958A true CN115143958A (en) 2022-10-04

Family

ID=83406054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210459576.9A Pending CN115143958A (en) 2022-04-28 2022-04-28 Multi-sensor fusion SLAM method based on GPU acceleration

Country Status (1)

Country Link
CN (1) CN115143958A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116843862A (en) * 2023-08-29 2023-10-03 武汉必盈生物科技有限公司 Three-dimensional thin-wall model grid surface texture synthesis method
CN116843862B (en) * 2023-08-29 2023-11-24 武汉必盈生物科技有限公司 Three-dimensional thin-wall model grid surface texture synthesis method

Similar Documents

Publication Publication Date Title
CN112985416B (en) Robust positioning and mapping method and system based on laser and visual information fusion
CN110945565B (en) Dense visual SLAM with probability bin map
CN109242873B (en) Method for carrying out 360-degree real-time three-dimensional reconstruction on object based on consumption-level color depth camera
CN113012212A (en) Depth information fusion-based indoor scene three-dimensional point cloud reconstruction method and system
US8467628B2 (en) Method and system for fast dense stereoscopic ranging
CN107167826B (en) Vehicle longitudinal positioning system and method based on variable grid image feature detection in automatic driving
US8929645B2 (en) Method and system for fast dense stereoscopic ranging
CN112304307A (en) Positioning method and device based on multi-sensor fusion and storage medium
WO2019057179A1 (en) Visual slam method and apparatus based on point and line characteristic
CN108229416B (en) Robot SLAM method based on semantic segmentation technology
JP2020067439A (en) System and method for estimating position of moving body
CN112634451A (en) Outdoor large-scene three-dimensional mapping method integrating multiple sensors
EP2671384A2 (en) Mobile camera localization using depth maps
JP2009087326A (en) Method and system for specifying location of object in image
CN115880364B (en) Robot pose estimation method based on laser point cloud and visual SLAM
Lin et al. Optimizing ZNCC calculation in binocular stereo matching
US20230019499A1 (en) Image processing system and method
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
Rückert et al. Snake-SLAM: Efficient global visual inertial SLAM using decoupled nonlinear optimization
CN115143958A (en) Multi-sensor fusion SLAM method based on GPU acceleration
CN114140527A (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN112733971A (en) Pose determination method, device and equipment of scanning equipment and storage medium
CN117726747A (en) Three-dimensional reconstruction method, device, storage medium and equipment for complementing weak texture scene
CN115239899B (en) Pose map generation method, high-precision map generation method and device
US20230020713A1 (en) Image processing system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination