Direct method-based multi-sensor fusion SLAM positioning method

Publication number
CN116242374A
Authority
CN
China
Prior art keywords: frame, IMU, optimization, algorithm, factor graph
Legal status: Pending
Application number
CN202310470533.5A
Other languages
Chinese (zh)
Inventor
彭侠夫
成湘
张霄力
Current Assignee
Xiamen University
Original Assignee
Xiamen University
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202310470533.5A priority Critical patent/CN116242374A/en
Publication of CN116242374A publication Critical patent/CN116242374A/en

Classifications

    • G01C21/28 Navigation specially adapted for navigation in a road network, with correlation of data from several navigational instruments
    • G01C21/1656 Dead reckoning by integrating acceleration or speed (inertial navigation) combined with passive imaging devices, e.g. cameras
    • G01S19/47 Determining position by combining satellite radio beacon positioning measurements with a supplementary inertial measurement, e.g. tightly coupled inertial
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science
  • Radar, Positioning & Navigation
  • Remote Sensing
  • Physics & Mathematics
  • General Physics & Mathematics
  • Automation & Control Theory
  • Computer Networks & Wireless Communication
  • Image Analysis

Abstract

A multi-sensor fusion SLAM positioning method based on the direct method, in the field of mobile robot positioning and navigation. The camera is tracked with a direct method that uses pixels with significant image gradients, and IMU pre-integration supplies the initial pose, providing the initial value for direct-method tracking. Image frames are divided into key frames and non-key frames depending on whether tracking succeeds. A visual factor graph is constructed, IMU residual terms and GPS residual terms are inserted into it to build a new fusion factor graph, and the fusion factor graph is optimized to complete the optimization flow of the system algorithm. A back-end optimization strategy based on sliding-window optimization keeps the computational complexity of back-end pose optimization fixed; a tightly coupled multi-sensor fusion strategy uses older historical information to compensate for the drawbacks of marginalization while keeping its advantages, improving system accuracy. The system can work in a purely visual mode, and can continue to operate in a visual odometry mode even when GPS signals are not received normally.

Description

Direct method-based multi-sensor fusion SLAM positioning method
Technical Field
The invention relates to the field of mobile robot positioning and navigation, and in particular to a direct-method-based multi-sensor fusion SLAM positioning method.
Background
As robot applications penetrate more and more fields, autonomous navigation capability has become one of the bottlenecks limiting what robots can do. Positioning is the basis of autonomous navigation and one of the core technologies limiting the wide application of robots. Simultaneous localization and mapping (SLAM) has therefore become a current research hotspot. SLAM refers to continuously estimating the trajectory of a moving robot in an unknown environment while building a map of that environment. SLAM is supported by various sensors; since positioning with a single sensor can hardly guarantee long-term stable operation in varied environments, positioning methods that fuse data from multiple sensors in unknown environments are increasingly studied and applied.
The sensors used by mainstream SLAM algorithms include monocular/stereo/depth cameras, IMU inertial measurement units and lidar, with GPS signals and wheel odometry fused in addition. Visual SLAM algorithms include feature-point methods, which extract sparsely distributed features and compute their descriptors; direct methods, which operate on pixels with significant gradient information; and semi-direct methods, which use feature points without computing descriptors. The IMU and the monocular camera each have strengths and weaknesses, but they are complementary, and fusing them brings the following advantages: the scale ambiguity of a monocular camera is resolved; motion that a monocular camera alone cannot estimate (pure rotation) becomes estimable, and the influence of texture, illumination and dynamic environments on visual SLAM is reduced; the IMU zero bias becomes easy to estimate and its integration drift is eliminated. Multi-sensor fusion can therefore produce algorithms that are faster, more accurate and more robust, which has great research significance and practical value. Furthermore, multi-sensor fusion is commonly divided into filtering-based and factor-graph-optimization-based approaches, and into tightly coupled and loosely coupled fusion schemes, so a reasonable fusion scheme can be selected according to the equipment whose performance is to be upgraded.
Disclosure of Invention
The invention aims to provide a direct-method-based multi-sensor fusion SLAM positioning method which improves the robustness and running speed of existing monocular visual-IMU fusion algorithms, adds GPS for global fusion positioning, and adds optimization constraints so that the coordinates are aligned with the real world, which also improves system accuracy.
The invention comprises the following steps:
1) Acquiring image frames from a monocular camera, IMU output and GPS latitude/longitude/altitude; resolving the GPS latitude/longitude/altitude into coordinates in the current world coordinate system; aligning this information by timestamp into one structure and feeding it into the system (a data-alignment sketch is given after step 4) below);
2) Running the initialization flow of the visual part of the algorithm: using the structure data obtained in step 1), building an image pyramid for the first frame image, screening candidate gradient points with a gradient threshold, projecting the candidate gradient points initialized in the first frame into the second frame, estimating the initial relative motion with the image pyramid, and initializing the inverse depths and the pose transformation from the IMU coordinate system to the world coordinate system; optimizing the two frames according to the photometric error function, and judging from the displacement of the optimized result whether initialization is finished;
3) Running the image tracking flow of the algorithm: using the pose between two frames obtained by IMU pre-integration as the initial pose, performing photometric-error-based optimization on the two consecutive images with the Gauss-Newton method, and tracking the camera motion; during tracking, whether the current frame becomes a new key frame is decided from three set conditions: (1) the system has just started and only 1 frame is stored; (2) the translation, rotation or photometric change ratio of the pixels of the current frame is too large; (3) the time since the last key frame exceeds the maximum key frame interval;
a. if the current frame is chosen as a key frame, activating the gradient points extracted from it, i.e. using a depth filter to converge the inverse-depth optimization of the gradient points, adding the key frame into the visual factor graph for sliding-window optimization, and performing marginalization with the Schur complement;
b. if the current frame is chosen as a non-key frame, using only the pose estimate between the current and previous frames to update the inverse depths, the system operating in a "vision only" mode;
4) Running the vision-IMU-GPS data fusion flow of the algorithm: based on the visual factor graph of step 3), copying the factor graph as a fusion factor graph, adding IMU pre-integration factors and GPS factors, and performing joint optimization with lagged (hysteresis) marginalization; the fusion factor graph also uses sliding-window optimization, but its marginalization strategy is delayed by n key frames and then follows the marginalization order of the visual factor graph, so that the sliding-window size is maintained; lagged marginalization guarantees accuracy while compensating for the drawbacks of marginalization and keeping its advantages, and the system enters the "vision-IMU-GPS" mode of operation.
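The following is a minimal, non-limiting sketch (in Python) of the input pre-processing of step 1): converting a GPS latitude/longitude/altitude fix into local Cartesian (ENU) coordinates and packing the camera, IMU and GPS samples that belong to one image timestamp into a single structure. The function and field names, the nearest-timestamp pairing rule and the spherical-Earth approximation are illustrative assumptions of this sketch, not details taken from the patent.

import math
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class FusedFrame:
    """One system input: an image plus the IMU/GPS samples aligned to it by timestamp."""
    t_image: float
    image: object                      # e.g. an array from the monocular camera
    imu: List[Tuple[float, tuple, tuple]] = field(default_factory=list)  # (t, accel, gyro)
    gps_enu: Tuple[float, float, float] = None                           # local ENU position

def lla_to_enu(lat, lon, alt, lat0, lon0, alt0):
    """Approximate LLA -> local ENU conversion about a reference point (small-area,
    spherical-Earth approximation; a real system would use a geodetic library)."""
    R_EARTH = 6378137.0
    d_lat = math.radians(lat - lat0)
    d_lon = math.radians(lon - lon0)
    east = d_lon * R_EARTH * math.cos(math.radians(lat0))
    north = d_lat * R_EARTH
    up = alt - alt0
    return (east, north, up)

def align_inputs(images, imu_samples, gps_samples, origin_lla, max_gps_gap=0.1):
    """Group IMU samples between consecutive image timestamps and attach the nearest GPS fix.
    images: list of (t, image); imu_samples: list of (t, accel, gyro);
    gps_samples: list of (t, lat, lon, alt); origin_lla: (lat0, lon0, alt0)."""
    frames = []
    for k, (t_img, img) in enumerate(images):
        t_prev = images[k - 1][0] if k > 0 else -math.inf
        imu_between = [s for s in imu_samples if t_prev < s[0] <= t_img]
        gps_near = min(gps_samples, key=lambda g: abs(g[0] - t_img), default=None)
        frame = FusedFrame(t_image=t_img, image=img, imu=imu_between)
        if gps_near is not None and abs(gps_near[0] - t_img) < max_gps_gap:
            frame.gps_enu = lla_to_enu(*gps_near[1:], *origin_lla)
        frames.append(frame)
    return frames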
The step 2) specifically comprises the following steps:
2.1) During initialization, an image pyramid is built for the first of the two initial frames; the pyramid is traversed and the camera intrinsics of each layer are computed. The layer-1 image is divided into grids of size 32×32; the intensity differences dx and dy of the pixels in a grid along the x and y directions are computed, and their sum of squares is taken as the gradient

g = dx^2 + dy^2

The gradients within each pixel block are tallied and the median is selected as that block's gradient threshold for extracting candidate points;
2.2) In the layer-1 image, each n×n pixel block is traversed; within a block, a direction in the pixel coordinate system is randomly selected and the gradient dI of the pixels in the block is computed along it, and the pixel position p whose gradient is largest and exceeds the threshold is recorded as a candidate point; if too few pixels are extracted, the block size is reduced to 0.5n×0.5n and the sampling recurses; if too many, the block size is increased to 2n×2n and sampling is repeated, and so on recursively (a candidate-selection sketch is given after step 2.4) below). The remaining pyramid layers are processed the same way, where a candidate must satisfy two conditions: (1) its gradient exceeds the set threshold; (2) it is the maximal point along the x, y, x+y and x-y directions in the pixel coordinate system; finally the layers of the image pyramid are associated with each other;
2.3) Computing the transformation quaternion that aligns the current camera coordinate system with gravity: from the IMU information acquired in step 1) and aligned with the image timestamps, acceleration data over a period of time are collected and averaged to give the gravity direction of the current frame; a fixed-size sliding window stores the per-frame gravity directions, a moving-average filter smooths the IMU gravity vector against the true gravity direction [0, 0, 9.8], and the rotation of the current frame's gravity alignment is computed;
2.4) Pyramid optimization of the second of the two initial frames completes initialization; the second frame builds a photometric error function from the points of the first frame:

E_pj = Σ_{p∈N_p} w_p ‖ (I_j[p'] - b_j) - (t_j·e^{a_j})/(t_i·e^{a_i}) · (I_i[p] - b_i) ‖_γ

where N_p is the set of pixels used by the algorithm, ‖·‖_γ is the Huber norm, p' is the projection onto frame j of the spatial point corresponding to pixel p of frame i, and t_j is the exposure time of frame j; to make the algorithm applicable to image sequences with unknown exposure time, the affine photometric model B = (I - b)/(t·e^{a}), with per-frame brightness parameters a and b, is used to normalize the brightness. Then, from the top layer of the image pyramid down to the bottom layer, each layer is optimized by the Gauss-Newton method with the identity matrix as the initial value, and whether the displacement is large enough is judged; if the displacement is large enough and more than 5 frames have been tracked, initialization finishes, the first frame is set as a key frame, the gradient points extracted from it are activated, and they are added to the sliding-window optimization of the visual factor graph; otherwise steps 2.1) to 2.4) are repeated.
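A minimal sketch of the candidate gradient-point selection of steps 2.1) and 2.2) follows, assuming a grayscale float image; the point-count bounds, the recursion limit and the use of numpy.gradient for the intensity differences are illustrative assumptions, not values taken from the patent.

import numpy as np

def block_gradient_thresholds(img, block=32):
    """Per 32x32 block: the median of the squared-gradient values g = dx^2 + dy^2 is used
    as the block's selection threshold (step 2.1)."""
    dy, dx = np.gradient(img.astype(np.float64))
    g = dx ** 2 + dy ** 2
    h, w = img.shape
    thr = np.zeros((h // block, w // block))
    for by in range(h // block):
        for bx in range(w // block):
            patch = g[by * block:(by + 1) * block, bx * block:(bx + 1) * block]
            thr[by, bx] = np.median(patch)
    return g, thr

def select_candidates(g, thr, block=32, n=4, min_pts=800, max_pts=2400, depth=0):
    """In every n x n cell, keep the pixel with the largest gradient if it exceeds the block
    threshold (step 2.2); if too few/too many points come back, resample recursively with a
    halved/doubled cell size."""
    h, w = g.shape
    pts = []
    for y0 in range(0, h - n + 1, n):
        for x0 in range(0, w - n + 1, n):
            cell = g[y0:y0 + n, x0:x0 + n]
            iy, ix = np.unravel_index(np.argmax(cell), cell.shape)
            y, x = y0 + iy, x0 + ix
            if cell[iy, ix] > thr[min(y // block, thr.shape[0] - 1),
                                  min(x // block, thr.shape[1] - 1)]:
                pts.append((x, y))
    if depth < 2:                       # bound the recursion
        if len(pts) < min_pts and n > 2:
            return select_candidates(g, thr, block, max(2, n // 2), min_pts, max_pts, depth + 1)
        if len(pts) > max_pts:
            return select_candidates(g, thr, block, 2 * n, min_pts, max_pts, depth + 1)
    return pts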
The step 3) specifically comprises the following steps:
3.1) Entering the tracking flow of the algorithm: the IMU data acquired in step 1) and aligned with the image timestamps are pre-integrated to obtain the pose change between the two frames, the pre-integration terms being

α^{b_k}_{b_{k+1}} = ∬_{t∈[t_k, t_{k+1}]} R^{b_k}_t (â_t - b_{a_t}) dt^2

β^{b_k}_{b_{k+1}} = ∫_{t∈[t_k, t_{k+1}]} R^{b_k}_t (â_t - b_{a_t}) dt

γ^{b_k}_{b_{k+1}} = ∫_{t∈[t_k, t_{k+1}]} (1/2) Ω(ω̂_t - b_{g_t}) γ^{b_k}_t dt

where α, β and γ are the pre-integrated position, velocity and rotation terms describing the pose change of body frame b_{k+1} relative to b_k, â_t and ω̂_t are the accelerometer and gyroscope measurements, and b_{a_t} and b_{g_t} are their biases; the algorithm takes the pose obtained by IMU pre-integration as the initial pose for the direct method;
3.2) The points of the reference frame are projected into the current frame with the initial pose and a photometric error function is constructed; with the minimum photometric error as the optimization target, iterative optimization of the candidate points starts from the top layer of the image pyramid downwards using the Levenberg-Marquardt algorithm, i.e. the inverse depth, photometric parameters and pose are optimized coarse-to-fine (a photometric-residual sketch is given after step 3.6) below). Several different handling cases exist: (1) if the photometric error exceeds 60% of the energy threshold, the threshold is enlarged and optimization is re-run; (2) if the energy of some pyramid layer is more than 2 times the minimum, tracking is directly judged to have failed; (3) if the photometric error satisfies the condition, tracking succeeds.
3.3) Whether the current frame can become a key frame is judged from three preset conditions: (1) the system has just started and only 1 frame is stored; (2) the translation, rotation or photometric change ratio of the pixels of the current frame is too large; (3) the time since the last key frame exceeds the maximum key frame interval;
3.4) If the current frame becomes a non-key frame, the inverse-depth ranges of the candidate points of step 2.2) are updated; the epipolar search range obtained from the inverse-depth range is used to compute the image gradient and the angle it forms with the epipolar line. If the angle is too large, the gradient and the epipolar line tend to be perpendicular, the result error is considered large, and the candidate-point parameters are not updated. When the angle is suitable, the position p_1 with the smallest photometric error and the second-smallest position p_2 at a radius greater than 2 are searched along the epipolar line with a fixed step, and the ratio of the latter's error to the former's,

q = E(p_2) / E(p_1)

is taken as a quality attribute, a larger quality meaning a better position; on this basis a finer search is performed, with Gauss-Newton optimization along the epipolar line until the increment is small enough, which gives the optimal projected position, from which the new inverse-depth range is computed;
3.5) If the current frame is to become a key frame, key frames in the key-frame sequence satisfying the following 3 conditions are first marginalized: (1) their photometric parameters have changed greatly compared with the reference frame; (2) there are more than 7 key frames and the frame is farthest from the current frame; (3) fewer than 5% of its candidate points are activated. The points and outliers of the marginalized frames are removed, the current frame is added to the sequence, and the residuals between the new key frame and all key frames in the sequence are constructed to obtain an error function; a distance map is built in the new key frame, and when a candidate point satisfies the conditions that its epipolar search distance is less than 8, its quality attribute of step 3.4) is good and its inverse depth is positive, it is selected and activated, i.e. its inverse depth is optimized with a depth filter, and after activation the point is added to the error function in preparation for photometric-error computation;
3.6) Sliding-window optimization of the activated points of the visual factor graph starts according to the photometric error function; the pose, photometric parameters and inverse depths are optimized with the Gauss-Newton method, and the optimization variables are differentiated:
(1) Derivatives with respect to the photometric parameters:

∂r/∂a_j = -(t_j·e^{a_j})/(t_i·e^{a_i})·(I_i[p] - b_i),    ∂r/∂b_j = -1

where w_p is the Huber weight applied to each residual r, and a and b are the parameters of the photometric affine transformation.

(2) Derivative with respect to the relative pose ξ_ji:

∂r/∂ξ_ji = [g_x  g_y] · ∂p'/∂ξ_ji

∂p'/∂ξ_ji = | f_x/Z'   0        -f_x·x'/Z'   -f_x·x'·y'      f_x·(1+x'^2)   -f_x·y' |
            | 0        f_y/Z'   -f_y·y'/Z'   -f_y·(1+y'^2)   f_y·x'·y'       f_y·x'  |

where g_x and g_y are the horizontal and vertical image gradients at the projected point p_j, (x', y') = (X'/Z', Y'/Z') are its normalized coordinates, and f_x and f_y are camera intrinsics;

(3) Derivative with respect to the inverse depth:

∂r/∂ρ = [g_x  g_y] · (1/(ρ·Z')) · [ f_x·(t_x - x'·t_z) ;  f_y·(t_y - y'·t_z) ]

where ρ is the inverse depth and t_z is the translation of the camera in the z direction;

After the Jacobian of each optimization variable is obtained, the Hessian matrix can be constructed for Gauss-Newton optimization to obtain a more accurate pose. After this, the vision module of the algorithm can run independently and provides the basis for the subsequent multi-sensor fusion.
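The sketch below evaluates one photometric residual of the form used in steps 2.4), 3.2) and 3.6), assuming the brightness-transfer model with exposure times and per-frame affine parameters (a, b) described above. It uses a bare pinhole projection for a single pixel; the residual pattern, sub-pixel interpolation and energy thresholds of the actual algorithm are omitted, and the Huber threshold value is an assumption of this sketch.

import numpy as np

def huber_weight(r, k=9.0):
    """Huber weight for a scalar residual (threshold k in intensity units, an assumed value)."""
    a = abs(r)
    return 1.0 if a <= k else k / a

def photometric_residual(I_i, I_j, p, inv_depth, T_ji, K, a_i, b_i, a_j, b_j, t_i, t_j):
    """Weighted squared residual of one candidate pixel p = (u, v) of host frame i projected
    into frame j, with exposure times t_i, t_j and affine brightness parameters per frame.
    K = (fx, fy, cx, cy); T_ji is a 4x4 pose of frame i expressed in frame j."""
    fx, fy, cx, cy = K
    u, v = p
    # back-project with inverse depth, transform into frame j, re-project
    X = np.array([(u - cx) / fx, (v - cy) / fy, 1.0]) / inv_depth
    Xj = T_ji[:3, :3] @ X + T_ji[:3, 3]
    if Xj[2] <= 0:
        return None                     # point behind the camera: residual undefined
    uj = fx * Xj[0] / Xj[2] + cx
    vj = fy * Xj[1] / Xj[2] + cy
    hj, wj = I_j.shape
    if not (0 <= int(uj) < wj and 0 <= int(vj) < hj):
        return None                     # projected outside the image
    # brightness transfer: (I_j - b_j) - (t_j e^{a_j}) / (t_i e^{a_i}) * (I_i - b_i)
    scale = (t_j * np.exp(a_j)) / (t_i * np.exp(a_i))
    r = (I_j[int(vj), int(uj)] - b_j) - scale * (I_i[int(v), int(u)] - b_i)
    return huber_weight(r) * r * r      # contribution of this pixel to the energy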
Said step 4) comprises the steps of:
4.1) Based on the visual factor graph of step 3), the factor graph is copied, IMU pre-integration residual factors and GPS residual factors are added to form the fusion factor graph, and joint sliding-window optimization is performed with lagged (hysteresis) marginalization;
4.2) The added IMU pre-integration residual factor is

r_p  = R^{b_i}_w (p^w_{b_j} - p^w_{b_i} - v^w_{b_i}·Δt + (1/2)·g^w·Δt^2) - α̂^{b_i}_{b_j}
r_q  = 2·[ (γ̂^{b_i}_{b_j})^{-1} ⊗ (q^w_{b_i})^{-1} ⊗ q^w_{b_j} ]_{xyz}
r_v  = R^{b_i}_w (v^w_{b_j} - v^w_{b_i} + g^w·Δt) - β̂^{b_i}_{b_j}
r_ba = b_{a_j} - b_{a_i}
r_bg = b_{g_j} - b_{g_i}

where r_p, r_q, r_v, r_ba and r_bg are the residual terms corresponding to position p, attitude q, velocity v, IMU accelerometer bias ba and gyroscope bias bg; p^w_{b_j} is the position of the IMU coordinate system relative to the world coordinate system at frame j; q^w_{b_i} is the rotation from the IMU coordinate system to the world coordinate system at frame i (with rotation matrix R^w_{b_i}, whose inverse is R^{b_i}_w); b_{a_i} is the acceleration bias at frame i; ⊗ denotes quaternion multiplication; α̂, γ̂ and β̂ are the measured (pre-integrated) position, attitude and velocity; g^w is the gravity vector and Δt the time between the two frames (a residual-evaluation sketch is given after step 4.7) below);
4.3) The added GPS residual factor is

r_gps = p̂^w_{GPS,k} - p^w_{b_i}

where the timestamp k ∈ (i, j) of the GPS datum lies between the image timestamps of the two neighbouring frames, and p̂^w_{GPS,k} - p^w_{b_i} is the difference between the GPS position with timestamp k and the frame-i position; the remaining notation is as in step 4.2);
4.4) The total residual of the algorithm is the sum of the photometric error E_pj, the IMU residual terms r_{p,q,v,ba,bg} and the GPS position residual term r_gps; the fusion factor graph is optimized with the minimum total residual as the optimization target;
4.5) A coarse IMU initialization step is carried out first: a sliding window of size 7 is maintained and the residual is computed against the optimization target; if the computed residual is smaller than a set value the algorithm enters the VIO-GPS (vision-IMU-GPS fusion) mode directly, otherwise this step is repeated. After IMU initialization, the IMU information is used to update the original visual marginalization prior, and the variables in the factor graph are re-linearized while all visual and most inertial information remains available;
4.6) Further, the marginalization of the sliding-window optimization of the fusion factor graph lags 4 key frames behind and then follows the marginalization order of the visual factor graph, so that the algorithm can use the older historical information for a more accurate optimization of the fusion factor graph and obtain a better pose result.
4.7) If, after optimizing for a period of time, the inverse depths of the optimization variables in the fusion factor graph still differ greatly in magnitude once they have converged to some degree, old key frames are discarded to reduce the influence of their inverse-depth noise; otherwise the algorithm continues normal real-time operation. Thereafter the algorithm runs in the multi-sensor fusion mode, continuously outputting in real time the motion trajectory of the object carrying it according to the input sensor data.
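Below is a sketch of how the IMU pre-integration residual of step 4.2) and the GPS residual of step 4.3) can be evaluated for one factor of the fusion factor graph. It assumes a VINS-style residual structure and w-first Hamilton quaternions; the gravity convention, the dictionary-based state layout and all names are assumptions of this sketch rather than details fixed by the patent.

import numpy as np

def q_mul(q1, q2):
    """Hamilton quaternion product, q = (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def q_conj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def q_to_R(q):
    w, x, y, z = q
    return np.array([[1-2*(y*y+z*z), 2*(x*y-w*z),   2*(x*z+w*y)],
                     [2*(x*y+w*z),   1-2*(x*x+z*z), 2*(y*z-w*x)],
                     [2*(x*z-w*y),   2*(y*z+w*x),   1-2*(x*x+y*y)]])

GRAVITY = np.array([0.0, 0.0, 9.8])     # assumed world-frame gravity convention

def imu_preint_residual(state_i, state_j, preint, dt):
    """15-dim residual of an IMU pre-integration factor between keyframes i and j.
    state = dict(p, q, v, ba, bg) in the world frame; preint = dict(alpha, beta, gamma)."""
    R_iw = q_to_R(state_i['q']).T                        # world -> body_i
    r_p = R_iw @ (state_j['p'] - state_i['p'] - state_i['v']*dt
                  + 0.5*GRAVITY*dt*dt) - preint['alpha']
    r_v = R_iw @ (state_j['v'] - state_i['v'] + GRAVITY*dt) - preint['beta']
    dq = q_mul(q_mul(q_conj(preint['gamma']), q_conj(state_i['q'])), state_j['q'])
    r_q = 2.0 * dq[1:]                                   # vector part of the error quaternion
    r_ba = state_j['ba'] - state_i['ba']
    r_bg = state_j['bg'] - state_i['bg']
    return np.concatenate([r_p, r_q, r_v, r_ba, r_bg])

def gps_residual(state_i, gps_enu):
    """GPS position factor: difference between the GPS fix (already in the local world/ENU
    frame, see step 1) and the position of the keyframe whose timestamps bracket it."""
    return gps_enu - state_i['p']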
The invention tracks the camera with a direct method that uses pixels with significant image gradients, and uses IMU pre-integration information as the initial pose, providing the initial value for direct-method tracking. Image frames are divided into key frames and non-key frames depending on whether tracking succeeds. A visual factor graph is constructed, IMU residual terms and GPS residual terms are inserted into it to build a new fusion factor graph, and fusion factor graph optimization then completes the optimization flow of the system algorithm. The algorithm also uses a back-end optimization strategy based on sliding-window optimization, which keeps the computational complexity of back-end pose optimization fixed; compared with the visual factor graph before the IMU and GPS are inserted, the fusion factor graph retains n pieces of older historical information and then performs lagged marginalization following the marginalization order of the visual factor graph. Because the algorithm is split into a visual factor graph and a fusion factor graph, it can keep a certain level of performance before visual-IMU initialization completes or when the IMU cannot work, operating in a purely visual mode. The algorithm can continue to operate in visual-inertial odometry (VIO) mode even if GPS signals are not received normally. When a single sensor is affected, the other sensors complement it, reflecting the robustness and reliability of the multi-sensor fusion positioning algorithm in unknown environments; the method has high practical value and wide application scenarios.
Compared with the prior art, the invention has the following advantages and beneficial results:
1. The invention uses a direct method based on photometric error, directly exploiting pixel-level photometric information (gradients) of the candidate points. Compared with feature-point methods it saves the relatively time-consuming extraction of feature descriptors, it also works normally in scenes with few distinct feature points, it has comparatively better real-time performance, and it improves computational efficiency. For the initial pose that the direct method requires, the multi-sensor fusion scheme supplies the pose obtained by IMU pre-integration, making the algorithm faster and more accurate than conventional schemes.
2. The method adopts a tightly coupled multi-sensor fusion strategy; tight coupling lets the raw image information improve the accuracy of the algorithm. At the same time, a lagged marginalization method, delayed by n frames, is applied to the fusion factor graph, so that the algorithm can complete initialization and optimization using older historical information; compared with conventional marginalization, the accuracy and robustness of the algorithm are better.
3. The invention adopts a key-frame selection strategy based on photometric error information: whether the current frame is a key frame is determined by conditions such as whether enough activated points with converged inverse depth remain in the factor graph and whether the exposure change is too large, which is closer to the algorithm itself than the conventional key-frame selection strategy of feature-point methods.
4. The invention adopts a GPS data fusion strategy, which can convert the motion trajectory computed by the algorithm into the east-north-up coordinate system, i.e. it can be related to specific real-world positions, making it closer to reality and improving the credibility of the motion trajectory.
Drawings
FIG. 1 is a schematic diagram of the visual factor graph in the optimization flow of the invention.
FIG. 2 is a schematic diagram of the fusion factor graph of the multi-sensor fusion optimization of the invention.
FIG. 3 is a flow chart of the direct-method-based multi-sensor fusion SLAM positioning method of the invention.
Detailed Description
The following examples will illustrate the invention in detail with reference to the accompanying drawings.
As shown in fig. 3, an embodiment of the direct-method-based multi-sensor fusion SLAM positioning method according to the present invention includes the following steps:
1) Acquiring image frames from a monocular camera, IMU output and GPS latitude/longitude/altitude; resolving the GPS measurements into coordinates in the current world coordinate system; aligning this information by timestamp into one structure and feeding it into the system;
2) Running the initialization flow of the visual part of the algorithm: using the structure data obtained in step 1), building an image pyramid for the first frame image, screening candidate gradient points with a gradient threshold, then projecting the candidate gradient points initialized in the first frame into the second frame, estimating the initial relative motion with the image pyramid, and initializing the inverse depths and the pose transformation from the IMU coordinate system to the world coordinate system. The two frames are optimized according to the photometric error function, and whether initialization is finished is judged from the displacement of the optimized result. The visual factor graph is shown schematically in fig. 1.
Said step 2) comprises the steps of:
2.1) During initialization, an image pyramid is built for the first of the two initial frames; the pyramid is traversed and the camera intrinsics of each layer are computed. The layer-1 image is divided into grids of size 32×32; the intensity differences dx and dy of the pixels in a grid along the x and y directions are computed, and their sum of squares is taken as the gradient

g = dx^2 + dy^2

The gradients within each pixel block are tallied and the median is selected as that block's gradient threshold for extracting candidate points.
2.2) In the layer-1 image, each n×n pixel block is traversed; within a block, a direction in the pixel coordinate system is randomly selected and the gradient dI of the pixels in the block is computed along it, and the pixel position p whose gradient is largest and exceeds the threshold is recorded as a candidate point. If too few pixels are extracted, the block size is reduced to 0.5n×0.5n and the sampling recurses; if too many, the block size is increased to 2n×2n and sampling is repeated, and so on recursively. The remaining pyramid layers are processed the same way, where a candidate must satisfy two conditions: (1) its gradient exceeds the set threshold; (2) it is the maximal point along the x, y, x+y and x-y directions in the pixel coordinate system. Finally the layers of the image pyramid are associated with each other.
2.3) The transformation quaternion aligning the current camera coordinate system with gravity is computed. From the IMU information acquired in step 1) and aligned with the image timestamps, acceleration data over a period of time are collected and averaged to give the gravity direction of the current frame. A fixed-size sliding window stores the per-frame gravity directions, a moving-average filter smooths the IMU gravity vector against the true gravity direction [0, 0, 9.8], and the rotation of the current frame's gravity alignment is computed (a gravity-alignment sketch is given after step 2.4) below).
2.4) Pyramid optimization of the second of the two initial frames completes initialization. The second frame builds a photometric error function from the points of the first frame:

E_pj = Σ_{p∈N_p} w_p ‖ (I_j[p'] - b_j) - (t_j·e^{a_j})/(t_i·e^{a_i}) · (I_i[p] - b_i) ‖_γ

where N_p is the set of pixels used by the algorithm, ‖·‖_γ is the Huber norm, p' is the projection onto frame j of the spatial point corresponding to pixel p of frame i, and t_j is the exposure time of frame j. To make the algorithm applicable to image sequences with unknown exposure time, the affine photometric model B = (I - b)/(t·e^{a}), with per-frame brightness parameters a and b, is used to normalize the brightness. Then, from the top layer of the image pyramid down to the bottom layer, each layer is optimized by the Gauss-Newton method with the identity matrix as the initial value, and whether the displacement is large enough is judged. If the displacement is large enough and more than 5 frames have been tracked, initialization finishes: the first frame is set as a key frame, the gradient points extracted from it are activated, and they are added to the sliding-window optimization of the visual factor graph; otherwise steps 2.1) to 2.4) are repeated.
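A small sketch of the gravity alignment of step 2.3): the accelerometer samples aligned with the current image are averaged, a fixed-size sliding window of per-frame gravity directions is filtered by a moving average, and a quaternion rotating the measured gravity onto the reference direction [0, 0, 9.8] is computed. The shortest-arc quaternion construction and the window size of 10 are illustrative assumptions of this sketch.

import numpy as np

def gravity_align_quaternion(accel_window, g_ref=np.array([0.0, 0.0, 9.8])):
    """Quaternion (w-first) that aligns the averaged accelerometer reading of the current
    frame with the reference gravity direction, as in step 2.3."""
    g_meas = np.mean(np.asarray(accel_window, dtype=float), axis=0)
    a = g_meas / np.linalg.norm(g_meas)
    b = g_ref / np.linalg.norm(g_ref)
    v = np.cross(a, b)                     # rotation axis (unnormalized)
    c = float(np.dot(a, b))                # cosine of the rotation angle
    if c < -0.999999:                      # vectors nearly opposite: pick any orthogonal axis
        axis = np.cross(a, np.array([1.0, 0.0, 0.0]))
        if np.linalg.norm(axis) < 1e-6:
            axis = np.cross(a, np.array([0.0, 1.0, 0.0]))
        axis /= np.linalg.norm(axis)
        return np.array([0.0, *axis])      # 180-degree rotation
    q = np.array([1.0 + c, *v])            # shortest-arc construction
    return q / np.linalg.norm(q)

class GravityWindow:
    """Fixed-size sliding window of per-frame gravity directions with a moving-average filter."""
    def __init__(self, size=10):
        self.size, self.dirs = size, []
    def push(self, frame_gravity):
        self.dirs.append(np.asarray(frame_gravity, dtype=float))
        if len(self.dirs) > self.size:
            self.dirs.pop(0)
        return np.mean(self.dirs, axis=0)  # filtered gravity used for the alignment above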
3) Running the image tracking flow of the algorithm: the pose between the two frames obtained by IMU pre-integration is used as the initial pose, photometric-error-based optimization of the two consecutive images is performed with the Gauss-Newton method, and the camera motion is tracked. During tracking, whether the current frame becomes a new key frame is decided from three set conditions: (1) the system has just started and only 1 frame is stored; (2) the translation, rotation or photometric change ratio of the pixels of the current frame is too large; (3) the time since the last key frame exceeds the maximum key frame interval.
a. If the current frame is determined to be used as a key frame, activating the gradient points extracted from the current frame, namely using a depth filter to converge inverse depth optimization calculation of the gradient points, adding the key frame into a visual factor graph to perform sliding window optimization, and performing marginalization by adopting a Shull complement;
b. if the current frame is chosen as a non-key frame, only the pose estimate between the current and previous frames is used to update the inverse depths; the system then operates in a "vision only" mode. Said step 3) comprises the steps of:
3.1) Entering the tracking flow of the algorithm. The IMU data acquired in step 1) and aligned with the image timestamps are pre-integrated to obtain the pose change between the two frames, the pre-integration terms being

α^{b_k}_{b_{k+1}} = ∬_{t∈[t_k, t_{k+1}]} R^{b_k}_t (â_t - b_{a_t}) dt^2

β^{b_k}_{b_{k+1}} = ∫_{t∈[t_k, t_{k+1}]} R^{b_k}_t (â_t - b_{a_t}) dt

γ^{b_k}_{b_{k+1}} = ∫_{t∈[t_k, t_{k+1}]} (1/2) Ω(ω̂_t - b_{g_t}) γ^{b_k}_t dt

where α, β and γ are the pre-integrated position, velocity and rotation terms describing the pose change of body frame b_{k+1} relative to b_k, â_t and ω̂_t are the accelerometer and gyroscope measurements, and b_{a_t} and b_{g_t} are their biases. The algorithm takes the pose obtained by IMU pre-integration as the initial pose of the direct method.
3.2) The points of the reference frame are projected into the current frame with the initial pose and a photometric error function is constructed. With the minimum photometric error as the optimization target, iterative optimization of the candidate points starts from the top layer of the image pyramid downwards using the Levenberg-Marquardt algorithm, i.e. the inverse depth, photometric parameters and pose are optimized coarse-to-fine. Several different handling cases exist: (1) if the photometric error exceeds 60% of the energy threshold, the threshold is enlarged and optimization is re-run; (2) if the energy of some pyramid layer is more than 2 times the minimum, tracking is directly judged to have failed; (3) if the photometric error satisfies the condition, tracking succeeds.
3.3) Whether the current frame can become a key frame is judged from three preset conditions: (1) the system has just started and only 1 frame is stored; (2) the translation, rotation or photometric change ratio of the pixels of the current frame is too large; (3) the time since the last key frame exceeds the maximum key frame interval.
3.4) If the current frame becomes a non-key frame, the inverse-depth ranges of the candidate points of step 2.2) are updated. The epipolar search range obtained from the inverse-depth range is used to compute the image gradient and the angle it forms with the epipolar line. If the angle is too large, the gradient and the epipolar line tend to be perpendicular, the result error is considered large, and the candidate-point parameters are not updated. When the angle is suitable, the position p_1 with the smallest photometric error and the second-smallest position p_2 at a radius greater than 2 are searched along the epipolar line with a fixed step (an epipolar-search sketch is given after step 3.6) below), and the ratio of the latter's error to the former's,

q = E(p_2) / E(p_1)

is taken as a quality attribute, a larger quality meaning a better position. On this basis a finer search is performed: Gauss-Newton optimization continues along the epipolar line until the increment is small enough, which gives the optimal projected position, from which the new inverse-depth range is computed.
3.5) If the current frame is to become a key frame, key frames in the key-frame sequence satisfying the following 3 conditions are first marginalized: (1) their photometric parameters have changed greatly compared with the reference frame; (2) there are more than 7 key frames and the frame is farthest from the current frame; (3) fewer than 5% of its candidate points are activated. The points and outliers of the marginalized frames are removed, the current frame is added to the sequence, and the residuals between the new key frame and all key frames in the sequence are constructed to obtain an error function. A distance map is then built in the new key frame; when a candidate point satisfies the conditions that its epipolar search distance is less than 8, its quality attribute of step 3.4) is good and its inverse depth is positive, it is selected and activated, i.e. its inverse depth is optimized with a depth filter, and after activation the point is added to the error function in preparation for photometric-error computation.
3.6) Sliding-window optimization of the activated points of the visual factor graph starts according to the photometric error function; the pose, photometric parameters and inverse depths are optimized with the Gauss-Newton method, and the optimization variables are differentiated:
(1) Derivatives with respect to the photometric parameters:

∂r/∂a_j = -(t_j·e^{a_j})/(t_i·e^{a_i})·(I_i[p] - b_i),    ∂r/∂b_j = -1

where w_p is the Huber weight applied to each residual r, and a and b are the parameters of the photometric affine transformation.

(2) Derivative with respect to the relative pose ξ_ji:

∂r/∂ξ_ji = [g_x  g_y] · ∂p'/∂ξ_ji

∂p'/∂ξ_ji = | f_x/Z'   0        -f_x·x'/Z'   -f_x·x'·y'      f_x·(1+x'^2)   -f_x·y' |
            | 0        f_y/Z'   -f_y·y'/Z'   -f_y·(1+y'^2)   f_y·x'·y'       f_y·x'  |

where g_x and g_y are the horizontal and vertical image gradients at the projected point p_j, (x', y') = (X'/Z', Y'/Z') are its normalized coordinates, and f_x and f_y are camera intrinsics;

(3) Derivative with respect to the inverse depth:

∂r/∂ρ = [g_x  g_y] · (1/(ρ·Z')) · [ f_x·(t_x - x'·t_z) ;  f_y·(t_y - y'·t_z) ]

where ρ is the inverse depth and t_z is the translation of the camera in the z direction;

After the Jacobian of each optimization variable is obtained, the Hessian matrix can be constructed for Gauss-Newton optimization to obtain a more accurate pose. After step 3), the vision module of the algorithm can run independently and provides the basis for the subsequent multi-sensor fusion.
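A sketch of the fixed-step epipolar search and quality ratio of step 3.4) follows. A single-pixel absolute intensity difference stands in for the photometric error, and the step length and the 2-pixel separation radius mirror the values in the text; the subsequent Gauss-Newton refinement along the line is only indicated by a comment, and all names are assumptions of this sketch.

import numpy as np

def search_epipolar_segment(I_ref, I_cur, p_ref, line_start, line_end, step=1.0, min_sep=2.0):
    """Walk the epipolar segment implied by the current inverse-depth interval with a fixed
    step; return the best position along the segment and the quality score
    quality = (second-best error) / (best error), where the second-best sample must lie more
    than `min_sep` pixels from the best one (larger quality is better)."""
    p_ref = np.asarray(p_ref, float)
    a, b = np.asarray(line_start, float), np.asarray(line_end, float)
    length = np.linalg.norm(b - a)
    direction = (b - a) / max(length, 1e-9)
    ref_val = float(I_ref[int(p_ref[1]), int(p_ref[0])])
    samples = []
    for s in np.arange(0.0, length + 1e-9, step):
        q = a + s * direction
        u, v = int(round(q[0])), int(round(q[1]))
        if 0 <= v < I_cur.shape[0] and 0 <= u < I_cur.shape[1]:
            samples.append((s, abs(float(I_cur[v, u]) - ref_val)))
    if not samples:
        return None
    samples.sort(key=lambda t: t[1])
    best_s, best_err = samples[0]
    second_err = next((e for s, e in samples[1:] if abs(s - best_s) > min_sep), None)
    quality = (second_err / best_err) if (second_err is not None and best_err > 0) else float('inf')
    # a Gauss-Newton refinement along the line (step 3.4) would start from best_s here
    return best_s, quality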
4) Running the vision-IMU-GPS data fusion flow of the algorithm. Based on the visual factor graph of step 3), the factor graph is copied as a fusion factor graph, IMU pre-integration factors and GPS factors are added, and joint optimization is performed with lagged (hysteresis) marginalization. The fusion factor graph also uses sliding-window optimization, but its marginalization strategy is delayed by n key frames and then follows the marginalization order of the visual factor graph, so that the sliding-window size is maintained. Lagged marginalization guarantees accuracy while compensating for the drawbacks of marginalization and keeping its advantages. At this point the system enters the "vision-IMU-GPS" mode of operation. The fusion factor graph of the multi-sensor fusion optimization is shown schematically in fig. 2.
Said step 4) comprises the steps of:
4.1) Based on the visual factor graph of step 3), the factor graph is copied, IMU pre-integration residual factors and GPS residual factors are added to form the fusion factor graph, and joint sliding-window optimization is performed with lagged (hysteresis) marginalization.
4.2) The added IMU pre-integration residual factor is

r_p  = R^{b_i}_w (p^w_{b_j} - p^w_{b_i} - v^w_{b_i}·Δt + (1/2)·g^w·Δt^2) - α̂^{b_i}_{b_j}
r_q  = 2·[ (γ̂^{b_i}_{b_j})^{-1} ⊗ (q^w_{b_i})^{-1} ⊗ q^w_{b_j} ]_{xyz}
r_v  = R^{b_i}_w (v^w_{b_j} - v^w_{b_i} + g^w·Δt) - β̂^{b_i}_{b_j}
r_ba = b_{a_j} - b_{a_i}
r_bg = b_{g_j} - b_{g_i}

where r_p, r_q, r_v, r_ba and r_bg are the residual terms corresponding to position p, attitude q, velocity v, IMU accelerometer bias ba and gyroscope bias bg; p^w_{b_j} is the position of the IMU coordinate system relative to the world coordinate system at frame j; q^w_{b_i} is the rotation from the IMU coordinate system to the world coordinate system at frame i (with rotation matrix R^w_{b_i}, whose inverse is R^{b_i}_w); b_{a_i} is the acceleration bias at frame i; ⊗ denotes quaternion multiplication; α̂, γ̂ and β̂ are the measured (pre-integrated) position, attitude and velocity; g^w is the gravity vector and Δt the time between the two frames.
4.3) The added GPS residual factor is

r_gps = p̂^w_{GPS,k} - p^w_{b_i}

where the timestamp k ∈ (i, j) of the GPS datum lies between the image timestamps of the two neighbouring frames i and j, and p̂^w_{GPS,k} - p^w_{b_i} is the difference between the GPS position with timestamp k and the frame-i position; the remaining notation is the same as in step 4.2).
4.4) The total residual of the algorithm is the sum of the photometric error E_pj, the IMU residual terms r_{p,q,v,ba,bg} and the GPS position residual term r_gps; the fusion factor graph is optimized with the minimum total residual as the optimization target.
4.5) The fusion factor graph first carries out a coarse IMU initialization step: a sliding window of size 7 is maintained and the residual is computed against the optimization target; if the computed residual is smaller than a set value the algorithm enters the VIO-GPS (vision-IMU-GPS fusion) mode directly, otherwise this step is repeated. After IMU initialization, the IMU information is used to update the original visual marginalization prior, and the variables in the factor graph are re-linearized while all visual and most inertial information remains available.
4.6) Further, the marginalization of the sliding-window optimization of the fusion factor graph lags 4 key frames behind and then follows the marginalization order of the visual factor graph, so that the algorithm can use the older historical information for a more accurate optimization of the fusion factor graph and obtain a better pose result (a window-bookkeeping sketch is given after step 4.7) below).
4.7) If, after optimizing for a period of time, the inverse depths of the optimization variables in the fusion factor graph still differ greatly in magnitude once they have converged to some degree, old key frames are discarded to reduce the influence of their inverse-depth noise; otherwise the algorithm continues normal real-time operation. Thereafter the algorithm runs in the multi-sensor fusion mode, continuously outputting in real time the motion trajectory of the object carrying it according to the input sensor data.
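Finally, a bookkeeping sketch of the lagged (hysteresis) marginalization of steps 4.5) to 4.7): the fusion factor graph keeps a few more key frames than the visual factor graph and marginalizes them later, in the same order the visual graph used. The window size of 7 and the lag of 4 key frames mirror the values in the text; the class interface and names are assumptions of this sketch.

from collections import deque

class LaggedMarginalization:
    """Tracks which keyframes the fusion factor graph still holds and which ones should be
    marginalized, lagging behind the visual factor graph's marginalization order."""
    def __init__(self, visual_window=7, lag=4):
        self.visual_window = visual_window
        self.lag = lag
        self.fused_keyframes = deque()          # keyframe ids currently in the fused graph
        self.visual_marg_order = deque()        # ids already marginalized by the visual graph

    def on_new_keyframe(self, kf_id, visually_marginalized_id=None):
        """Called once per new keyframe. `visually_marginalized_id` is the keyframe the visual
        factor graph chose to marginalize at this step (None if it kept all of them)."""
        self.fused_keyframes.append(kf_id)
        if visually_marginalized_id is not None:
            self.visual_marg_order.append(visually_marginalized_id)
        to_marginalize = []
        # marginalize from the fused graph only once it holds `lag` more frames than the
        # visual window, always in the order the visual graph used
        while (len(self.fused_keyframes) > self.visual_window + self.lag
               and self.visual_marg_order):
            victim = self.visual_marg_order.popleft()
            if victim in self.fused_keyframes:
                self.fused_keyframes.remove(victim)
                to_marginalize.append(victim)   # Schur-complement these states out of the Hessian
        return to_marginalize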
In summary, the algorithm of the invention uses a direct method based on photometric error, directly exploiting the pixel-level photometric information (gradients) of the candidate points, which gives comparatively better real-time performance and improves computational efficiency. For the initial pose that the direct method requires, the tightly coupled multi-sensor fusion scheme supplies the pose obtained by IMU pre-integration, reducing the influence of noise; tight coupling lets the raw image information improve the accuracy of the algorithm; and the GPS data fusion strategy relates the trajectory to specific real-world positions, improving its credibility. In addition, a lagged marginalization method delayed by n frames is applied to the fusion factor graph, so the algorithm can complete initialization and optimization using older historical information, giving better accuracy and robustness than conventional marginalization. Furthermore, whether the current frame is a key frame is determined by conditions such as whether enough activated points with converged inverse depth remain in the factor graph and whether the exposure change is too large, which is closer to the algorithm itself than the conventional key-frame selection strategy of feature-point methods and therefore works better.

Claims (4)

1. A direct-method-based multi-sensor fusion SLAM positioning method, characterized by comprising the following steps:
1) Acquiring image frames from a monocular camera, IMU output and GPS latitude/longitude/altitude; resolving the GPS latitude/longitude/altitude into coordinates in the current world coordinate system; aligning this information by timestamp into one structure and feeding it into the system;
2) Running the initialization flow of the visual part of the algorithm: using the structure data obtained in step 1), building an image pyramid for the first frame image, screening candidate gradient points with a gradient threshold, projecting the candidate gradient points initialized in the first frame into the second frame, estimating the initial relative motion with the image pyramid, and initializing the inverse depths and the pose transformation from the IMU coordinate system to the world coordinate system; optimizing the two frames according to the photometric error function, and judging from the displacement of the optimized result whether initialization is finished;
3) Running the image tracking flow of the algorithm: using the pose between two frames obtained by IMU pre-integration as the initial pose, performing photometric-error-based optimization on the two consecutive images with the Gauss-Newton method, and tracking the camera motion; during tracking, whether the current frame becomes a new key frame is decided from three set conditions: (1) the system has just started and only 1 frame is stored; (2) the translation, rotation or photometric change ratio of the pixels of the current frame is too large; (3) the time since the last key frame exceeds the maximum key frame interval;
a. if the current frame is chosen as a key frame, activating the gradient points extracted from it, i.e. using a depth filter to converge the inverse-depth optimization of the gradient points, adding the key frame into the visual factor graph for sliding-window optimization, and performing marginalization with the Schur complement;
b. if the current frame is chosen as a non-key frame, using only the pose estimate between the current and previous frames to update the inverse depths, the system operating in a "vision only" mode;
4) Running the vision-IMU-GPS data fusion flow of the algorithm: based on the visual factor graph of step 3), copying the factor graph as a fusion factor graph, adding IMU pre-integration factors and GPS factors, and performing joint optimization with lagged (hysteresis) marginalization; the fusion factor graph also uses sliding-window optimization, but its marginalization strategy is delayed by n key frames and then follows the marginalization order of the visual factor graph, so that the sliding-window size is maintained; lagged marginalization guarantees accuracy while compensating for the drawbacks of marginalization and keeping its advantages, and the system enters the "vision-IMU-GPS fusion" mode of operation.
2. The direct-method-based multi-sensor fusion SLAM positioning method according to claim 1, wherein in step 2) the initialization flow of the visual part of the algorithm specifically comprises the following steps:
2.1) During initialization, an image pyramid is built for the first of the two initial frames; the pyramid is traversed and the camera intrinsics of each layer are computed; the layer-1 image is divided into grids of size 32×32, the intensity differences dx and dy of the pixels in a grid along the x and y directions are computed, and their sum of squares is taken as the gradient

g = dx^2 + dy^2

the gradients within each pixel block are tallied and the median is selected as that block's gradient threshold for extracting candidate points;
2.2) In the layer-1 image, each n×n pixel block is traversed; within a block, a direction in the pixel coordinate system is randomly selected and the gradient dI of the pixels in the block is computed along it, and the pixel position p whose gradient is largest and exceeds the threshold is recorded as a candidate point; if too few pixels are extracted, the block size is reduced to 0.5n×0.5n and the sampling recurses; if too many, the block size is increased to 2n×2n and sampling is repeated, and so on recursively; the remaining pyramid layers are processed the same way, where a candidate must satisfy two conditions: (1) its gradient exceeds the set threshold; (2) it is the maximal point along the x, y, x+y and x-y directions in the pixel coordinate system; finally the layers of the image pyramid are associated with each other;
2.3) The transformation quaternion aligning the current camera coordinate system with gravity is computed; from the IMU information acquired in step 1) and aligned with the image timestamps, acceleration data over a period of time are collected and averaged to give the gravity direction of the current frame; a fixed-size sliding window stores the per-frame gravity directions, a moving-average filter smooths the IMU gravity vector against the true gravity direction [0, 0, 9.8], and the rotation of the current frame's gravity alignment is computed;
2.4) Pyramid optimization of the second of the two initial frames completes initialization; the second frame builds a photometric error function from the points of the first frame:

E_pj = Σ_{p∈N_p} w_p ‖ (I_j[p'] - b_j) - (t_j·e^{a_j})/(t_i·e^{a_i}) · (I_i[p] - b_i) ‖_γ

where N_p is the set of pixels used by the algorithm, ‖·‖_γ is the Huber norm, p' is the projection onto frame j of the spatial point corresponding to pixel p of frame i, and t_j is the exposure time of frame j; to make the algorithm applicable to image sequences with unknown exposure time, the affine photometric model B = (I - b)/(t·e^{a}), with per-frame brightness parameters a and b, is used to normalize the brightness; then, from the top layer of the image pyramid down to the bottom layer, each layer is optimized by the Gauss-Newton method with the identity matrix as the initial value, and whether the displacement is large enough is judged; if the displacement is large enough and more than 5 frames have been tracked, initialization finishes, the first frame is set as a key frame, the gradient points extracted from it are activated, and they are added to the sliding-window optimization of the visual factor graph; otherwise steps 2.1) to 2.4) are repeated.
3. The direct-method-based multi-sensor fusion SLAM positioning method according to claim 1, wherein in step 3) the image tracking flow of the algorithm specifically comprises the following steps:
3.1) Entering the tracking flow of the algorithm: the IMU data acquired in step 1) and aligned with the image timestamps are pre-integrated to obtain the pose change between the two frames, the pre-integration terms being

α^{b_k}_{b_{k+1}} = ∬_{t∈[t_k, t_{k+1}]} R^{b_k}_t (â_t - b_{a_t}) dt^2

β^{b_k}_{b_{k+1}} = ∫_{t∈[t_k, t_{k+1}]} R^{b_k}_t (â_t - b_{a_t}) dt

γ^{b_k}_{b_{k+1}} = ∫_{t∈[t_k, t_{k+1}]} (1/2) Ω(ω̂_t - b_{g_t}) γ^{b_k}_t dt

where α, β and γ are the pre-integrated position, velocity and rotation terms describing the pose change of body frame b_{k+1} relative to b_k, â_t and ω̂_t are the accelerometer and gyroscope measurements, and b_{a_t} and b_{g_t} are their biases; the algorithm takes the pose obtained by IMU pre-integration as the initial pose of the direct method;
3.2) The points of the reference frame are projected into the current frame with the initial pose and a photometric error function is constructed; with the minimum photometric error as the optimization target, iterative optimization of the candidate points starts from the top layer of the image pyramid downwards using the Levenberg-Marquardt algorithm, i.e. the inverse depth, photometric parameters and pose are optimized coarse-to-fine; several different handling cases exist: (1) if the photometric error exceeds 60% of the energy threshold, the threshold is enlarged and optimization is re-run; (2) if the energy of some pyramid layer is more than 2 times the minimum, tracking is directly judged to have failed; (3) if the photometric error satisfies the condition, tracking succeeds;
3.3) Whether the current frame can become a key frame is judged from three preset conditions: (1) the system has just started and only 1 frame is stored; (2) the translation, rotation or photometric change ratio of the pixels of the current frame is too large; (3) the time since the last key frame exceeds the maximum key frame interval;
3.4) If the current frame becomes a non-keyframe, the inverse depth range of the candidate points from step 2.2) is updated; an epipolar search range is derived from the inverse depth range, and the angle between the image gradient and the epipolar line is computed; if this angle is too large the two directions are close to perpendicular and the result is considered too error-prone, so the candidate point parameters are not updated; when the angle is suitable, the position p_1 with the smallest photometric error is searched along the epipolar line at a fixed step, together with the second-smallest position p_2 at a radius greater than 2; dividing the latter error by the former gives the ratio

q = E(p_2) / E(p_1)

which serves as a quality attribute: the larger the quality, the better the position; a further fine search is then performed, optimizing along the epipolar line with the Gauss-Newton method until the increment is small enough, which yields the optimal projection position, and a new inverse depth range is computed from it;
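A sketch of the discrete epipolar search and the quality ratio; the Gauss-Newton sub-pixel refinement is omitted, and err(x) is assumed to be a caller-supplied function returning the photometric error of placing the projection at position x on the epipolar line.

```python
def epipolar_search(positions, err, min_radius=2.0):
    """positions: candidate (u, v) points sampled at a fixed step along the epipolar line.
    err: callable returning the photometric error at a candidate position.
    Returns (best position, quality ratio), where quality = second-best error / best error
    and the second-best is taken at least min_radius away from the best position."""
    errors = [err(x) for x in positions]
    best_idx = min(range(len(positions)), key=lambda i: errors[i])
    p1, e1 = positions[best_idx], errors[best_idx]

    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    # second-smallest error at a position farther than min_radius from the best one
    far = [i for i in range(len(positions)) if dist(positions[i], p1) > min_radius]
    if not far:
        return p1, float('inf')
    e2 = min(errors[i] for i in far)
    quality = e2 / e1 if e1 > 0 else float('inf')
    return p1, quality
```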
3.5) If the current frame needs to become a keyframe, the keyframes in the keyframe sequence satisfying the following three conditions are first marginalized: (1) the change relative to the photometric parameters of the reference frame is large; (2) there are more than 7 keyframes and the keyframe is the one farthest from the current frame; (3) less than 5% of its candidate points have been activated; the points and outliers of the marginalized frames are deleted, the current frame is added to the sequence, and residuals between the new keyframe and all keyframes in the sequence are constructed to obtain an error function; a distance map is constructed in the new keyframe, and candidate points whose epipolar search distance is less than 8, whose quality attribute from step 3.4) is good and whose inverse depth is positive are selected and activated, i.e. their inverse depth is optimized with a depth filter; after activation, the activated points are added to the error function in preparation for computing photometric errors;
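A minimal sketch of selecting keyframes to marginalize under these conditions; the dictionary fields and the affine-change threshold are illustrative assumptions, not values from the patent.

```python
def select_frames_to_marginalize(keyframes, max_keyframes=7, min_active_ratio=0.05,
                                 affine_change_thresh=0.7):
    """keyframes: list of dicts with illustrative fields 'affine_change' (photometric
    parameter change w.r.t. the reference frame), 'num_activated', 'num_candidates'
    and 'dist_to_current' (distance to the current frame).
    Returns the keyframes to marginalize before the current frame is inserted."""
    drop = []
    for kf in keyframes:
        # (1) photometric parameters changed strongly compared with the reference frame
        if kf['affine_change'] > affine_change_thresh:
            drop.append(kf)
        # (3) fewer than 5% of its candidate points were ever activated
        elif kf['num_activated'] < min_active_ratio * kf['num_candidates']:
            drop.append(kf)
    remaining = [kf for kf in keyframes if kf not in drop]
    # (2) if more than max_keyframes remain, drop the one farthest from the current frame
    if len(remaining) > max_keyframes:
        drop.append(max(remaining, key=lambda kf: kf['dist_to_current']))
    return drop
```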
3.6) Starting the sliding-window optimization of the visual factor graph over the activated points according to the photometric error function, optimizing pose, photometric parameters and inverse depth with the Gauss-Newton method; the derivatives with respect to the optimization variables are:
(1) derivatives with respect to the photometric parameters (given as image formulas in the original), where w_p is the Huber weight and a and b are the parameters of the photometric affine transformation;
(2) derivative with respect to the relative pose (given as an image formula in the original), in which the terms include the image gradients of the pixel at p_j in the horizontal and vertical directions, the normalized coordinates, and the camera intrinsics f_x and f_y;
(3) derivative with respect to the inverse depth (given as an image formula in the original), where ρ is the inverse depth and t_z is the translation of the camera in the z direction;
after the Jacobian of each optimization variable is obtained, the Hessian matrix is constructed for the Gauss-Newton optimization so as to obtain a more accurate pose.
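A generic sketch of this Gauss-Newton step: the Hessian approximation H = JᵀWJ and the gradient b = −JᵀWr are accumulated from the stacked Jacobians and the normal equations are solved for the state update. This shows only the structure; the patent's specific ordering of pose, photometric and inverse-depth variables is not reproduced here.

```python
import numpy as np

def gauss_newton_step(residuals, jacobians, weights):
    """residuals: list of scalar residuals r_k; jacobians: list of (d,) arrays dr_k/dx;
    weights: list of robust (e.g. Huber) weights w_k. Returns the update dx."""
    d = len(jacobians[0])
    H = np.zeros((d, d))
    b = np.zeros(d)
    for r, J, w in zip(residuals, jacobians, weights):
        J = np.asarray(J, dtype=float)
        H += w * np.outer(J, J)        # accumulate the Hessian approximation J^T W J
        b += -w * J * r                # accumulate the gradient -J^T W r
    return np.linalg.solve(H, b)       # solve the normal equations H dx = b

# usage: three scalar residuals over a 3-parameter toy state
dx = gauss_newton_step([0.5, -0.2, 0.1],
                       [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [2.0, 1.0, 0.0]],
                       [1.0, 0.8, 1.0])
```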
4. The direct method-based multi-sensor fusion SLAM positioning method as set forth in claim 1, wherein in step 4), said algorithmic vision-IMU-GPS data fusion process specifically includes the steps of:
4.1) Copying the visual factor graph from step 3), adding an IMU pre-integration residual factor and a GPS residual factor to form the fusion factor graph, and performing joint sliding-window optimization with lagged marginalization;
4.2 An IMU pre-integration residual factor added is:
r_p = R_{b_i}^{w\,-1} \left( p_{b_j}^w - p_{b_i}^w + \tfrac{1}{2} g^w \Delta t^2 - v_{b_i}^w \Delta t \right) - \hat{\alpha}_{b_j}^{b_i}

r_q = 2 \left[ \left( \hat{\gamma}_{b_j}^{b_i} \right)^{-1} \otimes \left( q_{b_i}^w \right)^{-1} \otimes q_{b_j}^w \right]_{xyz}

r_v = R_{b_i}^{w\,-1} \left( v_{b_j}^w + g^w \Delta t - v_{b_i}^w \right) - \hat{\beta}_{b_j}^{b_i}

r_{ba} = b_{a_j} - b_{a_i}, \qquad r_{bg} = b_{g_j} - b_{g_i}

where r denotes the residual terms corresponding to the position p, attitude q, velocity v, IMU accelerometer bias ba and gyroscope bias bg respectively, p_{b_j}^w is the position of the IMU coordinate system relative to the world coordinate system at frame j, q_{b_i}^w (with rotation matrix R_{b_i}^w) is the rotation from the IMU coordinate system to the world coordinate system at frame i, b_{a_i} is the accelerometer bias at frame i, ⊗ denotes quaternion multiplication, g^w is the gravity vector in the world frame, Δt is the time between frames i and j, and \hat{\alpha}_{b_j}^{b_i}, \hat{\gamma}_{b_j}^{b_i}, \hat{\beta}_{b_j}^{b_i} are the measured (pre-integrated) position, attitude and velocity;
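A minimal sketch of evaluating this residual from the two frame states and the pre-integrated measurements, assuming a (w, x, y, z) quaternion convention and an illustrative dictionary layout for the per-frame state.

```python
import numpy as np

def quat_mul(q, r):
    w0, x0, y0, z0 = q
    w1, x1, y1, z1 = r
    return np.array([w0*w1 - x0*x1 - y0*y1 - z0*z1,
                     w0*x1 + x0*w1 + y0*z1 - z0*y1,
                     w0*y1 - x0*z1 + y0*w1 + z0*x1,
                     w0*z1 + x0*y1 - y0*x1 + z0*w1])

def quat_conj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def quat_rotate(q, v):
    return quat_mul(quat_mul(q, np.concatenate(([0.0], v))), quat_conj(q))[1:]

def imu_residual(si, sj, alpha, beta, gamma, g_w, dt):
    """si, sj: dicts with 'p', 'q' (world<-body quaternion), 'v', 'ba', 'bg' for
    frames i and j; alpha/beta/gamma: pre-integrated position/velocity/rotation."""
    q_iw = quat_conj(si['q'])                                    # world -> body_i rotation
    r_p = quat_rotate(q_iw, sj['p'] - si['p'] + 0.5 * g_w * dt * dt
                      - si['v'] * dt) - alpha
    r_v = quat_rotate(q_iw, sj['v'] + g_w * dt - si['v']) - beta
    err_q = quat_mul(quat_mul(quat_conj(gamma), q_iw), sj['q'])
    r_q = 2.0 * err_q[1:]                                        # vector part of the error quaternion
    r_ba = sj['ba'] - si['ba']
    r_bg = sj['bg'] - si['bg']
    return np.concatenate([r_p, r_q, r_v, r_ba, r_bg])           # 15-dimensional residual
```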
4.3 The added GPS residual factor is:
r_{g_k} = \hat{p}_{g_k}^w - p_{b_i}^w

where the timestamp k ∈ (i, j) of the GPS datum lies between the timestamps of the preceding and following image frames, \hat{p}_{g_k}^w is the GPS position at timestamp k, so the residual is its difference from the frame-i position; the remaining symbols are as in step 4.2);
4.4) The total residual of the algorithm is the sum of the photometric error E_{pj}, the IMU residual terms r_p, r_q, r_v, r_{ba}, r_{bg} and the GPS position residual term r_{g_k}; the fusion factor graph is optimized with the minimum total residual as the optimization target;
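A sketch of how such a total residual can be accumulated over the fusion factor graph, assuming the IMU and GPS residual blocks are weighted by information matrices; containers and argument names are illustrative.

```python
import numpy as np

def total_cost(photometric_terms, imu_terms, gps_terms):
    """photometric_terms: iterable of scalar energies E_pj.
    imu_terms / gps_terms: iterables of (residual vector, information matrix) pairs."""
    cost = float(sum(photometric_terms))
    for r, info in imu_terms:
        cost += float(r @ info @ r)      # Mahalanobis norm of the IMU residual
    for r, info in gps_terms:
        cost += float(r @ info @ r)      # Mahalanobis norm of the GPS position residual
    return cost

# usage with one IMU block and one GPS block
c = total_cost([12.3, 8.1],
               [(np.zeros(15), np.eye(15))],
               [(np.array([0.2, -0.1, 0.05]), np.eye(3))])
```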
4.5) First, a coarse IMU initialization step is carried out: a sliding window of size 7 is maintained and the residuals are computed according to the optimization target; if the computed residuals are smaller than a set value, the algorithm directly enters the VIO-GPS mode, i.e. vision-IMU-GPS fusion, otherwise the step is repeated; after IMU initialization, the initial visual marginalization prior is updated with the IMU information, re-linearizing the variables in the factor graph while retaining all visual and most inertial information;
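A minimal sketch of this coarse-initialization gate, using the window size of 7 from the claim; the residual callback and the threshold value are illustrative assumptions.

```python
def coarse_imu_initialization(window, residual_of, threshold, window_size=7):
    """window: list of current sliding-window states (kept at `window_size` keyframes).
    residual_of: callable evaluating the fusion optimization target on the window.
    Returns True once the algorithm may switch to the VIO-GPS (vision-IMU-GPS) mode."""
    if len(window) < window_size:
        return False                      # keep collecting keyframes and repeat the step
    return residual_of(window) < threshold

# usage (illustrative): mode = 'VIO-GPS' if coarse_imu_initialization(states, cost_fn, 0.1) else 'VO'
```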
4.6) Marginalization in the sliding-window optimization of the fusion factor graph lags behind by 4 keyframes, and the marginalization operation follows the marginalization order of the visual factor graph, so that the algorithm can exploit the older information for a more accurate optimization of the fusion factor graph and obtain a better pose result;
4.7) After optimizing for a period of time, once the inverse depths have converged to some degree, if the magnitudes of the inverse depths among the optimized variables in the fusion factor graph still differ strongly, the old keyframes are discarded to reduce the influence of their inverse-depth noise; otherwise the algorithm continues normal real-time operation; thereafter the algorithm runs in the multi-sensor fusion operating mode and continuously outputs, in real time, the motion trajectory of the object carrying the sensors from the incoming sensor data.
CN202310470533.5A 2023-04-27 2023-04-27 Direct method-based multi-sensor fusion SLAM positioning method Pending CN116242374A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310470533.5A CN116242374A (en) 2023-04-27 2023-04-27 Direct method-based multi-sensor fusion SLAM positioning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310470533.5A CN116242374A (en) 2023-04-27 2023-04-27 Direct method-based multi-sensor fusion SLAM positioning method

Publications (1)

Publication Number Publication Date
CN116242374A true CN116242374A (en) 2023-06-09

Family

ID=86624542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310470533.5A Pending CN116242374A (en) 2023-04-27 2023-04-27 Direct method-based multi-sensor fusion SLAM positioning method

Country Status (1)

Country Link
CN (1) CN116242374A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116625380A (en) * 2023-07-26 2023-08-22 广东工业大学 Path planning method and system based on machine learning and SLAM
CN116625380B (en) * 2023-07-26 2023-09-29 广东工业大学 Path planning method and system based on machine learning and SLAM
CN117760428A (en) * 2024-02-22 2024-03-26 西北工业大学 Autonomous positioning method based on multi-stereoscopic vision inertial tight coupling
CN117760428B (en) * 2024-02-22 2024-04-30 西北工业大学 Autonomous positioning method based on multi-stereoscopic vision inertial tight coupling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination