CN114966734A - Bidirectional depth vision inertial pose estimation method combined with multi-line laser radar


Info

Publication number
CN114966734A
Authority
CN
China
Prior art keywords
laser, closed, depth, loop, visual
Legal status: Pending
Application number
CN202210479056.4A
Other languages
Chinese (zh)
Inventor
彭刚 (Peng Gang)
李剑峰 (Li Jianfeng)
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN202210479056.4A
Publication of CN114966734A

Classifications

    • G01S17/86: Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • G01C21/1652: Navigation by integrating acceleration or speed (inertial navigation), combined with non-inertial navigation instruments having ranging devices, e.g. LIDAR or RADAR
    • G06F18/22: Pattern recognition; matching criteria, e.g. proximity measures
    • G06F18/25: Pattern recognition; fusion techniques
    • G06T7/33: Image registration using feature-based methods
    • G06T7/521: Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
    • G06T7/55: Depth or shape recovery from multiple images
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T2207/10028: Range image; depth image; 3D point clouds

Abstract

The invention discloses a bidirectional depth vision inertial pose estimation method combined with a multi-line laser radar, which comprises the following steps: after the depth information of the feature points is recovered with the aid of the laser radar point cloud, the bidirectional depth visual images are each tightly coupled with IMU data to obtain two pose estimates, and the two pose estimates are weighted and fused to obtain a visual odometer; the estimation result of the visual odometer or the IMU is taken as the prior pose estimation and combined with the laser point-line residual for iterative optimization to obtain a laser odometer; closed-loop detection is performed on the bidirectional depth visual images and the laser radar point cloud respectively, and if the closed loop is verified to be valid, the relative pose transformation between the laser key frames corresponding to the current moment and the closed-loop moment is taken as a closed-loop constraint; and factor graph optimization is performed with the laser odometer constraint, the visual odometer constraint and the closed-loop constraint as constraint terms to obtain the optimal pose estimation. The method maintains high-precision and robust pose estimation in scenes with poor illumination, missing texture and the like.

Description

Bidirectional depth vision inertial pose estimation method combined with multi-line laser radar
Technical Field
The invention belongs to the field of mobile robot technology and multi-sensor fusion positioning, and particularly relates to a bidirectional depth vision inertial pose estimation method combining a multi-line laser radar.
Background
The traditional outdoor positioning method is mainly based on global position signals. In an urban environment the positioning accuracy of GPS can reach 5-10 m, which meets the requirement of coarse positioning, but it is far from sufficient for mobile robots that need high-accuracy poses, such as autonomous driving of unmanned vehicles.
In heavily occluded environments such as indoor spaces and tunnels, GPS easily loses its signal, cannot stably provide positioning information, and is not very robust. When global position signals such as GPS and BeiDou are missing or weak, the pose estimation of the robot has to rely on sensors such as a laser radar, a camera and an IMU (inertial measurement unit).
In scenes with poor lighting or no texture, visual SLAM easily fails to track because few effective features are detected. Fusing vision with an IMU is currently a common way to improve algorithm robustness: the IMU can provide pose estimation for a short time when vision fails, but if the robot moves at constant velocity or rotates purely for a long time, the positioning accuracy and robustness degrade. Laser SLAM technology is mature, its map construction is clear and intuitive, and its positioning accuracy is high, so it is widely used in the field of positioning and navigation, but it easily fails in unstructured environments such as tunnels, corridors and open spaces. Laser SLAM and visual SLAM each have advantages and shortcomings, and their positioning accuracy and robustness need to be improved to adapt to different scene requirements.
Therefore, the prior art suffers from low positioning precision, poor robustness and difficulty in adapting to the application requirements of different scenes. Given the respective advantages and disadvantages of the various sensors, fusing multiple sensors to perform SLAM is a feasible research direction.
Disclosure of Invention
Aiming at the defects or improvement requirements of the prior art, the invention provides a bidirectional depth vision inertial pose estimation method combined with a multi-line laser radar, so that the technical problems of low positioning precision, poor robustness and difficulty in adapting to application requirements of different scenes in the prior art are solved.
To achieve the above object, according to one aspect of the present invention, there is provided a bidirectional depth vision inertial pose estimation method in combination with a multiline lidar, including:
after the depth information of the feature points in the bidirectional depth vision image is recovered by using the laser radar point cloud, the bidirectional depth vision image is respectively tightly coupled with IMU data to obtain two pose estimates, and the two pose estimates are subjected to weighted fusion to obtain a visual odometer;
extracting feature points from the laser radar point cloud, constructing a laser point line residual error, taking a vision odometer or IMU estimation result as a priori pose estimation, combining the priori pose estimation with the laser point line residual error to construct a Gauss-Newton equation, and performing iterative optimization on the Gauss-Newton equation to obtain a laser odometer;
respectively carrying out closed-loop detection on the bidirectional depth visual image and the laser radar point cloud, verifying all closed-loop detection results by using laser radar point cloud registration, and if the closed loop is verified to be effective, taking the relative pose transformation relation between the laser key frames corresponding to the current moment and the closed-loop moment as closed-loop constraint;
and taking the pose transformation relation between the vision odometers corresponding to the adjacent vision depth image key frames as vision odometer constraint, taking the pose transformation relation corresponding to the laser odometers between the adjacent laser key frames as laser odometer constraint, and taking the laser odometer constraint, the vision odometer constraint and the closed-loop constraint as constraint items to carry out factor graph optimization to obtain optimal pose estimation.
Further, the verifying of the closed loop detection result comprises:
performing closed-loop detection on the bidirectional depth visual image, acquiring two timestamps of the current time and the closed-loop time, searching two laser key frames corresponding to the two timestamps in a historical laser key frame queue, respectively constructing local feature point clouds for the two laser key frames, performing point cloud registration on the two local feature point clouds, and if the registration is successful, determining that the visual closed-loop is effective;
local feature point clouds are respectively constructed for a laser key frame corresponding to the current moment and a laser key frame corresponding to the closed-loop moment in the closed-loop detection of the laser radar point cloud, point cloud registration is carried out on the two local feature point clouds, and the laser closed loop is effective if the registration is successful.
Further, the closed-loop visual key frame detected by the bidirectional depth visual image closed loop meets the requirements of time consistency check and space consistency check, the time consistency check means that the interval between the current closed-loop time and the last closed-loop time is greater than a preset time threshold, and the space consistency check means that the distance between the current depth visual image key frame and the closed-loop visual key frame is smaller than a preset distance when the closed loop occurs.
Further, the closed loop detection comprises:
for each direction depth visual image in the two-way depth visual image, calculating the similarity between the depth visual image and the bag of words, and taking the depth visual image with the maximum similarity as a closed-loop visual key frame;
acquiring all historical laser key frame positions, constructing a 3D Kd-tree, searching in the Kd-tree at certain intervals according to the pose of the current laser key frame, and finding out a nearest neighbor frame to the current laser key frame; and if the position distance between the current laser key frame and the adjacent frame is smaller than a set threshold, taking the adjacent frame as the laser key frame corresponding to the closed-loop moment.
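As an illustration of the laser loop-candidate search described above, the following Python sketch builds a KD-tree over the historical key-frame positions and returns the nearest neighbour as a candidate; the distance and time-gap thresholds are illustrative assumptions, not values prescribed by the invention.

```python
import numpy as np
from scipy.spatial import cKDTree

def find_laser_loop_candidate(keyframe_positions, keyframe_times,
                              current_position, current_time,
                              dist_thresh=5.0, min_time_gap=30.0):
    """Search a 3D KD-tree of historical laser key-frame positions for a loop candidate.

    keyframe_positions: (N, 3) array of historical key-frame positions.
    keyframe_times:     (N,) array of their timestamps.
    Returns the index of the candidate key frame, or None.
    """
    if len(keyframe_positions) == 0:
        return None
    tree = cKDTree(np.asarray(keyframe_positions))
    dist, idx = tree.query(current_position, k=1)
    # Reject neighbours that are too recent (they trivially lie close to the
    # current pose) or too far away to be a revisit of the same place.
    if dist < dist_thresh and (current_time - keyframe_times[idx]) > min_time_gap:
        return idx
    return None
```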
Further, the depth information of the feature points in the bidirectional depth vision image is recovered by the following method:
and constructing a local point cloud set through the laser radar point cloud after distortion removal, finding matched depth information in the local point cloud set by the characteristic points in the bidirectional depth visual image for recovery, and if matching points are lacked, recovering the depth information by using a depth camera for acquiring the bidirectional depth visual image.
Further, the local point cloud set is constructed by:
stacking the laser radar point clouds after distortion removal, reserving the laser radar point clouds in an adjacent time range, removing old laser radar point clouds to obtain a local point cloud set, wherein the adjacent time range is dynamically updated according to translation increment and rotation increment:
TimeSet=Tmax*(1-(μ*deltaA/Amax+η*deltaP/Pmax))
wherein TimeSet is the time span maintained by the local point cloud set, namely the adjacent time range, Tmax is the maximum value of TimeSet, μ and η are the proportionality coefficients of the rotation increment deltaA and the translation increment deltaP respectively, and Amax and Pmax are the maximum values of the rotation increment and the translation increment respectively.
Further, the method for iterative optimization by combining the laser point-line residual error comprises the following steps:
the curvature calculation of the laser point is obtained by the square of the sum of the distance deviations between the front and rear five points and the current point, and laser characteristic points including angular points and plane points are extracted according to the curvature of the laser point;
adopting a corner characteristic image and a plane characteristic image maintained by a laser key frame based on a sliding window;
searching 5 corner points adjacent to the corner point in a corner point feature map by using a k neighbor search method for the corner point corresponding to the current laser frame; if the distances between the 5 points and the reference point are both smaller than a threshold value and approximately collinear, the angular point is considered to be matched with the feature map;
searching 5 points nearest to a plane point corresponding to the current laser frame in a plane characteristic diagram; if the distances between the 5 points and the reference point are both smaller than the threshold value and approximately coplanar, the plane point is considered to be matched with the plane feature map;
and for the two types of matched feature points, taking the distances from the feature points to the target linear feature and the target plane feature as observed values, establishing a Gauss Newton equation, taking a relatively high-frequency visual odometer or IMU odometer as prior estimation, and obtaining the laser odometer by minimizing the distance from the detected linear-plane feature to the feature map.
Further, the calculating of the visual odometer comprises:
after the depth information of the feature points in the bidirectional depth vision image is recovered by using the laser radar point cloud, the bidirectional depth vision image obtains two pose estimations based on a sliding window optimization mode through respectively optimizing a vision re-projection error and an IMU measurement error in a combined mode, and obtains a vision odometer by adopting a weighted fusion mode.
Further, the factor graph optimization further comprises:
the method comprises the steps of constructing IMU constraints based on IMU data between a current laser key frame and a last laser key frame, wherein the IMU constraints comprise IMU pre-integration factors and IMU offset factors, the IMU pre-integration factors are constructed by the acceleration and angular velocity data of an IMU, the IMU offset factors are constructed by IMU offset, laser odometer constraints, visual odometer constraints, closed-loop constraints and the IMU constraints are used as constraint items, factor graph optimization is conducted, and the optimal pose estimation and the offset of the current laser key frame are obtained.
Further, the method further comprises:
and performing IMU pre-integration on IMU data after the current laser key frame moment again by using the offset after factor graph optimization, further obtaining an IMU estimation result at each moment after the current laser key frame moment, and taking the IMU estimation result as an initial pose estimation value of the visual odometer.
The IMU estimation result is a robot pose estimate output at the IMU frequency.
According to another aspect of the invention, there is provided a bi-directional depth vision inertial pose estimation system in combination with a multiline lidar comprising: the system comprises a double-depth camera, a multi-line laser radar, an IMU sensor and a processor;
the double-depth camera is used for acquiring a bidirectional depth visual image;
the multi-line laser radar is used for collecting laser radar point cloud;
the IMU sensor is used for acquiring IMU data;
the processor is in communication with a dual depth camera, a multi-line lidar, an IMU sensor, including:
the visual odometer estimator is used for recovering the depth information of the feature points in the bidirectional depth visual image by using the laser radar point cloud, and then the bidirectional depth visual image respectively obtains two pose estimations based on a sliding window optimization mode through jointly optimizing a visual reprojection error and an IMU measurement error, and obtains a visual odometer by adopting a weighted fusion mode;
the laser odometer estimator is used for extracting characteristic points from the laser radar point cloud, constructing a laser point line residual error, taking a vision odometer or IMU estimation result as a priori pose estimation, combining the priori pose estimation with the laser point line residual error to construct a Gauss-Newton equation, and performing iterative optimization on the Gauss-Newton equation to obtain the laser odometer;
the closed-loop detection module is used for respectively carrying out closed-loop detection on the bidirectional depth visual image and the laser radar point cloud, verifying all closed-loop detection results by using point cloud registration of the laser radar, and if the closed loop is verified to be effective, taking the relative pose transformation relation between the laser key frames corresponding to the current moment and the closed-loop moment as closed-loop constraint;
and the factor graph optimization module is used for taking the pose transformation relation between the visual odometers corresponding to the adjacent visual depth image key frames as visual odometer constraint, taking the pose transformation relation corresponding to the laser odometer between the adjacent laser key frames as laser odometer constraint, and taking the laser odometer constraint, the visual odometer constraint and the closed-loop constraint as constraint items to carry out factor graph optimization to obtain optimal pose estimation.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) The invention uses bidirectional depth visual images for pose estimation, which avoids missed closed-loop detections and single-view feature loss caused by the limited field of view of a single camera. The depth information of the feature points in the bidirectional depth visual images is recovered with the laser radar point cloud, so the laser radar and vision are tightly coupled at the data level, the depth information of the laser point cloud assisting the recovery of the visual feature depth; the laser odometer is computed with the visual odometer or the IMU estimation result as the prior pose estimation, which is a loose coupling. The loose coupling has a simple structure, low computational cost and strong extensibility, and the prior pose estimation has two alternatives to absorb the risk of failure. By combining tight and loose coupling, the invention both provides a fallback strategy for positioning accuracy and achieves accurate positioning even when global position signals such as GPS, GNSS and BeiDou are unavailable. The invention combines vision, laser and closed-loop data for factor graph optimization; with this multi-source data fusion, pose estimation remains accurate and robust in scenes with poor illumination, missing texture and the like, realizing accurate positioning that can be applied to autonomous driving of unmanned vehicles.
(2) According to the invention, the two-way depth vision and the multi-line laser radar are combined to carry out closed-loop detection, and after the closed-loop detection of the vision is successful, secondary confirmation is carried out through point cloud registration of the laser radar so as to eliminate the closed-loop repeated detection and false detection conditions existing in the vision, and the accuracy and the recall rate of the closed-loop detection are improved.
(3) In order to improve the overall operation efficiency of the algorithm, the selection of the visual closed-loop frame needs to carry out time consistency inspection and space consistency inspection, so that the condition that the front camera and the rear camera simultaneously detect the closed loop or continuously detect the closed loop in a short time is avoided. When the forward or backward visual characteristics are not convenient for closed loop detection, the image of the other visual angle can play a role in compensation, so that the key frame pose is optimized by means of closed loop.
(4) The laser radar can also carry out laser closed loop detection according to odometer neighbor detection. The invention carries out closed loop on vision and laser, ensures that a closed loop key frame is effective, and verifies that the closed loop is effective, thereby being capable of accurately detecting a historical closed loop, carrying out closed loop optimization and providing a point cloud map with consistent and smooth overall situation for a subsequent positioning method under a known map.
(5) According to the invention, the two-way depth vision and the multi-line laser radar are fused on the data level, the local point cloud set is dynamically updated, and the visual characteristic points are associated with the laser point cloud for depth recovery, so that the precision and the calculation efficiency of the depth recovery of the visual characteristic points are improved. The invention dynamically updates the proximity time range according to the translation increment and the rotation increment, thereby changing the density degree of the point cloud set.
(6) As the system operation time increases, if the poses and features of all the keyframes are considered at the same time, the calculation efficiency gradually decreases or even the real-time stable operation cannot be realized. In order to save the optimization calculation of a large number of feature points, the invention constructs the odometer constraint according to the pose transformation relation among the key frames. And obtaining high-precision and robust pose estimation through factor graph joint optimization.
(7) The two-way depth vision image respectively obtains two pose estimations based on a sliding window optimization mode through jointly optimizing a vision reprojection error and an IMU measurement error, and obtains a vision odometer through a weighting fusion mode, so that the reduction of system positioning accuracy and reliability caused by single vision characteristic loss or tracking failure is avoided.
(8) According to the invention, IMU constraint is introduced in the factor graph optimization process, and the pose and IMU bias are optimized and updated in combination with laser odometer constraint, visual odometer constraint, closed-loop constraint and IMU constraint. Optimizing IMU biasing may suppress drift problems with pre-integration.
(9) The IMU pre-integration is carried out on the IMU data after the current laser key frame moment again by the offset after the factor graph optimization, and the improvement of the IMU pre-integration precision can provide more accurate initial value estimation for a subsequent laser odometer. And subsequently, taking the IMU estimation result as an initial pose estimation value of the visual odometer, and then using the new visual odometer and the IMU estimation result for calculating a laser odometer, wherein the vision and the laser limit deviation for the IMU. Under the condition of limited bias, the odometer of the IMU is very accurate, and the application scene of the invention is further expanded.
Drawings
FIG. 1 is a flow chart of a method for estimating an inertial pose of a two-way depth vision in combination with a multiline lidar according to an embodiment of the present invention;
FIG. 2 is a core architecture diagram of a bi-directional depth vision inertial pose estimation system incorporating a multiline lidar according to an embodiment of the present invention;
fig. 3 (a) is a three-dimensional point cloud dense map formed by a KITTI data set provided by an embodiment of the invention;
fig. 3 (b) is a three-dimensional point cloud dense map formed by an M2DGR data set provided by an embodiment of the present invention;
FIG. 4 (a) is a diagram of a closed loop pre-correction state provided by an embodiment of the present invention;
FIG. 4 (b) is a diagram of a post-closed loop correction state provided by an embodiment of the present invention;
FIG. 5 is a robot dataflow graph that is provided by an embodiment of the present invention;
FIG. 6 is a diagram of a ROS system node provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides a bidirectional depth vision inertial pose estimation method combined with a multi-line laser radar, which can be applied to environments such as urban streets, corridors and indoor spaces, realizes high-precision pose estimation, and remains robust in environments with poor illumination conditions. The invention aims to improve the accuracy and robustness of robot pose estimation when global position signals such as the Global Positioning System (GPS) and BeiDou are unavailable.
The invention discloses a pose estimation method based on multi-line laser radar, two-way depth vision and IMU fusion under the scene lacking global position signals, the whole flow of the method is shown in figure 1, and the method comprises the following steps:
after depth information of feature points in the bidirectional depth visual image is recovered by using the laser radar point cloud, tightly coupling the bidirectional depth visual image with IMU data to obtain two pose estimates, and performing weighted fusion on the two pose estimates to obtain a visual odometer;
extracting characteristic points from the laser radar point cloud, constructing a laser point line residual error, taking a vision odometer or IMU estimation result as a priori pose estimation, combining the priori pose estimation with the laser point line residual error to construct a Gauss-Newton equation, and performing iterative optimization on the Gauss-Newton equation to obtain a laser odometer;
respectively carrying out closed-loop detection on the bidirectional depth visual image and the laser radar point cloud, verifying all closed-loop detection results by using laser radar point cloud registration, and if the closed loop is verified to be effective, taking the relative pose transformation relation between the laser key frames corresponding to the current moment and the closed-loop moment as closed-loop constraint;
and (3) taking the pose transformation relation between the visual odometers corresponding to the adjacent visual depth image key frames as visual odometer constraint, taking the pose transformation relation corresponding to the laser odometers between the adjacent laser key frames as laser odometer constraint, taking the laser odometer constraint, the visual odometer constraint and the closed-loop constraint as constraint items, and performing factor graph optimization to obtain optimal pose estimation.
As shown in fig. 2, a two-way depth vision inertial pose estimation system combined with a multiline laser radar includes: the system comprises a double-depth camera, a multi-line laser radar, an IMU sensor and a processor;
the double-depth camera is used for acquiring a bidirectional depth visual image;
the multi-line laser radar is used for collecting laser radar point cloud;
the IMU sensor is used for acquiring IMU data;
the processor is in communication with a dual depth camera, a multi-line lidar, an IMU sensor, including:
the visual odometer estimator is used for recovering the depth information of the feature points in the bidirectional depth visual image by using the laser radar point cloud, respectively and tightly coupling the bidirectional depth visual image with IMU data to obtain two pose estimates, and performing weighted fusion on the two pose estimates to obtain a visual odometer;
the laser odometer estimator is used for extracting characteristic points from the laser radar point cloud, constructing a laser point line residual error, taking a vision odometer or IMU estimation result as a priori pose estimation, combining the priori pose estimation with the laser point line residual error to construct a Gauss-Newton equation, and performing iterative optimization on the Gauss-Newton equation to obtain the laser odometer;
the closed-loop detection module is used for respectively carrying out closed-loop detection on the bidirectional depth visual image and the laser radar point cloud, verifying all closed-loop detection results by using point cloud registration of the laser radar, and if the closed loop is verified to be effective, taking the relative pose transformation relation between the laser key frames corresponding to the current moment and the closed-loop moment as closed-loop constraint;
and the factor graph optimization module is used for taking the pose transformation relation between the visual odometers corresponding to the adjacent visual depth image key frames as visual odometer constraint, taking the pose transformation relation corresponding to the laser odometer between the adjacent laser key frames as laser odometer constraint, and taking the laser odometer constraint, the visual odometer constraint and the closed-loop constraint as constraint items to carry out factor graph optimization to obtain optimal pose estimation.
Specifically, the visual odometer is calculated as follows:
and constructing a local point cloud set through the laser radar point cloud after distortion removal, finding matched depth information in the local point cloud set by the feature points in the bidirectional depth visual image for recovery, and if the matched points are lacked, recovering the depth information by using a depth camera for acquiring the bidirectional depth visual image.
And the laser radar point cloud is combined with the IMU for distortion removal, and the laser radar point cloud after distortion removal is obtained.
The local point cloud set is constructed in the following way:
stacking the laser radar point clouds after distortion removal, reserving the laser radar point clouds in an adjacent time range, removing old laser radar point clouds to obtain a local point cloud set, and dynamically updating the adjacent time range according to translation increment and rotation increment to change the density degree of the point cloud set; the formula for the calculation of the proximity time range is:
TimeSet=Tmax*(1-(μ*deltaA/Amax+η*deltaP/Pmax))
wherein TimeSet is the time span maintained by the local point cloud set, namely the adjacent time range, Tmax is the maximum value of TimeSet, μ and η are the proportionality coefficients of the rotation increment deltaA and the translation increment deltaP respectively, and Amax and Pmax are the maximum values of the rotation increment and the translation increment respectively.
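A minimal sketch of this dynamic window update is given below; the maximum values and the proportionality coefficients used here are placeholder assumptions.

```python
def local_cloud_time_window(delta_a, delta_p,
                            t_max=2.0, a_max=0.35, p_max=0.5,
                            mu=0.5, eta=0.5):
    """Dynamically shrink the time window of the local point-cloud set as the
    rotation increment delta_a and translation increment delta_p grow:
    TimeSet = Tmax * (1 - (mu*deltaA/Amax + eta*deltaP/Pmax))."""
    ratio = mu * delta_a / a_max + eta * delta_p / p_max
    ratio = min(max(ratio, 0.0), 1.0)   # clamp so the window never becomes negative
    return t_max * (1.0 - ratio)
```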
The bidirectional depth vision front end performs adaptive histogram equalization on the input image to improve its contrast and clarity, and then extracts Harris corner features. A KLT optical flow tracking algorithm tracks the feature points of the previous frame, searches for matching points in the current frame and records the state of each feature point, so that feature points whose tracking failed are eliminated and the tracked count of each feature point is updated. According to the tracked counts, feature points that have been tracked longer are preferentially kept, and the feature points are homogenized over the whole image. By setting a minimum distance between adjacent feature points, the feature points in texture-rich regions of the image are prevented from becoming too dense. Because feature points are eliminated and homogenized and their number in the current frame decreases during motion, new features must be supplementarily extracted so that the number of feature points meets the basic requirement of the subsequent algorithm. Meanwhile, a fundamental matrix is computed from the feature points tracked across two adjacent image frames, and outliers are eliminated with the RANSAC algorithm.
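The following Python/OpenCV sketch illustrates this kind of front end (contrast enhancement, KLT tracking, RANSAC rejection and corner replenishment); the parameter values are assumptions and the exact processing of the invention may differ.

```python
import cv2
import numpy as np

def track_features(prev_img, cur_img, prev_pts, max_corners=200, min_dist=30):
    """Front-end sketch: CLAHE enhancement, KLT tracking of the previous frame's
    Harris corners, RANSAC outlier rejection and corner replenishment.
    prev_pts is an (N, 2) float32 array; images are 8-bit grayscale, and prev_img
    is assumed to have been enhanced the same way when it was the current frame."""
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
    cur_eq = clahe.apply(cur_img)

    # KLT optical-flow tracking of the previous frame's features.
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_img, cur_eq, prev_pts.reshape(-1, 1, 2), None)
    ok = status.ravel() == 1
    good_prev = prev_pts[ok]
    good_cur = cur_pts.reshape(-1, 2)[ok]

    # Reject outliers with a RANSAC fundamental-matrix check between adjacent frames.
    if len(good_cur) >= 8:
        _, mask = cv2.findFundamentalMat(good_prev, good_cur, cv2.FM_RANSAC, 1.0, 0.99)
        if mask is not None:
            good_cur = good_cur[mask.ravel() == 1]

    # Replenish Harris corners so the total count meets the back end's needs,
    # with a minimum distance so texture-rich areas do not become too dense.
    if len(good_cur) < max_corners:
        extra = cv2.goodFeaturesToTrack(cur_eq, max_corners - len(good_cur), 0.01,
                                        min_dist, useHarrisDetector=True)
        if extra is not None:
            good_cur = np.vstack([good_cur, extra.reshape(-1, 2)])
    return good_cur.astype(np.float32)
```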
The two-way depth visual image respectively obtains two pose estimations based on a sliding window optimization mode by jointly optimizing a visual reprojection error and an IMU measurement error, and obtains a visual odometer by adopting a weighting fusion mode, so that the reduction of the positioning accuracy and reliability of a system caused by the loss of single visual features or the failure of tracking is avoided; the visual odometer can be used as a laser registration initial value to improve the registration accuracy.
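The weighted fusion of the two pose estimates can be sketched as follows; weighting by the number of tracked features is an illustrative assumption, since any quality measure (for example the residual covariance) could be used instead.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def fuse_two_vio_poses(t_fwd, q_fwd, n_fwd, t_bwd, q_bwd, n_bwd):
    """Weighted fusion of the forward- and backward-camera VIO estimates.
    t_*: translations (3,), q_*: xyzw quaternions, n_*: numbers of tracked
    features, used here as the fusion weights (an illustrative choice)."""
    w = n_fwd / float(n_fwd + n_bwd)          # weight of the forward estimate
    t = w * np.asarray(t_fwd) + (1.0 - w) * np.asarray(t_bwd)
    rots = Rotation.from_quat([q_fwd, q_bwd])
    # Interpolate from the forward rotation (parameter 0) toward the backward
    # rotation (parameter 1) by the backward weight 1 - w.
    q = Slerp([0.0, 1.0], rots)(1.0 - w).as_quat()
    return t, q
```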
The bidirectional depth visual images comprise a forward depth visual image and a backward depth visual image; the depth information of their feature points is recovered through the following steps, taking the forward depth visual image as an example:
(1) according to the rotation increment and the translation increment of the mobile platform, dynamically maintaining a local point cloud set and changing the density degree of the point cloud set;
(2) converting the local point cloud set into a camera coordinate system, and projecting the feature points and the laser radar point cloud in the forward depth visual image onto a unit sphere taking a forward depth camera as a center;
(3) for each feature point in the forward depth visual image, searching a 2D k-D tree by taking the polar coordinate of the feature point in the forward depth visual image as a reference, and acquiring 3 laser radar point clouds closest to the feature point on a sphere;
(4) if the maximum distance between the 3 laser radar point clouds is larger than 1.5m, the local point cloud set is considered to have no information matched with the feature points in the forward depth visual image, and a monocular triangulation or binocular stereo matching method is adopted to recover the depth information, otherwise, the step (5) is skipped;
(5) combining 3 laser radar point clouds, and obtaining the depth value of the characteristic point through interpolation calculation.
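A simplified sketch of steps (2)-(5) for a single feature point follows; using the first two components of the unit direction vector in place of the polar coordinates, and the 1.5 m spread check, are illustrative simplifications of the procedure above.

```python
import numpy as np
from scipy.spatial import cKDTree

def recover_feature_depth(feat_dir, lidar_pts_cam, max_spread=1.5):
    """Recover the depth of one visual feature from the local lidar cloud.

    feat_dir:      unit bearing vector of the feature in the camera frame.
    lidar_pts_cam: (N, 3) lidar points already transformed into the camera frame.
    Returns a depth in metres, or None if no consistent lidar match exists.
    """
    dirs = lidar_pts_cam / np.linalg.norm(lidar_pts_cam, axis=1, keepdims=True)
    # 2D KD-tree over the unit-sphere projections (the first two direction
    # components stand in for the polar coordinates of the text).
    tree = cKDTree(dirs[:, :2])
    _, idx = tree.query(feat_dir[:2], k=3)
    p1, p2, p3 = lidar_pts_cam[idx]

    # If the three lidar points are spread more than ~1.5 m apart they do not
    # describe a single local surface; fall back to stereo/triangulated depth.
    spread = max(np.linalg.norm(p1 - p2), np.linalg.norm(p2 - p3), np.linalg.norm(p1 - p3))
    if spread > max_spread:
        return None

    # Depth of the feature ray where it meets the plane through p1, p2, p3.
    n = np.cross(p2 - p1, p3 - p1)
    denom = np.dot(n, feat_dir)
    if abs(denom) < 1e-6:
        return None
    depth = np.dot(n, p1) / denom
    return depth if depth > 0 else None
```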
Because the system continuously collects sensor data while running, considering the sensor measurements at all times would make the computation required for data preprocessing and optimization excessive. To reduce the computation, the image key frame data are processed with sliding-window optimization, and only the key frames inside the window are considered during optimization. Essentially, the visual-inertial tightly coupled nonlinear optimization problem is to solve for the Jacobian and covariance matrices of the visual measurements and the IMU pre-integration. As shown in equation (1), the state vector to be optimized comprises the IMU states within the sliding window, i.e. PVQ (position, velocity, rotation) and bias, the camera-to-IMU extrinsic parameters, and the inverse depths of the landmark points.
\chi = \left[ x_n, x_{n+1}, \dots, x_{n+N}, x_c^b, \lambda_m, \lambda_{m+1}, \dots, \lambda_{m+M} \right],\quad x_i = \left[ p_{b_i}^w, v_{b_i}^w, q_{b_i}^w, b_a, b_g \right],\ i \in [n, n+N],\quad x_c^b = \left[ p_c^b, q_c^b \right]    (1)
In the above formula, x_i denotes the IMU state corresponding to the i-th image frame; p_{b_i}^w, q_{b_i}^w and v_{b_i}^w denote its position, orientation and velocity respectively; b_a and b_g denote the accelerometer and gyroscope biases respectively; \lambda_m denotes the inverse depth of the m-th 3D feature point; n denotes the starting index of the IMU states in the sliding window and N the number of key frames contained in the window; m denotes the starting index of the point-feature state variables in the window and M the number of feature points in the window; and x_c^b denotes the camera-to-IMU extrinsic parameters.
The nonlinear optimization problem is solved by minimizing the sum of multiple error terms; the cost function constructed for system optimization is shown in formula (2):
\min_{\chi} \left\{ \left\| r_p - H_p \chi \right\|^2 + \sum_{k \in B} \left\| r_B\!\left( \hat{z}_{b_{k+1}}^{b_k}, \chi \right) \right\|_{P_{b_{k+1}}^{b_k}}^2 + \sum_{(l,j) \in C} \rho\!\left( \left\| r_C\!\left( \hat{z}_l^{c_j}, \chi \right) \right\|_{P_l^{c_j}}^2 \right) \right\}    (2)
wherein r_B(\hat{z}_{b_{k+1}}^{b_k}, \chi) denotes the IMU residual term, r_C(\hat{z}_l^{c_j}, \chi) denotes the visual observation residual term, and \rho is the robust kernel function. The prior term \{ r_p, H_p \}, obtained by marginalization, retains the influence of the inter-frame constraints on the current state.
The calculation of the laser odometer includes:
and the extraction of the laser characteristic points takes the curvature of the laser points as a reference, and the curvature of each laser point is calculated for the laser point cloud after distortion removal. The calculation of the curvature of the laser spot is obtained by squaring the sum of the distance deviations between the first five points and the second five points and the current point, as shown in equation (3). If the points near the laser point are approximately on the same spherical surface, the calculation result of the curvature magnitude should be close to 0.
Figure BDA0003621714320000131
Similar to the extraction of the visual feature points, the laser feature points also need to be subjected to homogenization treatment, the laser point cloud is divided into a plurality of sections, and at most 30 angular points are taken for each sub-section of the laser point cloud. The selection of corner points gives preference to points with large curvature values, whereas points other than corner points are considered to be plane points. And respectively adding the angular points and the plane points into the angular point and plane point cloud sets, and performing downsampling processing on the plane point cloud sets. And finally, issuing point cloud information of the current laser frame after feature extraction on the ROS platform, wherein the point cloud information comprises data of laser point cloud, initial pose, feature point cloud and the like. The point cloud information is subscribed and used at the back end of the algorithm.
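The curvature computation of equation (3) and the sector-wise corner/plane selection can be sketched as follows; the sector count and the curvature threshold are assumptions for illustration.

```python
import numpy as np

def extract_scanline_features(ranges, n_sectors=6, max_corners_per_sector=30,
                              corner_curv_thresh=0.1):
    """Curvature-based feature extraction for one de-distorted scan line.
    ranges: (N,) point ranges in scan order. The curvature follows equation (3):
    the squared sum of the range deviations of the five points before and after
    a point relative to that point."""
    n = len(ranges)
    curvature = np.full(n, np.nan)
    for i in range(5, n - 5):
        deviation = np.sum(ranges[i - 5:i + 6]) - 11.0 * ranges[i]
        curvature[i] = deviation * deviation

    corners, planes = [], []
    sector_len = max(n // n_sectors, 1)
    for s in range(n_sectors):
        lo, hi = s * sector_len, min((s + 1) * sector_len, n)
        idx = [i for i in range(lo, hi) if not np.isnan(curvature[i])]
        # Points with the largest curvature become corner candidates (at most 30
        # per sector); the remaining points of the sector are planar points.
        idx.sort(key=lambda i: curvature[i], reverse=True)
        sel = [i for i in idx if curvature[i] > corner_curv_thresh][:max_corners_per_sector]
        corner_set = set(sel)
        corners.extend(sel)
        planes.extend(i for i in idx if i not in corner_set)
    return corners, planes
```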
The laser odometer uses only the laser radar key frames; the remaining frames are ignored. The map matched against a laser frame is the feature map maintained from laser key frames within a sliding window, which reduces the computational complexity. After point cloud de-distortion and feature extraction, the relatively high-frequency visual odometer or IMU estimate is used as the prior estimate, and the laser odometer is obtained by minimizing the distance from the detected line and plane features to the feature map. Among the prior estimates, the visual odometer has the higher priority. In environments with rapid motion, poor illumination or missing texture, the visual-inertial subsystem easily fails because too few feature points are tracked. When a single vision sensor fails, the other module can still work normally, so that the IMU bias is still estimated correctly and a wrong bias estimate does not corrupt the pose estimation of the whole system. A vision sensor is judged to have failed when too few feature points are tracked, the bias exceeds a threshold, or the rotation increment or vehicle speed is too large; when these failure conditions are met, visual initialization is performed again. When both the forward and backward depth vision systems fail, the laser odometer does not use the visual odometer as a prior.
The i-th laser key frame is expressed as K_i = \{ F_i^e, F_i^p \}, wherein F_i^e is the set of corner features and F_i^p is the set of planar features. The local feature map M_i is constructed from the poses \{ T_{i-n}, \dots, T_i \} of the previous n key frames and the features \{ K_{i-n}, \dots, K_i \} of the previous n+1 key frames, wherein M_i comprises a corner feature map M_i^e and a planar feature map M_i^p. When the latest radar key frame K_{i+1} arrives, the frame is matched with the local feature map M_i and point-line and point-plane constraint terms are constructed. The pose T_{i+1} of the latest frame is obtained by minimizing the total constraint.
Before optimization, the feature points corresponding to the laser radar key frame are transformed into the coordinate system of the feature map; \hat{p}_{(k+1,i)}^{m} denotes the coordinates of the i-th feature point of the (k+1)-th laser key frame in the feature map coordinate system. The optimization adopts the Gauss-Newton method, whose iteration is shown in formula (4), and the residual equation is constructed from the point-line and point-plane residuals:
\left( J^T J \right) \Delta P = -J^T f    (4)
Wherein f is a residual equation, J is a Jacobian matrix of a residual term relative to a state variable to be optimized, and delta P is an increment of the variable to be optimized, and comprises a translation part and a rotation part.
Before solving with the Gauss-Newton method, the residual equation must be constructed: the corner features F_{i+1}^e matched with the corner feature map M_i^e and the planar features F_{i+1}^p matched with the planar feature map M_i^p are extracted respectively, and the residual constraint terms are constructed from the matched features.
For a corner feature of the current laser frame, the k-nearest-neighbor search of the PCL library is used to find the 5 corner points in the map M_i^e closest to that point. If the distances between these 5 points and the reference point are all smaller than a threshold and the points are approximately collinear, the corner feature is considered matched with the map M_i^e. The residual of a corner point is essentially the distance from the point to the corresponding feature line on the feature map, as shown in formula (5):
d_e = \frac{ \left| \left( \hat{p} - p_a \right) \times \left( \hat{p} - p_b \right) \right| }{ \left| p_a - p_b \right| }    (5)
wherein \hat{p} is the corner feature point expressed in the feature map coordinate system, and p_a and p_b are two points on the matched feature line of the feature map.
For a planar feature of the current laser frame, the 5 nearest points are found in the map M_i^p. If the distances between these 5 points and the reference point are all smaller than a threshold and the points are approximately coplanar, the planar feature is considered matched with the map M_i^p. The residual of a planar feature point is essentially the distance from the point to the target plane feature on the feature map, as shown in formula (6):
d_p = \frac{ \left| \left( \hat{p} - p_a \right) \cdot \left( \left( p_a - p_b \right) \times \left( p_a - p_c \right) \right) \right| }{ \left| \left( p_a - p_b \right) \times \left( p_a - p_c \right) \right| }    (6)
wherein p_a, p_b and p_c are three non-collinear points on the matched plane feature of the feature map.
and (3) calculating a Jacobian matrix for the two matched feature points, establishing a Gauss Newton equation by taking the distance from the feature points to the target linear feature and the target plane feature as an observed value, and iteratively optimizing the current pose according to a formula (4) so as to obtain the laser odometer after laser calibration.
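The point-to-line and point-to-plane residuals of formulas (5) and (6) and one Gauss-Newton update of formula (4) can be written compactly as follows; this is a sketch, and a full implementation also needs the Jacobians of each residual with respect to the 6-DoF pose increment.

```python
import numpy as np

def point_to_line_distance(p, pa, pb):
    """Formula (5): distance from corner point p to the map line through pa and pb."""
    return np.linalg.norm(np.cross(p - pa, p - pb)) / np.linalg.norm(pa - pb)

def point_to_plane_distance(p, pa, pb, pc):
    """Formula (6): distance from planar point p to the map plane through pa, pb, pc."""
    n = np.cross(pb - pa, pc - pa)
    return abs(np.dot(p - pa, n)) / np.linalg.norm(n)

def gauss_newton_step(J, f):
    """One Gauss-Newton update, formula (4): solve (J^T J) dP = -J^T f.
    J stacks the Jacobians of every matched point-line / point-plane residual;
    f stacks the corresponding residual values."""
    return np.linalg.solve(J.T @ J, -J.T @ f)
```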
The factor graph optimization comprises the following steps:
as the system operation time increases, if the poses and features of all the keyframes are considered at the same time, the calculation efficiency gradually decreases or even the real-time stable operation cannot be realized. In order to save the optimization calculation of a large number of feature points, the odometer constraint is constructed according to the pose transformation relation among the key frames. Constructing laser odometer factors (namely laser odometer constraint) by using the pose transformation relation between adjacent laser key frames; constructing a visual odometry factor (namely visual odometry constraint) by using a pose transformation relation between adjacent visual key frames; and for the closed-loop key frame, obtaining the pose transformation relation between the laser key frame at the current moment and the laser key frame at the closed-loop moment through ICP (inductively coupled plasma) registration, and constructing a closed-loop factor (closed-loop constraint) based on the pose transformation relation.
Further, the odometry factor may be defined uniformly as:
r_{ij} = \mathrm{Log}\!\left( \hat{T}_{ij}^{-1} \, T_i^{-1} T_j \right)
In the above formula, T_i and T_j are the poses of the i-th and j-th key frames, \hat{T}_{ij} is the measured relative pose between them, and r_{ij} denotes the odometry factor.
The objective function of the overall optimization problem is as follows:
\min_{T} \left\{ \sum_{(i,j) \in O} h\!\left( \left\| r_{ij} \right\|^2 \right) + \sum_{(i,j) \in V} h\!\left( \left\| r_{ij} \right\|^2 \right) + \sum_{(i,j) \in L} h\!\left( \left\| r_{ij} \right\|^2 \right) \right\}
In the above formula, T denotes the set of historical key frame poses; O, V and L denote the sets of laser odometer factors, visual odometer factors and closed-loop factors respectively; and h denotes a Huber function that limits excessive errors, so that a falsely detected closed loop does not have an excessive influence on the pose optimization.
Further, the problem is solved based on an iSAM2 optimizer, and the optimized laser key frame pose is obtained. And calculating and adding an IMU pre-integration factor and an IMU offset factor according to IMU data between the current laser key frame and the last laser key frame, wherein the IMU pre-integration factor is constructed by the acceleration and angular velocity data of the IMU, and the IMU offset factor is constructed by the IMU offset, so that the state of the current laser key frame is further optimized, and the state comprises the pose, the velocity and the IMU offset of the current frame. Specifically, an iSAM2 optimizer in an open-source nonlinear optimization library Gtsam optimizes a factor graph, and updates the pose and offset of the robot. Optimizing IMU biasing may suppress drift problems with pre-integration.
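A minimal sketch of such a pose graph using the recent GTSAM Python bindings is shown below; the key numbering, the noise values, and the omission of the IMU pre-integration and bias factors are simplifications for illustration.

```python
import numpy as np
import gtsam

def build_pose_graph(keyframe_poses, odom_edges, loop_edges):
    """keyframe_poses: list of 4x4 pose matrices.
    odom_edges / loop_edges: lists of (i, j, 4x4 relative pose) tuples."""
    graph = gtsam.NonlinearFactorGraph()
    values = gtsam.Values()

    prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 1e-3))
    graph.add(gtsam.PriorFactorPose3(0, gtsam.Pose3(keyframe_poses[0]), prior_noise))

    # GTSAM orders the 6-DoF sigmas as rotation first, then translation.
    odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.01] * 3 + [0.05] * 3))
    # Wrap loop-closure factors in a Huber kernel, mirroring h(.) above, so a
    # false loop detection cannot dominate the optimisation.
    loop_noise = gtsam.noiseModel.Robust.Create(
        gtsam.noiseModel.mEstimator.Huber.Create(1.345),
        gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05] * 3 + [0.1] * 3)))

    for i, T in enumerate(keyframe_poses):
        values.insert(i, gtsam.Pose3(T))
    for i, j, T_ij in odom_edges:
        graph.add(gtsam.BetweenFactorPose3(i, j, gtsam.Pose3(T_ij), odom_noise))
    for i, j, T_ij in loop_edges:
        graph.add(gtsam.BetweenFactorPose3(i, j, gtsam.Pose3(T_ij), loop_noise))

    isam = gtsam.ISAM2()
    isam.update(graph, values)
    return isam.calculateEstimate()
```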
Further, for the optimized laser odometer, IMU pre-integration is performed again on the IMU data after the current frame time with the latest IMU bias. The improved pre-integration accuracy provides a more accurate initial estimate for the laser odometer. Starting from the optimized state and bias at the time corresponding to the laser odometer, the IMU state at each subsequent time is computed from the IMU pre-integration, giving an IMU odometer (i.e. an IMU estimation result) at each time, so that the visual odometer, the laser odometer and the IMU odometer are estimated in parallel. The IMU odometer serves as the initial pose estimate of the visual odometer, the visual odometer and the IMU odometer serve as initial matching values of the laser odometer, and vision and laser constrain the bias of the IMU. With the bias constrained, the IMU odometer is very accurate.
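Re-propagating the IMU with the refreshed bias amounts to the following per-sample update; this is a sketch assuming simple Euler integration, whereas the actual pre-integration of the invention is more elaborate.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def propagate_imu(p, v, q, acc, gyro, bias_a, bias_g, dt,
                  g=np.array([0.0, 0.0, -9.81])):
    """Propagate one IMU sample with the bias refreshed by the factor-graph result.
    p, v: position / velocity in the world frame; q: Rotation (body to world)."""
    a_w = q.apply(acc - bias_a) + g                      # bias-corrected acceleration in the world frame
    p_new = p + v * dt + 0.5 * a_w * dt * dt
    v_new = v + a_w * dt
    q_new = q * Rotation.from_rotvec((gyro - bias_g) * dt)   # body-frame angular rate
    return p_new, v_new, q_new
```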
On the basis of obtaining a high-precision pose, registering the feature point cloud of the laser key frame into an environment map, and constructing a laser data three-dimensional scene map. Taking the 05 sequence of KITTI and the street 04 sequence data set of M2DGR as examples, the mapping effect is shown in (a) in FIG. 3 and (b) in FIG. 3, the invention can establish a globally consistent and smooth point cloud map, and truly reflects the three-dimensional structure of the actual scene.
The closed loop detection comprises:
visual closed-loop detection employs the DBoW2 method, and when a new image key frame is added, all feature points of the image are described by Brief feature descriptors. Since the number of Harris corners extracted by the visual front end is small, 500 FAST feature points need to be extracted, and Brief descriptors need to be calculated for closed loop detection. And calculating the similarity score of the current image key frame and the bag of words, and returning to the closed-loop visual key frame according to the similarity. And matching the current frame with the candidate closed-loop key frame through a Brief descriptor, and removing abnormal values by using a RANSAC method. And when the number of the matched point pairs exceeds a threshold value, the closed-loop candidate frame is considered to be effective.
The visual part uses two cameras to detect closed loop simultaneously. In order to improve the overall operation efficiency of the algorithm, the selection of the visual closed-loop frame needs to carry out time consistency inspection and space consistency inspection, so that the condition that the front camera and the rear camera simultaneously detect the closed loop or continuously detect the closed loop in a short time is avoided. The time consistency check means that the interval between the current visual closed-loop time and the last closed-loop time is necessarily greater than a set time threshold, and the space consistency check means that the distance between the current image key frame and the closed-loop frame is necessarily less than a set distance when the closed loop occurs.
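The two consistency checks can be sketched as follows; the interval and distance thresholds are illustrative assumptions.

```python
import numpy as np

def loop_candidate_consistent(t_now, t_last_loop, p_now, p_candidate,
                              min_interval=10.0, max_distance=15.0):
    """Temporal and spatial consistency checks for a visual loop candidate."""
    time_ok = (t_now - t_last_loop) > min_interval       # not looping again too soon
    space_ok = np.linalg.norm(np.asarray(p_now) - np.asarray(p_candidate)) < max_distance
    return time_ok and space_ok
```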
In an actual scene, when the forward or backward visual characteristics are lost and closed loop detection is not convenient, the camera at the other visual angle can play a role in compensation, so that the historical key frame pose is optimized by means of closed loop. When the vision succeeds in detecting the closed loop, the information containing the current image frame and the closed loop frame timestamp is published in the Topic form of ROS. And performing laser closed-loop secondary confirmation by the laser inertia subsystem according to the closed-loop timestamp information issued by the vision.
Performing closed-loop detection on the bidirectional depth visual image, acquiring two timestamps of the current time and the closed-loop time, searching two laser key frames corresponding to the two timestamps in a historical laser key frame queue, respectively constructing local feature point clouds for the two laser key frames, performing point cloud registration on the two local feature point clouds, and if the registration is successful, the visual closed-loop is effective;
local feature point clouds are respectively constructed for a laser key frame corresponding to the current moment and a laser key frame corresponding to the closed-loop moment in the closed-loop detection of the laser radar point cloud, point cloud registration is carried out on the two local feature point clouds, and the laser closed loop is effective if the registration is successful.
In order to improve the efficiency of point cloud registration, the two local point clouds are subjected to ICP point cloud registration, and when ICP iteration converges and the matching score meets the threshold requirement, the closed loop is considered to be effective. Fig. 4 (a) is a state diagram before closed-loop correction provided by the embodiment of the present invention, and fig. 4 (b) is a state diagram after closed-loop correction provided by the embodiment of the present invention, which illustrates that the closed-loop correction effect of the present invention is better.
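A sketch of this secondary confirmation using Open3D's ICP is given below; the correspondence distance and the fitness threshold are assumptions, and Open3D's fitness score (higher is better) stands in for the matching score mentioned above.

```python
import numpy as np
import open3d as o3d

def verify_loop_by_icp(source_pts, target_pts, init_T=np.eye(4),
                       max_corr_dist=1.0, fitness_thresh=0.3):
    """Second confirmation of a loop closure by registering the two local
    feature clouds with ICP. Returns the relative transform if the loop is
    accepted, otherwise None."""
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(source_pts))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(target_pts))
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, max_corr_dist, init_T,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    # Accept the loop only when ICP converged to a good overlap; the relative
    # transform then becomes the loop-closure constraint in the factor graph.
    if result.fitness > fitness_thresh:
        return result.transformation
    return None
```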
In the embodiment of the invention, experiments are carried out on the public reference data sets KITTI and M2DGR to verify the effect of the laser radar on the secondary verification of the visual closed loop. The closed-loop detections of the bidirectional depth vision are verified by laser point cloud registration, and the results are shown in Table 1; the recorded data only contain cases in which visual closed-loop detection succeeded. The format of the data elements in the table is (id1 - id2), wherein id1 and id2 respectively denote the ID numbers of the laser key frame and the closed-loop laser key frame corresponding to the visual closed-loop information.
TABLE 1 closed-Loop validation of laser Vision Association
In Table 1, after secondary confirmation by laser point cloud registration, the closed loops with sequence numbers 1 and 3 are rejected because both a visual closed loop and a laser closed loop had already been detected successfully near the current position, so the closed-loop factor does not need to be computed again. The closed loop with sequence number 2 is rejected because its point cloud registration score is larger than the preset closed-loop threshold; the closed-loop information is therefore discarded and treated as a false visual detection. These results show that visual closed-loop detection produces false detections in some scenes, that point cloud registration in the laser radar closed-loop detection effectively eliminates them, and that the accuracy and recall of closed-loop detection are thereby improved.
As shown in Fig. 5, the measuring apparatus mounted on the mobile robot includes a multi-line laser radar, two depth cameras, and an IMU sensor. The sensors are mounted on a two-layer support frame whose bottom is fixed to the robot's mobile platform, and a 24V-to-220V inverter is fixed to the side of the platform. The motor drivers of the robot chassis are AC servo motor drivers; the upper layer uses a high-performance notebook computer as the computing platform, which connects to the laser radar through a network port and to the cameras through USB ports; the lower layer uses an embedded control board as the main control board, which communicates with the upper platform through a serial port.
Specifically, the measuring apparatus mounted on the mobile robot includes: a Leishen 32-line laser radar, front and rear ZED2 cameras (with built-in IMUs), a high-performance computing platform, and a main control board. The sensors are mounted on a two-layer support frame whose bottom is fixed to the mobile platform, and a 24V-to-220V inverter is fixed to the side of the platform. The motor drivers of the robot chassis are Panasonic AC servo motor drivers; the upper layer uses a high-performance notebook computer as the computing platform, which connects to the laser radar through a network port and to the ZED2 cameras through USB ports; the lower layer uses an embedded control board as the main control board, which communicates with the upper platform through a serial port.
The ZED2 binocular depth camera, produced by Stereolabs of the United States, offers high image resolution and a high frame rate. The camera integrates a high-precision IMU, a magnetometer, a barometer and other sensors, achieves a 6-axis pose position accuracy of 1 mm and an orientation accuracy of 0.1°, and is widely used in applications such as object detection, scene reconstruction, visual positioning and human-computer interaction. The laser radar is a Leishen 32-line lidar C32-151A with 32 laser channels, a ranging range of 100-200 m, a ranging accuracy of 3 cm, and a single-echo data rate of up to 650,000 points per second. Its vertical field of view is -16° to +15° with a vertical angular resolution of 1°, and its horizontal field of view is 360° with a horizontal angular resolution of 0.18°. In this embodiment, the camera image output frequency is set to 20 Hz, the IMU output frequency to 400 Hz, and the lidar output frequency to 10 Hz; both the ZED2 cameras and the lidar use ROS drivers to obtain their images and point cloud data. The high-performance notebook computer is configured with an Intel i7-10870H processor, 16 GB of memory, and an RTX 2060 graphics card. The embedded control board is designed and developed based on the STM32F103 and provides PWM output, IO acquisition, timer, and serial communication functions.
The whole software architecture is developed based on ROS. The high-performance computing platform runs Ubuntu 18.04 with the compatible Melodic version of ROS, and C++ is the main programming language. ROS is a robot operating system; as shown in Fig. 6, the ROS node diagram of the software system includes a visual feature extraction and tracking node, a visual odometer node, a visual closed-loop detection node, a laser point cloud distortion-removal node, a laser point cloud feature extraction node, an IMU pre-integration node, and a back-end optimization node, which are respectively responsible for visual feature extraction and tracking, visual odometer estimation, visual closed-loop detection, laser point cloud distortion removal, laser point cloud feature extraction, IMU pre-integration, and factor graph optimization.
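A minimal roscpp sketch of how the back-end optimization node's subscriptions might be wired; all topic names and message types here are assumptions, since the patent only names the nodes and their responsibilities:

    // Hedged sketch of the back-end optimization node's ROS wiring.
    #include <nav_msgs/Odometry.h>
    #include <ros/ros.h>
    #include <std_msgs/Float64MultiArray.h>

    void VisualOdomCallback(const nav_msgs::Odometry::ConstPtr& msg)      { /* add visual odometer factor  */ }
    void LaserOdomCallback(const nav_msgs::Odometry::ConstPtr& msg)       { /* add laser odometer factor   */ }
    void LoopInfoCallback(const std_msgs::Float64MultiArray::ConstPtr& m) { /* verify loop, add loop factor */ }

    int main(int argc, char** argv) {
      ros::init(argc, argv, "backend_optimization_node");
      ros::NodeHandle nh;
      ros::Subscriber vo_sub   = nh.subscribe("/visual_odometry", 50, VisualOdomCallback);
      ros::Subscriber lo_sub   = nh.subscribe("/laser_odometry", 50, LaserOdomCallback);
      ros::Subscriber loop_sub = nh.subscribe("/visual_loop_info", 10, LoopInfoCallback);
      // The optimized pose would be published here, e.g. on "/optimized_pose".
      ros::Publisher pose_pub = nh.advertise<nav_msgs::Odometry>("/optimized_pose", 10);
      ros::spin();  // callbacks drive the factor-graph optimization
      return 0;
    }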
It will be understood by those skilled in the art that the foregoing is only an exemplary embodiment of the present invention, and is not intended to limit the invention to the particular forms disclosed, since various modifications, substitutions and improvements within the spirit and scope of the invention are possible and within the scope of the appended claims.

Claims (10)

1. A bidirectional depth vision inertial pose estimation method combined with a multiline laser radar is characterized by comprising the following steps:
after the depth information of the feature points in the bidirectional depth vision image is recovered by using the laser radar point cloud, the bidirectional depth vision image is respectively tightly coupled with IMU data to obtain two pose estimates, and the two pose estimates are subjected to weighted fusion to obtain a visual odometer;
extracting feature points from the laser radar point cloud, constructing a laser point line residual error, taking a vision odometer or IMU estimation result as a priori pose estimation, combining the priori pose estimation with the laser point line residual error to construct a Gauss-Newton equation, and performing iterative optimization on the Gauss-Newton equation to obtain a laser odometer;
respectively carrying out closed-loop detection on the bidirectional depth visual image and the laser radar point cloud, verifying all closed-loop detection results by using laser radar point cloud registration, and if the closed loop is verified to be effective, taking the relative pose transformation relation between the laser key frames corresponding to the current moment and the closed-loop moment as closed-loop constraint;
taking the pose transformation relation between the visual odometer results corresponding to adjacent visual depth image key frames as a visual odometer constraint, taking the pose transformation relation of the laser odometer between adjacent laser key frames as a laser odometer constraint, and performing factor graph optimization with the laser odometer constraint, the visual odometer constraint and the closed-loop constraint as constraint terms to obtain the optimal pose estimation.
2. The method of claim 1, wherein the verifying the closed-loop detection result comprises:
performing closed-loop detection on the bidirectional depth visual images, acquiring the timestamps of the current moment and the closed-loop moment, searching the historical laser key-frame queue for the two laser key frames corresponding to the two timestamps, constructing a local feature point cloud for each of the two laser key frames, and performing point cloud registration between the two local feature point clouds, the visual closed loop being valid if the registration succeeds;
constructing local feature point clouds for the laser key frame corresponding to the current moment and the laser key frame corresponding to the closed-loop moment in the closed-loop detection of the laser radar point cloud, and performing point cloud registration between the two local feature point clouds, the laser closed loop being valid if the registration succeeds.
3. The method for estimating the inertial pose of bi-directional depth vision combined with the multiline lidar as recited in claim 1 or 2, wherein a closed-loop visual key frame detected by the closed loop of the bi-directional depth visual image meets requirements of a temporal consistency check and a spatial consistency check, the temporal consistency check means that an interval between a current closed-loop time and a last closed-loop time is greater than a preset temporal threshold, and the spatial consistency check means that a distance between the current depth visual image key frame and the closed-loop visual key frame is smaller than a preset distance when the closed loop occurs.
4. The method of bi-directional depth vision inertial pose estimation in combination with multiline lidar of claim 1 or 2, wherein the closed-loop detection comprises:
for each directional depth visual image in the bidirectional depth visual images, calculating bag-of-words similarity between the depth visual image and historical key frames, and taking the depth visual image with the maximum similarity as the closed-loop visual key frame;
acquiring the positions of all historical laser key frames and constructing a 3D Kd-tree, searching the Kd-tree at certain intervals according to the pose of the current laser key frame to find the nearest neighboring frame to the current laser key frame; if the distance between the current laser key frame and the neighboring frame is smaller than a set threshold, taking the neighboring frame as the laser key frame corresponding to the closed-loop moment.
5. The method for estimating the inertial pose of the two-way depth vision combined with the multiline laser radar according to claim 1 or 2, wherein the depth information of the feature points in the two-way depth vision image is recovered by the following steps:
constructing a local point cloud set from the undistorted laser radar point cloud, and recovering the depth of the feature points in the bidirectional depth visual images by finding matching depth information in the local point cloud set; if a matching point is missing, recovering the depth information with the depth camera that acquires the bidirectional depth visual image.
6. The method for bi-directional depth vision inertial pose estimation in combination with multiline lidar of claim 5, wherein the local point cloud set is constructed by:
stacking the laser radar point clouds after distortion removal, reserving the laser radar point clouds in an adjacent time range, removing old laser radar point clouds to obtain a local point cloud set, wherein the adjacent time range is dynamically updated according to translation increment and rotation increment:
TimeSet = Tmax * (1 - (μ * deltaA / Amax + η * deltaP / Pmax))
wherein TimeSet is the time span over which the local point cloud set is maintained, i.e. the adjacent time range, Tmax is the maximum value of TimeSet, μ and η are the proportionality coefficients of the rotation increment deltaA and the translation increment deltaP respectively, and Amax and Pmax are the maximum values of the rotation increment and the translation increment respectively.
7. The method for estimating the inertial pose of the two-way depth vision combined with the multiline laser radar according to claim 1 or 2, wherein the vision odometer is obtained by:
after the depth information of the feature points in the bidirectional depth visual images is recovered by using the laser radar point cloud, obtaining two pose estimates from the bidirectional depth visual images through sliding-window optimization that jointly optimizes the visual reprojection error and the IMU measurement error, and obtaining the visual odometer by weighted fusion of the two pose estimates.
8. The method of bi-directional depth vision inertial pose estimation in conjunction with multiline lidar of claim 1 or 2, wherein the factor graph optimization further comprises:
constructing IMU constraints based on the IMU data between the current laser key frame and the previous laser key frame, wherein the IMU constraints comprise an IMU pre-integration factor and an IMU bias factor, the IMU pre-integration factor is constructed from the IMU acceleration and angular velocity data, and the IMU bias factor is constructed from the IMU bias; and performing factor graph optimization with the laser odometer constraint, the visual odometer constraint, the closed-loop constraint and the IMU constraints as constraint terms to obtain the optimal pose estimation and the bias of the current laser key frame.
9. The method of bi-directional depth vision inertial pose estimation in conjunction with multiline lidar of claim 8, further comprising:
performing IMU pre-integration again on the IMU data after the current laser key frame moment using the bias optimized by the factor graph, thereby obtaining an IMU estimation result at each moment after the current laser key frame moment, and taking the IMU estimation result as the initial pose estimate for the visual odometer.
10. A bi-directional depth vision inertial pose estimation system in combination with a multiline lidar comprising: the system comprises a double-depth camera, a multi-line laser radar, an IMU sensor and a processor;
the double-depth camera is used for acquiring a bidirectional depth visual image;
the multi-line laser radar is used for collecting laser radar point cloud;
the IMU sensor is used for acquiring IMU data;
the processor includes:
the visual odometer estimator is used for recovering the depth information of the feature points in the bidirectional depth visual images by using the laser radar point cloud, obtaining two pose estimates from the bidirectional depth visual images through sliding-window optimization that jointly optimizes the visual reprojection error and the IMU measurement error, and obtaining the visual odometer by weighted fusion of the two pose estimates;
the laser odometer estimator is used for extracting characteristic points from the laser radar point cloud, constructing a laser point line residual error, taking a vision odometer or IMU estimation result as a priori pose estimation, combining the priori pose estimation with the laser point line residual error to construct a Gauss-Newton equation, and performing iterative optimization on the Gauss-Newton equation to obtain the laser odometer;
the closed-loop detection module is used for respectively carrying out closed-loop detection on the bidirectional depth visual image and the laser radar point cloud, verifying all closed-loop detection results by using point cloud registration of the laser radar, and if the closed loop is verified to be effective, taking the relative pose transformation relation between the laser key frames corresponding to the current moment and the closed-loop moment as closed-loop constraint;
the factor graph optimization module is used for taking the pose transformation relation between the visual odometer results corresponding to adjacent visual depth image key frames as a visual odometer constraint, taking the pose transformation relation of the laser odometer between adjacent laser key frames as a laser odometer constraint, and performing factor graph optimization with the laser odometer constraint, the visual odometer constraint and the closed-loop constraint as constraint terms to obtain the optimal pose estimation.
CN202210479056.4A 2022-04-28 2022-04-28 Bidirectional depth vision inertial pose estimation method combined with multi-line laser radar Pending CN114966734A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210479056.4A CN114966734A (en) 2022-04-28 2022-04-28 Bidirectional depth vision inertial pose estimation method combined with multi-line laser radar

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210479056.4A CN114966734A (en) 2022-04-28 2022-04-28 Bidirectional depth vision inertial pose estimation method combined with multi-line laser radar

Publications (1)

Publication Number Publication Date
CN114966734A true CN114966734A (en) 2022-08-30

Family

ID=82980191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210479056.4A Pending CN114966734A (en) 2022-04-28 2022-04-28 Bidirectional depth vision inertial pose estimation method combined with multi-line laser radar

Country Status (1)

Country Link
CN (1) CN114966734A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115235454A (en) * 2022-09-15 2022-10-25 中国人民解放军国防科技大学 Pedestrian motion constraint visual inertial fusion positioning and mapping method and device
CN115235454B (en) * 2022-09-15 2022-12-30 中国人民解放军国防科技大学 Pedestrian motion constraint visual inertial fusion positioning and mapping method and device
CN115326068A (en) * 2022-10-17 2022-11-11 北京理工大学 Design method and system for laser radar-inertial measurement unit fusion odometer
CN115326068B (en) * 2022-10-17 2023-01-24 北京理工大学 Design method and system for laser radar-inertial measurement unit fusion odometer
CN115755901A (en) * 2022-11-14 2023-03-07 杭州蓝芯科技有限公司 Mobile robot obstacle stopping control method and device
CN115655262A (en) * 2022-12-26 2023-01-31 广东省科学院智能制造研究所 Deep learning perception-based multi-level semantic map construction method and device
CN116184430A (en) * 2023-02-21 2023-05-30 合肥泰瑞数创科技有限公司 Pose estimation algorithm fused by laser radar, visible light camera and inertial measurement unit
CN116184430B (en) * 2023-02-21 2023-09-29 合肥泰瑞数创科技有限公司 Pose estimation algorithm fused by laser radar, visible light camera and inertial measurement unit
CN116698046A (en) * 2023-08-04 2023-09-05 苏州观瑞汽车技术有限公司 Map building, positioning and loop-back detection method and system for property indoor service robot
CN116698046B (en) * 2023-08-04 2023-12-01 苏州观瑞汽车技术有限公司 Map building, positioning and loop-back detection method and system for property indoor service robot

Similar Documents

Publication Publication Date Title
CN114966734A (en) Bidirectional depth vision inertial pose estimation method combined with multi-line laser radar
CN108827315B (en) Manifold pre-integration-based visual inertial odometer pose estimation method and device
CN107687850B (en) Unmanned aerial vehicle pose estimation method based on vision and inertia measurement unit
CN108051002B (en) Transport vehicle space positioning method and system based on inertial measurement auxiliary vision
US9910444B2 (en) Systems and methods for VSLAM optimization
KR102508843B1 (en) Method and device for the estimation of car egomotion from surround view images
CN112197770B (en) Robot positioning method and positioning device thereof
CN112734852B (en) Robot mapping method and device and computing equipment
CN112304307A (en) Positioning method and device based on multi-sensor fusion and storage medium
WO2017163596A1 (en) Autonomous navigation using visual odometry
CN111024066A (en) Unmanned aerial vehicle vision-inertia fusion indoor positioning method
CN109461208B (en) Three-dimensional map processing method, device, medium and computing equipment
CN107167826B (en) Vehicle longitudinal positioning system and method based on variable grid image feature detection in automatic driving
CN112634451A (en) Outdoor large-scene three-dimensional mapping method integrating multiple sensors
CN112083725A (en) Structure-shared multi-sensor fusion positioning system for automatic driving vehicle
CN109523589B (en) Design method of more robust visual odometer
CN111664843A (en) SLAM-based intelligent storage checking method
Oniga et al. Curb detection for driving assistance systems: A cubic spline-based approach
CN113503873B (en) Visual positioning method for multi-sensor fusion
CN110992424B (en) Positioning method and system based on binocular vision
CN116429116A (en) Robot positioning method and equipment
Wei et al. Novel robust simultaneous localization and mapping for long-term autonomous robots
Qayyum et al. Inertial-kinect fusion for outdoor 3d navigation
CN112731503A (en) Pose estimation method and system based on front-end tight coupling
Barth et al. Vehicle tracking at urban intersections using dense stereo

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination