CN113763470A - RGBD visual-inertial simultaneous localization and map construction with point-line feature fusion - Google Patents

RGBD visual-inertial simultaneous localization and map construction with point-line feature fusion

Info

Publication number
CN113763470A
Authority
CN
China
Prior art keywords
line
visual
rgbd
dotted line
inertia
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110914560.8A
Other languages
Chinese (zh)
Inventor
赵良玉
朱叶青
金瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202110914560.8A
Publication of CN113763470A
Legal status: Pending (current)

Classifications

    • G06T 7/74: Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06F 18/253: Fusion techniques of extracted features
    • G06T 17/05: Three-dimensional [3D] modelling; geographic models
    • G06T 7/248: Analysis of motion using feature-based methods (e.g. tracking of corners or segments) involving reference images or patches
    • G06T 7/269: Analysis of motion using gradient-based methods
    • G06T 2207/10024: Image acquisition modality; color image
    • G06T 2207/10028: Image acquisition modality; range image, depth image, 3D point clouds

Abstract

The invention discloses an RGBD visual-inertial simultaneous localization and map construction method, device, equipment and medium based on point-line feature fusion. The method comprises a front-end visual odometry process, a back-end optimization process and a three-dimensional map construction process. In the front-end visual odometry process, the images acquired by an RGBD camera and the IMU measurements are used as the input of the SLAM system; the front end operates on point and line features and comprises feature detection and matching, IMU pre-integration and visual-inertial alignment, where feature detection and matching covers point feature extraction and tracking as well as line feature extraction and tracking. The disclosed method, device, equipment and medium solve the problems of efficient state estimation and high-precision three-dimensional map construction for an autonomous robot in indoor scenes with illumination changes and low texture, and offer high precision, high efficiency and other advantages.

Description

RGBD visual-inertial simultaneous localization and map construction with point-line feature fusion
Technical Field
The invention relates to an RGBD visual-inertial simultaneous localization and map construction system and method based on point-line feature fusion, and belongs to the technical field of autonomous robot simultaneous localization and map construction.
Background
One of the goals of robotics is autonomous operation in the real world, and the simultaneous localization and mapping (SLAM) system is a key technology for autonomous robots.
Efficient state estimation and accurate three-dimensional map construction for autonomous robots in indoor scenes with illumination changes and weak texture face huge challenges. In such scenes, feature-point-based visual-inertial simultaneous localization and mapping (SLAM) methods suffer a large drop in localization accuracy because it is difficult to extract a sufficient number of effective feature points; image sequences with poor texture, motion blur and the like can even cause complete failure of the system.
Most traditional line feature detection methods adopt the EDLines algorithm as the tool for line feature detection. However, when facing a complex background or a noisy image, the algorithm easily detects too many short line segment features, wasting computing resources on line segment detection, description and matching, so that the localization accuracy decreases markedly. In addition, the algorithm often produces many adjacent similar line segments and splits long line segments into several short ones, which complicates the subsequent line segment matching task and increases the uncertainty of the SLAM system.
The construction of a three-dimensional dense point cloud map depends heavily on the accuracy of the pose estimation algorithm; if the pose estimation is not accurate enough, the constructed three-dimensional dense point cloud map easily overlaps and distorts.
For these reasons, the inventors conducted in-depth research on existing autonomous robot state estimation and three-dimensional dense point cloud map construction methods, and propose an RGBD visual-inertial simultaneous localization and map construction system and method based on point-line feature fusion.
Disclosure of Invention
In order to overcome these problems, the inventors conducted intensive research and designed an RGBD visual-inertial simultaneous localization and map construction method based on point-line feature fusion, which comprises a front-end visual odometry process, a back-end optimization process and a three-dimensional map construction process. Further, in the front-end visual odometry process, the input information of the visual-inertial simultaneous localization and mapping system is the images acquired by an RGBD camera and the measurements of an IMU.
Further, the front-end visual-inertial odometry is based on point-line features and comprises feature detection and matching, IMU pre-integration, and visual-inertial alignment.
The feature detection and matching comprises the extraction and tracking of point features and the extraction and tracking of line features.
Further, the extraction and tracking of line features comprises the following sub-steps: S11, line segment length suppression; S12, line feature extraction through near-line merging and short-line splicing; and S13, line feature tracking by an optical flow method. Preferably, step S13 comprises the following sub-steps: S131, anchor point matching; S132, point-to-line matching; and S133, line-to-line matching. Preferably, the front-end visual-inertial odometry process further comprises a step S14 of parameterizing the point and line features; in the parameterization of the point and line features, error constraints are imposed on the line features.
Preferably, the IMU pre-integration value is taken as a constraint between two consecutive camera frame images.
Preferably, in the back-end optimization process, all state variables in the sliding window are optimized by minimizing the sum of all measurement residuals.
The invention also provides an RGBD visual-inertial simultaneous localization and map construction device based on point-line feature fusion, which comprises a front-end visual odometry module, a back-end optimization module and a three-dimensional map construction module.
The front-end visual-inertial odometry module comprises a feature detection and matching sub-module, an IMU pre-integration sub-module and a visual-inertial alignment sub-module; the IMU pre-integration sub-module takes the IMU pre-integration value as a constraint between two frames of images, and the back-end optimization module optimizes all state variables in a sliding window by minimizing the sum of all measurement residuals.
The present invention also provides an electronic device, comprising: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the above methods.
The present invention also provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to perform any of the above methods.
The invention has the following advantages:
(1) in the RGBD visual-inertial simultaneous localization and map construction system based on point-line feature fusion, the introduction of line features alleviates the large front-end data association error of feature-point-based visual-inertial odometry in weak-texture scenes, and effectively improves the accuracy of motion estimation between adjacent frames in such scenes;
(2) the length suppression, near-line merging and short-line splicing strategies remedy the low line segment extraction quality of traditional algorithms while preserving fast extraction, and reduce the line feature mismatching rate of the system;
(3) tracking line features with an optical flow method significantly improves the efficiency of the line feature tracking algorithm;
(4) the constructed high-precision dense point cloud map enables efficient state estimation and accurate three-dimensional point cloud construction in indoor scenes with illumination changes and weak texture.
Drawings
FIG. 1 illustrates a flow diagram of a point-line feature fused RGBD visual-inertial simultaneous localization and mapping method in accordance with a preferred embodiment of the present invention;
FIG. 2 illustrates the comparison of detection effects before and after line segment length suppression in a point-line feature fused RGBD visual-inertial simultaneous localization and mapping method according to a preferred embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the merging of multiple proximate segments into a single segment in a point-line feature fused RGBD visual-inertial simultaneous localization and mapping method in accordance with a preferred embodiment of the present invention;
FIG. 4 is a diagram illustrating the joining of shorter segments into a long segment in a point-line feature fused RGBD visual-inertial simultaneous localization and mapping method according to a preferred embodiment of the present invention;
FIG. 5 is a schematic diagram of the process of matching the points of frame I0 in frame I1 in a point-line feature fused RGBD visual-inertial simultaneous localization and mapping method according to a preferred embodiment of the present invention;
FIG. 6 is a diagram illustrating the point-to-line matching process in a point-line feature fused RGBD visual-inertial simultaneous localization and mapping method according to a preferred embodiment of the present invention;
FIG. 7 is a diagram illustrating the line-to-line matching process in a point-line feature fused RGBD visual-inertial simultaneous localization and mapping method according to a preferred embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating the line feature reprojection error in a point-line feature fused RGBD visual-inertial simultaneous localization and mapping method according to a preferred embodiment of the present invention;
FIG. 9-a shows the cafe line feature detection results in Example 1; FIG. 9-b shows the cafe point feature detection results in Example 1; FIG. 9-c shows the cafe line detection and tracking results in Example 1;
FIG. 10-a shows the home 1 line feature detection results in Example 1; FIG. 10-b shows the home 1 point feature detection results in Example 1; FIG. 10-c shows the home 1 line detection and tracking results in Example 1;
FIG. 11-a shows the home 2 line feature detection results in Example 1; FIG. 11-b shows the home 2 point feature detection results in Example 1; FIG. 11-c shows the home 2 line detection and tracking results in Example 1;
FIG. 12-a shows the corridor line feature detection results in Example 1; FIG. 12-b shows the corridor point feature detection results in Example 1; FIG. 12-c shows the corridor line detection and tracking results in Example 1;
FIG. 13-a shows the trajectory in the corridor scene of Example 1 compared with the ground truth and the corresponding error; FIG. 13-b shows the comparison from a top-view perspective; FIG. 13-c shows the comparison from a side-view perspective;
FIG. 14 shows the results of Example 2 and Comparative Examples 3 to 5 in scene Lab14; FIG. 15 shows the translation error box plot on scene Lab1; FIG. 16 shows the rotation error box plot on scene Lab1;
FIG. 17 shows the position drift of Example 2 in scene Lab3; FIG. 18 shows the position drift of Example 2 in scene Lab6; FIGS. 19 to 21 show the three-dimensional maps obtained in Comparative Example 3 and Example 2.
Detailed Description
The invention is explained in more detail below with reference to the figures and examples. The features and advantages of the present invention will become more apparent from the description.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
Most traditional visual-inertial simultaneous localization and mapping (SLAM) methods are based on point feature detection and description. However, in indoor scenes with illumination changes and weak texture, it is difficult to extract a sufficient number of effective feature points, so the localization accuracy decreases greatly, and image sequences with poor texture, motion blur and the like can even cause complete failure of the system.
In view of the above, the inventors introduce line features and inertial measurement unit (IMU) data into the conventional SLAM method. Line feature detection is more robust than point feature detection under low-texture conditions, and the inertial measurements can still provide good motion estimates when only a small number of features are present in the image sequence, so the SLAM result remains stable when the viewing angle of the robot changes significantly.
The RGBD visual-inertial simultaneous localization and map construction method based on point-line feature fusion comprises a front-end visual odometry process, a back-end optimization process and a three-dimensional map construction process.
In the front-end visual odometry process, the input information of the visual-inertial simultaneous localization and mapping system is the images acquired by an RGBD camera and the measurements of an IMU.
An RGBD camera is a camera capable of simultaneously acquiring and outputting RGB information and depth information, and is one of the cameras commonly used for image recognition; an IMU is a detection unit commonly used on robots, namely a sensor measuring the acceleration and angular velocity of the robot.
Further, the front-end visual-inertial odometry is based on point features and line features, and comprises feature detection and matching, IMU pre-integration, and visual-inertial alignment, as shown in fig. 1.
The feature detection and matching comprises the extraction and tracking of point features and the extraction and tracking of line features.
In the present invention, the method for extracting and tracking point features is not particularly limited, and a point feature extraction and tracking method of conventional SLAM may be adopted; for example, Shi-Tomasi corners are extracted as feature points, the KLT optical flow method is used to track the feature points, and points with large forward-backward differences are eliminated based on a backward optical flow check.
The Shi-Tomasi corner extraction follows J. Shi, C. Tomasi, "Good features to track", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 1994, pp. 593-600; the KLT optical flow method follows B. D. Lucas, T. Kanade, "An iterative image registration technique with an application to stereo vision", Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI), 1981, pp. 24-28; the backward optical flow check follows S. Baker, I. Matthews, "Lucas-Kanade 20 years on: a unifying framework", International Journal of Computer Vision, 2004, 56(3): 221-255.
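For illustration only, a minimal sketch of such a point-feature front end (Shi-Tomasi corners, KLT optical flow tracking, and a backward optical flow consistency check) is given below; it relies on OpenCV, is not the implementation of the present invention, and its function name, thresholds and frame variables are assumptions chosen for the example.

```python
import cv2
import numpy as np

def track_points(prev_gray, cur_gray, max_corners=150, fb_thresh=1.0):
    """Detect Shi-Tomasi corners in prev_gray and track them into cur_gray."""
    # Shi-Tomasi corner detection on the previous grayscale frame.
    pts0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                   qualityLevel=0.01, minDistance=20)
    if pts0 is None:
        return np.empty((0, 2)), np.empty((0, 2))
    # Forward KLT tracking into the current frame.
    pts1, st, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts0, None)
    # Backward tracking; a large forward-backward difference marks an unreliable track.
    pts0_back, st_back, _ = cv2.calcOpticalFlowPyrLK(cur_gray, prev_gray, pts1, None)
    fb_err = np.linalg.norm(pts0.reshape(-1, 2) - pts0_back.reshape(-1, 2), axis=1)
    good = (st.ravel() == 1) & (st_back.ravel() == 1) & (fb_err < fb_thresh)
    return pts0.reshape(-1, 2)[good], pts1.reshape(-1, 2)[good]
```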
Compared with a conventional SLAM system, the present invention adds line feature extraction and tracking to the front-end visual odometry; the added line features provide additional constraints for the system and thus improve the accuracy of state estimation.
In the prior art, line feature extraction methods include the LSD method, the FLD method, the EDLines method and the like. The LSD and FLD methods are slow and cannot meet the real-time requirement of SLAM; the EDLines method is fast and efficient, but when facing a low-texture scene or a noisy image it tends to detect a large number of short line features, wasting computing resources on line detection, description and matching and markedly reducing the localization accuracy.
In a SLAM system fusing point and line features, the line features only serve to provide additional constraints that improve the accuracy of state estimation. The present invention therefore improves the EDLines method so that it detects only the salient line features in a scene while maintaining its original detection efficiency, yielding more effective line feature detection results.
Specifically, in the present invention, the extraction and tracking of line features comprises the following sub-steps:
s11, restraining the length of the line segment; s12, extracting line features through near line combination and short line splicing; s13, tracking line features through an optical flow method; preferably, after step S12, there may be step S14, point, line feature parameterization.
Since longer line segments are more stable and can be repeatedly detected across multiple frames, the localization accuracy increases with the number of longer line segments. In step S11, line segments whose length is below a threshold are deleted through line segment length suppression, which effectively improves the quality of the extracted line segments. The comparison of detection effects before and after line segment length suppression is shown in fig. 2.
Specifically, the line segment length suppression is expressed as:

len_i = √((x_sp − x_ep)² + (y_sp − y_ep)²) ≥ len_min = ⌈η · min(W_I, H_I)⌉   (one)

where len_i denotes the length of the i-th line segment in the image, len_min is the threshold on the shortest line feature length, W_I is the image width, H_I is the image height, ⌈·⌉ denotes rounding up, η is a proportionality coefficient, (x_sp, y_sp) are the coordinates of the start point of the line segment, and (x_ep, y_ep) are the coordinates of the end point of the line segment.
In the invention, the minimum value of the image width and the image height is taken as one of key indexes of threshold setting, so that the line length inhibition can adapt to different image sizes.
Preferably, the proportionality coefficient η takes a value of 0.01-0.2, more preferably 0.1; through extensive experiments, the inventors found that this value best guarantees the quality of line segment detection.
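For illustration only, a minimal sketch of the length suppression of step S11 follows, based on the reconstruction of formula (one) above; the segment representation (x_sp, y_sp, x_ep, y_ep) and the function name are assumptions chosen for the example.

```python
import math

def suppress_short_segments(segments, img_w, img_h, eta=0.1):
    """Keep only line segments whose length reaches ceil(eta * min(img_w, img_h))."""
    len_min = math.ceil(eta * min(img_w, img_h))
    kept = []
    for (x_sp, y_sp, x_ep, y_ep) in segments:
        length = math.hypot(x_ep - x_sp, y_ep - y_sp)   # Euclidean segment length
        if length >= len_min:
            kept.append((x_sp, y_sp, x_ep, y_ep))
    return kept
```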
Considering that longer line segment features tend to come from continuous regions with larger gradients, where the detection accuracy is more reliable, it is necessary to merge adjacent line segments so that the resulting line segments are longer.
In step S12, several closely spaced line segments are merged into one line segment (near-line merging), as shown in fig. 3, and several shorter line segments are spliced into one long line segment (short-line splicing), as shown in fig. 4.
As shown in FIG. 3, L_i, L_j and L_k denote line segments to be merged, each represented by its two end points; as shown in FIG. 4, L_u and L_v denote two line segments to be spliced, likewise represented by their end points.
Preferably, in step S12, the method includes the following sub-steps:
s121, sequencing the line segments; s122, screening the line segments; s123, filtering the screened line segment group to obtain a merging candidate line segment set and a splicing candidate line segment set; and S124, merging or splicing the line segments.
In step S121, the line segments are sorted by length, preferably in descending order from long to short, to obtain a line segment sequence {L_1, L_2, ..., L_n}, in which L_1 is the longest line segment.
Since longer line segments tend to come from image regions with continuous strong gradients, it is more reliable to start from the longest line segment L_1.
Further, the line segments other than L_1 are represented as the remaining line segment set

L = {L_2, L_3, ..., L_n}   (two)

where n is the total number of line segments.
In step S122, the screening includes angle screening, which can be expressed as:

L_α = { L_m ∈ L : |θ_1 − θ_m| < θ_min }   (three)

where L_α denotes the candidate line segment group obtained by angle screening, L_m denotes a line segment in the remaining line segment set whose angle with respect to the horizontal direction is θ_m, θ_1 is the angle of L_1 with respect to the horizontal direction, and θ_min is an angle threshold measuring the similarity of line segments.
Through extensive experiments, the inventors determined that θ_min is most preferably π/90.
In step S123, the obtained candidate line segment group L_α is further filtered. The filtering of the near-line merging process (formula four) yields the merging candidate line segment set L_β1, and the filtering of the short-line splicing process (formula five) yields the splicing candidate line segment set L_β2; both filters compare the spatial proximity of a candidate segment to L_1 against the threshold d_min.
In formulas (four) and (five), d_min is the threshold measuring the proximity of line segments; it is preferably 5-15 pixels, and extensive experimental analysis shows 9 pixels to be most preferable.
Through this filtering, the line segments that are close to L_1 in both angle and spatial distance are grouped together, giving the candidate line segment set L_β = {L_β1, L_β2}.
In step S124, the filtered candidate line segment set L_β is combined with the line segment L_1 to form a new line segment set {L_1, L_β}; the two end points that are farthest apart within this set are selected as the start point and end point of a new line segment, synthesizing a new line segment L_M.
The angle θ_M of the synthesized line segment L_M is then recomputed.
If θ_M < θ_min is satisfied, the line segments in the set {L_1, L_β} are merged or spliced, and the synthesized line segment L_M replaces the set {L_1, L_β};
if θ_M < θ_min is not satisfied, the merging or splicing is abandoned, since the angle difference before and after merging is too large and the synthesized result deviates from the original line segments.
The above process is repeated until all line segments in the image have been merged or spliced.
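For illustration only, a minimal sketch of steps S121 to S124 (length sorting, angle screening, proximity filtering, and synthesis from the two farthest end points) is given below; since the exact filters of formulas (four) and (five) are not reproduced here, the proximity test is a simplified closest end-point distance check, and the segment data layout is an assumption chosen for the example.

```python
import math
import numpy as np

def seg_angle(s):
    """Angle of a segment ((x1, y1), (x2, y2)) with respect to the horizontal axis."""
    (x1, y1), (x2, y2) = s
    return math.atan2(y2 - y1, x2 - x1) % math.pi

def merge_segments(segments, theta_min=math.pi / 90, d_min=9.0):
    # S121: sort by length, longest first.
    segs = sorted(segments,
                  key=lambda s: np.linalg.norm(np.subtract(s[1], s[0])),
                  reverse=True)
    merged, used = [], [False] * len(segs)
    for i, base in enumerate(segs):
        if used[i]:
            continue
        group = [base]
        for j in range(i + 1, len(segs)):
            if used[j]:
                continue
            cand = segs[j]
            # S122: angle screening (wrap-around at pi is ignored for brevity).
            if abs(seg_angle(cand) - seg_angle(base)) >= theta_min:
                continue
            # S123 (simplified): spatial proximity via the closest end-point pair.
            gap = min(np.linalg.norm(np.subtract(p, q)) for p in base for q in cand)
            if gap < d_min:
                group.append(cand)
                used[j] = True
        # S124: the new segment spans the two farthest end points of the group.
        pts = [p for s in group for p in s]
        a, b = max(((p, q) for p in pts for q in pts),
                   key=lambda pq: np.linalg.norm(np.subtract(pq[0], pq[1])))
        new_seg = (a, b)
        if abs(seg_angle(new_seg) - seg_angle(base)) < theta_min:
            merged.append(new_seg)              # accept the synthesized segment
        else:
            merged.extend(group)                # angle changed too much: keep originals
        used[i] = True
    return merged
```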
Beyond step S12, one of the problems to be solved by the present invention is how to determine the correspondence between line segments in two consecutive images and thereby realize line segment tracking.
In step S13, the points on the line segment are matched by the optical flow method, and then the matching points are counted to realize the tracking of the line segment.
Optical flow is commonly used for tracking point features, and is improved in the present invention to enable tracking of line segments.
Specifically, the method comprises the following substeps: s131, matching anchor points; s132, matching points with lines; and S133, matching the lines.
In step S131, the anchor points are spaced points in a line segment, and one line segment is characterized by a plurality of anchor points.
In the present invention, two consecutive frame images are denoted I_0 and I_1. The set of line segments in the I_0 frame image is represented as:
L_0 = { ⁰l_i | ⁰l_i = (⁰p_i, ⁰q_i), 0 ≤ i ≤ M_0 }, where M_0 denotes the total number of line segments in the I_0 frame image, ⁰l_i denotes the i-th line segment in the I_0 frame image, and ⁰p_i, ⁰q_i denote the two end points of the i-th line segment in the I_0 frame image;
likewise, the set of line segments in the I_1 frame image is represented as:
L_1 = { ¹l_k | ¹l_k = (¹p_k, ¹q_k), 0 ≤ k ≤ M_1 }, where M_1 denotes the total number of line segments in the I_1 frame image, ¹l_k denotes the k-th line segment in the I_1 frame image, and ¹p_k, ¹q_k denote the two end points of the k-th line segment in the I_1 frame image.
For each line segment ⁰l_i = (⁰p_i, ⁰q_i) in the I_0 frame image, the start point, the end point and several points along the direction from the start point to the end point are selected as anchor points. As shown in FIG. 5, the anchor points are denoted { ⁰λ_{i,j} | 0 ≤ j < N_i }, where ⁰λ_{i,j} denotes the j-th anchor point on the i-th line segment and N_i denotes the total number of anchor points on the i-th line segment, preferably 12.
Further, the unit vector in the direction from the start point to the end point is:
⁰a_i = [⁰a_{i,0}, ⁰a_{i,1}]^T = (⁰q_i − ⁰p_i) / ||⁰q_i − ⁰p_i||
Further, in the I_1 frame image, for each anchor point ⁰λ_{i,j} of the I_0 frame image a matching point ¹λ_{i,j} can be found along the normal direction ⁰n_i, where the normal direction is:
⁰n_i = [−⁰a_{i,1}, ⁰a_{i,0}]^T
in step S122, in I1In the frame image, each matching point is obtained1λi,jTo different line segments1lkObtaining a distance set dkThe shortest distance in the distance set is denoted as dminThe line segment corresponding to the shortest distance is represented as1lclosett(i,j)As shown in fig. 6, in this example,
further, if dmin<dthresholdThen the matching point is considered1λi,jBelong to line segment1lclosest(i,j)
Otherwise, consider the matching point1λi,jNot belonging to line segments1lclosest(i,j)Judging the next anchor point;
wherein d isthresholdThe empirical threshold is preferably 1 pixel.
In step S133, as shown in FIG. 7, for I0Arbitrary line segment in frame image0li,0≤i<M0Each anchor point of (1) at1All have unique matching points in the frame image1λi,jThe matching point falls on the line segment1lclosest(i,j)The above. For I0Line segments within a frame0liCalculate it at I1Number of anchor points matched in frame, set as ZiIs a line segment1lmatch(i)Number of anchor points matched, if Zi/Ni> R, definition0liAnd1 l match(i)0≤match(i)<M1matched to each other, where R is an empirical threshold, preferably 0.6 pixels.
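For illustration only, a minimal sketch of the anchor-based line tracking of steps S131 to S133 is given below; it tracks the anchor points with KLT optical flow rather than the normal-direction search described above, the data structures are assumptions chosen for the example, and the thresholds follow the preferred values in the text (12 anchors, 1 pixel, ratio 0.6).

```python
import cv2
import numpy as np

def point_to_segment_dist(pt, p, q):
    """Distance from point pt to the segment with end points p and q."""
    d = q - p
    t = np.clip(np.dot(pt - p, d) / (np.dot(d, d) + 1e-12), 0.0, 1.0)
    return np.linalg.norm(pt - (p + t * d))

def track_lines(img0, img1, lines0, lines1, n_anchors=12, d_thr=1.0, ratio=0.6):
    """Return a dict mapping segment indices of I0 to matched segment indices of I1."""
    matches = {}
    if not lines1:
        return matches
    for i, (p, q) in enumerate(lines0):
        p, q = np.float32(p), np.float32(q)
        # S131: anchors sampled from the start point to the end point.
        anchors = np.float32([p + s * (q - p) for s in np.linspace(0.0, 1.0, n_anchors)])
        tracked, st, _ = cv2.calcOpticalFlowPyrLK(img0, img1,
                                                  anchors.reshape(-1, 1, 2), None)
        votes = {}
        for a, ok in zip(tracked.reshape(-1, 2), st.ravel()):
            if not ok:
                continue
            # S132: assign the tracked anchor to the nearest segment of I1.
            dists = [point_to_segment_dist(a, np.float32(pk), np.float32(qk))
                     for (pk, qk) in lines1]
            k = int(np.argmin(dists))
            if dists[k] < d_thr:
                votes[k] = votes.get(k, 0) + 1
        # S133: accept the match when enough anchors vote for the same segment.
        if votes:
            best = max(votes, key=votes.get)
            if votes[best] / n_anchors > ratio:
                matches[i] = best
    return matches
```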
The inventors have found that although a solution based on line features is a good choice in low-texture scenes lacking point features, line feature occlusion or false detection tends to occur in repetitive scenes or low-texture scenes. In the present invention, the above problem is solved by a line feature error constraint.
In step S14, the point and line features are parameterized, and in the parameterization process, the line features are subjected to error constraint.
Specifically, in the present invention, the errors of the point features and the line features are constrained as follows:

e = Σ_{i,j} || x_{i,j} − π(T_{Iw} X_{w,j}) ||² + Σ_{i,k} [ d(p_{i,k}, l′_{i,k})² + d(q_{i,k}, l′_{i,k})² ]

where, as shown in FIG. 8, the first term is the point projection error term and the second term is the line projection error term; T_{Iw} denotes the pose of the RGBD camera of the I-th frame image in the world coordinate system; X_{w,j} denotes the j-th point observed in the I-th frame, and x_{i,j} is its projection onto the image plane; P_{w,k} and Q_{w,k} denote the two end points of the k-th line segment observed in the I-th frame, p_{i,k} and q_{i,k} are the corresponding points on the image plane, and P′_{i,k}, Q′_{i,k} are the end points of the projection of P_{w,k}, Q_{w,k} onto the image plane; d(p_{i,k}, l′_{i,k}) is the distance from p_{i,k} to the projected line segment l′_{i,k} = P′_{i,k}Q′_{i,k}, and d(q_{i,k}, l′_{i,k}) is the distance from q_{i,k} to the projected line segment l′_{i,k}; π(·) is the camera projection model and π_h(·) is the camera projection model in homogeneous coordinates. Further:

π(P_w) = [ f_x·X_O/Z_O + c_x,  f_y·Y_O/Z_O + c_y ]^T,  with O = [X_O, Y_O, Z_O]^T = R_w P_w + t_w

where O is the point coordinate in the RGBD camera coordinate system, P_w is the point coordinate in the world coordinate system, R_w denotes the coordinate rotation, t_w denotes the coordinate translation, f_x, f_y denote the focal lengths of the RGBD camera, and c_x, c_y denote the principal point of the RGBD camera.
Further, the coordinates of P_{w,k} in the camera coordinate system are denoted O_p, and the coordinates of Q_{w,k} in the camera coordinate system are denoted O_q; they are obtained by:

O_p = R_w P_{w,k} + t_w,   O_q = R_w Q_{w,k} + t_w

The Jacobian matrix of the reprojection error of the projected line segment l′_{i,k} with respect to the end points P_{w,k}, Q_{w,k} is then computed; it is expressed in terms of the components of O_p and of O_q on the x, y and z coordinate axes of the camera coordinate system, where l_p is the vector pointing from the camera optical center to P′_{i,k} and l_q is the vector pointing from the camera optical center to Q′_{i,k}.
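For illustration only, a minimal sketch of the pinhole projection π(·) and of the point and line reprojection residuals described above is given below; the function names and the stacking of the two end-point distances into one residual vector are assumptions chosen for the example, and no Jacobians are computed.

```python
import numpy as np

def project(P_w, R_w, t_w, fx, fy, cx, cy):
    """Pinhole projection: world point -> camera frame -> pixel coordinates."""
    O = R_w @ P_w + t_w
    return np.array([fx * O[0] / O[2] + cx, fy * O[1] / O[2] + cy])

def point_residual(x_obs, X_w, R_w, t_w, K):
    # Pixel difference between the observation and the projected map point.
    return x_obs - project(X_w, R_w, t_w, *K)

def line_residual(p_obs, q_obs, P_w, Q_w, R_w, t_w, K):
    # Distances of the observed end points to the projected (infinite) line.
    Pp = project(P_w, R_w, t_w, *K)
    Qp = project(Q_w, R_w, t_w, *K)
    l = np.cross(np.append(Pp, 1.0), np.append(Qp, 1.0))   # homogeneous line
    l /= np.linalg.norm(l[:2]) + 1e-12                      # normalize for pixel distances
    return np.array([np.dot(l, np.append(p_obs, 1.0)),
                     np.dot(l, np.append(q_obs, 1.0))])
```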
Conventional SLAM mostly adopts BA-optimization-based methods; without an IMU pre-integration process, the frame velocity and rotation need to be updated at every state adjustment and re-integration is required after each iteration, which makes the propagation strategy very time-consuming.
An IMU typically comprises a three-orthogonal-axis accelerometer and a three-axis gyroscope, measuring the acceleration and angular velocity relative to the inertial frame. Affected by noise, the IMU measurements typically contain an additive white noise n and a time-varying bias b, expressed as:

â_t = a_t + b_{a_t} + R_w^t g^w + n_a
ω̂_t = ω_t + b_{ω_t} + n_ω

where a_t denotes the ideal noise-free value of the three-orthogonal-axis accelerometer, b_{a_t} denotes the acceleration bias, R_w^t denotes the rotation from the world coordinate system to the body coordinate system, g^w denotes the gravitational acceleration in the world coordinate system, n_a denotes the accelerometer noise, ω_t denotes the ideal bias-free value of the three-axis gyroscope, b_{ω_t} denotes the gyroscope bias, and n_ω denotes the gyroscope noise.
Further, the accelerometer noise n_a and the gyroscope noise n_ω can be expressed as Gaussian white noise:

n_a ∼ N(0, σ_a²),   n_ω ∼ N(0, σ_ω²)

Further, the derivative of the acceleration bias b_{a_t} and the derivative of the gyroscope bias b_{ω_t} can be expressed as Gaussian white noise (random walk):

ḃ_{a_t} = n_{b_a} ∼ N(0, σ_{b_a}²),   ḃ_{ω_t} = n_{b_ω} ∼ N(0, σ_{b_ω}²)
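For illustration only, a minimal sketch of the IMU measurement model above follows: each raw reading is the ideal value corrupted by a slowly varying bias (random walk) and additive white noise, with gravity rotated into the body frame for the accelerometer; the noise magnitudes and the function interface are assumptions chosen for the example.

```python
import numpy as np

def simulate_imu_sample(a_true, w_true, R_bw, g_w, bias_a, bias_w,
                        sigma_a=0.02, sigma_w=0.002,
                        sigma_ba=1e-4, sigma_bw=1e-5, dt=0.005, rng=None):
    """Return one noisy accelerometer/gyroscope sample and the updated biases."""
    rng = rng if rng is not None else np.random.default_rng()
    # Measurement = ideal value + bias + (rotated gravity for the accelerometer) + noise.
    a_meas = a_true + bias_a + R_bw @ g_w + sigma_a * rng.standard_normal(3)
    w_meas = w_true + bias_w + sigma_w * rng.standard_normal(3)
    # The biases evolve as a random walk driven by Gaussian white noise.
    bias_a = bias_a + sigma_ba * np.sqrt(dt) * rng.standard_normal(3)
    bias_w = bias_w + sigma_bw * np.sqrt(dt) * rng.standard_normal(3)
    return a_meas, w_meas, bias_a, bias_w
```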
in the present invention, the IMU pre-integration value is used as a constraint between two frames of images, together with the camera re-projection residual, in the local window optimization process.
Specifically, let b_k and b_{k+1} be two successive image frames, with the b_k frame corresponding to time t_k and the b_{k+1} frame corresponding to time t_{k+1}. Between t_k and t_{k+1}, the IMU pre-integration values are used as the following constraints:

α^{b_k}_{b_{k+1}} = ∬_{τ∈[t_k, t_{k+1}]} R^{b_k}_τ (â_τ − b_{a_τ}) dτ²
β^{b_k}_{b_{k+1}} = ∫_{τ∈[t_k, t_{k+1}]} R^{b_k}_τ (â_τ − b_{a_τ}) dτ
γ^{b_k}_{b_{k+1}} = ∫_{τ∈[t_k, t_{k+1}]} ½ Ω(ω̂_τ − b_{ω_τ}) γ^{b_k}_τ dτ

where α^{b_k}_{b_{k+1}} denotes the translation from the b_k frame to the b_{k+1} frame, β^{b_k}_{b_{k+1}} denotes the velocity from the b_k frame to the b_{k+1} frame, γ^{b_k}_{b_{k+1}} denotes the rotation from the b_k frame to the b_{k+1} frame, R^{b_k}_τ denotes the rotation from time τ to the b_k frame, γ^{b_k}_t denotes the rotation from time t to the b_k frame, and τ denotes the intermediate time variable of the double integration.
Further, Ω(·) can be expressed as:

Ω(ω) = [ −ω^∧   ω ; −ω^T   0 ]

where ω^∧ denotes the antisymmetric (skew-symmetric) matrix of ω.
In the present invention, when the pre-integration is computed between two consecutive camera frames, the Jacobian matrix of each pre-integration term is also computed, which serves to correct for the influence of bias changes. Further, if the estimated bias change is large, the pre-integration is re-propagated under the new bias value. This process saves a large amount of computation, improves computational efficiency, and speeds up localization and map construction.
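For illustration only, a minimal sketch of pre-integrating the IMU samples between two camera frames b_k and b_{k+1} is given below; it uses a simple Euler scheme with a rotation matrix in place of the quaternion form above, so it approximates the α, β, γ terms rather than reproducing the exact formulation of the present invention.

```python
import numpy as np

def skew(w):
    """Antisymmetric (skew-symmetric) matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def preintegrate(acc, gyr, dt, bias_a, bias_w):
    """Accumulate relative translation (alpha), velocity (beta) and rotation (gamma)."""
    alpha = np.zeros(3)   # double integral of bias-corrected acceleration
    beta = np.zeros(3)    # single integral of bias-corrected acceleration
    gamma = np.eye(3)     # rotation from b_k to the current sample time
    for a_m, w_m in zip(acc, gyr):
        a = a_m - bias_a
        w = w_m - bias_w
        alpha += beta * dt + 0.5 * (gamma @ a) * dt ** 2
        beta += (gamma @ a) * dt
        # First-order approximation of the exponential map for the rotation increment.
        gamma = gamma @ (np.eye(3) + skew(w * dt))
    return alpha, beta, gamma
```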
The visual-inertial alignment adopts a loosely coupled vision-IMU scheme: structure from motion (SfM) is first used to estimate the camera poses and the spatial positions of three-dimensional points, which are then aligned with the IMU pre-integration data to obtain the gravity direction, the scale factor, the gyroscope bias and the velocity corresponding to each frame.
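As an illustration of one part of this loosely coupled alignment, the sketch below estimates the metric scale by least-squares fitting the up-to-scale SfM translation increments to the corresponding IMU pre-integrated translations; the gravity direction, gyroscope bias and per-frame velocities are solved analogously in the full method, and the interface here is an assumption chosen for the example.

```python
import numpy as np

def estimate_scale(delta_p_sfm, delta_p_imu):
    """Closed-form least squares: s* = argmin_s sum || s * dp_sfm - dp_imu ||^2."""
    num = sum(np.dot(a, b) for a, b in zip(delta_p_sfm, delta_p_imu))
    den = sum(np.dot(a, a) for a in delta_p_sfm)
    return num / den
```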
According to the present invention, in the back-end optimization process, unlike the conventional SLAM method, all state variables in the sliding window are optimized by minimizing the sum of all measurement residuals, which can be expressed as:

min_χ { ||e_prior||² + Σ_{k∈χ_b} || r_B(ẑ^{b_k}_{b_{k+1}}, χ) ||² + Σ_{(i,j)∈χ_p} ρ( || r_P(ẑ^{c_i}_j, χ) ||² ) + Σ_{(i,l)∈χ_l} ρ( || r_L(ẑ^{c_i}_l, χ) ||² ) }

where r_B(ẑ^{b_k}_{b_{k+1}}, χ) is the IMU measurement residual between frame I and frame I+1 in the body coordinate system, χ_b is the set of all IMU pre-integrations within the sliding window, r_P is the reprojection error of the point features, r_L is the reprojection error of the line features, χ_p is the set of point features observed in frame I, χ_l is the set of line features observed in frame I, e_prior is the prior information, and ρ(·) is the Cauchy robust function, which suppresses outliers.
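For illustration only, a minimal sketch of the back-end objective follows: the sum of the prior, IMU, point and line residual terms over the sliding window, with a Cauchy robust kernel on the visual terms. The residuals are treated as given inputs; a real system would hand this cost, together with its Jacobians, to a nonlinear least-squares solver.

```python
import numpy as np

def cauchy(sq_norm, c=1.0):
    """Cauchy robust kernel rho(s) = c^2 * log(1 + s / c^2); down-weights outliers."""
    return c ** 2 * np.log1p(sq_norm / c ** 2)

def total_cost(prior_r, imu_residuals, point_residuals, line_residuals):
    cost = np.dot(prior_r, prior_r)                                   # prior term
    cost += sum(np.dot(r, r) for r in imu_residuals)                  # IMU terms
    cost += sum(cauchy(np.dot(r, r)) for r in point_residuals)        # point terms
    cost += sum(cauchy(np.dot(r, r)) for r in line_residuals)         # line terms
    return cost
```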
In the three-dimensional mapping process, as in the conventional SLAM method, local point clouds are spliced into a final point cloud map by means of estimated camera poses.
Accurate pose estimation plays an important role in the accuracy of the point cloud map: if the pose estimates are not accurate enough, the final point cloud map may overlap or warp. In the present invention, the feature tracking node tracks features on the RGB image and obtains distance information from the depth map; using the depth map as an additional source of scale information improves the robustness and accuracy of the system, making the reconstructed three-dimensional point cloud map more reliable.
Furthermore, in the present invention, three-dimensional points are projected from each depth image in synchronization with the keyframes, the corresponding poses are estimated, and the point cloud map is constructed, so that the system is compact and well suited to integration on small robots. Moreover, such a point cloud map can be used directly for semantic segmentation or recognition, and further for obstacle avoidance and path planning.
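For illustration only, a minimal sketch of this mapping step is given below: each keyframe depth image is back-projected to a local point cloud with the pinhole intrinsics and transformed into the world frame by the estimated keyframe pose; the depth scale and function names are assumptions chosen for the example.

```python
import numpy as np

def depth_to_cloud(depth, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project a depth image (in depth units) to 3D points in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64) * depth_scale
    valid = z > 0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=1)

def add_keyframe_to_map(global_map, depth, R_wc, t_wc, intrinsics):
    """Transform the local cloud by the estimated keyframe pose and append it."""
    cloud_c = depth_to_cloud(depth, *intrinsics)
    cloud_w = cloud_c @ R_wc.T + t_wc          # camera frame -> world frame
    global_map.append(cloud_w)
    return global_map
```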
On the other hand, the invention also discloses a device comprising a front-end visual odometry module, a back-end optimization module and a three-dimensional map construction module.
The front-end visual inertial odometer module comprises a feature detection and matching sub-module, an IMU pre-integration sub-module and a visual inertial alignment sub-module.
The feature detection and matching submodule is used for extracting and tracking point features and extracting and tracking line features.
Further, in the feature detection and matching submodule, the line feature is extracted and tracked through steps S11 to S13.
The IMU pre-integration sub-module takes the IMU pre-integration value as a constraint between two frames of images, used together with the camera reprojection residual in the local window optimization process, as expressed by the pre-integration terms α, β, γ given above.
The visual-inertial alignment sub-module estimates the camera poses and the spatial positions of three-dimensional points by structure from motion (SfM), and then aligns them with the IMU pre-integration data to obtain the gravity direction, the scale factor, the gyroscope bias and the velocity corresponding to each frame.
The back-end optimization module optimizes all state variables in the sliding window by minimizing the sum of all measurement residuals, as expressed by the objective given above.
and the three-dimensional mapping module is used for splicing the local point clouds into a final point cloud map by means of the estimated camera attitude.
Various embodiments of the above-described methods and apparatus of the present invention may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), System On Chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present invention may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the methods and apparatus described herein may be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The methods and apparatus described herein may be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described herein), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network; the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the defects of difficult management and weak service extensibility of conventional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in this disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed herein can be achieved, and the present disclosure is not limited herein.
Examples
Example 1
A simulation experiment of visual-inertial simultaneous localization and mapping was carried out on a public data set, OpenLORIS-Scene, which was collected in five scenes (office, corridor, home, cafe and market) that are very typical autonomous robot application scenarios.
The visual-inertial simultaneous localization and mapping comprises a front-end visual odometry process, a back-end optimization process and a three-dimensional map construction process.
In the front-end visual odometry process, point features and line features are extracted and tracked. For the point features, Shi-Tomasi corners are extracted as feature points, the KLT optical flow method is used to track them, and points with large forward-backward differences are eliminated based on a backward optical flow check;
the line feature extraction tracking comprises the following sub-steps:
s11, restraining the length of the line segment; s12, extracting line features through near line combination and short line splicing; s13, tracking line features through an optical flow method; and S14, parameterizing point and line characteristics.
In step S11, the line segment length suppression is carried out as in formula (one), with η = 0.1.
Step S12 includes the following substeps: s121, sequencing the line segments; s122, screening the line segments; s123, filtering the screened line segment group to obtain a merging candidate line segment set and a splicing candidate line segment set; and S124, merging or splicing the line segments.
In step S121, the line segments are sorted in descending order of length; in step S122, the screening is the angle screening of formula (three), with θ_min = π/90.
In step S123, the obtained candidate line segment group L_α is filtered: the near-line merging filter of formula (four) and the short-line splicing filter of formula (five) are applied with d_min = 9 pixels, the threshold measuring the proximity of line segments.
Step S13 includes the following substeps: s131, matching anchor points; s132, matching points with lines; and S133, matching the lines.
Wherein, the total number of anchor points in the substep S131 is 12.
In step S14, the point and line features are parameterized; in the parameterization, the errors of the point and line features are constrained as described above.
In the front-end visual odometry process, the IMU pre-integration value is used as a constraint between two frames of images, together with the camera reprojection residual, in the local window optimization process, as described above.
in the process of the front-end visual odometer, a loose coupling scheme of vision and an IMU is adopted for visual inertial alignment, the camera attitude and the spatial position of a three-dimensional point are estimated by using a structure motion method (SFM), and then the camera attitude and the spatial position are aligned with IMU pre-integration data to obtain the gravity direction, the scale factor, the gyroscope bias and the corresponding speed of each frame.
In the back-end optimization process, all state variables in the sliding window are optimized by minimizing the sum of all measurement residuals, as described above.
in the three-dimensional mapping process, the local point clouds are spliced into a final point cloud map by means of the estimated camera pose.
FIGS. 9 to 12 show the point and line feature detection results in several low-texture and illumination-changing scenes of the OpenLORIS-Scene data set, where FIG. 9 shows the cafe results, FIG. 10 the home 1 results, FIG. 11 the home 2 results, and FIG. 12 the corridor results; FIGS. 9-a, 10-a, 11-a and 12-a show the line feature detection results, and FIGS. 9-b, 10-b, 11-b and 12-b show the point feature detection results.
It can be seen that indoor scenes are mostly man-made structured scenes with abundant linear structures such as edges and right angles. In low-texture scenes with strong illumination changes (cafe, corridor) and in scenes with scarce features (home 1, home 2), point feature detection is insufficient, yet the line feature detection algorithm can still obtain abundant line features. Compared with point detection alone, the complementary point-line features provide richer and more robust feature information for the subsequent motion estimation.
FIGS. 9-c, 10-c, 11-c and 12-c show the line detection and tracking results in the different scenes: the line feature matching algorithm obtains more effective information and describes the geometric structure more intuitively, providing richer and more robust information for the subsequent visual odometry to estimate the camera motion from the combined features.
FIG. 13-a compares the trajectory in the corridor scene with the ground truth and shows the error, where the color changing from blue to red indicates a gradual increase of the error; FIGS. 13-b and 13-c show the results from top-view and side-view perspectives.
As the motion speed increases, more motion blur is introduced, the viewing angle changes significantly, and the illumination conditions also make feature tracking challenging, so performance may degrade. As can be seen from the figures, this example aligns accurately with the real trajectory, and drift-free tracking is achieved even when the camera rotates rapidly, which shows that adding depth information to the visual-inertial system greatly improves pose estimation.
Example 2
And carrying out a positioning and mapping experiment in an indoor environment, and verifying the positioning accuracy according to the data of the motion capture system OptiTrack.
The experimental platform was a drone equipped with an Intel RealSense D435i depth camera, an Intel NUC onboard computer and a Pixhawk V5+ flight controller. The D435i depth camera, which incorporates a six-axis IMU sensor, captures color images at 640 x 800 resolution. In the experiment, the sampling frequencies of the camera and the IMU were set to 30 Hz and 200 Hz, respectively. The onboard computer processes the image and IMU data acquired by the depth-inertial camera in real time.
14 different scenes (Lab 1-Lab 14) are built indoors, and the 14 scenes respectively have the characteristics of low texture, illumination change, rapid acceleration and angular rotation.
The onboard computer performs localization and mapping on the images collected by the depth camera according to the method of Example 1.
Comparative example 1
The same experiment as in Example 1 was conducted, except that the ORB-SLAM2 method of R. Mur-Artal, J. D. Tardós, "ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras", IEEE Transactions on Robotics 33(5) (2017) 1255-1262 was used; ORB-SLAM2 is a point-based SLAM system that can operate in large and small, indoor and outdoor environments, used here through its RGBD interface.
Comparative example 2
The same experiment as in Example 1 was performed, except that the PL-VINS method of Q. Fu, J. Wang, H. Yu, I. Ali, F. Guo, Y. He, H. Zhang, "PL-VINS: real-time monocular visual-inertial SLAM with point and line features", arXiv: 2009.07462 (2020) was used; PL-VINS is a monocular visual-inertial optimization method with point and line features developed on the basis of the point-based VINS-Mono.
Comparative example 3
The same experiment as in Example 2 was carried out, except that the VINS-RGBD method of Z. Shan, R. Li, S. Schwertfeger, "RGBD-inertial trajectory estimation and mapping for ground robots", Sensors 19(10) (2019) 2251 was used.
Comparative example 4
The same experiment as in Example 2 was performed, except that the ORB-SLAM2 method of R. Mur-Artal, J. D. Tardós, "ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras", IEEE Transactions on Robotics 33(5) (2017) 1255-1262 was used.
Comparative example 5
The same experiment as in Example 2 was performed, except that the PL-VINS method of Q. Fu, J. Wang, H. Yu, I. Ali, F. Guo, Y. He, H. Zhang, "PL-VINS: real-time monocular visual-inertial SLAM with point and line features", arXiv: 2009.07462 (2020) was used.
Examples of the experiments
Experimental example 1
The experimental results of Example 1 and Comparative Examples 1 and 2 were evaluated using the root mean square error (RMSE), the maximum error and the mean error; the results are shown in Table 1.
Table 1: RMSE, maximum error and mean error of Example 1 and Comparative Examples 1 and 2 on the OpenLORIS-Scene sequences.
As can be seen from Table 1, the RMSE of Example 1 is obviously lower than that of Comparative Example 1 in the market1-1 and corridor1-4 scenes, while Comparative Example 2 suffers tracking failures and severe drift and cannot obtain a final trajectory; the simulation results show that Example 1 has better robustness.
In addition, under fast motion, i.e. in the sparse-feature sequences office1-7, Example 1 still maintains high-precision positioning of about 0.18 m, which is about 44% better than Comparative Example 1.
Moreover, the positioning error of Example 1 fluctuates little and the position estimates are consistent, so robust and accurate positioning is achieved in the low-texture environment.
Experimental example 2
The RMSE (unit: m) results of Example 2 and Comparative Examples 3 to 5 are shown in Table 2.
Table 2: RMSE (m) of Example 2 and Comparative Examples 3 to 5 in the indoor scenes Lab1 to Lab14.
As can be seen from Table 2, Example 2 achieves superior accuracy under indoor low texture and illumination variation, with the smallest overall fluctuation, high stability, and a root mean square error within 0.44 m.
The results of Example 2 and Comparative Examples 3 to 5 in scene Lab14 are shown in FIG. 14; the positioning trajectory of Example 2 is closer to the OptiTrack ground truth, and its positioning error is significantly smaller than those of Comparative Examples 3 to 5.
Fig. 15 shows a translational error box diagram on the scene Lab1, and fig. 16 shows a rotational error box diagram on the scene Lab 1. As can be seen from the figure, the upper and lower concentration distributions of example 2 are smaller than those of comparative examples 3 to 5.
FIG. 17 shows the position drift of Example 2 in scene Lab3, where the errors on all three axes are concentrated within 300 mm; FIG. 18 shows the position drift of Example 2 in scene Lab6, where the errors on all three axes are concentrated within 200 mm. This indicates that the drift of Example 2 varies smoothly and that its positioning accuracy and stability are good.
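The per-axis drift of FIGS. 17 and 18 and the rotational error underlying the box plot of FIG. 16 can be illustrated with a similar sketch (again only an assumption-laden example, not the patent's evaluation code; orientations are taken as unit quaternions in (w, x, y, z) order):

import numpy as np

def per_axis_drift_mm(est_xyz, gt_xyz):
    """Signed x/y/z position error per frame, converted to millimetres."""
    return (est_xyz - gt_xyz) * 1000.0           # shape (N, 3), one column per axis

def rotation_error_deg(est_q, gt_q):
    """Angle of the relative rotation between estimate and ground truth, per frame.
    est_q, gt_q: (N, 4) unit quaternions (w, x, y, z)."""
    dot = np.abs(np.sum(est_q * gt_q, axis=1))   # |cos(theta / 2)| of the relative rotation
    return np.degrees(2.0 * np.arccos(np.clip(dot, -1.0, 1.0)))

The per-frame series returned by these helpers are the kind of data from which box plots such as FIGS. 15 and 16 or drift curves such as FIGS. 17 and 18 are typically drawn.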
FIGS. 19 to 21 show the three-dimensional maps obtained by Comparative Example 3 (left column) and by Example 2 (right column). As can be seen from the figures, Comparative Example 3 has large errors in the horizontal and vertical directions, double images and bending appear during map stitching, and its map construction accuracy is low. In Example 2 the map stitching is greatly improved and the double-image and bending problems are eliminated, which also means that the accumulated error during map construction is reduced; Example 2 can therefore complete the construction of the indoor map well.
The present invention has been described above with reference to preferred embodiments, which are, however, merely exemplary and illustrative. Various substitutions and modifications may be made on this basis, and all such substitutions and modifications fall within the protection scope of the present invention.

Claims (10)

1. An RGBD visual inertia simultaneous localization and map construction method based on dotted line feature fusion, comprising a front-end visual odometry process, a back-end optimization process and a three-dimensional map construction process, characterized in that,
in the front-end visual odometry process, the input information of the visual inertia simultaneous localization and mapping system comprises the images acquired by an RGBD camera and the measurement values of an IMU.
2. The RGBD visual inertia simultaneous localization and mapping method based on dotted line feature fusion according to claim 1,
the front-end visual inertial odometer is based on dotted line features, including feature detection and matching, IMU pre-integration, and visual inertial alignment,
the feature detection and matching comprises the extraction and tracking of point features and the extraction and tracking of line features.
3. The RGBD visual inertia simultaneous localization and mapping method based on dotted line feature fusion according to claim 2,
the extraction and tracking of line features comprises the following sub-steps:
S11, constraining the length of the line segments;
S12, extracting the line features through near-line merging and short-line splicing;
and S13, tracking the line features by an optical flow method.
4. The RGBD visual inertia simultaneous localization and mapping method based on dotted line feature fusion according to claim 3,
step S13 includes the following substeps:
s131, matching anchor points;
s132, matching points with lines;
and S133, matching the lines.
5. The RGBD visual inertia simultaneous localization and mapping method based on dotted line feature fusion according to claim 2,
in the front-end visual inertial odometry process, the method further comprises the following step:
S14, parameterizing the point and line features;
wherein, in the process of parameterizing the point and line features, error constraints are applied to the line features.
6. The RGBD visual inertia simultaneous localization and mapping method based on dotted line feature fusion according to claim 2,
the IMU pre-integration value is taken as a constraint between two consecutive camera frame images.
7. The RGBD visual inertia simultaneous localization and mapping method based on dotted line feature fusion according to claim 2,
in the back-end optimization process, all state variables in the sliding window are optimized by minimizing the sum of all measurement residuals.
8. An RGBD visual inertia simultaneous positioning and map construction device based on dotted line feature fusion, characterized by comprising a front-end visual odometer module, a back-end optimization module and a three-dimensional map construction module,
the front-end visual inertial odometer module comprises a feature detection and matching sub-module, an IMU pre-integration sub-module and a visual inertial alignment sub-module,
the IMU pre-integration sub-module takes the IMU pre-integration value as the constraint between two frames of images,
the back-end optimization module optimizes all state variables in the sliding window by minimizing the sum of all measured residuals.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
10. A computer-readable storage medium having computer instructions stored thereon for causing the computer to perform the method of any one of claims 1-7.
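Purely as an illustrative sketch of the line-segment preprocessing recited in claim 3 (length constraint, merging/splicing of nearby, nearly collinear segments), the following toy example shows one possible implementation; every threshold and helper name here is an assumption and is not taken from the patent:

import numpy as np

MIN_LEN_PX    = 30.0                  # assumed minimum segment length to keep
MAX_GAP_PX    = 10.0                  # assumed maximum endpoint gap for splicing
MAX_ANGLE_RAD = np.deg2rad(5.0)       # assumed maximum direction difference

def seg_len(s):
    return np.hypot(s[2] - s[0], s[3] - s[1])

def seg_angle(s):
    return np.arctan2(s[3] - s[1], s[2] - s[0])

def can_splice(a, b):
    """True if segment b starts near the end of a and points in a similar direction."""
    gap = np.hypot(b[0] - a[2], b[1] - a[3])
    d = seg_angle(a) - seg_angle(b)
    dang = abs(np.arctan2(np.sin(d), np.cos(d)))    # wrapped angle difference
    return gap < MAX_GAP_PX and dang < MAX_ANGLE_RAD

def merge_and_filter(segments):
    """segments: list of (x1, y1, x2, y2) tuples. Splice chains of nearby,
    nearly collinear segments into longer ones, then drop short leftovers."""
    segs = [list(s) for s in segments]
    merged = True
    while merged:
        merged = False
        for i in range(len(segs)):
            for j in range(len(segs)):
                if i != j and can_splice(segs[i], segs[j]):
                    segs[i][2], segs[i][3] = segs[j][2], segs[j][3]  # extend i by j
                    del segs[j]
                    merged = True
                    break
            if merged:
                break
    return [s for s in segs if seg_len(s) >= MIN_LEN_PX]

# Two short collinear pieces are spliced into one segment that passes the
# length filter; the isolated short segment is discarded.
print(merge_and_filter([(0, 0, 18, 0), (20, 0, 40, 0), (5, 5, 10, 5)]))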
CN202110914560.8A 2021-08-10 2021-08-10 RGBD visual inertia simultaneous positioning and map construction with dotted line feature fusion Pending CN113763470A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110914560.8A CN113763470A (en) 2021-08-10 2021-08-10 RGBD visual inertia simultaneous positioning and map construction with dotted line feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110914560.8A CN113763470A (en) 2021-08-10 2021-08-10 RGBD visual inertia simultaneous positioning and map construction with dotted line feature fusion

Publications (1)

Publication Number Publication Date
CN113763470A true CN113763470A (en) 2021-12-07

Family

ID=78789041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110914560.8A Pending CN113763470A (en) 2021-08-10 2021-08-10 RGBD visual inertia simultaneous positioning and map construction with dotted line feature fusion

Country Status (1)

Country Link
CN (1) CN113763470A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117197211A (en) * 2023-09-04 2023-12-08 北京斯年智驾科技有限公司 Depth image generation method, system, device and medium
CN117197211B (en) * 2023-09-04 2024-04-26 北京斯年智驾科技有限公司 Depth image generation method, system, device and medium

Similar Documents

Publication Publication Date Title
Dai et al. Rgb-d slam in dynamic environments using point correlations
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
US11461912B2 (en) Gaussian mixture models for temporal depth fusion
US9613420B2 (en) Method for locating a camera and for 3D reconstruction in a partially known environment
WO2022188094A1 (en) Point cloud matching method and apparatus, navigation method and device, positioning method, and laser radar
CN109307508A (en) A kind of panorama inertial navigation SLAM method based on more key frames
Liu et al. Towards robust visual odometry with a multi-camera system
Kim et al. Direct semi-dense SLAM for rolling shutter cameras
Hwangbo et al. Visual-inertial UAV attitude estimation using urban scene regularities
Krombach et al. Feature-based visual odometry prior for real-time semi-dense stereo SLAM
CN112802196B (en) Binocular inertia simultaneous positioning and map construction method based on dotted line feature fusion
CN112556719B (en) Visual inertial odometer implementation method based on CNN-EKF
Koch et al. Wide-area egomotion estimation from known 3d structure
CN114485640A (en) Monocular vision inertia synchronous positioning and mapping method and system based on point-line characteristics
CN112945233B (en) Global drift-free autonomous robot simultaneous positioning and map construction method
CN113763470A (en) RGBD visual inertia simultaneous positioning and map construction with dotted line feature fusion
Zhu et al. PairCon-SLAM: Distributed, online, and real-time RGBD-SLAM in large scenarios
CN112731503A (en) Pose estimation method and system based on front-end tight coupling
Yang et al. PLS-VINS: Visual inertial state estimator with point-line features fusion and structural constraints
Cigla et al. Gaussian mixture models for temporal depth fusion
Vlaminck et al. A markerless 3D tracking approach for augmented reality applications
Zhao et al. A review of visual SLAM for dynamic objects
CN116147618A (en) Real-time state sensing method and system suitable for dynamic environment
CN112991400B (en) Multi-sensor auxiliary positioning method for unmanned ship
Laskar et al. Robust loop closures for scene reconstruction by combining odometry and visual correspondences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination