CN113012196A - Positioning method based on information fusion of binocular camera and inertial navigation sensor


Info

Publication number
CN113012196A
Authority
CN
China
Prior art keywords
line
frame
feature
positioning method
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110245725.7A
Other languages
Chinese (zh)
Other versions
CN113012196B (en)
Inventor
李奎霖 (Li Kuilin)
魏武 (Wei Wu)
曾锦秀 (Zeng Jinxiu)
肖文煜 (Xiao Wenyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology (SCUT)
Priority to CN202110245725.7A
Publication of CN113012196A
Application granted
Publication of CN113012196B
Legal status: Active (current)

Classifications

    • G06T7/248: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
    • G01C21/165: Dead reckoning by integrating acceleration or speed (inertial navigation) combined with non-inertial navigation instruments
    • G01C21/18: Stabilised platforms, e.g. by gyroscope
    • G01C21/343: Calculating itineraries, i.e. routes leading from a starting point to a series of categorical destinations using a global route restraint, round trips, touristic trips
    • G06F18/2113: Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G06F18/25: Fusion techniques
    • G06T7/74: Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06T2207/10016: Video; image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Automation & Control Theory (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a positioning method based on information fusion of a binocular camera and an inertial navigation sensor, which comprises the following steps: acquiring an image sequence of the robot's surroundings; applying the FAST feature point extraction algorithm to each acquired image frame and introducing a quadtree to rapidly screen the feature points; extracting line features from the acquired image frame with the LSD line feature extraction algorithm and computing line feature descriptors with the LBD algorithm; tracking and matching the FAST feature points extracted from consecutive image frames to obtain an inter-frame pose estimate, reprojecting the point and line features of the preceding frames onto the current frame, and tracking and matching the line features; determining whether the current frame is a key frame and, if so, acquiring the IMU sensor data between the two frames, pre-integrating it, and adding the current frame to a sliding window; and performing back-end optimization by tightly coupling the matched point-line feature constraints in the sliding window with the IMU pre-integration, and outputting a more accurate inter-frame motion pose.

Description

Positioning method based on information fusion of binocular camera and inertial navigation sensor
Technical Field
The invention relates to the technical field of machine vision and positioning navigation, in particular to a positioning method based on information fusion of a binocular camera and an inertial navigation sensor.
Background
An intelligent robot perceives environmental information through a rich set of sensors and fuses that information at the algorithm level to obtain a comprehensive picture of its own state and its surroundings, which serves as the input for control decisions. The image information collected by a camera is rich and can improve relocalization accuracy, but the camera frame rate is low and the images are strongly affected by illumination and by the surface texture of objects. An IMU can measure the robot pose at high frequency over short periods, but it drifts severely and can only keep working with the aid of corrections from other sensors. A real-time localization algorithm based on multi-sensor fusion therefore exploits the complementary advantages of the different sensors and enables high-accuracy, high-robustness pose estimation and map construction for an intelligent robot.
Although existing visual positioning systems based mainly on point features can achieve real-time localization in specific, texture-rich scenes, indoor environments usually contain weakly textured regions such as the large white walls of corridor passages, and the lighting in a passage may change drastically within a small area. Under such real scene conditions, visual positioning based on point features alone cannot work properly. An Inertial Measurement Unit (IMU) integrates a gyroscope and a linear accelerometer, so it adapts well to large-scale motion and outputs angular velocity and linear acceleration at high frequency; integrating these yields the robot's real-time motion pose, but accumulated error and scale drift remain. Visual information and IMU information are therefore strongly complementary: visual pose constraints can correct IMU drift, while the IMU can provide absolute scale information for monocular vision. Nevertheless, monocular visual initialization is very demanding on the environment and on the camera motion, and it is prone to fail when the environment has little texture or the camera moves too little or too much.
Disclosure of Invention
The invention aims to provide, on the premise of saving cost and ensuring reliability, a positioning method based on information fusion of a binocular camera and an inertial navigation sensor. It provides a mechanism that tightly couples IMU data with visual point-line features for indoor robot visual positioning, and achieves accurate and fast pose estimation of the robot in complex environments.
The invention is realized by at least one of the following technical schemes.
A positioning method based on information fusion of a binocular camera and an inertial navigation sensor comprises the following steps:
s1, acquiring a robot peripheral image sequence;
step S2, for the obtained image frame, using an improved FAST feature point extraction algorithm and introducing a quadtree to quickly screen feature points, so that the FAST feature points are uniformly distributed;
step S3, extracting line features of the image frame and calculating a line feature descriptor;
step S4, performing tracking matching on FAST feature points extracted from the previous and subsequent image frames to obtain an inter-frame estimation pose, re-projecting point and line features in the previous and subsequent frames to the current frame according to the estimation pose, and performing tracking matching on the line features;
step S5, determining whether the current frame is a key frame, if not, discarding the current frame, and temporarily storing the visual information and IMU pre-integration information to transmit to the next frame;
and step S6, adding the current frame to a sliding window, performing back-end optimization by tightly coupling the matched point-line feature constraints in the sliding window with the IMU pre-integration, and outputting the inter-frame motion pose.
Preferably, the improved FAST feature point extraction algorithm is specifically: dividing the image into grids according to its size; traversing each grid and extracting FAST feature points in it; regarding the image as the root node of an N-ary tree, i.e. dividing it into N equal parts; if a node contains more than one feature point, iteratively subdividing it into its child nodes, the iteration terminating when a child node contains at most one feature point or none; the feature point finally retained is the optimal one.
Preferably, step S3 is to extract line features by using an LSD line feature extraction algorithm, and calculate line feature descriptors by using an LBD algorithm.
Preferably, the LSD (Line Segment Detector) line feature extraction algorithm is specifically: reducing the image to a% of its original size by Gaussian sampling, which weakens the aliasing effect of the original image without blurring it; sorting the pixel gradients of all points in the image; and merging pixels with similar gradient directions to generate line support regions.
Preferably, the LBD (Line Band Descriptor) algorithm is specifically: accumulating pixel gradients over the detected LSD line features, constructing a descriptor from the gradient information associated with each line, computing the mean and standard deviation of each line's descriptor matrix, and matching line segments according to descriptor distance under the nearest-neighbour matching criterion;
the LBD descriptor is specifically: determining an LSD line segment strip area and a gradient direction; calculating gradient weight of each row of the strip; counting gradient histograms of each row to form a descriptor matrix; the mean and standard deviation were calculated and concatenated into an LBD descriptor.
Preferably, in step S4, the tracking and matching of the feature points in consecutive frames is performed by a direct method based on minimizing the photometric error, and the photometric error of the feature point together with its surrounding pixel block is chosen as the final error function; the grey value is obtained by bilinear interpolation, and the photometric error to be minimized is:
E_{u,v} = I_v[p'] - I_u[p]
where p is a FAST feature point in the previous image frame I_u and p' denotes the coordinates of p reprojected into the current image frame I_v.
Preferably, the tracking and matching of the line features is realized by minimizing a reprojection error:
[line reprojection error formula given as an image in the original publication]
where l = [l_1, l_2]^T is the projected line, l_1 denotes the line feature to be matched in the previous frame, l_2 denotes the line feature to be matched in the subsequent frame, and p^T, q^T are the homogeneous coordinates of the two endpoints of the matched line segment.
Preferably, in step S5, the key frame selection policy is:
the current frame, which has been considered as a key frame N consecutive times, can track not less than h FAST point features and at least m line features, or the current frame shares 3/4 no more point features and line features in the reference key frame than the reference key frame.
Preferably, before the key frame is added to the sliding window, the IMU sensor data between it and the previous key frame is pre-integrated.
Preferably, the objective function to be optimized in the sliding window is:
E = ‖r_p - J_p X‖² + Σ‖r_I(Z_i, X)‖² + Σ‖r_f(Z_f, X)‖² + Σ‖r_l(Z_l, X)‖²
where X is the state quantity to be optimized, including the position, rotation and velocity of the IMU relative to the world coordinate system and the inverse depths of the feature points, etc.; ‖r_p - J_p X‖² is the residual constraint after marginalization, with r_p the prior residual and J_p the Jacobian of the prior term; ‖r_I(Z_i, X)‖² is the IMU error, with r_I(·) the IMU residual term and Z_i the pre-integration measurement; ‖r_f(Z_f, X)‖² is the point-feature reprojection error, with r_f(·) the point-feature residual term and Z_f the observed point features; and ‖r_l(Z_l, X)‖² is the line-feature reprojection error, with r_l(·) the line-feature residual term and Z_l the observed line features.
By means of a visual positioning technique that introduces line features and combines them with point features, the invention can well overcome the failure of point features in scenes such as weakly textured corridors and drastic lighting changes. Through the mechanism of tightly coupling IMU data with the visual point-line features, it can also overcome the failure of purely visual autonomous positioning under fast motion, rotation and similar conditions, and it improves the accuracy and robustness of the robot's real-time positioning system as a whole. Accumulated errors such as rotation, translation and scale drift are significantly reduced, and the feature-point instability and tracking loss caused by long corridor environments are effectively alleviated.
Compared with the prior art, the invention has the beneficial effects that:
the invention can well overcome the problem that the point characteristics are invalid in scenes such as weak texture corridors, light violent change and the like by a visual positioning technology combining the lead-in characteristics and the point characteristics. And by means of a mechanism of tightly coupling IMU data and visual point-line characteristics, the problem of autonomous positioning failure of pure vision under the conditions of rapid movement, rotation and the like can be solved, and the precision and the robustness of the robot instant positioning system are improved on the whole. The accumulated errors such as rotation, translation, scale drift and the like are obviously reduced, and the problem of tracking loss caused by unstable feature points and weak textures is effectively solved.
Drawings
Fig. 1 is a flow chart of the positioning method based on information fusion of a binocular camera and an inertial navigation sensor according to this embodiment;
Fig. 2 shows the result of point-line tracking in a real environment in this embodiment.
Detailed Description
The invention is further described below with reference to the accompanying drawings. It should be noted that the following description gives detailed embodiments and specific operating procedures intended to describe the present application in detail, but the scope of protection of the present invention is not limited to these embodiments.
Reference will now be made in detail to the present preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout.
In the description of the present invention, it should be understood that orientation or positional relationships such as upper, lower, front, rear, left and right are based on the orientations or positional relationships shown in the drawings, are used only for convenience and simplicity of description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more and "a plurality of" means two or more; "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it. Where "first" and "second" are used, they serve only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or their precedence.
In the description of the present invention, unless explicitly limited otherwise, terms such as "arranged", "mounted" and "connected" should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of these terms in the present invention in combination with the specific content of the technical scheme.
As shown in Fig. 1 and Fig. 2, the positioning method based on information fusion of a binocular camera and an inertial navigation sensor comprises the following steps:
s1, acquiring a robot peripheral image sequence;
in the embodiment, a trolley robot is used as a carrier for carrying out experiments, an STM32 single chip microcomputer is used as a bottom layer driving plate, a small foraging depth camera (the camera frequency is 30Hz, and the IMU frequency is 200Hz) is used as an external sensor, and a peripheral image sequence of the robot is acquired.
As another embodiment, the STM32 single chip microcomputer is replaced by an XMC4500 single chip microcomputer to serve as a bottom driving plate.
As another embodiment, the depth camera is replaced with a Kinect camera.
Step S2, for the obtained image frame, using an improved FAST feature point extraction algorithm and introducing a quadtree to quickly screen feature points, so that the FAST feature points are uniformly distributed;
the extraction process of the original FAST feature points comprises the following steps: traversing each pixel point in the whole image area, investigating 16 pixel points on a circle with the pixel point as the center of a circle and 3 pixels as the radius, and if the difference between the gray value of 9 continuous points (or 12 continuous points) and the gray value of the current pixel point exceeds a preset threshold value t, regarding the points as FAST feature points, however, an original FAST feature point extraction algorithm cannot uniformly extract a certain number of feature points, so that the cluster aggregation of the feature points and the number are not easy to control, the effective image area for pose solving is small, and the camera pose solving is easy to cause large errors. Therefore, the strategy for extracting the feature points by the improved FAST feature point algorithm is as follows: dividing grids according to the size of the image, so that each grid is 15 pixel points long and 15 pixel points wide; and traversing each grid, and extracting FAST characteristic points on each grid. The image is regarded as a root node of a quadtree, namely 4 equal divisions (a binary tree can also be used here, namely halving, iterative equal divisions generate two child nodes, and the six equal divisions/eight equal divisions of the octree can be both used), if the number of the characteristic points of the root node is more than 1, the node is iterated and equally divided to generate 4 child nodes, and the iteration termination condition is that when a certain child node only contains one characteristic point or does not contain the characteristic point; and finally, the selected characteristic point is the optimal.
As another embodiment, the image division mesh size may be 30 pixels long and 30 pixels wide.
As another embodiment, the method for the quadtree to equally divide the grids can be replaced by a binary tree/a hexa-tree/an octree and the like.
Step S3, extracting line features from the acquired image frame using the LSD (Line Segment Detector) line feature extraction algorithm, and computing line feature descriptors using the LBD (Line Band Descriptor) algorithm;
the LSD algorithm line segment feature extraction process comprises the following steps: the size of the image is reduced to 80% of the original image through Gaussian sampling, and the sawtooth effect existing in the original image is weakened on the premise that the image is not blurred. The pixel gradients of each point in the image are ordered, and significant edge points tend to occur at points with larger gradient values. And combining the pixel points with the pixel gradient direction with a small difference to generate a linear support area.
The LBD (Line Band Descriptor) accumulates pixel gradients over the detected LSD line features, constructs a descriptor from the gradient information associated with each line, computes the mean and standard deviation of each line's descriptor matrix, and matches line segments according to descriptor distance under the nearest-neighbour matching criterion.
The LBD descriptor is computed as follows: determine the band region and gradient direction of the LSD segment; compute the gradient weight of each row of the band; accumulate the gradient histogram of each row to form the descriptor matrix; and compute the mean and standard deviation and concatenate them into the LBD descriptor.
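The sketch below illustrates this line pipeline in heavily simplified form; it is an assumption-laden stand-in, not the patent's implementation. It uses OpenCV's cv2.createLineSegmentDetector for LSD (note that some OpenCV 4.x builds omit it for licensing reasons), and it replaces the full LBD (Gaussian row weighting, per-band gradient histograms) with a plain mean/standard-deviation band-gradient descriptor matched by nearest neighbour; all function names are ours.

```python
import cv2
import numpy as np

def detect_lsd(gray):
    lsd = cv2.createLineSegmentDetector()              # unavailable in some 4.x builds
    lines, _, _, _ = lsd.detect(gray)
    return [] if lines is None else lines.reshape(-1, 4)   # rows: x1, y1, x2, y2

def band_descriptor(gray, seg, n_samples=32, band=5):
    """Sample image gradients in a narrow band along the segment and summarize
    each sample row by its mean and standard deviation (LBD-style mean/std)."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    x1, y1, x2, y2 = seg
    d = np.array([x2 - x1, y2 - y1], np.float32)
    n = np.array([-d[1], d[0]], np.float32) / (np.linalg.norm(d) + 1e-9)  # unit normal
    rows = []
    for t in np.linspace(0.0, 1.0, n_samples):
        p = np.array([x1, y1], np.float32) + t * d
        samples = []
        for k in range(-band, band + 1):                # walk across the band
            u, v = (p + k * n).astype(int)
            if 0 <= v < gray.shape[0] and 0 <= u < gray.shape[1]:
                samples.append((gx[v, u], gy[v, u]))
        s = np.array(samples if samples else [(0.0, 0.0)], np.float32)
        rows.append(np.hstack([s.mean(axis=0), s.std(axis=0)]))
    return np.array(rows, np.float32).ravel()

def match_lines(desc_prev, desc_curr):
    """Nearest-neighbour matching on descriptor distance."""
    return [(i, int(np.argmin([np.linalg.norm(d1 - d2) for d2 in desc_curr])))
            for i, d1 in enumerate(desc_prev)]
```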
Step S4, for the FAST feature points extracted from consecutive image frames, the feature points are tracked and matched by a direct method based on minimizing the photometric error to obtain an inter-frame pose estimate; the point and line features of the preceding frames are then reprojected onto the current frame using the estimated pose, and the line features are tracked and matched by minimizing the reprojection error.
The point-line feature matching process is as follows: the point features of consecutive frames are matched by a direct method based on minimizing the photometric error; to enhance the robustness of the system, the photometric error of the feature point together with its surrounding pixel block is chosen as the final error function to be minimized.
Because a space point is represented by floating-point numbers, the projected pixel coordinates estimated in the second frame through the pose transformation are also floating-point numbers, whereas pixel coordinates are integers; the invention therefore uses bilinear interpolation to obtain the grey value, and the photometric error function finally to be minimized is:
E_{u,v} = I_v[p'] - I_u[p]
where p is a FAST feature point in the previous image frame I_u and p' denotes the coordinates of p reprojected into the current image frame I_v.
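A minimal sketch of the two ingredients just described: bilinear interpolation of the grey value at the reprojected sub-pixel coordinates, and a patch-based photometric error. The squared sum over a 5x5 patch is our choice; the text only gives the per-pixel difference E_{u,v} = I_v[p'] - I_u[p], and the sketch assumes the patch stays inside the image.

```python
import numpy as np

def bilinear(img, x, y):
    """Intensity at float coordinates (x, y) by bilinear interpolation."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    i00, i01 = float(img[y0, x0]),     float(img[y0, x0 + 1])
    i10, i11 = float(img[y0 + 1, x0]), float(img[y0 + 1, x0 + 1])
    return (i00 * (1 - dx) * (1 - dy) + i01 * dx * (1 - dy)
            + i10 * (1 - dx) * dy + i11 * dx * dy)

def photometric_error(I_u, I_v, p, p_proj, half=2):
    """Sum of squared intensity differences over a (2*half+1)^2 patch around
    the feature p in I_u and its reprojection p_proj in I_v."""
    e = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = (bilinear(I_v, p_proj[0] + dx, p_proj[1] + dy)
                    - float(I_u[int(p[1]) + dy, int(p[0]) + dx]))
            e += diff * diff
    return e
```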
And line feature matching is achieved by minimizing the reprojection error:
[line reprojection error formula given as an image in the original publication]
where l = [l_1, l_2]^T is the projected line, l_1 denotes the line feature to be matched in the previous frame, l_2 denotes the line feature to be matched in the subsequent frame, and p^T, q^T are the homogeneous coordinates of the two endpoints of the matched line segment. The distance between corresponding endpoints is not used in this optimization, because the endpoints of the projected line may be inaccurate owing to factors such as image blur or occlusion, so the distance along the direction of the projected line provides no reliable information; it is therefore reasonable to use the perpendicular distance from the endpoints to the projected line.
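Since the formula image is not reproduced above, the sketch below shows the standard endpoint-to-line form that matches the description: the residual is the pair of perpendicular distances from the homogeneous endpoints p and q of the matched segment to the projected line l. Treat it as an illustration of the described error, not a verbatim copy of the patent's expression.

```python
import numpy as np

def line_reprojection_error(l, p, q):
    """l: projected line in homogeneous form (l1, l2, l3); p, q: homogeneous
    endpoints (x, y, 1) of the matched segment. Returns the two signed
    perpendicular distances, whose squared norm is minimized."""
    l = np.asarray(l, float)
    norm = np.hypot(l[0], l[1]) + 1e-12
    return np.array([np.dot(l, p), np.dot(l, q)]) / norm

# example: np.cross([0, 0, 1], [1, 1, 1]) is the homogeneous line through (0,0) and (1,1)
```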
Step S5, determine whether the current frame is a key frame; if not, discard the current frame and temporarily store its visual information and IMU pre-integration information to be passed on to the next frame; if so, acquire the IMU sensor data between the two key frames, pre-integrate it, and add the current frame to the sliding window;
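For the pre-integration step mentioned above, the following simplified sketch accumulates the relative rotation, velocity and position between two key frames in the first key frame's body frame using plain Euler integration. Bias updates, noise covariance propagation and gravity handling, which a full VIO pre-integration also tracks, are omitted, and the function names are ours.

```python
import numpy as np

def skew(w):
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]], float)

def so3_exp(phi):
    """Rodrigues formula: rotation matrix for a rotation vector phi."""
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        return np.eye(3) + skew(phi)
    a = phi / theta
    K = skew(a)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def preintegrate(imu, gyro_bias, acc_bias):
    """imu: list of (dt, gyro[3], acc[3]) samples between two key frames.
    Returns the pre-integrated relative rotation dR, velocity dv, position dp."""
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for dt, w, a in imu:
        a_corr = np.asarray(a, float) - acc_bias
        w_corr = np.asarray(w, float) - gyro_bias
        dp += dv * dt + 0.5 * (dR @ a_corr) * dt * dt   # position first (uses old dR, dv)
        dv += (dR @ a_corr) * dt
        dR = dR @ so3_exp(w_corr * dt)                  # then advance the rotation
    return dR, dv, dp
```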
and performing back-end optimization by combining the feature point matched based on the direct method of the minimized photometric error in the sliding window and the line feature matched based on the minimized reprojection error and IMU pre-integration tight coupling, wherein the tight coupling refers to that intermediate data obtained by vision and IMU are processed by an optimization filter, the tight coupling needs to add the point line feature into a feature vector to be optimized together with IMU constraint, and more accurate inter-frame motion pose is output, and all inter-frame pose data are combined to form the motion path of the robot carrying the camera.
To ensure real-time performance, the back-end optimization must limit the number of state quantities involved and keep the number participating in the graph optimization approximately constant; the invention therefore adopts a sliding-window method, while still screening the images that enter the optimization so that the frames involved provide rich information without redundancy.
The key frame selection strategy is: the current frame has not been selected as a key frame for 20 consecutive frames; the current frame can track no fewer than 40 FAST point features and at least 15 line features; and the point and line features shared by the current frame and the reference key frame do not exceed 3/4 of those in the reference key frame.
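A compact sketch of this selection test with the quoted thresholds (20 frames, 40 point features, 15 line features, 3/4 shared-feature ratio); the way the three conditions are combined into a single boolean is our reading of the text rather than something the patent states explicitly.

```python
def is_keyframe(frames_since_last_kf, tracked_points, tracked_lines,
                shared_features, ref_kf_features,
                min_gap=20, min_points=40, min_lines=15, overlap_ratio=0.75):
    """Decide whether the current frame becomes a key frame."""
    # long enough since the last key frame AND the frame still tracks enough features
    gap_and_quality = (frames_since_last_kf >= min_gap
                       and tracked_points >= min_points
                       and tracked_lines >= min_lines)
    # or the overlap with the reference key frame has dropped below 3/4
    low_overlap = shared_features <= overlap_ratio * ref_kf_features
    return gap_and_quality or low_overlap
```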
The method performs back-end optimization in a vision-IMU tightly coupled manner: the visual feature points are added to the back-end optimization, the visual residuals and the IMU residuals are combined into a joint optimization problem, and the solution of this overall optimization problem is the final state estimate. The back-end optimization uses a sliding window, and the objective function to be optimized in the back-end sliding window is:
E = ‖r_p - J_p X‖² + Σ‖r_I(Z_i, X)‖² + Σ‖r_f(Z_f, X)‖² + Σ‖r_l(Z_l, X)‖²
where X is the state quantity to be optimized, including the position, rotation and velocity of the IMU relative to the world coordinate system and the inverse depths of the feature points, etc.; ‖r_p - J_p X‖² is the residual constraint after marginalization, with r_p the prior residual and J_p the Jacobian of the prior term; ‖r_I(Z_i, X)‖² is the IMU error, with r_I(·) the IMU residual term and Z_i the pre-integration measurement; ‖r_f(Z_f, X)‖² is the point-feature reprojection error, with r_f(·) the point-feature residual term and Z_f the observed point features; and ‖r_l(Z_l, X)‖² is the line-feature reprojection error, with r_l(·) the line-feature residual term and Z_l the observed line features.
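The assembly of this sliding-window objective can be sketched as below. The residual callables stand in for the IMU pre-integration, point reprojection and line reprojection terms described above; in practice the blocks would be handed to a nonlinear least-squares solver (e.g. Ceres or g2o) rather than evaluated as a scalar cost, so this is only a structural illustration under those assumptions.

```python
import numpy as np

def sliding_window_cost(X, prior, imu_terms, point_terms, line_terms):
    """X: stacked window state vector; prior = (r_p, J_p);
    *_terms: lists of (residual_fn, measurement) pairs."""
    r_p, J_p = prior
    cost = float(np.sum((r_p - J_p @ X) ** 2))          # marginalization prior block
    for r_I, Z_i in imu_terms:                           # IMU pre-integration residuals
        cost += float(np.sum(r_I(Z_i, X) ** 2))
    for r_f, Z_f in point_terms:                         # point reprojection residuals
        cost += float(np.sum(r_f(Z_f, X) ** 2))
    for r_l, Z_l in line_terms:                          # line reprojection residuals
        cost += float(np.sum(r_l(Z_l, X) ** 2))
    return cost
```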
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Claims (10)

1. A positioning method based on information fusion of a binocular camera and an inertial navigation sensor is characterized by comprising the following steps:
s1, acquiring a robot peripheral image sequence;
step S2, for the obtained image frame, using an improved FAST feature point extraction algorithm and introducing a quadtree to quickly screen feature points, so that the FAST feature points are uniformly distributed;
step S3, extracting line features of the image frame and calculating a line feature descriptor;
step S4, performing tracking matching on FAST feature points extracted from the previous and subsequent image frames to obtain an inter-frame estimation pose, re-projecting point and line features in the previous and subsequent frames to the current frame according to the estimation pose, and performing tracking matching on the line features;
step S5, determining whether the current frame is a key frame, if not, discarding the current frame, and temporarily storing the visual information and IMU pre-integration information to transmit to the next frame;
and step S6, adding the current frame to a sliding window, performing back-end optimization by tightly coupling the matched point-line feature constraints in the sliding window with the IMU pre-integration, and outputting the inter-frame motion pose.
2. The positioning method based on information fusion of the binocular camera and the inertial navigation sensor according to claim 1, characterized in that the improved FAST feature point extraction algorithm is specifically: dividing the image into grids according to its size; traversing each grid and extracting FAST feature points in it; regarding the image as the root node of an N-ary tree, i.e. dividing it into N equal parts; if a node contains more than one feature point, iteratively subdividing it into its child nodes, the iteration terminating when a child node contains at most one feature point or none; the feature point finally retained is the optimal one.
3. The positioning method based on information fusion of the binocular camera and the inertial navigation sensor according to claim 2, characterized in that step S3 extracts line features using the LSD line feature extraction algorithm and calculates line feature descriptors using the LBD algorithm.
4. The positioning method based on information fusion of the binocular camera and the inertial navigation sensor according to claim 3, characterized in that the LSD (Line Segment Detector) line feature extraction algorithm is specifically: reducing the image to a% of its original size by Gaussian sampling, which weakens the aliasing effect of the original image without blurring it; sorting the pixel gradients of all points in the image; and merging pixels with similar gradient directions to generate line support regions.
5. The positioning method based on information fusion of the binocular camera and the inertial navigation sensor according to claim 4, characterized in that the LBD (Line Band Descriptor) algorithm is specifically: accumulating pixel gradients over the detected LSD line features, constructing a descriptor from the gradient information associated with each line, computing the mean and standard deviation of each line's descriptor matrix, and matching line segments according to descriptor distance under the nearest-neighbour matching criterion;
the LBD descriptor is computed as follows: determine the band region and gradient direction of the LSD segment; compute the gradient weight of each row of the band; accumulate the gradient histogram of each row to form the descriptor matrix; and compute the mean and standard deviation and concatenate them into the LBD descriptor.
6. The positioning method based on information fusion of the binocular camera and the inertial navigation sensor according to claim 5, characterized in that, in step S4, the tracking and matching of the feature points in consecutive frames is performed by a direct method based on minimizing the photometric error, and the photometric error of the feature point together with its surrounding pixel block is chosen as the final error function; the grey value is obtained by bilinear interpolation, and the photometric error to be minimized is:
E_{u,v} = I_v[p'] - I_u[p]
where p is a FAST feature point in the previous image frame I_u and p' denotes the coordinates of p reprojected into the current image frame I_v.
7. The positioning method based on information fusion of the binocular camera and the inertial navigation sensor according to claim 6, characterized in that the tracking and matching of the line features is realized by minimizing the reprojection error:
[line reprojection error formula given as an image in the original publication]
where l = [l_1, l_2]^T is the projected line, l_1 denotes the line feature to be matched in the previous frame, l_2 denotes the line feature to be matched in the subsequent frame, and p^T, q^T are the homogeneous coordinates of the two endpoints of the matched line segment.
8. The positioning method based on information fusion of the binocular camera and the inertial navigation sensor according to claim 7, characterized in that, in step S5, the key frame selection strategy is:
the current frame has not been selected as a key frame for N consecutive frames and can track no fewer than h FAST point features and at least m line features, or the point and line features shared by the current frame and the reference key frame are no more than 3/4 of those in the reference key frame.
9. The positioning method based on information fusion of the binocular camera and the inertial navigation sensor according to claim 8, characterized in that, before the key frame is added to the sliding window, the IMU sensor data between it and the previous key frame is pre-integrated.
10. The positioning method based on information fusion of the binocular camera and the inertial navigation sensor according to claim 9, characterized in that the objective function to be optimized in the sliding window is:
E = ‖r_p - J_p X‖² + Σ‖r_I(Z_i, X)‖² + Σ‖r_f(Z_f, X)‖² + Σ‖r_l(Z_l, X)‖²
where X is the state quantity to be optimized, including the position, rotation and velocity of the IMU relative to the world coordinate system and the inverse depths of the feature points, etc.; ‖r_p - J_p X‖² is the residual constraint after marginalization, with r_p the prior residual and J_p the Jacobian of the prior term; ‖r_I(Z_i, X)‖² is the IMU error, with r_I(·) the IMU residual term and Z_i the pre-integration measurement; ‖r_f(Z_f, X)‖² is the point-feature reprojection error, with r_f(·) the point-feature residual term and Z_f the observed point features; and ‖r_l(Z_l, X)‖² is the line-feature reprojection error, with r_l(·) the line-feature residual term and Z_l the observed line features.
CN202110245725.7A 2021-03-05 2021-03-05 Positioning method based on information fusion of binocular camera and inertial navigation sensor Active CN113012196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110245725.7A CN113012196B (en) 2021-03-05 2021-03-05 Positioning method based on information fusion of binocular camera and inertial navigation sensor


Publications (2)

Publication Number Publication Date
CN113012196A true CN113012196A (en) 2021-06-22
CN113012196B (en) 2023-03-24

Family

ID=76406931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110245725.7A Active CN113012196B (en) 2021-03-05 2021-03-05 Positioning method based on information fusion of binocular camera and inertial navigation sensor

Country Status (1)

Country Link
CN (1) CN113012196B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019057179A1 (en) * 2017-09-22 2019-03-28 华为技术有限公司 Visual slam method and apparatus based on point and line characteristic
WO2019062291A1 (en) * 2017-09-29 2019-04-04 歌尔股份有限公司 Binocular vision positioning method, device, and system
CN109166149A (en) * 2018-08-13 2019-01-08 武汉大学 A kind of positioning and three-dimensional wire-frame method for reconstructing and system of fusion binocular camera and IMU
CN109579840A (en) * 2018-10-25 2019-04-05 中国科学院上海微系统与信息技术研究所 A kind of close coupling binocular vision inertia SLAM method of dotted line Fusion Features
CN109934862A (en) * 2019-02-22 2019-06-25 上海大学 A kind of binocular vision SLAM method that dotted line feature combines
CN111222514A (en) * 2019-12-31 2020-06-02 西安航天华迅科技有限公司 Local map optimization method based on visual positioning
CN111583136A (en) * 2020-04-25 2020-08-25 华南理工大学 Method for simultaneously positioning and establishing image of autonomous mobile platform in rescue scene
CN112115980A (en) * 2020-08-25 2020-12-22 西北工业大学 Binocular vision odometer design method based on optical flow tracking and point line feature matching

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李奎霖: "基于LDSO的机器人视觉定位与稠密建图的应用", 《微电子学与计算机》 *
魏武: "基于ORB-SLAM的室内机器人定位和三维稠密地图构建", 《计算机应用》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114742891A (en) * 2022-03-30 2022-07-12 青岛虚拟现实研究院有限公司 Positioning system for VR display equipment
CN115578620A (en) * 2022-10-28 2023-01-06 北京理工大学 Point-line-surface multi-dimensional feature-visible light fusion slam method
CN116205947A (en) * 2023-01-03 2023-06-02 哈尔滨工业大学 Binocular-inertial fusion pose estimation method based on camera motion state, electronic equipment and storage medium
CN116205947B (en) * 2023-01-03 2024-06-07 哈尔滨工业大学 Binocular-inertial fusion pose estimation method based on camera motion state, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113012196B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN113012196B (en) Positioning method based on information fusion of binocular camera and inertial navigation sensor
Han et al. Deepvio: Self-supervised deep learning of monocular visual inertial odometry using 3d geometric constraints
CN108986037B (en) Monocular vision odometer positioning method and positioning system based on semi-direct method
CN110223348B (en) Robot scene self-adaptive pose estimation method based on RGB-D camera
CN111024066B (en) Unmanned aerial vehicle vision-inertia fusion indoor positioning method
CN110567469B (en) Visual positioning method and device, electronic equipment and system
CN112258600A (en) Simultaneous positioning and map construction method based on vision and laser radar
US9148650B2 (en) Real-time monocular visual odometry
CN107341814B (en) Four-rotor unmanned aerial vehicle monocular vision range measurement method based on sparse direct method
CN107167826B (en) Vehicle longitudinal positioning system and method based on variable grid image feature detection in automatic driving
CN103886107B (en) Robot localization and map structuring system based on ceiling image information
CN111210463A (en) Virtual wide-view visual odometer method and system based on feature point auxiliary matching
CN110807809A (en) Light-weight monocular vision positioning method based on point-line characteristics and depth filter
CN111882602B (en) Visual odometer implementation method based on ORB feature points and GMS matching filter
CN110599545B (en) Feature-based dense map construction system
CN109493385A (en) Autonomic positioning method in a kind of mobile robot room of combination scene point line feature
CN113223045A (en) Vision and IMU sensor fusion positioning system based on dynamic object semantic segmentation
CN113326769B (en) High-precision map generation method, device, equipment and storage medium
CN113532420A (en) Visual inertial odometer method integrating point-line characteristics
CN112541423A (en) Synchronous positioning and map construction method and system
CN114563000B (en) Indoor and outdoor SLAM method based on improved laser radar odometer
CN114812573A (en) Monocular visual feature fusion-based vehicle positioning method and readable storage medium
CN113345032A (en) Wide-angle camera large-distortion image based initial image construction method and system
CN112419411A (en) Method for realizing visual odometer based on convolutional neural network and optical flow characteristics
CN113155126A (en) Multi-machine cooperative target high-precision positioning system and method based on visual navigation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant