Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a method for correcting a camera pose according to an embodiment of the present invention. The method is applicable to the case of correcting an initial translation vector determined based on IMU information, and is especially applicable to a scene in which the initial translation vector obtained by a pre-integration module is corrected in a SLAM system or a VIO (Visual-Inertial Odometry) system equipped with such a module. The method can be executed by a camera pose correction device, which can be realized by software and/or hardware and integrated in a device with a camera pose estimation function. The method specifically comprises the following steps:
S110, acquiring an initial pre-integration value and an initial translation vector in an initial pose corresponding to the current image frame of the camera, wherein the initial pre-integration value and the initial translation vector are determined by pre-integration of information collected by the IMU inertial measurement unit.
The current image frame may refer to the image frame captured by the camera at the current time. Because the camera is moving, its pose in the world coordinate system also changes in real time, so the camera pose corresponding to each captured image frame needs to be estimated. The IMU inertial measurement unit may utilize gyroscopes and accelerometers to collect metrically scaled acceleration information and angular velocity information of the camera. The initial pose may be an initial camera pose obtained by pre-integrating the information acquired by the IMU, and may include an initial translation vector and an initial rotation matrix of the camera. The initial pre-integration value may be a numerical value obtained by pre-integrating the information acquired by the IMU; for example, integrating the acceleration information acquired by the IMU once yields a speed value, integrating the acceleration information twice yields a displacement value, and integrating the angular velocity information yields a rotation angle value. The initial pre-integration value in the present embodiment may include a first initial pre-integration value, a second initial pre-integration value, and a third initial pre-integration value, wherein the first initial pre-integration value is the initial relative displacement variation between the current image frame and the previous image frame; the second initial pre-integration value is the initial relative speed variation between the current image frame and the previous image frame; and the third initial pre-integration value is the initial relative rotation angle variation between the current image frame and the previous image frame.
Specifically, the embodiment may perform pre-integration in advance according to the information acquired by the IMU inertial measurement unit to determine an initial pre-integration value and an initial pose corresponding to the current image frame of the camera. Illustratively, a pre-integration operation is performed based on IMU information between a current image frame and a previous image frame acquired by the IMU, an initial pre-integration value corresponding to the current image frame is obtained, and then an initial pose (i.e., an initial translation vector and an initial rotation matrix) corresponding to the current image frame is determined based on a kinematic formula and the initial pre-integration value, where the initial translation vector in the obtained initial pose has a large error due to the influence of IMU noise.
For example, the present embodiment may determine a first initial pre-integration value corresponding to a current image frame, such as an initial relative displacement change amount between the current image frame and a previous image frame, by pre-integrating acceleration information between the current image frame and the previous image frame acquired by the IMU, and may determine an initial translation vector corresponding to the current image frame based on the following kinematic formula:
p_j^w = p_{j-1}^w + v_{j-1}^w·Δt − (1/2)·g^w·Δt² + R_{j-1}^w·α_{j-1,j}

wherein p_j^w is the initial translation vector corresponding to the current image frame j, i.e., the displacement of the camera coordinate system corresponding to the current image frame j relative to the world coordinate system (since the camera is moving, the camera coordinate system, whose origin is the camera optical center, is not fixed); it can also be understood as the position of the camera at the current moment in the world coordinate system. R_{j-1}^w is the rotation matrix between the world coordinate system and the camera coordinate system corresponding to the previous image frame j-1, used for rotation conversion between the coordinate systems; p_{j-1}^w is the target translation vector corresponding to the previous image frame j-1; v_{j-1}^w is the initial speed corresponding to the previous image frame j-1; Δt is the time interval between the current image frame j and the previous image frame j-1; g^w is the gravitational acceleration in the world coordinate system; and α_{j-1,j} is the first initial pre-integration value, i.e., the relative displacement variation, between the previous image frame j-1 and the current image frame j.
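The kinematic formula above can be sketched numerically. The following is a minimal numpy sketch; the function name, the camera-to-world rotation convention, and g^w = (0, 0, 9.81) are illustrative assumptions rather than details fixed by the patent.

```python
import numpy as np

def propagate_position(p_prev, R_prev, v_prev, alpha, dt,
                       g_w=np.array([0.0, 0.0, 9.81])):
    """Kinematic formula: position of frame j from frame j-1.

    p_prev : (3,) target translation vector of the previous frame (world frame)
    R_prev : (3,3) rotation of the previous frame (assumed camera-to-world)
    v_prev : (3,) initial speed of the previous frame (world frame)
    alpha  : (3,) first initial pre-integration value (relative displacement)
    dt     : time interval between the two frames
    """
    return p_prev + v_prev * dt - 0.5 * g_w * dt**2 + R_prev @ alpha
```

Starting from rest at the origin with a zero pre-integration value, the camera position after Δt is driven purely by the gravity term, matching the formula term by term.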
In this embodiment, based on the correction manner of the camera pose provided in steps S110 to S120, the initial translation vector corresponding to the previous image frame j-1 may likewise have been corrected, so that the target translation vector corresponding to the previous image frame is available.
And S120, calculating a total estimation error corresponding to the initial translation vector according to the image frame processing information, the initial pre-integration value and the initial translation vector of the camera, correcting the initial translation vector based on the total estimation error, and determining a target translation vector corresponding to the current image frame.
The image frame processing information may refer to visual information obtained by processing an image captured by a camera. Illustratively, the image frame processing information may refer to feature point information obtained by performing feature detection and tracking on an image frame captured by a camera. The total estimated error corresponding to the initial translation vector may include visual errors associated with image frame processing information and IMU errors due to IMU noise. The target translation vector corresponding to the current image frame may refer to a more accurate initial translation vector obtained after the initial translation vector is corrected.
Specifically, a vision error corresponding to an initial translation vector may be calculated based on image frame processing information of the camera, and an IMU error corresponding to the initial translation vector may be calculated based on an initial pre-integration value of the camera and a kinematic formula, so that a total estimation error corresponding to the initial translation vector may be obtained. The total estimation error is minimized to correct the initial translation vector, so that a corrected more accurate initial translation vector, that is, a target translation vector corresponding to the current image frame, can be obtained. In the embodiment, the initial translation vector determined based on the IMU information is corrected by using image frame processing information, i.e., visual information, so that noise control can be performed on the IMU, and the influence of IMU noise on the initial translation vector is reduced.
It should be noted that, through experimental verification, the initial rotation matrix obtained by using the IMU information is more accurate than the initial rotation matrix obtained by using the visual information, so that the accuracy and precision of the initial rotation matrix obtained based on the IMU information cannot be further improved by using the visual information.
It should be noted that after the initial translation vector corresponding to the current image frame is corrected to obtain the target translation vector, the target translation vector may be further optimized, for example, sliding window optimization, so as to estimate the final camera pose corresponding to the current image frame. When the target translation vector is used for optimization, the convergence rate of optimization can be greatly increased, and the estimation efficiency of the camera pose is improved.
According to the technical scheme of the embodiment, an initial pre-integration value and an initial translation vector in an initial pose corresponding to a current image frame of a camera are obtained, wherein the initial pre-integration value and the initial translation vector are determined by pre-integration according to information collected by an Inertial Measurement Unit (IMU). And calculating a total estimation error corresponding to the initial translation vector by using image frame processing information, an initial pre-integration value and the initial translation vector obtained after processing an image shot by the camera, and correcting the initial translation vector based on the total estimation error to obtain a corrected target translation vector corresponding to the current image frame. According to the method and the device, the initial translation vector determined based on the IMU information is corrected by using the visual information of the camera, so that noise control can be performed on the IMU, the influence of IMU noise on the initial translation vector is reduced, the accuracy and precision of the estimation of the initial translation vector in the initial pose are improved, and the accuracy and precision of the final estimation of the pose of the camera are improved.
On the basis of the above technical solution, S120 may include: calculating a visual error corresponding to the current image frame according to pixel coordinates of each target feature point in the current image frame, three-dimensional coordinates of each target feature point in the target image frame before the current image frame and a conversion matrix between the current image frame and the target image frame, wherein the conversion matrix is determined according to an initial translation vector and an initial rotation matrix corresponding to the current image frame; calculating an IMU error corresponding to the current image frame according to an initial translation vector corresponding to the current image frame, a time interval between the current image frame and a previous image frame, a previous target translation vector and a previous initial speed corresponding to the previous image frame, and an initial pre-integration value corresponding to the current image frame; adding the visual error and the IMU error to determine a total estimation error corresponding to the initial translation vector; and minimizing the total estimation error by adjusting the size of the initial translation vector, and determining the initial translation vector corresponding to the minimum total estimation error as the target translation vector corresponding to the current image frame.
The target feature points in the current image frame may be feature points that satisfy a preset screening condition. The preset filtering condition may refer to a feature point in the current image frame, the feature point appearing in an image frame before the current image frame and having retained depth information of the feature point. For example, in the SLAM system, for each feature point in the current image frame, the feature point appearing on the key frame in the sliding window may be taken as a target feature point. The target image frame may refer to an image frame in which a target feature point appears before the current image frame and depth information of the target feature point is retained, so that three-dimensional coordinates corresponding to the target feature point may be obtained. For example, in the SLAM system, for each target feature point, a key image frame in which the target feature point first appears in a sliding window may be determined as a target image frame corresponding to the target feature point, where the sliding window includes a plurality of key image frames and a current image frame. The current image frame may or may not be a key image frame; if the current image frame is the key image frame, the current image frame can be kept in the sliding window after the sliding window is optimized; and if the current image frame is a non-key image frame, the current image frame is not kept in the sliding window after optimization.
The conversion matrix between the current image frame and the target image frame may refer to a conversion matrix from the target image frame to the current image frame, which may be determined according to a translation vector from the target image frame to the current image frame and a rotation matrix from the target image frame to the current image frame. The translation vector of the target image frame to the current image frame may be determined according to a translation vector corresponding to the target image frame (translation vector from the camera coordinate system corresponding to the target image frame to the world coordinate system) and an initial translation vector corresponding to the current image frame (initial translation vector from the camera coordinate system corresponding to the current image frame to the world coordinate system). The rotation matrix from the target image frame to the current image frame may be determined according to a rotation matrix corresponding to the target image frame (a rotation matrix from a camera coordinate system corresponding to the target image frame to a world coordinate system) and an initial rotation matrix corresponding to the current image frame (an initial rotation matrix from the camera coordinate system corresponding to the current image frame to the world coordinate system).
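The composition of the conversion matrix described above can be sketched as follows; the function name and the camera-to-world convention for the per-frame rotations and translations are assumptions, since the patent does not give an explicit formula here.

```python
import numpy as np

def relative_transform(R_k, p_k, R_j, p_j):
    """Conversion matrix from target frame k to current frame j, composed from
    each frame's (assumed camera-to-world) rotation and translation.

    R_k, R_j : (3,3) camera-to-world rotations of frames k and j
    p_k, p_j : (3,) camera positions (translation vectors) in the world frame
    Returns (R_kj, t_kj) such that x_j = R_kj @ x_k + t_kj.
    """
    R_kj = R_j.T @ R_k                # rotation from frame k to frame j
    t_kj = R_j.T @ (p_k - p_j)        # translation from frame k to frame j
    return R_kj, t_kj
```

With identical orientations, the relative translation reduces to the difference of the two camera positions expressed in the current frame, as expected.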
Illustratively, the vision error corresponding to the current image frame may be calculated based on the following formula:
r_proj = Σ_{k∈F} Σ_{i∈C} ρ( || u_i^j − π( T_k^j · (λ_i·P_i) ) ||² )

wherein r_proj is the vision error corresponding to the current image frame j; u_i^j is the pixel coordinates of the i-th target feature point in the current image frame j; P_i is the normalized three-dimensional coordinates of the i-th target feature point in a target image frame k before the current image frame j; λ_i is the depth value corresponding to the i-th target feature point; T_k^j is the conversion matrix from the target image frame k to the current image frame j; π(·) represents projecting the three-dimensional coordinates onto the two-dimensional plane of the current image frame; ρ is the Huber loss function; C is the set of all target feature points in the current image frame j; and F is the set of target image frames.
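The per-point reprojection term can be sketched as below. The function names, the pinhole intrinsic matrix K, and the Huber threshold delta are illustrative assumptions; the patent only specifies the projection π and the Huber loss ρ abstractly.

```python
import numpy as np

def huber(e, delta=1.0):
    """Huber robust loss applied to a squared error e (e = r**2 >= 0)."""
    r = np.sqrt(e)
    return e if r <= delta else 2 * delta * r - delta**2

def visual_error(obs_px, P_norm, depth, R_kj, t_kj, K):
    """Robust reprojection error of one target feature point.

    obs_px : (2,) observed pixel coordinates u_i in the current frame j
    P_norm : (3,) normalized 3-D coordinates of the point in target frame k
    depth  : scalar depth lambda_i in frame k
    R_kj, t_kj : conversion (rotation, translation) from frame k to frame j
    K      : (3,3) camera intrinsic matrix (assumed pinhole model)
    """
    X_j = R_kj @ (depth * P_norm) + t_kj   # point in frame-j camera coordinates
    uvw = K @ X_j
    proj = uvw[:2] / uvw[2]                # pi(.): project onto the image plane
    return huber(np.sum((obs_px - proj) ** 2))
```

Summing this term over all target feature points and target image frames gives the visual error r_proj of the current frame.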
Specifically, the present embodiment may calculate the visual error corresponding to the current image frame based on the reprojection error. The three-dimensional coordinates λ_i·P_i of a target feature point in its target image frame, which contain depth information, are projected onto the two-dimensional plane of the current image frame to obtain the projected two-dimensional coordinates of the target feature point. Due to the error of the initial pose of the camera, the pixel coordinates observed for the target feature point in the current image frame and the projected two-dimensional coordinates do not coincide, so the reprojection error corresponding to each target feature point in the current image frame can be calculated, and the reprojection errors of all target feature points can be added to obtain the visual error corresponding to the current image frame. It should be noted that the Huber loss function ρ is a robust-regression loss function that can be used to reduce the influence of abnormal values in the subsequent optimization of the initial pose, so that the optimization converges faster and the finally obtained camera pose is more accurate.
For example, the IMU error corresponding to the current image frame may be calculated based on the following formula:
r_IMU = (R_{j-1}^w)ᵀ · ( p_j^w − p_{j-1}^w − v_{j-1}^w·Δt + (1/2)·g^w·Δt² ) − α_{j-1,j}

wherein r_IMU is the IMU error corresponding to the current image frame j; (R_{j-1}^w)ᵀ is the first rotation matrix, from the world coordinate system to the camera coordinate system corresponding to the previous image frame j-1; p_j^w is the initial translation vector corresponding to the current image frame j, namely the displacement from the camera coordinate system corresponding to the current image frame j to the world coordinate system; p_{j-1}^w is the previous target translation vector corresponding to the previous image frame j-1; Δt is the time interval between the current image frame j and the previous image frame j-1; v_{j-1}^w is the previous initial speed of the previous image frame in the world coordinate system; g^w is the gravitational acceleration in the world coordinate system; and α_{j-1,j} is the first initial pre-integration value between the previous image frame j-1 and the current image frame.
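The IMU error term can be sketched as a residual of the kinematic formula; the function name and conventions mirror the earlier sketches and are assumptions, not the patent's notation.

```python
import numpy as np

def imu_position_residual(p_j, p_prev, v_prev, R_prev, alpha, dt,
                          g_w=np.array([0.0, 0.0, 9.81])):
    """IMU error term for the initial translation vector p_j of frame j.

    R_prev is the (assumed camera-to-world) rotation of frame j-1; its
    transpose is the world-to-camera "first" rotation matrix of the residual.
    The residual is zero when p_j exactly satisfies the kinematic formula.
    """
    return R_prev.T @ (p_j - p_prev - v_prev * dt + 0.5 * g_w * dt**2) - alpha
```

Constructing p_j from the kinematic formula and feeding it back into the residual yields zero, which matches the remark below that the IMU error vanishes before the correction.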
The previous initial speed of the previous image frame may refer to a camera motion speed estimated in the previous image frame.
Specifically, for each image frame captured by the camera, the initial translation vector corresponding to that image frame may be corrected based on the operations of steps S110-S120 above, so that a target translation vector corresponding to each image frame is obtained. In this embodiment, the IMU error corresponding to the current image frame may be calculated from the kinematic formula above, i.e., by moving all terms of the kinematic formula to one side and taking the residual of the two sides as the IMU error corresponding to the current image frame. It should be noted that before the initial translation vector corresponding to the current image frame is corrected, the IMU error is zero, since the initial translation vector was determined exactly according to the kinematic formula.
After the vision error and the IMU error are added to obtain the total estimation error corresponding to the initial translation vector, the present embodiment may adjust the size of the initial translation vector so that the minimum total estimation error may be obtained. In the embodiment, the total estimation error is minimized, so that the sum of the visual error of the corrected initial translation vector and the IMU error is minimized, the initial translation vector obtained based on the IMU information is corrected by using the visual information, the accuracy of the initial pose of the camera is improved, and the influence of IMU noise is reduced.
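The minimization of the total estimation error over the initial translation vector can be sketched as follows. The patent does not name a solver, so the plain numeric gradient descent below is a stand-in for whatever optimizer is actually used; function names and step-size parameters are illustrative.

```python
import numpy as np

def correct_translation(p_init, total_error, lr=0.1, iters=500, eps=1e-6):
    """Minimize total_error (visual error + IMU error) over the 3-D
    translation vector by numeric gradient descent (illustrative solver).

    p_init      : (3,) initial translation vector from IMU pre-integration
    total_error : callable mapping a candidate (3,) vector to a scalar error
    Returns the corrected (target) translation vector.
    """
    p = np.asarray(p_init, dtype=float).copy()
    for _ in range(iters):
        grad = np.zeros(3)
        for k in range(3):           # central-difference gradient per axis
            d = np.zeros(3)
            d[k] = eps
            grad[k] = (total_error(p + d) - total_error(p - d)) / (2 * eps)
        p -= lr * grad
    return p
```

On a toy total error that is the sum of two quadratic bowls (standing in for the visual and IMU terms), the minimizer lands midway between the two centers, illustrating how the two error sources jointly pull the translation vector.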
On the basis of the above technical solution, after step S120, the method may further include: and correcting the initial pre-integral value corresponding to the current image frame according to the current target translation vector corresponding to the current image frame and the previous target translation vector and the previous initial speed corresponding to the previous image frame, and determining the target pre-integral value corresponding to the current image frame.
Specifically, since the initial translation vector is corrected in the present embodiment, the initial pre-integration value corresponding to the current image frame can be corrected according to the target translation vector corresponding to two adjacent image frames and the previous initial velocity corresponding to the previous image frame based on the kinematic formula, so as to obtain the corrected initial pre-integration value, that is, the target pre-integration value. It should be noted that, the present embodiment may calculate the first initial pre-integration value, which is the amount of change in relative displacement between the current image frame and the previous image frame, based on the initial translation vector of the current image frame, so that the corrected initial translation vector may be used to correct the first initial pre-integration value and determine the corrected first target pre-integration value. According to the method and the device, errors introduced during discretization of the time-continuous model can be reduced by correcting the initial pre-integration value of the image frame, so that the accuracy and precision of pose estimation are further improved.
For example, the present embodiment may correct the first initial pre-integration value corresponding to the current image frame according to the current target translation vector corresponding to the current image frame, the previous target translation vector and the previous initial speed corresponding to the previous image frame, and a time interval between the current image frame and the previous image frame, and determine the first target pre-integration value corresponding to the current image frame. Specifically, the first target pre-integration value corresponding to the current image frame may be determined based on the following formula:
α̂_{j-1,j} = (R_{j-1}^w)ᵀ · ( p̂_j^w − p̂_{j-1}^w − v_{j-1}^w·Δt + (1/2)·g^w·Δt² )

wherein α̂_{j-1,j} is the first target pre-integration value corresponding to the current image frame j; (R_{j-1}^w)ᵀ is the first rotation matrix, from the world coordinate system to the camera coordinate system corresponding to the previous image frame j-1; p̂_j^w is the current target translation vector corresponding to the current image frame j; p̂_{j-1}^w is the last target translation vector corresponding to the last image frame j-1 of the current image frame j; v_{j-1}^w is the previous initial speed corresponding to the previous image frame j-1; Δt is the time interval between the current image frame j and the previous image frame j-1; and g^w is the gravitational acceleration in the world coordinate system.
On the basis of the above technical solution, after determining the first target pre-integration value corresponding to the current image frame, the method further includes: and taking the first target pre-integration value as constraint information, optimizing the target translation vector and the initial rotation matrix in the initial pose, and determining the optimized camera pose.
Specifically, the present embodiment may optimize the initial rotation matrix and the target translation vector in the initial pose based on the corrected pre-integral value, so that a more accurate camera pose corresponding to the current image frame may be obtained. Exemplarily, in the SLAM system, the initial rotation matrix in the initial pose corresponding to the current image frame and the corrected target translation vector can be used as the initial value of the camera pose in the sliding window optimization, and the pre-integration value obtained after correction can be used as constraint information to optimize the camera pose, so that the pose optimization result of the current image frame is more accurate, and the accuracy of the camera pose estimation can be greatly improved.
Example two
Fig. 2 is a flowchart of a method for correcting a camera pose according to a second embodiment of the present invention. On the basis of the first embodiment, after determining the target translation vector corresponding to the current image frame, the present embodiment further includes: correcting the direction of the initial speed according to the initial speed, the initial translation vector and the target translation vector of the current image frame, and determining a target speed corresponding to the current image frame. Explanations of terms that are the same as or correspond to those of the above embodiment are omitted.
Referring to fig. 2, the method for correcting the pose of the camera provided by this embodiment specifically includes the following steps:
S210, acquiring an initial pre-integration value and an initial translation vector in an initial pose corresponding to the current image frame of the camera, wherein the initial pre-integration value and the initial translation vector are determined by pre-integration of information collected by the IMU inertial measurement unit.
S220, calculating a total estimation error corresponding to the initial translation vector according to the image frame processing information, the initial pre-integration value and the initial translation vector of the camera, correcting the initial translation vector based on the total estimation error, and determining a target translation vector corresponding to the current image frame.
And S230, correcting the direction of the initial speed according to the initial speed, the initial translation vector and the target translation vector of the current image frame, and determining the target speed corresponding to the current image frame.
The initial speed corresponding to the current image frame can be determined by pre-integration according to the information acquired by the IMU inertial measurement unit. The initial velocity in this embodiment is a vector, which includes a velocity magnitude and a velocity direction. For example, the acceleration information between the current image frame and the previous image frame acquired by the IMU is pre-integrated, a second initial pre-integration value of a relative speed variation between the current image frame and the previous image frame may be obtained, and the initial speed of the current image frame may be determined according to the second initial pre-integration value, the target speed of the previous image frame, and the time interval between the current image frame and the previous image frame based on the following kinematic formula:
v_j^w = v_{j-1}^w − g^w·Δt + R_{j-1}^w·β_{j-1,j}

wherein v_j^w is the initial speed corresponding to the current image frame j; R_{j-1}^w is the first rotation matrix between the world coordinate system and the camera coordinate system corresponding to the previous image frame j-1; v_{j-1}^w is the target speed corresponding to the previous image frame j-1; Δt is the time interval between the current image frame j and the previous image frame j-1; g^w is the gravitational acceleration in the world coordinate system; and β_{j-1,j} is the second initial pre-integration value corresponding to the current image frame j.
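The velocity propagation above can be sketched in the same style as the position formula; the conventions (camera-to-world rotation, g^w = (0, 0, 9.81)) and the function name are illustrative assumptions.

```python
import numpy as np

def propagate_velocity(v_prev, R_prev, beta, dt,
                       g_w=np.array([0.0, 0.0, 9.81])):
    """Initial speed of frame j from the target speed of frame j-1 and the
    second pre-integration value beta (relative velocity change, body frame).

    v_prev : (3,) target speed of the previous frame (world frame)
    R_prev : (3,3) rotation of the previous frame (assumed camera-to-world)
    beta   : (3,) second initial pre-integration value
    dt     : time interval between the two frames
    """
    return v_prev - g_w * dt + R_prev @ beta
```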
FIG. 3(a) shows an example of the variation of the true initial velocity between two adjacent image frames, and fig. 3(b) shows an example of the variation of the initial velocity between two adjacent image frames calculated based on the kinematic formula. The black circular blocks in fig. 3(a) and 3(b) represent the end-point positions at which the camera takes the image frames; the black rectangular blocks represent the intermediate positions corresponding to each IMU datum acquired between two image frames; and the arrows indicate the direction of the velocity. Here p_{j-1}^w is the target translation vector corresponding to the previous image frame (the (j-1)-th frame) of the current image frame (the j-th frame), which can be understood as the position of the camera corresponding to the previous image frame in the world coordinate system; p_j^w is the initial translation vector corresponding to the current image frame (the j-th frame), which can be understood as the position of the camera corresponding to the current image frame in the world coordinate system; v_{j-1}^w is the target speed corresponding to the previous image frame; and v_j^w is the initial speed corresponding to the current image frame.

As shown in fig. 3(a) and 3(b), since the IMU acquisition frequency is higher than the camera frame rate, there are more IMU data than camera frames, so that there are a plurality of intermediate positions corresponding to the IMU data between two adjacent image frames (i.e., between two end-point positions). The initial velocity v_j^w in fig. 3(b) is calculated based on the kinematic formula. Comparing it with the true initial velocity given in fig. 3(a), it can be seen that the direction of the initial velocity in fig. 3(b) has a large error due to the influence of IMU noise, so the present embodiment also corrects the direction of the initial velocity.
In particular, fig. 4 provides an example of the correction of the initial velocity. As shown in fig. 4, after the initial translation vector p_j^w corresponding to the current image frame is corrected to the target translation vector p̂_j^w, the magnitude of the initial velocity v_j^w remains unchanged, and the angle between the direction of the velocity and the camera position is constant, i.e., the angle between v_j^w and p_j^w is equal to the angle between the target velocity v̂_j^w and p̂_j^w. The included angle between the initial velocity and the initial translation vector of the current image frame can therefore be calculated, and the target speed v̂_j^w corresponding to the current image frame can be determined according to the included angle and the target translation vector, so that the direction of the initial speed is corrected and the accuracy of the initial speed is improved.
According to the technical scheme of the embodiment, after the initial translation vector of the current image frame is corrected, the direction of the initial speed of the current image frame can be corrected according to the target translation vector obtained after correction, so that the influence of IMU noise on the speed can be further reduced, and the accuracy and precision of initial speed estimation are improved.
On the basis of the above technical solution, S230 may include: calculating a rotation vector and an included angle between the corresponding initial speed and the initial translation vector of the current image frame; calculating a second rotation matrix between the initial speed and the initial translation vector according to the rotation vector and the included angle; determining a target direction according to the second rotation matrix and a target translation vector corresponding to the current image frame; and determining the target speed corresponding to the current image frame according to the initial speed and the target direction.
Specifically, the present embodiment may calculate a rotation vector and an included angle between the corresponding initial velocity and the initial translation vector of the current image frame based on the following formulas:
n = (p_j^w × v_j^w) / || p_j^w × v_j^w ||
θ = arccos( (v_j^w · p_j^w) / (||v_j^w|| · ||p_j^w||) )

wherein n is the rotation vector between the initial speed v_j^w corresponding to the current image frame and the initial translation vector p_j^w, and θ is the included angle between them; p_j^w, the corresponding initial translation vector of the current image frame, can be understood as the position of the camera corresponding to the current image frame in the world coordinate system.
When calculating the second rotation matrix between the initial velocity and the initial translation vector, since the present embodiment corrects only the direction of the initial velocity and does not correct the magnitude of the initial velocity, it is sufficient to change the velocity direction by using the second rotation matrix with a modulus value of 1. For example, the present embodiment may determine the second rotation matrix between the initial velocity and the initial translation vector based on the following formula:
R_vt = cos θ·I + (1 − cos θ)·n·nᵀ + sin θ·[n]×

wherein R_vt is the second rotation matrix between the initial velocity v_j^w and the initial translation vector p_j^w; I is the identity matrix; nᵀ is the transpose of the rotation vector n; and [n]× is the antisymmetric (skew-symmetric) matrix of the rotation vector n.
After the initial translation vector is corrected, the angle between the direction of the velocity and the camera position is unchanged, so that the direction of the initial velocity can be corrected based on the following formula to determine the target direction:

d_j = R_vpᵀ · p′_j / ‖p′_j‖

wherein d_j is the target direction; R_vp is the second rotation matrix between the initial velocity and the initial translation vector; p′_j is the target translation vector corresponding to the current image frame. After the target direction is obtained, the corrected target speed can be obtained from the initial speed and the target direction of the current image frame, i.e. the target speed v′_j corresponding to the current image frame can be determined based on the formula v′_j = ‖v_j‖ · d_j.
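The velocity-direction correction described above can be sketched as follows. This is a minimal NumPy illustration, not the actual system code: all function and variable names are placeholders, and the sign convention of the rotation (the transpose applied to the corrected position direction) follows the reconstruction given here.

```python
import numpy as np

def correct_velocity_direction(v_init, p_init, p_target):
    """Rotate the initial velocity so that it keeps the same included angle
    to the (corrected) camera position while preserving its magnitude."""
    # Unit rotation axis n and included angle theta between velocity and position
    axis = np.cross(v_init, p_init)
    n = axis / np.linalg.norm(axis)
    cos_t = np.dot(v_init, p_init) / (np.linalg.norm(v_init) * np.linalg.norm(p_init))
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))

    # Rodrigues formula: R maps the velocity direction onto the position direction
    n_hat = np.array([[0.0, -n[2], n[1]],
                      [n[2], 0.0, -n[0]],
                      [-n[1], n[0], 0.0]])
    R = np.cos(theta) * np.eye(3) + (1 - np.cos(theta)) * np.outer(n, n) + np.sin(theta) * n_hat

    # Target direction: inverse rotation applied to the corrected position direction,
    # then the original speed magnitude is restored
    d = R.T @ (p_target / np.linalg.norm(p_target))
    return np.linalg.norm(v_init) * d
```

Since only a rotation is applied, the speed magnitude and the velocity-to-position angle are both preserved by construction.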
On the basis of the above technical solution, S220 may include: calculating a visual error corresponding to the current image frame according to pixel coordinates of each target feature point in the current image frame, three-dimensional coordinates of each target feature point in the target image frame before the current image frame and a conversion matrix between the current image frame and the target image frame, wherein the conversion matrix is determined according to an initial translation vector and an initial rotation matrix corresponding to the current image frame; calculating an IMU error corresponding to the current image frame according to an initial translation vector corresponding to the current image frame, a time interval between the current image frame and a previous image frame, a previous target translation vector and a previous target speed corresponding to the previous image frame, and an initial pre-integration value corresponding to the current image frame; adding the visual error and the IMU error to determine a total estimation error corresponding to the initial translation vector; and minimizing the total estimation error by adjusting the size of the initial translation vector, and determining the initial translation vector corresponding to the minimum total estimation error as the target translation vector corresponding to the current image frame.
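The minimization over the initial translation vector described above can be illustrated with a toy sketch. The error terms are passed in as generic callables standing in for the visual and IMU errors, and a naive numerical gradient descent is used purely for illustration; a real SLAM/VIO system would use a nonlinear least-squares solver.

```python
import numpy as np

def total_error(p, visual_term, imu_term):
    """Total estimation error: sum of the visual error and the IMU error,
    both evaluated at the candidate translation vector p."""
    return visual_term(p) + imu_term(p)

def correct_translation(p_init, visual_term, imu_term, lr=0.1, steps=500, eps=1e-6):
    """Adjust the initial translation vector to minimize the total error;
    the minimizer is taken as the target translation vector."""
    p = np.asarray(p_init, dtype=float).copy()
    for _ in range(steps):
        # Central-difference numerical gradient of the total error
        grad = np.zeros_like(p)
        for k in range(p.size):
            d = np.zeros_like(p)
            d[k] = eps
            grad[k] = (total_error(p + d, visual_term, imu_term)
                       - total_error(p - d, visual_term, imu_term)) / (2 * eps)
        p -= lr * grad
    return p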
In this embodiment, based on the operations in steps S210-S230, the direction of the initial speed corresponding to each image frame captured by the camera may be corrected, so as to obtain the target speed corresponding to each image frame. For the specific way of calculating the visual error corresponding to the current image frame, reference may be made to the related description in the first embodiment. When calculating the IMU error corresponding to the current image frame, the IMU error can be determined based on the previous target speed obtained after the initial speed of the previous image frame is corrected. Compared with directly calculating the IMU error by using the initial speed of the previous image frame, the IMU error calculated by this embodiment is more accurate, which improves the correction effect of the initial translation vector, thereby further improving the accuracy and precision of the initial pose estimation and of the final camera pose estimation.
For example, the present embodiment may calculate the IMU error corresponding to the current image frame based on the following formula:

r_IMU = R_w^{j-1} ( p_j − p′_{j-1} − v′_{j-1} · Δt + ½ g_w · Δt² ) − α_{j-1,j}

wherein r_IMU is the IMU error corresponding to the current image frame j; R_w^{j-1} is the first rotation matrix from the world coordinate system to the camera coordinate system corresponding to the previous image frame j-1; p_j is the initial translation vector corresponding to the current image frame j, namely the displacement from the camera coordinate system corresponding to the current image frame j to the world coordinate system; p′_{j-1} is the previous target translation vector corresponding to the previous image frame j-1; Δt is the time interval between the current image frame j and the previous image frame j-1; v′_{j-1} is the previous target speed of the previous image frame in the world coordinate system; g_w is the gravitational acceleration in the world coordinate system; α_{j-1,j} is the initial pre-integration value between the previous image frame j-1 and the current image frame.
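A minimal sketch of this IMU error computation follows, assuming the common convention in which gravity is (0, 0, 9.81) in the world frame; function and variable names are illustrative, not names from the actual system.

```python
import numpy as np

# Assumed world-frame gravity convention
GRAVITY_W = np.array([0.0, 0.0, 9.81])

def imu_position_residual(R_prev, p_curr, p_prev_target, v_prev_target, dt, alpha):
    """IMU error for frame j: the displacement implied by the corrected states
    of frame j-1 and the initial translation of frame j, rotated into the
    camera frame of j-1 and compared against the pre-integration value alpha."""
    predicted = R_prev @ (p_curr - p_prev_target - v_prev_target * dt
                          + 0.5 * GRAVITY_W * dt ** 2)
    return predicted - alpha
```

With perfectly consistent inputs the residual vanishes; in practice it is driven to a minimum jointly with the visual error.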
It should be noted that after the initial translation vector and the initial velocity corresponding to the current image frame are corrected to obtain the target translation vector and the target velocity, the target translation vector and the target velocity may be further optimized, such as sliding window optimization, so as to estimate the final camera pose corresponding to the current image frame. When the target translation vector and the target speed are used for optimization, the convergence speed of optimization can be greatly increased, and the estimation efficiency of the camera pose is improved.
Example three
Fig. 5 is a flowchart of a method for correcting a camera pose according to a third embodiment of the present invention, where on the basis of the foregoing embodiment, after determining a target speed corresponding to a current image frame, the present embodiment further includes: the method comprises the steps of correcting an initial pre-integration value corresponding to a current image frame according to a current target translation vector and a current target speed corresponding to the current image frame and a previous target translation vector and a previous target speed corresponding to a previous image frame, and determining a target pre-integration value corresponding to the current image frame. Wherein explanations of the same or corresponding terms as those of the above embodiments are omitted.
Referring to fig. 5, the method for correcting the pose of the camera provided by this embodiment specifically includes the following steps:
S310, acquiring an initial pre-integration value and an initial translation vector in an initial pose corresponding to the current image frame of the camera, wherein the initial pre-integration value and the initial translation vector are determined by pre-integration according to information collected by the IMU inertial measurement unit.
S320, calculating a total estimation error corresponding to the initial translation vector according to the image frame processing information, the initial pre-integration value and the initial translation vector of the camera, correcting the initial translation vector based on the total estimation error, and determining a target translation vector corresponding to the current image frame.
S330, correcting the direction of the initial speed according to the initial speed, the initial translation vector and the target translation vector of the current image frame, and determining the target speed corresponding to the current image frame.
S340, correcting the initial pre-integration value corresponding to the current image frame according to the current target translation vector and the current target speed corresponding to the current image frame and the previous target translation vector and the previous target speed corresponding to the previous image frame, and determining the target pre-integration value corresponding to the current image frame.
Specifically, based on the operations in steps S310-S330, after the initial translation vector and the initial velocity corresponding to the image frame captured by the camera are corrected to obtain the previous target translation vector and the previous target velocity corresponding to the previous image frame and the current target translation vector and the current target velocity corresponding to the current image frame, the initial pre-integration value corresponding to the current image frame may be corrected, so that a more accurate pre-integration value is obtained, the influence of IMU noise on the pre-integration value is further reduced, so that the pose may be optimized by using the more accurate pre-integration value, and the accuracy and precision of pose estimation are further improved.
The initial pre-integration value in the present embodiment may include a first initial pre-integration value, a second initial pre-integration value, and a third initial pre-integration value; wherein the first initial pre-integration value is an initial relative displacement variation between the current image frame and the previous image frame; the second initial pre-integration value is an initial relative speed change amount between the current image frame and the previous image frame; the third initial pre-integrated value is an initial relative rotation angle variation amount between the current image frame and the previous image frame. It should be noted that, since the present embodiment corrects only the initial translation vector and the initial velocity, and does not correct the initial rotation vector, the present embodiment may correct the first initial pre-integration value and the second initial pre-integration value based on the target translation vector and the target velocity obtained by the correction, so as to obtain a more accurate pre-integration value.
Exemplarily, S340 may include: correcting a first initial pre-integration value corresponding to the current image frame according to a current target translation vector corresponding to the current image frame, a previous target translation vector and a previous target speed corresponding to a previous image frame and a time interval between the current image frame and the previous image frame, and determining a first target pre-integration value corresponding to the current image frame; and correcting the second initial pre-integration value corresponding to the current image frame according to the current target speed and the previous target speed corresponding to the current image frame and the time interval between the current image frame and the previous image frame, and determining the second target pre-integration value corresponding to the current image frame.
Wherein the first target pre-integration value refers to the corrected first initial pre-integration value. The second target pre-integration value refers to a corrected second initial pre-integration value.
For example, the present embodiment may determine the first target pre-integration value corresponding to the current image frame based on the following formula:

α′_{j-1,j} = R_w^{j-1} ( p′_j − p′_{j-1} − v′_{j-1} · Δt + ½ g_w · Δt² )

wherein α′_{j-1,j} is the first target pre-integration value corresponding to the current image frame j; R_w^{j-1} is the first rotation matrix from the world coordinate system to the camera coordinate system corresponding to the previous image frame j-1; p′_j is the current target translation vector corresponding to the current image frame j; p′_{j-1} is the last target translation vector corresponding to the last image frame j-1 of the current image frame j; v′_{j-1} is the previous target speed corresponding to the previous image frame j-1; Δt is the time interval between the current image frame j and the previous image frame j-1; g_w is the gravitational acceleration in the world coordinate system.
For example, the present embodiment may determine the second target pre-integration value corresponding to the current image frame based on the following formula:

β′_{j-1,j} = R_w^{j-1} ( v′_j − v′_{j-1} + g_w · Δt )

wherein β′_{j-1,j} is the second target pre-integration value corresponding to the current image frame j; R_w^{j-1} is the first rotation matrix from the world coordinate system to the camera coordinate system corresponding to the previous image frame j-1; v′_j is the current target speed corresponding to the current image frame j; v′_{j-1} is the previous target speed corresponding to the previous image frame j-1; Δt is the time interval between the current image frame j and the previous image frame j-1; g_w is the gravitational acceleration in the world coordinate system.
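Both corrected pre-integration terms can be computed together, as in the following sketch (assuming a world frame in which gravity is (0, 0, 9.81); all names are illustrative placeholders):

```python
import numpy as np

# Assumed world-frame gravity convention
GRAVITY_W = np.array([0.0, 0.0, 9.81])

def corrected_preintegration(R_prev, p_curr_t, p_prev_t, v_curr_t, v_prev_t, dt):
    """Recompute the displacement (alpha) and velocity (beta) pre-integration
    terms between frames j-1 and j from the corrected target translation
    vectors and target velocities."""
    alpha = R_prev @ (p_curr_t - p_prev_t - v_prev_t * dt
                      + 0.5 * GRAVITY_W * dt ** 2)
    beta = R_prev @ (v_curr_t - v_prev_t + GRAVITY_W * dt)
    return alpha, beta
```

Only the displacement and velocity terms are recomputed here, matching the fact that the rotation term is left uncorrected.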
According to the technical scheme of the embodiment, the initial pre-integration value corresponding to the current image frame can be corrected according to the previous target translation vector and the previous target speed corresponding to the previous image frame and the current target translation vector and the current target speed corresponding to the current image frame, so that a more accurate pre-integration value can be obtained, the influence of IMU noise on the pre-integration value is further reduced, the pose can be optimized by using the more accurate pre-integration value, and the accuracy and precision of pose estimation are further improved.
On the basis of the above technical solution, after determining the target pre-integration value corresponding to the current image frame, the method further includes: optimizing the target translation vector and the initial rotation matrix in the initial pose by taking the target pre-integration value as constraint information, and determining the optimized camera pose.
Specifically, after the initial translation vector, the initial speed and the initial pre-integration value are corrected by using the image frame processing information, the corrected target translation vector and the corrected target speed can be used as initial values of corresponding parameters in a subsequent pose optimization process, and the corrected target pre-integration value is used as constraint information to optimize the pose, so that the optimized final pose is more accurate, and the accuracy and tracking precision of the pose estimation of the camera are improved.
The following is an embodiment of the device for correcting a camera pose according to an embodiment of the present invention, which belongs to the same inventive concept as the method for correcting a camera pose according to the above embodiments, and reference may be made to the above embodiment of the method for correcting a camera pose, for details that are not described in detail in the embodiment of the device for correcting a camera pose.
Example four
Fig. 6 is a schematic structural diagram of a correction apparatus for a camera pose according to a fourth embodiment of the present invention, where this embodiment is applicable to a case of correcting an initial translation vector determined based on IMU information, and the apparatus may specifically include: an initial information acquisition module 410 and an initial translation vector correction module 420.
The initial information acquisition module 410 is configured to acquire an initial pre-integration value and an initial translation vector in an initial pose corresponding to a current image frame, where the initial pre-integration value and the initial translation vector are determined by pre-integration according to information acquired by an IMU inertial measurement unit; and the initial translation vector correction module 420 is configured to calculate a total estimation error corresponding to the initial translation vector according to the image frame processing information of the camera, the initial pre-integration value and the initial translation vector, correct the initial translation vector based on the total estimation error, and determine a target translation vector corresponding to the current image frame.
Optionally, the apparatus further comprises: and the initial speed correcting module is used for correcting the direction of the initial speed according to the initial speed, the initial translation vector and the target translation vector of the current image frame after the target translation vector corresponding to the current image frame is determined, and determining the target speed corresponding to the current image frame.
Optionally, the apparatus further comprises: and the first initial pre-integral correction module is used for correcting an initial pre-integral value corresponding to the current image frame according to the current target translation vector corresponding to the current image frame and a previous target translation vector and a previous initial speed corresponding to a previous image frame after the target translation vector corresponding to the current image frame is determined, and determining a target pre-integral value corresponding to the current image frame.
Optionally, the apparatus further comprises: a second initial pre-integration correction module, configured to, after the target speed corresponding to the current image frame is determined, correct the initial pre-integration value corresponding to the current image frame according to the current target translation vector and the current target speed corresponding to the current image frame and the previous target translation vector and the previous target speed corresponding to the previous image frame, and determine the target pre-integration value corresponding to the current image frame.
Optionally, the initial translation vector correction module 420 is specifically configured to: calculating a visual error corresponding to the current image frame according to pixel coordinates of each target feature point in the current image frame, three-dimensional coordinates of each target feature point in the target image frame before the current image frame and a conversion matrix between the current image frame and the target image frame, wherein the conversion matrix is determined according to an initial translation vector and an initial rotation matrix corresponding to the current image frame; calculating an IMU error corresponding to the current image frame according to an initial translation vector corresponding to the current image frame, a time interval between the current image frame and a previous image frame, a previous target translation vector and a previous target speed corresponding to the previous image frame, and an initial pre-integration value corresponding to the current image frame; adding the visual error and the IMU error to determine a total estimation error corresponding to the initial translation vector; and minimizing the total estimation error by adjusting the size of the initial translation vector, and determining the initial translation vector corresponding to the minimum total estimation error as the target translation vector corresponding to the current image frame.
Optionally, the apparatus further comprises: and the target image frame determining module is used for determining a key image frame with a target feature point appearing for the first time in the sliding window as the target image frame corresponding to the target feature point for each target feature point before calculating the visual error corresponding to the current image frame, wherein the sliding window comprises a plurality of key image frames and the current image frame.
Optionally, the vision error corresponding to the current image frame is calculated based on the following formula:

r_proj = Σ_{k∈F} Σ_{i∈C} ρ( ‖ u_i^j − π( T_k^j · λ_i P_i ) ‖² )

wherein r_proj is the vision error corresponding to the current image frame j; u_i^j is the pixel coordinates of the i-th target feature point in the current image frame j; P_i is the normalized three-dimensional coordinates of the i-th target feature point in a target image frame k before the current image frame j; λ_i is the depth value corresponding to the i-th target feature point; T_k^j is the conversion matrix from the target image frame k to the current image frame j; π represents projecting the three-dimensional coordinates onto the two-dimensional plane of the current image frame; ρ is a Huber loss function; C is the set of all target feature points in the current image frame j; F is the set of each target image frame.
Optionally, the IMU error corresponding to the current image frame is calculated based on the following formula:

r_IMU = R_w^{j-1} ( p_j − p′_{j-1} − v′_{j-1} · Δt + ½ g_w · Δt² ) − α_{j-1,j}

wherein r_IMU is the IMU error corresponding to the current image frame j; R_w^{j-1} is the first rotation matrix from the world coordinate system to the camera coordinate system corresponding to the previous image frame j-1; p_j is the initial translation vector corresponding to the current image frame j, namely the displacement from the camera coordinate system corresponding to the current image frame j to the world coordinate system; p′_{j-1} is the previous target translation vector corresponding to the previous image frame j-1; Δt is the time interval between the current image frame j and the previous image frame j-1; v′_{j-1} is the previous target speed of the previous image frame in the world coordinate system; g_w is the gravitational acceleration in the world coordinate system; α_{j-1,j} is the initial pre-integration value between the previous image frame j-1 and the current image frame.
Optionally, the initial velocity correction module is specifically configured to: calculating a rotation vector and an included angle between the corresponding initial speed and the initial translation vector of the current image frame; calculating a second rotation matrix between the initial speed and the initial translation vector according to the rotation vector and the included angle; determining a target direction according to the second rotation matrix and a target translation vector corresponding to the current image frame; and determining the target speed corresponding to the current image frame according to the initial speed and the target direction.
Optionally, the initial pre-integration value comprises a first initial pre-integration value and a second initial pre-integration value; wherein the first initial pre-integration value is an initial relative displacement variation between the current image frame and the previous image frame; the second initial pre-integration value is an initial relative speed change amount between the current image frame and the previous image frame;
correspondingly, the second initial pre-integration correction module is specifically configured to: correcting a first initial pre-integration value corresponding to the current image frame according to a current target translation vector corresponding to the current image frame, a previous target translation vector and a previous target speed corresponding to a previous image frame and a time interval between the current image frame and the previous image frame, and determining a first target pre-integration value corresponding to the current image frame; and correcting the second initial pre-integration value corresponding to the current image frame according to the current target speed and the previous target speed corresponding to the current image frame and the time interval between the current image frame and the previous image frame, and determining the second target pre-integration value corresponding to the current image frame.
Alternatively, the first target pre-integration value corresponding to the current image frame is determined based on the following formula:

α′_{j-1,j} = R_w^{j-1} ( p′_j − p′_{j-1} − v′_{j-1} · Δt + ½ g_w · Δt² )

wherein α′_{j-1,j} is the first target pre-integration value corresponding to the current image frame j; R_w^{j-1} is the first rotation matrix from the world coordinate system to the camera coordinate system corresponding to the previous image frame j-1; p′_j is the current target translation vector corresponding to the current image frame j; p′_{j-1} is the last target translation vector corresponding to the last image frame j-1 of the current image frame j; v′_{j-1} is the previous target speed corresponding to the previous image frame j-1; Δt is the time interval between the current image frame j and the previous image frame j-1; g_w is the gravitational acceleration in the world coordinate system.
Alternatively, the second target pre-integration value corresponding to the current image frame is determined based on the following formula:

β′_{j-1,j} = R_w^{j-1} ( v′_j − v′_{j-1} + g_w · Δt )

wherein β′_{j-1,j} is the second target pre-integration value corresponding to the current image frame j; R_w^{j-1} is the first rotation matrix from the world coordinate system to the camera coordinate system corresponding to the previous image frame j-1; v′_j is the current target speed corresponding to the current image frame j; v′_{j-1} is the previous target speed corresponding to the previous image frame j-1; Δt is the time interval between the current image frame j and the previous image frame j-1; g_w is the gravitational acceleration in the world coordinate system.
Optionally, the apparatus further comprises: and the camera pose optimization module is used for optimizing the target translation vector and the initial rotation matrix in the initial pose by taking the target pre-integration value as constraint information after determining the target pre-integration value corresponding to the current image frame, and determining the optimized camera pose.
The camera pose correction device provided by the embodiment of the invention can execute the camera pose correction method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the camera pose correction method.
Example five
Fig. 7 is a schematic structural diagram of a system for correcting a camera pose according to a fifth embodiment of the present invention. Referring to fig. 7, the system includes: a pre-processing module 510, an initialization module 520, and a pose correction module 530.
The preprocessing module 510 is configured to perform detection processing on image information captured by a camera, determine image frame processing information, perform pre-integration on information acquired by an IMU inertial measurement unit, and determine an initial pre-integration value and an initial pose corresponding to each image frame, where the initial pose includes an initial translation vector; the initialization module 520 is configured to perform system initialization according to the image frame processing information, the initial pre-integration value, and the initial pose; the pose correction module 530 is used to implement a method of correcting the pose of a camera as provided by any of the embodiments of the invention.
The preprocessing module 510 may include an image frame processing unit and an IMU pre-integration unit, where the image frame processing unit is configured to perform detection processing on image information captured by the camera and determine image frame processing information. The IMU pre-integration unit is used for pre-integrating the information acquired by the IMU inertial measurement unit and determining an initial pre-integration value and an initial pose corresponding to each image frame.
The working process of the correction system for the camera pose provided by the embodiment is as follows: first, the preprocessing module 510 detects and processes image information captured by the camera to determine image frame processing information, performs pre-integration on information collected by the IMU inertial measurement unit to determine an initial pre-integration value and an initial pose corresponding to each image frame, and outputs the image frame processing information, the initial pre-integration value, and the initial pose to the initialization module 520. The initialization module 520 performs system initialization according to the output result of the preprocessing module 510, and performs visual inertial navigation alignment on the image frame processing information without scale and the IMU information with scale, thereby completing initialization of gyroscope bias, gravitational acceleration, scale and initial velocity. After the initialization is successful, the pose correction module 530 corrects at least one parameter of the initial translation vector, the initial velocity and the initial pre-integration value to reduce the influence of IMU noise, and optimizes the pose by using the corrected parameter to obtain a more accurate optimized camera pose.
It should be noted that, after the system is successfully initialized based on the information of the current image frame, the initialization module 520 does not need to perform initialization again when estimating the camera pose of the next image frame; initialization is performed again only if target tracking fails and relocalization is required.
When the correction system for the camera pose in this embodiment corrects the initial translation vector, the initial speed and the initial pre-integration value, only 2-3 ms of additional system running time is needed, while the accuracy of the system's camera pose estimation can be greatly improved.
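The workflow just described (preprocessing, one-time initialization, then per-frame correction) can be sketched as a simple pipeline skeleton; all class and method names here are illustrative placeholders, not the actual system interface.

```python
class PoseCorrectionPipeline:
    """Illustrative skeleton of the correction system's data flow."""

    def __init__(self, preprocessor, initializer, corrector):
        self.preprocessor = preprocessor  # image detection + IMU pre-integration
        self.initializer = initializer    # visual-inertial alignment
        self.corrector = corrector        # translation/velocity/pre-integration correction
        self.initialized = False

    def process_frame(self, image, imu_samples):
        frame_info, preint, init_pose = self.preprocessor.process(image, imu_samples)
        if not self.initialized:
            # Initialize gyroscope bias, gravity, scale and initial velocity once;
            # re-initialization happens only after tracking loss.
            self.initialized = self.initializer.initialize(frame_info, preint, init_pose)
            return init_pose
        # Correct the initial translation vector, velocity and pre-integration
        # value, then hand the corrected quantities to the downstream optimizer.
        return self.corrector.correct(frame_info, preint, init_pose)
```

The three dependencies are duck-typed, so the same skeleton covers both the SLAM and VIO variants of the pose correction module.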
For example, for a SLAM system, the pose correction module 530 may include a pose correction unit, a velocity correction unit, a pre-integration value correction unit, a sliding window optimization unit, and a global pose optimization unit. The pose correction unit is used for correcting the initial translation vector determined based on the IMU information according to image frame processing information of the camera; the velocity correction unit is used for correcting the initial velocity determined based on the IMU information according to the corrected initial translation vector; the pre-integration value correction unit is used for correcting the initial pre-integration value according to the corrected initial translation vector and the corrected initial velocity; and the sliding window optimization unit is used for optimizing the camera pose and performing triangulation and marginalization according to the corrected initial translation vector, initial velocity and initial pre-integration value to obtain a more accurate pose and other state quantities. The global pose optimization unit is used for optimizing the pose over four degrees of freedom to obtain a globally consistent pose estimation. The global pose optimization unit may further comprise a loop detection subunit, which is used for detecting whether the camera has returned to a previous position and providing the detected information to the global pose optimization unit for the optimization processing of the camera pose.
When the SLAM system is initialized, if there is no local map, the initialization module 520 may obtain some initial 3D points for optimizing the estimated pose, triangulate the newly detected 2D points into 3D points according to the optimized pose (triangulation estimates a depth and constructs the 3D point using this depth), and update them into the local map. For example, in an unmodified SLAM system (the pose correction module 530 contains only a sliding window optimization unit and a global pose optimization unit), if the system has no local map at initialization, the preprocessing module 510 obtains the initial poses, and the initialization module 520 obtains some initial 3D points, which are used to optimize the initial poses in the sliding window optimization unit and to update the 3D points of the local map. In the improved SLAM system incorporating the correction units, however, the preprocessing module 510 obtains the initial pose at system initialization, and the initialization module 520 obtains some initial 3D points, which are used on the one hand to correct the pre-integration value and initial pose in the correction units, and on the other hand to further optimize the pose in the sliding window optimization unit and update the 3D points of the local map.
For example, for a VIO system, the pose correction module 530 may include a pose correction unit, a velocity correction unit, a pre-integration value correction unit, and a pose optimization unit. The pose correction unit is used for correcting the initial translation vector determined based on the IMU information according to image frame processing information of the camera; the velocity correction unit is used for correcting the initial velocity determined based on the IMU information according to the corrected initial translation vector; the pre-integration value correction unit is used for correcting the initial pre-integration value according to the corrected initial translation vector and the corrected initial velocity; and the pose optimization unit is used for optimizing the camera pose according to the corrected initial translation vector, initial velocity and initial pre-integration value to obtain a more accurate pose and other state quantities. The pose optimization unit can be, but is not limited to, a sliding window optimization unit or a filtering optimization unit. On the one hand, the pose optimization unit can perform pose optimization and output a pose according to the correction result; on the other hand, it can triangulate the newly detected 2D points into 3D points according to the optimized camera pose and update them into a local map.
According to the system for correcting the camera pose, at least one parameter of the initial translation vector, the initial speed and the initial pre-integration value corresponding to the image frame is corrected based on the visual information, so that the influence of IMU noise can be reduced, the pose is optimized by using the corrected parameter, and the optimized more accurate camera pose is obtained.
Example six
Fig. 8 is a schematic structural diagram of an apparatus according to a sixth embodiment of the present invention. Referring to fig. 8, the apparatus includes:
one or more processors 810;
a memory 820 for storing one or more programs;
when the one or more programs are executed by the one or more processors 810, the one or more processors 810 implement the method for correcting the pose of the camera as provided in any of the above embodiments, the method comprising:
acquiring an initial pre-integration value and an initial translation vector in an initial pose corresponding to a current image frame of a camera, wherein the initial pre-integration value and the initial translation vector are determined by pre-integration based on information collected by an Inertial Measurement Unit (IMU);
and calculating a total estimation error corresponding to the initial translation vector according to image frame processing information of the camera, the initial pre-integration value, and the initial translation vector; correcting the initial translation vector based on the total estimation error; and determining a target translation vector corresponding to the current image frame.
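The two method steps can be sketched as follows. The patent does not fix a concrete error model at this point, so the fragment below assumes the total estimation error is a weighted combination of the discrepancies between the initial translation vector and (a) a vision-derived translation obtained from image frame processing and (b) the displacement given by the first pre-integration value; the function name and the weights are hypothetical.

```python
import numpy as np

def correct_translation(t_init, t_visual, dp_imu, w_visual=0.7, w_imu=0.3):
    """Hypothetical sketch of the total-estimation-error correction.

    t_init:   initial translation vector from IMU pre-integration.
    t_visual: translation estimated from image frame processing information.
    dp_imu:   displacement implied by the first (initial) pre-integration value.
    Returns the target translation vector for the current image frame.
    """
    err_visual = t_visual - t_init   # discrepancy against visual information
    err_imu = dp_imu - t_init        # discrepancy against the pre-integration value
    total_error = w_visual * err_visual + w_imu * err_imu
    return t_init + total_error      # corrected (target) translation vector
```

With weights summing to one, the corrected vector is simply the weighted average of the two information sources, which is the least-squares minimizer of the modeled total error.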
One processor 810 is shown in fig. 8 as an example. The processor 810 and the memory 820 in the device may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 8.
As a computer-readable storage medium, the memory 820 may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for correcting the camera pose in the embodiments of the present invention (for example, the initial information acquisition module 410 and the initial translation vector correction module 420 in the apparatus for correcting the camera pose). The processor 810 executes the software programs, instructions, and modules stored in the memory 820, thereby performing the various functional applications and data processing of the device, that is, implementing the above-described method for correcting the camera pose.
The memory 820 mainly includes a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the device, and the like. Further, the memory 820 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 820 may further include memory located remotely from the processor 810, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The device proposed in this embodiment belongs to the same inventive concept as the method for correcting the camera pose proposed in the above embodiments; technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same beneficial effects as the method for correcting the camera pose.
Example seven
The seventh embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements a method of correcting a camera pose according to any embodiment of the present invention, the method comprising:
acquiring an initial pre-integration value and an initial translation vector in an initial pose corresponding to a current image frame of a camera, wherein the initial pre-integration value and the initial translation vector are determined by pre-integration based on information collected by an Inertial Measurement Unit (IMU);
and calculating a total estimation error corresponding to the initial translation vector according to image frame processing information of the camera, the initial pre-integration value, and the initial translation vector; correcting the initial translation vector based on the total estimation error; and determining a target translation vector corresponding to the current image frame.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; alternatively, they may be separately fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.