CN111354043A - Three-dimensional attitude estimation method and device based on multi-sensor fusion - Google Patents

Three-dimensional attitude estimation method and device based on multi-sensor fusion

Info

Publication number
CN111354043A
CN111354043A (application CN202010108437.2A)
Authority
CN
China
Prior art keywords
objective function
depth image
point
imu
measured object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010108437.2A
Other languages
Chinese (zh)
Inventor
蔡国榕
吴则彪
陈桂锟
沈启金
王宗跃
吴云东
徐焕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jimei University
Original Assignee
Jimei University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jimei University filed Critical Jimei University
Priority to CN202010108437.2A priority Critical patent/CN111354043A/en
Publication of CN111354043A publication Critical patent/CN111354043A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/269Analysis of motion using gradient-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/337Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds

Abstract

The invention provides a three-dimensional attitude estimation method and device based on multi-sensor fusion. The method comprises the following steps: acquiring an RGB image of a measured object, extracting FAST feature points and tracking the feature points; collecting IMU measurements of the measured object and integrating them to obtain the position, velocity and rotation of the measured object at the current moment; acquiring a depth image of the measured object and processing the depth image to obtain the iterative closest point; constructing an objective function based on the prior information, the iterative closest point, the IMU measurements and the visual reprojection; and solving the objective function for its minimum by an iterative method to obtain the optimal pose of the object. The method performs attitude estimation based on multiple sensors, constructs an objective function for attitude estimation and solves it with a least-squares method; the inputs of the function are fed through a sliding window of a fixed size, which reduces the solving time and improves the pose estimation accuracy.

Description

Three-dimensional attitude estimation method and device based on multi-sensor fusion
Technical Field
The invention relates to the technical field of three-dimensional reconstruction, in particular to a three-dimensional attitude estimation method and device based on multi-sensor fusion.
Background
The attitude estimation problem is that of determining the position and orientation of a three-dimensional target object. Attitude estimation has applications in many areas, such as robot vision, motion tracking and single-camera calibration. The sensors used for attitude estimation differ from field to field.
Model-based methods typically estimate the attitude using the geometric relationships of an object or of its feature points. The basic idea is to represent the structure and shape of the object with a geometric model, to establish a correspondence between the model and the image by extracting certain object features, and then to estimate the spatial attitude of the object by geometric or other methods. The model used here may be a simple geometric form, such as a plane, a cylinder or another geometric body, or a three-dimensional model obtained by laser scanning or other means. A model-based attitude estimation method compares the real image with a synthetic image to compute their similarity and update the attitude of the object. To avoid an optimization search over the global state space, existing model-based methods generally reduce the optimization problem to a matching problem over several local features, and therefore depend on accurate detection of those local features. When the noise is large and accurate local features cannot be extracted, the robustness of such methods suffers greatly.
Learning-based methods learn the correspondence between two-dimensional observations and three-dimensional attitudes from training samples acquired in advance under different attitudes by means of machine learning, apply the learned decision rule or regression function to a sample, and take the result as the attitude estimate of that sample. Learning-based methods generally use global observation features, do not need to detect or identify local features of the object, and are therefore more robust. Their disadvantage is that the precision and continuity of the attitude estimate cannot be guaranteed, because the dense sampling required for continuous estimation in a high-dimensional space cannot be obtained.
Disclosure of Invention
The present invention provides the following technical solutions to overcome the above-mentioned drawbacks in the prior art.
A three-dimensional attitude estimation method based on multi-sensor fusion comprises the following steps:
an RGB image acquisition step, wherein an image sensor is used for acquiring an RGB image of a measured object, FAST characteristic points of the RGB image are extracted, and a KLT sparse optical flow method is used for tracking the characteristic points;
an IMU measurement value acquisition step, namely acquiring an IMU measurement value of a measured object by using an IMU sensor, and integrating the IMU measurement value to obtain the position, the speed and the rotation of the measured object at the current moment;
a depth image obtaining step, namely obtaining a depth image of the measured object by using a depth image sensor, and processing the depth image by using an Iterative Closest Point (ICP) algorithm to obtain an iterative closest point;
an objective function construction step, namely constructing an objective function based on prior information, an iteration closest point, an IMU (inertial measurement Unit) measurement value and visual reprojection;
and a pose estimation step, namely solving the minimum value of the objective function by using an iterative method to obtain the optimal pose of the object.
Still further, the method comprises: a loop detection step of detecting whether a loop occurs for the current frame and, if so, performing global pose graph optimization to correct the positioning trajectory of the object so as to eliminate accumulated errors.
Further, the objective function is:

min(R,t){ PP + ∑||D||² + ∑||I||² + ∑||V||² };

the most suitable motion parameters, namely the rotation parameter R and the displacement parameter t, are obtained by the least squares method by making the sum of the four residual terms PP, ∑||D||², ∑||I||² and ∑||V||² the lowest, where PP represents the prior distribution of the prior information, D represents the iterative closest point residual term, I represents the difference in the amounts of change of position, velocity, rotation, gyroscope bias and acceleration bias between two frames of the object, and V represents the reprojection error.
Further, the least-squares problem is solved based on a sliding window: the input parameters required by the least-squares calculation are contained in the sliding window, and every error term of the least-squares problem except the prior information is provided by values within the sliding window.
Further, the key frame in the RGB image is determined based on the following conditions:
the difference between the average parallax of the frame and that of the previous keyframe exceeds a first threshold;
alternatively, the number of feature points tracked by the frame is less than a second threshold.
Further, the spatial geometry is derived from the depth image to estimate the transformation matrix between two frames, and the Iterative Closest Point (ICP) residual term D uses the point-to-plane distance:

D = (T·P_j − P_i)·n_i

where T is the rigid transformation composed of the rotation parameter R and the displacement parameter t, i.e. the motion parameters between two frames, P_j and P_i are matched three-dimensional points in the depth images, and n_i is the corresponding normal vector of the previous frame;

the reprojection error V serves as a visual constraint and is defined on the tangent plane of the unit sphere:

V = [b_1, b_2]ᵀ·(P̂ − P/||P||)

where P̂ is the predicted unit vector of the three-dimensional point after the rigid-body transformation of the feature point, P is the three-dimensional point obtained by back-projecting the pixel coordinates of the feature point through the pinhole camera model, and b_1 and b_2 are any two orthogonal bases of the tangent plane corresponding to P.
The invention also provides a three-dimensional attitude estimation device based on multi-sensor fusion, which comprises:
an RGB image acquisition unit, which is used for acquiring an RGB image of a measured object by using an image sensor, extracting FAST characteristic points of the RGB image and tracking the characteristic points by using a KLT sparse optical flow method;
the IMU measurement value acquisition unit is used for acquiring an IMU measurement value of the measured object by using an IMU sensor and integrating the IMU measurement value to obtain the position, the speed and the rotation of the measured object at the current moment;
the depth image acquisition unit is used for acquiring a depth image of the measured object by using a depth image sensor and processing the depth image by using an Iterative Closest Point (ICP) algorithm to obtain an iterative closest point;
the objective function construction unit is used for constructing an objective function based on the prior information, the iteration closest point, the IMU measurement value and the visual re-projection;
and the pose estimation unit is used for solving the minimum value of the objective function by using an iterative method to obtain the optimal pose of the object.
Furthermore, the device comprises a loop detection unit for detecting whether a loop occurs for the current frame; if so, global pose graph optimization is performed to correct the positioning trajectory of the object so as to eliminate accumulated errors.
Further, the objective function is:

min(R,t){ PP + ∑||D||² + ∑||I||² + ∑||V||² };

the most suitable motion parameters, namely the rotation parameter R and the displacement parameter t, are obtained by the least squares method by making the sum of the four residual terms PP, ∑||D||², ∑||I||² and ∑||V||² the lowest, where PP represents the prior distribution of the prior information, D represents the iterative closest point residual term, I represents the difference in the amounts of change of position, velocity, rotation, gyroscope bias and acceleration bias between two frames of the object, and V represents the reprojection error.
Further, the least-squares problem is solved based on a sliding window: the input parameters required by the least-squares calculation are contained in the sliding window, and every error term of the least-squares problem except the prior information is provided by values within the sliding window.
Further, the key frame in the RGB image is determined based on the following conditions:
the difference between the average parallax of the frame and that of the previous keyframe exceeds a first threshold;
alternatively, the number of feature points tracked by the frame is less than a second threshold.
Further, the spatial geometry is derived from the depth image to estimate the transformation matrix between two frames, and the Iterative Closest Point (ICP) residual term D uses the point-to-plane distance:

D = (T·P_j − P_i)·n_i

where T is the rigid transformation composed of the rotation parameter R and the displacement parameter t, i.e. the motion parameters between two frames, P_j and P_i are matched three-dimensional points in the depth images, and n_i is the corresponding normal vector of the previous frame;

the reprojection error V serves as a visual constraint and is defined on the tangent plane of the unit sphere:

V = [b_1, b_2]ᵀ·(P̂ − P/||P||)

where P̂ is the predicted unit vector of the three-dimensional point after the rigid-body transformation of the feature point, P is the three-dimensional point obtained by back-projecting the pixel coordinates of the feature point through the pinhole camera model, and b_1 and b_2 are any two orthogonal bases of the tangent plane corresponding to P.
The invention has the technical effects that: the invention discloses a three-dimensional attitude estimation method based on multi-sensor fusion, comprising: acquiring an RGB image of a measured object with an image sensor, extracting FAST (Features from Accelerated Segment Test) feature points of the RGB image and tracking the feature points with the KLT sparse optical flow method; collecting IMU measurements of the measured object with an IMU sensor and integrating them to obtain the position, velocity and rotation of the measured object at the current moment; acquiring a depth image of the measured object with a depth image sensor and processing the depth image with the Iterative Closest Point (ICP) algorithm to obtain the iterative closest point; constructing an objective function based on the prior information, the iterative closest point, the IMU measurements and the visual reprojection; and solving the objective function for its minimum by an iterative method to obtain the optimal pose of the object. The method performs attitude estimation based on the RGB images, IMU measurements and depth images of the moving object acquired by multiple sensors, constructs an objective function for attitude estimation and solves it with a least-squares method; the inputs of the function are fed through a sliding window whose size is fixed, which reduces the solving time and improves the pose estimation accuracy. A loop detection process is also designed: when a loop occurs, global pose graph optimization is performed to correct the positioning trajectory of the object and eliminate accumulated errors. Corresponding residual functions are also specifically designed.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.
Fig. 1 is a flowchart of a three-dimensional attitude estimation method based on multi-sensor fusion according to an embodiment of the present invention.
Fig. 2 is a block diagram of a three-dimensional attitude estimation apparatus based on multi-sensor fusion according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows a three-dimensional attitude estimation method based on multi-sensor fusion, which comprises the following steps:
an RGB image acquisition step S101 of acquiring an RGB image of a measurement object using an image sensor, extracting FAST feature points of the RGB image, and tracking the feature points using a KLT sparse optical flow method; the image sensor can be a camera, a video camera and the like, and can acquire a plurality of frames of images of the measured object.
Every time a new RGB image arrives, the system automatically extracts FAST feature points and then uses the KLT sparse optical flow method to track them. To ensure tracking accuracy while keeping the computational cost relatively small, the number of tracked feature points can be set in the range of 100 to 300, and new feature points are detected during tracking so that the number of tracked feature points always stays within this range.
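As an illustration only, a minimal sketch of this front end using OpenCV is given below; the detector threshold, optical-flow window size and the 100-300 feature budget are assumptions chosen to match the description, not a prescribed configuration.

```python
# Sketch of FAST corner extraction and KLT sparse optical flow tracking with OpenCV,
# assuming grayscale frames `prev_gray` / `curr_gray` as numpy arrays.
import cv2
import numpy as np

MAX_FEATURES = 300      # upper bound on tracked features (100-300 per the description)
MIN_FEATURES = 100      # replenish with new detections below this count

fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
lk_params = dict(winSize=(21, 21), maxLevel=3,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

def detect_fast(gray, max_pts=MAX_FEATURES):
    # Keep the strongest FAST responses, shaped (N, 1, 2) as OpenCV expects
    kps = sorted(fast.detect(gray, None), key=lambda k: k.response, reverse=True)[:max_pts]
    return np.float32([k.pt for k in kps]).reshape(-1, 1, 2)

def track(prev_gray, curr_gray, prev_pts):
    # KLT sparse optical flow; keep only the points tracked successfully
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, None, **lk_params)
    good = status.reshape(-1) == 1
    matched_prev, matched_curr = prev_pts[good], curr_pts[good]
    # top up with fresh FAST detections so the track count stays in the 100-300 range
    tracked = matched_curr
    if len(tracked) < MIN_FEATURES:
        fresh = detect_fast(curr_gray, MAX_FEATURES - len(tracked))
        if len(fresh):
            tracked = np.vstack([tracked, fresh])
    return matched_prev, matched_curr, tracked
```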
An IMU measurement value acquisition step S102, wherein an IMU sensor is used to collect IMU measurements of the measured object, and the IMU measurements are integrated to obtain the position, velocity and rotation of the measured object at the current moment; the IMU sensor is also called an inertial sensor, and the position, velocity and rotation at the current moment can be obtained by integrating the discrete IMU measurements, so that the motion of the object carrying the sensor can be estimated.
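A minimal sketch of this integration step is shown below, assuming the accelerometer measures specific force in the body frame, that biases and noise are ignored, and that the gravity constant and the sample list are supplied by the caller.

```python
# Hedged sketch of integrating discrete IMU samples between two image timestamps
# to propagate position p, velocity v and rotation R.
import numpy as np
from scipy.spatial.transform import Rotation as Rot

GRAVITY = np.array([0.0, 0.0, -9.81])   # assumed world-frame gravity vector

def propagate(p, v, R, imu_samples):
    """imu_samples: list of (dt, acc_body, gyro_body) tuples between two frames."""
    for dt, acc, gyro in imu_samples:
        acc_world = R.apply(np.asarray(acc)) + GRAVITY   # body acceleration into world frame
        p = p + v * dt + 0.5 * acc_world * dt ** 2       # position update
        v = v + acc_world * dt                           # velocity update
        R = R * Rot.from_rotvec(np.asarray(gyro) * dt)   # rotation update via exponential map
    return p, v, R
```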
A depth image obtaining step S103, wherein a depth image sensor is used to obtain a depth image of the measured object, and the depth image is processed with the Iterative Closest Point (ICP) algorithm to obtain the iterative closest point; the depth image sensor can be a depth camera or a depth video camera, and the geometric structure of the measured object's space can be obtained from the depth image, so that the transformation matrix between two frames can be estimated.
In addition, the system estimates the real scale of the scene through the depth image. The depth image provides the real distance of each pixel relative to the camera optical center, so that pixel coordinates can be transformed to space coordinates through the pinhole camera model and the real scale of the scene can be recovered.
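A short sketch of this pixel-to-space transformation through the pinhole model follows, assuming a metric depth map and known intrinsics fx, fy, cx, cy (names chosen for illustration).

```python
# Back-project a depth image to metric 3-D points with a pinhole camera model.
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """depth: (H, W) array of metric depths; returns an (N, 3) point cloud of valid pixels."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx          # X = (u - cx) * Z / fx
    y = (v - cy) * z / fy          # Y = (v - cy) * Z / fy
    points = np.stack([x, y, z], axis=-1)
    return points[z > 0]           # keep pixels with valid depth only
```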
Although the acquisition frequency of the IMU measurements in S102 is higher than that of the depth images in S103, the IMU measurements accumulated within the time difference between two adjacent depth-image frames are generally not sufficient for integration, because the IMU data and the image data need to be aligned. The invention therefore accumulates IMU measurements over the acquisition interval of four consecutive image frames, so as to avoid fluctuations in the integration result caused by insufficient IMU data. Since processing depth images that are four frames apart yields an iterative closest point result with a larger error, the invention retains the iterative closest point results of adjacent frames and combines the results of four consecutive frames as the iterative closest point result over those four frames.
An objective function construction step S104, constructing an objective function based on the prior information, the iterative closest point, the IMU measurements and the visual reprojection. Each sensor has environments to which it is not suited, and the environment during motion cannot be controlled manually; therefore, to improve robustness across multiple scenes and obtain more accurate positioning, the invention fuses data from multiple sensors, improving the accuracy and robustness of the pose estimation.
The invention comprehensively uses three kinds of sensor information, namely the RGB image, the depth image and the IMU measurements, to estimate the optimal attitude between two frames in a tightly-coupled manner, and solves the resulting least-squares problem iteratively.
Specifically, the objective function is:

min(R,t){ PP + ∑||D||² + ∑||I||² + ∑||V||² };

the most suitable motion parameters, namely the rotation parameter R and the displacement parameter t, are obtained by the least squares method by making the sum of the four residual terms PP, ∑||D||², ∑||I||² and ∑||V||² the lowest, where PP represents the prior distribution of the prior information, D represents the iterative closest point residual term, I represents the difference in the amounts of change of position, velocity, rotation, gyroscope bias and acceleration bias between two frames of the object, and V represents the reprojection error; that is, PP, D, I and V may also be referred to as the four constraints, i.e. the four residual terms, of the objective function.
The states outside the sliding window can also provide certain constraint information. To obtain a more faithful constraint relation, these states are also taken into account (but not optimized), and the system uses a marginalization technique to convert their constraint information into a prior distribution over the variables to be optimized, namely the term PP in the least-squares problem.
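For illustration, the four residual blocks can be stacked into a single least-squares problem over the sliding-window states and handed to a generic solver, as in the sketch below; the residual callbacks, state packing and iteration cap are assumptions, not the patented solver.

```python
# Sketch of stacking the prior PP, ICP D, IMU I and reprojection V residual blocks
# into one vector-valued residual over the flattened window state x.
import numpy as np
from scipy.optimize import least_squares

def build_residual(prior_residual, icp_residuals, imu_residuals, vis_residuals):
    """Each argument is a callable (or list of callables) mapping the state vector x
    to a residual array; all callbacks here are hypothetical placeholders."""
    def residual(x):
        blocks = [np.atleast_1d(prior_residual(x))]
        blocks += [np.atleast_1d(r(x)) for r in icp_residuals]
        blocks += [np.atleast_1d(r(x)) for r in imu_residuals]
        blocks += [np.atleast_1d(r(x)) for r in vis_residuals]
        return np.concatenate(blocks)
    return residual

# Example usage (all names hypothetical):
# fun = build_residual(pp_term, icp_terms, imu_terms, vis_terms)
# sol = least_squares(fun, x0, max_nfev=8)  # x0 from IMU propagation; small cap for real time
```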
For the spatial geometry derived from the depth images to estimate the transformation matrix between two frames, the Iterative Closest Point (ICP) residual term D uses the point-to-plane distance:

D = (T·P_j − P_i)·n_i

where T is the rigid transformation composed of the rotation parameter R and the displacement parameter t, i.e. the motion parameters between two frames, P_j and P_i are matched three-dimensional points in the depth images, and n_i is the corresponding normal vector of the previous frame.
According to the calculation characteristics of the iteration closest point, the system adopts the GPU to improve the calculation efficiency of the iteration closest point. Because the depth image similarity between two frames is high, the number of iterations is limited to 5 times at most in order to avoid unnecessary iterations.
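A sketch of evaluating the point-to-plane residual for correspondences that have already been associated by ICP is given below, assuming the rotation R and translation t are supplied as a 3x3 matrix and a 3-vector; the array layout is an assumption.

```python
# Point-to-plane residuals for matched depth points.
import numpy as np

def point_to_plane_residuals(R, t, pts_j, pts_i, normals_i):
    """R: (3,3) rotation, t: (3,) translation, pts_j/pts_i: (N,3) matched points,
    normals_i: (N,3) normals of the target frame."""
    transformed = pts_j @ R.T + t                              # apply T = [R | t] to source points
    return np.sum((transformed - pts_i) * normals_i, axis=1)   # signed point-to-plane distances
```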
For the IMU constraint I, given two frames of IMU measurements (acceleration and angular velocity), IMU pre-integration can compute position, velocity, and rotation. The system defines the IMU error term as the difference in the amount of change in position, velocity, rotation, gyroscope bias, and acceleration bias between two frames.
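A hedged sketch of such an IMU error term follows, assuming pre-integrated quantities dp, dv, dR between frames k and k+1 have already been computed; the state layout and sign conventions are assumptions.

```python
# IMU residual as the difference between the pre-integrated state change and the change
# implied by the optimised states of frames k and k+1 (biases modelled as random walks).
import numpy as np

def imu_residual(state_k, state_k1, preint, gravity, dt):
    """Each state: dict with p, v (arrays), R (scipy Rotation), bg, ba.
    preint: dict with dp, dv (arrays) and dR (scipy Rotation)."""
    Rk_inv = state_k["R"].inv()
    r_p = Rk_inv.apply(state_k1["p"] - state_k["p"] - state_k["v"] * dt
                       - 0.5 * gravity * dt ** 2) - preint["dp"]
    r_v = Rk_inv.apply(state_k1["v"] - state_k["v"] - gravity * dt) - preint["dv"]
    r_R = (preint["dR"].inv() * (Rk_inv * state_k1["R"])).as_rotvec()
    r_bg = state_k1["bg"] - state_k["bg"]   # gyroscope bias change
    r_ba = state_k1["ba"] - state_k["ba"]   # accelerometer bias change
    return np.concatenate([r_p, r_v, r_R, r_bg, r_ba])
```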
For the reprojection error V, as a visual constraint, it is defined on the tangent plane of the unit sphere:

V = [b_1, b_2]ᵀ·(P̂ − P/||P||)

where P̂ is the predicted unit vector of the three-dimensional point after the rigid-body transformation of the feature point, P is the three-dimensional point obtained by back-projecting the pixel coordinates of the feature point through the pinhole camera model, and b_1 and b_2 are any two orthogonal bases of the tangent plane corresponding to P. For projection and back-projection, π(·) is the projection function derived from the pinhole camera model, mapping a three-dimensional space point to a two-dimensional image point, defined as:

π([X, Y, Z]ᵀ) = [f_x·X/Z + c_x, f_y·Y/Z + c_y]ᵀ

where (f_x, f_y) and (c_x, c_y) are the camera intrinsic parameters, which can be obtained by a checkerboard-based camera intrinsic calibration method. The inverse operation of the projection is the back-projection.
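The projection and the tangent-plane residual can be sketched as follows; the choice of tangent basis b1, b2 is arbitrary (any orthonormal pair spanning the tangent plane works), and the helper names are assumptions.

```python
# Pinhole projection and a reprojection residual expressed on the tangent plane of the unit sphere.
import numpy as np

def project(P, fx, fy, cx, cy):
    """Pinhole projection of a 3-D point P = (X, Y, Z) to pixel coordinates."""
    X, Y, Z = P
    return np.array([fx * X / Z + cx, fy * Y / Z + cy])

def tangent_basis(u):
    """Two orthonormal vectors spanning the plane orthogonal to unit vector u."""
    a = np.array([1.0, 0.0, 0.0]) if abs(u[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    b1 = np.cross(u, a)
    b1 /= np.linalg.norm(b1)
    b2 = np.cross(u, b1)
    return b1, b2

def reprojection_residual(P_pred, P_obs):
    """P_pred: 3-D point predicted after the rigid transform; P_obs: back-projected observation."""
    u_pred = P_pred / np.linalg.norm(P_pred)
    u_obs = P_obs / np.linalg.norm(P_obs)
    b1, b2 = tangent_basis(u_obs)
    diff = u_pred - u_obs
    return np.array([b1 @ diff, b2 @ diff])   # 2-D residual on the tangent plane
```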
And a pose estimation step S105, solving the minimum value of the objective function by using an iterative method to obtain the optimal pose of the object.
All residual terms are added together to form the objective function, which is minimized with an iterative method to obtain the optimal pose. The direction of each iteration is determined by the first derivative of the objective function. To ensure the real-time performance of the system, the maximum number of iterations for solving the pose is set to 8.
In one embodiment, before the method is carried out, initial values are provided for it by roughly estimating the states on which the system depends (gravity vector, gyroscope bias, velocity, etc.) through a sliding-window-based structure from motion (SfM) module and a visual-inertial correction module.
In one embodiment, the method further comprises: a loop detection step S106, detecting whether a loop occurs for the current frame and, if so, performing global pose graph optimization to correct the positioning trajectory of the object so as to eliminate accumulated errors. The method uses loop detection based on a bag-of-words model: if the current frame is determined to be a keyframe, the system converts the RGB image into a vector according to the dictionary and submits the vector to loop detection, which judges whether a loop has occurred for the current frame and returns the result. If a loop is detected, global pose graph (Pose-Graph) optimization is performed to correct the positioning trajectory and eliminate accumulated errors.
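A simplified illustration of the bag-of-words check is given below, assuming a pre-trained visual vocabulary given as an array of descriptor centroids rather than a full DBoW-style dictionary; the similarity threshold is an assumption.

```python
# Simplified bag-of-words loop check over keyframe descriptor histograms.
import numpy as np

def bow_vector(descriptors, vocab):
    """Normalised histogram of nearest vocabulary words for a frame's feature descriptors.
    descriptors: (N, d) float array; vocab: (K, d) array of word centroids."""
    dists = np.linalg.norm(descriptors[:, None, :] - vocab[None, :, :], axis=2)
    words = np.argmin(dists, axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)

def detect_loop(current_vec, keyframe_vecs, threshold=0.8):
    """Return the index of the best-matching past keyframe if similarity exceeds the threshold."""
    if not keyframe_vecs:
        return None
    sims = np.array([current_vec @ v for v in keyframe_vecs])
    best = int(np.argmax(sims))
    return best if sims[best] > threshold else None
```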
In one embodiment, the least-squares problem is solved based on a sliding window that contains the input parameters required for the least-squares calculation, and every error term except the prior information is provided by values within the sliding window. That is, the invention implements the least-squares problem in a sliding-window form: all the inputs required by the least squares (pixel points, three-dimensional space points, states and the like) are contained in the sliding window, and each error term (except the prior information) is provided by values in the sliding window. To limit the solving time of the least-squares problem, setting the size of the sliding window to 10 is sufficient to derive an accurate pose (attitude) estimate between two frames.
In one embodiment, the keyframes in the RGB image are determined based on the following conditions: the average parallax of the frame differs from that of the previous keyframe by more than a first threshold (for example, setting the first threshold to 1.0 gives good positioning performance in most scenes); or the number of feature points tracked by the frame is less than a second threshold (for example, setting the second threshold to 10 ensures high robustness of the system without generating too many keyframes).
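A small sketch of these two keyframe tests follows, using the example threshold values mentioned above (mean parallax 1.0, minimum of 10 tracked features); the input layout is an assumption.

```python
# Keyframe decision based on mean parallax against the last keyframe and tracked-feature count.
import numpy as np

PARALLAX_THRESHOLD = 1.0     # example first threshold from the description
MIN_TRACKED_FEATURES = 10    # example second threshold from the description

def is_keyframe(prev_kf_pts, curr_pts, num_tracked):
    """prev_kf_pts / curr_pts: (N, 2) matched pixel coordinates against the last keyframe."""
    if num_tracked < MIN_TRACKED_FEATURES:
        return True
    mean_parallax = np.mean(np.linalg.norm(curr_pts - prev_kf_pts, axis=1))
    return mean_parallax > PARALLAX_THRESHOLD
```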
The method performs attitude estimation based on the RGB images, IMU measurements and depth images of the moving object acquired by multiple sensors, constructs an objective function for attitude estimation and solves it with a least-squares method; the inputs of the function are fed through a sliding window whose size is fixed, which reduces the solving time and improves the pose estimation accuracy. A loop detection process is also designed: when a loop occurs, global pose graph optimization is performed to correct the positioning trajectory of the object and eliminate accumulated errors. Corresponding residual functions are also specifically designed.
Fig. 2 shows a three-dimensional attitude estimation device based on multi-sensor fusion of the present invention, which comprises:
an RGB image acquisition unit 201 for acquiring an RGB image of a measured object using an image sensor, extracting FAST feature points of the RGB image, and tracking the feature points using a KLT sparse optical flow method; the image sensor can be a camera, a video camera and the like, and can acquire a plurality of frames of images of the measured object.
Every time a new RGB image arrives, the system automatically extracts FAST feature points and then uses the KLT sparse optical flow method to track them. To ensure tracking accuracy while keeping the computational cost relatively small, the number of tracked feature points can be set in the range of 100 to 300, and new feature points are detected during tracking so that the number of tracked feature points always stays within this range.
An IMU measurement value acquisition unit 202, configured to collect IMU measurements of the measured object using an IMU sensor and integrate them to obtain the position, velocity and rotation of the measured object at the current moment; the IMU sensor is also called an inertial sensor, and the position, velocity and rotation at the current moment can be obtained by integrating the discrete IMU measurements, so that the motion of the object carrying the sensor can be estimated.
A depth image obtaining unit 203, configured to obtain a depth image of the measured object using a depth image sensor and process the depth image with the Iterative Closest Point (ICP) algorithm to obtain the iterative closest point; the depth image sensor can be a depth camera or a depth video camera, and the geometric structure of the measured object's space can be obtained from the depth image, so that the transformation matrix between two frames can be estimated.
An objective function construction unit 204, configured to construct an objective function based on the prior information, the iterative closest point, the IMU measurements and the visual reprojection. Each sensor has environments to which it is not suited, and the environment during motion cannot be controlled manually; therefore, to improve robustness across multiple scenes and obtain more accurate positioning, the invention fuses data from multiple sensors, improving the accuracy and robustness of the pose estimation.
The invention comprehensively uses three kinds of sensor information, namely the RGB image, the depth image and the IMU measurements, to estimate the optimal attitude between two frames in a tightly-coupled manner, and solves the resulting least-squares problem iteratively.
Specifically, the objective function is:

min(R,t){ PP + ∑||D||² + ∑||I||² + ∑||V||² };

the most suitable motion parameters, namely the rotation parameter R and the displacement parameter t, are obtained by the least squares method by making the sum of the four residual terms PP, ∑||D||², ∑||I||² and ∑||V||² the lowest, where PP represents the prior distribution of the prior information, D represents the iterative closest point residual term, I represents the difference in the amounts of change of position, velocity, rotation, gyroscope bias and acceleration bias between two frames of the object, and V represents the reprojection error; that is, PP, D, I and V may also be referred to as the four constraints, i.e. the four residual terms, of the objective function.
The states outside the sliding window can also provide certain constraint information. To obtain a more faithful constraint relation, these states are also taken into account (but not optimized), and the system uses a marginalization technique to convert their constraint information into a prior distribution over the variables to be optimized, namely the term PP in the least-squares problem.
For the spatial geometry derived from the depth images to estimate the transformation matrix between two frames, the Iterative Closest Point (ICP) residual term D uses the point-to-plane distance:

D = (T·P_j − P_i)·n_i

where T is the rigid transformation composed of the rotation parameter R and the displacement parameter t, i.e. the motion parameters between two frames, P_j and P_i are matched three-dimensional points in the depth images, and n_i is the corresponding normal vector of the previous frame.
According to the calculation characteristics of the iteration closest point, the system adopts the GPU to improve the calculation efficiency of the iteration closest point. Because the depth image similarity between two frames is high, the number of iterations is limited to 5 times at most in order to avoid unnecessary iterations.
For the IMU constraint I, given two frames of IMU measurements (acceleration and angular velocity), IMU pre-integration can compute position, velocity, and rotation. The system defines the IMU error term as the difference in the amount of change in position, velocity, rotation, gyroscope bias, and acceleration bias between two frames.
For the reprojection error V, as a visual constraint, it is defined on the tangent plane of the unit sphere:

V = [b_1, b_2]ᵀ·(P̂ − P/||P||)

where P̂ is the predicted unit vector of the three-dimensional point after the rigid-body transformation of the feature point, P is the three-dimensional point obtained by back-projecting the pixel coordinates of the feature point through the pinhole camera model, and b_1 and b_2 are any two orthogonal bases of the tangent plane corresponding to P. For projection and back-projection, π(·) is the projection function derived from the pinhole camera model, mapping a three-dimensional space point to a two-dimensional image point, defined as:

π([X, Y, Z]ᵀ) = [f_x·X/Z + c_x, f_y·Y/Z + c_y]ᵀ

where (f_x, f_y) and (c_x, c_y) are the camera intrinsic parameters, which can be obtained by a checkerboard-based camera intrinsic calibration method. The inverse operation of the projection is the back-projection.
And the pose estimation unit 205 is configured to solve the minimum value for the objective function by using an iterative method to obtain an optimal pose of the object.
All residual terms are added together to form the objective function, which is minimized with an iterative method to obtain the optimal pose. The direction of each iteration is determined by the first derivative of the objective function. To ensure the real-time performance of the system, the maximum number of iterations for solving the pose is set to 8.
In one embodiment, before the apparatus is put into operation, initial values are provided for it by roughly estimating the states on which the system depends (gravity vector, gyroscope bias, velocity, etc.) through a sliding-window-based structure from motion (SfM) module and a visual-inertial correction module.
In one embodiment, the apparatus further comprises: a loop detection unit 206, configured to detect whether a loop occurs for the current frame and, if so, perform global pose graph optimization to correct the positioning trajectory of the object so as to eliminate accumulated errors. The device uses loop detection based on a bag-of-words model: if the current frame is determined to be a keyframe, the system converts the RGB image into a vector according to the dictionary and submits the vector to loop detection, which judges whether a loop has occurred for the current frame and returns the result. If a loop is detected, global pose graph (Pose-Graph) optimization is performed to correct the positioning trajectory and eliminate accumulated errors.
In one embodiment, the least-squares problem is solved based on a sliding window that contains the input parameters required for the least-squares calculation, and every error term except the prior information is provided by values within the sliding window. That is, the invention implements the least-squares problem in a sliding-window form: all the inputs required by the least squares (pixel points, three-dimensional space points, states and the like) are contained in the sliding window, and each error term (except the prior information) is provided by values in the sliding window. To limit the solving time of the least-squares problem, setting the size of the sliding window to 10 is sufficient to derive an accurate pose (attitude) estimate between two frames.
In one embodiment, the keyframes in the RGB image are determined based on the following conditions: the average parallax of the frame differs from that of the previous keyframe by more than a first threshold (for example, setting the first threshold to 1.0 gives good positioning performance in most scenes); or the number of feature points tracked by the frame is less than a second threshold (for example, setting the second threshold to 10 ensures high robustness of the system without generating too many keyframes).
The device performs attitude estimation based on the RGB images, IMU measurements and depth images of the moving object acquired by multiple sensors, constructs an objective function for attitude estimation and solves it with a least-squares method; the inputs of the function are fed through a sliding window whose size is fixed, which reduces the solving time and improves the pose estimation accuracy. A loop detection process is also designed: when a loop occurs, global pose graph optimization is performed to correct the positioning trajectory of the object and eliminate accumulated errors. Corresponding residual functions are also specifically designed.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus the necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the present application, or the portions that contribute to the prior art, may essentially be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk or an optical disk, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some portions of the embodiments, of the present application.
Finally, it should be noted that although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the invention, which is intended to be covered by the appended claims.

Claims (12)

1. A three-dimensional attitude estimation method based on multi-sensor fusion is characterized by comprising the following steps:
an RGB image acquisition step, wherein an image sensor is used for acquiring an RGB image of a measured object, FAST characteristic points of the RGB image are extracted, and a KLT sparse optical flow method is used for tracking the characteristic points;
an IMU measurement value acquisition step, namely acquiring an IMU measurement value of a measured object by using an IMU sensor, and integrating the IMU measurement value to obtain the position, the speed and the rotation of the measured object at the current moment;
a depth image obtaining step, namely obtaining a depth image of the measured object by using a depth image sensor, and processing the depth image by using an Iterative Closest Point (ICP) algorithm to obtain an iterative closest point;
an objective function construction step, namely constructing an objective function based on prior information, an iteration closest point, an IMU (inertial measurement Unit) measurement value and visual reprojection;
and a pose estimation step, namely solving the minimum value of the objective function by using an iterative method to obtain the optimal pose of the object.
2. The method of claim 1, further comprising:
and a loop detection step, namely detecting whether a loop occurs in the current frame, if so, executing global pose graph optimization to correct the positioning track of the object so as to eliminate the accumulated error.
3. The method of claim 2, wherein the objective function is:

min(R,t){ PP + ∑||D||² + ∑||I||² + ∑||V||² };

the most suitable motion parameters, namely the rotation parameter R and the displacement parameter t, are obtained by the least squares method by making the sum of the four residual terms PP, ∑||D||², ∑||I||² and ∑||V||² the lowest, where PP represents the prior distribution of the prior information, D represents the iterative closest point residual term, I represents the difference in the amounts of change of position, velocity, rotation, gyroscope bias and acceleration bias between two frames of the object, and V represents the reprojection error.
4. The method of claim 3, wherein the least-squares problem is solved based on a sliding window: the input parameters required by the least-squares calculation are contained in the sliding window, and every error term of the least-squares problem except the prior information is provided by values within the sliding window.
5. The method of claim 4, wherein the key frame in the RGB image is determined based on the following conditions:
the difference between the average parallax of the frame and that of the previous keyframe exceeds a first threshold;
alternatively, the number of feature points tracked by the frame is less than a second threshold.
6. The method of claim 3, wherein

the spatial geometry is derived from the depth image to estimate the transformation matrix between two frames, and the Iterative Closest Point (ICP) residual term D uses the point-to-plane distance:

D = (T·P_j − P_i)·n_i

where T is the rigid transformation composed of the rotation parameter R and the displacement parameter t, i.e. the motion parameters between two frames, P_j and P_i are matched three-dimensional points in the depth images, and n_i is the corresponding normal vector of the previous frame;

the reprojection error V serves as a visual constraint and is defined on the tangent plane of the unit sphere:

V = [b_1, b_2]ᵀ·(P̂ − P/||P||)

where P̂ is the predicted unit vector of the three-dimensional point after the rigid-body transformation of the feature point, P is the three-dimensional point obtained by back-projecting the pixel coordinates of the feature point through the pinhole camera model, and b_1 and b_2 are any two orthogonal bases of the tangent plane corresponding to P.
7. A three-dimensional attitude estimation device based on multi-sensor fusion is characterized by comprising:
an RGB image acquisition unit, which is used for acquiring an RGB image of a measured object by using an image sensor, extracting FAST characteristic points of the RGB image and tracking the characteristic points by using a KLT sparse optical flow method;
the IMU measurement value acquisition unit is used for acquiring an IMU measurement value of the measured object by using an IMU sensor and integrating the IMU measurement value to obtain the position, the speed and the rotation of the measured object at the current moment;
the depth image acquisition unit is used for acquiring a depth image of the measured object by using a depth image sensor and processing the depth image by using an Iterative Closest Point (ICP) algorithm to obtain an iterative closest point;
the objective function construction unit is used for constructing an objective function based on the prior information, the iteration closest point, the IMU measurement value and the visual re-projection;
and the pose estimation unit is used for solving the minimum value of the objective function by using an iterative method to obtain the optimal pose of the object.
8. The apparatus of claim 7, further comprising:
and the loop detection unit is used for detecting whether a loop occurs to the current frame, and if so, performing global pose graph optimization to correct the positioning track of the object so as to eliminate the accumulated error.
9. The apparatus of claim 8, wherein the objective function is:

min(R,t){ PP + ∑||D||² + ∑||I||² + ∑||V||² };

the most suitable motion parameters, namely the rotation parameter R and the displacement parameter t, are obtained by the least squares method by making the sum of the four residual terms PP, ∑||D||², ∑||I||² and ∑||V||² the lowest, where PP represents the prior distribution of the prior information, D represents the iterative closest point residual term, I represents the difference in the amounts of change of position, velocity, rotation, gyroscope bias and acceleration bias between two frames of the object, and V represents the reprojection error.
10. The apparatus of claim 9, wherein the least-squares problem is solved based on a sliding window: the input parameters required by the least-squares calculation are contained in the sliding window, and every error term of the least-squares problem except the prior information is provided by values within the sliding window.
11. The apparatus of claim 10, wherein the key frame in the RGB image is determined based on the following conditions:
the difference between the average parallax of the frame and that of the previous keyframe exceeds a first threshold;
alternatively, the number of feature points tracked by the frame is less than a second threshold.
12. The apparatus of claim 9, wherein

the spatial geometry is derived from the depth image to estimate the transformation matrix between two frames, and the Iterative Closest Point (ICP) residual term D uses the point-to-plane distance:

D = (T·P_j − P_i)·n_i

where T is the rigid transformation composed of the rotation parameter R and the displacement parameter t, i.e. the motion parameters between two frames, P_j and P_i are matched three-dimensional points in the depth images, and n_i is the corresponding normal vector of the previous frame;

the reprojection error V serves as a visual constraint and is defined on the tangent plane of the unit sphere:

V = [b_1, b_2]ᵀ·(P̂ − P/||P||)

where P̂ is the predicted unit vector of the three-dimensional point after the rigid-body transformation of the feature point, P is the three-dimensional point obtained by back-projecting the pixel coordinates of the feature point through the pinhole camera model, and b_1 and b_2 are any two orthogonal bases of the tangent plane corresponding to P.
CN202010108437.2A 2020-02-21 2020-02-21 Three-dimensional attitude estimation method and device based on multi-sensor fusion Pending CN111354043A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010108437.2A CN111354043A (en) 2020-02-21 2020-02-21 Three-dimensional attitude estimation method and device based on multi-sensor fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010108437.2A CN111354043A (en) 2020-02-21 2020-02-21 Three-dimensional attitude estimation method and device based on multi-sensor fusion

Publications (1)

Publication Number Publication Date
CN111354043A true CN111354043A (en) 2020-06-30

Family

ID=71194092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010108437.2A Pending CN111354043A (en) 2020-02-21 2020-02-21 Three-dimensional attitude estimation method and device based on multi-sensor fusion

Country Status (1)

Country Link
CN (1) CN111354043A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934827A (en) * 2015-12-31 2017-07-07 杭州华为数字技术有限公司 The method for reconstructing and device of three-dimensional scenic
US20170278231A1 (en) * 2016-03-25 2017-09-28 Samsung Electronics Co., Ltd. Device for and method of determining a pose of a camera
CN108876897A (en) * 2018-04-20 2018-11-23 杭州电子科技大学 The quickly scene three-dimensional reconstruction method under movement
CN109993113A (en) * 2019-03-29 2019-07-09 东北大学 A kind of position and orientation estimation method based on the fusion of RGB-D and IMU information

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112230242A (en) * 2020-09-30 2021-01-15 深兰人工智能(深圳)有限公司 Pose estimation system and method
CN112230242B (en) * 2020-09-30 2023-04-25 深兰人工智能(深圳)有限公司 Pose estimation system and method
CN112097768B (en) * 2020-11-17 2021-03-02 深圳市优必选科技股份有限公司 Robot posture determining method and device, robot and storage medium
CN112097768A (en) * 2020-11-17 2020-12-18 深圳市优必选科技股份有限公司 Robot posture determining method and device, robot and storage medium
CN114554030B (en) * 2020-11-20 2023-04-07 空客(北京)工程技术中心有限公司 Device detection system and device detection method
CN114554030A (en) * 2020-11-20 2022-05-27 空客(北京)工程技术中心有限公司 Device detection system and device detection method
CN112734765A (en) * 2020-12-03 2021-04-30 华南理工大学 Mobile robot positioning method, system and medium based on example segmentation and multi-sensor fusion
CN112734765B (en) * 2020-12-03 2023-08-22 华南理工大学 Mobile robot positioning method, system and medium based on fusion of instance segmentation and multiple sensors
CN113436254A (en) * 2021-06-29 2021-09-24 杭州电子科技大学 Cascade decoupling pose estimation method
CN113436254B (en) * 2021-06-29 2022-07-05 杭州电子科技大学 Cascade decoupling pose estimation method
CN113487674A (en) * 2021-07-12 2021-10-08 北京未来天远科技开发有限公司 Human body pose estimation system and method
CN113487674B (en) * 2021-07-12 2024-03-08 未来元宇数字科技(北京)有限公司 Human body pose estimation system and method
CN113610149A (en) * 2021-08-05 2021-11-05 上海氢枫能源技术有限公司 Pose real-time display method and system of hydrogen compressor
CN113610149B (en) * 2021-08-05 2024-03-26 上海氢枫能源技术有限公司 Method and system for displaying pose of hydrogen compressor in real time
CN114608569A (en) * 2022-02-22 2022-06-10 杭州国辰机器人科技有限公司 Three-dimensional pose estimation method, system, computer device and storage medium
CN114608569B (en) * 2022-02-22 2024-03-01 杭州国辰机器人科技有限公司 Three-dimensional pose estimation method, system, computer equipment and storage medium
CN116310083A (en) * 2023-02-03 2023-06-23 杭州百世伽信息科技有限公司 Human body posture depth image generation method and human body posture recognition system
CN116310083B (en) * 2023-02-03 2023-11-14 杭州百世伽信息科技有限公司 Human body posture depth image generation method and human body posture recognition system
CN116912948A (en) * 2023-09-12 2023-10-20 南京硅基智能科技有限公司 Training method, system and driving system for digital person
CN116912948B (en) * 2023-09-12 2023-12-01 南京硅基智能科技有限公司 Training method, system and driving system for digital person

Similar Documents

Publication Publication Date Title
CN111354043A (en) Three-dimensional attitude estimation method and device based on multi-sensor fusion
Teed et al. Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras
Jiao et al. Robust odometry and mapping for multi-lidar systems with online extrinsic calibration
US10225473B2 (en) Threshold determination in a RANSAC algorithm
EP1679657B1 (en) Estimation system, estimation method, and estimation program for estimating object state
CN101398934B (en) Method and system for tracking objects in images
EP2572319B1 (en) Method and system for fusing data arising from image sensors and from motion or position sensors
US11138742B2 (en) Event-based feature tracking
CN108051002A (en) Transport vehicle space-location method and system based on inertia measurement auxiliary vision
CN108229416B (en) Robot SLAM method based on semantic segmentation technology
US20070237359A1 (en) Method and apparatus for adaptive mean shift tracking
CN110726406A (en) Improved nonlinear optimization monocular inertial navigation SLAM method
Michot et al. Bi-objective bundle adjustment with application to multi-sensor slam
CN111932674A (en) Optimization method of line laser vision inertial system
Roberts et al. Learning general optical flow subspaces for egomotion estimation and detection of motion anomalies
EP3633617A2 (en) Image processing device
US10229508B2 (en) Dynamic particle filter parameterization
CN114485640A (en) Monocular vision inertia synchronous positioning and mapping method and system based on point-line characteristics
CN112179373A (en) Measuring method of visual odometer and visual odometer
White et al. An iterative pose estimation algorithm based on epipolar geometry with application to multi-target tracking
CN114693754A (en) Unmanned aerial vehicle autonomous positioning method and system based on monocular vision inertial navigation fusion
Dang et al. Stereo calibration in vehicles
Schill et al. Estimating ego-motion in panoramic image sequences with inertial measurements
Jiang et al. Icp stereo visual odometry for wheeled vehicles based on a 1dof motion prior
US20200184656A1 (en) Camera motion estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination