WO2023155258A1 - Visual inertial odometry method that contains self-calibration and is based on keyframe sliding window filtering - Google Patents

Visual inertial odometry method that contains self-calibration and is based on keyframe sliding window filtering Download PDF

Info

Publication number
WO2023155258A1
WO2023155258A1 · PCT/CN2022/080346 · CN2022080346W
Authority
WO
WIPO (PCT)
Prior art keywords
feature
frame
bundle
bundles
trajectory
Prior art date
Application number
PCT/CN2022/080346
Other languages
French (fr)
Chinese (zh)
Inventor
槐建柱
庄园
林煜凯
Original Assignee
武汉大学
Priority date
Filing date
Publication date
Application filed by 武汉大学 (Wuhan University)
Publication of WO2023155258A1 publication Critical patent/WO2023155258A1/en

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00: Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00
    • G01C 21/005: Navigation with correlation of navigation data from several sources, e.g. map or contour matching
    • G01C 21/10: Navigation by using measurements of speed or acceleration
    • G01C 21/12: Navigation by measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C 21/16: Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C 21/165: Inertial navigation combined with non-inertial navigation instruments
    • G01C 21/1656: Inertial navigation combined with passive imaging devices, e.g. cameras
    • G01C 21/18: Stabilised platforms, e.g. by gyroscope
    • G01C 21/183: Compensation of inertial measurements, e.g. for temperature effects

Definitions

  • the invention belongs to the field of multi-sensor fusion navigation and positioning, and in particular relates to the estimation of position, velocity and attitude based on a combined system of a monocular or binocular camera and an inertial measurement unit (IMU) and real-time self-calibration of sensor parameters in the system.
  • IMU inertial measurement unit
  • Visual-inertial odometry can track the pose of a device in real time using the data collected by a camera and an IMU rigidly mounted on the device. This technology combines computer 3D vision with inertial dead reckoning and, using low-cost sensors, achieves centimeter-level real-time positioning accuracy.
  • Visual-inertial odometry is also widely used in industries such as drones and autonomous driving. DJI's new UAVs can accurately estimate their pose in complex scenes through visual-inertial odometry technology [5], enabling flight maneuvers such as obstacle avoidance and traversal. Several autonomous-driving suites use a visual-inertial odometry approach to assist automated parking.
  • the present invention proposes a visual-inertial odometry method based on a keyframe-based sliding window filter (Keyframe-based Sliding Window Filter, KSWF) with self-calibration, suitable for combined camera-IMU systems.
  • KSWF Keyframe-based Sliding Window Filter
  • This method is based on Extended Kalman Filter (EKF) for state estimation, so it has good real-time performance.
  • EKF Extended Kalman Filter
  • This method can use scene features to calibrate (i.e., self-calibrate) the geometric parameters of the camera and IMU in real time, and it filters the variables in the sliding window based on the concept of keyframes to handle degenerate motion.
  • state quantities such as position, velocity, attitude, and sensor parameters estimated by the filter are collectively referred to as the state vector.
  • the filter also estimates the uncertainty of these state quantities, i.e., the covariance of the state vector.
  • the position and attitude are referred to as pose for short, and the pose and velocity at the same time t are referred to as navigation state quantities.
  • sensors generally refer to cameras and IMUs
  • devices refer to devices that carry these sensors, such as mobile phones
  • filters refer to programs that fuse data for state estimation.
  • KSWF is suitable for a system consisting of one or two cameras rigidly mounted together with an IMU; a sliding window filter fuses their collected data to estimate the state vector and covariance of the system (or of the carrying device).
  • KSWF can calibrate in real time the biases, scale factors, misalignment, and acceleration sensitivity of the IMU, as well as the cameras' projection geometry parameters, extrinsic parameters, time delay, and rolling-shutter effect.
  • Its feature association front-end obtains a series of frames from a monocular camera or a series of roughly synchronized frame pairs from a binocular camera as input.
  • the monocular frames or binocular frame pairs are collectively referred to as frame bundles.
  • the front-end extracts feature points in frame bundles and matches feature points between frame bundles based on the concept of keyframes, thereby obtaining the trajectories of feature-point observations (i.e., feature trajectories), and selects keyframe bundles according to the proportion of view overlap between the current frame bundle and historical frame bundles.
  • the back-end filter uses IMU data for pose propagation, then uses the observations in the feature trajectories to update the state vector and covariance of the filter, and finally manages the state quantities in the filter based on the concept of keyframes to ensure real-time operation.
  • because KSWF includes the geometric calibration parameters of the camera and IMU in the state vector, it can estimate the sensors' calibration parameters in real time to cope with inaccurate calibration.
  • the calibration parameters in the state vector can be fixed to reduce the amount of calculation.
  • this method selects and removes redundant state quantities based on the concept of key frames.
  • the process of removing redundant state quantities is also called marginalization.
  • KSWF determines the marginalization order according to whether a navigation state quantity is associated with a keyframe bundle, so as to prolong the lifetime of keyframe bundles. In this way, when the device undergoes degenerate motion, the keyframe bundles remain essentially unchanged, so this method avoids the pose drift of traditional methods.
  • the invention provides a visual-inertial odometry method with self-calibration based on keyframe sliding window filtering.
  • At least one IMU including a three-axis gyroscope and a three-axis accelerometer
  • at least one camera such as a wide-view camera or a fisheye camera
  • the shutter mode of the camera can be rolling shutter or global shutter.
  • the present invention assumes that the camera and IMU on the device have been roughly calibrated. Examples of such devices are smartphones, AR glasses, delivery robots, etc.
  • the device collects a sequence of image frame bundles through N cameras (each frame bundle contains N images captured at similar times by the N cameras, N ≥ 1) and collects 3-axis accelerometer and 3-axis gyroscope data through an IMU; this method then fuses the camera and IMU data to estimate the state and parameters of the device, as well as their uncertainty.
  • the position, velocity, attitude, and parameters of the camera and IMU are collectively called the state vector.
  • the uncertainty of the state vector is described by its covariance; for simplicity, the pose and velocity at the same time t are simply referred to as the navigation state quantity;
  • the implementation of the present invention comprises the following steps:
  • Step 1: when the j-th frame bundle arrives, extract the feature points and the corresponding descriptors on its N images, and obtain the set of feature points for each image k.
  • the feature-point detection methods in step 1 include ORB and BRISK
  • the descriptor-generation methods include BRISK, FREAK, and deep-learning-based feature-descriptor generation methods; a minimal extraction sketch is given below.
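A minimal sketch of step 1, assuming OpenCV is available and choosing ORB detection with BRISK descriptors (only two of the options named above); the parameter values are illustrative, not values prescribed by the patent.

```python
import cv2

def extract_features(images, n_features=1000):
    """images: list of N grayscale images forming one frame bundle."""
    detector = cv2.ORB_create(nfeatures=n_features)  # FAST-based corner detector
    describer = cv2.BRISK_create()                   # 512-bit binary descriptors
    bundle_features = []
    for img in images:
        keypoints = detector.detect(img, None)
        keypoints, descriptors = describer.compute(img, keypoints)
        bundle_features.append((keypoints, descriptors))
    return bundle_features
```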
  • Step 2 match the frame bundle j with the previous K keyframe bundles to expand the feature trajectories of feature points.
  • the matching of two frame bundles includes the matching of N pairs of images, and for each pair of images, the feature points in the two frames of images are matched according to the feature descriptor.
  • two matched feature points correspond to a landmark point in the scene; the observations of that landmark point, where u and v are the pixel coordinates of the feature point in the k-th image of the j-th frame bundle, form a feature trajectory.
  • if the previous frame bundle j-1 is not a keyframe bundle, frame bundle j and frame bundle j-1 will also be matched to extend the feature trajectories of the feature points.
  • for N > 1, pairwise matching is also performed between the N images within frame bundle j to extend the feature trajectories of the feature points.
  • the matching between two images is realized by finding feature descriptors with small Hamming distance.
  • implementations of the search for similar feature descriptors include brute-force search, k-nearest-neighbor search, or window-based search; a matching sketch follows.
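A minimal sketch of descriptor matching between two images, assuming OpenCV's brute-force Hamming matcher with cross-checking; the distance threshold is an assumption, not a value from the text.

```python
import cv2

def match_descriptors(desc_a, desc_b, max_hamming=64):
    """desc_a, desc_b: binary descriptor arrays produced in step 1."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_a, desc_b)
    # keep only reasonably close descriptors
    return [m for m in matches if m.distance < max_hamming]
```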
  • Step 3: if the sliding window filter has not been initialized, the state vector (pose, velocity, sensor parameters, etc.) and its covariance are initialized from the feature trajectories and IMU data in several frame bundles (such as two keyframe bundles).
  • the state vector X(t) contains the current navigation state quantity π(t), the IMU parameters x imu, the parameters of the N cameras, the navigation state quantities corresponding to the past N kf keyframe bundles and the most recent N tf +1 frame bundles, and m landmark points (a bookkeeping sketch follows).
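A minimal bookkeeping sketch of this state-vector layout; the field names and dimensions are illustrative, not the patent's notation.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class NavigationState:              # pose + velocity at one time t
    position: np.ndarray            # (3,)
    attitude: np.ndarray            # (3, 3) rotation matrix
    velocity: np.ndarray            # (3,)

@dataclass
class StateVector:
    current: NavigationState
    imu_params: np.ndarray              # biases, scale factors, misalignment, g-sensitivity
    camera_params: List[np.ndarray]     # per camera: projection, extrinsics, time delay, rolling shutter
    keyframe_states: List[NavigationState]   # past N_kf keyframe bundles
    recent_states: List[NavigationState]     # most recent N_tf + 1 frame bundles
    landmarks: List[np.ndarray] = field(default_factory=list)  # landmark points L_1 ... L_m
```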
  • Step 4: once the filter has been initialized, for the current frame bundle j whose feature matching is complete, recursively propagate the pose and velocity from those of the previous frame bundle j-1 to the time t j corresponding to frame bundle j using the IMU data, and clone the propagated pose and velocity to augment the state vector and covariance of the filter (see the propagation sketch below).
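A minimal sketch of step 4 under simplifying assumptions: discrete IMU dead reckoning between the two frame-bundle times, followed by cloning the propagated navigation-state block into the covariance (mirroring Figs. 6-7). Bias and noise handling, the error-state dimension, and the state ordering are assumptions.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # world z-axis up, as set at initialization

def skew_exp(w):
    """Rotation matrix for a small rotation vector w (Rodrigues formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def propagate(p, v, R, imu_samples):
    """p, v: 3-vectors; R: world-from-body rotation; imu_samples: list of
    (dt, gyro, accel) tuples between frame bundles j-1 and j."""
    for dt, gyro, accel in imu_samples:
        a_world = R @ accel + GRAVITY          # specific force rotated to world + gravity
        p = p + v * dt + 0.5 * a_world * dt**2
        v = v + a_world * dt
        R = R @ skew_exp(gyro * dt)            # first-order attitude update
    return p, v, R

def clone_navigation_state(P, idx, dim=9):
    """Augment covariance P by duplicating the rows/columns of the navigation
    state block that starts at index `idx` (pose + velocity, assumed 9 error states)."""
    J = np.vstack([np.eye(P.shape[0]), np.eye(dim, P.shape[0], k=idx)])
    return J @ P @ J.T
```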
  • Step 5 Using multiple feature trajectories in frame bundle j as observation information, update the state vector and covariance of the filter.
  • the updated state vector and covariance can be released to downstream applications, such as projection of virtual scenes in AR and robot path planning.
  • according to whether the feature trajectory disappears in the current frame bundle j, or whether its corresponding landmark point is in the state vector, the processing mode of the feature trajectory is determined.
  • observations in these feature trajectories are used to update the state vector and covariance. This method supports real-time self-calibration because the geometric parameters of the sensor can be placed in the state vector to be updated.
  • the three cases include that the feature track disappears in the current frame bundle j, the feature track corresponds to a landmark point in the state vector, and the feature track corresponds to a well-observed landmark point that is not in the state vector.
  • in the first case, a landmark point is first triangulated from the feature trajectory, and the reprojection errors and Jacobian matrices of all observations in the feature trajectory are calculated; the Jacobian with respect to the landmark-point parameters is then eliminated by matrix null-space projection, and outliers are removed by a Mahalanobis test.
  • in the second case, the reprojection error and Jacobian matrix of the feature trajectory's observation in the current frame bundle j are first calculated, and then outliers are removed by a Mahalanobis test.
  • in the third case, a landmark point is first triangulated from the feature trajectory, then the reprojection errors and Jacobian matrices of all observations in the feature trajectory are calculated, outliers are removed by a Mahalanobis test, and the landmark point is added to the state vector and covariance matrix.
  • all the feature trajectories are divided into the above three categories; for each category, the reprojection errors and Jacobian matrices are stacked, and the state vector and covariance matrix are updated.
  • the update follows the classic EKF equations; a sketch of the null-space projection and the Mahalanobis gating test follows.
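A minimal sketch of the null-space projection and the Mahalanobis gating used in the first case, assuming stacked residuals r ≈ H_x·δx + H_f·δL + n; matrix names, shapes, and thresholds are illustrative, not the patent's notation.

```python
import numpy as np

def nullspace_project(residual, H_x, H_f):
    """residual: (2M,) stacked reprojection errors of one feature trajectory;
    H_x: (2M, n) Jacobian w.r.t. the filter state; H_f: (2M, 3) Jacobian w.r.t.
    the landmark point. Projects both onto the left null space of H_f so the
    landmark error drops out."""
    Q, _ = np.linalg.qr(H_f, mode='complete')
    N = Q[:, 3:]                 # basis of the left null space of H_f
    return N.T @ residual, N.T @ H_x

def mahalanobis_gate(r0, H0, P, R_noise, chi2_threshold):
    """Outlier test: accept the trajectory only if the squared Mahalanobis
    distance of the projected residual is below a chi-square threshold."""
    S = H0 @ P @ H0.T + R_noise
    d2 = float(r0 @ np.linalg.solve(S, r0))
    return d2 < chi2_threshold
```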
  • Step 6: to control the amount of computation, redundant navigation state quantities in the current state vector are detected after the update is completed. These redundant state quantities are selected according to whether they correspond to keyframe bundles, and are then moved out of the filter.
  • when the number of navigation state quantities in the sliding window exceeds the sum of the allowed number of keyframe bundles N kf and recent frame bundles N tf, redundant frame bundles are selected and marginalized. Each marginalization operation selects at least N r redundant frame bundles (N r is 3 for a monocular camera and 2 for a binocular camera). To meet this requirement, redundant frame bundles are first selected among the most recent non-keyframe bundles, excluding the latest N tf frame bundles, and then among the earliest keyframe bundles (a selection sketch follows this step).
  • the filter can then be updated with feature trajectories that contain more than two observations: if such a feature trajectory can successfully triangulate a landmark point from all of its observations, the feature-point observations lying in the redundant frame bundles on that trajectory are used for an EKF update, similar to the first case in step 5; feature points in the redundant frame bundles belonging to other feature trajectories are discarded.
  • after the EKF update, the rows and columns of the covariance matrix and the navigation state quantities corresponding to these redundant frame bundles are deleted.
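A minimal sketch of the redundant-frame-bundle selection of step 6 (cf. Fig. 9), assuming frame bundles are stored oldest-first as (id, is_keyframe) pairs; the data layout is an assumption.

```python
def select_redundant(bundles, n_kf, n_tf, n_r):
    """bundles: list of (bundle_id, is_keyframe) ordered oldest-first.
    Returns ids to marginalize once the window exceeds n_kf + n_tf states."""
    if len(bundles) <= n_kf + n_tf:
        return []
    redundant = []
    # 1) most recent non-keyframe bundles, excluding the latest n_tf bundles
    for bid, is_kf in reversed(bundles[:-n_tf]):
        if len(redundant) >= n_r:
            return redundant
        if not is_kf:
            redundant.append(bid)
    # 2) then the earliest keyframe bundles, always keeping at least one keyframe
    keyframe_ids = [bid for bid, is_kf in bundles if is_kf]
    for bid in keyframe_ids[:-1]:
        if len(redundant) >= n_r:
            break
        redundant.append(bid)
    return redundant
```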
  • steps 1-6 will be repeated.
  • Each loop will publish the filter estimated state vector and covariance to serve downstream applications. These published information describe state quantities such as pose, velocity, and sensor parameters of the device.
  • because the odometry method can estimate the IMU intrinsic parameters, camera intrinsic parameters, camera extrinsic parameters, time delay, and rolling-shutter effect during filtering, it can be applied to camera-IMU combinations with inaccurate initial calibration or with rolling shutters; an example is the camera-IMU combination on a mobile phone.
  • the odometry method uses keyframe-based feature association and state quantity management, so it can robustly handle degenerate motion situations, such as stationary, hovering, and approximate pure rotation.
  • Figure 1 is a schematic diagram of the visual inertial odometry method based on key frame sliding window filtering.
  • modules with a gray background use the concept of keyframes.
  • Figure 2 is a schematic diagram of the multi-thread implementation of the visual inertial odometry method based on key frame sliding window filtering.
  • modules with a gray background use the concept of keyframes.
  • Figure 3 is a flow chart of steps ①-⑦ of keyframe-based image feature-point matching, taking a binocular camera as an example.
  • Fig. 4 is a schematic diagram of two examples of state quantities contained in a state vector.
  • Fig. 5 is a schematic diagram of the state vector before and after cloning the navigation state quantity for the current frame bundle j for the situation in Fig. 3, where π j is the newly added state quantity.
  • FIG. 6 is a schematic diagram of the covariance matrix before cloning the navigation state quantity of the current frame bundle j for the situation in FIG. 3 .
  • Figure 7 is a schematic diagram of the covariance matrix after cloning the navigation state quantity of the current frame bundle j for the situation in Figure 3.
  • the squares filled with diagonal lines correspond to the newly added rows and columns; note that the rows and columns filled with diagonal lines have elements equal to those of the rows and columns filled with the checkerboard pattern.
  • Figure 8 is a schematic diagram of the three ways in which a landmark point's feature trajectory is used to update the state vector and covariance. If the landmark point disappears in the current frame bundle and is not in the state vector, a projection removes its Jacobian coefficient matrix before the trajectory is used for the filter update; if the landmark point is in the state vector, it is used directly for the filter update; if the landmark point is observed in the current frame bundle and can be successfully triangulated, it is added to the state vector and then used for the filter update.
  • FIG. 9 is a schematic diagram of a method for selecting redundant frame bundles.
  • the figure shows the navigation state quantities corresponding to key frame bundles (gray background) and normal frame bundles (white background). Note that at least one keyframe bundle needs to be preserved after marginalizing redundant frame bundles.
  • the white background marks ordinary frame bundles
  • the dotted line marks the marginalized redundant frame bundles.
  • using the data collected by the camera and IMU, and considering the problems caused by residual calibration errors of low-end sensors and by degenerate motion such as being stationary, the present invention proposes a keyframe-based sliding window filter with self-calibration for real-time pose estimation.
  • This filter is suitable for the combined system of monocular or binocular camera and IMU.
  • the shutter mode of the camera can be rolling shutter or global shutter.
  • the method provided by the present invention can be implemented with computer software technology; the main implementation steps are image feature extraction, keyframe-based feature association, filter initialization, IMU-based state propagation, filter update using feature observations, and keyframe-based state-quantity management, see Figure 1.
  • Embodiment: taking a combined binocular-camera-and-IMU system as an example, the process of the present invention is elaborated as follows:
  • Step 1 for a frame bundle j containing N frames of time-similar images captured by N cameras, detect the feature points in each image, and generate descriptors (ie, feature vectors) for these feature points.
  • the feature point detection methods include ORB[6] or BRISK[7], and methods for generating descriptors include BRISK[7] or FREAK[8].
  • Detect and extract image features for frame bundle j captured by 2 cameras.
  • for each feature point, concentric circles with different radii are constructed around the feature point, and a number of equally spaced sampling points are taken on each circle, giving 60 sampling points including the feature point itself. The sampling points are Gaussian filtered to suppress aliasing, and the principal gray-scale direction formed by these sampling points is calculated; the sampling pattern around the feature point is then rotated to this principal direction to obtain a new sampling pattern, and 512 sampling-point pairs are constructed. The gray-value differences of these pairs form a 512-dimensional binary code, i.e., each feature vector is 64 bytes.
  • Step 2 use the image feature descriptor extracted in step 1 to perform feature association based on keyframes (also called feature matching).
  • the matching process associates the features of the current frame bundle j with the features of several previous frame bundles, including the K most recent keyframe bundles and the previous frame bundle j-1.
  • after the matching with the K keyframe bundles, it is decided according to the ratio of feature matches whether frame bundle j is selected as a keyframe bundle. If there are multiple cameras, feature association can also be performed between the N images of the current frame bundle j.
  • the matching of feature descriptors between two frames of images is done by brute force search, k-nearest neighbor or window search.
  • Figure 3 shows the process of feature point matching for the data collected by a device with a binocular camera, where the frame bundles with a gray background represent key frame bundles, and the frame bundles with a white background represent ordinary frame bundles.
  • feature matching comprises steps ①-⑦: ① matching of the current frame bundle j with the left-camera image of the previous frame bundle j-1; ② matching of frame bundle j with the left-camera image of keyframe bundle j 1; ③ matching of frame bundle j with the left-camera image of keyframe bundle j 2; ④ matching of frame bundle j with the right-camera image of the previous frame bundle j-1; ⑤ matching of frame bundle j with the right-camera image of keyframe bundle j 1; ⑥ matching of frame bundle j with the right-camera image of keyframe bundle j 2; ⑦ matching between the left- and right-camera images within the current frame bundle j.
  • feature matching first performs the matching between the current frame bundle j and the two preceding keyframe bundles j 1 and j 2 (②③⑤⑥), and then judges whether the current frame bundle j is selected as a keyframe bundle: if the number of feature matches in the current frame bundle is less than 20% of its number of feature points, j is selected as a keyframe bundle. Next, the matching between the current frame bundle j and the previous frame bundle j-1 is performed (①④). Finally, the matching between the two images within the current frame bundle j is performed (⑦).
  • the matching of each pair of images consists of two steps: first, the nearest feature matches are found by brute-force search according to the Hamming distance between the feature descriptors, and then outlier matches are removed by a RANSAC scheme based on the five-point method (see the sketch below).
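A minimal sketch of the RANSAC outlier-rejection step, assuming OpenCV's five-point essential-matrix estimator; the intrinsic matrix K and the thresholds are placeholders.

```python
import cv2
import numpy as np

def reject_outliers(pts_a, pts_b, K, ransac_thresh_px=1.0):
    """pts_a, pts_b: (M, 2) float arrays of matched pixel coordinates."""
    E, inlier_mask = cv2.findEssentialMat(
        pts_a, pts_b, K, method=cv2.RANSAC, prob=0.999, threshold=ransac_thresh_px)
    keep = inlier_mask.ravel().astype(bool)
    return pts_a[keep], pts_b[keep]
```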
  • Step 3 if the sliding window filter has not been initialized, initialize the state vector and covariance using the feature trajectory and IMU data in several frame bundles (such as 2 keyframes).
  • x imu can include the bias, scale factor, misalignment and acceleration sensitivity of the IMU (Fig. 4 instance 2)
  • x C can include the projection geometry parameters of each camera, extrinsic parameters, camera time delay and rolling shutter effect (Fig. 4 instance 2).
  • given the two keyframe bundles j 1 and j 2 from step 2 and the corresponding IMU data, the filter is initialized.
  • the initial position and velocity of the filter are set to zero, and the initial attitude is determined from the initial accelerometer data, so that the z-axis of the filter's reference world coordinate system {W} points along the negative gravity direction.
  • for the IMU biases: if the pixel motion (i.e., optical flow) of frame bundle j 2 relative to j 1 is small, the gyroscope and accelerometer biases are set by averaging the IMU data; otherwise, the gyroscope and accelerometer biases are set to zero.
  • the time delay of the camera relative to the IMU is initialized to 0.
  • the covariance of the other sensor parameters and state quantities is set from data sheets or experience. If a sensor has been well calibrated in advance, the entries of the covariance matrix corresponding to its parameters can be set to zero to fix these parameters, i.e., these parameters will not be updated; an initialization sketch follows.
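A minimal sketch of this initialization under assumed conventions: the world z-axis is aligned with the negative gravity direction measured by the accelerometer, and the biases are averaged only when the optical flow between the two keyframe bundles is small; the flow threshold and gravity magnitude are assumptions.

```python
import numpy as np

def initial_attitude_from_accel(accel_mean):
    """accel_mean: mean accelerometer reading while (nearly) static, which points
    opposite to gravity in the body frame. Returns the world-from-body rotation."""
    z_body = accel_mean / np.linalg.norm(accel_mean)
    tmp = np.array([1.0, 0.0, 0.0])
    if abs(np.dot(tmp, z_body)) > 0.9:           # avoid a degenerate cross product
        tmp = np.array([0.0, 1.0, 0.0])
    x_body = np.cross(tmp, z_body); x_body /= np.linalg.norm(x_body)
    y_body = np.cross(z_body, x_body)
    return np.vstack([x_body, y_body, z_body])   # rows map body vectors to world axes

def initial_biases(gyro_samples, accel_samples, R_wb, optical_flow_px, flow_thresh=1.0):
    """R_wb: world-from-body rotation from initial_attitude_from_accel."""
    if optical_flow_px >= flow_thresh:
        return np.zeros(3), np.zeros(3)          # large motion: leave biases at zero
    bg = np.mean(gyro_samples, axis=0)                    # static gyro reads its bias
    expected_f = R_wb.T @ np.array([0.0, 0.0, 9.81])      # expected static specific force
    ba = np.mean(accel_samples, axis=0) - expected_f
    return bg, ba
```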
  • Step 5 update the state vector and covariance using the feature trajectory in the current frame bundle j.
  • different ways are adopted to perform EKF update. These three cases include that the feature trajectory disappears in the current frame bundle, the landmark point corresponding to the feature trajectory is in the state vector, and the feature trajectory corresponds to a well-observed landmark point that is not in the state vector.
  • well-observed landmark points must meet two conditions: one is that the observations are sufficiently long, for example the point is observed in 7 images, and the other is that the corresponding landmark point can be successfully triangulated, i.e., the observations have sufficient parallax.
  • in the first case, a landmark point is first triangulated from the feature trajectory, then the reprojection errors and Jacobian matrices of all observations on the feature trajectory are calculated, the Jacobian with respect to the landmark-point parameters is eliminated by matrix null-space projection, and outliers are removed by a Mahalanobis test.
  • in the second case, the reprojection error and Jacobian matrix of the observation of the feature trajectory in the current frame bundle j are first calculated, and then the outliers are removed by a Mahalanobis test.
  • in the third case, a landmark point is first triangulated from the feature trajectory, then the reprojection errors and Jacobian matrices of all observations on the feature trajectory are calculated, outliers are removed by a Mahalanobis test, and the landmark point is then added to the state vector and covariance matrix.
  • a feature trajectory that disappears in the current frame bundle j can be used as a complete feature trajectory to triangulate a landmark point if it contains at least 3 observations. If the triangulation succeeds (see the sketch below), the reprojection errors and Jacobian matrices of all observations on the feature trajectory are calculated, the matrix null-space projection is used to eliminate the Jacobian of the landmark-point parameters, and outliers are then removed by a Mahalanobis test. For the feature trajectory of a landmark point that is in the state vector, the reprojection error and Jacobian matrix of the landmark-point observation in the current frame bundle j are first calculated, and then outliers are removed by a Mahalanobis test.
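A minimal sketch of triangulating a landmark point from a feature trajectory (linear DLT over all observations) with a crude parallax check; the 3x4 projection matrices and the parallax threshold are placeholders for illustration.

```python
import numpy as np

def triangulate(observations, projections, min_parallax_deg=1.0):
    """observations: list of (u, v) pixel coordinates along one feature trajectory;
    projections: list of corresponding 3x4 matrices K[R|t]. Returns the 3D point,
    or None when the parallax is insufficient."""
    A = []
    for (u, v), P in zip(observations, projections):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    X = X[:3] / X[3]
    # parallax between the first and last viewing rays
    c0 = -np.linalg.inv(projections[0][:, :3]) @ projections[0][:, 3]
    c1 = -np.linalg.inv(projections[-1][:, :3]) @ projections[-1][:, 3]
    r0 = (X - c0) / np.linalg.norm(X - c0)
    r1 = (X - c1) / np.linalg.norm(X - c1)
    parallax = np.degrees(np.arccos(np.clip(np.dot(r0, r1), -1.0, 1.0)))
    return X if parallax >= min_parallax_deg else None
```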
  • Step 6: to limit the amount of computation, when the number of navigation state quantities in the sliding window exceeds the sum of the allowed number of keyframe bundles N kf and the number of recent frame bundles N tf, redundant frame bundles are selected from the sliding window and marginalized. Because two observations of an unknown landmark point cannot provide an effective constraint on the pose, at least N r redundant frame bundles are selected in one marginalization operation (N r is 3 for a monocular camera and 2 for a binocular camera). To meet this requirement, redundant frame bundles are first selected among the most recent non-keyframe bundles, excluding the latest N tf frame bundles, and then among the earliest keyframe bundles.
  • the sliding window filter can then be updated using the feature trajectories whose length exceeds two observations: if such a feature trajectory can successfully triangulate a landmark point from all of its observations, the feature-point observations in the redundant frame bundles on that trajectory are used for an EKF update, similar to the first case in step 5; feature points in the redundant frame bundles belonging to other feature trajectories are discarded. After the EKF update, the rows and columns of the covariance matrix and the navigation state quantities corresponding to these redundant frame bundles are deleted.
  • at least N r redundant frame bundles are selected (N r is 3 for a monocular camera and 2 for a binocular camera).
  • the latest N tf frame bundles are excluded first, then redundant frame bundles are selected from the most recent non-keyframe bundles (assume n r are selected), and then N r - n r redundant frame bundles are selected from the earliest keyframe bundles. Note that after removing these redundant frame bundles, at least one keyframe bundle should be kept.
  • the redundant frame bundles are keyframe bundle j 1 and ordinary frame bundle j-3 (as shown by the dotted-line box in the left part of FIG. 10).
  • the redundant frame bundles are the two oldest frame bundles.
  • the filter is updated with observations of landmark points that are observed more than twice in the redundant frame bundles. If such a landmark point can be successfully triangulated using all of its observations, the observations in the redundant frame bundles on its feature trajectory are used for an EKF update, similar to the first case in step 5; the other feature-point observations in the redundant frame bundles are discarded.
  • after the EKF update, the rows and columns of the covariance matrix and the navigation state quantities corresponding to these redundant frame bundles are deleted.
  • the method provided by the present invention can also be implemented as a corresponding program by utilizing multi-thread design, as shown in FIG. 2 .
  • the multi-threaded program of visual-inertial odometry based on keyframe sliding window filtering includes a feature extraction thread, a keyframe-based feature association thread, a camera-IMU synchronization thread, and a sliding window filtering thread. This multi-threaded implementation can significantly increase the throughput of the program; a queue-based sketch of the pipeline is given after this list.
  • the feature extraction thread is used to detect the feature points of each frame image captured by the camera, and use a feature description method to generate descriptors for the feature points.
  • BRISK uses the FAST[9] algorithm for feature point detection, and detects features by constructing image pyramids in the scale space.
  • the binary code of the descriptor is obtained by comparing the gray value of 512 pixel pairs in the neighborhood of the feature point.
  • the key frame-based feature association thread is used to associate the image features of the current frame bundle and the previous frame bundle, and perform key frame-based feature point matching according to the image features extracted by the feature extraction thread.
  • the specific matching steps include matching the current frame bundle with K key frame bundles, matching the current frame bundle with the previous frame bundle, and matching the N images of the current frame bundle two by two. After the matching with the K key frame bundles is completed, it will be judged whether the current frame bundle is selected as the key frame bundle according to the ratio of feature matching.
  • one way to achieve feature matching between two images is to find the closest descriptor to each feature descriptor by brute-force search. Finally, these matches are passed through a RANSAC scheme based on the five-point method to remove outliers.
  • Camera-IMU synchronization thread for synchronizing camera and IMU data.
  • the thread can block the other threads through a condition variable while waiting for the arrival of the IMU data corresponding to the current frame bundle.
  • the EKF update is performed for three cases: feature trajectories that disappear in the current frame bundle j, feature trajectories whose corresponding landmark point is in the state vector, and feature trajectories of landmark points that are not in the state vector but have sufficiently long observations.
  • the three cases differ in how the update is prepared.
  • the matrix null-space projection is utilized to eliminate the Jacobian matrix of the landmark point parameters.
  • the triangulated new landmark points will be added to the state vector and covariance matrix.
  • redundant frame bundles are selected and marginalized. At least N r redundant frame bundles are selected each time (for monocular devices, N r is 3, and for binocular devices, N r is 2). When these frame bundles are selected, the nearest N tf frame bundles will be excluded, then redundant frame bundles will be selected among non-key frame bundles, and finally redundant frame bundles will be selected among the earliest key frame bundles. After the redundant frame bundle is selected, the filter can be updated with observations of landmark points observed more than twice in the redundant frame bundle.
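A minimal sketch of the four-thread pipeline of Figure 2 using producer-consumer queues; the stage functions are identity stubs standing in for feature extraction, keyframe-based association, camera-IMU synchronization, and the sliding window filter, and all names are illustrative.

```python
import queue
import threading

def pipeline_stage(in_q, out_q, work):
    """Generic worker: repeatedly take an item, process it, pass the result on."""
    while True:
        out_q.put(work(in_q.get()))

frame_q   = queue.Queue()   # raw frame bundles from the cameras
feature_q = queue.Queue()   # per-bundle keypoints and descriptors
track_q   = queue.Queue()   # keyframe-based feature trajectories
synced_q  = queue.Queue()   # trajectories paired with their IMU segment
state_q   = queue.Queue()   # published state vector and covariance

# Identity stubs so the sketch runs; a real system would plug in the methods
# described in steps 1-6 above.
extract = associate = sync_with_imu = filter_update = lambda item: item

stages = [
    (frame_q, feature_q, extract),        # feature extraction thread
    (feature_q, track_q, associate),      # keyframe-based feature association thread
    (track_q, synced_q, sync_with_imu),   # camera-IMU synchronization thread
    (synced_q, state_q, filter_update),   # sliding window filtering thread
]
for in_q, out_q, work in stages:
    threading.Thread(target=pipeline_stage, args=(in_q, out_q, work), daemon=True).start()
```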

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Navigation (AREA)

Abstract

A visual-inertial odometry method that contains self-calibration and is based on keyframe sliding window filtering, which belongs to the field of multi-sensor fusion navigation and positioning. In a traditional filtering method, old state quantities are continuously removed as time elapses, and in a situation of degenerate motion there is not enough parallax between the frame bundles corresponding to the retained state quantities, making it difficult to constrain the motion and leading to drift. The visual-inertial odometry method that contains self-calibration and is based on keyframe sliding window filtering comprises several steps: image feature extraction, keyframe-based feature association, filter initialization, IMU-based state propagation, filter updating using feature observations, and keyframe-based state quantity management; the geometric parameters of the sensors can be estimated in real time. The described method organizes state quantities on the basis of keyframes; state quantities corresponding to keyframe bundles are not removed during degenerate motion, so good observation can still be ensured and drift is avoided. The described method is the first keyframe-based sliding window filtering method supporting self-calibration.

Description

A visual-inertial odometry method with self-calibration based on keyframe sliding window filtering
Technical field
The invention belongs to the field of multi-sensor fusion navigation and positioning, and in particular relates to the estimation of position, velocity and attitude based on a combined system of a monocular or binocular camera and an inertial measurement unit (IMU), and to the real-time self-calibration of the sensor parameters in the system.
Background art
How to estimate the position and attitude (i.e., the pose) of a device in a scene in real time is one of the core problems in application fields such as augmented reality (AR), mobile robots, and drones. Visual-inertial odometry can track the pose of the device in real time using the data collected by a camera and an IMU rigidly mounted on the device. This technology combines computer 3D vision with inertial dead reckoning and, using low-cost sensors, can achieve centimeter-level real-time positioning accuracy.
Many AR-related companies have developed this technology. In the 2014 Tango project, Google demonstrated that a mobile phone containing a fisheye camera and an IMU can estimate the phone's pose and reconstruct the scene in three dimensions by fusing the information from these sensors, proving that mobile devices can possess pose and spatial awareness without relying on satellite navigation systems. Google subsequently cooperated with mobile phone manufacturers to launch Tango-based smartphones, which can track their own three-dimensional motion trajectory with the equipped fisheye camera and IMU. In 2017, Apple launched the ARKit framework for iOS handheld devices containing a camera and an IMU for developing augmented-reality applications; its motion-tracking technology is a typical visual-inertial odometry [1], [2]. In the same year, Google launched the ARCore framework with visual-inertial odometry [3], [4] for Android phones.
Visual-inertial odometry is also widely used in industries such as drones and autonomous driving. DJI's new UAVs can accurately estimate their pose in complex scenes through visual-inertial odometry technology [5], enabling flight maneuvers such as obstacle avoidance and traversal. Several autonomous-driving suites use a visual-inertial odometry method to assist automated parking.
Current visual-inertial odometry methods still have some problems, such as how to cope with flawed parameter calibration and how to cope with degenerate motion (such as being stationary or undergoing approximately pure rotation). For a large number of consumer-grade products, low-cost sensors usually do not have accurate calibration, and this inaccurate calibration significantly reduces the pose-estimation accuracy of visual-inertial odometry. On the other hand, such devices (e.g., mobile phones) often experience degenerate motion in practical use, and existing real-time odometry methods often produce obvious pose drift in these situations.
Summary of the invention
In order to solve the above two problems, the present invention proposes a visual-inertial odometry method based on a keyframe-based sliding window filter (Keyframe-based Sliding Window Filter, KSWF) with self-calibration, suitable for combined camera-IMU systems. The method performs state estimation based on an extended Kalman filter (EKF), so it has good real-time performance. The method can use scene features to calibrate (i.e., self-calibrate) the geometric parameters of the camera and IMU in real time, and it filters the variables in the sliding window based on the concept of keyframes to handle degenerate motion. In the following description, the state quantities estimated by the filter, such as position, velocity, attitude, and sensor parameters, are collectively called the state vector. The filter also estimates the uncertainty of these state quantities, i.e., the covariance of the state vector. For brevity, position and attitude are referred to as the pose, and the pose and velocity at the same time t are referred to as the navigation state quantity. In this document, sensors generally refer to cameras and IMUs, the device refers to the apparatus carrying these sensors, such as a mobile phone, and the filter refers to the program that fuses the data for state estimation. KSWF is applicable to a system consisting of one or two cameras rigidly mounted together with one IMU; a sliding window filter fuses the collected data to estimate the state vector and covariance of the system (or of the carrying device). KSWF can calibrate in real time the biases, scale factors, misalignment, and acceleration sensitivity of the IMU, as well as the cameras' projection geometry parameters, extrinsic parameters, time delay, and rolling-shutter effect. Its feature-association front end takes as input a sequence of frames from a monocular camera or a sequence of roughly synchronized frame pairs from a binocular camera; for brevity, monocular frames or binocular frame pairs are hereafter collectively called frame bundles. The front end extracts feature points in the frame bundles and matches feature points between frame bundles based on the concept of keyframes, thereby obtaining the trajectories of feature-point observations (i.e., feature trajectories), and selects keyframe bundles according to the proportion of view overlap between the current frame bundle and historical frame bundles. The back-end filter performs pose propagation with the IMU data, then updates the filter's state vector and covariance with the observations in the feature trajectories, and finally manages the state quantities in the filter based on the concept of keyframes to guarantee real-time operation.
Because KSWF includes the geometric calibration parameters of the camera and IMU in the state vector, it can estimate the sensors' calibration parameters in real time to cope with inaccurate calibration. When accurate calibration parameters are available, the calibration parameters in the state vector can be fixed to reduce the amount of computation.
To prevent the amount of computation from growing continuously over time, this method selects and removes redundant state quantities based on the concept of keyframes. The process of removing redundant state quantities is also called marginalization. KSWF determines the marginalization order according to whether a navigation state quantity is associated with a keyframe bundle, so as to prolong the lifetime of keyframe bundles. In this way, when the device undergoes degenerate motion, the keyframe bundles remain essentially unchanged, so the method avoids the pose drift of traditional methods.
Aimed at the difficulty that existing camera-IMU fusion state-estimation techniques have in handling calibration errors and degenerate motion, the present invention provides a visual-inertial odometry method with self-calibration based on keyframe sliding window filtering. The device to which the invention applies has at least one IMU (containing a three-axis gyroscope and a three-axis accelerometer) and at least one camera (for example a wide-angle camera or a fisheye camera) rigidly mounted on it, and the camera shutter may be a rolling shutter or a global shutter. The invention assumes that the camera and IMU on the device have been roughly calibrated. Examples of such devices are smartphones, AR glasses, and delivery robots. The device collects a sequence of image frame bundles through N cameras (each frame bundle contains N images captured at similar times by the N cameras, N ≥ 1) and collects three-axis accelerometer data and three-axis gyroscope data through one IMU; this method then fuses the camera and IMU data to estimate the state and parameters of the device as well as their uncertainty. The position, velocity, attitude, and the parameters of the camera and IMU are collectively called the state vector, and the uncertainty of the state vector is described by its covariance; for brevity, the pose and velocity at the same time t are referred to as the navigation state quantity.
The implementation of the present invention comprises the following steps:
Step 1: when the j-th frame bundle arrives, extract the feature points and the corresponding descriptors on its N images, and obtain the set of feature points for each image k.
Further, the feature-point detection methods in step 1 include ORB and BRISK, and the descriptor-generation methods include BRISK, FREAK, and deep-learning-based feature-descriptor generation methods.
Step 2: match frame bundle j with the previous K keyframe bundles to extend the feature trajectories of the feature points. The matching of two frame bundles comprises the matching of N pairs of images; for each pair of images, the feature points in the two images are matched according to their feature descriptors. Two matched feature points correspond to one landmark point L m in the scene, m = 0, 1, ...; the observations of the landmark point L m in multiple frame bundles (j = 0, 1, ...), each consisting of the pixel coordinates (u, v) of the feature point in the k-th image of the j-th frame bundle, form a feature trajectory. After the current frame bundle j has been matched with the K keyframe bundles, whether frame bundle j is selected as a keyframe bundle is decided according to the ratio of the number of feature matches to the number of feature points in frame bundle j.
Next, if the previous frame bundle j-1 is not a keyframe bundle, frame bundle j and frame bundle j-1 are also matched to extend the feature trajectories of the feature points.
For the case N > 1, the N images within frame bundle j are also matched pairwise to extend the feature trajectories of the feature points.
The matching between two images is realized by searching for feature descriptors with small Hamming distance. Implementations of this search include brute-force search, k-nearest-neighbor search, or window-based search.
Step 3: if the sliding window filter has not been initialized, the state vector (pose, velocity, sensor parameters, etc.) and its covariance are initialized from the feature trajectories and IMU data in several frame bundles (such as two keyframe bundles).
In step 3, the state vector X(t) contains the current navigation state quantity π(t), the IMU parameters x imu, the parameters of the N cameras, the navigation state quantities corresponding to the past N kf keyframe bundles, the navigation state quantities corresponding to the most recent N tf +1 frame bundles, and m landmark points L 1, L 2, ..., L m; here x imu includes the biases, scale factors, misalignment and acceleration sensitivity of the IMU, and x C includes each camera's projection geometry parameters, extrinsic parameters, time delay and rolling-shutter effect; the past N kf keyframe bundles and the most recent N tf frame bundles form a sliding window of navigation state quantities.
Step 4: once the filter has been initialized, for the current frame bundle j whose feature matching is complete, recursively propagate the pose and velocity from those of the previous frame bundle j-1 to the time t j corresponding to frame bundle j according to the IMU data, and clone the propagated pose and velocity to augment the state vector and covariance of the filter.
Step 5: use the multiple feature trajectories in frame bundle j as observation information to update the state vector and covariance of the filter. The updated state vector and covariance can be published to downstream applications, for example the projection of virtual scenes in AR and robot path planning. According to whether a feature trajectory disappears in the current frame bundle j or whether its corresponding landmark point is in the state vector, the processing mode of the feature trajectory is determined, and the observations in these feature trajectories are then used to update the state vector and covariance. Because the geometric parameters of the sensors can be placed in the state vector and updated, this method supports real-time self-calibration.
Specifically, each feature trajectory is handled in one of three possible cases, each treated differently. The three cases are: the feature trajectory disappears in the current frame bundle j, the landmark point corresponding to the feature trajectory is in the state vector, and the feature trajectory corresponds to a well-observed landmark point that is not in the state vector. In the first case, a landmark point is first triangulated from the feature trajectory, the reprojection errors and Jacobian matrices of all observations in the feature trajectory are calculated, the Jacobian with respect to the landmark-point parameters is eliminated by matrix null-space projection, and outliers are removed by a Mahalanobis test. In the second case, the reprojection error and Jacobian matrix of the feature trajectory's observation in the current frame bundle j are first calculated, and then outliers are removed by a Mahalanobis test. In the third case, a landmark point is first triangulated from the feature trajectory, the reprojection errors and Jacobian matrices of all observations in the feature trajectory are calculated, outliers are removed by a Mahalanobis test, and the landmark point is then added to the state vector and covariance matrix. Finally, all feature trajectories are divided into the above three categories; for each category, the reprojection errors and Jacobian matrices are stacked, and the state vector and covariance matrix are updated in the same way as in a classic EKF.
Step 6: to control the amount of computation, redundant navigation state quantities in the current state vector are detected after the update is completed. These redundant state quantities are selected according to whether they correspond to keyframe bundles, and are then moved out of the filter.
Specifically, when the number of navigation state quantities in the filter's state vector exceeds the sum of the allowed number of keyframe bundles N kf and the number of recent frame bundles N tf, redundant frame bundles are selected and marginalized. Each marginalization operation selects at least N r redundant frame bundles (N r is 3 for a monocular camera and 2 for a binocular camera). To meet this requirement, redundant frame bundles are first selected among the most recent non-keyframe bundles, excluding the latest N tf frame bundles, and then among the earliest keyframe bundles. After N r redundant frame bundles have been found, the filter can be updated with the feature trajectories among them whose length exceeds two observations: if such a feature trajectory can successfully triangulate a landmark point from all of its observations, the feature-point observations in the redundant frame bundles on that trajectory are used for an EKF update, similar to the first case of step 5; feature points in the redundant frame bundles belonging to other feature trajectories are discarded. After the EKF update, the rows and columns of the covariance matrix and the navigation state quantities corresponding to these redundant frame bundles are deleted.
For the next frame bundle j+1, steps 1-6 are repeated. Each cycle publishes the filter's estimated state vector and covariance to serve downstream applications. The published information describes state quantities such as the pose, velocity, and sensor parameters of the device.
Compared with the prior art, the advantages and beneficial effects of the present invention are as follows:
1. Because the odometry method can estimate the IMU intrinsic parameters, camera intrinsic parameters, camera extrinsic parameters, time delay and rolling-shutter effect during filtering, it is applicable to camera-IMU combinations with inaccurate initial calibration or with rolling-shutter cameras, such as the camera-IMU combination in a mobile phone.
2. The odometry method adopts keyframe-based feature association and state management, so it can robustly handle degenerate motions such as standing still, hovering and near-pure rotation.
附图说明Description of drawings
Figure 1 is a schematic diagram of the visual-inertial odometry method based on keyframe sliding-window filtering; the modules shaded in gray use the concept of keyframes.
Figure 2 is a schematic diagram of a multi-threaded implementation of the visual-inertial odometry method based on keyframe sliding-window filtering; the modules shaded in gray use the concept of keyframes.
图3为基于关键帧的图像特征点匹配的步骤①-⑦的流程图,以双目相机为例。Figure 3 is a flow chart of steps ①-⑦ of image feature point matching based on key frames, taking a binocular camera as an example.
图4为状态向量中所含状态量的两个实例示意图。Fig. 4 is a schematic diagram of two examples of state quantities contained in a state vector.
图5为针对图3的情况,对当前帧束j进行导航状态量克隆之前和之后的状态向量示意图,π j是新增的状态量。 Fig. 5 is a schematic diagram of state vectors before and after cloning the navigation state quantity for the current frame bundle j for the situation in Fig. 3 , where π j is a newly added state quantity.
图6为针对图3的情况,在对当前帧束j导航状态量克隆之前的协方差矩阵示意图。FIG. 6 is a schematic diagram of the covariance matrix before cloning the navigation state quantity of the current frame bundle j for the situation in FIG. 3 .
Figure 7 is a schematic diagram of the covariance matrix after cloning the navigation state of the current frame bundle j for the situation in Figure 3; the squares filled with diagonal lines correspond to the newly added rows and columns, and note that the rows and columns filled with diagonal lines have elements equal to those of the rows and columns filled with the checkerboard pattern.
Figure 8 is a schematic diagram of the three ways a landmark feature track is used to update the state vector and covariance: if the landmark disappears in the current frame bundle and is not in the state vector, its Jacobian block is removed by projection and the track is then used for the filter update; if the landmark is in the state vector, it is used directly for the filter update; if the landmark is observed in the current frame bundle and can be successfully triangulated, it is added to the state vector and then used for the filter update.
图9为冗余帧束选取的方法示意图。图中显示了关键帧束(灰色底)和普通帧束(白色底)对应的导航状态量。注意在边缘化冗余帧束后需要保留至少一个关键帧束。FIG. 9 is a schematic diagram of a method for selecting redundant frame bundles. The figure shows the navigation state quantities corresponding to key frame bundles (gray background) and normal frame bundles (white background). Note that at least one keyframe bundle needs to be preserved after marginalizing redundant frame bundles.
Figure 10 is a schematic diagram of two examples of redundant frame-bundle selection, assuming N_tf = 3 and N_r = 2; gray backgrounds mark keyframe bundles, white backgrounds mark ordinary frame bundles, and dashed boxes mark the redundant frame bundles to be marginalized. Left: for the situation in Figure 3, the redundant frame bundles to be selected after the filter update of the current frame bundle is completed. Right: the redundant frame bundles to be selected in another hypothetical situation.
Detailed Description of the Embodiments
The present invention uses the data collected by a camera and an IMU and, taking into account the residual calibration errors of low-end sensors and the problems caused by degenerate motions such as standing still, proposes a keyframe-based sliding-window filter with self-calibration for real-time pose estimation. The filter is applicable to combined systems of a monocular or stereo camera and an IMU, and the camera shutter may be a rolling shutter or a global shutter. Estimating the geometric calibration parameters of the sensors in real time improves the accuracy of pose estimation, and using the concept of keyframes avoids pose drift during degenerate motion.
The method provided by the present invention can be implemented with computer software technology. The main implementation steps are: image feature extraction, keyframe-based feature association, filter initialization, IMU-based state propagation, filter update using feature observations, and keyframe-based state management; see Figure 1. The embodiment takes a stereo camera plus IMU system as an example to describe the process of the present invention in detail, as follows:
Step 1: for a frame bundle j containing N temporally close images captured by N cameras, detect the feature points in each image and generate descriptors (i.e., feature vectors) for them. Feature-point detection methods include ORB [6] or BRISK [7], and descriptor-generation methods include BRISK [7] or FREAK [8].
实施例具体的实施过程说明如下(例如N=2,特征检测和描述方法都为BRISK):The specific implementation process of the embodiment is described as follows (for example, N=2, both feature detection and description methods are BRISK):
Detect and extract image features for the frame bundle j captured by the 2 cameras. To detect feature points in each image, a scale space with 4 octaves and 4 intra-octaves is first constructed, yielding 8 images of that frame at different sampling rates. The FAST [9] detector is then used to locate feature points, and non-maximum suppression is performed on the located feature points in the multi-scale space: each feature point is compared with its 26 neighbours, namely its 8-neighbourhood in image space and the 9×2 = 18 neighbours in the scale layers above and below, only feature points whose FAST response is maximal within this neighbourhood are kept, and non-maxima are removed. Finally, quadratic fitting yields the sub-pixel position of each feature point and its precise scale in scale space. To describe each feature point, concentric circles of different radii are constructed around it and a fixed number of equally spaced sample points are taken on each circle, giving 60 sample points including the feature point itself; Gaussian filtering is applied to these sample points to suppress aliasing, and the dominant intensity direction they form is computed. The sampling pattern around the feature point is then rotated to this dominant direction to obtain a new sampling pattern, and 512 sample-point pairs are formed; the intensity differences of these pairs yield a 512-bit binary code, i.e., each feature vector occupies 64 bytes.
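For reference, this detect-and-describe step can be prototyped with OpenCV's BRISK implementation, which follows the scheme just described (FAST-based detection in a scale-space pyramid plus a 512-bit binary descriptor). The detection threshold and octave count below are illustrative assumptions, not the exact configuration of the embodiment.

```python
import cv2

# Illustrative BRISK configuration; the embodiment's exact settings may differ.
brisk = cv2.BRISK_create(thresh=30, octaves=4)

def extract_features(gray_image):
    """Detect BRISK keypoints and compute their 64-byte binary descriptors."""
    keypoints, descriptors = brisk.detectAndCompute(gray_image, None)
    return keypoints, descriptors
```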
步骤2,利用步骤1提取的图像特征描述子进行基于关键帧的特征关联(也称为特征匹配)。如图3所示,匹配过程将当前帧束j的特征与几个先前帧束的特征相关联,包括K个 最近的关键帧束和上一帧束j-1。在与K关键帧束的匹配完成后,如果当前帧束中特征匹配个数与特征点个数的比例<T r,则将帧束j选为关键帧束。如果有多个相机,也可以在当前帧束j的N张图像之间进行特征关联。两帧图像之间特征描述子的匹配通过暴力搜索、k最近邻或窗口搜索的方法完成。 Step 2, use the image feature descriptor extracted in step 1 to perform feature association based on keyframes (also called feature matching). As shown in Fig. 3, the matching process associates the features of the current frame bundle j with the features of several previous frame bundles, including the K most recent keyframe bundles and the previous frame bundle j−1. After the matching with the K key frame bundle is completed, if the ratio of the number of feature matches to the number of feature points in the current frame bundle is < T r , frame bundle j is selected as the key frame bundle. If there are multiple cameras, feature association can also be performed between the N images of the current frame bundle j. The matching of feature descriptors between two frames of images is done by brute force search, k-nearest neighbor or window search.
The specific implementation process of the embodiment is described as follows (for example, N = 2, K = 2, brute-force search, T_r = 0.2):
Figure 3 shows the feature-point matching process for data collected by a device with a stereo camera, where frame bundles with a gray background are keyframe bundles and frame bundles with a white background are ordinary frame bundles. Feature matching consists of steps ①-⑦: ① matches the left-camera image of the current frame bundle j with that of the previous frame bundle j-1; ② matches the left-camera image of the current frame bundle j with that of the keyframe bundle j-2; ③ matches the left-camera image of the current frame bundle j with that of the keyframe bundle j_2; ④ matches the right-camera image of the current frame bundle j with that of the previous frame bundle j-1; ⑤ matches the right-camera image of the current frame bundle j with that of the keyframe bundle j-2; ⑥ matches the right-camera image of the current frame bundle j with that of the keyframe bundle j_2; and ⑦ matches the two images of the left and right cameras within the current frame bundle j. Feature matching first performs the matching of the current frame bundle j against the two most recent keyframe bundles j-2 and j_2 (②③⑤⑥) and then decides whether to select the current frame bundle j as a keyframe bundle: if the number of feature matches in the current frame bundle is less than 20% of its number of feature points, j is selected as a keyframe bundle. Next, the current frame bundle j is matched against the previous frame bundle j-1 (①④). Finally, the matching between the 2 images within the current frame bundle j is performed (⑦). Matching two images involves two steps: the nearest feature matches are first found by brute-force search on the Hamming distance between descriptors, and abnormal matches are then removed by a RANSAC scheme built around the 5-point algorithm.
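The two-image matching step can be sketched with OpenCV as below. The function name, the cross-check option, the RANSAC confidence and the 1-pixel threshold are assumptions for illustration; the essential-matrix estimator is OpenCV's 5-point algorithm run inside RANSAC.

```python
import cv2
import numpy as np

def match_and_filter(desc1, kpts1, desc2, kpts2, K):
    """Brute-force Hamming matching followed by 5-point RANSAC outlier removal."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc1, desc2)
    if len(matches) < 5:               # the 5-point algorithm needs >= 5 matches
        return matches

    pts1 = np.float32([kpts1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kpts2[m.trainIdx].pt for m in matches])
    _, mask = cv2.findEssentialMat(pts1, pts2, K,
                                   method=cv2.RANSAC, prob=0.999, threshold=1.0)
    if mask is None:
        return matches
    return [m for m, ok in zip(matches, mask.ravel()) if ok]
```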
Step 3: if the sliding-window filter has not yet been initialized, the state vector and covariance are initialized from the feature tracks in a few frame bundles (e.g., 2 keyframe bundles) and the IMU data. In the general case, the state vector X(t) formed by all state quantities at time t is as shown in Figure 4; it contains the navigation state π(t) at the current time t, the IMU parameters x_imu, the parameters of the N cameras (one parameter block x_C per camera), the navigation states corresponding to the past N_kf keyframe bundles, the navigation states corresponding to the most recent N_tf+1 frame bundles, and m landmark points L_1, L_2, ..., L_m (in Figure 4, N_kf = 2 and N_tf = 3). Here x_imu may include the IMU biases, scale factors, axis misalignments and acceleration sensitivity (Figure 4, example 2), and x_C may include each camera's projection geometry parameters, extrinsic parameters, time delay and rolling-shutter effect (Figure 4, example 2). The past N_kf keyframe bundles and the most recent N_tf frame bundles form a sliding window of navigation states.
A specific implementation of the embodiment is as follows:
After step 2 has produced the two keyframe bundles j_1 and j_2 and the corresponding IMU data, the filter is initialized. The initial position and velocity of the filter are set to zero, and the initial attitude is determined from the initial accelerometer data so that the z-axis of the filter's reference world frame {W} points along the negative gravity direction. For the IMU biases, if the pixel motion (i.e., optical flow) of frame bundle j_2 relative to j_1 is small, the gyroscope and accelerometer biases are set by averaging the IMU data; otherwise they are set to zero. The time delay of the camera relative to the IMU is initialized to 0. The covariances of the other sensor parameters and state quantities are set from datasheets or experience. If a sensor has already been well calibrated beforehand, the entries of the covariance matrix corresponding to its parameters can be set to zero so that these parameters are fixed, i.e., they are not updated.
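The attitude initialization from accelerometer data can be sketched as follows: while the device is roughly static, the averaged specific force points opposite to gravity and defines the world z-axis. The construction of the remaining axes and the zero initial yaw are assumptions made for illustration (yaw is unobservable from gravity alone).

```python
import numpy as np

def initial_attitude_from_accel(accel_samples):
    """Return R_WB (body -> world) so that the world z-axis points up.

    accel_samples: (N, 3) raw accelerometer readings taken while the device
    is roughly static; the mean specific force then points opposite to gravity.
    """
    f = np.mean(np.asarray(accel_samples, dtype=float), axis=0)
    z_w = f / np.linalg.norm(f)                  # world z-axis in body coordinates
    helper = np.array([1.0, 0.0, 0.0])
    if abs(np.dot(helper, z_w)) > 0.9:           # avoid a degenerate cross product
        helper = np.array([0.0, 1.0, 0.0])
    y_w = np.cross(z_w, helper)
    y_w /= np.linalg.norm(y_w)
    x_w = np.cross(y_w, z_w)
    R_BW = np.column_stack([x_w, y_w, z_w])      # world axes expressed in the body frame
    return R_BW.T                                # R_WB
```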
Step 4: for the current frame bundle j, propagate the navigation state π(t) and the covariance matrix to the time t_j of that frame bundle using the IMU data. Then clone the navigation state, π(t_j) = π(t), append it to the state vector, and add the rows and columns corresponding to π(t_j) to the covariance matrix, copying their values from the rows and columns corresponding to π(t).
A specific implementation of the embodiment is as follows:
For frame bundle j, the navigation state π(t) and the covariance matrix are propagated to the time t_j of that frame bundle using the IMU data. The navigation state π(t) is then cloned as π(t_j) and added to the state vector (as shown in Figure 5, where π_j := π(t_j)), and the covariance matrix (shown before cloning in Figure 6) is augmented with the corresponding rows and columns (as shown in Figure 7).
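The covariance augmentation of Figures 6-7 amounts to duplicating the rows and columns of the cloned navigation-state block. A minimal sketch, assuming the indices of π(t) inside the state vector are known:

```python
import numpy as np

def clone_nav_state(P, idx):
    """Stochastic cloning: append a copy of the navigation-state block.

    P: current covariance; idx: indices of pi(t) inside the state vector
    (the layout is an assumption). The appended rows/columns duplicate the
    corresponding rows/columns of P, as illustrated in Figures 6-7.
    """
    n = P.shape[0]
    J = np.vstack([np.eye(n), np.eye(n)[idx, :]])   # Jacobian of the cloning map
    return J @ P @ J.T
```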
Step 5: update the state vector and covariance using the feature tracks in the current frame bundle j. The EKF update is performed in different ways for the three situations a feature track may fall into: the feature track disappears in the current frame bundle; the feature track corresponds to a landmark point that is in the state vector; or the feature track corresponds to a well-observed landmark point that is not in the state vector. A well-observed landmark point must satisfy two conditions: it has a sufficiently long observation history, for example being observed in 7 images, and the corresponding landmark point can be successfully triangulated, i.e., the observations have sufficient parallax. In the first case, a landmark point is triangulated from the feature track, the reprojection errors and Jacobians of all observations on the track are computed, the Jacobian with respect to the landmark parameters is eliminated by projection onto the matrix null space, and outliers are removed with a Mahalanobis test. In the second case, the reprojection error and Jacobian of the track's observation in the current frame bundle j are computed, and outliers are removed with a Mahalanobis test. In the third case, a landmark point is triangulated from the feature track, the reprojection errors and Jacobians of all observations on the track are computed, outliers are removed with a Mahalanobis test, and the landmark point is added to the state vector and covariance matrix. Finally, all feature tracks are divided into three classes according to these three situations; for each class, the reprojection errors and Jacobians above are stacked and used to update the state vector and covariance matrix in the same way as a classical EKF.
A specific implementation of the embodiment is as follows:
As shown in Figure 8, a feature track that disappears in the current frame bundle j can be used as a complete track to triangulate a landmark point if it contains at least 3 observations. If the triangulation succeeds, the reprojection errors and Jacobians of all observations on the track are computed, the Jacobian with respect to the landmark parameters is eliminated by projection onto the matrix null space, and outliers are removed with a Mahalanobis test. For a feature track whose landmark point is in the state vector, the reprojection error and Jacobian of the landmark observation in the current frame bundle j are computed first, and outliers are then removed with a Mahalanobis test. For a sufficiently long feature track (e.g., more than 7 observations) of a landmark point that is not in the state vector, the landmark point is first triangulated; if the triangulation succeeds, the reprojection errors and Jacobians of all observations on the track are computed, outliers are removed with a Mahalanobis test, and the landmark point is added to the state vector while the covariance matrix is expanded accordingly. Finally, all feature tracks are divided into three classes according to the above situations; the reprojection errors and Jacobians of all tracks in each class are stacked together, and one EKF update step is executed per class (three updates in total).
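The null-space projection used in the first case removes the landmark parameters from the stacked measurement model before the EKF update. A minimal sketch, assuming the landmark Jacobian H_f has full column rank (3 for a Euclidean landmark):

```python
import numpy as np

def nullspace_project(r, H_x, H_f):
    """Project the stacked residuals onto the left null space of H_f.

    This removes the landmark parameters from the measurement model so that
    the EKF update only involves states kept in the filter.
    """
    U, _, _ = np.linalg.svd(H_f, full_matrices=True)
    A = U[:, H_f.shape[1]:]            # basis of the left null space of H_f
    return A.T @ r, A.T @ H_x          # projected residual and state Jacobian
```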
Step 6: to limit the computational cost, when the number of navigation states in the sliding window exceeds the sum of the allowed number of keyframe bundles N_kf and the allowed number of recent frame bundles N_tf, redundant frame bundles are selected from the sliding window and marginalized. Because two observations of an unknown landmark point cannot provide an effective constraint on the pose, at least N_r redundant frame bundles are selected in each marginalization operation (N_r is 3 for a monocular camera and 2 for a stereo camera). To meet this requirement, redundant frame bundles are first selected among the most recent non-keyframe bundles, excluding the most recent N_tf frame bundles, and then among the earliest keyframe bundles. After N_r redundant frame bundles are found, the feature tracks whose length exceeds two observations can be used to update the sliding-window filter: specifically, if such a feature track can successfully triangulate a landmark point from all of its observations, the feature-point observations inside the redundant frame bundles on that track are used for an EKF update, similar to the first case of step 5; feature points in the redundant frame bundles that belong to other feature tracks are discarded. After the EKF update, the navigation states of these redundant frame bundles and the corresponding rows and columns of the covariance matrix are deleted.
A specific implementation of the embodiment is as follows:
To select N_r redundant frame bundles (N_r is 3 for a monocular camera and 2 for a stereo camera), as shown in Figure 9, the most recent N_tf frame bundles are first excluded, then redundant frame bundles are chosen among the most recent non-keyframe bundles (say n_r of them), and finally N_r - n_r redundant frame bundles are chosen among the earliest keyframe bundles. Note that after removing these redundant frame bundles, at least one keyframe bundle must remain. For the case of N_kf = 2 and N_tf = 3 in Figure 3, the redundant frame bundles are the keyframe bundle j_1 and the ordinary frame bundle j-3 (the dashed boxes in the left part of Figure 10). For another hypothetical situation, shown in the right part of Figure 10, the redundant frame bundles are the two oldest frame bundles. Once the redundant frame bundles are selected, the filter is updated with the observations of landmark points that are observed more than twice in the redundant frame bundles. If such a landmark point can be successfully triangulated from all of its observations, the observations inside the redundant frame bundles on its feature track are used for an EKF update, similar to the first case of step 5; the other feature-point observations in the redundant frame bundles are discarded. After the EKF update, the navigation states of these redundant frame bundles and the corresponding rows and columns of the covariance matrix are deleted.
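After the final EKF update that consumes their observations, the marginalized bundles are simply dropped, which for the state vector and covariance means deleting the corresponding entries, rows and columns. A minimal sketch (the index bookkeeping is assumed to be handled elsewhere):

```python
import numpy as np

def marginalize_states(x, P, remove_idx):
    """Drop the marginalized navigation-state entries from x and P."""
    keep = np.setdiff1d(np.arange(P.shape[0]), remove_idx)
    return x[keep], P[np.ix_(keep, keep)]
```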
本发明提供的方法,也可以利用多线程设计实现为相应程序,如图2所示。基于关键帧滑窗滤波的视觉惯性里程计的多线程程序包含特征提取线程、基于关键帧的特征关联线程、相机-IMU同步线程和滑窗滤波线程。该多线程实现可以显著提高程序的吞吐率。The method provided by the present invention can also be implemented as a corresponding program by utilizing multi-thread design, as shown in FIG. 2 . The multi-threaded program of visual-inertial odometry based on key frame sliding window filtering includes feature extraction thread, key frame based feature association thread, camera-IMU synchronization thread and sliding window filtering thread. This multi-threaded implementation can significantly increase the throughput of the program.
The feature-extraction thread detects the feature points of every image captured by the cameras and generates descriptors for them using a feature-description method. For a frame bundle j containing N images with close timestamps captured by the N cameras, feature points are detected and descriptors extracted with a method such as BRISK [7]. BRISK uses the FAST [9] algorithm for feature-point detection and detects features by building an image pyramid in scale space. To generate the descriptor of a feature point, the gray values of 512 pixel pairs in the neighbourhood of the feature point are compared, yielding the binary code of the descriptor.
The keyframe-based feature-association thread associates the image features of the current frame bundle with those of previous frame bundles, performing keyframe-based feature-point matching on the image features produced by the feature-extraction thread. The specific matching steps are: matching the current frame bundle with K keyframe bundles, matching the current frame bundle with the previous frame bundle, and matching the N images of the current frame bundle pairwise. After the matching with the K keyframe bundles is completed, whether the current frame bundle is selected as a keyframe bundle is decided from the feature-matching ratio. One implementation of two-image feature matching finds the descriptor closest to each feature descriptor by brute-force search; the resulting matches are then passed through a RANSAC scheme built around the 5-point algorithm to remove outliers.
The camera-IMU synchronization thread synchronizes the camera and IMU data. Through a condition variable, this thread can block the other threads until the IMU data corresponding to the current frame bundle has arrived.
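The synchronization idea can be sketched with Python's threading primitives as below; the class and method names and the deque buffer are illustrative assumptions rather than the patent's implementation.

```python
import threading
from collections import deque

class CameraImuSynchronizer:
    """Minimal sketch of the camera-IMU synchronization idea.

    The filter thread blocks on a condition variable until the buffered IMU
    data covers the timestamp of the current frame bundle; the IMU callback
    notifies waiters whenever a new sample arrives.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._imu = deque()

    def add_imu(self, stamp, accel, gyro):
        with self._cond:
            self._imu.append((stamp, accel, gyro))
            self._cond.notify_all()

    def wait_for_imu(self, bundle_stamp):
        """Block until IMU data up to bundle_stamp is available, then return it."""
        with self._cond:
            self._cond.wait_for(
                lambda: self._imu and self._imu[-1][0] >= bundle_stamp)
            return [s for s in self._imu if s[0] <= bundle_stamp]
```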
The sliding-window filtering thread updates the filter with the feature tracks produced by the feature-association thread and the IMU data. If the filter has not been initialized, its state vector and covariance are initialized from the feature tracks between frame bundles, the IMU data, datasheets and empirical values. Once initialized, when frame bundle j arrives, the navigation state π(t) and the covariance matrix are propagated to the time t_j of frame bundle j using the IMU data; the navigation state is then cloned, π(t_j) = π(t), and added to the state vector, with the covariance matrix expanded accordingly. Next, the EKF update is performed in three cases according to the properties of the feature tracks: feature tracks that disappear in the current frame bundle j, feature tracks whose landmark points are in the state vector, and feature tracks of landmark points that are not in the state vector but have sufficiently long observations. The three cases differ in how the update is prepared. In the first case, projection onto the matrix null space is used to eliminate the Jacobian with respect to the landmark parameters. In the third case, the newly triangulated landmark points are added to the state vector and covariance matrix. The feature tracks are divided into three classes according to these three cases, and one classical EKF update is executed per class. When the number of navigation states in the state vector exceeds the sum of the allowed number of keyframe bundles N_kf and the allowed number of recent frame bundles N_tf, redundant frame bundles are selected and marginalized. At least N_r redundant frame bundles are selected each time (N_r is 3 for a monocular device and 2 for a stereo device). When selecting these frame bundles, the most recent N_tf frame bundles are excluded; redundant frame bundles are then chosen among the non-keyframe bundles and finally among the earliest keyframe bundles. Once the redundant frame bundles are selected, the filter can be updated with the observations of landmark points observed more than twice in the redundant frame bundles. Among these landmark points, those that can be successfully triangulated are selected, and their observations in the redundant frame bundles are used for one EKF update, similar to the first case of the update described above. The other feature-point observations in the redundant frame bundles are discarded. After the update, the navigation states of these redundant frame bundles and the corresponding entries of the covariance matrix are deleted.
本文中所描述的具体实施例仅仅是对本发明精神作举例说明。本发明所属技术领域的技术人员可以对所描述的具体实施例做各式修改、补充或采用类似的方式替代,但并不会偏离本发明的精神或者超越所附权利要求书所定义的范围。The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art to which the present invention pertains may make various modifications, supplements or similar substitutions to the described specific embodiments without departing from the spirit of the present invention or exceeding the scope defined in the appended claims.
[1] A. Flint, O. Naroditsky, C. P. Broaddus, A. Grygorenko, S. Roumeliotis, and O. Bergig, "Visual-based inertial navigation," US9424647B2, Aug. 23, 2016. Accessed: Apr. 26, 2021. [Online]. Available: https://patents.google.com/patent/US9424647/nl
[2] S. I. Roumeliotis and K. J. Wu, "Inverse sliding-window filters for vision-aided inertial navigation systems," US9658070B2, May 23, 2017. Accessed: Apr. 26, 2021. [Online]. Available: https://patents.google.com/patent/US9658070B2/en?oq=US9658070B2-patent-Inverse+sliding-window+filters+for+vision-aided+inertial+navigation
[3] S. I. Roumeliotis and A. I. Mourikis, "Vision-aided inertial navigation," US9766074B2, Sep. 19, 2017. Accessed: Apr. 26, 2021. [Online]. Available: https://patents.google.com/patent/US9766074B2/en
[4] M. Li and A. Mourikis, "Real-time pose estimation system using inertial and feature measurements," US9798929B2, Oct. 24, 2017. Accessed: Feb. 18, 2022. [Online]. Available: https://patents.google.com/patent/US9798929B2/en
[5] T. Qin, P. Li, and S. Shen, "VINS-Mono: A robust and versatile monocular visual-inertial state estimator," IEEE Trans. Robot., vol. 34, no. 4, pp. 1004-1020, Aug. 2018, doi: 10.1109/TRO.2018.2853729.
[6] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, "ORB: An efficient alternative to SIFT or SURF," in 2011 International Conference on Computer Vision, Nov. 2011, pp. 2564-2571, doi: 10.1109/ICCV.2011.6126544.
[7] S. Leutenegger, M. Chli, and R. Y. Siegwart, "BRISK: Binary robust invariant scalable keypoints," in Intl. Conf. on Computer Vision (ICCV), Barcelona, Spain, Nov. 2011, pp. 2548-2555, doi: 10.1109/ICCV.2011.6126542.
[8] A. Alahi, R. Ortiz, and P. Vandergheynst, "FREAK: Fast retina keypoint," in 2012 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2012, pp. 510-517, doi: 10.1109/CVPR.2012.6247715.
[9] E. Rosten and T. Drummond, "Machine learning for high-speed corner detection," in European Conference on Computer Vision, 2006, pp. 430-443.

Claims (6)

  1. A visual-inertial odometry method with self-calibration based on keyframe sliding-window filtering, characterized by comprising the following steps:
    First, an IMU and N cameras are fixedly mounted on the device; the cameras collect a sequence of image frame bundles, and the IMU collects 3-axis accelerometer data and 3-axis gyroscope data; a filter then performs the following steps to estimate the state and parameters of the device as well as their uncertainty; hereinafter, the position, velocity, attitude and the camera and IMU parameters are collectively referred to as the state vector, and the uncertainty of the state vector is described by its covariance; for simplicity, the pose and velocity at the same time t are referred to as the navigation state;
    Step 1, for a frame bundle j containing N temporally close images captured by the N cameras, detecting the feature points in each image and generating descriptors, i.e., feature vectors, for these feature points;
    步骤2,利用步骤1提取的特征描述子进行基于关键帧的特征匹配;Step 2, using the feature descriptor extracted in step 1 to perform keyframe-based feature matching;
    步骤3,滤波器尚未初始化时,将通过帧束中的特征轨迹和IMU数据初始化状态向量和协方差;Step 3, when the filter has not been initialized, the state vector and covariance will be initialized through the feature trajectory and IMU data in the frame bundle;
    Step 4, when the filter has been initialized, for a frame bundle j that has completed feature matching, recursively estimating, from the pose and velocity corresponding to frame bundle j-1 and the IMU data, the pose and velocity at the time t_j corresponding to frame bundle j, and cloning the estimated pose and velocity to augment the state vector and covariance of the filter;
    步骤5,利用特征点的特征轨迹更新滤波器的状态向量和协方差;Step 5, update the state vector and covariance of the filter by using the feature trajectory of the feature point;
    步骤6,更新完毕后,检测是否存在冗余帧,并删除冗余帧对应的导航状态量和协方差矩阵相应的行和列;Step 6, after the update is completed, detect whether there is a redundant frame, and delete the corresponding row and column of the navigation state quantity and the covariance matrix corresponding to the redundant frame;
    对下一帧束j+1,将重复执行步骤1-6;For the next frame bundle j+1, steps 1-6 will be repeated;
    每次循环都发布滤波器估计的状态向量和协方差,以服务下游的应用。Each loop publishes the filter estimated state vector and covariance to serve downstream applications.
  2. The visual-inertial odometry method with self-calibration based on keyframe sliding-window filtering according to claim 1, characterized in that: the feature-point detection methods in step 1 include ORB and BRISK, and the descriptor-generation methods include BRISK, FREAK and deep-learning-based feature-descriptor generation methods.
  3. The visual-inertial odometry method with self-calibration based on keyframe sliding-window filtering according to claim 1, characterized in that: the feature-matching process in step 2 associates the features of the current frame bundle j with the features of several previous frame bundles, including the K most recent keyframes and the previous frame bundle j-1; after the matching with the K keyframes is completed, if the ratio of the number of feature matches to the number of feature points in the current frame is less than T_r, frame bundle j is selected as a keyframe; if each frame bundle contains N images from multiple cameras, feature association is also performed among the N images of the current frame bundle j; the matching of feature descriptors between two images is done by brute-force search, k-nearest-neighbour search or window search.
  4. The visual-inertial odometry method with self-calibration based on keyframe sliding-window filtering according to claim 1, characterized in that: the state vector X(t) in step 3 contains the current navigation state π(t), the IMU parameters x_imu, the parameters of the N cameras, the navigation states corresponding to the past N_kf keyframe bundles, the navigation states corresponding to the most recent N_tf+1 frame bundles, and m landmark points L_1, L_2, ..., L_m; wherein x_imu includes the biases, scale factors, axis misalignments and acceleration sensitivity of the IMU, and x_C contains each camera's projection geometry parameters, extrinsic parameters, time delay and rolling-shutter effect; the past N_kf keyframe bundles and the most recent N_tf frame bundles form a sliding window of navigation states.
  5. The visual-inertial odometry method with self-calibration based on keyframe sliding-window filtering according to claim 1, characterized in that: in step 5, the EKF update is performed in different ways for the three situations that may occur for each feature track, namely that the feature track disappears in the current frame, that the landmark point corresponding to the feature track is in the state vector, and that the feature track corresponds to a well-observed landmark point that is not in the state vector; for the first situation, a landmark point is first triangulated from the feature track, the reprojection errors and Jacobian matrices of all observations on the feature track are then computed, the Jacobian matrix of the landmark-point parameters is eliminated by projection onto the matrix null space, and outliers are removed by a Mahalanobis test; for the second situation, the reprojection error and Jacobian matrix of the observation of the feature track in the current frame bundle j are first computed, and outliers are then removed by a Mahalanobis test; for the third situation, a landmark point is first triangulated from the feature track, the reprojection errors and Jacobian matrices of all observations on the feature track are then computed, outliers are removed by a Mahalanobis test, and the landmark point is added to the state vector and covariance matrix; finally, all feature tracks are divided into three classes according to the above three observation situations, and for each class of feature tracks the state vector and covariance matrix are updated using the above reprojection errors and Jacobian matrices, in the same way as in a classical EKF.
  6. The visual-inertial odometry method with self-calibration based on keyframe sliding-window filtering according to claim 5, characterized in that step 6 is specifically implemented as follows:
    when the number of navigation states in the filter state vector exceeds the sum of the allowed number of keyframes N_kf and the allowed number of recent frames N_tf, redundant frame bundles are selected and marginalized; each marginalization operation selects at least N_r redundant frame bundles, and to meet this requirement redundant frames are first selected among the most recent non-keyframe bundles while excluding the most recent N_tf frame bundles, and then among the earliest keyframe bundles; after N_r redundant frames are found, the sliding-window filter is updated using the feature tracks whose length exceeds two observations, i.e., if such a feature track can successfully triangulate a landmark point from all of its observations, the feature observations in the redundant frames on that track are used for an EKF update in the same way as the first situation in step 5; the feature points in the redundant frames that belong to other feature tracks are discarded, and after the EKF update the navigation states of these redundant frames and the corresponding rows and columns of the covariance matrix are deleted.