CN113240597A - Three-dimensional software image stabilization method based on visual inertial information fusion - Google Patents
Three-dimensional software image stabilization method based on visual inertial information fusion
- Publication number: CN113240597A
- Application number: CN202110497661.XA
- Authority: CN (China)
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/10—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
- G01C21/12—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
- G01C21/16—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
- G01C21/165—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S11/00—Systems for determining distance or velocity not using reflection or reradiation
- G01S11/12—Systems for determining distance or velocity not using reflection or reradiation using electromagnetic waves other than radio waves
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/207—Analysis of motion for motion estimation over a hierarchy of resolutions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides a three-dimensional software image stabilization method based on visual-inertial information fusion. First, the camera and the visual-inertial sensor are calibrated, obtaining the distortion parameters and intrinsic matrix of the monocular camera, and the IMU error parameters are calibrated to obtain the IMU error model used in the subsequent tightly coupled joint optimization. Second, monocular visual initialization is performed to obtain sufficient three-dimensional feature points in the world coordinate system. Third, the IMU data and visual data are tightly coupled for joint optimization, yielding the camera transformation matrix and the optimized three-dimensional feature points. Fourth, pre-warping is performed according to the obtained transformation matrix, and a local mesh warp is applied using the obtained feature points to produce the final stabilized result. The invention effectively overcomes the poor motion-vector estimation of the classical SFM method and improves the spatial accuracy of the motion-compensated three-dimensional feature points and the overall motion vector.
Description
Technical Field
The invention relates to a visual three-dimensional software image stabilization method with inertial components, in particular to a software image stabilization method based on the theory of visual-inertial information fusion.
Background
The rapid development of the electronic information age has accelerated technology, and unmanned systems have expanded from ground vehicles to the air. They play an increasingly important role in military and civilian areas such as urban transportation and battlefield operations. In the development of unmanned systems, the design of the sensing, navigation and control systems is of paramount importance. In visual navigation, the unmanned system extracts rich navigation information in real time from environment images acquired by its camera, gaining the ability to interact with the environment and raising the automation and intelligence of control. However, in some cases the camera carrier undergoes undesirable random jitter, and the captured video and images then become unstable and blurred, degrading video quality and observability. To preserve the use value of the captured video sequence, anti-shake processing, i.e. image stabilization, must be applied to the moving camera platform.
Image stabilization systems can be classified by mechanism into mechanical, optical and digital image stabilization. A mechanical stabilization platform is bulky and imprecise in practice and is easily disturbed by external forces such as friction and wind resistance, so it is unsuitable as the primary stabilization means for a small rotor UAV. Optical image stabilization works well and can stabilize the image sequence directly by optical means, but the secondary spectrum requires achromatic treatment of the variable optical wedge, making the stabilizer's structure and manufacture overly complex; because compensation is passive, via a prism, mirror or optical wedge, its stabilization capability is greatly limited. Digital Video Stabilization (DVS), also called electronic image stabilization, draws on computer vision, image processing, signal processing and related disciplines; the stabilization function is implemented mainly as a software algorithm without support from other systems, and since the code is portable, a DVS system is easy to maintain and update, low-cost and accurate. Digital video stabilization is an emerging technology of recent decades that produces a stable video sequence mainly by eliminating or reducing video jitter. According to the underlying model, image stabilization methods divide into 2D, 2.5D and 3D methods.
The 2D methods are simple in principle and cheap to compute, but the lack of depth of field makes the stabilized result look unnatural and limits algorithm performance. The 2.5D methods compromise between the 2D and 3D approaches: complexity is higher, the stabilization effect is good and parallax can be handled, but they require feature points that persist across enough consecutive frames to build trajectories smoothed under certain constraints, making them difficult to use for real-time stabilization. The 3D methods have the highest complexity and the best performance, but their stabilization quality is bounded by the performance of Structure from Motion (SFM), so a sufficiently good stabilization result is difficult to obtain.
Disclosure of Invention
The invention aims to provide a three-dimensional software image stabilization method based on visual-inertial information fusion. The method can be combined with visual navigation, reduces the computation of the algorithm, stabilizes well, and effectively improves the quality of the resulting image sequence.
The method combines the high precision of the visual-inertial approach with the relatively natural appearance of mesh warping, effectively overcoming the precision bottleneck of existing image stabilization algorithms.
The technical scheme of the invention is as follows:
The three-dimensional software image stabilization method based on visual-inertial information fusion comprises the following steps:
Step 1: calibrate the camera and the IMU (Inertial Measurement Unit), obtaining the distortion parameters and intrinsic matrix of the monocular camera, and calibrate the IMU error parameters to obtain the IMU error model used in the subsequent tightly coupled joint optimization;
Step 2: perform monocular visual initialization to obtain sufficient three-dimensional feature points in the world coordinate system;
Step 3: tightly couple the IMU data and visual data for joint optimization, obtaining the camera transformation matrix and the optimized three-dimensional feature points;
Step 4: pre-warp according to the obtained transformation matrix and apply a local mesh warp to the obtained feature points to obtain the final stabilized result.
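The four steps above can be sketched as a minimal processing skeleton. This is an illustrative outline only: every stage is a stub, and all numeric values (intrinsics, point counts, frame size) are hypothetical, not taken from the patent.

```python
import numpy as np

def calibrate():
    """Step 1 (stub): return intrinsics K, distortion d and IMU biases."""
    K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
    dist = np.zeros(5)                       # k1, k2, p1, p2, k3
    imu_bias = {"b_a": np.zeros(3), "b_w": np.zeros(3)}
    return K, dist, imu_bias

def initialize(K):
    """Step 2 (stub): monocular initialization -> 3-D feature points."""
    rng = np.random.default_rng(0)
    return rng.uniform(-1, 1, size=(50, 3)) + np.array([0, 0, 5.0])

def joint_optimize(points, imu_bias):
    """Step 3 (stub): tightly coupled optimization -> camera transform
    (here the identity) and refined feature points (here unchanged)."""
    T = np.eye(4)
    return T, points

def stabilize(frame, T, points):
    """Step 4 (stub): pre-warp by T, then local mesh warp; with the
    identity transform the frame passes through unchanged."""
    return frame.copy()

K, dist, imu_bias = calibrate()
pts = initialize(K)
T, pts_opt = joint_optimize(pts, imu_bias)
frame = np.zeros((480, 640))
out = stabilize(frame, T, pts_opt)
```

Each stub would be replaced by the corresponding step of the method; the skeleton only fixes the data flow between the four stages.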
Further, in step 1, the camera and the IMU are calibrated with reference to the GitHub open-source calibration toolbox Kalibr.
Further, in step 1, the angular-velocity and acceleration errors are obtained through IMU calibration to correct the IMU error model; the camera intrinsic matrix K and the distortion coefficients are obtained through visual calibration; joint calibration further refines these parameters and yields the transformation matrix between the IMU and the camera body.
Further, in step 2, the process of monocular visual initialization is as follows:
For the image frames I_1, I_2 obtained by the monocular camera, there is a common three-dimensional point P in the world coordinate system, with position [X, Y, Z]^T. The projections of this three-dimensional point in the two frames satisfy:

s_1 p_1 = K P,  s_2 p_2 = K (R P + t)
x_1 = K^{-1} p_1,  x_2 = K^{-1} p_2

where R and t are the rotation matrix and displacement vector between the two adjacent camera frames. From these relations the pose relation, i.e. the epipolar constraint x_2^T E x_1 = 0, is obtained, yielding the essential matrix E and the fundamental matrix F; the rotation matrix R and translation t are then recovered from E and F, and triangulation recovers the relative depth of each feature point. The scale information of each feature point is then obtained with the IMU pre-integration model.
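Triangulation of a matched pair given K, R, t can be sketched with the standard DLT construction. The intrinsics, camera motion and test point below are made-up values for illustration, not taken from the patent:

```python
import numpy as np

def triangulate(K, R, t, p1, p2):
    """DLT triangulation of one correspondence from two views."""
    # Projection matrices: P1 = K [I | 0], P2 = K [R | t]
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t.reshape(3, 1)])
    # Each view contributes rows u*P[2]-P[0] and v*P[2]-P[1]
    A = np.stack([
        p1[0] * P1[2] - P1[0],
        p1[1] * P1[2] - P1[1],
        p2[0] * P2[2] - P2[0],
        p2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                      # null vector = homogeneous point
    return X[:3] / X[3]

K = np.array([[500., 0, 320], [0, 500., 240], [0, 0, 1]])
# hypothetical motion between frames: small rotation about y + baseline
th = 0.05
R = np.array([[np.cos(th), 0, np.sin(th)],
              [0, 1, 0],
              [-np.sin(th), 0, np.cos(th)]])
t = np.array([-0.2, 0.0, 0.0])

P = np.array([0.3, -0.1, 4.0])                 # true point, frame-1 coords
h1 = K @ P;           p1 = h1[:2] / h1[2]      # s1 p1 = K P
h2 = K @ (R @ P + t); p2 = h2[:2] / h2[2]      # s2 p2 = K (R P + t)
X = triangulate(K, R, t, p1, p2)
```

With noise-free projections the DLT recovers the point exactly (up to floating point).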
Further, the IMU pre-integration model is:
From the IMU calibration of step 1, the gyroscope bias b_w, the accelerometer bias b_a and the additive noises n_a, n_w are obtained. The 6-axis IMU provides the accelerometer measurement â_t and the gyroscope angular-velocity measurement ω̂_t:

â_t = a_t + b_a + R_w^{b_t} g^w + n_a
ω̂_t = ω_t + b_w + n_w

where the subscript b denotes quantities in the body coordinate system, R_w^{b_t} is the rotation matrix transferring the world coordinate system into the body coordinate system, and g^w is the gravity vector in the world coordinate system. If the image frames acquired by the camera during [t_k, t_{k+1}] are k and k+1, with the corresponding body frames being b_k and b_{k+1}, the transfer of position, velocity and orientation in the world coordinate system through the IMU measurements over the interval is:

p_{b_{k+1}}^w = p_{b_k}^w + v_{b_k}^w Δt_k + ∬_{t∈[t_k,t_{k+1}]} (R_t^w (â_t − b_a) − g^w) dt²
v_{b_{k+1}}^w = v_{b_k}^w + ∫_{t∈[t_k,t_{k+1}]} (R_t^w (â_t − b_a) − g^w) dt
q_{b_{k+1}}^w = q_{b_k}^w ⊗ ∫_{t∈[t_k,t_{k+1}]} (1/2) Ω(ω̂_t − b_w) q_t^{b_k} dt

where Δt_k is the interval [t_k, t_{k+1}] and R_t^w is the rotation matrix converting the body coordinate system into world coordinates. Multiplying by R_w^{b_k} converts the position, velocity and orientation into the body frame b_k, giving the pre-integration terms:

α_{b_{k+1}}^{b_k} = ∬ R_t^{b_k} (â_t − b_a) dt²,  β_{b_{k+1}}^{b_k} = ∫ R_t^{b_k} (â_t − b_a) dt,  γ_{b_{k+1}}^{b_k} = ∫ (1/2) Ω(ω̂_t − b_w) γ_t^{b_k} dt

A first-order Taylor expansion of these terms about the bias estimates, followed by midpoint integration, yields the discrete recursion and the error transfer function matrix

δz_{k+1} = H δz_k + V n

whose blocks give the Jacobians of the pre-integrated terms with respect to the error states; the corresponding parameters are obtained from this error transfer function matrix. By aligning the visual structure with the IMU pre-integration, the scale information s is obtained, recovering the actual depth of each feature point and the result in the world coordinate system, where the superscript c_0 denotes the camera coordinate system of the first camera frame.
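A minimal numeric sketch of the pre-integration terms α, β, γ under midpoint integration follows, assuming the standard VINS-style recursion; the biases, sampling rate and the constant-acceleration test signal are invented for illustration:

```python
import numpy as np

def quat_mult(q, r):
    """Hamilton product of quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = q; w2, x2, y2, z2 = r
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_to_rot(q):
    w, x, y, z = q
    return np.array([
        [1-2*(y*y+z*z), 2*(x*y-w*z),   2*(x*z+w*y)],
        [2*(x*y+w*z),   1-2*(x*x+z*z), 2*(y*z-w*x)],
        [2*(x*z-w*y),   2*(y*z+w*x),   1-2*(x*x+y*y)]])

def preintegrate(acc, gyr, dt, b_a, b_w):
    """Midpoint pre-integration of alpha, beta, gamma in frame b_k."""
    alpha, beta = np.zeros(3), np.zeros(3)
    gamma = np.array([1.0, 0, 0, 0])          # unit quaternion at start
    for i in range(len(acc) - 1):
        w_mid = 0.5 * ((gyr[i] - b_w) + (gyr[i+1] - b_w))
        gamma_next = quat_mult(gamma, np.concatenate([[1.0], 0.5 * w_mid * dt]))
        gamma_next /= np.linalg.norm(gamma_next)
        a_mid = 0.5 * (quat_to_rot(gamma) @ (acc[i] - b_a)
                       + quat_to_rot(gamma_next) @ (acc[i+1] - b_a))
        alpha += beta * dt + 0.5 * a_mid * dt * dt
        beta += a_mid * dt
        gamma = gamma_next
    return alpha, beta, gamma

# constant 1 m/s^2 along x, no rotation: 100 steps at 100 Hz = 1 s
n, dt = 101, 0.01
acc = np.tile([1.0, 0, 0], (n, 1))
gyr = np.zeros((n, 3))
alpha, beta, gamma = preintegrate(acc, gyr, dt, np.zeros(3), np.zeros(3))
# expect alpha ≈ 0.5*a*t^2 = 0.5 and beta ≈ a*t = 1.0 along x
```

The constant-acceleration case has a closed form (α = ½at², β = at, γ = identity), which makes the recursion easy to check.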
Further, in step 3, the IMU data and visual data are tightly coupled for joint optimization using a sliding-window method, in which a subset of frames is selected as keyframe information and jointly optimized with the remaining states.
Further, the sliding-window optimization in step 3 proceeds as follows:
The parameter variables to be optimized are written as

X = [x_0, x_1, …, x_n, x_c^b, λ_0, λ_1, …, λ_m]
x_k = [p_{b_k}^w, v_{b_k}^w, q_{b_k}^w, b_a, b_g],  k ∈ [0, n]

where x_k denotes the camera state at each time, containing the position p_{b_k}^w, velocity v_{b_k}^w and rotation q_{b_k}^w, together with the accelerometer and gyroscope bias matrices b_a, b_g at each moment; x_c^b is the transformation matrix between the camera coordinate system and the body coordinate system, and λ_n is the inverse depth value. The objective function

min_X { ‖r_p − H_p X‖² + Σ_{k∈B} ‖r_B(ẑ_{b_{k+1}}^{b_k}, X)‖² + Σ_{(l,j)∈C} ‖r_C(ẑ_l^{c_j}, X)‖² }

is solved and optimized, where the first part of the objective represents the marginalized residual, the second part is the IMU residual, and the third part is the visual error function.
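The structure of this objective — a prior block plus IMU and visual residual blocks stacked into a single least-squares problem — can be illustrated with a linear toy example. All matrices and values here are hypothetical; the real residuals are nonlinear and minimized iteratively:

```python
import numpy as np

# Toy analogue of the sliding-window objective: one prior (marginalized)
# block, one IMU-like block and one vision-like block, all linear here,
# stacked and solved jointly by least squares.
x_true = np.array([1.0, -2.0])           # "state" to recover

H_p = np.eye(2);               r_p = H_p @ x_true       # prior block
A_imu = np.array([[1., 1.]]);  r_imu = A_imu @ x_true   # IMU residual block
A_vis = np.array([[2., -1.]]); r_vis = A_vis @ x_true   # visual residual block

A = np.vstack([H_p, A_imu, A_vis])       # stack all residual Jacobians
b = np.concatenate([r_p, r_imu, r_vis])  # stack all residual targets
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
```

With consistent noise-free blocks the joint solve recovers the state exactly; in the real system each block would additionally be weighted by its covariance.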
Advantageous effects
The invention establishes a monocular-vision-based motion estimation model through visual-inertial fusion, thereby improving the precision of motion estimation. By adding a tightly coupled visual-inertial navigation method on top of the traditional SFM motion-estimation method, the overall accuracy of the three-dimensional information obtained by the algorithm is improved.
The pre-warping and local mesh-warping scheme adopted by the invention adds motion compensation to the traditional image stabilization approach and effectively handles the parallax problem, so the stabilized result is more natural and the visual quality of the stabilization algorithm is effectively improved.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic of the epipolar geometry. For adjacent image frames I_1, I_2, let the motion I_1 → I_2 be R, t, and let the camera center in each frame be O_1, O_2. Suppose the feature point p_1 in frame I_1 corresponds to the feature point p_2 in frame I_2; the two are matched by the ORB feature-point method and screened by RANSAC, and the resulting pair can be seen as projections of the same spatial point onto the two imaging planes. The rays O_1 p_1 and O_2 p_2 therefore intersect at a point P in three-dimensional space. From O_1, O_2 and P we obtain a plane called the epipolar plane; the line between O_1 and O_2 intersects the image frames at two points e_1, e_2, called the epipoles, and the segment O_1 O_2 is called the baseline. The intersections l_1, l_2 of the epipolar plane with the two image planes I_1, I_2 are called the epipolar lines.
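The relations in FIG. 1 can be checked numerically: building E = t^∧ R and F = K^{-T} E K^{-1} from an assumed motion, the epipolar line l_2 = F p_1 passes through the matched point p_2, i.e. p_2^T F p_1 = 0. All numbers below are illustrative:

```python
import numpy as np

def skew(v):
    """Cross-product (hat) matrix t^ so that skew(t) @ x == cross(t, x)."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

K = np.array([[500., 0, 320], [0, 500., 240], [0, 0, 1]])
th = 0.1                                       # assumed rotation angle
R = np.array([[np.cos(th), 0, np.sin(th)],
              [0, 1, 0],
              [-np.sin(th), 0, np.cos(th)]])
t = np.array([-0.5, 0.02, 0.01])               # assumed translation

E = skew(t) @ R                                # essential matrix E = t^ R
F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)  # fundamental matrix

P = np.array([0.4, -0.2, 6.0])                 # a 3-D point, frame-1 coords
p1 = K @ P;           p1 = p1 / p1[2]          # homogeneous pixel in I1
p2 = K @ (R @ P + t); p2 = p2 / p2[2]          # homogeneous pixel in I2

l2 = F @ p1                                    # epipolar line of p1 in I2
residual = p2 @ l2                             # = p2^T F p1, vanishes
```

The residual is zero analytically for any point P, which is exactly the epipolar constraint used to estimate E and F from matches.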
FIG. 2 is a schematic diagram of IMU integration, wherein red represents visual data and green represents IMU data.
Detailed Description
The invention provides a three-dimensional image stabilization method based on visual-inertial navigation information fusion, addressing the problem that the precision of traditional image stabilization methods cannot meet the requirements of future high-precision stabilization algorithms. The method comprises two main parts: motion estimation and motion compensation. Motion estimation combines visual and inertial information, effectively overcoming the poor motion-vector estimation of the classical SFM method and improving the spatial accuracy of the motion-compensated three-dimensional feature points and the overall motion vector. Motion compensation pre-warps according to the obtained motion vector and three-dimensional feature point cloud, then performs mesh segmentation and three-dimensional image warping for further compensation, obtaining the stabilized result and improving stabilization precision.
The main idea of the invention is as follows: first, calibrate the camera and the visual-inertial sensor, obtaining the distortion parameters and intrinsic matrix of the monocular camera, and calibrate the IMU error parameters to obtain the IMU error model used in the subsequent tightly coupled joint optimization; second, perform monocular visual initialization to obtain sufficient three-dimensional feature points in the world coordinate system; third, tightly couple the IMU data and visual data for joint optimization, obtaining the camera transformation matrix and the optimized three-dimensional feature points; fourth, pre-warp according to the obtained transformation matrix and apply a local mesh warp to the obtained feature points to obtain the final stabilized result.
The specific process is as follows:
Calibrate the camera and the IMU (Inertial Measurement Unit), obtaining the distortion parameters and intrinsic matrix of the monocular camera, and calibrate the IMU error parameters to obtain the IMU error model for the subsequent tightly coupled joint optimization;
the specific calibration method of vision and IMU calibration can be calibrated by referring to a github open source calibration tool Kalibr. The IMU calibration is to obtain the error of angular velocity and linear velocity to correct an IMU error model for scale acquisition and final overall optimization, the visual calibration is to obtain an internal reference matrix K and a distortion coefficient, the parameters can be further corrected by combined calibration, and a conversion matrix between the IMU and a camera carrier is obtained, and a monocular pinhole camera model is adopted by default. The distortion parameter of which comprises the tangential distortion p1,p2And radial distortion k1,k2,k3:
Wherein X and Y are measured values of the camera normalized coordinate system, X and Y are actual values of the camera normalized coordinate system, and r2=x2+y2The true values x, y after distortion correction can be obtained from the equation (1-1).
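Equation (1-1) and its inversion by fixed-point iteration can be sketched as follows; the distortion coefficients below are hypothetical, chosen only to exercise the model:

```python
def distort(x, y, k1, k2, k3, p1, p2):
    """Forward radial-tangential model of eq. (1-1): true (x, y) -> measured."""
    r2 = x*x + y*y
    rad = 1 + k1*r2 + k2*r2**2 + k3*r2**3
    xd = x*rad + 2*p1*x*y + p2*(r2 + 2*x*x)
    yd = y*rad + p1*(r2 + 2*y*y) + 2*p2*x*y
    return xd, yd

def undistort(xd, yd, k1, k2, k3, p1, p2, iters=20):
    """Invert eq. (1-1) by fixed-point iteration, as done for correction."""
    x, y = xd, yd                       # initial guess: the measured value
    for _ in range(iters):
        r2 = x*x + y*y
        rad = 1 + k1*r2 + k2*r2**2 + k3*r2**3
        dx = 2*p1*x*y + p2*(r2 + 2*x*x)
        dy = p1*(r2 + 2*y*y) + 2*p2*x*y
        x = (xd - dx) / rad             # peel off tangential, divide radial
        y = (yd - dy) / rad
    return x, y

# hypothetical mild distortion coefficients
k1, k2, k3, p1, p2 = -0.28, 0.07, 0.0, 1e-4, -2e-4
x0, y0 = 0.3, -0.2                      # true normalized coordinates
xd, yd = distort(x0, y0, k1, k2, k3, p1, p2)
xr, yr = undistort(xd, yd, k1, k2, k3, p1, p2)
```

For mild distortion the fixed-point iteration converges quickly; round-tripping a point through `distort` then `undistort` recovers the original coordinates.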
Performing monocular vision initialization to obtain sufficient three-dimensional characteristic points under a world coordinate system;
After distortion correction, the input image frames can be assumed distortion-free; the pixels of frames taken thereafter are by default already corrected, and initialization can then proceed. As shown in FIG. 1, image frames I_1, I_2 share a common three-dimensional point P located at [X, Y, Z]^T in the world coordinate system. According to the monocular camera model, the projections of this point in the two images satisfy equation (1-2):

s_1 p_1 = K P,  s_2 p_2 = K (R P + t)    (1-2)

where R and t are the rotation matrix and displacement vector between the two adjacent camera frames. Writing x_1 = K^{-1} p_1, x_2 = K^{-1} p_2, equation (1-3) follows from (1-2):

x_2^T E x_1 = 0,  p_2^T F p_1 = 0,  E = t^∧ R,  F = K^{-T} E K^{-1}    (1-3)

The essential matrix E and the fundamental matrix F are obtained from equation (1-3), and the rotation matrix R and translation t can be further obtained from E and F, for example by the eight-point or five-point method. Triangulation can then recover the relative depth of each feature point; the depth is only relative, mainly because a monocular camera cannot acquire the actual scale. To jointly optimize the scale, the scale information is obtained by means of the IMU pre-integration model. A further function of the IMU pre-integration is to provide a linear model of the IMU integration error transfer formula for the subsequent joint optimization.
The IMU pre-integration model: from the IMU calibration in step 1, the gyroscope bias b_w, the accelerometer bias b_a and the additive noises n_a, n_w are obtained.
To keep the IMU data-acquisition module generic, a 6-axis IMU is selected for data acquisition, so the accelerometer provides the measurement â_t and the gyroscope the angular-velocity measurement ω̂_t, equation (1-4):

â_t = a_t + b_a + R_w^{b_t} g^w + n_a,  ω̂_t = ω_t + b_w + n_w    (1-4)

where the subscript b denotes the body coordinate system, R_w^{b_t} is the rotation matrix transferring the world coordinate system into the body coordinate system, and g^w is the gravity vector in the world coordinate system.
Suppose the image frames acquired by the vision sensor during [t_k, t_{k+1}] are k and k+1, with corresponding body frames b_k and b_{k+1}. The position, velocity and orientation values in the world coordinate system are propagated through the IMU measurements over the interval as equations (1-5) to (1-8):

p_{b_{k+1}}^w = p_{b_k}^w + v_{b_k}^w Δt_k + ∬_{t∈[t_k,t_{k+1}]} (R_t^w (â_t − b_a) − g^w) dt²    (1-5)
v_{b_{k+1}}^w = v_{b_k}^w + ∫_{t∈[t_k,t_{k+1}]} (R_t^w (â_t − b_a) − g^w) dt    (1-6)
q_{b_{k+1}}^w = q_{b_k}^w ⊗ ∫_{t∈[t_k,t_{k+1}]} (1/2) Ω(ω̂_t − b_w) q_t^{b_k} dt    (1-7)
Ω(ω) = [ −ω^∧  ω ; −ω^T  0 ]    (1-8)

where Δt_k is the interval [t_k, t_{k+1}] and R_t^w is the rotation matrix from the body coordinate system at time t to world coordinates. As can be seen from FIG. 2, the state at b_{k+1} depends on the real-time state at b_k, so using these equations directly in the subsequent optimization would greatly increase the overall computation: the state fed back to the IMU at every iteration of the optimizer is updated, forcing re-integration.
To fuse the visual and IMU data, the position, velocity and orientation values in the world coordinate system should be converted into the body coordinate system, i.e. multiplied by the rotation matrix R_w^{b_k} from the world frame w to the body frame b_k at the corresponding time. Equations (1-5), (1-6), (1-7) can then be rewritten as (1-9), (1-10), (1-11):

R_w^{b_k} p_{b_{k+1}}^w = R_w^{b_k} (p_{b_k}^w + v_{b_k}^w Δt_k − (1/2) g^w Δt_k²) + α_{b_{k+1}}^{b_k}    (1-9)
R_w^{b_k} v_{b_{k+1}}^w = R_w^{b_k} (v_{b_k}^w − g^w Δt_k) + β_{b_{k+1}}^{b_k}    (1-10)
q_w^{b_k} ⊗ q_{b_{k+1}}^w = γ_{b_{k+1}}^{b_k}    (1-11)

wherein:

α_{b_{k+1}}^{b_k} = ∬ R_t^{b_k} (â_t − b_a) dt²,  β_{b_{k+1}}^{b_k} = ∫ R_t^{b_k} (â_t − b_a) dt,  γ_{b_{k+1}}^{b_k} = ∫ (1/2) Ω(ω̂_t − b_w) γ_t^{b_k} dt    (1-12)

q denotes the quaternion form. The three terms α, β, γ can be regarded as the motion of b_{k+1} relative to b_k in the body coordinate system, corresponding respectively to displacement, velocity and quaternion; they depend only on the accelerometer and gyroscope biases b_a and b_w, not on the state at the previous moment. Using them as optimization variables therefore reduces the overall computation to a certain extent. Secondly, when the change between adjacent frames is small, the three values can be approximated by a first-order Taylor expansion about the current bias estimates, equation (1-13):

α_{b_{k+1}}^{b_k} ≈ α̂_{b_{k+1}}^{b_k} + J_{b_a}^α δb_a + J_{b_w}^α δb_w
β_{b_{k+1}}^{b_k} ≈ β̂_{b_{k+1}}^{b_k} + J_{b_a}^β δb_a + J_{b_w}^β δb_w
γ_{b_{k+1}}^{b_k} ≈ γ̂_{b_{k+1}}^{b_k} ⊗ [1, (1/2) J_{b_w}^γ δb_w]^T    (1-13)
there are various pre-integration methods for IMU in discrete time, for example, euler integration method, RK4 integration method, median method, etc. are used, and in this embodiment, the median method is performed on the equation (1-13), and the following can be obtained:
at the initial moment of time, the device is started,all state values are 0 and noise value na、nwThe value of (a) is 0,is the unit quaternion and t is the time interval between adjacent measurements of the IMU.
Through the above process, the IMU pre-integrated measurement values have been obtained. To complete the full pre-integration flow for use in nonlinear optimization, the covariance matrix of the equations must also be derived and the corresponding Jacobians solved. A linear-Gaussian error-state recurrence is therefore established from the measurements, and the error transfer function matrix follows from equations (1-9), (1-10), (1-11) using the covariance propagation of a linear-Gaussian system; the detailed form is equation (1-15), where R_k and R_{k+1} are related linearly through the kinematic integration equation and obtained from the rotation matrix at the previous moment. In (1-15), the error state is δz = [δα, δθ, δβ, δb_a, δb_w]^T, and the matrices H and V are assembled from the blocks of the midpoint recursion (1-14). For simplicity, the error transfer equation is abbreviated as
δz_{k+1} = H δz_k + V n    (1-15)
The initial value of the Jacobian is set to the identity matrix I, with the iterative formula:
J_{k+1} = H J_k
from the error propagation model, an iterative formula of covariance can be obtained as
Pk+1=HPHT+VQVT
The initial value of the covariance is 0. The IMU pre-integration provides an initial observation value for the subsequent information-fusion initialization and a measurement term for the iterative optimization. In practice, to fuse the two kinds of information more accurately, the IMU and the camera are jointly calibrated in step 1 to obtain their distortion parameters, and the acquired data are undistorted to improve accuracy.
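The Jacobian and covariance recursions Jk+1 = F Jk and Pk+1 = H Pk HT + V Q VT can be sketched generically as follows; as an assumption for illustration, a single state-transition matrix is passed in and plays the role of both F and H, which is consistent with the recursions above but not necessarily the patent's exact factorization:

```python
import numpy as np

def propagate(H, V, Q, steps):
    """Iterate the error-state Jacobian and covariance from their
    stated initial values: identity Jacobian, zero covariance."""
    n = H.shape[0]
    J = np.eye(n)            # Jacobian initialised to the identity I
    P = np.zeros((n, n))     # covariance initialised to zero
    for _ in range(steps):
        J = H @ J                        # J_{k+1} = F J_k
        P = H @ P @ H.T + V @ Q @ V.T    # P_{k+1} = H P_k H^T + V Q V^T
    return J, P
```

The accumulated J is later used to correct the pre-integrated terms when the bias estimates change, and P weights the IMU residual in the tightly-coupled optimization.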
Substituting equations (1-15) back into equations (1-13) yields the corresponding reference values; this part supplies the numerical components required by the subsequent tight coupling.
In addition, to recover the actual scale of the three-dimensional space, the scale result can be obtained by solving equations (1-16); this system can be decomposed by LDLT to obtain the final scale information s, so that the actual depths of the feature points and the poses in the solution can be recovered in the world coordinate system. The superscript c0 denotes the camera coordinate system of the first camera image frame; R and p denote the rotation matrix and the translation, respectively. Equations (1-16) are optimized by least squares to obtain the optimized gravity; from the change of gravity, the transformation from c0 to the world coordinate system is obtained, and the variables in the initial camera coordinate system (the translation and the three-dimensional feature points recovered by triangulation in the visual epipolar-geometry step) are then transferred to the world coordinate system and the scale is recovered, at which point the initialization is finished.
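The scale recovery amounts to solving a linear system. As a hedged sketch (the actual coefficient matrix of equations (1-16) is not reproduced here; A and b are placeholders, and a Cholesky factorization stands in for the LDLT decomposition mentioned in the text):

```python
import numpy as np

def solve_scale(A, b):
    """Solve A x = b in the least-squares sense via the normal
    equations, factorising A^T A; by convention here the last entry
    of the solution vector is taken as the scale s."""
    AtA = A.T @ A
    Atb = A.T @ b
    L = np.linalg.cholesky(AtA)       # stands in for the LDL^T factor
    y = np.linalg.solve(L, Atb)       # forward substitution
    x = np.linalg.solve(L.T, y)       # back substitution
    return x, x[-1]                   # full state and scale s
```

With s in hand, the triangulated depths and poses are rescaled into metric units in the world coordinate system.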
The initialization can adopt a sliding-window method on the visual image frames, continuously keeping 10 frames for initialization until it succeeds. After successful initialization, enough three-dimensional feature points x in the world coordinate system are available, and the feature points and the transformation matrix in the world coordinate system can then be solved with the P3P and ICP methods, effectively reducing the computational load.
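The ICP-style pose step aligns corresponding three-dimensional point sets. A minimal closed-form (Kabsch/SVD) rigid-alignment sketch, offered as an illustrative stand-in rather than the patent's actual P3P/ICP solver:

```python
import numpy as np

def rigid_align(src, dst):
    """Closed-form rigid transform (R, t) minimising ||R src + t - dst||
    over corresponding (N, 3) point sets, via SVD (Kabsch)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    Hm = (src - cs).T @ (dst - cd)        # cross-covariance matrix
    U, _, Vt = np.linalg.svd(Hm)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])            # guard against reflections
    R = Vt.T @ D @ U.T
    t = cd - R @ cs
    return R, t
```

Given matched 3-D feature points in two frames, this returns the inter-frame rotation and translation directly, without iterative minimization.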
(III) Tightly coupling the IMU data and the visual data for joint optimization, obtaining the camera transformation matrix and the optimized three-dimensional spatial feature points.
If only the pose calculation result is used without tightly-coupled optimization, errors accumulate slowly over time; if all data are used as conditions for the subsequent optimization, the computational load grows excessively. The method therefore selects a sliding-window approach for the optimization, choosing part of the frame information as keyframe information and jointly optimizing it with the other solutions to obtain the final optimized result. A keyframe is selected when the parallax is sufficiently large, with the specific choice made according to the actual situation. The parameter variables to be optimized can be represented by equations (1-17)
Here x_k, k = 0, 1, ..., n denotes the camera state at each moment, including the position, velocity, and rotation, together with the accelerometer bias b_a and the gyroscope bias b_g at each moment; a transformation matrix between the camera coordinate system and the body coordinate system; and λ_n, the inverse depth values (the reciprocal of the depth of each feature point in the normalized camera coordinate system; their number equals the number of landmark feature points contained in the keyframes; they conform to a Gaussian distribution and simplify the computational constraints). The objective function for the state-estimation optimization can further be expressed as
In equations (1-18), the first part represents the marginalization residual, the second part the IMU residual, and the third part the visual error function.
The visual residual function obtains the optimal solution by minimizing the error of the normalized coordinates of the same feature point observed at different moments. For the feature points, the geometric relationship (1-19) holds
Here T denotes a transformation matrix representing four successive coordinate-system conversions: from the camera coordinate system at moment i into the inertial (body) coordinate system, then into the world coordinate system, then into the inertial coordinate system at moment j, and finally into the camera coordinate system. From equations (1-19), the new relations (1-20) can then be obtained
where R and p correspond to the rotation and translation matrices obtained by decomposing the transformation matrix; for a pinhole camera, the visual residual formula is
In combination with equations (1-20), the minimized visual residual can be obtained.
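As an illustration of the minimized visual residual on the normalized image plane (not the patent's implementation; names such as reproject and T_wc are assumptions):

```python
import numpy as np

def reproject(T_wc, X_w):
    """Project world point X_w into the camera whose pose is the 4x4
    world-from-camera transform T_wc; returns normalized coordinates."""
    T_cw = np.linalg.inv(T_wc)
    Xc = T_cw[:3, :3] @ X_w + T_cw[:3, 3]
    return Xc[:2] / Xc[2]

def visual_residual(obs_norm, T_wc, X_w):
    """Difference between the observed normalized coordinates of a
    feature and its reprojection; summed over all observations, this
    is the visual error term of the objective function."""
    return obs_norm - reproject(T_wc, X_w)
```

During the joint optimization, this residual is evaluated for every feature/keyframe pair in the window and stacked into the third part of equations (1-18).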
The second part is the error of the IMU component, obtained according to the pre-integration formulas (1-9), (1-10), and (1-11), giving formula (1-22)
The first part is the marginalization residual, introduced mainly as a trade-off between running speed and performance. If only two adjacent image frames were considered when optimizing the pose, the algorithm would run fast, but errors would accumulate more quickly; conversely, optimizing over the information of all frames at every frame is highly redundant and computationally expensive. To keep the computational load roughly fixed, a sliding-window method is selected to perform nonlinear optimization over a fixed number of frames, and marginalization is introduced. Its purpose is to stop computing the pose of an edge frame and its associated landmark points while retaining that frame's window constraints, accelerating the computation while preserving the sparsity of the optimization matrix. Marginalization specifically considers the following two cases. First, when a new reference frame is added to the sliding window and the second-to-last frame in the current window is a keyframe, the pose of the oldest frame is moved out, and the visual and inertial data associated with it are marginalized as prior values. Second, when the second-to-last frame is not a keyframe, the visually related portion is marginalized while the relevant IMU constraints are preserved. That is, Schur elimination removes the non-key information, the matrix dimensions are expanded according to the quantities to be optimized, and finally the open-source Ceres library can be adopted to solve the overall optimization, yielding pose information with smaller errors and the feature-point poses in the world coordinate system. This part finally produces the inter-frame transformation matrices and the three-dimensional spatial point information used for the subsequent weak-constraint transformation.
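The Schur elimination described above can be sketched generically as follows; this illustrates marginalizing the trailing block of a normal-equation system, with all names (H, b, k) assumed rather than taken from the patent:

```python
import numpy as np

def schur_marginalize(H, b, k):
    """Marginalise the last k variables of H x = b via the Schur
    complement, returning the reduced (prior) system on the
    remaining n - k variables."""
    n = H.shape[0] - k
    Haa, Hab = H[:n, :n], H[:n, n:]
    Hba, Hbb = H[n:, :n], H[n:, n:]
    Hbb_inv = np.linalg.inv(Hbb)
    H_prior = Haa - Hab @ Hbb_inv @ Hba   # Schur complement of Hbb
    b_prior = b[:n] - Hab @ Hbb_inv @ b[n:]
    return H_prior, b_prior
```

The reduced system keeps the constraints the dropped frame imposed on the remaining window without carrying its variables, which is exactly the prior term entering equations (1-18).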
(IV) Processing and compensating the acquired three-dimensional motion vectors and spatial poses: pre-warping according to the obtained transformation matrix, and performing local grid transformation on the obtained feature points to acquire the final image-stabilization result.
Before the subsequent steps, motion filtering is performed on the obtained inter-frame motion vectors using the steps of the classical image-stabilization method; pre-warping then applies a primary compensation according to the global motion result, after which weak-constraint compensation performs a secondary motion compensation, producing the compensated, stabilized image. In the secondary motion compensation, the information used is mainly the previously acquired three-dimensional spatial information: it is projected into the stabilized frame, the projection errors of the corresponding points are observed, and weak-constraint compensation is applied. The core idea of the weak-constraint compensation comprises the following two parts:
(1) Correcting the feature points in the image with strong constraints may completely deform the image, producing temporal incoherence and distorting the transformed edge regions. Using the calculated feature points as weak constraints instead effectively distributes the distortion to the other flat regions of the image. The method therefore adopts the weak-constraint approach, making the interference visually negligible while keeping the image as undistorted as possible.
(2) To maintain the content and details of the image while it is deformed, additional constraints are needed so that the deformation conforms to a similarity transformation. In addition, in regions where feature points are sparse, the constraint can be reduced so that the image distortion is distributed to visually inconspicuous areas. To this end, the input video frame image is divided uniformly into n x m parts; each part is called a grid cell, and a feature point lies within a cell.
The projected position of each obtained feature point in the image is denoted accordingly. In the algorithm, each feature point is represented by the four vertices of the grid cell containing it, via bilinear interpolation. Suppose the input cell is described by the vector of its four vertices, and V_k are the corresponding four cell vertices in the output image. The vector ω_k holds the bilinear interpolation coefficients, which sum to 1. The data term can then be represented as a least-squares problem:
where V_k are the four unknown vertices and P_k is the feature-point position refined through the nonlinear optimization. The purpose of this data term is to minimize the distance between the feature point P_k in the output image and the bilinear-interpolated position of the original image's feature point within its grid cell.
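As an illustrative sketch of the data term (1-23) (assumed names; not the patent's code), the bilinear weights and the residual can be written as:

```python
import numpy as np

def bilinear_weights(u, v):
    """Interpolation coefficients for a point at local cell
    coordinates (u, v) in [0,1]^2; the four weights sum to 1.
    Vertex order assumed: (0,0), (1,0), (0,1), (1,1)."""
    return np.array([(1-u)*(1-v), u*(1-v), (1-u)*v, u*v])

def data_term(V_out, w, P_k):
    """Squared residual ||sum_i w_i V_i - P_k||^2 of equation (1-23):
    V_out is the (4, 2) array of output-cell vertices, w the bilinear
    weights, P_k the optimized feature position in the output image."""
    return float(np.sum((w @ V_out - P_k) ** 2))
```

Summing this term over all features gives the data part of the mesh-warp energy; the unknowns are the output vertices V_out.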
The similarity-transformation term measures the difference between a grid cell in the output image and the similarity transformation of the corresponding input cell. Each cell is divided into two triangles to build the model.
Here u and v are known values that can be computed in the original image; when the grid cell is square, u is 0 and v is 1. When the similarity-transformation relation is not satisfied in the output image, V1 deviates from its ideal position; as shown in Figure 3, the transformed ideal position is indicated by the dotted line, and V1 is the actual position. A similarity transformation is then required so that the distance between the transformed position and the ideal position is minimized:
Es(V1)=ωs||V1-(V2+u(V3-V2)+vR90(V3-V2))||2 (1-25)
where the weight ω_s used to compute the optimal solution in each cell is set to the two-norm of the color variance; the variance of each cell can then be regarded as the sum over the similarity transformations of the two triangles it is divided into, giving 8 terms in total. When equations (1-23) and (1-25) reach their minimum simultaneously, the ideal optimal coordinates of the output vertices in the grid are obtained; a standard texture-mapping algorithm is then used to render the further-stabilized output image, which is cropped to eliminate the black borders, yielding the final stabilized image frame.
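A sketch of the similarity term (1-25) under the stated convention u = 0, v = 1 for square cells; the sign convention chosen for the 90-degree rotation R90 is an assumption:

```python
import numpy as np

# 90-degree in-plane rotation (sign convention assumed)
R90 = np.array([[0.0, 1.0], [-1.0, 0.0]])

def similarity_term(V1, V2, V3, u, v, w_s=1.0):
    """Squared residual of equation (1-25): how far triangle vertex V1
    lies from the position predicted by a similarity transform of the
    triangle (V1, V2, V3) with known local coordinates u, v."""
    V1_ideal = V2 + u * (V3 - V2) + v * (R90 @ (V3 - V2))
    return w_s * float(np.sum((V1 - V1_ideal) ** 2))
```

Each cell contributes two such triangle terms; minimizing their sum jointly with the data term yields the output mesh vertices.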
Considering possible future real-time requirements, the method slightly relaxes the visual-quality requirement, keeping a certain amount of temporal visual incoherence within an acceptable range, and preferentially selects feature points that appear continuously in the previous keyframe and the data frame as the image-stabilization feature points, ensuring the consistency of the algorithm as far as possible.
And then, continuously repeating the steps to obtain a stable output image sequence.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.
Claims (7)
1. A three-dimensional software image stabilization method based on visual-inertial information fusion, characterized by comprising the following steps:
Step 1: calibrating the camera and the IMU (Inertial Measurement Unit), acquiring the distortion parameters and intrinsic matrix of the monocular camera, and calibrating the IMU error parameters to obtain the IMU error model for the subsequent tightly-coupled joint optimization;
step 2: performing monocular vision initialization to obtain sufficient three-dimensional characteristic points under a world coordinate system;
Step 3: tightly coupling the IMU data and the visual data for joint optimization, acquiring the camera transformation matrix and the optimized three-dimensional spatial feature points;
Step 4: pre-warping according to the obtained transformation matrix, and performing local grid transformation on the obtained feature points to obtain the final image-stabilization result.
2. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 1, characterized in that: in step 1, the camera and the IMU are calibrated with reference to the open-source calibration tool Kalibr on GitHub.
3. The three-dimensional software image stabilization method based on visual inertial information fusion according to claim 1, characterized in that: in the step 1, errors of angular velocity and linear velocity are obtained through IMU calibration to correct an IMU error model; acquiring a camera internal reference matrix K and a distortion coefficient through visual calibration; and further correcting the parameters through combined calibration, and acquiring a conversion matrix between the IMU and the camera carrier.
4. The three-dimensional software image stabilization method based on visual inertial information fusion according to claim 1, characterized in that: in step 2, the process of performing monocular vision initialization is as follows:
for the image frames I1 and I2 obtained by monocular vision, there is a common three-dimensional point P in the world coordinate system, whose position is [X, Y, Z]T; the poses of this three-dimensional point in the two image frames satisfy the relations:
s1p1=KP,s2p2=K(RP+t)
x1=K^(-1)p1, x2=K^(-1)p2
where R and t are the rotation matrix and displacement vector between two adjacent camera frames; the pose relation can thus be obtained
obtaining the essential matrix E and the fundamental matrix F, further obtaining the rotation matrix R and the translation vector t from them, and recovering the relative depth of each feature point by triangulation; the scale information of each feature point is then obtained using the IMU pre-integration model.
5. The three-dimensional software image stabilization method based on visual inertial information fusion according to claim 4, characterized in that: the IMU pre-integration model is as follows:
obtaining the gyroscope bias b_w, the accelerometer bias b_a, and the additive noise n_a, n_w according to the IMU calibration of step 1;
where the subscript b indicates the body coordinate system, the rotation matrix transfers from the world coordinate system to the body coordinate system, and g_w is the gravity vector in the world coordinate system; if the image frames acquired by the camera in [t_k, t_k+1] are k and k+1 respectively, and the corresponding positions of a feature point in the body coordinate system are b_k and b_k+1, then the propagation of the position, velocity, and orientation values in the world coordinate system, obtained from the IMU measurements over the time interval, is:
where Δt_k is the time interval [t_k, t_k+1] and the rotation matrix transfers the body coordinate system to the world coordinate system; the position, velocity, and orientation values in the world coordinate system are further converted into the body coordinate system:
wherein:
performing a first-order Taylor expansion on the formula to obtain an approximation, and then applying the median method to obtain:
and obtaining an error transfer function matrix:
wherein
Obtaining corresponding parameters by using the error transfer function matrix; by using
obtaining the scale information s so as to recover the actual depth of each feature point and the result in the world coordinate system, where the superscript c0 denotes the camera coordinate system of the first camera image frame.
6. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 1, characterized in that: in step 3, when the IMU data and the visual data are tightly coupled for joint optimization, a sliding-window method is adopted, and part of the frame information is selected as keyframe information for joint optimization with the other solutions.
7. The three-dimensional software image stabilization method based on visual inertial information fusion of claim 6, wherein: the optimization process by adopting a sliding window method in the step 3 comprises the following steps:
using formulas
representing the parameter variables to be optimized, where x_k, k = 0, 1, ..., n denotes the camera state at each moment, including the position, velocity, and rotation, together with the bias matrices b_a and b_g of the accelerometer and gyroscope at each moment; a transformation matrix between the camera coordinate system and the body coordinate system; and the inverse depth values λ_n; and solving the objective function
for the optimization, where the first part of the objective function represents the marginalization residual, the second part the IMU residual, and the third part the visual error function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110497661.XA CN113240597B (en) | 2021-05-08 | 2021-05-08 | Three-dimensional software image stabilizing method based on visual inertial information fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113240597A true CN113240597A (en) | 2021-08-10 |
CN113240597B CN113240597B (en) | 2024-04-26 |
Family
ID=77132324
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110497661.XA Active CN113240597B (en) | 2021-05-08 | 2021-05-08 | Three-dimensional software image stabilizing method based on visual inertial information fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113240597B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110030994A (en) * | 2019-03-21 | 2019-07-19 | 东南大学 | A kind of robustness vision inertia close coupling localization method based on monocular |
CN110345944A (en) * | 2019-05-27 | 2019-10-18 | 浙江工业大学 | Merge the robot localization method of visual signature and IMU information |
CN110498039A (en) * | 2019-08-05 | 2019-11-26 | 北京科技大学 | A kind of intelligent monitor system based on bionic flapping-wing flying vehicle |
WO2020087846A1 (en) * | 2018-10-31 | 2020-05-07 | 东南大学 | Navigation method based on iteratively extended kalman filter fusion inertia and monocular vision |
WO2021027323A1 (en) * | 2019-08-14 | 2021-02-18 | 北京理工大学 | Hybrid image stabilization method and device based on bionic eye platform |
CN112749665A (en) * | 2021-01-15 | 2021-05-04 | 东南大学 | Visual inertia SLAM method based on image edge characteristics |
Non-Patent Citations (1)
Title |
---|
CHENG Chuanqi; HAO Xiangyang; LI Jiansheng; LIU Zhiwei; HU Peng: "Monocular visual/inertial integrated navigation algorithm based on nonlinear optimization", Journal of Chinese Inertial Technology, no. 05 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115147325A (en) * | 2022-09-05 | 2022-10-04 | 深圳清瑞博源智能科技有限公司 | Image fusion method, device, equipment and storage medium |
CN116208855A (en) * | 2023-04-28 | 2023-06-02 | 杭州未名信科科技有限公司 | Multi-tower crane cradle head panoramic image jitter coordination inhibition method and system |
CN116208855B (en) * | 2023-04-28 | 2023-09-01 | 杭州未名信科科技有限公司 | Multi-tower crane cradle head panoramic image jitter coordination inhibition method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109166149B (en) | Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU | |
CN107255476B (en) | Indoor positioning method and device based on inertial data and visual features | |
WO2024045632A1 (en) | Binocular vision and imu-based underwater scene three-dimensional reconstruction method, and device | |
CN110068335B (en) | Unmanned aerial vehicle cluster real-time positioning method and system under GPS rejection environment | |
CN104732518B (en) | A kind of PTAM improved methods based on intelligent robot terrain surface specifications | |
CN112304307A (en) | Positioning method and device based on multi-sensor fusion and storage medium | |
CN112102458A (en) | Single-lens three-dimensional image reconstruction method based on laser radar point cloud data assistance | |
CN110349249B (en) | Real-time dense reconstruction method and system based on RGB-D data | |
CN113240597A (en) | Three-dimensional software image stabilization method based on visual inertial information fusion | |
JP2023505891A (en) | Methods for measuring environmental topography | |
CN114964276B (en) | Dynamic vision SLAM method integrating inertial navigation | |
CN112669354A (en) | Multi-camera motion state estimation method based on vehicle incomplete constraint | |
CN110942477A (en) | Method for depth map fusion by using binocular camera and laser radar | |
CN114998556B (en) | Virtual-real fusion method for mixed reality flight simulation system | |
CN113345032B (en) | Initialization map building method and system based on wide-angle camera large distortion map | |
Wang et al. | A novel binocular vision system for accurate 3-D reconstruction in large-scale scene based on improved calibration and stereo matching methods | |
CN113211433B (en) | Separated visual servo control method based on composite characteristics | |
Liu et al. | Dense stereo matching strategy for oblique images that considers the plane directions in urban areas | |
CN112767481B (en) | High-precision positioning and mapping method based on visual edge features | |
CN117115271A (en) | Binocular camera external parameter self-calibration method and system in unmanned aerial vehicle flight process | |
CN117237789A (en) | Method for generating texture information point cloud map based on panoramic camera and laser radar fusion | |
CN116182855B (en) | Combined navigation method of compound eye-simulated polarized vision unmanned aerial vehicle under weak light and strong environment | |
CN111145267B (en) | 360-degree panoramic view multi-camera calibration method based on IMU assistance | |
CN114485648B (en) | Navigation positioning method based on bionic compound eye inertial system | |
CN116630556A (en) | Method, system and storage medium for reconstructing map based on aerial map data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||