CN113240597B - Three-dimensional software image stabilizing method based on visual inertial information fusion - Google Patents

Three-dimensional software image stabilizing method based on visual inertial information fusion

Info

Publication number
CN113240597B
CN113240597B (application CN202110497661.XA)
Authority
CN
China
Prior art keywords
coordinate system
imu
matrix
camera
obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110497661.XA
Other languages
Chinese (zh)
Other versions
CN113240597A (en)
Inventor
唐成凯
李冠林
张怡
张玲玲
王晨
程卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202110497661.XA priority Critical patent/CN113240597B/en
Publication of CN113240597A publication Critical patent/CN113240597A/en
Application granted granted Critical
Publication of CN113240597B publication Critical patent/CN113240597B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T5/00 — Image enhancement or restoration
    • G01C21/165 — Dead reckoning by integrating acceleration or speed (inertial navigation) combined with non-inertial navigation instruments
    • G01S11/12 — Systems for determining distance or velocity not using reflection or reradiation, using electromagnetic waves other than radio waves
    • G06F18/25 — Pattern recognition; fusion techniques
    • G06T7/207 — Analysis of motion for motion estimation over a hierarchy of resolutions
    • G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T2207/10016 — Video; image sequence
    • Y02T10/40 — Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Electromagnetism (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a three-dimensional software image stabilization method based on visual-inertial information fusion. First, the camera and the visual-inertial sensor are calibrated to obtain the distortion parameters and intrinsic matrix of the monocular camera, and the IMU error parameters are calibrated to obtain the IMU error model used in the subsequent tightly coupled joint optimization. Second, monocular vision is initialized to obtain a sufficient number of three-dimensional feature points in the world coordinate system. Third, the IMU data and the visual data are tightly coupled and jointly optimized to obtain the camera transformation matrix and the optimized three-dimensional spatial feature points. Fourth, pre-warping is performed according to the obtained transformation matrix, and a local grid transformation is applied to the obtained feature points to produce the final stabilized image. The invention effectively overcomes the poor motion-vector estimation performance of the classical SFM method, and improves both the spatial accuracy of the three-dimensional feature points used for motion compensation and the accuracy of the overall motion vector.

Description

Three-dimensional software image stabilizing method based on visual inertial information fusion
Technical Field
The invention relates to a visual three-dimensional software image stabilization method assisted by an inertial component, and in particular to a software stabilization method based on the theory of visual-inertial information fusion.
Background
The rapid development of the electronic information age has driven technology forward, and unmanned technology has gradually expanded from ground vehicles to the air. Unmanned systems play an increasingly important role in military and civil fields such as urban transportation and battlefield strikes. The design of the sensing, navigation and control systems is critical in the development of unmanned systems. Visual navigation technology extracts rich navigation information in real time from environment images acquired by a camera, giving the unmanned system the ability to interact with its environment and raising the level of automation and intelligence of its control. However, undesirable random shake of the camera carrier can occur; in such cases the video and images acquired by the carrier become unstable and blurred, degrading video quality and viewing effect. To preserve the usefulness of the acquired video sequence, the moving camera platform must undergo anti-shake processing, i.e. image stabilization.
Image stabilization systems can be divided into mechanical, optical and digital stabilization according to the stabilization principle. A mechanical stabilization platform is bulky, has low accuracy and is easily disturbed in practice by external forces such as friction and wind resistance, so it is not well suited as the main stabilization means for small rotor UAVs. Optical stabilization performs well and stabilizes the image sequence directly by optical means, but the variable optical wedge must be achromatized because of the secondary spectrum, which makes the structure and manufacturing process of the stabilizer overly complex; moreover, passive compensation using only prisms, mirrors or optical wedges greatly limits the stabilization capability. Digital video stabilization (DVS), also called electronic image stabilization, draws on computer vision, image processing and signal processing; it relies mainly on a computer and software to implement the stabilization algorithm, requires no additional hardware support, and the software can be ported, is easy to maintain and update, has lower cost and good accuracy. Digital video stabilization is an emerging technology developed over recent decades that generates stable video sequences mainly by eliminating or reducing video jitter. According to current research models, stabilization methods can be classified into 2D, 2.5D and 3D methods. The 2D methods have a simple principle and low computational cost, but because depth of field is lacking, the stabilized result is not natural enough and the achievable performance is limited. The 2.5D methods balance the 2D and 3D approaches, with higher complexity and better stabilization, and can handle parallax; however, they require feature points that appear continuously over a sufficiently long run of frames to generate trajectories and stabilize the image under certain constraints, which makes them difficult to use for real-time stabilization. The 3D methods have the highest complexity and the best potential performance, but their stabilization quality is limited by the performance of structure from motion (SFM), so it is difficult for them to achieve a sufficiently good stabilization effect.
Disclosure of Invention
The invention aims to provide a three-dimensional software image stabilization method based on visual-inertial information fusion. The method can be combined with a visual navigation method, reduces the computational cost of the algorithm, has good stabilization performance, and can effectively improve the quality of the acquired image sequence.
The method combines the high accuracy of the visual-inertial approach with the relatively natural appearance of grid-warping-based transformation, and can effectively overcome the accuracy bottleneck of conventional stabilization algorithms.
The technical scheme of the invention is as follows:
the three-dimensional software image stabilizing method based on visual inertia information fusion comprises the following steps:
Step 1: calibrating the camera and the IMU to obtain the distortion parameters and intrinsic matrix of the monocular camera, and calibrating the IMU error parameters to obtain the IMU error model used in the subsequent tightly coupled joint optimization;
Step 2: performing monocular vision initialization to obtain a sufficient number of three-dimensional feature points in the world coordinate system;
Step 3: tightly coupling the IMU data with the visual data for joint optimization, obtaining the camera transformation matrix and the optimized three-dimensional spatial feature points;
Step 4: performing pre-warping according to the obtained transformation matrix, and applying a local grid transformation to the obtained feature points to obtain the final stabilized image.
Further, in step 1, the camera and the IMU can be calibrated with the open-source calibration toolbox Kalibr (available on GitHub).
In step 1, the IMU calibration yields the angular-velocity and acceleration errors used to correct the IMU error model; the visual calibration yields the camera intrinsic matrix K and the distortion coefficients; joint calibration further refines these parameters and yields the transformation matrix between the IMU and the camera carrier.
Further, in step 2, the monocular vision initialization proceeds as follows:
For two image frames I_1, I_2 obtained by monocular vision, there is a common three-dimensional point P in the world coordinate system, whose position is [X, Y, Z]^T; the projections of this three-dimensional point in the two frames satisfy
s_1 p_1 = K P,  s_2 p_2 = K (R P + t)
x_1 = K^{-1} p_1,  x_2 = K^{-1} p_2
where R, t are the rotation matrix and displacement vector between the two adjacent camera frames; from this pose relation the epipolar constraint x_2^T [t]_× R x_1 = 0 is obtained,
giving the essential matrix E = [t]_× R and the fundamental matrix F = K^{-T} E K^{-1}; the rotation matrix R and the translation matrix t are then obtained from E and F, triangulation is used to recover the relative depth of each feature point, and the scale information of each feature point is obtained with the IMU pre-integration model.
Further, the IMU pre-integration model is:
The gyroscope bias b_w, the accelerometer bias b_a and the additive noise n_a, n_w are obtained from the IMU calibration in step 1; a 6-axis IMU provides the accelerometer measurement â_t and the gyroscope measurement of angular velocity ω̂_t:
â_t = a_t + b_{a_t} + R_w^{b_t} g^w + n_a,  ω̂_t = ω_t + b_{w_t} + n_w
where the subscript b denotes the body coordinate system, R_w^{b_t} is the rotation matrix from the world coordinate system to the body coordinate system, and g^w is the gravity vector in the world coordinate system. Let the image frames acquired by the camera at times t_k and t_{k+1} be k and k+1, with corresponding body frames b_k and b_{k+1}; the position, velocity and orientation in the world coordinate system are propagated over this interval from the IMU measurements, where Δt_k is the length of the interval [t_k, t_{k+1}] and R_t^w is the rotation matrix from the body coordinate system to the world coordinate system. The position, velocity and orientation quantities are then converted from the world coordinate system into the body coordinate system, yielding the pre-integration terms. A first-order Taylor expansion of these terms gives an approximation, which is expanded with the median (midpoint) method, and from this the error transfer function matrix is obtained; the corresponding parameters are then obtained with the error transfer function matrix. Finally, the scale information s is solved for, so as to recover the actual depth of each feature point and the result in the world coordinate system, where the superscript c_0 denotes the camera coordinate system of the first camera image frame.
Furthermore, in step 3, when the IMU data and the visual data are tightly coupled and jointly optimized, a sliding window method is adopted, and a subset of frame information is selected as keyframe information to be jointly optimized with the other state quantities.
Further, in step 3, the optimization with the sliding window method proceeds as follows:
the parameter variables to be optimized are written as
X = [x_0, x_1, …, x_n, x_c^b, λ_0, λ_1, …, λ_m],  x_k = [p_{b_k}^w, v_{b_k}^w, q_{b_k}^w, b_a, b_g],  k = 0, …, n
where x_k is the camera state at each time instant, comprising the position p_{b_k}^w, velocity v_{b_k}^w, rotation q_{b_k}^w and the accelerometer and gyroscope bias matrices b_a, b_g at that moment; x_c^b is the transformation matrix between the camera coordinate system and the body coordinate system, and λ_n is an inverse depth value. The objective function
min_X { ‖r_p − H_p X‖² + Σ_k ‖r_B(ẑ_{b_{k+1}}^{b_k}, X)‖²_{P_{b_{k+1}}^{b_k}} + Σ_{(l,j)} ‖r_C(ẑ_l^{c_j}, X)‖²_{P_l^{c_j}} }
is solved to realize the optimization; the first part of the objective function is the marginalization residual, the second part is the IMU residual, and the third part is the visual error function.
Advantageous effects
According to the invention, a motion estimation model based on monocular vision is established through visual-inertial fusion, and this model is used to raise the accuracy of motion estimation. Compared with the traditional SFM motion estimation method, the tight coupling of visual and inertial navigation information improves the overall accuracy of the three-dimensional information obtained by the algorithm.
The combination of pre-warping and local grid-based warping adopted by the invention effectively resolves the parallax problem of motion compensation in traditional stabilization methods, makes the stabilized result more natural, and effectively improves the visual quality of the stabilized output.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
Fig. 1 is an epipolar geometry diagram with adjacent image frames I_1, I_2. Let the motion from frame I_1 to I_2 be R, t, and let the camera centers of the two frames be O_1, O_2. The feature point p_1 in frame I_1 and its corresponding feature point p_2 in frame I_2 are obtained by ORB feature matching followed by RANSAC screening, and can be regarded as projections of the same spatial point onto the two image planes. The rays O_1 p_1 and O_2 p_2 therefore intersect in three-dimensional space at a point P; O_1, O_2 and P define a plane called the epipolar plane. The line through O_1 and O_2 intersects the two image frames at the points e_1, e_2, called the epipoles, and O_1 O_2 is called the baseline. l_1, l_2 are the intersections of the image planes I_1, I_2 with the epipolar plane and are called the epipolar lines.
Fig. 2 is a schematic diagram of IMU integration, wherein red represents visual data and green represents IMU data.
Figure 3 is a schematic view of the local warp, where V_1, V_2, V_3 denote the grid vertices and u, v denote the coordinates of V_1 in the local coordinate system defined by V_2 and V_3.
Detailed Description
Aiming at the problem that the accuracy of traditional image stabilization methods cannot meet the requirements of future high-accuracy stabilization algorithms, the invention provides a three-dimensional image stabilization method based on visual-inertial navigation information fusion. It mainly comprises two parts: motion estimation and motion compensation. Motion estimation adopts a method combining visual and inertial information, which effectively overcomes the poor motion-vector estimation performance of the classical SFM method and improves the spatial accuracy of the three-dimensional feature points used for motion compensation as well as the accuracy of the overall motion vector. Motion compensation performs pre-warping according to the acquired motion vectors and the three-dimensional feature point cloud, and further compensates the three-dimensional image by grid-based warping, obtaining the stabilized result and thereby improving stabilization accuracy.
The main idea of the invention is as follows: first, the camera and the visual-inertial sensor are calibrated to obtain the distortion parameters and intrinsic matrix of the monocular camera, and the IMU error parameters are calibrated to obtain the IMU error model used in the subsequent tightly coupled joint optimization; second, monocular vision is initialized to obtain a sufficient number of three-dimensional feature points in the world coordinate system; third, the IMU data and the visual data are tightly coupled and jointly optimized to obtain the camera transformation matrix and the three-dimensional spatial feature points; fourth, pre-warping is performed according to the obtained transformation matrix, and a local grid transformation is applied to the obtained feature points to obtain the final stabilized image.
The specific flow is as follows:
In the first step, the camera and the IMU are calibrated to obtain the distortion parameters and intrinsic matrix of the monocular camera, and the IMU error parameters are calibrated to obtain the IMU error model used in the subsequent tightly coupled joint optimization.
The visual and IMU calibration can be performed with the open-source calibration toolbox Kalibr (GitHub). The IMU calibration yields the angular-velocity and acceleration errors and corrects the IMU error model, which is used for scale recovery and the final overall optimization; the visual calibration yields the intrinsic matrix K and the distortion coefficients; the joint calibration further refines these parameters and yields the transformation matrix between the IMU and the camera carrier. The method assumes a monocular pinhole camera model by default. The distortion parameters comprise the tangential distortion coefficients p_1, p_2 and the radial distortion coefficients k_1, k_2, k_3:
x̂ = x(1 + k_1 r² + k_2 r⁴ + k_3 r⁶) + 2 p_1 x y + p_2 (r² + 2x²)
ŷ = y(1 + k_1 r² + k_2 r⁴ + k_3 r⁶) + p_1 (r² + 2y²) + 2 p_2 x y    (1-1)
where x̂ and ŷ are the measured (distorted) values of the normalized camera coordinates, x and y are the true values in the normalized camera coordinates, and r² = x² + y²; the distortion-corrected true values x and y are obtained from formula (1-1).
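As a purely illustrative sketch (not part of the original disclosure), the radial-tangential model above can be written in Python as follows; the function names and the fixed-point inversion are assumptions made for illustration, and an equivalent correction can be obtained with OpenCV's undistortion routines.

```python
import numpy as np

def distort_normalized(x, y, k1, k2, k3, p1, p2):
    """Apply the radial-tangential model (1-1) to true normalized coordinates (x, y)."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d

def undistort_normalized(x_d, y_d, k1, k2, k3, p1, p2, iters=5):
    """Invert the model by fixed-point iteration to recover the true coordinates."""
    x, y = x_d, y_d
    for _ in range(iters):
        xd_hat, yd_hat = distort_normalized(x, y, k1, k2, k3, p1, p2)
        x, y = x - (xd_hat - x_d), y - (yd_hat - y_d)
    return x, y
```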
In the second step, monocular vision initialization is performed to obtain a sufficient number of three-dimensional feature points in the world coordinate system.
After distortion correction, the input image frames can by default be regarded as undistorted; all image pixels used thereafter are assumed to have been corrected. Initialization can then begin. As shown in fig. 1, for the image frames I_1, I_2 there is a common three-dimensional point P in the world coordinate system, located at [X, Y, Z]^T. According to the monocular camera model, the projections of this three-dimensional point in the two frames satisfy relation (1-2):
s_1 p_1 = K P,  s_2 p_2 = K (R P + t)    (1-2)
where R, t are the rotation matrix and displacement vector between the two adjacent camera frames; letting x_1 = K^{-1} p_1 and x_2 = K^{-1} p_2, the epipolar constraint (1-3) follows from (1-2):
x_2^T [t]_× R x_1 = 0    (1-3)
The essential matrix E = [t]_× R and the fundamental matrix F = K^{-T} E K^{-1} are obtained from (1-3), and the rotation matrix R and the translation t are then recovered from E and F, for example with the eight-point or five-point method. Triangulation is then used to recover the relative depth of the feature points; this depth is only relative, mainly because a monocular camera cannot observe the actual scale. The scale used in the joint optimization is obtained with the IMU pre-integration model. A further role of the IMU pre-integration is to provide a linear model of the IMU integration error transfer for the subsequent joint optimization.
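A minimal sketch of this two-view initialization using OpenCV is given below; it assumes matched, undistorted pixel coordinates and the intrinsic matrix K from step 1, and the function name is an illustrative assumption rather than part of the patent.

```python
import cv2
import numpy as np

# pts1, pts2: matched, undistorted pixel coordinates (N x 2) from frames I1 and I2.
def initialize_two_view(pts1, pts2, K):
    pts1 = np.asarray(pts1, dtype=np.float64)
    pts2 = np.asarray(pts2, dtype=np.float64)
    # Essential matrix from the epipolar constraint, with RANSAC screening
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    # Decompose E into the relative rotation R and unit-scale translation t
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    # Triangulate the correspondences to recover 3D points (relative depth only)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4 x N homogeneous
    X = (X_h[:3] / X_h[3]).T                             # N x 3, up to scale
    return R, t, X
```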
IMU pre-integration model: the gyroscope bias b_w, the accelerometer bias b_a and the additive noise n_a, n_w are obtained from the IMU calibration in step 1.
To keep the IMU data acquisition module generic, a 6-axis IMU is used, so the accelerometer provides the acceleration measurement â_t and the gyroscope provides the angular velocity measurement ω̂_t:
â_t = a_t + b_{a_t} + R_w^{b_t} g^w + n_a,  ω̂_t = ω_t + b_{w_t} + n_w    (1-4)
where the subscript b denotes the body coordinate system, R_w^{b_t} is the rotation matrix from the world coordinate system to the body coordinate system, and g^w is the gravity vector in the world coordinate system.
Assuming that the image frames acquired by the vision sensor at times t_k and t_{k+1} are k and k+1, with corresponding body frames b_k and b_{k+1}, the position, velocity and orientation in the world coordinate system are propagated over this interval from the IMU measurements as in formulas (1-5) to (1-8):
p_{b_{k+1}}^w = p_{b_k}^w + v_{b_k}^w Δt_k + ∬_{t∈[t_k,t_{k+1}]} ( R_t^w (â_t − b_{a_t} − n_a) − g^w ) dt²
v_{b_{k+1}}^w = v_{b_k}^w + ∫_{t∈[t_k,t_{k+1}]} ( R_t^w (â_t − b_{a_t} − n_a) − g^w ) dt
q_{b_{k+1}}^w = q_{b_k}^w ⊗ ∫_{t∈[t_k,t_{k+1}]} (1/2) Ω(ω̂_t − b_{w_t} − n_w) q_t^{b_k} dt
where Ω(ω) is the quaternion right-multiplication matrix of the angular velocity ω.
Here Δt_k is the length of the time interval [t_k, t_{k+1}] and R_t^w is the rotation matrix from the body coordinate system at time t to the world coordinate system. As can be seen from fig. 2, the state of b_{k+1} depends on the real-time state at the previous instant b_k; if the subsequent optimization were carried out directly on this form, the overall computation would grow greatly, because the accumulated IMU states would have to be re-propagated at every iteration of the optimization.
To fuse the visual and IMU data, the position, velocity and orientation quantities in the world coordinate system are converted into the body coordinate system, i.e. both sides are multiplied by the rotation matrix R_w^{b_k} from the world coordinate system w to the body coordinate system b_k at the corresponding moment, so that formulas (1-5), (1-6) and (1-7) can be rewritten in terms of the pre-integration terms (1-9), (1-10) and (1-11):
α_{b_{k+1}}^{b_k} = ∬_{t∈[t_k,t_{k+1}]} R_t^{b_k} (â_t − b_{a_t} − n_a) dt²
β_{b_{k+1}}^{b_k} = ∫_{t∈[t_k,t_{k+1}]} R_t^{b_k} (â_t − b_{a_t} − n_a) dt
γ_{b_{k+1}}^{b_k} = ∫_{t∈[t_k,t_{k+1}]} (1/2) Ω(ω̂_t − b_{w_t} − n_w) γ_t^{b_k} dt
where α, β, γ are the pre-integration measurement terms.
Here q denotes the quaternion form. The three terms can be regarded as the motion of b_{k+1} relative to b_k expressed in the body coordinate system, corresponding to displacement, velocity and rotation (quaternion) respectively; within these terms only the accelerometer and gyroscope biases b_a and b_w need to be considered, and the state at the previous instant does not influence them, so taking them as the optimization variables reduces the overall computation to a certain extent. Furthermore, when the change between adjacent frames is small, the three terms can be approximated with a first-order Taylor expansion about the current bias estimates, giving correction terms linear in the bias increments δb_a and δb_w.
IMU pre-integration in discrete time can be carried out in several ways, such as Euler integration, RK4 integration or the midpoint (median) method; in this embodiment the median form of formula (1-13) is used, which propagates the terms with the average of two consecutive IMU samples.
At the starting instant, the state values α and β are 0, the noise terms n_a, n_w are 0, γ is the unit quaternion, and δt is the interval between adjacent IMU measurements.
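The following Python sketch illustrates the midpoint propagation of the pre-integration terms α, β, γ described above. It assumes a constant IMU sampling interval; the variable names and the use of SciPy's Rotation class are illustrative and not taken from the patent.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def preintegrate_midpoint(acc, gyr, dt, ba, bw):
    """Propagate the pre-integration terms (alpha, beta, gamma) between two image
    frames with the midpoint rule. acc, gyr: (M, 3) IMU samples in the body frame;
    dt: IMU sampling interval; ba, bw: accelerometer and gyroscope biases."""
    alpha = np.zeros(3)            # position pre-integration term
    beta = np.zeros(3)             # velocity pre-integration term
    gamma = R.identity()           # rotation pre-integration term
    for i in range(len(acc) - 1):
        # midpoint angular velocity, bias removed
        w_mid = 0.5 * (gyr[i] + gyr[i + 1]) - bw
        gamma_next = gamma * R.from_rotvec(w_mid * dt)
        # midpoint acceleration rotated into the b_k frame, bias removed
        a_mid = 0.5 * (gamma.apply(acc[i] - ba) + gamma_next.apply(acc[i + 1] - ba))
        alpha = alpha + beta * dt + 0.5 * a_mid * dt ** 2
        beta = beta + a_mid * dt
        gamma = gamma_next
    return alpha, beta, gamma.as_quat()
```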
Through the above process the measurement values of the IMU pre-integration are obtained. Further, in order to complete the IMU pre-integration and apply it in the nonlinear optimization, the covariance matrix of the equations must be derived and the corresponding Jacobian solved; a linear Gaussian error-state recurrence equation is therefore established from the measurement values and propagated using the covariance of the linear Gaussian system. The error transfer function matrix follows from formulas (1-9), (1-10) and (1-11); its detailed form is given by formula (1-15), in which the linear relation is established through the kinematic integral equations and R_k, R_{k+1} are obtained from the rotation matrix at the previous instant.
For simplicity, the error transfer equation is abbreviated as
δz_{k+1} = F δz_k + V n
The initial value of the Jacobian is set to the identity matrix I, with the iteration
J_{k+1} = F J_k
According to the error propagation model, the covariance iteration is
P_{k+1} = F P F^T + V Q V^T
with the initial covariance set to 0.
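A short sketch of these Jacobian and covariance recursions follows; the per-sample matrices F and V are assumed to come from the linearized model (1-15), whose detailed form is not reproduced here, and the function name is illustrative.

```python
import numpy as np

def propagate_error_state(F_list, V_list, Q):
    """Iterate the error-state Jacobian and covariance over one pre-integration
    interval. F_list, V_list: per-IMU-sample transfer matrices from the linearized
    model; Q: IMU noise covariance (from the calibration in step 1)."""
    dim = F_list[0].shape[0]
    J = np.eye(dim)                      # Jacobian initialized to identity
    P = np.zeros((dim, dim))             # covariance initialized to zero
    for F, V in zip(F_list, V_list):
        J = F @ J                        # J_{k+1} = F J_k
        P = F @ P @ F.T + V @ Q @ V.T    # P_{k+1} = F P F^T + V Q V^T
    return J, P
```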
The IMU pre-integration provides the initial observation values for the fusion initialization of the subsequent information and provides the measurement terms for the iterative optimization. In practice, in order to fuse the two kinds of information more accurately, the IMU and the camera are first jointly calibrated through step 1, their distortion parameters are obtained, and the acquired data are undistorted, improving the accuracy of the data.
Substituting formula (1-15) back into (1-13) gives the corresponding reference values, which are the numerical components required by the subsequent tight coupling.
In addition, to recover the actual scale of the three-dimensional space, the scale is obtained by solving formula (1-16); this system can be decomposed by the LDLT factorization to obtain the final scale information s, after which the actual depth of the feature points, the poses in the solution and the results in the world coordinate system can be recovered. The superscript c_0 denotes the camera coordinate system of the first camera image frame, and R and p denote the rotation matrix and the translation, respectively. Formula (1-16) is optimized with the least-squares method to obtain the refined gravity vector; from the change of gravity the transformation from the c_0 coordinate system to the world coordinate system is obtained, the variables expressed in the initial camera coordinate system (the translation from the visual epipolar geometry and the three-dimensional feature points recovered by triangulation) are transferred into the world coordinate system, and the scale is restored, completing the initialization.
The initialized visual image frames are handled with a sliding window, and 10 frames are continuously maintained for initialization until it succeeds. After successful initialization, a sufficient number of three-dimensional spatial feature points in the world coordinate system are available, and the feature points and the transformation matrix in the world coordinate system of subsequent frames can be solved with P3P and ICP methods, which effectively reduces the computation.
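As an illustration of pose recovery for subsequent frames from already-triangulated world points, a hypothetical sketch using OpenCV's P3P-based RANSAC solver is shown below (the patent only specifies P3P/ICP, so the exact solver and names here are assumptions).

```python
import cv2
import numpy as np

# X_w: 3D feature points in the world frame (N x 3);
# pts: their undistorted pixel observations in the current frame (N x 2).
def track_frame_pose(X_w, pts, K):
    # P3P inside a RANSAC loop gives the camera pose of the new frame directly,
    # avoiding a full structure-from-motion solve for every frame.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(X_w, dtype=np.float64), np.asarray(pts, dtype=np.float64),
        K, None, flags=cv2.SOLVEPNP_P3P)
    R_cw, _ = cv2.Rodrigues(rvec)          # rotation world -> camera
    return R_cw, tvec, inliers
```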
In the third step, the IMU data and the visual data are tightly coupled and jointly optimized to obtain the camera transformation matrix and the optimized three-dimensional spatial feature points.
If the pose is only computed frame by frame without tightly coupled optimization, errors slowly accumulate over time, while using all past data as conditions of the subsequent optimization makes the computation grow excessively. The method therefore optimizes with a sliding window: a subset of frames is selected as keyframe information and jointly optimized with the other state quantities to obtain the final optimized solution. A keyframe must have sufficient parallax; the specific selection is based on the actual situation. The parameter variables to be optimized can then be written as formula (1-17):
X = [x_0, x_1, …, x_n, x_c^b, λ_0, λ_1, …, λ_m],  x_k = [p_{b_k}^w, v_{b_k}^w, q_{b_k}^w, b_a, b_g],  k = 0, …, n    (1-17)
where x_k is the camera state at each time instant, comprising the position p_{b_k}^w, velocity v_{b_k}^w, rotation q_{b_k}^w and the bias matrices of the accelerometer b_a and gyroscope b_g at that moment; x_c^b is the transformation matrix between the camera coordinate system and the body coordinate system; λ_n is an inverse depth value (the inverse of the depth of a feature point in the normalized camera coordinate system; their number equals the number of landmark feature points contained in the keyframes; these values follow a Gaussian distribution and simplify the optimization constraints). From formula (1-17), the objective function of the state-estimation optimization can further be written as formula (1-18):
min_X { ‖r_p − H_p X‖² + Σ_{k∈B} ‖r_B(ẑ_{b_{k+1}}^{b_k}, X)‖²_{P_{b_{k+1}}^{b_k}} + Σ_{(l,j)∈C} ‖r_C(ẑ_l^{c_j}, X)‖²_{P_l^{c_j}} }    (1-18)
where ‖r‖²_P = r^T P^{-1} r denotes the Mahalanobis norm with covariance P. In formula (1-18), the first part is the marginalization residual, the second part is the IMU residual, and the third part is the visual error function.
The visual residual function obtains the optimal solution by minimizing the error between the normalized coordinates of the same feature point observed at different times. For a feature point, the geometric relation of formula (1-19) holds,
where T denotes a transformation matrix composed of four successive coordinate transformations: from the camera coordinate system at time i to the inertial (body) coordinate system, from the inertial coordinate system to the world coordinate system, from the world coordinate system to the inertial coordinate system at time j, and finally into the camera coordinate system at time j. From relation (1-19), the new relation (1-20) is obtained,
where R and p are the rotation and translation matrices obtained by decomposing the transformation matrix; for a pinhole camera this yields the visual residual formula (1-21).
Combining with formula (1-20) gives the minimized visual residual part.
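A simplified sketch of such a visual reprojection residual is given below; it folds the camera-body extrinsics into camera-to-world transforms and parameterizes the feature by its inverse depth in frame i, which is an illustrative assumption rather than a reproduction of the patent's formula (1-21).

```python
import numpy as np

def visual_residual(obs_i, obs_j, inv_depth, T_w_ci, T_w_cj):
    """obs_i, obs_j: normalized (undistorted) observations of one feature in
    camera frames i and j; inv_depth: inverse depth of the feature in frame i;
    T_w_ci, T_w_cj: 4x4 camera-to-world transforms at times i and j."""
    # back-project the feature from frame i using its inverse depth
    p_ci = np.array([obs_i[0], obs_i[1], 1.0]) / inv_depth
    p_w = T_w_ci[:3, :3] @ p_ci + T_w_ci[:3, 3]       # camera i -> world
    p_cj = T_w_cj[:3, :3].T @ (p_w - T_w_cj[:3, 3])   # world -> camera j
    return p_cj[:2] / p_cj[2] - obs_j                 # error on the normalized plane
```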
The second part is the error of the IMU component, which is expressed by formula (1-22) according to the pre-integration formulas (1-9), (1-10) and (1-11).
The first part is the marginalization residual; marginalization is introduced mainly as a compromise between computation speed and performance. If the pose were optimized using only two adjacent image frames, the algorithm would run fast but errors would accumulate quickly, while optimizing over the information of all frames at every step would be highly redundant and computationally expensive. To keep the amount of computation roughly fixed, nonlinear optimization is carried out over a fixed number of frames with the sliding window method, and marginalization is introduced. Its purpose is to stop computing the pose of a frame leaving the window and its associated landmark points while retaining the constraints that frame imposes on the window, so as to increase speed while preserving the sparsity of the optimization matrix. Marginalization distinguishes two cases: when a new reference frame is added to the sliding window and the last frame in the current window is a keyframe, the pose of that frame is shifted out and the visual and inertial data related to it are marginalized as prior values; when the penultimate frame is not a keyframe, its visually related part is marginalized while the related IMU constraints are preserved. Schur elimination is then performed to remove the non-key information, the matrices are expanded to the dimension of the quantities to be optimized, and the overall optimization is finally solved with the open-source Ceres library, yielding pose information with smaller error and the poses of the feature points in the world coordinate system. This part finally provides the inter-frame transformation matrices and the three-dimensional space point information for the subsequent weak-constraint transformation.
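Purely as an illustration of how the three residual classes of (1-18) can be stacked and weighted by their covariances for a generic least-squares solver (here SciPy instead of Ceres, with illustrative function names and block structure), consider the following sketch.

```python
import numpy as np
from scipy.optimize import least_squares

def whiten(residual, cov):
    """Weight a residual block by its covariance: returns L^T r with L L^T = cov^-1."""
    L = np.linalg.cholesky(np.linalg.inv(cov))
    return L.T @ residual

def stacked_residuals(x, prior_fn, imu_blocks, visual_blocks):
    """Stack the marginalization prior, the IMU pre-integration residuals and the
    visual reprojection residuals of the sliding window, as in (1-18). Each block
    is a (residual_fn, covariance) pair acting on the state vector x."""
    parts = [prior_fn(x)]
    parts += [whiten(r_fn(x), cov) for r_fn, cov in imu_blocks]
    parts += [whiten(r_fn(x), cov) for r_fn, cov in visual_blocks]
    return np.concatenate(parts)

# result = least_squares(stacked_residuals, x0,
#                        args=(prior_fn, imu_blocks, visual_blocks), loss="huber")
```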
In the fourth step, the acquired three-dimensional motion vectors and spatial poses are processed and compensated: pre-warping is performed according to the acquired transformation matrices, and a local grid transformation is applied to the acquired feature points to obtain the final stabilized image.
Before these steps, motion filtering follows the procedure of classical image stabilization methods and filters the acquired inter-frame motion vectors; pre-warping then compensates the image a first time according to the global motion result, after which weak-constraint compensation performs a second motion compensation, yielding the stabilized image after the compensating transformation. The information mainly used in the second motion compensation is the previously acquired three-dimensional spatial information, which is projected into the stabilized frame; the projection errors of the corresponding points are observed and the weak-constraint compensation is carried out. The core idea of the weak-constraint compensation comprises the following two parts:
(1) Correcting the feature points in the image with a strong constraint may deform the image completely, produce temporal incoherence and distort the transformed edge regions. Using the computed feature points as a weak constraint effectively distributes the distortion to the other flat regions of the image; the method therefore adopts the weak-constraint approach, so that the interference is visually negligible while the image is kept as undistorted as possible.
(2) To preserve the content and details of the image while it is deformed, an additional constraint is required so that the deformation before and after conforms to a similarity transformation. In addition, in regions where feature points are sparse the constraint is relaxed so that the image distortion is distributed to visually insignificant areas. To this end, the input video frame is evenly divided into n × m parts, each part being called a grid cell, in which feature points may lie.
The position of each obtained feature point projected into the image is denoted P̂_k. In the algorithm, each feature point is represented by the four vertices of the grid cell containing it using bilinear interpolation. Let V̂_k be the vector of the four vertices of that cell in the input image and V_k the corresponding four grid vertices in the output image. The vector ω_k contains the bilinear interpolation coefficients, which sum to 1, so that P̂_k = V̂_k ω_k, and the data term can be expressed as the least-squares problem (1-23):
E_d = Σ_k ‖ V_k ω_k − P_k ‖²    (1-23)
where V_k are the four unknown vertices and P_k is the position of the feature point after the nonlinear optimization. The purpose of this data term is to make the distance between the feature point P_k in the output image and the position interpolated from the grid vertices of the original image feature point as small as possible.
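The data term can be illustrated with the following sketch, which computes the bilinear weights of a feature point inside its grid cell and evaluates E_d for the corresponding four output vertices; the names and the vertex ordering are illustrative assumptions.

```python
import numpy as np

def bilinear_weights(p, cell_origin, cell_size):
    """Bilinear interpolation coefficients of feature point p with respect to the
    four vertices of the grid cell containing it (the weights sum to 1).
    Vertex order: top-left, top-right, bottom-left, bottom-right."""
    u = (p[0] - cell_origin[0]) / cell_size
    v = (p[1] - cell_origin[1]) / cell_size
    return np.array([(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v])

def data_term(V_out, w, P_opt):
    """E_d for one feature: the interpolation of the four output vertices V_out
    (4 x 2) with weights w should coincide with the optimized feature position."""
    return np.sum((w @ V_out - P_opt) ** 2)
```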
The similarity transformation term measures the difference between the grid cells in the output image and the similarity transformation of the corresponding input grid. Each mesh is divided into two triangles to build the model.
where u and v are known values computed in the original image (u = 0, v = 1 when the grid cell is square); when the triangle in the output image does not satisfy the similarity transformation, V_1 deviates from its ideal position, as shown in fig. 3, where the dashed line shows the ideal position after transformation and V_1 is the actual position. A similarity transformation constraint is therefore required so that the distance between the transformed position and the ideal position is minimized:
E_s(V_1) = ω_s ‖ V_1 − (V_2 + u(V_3 − V_2) + v R_90 (V_3 − V_2)) ‖²    (1-25)
where R_90 denotes a 90-degree rotation. The weight ω_s used to compute the optimal solution in each grid cell is set to a two-norm of the colour variance. The change of each grid cell can thus be regarded as the sum of its two triangles followed by a similarity transformation, giving eight terms in total. When formulas (1-23) and (1-25) simultaneously reach their minimum, the ideal optimal coordinates of the output vertices of the grid are obtained; these are then solved with a standard texture-mapping algorithm to generate the stabilized output image, which is cropped to remove black borders, yielding the final stabilized image frame.
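A sketch of the per-triangle similarity term (1-25) above is given below, including the recovery of the local coordinates u, v from the undeformed input grid; the helper names and the choice of rotation direction are illustrative assumptions.

```python
import numpy as np

R90 = np.array([[0.0, -1.0], [1.0, 0.0]])   # 90-degree rotation used in (1-25)

def similarity_term(V1, V2, V3, u, v, w_s):
    """E_s for one triangle of a grid cell: V1 should stay at the position implied
    by V2, V3 and the local coordinates (u, v) measured in the input frame."""
    ideal = V2 + u * (V3 - V2) + v * (R90 @ (V3 - V2))
    return w_s * np.sum((V1 - ideal) ** 2)

def local_coords(V1, V2, V3):
    """Recover (u, v) of V1 from the undeformed grid triangle in the input image."""
    a = V3 - V2
    b = R90 @ a
    d = V1 - V2
    return np.dot(d, a) / np.dot(a, a), np.dot(d, b) / np.dot(b, b)
```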
Considering possible future real-time requirements, the method accepts a slight reduction in visual quality and keeps a certain amount of temporal visual incoherence within an acceptable range; feature points that appear continuously in the previous keyframes and data frames are preferentially selected as the feature points used for stabilization, ensuring the continuity of the algorithm as far as possible.
And then, continuously repeating the steps to obtain a stable output image sequence.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, substitutions and alterations may be made in the above embodiments by those skilled in the art without departing from the spirit and principles of the invention.

Claims (5)

1. A three-dimensional software image stabilizing method based on visual inertia information fusion is characterized in that: the method comprises the following steps:
Step 1: calibrating the camera and the IMU to obtain the distortion parameters and intrinsic matrix of the monocular camera, and calibrating the IMU error parameters to obtain the IMU error model used in the subsequent tightly coupled joint optimization;
Step 2: performing monocular vision initialization to obtain a sufficient number of three-dimensional feature points in the world coordinate system;
Step 3: tightly coupling the IMU data with the visual data for joint optimization, obtaining the camera transformation matrix and the optimized three-dimensional spatial feature points;
wherein, when the IMU data and the visual data are tightly coupled and jointly optimized, a sliding window method is adopted, and a subset of frame information is selected as keyframe information to be jointly optimized with the other state quantities;
the optimization with the sliding window method proceeds as follows:
the parameter variables to be optimized are written as
X = [x_0, x_1, …, x_n, x_c^b, λ_0, λ_1, …, λ_m],  x_k = [p_{b_k}^w, v_{b_k}^w, q_{b_k}^w, b_a, b_g],  k = 0, …, n
where x_k is the camera state at each time instant, comprising the position p_{b_k}^w, velocity v_{b_k}^w, rotation q_{b_k}^w and the accelerometer and gyroscope bias matrices b_a, b_g at that moment; x_c^b is the transformation matrix between the camera coordinate system and the body coordinate system, and λ_n is an inverse depth value; the objective function
min_X { ‖r_p − H_p X‖² + Σ_k ‖r_B(ẑ_{b_{k+1}}^{b_k}, X)‖²_{P_{b_{k+1}}^{b_k}} + Σ_{(l,j)} ‖r_C(ẑ_l^{c_j}, X)‖²_{P_l^{c_j}} }
is solved to realize the optimization, wherein the first part of the objective function is the marginalization residual, the second part is the IMU residual, and the third part is the visual error function;
Step 4: performing pre-warping according to the obtained transformation matrix, and applying a local grid transformation to the obtained feature points to obtain the final stabilized image.
2. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 1, characterized in that: in step 1, the camera and the IMU are calibrated with the open-source calibration toolbox Kalibr (available on GitHub).
3. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 1, characterized in that: in step 1, the IMU calibration yields the angular-velocity and acceleration errors used to correct the IMU error model; the visual calibration yields the camera intrinsic matrix K and the distortion coefficients; joint calibration further refines these parameters and yields the transformation matrix between the IMU and the camera carrier.
4. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 1, characterized in that: in step 2, the monocular vision initialization proceeds as follows:
for two image frames I_1, I_2 obtained by monocular vision, there is a common three-dimensional point P in the world coordinate system, whose position is [X, Y, Z]^T; the projections of this three-dimensional point in the two frames satisfy
s_1 p_1 = K P,  s_2 p_2 = K (R P + t)
x_1 = K^{-1} p_1,  x_2 = K^{-1} p_2
where R, t are the rotation matrix and displacement vector between the two adjacent camera frames; from this pose relation the epipolar constraint x_2^T [t]_× R x_1 = 0 is obtained,
giving the essential matrix E = [t]_× R and the fundamental matrix F = K^{-T} E K^{-1}; the rotation matrix R and the translation matrix t are then obtained from E and F, triangulation is used to recover the relative depth of each feature point, and the scale information of each feature point is obtained with the IMU pre-integration model.
5. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 4, characterized in that the IMU pre-integration model is as follows:
the gyroscope bias b_w, the accelerometer bias b_a and the additive noise n_a, n_w are obtained from the IMU calibration in step 1; a 6-axis IMU provides the accelerometer measurement â_t and the gyroscope measurement of angular velocity ω̂_t:
â_t = a_t + b_{a_t} + R_w^{b_t} g^w + n_a,  ω̂_t = ω_t + b_{w_t} + n_w
where the subscript b denotes the body coordinate system, R_w^{b_t} is the rotation matrix from the world coordinate system to the body coordinate system, and g^w is the gravity vector in the world coordinate system; if the image frames acquired by the camera at times t_k and t_{k+1} are k and k+1, with corresponding body frames b_k and b_{k+1}, the position, velocity and orientation in the world coordinate system are propagated over this interval from the IMU measurements, where Δt_k is the length of the interval [t_k, t_{k+1}] and R_t^w is the rotation matrix from the body coordinate system to the world coordinate system; the position, velocity and orientation quantities are then converted from the world coordinate system into the body coordinate system, yielding the pre-integration terms; a first-order Taylor expansion of these terms gives an approximation, which is expanded with the median method, and the error transfer function matrix is obtained; the corresponding parameters are obtained with the error transfer function matrix; the scale information s is obtained so as to recover the actual depth of each feature point and the result in the world coordinate system, where the superscript c_0 denotes the camera coordinate system of the first camera image frame.
CN202110497661.XA 2021-05-08 2021-05-08 Three-dimensional software image stabilizing method based on visual inertial information fusion Active CN113240597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110497661.XA CN113240597B (en) 2021-05-08 2021-05-08 Three-dimensional software image stabilizing method based on visual inertial information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110497661.XA CN113240597B (en) 2021-05-08 2021-05-08 Three-dimensional software image stabilizing method based on visual inertial information fusion

Publications (2)

Publication Number Publication Date
CN113240597A CN113240597A (en) 2021-08-10
CN113240597B true CN113240597B (en) 2024-04-26

Family

ID=77132324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110497661.XA Active CN113240597B (en) 2021-05-08 2021-05-08 Three-dimensional software image stabilizing method based on visual inertial information fusion

Country Status (1)

Country Link
CN (1) CN113240597B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147325B (en) * 2022-09-05 2022-11-22 深圳清瑞博源智能科技有限公司 Image fusion method, device, equipment and storage medium
CN116208855B (en) * 2023-04-28 2023-09-01 杭州未名信科科技有限公司 Multi-tower crane cradle head panoramic image jitter coordination inhibition method and system


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020087846A1 (en) * 2018-10-31 2020-05-07 东南大学 Navigation method based on iteratively extended kalman filter fusion inertia and monocular vision
CN110030994A (en) * 2019-03-21 2019-07-19 东南大学 A kind of robustness vision inertia close coupling localization method based on monocular
CN110345944A (en) * 2019-05-27 2019-10-18 浙江工业大学 Merge the robot localization method of visual signature and IMU information
CN110498039A (en) * 2019-08-05 2019-11-26 北京科技大学 A kind of intelligent monitor system based on bionic flapping-wing flying vehicle
WO2021027323A1 (en) * 2019-08-14 2021-02-18 北京理工大学 Hybrid image stabilization method and device based on bionic eye platform
CN112749665A (en) * 2021-01-15 2021-05-04 东南大学 Visual inertia SLAM method based on image edge characteristics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于非线性优化的单目视觉/惯性组合导航算法 (Monocular vision/inertial integrated navigation algorithm based on nonlinear optimization); 程传奇, 郝向阳, 李建胜, 刘智伟, 胡鹏; 中国惯性技术学报 (Journal of Chinese Inertial Technology), (05); full text *

Also Published As

Publication number Publication date
CN113240597A (en) 2021-08-10

Similar Documents

Publication Publication Date Title
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN107564061B (en) Binocular vision mileage calculation method based on image gradient joint optimization
CN110648398B (en) Real-time ortho image generation method and system based on unmanned aerial vehicle aerial data
CN112102458A (en) Single-lens three-dimensional image reconstruction method based on laser radar point cloud data assistance
CN112304307A (en) Positioning method and device based on multi-sensor fusion and storage medium
CN114399554B (en) Calibration method and system of multi-camera system
CN112434709A (en) Aerial survey method and system based on real-time dense three-dimensional point cloud and DSM of unmanned aerial vehicle
CN113240597B (en) Three-dimensional software image stabilizing method based on visual inertial information fusion
WO2024045632A1 (en) Binocular vision and imu-based underwater scene three-dimensional reconstruction method, and device
CN111815765B (en) Heterogeneous data fusion-based image three-dimensional reconstruction method
EP3293700B1 (en) 3d reconstruction for vehicle
CN113551665A (en) High dynamic motion state sensing system and sensing method for motion carrier
CN110942477A (en) Method for depth map fusion by using binocular camera and laser radar
JP2023505891A (en) Methods for measuring environmental topography
CN113345032B (en) Initialization map building method and system based on wide-angle camera large distortion map
CN110580715B (en) Image alignment method based on illumination constraint and grid deformation
Wang et al. A novel binocular vision system for accurate 3-D reconstruction in large-scale scene based on improved calibration and stereo matching methods
CN110910457B (en) Multispectral three-dimensional camera external parameter calculation method based on angular point characteristics
CN116740488B (en) Training method and device for feature extraction model for visual positioning
CN117115271A (en) Binocular camera external parameter self-calibration method and system in unmanned aerial vehicle flight process
CN114485648B (en) Navigation positioning method based on bionic compound eye inertial system
CN116385554A (en) Coastal sea area water depth mapping method based on double unmanned aerial vehicle video stitching
CN113763481B (en) Multi-camera visual three-dimensional map construction and self-calibration method in mobile scene
CN114964276A (en) Dynamic vision SLAM method fusing inertial navigation
CN112767482B (en) Indoor and outdoor positioning method and system with multi-sensor fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant