CN113436261B - Monocular vision inertial positioning method for automatic driving of closed park - Google Patents

Monocular vision inertial positioning method for automatic driving of closed park

Info

Publication number
CN113436261B
CN113436261B (application CN202110706284.6A)
Authority
CN
China
Prior art keywords: frame, IMU, integration, mode, sliding window
Prior art date
Legal status: Active
Application number
CN202110706284.6A
Other languages
Chinese (zh)
Other versions
CN113436261A
Inventor
秦晓辉
马珅
刘海涛
尚敬
胡云卿
王晓伟
边有钢
徐彪
谢国涛
秦兆博
秦洪懋
胡满江
丁荣军
Current Assignee
Hunan University
CRRC Zhuzhou Institute Co Ltd
Original Assignee
Hunan University
CRRC Zhuzhou Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Hunan University and CRRC Zhuzhou Institute Co Ltd
Priority to CN202110706284.6A
Publication of CN113436261A
Application granted
Publication of CN113436261B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10: Navigation by using measurements of speed or acceleration
    • G01C21/12: Navigation by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16: Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165: Inertial navigation combined with non-inertial navigation instruments
    • G01C21/1656: Inertial navigation combined with non-inertial navigation instruments with passive imaging devices, e.g. cameras
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00: Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26: Navigation specially adapted for navigation in a road network
    • G01C21/28: Navigation in a road network with correlation of data from several navigational instruments

Abstract

The invention discloses a monocular vision inertial positioning method for automatic driving of a closed park, which comprises the following steps: s0, reading multi-frame continuous monocular images and IMU measurement information, and performing initialization processing; s1, tracking the feature points of the previous frame, extracting the feature points from the current frame, and processing the IMU measurement information of the current frame and the previous frame; s2, solving the pose of the current frame; s3, decoupling the translation vector and the depth of the feature point from the rotation vector, determining abnormal points in the feature point of the current frame, and removing the abnormal points; s4, selecting a visual reprojection error and an IMU residual error in a sliding window as cost functions, constructing a least square problem, and solving the pose of the vehicle; s5, setting an optimization mode according to the uncertainty of IMU measurement information; s6, in the sliding window, if the previous frame of the current frame is a non-key frame, removing the previous frame; otherwise, removing the initial frame of the sliding window; go to S1 after the frame is removed. The invention can ensure that the vehicle can realize real-time stable and accurate positioning under various working conditions of the park.

Description

Monocular vision inertial positioning method for automatic driving of closed park
Technical Field
The invention relates to the technical field of automatic driving, in particular to a monocular vision inertial positioning method for automatic driving in a closed park.
Background
In recent years, automatic driving at level L3 or above has received attention from a large number of enterprises and universities, and closed parks are considered a priority scene for applying current automatic driving technology owing to characteristics such as vehicle speeds below 30 km/h, fixed routes, and few dynamic objects. To ensure that a vehicle reaches its destination safely and reliably in the park, it is important to solve the unmanned vehicle's problems of "where am I" and "where to go", and therefore an accurate and robust positioning technology is needed.
A common positioning method for automatic driving is the integrated navigation system consisting of a global navigation satellite system (GNSS) positioning unit and an inertial navigation system (INS). Most vehicles adopt a dual-antenna satellite positioning scheme to obtain the absolute three-dimensional coordinates of the vehicle; positioning accuracy in open areas can reach centimeter level, but GNSS signals are unstable under tree or building occlusion and in bad weather, and the positioning error becomes large. The IMU (Inertial Measurement Unit) is immune to external interference, and the pose obtained by its high-frequency integration cooperates with the low-frequency GNSS pose to accomplish high-frequency, high-precision positioning. However, if the GNSS signal is weak for a long time, relying solely on the inertial navigation system drifts due to IMU noise and bias. In recent years, many patents have adopted simultaneous localization and mapping (SLAM) methods which, by mounting a lidar or a camera, can perform autonomous positioning and navigation in an unknown environment. Laser SLAM is accurate and stable in positioning, but lidar is expensive; visual SLAM provides rich information at low cost, but its positioning is inaccurate under pure rotational motion, sharp turns and similar cases.
Current patents on automatic driving positioning systems propose techniques that combine the above positioning methods to compensate for each other's drawbacks. The commercial-vehicle automatic driving positioning method proposed by Jiangling Motors judges through a camera whether the scene ahead of the vehicle is a tunnel or gallery bridge; if so, laser SLAM is used for positioning, otherwise integrated inertial navigation is used. The automatic driving positioning method proposed by Anhui Cowa Robot can automatically switch among integrated inertial navigation, laser SLAM and visual SLAM according to demand. The map fusion method based on laser SLAM and visual SLAM proposed by Hangzhou Dianzi University can overcome the shortcomings of a single SLAM and make the fused map more suitable for navigation.
These patents show that multi-sensor fusion positioning is the trend; compared with single-sensor positioning, it offers higher accuracy and better stability. However, beyond the positioning effect, system usability, sensor price and similar factors must also be considered.
Disclosure of Invention
Aiming at the above problems, the invention provides a monocular vision inertial positioning method for automatic driving in a closed park which needs no prior information about the environment. By fusing the absolute scale and fast response of the IMU with the simple calibration and accurate pose of monocular vision, it ensures that vehicles achieve real-time, stable and accurate positioning under the various working conditions of the park, with low cost, small volume and good economy.
In order to achieve the aim, the invention provides a monocular vision inertial positioning method for automatic driving of a closed park, which comprises the following steps:
s0, reading multi-frame continuous images and corresponding IMU measurement information, performing initialization processing, determining IMU parameters, and initially setting the optimization mode to mode A, in which the rotation vector and translation vector obtained by IMU rotation integration and translation integration are used as the pose of the image frames;
s1, reading IMU measurement information corresponding to the current frame, tracking feature points of the previous frame, extracting a preset number of feature points from the current frame, and processing the IMU measurement information between the current frame and the previous frame by adopting a pre-integration method;
s2, based on the optimization mode, adopting the feature points of the current frame or IMU measurement value integral to obtain the pose of the current frame;
s3, determining abnormal points in the feature points of the current frame by adopting a method of decoupling the translation vector and the depth of the feature points from the rotation vector and solving normalized reprojection errors, and removing the abnormal points;
s4, selecting corresponding visual reprojection errors and IMU residual errors in a sliding window as cost functions according to the set optimization mode, constructing a least square problem, and solving through nonlinear optimization to obtain a vehicle pose;
s5, setting an optimization mode according to the uncertainty of IMU measurement information, namely selecting one of mode I, mode G and mode A, wherein in the mode I, the pose of the image frame is solved through visual PnP by taking a rotation vector obtained by IMU rotation integration as an initial value, and in the mode G, the pose of the image frame is solved through visual PnP by taking a translation vector obtained by IMU translation integration as an initial value;
s6, in the sliding window, if the previous frame of the current frame is a non-key frame, removing the previous frame; otherwise, removing the initial frame of the sliding window; go to S1 after the frame is removed.
Further, the initialization step S0 specifically includes the following steps:
s01, reading in a frame of image as a current frame and reading in corresponding IMU measurement information, and if only one frame of image of the current frame exists in the sliding window, extracting M characteristic points from the current frame; otherwise, tracking the feature points of the previous frame by using an LK optical flow, and extracting appropriate feature points from the current frame to ensure that the total number of the feature points of the current frame is M;
if the current frame is the first two frames in the sliding window, setting the current frame as a key frame; otherwise, judging whether the previous frame is a key frame based on the parallax between the previous frame and the previous frame of the previous frame; specifically, if the disparity between the previous frame and the previous frame of the previous frame is greater than or equal to a set threshold, setting the previous frame as a key frame; otherwise, setting the previous frame as a common frame;
s02, if the number of the images in the sliding window reaches the size of the sliding window, the step goes to S03; otherwise, returning to S01;
s03, solving the pose of each frame by using the visual features of the image in the sliding window, constructing an initialization cost function from the perspective of maximum posterior estimation, estimating the initial value of the state quantity of the cost function in a self-adaptive manner, optimizing by combining the pose of each frame, if the optimization result is converged, successfully initializing, solving IMU parameters, and ending the initialization step S0; otherwise go to S01.
Further, in step S03, the pose is solved using the visual features by:
s031, calculate the parallax error of the zeroth frame and the last frame in the sliding window, if the parallax error is greater than the second preset threshold, use the epipolar geometry method to solve the pose transformation of the zeroth frame and the last frame, change to step S032; if the parallax is not greater than the second preset threshold or greater than the second preset threshold but the pose transformation between the zeroth frame and the last frame cannot be solved, returning to the step S01 after the frames are removed from the sliding window;
s032, setting a coordinate system where a zeroth frame is located as a world coordinate system, setting a pose as an identity matrix, solving the pose of each frame by using a feature matching relation between frames in a sliding window, wherein the pose represents the conversion from a camera coordinate system to the world coordinate system, recovering a 3D point by using a triangulation method, constructing a cost function by using the pose of each frame in the sliding window and the 3D point coordinate as reprojection errors of an optimized state quantity, optimizing, and if the pose and the 3D point coordinate do not converge after optimization, returning to the step S01 after removing the frames in the sliding window.
Further, in step S03, an initialization cost function is constructed, initial values of state quantities of the cost function are adaptively estimated, and an IMU parameter value is obtained by optimizing in combination with a pose:
s033, calculating IMU pre-integration and covariance between frames in the sliding window;
s034, assuming that the external reference of rotation between the camera and the IMU is known, converting the pose of each frame in the sliding window from the current camera coordinate system to the Body coordinate system of the IMU;
s035, initializing optimized state quantities including scale, gravity direction, speed, accelerometer deviation and gyroscope deviation, calculating the initial value of the speed by the translation of the image frame, defaulting the accelerometer deviation and the gyroscope deviation to zero, and estimating the initial values of the gravity direction and the scale by assuming that the initial noise of the velocity and the displacement pre-integration is zero;
s036, constructing an initialization cost function by using IMU pre-integration residual errors, adding a prior to the accelerometer, wherein the order of the prior covariance diagonal value of the accelerometer is the same as the order of the covariance matrix diagonal value among accelerometer deviations, and the latter is directly read out from the IMU covariance matrix in S033;
s037, optimizing to obtain IMU parameter values, converting the world coordinate system of the pose in the sliding window into a northeast coordinate system, wherein the pose represents the conversion from the Body coordinate system of the IMU to the northeast coordinate system, namely the final output vehicle pose.
Further, in step S035, the IMU parameters are defined as the state quantity

$\chi = \{\, s,\ \theta_{BI},\ b_a,\ b_g,\ \bar v_{0:WZ} \,\},\quad i \in [0, WZ]$

and the conventional pre-integration residual equation is described as $r_I = [r_{\Delta R}, r_{\Delta v}, r_{\Delta p}]$:

$r_{\Delta R} = \big( \Delta R_{ij}^{T} R_i^{T} R_j \big)_{vec}$

$r_{\Delta v} = R_i^{T} \big( s\bar v_j - s\bar v_i - R_{BI}\, g_I\, \Delta t_{ij} \big) - \Delta v_{ij}$

$r_{\Delta p} = R_i^{T} \big( s\bar p_j - s\bar p_i - s\bar v_i \Delta t_{ij} - \tfrac{1}{2} R_{BI}\, g_I\, \Delta t_{ij}^{2} \big) - \Delta p_{ij}$

In the formula, $WZ$ is the size of the sliding window; $s$ is the scale factor; $\theta_{BI}$ is the Lie algebra of the matrix $R_{BI}$, the rotation matrix from the northeast gravity coordinate $g_I = (0, 0, G)$ to the IMU Body coordinate $g_B$, where $G$ is the magnitude of gravity; $b_a$ is the accelerometer bias and $b_g$ is the gyroscope bias; $\bar v_i$ is the non-scale velocity of the $i$-th frame in the sliding window; $R_i$ is the rotation matrix of the $i$-th frame and $R_j$ that of the $j$-th frame; $\Delta R_{ij}$ is the rotation pre-integration between the $i$-th and $j$-th frames; $(\cdot)_{vec}$ denotes the Lie algebra of the quaternion corresponding to the rotation matrix; $\bar v_j$ is the non-scale velocity of the $j$-th frame in the sliding window; $\Delta v_{ij}$ is the velocity pre-integration between the $i$-th and $j$-th frames; $\bar p_i$, $\bar p_j$ are the non-scale translations found by the visual structure; $\Delta p_{ij}$ is the translation pre-integration between the $i$-th and $j$-th frames; $\Delta t_{ij}$ is the time elapsed from the $i$-th frame to the $j$-th frame.
Further, in step S036, the IMU parameter $\chi$ is estimated from the maximum a posteriori estimation perspective. Defining $I_{i,i+1}$ as the pre-integration value between frames $i$ and $i+1$, and $I_{0:WZ}$ as the set of pre-integration values of consecutive IMU measurements within the sliding window, the posterior distribution $P(\chi \mid I_{0:WZ})$ is described as:

$P(\chi \mid I_{0:WZ}) \propto P(I_{0:WZ} \mid \chi)\, P(\chi)$

wherein $P(\chi)$ is the prior distribution and $P(I_{0:WZ} \mid \chi)$ is the likelihood distribution, which is decomposed as:

$P(I_{0:WZ} \mid \chi) = \prod_{i=0}^{WZ-1} P(I_{i,i+1} \mid \chi)$

$\chi$ is estimated by solving for its maximum likelihood, $\arg\max_{\chi} P(I_{0:WZ} \mid \chi)$;

assuming $P(I_{0:WZ} \mid \chi)$ follows a Gaussian distribution, maximizing $P(I_{0:WZ} \mid \chi)$ is equivalent to solving for the parameter minimizing its negative logarithm, which gives the initialization objective function:

$\chi^{*} = \arg\min_{\chi} \Big( \| r_p \|^{2}_{\Sigma_p} + \sum_{i=0}^{WZ-1} \| r_{I_{i,i+1}} \|^{2}_{\Sigma_{I_{i,i+1}}} \Big)$

wherein $r_p = b_{prior} - b_a$ is the accelerometer prior residual, $r_{I_{i,i+1}}$ is the IMU pre-integration residual between frames in the sliding window, $\Sigma_p$ is the covariance of $r_p$ and is a diagonal matrix, and $\Sigma_{I_{i,i+1}}$ is the covariance of $r_{I_{i,i+1}}$;

$\chi$ is updated using the perturbation model:

to keep $s$ positive, the update is defined as $s = s \cdot \exp(\delta s)$;

$\theta_{BI}$ is updated in the manifold space:

$\delta\theta_{BI} = (\delta g_{BI}, 0)$

$\theta_{BI} = \mathrm{Log}\big( \mathrm{Exp}(\theta_{BI}) \cdot \mathrm{Exp}(\delta\theta_{BI}) \big)$

where $\delta s$ is the perturbation of $s$, $\delta g_{BI} = (\delta\alpha, \delta\beta)$ is the perturbation of $\theta_{BI}$, $\theta_{BI}$ is obtained through $g_I$ and $g_B$, $\exp(\cdot)$ is the exponential function, $\mathrm{Exp}(\cdot)$ is the exponential map taking $\theta_{BI}$ to its rotation matrix, and $\mathrm{Log}(\cdot)$ is the logarithmic map taking the rotation matrix back to $\theta_{BI}$;

if the initialization objective function does not converge, a frame is removed from the sliding window and the process returns to step S01.
Further, step S3 includes:
s31, setting the optimized state quantity of each frame in the sliding window as a translation vector and the depth of the feature points, constructing a cost function by using the visual reprojection error, and solving the reprojection error size of all the feature points;
s32, decoupling the feature point optical flow into an optical flow only related to translation and feature point depth by utilizing the rotation vector;
s33, taking the two-norm of each decoupled feature-point optical flow as the normalization term of the corresponding reprojection error, solving the normalized reprojection errors of all feature points, and taking a set multiple (3 times) of their average value as the threshold for judging abnormal points; if the normalized reprojection error of the current feature point is larger than the threshold, the feature point is regarded as an abnormal point and removed.
Further, in step S4, the method for selecting the corresponding visual reprojection error and IMU residual as the cost function in the sliding window is as follows:
if the mode I is adopted, constructing a cost function by using the visual reprojection error;
if the mode G is adopted, constructing a cost function by using the rotation pre-integration residual error and the visual reprojection error;
if in mode A, a cost function is constructed by the rotation pre-integration residual, the velocity pre-integration residual, the translation pre-integration residual, the gyroscope and accelerometer residual, and the visual re-projection error.
Further, in step S5, the method for determining the optimization mode according to the IMU measurement information or the uncertainty of the IMU measurement value is as follows:
s51, if the current optimization mode is mode I or mode G, constructing a cost function from the rotation pre-integration residual and the visual residual and solving the least squares problem to obtain the covariance matrix of the gyroscope deviation of the current frame, taking its trace as the uncertainty, wherein the covariance matrix of the gyroscope deviation is calculated as the corresponding diagonal block of the inverse of the Hessian matrix; when the uncertainty is lower than a third set threshold, setting the optimization mode to mode G and entering S52; otherwise, setting the optimization mode to mode I and turning to step S6;
s52, if the current optimization mode is mode A, constructing a cost function by using rotation, translation, speed pre-integration residual errors and visual residual errors, firstly checking uncertainty of gyroscope deviation of the current frame, and if the uncertainty exceeds a fourth set threshold value, setting the optimization mode as mode I; otherwise, the uncertainty of the scale is evaluated, and if the variance value relative to the current scale estimate is higher than a fifth set threshold, the optimization mode is set to mode G, otherwise the optimization mode is set to mode a.
Further, in step S51, the state quantities are the pose and the gyroscope bias, and the following objective function is established with the rotation pre-integration residual and the visual residual:

$\min_{x} \sum \rho\big( \| e_R \|^{2}_{\Sigma_R} \big) + \sum \rho\big( \| e_{vis} \|^{2}_{\Sigma_{vis}} \big)$

where $\rho$ is the Huber kernel function, $e_R$ and $e_{vis}$ are respectively the rotation pre-integration residual and the visual residual, and $\Sigma_R$, $\Sigma_{vis}$ are the covariance matrices of the rotation pre-integration and of the vision; the rotation pre-integration residual and its covariance matrix have already been calculated in the IMU pre-integration of S1;
the visual residual and its covariance matrix can be obtained from the linearization of the reprojection error at the linearization point:

$e_{vis,i} = J_i\, \delta x_i, \quad \delta x_i = \big( \mathrm{Log}(\hat R_i^{T} R_i),\ p_i - \hat p_i \big)$

$\Sigma_{vis,i} = \big( J_i^{T} J_i \big)^{-1}$

wherein $\hat R_i$ and $\hat p_i$ denote the linearization point, $J_i$ is the Jacobian matrix of the visual reprojection error in the nonlinear optimization of step S4, and $R_i$, $p_i$ are respectively the rotation matrix and translation vector of the $i$-th frame from the camera coordinate system to the northeast coordinate system;
in step S52, the state quantities are the pose, gyroscope bias, accelerometer bias, velocity, scale and gravity direction, and the following objective function is established with the rotation, velocity and translation pre-integration residuals and the visual residual:

$\min_{x} \sum \rho\big( \| e_{Rvp} \|^{2}_{\Sigma_{Rvp}} \big) + \sum \rho\big( \| e_{vis} \|^{2}_{\Sigma_{vis}} \big)$

$e_{Rvp} = [\, e_R,\ e_v,\ e_p \,]$

wherein $e_{Rvp}$ is the pre-integration residual from S1, $e_{vis}$ is the visual residual from S51, $\Sigma_{Rvp}$, $\Sigma_{vis}$ are the covariance matrices of the pre-integration and of the vision, and $e_R$, $e_v$, $e_p$ are respectively the rotation, velocity and translation pre-integration residuals.
Due to the adoption of the technical scheme, the invention has the following advantages: 1. the invention adopts an initialization method based on maximum posterior estimation, correctly considers the correlation among IMU parameters, does not make any assumption on the initial value of the parameters, does not assume that the IMU deviation is zero, estimates all IMU parameters at one time, and solves the problem of inconsistency caused by adopting a staged estimation method in the past initialization; 2. the method for removing the outlier based on the depth normalization reprojection error of the translation vector and the feature point is used, and the influence of the feature point position and the motion estimation error factor is eliminated; 3. the invention adopts a multi-mode optimization method based on IMU measurement information or IMU measurement value uncertainty, and avoids the problem of information deficiency caused by pure rotation motion, uniform motion and the like by correctly considering the uncertainty of the IMU measurement information or the IMU measurement value, thereby maintaining accurate attitude optimization; 4. the invention realizes a complete monocular vision inertial positioning system, not only ensures accurate and robust real-time positioning in a closed park, but also has simple equipment and good economical efficiency.
Drawings
Fig. 1 is a flow chart of an embodiment of a method according to the invention.
FIG. 2 is a diagram of an initialization factor graph according to an embodiment of the method of the present invention.
Fig. 3 and 4 are schematic views of factor graphs of an IMU measurement information uncertainty determination method of the present invention.
Detailed Description
In the drawings, the same or similar reference numerals are used to denote the same or similar elements or elements having the same or similar functions. Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
In the description of the present invention, the terms "central", "longitudinal", "lateral", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus should not be construed as limiting the scope of the present invention.
The monocular vision inertial positioning method for automatic driving of the closed park, provided by the embodiment of the invention, comprises the following steps:
as shown in fig. 1, feature points of a monocular image (hereinafter, may be described as "image" or "XX frame") are extracted and tracked, a current key frame is determined, and simultaneously, input IMU measurement information or IMU measurement values are pre-integrated; after initialization is successful, firstly removing abnormal points, then obtaining the vehicle pose through nonlinear optimization, and determining an optimization mode by using an optimization result; and finally, removing frames by using a sliding window method to ensure real-time performance and accuracy.
The main steps are detailed as follows:
S0. Initialization step: reading in multi-frame continuous images and corresponding IMU measurement information or IMU measurement values, performing initialization, determining the IMU parameters, and initially setting the optimization mode to mode A, in which the rotation vector and translation vector obtained by IMU rotation integration and translation integration are directly used as the pose of the image frames.
S1, reading IMU measurement information corresponding to a current frame, tracking feature points of a previous frame, extracting a preset number of feature points from the current frame, and processing the IMU measurement information between the current frame and the previous frame by adopting a pre-integration method.
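By way of illustration, the following is a minimal sketch of the pre-integration in S1, assuming IMU samples arrive as (gyro, accel, dt) tuples between two image frames; the Euler integration and the names are illustrative rather than the patent's exact formulation (a midpoint scheme and covariance propagation would be added in practice):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def preintegrate(imu_samples, bg, ba):
    """Accumulate raw IMU samples between two image frames into the
    relative-motion terms (dR, dv, dp) used as pre-integration
    measurements (Euler form; covariance propagation omitted)."""
    dR = np.eye(3)          # rotation pre-integration   (Delta R_ij)
    dv = np.zeros(3)        # velocity pre-integration   (Delta v_ij)
    dp = np.zeros(3)        # translation pre-integration (Delta p_ij)
    for gyro, acc, dt in imu_samples:      # rad/s, m/s^2, s
        a = acc - ba                       # bias-corrected acceleration
        dp = dp + dv * dt + 0.5 * dR @ a * dt ** 2
        dv = dv + dR @ a * dt
        dR = dR @ Rotation.from_rotvec((gyro - bg) * dt).as_matrix()
    return dR, dv, dp
```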
S2, based on the optimization mode, using the feature points of the current frame or the integral of the IMU measurement values to obtain the pose of the current frame.
S3, determining abnormal points in the feature points of the current frame by adopting a method of decoupling the translation vector and the depth of the feature points with the rotation vector and solving the normalized reprojection error, and removing the abnormal points. Wherein the outlier may include a point where the recovered 3D point has a large reprojection error with the tracked frame, and a moving point.
S4, selecting corresponding visual re-projection errors and IMU residual errors in the sliding window as cost functions according to the set optimization mode, constructing a least square problem, and solving through nonlinear optimization to obtain the vehicle pose;
S5, setting the optimization mode according to the uncertainty of the IMU measurement information or IMU measurement values, namely selecting one of mode I, mode G and mode A, wherein in mode I the pose of the image frame is solved through visual PnP by taking the rotation vector obtained by IMU rotation integration as the initial value, and in mode G the pose of the image frame is solved through visual PnP by taking the translation vector obtained by IMU translation integration as the initial value;
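As a sketch of how modes I and G might seed the pose solution with the IMU integral, the integrated rotation/translation can be passed to an iterative PnP solver as the initial guess; OpenCV's solvePnPRansac with useExtrinsicGuess is used here for illustration only, as the patent does not prescribe a particular solver:

```python
import cv2
import numpy as np

def pose_from_pnp(pts3d, pts2d, K, rvec_imu, tvec_imu):
    """Solve the current-frame pose by visual PnP, seeding the
    iterative solver with the rotation/translation integrated from
    the IMU (the mode I / mode G initial values)."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, None,
        rvec=rvec_imu.copy(), tvec=tvec_imu.copy(),
        useExtrinsicGuess=True, flags=cv2.SOLVEPNP_ITERATIVE)
    # fall back to the pure IMU integral if PnP fails
    return (rvec, tvec) if ok else (rvec_imu, tvec_imu)
```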
s6, in the sliding window, if the previous frame of the current frame is a non-key frame, removing the previous frame; otherwise, removing the initial frame of the sliding window; go to S1 after the frame is removed.
In one embodiment, the initialization step S0 specifically includes the following steps:
s01, reading in a frame of image as a current frame and reading in corresponding IMU measurement information, and if only one frame of image of the current frame exists in a sliding window, extracting M characteristic points from the current frame; otherwise, using LK optical flow to track the feature points of the last frame, and extracting proper feature points in the current frame to ensure that the total number of the feature points of the current frame is M.
If the current frame is the first two frames in the sliding window, setting the current frame as a key frame; otherwise, judging whether the previous frame is a key frame based on the parallax between the previous frame and the previous frame of the previous frame; specifically, if the disparity between the previous frame and the previous frame of the previous frame is greater than or equal to a set threshold (for example, but not limited to, 8-12 pixels), the previous frame is set as a key frame; otherwise, the last frame is set as the normal frame. Here, the parallax may be understood as an average value of euclidean distances calculated for coordinates of feature points matched between two frames.
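A small sketch of the parallax-based keyframe test described above, assuming matched feature coordinates are given as N x 2 arrays; the 10-pixel default merely illustrates the quoted 8-12 pixel range:

```python
import numpy as np

def mean_parallax(pts_a, pts_b):
    """Parallax between two frames: mean Euclidean distance (pixels)
    of matched feature-point coordinates, as described above."""
    return float(np.mean(np.linalg.norm(pts_a - pts_b, axis=1)))

def prev_frame_is_keyframe(pts_prev, pts_prev2, threshold_px=10.0):
    # threshold_px is illustrative of the quoted 8-12 pixel range
    return mean_parallax(pts_prev, pts_prev2) >= threshold_px
```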
S02, if the number of the images in the sliding window reaches the size of the sliding window, entering S03; otherwise, returning to S01;
s03, solving the pose of each frame by using the visual features of the image in the sliding window, constructing an initialization cost function from the perspective of maximum posterior estimation, estimating the initial value of the state quantity of the cost function in a self-adaptive manner, optimizing by combining the pose of each frame, if the optimization result is converged, successfully initializing, obtaining IMU parameters, and ending the initialization step S0; otherwise go to S01.
In one embodiment, in step S03, the pose is solved using the visual features by:
s031, calculate the parallax error of the zeroth frame and the last frame in the sliding window, if the parallax error is greater than the second preset threshold, use the epipolar geometry method to solve the pose transformation of the zeroth frame and the last frame, change to step S032; if the parallax is not greater than the second preset threshold or greater than the second preset threshold but the pose transformation between the zeroth frame and the last frame cannot be solved, the step S01 is returned after the frame is removed from the sliding window.
S032, setting a coordinate system where a zeroth frame is located as a world coordinate system, setting a pose as an identity matrix, solving the pose of each frame by using a feature matching relation between frames in a sliding window, wherein the pose represents the conversion from a camera coordinate system to the world coordinate system, recovering a 3D point by using a triangulation method, constructing a cost function by using the pose of each frame in the sliding window and the 3D point coordinate as reprojection errors of an optimized state quantity, optimizing, and if the pose and the 3D point coordinate do not converge after optimization, returning to the step S01 after removing the frames in the sliding window.
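The following sketch illustrates S031/S032 with OpenCV: the relative pose of the last frame is recovered from the essential matrix and 3D points are obtained by triangulation, with the zeroth frame fixed as the world frame; the subsequent reprojection-error optimization is omitted:

```python
import cv2
import numpy as np

def init_structure(pts0, ptsN, K):
    """Pose of the last frame relative to the zeroth frame from the
    essential matrix, then 3D points by triangulation; the zeroth
    frame is fixed as the world frame with identity pose."""
    E, mask = cv2.findEssentialMat(pts0, ptsN, K,
                                   method=cv2.RANSAC, threshold=1.0)
    if E is None or E.shape != (3, 3):
        return None                       # pose not solvable
    _, R, t, mask = cv2.recoverPose(E, pts0, ptsN, K, mask=mask)
    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    PN = K @ np.hstack([R, t])
    X = cv2.triangulatePoints(P0, PN, pts0.T, ptsN.T)   # homogeneous 4xN
    return R, t, (X[:3] / X[3]).T                       # Euclidean Nx3
```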
In steps S031 and S032, the sliding window frame-removal method is as follows: if the previous frame of the current frame is a non-key frame, the previous frame is removed; otherwise, the initial frame of the sliding window is removed.
In one embodiment, in step S03, the initialization cost function is constructed, the initial value of the state quantity of the cost function is estimated adaptively, and the IMU parameter value is obtained by optimization in combination with the pose:
and S033, calculating IMU pre-integration and covariance between frames in the sliding window.
S034, assuming that the external reference of rotation between the camera and the IMU is known, converting the pose of each frame in the sliding window from the current camera coordinate system to the Body coordinate system of the IMU.
S035, initializing optimized state quantities including scale, gravity direction, speed, accelerometer deviation and gyroscope deviation, calculating the initial value of the speed through translation of the image frame, defaulting the accelerometer deviation and the gyroscope deviation to zero, and estimating the initial values of the gravity direction and the scale through assuming that the initial noise of velocity and displacement pre-integration is zero.
S036, an initialization cost function is built by IMU pre-integration residual errors, a priori is added to the accelerometer, the order of magnitude of the prior covariance diagonal value of the accelerometer is the same as the order of magnitude of the covariance matrix diagonal value between accelerometer deviations, and the order is directly read out from the IMU covariance matrix in S033.
S037, obtaining the IMU parameter values through optimization; the rotation matrix $R_{IB}$ from the Body system to the northeast coordinate system is obtained from the Lie algebra $\theta_{BI}$, and the world coordinate system of the poses in the sliding window is converted into the northeast coordinate system, so that the pose represents the transformation from the IMU Body coordinate system to the northeast coordinate system, i.e. the finally output vehicle pose.
In one embodiment, as shown in FIG. 2, in step S035, the IMU parameters are defined as the state quantity

$\chi = \{\, s,\ \theta_{BI},\ b_a,\ b_g,\ \bar v_{0:WZ} \,\},\quad i \in [0, WZ]$

and the pre-integration residual equation is described as $r_I = [r_{\Delta R}, r_{\Delta v}, r_{\Delta p}]$:

$r_{\Delta R} = \big( \Delta R_{ij}^{T} R_i^{T} R_j \big)_{vec}$

$r_{\Delta v} = R_i^{T} \big( s\bar v_j - s\bar v_i - R_{BI}\, g_I\, \Delta t_{ij} \big) - \Delta v_{ij}$

$r_{\Delta p} = R_i^{T} \big( s\bar p_j - s\bar p_i - s\bar v_i \Delta t_{ij} - \tfrac{1}{2} R_{BI}\, g_I\, \Delta t_{ij}^{2} \big) - \Delta p_{ij}$

Wherein $WZ$ is the size of the sliding window; $s$ is the scale factor; $\theta_{BI} \in so(3)$ is the Lie algebra of the rotation matrix $R_{BI} \in SO(3)$ from the northeast gravity coordinate $g_I = (0, 0, G)$ to the IMU Body coordinate $g_B$, with $G$ the magnitude of gravity; $b_a$, $b_g$ are the biases of the accelerometer and gyroscope; $\bar v_i$ is the non-scale velocity of the $i$-th frame in the sliding window; $R_i$ is the rotation matrix of the $i$-th frame and $R_j$ that of the $j$-th frame; $\Delta R_{ij}$ is the rotation pre-integration between the $i$-th and $j$-th frames; $(\cdot)_{vec}$ denotes the Lie algebra of the quaternion corresponding to the rotation matrix; $\bar v_j$ is the non-scale velocity of the $j$-th frame in the sliding window; $\Delta v_{ij}$ is the velocity pre-integration between the $i$-th and $j$-th frames; $\bar p_i$, $\bar p_j$ are the non-scale translations found by the visual structure; $\Delta p_{ij}$ is the translation pre-integration between the $i$-th and $j$-th frames; $\Delta t_{ij}$ is the time elapsed from the $i$-th frame to the $j$-th frame.
It should be noted that the initial values of the state quantities strongly influence the convergence speed and the accuracy of the optimization result, so the invention obtains the initial values as follows:

The velocity $\bar v_i$ is initialized as the translation between image frames $i$ and $i+1$, $\bar p_{i+1} - \bar p_i$, divided by the time between them.

The biases $b_a$, $b_g$ typically have per-dimension magnitudes on the order of hundredths, and are therefore directly initialized to the zero vector.

The rotation matrix $R_{BI}$ is estimated from the velocity pre-integration residual. Since initialization is completed within 1-2 seconds of vehicle motion, the initial velocity pre-integration residual can be assumed to be the zero vector. $R_{BI}$ is solved between consecutive frames from the velocity pre-integration residual and finally converted into the Lie algebra $\theta_{BI}$.

The scale factor $s$ is estimated from the translation pre-integration residual. Assuming the noise-induced translation pre-integration residual to be the zero vector, and using the estimated velocity of each frame in the sliding window and the rotation matrix $R_{BI}$, $s$ is solved from the translation pre-integration residual for consecutive frames and its median is taken as the initial scale factor, avoiding an excessive estimation error of $s$ caused by overly fast speed changes.
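A minimal sketch of these initial values (velocities from frame translations, zero biases); the R_BI and median-scale steps are omitted for brevity:

```python
import numpy as np

def initial_state(p_bar, times):
    """Initial values for the inertial-only optimization: non-scale
    velocities from frame-to-frame translations, zero biases.
    p_bar: per-frame non-scale positions; times: frame timestamps."""
    v = [(p_bar[k + 1] - p_bar[k]) / (times[k + 1] - times[k])
         for k in range(len(p_bar) - 1)]
    v.append(v[-1])              # last frame reuses the previous velocity
    ba = np.zeros(3)             # accelerometer bias initialized to zero
    bg = np.zeros(3)             # gyroscope bias initialized to zero
    return np.asarray(v), ba, bg
```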
In one embodiment, in step S036, the IMU parameter $\chi$ is estimated from the maximum a posteriori estimation perspective. Defining $I_{i,i+1}$ as the pre-integration value between frames $i$ and $i+1$, and $I_{0:WZ}$ as the set of pre-integration values of IMU measurements between consecutive frames in the sliding window, the posterior distribution $P(\chi \mid I_{0:WZ})$ is described as:

$P(\chi \mid I_{0:WZ}) \propto P(I_{0:WZ} \mid \chi)\, P(\chi)$

wherein $P(I_{0:WZ} \mid \chi)$ is the likelihood distribution and $P(\chi)$ is the prior distribution; because the IMU measurements are independent of each other, $P(I_{0:WZ} \mid \chi)$ can be decomposed as:

$P(I_{0:WZ} \mid \chi) = \prod_{i=0}^{WZ-1} P(I_{i,i+1} \mid \chi)$

$\chi$ is estimated by solving for its maximum likelihood $\arg\max_{\chi} P(I_{0:WZ} \mid \chi)$;

assuming $P(I_{0:WZ} \mid \chi)$ follows a Gaussian distribution, maximizing it is equivalent to solving for the parameter with minimum negative logarithm, which gives the initialization objective function:

$\chi^{*} = \arg\min_{\chi} \Big( \| r_p \|^{2}_{\Sigma_p} + \sum_{i=0}^{WZ-1} \| r_{I_{i,i+1}} \|^{2}_{\Sigma_{I_{i,i+1}}} \Big)$

wherein $r_{I_{i,i+1}}$ is the IMU pre-integration residual between frames in the sliding window and $\Sigma_{I_{i,i+1}}$ is its covariance, both obtained from S033; $r_p = b_{prior} - b_a$ is the accelerometer prior residual, where each dimension of $b_{prior}$ takes the value 0.01; $\Sigma_p$ is the covariance of $r_p$ and is a diagonal matrix whose diagonal values are of the same order of magnitude as the diagonal values of the covariance matrix between accelerometer biases, which can be read directly from the covariance matrix of the IMU in S033.
As shown in the factor graph of fig. 2, the pose solved by the visual structure is considered to be accurate and remains unchanged. Since the initialization time is extremely short, the offset of each frame in the sliding window can be considered constant, so the IMU residual does not include random walk of the offset.
In most cases, the accelerometer bias is too small relative to the gravitational acceleration and is therefore difficult to distinguish. For this, an accelerometer prior residual is added: $r_p = b_{prior} - b_a$, where each dimension of $b_{prior}$ takes the value 0.01. $\Sigma_p$ is a diagonal matrix whose diagonal values are of the same order of magnitude as the diagonal values of the inter-accelerometer-bias covariance matrix, which can be read directly from the covariance matrix of the IMU in S033.
If the motion of the vehicle does not contain enough information to estimate the accelerometer bias, the a priori residuals have their estimates close to zero. No a priori is added to the gyro bias since it can be well estimated from the orientation of the image frames and the gyro measurements.
When the inertial parameters are optimized nonlinearly, the biases and velocities are updated additively, and the perturbation model is used to update $\chi$:

to keep $s$ positive, the update is defined as $s = s \cdot \exp(\delta s)$;

$\theta_{BI}$ is updated in the manifold space:

$\delta\theta_{BI} = (\delta g_{BI}, 0)$

$\theta_{BI} = \mathrm{Log}\big( \mathrm{Exp}(\theta_{BI}) \cdot \mathrm{Exp}(\delta\theta_{BI}) \big)$

where $\delta s$ is the perturbation of $s$ and $\delta g_{BI} = (\delta\alpha, \delta\beta)$ is the perturbation of $\theta_{BI}$; $\theta_{BI}$ is in fact parameterized by the two angles obtained through $g_I$ and $g_B$; $\exp(\cdot)$ is the exponential function, $\mathrm{Exp}(\cdot)$ is the exponential map taking $\theta_{BI}$ to its rotation matrix, and $\mathrm{Log}(\cdot)$ is the logarithmic map taking the rotation matrix back to $\theta_{BI}$.
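The two manifold updates can be written compactly, for example with SciPy's rotation type standing in for Exp/Log (a sketch, not the patent's implementation):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def update_scale(s, delta_s):
    """Multiplicative update s <- s * exp(delta_s) keeps s positive."""
    return s * np.exp(delta_s)

def update_theta_BI(theta_BI, d_alpha, d_beta):
    """theta_BI <- Log(Exp(theta_BI) * Exp((d_alpha, d_beta, 0))):
    a 2-DoF on-manifold update of the gravity-direction rotation."""
    R = Rotation.from_rotvec(np.asarray(theta_BI))
    dR = Rotation.from_rotvec(np.array([d_alpha, d_beta, 0.0]))
    return (R * dR).as_rotvec()
```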
If the initialization objective function does not converge, a frame is removed from the sliding window and the process returns to step S01.
In one embodiment, as shown in the pseudo code of fig. 3, step S3 specifically includes:
s31, setting the optimized state quantity of each frame in the sliding window as a translation vector and the depth of the feature points, constructing a cost function by using the visual reprojection errors, and solving the reprojection error size of all the feature points. For example: judging whether the depth of the normalized feature points in the sliding window except the last frame in a camera coordinate system is larger than 0, if the depth is larger than 0 and the matching points of the current feature points exist in the last frame, setting optimization variables of an image frame as the depth and translation vectors of the feature points, constructing a cost function by using visual reprojection errors, solving the size of the reprojection errors of all the feature points, and if a target function formed by using the visual reprojection errors as the cost function is not converged, repeating the step S31 until the final convergence or the maximum iteration times are reached; otherwise, skipping the feature point to judge the next feature point.
S32, decoupling the feature point optical flow into the optical flow only related to translation and feature point depth by utilizing the rotation vector.
S33, taking the two-norm of each decoupled feature-point optical flow as the normalization term of the corresponding reprojection error, solving the normalized reprojection errors of all feature points, and taking a set multiple (3 times) of their average value as the threshold for judging abnormal points; if the normalized reprojection error of the current feature point is larger than the threshold, the feature point is regarded as an abnormal point and removed.
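A sketch of S32/S33, assuming per-feature reprojection errors and pixel coordinates are available; decoupled_flow removes the rotation-predicted motion so the normalization term depends only on translation and feature depth (helper names are illustrative):

```python
import numpy as np

def decoupled_flow(pt_prev, pt_curr, R_rel, K):
    """Subtract the rotation-induced motion of a feature so the
    remaining optical flow depends only on translation and depth."""
    x = np.linalg.inv(K) @ np.array([pt_prev[0], pt_prev[1], 1.0])
    x_rot = K @ (R_rel @ x)            # where pure rotation sends it
    return pt_curr - x_rot[:2] / x_rot[2]

def reject_outliers(reproj_err, flows, factor=3.0):
    """Normalize each reprojection error by the two-norm of the
    decoupled flow; reject features above factor * mean."""
    norm_err = reproj_err / np.linalg.norm(flows, axis=1)
    return norm_err <= factor * np.mean(norm_err)   # inlier mask
```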
In one embodiment, in step S4, the method of selecting the corresponding visual reprojection error and IMU residual as cost functions in the sliding window is as follows:
if in mode I, the cost function is constructed with the visual reprojection error.
If in mode G, a cost function is constructed with the rotated pre-integration residual and the visual reprojection error.
If in mode A, a cost function is constructed by the rotation pre-integration residual, the velocity pre-integration residual, the translation pre-integration residual, the gyroscope and accelerometer residual, and the visual re-projection error.
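The mode-dependent choice of cost terms can be summarized as a small dispatch, with the block names purely illustrative:

```python
def cost_blocks(mode, residuals):
    """Residual blocks entering the sliding-window least squares for
    each optimization mode; the dict keys are illustrative names."""
    if mode == "I":        # large gyroscope and accelerometer uncertainty
        keys = ["reproj"]
    elif mode == "G":      # only the gyroscope is trustworthy
        keys = ["reproj", "rot_preint"]
    else:                  # mode "A": the full IMU is trustworthy
        keys = ["reproj", "rot_preint", "vel_preint",
                "trans_preint", "bias"]
    return [residuals[k] for k in keys]
```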
In one embodiment, in step S5, the method for determining the optimization mode according to the IMU measurement information or the uncertainty of the IMU measurement value is as follows:
S51, if the current optimization mode is mode I or mode G, constructing a cost function from the rotation pre-integration residual and the visual residual and solving the least squares problem to obtain the covariance matrix of the gyroscope bias of the current frame, taking its trace as the uncertainty; the covariance matrix is calculated as the corresponding diagonal block of the inverse of the Hessian matrix; when the uncertainty is lower than the third set threshold, setting the optimization mode to mode G and entering S52; otherwise, setting the optimization mode to mode I and proceeding to step S6.
S52, if the current optimization mode is mode A, constructing a cost function by using rotation, translation, speed pre-integration residual errors and visual residual errors, firstly checking uncertainty of gyroscope deviation of a current frame, and if the uncertainty exceeds a fourth set threshold value, setting the optimization mode as mode I; otherwise, the uncertainty of the scale is evaluated, and if the variance value relative to the current scale estimate is higher than a fifth set threshold, the optimization mode is set to mode G, otherwise the optimization mode is set to mode a.
The uncertainty of the gyro bias for the latest frame is first checked and if the trace of its covariance matrix exceeds a threshold, mode I is selected for subsequent optimization. Otherwise, the uncertainty of the scale will be evaluated. Accelerometer bias and gravitational acceleration tend to be coupled and in most cases difficult to distinguish, and it is found from the velocity pre-integration residual and the translation pre-integration residual that there is a strong correlation between scale, velocity, gravitational direction and accelerometer bias, so that assessing uncertainty in scale is sufficient to measure uncertainty in the state related to accelerometer measurements. Mode G is selected if the variance value relative to the current scale estimate is above a threshold, and mode a is selected otherwise.
Wherein, mode I refers to a mode with large measurement uncertainty of a gyroscope and an accelerometer; mode G refers to a mode with small measurement uncertainty of a gyroscope and large measurement uncertainty of an accelerometer; mode A refers to the mode where the gyroscope and accelerometer measurement uncertainty is small.
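A sketch of the switching logic in S51/S52, assuming the Hessian of the last optimization is available and that index slices locate the gyroscope-bias and scale entries; th3/th4/th5 stand in for the third/fourth/fifth set thresholds, and the relative scale variance is one plausible reading of "variance value relative to the current scale estimate":

```python
import numpy as np

def select_mode(H, bg_slice, s_index, s_est, mode, th3, th4, th5):
    """Uncertainty-driven mode switch: covariances are diagonal
    blocks of the inverse Hessian of the last optimization."""
    cov = np.linalg.inv(H)
    u_gyro = np.trace(cov[bg_slice, bg_slice])   # gyro-bias uncertainty
    if mode in ("I", "G"):
        return "G" if u_gyro < th3 else "I"
    if u_gyro > th4:                             # current mode is "A"
        return "I"
    rel_scale_var = cov[s_index, s_index] / s_est ** 2
    return "G" if rel_scale_var > th5 else "A"
```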
In one embodiment, in step S51, the state quantities are the pose and the gyroscope bias, and the following objective function is established with the rotation pre-integration residual and the visual residual:

$\min_{x} \sum \rho\big( \| e_R \|^{2}_{\Sigma_R} \big) + \sum \rho\big( \| e_{vis} \|^{2}_{\Sigma_{vis}} \big)$

where $\rho$ is the Huber kernel function, $e_R$, $e_{vis}$ are respectively the rotation pre-integration residual and the visual residual, and $\Sigma_R$, $\Sigma_{vis}$ are the covariance matrices of the rotation pre-integration and of the vision; the rotation pre-integration residual and its covariance matrix have already been calculated in the IMU pre-integration at S1.
The visual residual and its covariance matrix can be obtained from the linearization of the reprojection error at the linearization point:

$e_{vis,i} = J_i\, \delta x_i, \quad \delta x_i = \big( \mathrm{Log}(\hat R_i^{T} R_i),\ p_i - \hat p_i \big)$

$\Sigma_{vis,i} = \big( J_i^{T} J_i \big)^{-1}$

wherein $\hat R_i$ and $\hat p_i$ denote the linearization point, $J_i$ is the Jacobian matrix of the visual reprojection error in the nonlinear optimization of step S4, and $R_i$, $p_i$ are respectively the rotation matrix and translation vector of the $i$-th frame from the camera coordinate system to the northeast coordinate system.
In step S52, the state quantities are the pose, gyroscope bias, accelerometer bias, velocity, scale and gravity direction, and the following objective function is established with the rotation, velocity and translation pre-integration residuals and the visual residual:

$\min_{x} \sum \rho\big( \| e_{Rvp} \|^{2}_{\Sigma_{Rvp}} \big) + \sum \rho\big( \| e_{vis} \|^{2}_{\Sigma_{vis}} \big)$

$e_{Rvp} = [\, e_R,\ e_v,\ e_p \,]$

wherein $e_{Rvp}$ is the pre-integration residual from S1, $e_{vis}$ is the visual residual from S51, $\Sigma_{Rvp}$, $\Sigma_{vis}$ are the covariance matrices of the pre-integration and of the vision, and $e_R$, $e_v$, $e_p$ are respectively the rotation, velocity and translation pre-integration residuals.
Finally, it should be pointed out that: the above examples are only for illustrating the technical solutions of the present invention, and are not limited thereto. Those of ordinary skill in the art will understand that: modifications can be made to the technical solutions described in the foregoing embodiments, or some technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A monocular vision inertial positioning method oriented to automatic driving of a closed park, which fuses a plurality of frames of continuous monocular images and IMU measurement information to output the pose of a vehicle, is characterized by comprising the following steps:
s0, reading multi-frame continuous monocular images and corresponding IMU measurement information, performing initialization processing, determining IMU parameters, and initially setting an optimization mode as a mode A mode, wherein in the mode A mode, a rotation vector and a translation vector obtained by IMU rotation integration and translation integration are used as poses of image frames;
s1, reading IMU measurement information corresponding to the current frame, tracking feature points of the previous frame, extracting a preset number of feature points from the current frame, and processing the IMU measurement information between the current frame and the previous frame by adopting a pre-integration method;
s2, based on the optimization mode, adopting the feature points of the current frame or IMU measurement value integral to obtain the pose of the current frame;
s3, determining abnormal points in the feature points of the current frame by adopting a method of decoupling the translation vector and the depth of the feature points from the rotation vector and solving the normalized reprojection error, and removing the abnormal points;
s4, selecting corresponding visual reprojection errors and IMU residual errors in a sliding window as cost functions according to the set optimization mode, constructing a least square problem, and solving through nonlinear optimization to obtain a vehicle pose;
s5, setting an optimization mode according to the uncertainty of IMU measurement information, namely selecting one of mode I, mode G and mode A, wherein in the mode I, the pose of the image frame is solved through visual PnP by taking a rotation vector obtained by IMU rotation integration as an initial value, and in the mode G, the pose of the image frame is solved through visual PnP by taking a translation vector obtained by IMU translation integration as an initial value;
s6, in the sliding window, if the previous frame of the current frame is a non-key frame, removing the previous frame; otherwise, removing the initial frame of the sliding window; a transition is made to S1 after the frame is removed,
in step S4, the method for selecting the corresponding visual reprojection error and IMU residual as the cost function in the sliding window is as follows:
if the mode I is adopted, constructing a cost function by using the visual reprojection error;
if the mode G is adopted, constructing a cost function by using the rotation pre-integration residual error and the visual reprojection error;
if in mode A, a cost function is constructed with the rotation pre-integration residual, the velocity pre-integration residual, the translation pre-integration residual, the gyroscope and accelerometer residual, and the visual re-projection error,
in step S5, the method for determining the optimization mode according to the IMU measurement information or the uncertainty of the IMU measurement value is as follows:
s51, if the current optimization mode is mode I or mode G, constructing a cost function from the rotation pre-integration residual and the visual residual and solving the least squares problem to obtain the covariance matrix of the gyroscope deviation of the current frame, taking its trace as the uncertainty, wherein the covariance matrix of the gyroscope deviation is calculated as the corresponding diagonal block of the inverse of the Hessian matrix; when the uncertainty is lower than a third set threshold, setting the optimization mode to mode G and entering S52; otherwise, setting the optimization mode to mode I and turning to the step S6;
s52, if the current optimization mode is mode A, constructing a cost function by using rotation, translation, speed pre-integration residual errors and visual residual errors, firstly checking uncertainty of gyroscope deviation of the current frame, and if the uncertainty exceeds a fourth set threshold value, setting the optimization mode as mode I; otherwise, the uncertainty of the scale is evaluated, and if the variance value relative to the current scale estimate is higher than a fifth set threshold, the optimization mode is set to mode G, otherwise the optimization mode is set to mode a.
2. Monocular visual inertial positioning method towards closed park autopilot according to claim 1,
the initialization step S0 specifically includes the following steps:
s01, reading in a frame of image as a current frame and reading in corresponding IMU measurement information, and if only one frame of image of the current frame exists in the sliding window, extracting M characteristic points from the current frame; otherwise, tracking the feature points of the previous frame by using an LK optical flow, and extracting appropriate feature points from the current frame to ensure that the total number of the feature points of the current frame is M;
if the current frame is the first two frames in the sliding window, setting the current frame as a key frame; otherwise, judging whether the previous frame is a key frame based on the parallax between the previous frame and the previous frame of the previous frame; specifically, if the disparity between the previous frame and the previous frame of the previous frame is greater than or equal to a set threshold, setting the previous frame as a key frame; otherwise, setting the previous frame as a common frame;
s02, if the number of the images in the sliding window reaches the size of the sliding window, the step goes to S03; otherwise, returning to S01;
s03, solving the pose of each frame by using the visual features of the image in the sliding window, constructing an initialization cost function from the perspective of maximum posterior estimation, estimating the initial value of the state quantity of the cost function in a self-adaptive manner, optimizing by combining the pose of each frame, if the optimization result is converged, successfully initializing, solving IMU parameters, and ending the initialization step S0; otherwise go to S01.
3. The monocular vision inertial positioning method for automatic driving of a closed park according to claim 2, wherein in step S03 the pose is solved using visual features as follows:
S031, the parallax between the zeroth frame and the last frame in the sliding window is calculated; if the parallax is greater than a second preset threshold, the pose transformation between the zeroth frame and the last frame is solved by the epipolar geometry method and the method goes to step S032; if the parallax is not greater than the second preset threshold, or if it is greater but the pose transformation between the zeroth and last frames cannot be solved, frames are removed from the sliding window and the method returns to step S01;
S032, the coordinate system of the zeroth frame is set as the world coordinate system and its pose as the identity matrix; the pose of each frame, representing the transformation from the camera coordinate system to the world coordinate system, is solved from the feature-matching relations between the frames in the sliding window, and 3D points are recovered by triangulation; a cost function is constructed from the reprojection errors, taking the pose of each frame in the sliding window and the 3D point coordinates as the optimized state quantities, and optimized; if the poses and 3D point coordinates do not converge after optimization, frames are removed from the sliding window and the method returns to step S01.
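A minimal sketch of S031/S032, assuming a pinhole intrinsic matrix K and Nx2 arrays of matched pixel coordinates between the zeroth and last frames; OpenCV is assumed and the failure handling is simplified:

    import cv2
    import numpy as np

    def relative_pose_and_points(K, pts0, ptsN):
        # Epipolar geometry between the zeroth and last frames (S031).
        E, _ = cv2.findEssentialMat(pts0, ptsN, K,
                                    method=cv2.RANSAC, threshold=1.0)
        if E is None:
            return None  # caller removes frames and returns to S01
        _, R, t, _ = cv2.recoverPose(E, pts0, ptsN, K)
        # Zeroth frame fixed as world with identity pose; triangulate (S032).
        P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        PN = K @ np.hstack([R, t])
        X = cv2.triangulatePoints(P0, PN, pts0.T, ptsN.T)
        return R, t, (X[:3] / X[3]).T  # relative pose and Euclidean 3D points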
4. The monocular vision inertial positioning method for automatic driving of a closed park according to claim 3, wherein in step S03 the initialization cost function is constructed, the initial values of its state quantities are estimated adaptively, and the IMU parameter values are obtained by optimization together with the poses, as follows:
S033, the IMU pre-integration and covariance between the frames in the sliding window are calculated;
S034, based on the rotation extrinsic between the camera and the IMU, the pose of each frame in the sliding window is converted from the camera coordinate system to the Body coordinate system of the IMU;
S035, the optimized state quantities, comprising scale, gravity direction, velocity, accelerometer bias and gyroscope bias, are initialized: the initial velocities are calculated from the translations of the image frames, the accelerometer and gyroscope biases default to zero, and the initial values of the gravity direction and the scale are estimated under the assumption that the initial noise of the velocity and displacement pre-integrations is zero;
S036, an initialization cost function is constructed from the IMU pre-integration residuals, and a prior is added on the accelerometer; the order of magnitude of the diagonal values of the accelerometer prior covariance is the same as that of the accelerometer-bias diagonal block of the covariance matrix, the latter being read directly from the IMU covariance matrix computed in S033;
S037, the IMU parameter values are obtained by optimization, and the world coordinate system of the poses in the sliding window is converted into the East-North-Up (ENU) coordinate system; each pose then represents the transformation from the Body coordinate system of the IMU to the ENU coordinate system, i.e. the final output vehicle pose.
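A sketch of the S035 initial values under the stated assumptions (positions taken from the up-to-scale visual structure, biases zero, velocities as finite differences of the frame translations); the dictionary layout and names are illustrative choices:

    import numpy as np

    def init_state(positions, stamps):
        # positions: per-frame 3-vectors from the visual structure (no scale);
        # stamps: per-frame timestamps in seconds; at least two frames assumed.
        vel = [(p1 - p0) / (t1 - t0)
               for p0, p1, t0, t1 in zip(positions, positions[1:],
                                         stamps, stamps[1:])]
        vel.append(vel[-1])  # last frame reuses the previous velocity
        return {
            "scale": 1.0,             # refined by the initialization cost
            "theta_BI": np.zeros(2),  # gravity-direction tangent parameters
            "ba": np.zeros(3),        # accelerometer bias defaults to zero
            "bg": np.zeros(3),        # gyroscope bias defaults to zero
            "vel": np.asarray(vel),
        }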
5. The monocular vision inertial positioning method for automatic driving of a closed park according to claim 4, wherein in step S035 the IMU parameters are defined as the state quantity

χ = { s, θ_BI, b_a, b_g, v̄_0:W }

and the conventional pre-integration residual equation is described as r_I = [r_ΔR, r_Δv, r_Δp]:

r_ΔR_ij = Log( ΔR_ij^T · R_i^T · R_j )
r_Δv_ij = R_i^T ( s·v̄_j − s·v̄_i − R_BI·g_I·Δt_ij ) − Δv_ij
r_Δp_ij = R_i^T ( s·p̄_j − s·p̄_i − s·v̄_i·Δt_ij − ½·R_BI·g_I·Δt_ij² ) − Δp_ij

in the formulas:
W is the size of the window;
s is the scale factor;
θ_BI is the Lie algebra of the matrix R_BI, where R_BI is the rotation matrix from the gravity coordinate g_I = (0, 0, G) of the East-North-Up (ENU) frame to the gravity coordinate g_B in the Body coordinate system of the IMU, and G is the magnitude of gravity;
b_a is the bias of the accelerometer; b_g is the bias of the gyroscope;
v̄_i is the non-scale velocity of the i-th frame in the sliding window; R_i is the rotation matrix of the i-th frame, R_j is the rotation matrix of the j-th frame, ΔR_ij is the rotation pre-integration between the i-th and j-th frames, and (·)_vec denotes the Lie algebra of the quaternion corresponding to a rotation matrix;
v̄_j is the non-scale velocity of the j-th frame in the sliding window, and Δv_ij is the velocity pre-integration between the i-th and j-th frames;
p̄_i, p̄_j are the non-scale translations obtained from the visual structure, Δp_ij is the translation pre-integration between the i-th and j-th frames, and Δt_ij is the time elapsed from the i-th frame to the j-th frame.
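The residual above can be evaluated numerically as in the following sketch, which assumes SciPy for the SO(3) logarithm and precomputed pre-integration terms ΔR, Δv, Δp; the gravity sign convention follows the reconstruction above:

    import numpy as np
    from scipy.spatial.transform import Rotation

    def preint_residual(Ri, Rj, vi, vj, pi, pj, dR, dv, dp, dt,
                        s, R_BI, G=9.81):
        g_I = np.array([0.0, 0.0, G])  # gravity coordinate in the ENU frame
        # Rotation residual: Log(dR^T Ri^T Rj) as a rotation vector.
        r_dR = Rotation.from_matrix(dR.T @ Ri.T @ Rj).as_rotvec()
        # Velocity and translation residuals with scale s and gravity R_BI g_I.
        r_dv = Ri.T @ (s * vj - s * vi - R_BI @ g_I * dt) - dv
        r_dp = Ri.T @ (s * pj - s * pi - s * vi * dt
                       - 0.5 * R_BI @ g_I * dt ** 2) - dp
        return np.concatenate([r_dR, r_dv, r_dp])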
6. The monocular vision inertial positioning method for automatic driving of a closed park according to claim 5, wherein in step S036 the IMU parameter χ is estimated from the perspective of a maximum a posteriori estimation problem; defining I_i,i+1 as the pre-integration value between frames i and i+1, and I_0:W as the set of pre-integration values of the consecutive IMU measurements within the sliding window, the posterior distribution P(χ | I_0:W) is described as:

P(χ | I_0:W) ∝ P(I_0:W | χ)·P(χ)

where P(χ) is the prior distribution and P(I_0:W | χ) is the likelihood distribution, which is decomposed as:

P(I_0:W | χ) = ∏_{i=0}^{W−1} P(I_i,i+1 | χ)

χ is estimated by solving for the maximum of the likelihood, argmax_χ P(I_0:W | χ);

assuming that P(I_0:W | χ) follows a Gaussian distribution, maximizing P(I_0:W | χ) is equivalent to solving for the parameter with the minimum negative logarithm, which gives the initialization objective function:

χ* = argmin_χ ( ‖r_p‖²_Σp + Σ_{i=0}^{W−1} ‖r_I_i,i+1‖²_Σi,i+1 )

where r_p = b_prior − b_a is the accelerometer prior residual, r_I_i,i+1 is the IMU pre-integration residual between frames in the sliding window, Σ_p is the covariance of r_p and is a diagonal matrix, and Σ_i,i+1 is the covariance of I_i,i+1;
the state χ is updated using the perturbation model; to keep s positive, its update is defined multiplicatively as:

s = s·exp(δs)

and θ_BI is updated in the tangent space of the manifold:

δθ_BI = (δg_BI, 0)
θ_BI = Log( Exp(θ_BI)·Exp(δθ_BI) )

where δs is the perturbation of s, δg_BI = (δα, δβ) is the perturbation of θ_BI, θ_BI is obtained from g_I and g_B, Exp(·) is the exponential map transforming θ_BI into a rotation matrix, and Log(·) is the logarithmic map transforming a rotation matrix into θ_BI;
if the initialization objective function does not converge, frames are removed from the sliding window and the method returns to step S01.
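The perturbation updates of S036 can be sketched as follows, assuming SciPy for the SO(3) exponential and logarithm maps:

    import numpy as np
    from scipy.spatial.transform import Rotation

    def update_scale(s, delta_s):
        return s * np.exp(delta_s)  # multiplicative update keeps s positive

    def update_theta_BI(theta_BI, delta_g):
        # theta_BI: 3-vector Lie algebra of R_BI; delta_g = (d_alpha, d_beta).
        # The third tangent component is held at zero, matching
        # delta_theta_BI = (delta_g_BI, 0).
        d_theta = np.array([delta_g[0], delta_g[1], 0.0])
        R = Rotation.from_rotvec(theta_BI) * Rotation.from_rotvec(d_theta)
        return R.as_rotvec()  # Log(Exp(theta_BI) * Exp(delta_theta_BI))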
7. The monocular vision inertial positioning method for automatic driving of a closed park according to claim 6, wherein step S3 comprises:
S31, setting the optimized state quantities of each frame in the sliding window to the translation vector and the feature-point depths, constructing a cost function with the visual reprojection error, and solving the reprojection errors of all feature points;
S32, using the rotation vector, decoupling the optical flow of each feature point into an optical flow related only to translation and feature-point depth;
S33, taking the two-norm of the decoupled optical flow of each feature point as the normalization term of the corresponding reprojection error, solving the normalized reprojection errors of all feature points, and taking a set multiple (3 times) of their mean over all feature points as the outlier threshold; if the normalized reprojection error of the current feature point is larger than the threshold, the feature point is regarded as an outlier and removed.
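A compact sketch of the S31-S33 outlier rejection, assuming the per-point reprojection error magnitudes and the rotation-predicted optical flow are already available; the array layout is an illustrative assumption:

    import numpy as np

    def cull_outliers(reproj_err, flow, rot_flow, factor=3.0):
        # S32: remove the rotation-induced part, leaving flow that depends
        # only on translation and feature depth.
        trans_flow = flow - rot_flow
        norm = np.linalg.norm(trans_flow, axis=1) + 1e-9
        # S33: normalize and threshold at the set multiple (3x) of the mean.
        normalized = reproj_err / norm
        threshold = factor * normalized.mean()
        return normalized <= threshold  # inlier mask; outliers are removed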
8. The monocular vision inertial positioning method for automatic driving of a closed park according to claim 1, wherein in step S51 the state quantities are the poses and the gyroscope bias, and the following objective function is established with the rotation pre-integration residual and the visual residual:

X* = argmin_X ( Σ ρ(‖e_R‖²_ΣR) + Σ ρ(‖e_vis‖²_Σvis) )

where ρ is the Huber kernel function, e_R and e_vis are respectively the rotation pre-integration residual and the visual residual, and Σ_R, Σ_vis are respectively the covariance matrices of the rotation pre-integration and of the vision terms;

the rotation pre-integration residuals and their covariance matrix have already been calculated in the IMU pre-integration of S1;

the visual residual and its covariance matrix can be obtained by linearizing the reprojection error at the linearization point:

e_vis = J_i·δx_i,  δx_i = ( Log(R̂_i^T·R_i), p_i − p̂_i )
Σ_vis = ( J_i^T·J_i )^(−1)

where R̂_i and p̂_i denote the linearization point, J_i is the Jacobian matrix of the visual reprojection errors in the nonlinear optimization of step S4, and R_i, p_i are respectively the rotation matrix and the translation vector of the i-th frame from the camera coordinate system to the East-North-Up coordinate system;
in step S52 the state quantities are the poses, the gyroscope bias, the accelerometer bias, the velocities, the scale and the gravity direction, and the following objective function is established with the rotation pre-integration residual, the velocity pre-integration residual, the translation pre-integration residual and the visual residual:

X* = argmin_X ( Σ ρ(‖e_Rvp‖²_ΣRvp) + Σ ρ(‖e_vis‖²_Σvis) )
e_Rvp = [ e_R, e_v, e_p ]

where e_Rvp is the pre-integration residual from S1, e_vis is the visual residual from S51, Σ_Rvp and Σ_vis are respectively the covariance matrices of the pre-integration and of the vision terms, and e_R, e_v, e_p are respectively the rotation pre-integration residual, the velocity pre-integration residual and the translation pre-integration residual.
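Both objective functions above accumulate Huber-robustified squared Mahalanobis norms; a generic evaluation sketch follows (NumPy assumed, names illustrative):

    import numpy as np

    def huber(r2, delta=1.0):
        # Huber kernel applied to a squared residual r2 = ||e||^2_Sigma.
        r = np.sqrt(r2)
        return r2 if r <= delta else 2.0 * delta * r - delta ** 2

    def robust_cost(residuals, covariances, delta=1.0):
        # residuals: list of residual vectors e; covariances: matching Sigma.
        cost = 0.0
        for e, S in zip(residuals, covariances):
            r2 = float(e @ np.linalg.solve(S, e))  # squared Mahalanobis norm
            cost += huber(r2, delta)
        return cost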
CN202110706284.6A 2021-06-24 2021-06-24 Monocular vision inertial positioning method for automatic driving of closed park Active CN113436261B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110706284.6A CN113436261B (en) 2021-06-24 2021-06-24 Monocular vision inertial positioning method for automatic driving of closed park

Publications (2)

Publication Number Publication Date
CN113436261A CN113436261A (en) 2021-09-24
CN113436261B (en) 2022-04-29

Family

ID=77754066


Country Status (1)

Country Link
CN (1) CN113436261B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114111776B (en) * 2021-12-22 2023-11-17 广州极飞科技股份有限公司 Positioning method and related device
CN114494435A (en) * 2022-01-25 2022-05-13 清华大学 Rapid optimization method, system and medium for matching and positioning of vision and high-precision map
CN114638897B (en) * 2022-05-18 2022-09-27 魔视智能科技(武汉)有限公司 Multi-camera system initialization method, system and device based on non-overlapping views

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107223275A (en) * 2016-11-14 2017-09-29 深圳市大疆创新科技有限公司 The method and system of multichannel sensing data fusion
CN108196285A (en) * 2017-11-30 2018-06-22 中山大学 A kind of Precise Position System based on Multi-sensor Fusion
US10354396B1 (en) * 2016-08-29 2019-07-16 Perceptln Shenzhen Limited Visual-inertial positional awareness for autonomous and non-autonomous device
CN110345944A (en) * 2019-05-27 2019-10-18 浙江工业大学 Merge the robot localization method of visual signature and IMU information


Similar Documents

Publication Publication Date Title
CN113436261B (en) Monocular vision inertial positioning method for automatic driving of closed park
CN111024066B (en) Unmanned aerial vehicle vision-inertia fusion indoor positioning method
CN109887057B (en) Method and device for generating high-precision map
Heo et al. EKF-based visual inertial navigation using sliding window nonlinear optimization
US9071829B2 (en) Method and system for fusing data arising from image sensors and from motion or position sensors
CN111795686B (en) Mobile robot positioning and mapping method
CN110726406A (en) Improved nonlinear optimization monocular inertial navigation SLAM method
CN114526745A (en) Drawing establishing method and system for tightly-coupled laser radar and inertial odometer
CN113223161B (en) Robust panoramic SLAM system and method based on IMU and wheel speed meter tight coupling
CN113503873B (en) Visual positioning method for multi-sensor fusion
CN114001733A (en) Map-based consistency efficient visual inertial positioning algorithm
CN111623773A (en) Target positioning method and device based on fisheye vision and inertial measurement
CN114964276A (en) Dynamic vision SLAM method fusing inertial navigation
CN116242374A (en) Direct method-based multi-sensor fusion SLAM positioning method
CN115218889A (en) Multi-sensor indoor positioning method based on dotted line feature fusion
CN115015956A (en) Laser and vision SLAM system of indoor unmanned vehicle
CN108827287B (en) Robust visual SLAM system in complex environment
CN112762929B (en) Intelligent navigation method, device and equipment
CN116380079A (en) Underwater SLAM method for fusing front-view sonar and ORB-SLAM3
CN113503872B (en) Low-speed unmanned aerial vehicle positioning method based on fusion of camera and consumption-level IMU
CN112923934A (en) Laser SLAM technology suitable for combining inertial navigation in unstructured scene
CN115930948A (en) Orchard robot fusion positioning method
CN113077515B (en) Tight coupling initialization method for underwater vision inertial navigation pressure positioning
CN115930937A (en) Multi-sensor simultaneous positioning and mapping method, terminal and storage medium
CN112837374B (en) Space positioning method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant