CN118052963A - Method, medium and system for reducing XR long-time error accumulation

Info

Publication number: CN118052963A (published 2024-05-17); granted as CN118052963B (2024-06-25)
Application number: CN202410451091.4A, filed 2024-04-16 (priority date 2024-04-16)
Authority: CN (China); original language: Chinese (zh)
Applicant/Assignee: Shandong Jindong Digital Creative Co., Ltd.
Inventors: 周安斌, 晏武志, 孙腾飞, 郑建华
Legal status: Granted; Active
Prior art keywords: error, pose, model, sensor, data
Classification landscape: Image Analysis (AREA)

Abstract

The invention provides a method, a medium and a system for reducing XR long-time error accumulation, belonging to the technical field of XR error reduction, comprising the following steps: preprocessing XR sensor data by using a filtering algorithm; fusing multi-source data acquired by a sensor, and solving an optimal estimated value of the pose of the XR equipment in a virtual environment; introducing a nonlinear error state Kalman filter to estimate and compensate the system state and measurement noise in real time; combining priori knowledge and motion constraint conditions to construct an error accumulation model, and predicting error change of future states; according to the error accumulation condition, dynamically adjusting the sensor weight, and reducing the sensor weight with large error contribution; periodically aligning with a known calibration position, and recalibrating the pose error; an error distribution model is built, and error deviation is corrected by using a statistical rule in subsequent application; when the error accumulation exceeds a preset error threshold, starting a repositioning program, and redefining the initial position and the initial posture.

Description

Method, medium and system for reducing XR long-time error accumulation
Technical Field
The invention belongs to the technical field of XR error reduction, and particularly relates to a method, medium and system for reducing XR long-time error accumulation.
Background
With the rapid development of Augmented Reality (AR) and Virtual Reality (VR) technologies, XR (extended reality) devices are widely used in fields such as industry, medical treatment, education and entertainment. An XR device acquires information about the environment and its own state by integrating multiple sensors, such as an IMU (inertial measurement unit), an RGB camera, a depth camera and a laser radar, and combines algorithms such as computer vision and SLAM (simultaneous localization and mapping) to achieve seamless fusion of virtual information and the real environment.
In the industrial field, XR technology can assist workers in equipment operation and maintenance: by overlaying virtual operation guides and data, it reduces workers' learning costs and improves operating efficiency. In the medical field, XR technology provides a brand new way of assisting surgery; a doctor can project a virtual anatomical model onto the patient to guide the operation, and XR can also be used for medical training, providing students with an immersive virtual surgery environment. In the education field, XR technology presents abstract knowledge as vivid three-dimensional virtual scenes, greatly improving the learning experience and learning efficiency. In the entertainment field, XR technology immerses users in the virtual world, bringing a brand new mode of experience to games, movies and other fields.
However, XR devices still face serious technical challenges in practical applications, among which the accuracy and stability of the pose estimation of the XR device in the virtual environment is one of the key factors restricting XR applications. Because an XR device needs to connect seamlessly with the real environment, the requirement on the accuracy of estimating the position and attitude (commonly called the pose) of the device is extremely high; even a tiny pose error can cause an obvious deviation between the virtual information and the real scene, bringing discomfort to the user.
Currently, the commonly adopted pose estimation method for XR equipment is based on multi-source sensor fusion. Specifically, the pose of the XR device is estimated by fusing sensor data such as IMU, vision and depth, and applying filtering algorithms and nonlinear optimization algorithms. Although this method can obtain accurate pose calculation results over a short time, the pose estimation results gradually deviate from the true values as the operation time increases, due to the continuous accumulation of various error sources (such as IMU integration drift and vision measurement errors). This long-term error accumulation phenomenon is one of the main bottlenecks restricting the further development of XR technology.
Disclosure of Invention
In view of the above, the present invention provides a method, medium and system for reducing long-time error accumulation of XR, which can solve the technical problem that in the prior art, in the XR operation process, as the operation time is prolonged, the calculation result of the pose of the XR device in the virtual environment gradually deviates from the true value due to the continuous accumulation of various error sources.
The invention is realized in the following way:
a first aspect of the invention provides a method of reducing XR long term error accumulation, comprising the steps of:
S10, collecting sensor data in the operation process of the XR equipment, wherein the sensor data comprise visual, inertial and depth multi-source data;
S20, preprocessing sensor data by using a filtering algorithm;
S30, establishing a comprehensive optimization model, fusing the preprocessed sensor data, and solving an optimal estimated value of the pose of the XR equipment in the virtual environment;
s40, introducing a nonlinear error state Kalman filter, and estimating and compensating the system state and the measurement noise in real time;
s50, constructing an error accumulation model by combining priori knowledge and motion constraint conditions, and predicting error change of a future state;
s60, dynamically adjusting the sensor weight according to the error accumulation condition, and reducing the sensor weight with large error contribution;
s70, periodically aligning with the known calibration position, and recalibrating the pose error;
s80, constructing an error distribution model, and correcting error deviation in subsequent application by using a statistical rule;
and S90, when the error accumulation exceeds a preset error threshold value, starting a repositioning program, and redefining the initial position and the initial posture.
Based on the technical scheme, the method for reducing the XR long-time error accumulation can be improved as follows:
The sensor data specifically comprises data from an inertial measurement unit, a vision sensor and a depth sensor.
The step of preprocessing the sensor data by using a filtering algorithm specifically comprises the following steps: filtering the image data by adopting bilateral filtering to remove high-frequency noise; and removing abnormal values of the inertia and depth data by adopting a Kalman filtering method.
The method comprises the steps of establishing a comprehensive optimization model, fusing multi-source data comprising vision, inertia and depth, and solving an optimal pose estimated value, and specifically comprises the following steps: performing target detection and track association on the visual data to acquire target motion information; fusing target motion information with inertia and depth data to construct an error cost function; and solving the optimal pose solution by adopting an optimization algorithm.
The step of combining priori knowledge and motion constraint conditions to construct an error accumulation model and predict the error change of the future state specifically comprises the following steps: establishing a terrain model, and judging the steep degree of the terrain by utilizing the frequency characteristics of adjacent areas; adopting smooth constraint for the flat area; for steep areas, edge-preserving constraint is adopted; and merging constraint conditions into an error model, and predicting the future error change trend.
The step of periodically aligning with a known calibration position and recalibrating the pose error specifically comprises the following steps: when a known calibration target is detected, aligning the estimated pose with the calibration position; and correcting the pose error by using the alignment pose difference value.
The step of constructing an error distribution model and correcting error deviation by using a statistical rule in subsequent application specifically comprises the following steps: counting the distribution rule of pose errors under different application scenes; and applying the distribution model to the new scene to correct the systematic error.
Furthermore, the optimization algorithm adopts the Gauss-Newton method.
A second aspect of the present invention provides a computer readable storage medium having stored therein program instructions which, when executed, are adapted to carry out a method of reducing XR long term error accumulation as described above.
A third aspect of the invention provides a system for reducing XR long term error accumulation comprising a computer readable storage medium as described above.
Compared with the prior art, the method, medium and system for reducing XR long-time error accumulation have the following beneficial effects: 1. The error accumulation is monitored in real time and the error diffusion rate is actively reduced. Traditional methods usually correct errors only passively, after they have accumulated to a certain extent; the invention constructs an error accumulation model combining prior knowledge and motion constraint conditions to predict the error change of future states, so the error diffusion trend can be observed in real time and corresponding control measures can be taken in advance.
2. And the sensor weight is dynamically adjusted, so that the system robustness is improved. Based on the predicted error distribution, the method can evaluate the contribution degree of each sensor to the overall error, thereby dynamically adjusting the weight of each sensor in the fusion model, reducing the sensor weight with large error contribution and improving the robustness of the system to single sensor failure.
3. And (5) periodically correcting the pose, and controlling the upper limit of error increase. By aligning with the known calibration position, the invention can pull back the pose error to a smaller level at intervals, thereby effectively controlling the unlimited increase of the error and ensuring the precision and stability of the long-term operation of the system.
4. And correcting error offset by utilizing a data statistics rule, so that the overall accuracy is improved. According to the invention, an error distribution model is constructed, the corresponding error distribution can be inquired according to the current motion state and the environmental condition, and the pose estimated value is corrected by utilizing the statistical parameter of the error distribution model, so that the error is minimized in the sense of mathematical statistics.
5. The relocation mechanism ensures system availability. When the error accumulation exceeds a preset threshold, the invention automatically triggers the repositioning program to redetermine the initial position and the gesture, thereby preventing the system from being invalid due to unlimited increase of the error and further ensuring the continuous usability of the system.
In summary, compared with the prior art, the remarkable advantage of the method is that error diffusion is delayed at the root of error accumulation, and errors are comprehensively managed through error control means at several different levels, so that the pose estimation accuracy and stability of the XR device are greatly improved. In other words, the technical scheme of the invention solves the technical problem in the prior art that, during XR operation, the calculated pose of the XR device in the virtual environment gradually deviates from the true value as the operation time increases, due to the continuous accumulation of various error sources.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
As shown in fig. 1, the method for reducing XR long-time error accumulation provided by the invention comprises the following steps:
S10, collecting sensor data in the operation process of the XR equipment, wherein the sensor data comprise visual, inertial and depth multi-source data;
S20, preprocessing sensor data by utilizing a filtering algorithm;
S30, establishing a comprehensive optimization model, fusing the preprocessed sensor data, and solving an optimal estimated value of the pose of the XR equipment in the virtual environment;
s40, introducing a nonlinear error state Kalman filter, and estimating and compensating the system state and the measurement noise in real time;
s50, constructing an error accumulation model by combining priori knowledge and motion constraint conditions, and predicting error change of a future state;
s60, dynamically adjusting the sensor weight according to the error accumulation condition, and reducing the sensor weight with large error contribution;
s70, periodically aligning with the known calibration position, and recalibrating the pose error;
s80, constructing an error distribution model, and correcting error deviation in subsequent application by using a statistical rule;
and S90, when the error accumulation exceeds a preset error threshold value, starting a repositioning program, and redefining the initial position and the initial posture.
The following describes in detail the specific embodiments of the above steps:
the specific embodiment of step S10 is to collect motion data from sensors built into or external to the XR device, including but not limited to Inertial Measurement Units (IMUs), RGB cameras, depth cameras, lidar, etc. In particular, the IMU may provide acceleration and angular velocity information of the device, the RGB camera may capture an image of the environment, the depth camera may be able to acquire depth information of the environment, and the lidar may scan to acquire ambient point cloud data. These raw sensor data are the basis for subsequent sensor fusion and pose estimation.
Step S20 pre-processes the raw sensor data using a filtering algorithm to eliminate the effects of measurement noise and distortion. Common filtering algorithms include kalman filtering, wavelet transformation, mean filtering, etc. Depending on the particular sensor type and noise characteristics, a suitable filtering algorithm may be selected or a combination of filtering algorithms may be used to achieve a better filtering effect. For example, for IMU data, gaussian white noise may be removed using kalman filtering, and high frequency noise removed by wavelet transformation; for image data, salt and pepper noise and gaussian noise can be removed using mean or gaussian filtering; for point cloud data, moving average filtering or voxel filtering may be used to reduce the effects of outliers. The purpose of the data preprocessing is to provide high quality sensor input for subsequent multi-source data fusion.
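As an illustration of this preprocessing step, the sketch below applies a minimal one-dimensional Kalman filter to a single noisy IMU channel. It is a toy constant-value model run on synthetic data; the function name and the noise parameters q and r are illustrative assumptions, not the filter configuration of the invention.

```python
import numpy as np

def kalman_smooth_1d(measurements, q=1e-4, r=1e-2):
    """Minimal scalar Kalman filter for a constant-value model
    x_k = x_{k-1} + w_k,  z_k = x_k + v_k; returns the filtered sequence."""
    x, p = float(measurements[0]), 1.0   # initial state estimate and its variance
    out = []
    for z in measurements:
        p = p + q                        # predict: variance grows by the process noise
        k = p / (p + r)                  # Kalman gain
        x = x + k * (z - x)              # update with the innovation z - x
        p = (1.0 - k) * p
        out.append(x)
    return np.array(out)

# illustrative use on one noisy accelerometer axis (synthetic data)
t = np.linspace(0.0, 1.0, 200)
noisy_acc = np.sin(2 * np.pi * t) + np.random.normal(0.0, 0.1, t.size)
smoothed_acc = kalman_smooth_1d(noisy_acc)
```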
Step S30 is to build a comprehensive optimization model, fuse the multi-source sensor data such as vision, inertia, depth and the like, and solve the optimal pose estimation value. This step typically employs a nonlinear optimization algorithm to construct a cost function and minimize the cost function to solve for the optimal solution. Common optimization algorithms include Gauss Newton's method, L-M algorithm, etc. Various error items, such as a re-projection error, an IMU pre-integration error, a plane constraint error and the like, need to be considered in the optimization model, and the error items are weighted, fused and built into a cost function. Meanwhile, a sensor state equation and an observation equation are also required to be established, and the relation between a state variable (such as pose, speed and the like) and an observed quantity (such as feature points, IMU measured values and the like) is described. The objective of solving the optimization problem is to obtain pose estimates that maximize the posterior probability under all observations.
Step S40 introduces a nonlinear Error-State Kalman Filter (ESKF) to estimate and compensate the system state and measurement noise in real time. The ESKF is widely applied in positioning and navigation fields such as SLAM (simultaneous localization and mapping) and VIO (visual-inertial odometry). The ESKF builds a state space model comprising a state transition equation and an observation equation: the state transition equation describes the recursive evolution of the system state over time, and the observation equation relates the state to the sensor observations. Through a predict-update cycle, the ESKF corrects the system state with the latest observations. In particular, the ESKF divides the state variable into two parts, the true state and the error state, linearizes the system over the error state space, and performs estimation with an extended Kalman filter algorithm. With the ESKF, the optimal estimate of the system state can be computed online, and parameters such as sensor bias and noise covariance matrices can be estimated at the same time, so that the measurements are compensated and the estimation accuracy of the system is improved.
The establishment and training of the error accumulation model proceeds as follows.
First, a sequence of sensor data and synchronized reference truth data are collected while the XR system is running in real environment. The reference truth may come from high precision external positioning systems such as laser tracking systems, robotic measurement arms, etc. Then, the sensor data is input into the algorithm flow of the above steps S10-S40, and the pose estimation sequence of the system is obtained. Comparing the pose estimation sequence with a reference true value, and calculating an error between the pose estimation sequence and the reference true value, namely a pose error sequence.
Next, the pose error sequence needs to be analyzed to explore the law of error accumulation. The common practice is to construct an error model to disassemble pose errors into various error sources, such as IMU integral drift errors, vision measurement errors, depth measurement errors, etc. For different types of error sources, corresponding error models can be fitted by analyzing the change rules of the error sources in time and space. For example, the IMU integrated drift error may be fitted with a random walk model and the vision measurement error may be fitted with a gaussian noise model. In addition, the influence of factors such as a motion mode, an environment structure and the like on error accumulation needs to be considered, so that an error model is further improved.
After the preliminary error model is obtained, the model needs to be trained and optimized by using the training data set. The training data set should include a variety of typical motion patterns and environmental scenarios to ensure generalization of the error model. The parameters of the error model can be estimated by adopting methods such as maximum likelihood estimation, bayesian estimation and the like. Meanwhile, machine learning techniques such as neural networks, gaussian processes and the like can be introduced to directly learn an error model from training data.
In the error model training process, a reasonable threshold value needs to be set as the termination condition. One common practice is to divide the training data set into a training set and a validation set, monitor the error indicator on the validation set during training, and terminate the training process when the error indicator on the validation set no longer drops significantly. Depending on the specific application scenario and accuracy requirements, the reference value of the error threshold can typically be set to the 95% or 99% quantile of the overall error sequence.
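A minimal sketch of how such a termination criterion might be implemented is given below, assuming a quantile-based error threshold and a patience-style early-stopping loop; `fit_step` and `val_error` are placeholder callbacks standing in for the actual training pass and validation metric, not functions defined by the invention.

```python
import numpy as np

def error_threshold(error_seq, quantile=0.95):
    """Reference error threshold: a high quantile of the observed error sequence."""
    return float(np.quantile(np.abs(error_seq), quantile))

def train_with_early_stop(fit_step, val_error, max_epochs=100, patience=5, min_delta=1e-4):
    """Stop training once the validation error stops dropping significantly
    for `patience` consecutive epochs; returns the best validation error seen."""
    best, stale = np.inf, 0
    for _ in range(max_epochs):
        fit_step()                 # one optimization pass over the training set
        err = val_error()          # error indicator evaluated on the validation set
        if best - err > min_delta:
            best, stale = err, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best
```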
In general, the creation and training of the error accumulation model requires the following key steps: 1) Collecting real operation data and a reference true value; 2) Calculating a pose error sequence; 3) Analyzing an error source and constructing a preliminary error model; 4) Building a training data set; 5) Training and optimizing the error model by utilizing a training data set; 6) A reasonable termination threshold is set. Through the process, an error model capable of accurately describing an error accumulation rule can be obtained, and a foundation is laid for subsequent error compensation and control.
Step S50 is to combine the priori knowledge and the motion constraint condition to construct an error accumulation model to predict the error change of the future state. In this step, it is necessary to predict the system state and observation at the future time by using the error accumulation model obtained in the previous step and combining the motion equation and the observation equation of the system, thereby predicting the future error state. In particular, the following substeps may be employed:
firstly, according to the system state and the motion model at the current moment, a state prior estimation at the future moment is presumed by using a state transition equation. Secondly, according to prior state estimation and an observation model of the system, calculating the observed quantity at the prediction moment. Then, the prior state estimation and the prediction observables are substituted into the trained error accumulation model, and the prediction error distribution at the moment is estimated. In this process, known prior information, such as environmental constraints, motion patterns, etc., may be introduced to correct the error prediction. And finally, correcting the system state at the future moment according to the error prediction result, thereby obtaining a more accurate state estimation value.
The purpose of this step is to find and circumvent in advance, by modeling and prediction, the larger errors that may occur in the future, ready for subsequent error control. Meanwhile, the step lays a foundation for dynamically adjusting the weight of the sensor, and the weight of the sensor can be adjusted according to the contribution degree of different sensors to errors.
Step S60 dynamically adjusts the sensor weight according to the error accumulation condition, and reduces the sensor weight with large error contribution. This step can be divided into the following sub-steps:
First, the degree of contribution of each sensor to the overall error is counted and analyzed. The error contribution of each sensor, such as IMU drift error or vision measurement error, can be calculated from the error distribution predicted in the previous step. Second, a threshold range is set for the sensor weights: when the error contribution of a sensor exceeds the upper threshold, its weight is reduced; when the error contribution falls below the lower threshold, its weight is increased. The threshold range can be set according to the practical application scenario and accuracy requirements; the upper limit can generally be taken as the 80% quantile of the error contribution and the lower limit as the 20% quantile. Then, a sensor weight adjustment model is constructed, with the sensor weights as the variables to be optimized, so as to minimize the overall system error and solve for the optimal weight configuration. In the optimization process, a smoothness constraint can be introduced to avoid severe fluctuations of the weights. Finally, the optimized weights are updated into the multi-source data fusion model for subsequent pose estimation.
The purpose of this step is to suppress the impact of the sensor with the large contribution of errors on the system by dynamically adjusting the sensor weights, thereby reducing the overall error level. Meanwhile, sensor data with higher precision can be fully utilized, and the robustness of the system is improved.
Step S70 is to periodically align with the known calibration positions and recalibrate the pose errors. This step comprises the sub-steps of:
Firstly, some known calibration positions in the environment need to be acquired in advance, which can be manually set markers or environment features modeled in advance. These known calibration positions are then periodically detected and identified during system operation using visual or laser or like sensors. When the calibration position is detected, comparing the current pose estimation with the known value of the calibration position, and calculating pose deviation. And then, correcting the pose state of the system according to the pose deviation to realize alignment operation. In the alignment process, the robust kernel function and other technologies can be introduced, so that the influence of outliers is reduced. Meanwhile, the parameters of the error accumulation model need to be updated to be matched with the corrected pose state.
The effect of this step is to periodically correct the accumulated pose errors of the system, preventing the errors from growing indefinitely. By aligning with the known calibration position, the pose error can be pulled back to a small level, thereby slowing down the rapid accumulation of errors. This step provides a basis for subsequent error control.
Step S80 is to construct an error distribution model, and correct the error offset by using the statistical rule in the subsequent application. This step can be divided into the following sub-steps:
First, a large amount of actual operation data needs to be collected, including sensor raw data and pose estimation results. And then comparing the pose estimation result with a reference true value to calculate a pose error sequence. For different motion modes and environmental scenes, the distribution characteristics of pose errors are respectively counted, wherein the distribution characteristics comprise parameters such as mean value, variance, kurtosis, skewness and the like. Next, an error distribution model is constructed, and modeling description is performed on error distribution under different conditions. Common error distribution models include a gaussian distribution model, a student's t distribution model, a mixed gaussian model, and the like, and an appropriate model form can be selected according to specific error distribution characteristics. In the model construction process, the model needs to be trained so that the model can be well fitted with actual error distribution data. The training method can adopt classical methods such as maximum likelihood estimation, bayesian estimation and the like, and can also use machine learning techniques such as deep learning and the like.
After the error distribution model is built, in practical application, the corresponding error distribution model can be queried according to the current motion state and environmental conditions, and the statistical parameters of the errors can be obtained. Then, the pose estimation value is corrected by using the statistical parameters, so that the system error is reduced. The method aims to fully utilize the statistical rule of the historical data to effectively compensate the system error and improve the accuracy and stability of pose estimation.
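The following sketch illustrates one possible shape of such a lookup-and-correct scheme, assuming a per-condition Gaussian error model keyed by motion mode and scene; the class name, the 6-dimensional error vector, and the synthetic samples are illustrative assumptions rather than the invention's concrete model.

```python
import numpy as np

class ErrorDistributionModel:
    """Per-condition Gaussian error model: stores the mean/covariance of pose errors
    keyed by (motion_mode, scene) and corrects new pose estimates by the mean bias."""
    def __init__(self):
        self.stats = {}

    def fit(self, condition, error_samples):
        e = np.asarray(error_samples)              # shape (N, 6): position + attitude errors
        self.stats[condition] = (e.mean(axis=0), np.cov(e, rowvar=False))

    def correct(self, condition, pose_estimate):
        if condition not in self.stats:
            return pose_estimate                   # no statistics for this condition
        mean_bias, _ = self.stats[condition]
        return pose_estimate - mean_bias           # remove the systematic offset

model = ErrorDistributionModel()
model.fit(("walking", "indoor"), np.random.normal(0.02, 0.01, (500, 6)))  # synthetic errors
corrected = model.correct(("walking", "indoor"), np.zeros(6))
```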
Step S90 is to start the repositioning procedure to redefine the initial position and posture when the error accumulation exceeds a preset error threshold. This step comprises the sub-steps of:
First, an error threshold needs to be set, and when the accumulated error of the system exceeds the threshold, that is, the error is considered to be excessive, a relocation operation needs to be performed. The setting of the error threshold can be determined according to the precision requirement of the actual application scene, and usually 95% or 99% quantiles of the pose error can be taken as references. Next, during system operation, the current error level is continuously monitored, and once a preset threshold is exceeded, a relocation procedure is triggered. The repositioning program will first pause the current pose estimation process and enter repositioning mode. In the repositioning mode, the system will rescan and model the environment with visual and depth sensors, reconstructing the local map. At the same time, known environmental features are searched and matched to determine the current approximate location. And finally, restarting pose estimation and SLAM processes by taking the matched position and the reconstructed local map as initial values to realize repositioning.
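A minimal sketch of the threshold check that would trigger repositioning is shown below; `relocalize` stands in for the rescanning, local-map reconstruction, and feature-matching routine described above and is an assumed placeholder.

```python
import numpy as np

def check_relocation(accumulated_error, threshold, relocalize):
    """Trigger the relocation routine once the accumulated pose error exceeds
    the preset threshold; returns a fresh (pose, local_map) or None."""
    if np.linalg.norm(accumulated_error) > threshold:
        return relocalize()   # rescan the environment, rebuild the local map, match known features
    return None               # error still acceptable, keep the current estimation process
```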
The function of this step is to prevent the unrestricted accumulation of errors, leading to system failure. By setting a threshold and performing a repositioning operation, the error of the system can be returned to a smaller level, thereby ensuring continued usability of the system. Meanwhile, the repositioning operation also provides better initial conditions for subsequent pose estimation, and is beneficial to improving estimation accuracy.
In the training process of constructing the error distribution model, a sufficient training data set needs to be prepared. The training data set should contain various typical motion patterns such as uniform motion, acceleration motion, rotation motion, etc., as well as different environmental scenarios such as indoor, outdoor, simple, complex, etc. For each motion pattern and environmental scenario, enough sensor data and reference truth data need to be collected to ensure coverage of the training data.
In constructing the training dataset, attention is paid to the following: 1) The high precision of the reference true value is ensured, and the reference true value can be obtained by using an external positioning system with high precision, such as a laser tracking system, a real-time motion capturing system and the like; 2) The sensor data and the reference true value need to be synchronously acquired, and the time stamps are aligned; 3) The data acquisition process should cover various working states as much as possible, including stationary, uniform motion, acceleration motion, etc.; 4) The environment scene needs to be diversified, including indoor and outdoor situations, simplicity, complexity and the like; 5) For each condition, the duration of the data acquisition should be long enough to adequately reflect the error accumulation process.
After the training data set is prepared, a suitable error distribution model needs to be constructed. Common error distribution models include single-variable gaussian distribution models, multiple gaussian distribution models, student t distribution models, mixed gaussian models, and the like. The model is selected according to the distribution characteristics of the actual error data, and can be analyzed firstly, a distribution curve of the error data is drawn, and then a model form which can fit the distribution well is selected.
For a single-variable Gaussian distribution model, only two parameters of mean and variance are needed to be estimated; for a multi-element Gaussian distribution model, a mean vector and a covariance matrix need to be estimated; for a student t distribution model, in addition to mean and variance, a degree of freedom parameter needs to be estimated; for the mixed Gaussian model, parameters such as weight, mean and covariance of each Gaussian component, number of components and the like need to be estimated.
The estimation of the model parameters can adopt a maximum likelihood estimation method or a Bayesian estimation method. The thought of maximum likelihood estimation is to maximize the likelihood function of the observed data to obtain the optimal estimated value of the model parameters; the Bayesian estimation is to solve posterior distribution of model parameters by combining observation data on the basis of given prior distribution.
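For the single Gaussian case discussed above, the maximum-likelihood estimates have a closed form: the sample mean and the sample covariance. The sketch below computes them on synthetic error samples; the data are illustrative, not results of the invention.

```python
import numpy as np

def fit_gaussian_mle(errors):
    """Maximum-likelihood estimates for a multivariate Gaussian error model:
    the sample mean and the (biased) sample covariance."""
    e = np.asarray(errors, dtype=float)
    mu = e.mean(axis=0)
    sigma = (e - mu).T @ (e - mu) / e.shape[0]
    return mu, sigma

mu, sigma = fit_gaussian_mle(np.random.normal(0.0, 0.05, (1000, 3)))  # synthetic 3-D errors
```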
In the parameter estimation process, reasonable termination conditions, such as maximum iteration times, convergence threshold of gradient descent and the like, need to be set. For complex models, such as a mixture gaussian model, it may be necessary to introduce heuristic optimization algorithms (e.g., EM algorithms, MCMC sampling, etc.) for parameter estimation.
In addition to classical parameter estimation methods, there have been studies in recent years to explore methods for learning error distribution models directly from data using deep learning techniques. For example, the potential distribution of error data may be modeled using a depth generation model such as a variational self-encoder (VAE) or a Generation Antagonism Network (GAN). The method based on deep learning does not need to manually specify a model form, can automatically find the internal distribution rule of the data, and has good applicability and expansibility.
Regardless of the modeling method, a validation set needs to be prepared during training for model selection and hyper-parameter tuning. All data can be split 8:2 into a training set and a validation set; model training is carried out on the training set, and the generalization performance of the model is evaluated on the validation set. Common evaluation metrics include the log-likelihood value, mean square error, etc. The training process may be terminated when the evaluation metric on the validation set no longer improves significantly.
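A minimal sketch of the 8:2 split and a log-likelihood check on the validation set is given below, assuming a single multivariate Gaussian error model and synthetic samples; the use of `scipy.stats.multivariate_normal` is for illustration only.

```python
import numpy as np
from scipy.stats import multivariate_normal

def split_8_2(data, seed=0):
    """Shuffle and split the error samples 8:2 into training and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    cut = int(0.8 * len(data))
    return data[idx[:cut]], data[idx[cut:]]

errors = np.random.normal(0.0, 0.05, (2000, 3))      # synthetic pose-error samples
train, val = split_8_2(errors)
mu, sigma = train.mean(axis=0), np.cov(train, rowvar=False)
val_loglik = multivariate_normal(mu, sigma).logpdf(val).mean()  # metric monitored on the validation set
```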
Through the steps, an error distribution model capable of well describing pose error distribution can be obtained. In the implementation process, the corresponding error distribution model can be inquired according to the current motion state and the environmental condition, and the pose estimated value is corrected by using the statistical parameters (such as mean value, variance and the like) of the error distribution model, so that the positioning accuracy and stability of the system are improved.
The calculations involved in the above steps are described in detail below:
The specific implementation of step S10 is to acquire raw data from various sensors built in or external to the XR device. Common sensors include Inertial Measurement Units (IMUs), RGB cameras, depth cameras, lidars, and the like.
The IMU provides the acceleration $a=(a_x,a_y,a_z)$ and the angular velocity $\omega=(\omega_x,\omega_y,\omega_z)$ of the device, where $a$ is the triaxial acceleration in $m/s^2$ and $\omega$ is the triaxial angular velocity in $rad/s$.

The RGB camera acquires an environment image $I(u,v,t)$, where the pixel coordinates $(u,v)$ range over $u\in[0,W-1]$ and $v\in[0,H-1]$, $W$ and $H$ are the image width and height, respectively, and $t$ is the acquisition timestamp.

The depth camera acquires depth information of the environment $D(u,v,t)$, where $(u,v)$ corresponds to the pixel coordinates of the RGB image and $t$ is the acquisition timestamp. The depth value $D(u,v,t)$ represents the distance from the spatial point at pixel $(u,v)$ to the camera center, in $m$.

The laser radar scans the environment to acquire point cloud data $P_t=\{p_1,p_2,\dots,p_N\}$, where $p_i=(x_i,y_i,z_i)$ are the three-dimensional coordinates of the $i$-th point in $m$, $N$ is the total number of points in the point cloud, and $t$ is the acquisition timestamp.
Step S20 pre-processes the raw sensor data using a filtering algorithm to remove the effects of measurement noise and distortion. For IMU data, a Kalman filtering algorithm may be used to remove Gaussian white noise, while a wavelet transform removes high-frequency noise. Specifically, the Kalman filtering algorithm builds a state space model:

$$x_k = A x_{k-1} + B u_k + w_k,\qquad z_k = H x_k + v_k$$

where $x_k$ is the system state vector, including position, velocity, attitude, etc.; $u_k$ is the control input; $z_k$ is the observed quantity; and $w_k$ and $v_k$ are the process noise and measurement noise, respectively, obeying the Gaussian distributions $\mathcal{N}(0,Q)$ and $\mathcal{N}(0,R)$. Through a prediction-update cycle, the Kalman filter obtains an optimal state estimate $\hat{x}_k$. The wavelet transform performs a multi-scale decomposition of the IMU data using wavelet basis functions of different scales, thereby effectively removing high-frequency noise components.
For image data, salt-and-pepper noise and Gaussian noise may be removed using mean filtering or Gaussian filtering. The idea of mean filtering is to replace the gray value of the current pixel with the mean of its neighborhood pixels:

$$\hat{I}(u,v) = \frac{1}{(2r+1)^2}\sum_{i=-r}^{r}\sum_{j=-r}^{r} I(u+i,\,v+j)$$

where $r$ is the neighborhood (filter) radius. Gaussian filtering assigns different Gaussian weights to the pixels in the neighborhood:

$$\hat{I}(u,v) = \sum_{i=-r}^{r}\sum_{j=-r}^{r} G_\sigma(i,j)\, I(u+i,\,v+j),\qquad G_\sigma(i,j) = \frac{1}{2\pi\sigma^2}\exp\!\left(-\frac{i^2+j^2}{2\sigma^2}\right)$$

where $G_\sigma$ is a two-dimensional Gaussian kernel whose parameter $\sigma$ controls the degree of filtering.
For point cloud data, a moving average filtering or voxel filtering algorithm may be used to reduce the effects of outliers. The idea of moving average filtering is, for each query point $p$, to find its $k$ nearest neighbors within a neighborhood and take their average as the filtered result $\hat{p}$:

$$\hat{p} = \frac{1}{k}\sum_{p_j \in \mathcal{N}_k(p)} p_j$$

Voxel filtering divides the space into a regular three-dimensional voxel grid and takes the average of the points falling into the same voxel as the representative point of that voxel. In this way, outliers within the voxels can be effectively removed and the quality of the point cloud improved.
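As an illustration of voxel filtering, the following sketch computes per-voxel centroids on a synthetic point cloud; the voxel size and the NumPy-based implementation are illustrative assumptions rather than the invention's specific filter.

```python
import numpy as np

def voxel_filter(points, voxel_size=0.05):
    """Voxel-grid downsampling: points falling in the same voxel are replaced
    by their centroid, which suppresses isolated outliers inside each voxel."""
    keys = np.floor(points / voxel_size).astype(np.int64)        # voxel index of each point
    _, inverse = np.unique(keys, axis=0, return_inverse=True)    # map points to voxel ids
    counts = np.bincount(inverse)
    out = np.zeros((counts.size, 3))
    for d in range(3):
        out[:, d] = np.bincount(inverse, weights=points[:, d]) / counts
    return out

cloud = np.random.rand(10000, 3) * 2.0     # synthetic point cloud in a 2 m cube
downsampled = voxel_filter(cloud, voxel_size=0.1)
```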
Step S30 is to build a comprehensive optimization model, fuse the multi-source sensor data such as vision, inertia, depth and the like, and solve the optimal pose estimation value. This step typically employs a nonlinear optimization algorithm to construct a cost function and minimize the cost function to solve for the optimal solution.
Let the system state be $\mathbf{x} = \{R, p, v, b_a, b_g\}$, where $R$ is the rotation matrix representing the device attitude, $p$ is the position, $v$ is the velocity, and $b_a$ and $b_g$ are the biases of the accelerometer and gyroscope, respectively. The cost function may be constructed as:

$$J(\mathbf{x}) = \sum_{i} \rho\!\left(\| r_i(\mathbf{x}) \|^2_{\Sigma_i}\right)$$

where $\rho(\cdot)$ is a robust kernel function for suppressing the effects of outlier errors and $r_i$ is the $i$-th residual term, comprising:

1) Visual re-projection residual:

$$r_{\mathrm{proj}} = z - \pi(R, p, P)$$

where $z$ is the observed quantity of the feature point, such as image coordinates, and $\pi(\cdot)$ is the camera projection model that projects the three-dimensional feature point $P$ onto the image plane under the given pose $(R, p)$.

2) IMU pre-integration residual:

$$r_{\mathrm{imu}} = \Delta\hat{z}_{ij} - \Delta z_{ij}(\mathbf{x}_i, \mathbf{x}_j)$$

where $\Delta\hat{z}_{ij}$ is the measurement obtained by integrating the IMU over the time interval $[t_i, t_j]$, and the second term is the theoretical pre-integration value calculated from the states.

3) Plane constraint residual:

for each detected plane $(n, d)$, where $n$ is the plane normal vector and $d$ is the distance of the plane to the origin, a three-dimensional point $P$ is said to fall on the plane if it satisfies $n^{\top}P + d = 0$. The residual may be constructed as:

$$r_{\mathrm{plane}} = n^{\top}P + d$$

By minimizing the cost function, an optimal solution that maximizes the state posterior probability under all observations can be obtained, namely by solving:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} J(\mathbf{x})$$

This nonlinear least squares problem may be solved using optimization algorithms such as the Gauss-Newton method, the Levenberg-Marquardt (L-M) algorithm, or the dog-leg method.
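The following sketch shows a generic Gauss-Newton loop of the kind referred to above, demonstrated on a toy one-dimensional curve-fitting problem; the residual and Jacobian callbacks are placeholders for the stacked visual, IMU, and plane residuals of the actual system.

```python
import numpy as np

def gauss_newton(residual_fn, jacobian_fn, x0, iters=20, tol=1e-8):
    """Generic Gauss-Newton loop for a nonlinear least-squares cost
    J(x) = sum_i r_i(x)^2; residual_fn and jacobian_fn stack all residual terms."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        r = residual_fn(x)                       # stacked residual vector, shape (m,)
        J = jacobian_fn(x)                       # stacked Jacobian, shape (m, n)
        dx = np.linalg.solve(J.T @ J, -J.T @ r)  # solve the normal equations
        x += dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# illustrative 1-D curve fit: model y = exp(a*t), estimate a
t = np.linspace(0.0, 1.0, 50)
y = np.exp(0.7 * t)
res = lambda a: np.exp(a[0] * t) - y
jac = lambda a: (t * np.exp(a[0] * t)).reshape(-1, 1)
a_hat = gauss_newton(res, jac, [0.0])
```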
Step S40 introduces a nonlinear Error State Kalman Filter (ESKF) to estimate and compensate the system state and measurement noise in real time. The ESKF divides the system state into two parts, namely the true state $x$ and the error state $\delta x$, related by:

$$x = \hat{x} \boxplus \delta x$$

where $\boxplus$ denotes a composition operation; for the rotation matrix, for example, $R = \hat{R}\exp(\delta\theta^{\wedge})$.

Linearizing the system around the nominal state yields the propagation equation and observation equation of the error state:

$$\delta x_{k+1} = F_k\,\delta x_k + w_k,\qquad \delta z_k = H_k\,\delta x_k + v_k$$

where $F_k$ is the error-state transition matrix, $H_k$ is the error-state observation matrix, and $w_k$ and $v_k$ are the process noise and observation noise, respectively. Through the prediction-update steps of the extended Kalman filter, the optimal estimate $\delta\hat{x}_k$ of the error state is obtained and used to update the estimate of the true state via $\hat{x}_k \leftarrow \hat{x}_k \boxplus \delta\hat{x}_k$. Meanwhile, the ESKF can also estimate parameters such as sensor bias and noise covariance and use them to compensate the measurement data in real time, thereby improving the estimation accuracy and stability of the system.
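A minimal sketch of one ESKF predict-update cycle on the error state is shown below with toy dimensions and arbitrary matrices; it illustrates only the linear-algebra flow, and the caller is assumed to fold the estimated error state back into the nominal state and reset it afterwards.

```python
import numpy as np

def eskf_cycle(dx, P, F, Q, H, R, innovation):
    """One predict-update cycle on the error state dx with covariance P;
    the caller then injects dx into the nominal state and resets dx to zero."""
    # predict: propagate error state and covariance
    dx = F @ dx
    P = F @ P @ F.T + Q
    # update: correct with the measurement innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    dx = dx + K @ (innovation - H @ dx)
    P = (np.eye(P.shape[0]) - K @ H) @ P
    return dx, P

# toy dimensions: 3-D error state, 2-D observation
n, m = 3, 2
dx, P = np.zeros(n), np.eye(n) * 0.01
F, Q = np.eye(n), np.eye(n) * 1e-5
H, R = np.random.rand(m, n), np.eye(m) * 1e-3
dx, P = eskf_cycle(dx, P, F, Q, H, R, innovation=np.array([0.02, -0.01]))
```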
Step S50 combines prior knowledge and motion constraint conditions to construct an error accumulation model and predict the error distribution at future times. Let the error at the current time follow a multivariate Gaussian distribution $\delta x_k \sim \mathcal{N}(\mu_k, \Sigma_k)$ with mean $\mu_k$ and covariance matrix $\Sigma_k$, and let the estimate of the system state at the present instant be $\hat{x}_k$. According to the state transition equation of the system:

$$x_{k+1} = f(x_k, u_k) + w_k$$

where $f(\cdot)$ is a known state transfer function, $u_k$ is the control input, and $w_k$ is the process noise, the prior state distribution at the next time can be inferred:

$$\bar{x}_{k+1} = f(\hat{x}_k, u_k),\qquad \bar{\Sigma}_{k+1} = F_k \Sigma_k F_k^{\top} + Q_k$$

where $\bar{\Sigma}_{k+1}$ is the covariance matrix after propagation. According to the observation equation:

$$z_{k+1} = h(x_{k+1}) + v_{k+1}$$

where $h(\cdot)$ is a known observation function and $v_{k+1}$ is the observation noise, the distribution of the predicted observables can be calculated:

$$\bar{z}_{k+1} = h(\bar{x}_{k+1}),\qquad S_{k+1} = H_{k+1}\bar{\Sigma}_{k+1}H_{k+1}^{\top} + R_{k+1}$$

where $H_{k+1}$ is the observation matrix. Substituting the prior state distribution $\mathcal{N}(\bar{x}_{k+1}, \bar{\Sigma}_{k+1})$ and the predicted observation distribution $\mathcal{N}(\bar{z}_{k+1}, S_{k+1})$ into the trained error accumulation model, the error distribution at the next moment can be estimated. In this process, known prior conditions such as environmental constraints and motion patterns can be introduced to correct the error prediction result. Finally, the system state at the future moment is corrected according to the corrected error prediction distribution to obtain an accurate state estimate.
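The propagation of the prior mean, the prior covariance, and the predicted-observation covariance can be written compactly as below; the linearized matrices F and H and the noise covariances Q and R are assumed to be supplied by the surrounding estimator.

```python
import numpy as np

def predict_error_distribution(mu, Sigma, F, Q, H, R):
    """Propagate the current error distribution N(mu, Sigma) one step ahead and
    compute the predicted observation covariance, as fed into the error model."""
    mu_pred = F @ mu                        # prior mean after the motion step
    Sigma_pred = F @ Sigma @ F.T + Q        # prior covariance after propagation
    S_pred = H @ Sigma_pred @ H.T + R       # predicted observation covariance
    return mu_pred, Sigma_pred, S_pred
```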
With respect to step S60, the degree of contribution of each sensor to the overall error can be calculated from the error distribution predicted in the previous step. Assume the error component of the $i$-th sensor is $e_i \sim \mathcal{N}(\mu_i, \Sigma_i)$, where $\mu_i$ is a mean vector and $\Sigma_i$ is a covariance matrix. The contribution of this sensor to the overall error can be expressed by its mean norm $\|\mu_i\|$ or by the trace of its covariance $\mathrm{tr}(\Sigma_i)$.

Upper and lower thresholds $\tau_{\mathrm{high}}$ and $\tau_{\mathrm{low}}$ are set for the sensor weights $\alpha_i$: when the error contribution of sensor $i$ exceeds the upper threshold, the weight $\alpha_i$ is reduced; when it falls below the lower threshold, the weight $\alpha_i$ is increased. The choice of thresholds may be determined from the statistical distribution of the error contributions; for example, the upper threshold $\tau_{\mathrm{high}}$ may be taken as the 80% quantile of the error contribution and the lower threshold $\tau_{\mathrm{low}}$ as the 20% quantile.

Next, a sensor weight adjustment model is constructed, with the weight vector $\boldsymbol{\alpha} = (\alpha_1,\dots,\alpha_n)$ as the variable to be optimized, so as to minimize the overall system error:

$$\min_{\boldsymbol{\alpha}}\; E(\boldsymbol{\alpha}) = \sum_i \alpha_i^2\, \mathrm{tr}(\Sigma_i),\qquad \text{s.t. } \sum_i \alpha_i = 1,\ \alpha_i \ge 0$$

where $E(\boldsymbol{\alpha})$ is a weighted error norm that takes the covariances into account. Meanwhile, to avoid severe fluctuations of the weights, a smoothness constraint may be introduced:

$$\min_{\boldsymbol{\alpha}}\; E(\boldsymbol{\alpha}) + \lambda\,\|\boldsymbol{\alpha} - \boldsymbol{\alpha}^{t-1}\|^2$$

where $\boldsymbol{\alpha}^{t-1}$ is the weight vector at the previous moment and $\lambda$ is the weight coefficient of the smoothing term. By solving the above optimization problem, a new weight configuration $\boldsymbol{\alpha}^{t}$ is obtained. Updating $\boldsymbol{\alpha}^{t}$ into the multi-source data fusion model for subsequent pose estimation reduces the influence of sensors with large error contributions on the system and improves system robustness.
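Under the quadratic objective with the smoothness term and a sum-to-one constraint, the weight update admits a closed-form solution via a Lagrange multiplier. The sketch below implements that particular choice of objective, which is one plausible instantiation rather than the invention's exact formulation; the non-negativity clip and renormalization are a heuristic.

```python
import numpy as np

def adjust_weights(err_traces, prev_weights, lam=0.5):
    """Minimize sum_i a_i^2 * tr_i + lam * ||a - a_prev||^2 subject to sum(a) = 1,
    solved in closed form with a Lagrange multiplier."""
    c = np.asarray(err_traces, dtype=float) + lam     # quadratic coefficient per sensor
    b = lam * np.asarray(prev_weights, dtype=float)   # linear coefficient from the smoothness term
    nu = (1.0 - np.sum(b / c)) / np.sum(1.0 / c)      # multiplier enforcing sum(a) = 1
    a = (b + nu) / c
    a = np.clip(a, 0.0, None)                         # heuristic: keep weights non-negative
    return a / a.sum()

# sensor 0 has the largest predicted error trace, so its weight is reduced
new_weights = adjust_weights(err_traces=[0.04, 0.01, 0.02], prev_weights=[0.3, 0.4, 0.3])
```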
Step S70 periodically aligns with the known calibration positions and recalibrates the pose errors. First, some known calibration positions in the environment need to be acquired in advance; these can be manually placed markers or environment features modeled in advance. Let the position of the $j$-th calibration position be $p_j^{\mathrm{cal}}$ and its corresponding descriptor be $d_j$.

These calibration positions are periodically detected and identified during system operation using visual or laser sensors. Assume that at the current time $t$ the $j$-th calibration position is detected, yielding the observed descriptor $\hat{d}_j$, and that the current pose estimate $\hat{T}_t$ gives the position coordinates $\hat{p}_j$. The pose deviation is:

$$e_j = p_j^{\mathrm{cal}} - \hat{p}_j,\qquad \hat{p}_j = g(\hat{T}_t, z_j)$$

where $g(\cdot)$ is a camera projection model or a point cloud matching model that calculates the coordinates of the calibration position from the current pose $\hat{T}_t$ and the observation $z_j$.

To suppress the effects of outliers, a robust cost function may be constructed:

$$C(\delta T) = \sum_{j \in S} \rho\!\left(e_j(\delta T)^{\top}\,\Sigma_j^{-1}\,e_j(\delta T)\right)$$

where $\rho(\cdot)$ is a robust kernel function, such as the Huber kernel or Cauchy kernel; $S$ is the index set of all calibration positions detected at the current moment; and $\Sigma_j$ is the corresponding covariance matrix. Minimizing the cost function yields the pose correction increment $\delta T$.

Applying $\delta T$ to the current pose aligns it with the calibration positions:

$$\hat{T}_t \leftarrow \delta T \cdot \hat{T}_t$$

Meanwhile, the parameters of the error accumulation model need to be updated to match the new pose state. Through these steps, the pose errors can be periodically pulled back to a small level, thereby delaying the rapid accumulation of errors.
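A translation-only sketch of the alignment correction is given below: a Huber-weighted average of the deviations between the known calibration points and their currently estimated positions serves as the correction increment. The full method would also correct rotation; the Huber threshold is an assumed parameter.

```python
import numpy as np

def translation_correction(cal_positions, estimated_positions, huber_delta=0.1):
    """Robust (Huber-weighted) average of the deviations between known calibration
    points and their positions under the current pose estimate."""
    e = np.asarray(cal_positions) - np.asarray(estimated_positions)   # per-landmark deviation
    norms = np.linalg.norm(e, axis=1)
    w = np.where(norms <= huber_delta, 1.0, huber_delta / np.maximum(norms, 1e-12))
    return (w[:, None] * e).sum(axis=0) / w.sum()                     # weighted mean deviation

delta_t = translation_correction([[1.0, 0.0, 0.0]], [[1.05, -0.02, 0.01]])
```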
A second aspect of the present invention provides a computer readable storage medium having stored therein program instructions which, when executed, are adapted to carry out a method of reducing XR long term error accumulation as described above.
A third aspect of the invention provides a system for reducing XR long term error accumulation comprising a computer readable storage medium as described above.
Specifically, the principle of the invention is as follows: the method can effectively reduce XR long-time error accumulation, its core being a deep analysis of the error accumulation law and a brand-new error control framework. The technical principle of the scheme comprises the following aspects:
1. error accumulation model establishment
According to the invention, the pose errors are disassembled into different error sources, such as IMU integral drift errors, vision measurement errors, depth measurement errors and the like, by analyzing the real operation data. For each error source, a corresponding error model is constructed according to the change rule of the error source in time and space, for example, IMU drift errors can be fitted by using a random walk model, and vision measurement errors can be fitted by using a Gaussian noise model. In addition, the influence of factors such as a motion mode, an environment structure and the like on error accumulation is considered, and an error model is further improved. Based on this error model, the error distribution at the future time instant can be predicted.
2. Active control based on error prediction
With the prediction of future errors, the scheme of the invention can actively take corresponding control measures instead of passively correcting the errors until the errors accumulate to a certain degree. For example, the sensor weight is dynamically adjusted according to the prediction result, so that the sensor weight with large error contribution is reduced, and the overall system error is reduced; and the prediction result is corrected by combining environmental constraint, so that the error diffusion rate is reduced in advance.
3. Multi-level error management strategy
The scheme adopts a multi-pronged strategy for error control, managing errors in an all-round way through means at different levels, such as pose correction, statistical correction and repositioning. The periodic pose correction effectively controls the upper limit of error growth; the statistical correction exploits the regularity of the data and minimizes the error offset from a probabilistic angle; the repositioning mechanism is the last line of defense, ensuring the usability of the system. This multi-level management constrains the growth of errors in all directions.
It should be noted that the variables or formulas involved in the present specification are explained as follows:
$a = (a_x, a_y, a_z)$: triaxial acceleration, unit $m/s^2$;
$\omega = (\omega_x, \omega_y, \omega_z)$: triaxial angular velocity, unit $rad/s$;
$I(u,v)$: image gray value, where $(u,v)$ are the pixel coordinates, $u\in[0,W-1]$, $v\in[0,H-1]$, and $W$ and $H$ are the image width and height, respectively;
$\hat{I}(u,v)$: filtered image gray value;
$r$: filter radius;
$D(u,v,t)$: depth value, where $(u,v)$ are the pixel coordinates and $t$ is the timestamp, unit $m$;
$P_t$: point cloud data set;
$p_i = (x_i, y_i, z_i)$: three-dimensional coordinates of the $i$-th point, unit $m$;
$N$: total number of points in the point cloud;
$p$: query point;
$\hat{p}$: filtered result of query point $p$;
$k$: number of nearest neighbors;
$x_k$: system state vector;
$u_k$: control input;
$z_k$: observed quantity;
$w_k$: process noise, obeying $\mathcal{N}(0,Q)$;
$v_k$: measurement noise, obeying $\mathcal{N}(0,R)$;
$A$: state transition matrix;
$B$: control input matrix;
$H$: observation matrix;
$\mathbf{x} = \{R, p, v, b_a, b_g\}$: system state, including rotation $R$, position $p$, velocity $v$, accelerometer bias $b_a$ and gyroscope bias $b_g$;
$J(\mathbf{x})$: cost function;
$\rho(\cdot)$: robust kernel function;
$r_i$: $i$-th residual term;
$z$: feature point observation (e.g. image coordinates);
$\pi(\cdot)$: camera projection model;
$P$: three-dimensional feature point;
$\Delta\hat{z}_{ij}$: IMU pre-integration measurement;
$(n, d)$: plane parameters, where $n$ is the normal vector and $d$ is the distance to the origin;
$r_{\mathrm{plane}}$: plane constraint residual;
$\delta x$: error state;
$\boxplus$: composition operation, e.g. $R = \hat{R}\exp(\delta\theta^{\wedge})$ for the rotation;
$F_k$: error-state transition matrix;
$w_k$ (error state): error-state process noise;
$H_k$: error-state observation matrix;
$v_k$ (error state): observation noise;
$\mathcal{N}(\mu,\Sigma)$: Gaussian distribution with mean $\mu$ and covariance $\Sigma$;
$f(\cdot)$: state transfer function;
$h(\cdot)$: observation function;
$\alpha_i$: weight of the $i$-th sensor;
$\tau_{\mathrm{high}}$: upper weight threshold;
$\tau_{\mathrm{low}}$: lower weight threshold;
$\boldsymbol{\alpha}$: weight vector;
$e_i \sim \mathcal{N}(\mu_i, \Sigma_i)$: error distribution of the $i$-th sensor;
$E(\boldsymbol{\alpha})$: weighted error norm;
$\boldsymbol{\alpha}^{t-1}$: weight vector at the previous moment;
$\lambda$: smoothing term weight coefficient;
$p_j^{\mathrm{cal}}$: true position of the $j$-th calibration position;
$d_j$: descriptor of the $j$-th calibration position;
$\hat{d}_j$: observed calibration position descriptor;
$\hat{p}_j$: coordinates of the $j$-th calibration position obtained from the current pose estimate;
$e_j$: pose deviation;
$S$: index set of currently detected calibration positions;
$\delta T$: pose correction increment.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention.

Claims (10)

1. A method of reducing XR long term error accumulation comprising the steps of:
S10, collecting sensor data in the operation process of the XR equipment, wherein the sensor data comprise visual, inertial and depth multi-source data;
S20, preprocessing sensor data by using a filtering algorithm;
S30, establishing a comprehensive optimization model, fusing the preprocessed sensor data, and solving an optimal estimated value of the pose of the XR equipment in the virtual environment;
s40, introducing a nonlinear error state Kalman filter, and estimating and compensating the system state and the measurement noise in real time;
s50, constructing an error accumulation model by combining priori knowledge and motion constraint conditions, and predicting error change of a future state;
s60, dynamically adjusting the sensor weight according to the error accumulation condition, and reducing the sensor weight with large error contribution;
s70, periodically aligning with the known calibration position, and recalibrating the pose error;
s80, constructing an error distribution model, and correcting error deviation in subsequent application by using a statistical rule;
and S90, when the error accumulation exceeds a preset error threshold value, starting a repositioning program, and redefining the initial position and the initial posture.
2. The method of claim 1, wherein the sensor data comprises data from an inertial measurement unit, a vision sensor, and a depth sensor.
3. The method for reducing XR long time error accumulation according to claim 1, wherein the step of preprocessing the sensor data by using a filtering algorithm comprises: filtering the image data by adopting bilateral filtering to remove high-frequency noise; and removing abnormal values of the inertia and depth data by adopting a Kalman filtering method.
4. The method for reducing XR long-term error accumulation according to claim 1, wherein the step of establishing a comprehensive optimization model, fusing multi-source data including vision, inertia and depth, and solving an optimal pose estimation value comprises the following specific steps: performing target detection and track association on the visual data to acquire target motion information; fusing target motion information with inertia and depth data to construct an error cost function; and solving the optimal pose solution by adopting an optimization algorithm.
5. The method for reducing XR long-term error accumulation according to claim 1, wherein the step of constructing an error accumulation model by combining a priori knowledge and motion constraint conditions, and predicting the error change of the future state comprises the following steps: establishing a terrain model, and judging the steep degree of the terrain by utilizing the frequency characteristics of adjacent areas; adopting smooth constraint for the flat area; for steep areas, edge-preserving constraint is adopted; and merging constraint conditions into an error model, and predicting the future error change trend.
6. The method according to claim 1, wherein the step of periodically aligning with a known calibration position and recalibrating the pose error is performed by: when a known calibration target is detected, aligning the estimated pose with the calibration position; and correcting the pose error by using the alignment pose difference value.
7. The method for reducing XR long-term error accumulation according to claim 1, wherein the step of constructing an error distribution model and correcting the error offset in the subsequent application by using a statistical law comprises: counting the distribution rule of pose errors under different application scenes; and applying the distribution model to the new scene to correct the systematic error.
8. The method of claim 4, wherein the optimization algorithm uses the Gauss-Newton method.
9. A computer readable storage medium having stored therein program instructions which, when executed, are adapted to carry out the method of reducing XR long term error accumulation of any one of claims 1 to 8.
10. A system for reducing XR long term error accumulation comprising the computer readable storage medium of claim 9.


