WO2021160098A1 - Error state kalman filter for visual slam by dynamically tuning measurement noise covariance - Google Patents

Error state kalman filter for visual slam by dynamically tuning measurement noise covariance

Info

Publication number
WO2021160098A1
Authority
WO
WIPO (PCT)
Prior art keywords
pose
slam
state
covariance
state variable
Prior art date
Application number
PCT/CN2021/076067
Other languages
French (fr)
Inventor
Peng HE
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd. filed Critical Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority to CN202180011363.3A priority Critical patent/CN115003983B/en
Publication of WO2021160098A1 publication Critical patent/WO2021160098A1/en

Links

Images

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88Lidar systems specially adapted for specific applications
    • G01S17/89Lidar systems specially adapted for specific applications for mapping or imaging
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G01C21/1656Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments with passive imaging devices, e.g. cameras
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/86Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/48Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S17/00
    • G01S7/4808Evaluating distance, position or velocity data

Definitions

  • Inertial navigation is a classical method for localization. Inertial navigation uses one or more sensors, for example, motion sensors or rotation sensors, to continuously calculate and adjust the position, the orientation, and the velocity of a moving object. Motion and rotation sensing can be accomplished via an inertial measurement unit (IMU) .
  • An advantage of inertial navigation is that it relies only on the sensor itself without the need for external references such as global positioning systems, sonar location mapping, or camera based location estimation.
  • In an Extended Kalman Filter (EKF), the orientation needs to use a quaternion or a rotation matrix, which have many parameters.
  • Sometimes, EKF has high-order products, which makes the computation of Jacobians difficult and slow.
  • When state values are too large, computing Jacobians using EKF may have various computational problems, e.g., possible parameter singularities, gimbal lock issues, etc., which cannot guarantee that the linearization remains valid at all times.
  • Embodiments of the present invention are related to updating an ESKF filter with image rate measurements by interleaving measured updates to more tightly couple predicted state variables and measured state variables. Embodiments of the present invention are applicable to a variety of applications in augmented reality and computer-based display systems.
  • a method of updating a state variable in a simultaneous localization and mapping system includes receiving a computed position from an inertial measurement unit that includes a position, orientation, and velocity.
  • the method includes receiving multiple images from a camera.
  • the method further includes predicting an error state including a state variable and a state covariance.
  • the method includes processing a first subset of images from the multiple images using an optimization process to provide a first SLAM pose at a first frequency.
  • the method further includes processing a second subset of images from the multiple images using pure visual odometry to provide a second SLAM pose at a second frequency.
  • the method further includes updating the state variable and the state covariance using at least one of the first SLAM pose or the second SLAM pose.
  • the state variable includes a vector of at least velocity, an accelerometer metric, gyroscopic metric, or gravitational measurement.
  • a computing system for updating a state variable and state covariance of a SLAM system includes an inertial measurement unit.
  • the computing system further includes a processor and one or more memories storing instructions executable by the processor.
  • the processor executes instructions that configure the computing system to update the state variable and state covariance of the SLAM system.
  • the processor configures the computing system to receive a computed position from an inertial measurement unit that includes a position, orientation, and velocity.
  • the processor configures the computing system to receive multiple images from a camera.
  • the processor configures the computing system to predict an error state including a state variable and a state covariance.
  • the processor configures the computing system to use an optimization process for a first subset of images from the multiple images to provide a first SLAM pose at a first frequency.
  • the processor configures the computing system to use pure visual odometry on a second subset of images from the multiple images to provide a second SLAM pose at a second frequency.
  • the processor configures the computing system to update the state variable and the state covariance using at least one of the first SLAM pose or the second SLAM pose.
  • embodiments of the present invention provide a SLAM output for every image at the frame rate so that the SLAM pose can be used for measurement updates in an ESKF process. Further, the dynamic tuning of measurement noise covariance can make the SLAM pose more accurate, as this covariance value represents how confident the filter is in the measurement.
  • the measurement update may come directly from the output of visual SLAM.
  • FIG. 1 illustrates an example of a computer system that includes a depth sensor and red, green, and blue (RGB) optical sensors for AR applications according to an embodiment of the present invention.
  • FIG. 2 illustrates an example of a block diagram for using Error-State Kalman Filtering (ESKF) to update state variables in a simultaneous localization and mapping (SLAM) system according to an embodiment of the present invention.
  • FIG. 3 illustrates another example of a flow for updating state variables in a simultaneous localization and mapping (SLAM) system according to an embodiment of the present invention.
  • FIG. 4 is a simplified flowchart illustrating a method of updating state variables in a simultaneous localization and mapping (SLAM) system according to an embodiment of the present invention.
  • FIG. 5A illustrates an example of a plot of iris pose, predicted iris pose, and ESKF pose according to an embodiment of the present invention.
  • FIG. 5B is a plot of the derivative of the predicted iris pose and ESKF pose according to an embodiment of the present invention.
  • FIG. 6A illustrates an example of a plot of iris pose, predicted iris pose, and ESKF pose, according to an embodiment of the present invention.
  • FIG. 6B illustrates an x-derivative of the predicted iris pose and the ESKF pose, according to an embodiment of the present invention.
  • FIG. 7 illustrates an example computer system according to an embodiment of the present invention.
  • Embodiments of the present invention relate to tightly coupling inertial measurement sensors (IMU) to Visual-inertial Simultaneous Localization and Mapping (SLAM) systems used in Augmented Reality (AR) systems.
  • SLAM involves determining and mapping a previously-unknown real-world environment using image capture and determining a position in the real-world using visual information in the captured images.
  • embodiments of the present invention provide improvements over conventional techniques by providing SLAM pose measurement updates to an ESKF process at the image capture rate.
  • certain embodiments combine an optimized SLAM pose and a pure visual odometry SLAM pose to provide higher measurement update rates.
  • the ESKF process may be used to generate more accurate measurement updates and support higher rates of movement while maintaining the ESKF processing method. Dynamic tuning of measurement noise covariance can make the pose output more accurate, as this covariance value represents how confident the filter is with the measurement. The measurement comes directly from the output of visual SLAM.
  • the coupling of SLAM with optimized pose determination and SLAM using pure visual odometry to provide a higher frequency update (up to and including image rate) of the measurement position improves the predicted state and provides a solution for fusing IMU information with visual SLAM for higher velocity applications.
  • FIG. 1 illustrates an example of a computer system 110 that includes a depth sensor 112 and an RGB optical sensor 114 for augmented reality applications according to an embodiment of the present invention.
  • FIG. 1 includes a computer system 110 that includes a depth sensor 112, an Inertial Measurement Unit (IMU) 113, and an RGB optical sensor 114 for AR applications including calibration of one or more sensors.
  • the AR applications can be implemented by an AR module 116 of the computer system 110.
  • the computer system 110 represents a suitable user device that includes, in addition to the depth sensor 112 and the RGB optical sensor 114, one or more graphical processing units (GPUs) , one or more general purpose processors (GPPs) , and one or more memories storing computer-readable instructions that are executable by at least one of the processors to perform various functionalities of the embodiments of the present invention.
  • the computer system 110 can be any of a smartphone, a tablet, an AR headset, or a wearable AR device.
  • the depth sensor 112 has a known maximum depth range (e.g., a maximum working distance) and this maximum value may be stored locally and/or accessible to the AR module 116.
  • the depth sensor 112 can be a ToF camera.
  • the depth map generated by the depth sensor 112 includes a depth image.
  • the IMU 113 can detect inertial measurements and transmit measurements of the detected inertial measurements to computer system 110. Examples of measurements include position, angular velocity, and acceleration. IMU 113 can include an accelerometer and/or a gyroscope. An accelerometer can output a measurement of acceleration in one or more directions (e.g., x, y, and/or z) . A gyroscope can output a measurement of angular velocity. In turn, the computer system 110 receives the inertial measurements and adjusts the AR scene 120 based on the received measurements.
  • the RGB optical sensor 114 can be a color camera.
  • the depth image and the RGB image can have different resolutions. Typically, the resolution of the depth image is smaller than that of the RGB image. For instance, the depth image has a 640x180 resolution, whereas the RGB image has a 1920x1280 resolution.
  • the RGB optical sensor 114 generates an RGB image of a real-world environment that includes, for instance, a real-world object 130.
  • the depth sensor 112 generates depth data about the real-world environment, where this data includes, for instance, a depth map that shows depth (s) of the real-world object 130 (e.g., distance (s) between the depth sensor 112 and the real-world object 130) .
  • the depth sensor 112, the IMU 113, and/or the RGB optical sensor 114, as installed in the computer system 110, may be separated by a transformation (e.g., distance offset, field of view angle difference, etc. ) .
  • This transformation may be known and its value may be stored locally and/or accessible to the AR module 116.
  • the ToF camera and the color camera can have similar fields of view. But due to the transformation, the fields of view would partially, rather than fully, overlap.
  • the AR module 116 can generate a red, green, blue, and depth (RGBD) image from the RGB image and the depth map to detect an occlusion of the virtual object 124 by at least a portion of the real-world object representation 122 or vice versa.
  • the AR module 116 can additionally or alternatively generate a 3D model of the real-world environment based on the depth map, where the 3D model includes multi-level voxels. Such voxels are used to detect collision between the virtual object 124 and at least a portion of the real-world object representation 122.
  • the AR scene 120 can be rendered to properly show the occlusion and avoid the rendering of the collision.
  • the AR module 116 can be implemented as specialized hardware and/or a combination of hardware and software (e.g., general purpose processor and computer-readable instructions stored in memory and executable by the general purpose processor) . In addition to initializing an AR session and performing VIO, the AR module 116 can detect occlusion and collision to properly render the AR scene 120.
  • a smartphone is used for an AR session that shows the real-world environment.
  • the AR session includes rendering an AR scene that includes a representation of a real-world table on top of which a vase (or some other real-world object) is placed.
  • a virtual ball (or some other virtual object) is to be shown in the AR scene.
  • the virtual ball is to be shown on top of the table too.
  • the virtual ball can occlude the virtual vase when the virtual vase is behind the virtual ball relative to a change in the pose of the smartphone. And in remaining parts of the AR scene, no occlusion is present.
  • a user of the smartphone can interact with the virtual ball to move the virtual ball on the top surface of the virtual table (that represents the real-world table) . By tracking a possible collision between the virtual ball and the virtual object, any interaction that would cause collision would not be rendered. In other words, the collision tracking can be used to control where the virtual ball can be moved in the AR scene.
  • computer system 110 performs an ESKF process to provide a measurement update using one or more sensors such as depth sensor 112 and/or RGB optical sensor 114.
  • computer system 110 can perform measurement updates using both an optimized SLAM pose and a SLAM pose obtained using pure visual odometry.
  • FIG. 2 illustrates an example of a process for using an Error-State Kalman Filtering (ESKF) process 201 to update state variables in a simultaneous localization and mapping (SLAM) system according to an embodiment of the present invention.
  • the ESKF process 201 receives an input from an inertial measurement unit (IMU) and an input from a visual SLAM module.
  • IMU raw data 202 may include acceleration and angular velocity information of the camera system.
  • the ESKF process 201 predicts an error-state at block 204 using the IMU raw data 202 to generate a predicted state variable.
  • An example of the predicted state variable is a vector including matrices that include pose information including 3-axes position, the 3-axes velocity, 3-axes orientation (e.g., a quaternion) , and IMU sensor information including 3-axes accelerometer bias, 3-axes gyroscope bias.
  • the ESKF 201 predicts the predicted state variable.
  • the ESKF process 201 may also predict a predicted state covariance.
  • the ESKF process 201 predicts the error state δx (e.g., the state variable) by computing the predicted error state, represented by δx ← F_x(x, u_m) δx. The state transition model F_x of the state variable x and the control input u_m are used to predict the error state (the predicted state variable).
  • the ESKF process 201 may also predict the error state covariance (the predicted state covariance) as represented by P ← F_x P F_x^T + F_i Q_i F_i^T. The predicted covariance P is computed by combining the state transition model F_x, the state covariance P, the transpose of the transition model F_x^T, the initial transition model F_i, an estimated covariance Q_i, and the transpose of the initial transition model F_i^T.
  • the computing system 110 may capture multiple images of a real-world scene. For instance, the computing system 110 can receive image capture data 206 from a camera and perform visual SLAM 208. The computing system 110 can process the image capture data 206 using visual SLAM 208 to generate an image rate pose 210. The ESKF process 201 may receive the image rate pose 210 from computing system 110. Additional detail with regard to image rate pose 210 generation is likely best understood with regard to FIG. 5.
  • the ESKF process 201 performs a measurement update at block 212 by using a relationship between a measurement vector that is estimated from visual SLAM 208 and a noise factor.
  • the ESKF process 201 update equations can be represented as a relationship between the measurement vector, including position, velocity, orientation, accelerometer bias, gyroscope bias, gravitational acceleration, that are estimated from the visual SLAM 208 and a Gaussian white noise.
  • y = h(x_t) + v
  • y is the measurement vector, including position, velocity, orientation, accelerometer bias, gyroscope bias, and gravitational acceleration, estimated from our visual SLAM
  • h() is a nonlinear function of the system state (the true state)
  • v is a white Gaussian noise with covariance V.
  • the white Gaussian noise v can be represented as v ~ N(0, V), i.e., it is drawn from a zero-mean normal distribution with covariance V.
  • Jacobian matrix H can be used to linearize the h() function.
  • the Jacobian matrix H is a 19x19 identity matrix.
  • the ESKF correction or update equations can, therefore, be represented as K = P H^T (H P H^T + V)^{-1}, δx ← K (y - h(x_t)), and P ← (I - KH) P.
  • the ESKF sets the measurement noise covariance V, described above, to update the state variables and state covariance.
  • this measurement noise covariance is initialized and fixed during the ESKF update.
  • the measurement noise covariance changes due to the integration of visual SLAM to determine the measurement vector.
  • Performing visual SLAM 208 results in various quality levels of SLAM performance that can be characterized by various SLAM quality metrics (e.g., good, weak, lost, etc. ) .
  • the image rate pose 210 (e.g., a SLAM pose) is used by ESKF process 201 as the measurement vector.
  • the ESKF process 201 tunes the covariance dynamically according to the performance of visual SLAM.
  • a lower value of the measurement noise covariance increases a confidence metric of the filter relating to the measurement. For example, in a measurement that is characterized with visual SLAM performance that is good, the measurement noise covariance should be a lesser value. In another example of a measurement that is characterized with visual SLAM performance that is poor or lost, the measurement noise covariance should be a greater value.
  • the ESKF process 201 injects the measured error into the nominal state at block 214.
  • the ESKF process 201 performs a direct sum of the full state x with the measured error δx
  • a mathematical representation of the injection is x ← x ⊕ δx
  • the measured error δx may be a vector that includes one or more error terms or matrices.
  • the measured error δx is a vector that includes measured error for the 3-axes position δp, the 3-axes velocity δv, and the 3-axes orientation δθ (e.g., a quaternion); it may, however, include additional or alternative measured error components.
  • the ESKF process 201 resets the ESKF at block 216. For instance, the error state is reset to 0.
  • the ESKF process 201 tunes the covariance, in some cases using the reset formula P ← G P G^T, with G the ESKF reset Jacobian.
  • FIG. 3 illustrates another example of a flow for updating state variables in a simultaneous localization and mapping (SLAM) system according to an embodiment of the present invention.
  • FIG. 3 illustrates the operations in a manner similar to FIG. 2 and additionally is annotated with formulas for the operations that may be performed at one or more steps as described with regard to FIG. 2.
  • the ESKF process may perform various computations for each step.
  • the equations above the arrow between IMU raw data and the ESKF process step error state prediction represent the mathematical prediction of the state variable and the prediction of the covariance.
  • the ESKF process illustrates that the measurement update may be computed from the SLAM pose, as described with regard to block 212.
  • the full state x is computed by injecting the observed (i.e., measured) error state into the state variable. This can be represented by the equation x ← x ⊕ δx and the description of block 214.
  • Various elements from FIG. 2 have been maintained with the same reference numerals.
  • the ESKF generates a predicted state δx 301 (e.g., the error state predicted by the ESKF) for the full state variable x.
  • the predicted state 301 is shown by the computation δx ← F_x(x, u_m) δx
  • the ESKF may also generate a predicted state covariance 302.
  • the predicted state covariance 302 can be calculated by P ← F_x P F_x^T + F_i Q_i F_i^T. The ESKF performs a measurement update by generating the measured state using a non-linear function of the true state h() and adding a white Gaussian noise v.
  • the ESKF measurement update equations 303 can, therefore, be represented as K = P H^T (H P H^T + V)^{-1}, δx ← K (y - h(x_t)), and P ← (I - KH) P.
  • while the ESKF is performing the measurement update, the ESKF sets the measurement noise covariance V.
  • the ESKF sets the measurement noise based at least in part on the quality level of the visual SLAM.
  • the optimized SLAM pose and pure visual odometry SLAM pose may be impacted by performance of visual SLAM. Accordingly, dynamically tuning the measurement noise renders a more accurate SLAM pose.
  • the ESKF incorporates the measured error into the nominal state variable by performing a direct sum 304 of the full state x with the measured error δx
  • the ESKF resets the cycle by setting the predicted state to 0 and tuning the covariance for the next iteration of the ESKF by using reset equations 305.
  • FIG. 4 is a simplified flowchart illustrating a method of updating state variables in a simultaneous localization and mapping (SLAM) system according to an embodiment of the present invention.
  • Some or all of the operations of the flows can be implemented via specific hardware on the computer system and/or can be implemented as computer-readable instructions stored on a non-transitory computer-readable medium of the computer system.
  • the computer-readable instructions represent programmable modules that include code executable by a processor of the computer system. The execution of such instructions configures the computer system to perform the respective operations.
  • Each programmable module in combination with the processor represents a means for performing a respective operation (s) . While the operations are illustrated in a particular order, it should be understood that no particular order is necessary and that one or more operations may be omitted, skipped, and/or reordered.
  • the flow starts at operation 402, where the computer system receives a computed position from an inertial measurement unit.
  • An IMU is a navigation device that continuously calculates a position, orientation, or velocity using dead reckoning of the moving object without the need for external references or sensors.
  • the flow includes operation 404, where the computer system receives a plurality of images from a camera.
  • the computing system uses a camera system, such as RGB optical sensor 114 or depth sensor 112 to generate image data to represent a real world scene.
  • the flow includes operation 406, where the computer system predicts an error state comprising a state variable and a state covariance from the computed position as described with regard to FIGS. 2 and 3.
  • the computer system can predict the error state covariance (the predicted state covariance) as represented by P ← F_x P F_x^T + F_i Q_i F_i^T. The predicted covariance P is computed by combining the state transition model F_x, the state covariance P, the transpose of the transition model F_x^T, the initial transition model F_i, an estimated covariance Q_i, and the transpose of the initial transition model F_i^T.
  • the computer system may also compute the predicted error state as represented by δx ← F_x(x, u_m) δx. The state transition model F_x of the state variable x and the control input u_m are used to predict the error state (the predicted state variable).
  • the flow includes operation 408, where the computer system processes a first subset of images of the plurality of images to provide a first SLAM pose at a first frequency.
  • This first SLAM pose can be referred to as an optimized SLAM pose since it is generated, in some embodiments, by bundling images and performing an optimization process to generate an optimized SLAM pose .
  • keyframe-based visual SLAM methods cannot always output at the image rate because of the slow optimization process in the back-end. For example, for an image stream with a frequency of 30 Hz, the optimization process results in a 10 Hz SLAM pose.
  • the optimization process may be any optimization process that generates an optimized SLAM pose (e.g., Bayesian, graphical, non-linear, etc. ) .
  • the flow includes operation 410, where the computer system processes a second subset of images of the plurality of images with pure visual odometry to provide a second SLAM pose at a second frequency.
  • This second SLAM pose can be referred to as a pure visual odometry SLAM pose since it is generated, in some embodiments, by performing visual odometry without an optimization process as with the first SLAM pose.
  • the computer system processes a second subset of images of the plurality of images at 20 Hz. For instance, the computer system processes the second subset of images using only front-end processing techniques such as Perspective-n-Point (a hedged code sketch of such a front-end pose follows this list).
  • the flow includes operation 412, where the computer system updates the state variable and the state covariance using the first SLAM pose or the second SLAM pose.
  • the ESKF process 201 performs an update of the state variable and the state covariance using the first SLAM pose at the first frequency (i.e., the optimized SLAM pose) and/or the second SLAM pose at the second frequency (i.e., the pure visual odometry SLAM pose).
  • the computer system performs a measurement update using the first SLAM pose or the second SLAM pose to achieve an image pose rate (e.g., using the first frequency and the second frequency to achieve a sum of the frequencies) .
  • the computer system performs a measurement update at an image pose rate of 30 Hz.
  • FIG. 4 provides a particular method of updating state variables in a SLAM system according to an embodiment of the present invention.
  • Other sequences of steps may also be performed according to alternative embodiments.
  • alternative embodiments of the present invention may perform the steps outlined above in a different order.
  • the individual steps illustrated in FIG. 4 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step.
  • additional steps may be added or removed depending on the particular applications.
  • One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
  • FIG. 5A illustrates an example of a plot 501 of iris pose, predicted pose, and ESKF pose according to an embodiment of the present invention.
  • FIG. 5B is a plot of the derivative of the iris pose and a Kalman-filtered (i.e., ESKF, EKF, etc.) pose according to an embodiment of the present invention.
  • plot 501 illustrates poses 503 including an iris pose 503A, a predicted iris pose 503B, and an ESKF pose 503C.
  • the poses 503 provide a visual representation, with reference to a y-axis 502 and an x-axis 504, of the difference between the actual position (iris pose 503A), the IMU pose (predicted iris pose 503B), and the smoothed ESKF pose (ESKF pose 503C).
  • the poses 503 represent the positions (or trajectories) of the iris performing the process as described with regards to embodiments of the present disclosure. During movements, the position (or trajectory) will vary based on the movements. In the example of FIG. 5A, the position/trajectory is shown as a linearization in two dimensions x (horizontal axis) 504 and y (vertical axis) 502.
  • the x-axis 504 and y-axis 502 may represent the trajectory intensity in the axis.
  • a value of (-3, -3) may represent a trajectory that is changing position at -3x per meter and -3y per meter.
  • FIG. 5B which is a plot 507 of the y-derivative of predicted iris pose 503B and ESKF pose 503C
  • the y-derivative plot depicts the change in the linearization of the predicted iris pose 503B and ESKF pose 503C in the y-axis trajectory.
  • FIG. 6A illustrates an example of a plot 601 of iris pose, predicted pose, and ESKF pose, according to an embodiment of the present invention.
  • FIG. 6A illustrates a plot 601 that illustrates poses 603 including an iris pose 603A, a predicted iris pose 603B, and an ESKF pose 603C.
  • the poses 603 illustrate a visual representation of the difference in the actual position (iris pose 603A) , IMU pose (predicted iris pose 603B) , and the smoothed ESKF pose (ESKF pose 603C) with respect to an x-axis 604 and a y-axis 602.
  • FIG. 6B which is a plot 607 that illustrates an x-derivative of the predicted iris pose 603B and the ESKF pose 603C
  • the derivatives reflect the changes in the x-axis.
  • the y-derivative plot depicts the change in the linearization of the predicted iris pose 603B and ESKF pose 603C in the x-axis trajectory with respect to a y-axis 605 and x-axis 606.
  • FIG. 7 illustrates examples of components of a computer system 700 according to certain embodiments.
  • the computer system 700 is an example of the computer system described herein above. Although these components are illustrated as belonging to a same computer system 700, the computer system 700 can also be distributed.
  • the computer system 700 includes at least a processor 702, a memory 704, a storage device 706, input/output peripherals (I/O) 708, communication peripherals 710, and an interface bus 712.
  • the interface bus 712 is configured to communicate, transmit, and transfer data, controls, and commands among the various components of the computer system 700.
  • the memory 704 and the storage device 706 include computer-readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM) , hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage, for example memory, and other tangible storage media. Any of such computer readable storage media can be configured to store instructions or program codes embodying aspects of the disclosure.
  • the memory 704 and the storage device 706 also include computer readable signal media.
  • a computer readable signal medium includes a propagated data signal with computer readable program code embodied therein. Such a propagated signal takes any of a variety of forms including, but not limited to, electromagnetic, optical, or any combination thereof.
  • a computer readable signal medium includes any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computer system 700.
  • the memory 704 includes an operating system, programs, and applications.
  • the processor 702 is configured to execute the stored instructions and includes, for example, a logical processing unit, a microprocessor, a digital signal processor, and other processors.
  • the memory 704 and/or the processor 702 can be virtualized and can be hosted within another computer system of, for example, a cloud network or a data center.
  • the I/O peripherals 708 include user interfaces, such as a keyboard, screen (e.g., a touch screen) , microphone, speaker, other input/output devices, and computing components, such as graphical processing units, serial ports, parallel ports, universal serial buses, and other input/output peripherals.
  • the I/O peripherals 708 are connected to the processor 702 through any of the ports coupled to the interface bus 712.
  • the communication peripherals 710 are configured to facilitate communication between the computer system 700 and other computing devices over a communications network and include, for example, a network interface controller, modem, wireless and wired interface cards, antenna, and other communication peripherals.
  • a computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs.
  • Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computer system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
  • Embodiments of the methods disclosed herein may be performed in the operation of such computing devices.
  • the order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
  • use of "based on" is meant to be open and inclusive, in that a process, step, calculation, or other action "based on" one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited.
  • use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
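The Perspective-n-Point front-end mentioned in the flow above can be pictured with the following hedged sketch. It assumes OpenCV's solvePnP and that the 2D-3D correspondences for the current frame have already been established by feature matching; it is an illustration only, not the reference implementation of the disclosed system.

```python
import cv2
import numpy as np

def pnp_pose(points_3d, points_2d, K, dist=None):
    """Pure visual odometry pose for one frame from 2D-3D correspondences."""
    dist = np.zeros(5) if dist is None else dist
    ok, rvec, tvec = cv2.solvePnP(points_3d.astype(np.float64),
                                  points_2d.astype(np.float64),
                                  K, dist, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None                      # treat as a "lost" SLAM quality level
    R, _ = cv2.Rodrigues(rvec)           # rotation vector -> rotation matrix
    return R, tvec                       # camera pose (world-to-camera)
```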

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Electromagnetism (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)
  • Navigation (AREA)

Abstract

Techniques for performing image rate pose measurement update are described. In an example, a computer system (110) receives a computed position from an inertial measurement unit (113). The computer system (110) receives multiple images from a camera (112, 114). The computer system (110) predicts an error state comprising a state variable and a state covariance. The computer system (110) uses an optimization process to provide a first SLAM pose at a first frequency and pure visual odometry to provide a second SLAM pose at a second frequency. The computing system (110) updates the state variable and the state covariance using the first SLAM pose or second SLAM pose.

Description

ERROR STATE KALMAN FILTER FOR VISUAL SLAM BY DYNAMICALLY TUNING MEASUREMENT NOISE COVARIANCE BACKGROUND OF THE INVENTION
Inertial navigation is a classical method for localization. Inertial navigation uses one or more sensors, for example, motion sensors or rotation sensors, to continuously calculate and adjust the position, the orientation, and the velocity of a moving object. Motion and rotation sensing can be accomplished via an inertial measurement unit (IMU) . An advantage of inertial navigation is that it relies only on the sensor itself without the need for external references such as global positioning systems, sonar location mapping, or camera based location estimation.
Another method of determining location is applying an Extended Kalman Filter (EKF), such as with a global positioning system. However, in EKF, the orientation needs to use a quaternion or a rotation matrix, which have many parameters. Sometimes, EKF has high-order products, which makes the computation of Jacobians difficult and slow. In addition, when state values are too large, computing Jacobians using EKF may have various computational problems, e.g., possible parameter singularities, gimbal lock issues, etc., which cannot guarantee that the linearization remains valid at all times.
SUMMARY OF THE INVENTION
Embodiments of the present invention are related to updating an ESKF filter with image rate measurements by interleaving measured updates to more tightly couple predicted state variables and measured state variables. Embodiments of the present invention are applicable to a variety of applications in augmented reality and computer-based display systems.
According to an embodiment of the present invention, a method of updating a state variable in a simultaneous localization and mapping system is provided. The method includes receiving a computed position from an inertial measurement unit that includes a position, orientation, and velocity. The method includes receiving multiple images from a camera. The method further includes predicting an error state including a state variable and a state covariance. The method includes processing a first subset of images from the multiple images using an optimization process to provide a first SLAM pose at a first frequency. The method further includes processing a second subset of images from the multiple images using pure visual odometry to provide a second SLAM pose at a second frequency. The method further includes updating the state variable and the state covariance using at least one of the first SLAM pose or the second SLAM pose. In one example, the state variable includes a vector of at least velocity, an accelerometer metric, gyroscopic metric, or gravitational measurement.
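As a rough orientation for the reader, the claimed steps can be arranged as in the following sketch. The function and attribute names (predict, measurement_update, slam_backend, visual_odometry) and the keyframe split are illustrative assumptions only, not elements of the claimed method.

```python
def update_slam_state(imu_data, images, eskf):
    """Illustrative arrangement of the claimed steps for one batch of frames."""
    # Predict the error state (state variable and state covariance) from the IMU.
    eskf.predict(imu_data)

    # First subset: keyframes sent to the (slower) optimization back-end,
    # yielding the first SLAM pose at a first frequency.
    keyframes = images[::3]            # assumed split, e.g. ~10 Hz of a 30 Hz stream
    for frame in keyframes:
        pose, quality = eskf.slam_backend.optimize(frame)
        eskf.measurement_update(pose, quality)

    # Second subset: remaining frames processed with pure visual odometry,
    # yielding the second SLAM pose at a second frequency.
    for i, frame in enumerate(images):
        if i % 3 != 0:
            pose, quality = eskf.visual_odometry(frame)
            eskf.measurement_update(pose, quality)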
According to an embodiment of the present invention, a computing system for updating a state variable and state covariance of a SLAM system is provided. The computing system includes an inertial measurement unit. The computing system further includes a processor and one or more memories storing instructions executable by the processor. The processor executes instructions that configure the computing system to update the state variable and state covariance of the SLAM system. The processor configures the computing system to receive a computed position from an inertial measurement unit that includes a position, orientation, and velocity. The processor configures the computing system to receive multiple images from a camera. The processor configures the computing system to predict an error state including a state variable and  a state covariance. The processor configures the computing system to use an optimization process for a first subset of images from the multiple images to provide a first SLAM pose at a first frequency. The processor configures the computing system to use pure visual odometry on a second subset of images from the multiple images to provide a second SLAM pose at a second frequency. The processor configures the computing system to update the state variable and the state covariance using at least one of the first SLAM pose or the second SLAM pose.
Numerous benefits are achieved by way of the present invention over conventional techniques. For example, embodiments of the present invention provide a SLAM output for every image at the frame rate so that the SLAM pose can be used for measurement updates in an ESKF process. Further, the dynamic tuning of measurement noise covariance can make the SLAM pose more accurate, as this covariance value represents how confident the filter is in the measurement. The measurement update may come directly from the output of visual SLAM. These and other embodiments of the invention along with many of its advantages and features are described in more detail in conjunction with the text below and attached figures.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates an example of a computer system that includes a depth sensor and red, green, and blue (RGB) optical sensors for AR applications according to an embodiment of the present invention.
FIG. 2 illustrates an example of a block diagram for using Error-State Kalman Filtering (ESKF) to update state variables in a simultaneous localization and mapping (SLAM) system according to an embodiment of the present invention.
FIG. 3 illustrates another example of a flow for updating state variables in a simultaneous localization and mapping (SLAM) system according to an embodiment of the present invention.
FIG. 4 is a simplified flowchart illustrating a method of updating state variables in a simultaneous localization and mapping (SLAM) system according to an embodiment of the present invention.
FIG. 5A illustrates an example of a plot of iris pose, predicted iris pose, and ESKF pose according to an embodiment of the present invention.
FIG. 5B is a plot of the derivative of the predicted iris pose and ESKF pose according to an embodiment of the present invention.
FIG. 6A illustrates an example of a plot of iris pose, predicted iris pose, and ESKF pose, according to an embodiment of the present invention.
FIG. 6B illustrates an x-derivative of the predicted iris pose and the ESKF pose, according to an embodiment of the present invention.
FIG. 7 illustrates an example computer system according to an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
In the following description, various embodiments will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the embodiments. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the embodiment being described.
Embodiments of the present invention relate to tightly coupling inertial measurement sensors (IMU) to Visual-inertial Simultaneous Localization and Mapping (SLAM) systems used in Augmented Reality (AR) systems. SLAM involves determining and mapping a previously-unknown real-world environment using image capture and determining a position in the real-world using visual information in the captured images.
In particular, embodiments of the present invention provide improvements over conventional techniques by providing SLAM pose measurement updates to an ESKF process at the image capture rate. For example, certain embodiments combine an optimized SLAM pose and a pure visual odometry SLAM pose to provide higher measurement update rates. In turn, the ESKF process may be used to generate more accurate measurement updates and support higher rates of movement while maintaining the ESKF processing method. Dynamic tuning of measurement noise covariance can make the pose output more accurate, as this covariance value represents how confident the filter is with the measurement. The measurement comes directly from the output of visual SLAM. The coupling of SLAM with optimized pose determination and SLAM using pure visual odometry to provide a higher frequency update (up to and including image rate) of the measurement position improves the predicted state and provides a solution for fusing IMU information with visual SLAM for higher velocity applications.
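The combination of the two pose sources can be pictured with a toy scheduler. The 10 Hz / 20 Hz / 30 Hz figures below are the example rates used later in this description; treating every third frame as a keyframe is an assumption made for illustration.

```python
def pose_source_for_frame(frame_index):
    """Which pose source covers a given 30 Hz camera frame."""
    # Every third frame is handled by the optimization back-end (~10 Hz);
    # the two frames in between come from pure visual odometry (~20 Hz),
    # so together every frame receives a SLAM pose measurement.
    return "optimized" if frame_index % 3 == 0 else "visual_odometry"

print([pose_source_for_frame(i) for i in range(6)])
# ['optimized', 'visual_odometry', 'visual_odometry',
#  'optimized', 'visual_odometry', 'visual_odometry']
```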
FIG. 1 illustrates an example of a computer system 110 that includes a depth sensor 112 and an RGB optical sensor 114 for augmented reality applications according to an embodiment of the present invention. FIG. 1 includes a computer system 110 that includes a depth sensor 112, an Inertial Measurement Unit (IMU) 113, and an RGB optical sensor 114 for AR applications including calibration of one or more sensors. The AR applications can be implemented by an AR module 116 of the computer system 110.
In an example, the computer system 110 represents a suitable user device that includes, in addition to the depth sensor 112 and the RGB optical sensor 114, one or more graphical processing units (GPUs) , one or more general purpose processors (GPPs) , and one or more memories storing computer-readable instructions that are executable by at least one of the processors to perform various functionalities of the embodiments of the present invention. For instance, the computer system 110 can be any of a smartphone, a tablet, an AR headset, or a wearable AR device.
The depth sensor 112 has a known maximum depth range (e.g., a maximum working distance) and this maximum value may be stored locally and/or accessible to the AR module 116. The depth sensor 112 can be a ToF camera. In this case, the depth map generated by the depth sensor 112 includes a depth image.
The IMU 113 can detect inertial measurements and transmit measurements of the detected inertial measurements to computer system 110. Examples of measurements include position, angular velocity, and acceleration. IMU 113 can include an accelerometer and/or a  gyroscope. An accelerometer can output a measurement of acceleration in one or more directions (e.g., x, y, and/or z) . A gyroscope can output a measurement of angular velocity. In turn, the computer system 110 receives the inertial measurements and adjusts the AR scene 120 based on the received measurements.
The RGB optical sensor 114 can be a color camera. The depth image and the RGB image can have different resolutions. Typically, the resolution of the depth image is smaller than that of the RGB image. For instance, the depth image has a 640x180 resolution, whereas the RGB image has a 1920x1280 resolution. Generally, the RGB optical sensor 114 generates an RGB image of a real-world environment that includes, for instance, a real-world object 130. The depth sensor 112 generates depth data about the real-world environment, where this data includes, for instance, a depth map that shows depth (s) of the real-world object 130 (e.g., distance (s) between the depth sensor 112 and the real-world object 130) .
In addition, the depth sensor 112, the IMU 113, and/or the RGB optical sensor 114, as installed in the computer system 110, may be separated by a transformation (e.g., distance offset, field of view angle difference, etc.). This transformation may be known and its value may be stored locally and/or accessible to the AR module 116. When cameras are used, the ToF camera and the color camera can have similar fields of view. But due to the transformation, the fields of view would partially, rather than fully, overlap.
The AR module 116 can generate a red, green, blue, and depth (RGBD) image from the RGB image and the depth map to detect an occlusion of the virtual object 124 by at least a portion of the real-world object representation 122 or vice versa. The AR module 116 can additionally or alternatively generate a 3D model of the real-world environment based on the depth map, where the 3D model includes multi-level voxels. Such voxels are used to detect collision between the virtual object 124 and at least a portion of the real-world object representation 122. The AR scene 120 can be rendered to properly show the occlusion and avoid the rendering of the collision. The AR module 116 can be implemented as specialized hardware and/or a combination of hardware and software (e.g., general purpose processor and computer-readable instructions stored in memory and executable by the general purpose processor) . In addition to initializing an AR session and performing VIO, the AR module 116 can detect occlusion and collision to properly render the AR scene 120.
In an illustrative example of FIG. 1, a smartphone is used for an AR session that shows the real-world environment. In particular, the AR session includes rendering an AR scene that includes a representation of a real-world table on top of which a vase (or some other real-world object) is placed. A virtual ball (or some other virtual object) is to be shown in the AR scene. In particular, the virtual ball is to be shown on top of the table too. By tracking the occlusion between the virtual ball and a virtual vase (that represents the real-world vase) , the virtual vase can occlude, in parts of the AR scene, when the virtual ball is behind the virtual vase relative to the pose of the smartphone. In other parts of the AR scene, the virtual ball can occlude the virtual vase when the virtual vase is behind the virtual ball relative to a change in the pose of the smartphone. And in remaining parts of the AR scene, no occlusion is present. In addition, a user of the smartphone can interact with the virtual ball to move the virtual ball on the top surface of the virtual table (that represents the real-world table) . By tracking a possible collision between the virtual ball and the virtual object, any interaction that would cause collision  would not be rendered. In other words, the collision tracking can be used to control where the virtual ball can be moved in the AR scene.
In another example of FIG. 1, computer system 110 performs an ESKF process to provide a measurement update using one or more sensors such as depth sensor 112 and/or RGB optical sensor 114. For example, computer system 110 can perform measurement updates using both an optimized SLAM pose and a SLAM pose obtained using pure visual odometry.
FIG. 2 illustrates an example of a process for using an Error-State Kalman Filtering (ESKF) process 201 to update state variables in a simultaneous localization and mapping (SLAM) system according to an embodiment of the present invention.
In an example, the ESKF process 201 receives an input from an inertial measurement unit (IMU) and an input from a visual SLAM module. For example, the ESKF process 201 receives the IMU raw data 202 from the IMU. IMU raw data 202 may include acceleration and angular velocity information of the camera system. The ESKF process 201 predicts an error-state at block 204 using the IMU raw data 202 to generate a predicted state variable. An example of the predicted state variable is a vector including matrices that include pose information including 3-axes position, the 3-axes velocity, 3-axes orientation (e.g., a quaternion) , and IMU sensor information including 3-axes accelerometer bias, 3-axes gyroscope bias. Using the IMU raw data, the ESKF 201 predicts the predicted state variable. The ESKF process 201 may also predict a predicted state covariance.
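The state vector described above can be held in a small structure such as the following sketch. The 19-element ordering (3-axes position, 3-axes velocity, quaternion orientation, accelerometer bias, gyroscope bias, gravity) is inferred from the 19x19 Jacobian mentioned below and is an assumption, as is the (w, x, y, z) quaternion convention.

```python
import numpy as np

def make_nominal_state():
    """One possible layout of the 19-element nominal state."""
    return {
        "p":   np.zeros(3),                     # 3-axes position
        "v":   np.zeros(3),                     # 3-axes velocity
        "q":   np.array([1.0, 0.0, 0.0, 0.0]),  # orientation quaternion (w, x, y, z)
        "a_b": np.zeros(3),                     # 3-axes accelerometer bias
        "w_b": np.zeros(3),                     # 3-axes gyroscope bias
        "g":   np.array([0.0, 0.0, -9.81]),     # gravity vector
    }
```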
The ESKF process 201 predicts the error state δx (e.g., the state variable) by computing the error in the state variable x, i.e., the predicted error state, as represented by δx ← F_x(x, u_m) δx. The state transition model F_x of the state variable x and the control input u_m are used to predict the error state (the predicted state variable). The ESKF process 201 may also predict the error state covariance (the predicted state covariance) as represented by P ← F_x P F_x^T + F_i Q_i F_i^T. The predicted covariance P is computed by combining the state transition model F_x, the state covariance P, the transpose of the transition model F_x^T, the initial transition model F_i, an estimated covariance Q_i, and the transpose of the initial transition model F_i^T.
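A minimal numerical sketch of this prediction step, written directly from the two equations above; F_x, F_i, and Q_i are assumed to be provided by the IMU propagation model for the current time step.

```python
import numpy as np

def eskf_predict(dx, P, F_x, F_i, Q_i):
    """Propagate the error state and its covariance over one IMU step."""
    dx_pred = F_x @ dx                               # error-state prediction
    P_pred = F_x @ P @ F_x.T + F_i @ Q_i @ F_i.T     # covariance prediction
    return dx_pred, P_pred
```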
Continuing with the present example, the computing system 110 may capture multiple images of a real-world scene. For instance, the computing system 110 can receive image capture data 206 from a camera and perform visual SLAM 208. The computing system 110 can process the image capture data 206 using visual SLAM 208 to generate an image rate pose 210. The ESKF process 201 may receive the image rate pose 210 from computing system 110. Additional detail with regard to image rate pose 210 generation is likely best understood with regard to FIG. 5.
The ESKF process 201 performs a measurement update at block 212 by using a relationship between a measurement vector that is estimated from visual SLAM 208 and a noise factor. For instance, the ESKF process 201 update equations can be represented as a relationship between the measurement vector, including position, velocity, orientation, accelerometer bias, gyroscope bias, and gravitational acceleration, that is estimated from the visual SLAM 208 and a Gaussian white noise. For instance, y = h(x_t) + v, where y is the measurement vector, including position, velocity, orientation, accelerometer bias, gyroscope bias, and gravitational acceleration, estimated from our visual SLAM, h() is a nonlinear function of the system state (the true state), and v is a white Gaussian noise with covariance V. The white Gaussian noise v can be represented as v ~ N(0, V), i.e., it is drawn from a zero-mean normal distribution with covariance V. In order to linearize the h() function, the Jacobian matrix H can be used. In the present example, the Jacobian matrix H is a 19x19 identity matrix. The ESKF correction or update equations can, therefore, be represented as K = P H^T (H P H^T + V)^{-1}, δx ← K (y - h(x_t)), and P ← (I - KH) P.
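A sketch of this correction step under the same assumptions, with H taken as the 19x19 identity matrix as in the present example; y is the SLAM measurement vector and h_xhat is the measurement predicted from the current state estimate.

```python
import numpy as np

def eskf_update(P, y, h_xhat, V):
    """Kalman correction: returns the measured error state and updated covariance."""
    n = P.shape[0]
    H = np.eye(n)                                    # 19x19 identity Jacobian
    S = H @ P @ H.T + V                              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                   # Kalman gain
    dx = K @ (y - h_xhat)                            # measured error state
    P_new = (np.eye(n) - K @ H) @ P                  # P <- (I - K H) P
    return dx, P_new
```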
In some examples, during the update process, the ESKF sets the measurement noise covariance V, described above, to update the state variables and state covariance. Generally, this measurement noise covariance is initialized and fixed during the ESKF update. However, the measurement noise covariance changes due to the integration of visual SLAM to determine the measurement vector. Performing visual SLAM 208 results in various quality levels of SLAM performance that can be characterized by various SLAM quality metrics (e.g., good, weak, lost, etc. ) . In the example of FIG. 2, the image rate pose 210 (e.g., a slam pose) is used by ESKF process 201 as the measurement vector. Due to the variations in SLAM quality, the ESKF process 201 tunes the covariance dynamically according to the performance of visual SLAM. In some embodiments, a lower value of the measurement noise covariance (or the narrower a normal distribution is) increases a confidence metric of the filter relating to the measurement. For example, in a measurement that is characterized with visual SLAM performance that is good, the measurement noise covariance should be a lesser value. In another example of a measurement that is characterized with visual SLAM performance that is poor or lost, the measurement noise covariance should be a greater value.
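The dynamic tuning described above amounts to scaling the measurement noise covariance with the reported SLAM quality. The scale factors in this sketch are placeholder values chosen only to show the direction of the adjustment (smaller V for good tracking, larger V when tracking is weak or lost); they are not values taken from this disclosure.

```python
QUALITY_SCALE = {"good": 0.01, "weak": 1.0, "lost": 100.0}   # assumed factors

def tune_measurement_noise(base_V, slam_quality):
    """Smaller covariance -> higher confidence in the SLAM measurement."""
    return QUALITY_SCALE.get(slam_quality, 1.0) * base_V
```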
The ESKF process 201 injects the measured error into the nominal state at block 214. For instance, at block 214, the ESKF process 201 performs a direct sum of the full state x with the measured error δx. A mathematical representation of the injection may be represented as x ← x ⊕ δx.
The measured error δx may be a vector that includes one or more error terms or matrices. In some embodiments, the measured error δx is a vector that includes measured error for the 3-axis position δp, the 3-axis velocity δv, and the 3-axis orientation (e.g., a quaternion) δθ; however, the vector may include additional or alternative measured error components.
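For illustration, and under the assumed state layout shown in the comments (position, velocity, quaternion, accelerometer bias, gyroscope bias, gravity), the injection can be sketched as a direct sum in which the additive error terms are summed component-wise and the orientation error is composed through a quaternion product:

    import numpy as np

    def quat_mul(q, r):
        """Hamilton product of two quaternions given as [w, x, y, z]."""
        w0, x0, y0, z0 = q
        w1, x1, y1, z1 = r
        return np.array([
            w0*w1 - x0*x1 - y0*y1 - z0*z1,
            w0*x1 + x0*w1 + y0*z1 - z0*y1,
            w0*y1 - x0*z1 + y0*w1 + z0*x1,
            w0*z1 + x0*y1 - y0*x1 + z0*w1,
        ])

    def inject_error(x, dx):
        """x <- x (+) dx, assuming x = [p(3), v(3), q(4), ab(3), wb(3), g(3)]
        and dx = [dp(3), dv(3), dtheta(3), dab(3), dwb(3), dg(3)]."""
        x_new = x.copy()
        x_new[0:3] = x[0:3] + dx[0:3]                 # position
        x_new[3:6] = x[3:6] + dx[3:6]                 # velocity
        dq = np.concatenate(([1.0], 0.5 * dx[6:9]))   # small-angle error quaternion
        q_new = quat_mul(x[6:10], dq)
        x_new[6:10] = q_new / np.linalg.norm(q_new)   # orientation
        x_new[10:] = x[10:] + dx[9:]                  # biases and gravity
        return x_new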
The ESKF process 201 resets the ESKF at block 216. For instance, the error state δx is reset to 0. The ESKF process 201 tunes the covariance, in some cases using the formula P ← G P G^T, where G is the Jacobian of the reset operation with respect to the error state.
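As an illustrative sketch, and under the common simplifying assumption that the reset Jacobian G is close to an identity matrix, the reset step may be written as:

    import numpy as np

    def eskf_reset(dx, P, G=None):
        """Reset the error state to zero and propagate the covariance through G."""
        dx = np.zeros_like(dx)
        if G is None:
            G = np.eye(P.shape[0])    # G ~ I is a common approximation
        P = G @ P @ G.T
        return dx, P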
FIG. 3 illustrates another example of a flow for updating state variables in a simultaneous localization and mapping (SLAM) system according to an embodiment of the present invention. FIG. 3 illustrates the operations in a manner similar to FIG. 2 and additionally is annotated with formulas for the operations that may be performed at one or more steps as described with regard to FIG. 2.
For instance, various equations are illustrated in FIG. 3 at various positions of the ESKF process. As described with regard to FIG. 2, the ESKF process may perform various computations for each step. In one example, the equations above the arrow between the IMU raw data and the error state prediction step of the ESKF process represent the mathematical prediction of the state variable and the prediction of the covariance. Similarly, the ESKF process illustrates that the measurement update may be computed from the SLAM pose, as described with regard to block 212. In another example, the full state x is computed by injecting the observed (i.e., measured) error state into the state variable. This can be represented by the equation x ← x ⊕ δx
and the description of block 214. Various elements from FIG. 2 have been maintained with the same reference numerals.
Referring to FIG. 3, the ESKF generates a predicted state δx 301 (i.e., the error state predicted by the ESKF) of the full state variable x. In the embodiment of FIG. 3, the predicted state 301 is shown by the computation δx ← F_x (x, u_m) · δx. In addition to the predicted state, the ESKF may also generate a predicted state covariance 302. In the example illustrated by FIG. 3, the predicted state covariance 302 can be calculated by P ← F_x P F_x^T + F_i Q_i F_i^T.
The ESKF performs a measurement update by generating the measured state using a non-linear function of the true state h () and adding a white Gaussian noise v. The ESKF measurement update equations 303 can, therefore, be represented as K = P H^T (H P H^T + V)^(-1), δx = K (y - h(x_t)) , and P ← (I-KH) P. While the ESKF is performing the measurement update, the ESKF sets the measurement noise covariance V. The ESKF sets the measurement noise based at least in part on the quality level of the visual SLAM. The optimized SLAM pose and the pure visual odometry SLAM pose may both be impacted by the performance of visual SLAM. Accordingly, dynamically tuning the measurement noise renders a more accurate SLAM pose. The ESKF incorporates the measured error into the nominal state variable by performing a direct sum 304 of the full state x with the measured error δx.
The ESKF resets the cycle by setting the predicted state to 0 and tuning the covariance for the next iteration of the ESKF by using reset equations 305.
FIG. 4 is a simplified flowchart illustrating a method of updating state variables in a simultaneous localization and mapping (SLAM) system according to an embodiment of the present invention. Some or all of the operations of the flows can be implemented via specific hardware on the computer system and/or can be implemented as computer-readable instructions stored on a non-transitory computer-readable medium of the computer system. As stored, the computer-readable instructions represent programmable modules that include code executable by a processor of the computer system. The execution of such instructions configures the computer system to perform the respective operations. Each programmable module in combination with the processor represents a means for performing a respective operation (s) . While the operations are illustrated in a particular order, it should be understood that no particular order is necessary and that one or more operations may be omitted, skipped, and/or reordered.
In an example, the flow starts at operation 402, where the computer system receives a computed position from an inertial measurement unit. An IMU is a navigation device that continuously calculates the position, orientation, and velocity of a moving object using dead reckoning, without the need for external references or external sensors.
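Purely as an illustration of dead reckoning (a textbook first-order integration, not necessarily the scheme used by any particular IMU), acceleration that has been rotated into the world frame and gravity-compensated can be integrated into velocity and position as follows:

    import numpy as np

    def dead_reckon_step(p, v, a_world, dt):
        """First-order integration of a gravity-compensated, world-frame
        acceleration sample into velocity and position."""
        p_new = p + v * dt + 0.5 * a_world * dt ** 2
        v_new = v + a_world * dt
        return p_new, v_new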
In an example, the flow includes operation 404, where the computer system receives a plurality of images from a camera. For instance, the computing system uses a camera system, such as the RGB optical sensor 114 or the depth sensor 112, to generate image data representing a real-world scene.
In an example, the flow includes operation 406, where the computer system predicts an error state comprising a state variable and a state covariance from the computed position, as described with regard to FIGS. 2 and 3. The computer system can predict the error state covariance (the predicted state covariance) as represented by P ← F_x P F_x^T + F_i Q_i F_i^T. The predicted covariance P is computed by combining the state transition model F_x, the state covariance P, the transpose of the transition model F_x^T, the initial transition model F_i, an estimated covariance Q_i, and the transpose of the initial transition model F_i^T. The computer system may also compute the predicted state δx as represented by δx ← F_x (x, u_m) · δx. The state transition model F_x of the state variable x and the state of prediction u_m are used to predict the error state (the predicted state variable) .
In an example, the flow includes operation 408, where the computer system processes a first subset of images of the plurality of images to provide a first SLAM pose at a first frequency. This first SLAM pose can be referred to as an optimized SLAM pose since it is generated, in some embodiments, by bundling images and performing an optimization process to generate an optimized SLAM pose. Generally, keyframe-based visual SLAM methods cannot always output poses at the image rate because of the slow optimization process in the back-end. For example, for an image stream with a frequency of 30 Hz, the optimization process results in a 10 Hz SLAM pose. The optimization process may be any optimization process that generates an optimized SLAM pose (e.g., Bayesian, graphical, non-linear, etc. ) .
In an example, the flow includes operation 410, where the computer system processes a second subset of images of the plurality of images with pure visual odometry to provide a second SLAM pose at a second frequency. This second SLAM pose can be referred to as a pure visual odometry SLAM pose since it is generated, in some embodiments, by performing visual odometry without an optimization process such as the one used for the first SLAM pose. Continuing the example above, for a computer system using an image frequency of 30 Hz, the computer system processes the second subset of images at 20 Hz. For instance, the computer system processes the second subset of images using only front-end processing techniques such as Perspective-n-Point, as illustrated in the sketch below.
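As a hedged sketch of such a front-end step, OpenCV's Perspective-n-Point solver can be used to recover a pose from matched 3D map points and 2D image observations; the feature matching and map-point bookkeeping are assumed to exist elsewhere and are not shown:

    import cv2
    import numpy as np

    def pnp_pose(points_3d, points_2d, K, dist_coeffs=None):
        """Estimate a camera pose from matched 3D map points and 2D detections.

        points_3d : (N, 3) array of map points
        points_2d : (N, 2) array of matched image observations
        K         : (3, 3) camera intrinsic matrix
        """
        if dist_coeffs is None:
            dist_coeffs = np.zeros(5)
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            points_3d.astype(np.float64),
            points_2d.astype(np.float64),
            K, dist_coeffs)
        if not ok:
            return None                    # pose could not be recovered (e.g., tracking lost)
        R, _ = cv2.Rodrigues(rvec)         # rotation vector -> rotation matrix
        return R, tvec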
In an example, the flow includes operation 412, where the computer system updates the state variable and the state covariance using the first SLAM pose or the second SLAM pose. The ESKF process 201 performs an update of the state variable and the state covariance using the first SLAM pose at the first frequency (i.e., the optimized SLAM pose) and/or the second SLAM pose at the second frequency (i.e., the pure visual odometry SLAM pose) . The computer system performs a measurement update using the first SLAM pose or the second SLAM pose to achieve an image pose rate (e.g., the sum of the first frequency and the second frequency) . Thus, in the example of a first frequency of 10 Hz and a second frequency of 20 Hz, the computer system performs a measurement update at an image pose rate of 30 Hz.
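As a purely illustrative sketch of the interleaving (the scheduling convention and the measurement_update method on the filter object are assumptions of this example), each arriving pose, whether optimized or pure visual odometry, triggers one measurement update with an appropriately scaled noise covariance:

    def on_slam_pose(eskf, pose, source, V_base):
        """Feed either pose stream into the same ESKF measurement update.

        source is "optimized" (e.g., the 10 Hz back-end pose) or "vo" (e.g., the
        20 Hz pure visual odometry pose); together they update at the 30 Hz
        image pose rate.
        """
        # Assumed convention: the optimized pose is trusted more than the VO pose.
        scale = 1.0 if source == "optimized" else 4.0
        eskf.measurement_update(pose, V_base * scale)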
It should be appreciated that the specific steps illustrated in FIG. 4 provide a particular method of updating state variables in a SLAM system according to an embodiment of the present invention. Other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present invention may perform the steps outlined above in a different order. Moreover, the individual steps illustrated in FIG. 4 may include multiple sub-steps that may be performed in various sequences as appropriate to the  individual step. Furthermore, additional steps may be added or removed depending on the particular applications. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
FIG. 5A illustrates an example of a plot 501 of iris pose, predicted pose, and ESKF pose according to an embodiment of the present invention. FIG. 5B is a plot of the derivative of the iris pose and a Kalman-filtered (i.e., ESKF, EKF, etc. ) pose according to an embodiment of the present invention. Referring to FIG. 5A, plot 501 illustrates poses 503 including an iris pose 503A, a predicted iris pose 503B, and an ESKF pose 503C. The poses 503 provide a visual representation of the difference among the actual position (iris pose 503A) , the IMU pose (predicted iris pose 503B) , and the smoothed ESKF pose (ESKF pose 503C) with reference to a y-axis 502 and an x-axis 504. The poses 503 represent the positions (or trajectories) of the iris performing the process described with regard to embodiments of the present disclosure. During movements, the position (or trajectory) will vary based on the movements. In the example of FIG. 5A, the position/trajectory is shown as a linearization in two dimensions, x (horizontal axis) 504 and y (vertical axis) 502. In one embodiment that represents trajectory, the x-axis 504 and y-axis 502 may represent the trajectory intensity in each axis. For example, a value of (-3, -3) may represent a trajectory that is changing position at -3x per meter and -3y per meter.
Referring to FIG. 5B, which is a plot 507 of the y-derivative of the predicted iris pose 503B and the ESKF pose 503C with respect to a y-axis 506 and an x-axis 505, one can see that as the iris changes position and/or trajectory, the derivatives reflect the changes in the y-axis. In an embodiment where the predicted iris pose 503B and the ESKF pose 503C represent the trajectory of the iris, the y-derivative plot depicts the change in the linearization of the predicted iris pose 503B and the ESKF pose 503C in the y-axis trajectory.
FIG. 6A illustrates an example of a plot 601 of iris pose, predicted pose, and ESKF pose, according to an embodiment of the present invention. For example, FIG. 6A illustrates a plot 601 that illustrates poses 603 including an iris pose 603A, a predicted iris pose 603B, and an ESKF pose 603C. The poses 603 illustrate a visual representation of the difference in the actual position (iris pose 603A) , IMU pose (predicted iris pose 603B) , and the smoothed ESKF pose (ESKF pose 603C) with respect to an x-axis 604 and a y-axis 602.
Referring to FIG. 6B, which is a plot 607 that illustrates an x-derivative of the predicted iris pose 603B and the ESKF pose 603C with respect to a y-axis 605 and an x-axis 606, one can see that as the iris changes position and/or trajectory, the derivatives reflect the changes in the x-axis. In an embodiment where the predicted iris pose 603B and the ESKF pose 603C represent the trajectory of the iris, the x-derivative plot depicts the change in the linearization of the predicted iris pose 603B and the ESKF pose 603C in the x-axis trajectory.
FIG. 7 illustrates examples of components of a computer system 700 according to certain embodiments. The computer system 700 is an example of the computer system described herein above. Although these components are illustrated as belonging to a same computer system 700, the computer system 700 can also be distributed.
The computer system 700 includes at least a processor 702, a memory 704, a storage device 706, input/output peripherals (I/O) 708, communication peripherals 710, and an interface bus 712. The interface bus 712 is configured to communicate, transmit, and transfer data,  controls, and commands among the various components of the computer system 700. The memory 704 and the storage device 706 include computer-readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM) , hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage, for example 
flash
memory, and other tangible storage media. Any of such computer readable storage media can be configured to store instructions or program codes embodying aspects of the disclosure. The memory 704 and the storage device 706 also include computer readable signal media. A computer readable signal medium includes a propagated data signal with computer readable program code embodied therein. Such a propagated signal takes any of a variety of forms including, but not limited to, electromagnetic, optical, or any combination thereof. A computer readable signal medium includes any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computer system 700.
Further, the memory 704 includes an operating system, programs, and applications. The processor 702 is configured to execute the stored instructions and includes, for example, a logical processing unit, a microprocessor, a digital signal processor, and other processors. The memory 704 and/or the processor 702 can be virtualized and can be hosted within another computer system of, for example, a cloud network or a data center. The I/O peripherals 708 include user interfaces, such as a keyboard, screen (e.g., a touch screen) , microphone, speaker, other input/output devices, and computing components, such as graphical processing units, serial ports, parallel ports, universal serial buses, and other input/output peripherals. The I/O peripherals 708 are connected to the processor 702 through any of the ports coupled to the interface bus 712. The communication peripherals 710 are configured to facilitate communication between the computer system 700 and other computing devices over a communications network and include, for example, a network interface controller, modem, wireless and wired interface cards, antenna, and other communication peripherals.
While the present subject matter has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, it should be understood that the present invention has been presented for purposes of example rather than limitation, and does not preclude inclusion of such modifications, variations, and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. Indeed, the methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the present invention. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the present invention.
Unless specifically stated otherwise, it is appreciated that throughout this specification discussions utilizing terms such as “processing, ” “computing, ” “calculating, ” “determining, ” and “identifying” or the like refer to actions or processes of a computing device, such as one or more computers or a similar electronic computing device or devices, that manipulate or transform data represented as physical, electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the computing platform.
The system or systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs. Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computer system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
Embodiments of the methods disclosed herein may be performed in the operation of such computing devices. The order of the blocks presented in the examples above can be varied-for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
Conditional language used herein, such as, among others, “can, ” “could, ” “might, ” “may, ” “e.g., ” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain examples include, while other examples do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without author input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular example.
The terms “including, ” “having, ” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list. The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Similarly, the use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of the present invention. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from  the disclosed examples. Similarly, the example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed examples.

Claims (20)

  1. A method of updating state variables in a simultaneous localization and mapping (SLAM) system, the method comprising:
    receiving a computed position from an inertial measurement unit, wherein the computed position comprises a position, an orientation, and a velocity;
    receiving a plurality of images from a camera;
    predicting, using the computed position, an error state comprising a state variable and a state covariance, wherein the state variable comprises a vector comprising one or more of a position, a velocity, an accelerometer metric, a gyroscope metric, or a gravitational measurement;
    processing, using an optimization process, a first subset of images of the plurality of images to provide a first SLAM pose at a first frequency;
    processing, using pure visual odometry, a second subset of images of the plurality of images to provide a second SLAM pose at a second frequency; and
    updating the state variable and the state covariance using at least one of the first SLAM pose or the second SLAM pose.
  2. The method of claim 1 wherein providing the first SLAM pose comprises a first set of state parameters that are optimized by an optimization process and providing the second SLAM pose comprises a second set of state parameters that bypass the optimization process.
  3. The method of claim 1 wherein the plurality of images are captured at an image frequency equal to a sum of the first frequency and the second frequency.
  4. The method of claim 1 wherein updating the state variable and the state covariance using at least one of the first SLAM pose or the second SLAM pose comprises updating the state variable and the state covariance at an image rate pose comprising:
    using the first SLAM pose to update the state variable and the state covariance at a rate of 10 updates per second; and
    using the second SLAM pose to update the state variable and the state covariance at a rate of 20 updates per second.
  5. The method of claim 4 wherein the image rate pose further comprises an interleaved sequence of the first SLAM pose and the second SLAM pose.
  6. The method of claim 1 wherein the state covariance is computed based on at least a first quality metric of the first SLAM pose or a second quality metric of the second SLAM pose.
  7. The method of claim 1 wherein updating the state variable and the state covariance using at least the first SLAM pose or the second SLAM pose comprises updating the state variable and state covariance using the computed position when a SLAM quality metric is lost.
  8. A computer system comprising:
    a processor;
    an inertial measurement unit; and
    one or more memories storing computer-readable instructions that, upon execution by the processor, configure the computer system to:
    receive a computed position from an inertial measurement unit, wherein the computed position comprises a position, an orientation, and a velocity;
    receive a plurality of images from a camera;
    predict, using the computed position, an error state comprising a state variable and a state covariance, wherein the state variable comprises a vector comprising one or more of a position, a velocity, an accelerometer metric, a gyroscope metric, or a gravitational measurement;
    process, using an optimization process, a first subset of images of the plurality of images to provide a first SLAM pose at a first frequency;
    process, using pure visual odometry, a second subset of images of the plurality of images to provide a second SLAM pose at a second frequency; and
    update the state variable and the state covariance using at least one of the first SLAM pose or the second SLAM pose.
  9. The computer system of claim 8 wherein the plurality of images are captured at an image frequency equal to a sum of the first frequency and the second frequency.
  10. The computer system of claim 8 wherein updating the state variable and the state covariance using the first SLAM pose or second SLAM pose comprises updating the state variable and the state covariance at an image rate pose comprising:
    using the first SLAM pose to update the state variable and the state covariance at a rate of 10 updates per second; and
    using the second SLAM pose to update the state variable and the state covariance at a rate of 20 updates per second.
  11. The computer system of claim 10 wherein the image rate pose further comprises an interleaved sequence of the first SLAM pose and the second SLAM pose.
  12. The computer system of claim 8 wherein the state covariance is computed based on a first quality metric of the first SLAM pose or a second quality metric of the second SLAM pose.
  13. The computer system of claim 8 wherein updating the state variable and the state covariance using the first SLAM pose or second SLAM pose comprises updating the state variable and state covariance using the computed position when a SLAM quality metric is lost.
  14. One or more non-transitory computer-storage media storing instructions that, upon execution on a computer system, cause the computer system to perform operations including:
    receiving a computed position from an inertial measurement unit, wherein the computed position comprises a position, an orientation, and a velocity;
    receiving a plurality of images from a camera;
    predicting, using the computed position, an error state comprising a state variable and a state covariance, wherein the state variable comprises a vector comprising one or more of a position, a velocity, an accelerometer metric, a gyroscope metric, or a gravitational measurement;
    processing, using an optimization process, a first subset of images of the plurality of images to provide a first SLAM pose at a first frequency;
    processing, using pure visual odometry, a second subset of images of the plurality of images to provide a second SLAM pose at a second frequency; and
    updating the state variable and the state covariance using at least one of the first SLAM pose or the second SLAM pose.
  15. The one or more non-transitory computer-storage media of claim 14 wherein the plurality of images are captured at an image frequency equal to a sum of the first frequency and the second frequency.
  16. The one or more non-transitory computer-storage media of claim 14 wherein updating the state variable and the state covariance using at least one of the first SLAM pose or the second SLAM pose comprises updating the state variable and the state covariance at an image rate pose comprising:
    using the first SLAM pose to update the state variable and the state covariance at a rate of 10 updates per second; and
    using the second SLAM pose to update the state variable and the state covariance at a rate of 20 updates per second.
  17. The one or more non-transitory computer-storage media of claim 16 wherein the image rate pose further comprises an interleaved sequence of the first SLAM pose and the second SLAM pose.
  18. The one or more non-transitory computer-storage media of claim 14 wherein the state covariance is computed based on a first quality metric of the first SLAM pose or a second quality metric of the second SLAM pose.
  19. The one or more non-transitory computer-storage media of claim 14 wherein updating the state variable and the state covariance using the at least one of the first SLAM pose or the second SLAM pose comprises updating the state variable and state covariance using the computed position when a SLAM quality metric is lost.
  20. The one or more non-transitory computer-storage media of claim 14 wherein providing the first SLAM pose comprises a first set of state parameters that are optimized by an optimization process and providing the second SLAM pose comprises a second set of state parameters that bypass the optimization process.
PCT/CN2021/076067 2020-02-13 2021-02-08 Error state kalman filter for visual slam by dynamically tuning measurement noise covariance WO2021160098A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202180011363.3A CN115003983B (en) 2020-02-13 2021-02-08 Method for updating state variable in SLAM system, computer system and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202062975991P 2020-02-13 2020-02-13
US62/975,991 2020-02-13

Publications (1)

Publication Number Publication Date
WO2021160098A1 true WO2021160098A1 (en) 2021-08-19

Family

ID=77291392

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076067 WO2021160098A1 (en) 2020-02-13 2021-02-08 Error state kalman filter for visual slam by dynamically tuning measurement noise covariance

Country Status (2)

Country Link
CN (1) CN115003983B (en)
WO (1) WO2021160098A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116908810A (en) * 2023-09-12 2023-10-20 天津大学四川创新研究院 Method and system for measuring earthwork of building by carrying laser radar on unmanned aerial vehicle
US11875615B2 (en) 2022-04-29 2024-01-16 Toyota Research Institute, Inc. Odometry noise model fitting from fleet-scale datasets

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160305784A1 (en) * 2015-04-17 2016-10-20 Regents Of The University Of Minnesota Iterative kalman smoother for robust 3d localization for vision-aided inertial navigation
US20160327395A1 (en) * 2014-07-11 2016-11-10 Regents Of The University Of Minnesota Inverse sliding-window filters for vision-aided inertial navigation systems
CN106679648A (en) * 2016-12-08 2017-05-17 东南大学 Vision-inertia integrated SLAM (Simultaneous Localization and Mapping) method based on genetic algorithm
CN108629843A (en) * 2017-03-24 2018-10-09 成都理想境界科技有限公司 A kind of method and apparatus for realizing augmented reality
CN108731670A (en) * 2018-05-18 2018-11-02 南京航空航天大学 Inertia/visual odometry combined navigation locating method based on measurement model optimization
CN110517324A (en) * 2019-08-26 2019-11-29 上海交通大学 Binocular VIO implementation method based on variation Bayesian adaptation
CN110610513A (en) * 2019-09-18 2019-12-24 郑州轻工业学院 Invariance center differential filter method for vision SLAM of autonomous mobile robot

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11466990B2 (en) * 2016-07-22 2022-10-11 Regents Of The University Of Minnesota Square-root multi-state constraint Kalman filter for vision-aided inertial navigation system
CN108242079B (en) * 2017-12-30 2021-06-25 北京工业大学 VSLAM method based on multi-feature visual odometer and graph optimization model
CN109035303B (en) * 2018-08-03 2021-06-08 百度在线网络技术(北京)有限公司 SLAM system camera tracking method and device, and computer readable storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160327395A1 (en) * 2014-07-11 2016-11-10 Regents Of The University Of Minnesota Inverse sliding-window filters for vision-aided inertial navigation systems
US20160305784A1 (en) * 2015-04-17 2016-10-20 Regents Of The University Of Minnesota Iterative kalman smoother for robust 3d localization for vision-aided inertial navigation
CN106679648A (en) * 2016-12-08 2017-05-17 东南大学 Vision-inertia integrated SLAM (Simultaneous Localization and Mapping) method based on genetic algorithm
CN108629843A (en) * 2017-03-24 2018-10-09 成都理想境界科技有限公司 A kind of method and apparatus for realizing augmented reality
CN108731670A (en) * 2018-05-18 2018-11-02 南京航空航天大学 Inertia/visual odometry combined navigation locating method based on measurement model optimization
CN110517324A (en) * 2019-08-26 2019-11-29 上海交通大学 Binocular VIO implementation method based on variation Bayesian adaptation
CN110610513A (en) * 2019-09-18 2019-12-24 郑州轻工业学院 Invariance center differential filter method for vision SLAM of autonomous mobile robot

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FU DONG, XIA HAO, QIAO YANYOU: "Monocular Visual-Inertial Navigation for Dynamic Environment", REMOTE SENSING, vol. 25, no. 5, 21 April 2021 (2021-04-21), pages 1 - 19, XP055835645, DOI: 10.3390/rs13091610 *
ZHANG YULONG; ZHANG GUOSHAN: "Loop-Closing Detection Algorithm of Keyframe-Based Visual-Inertial SLAM", JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY, vol. 12, no. 11, 1 November 2018 (2018-11-01), pages 1777 - 1787, XP009529748, ISSN: 1673-9418, DOI: 10.3778/j.issn.1673-9418.1708042 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11875615B2 (en) 2022-04-29 2024-01-16 Toyota Research Institute, Inc. Odometry noise model fitting from fleet-scale datasets
CN116908810A (en) * 2023-09-12 2023-10-20 天津大学四川创新研究院 Method and system for measuring earthwork of building by carrying laser radar on unmanned aerial vehicle
CN116908810B (en) * 2023-09-12 2023-12-12 天津大学四川创新研究院 Method and system for measuring earthwork of building by carrying laser radar on unmanned aerial vehicle

Also Published As

Publication number Publication date
CN115003983B (en) 2024-10-01
CN115003983A (en) 2022-09-02

Similar Documents

Publication Publication Date Title
US11797083B2 (en) Head-mounted display system and 6-degree-of-freedom tracking method and apparatus thereof
KR102442780B1 (en) Method for estimating pose of device and thereof
CN108700947B (en) System and method for concurrent ranging and mapping
JP6198230B2 (en) Head posture tracking using depth camera
CN110084832B (en) Method, device, system, equipment and storage medium for correcting camera pose
WO2021160098A1 (en) Error state kalman filter for visual slam by dynamically tuning measurement noise covariance
WO2020253260A1 (en) Time synchronization processing method, electronic apparatus, and storage medium
US12073630B2 (en) Moving object tracking method and apparatus
US10452195B2 (en) Electronic system with gesture calibration mechanism and method of operation thereof
EP3366031B1 (en) Balancing exposure and gain at an electronic device based on device motion and scene distance
JP7274510B2 (en) VIRTUAL OBJECT DISPLAY DEVICE AND VIRTUAL OBJECT DISPLAY METHOD
KR20160020369A (en) Scene analysis for improved eye tracking
CN111578937A (en) Visual inertial odometer system capable of optimizing external parameters simultaneously
WO2014200625A1 (en) Systems and methods for feature-based tracking
US10740986B2 (en) Systems and methods for reconstructing a moving three-dimensional object
JP2023021994A (en) Data processing method and device for automatic driving vehicle, electronic apparatus, storage medium, computer program, and automatic driving vehicle
WO2021068799A1 (en) Occlusion and collision detection for augmented reality applications
CN110310304A (en) Monocular vision builds figure and localization method, device, storage medium and mobile device
CN108804161B (en) Application initialization method, device, terminal and storage medium
US20230005172A1 (en) Method and System for Implementing Adaptive Feature Detection for VSLAM Systems
KR102468355B1 (en) Apparatus and method for monitoring moving target based on complementary gimbal control through displacement partitioning
WO2021160070A1 (en) Imu static noise calibration scale adjustment for vislam applications
WO2021155828A1 (en) Method and system for implementing dynamic input resolution for vslam systems
WO2021160095A1 (en) Surface detection and tracking in augmented reality session based on sparse representation
US10636205B2 (en) Systems and methods for outlier edge rejection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21754054

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21754054

Country of ref document: EP

Kind code of ref document: A1