WO2021178980A1 - Data synchronization and pose prediction in extended reality - Google Patents

Data synchronization and pose prediction in extended reality

Info

Publication number
WO2021178980A1
Authority
WO
WIPO (PCT)
Prior art keywords
pose data
time
current
data
prior
Prior art date
Application number
PCT/US2021/029607
Other languages
English (en)
Inventor
Peng HE
Original Assignee
Innopeak Technology, Inc
Priority date
Filing date
Publication date
Application filed by Innopeak Technology, Inc filed Critical Innopeak Technology, Inc
Priority to PCT/US2021/029607
Publication of WO2021178980A1


Classifications

    • G06T 7/246: Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06F 18/251: Pattern recognition; fusion techniques of input or preprocessed data
    • G06F 3/011: Input arrangements for interaction between user and computer; arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/012: Head tracking input arrangements
    • G06V 20/10: Scenes; scene-specific elements; terrestrial scenes
    • G06V 20/20: Scenes; scene-specific elements in augmented reality scenes
    • G06T 2207/20084: Indexing scheme for image analysis or image enhancement; artificial neural networks [ANN]
    • G06T 2207/30241: Subject of image; trajectory
    • G06T 2207/30244: Subject of image; camera pose

Definitions

  • This application relates generally to data processing technology including, but not limited to, methods, systems, and non-transitory computer-readable media for synchronizing data captured by different sensors of an electronic system and predicting poses using the synchronized data.
  • Simultaneous localization and mapping (SLAM) is widely applied in virtual reality (VR), augmented reality (AR), autonomous driving, and navigation. In SLAM, high-frequency pose estimation is enabled by sensor fusion.
  • Asynchronous time warping is often applied in an AR system to warp an image before it is sent to a display to correct for head movement that occurs after the image is rendered.
  • relevant image data and inertial sensor data are stored locally such that they can be synchronized and used for pose estimation/prediction. As more image and sensor data are stored in the memory, the corresponding memory size increases and memory management becomes difficult and expensive. It would be beneficial to have a more efficient mechanism for data synchronization and pose estimation than the current practice.
  • Various embodiments of this application are directed to synchronizing image data and inertial sensor data and estimating poses in extended reality (e.g., including VR, AR, and mixed reality).
  • Such data synchronization is based on a determination that an adjustment value between image-based pose data and sensor-based pose data remains substantially the same during an image latency time lasting from a prior time when an image is captured to a current time when the image is made available.
  • Recent inertial sensor data are used to estimate a pose that is adjusted by the adjustment value, thereby providing an accuracy level comparable to what the image captured concurrently with the recent inertial sensor data can offer, without waiting for the image latency time of the image.
  • the pose data are stored in a ring buffer having a fixed or tunable size.
  • a method is implemented at an electronic system for estimating poses. The method includes obtaining an image that is captured at a prior time t_i by a camera of an electronic system and made available by the camera prior to the current time t_j.
  • the method further includes extracting prior pose data (p_iold) corresponding to the prior time t_i that are stored in memory, determining image-based pose data (p_inew) corresponding to the prior time t_i based on the image, and determining a prior adjustment value (dr) corresponding to the prior time t_i based on the image-based pose data (p_inew) and the prior pose data (p_iold).
  • the method further includes obtaining a current inertial sensor sample that is captured substantially at the current time t_j by an inertial measurement unit (IMU) of the electronic system and determining current pose data (p_jnew) corresponding to the current time t_j based on the current inertial sensor sample and the prior adjustment value (dr).
  • determination of the current pose data (p_jnew) further includes determining current inertial-based pose data (p_jold) based on at least the prior pose data (p_iold) and the current inertial sensor sample, and adjusting the current inertial-based pose data (p_jold) by the prior adjustment value (dr) to determine the current pose data (p_jnew).
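  • The following is a minimal C++ sketch of the adjustment step described above, shown for the translational part of the pose only; the type and function names (Pose, computeAdjustment, applyAdjustment) are illustrative assumptions and are not taken from the patent.

        // Illustrative pose holder; the patent does not prescribe a representation.
        struct Pose {
            double position[3];
            double orientation[4];  // quaternion (w, x, y, z), unused in this sketch
        };

        // dr = p_inew - p_iold, computed once the image captured at t_i is available.
        Pose computeAdjustment(const Pose& p_inew, const Pose& p_iold) {
            Pose dr{};
            for (int k = 0; k < 3; ++k)
                dr.position[k] = p_inew.position[k] - p_iold.position[k];
            return dr;
        }

        // p_jnew = p_jold adjusted by dr, applied at the current time t_j without
        // re-running a filter update over the intervening inertial samples.
        Pose applyAdjustment(const Pose& p_jold, const Pose& dr) {
            Pose p_jnew = p_jold;
            for (int k = 0; k < 3; ++k)
                p_jnew.position[k] += dr.position[k];
            return p_jnew;
        }

    A full implementation would also carry an orientation correction (e.g., a relative rotation composed onto p_jold), which is omitted here for brevity.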
  • some implementations include an electronic system that includes one or more processors and memory having instructions stored thereon, which when executed by the one or more processors cause the processors to perform any of the above methods.
  • some implementations include a non-transitory computer-readable medium, having instructions stored thereon, which when executed by one or more processors cause the processors to perform any of the above methods.
  • Figure 1 A is an example data processing environment having one or more servers communicatively coupled to one or more client devices, in accordance with some embodiments.
  • Figure 1B illustrates a pair of augmented reality (AR) glasses (also called a head-mounted display) that can be communicatively coupled to a data processing environment, in accordance with some embodiments.
  • Figure 2 is a block diagram illustrating a data processing system, in accordance with some embodiments.
  • FIG. 3 is a flowchart of a process for processing inertial sensor data and image data of an electronic system (e.g., a server, a client device, or a combination of both) using a SLAM module, in accordance with some embodiments.
  • Figure 4 is a temporal diagram illustrating two parallel temporal threads of inertial sensor data and image data, in accordance with some embodiments.
  • Figure 5 is a temporal diagram illustrating a process of estimating poses with reference to two parallel temporal threads, in accordance with some embodiments.
  • Figures 6A and 6B illustrate a process of extrapolating pose data associated with a subsequent time, in accordance with some embodiments.
  • Figure 7 illustrates a process of storing pose data in a pose data buffer, in accordance with some embodiments.
  • Figure 8 is a flowchart of a method for estimating poses, in accordance with some embodiments.
  • Figure 1A is an example data processing environment 100 having one or more servers 102 communicatively coupled to one or more client devices 104, in accordance with some embodiments.
  • the one or more client devices 104 may be, for example, desktop computers 104A, tablet computers 104B, mobile phones 104C, or intelligent, multi-sensing, network-connected home devices (e.g., a camera).
  • the one or more client devices 104 include a head-mounted display 150.
  • Each client device 104 can collect data or user inputs, execute user applications, and present outputs on its user interface.
  • the collected data or user inputs can be processed locally (e.g., for training and/or for prediction) at the client device 104 and/or remotely by the server(s) 102.
  • the one or more servers 102 provide system data (e.g., boot files, operating system images, and user applications) to the client devices 104, and in some embodiments, process the data and user inputs received from the client device(s) 104 when the user applications are executed on the client devices 104.
  • the data processing environment 100 further includes a storage 106 for storing data related to the servers 102, client devices 104, and applications executed on the client devices 104.
  • storage 106 may store video content (including visual and audio content), static visual content, and/or inertial sensor data for training a machine learning model (e.g., deep learning network).
  • storage 106 may also store video content, static visual content, and/or inertial sensor data obtained by a client device 104 to which a trained machine learning model can be applied to determine one or more poses associated with the video content, static visual content, and/or inertial sensor data.
  • the one or more servers 102 can enable real-time data communication with the client devices 104 that are remote from each other or from the one or more servers 102. Further, in some embodiments, the one or more servers 102 can implement data processing tasks that cannot be or are preferably not completed locally by the client devices 104.
  • the client devices 104 include a game console (e.g., the head-mounted display 150) that executes an interactive online gaming application.
  • the game console receives a user instruction and sends it to a game server 102 with user data.
  • the game server 102 generates a stream of video data based on the user instruction and user data and provides the stream of video data for display on the game console and other client devices that are engaged in the same game session with the game console.
  • the client devices 104 include a networked surveillance camera and a mobile phone 104C.
  • the networked surveillance camera collects video data and streams the video data to a surveillance camera server 102 in real time. While the video data is optionally pre-processed on the surveillance camera, the surveillance camera server 102 processes the video data to identify motion or audio events in the video data and shares information of these events with the mobile phone 104C, thereby allowing a user of the mobile phone 104C to monitor the events occurring near the networked surveillance camera in real time and remotely.
  • the one or more servers 102, one or more client devices 104, and storage 106 are communicatively coupled to each other via one or more communication networks 108, which are the medium used to provide communications links between these devices and computers connected together within the data processing environment 100.
  • the one or more communication networks 108 may include connections, such as wire, wireless communication links, or fiber optic cables. Examples of the one or more communication networks 108 include local area networks (LAN), wide area networks (WAN) such as the Internet, or a combination thereof.
  • the one or more communication networks 108 are, optionally, implemented using any known network protocol, including various wired or wireless protocols, such as Ethernet, Universal Serial Bus (USB), FIREWIRE, Long Term Evolution (LTE), Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wi-Fi, voice over Internet Protocol (VoIP), Wi-MAX, or any other suitable communication protocol.
  • a connection to the one or more communication networks 108 may be established either directly (e.g., using 3G/4G connectivity to a wireless carrier), or through a network interface 110 (e.g., a router, switch, gateway, hub, or an intelligent, dedicated whole-home control node), or through any combination thereof.
  • the one or more communication networks 108 can represent the Internet, a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another.
  • At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, governmental, educational and other electronic systems that route data and messages.
  • deep learning techniques are applied in the data processing environment 100 to process content data (e.g., video data, visual data, audio data) obtained by an application executed at a client device 104 to identify information contained in the content data, match the content data with other data, categorize the content data, or synthesize related content data.
  • the content data may broadly include inertial sensor data captured by inertial sensor(s) of a client device 104.
  • data processing models are created based on one or more neural networks to process the content data. These data processing models are trained with training data before they are applied to process the content data.
  • both model training and data processing are implemented locally at each individual client device 104 (e.g., the client device 104C and head-mounted display 150).
  • the client device 104C or head-mounted display 150 obtains the training data from the one or more servers 102 or storage 106 and applies the training data to train the data processing models. Subsequently to model training, the client device 104C or head-mounted display 150 obtains the content data (e.g., captures video data via an internal camera) and processes the content data using the trained data processing models locally.
  • both model training and data processing are implemented remotely at a server 102 (e.g., the server 102A) associated with a client device 104 (e.g., the client device 104A or head-mounted display 150).
  • the server 102A obtains the training data from itself, another server 102 or the storage 106 and applies the training data to train the data processing models.
  • the client device 104A or head-mounted display 150 obtains the content data, sends the content data to the server 102A (e.g., in an application) for data processing using the trained data processing models, receives data processing results (e.g., recognized or predicted device poses) from the server 102A, presents the results on a user interface (e.g., associated with the application), renders virtual objects in a field of view based on the poses, or implements some other functions based on the results.
  • the client device 104A or head-mounted display 150 itself implements no or little data processing on the content data prior to sending them to the server 102A. Additionally, in some embodiments, data processing is implemented locally at a client device 104 (e.g., the client device 104B and head-mounted display 150), while model training is implemented remotely at a server 102 (e.g., the server 102B) associated with the client device 104B or head-mounted display 150.
  • the server 102B obtains the training data from itself, another server 102 or the storage 106 and applies the training data to train the data processing models.
  • the trained data processing models are optionally stored in the server 102B or storage 106.
  • the client device 104B or head-mounted display 150 imports the trained data processing models from the server 102B or storage 106, processes the content data using the data processing models, and generates data processing results to be presented on a user interface or used to initiate some functions (e.g., rendering virtual objects based on device poses) locally.
  • FIG. 1B illustrates a pair of augmented reality (AR) glasses 150 (also called a head-mounted display) that can be communicatively coupled to a data processing environment 100, in accordance with some embodiments.
  • the AR glasses 150 include a camera, a microphone, a speaker, one or more inertial sensors (e.g., gyroscope, accelerometer), and a display.
  • the camera and microphone are configured to capture video and audio data from a scene of the AR glasses 150, while the one or more inertial sensors are configured to capture inertial sensor data.
  • the camera captures hand gestures of a user wearing the AR glasses 150.
  • the microphone records ambient sound, including the user's voice commands.
  • both video or static visual data captured by the camera and the inertial sensor data measured by the one or more inertial sensors are applied to determine and predict device poses.
  • the video, static image, audio, or inertial sensor data captured by the AR glasses 150 is processed by the AR glasses 150, server(s) 102, or both to recognize the device poses.
  • deep learning techniques are applied by the server(s) 102 and AR glasses 150 jointly to recognize and predict the device poses.
  • the device poses are used to control the AR glasses 150 itself or interact with an application (e.g., a gaming application) executed by the AR glasses 150.
  • the display of the AR glasses 150 displays a user interface, and the recognized or predicted device poses are used to render or interact with user selectable display items on the user interface.
  • deep learning techniques are applied in the data processing environment 100 to process video data, static image data, or inertial sensor data captured by the AR glasses 150.
  • Device poses are recognized and predicted based on such video, static image, and/or inertial sensor data using a data processing model. Training of the data processing model is optionally implemented by the server 102 or AR glasses 150. Inference of the device poses is implemented by each of the server 102 and AR glasses 150 independently or by both of the server 102 and AR glasses 150 jointly.
  • FIG. 2 is a block diagram illustrating a data processing system 200, in accordance with some embodiments.
  • the data processing system 200 includes a server 102, a client device 104, a storage 106, or a combination thereof.
  • the data processing system 200 typically includes one or more processing units (CPUs) 202, one or more network interfaces 204, memory 206, and one or more communication buses 208 for interconnecting these components (sometimes called a chipset).
  • the data processing system 200 includes one or more input devices 210 that facilitate user input, such as a keyboard, a mouse, a voice-command input unit or microphone, a touch screen display, a touch-sensitive input pad, a gesture capturing camera, or other input buttons or controls.
  • the client device 104 of the data processing system 200 uses a microphone and voice recognition or a camera 260 and gesture recognition to supplement or replace the keyboard.
  • the client device 104 includes one or more cameras 260, scanners, or photo sensor units for capturing images, for example, of graphic serial codes printed on the electronic devices.
  • the data processing system 200 also includes one or more output devices 212 that enable presentation of user interfaces and display content, including one or more speakers and/or one or more visual displays.
  • the client device 104 includes a location detection device, such as a GPS (global positioning satellite) or other geo location receiver, for determining the location of the client device 104.
  • the client device 104 includes an inertial measurement unit (IMU) 280 integrating multi-axes inertial sensors to provide estimation of a location and an orientation of the client device 104 in space.
  • the one or more inertial sensors include, but are not limited to, a gyroscope, an accelerometer, a magnetometer, and an inclinometer.
  • Memory 206 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices.
  • Memory 206 optionally includes one or more storage devices remotely located from the one or more processing units 202.
  • Memory 206, or alternatively the non-volatile memory within memory 206, includes a non-transitory computer readable storage medium.
  • memory 206, or the non-transitory computer readable storage medium of memory 206, stores the following programs, modules, and data structures, or a subset or superset thereof:
  • Operating system 214 including procedures for handling various basic system services and for performing hardware-dependent tasks;
  • Network communication module 216 for connecting each server 102 or client device 104 to other devices (e.g., server 102, client device 104, or storage 106) via one or more network interfaces 204 (wired or wireless) and one or more communication networks 108, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
  • User interface module 218 for enabling presentation of information (e.g., a graphical user interface for application(s) 224, widgets, websites and web pages thereof, and/or games, audio and/or video content, text, etc.) at each client device 104 via one or more output devices 212 (e.g., displays, speakers, etc.);
  • Input processing module 220 for detecting one or more user inputs or interactions from one of the one or more input devices 210 and interpreting the detected input or interaction;
  • Web browser module 222 for navigating, requesting (e.g., via HTTP), and displaying websites and web pages thereof, including a web interface for logging into a user account associated with a client device 104 or another electronic device, controlling the client or electronic device if associated with the user account, and editing and reviewing settings and data that are associated with the user account;
  • One or more user applications 224 for execution by the data processing system 200, e.g., games, social network applications, smart home applications, and/or other web or non-web based applications for controlling another electronic device and reviewing data captured by such devices;
  • Model training module 226 for receiving training data and establishing a data processing model for processing content data (e.g., video data, visual data, audio data, and inertial sensor data) to be collected or obtained by a client device 104;
  • Data processing module 228 for processing content data using data processing models 248, thereby identifying information contained in the content data, matching the content data with other data, categorizing the content data, or synthesizing related content data, where in some embodiments, the data processing module 228 is associated with one of the user applications 224 to process the content data in response to a user instruction received from the user application 224;
  • Pose determination and prediction module 230 for determining and predicting a pose of the client device 104 (e.g., AR glasses 150), where in some embodiments, the pose is determined and predicted jointly by the pose determination and prediction module 230 and the data processing module 228, and the module 230 further includes an SLAM module 232 for mapping a scene where a client device 104 is located and identifying a location of the client device 104 within the scene;
  • Pose-based rendering module 234 for rendering virtual objects on top of a field of view of a camera 260 of the client device 104 in real time;
  • Pose data buffer 236 for storing pose data optionally with inertial sensor data and image data for the purposes of determining recent pose data and predicting subsequent pose data;
  • One or more databases 238 for storing at least data including one or more of:
    o Device settings 240 including common device settings (e.g., service tier, device model, storage capacity, processing capabilities, communication capabilities, etc.) of the one or more servers 102 or client devices 104;
    o User account information 242 for the one or more user applications 224, e.g., user names, security questions, account history data, user preferences, and predefined account settings;
    o Network parameters 244 for the one or more communication networks 108, e.g., IP address, subnet mask, default gateway, DNS server and host name;
    o Training data 246 for training one or more data processing models 248;
    o Data processing model(s) 248 for processing content data (e.g., video data, visual data, audio data) using deep learning techniques; and
    o Content data and results 250 that are obtained by and outputted to the client device 104 of the data processing system 200, respectively, including a subset or all of the following data: historic inertial sensor data 252 that are measured by the IMU 280.
  • the one or more databases 238 are stored in one of the server 102, client device 104, and storage 106 of the data processing system 200.
  • the one or more databases 238 are distributed in more than one of the server 102, client device 104, and storage 106 of the data processing system 200.
  • more than one copy of the above data is stored at distinct devices, e.g., two copies of the data processing models 248 are stored at the server 102 and storage 106, respectively.
  • Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above.
  • the above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, modules, or data structures, and various subsets of these modules may be combined or otherwise rearranged in various embodiments.
  • memory 206 optionally, stores a subset of the modules and data structures identified above.
  • memory 206 optionally, stores additional modules and data structures not described above.
  • FIG. 3 is a flowchart of a process 300 for processing inertial sensor data and image data of an electronic system (e.g., a server 102, a client device 104, or a combination of both) using a SLAM module 232, in accordance with some embodiments.
  • the process 300 includes measurement preprocessing 302, initialization 304, local visual-inertial odometry (VIO) with relocation 306, and global pose graph optimization 308.
  • a camera 260 captures image data of a scene at an image frame rate (e.g., 30 FPS), and features are detected and tracked (310) from the image data.
  • An IMU 280 measures inertial sensor data at a sampling frequency (e.g., 1000 Hz) concurrently with the camera 260 capturing the image data, and the inertial sensor data are pre-integrated (312) to provide pose data.
  • the image data captured by the camera 260 and the inertial sensor data measured by the IMU 280 are temporally aligned (314).
  • Vision-only structure from motion (SfM) techniques are applied (316) to couple the image data and inertial sensor data, estimate three-dimensional structures, and map the scene of the camera 260.
  • a sliding window 318 and associated states from a loop closure 320 are used to optimize (322) a VIO.
  • when the VIO corresponds (324) to a keyframe of a smooth video transition and a corresponding loop is detected (326), features are retrieved (328) and used to generate the associated states from the loop closure 320.
  • in the global pose graph optimization 308, a multi-degree-of-freedom (multi-DOF) pose graph is optimized (330) based on the states from the loop closure 320, and a keyframe database 332 is updated with the keyframe associated with the VIO.
  • the features that are detected and tracked (310) are used to monitor (334) motion of an object in the image data and estimate image-based poses 336, e.g., according to the image frame rate.
  • the inertial sensor data that are pre-integrated (312) may be propagated (338) based on the motion of the object and used to estimate inertial-based poses 340, e.g., according to the sampling frequency of the IMU 280.
  • the image-based poses 336 and the inertial-based poses 340 are stored in the pose data buffer 236 and used by the module 230 to estimate and predict poses that are used by the pose-based rendering module 234.
  • the module 230 receives the inertial sensor data measured by the IMU 280 and obtains image-based poses 336 to estimate and predict more poses that are further used by the pose-based rendering module 234.
  • In SLAM, high-frequency pose estimation is enabled by sensor fusion, which relies on data synchronization between imaging sensors and the IMU 280.
  • the imaging sensors (e.g., cameras, lidars) provide image data desirable for pose estimation, and oftentimes operate at a low frequency (e.g., 30 frames per second) and with a large latency (e.g., 30 milliseconds).
  • the IMU 280 can measure inertial sensor data and operate at a very high frequency (e.g., 1000 samples per second) and with a negligible latency (e.g., less than 0.1 millisecond).
  • Asynchronous time warping (ATW) is often applied in an AR system to warp an image before it is sent to a display to correct for head movement that occurs after the image is rendered.
  • ATW algorithms reduce a latency of the image, increase or maintain a frame rate, or reduce judders caused by missing image frames.
  • relevant image data and inertial sensor data are stored locally such that they can be synchronized and used for pose estimation/prediction.
  • the image and inertial sensor data are stored in one of multiple STL containers, e.g., std::vector, std::queue, std::list, etc., or other self-defined containers. These containers are generally very convenient for use.
  • the image and inertial sensor data are stored in the STL containers with their time stamps, and the timestamps are used for data search, data insertion, and data organization.
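  • As an illustration only, timestamp-keyed storage of inertial samples in a standard container might look like the sketch below; the sample type and helper name are assumptions and are not part of the patent.

        #include <cstdint>
        #include <map>
        #include <optional>

        struct ImuSample { double accel[3]; double gyro[3]; };

        // Timestamp (e.g., in microseconds) -> sample, kept ordered so that
        // time-based search, insertion, and organization are straightforward.
        using ImuBuffer = std::map<int64_t, ImuSample>;

        // Return the stored sample whose timestamp is closest to the query time t.
        std::optional<ImuSample> findClosest(const ImuBuffer& buffer, int64_t t) {
            if (buffer.empty()) return std::nullopt;
            auto after = buffer.lower_bound(t);            // first sample at or after t
            if (after == buffer.begin()) return after->second;
            if (after == buffer.end()) return std::prev(after)->second;
            auto before = std::prev(after);
            return (t - before->first) <= (after->first - t) ? before->second
                                                             : after->second;
        }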
  • Figure 4 is a temporal diagram 400 illustrating two parallel temporal threads 402 and 404 of inertial sensor data and image data, in accordance with some embodiments.
  • the temporal thread 402 of inertial sensor data includes a temporally-ordered sequence of inertial sensor samples 406 measured by the IMU 280 at a sampling frequency. Each inertial sensor sample 406 has a first latency from being captured by the IMU 280 to being available to be used by a pose determination and prediction module 230.
  • the temporal thread 404 of image data includes a temporally-ordered sequence of images 408 captured by a camera 260 at an image frame rate. Each image 408 has a second latency from being captured by the camera 260 to being available to be used by the pose determination and prediction module 230.
  • the sampling frequency of the inertial sensor samples 406 is greater than the image frame rate, and the first latency is much less than the second latency.
  • a temporal position of each inertial sensor sample 406 or image 408 on the temporal threads 402 or 404 corresponds to a respective time when the corresponding inertial sensor sample 406 or image 408 is available to be used by the pose module 230.
  • the images 408 are captured at a frequency of 30 Hz, and the image frame rate is 30 frames per second.
  • the second latency of each image 408 is approximately 30 ms.
  • every two consecutive images 408 are temporally separated by 33 ms, while each image 408 becomes available the second latency of 30 ms (covering image transfer and processing) after being captured by the camera 260.
  • the inertial sensor samples 406 are measured by the IMU 280 at a sampling frequency of 1000 Hz, and have a first latency of approximately 0.1 ms, which is almost negligible compared with the second latency and the temporal separation between two images 408.
  • a current time t_j is after the current image 408A is captured and before the current image 408A is made available.
  • a prior image 408P that immediately precedes the current image 408A has been available at a prior time t_i to determine image-based pose data p_inew and correct prior inertial-based pose data p_iold accurately.
  • the current time t_j is separated from the prior time t_i of applying the prior image 408P by a time equal to or less than a combination of a frame separation and the second latency (e.g., ≤ 63 ms).
  • a separation time between the current and prior times t_j and t_i corresponds to about 60 inertial sensor samples 406.
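  • With the example figures above (33 ms frame separation, 30 ms second latency, 1000 Hz sampling frequency), the separation and the number of intervening inertial sensor samples work out roughly as:

        t_j - t_i \le 33\,\text{ms} + 30\,\text{ms} = 63\,\text{ms}, \qquad
        N \le 0.063\,\text{s} \times 1000\,\text{Hz} = 63

    which is on the order of the roughly 60 samples noted above.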
  • a current inertial sensor sample 406A is captured substantially at the current time t_j, e.g., captured within a temporal range 412 (which is equal to the first latency) of the current time t_j or at a time that is closer to the current time t_j than any other inertial sensor samples 406.
  • the current inertial sensor sample 406A is captured substantially at the current time t_j when the current inertial sensor sample 406A is the first inertial sensor sample 406 captured after the current time t_j or is among a first predefined number of inertial sensor samples captured immediately after the current time t_j.
  • the current inertial sensor sample 406A is captured substantially at the current time t_j when the current inertial sensor sample 406A is the last inertial sensor sample 406 captured before the current time t_j or is among a second predefined number of inertial sensor samples captured immediately before the current time t_j.
  • a prior inertial sensor sample 406A' is captured substantially at the prior time t_i, e.g., captured within a temporal range 412 (which is equal to the first latency) of the prior time t_i or at a time that is closer to the prior time t_i than any other inertial sensor samples 406.
  • the prior inertial sensor sample 406A' is captured substantially at the prior time t_i when the prior inertial sensor sample 406A' is the first inertial sensor sample 406 captured after the prior time t_i or is among the first predefined number of inertial sensor samples captured immediately after the prior time t_i.
  • the prior inertial sensor sample 406A' is captured substantially at the prior time t_i when the prior inertial sensor sample 406A' is the last inertial sensor sample 406 captured before the prior time t_i or is among the second predefined number of inertial sensor samples captured immediately before the prior time t_i.
  • an Extended Kalman Filter (EKF) or Error State Kalman Filter (ESKF) is applied to update poses obtained between the prior time t_i of capturing the first image 408A and the current time t_j of receiving the first image 408A.
  • the poses are updated using the inertial sensor samples 406 (e.g., acceleration, angular velocity, and corresponding bias and noise) captured between the current and prior times t_j and t_i.
  • approximately 60 old poses can be determined between the current and prior times t_j and t_i, i.e., between capturing of the first image 408A and capturing of the first inertial sensor sample 406A.
  • 60 old poses need to be renewed or repropagated to result in a latest renewed pose at the current time t_j.
  • Such a re-propagation loop takes time.
  • this re-propagation loop delays and extends the duration of time needed for pose determination and prediction.
  • the prior image 408P is captured at a prior time t_i by a camera 260 of an electronic system and made available by the camera 260 prior to a current time t_j.
  • the prior time t_i corresponds to prior pose data (p_iold) and image-based pose data (p_inew), and the image-based pose data (p_inew) are determined based on the prior image 408P.
  • the image-based pose data (p_inew) are more accurate than the prior pose data (p_iold); however, they are not available at the prior time t_i.
  • a prior adjustment value (dr) indicates a difference between the pose data p_inew and p_iold at the prior time t_i and remains substantially the same from the prior time t_i to the current time t_j.
  • the prior adjustment value (dr) may be applied to adjust pose data determined based on each inertial sensor sample 406 to obtain corresponding image-based pose data, including adjusting current inertial-based pose data (p_jold) determined based on the current inertial sensor sample 406A to current pose data (p_jnew).
  • the current inertial-based pose data (p_jold) are determined based on the prior pose data (p_iold) and the current inertial sensor sample 406A, and adjusted by the prior adjustment value (dr) to determine the current pose data (p_jnew).
  • the EKF and ESKF do not need to be applied to update each pose data item between the prior time t_i and the current time t_j, thereby conserving computational capabilities and expediting pose estimation and prediction.
  • the second latency of each image 408 is greater than the first latency of each inertial sensor sample 406, and the inertial sensor samples 406 captured by the IMU 280 are made available one or more image frames ahead of the images 408.
  • Each of the images 408 is captured at an image timestamp t_i, and the corresponding inertial sensor sample 406 closest to the image timestamp is available at an inertial timestamp t_i.
  • the inertial timestamp t_i corresponds to a pose p_iold.
  • an ESKF update function is called to obtain a new and updated pose p_inew at the timestamp t_i.
  • a pose p_jold at a time t_j subsequent to the time t_i is associated with the old pose p_iold, and is determined based on propagation of the inertial sensor samples 406 between the times t_i and t_j.
  • the difference dr is applied to the pose p_jold determined based on propagation of the inertial sensor samples 406, allowing an updated pose p_jnew to be determined with a minimum time cost (e.g., less than 0.001 ms in an Android system) in the IMU thread 402. In some embodiments having an IMU frequency higher than 1000 Hz, propagation is determined within less than 1 ms. Pose determination using the difference dr takes less than 1 ms, e.g., 0.0001 ms, which can make an augmented reality application run smoothly.
  • poses between the times t_i and t_j are updated in a loop using the same dr in the camera thread 404, and are optionally used for data synchronization or re-updated when a new image 408 is available.
  • updating these poses does not affect pose determination in the IMU thread 402 (propagation), nor does it have any significant impact on operation of the augmented reality application.
  • the relative pose is determined from the acceleration, angular velocity, bias, and noise measured by the IMU 280, where ΔR_ij, Δv_ij, and Δp_ij are the relative rotation matrix, velocity, and position between times t_i and t_j, respectively; ω_k and a_k are gyroscope and accelerometer measurements; and b_a_k, b_ω_k, n_a, and n_ω are accelerometer bias, gyroscope bias, accelerometer noise, and gyroscope noise, respectively.
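  • The underlying equations are not reproduced in the text above; purely as an illustration, one common discrete-time pre-integration of these quantities over the inertial samples k between t_i and t_j (with the noise terms n_a and n_ω handled in the filter's covariance propagation) is:

        \Delta R_{k+1} = \Delta R_k \,\mathrm{Exp}\big((\omega_k - b_{\omega_k})\,\delta t\big), \quad
        \Delta v_{k+1} = \Delta v_k + \Delta R_k\,(a_k - b_{a_k})\,\delta t, \quad
        \Delta p_{k+1} = \Delta p_k + \Delta v_k\,\delta t + \tfrac{1}{2}\,\Delta R_k\,(a_k - b_{a_k})\,\delta t^2

    where δt is the interval between consecutive inertial samples and Exp is the SO(3) exponential map; this is a sketch of a standard formulation, not the patent's exact equations.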
  • equations (1) and (2) are updated accordingly.
  • the same difference dr is applied to update an inertial-propagated pose p_jold to a pose p_jnew.
  • the difference dr is applied to update the poses p_k between the times t_i and t_j to poses p_knew, and acceleration and angular velocity are not applied every time to update each pose p_k, where k ∈ [i, j]. This saves computation and time.
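  • A short C++ sketch of this batch correction, for the translational part only; the container layout and names are illustrative assumptions.

        #include <vector>

        struct Position { double x, y, z; };   // translational part of a pose

        // Apply the same difference dr to every buffered pose p_k, k in [i, j],
        // instead of re-propagating each pose from raw IMU measurements.
        void correctBufferedPoses(std::vector<Position>& poses,  // p_i ... p_j
                                  const Position& dr) {
            for (Position& p : poses) {                          // each p_k -> p_knew
                p.x += dr.x;
                p.y += dr.y;
                p.z += dr.z;
            }
        }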
  • FIG. 5 is a temporal diagram illustrating a process 500 of estimating poses with reference to two parallel temporal threads 402 and 404, in accordance with some embodiments.
  • the temporal thread 402 of inertial sensor data includes a temporally-ordered sequence of inertial sensor samples 406 measured by the IMU 280, and each inertial sensor sample 406 is marked at a temporal location corresponding to a time when the respective sample 406 is made available for pose estimation.
  • a first latency of each sample 406 is negligible, and the time when the sample 406 is made available substantially overlaps a time when the sample 406 is captured on the thread 402.
  • the temporal thread 404 of image data includes a temporally-ordered sequence of images 408 captured by a camera 260, and each image 408 is marked at a temporal location corresponding to a time when the respective image is made available for pose estimation.
  • a second latency of each image 408 is comparable with a temporal separation between two consecutive images 408, e.g., on the same order as the temporal separation between the two consecutive images 408.
  • the respective image 408 is applied to determine an adjustment value (dr) retroactively at a prior time t_i when the respective image 408 is captured.
  • Such an adjustment value (dr) is applied to determine a current pose (p_jnew) in real time after the respective image 408 is made available and before the immediately following image 408 is made available, derive an intermediate pose (p_mnew) occurring prior to the current pose (p_jnew) retroactively, or extrapolate a subsequent pose (p_s) that has not happened. More details on extrapolation of the subsequent pose (p_s) are discussed below with reference to Figure 6.
  • four images 408A, 408B, 408C, and 408D are made available after a first current time t_j1, a second current time t_j2, a third current time t_j3, and a fourth current time t_j4, respectively.
  • a prior image 408P has been captured at a first prior time t_i1 and made available prior to the first current time t_j1.
  • the prior image 408P is used to determine image-based pose data (p_inew) corresponding to the first prior time t_i1, while prior pose data (p_iold) corresponding to the first prior time t_i1 have been stored in memory and are extracted from the memory.
  • the prior pose data (p_iold) and image-based pose data (p_inew) are compared to determine a prior adjustment value (dr) corresponding to the first prior time t_i1.
  • a current inertial sensor sample 406A is captured substantially at the first current time t_j1 by the IMU 280 (e.g., within a range 412 of the first current time t_j1).
  • first current pose data (p_jnew) corresponding to the first current time t_j1 are determined based on the current inertial sensor sample 406A and the prior adjustment value (dr).
  • first current inertial-based pose data (p_jold) are determined based on at least the prior pose data (p_iold) and the current inertial sensor sample 406A, e.g., using integration or an ESKF update function.
  • the current inertial-based pose data (p_jold) are adjusted by the prior adjustment value (dr) to determine the first current pose data (p_jnew).
  • the first current pose data (p_jnew) are stored in the pose data buffer 236 in association with the first current time t_j1.
  • intermediate inertial-based pose data are determined based on at least the prior pose data (p_iold) and an intermediate inertial sensor sample 406M, e.g., using integration or an ESKF update function.
  • the intermediate inertial-based pose data (p_mold) are adjusted by the prior adjustment value (dr) to determine intermediate pose data (p_mnew).
  • These intermediate pose data (p_mnew) are stored in the pose data buffer 236 in association with the intermediate time t_m.
  • the adjustment value (dr) is retroactively updated every time an image 408 is made available for pose estimation. Similar to the prior image 408P, each of the images 408A, 408B, and 408C corresponds to a corresponding prior time when the respective image is captured, and is made available prior to a corresponding current time (t_j2, t_j3, or t_j4). At each current time, the image 408B, 408C, or 408D is not available yet, and the image 408A, 408B, or 408C is used to determine corresponding image-based pose data (p_inew) at the respective prior time. Prior pose data (p_iold) corresponding to each prior time have been stored in the memory and are extracted from the memory.
  • the prior pose data (p_iold) and image-based pose data (p_inew) are compared to determine a respective adjustment value (dr).
  • a current inertial sensor sample is captured substantially at each current time t_j2, t_j3, or t_j4 by the IMU 280, and used to determine corresponding current pose data (p_jnew) jointly with the respective adjustment value (dr) associated with the corresponding prior time.
  • the current pose data (p_jnew) associated with each of the current times are stored in the pose data buffer 236 in association with the current times t_j1, t_j2, t_j3, and t_j4.
  • intermediate pose data (p_mnew) between each of the current times t_j1, t_j2, t_j3, and t_j4 and the respective prior time are also determined and stored in the pose data buffer 236 in association with an intermediate time t_m.
  • a sequence 502 of inertial sensor samples 406 is captured after the current inertial sensor sample 406A.
  • for each inertial sensor sample in the sequence 502, respective inertial-based following pose data (p_f) can be determined from the current pose data (p_jnew), e.g., using integration or an ESKF update function.
  • the inertial sensor samples 406 and images 408 are synchronized.
  • the image 408 is used to update the image-based pose data (p_inew) at a prior time t_i when the image 408 is captured.
  • the image-based pose data (p_inew) are used to determine a prior adjustment value dr at the prior time t_i, and the prior adjustment value dr is applied at the current time t_j to obtain current pose data (p_jnew).
  • intermediate pose data (p_mnew) are derived and stored for an intermediate time t_m between the times t_i and t_j.
  • propagation runs continuously based on the pose data p_iold on the temporal thread 402 of inertial sensor data.
  • Figures 6A and 6B illustrate processes 600 and 650 of extrapolating pose data associated with a subsequent time t_s, in accordance with some embodiments.
  • each image 408 has been made available and is applied to determine an adjustment value (dr) retroactively at a prior time t_i when the respective image 408 is captured.
  • such an adjustment value (dr) is applied to determine current pose data (p_jnew) in real time after the respective image 408 is made available, intermediate pose data (p_mnew) corresponding to an intermediate time that is between the times t_i and t_j, and following pose data (p_f) corresponding to the sequence 502 of inertial sensor samples 406 captured after the current time t_j.
  • Subsequent pose data (p_s) 602 that follow the current or following pose data can be extrapolated and predicted from a subset of the following pose data (p_f), the current pose data (p_jnew), the intermediate pose data (p_mnew), and the image-based pose data (p_inew), e.g., allowing AR glasses to render images in a predictive manner and improving user experience with little or no delay.
  • first pose data (p_1) correspond to a first time t_1 that is no earlier than the prior time t_i and is prior to the current time t_j, and are determined, e.g., based on an inertial sensor sample 406 measured at the first time t_1 and the prior adjustment value (dr) determined for the prior time t_i.
  • a subsequent time t_s follows the current time t_j.
  • Subsequent pose data (p_s) 602 corresponding to the subsequent time t_s are predicted based on the first pose data (p_1), the first time t_1, the current pose data (p_jnew), and the current time t_j.
  • the subsequent pose data (p_s) corresponding to the subsequent time t_s are linearly extrapolated from the first pose data (p_1) and the current pose data (p_jnew).
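  • A minimal C++ sketch of such a linear extrapolation, for position only and with illustrative names; orientation would typically be handled analogously (e.g., a spherical interpolation evaluated with the same parameter), which is an assumption here rather than something the patent specifies.

        struct Vec3 { double x, y, z; };

        // Linearly extrapolate a position to time ts from two timed samples
        // (p1 at t1, p2 at t2); a parameter t > 1 projects beyond the second sample.
        Vec3 extrapolatePosition(const Vec3& p1, double t1,
                                 const Vec3& p2, double t2, double ts) {
            const double t = (ts - t1) / (t2 - t1);  // > 1 for extrapolation
            return { p1.x + t * (p2.x - p1.x),
                     p1.y + t * (p2.y - p1.y),
                     p1.z + t * (p2.z - p1.z) };
        }

    Here p1 could play the role of the first pose data (p_1) at t_1 and p2 the current pose data (p_jnew) at t_j, with ts the subsequent time t_s.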
  • the first pose data (p_1) include the image-based pose data (p_inew) corresponding to the prior time t_i.
  • the first pose data (p_1) correspond to the intermediate pose data (p_mnew) that are aligned with the image-based pose data (p_inew) and the current pose data (p_jnew).
  • the subsequent pose data (p_s) corresponding to the subsequent time t_s have an error; however, they can still be used to facilitate real-time image rendering in a field of view of the camera 260.
  • first pose data (p_1) correspond to a first time t_1 that is no earlier than the second prior time (t_i2), as previously explained.
  • first pose data (p_1') correspond to a first time t_1 that is no earlier than the current time (t_j1) when the first current image 408A is made available for pose estimation.
  • first pose data applied for pose prediction of the subsequent pose data (p_s) correspond to the first time t_1 that is no earlier than the later of (1) the prior time (t_i) when an image is captured and (2) a distinct current time when a prior image captured immediately before the current image is made available for pose estimation. That said, the first pose data (p_1') (not the first pose data (p_1)) are applied with the current pose data (p_jnew) to derive the subsequent pose data (p_s).
  • pose data have been updated to the following pose data (p_f) corresponding to a following time t_f that follows the current time t_j.
  • the following pose data (p_f) are determined, e.g., based on an inertial sensor sample 406 measured at the following time t_f and the prior adjustment value (dr).
  • the subsequent pose data (p_s) 602 corresponding to the subsequent time t_s are predicted based on the first pose data (p_1), the first time t_1, the following pose data (p_f), and the following time t_f.
  • the subsequent pose data (p_s) corresponding to the subsequent time t_s are linearly extrapolated from the first pose data (p_1) and the following pose data (p_f).
  • Figure 7 illustrates a process 700 of storing pose data (e.g., p_inew, p_iold, p_jnew) in a pose data buffer 236, in accordance with some embodiments.
  • the pose data buffer 236 also includes a plurality of time stamps.
  • Each of the inertial sensor data 406, image data 408, and pose data items corresponds to a respective time stamp. Stated another way, each time stamp records a respective time and corresponds to one or more of the inertial sensor data 406, the image data 408, and the pose data items.
  • the pose data buffer 236 stores at least a current inertial sensor sample 406A corresponding to a current time t_j. Further, in some embodiments, the pose data buffer 236 stores the prior pose data (p_iold), the current pose data (p_jnew), one or more intermediate pose data (p_mnew) between the prior pose data and the current pose data, and one or more following pose data (p_f) following the current pose data (p_jnew).
  • a size of the pose data buffer 236 is dynamically adjusted based on the amount of data to be stored for pose estimation and prediction. Accesses to data stored in the pose data buffer 236 are managed by a memory controller. In some situations, the amount of data exceeds a data threshold, and the accesses to the pose data buffer 236 can be significantly delayed.
  • the pose data buffer 236 includes a ring buffer 702.
  • the ring buffer 702 stores a plurality of pose data items including the prior pose data (p_iold) and the current pose data (p_jnew) according to a spatial order.
  • the spatial order matches a temporal order of the plurality of time stamps of the plurality of pose data items.
  • the plurality of pose data items are sequentially stored into the ring buffer 702 according to the spatial order.
  • oldest pose data stored in the ring buffer 702 is identified based on an oldest time stamp associated with the oldest pose data, and the current pose data (p_jnew) and a current time stamp corresponding to the current time (t_j) are stored into the ring buffer 702 in place of the oldest pose data and the oldest time stamp.
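  • A minimal fixed-capacity ring buffer along these lines is sketched below; the capacity, struct layout, and member names are illustrative assumptions rather than the patent's implementation.

        #include <array>
        #include <cstddef>
        #include <cstdint>

        struct TimedPose {
            int64_t timestamp;      // time stamp of the pose data item
            double  position[3];
            double  orientation[4];
        };

        // Pose data items are stored in a spatial order that matches their temporal
        // order; once the buffer is full, the oldest item is overwritten in place.
        template <std::size_t N>
        class PoseRingBuffer {
        public:
            void push(const TimedPose& pose) {
                items_[head_] = pose;                    // overwrite the oldest slot
                head_ = (head_ + 1) % N;
                if (size_ < N) ++size_;
            }
            const TimedPose& newest() const { return items_[(head_ + N - 1) % N]; }
            const TimedPose& oldest() const { return size_ < N ? items_[0] : items_[head_]; }
            std::size_t size() const { return size_; }
        private:
            std::array<TimedPose, N> items_{};
            std::size_t head_ = 0;                       // next slot to write
            std::size_t size_ = 0;
        };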
  • first pose data (p_1) are determined for a first time (t_1) that is not earlier than the prior time (t_i) or later than the current time (t_j), and are used with the current pose data (p_jnew) or following pose data (p_f) to extrapolate subsequent pose data (p_s).
  • a size of the ring buffer 702 is dynamically adjusted based on a temporal separation between the first pose data (p_1) and the most recent pose data, e.g., the current or following pose data (p_jnew or p_f).
  • the image-based pose data (p_inew) are used as the first pose data (p_1) to extrapolate the subsequent pose data (p_s), and the size of the ring buffer 702 is dynamically adjusted based on the temporal separation between the image-based pose data (p_inew) and the current pose data (p_jnew).
  • the ring buffer 702 stores intermediate pose data (p_mnew) and following pose data (p_f) derived from each inertial sensor sample 406 in IMU propagation. That said, the ring buffer 702 is used to store the inertial-based pose data with the sampling frequency of the IMU 280.
  • In an example, the sampling frequency of the IMU 280 is 1000 Hz, and the time distance between two consecutive poses is 1 ms.
  • An ATW algorithm includes an extrapolation operation involving at least two poses stored in the ring buffer 702.
  • the ring buffer 702 has a fixed size, and little or no memory management is needed to store pose data into the ring buffer 702.
  • the fixed size of the ring buffer 702 can be fine-tuned, i.e., the ring buffer 702 has an adjustable size determined from the timestamps involved in interpolation or extrapolation: t ∈ [0, ∞) is the interpolation/extrapolation parameter; t_1 and t_2 are two timestamps corresponding to two of the pose data items stored in the ring buffer 702; and t_3 is a timestamp corresponding to the pose data to be predicted or upsampled. If t ∈ [0, 1], it is interpolation of intermediate pose data (p_mnew). If t > 1, it is extrapolation of subsequent pose data (p_s). In an example, t ∈ (1, 2), and the temporal distance between t_1 and t_2 is approximately 40 ms.
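  • As an illustration (the exact expression is not reproduced in this text), a linear form consistent with these definitions is:

        t = \frac{t_3 - t_1}{t_2 - t_1}, \qquad p_3 = (1 - t)\,p_1 + t\,p_2

    so that t ∈ [0, 1] interpolates between the two stored pose data items and t > 1 extrapolates beyond them; with 1 ms between consecutive poses, a span of t_2 - t_1 ≈ 40 ms corresponds to roughly 40 stored items, consistent with the tuning mentioned below.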
  • One of the two pose data items is the following pose data (p_f) that has been available, and the other one of the two pose data items can be the oldest pose data stored in the ring buffer.
  • the size of the ring buffer 702 is tuned to store 40 pose data items.
  • By using the oldest pose data and the most recent pose data (e.g., p_jnew and p_f) stored in the ring buffer 702, pose prediction may compensate for sensor latency and motion-to-photon latency effectively.
  • FIG. 8 is a flowchart of a method 800 for estimating poses, in accordance with some embodiments.
  • the method 800 is described as being implemented by an electronic system (e.g., a client device 104, a server 102, or a combination thereof).
  • An example of the client device 104 is a head-mount display 150.
  • the method 800 is applied to determine and predict poses for use in extended reality (e.g., VR, AR).
  • Method 800 is, optionally, governed by instructions that are stored in a non-transitory computer readable storage medium and that are executed by one or more processors of the electronic system.
  • Each of the operations shown in Figure 8 may correspond to instructions stored in a computer memory or non-transitory computer readable storage medium (e.g., memory 206 of the system 200 in Figure 2).
  • the computer readable storage medium may include a magnetic or optical disk storage device, solid state storage devices such as Flash memory, or other non-volatile memory device or devices.
  • the instructions stored on the computer readable storage medium may include one or more of: source code, assembly language code, object code, or other instruction format that is interpreted by one or more processors. Some operations in method 800 may be combined and/or the order of some operations may be changed.
  • the electronic system obtains (802) an image 408A that is captured at a prior time ti by a camera 260 of the electronic system and made available by the camera prior to a current time tj, extracts (804) prior pose data (piold) corresponding to the prior time ti that is stored in a buffer (e.g., a pose data buffer 236), determines (806) image-based pose data (pinew) corresponding to the prior time ti based on the image 408A, determines (808) a prior adjustment value corresponding to the prior time ti based on the image-based pose data (pinew) and the prior pose data (piold), obtains (810) a current inertial sensor sample 406A that is captured substantially at the current time tj by an IMU 280 of the electronic system, and determines (812) current pose data (pjnew) corresponding to the current time tj based on the current inertial sensor sample 406A and the prior adjustment value.
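The step sequence in the preceding item can be pictured with the schematic sketch below. It is not the claimed implementation: operations 802 and 806 (capturing the image and estimating the image-based pose from it) are assumed to have produced image_pose already, poses are reduced to 3-D positions, constant-velocity propagation stands in for the actual IMU integration, and all function and variable names are placeholders.

```python
import numpy as np


def propagate_with_imu(pose: np.ndarray, velocity: np.ndarray,
                       t_from: float, t_to: float) -> np.ndarray:
    """Crude stand-in for IMU propagation (constant velocity over [t_from, t_to])."""
    return pose + velocity * (t_to - t_from)


def update_current_pose(pose_buffer: dict, image_pose: np.ndarray, prior_time: float,
                        imu_velocity: np.ndarray, current_time: float) -> np.ndarray:
    """Schematic of operations 804-812; the buffer is a plain dict keyed by time stamp."""
    prior_pose = pose_buffer[prior_time]          # 804: buffered prior pose data
    adjustment = image_pose - prior_pose          # 808: prior adjustment value
    inertial_pose = propagate_with_imu(           # 810: propagate to the current time
        prior_pose, imu_velocity, prior_time, current_time)
    current_pose = inertial_pose + adjustment     # 812: corrected current pose data
    pose_buffer[current_time] = current_pose
    return current_pose


# Example: the image-based pose disagrees slightly with the buffered prior pose,
# and the resulting adjustment is carried forward to the current time.
buffer = {0.00: np.array([0.0, 0.0, 0.0])}
p_jnew = update_current_pose(buffer, np.array([0.01, 0.0, 0.0]), 0.00,
                             np.array([1.0, 0.0, 0.0]), 0.03)
```

The point of the sketch is the ordering: the adjustment value is computed at the prior time from the image, then reused to correct the inertially propagated pose at the current time.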
  • the current pose data (pjnew) are applied to implement a pose-based control operation, e.g., rendering virtual objects in a field of view of the camera.
  • current inertial-based pose data (pjold) are determined based on at least the prior pose data (piold) and the current inertial sensor sample 406A, and adjusted by the prior adjustment value to determine the current pose data (pjnew).
  • first pose data (p1) correspond to a first time t1 that is no earlier than the prior time ti and prior to the current time tj, and are determined, e.g., based on the prior adjustment value and the prior pose data (piold). Inertial sensor samples 406 between the prior and first times may be applied to determine the first pose data (p1). A subsequent time ts is subsequent to the current time tj. Subsequent pose data (ps) corresponding to the subsequent time ts are extrapolated based on the first pose data (p1), the first time t1, the current pose data (pjnew), and the current time tj.
  • the first time t1 is the prior time ti, and the first pose data (p1) are the image-based pose data (pinew).
  • the subsequent pose data (ps) can be predicted and applied to implement a pose-based control operation, e.g., rendering virtual objects in a field of view of the camera 260.
  • the subsequent time ts is before the time when a next image 408B, which is captured immediately after the image 408A, is made available, and the subsequent pose data (ps) corresponding to the subsequent time ts are linearly extrapolated based on the first pose data (p1), the first time t1, the current pose data (pjnew), and the current time tj.
  • the electronic device obtains an intermediate inertial sensor sample 406 captured by the IMU 280 substantially at the intermediate time tm, and determines inertial-based intermediate pose data (pmold) based on the prior pose data (piold) and the intermediate inertial sensor sample 406.
  • the inertial-based intermediate pose data (pmold) are adjusted by the prior adjustment value to determine final intermediate pose data (pmnew).
  • first pose data (p1) correspond to a first time t1 that is earlier than the following time tf.
  • Subsequent pose data (ps) correspond to a subsequent time ts that follows the following time tf, and are predicted based on the first pose data (p1), the first time t1, the respective following pose data (pf), and the following time tf.
  • the current inertial sensor sample 406A is considered to be captured substantially at the current time tj when it is captured within a temporal range 412 of the current time tj.
  • the temporal range 412 is defined by a first latency of inertial sensor samples 406.
  • the current inertial sensor sample 406A is the first inertial sensor sample or among a first predefined number of sensor samples captured or made available immediately after the current time tj.
  • the current inertial sensor sample 406A is the last inertial sensor sample or among a first predefined number of sensor samples captured or made available immediately before the current time tj.
  • the current inertial sensor sample 406A is closer to the current time tj than any other inertial sensor sample 406.
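The selection rules in the preceding items amount to picking an inertial sensor sample near the current time within a latency-defined window; here is a minimal sketch (the function name and the tuple representation of samples are assumptions, not from the disclosure):

```python
from typing import Iterable, Optional, Tuple


def select_current_sample(samples: Iterable[Tuple[float, object]],
                          current_time: float,
                          temporal_range: float) -> Optional[Tuple[float, object]]:
    """Pick the inertial sensor sample closest to current_time within the window.

    `samples` is an iterable of (timestamp, measurement) pairs; returns None
    when no sample falls inside the +/- temporal_range window.
    """
    in_window = [s for s in samples if abs(s[0] - current_time) <= temporal_range]
    if not in_window:
        return None
    return min(in_window, key=lambda s: abs(s[0] - current_time))
```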
  • the buffer 236 stores a plurality of pose data items and a plurality of time stamps. Each time stamp records a respective time and corresponds to one or more respective pose data items determined for the respective time.
  • the plurality of pose data items include at least the current pose data (pjnew), and optionally include the prior pose data (piold), one or more intermediate pose data items (pm) between the prior pose data (piold) and the current pose data (pjnew), and one or more following pose data items (pf) that follow the current pose data (pjnew).
  • the buffer 236 stores a plurality of inertial sensor samples 406 including the current inertial sensor sample 406 A.
  • the buffer 236 is a ring buffer 702
  • the electronic system stores, in the ring buffer 702, the plurality of pose data items (e.g., the prior pose data (piold) and the current pose data (pjnew)) according to a spatial order that matches a temporal order of the plurality of time stamps of the plurality of pose data items.
  • oldest pose data stored in the ring buffer are identified based on an oldest time stamp associated with the oldest pose data.
  • the current pose data (pjnew) and a current time stamp corresponding to the current time tj are stored in place of the oldest pose data and the oldest time stamp.
  • a size of the ring buffer 702 is dynamically adjusted based on a temporal separation between the most recent pose data and the first pose data applied to extrapolate subsequent pose data (ps) associated with a subsequent time ts following a most recent time corresponding to the most recent pose data (a capacity sketch follows this group of items).
  • oldest pose data and an oldest time stamp are extracted from the ring buffer 702
  • most recent pose data, e.g., the following pose data (pf)
  • a most recent time stamp, e.g., the following time (tf)
  • subsequent pose data are extrapolated in association with a subsequent time following the most recent time stamp, based on the oldest pose data, the oldest time stamp, the most recent pose data, and the most recent time stamp.
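One plausible way to realize the dynamic sizing described a few items above is to derive the ring buffer capacity from the temporal separation the extrapolation must span and the IMU sampling rate; this is a sketch under assumptions (the function name and the margin term, in particular, are not from the disclosure):

```python
import math


def required_capacity(t_first: float, t_most_recent: float,
                      imu_rate_hz: float, margin: int = 2) -> int:
    """Slots needed so both the first and the most recent pose stay buffered.

    The margin is an assumed cushion to absorb jitter in sample arrival times.
    """
    separation = t_most_recent - t_first
    return math.ceil(separation * imu_rate_hz) + margin
```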
  • the IMU of the electronic system is configured to capture a plurality of inertial sensor samples at a first frequency and with a first latency
  • the camera of the electronic system is configured to capture a plurality of images at a second frequency and with a second latency.
  • the first frequency is greater than the second frequency
  • the first latency is smaller than the second latency.
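As a worked illustration under an assumed camera rate (the disclosure does not state one): pairing the 1000 Hz IMU mentioned above with, say, a 30 Hz camera yields roughly 1000 / 30 ≈ 33 inertial sensor samples between consecutive images, each arriving with far lower latency than the image, which is why inertial propagation and the pose data buffer bridge the interval between image-based corrections.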
  • a ring buffer 702 can be applied for image and IMU synchronization in the ESKF, and a size of the ring buffer 702 is adjusted dynamically based on an accuracy of an ATW. Orientation extrapolation is conveniently enabled, e.g., using the oldest and most recent pose data stored in the ring buffer 702. Such an arrangement requires little or no memory management during real-time ATW, which conserves the limited memory management capability and expedites the ATW process.
  • the particular order in which the operations in Figure 8 have been described is merely exemplary and is not intended to indicate that the described order is the only order in which the operations could be performed.
  • One of ordinary skill in the art would recognize various ways to identify device poses as described herein. Additionally, it should be noted that details of other processes described above with respect to Figures 3-7 are also applicable in an analogous manner to method 800 described above with respect to Figure 8. For brevity, these details are not repeated here.
  • the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event]” or “in accordance with a determination that [a stated condition or event] is detected,” depending on the context.
  • stages that are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be obvious to those of ordinary skill in the art, so the ordering and groupings presented herein are not an exhaustive list of alternatives. Moreover, it should be recognized that the stages can be implemented in hardware, firmware, software or any combination thereof.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention relates to data synchronization and pose estimation or prediction, for example in extended reality. An electronic system obtains an image that is captured at a prior time by a camera and made available by the camera prior to a current time. Prior pose data (Piold) corresponding to the prior time are stored in a buffer and extracted. Image-based pose data (Pinew) corresponding to the prior time are determined based on the image. The electronic system determines a prior adjustment value corresponding to the prior time based on the image-based pose data (Pinew) and the prior pose data (Piold). A current inertial sensor sample that is captured substantially at the current time by an inertial measurement unit (IMU) of the electronic system is obtained. Current pose data (Pjnew) corresponding to the current time are determined based on the current inertial sensor sample and the prior adjustment value.
PCT/US2021/029607 2021-04-28 2021-04-28 Synchronisation de données et prédiction de pose en réalité étendue WO2021178980A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2021/029607 WO2021178980A1 (fr) 2021-04-28 2021-04-28 Synchronisation de données et prédiction de pose en réalité étendue

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2021/029607 WO2021178980A1 (fr) 2021-04-28 2021-04-28 Synchronisation de données et prédiction de pose en réalité étendue

Publications (1)

Publication Number Publication Date
WO2021178980A1 true WO2021178980A1 (fr) 2021-09-10

Family

ID=77612966

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/029607 WO2021178980A1 (fr) 2021-04-28 2021-04-28 Synchronisation de données et prédiction de pose en réalité étendue

Country Status (1)

Country Link
WO (1) WO2021178980A1 (fr)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160148433A1 (en) * 2014-11-16 2016-05-26 Eonite, Inc. Systems and methods for augmented reality preparation, processing, and application
US20190096081A1 (en) * 2017-09-28 2019-03-28 Samsung Electronics Co., Ltd. Camera pose determination and tracking
US20200027275A1 (en) * 2018-07-23 2020-01-23 Magic Leap, Inc. Method and system for resolving hemisphere ambiguity using a position vector
US20200050208A1 (en) * 2018-08-08 2020-02-13 The Toro Company Autonomous machine navigation and training using vision system
US20200284590A1 (en) * 2019-03-05 2020-09-10 DeepMap Inc. Distributed processing of pose graphs for generating high definition maps for navigating autonomous vehicles
US20210024081A1 (en) * 2019-07-09 2021-01-28 Refraction Ai, Inc. Method and system for autonomous vehicle control

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GEPPERT MARCEL; LIU PEIDONG; CUI ZHAOPENG; POLLEFEYS MARC; SATTLER TORSTEN: "Efficient 2D-3D Matching for Multi-Camera Visual Localization", 2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), IEEE, 20 May 2019 (2019-05-20), pages 5972 - 5978, XP033594260, DOI: 10.1109/ICRA.2019.8794280 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023075765A1 (fr) * 2021-10-27 2023-05-04 Innopeak Technology, Inc. Slam fondées sur une image de profondeur
WO2024076840A1 (fr) * 2022-10-07 2024-04-11 Qualcomm Incorporated Appareil et procédés permettant d'améliorer des systèmes de surveillance de conducteur
CN117889853A (zh) * 2024-03-15 2024-04-16 歌尔股份有限公司 Slam定位方法、终端设备及可读存储介质
CN117889853B (zh) * 2024-03-15 2024-06-04 歌尔股份有限公司 Slam定位方法、终端设备及可读存储介质

Similar Documents

Publication Publication Date Title
WO2021178980A1 (fr) Synchronisation de données et prédiction de pose en réalité étendue
CN105141807B (zh) 视频信号图像处理方法和装置
US10863210B2 (en) Client-server communication for live filtering in a camera view
KR20190098003A (ko) 장치의 자세 추정 방법 및 그 장치
CN112509047A (zh) 基于图像的位姿确定方法、装置、存储介质及电子设备
US20210014539A1 (en) Dynamic video encoding and view adaptation in wireless computing environments
WO2020236432A1 (fr) Défloutage d'image/vidéo utilisant des réseaux neuronaux convolutionnels avec des applications au principe de structure acquise à partir d'un mouvement (sfm)/à la localisation et au mappage simultanés (sla) avec des images/vidéos floues
CN112783700A (zh) 用于基于网络的远程辅助系统的计算机可读介质
US11238604B1 (en) Densifying sparse depth maps
CN103019375A (zh) 一种基于图像识别的光标控制方法及其系统
WO2023101662A1 (fr) Procédés et systèmes pour mettre en œuvre une odométrie visuelle-inertielle sur la base d'un traitement simd parallèle
KR20210050997A (ko) 포즈 추정 방법 및 장치, 컴퓨터 판독 가능한 기록 매체 및 컴퓨터 프로그램
WO2023086398A1 (fr) Réseaux de rendu 3d basés sur des champs de radiance neurale de réfraction
WO2023091131A1 (fr) Procédés et systèmes pour récupérer des images sur la base de caractéristiques de plan sémantique
WO2023277877A1 (fr) Détection et reconstruction de plan sémantique 3d
WO2023277903A1 (fr) Architecture slam monoculaire à caméra double
WO2023277888A1 (fr) Suivi de la main selon multiples perspectives
US9619714B2 (en) Device and method for video generation
WO2023195982A1 (fr) Sous-échantillonnage d'image-clé pour réduction d'utilisation de mémoire dans slam
WO2023075765A1 (fr) Slam fondées sur une image de profondeur
WO2023219612A1 (fr) Redimensionnement adaptatif d'objets manipulables et lisibles
WO2023219615A1 (fr) Suivi de multiples dispositifs de réalité étendue
WO2023063937A1 (fr) Procédés et systèmes de détection de régions planes à l'aide d'une profondeur prédite
WO2023191810A1 (fr) Slam basé sur une tuile de carte
WO2023211435A1 (fr) Estimation de profondeur pour systèmes slam à l'aide de caméras monoculaires

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21764251

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21764251

Country of ref document: EP

Kind code of ref document: A1