CN117321472A - Post-warping to minimize delays in moving objects

Post-warping to minimize delays in moving objects

Info

Publication number
CN117321472A
Authority
CN
China
Prior art keywords
virtual
pose
tracking device
updated
virtual object
Prior art date
Legal status: Pending
Application number
CN202280036131.8A
Other languages
Chinese (zh)
Inventor
Bernhard Jung
Daniel Wagner
Current Assignee
Snap Inc
Original Assignee
Snap Inc
Priority date
Filing date
Publication date
Priority claimed from US 17/518,828 (published as US 2022/0375026 A1)
Application filed by Snap Inc
Priority claimed from PCT/US2022/072341 (published as WO 2022/246384 A1)
Publication of CN117321472A

Abstract

A method for minimizing delay of moving objects in an Augmented Reality (AR) display device is described. In one aspect, the method comprises: determining an initial pose of the visual tracking device; identifying an initial position of an object in an image generated by an optical sensor of the visual tracking device, the image corresponding to the initial pose of the visual tracking device; rendering virtual content based on the initial pose and the initial position of the object; retrieving an updated pose of the visual tracking device; tracking an updated position of the object in an updated image corresponding to the updated pose; and applying a time warp transformation to the rendered virtual content based on the updated pose and the updated position of the object to generate transformed virtual content.

Description

Post-warping to minimize delays in moving objects
Cross Reference to Related Applications
The present application claims priority from U.S. application Ser. No. 17/518,828, filed in November 2021, which claims priority from U.S. provisional patent application Ser. No. 63/190,119, filed in May 2021, each of which is incorporated herein by reference in its entirety.
Technical Field
The subject matter disclosed herein relates generally to display systems. In particular, the present disclosure proposes systems and methods for reducing motion-to-photon delay in Augmented Reality (AR) devices.
Background
Augmented Reality (AR) systems present virtual content to augment a user's real world environment. For example, virtual content overlaid on a physical object may be used to create an illusion that the physical object is moving, animating, etc. The AR device worn by the user continuously updates the presentation of the virtual content based on the user's movements to create an illusion that the virtual content is physically present in the user's real world environment. For example, as the user moves their head, the AR device updates the presentation of the virtual content to create the illusion that the virtual content remains in the same geographic location within the user's real world environment. Thus, the user may move around the virtual object presented by the AR device in the same manner as the user moves around the physical object.
In order to convincingly create the illusion that the virtual object is in the user's real world environment, the AR device must update the presentation of the virtual object almost instantaneously as the device moves. However, the virtual content may require a longer time to update because the AR device must process the environment data, render the virtual content, and then project the virtual content. The process creates a delay between the time the physical object is tracked by the AR device to the time the rendered virtual object is displayed in the display of the AR device. This delay is also referred to as the "motion-to-photon delay". Any perceived motion-to-photon delay can degrade the user experience.
Drawings
To facilitate identification of the discussion of any particular element or act, the most significant digit or digits of a reference number refer to the figure number in which that element is first introduced.
FIG. 1 is a block diagram illustrating an environment for operating an AR/VR display device in accordance with one example embodiment.
Fig. 2 is a block diagram illustrating an AR/VR display device in accordance with an example embodiment.
FIG. 3 is a block diagram illustrating a tracking system according to an example embodiment.
Fig. 4 is a block diagram illustrating a display controller according to an example embodiment.
Fig. 5 is a block diagram illustrating a process for time warping according to an example embodiment.
Fig. 6 is a block diagram illustrating a time warp engine according to an example embodiment.
Fig. 7 is a flowchart illustrating a method for applying a time warp process according to an example embodiment.
Fig. 8 is a flowchart illustrating a method for applying a time warp process according to an example embodiment.
Fig. 9 is a flowchart illustrating a method for applying a time warp process according to an example embodiment.
Fig. 10 is a flowchart illustrating a method for applying a time warp process according to an example embodiment.
Fig. 11 illustrates a network environment in which a head wearable device may be implemented, according to an example embodiment.
Fig. 12 is a block diagram illustrating a software architecture within which the present disclosure may be implemented, according to an example embodiment.
FIG. 13 is a diagrammatic representation of a machine, in the form of a computer system, within which a set of instructions may be executed to cause the machine to perform any one or more of the methodologies discussed herein, according to an example embodiment.
Detailed Description
The following description describes systems, methods, techniques, sequences of instructions, and computer program products that illustrate example implementations of the present subject matter. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide an understanding of various embodiments of the present subject matter. It will be apparent, however, to one skilled in the art that embodiments of the present subject matter may be practiced without some or other of these specific details. Examples merely typify possible variations. Structures (e.g., structural components such as modules) are optional and may be combined or sub-divided, and operations (e.g., in a process, algorithm, or other function) may be varied in sequence or combined or sub-divided, unless explicitly stated otherwise.
The term "augmented reality" (AR) is used herein to refer to an interactive experience of a real-world environment in which physical objects residing in the real world are "augmented" or augmented by computer-generated digital content (also referred to as virtual content or synthetic content). AR may also refer to a system that enables a combination of real world and virtual world, real-time interaction, and 3D registration of virtual objects and real objects. AR creates the illusion that virtual content is physically present in and appears to be connected or interacted with the user's real world environment.
The term "virtual reality" (VR) is used herein to refer to a simulated experience of a virtual world environment that is quite different from a real world environment. Computer-generated digital content is displayed in a virtual world environment. VR also refers to a system that enables a user of a VR system to be fully immersed in and interact with virtual objects presented in a virtual world environment.
The term "AR application" is used herein to refer to a computer-operated application that implements an AR experience. The term "VR application" is used herein to refer to a computer-operated application that implements a VR experience. The term "AR/VR application" refers to a computer-operated application that implements an AR experience or a combination of VR experiences.
The term "AR display device" (also referred to as "AR device") is used herein to refer to a computing device that operates an AR application. The term "VR display device" (also referred to as "VR device") is used herein to refer to a computing device that operates a VR application. The term "AR/VR display device" (also referred to as "AR/VR device") is used herein to refer to a computing device that operates a combination of AR applications and VR applications.
The term "vision tracking system" (also referred to as "vision tracking device") is used herein to refer to a computer-operated application that tracks visual features identified in images captured by one or more cameras of the vision tracking system. The vision tracking system builds a model of the real world environment based on the tracked vision features. Non-limiting examples of vision tracking systems include: a visual synchrony positioning and mapping system (VSLAM) and a Visual Inertial Odometer (VIO) system. The VSLAM may be used to construct a target from an environment or scene based on one or more cameras of the visual tracking system. VIO (also known as visual inertial tracking) determines the latest pose (e.g., position and orientation) of a device based on data acquired from a plurality of sensors (e.g., optical sensors, inertial sensors) of the device.
The term "inertial measurement unit" (IMU) is used herein to refer to a device that can report on the inertial state of a moving body, including acceleration, speed, orientation, and positioning of the moving body. The IMU achieves tracking of movement of the subject by integrating the acceleration and angular velocity measured by the IMU. IMU may also refer to a combination of accelerometers and gyroscopes that may determine and quantify linear acceleration and angular velocity, respectively. Values obtained from the IMU gyroscope may be processed to obtain pitch, roll, and heading of the IMU, thereby obtaining pitch, roll, and heading of a subject associated with the IMU. Signals from the accelerometer of the IMU may also be processed to obtain the velocity and displacement of the IMU.
The term "motion-to-photon delay" (M2P delay) is used herein to refer to the duration between a user moving a visual tracking device and its presentation of virtual content adapting to that particular motion. Motion-to-photon delay may also refer to a delay associated with presenting virtual content in an AR device. As the user moves the AR device, the view of the user's real world environment changes instantaneously. However, the virtual content requires a long time to update because the AR device must process the environment data using the IMU data, render the virtual content, and project the virtual content in front of the user's field of view. Motion-to-photon delay may cause the virtual content to appear jittery or lagging and degrade the user's AR experience.
The term "time-warping" (also referred to as "time-warping)", "late-warping)", and "late-warping" are used herein to refer to a re-projection technique that distorts a rendered image before sending the rendered image to a display to correct for head movement that occurs after rendering. The process takes the already rendered image, modifies it with the rotation data that was most recently collected from the IMU, and then displays the warped image on the screen.
Previous solutions for reducing M2P delay rely on detecting feature points on stationary physical objects. In other words, previous solutions only address the delay caused by motion of the AR display device, not the delay caused by motion of a physical object. A physical object that moves independently of the AR display device (e.g., a user's hand or another physical object moving in the real world environment) causes additional M2P delay.
Systems and methods for reducing motion-to-photon delay in an AR device are described. The present system not only considers the motion of the AR display device, but also tracks the motion of the physical object to be augmented. The system applies a post-warp process that is optimized for both the movement of the AR device and the movement of the physical object. For example, the AR device (using computer vision algorithms) tracks the face of another person. The AR device applies the post-warp process based on the latest position of the tracked face and the latest IMU data. By tracking and accounting for both the AR device motion and the facial motion, the AR device can generate augmentations that are placed more accurately over the face. In another example, the post-warp process considers a predetermined animated movement of a virtual object. In yet another example, the post-warp process considers both the latest position of a static physical object and the latest position of a dynamic physical object.
In one example embodiment, a method for minimizing delay of moving objects in an Augmented Reality (AR) display device is described. In one aspect, the method comprises: determining an initial pose of the visual tracking device; identifying an initial position of an object in an image generated by an optical sensor of the visual tracking device, the image corresponding to the initial pose of the visual tracking device; rendering virtual content based on the initial pose and the initial position of the object; retrieving an updated pose of the visual tracking device; tracking an updated position of the object in an updated image corresponding to the updated pose; and applying a time warp transformation to the rendered virtual content based on the updated pose and the updated position of the object to generate transformed virtual content.
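The control flow of this method can be sketched as follows; the helper objects (tracker, object_tracker, renderer, warp_engine, display) are hypothetical stand-ins for the components described later and do not name any real API:
```python
# Minimal control-flow sketch of the described method (names are assumptions).
def render_and_display_frame(tracker, object_tracker, renderer, warp_engine,
                             display, virtual_content):
    # 1. Determine the initial pose and the object's initial position.
    initial_pose = tracker.current_pose()
    image = tracker.latest_image()                     # image for the initial pose
    initial_position = object_tracker.locate(image)

    # 2. Render the virtual content for that pose and position (the slow step).
    frame = renderer.render(virtual_content, initial_pose, initial_position)

    # 3. Just before scan-out, fetch the updated pose and updated object position.
    updated_pose = tracker.current_pose()
    updated_image = tracker.latest_image()
    updated_position = object_tracker.locate(updated_image)

    # 4. Apply the time warp transformation to the rendered frame and display it.
    transformed = warp_engine.transform(frame, updated_pose, updated_position)
    display.show(transformed)
```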
Accordingly, one or more of the methods described herein help solve the technical problem of delay in displaying rendered moving objects in a visual tracking device by accounting for the motion of the tracked physical objects in the post-warp process. The presently described method provides an improvement to the operation of the computer by reducing M2P delay. Thus, one or more of the methods described herein may obviate a need for certain efforts or computing resources. Examples of such computing resources include processor cycles, network traffic, memory usage, data storage capacity, power consumption, network bandwidth, and cooling capacity.
Fig. 1 is a network diagram illustrating an environment 100 suitable for operating an AR/VR device 106 in accordance with some example embodiments. The environment 100 includes a user 102, an AR/VR device 106, and a physical object 104. The user 102 operates the AR/VR device 106. The user 102 may be a human user (e.g., a human), a machine user (e.g., a computer configured by a software program to interact with the AR/VR device 106), or any suitable combination thereof (e.g., a machine-assisted human or a machine supervised by a human). The user 102 operates the AR/VR device 106.
The AR/VR device 106 may be a computing device with a display, such as a smart phone, a tablet computer, or a wearable computing device (e.g., a watch or glasses). The computing device may be handheld or may be removably mounted to the head of the user 102. In one example, the display includes a screen that displays images captured with the camera of the AR/VR device 106. In another example, the display of the AR/VR device 106 may be transparent, such as the lenses of wearable computing glasses. In other examples, the display may be opaque, partially transparent, or partially opaque. In yet other examples, the display may be wearable by the user 102 to cover the field of view of the user 102.
The AR/VR device 106 includes an AR application that generates virtual content based on images detected with the camera of the AR/VR device 106. For example, the user 102 may point the AR/VR device 106 to capture an image of the physical object 104. In one example, the user 102 moves or rotates the AR/VR device 106 in one direction while the physical object 104 moves in another direction. The AR application generates virtual content corresponding to an identified object (e.g., the physical object 104) in the image and presents the virtual content in the display of the AR/VR device 106.
The AR/VR device 106 includes a tracking system 108. In one example, the tracking system 108 tracks the pose (e.g., position and orientation) of the AR/VR device 106 relative to the real world environment 110 using, for example, optical sensors (e.g., depth-enabled 3D cameras, image cameras), inertial sensors (e.g., gyroscopes, accelerometers, magnetometers), wireless sensors (Bluetooth, Wi-Fi), GPS sensors, and audio sensors. In another example, the tracking system 108 tracks the location of the physical object 104 or the location of virtual content (generated by the AR/VR device 106). The AR/VR device 106 displays virtual content based on the pose of the AR/VR device 106 with respect to the real world environment 110 and/or the physical object 104.
The AR/VR device 106 may operate over a computer network. The computer network may be any network that enables communication between or among machines, databases, and devices. Thus, the computer network may be a wired network, a wireless network (e.g., a mobile or cellular network), or any suitable combination thereof. The computer network may include one or more portions that constitute a private network, a public network (e.g., the internet), or any suitable combination thereof.
Any of the machines, databases, or devices illustrated in fig. 1 may be implemented in a general-purpose computer modified (e.g., configured or programmed) by software to be a special-purpose computer to perform one or more of the functions described herein for the machine, database, or device. For example, a computer system capable of implementing any one or more of the methods described herein is discussed below with reference to fig. 7-10. As used herein, a "database" is a data storage resource and may store data structured as text files, tables, spreadsheets, relational databases (e.g., object-relational databases), triad stores, hierarchical data stores, or any suitable combination thereof. Furthermore, any two or more of the machines, databases, or devices illustrated in fig. 1 may be combined into a single machine, and the functionality described herein with respect to any single machine, database, or device may be subdivided among multiple machines, databases, or devices.
Fig. 2 is a block diagram illustrating modules (e.g., components) of an AR/VR device 106 in accordance with some example embodiments. The AR/VR device 106 includes a sensor 202, a display 204, a processor 208, a graphics processing unit 216, a display controller 218, and a storage device 206. Examples of AR/VR device 106 include a wearable computing device (e.g., glasses), a tablet computer, a navigation device, a portable media device, or a smart phone.
The sensors 202 include, for example, optical sensors 212 (e.g., imaging devices such as color imaging devices, thermal imaging devices, depth sensors, and one or more grayscale, global shutter tracking imaging devices) and inertial sensors 214 (e.g., gyroscopes, accelerometers, magnetometers). Other examples of sensors 202 include proximity or location sensors (e.g., near field communication, GPS, Bluetooth, Wi-Fi), audio sensors (e.g., microphones), or any suitable combination thereof. Note that the sensors 202 described herein are for illustration purposes, and thus the sensors 202 are not limited to those described above.
The display 204 includes a screen or monitor configured to display images generated by the processor 208. In one example embodiment, the display 204 may be transparent or translucent so that the user 102 may view through the display 204 (in the AR use case). In another example, the display 204 (e.g., an LCOS display) presents each frame of virtual content in multiple presentations.
The processor 208 includes an AR/VR application 210 and the tracking system 108. The AR/VR application 210 uses computer vision to detect and identify the physical environment or physical object 104. The AR/VR application 210 retrieves a virtual object (e.g., a 3D object model) based on the identified physical object 104 or physical environment. The AR/VR application 210 renders virtual objects in the display 204. For AR applications, the AR/VR application 210 includes a local rendering engine that renders a 3D model of a virtual object overlaid on (e.g., superimposed on or otherwise displayed simultaneously with) an image or view of the physical object 104. The view of the virtual object may be manipulated by adjusting the positioning of the physical object 104 (e.g., its physical position, orientation, or both) relative to the optical sensor 212. Similarly, the view of the virtual object may be manipulated by adjusting the pose of the AR/VR device 106 relative to the physical object 104. For VR applications, the AR/VR application 210 displays the virtual object in the display 204 at a location (in the display 204) determined based on the pose of the AR/VR device 106.
In one example implementation, the tracking system 108 estimates the pose of the AR/VR device 106. For example, the tracking system 108 uses image data from the optical sensor 212 and corresponding inertial data of the inertial sensor 214 to track the position and pose of the AR/VR device 106 relative to a frame of reference (e.g., the real world environment 110). In one example, the tracking system 108 uses the sensor data to determine a three-dimensional pose of the AR/VR device 106. The three-dimensional pose is a determined orientation and positioning of the AR/VR device 106 relative to the user's real world environment 110. For example, the AR/VR device 106 may use images of the user's real world environment 110 and other sensor data to identify the relative positioning and orientation of the AR/VR device 106 and physical objects in the real world environment 110 surrounding the AR/VR device 106. The tracking system 108 continuously collects and uses updated sensor data describing the movement of the AR/VR device 106 to determine an updated three-dimensional pose of the AR/VR device 106 that is indicative of changes in the relative positioning and orientation of the AR/VR device 106 and physical objects in the real-world environment 110. The tracking system 108 provides the three-dimensional pose of the AR/VR device 106 to the graphics processing unit 216.
In another example embodiment, the tracking system 108 (using computer vision) tracks the location of a detected physical object. For example, the tracking system 108 includes a face recognition algorithm and a face tracking algorithm that detect and track a face in images captured by the optical sensor 212. In another example, the tracking system 108 tracks the location of dynamic virtual content based on the predefined/preset behavior of that content. For example, the tracking system 108 accesses an animation configuration of a virtual object to identify the trajectory, behavior, or path of the virtual object over time.
Graphics processing unit 216 includes a rendering engine (not shown) configured to render frames of a 3D model of a virtual object based on the virtual content provided by the AR/VR application 210 and the pose provided by the tracking system 108. In other words, the graphics processing unit 216 uses the three-dimensional pose of the AR/VR device 106 to generate frames of virtual content to be presented on the display 204. For example, the graphics processing unit 216 uses the three-dimensional pose to render frames of virtual content such that the virtual content is presented in the display 204 in the proper orientation and position to appropriately augment the user's reality. As an example, the graphics processing unit 216 may use the three-dimensional pose data to render frames of virtual content such that, when presented on the display 204, the virtual content overlaps a physical object in the user's real world environment 110. The graphics processing unit 216 generates updated frames of virtual content based on updated three-dimensional poses of the AR/VR device 106 that reflect changes in the position and orientation of the user relative to physical objects in the user's real world environment 110. Graphics processing unit 216 transmits the rendered frames to the display controller 218.
The display controller 218 is positioned as an intermediary between the graphics processing unit 216 and the display 204. The display controller 218 receives image data (e.g., rendered frames) from the graphics processing unit 216 and readjusts the position of the rendered virtual content by performing a post-warp transformation based on the latest pose of the AR/VR device 106 and the latest tracking information (e.g., of a tracked physical object, a preset animation of a virtual object, or multiple physical objects with different movements), producing a time-warped frame. The display controller 218 provides the time-warped frame to the display 204 for display.
The storage device 206 stores virtual object content 220. The virtual object content 220 includes, for example, a database of visual references (e.g., images, QR codes) and corresponding virtual content (e.g., three-dimensional models of virtual objects).
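A loose illustration of such a database is sketched below; the keys, fields, and file names are invented for the example and are not part of this disclosure:
```python
# Hypothetical layout for the virtual object content 220: visual references mapped
# to virtual content entries (all identifiers here are made up for illustration).
virtual_object_content = {
    "qr:example-code-001": {"model": "models/arrow.glb", "scale": 0.2},
    "image:example-poster": {"model": "models/ball.glb", "scale": 1.0},
}

def lookup_virtual_content(visual_reference):
    # Return the 3D model entry registered for a recognized visual reference, if any.
    return virtual_object_content.get(visual_reference)
```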
Any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine) or a combination of hardware and software. For example, any of the modules described herein may configure a processor to perform the operations described herein for that module. Furthermore, any two or more of these modules may be combined into a single module, and the functionality described herein with respect to a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.
FIG. 3 illustrates a tracking system 108 according to an example embodiment. Tracking system 108 includes, for example, a visual tracking system 308 and a content tracking system 310.
The visual tracking system 308 includes an inertial sensor module 302, an optical sensor module 304, and a pose estimation module 306. The inertial sensor module 302 accesses inertial sensor data from the inertial sensor 214. The optical sensor module 304 accesses optical sensor data from the optical sensor 212.
The pose estimation module 306 determines a pose (e.g., position, location, orientation) of the AR/VR device 106 relative to a frame of reference (e.g., real world environment 110). In one example implementation, pose estimation module 306 estimates the pose of AR/VR device 106 based on a 3D map of feature points from the image captured by optical sensor 212 and from inertial sensor data captured by inertial sensor 214.
In one example, the pose estimation module 306 includes an algorithm that combines inertial information from the inertial sensor 214 and image information from the optical sensor 212, where the inertial sensor 214 and the optical sensor 212 are coupled to a rigid platform (e.g., the AR/VR device 106) or a rig. The rig may include multiple cameras (with non-overlapping (distributed aperture) or overlapping (stereo or more) fields of view) and an IMU mounted on a rigid platform (the rig thus has at least one IMU and at least one camera). In another example embodiment, the presently described motion-to-photon delay optimization can operate with a simpler tracking module (e.g., one in which only rotation data from the IMU is tracked), thereby eliminating the need for image data from the optical sensor 212.
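Such a rotation-only tracking module could, for example, extrapolate the most recent orientation with the newest gyroscope sample; the constant-angular-velocity model below is an assumption for illustration:
```python
# Rotation-only pose prediction from gyroscope data (illustrative assumption).
import numpy as np
from scipy.spatial.transform import Rotation

def predict_rotation(latest_rotation, gyro_rad_s, dt_until_scanout):
    """latest_rotation: scipy Rotation for the newest orientation estimate;
    gyro_rad_s: body-frame angular rate; dt_until_scanout: seconds until display."""
    # Assume constant angular velocity over the short prediction interval.
    delta = Rotation.from_rotvec(np.asarray(gyro_rad_s) * dt_until_scanout)
    return latest_rotation * delta
```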
Content tracking system 310 includes a face tracking system 312, an animated virtual object tracking system 314, and a dynamic physical object tracking system 316. The face tracking system 312 uses computer vision to detect and track the location of the face (within the image). In one example, the face tracking system 312 detects and tracks the face of another person in an image captured with the optical sensor 212.
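One possible way to obtain such a face location, offered only as an assumption since the disclosure does not specify the detector, is a stock detector such as OpenCV's bundled Haar cascade:
```python
# Example face detection with OpenCV's Haar cascade (one possible detector, assumed).
import cv2

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_location(bgr_image):
    """Return the (x, y, w, h) box of the most prominent face, or None."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Keep the largest detection as the tracked face.
    return max(faces, key=lambda box: box[2] * box[3])
```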
The animated virtual object tracking system 314 retrieves the configuration of the animated virtual object to identify and track the latest position of the animated virtual object. The configuration of the animated virtual object indicates a predefined behavior of the animated virtual object. For example, the animated virtual object includes a virtual ball having a predefined trajectory.
Dynamic physical object tracking system 316 uses computer vision to identify a moving physical object and track the latest position (within the image) of the physical object. For example, dynamic physical object tracking system 316 tracks the position of a moving physical object (e.g., a hand, a human body, an automobile, or any other physical object).
Fig. 4 is a block diagram illustrating a display controller 218 according to an example embodiment. The display controller 218 includes a time warp engine 402 that receives the rendered virtual objects from the graphics processing unit 216.
The time warping engine 402 retrieves the most current orientation data of the AR/VR device 106 from the inertial sensor module 302. The time warp engine 402 retrieves the latest position of the moving physical object, the latest position of the animated virtual object, or the latest position of a combination of static and dynamic physical objects.
The time warping engine 402 warps/re-projects the rendered virtual object based on a combination of the latest IMU data from the visual tracking system 308 and the latest location of the tracked content from the content tracking system 310 to generate a re-projected/warped frame. The time warp engine 402 provides the warped frames to the display 204 for display.
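One way such a combined warp could be assembled, purely as an illustrative assumption, is to compose the rotational homography from the newest IMU data with a 2D shift that follows the tracked content's newest screen position:
```python
# Illustrative combined warp matrix: device-rotation homography plus a content shift.
import numpy as np

def combined_warp_matrix(K, R_render, R_latest, content_px_render, content_px_latest):
    """K: 3x3 intrinsics; R_render/R_latest: world-to-camera rotations at render time
    and at scan-out; content_px_*: (x, y) pixel position of the tracked content when
    the frame was rendered and in the newest camera image."""
    H_rot = K @ (R_latest @ R_render.T) @ np.linalg.inv(K)   # device motion only
    dx = content_px_latest[0] - content_px_render[0]
    dy = content_px_latest[1] - content_px_render[1]
    T = np.array([[1.0, 0.0, dx],
                  [0.0, 1.0, dy],
                  [0.0, 0.0, 1.0]])                          # follow the moving content
    return T @ H_rot
```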
FIG. 5 is a block diagram illustrating an example process according to an example embodiment. The tracking system 108 receives sensor data from the sensors 202 to determine a pose (e.g., pose a) of the AR/VR device 106. The tracking system 108 provides the pose to the graphics processing unit 216. Graphics processing unit 216 operates the 3D rendering engine 502 to render a frame (e.g., frame a) of animated/non-animated virtual content (provided by the AR/VR application 210) at a location (in the display 204) based on the pose (e.g., pose a) received from the tracking system 108. Graphics processing unit 216 provides the rendered frame (e.g., frame a) to the display controller 218.
The display controller 218 receives updated pose or rotation data (e.g., IMU data) from the tracking system 108. The display controller 218 also receives content tracking data (indicating the most recent location of the tracked content (e.g., a face or a moving physical object)) from the tracking system 108.
The display controller 218 applies the time warping engine 402 to the rendered virtual object by performing a three-dimensional shifting operation on the rendered frame (e.g., frame a) based on the latest IMU data and the latest content tracking data to generate a new frame (e.g., frame b). The display controller 218 transmits frame b to the display 204 for display.
Fig. 6 is a block diagram illustrating the time warp engine 402 according to one embodiment. The time warp engine 402 includes a face warping module 602, an animated virtual object warping module 604, and a multi-object warping module 606.
The face warping module 602 receives tracking data from the face tracking system 312. The tracking data includes the most recent position/location of a face depicted in the images captured by the optical sensor 212. For example, the user may be moving. The optical sensor 212 captures a first picture depicting the user at a first location in the first picture. The 3D rendering engine 502 generates a rendered virtual object based on the pose of the AR/VR device 106 and the face of the user at the first location. Subsequently, the optical sensor 212 captures a second picture in which the user has moved to a second location. The face warping module 602 applies a post-warping algorithm to the rendered image based on the latest IMU data from the inertial sensor module 302 and the tracked face of the user at the second location (the latest user face location as determined by the face tracking algorithm).
The animated virtual object warping module 604 receives tracking data from the animated virtual object tracking system 314. The tracking data includes the most recent position of an animated virtual object (generated by the AR/VR application 210) within the images captured by the optical sensor 212. The latest position may be identified based on the animation configuration settings of the animated virtual object. For example, the animation configuration settings define a preset trajectory for the animated virtual object. The animated virtual object warping module 604 applies a post-warping algorithm to the rendered image based on the latest position (or predicted position) of the animated virtual object (according to the settings in the animation configuration) and based on the latest IMU data from the inertial sensor module 302.
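A sketch of sampling such an animation configuration at the expected display time is shown below; the keyframe format and the linear interpolation are assumptions for illustration:
```python
# Sample a predefined trajectory (timed keyframes, assumed format) at display time.
import numpy as np

def animated_object_position(keyframes, display_time_s):
    """keyframes: sorted (time_s, (x, y, z)) pairs defining the preset trajectory."""
    times = np.array([t for t, _ in keyframes])
    points = np.array([p for _, p in keyframes])
    return np.array([np.interp(display_time_s, times, points[:, i]) for i in range(3)])

# Example: a virtual ball on a predefined straight-line trajectory.
ball_trajectory = [(0.0, (0.0, 1.0, 2.0)), (1.0, (0.3, 1.0, 2.0)), (2.0, (0.6, 1.0, 2.0))]
position_at_display = animated_object_position(ball_trajectory, 1.4)
```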
The multi-object warping module 606 receives tracking data of one or more physical objects tracked by computer vision algorithms of the dynamic physical object tracking system 316. The tracking data includes the most recent positions/locations of the first and second physical objects depicted in the image captured by the optical sensor 212. For example, a first physical object is stationary, while a second physical object is moving.
The multi-object warping module 606 applies a post-warping algorithm separately to each tracked physical object in the separate rendering layers. For example, the multi-object warping module 606 applies a post-warping algorithm to the first physical object to generate a first warping layer based on the latest IMU data from the inertial sensor module 302 and the tracked position of the first physical object. The multi-object warping module 606 applies a post-warping algorithm to the second physical object to generate a second warping layer based on the latest IMU data from the inertial sensor module 302 and the tracked position of the second physical object. The multi-object warping module 606 then combines the first warped layer and the second warped layer into a single rendered frame.
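The per-layer warp and recombination could look roughly like the sketch below, assuming RGBA layers and simple alpha compositing, neither of which is specified by this description:
```python
# Warp two rendering layers independently, then alpha-composite them into one frame.
import cv2
import numpy as np

def warp_and_composite(layer_first, layer_second, H_first, H_second):
    """layer_*: HxWx4 uint8 RGBA layers; H_*: 3x3 warp matrices, one per layer."""
    h, w = layer_first.shape[:2]
    warped_first = cv2.warpPerspective(layer_first, H_first, (w, h))
    warped_second = cv2.warpPerspective(layer_second, H_second, (w, h))

    # Composite the second (e.g., moving-object) layer over the first (e.g., static) layer.
    alpha = warped_second[..., 3:4].astype(np.float32) / 255.0
    rgb = (warped_second[..., :3].astype(np.float32) * alpha
           + warped_first[..., :3].astype(np.float32) * (1.0 - alpha))
    out = warped_first.copy()
    out[..., :3] = rgb.astype(np.uint8)
    return out
```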
Fig. 7 is a flowchart illustrating a method 700 for applying a time warp process according to an example embodiment. The operations in method 700 may be performed by AR/VR device 106 using the components (e.g., modules, engines) described above with respect to fig. 4. Thus, the method 700 is described by way of example with reference to the AR/VR device 106. However, it should be understood that at least some of the operations of method 700 may be deployed on various other hardware configurations or performed by similar components residing elsewhere.
In block 702, the visual tracking system 308 identifies an initial pose of the AR/VR device 106. In block 704, the content tracking system 310 tracks the position of an object in an image corresponding to the initial pose. In block 706, the graphics processing unit 216 renders the virtual content based on the initial pose and the position of the object. In block 708, the display controller 218 retrieves an updated pose of the AR/VR device 106 (based on the latest IMU data from the inertial sensor module 302). In block 710, the content tracking system 310 retrieves an updated position of the object in an updated image corresponding to the updated pose. In block 712, the time warp engine 402 applies a time warp transformation to the virtual content based on the updated pose and the updated position of the object. In block 714, the display 204 displays the transformed image.
It is noted that other embodiments may use different ordering, additional or fewer operations, and different nomenclature or terminology to accomplish similar functions. In some implementations, various operations may be performed in parallel with other operations in a synchronous or asynchronous manner. The operations described herein were chosen to illustrate some principles of operation in a simplified form.
Fig. 8 is a flowchart illustrating a method 800 for applying a time warp process according to an example embodiment. The operations in method 800 may be performed by AR/VR device 106 using components (e.g., modules, engines) described above with respect to fig. 4. Thus, the method 800 is described by way of example with reference to the AR/VR device 106. However, it should be understood that at least some of the operations of method 800 may be deployed on various other hardware configurations or performed by similar components residing elsewhere.
In block 802, the pose estimation module 306 determines an initial pose of the AR/VR device 106. In block 804, the face tracking system 312 tracks the position of a person's face in an image corresponding to the initial pose of the AR/VR device 106. In block 806, the AR/VR application 210 retrieves virtual content based on the face in the image. In block 808, the graphics processing unit 216 renders a 3D model of the virtual content based on the initial pose using the rendering engine. In block 810, the face warping module 602 accesses updated inertial sensor data from the inertial sensor module 302 after rendering. In block 812, the face warping module 602 accesses an updated position of the face in an updated image from the face tracking system 312 after rendering. In block 814, the face warping module 602 applies a transformation to the rendered 3D model based on the updated inertial sensor data and the updated position of the face.
Fig. 9 is a flowchart illustrating a method 900 for applying a time warp process according to an example embodiment. The operations in method 900 may be performed by AR/VR device 106 using the components (e.g., modules, engines) described above with respect to fig. 3. Thus, the method 900 is described by way of example with reference to the AR/VR device 106. However, it should be understood that at least some of the operations of method 900 may be deployed on various other hardware configurations or performed by similar components residing elsewhere.
In block 902, the visual tracking system 308 identifies an initial pose of the AR/VR device 106. In block 904, the dynamic physical object tracking system 316 tracks the positions of a first virtual object and a second virtual object in an image corresponding to the initial pose. In block 906, the graphics processing unit 216 renders a first layer of virtual content based on the initial pose and the position of the first virtual object. In block 908, the graphics processing unit 216 renders a second layer of virtual content based on the initial pose and the position of the second virtual object. In block 910, the time warp engine 402 retrieves an updated pose of the AR/VR device 106 (based on the latest IMU data from the inertial sensor module 302). In block 912, the dynamic physical object tracking system 316 identifies an updated position of the first virtual object in an updated image corresponding to the updated pose. In block 914, the dynamic physical object tracking system 316 identifies an updated position of the second virtual object in the updated image corresponding to the updated pose. The method 900 continues at block 916 in fig. 10.
Fig. 10 is a flowchart illustrating a method 1000 for applying a time warp process according to an example embodiment. The operations in method 1000 may be performed by AR/VR device 106 using components (e.g., modules, engines) described above with respect to fig. 2. Thus, the method 1000 is described by way of example with reference to the AR/VR device 106. However, it should be understood that at least some of the operations of method 1000 may be deployed on various other hardware configurations or performed by similar components residing elsewhere.
The method 1000 continues from block 916 of fig. 9. In block 1002, the multi-object warping module 606 applies a time warp transformation to the first layer based on the updated pose and the updated position of the first virtual object. In block 1004, the multi-object warping module 606 applies a time warp transformation to the second layer based on the updated pose and the updated position of the second virtual object. In block 1006, the multi-object warping module 606 combines the first layer and the second layer in a single rendered frame. In block 1008, the display 204 displays a single rendered frame.
System with head wearable device
Fig. 11 illustrates a network environment 1100 in which a head wearable device 1102 may be implemented, according to an example embodiment. Fig. 11 is a high-level functional block diagram of an example head wearable device 1102 communicatively coupled to a mobile client device 1138 and a server system 1132 via various networks 1140.
The head wearable device 1102 includes an imaging device, such as at least one of a visible light imaging device 1112, an infrared emitter 1114, and an infrared imaging device 1116. The client device 1138 may be capable of connecting with the head wearable device 1102 using both communication 1134 and communication 1136. The client device 1138 connects to the server system 1132 and the network 1140. The network 1140 may include any combination of wired and wireless connections.
The head wearable device 1102 also includes two image displays 1104 of the optical assembly. The two image displays include one image display associated with the left side of the head wearable device 1102 and one image display associated with the right side of the head wearable device 1102. The head wearable device 1102 also includes an image display driver 1108, an image processor 1110, low power circuitry 1126 for low power consumption, and high speed circuitry 1118. The image displays 1104 of the optical assembly are used to present images and video, including images that may include a graphical user interface, to a user of the head wearable device 1102.
The image display driver 1108 commands and controls the image displays of the image display 1104 of the optical assembly. The image display driver 1108 may deliver image data directly to the image display of the image display 1104 of the optical assembly for presentation, or may have to convert the image data into a signal or data format suitable for delivery to an image display device. For example, the image data may be video data formatted according to a compression format such as H.264 (MPEG-4 Part 10), HEVC, Theora, Dirac, RealVideo RV40, VP8, or VP9, and the still image data may be formatted according to a compression format such as Portable Network Graphics (PNG), Joint Photographic Experts Group (JPEG), Tagged Image File Format (TIFF), or Exchangeable Image File Format (Exif).
As described above, the head wearable device 1102 includes a frame and a handle (or temple) extending from a side of the frame. The head wearable apparatus 1102 also includes a user input device 1106 (e.g., a touch sensor or push button) including an input surface on the head wearable apparatus 1102. A user input device 1106 (e.g., a touch sensor or press button) is used to receive input selections from a user that manipulate a graphical user interface of the presented image.
The components for the head wearable device 1102 shown in fig. 11 are located on one or more circuit boards (e.g., PCBs or flexible PCBs) in the bezel or temple. Alternatively or additionally, the depicted components may be located in a block, frame, hinge, or bridge of the head wearable device 1102. The left and right sides may include digital camera elements such as Complementary Metal Oxide Semiconductor (CMOS) image sensors, charge coupled devices, camera lenses, or any other corresponding visible light or light capturing element that may be used to capture data, including images of a scene with an unknown object.
The head wearable device 1102 includes a memory 1122, the memory 1122 storing instructions to perform a subset or all of the functions described herein. Memory 1122 may also include a storage device.
As shown in fig. 11, the high speed circuitry 1118 includes a high speed processor 1120, a memory 1122, and high speed wireless circuitry 1124. In this example, the image display driver 1108 is coupled to the high speed circuitry 1118 and operated by the high speed processor 1120 in order to drive the left and right image displays of the image display 1104 of the optical assembly. The high speed processor 1120 may be any processor capable of managing high speed communications and the operation of any general computing system needed for the head wearable device 1102. The high speed processor 1120 includes the processing resources needed to manage high speed data transfers over communication 1136 to a wireless local area network (WLAN) using the high speed wireless circuitry 1124. In some examples, the high speed processor 1120 executes an operating system (e.g., the LINUX operating system or another such operating system of the head wearable device 1102), and the operating system is stored in the memory 1122 for execution. In addition to any other responsibilities, the high speed processor 1120 executing the software architecture of the head wearable device 1102 is used to manage data transfers with the high speed wireless circuitry 1124. In some examples, the high speed wireless circuitry 1124 is configured to implement the Institute of Electrical and Electronics Engineers (IEEE) 802.11 communication standard (also referred to herein as Wi-Fi). In other examples, other high speed communication standards may be implemented by the high speed wireless circuitry 1124.
The low power wireless circuitry 1130 and the high speed wireless circuitry 1124 of the head wearable device 1102 may include short-range transceivers (Bluetooth™) and wireless wide area network, local area network, or wide area network transceivers (e.g., cellular or Wi-Fi). The client device 1138, including the transceivers that communicate via communication 1134 and communication 1136, may be implemented using details of the architecture of the head wearable device 1102, as may other elements of the network 1140.
The memory 1122 includes any storage device capable of storing various data and applications, including camera data generated by the left and right infrared cameras 1116 and the image processor 1110, images generated by the image display driver 1108 on the image displays of the image display 1104 of the optical assembly, and the like. Although the memory 1122 is shown as integrated with the high speed circuitry 1118, in other examples the memory 1122 may be a separate standalone element of the head wearable device 1102. In some such examples, electrical routing lines may provide a connection from the image processor 1110 or the low power processor 1128 to the memory 1122 through a chip that includes the high speed processor 1120. In other examples, the high speed processor 1120 may manage addressing of the memory 1122 such that the low power processor 1128 will enable the high speed processor 1120 any time a read or write operation involving the memory 1122 is needed.
As shown in fig. 11, a low power processor 1128 or a high speed processor 1120 of the head wearable device 1102 may be coupled to an image capture device (visible light image capture device 1112; infrared emitter 1114 or infrared image capture device 1116), an image display driver 1108, a user input device 1106 (e.g., a touch sensor or push button), and a memory 1122.
The head wearable device 1102 is connected to a host computer. For example, head wearable device 1102 is paired with client device 1138 via communication 1136 or connected to server system 1132 via network 1140. For example, the server system 1132 may be one or more computing devices that are part of a service or network computing system that includes a processor, memory, and a network communication interface to communicate with the client device 1138 and the head wearable 1102 over the network 1140.
Client device 1138 includes a processor and a network communication interface coupled to the processor. The network communication interface allows communication through network 1140, communication 1134, or communication 1136. The client device 1138 may also store at least a portion of the instructions for generating binaural audio content in a memory of the client device 1138 to implement the functionality described herein.
The output components of the head wearable device 1102 include visual components, such as a display (e.g., a liquid crystal display (LCD), a plasma display panel (PDP), a light emitting diode (LED) display, a projector, or a waveguide). The image display of the optical assembly is driven by the image display driver 1108. The output components of the head wearable device 1102 also include acoustic components (e.g., speakers), haptic components (e.g., vibration motors), other signal generators, and so forth. The input components of the head wearable device 1102, the client device 1138, and the server system 1132 (e.g., the user input device 1106) may include alphanumeric input components (e.g., a keyboard, a touch screen configured to receive alphanumeric input, an optoelectronic keyboard, or other alphanumeric input components), point-based input components (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, or other pointing instruments), tactile input components (e.g., physical buttons, a touch screen providing the location and force of touches or touch gestures, or other tactile input components), audio input components (e.g., a microphone), and the like.
The head wearable device 1102 may optionally include additional peripheral elements. Such peripheral elements may include biometric sensors, additional sensors, or display elements integrated with the head wearable device 1102. For example, a peripheral element may include any I/O component, including an output component, a motion component, a positioning component, or any other such element described herein.
For example, the biometric components include components to detect expressions (e.g., hand expressions, facial expressions, vocal expressions, body gestures, or eye tracking), measure biological signals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), identify a person (e.g., voice recognition, retinal recognition, facial recognition, fingerprint recognition, or electroencephalogram-based recognition), and the like. The motion components include acceleration sensor components (e.g., accelerometers), gravity sensor components, rotation sensor components (e.g., gyroscopes), and so forth. The positioning components include location sensor components (e.g., a Global Positioning System (GPS) receiver component) for generating position coordinates, Wi-Fi or Bluetooth™ transceivers for generating positioning system coordinates, altitude sensor components (e.g., altimeters or barometers that detect barometric pressure from which altitude may be derived), orientation sensor components (e.g., magnetometers), and so forth. Such positioning system coordinates can also be received from the client device 1138 via communication 1136 through the low power wireless circuitry 1130 or the high speed wireless circuitry 1124.
Fig. 12 is a block diagram 1200 illustrating a software architecture 1204, where the software architecture 1204 may be installed on any one or more of the devices described herein. The software architecture 1204 is supported by hardware, such as a machine 1202, the machine 1202 including a processor 1220, memory 1226 and I/O components 1238. In this example, the software architecture 1204 may be conceptualized as a stack of layers, with each layer providing a particular function. The software architecture 1204 includes layers such as an operating system 1212, libraries 1210, frameworks 1208, and applications 1206. In operation, application 1206 calls API call 1250 through the software stack and receives message 1252 in response to API call 1250.
Operating system 1212 manages hardware resources and provides common services. Operating system 1212 includes, for example, kernel 1214, services 1216, and drivers 1222. The kernel 1214 serves as an abstraction layer between the hardware and other software layers. For example, the kernel 1214 provides memory management, processor management (e.g., scheduling), component management, networking, and security settings, among other functions. Service 1216 may provide other common services for other software layers. The driver 1222 is responsible for controlling or interfacing with the underlying hardware. For example, the driver 1222 may include display drivers, imaging device drivers, Bluetooth® or Bluetooth® Low Energy drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, and so forth.
Library 1210 provides the low-level common infrastructure used by applications 1206. Library 1210 may include a system library 1218 (e.g., a C standard library), with system library 1218 providing functions such as memory allocation functions, string manipulation functions, mathematical functions, and the like. In addition, libraries 1210 may include API libraries 1224, such as media libraries (e.g., libraries supporting presentation and manipulation of various media formats, such as Moving Picture Experts Group-4 (MPEG4), Advanced Video Coding (H.264 or AVC), Moving Picture Experts Group Layer-3 (MP3), Advanced Audio Coding (AAC), Adaptive Multi-Rate (AMR) audio codec, Joint Photographic Experts Group (JPEG or JPG), or Portable Network Graphics (PNG)), graphics libraries (e.g., an OpenGL framework used to render two-dimensional (2D) and three-dimensional (3D) graphical content on a display), database libraries (e.g., SQLite for providing various relational database functions), web libraries (e.g., WebKit for providing web browsing functions), and the like. The library 1210 may also include a variety of other libraries 1228 to provide many other APIs to the application 1206.
Framework 1208 provides the high-level public infrastructure used by applications 1206. For example, framework 1208 provides various Graphical User Interface (GUI) functions, advanced resource management, and advanced location services. Framework 1208 may provide a wide variety of other APIs that may be used by applications 1206, some of which may be specific to a particular operating system or platform.
In an example embodiment, the applications 1206 may include a home application 1236, a contacts application 1230, a browser application 1232, a book-viewer application 1234, a location application 1242, a media application 1244, a messaging application 1246, a gaming application 1248, and a broad assortment of other applications such as a third-party application 1240. The applications 1206 are programs that execute functions defined in the programs. One or more of the variously structured applications 1206 may be created using a variety of programming languages, such as object-oriented programming languages (e.g., Objective-C, Java, or C++) or procedural programming languages (e.g., C or assembly language). In a specific example, the third-party application 1240 (e.g., an application developed using the ANDROID™ or IOS™ Software Development Kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as IOS™, ANDROID™, WINDOWS® Phone, or another mobile operating system. In this example, the third-party application 1240 may invoke the API calls 1250 provided by the operating system 1212 to facilitate the functionality described herein.
Fig. 13 is a diagrammatic representation of machine 1300 within which instructions 1308 (e.g., software, programs, applications, applets, apps, or other executable code) for causing machine 1300 to perform any one or more of the methods discussed herein may be executed. For example, the instructions 1308 may cause the machine 1300 to perform any one or more of the methods described herein. The instructions 1308 transform a generic, un-programmed machine 1300 into a specific machine 1300 that is programmed to perform the functions described and illustrated in the manner described. The machine 1300 may operate as a standalone device or may be coupled (e.g., networked) to other machines. In a networked deployment, the machine 1300 may operate in the capacity of a server machine or a client machine in server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. Machine 1300 may include, but is not limited to: a server computer, a client computer, a Personal Computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a PDA, an entertainment media system, a cellular telephone, a smart phone, a mobile device, a wearable device (e.g., a smart watch), a smart home device (e.g., a smart appliance), other smart device, a network router, a network switch, a network bridge, or any machine capable of executing instructions 1308 that specify actions to be taken by machine 1300, sequentially or otherwise. Furthermore, while only a single machine 1300 is illustrated, the term "machine" shall also be taken to include a collection of machines that individually or jointly execute instructions 1308 to perform any one or more of the methodologies discussed herein.
The machine 1300 may include processors 1302, memory 1304, and I/O components 1342, which may be configured to communicate with each other via a bus 1344. In an example embodiment, the processors 1302 (e.g., a Central Processing Unit (CPU), a Reduced Instruction Set Computing (RISC) processor, a Complex Instruction Set Computing (CISC) processor, a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an ASIC, a Radio Frequency Integrated Circuit (RFIC), another processor, or any suitable combination thereof) may include, for example, a processor 1306 and a processor 1310 that execute the instructions 1308. The term "processor" is intended to include multi-core processors, which may comprise two or more independent processors (sometimes referred to as "cores") that may execute instructions concurrently. Although FIG. 13 shows multiple processors 1302, the machine 1300 may include a single processor with a single core, a single processor with multiple cores (e.g., a multi-core processor), multiple processors with a single core, multiple processors with multiple cores, or any combination thereof.
The memory 1304 includes a main memory 1312, a static memory 1314, and a storage unit 1316, each accessible to the processors 1302 via the bus 1344. The main memory 1312, the static memory 1314, and the storage unit 1316 store the instructions 1308 embodying any one or more of the methodologies or functions described herein. The instructions 1308 may also reside, completely or partially, within the main memory 1312, within the static memory 1314, within a machine-readable medium 1318 within the storage unit 1316, within at least one of the processors 1302 (e.g., within a processor's cache memory), or within any suitable combination thereof, during execution thereof by the machine 1300.
The I/O components 1342 may include a wide variety of components to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on. The specific I/O components 1342 that are included in a particular machine will depend on the type of machine. For example, a portable machine such as a mobile phone may include a touch input device or other such input mechanisms, while a headless server machine would be unlikely to include such a touch input device. It will be appreciated that the I/O components 1342 may include many other components that are not shown in FIG. 13. In various example embodiments, the I/O components 1342 may include output components 1328 and input components 1330. The output components 1328 may include visual components (e.g., a display such as a Plasma Display Panel (PDP), a Light Emitting Diode (LED) display, a Liquid Crystal Display (LCD), a projector, or a Cathode Ray Tube (CRT)), acoustic components (e.g., speakers), haptic components (e.g., a vibration motor, a resistance mechanism), other signal generators, and so forth. The input components 1330 may include alphanumeric input components (e.g., a keyboard, a touch screen configured to receive alphanumeric input, a photo-optical keyboard, or other alphanumeric input components), point-based input components (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, or another pointing instrument), tactile input components (e.g., a physical button, a touch screen that provides the location and/or force of touches or touch gestures, or other tactile input components), audio input components (e.g., a microphone), and the like.
In other example embodiments, the I/O components 1342 may include biometric components 1332, motion components 1334, environmental components 1336, or positioning components 1338, among various other components. For example, biometric component 1332 includes components for detecting expressions (e.g., hand expressions, facial expressions, voice expressions, body gestures, or eye tracking), measuring biological signals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), identifying a person (e.g., voice recognition, retinal recognition, facial recognition, fingerprint recognition, or electroencephalogram-based recognition), and the like. The motion component 1334 includes an acceleration sensor component (e.g., accelerometer), a gravity sensor component, a rotation sensor component (e.g., gyroscope), and the like. Environmental component 1336 includes, for example, an illumination sensor component (e.g., a photometer), a temperature sensor component (e.g., one or more thermometers that detect ambient temperature), a humidity sensor component, a pressure sensor component (e.g., a barometer), an auditory sensor component (e.g., one or more microphones that detect background noise), a proximity sensor component (e.g., an infrared sensor that detects nearby objects), a gas sensor (e.g., a gas detection sensor that detects a concentration of hazardous gas to ensure safety or to measure contaminants in the atmosphere), or other component that may provide an indication, measurement, or signal corresponding to the surrounding physical environment. The positioning component 1338 includes a position sensor component (e.g., a GPS receiver component), an altitude sensor component (e.g., an altimeter or barometer that detects barometric pressure from which altitude may be derived), an orientation sensor component (e.g., a magnetometer), and so forth.
Communication may be implemented using a wide variety of technologies. The I/O components 1342 further include communication components 1340 operable to couple the machine 1300 to a network 1320 or to devices 1322 via a coupling 1324 and a coupling 1326, respectively. For example, the communication components 1340 may include a network interface component or another suitable device to interface with the network 1320. In further examples, the communication components 1340 may include wired communication components, wireless communication components, cellular communication components, Near Field Communication (NFC) components, Bluetooth® components (e.g., Bluetooth® Low Energy), Wi-Fi® components, and other communication components to provide communication via other modalities. The devices 1322 may be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via USB).
Moreover, the communication components 1340 may detect identifiers or include components operable to detect identifiers. For example, the communication components 1340 may include Radio Frequency Identification (RFID) tag reader components, NFC smart tag detection components, optical reader components (e.g., an optical sensor to detect one-dimensional barcodes such as Universal Product Code (UPC) barcodes, multi-dimensional barcodes such as Quick Response (QR) codes, Aztec codes, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, UCC RSS-2D barcodes, and other optical codes), or acoustic detection components (e.g., microphones to identify tagged audio signals). In addition, a variety of information may be derived via the communication components 1340, such as location via Internet Protocol (IP) geolocation, location via Wi-Fi® signal triangulation, location via detecting an NFC beacon signal that may indicate a particular location, and so forth.
The various memories (e.g., memory 1304, main memory 1312, static memory 1314, and/or memory of processor 1302) and/or storage unit 1316 may store one or more sets of instructions and data structures (e.g., software) embodying or used by any one or more of the methods or functions described herein. These instructions (e.g., instructions 1308), when executed by the processor 1302, cause various operations to implement the disclosed embodiments.
The instructions 1308 may be transmitted or received over the network 1320 via a network interface device (e.g., a network interface component included in the communications component 1340) using a transmission medium and using any of a number of well-known transmission protocols (e.g., hypertext transfer protocol (HTTP)). Similarly, instructions 1308 may be sent or received to device 1322 via coupling 1326 (e.g., a peer-to-peer coupling) using a transmission medium.
Although embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader scope of the disclosure. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The accompanying drawings that form a part hereof show by way of illustration, and not of limitation, specific embodiments in which the subject matter may be practiced. The embodiments shown are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed herein. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The present embodiments are, therefore, not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.
These embodiments of the inventive subject matter may be referred to, individually and/or collectively, herein by the term "invention" merely for convenience and without intending to voluntarily limit the scope of this application to any single invention or inventive concept if more than one is in fact disclosed. Thus, although specific embodiments have been illustrated and described herein, it should be appreciated that any arrangement calculated to achieve the same purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any and all adaptations or variations of various embodiments. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description.
The Abstract of the Disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.
Examples
Example 1 is a method for minimizing delay of a moving object, comprising: determining an initial pose of a visual tracking device; identifying an initial position of an object in an image generated by an optical sensor of the visual tracking device, the image corresponding to the initial pose of the visual tracking device; rendering virtual content based on the initial pose and the initial position of the object; retrieving an updated pose of the visual tracking device; tracking an updated position of the object in an updated image corresponding to the updated pose; and applying a time warp transformation to the rendered virtual content based on the updated pose and the updated position of the object to generate transformed virtual content.
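As an illustrative, non-limiting sketch of Example 1, the device-pose correction can be expressed as a rotation-only reprojection homography and the moving-object correction as an additional image-space translation. The NumPy formulation and all names below are assumptions made for illustration; a production system would typically apply the equivalent transform in a GPU pass immediately before the frame is displayed.

```python
import numpy as np

def rotation_warp_homography(K, R_render, R_display):
    """Rotation-only time warp: maps pixels rendered under world-to-camera
    rotation R_render to where they should appear under R_display."""
    R_delta = R_display @ R_render.T          # render camera -> display camera
    return K @ R_delta @ np.linalg.inv(K)     # 3x3 image-space homography

def late_warp_transform(K, R_render, R_display, obj_px_render, obj_px_updated):
    """Combine the device-pose correction with a per-object shift so that the
    rendered virtual content lands on the object's newest tracked position."""
    H = rotation_warp_homography(K, R_render, R_display)
    # Where the pose-only warp would place the object's render-time pixel ...
    p = H @ np.array([obj_px_render[0], obj_px_render[1], 1.0])
    p = p[:2] / p[2]
    # ... versus where the tracker now reports it; the residual becomes an
    # extra translation applied on top of the global warp.
    residual = np.asarray(obj_px_updated, dtype=float) - p
    T = np.array([[1.0, 0.0, residual[0]],
                  [0.0, 1.0, residual[1]],
                  [0.0, 0.0, 1.0]])
    return T @ H   # transform applied to the rendered virtual content
```

The resulting 3x3 matrix plays the role of the time warp transformation of Example 1: it is computed from the updated pose and the updated object position and is applied to the already-rendered virtual content rather than triggering a new render.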
Example 2 includes the method of example 1, further comprising: generating the virtual content using an AR application of the visual tracking device; rendering the virtual content using a rendering engine of a graphics processing unit at the visual tracking device; and displaying the transformed virtual content in a display of the visual tracking device.
Example 3 includes the method of example 1, wherein the object comprises a face, a body part, or a physical object.
Example 4 includes the method of example 1, wherein the object comprises an animated virtual object, wherein identifying the initial position of the object in the image comprises identifying an initial position of the animated virtual object based on an animation behavior of the animated virtual object, and wherein tracking the updated position of the object further comprises tracking an updated position of the animated virtual object based on the animation behavior of the animated virtual object and the updated pose.
Example 5 includes the method of example 4, wherein the animation behavior describes a predefined motion path of the virtual object.
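As a non-limiting illustration of Examples 4 and 5, the updated position of an animated virtual object can be evaluated analytically from its predefined motion path at the expected display time. The waypoint representation, looping behavior, and timing parameters below are assumptions for the sketch only.

```python
import numpy as np

def position_on_path(path_points, t, period):
    """Evaluate a predefined, looping motion path at time t by linearly
    interpolating between waypoints (an illustrative animation behavior)."""
    path = np.asarray(path_points, dtype=float)      # (N, 2) or (N, 3) waypoints
    s = (t % period) / period * (len(path) - 1)      # fractional index along the path
    i = int(np.floor(s))
    frac = s - i
    j = min(i + 1, len(path) - 1)
    return (1.0 - frac) * path[i] + frac * path[j]

# Render-time versus expected display-time position of the animated virtual object:
p_render  = position_on_path([(0, 0), (10, 0), (10, 10)], t=0.030, period=2.0)
p_display = position_on_path([(0, 0), (10, 0), (10, 10)], t=0.045, period=2.0)
```

The difference between the two samples is the object-motion correction that the late warp applies on top of the device-pose correction.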
Example 6 includes the method of example 1, wherein retrieving the updated pose of the visual tracking device is based on inertial sensor data of the visual tracking device.
Example 7 includes the method of example 6, wherein the inertial sensor data includes angular velocity data of the visual tracking device between the initial pose and the updated pose.
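For Examples 6 and 7, the updated pose can be predicted by integrating the gyroscope angular velocity over the render-to-display interval and composing it with the initial pose. The sketch below assumes a body-frame angular velocity and a rotation-matrix pose representation; both conventions, like the function names, are assumptions for illustration.

```python
import numpy as np

def propagate_pose(R_initial, omega, dt):
    """Rotate the initial pose by the gyroscope angular velocity (rad/s)
    integrated over dt seconds, using Rodrigues' rotation formula."""
    angle = np.linalg.norm(omega) * dt
    if angle < 1e-9:
        return R_initial
    x, y, z = np.asarray(omega, dtype=float) / np.linalg.norm(omega)
    K = np.array([[0.0, -z,   y],
                  [z,   0.0, -x],
                  [-y,  x,   0.0]])               # skew-symmetric cross-product matrix
    R_delta = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    return R_initial @ R_delta                    # updated pose used by the late warp
```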
Example 8 includes the method of example 1, wherein the object comprises a first virtual object and a second virtual object, wherein identifying the initial position of the object further comprises: identifying an initial position of the first virtual object; and identifying an initial position of the second virtual object; wherein rendering the virtual content further comprises: rendering a first layer of virtual content based on the initial pose and the initial position of the first virtual object; and rendering a second layer of virtual content based on the initial pose and the initial position of the second virtual object; and wherein tracking the updated position of the object further comprises: identifying an updated position of the first virtual object based on the updated pose and a first animation behavior of the first virtual object; and identifying an updated position of the second virtual object based on the updated pose and a second animation behavior of the second virtual object.
Example 9 includes the method of example 8, wherein applying the time warp transform to the rendered virtual content further comprises: applying a time warp transform to the first layer; applying a time warp transform to the second layer; and combining the first layer and the second layer in a single rendered frame.
Example 10 includes the method of example 9, further comprising: displaying the single rendered frame in a display of the visual tracking device.
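As a non-limiting sketch of Examples 8 through 10, each virtual object can be rendered into its own layer, each layer warped with its own correction, and the warped layers then composited into a single frame for display. The integer pixel shift standing in for the full per-layer time warp and the premultiplied-alpha RGBA layout are both simplifying assumptions.

```python
import numpy as np

def shift_layer(layer_rgba, shift_xy):
    """Placeholder per-layer warp: an integer pixel shift standing in for the
    per-layer time warp transformation."""
    dy, dx = int(round(shift_xy[1])), int(round(shift_xy[0]))
    return np.roll(layer_rgba, shift=(dy, dx), axis=(0, 1))

def compose_layers(layers, shifts):
    """Warp each independently rendered layer with its own correction, then
    composite back-to-front (premultiplied-alpha "over") into one frame."""
    frame = np.zeros_like(np.asarray(layers[0], dtype=float))
    for layer, shift in zip(layers, shifts):      # layers ordered back-to-front
        warped = shift_layer(np.asarray(layer, dtype=float), shift)
        frame = warped + frame * (1.0 - warped[..., 3:4])
    return frame                                  # single rendered frame for display
```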
Example 11 is a computing device, comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the device to perform operations comprising: determining an initial pose of a visual tracking device; identifying an initial position of an object in an image generated by an optical sensor of the visual tracking device, the image corresponding to the initial pose of the visual tracking device; rendering virtual content based on the initial pose and the initial position of the object; retrieving an updated pose of the visual tracking device; tracking an updated position of the object in an updated image corresponding to the updated pose; and applying a time warp transformation to the rendered virtual content based on the updated pose and the updated position of the object to generate transformed virtual content.
Example 12 includes the computing device of example 11, wherein the instructions further configure the device to: generate the virtual content using an AR application of the visual tracking device; render the virtual content using a rendering engine of a graphics processing unit at the visual tracking device; and display the transformed virtual content in a display of the visual tracking device.
Example 13 includes the computing device of example 11, wherein the object comprises a face, a body part, or a physical object.
Example 14 includes the computing device of example 11, wherein the object comprises an animated virtual object, wherein identifying the initial position of the object in the image comprises identifying an initial position of the animated virtual object based on an animation behavior of the animated virtual object, and wherein tracking the updated position of the object further comprises tracking an updated position of the animated virtual object based on the animation behavior of the animated virtual object and the updated pose.
Example 15 includes the computing device of example 14, wherein the animation behavior describes a predefined motion path of the virtual object.
Example 16 includes the computing device of example 11, wherein retrieving the updated pose of the visual tracking device is based on inertial sensor data of the visual tracking device.
Example 17 includes the computing device of example 16, wherein the inertial sensor data includes angular velocity data of the visual tracking device between the initial pose and the updated pose.
Example 18 includes the computing device of example 11, wherein the object comprises a first virtual object and a second virtual object, wherein identifying the initial position of the object further comprises: identifying an initial position of the first virtual object; and identifying an initial position of the second virtual object; wherein rendering the virtual content further comprises: rendering a first layer of virtual content based on the initial pose and the initial position of the first virtual object; and rendering a second layer of virtual content based on the initial pose and the initial position of the second virtual object; and wherein tracking the updated position of the object further comprises: identifying an updated position of the first virtual object based on the updated pose and a first animation behavior of the first virtual object; and identifying an updated position of the second virtual object based on the updated pose and a second animation behavior of the second virtual object.
Example 19 includes the computing device of example 18, wherein applying the time warp transform to the rendered virtual content further comprises: applying a time warp transform to the first layer; applying a time warp transform to the second layer; combining the first layer and the second layer in a single rendered frame; and displaying the single rendered frame in a display of the visual tracking device.
Example 20 is a non-transitory computer-readable storage medium comprising instructions that, when executed by a computer, cause the computer to: determine an initial pose of a visual tracking device; identify an initial position of an object in an image generated by an optical sensor of the visual tracking device, the image corresponding to the initial pose of the visual tracking device; render virtual content based on the initial pose and the initial position of the object; retrieve an updated pose of the visual tracking device; track an updated position of the object in an updated image corresponding to the updated pose; and apply a time warp transformation to the rendered virtual content based on the updated pose and the updated position of the object to generate transformed virtual content.

Claims (20)

1. A method for minimizing delay of moving objects, comprising:
determining an initial pose of a visual tracking device;
identifying an initial position of an object in an image generated by an optical sensor of the visual tracking device, the image corresponding to the initial pose of the visual tracking device;
rendering virtual content based on the initial pose and the initial position of the object;
retrieving an updated pose of the visual tracking device;
tracking an updated position of the object in an updated image corresponding to the updated pose; and
applying a time warp transformation to the rendered virtual content based on the updated pose and the updated position of the object to generate transformed virtual content.
2. The method of claim 1, further comprising:
generating the virtual content using an AR application of the visual tracking device;
rendering the virtual content using a rendering engine of a graphics processing unit at the visual tracking device; and
displaying the transformed virtual content in a display of the visual tracking device.
3. The method of claim 1, wherein the object comprises a face, a body part, or a physical object.
4. The method of claim 1, wherein the object comprises an animated virtual object,
wherein identifying the initial position of the object in the image comprises identifying an initial position of the animated virtual object based on an animation behavior of the animated virtual object,
wherein tracking the updated position of the object further comprises tracking an updated position of the animated virtual object based on the animation behavior of the animated virtual object and the updated pose.
5. The method of claim 4, wherein the animation behavior describes a predefined motion path of the virtual object.
6. The method of claim 1, wherein retrieving the updated pose of the visual tracking device is based on inertial sensor data of the visual tracking device.
7. The method of claim 6, wherein the inertial sensor data includes angular velocity data of the visual tracking device between the initial pose and the updated pose.
8. The method of claim 1, wherein the objects comprise a first virtual object and a second virtual object,
wherein identifying the initial position of the object further comprises:
identifying an initial position of the first virtual object; and
identifying an initial position of the second virtual object,
wherein rendering the virtual content further comprises:
rendering a first layer of virtual content based on the initial pose and the initial position of the first virtual object; and
rendering a second layer of virtual content based on the initial pose and the initial position of the second virtual object,
wherein tracking the updated position of the object further comprises:
identifying an updated position of the first virtual object based on the updated pose and a first animation behavior of the first virtual object; and
identifying an updated position of the second virtual object based on the updated pose and a second animation behavior of the second virtual object.
9. The method of claim 8, wherein applying the time warp transformation to the rendered virtual content further comprises:
applying the time warp transformation to the first layer;
applying the time warp transformation to the second layer; and
combining the first layer and the second layer in a single rendered frame.
10. The method of claim 9, further comprising:
displaying the single rendered frame in a display of the visual tracking device.
11. A computing device, comprising:
a processor; and
a memory storing instructions that, when executed by the processor, configure the device to perform operations comprising:
determining an initial pose of a visual tracking device;
identifying an initial position of an object in an image generated by an optical sensor of the visual tracking device, the image corresponding to the initial pose of the visual tracking device;
rendering virtual content based on the initial pose and the initial position of the object;
retrieving an updated pose of the visual tracking device;
tracking an updated position of the object in an updated image corresponding to the updated pose; and
applying a time warp transformation to the rendered virtual content based on the updated pose and the updated position of the object to generate transformed virtual content.
12. The computing device of claim 11, wherein the instructions further configure the device to:
generate the virtual content using an AR application of the visual tracking device;
render the virtual content using a rendering engine of a graphics processing unit at the visual tracking device; and
display the transformed virtual content in a display of the visual tracking device.
13. The computing device of claim 11, wherein the object comprises a face, a body part, or a physical object.
14. The computing device of claim 11, wherein the object comprises an animated virtual object,
wherein identifying the initial position of the object in the image comprises identifying an initial position of the animated virtual object based on an animation behavior of the animated virtual object,
wherein tracking the updated position of the object further comprises tracking an updated position of the animated virtual object based on the animation behavior of the animated virtual object and the updated pose.
15. The computing device of claim 14, wherein the animation behavior describes a predefined motion path of the virtual object.
16. The computing device of claim 11, wherein retrieving the updated pose of the visual tracking device is based on inertial sensor data of the visual tracking device.
17. The computing device of claim 16, wherein the inertial sensor data comprises angular velocity data of the visual tracking device between the initial pose and the updated pose.
18. The computing device of claim 11, wherein the object comprises a first virtual object and a second virtual object,
wherein identifying the initial position of the object further comprises:
identifying an initial position of the first virtual object; and
identifying an initial position of the second virtual object,
wherein rendering the virtual content further comprises:
rendering a first layer of virtual content based on the initial pose and the initial position of the first virtual object; and
rendering a second layer of virtual content based on the initial pose and the initial position of the second virtual object,
wherein tracking the updated position of the object further comprises:
identifying an updated position of the first virtual object based on the updated pose and a first animation behavior of the first virtual object; and
identifying an updated position of the second virtual object based on the updated pose and a second animation behavior of the second virtual object.
19. The computing device of claim 18, wherein applying the time warp transformation to the rendered virtual content further comprises:
applying the time warp transformation to the first layer;
applying the time warp transformation to the second layer;
combining the first layer and the second layer in a single rendered frame; and
displaying the single rendered frame in a display of the visual tracking device.
20. A non-transitory computer-readable storage medium comprising instructions that, when executed by a computer, cause the computer to:
determine an initial pose of a visual tracking device;
identify an initial position of an object in an image generated by an optical sensor of the visual tracking device, the image corresponding to the initial pose of the visual tracking device;
render virtual content based on the initial pose and the initial position of the object;
retrieve an updated pose of the visual tracking device;
track an updated position of the object in an updated image corresponding to the updated pose; and
apply a time warp transformation to the rendered virtual content based on the updated pose and the updated position of the object to generate transformed virtual content.
CN202280036131.8A 2021-05-18 2022-05-16 Post-warping to minimize delays in moving objects Pending CN117321472A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/190,119 2021-05-18
US17/518,828 2021-11-04
US17/518,828 US20220375026A1 (en) 2021-05-18 2021-11-04 Late warping to minimize latency of moving objects
PCT/US2022/072341 WO2022246384A1 (en) 2021-05-18 2022-05-16 Late warping to minimize latency of moving objects

Publications (1)

Publication Number Publication Date
CN117321472A (en) 2023-12-29

Family

ID=89262390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280036131.8A Pending CN117321472A (en) 2021-05-18 2022-05-16 Post-warping to minimize delays in moving objects

Country Status (1)

Country Link
CN (1) CN117321472A (en)

Similar Documents

Publication Publication Date Title
US20240031678A1 (en) Pose tracking for rolling shutter camera
US20230300464A1 (en) Direct scale level selection for multilevel feature tracking under motion blur
US11615506B2 (en) Dynamic over-rendering in late-warping
EP4342170A1 (en) Selective image pyramid computation for motion blur mitigation
WO2022245648A1 (en) Dynamic adjustment of exposure and iso related application
US20240029197A1 (en) Dynamic over-rendering in late-warping
US11683585B2 (en) Direct scale level selection for multilevel feature tracking under motion blur
US20220375110A1 (en) Augmented reality guided depth estimation
WO2022245815A1 (en) Dynamic initialization of 3dof ar tracking system
US20220375026A1 (en) Late warping to minimize latency of moving objects
CN117321472A (en) Post-warping to minimize delays in moving objects
US11941184B2 (en) Dynamic initialization of 3DOF AR tracking system
US20230154044A1 (en) Camera intrinsic re-calibration in mono visual tracking system
US20230421717A1 (en) Virtual selfie stick
EP4341742A1 (en) Late warping to minimize latency of moving objects
CN117425869A (en) Dynamic over-rendering in post-distortion
CN117321546A (en) Depth estimation for augmented reality guidance
EP4341786A1 (en) Augmented reality guided depth estimation
CN117337575A (en) Selective image pyramid computation for motion blur mitigation
WO2024086538A1 (en) Sign language interpretation with collaborative agents
CN117337422A (en) Dynamic initialization of three-degree-of-freedom augmented reality tracking system
CN117321635A (en) Direct scale level selection for multi-level feature tracking
CN117321634A (en) Stereoscopic depth warping correction
CN117441343A (en) Related applications of dynamic adjustment of exposure and ISO
CN117356089A (en) Intrinsic parameter estimation in a visual tracking system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination