CN117337575A - Selective image pyramid computation for motion blur mitigation - Google Patents

Selective image pyramid computation for motion blur mitigation

Info

Publication number
CN117337575A
Authority
CN
China
Prior art keywords
image
optical sensor
motion blur
motion
tracking system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280035656.XA
Other languages
Chinese (zh)
Inventor
奥拉·博里什
马蒂亚斯·卡尔格鲁伯
丹尼尔·沃尔夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Snap Inc
Original Assignee
Snap Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Priority claimed from US 17/521,081 (published as US 2022/0375041 A1)
Application filed by Snap Inc
Priority claimed from PCT/US2022/029629 (published as WO 2022/245821 A1)
Publication of CN117337575A
Legal status: Pending


Abstract

Methods for mitigating motion blur in a visual tracking system are described. In one aspect, a method for selective motion blur mitigation in a visual tracking system includes: accessing a first image generated by an optical sensor of the visual tracking system; identifying a camera operating parameter of the optical sensor during generation of the first image by the optical sensor; determining a motion of the optical sensor during generation of the first image by the optical sensor; determining a motion blur level of the first image based on the camera operating parameter of the optical sensor and the motion of the optical sensor; and determining, based on the motion blur level, whether to use a pyramid computation algorithm to downscale the first image.

Description

Selective image pyramid computation for motion blur mitigation
RELATED APPLICATIONS
The present application claims priority to U.S. patent application Ser. No. 17/521,081, filed November 8, 2021, and to U.S. provisional patent application Ser. No. 63/189,893, filed May 18, 2021, each of which is incorporated herein by reference in its entirety.
Technical Field
The subject matter disclosed herein relates generally to vision tracking systems. In particular, the present disclosure relates to systems and methods for reducing motion blur in a visual tracking system.
Background
Augmented Reality (AR) devices enable a user to view a scene while seeing related virtual content that may be aligned with items, images, objects, or environments in the field of view of the device. Virtual Reality (VR) devices provide a more immersive experience than AR devices. The VR device obscures the user's view with virtual content that is displayed based on the location and orientation of the VR device.
Both AR and VR devices rely on motion tracking systems that track the pose (e.g., orientation, position, location) of the device. A motion tracking system (also referred to as a visual tracking system) tracks this pose using images captured by the optical sensors of the AR/VR device. However, when the AR/VR device moves rapidly, the images may become blurred. High motion blur can therefore degrade tracking performance. Alternatively, high motion blur can require more intensive computation to maintain adequate tracking accuracy and image quality under high dynamics.
Drawings
To facilitate identification of a discussion of any particular element or act, one or more of the highest digit(s) in a reference number refers to the figure number in which that element was first introduced.
FIG. 1 is a block diagram illustrating an environment for operating an AR/VR display device in accordance with one example embodiment.
Fig. 2 is a block diagram illustrating an AR/VR display device in accordance with an example embodiment.
FIG. 3 is a block diagram illustrating a visual tracking system according to an example embodiment.
FIG. 4 is a block diagram illustrating a blur mitigation module according to one example embodiment.
Fig. 5 is a block diagram illustrating a process according to an example embodiment.
Fig. 6 is a flowchart illustrating a method for mitigating motion blur according to an example embodiment.
Fig. 7 is a flowchart illustrating a method for mitigating motion blur according to an example embodiment.
Fig. 8 is a block diagram illustrating a software architecture in which the present disclosure may be implemented, according to an example embodiment.
FIG. 9 is a diagrammatic representation of a machine, in the form of a computer system, within which a set of instructions may be executed to cause the machine to perform any one or more of the methodologies discussed herein, according to an example embodiment.
Fig. 10 illustrates a network environment in which a head wearable device may be implemented, according to one example embodiment.
Detailed Description
The following description describes systems, methods, techniques, sequences of instructions, and computer program products that illustrate example embodiments of the present subject matter. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide an understanding of various embodiments of the present subject matter. It will be apparent, however, to one skilled in the art that embodiments of the present subject matter may be practiced without some or other of these specific details. Examples merely typify possible variations. Structures (e.g., structural components, such as modules) are optional and may be combined or sub-divided, and operations (e.g., in a process, algorithm, or other function) may be varied in sequence or combined or sub-divided, unless explicitly stated otherwise.
The term "augmented reality" (AR) is used herein to refer to an interactive experience in a real-world environment in which physical objects residing in the real world are "augmented" or enhanced by computer-generated digital content (also referred to as virtual content or synthesized content). AR may also refer to a system that is capable of combining real world and virtual world, real-time interactions, and 3D registration of virtual objects and real objects. The user of the AR system perceives virtual content that appears to be connected or interacted with the real world physical object.
As used herein, "virtual reality" (VR) refers to a simulated experience of a virtual world environment that is completely different from a real world environment. Computer-generated digital content is displayed in a virtual world environment. VR also refers to a system that fully immerses a user of the VR system in the virtual world environment and interacts with virtual objects presented in the virtual world environment.
The term "AR application" as used herein refers to a computer-operated application that enables an AR experience. The term "VR application" as used herein refers to a computer-operated application that enables a VR experience. The term "AR/VR application" refers to a computer-operated application that enables an AR experience or a combination of VR experiences.
The term "visual tracking system" as used herein refers to a computer-operated application or system that enables the system to track visual features identified in images captured by one or more cameras of the visual tracking system. The vision tracking system models the real world environment based on the tracked vision features. Non-limiting examples of vision tracking systems include: a visual synchrony positioning and mapping system (VSLAM) and a visual odometer inertial (VIO) system. The VSLAM may be used to construct a target from an environment or scene based on one or more cameras of a visual tracking system. VIOs (also known as visual inertial tracking systems and visual inertial odometry systems) determine the latest pose (e.g., position and orientation) of a device based on data acquired from a plurality of sensors (e.g., optical sensors, inertial sensors) of the device.
The term "inertial measurement unit" (IMU) as used herein refers to a device capable of reporting the inertial state of a mobile body, including acceleration, speed, orientation and positioning of the mobile body. By integrating the acceleration and angular velocity measured by the IMU, the IMU is able to track the movement of the subject. IMU may also refer to a combination of accelerometers and gyroscopes that are capable of determining and quantifying linear acceleration and angular velocity, respectively. Values obtained from the IMU gyroscope may be processed to obtain pitch, roll, and heading of the IMU, thereby obtaining pitch, roll, and heading of a subject associated with the IMU. Signals from the accelerometer of the IMU may also be processed to obtain the velocity and displacement of the IMU.
Both AR and VR applications allow users to access information, for example, in the form of virtual content presented in the display of an AR/VR display device (also referred to as a display device). The presentation of the virtual content may be based on the positioning of the display device relative to a physical object or relative to a frame of reference (external to the display device) so that the virtual content appears correctly in the display. For AR, the virtual content appears to be aligned with a physical object as perceived by the user and by the camera of the AR display device; the virtual content appears to be attached to the physical world (e.g., a physical object of interest). To do this, the AR display device detects the physical object and tracks the pose of the AR display device relative to the position of the physical object. The pose identifies the orientation and position of the display device relative to a frame of reference or relative to another object. For VR, the virtual object appears at a location based on the pose of the VR display device. The virtual content is therefore refreshed based on the latest pose of the device. A visual tracking system at the display device determines the pose of the display device. Examples of visual tracking systems include visual-inertial tracking systems (also referred to as visual odometry systems) that rely on data acquired from multiple sensors (e.g., optical sensors, inertial sensors).
When the camera device is moving rapidly (e.g., rotating rapidly), the images captured by the visual tracking system may be blurred. Motion blur in the images can degrade the tracking performance of the visual tracking system. Alternatively, motion blur may require the visual tracking system to perform more extensive computation to maintain adequate tracking accuracy and image quality under high dynamics.
In particular, visual tracking systems are typically based on image feature matching components. In the incoming video stream, the algorithm detects different 3D points (features) in the image and attempts to find (match) these points again in the subsequent image. In this matching procedure, the first image is referred to herein as a "source image". The second image (e.g., a subsequent image in which the feature needs to be matched) is referred to herein as a "target image".
Reliable feature points are typically detected in high contrast areas (e.g., corners or edges) of the image. However, for a head wearable device with a built-in camera, the camera may move rapidly as the user shakes his/her head, resulting in severe motion blur in the image captured with the built-in camera. Such rapid movement can result in blurring of the high contrast areas. Thus, the feature detection and matching stage of the visual tracking system is negatively affected, as is the overall tracking accuracy of the system.
A common strategy to mitigate motion blur is to perform feature detection and matching on downsampled versions of the source and target images if matching fails in the original image resolution due to motion blur. Although visual information may be lost in the downsampled version of the image, motion blur may be reduced. Thus, feature matching becomes more reliable. Typically, the image is downsampled multiple times to obtain different resolutions for motion blur of different degrees of severity, and the set of all different versions is referred to as an image pyramid. The downscaling (downscaling) process is also referred to as an "image pyramid process" or "image pyramid algorithm". However, the image pyramid process is time consuming and is computationally intensive.
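A minimal sketch of the image pyramid idea described above: the image is repeatedly downscaled, and feature detection and matching can fall back to a coarser level when matching at full resolution fails due to motion blur. The use of OpenCV's cv2.pyrDown and the number of levels are illustrative assumptions.

```python
import cv2

def build_image_pyramid(image, num_levels=3):
    """Return the original image plus progressively downscaled versions.

    Level 0 is the full resolution; each further level halves the width
    and height, which also reduces the apparent motion blur.
    """
    pyramid = [image]
    for _ in range(num_levels):
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    return pyramid
```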
The present application describes a method of reducing motion blur by selectively applying an image pyramid process to selected images (rather than to every captured image). The motion blur mitigation module determines whether to apply the image pyramid process based on an estimated or predicted motion blur level. Using the IMU or the VIO of the visual tracking system, the current or predicted motion blur level can be identified accurately and efficiently without analyzing the content (pixels) of the image; analyzing image pixels is computationally expensive.
The motion blur mitigation module determines to apply the image pyramid process to the current image if (1) the current image includes motion blur and the visual tracking system needs to match features of the image at a lower-scale resolution, or (2) new features in the current image are expected to be matched at a lower resolution in future images, because motion blur may be present in upcoming images in the near future.
In one example embodiment, a method for selective motion blur mitigation in a visual tracking system includes: accessing a first image generated by an optical sensor of the visual tracking system; identifying a camera operating parameter of the optical sensor during generation of the first image by the optical sensor; determining a motion of the optical sensor during generation of the first image by the optical sensor; determining a motion blur level of the first image based on the camera operating parameter of the optical sensor and the motion of the optical sensor; and determining, based on the motion blur level, whether to use a pyramid computation algorithm to downscale the first image.
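One way to obtain such a motion blur level without analyzing pixels is to approximate how many pixels a scene point sweeps across the sensor while the shutter is open, using only the angular velocity and the camera operating parameters. The linear blur model and the threshold below are hedged assumptions for illustration, not the claimed method.

```python
def estimate_motion_blur_px(angular_speed_rad_s, exposure_time_s,
                            horizontal_fov_rad, image_width_px):
    """Approximate rotational motion blur in pixels (pinhole-style model)."""
    swept_angle = angular_speed_rad_s * exposure_time_s  # angle swept during exposure
    pixels_per_radian = image_width_px / horizontal_fov_rad
    return swept_angle * pixels_per_radian

BLUR_THRESHOLD_PX = 2.0  # assumed threshold for "high" motion blur

def should_downscale(angular_speed_rad_s, exposure_time_s,
                     horizontal_fov_rad, image_width_px):
    """Decide whether to apply the pyramid computation to the current image."""
    blur = estimate_motion_blur_px(angular_speed_rad_s, exposure_time_s,
                                   horizontal_fov_rad, image_width_px)
    return blur > BLUR_THRESHOLD_PX
```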
Accordingly, one or more of the methods described herein facilitate solving the technical problem of saving power consumption by applying the computationally intensive image pyramid process only selectively to the current image. The presently described method provides an improvement to the operation of computer functionality by reducing power consumption. As such, one or more of the methods described herein may obviate the need for certain efforts or computing resources. Examples of such computing resources include processor cycles, network traffic, memory usage, data storage capacity, power consumption, network bandwidth, and cooling capacity.
Fig. 1 is a network diagram illustrating an environment 100 suitable for operating an AR/VR display device 106 in accordance with some example embodiments. The environment 100 includes a user 102, an AR/VR display device 106, and a physical object 104. The user 102 operates the AR/VR display device 106. The user 102 may be a human user (e.g., a human), a machine user (e.g., a computer configured by a software program to interact with the AR/VR display device 106), or any suitable combination thereof (e.g., a machine-assisted human or a machine supervised by a human). The user 102 is associated with an AR/VR display device 106.
The AR/VR display device 106 may be a computing device with a display, such as a smartphone, tablet, or wearable computing device (e.g., a watch or glasses). The computing device may be handheld or may be removably mounted to the head of the user 102. In one example, the display includes a screen that displays images captured with the camera of the AR/VR display device 106. In another example, the display of the device may be transparent, such as in the lenses of wearable computing eyewear. In other examples, the display may be opaque, partially transparent, partially opaque. In other examples, the display may be worn by the user 102 to cover the field of view of the user 102.
The AR/VR display device 106 includes an AR application that generates virtual content based on images detected with the camera of the AR/VR display device 106. For example, the user 102 may direct the camera of the AR/VR display device 106 to capture an image of the physical object 104. The AR application generates virtual content corresponding to the identified object (e.g., physical object 104) in the image and presents the virtual content in the display of the AR/VR display device 106.
The AR/VR display device 106 includes a vision tracking system 108. The vision tracking system 108 tracks the pose (e.g., position and orientation) of the AR/VR display device 106 relative to the real world environment 110 using, for example, optical sensors (e.g., depth-enabled 3D cameras, image cameras), inertial sensors (e.g., gyroscopes, accelerometers), wireless sensors (Bluetooth, Wi-Fi), GPS sensors, and audio sensors. In one example, the AR/VR display device 106 displays virtual content based on the pose of the AR/VR display device 106 with respect to the real world environment 110 and/or the physical object 104.
Any of the machines, databases, or devices illustrated in fig. 1 may be implemented in a general-purpose computer that is modified (e.g., configured or programmed) by software to be a special-purpose computer to perform one or more of the functions described herein with respect to the machine, database, or device. For example, a computer system capable of implementing any one or more of the methods described herein is discussed below with reference to fig. 6-7. As used herein, a "database" is a data storage resource and may store data structured as text files, tables, spreadsheets, relational databases (e.g., object-relational databases), triad stores, hierarchical data stores, or any suitable combination thereof. Furthermore, any two or more of the machines, databases, or devices illustrated in fig. 1 may be combined into a single machine, and the functionality described herein with respect to any single machine, database, or device may be subdivided among multiple machines, databases, or devices.
The AR/VR display device 106 may operate over a computer network. The computer network may be any network that enables communication between or among machines, databases, and devices. Thus, the computer network may be a wired network, a wireless network (e.g., a mobile or cellular network), or any suitable combination thereof. The computer network may include one or more portions that constitute a private network, a public network (e.g., the internet), or any suitable combination thereof.
Fig. 2 is a block diagram illustrating modules (e.g., components) of the AR/VR display device 106 in accordance with some example embodiments. The AR/VR display device 106 includes a sensor 202, a display 204, a processor 206, and a storage device 208. Examples of AR/VR display device 106 include a wearable computing device, a mobile computing device, a navigation device, a portable media device, or a smart phone.
The sensors 202 include, for example, optical sensors 214 (e.g., imaging devices such as color imaging devices, thermal imaging devices, depth sensors, and one or more gray scale, global/rolling shutter tracking imaging devices) and inertial sensors 212 (e.g., gyroscopes, accelerometers, magnetometers). Other examples of sensors 202 include proximity or location sensors (e.g., near field communication, GPS, Bluetooth, Wi-Fi), audio sensors (e.g., microphones), thermal sensors, pressure sensors (e.g., barometers), or any suitable combination thereof. Note that the sensors 202 described herein are for illustration purposes, and thus the sensors 202 are not limited to the sensors described above.
The display 204 includes a screen or monitor configured to display images generated by the processor 206. In one example embodiment, the display 204 may be transparent or semi-opaque such that the user 102 may view (in the AR use case) through the display 204. In another example embodiment, the display 204 covers the eyes of the user 102 and obscures the entire field of view of the user 102 (in the VR use case). In another example, the display 204 includes a touch screen display configured to receive user input via contacts on the touch screen display.
The processor 206 includes an AR/VR application 216 and a vision tracking system 210. The AR/VR application 216 uses computer vision to detect and identify the physical environment or the physical object 104. The AR/VR application 216 retrieves virtual content (e.g., a 3D object model) based on the identified physical object 104 or physical environment. The AR/VR application 216 presents the virtual object in the display 204. In one example implementation, the AR/VR application 216 includes a local rendering engine that generates a visualization of virtual content overlaid on (e.g., superimposed upon, or otherwise displayed in coordination with) an image of the physical object 104 captured by the optical sensor 214. The visualization of the virtual content may be manipulated by adjusting the position (e.g., physical location, orientation, or both) of the physical object 104 relative to the AR/VR display device 106. Similarly, the visualization of the virtual content may be manipulated by adjusting the pose of the AR/VR display device 106 relative to the physical object 104. For a VR application, the AR/VR application 216 displays the virtual content in the display 204 at a location (in the display 204) determined based on the pose of the AR/VR display device 106.
The vision tracking system 210 estimates the pose of the AR/VR display device 106. For example, the vision tracking system 210 uses image data from the optical sensor 214 and the inertial sensor 212 and corresponding inertial data to track the position and pose of the AR/VR display device 106 relative to a frame of reference (e.g., the real world environment 110). The vision tracking system 210 will be described in more detail below in conjunction with fig. 3.
The storage device 208 stores virtual content 218. The virtual content 218 includes, for example, a database of visual references (e.g., images of physical objects) and corresponding experiences (e.g., three-dimensional virtual object models).
Any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine) or a combination of hardware and software. For example, any of the modules described herein may configure a processor to perform the operations described herein for that module. Furthermore, any two or more of these modules may be combined into a single module, and the functionality described herein for a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.
Fig. 3 illustrates a visual tracking system 210 according to an example embodiment. The vision tracking system 210 includes an inertial sensor module 302, an optical sensor module 304, a blur reduction module 306, and a pose estimation module 308. The inertial sensor module 302 accesses inertial sensor data from the inertial sensor 212. The optical sensor module 304 accesses optical sensor data (e.g., images, camera settings/operating parameters) from the optical sensor 214. Examples of camera operating parameters include, but are not limited to, the exposure time of the optical sensor 214, the field of view of the optical sensor 214, the ISO value of the optical sensor 214, and the image resolution of the optical sensor 214.
In one example embodiment, the blur reduction module 306 retrieves the angular velocity of the optical sensor 214 based on the IMU sensor data from the inertial sensor 212. The blur reduction module 306 estimates a motion blur level based on the angular velocity and the camera operating parameters without performing any analysis on pixels in the image.
In another example embodiment, the blur reduction module 306 considers both the angular velocity and the linear velocity of the optical sensor 214 in conjunction with the 3D position of the current tracking point in the current image based on the current velocity estimate from the visual tracking system 210. For example, the blur reduction module 306 determines the linear velocity of the optical sensor 214 and the effect of the linear velocity on different areas of the image based on the distance of the object (determined by tracking the 3D position of the feature points in the current image). Thus, objects closer to the optical sensor 214 (when the optical sensor 214 is moved) appear more blurred than objects farther from the optical sensor 214.
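To reflect this, a per-region estimate can weight the translational contribution by the depth of the tracked 3D points in that region, since closer points smear more for the same camera motion. The following sketch is a hedged approximation under a pinhole-camera assumption; the exact formulation used by the blur reduction module is not specified here.

```python
def estimate_region_blur_px(angular_speed_rad_s, linear_speed_m_s,
                            region_depth_m, exposure_time_s, focal_length_px):
    """Approximate blur (pixels) for one image region.

    Rotational blur is depth-independent; translational blur scales
    inversely with the depth of the tracked feature points in the region.
    """
    rotational = angular_speed_rad_s * exposure_time_s * focal_length_px
    translational = (linear_speed_m_s / max(region_depth_m, 1e-6)
                     * exposure_time_s * focal_length_px)
    return rotational + translational
```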
The blur reduction module 306 downscales the image captured by the optical sensor 214 based on the motion blur level. For example, when the blur reduction module 306 determines that the current image is blurred, it applies an image pyramid algorithm to the current image to increase the contrast of the current image.
The pose estimation module 308 determines a pose (e.g., position, location, orientation) of the AR/VR display device 106 relative to a frame of reference (e.g., real world environment 110). In one example implementation, pose estimation module 308 includes a VIO system that estimates a pose of AR/VR display device 106 based on a 3D map of feature points from a current image captured with optical sensor 214 and inertial sensor data captured with inertial sensor 212.
In one example implementation, the pose estimation module 308 calculates the position and orientation of the AR/VR display device 106. The AR/VR display device 106 includes one or more optical sensors 214 and one or more inertial sensors 212 mounted on a rigid platform (the frame of the AR/VR display device 106). The optical sensor 214 may be mounted with non-overlapping (distributed aperture) or overlapping (stereoscopic or more) fields of view.
In some example implementations, the pose estimation module 308 includes an algorithm that combines inertial information from the inertial sensor 212 and image information from the optical sensor 214, where the inertial sensor 212 and the optical sensor 214 are coupled to a rigid platform (e.g., the AR/VR display device 106) or rig. In one embodiment, the rig may consist of multiple cameras mounted on a rigid platform together with an inertial navigation unit (e.g., the inertial sensor 212). A rig may thus have at least one inertial navigation unit and at least one camera device.
FIG. 4 is a block diagram illustrating the blur reduction module 306 according to one example embodiment. The blur reduction module 306 includes a motion blur detection engine 402 and a pyramid computation engine 408. The motion blur detection engine 402 includes a current image module 404 and a future image module 406.
The motion blur detection engine 402 determines a level of motion blur for the image from the optical sensor 214. The current image module 404 estimates a motion blur level of the current image. The future image module 406 estimates a likelihood of motion blur for a subsequent image (e.g., an image subsequent to the current image).
In one example embodiment, the current image module 404 estimates motion blur based on camera operating parameters and the angular velocity of the inertial sensor 212. The current image module 404 retrieves camera operating parameters of the optical sensor 214 from the optical sensor module 304. For example, the camera operating parameters include settings of the optical sensor 214 during capture/exposure of the current image. The current image module 404 also retrieves inertial sensor data from the inertial sensor 212 (where the inertial sensor data is generated during capture/exposure of the current image). The current image module 404 retrieves the angular velocity from the IMU of the inertial sensor module 302. In one example, the current image module 404 samples the angular velocity of the visual tracking system 108 based on inertial sensor data sampled during the exposure time of the current image. In another example, the current image module 404 identifies a maximum angular velocity of the visual tracking system based on inertial sensor data captured during an exposure time of the current image.
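A sketch of how the angular velocity could be sampled over the exposure window of the current image, using a buffer of timestamped gyroscope samples and taking the maximum angular speed; the buffer layout and helper name are assumptions for illustration.

```python
import numpy as np

def max_angular_speed_during_exposure(gyro_buffer, exposure_start_s, exposure_end_s):
    """Peak angular speed (rad/s) observed while the shutter was open.

    gyro_buffer: iterable of (timestamp_s, omega_xyz) tuples from the IMU.
    """
    speeds = [np.linalg.norm(omega)
              for timestamp, omega in gyro_buffer
              if exposure_start_s <= timestamp <= exposure_end_s]
    return max(speeds) if speeds else 0.0
```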
In another example embodiment, the current image module 404 estimates the motion blur based on camera operating parameters, the angular velocity and the linear velocity of the visual tracking system 108. The current image module 404 retrieves the angular velocity determined from the VIO data (from the pose estimation module 308). The current image module 404 retrieves the linear velocity from the visual tracking system 108 (from the VIO data) and estimates the effect of the linear velocity on the motion blur of the regions of the current image (based on the 3D position of the tracked feature points). As described above, the depicted object closer to the optical sensor 214 shows more blur, while the depicted object farther from the optical sensor 214 shows less blur. The pose estimation module 308 tracks the 3D positions of the feature points and calculates the effect of the calculated linear velocity on portions of the current image.
When the current image module 404 determines that the motion blur is high (e.g., exceeds a threshold), the current image module 404 notifies the pyramid computation engine 408 to continue applying the image pyramid algorithm to the current image.
The future image module 406 retrieves inertial sensor data from the inertial sensor 212 (where the inertial sensor data was generated during the capture/exposure time of the current image) and retrieves the camera operating parameters of the optical sensor 214 from the optical sensor module 304. The camera operating parameters include the camera settings (e.g., exposure time, field of view, resolution) of the optical sensor 214 during the capture/exposure of the current image. The future image module 406 estimates the likelihood of motion blur in subsequent images (subsequent to the current image) based on the camera settings and (optionally) the angular velocity. For example, the visual tracking system 108 may be located in a dark environment, and the exposure time of the optical sensor 214 is therefore long. The current image module 404 determines that the motion blur in the current image is below the threshold because the visual tracking system 108 is not moving fast at that time. However, the future image module 406 determines that the exposure time is long (above a preset threshold) and that the likelihood of motion blur (in the next image) will be high if the optical sensor 214 starts to move faster. Thus, the future image module 406 uses the exposure time of the optical sensor 214 as a predictor of the likelihood of future motion blur. In the event that the future image module 406 determines that the likelihood of future motion blur is high, the future image module 406 notifies the pyramid computation engine 408 to continue applying the image pyramid algorithm to the current image.
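The future-image check can be as simple as treating a long exposure time (e.g., in a dark environment) as a warning that fast motion would blur upcoming frames. The score, thresholds, and weighting below are illustrative assumptions rather than the module's actual logic.

```python
EXPOSURE_THRESHOLD_S = 1.0 / 60.0  # assumed "long exposure" threshold
LIKELIHOOD_THRESHOLD = 0.5         # assumed decision threshold

def future_blur_likelihood(exposure_time_s, angular_speed_rad_s,
                           max_expected_speed_rad_s=5.0):
    """Heuristic likelihood in [0, 1] that the next image will be blurred."""
    if exposure_time_s < EXPOSURE_THRESHOLD_S:
        return 0.0
    # Longer exposures leave less headroom before motion causes blur.
    speed_ratio = min(angular_speed_rad_s / max_expected_speed_rad_s, 1.0)
    exposure_ratio = min(exposure_time_s / (2.0 * EXPOSURE_THRESHOLD_S), 1.0)
    return 0.5 * speed_ratio + 0.5 * exposure_ratio

def should_downscale_for_future(exposure_time_s, angular_speed_rad_s):
    """Downscale the current (source) image if future blur is likely."""
    return future_blur_likelihood(exposure_time_s, angular_speed_rad_s) > LIKELIHOOD_THRESHOLD
```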
The pyramid computation engine 408 performs the image pyramid algorithm on the current image to downscale the current image. The pyramid computation engine 408 provides the downscaled image to the pose estimation module 308 for feature matching.
FIG. 5 is a block diagram illustrating an example process according to an example embodiment. The vision tracking system 108 receives sensor data from the sensors 202 to determine the pose of the vision tracking system 108. The blur reduction module 306 estimates the motion blur of the current image based on sensor data (e.g., angular velocity from the IMU) or VIO data (e.g., angular and linear velocity estimates and tracked 3D point locations) from the pose estimation module 308, together with the camera operating parameters associated with the current image (e.g., exposure time, field of view, resolution). In another example, the blur reduction module 306 determines the likelihood of motion blur in an image subsequent to the current image based on the angular/linear velocity of the optical sensor 214 and the camera operating parameters.
When the blur reduction module 306 determines that the motion blur of the current image exceeds the motion blur threshold, the blur reduction module 306 requests the pyramid computation engine 408 to perform the image pyramid algorithm on the current image to downscale the current image. Although the motion blur of the current image may be within the motion blur threshold, the blur reduction module 306 may still determine that the likelihood of motion blur in a future image exceeds the likelihood threshold. In that case, the blur reduction module 306 requests the pyramid computation engine 408 to perform the image pyramid algorithm on the current image.
The pose estimation module 308 identifies the pose of the visual tracking system 108 based on the image (or the downscaled image) provided by the blur reduction module 306. The pose estimation module 308 provides the pose data to the AR/VR application 216.
The AR/VR application 216 retrieves the virtual content 218 from the storage device 208 and causes the virtual content 218 to be displayed at a location (in the display 204) based on the pose of the AR/VR display device 106. Note that the pose of the AR/VR display device 106 is also referred to as the pose of the vision tracking system 108 or the optical sensor 214.
Fig. 6 is a flowchart illustrating a method 600 for mitigating motion blur, according to an example embodiment. The operations in method 600 may be performed by the vision tracking system 108 using the components (e.g., modules, engines) described above with respect to fig. 4. Thus, the method 600 will be described by way of example with reference to the blur reduction module 306. However, it should be understood that at least some of the operations of method 600 may be deployed on various other hardware configurations or performed by similar components residing elsewhere.
In block 602, the current image module 404 identifies the current camera operating parameters corresponding to the current image (captured by the optical sensor 214). In block 604, the current image module 404 detects the angular velocity (of the optical sensor 214) during the exposure time of the current image. In block 606, the current image module 404 estimates a motion blur level based on the angular velocity of the optical sensor 214 and the current camera operating parameters. In decision block 608, the current image module 404 determines whether the motion blur level (of the current image) exceeds a motion blur level threshold. In block 610, the current image module 404 triggers a request to the pyramid computation engine 408 to downscale the current image in response to determining that the motion blur level exceeds the motion blur level threshold. In the event that the current image module 404 determines that the motion blur level does not exceed the motion blur level threshold, the method 600 proceeds to block A 612.
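The decision flow of blocks 602-610 might be sketched as follows, reusing the hypothetical helpers and threshold introduced in the earlier sketches (max_angular_speed_during_exposure, estimate_motion_blur_px, BLUR_THRESHOLD_PX); the camera-parameter dictionary keys and the pyramid_engine interface are likewise assumptions.

```python
def method_600(current_image, camera_params, imu_buffer, pyramid_engine):
    """Blocks 602-610: decide whether to downscale the current image."""
    # Block 602: identify the current camera operating parameters.
    exposure = camera_params["exposure_time_s"]
    fov = camera_params["horizontal_fov_rad"]
    width = camera_params["image_width_px"]

    # Block 604: detect the angular velocity during the exposure time.
    omega = max_angular_speed_during_exposure(
        imu_buffer,
        camera_params["exposure_start_s"],
        camera_params["exposure_start_s"] + exposure)

    # Block 606: estimate the motion blur level.
    blur = estimate_motion_blur_px(omega, exposure, fov, width)

    # Decision block 608 / block 610: downscale if blur exceeds the threshold.
    if blur > BLUR_THRESHOLD_PX:
        return pyramid_engine.downscale(current_image)
    return current_image  # otherwise proceed to block A 612 (method 700)
```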
It is noted that other embodiments may use different ordering, additional or fewer operations, and different nomenclature or terminology to accomplish similar functions. In some implementations, various operations may be performed in parallel with other operations in a synchronous or asynchronous manner. The operations described herein were chosen to illustrate some principles of operation in a simplified form.
Fig. 7 is a flowchart illustrating a method 700 for mitigating motion blur, according to an example embodiment. The operations in method 700 may be performed by the vision tracking system 108 using the components (e.g., modules, engines) described above with respect to fig. 4. Thus, the method 700 will be described by way of example with reference to the blur reduction module 306. However, it should be understood that at least some of the operations of method 700 may be deployed on various other hardware configurations or performed by similar components residing elsewhere.
At block A 612, the method continues from method 600 to method 700. In decision block 702, the current image module 404 determines whether the current image is to be used as a source image for matching features identified in a subsequent image. In block 704, the future image module 406 estimates a likelihood of motion blur in the next image based on the exposure time of the optical sensor 214 and the angular velocity of the optical sensor 214. In decision block 706, the future image module 406 determines whether the estimated likelihood of motion blur exceeds a likelihood threshold. In block 708, as a result of decision block 706, the future image module 406 triggers a request to the pyramid computation engine 408 to downscale the current image.
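Correspondingly, blocks 702-708 might look like the sketch below, continuing from block A 612 and reusing the hypothetical future_blur_likelihood helper and LIKELIHOOD_THRESHOLD from the earlier sketch; the is_source_image flag and the pyramid_engine interface are assumptions.

```python
def method_700(current_image, is_source_image, camera_params,
               angular_speed_rad_s, pyramid_engine):
    """Blocks 702-708: downscale the current image if future blur is likely."""
    # Decision block 702: is the current image a source image for later matching?
    if not is_source_image:
        return current_image

    # Block 704 / decision block 706: estimate the likelihood of motion blur
    # in the next image and compare it against the likelihood threshold.
    likelihood = future_blur_likelihood(camera_params["exposure_time_s"],
                                        angular_speed_rad_s)
    if likelihood > LIKELIHOOD_THRESHOLD:
        # Block 708: request the pyramid computation engine to downscale.
        return pyramid_engine.downscale(current_image)
    return current_image
```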
Fig. 8 is a block diagram 800 illustrating a software architecture 804, which software architecture 804 may be installed on any one or more of the devices described herein. The software architecture 804 is supported by hardware, such as a machine 802 that includes a processor 820, memory 826, and I/O components 838. In this example, the software architecture 804 may be conceptualized as a stack of layers in which each layer provides a particular function. The software architecture 804 includes layers such as an operating system 812, libraries 810, frameworks 808, and applications 806. In operation, the applications 806 invoke API calls 850 through the software stack and receive messages 852 in response to the API calls 850.
The operating system 812 manages hardware resources and provides common services. The operating system 812 includes, for example: a kernel 814, services 816, and drivers 822. The kernel 814 serves as an abstraction layer between the hardware and the other software layers. For example, the kernel 814 provides memory management, processor management (e.g., scheduling), component management, networking, and security settings, among other functions. The services 816 may provide other common services for the other software layers. The drivers 822 are responsible for controlling or interfacing with the underlying hardware. For example, the drivers 822 may include display drivers, imaging device drivers, Bluetooth® or Bluetooth® Low Energy drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, and so forth.
The libraries 810 provide a low-level common infrastructure used by the applications 806. The libraries 810 may include system libraries 818 (e.g., a C standard library) that provide functions such as memory allocation functions, string manipulation functions, mathematical functions, and the like. In addition, the libraries 810 may include API libraries 824, such as media libraries (e.g., libraries supporting presentation and manipulation of various media formats, such as Moving Picture Experts Group-4 (MPEG4), Advanced Video Coding (H.264 or AVC), Moving Picture Experts Group Layer-3 (MP3), Advanced Audio Coding (AAC), Adaptive Multi-Rate (AMR) audio codec, Joint Photographic Experts Group (JPEG or JPG), or Portable Network Graphics (PNG)), graphics libraries (e.g., an OpenGL framework used to render two-dimensional (2D) and three-dimensional (3D) graphical content on a display), database libraries (e.g., SQLite, which provides various relational database functions), web libraries (e.g., WebKit, which provides web browsing functionality), and the like. The libraries 810 may also include a wide variety of other libraries 828 to provide many other APIs to the applications 806.
The framework 808 provides a high-level public infrastructure used by the applications 806. For example, the framework 808 provides various Graphical User Interface (GUI) functions, advanced resource management, and advanced location services. The framework 808 can provide a wide variety of other APIs that can be used by the applications 806, some of which can be specific to a particular operating system or platform.
In an example implementation, the applications 806 may include a home application 836, a contacts application 830, a browser application 832, a book reader application 834, a location application 842, a media application 844, a messaging application 846, a gaming application 848, and a broad assortment of other applications such as a third-party application 840. The applications 806 are programs that execute functions defined in the programs. Various programming languages can be employed to create one or more of the applications 806, structured in a variety of manners, such as object-oriented programming languages (e.g., Objective-C, Java, or C++) or procedural programming languages (e.g., C or assembly language). In a specific example, the third-party application 840 (e.g., an application developed using the ANDROID™ or IOS™ software development kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as IOS™, ANDROID™, WINDOWS® Phone, or another mobile operating system. In this example, the third-party application 840 can invoke the API calls 850 provided by the operating system 812 to facilitate the functionality described herein.
Fig. 9 is a diagrammatic representation of a machine 900 within which instructions 908 (e.g., software, programs, applications, applets, applications (apps) or other executable code) for causing the machine 900 to perform any one or more of the methods discussed herein may be executed. For example, the instructions 908 may cause the machine 900 to perform any one or more of the methods described herein. The instructions 908 transform a generic, un-programmed machine 900 into a specific machine 900 programmed to perform the functions described and illustrated in the manner described. The machine 900 may operate as a standalone device or may be coupled (e.g., networked) to other machines. In a networked deployment, the machine 900 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. Machine 900 may include, but is not limited to: a server computer, a client computer, a Personal Computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a PDA, an entertainment media system, a cellular telephone, a smart phone, a mobile device, a wearable device (e.g., a smart watch), a smart home device (e.g., a smart appliance), other smart devices, a web device, a network router, a network switch, a network bridge, or any machine capable of executing instructions 908 that specify actions to be taken by machine 900, sequentially or otherwise. Furthermore, while only a single machine 900 is illustrated, the term "machine" shall also be taken to include a collection of machines that individually or jointly execute the instructions 908 to perform any one or more of the methodologies discussed herein.
The machine 900 may include a processor 902, a memory 904, and an I/O component 942 that may be configured to communicate with each other via a bus 944. In an example embodiment, the processor 902 (e.g., a Central Processing Unit (CPU), a Reduced Instruction Set Computing (RISC) processor, a Complex Instruction Set Computing (CISC) processor, a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an ASIC, a Radio Frequency Integrated Circuit (RFIC), other processors, or any suitable combination thereof) may include, for example, a processor 906 and a processor 910 that execute instructions 908. The term "processor" is intended to include a multi-core processor, which may include two or more separate processors (sometimes referred to as "cores") that may execute instructions simultaneously. Although fig. 9 shows multiple processors 902, machine 900 may include a single processor with a single core, a single processor with multiple cores (e.g., a multi-core processor), multiple processors with a single core, multiple processors with multiple cores, or any combination thereof.
The memory 904 includes a main memory 912, a static memory 914, and a storage unit 916, all of which are accessible by the processor 902 via a bus 944. The main memory 912, the static memory 914, and the storage unit 916 store the instructions 908 that implement any one or more of the methods or functions described herein. The instructions 908 may also reside, completely or partially, within the main memory 912, within the static memory 914, within the machine-readable medium 918 within the storage unit 916, within at least one of the processors 902 (e.g., within the cache memory of the processor), or within any suitable combination thereof, during execution thereof by the machine 900.
The I/O component 942 may include a variety of components for receiving input, providing output, producing output, transmitting information, exchanging information, capturing measurement results, and the like. The particular I/O components 942 included in a particular machine will depend on the type of machine. For example, a portable machine such as a mobile phone may include a touch input device or other such input mechanism, while a headless server machine may not include such a touch input device. It should be understood that the I/O component 942 may include many other components not shown in fig. 9. In various example embodiments, the I/O components 942 may include an output component 928 and an input component 930. The output component 928 may include visual components (e.g., a display such as a Plasma Display Panel (PDP), a Light Emitting Diode (LED) display, a Liquid Crystal Display (LCD), a projector, or a Cathode Ray Tube (CRT)), acoustic components (e.g., speakers), haptic components (e.g., vibration motor, resistance mechanism), other signal generators, and so forth. Input component 930 may include an alphanumeric input component (e.g., a keyboard, a touch screen configured to receive alphanumeric input, an optoelectronic keyboard, or other alphanumeric input component), a point-based input component (e.g., a mouse, touchpad, trackball, joystick, motion sensor, or other pointing instrument), a tactile input component (e.g., a physical button, a touch screen providing positioning and/or force of a touch or touch gesture, or other tactile input component), an audio input component (e.g., a microphone), and the like.
In further example embodiments, the I/O component 942 may include a biometric component 932, a motion component 934, an environmental component 936, or a positioning component 938, among various other components. For example, the biometric means 932 includes means for detecting expressions (e.g., hand expressions, face expressions, voice expressions, body gestures, or eye tracking), measuring biological signals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), identifying a person (e.g., voice recognition, retinal recognition, facial recognition, fingerprint recognition, or electroencephalogram-based recognition), and the like. The motion components 934 include acceleration sensor components (e.g., accelerometers), gravity sensor components, rotation sensor components (e.g., gyroscopes), and the like. The environmental components 936 include, for example, an illumination sensor component (e.g., a photometer), a temperature sensor component (e.g., one or more thermometers that detect ambient temperature), a humidity sensor component, a pressure sensor component (e.g., a barometer), an auditory sensor component (e.g., one or more microphones that detect background noise), a proximity sensor component (e.g., an infrared sensor that detects nearby objects), a gas sensor (e.g., a gas detection sensor that detects hazardous gas concentrations to ensure safety or to measure contaminants in the atmosphere), or other components that may provide an indication, measurement, or signal corresponding to the surrounding physical environment. The positioning component 938 includes a position sensor component (e.g., a GPS receiver component), an altitude sensor component (e.g., an altimeter or barometer that detects barometric pressure from which altitude may be derived), an orientation sensor component (e.g., a magnetometer), and so forth.
Communication may be implemented using a wide variety of technologies. The I/O components 942 also include communication components 940 operable to couple the machine 900 to a network 920 or to devices 922 via a coupling 924 and a coupling 926, respectively. For example, the communication components 940 may include a network interface component or another suitable device to interface with the network 920. In further examples, the communication components 940 may include wired communication components, wireless communication components, cellular communication components, Near Field Communication (NFC) components, Bluetooth® components (e.g., Bluetooth® Low Energy), Wi-Fi® components, and other communication components to provide communication via other modalities. The devices 922 may be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via USB).
Moreover, the communication components 940 may detect identifiers or include components operable to detect identifiers. For example, the communication components 940 may include Radio Frequency Identification (RFID) tag reader components, NFC smart tag detection components, optical reader components (e.g., an optical sensor to detect one-dimensional bar codes such as Universal Product Code (UPC) bar codes, multi-dimensional bar codes such as Quick Response (QR) codes, Aztec codes, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, UCC RSS-2D bar codes, and other optical codes), or acoustic detection components (e.g., microphones to identify tagged audio signals). In addition, a variety of information may be derived via the communication components 940, such as location via Internet Protocol (IP) geolocation, location via Wi-Fi® signal triangulation, location via detecting an NFC beacon signal that may indicate a particular location, and so forth.
Various memories (e.g., memory 904, main memory 912, static memory 914, and/or memory of processor 902) and/or storage unit 916 may store one or more sets of instructions and data structures (e.g., software) implemented or used by any one or more of the methods or functions described herein. These instructions (e.g., instructions 908), when executed by the processor 902, cause various operations to implement the disclosed embodiments.
The instructions 908 may be transmitted or received over the network 920 via a network interface device (e.g., a network interface component included in the communication component 940) using a transmission medium and using any of a number of well-known transmission protocols (e.g., hypertext transfer protocol (HTTP)). Similarly, instructions 908 may be transmitted or received to device 922 via coupling 926 (e.g., a peer-to-peer coupling) using a transmission medium.
The terms "machine storage medium," "device storage medium," and "computer storage medium" are used herein in the same sense and are used interchangeably throughout this disclosure. These terms refer to a single or multiple storage devices and/or media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the executable instructions and/or data. Accordingly, the term should be taken to include, but is not limited to, solid-state memory, as well as optical and magnetic media, including memory internal or external to the processor. Specific examples of machine, computer, and/or device storage media include: nonvolatile memory includes, for example, semiconductor memory devices such as erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), field Programmable Gate Array (FPGA), and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disk; CD-ROM and DVD-ROM discs. The terms "machine storage medium," computer storage medium, "and" device storage medium "specifically exclude carrier waves, modulated data signals, and other such media, and at least some of the carrier waves, modulated data signals, and other such media are encompassed by the term" signal medium.
The terms "transmission medium" and "signal medium" mean the same medium, and may be used interchangeably throughout this disclosure. The terms "transmission medium" and "signal medium" should be taken to include any intangible medium that is capable of storing, encoding or carrying instructions 1416 for execution by the machine 1400, and include digital or analog communications signals or other intangible medium to facilitate communication of such software. Accordingly, the terms "transmission medium" and "signal medium" shall include any form of modulated data signal, carrier wave, or the like. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
The terms "machine-readable medium," "computer-readable medium," and "device-readable medium" mean the same thing and may be used interchangeably in this disclosure. These terms are defined to include both machine storage media and transmission media. Thus, these terms include both storage devices/media and carrier wave/modulated data signals.
System with head wearable device
Fig. 10 illustrates a network environment 1000 in which a head wearable device 1002 may be implemented, according to an example embodiment. Fig. 10 is a high-level functional block diagram of an example head wearable device 1002 communicatively coupled to a mobile client device 1038 and a server system 1032 via various networks 1040.
The head wearable device 1002 includes an imaging device, such as at least one of a visible light imaging device 1012, an infrared emitter 1014, and an infrared imaging device 1016. Client device 1038 can connect with head wearable apparatus 1002 using both communication 1034 and communication 1036. Client device 1038 is connected to server system 1032 and network 1040. Network 1040 may include any combination of wired and wireless connections.
The head wearable device 1002 also includes two image displays of the image display 1004 of the optical assembly. The two image displays include one image display associated with the left lateral side of the head wearable device 1002 and one image display associated with the right lateral side of the head wearable device 1002. The head wearable device 1002 also includes an image display driver 1008, an image processor 1010, low power consumption circuitry 1026, and high speed circuitry 1018. The image display 1004 of the optical assembly is used to present images and video, including images that may include a graphical user interface, to a user of the head wearable device 1002.
The image display driver 1008 commands and controls the image display in the image display 1004 of the optical assembly. The image display driver 1008 may deliver the image data directly to the image display in the image display 1004 of the optical assembly for presentation, or may convert the image data into a signal or data format suitable for delivery to an image display device. For example, the image data may be video data formatted according to a compression format such as H.264 (MPEG-4), HEVC, Theora, Dirac, RealVideo RV40, VP8, or VP9, and the still image data may be formatted according to a compression format such as Portable Network Graphics (PNG), Joint Photographic Experts Group (JPEG), Tagged Image File Format (TIFF), or exchangeable image file format (Exif).
As described above, the head wearable device 1002 includes a frame and a handle (or temple) extending from a lateral side of the frame. The head wearable apparatus 1002 also includes a user input device 1006 (e.g., a touch sensor or push button) that includes an input surface on the head wearable apparatus 1002. A user input device 1006 (e.g., a touch sensor or press button) is used to receive input selections from a user for manipulating a graphical user interface of the presented image.
The components shown in fig. 10 for head wearable device 1002 are located on one or more circuit boards (e.g., a PCB or flexible PCB) in a bezel or temple. Alternatively or additionally, the depicted components may be located in a block, frame, hinge, or beam of the head wearable device 1002. The left and right visible light imaging devices may include digital camera elements, such as a Complementary Metal Oxide Semiconductor (CMOS) image sensor, a charge coupled device, a camera lens, or any other corresponding visible light or light capturing element that may be used to capture data, including images of a scene with an unknown object.
The head wearable device 1002 includes a memory 1022, the memory 1022 storing instructions for performing a subset or all of the functions described herein. Memory 1022 may also include a storage device.
As shown in fig. 10, high-speed circuitry 1018 includes a high-speed processor 1020, memory 1022, and high-speed wireless circuitry 1024. In this example, the image display driver 1008 is coupled to the high speed circuitry 1018 and operated by the high speed processor 1020 to drive the left and right image displays of the image display 1004 of the optical assembly. High-speed processor 1020 may be any processor capable of managing the operation and high-speed communication of any general computing system required by head wearable device 1002. The high speed processor 1020 includes processing resources required to manage high speed data transmission over the communication 1036 to a Wireless Local Area Network (WLAN) using high speed wireless circuitry 1024. In some examples, high-speed processor 1020 executes an operating system (such as the LINUX operating system or other such operating system of head wearable device 1002) and the operating system is stored in memory 1022 for execution. The high-speed processor 1020 executing the software architecture of the head wearable device 1002 is used to manage data transmission with the high-speed wireless circuit 1024, among any other responsibilities. In some examples, the high-speed wireless circuit 1024 is configured to implement the Institute of Electrical and Electronics Engineers (IEEE) 802.11 communication standard (also referred to herein as Wi-Fi). In other examples, other high-speed communication standards may be implemented by high-speed wireless circuit 1024.
The low power wireless circuit 1030 and the high speed wireless circuit 1024 of the head wearable device 1002 may include a short range transceiver (Bluetooth™) and a wireless wide area network, local area network, or wide area network transceiver (e.g., cellular or Wi-Fi). Client device 1038 (including a transceiver that communicates via communications 1034 and communications 1036) can be implemented using architectural details of head wearable apparatus 1002, as can other elements of network 1040.
The memory 1022 includes any storage device capable of storing various data and applications, and further includes camera data generated by the left and right infrared cameras 1016 and the image processor 1010, as well as images generated for display by the image display driver 1008 on the image display of the image display 1004 of the optical assembly. Although memory 1022 is shown as being integrated with high-speed circuitry 1018, in other examples memory 1022 may be a separate, stand-alone element of the head wearable device 1002. In some such examples, electrical routing lines may provide a connection from the image processor 1010 or the low power processor 1028 to the memory 1022 through a chip including the high speed processor 1020. In other examples, high-speed processor 1020 may manage addressing of memory 1022 such that low-power processor 1028 will enable high-speed processor 1020 at any time when a read or write operation involving memory 1022 is needed.
As shown in fig. 10, the low power processor 1028 or the high speed processor 1020 of the head wearable apparatus 1002 may be coupled to an imaging device (the visible light imaging device 1012, the infrared emitter 1014, or the infrared imaging device 1016), the image display driver 1008, the user input device 1006 (e.g., a touch sensor or push button), and the memory 1022.
The head wearable device 1002 is connected to a host computer. For example, head wearable apparatus 1002 pairs with client device 1038 via communication 1036 or connects to server system 1032 via network 1040. Server system 1032 may be one or more computing devices that are part of a service or network computing system, e.g., including a processor, memory, and a network communication interface, to communicate with client device 1038 and head wearable apparatus 1002 over network 1040.
Client device 1038 includes a processor and a network communication interface coupled to the processor. The network communication interface allows communication through network 1040, communication 1034, or communication 1036. Client device 1038 may further store at least a portion of the instructions for generating binaural audio content in a memory of client device 1038 to implement the functionality described herein.
The output components of the head wearable device 1002 include visual components such as a display (such as a Liquid Crystal Display (LCD), a Plasma Display Panel (PDP), a Light Emitting Diode (LED) display, a projector, or a waveguide). The image display of the optical assembly is driven by an image display driver 1008. The output components of the head wearable device 1002 also include acoustic components (e.g., speakers), haptic components (e.g., vibration motors), other signal generators, and the like. The input components (e.g., user input device 1006) of the head wearable apparatus 1002, client device 1038, and server system 1032 may include alphanumeric input components (e.g., a keyboard, a touch screen configured to receive alphanumeric input, an optoelectronic keyboard, or other alphanumeric input components), point-based input components (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, or other directional tool), tactile input components (e.g., physical buttons, a touch screen providing the location and force of a touch or touch gesture, or other tactile input components), audio input components (e.g., a microphone), and the like.
The head wearable device 1002 may optionally include additional peripheral elements. Such peripheral elements may include biometric sensors, additional sensors, or display elements integrated with head wearable device 1002. For example, a peripheral element may include any I/O component, including an output component, a motion component, a positioning component, or any other such element described herein.
For example, the biometric components include components for detecting expressions (e.g., hand expressions, facial expressions, vocal expressions, body postures, or eye tracking), measuring biometric signals (e.g., blood pressure, heart rate, body temperature, sweat, or brain waves), identifying a person (e.g., speech recognition, retinal recognition, facial recognition, fingerprint recognition, or electroencephalogram-based recognition), and the like. The motion components include acceleration sensor components (e.g., accelerometers), gravity sensor components, rotation sensor components (e.g., gyroscopes), and the like. The positioning components include position sensor components (e.g., a Global Positioning System (GPS) receiver component) for generating position coordinates, Wi-Fi or Bluetooth™ transceivers for generating positioning system coordinates, altitude sensor components (e.g., altimeters or barometers that detect barometric pressure from which altitude may be derived), orientation sensor components (e.g., magnetometers), and so forth. Such positioning system coordinates may also be received over communication 1036 from the client device 1038 via the low-power wireless circuit 1030 or the high-speed wireless circuit 1024.
When using phrases similar to "at least one of A, B or C", "at least one of A, B and C", "one or more of A, B or C", or "one or more of A, B and C", this phrase is intended to be construed to mean that A may be present in an embodiment alone, B may be present in an embodiment alone, C may be present in an embodiment alone, or any combination of the elements A, B and C may be present in a single embodiment; for example, A and B, A and C, B and C, or A and B and C.
Changes and modifications may be made to the disclosed embodiments without departing from the scope of the present disclosure. These and other changes or modifications are intended to be included within the scope of the present disclosure as expressed in the appended claims.
Although embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader scope of the disclosure. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The accompanying drawings that form a part hereof show by way of illustration, and not of limitation, specific embodiments in which the subject matter may be practiced. The embodiments shown are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed herein. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The detailed description is, therefore, not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.
These embodiments of the inventive subject matter may be referred to, individually and/or collectively, herein by the term "invention" merely for convenience and without intending to voluntarily limit the scope of this application to any single invention or inventive concept if more than one is in fact disclosed. Thus, although specific embodiments have been illustrated and described herein, it should be appreciated that any arrangement calculated to achieve the same purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any and all adaptations or variations of various embodiments. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description.
The Abstract of the disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. The abstract is submitted with such understanding: i.e., the abstract is not to be used to interpret or limit the scope or meaning of the claims. Furthermore, in the foregoing detailed description, it can be seen that various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the detailed description, with each claim standing on its own as a separate embodiment.
Examples
Example 1 is a method for selective motion blur mitigation in a visual tracking system, the method comprising: accessing a first image generated by an optical sensor of the vision tracking system; identifying an imaging device operating parameter of the optical sensor during generation of the first image by the optical sensor; determining a motion of the optical sensor during generation of the first image by the optical sensor; determining a motion blur level of the first image based on the camera operating parameters of the optical sensor and the motion of the optical sensor; and determining whether to use a pyramid computation algorithm to downscale the first image based on the motion blur level.
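For readers who want a concrete picture of the decision described in Example 1, the following is a minimal illustrative sketch in Python, not part of the claimed embodiments. The use of OpenCV and NumPy, the function names (estimate_blur_pixels, select_tracking_image), the threshold value, and the rotation-only blur model are all assumptions introduced here for clarity.

```python
import cv2
import numpy as np

BLUR_THRESHOLD_PX = 2.0  # assumed threshold: acceptable smear, in pixels

def estimate_blur_pixels(angular_velocity_rad_s, exposure_time_s, focal_length_px):
    """Approximate motion smear in pixels for purely rotational motion:
    rotation angle accumulated during the exposure times the focal length."""
    return float(np.linalg.norm(angular_velocity_rad_s)) * exposure_time_s * focal_length_px

def select_tracking_image(image, angular_velocity_rad_s, exposure_time_s, focal_length_px):
    """Decide, from camera settings and sensor motion only, whether the tracker
    should consume a pyramid-downscaled copy of the image or the full image."""
    blur_px = estimate_blur_pixels(angular_velocity_rad_s, exposure_time_s, focal_length_px)
    if blur_px > BLUR_THRESHOLD_PX:
        return cv2.pyrDown(image)  # one pyramid level: Gaussian blur + 2x decimation
    return image
```

In this sketch the downscaled copy is produced only when the predicted smear exceeds the threshold, so the extra pyramid computation is skipped for sharp frames.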
Example 2 includes example 1, wherein determining the motion of the optical sensor comprises: retrieving inertial sensor data from an inertial sensor of the visual tracking system, the inertial sensor data corresponding to the first image; and determining an angular velocity of the vision tracking system based on the inertial sensor data, wherein the motion blur level is based on the angular velocity of the vision tracking system and the camera operating parameter without analyzing content of the first image.
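One possible reading of Example 2 in code: average the gyroscope samples that fall inside the exposure window of the first image, without touching pixel data. This is a hypothetical sketch; the data layout (an N×3 array of rad/s readings with matching timestamps) is an assumption.

```python
import numpy as np

def mean_angular_velocity(gyro_samples, gyro_timestamps, exposure_start_s, exposure_end_s):
    """Average angular velocity (rad/s) over the exposure window of the image.
    Only inertial data is used; the image content is never analyzed."""
    mask = (gyro_timestamps >= exposure_start_s) & (gyro_timestamps <= exposure_end_s)
    if not np.any(mask):
        return np.zeros(3)  # no samples inside the window; assume no rotation
    return gyro_samples[mask].mean(axis=0)
```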
Example 3 includes example 1, wherein determining the motion of the optical sensor comprises: the method further includes accessing, from a VIO system of the vision tracking system, VIO data including an estimated angular velocity of the optical sensor, an estimated linear velocity of the optical sensor, and a location of a feature point in the first image, wherein the motion blur level is based on the camera operating parameters and the VIO data without analyzing content of the first image, wherein motion blur in different regions of the first image is based on the estimated angular velocity of the optical sensor, the estimated linear velocity of the optical sensor, and a 3D location of the feature point in corresponding different regions of the first image relative to the optical sensor.
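Example 3 lets the blur estimate vary across regions of the image, because translational smear depends on how close the tracked 3D points are. The sketch below uses a coarse first-order model (rotational rate plus translational rate divided by depth); the function names and the model itself are assumptions, not the disclosure's formula.

```python
import numpy as np

def per_feature_blur_px(omega_rad_s, velocity_m_s, landmark_cam_m,
                        focal_length_px, exposure_time_s):
    """Coarse per-feature blur: rotational smear plus translational smear
    scaled by the inverse depth of the landmark (camera-frame coordinates)."""
    depth = max(float(landmark_cam_m[2]), 1e-6)
    rotational_rate = np.linalg.norm(omega_rad_s)                   # rad/s
    translational_rate = np.linalg.norm(velocity_m_s[:2]) / depth   # rad/s equivalent
    return (rotational_rate + translational_rate) * focal_length_px * exposure_time_s

def max_region_blur_px(landmarks_cam_m, omega_rad_s, velocity_m_s,
                       focal_length_px, exposure_time_s):
    """Worst-case blur over all VIO landmarks visible in the image."""
    return max((per_feature_blur_px(omega_rad_s, velocity_m_s, lm,
                                    focal_length_px, exposure_time_s)
                for lm in landmarks_cam_m), default=0.0)
```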
Example 4 includes example 1, wherein the camera operating parameters include a combination of an exposure time of the optical sensor, a field of view of the optical sensor, an ISO value of the optical sensor, and an image resolution.
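The parameter set of Example 4 can be carried around as a small record. The field names below are assumptions; the focal-length helper only illustrates how a pixel-space focal length (used in the blur sketches above) could be derived from the field of view and resolution.

```python
import math
from dataclasses import dataclass
from typing import Tuple

@dataclass
class CameraOperatingParameters:
    exposure_time_s: float        # shutter / integration time
    field_of_view_deg: float      # horizontal field of view
    iso: int                      # sensor gain setting
    resolution: Tuple[int, int]   # (width, height) in pixels

    @property
    def focal_length_px(self) -> float:
        """Approximate horizontal focal length in pixels: (W/2) / tan(FOV/2)."""
        width = self.resolution[0]
        return (width / 2.0) / math.tan(math.radians(self.field_of_view_deg) / 2.0)
```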
Example 5 includes example 1, further comprising: determining that the motion blur level of the first image exceeds a motion blur threshold; in response to detecting that the motion blur level of the first image exceeds the motion blur threshold, applying the pyramid computation algorithm to the first image to generate a reduced version of the first image; and identifying features in a reduced version of the first image.
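When the threshold of Example 5 is exceeded, the image is reduced before feature detection so that the blur spans fewer pixels. A minimal sketch, assuming OpenCV and a single-channel 8-bit image; the detector and its parameter values are illustrative defaults, not values from the disclosure.

```python
import cv2

def detect_features_on_reduced(image_gray, levels=1, max_corners=200):
    """Downscale through `levels` pyramid steps, then detect corners on the
    reduced image, where the motion blur covers fewer pixels."""
    reduced = image_gray
    for _ in range(levels):
        reduced = cv2.pyrDown(reduced)  # Gaussian smoothing + 2x decimation
    corners = cv2.goodFeaturesToTrack(reduced, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=7)
    return reduced, corners
```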
Example 6 includes example 1, further comprising: detecting that the motion blur level of the first image is within a motion blur threshold; and in response to detecting that the motion blur level of the first image is within the motion blur threshold, identifying a feature in the first image.
Example 7 includes example 1, wherein determining whether to downscale the first image further comprises: estimating, before the optical sensor generates a second image, the second image being subsequent to the first image, a likelihood of motion blur of the second image based on the motion of the visual tracking system and the camera operating parameters.
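Example 7 makes the decision ahead of time: blur for the next frame is predicted before that frame is captured. The sketch reuses the rotation-only model from the earlier snippet and is, again, only an illustrative assumption.

```python
import numpy as np

def plan_downscale_for_next_frame(current_angular_velocity_rad_s,
                                  planned_exposure_time_s,
                                  focal_length_px,
                                  threshold_px=2.0):
    """Before the second image exists, predict its likely smear from the current
    motion estimate and the camera settings; True means plan to downscale."""
    predicted_px = (np.linalg.norm(current_angular_velocity_rad_s)
                    * planned_exposure_time_s * focal_length_px)
    return predicted_px > threshold_px
```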
Example 8 includes example 7, further comprising: detecting that a likelihood of motion blur of the second image exceeds a motion blur threshold; in response to detecting that the likelihood of the motion blur level of the second image exceeds the motion blur threshold, applying the pyramid computation algorithm to the first image to generate a reduced version of the first image; and identifying features in a reduced version of the first image.
Example 9 includes example 7, further comprising: detecting that the likelihood of motion blur of the second image is within a motion blur threshold; and identifying a feature in the first image in response to detecting that the likelihood of motion blur of the second image is within the motion blur threshold.
Example 10 includes example 1, further comprising: matching feature points between the reduced version of the first image and the reduced version of the second image; and identifying a pose of the visual tracking system based on the matched feature points.
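Example 10 closes the loop: feature points matched between the two reduced images feed a pose estimate. The sketch below uses pyramidal Lucas-Kanade flow and an essential-matrix decomposition from OpenCV as one possible realization; the disclosure does not prescribe these particular algorithms, and the camera matrix passed in must already be scaled to the reduced resolution.

```python
import cv2
import numpy as np

def track_and_estimate_pose(reduced_prev, reduced_curr, prev_pts, camera_matrix):
    """Match feature points between the reduced images with pyramidal
    Lucas-Kanade flow, then recover the relative pose (R, t up to scale).
    prev_pts: float32 array of shape (N, 1, 2); camera_matrix: 3x3 intrinsics
    scaled to the reduced image size."""
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(reduced_prev, reduced_curr,
                                                   prev_pts, None)
    good = status.ravel() == 1
    p0, p1 = prev_pts[good], curr_pts[good]
    essential, _ = cv2.findEssentialMat(p0, p1, camera_matrix,
                                        method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, rotation, translation, _ = cv2.recoverPose(essential, p0, p1, camera_matrix)
    return rotation, translation
```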
Example 11 is a computing device, comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the apparatus to: accessing a first image generated by an optical sensor of the vision tracking system; identifying an imaging device operating parameter of the optical sensor during generation of the first image by the optical sensor; determining a motion of the optical sensor during generation of the first image by the optical sensor; determining a motion blur level of the first image based on the camera operating parameters of the optical sensor and the motion of the optical sensor; and determining whether to use a pyramid computation algorithm to downscale the first image based on the motion blur level.
Example 12 includes example 11, wherein determining the motion of the optical sensor comprises: retrieving inertial sensor data from an inertial sensor of the visual tracking system, the inertial sensor data corresponding to the first image; and determining an angular velocity of the vision tracking system based on the inertial sensor data, wherein the motion blur level is based on the angular velocity of the vision tracking system and the camera operating parameter without analyzing content of the first image.
Example 13 includes example 11, wherein determining the motion of the optical sensor comprises: identifying an angular velocity from a visual inertial odometer system of the visual tracking system; tracking, using the visual odometer system, the location of the identified feature points in the first image; and determining a linear velocity based on the tracking, wherein motion blur in different regions of the first image is based on the estimated angular velocity of the optical sensor, the estimated linear velocity of the optical sensor, and a 3D position of a feature point in corresponding different regions of the first image relative to the optical sensor.
Example 14 includes example 11, wherein the camera operating parameters include a combination of an exposure time of the optical sensor, a field of view of the optical sensor, an ISO value of the optical sensor, and an image resolution.
Example 15 includes example 11, wherein the instructions further configure the apparatus to: determining that the motion blur level of the first image exceeds a motion blur threshold; in response to detecting that the motion blur level of the first image exceeds the motion blur threshold, applying the pyramid computation algorithm to the first image to generate a reduced version of the first image; and identifying features in a reduced version of the first image.
Example 16 includes example 11, wherein the instructions further configure the apparatus to: detecting that the motion blur level of the first image is within a motion blur threshold; and in response to detecting that the motion blur level of the first image is within the motion blur threshold, identifying a feature in the first image.
Example 17 includes example 11, wherein determining whether to downscale the first image further comprises: estimating, before the optical sensor generates a second image, the second image being subsequent to the first image, a likelihood of motion blur of the second image based on the motion of the visual tracking system and the camera operating parameters.
Example 18 includes example 17, wherein the instructions further configure the apparatus to: detecting that a likelihood of motion blur of the second image exceeds a motion blur threshold; in response to detecting that the likelihood of the motion blur level of the second image exceeds the motion blur threshold, applying the pyramid computation algorithm to the first image to generate a reduced version of the first image; and identifying features in a reduced version of the first image.
Example 19 includes example 17, wherein the instructions further configure the apparatus to: detecting that the likelihood of motion blur of the second image is within a motion blur threshold; and identifying a feature in the first image in response to detecting that the likelihood of motion blur of the second image is within the motion blur threshold.
Example 20 is a non-transitory computer-readable storage medium comprising instructions that, when executed by a computer, cause the computer to: accessing a first image generated by an optical sensor of a vision tracking system; identifying an imaging device operating parameter of the optical sensor during generation of the first image by the optical sensor; determining a motion of the optical sensor during generation of the first image by the optical sensor; determining a motion blur level of the first image based on the camera operating parameters of the optical sensor and the motion of the optical sensor; and determining whether to use a pyramid computation algorithm to downscale the first image based on the motion blur level.

Claims (20)

1. A method for selective motion blur mitigation in a visual tracking system, the method comprising:
Accessing a first image generated by an optical sensor of the vision tracking system;
identifying an imaging device operating parameter of the optical sensor during generation of the first image by the optical sensor;
determining a motion of the optical sensor during generation of the first image by the optical sensor;
determining a motion blur level of the first image based on the camera operating parameters of the optical sensor and the motion of the optical sensor; and
determining whether to use a pyramid computation algorithm to downscale the first image based on the motion blur level.
2. The method of claim 1, wherein determining the motion of the optical sensor comprises:
retrieving inertial sensor data from an inertial sensor of the visual tracking system, the inertial sensor data corresponding to the first image; and
determining an angular velocity of the visual tracking system based on the inertial sensor data,
wherein the motion blur level is based on the angular velocity of the visual tracking system and the camera operating parameter without analyzing the content of the first image.
3. The method of claim 1, wherein determining the motion of the optical sensor comprises:
Accessing VIO data from a VIO system of the vision tracking system, the VIO data including an estimated angular velocity of the optical sensor, an estimated linear velocity of the optical sensor, and a location of a feature point in the first image,
wherein the motion blur level is based on the camera operating parameters and the VIO data without analyzing the content of the first image,
wherein motion blur in different regions of the first image is based on the estimated angular velocity of the optical sensor, the estimated linear velocity of the optical sensor, and a 3D position of a feature point in a corresponding different region of the first image relative to the optical sensor.
4. The method of claim 1, wherein the camera operating parameters include a combination of an exposure time of the optical sensor, a field of view of the optical sensor, an ISO value of the optical sensor, and an image resolution.
5. The method of claim 1, further comprising:
determining that the motion blur level of the first image exceeds a motion blur threshold;
in response to detecting that the motion blur level of the first image exceeds the motion blur threshold, applying the pyramid computation algorithm to the first image to generate a reduced version of the first image; and
identifying features in a reduced version of the first image.
6. The method of claim 1, further comprising:
detecting that the motion blur level of the first image is within a motion blur threshold; and
in response to detecting that the motion blur level of the first image is within the motion blur threshold, identifying features in the first image.
7. The method of claim 1, wherein determining whether to downscale the first image further comprises:
estimating, before the optical sensor generates a second image, the second image being subsequent to the first image, a likelihood of motion blur of the second image based on the motion of the visual tracking system and the camera operating parameters.
8. The method of claim 7, further comprising:
detecting that a likelihood of motion blur of the second image exceeds a motion blur threshold;
in response to detecting that the likelihood of the motion blur level of the second image exceeds the motion blur threshold, applying the pyramid computation algorithm to the first image to generate a reduced version of the first image; and
identifying features in a reduced version of the first image.
9. The method of claim 7, further comprising:
detecting that the likelihood of motion blur of the second image is within a motion blur threshold; and
identifying features in the first image in response to detecting that a likelihood of motion blur of the second image is within the motion blur threshold.
10. The method of claim 1, further comprising:
matching feature points between the reduced version of the first image and the reduced version of the second image; and
identifying a pose of the visual tracking system based on the matched feature points.
11. A computing device, comprising:
a processor; and
a memory storing instructions that, when executed by the processor, configure the apparatus to:
accessing a first image generated by an optical sensor of the vision tracking system;
identifying an imaging device operating parameter of the optical sensor during generation of the first image by the optical sensor;
determining a motion of the optical sensor during generation of the first image by the optical sensor;
determining a motion blur level of the first image based on the camera operating parameters of the optical sensor and the motion of the optical sensor; and
determining whether to use a pyramid computation algorithm to downscale the first image based on the motion blur level.
12. The computing device of claim 11, wherein to determine the motion of the optical sensor comprises to:
retrieving inertial sensor data from an inertial sensor of the visual tracking system, the inertial sensor data corresponding to the first image; and
determining an angular velocity of the visual tracking system based on the inertial sensor data,
wherein the motion blur level is based on the angular velocity of the visual tracking system and the camera operating parameter without analyzing the content of the first image.
13. The computing device of claim 11, wherein to determine the motion of the optical sensor comprises to:
accessing VIO data from a VIO system of the vision tracking system, the VIO data including an estimated angular velocity of the optical sensor, an estimated linear velocity of the optical sensor, and a location of a feature point in the first image,
wherein the motion blur level is based on the camera operating parameters and the VIO data without analyzing the content of the first image,
Wherein motion blur in different regions of the first image is based on the estimated angular velocity of the optical sensor, the estimated linear velocity of the optical sensor, and a 3D position of a feature point in a corresponding different region of the first image relative to the optical sensor.
14. The computing device of claim 11, wherein the imaging device operating parameters include a combination of an exposure time of the optical sensor, a field of view of the optical sensor, an ISO value of the optical sensor, and an image resolution.
15. The computing device of claim 11, wherein the instructions further configure the device to:
determining that the motion blur level of the first image exceeds a motion blur threshold;
in response to detecting that the motion blur level of the first image exceeds the motion blur threshold, applying the pyramid computation algorithm to the first image to generate a reduced version of the first image; and
identifying features in a reduced version of the first image.
16. The computing device of claim 11, wherein the instructions further configure the device to:
detecting that the motion blur level of the first image is within a motion blur threshold; and
in response to detecting that the motion blur level of the first image is within the motion blur threshold, identifying features in the first image.
17. The computing device of claim 11, wherein determining whether to downscale the first image further comprises:
estimating, before the optical sensor generates a second image, the second image being subsequent to the first image, a likelihood of motion blur of the second image based on the motion of the visual tracking system and the camera operating parameters.
18. The computing device of claim 17, wherein the instructions further configure the device to:
detecting that a likelihood of motion blur of the second image exceeds a motion blur threshold;
in response to detecting that the likelihood of the motion blur level of the second image exceeds the motion blur threshold, applying the pyramid computation algorithm to the first image to generate a reduced version of the first image; and
identifying features in a reduced version of the first image.
19. The computing device of claim 17, wherein the instructions further configure the device to:
detecting that the likelihood of motion blur of the second image is within a motion blur threshold; and
identifying features in the first image in response to detecting that a likelihood of motion blur of the second image is within the motion blur threshold.
20. A non-transitory computer-readable storage medium comprising instructions that, when executed by a computer, cause the computer to:
accessing a first image generated by an optical sensor of the vision tracking system;
identifying an imaging device operating parameter of the optical sensor during generation of the first image by the optical sensor;
determining a motion of the optical sensor during generation of the first image by the optical sensor;
determining a motion blur level of the first image based on the camera operating parameters of the optical sensor and the motion of the optical sensor; and
determining whether to use a pyramid computation algorithm to downscale the first image based on the motion blur level.
CN202280035656.XA 2021-05-18 2022-05-17 Selective image pyramid computation for motion blur mitigation Pending CN117337575A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/189,893 2021-05-18
US17/521,081 US20220375041A1 (en) 2021-05-18 2021-11-08 Selective image pyramid computation for motion blur mitigation in visual-inertial tracking
US17/521,081 2021-11-08
PCT/US2022/029629 WO2022245821A1 (en) 2021-05-18 2022-05-17 Selective image pyramid computation for motion blur mitigation

Publications (1)

Publication Number Publication Date
CN117337575A true CN117337575A (en) 2024-01-02

Family

ID=89293938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280035656.XA Pending CN117337575A (en) 2021-05-18 2022-05-17 Selective image pyramid computation for motion blur mitigation

Country Status (1)

Country Link
CN (1) CN117337575A (en)

Similar Documents

Publication Publication Date Title
US20230300464A1 (en) Direct scale level selection for multilevel feature tracking under motion blur
US20230388632A1 (en) Dynamic adjustment of exposure and iso to limit motion blur
US20220375041A1 (en) Selective image pyramid computation for motion blur mitigation in visual-inertial tracking
EP4342170A1 (en) Selective image pyramid computation for motion blur mitigation
US11615506B2 (en) Dynamic over-rendering in late-warping
US20240029197A1 (en) Dynamic over-rendering in late-warping
US11683585B2 (en) Direct scale level selection for multilevel feature tracking under motion blur
US11765457B2 (en) Dynamic adjustment of exposure and iso to limit motion blur
US20220375110A1 (en) Augmented reality guided depth estimation
EP4341781A1 (en) Dynamic initialization of 3dof ar tracking system
CN117337575A (en) Selective image pyramid computation for motion blur mitigation
US11983897B2 (en) Camera intrinsic re-calibration in mono visual tracking system
CN117441343A (en) Related applications of dynamic adjustment of exposure and ISO
US20230154044A1 (en) Camera intrinsic re-calibration in mono visual tracking system
US11941184B2 (en) Dynamic initialization of 3DOF AR tracking system
CN117321635A (en) Direct scale level selection for multi-level feature tracking
US20230401796A1 (en) Fast ar device pairing using depth predictions
US20230421717A1 (en) Virtual selfie stick
CN117321546A (en) Depth estimation for augmented reality guidance
CN117425869A (en) Dynamic over-rendering in post-distortion
CN117321472A (en) Post-warping to minimize delays in moving objects
EP4341786A1 (en) Augmented reality guided depth estimation
CN117337422A (en) Dynamic initialization of three-degree-of-freedom augmented reality tracking system
KR20240008370A (en) Late warping to minimize latency for moving objects
WO2023239776A1 (en) Fast ar device pairing using depth predictions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination