CN115088016A - Method and system for implementing dynamic input resolution for vSLAM systems - Google Patents

Publication number
CN115088016A
Authority
CN
China
Prior art keywords
image
pixel resolution
tracking
vslam
calibration data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202180012967.XA
Other languages
Chinese (zh)
Other versions
CN115088016B (en
Inventor
邓凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Publication of CN115088016A
Application granted
Publication of CN115088016B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/579 Depth or shape recovery from multiple images from motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G06F11/3062 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations where the monitored property is the power consumption
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Studio Devices (AREA)

Abstract

A method implemented by a computer system includes initializing a visual simultaneous localization and mapping (vSLAM) unit in communication with the computer system using a first image and a first calibration data set. The first image has a first pixel resolution. The method also includes determining an initialization quality value and determining that the initialization quality value is outside a predetermined initialization threshold. The method also includes generating a second image at a second pixel resolution higher than the first pixel resolution, generating a second calibration data set based at least in part on the second pixel resolution associated with the second image, and reinitializing the vSLAM unit using the second image and the second calibration data set.

Description

Method and system for implementing dynamic input resolution for vSLAM systems
Background
Augmented Reality (AR) overlays virtual content on a user's view of the real world. With the development of AR Software Development Kits (SDKs), the mobile industry has brought smartphone AR into the mainstream. An AR SDK generally provides six-degree-of-freedom (6DoF) tracking capability. A user may scan the environment using a camera included in an electronic device (e.g., a smartphone or AR system), and the device performs real-time visual simultaneous localization and mapping (vSLAM). vSLAM can be implemented in mobile devices using a vSLAM unit to detect features of real-world objects and track those features as the mobile device moves in its three-dimensional environment.
Despite advances made in the AR field, there is a need in the art for improved methods and systems related to AR.
Disclosure of Invention
The present invention relates generally to methods and systems related to augmented reality applications. More particularly, embodiments of the present invention provide methods and systems for dynamic image input resolution scaling. The present invention is applicable to a variety of applications involving vSLAM operations, including but not limited to online 3D modeling based on computer vision, AR visualization, facial recognition, robotics, and autonomous vehicles.
As described herein, embodiments of the present invention respond to computing resource requirements by adjusting image resolution during the vSLAM process. Image resolution may be adjusted (e.g., reduced) to reduce computational requirements and improve system performance. For example, vSLAM initialization may use a high-resolution image, while pose generation may use a lower-resolution image scaled down by a scaler.
A system of one or more computers may be configured to perform particular operations or actions by having software, firmware, hardware, or a combination thereof installed on the system that, in operation, causes the system to perform the actions. One or more computer programs may be configured to perform particular operations or actions by including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions. One general aspect includes a dynamic visual simultaneous localization and mapping (vSLAM) processing method. In this method, a computer system initializes a vSLAM unit in communication with the computer system using a first image and a first calibration data set, wherein the first image has a first pixel resolution. The method also includes determining an initialization quality value and determining that the initialization quality value is outside a predetermined initialization threshold. The method also includes generating a second image at a second pixel resolution higher than the first pixel resolution, and generating a second calibration data set based at least in part on the second pixel resolution associated with the second image. The method also includes reinitializing the vSLAM unit using the second image and the second calibration data set.
Implementations of the method described above may include one or more of the following features. Optionally, the method includes receiving an original image at an initial pixel resolution and receiving an initial calibration data set. The method may also include generating the first image from the original image and generating the first calibration data set from the initial calibration data set based at least in part on the first pixel resolution. Optionally, the computer system receives the original image from an optical sensor in communication with the computer system. Optionally, generating the first image includes receiving a reduction factor from a performance monitor at a scaler unit in communication with the performance monitor and the optical sensor, and reducing the original image from the initial pixel resolution to the first pixel resolution based at least in part on the reduction factor, wherein the first pixel resolution is lower than the initial pixel resolution. Optionally, initializing the vSLAM unit includes generating an initialization result including at least one of an initial output pose, a coordinate system, or an initial object mapping. Optionally, the computer system determines the initialization quality value at least in part by measuring the initialization accuracy using a performance monitor in communication with the computer system, where the initialization accuracy is measured at least in part as an error between the initialization result and motion data generated by an inertial measurement unit in communication with the vSLAM unit. Optionally, the second image is generated from the first image by enlarging the first image from the first pixel resolution to the second pixel resolution. Optionally, the first calibration data set is received by the vSLAM unit from a data scaling processor in communication with the performance monitor. 
Optionally, the data scaling processor generates the second calibration data set based at least in part on one or more instructions from the performance monitor. Optionally, the first calibration data set is generated based at least in part on a hardware calibration data set associated with an optical sensor in communication with the computer system.
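The reduction-factor step described above can be illustrated with a minimal sketch. The source does not define how the reduction factor maps to a pixel resolution, so the sketch assumes the factor divides each image dimension; the names `Resolution`, `scaled_resolution`, and `next_reduction_factor`, and the halving step, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resolution:
    width: int
    height: int


def scaled_resolution(initial: Resolution, reduction_factor: float) -> Resolution:
    """Return the resolution obtained by dividing each dimension by the factor.

    Assumption: a factor of 1.0 keeps the initial resolution and larger
    factors reduce it. The source does not fix this convention.
    """
    if reduction_factor < 1.0:
        raise ValueError("reduction_factor must be >= 1.0")
    return Resolution(
        max(1, round(initial.width / reduction_factor)),
        max(1, round(initial.height / reduction_factor)),
    )


def next_reduction_factor(current: float, step: float = 2.0) -> float:
    """Pick a smaller factor (higher resolution) for a reinitialization attempt.

    The factor never drops below 1.0, i.e. the initial pixel resolution.
    """
    return max(1.0, current / step)


initial = Resolution(1920, 1080)
first = scaled_resolution(initial, 4.0)                          # first pixel resolution
second = scaled_resolution(initial, next_reduction_factor(4.0))  # higher second resolution
```

In this sketch the second resolution is produced by reducing the original image less aggressively; the source also allows the second image to be generated by enlarging the first image.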
Another general aspect includes a method for performing dynamic feature tracking. In this method, a computer system performs feature tracking using a vSLAM unit in communication with the computer system, at least in part by tracking one or more features in a first tracking image having a first tracking pixel resolution using a first tracking calibration data set. The method also includes determining a tracking performance criterion and determining that the tracking performance criterion is outside a predetermined tracking threshold. The computer system also generates a second tracking image at a second tracking pixel resolution that is lower than the first tracking pixel resolution, and generates a second tracking calibration data set based at least in part on the second tracking pixel resolution of the second tracking image. The method also includes performing feature tracking using the vSLAM unit, the second tracking image, and the second tracking calibration data set.
Implementations of the method described above may include one or more of the following features. Optionally, the first tracking pixel resolution is the second pixel resolution determined by an initializer in communication with the vSLAM unit. Optionally, the first tracking pixel resolution is lower than the second pixel resolution, and the first tracking image is generated from the second image by scaling down the second image from the second pixel resolution to the first tracking pixel resolution. Optionally, the second tracking calibration data set is generated from the first tracking calibration data set based at least in part on the second pixel resolution. Optionally, the performance monitor determines the tracking performance criterion at least in part by measuring one or more of a feature detection speed, a CPU utilization value, or a power consumption value.
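As a rough illustration of the tracking aspect, the sketch below lowers the tracking resolution when any measured criterion exceeds its limit. The limit values, the 0.75 scaling step, the minimum width, and all names (`TrackingMetrics`, `TrackingLimits`, `next_tracking_resolution`) are assumptions made for the example and do not come from the source.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrackingMetrics:
    detection_ms: float     # time spent detecting features in one frame
    cpu_utilization: float  # fraction of CPU in use, 0.0 to 1.0
    power_mw: float         # measured power consumption


@dataclass(frozen=True)
class TrackingLimits:
    # Hypothetical limits; a real system would tune or derive these.
    max_detection_ms: float = 33.0
    max_cpu_utilization: float = 0.8
    max_power_mw: float = 1500.0


def outside_tracking_threshold(m: TrackingMetrics, limits: TrackingLimits) -> bool:
    """True if any monitored criterion exceeds its limit."""
    return (
        m.detection_ms > limits.max_detection_ms
        or m.cpu_utilization > limits.max_cpu_utilization
        or m.power_mw > limits.max_power_mw
    )


def next_tracking_resolution(
    width: int,
    height: int,
    m: TrackingMetrics,
    limits: TrackingLimits,
    step: float = 0.75,
    min_width: int = 160,
) -> Tuple[int, int]:
    """Return a lower resolution when tracking is too costly, else the same one."""
    if not outside_tracking_threshold(m, limits):
        return (width, height)
    new_width = max(min_width, int(width * step))
    new_height = max(1, round(height * new_width / width))  # keep aspect ratio
    return (new_width, new_height)
```

Whenever the resolution changes, the tracking calibration data set would be regenerated for the new resolution before tracking continues.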
Another general aspect includes a system for implementing dynamic visual simultaneous localization and mapping (vSLAM) in a mobile device. In such a system, a memory is used to store computer-executable instructions. An optical sensor is used to generate an image at an initial pixel resolution. A motion sensor is used to generate motion data. The system includes a scaler in communication with the optical sensor and a data scaling processor in communication with the memory. The system includes a performance monitor in communication with the scaler and the data scaling processor. The system also includes a vSLAM unit in communication with the performance monitor, the data scaling processor, the scaler, the optical sensor, and the motion sensor. The system also includes one or more processors in communication with the memory for executing the computer-executable instructions to implement one or more of the above-described methods. For example, the system may implement a method to perform dynamic initialization of the vSLAM unit when the vSLAM unit is not initialized, and to perform dynamic feature tracking of one or more features in an image generated by the optical sensor when the vSLAM unit is initialized.
Another general aspect includes a method for dynamic initialization and feature tracking using a vSLAM unit. In this method, a computer system receives an original image at an initial pixel resolution and an initial calibration data set. The computer system reduces the original image to provide a first reduced image at a first reduced pixel resolution that is lower than the initial pixel resolution, and generates a first calibration data set from the initial calibration data set based at least in part on the first reduced pixel resolution. The computer system also initializes a visual simultaneous localization and mapping (vSLAM) system using the first reduced image and the first calibration data set. The computer system also generates a first initialization quality value and may determine that the first initialization quality value is outside of a predetermined initialization threshold. If the first initialization quality value is outside of the predetermined initialization threshold, the computer system generates a second reduced image at a second pixel resolution, the second pixel resolution being higher than the first reduced pixel resolution and lower than the initial pixel resolution, and generates a second calibration data set. Subsequently, the computer system initializes the vSLAM system using the second reduced image and the second calibration data set. As in the previous initialization, the computer system determines a second initialization quality value, and may determine that the second initialization quality value is within the predetermined initialization threshold. If the second initialization quality value is within the predetermined initialization threshold, the computer system receives a third image and tracks one or more features in the third image. The computer system determines a tracking performance criterion, and may determine that the tracking performance criterion is outside a predetermined tracking threshold. 
If the tracking performance criterion is outside of the predetermined tracking threshold, the computer system reduces the third image to provide a third reduced image at a third pixel resolution that is lower than the second pixel resolution, and tracks one or more features in the third reduced image.
Implementations of the method described above may include one or more of the following features. Optionally, the third image is received at a second pixel resolution.
Many benefits are achieved by the present invention over conventional techniques. For example, embodiments of the present invention provide methods and systems that improve the speed, computational performance, and power consumption characteristics of vSLAM initialization routines and feature detection routines. These and other embodiments of the invention and many of its advantages and features are described in more detail below in conjunction with the following text and attached drawings.
Drawings
FIG. 1 illustrates an example of a computer system including an Inertial Measurement Unit (IMU) and RGB optical sensors for feature detection and tracking applications, according to an embodiment of the present invention.
FIG. 2 is a simplified schematic diagram illustrating a system for initializing a vSLAM unit according to an embodiment of the present invention.
FIG. 3 is a simplified schematic diagram illustrating a system for performing feature tracking according to an embodiment of the present invention.
FIG. 4 is a simplified schematic diagram illustrating a system for performing dynamic initialization and feature tracking using a vSLAM unit according to an embodiment of the present invention.
FIG. 5 is a simplified flowchart illustrating a method of initializing a vSLAM unit according to an embodiment of the present invention.
FIG. 6 is a simplified flowchart illustrating a method of performing feature tracking according to an embodiment of the present invention.
FIG. 7 is a simplified flowchart illustrating a method of performing initialization and feature tracking according to an embodiment of the present invention.
FIG. 8 illustrates an example computer system according to an embodiment of the present invention.
Detailed Description
In the following description of embodiments according to the present invention, various embodiments will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the embodiments. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the described embodiments.
Embodiments of the present disclosure are directed to, among other things, techniques to initialize and operate a visual simultaneous localization and mapping (vSLAM) unit in a mobile device. The mobile device may include a scaler and a data scaling processor, whereby the image and calibration data received by the feature tracking unit and the initializer are scaled based at least in part on the operating conditions of the vSLAM unit.
In some embodiments, the vSLAM unit communicates with the scaler, the data scaling processor, and the performance monitor. An image including a real-world object captured by an RGB optical sensor (e.g., a camera) may be scaled down by the scaler based at least in part on one or more instructions from the performance monitor. Further, the data scaling processor may receive separate instructions from the performance monitor to update the calibration data associated with the RGB optical sensor for use by the vSLAM unit. In this manner, the vSLAM unit may implement an initialization routine and/or a feature detection and tracking routine using the adjusted input data to optimize vSLAM performance.
In an illustrative example, a smartphone application (also referred to as a smartphone app) may include AR functionality for superimposing animation elements on objects in the real world. For example, the animation element may be a logo, a floral pattern, a cartoon animal, and the like. For example, a smartphone application may detect and track a particular object so that a particular animation element appears on the phone screen only when the particular object appears in the field of view of the camera. The smartphone application may rely on information about the surface of objects in the surrounding environment of the phone to correctly place the animated element in the display field at an appropriate size, perspective and position to make it appear to interact with real-world objects. In some cases, this information includes images taken by a camera in the cell phone and information about the movement of the cell phone in the environment. In some cases, these two types of information are processed together in a vSLAM unit that is part of the handset to allow the smartphone application to identify the boundaries and surfaces of objects in the world around the handset.
The vSLAM unit may complete the initialization routine prior to detecting and tracking the object features. Initialization is important because it defines the two-dimensional coordinate system used by the smartphone application to place the animation, and generates an initial map of the real object in the field of view of the phone. The initialization results are used for feature detection and tracking routines while the vSLAM unit continuously updates the mapping of real objects and updates the placement and presentation of virtual objects. In the above example, the smartphone application may initialize the vSLAM unit and continuously detect and track real objects in the environment surrounding the phone so that it can map two-dimensional animated decorations on the objects in real time.
In this example, the vSLAM unit may be accompanied by additional units that optimize initialization, detection, and tracking operations. For example, the scaler may modify the resolution of the image that the vSLAM unit uses for initialization and/or detection and tracking. For example, the scaler may reduce the resolution of the image produced by the camera so that the vSLAM unit requires fewer system resources. The scaler may also reduce the resolution of the images generated by the camera for the detection and tracking routines, which may utilize the reduced-resolution images. In another example, the scaler may be controlled by a performance monitor that detects whether the vSLAM unit is initialized. If it is initialized, the performance monitor may instruct the scaler to switch from the initialization scaling to a relatively lower detection and tracking scaling for the images received by the vSLAM unit. The scaling used for detection and tracking may vary with the movement of the handset or changes in the environment surrounding the handset in order to optimize the performance of the smartphone when using an application. In this example, the performance monitor may iteratively check the detection and tracking quality in the vSLAM unit and adjust the scaling accordingly. The performance monitor may also adjust the data that the vSLAM unit uses to process the images produced by the camera. This data, which may be referred to as calibration data, describes the hardware that the smartphone uses to collect image and motion data. For example, the vSLAM unit may use the calibration data to account for the different locations of the components in the smartphone that generate the different types of data used by the vSLAM unit.
In general, vSLAM allows AR systems, and other types of systems that use Computer Vision (CV) to detect features and objects in the real world, to detect and track objects as the system moves relative to the objects. Because computing resources on mobile devices are typically limited, the vSLAM process can be optimized for system constraints (e.g., power consumption and CPU usage). Furthermore, optimized initialization quality results in better motion tracking accuracy when computational resources are sufficient, while the system optimizes the performance of the vSLAM unit during detection and tracking. Embodiments of the invention reduce the latency of defining surfaces in the vSLAM process, thereby improving system performance.
FIG. 1 illustrates an example of a computer system 110 including an Inertial Measurement Unit (IMU) 112 and RGB optical sensors 114 for feature detection and tracking applications, according to an embodiment of the present invention. Feature detection and tracking may be implemented by the vSLAM unit 116 of the computer system 110. In general, the RGB optical sensor 114 generates an RGB image of a real-world environment including, for example, real-world objects 130. In some embodiments, the IMU 112 generates motion data related to the motion of the computer system 110 in a three-dimensional environment, where the data includes, for example, rotations and translations of the IMU 112 with respect to 6 degrees of freedom (e.g., translations and rotations according to three cartesian axes). After initialization of the AR session (where the initialization may include calibration and tracking), the vSLAM unit 116 renders an optimized output pose 120 of the real-world environment in the AR session, where the optimized output pose 120 describes a pose of the RGB optical sensor 114 at least partially with respect to a feature map 124 detected in a real-world object 130. The optimized output pose 120 describes a coordinate system and a map for placing a two-dimensional AR object on the real-world object representation 122 of the real-world object 130.
In an example, the computer system 110 represents a suitable user device that includes, in addition to the IMU 112 and the RGB optical sensor 114, one or more Graphics Processing Units (GPUs), one or more General Purpose Processors (GPPs), and one or more memories storing computer readable instructions executable by at least one processor to perform the various functions of embodiments of the present invention. For example, the computer system 110 may be any of a smartphone, a tablet, an AR headset, or a wearable AR device, among others.
The IMU 112 may have a known sampling rate (e.g., the temporal frequency of data point generation), and this value may be stored locally and/or accessible by the vSLAM unit 116. The RGB optical sensor 114 may be a color camera. The RGB optical sensor 114 and the IMU 112 may have different sampling rates. Typically, the sampling rate of the RGB optical sensor 114 is lower than the sampling rate of the IMU 112. For example, the sampling rate of the RGB optical sensor 114 may be 30 Hz, while the sampling rate of the IMU 112 may be 100 Hz.
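With the example rates above (30 Hz camera, 100 Hz IMU), each pair of consecutive frames spans roughly 100/30 ≈ 3.3 IMU samples, so a consumer of both streams has to group motion samples by frame interval. The sketch below shows one simple way to do that with integer microsecond timestamps; the function name and the half-open interval convention are assumptions, not details from the source.

```python
from bisect import bisect_left
from typing import List, Sequence


def imu_samples_per_frame(
    frame_times_us: Sequence[int], imu_times_us: Sequence[int]
) -> List[List[int]]:
    """Group sorted IMU timestamps into the intervals between consecutive frames.

    Each group holds the IMU timestamps t with frame[i] <= t < frame[i + 1].
    """
    groups = []
    for t0, t1 in zip(frame_times_us, frame_times_us[1:]):
        lo = bisect_left(imu_times_us, t0)
        hi = bisect_left(imu_times_us, t1)
        groups.append(list(imu_times_us[lo:hi]))
    return groups


# Four frames at 30 Hz and eleven IMU samples at 100 Hz, in microseconds.
frame_times = [round(i * 1_000_000 / 30) for i in range(4)]
imu_times = [i * 10_000 for i in range(11)]
counts = [len(g) for g in imu_samples_per_frame(frame_times, imu_times)]
# counts is [4, 3, 3]: the per-interval count alternates around 3.33
```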
Further, the IMU 112 and the RGB optical sensor 114 installed in the computer system 110 may be related by a transformation (e.g., a distance offset, an angular field-of-view difference, etc.). The transformation may be known, and its values may be stored locally and/or accessible by the vSLAM unit 116. During movement of the computer system 110, the RGB optical sensor 114 and the IMU 112 may experience different motions relative to the centroid, center of mass, or another point of rotation of the computer system 110. In some cases, the transformation may result in an error or mismatch in the vSLAM optimized output pose. To compensate, the computer system may include calibration data. In some cases, the calibration data may be set based on the transformation alone. As described with reference to FIGS. 2-4, the calibration data may include data that is at least partially associated with the resolution of the RGB optical sensor 114, such that variations in the scaling of the image generated by the RGB optical sensor 114 may be compensated for by corresponding adjustments of the calibration data.
The vSLAM unit 116 may be implemented as dedicated hardware and/or a combination of hardware and software (e.g., a general-purpose processor and computer-readable instructions stored in memory and executable by the general-purpose processor). As described with reference to FIGS. 2-4, in addition to initiating AR sessions and performing feature detection and tracking as part of the vSLAM process, the computer system 110 may dynamically manage the computational requirements and power consumption of the vSLAM unit 116 by implementing a performance monitor in communication with the scaler and the data scaling processor.
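The source does not state which calibration parameters depend on resolution. One common case, assumed here for illustration only, is a pinhole camera model whose focal lengths and principal point are expressed in pixels and therefore scale with the image, while the camera-to-IMU transformation is metric and stays unchanged. The `Intrinsics` type and `scale_intrinsics` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera parameters in pixel units (an assumed calibration format)."""
    width: int
    height: int
    fx: float
    fy: float
    cx: float
    cy: float


def scale_intrinsics(k: Intrinsics, new_width: int, new_height: int) -> Intrinsics:
    """Rescale pixel-unit intrinsics to match a resized image.

    Simplification: the principal point is scaled directly. Depending on the
    pixel-center convention in use, a half-pixel correction may also apply.
    """
    sx = new_width / k.width
    sy = new_height / k.height
    return Intrinsics(new_width, new_height, k.fx * sx, k.fy * sy, k.cx * sx, k.cy * sy)


full = Intrinsics(1920, 1080, fx=1400.0, fy=1400.0, cx=960.0, cy=540.0)
half = scale_intrinsics(full, 960, 540)
```

A data scaling step of this kind would run each time the image resolution changes, so that the image and its calibration data stay consistent.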
In the illustrative example of FIG. 1, a smartphone is used to display an AR session of a real-world environment. In particular, the AR session includes rendering an AR scene that includes a representation of a real-world table on which a vase 132 (or some other real-world object) is placed. The virtual object 126 will be displayed in the AR scene. In particular, the virtual object will be displayed on the table. As part of detecting how the smartphone is oriented in the real-world environment relative to the table and vase, the smartphone may initialize the vSLAM unit using images from the RGB optical sensor 114 or another camera. The vSLAM unit will define a reference coordinate system relative to which the vSLAM unit will detect features in the table and vase. After initialization, the vSLAM unit will detect and track features as part of the overall AR system. In detecting and tracking features, the smartphone may monitor power consumption, computational requirements, and performance of the vSLAM unit, and may adjust the pixel resolution of the image and the calibration data used by the vSLAM unit to actively manage power consumption and computational requirements.
FIG. 2 is a simplified schematic diagram illustrating a system 200 for initializing a vSLAM unit according to an embodiment of the present invention. As described with reference to FIG. 1, the vSLAM unit 116 may be implemented as part of a computer system (e.g., computer system 110 of FIG. 1) in communication with a scaler 220, a performance monitor 270, and a data scaling processor 240. The vSLAM unit 116 may receive the image set 210 and the motion data 280. For example, the vSLAM unit 116 may receive the motion data 280 from an IMU (e.g., IMU 112 of FIG. 1) at a sampling rate associated with the IMU. For example, the sampling rate may be any frequency greater than 0 Hz, including but not limited to 50 Hz, 60 Hz, 70 Hz, 80 Hz, 90 Hz, 100 Hz, and the like. The image set 210 may be generated at an initial pixel resolution by an optical sensor (e.g., RGB optical sensor 114), including but not limited to a camera. The initial pixel resolution may be characteristic of the optical sensor and may range from thousands to millions of pixels or higher, including but not limited to 1 MP, 2 MP, 10 MP, 20 MP, etc. The optical sensor may generate the image set 210 at a refresh rate greater than 0 Hz, including but not limited to 10 Hz, 24 Hz, 30 Hz, 48 Hz, 60 Hz, etc.
The vSLAM unit 116 may contain an initializer unit 254 that performs operations consistent with initializing the vSLAM unit 116, including, but not limited to, generating a coordinate graph and an initial output pose of the optical sensor as described with reference to FIG. 1. The scaler 220 may scale down the images in the image set 210 from the initial pixel resolution to a first pixel resolution that is lower than the initial pixel resolution to facilitate initialization of the vSLAM unit 116. The first pixel resolution may be a static value associated at least in part with the hardware configuration of the vSLAM unit 116, or may vary based at least in part on the motion data 280 and one or more characteristics of the image set 210. For example, the image may contain relatively few features and may allow initialization at a relatively low first pixel resolution. Although the scaler 220 is discussed with respect to the reduction of an image in some embodiments, this is not required by the present invention, and the scaler may pass the image to the vSLAM unit 116 at its original resolution, without reduction or enlargement.
A performance monitor 270 in communication with the vSLAM unit 116 may be used to monitor the performance of the vSLAM unit 116 and provide data to one or more system elements as described herein. The performance monitor may generate a set of performance metrics including initialization speed, feature tracking speed, CPU resource usage, initialization accuracy, and the like. In some embodiments, the initialization accuracy is expressed in terms of a measurement cost that is based at least in part on an error value calculated between the visual information generated by the initialization process and the IMU information, such that the initialization accuracy describes, at least in part, a degree of match between the visual data and the motion data 280 received by the vSLAM unit 116. In response to these performance metrics, the image resolution may be increased or decreased as described herein. Further, because a reduced-resolution image may call for new calibration data associated with that resolution, the performance metrics may be used to trigger a request for updated calibration data.
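The source does not give a formula for the measurement cost. As one possible reading, the sketch below computes the root-mean-square distance between positions estimated from the visual initialization and positions derived from IMU data at the same instants. The function name and the choice of an RMS position error are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def measurement_cost(visual: Sequence[Vec3], imu: Sequence[Vec3]) -> float:
    """RMS Euclidean distance between corresponding position estimates.

    A cost of 0.0 means the visual and IMU-derived positions agree exactly;
    larger values indicate a poorer match.
    """
    if len(visual) != len(imu) or not visual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        sum((a - b) ** 2 for a, b in zip(v, m)) for v, m in zip(visual, imu)
    ]
    return math.sqrt(sum(squared) / len(squared))


visual_positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
imu_positions = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.4)]
cost = measurement_cost(visual_positions, imu_positions)
```

A real system would more likely use a cost taken from its visual-inertial optimization (for example, including orientation and velocity terms); this sketch only shows the idea of scoring agreement between the two data sources.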
The performance monitor 270 may also determine an initialization quality value or metric to describe the initialization accuracy. The initialization quality value may be any index, score, value, or rating. For example, the initialization quality value may be expressed as a score out of 100, a value between 0 and 1, a value between 0 and 2, or the like. Other examples are possible. In one example, the initialization quality value is represented as a number centered at 1, where a deviation from 1 beyond a given margin, in either direction, equally indicates reduced initialization quality. In this example, the threshold is determined relative to 1, e.g., an initialization quality value outside of the range of 0.95 to 1.05, or of 0.9 to 1.1, does not satisfy the threshold. One of ordinary skill in the art would recognize many variations, modifications, and alternatives. As at least part of determining the initialization quality, performance monitor 270 may compare the initialization quality value to a threshold. The threshold may be a static margin (e.g., 95% of the maximum initialization quality value) or may be dynamically determined based on one or more performance characteristics of the vSLAM unit 116.
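The threshold comparison for a quality value centered at 1 can be illustrated with a minimal Python sketch. The function name `initialization_quality_ok` and the default margin of 0.05 are assumptions chosen for illustration; the description only gives 0.95 to 1.05 and 0.9 to 1.1 as example ranges.

```python
def initialization_quality_ok(quality: float, margin: float = 0.05) -> bool:
    """Return True when a quality value centered at 1 lies within 1 +/- margin.

    Hypothetical helper: values above and below 1 are treated symmetrically,
    so 0.93 and 1.07 are rejected alike for a margin of 0.05.
    """
    return abs(quality - 1.0) <= margin


if __name__ == "__main__":
    for q in (0.97, 1.08):
        print(q, initialization_quality_ok(q), initialization_quality_ok(q, margin=0.1))
```

A wider margin such as 0.1 accepts more initializations at the cost of accuracy; which margin is appropriate would depend on the embodiment.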
In some vSLAM systems, the image provided to the vSLAM unit 116 is downscaled to a resolution lower than the original resolution of the image. In some embodiments, the image is provided to the vSLAM unit 116 at the original resolution, reduced before transmission to the vSLAM unit, or enlarged before transmission to the vSLAM unit. In one example, performance monitor 270 may determine that the initialization quality is low. In response, the scaler 220 may receive an instruction from the performance monitor to begin scaling up the image set 210 to a second pixel resolution higher than the first pixel resolution. In some cases, the first pixel resolution may be too low for the initializer 254 to successfully initialize the vSLAM unit 116, but the second pixel resolution may be sufficient for the vSLAM unit 116 to successfully initialize. The performance monitor may continue to determine additional initialization quality values and iteratively adjust the scaler 220 until the initialization quality values satisfy the threshold. The performance monitor may also determine the target pixel resolution based at least in part on information generated by the vSLAM unit 116, including but not limited to motion data, quality of detection, and the like. As described with reference to fig. 3, the system 200 may switch from an initialization process performed by the initializer 254 to feature detection and tracking in response to the initialization quality value satisfying a threshold.
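The iterative adjustment described above can be sketched as a loop over candidate resolutions, ordered from low to high, that stops at the first resolution whose initialization quality satisfies the threshold. This is only an illustration under stated assumptions: the function name, the `try_initialize` callback, the resolution ladder, and the quality convention (centered at 1 with a margin) are all hypothetical and not taken from the description.

```python
from typing import Callable, Optional, Sequence, Tuple

Resolution = Tuple[int, int]  # (width, height) in pixels


def initialize_with_rising_resolution(
    resolutions: Sequence[Resolution],
    try_initialize: Callable[[Resolution], float],
    margin: float = 0.05,
) -> Optional[Resolution]:
    """Try each resolution in order and return the first that initializes well.

    `try_initialize` stands in for one initialization attempt at the given
    resolution and returns a quality value centered at 1 (an assumption).
    Returns None when no candidate resolution satisfies the threshold.
    """
    for resolution in resolutions:
        quality = try_initialize(resolution)
        if abs(quality - 1.0) <= margin:
            return resolution
    return None


if __name__ == "__main__":
    # Made-up quality values standing in for real initialization attempts.
    fake_quality = {(320, 240): 1.30, (640, 480): 1.02, (1280, 960): 1.00}
    print(initialize_with_rising_resolution(list(fake_quality), fake_quality.get))
```

Starting from the lowest resolution keeps the cost of each attempt small, so the higher resolutions are only paid for when the cheaper attempts fail.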
The system 200 may include a data scaling processor 240, the data scaling processor 240 receiving input from the performance monitor 270 during initialization based at least in part on an initialization quality value. As described with reference to fig. 1, the vSLAM unit 116 may be located at a different position or orientation relative to the centroid of the computer system as compared to the optical sensor. This may require a set of calibration data 230 to allow the vSLAM unit 116 to compensate for the difference. In other embodiments, the calibration data includes intrinsic and extrinsic calibration data of the optical system. The data scaling processor may adjust this calibration data based on the image resolution currently being used by the vSLAM unit 116 (i.e., the current scaling of the image data by the scaler 220), thereby providing updated calibration data that may be loaded into the vSLAM unit 116. In some cases, the calibration data 230 may be predetermined based on the hardware configuration of the computer system. The performance monitor 270 may coordinate the operation of the scaler 220 with the operation of the data scaling processor 240 to adjust the calibration data 230 based at least in part on the scaling up or down performed by the scaler 220. As an example, if the initialization quality metric is outside a predetermined threshold, as detected by the performance monitor, the data scaling processor and scaler may be used to increase the resolution of the image provided to the vSLAM unit to improve the initialization process, e.g., without modifying the resolution of other parts of the system (including feature detection, feature tracking, bundle adjustment, etc.).
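One common way to adjust intrinsic calibration data for a resized image is to scale the focal lengths and the principal point by the same per-axis factors as the image; extrinsic parameters (rotation and translation) do not depend on pixel resolution and are left unchanged. The Python sketch below illustrates this under stated assumptions: the `Intrinsics` type and `scale_intrinsics` function are hypothetical, a pinhole model is assumed, and pixel coordinates are assumed to have their origin at the image corner (a pixel-center convention would need a half-pixel correction). The description does not specify how the data scaling processor 240 computes its output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels, valid for one image resolution (assumed model)."""
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int


def scale_intrinsics(calib: Intrinsics, new_width: int, new_height: int) -> Intrinsics:
    """Rescale focal lengths and principal point to match a resized image."""
    sx = new_width / calib.width
    sy = new_height / calib.height
    return Intrinsics(
        fx=calib.fx * sx,
        fy=calib.fy * sy,
        cx=calib.cx * sx,
        cy=calib.cy * sy,
        width=new_width,
        height=new_height,
    )


if __name__ == "__main__":
    # Made-up calibration values for a 1920x1080 image, halved in each dimension.
    full = Intrinsics(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0, width=1920, height=1080)
    print(scale_intrinsics(full, 960, 540))
```

Deriving every scaled set from the hardware calibration, rather than from a previously scaled set, avoids accumulating rounding error across repeated adjustments.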
FIG. 3 is a simplified schematic diagram illustrating a system 300 for performing feature tracking according to an embodiment of the present invention. As described with reference to fig. 1 and 2, the system 300 may include the vSLAM unit 116, the scaler 220, the data scaling processor 240, and the performance monitor 270. To perform feature detection and tracking, the vSLAM unit 116 may receive the image set 210, the image set 210 having been scaled down by the scaler 220 to a pixel resolution less than the initial or raw pixel resolution produced by the optical sensor (e.g., the RGB optical sensor 114 of fig. 1). The reduced image received by the vSLAM unit 116 may be used for feature detection and tracking by the feature tracking unit 356. The pixel resolution of the reduced image may be lower than the pixel resolution used during the initialization process (e.g., by initializer 254 of fig. 2). As described with reference to fig. 1, the feature tracking unit 356 may detect features in the image with reference to the coordinate system and maps generated during initialization of the vSLAM unit 116.
The performance monitor 270 may determine performance criteria associated, at least in part, with one or more parameters associated with the operation of the feature tracking unit 356. For example, performance monitor 270 may determine the processing speed of feature tracking unit 356, the processor (e.g., CPU) utilization associated with vSLAM unit 116, the power (e.g., battery) consumption rate of the vSLAM unit, or the initialization accuracy.
The processing rate (also referred to as speed) of the feature tracking unit 356 may be the time period of each feature tracking cycle. Similarly, the processing speed may be expressed as a number of cycles per unit time. The processing rate may be compared to a threshold and, in response to the processing rate not satisfying the threshold, the performance monitor 270 may adjust the operation of the scaler unit 220 to scale down the image set 210 and reduce the pixel resolution of the images received by the vSLAM unit 116. As discussed with respect to fig. 2, in coordination with adjusting the scaler unit 220, the performance monitor 270 may adjust the operation of the data scaling processor 240 to achieve a corresponding adjustment in the calibration data 230. The performance monitor 270 may adjust the scaler unit 220 and the data scaling processor 240 in response to any parameter failing to meet a threshold. Performance monitor 270 may also determine a single performance criterion value and adjust based on a comparison of that value to a threshold value. In some cases, such as in the case of a failed initialization and before the vSLAM unit 116 completes the re-initialization, the performance monitor 270 may adjust the scaler unit 220 and the data scaling processor 240 to reduce the amount of reduction of the image in the image set 210 as a way to improve feature detection.
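The rate comparison can be sketched in Python as follows. The function names, the 30 cycles-per-second target, the halving step, and the lower bound on the scale factor are assumptions for illustration only; the description does not fix any of these values.

```python
from typing import Sequence


def cycles_per_second(cycle_periods_s: Sequence[float]) -> float:
    """Convert measured per-cycle durations (seconds) into an average rate."""
    return len(cycle_periods_s) / sum(cycle_periods_s)


def next_scale(
    current_scale: float,
    cycle_periods_s: Sequence[float],
    min_rate_hz: float = 30.0,
    step: float = 0.5,
    min_scale: float = 0.125,
) -> float:
    """Return the scale factor to use next (1.0 means the original resolution).

    If tracking runs at least as fast as the target rate, the scale is kept;
    otherwise it is reduced by `step`, but never below `min_scale`.
    """
    if cycles_per_second(cycle_periods_s) >= min_rate_hz:
        return current_scale
    return max(min_scale, current_scale * step)


if __name__ == "__main__":
    print(next_scale(1.0, [0.05, 0.05]))  # slow cycles: the scale is reduced
    print(next_scale(1.0, [0.02, 0.02]))  # fast cycles: the scale is kept
```

Whenever the returned scale differs from the current one, the calibration data would need the corresponding adjustment so that image and calibration stay consistent.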
During operation, if the performance monitor 270 determines that the feature tracking performance is outside of the desired performance region, the image provided to the vSLAM unit may be scaled down to improve feature tracking performance, as described herein. In some embodiments, once the performance is within the desired performance region, the image provided to the vSLAM unit may be scaled up to a resolution closer to or equal to the native resolution of the camera system. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
The vSLAM unit 116 may include a Bundle Adjustment (BA) unit 358. The BA unit 358 may receive motion data 280 from an IMU (e.g., IMU 112 of fig. 1) and coordinates of features detected and tracked by the feature tracking unit 356. The BA unit 358 may optimize the output pose 360 generated by the vSLAM unit 116 to minimize a cost function that quantifies the error of fitting a model to parameters including, but not limited to, camera pose and coordinates in a coordinate map associated with a feature (e.g., feature 124 of fig. 1) detected in the three-dimensional environment and provided to the BA unit 358 by the feature tracking unit 356.
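For orientation, a cost function of this kind is often written as a sum of squared reprojection errors. The form below is a generic textbook sketch, not the specific cost function of the BA unit 358, and every symbol is an assumption introduced here: T_i is the camera pose for image i, X_j is the three-dimensional coordinate of feature j, x_{ij} is the observed two-dimensional location of feature j in image i, K is the intrinsic calibration valid at the resolution at which x_{ij} was measured, \pi is the camera projection, and \mathcal{V}(i) is the set of features visible in image i.

```latex
\min_{\{T_i\},\,\{X_j\}} \;
\sum_{i} \sum_{j \in \mathcal{V}(i)}
\left\| \, x_{ij} - \pi\!\left(K,\, T_i,\, X_j\right) \right\|^{2}
```

Because the BA unit 358 also receives motion data 280, an embodiment may add residual terms that relate consecutive poses to the IMU measurements; such terms are omitted from this sketch.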
In the system 300, the bundle adjustment operation in the BA unit 358 may not be affected by the operation of the scaler 220. For example, the adjustment of the pixel resolution of the images in the image set 210 received by the vSLAM unit 116 after the downscaling by the scaler 220 may be applied differently to the operation of the BA unit 358. In this way, the performance of the vSLAM unit 116 relative to the feature tracking unit 356 may be managed by the performance monitor 270 without affecting the output pose 360 generated by the vSLAM unit 116.
Fig. 4 is a simplified schematic diagram illustrating a system 400 for performing dynamic initialization and feature tracking using a vSLAM unit according to an embodiment of the present invention. vSLAM unit 116 may continue to operate during periods when the computer system in which it is incorporated (e.g., computer system 110 of fig. 1) may be moved from one environment to another with different conditions. For example, a cell phone running an AR application may be brought from an indoor environment to an outdoor environment while the AR application is running. In an embodiment of system 400, vSLAM unit 116 may include logic implemented as software or hardware to determine initialization state 452. Initialization state 452 may fail when the initial output pose, initial detected features, or coordinate system no longer describes the environment of the computer system with sufficient accuracy, or when an error function describing that accuracy fails a threshold test. In that case, vSLAM unit 116 may switch from feature tracking mode 474 to initialization mode 472, whereby performance monitor 270 adjusts scaler 220 to scale down the images in image set 210 received by vSLAM unit 116 to the pixel resolution used by initializer unit 254, rather than the pixel resolution used by feature tracking unit 356. Similarly, performance monitor 270 adjusts the operation of data scaling processor 240 in initialization mode 472 to provide updated and/or adjusted calibration data 230 to vSLAM unit 116 for use by initializer 254. The initialization mode 472 may include the dynamic adjustment described with reference to fig. 2.
After initializing or re-initializing vSLAM unit 116, when initialization state 452 indicates that the vSLAM system is initialized, system 400 may switch from initialization mode 472 to feature tracking mode 474, applying the updated coordinate graph and the initial output pose generated by initializer 254. Operation of the system 400 in the feature tracking mode 474 may proceed as described with reference to fig. 3, with continuous or periodic dynamic adjustment of the operation of the scaler 220 and the data scaling processor 240 to optimize performance of the vSLAM unit 116. As described with reference to fig. 3, operation of system 400 can be isolated to a single mode at a time. For example, when the system 400 is in the initialization mode 472, the performance monitor may only adjust the scaler 220 and data scaling processor for images in the image set 210 that are used for initialization and not for feature tracking. Thus, rather than starting feature tracking with initialization parameters, scaler 220 and the data scaling processor may apply scaling adjustments specific to feature tracking when system 400 switches from initialization mode 472 to feature tracking mode 474. The system 400 may store the scaler 220 and data scaling processor 240 parameters and/or may apply default parameters each time the system 400 switches modes. The system 400 may also apply other methods to optimize mode switching.
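The mode isolation and parameter storage described above can be sketched as a small controller that keeps one scale factor per mode, so that adjustments made in one mode are not carried into the other. The `Mode` enum, the `ModeController` class, and its method names are hypothetical names introduced for this illustration, and a single scale factor stands in for the full set of scaler and data scaling processor parameters.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Mode(Enum):
    INITIALIZATION = "initialization"
    FEATURE_TRACKING = "feature_tracking"


@dataclass
class ModeController:
    """Keeps a separate scale factor for each mode (hypothetical sketch)."""
    defaults: Dict[Mode, float]
    mode: Mode = Mode.INITIALIZATION
    stored: Dict[Mode, float] = field(default_factory=dict)

    def scale(self) -> float:
        """Scale for the current mode: the stored value if any, else the default."""
        return self.stored.get(self.mode, self.defaults[self.mode])

    def adjust(self, new_scale: float) -> None:
        """Record an adjustment for the current mode only."""
        self.stored[self.mode] = new_scale

    def update(self, initialized: bool) -> Mode:
        """Select the mode from the initialization state."""
        self.mode = Mode.FEATURE_TRACKING if initialized else Mode.INITIALIZATION
        return self.mode


if __name__ == "__main__":
    ctrl = ModeController(defaults={Mode.INITIALIZATION: 0.5, Mode.FEATURE_TRACKING: 0.25})
    ctrl.adjust(0.75)      # raised during initialization
    ctrl.update(True)      # initialization succeeded: switch to feature tracking
    print(ctrl.scale())    # feature tracking starts from its own default
```

Storing parameters per mode lets a later re-initialization resume from the last resolution that worked, while applying defaults on every switch would be the simpler alternative mentioned in the description.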
Fig. 5-7 are simplified flow diagrams illustrating methods of performing dynamic operations of a vSLAM unit in accordance with at least one aspect of the present disclosure. The flows are described in connection with a computer system that is an example of the computer system described above. Some or all of the operations of the flows may be implemented via specific hardware on a computer system and/or may be implemented as computer readable instructions stored on a non-transitory computer readable medium of a computer system. As stored, the computer readable instructions represent programmable modules comprising code executable by a processor of a computer system. Execution of such instructions configures the computer system to perform the corresponding operations. Each programmable module in combination with a processor represents means for performing the respective operations. While the operations are described in a particular order, it should be understood that the particular order is not required and that one or more operations may be omitted, skipped, and/or reordered. The methods described in the separate flows may be combined in a single method.
Fig. 5 is a simplified flowchart illustrating a method of initializing a vSLAM unit according to an embodiment of the present invention. The method includes initializing a visual simultaneous localization and mapping (vSLAM) unit in communication with a computer system using a first image and a first calibration data set, wherein the first image has a first pixel resolution (502). As described with reference to fig. 2, the vSLAM unit may be initialized prior to feature detection and tracking to determine parameters for mapping features in a three-dimensional environment to a two-dimensional coordinate system. Optionally, the method may include receiving an original image at an initial pixel resolution, receiving an initial calibration data set, generating a first image from the original image, and generating a first calibration data set from the initial calibration data set based at least in part on the first pixel resolution. As described with reference to fig. 1, the computer system may include RGB optical sensors, including but not limited to cameras, and may generate raw images at an initial or raw pixel resolution. When operating the system as described with reference to fig. 2-4, the image may be dynamically adjusted to optimize performance of the vSLAM unit (e.g., vSLAM unit 116 of fig. 1-4) during initialization and/or feature detection and tracking operations. Optionally, generating the first image may include receiving a reduction factor from the performance monitor at a scaler unit in communication with the performance monitor and the camera, and reducing the original image from an initial pixel resolution to a first pixel resolution based at least in part on the reduction factor, wherein the first pixel resolution is lower than the initial pixel resolution.
Optionally, the computer system may receive raw images from a camera in communication with the computer system. As described with reference to fig. 2, the vSLAM unit may utilize reduced pixel resolution for initialization and/or feature tracking operations in order to conserve system resources. For example, the computer system may be a mobile device, including but not limited to a smartphone or tablet. For example, a smartphone may capture images at a raw resolution of up to 20 MP, and operating the vSLAM unit at a pixel resolution of 20 MP may cause it to fall outside the expected performance targets. As described with reference to fig. 2-4, the computer system may implement a scaler to generate a reduced or enlarged image, enabling the vSLAM unit to operate efficiently. Optionally, the first calibration data set may be received by the vSLAM unit from a data scaling processor in communication with the performance monitor. Optionally, the first calibration data set may be generated based at least in part on a hardware calibration data set associated with a camera in communication with the computer system.
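As a worked example of choosing a reduction factor, the Python sketch below computes the linear factor that brings an image down to a target pixel count while preserving its aspect ratio. Because the pixel count scales with the square of the linear factor, the factor is the square root of the pixel-count ratio. The function name, the 5120 x 3840 sensor size (roughly 20 MP), and the 640 x 480 target are assumptions chosen so that the arithmetic is exact; none of them come from the description.

```python
import math
from typing import Tuple


def downscaled_size(width: int, height: int, target_pixels: int) -> Tuple[int, int, float]:
    """Return (new_width, new_height, factor) for a target pixel budget.

    The factor is capped at 1.0, so an image that already fits the budget
    is returned at its original size rather than enlarged.
    """
    factor = min(1.0, math.sqrt(target_pixels / (width * height)))
    new_width = max(1, round(width * factor))
    new_height = max(1, round(height * factor))
    return new_width, new_height, factor


if __name__ == "__main__":
    # 5120 * 3840 = 19,660,800 pixels; 640 * 480 = 307,200 pixels; ratio 1/64.
    print(downscaled_size(5120, 3840, 640 * 480))  # (640, 480, 0.125)
```

In this example the vSLAM unit would process 64 times fewer pixels per image than at the raw resolution.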
The method also includes determining an initialization quality value (504). In some cases, the computer system may determine an initialization quality value as a way to increase initialization accuracy and improve vSLAM operation. Alternatively, the computer system may determine the initialization quality value at least in part by measuring the initialization accuracy using a performance monitor in communication with the computer system. As described in more detail with reference to fig. 2-4, the initialization quality value may be determined according to a cost function based at least in part on a measure of error between the motion data and the image data.
The method also includes determining that the initialization quality value is outside a predetermined threshold (506). As described with reference to fig. 2, the computer system may compare the initialization quality value to a threshold value. For example, the comparison may indicate that the initialization quality is not satisfactory. The system may repeat the initialization, adjusting the image resolution upward until the accuracy is within a predetermined threshold.
The method also includes generating a second image at a second pixel resolution higher than the first pixel resolution (508). Alternatively, the second image may be generated from the first image. As described in more detail with reference to fig. 2-4, the vSLAM unit may receive an image in the set of images, which is used for initialization. In some cases, the image may be scaled down for the first initialization process, which may produce unsatisfactory results, and thus scaled up to a higher second pixel resolution for the second initialization process.
The method also includes generating a second calibration data set based at least in part on a second pixel resolution associated with the second image (510). Optionally, the data scaling processor may generate the second calibration data set based at least in part on one or more instructions from the performance monitor. As described in more detail with reference to fig. 2-4, the performance monitor may adjust the calibration data based at least in part on the pixel resolution of the image generated by the scaler.
The method also includes reinitializing the vSLAM unit using the second image and the second calibration data set (512). In the initialization mode, the vSLAM unit of the computer system may continue to operate, determine whether the system is initialized, and continue initialization operations until the quality of initialization meets a minimum level of accuracy, as described with reference to fig. 4. For example, if the initialization operation using one image is not satisfactory, the vSLAM system may repeat the operation for a new image.
It should be understood that the specific steps shown in fig. 5 provide a particular method of initializing a vSLAM unit according to an embodiment of the present invention. Other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present invention may perform the above steps in a different order. Moreover, the individual steps shown in FIG. 5 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step. In addition, additional steps may be added or deleted depending on the particular application. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
FIG. 6 is a simplified flow diagram illustrating a method of performing feature tracking according to an embodiment of the present invention. The method includes performing feature tracking using a visual simultaneous localization and mapping (vSLAM) unit in communication with a computer system at least in part by detecting one or more features in a first image at least in part from a first calibration data set, wherein the first image has a first pixel resolution (602). As described with reference to fig. 3, the computer system may detect features and track those features over time and motion, optimizing the output pose of the RGB optical sensor using, at least in part, the tracked features, enabling the computer system to function as an AR system. Optionally, the first pixel resolution may be determined by an initializer in communication with the vSLAM unit, as described in more detail with reference to fig. 5. Optionally, the method includes receiving an original image at an initial pixel resolution, receiving an initial calibration data set, generating a first image from the original image, and generating a first calibration data set from the initial calibration data set based at least in part on the first pixel resolution. Optionally, as described in more detail with reference to fig. 2, the scaler unit may receive the raw image at an initial pixel resolution from a camera in communication with the computer system. Optionally, the first calibration data set may be generated by a data scaling processor in communication with the computer system, wherein the first calibration data set is associated with one or more characteristics of the camera.
The method also includes determining tracking performance criteria (604). As described with reference to fig. 3, the computer system may continue to operate while repeatedly performing multiple iterations of feature tracking, dynamically modifying the pixel resolution and calibration data used by the vSLAM unit to optimize the performance determined by the performance monitor. To this end, the computer system may determine performance characteristics, as described with reference to fig. 3, to determine how to dynamically optimize the performance of a vSLAM unit (e.g., vSLAM unit 116 of fig. 1-4). Optionally, the performance monitor determines the performance criteria at least in part by measuring one or more of a feature detection speed, a CPU utilization value, or a power consumption value.
The method also includes determining that the tracking performance criteria is outside of a predetermined threshold (606). As described in more detail with reference to fig. 3-4, the tracking performance criteria may be monitored for each feature tracking process (602) to measure whether the vSLAM unit is operating within predetermined parameters. For example, the tracking performance criterion may indicate that the speed of feature tracking is too slow compared to a predetermined threshold.
The method also includes generating a second image at a second pixel resolution that is lower than the first pixel resolution (608). As described with reference to fig. 3, in response to the performance criterion failing to meet the threshold, the second image may be scaled down by a different amount than the first image. Optionally, the second image is generated from the first image. In this manner, the second image may be provided by downscaling the first image from the first pixel resolution to the second pixel resolution, as opposed to downscaling the corresponding original image from the initial pixel resolution to the second pixel resolution.
The method also includes generating a second calibration data set based at least in part on a second pixel resolution of a second image (610). Optionally, the data scaling processor generates a second calibration data set based at least in part on the one or more instructions from the performance monitor. As described in more detail with reference to fig. 3-5, the vSLAM unit may receive calibration data from the data scaling processor, which may receive instructions from a performance monitor. As part of the dynamic vSLAM unit operation, calibration data may be adjusted based at least in part on the pixel resolution used by the vSLAM unit for feature tracking. Optionally, a second calibration data set is generated from the first calibration data set.
The method also includes performing feature tracking (612) using the vSLAM unit, the second image, and the second calibration data set. As described in more detail with reference to fig. 3-4, the vSLAM unit may continuously perform feature tracking on a set of images received from a sensor (e.g., a camera or other sensor), dynamically adjusting the pixel resolution used by the vSLAM system.
It should be appreciated that the specific steps illustrated in FIG. 6 provide a particular method of tracking one or more features in an image according to an embodiment of the invention. Other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present invention may perform the above steps in a different order. Moreover, the individual steps shown in FIG. 6 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step. In addition, additional steps may be added or deleted depending on the particular application. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
FIG. 7 is a simplified flow diagram illustrating a method of performing initialization and feature tracking according to an embodiment of the present invention. The method includes receiving an original image at an initial pixel resolution (702).
The method also includes receiving an initial calibration data set (704).
The method also includes reducing the original image to provide a first reduced image at a first reduced pixel resolution that is lower than the initial pixel resolution (706).
The method also includes generating a first calibration data set from the initial calibration data set based at least in part on a first reduced pixel resolution associated with the first image (708).
The method also includes initializing the vSLAM system using the first reduced pixel resolution and the first calibration data set (710).
The method also includes generating a first initialization quality value (712).
The method also includes determining that the first initialization quality value is outside a predetermined threshold (714).
The method also includes generating a second image at a second pixel resolution, the second pixel resolution being higher than the first pixel resolution and lower than the initial pixel resolution (716).
The method also includes generating a second calibration data set based at least in part on a second pixel resolution of the second image (718).
The method also includes reinitializing the vSLAM unit using the second image and the second calibration data set (720).
The method also includes determining a second initialization quality value (722).
The method also includes determining that the second initialization quality value is within a predetermined threshold (724).
The method also includes receiving a third image (726). Optionally, the third image is received at the second pixel resolution. As described in more detail with reference to fig. 6, the feature tracking operation may be performed using the pixel resolution used for initialization as the first pixel resolution, and a performance monitor in communication with the scaler and data scaling processor may dynamically adjust the pixel resolution of the images in the image set to optimize vSLAM unit performance.
The method also includes tracking one or more features in the third image (728).
The method also includes determining tracking performance criteria (732).
The method also includes determining that the tracking performance criteria is outside of a predetermined threshold (734).
The method also includes reducing the third image to provide a third reduced image at a third pixel resolution that is lower than the second pixel resolution (736).
The method also includes tracking one or more features in the third scaled-down image (738).
It should be appreciated that the specific steps illustrated in fig. 7 provide a particular method of initializing a vSLAM unit and tracking one or more features in an image according to embodiments of the present invention. Other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present invention may perform the above steps in a different order. Moreover, the individual steps shown in FIG. 7 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step. In addition, additional steps may be added or deleted depending on the particular application. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
FIG. 8 illustrates an example computer system according to an embodiment of the present invention. Computer system 800 is an example of a computer system described above. Although these components are shown as belonging to the same computer system 800, the computer system 800 may also be distributed.
Computer system 800 includes at least a processor 802, a memory 804, a storage device 806, an input/output (I/O) device 808, a communication peripheral device 810, and an interface bus 812. Interface bus 812 may be used to communicate, send and transfer data, control and commands between the various components of computer system 800. The memory 804 and the storage device 806 include computer-readable storage media such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM), hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage, and other tangible storage media. Any such computer-readable storage media may be used to store instructions or program code embodying aspects of the present disclosure. Memory 804 and storage 806 also include computer-readable signal media. A computer readable signal medium includes a propagated data signal with computer readable program code embodied therein. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any combination thereof. Computer readable signal media includes any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use in connection with computer system 800.
Further, memory 804 may include an operating system, programs, and applications. The processor 802 is used to execute stored instructions and includes, for example, logic processing units, microprocessors, digital signal processors, and other processors. The memory 804 and/or the processor 802 may be virtualized and may be hosted in another computer system, such as a cloud network or a data center. The I/O peripherals 808 include user interfaces such as keyboards, screens (e.g., touch screens), microphones, speakers, other input/output devices, and computing components such as graphics processing units, serial ports, parallel ports, universal serial buses, and other input/output peripherals. I/O peripheral 808 is connected to processor 802 by any port coupled to interface bus 812. Communication peripherals 810 are used to facilitate communication between computer system 800 and other computing devices over a communication network and include, for example, network interface controllers, modems, wireless and wired interface cards, antennas, and other communication peripherals.
While the present subject matter has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. It is therefore to be understood that the present disclosure is presented for purposes of illustration and not limitation, and does not preclude inclusion of such modifications, variations and/or additions to the present subject matter as would be readily apparent to one of ordinary skill. Indeed, the methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the disclosure. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the disclosure.
Unless specifically stated otherwise, it is appreciated that throughout the description, discussions utilizing terms such as "processing," "computing," "calculating," "determining," and "identifying" or the like refer to the actions and processes of a computing device (e.g., one or more computers or similar electronic computing devices) that manipulates and transforms data represented as physical electronic or magnetic quantities within the computing platform's memories, registers, or other information storage devices, transmission devices, or display devices.
The one or more systems discussed herein are not limited to any particular hardware architecture or configuration. The computing device may include any suitable arrangement of components that provides results conditioned on one or more inputs. Suitable computing devices include a microprocessor-based, multi-purpose computer system with access to stored software that programs or configures the computer system from a general-purpose computing device to a special-purpose computing device implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combination of languages may be used to implement the teachings contained herein in software used to program or configure a computing device.
Embodiments of the methods disclosed herein may be performed in the operation of such computing devices. The order of the blocks presented in the above examples may be changed; for example, the blocks may be reordered, combined, and/or broken into sub-blocks. Some blocks or processes may be performed in parallel.
Conditional language used herein, such as "may," "e.g.," and the like, unless expressly stated otherwise or otherwise understood in the context of usage, is generally intended to convey that certain examples include but others do not include certain features, elements and/or steps. Thus, such conditional language does not generally imply that features, elements, and/or steps are in any way required by one or more examples or that one or more examples must include logic for deciding, with or without author input or prompting, whether these features, elements, and/or steps are included or are to be performed in any particular example.
The terms "comprising," "having," and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude other elements, features, acts, operations, and the like. Furthermore, the term "or" is used in its inclusive (and not exclusive) sense, such that when used, for example, to connect lists of elements, the term "or" indicates one, some, or all of the elements in the list. As used herein, "adapted to" or "for" refers to open and inclusive language and does not exclude devices adapted to or used to perform additional tasks or steps. Moreover, the use of "based on" is meant to be open and inclusive in that a process, step, calculation, or other action that is "based on" one or more recited conditions or values may in fact be based on additional conditions or values beyond those recited. Similarly, the use of "based, at least in part, on" is meant to be open and inclusive, in that a process, step, calculation, or other action that is "based, at least in part, on" one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbers are included herein for ease of explanation only and are not meant to be limiting.
The various features and processes described above may be used independently of one another or may be used in various combinations. All possible combinations and sub-combinations are intended to fall within the scope of the present disclosure. Moreover, certain method or process blocks may be omitted in some embodiments. The methods and processes described herein are also not limited to any particular order, and the blocks or states associated therewith may be performed in other suitable orders. For example, described blocks or states may be performed in an order different than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in series, in parallel, or in some other manner. Blocks or states may be added to or deleted from the disclosed examples. Similarly, the example systems and components described herein may be configured differently than described. For example, elements may be added, removed, or rearranged as compared to the disclosed examples.

Claims (24)

1. A method implemented by a computer system, the method comprising:
initializing a visual simultaneous localization and mapping (vSLAM) unit in communication with the computer system using a first image and a first calibration data set, wherein the first image has a first pixel resolution;
determining an initialization quality value;
determining that the initialization quality value is outside a predetermined initialization threshold;
generating a second image at a second pixel resolution higher than the first pixel resolution;
generating a second calibration data set based at least in part on the second pixel resolution associated with the second image; and
reinitializing the vSLAM unit using the second image and the second calibration data set.
2. The method of claim 1, further comprising:
receiving an original image at an initial pixel resolution;
receiving an initial calibration data set;
generating the first image from the original image; and
generating the first calibration data set from the initial calibration data set based at least in part on the first pixel resolution.
3. The method of claim 2, wherein the computer system receives the original image from an optical sensor in communication with the computer system.
4. The method of claim 3, wherein generating the first image comprises:
receiving a downscaling factor from a performance monitor at a scaler unit in communication with the performance monitor and the optical sensor; and
downscaling the original image from the initial pixel resolution to the first pixel resolution based at least in part on the downscaling factor, wherein the first pixel resolution is lower than the initial pixel resolution.
5. The method of claim 1, wherein:
initializing the vSLAM unit includes generating an initialization result including at least one of an initial output pose, a coordinate system, or an initial object mapping; and
determining the initialization quality value includes: measuring an initialization accuracy using a performance monitor in communication with the computer system at least in part by measuring an error between the initialization result and motion data generated by an inertial measurement unit in communication with the vSLAM unit.
6. The method of claim 1, wherein the second image is generated from the first image by enlarging the first image from the first pixel resolution to the second pixel resolution.
7. The method of claim 1, wherein the first calibration data set is received by the vSLAM unit from a data scaling processor in communication with a performance monitor.
8. The method of claim 7, wherein the data scaling processor generates the second calibration data set at least in part according to one or more instructions from the performance monitor.
9. The method of claim 1, wherein the first calibration data set is generated based at least in part on a hardware calibration data set associated with an optical sensor in communication with the computer system.
10. The method of claim 1, further comprising:
performing feature tracking using the vSLAM unit in communication with the computer system at least in part by tracking one or more features in a first tracking image having a first tracking pixel resolution using a first tracking calibration data set;
determining a tracking performance criterion;
determining that the tracking performance criterion is outside a predetermined tracking threshold;
generating a second tracking image at a second tracking pixel resolution lower than the first tracking pixel resolution;
generating a second tracking calibration data set based at least in part on the second tracking pixel resolution of the second tracking image; and
performing feature tracking using the vSLAM unit, the second tracking image, and the second tracking calibration data set.
11. The method of claim 10, wherein the first tracking pixel resolution is the second pixel resolution, determined by an initializer in communication with the vSLAM unit.
12. The method of claim 10, wherein:
the first tracking pixel resolution is lower than the second pixel resolution; and
the first tracking image is generated from the second image by downscaling the second image from the second pixel resolution to the first tracking pixel resolution.
13. The method of claim 10, wherein the second tracking calibration data set is generated from the first tracking calibration data set at least in part according to the second pixel resolution.
14. The method of claim 10, wherein a performance monitor determines the tracking performance criterion at least in part by measuring one or more of a feature detection speed, a CPU utilization value, or a power consumption value.
15. A system for implementing dynamic visual simultaneous localization and mapping (vSLAM) in a mobile device, the system comprising:
a memory for storing computer executable instructions;
an optical sensor to generate an image at an initial pixel resolution;
a motion sensor for generating motion data;
a scaler in communication with the optical sensor;
a data scaling processor in communication with the memory;
a performance monitor in communication with the scaler and the data scaling processor;
a vSLAM unit in communication with the performance monitor, the data scaling processor, the scaler, the optical sensor, and the motion sensor; and
one or more processors in communication with the memory to execute the computer-executable instructions to at least:
when the vSLAM unit is not initialized, perform dynamic initialization of the vSLAM unit; and
upon initialization of the vSLAM unit, cause the vSLAM unit to perform dynamic feature tracking on one or more features in an image generated by the optical sensor.
16. The system of claim 15, wherein the dynamic initialization comprises:
the scaler receiving an original image from the optical sensor at the initial pixel resolution;
the data scaling processor receiving an initial calibration data set from the memory;
the scaler downscaling the original image to provide a first reduced image at a first reduced pixel resolution that is lower than the initial pixel resolution;
the data scaling processor generating a first calibration data set from the initial calibration data set based at least in part on the first reduced pixel resolution;
initializing the vSLAM unit using the first reduced image and the first calibration data set;
the performance monitor generating a first initialization quality value;
the performance monitor determining that the first initialization quality value is outside a predetermined initialization threshold;
the scaler generating a second reduced image at a second pixel resolution, the second pixel resolution being higher than the first reduced pixel resolution and lower than the initial pixel resolution;
the data scaling processor generating a second calibration data set based at least in part on the second pixel resolution of the second reduced image;
reinitializing the vSLAM unit using the second reduced image and the second calibration data set;
the performance monitor determining a second initialization quality value; and
the performance monitor determining that the second initialization quality value is within the predetermined initialization threshold.
17. The system of claim 16, wherein the scaler generates the second reduced image by enlarging the first reduced image from the first reduced pixel resolution to the second pixel resolution.
18. The system of claim 16, wherein dynamic feature tracking comprises:
the vSLAM unit receiving a third image from the scaler at a third pixel resolution;
the vSLAM unit receiving a third calibration data set from the data scaling processor at least in part according to the third pixel resolution;
the vSLAM unit tracking one or more features in the third image;
the performance monitor determining a tracking performance criterion;
the performance monitor determining that the tracking performance criterion is outside a predetermined tracking threshold;
the scaler generating a fourth image at a fourth pixel resolution lower than the third pixel resolution;
the data scaling processor generating a fourth calibration data set based at least in part on the fourth pixel resolution;
the vSLAM unit tracking the one or more features in the fourth image;
the performance monitor determining an updated tracking performance criterion based at least in part on tracking the one or more features in the fourth image; and
the performance monitor determining that the updated tracking performance criterion is within the predetermined tracking threshold.
19. The system of claim 18, wherein dynamic feature tracking further comprises the one or more processors storing the third pixel resolution in the memory, and the scaler scaling down the image received from the optical sensor from the initial pixel resolution to the third pixel resolution when performing a dynamic feature tracking operation.
20. The system of claim 18, wherein the third image is received from the scaler at the second pixel resolution.
21. The system of claim 18, wherein the scaler generates the fourth image by scaling down the third image from the third pixel resolution to the fourth pixel resolution.
22. The system of claim 18, wherein the performance monitor determines the tracking performance criterion at least in part by measuring one or more of a feature detection speed, a CPU utilization value, or a power consumption value.
23. A method implemented by a computer system, the method comprising:
receiving an original image at an initial pixel resolution;
receiving an initial calibration data set;
downscaling the original image to provide a first reduced image at a first reduced pixel resolution lower than the initial pixel resolution;
generating a first calibration data set from the initial calibration data set based at least in part on the first reduced pixel resolution;
initializing a visual simultaneous localization and mapping (vSLAM) system using the first reduced image and the first calibration data set;
generating a first initialization quality value;
determining that the first initialization quality value is outside a predetermined initialization threshold;
generating a second reduced image at a second pixel resolution, the second pixel resolution being higher than the first reduced pixel resolution and lower than the initial pixel resolution;
generating a second calibration data set based at least in part on the second pixel resolution of the second reduced image;
reinitializing the vSLAM system using the second reduced image and the second calibration data set;
determining a second initialization quality value;
determining that the second initialization quality value is within the predetermined initialization threshold;
receiving a third image;
tracking one or more features in the third image;
determining a tracking performance criterion;
determining that the tracking performance criterion is outside a predetermined tracking threshold;
downscaling the third image to provide a third reduced image at a third pixel resolution lower than the second pixel resolution; and
tracking the one or more features in the third reduced image.
24. The method of claim 23, wherein the third image is received at the second pixel resolution.
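
The flow recited in claims 1, 10, and 23 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes that each calibration data set is a pinhole intrinsics record (focal lengths and principal point in pixels, measured from the image corner), that the initialization quality value is an error compared against a maximum, and that the tracking performance criterion is a per-frame processing time compared against a budget. The names `Intrinsics`, `scale_intrinsics`, `dynamic_initialize`, and `next_tracking_factor` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Intrinsics:
    """Hypothetical calibration data set tied to one pixel resolution."""
    width: int
    height: int
    fx: float  # focal length in pixels, x axis
    fy: float  # focal length in pixels, y axis
    cx: float  # principal point, x (assumed measured from the image corner)
    cy: float  # principal point, y (assumed measured from the image corner)


def scale_intrinsics(calib: Intrinsics, factor: float) -> Intrinsics:
    """Derive the calibration data for an image resized by `factor`.

    Under the assumed pinhole model, every pixel-valued quantity
    scales linearly with the resize factor.
    """
    return Intrinsics(
        width=round(calib.width * factor),
        height=round(calib.height * factor),
        fx=calib.fx * factor,
        fy=calib.fy * factor,
        cx=calib.cx * factor,
        cy=calib.cy * factor,
    )


def dynamic_initialize(
    initial: Intrinsics,
    factors: Iterable[float],
    init_error: Callable[[Intrinsics], float],
    max_error: float,
) -> Tuple[float, Intrinsics]:
    """Try the lowest resolution first and raise it until the error is acceptable.

    `init_error` stands in for running vSLAM initialization at the given
    resolution and having a performance monitor score the result
    (for example against IMU motion data); it is an assumption of this sketch.
    """
    for factor in sorted(factors):
        calib = scale_intrinsics(initial, factor)
        if init_error(calib) <= max_error:
            return factor, calib
    raise RuntimeError("initialization failed at every candidate resolution")


def next_tracking_factor(
    factor: float,
    frame_time_ms: float,
    budget_ms: float,
    step: float = 0.5,
    min_factor: float = 0.125,
) -> float:
    """Lower the tracking resolution when the measured cost exceeds the budget."""
    if frame_time_ms > budget_ms:
        return max(factor * step, min_factor)
    return factor
```

As a usage example, a 1920x1080 sensor with candidate factors 0.25, 0.5, and 1.0 would be initialized at the smallest factor whose error passes the threshold, and the returned calibration record would be handed to the vSLAM unit together with the resized image. During tracking, `next_tracking_factor` would be called once per frame, and `scale_intrinsics` would be reapplied to the initial calibration data whenever the factor changes, so that the image and its calibration data stay in step.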
CN202180012967.XA 2020-02-05 2021-02-04 Method and system for implementing dynamic input resolution of vSLAM system Active CN115088016B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202062970598P 2020-02-05 2020-02-05
US62/970,598 2020-02-05
PCT/CN2021/075354 WO2021155828A1 (en) 2020-02-05 2021-02-04 Method and system for implementing dynamic input resolution for vslam systems

Publications (2)

Publication Number Publication Date
CN115088016A (en) 2022-09-20
CN115088016B (en) 2024-08-23

Family

ID=77199708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180012967.XA Active CN115088016B (en) 2020-02-05 2021-02-04 Method and system for implementing dynamic input resolution of vSLAM system

Country Status (2)

Country Link
CN (1) CN115088016B (en)
WO (1) WO2021155828A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108476311A (en) * 2015-11-04 2018-08-31 奇跃公司 Dynamic display calibration based on eye tracking
CN109029433A (en) * 2018-06-28 2018-12-18 东南大学 Method for calibrating extrinsic parameters and timing based on vision and inertial navigation fusion SLAM on a mobile platform
CN109238306A (en) * 2018-08-30 2019-01-18 Oppo广东移动通信有限公司 Step counting data verification method, device, storage medium and terminal based on wearable device
US20190096081A1 (en) * 2017-09-28 2019-03-28 Samsung Electronics Co., Ltd. Camera pose determination and tracking
CN109804411A (en) * 2016-08-30 2019-05-24 C3D增强现实解决方案有限公司 System and method for positioning and mapping simultaneously

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11199414B2 (en) * 2016-09-14 2021-12-14 Zhejiang University Method for simultaneous localization and mapping
US11080890B2 (en) * 2017-07-28 2021-08-03 Qualcomm Incorporated Image sensor initialization in a robotic vehicle
CN108829595B (en) * 2018-06-11 2022-05-17 Oppo(重庆)智能科技有限公司 Test method, test device, storage medium and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108476311A (en) * 2015-11-04 2018-08-31 奇跃公司 Dynamic display calibration based on eye tracking
CN109804411A (en) * 2016-08-30 2019-05-24 C3D增强现实解决方案有限公司 System and method for positioning and mapping simultaneously
US20190096081A1 (en) * 2017-09-28 2019-03-28 Samsung Electronics Co., Ltd. Camera pose determination and tracking
CN109029433A (en) * 2018-06-28 2018-12-18 东南大学 Method for calibrating extrinsic parameters and timing based on vision and inertial navigation fusion SLAM on a mobile platform
CN109238306A (en) * 2018-08-30 2019-01-18 Oppo广东移动通信有限公司 Step counting data verification method, device, storage medium and terminal based on wearable device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GEORG KLEIN, DAVID MURRAY: "Parallel tracking and mapping for small AR workspaces", 2007 6TH IEEE AND ACM INTERNATIONAL SYMPOSIUM ON MIXED AND AUGMENTED REALITY, 13 November 2007 (2007-11-13) *
林辉灿; 吕强; 王国胜; 张洋; 梁冰: "Three-dimensional simultaneous localization and mapping of autonomous mobile robots based on VSLAM", Journal of Computer Applications (计算机应用), no. 10, 10 October 2017 (2017-10-10) *

Also Published As

Publication number Publication date
CN115088016B (en) 2024-08-23
WO2021155828A1 (en) 2021-08-12

Similar Documents

Publication Publication Date Title
US9927870B2 (en) Virtual reality system with control command gestures
CN110462555B (en) Selectively applying a reprojection process on a layer sub-region to optimize post-reprojection power
US10394318B2 (en) Scene analysis for improved eye tracking
CN107646126B (en) Camera pose estimation for mobile devices
US20180275748A1 (en) Selectively applying reprojection processing to multi-layer scenes for optimizing late stage reprojection power
US11676292B2 (en) Machine learning inference on gravity aligned imagery
JP6360509B2 (en) Information processing program, information processing system, information processing method, and information processing apparatus
WO2022022449A1 (en) Method and apparatus for spatial positioning
KR20220036974A (en) Methods, systems, and media for rendering immersive video content with foveated meshes
CN114911398A (en) Method for displaying graphical interface, electronic device and computer program product
KR20210046759A (en) Image display methods, devices and systems
CN109598672B (en) Map road rendering method and device
KR20190048614A (en) Method and apparatus for recognizing pose
US20230005172A1 (en) Method and System for Implementing Adaptive Feature Detection for VSLAM Systems
CN115088016B (en) Method and system for implementing dynamic input resolution of vSLAM system
CN108369726B (en) Method for changing graphic processing resolution according to scene and portable electronic device
CN112099712B (en) Face image display method and device, electronic equipment and storage medium
CN105376510A (en) Projection method and projection device
CN107817887B (en) Information processing method and device
CN118823271A (en) Screen conversion parameter determining method and device from three-dimensional model to wearable equipment
CN116007621A (en) Positioning mode switching method and device, intelligent wearable device and storage medium
CN115779418A (en) Image rendering method and device, electronic equipment and storage medium
JP2004102337A (en) Image processing apparatus, image processing method, storage medium with program for executing the method stored therein, and program therefor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant