WO2021160071A1 - Feature spatial distribution management for simultaneous localization and mapping - Google Patents

Feature spatial distribution management for simultaneous localization and mapping

Info

Publication number
WO2021160071A1
WO2021160071A1 (PCT/CN2021/075851)
Authority
WO
WIPO (PCT)
Prior art keywords
feature points
subset
image
coordinates
space
Prior art date
Application number
PCT/CN2021/075851
Other languages
French (fr)
Inventor
Youjie XIA
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority to CN202180011166.1A (CN115004229A)
Publication of WO2021160071A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 7/579 Depth or shape recovery from multiple images from motion

Definitions

  • Augmented Reality superimposes virtual content over a user's view of the real world.
  • SDK: AR software development kits
  • An AR SDK typically provides six degrees-of-freedom (6DoF) tracking capability.
  • a user can scan the environment using a smartphone's camera, and the smartphone performs visual inertial odometry (VIO) in real time. Once the camera pose is tracked continuously, virtual objects can be placed into the AR scene to create an illusion that real objects and virtual objects are merged together.
  • This disclosure generally relates to extended reality technologies, and more specifically, and without limitation, to feature spatial distribution management for localization and mapping.
  • Embodiments may include a method of managing feature spatial distribution for localization and mapping.
  • the method may include capturing a two-dimensional (2D) image of a three-dimensional (3D) space.
  • the 2D image may include a plurality of feature points that have been previously detected and tracked.
  • the method may further include selecting, from the plurality of feature points, a first subset of feature points that may be evenly distributed in the 2D image.
  • the method may also include retrieving, from a point cloud, coordinates of the first subset of feature points.
  • the point cloud may include previously determined coordinates that define locations of at least the first subset of feature points in a global coordinate map of the 3D space.
  • the method may further include selecting, from the first subset of feature points and based at least in part on the retrieved coordinates, a second subset of feature points that may be evenly distributed in the 3D space.
  • the method may further include optimizing the coordinates of the second subset of feature points based at least in part on a camera pose from which the 2D image may be captured.
  • the camera pose may be determined based on data received from an inertial measurement unit (IMU) configured to track the camera pose in the 3D space.
  • the method may also include updating the coordinates of the second subset of feature points in the point cloud with the optimized coordinates of the second subset of feature points.
  • the plurality of feature points may be a first plurality of feature points.
  • the method may further include detecting a second plurality of feature points in the 2D image.
  • the method may further include selecting, from the second plurality of feature points, a third subset of feature points such that the first subset of feature points and the third subset of feature points combined may be evenly distributed in the 2D image.
  • the third subset of feature points may be selected such that a number of the first subset of feature points and the third subset of feature points combined may be within a predetermined range.
  • the method may further include tracking the third subset of feature points in one or more subsequently captured 2D images.
  • the method may also include determining coordinates of the third subset of feature points using triangulation.
  • the method may also include adding, to the point cloud, the third subset of feature points and the corresponding coordinates of the third subset of feature points.
  • the method may further include capturing another 2D image of the 3D space.
  • the another 2D image may include at least a portion of the second subset of feature points or a portion of the third subset of feature points.
  • the method may also include selecting, from at least the portion of the second subset of feature points or the portion of the third subset of feature points combined, a fourth subset of feature points that may be evenly distributed in the another 2D image.
  • the method may further include retrieving, from the point cloud, the coordinates of the fourth subset of feature points.
  • the method may further include selecting, from the fourth subset of feature points, a fifth subset of feature points such that the fifth subset of feature points may be evenly distributed in the 3D space.
  • the method may also include optimizing the coordinates of the fifth subset of feature points based at least in part on a camera pose from which the another 2D image may be captured.
  • the plurality of feature points may be detected based on at least one of color or color intensity of one or more adjacent pixels representing each of the plurality of feature points and surrounding pixels in one or more previously captured 2D images.
  • the method may further include dividing the 2D image into a plurality of areas that have an equal size.
  • the first subset of feature points may be evenly distributed in the 2D image in that the selected feature points may be distributed among different areas of the plurality of areas.
  • the method may further include dividing a space in the global coordinate map that corresponds to the 3D space into a plurality of volumes that have an equal size.
  • the second subset of feature points may be evenly distributed in the 3D space in that the selected feature points may be distributed among different volumes of the plurality of volumes.
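The grid- and voxel-based selection summarized above can be sketched as follows. This is a minimal illustration, not the patented implementation; the grid size, voxel size, and the dictionary-based data structures (feature IDs mapped to pixel or map coordinates) are assumptions made for the example.

```python
# Sketch of the claimed selection flow: pick feature points that are evenly
# distributed in the 2D image (at most one per grid cell), look up their 3D
# coordinates in a point cloud, then keep only those that are also evenly
# distributed in 3D space (at most one per voxel). All names are illustrative.

def select_evenly_2d(points, image_size, grid=(8, 8)):
    """points: dict {feature_id: (u, v)} pixel coordinates."""
    w, h = image_size
    cw, ch = w / grid[0], h / grid[1]
    chosen, occupied = {}, set()
    for fid, (u, v) in points.items():
        cell = (int(u // cw), int(v // ch))
        if cell not in occupied:          # at most one point per 2D cell
            occupied.add(cell)
            chosen[fid] = (u, v)
    return chosen

def select_evenly_3d(feature_ids, point_cloud, voxel_size=0.5):
    """point_cloud: dict {feature_id: (x, y, z)} in the global map frame."""
    chosen, occupied = [], set()
    for fid in feature_ids:
        if fid not in point_cloud:        # e.g. newly detected, not yet triangulated
            continue
        x, y, z = point_cloud[fid]
        voxel = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        if voxel not in occupied:         # at most one point per 3D voxel
            occupied.add(voxel)
            chosen.append(fid)
    return chosen

# Usage: first subset = 2D-even tracked points; second subset = also 3D-even.
tracked = {1: (40, 30), 2: (42, 31), 3: (600, 80), 4: (100, 400)}
cloud   = {1: (0.2, 0.1, 1.0), 2: (0.21, 0.12, 1.02), 3: (2.5, 0.4, 3.0), 4: (0.4, 1.8, 1.1)}
first_subset  = select_evenly_2d(tracked, image_size=(640, 480))
second_subset = select_evenly_3d(first_subset.keys(), cloud)
```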
  • Embodiments may further include an electronic device for managing feature spatial distribution for localization and mapping.
  • the electronic device may include a camera, one or more processors, and a memory.
  • the memory may have instructions that, when executed by the one or more processors, may cause the electronic device to perform the operation of capturing, using the camera, a two-dimensional (2D) image of a three-dimensional (3D) space.
  • the 2D image may include a plurality of feature points that have been previously detected and tracked by the electronic device.
  • the instructions when executed by the one or more processors, may further cause the electronic device to perform the operation of selecting, from the plurality of feature points, a first subset of feature points that may be evenly distributed in the 2D image.
  • the instructions when executed by the one or more processors, may also cause the electronic device to perform the operation of retrieving, from a point cloud, coordinates of the first subset of feature points.
  • the point cloud may include previously determined coordinates that define locations of at least the first subset of feature points in a global coordinate map of the 3D space.
  • the instructions when executed by the one or more processors, may further cause the electronic device to perform the operation of selecting, from the first subset of feature points and based at least in part on the retrieved coordinates, a second subset of feature points that may be evenly distributed in the 3D space.
  • the instructions when executed by the one or more processors, may further cause the electronic device to perform the operation of optimizing the coordinates of the second subset of feature points based at least in part on a pose of the camera from which the 2D image may be captured.
  • the pose of the camera may be determined based on data received from a sensor unit of the electronic device.
  • the sensor unit may include an inertial measurement unit (IMU) configured to track the camera pose in the 3D space.
  • the instructions when executed by the one or more processors, may also cause the electronic device to perform the operation of updating the coordinates of the second subset of feature points in the point cloud with the optimized coordinates of the second subset of feature points.
  • the plurality of feature points may be a first plurality of feature points.
  • the instructions, when executed by the one or more processors, may further cause the electronic device to perform the operation of detecting a second plurality of feature points in the 2D image.
  • the instructions, when executed by the one or more processors may also cause the electronic device to perform the operation of selecting, from the second plurality of feature points, a third subset of feature points such that the first subset of feature points and the third subset of feature points combined may be evenly distributed in the 2D image.
  • the instructions when executed by the one or more processors, may further cause the electronic device to perform the operation of tracking the third subset of feature points in one or more subsequently captured 2D images.
  • the instructions when executed by the one or more processors, may also cause the electronic device to perform the operation of determining coordinates of the third subset of feature points using triangulation.
  • the instructions when executed by the one or more processors, may further cause the electronic device to perform the operation of adding, to the point cloud, the third subset of feature points and the corresponding coordinates of the third subset of feature points.
  • the instructions when executed by the one or more processors, may further cause the electronic device to perform the operation of dividing the 2D image into a plurality of areas that have an equal size.
  • the first subset of feature points may be evenly distributed in the 2D image in that the selected feature points may be distributed among different areas of the plurality of areas.
  • the instructions, when executed by the one or more processors, may further cause the electronic device to perform the operation of dividing a space in the global coordinate map that corresponds to the 3D space into a plurality of volumes that have an equal size.
  • the second subset of feature points may be evenly distributed in the 3D space in that the selected feature points may be distributed among different volumes of the plurality of volumes.
  • Embodiments may further include a non-transitory machine readable medium having instructions for managing feature spatial distribution for localization and mapping using an electronic device having a camera and one or more processors.
  • the instructions may be executable by the one or more processors to cause the electronic device to perform the operation of capturing, using the camera, a two-dimensional (2D) image of a three-dimensional (3D) space.
  • the 2D image may include a plurality of feature points that have been previously detected and tracked by the electronic device.
  • the instructions may be executable by the one or more processors to also cause the electronic device to perform the operation of selecting, from the plurality of feature points, a first subset of feature points that may be evenly distributed in the 2D image.
  • the instructions may be executable by the one or more processors to further cause the electronic device to perform the operation of retrieving, from a point cloud, coordinates of the first subset of feature points.
  • the point cloud may include previously determined coordinates that define locations of at least the first subset of feature points in a global coordinate map of the 3D space.
  • the instructions may be executable by the one or more processors to further cause the electronic device to perform the operation of selecting, from the first subset of feature points and based at least in part on the retrieved coordinates, a second subset of feature points that may be evenly distributed in the 3D space.
  • the instructions may be executable by the one or more processors to further cause the electronic device to perform the operation of optimizing the coordinates of the second subset of feature points based at least in part on a pose of the camera from which the 2D image may be captured.
  • the pose of the camera may be determined based on data received from a sensor unit of the electronic device.
  • the sensor unit may include an inertial measurement unit (IMU) configured to track the camera pose in the 3D space.
  • the instructions may be executable by the one or more processors to also cause the electronic device to perform the operation of updating the coordinates of the second subset of feature points in the point cloud with the optimized coordinates of the second subset of feature points.
  • the plurality of feature points may be a first plurality of feature points.
  • the instructions may be executable by the one or more processors to further cause the electronic device to perform the operation of detecting a second plurality of feature points in the 2D image.
  • the instructions may be executable by the one or more processors to also cause the electronic device to perform the operation of selecting, from the second plurality of feature points, a third subset of feature points such that the first subset of feature points and the third subset of feature points combined may be evenly distributed in the 2D image.
  • the instructions may be executable by the one or more processors to further cause the electronic device to perform the operation of tracking the third subset of feature points in one or more subsequently captured 2D images.
  • the instructions may be executable by the one or more processors to also cause the electronic device to perform the operation of determining coordinates of the third subset of feature points using triangulation.
  • the instructions may be executable by the one or more processors to further cause the electronic device to perform the operation of adding, to the point cloud, the third subset of feature points and the corresponding coordinates of the third subset of feature points.
  • the third subset of feature points may be selected such that a number of the first subset of feature points and the third subset of feature points combined may be within a predetermined range.
  • the instructions when executed by the one or more processors, further cause the electronic device to perform the operation of dividing the 2D image into a plurality of areas that have an equal size.
  • the first subset of feature points may be evenly distributed in the 2D image in that the selected feature points may be distributed among different areas of the plurality of areas.
  • embodiments of the present disclosure involve methods and systems that provide a feature management mechanism for managing feature point spatial distribution to achieve more accurate and efficient localization and mapping systems.
  • some embodiments dynamically select feature points that are not only evenly distributed in 2D images, but also evenly distributed in the 3D space that the 2D images represent.
  • FIG. 1 is a simplified illustration of a 3D space in which some embodiments may be implemented.
  • FIG. 2A schematically illustrates feature points distributed in a 3D space according to an embodiment of the present invention.
  • FIG. 2B schematically illustrates distribution of the feature points of FIG. 2A in a 2D image representing a front view of the 3D space according to an embodiment of the present invention.
  • FIG. 2C schematically illustrates distribution of the feature points of FIG. 2A in a 2D image representing a right side view of the 3D space according to an embodiment of the present invention.
  • FIG. 3 is a block diagram illustrating the functionality of feature management for localization and mapping when implementing an extended reality session according to an embodiment of the present invention.
  • FIG. 4 is a flow diagram illustrating an embodiment of a feature management method for localization and mapping when implementing an extended reality session according to an embodiment of the present invention.
  • FIG. 5 depicts a block diagram of an embodiment of a computer system.
  • APIs: application programming interfaces
  • Some platforms may enable a user's phone to sense its environment and to understand and interact with the world.
  • Some of the APIs may be available across different operating systems, such as Android and iOS, to enable shared extended reality experiences.
  • Some platforms may utilize three main capabilities, e.g., motion tracking, environmental understanding, and light estimation, to integrate virtual content with the real world as seen through a camera of the user's phone.
  • motion tracking may allow the user's phone or camera to understand and track its position relative to the world.
  • Environmental understanding may allow the phone to detect the size and location of flat horizontal surfaces, e.g., the ground or a coffee table.
  • Light estimation may allow the phone to estimate the environment's current lighting conditions.
  • Some platforms may allow developers to integrate camera and motion features of a user device to produce augmented reality experiences in applications installed on the user device.
  • the platform may combine device motion tracking, camera scene capture, advanced scene processing, and display conveniences to simplify the task of building an extended reality experience. These techniques may be utilized to create different kinds of extended reality experiences, using a back camera and/or a front camera of the user device.
  • Motion tracking systems utilized by existing platforms may take an image stream and an inertial measurement unit (IMU) data stream in a synced fashion, then, with processing and feature point detection, create a combined output.
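A minimal sketch of the synced input described above, assuming a simple container in which each camera frame carries the IMU samples accumulated since the previous frame; all field names are illustrative, not taken from this disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import numpy as np

@dataclass
class ImuSample:
    timestamp: float                       # seconds
    gyro: Tuple[float, float, float]       # angular rate, rad/s
    accel: Tuple[float, float, float]      # linear acceleration, m/s^2

@dataclass
class FramePacket:
    timestamp: float
    image: np.ndarray                      # the captured 2D image
    imu_window: List[ImuSample] = field(default_factory=list)  # IMU samples since the previous frame
```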
  • One category of existing solutions to motion tracking and/or localization and mapping problems mainly relies on the detected feature points in two-dimensional (2D) images.
  • Embodiments described herein provide a feature management mechanism for managing feature point spatial distribution to achieve more accurate and efficient localization and mapping systems.
  • Embodiments described herein dynamically select feature points that are not only evenly distributed in 2D images, but also evenly distributed in the 3D space that the 2D images represent. By continuously monitoring the spatial distribution of the feature points, more accurate localization and mapping results may be achieved. Additionally, by maintaining an even distribution of the feature points throughout the 2D images and the 3D space, diversified feature points that represent the entire 3D space, rather than local regions, may be selected to achieve better optimization results.
  • FIG. 1 is a simplified illustration of a 3D space 100 in which some embodiments may be implemented. Although an indoor space is illustrated in FIG. 1, the various embodiments described herein may also be implemented in an outer environment or other environment where an extended reality session may be carried out.
  • An extended reality session such as an AR session, MR session, etc., may be implemented using an electronic device 110.
  • the electronic device 110 may include a camera 112 configured to capture 2D images of the 3D space 100 for implementing the extended reality session and a display 114 for displaying the 2D images captured by the camera 112 and for presenting one or more rendered virtual objects 116.
  • the electronic device 110 may include additional software and/or hardware, including but not limited to an IMU 118, for implementing the extended reality session and various other functions of the electronic device 110, such as motion tracking that allows the electronic device 110 to estimate and track the pose, e.g., position and orientation, of the camera 112 relative to the 3D space.
  • Although a tablet is illustrated in FIG. 1 as one example of the electronic device 110, the various embodiments described herein are not limited to implementation on a tablet.
  • the various embodiments may be implemented using many other types of electronic devices, including but not limited to various mobile devices, handheld devices, such as smart phones or tablets, hands-free devices, wearable devices, such as head-mounted displays (HMD) , optical see-through head-mounted displays (OST HMDs) , or smart glasses, or any devices that may be capable of implementing an AR, MR, or other extended reality applications.
  • Although the electronic device 110 utilizes a display 114 for displaying or rendering the camera view with or without a virtual object overlaid, the embodiments described herein may be implemented without rendering the camera view on a display.
  • some OST-HMD and/or smart glasses may not include a display for displaying or rendering the camera view.
  • the OST-HMD and/or smart glasses may include a transparent display for the user to see the real world environment through the transparent display. Nonetheless, the embodiments described herein may be implemented in OST-HMD and/or smart glasses to achieve more accurate and efficient implementation of the extended reality session.
  • an extended reality application, such as an AR and/or MR application stored in a memory of the electronic device 110, may be activated to start the extended reality session.
  • the electronic device 110, via the application, may scan the surrounding environment, such as the 3D space 100, in which the extended reality session may be conducted.
  • the 3D space 100 may be scanned using the camera 112 of the electronic device 110 so as to create a global coordinate map or frame of the 3D space 100.
  • Other sensor inputs such as inputs received from lidar, radar, sonar, or various other sensors of the electronic device 110, if available, may also be utilized to create the global coordinate map.
  • the global coordinate map may be continuously updated as new inputs may be received from the various sensors of the electronic device 110.
  • the global coordinate map may be a 3D map that can be used to keep track of the pose of camera 112, e.g., the position and the orientation of the camera 112 relative to the 3D space, as the user moves the electronic device 110 in the 3D space 100.
  • the position may represent or may be indicative of where the camera 112 is and the orientation may represent or may be indicative of the direction in which the camera 112 is pointed or directed.
  • the camera 112 may be moved with three degrees of freedom, resulting in a change in its position, and may be rotated with three degrees of freedom, leading to a change in its orientation.
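As a generic illustration of the 6DoF pose described above (not a representation prescribed by this disclosure), a camera pose can be written as a 4x4 rigid transform combining a 3x3 rotation (orientation) with a 3-vector translation (position):

```python
import numpy as np

def pose_matrix(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Build T_world_camera from orientation R (3x3) and position t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R        # 3 rotational DoF: the direction the camera is pointed
    T[:3, 3] = t         # 3 translational DoF: where the camera is
    return T

def transform_point(T_world_cam: np.ndarray, p_cam: np.ndarray) -> np.ndarray:
    """Map a point from camera coordinates into the global map frame."""
    return T_world_cam[:3, :3] @ p_cam + T_world_cam[:3, 3]
```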
  • the global coordinate map may also be used to keep track of the positions of various feature points in the 3D space.
  • Construction of the global coordinate map of the 3D space 100 and/or tracking of the position and orientation of the camera 112 may be performed using various localization and mapping systems, such as SLAM for constructing and/or updating a map of an unknown environment while simultaneously keeping track of the position and orientation of a moving object, such as the camera 112 of the electronic device 110 described herein.
  • SLAM: simultaneous localization and mapping
  • the constructed global coordinate map of the 3D space 100, estimates of the pose of the camera 112, and/or estimates of the positions of the feature points may need to be optimized by the localization and mapping system as new inputs from the camera 112 may be received.
  • the ability of the electronic device 110 and/or the camera 112 to stay accurately localized can affect the accuracy and efficiency of the extended reality sessions implemented.
  • existing localization and mapping techniques mainly rely on detected feature points, e.g., corners 122a, 122b, 122c, 122d of the table 120 shown in FIG. 1, in the 2D image of the 3D space to construct the global coordinate map and/or to estimate the camera pose and the positions of the feature points.
  • an even distribution of the detected feature points in the 2D images does not necessarily mean an even spatial distribution of the feature points in the 3D space, as will be discussed in more detail with reference to FIGS. 2A-2C below.
  • Non-even distribution of the feature points in the 3D space can have a substantial impact on accuracy of the localization and mapping system.
  • FIG. 2A schematically illustrates feature points distributed in a 3D space according to an embodiment of the present invention.
  • Although a hexahedron is shown in FIG. 2A for purpose of illustration, the 3D space in which an extended reality session may be implemented may be of any form and/or size.
  • FIG. 2B schematically illustrates the distribution of the same feature points in a 2D image representing a front view of the 3D space according to an embodiment of the present invention.
  • FIG. 2C schematically illustrates the distribution of the same feature points in a 2D image representing a right side view of the 3D space according to an embodiment of the present invention.
  • the 3D space is divided into a plurality of volumes, e.g., a plurality of cubes, that have an equal, predetermined size.
  • the real or physical 3D space is not divided into a plurality of cubes; rather, it is the corresponding space in the global coordinate map that is divided.
  • Different cube sizes may be defined in multiple layers.
  • the 3D space may be divided into a first plurality of cubes of a first size in a first layer, such as the eight cubes shown in FIG. 2A.
  • Each of the first plurality of cubes of the first size may be further divided into a second plurality of cubes of an equal, second size that is smaller than the first size in a second layer, and each of the second plurality of cubes of the second size may be further divided into a third plurality of cubes of an equal, third size that is smaller than the second size in a third layer, and so forth.
  • Feature points may be tracked in different layers depending on the computational capability, desired resolution, availability of detected feature points, and other factors.
  • Although cubes are described as exemplary volumes into which the 3D space may be divided, the 3D space may be divided into other forms of 3D volumes in various embodiments.
  • Although eight cubes are shown in FIG. 2A for purpose of illustration, the 3D space may be divided into more cubes, such as tens, hundreds, or thousands of cubes, or an even greater number of cubes. Further, although only one layer of division is shown in FIG. 2A for clarity, multiple layers of division may be implemented.
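A minimal sketch of the layered cube indexing described above, assuming for illustration that each deeper layer halves the cube edge (the disclosure only requires each layer's cubes to be smaller, not necessarily halved); the base edge length is also an assumed parameter.

```python
def cube_index(point, base_edge=2.0, layer=0):
    """Return the (i, j, k) cube index of a 3D point at the given layer."""
    edge = base_edge / (2 ** layer)        # assumed: each layer halves the cube edge
    x, y, z = point
    return (int(x // edge), int(y // edge), int(z // edge))

# The same feature point falls into coarser or finer cubes depending on the layer.
p = (1.3, 0.4, 3.1)
print(cube_index(p, layer=0))   # coarse cube in the first layer
print(cube_index(p, layer=2))   # finer cube, edge one quarter of the first layer
```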
  • As used herein, the terms "even distribution," "evenly distributed," "uniform distribution," or "uniformly distributed" mean that the feature points are distributed in different cubes of the plurality of cubes in a layer, although not every cube may have a feature point distributed therein. Further, the cubes that have feature points distributed therein are distributed or spread out throughout the 3D space instead of aggregating in local regions of the 3D space.
  • In some embodiments, "even distribution," "evenly distributed," "uniform distribution," or "uniformly distributed" may mean that the cubes having feature points distributed therein may each include more than one feature point, and/or that such cubes are distributed or spread out throughout the 3D space instead of aggregating in local regions of the 3D space.
  • the total number of feature points included in each cube may be less than a predetermined number.
  • the number of feature points included in each cube may be the same or may be different.
  • when the number of feature points included in each cube is greater than one, the cubes may be further divided into smaller cubes such that each of the smaller cubes having feature points distributed therein includes one feature point.
  • the distribution of the feature points or cubes having feature points along each of the three dimensions of the 3D space may also be determined, as will be discussed in more detail below.
  • each 2D image may also be divided into multiple layers.
  • each 2D image may be divided into a first plurality of areas, e.g., squares, of an equal, first size in a first layer, such as the four squares shown in FIG. 2B or the four squares shown in FIG. 2C.
  • Each of the first plurality of squares of the first size may be further divided into a second plurality of squares of an equal, second size that is smaller than the first size in a second layer, and each of the second plurality of squares of the second size may be further divided into a third plurality of squares of an equal, third size that is smaller than the second size in a third layer, and so forth.
  • Although squares are described as exemplary areas into which each 2D image may be divided, each 2D image may be divided into other forms of 2D areas in various embodiments. Although four squares are shown in FIGS. 2B and 2C for purpose of illustration, each 2D image may be divided into more squares, such as tens, hundreds, or thousands of squares, or an even greater number of squares. Further, although only one layer of division is shown in FIGS. 2B and 2C for clarity, multiple layers of division may be implemented.
  • even distribution of the feature points means that the feature points are distributed among different squares of the plurality of squares in a layer, and the squares that have feature points distributed therein are spread out throughout the 2D image instead of aggregating in local regions of the 2D image.
  • the distribution along each of the two dimensions of the 2D image may also be determined, as will be discussed in more detail below.
  • even distribution of the feature points in the 2D image may mean that the squares that have feature points distributed therein may each include more than one feature point, and/or the squares that have feature points distributed therein are spread out throughout the 2D image instead of aggregating in local regions of the 2D image.
  • the total number of feature points included in each square may be less than a predetermined number.
  • the number of feature points included in each square may be the same or may be different.
  • when the number of feature points included in each square is greater than one, the squares may be further divided into smaller squares such that each of the smaller squares having feature points distributed therein includes one feature point.
  • The feature points 202, 204, 206, 208 appear to be evenly distributed in the 2D image, as each feature point is distributed in a different square. Further, the squares having feature points are distributed or spread out throughout the entire 2D image. Specifically, the squares having feature points are distributed evenly along the vertical dimension of the 2D image, and are also distributed evenly along the horizontal dimension of the 2D image. However, as discussed below with reference to FIG. 2A, the feature points 202, 204, 206, 208 are not evenly distributed in the 3D space.
  • The embodiments described herein avoid this issue associated with existing localization and mapping systems by introducing a feature management mechanism that checks the distribution of feature points in both the 2D images and the 3D space.
  • feature points that are evenly distributed in not only the 2D images, but also the 3D space may be selected for tracking, and more accurate and efficient localization and mapping may be achieved.
  • Feature point 210, instead of feature point 206, may be selected and tracked, along with feature points 202, 204, 208. This is because feature points 202, 204, 206, 208 are not evenly distributed in the 3D space, whereas feature points 202, 204, 208, 210 may be considered as evenly distributed in the 3D space.
  • feature point 202 is distributed in the front, top, left cube
  • feature point 204 is distributed in the front, bottom, left cube
  • feature point 206 is distributed in the front, bottom, right cube
  • feature point 208 is distributed in the rear, top, right cube
  • feature point 210 is distributed in the rear, bottom, right cube.
  • When feature points 202, 204, 206, 208 are selected, the cubes having feature points are distributed in an aggregated manner in the 3D space because three out of the four cubes having feature points are in the front region of the 3D space, whereas only one cube having a feature point is in the rear region of the 3D space.
  • If feature points 202, 204, 208, 210 are selected, however, the feature points are not only distributed among different cubes, but the cubes having feature points are also not aggregated and are instead distributed evenly along each of the three dimensions of the 3D space.
  • two out of the four cubes having feature points 202, 204 are in the front region of the 3D space, and two out of the four cubes having feature points 208, 210 are in the rear region of the 3D space; two out of the four cubes having feature points 202, 204 are in the left region of the 3D space, and two out of the four cubes having feature points 208, 210 are in the right region of the 3D space; and two out of the four cubes having feature points 202, 208 are in the top region of the 3D space, and two out of the four cubes having feature points 204, 210 are in the bottom region of the 3D space.
  • Feature points that are more evenly distributed, e.g., feature points 202, 204, 208, 210, may be selected over feature points that are not evenly distributed, e.g., feature points 202, 204, 206, 208.
  • Even distribution in different 2D images or views of the 3D space, such as the even distribution of the feature points 202, 204, 208, 210 shown in both FIGS. 2B and 2C, may be maintained, achieving a more accurate 2D representation of the 3D space and more accurate and efficient localization and mapping.
  • FIG. 3 is a block diagram 300 illustrating the functionality of feature management for localization and mapping according to some embodiments.
  • the functionality of the various blocks within the flow chart may be executed by hardware and/or software components of the electronic device 110. It is noted that alternative embodiments may alter the functionality illustrated to add, remove, combine, separate, and/or rearrange the various functions shown. A person of ordinary skill in the art will appreciate such variations.
  • feature points in each of the received 2D images from the stream of image data 302 may be tracked. These tracked feature points may be previously detected and continuously tracked in the received 2D images, including feature points detected in the immediately preceding 2D images and now continuously tracked in the currently received 2D image.
  • the feature points in the 2D images may be detected and tracked using image processing techniques.
  • the feature points in each 2D image may be detected and tracked based on the color, color intensity or brightness, etc., of one or more pixels and the surrounding pixels. For example, the 2D image may be analyzed to detect adjacent pixels that have the same or similar color and/or color intensity.
  • each group of adjacent pixels may then be analyzed to determine whether each group of pixels may represent or may be categorized as a point, an edge, a plane or surface, etc. For example, when a group of adjacent pixels may be determined to be located within a predetermined radius, the group of adjacent pixels may be categorized as a point, and the corresponding feature represented by the group of adjacent pixels may be categorized as a feature point.
  • the predetermined radius may be measured by a predetermined number of pixels, e.g., 5 pixels, 10 pixels, 15 pixels, 20 pixels, 25 pixels, 50 pixels, 100 pixels, etc., and the number of pixels may be determined based on desired resolution, processing capacity, etc.
  • the 2D image of the 3D space 100 may be analyzed.
  • Adjacent pixels representing each of the corners 122a, 122b, 122c, 122d of the table 120 in the 2D image may be determined to have the same or similar color and/or color intensity, and may be further determined to be within the predetermined radius.
  • the four corners 122a, 122b, 122c, 122d may be detected as feature points and may be tracked in the captured 2D images of the 3D space.
  • These feature points may be helpful for defining or constructing geometric constraints for implementing the extended session, e.g., geometric constraints of a surface on which a virtual object, such as the virtual object 116 may be rendered.
  • Although feature points are described as exemplary features that may be detected and tracked, the features that may be tracked are not limited to points. For example, edges, areas, planes, etc., may be detected and tracked depending on the particular application.
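The disclosure describes detecting feature points by grouping adjacent pixels of similar color or intensity within a predetermined radius. As a stand-in sketch only, the example below uses a standard corner detector and pyramidal Lucas-Kanade tracking from OpenCV to illustrate the detect-and-track step; the detector choice and parameter values are assumptions, not taken from the disclosure.

```python
import cv2
import numpy as np

def detect_features(gray, max_points=200):
    """Detect corner-like feature points in a grayscale image."""
    pts = cv2.goodFeaturesToTrack(gray, maxCorners=max_points,
                                  qualityLevel=0.01, minDistance=10)
    return pts if pts is not None else np.empty((0, 1, 2), np.float32)

def track_features(prev_gray, cur_gray, prev_pts):
    """Track previously detected points into the current image."""
    cur_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray,
                                                     prev_pts, None)
    good = status.ravel() == 1            # keep only successfully tracked points
    return prev_pts[good], cur_pts[good]
```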
  • outliers among the detected and tracked feature points may be removed. Removing the outlier feature points may include pausing or stopping tracking of the feature points.
  • outlier feature points may include feature points that disturb the even distribution of the feature points in the current 2D image. For example, referring back to FIGS. 2B and 2C, among feature points 202, 204, 206, 208, 210, feature point 206 may be considered as an outlier feature point since feature points 202, 204, 208, 210 are evenly distributed in the 2D images of FIGS. 2B and 2C, and the inclusion of feature point 206 results in a non-even distribution of the feature points.
  • even distribution may be determined by dividing the 2D image into multiple squares or other appropriate shape of areas, determining whether the feature points are distributed among different squares, and/or examining the distribution of the squares along each dimension of the 2D image.
  • removing outlier feature points may be performed by selecting the feature points that are evenly distributed in the current 2D image.
  • outlier feature points may further include feature points whose velocity or moving speed is greater than a predetermined threshold.
  • the predetermined threshold may be a predetermined velocity value or moving speed.
  • the predetermined threshold may be a percentage difference in the velocity or moving speed among different feature points. For example, by tracking the feature points, the velocity or speed of the feature points may be determined. Feature points moving above the predetermined threshold velocity or speed may be removed. The velocity or speed variation among the feature points may also be determined. For example, an average velocity or speed of the feature points may be determined.
  • Feature points that move at a velocity or speed greater than the average by more than a predetermined percentage e.g., more than 30%, more than 40%, more than 50%, more than 60%, more than 70%, more than 80%, more than 90%, or greater, may be removed.
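A minimal sketch of the velocity-based outlier rejection described above, assuming image-space velocities and illustrative threshold values (e.g., at most 50% above the average speed); neither value is specified by the disclosure.

```python
import numpy as np

def reject_fast_features(prev_pts, cur_pts, dt, abs_limit=300.0, pct_over_avg=0.5):
    """prev_pts, cur_pts: (N, 2) pixel coordinates of the same features in
    consecutive images; dt: time between the images in seconds."""
    speeds = np.linalg.norm(cur_pts - prev_pts, axis=1) / dt   # pixels per second
    keep = speeds <= abs_limit                                 # absolute speed threshold
    avg = speeds[keep].mean() if keep.any() else 0.0
    keep &= speeds <= avg * (1.0 + pct_over_avg)               # no more than 50% above average
    return keep   # boolean mask of features to keep tracking
```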
  • new feature points in the current 2D image may be detected. Similar to how the feature points may be detected in the previous 2D images, the feature points in the current 2D image may be detected by analyzing the color, color intensity, size, etc., of adjacent pixels and the surrounding pixels.
  • the newly detected feature points may be selected such that an even distribution of the feature points in the current 2D image may be maintained.
  • the distribution of the newly detected feature points may be analyzed by checking whether the newly detected feature points are distributed among different squares of the 2D image and/or checking whether the newly detected feature points are distributed throughout the 2D image or aggregated in one or more local regions as discussed above. Duplicate feature points, i.e., feature points distributed in the same squares, may be removed such that only one feature point in each square may be selected. Aggregated feature points may be reduced such that the distribution of the newly detected feature points throughout the entire 2D image may be even.
  • the distribution of the previously detected and tracked feature points and the newly detected feature points combined may be analyzed. If duplicate feature points may be present, the newly detected feature points may be removed whereas the previously detected and tracked feature points may be retained because the coordinates of the previously detected and tracked feature points may have been determined and optimized multiple times (as will be discussed below) , and thus are more accurate. In some embodiments, among the duplicate feature points, the newly detected feature points may be retained whereas the previously detected and tracked feature points may be removed for other considerations. Once the newly detected feature points are selected, these newly selected feature points may be tracked in the subsequently captured 2D images, similar to how the previously detected feature points are tracked at block 305 in the previous 2D images.
  • when selecting the newly detected feature points at block 320, only the distribution of the newly detected feature points may be examined, and the feature points may be selected accordingly such that the selected newly detected feature points are evenly distributed.
  • the distribution of the previously detected and tracked feature points and the selected newly detected feature points combined may be evaluated in another outlier rejection operation at block 310 to remove feature points that disturb the even distribution such that the remaining feature points are evenly distributed.
  • an appropriate number of the newly detected feature points may be selected such that a combined number of the selected newly detected feature points and the previously detected and continuously tracked feature points in the current 2D image may be within a predetermined range. Thus, an excessive number of the newly detected feature points may not be selected to avoid overburdening the computational capacity of the device implementing the extended reality session.
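A minimal sketch of admitting newly detected feature points as described above: only grid cells not already occupied by tracked points are filled, and selection stops once the combined count reaches an assumed budget (the "predetermined range" is illustrated here as a single upper bound).

```python
def add_new_features(tracked, detected, image_size, grid=(8, 8), max_total=150):
    """tracked/detected: dict {feature_id: (u, v)}; returns ids of accepted new points."""
    w, h = image_size
    cw, ch = w / grid[0], h / grid[1]
    occupied = {(int(u // cw), int(v // ch)) for u, v in tracked.values()}
    accepted, total = [], len(tracked)
    for fid, (u, v) in detected.items():
        if total >= max_total:            # keep the combined number within the budget
            break
        cell = (int(u // cw), int(v // ch))
        if cell not in occupied:          # previously tracked points win ties within a cell
            occupied.add(cell)
            accepted.append(fid)
            total += 1
    return accepted
```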
  • correspondence between the feature points in the current 2D image and their coordinates in the 3D space may be built.
  • the 3D coordinates may be coordinates in a coordinate map established to map the 3D space and to track the pose of a camera and/or positions of detected and tracked feature points in the 3D space.
  • An exemplary global coordinate map may include the global coordinate map of the 3D space 100 established by the electronic device 110 to map the 3D space 100 and to track the pose of the camera 112 and/or the positions of the detected and tracked feature points, e.g., feature points 122a, 122b, 122c, 122d.
  • the 3D coordinates of a feature point may be determined by triangulation. Specifically, when a feature point is detected and tracked in two or more 2D images, the feature point has been observed by the camera, e.g., camera 112, from two or more different camera poses, i.e., different camera positions and/or different camera orientations. Thus, based on the different camera positions and/or orientations from which the feature point has been observed, the 3D coordinates of the feature point in the 3D global coordinate map may be determined via triangulation.
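A generic two-view linear (DLT) triangulation sketch consistent with the description above; the disclosure does not prescribe this particular solver, and the projection-matrix inputs are assumptions of the example.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """P1, P2: 3x4 camera projection matrices (K [R|t]) for the two poses from
    which the feature point was observed; uv1, uv2: its pixel observations."""
    u1, v1 = uv1
    u2, v2 = uv2
    A = np.stack([u1 * P1[2] - P1[0],
                  v1 * P1[2] - P1[1],
                  u2 * P2[2] - P2[0],
                  v2 * P2[2] - P2[1]])
    _, _, vt = np.linalg.svd(A)           # least-squares solution of A X = 0
    X = vt[-1]
    return X[:3] / X[3]                   # homogeneous -> Euclidean 3D point in the map frame
```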
  • the determined positions or coordinates of the detected and tracked feature points may be stored in a point cloud, e.g., a 3D point cloud.
  • Building correspondence between the feature points in the 2D image and their 3D coordinates may include retrieving from the point cloud the 3D coordinates for the feature points in the current 2D image.
  • not all feature points in the current 2D image may have available or known 3D coordinates stored in the point cloud.
  • the 3D coordinates of the newly detected feature points in the current 2D image are not yet known since they have only been observed from one camera pose.
  • the coordinates of the previously detected and tracked feature points may be optimized and/or updated. For example, based on the stream of IMU data 304 received from an IMU sensor configured to track the pose of the camera in the 3D space, e.g., IMU sensor 118 of the electronic device 110 configured to track the pose of the camera 112 in the 3D space 100, the current camera pose may be determined. By receiving the image data 302 and the IMU data 304 in a synched fashion, the positions of the feature points that have been previously detected and tracked in the current 2D image may be optimized and/or updated based on the visual correspondence of the feature points in the current 2D image and the camera pose from which the current 2D image is captured.
  • the coordinates of the feature points may be calculated or updated with the new data, and the calculated or updated coordinates may be more accurate and/or refined.
  • the coordinates of the feature points may be continuously optimized and/or refined as feature points may be continuously detected from new camera poses and tracked in new 2D images.
  • the optimization may be performed using optimization modules available from existing localization and mapping systems, such as the optimization module of SLAM, VISLAM, VIO, etc.
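The disclosure defers optimization to existing SLAM/VISLAM/VIO modules. As a self-contained illustration only, the sketch below refines a single point's 3D coordinates by minimizing reprojection error against known camera poses using SciPy; a real system would jointly optimize poses and points (bundle adjustment), so this is not the method used by those modules.

```python
import numpy as np
from scipy.optimize import least_squares

def project(K, R, t, X):
    """Project a 3D map point X into pixel coordinates for pose (R, t)."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def refine_point(X0, observations, K):
    """observations: list of (R, t, uv) with known poses and measured pixels."""
    def residuals(X):
        return np.concatenate([project(K, R, t, X) - uv for R, t, uv in observations])
    return least_squares(residuals, X0).x   # refined 3D coordinates
```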
  • the coordinates of the feature points newly detected from the immediately preceding 2D image and continuously tracked in the current 2D image may also be determined based on the camera pose from which the immediately preceding 2D image was taken and the camera pose from which the current 2D image is taken via triangulation.
  • the coordinates of these feature points may be continuously optimized when more image data and IMU data become available.
  • the positions or coordinates of the feature points newly detected from the current 2D image may be determined and/or optimized if they are continuously tracked in one or more of subsequently captured 2D images.
  • the point cloud may be updated by storing the coordinates of the feature points that have been determined and/or optimized at block 330.
  • feature points that are evenly distributed in the 3D space may be selected at block 340 prior to optimization at block 330. In these embodiments, only feature points that are evenly distributed in the 3D space may then be optimized at block 330 and updated in the point cloud at block 335. Stated differently, by selecting feature points that are evenly distributed in the 3D space, only a subset of all the detected and tracked feature points in the 2D image may be optimized, which may be particularly beneficial if the optimization module of the localization and mapping system is constrained by the system's processing capacity.
  • feature points that have been previously detected and continuously tracked in the current 2D image and that are evenly distributed in the 3D space may be selected.
  • even distribution of feature points in the 3D space may be determined by dividing the 3D space into multiple cubes (or other appropriate forms of 3D volumes) of an equal size in one or multiple layers, and by checking whether the feature points are distributed among different cubes.
  • the distribution of the cubes having feature points along each of the three dimensions of the 3D space may also be evaluated.
  • the 3D space may have an X dimension, a Y dimension, and a Z dimension.
  • the distribution of the cubes having feature points along each of X dimension, Y dimension, and/or Z dimensions may be evaluated.
  • the distribution of the cubes along each of the X, Y, and/or Z dimensions may be evaluated based on the respective X, Y, and/or Z coordinates of the feature points inside each cube. For example, the X coordinates of all feature points may be checked to evaluate whether the cubes containing those feature points are evenly distributed in the X dimension, the Y coordinates of all feature points may be checked to evaluate whether the cubes containing those feature points are evenly distributed in the Y dimension, and/or the Z coordinates of all feature points may be checked to evaluate whether the cubes containing those feature points are evenly distributed in the Z dimension.
  • feature points that are evenly distributed in the 3D space may be determined and selected. The selected feature points may then be used for subsequent processing.
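A minimal sketch of the per-dimension check described above, using the spread of occupied cube indices along each axis as one possible criterion; the disclosure does not prescribe a specific metric, so the edge length and the spread measure are assumptions.

```python
import numpy as np

def axis_spread(points, edge=1.0):
    """points: (N, 3) coordinates; returns the range of occupied cube indices per axis."""
    cubes = np.unique(np.floor(np.asarray(points) / edge).astype(int), axis=0)
    return {axis: int(cubes[:, i].max() - cubes[:, i].min())
            for i, axis in enumerate("XYZ")}

# Points clustered toward the front of the space show a small spread along the
# depth axis, while an evenly selected subset shows a larger spread on all axes.
```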
  • a feature management mechanism 345 that provides several benefits.
  • Duplicate feature points in the same square may not be selected, so as to efficiently utilize the computational capability and to avoid degrading system performance.
  • more accurate optimization and/or localization and mapping results may be achieved by continuously monitoring the spatial distribution of the feature points and dynamically selecting feature points not only evenly distributed in each 2D image, but also evenly distributed in the 3D space.
  • A faster optimization rate, lower power consumption, and less memory and CPU usage may be achieved since the size of the optimization problem, i.e., the number of feature points and their corresponding 3D coordinates to be optimized, may be smaller due to the removal of feature points that are not evenly distributed in the 2D images and/or the 3D space.
  • FIG. 3 illustrates a particular example of the functionality of feature management for localization and mapping according to an embodiment of the present invention.
  • other sequences of steps may also be performed according to alternative embodiments.
  • alternative embodiments of the present invention may perform the steps outlined above in a different order.
  • the individual steps illustrated in FIG. 3 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step.
  • additional steps may be added or removed depending on the particular applications.
  • One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
  • FIG. 4 is a flow diagram illustrating an embodiment of a feature management method 400 for improved localization and mapping when implementing an extended reality session according to an embodiment of the present invention.
  • Method 400 can be implemented by, e.g., using the electronic device 110 as described herein.
  • means for performing one or more of the functions illustrated in the various blocks of FIG. 4 may comprise hardware and/or software components of the electronic device 110.
  • FIG. 4 is provided as an example. Other embodiments may vary in functionality from the functionality shown. Variations may include performing additional functions, substituting and/or removing select functions, performing functions in a different order or simultaneously, and the like.
  • a 2D image of a 3D space in which an extended reality session may be implemented may be captured by a camera of an electronic device, e.g., camera 112 of the electronic device 110.
  • the 2D image may include feature points that have been detected in one or more previously captured 2D images of the 3D space, and continuously tracked in the current 2D image.
  • the feature points may be detected and/or tracked based on the color, color intensity, size, etc., of one or more adjacent pixels and the surrounding pixels, as discussed above.
  • the 2D image may further include new feature points to be detected in subsequent operations.
  • a subset, e.g., a first subset, of feature points may be selected from the previously detected and continuously tracked feature points.
  • the first subset of feature points may be selected such that they are evenly distributed in the 2D image.
  • the 2D image may be divided into a number of areas, e.g., squares, that have an equal size. In some embodiments, multiple layers of image division may be implemented.
  • the first subset of feature points may be selected such that they are distributed among different squares of the 2D image.
  • the first subset of feature points may be selected such that the squares having feature points distributed therein may be evenly distributed along each of the two dimensions of the 2D image to ensure that the selected feature points are distributed or spread out in the 2D image and not aggregated in local regions. Once the first subset of feature points may be selected from the previously detected and continuously tracked feature points, the remaining feature points may be removed from tracking.
  • method 400 may further include removing feature points that may have a velocity or moving speed greater than a predetermined threshold.
  • the predetermined threshold may be a predetermined velocity or moving speed, or a predetermined percentage difference in the velocity or moving speed among different feature points.
  • new feature points in the current 2D image may be detected. Similar to how feature points may be detected in previously captured 2D images, the new feature points may be detected based on the color, color intensity, size, etc., of one or more adjacent pixels and the surrounding pixels in the current 2D image.
  • a subset, e.g., a second subset, of feature points may be selected from the newly detected feature points.
  • the second subset of feature points may be selected such that the even distribution of the feature points in the current 2D image may be maintained.
  • the second subset of feature points may be selected such that they are evenly distributed in the current 2D image.
  • the second subset of feature points may be selected such that the first and second subsets of feature points combined may be evenly distributed in the current 2D image.
  • duplicate feature points are present (i.e., multiple feature points in a single square) when the first and second subsets of feature points are combined, the newly detected feature points may be removed and the previously detected feature points may be retained, or vice versa.
  • a total number of the first subset of feature points and the selected second subset of feature points combined may be maintained within a predetermined range so that the computational resources are not overburdened.
  • the selected second subset of feature points may be added for tracking in subsequently captured 2D images.
  • coordinates, e.g., 3D coordinates, of the first subset of feature points, i.e., the feature points that have been previously detected and continuously tracked and that are evenly distributed in the current 2D image, may be retrieved from a point cloud.
  • the coordinates may define locations of the feature points in a 3D coordinate system, e.g., a global coordinate map or frame, of the 3D space.
  • the 3D coordinates of any feature point may be determined by triangulation when a feature point is detected and tracked in two or more 2D images.
  • the 3D coordinates of the feature point in the 3D global coordinate map may be determined based on the camera poses via triangulation. Based on the retrieved coordinates, correspondence between the first subset of feature points in the 2D image and their 3D coordinates may be built.
  • a third subset of feature points may be selected from the first subset of feature points such that the selected third subset of feature points are evenly distributed in the 3D space.
  • the 3D space may be divided into a number of volumes, e.g., cubes, that have an equal size in one or more layers.
  • the third subset of feature points may be selected such that they are distributed among different cubes of the 3D space.
  • the third subset of feature points may be selected such that the cubes having feature points distributed therein may be evenly distributed along each of the three dimensions of the 3D space to ensure that the selected feature points are distributed or spread out in the 3D space and not aggregated in local regions. Once the third subset of feature points may be selected from the first subset of feature points, the remaining feature points in the first subset of feature points may be removed from tracking.
  • coordinates of the third subset of feature points may be optimized based at least in part on the new camera pose from which the current 2D image is taken and the current 2D image.
  • the camera pose may be determined based at least in part on the data received from an IMU, e.g., IMU 118 of the electronic device 110, configured to track the camera pose as the camera moves in the 3D space. Because only the coordinates of the third subset of feature points, which are evenly distributed in the 3D space, may be optimized, rather than those of the entire set of previously detected and tracked feature points or of the first subset, the number of feature points involved in the optimization may be smaller, and a higher optimization rate may be achieved.
  • the coordinates of the feature points newly detected from the immediately preceding 2D image and continuously tracked in the current 2D image may be determined. Specifically, the coordinates may be determined based on the camera pose from which the immediately preceding 2D image was taken and the camera pose from which the current 2D image is taken via triangulation.
  • the point cloud may be updated by storing the coordinates of the feature points that have been optimized at block 435 and/or the coordinates of the feature points that have been determined at block 440.
  • method 400 may return to block 405 and another 2D image of the 3D space may be captured. At least some of the second subset of feature points and/or some of the third subset of feature points may be present and tracked in the newly captured 2D image.
  • Method 400 may continue with operations 410-445 to select feature points and to optimize coordinates of the selected feature points to achieve more accurate and efficient localization and mapping.
  • FIG. 4 illustrates a particular embodiment of a feature management method for improved localization and mapping when implementing an extended reality session according to an embodiment of the present invention.
  • other sequences of steps may also be performed according to alternative embodiments.
  • alternative embodiments of the present invention may perform the steps outlined above in a different order.
  • the individual steps illustrated in FIG. 4 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step.
  • additional steps may be added or removed depending on the particular applications.
  • One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
  • FIG. 5 is a simplified block diagram of a computing device 500.
  • Computing device 500 can implement some or all functions, behaviors, and/or capabilities described above that would use electronic storage or processing, as well as other functions, behaviors, or capabilities not expressly described.
  • Computing device 500 includes a processing subsystem 502, a storage subsystem 504, a user interface 506, and/or a communication interface 508.
  • Computing device 500 can also include other components (not explicitly shown) such as a battery, power controllers, and other components operable to provide various enhanced capabilities.
  • computing device 500 can be implemented in a desktop or laptop computer, mobile device (e.g., tablet computer, smart phone, mobile phone) , wearable device, media device, application specific integrated circuits (ASICs) , digital signal processors (DSPs) , digital signal processing devices (DSPDs) , programmable logic devices (PLDs) , field programmable gate arrays (FPGAs) , processors, controllers, micro-controllers, microprocessors, or electronic units designed to perform a function or combination of functions described above.
  • Storage subsystem 504 can be implemented using a local storage and/or removable storage medium, e.g., using disk, flash memory (e.g., secure digital card, universal serial bus flash drive) , or any other non-transitory storage medium, or a combination of media, and can include volatile and/or non-volatile storage media.
  • Local storage can include random access memory (RAM) , including dynamic RAM (DRAM) , static RAM (SRAM) , or battery backed up RAM.
  • storage subsystem 504 can store one or more applications and/or operating system programs to be executed by processing subsystem 502, including programs to implement some or all operations described above that would be performed using a computer.
  • storage subsystem 504 can store one or more code modules 510 for implementing one or more method steps described above.
  • a firmware and/or software implementation may be implemented with modules (e.g., procedures, functions, and so on) .
  • a machine-readable medium tangibly embodying instructions may be used in implementing methodologies described herein.
  • code modules 510, e.g., instructions stored in memory, may be executed by processing subsystem 502.
  • the term "memory" refers to a type of long term, short term, volatile, nonvolatile, or other storage medium and is not to be limited to any particular type of memory or number of memories or type of media upon which memory is stored.
  • the terms "storage medium" or "storage device" may represent one or more memories for storing data, including read only memory (ROM), RAM, magnetic RAM, core memory, magnetic disk storage mediums, optical storage mediums, flash memory devices and/or other machine readable mediums for storing information.
  • the term "machine-readable medium" includes, but is not limited to, portable or fixed storage devices, optical storage devices, wireless channels, and/or various other storage mediums capable of storing instruction(s) and/or data.
  • embodiments may be implemented by hardware, software, scripting languages, firmware, middleware, microcode, hardware description languages, and/or any combination thereof.
  • program code or code segments to perform tasks may be stored in a machine readable medium such as a storage medium.
  • a code segment, e.g., code module 510, or machine-executable instruction may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a script, a class, or a combination of instructions, data structures, and/or program statements.
  • a code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, and/or memory contents.
  • Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted by suitable means including memory sharing, message passing, token passing, network transmission, etc.
  • Implementation of the techniques, blocks, steps and means described above may be done in various ways. For example, these techniques, blocks, steps and means may be implemented in hardware, software, or a combination thereof.
  • the processing units may be implemented within one or more ASICs, DSPs, DSPDs, PLDs, FPGAs, processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described above, and/or a combination thereof.
  • Each code module 510 may comprise sets of instructions (codes) embodied on a computer-readable medium that directs a processor of a computing device 500 to perform corresponding actions.
  • the instructions may be configured to run in sequential order, in parallel (such as under different processing threads) , or in a combination thereof. After loading a code module 510 on a general purpose computer system, the general purpose computer is transformed into a special purpose computer system.
  • Computer programs incorporating various features described herein may be encoded and stored on various computer readable storage media.
  • Computer readable media encoded with the program code may be packaged with a compatible electronic device, or the program code may be provided separately from electronic devices (e.g., via Internet download or as a separately packaged computer readable storage medium) .
  • Storage subsystem 504 can also store information useful for establishing network connections using the communication interface 508.
  • User interface 506 can include input devices (e.g., touch pad, touch screen, scroll wheel, click wheel, dial, button, switch, keypad, microphone, etc. ) , as well as output devices (e.g., video screen, indicator lights, speakers, headphone jacks, virtual-or augmented-reality display, etc. ) , together with supporting electronics (e.g., digital to analog or analog to digital converters, signal processors, etc. ) .
  • a user can operate input devices of user interface 506 to invoke the functionality of computing device 500 and can view and/or hear output from computing device 500 via output devices of user interface 506.
  • the user interface 506 might not be present (e.g., for a process using an ASIC) .
  • Processing subsystem 502 can be implemented as one or more processors (e.g., integrated circuits, one or more single core or multi core microprocessors, microcontrollers, central processing unit, graphics processing unit, etc. ) . In operation, processing subsystem 502 can control the operation of computing device 500. In some embodiments, processing subsystem 502 can execute a variety of programs in response to program code and can maintain multiple concurrently executing programs or processes. At a given time, some or all of a program code to be executed can reside in processing subsystem 502 and/or in storage media, such as storage subsystem 504. Through programming, processing subsystem 502 can provide various functionality for computing device 500. Processing subsystem 502 can also execute other programs to control other functions of computing device 500, including programs that may be stored in storage subsystem 504.
  • Communication interface 508 can provide voice and/or data communication capability for computing device 500.
  • communication interface 508 can include radio frequency (RF) transceiver components for accessing wireless data networks (e.g., Wi-Fi network; 3G, 4G/LTE; etc. ) , mobile communication technologies, components for short range wireless communication (e.g., using Bluetooth communication standards, NFC, etc. ) , other components, or combinations of technologies.
  • communication interface 508 can provide wired connectivity (e.g., universal serial bus, Ethernet, universal asynchronous receiver/transmitter, etc. ) in addition to, or in lieu of, a wireless interface.
  • Communication interface 508 can be implemented using a combination of hardware (e.g., driver circuits, antennas, modulators/demodulators, encoders/decoders, and other analog and/or digital signal processing circuits) and software components. In some embodiments, communication interface 508 can support multiple communication channels concurrently. In some embodiments the communication interface 508 is not used.
  • computing device 500 is illustrative and variations and modifications are possible.
  • a computing device can have various functionality not specifically described (e.g., voice communication via cellular telephone networks) and can include components appropriate to such functionality.
  • although computing device 500 is described with reference to particular blocks, it is to be understood that these blocks are defined for convenience of description and are not intended to imply a particular physical arrangement of component parts.
  • the processing subsystem 502, the storage subsystem, the user interface 506, and/or the communication interface 508 can be in one device or distributed among multiple devices.
  • Blocks need not correspond to physically distinct components. Blocks can be configured to perform various operations, e.g., by programming a processor or providing appropriate control circuitry, and various blocks might or might not be reconfigurable depending on how an initial configuration is obtained. Embodiments can be realized in a variety of apparatus including electronic devices implemented using a combination of circuitry and software. Electronic devices described herein can be implemented using computing device 500.
  • Various features described herein can be realized using a combination of dedicated components, programmable processors, and/or other programmable devices. Processes described herein can be implemented on the same processor or different processors. Where components are described as being configured to perform certain operations, such configuration can be accomplished, e.g., by designing electronic circuits to perform the operation, by programming programmable electronic circuits (such as microprocessors) to perform the operation, or a combination thereof. Further, while the embodiments described above may make reference to specific hardware and software components, those skilled in the art will appreciate that different combinations of hardware and/or software components may also be used and that particular operations described as being implemented in hardware might be implemented in software or vice versa.
  • the embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed, but could have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
  • the use of "adapted to" or "configured to" herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps.
  • "based on" is meant to be open and inclusive, in that a process, step, calculation, or other action "based on" one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited.
  • "based at least in part on" is meant to be open and inclusive, in that a process, step, calculation, or other action "based at least in part on" one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A method of managing feature spatial distribution for localization and mapping may include capturing a 2D image of a 3D space. The 2D image may include a plurality of tracked feature points. The method may further include: selecting, from the plurality of feature points, a first subset of feature points evenly distributed in the 2D image; and retrieving, from a point cloud, coordinates of the first subset of feature points. The coordinates define locations of the first subset of feature points in a global coordinate map of the 3D space. The method may further include: selecting, from the first subset of feature points, a second subset of feature points evenly distributed in the 3D space; optimizing the coordinates of the second subset of feature points based on a camera pose; and updating the coordinates of the second subset of feature points in the point cloud.

Description

FEATURE SPATIAL DISTRIBUTION MANAGEMENT FOR SIMULTANEOUS LOCALIZATION AND MAPPING
BACKGROUND OF THE INVENTION
Augmented Reality (AR) superimposes virtual content over a user's view of the real world. With the development of AR software development kits (SDK) , the mobile industry has brought smartphone AR to the mainstream. An AR SDK typically provides six degrees-of-freedom (6DoF) tracking capability. A user can scan the environment using a smartphone's camera, and the smartphone performs visual inertial odometry (VIO) in real time. Once the camera pose is tracked continuously, virtual objects can be placed into the AR scene to create an illusion that real objects and virtual objects are merged together.
Despite the progress made in AR systems, there is a need in the art for improved methods and systems related to feature spatial distribution management for localization and mapping.
SUMMARY OF THE INVENTION
This disclosure generally relates to extended reality technologies, and more specifically, and without limitation, to feature spatial distribution management for localization and mapping.
Embodiments may include a method of managing feature spatial distribution for localization and mapping. In some embodiments, the method may include capturing a two-dimensional (2D) image of a three-dimensional (3D) space. The 2D image may include a plurality of feature points that have been previously detected and tracked. The method may further include selecting, from the plurality of feature points, a first subset of feature points that may be evenly distributed in the 2D image. The method may also include retrieving, from a point cloud, coordinates of the first subset of feature points. The point cloud may include previously determined coordinates that define locations of at least the first subset of feature points in a global coordinate map of the 3D space. The method may further include selecting, from the first subset of feature points and based at least in part on the retrieved coordinates, a second subset of feature points that may be evenly distributed in the 3D space. The method may further include optimizing the coordinates of the second subset of feature points based at least in part on a camera pose from which the 2D image may be captured. The camera pose may be determined based on data received from an inertial measurement unit (IMU) configured to track the camera pose in the 3D space. The method may also include updating the coordinates of the second subset of feature points in the point cloud with the optimized coordinates of the second subset of feature points.
In some embodiments, the plurality of feature points may be a first plurality of feature points. The method further may also include detecting a second plurality of feature points in the 2D image. The method may further include selecting, from the second plurality of feature points, a third subset of feature points such that the first subset of feature points and the third subset of feature points combined may be evenly distributed in the 2D image. In some embodiments, the third subset of feature points may be selected such that a number of the first subset of feature points and the third subset of feature points combined may be within a predetermined range.
In some embodiments, the method may further include tracking the third subset of feature points in one or more subsequently captured 2D images. The method may also include determining coordinates of the third subset of feature points using triangulation. The method may also include adding, to the point cloud, the third subset of feature points and the corresponding coordinates of the third subset of feature points. In some embodiments, the method may further include capturing another 2D image of the 3D space. The another 2D image may include at least a portion of the second subset of feature points or a portion of the third subset of feature points. The method may also include selecting, from at least the portion of the second subset of feature points or the portion of the third subset of feature points combined, a fourth subset of feature points that  may be evenly distributed in the another 2D image. The method may further include retrieving, from the point cloud, the coordinates of the fourth subset of feature points. The method may further include selecting, from the fourth subset of feature points, a fifth subset of feature points such that the fifth subset of feature points may be evenly distributed in the 3D space. The method may also include optimizing the coordinates of the fifth subset of feature points based at least in part on a camera pose from which the another 2D image may be captured.
In some embodiments, the plurality of feature points may be detected based on at least one of color or color intensity of one or more adjacent pixels representing each of the plurality of feature points and surrounding pixels in one or more previously captured 2D images. In some embodiments, the method may further include dividing the 2D image into a plurality of areas that have an equal size. The first subset of feature points may be evenly distributed in the 2D image in that the selected feature points may be distributed among different areas of the plurality of areas. In some embodiments, the method may further include dividing a space in the global coordinate map that corresponds to the 3D space into a plurality of volumes that have an equal size. The second subset of feature points may be evenly distributed in the 3D space in that the selected feature points may be distributed among different volumes of the plurality of volumes.
Embodiments may further include an electronic device for managing feature spatial distribution for localization and mapping. The electronic device may include a camera, one or more processors, and a memory. The memory may have instructions that, when executed by the one or more processors, may cause the electronic device to perform the operation of capturing, using the camera, a two-dimensional (2D) image of a three-dimensional (3D) space. The 2D image may include a plurality of feature points that have been previously detected and tracked by the electronic device. The instructions, when executed by the one or more processors, may further cause the electronic device to perform the operation of selecting, from the plurality of feature points, a first subset of feature points that may be evenly distributed in the 2D image. The instructions, when executed by the one or more processors, may also cause the electronic device to perform the operation of retrieving, from a point cloud, coordinates of the first subset of feature points. The point cloud may include previously determined coordinates that define locations of at least the first subset of feature points in a global coordinate map of the 3D space. The instructions, when executed by the one or more processors, may further cause the electronic device to perform the operation of selecting, from the first subset of feature points and based at least in part on the retrieved coordinates, a second subset of feature points that may be evenly distributed in the 3D space.
In some embodiments, the instructions, when executed by the one or more processors, may further cause the electronic device to perform the operation of optimizing the coordinates of the second subset of feature points based at least in part on a pose of the camera from which the 2D image may be captured. The pose of the camera may be determined based on data received from a sensor unit of the electronic device. The sensor unit may include an inertial measurement unit (IMU) configured to track the camera pose in the 3D space. The instructions, when executed by the one or more processors, may also cause the electronic device to perform the operation of updating the coordinates of the second subset of feature points in the point cloud with the optimized coordinates of the second subset of feature points.
In some embodiments, the plurality of feature points may be a first plurality of feature points. The instructions, when executed by the one or more processors, may further cause the electronic device to perform the operation of detecting a second plurality of feature points in the 2D image. The instructions, when executed by the one or more processors, may also cause the electronic device to perform the operation of selecting, from the second plurality of feature points, a third subset of feature points such that the first subset of feature points and the third subset of feature points combined may be evenly distributed in the 2D image.
In some embodiments, the instructions, when executed by the one or more processors, may further cause the electronic device to perform the operation of tracking the third subset of feature points in one or more subsequently captured 2D images. The instructions, when executed by the one or more processors, may also cause the electronic device to perform the operation of determining coordinates of the third subset of feature points using triangulation. The instructions, when executed by the one or more processors, may  further cause the electronic device to perform the operation of adding, to the point cloud, the third subset of feature points and the corresponding coordinates of the third subset of feature points.
In some embodiments, the instructions, when executed by the one or more processors, may further cause the electronic device to perform the operation of dividing the 2D image into a plurality of areas that have an equal size. The first subset of feature points may be evenly distributed in the 2D image in that the selected feature points may be distributed among different areas of the plurality of areas. In some embodiments, the instructions, when executed by the one or more processors, may further cause the electronic device to perform the operation of dividing a space in the global coordinate map that correspond to the 3D space into a plurality of volumes that have an equal size. The second subset of feature points may be evenly distributed in the 3D space in that the selected feature points may be distributed among different volumes of the plurality of volumes.
Embodiments may further include a non-transitory machine readable medium having instructions for managing feature spatial distribution for localization and mapping using an electronic device having a camera and one or more processors. The instructions may be executable by the one or more processors to cause the electronic device to perform the operation of capturing, using the camera, a two-dimensional (2D) image of a three-dimensional (3D) space. The 2D image may include a plurality of feature points that have been previously detected and tracked by the electronic device. The instructions may be executable by the one or more processors to also cause the electronic device to perform the operation of selecting, from the plurality of feature points, a first subset of feature points that may be evenly distributed in the 2D image. The instructions may be executable by the one or more processors to further cause the electronic device to perform the operation of retrieving, from a point cloud, coordinates of the first subset of feature points. The point cloud may include previously determined coordinates that define locations of at least the first subset of feature points in a global coordinate map of the 3D space. The instructions may be executable by the one or more processors to further cause the electronic device to perform the operation of selecting, from the first subset of feature points and based at least in part on the retrieved coordinates, a second subset of feature points that may be evenly distributed in the 3D space.
In some embodiments, the instructions may be executable by the one or more processors to further cause the electronic device to perform the operation of optimizing the coordinates of the second subset of feature points based at least in part on a pose of the camera from which the 2D image may be captured. The pose of the camera may be determined based on data received from a sensor unit of the electronic device. The sensor unit may include an inertial measurement unit (IMU) configured to track the camera pose in the 3D space. The instructions may be executable by the one or more processors to also cause the electronic device to perform the operation of updating the coordinates of the second subset of feature points in the point cloud with the optimized coordinates of the second subset of feature points.
In some embodiments, the plurality of feature points may be a first plurality of feature points. The instructions may be executable by the one or more processors to further cause the electronic device to perform the operation of detecting a second plurality of feature points in the 2D image. The instructions may be executable by the one or more processors to also cause the electronic device to perform the operation of selecting, from the second plurality of feature points, a third subset of feature points such that the first subset of feature points and the third subset of feature points combined may be evenly distributed in the 2D image.
In some embodiments, the instructions may be executable by the one or more processors to further cause the electronic device to perform the operation of tracking the third subset of feature points in one or more subsequently captured 2D images. The instructions may be executable by the one or more processors to also cause the electronic device to perform the operation of determining coordinates of the third subset of feature points using triangulation. The instructions may be executable by the one or more processors to further cause the electronic device to perform the operation of adding, to the point cloud, the third subset of feature points and the corresponding coordinates of the third subset of feature points.
In some embodiments, the third subset of feature points may be selected such that a number of the first subset of feature points and the third subset of feature points combined may be within a predetermined  range. In some embodiments, the instructions, when executed by the one or more processors, further cause the electronic device to perform the operation of dividing the 2D image into a plurality of areas that have an equal size. The first subset of feature points may be evenly distributed in the 2D image in that the selected feature points may be distributed among different areas of the plurality of areas.
Numerous benefits are achieved by way of the present invention over conventional techniques. For example, embodiments of the present disclosure involve methods and systems that provide a feature management mechanism for managing feature point spatial distribution to achieve more accurate and efficient localization and mapping systems. Moreover, some embodiments dynamically select feature points that are not only evenly distributed in 2D images, but also evenly distributed in the 3D space that the 2D images represent. These and other embodiments of the invention along with many of its advantages and features are described in more detail in conjunction with the text below and attached figures.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a simplified illustration of a 3D space in which some embodiments may be implemented.
FIG. 2A schematically illustrates feature points distributed in a 3D space according to an embodiment of the present invention.
FIG. 2B schematically illustrates distribution of the feature points of FIG. 2A in a 2D image representing a front view of the 3D space according to an embodiment of the present invention.
FIG. 2C schematically illustrates distribution of the feature points of FIG. 2A in a 2D image representing a right side view of the 3D space according to an embodiment of the present invention.
FIG. 3 is a block diagram illustrating the functionality of feature management for localization and mapping when implementing an extended reality session according to an embodiment of the present invention.
FIG. 4 is a flow diagram illustrating an embodiment of a feature management method for localization and mapping when implementing an extended reality session according to an embodiment of the present invention.
FIG. 5 depicts a block diagram of an embodiment of a computer system.
In the appended figures, similar components and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label by a dash and a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
In the following description, various embodiments will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the embodiments. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the embodiment being described.
Various platforms have been developed for implementing various extended reality technologies, such as VR, AR, MR, etc. Using different application programming interfaces (APIs) , some platforms may enable a user's phone to sense its environment, understand, and interact with the world. Some of the APIs may be available across different operating systems, such as Android and iOS, to enable shared extended reality experiences. Some platforms may utilize three main capabilities, e.g., motion tracking, environmental understanding, and light estimation, to integrate virtual content with the real world as seen through a camera of the user's phone. For example, motion tracking may allow the user's phone or camera to understand and track its position relative to the world. Environmental understanding may allow the phone to detect the size and  location of flat horizontal surfaces, e.g., the ground or a coffee table. Light estimation may allow the phone to estimate the environment's current lighting conditions.
Some platforms may allow developers to integrate camera and motion features of a user device to produce augmented reality experiences in applications installed on the user device. The platform may combine device motion tracking, camera scene capture, advanced scene processing, and display conveniences to simplify the task of building an extended reality experience. These techniques may be utilized to create different kinds of extended reality experiences, using a back camera and/or a front camera of the user device.
Motion tracking systems utilized by existing platforms, e.g., simultaneous localization and mapping (SLAM) , visual inertial simultaneous localization and mapping (VISLAM) , or visual inertial odometry (VIO) , may take an image stream and an inertial measurement unit (IMU) data stream in a synced fashion, then, with processing and feature point detection, create a combined output. One category of existing solutions to motion tracking and/or localization and mapping problems mainly relies on the detected feature points in two-dimensional (2D) images.
Generally, the more evenly the detected feature points are distributed, the better the localization and mapping results that may be achieved. Existing localization and mapping systems only rely on feature points in 2D images, and only consider distribution of the feature points in the 2D images. Existing systems do not take into account the distribution of the feature points in the 3D space that the 2D images represent. Nonetheless, distribution of the feature points in both the 2D images and the 3D space can affect the result of localization and mapping. This is because the distribution of feature points in a 2D image depends on the perspective of the camera. Thus, maintaining an even distribution of the detected feature points in the 2D images does not necessarily mean an even or uniform spatial distribution of the feature points in the 3D space, which can have a substantial impact on accuracy and frame rate of the localization and mapping system.
Embodiments described herein provide a feature management mechanism for managing feature point spatial distribution to achieve more accurate and efficient localization and mapping systems. Embodiments described herein dynamically select feature points that are not only evenly distributed in 2D images, but also evenly distributed in the 3D space that the 2D images represent. By continuously monitoring the spatial distribution of the feature points, more accurate localization and mapping results may be achieved. Additionally, by maintaining an even distribution of the feature points throughout the 2D images and the 3D space, diversified feature points that represent the entire 3D space, rather than local regions, may be selected to achieve better optimization results. Further, by selecting feature points that are evenly distributed throughout the 2D images and the 3D space, selection of duplicate feature points (multiple feature points in a single square or cube) may be avoided, and the problem size may be managed or reduced, which may lead to improved processing speed, e.g., faster optimization rate, lower power consumption, and more effective utilization of computing resources.
FIG. 1 is a simplified illustration of a 3D space 100 in which some embodiments may be implemented. Although an indoor space is illustrated in FIG. 1, the various embodiments described herein may also be implemented in an outer environment or other environment where an extended reality session may be carried out.
An extended reality session, such as an AR session, MR session, etc., may be implemented using an electronic device 110. The electronic device 110 may include a camera 112 configured to capture 2D images of the 3D space 100 for implementing the extended reality session and a display 114 for displaying the 2D images captured by the camera 112 and for presenting one or more rendered virtual objects 116. The electronic device 110 may include additional software and/or hardware, including but not limited to an IMU 118, for implementing the extended reality session and various other functions of the electronic device 110, such as motion tracking that allows the electronic device 110 to estimate and track the pose, e.g., position and orientation, of the camera 112 relative to the 3D space.
Although a tablet is illustrated in FIG. 1 as one example of the electronic device 110, the various embodiments described herein may not be limited for implementation on a tablet. The various embodiments may be implemented using many other types of electronic devices, including but not limited to various mobile  devices, handheld devices, such as smart phones or tablets, hands-free devices, wearable devices, such as head-mounted displays (HMD) , optical see-through head-mounted displays (OST HMDs) , or smart glasses, or any devices that may be capable of implementing an AR, MR, or other extended reality applications.
In the embodiment shown in FIG. 1, although the electronic device 110 utilizes a display 114 for displaying or rendering the camera view with or without a virtual object overlaid, the embodiments described herein may be implemented without rendering the camera view on a display. For example, some OST-HMD and/or smart glasses may not include a display for displaying or rendering the camera view. The OST-HMD and/or smart glasses may include a transparent display for the user to see the real world environment through the transparent display. Nonetheless, the embodiments described herein may be implemented in OST-HMD and/or smart glasses to achieve more accurate and efficient implementation of the extended reality session.
To implement the extended reality session, an extended reality application, such as an AR and/or MR application that may be stored in a memory of the electronic device 110, may be activated to start the extended reality session. The electronic device 110, via the application, may scan the surrounding environment, such as the 3D space 100, in which the extended reality session may be conducted. The 3D space 100 may be scanned using the camera 112 of the electronic device 110 so as to create a global coordinate map or frame of the 3D space 100. Other sensor inputs, such as inputs received from lidar, radar, sonar, or various other sensors of the electronic device 110, if available, may also be utilized to create the global coordinate map. The global coordinate map may be continuously updated as new inputs may be received from the various sensors of the electronic device 110.
The global coordinate map may be a 3D map that can be used to keep track of the pose of camera 112, e.g., the position and the orientation of the camera 112 relative to the 3D space, as the user moves the electronic device 110 in the 3D space 100. The position may represent or may be indicative of where the camera 112 is and the orientation may represent or may be indicative of the direction in which the camera 112 is pointed or directed. The camera 112 may be moved with three degrees of freedom, resulting in a change in its position, and may be rotated with three degrees of freedom, leading to a change in its orientation. The global coordinate map may also be used to keep track of the positions of various feature points in the 3D space.
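For purposes of illustration only, the following sketch shows one way a 6DoF camera pose could be represented and applied in software. The function and variable names are assumptions chosen for this example and are not taken from any particular platform or from the embodiments described herein.

import numpy as np

def world_to_camera(point_world, R_cam_to_world, t_cam_in_world):
    # Express a point from the global coordinate map in the camera frame.
    # R_cam_to_world (3x3) encodes the camera orientation and t_cam_in_world (3,)
    # the camera position; together they form the tracked 6DoF pose.
    return R_cam_to_world.T @ (np.asarray(point_world, dtype=float) - t_cam_in_world)

# Example: a map point 2 m in front of a camera at the origin looking along +z.
p_cam = world_to_camera([0.0, 0.0, 2.0], np.eye(3), np.zeros(3))

Here the pose is parameterized as a camera-to-world rotation plus a camera position; any equivalent representation, e.g., a 4x4 transform or a quaternion and translation, could be used instead.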
Construction of the global coordinate map of the 3D space 100 and/or tracking of the position and orientation of the camera 112 may be performed using various localization and mapping systems, such as SLAM for constructing and/or updating a map of an unknown environment while simultaneously keeping track of the position and orientation of a moving object, such as the camera 112 of the electronic device 110 described herein.
The constructed global coordinate map of the 3D space 100, estimates of the pose of the camera 112, and/or estimates of the positions of the feature points may need to be optimized by the localization and mapping system as new inputs from the camera 112 may be received. As can be appreciated, the ability for the electronic device 110 and/or the camera 112 to stay accurately localized can affect the accuracy and efficiency of the extended reality sessions implemented. As mentioned above, existing localization and mapping techniques mainly rely on detected feature points, e.g.,  corners  122a, 122b, 122c, 122d of the table 120 shown in FIG. 1, in the 2D image of the 3D space to construct the global coordinate map and/or to estimate the camera pose and the positions of the feature points. However, as also mentioned above, an even distribution of the detected feature points in the 2D images does not necessarily mean an even spatial distribution of the feature points in the 3D space, as will be discussed in more detail with reference to FIGS. 2A-2C below. Non-even distribution of the feature points in the 3D space can have a substantial impact on accuracy of the localization and mapping system.
FIG. 2A schematically illustrates feature points distributed in a 3D space according to an embodiment of the present invention. Although a hexahedron is shown in FIG. 2A for purpose of illustration, the 3D space in which an extended reality session may be implemented may be of any form and/or size. FIG. 2B schematically illustrates distribution of the same feature points distributed in a 2D image representing a front view of the 3D space according to an embodiment of the present invention. FIG. 2C schematically  illustrates distribution of the same feature points distributed in a 2D image representing a right side view of the 3D space according to an embodiment of the present invention.
As shown in FIG. 2A, the 3D space is divided into a plurality of volumes, e.g., a plurality of cubes, that have an equal, predetermined size. It should be understood that the real or physical 3D space is not divided into a plurality of cubes; rather, it is the corresponding space in the global coordinate map that is divided. Different cube sizes may be defined in multiple layers. For example, the 3D space may be divided into a first plurality of cubes of a first size in a first layer, such as the eight cubes shown in FIG. 2A. Each of the first plurality of cubes of the first size may be further divided into a second plurality of cubes of an equal, second size that is smaller than the first size in a second layer, and each of the second plurality of cubes of the second size may be further divided into a third plurality of cubes of an equal, third size that is smaller than the second size in a third layer, and so forth. Feature points may be tracked in different layers depending on the computational capability, desired resolution, availability of detected feature points, and other factors. Although cubes are described as exemplary volumes that the 3D space may be divided into, the 3D space may be divided into other forms of 3D volumes in various embodiments. Although eight cubes are shown in FIG. 2A for purpose of illustration, the 3D space may be divided into more cubes, such as tens, hundreds, or thousands of cubes, or an even greater number of cubes. Further, although only one layer of division is shown in FIG. 2A for clarity, multiple layers of division may be implemented.
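For purposes of illustration only, the following sketch computes the index of the cube that contains a given 3D point at a chosen layer. It assumes that each layer halves the cube edge length, which is only one possible choice of layer sizes, and the names cube_index, origin, and base_size are illustrative rather than part of the embodiments.

import numpy as np

def cube_index(point, origin, base_size, layer=0):
    # Integer (i, j, k) index of the cube containing `point` at `layer`.
    # Layer 0 uses cubes of edge `base_size`; each deeper layer is assumed
    # to halve the edge length, giving progressively finer cubes.
    edge = base_size / (2 ** layer)
    return tuple(np.floor((np.asarray(point, dtype=float) - origin) / edge).astype(int))

# Two points that share a coarse cube may fall into different fine cubes.
a = cube_index([0.2, 0.3, 0.1], origin=np.zeros(3), base_size=1.0, layer=0)
b = cube_index([0.8, 0.3, 0.1], origin=np.zeros(3), base_size=1.0, layer=1)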
As mentioned above, the more evenly distributed the feature points are, the better, e.g., more accurate, the localization and mapping result may be. In some embodiments, the terms "even distribution, " "evenly distributed, " "uniform distribution, " or "uniformly distributed" are used to mean that the feature points are distributed in different cubes of the plurality of cubes in a layer, although not every cube may have a feature point distributed therein. Further, the cubes that have feature points distributed therein are distributed or spread out throughout the 3D space instead of aggregating in local regions of the 3D space. In some embodiments, "even distribution, " "evenly distributed, " "uniform distribution, " or "uniformly distributed" may mean that the cubes having feature points distributed therein may each include more than one feature point, and/or the cubes having feature points distributed therein are distributed or spread out throughout the 3D space instead of aggregating in local regions of the 3D space. The total number of feature points included in each cube may be less than a predetermined number. The number of feature points included in each cube may be the same or may be different. In some embodiments, when the number of feature points included in each cube is greater than one, the cubes may be further divided into smaller cubes such that the cubes having feature points distributed therein may each include one feature point.
In some embodiments, to determine whether the cubes having feature points distributed therein are distributed or spread out in the 3D space and not aggregated in local regions, after determining that the feature points are distributed among different cubes, the distribution of the feature points or cubes having feature points along each of the three dimensions of the 3D space may also be determined, as will be discussed in more detail below. By selecting and/or tracking feature points that are evenly distributed in the 3D space, e.g., distributed among different cubes, more accurate localization and mapping of the entire 3D space may be achieved.
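As a hedged sketch of such a selection and check, the following example keeps at most a fixed number of feature points per cube and then counts occupied cubes along each of the three dimensions; the specific data structures, names, and uniformity criterion are assumptions made for this illustration.

from collections import defaultdict
import numpy as np

def select_even_3d(points_3d, origin, cube_size, max_per_cube=1):
    # Keep at most `max_per_cube` feature points per cube so that the selected
    # points are spread over different cubes of the 3D grid.
    # points_3d maps a feature id to its (x, y, z) coordinates in the global map.
    kept, counts = [], defaultdict(int)
    for pid, xyz in points_3d.items():
        idx = tuple(np.floor((np.asarray(xyz, dtype=float) - origin) / cube_size).astype(int))
        if counts[idx] < max_per_cube:
            counts[idx] += 1
            kept.append(pid)
    return kept

def occupied_cubes_per_axis(points_3d, kept, origin, cube_size):
    # Count occupied cubes along each of the three dimensions; counts that are
    # concentrated on a few slices indicate aggregation in a local region.
    occupied = {tuple(np.floor((np.asarray(points_3d[p], dtype=float) - origin) / cube_size).astype(int))
                for p in kept}
    per_axis = [defaultdict(int), defaultdict(int), defaultdict(int)]
    for cube in occupied:
        for axis in range(3):
            per_axis[axis][cube[axis]] += 1
    return per_axis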
With further references to FIGS. 2A-2C, four  feature points  202, 204, 206, 208 may be detected and tracked in 2D images of the 3D space, such as the front view of the 3D space shown in FIG. 2B and the right side view of the 3D space shown in FIG. 2C, for localization and mapping. Similar to how the 3D space may be divided into multiple layers, each 2D image may also be divided into multiple layers. For example, each 2D image may be divided into a first plurality of areas, e.g., squares, of an equal, first size in a first layer, such as the four squares shown in FIG. 2B or the four squares shown in FIG. 2C. Each of the first plurality of squares of the first size may be further divided into a second plurality of squares of an equal, second size that is smaller than the first size in a second layer, and each of the second plurality of squares of the second size may be further divided into a third plurality of squares of an equal, third size that is smaller than the second size in a third layer, and so forth.
Although squares are described as exemplary shapes that each 2D image may be divided into, each 2D image may be divided into other forms of 2D areas in various embodiments. Although four squares are  shown in FIGS. 2B and 2C for purpose of illustration, each 2D image may be divided into more squares, such as tens, hundreds, or thousands of squares, or an even greater number of squares. Further, although only one layer of division is shown in FIGS. 2B and 2C for clarity, multiple layers of division may be implemented.
When the distribution of the feature points is discussed in the context of 2D images, in some embodiments, even distribution of the feature points means that the feature points are distributed among different squares of the plurality of squares in a layer, and the squares that have feature points distributed therein are spread out throughout the 2D image instead of aggregating in local regions of the 2D image. In some embodiments, to determine whether the squares having feature points distributed therein are distributed or spread out in the 2D image and not aggregated in local regions, after determining that the feature points are distributed among different squares, the distribution along each of the two dimensions of the 2D image may also be determined, as will be discussed in more detail below.
In some embodiments, even distribution of the feature points in the 2D image may mean that the squares that have feature points distributed therein may each include more than one feature point, and/or the squares that have feature points distributed therein are spread out throughout the 2D image instead of aggregating in local regions of the 2D image. The total number of feature points included in each square may be less than a predetermined number. The number of feature points included in each square may be the same or may be different. In some embodiments, when the number of feature points included in each square is greater than one, the squares may be further divided into smaller squares such that the squares having feature points distributed therein may each include one feature point.
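A similar bookkeeping can be applied in image space. The following sketch, with assumed names and grid parameters, assigns pixel-coordinate feature points to equally sized squares and reports whether any square exceeds a permitted count and how the occupied squares spread along the two image dimensions.

from collections import Counter

def square_index(pt, square_size):
    # Index (col, row) of the square containing a pixel coordinate (u, v).
    u, v = pt
    return int(u // square_size), int(v // square_size)

def check_even_2d(points_2d, square_size, max_per_square=1):
    # Return (is_even, occupied squares per column band, occupied squares per row band).
    squares = Counter(square_index(p, square_size) for p in points_2d)
    is_even = all(n <= max_per_square for n in squares.values())
    cols = Counter(c for c, _ in squares)
    rows = Counter(r for _, r in squares)
    return is_even, cols, rows

# Example: four points in a 480x480 image split into 2x2 squares of 240 pixels.
ok, cols, rows = check_even_2d([(100, 100), (400, 100), (100, 400), (400, 400)], 240)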
With continued reference to FIG. 2B, when observed only from the front view of the 3D space, the feature points 202, 204, 206, 208 appear to be evenly distributed in the 2D image as each feature point is distributed in a different square. Further, the squares having feature points are distributed or spread out throughout the entire 2D image. Specifically, the squares having feature points are distributed evenly along the vertical dimension of the 2D image, and are also distributed evenly along the horizontal dimension of the 2D image. However, referring to FIG. 2C, when observed from the right side view of the 3D space, it can be seen that the feature points 202, 204, 206, 208 are not distributed evenly as feature point 204 and feature point 206 are distributed in a same square, i.e., the bottom left square, of the 2D image of FIG. 2C. Thus, when only 2D positions of feature points may be checked as in many of the existing localization and mapping systems, even distribution of the feature points may not actually be maintained, and the localization and mapping result may be inaccurate because the tracked feature points may appear to be evenly distributed in one 2D image, such as shown in FIG. 2B, but may not in fact be evenly distributed, such as shown in the 2D image of FIG. 2C.
The embodiments of the present invention described herein avoid this issue associated with the existing localization and mapping systems by introducing a feature management mechanism that checks the distribution of feature points in both 2D images and the 3D space. By checking the distribution of the feature points in both 2D images and the 3D space, feature points that are evenly distributed not only in the 2D images but also in the 3D space may be selected for tracking, and more accurate and efficient localization and mapping may be achieved. As an example, in the embodiments described herein, by checking the feature point distribution in the 3D space, feature point 210, instead of feature point 206, may be selected and tracked, along with feature points 202, 204, 208. This is because feature points 202, 204, 206, 208 are not evenly distributed in the 3D space whereas feature points 202, 204, 208, 210 may be considered as evenly distributed in the 3D space.
Specifically, in the 3D space, feature point 202 is distributed in the front, top, left cube, feature point 204 is distributed in the front, bottom, left cube, feature point 206 is distributed in the front, bottom, right cube, feature point 208 is distributed in the rear, top, right cube, and feature point 210 is distributed in the rear, bottom, right cube. When feature points 202, 204, 206, 208 are selected, although the feature points 202, 204, 206, 208 are distributed in different cubes, the cubes having feature points are distributed in an aggregated manner in the 3D space because three out of the four cubes having feature points are in the front region of the 3D space whereas only one cube having a feature point is in the back region of the 3D space. When feature points 202, 204, 208, 210 are selected, however, not only are the feature points distributed among different  cubes, the cubes having feature points are not distributed in an aggregated manner and are distributed evenly along each of the three dimensions of the 3D space. Specifically, two out of the four cubes having  feature points  202, 204 are in the front region of the 3D space, and two out of the four cubes having  feature points  208, 210 are in the rear region of the 3D space; two out of the four cubes having  feature points  202, 204 are in the left region of the 3D space, and two out of the four cubes having  feature points  208, 210 are in the right region of the 3D space; and two out of the four cubes having  feature points  202, 208 are in the top region of the 3D space, and two out of the four cubes having  feature points  204, 210 are in the bottom region of the 3D space.
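The reasoning in the preceding paragraph can be reproduced numerically. The coordinates below are invented solely to match the qualitative cube assignments of FIG. 2A, under an assumed convention in which x increases to the right, y increases upward, z increases toward the rear, and the space is a unit cube divided into eight octants.

import numpy as np

# Illustrative coordinates consistent with FIG. 2A under the assumed convention.
points = {
    202: (0.25, 0.75, 0.25),  # front, top, left
    204: (0.25, 0.25, 0.25),  # front, bottom, left
    206: (0.75, 0.25, 0.25),  # front, bottom, right
    208: (0.75, 0.75, 0.75),  # rear, top, right
    210: (0.75, 0.25, 0.75),  # rear, bottom, right
}

def octant(p):
    # 0/1 index along each axis, i.e., which half of the unit cube the point is in.
    return tuple((np.asarray(p) >= 0.5).astype(int))

for subset in [(202, 204, 206, 208), (202, 204, 208, 210)]:
    occupied = [octant(points[i]) for i in subset]
    rear = sum(z for _, _, z in occupied)
    print(subset, "rear-half cubes:", rear, "of", len(subset))
# The first subset places 3 occupied cubes in the front half and only 1 in the
# rear, while the second splits 2/2 along each axis, i.e., it is spread evenly.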
Thus, by checking the feature point distribution in the 3D space, feature points that are more evenly distributed, e.g., feature points 202, 204, 208, 210 may be selected over feature points that may not be evenly distributed, e.g., feature points 202, 204, 206, 208. By selecting feature points that are evenly distributed in 3D space, even distribution in different 2D images or views of the 3D space, such as the even distribution of the feature points 202, 204, 208, 210 as shown in both FIGS. 2B and 2C, may be maintained, achieving more accurate 2D representation of the 3D space and more accurate and efficient localization and mapping.
FIG. 3 is a block diagram 300 illustrating the functionality of feature management for localization and mapping according to some embodiments. The functionality of the various blocks within the flow chart may be executed by hardware and/or software components of the electronic device 110. It is noted that alternative embodiments may alter the functionality illustrated to add, remove, combine, separate, and/or rearrange the various functions shown. A person of ordinary skill in the art will appreciate such variations.
At block 305, feature points in each of the received 2D images from the stream of image data 302 may be tracked. These tracked feature points may be previously detected and continuously tracked in the received 2D images, including feature points detected in the immediately preceding 2D images and now continuously tracked in the currently received 2D image. The feature points in the 2D images may be detected and tracked using image processing techniques. In some embodiments, the feature points in each 2D image may be detected and tracked based on the color, color intensity or brightness, etc., of one or more pixels and the surrounding pixels. For example, the 2D image may be analyzed to detect adjacent pixels that have the same or similar color and/or color intensity. The size of each group of adjacent pixels may then be analyzed to determine whether each group of pixels may represent or may be categorized as a point, an edge, a plane or surface, etc. For example, when a group of adjacent pixels may be determined to be located within a predetermined radius, the group of adjacent pixels may be categorized as a point, and the corresponding feature represented by the group of adjacent pixels may be categorized as a feature point. The predetermined radius may be measured by a predetermined number of pixels, e.g., 5 pixels, 10 pixels, 15 pixels, 20 pixels, 25 pixels, 50 pixels, 100 pixels, etc., and the number of pixels may be determined based on desired resolution, processing capacity, etc.
As an example, referring back to FIG. 1, the 2D image of the 3D space 100 may be analyzed. Adjacent pixels representing each of the corners 122a, 122b, 122c, 122d of the table 120 in the 2D image may be determined to have the same or similar color and/or color intensity, and may be further determined to be within the predetermined radius. Thus, the four corners 122a, 122b, 122c, 122d may be detected as feature points and may be tracked in the captured 2D images of the 3D space. These feature points may be helpful for defining or constructing geometric constraints for implementing the extended reality session, e.g., geometric constraints of a surface on which a virtual object, such as the virtual object 116, may be rendered. Although feature points are described as exemplary features that may be detected and tracked, the features that may be tracked are not limited to points. For example, edges, areas, planes, etc., may be detected and tracked depending on the particular application.
At block 310, outliers among the detected and tracked feature points may be removed. Removing the outlier feature points may include pausing or stopping tracking of the feature points. In some embodiments, outlier feature points may include feature points that disturb the even distribution of the feature points in the current 2D image. For example, referring back to FIGS. 2B and 2C, among feature points 202, 204, 206, 208, 210, feature point 206 may be considered an outlier feature point since feature points 202, 204, 208, 210 are evenly distributed in the 2D images of FIGS. 2B and 2C, and the inclusion of feature point 206 results in a non-even distribution of the feature points. As already discussed above, even distribution may be determined by dividing the 2D image into multiple squares or other appropriately shaped areas, determining whether the feature points are distributed among different squares, and/or examining the distribution of the squares along each dimension of the 2D image. Thus, removing outlier feature points may be performed by selecting the feature points that are evenly distributed in the current 2D image.
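For illustration only, a minimal sketch of the square-based check is shown below; the cell size, the tie-breaking rule (keep the first feature landing in a square), and the data layout are assumptions made for this example.

```python
# Illustrative sketch only: keep at most one feature per equal-size square so
# that the retained features are spread across the 2D image.
def select_evenly_distributed_2d(features, cell_size=64):
    """features: iterable of (x, y, feature_id) in pixels; returns kept tuples."""
    occupied = {}
    for x, y, feature_id in features:
        cell = (int(x) // cell_size, int(y) // cell_size)
        if cell not in occupied:                 # later features landing in the
            occupied[cell] = (x, y, feature_id)  # same square are treated as outliers
    return list(occupied.values())
```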
In some embodiments, outlier feature points may further include feature points the velocity or moving speed of which may be greater than a predetermined threshold. In some embodiments, the predetermined threshold may be a predetermined velocity value or moving speed. In some embodiments, the predetermined threshold may be a percentage difference in the velocity or moving speed among different feature points. For example, by tracking the feature points, the velocity or speed of the feature points may be determined. Feature points moving above the predetermined threshold velocity or speed may be removed. The velocity or speed variation among the feature points may also be determined. For example, an average velocity or speed of the feature points may be determined. Feature points that move at a velocity or speed greater than the average by more than a predetermined percentage, e.g., more than 30%, more than 40%, more than 50%, more than 60%, more than 70%, more than 80%, more than 90%, or greater, may be removed.
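For illustration only, the velocity-based rejection could be sketched as follows; the speed representation (pixels per frame) and the threshold values are assumptions made for this example.

```python
# Illustrative sketch only: drop features whose tracked speed exceeds an
# absolute threshold or exceeds the average speed by a given fraction.
def reject_fast_movers(tracks, max_speed_px=None, max_frac_above_mean=0.5):
    """tracks: list of (feature_id, speed_px_per_frame); returns surviving ids."""
    speeds = [speed for _, speed in tracks]
    mean_speed = sum(speeds) / len(speeds) if speeds else 0.0
    kept = []
    for feature_id, speed in tracks:
        if max_speed_px is not None and speed > max_speed_px:
            continue    # absolute threshold exceeded
        if mean_speed > 0 and (speed - mean_speed) / mean_speed > max_frac_above_mean:
            continue    # e.g., more than 50% faster than the average
        kept.append(feature_id)
    return kept
```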
At block 315, new feature points in the current 2D image may be detected. Similar to how the feature points may be detected in the previous 2D images, the feature points in the current 2D image may be detected by analyzing the color, color intensity, size, etc., of adjacent pixels and the surrounding pixels.
At block 320, the newly detected feature points may be selected such that an even distribution of the feature points in the current 2D image may be maintained. In some embodiments, to maintain the even distribution of the feature points in the current 2D image, the distribution of the newly detected feature points may be analyzed by checking whether the newly detected feature points are distributed among different squares of the 2D image and/or checking whether the newly detected feature points are distributed throughout the 2D image or aggregated in one or more local regions as discussed above. Duplicate feature points, i.e., feature points distributed in the same squares, may be removed such that only one feature point in each square may be selected. Aggregated feature points may be reduced such that the distribution of the newly detected feature points throughout the entire 2D image may be even.
In some embodiments, in addition to analyzing the distribution of the newly detected feature points, the distribution of the previously detected and tracked feature points and the newly detected feature points combined may be analyzed. If duplicate feature points are present, the newly detected feature points may be removed whereas the previously detected and tracked feature points may be retained because the coordinates of the previously detected and tracked feature points may have been determined and optimized multiple times (as will be discussed below), and thus are more accurate. In some embodiments, among the duplicate feature points, the newly detected feature points may be retained whereas the previously detected and tracked feature points may be removed for other considerations. Once the newly detected feature points are selected, these newly selected feature points may be tracked in the subsequently captured 2D images, similar to how the previously detected feature points are tracked at block 305 in the previous 2D images.
In some embodiments, when selecting the newly detected feature points at block 320, only the distribution of the newly detected feature points may be examined, and the detected feature points may be selected accordingly such that the selected newly detected feature points may be evenly distributed. The distribution of the previously detected and tracked feature points and the selected newly detected feature points combined may be evaluated in another outlier rejection operation at block 310 to remove feature points that disturb the even distribution such that the remaining feature points are evenly distributed.
In some embodiments, when selecting the newly detected feature points at block 320, an appropriate number of the newly detected feature points may be selected such that a combined number of the selected newly detected feature points and the previously detected and continuously tracked feature points in the current 2D image may be within a predetermined range. Thus, an excessive number of the newly detected feature points may not be selected to avoid overburdening the computational capacity of the device implementing the extended reality session.
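For illustration only, the count check could be sketched as follows; the target total is an assumed value, and in practice the candidates might first be ordered by a quality measure before truncation.

```python
# Illustrative sketch only: add just enough new detections to keep the total
# number of tracked features within a predetermined budget.
def budget_new_features(num_tracked, new_candidates, target_total=150):
    room = max(0, target_total - num_tracked)
    return new_candidates[:room]
```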
At block 325, correspondence between the feature points in the current 2D image and their coordinates in the 3D space, e.g., 3D coordinates, may be built. The 3D coordinates may be coordinates in a coordinate map established to map the 3D space and to track the pose of a camera and/or positions of detected and tracked feature points in the 3D space. An exemplary global coordinate map may include the global coordinate map of the 3D space 100 established by the electronic device 110 to map the 3D space 100 and to track the pose of the camera 112 and/or the positions of the detected and tracked feature points, e.g., feature points 122a, 122b, 122c, 122d.
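For illustration only, building the 2D-3D correspondence can be viewed as a lookup of previously stored coordinates, as in the sketch below; representing the point cloud as a dictionary keyed by feature identifier is an assumption made for this example.

```python
# Illustrative sketch only: look up stored 3D coordinates for tracked features;
# newly detected features have no entry yet and are simply omitted.
def build_correspondence(tracked_ids, point_cloud):
    """point_cloud: dict mapping feature_id -> (x, y, z) in the global map."""
    return {fid: point_cloud[fid] for fid in tracked_ids if fid in point_cloud}
```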
The 3D coordinates of a feature point may be determined by triangulation. Specifically, when a feature point is detected and tracked in two or more 2D images, the feature point has been observed by the camera, e.g., camera 112, from two or more different camera poses, i.e., different camera positions and/or different camera orientations. Thus, based on the different camera positions and/or orientations from which the feature point has been observed, the 3D coordinates of the feature point in the 3D global coordinate map may be determined via triangulation. The determined positions or coordinates of the detected and tracked feature points may be stored in a point cloud, e.g., a 3D point cloud. Building correspondence between the feature points in the 2D image and their 3D coordinates may include retrieving from the point cloud the 3D coordinates for the feature points in the current 2D image. As can be appreciated, not all feature points in the current 2D image may have available or known 3D coordinates stored in the point cloud. For example, the 3D coordinates of the newly detected feature points in the current 2D image are not yet known since they have only been observed from one camera pose.
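For illustration only, a standard linear (direct linear transform) two-view triangulation is sketched below; it assumes known 3x4 projection matrices for the two camera poses and is not necessarily the triangulation used by any particular localization and mapping system.

```python
# Illustrative sketch only: linear two-view triangulation of one feature point.
import numpy as np


def triangulate_point(P1, P2, x1, x2):
    """P1, P2: 3x4 projection matrices; x1, x2: (u, v) pixel observations."""
    # Each observation contributes two linear constraints on the homogeneous
    # 3D point X: u*(P[2] @ X) - (P[0] @ X) = 0 and v*(P[2] @ X) - (P[1] @ X) = 0.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]              # right singular vector with the smallest singular value
    return X[:3] / X[3]     # de-homogenize to (x, y, z)
```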
At block 330, the coordinates of the previously detected and tracked feature points may be optimized and/or updated. For example, based on the stream of IMU data 304 received from an IMU sensor configured to track the pose of the camera in the 3D space, e.g., IMU sensor 118 of the electronic device 110 configured to track the pose of the camera 112 in the 3D space 100, the current camera pose may be determined. By receiving the image data 302 and the IMU data 304 in a synched fashion, the positions of the feature points that have been previously detected and tracked in the current 2D image may be optimized and/or updated based on the visual correspondence of the feature points in the current 2D image and the camera pose from which the current 2D image is captured. Since more image data and IMU data have become available, the coordinates of the feature points may be calculated or updated with the new data, and the calculated or updated coordinates may be more accurate and/or refined. As can be appreciated, the coordinates of the feature points may be continuously optimized and/or refined as feature points may be continuously detected from new camera poses and tracked in new 2D images. In some embodiments, the optimization may be performed using optimization modules available from existing localization and mapping systems, such as the optimization module of SLAM, VISLAM, VIO, etc.
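For illustration only, the sketch below refines a single landmark by minimizing its pixel reprojection error over the camera poses that observed it; a full SLAM/VISLAM/VIO optimization module would typically refine poses and many landmarks jointly, so this is a simplified stand-in rather than the actual optimization.

```python
# Illustrative sketch only: refine one landmark's 3D coordinates by nonlinear
# least squares on the reprojection error across all observing views.
import numpy as np
from scipy.optimize import least_squares


def refine_landmark(X0, projections, observations):
    """X0: initial (x, y, z); projections: list of 3x4 matrices; observations: list of (u, v)."""
    def residuals(X):
        res = []
        for P, (u, v) in zip(projections, observations):
            x = P @ np.append(X, 1.0)                      # project into the image
            res.extend([x[0] / x[2] - u, x[1] / x[2] - v])  # pixel error
        return res
    return least_squares(residuals, np.asarray(X0, dtype=float)).x
```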
In some embodiments, at block 330, the coordinates of the feature points newly detected from the immediately preceding 2D image and continuously tracked in the current 2D image may also be determined based on the camera pose from which the immediately preceding 2D image was taken and the camera pose from which the current 2D image is taken via triangulation. The coordinates of these feature points may be continuously optimized when more image data and IMU data become available. In a similar manner, the positions or coordinates of the feature points newly detected from the current 2D image may be determined and/or optimized if they are continuously tracked in one or more of subsequently captured 2D images. At block 335, the point cloud may be updated by storing the coordinates of the feature points that have been determined and/or optimized at block 330.
In some embodiments, feature points that are evenly distributed in the 3D space may be selected at block 340 prior to optimization at block 330. In these embodiments, only feature points that are evenly distributed in the 3D space may then be optimized at block 330 and updated in the point cloud at block 335. Stated differently, by selecting feature points that are evenly distributed in the 3D space, only a subset of all the detected and tracked feature points in the 2D image may be optimized, which may be particularly beneficial if the optimization module of the localization and mapping may be constrained by the system's processing capacity.
Specifically, at block 340, based on the coordinates retrieved from the point cloud, feature points that have been previously detected and continuously tracked in the current 2D image and that are evenly distributed in the 3D space may be selected. As discussed above, even distribution of feature points in the 3D space may be determined by dividing the 3D space into multiple cubes (or other appropriate forms of 3D volumes) of an equal size in one or multiple layers, and by checking whether the feature points are distributed among different cubes. In some embodiments, the distribution of the cubes having feature points along each of the three dimensions of the 3D space may also be evaluated. For example, the 3D space may have an X dimension, a Y dimension, and a Z dimension, and the distribution of the cubes having feature points along each of the X, Y, and/or Z dimensions may be evaluated. In some embodiments, the distribution of the cubes along each of the X, Y, and/or Z dimensions may be evaluated based on the respective X, Y, and/or Z coordinates of the feature points inside each cube. For example, the X coordinates of all feature points may be checked to evaluate whether the cubes containing those feature points are evenly distributed in the X dimension, the Y coordinates may be checked to evaluate whether the cubes are evenly distributed in the Y dimension, and/or the Z coordinates may be checked to evaluate whether the cubes are evenly distributed in the Z dimension. By checking whether the feature points are distributed among different cubes and/or checking the distribution of the cubes having feature points along each of the three dimensions of the 3D space, feature points that are evenly distributed in the 3D space may be determined and selected. The selected feature points may then be used for subsequent processing.
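For illustration only, the cube-based check at block 340 could be sketched as follows; the cube size and the rule of keeping the first feature encountered per cube are assumptions made for this example.

```python
# Illustrative sketch only: keep at most one feature per equal-size cube so
# that the selected subset is spread out in the 3D space.
def select_evenly_distributed_3d(points, cube_size=0.5):
    """points: dict mapping feature_id -> (x, y, z) in the global map; returns kept ids."""
    occupied = {}
    for feature_id, (x, y, z) in points.items():
        cube = (int(x // cube_size), int(y // cube_size), int(z // cube_size))
        if cube not in occupied:
            occupied[cube] = feature_id
    return list(occupied.values())
```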
The functionality described above with reference to blocks 305-325 and 340 may be collectively referred to as a feature management mechanism 345 that provides several benefits. By selecting newly detected feature points that are evenly distributed in at least the 2D images, such as at block 320, duplicate feature points in the same square may not be selected, which efficiently utilizes the computational capability and avoids degrading system performance. Additionally, more accurate optimization and/or localization and mapping results may be achieved by continuously monitoring the spatial distribution of the feature points and dynamically selecting feature points that are not only evenly distributed in each 2D image, but also evenly distributed in the 3D space. A faster optimization rate, lower power consumption, and less memory and CPU usage may be achieved because the size of the optimization problem, i.e., the number of feature points and their corresponding 3D coordinates to be optimized, may be smaller due to the removal of feature points that are not evenly distributed in the 2D images and/or the 3D space.
It should be appreciated that the specific steps illustrated in FIG. 3 provide a particular method of illustrating the functionality of feature management for localization and mapping according to an embodiment of the present invention. As noted above, other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present invention may perform the steps outlined above in a different order. Moreover, the individual steps illustrated in FIG. 3 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step. Furthermore, additional steps may be added or removed depending on the particular applications. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
FIG. 4 is a flow diagram illustrating an embodiment of a feature management method 400 for improved localization and mapping when implementing an extended reality session according to an embodiment of the present invention. Method 400 can be implemented by, e.g., using the electronic device 110 as described herein. As such, means for performing one or more of the functions illustrated in the various blocks of FIG. 4 may comprise hardware and/or software components of the electronic device 110. As with other figures herein, FIG. 4 is provided as an example. Other embodiments may vary in functionality from the functionality shown. Variations may include performing additional functions, substituting and/or removing select functions, performing functions in a different order or simultaneously, and the like.
At block 405, a 2D image of a 3D space in which an extended reality session may be implemented, e.g., 3D space 100, may be captured by a camera of an electronic device, e.g., camera 112 of the electronic device 110. The 2D image may include feature points that have been detected in one or more previously captured 2D images of the 3D space, and continuously tracked in the current 2D image. The feature points may be detected and/or tracked based on the color, color intensity, size, etc., of one or more adjacent pixels and the surrounding pixels, as discussed above. The 2D image may further include new feature points to be detected in subsequent operations.
At block 410, a subset, e.g., a first subset, of feature points may be selected from the previously detected and continuously tracked feature points. The first subset of feature points may be selected such that they are evenly distributed in the 2D image. As discussed above, to select feature points that are evenly distributed in the 2D image, the 2D image may be divided into a number of areas, e.g., squares, that have an equal size. In some embodiments, multiple layers of image division may be implemented. In some embodiments, the first subset of feature points may be selected such that they are distributed among different squares of the 2D image. Further, the first subset of feature points may be selected such that the squares having feature points distributed therein may be evenly distributed along each of the two dimensions of the 2D image to ensure that the selected feature points are distributed or spread out in the 2D image and not aggregated in local regions. Once the first subset of feature points is selected from the previously detected and continuously tracked feature points, the remaining feature points may be removed from tracking.
In some embodiments, in addition to removing feature points that are not evenly distributed in the 2D image, method 400 may further include removing feature points that may have a velocity or moving speed greater than a predetermined threshold. The predetermined threshold may be a predetermined velocity or moving speed, or a predetermined percentage difference in the velocity or moving speed among different feature points.
At block 415, new feature points in the current 2D image may be detected. Similar to how feature points may be detected in previously captured 2D images, the new feature points may be detected based on the color, color intensity, size, etc., of one or more adjacent pixels and the surrounding pixels in the current 2D image.
At block 420, a subset, e.g., a second subset, of feature points may be selected from the newly detected feature points. The second subset of feature points may be selected such that the even distribution of the feature points in the current 2D image may be maintained. In some embodiments, to maintain the even distribution of the feature points in the current 2D image, the second subset of feature points may be selected such that they are evenly distributed in the current 2D image. In some embodiments, to maintain the even distribution of the feature points in the current 2D image, the second subset of feature points may be selected such that the first and second subsets of feature points combined are evenly distributed in the current 2D image. If duplicate feature points are present (i.e., multiple feature points in a single square) when the first and second subsets of feature points are combined, the newly detected feature points may be removed and the previously detected feature points may be retained, or vice versa.
In some embodiments, in addition to maintaining even distribution, when selecting the newly detected feature points, a total number of the first subset of feature points and the selected second subset of feature points combined may be maintained within a predetermined range so that the computational resources are not overburdened. The selected second subset of feature points may be added for tracking in subsequently captured 2D images.
At block 425, coordinates, e.g., 3D coordinates, of the first subset of feature points (i.e., feature points that have been previously detected and continuously tracked and that are evenly distributed in the current 2D image) may be retrieved from a point cloud where previously determined coordinates of feature points may be stored. The coordinates may define locations of the feature points in a 3D coordinate system, e.g., a global coordinate map or frame, of the 3D space. As discussed above, the 3D coordinates of any feature point may be determined by triangulation when a feature point is detected and tracked in two or more 2D images. That is, when the feature point is observed by the camera, e.g., camera 112, from two or more different camera poses, the 3D coordinates of the feature point in the 3D global coordinate map may be determined based on the camera poses via triangulation. Based on the retrieved coordinates, correspondence between the first subset of feature points in the 2D image and their 3D coordinates may be built.
At block 430, based on the retrieved coordinates, a third subset of feature points may be selected from the first subset of feature points such that the selected third subset of feature points are evenly distributed in the 3D space. As discussed above, to select feature points that are evenly distributed in the 3D space, the 3D space may be divided into a number of volumes, e.g., cubes, that have an equal size in one or more layers. In some embodiments, the third subset of feature points may be selected such that they are distributed among different cubes of the 3D space. Further, the third subset of feature points may be selected such that the cubes having feature points distributed therein may be evenly distributed along each of the three dimensions of the 3D space to ensure that the selected feature points are distributed or spread out in the 3D space and not aggregated in local regions. Once the third subset of feature points is selected from the first subset of feature points, the remaining feature points in the first subset of feature points may be removed from tracking.
At block 435, coordinates of the third subset of feature points may be optimized based at least in part on the new camera pose from which the current 2D image is taken and the current 2D image. The camera pose may be determined based at least in part on the data received from an IMU, e.g., IMU 118 of the electronic device 110, configured to track the camera pose as the camera moves in the 3D space. Because only coordinates of the third subset of feature points that are evenly distributed in the 3D space, instead of the entire set of the previously detected and tracked feature points or the first subset of feature points, may be optimized, the size or total number of feature points for optimization may be smaller, and a higher optimization rate may be achieved.
At block 440, the coordinates of the feature points newly detected from the immediately preceding 2D image and continuously tracked in the current 2D image may be determined. Specifically, the coordinates may be determined based on the camera pose from which the immediately preceding 2D image was taken and the camera pose from which the current 2D image is taken via triangulation.
At block 445, the point cloud may be updated by storing the coordinates of the feature points that have been optimized at block 435 and/or the coordinates of the feature points that have been determined at block 440. After updating the point cloud, method 400 may return to block 405 and another 2D image of the 3D space may be captured. At least some of the second subset of feature points and/or some of the third subset of feature points may be present and tracked in the newly captured 2D image. Method 400 may continue with operations 410-445 to select feature points and to optimize coordinates of the selected feature points to achieve more accurate and efficient localization and mapping.
It should be appreciated that the specific steps illustrated in FIG. 4 provide a particular method of illustrating an embodiment of a feature management method for improved localization and mapping when implementing an extended reality session according to an embodiment of the present invention. As noted above, other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present invention may perform the steps outlined above in a different order. Moreover, the individual steps illustrated in FIG. 4 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step. Furthermore, additional steps may be added or removed depending on the particular applications. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
FIG. 5 is a simplified block diagram of a computing device 500. Computing device 500 can implement some or all functions, behaviors, and/or capabilities described above that would use electronic storage or processing, as well as other functions, behaviors, or capabilities not expressly described. Computing device 500 includes a processing subsystem 502, a storage subsystem 504, a user interface 506, and/or a communication interface 508. Computing device 500 can also include other components (not explicitly shown) such as a battery, power controllers, and other components operable to provide various enhanced capabilities. In various embodiments, computing device 500 can be implemented in a desktop or laptop computer, mobile device (e.g., tablet computer, smart phone, mobile phone), wearable device, media device, application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, or electronic units designed to perform a function or combination of functions described above.
Storage subsystem 504 can be implemented using a local storage and/or removable storage medium, e.g., using disk, flash memory (e.g., secure digital card, universal serial bus flash drive), or any other non-transitory storage medium, or a combination of media, and can include volatile and/or non-volatile storage media. Local storage can include random access memory (RAM), including dynamic RAM (DRAM), static RAM (SRAM), or battery backed up RAM. In some embodiments, storage subsystem 504 can store one or more applications and/or operating system programs to be executed by processing subsystem 502, including programs to implement some or all operations described above that would be performed using a computer. For example, storage subsystem 504 can store one or more code modules 510 for implementing one or more method steps described above.
A firmware and/or software implementation may be implemented with modules (e.g., procedures, functions, and so on). A machine-readable medium tangibly embodying instructions may be used in implementing methodologies described herein. Code modules 510 (e.g., instructions stored in memory) may be implemented within a processor or external to the processor. As used herein, the term "memory" refers to a type of long term, short term, volatile, nonvolatile, or other storage medium and is not to be limited to any particular type of memory or number of memories or type of media upon which memory is stored.
Moreover, the term "storage medium" or "storage device" may represent one or more memories for storing data, including read only memory (ROM), RAM, magnetic RAM, core memory, magnetic disk storage mediums, optical storage mediums, flash memory devices and/or other machine readable mediums for storing information. The term "machine-readable medium" includes, but is not limited to, portable or fixed storage devices, optical storage devices, wireless channels, and/or various other storage mediums capable of storing instruction(s) and/or data.
Furthermore, embodiments may be implemented by hardware, software, scripting languages, firmware, middleware, microcode, hardware description languages, and/or any combination thereof. When implemented in software, firmware, middleware, scripting language, and/or microcode, program code or code segments to perform tasks may be stored in a machine readable medium such as a storage medium. A code segment (e.g., code module 510) or machine-executable instruction may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a script, a class, or a combination of instructions, data structures, and/or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, and/or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted by suitable means including memory sharing, message passing, token passing, network transmission, etc.
Implementation of the techniques, blocks, steps and means described above may be done in various ways. For example, these techniques, blocks, steps and means may be implemented in hardware, software, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more ASICs, DSPs, DSPDs, PLDs, FPGAs, processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described above, and/or a combination thereof.
Each code module 510 may comprise sets of instructions (codes) embodied on a computer-readable medium that directs a processor of a computing device 500 to perform corresponding actions. The instructions may be configured to run in sequential order, in parallel (such as under different processing threads) , or in a combination thereof. After loading a code module 510 on a general purpose computer system, the general purpose computer is transformed into a special purpose computer system.
Computer programs incorporating various features described herein (e.g., in one or more code modules 510) may be encoded and stored on various computer readable storage media. Computer readable media encoded with the program code may be packaged with a compatible electronic device, or the program code may be provided separately from electronic devices (e.g., via Internet download or as a separately packaged computer readable storage medium). Storage subsystem 504 can also store information useful for establishing network connections using the communication interface 508.
User interface 506 can include input devices (e.g., touch pad, touch screen, scroll wheel, click wheel, dial, button, switch, keypad, microphone, etc.), as well as output devices (e.g., video screen, indicator lights, speakers, headphone jacks, virtual- or augmented-reality display, etc.), together with supporting electronics (e.g., digital to analog or analog to digital converters, signal processors, etc.). A user can operate input devices of user interface 506 to invoke the functionality of computing device 500 and can view and/or hear output from computing device 500 via output devices of user interface 506. For some embodiments, the user interface 506 might not be present (e.g., for a process using an ASIC).
Processing subsystem 502 can be implemented as one or more processors (e.g., integrated circuits, one or more single core or multi core microprocessors, microcontrollers, central processing unit, graphics processing unit, etc.). In operation, processing subsystem 502 can control the operation of computing device 500. In some embodiments, processing subsystem 502 can execute a variety of programs in response to program code and can maintain multiple concurrently executing programs or processes. At a given time, some or all of a program code to be executed can reside in processing subsystem 502 and/or in storage media, such as storage subsystem 504. Through programming, processing subsystem 502 can provide various functionality for computing device 500. Processing subsystem 502 can also execute other programs to control other functions of computing device 500, including programs that may be stored in storage subsystem 504.
Communication interface 508 can provide voice and/or data communication capability for computing device 500. In some embodiments, communication interface 508 can include radio frequency (RF) transceiver components for accessing wireless data networks (e.g., Wi-Fi network; 3G, 4G/LTE; etc.), mobile communication technologies, components for short range wireless communication (e.g., using Bluetooth communication standards, NFC, etc.), other components, or combinations of technologies. In some embodiments, communication interface 508 can provide wired connectivity (e.g., universal serial bus, Ethernet, universal asynchronous receiver/transmitter, etc.) in addition to, or in lieu of, a wireless interface. Communication interface 508 can be implemented using a combination of hardware (e.g., driver circuits, antennas, modulators/demodulators, encoders/decoders, and other analog and/or digital signal processing circuits) and software components. In some embodiments, communication interface 508 can support multiple communication channels concurrently. In some embodiments the communication interface 508 is not used.
It will be appreciated that computing device 500 is illustrative and that variations and modifications are possible. A computing device can have various functionality not specifically described (e.g., voice communication via cellular telephone networks) and can include components appropriate to such functionality.
Further, while the computing device 500 is described with reference to particular blocks, it is to be understood that these blocks are defined for convenience of description and are not intended to imply a particular physical arrangement of component parts. For example, the processing subsystem 502, the storage subsystem, the user interface 506, and/or the communication interface 508 can be in one device or distributed among multiple devices.
Further, the blocks need not correspond to physically distinct components. Blocks can be configured to perform various operations, e.g., by programming a processor or providing appropriate control circuitry, and various blocks might or might not be reconfigurable depending on how an initial configuration is obtained. Embodiments can be realized in a variety of apparatus including electronic devices implemented using a combination of circuitry and software. Electronic devices described herein can be implemented using computing device 500.
Various features described herein, e.g., methods, apparatus, computer readable media and the like, can be realized using a combination of dedicated components, programmable processors, and/or other programmable devices. Processes described herein can be implemented on the same processor or different processors. Where components are described as being configured to perform certain operations, such configuration can be accomplished, e.g., by designing electronic circuits to perform the operation, by programming programmable electronic circuits (such as microprocessors) to perform the operation, or a combination thereof. Further, while the embodiments described above may make reference to specific hardware and software components, those skilled in the art will appreciate that different combinations of hardware and/or software components may also be used and that particular operations described as being implemented in hardware might be implemented in software or vice versa.
Specific details are given in the above description to provide an understanding of the embodiments. However, it is understood that the embodiments may be practiced without these specific details. In some instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
While the principles of the disclosure have been described above in connection with specific apparatus and methods, it is to be understood that this description is made only by way of example and not as limitation on the scope of the disclosure. Embodiments were chosen and described in order to explain the principles of the invention and practical applications to enable others skilled in the art to utilize the invention in various embodiments and with various modifications, as are suited to a particular use contemplated. It will be appreciated that the description is intended to cover modifications and equivalents.
Also, it is noted that the embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed, but could have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
A recitation of "a" , "an" , or "the" is intended to mean "one or more" unless specifically indicated to the contrary. Conditional language used herein, such as, among others, "can, " "could, " "might, " "may, " "e.g., " and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain examples include, while other examples do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without author input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular example. The terms "comprising, " "including, " "having, " and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term "or" is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term "or" means one, some, or all of the elements in the list. The use of "adapted to" or "configured to" herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of "based on" is meant to be open and inclusive, in that a process, step, calculation, or other action "based on" one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Similarly, the use of "based at least in part on" is meant to be open and inclusive, in that a process, step, calculation, or other action "based at least in part on" one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.

Claims (20)

  1. A method of managing feature spatial distribution for localization and mapping, the method comprising:
    capturing a two-dimensional (2D) image of a three-dimensional (3D) space, wherein the 2D image includes a plurality of feature points that have been previously detected and tracked;
    selecting, from the plurality of feature points, a first subset of feature points that are evenly distributed in the 2D image;
    retrieving, from a point cloud, coordinates of the first subset of feature points, wherein the point cloud includes previously determined coordinates that define locations of at least the first subset of feature points in a global coordinate map of the 3D space;
    selecting, from the first subset of feature points and based at least in part on the retrieved coordinates, a second subset of feature points that are evenly distributed in the 3D space;
    optimizing the coordinates of the second subset of feature points based at least in part on a camera pose from which the 2D image is captured, wherein the camera pose is determined based on data received from an inertial measurement unit (IMU) configured to track the camera pose in the 3D space; and
    updating the coordinates of the second subset of feature points in the point cloud with the optimized coordinates of the second subset of feature points.
  2. The method of claim 1, wherein the plurality of feature points is a first plurality of feature points, the method further comprising:
    detecting a second plurality of feature points in the 2D image; and
    selecting, from the second plurality of feature points, a third subset of feature points such that the first subset of feature points and the third subset of feature points combined are evenly distributed in the 2D image.
  3. The method of claim 2, wherein the third subset of feature points are selected such that a number of the first subset of feature points and the third subset of feature points combined is within a predetermined range.
  4. The method of claim 2, further comprising:
    tracking the third subset of feature points in one or more subsequently captured 2D images;
    determining coordinates of the third subset of feature points using triangulation; and
    adding, to the point cloud, the third subset of feature points and the corresponding coordinates of the third subset of feature points.
  5. The method of claim 4, further comprising:
    capturing another 2D image of the 3D space, wherein the another 2D image includes at least a portion of the second subset of feature points or a portion of the third subset of feature points;
    selecting, from at least the portion of the second subset of feature points or the portion of the third subset of feature points combined, a fourth subset of feature points that are evenly distributed in the another 2D image;
    retrieving, from the point cloud, the coordinates of the fourth subset of feature points;
    selecting, from the fourth subset of feature points, a fifth subset of feature points such that the fifth subset of feature points are evenly distributed in the 3D space; and
    optimizing the coordinates of the fifth subset of feature points based at least in part on a camera pose from which the another 2D image is captured.
  6. The method of claim 1, wherein the plurality of feature points are detected based on at least one of color or color intensity of one or more adjacent pixels representing each of the plurality of feature points and surrounding pixels in one or more previously captured 2D images.
  7. The method of claim 1, further comprising dividing the 2D image into a plurality of areas that have an equal size, wherein the first subset of feature points are evenly distributed in the 2D image in that the selected feature points are distributed among different areas of the plurality of areas.
  8. The method of claim 1, further comprising dividing a space in the global coordinate map that corresponds to the 3D space into a plurality of volumes that have an equal size, wherein the second subset of feature points are evenly distributed in the 3D space in that the selected feature points are distributed among different volumes of the plurality of volumes.
  9. An electronic device for managing feature spatial distribution for localization and mapping, the electronic device comprising:
    a camera;
    one or more processors; and
    a memory having instructions that, when executed by the one or more processors, cause the electronic device to perform the following operations:
    capturing, using the camera, a two-dimensional (2D) image of a three-dimensional (3D) space, wherein the 2D image includes a plurality of feature points that have been previously detected and tracked by the electronic device;
    selecting, from the plurality of feature points, a first subset of feature points that are evenly distributed in the 2D image;
    retrieving, from a point cloud, coordinates of the first subset of feature points, wherein the point cloud includes previously determined coordinates that define locations of at least the first subset of feature points in a global coordinate map of the 3D space; and
    selecting, from the first subset of feature points and based at least in part on the retrieved coordinates, a second subset of feature points that are evenly distributed in the 3D space.
  10. The electronic device of claim 9, wherein the instructions, when executed by the one or more processors, further cause the electronic device to perform the following operations:
    optimizing the coordinates of the second subset of feature points based at least in part on a pose of the camera from which the 2D image is captured, wherein:
    the pose of the camera is determined based on data received from a sensor unit of the electronic device;
    the sensor unit comprises an inertial measurement unit (IMU) configured to track the camera pose in the 3D space; and
    updating the coordinates of the second subset of feature points in the point cloud with the optimized coordinates of the second subset of feature points.
  11. The electronic device of claim 9, wherein the plurality of feature points is a first plurality of feature points, wherein the instructions, when executed by the one or more processors, further cause the electronic device to perform the following operations:
    detecting a second plurality of feature points in the 2D image; and
    selecting, from the second plurality of feature points, a third subset of feature points such that the first subset of feature points and the third subset of feature points combined are evenly distributed in the 2D image.
  12. The electronic device of claim 11, wherein the instructions, when executed by the one or more processors, further cause the electronic device to perform the following operations:
    tracking the third subset of feature points in one or more subsequently captured 2D images;
    determining coordinates of the third subset of feature points using triangulation; and
    adding, to the point cloud, the third subset of feature points and the corresponding coordinates of the third subset of feature points.
  13. The electronic device of claim 9, wherein the instructions, when executed by the one or more processors, further cause the electronic device to perform the following operation:
    dividing the 2D image into a plurality of areas that have an equal size, wherein the first subset of feature points are evenly distributed in the 2D image in that the selected feature points are distributed among different areas of the plurality of areas.
  14. The electronic device of claim 9, wherein the instructions, when executed by the one or more processors, further cause the electronic device to perform the following operation:
    dividing a space in the global coordinate map that corresponds to the 3D space into a plurality of volumes that have an equal size, wherein the second subset of feature points are evenly distributed in the 3D space in that the selected feature points are distributed among different volumes of the plurality of volumes.
  15. A non-transitory machine readable medium having instructions for managing feature spatial distribution for localization and mapping using an electronic device having a camera and one or more processors, wherein the instructions are executable by the one or more processors to cause the electronic device to perform the following operations:
    capturing, using the camera, a two-dimensional (2D) image of a three-dimensional (3D) space, wherein the 2D image includes a plurality of feature points that have been previously detected and tracked by the electronic device;
    selecting, from the plurality of feature points, a first subset of feature points that are evenly distributed in the 2D image;
    retrieving, from a point cloud, coordinates of the first subset of feature points, wherein the point cloud includes previously determined coordinates that define locations of at least the first subset of feature points in a global coordinate map of the 3D space; and
    selecting, from the first subset of feature points and based at least in part on the retrieved coordinates, a second subset of feature points that are evenly distributed in the 3D space.
  16. The non-transitory machine readable medium of claim 15, wherein the instructions are executable by the one or more processors to further cause the electronic device to perform the following operations:
    optimizing the coordinates of the second subset of feature points based at least in part on a pose of the camera from which the 2D image is captured, wherein:
    the pose of the camera is determined based on data received from a sensor unit of the electronic device;
    the sensor unit comprises an inertial measurement unit (IMU) configured to track the camera pose in the 3D space; and
    updating the coordinates of the second subset of feature points in the point cloud with the optimized coordinates of the second subset of feature points.
  17. The non-transitory machine readable medium of claim 15, wherein the plurality of feature points is a first plurality of feature points, wherein the instructions are executable by the one or more processors to further cause the electronic device to perform the following operations:
    detecting a second plurality of feature points in the 2D image; and
    selecting, from the second plurality of feature points, a third subset of feature points such that the first subset of feature points and the third subset of feature points combined are evenly distributed in the 2D image.
  18. The non-transitory machine readable medium of claim 17, wherein the instructions are executable by the one or more processors to further cause the electronic device to perform the following operations:
    tracking the third subset of feature points in one or more subsequently captured 2D images;
    determining coordinates of the third subset of feature points using triangulation; and
    adding, to the point cloud, the third subset of feature points and the corresponding coordinates of the third subset of feature points.
  19. The non-transitory machine readable medium of claim 15, wherein the third subset of feature points are selected such that a number of the first subset of feature points and the third subset of feature points combined is within a predetermined range.
  20. The non-transitory machine readable medium of claim 15, wherein the instructions, when executed by the one or more processors, further cause the electronic device to perform the following operation:
    dividing the 2D image into a plurality of areas that have an equal size, wherein the first subset of feature points are evenly distributed in the 2D image in that the selected feature points are distributed among different areas of the plurality of areas.
PCT/CN2021/075851 2020-02-11 2021-02-07 Feature spatial distribution management for simultaneous localization and mapping WO2021160071A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202180011166.1A CN115004229A (en) 2020-02-11 2021-02-07 Feature spatial distribution management for simultaneous localization and mapping

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202062972863P 2020-02-11 2020-02-11
US62/972,863 2020-02-11

Publications (1)

Publication Number Publication Date
WO2021160071A1 true WO2021160071A1 (en) 2021-08-19

Family

ID=77292041

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/075851 WO2021160071A1 (en) 2020-02-11 2021-02-07 Feature spatial distribution management for simultaneous localization and mapping

Country Status (2)

Country Link
CN (1) CN115004229A (en)
WO (1) WO2021160071A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140241614A1 (en) * 2013-02-28 2014-08-28 Motorola Mobility Llc System for 2D/3D Spatial Feature Processing
US20170178355A1 (en) * 2015-12-17 2017-06-22 Stmicroelectronics Sa Determination of an ego-motion of a video apparatus in a slam type algorithm
CN108010123A (en) * 2017-11-23 2018-05-08 东南大学 A kind of three-dimensional point cloud acquisition methods for retaining topology information
CN108447094A (en) * 2018-03-20 2018-08-24 清华大学 A kind of the posture evaluation method and system of monocular color camera
US10354396B1 (en) * 2016-08-29 2019-07-16 Perceptln Shenzhen Limited Visual-inertial positional awareness for autonomous and non-autonomous device

Also Published As

Publication number Publication date
CN115004229A (en) 2022-09-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21753956

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21753956

Country of ref document: EP

Kind code of ref document: A1
