WO2021160095A1 - Surface detection and tracking in augmented reality session based on sparse representation - Google Patents


Info

Publication number
WO2021160095A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional representation
point
surface plane
points
image
Prior art date
Application number
PCT/CN2021/076047
Other languages
French (fr)
Inventor
Jiangshan TIAN
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority to CN202180011605.9A (published as CN115023743A)
Publication of WO2021160095A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/579 Depth or shape recovery from multiple images from motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30242 Counting objects in image

Definitions

  • Augmented Reality superimposes virtual content over a user’s view of the real world.
  • SDK: software development kit
  • An AR SDK typically provides six degrees-of-freedom (6DoF) tracking capability.
  • a user can scan the environment using a smartphone’s camera, and the smartphone performs visual inertial odometry (VIO) in real time. Once the camera pose is tracked continuously, virtual objects can be placed into the AR scene to create an illusion that real objects and virtual objects are merged together.
  • Presenting a virtual object in an AR scene may involve detecting a surface plane on which the virtual object is to be placed. Such a surface plane may be tracked over time to update the relative placement of the virtual object.
  • the present invention relates generally to methods and systems for detecting and tracking surface planes in AR sessions.
  • a method is implemented by a computer system in an augmented reality (AR) session.
  • the method includes determining points belonging to a surface plane, the surface plane detected based on a first multi-dimensional representation of a real-world environment, the first multi-dimensional representation generated in the AR session based on a first image of the real-world environment.
  • the method also includes determining a second multi-dimensional representation of the real-world environment, the second multi-dimensional representation generated in the AR session based on a second image of the real-world environment, the second image generated subsequent to the first image.
  • the method also includes determining a number of the points that are also included in the second multi-dimensional representation.
  • the method also includes comparing the number with a threshold, and tracking the surface plane in the AR session based on the second multi-dimensional representation upon determining that the number is larger than the threshold.
  • the first multi-dimensional representation includes a first point cloud corresponding to features detected in the real-world environment. Each point corresponds to a feature and is associated with a point identifier.
  • determining the number of the points includes: determining that a first point belonging to the surface plane is associated with a first point identifier, determining, based on the first point identifier, that the first point is included in a second point cloud corresponding to the second multi-dimensional representation, and incrementing the number of the points based on the determination that the first point is included in the second point cloud.
  • the method can further include detecting, in a first feature detection iteration, the features, where the features are based on the first image and on first inertial measurement unit (IMU) data, generating the first point cloud based on the detected features, and assigning, for each point in the first point cloud, a different point identifier.
  • the method can include detecting, in a second feature detection iteration, a first feature of the features, where the first feature is based on the second image and on second IMU data, detecting, in the second feature detection iteration and based on the second image and the second IMU data, a second feature that was not detected in the first feature detection iteration, generating a second point cloud, where the second point cloud includes a first point corresponding to the first feature and a second point corresponding to the second feature, and where the first point is also included in the first point cloud, maintaining a first point identifier assigned to the first point, and assigning a second point identifier to the second point.
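As an illustration of the point identifier bookkeeping described above, the following Python sketch shows one way identifiers could be maintained across feature detection iterations; it assumes the feature detection and tracking process reports stable feature identifiers, and the function and variable names are illustrative rather than taken from the patent.

```python
# Hypothetical sketch: keeping point identifiers stable across feature
# detection iterations. A re-detected feature keeps its point identifier;
# a newly detected feature receives a new one.

def build_point_cloud(detected_features, feature_to_point_id):
    """Return {point_id: (x, y, z)} for the current iteration.

    detected_features: iterable of (feature_id, (x, y, z)) tuples from the
        feature detection and tracking process (e.g., a SLAM process).
    feature_to_point_id: dict mapping feature_id -> point_id, updated in place
        so that re-detected features keep their existing point identifier.
    """
    point_cloud = {}
    for feature_id, xyz in detected_features:
        if feature_id not in feature_to_point_id:
            # A feature not seen in an earlier iteration gets a new point
            # identifier, here derived from its feature identifier.
            feature_to_point_id[feature_id] = f"pt-{feature_id}"
        point_cloud[feature_to_point_id[feature_id]] = xyz
    return point_cloud

# Usage: the same mapping is reused from one iteration to the next.
mapping = {}
first_cloud = build_point_cloud([("f1", (0.0, 0.0, 1.0)), ("f2", (0.1, 0.0, 1.0))], mapping)
second_cloud = build_point_cloud([("f2", (0.1, 0.0, 1.1)), ("f3", (0.2, 0.0, 1.0))], mapping)
# "pt-f2" appears in both clouds; "pt-f1" is absent from the second; "pt-f3" is new.
```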
  • the method further includes generating the first multi-dimensional representation based on an execution of a simultaneous localization and mapping (SLAM) process, wherein the first image is input to the SLAM process, and inputting the first multi-dimensional representation to a random sample consensus (RANSAC) process.
  • the surface plane is detected based on an execution of the RANSAC process using the first multi-dimensional representation.
  • the method can also include assigning a different point identifier to each point in the first multi-dimensional representation, determining that a first point in the first multi-dimensional representation belongs to the surface plane, and associating the surface plane with a first point identifier of the first point.
  • the determining the number of the points includes determining, based on the first point identifier, that the first point is included in the second multi-dimensional representation, and incrementing the number of the points based on the determination that the first point is included in the second multi-dimensional representation.
  • tracking the surface plane includes determining a first set of points from the points belonging to the surface plane, where the first set of points is present in the second multi-dimensional representation, and updating a plane function of the surface plane based on the first set of points.
  • tracking the surface plane further includes determining a second set of points that belongs to the surface plane and that is in the second multi-dimensional representation but not in the first multi-dimensional representation, and associating the second set of points with the surface plane.
  • a computer system includes one or more processors and one or more memories storing computer-readable instructions that, upon execution by the one or more processors, configure the computer system to perform operations.
  • the operations include determining points belonging to a surface plane, where the surface plane is detected based on a first multi-dimensional representation of a real-world environment, the first multi-dimensional representation generated in an augmented reality (AR) session based on a first image of the real-world environment.
  • the operations also include determining a second multi-dimensional representation of the real-world environment, where the second multi-dimensional representation is generated in the AR session based on a second image of the real-world environment, the second image generated subsequent to the first image.
  • the operations also include determining a number of the points that are also included in the second multi-dimensional representation.
  • the operations also include comparing the number with a threshold, and tracking the surface plane in the AR session based on the second multi-dimensional representation upon determining that the number is larger than the threshold.
  • the execution of the computer-readable instructions further configures the computer system to determine, from the second multi-dimensional representation, second points that belong to the surface plane, determine a third multi-dimensional representation of the real-world environment, where the third multi-dimensional representation is generated in the AR session based on a third image of the real-world environment, the third image generated subsequent to the first image, determine a second number of the second points that are also included in the third multi-dimensional representation, compare the second number with the threshold, and determine that the surface plane is no longer to be tracked in the AR session based on the second number being smaller than the threshold.
  • the execution of the computer-readable instructions further configures the computer system to determine that a total number of tracked surface planes based on comparisons with the threshold is smaller than a second threshold, and detect, based on the total number being smaller than the second threshold, a second surface plane by at least inputting the third multi-dimensional representation to a random sample consensus (RANSAC) process.
  • the second surface plane is detected based on a set of constraints to the RANSAC process, the set of constraints specifying a minimum number of points from the third multi-dimensional representation.
  • the set of constraints further specifies a maximum distance between a point belonging to the second surface plane and a camera of the computer system, where the camera is configured to generate the third image.
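The constraints described in the two items above could, for example, be expressed as a small configuration object checked during plane detection; a minimal sketch follows, and the field names and default values are assumptions for illustration only.

```python
from dataclasses import dataclass

# Hypothetical representation of the constraint set applied to the RANSAC
# process: a minimum number of member points and a maximum camera distance.
@dataclass
class PlaneDetectionConstraints:
    min_points: int = 5               # minimum points a candidate plane must contain
    max_camera_distance: float = 3.0  # maximum point-to-camera distance, in meters

def satisfies_constraints(inlier_count, point_camera_distances, c):
    """Return True when a candidate plane has enough inliers and all of its
    inliers are within the allowed distance of the camera."""
    return (inlier_count >= c.min_points and
            all(d <= c.max_camera_distance for d in point_camera_distances))
```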
  • one or more non-transitory computer-storage media storing instructions that, upon execution on a computer system, cause the computer system to perform operations.
  • the operations include determining points belonging to a surface plane, where the surface plane is detected based on a first multi-dimensional representation of a real-world environment, the first multi-dimensional representation generated in an augmented reality (AR) session based on a first image of the real-world environment.
  • the operations also include determining a second multi-dimensional representation of the real-world environment, where the second multi-dimensional representation is generated in the AR session based on a second image of the real-world environment, the second image generated subsequent to the first image.
  • the operations also include determining a number of the points that are also included in the second multi-dimensional representation.
  • the operations also include comparing the number with a threshold, and tracking the surface plane in the AR session based on the second multi-dimensional representation upon determining that the number is larger than the threshold.
  • the operations further include generating the first multi-dimensional representation based on an execution of a simultaneous localization and mapping (SLAM) process, wherein the first image is input to the SLAM process, and inputting the first multi-dimensional representation to a random sample consensus (RANSAC) process.
  • the surface plane is detected based on an execution of the RANSAC process using the first multi-dimensional representation.
  • the operations can further include assigning a different point identifier to each point in the first multi-dimensional representation, determining that a first point in the first multi-dimensional representation belongs to the surface plane, and associating the surface plane with a first point identifier of the first point.
  • determining the number of the points includes determining, based on the first point identifier, that the first point is included in the second multi-dimensional representation, and incrementing the number of the points based on the determination that the first point is included in the second multi-dimensional representation.
  • the threshold is set to a value equal to or larger than three.
  • embodiments of the present disclosure involve methods and systems for surface detection and tracking techniques, where these techniques provide substantial processing improvements over conventional techniques.
  • the latency associated with the detection and tracking of a surface plane can be reduced by a factor of two or more.
  • FIG. 1 illustrates an example of a computer system that includes a camera and an inertial measurement unit (IMU) sensor for AR applications, according to at least one embodiment of the disclosure;
  • FIG. 2 illustrates an example of tracking a surface plane, according to at least one embodiment of the disclosure
  • FIG. 3 illustrates an example of computing components for detecting and tracking a surface plane, according to at least one embodiment of the disclosure
  • FIG. 4 illustrates an example of an initial surface detection and tracking iteration in an AR session, according to at least one embodiment of the disclosure
  • FIG. 5 illustrates an example of a current surface detection and tracking iteration in an AR session, according to at least one embodiment of the disclosure
  • FIG. 6 illustrates an example of surface plane tracking based on point identifiers and associations with surface planes, according to at least one embodiment of the disclosure
  • FIG. 7 illustrates an example of a determination of whether a surface detection process is to be repeated, according to at least one embodiment of the disclosure
  • FIG. 8 illustrates an example of a flow for surface plane detection and tracking, according to at least one embodiment of the disclosure.
  • FIG. 9 illustrates examples of components of a computer system, according to at least one embodiment of the disclosure.
  • Embodiments of the present disclosure are directed to, among other things, detecting and tracking surface planes in AR sessions.
  • a computer system includes a camera, an IMU sensor, and an AR module.
  • the AR module hosts a feature detection and tracking process, a surface detection process, and a surface tracking process.
  • the camera generates an image of a real-world environment.
  • Image data of the image and IMU data generated by the IMU sensor are input to the feature detection and tracking process that, in turn, outputs a multi-dimensional representation of the real-world environment.
  • the multi-dimensional representation of the real-world environment is input to the surface detection process that, in turn, outputs data identifying a detected surface plane.
  • the data includes points from the multi-dimensional representation and belonging to the plane.
  • the tracking process associates the detected plane with identifiers of the points. Such identifiers are referred to herein as point identifiers.
  • a next image is generated and input along with the next IMU data to the feature detection and tracking process that, in turn, outputs a next multi-dimensional representation of the real-world environment.
  • the tracking process determines particular points that were previously determined to belong to the surface plane and that are present in the next multi-dimensional representation. If the number of such particular points exceeds a first predefined threshold, the tracking system updates a plane function of the surface plane based on the next multi-dimensional representation. Otherwise, the surface tracking process outputs a decision to no longer track the surface plane.
  • the total number of tracked surface planes is updated (e.g., reduced by at least one) and compared to a second predefined threshold. Only if smaller than the second predefined threshold, the next multi-dimensional representation is input to the surface detection process for a new detection of one or more surface planes.
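The per-iteration logic summarized in the two items above can be pictured with the following Python sketch; the thresholds, helper callables, and names are placeholders chosen for illustration, not values mandated by the patent.

```python
# Hypothetical per-iteration tracking decision, mirroring the summary above.
FIRST_THRESHOLD = 5    # minimum matched points for a plane to keep being tracked
SECOND_THRESHOLD = 3   # minimum count of tracked planes before detection is re-run

def tracking_iteration(planes, current_point_ids, detect_planes, update_plane):
    """planes: dict plane_id -> set of point identifiers belonging to that plane.
    current_point_ids: point identifiers present in the next multi-dimensional
        representation. detect_planes and update_plane stand in for the surface
        detection process and the plane-function update, respectively."""
    tracked = {}
    for plane_id, plane_points in planes.items():
        matched = plane_points & current_point_ids
        if len(matched) > FIRST_THRESHOLD:
            update_plane(plane_id, matched)   # plane continues to be tracked
            tracked[plane_id] = matched
        # otherwise the plane is dropped from tracking for this iteration
    if len(tracked) < SECOND_THRESHOLD:
        tracked = detect_planes()             # re-run the surface detection process
    return tracked
```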
  • SLAM: simultaneous localization and mapping
  • FPS: frames per second
  • the SLAM process is an example of the feature detection and tracking process.
  • the processing of each input image corresponds to a surface detection and tracking iteration (e.g., the surface detection and tracking iterations occur at the same rate of twenty to thirty iterations per second (ITS)).
  • the SLAM process outputs an initial point cloud of the real-world environment.
  • the initial point cloud is an example of a sparse three-dimensional representation of the real-world environment and includes points corresponding to features of the open cabinet.
  • the number of points in the initial point cloud can be between a hundred and a thousand.
  • Each point is assigned a point identifier, which is a string that uniquely identifies the point.
  • the initial point cloud is input to a random sample consensus (RANSAC) process, which is an example of the surface detection process.
  • the RANSAC process outputs parameters defining four horizontal surface planes, each corresponding to one of the four shelves.
  • for each of the horizontal surface planes, the points belonging to the surface plane are identified and their point identifiers are associated with the surface plane.
  • the SLAM process outputs a next point cloud of the real-world environment.
  • the SLAM process tracks the features of the open cabinet. Some of the previously detected features are detected again. Some of the previously detected features are no longer detected. And new features that were not previously detected are currently detected. Hence, some of the points in the next point cloud correspond to previously detected features and have existing point identifiers. Remaining points of the next point cloud correspond to newly detected features and are assigned new point identifiers.
  • the surface tracking process determines the point identifiers associated with the horizontal surface plane, compares these point identifiers to the point identifiers of the points from the next point cloud, and determines matches. If the number of matched point identifiers (e.g., matched points that are points belonging to the horizontal surface plane as determined from the initial surface detection and tracking iteration and included in the next point cloud) exceeds the first predefined threshold (e.g., five) , the surface tracking process updates the plane function of the horizontal surface plane using the matched points. New point identifiers of points belonging to the horizontal surface plane are associated with the horizontal surface plane.
  • otherwise, the horizontal surface plane is no longer tracked.
  • if the total number of tracked surface planes falls below the second predefined threshold (e.g., three), the next point cloud is input to the RANSAC process for a new detection of surface planes. The surface detection and tracking iterations are repeated, where the input and output of each iteration depends on whether surface planes continue to be tracked using the point identifiers or whether the RANSAC process is to be executed again.
  • Embodiments of the present disclosure provide many technical advantages over conventional techniques that detect and track surfaces.
  • by re-using a multi-dimensional representation of the real-world environment, especially a sparse representation such as a point cloud, the processing to detect surface planes is significantly reduced.
  • by using point identifiers to intelligently determine when to execute the surface detection process and avoid its execution when unnecessary, additional processing savings are possible.
  • the latency associated with the detection and tracking of surface planes can be reduced by a factor of two or more. A reduction by at least five times has been observed.
  • FIG. 1 illustrates an example of a computer system 110 that includes a camera 112 and an inertial measurement unit (IMU) sensor 114 for AR applications, according to at least one embodiment of the disclosure.
  • the AR applications can be implemented by an AR module 116 of the computer system 110.
  • the camera 112 generates images of a real-world environment that includes, for instance, a real-world object 130.
  • the camera 112 can also include a depth sensor that generates depth data about the real-world environment, where this data includes, for instance, a depth map that shows depth(s) of the real-world object 130 (e.g., distance(s) between the depth sensor and the real-world object 130).
  • the IMU sensor 114 can include a gyroscope and an accelerometer, among other components, and can output IMU data including, for instance, an orientation of the computer system 110.
  • Image data of the images generated by the camera 112 in an AR session and the IMU data generated by the IMU sensor 114 in the AR session can be used to determine a 6DoF pose (e.g., position along the X, Y, and Z axes and rotation along each of such axes) of the computer system 110 relative to the real-world environment.
  • the AR module 116 renders an AR scene 120 of the real-world environment in the AR session, where this AR scene 120 can be presented at a graphical user interface (GUI) on a display of the computer system 110.
  • the AR scene 120 shows a real-world object representation 122 of the real-world object 130.
  • the AR scene 120 shows a virtual object 124 not present in the real-world environment.
  • the AR module 116 can detect a surface plane 126 corresponding to the real-world object representation 122 and position the virtual object relative to the surface plane 126 (e.g., place the virtual object 124 on the surface plane 126).
  • the surface plane 126 may be a horizontal plane, a vertical plane, or any other plane that may be at an angle and that may correspond to a visible surface of the real-world object 130.
  • the computer system 110 represents a suitable user device that includes, in addition to the camera 112 and the IMU sensor 114, one or more graphical processing units (GPUs), one or more general purpose processors (GPPs), and one or more memories storing computer-readable instructions that are executable by at least one of the processors to perform various functionalities of the embodiments of the present disclosure.
  • the computer system 110 can be any of a smartphone, a tablet, an AR headset, or a wearable AR device.
  • the AR module 116 can be implemented as specialized hardware and/or a combination of hardware and software (e.g., general purpose processor and computer-readable instructions stored in memory and executable by the general purpose processor).
  • the AR module 116 can detect features of the real-world environment, detect surface planes based on the detected features, and track the detected features and the detected surface planes to properly render the AR scene 120.
  • the AR module 116 implements a feature detection and tracking process, a surface detection process, and a surface tracking process as a set of program codes.
  • FIG. 2 illustrates an example of tracking a surface plane 202, according to at least one embodiment of the disclosure.
  • the surface plane 202 is tracked based on different images generated in an AR session.
  • a first image 210 shows the surface plane 202.
  • a second image 220 also shows the surface plane 202, except that the surface plane 202 is now partially occluded by another object.
  • in a third image 230, the surface plane 202 is fully or almost fully occluded by the other object.
  • the surface plane 202 can be tracked using different techniques.
  • a surface detection process can be executed for each of the images 210-230.
  • the surface detection process is repeated for each of the images 210-230.
  • this technique may necessitate a large amount of processing because the surface detection process is continuously repeated.
  • some of the processing may be wasteful because, for instance, the surface plane 202 is fully occluded in the third image 230 and need not be tracked.
  • in another technique, the surface detection process may be executed on the first image 210 to detect the surface plane 202. From that point, high-resolution representations of the real-world environment may be generated from the remaining images 220-230 to track the surface plane 202. Such high-resolution representations are typically not an output of a feature detection and tracking process (which, instead, outputs a low resolution representation). Accordingly, a large amount of processing may still be necessitated.
  • embodiments of the present disclosure allow the re-use of the low resolution representations of the feature detection and tracking process to detect the surface plane 202 and associate point identifiers therewith.
  • the point identifiers are used to track the surface plane. Accordingly, significant processing is saved in the detection and tracking of the surface plane 202.
  • FIG. 3 illustrates an example of computing components for detecting and tracking a surface plane, according to at least one embodiment of the disclosure.
  • the computing components may be implemented as program code in an AR module of a computer system, such as the AR module 116 of FIG. 1.
  • the computing components include a feature detection and tracking process 310, a surface detection process 320, and a surface tracking process 330.
  • the feature detection and tracking process 310 detects and tracks features of a real-world environment.
  • the surface detection process 320 detects surface planes.
  • the surface tracking process 330 tracks detected surface planes.
  • the input to the feature detection and tracking process 310 includes image data and IMU data.
  • the output of the feature detection and tracking process 310 includes a sparse multi-dimensional representation (e.g., a three-dimensional representation) of the real-world environment, such as a point cloud.
  • the feature detection and tracking process 310 is implemented as a SLAM process that involves a particle filter, an extended Kalman filter, a covariance intersection, and/or other SLAM algorithms.
  • the input to the surface detection process 320 includes the sparse multi-dimensional representation that is output from the feature detection and tracking process 310.
  • the output of the surface detection process 320 includes parameters of a detected surface plane.
  • the surface detection process 320 is implemented as a RANSAC process that involves an iterative estimation of such parameters.
  • the RANSAC process selects three points (each with x, y, and z coordinates) from the sparse multi-dimensional representation as a candidate surface plane, resolves the plane function equation (e.g., Ax + By + Cz + D = 0) to determine the A, B, C, and D parameters, and computes the distance of each of the remaining points of the sparse multi-dimensional representation to the candidate surface plane. Points that have a distance smaller than a predefined threshold distance are found to belong to the candidate surface plane. If the number of points belonging to the candidate surface plane exceeds a minimum number (which can be set as a constraint on the RANSAC process), such as five or some other value, the candidate surface plane is declared as a detected surface plane. Otherwise, the candidate surface plane is removed from the set of candidate surface planes and is not declared as a detected surface plane.
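The RANSAC procedure just described can be sketched as follows; this is an illustrative NumPy implementation of the general technique (three-point sampling, plane function Ax + By + Cz + D = 0, distance test), not the patent's exact code, and the parameter values are placeholders.

```python
import random
import numpy as np

def ransac_plane(points, iterations=100, dist_threshold=0.02, min_points=5):
    """Fit a plane Ax + By + Cz + D = 0 to an (N, 3) array of points.

    Returns ((A, B, C, D), inlier_indices) for the best candidate, or
    (None, None) when no candidate reaches the minimum number of points.
    """
    best_plane, best_inliers = None, np.array([], dtype=int)
    for _ in range(iterations):
        i, j, k = random.sample(range(len(points)), 3)   # three sampled points
        p1, p2, p3 = points[i], points[j], points[k]
        normal = np.cross(p2 - p1, p3 - p1)              # (A, B, C)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue                                     # degenerate (collinear) sample
        normal /= norm
        d = -normal.dot(p1)                              # D parameter
        distances = np.abs(points @ normal + d)          # point-to-plane distances
        inliers = np.flatnonzero(distances < dist_threshold)
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = (*normal, d), inliers
    if len(best_inliers) >= min_points:
        return best_plane, best_inliers
    return None, None
```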
  • the input to the surface tracking process 330 includes the output of the feature detection and tracking process 310 (e.g., the sparse multi-dimensional representation) and the output of the surface detection process 320 (e.g., the parameters of the detected surface planes).
  • the output of the surface tracking process 330 includes point identifiers of points from the sparse multi-dimensional representation, association of some of the point identifiers with detected surface planes, and a tracking decision indicating whether a detected surface plane is to be tracked using associated point identifiers or by performing the surface detection process again.
  • Example implementations of the surface tracking process 330 are further described in connection with the next figures.
  • FIG. 4 illustrates an example of an initial surface detection and tracking iteration 400 in an AR session, according to at least one embodiment of the disclosure.
  • the feature detection and tracking process 310, the surface detection process 320, and the surface tracking process 330 can be used.
  • an image of the real-world environment is generated and is input to the feature detection and tracking process 310 (this input is shown as image data 402 in FIG. 4).
  • IMU data 404 is also input to the feature detection and tracking process 310.
  • a sparse three-dimensional (3D) representation 412 of the real-world environment, such as a point cloud that includes between a hundred and a thousand points, is generated and output by the feature detection and tracking process 310. Each of the points corresponds to a detected feature.
  • the surface tracking process 330 (or another process of the AR session) assigns a point identifier to each point of the sparse 3D representation 412.
  • the point identifier can be the same as, or can be derived from, the identifier of the corresponding feature (referred to herein as a feature identifier), where the feature identifier is used by the feature detection and tracking process to track the detected features in subsequent iterations.
  • the sparse 3D representation 412 is input to the surface detection process 320 and to the surface tracking process 330.
  • the surface detection process 320 detects a surface plane (or multiple ones) based on the points of the sparse 3D representation 412 and outputs, for each detected surface plane, a plane function 422 (e.g., the A, B, C, and D parameters).
  • the plane function 422 is input to the surface tracking process 330.
  • the surface tracking process 330 determines the points that belong to the surface plane and that are included in the sparse 3D representation 412, determines the point identifiers of these points, and associates the point identifiers with the surface plane (illustrated in FIG. 4 as surface plane and point identifier association 432).
  • the point identifiers and the A, B, C, and D parameters are stored in a data structure, where the entries of the data structure indicate the association 432 (e.g., a table is used, where a row of the table indicates the association 432 by listing the point identifiers and the A, B, C, and D parameters).
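A minimal sketch of such a data structure follows, assuming string point identifiers; the concrete layout (a dictionary keyed by a plane label) is an illustrative choice rather than the patent's.

```python
# Hypothetical association 432: one entry per detected surface plane, holding
# its plane function (A, B, C, D) and the identifiers of its member points.
plane_associations = {
    "plane-0": {
        "plane_function": (0.0, 1.0, 0.0, -0.45),
        "point_ids": {"pt-12", "pt-31", "pt-58", "pt-73", "pt-90"},
    },
}
```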
  • FIG. 5 illustrates an example of a current surface detection and tracking iteration 500 in the AR session, according to at least one embodiment of the disclosure.
  • the current surface detection and tracking iteration 500 is an iteration that is subsequent to a previous surface detection and tracking iteration, such as the initial surface detection and tracking iteration 400.
  • the use of the feature detection and tracking process 310, the surface detection process 320, and the surface tracking process 330 can continue.
  • prior to the start of the current surface detection and tracking iteration 500, a surface plane (or, similarly, multiple ones) was detected in the previous surface detection and tracking iteration. Also prior to this start, an association already exists between the surface plane and point identifiers of points from a previous sparse 3D representation (e.g., the sparse 3D representation 412) and belonging to the surface plane.
  • the current surface detection and tracking iteration 500 may start with inputting image data 502 and IMU data 504 to the feature detection and tracking process 310 that, in turn, outputs a sparse 3D representation 512 of the real-world environment (e.g., a point cloud, where each point corresponds to a detected feature).
  • surface detection and tracking iterations may be performed at a particular ITS rate corresponding to an FPS rate.
  • the image data 502 and the IMU data are input at the FPS rate.
  • the sparse 3D representation 512 is input to the surface tracking process 330.
  • for previously detected features that are still detected, the surface tracking process 330 determines the corresponding points and their point identifiers (e.g., these points were present in the 3D representation 412 and are still present in the 3D representation 512).
  • for newly detected features, the surface tracking process 330 determines the corresponding new points in the sparse 3D representation 512 (e.g., these points were not present in the 3D representation 412 and are now present in the 3D representation 512) and assigns new point identifiers to these new points.
  • the new point identifiers are not associated with a surface plane yet.
  • the surface tracking process 330 proceeds to perform a surface plane tracking 532 using the point identifiers (from the previous and current surface detection and tracking iterations) and existing associations with surface planes (e.g., from the previous surface detection and tracking iteration), as further illustrated in the next figures.
  • FIG. 6 illustrates an example of surface plane tracking based on point identifiers and associations with surface planes, according to at least one embodiment of the disclosure.
  • the surface plane tracking can be performed by the surface tracking process 330 to track a surface plane (or, similarly, multiple surface planes) in a current surface detection and tracking iteration (e.g., the current surface detection and tracking iteration 500).
  • multiple inputs are provided to the surface tracking process 330.
  • a first input includes a current sparse 3D representation of the real-world environment in the current surface detection and tracking iteration (e.g., the sparse 3D representation 512 in the current surface detection and tracking iteration 500).
  • a second input includes a surface plane and point identifier association 620 from the previous surface detection and tracking iteration (e.g., the surface plane and point identifier association 432 from the initial surface detection and tracking iteration 400) indicating existing associations between the surface plane and point identifiers corresponding to points that are included in the previous sparse 3D representation from the previous surface detection and tracking iteration (e.g., the sparse 3D representation 412 from the initial surface detection and tracking iteration 400).
  • Points belonging to the previous sparse 3D representation and their point identifiers are referred to herein as previous points and previous point identifiers, respectively, in the interest of clarity of explanation.
  • a previous point identifier that corresponds to a previous point belonging to the surface plane is already associated with the surface plane, referred to herein as an existing association in the interest of clarity of explanation.
  • the surface tracking process 330 performs a point tracking 630 based on the inputs.
  • the point tracking 630 involves determining the number of previous points that belong to the surface plane and are included in the current sparse 3D representation (e.g., are also current points).
  • the point tracking 630 initializes a counter and uses point identifiers. The point tracking 630 determines whether a match exists between a previous point identifier and one of the current point identifiers, where the previous point identifier has an existing association with the surface plane (thereby indicating that the corresponding previous point was determined as belonging to the surface plane in the previous surface detection and tracking iteration). If the match exists, the point tracking 630 increments the counter by one.
  • the counter is not incremented and the point tracking 630 determines that the corresponding previous point is no longer present in the current sparse 3D representation.
  • the matching is repeated for the previous point identifiers having existing associations with the surface plane.
  • the value of the counter indicates the number of the previous points belonging to the surface plane and that are still present in the current 3D sparse representation (e.g., are also current points).
  • the surface tracking process 330 performs a comparison 640 of this number (e.g., the value of the counter) to a predefined threshold (e.g., set to a value of at least three; in a particular illustration, the predefined threshold is set to five). If the number is larger than the predefined threshold (where “larger than” here means greater than or equal to), the surface tracking process 330 determines that the surface plane is to be tracked in the current surface detection and tracking iteration and, accordingly, performs a surface plane update 650. Otherwise, the surface tracking process 330 determines that the surface plane need no longer be tracked in the current surface detection and tracking iteration and, accordingly, removes the surface plane from the tracking. In this case, the surface tracking process 330 determines whether the surface detection process is to be repeated, as further described in FIG. 7.
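A small sketch of the counter described for the point tracking 630, assuming point identifiers are plain strings held in sets; the threshold value of five follows the illustration above and the names are hypothetical.

```python
def count_matched_points(previous_plane_point_ids, current_point_ids):
    """Count previous points of a surface plane whose identifiers are still
    present in the current sparse 3D representation."""
    counter = 0
    for point_id in previous_plane_point_ids:
        if point_id in current_point_ids:
            counter += 1   # a match: the previous point is also a current point
    return counter

# Usage with illustrative identifiers; five matches meet a threshold of five.
previous_ids = {"pt-1", "pt-2", "pt-3", "pt-4", "pt-5", "pt-6"}
current_ids = {"pt-2", "pt-3", "pt-4", "pt-5", "pt-6", "pt-9"}
keep_tracking = count_matched_points(previous_ids, current_ids) >= 5  # True
```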
  • the surface plane update 650 can include updating the plane function of the surface plane.
  • the surface tracking process 330 performs multiple point determinations.
  • a first point determination relates to the previous points.
  • a previous point (or, similarly, multiple previous points) is no longer present in the current sparse 3D representation and, thus, no longer belongs to the surface plane.
  • the surface plane update 650 involves removing the existing association between the surface plane and the previous point identifier corresponding to the previous point (e.g., referring back to the data structure described in connection with FIG. 4, the previous point identifier is deleted from the data structure).
  • a second point determination relates to new points.
  • a new point is a current point (or, similarly, multiple current points) that was not present in the previous sparse 3D representation, but is included in the current sparse 3D representation.
  • the new point has a new point identifier, where the new point identifier is not associated with the surface plane yet.
  • the surface tracking process 330 determines whether the new point belongs to the surface plane (e.g., based on the distance of the new point to the surface plane being less than a predefined threshold). If so, the surface tracking process 330 generates a new association between the new point identifier and the surface plane (e.g., referring back to the data structure described in connection with FIG. 4, the new point identifier is added to the data structure).
  • the surface plane update 650 can involve updating the plane function of the surface plane (e.g., the A, B, C, and D parameters) based on the remaining, previous points that are determined to still belong to the surface plane. Additionally or alternatively, after the second point determination, the surface plane update 650 can involve updating the plane function based on the new points determined to belong to the surface plane and/or on all points (e.g., previous and new) determined to belong to the surface plane.
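The update just described (dropping stale associations, adding new points near the plane, and refitting the plane function) might look like the following sketch; the least-squares refit via SVD is one reasonable choice and not necessarily the patent's method, and the distance threshold is a placeholder.

```python
import numpy as np

def update_surface_plane(plane, current_points, dist_threshold=0.02):
    """plane: dict with 'plane_function' (A, B, C, D, with (A, B, C) unit length)
        and 'point_ids' (set of identifiers associated with the plane).
    current_points: dict point_id -> np.array([x, y, z]) for the current cloud."""
    a, b, c, d = plane["plane_function"]
    normal = np.array([a, b, c])
    # 1) Keep only previous points still present in the current representation.
    plane["point_ids"] = {pid for pid in plane["point_ids"] if pid in current_points}
    # 2) Associate new points whose distance to the plane is below the threshold.
    for point_id, xyz in current_points.items():
        if abs(normal.dot(xyz) + d) < dist_threshold:
            plane["point_ids"].add(point_id)
    if not plane["point_ids"]:
        return plane   # nothing left to fit; the caller may stop tracking the plane
    # 3) Refit (A, B, C, D) to the associated points by a least-squares plane fit.
    pts = np.array([current_points[pid] for pid in plane["point_ids"]])
    centroid = pts.mean(axis=0)
    _, _, vh = np.linalg.svd(pts - centroid)
    new_normal = vh[-1]                                 # direction of least variance
    plane["plane_function"] = (*new_normal, -float(new_normal.dot(centroid)))
    return plane
```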
  • FIG. 7 illustrates an example of a determination of whether a surface detection process is to be repeated, according to at least one embodiment of the disclosure.
  • the surface tracking process 330 can perform the determination of whether to repeat the surface detection process. If no surface planes are removed from the tracking in the current surface detection and tracking iteration, the surface tracking process 330 need not perform this determination. Generally, if the total number of surface planes still tracked in the current surface detection and tracking iteration falls below a predefined threshold, the surface detection process is repeated. Otherwise, the surface detection process is not repeated.
  • a first input includes the current sparse 3D representation 710 in the current surface detection and tracking iteration.
  • a second input includes the surface planes (e.g., their plane functions) detected from the previous surface detection and tracking iteration (shown in FIG. 7 as a previous surface plane 720).
  • based on the inputs, the surface tracking process 330 performs a surface plane tracking 730.
  • the surface plane tracking 730 involves maintaining a counter indicating the total number of the surface planes to be tracked in the current surface detection and tracking iteration.
  • the counter’s value is equal to the total number of surface planes tracked in the previous surface detection and tracking iteration.
  • An output of the determination performed in connection with FIG. 6 indicates whether each of the surface planes is to be removed from the tracking in the current surface detection and tracking iteration. For each of the surface planes that are to be removed, the surface plane tracking 730 decreases the counter by one. The resulting value of the counter indicates the total number of surface planes that are to be tracked in the current detection and tracking iteration.
  • the surface tracking process 330 performs a comparison 740 of this total number (e.g., the resulting value of the counter) with the predefined threshold.
  • This threshold can be set as a percentage (e.g., eighty percent) of the total number of surface planes detected by the surface detection process 320 the last time this process 320 was performed. If the total number is larger than the predefined threshold, the surface tracking process 330 determines that the surface detection process 320 need not be repeated. In this case, the surface tracking process 330 sets the previous surface planes that are to be tracked as current surface planes 750 (and these current surface planes 750 can be updated as described in connection with FIG. 6). Otherwise, the surface tracking process 330 can disregard the previous surface planes 720 and proceed to the triggering 760 of new surface detections by repeating the surface detection process 320. Accordingly, the 3D representation 710 can be input to the surface detection process 320.
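A sketch of the comparison 740 follows, with the eighty-percent figure from the example above used as a default; the function and parameter names are illustrative assumptions.

```python
def need_new_detection(tracked_count, last_detected_count, ratio=0.8):
    """Return True when the surface detection process should be repeated.

    tracked_count: surface planes still tracked in the current iteration.
    last_detected_count: planes output the last time the surface detection
        process was executed; the threshold is a percentage of that count.
    """
    threshold = ratio * last_detected_count
    return not (tracked_count > threshold)   # not larger -> repeat detection

# Usage: 3 of 5 previously detected planes remain; 3 is not larger than 4.0,
# so the current point cloud is input to the surface detection process again.
print(need_new_detection(3, 5))  # True
```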
  • the set of constraints includes a first constraint specifying a minimum number of points from the sparse 3D representation 710 (e.g., five points) that should belong to a candidate surface plane for the surface detection process 320 to set the candidate surface plane as a detected surface plane.
  • the set of constraints includes a second constraint specifying a maximum distance between a point belonging to the candidate surface plane and a camera of a computer system, where the camera generates the image corresponding to the sparse 3D representation 710 (e.g., the camera 112).
  • FIG. 8 illustrates an example of a flow for surface plane detection and tracking, according to at least one embodiment of the disclosure.
  • the flow is described in connection with a computer system that is an example of the computer system 110 of FIG. 1.
  • Some or all of the operations of the flows can be implemented via specific hardware on the computer system and/or can be implemented as computer-readable instructions stored on a non-transitory computer-readable medium of the computer system.
  • the computer-readable instructions represent programmable modules that include code executable by a processor of the computer system. The execution of such instructions configures the computer system to perform the respective operations.
  • Each programmable module in combination with the processor represents a means for performing a respective operation(s). While the operations are illustrated in a particular order, it should be understood that no particular order is necessary and that one or more operations may be omitted, skipped, performed in parallel, and/or reordered.
  • the flow starts at operation 802, where the computer system generates an initial multi-dimensional representation of a real-world environment in an AR session.
  • an image of the real-world environment is generated by a camera of the computer system.
  • Image data of the image and IMU data of an IMU sensor of the computer system are input to a feature detection and tracking process (e.g., a SLAM process) of an AR module of the computer system.
  • the feature detection and tracking process outputs a sparse 3D representation of the real-world environment, such as a point cloud, as an example of the initial multi-dimensional representation.
  • the flow includes operation 804, where the computer system assigns point identifiers. For instance, each point included in the initial multi-dimensional representation corresponds to a feature included in the real-world environment and detected by the feature detection and tracking process. A point corresponding to a feature is assigned a point identifier, where the point identifier is derived from a feature identifier of the feature.
  • the flow includes operation 806, where the computer system detects a surface plane (or, similarly, multiple surface planes).
  • for instance, the initial multi-dimensional representation is input to a surface detection process, such as a RANSAC process.
  • the output of this process indicates a plane function for the surface plane.
  • the flow includes operation 808, where the computer system associates the surface plane (or, similarly, multiple surface planes) with point identifiers. For instance, for each point belonging to the surface plane and included in the initial multi-dimensional representation, the computer system associates the corresponding point identifier with the surface plane.
  • Such associations can be stored in a data structure.
  • the flow includes operation 810, where the computer system generates a current multi-dimensional representation.
  • This operation is similar to operation 802, except that a new image and new IMU data are input to feature detection and tracking process and correspond to a current surface detection and tracking iteration.
  • the flow includes operation 812, where the computer system tracks point identifiers in the current multi-dimensional representation.
  • the tracking involves determining whether a feature from the previous (e.g., initial) multi-dimensional representation is no longer included in the current multi-dimensional representation (e.g., removed feature) or remains included in the current multi-dimensional representation (e.g., remaining feature) .
  • the tracking also involves determining whether a feature that was not included in the previous multi-dimensional representation is now included in the current multi-dimensional representation (e.g., a new feature).
  • Point identifiers of points corresponding to removed features are also removed.
  • Point identifiers of points corresponding to remaining features (e.g., remaining points) are maintained.
  • new point identifiers are assigned to points corresponding to new features (e.g., new points) .
  • the flow includes operation 814, where the computer system determines a number of points belonging to the surface plane (or, similarly, to each of the surface planes detected in the previous surface detection and tracking iteration) and included in the current multi-dimensional representation. For instance, point identifiers associated with the surface planes are matched to point identifiers corresponding to points of the current multi-dimensional representation. The total number of matches represents the number of points.
  • the flow includes operation 816, where the computer system compares the number of points to a first predefined threshold. If the number is larger than the first predefined threshold, operation 818 is performed. Otherwise, operation 820 is performed.
  • the flow includes operation 818, where the computer system updates the surface plane (or, similarly, multiple ones of the surface planes detected in the previous surface detection and tracking iteration as applicable) .
  • the computer system determines the points that belong to the surface plane and that are removed, points that belong to the surface plane and that are maintained, and new points that now belong to the surface plane. Associations with the surface plane are removed for point identifiers of the removed points that previously belonged to the surface plane. Associations with the surface plane are added for point identifiers of the new points belonging to the surface plane.
  • the flow includes operation 820, where the computer system stops tracking the surface plane in the current surface detection and tracking iteration. Accordingly, no surface plane update is performed.
  • the flow includes operation 822, where the computer system determines the total number of surface planes tracked in the current surface detection and tracking iteration. For instance, a counter representing this total number is decreased per determination that a surface plane is no longer to be tracked as indicated by operation 816.
  • the flow includes operation 824, where the computer system compares the total number to a second predefined threshold. If the number is larger than the second predefined threshold, the computer system continues tracking the surface planes based on point identifiers, as indicated by the loop back to operation 810. Otherwise, the computer system repeats the surface detection process as indicated by the loop back to operation 806.
  • FIG. 9 illustrates examples of components of a computer system 900, according to at least one embodiment of the disclosure.
  • the computer system 900 is an example of the computer system described herein above. Although these components are illustrated as belonging to a same computer system 900, the computer system 900 can also be distributed.
  • the computer system 900 includes at least a processor 902, a memory 904, a storage device 906, input/output peripherals (I/O) 908, communication peripherals 910, and an interface bus 912.
  • the interface bus 912 is configured to communicate, transmit, and transfer data, controls, and commands among the various components of the computer system 900.
  • the memory 904 and the storage device 906 include computer-readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM), hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage, for example memory, and other tangible storage media. Any of such computer-readable storage media can be configured to store instructions or program codes embodying aspects of the disclosure.
  • the memory 904 and the storage device 906 also include computer readable signal media.
  • a computer readable signal medium includes a propagated data signal with computer readable program code embodied therein. Such a propagated signal takes any of a variety of forms including, but not limited to, electromagnetic, optical, or any combination thereof.
  • a computer readable signal medium includes any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computer system 900.
  • the memory 904 includes an operating system, programs, and applications.
  • the processor 902 is configured to execute the stored instructions and includes, for example, a logical processing unit, a microprocessor, a digital signal processor, and other processors.
  • the memory 904 and/or the processor 902 can be virtualized and can be hosted within another computer system of, for example, a cloud network or a data center.
  • the I/O peripherals 908 include user interfaces, such as a keyboard, screen (e.g., a touch screen), microphone, speaker, other input/output devices, and computing components, such as graphical processing units, serial ports, parallel ports, universal serial buses, and other input/output peripherals.
  • the I/O peripherals 908 are connected to the processor 902 through any of the ports coupled to the interface bus 912.
  • the communication peripherals 910 are configured to facilitate communication between the computer system 900 and other computer systems over a communications network and include, for example, a network interface controller, modem, wireless and wired interface cards, antenna, and other communication peripherals.
  • a computer system can include any suitable arrangement of components that provide a result conditioned on one or more inputs.
  • Suitable computer systems include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computer system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computer system.
  • Embodiments of the methods disclosed herein may be performed in the operation of such computer systems.
  • the order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
  • use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited.
  • use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.

Abstract

Techniques for detecting and tracking surface planes in AR sessions are described. In an example, a computer system generates a sparse 3D representation of a real-world environment in an AR session and performs a surface detection process based on the sparse 3D representation. Points from the sparse 3D representation are tracked in a next sparse 3D representation. The number of such points that also belong to a detected surface plane is determined. If the number of points exceeds a predefined threshold, the surface plane is tracked based on such points. Otherwise, the surface plane is no longer tracked and the next sparse 3D representation can be input to the surface detection process for a new surface detection in the AR session.

Description

[Title established by the ISA under Rule 37.2] SURFACE DETECTION AND TRACKING IN AUGMENTED REALITY SESSION BASED ON SPARSE REPRESENTATION

BACKGROUND OF THE INVENTION
Augmented Reality (AR) superimposes virtual content over a user’s view of the real world. With the development of AR software development kits (SDK), the mobile industry has brought smartphone AR to the mainstream. An AR SDK typically provides six degrees-of-freedom (6DoF) tracking capability. A user can scan the environment using a smartphone’s camera, and the smartphone performs visual inertial odometry (VIO) in real time. Once the camera pose is tracked continuously, virtual objects can be placed into the AR scene to create an illusion that real objects and virtual objects are merged together.
Presenting a virtual object in an AR scene may involve detecting a surface plane on which the virtual object is to be placed. Such a surface plane may be tracked over time to update the relative placement of the virtual object.
SUMMARY OF THE INVENTION
The present invention relates generally to methods and systems for detecting and tracking surface planes in AR sessions.
In an example, a method is implemented by a computer system in an augmented reality (AR) session. The method includes determining points belonging to a surface plane, the surface plane detected based on a first multi-dimensional representation of a real-world environment, the first multi-dimensional representation generated in the AR session based on a first image of the real-world environment. The method also includes determining a second multi-dimensional representation of the real-world environment, the second multi-dimensional representation generated in the AR session based on a second image of the real-world environment, the second image generated subsequent to the first image. The method also includes determining a number of the points that are also included in the second multi-dimensional representation. The method also includes comparing the number with a threshold, and tracking the surface plane in the AR session based on the second multi-dimensional representation upon determining that the number is larger than the threshold.
In an example, the first multi-dimensional representation includes a first point cloud corresponding to features detected in the real-world environment. Each point corresponds to a feature and is associated with a point identifier. In this example, determining the number of the points includes: determining that a first point belonging to the surface plane is associated with a first point identifier, determining, based on the first point identifier, that the first point is included in a second point cloud corresponding to the second multi-dimensional representation, and incrementing the number of the points based on the determination that the first point is included in the second point cloud. In this example, the method can further include detecting, in a first feature detection iteration, the features, where the features are based on the first image and on first inertial measurement unit (IMU) data, generating the first point cloud based on the detected features, and assigning, for each point in the first point cloud, a different point identifier. In addition, the method can include detecting, in a second feature detection iteration, a first feature of the features, where the first feature is based on the second image and on second IMU data, detecting, in the second feature detection iteration and based on the second image and the second IMU data, a second feature that was not detected in the first feature detection iteration, generating a second point cloud, where the second point cloud includes a first point corresponding to the first feature and a second point corresponding to the second feature, and where the first point is also included in the first point cloud, maintaining a first point identifier assigned to the first point, and assigning a second point identifier to the second point.
In an example, the method further includes generating the first multi-dimensional representation based on an execution of a simultaneous localization and mapping (SLAM) process, wherein the first image is input to the SLAM process, and inputting the first multi-dimensional representation to a random sample consensus (RANSAC) process. The surface plane is detected based on an execution of the RANSAC process  using the first multi-dimensional representation. In this example, the method can also include assigning a different point identifier to each point in the first multi-dimensional representation, determining that a first point in the first multi-dimensional representation belongs to the surface plane, and associating the surface plane with a first point identifier of the first point. Further, the determining the number of the points includes determining, based on the first point identifier, that the first point is included in the second multi-dimensional representation, and incrementing the number of the points based on the determination that the first point is included in the second multi-dimensional representation.
In an example, tracking the surface plane includes determining a first set of points from the points belonging to the surface plane, where the first set of points is present in the second multi-dimensional representation, and updating a plane function of the surface plane based on the first set of points. In this example, tracking the surface plane further includes determining a second set of points that belongs to the surface plane and that is in the second multi-dimensional representation but not in the first multi-dimensional representation, and associating the second set of points with the surface plane.
In an example, a computer system includes one or more processors and one or more memories storing computer-readable instructions that, upon execution by the one or more processors, configure the computer system to perform operations. The operations include determining points belonging to a surface plane, where the surface plane is detected based on a first multi-dimensional representation of a real-world environment, the first multi-dimensional representation generated in an augmented reality (AR) session based on a first image of the real-world environment. The operations also include determining a second multi-dimensional representation of the real-world environment, where the second multi-dimensional representation is generated in the AR session based on a second image of the real-world environment, the second image generated subsequent to the first image. The operations also include determining a number of the points that are also included in the second multi-dimensional representation. The operations also include comparing the number with a threshold, and tracking the surface plane in the AR session based on the second multi-dimensional representation upon determining that the number is larger than the threshold.
In an example, the execution of the computer-readable instructions further configures the computer system to determine, from the second multi-dimensional representation, second points that belong to the surface plane, determine a third multi-dimensional representation of the real-world environment, where the third multi-dimensional representation is generated in the AR session based on a third image of the real-world environment, the third image generated subsequent to the first image, determine a second number of the second points that are also included in the third multi-dimensional representation, compare the second number with the threshold, and determine that the surface plane is no longer to be tracked in the AR session based on the second number being smaller than the threshold. In this example, the execution of the computer-readable instructions further configures the computer system to determine that a total number of tracked surface planes based on comparisons with the threshold is smaller than a second threshold, and detect, based on the total number being smaller than the second threshold, a second surface plane by at least inputting the third multi-dimensional representation to a random sample consensus (RANSAC) process. Further, the second surface plane is detected based on a set of constraints to the RANSAC process, the set of constraints specifying a minimum number of points from the third multi-dimensional representation. In addition, the set of constraints further specifies a maximum distance between a point belonging to the second surface plane and a camera of the computer system, where the camera is configured to generate the third image.
In an example, one or more non-transitory computer-storage media storing instructions that, upon execution on a computer system, cause the computer system to perform operations. The operations include determining points belonging to a surface plane, where the surface plane is detected based on a first multi-dimensional representation of a real-world environment, the first multi-dimensional representation generated in an augmented reality (AR) session based on a first image of the real-world environment. The operations also include determining a second multi-dimensional representation of the real-world environment, where the second multi-dimensional representation is generated in the AR session based on a second image of the real-world environment, the second image generated subsequent to the first image. The operations also  include determining a number of the points that are also included in the second multi-dimensional representation. The operations also include comparing the number with a threshold, and tracking the surface plane in the AR session based on the second multi-dimensional representation upon determining that the number is larger than the threshold.
In an example, the operations further include generating the first multi-dimensional representation based on an execution of a simultaneous localization and mapping (SLAM) process, wherein the first image is input to the SLAM process, and inputting the first multi-dimensional representation to a random sample consensus (RANSAC) process. The surface plane is detected based on an execution of the RANSAC process using the first multi-dimensional representation. In this example, the operations can further include assigning a different point identifier to each point in the first multi-dimensional representation, determining that a first point in the first multi-dimensional representation belongs to the surface plane, and associating the surface plane with a first point identifier of the first point. Further, determining the number of the points includes determining, based on the first point identifier, that the first point is included in the second multi-dimensional representation, and incrementing the number of the points based on the determination that the first point is included in the second multi-dimensional representation.
In an example, the threshold is set to a value equal to or larger than three.
Numerous benefits are achieved by way of the present invention over conventional techniques. For example, embodiments of the present disclosure involve methods and systems for surface detection and tracking techniques, where these techniques provide substantial processing improvements over conventional techniques. Depending on the configuration of the computer system implementing the techniques of the present disclosure, the latency associated with the detection and tracking of a surface plane can be reduced by a factor of two or more.
BRIEF DESCRIPTION OF THE DRAWINGS
Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:
FIG. 1 illustrates an example of a computer system that includes a camera and an inertial measurement unit (IMU) sensor for AR applications, according to at least one embodiment of the disclosure;
FIG. 2 illustrates an example of tracking a surface plane, according to at least one embodiment of the disclosure;
FIG. 3 illustrates an example of computing components for detecting and tracking a surface plane, according to at least one embodiment of the disclosure;
FIG. 4 illustrates an example of an initial surface detection and tracking iteration in an AR session, according to at least one embodiment of the disclosure;
FIG. 5 illustrates an example of a current surface detection and tracking iteration in an AR session, according to at least one embodiment of the disclosure;
FIG. 6 illustrates an example of surface plane tracking based on point identifiers and associations with surface planes, according to at least one embodiment of the disclosure;
FIG. 7 illustrates an example of a determination of whether a surface detection process is to be repeated, according to at least one embodiment of the disclosure;
FIG. 8 illustrates an example of a flow for surface plane detection and tracking, according to at least one embodiment of the disclosure; and
FIG. 9 illustrates examples of components of a computer system, according to at least one embodiment of the disclosure.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
In the following description, various embodiments will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the embodiments. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the embodiment being described.
Embodiments of the present disclosure are directed to, among other things, detecting and tracking surface planes in AR sessions. In an example, a computer system includes a camera, an IMU sensor, and an AR module. The AR module hosts a feature detection and tracking process, a surface detection process, and a surface tracking process. During an AR session, the camera generates an image of a real-world environment. Image data of the image and IMU data generated by the IMU sensor are input to the feature detection and tracking process that, in turn, outputs a multi-dimensional representation of the real-world environment. The multi-dimensional representation of the real-world environment is input to the surface detection process that, in turn, outputs data identifying a detected surface plane. The data includes points from the multi-dimensional representation and belonging to the plane. The tracking process associates the detected plane with identifiers of the points. Such identifiers are referred to herein as point identifiers. A next image is generated and input along with the next IMU data to the feature detection and tracking process that, in turn, outputs a next multi-dimensional representation of the real-world environment. By using the point identifiers, the tracking process determines particular points that were previously determined to belong to the surface plane and that are present in the next multi-dimensional representation. If the number of such particular points exceeds a first predefined threshold, the surface tracking process updates a plane function of the surface plane based on the next multi-dimensional representation. Otherwise, the surface tracking process outputs a decision to no longer track the surface plane. In this case, the total number of tracked surface planes is updated (e.g., reduced by at least one) and compared to a second predefined threshold. Only if the total number is smaller than the second predefined threshold is the next multi-dimensional representation input to the surface detection process for a new detection of one or more surface planes.
To illustrate, consider an example of a real-world environment that includes an open cabinet with four horizontal shelves. In an AR session executing on a smartphone, different virtual objects representing books, vases, and/or other items are to be placed on the shelves. Images generated by a camera in the AR session may be input to a simultaneous localization and mapping (SLAM) process at a frame rate between twenty and thirty frames per second (FPS) . The SLAM process is an example of the feature detection and tracking process. The processing of each input image corresponds to a surface detection and tracking iteration (e.g., the surface detection and tracking iteration has the same rate of twenty to thirty iterations per second (ITS) ) . In the initial surface detection and tracking iteration, the SLAM process outputs an initial point cloud of the real-world environment. The initial point cloud is an example of a sparse three-dimensional representation of the real-world environment and includes points corresponding to features of the open cabinet. The number of points in the initial point cloud can be between a hundred and a thousand. Each point is assigned a point identifier, which is a string that uniquely identifies the point. The initial point cloud is input to a random sample consensus (RANSAC) process, which is an example of the surface detection process. The RANSAC process outputs parameters defining four horizontal surface planes, each corresponding to one of the four shelves. For each horizontal surface plane, the points belonging to the horizontal surface plane are identified and their point identifiers are associated with the horizontal surface plane. In a next surface detection and tracking iteration (e.g., one using a next image input) , the SLAM process outputs a next point cloud of the real-world environment. The SLAM process tracks the features of the open cabinet. Some of the previously detected features are detected again. Some of the previously detected features are no longer detected. And new features that were not previously detected are currently detected. Hence, some of the points in the next point cloud correspond to previously detected features and have existing point identifiers. Remaining points of the next point cloud correspond to newly detected features and are assigned new point identifiers. And some of the points that were in the initial point cloud are no longer found in the next point cloud. For each horizontal surface plane, the surface tracking process determines the point identifiers associated with the horizontal surface plane, compares these point identifiers to the point identifiers of the points from the next point cloud, and determines matches. If the number of matched point identifiers (e.g., matched points that are points belonging to the horizontal surface plane as determined from the initial surface detection and tracking iteration and included in the next point cloud) exceeds the first predefined threshold (e.g., five) , the surface tracking process updates the plane function of the horizontal surface plane using the matched points. New point identifiers of points belonging to the horizontal surface plane are associated with the horizontal surface plane. However, if the number of matched point identifiers is smaller than the first predefined threshold, the horizontal surface plane is no longer tracked.
In this case, if the total number of horizontal surface planes to track in the next surface detection and tracking iteration drops below the second predefined threshold (e.g., three) , the next point cloud is input to the RANSAC process for a new detection of surface planes. The surface detection and tracking iterations are repeated, where the input and output of each iteration depends on whether surface planes continue to be tracked using the point identifiers or whether the RANSAC process is to be executed again.
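For a concrete view of this per-iteration decision logic, the following Python sketch mirrors the example above under simplifying assumptions: point clouds are modeled as dictionaries keyed by point identifiers, the helper name detect_planes and the threshold values are illustrative, and the sketch is not the implementation described in the figures.

```python
# Illustrative sketch of one surface detection and tracking iteration.
# A point cloud is modeled as a dict mapping point identifiers to (x, y, z) tuples,
# and a tracked plane as a dict holding its plane parameters and member point identifiers.

MIN_MATCHED_POINTS = 5   # first predefined threshold (illustrative value)
MIN_TRACKED_PLANES = 3   # second predefined threshold (illustrative value)


def track_iteration(tracked_planes, current_cloud, detect_planes):
    """Return the planes to keep tracking, or re-run surface detection if too few remain.

    tracked_planes: list of dicts {"params": (A, B, C, D), "point_ids": set of identifiers}
    current_cloud:  dict {point_id: (x, y, z)} output by the feature detection and tracking process
    detect_planes:  callable implementing the surface detection process (e.g., RANSAC)
    """
    current_ids = set(current_cloud)
    still_tracked = []
    for plane in tracked_planes:
        matched = plane["point_ids"] & current_ids
        if len(matched) >= MIN_MATCHED_POINTS:
            plane["point_ids"] = matched   # keep only identifiers still observed
            still_tracked.append(plane)
    if len(still_tracked) < MIN_TRACKED_PLANES:
        # Too few planes survive: fall back to a fresh surface detection pass.
        return detect_planes(current_cloud)
    return still_tracked
```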
Embodiments of the present disclosure provide many technical advantages over conventional techniques that detect and track surfaces. By using a multi-dimensional representation of the real-world environment, especially a sparse representation such as a point cloud, as input to the surface detection process, such as to the RANSAC process, the processing to detect surface planes is significantly reduced. In addition, by using point identifiers to intelligently determine when to execute the surface detection process and avoid its execution when unnecessary, additional processing savings are possible. For instance, and depending on the configuration of the computer system implementing the techniques of the present disclosure, the latency associated with the detection and tracking of surface planes can be reduced by a factor of two or more. A reduction by at least five times has been observed.
FIG. 1 illustrates an example of a computer system 110 that includes a camera 112 and an inertial measurement unit (IMU) sensor 114 for AR applications, according to at least one embodiment of the disclosure. The AR applications can be implemented by an AR module 116 of the computer system 110. Generally, the camera 112 generates images of a real-world environment that includes, for instance, a real-world object 130. The camera 112 can also include a depth sensor that generates depth data about the real-world environment, where this data includes, for instance, a depth map that shows depth (s) of the real-world object 130 (e.g., distance (s) between the depth sensor and the real-world object 130) . The IMU sensor 114 can include a gyroscope and an accelerometer, among other components, and can output IMU data including, for instance, an orientation of the computer system 110. Image data of the images generated by the camera 112 in an AR session and the IMU data generated by the IMU sensor 114 in the AR session can be used to determine a 6DoF pose (e.g., position along the X, Y, and Z axes and rotation along each of such axes) of the computer system 110 relative to the real-world environment.
Following an initialization of the AR session (where this initialization can include calibration and tracking) , the AR module 116 renders an AR scene 120 of the real-world environment in the AR session, where this AR scene 120 can be presented at a graphical user interface (GUI) on a display of the computer system 110. The AR scene 120 shows a real-world object representation 122 of the real-world object 130. In addition, the AR scene 120 shows a virtual object 124 not present in the real-world environment. To place the virtual object 124 on the real-world object representation 122 in a proper manner, the AR module 116 can detect a surface plane 126 corresponding to the real-world object representation 122 and position the virtual object relative to the surface plane 126 (e.g., place the virtual object 124 on the surface plane 126) . The surface plane 126 may be a horizontal plane, a vertical plane, or any other plane that may be at an angle and that may correspond to a visible surface of the real-world object 130.
In an example, the computer system 110 represents a suitable user device that includes, in addition to the camera 112 and the IMU sensor 114, one or more graphical processing units (GPUs) , one or more general purpose processors (GPPs) , and one or more memories storing computer-readable instructions that are executable by at least one of the processors to perform various functionalities of the embodiments of the present disclosure. For instance, the computer system 110 can be any of a smartphone, a tablet, an AR headset, or a wearable AR device.
The AR module 116 can be implemented as specialized hardware and/or a combination of hardware and software (e.g., general purpose processor and computer-readable instructions stored in memory and executable by the general purpose processor) . In addition to initializing an AR session and performing VIO, the AR module 116 can detect features of the real-world environment, detect surface planes based on the detected features, and track the detected features and the detected surface planes to properly render the AR scene 120. For instance, the AR module 116 implements a feature detection and tracking process, a surface detection process, and a surface tracking process as a set of program codes.
FIG. 2 illustrates an example of tracking a surface plane 202, according to at least one embodiment of the disclosure. As illustrated, the surface plane 202 is tracked based on different images generated in an AR session. A first image 210 shows the surface plane 202. A second image 220 also shows the surface plane 202, except that the surface plane 202 is now partially occluded by another object. In a third image 230, the surface plane 202 is fully or almost fully occluded by the other object.
The surface plane 202 can be tracked using different techniques. In one technique, a surface detection process can be executed for each of the images 210-230. In other words, the surface detection process is repeated for each of the images 210-230. However, this technique may necessitate a large amount of processing because the surface detection process is continuously repeated. In addition, some of the processing may be wasteful because, for instance, the surface plane 202 is fully occluded in the third image 230 and need not be tracked.
In another example technique, the surface detection process may be executed on the first image 210 to detect the surface plane 202. From that point, high resolution representations of the real-world environment may be generated from the remaining images 220-230 to track the surface plane 202. Such high resolution representations are typically not an output of a feature detection and tracking process (which, instead, outputs a low resolution representation) . Accordingly, a large amount of processing may still be necessitated.
In comparison thereto, embodiments of the present disclosure allow the re-use of the low resolution representations of the feature detection and tracking process to detect the surface plane 202 and associate point identifiers therewith. The point identifiers are used to track the surface plane. Accordingly, significant processing is saved in the detection and tracking of the surface plane 202.
FIG. 3 illustrates an example of computing components for detecting and tracking a surface plane, according to at least one embodiment of the disclosure. The computing components may be implemented as program code in an AR module of a computer system, such as the AR module 116 of FIG. 1.
As illustrated, the computing components include a feature detection and tracking process 310, a surface detection process 320, and a surface tracking process 330. The feature detection and tracking process 310 detects and tracks features of a real-world environment. The surface detection process 320 detects surface planes. And the surface tracking process 330 tracks detected surface planes.
In an example, the input to the feature detection and tracking process 310 includes image data and IMU data. The output of the feature detection and tracking process 310 includes a sparse multi-dimensional representation (e.g., three dimensional representation) of the real-world environment, such as a point cloud. For instance, the feature detection and tracking process 310 is implemented as a SLAM process that involves a particle filter, an extended Kalman filter, a covariance intersection, and/or other SLAM algorithms.
The input to the surface detection process 320 includes the sparse multi-dimensional representation that is output from the feature detection and tracking process 310. The output of the surface detection process 320 includes parameters of a detected surface plane. For instance, the surface plane is defined as a plane function having an equation of: Ax+By+Cz+D=0, and the output includes the A, B, C, and D parameters. In an illustration, the surface detection process 320 is implemented as a RANSAC process that involves an iterative estimation of such parameters. In particular, the RANSAC process selects three points (e.g., each having x, y, and z coordinates) from the sparse multi-dimensional representation as a candidate surface plane, resolves the plane function equation to determine the A, B, C, and D parameters, and computes the distance of each of the remaining points of the sparse multi-dimensional representation to the candidate surface plane. Points that have a distance smaller than a predefined threshold distance are found to belong to the candidate surface plane. If the number of points belonging to the candidate surface plane exceeds a minimum number (which can be set as a constraint on the RANSAC process) , such as five or some other value, the candidate surface plane is declared as a detected surface plane. Otherwise, the candidate surface plane is removed from the set of candidate surface planes and is not declared as a detected surface plane.
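The following is a minimal sketch of such a candidate-plane estimation, assuming NumPy, an (N, 3) array of points, and illustrative values for the distance threshold and minimum inlier count; it is one common way to implement a RANSAC plane fit, not necessarily the exact process 320.

```python
import numpy as np


def plane_from_points(p1, p2, p3):
    """Return (A, B, C, D) of the plane Ax + By + Cz + D = 0 through three points, or None."""
    normal = np.cross(p2 - p1, p3 - p1)
    norm = np.linalg.norm(normal)
    if norm < 1e-9:
        return None                          # degenerate sample (collinear points)
    a, b, c = normal / norm                  # unit normal, so |Ax + By + Cz + D| is a distance
    d = -(a * p1[0] + b * p1[1] + c * p1[2])
    return a, b, c, d


def count_inliers(points, plane, max_distance=0.02):
    """Count points whose distance to the candidate plane is below max_distance."""
    a, b, c, d = plane
    distances = np.abs(points @ np.array([a, b, c]) + d)
    return int(np.sum(distances < max_distance))


def ransac_plane(points, iterations=200, min_points=5, max_distance=0.02):
    """Sample three-point candidates and return the best plane, if it has enough inliers."""
    rng = np.random.default_rng(0)
    best_plane, best_count = None, 0
    for _ in range(iterations):
        i, j, k = rng.choice(len(points), size=3, replace=False)
        plane = plane_from_points(points[i], points[j], points[k])
        if plane is None:
            continue
        count = count_inliers(points, plane, max_distance)
        if count > best_count:
            best_plane, best_count = plane, count
    return best_plane if best_count >= min_points else None
```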
The input to the surface tracking process 330 includes the output of the feature detection and tracking process 310 (e.g., the sparse multi-dimensional representation) and the output of the surface detection process 320 (e.g., the parameters of the detected surface planes) . The output of the surface tracking process 330 includes point identifiers of points from the sparse multi-dimensional representation, associations of some of the point identifiers with detected surface planes, and a tracking decision indicating whether a detected surface plane is to be tracked using associated point identifiers or by performing the surface detection process again. Example implementations of the surface tracking process 330 are further described in connection with the next figures.
FIG. 4 illustrates an example of an initial surface detection and tracking iteration 400 in an AR session, according to at least one embodiment of the disclosure. Here, the feature detection and tracking process 310, the surface detection process 320, and the surface tracking process 330 can be used.
Prior to the initial surface detection and tracking iteration 400, no features of the real-world environment have been detected yet in the AR session. At the end of the initial surface detection and tracking iteration 400, one or more surface planes are detected based on detected features.
In an example, an image of the real-world environment is generated and is input to the feature detection and tracking process 310 (this input is shown as image data 402 in FIG. 4) . IMU data 404 is also input to the feature detection and tracking process 310. A sparse three-dimensional (3D) representation 412 of the real-world environment, such as a point cloud that includes between a hundred and a thousand points, is generated and output by the feature detection and tracking process 310. Each of the points corresponds to a detected feature. The surface tracking process 330 (or another process of the AR session) assigns a point identifier to each point of the sparse 3D representation 412. For instance, the point identifier can be the same as or can be derived from the identifier of the corresponding feature (referred to herein as a feature identifier) , where the feature identifier is used by the feature detection and tracking process to track the detected features in subsequent iterations.
The sparse 3D representation 412 is input to the surface detection process 320 and to the surface tracking process 330. The surface detection process 320 detects a surface plane (or multiple ones) based on the points of the sparse 3D representation 412 and outputs, for each detected surface plane, a plane function 422 (e.g., the A, B, C, and D parameters) . The plane function 422 is input to the surface tracking process 330.
The surface tracking process 330 (or another process of the AR session) determines the points that belong to the surface plane and that are included in the sparse 3D representation 412, determines the point identifiers of these points, and associates the point identifiers with the surface plane (illustrated in FIG. 4 as surface plane and point identifier association 432) . For instance, the point identifiers and the A, B, C, and D parameters are stored in a data structure, where the entries of the data structure indicate the association 432 (e.g., a table is used, where a row of the table indicates the association 432 by listing the point identifiers and the A, B, C, and D parameters) .
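As a concrete illustration only, the association 432 could be held in a small record such as the following; the class and field names are hypothetical and the point identifiers shown are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PlaneAssociation:
    """Row of the association table: plane parameters plus identifiers of its member points."""
    params: tuple                      # (A, B, C, D) of Ax + By + Cz + D = 0
    point_ids: set = field(default_factory=set)


# Example: the surface plane and point identifier association 432 for one detected plane.
association = PlaneAssociation(params=(0.0, 1.0, 0.0, -0.75))
association.point_ids.update({"pt_12", "pt_47", "pt_201", "pt_305", "pt_512"})
```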
FIG. 5 illustrates an example of a current surface detection and tracking iteration 500 in the AR session, according to at least one embodiment of the disclosure. The current surface detection and tracking iteration 500 is an iteration that is subsequent to a previous surface detection and tracking iteration, such as the initial surface detection and tracking iteration 400. Here, the use of the feature detection and tracking process 310, the surface detection process 320, and the surface tracking process 330 can continue.
Prior to the start of the current surface detection and tracking iteration 500, a surface plane (or, similarly multiple ones) was detected in the previous surface detection and tracking iteration. Also prior to this start, an association already exists between the surface plane and point identifiers of points from a previous sparse 3D representation (e.g., the sparse 3D representation 412) and belonging to the surface plane.
The current surface detection and tracking iteration 500 may start with inputting image data 502 and IMU data 504 to the feature detection and tracking process 310 that, in turn, outputs a sparse 3D representation 512 of the real-world environment (e.g., a point cloud, where each point corresponds to a detected feature) . As explained herein above, surface detection and tracking iterations may be performed at a particular ITS rate corresponding to an FPS rate. The image data 502 and the IMU data are input at the FPS rate.
The sparse 3D representation 512 is input to the surface tracking process 330. For features that were previously detected by the feature detection and tracking process 310 and having existing feature identifiers, the surface tracking process 330 determines the corresponding points and their point identifiers (e.g., these points were present in the 3D representation 412 and are still present in the 3D representation 512) . For features newly detected by the feature detection and tracking process 310 in the current surface detection and tracking iteration 500, the surface tracking process 330 determines the corresponding new points in the sparse 3D representation 512 (e.g., these points were not present in the 3D representation 412 and are now present in the 3D representation 512) and assigns new point identifiers to these new points. The new point identifiers are not associated with a surface plane yet. The surface tracking process 330 proceeds to perform a surface plane tracking 532 using the point identifiers (from the previous and current surface detection and tracking iterations) and existing associations with surface planes (e.g., from the previous surface detection and tracking iteration) , as further illustrated in the next figures.
FIG. 6 illustrates an example of surface plane tracking based on point identifiers and associations with surface planes, according to at least one embodiment of the disclosure. The surface plane tracking can be performed by the surface tracking process 330 to track a surface plane (or, similarly, multiple surface planes) in a current surface detection and tracking iteration (e.g., the current surface detection and tracking iteration 500) . In particular, multiple data are input to the surface tracking process 330. A first input includes a current sparse 3D representation of the real-world environment in the current surface detection and tracking iteration (e.g., the sparse 3D representation 512 in the current surface detection and tracking iteration 500) . Points belonging to the current sparse 3D representation and their associated point identifiers are referred to herein as current points and current point identifiers, respectively, in the interest of clarity of explanation. A second input includes surface plane and point identifier association 620 from the previous surface detection and tracking iteration (e.g., the surface plane and point identifier association 432 from the initial surface detection and tracking iteration 400) indicating existing associations between the surface plane and point identifiers corresponding to points that are included in the previous sparse 3D representation from the previous surface detection and tracking iteration (e.g., the sparse 3D representation 412 from the initial surface detection and tracking iteration 400) . Points belonging to the previous sparse 3D representation and their point identifiers are referred to herein as previous points and previous point identifiers, respectively, in the interest of clarity of explanation. A previous point identifier that corresponds to a previous point belonging to the surface plane is already associated with the surface plane, referred to herein as an existing association in the interest of clarity of explanation.
In an example, the surface tracking process 330 performs a point tracking 630 based on the inputs. In particular, the point tracking 630 involves determining the number of previous points that belong to the surface plane and are included in the current sparse 3D representation (e.g., are also current points) . For instance, the point tracking 630 initializes a counter and uses point identifiers. The point tracking 630 determines whether a match exists between a previous point identifier and one of the current point identifiers, where the previous point identifier has an existing association with the surface plane (thereby indicating that the corresponding previous point was determined as belonging to the surface plane in the previous surface detection and tracking iteration) . If the match exists, the point tracking 630 increments the counter by one. If no match exists, the counter is not incremented and the point tracking 630 determines that the corresponding previous point is no longer present in the current sparse 3D representation. The matching is repeated for the previous point identifiers having existing associations with the surface plane. At the end of the matching, the value of the counter indicates the number of the previous points that belong to the surface plane and are still present in the current sparse 3D representation (e.g., are also current points) .
The surface tracking process 330 performs a comparison 640 of this number (e.g., the value of the counter) to a predefined threshold (e.g., set to a value of at least three; in a particular illustration, the predefined threshold is set to five) . If the number is larger than the predefined threshold (where “larger than” indicates greater than or equal to) , the surface tracking process 330 determines that the surface plane is to be tracked in the current surface detection and tracking iteration and, accordingly, performs a surface plane update 650. Otherwise, the surface tracking process 330 determines that the surface plane need no longer be tracked in the current surface detection and tracking iteration and, accordingly, removes the surface plane from the tracking. In this case, the surface tracking process 330 determines whether the surface detection process is to be repeated, as further described in FIG. 7.
The surface plane update 650 can include updating the plane function of the surface plane. For instance, the surface tracking process 330 performs multiple point determinations. A first point determination relates to the previous points. In particular, a previous point (or, similarly, multiple previous points) is no longer present in the current sparse 3D representation and, thus, no longer belongs to the surface plane. For this previous point (and, similarly, other previous points as applicable) , the surface plane update 650 involves removing the existing association between the surface plane and the previous point identifier corresponding to the previous point (e.g., referring back to the data structure described in connection with FIG. 4, the previous point identifier is deleted from the data structure) . A second point determination relates to new points. In particular, a new point is a current point (or, similarly, multiple current points) that was not present in the previous sparse 3D representation, but is included in the current sparse 3D representation. The new point has a new point identifier, where the new point identifier is not associated with the surface plane yet. For this new point (or, similarly, multiple new points) , the surface tracking process 330 determines whether the new point belongs to the surface plane (e.g., based on the distance of the new point to the surface plane being less than a predefined threshold) . If so, the surface tracking process 330 generates a new association between the new point identifier and the surface plane (e.g., referring back to the data structure described in connection with FIG. 4, the new point identifier is added to the data structure) .
In an example, after the first point determination and prior to the second point determination, the surface plane update 650 can involve updating the plane function of the surface plane (e.g., the A, B, C, and D parameters) based on the remaining, previous points that are determined to still belong to the surface plane. Additionally or alternatively, after the second point determination, the surface plane update 650 can involve updating the plane function based on the new points determined to belong to the surface plane and/or on all points (e.g., previous and new) determined to belong to the surface plane.
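A possible sketch of the surface plane update 650 follows, under the assumptions that the plane parameters use a unit normal and that point clouds are dictionaries keyed by point identifiers; the least-squares refit shown is one standard choice for updating the plane function, and the function names are illustrative rather than part of the disclosure.

```python
import numpy as np


def refit_plane(points):
    """Least-squares refit of (A, B, C, D) from the points currently belonging to the plane."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    a, b, c = vt[-1]                         # unit normal: direction of least variance
    d = -float(vt[-1] @ centroid)
    return a, b, c, d


def update_plane(plane, current_cloud, max_distance=0.02):
    """Drop stale associations, admit nearby new points, and refresh the plane function.

    plane:         dict {"params": (A, B, C, D), "point_ids": set of identifiers}
    current_cloud: dict {point_id: (x, y, z)}; assumes (A, B, C) is a unit normal
    """
    # First point determination: keep only previous points still present in the current cloud.
    plane["point_ids"] &= set(current_cloud)
    # Second point determination: new points close enough to the plane are associated with it.
    a, b, c, d = plane["params"]
    for point_id, (x, y, z) in current_cloud.items():
        if abs(a * x + b * y + c * z + d) < max_distance:
            plane["point_ids"].add(point_id)
    # Update the plane function from all points now associated with the plane.
    members = [current_cloud[pid] for pid in plane["point_ids"]]
    if len(members) >= 3:
        plane["params"] = refit_plane(members)
    return plane
```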
FIG. 7 illustrates an example of a determination of whether a surface detection process is to be repeated, according to at least one embodiment of the disclosure. When a surface plane is removed from the tracking in a current surface detection and tracking iteration as described in connection with FIG. 6, the surface tracking process 330 can perform the determination of whether to repeat the surface detection process. If no surface planes are removed from the tracking in the current surface detection and tracking iteration, the surface tracking process 330 need not perform this determination. Generally, if the total number of surface planes still tracked in the current surface detection and tracking iteration falls below a predefined threshold, the surface detection process is repeated. Otherwise, the surface detection process is not repeated.
As illustrated, multiple inputs are used. A first input includes the current sparse 3D representation 710 in the current surface detection and tracking iteration. A second input includes the surface planes (e.g., their plane functions) detected from the previous surface detection and tracking iteration (shown in FIG. 7 as a previous surface plane 720) . Based on the inputs, the surface tracking process 330 performs a surface plane tracking 730.
The surface plane tracking 730 involves maintaining a counter indicating the total number of the surface planes to be tracked in the current surface detection and tracking iteration. Initially, the counter’s value is equal to the total number of surface planes tracked in the previous surface detection and tracking iteration. An output of the determination performed in connection with FIG. 6 indicates whether each of the surface planes is to be removed from the tracking in the current surface detection and tracking iteration. For each of the surface planes that are to be removed, the surface plane tracking 730 decreases the counter by one. The resulting value of the counter indicates the total number of surface planes that are to be tracked in the current surface detection and tracking iteration. The surface tracking process 330 performs a comparison 740 of this total number (e.g., the resulting value of the counter) with the predefined threshold. This threshold can be set as a percentage (e.g., eighty percent) of the total number of surface planes detected by the surface detection process 320 the last time this process 320 was performed. If the total number is larger than the predefined threshold, the surface tracking process 330 determines that the surface detection process 320 need not be repeated. In this case, the surface tracking process 330 sets the previous surface planes that are to be tracked as current surface planes 750 (and these current surface planes 750 can be updated as described in connection with FIG. 6) . Otherwise, the surface tracking process 330 can disregard the previous surface planes 720 and proceed to the triggering 760 of new surface detections by repeating the surface detection process 320. Accordingly, the 3D representation 710 can be input to the surface detection process 320.
In an example, when repeating the surface detection process 320, a set of constraints may be imposed on this process 320. For instance, the set of constraints includes a first constraint specifying a minimum number of points from the sparse 3D representation 710 (e.g., five points) that should belong to a candidate surface plane for the surface detection process 320 to set the candidate surface plane as a detected surface plane. The set of constraints also includes a second constraint specifying a maximum distance between a point belonging to the candidate surface plane and a camera of a computer system, where the camera generates the image corresponding to the sparse 3D representation 710 (e.g., the camera 112) .
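The following sketch illustrates the re-detection decision and the two constraints; the helper names and the values used for the percentage, the minimum point count, and the maximum camera distance are assumptions for the example, not values mandated by the disclosure.

```python
import numpy as np


def should_redetect(num_tracked, num_detected_last_time, fraction=0.8):
    """Trigger a new surface detection when too few previously detected planes remain tracked."""
    return num_tracked < fraction * num_detected_last_time


def passes_constraints(inlier_points, camera_position, min_points=5, max_camera_distance=3.0):
    """Accept a candidate plane only if it has enough inliers, each close enough to the camera."""
    pts = np.asarray(inlier_points, dtype=float)
    if len(pts) < min_points:
        return False
    distances = np.linalg.norm(pts - np.asarray(camera_position, dtype=float), axis=1)
    return bool(np.all(distances <= max_camera_distance))
```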
FIG. 8 illustrates an example of a flow for surface plane detection and tracking, according to at least one embodiment of the disclosure. The flow is described in connection with a computer system that is an example of the computer system 110 of FIG. 1. Some or all of the operations of the flows can be implemented via specific hardware on the computer system and/or can be implemented as computer-readable instructions stored on a non-transitory computer-readable medium of the computer system. As stored, the computer-readable instructions represent programmable modules that include code executable by a processor of the computer system. The execution of such instructions configures the computer system to perform the respective operations. Each programmable module in combination with the processor represents a means for performing a respective operation (s) . While the operations are illustrated in a particular order, it should be understood that no particular order is necessary and that one or more operations may be omitted, skipped, performed in parallel, and/or reordered.
In an example, the flow starts at operation 802, where the computer system generates an initial multi-dimensional representation of a real-world environment in an AR session. For instance, an image of the real-world environment is generated by a camera of the computer system. Image data of the image and IMU data of an IMU sensor of the computer system are input to a feature detection and tracking process (e.g., a SLAM process) of an AR module of the computer system. In an initial surface detection and tracking iteration, the feature detection and tracking process outputs a sparse 3D representation of the real-world environment, such as a point cloud, as an example of the initial multi-dimensional representation.
In an example, the flow includes operation 804, where the computer system assigns point identifiers. For instance, each point included in the initial multi-dimensional representation corresponds to a feature included in the real-world environment and detected by the feature detection and tracking process. A point corresponding to a feature is assigned a point identifier, where the point identifier is derived from a feature identifier of the feature.
In an example, the flow includes operation 806, where the computer system detects a surface plane (or, similarly, multiple surface planes) . For instance, the initial multi-dimensional representation is input to a  surface detection process, such as a RANSAC process. The output of this process indicates a plane function for the surface plane.
In an example, the flow includes operation 808, where the computer system associates the surface plane (or, similarly, multiple surface planes) with point identifiers. For instance, for each point belonging to the surface plane and included in the initial multi-dimensional representation, the computer system associates the corresponding point identifier with the surface plane. Such associations can be stored in a data structure.
In an example, the flow includes operation 810, where the computer system generates a current multi-dimensional representation. This operation is similar to operation 802, except that a new image and new IMU data are input to the feature detection and tracking process and correspond to a current surface detection and tracking iteration.
In an example, the flow includes operation 812, where the computer system tracks point identifiers in the current multi-dimensional representation. The tracking involves determining whether a feature from the previous (e.g., initial) multi-dimensional representation is no longer included in the current multi-dimensional representation (e.g., a removed feature) or remains included in the current multi-dimensional representation (e.g., a remaining feature) . The tracking also involves determining whether a feature that was not included in the previous multi-dimensional representation is now included in the current multi-dimensional representation (e.g., a new feature) . Point identifiers of points corresponding to removed features (e.g., removed points) are also removed. Point identifiers of points corresponding to remaining features (e.g., remaining points) are maintained. And new point identifiers are assigned to points corresponding to new features (e.g., new points) .
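A brief sketch of this classification of point identifiers, assuming the previous and current representations are available as dictionaries keyed by point identifiers; the function name is illustrative.

```python
def classify_points(previous_cloud, current_cloud):
    """Split point identifiers into removed, remaining, and new sets across two iterations."""
    previous_ids, current_ids = set(previous_cloud), set(current_cloud)
    removed = previous_ids - current_ids     # features no longer detected
    remaining = previous_ids & current_ids   # features tracked across both images
    new = current_ids - previous_ids         # features detected for the first time
    return removed, remaining, new
```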
In an example, the flow includes operation 814, where the computer system determines a number of points belonging to the surface plane (or, similarly, to each of the surface planes detected in the previous surface detection and tracking iteration) and included in the current multi-dimensional representation. For instance, point identifiers associated with the surface planes are matched to point identifiers corresponding to points of the current multi-dimensional representation. The total number of matches represents the number of points.
In an example, the flow includes operation 816, where the computer system compares the number of points to a first predefined threshold. If the number is larger than the first predefined threshold, operation 818 is performed. Otherwise, operation 820 is performed.
In an example, the flow includes operation 818, where the computer system updates the surface plane (or, similarly, multiple ones of the surface planes detected in the previous surface detection and tracking iteration as applicable) . For instance, the computer system determines the points that belong to the surface plane and that are removed, points that belong to the surface plane and that are maintained, and new points that now belong to the surface plane. Associations with the surface plane are removed for point identifiers of the removed points that previously belonged to the surface plane. Associations with the surface plane are added for point identifiers of the new points belonging to the surface plane.
In an example, the flow includes operation 820, where the computer system stops tracking the surface plane in the current surface detection and tracking iteration. Accordingly, no surface plane update is performed.
In an example, the flow includes operation 822, where the computer system determines the total number of surface planes tracked in the current surface detection and tracking iteration. For instance, a counter representing this total number is decreased per determination that a surface plane is no longer to be tracked as indicated by operation 816.
In an example, the flow includes operation 824, where the computer system compares the total number to a second predefined threshold. If the total number is larger than the second predefined threshold, the computer system continues tracking the surface planes based on point identifiers, as indicated by the loop back to operation 810. Otherwise, the computer system repeats the surface detection process as indicated by the loop back to operation 806.
FIG. 9 illustrates examples of components of a computer system 900, according to at least one embodiment of the disclosure. The computer system 900 is an example of the computer system described herein above. Although these components are illustrated as belonging to a same computer system 900, the computer system 900 can also be distributed.
The computer system 900 includes at least a processor 902, a memory 904, a storage device 906, input/output peripherals (I/O) 908, communication peripherals 910, and an interface bus 912. The interface bus 912 is configured to communicate, transmit, and transfer data, controls, and commands among the various components of the computer system 900. The memory 904 and the storage device 906 include computer-readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM) , hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage, for example flash memory, and other tangible storage media. Any of such computer readable storage media can be configured to store instructions or program codes embodying aspects of the disclosure. The memory 904 and the storage device 906 also include computer readable signal media. A computer readable signal medium includes a propagated data signal with computer readable program code embodied therein. Such a propagated signal takes any of a variety of forms including, but not limited to, electromagnetic, optical, or any combination thereof. A computer readable signal medium includes any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computer system 900.
Further, the memory 904 includes an operating system, programs, and applications. The processor 902 is configured to execute the stored instructions and includes, for example, a logical processing unit, a microprocessor, a digital signal processor, and other processors. The memory 904 and/or the processor 902 can be virtualized and can be hosted within another computer system of, for example, a cloud network or a data center. The I/O peripherals 908 include user interfaces, such as a keyboard, screen (e.g., a touch screen) , microphone, speaker, other input/output devices, and computing components, such as graphical processing units, serial ports, parallel ports, universal serial buses, and other input/output peripherals. The I/O peripherals 908 are connected to the processor 902 through any of the ports coupled to the interface bus 912. The communication peripherals 910 are configured to facilitate communication between the computer system 900 and other computer systems over a communications network and include, for example, a network interface controller, modem, wireless and wired interface cards, antenna, and other communication peripherals.
While the present subject matter has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, it should be understood that the present disclosure has been presented for purposes of example rather than limitation, and does not preclude inclusion of such modifications, variations, and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. Indeed, the methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the present disclosure. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the present disclosure.
Unless specifically stated otherwise, it is appreciated that throughout this specification discussions utilizing terms such as “processing, ” “computing, ” “calculating, ” “determining, ” and “identifying” or the like refer to actions or processes of a computer system, such as one or more computers or a similar electronic computer system or devices, that manipulate or transform data represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the computing platform.
The system or systems discussed herein are not limited to any particular hardware architecture or configuration. A computer system can include any suitable arrangement of components that provide a result conditioned on one or more inputs. Suitable computer systems include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computer system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computer system.
Embodiments of the methods disclosed herein may be performed in the operation of such computer systems. The order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
Conditional language used herein, such as, among others, “can,” “could,” “might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain examples include, while other examples do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements, and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without author input or prompting, whether these features, elements, and/or steps are included or are to be performed in any particular example.
The terms “comprising,” “including,” “having,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list. The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Similarly, the use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of the present disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed examples. Similarly, the example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed examples.

Claims (20)

  1. A method implemented by a computer system in an augmented reality (AR) session, the method including:
    determining points belonging to a surface plane, the surface plane detected based on a first multi-dimensional representation of a real-world environment, the first multi-dimensional representation generated in the AR session based on a first image of the real-world environment;
    determining a second multi-dimensional representation of the real-world environment, the second multi-dimensional representation generated in the AR session based on a second image of the real-world environment, the second image generated subsequent to the first image;
    determining a number of the points that are also included in the second multi-dimensional representation;
    comparing the number with a threshold; and
    tracking the surface plane in the AR session based on the second multi-dimensional representation upon determining that the number is larger than the threshold.
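The following is an illustrative sketch, not part of the claims, of how the tracking decision of claim 1 could look in code, assuming that each point carries a unique, hashable identifier and that the threshold takes a value consistent with claim 20 (three or larger). All names are illustrative assumptions.

```python
# Sketch only: count how many plane points survive into the second
# multi-dimensional representation and compare the count with a threshold.

def should_track_plane(plane_point_ids, second_repr_point_ids, threshold=3):
    """Return (keep_tracking, surviving_ids).

    plane_point_ids: identifiers of the points belonging to the surface plane,
        as determined from the first multi-dimensional representation.
    second_repr_point_ids: identifiers of the points present in the second
        multi-dimensional representation.
    """
    surviving = set(plane_point_ids) & set(second_repr_point_ids)
    return len(surviving) > threshold, surviving
```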
  2. The method of claim 1, wherein the first multi-dimensional representation includes a first point cloud corresponding to features detected in the real-world environment, and wherein each point corresponds to a feature and is associated with a point identifier.
  3. The method of claim 2, wherein determining the number of the points includes:
    determining that a first point belonging to the surface plane is associated with a first point identifier;
    determining, based on the first point identifier, that the first point is included in a second point cloud corresponding to the second multi-dimensional representation; and
    incrementing the number of the points based on the determination that the first point is included in the second point cloud.
  4. The method of claim 2, further including:
    detecting, in a first feature detection iteration, the features based on the first image and on first inertial measurement unit (IMU) data;
    generating the first point cloud based on the detected features; and
    assigning, for each point in the first point cloud, a different point identifier.
  5. The method of claim 4, further including:
    detecting, in a second feature detection iteration, a first feature of the features based on the second image and on second IMU data;
    detecting, in the second feature detection iteration and based on the second image and the second IMU data, a second feature that was not detected in the first feature detection iteration;
    generating a second point cloud, the second point cloud including a first point corresponding to the first feature and a second point corresponding to the second feature, the first point also included in the first point cloud;
    maintaining a first point identifier assigned to the first point; and
    assigning a second point identifier to the second point.
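As an illustration of claims 4 and 5, the sketch below shows one way point identifiers could be assigned in a first iteration and maintained for re-detected features in a second iteration. Feature matching between iterations (based on the image and IMU data) is assumed to be provided by the underlying VIO/SLAM pipeline; here, features are simply treated as hashable keys, and all names are assumptions.

```python
import itertools

# Monotonically increasing source of fresh point identifiers.
_point_ids = itertools.count()

def build_point_cloud(detected_features, previous_cloud=None):
    """Map each detected feature to a point identifier.

    detected_features: iterable of hashable feature keys from one iteration.
    previous_cloud: dict of feature -> identifier from the prior iteration,
        or None for the first feature detection iteration.
    """
    cloud = {}
    for feature in detected_features:
        if previous_cloud and feature in previous_cloud:
            cloud[feature] = previous_cloud[feature]   # maintain the existing identifier
        else:
            cloud[feature] = next(_point_ids)          # assign a new identifier
    return cloud
```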
  6. The method of claim 1, further including:
    generating the first multi-dimensional representation based on an execution of a simultaneous localization and mapping (SLAM) process, wherein the first image is input to the SLAM process; and
    inputting the first multi-dimensional representation to a random sample consensus (RANSAC) process, wherein the surface plane is detected based on an execution of the RANSAC process using the first multi-dimensional representation.
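A minimal sketch of the plane detection step of claim 6 is shown below: a basic RANSAC loop fitting a plane to a sparse point cloud assumed to come from a SLAM process. The iteration count and inlier distance are illustrative values, not values taken from the disclosure.

```python
import numpy as np

def ransac_plane(points, iterations=200, inlier_dist=0.02, rng=None):
    """Fit a plane (normal, d), with normal . x + d = 0, to an (N, 3) array, N >= 3."""
    rng = rng or np.random.default_rng()
    best_inliers, best_plane = np.array([], dtype=int), None
    for _ in range(iterations):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # skip degenerate (collinear) samples
            continue
        normal /= norm
        d = -normal.dot(sample[0])
        distances = np.abs(points.dot(normal) + d)
        inliers = np.flatnonzero(distances < inlier_dist)
        if len(inliers) > len(best_inliers):
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane, best_inliers
```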
  7. The method of claim 6, further including:
    assigning a different point identifier to each point in the first multi-dimensional representation;
    determining that a first point in the first multi-dimensional representation belongs to the surface plane; and
    associating the surface plane with a first point identifier of the first point.
  8. The method of claim 7, wherein determining the number of the points includes:
    determining, based on the first point identifier, that the first point is included in the second multi-dimensional representation; and
    incrementing the number of the points based on the determination that the first point is included in the second multi-dimensional representation.
  9. The method of claim 1, wherein tracking the surface plane includes:
    determining a first set of points from the points belonging to the surface plane, wherein the first set of points is present in the second multi-dimensional representation; and
    updating a plane function of the surface plane based on the first set of points.
  10. The method of claim 9, wherein tracking the surface plane further includes:
    determining a second set of points that belongs to the surface plane and that is in the second multi-dimensional representation but not in the first multi-dimensional representation; and
    associating the second set of points with the surface plane.
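One possible way to update the plane function of claims 9 and 10 is sketched below, under the assumption that the surviving plane points are available as an (N, 3) array: a least-squares fit in which the singular vector of least variance of the centered points serves as the plane normal.

```python
import numpy as np

def update_plane_function(surviving_points):
    """Refit the plane normal and offset from the plane points still tracked.

    surviving_points: (N, 3) array, N >= 3, of plane points that are present
        in the newer multi-dimensional representation.
    Returns (normal, d) with normal . x + d = 0.
    """
    centroid = surviving_points.mean(axis=0)
    _, _, vt = np.linalg.svd(surviving_points - centroid)
    normal = vt[-1]                     # direction of least variance
    d = -normal.dot(centroid)
    return normal, d
```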
  11. A computer system including:
    one or more processors; and
    one or more memories storing computer-readable instructions that, upon execution by the one or more processors, configure the computer system to:
    determine points belonging to a surface plane, the surface plane detected based on a first multi-dimensional representation of a real-world environment, the first multi-dimensional representation generated in an augmented reality (AR) session based on a first image of the real-world environment;
    determine a second multi-dimensional representation of the real-world environment, the second multi-dimensional representation generated in the AR session based on a second image of the real-world environment, the second image generated subsequent to the first image;
    determine a number of the points that are also included in the second multi-dimensional representation;
    compare the number with a threshold; and
    track the surface plane in the AR session based on the second multi-dimensional representation upon determining that the number is larger than the threshold.
  12. The computer system of claim 11, wherein the execution of the computer-readable instructions further configures the computer system to:
    determine, from the second multi-dimensional representation, second points that belong to the surface plane;
    determine a third multi-dimensional representation of the real-world environment, the third multi-dimensional representation generated in the AR session based on a third image of the real-world environment, the third image generated subsequent to the first image;
    determine a second number of the second points that are also included in the third multi-dimensional representation;
    compare the second number with the threshold; and
    determine that the surface plane is no longer to be tracked in the AR session based on the second number being smaller than the threshold.
  13. The computer system of claim 12, wherein the execution of the computer-readable instructions further configures the computer system to:
    determine that a total number of tracked surface planes based on comparisons with the threshold is smaller than a second threshold; and
    detect, based on the total number being smaller than the second threshold, a second surface plane by at least inputting the third multi-dimensional representation to a random sample consensus (RANSAC) process.
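The plane lifecycle described by claims 12 and 13 could be organized as in the sketch below: a plane is dropped once too few of its points survive into the latest representation, and plane detection is re-run when the number of tracked planes falls below a second threshold. The helper detect_plane stands in for the RANSAC step; all names and default values are assumptions.

```python
def manage_tracked_planes(tracked_planes, latest_point_ids, detect_plane,
                          threshold=3, min_tracked_planes=1):
    """tracked_planes: dict mapping a plane id to the set of its point identifiers."""
    still_tracked = {
        plane_id: point_ids
        for plane_id, point_ids in tracked_planes.items()
        if len(point_ids & latest_point_ids) > threshold   # keep planes with enough surviving points
    }
    if len(still_tracked) < min_tracked_planes:
        detected = detect_plane()          # e.g., RANSAC over the latest representation
        if detected is not None:
            new_plane_id, new_point_ids = detected
            still_tracked[new_plane_id] = new_point_ids
    return still_tracked
```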
  14. The computer system of claim 13, wherein the second surface plane is detected based on a set of constraints to the RANSAC process, the set of constraints specifying a minimum number of points from the third multi-dimensional representation.
  15. The computer system of claim 14, wherein the set of constraints further specifies a maximum distance between a point belonging to the second surface plane and a camera of the computer system, wherein the camera is configured to generate the third image.
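One reading of the constraints of claims 14 and 15 is sketched below: a candidate plane is accepted only if it is supported by a minimum number of points and its points lie within a maximum distance of the camera position. The threshold values and the all-points interpretation of the distance constraint are assumptions.

```python
import numpy as np

def plane_satisfies_constraints(inlier_points, camera_position,
                                min_points=20, max_distance=5.0):
    """inlier_points: (N, 3) array of points belonging to the candidate plane."""
    if len(inlier_points) < min_points:
        return False
    distances = np.linalg.norm(inlier_points - camera_position, axis=1)
    return bool(np.all(distances <= max_distance))
```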
  16. One or more non-transitory computer-storage media storing instructions that, upon execution on a computer system, cause the computer system to perform operations including:
    determining points belonging to a surface plane, the surface plane detected based on a first multi-dimensional representation of a real-world environment, the first multi-dimensional representation generated in an augmented reality (AR) session based on a first image of the real-world environment;
    determining a second multi-dimensional representation of the real-world environment, the second multi-dimensional representation generated in the AR session based on a second image of the real-world environment, the second image generated subsequent to the first image;
    determining a number of the points that are also included in the second multi-dimensional representation;
    comparing the number with a threshold; and
    tracking the surface plane in the AR session based on the second multi-dimensional representation upon determining that the number is larger than the threshold.
  17. The one or more non-transitory computer-storage media of claim 16, wherein the operations further include:
    generating the first multi-dimensional representation based on an execution of a simultaneous localization and mapping (SLAM) process, wherein the first image is input to the SLAM process; and
    inputting the first multi-dimensional representation to a random sample consensus (RANSAC) process, wherein the surface plane is detected based on an execution of the RANSAC process using the first multi-dimensional representation.
  18. The one or more non-transitory computer-storage media of claim 17, wherein the operations further include:
    assigning a different point identifier to each point in the first multi-dimensional representation;
    determining that a first point in the first multi-dimensional representation belongs to the surface plane; and
    associating the surface plane with a first point identifier of the first point.
  19. The one or more non-transitory computer-storage media of claim 18, wherein determining the number of the points includes:
    determining, based on the first point identifier, that the first point is included in the second multi-dimensional representation; and
    incrementing the number of the points based on the determination that the first point is included in the second multi-dimensional representation.
  20. The one or more non-transitory computer-storage media of claim 16, wherein the threshold is set to a value equal to or larger than three.
PCT/CN2021/076047 2020-02-13 2021-02-08 Surface detection and tracking in augmented reality session based on sparse representation WO2021160095A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202180011605.9A CN115023743A (en) 2020-02-13 2021-02-08 Surface detection and tracking in augmented reality sessions based on sparse representations

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202062976009P 2020-02-13 2020-02-13
US62/976,009 2020-02-13

Publications (1)

Publication Number Publication Date
WO2021160095A1 true WO2021160095A1 (en) 2021-08-19

Family

ID=77291389

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076047 WO2021160095A1 (en) 2020-02-13 2021-02-08 Surface detection and tracking in augmented reality session based on sparse representation

Country Status (2)

Country Link
CN (1) CN115023743A (en)
WO (1) WO2021160095A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102075686A (en) * 2011-02-10 2011-05-25 北京航空航天大学 Robust real-time on-line camera tracking method
CN103390287A (en) * 2012-05-11 2013-11-13 索尼电脑娱乐欧洲有限公司 Apparatus and method for augmented reality
US20160358383A1 (en) * 2015-06-05 2016-12-08 Steffen Gauglitz Systems and methods for augmented reality-based remote collaboration
US20170243352A1 (en) * 2016-02-18 2017-08-24 Intel Corporation 3-dimensional scene analysis for augmented reality operations
CN110457414A (en) * 2019-07-30 2019-11-15 Oppo广东移动通信有限公司 Offline map processing, virtual objects display methods, device, medium and equipment

Also Published As

Publication number Publication date
CN115023743A (en) 2022-09-06

Similar Documents

Publication Publication Date Title
US10636198B2 (en) System and method for monocular simultaneous localization and mapping
CN110163903B (en) Three-dimensional image acquisition and image positioning method, device, equipment and storage medium
CN111465962A (en) Depth of motion for augmented reality of handheld user devices
JP2018507476A (en) Screening for computer vision
JP7017689B2 (en) Information processing equipment, information processing system and information processing method
CN112907552B (en) Robustness detection method, device and program product for image processing model
US9830703B2 (en) Model-based three-dimensional head pose estimation
CN114450717A (en) Occlusion and collision detection for augmented reality applications
CN111868738A (en) Cross-equipment monitoring computer vision system
WO2021160098A1 (en) Error state kalman filter for visual slam by dynamically tuning measurement noise covariance
CN111462179A (en) Three-dimensional object tracking method and device and electronic equipment
WO2021160095A1 (en) Surface detection and tracking in augmented reality session based on sparse representation
CN112634366B (en) Method for generating position information, related device and computer program product
CN113205090B (en) Picture correction method, device, electronic equipment and computer readable storage medium
CN110297677B (en) Drawing method, drawing device, drawing equipment and storage medium
WO2021104203A1 (en) Associating device coordinate systems in a multi-person augmented reality system
CN111369571A (en) Three-dimensional object pose accuracy judgment method and device and electronic equipment
CN111275827A (en) Edge-based augmented reality three-dimensional tracking registration method and device and electronic equipment
US11158119B2 (en) Systems and methods for reconstructing a three-dimensional object
WO2021160097A1 (en) System and method for object detection for augmented reality
US20230005172A1 (en) Method and System for Implementing Adaptive Feature Detection for VSLAM Systems
WO2021160080A1 (en) Evaluating pose data of an augmented reality (ar) application
US10019834B2 (en) Real-time rendering of volumetric models with occlusive and emissive particles
US20230290065A1 (en) Digital garment generation
CN114821717B (en) Target object fusion method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21753398

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21753398

Country of ref document: EP

Kind code of ref document: A1
