WO2021160080A1 - Evaluating pose data of an augmented reality (AR) application - Google Patents

Evaluating pose data of an augmented reality (AR) application

Info

Publication number
WO2021160080A1
WO2021160080A1 (PCT/CN2021/075972)
Authority
WO
WIPO (PCT)
Prior art keywords
data
time
user device
pose
application
Prior art date
Application number
PCT/CN2021/075972
Other languages
French (fr)
Inventor
Jiangshan TIAN
Fan DENG
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd. filed Critical Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority to CN202180013669.2A priority Critical patent/CN115066281B/en
Publication of WO2021160080A1 publication Critical patent/WO2021160080A1/en

Classifications

    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01Head-up displays
    • G02B27/017Head mounted
    • G02B27/0172Head mounted characterised by optical features
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/0093Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00 with means for monitoring data relating to the user, e.g. head-tracking, eye-tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01Head-up displays
    • G02B27/0179Display position adjusting means not related to the information to be displayed
    • G02B2027/0181Adaptation to the pilot/driver
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01Head-up displays
    • G02B27/0179Display position adjusting means not related to the information to be displayed
    • G02B2027/0187Display position adjusting means not related to the information to be displayed slaved to motion of at least a part of the body of the user, e.g. head, eye
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose

Definitions

  • Augmented Reality superimposes virtual content over a user’s view of the real world.
  • SDK AR software development kits
  • An AR SDK typically provides six degrees-of-freedom (6DoF) tracking capability.
  • a user can scan the environment using a smartphone’s camera, and the smartphone performs visual inertial odometry (VIO) in real time. Once the camera pose is tracked continuously, virtual objects can be placed into the AR scene to create an illusion that real objects and virtual objects are merged together.
  • the quality of the AR experience can depend on how well virtual objects are placed in the AR scene. In turn, proper placement of the virtual objects can depend on how well the AR tracking is performed. Accordingly, there is a need in the art for improved methods and systems related to performing and evaluating AR tracking.
  • the present invention relates generally to methods and systems for evaluating the performance of an AR application including, for example, the accuracy of pose estimation.
  • in an example, a system includes a user device configured to execute an augmented reality (AR) application and send first data indicating an estimated trajectory of the user device, the first data generated by the AR application and including first time stamps, the first time stamps generated based on a first local time of the user device.
  • the system also includes a motion tracking system configured to send second data indicating a tracked trajectory of the user device.
  • the system also includes a computer system communicatively coupled with the user device and the motion tracking system and configured to: determine a time offset between a second local time of the computer system and the first local time of the user device, receive the first data, associate the first data with second time stamps, the second time stamps generated based on the time offset and being different from the first time stamps, receive the second data, associate the second data with third time stamps, the third time stamps generated based on the second local time, and generate an evaluation of the AR application based on the first data, the second data, the second time stamps, and the third time stamps.
  • the first data includes first pose data of the user device.
  • the first pose data includes position data and orientation data and is generated by a simultaneous localization and mapping (SLAM) process of the AR application.
  • SLAM simultaneous localization and mapping
  • the second data includes second pose data of the user device.
  • the second pose data is generated by the motion tracking system.
  • the first pose data and the first time stamps are received over a first socket associated with the AR application.
  • the second pose data is received based on a second socket associated with a motion tracking application of the motion tracking system.
  • determining the time offset includes: prior to receiving the first data, receiving time data from the user device, the time data indicating the first local time, and determining the time offset based on a comparison of the time data with the second local time.
  • the first data has a first data pattern, wherein the time data has a second data pattern, and wherein the first data pattern is different from the second data pattern.
  • the first data pattern includes pose data and a time stamp, wherein the second data pattern includes an identifier of the user device, the first local time, and a time baseline associated with generating the first data.
  • the first data indicates a first pose of the user device at a first time stamp of the first time stamps.
  • the second data includes a second pose of the user device at a third time stamp of the third time stamps.
  • the first pose is associated with a second time stamp of the second time stamps based on the first time stamp and the time offset.
  • Generating the evaluation includes: determining that the second time stamp corresponds with the first time stamp and computing an evaluation metric based on the first pose and the second pose.
  • the first data is received over a time period.
  • the second data is received over the time period.
  • Generating the evaluation includes: generating a relative timeline between the first data and the second data based on the second time stamps and the third time stamps, determining associations between first pose data from the first data and second pose data from the second data based on the relative timeline, and computing an evaluation metric based on the associations.
  • the evaluation metric is defined based on user input received via a user interface to an evaluation application of the computer system.
  • the evaluation is generated by using the second data as ground truth data and the first data as variable data.
  • a method is implemented by a computer system.
  • the method includes determining a time offset between a local time of the computer system and a local time of a user device based on an execution of an augmented reality (AR) application on the user device, receiving first data from the user device, the first data indicating an estimated trajectory of the user device and generated by the AR application, the first data including first time stamps generated based on the local time of the user device, associating the first data with second time stamps, the second time stamps generated based on the time offset and being different from the first time stamps, receiving second data from a motion tracking system, the second data indicating a tracked trajectory of the user device, associating the second data with third time stamps, the third time stamps generated based on the local time of the computer system, and generating an evaluation of the AR application based on the first data, the second data, the second time stamps, and the third time stamps.
  • AR augmented reality
  • determining the time offset includes: prior to receiving the first data, receiving time data from the user device, the time data indicating the local time of the user device, and determining the time offset based on a comparison of the time data with the local time of the computer system.
  • the first data is received over a time period
  • the second data is received over the time period
  • Generating the evaluation includes: generating a relative timeline between the first data and the second data based on the second time stamps and the third time stamps, determining associations between first pose data from the first data and second pose data from the second data based on the relative timeline, and computing an evaluation metric based on the associations.
  • the first data includes pose data and the first time stamps.
  • the time data includes an identifier of the user device, the first local time, and a time baseline associated with generating the first data.
  • one or more non-transitory computer-storage media storing instructions that, upon execution on a computer system, cause the computer system to perform operations.
  • the operations include: determining a time offset between a local time of the computer system and a local time of a user device based on an execution of an augmented reality (AR) application on the user device, receiving first data from the user device, the first data indicating an estimated trajectory of the user device and generated by the AR application, the first data including first time stamps generated based on the local time of the user device, associating the first data with second time stamps, the second time stamps generated based on the time offset and being different from the first time stamps, receiving second data from a motion tracking system, the second data indicating a tracked trajectory of the user device, associating the second data with third time stamps, the third time stamps generated based on the local time of the computer system, and generating an evaluation of the AR application based on the first data, the second data, the second time stamps, and the third time stamps.
  • the first data includes first pose data of the user device.
  • the first pose data includes position data and orientation data and is generated by a simultaneous localization and mapping (SLAM) process of the AR application.
  • SLAM simultaneous localization and mapping
  • the second data includes second pose data of the user device.
  • the second pose data is generated by the motion tracking system.
  • the first pose data and the first time stamps are received over a first socket associated with the AR application.
  • the second pose data is received based on a second socket associated with a motion tracking application of the motion tracking system.
  • generating the evaluation includes: generating a relative timeline between the first data and the second data based on the second time stamps and the third time stamps, determining associations between first pose data from the first data and second pose data from the second data based on the relative timeline, and computing an evaluation metric based on the associations.
  • embodiments of the present disclosure involve methods and systems that implement techniques for evaluating the AR tracking by an AR application executing on a user device. These techniques enable quantitative and qualitative measurements of how well the AR application tracks the actual pose (e.g., position and orientation) of the user device. By doing so, refinements to the AR application and/or user device are possible and can improve the AR tracking and the resulting overall AR experience.
  • FIG. 1 illustrates an example of a user device that includes a camera and an inertial measurement unit (IMU) sensor for AR applications, according to at least one embodiment of the disclosure;
  • IMU inertial measurement unit
  • FIG. 2 illustrates an example of an AR evaluation system, according to at least one embodiment of the disclosure
  • FIG. 3 illustrates an example of a time offset, according to at least one embodiment of the disclosure
  • FIG. 4 illustrates an example of aligning AR data and motion tracking data to enable an AR evaluation, according to at least one embodiment of the disclosure
  • FIG. 5 illustrates an example of a sequence diagram showing interactions between components of an AR evaluation system, according to at least one embodiment of the disclosure
  • FIG. 6 illustrates an example of a flow for performing an AR evaluation, according to at least one embodiment of the disclosure.
  • FIG. 7 illustrates examples of components of a computer system, according to at least one embodiment of the disclosure.
  • Embodiments of the present disclosure are directed to, among other things, evaluating the performance of an AR application that is executing on a user device.
  • the performance can be evaluated by collecting AR data generated by the AR application and ground truth data generated by a motion tracking system.
  • the AR data includes pose data of the user device (e.g., the user device’s position and orientation) as estimated by the AR application.
  • the estimated pose data is tracked over time and the tracking indicates an estimated trajectory of the user device by the AR application.
  • the ground truth data includes pose data of the user device as detected by the motion tracking system. This pose data is also tracked over time and the tracking indicates an actual trajectory of the user device as detected by the motion tracking system (referred to herein as a ground truth trajectory) .
  • the AR data and the ground truth data are synchronized (e.g., along the time dimension) such that a proper analysis of the estimated trajectory can be performed based on the ground truth trajectory.
  • Accuracy of the estimated pose data can be derived from the analysis and represents an example evaluation of the AR application’s performance.
  • a smartphone that hosts an AR application.
  • the smartphone is placed within a field of view of a motion tracking system.
  • the smartphone and the motion tracking system are communicatively coupled to a server.
  • the AR application is executed for ten minutes (or some other period of time) .
  • the AR application executes a simultaneous localization and mapping (SLAM) process that, among other things, estimates the 6DoF pose of the smartphone at a particular rate (e.g., twenty frames per second or some other rate) .
  • the AR data is output to the server and includes estimated 6DoF data and time stamps that depend on the rate (e.g., the pose at every fifty milliseconds corresponding to the twenty frames per second rate) .
  • SLAM simultaneous localization and mapping
  • the server receives and stores the estimated 6DoF data along with the corresponding time stamps. Also during the ten-minute AR session, the motion tracking system tracks the actual pose of the smartphone and sends the tracked data to the server. The server receives the actual 6DoF data and generates time stamps corresponding to the time of receipt. Accordingly, upon the end of the AR session, the server has collected estimated 6DoF data and associated time stamps from the smartphone and actual 6DoF data from the motion tracking system and has generated time stamps for the actual 6DoF data. The server performs a time synchronization of the estimated 6DoF data and the actual 6DoF data given the time stamps.
  • the server derives one or more metrics based on an analysis of the two sets, such as an average difference between the estimated pose and the actual pose and the variance thereof.
  • the metrics can be presented at a user interface in an evaluation report of the AR application’s performance.
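To make such a metric concrete, the following is a minimal Python sketch of the computation described above: the average difference between estimated and actual positions and its variance over an AR session. It assumes the estimated and ground-truth poses have already been time-aligned into matched pairs; the class and function names are illustrative and not part of the patent.

```python
# Minimal sketch: mean and variance of the position error between
# time-aligned estimated and ground-truth poses.
from dataclasses import dataclass
from math import sqrt
from statistics import mean, pvariance
from typing import List, Tuple


@dataclass
class Pose:
    x: float
    y: float
    z: float


def evaluate_position_error(pairs: List[Tuple[Pose, Pose]]) -> Tuple[float, float]:
    """Return (mean, variance) of the Euclidean distance between each
    estimated pose and its associated ground-truth pose."""
    errors = [
        sqrt((e.x - g.x) ** 2 + (e.y - g.y) ** 2 + (e.z - g.z) ** 2)
        for e, g in pairs
    ]
    return mean(errors), pvariance(errors)
```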
  • Embodiments of the present disclosure provide various technical advantages. For example, conventional systems do not provide a framework for evaluating trajectory estimation by an AR application on a mobile user device, such as a smartphone.
  • embodiments of the present disclosure enable the evaluation of the performance of an AR application independently of the type of the user device and/or of the operating system of such a user device.
  • the embodiments are scalable to different user devices and operating systems.
  • the embodiments enable user-defined metrics for the evaluation.
  • the evaluation can be customized to output quantitative and/or qualitative measurements of the AR application’s performance.
  • refinements to the AR application and/or user devices become possible.
  • FIG. 1 illustrates an example of a user device 110 that includes a camera 112 and an inertial measurement unit (IMU) sensor 114 for AR applications, according to at least one embodiment of the disclosure.
  • the AR applications can be implemented by an AR module 116 of the user device 110.
  • the camera 112 generates images of a real-world environment that includes, for instance, a real-world object 130.
  • the camera 112 can also include a depth sensor that generates depth data about the real-world environment, where this data includes, for instance, a depth map that shows depth (s) of the real-world object 130 (e.g., distance (s) between the depth sensor and the real-world object 130) .
  • the IMU sensor 114 can include a gyroscope and an accelerometer, among other components, and can output IMU data including, for instance, an orientation of the user device 110.
  • Image data of the images generated by the camera 112 in an AR session and the IMU data generated by the IMU sensor 114 in the AR session can be input to a SLAM process executed by the AR module 116.
  • the SLAM process outputs a 6DoF pose (e.g., position along the X, Y, and Z axes and rotation along each of such axes) of the user device 110 relative to the real-world environment and a map of the real-world environment.
  • the SLAM process tracks the 6DoF pose and the map over time based on the images and the IMU data that are input at a particular frame rate.
  • the tracking of the 6DoF pose represents an estimated trajectory of the user device 110 in the real-world environment and this estimated trajectory can be mapped to a virtual trajectory in the map over time.
  • the 6DoF pose includes pose data, such as position data along the X, Y, and Z axes and rotation data along each of such axes.
  • the pose data and time stamps for when each of the pose data is generated are illustrated as part of AR data 118 generated by the AR module 116 during the AR session.
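As an illustration of one sample of the AR data 118, the following Python sketch defines a timestamped 6DoF record (position plus an orientation quaternion). The field names and units are assumptions made for illustration only.

```python
# Illustrative container for one AR data sample: a 6DoF pose and the
# time stamp at which the SLAM process produced it.
from dataclasses import dataclass


@dataclass
class TimestampedPose:
    timestamp_s: float  # device local time when the pose was estimated
    x: float            # position along X
    y: float            # position along Y
    z: float            # position along Z
    qw: float           # orientation quaternion (w, x, y, z)
    qx: float
    qy: float
    qz: float
```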
  • the AR module 116 can be implemented as specialized hardware and/or a combination of hardware and software (e.g., general purpose processor and computer-readable instructions stored in memory and executable by the general purpose processor) .
  • the AR module 116 renders an AR scene 120 of the real-world environment in the AR session, where this AR scene 120 can be presented at a graphical user interface (GUI) on a display of the user device 110.
  • the AR scene 120 shows a real-world object representation 122 of the real-world object 130.
  • the AR scene 120 shows a virtual object 124 not present in the real-world environment.
  • the AR module 116 relies on the AR data 118 and the map of the real-world environment.
  • the user device 110 represents a suitable user device that includes, in addition to the camera 112 and the IMU sensor 114, one or more graphical processing units (GPUs) , one or more general purpose processors (GPPs) , and one or more memories storing computer-readable instructions that are executable by at least one of the processors to perform various functionalities of the embodiments of the present disclosure.
  • the user device 110 can be any of a smartphone, a tablet, an AR headset, or a wearable AR device.
  • FIG. 2 illustrates an example of an AR evaluation system 200, according to at least one embodiment of the disclosure.
  • the AR evaluation system 200 includes a user device 210, a motion tracking system 220, and a computer system 230. While being in a field of view of the motion tracking system 220, the user device 210 executes an AR application 212 that outputs AR data 214 to the computer system 230.
  • the motion tracking system 220 tracks the actual pose of the user device 210 and outputs motion tracking data 224 to the computer system 230.
  • the computer system 230 receives the AR data 214 and the motion tracking data 224 and performs an evaluation of the performance of the AR application.
  • the user device 210 is an example of the user device 110 of FIG. 1.
  • the AR application 212 executes a SLAM process to generate the AR data 214.
  • the AR data 214 includes 6DoF data and time stamps. At each time stamp, the corresponding 6DoF data indicates the position and orientation of the user device 210 at that point in time.
  • the 6DoF data over time indicates the estimated trajectory of the user device 210.
  • the AR data 214 can also include a map generated by the SLAM process, where this map can be defined in a coordinate system.
  • the motion tracking system 220 includes an optical system configured to perform motion capture.
  • the motion capture can be implemented as a process that records the movement of objects, such as the user device 210, in the three-dimensional (3D) real-world environment.
  • the movement can be sampled at a particular sampling rate and recorded as pose data at that rate, including position data and orientation data.
  • the optical system includes a set of cameras (e.g., at least two cameras) and a tracking application 222.
  • the tracking application 222 processes images generated by the cameras to detect features, generate a map of the real-world environment in a coordinate system, and track these features in the map over time. When the features belong to the user device 210, the movement of the user device 210 is tracked.
  • the features can include a set of active and/or passive markers (e.g., at least three markers) attached to the user device 210 or fiducials that are inherent to the user device 210 and that are detectable from the images (e.g., particular sections of an exterior surface of the user device 210) .
  • the tracking application 222 can output the motion tracking data 224.
  • the motion tracking data 224 includes the pose data of the user device 210 as detected by the motion tracking system 220 over time in the coordinate system.
  • Such an optical system can be available from VICON, of Oxford, United Kingdom or from OPTITRACK, of Corvallis, Oregon, United States of America.
  • the computer system 230 can be implemented as a server that hosts an AR evaluation application 232.
  • the server receives the AR data 214 from the user device 210 and stores the AR data 214 in local memory as AR data 234.
  • the server receives the motion tracking data 224 from the motion tracking system 220 and stores the motion tracking data in the local memory as motion tracking data 236.
  • the computer system 230 is communicatively coupled with the user device 210 and the motion tracking system 220 over one or more wireless and/or wired networks.
  • the communicative coupling can rely on a particular network topology, such as one including a peer-to-peer network between the computer system 230 and each of the user device 210 and the motion tracking system 220, or such as one including other network nodes such as an access point.
  • the latency of the network between the computer system 230 and the user device 210 can be estimated as a round-trip time (RTT) or the time to first byte (TTFB) .
  • the latency of the network between the computer system 230 and the motion tracking system 220 can be similarly estimated. If a network latency is smaller than a threshold latency, such as ten milliseconds or some other predefined value, this network latency can be ignored.
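The following is a hedged Python sketch of estimating such a latency as a round-trip time over an already-open TCP connection and deciding whether it can be ignored. The echo convention and the helper names are assumptions, not part of the described system.

```python
# Sketch: time a small probe/echo exchange to estimate RTT, and treat the
# latency as negligible when it is below a threshold (e.g., 10 ms).
import socket
import time


def estimate_rtt_ms(sock: socket.socket, probe: bytes = b"ping") -> float:
    start = time.monotonic()
    sock.sendall(probe)
    sock.recv(len(probe))  # assumes the peer echoes the probe back
    return (time.monotonic() - start) * 1000.0


def latency_is_negligible(rtt_ms: float, threshold_ms: float = 10.0) -> bool:
    return rtt_ms < threshold_ms
```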
  • the communications between the computer system 230 and the user device 210 and the motion tracking system 220 are performed using sockets via the one or more networks.
  • the AR evaluation application 232 receives the AR data 214 from the AR application 212 over a first socket associated with a first internet protocol (IP) address and a first port.
  • IP internet protocol
  • the AR evaluation application 232 receives the motion tracking data 224 from the tracking application 222 over a second socket associated with a second IP address and a second port.
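A minimal Python sketch of this two-socket arrangement is shown below: one listener for the AR application and one for the tracking application, with each received chunk stamped with the server's local time on receipt. The port numbers, framing, and single-connection handling are assumptions made for illustration.

```python
# Sketch: two TCP listeners, one per data source, each record time-stamped
# with the server's local clock when it arrives.
import socket
import threading
import time


def listen(port: int, store: list) -> None:
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", port))
    srv.listen(1)
    conn, _ = srv.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            # Time stamp on receipt using the server's local (UTC-based) clock.
            store.append((time.time(), data))


ar_records, tracking_records = [], []
# Hypothetical ports; in practice the sockets would be configured as needed,
# and a real server would keep these threads running in a service loop.
threading.Thread(target=listen, args=(9001, ar_records), daemon=True).start()
threading.Thread(target=listen, args=(9002, tracking_records), daemon=True).start()
```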
  • the AR evaluation application 232 aligns the AR data 234 (or, equivalently, the received AR data 214 prior to storage as the AR data 234) with the motion tracking data 236 (or, equivalently, the received motion tracking data 224 prior to storage as the motion tracking data 236) .
  • Different types of alignment can be performed.
  • a first type relates to aligning in space the estimated pose data from the AR data 214 with the detected pose data from the motion tracking data 224 in a same coordinate system.
  • each of the user device 210 and the motion tracking system 220 can generate its set of pose data in a coordinate system.
  • the AR evaluation application 232 can determine a transform between the coordinate systems or between each of the coordinate systems and a local coordinate system such that the pose data estimated by the AR application 212 and the pose data detected by the tracking application 222 can be transformed to and defined in the same coordinate system.
  • the transformation can be generated based on images received with the AR data 214 and the motion tracking data 224 and can be anchored to a detected feature of the real-world environment.
  • the AR data 214 can include, in addition to the estimated pose data, time stamps. These time stamps are generated by the AR application 212 depending on the frame rate of the SLAM process and are expressed as a function of a local time of the user device 210.
  • the local time can be the coordinated universal time (UTC) tracked by the user device 210.
  • the motion tracking data 224 does not include time stamps. Instead, the AR evaluation application 232 generates time stamps for the detected pose data, where the time stamps correspond to the timing of when the motion tracking data 224 is received.
  • the generated time stamps can be expressed as a function of a local time of the computer system 230.
  • This local time can be the UTC time tracked by the computer system 230.
  • an offset may exist between the local time of the user device 210 and the local time of the computer system 230.
  • An offset may also exist between the start of the time stamping by the AR application 212 and the start of the time stamping by the AR evaluation application 232.
  • the AR evaluation application 232 computes and stores such offsets in the local memory as the time offset 238. Accordingly, the AR evaluation application 232 shifts the time stamps of the AR data 214 (or, equivalently, of the AR data 234) or the time stamps generated for the motion tracking data 224 (or, equivalently, of the motion tracking data 236) by the time offset 238 such that the two sets of data are aligned relative to a same time line.
  • the AR evaluation application 232 can receive time data 216 from the AR application 212.
  • the time data 216 can identify the local time of the user device 210 and the specific time of the first time stamp (e.g., the start of the time stamping by the AR application 212) relative to the local time.
  • the AR evaluation application 232 compares the local time identified in the time data 216 with the local time of the computer system 230, compares the start of time stamping of the AR application 212 with the start of its time sampling, and computes the time offset 238 based on the comparisons. This computation is further illustrated in FIG. 3.
  • FIG. 3 illustrates an example of a time offset, according to at least one embodiment of the disclosure.
  • a user device such as the user device 210 of FIG. 2 generates time stamps relative to its local time (illustrated as a device local time 310) .
  • the start of the time stamping by the user device is shown as a sampling start 312.
  • a computer system such as the computer system 230 of FIG. 2 generates time stamps relative to its local time (illustrated as a system local time 320) .
  • the start of the time stamping by the computer system is shown as a sampling start 322.
  • a first offset 330 exists between the device local time 310 and the system local time 320.
  • an AR evaluation application of the computer system can compute the first offset 330 as the time difference between the two local times 310 and 320.
  • the AR evaluation application 232 can shift, by the first offset 330, the sampling start 322 (or, equivalently, the sampling start 312) , such that the two starts are defined relative to a same local time.
  • the remaining time difference between the two sampling starts 312 and 322 indicates a second offset 340.
  • This second offset 340 can be a function of several factors.
  • the second offset 340 can depend on a delay time predefined and available from the software SDK of the tracking application.
  • the delay time indicates the processing latency of the tracking application.
  • the second offset 340 can additionally or alternatively be a function of network latencies when non-negligible (e.g., greater than a predefined threshold latency) .
  • the difference between the network latency for receiving data from the user device and the network latency for receiving data from the motion tracking system can be added to the second offset 340.
  • a time offset (e.g., the time offset 238 of FIG. 2) can include the first offset 330, the second offset 340, and/or the sum of the first offset 330 and the second offset 340.
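The offset computation of FIG. 3 can be sketched in Python as follows, assuming the device's UTC time, the computer system's UTC time, the tracking application's predefined delay, and optional network latencies are available as numbers of seconds. The parameter names are illustrative assumptions, not the patent's interfaces.

```python
# Sketch: first offset = clock difference, second offset = tracking delay
# plus the network-latency difference when it is not negligible.
def compute_time_offset(device_utc_s: float,
                        system_utc_s: float,
                        tracker_delay_s: float = 0.0,
                        device_latency_s: float = 0.0,
                        tracker_latency_s: float = 0.0,
                        latency_threshold_s: float = 0.010) -> float:
    first_offset = system_utc_s - device_utc_s      # difference between local clocks
    second_offset = tracker_delay_s                 # processing delay of the tracker SDK
    latency_diff = device_latency_s - tracker_latency_s
    if abs(latency_diff) >= latency_threshold_s:    # ignore negligible latencies
        second_offset += latency_diff
    return first_offset + second_offset
```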
  • the AR data 234 and the motion tracking data 236 can be analyzed to determine a performance of the AR application 212.
  • the performance is determined as a set of metrics that compares the two sets of data.
  • the type of metrics can be defined via a user interface to the AR evaluation application 232.
  • user input can be received via the user interface and can define specific metric types.
  • the user input can request the average of the difference (e.g., distance and angle) between the estimated pose and the actual pose and the variance thereof over a particular period of time (e.g., ten minutes) .
  • the AR evaluation application 232 analyzes the AR data 234 and the motion tracking data 236 and outputs the metric (s) as an evaluation of the AR application 212.
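For the angular part of such a user-defined metric, a small Python sketch is given below; it computes the rotation difference between an estimated pose and the associated ground-truth pose from their unit quaternions (the distance part is sketched earlier). The (qw, qx, qy, qz) layout is an assumption made for illustration.

```python
# Sketch: rotation difference between two unit quaternions, in radians.
from math import acos


def rotation_error_rad(q_est, q_gt):
    """Angle between two unit quaternions given as (qw, qx, qy, qz) tuples."""
    dot = abs(sum(a * b for a, b in zip(q_est, q_gt)))
    return 2.0 * acos(min(1.0, dot))  # clamp guards against float round-off
```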
  • FIG. 2 illustrates a single user device and a single motion tracking system 220
  • the embodiments of the present disclosure are not limited as such.
  • multiple user devices can be in the field of view of the motion tracking system 220, each of which can be executing an AR application that sends AR data to the computer system 230 via a corresponding socket.
  • multiple motion tracking systems can be used, each of which has at least one user device in its field of view and executes a tracking application that sends motion tracking data to the computer system 230 via a corresponding socket.
  • the computer system 230 can collect AR data from one or more user devices and motion tracking data from one or more motion tracking systems at the same time.
  • the AR evaluation system 200 adopts a client-server architecture.
  • there are two types of clients in the AR evaluation system 200.
  • One type includes the user device 210 (and any other such user device) that provides the estimated 6DoF pose.
  • the other type includes the motion tracking system 220 (and any other such measurement device) that provides the actual 6DoF data for use as ground truth data.
  • the computer system 230 is the server implemented as a local workstation that collects all the data from the clients and processes them with evaluation algorithms.
  • the server can be set up on a workstation running the Microsoft Windows operating system.
  • when the AR evaluation application 232 is launched, it first starts up a Winsock program.
  • the Winsock program is a programming interface that supports network communication on Windows systems.
  • the server resides in this program with a specified socket type, protocol, and flags. After the initialization is done, the server resolves the server address and port. The server keeps listening to a client until the client is shut off.
  • a first client is the AR application 212.
  • the AR application 212 sends data that follows certain patterns (e.g., data patterns) to connect/disconnect, send SLAM data, send images, and send tasks that handle types of data transfer requests.
  • the format of the data transmission of the user device can be defined according to what it transmits.
  • for time data, the data follows the data pattern of: time flag, device name, device local time, pose time baseline (e.g., start of the time sampling or a boot time of the AR application) .
  • to start the data transmission, the data follows the data pattern of: start flag.
  • for pose data, the data follows the data pattern of: pose flag, pose data (e.g., position (x, y, z) and quaternion (qw, qx, qy, qz)) , sensor data (e.g., IMU data, image data) , end pose flag.
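Because the text does not specify the wire format, the following Python sketch assumes a simple comma-separated encoding of the three data patterns above, with hypothetical flag strings ("TIME", "START", "POSE"); it is only meant to illustrate how such patterns could be parsed on the server side.

```python
# Sketch: parse one message whose layout follows the assumed comma-separated
# encoding of the time, start, and pose data patterns.
def parse_message(line: str) -> dict:
    fields = line.strip().split(",")
    flag = fields[0]
    if flag == "TIME":
        # time flag, device name, device local time, pose time baseline
        return {"type": "time", "device": fields[1],
                "device_utc_s": float(fields[2]), "baseline_s": float(fields[3])}
    if flag == "START":
        return {"type": "start"}
    if flag == "POSE":
        # pose flag, time stamp (assumed), position (x, y, z),
        # quaternion (qw, qx, qy, qz); trailing sensor fields and the
        # end pose flag are ignored in this sketch.
        return {"type": "pose", "timestamp_s": float(fields[1]),
                "position": tuple(map(float, fields[2:5])),
                "quaternion": tuple(map(float, fields[5:9]))}
    raise ValueError(f"unknown flag: {flag}")
```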
  • a second client is the tracking application 222.
  • the tracking application 222 is based on the application programming interface (API) of the motion tracking system 220.
  • API application programming interface
  • time synchronization can be solved as follows.
  • Each of the user device 210 and the computer system 230 follows its own rule to count the time.
  • UTC time can be used as the baseline to synchronize the time.
  • the ground truth time can be transferred to the user device’s 210 timeline.
  • the user device’s 210 current UTC time and the pose time baseline in UTC are available from the received data.
  • the delay time is predefined and available from the software SDK of the tracking application 222.
  • the computer system’s 230 UTC time is used to time stamp the ground truth data (e.g., the motion tracking data 224) upon receipt from the tracking application 222.
  • a ground truth timeline relative to the user device’s 210 timeline can be computed from the offset between the UTC times (e.g., the first offset 330) and the delay (e.g., corresponding to the second offset 340) .
  • the poses from user device 210 are evaluated based on a comparison to the ground truth data according to the user-requested metric types.
  • FIG. 4 illustrates an example of aligning AR data 410 and motion tracking data 420 to enable an AR evaluation, according to at least one embodiment of the disclosure.
  • the aligning can be performed by an AR evaluation application of a computer system, such as the AR evaluation application 232 of FIG. 2.
  • the AR data 410 is an example of the AR data 214 or 234 of FIG. 2.
  • the motion tracking data 420 is an example of the motion tracking data 224 or 236 of FIG. 2.
  • the AR data 410 is shifted in time by a time offset 402 to generate AR data 430 such that the AR data 430 and the motion tracking data 420 are relatively time aligned.
  • the shifting can be applied to the motion tracking data 420 instead, such that the shifted motion tracking data 420 is time aligned with the AR data 410.
  • the AR data 410 includes estimated pose data that is generated by the AR application of a user device and the corresponding time stamps generated also by the AR application. For instance, at “time 1, ” the estimated pose data indicates that the user device is estimated to have a “pose 1, ” where the “time 1” is defined in the local time of the user device.
  • the shifted AR data 430 can be generated by shifting each of the time stamps by the time offset 402 to define the time line of the estimated pose data in the local time of the computer system. For instance, “time 1” shifts to become “time 1’” defined in the local time of the computer system. Accordingly, the AR data 430 includes the same estimated pose data as the AR data 410, but each of the corresponding estimated poses is associated with a shifted time stamp defined in the local time of the computer system.
  • for instance, the time offset 402 is two seconds.
  • “time 1” is 12:00:00 pm in the UTC time of the user device.
  • this time stamp is shifted to “time 1’,” where “time 1’” is 12:00:02 pm in the UTC time of the computer system.
  • the motion tracking data 420 includes ground truth pose data of the user device as detected by a tracking application of a motion tracking system.
  • upon receiving such pose data, the AR evaluation application generates time stamps corresponding to the timing of receiving the ground truth pose data.
  • the time stamps are defined in the local time of the computer system. For instance, at “time A,” the ground truth pose data indicates that the user device is detected to have a “pose A,” where “time A” is defined in the local time of the computer system. Continuing with the previous example, “time A” is 12:00:02 pm in the UTC time of the computer system.
  • the AR evaluation application generates associations 440 between the shifted AR data 430 and the motion tracking data 420.
  • for instance, an association links the estimated “pose 1” with the ground truth “pose A,” enabling a comparison between the two (e.g., the distance between them).
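The association step can be sketched as a nearest-time-stamp match, as in the following Python snippet. Both inputs are assumed to be sorted lists of (time stamp, pose) tuples already expressed on the common timeline, and the tolerance value is an illustrative choice rather than something specified in the patent.

```python
# Sketch: match each estimated pose to the ground-truth pose whose time stamp
# is closest, within a tolerance, over two time-sorted sequences.
def associate(estimated, ground_truth, tolerance_s: float = 0.025):
    """Return a list of (estimated_pose, ground_truth_pose) pairs."""
    pairs, j = [], 0
    for t_est, pose_est in estimated:
        # Advance the ground-truth cursor while the next sample is at least
        # as close in time as the current one.
        while (j + 1 < len(ground_truth)
               and abs(ground_truth[j + 1][0] - t_est) <= abs(ground_truth[j][0] - t_est)):
            j += 1
        t_gt, pose_gt = ground_truth[j]
        if abs(t_gt - t_est) <= tolerance_s:
            pairs.append((pose_est, pose_gt))
    return pairs
```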
  • FIG. 5 illustrates an example of a sequence diagram showing interactions between components of an AR evaluation system, according to at least one embodiment of the disclosure.
  • the components include a user device 510, a motion tracking system 520, and a computer system 530, similar to the user device 210, the motion tracking system 220, and the computer system 230 of FIG. 2.
  • the user device 510 sends time data to the computer system 530.
  • the time data includes, for instance, a current local time of the user device and a pose time baseline (e.g., the start of the time sampling by the user device, or a boot time of the AR application) .
  • the time data can follow a first data pattern, such as one following the pattern of: time flag, device name, device local time, pose time baseline.
  • the computer system 530 determines a time offset based on the time data.
  • the time offset can be a function of the difference between the user device’s 510 local time and the computer system’s 530 local time.
  • the time offset can also be a function of the start of the time sampling by the user device 510 and the start of the time sampling by the computer system 530.
  • the computer system’s 530 sampling start can depend on (e.g., be equivalent to) a delay predefined and available from an SDK of a tracking application of the motion tracking system 520.
  • the user device 510 sends AR data to the computer system 530.
  • the AR data includes estimated pose data and time stamps.
  • the computer system 530 receives the AR data over a period of time.
  • the AR data can be received by an AR evaluation application of the computer system 530 from the AR application of the user device 510 via a socket.
  • the AR data can follow the data pattern of: pose flag, pose data (e.g., position (x , y, z) and quaternion (qw, qx, qy, qz) ) , sensor data (e.g., IMU data, image data) , end pose flag.
  • the motion tracking system 520 sends motion tracking data to the computer system 530.
  • the motion tracking data includes ground truth pose data but not time stamps.
  • the computer system 530 receives the motion tracking data over the same period of time.
  • the motion tracking data can be received by the AR evaluation application from the tracking application of the motion tracking system 520 via a different socket.
  • the AR evaluation application can time stamp the ground truth data upon receipt.
  • the computer system 530 determines a relative timeline between the AR data and the motion tracking data.
  • the relative timeline can be derived by shifting the time stamps of the estimated pose data by the time offset.
  • the relative timeline can be derived by shifting the time stamps generated for the ground truth pose data by the time offset.
  • the computer system 530 determines associations between the AR data and the motion tracking data based on the relative timeline. For instance, once the time alignment is complete, the computer system 530 associates, as applicable, some or each of the estimated pose data with one of the ground truth pose data, where an association is determined by matching a time stamp of the estimated pose data with a time stamp of the ground truth pose data.
  • in a sixth step of the sequence diagram, the computer system 530 generates an evaluation of the AR application. For instance, the estimated pose data is compared to the ground truth data to derive specific evaluation metrics.
  • FIG. 6 illustrates an example of a flow for performing an AR evaluation, according to at least one embodiment of the disclosure.
  • the flow is described in connection with a computer system that is an example of the computer system 230 of FIG. 2.
  • Some or all of the operations of the flows can be implemented via specific hardware on the computer system and/or can be implemented as computer-readable instructions stored on a non-transitory computer-readable medium of the computer system.
  • the computer-readable instructions represent programmable modules that include code executable by a processor of the computer system. The execution of such instructions configures the computer system to perform the respective operations.
  • Each programmable module in combination with the processor represents a means for performing a respective operation (s) . While the operations are illustrated in a particular order, it should be understood that no particular order is necessary and that one or more operations may be omitted, skipped, performed in parallel, and/or reordered.
  • the flow starts at operation 602, where the computer system receives time data indicating a local time of a user device.
  • the time data can also indicate a baseline time for time stamping of pose data generated by the user device.
  • the flow includes operation 604, where the computer system determines a time offset based on the time data and a local time of the computer system.
  • the time offset can also be determined based on the timing of the start of sampling by the computer system of motion tracking data, where this timing can be a function of a delay predefined and available from an SDK of a tracking application of a motion tracking system.
  • the flow includes operation 606, where the computer system receives first data indicating an estimated trajectory of the user device and including first time stamps.
  • the first data is AR data that includes estimated pose data and corresponding time stamps.
  • the flow includes operation 608, where the computer system associates the first data with second time stamps based on the first time stamps and the time offset. For instance, the computer system shifts the time stamps of the estimated pose data by the time offset. The shifted time stamps are the second time stamps.
  • the flow includes operation 610, where the computer system receives second data indicating a tracked trajectory of the user device.
  • the second data is received from the motion tracking system and includes motion tracking data.
  • the motion tracking data includes pose data of the user device as detected by the motion tracking system.
  • the pose data is used as ground truth pose data.
  • the tracked trajectory corresponds to the ground truth pose data over time.
  • the flow includes operation 612, where the computer system generates third time stamps based on the local time of the computer system. For instance, upon receiving each data of the motion tracking data, the computer system generates a corresponding time stamp, where this time stamp is defined in the local time of the computer system.
  • the flow includes operation 614, where the computer system associates the second data with the third time stamps. For instance, the computer system determines, for each of the estimated pose data, the corresponding time stamp as shifted to the local time of the computer system. This time stamp is matched with one of the third time stamps that are defined in the local time of the computer system.
  • the flow includes operation 616, where the computer system determines associations between the first data and the second data based on correspondences between the second time stamps and the third time stamps. For instance, and referring back to operation 614, a second time stamp corresponds to particular estimated pose data. The matched third time stamp corresponds to particular ground truth pose data.
  • the computer system generates and stores an association between the particular estimated pose data and the particular ground truth pose data, indicating that, at a particular time (e.g., one corresponding to the second time stamp or, equivalently, the matched third time stamp) , the AR application estimated the user device to have a particular pose and the tracking application detected the user device to have that same or a different particular pose (depending on how well the AR application tracks the user device’s pose) .
  • the flow includes operation 618, where the computer system generates an evaluation of the AR application based on the associations. For instance, the estimated pose data is compared to the associated ground truth data to derive specific evaluation metrics.
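Tying the flow together, the following Python sketch walks through operations 602 to 618 using the helper functions sketched earlier (compute_time_offset and associate); the record layouts, dictionary keys, and the final metric are illustrative assumptions rather than the patent's interfaces.

```python
# Sketch of the end-to-end flow, assuming ar_records and tracking_records are
# lists of (timestamp_s, pose) tuples and each pose is (x, y, z, qw, qx, qy, qz).
from math import sqrt
from statistics import mean


def evaluate_ar_application(time_data, ar_records, tracking_records,
                            tracker_delay_s=0.0):
    # Operations 602/604: compute the time offset from the device's time data.
    offset_s = compute_time_offset(time_data["device_utc_s"],
                                   time_data["system_utc_s"],
                                   tracker_delay_s)
    # Operations 606/608: shift the first time stamps to obtain the second ones.
    shifted = [(t + offset_s, pose) for t, pose in ar_records]
    # Operations 610-614: tracking_records were already stamped with the
    # computer system's local time (the third time stamps) on receipt.
    # Operation 616: associate estimated poses with ground-truth poses.
    pairs = associate(shifted, tracking_records)
    # Operation 618: a simple example metric, here the mean position error.
    errors = [sqrt(sum((e[i] - g[i]) ** 2 for i in range(3))) for e, g in pairs]
    return {"mean_position_error": mean(errors) if errors else None}
```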
  • FIG. 7 illustrates examples of components of a computer system, according to at least one embodiment of the disclosure.
  • the computer system 700 is an example of the computer system 230 described herein above. Although these components are illustrated as belonging to a same computer system 700, the computer system 700 can also be distributed.
  • the computer system 700 includes at least a processor 702, a memory 704, a storage device 706, input/output peripherals (I/O) 708, communication peripherals 710, and an interface bus 712.
  • the interface bus 712 is configured to communicate, transmit, and transfer data, controls, and commands among the various components of the computer system 700.
  • the memory 704 and the storage device 706 include computer-readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM) , hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage, for example memory, and other tangible storage media. Any of such computer readable storage media can be configured to store instructions or program codes embodying aspects of the disclosure.
  • the memory 704 and the storage device 706 also include computer readable signal media.
  • a computer readable signal medium includes a propagated data signal with computer readable program code embodied therein. Such a propagated signal takes any of a variety of forms including, but not limited to, electromagnetic, optical, or any combination thereof.
  • a computer readable signal medium includes any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computer system 700.
  • the memory 704 includes an operating system, programs, and applications.
  • the processor 702 is configured to execute the stored instructions and includes, for example, a logical processing unit, a microprocessor, a digital signal processor, and other processors.
  • the memory 704 and/or the processor 702 can be virtualized and can be hosted within another computer system of, for example, a cloud network or a data center.
  • the I/O peripherals 708 include user interfaces, such as a keyboard, screen (e.g., a touch screen) , microphone, speaker, other input/output devices, and computing components, such as graphical processing units, serial ports, parallel ports, universal serial buses, and other input/output peripherals.
  • the I/O peripherals 708 are connected to the processor 702 through any of the ports coupled to the interface bus 712.
  • the communication peripherals 710 are configured to facilitate communication between the computer system 700 and other computer systems over a communications network and include, for example, a network interface controller, modem, wireless and wired interface cards, antenna, and other communication peripherals.
  • a computer system can include any suitable arrangement of components that provide a result conditioned on one or more inputs.
  • Suitable computer systems include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computer system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computer system.
  • Embodiments of the methods disclosed herein may be performed in the operation of such computer systems.
  • the order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
  • the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited.
  • use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Optics & Photonics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

Techniques for evaluating the performance of an AR application are described. In an example, the AR application is executed on a user device. The user device is communicatively coupled with a server and is in a field of view of a motion tracking system. The AR application estimates and sends pose data of the user device to the server. A tracking application of the motion tracking system also detects the pose of the user device in parallel and sends the resulting pose data to the server. An AR evaluation application of the server aligns the estimated pose data and the detected pose data across at least a time dimension and uses the detected pose data as ground truth data to evaluate the estimated pose data and output an evaluation of the AR application.

Description

EVALUATING POSE DATA OF AN AUGMENTED REALITY (AR) APPLICATION
BACKGROUND OF THE INVENTION
Augmented Reality (AR) superimposes virtual content over a user’s view of the real world. With the development of AR software development kits (SDK) , the mobile industry has brought smartphone AR to the mainstream. An AR SDK typically provides six degrees-of-freedom (6DoF) tracking capability. A user can scan the environment using a smartphone’s camera, and the smartphone performs visual inertial odometry (VIO) in real time. Once the camera pose is tracked continuously, virtual objects can be placed into the AR scene to create an illusion that real objects and virtual objects
are merged together.
The quality of the AR experience can depend on how well virtual objects are placed in the AR scene. In turn, proper placement of the virtual objects can depend on how well the AR tracking is performed. Accordingly, there is a need in the art for improved methods and systems related to performing and evaluating AR tracking.
SUMMARY OF THE INVENTION
The present invention relates generally to methods and systems for evaluating the performance of an AR application including, for example, the accuracy of pose estimation.
In an example, a system includes a user device configured to execute an augmented reality (AR) application and send first data indicating an estimated trajectory of the user device, the first data generated by the AR application and including first time stamps, the first time stamps generated based on a first local time of the user device. The system also includes a motion tracking system configured to send second data indicating a tracked trajectory of the user device. The system also includes a computer system communicatively coupled with the user device and the motion tracking system and configured to: determine a time offset between a second local time of the computer system and the first local time of the user device, receive the first data, associate the first data with second time stamps, the second time stamps generated based on the time offset and being different from the first time stamps, receive the second data,
associate the second data with third time stamps, the third time stamps generated based on the second local time, and generate an evaluation of the AR application based on the first data, the second data, the second time stamps, and the third time stamps.
In an example, the first data includes first pose data of the user device. The first pose data includes position data and orientation data and is generated by a simultaneous localization and mapping (SLAM) process of the AR application.
Further, the second data includes second pose data of the user device. The second pose data is generated by the motion tracking system.
The first pose data and the first time stamps are received over a first socket associated with the AR application. The second pose data is received based on a second socket associated with a motion tracking application of the motion tracking system.
In an example, determining the time offset includes: prior to receiving the first data, receiving time data from the user device, the time data indicating the first local time, and determining the time offset based on a comparison of the time data with the second local time.
Further, the first data has a first data pattern, wherein the time data has a second data pattern, and wherein the first data pattern is different from the second data pattern.
The first data pattern includes pose data and a time stamp, wherein the second data pattern includes an identifier of the user device, the first local time, and a time baseline associated with generating the first data.
In an example, the first data indicates a first pose of the user device at a first time stamp of the first time stamps. The second data includes a second pose of the user device at a third time stamp of the third time stamps. The first pose is associated with a second time stamp of the second time stamps based on the first time stamp and the time offset. Generating the evaluation includes: determining that the second time stamp corresponds with the first time stamp and computing an evaluation metric based on the first pose and the second pose.
In an example, the first data is received over a time period. The second data is received over the time period. Generating the evaluation includes: generating a relative timeline between the first data and the second data based on the second time stamps and the third time stamps, determining associations between first pose data from the first data and second pose data from the second data based on the relative timeline, and computing an evaluation metric based on the associations.
Further, the evaluation metric is defined based on user input received via a user interface to an evaluation application of the computer system.
In an example, the evaluation is generated by using the second data as ground truth data and the first data as variable data.
In an example, a method is implemented by a computer system. The method includes determining a time offset between a local time of the computer system and a local time of a user device based on an execution of an augmented reality (AR) application on the user device, receiving first data from the user device, the first data indicating an estimated trajectory of the user device and generated by the AR application, the first data including first time stamps generated based on the local time of the user device, associating the first data with second time stamps, the second time stamps generated based on the time offset and being different from the first time stamps, receiving second data from a motion tracking system, the second data indicating a tracked trajectory of the user device, associating the second data with third time stamps, the third time stamps generated based on the local time of the computer system, and generating an evaluation of the AR application based on the first data, the second data, the second time stamps, and the third time stamps.
In an example, determining the time offset includes: prior to receiving the first data, receiving time data from the user device, the time data indicating the local time of the user device, and determining the time offset based on a comparison of the time data with the local time of the computer system.
Further, the first data is received over a time period, wherein the second data is received over the time period. Generating the evaluation includes: generating a relative timeline between the first data and the second data based on the second time stamps and the third time stamps, determining associations between first pose data from the first data and second pose data from the second data based on the relative timeline, and computing an evaluation metric based on the associations.
In an example, the first data includes pose data and the first time stamps. The time data includes an identifier of the user device, the local time of the user device, and a time baseline associated with generating the first data.
In an example, one or more non-transitory computer-storage media storing instructions that, upon execution on a computer system, cause the computer system to perform operations. The operations include: determining a time offset between a local time of the computer system and a local time of a user device based on an execution of an augmented reality (AR) application on the user device, receiving first data from the user device, the first data indicating an estimated trajectory of the user device and generated by the AR application, the first data including first time stamps generated based on the local time of the user device, associating the  first data with second time stamps, the second time stamps generated based on the time offset and being different from the first time stamps, receiving second data from a motion tracking system, the second data indicating a tracked trajectory of the user device, associating the second data with third time stamps, the third time stamps generated based on the local time of the computer system, and generating an evaluation of the AR application based on the first data, the second data, the second time stamps, and the third time stamps.
In an example, the first data includes first pose data of the user device. The first pose data includes position data and orientation data and is generated by a simultaneous localization and mapping (SLAM) process of the AR application.
Further, the second data includes second pose data of the user device. The second pose data is generated by the motion tracking system.
In addition, the first pose data and the first time stamps are received over a first socket associated with the AR application. The second pose data is received based on a second socket associated with a motion tracking application of the motion tracking system.
In an example, generating the evaluation includes: generating a relative timeline between the first data and the second data based on the second time stamps and the third time stamps, determining associations between first pose data from the first data and second pose data from the second data based on the relative timeline, and computing an evaluation metric based on the associations.
Numerous benefits are achieved by way of the present invention over conventional techniques. For example, embodiments of the present disclosure involve methods and systems that implement techniques for evaluating the AR tracking by an AR application executing on a user device. These techniques enable quantitative and qualitative measurements of how well the AR application tracks the actual pose (e.g., position and orientation) of the user device. By doing so, refinements to the AR application and/or user device are possible and can improve the AR tracking and the resulting overall AR experience.
BRIEF DESCRIPTION OF THE DRAWINGS
Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:
FIG. 1 illustrates an example of a user device that includes a camera and an inertial measurement unit (IMU) sensor for AR applications, according to at least one embodiment of the disclosure;
FIG. 2 illustrates an example of an AR evaluation system, according to at least one embodiment of the disclosure;
FIG. 3 illustrates an example of a time offset, according to at least one embodiment of the disclosure;
FIG. 4 illustrates an example of aligning AR data and motion tracking data to enable an AR evaluation, according to at least one embodiment of the disclosure;
FIG. 5 illustrates an example of a sequence diagram showing interactions between components of an AR evaluation system, according to at least one embodiment of the disclosure;
FIG. 6 illustrates an example of a flow for performing an AR evaluation, according to at least one embodiment of the disclosure; and
FIG. 7 illustrates examples of components of a computer system, according to at least one embodiment of the disclosure.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
In the following description, various embodiments will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the embodiments. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the embodiment being described.
Embodiments of the present disclosure are directed to, among other things, evaluating the performance of an AR application that is executing on a user device. In an example, the performance can be evaluated by collecting AR data generated by the AR application and ground truth data generated by a motion tracking system. The AR data includes pose data of the user device (e.g., the user device’s position and orientation) as estimated by the AR application. The estimated pose data is tracked over time and the tracking indicates an estimated trajectory of the user device by the AR application. The ground truth data includes pose data of the user device as detected by the motion tracking system. This pose data is also tracked over time and the tracking indicates an actual trajectory of the user device as detected by the motion tracking system (referred to herein as a ground truth trajectory) . The AR data and the ground truth data are synchronized (e.g., along the time dimension) such that a proper analysis of the estimated trajectory can be performed based on the ground truth trajectory. Accuracy of the estimated pose data can be derived from the analysis and represents an example evaluation of the AR application’s performance.
To illustrate, consider an example of a smartphone that hosts an AR application. The smartphone is placed within a field of view of a motion tracking system. In addition, the smartphone and the motion tracking system are communicatively coupled to a server. The AR application is executed for ten minutes (or some other period of time). During the ten-minute AR session, the AR application executes a simultaneous localization and mapping (SLAM) process that, among other things, estimates the 6DoF pose of the smartphone at a particular rate (e.g., twenty frames per second or some other rate). The AR data is output to the server and includes estimated 6DoF data and time stamps that depend on the rate (e.g., the pose every fifty milliseconds, corresponding to the twenty frames per second rate). The server receives and stores the estimated 6DoF data along with the corresponding time stamps. Also during the ten-minute AR session, the motion tracking system tracks the actual pose of the smartphone and sends the tracked data to the server. The server receives the actual 6DoF data and generates time stamps corresponding to the time of receipt. Accordingly, upon the end of the AR session, the server has collected estimated 6DoF data and associated time stamps from the smartphone and actual 6DoF data from the motion tracking system and has generated time stamps for the actual 6DoF data. The server performs a time synchronization of the estimated 6DoF data and the actual 6DoF data given the time stamps. Once the two sets of 6DoF data are time synchronized, the server derives one or more metrics based on an analysis of the two sets, such as an average difference between estimated pose and actual pose and the variance thereof. The metrics can be presented at a user interface in an evaluation report of the AR application’s performance.
Embodiments of the present disclosure provide various technical advantages. For example, conventional systems do not provide a framework for evaluating trajectory estimation by an AR application on a mobile user device, such as a smartphone. In comparison, embodiments of the present disclosure enable the evaluation of the performance of an AR application independently of the type of the user device and/or of the operating system of such a user device. Hence, the embodiments are scalable to different user devices and operating systems. In addition, the embodiments enable user-defined metrics for the evaluation. Hence, the evaluation can be customized to output quantitative and/or qualitative measurements of the AR application’s performance. By being scalable and customizable, refinements to the AR application and/or user devices become possible.
FIG. 1 illustrates an example of a user device 110 that includes a camera 112 and an inertial measurement unit (IMU) sensor 114 for AR applications, according to at least one embodiment of the disclosure. The AR applications can be implemented by an AR module 116 of the user device 110. Generally, the camera 112 generates images of a real-world environment that includes, for instance, a real-world object 130. The camera 112 can also include a depth sensor that generates depth data about the real-world environment, where this data includes, for instance, a depth map that shows depth (s) of the real-world object  130 (e.g., distance (s) between the depth sensor and the real-world object 130) . The IMU sensor 114 can include a gyroscope and an accelerometer, among other components, and can output IMU data including, for instance, an orientation of the user device 110.
Image data of the images generated by the camera 112 in an AR session and the IMU data generated by the IMU sensor 114 in the AR session can be input to a SLAM process executed by the AR module 116. In turn, the SLAM process outputs a 6DoF pose (e.g., position along the X, Y, and Z axes and rotation along each of such axes) of the user device 110 relative to the real-world environment and a map of the real-world environment. The SLAM process tracks the 6DoF pose and the map over time based on the images and the IMU data that are input at a particular frame rate. The tracking of the 6DoF pose represents an estimated trajectory of the user device 110 in the real-world environment and this estimated trajectory can be mapped to a virtual trajectory in the map over time. The 6DoF pose includes pose data, such as position data along the X, Y, and Z axes and rotation data along each of such axes. The pose data and time stamps for when each of the pose data is generated (e.g., the time stamps corresponding to the frame rate) are illustrated as part of AR data 118 generated by the AR module 116 during the AR session. The AR module 116 can be implemented as specialized hardware and/or a combination of hardware and software (e.g., a general purpose processor and computer-readable instructions stored in memory and executable by the general purpose processor).
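As a concrete illustration of this pose data, the following is a minimal sketch of a record type for one time-stamped 6DoF sample. It is written in Python for illustration only; the field names are assumptions rather than part of the AR module’s actual interface.

```python
from dataclasses import dataclass

@dataclass
class PoseSample:
    """One 6DoF pose estimate produced by the SLAM process for a single frame."""
    timestamp_ms: float  # time stamp in the device's local time (e.g., UTC milliseconds)
    x: float             # position along the X axis
    y: float             # position along the Y axis
    z: float             # position along the Z axis
    qw: float            # orientation as a unit quaternion (w, x, y, z)
    qx: float
    qy: float
    qz: float

# An estimated trajectory is then an ordered list of such samples, one per frame,
# e.g., one sample every 50 milliseconds at a 20 frames-per-second SLAM rate.
trajectory: list[PoseSample] = []
```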
Following an initialization of the AR session (where this initialization can include calibration and tracking), the AR module 116 renders an AR scene 120 of the real-world environment in the AR session, where this AR scene 120 can be presented at a graphical user interface (GUI) on a display of the user device 110. The AR scene 120 shows a real-world object representation 122 of the real-world object 130. In addition, the AR scene 120 shows a virtual object 124 not present in the real-world environment. To place the virtual object 124 on the real-world object representation 122 in a proper manner, the AR module 116 relies on the AR data 118 and the map of the real-world environment.
In an example, the user device 110 represents a suitable user device that includes, in addition to the camera 112 and the IMU sensor 114, one or more graphical processing units (GPUs) , one or more general purpose processors (GPPs) , and one or more memories storing computer-readable instructions that are executable by at least one of the processors to perform various functionalities of the embodiments of the present disclosure. For instance, the user device 110 can be any of a smartphone, a tablet, an AR headset, or a wearable AR device.
FIG. 2 illustrates an example of an AR evaluation system 200, according to at least one embodiment of the disclosure. The AR evaluation system 200 includes a user device 210, a motion tracking system 220, and a computer system 230. While being in a field of view of the motion tracking system 220, the user device 210 executes an AR application 212 that outputs AR data 214 to the computer system 230. The motion tracking system 220 tracks the actual pose of the user device 210 and outputs motion tracking data 224 to the computer system 230. The computer system 230 receives the AR data 214 and the motion tracking data 224 and performs an evaluation of the performance of the AR application.
The user device 210 is an example of the user device 110 of FIG. 1. In particular, the AR application 212 executes a SLAM process to generate the AR data 214. The AR data 214 includes 6DoF data and time stamps. At each time stamp, the corresponding 6DoF data indicates the position and orientation of the user device 210 at that point in time. The 6DoF data over time indicates the estimated trajectory of the user device 210. The AR data 214 can also include a map generated by the SLAM process, where this map can be defined in a coordinate system.
In an example, the motion tracking system 220 includes an optical system configured to perform motion capture. Generally, the motion capture can be implemented as a process that records the movement of objects, such as the user device 210, in the three-dimensional (3D) real-world environment. The movement can be sampled at a particular sampling rate and recorded as pose data at that rate, including position data and orientation data. To do so, the optical system includes a set of cameras (e.g., at least two cameras) and a tracking application 222. The tracking application 222 processes images generated by the cameras to detect features, generate a map of the real-world environment in a coordinate system, and track these features in the map over time. When the features belong to the user device 210, the movement of the user device 210 is tracked. The features can include a set of active and/or passive markers (e.g., at least three markers) attached to the user device 210 or fiducials that are inherent to the user device 210 and that are detectable from the images (e.g., particular sections of an exterior surface of the user device 210). By tracking the features in the map, the tracking application 222 can output the motion tracking data 224. The motion tracking data 224 includes the pose data of the user device 210 as detected by the motion tracking system 220 over time in the coordinate system. Such an optical system can be available from VICON, of Oxford, United Kingdom, or from OPTITRACK, of Corvallis, Oregon, United States of America.
The computer system 230 can be implemented as a server that hosts an AR evaluation application 232. The server receives the AR data 214 from the user device 210 and stores the AR data 214 in local memory as AR data 234. Similarly, the server receives the motion tracking data 224 from the motion tracking system 220 and stores the motion tracking data in the local memory as motion tracking data 236.
In an example, the computer system 230 is communicatively coupled with the user device 210 and the motion tracking system 220 over one or more wireless and/or wired networks. The communicative coupling can rely on a particular network topology, such as one including a peer-to-peer network between the computer system 230 and each of the user device 210 and the motion tracking system 220, or such as one including other network nodes such as an access point. Generally, the latency of the network between the computer system 230 and the user device 210 can be estimated as a round-trip time (RTT) or the time to first byte (TTFB). The latency of the network between the computer system 230 and the motion tracking system 220 can be similarly estimated. If a network latency is smaller than a threshold latency, such as ten milliseconds or some other predefined value, this network latency can be ignored.
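For instance, a rough round-trip time estimate can be obtained by timing a small echo exchange, as in the sketch below. This is a Python illustration assuming the peer simply echoes the probe back; the ten-millisecond threshold mirrors the example above and is not a required value.

```python
import socket
import time

LATENCY_THRESHOLD_S = 0.010  # ten milliseconds, as in the example above (an assumption)

def estimate_rtt(sock: socket.socket, probe: bytes = b"ping", n: int = 10) -> float:
    """Estimate round-trip time by averaging several echo exchanges.
    Assumes the peer echoes back whatever it receives."""
    samples = []
    for _ in range(n):
        start = time.monotonic()
        sock.sendall(probe)
        sock.recv(len(probe))  # wait for the echo
        samples.append(time.monotonic() - start)
    return sum(samples) / len(samples)

def latency_negligible(rtt_s: float) -> bool:
    """A latency below the threshold can be ignored in the time alignment."""
    return rtt_s < LATENCY_THRESHOLD_S
```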
The communications between the computer system 230 and the user device 210 and the motion tracking system 220 are performed using sockets via the one or more networks. In particular, the AR evaluation application 232 receives the AR data 214 from the AR application 212 over a first socket associated with a first internet protocol (IP) address and a first port. In comparison, the AR evaluation application 232 receives the motion tracking data 224 from the tracking application 222 over a second socket associated with a second IP address and a second port.
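A simplified sketch of such a dual-socket setup is shown below; the addresses, ports, and handler names are assumptions made for illustration, not values prescribed by the disclosure.

```python
import socket
import threading

AR_ENDPOINT = ("0.0.0.0", 9000)       # first socket: AR application (port is an assumption)
TRACKER_ENDPOINT = ("0.0.0.0", 9001)  # second socket: tracking application (port is an assumption)

def serve(endpoint, handler):
    """Accept one client on the given endpoint and stream its bytes to the handler."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(endpoint)
        srv.listen(1)
        conn, _addr = srv.accept()
        with conn:
            while chunk := conn.recv(4096):
                handler(chunk)

def on_ar_data(chunk: bytes) -> None:
    pass  # parse and store the received AR data (e.g., as AR data 234)

def on_tracking_data(chunk: bytes) -> None:
    pass  # time stamp on receipt and store the motion tracking data (e.g., as motion tracking data 236)

threading.Thread(target=serve, args=(AR_ENDPOINT, on_ar_data), daemon=True).start()
threading.Thread(target=serve, args=(TRACKER_ENDPOINT, on_tracking_data), daemon=True).start()
```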
In an example, the AR evaluation application 232 aligns the AR data 234 (or, equivalently, the received AR data 214 prior to storage as the AR data 234) with the motion tracking data 236 (or, equivalently, the received motion tracking data 224 prior to storage as the motion tracking data 236). Different types of alignment can be performed. A first type relates to aligning in space the estimated pose data from the AR data 214 with the detected pose data from the motion tracking data 224 in a same coordinate system. As indicated above, each of the user device 210 and the motion tracking system 220 can generate its set of pose data in a coordinate system. The AR evaluation application 232 can determine a transform between the coordinate systems or between each of the coordinate systems and a local coordinate system such that the pose data estimated by the AR application 212 and the pose data detected by the tracking application 222 can be transformed to and defined in the same coordinate system. The transformation can be generated based on images received with the AR data 214 and the motion tracking data 224 and can be anchored to a detected feature of the real-world environment.
Another type of alignment relates to the time dimension. In particular, the AR data 214 can include, in addition to the estimated pose data, time stamps. These time stamps are generated by the AR application 212 depending on the frame rate of the SLAM process and are expressed as a function of a local time of the user device 210. The local time can be the coordinated universal time (UTC) tracked by the user device 210. In comparison, the motion tracking data 224 does not include time stamps. Instead, the AR evaluation application 232 generates time stamps for the detected pose data, where the time stamps correspond to the timing of when the motion tracking data 224 is received. The generated time stamps can be expressed as a function of a local time of the computer system 230. This local time can be the UTC time tracked by the computer system 230. In certain situations, an offset may exist between the local time of the user device 210 and the local time of the computer system 230. An offset may also exist between the start of the time stamping by the AR application 212 and the start of the time stamping by the AR evaluation application 232. The AR evaluation application 232 computes and stores such offsets in the local memory as the time offset 238. Accordingly, the AR evaluation application 232 shifts the time stamps of the AR data 214 (or, equivalently, of the AR data 234) or the time stamps generated for the motion tracking data 224 (or, equivalently, of the motion tracking data 236) by the time offset 238 such that the two sets of data are aligned relative to a same time line.
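A minimal sketch of this time alignment is shown below, assuming the time offset has already been computed and that poses are carried as opaque values.

```python
from datetime import datetime, timezone

def stamp_on_receipt(detected_pose):
    """Attach the computer system's local UTC time to a pose received from the tracker."""
    return (datetime.now(timezone.utc), detected_pose)

def shift_ar_timestamps(ar_samples, time_offset):
    """Shift the AR application's time stamps by the computed offset so that both data
    sets are expressed relative to the same timeline (the computer system's local time)."""
    return [(timestamp + time_offset, pose) for timestamp, pose in ar_samples]
```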
To compute the time offset 238, the AR evaluation application 232 can receive time data 216 from the AR application 212. The time data 216 can identify the local time of the user device 210 and the specific time of the first time stamp (e.g., the start of the time stamping by the AR application 212) relative to the local time. The AR evaluation application 232 compares the local time identified in the time data 216 with the local time of the computer system 230, compares the start of the time stamping by the AR application 212 with the start of its own time sampling, and computes the time offset 238 based on the comparisons. This computation is further illustrated in FIG. 3.
FIG. 3 illustrates an example of a time offset, according to at least one embodiment of the disclosure. As illustrated, a user device (such as the user device 210 of FIG. 2) generates time stamps relative to its local time (illustrated as a device local time 310) . The start of the time stamping by the user device is shown as a sampling start 312. Similarly, a computer system (such as the computer system 230 of FIG. 2) generates time stamps relative to its local time (illustrated as a system local time 320) . The start of the time stamping by the computer system is shown as a sampling start 322.
A first offset 330 exists between the device local time 310 and the system local time 320. By receiving time data from an AR application of the user device (e.g., the time data 216), an AR evaluation application of the computer system can compute the first offset 330 as the time difference between the two local times 310 and 320. The AR evaluation application 232 can shift, by the first offset 330, the sampling start 322 (or, equivalently, the sampling start 312), such that the two starts are defined relative to a same local time. The remaining time difference between the two sampling starts 312 and 322 indicates a second offset 340. This second offset 340 can be a function of several factors. For instance, the second offset 340 can depend on a delay time predefined and available from the software SDK of the tracking application. The delay time indicates the processing latency of the tracking application. The second offset 340 can additionally or alternatively be a function of network latencies when non-negligible (e.g., greater than a predefined threshold latency). For instance, the difference between the network latency for receiving data from the user device and the network latency for receiving data from the motion tracking system can be added to the second offset 340. A time offset (e.g., the time offset 238 of FIG. 2) can include the first offset 330, the second offset 340, and/or the sum of the first offset 330 and the second offset 340.
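The combined offset can be sketched as follows, assuming the device’s reported UTC time, the tracker SDK’s predefined delay, and any non-negligible network latency difference are available as Python datetime/timedelta values.

```python
from datetime import datetime, timedelta, timezone

def compute_time_offset(device_utc: datetime,
                        tracker_delay: timedelta,
                        network_latency_diff: timedelta = timedelta(0)) -> timedelta:
    """Combine the clock difference (first offset) with the processing/latency delay
    (second offset) into a single time offset used for the alignment."""
    system_utc = datetime.now(timezone.utc)
    first_offset = system_utc - device_utc                 # difference between the two local times
    second_offset = tracker_delay + network_latency_diff   # SDK delay plus latency difference, if any
    return first_offset + second_offset
```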
Referring back to FIG. 2, after the different types of alignment are performed, the AR data 234 and the motion tracking data 236 (and, specifically, the estimated pose data included in the AR data 234 and the detected pose data included in the motion tracking data 236) can be analyzed to determine a performance of the AR application 212. In an example, the performance is determined as a set of metrics that compares the two sets of data. The type of metrics can be defined via a user interface to the AR evaluation application 232. In particular, user input can be received via the user interface and can define specific metric types. For instance, the user input can request the average of the difference (e.g., distance and angle) between the estimated pose and the actual pose and the variance thereof over a particular period of time (e.g., ten minutes). The AR evaluation application 232 analyzes the AR data 234 and the motion tracking data 236 and outputs the metric(s) as an evaluation of the AR application 212.
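For example, once estimated and ground truth poses have been paired in time, metrics such as the average position error, its variance, and the average angular error could be computed as in the sketch below. This is an illustration of one possible metric set, assuming each pose is a position tuple plus a unit quaternion; it is not the only evaluation the application may support.

```python
import math
import statistics

def position_error(p_est, p_gt) -> float:
    """Euclidean distance between estimated and ground truth positions (x, y, z)."""
    return math.dist(p_est, p_gt)

def angular_error_deg(q_est, q_gt) -> float:
    """Angle, in degrees, of the rotation taking the estimated orientation to the
    ground truth orientation, for unit quaternions (w, x, y, z)."""
    dot = min(1.0, abs(sum(a * b for a, b in zip(q_est, q_gt))))
    return math.degrees(2.0 * math.acos(dot))

def evaluate(pairs):
    """pairs: list of ((p_est, q_est), (p_gt, q_gt)) pose associations."""
    pos_errors = [position_error(est[0], gt[0]) for est, gt in pairs]
    ang_errors = [angular_error_deg(est[1], gt[1]) for est, gt in pairs]
    return {
        "mean_position_error": statistics.fmean(pos_errors),
        "position_error_variance": statistics.pvariance(pos_errors),
        "mean_angular_error_deg": statistics.fmean(ang_errors),
    }
```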
Although FIG. 2 illustrates a single user device and a single motion tracking system 220, the embodiments of the present disclosure are not limited as such. For instance, multiple user devices can be in the field of view of the motion tracking system 220, each of which can be executing an AR application that sends AR data to the computer system 230 via a corresponding socket. Similarly, multiple motion tracking systems can be used, each having at least one user device in its field of view and executing a tracking application that sends motion tracking data to the computer system 230 via a corresponding socket. In other words, the computer system 230 can collect AR data from one or more user devices and motion tracking data from one or more motion tracking systems at the same time.
In a specific illustration, the AR evaluation system 200 adopts a client-server architecture. There are two types of client in the AR evaluation system 200. One type includes the user device 210 (and any other such user device) that provides the estimated 6DoF pose. The other type includes the motion tracking system 220 (and any other such measurement device) that provides the actual 6DoF data for use as ground truth data. The computer system 230 is the server implemented as a local workstation that collects all the data from the clients and processes them with evaluation algorithms.
The server can be set up on a workstation with a Microsoft Windows operating system. When the AR evaluation application 232 is launched, the AR evaluation application 232 first starts up a Winsock program. The Winsock program is a programming interface that supports network communication in Windows systems. The server resides in this program with a specified socket type, protocol, and flags. After the initialization is done, the AR server resolves the server address and port. The server keeps listening to a client until the client is shut off.
A first client is the AR application 212. The AR application 212 sends data that follows a certain pattern of data (e.g., a data pattern) to connect/disconnect, send SLAM data, send images, and send tasks, which handle the different types of data transfer requests. The format of the data transmission of the user device can be defined when it transmits poses.
For the first “n” frames, the data type follows the data pattern of: time flag, device name, device local time, pose time baseline (e.g., start of the time sampling or a boot time of the AR application). For the next frame, the data type follows the data pattern of: start flag. For the following frames, the data type follows the data pattern of: pose flag, pose data (e.g., position (x, y, z) and quaternion (qw, qx, qy, qz)), sensor data (e.g., IMU data, image data), end pose flag.
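A parser for these frame patterns might look like the sketch below; the flag strings and the comma-separated text encoding are assumptions made for illustration, since the pattern is described here only at the level of its fields.

```python
def parse_frame(frame: str) -> dict:
    """Parse one text frame from the AR client according to its leading flag.

    Assumed encodings (illustrative only):
      time frame:  "TIME,<device_name>,<device_local_time>,<pose_time_baseline>"
      start frame: "START"
      pose frame:  "POSE,<t>,<x>,<y>,<z>,<qw>,<qx>,<qy>,<qz>,END_POSE"
    """
    fields = frame.strip().split(",")
    flag = fields[0]
    if flag == "TIME":
        return {"type": "time", "device": fields[1],
                "device_local_time": fields[2], "pose_time_baseline": fields[3]}
    if flag == "START":
        return {"type": "start"}
    if flag == "POSE":
        values = [float(v) for v in fields[1:9]]  # t, x, y, z, qw, qx, qy, qz
        return {"type": "pose", "timestamp": values[0],
                "position": tuple(values[1:4]), "quaternion": tuple(values[4:8])}
    raise ValueError(f"unknown frame flag: {flag}")
```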
A second client is the tracking application 222. The tracking application 222 is based on the application programming interface (API) of the motion tracking system 220.
Based on the client-server architecture, time synchronization can be solved. Each of the user device 210 and the computer system 230 follows its own rule to count the time. UTC time can be used as the baseline to synchronize the time. The ground truth time can be transferred to the user device’s 210 timeline. In particular, for the first “n” frames, the user device’s 210 current UTC time and the pose time baseline (in UTC) are available from the received data. For the motion tracking system 220, the delay time is predefined and available from the software SDK of the tracking application 222. The computer system’s 230 UTC time is used to time stamp the ground truth data (e.g., the motion tracking data 224) upon receipt from the tracking application 222. A ground truth timeline relative to the user device’s 210 timeline (or vice versa) can be computed from the offset between the UTC times (e.g., the first offset 330) and the delay (e.g., corresponding to the second offset 340). Once the data is collected and aligned, at least in time, the poses from the user device 210 are evaluated based on a comparison to the ground truth data according to the user-requested metric types.
FIG. 4 illustrates an example of aligning AR data 410 and motion tracking data 420 to enable an AR evaluation, according to at least one embodiment of the disclosure. The aligning can be performed by an AR evaluation application of a computer system, such as the AR evaluation application 232 of FIG. 2. The AR data 410 is an example of the AR data 214 or 234 of FIG. 2. The motion tracking data 420 is an example of the motion tracking data 224 or 236 of FIG. 2.
In an example, the AR data 410 is shifted in time by a time offset 402 to generate AR data 430 such that the AR data 430 and the motion tracking data 420 are relatively time aligned. Of course, the shifting can be applied to the motion tracking data 420 instead, such that the shifted motion tracking data 420 is time aligned with the AR data 410.
The AR data 410 includes estimated pose data that is generated by the AR application of a user device and the corresponding time stamps generated also by the AR application. For instance, at “time 1, ” the estimated pose data indicates that the user device is estimated to have a “pose 1, ” where the “time 1” is defined in the local time of the user device.
The shifted AR data 430 can be generated by shifting each of the time stamps by the time offset 402 to define the time line of the estimated pose data in the local time of the computer system. For instance, “time 1” shifts to become “time 1’” defined in the local time of the computer system. Accordingly, the AR data 430 includes the same estimated pose data as the AR data 410, but each of the corresponding estimated poses is associated with a shifted time stamp defined in the local time of the computer system.
To illustrate, consider an example where the time offset 402 is two seconds. In this example, “time 1” is 12:00:00 pm in the UTC time of the user device. This time stamp is shifted to “time 1’,” where “time 1’” is 12:00:02 pm in the UTC time of the computer system.
In comparison, the motion tracking data 420 includes ground truth pose data of the user device as detected by a tracking application of a motion tracking system. Upon receiving such pose data, the AR evaluation application generates time stamps corresponding to the timing of receiving the ground truth pose data. The time stamps are defined in the local time of the computer system. For instance, at “time A,” the ground truth pose data indicates that the user device is detected to have a “pose A,” where “time A” is defined in the local time of the computer system. Continuing with the previous illustration, “time A” is 12:00:02 pm in the UTC time of the computer system.
The AR evaluation application generates associations 440 between the shifted AR data 430 and the motion tracking data 420. The associations 440 show a relative timeline between the estimated pose data and the ground truth pose data to enable a comparison of these two data sets. For instance, the AR evaluation application determines that “time 1’” is the same as “time A” (e.g., “time 1’”: 12:00:02 = “time A”: 12:00:02). Accordingly, the AR evaluation application generates an association between the corresponding poses of the devices, where this association indicates that estimated “pose 1” corresponds to ground truth “pose A” because these two poses were generated at the same time (or substantially the same time). Given this association, the AR evaluation application can compare estimated “pose 1” and ground truth “pose A” (e.g., the distance difference and the angular difference between the two) in the computation of an evaluation metric of the AR application.
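Such a matching of time stamps can be implemented as a nearest-neighbor search over the two timelines, as in the sketch below. It assumes both lists are sorted by time, that time stamps are expressed in seconds on the same (already shifted) timeline, and that a tolerance defines what counts as substantially the same time; the 25-millisecond tolerance is an assumption (half of a 50-millisecond frame period).

```python
import bisect

def associate(ar_samples, gt_samples, tolerance_s=0.025):
    """Pair each shifted AR sample (t, pose) with the ground truth sample (t, pose)
    whose time stamp is closest, if the difference is within the tolerance."""
    gt_times = [t for t, _ in gt_samples]
    pairs = []
    for t_ar, pose_ar in ar_samples:
        i = bisect.bisect_left(gt_times, t_ar)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(gt_samples)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(gt_times[k] - t_ar))
        if abs(gt_times[j] - t_ar) <= tolerance_s:
            pairs.append((pose_ar, gt_samples[j][1]))
    return pairs
```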
FIG. 5 illustrates an example of a sequence diagram showing interactions between components of an AR evaluation system, according to at least one embodiment of the disclosure. The components include a user device 510, a motion tracking system 520, and a computer system 530, similar to the user device 210, the motion tracking system 220, and the computer system 230 of FIG. 2.
As illustrated, in a first step of the sequence diagram, the user device 510 sends time data to the computer system 530. The time data includes, for instance, a current local time of the user device and a pose time baseline (e.g., the start of the time sampling by the user device, or a boot time of the AR application) . The time data can follow a first data pattern, such as one following the pattern of: time flag, device name, device local time, pose time baseline.
In a second step of the sequence diagram, the computer system 530 determines a time offset based on the time data. The time offset can be a function of the difference between the user device’s 510 local time and the computer system’s 530 local time. The time offset can also be a function of the start of the time sampling by the user device 510 and the start of the time sampling by the computer system 530. In turn, the computer system’s 530 sampling start can depend on (e.g., be equivalent to) a delay predefined and available from an SDK of a tracking application of the motion tracking system 520.
In a third step of the sequence diagram, the user device 510 sends AR data to the computer system 530. The AR data includes estimated pose data and time stamps. The computer system 530 receives the AR data over a period of time. The AR data can be received by an AR evaluation application of the computer system 530 from the AR application of the user device 510 via a socket. The AR data can follow the data pattern of: pose flag, pose data (e.g., position (x, y, z) and quaternion (qw, qx, qy, qz)), sensor data (e.g., IMU data, image data), end pose flag.
In a parallel step of the sequence diagram, the motion tracking system 520 sends motion tracking data to the computer system 530. The motion tracking data includes ground truth pose data but not time stamps. The computer system 530 receives the motion tracking data over the same period of time. The motion tracking data can be received by the AR evaluation application from the tracking application of the motion tracking system 520 via a different socket. The AR evaluation application can time stamp the ground truth data upon receipt.
In a fourth step of the sequence diagram, the computer system 530 determines a relative timeline between the AR data and the motion tracking data. The relative timeline can be derived by shifting the time stamps of the estimated pose data by the time offset. Alternatively, the relative timeline can be derived by shifting the time stamps generated for the ground truth pose data by the time offset.
In a fifth step of the sequence diagram, the computer system 530 determines associations between the AR data and the motion tracking data based on the relative timeline. For instance, once the time alignment is complete, the computer system 530 associates, as applicable, some or each of the estimated pose data with one of the ground truth pose data, where an association is determined by matching a time stamp of the estimated pose data with a time stamp of the ground truth pose data.
In a sixth step of the sequence diagram, the computer system 530 generates an evaluation of the AR application. For instance, the estimated pose data is compared to the ground truth data to derive specific evaluation metrics.
FIG. 6 illustrates an example of a flow for performing an AR evaluation, according to at least one embodiment of the disclosure. The flow is described in connection with a computer system that is an example of the computer system 230 of FIG. 2. Some or all of the operations of the flow can be implemented via specific hardware on the computer system and/or can be implemented as computer-readable instructions stored on a non-transitory computer-readable medium of the computer system. As stored, the computer-readable instructions represent programmable modules that include code executable by a processor of the computer system. The execution of such instructions configures the computer system to perform the respective operations. Each programmable module in combination with the processor represents a means for performing a respective operation(s). While the operations are illustrated in a particular order, it should be understood that no particular order is necessary and that one or more operations may be omitted, skipped, performed in parallel, and/or reordered.
In an example, the flow starts at operation 602, where the computer system receives time data indicating a local time of a user device. The time data can also indicate a baseline time for time stamping of pose data generated by the user device.
In an example, the flow includes operation 604, where the computer system determines a time offset based on the time data and a local time of the computer system. The time offset can also be determined based on the timing of the start of sampling by the computer system of motion tracking data, where this timing can be a function of a delay predefined and available from an SDK of a tracking application of a motion tracking system.
In an example, the flow includes operation 606, where the computer system receives first data indicating an estimated trajectory of the user device and including first time stamps. For instance, the first data is AR data that includes estimated pose data and corresponding time stamps.
In an example, the flow includes operation 608, where the computer system associates the first data with second time stamps based on the first time stamps and the time offset. For instance, the computer system shifts the time stamps of the estimated pose data by the time offset. The shifted time stamps are the second time stamps.
In an example, the flow includes operation 610, where the computer system receives second data indicating a tracked trajectory of the user device. For instance, the second data is received from the motion tracking system and includes motion tracking data. In turn, the motion tracking data includes pose data of the user device as detected by the motion tracking system. The pose data is used as ground truth pose data. The tracked trajectory corresponds to the ground truth pose data over time.
In an example, the flow includes operation 612, where the computer system generates third time stamps based on the local time of the computer system. For instance, upon receiving each data of the motion tracking data, the computer system generates a corresponding time stamp, where this time stamp is defined in the local time of the computer system.
In an example, the flow includes operation 614, where the computer system associates the second data with the third time stamps. For instance, the computer system determines, for each data of the estimated pose data, the corresponding time stamp as shifted to the local time of the computer system. This time stamp is matched with one of the third time stamps that are defined in the local time of the computer system.
In an example, the flow includes operation 616, where the computer system determines associations between the first data and the second data based on correspondences between the second time stamps and the third time stamps. For instance, and referring back to operation 614, the second time stamp corresponds to a particular estimated pose data. The matched third time stamp corresponds to a particular ground truth pose data. Accordingly, the computer system generates and stores an association between the particular estimated pose data and the particular ground truth data indicating that, at a particular time (e.g., one corresponding to the second time stamp or, equivalently, the matched third time stamp), the AR application estimated the user device to have a particular pose and the tracking application detected the user device to have that same or a different particular pose (depending on how well the AR application tracks the user device’s pose).
In an example, the flow includes operation 618, where the computer system generates an evaluation of the AR application based on the associations. For instance, the estimated pose data is compared to the associated ground truth data to derive specific evaluation metrics.
FIG. 7 illustrates examples of components of a computer system, according to at least one embodiment of the disclosure. The computer system 700 is an example of the computer system 230 described herein above. Although these components are illustrated as belonging to a same computer system 700, the computer system 700 can also be distributed.
The computer system 700 includes at least a processor 702, a memory 704, a storage device 706, input/output peripherals (I/O) 708, communication peripherals 710, and an interface bus 712. The interface bus 712 is configured to communicate, transmit, and transfer data, controls, and commands among the various components of the computer system 700. The memory 704 and the storage device 706 include computer-readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM), hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage (for example, flash memory), and other tangible storage media. Any of such computer readable storage media can be configured to store instructions or program codes embodying aspects of the disclosure. The memory 704 and the storage device 706 also include computer readable signal media. A computer readable signal medium includes a propagated data signal with computer readable program code embodied therein. Such a propagated signal takes any of a variety of forms including, but not limited to, electromagnetic, optical, or any combination thereof. A computer readable signal medium includes any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computer system 700.
Further, the memory 704 includes an operating system, programs, and applications. The processor 702 is configured to execute the stored instructions and includes, for example, a logical processing unit, a microprocessor, a digital signal processor, and other processors. The memory 704 and/or the processor 702 can be virtualized and can be hosted within another computer system of, for example, a cloud network or a data center. The I/O peripherals 708 include user interfaces, such as a keyboard, screen (e.g., a touch screen) , microphone, speaker, other input/output devices, and computing components, such as graphical processing units, serial ports, parallel ports, universal serial buses, and other input/output peripherals. The I/O peripherals 708 are connected to the processor 702 through any of the ports coupled to the interface bus 712. The communication peripherals 710 are configured to facilitate communication between the computer system 700 and other computer systems over a communications network and include, for example, a network interface controller, modem, wireless and wired interface cards, antenna, and other communication peripherals.
While the present subject matter has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, it should be understood that the present disclosure has been presented for purposes of example rather than limitation, and does not preclude inclusion of such modifications, variations, and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. Indeed, the methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the present disclosure. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the present disclosure.
Unless specifically stated otherwise, it is appreciated that throughout this specification discussions utilizing terms such as “processing, ” “computing, ” “calculating, ” “determining, ” and “identifying” or the like refer to actions or processes of a computer system, such as one or more computers or a similar electronic computer system or devices, that manipulate or transform data represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the computing platform.
The system or systems discussed herein are not limited to any particular hardware architecture or configuration. A computer system can include any suitable arrangement of components that provide a result conditioned on one or more inputs. Suitable computer systems include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computer system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computer system.
Embodiments of the methods disclosed herein may be performed in the operation of such computer systems. The order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
Conditional language used herein, such as, among others, “can, ” “could, ” “might, ” “may, ” “e.g., ” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain examples include, while other examples do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without author input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular example.
The terms “including,” “having,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list. The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Similarly, the use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of the present disclosure. In addition, certain method or process blocks may be omitted in some  implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed examples. Similarly, the example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed examples.

Claims (20)

  1. A system including:
    a user device configured to execute an augmented reality (AR) application and send first data indicating an estimated trajectory of the user device, the first data generated by the AR application and including first time stamps, the first time stamps generated based on a first local time of the user device;
    a motion tracking system configured to send second data indicating a tracked trajectory of the user device; and
    a computer system communicatively coupled with the user device and the motion tracking system and configured to:
    determine a time offset between a second local time of the computer system and the first local time of the user device,
    receive the first data,
    associate the first data with second time stamps, the second time stamps generated based on the time offset and being different from the first time stamps,
    receive the second data,
    associate the second data with third time stamps, the third time stamps generated based on the second local time, and
    generate an evaluation of the AR application based on the first data, the second data, the second time stamps, and the third time stamps.
  2. The system of claim 1, wherein the first data includes first pose data of the user device, and wherein the first pose data includes position data and orientation data and is generated by a simultaneous localization and mapping (SLAM) process of the AR application.
  3. The system of claim 2, wherein the second data includes second pose data of the user device, wherein the second pose data is generated by the motion tracking system.
  4. The system of claim 3, wherein the first pose data and the first time stamps are received over a first socket associated with the AR application, and wherein the second pose data is received based on a second socket associated with a motion tracking application of the motion tracking system.
  5. The system of claim 1, wherein determining the time offset includes:
    prior to receiving the first data, receiving time data from the user device, the time data indicating the first local time; and
    determining the time offset based on a comparison of the time data with the second local time.
  6. The system of claim 5, wherein the first data has a first data pattern, wherein the time data has a second data pattern, and wherein the first data pattern is different from the second data pattern.
  7. The system of claim 6, wherein the first data pattern includes pose data and a time stamp, wherein the second data pattern includes an identifier of the user device, the first local time, and a time baseline associated with generating the first data.
  8. The system of claim 1, wherein the first data indicates a first pose of the user device at a first time stamp of the first time stamps, wherein the second data includes a second pose of the user device at a third time stamp of the third time stamps, wherein the first pose is associated with a second time stamp of the second time stamps based on the first time stamp and the time offset, and wherein generating the evaluation includes:
    determining that the second time stamp corresponds with the first time stamp; and
    computing an evaluation metric based on the first pose and the second pose.
  9. The system of claim 1, wherein the first data is received over a time period, wherein the second data is received over the time period, and wherein generating the evaluation includes:
    generating a relative timeline between the first data and the second data based on the second time stamps and the third time stamps;
    determining associations between first pose data from the first data and second pose data from the second data based on the relative timeline; and
    computing an evaluation metric based on the associations.
  10. The system of claim 9, wherein the evaluation metric is defined based on user input received via a user interface to an evaluation application of the computer system.
  11. The system of claim 1, wherein the evaluation is generated by using the second data as ground truth data and the first data as variable data.
  12. A method implemented by a computer system, the method including:
    determining a time offset between a local time of the computer system and a local time of a user device based on an execution of an augmented reality (AR) application on the user device;
    receiving first data from the user device, the first data indicating an estimated trajectory of the user device and generated by the AR application, the first data including first time stamps generated based on the local time of the user device;
    associating the first data with second time stamps, the second time stamps generated based on the time offset and being different from the first time stamps;
    receiving second data from a motion tracking system, the second data indicating a tracked trajectory of the user device;
    associating the second data with third time stamps, the third time stamps generated based on the local time of the computer system; and
    generating an evaluation of the AR application based on the first data, the second data, the second time stamps, and the third time stamps.
  13. The method of claim 12, wherein determining the time offset includes:
    prior to receiving the first data, receiving time data from the user device, the time data indicating the local time of the user device; and
    determining the time offset based on a comparison of the time data with the local time of the computer system.
  14. The method of claim 13, wherein the first data is received over a time period, wherein the second data is received over the time period, and wherein generating the evaluation includes:
    generating a relative timeline between the first data and the second data based on the second time stamps and the third time stamps;
    determining associations between first pose data from the first data and second pose data from the second data based on the relative timeline; and
    computing an evaluation metric based on the associations.
  15. The method of claim 13, wherein the first data includes pose data and the first time stamps, wherein the time data includes an identifier of the user device, the local time of the user device, and a time baseline associated with generating the first data.
  16. One or more non-transitory computer-storage media storing instructions that, upon execution on a computer system, cause the computer system to perform operations including:
    determining a time offset between a local time of the computer system and a local time of a user device based on an execution of an augmented reality (AR) application on the user device;
    receiving first data from the user device, the first data indicating an estimated trajectory of the user device and generated by the AR application, the first data including first time stamps generated based on the local time of the user device;
    associating the first data with second time stamps, the second time stamps generated based on the time offset and being different from the first time stamps;
    receiving second data from a motion tracking system, the second data indicating a tracked trajectory of the user device;
    associating the second data with third time stamps, the third time stamps generated based on the local time of the computer system; and
    generating an evaluation of the AR application based on the first data, the second data, the second time stamps, and the third time stamps.
  17. The one or more non-transitory computer-storage media of claim 16, wherein the first data includes first pose data of the user device, and wherein the first pose data includes position data and orientation data and is generated by a simultaneous localization and mapping (SLAM) process of the AR application.
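  Since claim 17 specifies that each pose carries position and orientation data, a common representation is a 4x4 homogeneous transform built from a translation vector and a quaternion. A sketch using SciPy, assuming an (x, y, z, w) quaternion convention:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_to_matrix(position, quaternion_xyzw):
    """Build a 4x4 transform from the position and orientation produced by the SLAM process."""
    T = np.eye(4)
    T[:3, :3] = Rotation.from_quat(quaternion_xyzw).as_matrix()
    T[:3, 3] = position
    return T
```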
  18. The one or more non-transitory computer-storage media of claim 17, wherein the second data includes second pose data of the user device, wherein the second pose data is generated by the motion tracking system.
  19. The one or more non-transitory computer-storage media of claim 18, wherein the first pose data and the first time stamps are received over a first socket associated with the AR application, and wherein the second pose data is received over a second socket associated with a motion tracking application of the motion tracking system.
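  The two sockets of claim 19 could be realized as two independent listeners, one per data source. A minimal UDP sketch; the port numbers are hypothetical and the serialization of each pose sample is left open:

```python
import socket

AR_PORT = 9000        # hypothetical port for the AR application's pose stream
TRACKER_PORT = 9001   # hypothetical port for the motion-tracking application

def open_pose_sockets():
    """Open one UDP socket per data source; each datagram carries one serialized pose sample."""
    ar_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    ar_sock.bind(("0.0.0.0", AR_PORT))
    tracker_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tracker_sock.bind(("0.0.0.0", TRACKER_PORT))
    return ar_sock, tracker_sock
```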
  20. The one or more non-transitory computer-storage media of claim 16, wherein generating the evaluation includes:
    generating a relative timeline between the first data and the second data based on the second time stamps and the third time stamps;
    determining associations between first pose data from the first data and second pose data from the second data based on the relative timeline; and
    computing an evaluation metric based on the associations.
PCT/CN2021/075972 2020-02-12 2021-02-08 Evaluating pose data of an augmented reality (ar) application WO2021160080A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202180013669.2A CN115066281B (en) 2020-02-12 2021-02-08 Evaluating pose data of an augmented reality (AR) application

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202062975688P 2020-02-12 2020-02-12
US62/975,688 2020-02-12

Publications (1)

Publication Number Publication Date
WO2021160080A1 true WO2021160080A1 (en) 2021-08-19

Family

ID=77291714

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/075972 WO2021160080A1 (en) 2020-02-12 2021-02-08 Evaluating pose data of an augmented reality (ar) application

Country Status (2)

Country Link
CN (1) CN115066281B (en)
WO (1) WO2021160080A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170193666A1 (en) * 2016-01-05 2017-07-06 Microsoft Technology Licensing, Llc Motion capture from a mobile self-tracking device
US20180108179A1 (en) * 2016-10-17 2018-04-19 Microsoft Technology Licensing, Llc Generating and Displaying a Computer Generated Image on a Future Pose of a Real World Object
US10099122B2 (en) * 2016-03-30 2018-10-16 Sony Interactive Entertainment Inc. Head-mounted display tracking
US20190110264A1 (en) * 2017-10-11 2019-04-11 Google Llc System and method for accurate timestamping of virtual reality controller data
CN109847361A (en) * 2019-02-27 2019-06-07 腾讯科技(深圳)有限公司 Synchronous method and device, storage medium, the electronic device of motion state

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9754419B2 (en) * 2014-11-16 2017-09-05 Eonite Perception Inc. Systems and methods for augmented reality preparation, processing, and application
US10802147B2 (en) * 2016-05-18 2020-10-13 Google Llc System and method for concurrent odometry and mapping
US10628711B2 (en) * 2018-04-24 2020-04-21 Microsoft Technology Licensing, Llc Determining pose of handheld object in environment
US10812711B2 (en) * 2018-05-18 2020-10-20 Samsung Electronics Co., Ltd. Semantic mapping for low-power augmented reality using dynamic vision sensor

Also Published As

Publication number Publication date
CN115066281A (en) 2022-09-16
CN115066281B (en) 2024-09-03

Similar Documents

Publication Publication Date Title
WO2020259248A1 (en) Depth information-based pose determination method and device, medium, and electronic apparatus
CN110310326B (en) Visual positioning data processing method and device, terminal and computer readable storage medium
WO2021082801A1 (en) Augmented reality processing method and apparatus, system, storage medium and electronic device
KR20220009393A (en) Image-based localization
JP6258953B2 (en) Fast initialization for monocular visual SLAM
Tanskanen et al. Live metric 3D reconstruction on mobile phones
US10726625B2 (en) Method and system for improving the transmission and processing of data regarding a multi-user virtual environment
JP6456347B2 (en) INSITU generation of plane-specific feature targets
CN110335317B (en) Image processing method, device, equipment and medium based on terminal equipment positioning
CN111127524A (en) Method, system and device for tracking trajectory and reconstructing three-dimensional image
WO2022174711A1 (en) Visual inertial system initialization method and apparatus, medium, and electronic device
CN111709973A (en) Target tracking method, device, equipment and storage medium
US20180053308A1 (en) Spatial Alignment of Inertial Measurement Unit Captured Golf Swing and 3D Human Model For Golf Swing Analysis Using IR Reflective Marker
JP6662382B2 (en) Information processing apparatus and method, and program
CN109992111B (en) Augmented reality extension method and electronic device
JP2022132063A (en) Method and device for pose determination of augmented reality providing device
WO2023087681A1 (en) Positioning initialization method and apparatus, and computer-readable storage medium and computer program product
CN111462179A (en) Three-dimensional object tracking method and device and electronic equipment
WO2021160080A1 (en) Evaluating pose data of an augmented reality (ar) application
JP6842618B2 (en) Creating a 3D map
WO2023279867A1 (en) Simultaneous localization and mapping rear-end optimization method and apparatus, and storage medium
KR101273634B1 (en) Tracking Method of Multiple Objects using Mobile Device in Augumented Reality Environment and System Using the same
CN113325950B (en) Function control method, device, equipment and storage medium
CN110211239B (en) Augmented reality method, apparatus, device and medium based on label-free recognition
US20240203020A1 (en) Systems and methods for generating or rendering a three-dimensional representation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21754549

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21754549

Country of ref document: EP

Kind code of ref document: A1