CN114730482A

CN114730482A - Associating device coordinate systems in a multi-person augmented reality system

Info

Publication number: CN114730482A
Application number: CN202080078551.3A
Authority: CN
Inventors: 徐毅
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2019-11-27
Filing date: 2020-11-23
Publication date: 2022-07-08
Also published as: US20220245851A1; WO2021104203A1

Abstract

Methods and systems for mapping the coordinate systems of devices of an AR system are disclosed. A third device may receive, from a first device, a first pose associated with a first coordinate system and, from a second device, a second pose associated with a second coordinate system. The third device may receive an image that includes a fiducial marker presented by each of the first and second devices. The first pose and the portion of the image including the first fiducial marker may define a first set of 3D coordinates. The second pose and the portion of the image including the second fiducial marker may define a second set of 3D coordinates. A coordinate system transformation may be generated from a correspondence between the first and second sets of 3D coordinates. The coordinate system transformation may be sent to the first device and/or the second device.

Description

Associating device coordinate systems in a multi-person augmented reality system

Technical Field

The present disclosure relates generally to Augmented Reality (AR), and more particularly, but not by way of limitation, to associating the coordinate systems of multiple devices in an AR environment.

Background

An AR presentation may project virtual objects into a virtual environment, shown on a display screen of a device, that represents the real-world environment in which the device is located. A camera of the device may capture the real-world environment. Virtual objects may be created to mimic real-world objects and presented on the display screen such that they appear to belong naturally in the environment. For example, the camera may capture real-time video of a real-world environment including an empty picnic table. The device generates a virtual object of a picnic basket and presents it as if it were placed on the picnic table. On the display screen, the virtual object appears substantially as the corresponding real-world object would if it were physically located on the picnic table.

In a multi-device AR system, each device may present a virtual environment representing a real environment. In particular, each device presents a virtual environment from the perspective of that device. Therefore, in such systems, consistency of presentation of the virtual environment between the devices is important.

Disclosure of Invention

Aspects of the present disclosure include a method for mapping the coordinate systems of devices in a multi-person AR system. The method comprises: receiving, by a third mobile device, data indicative of a first pose of a first mobile device, wherein the first pose is defined relative to a first coordinate system associated with the first mobile device; receiving, by the third mobile device, data indicative of a second pose of a second mobile device, wherein the second pose is defined relative to a second coordinate system associated with the second mobile device; receiving, by the third mobile device, an image showing a first fiducial marker displayed by the first mobile device and a second fiducial marker displayed by the second mobile device; identifying, by the third mobile device, a first set of three-dimensional coordinates associated with the first fiducial marker, the first set of three-dimensional coordinates identified relative to the first coordinate system; identifying, by the third mobile device, a second set of three-dimensional coordinates associated with the second fiducial marker, the second set of three-dimensional coordinates identified relative to the second coordinate system; generating, by the third mobile device, a coordinate system transformation based on the first set of three-dimensional coordinates and the second set of three-dimensional coordinates, the coordinate system transformation mapping coordinates between the first coordinate system and the second coordinate system; and transmitting, by the third mobile device, the coordinate system transformation to the second mobile device.

Another aspect of the disclosure includes a system comprising one or more processors and a non-transitory computer-readable medium comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform the above-described method.

Another aspect of the disclosure includes a non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform the above-described method.

Further areas of applicability of the present disclosure will become apparent from the detailed description provided hereinafter. It should be understood that the detailed description and specific examples, while indicating various embodiments, are intended for purposes of illustration only and are not intended to necessarily limit the scope of the disclosure.

Drawings

The present disclosure is described in connection with the accompanying drawings.

Fig. 1 is an illustration of an AR system in which coordinate systems of multiple devices may be aligned, in accordance with at least one aspect of the present disclosure.

Fig. 2 illustrates an example of fiducial markers that may be presented on a first device for multi-device coordinate system alignment in accordance with at least one aspect of the present disclosure.

Fig. 3A illustrates a process of generating a pose of one device relative to a coordinate system of another device in accordance with at least one aspect of the present disclosure.

Fig. 3B illustrates a process of generating 3D coordinates of a second device relative to a coordinate system of the second device in accordance with at least one aspect of the present disclosure.

Fig. 3C illustrates a process of generating a coordinate system transformation in accordance with at least one aspect of the present disclosure.

Fig. 4 is a flow chart of a process of aligning coordinate systems of two mobile devices according to at least one aspect of the present disclosure.

Fig. 5 illustrates an example of components of a computing system executing an AR application in accordance with at least one aspect of the present disclosure.

In the drawings, similar components and/or features may have the same reference numerals. Further, various components of the same type may be distinguished by following the reference label by a dash and a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the specification is applicable to any similar component having the same first reference label irrespective of the second reference label.

Detailed Description

The following description provides preferred exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the following description of the preferred exemplary embodiments will provide those skilled in the art with a convenient road map for implementing the preferred exemplary embodiment or exemplary embodiments. It is understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope as set forth in the appended claims.

An augmented reality application executes on a device to present virtual objects in video that is simultaneously taken of a real-world environment in which the device is located. The device may present the captured video on a display screen of the device, wherein the virtual object is presented such that the virtual object appears correctly in the virtual environment. For example, virtual objects may be presented on a display screen in substantially the same position and orientation as if the corresponding real-world object were physically present in the real-world environment. To maintain the consistency of the virtual object as the device moves, the device may track its position and orientation in the real-world environment to ensure that the virtual object continues to appear correctly in the event of a change in the perspective of the device. The position and orientation define the pose of the device. The device may define a coordinate system to map the virtual environment to the real world environment and track its pose and virtual objects.

In a multi-device augmented reality application, the same virtual object may be presented on each display screen of multiple devices located in a real-world environment. Typically, each device performs a tracking process, such as a SLAM process, to track its pose (e.g., position and orientation) in the environment according to its own coordinate system. Since the coordinate systems of the devices are different, a transformation between the coordinate systems may be required in order to display the various instances of the same virtual object on the devices in a coordinated, coherent manner. The transformation may be generated based on one or more of the devices displaying the known fiducial markers and the remaining devices taking images of these fiducial markers. In particular, at least two devices present fiducial markers and a third device generates an image showing the presented fiducial markers to derive a transformation between the coordinate systems of the two devices.

For example, during AR calibration, the first device and the second device may begin tracking their respective poses (e.g., position and orientation) in the real-world environment. The first device may determine a first pose (T₁) relative to a first coordinate system of the first device using, for example, a SLAM process or other tracking techniques. Similarly, the second device may determine a second pose (T₂) relative to a second coordinate system of the second device. The first device may present a first fiducial marker on a display screen of the first device and transmit data on the first pose (T₁) to the third device. The second device may also present a second fiducial marker on a display screen of the second device and transmit data on the second pose (T₂) to the third device. The third device may be directed to generate an image showing the fiducial markers displayed by the first and second devices within the same image.

In some examples, the first device and the second device may present instances of the same fiducial marker. Based on the image, the AR application of the third device detects first feature points p_1i (i = 0,1,2,3 … n) from the first fiducial marker and second feature points p_2i (i = 0,1,2,3 … n) from the second fiducial marker. The second feature points p_2i (i = 0,1,2,3 … n) are in one-to-one correspondence with the first feature points p_1i (i = 0,1,2,3 … n). The AR application derives a first set of three-dimensional (3D) coordinates of the first feature points p_1i (i = 0,1,2,3 … n) in the first coordinate system from the first pose (T₁) of the first device and known information on the geometry of the first device. Further, the AR application derives a second set of 3D coordinates of the second feature points p_2i (i = 0,1,2,3 … n) in the second coordinate system from the second pose (T₂) and the information about the geometry of the second device. The AR application establishes a correspondence between the first set of 3D coordinates and the second set of 3D coordinates.
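As an illustration only, the derivation of a set of 3D coordinates from a device pose and known device geometry might be sketched as follows. The sketch assumes that the pose is available as a 4x4 homogeneous matrix mapping the device's own frame into its coordinate system, and that the device geometry gives each feature point's position on the screen in that device frame. The function name `marker_points_to_coordinate_system` and all numeric values are hypothetical and do not come from the disclosure.

```python
from typing import List, Sequence

Matrix4 = List[List[float]]
Point3 = List[float]


def marker_points_to_coordinate_system(
    pose: Matrix4, points_on_device: Sequence[Point3]
) -> List[Point3]:
    """Map feature points given in the device frame into the device's coordinate system.

    `pose` is assumed to be a 4x4 homogeneous matrix [R | t; 0 0 0 1].
    """
    out = []
    for x, y, z in points_on_device:
        out.append([
            pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
            for r in range(3)
        ])
    return out


# Hypothetical pose T1: rotated 90 degrees about z, translated by (1, 2, 0).
T1 = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
# Hypothetical feature points on the screen plane, in metres, in the device frame.
feature_points = [[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [0.0, 0.05, 0.0]]
coords_1 = marker_points_to_coordinate_system(T1, feature_points)
```

The same function applied to the second pose and the second device's geometry would give the second set of 3D coordinates, with the i-th entries of both sets forming a corresponding pair.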

The third device generates a coordinate system transformation using the correspondence. The coordinate system transformation may map points (e.g., coordinates) in the first coordinate system to corresponding points within the second coordinate system, and vice versa. In some cases, the coordinate system transformation may include an estimate of one or more rigid transformations, such as rotation, translation, reflection, combinations thereof, and so forth.
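A minimal sketch of one way to obtain a rotation and translation from such a correspondence is shown below. It assumes exactly three non-collinear, noise-free corresponding points and builds an orthonormal frame in each coordinate system; this is a simplification chosen for illustration, not the method of the disclosure. With noisy measurements or more points, a least-squares estimator would normally be used instead, and this construction does not recover reflections. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def _sub(a: Vec, b: Vec) -> Vec:
    return [a[i] - b[i] for i in range(3)]


def _cross(a: Vec, b: Vec) -> Vec:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _unit(a: Vec) -> Vec:
    n = math.sqrt(sum(c * c for c in a))
    return [c / n for c in a]


def _frame(p0: Vec, p1: Vec, p2: Vec) -> List[Vec]:
    """Right-handed orthonormal basis built from three non-collinear points."""
    e1 = _unit(_sub(p1, p0))
    e3 = _unit(_cross(e1, _sub(p2, p0)))
    e2 = _cross(e3, e1)
    return [e1, e2, e3]


def rigid_from_three(src: Sequence[Vec], dst: Sequence[Vec]) -> Tuple[List[Vec], Vec]:
    """Return (R, t) such that dst[i] = R * src[i] + t for exact input."""
    a, b = _frame(*src), _frame(*dst)
    # R maps each basis vector a[k] onto b[k]: R = sum_k b[k] a[k]^T.
    R = [[sum(b[k][r] * a[k][c] for k in range(3)) for c in range(3)]
         for r in range(3)]
    t = [dst[0][r] - sum(R[r][c] * src[0][c] for c in range(3)) for r in range(3)]
    return R, t


def apply_rigid(R: List[Vec], t: Vec, p: Vec) -> Vec:
    return [sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)]


# Three corresponding points: first coordinate system -> second coordinate system.
first_set = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
second_set = [[1.0, 2.0, 3.0], [1.0, 3.0, 3.0], [0.0, 2.0, 3.0]]
R, t = rigid_from_three(first_set, second_set)
mapped = apply_rigid(R, t, [1.0, 1.0, 1.0])
```

Once `R` and `t` are known, any point expressed in the first coordinate system can be mapped into the second, and the inverse mapping follows from the transpose of `R`.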

The devices described herein may be any type of computing device, such as a mobile device (e.g., smartphone, laptop, tablet, PDA, etc.), a desktop computing device, a dedicated hardware device such as a fixed or portable game console, and so forth. The devices described herein may include a built-in camera that may be used during execution of an AR application. In some cases, the camera may not be built in, but connected to the device. In these cases, the device and the camera may move independently of each other. The coordinate system of the device may be based on the position/orientation of the device or based on the position/orientation of the camera.

Fig. 1 is an illustration of an AR system 100 that may map coordinate systems of multiple devices in accordance with at least one aspect of the present disclosure. AR system 100 may run a multi-device augmented reality application in which virtual objects may be presented within the display screen of each device. For example, the first device 104 may define a virtual object to be presented within the environment in which the first device is located. The virtual object may be presented on the display screen 108 of the first device 104 as if the virtual object were a physical object located in the environment. Since the position of the virtual object in the environment may be based on the coordinate system 102 of the first device, which may not be shared by the second device 116 of the multi-device AR application, the virtual object may not be properly rendered on the display screen 120 of the second device 116 unless the coordinate system 102 of the first device 104 can be mapped to the coordinate system 118 of the second device 116.

During an initial calibration process for an AR application, the coordinate system 102 of the first device 104 may be mapped to the coordinate system 118 of the second device 116 without using a fixed reference point in the environment. For example, each of the first device 104 and the second device 116 may present an image of a fiducial marker on its respective display screen. The third device 128 may access an image of the first device 104 and the second device 116 displaying the fiducial markers and use the image (along with the pose of each of the first device and the second device and the device geometry information of each device) to generate a coordinate system transformation that maps the coordinates of the first coordinate system 102 associated with the first device to the corresponding coordinates of the second coordinate system 118 associated with the second device.

In particular, during calibration, it may be determined that the current position of the device is the origin of its own coordinate system. The device may track its motion and update its position in its coordinate system using internal sensors (e.g., accelerometers, Global Positioning System (GPS) sensors, compasses, combinations of the above, etc.), image processing (e.g., using machine learning, deep learning, etc.), or combinations of the above. The device may not have data relating to its environment, but it may track its position relative to the device's initial calibration position (e.g. its origin). For example, if the internal sensor indicates that the device has moved one meter after calibration, the position of the device may be determined to be one meter (in a particular direction) from the origin. Thus, the coordinate system of the device may be used to indicate the relative position of the device (e.g., relative to the position of the device at calibration), not necessarily the absolute position of the device, as the environment may be unknown.

In some cases, the device may perform a simultaneous localization and mapping (SLAM) process that may define a coordinate system of the device and a pose of the device. The SLAM process may also track devices and objects in the environment relative to the coordinate system of the device. Even if the environment is unknown (at least initially unknown) to the SLAM process, the SLAM process can be used to track devices and objects in the environment. The SLAM process may take as input variables such as, but not limited to, control data c_t, sensor data s_t, and time interval t, and generate an output that may include the approximate location x_t of the device and a map m_t of the environment over a given time interval.

SLAM may begin with a calibration step, where an empty map of the environment may be initialized, with the device at the origin. The SLAM process may update x_t and m_t when the device captures sensor data indicating movement in a particular direction (and optionally image data from a camera of the device that may be used to indicate objects in the environment). SLAM can be an iterative process that updates x_t and m_t at set time intervals or when new sensor data or image data is detected. For example, if no sensor change has occurred between time intervals t and t + 1, then the SLAM process may delay updating the location and map to conserve processing resources. Upon detecting a change in sensor data indicating a high likelihood that the device has moved from its previous location x_t, the SLAM process may calculate the new location x_(t+1) of the device and update the map to m_(t+1).
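The iterative update and the skipped update when nothing has changed can be illustrated with a toy loop. This is not a SLAM implementation: it ignores orientation, noise, and loop closure, and simply accumulates displacements relative to the calibration origin. The function `update_estimate`, its parameters, and the `min_motion` threshold are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple

Position = Tuple[float, ...]


def update_estimate(
    x_prev: Position,
    m_prev: List[Position],
    displacement: Sequence[float],
    observations: Sequence[Sequence[float]],
    min_motion: float = 1e-3,
) -> Tuple[Position, List[Position]]:
    """One iteration: return (x_t, m_t), skipping the update when nothing changed."""
    moved = sum(d * d for d in displacement) ** 0.5
    if moved < min_motion and not observations:
        return x_prev, m_prev  # no sensor change: keep the previous estimate
    x_new = tuple(p + d for p, d in zip(x_prev, displacement))
    # Observations are offsets from the device; store them relative to the origin.
    m_new = m_prev + [tuple(p + o for p, o in zip(x_new, obs)) for obs in observations]
    return x_new, m_new


# Calibration: device at the origin of its own coordinate system, empty map.
x, m = (0.0, 0.0, 0.0), []
# The device moves one metre along x and observes an object two metres away along y.
x, m = update_estimate(x, m, (1.0, 0.0, 0.0), [(0.0, 2.0, 0.0)])
# No motion and no new observation: the estimate is left untouched.
x, m = update_estimate(x, m, (0.0, 0.0, 0.0), [])
```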

Once the first and second devices detect the initial pose of the first and second devices, respectively, the first device may present fiducial marker 112 on display screen 108 of first device 104 and the second device may present fiducial marker 124 on display screen 120 of second device 116. The fiducial markers 112 and 124 may be predetermined so that the devices of the multi-device AR system 100 may know the size, shape, color, pattern, shading, etc. The fiducial markers 112 and 124 may be the same or different. If fiducial markers 112 and 124 are to be identical fiducial markers, the first device may send an identification of the fiducial marker to be displayed to the second device to cause the second device to display the same fiducial marker. Alternatively, the second device may send an identification of the fiducial marker to be displayed to the first device to cause the first device to display the same fiducial marker.

If the fiducial markers 112 and 124 displayed on the first and second devices are different, the third device may identify the fiducial marker displayed by each device, for example, using an imaging process as described below or by receiving an identification of the displayed fiducial marker from the first and second devices. The third device may then determine that the feature points of one fiducial marker correspond to the feature points of another fiducial marker (e.g., to the same 2D coordinates or the same physical location on the first device, etc.). For example, the third device may use a table including correspondences between feature points of two or more fiducial markers. The third device may use the identification of each fiducial marker and identify from the table the corresponding feature points of each fiducial marker. Fiducial marker 112 and fiducial marker 124 are shown as distinct fiducial markers, where fiducial marker 112 may comprise a set of black squares distributed over a predetermined area, the predetermined area having a known size. The fiducial marker 124 may be a checkerboard pattern of black and white squares.

The third device 128 may be directed to obtain images of the display screen 108 of the first device 104 and the display screen 120 of the second device 116 when the fiducial markers 112 and 124 are presented so that both fiducial markers 112 and 124 may appear in the images. Since the size, color, pattern, etc. of the fiducial markers are known, the images of the fiducial markers 112 and 124 may be processed to define a first set of two-dimensional feature points corresponding to the fiducial marker 112 and a second set of two-dimensional feature points corresponding to the fiducial marker 124. The feature points may correspond to points within the fiducial marker, such as the center of a square, the vertices between squares, the corners of the fiducial marker, or any other point of the fiducial marker that may be readily identified based on the characteristics of the fiducial marker. The first set of two-dimensional feature points may be processed with the first pose of the first device 104 and known geometric information of the first device to identify a first set of 3D coordinates (relative to the first coordinate system 102), where each 3D coordinate may correspond to one feature point in the first set of feature points. The second set of two-dimensional feature points may be processed with the second pose of the second device 116 and known geometric information of the second device to identify a second set of 3D coordinates (relative to the second coordinate system 118), where each 3D coordinate may correspond to a feature point of the second set of feature points.

The correspondence between the first set of 3D coordinates and the second set of 3D coordinates may be used to generate a coordinate system transformation comprising one or more rigid transformations. A rigid transformation is a geometric transformation of Euclidean space that preserves the distance between pairs of points. Rigid transformations may include rotation (about one or more axes), translation, reflection, or a combination thereof. The rigid transformation may provide a mapping of the coordinates of the first coordinate system 102 to the corresponding coordinates of the second coordinate system 118.

The first coordinate system 102 may be mapped to the second coordinate system 118 during an initial calibration process. In some cases, the mapping process may be performed again after the initial calibration process, for example when the SLAM process resets (since this would initiate a new coordinate system for the device), after calibration values indicate that the mapping is no longer accurate, or after a predetermined time interval has elapsed.

Fig. 2 illustrates an example of fiducial markers that may be presented on a device to map the coordinate system of each device in accordance with at least one aspect of the present disclosure.

Fiducial markers 204 and 236 may be fiducial markers that may be used to map the device coordinate system. When placed in the real world, fiducial markers may be used to detect the position, orientation, and/or scale of objects in the environment. The fiducial markers 204 and 236 may comprise a predetermined size, shape, and pattern that may be known to the device taking the image of the fiducial marker. Using known characteristics of the fiducial markers, the device may detect the fiducial markers in the image and calculate the positions of the fiducial markers, define a virtual object to occupy space near or at the fiducial markers, detect the scale of objects within the environment, detect the orientation of the device relative to the fiducial markers, combinations thereof, and the like.

For example, the characteristics of each fiducial marker may be selected so that the fiducial marker and its feature points can be detected regardless of the particular rotation or transformation of the marker within the captured image. For example, the fiducial marker may include one or more shapes within the fiducial marker that appear different when rotated or transformed to indicate the degree of rotation and/or transformation when detected. The degree of rotation and/or transformation may be used to determine the orientation of the device that captured the image. For example, if a fiducial marker in an image is rotated by 45 degrees, it may be determined that the device is also rotated by 45 degrees.
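The 45-degree example can be made concrete with a small sketch that measures the in-image rotation of a marker from two detected feature points. It assumes that the two points lie on an edge of the marker that is horizontal when the marker is unrotated; the function name and the pixel values are hypothetical.

```python
import math
from typing import Tuple

Pixel = Tuple[float, float]


def marker_rotation_deg(p_a: Pixel, p_b: Pixel) -> float:
    """Angle, in degrees, of the reference edge a -> b relative to the image x-axis."""
    return math.degrees(math.atan2(p_b[1] - p_a[1], p_b[0] - p_a[0]))


# Two corners of the marker's reference edge as detected in the image (pixels).
angle = marker_rotation_deg((100.0, 100.0), (200.0, 200.0))
```

With image coordinates whose y-axis points downward, a positive angle corresponds to a clockwise rotation as seen on screen.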

In some cases, one or more rotations, affine transformations, Euclidean transformations, reflections, transpositions, and combinations thereof may be performed on the images of the fiducial markers to output processed fiducial markers that appear in a predetermined orientation. For example, the device that photographs the fiducial marker may store characteristics (e.g., size, pattern, color, etc.) of the fiducial marker. However, fiducial markers may not appear as expected (e.g., they may be rotated, blurred, stretched, etc.). The images may be processed to isolate the fiducial markers and rotate and/or translate the fiducial markers so that they appear in the intended orientation. In other cases, the image of the fiducial marker may not be processed to change the orientation of the fiducial marker within the image.

The device detects the orientation of the fiducial marker by detecting one or more feature points of the fiducial marker. The feature points may be detected using the detected characteristics of the fiducial marker and the known characteristics of the fiducial marker. For example, the feature points may be based on a particular characteristic of the marker. For example, the fiducial marker 204 may be a checkerboard pattern. Feature points may be detected at vertices between each set of four squares, at the center of each white square or black square, at corners of each white square, at corners of each black square, combinations thereof, and the like. Each fiducial marker may include one or more feature points that may be detected within the image. In some cases, each fiducial marker may include three or more feature points. Although any number of feature points may be detected, the more feature points that can be detected, the greater the accuracy of mapping one coordinate system to another.

In some cases, each fiducial marker may include a different number of feature points than other fiducial markers. For example, the first fiducial marker may include a first set of feature points and the second fiducial marker may include a second set of feature points (where the number of feature points in the first set is not equal to the number of feature points in the second set). At least some of the feature points of the first set of feature points may correspond to at least some of the feature points of the second set of feature points. These corresponding feature points may be identified by a correspondence table between different fiducial markers. For example, the table may indicate which feature points of the first set of feature points of the first fiducial marker correspond to which feature points of the second set of feature points of the second fiducial marker. In these cases, the first set of feature points may include three or more feature points corresponding to feature points in the second set of feature points.
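A correspondence table of this kind could be represented, for example, as a mapping from a pair of marker identifiers to pairs of feature-point indices. The marker identifiers, indices, and coordinates below are hypothetical and serve only to show how corresponding 3D coordinates might be paired when the two markers differ.

```python
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]

# Hypothetical table: for a pair of marker identifiers, which feature-point
# index of the first marker corresponds to which index of the second marker.
CORRESPONDENCE_TABLE: Dict[Tuple[str, str], Dict[int, int]] = {
    ("squares", "checkerboard"): {0: 0, 1: 3, 2: 12, 3: 15},
}


def matched_pairs(
    marker_1: str,
    coords_1: Dict[int, Point3],
    marker_2: str,
    coords_2: Dict[int, Point3],
) -> List[Tuple[Point3, Point3]]:
    """Pair up 3D coordinates of feature points that the table marks as corresponding."""
    table = CORRESPONDENCE_TABLE[(marker_1, marker_2)]
    return [
        (coords_1[i], coords_2[j])
        for i, j in sorted(table.items())
        if i in coords_1 and j in coords_2  # skip points that were not detected
    ]


# Detected feature points, keyed by index; point 3 of the first marker is missing.
first = {0: (0.0, 0.0, 0.0), 1: (0.1, 0.0, 0.0), 2: (0.0, 0.1, 0.0)}
second = {0: (1.0, 0.0, 0.0), 3: (1.0, 0.1, 0.0), 7: (0.95, 0.05, 0.0),
          12: (0.9, 0.0, 0.0), 15: (0.9, 0.1, 0.0)}
pairs = matched_pairs("squares", first, "checkerboard", second)
```

Here three pairs remain, which matches the stated minimum of three corresponding feature points.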

The device may use image processing to distinguish fiducial markers from other parts of the image and to detect feature points of the fiducial markers. One such image processing technique includes edge detection. Edge detection may include filtering techniques, where one or more filters may be applied to an image. The filter may modify the image by blurring, sharpening, transformation (such as, but not limited to, one or more affine transformations, Euclidean transformations, etc.), and the like. The filter may reduce image noise by, for example, removing image artifacts and/or other portions of the image that do not correspond to the fiducial marker.

In some cases, some portions of an image may be processed more than other portions of the image. For example, one portion of an image may appear blurred while another portion of the image may be sharp. Different filters may be applied to different parts of the image, and furthermore, a set of different filters may be applied to different parts of the image. For example, a first portion of an image may be filtered to sharpen the first portion, and a second portion of the image may be filtered with an affine transform filter and noise reduction. Any number of different filters may be applied to the image and/or each partition.

Once the filter is applied, edge detection can identify changes in pixel intensity gradients between adjacent pixels. A large variation in intensity between adjacent pixels may indicate the presence of an edge. For example, if a first pixel with a high intensity value is adjacent to a pixel with a low intensity value, this may indicate that the first pixel is part of an edge. In some cases, pixels that do not belong to an edge may be suppressed (e.g., set to a predetermined red/green/blue value, such as black, where red = 0, green = 0, and blue = 0, or any other predetermined red/green/blue value). Edge detection operators such as Roberts cross operators, Prewitt operators, Sobel operators, etc. may be used as part of the identification of pixel intensity gradients.
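As a sketch of how one of the named operators identifies intensity gradients, the following applies the standard 3x3 Sobel kernels to a small grayscale image stored as nested lists. Border pixels are simply left at zero, which is a simplification for this example; the function name is hypothetical.

```python
from typing import List

Image = List[List[float]]

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img: Image) -> Image:
    """Gradient magnitude per pixel; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


# Dark left half, bright right half: a vertical edge between columns 1 and 2.
image = [[0, 0, 255, 255] for _ in range(4)]
magnitude = sobel_magnitude(image)
flat = sobel_magnitude([[10] * 4 for _ in range(4)])
```

Pixels next to the dark-to-bright boundary receive a large magnitude, while a uniform image yields zero everywhere.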

A non-maxima suppression process may be used to suppress pixels that do not correspond strongly to edges. The non-maxima suppression process assigns an edge intensity value to each pixel identified as part of the edge using a pixel intensity gradient. For each pixel identified as part of an edge, the edge intensity value of the pixel may be compared to the edge intensity values of the eight surrounding pixels of the pixel. If the pixel has a higher edge intensity value (e.g., a local maximum) than the edge intensity values of surrounding pixels, the surrounding pixels are suppressed. Non-maxima suppression may be repeated for each pixel in the entire image.
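The comparison against the eight surrounding pixels might look like the sketch below, which keeps a pixel only when its edge intensity value is at least as large as that of every neighbour. This follows the eight-neighbour description above; many edge detectors instead compare only along the gradient direction. The function name is an assumption.

```python
from typing import List

Image = List[List[float]]


def non_max_suppress(strength: Image) -> Image:
    """Keep only pixels whose edge strength is a local maximum among 8 neighbours."""
    h, w = len(strength), len(strength[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            v = strength[y][x]
            neighbours = [
                strength[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
                if (j, i) != (y, x)
            ]
            if v > 0 and all(v >= n for n in neighbours):
                out[y][x] = float(v)
    return out


edge_strength = [
    [1, 2, 1],
    [2, 9, 2],
    [1, 2, 1],
]
thinned = non_max_suppress(edge_strength)
```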

A dual threshold process may then be performed to remove noise and/or false edge pixels introduced by the image processing techniques applied previously. Two pixel intensity thresholds can be defined, one high and one low. The thresholds may be used to assign a strong or weak intensity attribute to each pixel. Pixels that include intensity values above the high threshold may be assigned a strong intensity attribute, while pixels that include intensity values between the high and low thresholds may be assigned a weak intensity attribute. Pixels that include intensity values below the low threshold may be suppressed (e.g., in the same manner as described above).

Then, a hysteresis process may be performed to remove pixels with weak intensity attributes (weak due to noise, color variations, etc.). For example, a local statistical analysis (e.g., connected component analysis, etc.) may be performed on each pixel having a weak intensity attribute. Pixels with weak intensity attributes that are not surrounded by pixels with strong intensity attributes may be suppressed. The remaining pixels (e.g., non-suppressed pixels) after the hysteresis process include only those pixels that belong to an edge.

Although the five processes described above are described in a particular order, each process may be performed any number of times (e.g., repeatedly) and/or in any order without departing from the spirit or scope of the present disclosure. In some cases, only a subset of the five processes need be performed on the image. For example, the image processing may identify pixel intensity gradients without first performing the filtering process. In some cases, a partially processed image may be received (e.g., one on which one or more of the above processes have already been performed). In this case, one or more additional processes may be performed to complete the image processing.
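The dual threshold and hysteresis steps can be sketched together. The example assumes the labels "strong", "weak", and "none" for the intensity attributes, and it interprets the connected component analysis as keeping a weak pixel only when it is connected, through 8-connected weak pixels, to at least one strong pixel. The function names and threshold values are hypothetical.

```python
from collections import deque
from typing import List

Image = List[List[float]]


def classify(strength: Image, low: float, high: float) -> List[List[str]]:
    """Dual threshold: label each pixel 'strong', 'weak', or 'none' (suppressed)."""
    return [
        ["strong" if v > high else "weak" if v >= low else "none" for v in row]
        for row in strength
    ]


def hysteresis(labels: List[List[str]]) -> List[List[bool]]:
    """Keep strong pixels plus weak pixels connected to them (8-connectivity)."""
    h, w = len(labels), len(labels[0])
    keep = [[lab == "strong" for lab in row] for row in labels]
    queue = deque((y, x) for y in range(h) for x in range(w) if keep[y][x])
    while queue:
        y, x = queue.popleft()
        for j in range(max(0, y - 1), min(h, y + 2)):
            for i in range(max(0, x - 1), min(w, x + 2)):
                if labels[j][i] == "weak" and not keep[j][i]:
                    keep[j][i] = True
                    queue.append((j, i))
    return keep


# One row of edge strengths: the last weak pixel is isolated from any strong pixel.
strength = [[200, 80, 10, 80]]
labels = classify(strength, low=50, high=150)
edges = hysteresis(labels)
```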

In some cases, signal processing (e.g., similar to that applied to radio frequency signals) may be performed on the image. The image may be transformed to the frequency domain (e.g., using a Fourier transform, etc.) to represent the frequencies at which particular pixel features (e.g., pixel intensities, RGB values, etc.) are present in the image. In the frequency domain, one or more filters (e.g., without limitation, Butterworth filters, band-pass filters, etc.) may be applied to the image (e.g., during or after preprocessing or edge detection) to suppress or alter particular frequencies. Suppressing specific frequencies may reduce noise, eliminate image artifacts, suppress non-edge pixels, eliminate pixels of specific colors or color gradients, normalize color gradients, and the like. A high-pass filter may reveal edges in the image (e.g., sharpen color and/or intensity differences between adjacent pixels), while a low-pass filter may blend the edges (e.g., blur). Image padding may be performed prior to signal processing to improve the results of the signal processing techniques. In some cases, different portions and/or patches of an image may be processed differently, some with high-pass filters and others with low-pass filters. In some cases, the threshold (e.g., the cutoff frequency of a high-pass or low-pass filter) may be modified for different portions of the image (e.g., based on image processing of one or more previous images, machine learning, and/or the like).
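To illustrate the blurring effect of a low-pass filter in the frequency domain, the sketch below transforms a single row of pixel intensities with a direct discrete Fourier transform, zeroes the bins above a cutoff, and transforms back. It uses an ideal (brick-wall) filter rather than a Butterworth filter, and a direct O(n²) transform rather than an FFT, purely for brevity; the function names and the cutoff are assumptions.

```python
import cmath
from typing import List, Sequence


def dft(signal: Sequence[float]) -> List[complex]:
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]


def idft(spectrum: Sequence[complex]) -> List[float]:
    n = len(spectrum)
    return [
        sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * k / n) for f in range(n)).real / n
        for k in range(n)
    ]


def low_pass(row: Sequence[float], cutoff: int) -> List[float]:
    """Zero every frequency bin above `cutoff` (ideal low-pass), then invert."""
    n = len(row)
    spectrum = dft(row)
    for f in range(n):
        if min(f, n - f) > cutoff:  # bins f and n - f represent the same frequency
            spectrum[f] = 0
    return idft(spectrum)


# Alternating dark and bright pixels: the highest frequency a row of 8 can hold.
row = [0, 255, 0, 255, 0, 255, 0, 255]
blurred = low_pass(row, cutoff=1)
```

Removing the high-frequency component leaves only the mean intensity, so the sharp alternation is blended into a uniform grey.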

The signal processing may also determine other attributes of the image, such as identifying the coherence of the relationships between pixels (e.g., for edge detection, segmentation, pattern analysis, etc.). The relationship between pixels may be used to further refine edge detection and/or identify structural characteristics depicted within the image. For example, coherence can be used to identify relevant image portions (e.g., portions of the same object) and irrelevant image portions.

Fiducial markers 204 and 236 are examples of fiducial markers that may be used to map one coordinate system of two or more devices to another coordinate system. For example, the fiducial marker 204 may be a checkerboard pattern of alternating squares of two or more colors. In some cases, the colors may have high contrast, such as white and black. In other cases, one or more colors other than black and white may be used, such as red, green, and/or blue (or alternatively, cyan, magenta, and/or yellow). In still other cases, contrast pattern fill may be used, where one set of squares may not include a pattern, while another set of squares may use cross-hatching. The fiducial marker 204 may or may not include a border around the fiducial marker because edge detection may be used to define the border of the fiducial marker.

The fiducial markers may have an irregular shape and may not follow a set pattern. For example, fiducial markers 208, 212, 216, 220, and 236 comprise a set of black squares interspersed in a predetermined area. The square shape of the fiducial marker may be used in part to determine a particular profile of the fiducial marker. Further, a scatter pattern of the set of squares (e.g., a distance between two or more particular squares, etc.) may be used to indicate a location of the device taking the image. For example, the distance between two non-adjacent squares may be known to the device. The device may calculate the difference between the known distance and the distance detected in the captured image. The greater the difference between the known value and the distance calculated from the image, the further the camera may be from the fiducial marker.
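One way to turn the comparison between a known distance and its apparent size in the image into a range estimate is the pinhole camera relation. The sketch below is only an illustration under stated assumptions: it assumes an ideal pinhole camera viewing the marker roughly head-on, and the function name `estimate_camera_distance` and all numeric values are hypothetical.

```python
def estimate_camera_distance(known_distance_m, observed_distance_px, focal_length_px):
    """Pinhole estimate: range = focal length * real size / apparent size."""
    if observed_distance_px <= 0:
        raise ValueError("observed distance must be positive")
    return focal_length_px * known_distance_m / observed_distance_px

# Two squares known to be 5 cm apart, camera focal length of 1000 pixels
# (both values are assumed for the example).
near = estimate_camera_distance(0.05, observed_distance_px=100.0, focal_length_px=1000.0)
far = estimate_camera_distance(0.05, observed_distance_px=50.0, focal_length_px=1000.0)
```

The smaller the squares' separation appears in the image, the larger the estimated distance between the camera and the marker.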

Similarly, the size of a particular set of squares can be calculated and compared to known sizes. The variation in the size of the squares can be used to determine the orientation of the device relative to the fiducial marker. For example, if one side of a square appears larger than the other sides of the square, the camera of the device may have captured the image of the fiducial marker from an angle that is offset from, rather than perpendicular to, the fiducial marker.

In some cases, the fiducial markers may have a non-square shape, such as fiducial markers 224 and 228. The fiducial markers 224 and 228 may be circular in shape with an internal circular shape. In some cases, one or more additional shapes, such as lines bisecting circles, may be included within the fiducial markers. These additional shapes may indicate the orientation of the fiducial markers in order to indicate the orientation of the device.

Although the fiducial markers shown in Fig. 2 have particular patterns, shapes, colors, orientations, etc., a fiducial marker may have any shape, which may be geometric, such as the squares and circles shown, or amorphous.

Fig. 3A illustrates a process of generating 3D coordinates of a first device relative to a coordinate system of the first device in accordance with at least one aspect of the present disclosure. A third device may take images of the first device and the second device presenting the fiducial marker. The first device may present fiducial markers, such as fiducial marker 304. The third device may detect one or more feature points of the fiducial marker 304. The feature points may be any visible unique feature of the fiducial marker. In some cases, the orientation of the camera may be determined from the orientation of the fiducial mark based on characteristics of the fiducial mark. For example, the fiducial marker 304 may be oriented based on the color of the corner shape or the distance between non-adjacent shapes (e.g., 316, 320). The feature points may be detected based on particular characteristics of the fiducial marker, such as the size (e.g., 308 and 312), color, shape, pattern, combinations thereof, and the like, of individual portions of the fiducial marker. For example, the fiducial marker 304 may include a feature point at each vertex 324 formed by two adjacent shapes that share the same color. In this case, the fiducial marker 304 may include 9 feature points. Other fiducial markers may include more or fewer feature points. In some cases, any fiducial marker that includes at least three feature points may be used as the fiducial marker 304. These feature points may be aggregated into a first set of feature points 328.

The pose (T₁) 332 of the first device at the moment when the third device takes the image of the marker may be received from the first device. The pose may represent a position and orientation of the first device relative to a coordinate system of the first device. In some cases, pose 332 may be represented by a rotation vector R¹ₙ and a translation vector t¹ₙ. The rotation vector and the translation vector may be represented within a transformation matrix as shown. The pose (T₁) may be determined using a SLAM process performed on the first device, image processing performed on images captured by the first device, device geometry information (e.g., size of the device), camera information (e.g., scaled focal length, skew parameter, principal point, scale factor, etc.), internal sensors (e.g., accelerometer, gyroscope, compass, etc.), combinations thereof, and the like.
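For reference, a rotation and a translation are commonly packed into a single homogeneous transformation matrix. The form below is that common convention and is an assumption about the matrix shown in the figure. Here R¹ is taken to be a 3×3 rotation matrix and t¹ a 3×1 translation vector (the index n is dropped for brevity), and p_dev is a hypothetical symbol for a point expressed relative to the body of the first device.

```latex
T_1 =
\begin{bmatrix}
R^{1} & t^{1} \\
\mathbf{0}^{\top} & 1
\end{bmatrix},
\qquad
\begin{bmatrix} p^{(1)} \\ 1 \end{bmatrix}
= T_1 \begin{bmatrix} p_{\mathrm{dev}} \\ 1 \end{bmatrix}
\;\Longleftrightarrow\;
p^{(1)} = R^{1}\, p_{\mathrm{dev}} + t^{1}
```

Under this convention, multiplying by T₁ expresses a device-relative point in the first coordinate system.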

The pose (T₁) of the first device and information about the geometry of the first device may be used to identify 3D coordinates 336 of each feature point in the first set of feature points relative to the coordinate system of the first device. For example, since the pose of the first device may indicate the 3D coordinates of the first device in the first coordinate system, the geometry of the device may be used to determine the 3D coordinates of each feature point in the first set of feature points in the first coordinate system.
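The step from a device pose to per-feature-point 3D coordinates can be sketched as below. This is an illustration under assumptions, not the disclosed method: the pose is simplified to a rotation about the vertical axis plus a translation, the feature point offsets on the screen are made-up values standing in for the device geometry, and the names `pose_matrix` and `transform_point` are hypothetical.

```python
import math

def pose_matrix(yaw_rad, translation):
    """4x4 homogeneous pose: rotation about the z axis, then translation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [
        [c,   -s,  0.0, tx],
        [s,    c,  0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform_point(T, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(T[r][k] * v[k] for k in range(4)) for r in range(3))

# Assumed positions of three marker feature points relative to the device
# body (metres), derived from the device and display geometry.
feature_points_device = [(0.0, 0.0, 0.0), (0.03, 0.0, 0.0), (0.0, 0.03, 0.0)]

# Assumed pose of the first device in its own (SLAM) coordinate system.
T1 = pose_matrix(math.pi / 2, (1.0, 2.0, 0.5))

# 3D coordinates of the feature points in the first coordinate system.
coords_first = [transform_point(T1, p) for p in feature_points_device]
```

Each resulting coordinate is the device-relative offset rotated by the device orientation and shifted by the device position.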

Fig. 3B illustrates a process of generating 3D coordinates of a second device relative to a coordinate system of the second device in accordance with at least one aspect of the present disclosure. The second device may display the same fiducial marker as the first device, or a different fiducial marker, such as but not limited to fiducial marker 340. The fiducial marker 340 may comprise a set of black shapes distributed within a predefined area. The shapes may be distributed such that the orientation of the reference mark may be used to determine the orientation of a camera that obtains an image of the reference mark. One or more feature points may be detected from the fiducial marker 340 based on characteristics of the fiducial marker 340, such as, but not limited to, a number of black shapes, a distance between particular black shapes (e.g., 344 or 356), a shape having one or more common vertices or a particular number of common vertices, one or more corners of the fiducial marker (e.g., 348), a size of the fiducial marker (e.g., 352), a pattern of the fiducial marker, a color, combinations thereof, and the like. In some cases, any fiducial marker that includes at least three feature points may be used as fiducial marker 340. These feature points may be aggregated into a second set of feature points 360.

The pose (T₂) 364 of the second device at the time when the third device captures the image of the marker 340 may be received. In some cases, pose 364 may be represented by a rotation vector R²ₙ and a translation vector t²ₙ. The rotation vector and the translation vector may be represented within a transformation matrix as shown. The pose (T₂) may be determined in a manner similar to that described for the first device. The pose (T₂) of the second device and information about the geometry of the second device may be used to identify data indicating 3D coordinates 368 for each feature point in the second set of feature points relative to the coordinate system of the second device. Data indicative of 3D coordinates 368 may be identified in a similar manner as described above.

FIG. 3C illustrates a process of generating a coordinate system transformation in accordance with at least one aspect of the present disclosure. One or more rigid transformations 372 may be defined using the 3D coordinates 336 of the first set of feature points and the 3D coordinates 368 of the second set of feature points; the rigid transformations 372 may transform coordinates in the first coordinate system to coordinates in the second coordinate system. For example, each feature point n¹ of the first feature points has a corresponding feature point n² among the second feature points. This correspondence may be used to match each 3D coordinate in the first coordinate system with a corresponding coordinate in the second coordinate system. For example, for each feature point n¹, the corresponding 3D coordinates [x¹ₙ, y¹ₙ, z¹ₙ] in the first coordinate system and the corresponding feature point n² of the second feature points may be identified. Then, the 3D coordinates [x²ₙ, y²ₙ, z²ₙ] associated with the feature point n² in the second coordinate system can be identified. The correspondence between the 3D coordinates [x¹ₙ, y¹ₙ, z¹ₙ] and the 3D coordinates [x²ₙ, y²ₙ, z²ₙ] can then be determined. One or more rigid transformations (e.g., rotation, translation, reflection, etc.) may be performed on the 3D coordinates [x¹ₙ, y¹ₙ, z¹ₙ] to change them to the 3D coordinates [x²ₙ, y²ₙ, z²ₙ]. Rigid transformations may include one or more rotations, one or more translations, one or more reflections, combinations of one or more of the above, and the like. In some cases, a second one or more rigid transformations 372 may be defined that can change coordinates in the second coordinate system to coordinates in the first coordinate system.
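A rotation and translation that carry matched points from the first coordinate system to the second can be recovered in several ways. The sketch below uses one simple option, chosen for illustration and not taken from the disclosure: it builds an orthonormal frame from three non-collinear matched points in each coordinate system and aligns the two frames. It assumes noise-free correspondences and does not handle reflections; with noisy data a least-squares fit over all pairs would normally be preferred. All function names and point values are hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("points must be distinct and non-collinear")
    return tuple(x / n for x in v)

def frame(p0, p1, p2):
    """Right-handed orthonormal axes built from three non-collinear points."""
    e1 = normalize(sub(p1, p0))
    e3 = normalize(cross(e1, sub(p2, p0)))
    e2 = cross(e3, e1)
    return (e1, e2, e3)

def apply_rigid(R, t, p):
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def rigid_from_three(src, dst):
    """Return (R, t) such that dst[i] = R @ src[i] + t for the three pairs."""
    a, b = frame(*src), frame(*dst)
    # R maps each source axis a[k] onto the matching destination axis b[k].
    R = [[sum(b[k][i] * a[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = sub(dst[0], apply_rigid(R, (0.0, 0.0, 0.0), src[0]))
    return R, t

# Matched feature point coordinates (assumed values). The second set equals
# the first set rotated 90 degrees about z and shifted by (1, 2, 3).
coords_first = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
coords_second = [(1.0, 2.0, 3.0), (1.0, 3.0, 3.0), (0.0, 2.0, 3.0)]

R, t = rigid_from_three(coords_first, coords_second)
mapped = apply_rigid(R, t, (1.0, 1.0, 1.0))
```

Three points are the minimum for this construction, which is consistent with the requirement that a fiducial marker provide at least three feature points.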

Depending on the selected coordinate system, coordinate system transformation 376 may be generated using the first one or more rigid transformations 372 or the second one or more rigid transformations. For example, the third device (or alternatively the first device, the second device, a user of any device, or the server) may select a reference coordinate system (e.g., the first coordinate system or the second coordinate system) on which the coordinates of the virtual object are based. After selection, a coordinate system transformation 376 may be generated, the coordinate system transformation 376 transforming the coordinates in the non-selected coordinate system to corresponding coordinates in the selected coordinate system. For example, the coordinate system transformation may apply each of one or more rigid transformations to identify, from input coordinates (e.g., 380) in the first coordinate system, corresponding output coordinates (e.g., 376) in the second coordinate system. In some cases, the coordinate system transformation may include a first one or more rigid transformations and a second one or more rigid transformations, such that the coordinate system transformation may transform coordinates in the first coordinate system to corresponding coordinates in the second coordinate system and transform coordinates in the second coordinate system to corresponding coordinates in the first coordinate system, as desired.
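A coordinate system transformation that works in both directions can be sketched as a single rotation and translation together with their inverse. The class name `CoordinateSystemTransform`, its method names, and the numeric values below are assumptions made for this example.

```python
class CoordinateSystemTransform:
    """Rigid transform p2 = R @ p1 + t, usable in both directions."""

    def __init__(self, R, t):
        self.R = R  # 3x3 orthogonal matrix as nested lists
        self.t = t  # translation as a 3-tuple

    def first_to_second(self, p):
        return tuple(
            sum(self.R[i][j] * p[j] for j in range(3)) + self.t[i]
            for i in range(3)
        )

    def second_to_first(self, p):
        # The inverse of an orthogonal matrix is its transpose,
        # so p1 = R^T @ (p2 - t).
        d = [p[i] - self.t[i] for i in range(3)]
        return tuple(
            sum(self.R[j][i] * d[j] for j in range(3))
            for i in range(3)
        )

# Assumed transform: 90 degree rotation about z plus a shift of (1, 2, 3).
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = (1.0, 2.0, 3.0)
transform = CoordinateSystemTransform(R, t)

p_second = transform.first_to_second((1.0, 1.0, 1.0))
p_first = transform.second_to_first(p_second)
```

Storing one transform and deriving the reverse direction from it keeps the two mappings consistent with each other.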

Fig. 4 is a flow chart of a process of mapping one coordinate system to another coordinate system in accordance with at least one aspect of the present disclosure. At block 404, the third mobile device may receive a first pose (T₁) of the first mobile device. The first mobile device may perform a position tracking process, such as a SLAM process, that continuously tracks the pose (T₁) of the first mobile device associated with the coordinate system of the first mobile device. The first pose (T₁) may be received at a particular time, for example when an image of the first device is captured. The third mobile device may receive device geometry information associated with the first mobile device, such as device size, display screen size, and the like. For example, the third mobile device may receive the device geometry information and the first pose from another device (e.g., a server or a second mobile device, etc.), along with an image of the fiducial marker (e.g., a fiducial marker as described below) displayed on the first mobile device. In some cases, the third mobile device may receive a model identifier of the first mobile device. The third mobile device may then perform a lookup (e.g., in an internal memory table or by querying a server) using the model identifier to obtain the device geometry information. The third mobile device may also receive an identifier of the first fiducial marker indicating the particular fiducial marker displayed by the first mobile device.

At block 408, the third mobile device may receive a second pose (T₂) of the second mobile device. Similar to the first device, the second mobile device may perform a position tracking process, such as a SLAM process, that continuously tracks the pose (T₂) of the second mobile device associated with the coordinate system of the second mobile device. The second pose (T₂) may be received at a particular time, for example when an image of the second device is captured. In some cases, the first pose and the second pose may be received at approximately the same time (e.g., received in parallel). In other cases, the first and second poses may be received asynchronously, e.g., sequentially, with one pose being received followed by the other. The third mobile device may receive device geometry information associated with the second mobile device, such as device size, display screen size, and the like. For example, the third mobile device may receive the device geometry information and the second pose from another device (e.g., a server or a second mobile device, etc.), along with an image of the fiducial marker (e.g., a fiducial marker as described below) displayed on the second mobile device. In some cases, the third mobile device may receive a model identifier of the second mobile device. The third mobile device may then perform a lookup (e.g., in an internal memory table or by querying a server) using the model identifier to obtain the device geometry information. The third mobile device may also receive an identifier of the second fiducial marker indicating the particular fiducial marker displayed by the second mobile device.

At block 412, the third mobile device may receive images of the first fiducial marker presented on the display screen of the first mobile device and the second fiducial marker presented on the display screen of the second mobile device. The third device may receive, from each of the first and second devices, instructions to be presented to a user that direct the user to operate a camera of the third device to capture images of the first and second fiducial markers. For example, the third mobile device may use the camera to obtain images of the display screens of the first and second mobile devices while the first and second mobile devices are presenting the first and second fiducial markers, respectively. In another example, the third mobile device may receive an image over a network.

The first set of feature points may be detected from a portion of the image including the first fiducial marker and the second set of feature points may be detected from a portion of the image including the second fiducial marker. Since the fiducial markers have known dimensions and geometries, the feature points may be used to determine the position and/or orientation of the third device relative to the fiducial markers. In combination with the pose of the device presenting the fiducial marker, the set of feature points may be used to define the 3D coordinates of the set of feature points in the same coordinate system as the pose. In some cases, the set of feature points includes three feature points. In other cases, the set of feature points includes four or more feature points.

At block 416, the third device may define a first set of 3D coordinates. The first set of 3D coordinates may be defined using a first pose of the first mobile device, device geometry information of the first mobile device, and a first set of feature points. For example, using the mobile device geometry, the known dimensions and geometry of the fiducial markers may be utilized to estimate the physical location of each feature point displayed by the first mobile device (e.g., associated with the first coordinate system). The 3D coordinates of each feature point in the first coordinate system may be determined using the pose of the first mobile device.

At block 420, the third device may define a second set of 3D coordinates. The second set of 3D coordinates may be defined using a second pose of the second mobile device, device geometry information of the second mobile device, and a second set of feature points. For example, using the mobile device geometry, the known dimensions and geometry of the fiducial markers may be utilized to estimate the physical location of each feature point displayed by the second mobile device (e.g., associated with the second coordinate system). The 3D coordinates of each feature point in the second coordinate system may be determined using the pose of the second mobile device. In some cases, the first set of 3D coordinates and the second set of 3D coordinates may be defined in parallel. In other cases, the first set of 3D coordinates and the second set of 3D coordinates may be defined asynchronously (e.g., overlapping or sequential, with one set of coordinates defined before the other set of coordinates).

At block 422, a correspondence between the first set of 3D coordinates and the second set of 3D coordinates may be generated. If the first mobile device and the second mobile device display the same fiducial marker, each of the first feature points will correspond to one of the second feature points, forming a corresponding pair of feature points. By matching the 3D coordinates in the first set of 3D coordinates and the second set of 3D coordinates associated with the pairs of feature points, a correspondence between the 3D coordinates may be generated. If different fiducial markers are displayed, each of the first and second mobile devices may transmit the identity of the displayed fiducial marker to the third mobile device. The third mobile device may use a lookup table to determine the correspondence between the first feature points and the second feature points that form the corresponding feature point pairs. Then, by matching the 3D coordinates in the first set of 3D coordinates and the second set of 3D coordinates associated with the pairs of feature points, a correspondence between the 3D coordinates can be generated. The correspondence may represent a mapping that associates coordinates of the first set of 3D coordinates (associated with the first coordinate system) with coordinates of the second set of 3D coordinates (associated with the second coordinate system).
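The lookup-table matching described above might be sketched as follows. The marker identifiers, the table contents, and the function name `match_coordinates` are all hypothetical; the table simply lists which feature point index of one marker pairs with which index of the other.

```python
# Hypothetical lookup table keyed by the identifiers of the two displayed
# markers. Each entry lists (index in first set, index in second set).
FEATURE_POINT_TABLE = {
    ("checkerboard", "checkerboard"): [(0, 0), (1, 1), (2, 2)],
    ("checkerboard", "scatter"): [(0, 2), (1, 0), (2, 1)],
}

def match_coordinates(marker_first, coords_first, marker_second, coords_second,
                      table=FEATURE_POINT_TABLE):
    """Pair up 3D coordinates whose feature points correspond."""
    index_pairs = table[(marker_first, marker_second)]
    return [(coords_first[i], coords_second[j]) for i, j in index_pairs]

# Assumed 3D coordinates of three feature points in each coordinate system.
coords_first = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
coords_second = [(1.0, 3.0, 3.0), (0.0, 2.0, 3.0), (1.0, 2.0, 3.0)]

pairs = match_coordinates("checkerboard", coords_first, "scatter", coords_second)
```

When both devices display the same marker, the table entry reduces to the identity pairing of indices.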

At block 424, a coordinate system transformation may be generated using the first set of 3D coordinates and the second set of 3D coordinates. For example, since each of the first feature points has a corresponding feature point in the second feature points, the 3D coordinates associated with the feature point in the first feature points may be matched to the 3D coordinates associated with the corresponding feature point in the second feature points. Each 3D coordinate associated with the first device may be matched to a 3D coordinate of the second device. After matching, one or more rigid transformations may be applied to the first 3D coordinates in the first coordinate system to change the first 3D coordinates to their matching 3D coordinates in the second coordinate system.

In some cases, a single matching pair of 3D coordinates may be used to define the rigid transformation required to map coordinates in one coordinate system to corresponding 3D coordinates in the other coordinate system. The remaining 3D coordinate pairs may be used to verify the accuracy of one or more rigid transformations. In other cases, two or more matching pairs of 3D coordinates may be used to define the rigid transformation required to map coordinates in one coordinate system to corresponding 3D coordinates in another coordinate system. In other cases, each matching pair of 3D coordinates may be used to define the rigid transformation required to map the coordinates in one coordinate system to the corresponding 3D coordinates in the other coordinate system.
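Checking a candidate transformation against the remaining coordinate pairs can be sketched as a residual test. The tolerance, the sample pairs, and the function names below are assumptions made for illustration; the disclosure does not specify how accuracy is measured.

```python
import math

def apply_rigid(R, t, p):
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def max_residual(R, t, pairs):
    """Largest distance between a transformed point and its matched point."""
    return max(math.dist(apply_rigid(R, t, p1), p2) for p1, p2 in pairs)

# Candidate transform (assumed): 90 degree rotation about z, shift (1, 2, 3).
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = (1.0, 2.0, 3.0)

# Pairs that were not used to define the transform.
held_out_pairs = [
    ((1.0, 1.0, 1.0), (0.0, 3.0, 4.0)),
    ((2.0, 0.0, 0.0), (1.0, 4.0, 3.001)),  # 1 mm of simulated noise
]

TOLERANCE_M = 0.005  # assumed acceptance threshold in metres
error = max_residual(R, t, held_out_pairs)
is_accurate = error <= TOLERANCE_M
```

If the residual exceeds the tolerance, the transformation could be recomputed from a different or larger set of pairs.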

The coordinate system transformation may include one or more rigid transformations that apply one or more of rotation, translation, or reflection to the coordinates in the first coordinate system to determine corresponding coordinates in the second coordinate system. In some cases, the coordinate system transformation may additionally (or alternatively) include one or more rigid transformations that apply one or more of rotation, translation, or reflection to the coordinates in the second coordinate system to determine corresponding coordinates in the first coordinate system.

The coordinate system transformation may map points in a first coordinate system of a first mobile device to corresponding points in a second coordinate system of a second mobile device. In some cases, the location calculated by the SLAM process of the first mobile device may be transformed to a corresponding location in a second coordinate system of the second mobile device. In other cases, the location calculated by the SLAM process of the second mobile device may be transformed to a corresponding location in the first coordinate system of the first mobile device.

At block 428, the coordinate system transformation may be sent to the second mobile device to enable the second mobile device to transform the received coordinates in the first coordinate system into the second coordinate system. In some cases, the coordinate system transformation may be sent to the first mobile device to enable the first mobile device to transform the received coordinates in the second coordinate system into the first coordinate system. In other cases, the coordinate system transformation may be sent to the first mobile device and the second mobile device to enable the coordinates to be transformed into the appropriate coordinate system. This may enable the respective device to translate the received coordinates into the local coordinate system of the mobile device and encode the coordinates of the respective mobile device into the coordinate system of the receiving device, and then transmit the encoded coordinates to the receiving device.

For example, in an AR application, a first mobile device may take an image or video of an environment. The first mobile device may define a first instance of a virtual object to be rendered on a display screen of the first mobile device such that the first instance of the virtual object appears to be physically and naturally located in the environment. The SLAM process may track the first mobile device as it moves in the environment, and the AR application may continue to present the first instance of the virtual object as if it were naturally located in the environment (despite changes in the position or orientation of the first mobile device).

The second mobile device may receive information associated with the virtual object from the first mobile device (or from the server), including characteristics that enable the second mobile device to render an instance of the virtual object on a display screen of the second mobile device and pose information that indicates a position and an orientation of the virtual object in the environment. The second device may use the coordinate system transformation to transform the coordinates of the pose information from the first coordinate system of the first mobile device to the second coordinate system of the second mobile device. The second mobile device may capture an image or video of the environment. The second mobile device may then present the second instance of the virtual object within the captured image or video of the environment such that the second instance of the virtual object appears to be physically (and naturally) located within the environment (despite changes in the position or orientation of the second mobile device).
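Re-expressing a shared virtual object's pose in the second coordinate system can be sketched as one matrix product. The example assumes that both the coordinate system transformation and the object pose are stored as 4x4 homogeneous matrices; the variable names and values are hypothetical.

```python
def matmul4(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Assumed transformation from the first to the second coordinate system:
# 90 degree rotation about z plus a shift of (1, 2, 3).
T_first_to_second = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0,  0.0, 0.0, 2.0],
    [0.0,  0.0, 1.0, 3.0],
    [0.0,  0.0, 0.0, 1.0],
]

# Virtual object pose received from the first device: no rotation,
# positioned at (1, 1, 1) in the first coordinate system.
object_pose_first = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]

# The same pose expressed in the second coordinate system.
object_pose_second = matmul4(T_first_to_second, object_pose_first)
position_second = tuple(object_pose_second[i][3] for i in range(3))
```

The second device would then render its instance of the virtual object at the resulting position and orientation.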

The process of Fig. 4 may be performed entirely by the third mobile device, partly by one or more of the first mobile device, the second mobile device, the third mobile device, and/or the server, or entirely by the server. For example, the server may direct the mapping between the coordinate systems of the first mobile device and the second mobile device and execute all or part of the augmented reality application. The server may instruct the third mobile device to obtain images of the fiducial markers presented on the first mobile device and the second mobile device. The first pose may be sent from the first mobile device to the server, the second pose may be sent from the second mobile device to the server, and the image may be sent from the third mobile device to the server. The server may then generate a coordinate system transformation using the first pose, the second pose, and the image, and send the coordinate system transformation to the first mobile device and/or the second mobile device.

Although the blocks in FIG. 4 are presented in a particular order, the blocks may be performed in any order. In some cases, each block of FIG. 4 may be performed one or more times before proceeding to the next block. Although FIG. 4 depicts a mapping between coordinate systems associated with two devices, the process of FIG. 4 may be extended to map the coordinate systems of any number of devices, for example, by having each additional device perform 404-432. For example, the third mobile device may be used to obtain an image of the additional mobile device presenting a fiducial marker together with the fiducial marker displayed by the first mobile device, the second mobile device, or any other mobile device that has been previously mapped.

The process of fig. 4 may be modified while still mapping the coordinate system of one device to the coordinate system of another device. For example, the third device may receive the image of each device as a separate image, such that the first image may include the display screen of the first mobile device and the second image may include the display screen of the second mobile device. The first mobile device, the second mobile device, and the third mobile device may be stationary between taking the first image and taking the second image. In another case, the first fiducial marker may be the same fiducial marker as the second fiducial marker, or a different fiducial marker. If different fiducial markers are used, the third device may include a table of feature points indicating common feature points (e.g., same 2D coordinate positions) in the fiducial markers. In some cases, the table of feature points may be received from the first device and/or the second device along with the respective identifiers of the displayed fiducial markers.

Fig. 5 illustrates an example of components of a computing system executing an AR application in accordance with at least one aspect of the present disclosure. Computing system 504 may be an example of the mobile device depicted in fig. 4. Although these components are shown as part of computing system 504, computing system 504 may also be distributed such that some components may be located within a different hardware platform than other components.

Computing system 504 includes at least a processor 508, a memory 512, a storage device 516, input/output (I/O) peripherals 520, communication peripherals 524, one or more cameras 528, and an interface bus 532. The interface bus 532 may be used to communicate, send, and transfer data, control, and commands between the various components of the computing system 504. Memory 512 and storage 516 may include computer-readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM), hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage, such as flash memory, and other tangible storage media. Any such computer-readable storage media may be used to store instructions or program code that implement aspects of the present disclosure. Memory 512 and storage 516 may also include computer-readable signal media. A computer-readable signal medium includes a propagated data signal with computer-readable program code embodied therein. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any combination thereof. Computer-readable signal media includes any computer-readable media that is not computer-readable storage media and that can communicate, propagate, or transport a program for use in connection with computing system 504.

Further, the memory 512 may include an operating system, programs, and applications. The processor 508 may be used to execute stored instructions and include, for example, a logic processing unit, a microprocessor, a digital signal processor, and other processors. Memory 512 and/or processor 508 may be virtualized and may be hosted in another computing system, such as a cloud network or a data center. I/O peripherals 520 may include user interfaces such as keyboards, screens (e.g., touch screens), microphones, speakers, other input/output devices, and computing components such as graphics processing units, serial ports, parallel ports, universal serial bus, and other input/output peripherals. I/O peripheral 520 is connected to processor 508 through any port coupled to interface bus 532. Communication peripheral devices 524 may be used to facilitate communications between computing system 504 and other computing devices over a communication network and include, for example, network interface controllers, modems, wireless and wired interface cards, antennas, and other communication peripheral devices.

While the subject matter has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. It is therefore to be understood that the present disclosure is presented for purposes of illustration and not limitation, and does not preclude inclusion of such modifications, variations and/or additions to the present subject matter as would be readily apparent to one of ordinary skill. Indeed, the methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the disclosure. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the disclosure.

Unless specifically stated otherwise, it is appreciated that throughout the description, discussions utilizing terms such as "processing," "computing," "calculating," "determining," and "identifying" or the like, refer to the action and processes of a computing device (e.g., one or more computers or similar electronic computing devices) that manipulates and transforms data represented as physical electronic or magnetic quantities within the computing platform's memories, registers, or other information storage devices, transmission devices, or display devices.

The one or more systems discussed herein are not limited to any particular hardware architecture or configuration. The computing device may include any suitable arrangement of components that provides results conditioned on one or more inputs. Suitable computing devices include microprocessor-based, multi-purpose computer systems that access stored software that programs or configures the computing system from a general-purpose computing device to a special-purpose computing device that implements one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combination of languages may be used to implement the teachings contained herein in software for programming or configuring a computing device.

Embodiments of the methods disclosed herein may be performed in the operation of such computing devices. The order of the blocks presented in the above examples may be changed; for example, the blocks may be reordered, combined, and/or broken into sub-blocks. Some blocks or processes may be performed in parallel.

Conditional language used herein, such as "may," "e.g.," and the like, unless expressly stated otherwise or otherwise understood in the context of usage, is generally intended to convey that certain examples include but others do not include certain features, elements and/or steps. Thus, such conditional language does not generally imply that features, elements, and/or steps are in any way required by one or more examples or that one or more examples must include logic for deciding, with or without author input or prompting, whether these features, elements, and/or steps are included or are to be performed in any particular example.

The terms "comprising," "having," and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude other elements, features, acts, operations, and the like. Furthermore, the term "or" is used in its inclusive (and not exclusive) sense, such that when used, for example, to connect a list of elements, the term "or" indicates one, some, or all of the elements in the list. As used herein, "adapted to" or "for" is open and inclusive language that does not exclude devices adapted to or used to perform additional tasks or steps. Moreover, the use of "based on" is meant to be open and inclusive, in that a process, step, calculation, or other action that is "based on" one or more recited conditions or values may in fact be based on additional conditions or values beyond those recited. Similarly, the use of "based, at least in part, on" is meant to be open and inclusive, in that a process, step, calculation, or other action that is "based, at least in part, on" one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbers are included herein for ease of explanation only and are not meant to be limiting.

The various features and processes described above may be used independently of one another or may be used in various combinations. All possible combinations and sub-combinations are intended to fall within the scope of the present disclosure. Moreover, certain method or process blocks may be omitted in some embodiments. The methods and processes described herein are also not limited to any particular order, and the blocks or states associated therewith may be performed in other suitable orders. For example, described blocks or states may be performed in an order different than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in series, in parallel, or in some other manner. Blocks or states may be added to or deleted from the disclosed examples. Similarly, the example systems and components described herein may be configured differently than described. For example, elements may be added, removed, or rearranged as compared to the disclosed examples.

While the principles of the disclosure have been described above in connection with specific apparatuses and methods, it is to be clearly understood that this description is made only by way of example and not as a limitation on the scope of the disclosure.

Claims

1. A method, comprising:

receiving, by a third mobile device, data indicative of a first pose of a first mobile device, wherein the first pose is defined relative to a first coordinate system associated with the first mobile device;

receiving, by the third mobile device, data indicative of a second pose of a second mobile device, wherein the second pose is defined relative to a second coordinate system associated with the second mobile device;

receiving, by the third mobile device, an image showing a first fiducial marker displayed by the first mobile device and a second fiducial marker displayed by the second mobile device;

identifying, by the third mobile device, a first set of three-dimensional coordinates associated with the first fiducial marker, the first set of three-dimensional coordinates identified relative to the first coordinate system;

identifying, by the third mobile device, a second set of three-dimensional coordinates associated with the second fiducial marker, the second set of three-dimensional coordinates identified relative to the second coordinate system;

generating, by the third mobile device, a coordinate system transformation based on the first set of three-dimensional coordinates and the second set of three-dimensional coordinates, the coordinate system transformation mapping coordinates between the first coordinate system and the second coordinate system; and

sending, by the third mobile device, the coordinate system transformation to the second mobile device.

2. The method of claim 1, wherein identifying the first set of three-dimensional coordinates comprises:

detecting a set of feature points of the first fiducial marker, wherein each three-dimensional coordinate of the first set of three-dimensional coordinates corresponds to a feature point of the set of feature points.

3. The method of claim 2, wherein the set of feature points comprises three or more feature points.
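
By way of illustration only, and not as a limitation of the claims, the following sketch shows why three non-collinear feature points are enough to fix a rigid mapping between two coordinate systems. It assumes the same three physical points are known in both systems, builds a right-handed orthonormal frame from each triple, and derives a rotation R and translation t such that a point p in the first system maps to R p + t in the second. The function names are hypothetical, and this frame-based construction is a substitute chosen for brevity rather than the disclosed method; with more than three noisy points, a least-squares fit would normally be preferred.

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    if n < 1e-12:
        raise ValueError("points are coincident or collinear")
    return [c / n for c in v]

def _frame(p0, p1, p2):
    """Right-handed orthonormal axes built from three non-collinear points."""
    x = _normalize(_sub(p1, p0))
    z = _normalize(_cross(x, _sub(p2, p0)))
    y = _cross(z, x)
    return [x, y, z]

def rigid_transform_from_three_points(src, dst):
    """Return (R, t) with dst[i] == R @ src[i] + t for three exact correspondences."""
    a = _frame(*src)  # axes expressed in the first coordinate system
    b = _frame(*dst)  # the same axes expressed in the second coordinate system
    # R maps each axis a[k] onto b[k]: R = sum_k outer(b[k], a[k]).
    R = [[sum(b[k][i] * a[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    rotated_origin = [sum(R[i][j] * src[0][j] for j in range(3)) for i in range(3)]
    t = _sub(dst[0], rotated_origin)
    return R, t

def apply_transform(R, t, p):
    """Map a point from the first coordinate system into the second."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
```

Collinear points leave the rotation about their common line undetermined, which is why the sketch rejects them.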

4. The method of claim 1, wherein the first pose is represented by a rotation vector and a translation vector, wherein the first set of three-dimensional coordinates associated with the first fiducial marker is defined at least in part using the first pose and a geometry of the first mobile device.
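
As a non-limiting sketch of how a pose and device geometry can together yield three-dimensional marker coordinates, the example below converts a rotation vector to a rotation matrix with the Rodrigues formula and then maps marker corner offsets, given in the device's own frame, into the device's coordinate system. It assumes an axis-angle rotation vector (direction is the axis, length is the angle in radians) and assumes that the pose maps device-local points into the coordinate system; the function names and the corner offsets are hypothetical.

```python
import math

def rotation_matrix_from_vector(rvec):
    """Rodrigues formula: axis-angle rotation vector -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(c * c for c in rvec))
    if theta < 1e-12:
        # No rotation: return the identity matrix.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in rvec)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v,      kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v,      ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def marker_points_in_coordinate_system(rvec, tvec, local_points):
    """Map device-local marker points into the device's coordinate system: R p + t."""
    R = rotation_matrix_from_vector(rvec)
    return [
        [sum(R[i][j] * p[j] for j in range(3)) + tvec[i] for i in range(3)]
        for p in local_points
    ]

# Hypothetical geometry: marker corners on the screen plane, in metres,
# measured from the device origin.
corners = [[0.03, 0.06, 0.0], [-0.03, 0.06, 0.0], [-0.03, -0.06, 0.0]]
world = marker_points_in_coordinate_system([0.0, 0.0, math.pi / 2], [1.0, 0.0, 0.0], corners)
```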

5. The method of claim 1, wherein the coordinate system transformation is used to cause a first virtual object rendered by the first mobile device to appear in an environment in substantially the same location and substantially the same orientation as a second virtual object rendered by the second mobile device.

6. The method of claim 1, wherein the coordinate system transformation comprises a matrix defining a mapping between the first coordinate system and the second coordinate system, and

sending the coordinate system transformation includes sending the matrix to the second mobile device over a data network between the third mobile device and the second mobile device.
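
For illustration only, a matrix of this kind can be packed as a 4x4 homogeneous matrix, serialized for transmission, and applied to points on the receiving side. The sketch below assumes a JSON payload and hypothetical function and field names; the wire format and transport are not specified here, and any encoding that both devices agree on would serve.

```python
import json

def to_homogeneous(R, t):
    """Pack a 3x3 rotation R and a translation t into a 4x4 homogeneous matrix."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def encode_transform(matrix):
    """Serialize the matrix to bytes (the JSON field names are assumptions)."""
    message = {"type": "coordinate_system_transformation", "matrix": matrix}
    return json.dumps(message).encode("utf-8")

def decode_transform(payload):
    """Parse a received payload and check that it holds a 4x4 matrix."""
    matrix = json.loads(payload.decode("utf-8"))["matrix"]
    if len(matrix) != 4 or any(len(row) != 4 for row in matrix):
        raise ValueError("expected a 4x4 matrix")
    return matrix

def transform_point(matrix, point):
    """Map a 3D point from the first coordinate system into the second."""
    h = [point[0], point[1], point[2], 1.0]
    return [sum(matrix[i][j] * h[j] for j in range(4)) for i in range(3)]

# Example: a 90-degree rotation about z followed by a translation of (1, 2, 3).
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 2.0, 3.0]
payload = encode_transform(to_homogeneous(R, t))
received = decode_transform(payload)
mapped = transform_point(received, [1.0, 1.0, 1.0])
```

A single 4x4 matrix keeps rotation and translation together, so the receiving device can apply the mapping with one matrix-vector product per point.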

7. The method of claim 1, wherein objects within an environment in which the second mobile device is located are tracked based on the first coordinate system and the coordinate system transformation.

8. A mobile device, comprising:

one or more processors;

a camera;

a non-transitory computer-readable medium storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:

receiving data indicative of a first pose of a first mobile device, wherein the first pose is defined relative to a first coordinate system associated with the first mobile device;

receiving data indicative of a second pose of a second mobile device, wherein the second pose is defined relative to a second coordinate system associated with the second mobile device;

receiving an image comprising a portion of a first display screen of the first mobile device and a portion of a second display screen of the second mobile device, wherein the first display screen of the first mobile device comprises a first fiducial marker and the second display screen of the second mobile device comprises a second fiducial marker;

identifying a first set of three-dimensional coordinates associated with the first fiducial marker, the first set of three-dimensional coordinates identified relative to the first coordinate system;

identifying a second set of three-dimensional coordinates associated with the second fiducial marker, the second set of three-dimensional coordinates identified relative to the second coordinate system;

generating a coordinate system transformation using the first set of three-dimensional coordinates and the second set of three-dimensional coordinates, wherein the coordinate system transformation maps coordinates between the first coordinate system and the second coordinate system; and

sending the coordinate system transformation to the second mobile device.

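9. The mobile device of claim 8, wherein identifying the first set of three-dimensional coordinates comprises:

detecting a set of feature points of the first fiducial marker, wherein each three-dimensional coordinate of the first set of three-dimensional coordinates corresponds to a feature point of the set of feature points.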

10. The mobile device of claim 9, wherein the set of feature points comprises three or more feature points.

11. The mobile device of claim 8, wherein the first pose is represented by a rotation vector and a translation vector, and wherein the first set of three-dimensional coordinates associated with the first fiducial marker is defined at least in part using the first pose and a geometry of the first mobile device.

12. The mobile device of claim 8, wherein the coordinate system transformation is used to cause a first virtual object presented by the first mobile device to appear in an environment in substantially the same location and substantially the same orientation as a second virtual object presented by the second mobile device.

13. The mobile device of claim 8, wherein the coordinate system transformation comprises a matrix defining a mapping between the first coordinate system and the second coordinate system, and

sending the coordinate system transformation includes sending the matrix to the second mobile device over a data network.

14. The mobile device of claim 8, wherein objects within an environment in which the second mobile device is located are tracked based on the first coordinate system and the coordinate system transformation.

15. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:

sending the coordinate system transformation to the second mobile device.

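16. The non-transitory computer-readable medium of claim 15, wherein identifying the first set of three-dimensional coordinates comprises:

detecting a set of feature points of the first fiducial marker, wherein each three-dimensional coordinate of the first set of three-dimensional coordinates corresponds to a feature point of the set of feature points.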

17. The non-transitory computer-readable medium of claim 16, wherein the set of feature points includes three or more feature points.

18. The non-transitory computer-readable medium of claim 15, wherein the first pose is represented by a rotation vector and a translation vector, and wherein the first set of three-dimensional coordinates associated with the first fiducial marker is defined at least in part using the first pose and information related to a geometry of the first mobile device.

19. The non-transitory computer-readable medium of claim 15, wherein the coordinate system transformation is used to cause a first virtual object presented by the first mobile device to appear in an environment in substantially the same location and substantially the same orientation as a second virtual object presented by the second mobile device.

20. The non-transitory computer-readable medium of claim 15, wherein objects within the environment in which the second mobile device is located are tracked based on the first coordinate system and the coordinate system transformation.