WO2017027338A1 - Apparatus and method for supporting interactive augmented reality functionalities - Google Patents

Apparatus and method for supporting interactive augmented reality functionalities

Info

Publication number
WO2017027338A1
Authority
WO
WIPO (PCT)
Prior art keywords
marker
camera
augmented reality
markers
image
Prior art date
Application number
PCT/US2016/045654
Other languages
French (fr)
Inventor
Seppo T. VALLI
Original Assignee
Pcms Holdings, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pcms Holdings, Inc. filed Critical Pcms Holdings, Inc.
Publication of WO2017027338A1 publication Critical patent/WO2017027338A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose

Definitions

  • AR Augmented Reality
  • virtual 3D models or animations are embedded into video views from the real environment.
  • the position of the augmented 3D information is defined in relation to a set of known features in the video views.
  • these distinctive features are brought to the scene in the form of graphical markers, which are relatively easy to detect and track from the video.
  • graphical tags, fiducials or markers have been commonly used.
  • Graphical markers have certain advantages over the use of natural features. For example, graphical markers help to make the offline process for mixed reality content production and use more independent of the actual target environment. This allows content to be positioned more reliably in the target environment based on the position of graphical markers, whereas changes in the environment (e.g., changes in lighting or in the position of miscellaneous objects) can otherwise make it more difficult for an augmented reality system to consistently identify position and orientation information based only on the environment.
  • Passive graphical markers are typically printed on paper or other stable substrate and cannot change their appearance after being printed. Being passive, they naturally also lack the ability to support additional functionalities based on, for example, image processing or electrical connections.
  • An example of a dynamic graphical marker is described in, for example, U.S. Patent Application Publication No. 2013/0109961 Al, entitled “Apparatus and method for providing dynamic fiducial markers for devices.”
  • a camera marker operates as an electronic marker device that combines a wide angle, electronic pan-tilt-zoom camera with a display.
  • the device's display is used in some embodiments for declaring and advertising the availability of Augmented Reality (AR) information, as well as for showing markers to augment such information.
  • a camera marker is operated to display an image of at least a first augmented reality marker.
  • the camera marker is further operated to capture an image of at least a second augmented reality marker.
  • the second augmented reality marker may be displayed by a second camera marker. Based at least in part on the image of the second augmented reality marker, the position of the second augmented reality marker with respect to the camera marker may be determined. This position may then be provided to a position server.
  • the device's camera is used for automatic calibration of a multi-marker setup, enabling accurate detection and tracking of the user and supporting new interaction and presence related effects and services.
  • user acceptance for one or more surrounding devices is gained by embedding the device in the form of a familiar consumer object, such as a photo frame.
  • FIG. 1 is a schematic plan view of an embodiment in which camera markers are employed in an augmented reality system in a room.
  • FIG. 2A provides a perspective view of a camera marker device with a marker shown on a display of the device.
  • FIG. 2B provides a perspective view of a camera marker together with augmented content as seen by AR glasses or a smart phone.
  • FIG. 2C further illustrates exemplary effects of transforming a fiducial marker displayed on a screen of the camera marker device or viewed by AR glasses.
  • FIG. 3 illustrates architecture of an exemplary augmented reality system employing one or more camera markers.
  • FIG. 4 is a functional block diagram of components of a camera marker device.
  • FIG. 5 is a schematic illustration of an embodiment in which a camera marker operates as a virtual mirror.
  • FIG. 6 illustrates an exemplary wireless transmit/receive unit (WTRU) that may be employed as camera marker or common position server in some embodiments.
  • FIG. 7 illustrates an exemplary network entity that may be employed as a camera marker or common position server in some embodiments.
  • FIG. 8 illustrates a collection of exemplary augmented reality (AR) markers of the type that can be displayed by exemplary camera markers.
  • FIG. 9 illustrates an exemplary method of positioning AR content using camera marker devices.
  • FIG. 10 is a block diagram illustrating an exemplary functional architecture of components of a camera marker in accordance with an exemplary embodiment of the present disclosure.
  • a camera marker is an electronic marker device that combines a wide angle, electronic pan-tilt-zoom camera with a display.
  • the device's display can be used for declaring and advertising the availability of Augmented Reality (AR) information, as well as for showing markers to augment such information.
  • the device's camera can be used for automatic calibration of a multi-marker setup.
  • the calibrated marker setup can be used for accurately detecting and tracking the user to support new interaction and presence-related effects and services.
  • the camera marker is provided in the form of a familiar consumer object, such as a digital photo frame, to encourage user acceptance of the device and to allow the device to blend with existing room decoration.
  • FIG. 1 illustrates a user 105 with AR glasses (with field of view shown by lines 107) in a room 100 with a camera marker 115 on each wall.
  • Camera markers 115 are placed in such a way that at least one other camera marker is seen on each camera view (lines 117 depict exemplary fields of view of each camera marker). This is to ensure that a self-calibration process can be made to derive a common coordinate system for the multi-marker setup.
  • a common coordinate system can then be used to derive an accurate estimate for the user's position in the room, and for capturing his/her visual parameters such as gestures and appearance.
  • There are few restrictions on the placement of camera markers in the environment.
  • The number and location of camera markers can be selected based on, for example, the size of the space to be tracked and the required tracking accuracy. The camera marker devices 115 can also be positioned horizontally, e.g., on tables. In many AR visualizations, it is beneficial to define the floor level. However, to preserve the safety of the user and of the device, it may not be advisable to place a camera marker on the floor.
  • camera markers can be used in conjunction with one or more traditional printed fiducial markers. For example, a printed fiducial marker may be placed on a surface and may be used to define the surface. For example, a printed fiducial marker may be placed on the floor and used to define the floor level.
  • a printed fiducial marker may be used to define other surfaces, such as desktops, tabletops, and walls.
  • a printed marker may be attached to the viewing device, e.g., on the backside of a tablet, in order to facilitate determination of the pose (the position and orientation) of the viewing device by the system.
  • FIG. 1 illustrates a user 105 in a room 100 with multiple camera markers 115.
  • Wide angle cameras are used for calibration and user tracking purposes.
  • the wide-angle camera has an angle of view of at least 60° (see field of view lines 117).
  • the wide-angle camera has an angle of view of at least 90°.
  • the wide-angle camera has an angle of view of at least 120°.
  • a virtual character 110 is augmented into AR glasses, which carry a camera for marker detection and tracking (see field of view lines 107).
  • a camera marker device is accessible and controllable both over an internet protocol (IP) network and locally, e.g., by using a smart phone as a remote control device.
  • Local control is particularly useful for user interaction, such as for changing, scaling and turning the augmented or displayed content.
  • FIG. 2A illustrates an exemplary camera marker device 200.
  • the device's appearance does not necessarily differ much from a smartphone or a tablet.
  • the appearance of an ordinary photo frame may be beneficial.
  • the size of the display depends on the targeted content (markers and/or eye-catchers) and the desired freedom for moving the marker on the display.
  • the marker 205 shown on the display can be replaced or modified over the network. The augmented content changes accordingly.
  • the camera 210 is used for calibration and user tracking purposes. In the example illustrated in FIG. 2A, the wide-angle camera and the display of the camera marker are both on the same face of the smart marker, with both facing the same direction.
  • FIG. 2B illustrates an embodiment of a face model 220 augmented in relation to the marker 205, as seen by AR glasses or a smart phone.
  • the viewpoint to the marker and the marker's orientation are kept unchanged.
  • a camera marker combines a wide-angle camera for tracking the space with a display for showing markers for Augmented Reality (AR) information.
  • the camera marker can be used for new user interaction and presence related services.
  • Self-calibration of the camera marker simplifies the setup process of a system that includes multiple camera markers. Self-calibration is useful for ensuring the system's accuracy and stability both in AR visualization and user tracking related functionalities. Self-calibration may be based on detecting the pose of other markers in each camera marker's own camera view.
  • camera marker devices are in communication with a common position server that combines all captured information, e.g., mapping marker positions into common coordinates, and that derives a common estimate for the position of the user.
  • one of the camera marker devices operates as the server.
  • a separate server device is provided (either locally or, for example as a cloud service or other networked service) in order to reduce computational capabilities required from each individual camera marker.
  • FIG. 3 illustrates an exemplary system architecture.
  • the system may comprise a user 301 with AR glasses.
  • Camera markers (315, 316) may be linked with local computers (320a, 320b, 320n), which may be integrated with the camera marker (e.g., unit 318).
  • the camera markers 315, 316 and their local computers 320 may communicate with a server and/or central computer 325, which combines all devices.
  • the user's AR glasses may communicate with the backend 325 by a wireless connection 330.
  • an AR augmentation 303 may be presented to the user's AR view, in relation to the camera markers.
  • a network connection is used to provide content to the electronic marker display, either a marker for AR, a picture (or video) for an eye-catcher, animation, advertisement or other chosen content.
  • the connection is wireless (e.g., WLAN), although a cord may be used to power the device.
  • a network connection (which may be the same network connection) can be used for another purpose to connect the camera marker to the user's viewing device.
  • the viewing device may be, for example, a camera phone, tablet, virtual glasses with camera, and the like. Through this connection, the system infrastructure tracks, at each time instant, the orientation of the marker (marker display) with respect to the user. This information can be used, e.g., for tracking the user by electronically zooming in on the marker's camera view and analyzing its content.
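Such electronic zooming can be implemented purely in software by cropping and rescaling the wide-angle frame around a tracked target; a minimal sketch (function and parameter names are illustrative, not taken from the application):

```python
import numpy as np

def electronic_ptz(frame: np.ndarray, center_xy, zoom: float) -> np.ndarray:
    """Emulate pan-tilt-zoom by cropping a wide-angle frame.

    frame     -- H x W x 3 image captured by the wide-angle camera
    center_xy -- (x, y) pixel around which to "point" the virtual camera
    zoom      -- magnification factor (>= 1.0)
    """
    h, w = frame.shape[:2]
    crop_w, crop_h = int(w / zoom), int(h / zoom)
    x, y = center_xy
    # Clamp the crop window so it stays inside the captured frame.
    x0 = int(np.clip(x - crop_w / 2, 0, w - crop_w))
    y0 = int(np.clip(y - crop_h / 2, 0, h - crop_h))
    return frame[y0:y0 + crop_h, x0:x0 + crop_w]
```

In practice the cropped view would also be rectified for wide-angle lens distortion before marker detection or user analysis.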
  • a further communication connection is used for local remote control of the camera marker by a smart phone or a TV-like remote controller.
  • Feasible connection technologies include WLAN, Bluetooth, and Infrared link, among others.
  • FIG. 4 is a functional block diagram of components in one example of a camera marker device.
  • the camera marker includes a processor 402; a camera 404, which may be a wide-angle camera; and a display 406, such as an LCD, which may be used to display an AR marker.
  • An optional keypad 408 may be provided.
  • the keypad 408 may be implemented as a feature of the display 406, where the display 406 is a touch-sensitive display.
  • the exemplary camera marker device further includes non-volatile memory 410 and volatile memory 412, a transmitter 414, and a receiver 416. Other network input/output connections may also be provided.
  • a camera marker is provided with audio capture and playback features. Audio may be used to increase the attractiveness and effectiveness of the videos used for announcing/advertising the available AR content. Audio may also be used as a component of the augmented AR content. A microphone can be used to capture user responses or commands.
  • a paper marker on the floor could specify the floor level without the risk of an electronic device being stepped on.
  • Paper markers may also be used as a way to balance the trade-off between calibration accuracy and system cost.
  • natural print-out pictures can be used as part of a hybrid marker setup. Even natural planar or 3D feature sets can be detected by multiple camera markers and used for augmenting 3D objects.
  • At least some local processing is performed in each marker device in order to reduce the amount of information to be transmitted to the common server.
  • Marker detection is one of such local operations.
  • camera marker setup is relatively stable, and tracking in camera markers is not needed to such an extent as in the user's viewing device (AR glasses or tablet), which is moving along with the user.
  • Another example is the control of the wide-angle camera in order to capture, for example, cropped views of other markers (for marker detection and identification), or the user's visual parameters.
  • a third example for local processing is to use camera view for deriving the actual lighting conditions in the environment in order to adapt the respective properties for the virtual content for improved photorealism.
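As a rough sketch of such lighting adaptation, the mean color of the camera view can serve as an ambient-light estimate that modulates the virtual content's base color (the names and the simple Lambertian-style model are illustrative assumptions, not taken from the application):

```python
import numpy as np

def estimate_ambient_light(frame: np.ndarray) -> np.ndarray:
    """Estimate an RGB ambient-light term from a camera frame (values in 0..1)."""
    return frame.reshape(-1, 3).mean(axis=0)

def relight_virtual_color(albedo_rgb: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Modulate a virtual object's base color by the estimated scene lighting."""
    ambient = estimate_ambient_light(frame)
    return np.clip(albedo_rgb * ambient, 0.0, 1.0)
```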
  • camera markers can be equipped with 3D cameras, such as RGB-D or ToF sensors, for capturing depth information.
  • the use of camera markers may encourage the acceptance of 3D cameras as a ubiquitous part of users' environment.
  • the 3D captured scene can be used to implement accurate user-perspective AR rendering (cf. the illustration of device-perspective and user-perspective magic lenses in Baricevic et al. 2012).
  • a more traditional way of capturing 3D information is to use two (e.g., stereo) or more cameras.
  • multiple markers can be used in AR to give more and better 3D data of the environment.
  • multiple markers are calibrated with respect to each other and the scene.
  • calibration is performed by capturing the multi-marker scene by a moving external camera and making geometrical calculations from its views.
  • An example of a multi-camera calibration method is given in [5].
  • Providing the markers with wide-angle cameras enables self-calibration in a multiple camera-marker system.
  • the views of the marker cameras themselves can be used for the mutual calibration of all devices, and the calibration can be updated when necessary, e.g., to adapt into any possible changes in the setup.
  • a user's position can also be derived by the cameras around the user. This outside-in type of tracking can be accomplished by camera markers, and brings some potential benefits.
  • marker cameras can be used to capture the user's visual appearance, gestures, or motion.
  • More reliable user tracking can be effected using multiple connected and calibrated marker cameras around the user, as described above. This makes user capture easier and more accurate (e.g., tracking in large spaces, handling of occlusions in the scene, etc.) than single-camera tracking, whether based on one wearable camera or one (typically fixed) external camera.
  • the users can be provided with individualized viewpoints of the same content. It is also possible to serve the different users with different content, especially in embodiments in which the service is permitted to track users' identities.
  • the user identities may be, for example, anonymously numbered indices. Such anonymous indices are sufficiently detailed to provide some level of service enhancement. However, having information regarding real identities enables the system to provide more personalized services.
  • Camera markers are an access point for information and interaction for the user, and they are used in some embodiments for analyzing users' responses to the content, monitoring user activities in the space, and collecting related contextual information.
  • User behavior data can be used in many ways to better understand and serve the user.
  • the observations can also be used actively to provide the user with interactive content, as described in the following.
  • the device's display or augmentation may be used to reflect the environment to the user from almost any desired direction as seen from the device (provided that the camera captures a wide enough panorama of the environment).
  • the device acts as a virtual mirror, mimicking a physical mirror view even for a moving user (resulting in mirror-like motion parallax seen by the user).
  • the virtual mirror can be set at a fixed angle, reflecting to/from any chosen direction, or even dynamically turn to any desired direction while the user is moving (e.g., depending on his/her trajectory). This enables new types of effects and services in many spaces.
  • the virtual mirror concept is illustrated in FIG. 5.
  • a user 501 with AR gear may view (view 505) a "virtual mirror" 510 (e.g., a camera marker) on a wall 508.
  • the virtual mirror 510 may give a reflection 515, which may shift direction relative to the user's motions.
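One way to realize such a virtual mirror is to reflect the tracked user direction about the plane of the marker display and render the wide-angle view along the reflected direction; a geometric sketch (positions and the display-plane normal are assumed to come from the calibrated marker setup):

```python
import numpy as np

def mirror_view_direction(user_pos, marker_pos, marker_normal):
    """Direction, from the marker, toward the scene content that a physical
    mirror mounted at marker_pos with normal marker_normal would show to a
    user at user_pos (all quantities in the shared coordinate system)."""
    n = np.asarray(marker_normal, dtype=float)
    n = n / np.linalg.norm(n)
    d = np.asarray(user_pos, dtype=float) - np.asarray(marker_pos, dtype=float)
    d = d / np.linalg.norm(d)
    # Reflect the user-to-marker viewing ray about the display plane.
    return 2.0 * np.dot(d, n) * n - d
```

The camera marker (or the AR viewer) would then crop the wide-angle capture along this direction and flip it horizontally to emulate the reflected image.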
  • FIG. 2C illustrates effects of scaling and turning the marker 205 over a network connection (either by the user or the service provider).
  • Local user control enables, for example, the user to place an advertised 3D product (e.g., a couch) at the appropriate scale into a preferred position in his/her room.
  • the provider or broadcaster of the advertisement does not need to know about the local circumstances, as the user is the one to make the composition for AR visualisation.
  • the size of the camera marker's display naturally limits the freedom for (locally) controlling the point at which an object is augmented. This is especially true for spatial translations.
  • an augmented object can be moved longer distances by allowing the user to change the perspective (angle) of a marker image as well as the distance between the marker and the augmented object.
  • Each camera marker used in an application has a location and orientation (pose) with respect to a coordinate system used to provide augmented content.
  • two or more camera markers cooperate to determine their respective locations and orientations.
  • a set of six values is used to specify the location and orientation of a camera marker with respect to the coordinate system, such as three spatial coordinates (x, y, z) defining the location and three Euler angles (θ, φ, ψ) defining the orientation.
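For illustration, the six values can be packed into a single 4×4 homogeneous transform; the sketch below assumes a Z-Y-X Euler convention, an arbitrary choice since the application does not fix one:

```python
import numpy as np

def pose_to_matrix(x, y, z, theta, phi, psi):
    """Build a 4x4 homogeneous transform from position (x, y, z) and
    Euler angles (theta, phi, psi) applied in Z-Y-X order (radians)."""
    cz, sz = np.cos(psi), np.sin(psi)
    cy, sy = np.cos(phi), np.sin(phi)
    cx, sx = np.cos(theta), np.sin(theta)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx  # combined rotation
    T[:3, 3] = [x, y, z]      # translation
    return T
```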
  • each of the camera markers displays an AR marker or other fiducial marker, which may be a unique (or at least locally unique) marker.
  • Each of the camera markers obtains an image from its camera and processes the image to recognize the two displayed AR markers and to determine the coordinates of those markers within the image. Having processed the image to recognize the AR markers, the camera marker further determines the angle between those markers.
  • the determination of the position and orientation of the AR markers within the image may be performed using known AR marker detection techniques using, for example, statistical based, gradient based, pixel connectivity-edge linking based and Hough transform based methods.
  • the image taken by each camera is further processed to determine the distance of the other camera markers.
  • the distance of the other camera markers is determined based at least in part on the apparent size and perspective of the camera markers within the image. This may be done by, for example, comparing the apparent size and shape of the camera markers in the image with a known actual size and shape of the camera markers. This may also be done by comparing the apparent size and shape of the AR markers displayed on the camera markers with a known actual size and shape of those AR markers.
  • each camera marker may convey information identifying its own actual physical dimensions, allowing camera markers to estimate the distance of the AR marker based on a comparison between the actual and apparent physical dimensions of that AR marker.
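Under a pinhole-camera assumption, this size-based distance estimate reduces to a single ratio; the sketch below is illustrative, with the marker's physical width conveyed by the observed camera marker and the focal length coming from calibration of the capturing camera:

```python
def distance_from_apparent_size(real_width_m: float,
                                apparent_width_px: float,
                                focal_length_px: float) -> float:
    """Pinhole-model range estimate: an object of known width real_width_m
    that spans apparent_width_px pixels lies at roughly this distance (m)."""
    return focal_length_px * real_width_m / apparent_width_px
```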
  • the distance of the other camera markers may be determined based at least in part on the corresponding depth measurement.
  • depth can be measured using other techniques.
  • the camera markers may exchange audio signals, with the travel time of the audio signals being indicative of the distance between markers.
  • each camera marker is equipped with one or more accelerometers, and the camera marker is operative to process readings from the accelerometer or accelerometers to determine one or more angles of that camera marker with respect to the vertical (e.g., "pitch," "roll," and "yaw" angles).
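As a sketch of the accelerometer-based part of this estimate, a static gravity reading yields the device's pitch and roll (the angles recoverable from gravity alone); function and variable names are illustrative:

```python
import math

def tilt_from_accelerometer(ax: float, ay: float, az: float):
    """Pitch and roll (radians) of the camera marker relative to vertical,
    computed from a static accelerometer reading (ax, ay, az)."""
    pitch = math.atan2(-ax, math.sqrt(ay * ay + az * az))
    roll = math.atan2(ay, az)
    return pitch, roll
```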
  • the angle and distance measurements performed as described above provide sufficient information to determine the location and orientation of all of the camera markers. These calculations can be performed using well-known principles of trigonometry.
  • the locations and orientations of visual features other than camera markers may be used in the setup process.
  • one or more printed markers may be employed, such as a printed marker placed on the floor.
  • An exemplary method is illustrated in FIG. 9.
  • a camera marker operates to obtain a wide-angle image of a location in which it is situated (905), with the wide-angle image including one or more other camera markers in the field of view of the image.
  • the camera marker operates to detect other camera markers (or printed markers) in the image (910).
  • the camera marker then operates based on the shape and size of the camera markers in the image to determine marker coordinates in the local coordinate system (915).
  • the camera marker determines from the camera image the coordinates in the local coordinate system corresponding to the six degrees of freedom of the marker, e.g., x, y, z for marker position and θ, φ, ψ (or, equivalently, pitch, roll, and yaw) for marker orientation, the coordinates representing the pose of the marker. This determination may be made based on inputs such as a known actual size of the visible camera marker (or of the AR marker displayed by that camera marker), information concerning distortion introduced by wide-angle imaging optics, pixel resolution, focal length, and the like.
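One common way to obtain such a six-degree-of-freedom pose from a detected marker is a perspective-n-point solution; the sketch below uses OpenCV's solvePnP and assumes the four marker corners have already been detected in pixel coordinates and that the capturing camera's intrinsics are known (the marker side length is an illustrative value):

```python
import numpy as np
import cv2

def marker_pose_from_corners(corners_px: np.ndarray,
                             camera_matrix: np.ndarray,
                             dist_coeffs: np.ndarray,
                             marker_side_m: float = 0.15):
    """Estimate the pose of a square marker of known side length.

    corners_px -- 4x2 array of detected corner pixels, ordered top-left,
                  top-right, bottom-right, bottom-left
    Returns (rvec, tvec): marker orientation (Rodrigues vector) and position
    in the capturing camera's coordinate system.
    """
    s = marker_side_m / 2.0
    object_points = np.array([[-s,  s, 0],
                              [ s,  s, 0],
                              [ s, -s, 0],
                              [-s, -s, 0]], dtype=np.float32)
    ok, rvec, tvec = cv2.solvePnP(object_points,
                                  corners_px.astype(np.float32),
                                  camera_matrix, dist_coeffs)
    return rvec, tvec
```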
  • the camera marker provides the coordinates of the other camera markers (and possibly printed fiducial markers) in the local coordinate system to a common position server (920), along with identifiers of the markers (which may be identifiers encoded in the markers themselves using, for example, a QR code or similar technology).
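The report sent to the common position server could, for example, be a simple structured message; the field names below are illustrative assumptions, as the application does not specify a wire format:

```python
import json

# Example report from camera marker "CM-2": marker "M1" was observed at the
# given position (meters) and orientation (radians) in CM-2's local frame.
report = {
    "observer_id": "CM-2",
    "observations": [
        {"marker_id": "M1",
         "position": {"x": 1.82, "y": 0.35, "z": 2.10},
         "orientation": {"theta": 0.02, "phi": -1.55, "psi": 0.00}}
    ],
}
payload = json.dumps(report)  # sent to the position server, e.g., over WLAN
```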
  • a common position server is implemented in the camera marker itself, as in the embodiment of FIG. 9. In other embodiments, the common position server is implemented in a different camera marker or in a separate network node.
  • the common position server receives information regarding marker positions expressed in one or more different coordinate systems.
  • the common position server may receive from one camera marker an indication that a marker labeled M1 is at coordinates (x, y, z, θ, φ, ψ) in one local coordinate system and that the same marker M1 is at coordinates (x', y', z', θ', φ', ψ') in a different local coordinate system measured with respect to a different camera marker. Similar sets of coordinates may also be received representing the positions of different markers as measured in different coordinate systems.
  • the position server defines a common coordinate system and transforms the coordinates of the markers into the common coordinate system (925).
  • Various techniques may be used to transform the coordinates of the markers into the common coordinate system.
  • one of the local coordinate systems is defined as the common coordinate system.
  • a transformation is then found between the common coordinate system and the local coordinate systems.
  • the transformation may be found by testing a plurality of different transformations that result in the alignment of the locations of different camera markers.
  • a best alignment may be selected as, for example, an alignment that minimizes the sum of least squares of distances between representations of the different camera markers in different coordinate systems.
  • the exemplary system operates to determine a transformation that transforms the coordinate system of M2 to the local coordinate system of M1.
  • This transformation may take the form, for example, of a vector offset combined with a rotation.
  • an arbitrary location in the coordinate system of M2 may be expressed by a vector X2; the same location is expressed in the coordinate system of M1 as X1 = T1,2 + R1,2·X2, where T1,2 is a translation vector and R1,2 is a rotation matrix.
  • the transformation parameters may be determined with the use of a search through a search space to minimize the sum of square distances between different representations of the same camera marker. For example, suppose additional markers MA and MB have positions represented, respectively, by vectors A1 and B1 in the coordinate system of M1 as measured by M1. Those same markers have positions represented, respectively, by vectors A2 and B2 in the coordinate system of M2 as measured by M2. The positions as measured by M2 can thus be expressed in the coordinate system of M1 as A2' = T1,2 + R1,2·A2 and B2' = T1,2 + R1,2·B2.
  • S is given by the equation S = |A1 − A2'|² + |B1 − B2'|².
  • the transformation parameters may include one or more scaling factors.
  • a is a scalar.
  • a is a vector or tensor value.
  • R is not a unitary rotation matrix but rather an arbitrary matrix, the components of which are adjusted using, e.g., a search technique to minimize the sum S.
  • the position XM3 of a further marker M3, as measured in the coordinate system of M2, can then be mapped into the common coordinate system as XM3' = T1,2 + R1,2·XM3.
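A closed-form alternative to the search described above is the SVD-based Kabsch/Procrustes solution, which minimizes the same sum of squared distances S over corresponding marker positions; the sketch below estimates R1,2 and T1,2 and then maps a further point as in the expression above (this particular algorithm is illustrative, not prescribed by the application):

```python
import numpy as np

def estimate_rigid_transform(points_m2: np.ndarray, points_m1: np.ndarray):
    """Least-squares rigid transform (R, T) such that R @ p2 + T ~= p1 for
    corresponding marker positions p2 (in M2's frame) and p1 (in M1's frame).

    points_m2, points_m1 -- N x 3 arrays of corresponding positions;
    N >= 3 non-collinear points are needed for a unique rotation.
    """
    c2 = points_m2.mean(axis=0)
    c1 = points_m1.mean(axis=0)
    H = (points_m2 - c2).T @ (points_m1 - c1)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    T = c1 - R @ c2
    return R, T

# A further marker M3 measured in M2's frame maps to the common frame as:
#   x_m3_common = R @ x_m3_in_m2 + T
```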
  • a least squares technique may be employed to determine a best fit position in the common coordinate system.
  • the least squares technique may be weighted to accommodate the reliability of different measurements. For example, a position measurement from a nearby camera marker may be weighted more heavily than a position measurement from a more distant camera marker.
  • a reliability measure may also be associated with different coordinate transforms, with transforms based on a greater number of marker positions being considered relatively more reliable, and transforms that result in a very low sum S being considered relatively more reliable. In such an embodiment, a position measurement that results from more reliable transforms is itself considered more reliable and thus is weighted more heavily in determining a best fit position.
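Such a weighted combination can be as simple as a reliability-weighted average of the per-observer position estimates once they are expressed in the common coordinate system; a sketch with an illustrative inverse-distance weighting:

```python
import numpy as np

def fuse_position_estimates(estimates: np.ndarray,
                            observer_distances: np.ndarray) -> np.ndarray:
    """Combine several estimates (N x 3, common coordinates) of the same
    point, weighting nearby observers more heavily than distant ones."""
    weights = 1.0 / np.maximum(observer_distances, 1e-6)
    weights = weights / weights.sum()
    return (weights[:, None] * estimates).sum(axis=0)
```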
  • the AR rendering system includes an AR device (such as a headset, tablet, or other device) with a camera.
  • the AR device takes an image of the environment (935) and operates to locate AR markers (such as camera markers) in the image (940).
  • the AR system determines the position and orientation of the AR device within the common coordinate system (945).
  • the AR system further operates to render AR content based on the determined position and orientation of the AR device (950).
  • one or more printed markers are displayed on the AR device to facilitate tracking of the AR device by camera markers.
  • the functions of the described camera marker are performed using a general purpose consumer tablet computer.
  • a tablet computer readily provides versions of the needed components, such as a display, a camera (though typically not with wide-angle optics), and wired and wireless network connections.
  • a camera marker is implemented using dedicated software running on the tablet device.
  • the camera marker is implemented using a special-purpose version of a tablet computer.
  • the special-purpose version of the tablet computer may, for example, have reduced memory, lower screen resolution (possibly greyscale only), wide-angle optics, and may be pre-loaded with appropriate software to enable camera marker functionality.
  • inessential functionality such as GPS, magnetometer, and audio functions may be omitted from the special-purpose tablet computer.
  • Exemplary embodiments disclosed herein are implemented using one or more wired and/or wireless network nodes, such as a wireless transmit/receive unit (WTRU) or other network entity.
  • FIG. 6 is a system diagram of an exemplary WTRU 102, which may be employed as a user device in embodiments described herein.
  • the WTRU 102 may include a processor 118, a communication interface 119 including a transceiver 120, a transmit/receive element 122, a speaker/microphone 124, a keypad 126, a display/touchpad 128, a non-removable memory 130, a removable memory 132, a power source 134, a global positioning system (GPS) chipset 136, and sensors 138.
  • the processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like.
  • the processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 102 to operate in a wireless environment.
  • the processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. 6 depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip.
  • the transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station over the air interface 115/116/117.
  • the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals.
  • the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, as examples.
  • the transmit/receive element 122 may be configured to transmit and receive both RF and light signals. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wireless signals.
  • the WTRU 102 may include any number of transmit/receive elements 122. More specifically, the WTRU 102 may employ MIMO technology. Thus, in one embodiment, the WTRU 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 115/116/117.
  • the transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122.
  • the WTRU 102 may have multi-mode capabilities.
  • the transceiver 120 may include multiple transceivers for enabling the WTRU 102 to communicate via multiple RATs, such as UTRA and IEEE 802.11, as examples.
  • the processor 118 of the WTRU 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit).
  • the processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128.
  • the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132.
  • the non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device.
  • the removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like.
  • the processor 118 may access information from, and store data in, memory that is not physically located on the WTRU 102, such as on a server or a home computer (not shown).
  • the processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the WTRU 102.
  • the power source 134 may be any suitable device for powering the WTRU 102.
  • the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), and the like), solar cells, fuel cells, and the like.
  • the processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 102.
  • the WTRU 102 may receive location information over the air interface 115/116/117 from a base station and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
  • the processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity.
  • the peripherals 138 may include sensors such as an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.
  • FIG. 7 depicts an exemplary network entity 190 that may be used in embodiments of the present disclosure, for example as a common server used for the setup of one or more camera markers.
  • network entity 190 includes a communication interface 192, a processor 194, and non-transitory data storage 196, all of which are communicatively linked by a bus, network, or other communication path 198.
  • Communication interface 192 may include one or more wired communication interfaces and/or one or more wireless-communication interfaces. With respect to wired communication, communication interface 192 may include one or more interfaces such as Ethernet interfaces, as an example. With respect to wireless communication, communication interface 192 may include components such as one or more antennae, one or more transceivers/chipsets designed and configured for one or more types of wireless (e.g., LTE) communication, and/or any other components deemed suitable by those of skill in the relevant art. And further with respect to wireless communication, communication interface 192 may be equipped at a scale and with a configuration appropriate for acting on the network side (as opposed to the client side) of wireless communications (e.g., LTE communications, Wi-Fi communications, and the like). Thus, communication interface 192 may include the appropriate equipment and circuitry (perhaps including multiple transceivers) for serving multiple mobile stations, UEs, or other access terminals in a coverage area.
  • Processor 194 may include one or more processors of any type deemed suitable by those of skill in the relevant art, some examples including a general-purpose microprocessor and a dedicated DSP.
  • Data storage 196 may take the form of any non-transitory computer-readable medium or combination of such media, some examples including flash memory, read-only memory (ROM), and random-access memory (RAM) to name but a few, as any one or more types of non-transitory data storage deemed suitable by those of skill in the relevant art could be used. As depicted in FIG. 7, data storage 196 contains program instructions 197 executable by processor 194 for carrying out various combinations of the various network-entity functions described herein.
  • FIG. 8 illustrates, without limitation, examples of patterns that can be displayed on the display of a camera marker for use as an AR marker.
  • FIG. 10 illustrates a functional architecture of a camera marker 1001 in accordance with an embodiment.
  • the camera marker 1001 may operate various modules.
  • a camera module 1005 may operate within the camera marker 1001.
  • a marker display module 1010 may operate within the camera marker 1001, to display the AR marker.
  • a coordinate conversion module 1015 may operate within the camera marker 1001, to determine the coordinates, relative to the camera marker, of other markers detected by image capture.
  • a position server module 1020 may operate within the camera marker 1001.
  • the position server module 1020 may include a shared coordinate conversion module 1022, which may convert the local coordinates of detected markers into a shared coordinate system.
  • the camera module 1005, marker display module 1010, coordinate conversion module 1015, position server module 1020, and marker transform/scale module 1040, as well as other modules, may communicate with a memory 1030.
  • the memory may include rules for marker transform/scale 1032, captured images 1034, local coordinates 1036, other camera locations 1038, and/or the like.
  • the camera marker may have communications incoming from or outgoing to an AR unit 1003.
  • there is a method comprising: providing a plurality of camera markers; and operating the plurality of camera markers to perform self-calibration.
  • the self-calibration includes determination of a shared coordinate system.
  • the method further comprises rendering augmented content to a user using the shared coordinate system.
  • the self-calibration includes determination of a location of the camera marker in the shared coordinate system.
  • the self-calibration includes determination of an orientation of the camera marker in the shared coordinate system.
  • a method comprising: operating a camera marker to display an image of at least a first augmented reality marker; operating the camera marker to capture an image of at least a second augmented reality marker; based on the image, determining a pose of the second augmented reality marker with respect to the camera marker; and providing the pose to a position server.
  • the method further comprises operating a second camera marker to capture the image of the first augmented reality marker.
  • the method further comprises operating the second camera marker to display the second augmented reality marker.
  • the second augmented reality marker is a second camera marker.
  • the method further comprises detecting an image of a user by the camera marker and determining a pose of the user based on the image of the user.
  • the position server is implemented in the camera marker. In some embodiments, the position server is implemented in a separate camera marker. In some embodiments, the position server operates to define a shared coordinate system. In some embodiments, the method further comprises rendering augmented reality content using the shared coordinate system. In some embodiments, the rendering of the augmented reality content includes providing sound from a speaker of the camera marker. In some embodiments, the method further comprises controlling the camera marker to modify the first augmented reality marker, the modification being selected from the group consisting of changing, scaling and turning the augmented reality marker.
  • the method further comprises changing the rendering of augmented content in response to modification of the augmented reality marker.
  • the controlling is provided by remote control.
  • the remote control is provided over an internet protocol (IP) network.
  • the remote control is provided using a protocol selected from the group consisting of WLAN, Bluetooth, and an Infrared link.
  • the method further comprises: determining a pose of an augmented reality viewing device using at least the first augmented reality marker; and rendering augmented reality content on the augmented reality viewing device using the determined pose.
  • the method further comprises determining a pose of an augmented reality viewing device using at least the first augmented reality marker.
  • the viewing device is selected from the group consisting of a camera phone, a tablet computer, and a virtual reality headset.
  • the method further comprises determining a position of an augmented reality viewing device using at least the first augmented reality marker and the second augmented reality marker.
  • the second augmented reality marker is displayed on a second camera marker.
  • the position server operates to define a shared coordinate system and to determine a position of the camera marker in the shared coordinate system.
  • the augmented reality marker is a printed fiducial marker used to identify a surface level.
  • the augmented reality marker is a printed fiducial marker.
  • the camera marker displays information advertising the available augmented content.
  • capturing an image includes capturing a depth image.
  • a method of providing a virtual mirror comprising: obtaining an image from a camera of a camera marker; processing the image to emulate a reflected image; and rendering the processed image on an augmented reality display at a position determined at least in part by an augmented reality marker displayed by the camera marker.
  • the processed image is rendered substantially at the position of the augmented reality marker.
  • a camera marker comprising: a wide-angle camera on a front face of the camera marker; and a display on the front face of the camera marker.
  • the camera marker further comprises logic in communication with the wide-angle camera to determine a relative location of at least one other camera marker.
  • the camera marker is operative to display an augmented reality (AR) marker on the display.
  • the wide-angle camera is an electronic pan-tilt-zoom camera.
  • the wide-angle camera is a depth camera.
  • the camera marker is implemented in a digital photo frame.
  • a camera marker system comprising: a first camera marker including a first display and a first front-facing camera; a second camera marker including a second display and a second front-facing camera; wherein the first camera marker is positioned in a field of view of the second front-facing camera, and wherein the second camera marker is positioned in a field of view of the first front-facing camera.
  • the camera marker system further comprises a common position server.
  • the camera marker system further comprises an augmented reality system.
  • a method of defining a coordinate system comprising: operating a first camera marker to determine a pose of a second camera marker in a first local coordinate system; operating the second camera marker to determine a pose of the first camera marker in a second local coordinate system; and determining a transformation between the first local coordinate system and the second local coordinate system.
  • the method further comprises defining a global coordinate system.
  • the method further comprises determining a transformation between the first local coordinate system and the global coordinate system.
  • the method further comprises determining a transformation between the second local coordinate system and the global coordinate system.
  • the method further comprises determining a pose of the first camera marker in the global coordinate system.
  • the method further comprises determining a pose of the second camera marker in the global coordinate system. In one embodiment, the method further comprises rendering augmented reality content using the global coordinate system. In one embodiment, the method further comprises rendering the augmented reality content using an augmented reality viewer. In one embodiment, the augmented reality viewer is a head-mounted display. In one embodiment, the augmented reality viewer is a tablet computer. In one embodiment, the augmented reality viewer is a smartphone. In one embodiment, the augmented reality viewer is a wearable device.
  • modules include hardware (e.g., one or more processors, one or more microprocessors, one or more microcontrollers, one or more microchips, one or more application-specific integrated circuits (ASICs), one or more field programmable gate arrays (FPGAs), one or more memory devices) deemed suitable by those of skill in the relevant art for a given implementation.
  • Each described module may also include instructions executable for carrying out the one or more functions described as being carried out by the respective module, and it is noted that those instructions could take the form of or include hardware (i.e., hardwired) instructions, firmware instructions, software instructions, and/or the like, and may be stored in any suitable non-transitory computer-readable medium or media, such as those commonly referred to as RAM, ROM, etc.
  • Examples of computer-readable storage media include a read-only memory (ROM), a random-access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs).
  • a processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A camera marker system is disclosed. In an exemplary embodiment, a camera marker includes a display operative to display an augmented reality marker and a wide-angle camera on the same side of the camera marker as the display. A plurality of camera markers perform a self-calibration routine in which each camera marker determines relative locations of other camera markers in its field of view, and the camera markers cooperate to define a shared coordinate system. The location and orientation of an augmented reality viewer, such as a head-mounted display, can then be determined within the shared coordinate system using camera markers in a field of view of the augmented reality viewer.

Description

APPARATUS AND METHOD FOR SUPPORTING
INTERACTIVE AUGMENTED REALITY FUNCTIONALITIES
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to and the benefit under 35 U.S.C. §119(e) of U.S. Provisional Patent Application Serial No. 62/202,431, filed August 7, 2015 and entitled "APPARATUS AND METHOD FOR SUPPORTING INTERACTIVE AUGMENTED REALITY FUNCTIONALITIES," the full contents of which are hereby incorporated herein by reference.
BACKGROUND
[0002] In Mixed or Augmented Reality (AR) systems, virtual 3D models or animations are embedded into video views from the real environment. The position of the augmented 3D information is defined in relation to a set of known features in the video views. Traditionally these distinctive features are brought to the scene in the form of graphical markers, which are relatively easy to detect and track from the video.
[0003] Recently, using natural features for augmented reality tracking has also become feasible and popular. It is less intrusive than using graphical markers but also more challenging, as the AR content production process becomes more dependent on the target environment, e.g., whether or not the environment contains distinctive features to use, or if the lighting conditions stay stable enough between the offline content production and the real-time use.
[0004] In Augmented Reality, graphical tags, fiducials or markers have been commonly used. Graphical markers have certain advantages over the use of natural features. For example, graphical markers help to make the offline process for mixed reality content production and use more independent of the actual target environment. This allows content to be positioned more reliably in the target environment based on the position of graphical markers, whereas changes in the environment (e.g., changes in lighting or in the position of miscellaneous objects) can otherwise make it more difficult for an augmented reality system to consistently identify position and orientation information based only on the environment.
[0005] Passive graphical markers are typically printed on paper or other stable substrate and cannot change their appearance after being printed. Being passive, they naturally also lack the ability to support additional functionalities based on, for example, image processing or electrical connections. An example of a dynamic graphical marker is described in, for example, U.S. Patent Application Publication No. 2013/0109961 Al, entitled "Apparatus and method for providing dynamic fiducial markers for devices."
SUMMARY
[0006] The present disclosure addresses the benefits and opportunities offered by an electronic, connected, and camera equipped AR marker. In particular, the present disclosure describes a camera marker. In some embodiments, a camera marker operates as an electronic marker device that combines a wide angle, electronic pan-tilt-zoom camera with a display. The device's display is used in some embodiments for declaring and advertising the availability of Augmented Reality (AR) information, as well as for showing markers to augment such information.
[0007] In an exemplary embodiment, a camera marker is operated to display an image of at least a first augmented reality marker. The camera marker is further operated to capture an image of at least a second augmented reality marker. The second augmented reality marker may be displayed by a second camera marker. Based at least in part on the image of the second augmented reality marker, the position of the second augmented reality marker with respect to the camera marker may be determined. This position may then be provided to a position server.
[0008] In an exemplary embodiment, the device's camera is used for automatic calibration of a multi-marker setup, enabling accurate detection and tracking of the user and supporting new interaction and presence related effects and services.
[0009] In some embodiments, user acceptance for one or more surrounding devices is gained by embedding the device in the form of a familiar consumer object, such as a photo frame.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] FIG. 1 is a schematic plan view of an embodiment in which camera markers are employed in an augmented reality system in a room.
[0011] FIG. 2A provides a perspective view of a camera marker device with a marker shown on a display of the device.
[0012] FIG. 2B provides a perspective view of a camera marker together with augmented content as seen by AR glasses or a smart phone.
[0013] FIG. 2C further illustrates exemplary effects of transforming a fiducial marker displayed on a screen of the camera marker device or viewed by AR glasses.
[0014] FIG. 3 illustrates architecture of an exemplary augmented reality system employing one or more camera markers.
[0015] FIG. 4 is a functional block diagram of components of a camera marker device.
[0016] FIG. 5 is a schematic illustration of an embodiment in which a camera marker operates as a virtual mirror.
[0017] FIG. 6 illustrates an exemplary wireless transmit/receive unit (WTRU) that may be employed as camera marker or common position server in some embodiments.
[0018] FIG. 7 illustrates an exemplary network entity that may be employed as a camera marker or common position server in some embodiments.
[0019] FIG. 8 illustrates a collection of exemplary augmented reality (AR) markers of the type that can be displayed by exemplary camera markers.
[0020] FIG. 9 illustrates an exemplary method of positioning AR content using camera marker devices.
[0021] FIG. 10 is a block diagram illustrating an exemplary functional architecture of components of a camera marker in accordance with an exemplary embodiment of the present disclosure.
DETAILED DESCRIPTION
[0022] This present disclosure describes a camera marker. In exemplary embodiments, a camera marker is an electronic marker device that combines a wide angle, electronic pan-tilt-zoom camera with a display. The device's display can be used for declaring and advertising the availability of Augmented Reality (AR) information, as well as for showing markers to augment such information.
[0023] The device's camera can be used for automatic calibration of a multi-marker setup. The calibrated marker setup can be used for accurately detecting and tracking the user to support new interaction and presence-related effects and services. In some embodiments, the camera marker is provided in the form of a familiar consumer object, such as a digital photo frame, to encourage user acceptance of the device and to allow the device to blend with existing room decoration.
[0024] FIG. 1 illustrates a user 105 with AR glasses (with field of view shown by lines 107) in a room 100 with a camera marker 115 on each wall. Camera markers 115 are placed in such a way that at least one other camera marker is seen in each camera view (lines 117 depict exemplary fields of view of each camera marker). This ensures that a self-calibration process can derive a common coordinate system for the multi-marker setup. The common coordinate system can then be used to derive an accurate estimate of the user's position in the room, and to capture his/her visual parameters such as gestures, appearance, etc.
[0025] There are few restrictions on the placement of camera markers in the environment. The number and location of camera markers can be selected based on, for example, the size of the space to be tracked and the requirements for tracking accuracy. Note that the camera marker devices 115 can also be positioned horizontally, e.g., on tables. In many AR visualizations, it is beneficial to define the floor level. However, to preserve the safety of the user and of the device, it may not be advisable to place a camera marker on the floor. In some embodiments, camera markers can be used in conjunction with one or more traditional printed fiducial markers. For example, a printed fiducial marker may be placed on a surface and used to define that surface. For example, a printed fiducial marker may be placed on the floor and used to define the floor level. Similarly, a printed fiducial marker may be used to define other surfaces, such as desktops, tabletops, and walls. In some embodiments, a printed marker may be attached to the viewing device, e.g., on the backside of a tablet, in order to facilitate determination of the pose (the position and orientation) of the viewing device by the system.
[0026] FIG. 1 illustrates a user 105 in a room 100 with multiple camera markers 115. Wide angle cameras are used for calibration and user tracking purposes. In some embodiments, the wide-angle camera has an angle of view of at least 60° (see field of view lines 117). In other embodiments, the wide-angle camera has an angle of view of at least 90°. In further embodiments, the wide-angle camera has an angle of view of at least 120°. A virtual character 110 is augmented into AR glasses, which carry a camera for marker detection and tracking (see field of view lines 107).
[0027] In exemplary embodiments, a camera marker device is accessible and controllable both over an internet protocol (IP) network and locally, e.g., by using a smart phone as a remote control device. Local control is particularly useful for user interaction, such as for changing, scaling and turning the augmented or displayed content.
[0028] FIG. 2A illustrates an exemplary camera marker device 200. As illustrated, the device's appearance does not necessarily differ much from a smartphone or a tablet. For user acceptance, the appearance of an ordinary photo frame may be beneficial. The size of the display depends on the targeted content (markers and/or eye-catchers) and the desired freedom for moving the marker on the display. The marker 205 shown on the display can be replaced or modified over the network, and the augmented content changes accordingly. The camera 210 is used for calibration and user tracking purposes. In the example illustrated in FIG. 2A, the wide-angle camera and the display of the camera marker are both on the same face of the device, with both facing the same direction.
[0029] FIG. 2B illustrates an embodiment of a face model 220 augmented in relation to the marker 205, as seen by AR glasses or a smart phone. In FIGs. 2B-2C, the viewpoint to the marker and the marker orientation are kept unchanged.
[0030] In an exemplary embodiment, a camera marker combines a wide-angle camera for tracking the space with a display for showing markers for Augmented Reality (AR) information. In addition to benefits in AR, the camera marker can be used for new user interaction and presence related services.
[0031] Self-calibration of the camera marker simplifies the setup process of a system that includes multiple camera markers. Self-calibration is useful for ensuring the system's accuracy and stability both in AR visualization and user tracking related functionalities. Self-calibration may be based on detecting the pose of other markers in each camera marker's own camera view.
[0032] To support self-calibration and user-tracking properties, camera marker devices are in communication with a common position server that combines all captured information, e.g., mapping marker positions into common coordinates, and that derives a common estimate for the position of the user. In some embodiments, one of the camera marker devices operates as the server. In other embodiments, a separate server device is provided (either locally or, for example as a cloud service or other networked service) in order to reduce computational capabilities required from each individual camera marker.
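By way of a non-limiting sketch, the report that a camera marker might send to the common position server could resemble the following Python structure; the field names and the JSON transport are illustrative assumptions rather than part of the disclosure.

```python
# Hypothetical report a camera marker might send to the common position server;
# field names are illustrative assumptions, not part of the disclosure.
import json
from dataclasses import dataclass, asdict
from typing import Tuple

@dataclass
class MarkerObservation:
    observer_id: str                                  # reporting camera marker
    observed_id: str                                  # marker seen in the camera view
    position_xyz: Tuple[float, float, float]          # in the observer's local coordinates
    orientation_euler: Tuple[float, float, float]     # e.g., Euler angles, radians
    timestamp_s: float                                 # capture time in seconds

obs = MarkerObservation("M2", "M1", (1.2, 0.0, 3.4), (0.0, 0.05, 1.57), 1691400000.0)
payload = json.dumps(asdict(obs))   # e.g., posted to the position server over WLAN/IP
```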
[0033] FIG. 3 illustrates an exemplary system architecture. In one embodiment, the system may comprise a user 301 with AR glasses. Camera markers (315, 316) may be linked with local computers (320a, 320b, 320n), which may be integrated with the camera marker (e.g., unit 318). The camera markers 315, 316 and their local computers 320 may communicate with a server and/or central computer 325, which combines information from all devices. The user's AR glasses may communicate with the backend 325 over a wireless connection 330. In some embodiments, an AR augmentation 303 may be presented in the user's AR view, in relation to the camera markers.
[0034] A network connection is used to provide content to the electronic marker display, whether a marker for AR, a picture (or video) for an eye-catcher, an animation, an advertisement, or other chosen content. Preferably the connection is wireless (e.g., WLAN), although a cord may be used to power the device.

[0035] A network connection (which may be the same network connection) can be used for another purpose: to connect the camera marker to the user's viewing device. The viewing device may be, for example, a camera phone, a tablet, virtual glasses with a camera, and the like. Using this connection, the system infrastructure tracks, at each time instant, the orientation of the marker (marker display) with respect to the user. This information can be used, e.g., for tracking the user by electronically zooming in on the marker camera view and analyzing its content.
[0036] A further communication connection is used for local remote control of the camera marker by a smart phone or a TV-like remote controller. Feasible connection technologies include WLAN, Bluetooth, and Infrared link, among others.
[0037] FIG. 4 is a functional block diagram of components in one example of a camera marker device. As illustrated in FIG. 4, the camera marker includes a processor 402, a camera 404, which may be a wide-angle camera, and a display 406, such as an LCD, which may be used to display an AR marker. An optional keypad 408 may be provided. The keypad 408 may be implemented as a feature of the display 406, where the display 406 is a touch-sensitive display. The exemplary camera marker device further includes non-volatile memory 410 and volatile memory 412, a transmitter 414, and a receiver 416. Other network input/output connections may also be provided.
[0038] In some embodiments, a camera marker is provided with audio capture and playback features. Audio may be used to increase the attractiveness and effectiveness of the videos used for announcing/advertising the available AR content. Audio may also be used as a component of the augmented AR content. A microphone can be used to capture user responses or commands.
[0039] When building up a multi-marker setup, various combinations of electronic and paper markers are feasible. In such a setup, for example, a paper marker on the floor could specify the floor level without the risk of an electronic device being stepped on. Paper markers may also be used as a way to balance the trade-off between calibration accuracy and system cost. In addition to graphical markers, natural print-out pictures can also be used as part of a hybrid marker setup. Even natural planar or 3D feature sets can be detected by multiple camera markers and used for augmenting 3D objects.
[0040] In some embodiments, at least some local processing is performed in each marker device in order to reduce the amount of information to be transmitted to the common server. Marker detection is one such local operation. Note that the camera marker setup is relatively stable, and tracking in camera markers is not needed to the same extent as in the user's viewing device (AR glasses or tablet), which moves along with the user. Another example is the control of the wide-angle camera in order to capture, for example, cropped views of other markers (for marker detection and identification), or the user's visual parameters. A third example of local processing is to use the camera view to derive the actual lighting conditions in the environment in order to adapt the respective properties of the virtual content for improved photorealism.
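As a minimal sketch of the lighting-estimation example above, the ambient brightness could be approximated from the marker's own camera frame as follows; the use of OpenCV, the synthetic frame, and the renderer call at the end are assumptions for illustration.

```python
# Estimate ambient light level from the marker's camera view so the brightness of
# augmented content can be adapted; assumes an 8-bit BGR frame.
import cv2
import numpy as np

def estimate_ambient_brightness(frame_bgr: np.ndarray) -> float:
    """Mean luminance of the camera view, scaled to 0.0 (dark) .. 1.0 (bright)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return float(gray.mean()) / 255.0

frame = np.full((480, 640, 3), 180, dtype=np.uint8)   # stand-in for a captured frame
brightness = estimate_ambient_brightness(frame)
# e.g., renderer.set_virtual_light_intensity(brightness)  # hypothetical renderer call
```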
[0041] Instead of utilizing only visual cameras, camera markers can be equipped with 3D cameras, such as RGB-D or ToF sensors, for capturing depth information. As the success of devices such as the Kinect camera has shown, depth sensing can increase the versatility and performance of related functionalities and services. The use of camera markers may encourage the acceptance of 3D cameras as a ubiquitous part of users' environments.
[0042] As a reference, a system for real-time 3D reconstruction of room-sized spaces has been described in [2]. The system uses Kinect Fusion modified to operate with a set of fixed sensors, an approach that might also be used in a system of 3D camera markers.
[0043] Together with information on the user's real viewpoint (obtained, e.g., by analyzing the captured 3D scene, or obtained from virtual glasses), the 3D captured scene can be used to implement accurate user-perspective AR rendering (cf. the illustration of device-perspective and user-perspective magic lenses in Baricevic et al. [1]). A more traditional way of capturing 3D information is to use two (e.g., stereo) or more cameras.
[0044] As described above, multiple markers can be used in AR to give more and better 3D data of the environment. To provide this benefit, the multiple markers are calibrated with respect to each other and the scene. Typically, calibration is performed by capturing the multi-marker scene with a moving external camera and making geometrical calculations from its views. An example of a multi-camera calibration method is given in [5].
[0045] Providing the markers with wide-angle cameras enables self-calibration in a multiple camera-marker system. The views of the marker cameras themselves can be used for the mutual calibration of all devices, and the calibration can be updated when necessary, e.g., to adapt to any changes in the setup.
[0046] In [3], a feasible process for auto-calibration is described, which can also be applied to a multiple camera marker setup. The calibration is a real-time process and does not need a separate calibration phase. The user may lay markers randomly on suitable places and start tracking immediately. The accuracy of the system improves on the run as the transformation matrices are updated dynamically. Calibration can also be done as a separate stage, and the results can be saved and used later with another application. The described algorithms can be applied to various types of markers.

[0047] Viewing of AR information is typically performed by a device using a camera (usually in a mobile phone or wearable glasses) to detect the orientation and scale of a marker. The location of the user with respect to the marker can then be sent from the viewing device over a network connection. This is an example of inside-out tracking based on the camera's own motion.
[0048] In some embodiments, a user's position can also be derived by the cameras around the user. This outside-in type of tracking can be accomplished by camera markers and brings some potential benefits. In addition to capturing the user's position, marker cameras can be used to capture the user's visual appearance, gestures, or motion.
[0049] More reliable user tracking can be effected using multiple connected and calibrated marker cameras around the user, as described above. This makes user capture easier and more accurate (e.g., tracking in large spaces, handling of occlusions in the scene, etc.) than single-camera tracking, whether based on one wearable camera or one (typically fixed) external camera.
[0050] In the case of multiple users, after the position of each of the users is determined, the users can be provided with individualized viewpoints of the same content. It is also possible to serve the different users with different content, especially in embodiments in which the service is permitted to track users' identities. The user identities may be, for example, anonymously numbered indices. Such anonymous indices are sufficiently detailed to provide some level of service enhancement. However, having information regarding real identities enables the system to provide more personalized services.
[0051] Camera markers are an access point for information and interaction for the user, and they are used in some embodiments for analyzing users' responses to the content, monitoring user activities in the space, and collecting related contextual information. The use of camera markers allows for human-computer interface (HCI) studies to be performed in Augmented Reality. This can be compared to known ways of user observation by an external camera or eye-tracker.
[0052] User behavior data can be used in many ways to better understand and serve the user. The observations can also be used actively to provide the user with interactive content, as described in the following.
[0053] By tracking the user, interactive effects can be produced. These effects can be shown both when the camera marker is used for showing AR information (seen, e.g., by AR glasses) and when the effects are shown directly on a large enough camera marker display.

[0054] As the orientation of the camera marker device is determined with respect to the user, the device's display or augmentation may be used to reflect the environment to the user from almost any desired direction seen from the device (provided that the camera captures a wide enough panorama of the environment). In one such embodiment, the device acts as a virtual mirror, mimicking a physical mirror view even for a moving user (resulting in mirror-like motion parallax seen by the user). The virtual mirror can be at a fixed angle, reflecting to/from any chosen direction, or even dynamically turn to any desired direction while the user is moving (e.g., depending on his/her trace). This enables new types of effects and services in many spaces. The virtual mirror concept is illustrated in FIG. 5. A user 501 with AR gear may view (view 505) a "virtual mirror" 510 (e.g., a camera marker) on a wall 508. The virtual mirror 510 may give a reflection 515, which may shift direction relative to the user's motions.
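The geometry behind such a virtual mirror can be sketched as a reflection of the tracked user position across the plane of the camera marker's display; the vectors below are illustrative values, assuming the plane position and normal of the marker are known from calibration.

```python
# Reflect the user's tracked position across the camera marker's display plane to
# obtain the virtual viewpoint from which the captured panorama is re-rendered.
import numpy as np

def reflect_across_plane(point: np.ndarray, plane_point: np.ndarray,
                         plane_normal: np.ndarray) -> np.ndarray:
    n = plane_normal / np.linalg.norm(plane_normal)
    d = np.dot(point - plane_point, n)   # signed distance to the mirror plane
    return point - 2.0 * d * n           # mirror image of the point

user_pos = np.array([1.0, 1.6, 2.5])        # user position in room coordinates (example)
mirror_pos = np.array([0.0, 1.5, 0.0])      # a point on the camera marker's display
mirror_normal = np.array([0.0, 0.0, 1.0])   # display assumed to face along +z
virtual_eye = reflect_across_plane(user_pos, mirror_pos, mirror_normal)
```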
[0055] FIG. 2C illustrates effects of scaling and turning the marker 205 over a network connection (either by the user or the service provider). Local user control enables the user, for example, to place an advertised 3D product (e.g., a couch) at the appropriate scale into a preferred position in his/her room. The provider or broadcaster of the advertisement does not need to know about the local circumstances, as the user is the one who makes the composition for AR visualization.
[0056] The size of the camera marker's display naturally limits the freedom for (locally) controlling the point at which an object is augmented. This is especially true for spatial translations. However, an augmented object can be moved longer distances by allowing the user to change the perspective (angle) of a marker image as well as the distance between the marker and the augmented object.
[0057] Each camera marker used in an application has a location and orientation (pose) with respect to a coordinate system used to provide augmented content. In an exemplary setup process, two or more camera markers cooperate to determine their respective locations and orientations. In the exemplary setup process, a set of six values is used to specify the location and orientation of a camera marker with respect to the coordinate system, such as the three spatial coordinates (x, y, z) defining the location and three Euler angles (φ, θ, ψ) defining the orientation. It should be understood that alternative parameterizations of the location and orientation of the camera markers may also be employed.
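As one illustration of this parameterization, the three Euler angles can be converted to a rotation matrix as follows; the Z-Y-X convention and the sample values are assumptions, since the disclosure leaves the exact convention open.

```python
# Convert an (x, y, z, phi, theta, psi) pose into a 3x3 rotation matrix, assuming a
# Z-Y-X Euler convention purely for illustration.
import numpy as np

def euler_zyx_to_matrix(phi: float, theta: float, psi: float) -> np.ndarray:
    cz, sz = np.cos(phi), np.sin(phi)
    cy, sy = np.cos(theta), np.sin(theta)
    cx, sx = np.cos(psi), np.sin(psi)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

pose = {"xyz": (2.0, 1.2, 0.0), "euler": (0.1, 0.0, 1.5)}   # one possible encoding
R = euler_zyx_to_matrix(*pose["euler"])                      # 3x3 orientation matrix
```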
[0058] Consider, for the sake of illustration, an embodiment in which three camera markers are in use, and in which each of the camera markers is equipped with a wide-angle camera with a field of view that encompasses the other two camera markers. In the exemplary setup process, each of the camera markers displays an AR marker or other fiducial marker, which may be a unique (or at least locally unique) marker. Each of the camera markers obtains an image from its camera and processes the image to recognize the two displayed AR markers and to determine the coordinates of those markers within the image. Having processed the image to recognize the AR markers, the camera marker further determines the angle between those markers. The determination of the position and orientation of the AR markers within the image may be performed using known AR marker detection techniques, for example, statistical-based, gradient-based, pixel connectivity/edge-linking-based, and Hough-transform-based methods.
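A minimal detection sketch is shown below, using the ArUco module of opencv-contrib as a stand-in for the unspecified detection method; the disclosure does not mandate ArUco, the synthetic frame is a placeholder, and the exact API differs between OpenCV versions.

```python
# Detect fiducial markers in a camera marker's wide-angle view (requires
# opencv-contrib-python; classic pre-4.7 ArUco API shown).
import cv2
import numpy as np

gray = np.full((720, 1280), 255, dtype=np.uint8)      # stand-in for a captured frame
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
corners, ids, _rejected = cv2.aruco.detectMarkers(gray, dictionary)
# 'corners' holds the pixel coordinates of each detected marker's four corners;
# 'ids' identifies which marker was seen, e.g., which neighboring camera marker.
```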
[0059] In some embodiments, the image taken by each camera is further processed to determine the distance of the other camera markers. In some embodiments, the distance of the other camera markers is determined based at least in part on the apparent size and perspective of the camera markers within the image. This may be done by, for example, comparing the apparent size and shape of the camera markers in the image with a known actual size and shape of the camera markers. This may also be done by comparing the apparent size and shape of AR markers displayed on the camera markers with a known actual size and shape of those AR markers. The actual size and shape of camera markers and of AR markers displayed thereon may be made available to each camera marker (and/or common position server) in advance, e.g., during manufacture or configuration, or that information may be shared (e.g., over a local network) during the setup process. The distance estimation process takes into consideration the optics and resolution of the camera marker. In some embodiments, each AR marker (whether printed or displayed on a camera marker) may convey information identifying its own actual physical dimensions, allowing camera markers to estimate the distance of the AR marker based on a comparison between the actual and apparent physical dimensions of that AR marker.
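Under a simple pinhole-camera assumption, the apparent-size comparison described above reduces to the following estimate; the marker size, pixel width, and focal length used here are illustrative values.

```python
# Pinhole-model distance estimate from apparent size:
# distance ≈ focal_length_px * actual_size / apparent_size_px.
def estimate_distance(actual_size_m: float, apparent_size_px: float,
                      focal_length_px: float) -> float:
    return focal_length_px * actual_size_m / apparent_size_px

# A 0.15 m wide displayed AR marker appearing 120 px wide through a camera with an
# 800 px focal length would be roughly 1 m away:
d = estimate_distance(0.15, 120.0, 800.0)   # ≈ 1.0 m
```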
[0060] In embodiments in which the camera marker is provided with a stereo camera, depth camera or other depth-sensing technology, the distance of the other camera markers may be determined based at least in part on the corresponding depth measurement. In some embodiments, depth can be measured using other techniques. For example, the camera markers may exchange audio signals, with the travel time of the audio signals being indicative of the distance between markers.
[0061] In some embodiments, the image taken by each camera is further processed to determine the apparent angle of orientation of the other markers. In some embodiments, each camera marker is equipped with one or more accelerometers, and the camera marker is operative to process readings from the accelerometer or accelerometers to determine one or more angles of that camera marker with respect to the vertical (e.g., "pitch," "roll," and "yaw" angles).

[0062] In the exemplary setup process, the angle and distance measurements performed as described above provide sufficient information to determine the location and orientation of all of the camera markers. These calculations can be performed using well-known principles of trigonometry.
[0063] In some embodiments, the locations and orientations of visual features other than camera markers may be used in the setup process. For example, as mentioned above, one or more printed markers may be employed, such as a printed marker placed on the floor.
[0064] An exemplary method is illustrated in FIG. 9. A camera marker operates to obtain a wide-angle image of a location in which it is situated (905), with the wide-angle image including one or more other camera markers in the field of view of the image. The camera marker operates to detect other camera markers (or printed markers) in the image (910). The camera marker then operates based on the shape and size of the camera markers in the image to determine marker coordinates in the local coordinate system (915). In an exemplary embodiment, the camera marker determines from the camera image the coordinates in the local coordinate system corresponding to the six degrees of freedom of the marker, e.g., x, y, z for marker position, and φ, θ, ψ (or α, β, γ) for marker orientation, the coordinates representing the pose of the marker. This determination may be made based on inputs such as a known actual size of the visible camera marker (or of the AR marker displayed by that camera marker), information concerning distortion introduced by wide-angle imaging optics, pixel resolution, focal length, and the like.
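One way to recover these six degrees of freedom from a single image, assuming the marker's physical size and the camera intrinsics are known, is a perspective-n-point solution such as OpenCV's solvePnP; the corner coordinates, marker size, and intrinsics below are illustrative assumptions.

```python
# Recover a detected marker's 6-DoF pose in the observing camera's frame from one
# image, given the marker's known side length and the camera intrinsics.
import cv2
import numpy as np

side = 0.15                                   # assumed marker side length in meters
object_pts = np.array([[-side/2,  side/2, 0], [ side/2,  side/2, 0],
                       [ side/2, -side/2, 0], [-side/2, -side/2, 0]], dtype=np.float32)
image_pts = np.array([[410, 220], [530, 225], [525, 345], [405, 340]], dtype=np.float32)
K = np.array([[800, 0, 640], [0, 800, 360], [0, 0, 1]], dtype=np.float32)  # intrinsics
dist = np.zeros(5)                            # assume lens distortion already corrected

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist)
R, _ = cv2.Rodrigues(rvec)   # 3x3 rotation of the marker in the camera's frame
# tvec gives the marker position (x, y, z) in the observing camera marker's frame.
```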
[0065] In the illustrated example, the camera marker provides the coordinates of the other camera markers (and possibly printed fiducial markers) in the local coordinate system to a common position server (920), along with identifiers of the markers (which may be identifiers encoded in the markers themselves using, for example, a QR code or similar technology). In some embodiments, the common position server is implemented in the camera marker itself, as in the embodiment of FIG. 9. In other embodiments, the common position server is implemented in a different camera marker or in a separate network node.
[0066] The common position server receives information regarding marker positions expressed in one or more different coordinate systems. For example, the common position server may receive from one camera marker an indication that a marker labeled M1 is at coordinates (x, y, z, φ, θ, ψ) in one local coordinate system and that the same marker M1 is at coordinates (x', y', z', φ', θ', ψ') in a different local coordinate system measured with respect to a different camera marker. Similar sets of coordinates may also be received representing the positions of different markers as measured in different coordinate systems. The position server defines a common coordinate system and transforms the coordinates of the markers into the common coordinate system (925).
[0067] Various techniques may be used to transform the coordinates of the markers into the common coordinate system. As one example, one of the local coordinate systems is defined as the common coordinate system. A transformation is then found between the common coordinate system and the local coordinate systems. The transformation may be found by testing a plurality of different transformations that result in the alignment of the locations of different camera markers. A best alignment may be selected as, for example, an alignment that minimizes the sum of least squares of distances between representations of the different camera markers in different coordinate systems.
[0068] For example, consider a system with three camera markers M1, M2, and M3. Suppose that the local coordinate system of camera marker M1 is selected as the common coordinate system, and suppose that camera markers M1 and M2 are in one another's fields of view, while camera marker M3 is only in the field of view of M2. The system of camera markers cooperates to determine the coordinates of camera marker M3 in the common coordinate system.
[0069] The exemplary system operates to determine a transformation that transforms the coordinate system of M2 to the local coordinate system of M1. This transformation may take the form, for example, of a vector offset combined with a rotation. For example, an arbitrary location in the coordinate system of M2 may be expressed by a vector X2. The same location in the coordinate system of M1 may be expressed by the vector X1, where X1 = T1,2 + R1,2 X2, with T1,2 being a three-dimensional vector and R1,2 being a three-dimensional rotation matrix.
[0070] The transformation parameters (vector T1,2 and matrix R1,2) may be determined with the use of a search through a search space to minimize the sum of square distances between different representations of the same camera marker. For example, suppose additional markers MA and MB have positions represented, respectively, by vectors A1 and B1 in the coordinate system of M1 as measured by M1. Those same markers have positions represented, respectively, by vectors A2 and B2 in the coordinate system of M2 as measured by M2. The positions as measured by M2 can thus be expressed in the coordinate system of M1 as follows:

A1' = T1,2 + R1,2 A2

B1' = T1,2 + R1,2 B2

Ideally, the transformation parameters T1,2 and R1,2 are selected such that A1' = A1 and such that B1' = B1. However, due to inaccuracies and other variations in position determination, it may be desirable to select the transformation parameters T1,2 and R1,2 so as to minimize the sum S, where S is given by the equation:

S = |A1' - A1|² + |B1' - B1|²

It should be apparent that the above equation can readily be generalized to include components that represent additional camera marker positions.
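Rather than an explicit search, transformation parameters that minimize S can also be obtained in closed form with the standard SVD-based (Kabsch) solution; the sketch below, with illustrative marker positions, is one such least-squares estimate and is not the only approach contemplated above.

```python
# Least-squares estimate of T1,2 and R1,2 from corresponding marker positions
# measured in both coordinate systems (Kabsch algorithm).
import numpy as np

def fit_rigid_transform(points_m2: np.ndarray, points_m1: np.ndarray):
    """Find R, T such that points_m1 ≈ T + R @ points_m2 (rows are 3D points)."""
    c2, c1 = points_m2.mean(axis=0), points_m1.mean(axis=0)
    H = (points_m2 - c2).T @ (points_m1 - c1)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    T = c1 - R @ c2
    return R, T

# Marker positions (e.g., MA, MB, and one more) as measured by M1 and by M2:
P1 = np.array([[0.0, 0.0, 2.0], [1.0, 0.5, 2.5], [2.0, 0.0, 1.0]])
P2 = np.array([[1.2, 0.1, 1.8], [2.3, 0.6, 2.1], [3.0, 0.2, 0.4]])
R12, T12 = fit_rigid_transform(P2, P1)
S = np.sum(np.linalg.norm((P2 @ R12.T + T12) - P1, axis=1) ** 2)   # residual sum
```

The same R12 and T12 can then map a position measured only by M2, such as the position of M3 discussed below, into the common coordinate system.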
[0071] In some embodiments, the transformation parameters may include one or more scaling factors. Such a transformation may be expressed as X1 = T1,2 + R1,2 αX2, for example, where α is a scaling factor. In some embodiments, α is a scalar. In other embodiments, α is a vector or tensor value. In some embodiments, R1,2 is not a unitary rotation matrix but rather an arbitrary matrix, the components of which are adjusted using, e.g., a search technique to minimize the sum S.
[0072] Once a desirable set of transformation parameters has been determined, it is possible to determine the location of the marker M3 in the common coordinate system defined by M1, even though M3 is not in the field of view of M1. Suppose that the position of M3 is represented by a vector XM3 in the local coordinate system of M2; then the coordinates of M3 in the common coordinate system can be determined to be XM3', where

XM3' = T1,2 + R1,2 XM3.
[0073] In embodiments in which the position of a single camera marker is measured by more than one other camera marker, that single camera marker may have more than one associated set of location coordinates in the common coordinate system. In such a situation, a least squares technique may be employed to determine a best fit position in the common coordinate system. The least squares technique may be weighted to accommodate the reliability of different measurements. For example, a position measurement from a nearby camera marker may be weighted more heavily than a position measurement from a more distant camera marker. A reliability measure may also be associated with different coordinate transforms, with transforms based on a greater number of marker positions being considered relatively more reliable, and transforms that result in a very low sum S being considered relatively more reliable. In such an embodiment, a position measurement that results from more reliable transforms is itself considered more reliable and thus is weighted more heavily in determining a best fit position.
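A sketch of such a weighted best-fit combination is given below, using inverse-square distance as one possible reliability weight among those discussed above; the numeric values are illustrative.

```python
# Weighted fusion of several estimates of the same marker's position in the common
# coordinate system, with nearer observers weighted more heavily.
import numpy as np

estimates = np.array([[2.01, 0.98, 3.02],      # estimate from a nearby camera marker
                      [2.10, 1.05, 2.95]])     # estimate from a more distant camera marker
observer_distances = np.array([1.5, 4.0])      # meters, illustrative values

weights = 1.0 / observer_distances ** 2
best_fit = (weights[:, None] * estimates).sum(axis=0) / weights.sum()
```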
[0074] For the sake of clarity, the example given above makes reference only to transformation of position vectors. It should be understood that information on the orientation of different camera markers (e.g., Euler angles) can likewise be transformed among different coordinate systems in order to represent the orientations of the camera markers with respect to a common coordinate system.
[0075] With reference to FIG. 9, after coordinates of the camera markers (and possibly other markers, such as printed markers) have been determined in a common coordinate system, those coordinates are communicated to an augmented reality (AR) rendering system (930). In some embodiments, the AR rendering system includes an AR device (such as a headset, tablet, or other device) with a camera. The AR device takes an image of the environment (935) and operates to locate AR markers (such as camera markers) in the image (940). Based on the location of the AR markers within the image, and based further on the coordinates of those markers within the common coordinate system, the AR system determines the position and orientation of the AR device within the common coordinate system (945). The AR system further operates to render AR content based on the determined position and orientation of the AR device (950). It should be understood that other components of the AR device, such as accelerometers and gyroscopic sensors, can be used to assist with the tracking of the AR device. In some embodiments, one or more printed markers are displayed on the AR device to facilitate tracking of the AR device by camera markers.
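As an illustrative sketch of step 945, the viewing device's pose in the common coordinate system can be composed from the marker's pose in that system (supplied by the position server) and the marker's pose relative to the device camera (from marker detection); the 4x4 homogeneous transforms and the values below are assumptions made for clarity.

```python
# Compose the AR device's pose in the common (world) frame from the marker's world
# pose and the marker's pose in the device camera frame.
import numpy as np

def make_T(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

T_world_marker = make_T(np.eye(3), np.array([2.0, 1.5, 0.0]))    # from the position server
T_camera_marker = make_T(np.eye(3), np.array([0.0, 0.0, 1.2]))   # from marker detection

# Camera (AR device) pose in the common coordinate system:
T_world_camera = T_world_marker @ np.linalg.inv(T_camera_marker)
```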
[0076] In some embodiments, the functions of the described camera marker are performed using a general-purpose consumer tablet computer. A tablet computer readily provides versions of the needed components, such as a display, a camera (though typically not with wide-angle optics), and wired and wireless network connections. In some embodiments, a camera marker is implemented using dedicated software running on the tablet device. In some embodiments, the camera marker is implemented using a special-purpose version of a tablet computer. The special-purpose version of the tablet computer may, for example, have reduced memory, lower screen resolution (possibly greyscale only), and wide-angle optics, and may be pre-loaded with appropriate software to enable camera marker functionality. In some embodiments, inessential functionality such as GPS, magnetometer, and audio functions may be omitted from the special-purpose tablet computer.
[0077] Exemplary embodiments disclosed herein are implemented using one or more wired and/or wireless network nodes, such as a wireless transmit/receive unit (WTRU) or other network entity.
[0078] FIG. 6 is a system diagram of an exemplary WTRU 102, which may be employed as a user device in embodiments described herein. As shown in FIG. 6, the WTRU 102 may include a processor 118, a communication interface 119 including a transceiver 120, a transmit/receive element 122, a speaker/microphone 124, a keypad 126, a display/touchpad 128, a non-removable memory 130, a removable memory 132, a power source 134, a global positioning system (GPS) chipset 136, and sensors 138. It will be appreciated that the WTRU 102 may include any subcombination of the foregoing elements while remaining consistent with an embodiment.
[0079] The processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 102 to operate in a wireless environment. The processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. 6 depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip.
[0080] The transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station over the air interface 115/116/117. For example, in one embodiment, the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals. In another embodiment, the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, as examples. In yet another embodiment, the transmit/receive element 122 may be configured to transmit and receive both RF and light signals. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wireless signals.
[0081] In addition, although the transmit/receive element 122 is depicted in FIG. 6 as a single element, the WTRU 102 may include any number of transmit/receive elements 122. More specifically, the WTRU 102 may employ MIMO technology. Thus, in one embodiment, the WTRU 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 115/116/117.
[0082] The transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122. As noted above, the WTRU 102 may have multi-mode capabilities. Thus, the transceiver 120 may include multiple transceivers for enabling the WTRU 102 to communicate via multiple RATs, such as UTRA and IEEE 802.11, as examples.
[0083] The processor 118 of the WTRU 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128. In addition, the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132. The non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 118 may access information from, and store data in, memory that is not physically located on the WTRU 102, such as on a server or a home computer (not shown).
[0084] The processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the WTRU 102. The power source 134 may be any suitable device for powering the WTRU 102. As examples, the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), and the like), solar cells, fuel cells, and the like.
[0085] The processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 102. In addition to, or in lieu of, the information from the GPS chipset 136, the WTRU 102 may receive location information over the air interface 115/116/117 from a base station and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
[0086] The processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals 138 may include sensors such as an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.
[0087] FIG. 7 depicts an exemplary network entity 190 that may be used in embodiments of the present disclosure, for example as a common server used for the setup of one or more camera markers. As depicted in FIG. 7, network entity 190 includes a communication interface 192, a processor 194, and non-transitory data storage 196, all of which are communicatively linked by a bus, network, or other communication path 198.
[0088] Communication interface 192 may include one or more wired communication interfaces and/or one or more wireless-communication interfaces. With respect to wired communication, communication interface 192 may include one or more interfaces such as Ethernet interfaces, as an example. With respect to wireless communication, communication interface 192 may include components such as one or more antennae, one or more transceivers/chipsets designed and configured for one or more types of wireless (e.g., LTE) communication, and/or any other components deemed suitable by those of skill in the relevant art. And further with respect to wireless communication, communication interface 192 may be equipped at a scale and with a configuration appropriate for acting on the network side (as opposed to the client side) of wireless communications (e.g., LTE communications, Wi-Fi communications, and the like). Thus, communication interface 192 may include the appropriate equipment and circuitry (perhaps including multiple transceivers) for serving multiple mobile stations, UEs, or other access terminals in a coverage area.
[0089] Processor 194 may include one or more processors of any type deemed suitable by those of skill in the relevant art, some examples including a general-purpose microprocessor and a dedicated DSP.
[0090] Data storage 196 may take the form of any non-transitory computer-readable medium or combination of such media, some examples including flash memory, read-only memory (ROM), and random-access memory (RAM) to name but a few, as any one or more types of non-transitory data storage deemed suitable by those of skill in the relevant art could be used. As depicted in FIG. 7, data storage 196 contains program instructions 197 executable by processor 194 for carrying out various combinations of the various network-entity functions described herein.
[0091] FIG. 8 illustrates examples of patterns that can be displayed on the display of a camera marker for use as an AR marker, without limitation.
[0092] FIG. 10 illustrates a functional architecture of a camera marker 1001 in accordance with an embodiment. The camera marker 1001 may operate various modules. A camera module 1005 may operate within the camera marker 1001. A marker display module 1010 may operate within the camera marker 1001, to display the AR marker. A coordinate conversion module 1015 may operate within the camera marker 1001, to determine the coordinates, relative to the camera marker, of other markers detected by image capture. In some embodiments, a position server module 1020 may operate within the camera marker 1001. The position server module 1020 may include a shared coordinate conversion module 1022, which may convert the local coordinates of detected markers into a shared coordinate system. There may also be a marker transform/scale module 1040, which may scale and/or transform a displayed marker in relation to an AR unit.
[0093] The camera module 1005, marker display module 1010, coordinate conversion module 1015, position server module 1020, and marker transform/scale module 1040, as well as other modules, may communicate with a memory 1030. The memory may include rules for marker transform/scale 1032, captured images 1034, local coordinates 1036, other camera locations 1038, and/or the like.
[0094] At various times, the camera marker may have communications incoming from or outgoing to an AR unit 1003.
[0095] In one embodiment, there is a method comprising: providing a plurality of camera markers; and operating the plurality of camera markers to perform self-calibration. In some embodiments, the self-calibration includes determination of a shared coordinate system. In some embodiments, the method further comprises rendering augmented content to a user using the shared coordinate system. In some embodiments, the self-calibration includes determination of a location of the camera marker in the shared coordinate system. In some embodiments, the self-calibration includes determination of an orientation of the camera marker in the shared coordinate system.
[0096] In one embodiment, there is a method comprising: operating a camera marker to display an image of at least a first augmented reality marker; operating the camera marker to capture an image of at least a second augmented reality marker; based on the image, determining a pose of the second augmented reality marker with respect to the camera marker; and providing the pose to a position server. In some embodiments, the method further comprises operating a second camera marker to capture the image of the first augmented reality marker. In some embodiments, the method further comprises operating the second camera marker to display the second augmented reality marker. In some embodiments, the second augmented reality marker is a second camera marker. In some embodiments, the method further comprises detecting an image of a user by the camera marker and determining a pose of the user based on the image of the user. In some embodiments, the position server is implemented in the camera marker. In some embodiments, the position server is implemented in a separate camera marker. In some embodiments, the position server operates to define a shared coordinate system. In some embodiments, the method further comprises rendering augmented reality content using the shared coordinate system. In some embodiments, the rendering of the augmented reality content includes providing sound from a speaker of the camera marker. In some embodiments, the method further comprises controlling the camera marker to modify the first augmented reality marker, the modification being selected from the group consisting of changing, scaling and turning the augmented reality marker. In some embodiments, the method further comprises changing the rendering of augmented content in response to modification of the augmented reality marker. In some embodiments, the controlling is provided by remote control. In some embodiments, the remote control is provided over an internet protocol (IP) network. In some embodiments, the remote control is provided using a protocol selected from the group consisting of WLAN, Bluetooth, and an Infrared link. In some embodiments, the method further comprises: determining a pose of an augmented reality viewing device using at least the first augmented reality marker; and rendering augmented reality content on the augmented reality viewing device using the determined pose. In some embodiments, the method further comprises determining a pose of an augmented reality viewing device using at least the first augmented reality marker. In some embodiments, the viewing device is selected from the group consisting of a camera phone, a tablet computer, and a virtual reality headset. In some embodiments, the method further comprises determining a position of an augmented reality viewing device using at least the first augmented reality marker and the second augmented reality marker. In some embodiments, the second augmented reality marker is displayed on a second camera marker. In some embodiments, the position server operates to define a shared coordinate system and to determine a position of the camera marker in the shared coordinate system. In some embodiments, the augmented reality marker is a printed fiducial marker used to identify a surface level. In some embodiments, the augmented reality marker is a printed fiducial marker. In some embodiments, the camera marker displays information advertising the available augmented content. In some embodiments, capturing an image includes capturing a depth image.
[0097] In one embodiment, there is a method of providing a virtual mirror, the method comprising: obtaining an image from a camera of a camera marker; processing the image to emulate a reflected image; and rendering the processed image on an augmented reality display at a position determined at least in part by an augmented reality marker displayed by the camera marker. In some embodiments, the processed image is rendered substantially at the position of the augmented reality marker.
[0098] In one embodiment, there is a camera marker comprising: a wide-angle camera on a front face of the camera marker; and a display on the front face of the camera marker. In some embodiments, the camera marker further comprises logic in communication with the wide-angle camera to determine a relative location of at least one other camera marker. In some embodiments, the camera marker is operative to display an augmented reality (AR) marker on the display. In some embodiments, the wide-angle camera is an electronic pan-tilt-zoom camera. In some embodiments, the wide-angle camera is a depth camera. In some embodiments, the camera marker is implemented in a digital photo frame.
[0099] In one embodiment, there is a camera marker system comprising: a first camera marker including a first display and a first front-facing camera; a second camera marker including a second display and a second front-facing camera; wherein the first camera marker is positioned in a field of view of the second front-facing camera, and wherein the second camera marker is positioned in a field of view of the first front-facing camera. In one embodiment, the camera marker system further comprises a common position server. In one embodiment, the camera marker system further comprises an augmented reality system.
[0100] In one embodiment, there is a method of defining a coordinate system comprising: operating a first camera marker to determine a pose of a second camera marker in a first local coordinate system; operating the second camera marker to determine a pose of the first camera marker in a second local coordinate system; and determining a transformation between the first local coordinate system and the second local coordinate system. In one embodiment, the method further comprises defining a global coordinate system. In one embodiment, the method further comprises determining a transformation between the first local coordinate system and the global coordinate system. In one embodiment, the method further comprises determining a transformation between the second local coordinate system and the global coordinate system. In one embodiment, the method further comprises determining a pose of the first camera marker in the global coordinate system. In one embodiment, the method further comprises determining a pose of the second camera marker in the global coordinate system. In one embodiment, the method further comprises rendering augmented reality content using the global coordinate system. In one embodiment, the method further comprises rendering the augmented reality content using an augmented reality viewer. In one embodiment, the augmented reality viewer is a head-mounted display. In one embodiment, the augmented reality viewer is a tablet computer. In one embodiment, the augmented reality viewer is a smartphone. In one embodiment, the augmented reality viewer is a wearable device.
[0101] Note that various hardware elements of one or more of the described embodiments are referred to as "modules" that carry out (i.e., perform, execute, and the like) various functions that are described herein in connection with the respective modules. As used herein, a module includes hardware (e.g., one or more processors, one or more microprocessors, one or more microcontrollers, one or more microchips, one or more application-specific integrated circuits
(ASICs), one or more field programmable gate arrays (FPGAs), one or more memory devices) deemed suitable by those of skill in the relevant art for a given implementation. Each described module may also include instructions executable for carrying out the one or more functions described as being carried out by the respective module, and it is noted that those instructions could take the form of or include hardware (i.e., hardwired) instructions, firmware instructions, software instructions, and/or the like, and may be stored in any suitable non-transitory computer- readable medium or media, such as commonly referred to as RAM, ROM, etc.
[0102] Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable storage media include, but are not limited to, a read-only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.
REFERENCES
[1] Domagoj Baricevic, Cha Lee, Matthew Turk, Tobias Hollerer, Doug A. Bowman (2012), "A Hand-Held AR Magic Lens with User-Perspective Rendering", ISMAR 2012, 10 p.
[2] Andrew Maimone and Henry Fuchs (2012), "Real-Time Volumetric 3D Capture of Room-Sized Scenes for Telepresence", 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, 3DTV-CON 2012, Zurich, Switzerland, October 15-17, 2012.
[3] Siltanen S., Hakkarainen M., Honkamaa P. (2007), "Automatic marker field calibration", in Proc. Virtual Reality International Conference (VRIC), Laval, France, April 2007, pp. 261-267.
[4] US2013/0109961 Al (2013), Raja Bose, Jonathan Lester, Jorg Brakensiek, "APPARATUS AND METHOD FOR PROVIDING DYNAMIC FIDUCIAL MARKERS FOR DEVICES", US Patent Application Publication, Nokia Corporation, May 2, 2013.
[5] Marcel Bruckner and Joachim Denzler (2010), "Active Self-calibration of Multi-camera Systems", M. Goesele et al. (Eds.): DAGM 2010, LNCS 6376, Springer-Verlag Berlin Heidelberg 2010, pp. 31-40.

Claims

1. A method comprising:
operating a camera marker to display an image of at least a first augmented reality marker; operating the camera marker to capture an image of at least a second augmented reality marker;
based on the image, determining a pose of the second augmented reality marker with respect to the camera marker; and
providing the pose to a position server.
2. The method of claim 1, further comprising operating a second camera marker to capture the image of the first augmented reality marker.
3. The method of claim 2, further comprising operating the second camera marker to display the second augmented reality marker.
4. The method of any of claims 2-3, wherein the second augmented reality marker is displayed on a second camera marker.
5. The method of any of claims 1-4, further comprising detecting an image of a user by the camera marker and determining a position of the user based on the image of the user.
6. The method of any of claims 1-5, wherein the position server operates to define a shared coordinate system.
7. The method of claim 6, further comprising rendering augmented reality content using the shared coordinate system.
8. The method of any of claims 1-5, wherein the position server operates to define a shared coordinate system and to determine a pose of the camera marker in the shared coordinate system.
9. The method of any of claims 1-8, further comprising controlling the camera marker to modify the first augmented reality marker, the modification being selected from the group consisting of changing, scaling and turning the augmented reality marker.
10. The method of claim 9, further comprising changing the rendering of augmented content in response to modification of the augmented reality marker.
11. The method of any of claims 1-10, further comprising:
determining a pose of an augmented reality viewing device using at least the first augmented reality marker; and
rendering augmented reality content on the augmented reality viewing device using the determined pose.
12. The method of any of claims 1-11, further comprising determining a pose of an augmented reality viewing device using at least the first augmented reality marker and the second augmented reality marker.
13. The method of claim 12, wherein the second augmented reality marker is displayed on a second camera marker.
14. A camera marker system comprising:
a first camera marker including a first display and a first front-facing camera;
a second camera marker including a second display and a second front-facing camera; wherein the first camera marker is positioned in a field of view of the second front-facing camera, and wherein the second camera marker is positioned in a field of view of the first front- facing camera;
a common position server; and
a processor and a non-transitory storage medium storing instructions operative, when executed on the processor, to perform functions including:
operating a camera marker to display an image of at least a first augmented reality marker;
operating the camera marker to capture an image of at least a second augmented reality marker; based on the image, determining a pose of the second augmented reality marker with respect to the camera marker; and
providing the pose to a position server.
15. The camera marker system of claim 14, further comprising an augmented reality system.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562202431P 2015-08-07 2015-08-07
US62/202,431 2015-08-07

Publications (1)

Publication Number Publication Date
WO2017027338A1 true WO2017027338A1 (en) 2017-02-16

Family

ID=56799552

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/045654 WO2017027338A1 (en) 2015-08-07 2016-08-04 Apparatus and method for supporting interactive augmented reality functionalities

Country Status (1)

Country Link
WO (1) WO2017027338A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8749396B2 (en) * 2011-08-25 2014-06-10 Sartorius Stedim Biotech GmbH Assembling method, monitoring method, communication method, augmented reality system and computer program product

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LEDERMANN F ET AL: "Dynamically shared optical tracking", AUGMENTED REALITY TOOLKIT, THE FIRST IEEE INTERNATIONAL WORKSHOP, SEP. 29, 2002, PISCATAWAY, NJ, USA, IEEE, 1 January 2002 (2002-01-01), pages 76 - 83, XP010620356, ISBN: 978-0-7803-7680-9 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108694730A (en) * 2017-04-06 2018-10-23 赫克斯冈技术中心 Near field manipulation of AR devices using image tracking
CN108694730B (en) * 2017-04-06 2022-06-24 赫克斯冈技术中心 Near field manipulation of AR devices using image tracking
US10403046B2 (en) * 2017-10-20 2019-09-03 Raytheon Company Field of view (FOV) and key code limited augmented reality to enforce data capture and transmission compliance
WO2019205850A1 (en) * 2018-04-27 2019-10-31 腾讯科技(深圳)有限公司 Pose determination method and device, intelligent apparatus, and storage medium
US11158083B2 (en) 2018-04-27 2021-10-26 Tencent Technology (Shenzhen) Company Limited Position and attitude determining method and apparatus, smart device, and storage medium
CN109087399A (en) * 2018-07-17 2018-12-25 上海游七网络科技有限公司 Method for rapidly synchronizing AR space coordinate system through positioning map
CN109087399B (en) * 2018-07-17 2024-03-01 上海游七网络科技有限公司 Method for rapidly synchronizing AR space coordinate system through positioning map
GB2584122A (en) * 2019-05-22 2020-11-25 Sony Interactive Entertainment Inc Data processing
GB2584122B (en) * 2019-05-22 2024-01-10 Sony Interactive Entertainment Inc Data processing
CN110365995A (en) * 2019-07-22 2019-10-22 视云融聚(广州)科技有限公司 Streaming media service method and system for merging augmented reality tags into video
WO2021110051A1 (en) 2019-12-05 2021-06-10 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and system for associating device coordinate systems in a multi‐person ar system
EP4058874A4 (en) * 2019-12-05 2023-05-03 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and system for associating device coordinate systems in a multi-person ar system
CN114730212A (en) * 2019-12-05 2022-07-08 Oppo广东移动通信有限公司 Method and system for associating device coordinate systems in a multi-person AR system
US11663736B2 (en) 2019-12-27 2023-05-30 Snap Inc. Marker-based shared augmented reality session creation
WO2021133942A1 (en) * 2019-12-27 2021-07-01 Snap Inc. Marker-based shared augmented reality session creation
WO2021175920A1 (en) * 2020-03-06 2021-09-10 Telefonaktiebolaget Lm Ericsson (Publ) Methods providing video conferencing with adjusted/modified video and related video conferencing nodes
CN111857341A (en) * 2020-06-10 2020-10-30 浙江商汤科技开发有限公司 Display control method and device
US11696011B2 (en) 2021-10-21 2023-07-04 Raytheon Company Predictive field-of-view (FOV) and cueing to enforce data capture and transmission compliance in real and near real time video
US11792499B2 (en) 2021-10-21 2023-10-17 Raytheon Company Time-delay to enforce data capture and transmission compliance in real and near real time video
US11700448B1 (en) 2022-04-29 2023-07-11 Raytheon Company Computer/human generation, validation and use of a ground truth map to enforce data capture and transmission compliance in real and near real time video of a local scene
CN115100276A (en) * 2022-05-10 2022-09-23 北京字跳网络技术有限公司 Method and device for processing picture image of virtual reality equipment and electronic equipment
CN115100276B (en) * 2022-05-10 2024-01-19 北京字跳网络技术有限公司 Method and device for processing picture image of virtual reality equipment and electronic equipment

Similar Documents

Publication Publication Date Title
WO2017027338A1 (en) Apparatus and method for supporting interactive augmented reality functionalities
US11488364B2 (en) Apparatus and method for supporting interactive augmented reality functionalities
US11798190B2 (en) Position and pose determining method, apparatus, smart device, and storage medium
US11195049B2 (en) Electronic device localization based on imagery
CN106920279B (en) Three-dimensional map construction method and device
US9699375B2 (en) Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system
US9558559B2 (en) Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system
JP5260705B2 (en) 3D augmented reality provider
CN110249291A (en) System and method for the augmented reality content delivery in pre-capture environment
KR102398478B1 (en) Feature data management for environment mapping on electronic devices
WO2017177019A1 (en) System and method for supporting synchronous and asynchronous augmented reality functionalities
TW201920985A (en) Apparatus and method for generating a representation of a scene
CN104169965A (en) Systems, methods, and computer program products for runtime adjustment of image warping parameters in a multi-camera system
WO2012041208A1 (en) Device and method for information processing
CN113936085B (en) Three-dimensional reconstruction method and device
KR102197615B1 (en) Method of providing augmented reality service and server for the providing augmented reality service
WO2018061172A1 (en) Imaging angle adjustment system, imaging angle adjustment method and program
JP2016194783A (en) Image management system, communication terminal, communication system, image management method, and program
CN111385481A (en) Image processing method and device, electronic device and storage medium
KR20180111224A (en) Terminal and method for controlling the same
KR20180041430A (en) Mobile terminal and operating method thereof
WO2021259287A1 (en) Depth map generation method, and device and storage medium
CN115830280A (en) Data processing method and device, electronic equipment and storage medium
JP7225016B2 (en) AR Spatial Image Projection System, AR Spatial Image Projection Method, and User Terminal
CN110443841B (en) Method, device and system for measuring ground depth

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16756846

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16756846

Country of ref document: EP

Kind code of ref document: A1