CN113039508A - Evaluating alignment of inputs and outputs of a virtual environment - Google Patents

Evaluating alignment of inputs and outputs of a virtual environment

Info

Publication number
CN113039508A
Authority
CN
China
Prior art keywords
user
representation
data
eye
communication session
Prior art date
Legal status
Pending
Application number
CN201980066268.6A
Other languages
Chinese (zh)
Inventor
M·吉布森
J·切希尔
J-L·R·布特米
B·道格拉斯
Current Assignee
Pluto Virtual Reality Technology Co
Original Assignee
Pluto Virtual Reality Technology Co
Priority date
Filing date
Publication date
Priority claimed from US16/156,738 external-priority patent/US10838488B2/en
Priority claimed from US16/156,776 external-priority patent/US10678323B2/en
Priority claimed from US16/156,818 external-priority patent/US10516853B1/en
Application filed by Pluto Virtual Reality Technology Co filed Critical Pluto Virtual Reality Technology Co
Publication of CN113039508A publication Critical patent/CN113039508A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/0093Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00 with means for monitoring data relating to the user, e.g. head-tracking, eye-tracking
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01Head-up displays
    • G02B27/017Head mounted
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • H04N7/157Conference systems defining a virtual conference space and using avatars or agents

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Optics & Photonics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Techniques and architectures are described herein for establishing and/or evaluating communication sessions that enable users from the same physical environment, different physical environments, or a combination to interact in a virtual coordinate system and perceive each other as present. A representation of a user may be aligned within a coordinate system while maintaining spatial alignment of the user in a physical environment and/or maintaining spatial alignment of the representation in another coordinate system. The representation of the user may be output to another user in alignment with the user's input. A human model may be created for a user and used to provide a representation of the user that maps to a human. A representation of a user may be evaluated to determine whether the representation is properly aligned with the user and/or coordinate system.

Description

Evaluating alignment of inputs and outputs of a virtual environment
Cross Reference to Related Applications
This application is a PCT application claiming priority to commonly owned U.S. Patent Application No. 16/156,738, entitled "EVALUATING ALIGNMENT OF INPUTS AND OUTPUTS FOR VIRTUAL ENVIRONMENTS," filed on October 10, 2018, U.S. Patent Application No. 16/156,776, entitled "REFERENCE FRAMES FOR VIRTUAL ENVIRONMENTS," filed on October 10, 2018, and U.S. Patent Application No. 16/156,818, entitled "ALIGNING VIRTUAL REPRESENTATIONS TO INPUTS AND OUTPUTS," filed on October 10, 2018, all of which are hereby incorporated by reference in their entirety.
Background
Many systems enable users to connect over a network. For example, Virtual Reality (VR) systems allow a user to control an avatar or other virtual representation in a virtual environment. In particular, a first user located at a first location may use a VR headset or other device to interact with a second user located at a second location. In another example, an Augmented Reality (AR) system allows a user to experience the physical world with augmented content. In particular, a first user and a second user at the same location may view a real-time image of the physical world with virtual content superimposed on the real-time image. In yet another example, users may communicate through a video conferencing system by viewing real-time images of each other. While these systems enable users to interact with each other in a virtual manner, they suffer from various frustrations. For example, VR systems typically place users within their own virtual environments. This often results in users located in the same physical environment bumping into or otherwise contacting each other as they move within the physical environment. Furthermore, AR systems typically allow users in the same physical environment to interact, but lack support for remote users. In addition, these systems provide a relatively high-level form of interaction and do not mimic the way in which humans actually communicate. For example, a video conferencing system requires a user to look directly at a camera in order to appear to another user to be making eye contact. Furthermore, because different types of systems and/or systems having multiple components are used to connect users over a network, the systems are prone to losing one or more aspects of the interaction.
Drawings
The detailed description is set forth with reference to the accompanying drawings. In the figures, the left-most digit or digits of a reference number identify the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items or features.
Fig. 1 illustrates an example architecture in which techniques described herein may be implemented.
FIG. 2A illustrates a user within a physical environment.
FIG. 2B illustrates a representation of a user within a virtual environment.
FIG. 3 illustrates an example of representing users and representations from various environments in a common virtual environment.
FIG. 4 illustrates an example of outputting a first representation of a first user to a second user in alignment with an input of the first user.
Fig. 5 illustrates an example of evaluating a representation and/or a user to ensure that a communication session has a shared presence.
Fig. 6 illustrates an example apparatus that may perform the techniques discussed herein.
Fig. 7 illustrates an example service provider that may perform the techniques discussed herein.
FIG. 8 illustrates an example process of generating composite spatial data to maintain spatial alignment of a user in a physical environment and/or spatial alignment of representations in a coordinate system.
Fig. 9 illustrates an example process of evaluating eye alignment of users and/or representations participating in a communication session.
Fig. 10 illustrates an example process of evaluating communication sessions to determine whether users or representations are gazing at each other.
Fig. 11 illustrates an example process of evaluating an output of a representation as part of a communication session.
FIG. 12 illustrates an example process of evaluating sounds associated with a representation as part of a communication session.
FIG. 13 illustrates an example process of evaluating touches associated with a representation as part of a communication session.
FIG. 14 illustrates an example process of creating a human model and using the human model to evaluate user interactions.
FIG. 15 illustrates an example process of causing a representation to be displayed with a representation of eyes gazed in a direction corresponding to a direction in which a user is gazing in a physical environment.
FIG. 16 illustrates an example process of causing a representation to be displayed with a representation of eyes gazed in a direction corresponding to a direction in which a user is gazing in a physical environment.
Detailed Description
This disclosure describes, in part, techniques and architectures for establishing communication sessions that enable users from the same physical environment, different physical environments, or a combination to interact in a virtual coordinate system and perceive each other as present (also referred to as "shared presence"). In some examples, a representation of a user may be aligned within a coordinate system (e.g., a virtual environment) while maintaining spatial alignment of the user in a physical environment and/or maintaining spatial alignment of the representation in another coordinate system. To illustrate, composite spatial data may be used to represent two users from a first physical environment in a virtual environment and to represent users from another physical environment in the virtual environment while maintaining the positioning of the two users relative to each other. Further, in some examples, a representation of a user may be output to another user in alignment with the user's input. To illustrate, while the first user is looking at the eyes of the representation of the second user displayed to the first user, the representation of the first user may be displayed to the second user as looking at the second user's eyes. Further, in some examples, a human model may be created for a user and used to provide a representation of the user that maps to an actual human. To illustrate, the distance between the eyes of the representation of the user may correspond to the distance between the actual eyes of the user.
This disclosure also describes, in part, techniques and architectures for evaluating representations and/or users to ensure that a communication session maintains a shared presence. In some examples, a representation of a user may be evaluated during a communication session to determine whether the representation is properly aligned with the user and/or a coordinate system. To illustrate, a direction in which the first user perceives the representation of the second user to be gazing (as displayed by an apparatus associated with the first user) may be compared to a direction in which the apparatus of the second user indicates the representation of the second user is gazing. If the directions are the same within a threshold amount, the communication session has a shared presence. If the directions are not the same within a threshold amount, the communication session has lost shared presence.
As described above, in some examples, a representation of a user may be aligned within a coordinate system while maintaining spatial alignment of the user in a physical environment and/or maintaining spatial alignment of the representation in another coordinate system. For example, assume that a first device associated with a first user, a second device associated with a second user, and a third device associated with a third user are engaged in a communication session. In addition, it is assumed that the first device and the second device are located in a first physical environment, and the third device is located in a second physical environment. A first user may be associated with the first representation, a second user may be associated with the second representation, and a third user may be associated with the third representation. Here, a computing device (e.g., a first device, a second device, a service provider, etc.) may identify first spatial data indicative of a first frame of reference for a first user and a second user, a location of the first user or first representation relative to the first frame of reference, and a location of the second user or second representation relative to the first frame of reference. The first frame of reference may be a common spatial anchor point in the first physical environment (or in some cases, a virtual point in the virtual environment). The computing device may also identify second spatial data that indicates a second frame of reference of a third user and a location of the third user or the third representation relative to the second frame of reference. The second frame of reference may be an anchor point and/or another virtual point in the second physical environment.
The computing device may then generate composite spatial data for the first user, the second user, and the third user, since they are part of the same communication session. The composite spatial data may include the first spatial data and the second spatial data. The composite spatial data may indicate a virtual point shared by the first spatial data and the second spatial data. The composite spatial data may indicate a location of the first user/representation, a location of the second user/representation, and/or a location of the third user/representation relative to the virtual point. In some examples, the composite spatial data maintains spatial data for multiple physical and/or virtual environments, so that users in those physical and/or virtual environments may maintain their existing spatial alignment. In some examples, the composite spatial data contains spatial data for each user or device that is part of the communication session.
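As a concrete illustration of how composite spatial data might be organized, the sketch below groups per-environment poses under a shared virtual point. This is only a minimal sketch under assumed names (Pose, SpatialData, CompositeSpatialData, the anchor identifiers); the patent does not specify a data format.

```python
# Minimal sketch (not the patent's implementation) of composite spatial data that
# merges per-environment spatial data under a shared virtual point.
from dataclasses import dataclass, field
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Pose:
    position: Vec3          # X, Y, Z relative to a frame of reference
    rotation: Vec3          # Xr, Yr, Zr (Euler angles) relative to that frame


@dataclass
class SpatialData:
    frame_id: str                                          # e.g., an anchor in a physical environment
    poses: Dict[str, Pose] = field(default_factory=dict)   # user/representation id -> pose


@dataclass
class CompositeSpatialData:
    virtual_point_id: str                                            # shared virtual point
    frame_offsets: Dict[str, Vec3] = field(default_factory=dict)     # frame id -> offset from virtual point
    spatial_data: Dict[str, SpatialData] = field(default_factory=dict)

    def position_relative_to_virtual_point(self, frame_id: str, entity_id: str) -> Vec3:
        """Chain a pose in a local frame through that frame's offset from the shared point."""
        offset = self.frame_offsets[frame_id]
        local = self.spatial_data[frame_id].poses[entity_id].position
        return tuple(o + p for o, p in zip(offset, local))


# Usage: two users anchored to one physical-environment frame, a third user to another frame.
composite = CompositeSpatialData(
    virtual_point_id="shared_point",
    frame_offsets={"env1_anchor": (0.0, 0.0, 0.0), "env2_anchor": (5.0, 0.0, 2.0)},
    spatial_data={
        "env1_anchor": SpatialData("env1_anchor", {
            "user1": Pose((1.0, 0.0, 1.0), (0.0, 90.0, 0.0)),
            "user2": Pose((2.5, 0.0, 1.0), (0.0, 270.0, 0.0)),
        }),
        "env2_anchor": SpatialData("env2_anchor", {
            "user3": Pose((0.5, 0.0, 0.5), (0.0, 0.0, 0.0)),
        }),
    },
)
print(composite.position_relative_to_virtual_point("env2_anchor", "user3"))  # (5.5, 0.0, 2.5)
```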
The computing device may use the composite spatial data to locate the first representation, the second representation, and/or the third representation. For example, the computing device may display, via the first device, the second representation for the second user within the coordinate system at a location based on the first spatial data included in the composite spatial data. This may maintain the spatial alignment of the first user and the second user in the first physical environment. For example, if the first user is looking at the second user in the first physical environment, the first representation will be represented in the coordinate system as looking at the second representation. Similarly, the computing device may display, via the first device, the third representation for the third user within the coordinate system at a location based on the second spatial data included in the composite spatial data. In some examples, the composite spatial data may assist in locating users associated with nested environments (e.g., allowing a first user in a physical environment and a virtual environment to communicate in a shared environment with a second user in both a physical environment and another virtual environment).
In some examples, a representation of a user may be output to another user in an aligned manner with the user. For example, assume that a first device associated with a first user and a second device associated with a second user are engaged in a communication session. The first user may be represented by a first representation and the second user may be represented by a second representation. In this example, the eye of the first representation may be aligned with the direction in which the eye of the first user is looking relative to the object or person displayed to the first user. Similarly, the eye of the second representation may be aligned with the direction in which the eye of the second user is looking relative to the object or person displayed to the second user.
To facilitate such alignment, a computing device (e.g., a first device, a second device, a service provider, etc.) may determine a first direction in which an eye of a first user is gazing relative to a second representation displayed by the first device. The computing device may also determine a position and/or orientation of the second user relative to the second device. The computing device may use such a determination to cause the first representation to be displayed, via the second device, with an eye representation gazed in a second direction relative to the second user that is aligned with the first direction in which the first user's eye is gazed relative to the second representation displayed by the first device. For example, if a first user gazes at an eye of a representation of a second user displayed by a first apparatus, the first representation may be displayed on the second apparatus to the second user as gazing at the eye of the second user. Similarly, the eye representations of the second representation may be aligned. This may avoid that the user has to look directly at the sensor (e.g. camera) to look like looking at the eye of another user.
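A simplified sketch of this redirection step is shown below: given what the first user is gazing at in their own view, the receiving device orients the first representation's eyes toward the corresponding real-world target. Function and parameter names (eye_direction_for_remote_display, gaze_target, etc.) are assumptions for illustration, not the patent's API.

```python
# Illustrative sketch only: orient a remote user's eye representation so that it
# appears to gaze at the same target the remote user is gazing at in their own view.
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    m = math.sqrt(sum(c * c for c in v)) or 1.0
    return (v[0] / m, v[1] / m, v[2] / m)


def eye_direction_for_remote_display(
    gaze_target: str,               # what the first user is gazing at, e.g. "peer_eyes"
    rep_eye_position: Vec3,         # first representation's eyes as placed by the second device
    local_user_eye_position: Vec3,  # second user's eyes, measured relative to the second device
) -> Vec3:
    """Direction the first representation's eyes should gaze when rendered on the second device."""
    if gaze_target == "peer_eyes":
        # The first user is looking at the second representation's eyes, so the first
        # representation should appear to look at the second user's actual eyes.
        return normalize(tuple(b - a for a, b in zip(rep_eye_position, local_user_eye_position)))
    # Otherwise, fall back to a forward-facing gaze (placeholder behavior).
    return (0.0, 0.0, 1.0)


direction = eye_direction_for_remote_display("peer_eyes", (0.0, 1.6, 2.0), (0.1, 1.5, 0.0))
print(direction)
```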
In some examples, a human model may be created for a user and used to provide a representation of the user that maps to an actual human. For example, the apparatus may capture data about the user, such as face data about the user's face, hand data about the user's hand, and so forth. The data may be used to form a human model representing the user. In some cases, the human model is user-specific. In other cases, various data may be collected over time to create a more general human model representing multiple users. In any case, the human model may be used to provide a representation of the user within the coordinate system. In particular, the representation for the user may contain features corresponding to features of the human model (e.g., containing similar sizes/shapes, similar positioning of features, etc.).
In some examples, the representation for the user may contain only features corresponding to the human model. To illustrate, if the device is only capable of capturing head and hand data of the user (e.g., because the user is using only a headset and a hand controller), the representation for the user may contain only the head and the hand. However, in other illustrations, the representation for the user may contain any number of features that may or may not correspond to a human model.
Additionally, in some cases, the representation of the user may be evaluated during the communication session to check whether the features of the representation accurately reflect the human model. To illustrate, if the represented hand is located at a first distance from the represented head, the first distance may be compared to a second distance between the hand on the human model and the head on the human model. If the first distance is greater than the second distance, this may indicate that the represented hand is no longer associated with the human model (e.g., the user has stopped using the hand controller and moved away from the hand controller). After this is recognized, the representation of the user may now be displayed without a hand.
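The distance check described above might look like the following sketch, where the tolerance value and names are illustrative assumptions rather than values from the patent.

```python
# Hedged sketch of the human-model plausibility check: is the represented hand still
# within the modeled head-to-hand reach?
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
ARM_REACH_TOLERANCE = 0.15  # meters of slack beyond the modeled reach (assumed value)


def hand_still_plausible(rep_head: Vec3, rep_hand: Vec3, model_head_to_hand: float) -> bool:
    """Return False if the represented hand is farther from the head than the human model allows."""
    return math.dist(rep_head, rep_hand) <= model_head_to_hand + ARM_REACH_TOLERANCE


# If the check fails (e.g., the user set the hand controller down and walked away),
# the hand can be dropped from the displayed representation.
show_hand = hand_still_plausible((0.0, 1.7, 0.0), (0.9, 1.1, 0.2), model_head_to_hand=0.8)
print(show_hand)  # False -> display the representation without the hand
```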
In some examples, the representation and/or the user may be evaluated to ensure that the communication session maintains a shared presence. For example, assume a first user associated with a first device and a second user associated with a second device are engaged in a communication session, where the first user is represented by a first representation and the second user is represented by a second representation. Here, the position or direction of the output of the second representation as perceived by the first user may be evaluated against the position or direction of the output of the second representation as perceived by the second user. A similar process may occur for the output of the first representation. For example, the first device may identify a first direction in which the eye of the second representation is gazing in the coordinate system displayed by the first device. The first direction may be the direction in which the first user perceives (via the first device) the second representation to be gazing. The second device may identify a second direction in the coordinate system in which the eye of the second representation is gazing. The second direction may be the direction in which the second user is gazing in the coordinate system via the second device.
The first direction and the second direction may be evaluated to determine whether the direction in which the first user perceives the second representation to be gazing is the same as the direction in which the second device indicates the second representation is gazing. For example, if the first user and the second user are gazing at each other (e.g., gazing into each other's eyes), the evaluation may check that (i) the first device indicates that the second representation is gazing at the first user, and (ii) the second device indicates that the second representation is gazing at the first representation (e.g., the second user is gazing at the eyes of the first representation). If the evaluation indicates that the eye of the second representation is misaligned, the first device, the second device, and/or an eye tracking sensor may be restarted; the communication session may switch to displaying the first representation and/or the second representation without eye data (e.g., without the misaligned eyes); and/or various other operations may be performed. By doing so, the techniques may maintain alignment of the user and/or the user's representation.
Although many examples are discussed in the context of aligning and/or evaluating eyes, other types of data may be evaluated. For example, the techniques may align and/or evaluate the direction of sound perceived from the mouth of the representation, the location of a touch on the perceived representation, and so on.
This brief introduction is provided for the reader's convenience and is not intended to limit the scope of the claims, nor the following sections. Furthermore, the techniques described in detail herein may be implemented in a number of ways and in a number of contexts. Some example implementations and contexts are provided with reference to the accompanying drawings, as described in more detail below. It should be appreciated, however, that the following implementations and contexts are only a few of many.
Fig. 1 illustrates an example architecture 100 in which techniques described herein may be implemented. Architecture 100 includes users 102, 104, and 106 associated with devices 108, 110, and 112, respectively. As shown, the user 102 is located in a physical environment 114 (also referred to as a real-world environment 114), while the users 104 and 106 are located in a physical environment 116 (also referred to as a real-world environment 116). The devices 108, 110, and 112 may provide output, such as by displaying content, outputting sound, vibration, etc., and/or may receive input, such as by the controllers 108(B), 110(B), and 112(B), to enable the users 102, 104, and 106 to interact in a coordinate system 114 (e.g., a virtual environment). Such interaction and/or coordinate system 114 may form part of a communication session for users 102, 104, and 106. Although illustrated in fig. 1 as a Virtual Reality (VR) headset and controller (e.g., a hand controller), as discussed in more detail with reference to fig. 6, each of the devices 108, 110, and 112 may be any type of computing device. In the example architecture 100, the user 102 is represented by a representation 118, the user 104 is represented by a representation 120, and the user 106 is represented by a representation 122 within the coordinate system 114. Via devices 108, 110, and 112, users 102, 104, and 106 see views 124, 126, and 128, respectively. For example, view 124 illustrates what device 108 is currently displaying to user 102. In some examples, the communication session is implemented in a point-to-point (P2P) context, where devices 108, 110, and 112 perform operations to facilitate the communication session. In other examples, the service provider 130 (sometimes referred to as a remote computing system) performs some or all of the operations to facilitate the communication session. In other examples, the communication session may be facilitated in other manners. The devices 108, 110, and 112 and/or the service provider 130 may communicate via one or more networks 132. The one or more networks 132 may comprise any one or combination of a number of different types of networks, such as a cellular network, a wireless network, a Local Area Network (LAN), a Wide Area Network (WAN), the internet, and so forth.
Although three users and three devices are shown in fig. 1, any number of users and/or devices may be implemented. Additionally, although one user 102 is located at the first physical environment 114 and two users 104 and 106 are located at the second physical environment 116, the users 102, 104, and 106 (and/or additional users) may be arranged differently. Thus, any number of users and/or devices from any number of physical environments may be part of a communication session. The communication session may comprise a video telephony session, a Virtual Reality (VR) session, an Augmented Reality (AR) session, and the like.
In the example of FIG. 1, representations 118, 120, and 122 are each illustrated with only a head and two hands. For example, the representation 118 includes a head representation 118(A) and a hand representation 118(B). Here, the head representation 118(A) corresponds to the head component 108(A) of the device 108 (e.g., the headset), while the hand representation 118(B) corresponds to the hand controller 108(B) of the device 108. In this example, representations 118, 120, and 122 contain only those features (e.g., body parts) for which human data has been captured. For example, since the device 108 includes the headset 108(A) and the hand controller 108(B), the device 108 is able to capture data about the head of the user 102 and the hands of the user 102. Such human data is used to represent the user 102 with the head representation 118(A) and the hand representation 118(B). In other examples, representations 118, 120, and 122 may be displayed with other features (e.g., body parts) even if human data (e.g., body data) is not captured for those other features.
In the example of FIG. 1, the representations 120 and 122 are aligned to maintain the positioning of the users 104 and 106 within the physical environment 116. As shown, the users 104 and 106 are positioned at a distance from each other within the physical environment 116 and are gazing at each other (e.g., would be gazing at each other's eyes if the devices 110 and 112 were not worn). Thus, the representations 120 and 122 are displayed at a distance from each other within the coordinate system 114 and are displayed gazing at each other's eyes. The locations of the representations 120 and 122 are related to (e.g., identical, scaled by a factor, etc.) the locations of the users 104 and 106 in the physical environment 116. In many examples, the positioning of the representations 120 and 122 is based on the positioning of the users 104 and 106 relative to a frame of reference (e.g., an anchor point in the physical environment 116). For example, table 128 may be a common frame of reference for users 104 and 106.
As shown in FIG. 1, user 102 sees representation 120 of user 104 and sees representation 122 of user 106, which are gazing at each other. Additionally, the user 104 sees the representation 122 of the user 106 looking at the user 104, and sees the representation 118 of the user 102 looking to one side. Additionally, the user 106 sees the representation 120 of the user 104 looking at the user 106, and the representation 118 of the user 102 looking to one side. In many examples, a representation described as gazing at a user refers to the represented eyes looking toward the user's eyes, rather than gazing more generally in the direction of the user.
Fig. 2A illustrates a user 202 within a physical environment 204 (also referred to as a real-world environment). The physical environment 204 is shown with a coordinate system. Here, the coordinate system is represented by a set of lines, which may be used to describe the location of the user 202. In this example, the origin of the coordinate system represents the frame of reference 206 of the physical environment 204. Here, the frame of reference 206 corresponds to a corner of a room in which the user 202 is located. In other examples, however, the frame of reference 206 of the physical environment 204 may correspond to any physical point or anchor point to which a state (e.g., location, velocity, etc.) of an object may be referenced. For example, the frame of reference 206 of the physical environment 204 may instead be an object in the room (e.g., a VR headset, furniture, a light fixture, the ground, etc.). As shown in fig. 2A, the room contains a window 208.
In some examples, the apparatus 210 used by the user 202 may evaluate the physical environment 204 to determine the frame of reference 206 and/or the coordinate system. The frame of reference 206 may generally be used to track the user 202 within the physical environment 204, to provide a representation 212 of the user 202 that corresponds to the movement of the user 202, and so on. In some examples, the apparatus 210 may reevaluate the frame of reference 206 when the frame of reference 206 is no longer within a certain distance of the apparatus 210, when the user 202 moves to a new room, and so on, and/or may periodically determine a new frame of reference. Additionally, in some examples, multiple frames of reference may be used for the user 202 in the physical environment 204.
FIG. 2B illustrates a representation 212 of the user 202 within a virtual environment 214. The virtual environment 214 may be described with reference to a coordinate system (e.g., the illustrated lines). Thus, the virtual environment 214 may sometimes be referred to as a virtual coordinate system. In this example, the origin of the coordinate system represents the frame of reference 216 of the virtual environment 214. Here, the frame of reference 216 corresponds to a point in the virtual room where the representation 212 is located. The frame of reference 216 may be any point or anchor point (e.g., an object) in the virtual environment 214 to which a state (e.g., position, velocity, etc.) of an object may be referenced. The frame of reference 216 may be the same as or different from the frame of reference 206 in the physical environment 204. Thus, the user 202 may be described with respect to the frame of reference 206 in the physical environment 204 and/or the representation 212 may be described with respect to the frame of reference 216 in the virtual environment 214. As shown, the virtual environment 214 includes a virtual table 218.
The coordinate system of the physical environment 204 may be described relative to the coordinate system of the virtual environment 214, or vice versa. Block 220 indicates that the origin of the coordinate system of the virtual environment 214 may be transformed (e.g., offset, rotated, scaled, etc.) relative to the origin of the coordinate system of the physical environment 204. In this example, the origin of the coordinate system of the virtual environment 214 is offset from the origin of the coordinate system of the physical environment 204, although in other examples the origins may be the same. In addition, although the coordinate systems are illustrated as being the same, the coordinate systems may be different.
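The transform suggested by block 220 can be sketched as a simple offset-rotate-scale mapping between the two coordinate systems. The function below is a minimal sketch under assumed conventions (rotation about the vertical axis only, illustrative parameter names), not the patent's implementation.

```python
# Map a point from the physical coordinate system into the virtual coordinate system
# using an assumed offset, yaw rotation, and uniform scale.
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def physical_to_virtual(p: Vec3, offset: Vec3, yaw_degrees: float, scale: float) -> Vec3:
    """Rotate about the vertical (Y) axis, scale, then translate by the offset."""
    yaw = math.radians(yaw_degrees)
    x = p[0] * math.cos(yaw) + p[2] * math.sin(yaw)
    z = -p[0] * math.sin(yaw) + p[2] * math.cos(yaw)
    return (x * scale + offset[0], p[1] * scale + offset[1], z * scale + offset[2])


# A user 2 m in front of the physical frame of reference, with the virtual origin
# offset by (1, 0, 3) and rotated 90 degrees, no scaling.
print(physical_to_virtual((0.0, 1.6, 2.0), offset=(1.0, 0.0, 3.0), yaw_degrees=90.0, scale=1.0))
```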
FIG. 3 illustrates an example of representing users and representations from various environments in a common virtual environment 302. The common virtual environment 302 (sometimes referred to as the "virtual environment 302") may allow users and/or representations from various physical and/or virtual environments to communicate. In this example, composite spatial data may be created and used to maintain spatial alignment of the users 304 and 306 in the physical environment 308 and spatial alignment of the representations 310 and 312 in the virtual environment 314. By maintaining spatial alignment of users and/or representations of users, the techniques can assist in establishing shared presence within the common virtual environment 302 (e.g., creating an experience in which users feel each other's presence).
In this example, device 316 associated with user 304 and device 318 associated with user 306 communicate to establish frame of reference 320 (e.g., an anchor point in physical environment 308). Here, frame of reference 320 comprises a shared frame of reference used by both devices 316 and 318. Devices 316 and 318 may share frame of reference 320 because users 304 and 306 are located in the same physical environment 308, e.g., the same room, house, yard, etc. In other examples, however, devices 316 and 318 may use different frames of reference. Although not illustrated in fig. 3, the frame of reference 320 may be associated with a coordinate system (e.g., may be the origin of the coordinate system). This may allow objects (e.g., users 304 and 306) to be described with respect to frame of reference 320.
The frame of reference 320 may be used to generate spatial data for the user 304 and/or the user 306. For example, device 316 may determine a position of user 304 relative to frame of reference 320 and an orientation of user 304 relative to frame of reference 320 (e.g., an angle relative to an origin and/or axis of a coordinate system associated with frame of reference 320). To illustrate, this may include determining X, Y, Z values indicative of a position of user 304 relative to frame of reference 320 and/or Xr, Yr, Zr values indicative of a rotation of user 304 relative to frame of reference 320. In an example, the X, Y, Z and/or Xr, Yr, Zr values may be used to find a distance between user 304 and frame of reference 320, or other information. Such information regarding the alignment (e.g., positioning) of user 304 relative to frame of reference 320 may be stored as spatial data for user 304 (e.g., the spatial data for user 304 may contain the X, Y, Z and/or Xr, Yr, Zr values). In some examples, the spatial data is specific to a particular characteristic of the user 304, such as a particular body part. For example, the spatial data for the user 304 may indicate that the user's 304 eyes are located at a particular X, Y, Z coordinate relative to the frame of reference 320 and are gazing in a direction that forms a 240-degree angle with the origin and/or an axis of the coordinate system associated with the frame of reference 320. The line from device 316 in fig. 3 indicates the direction in which the eyes of user 304 are gazing. Although illustrated as gazing perpendicular to the body of the user 304, the eyes may be looking in any direction. Similar processing may be performed to generate spatial data for user 306.
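The sketch below illustrates one way such per-user spatial data could be derived: a position expressed relative to the anchor and a gaze angle measured against the frame's reference axis (as in the 240-degree example above). The function name, the returned dictionary, and the axis conventions are assumptions for illustration.

```python
# Derive spatial data for a user relative to a frame of reference (anchor): a relative
# position plus the horizontal angle of the gaze direction against the frame's X axis.
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def spatial_data_for_user(eye_position: Vec3, anchor_position: Vec3, gaze_direction: Vec3) -> dict:
    # Position of the user's eyes relative to the frame of reference (anchor).
    rel = tuple(e - a for e, a in zip(eye_position, anchor_position))
    # Angle of the gaze direction in the horizontal plane, normalized to [0, 360) degrees.
    angle = math.degrees(math.atan2(gaze_direction[2], gaze_direction[0])) % 360.0
    return {"position": rel, "gaze_angle_deg": angle}


print(spatial_data_for_user(
    eye_position=(2.0, 1.6, 3.0),
    anchor_position=(0.0, 0.0, 0.0),
    gaze_direction=(-0.5, 0.0, -0.866),   # roughly a 240-degree heading
))
```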
Representation 310 and representation 312 within virtual environment 314 are also associated with frame of reference 322. Here, the frame of reference 322 includes virtual points in the virtual environment 314, such as objects, corners of a room, etc., that are part of the virtual environment 314. Although not illustrated in FIG. 3, each of representations 310 and 312 may be associated with a user other than users 304 and 306. In the example of fig. 3, representation 310 and representation 312 share a frame of reference 322. In other examples, a different frame of reference may be used.
The frame of reference 322 may be used to generate spatial data for the representation 310 and/or the representation 312. For example, the spatial data of the representation 310 may indicate a position of the representation 310 relative to the frame of reference 322 and/or an orientation of the representation 310 relative to the frame of reference 322 (e.g., an angle relative to an origin and/or axis of a coordinate system associated with the frame of reference 322). To illustrate, the spatial data may indicate X, Y, Z coordinate values of the representation 310 relative to the frame of reference 322 and/or Xr, Yr, Zr coordinate values indicating a rotation of the representation 310 relative to the frame of reference 322. As similarly described above, the spatial data of representation 310 may be specific to a feature of representation 310, such as an eye of representation 310. The line from representation 310 in fig. 3 indicates the direction in which the eyes of representation 310 are gazing. Although illustrated as gazing perpendicular to the body of representation 310, the eyes may be looking in any direction. Similar processing may be performed to generate spatial data for representation 312.
In the example of FIG. 3, users 304 and 306 in physical environment 308 and representations 310 and 312 in virtual environment 314 are each represented within common virtual environment 302 for communication. Specifically, user 304 is represented within virtual environment 302 by representation 324, and user 306 is represented within virtual environment 302 by representation 326. Meanwhile, representation 310 and representation 312 are provided within virtual environment 302.
To maintain the spatial alignment of the users 304 and 306 within the physical environment 308 and the spatial alignment of the representations 310 and 312 within the virtual environment, composite spatial data may be generated. The composite spatial data may contain spatial data for user 304, spatial data for user 306, spatial data for representation 310, and/or spatial data for representation 312. Additionally or alternatively, the composite spatial data may indicate a frame of reference 328 common to the representations 310, 312, 324, and 326 within the virtual environment 302. The frame of reference 328 may contain points in the virtual environment 302, such as objects, corners in a room, and the like. The composite spatial data may indicate the position of representations 310, 312, 324, and 326 relative to frame of reference 328. For example, the composite spatial data may indicate a position of representation 310 relative to reference frame 328 and/or an orientation of representation 310 relative to reference frame 328 (e.g., an angle relative to an origin and/or axis of a coordinate system associated with reference frame 328, X, Y, Z and/or Xr, Yr, Zr coordinate values of representation 310 relative to reference frame 328, etc.). For each of the other representations 312, 324, and 326, similar data may be stored in the composite spatial data. In addition, the composite spatial data may indicate a position and/or orientation of the reference frames relative to each other (e.g., a position and/or orientation of reference frame 320 relative to reference frame 328, and a position and/or orientation of reference frame 322 relative to reference frame 328).
The composite spatial data may be used to locate representations 310, 312, 324, and 326 within the virtual environment 302. For example, the location and/or orientation of representations 324 and 326 in virtual environment 302 may be maintained in accordance with the location and/or orientation of users 304 and 306 in physical environment 308. Additionally, the position and/or orientation of the representations 310 and 312 in the virtual environment 302 may be maintained in accordance with the position and/or orientation of the representations 310 and 312 in the virtual environment 314. In some examples, the position and/or orientation is maintained without a scaling factor (e.g., if users 304 and 306 are ten feet apart in physical environment 308, representations 324 and 326 are a distance corresponding to ten feet apart in virtual environment 302). In other examples, a scaling factor is used (e.g., scaling all of the representations up or down by a factor of five). As shown in FIG. 3, an angle 330 between users 304 and 306 in physical environment 308 is maintained for representations 324 and 326 in virtual environment 302. As also shown, an angle 332 between representations 310 and 312 in virtual environment 314 is maintained in virtual environment 302.
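A short sketch of the scaling behavior: applying a uniform scale about the shared frame of reference changes the separation between two representations but preserves the angle between them (as with angles 330 and 332). The function names and the 2D simplification are illustrative assumptions.

```python
# Demonstrate that uniform scaling about the shared frame preserves relative angles.
import math
from typing import Tuple

Vec2 = Tuple[float, float]  # horizontal plane for simplicity


def scale_about_frame(p: Vec2, frame: Vec2, factor: float) -> Vec2:
    return (frame[0] + (p[0] - frame[0]) * factor, frame[1] + (p[1] - frame[1]) * factor)


def angle_between(a: Vec2, b: Vec2, frame: Vec2) -> float:
    ax, ay = a[0] - frame[0], a[1] - frame[1]
    bx, by = b[0] - frame[0], b[1] - frame[1]
    dot = ax * bx + ay * by
    mags = math.hypot(ax, ay) * math.hypot(bx, by)
    return math.degrees(math.acos(dot / mags))


frame = (0.0, 0.0)
user_a, user_b = (2.0, 1.0), (1.0, 3.0)
scaled_a = scale_about_frame(user_a, frame, 5.0)
scaled_b = scale_about_frame(user_b, frame, 5.0)
# Same angle before and after scaling; the separation grows by the factor of five.
print(angle_between(user_a, user_b, frame), angle_between(scaled_a, scaled_b, frame))
```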
Although the example of fig. 3 discusses nesting one virtual environment into another virtual environment (e.g., nesting virtual environment 314 into virtual environment 302), any number of virtual environments may be nested.
The techniques discussed herein may be implemented in a variety of contexts. In one example, a user associated with representation 310 may interact (e.g., play a game, explore a space, communicate, etc.) with a user associated with representation 312 in virtual environment 314. Such users may wish to join a communication session with users 304 and 306 while still maintaining interaction in virtual environment 314. Thus, the virtual environment 302 may be implemented to facilitate communication sessions with all users (e.g., the user associated with the virtual environment 314 and the users 304 and 306). A user associated with the virtual environment 314 may switch between the virtual environments 302 and 314 while interacting in the virtual environments 302 and 314, and so on. In another example, a user associated with virtual environment 314 may wish to conduct a private chat (side conversation) and implement virtual environment 314 to facilitate such a conversation. In yet another example, the virtual environment 302 (or the virtual environment 314) can be a particular type of virtual environment that enables a particular type of interaction (e.g., a secure virtual environment that enables users to communicate more securely than other types of virtual environments, etc.). In another example, virtual environment 302 (or virtual environment 314) may be a Virtual Reality (VR) environment, and virtual environment 314 (or virtual environment 302) may be another type of environment, such as an Augmented Reality (AR) environment.
FIG. 4 illustrates an example of outputting a first representation 402 of a first user 404 to a second user 406 in alignment with an input of the first user 404. In this example, a first user 404 uses a first device 408 to communicate with a second user 406 using a second device 410. The communication may include a video telephony session (e.g., a video conference), a VR session, an AR session, or any other communication session. Here, the first device 408 includes a sensor 412 (e.g., a video camera, a still camera, a depth sensor, etc.) to capture data, such as images, depth data, etc., of the first user 404. Additionally, the second device 410 includes a sensor 414 to capture data of the second user 406. Although devices 408 and 410 are illustrated as mobile devices, such as mobile phones or tablets, devices 408 and 410 may be any type of device.
In this example, data from the sensor 412 may be processed to provide a first representation 402 of the first user 404, and data from the sensor 414 may be processed to provide a second representation 416 of the second user 406. For example, the first device 408 may analyze data from the sensor 412 to identify a direction in which the eyes of the first user 404 are looking. To do so, the first device 408 may determine by processing the image and/or depth data: (i) a position and/or orientation of the first device 408 relative to the first user 404 and/or relative to the frame of reference, (ii) a position and/or orientation of the first user 404 relative to the first device 408 and/or relative to the frame of reference, (iii) a position and/or orientation of the eyes of the first user 404 relative to the first device 408 and/or relative to the frame of reference, (iv) a position and/or orientation of any other body part of the first user 404 (e.g., ear, nose, hand, foot, etc.) relative to the first device 408 and/or relative to the frame of reference, etc. To illustrate, the first device 408 may obtain face data about the face of the first user 404 and analyze the face data to identify a location of the eyes of the first user 404 on the face of the first user 404, a distance between the eyes of the first user 404, and the like. In some cases, first device 408 may identify a location of a pupil of an eye of first user 404. Based on such locations, and knowing the general shape of the eye (or knowing the shape of the eye of an ordinary user) via image or depth data processing, the first device 408 may estimate the direction in which the pupil of the eye is gazing. In some examples, the first device 408 may estimate the direction as a line extending from the center of the pupil of the eye.
The first device 408 may correlate the direction in which the eyes of the first user 404 are looking (e.g., the direction in which the pupils are looking) with the location on the display screen of the first device 408. In some examples, the first device 408 may project a line out of a pupil of an eye of the first user 404 onto the first device 408. The first device 408 may also reference information about the first device 408, such as the size of the screen, the size of the first device 408, the position of the sensor 412 relative to the screen, and the like. Based on such information, the first device 408 may determine a group of pixels on the screen of the first device 408 at which the first user 404 is looking. The group of pixels may be related to the content being displayed via the screen of the first device 408. In this example, the first user 404 is looking at the eye of the second representation 416 of the second user 406 (e.g., looking at the upper left corner of the display screen of the first device 408, where the eye of the second representation 416 is displayed).
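One plausible way to map the projected gaze line to on-screen content is sketched below: intersect the gaze ray with the screen plane and convert the intersection point to pixel coordinates using the screen's physical size and resolution. The screen model, dimensions, and names are assumptions for illustration, not values from the patent.

```python
# Intersect an estimated gaze ray with the device's screen plane (z = 0 in the
# device's coordinate frame) and convert the hit point to pixel coordinates.
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def gaze_to_pixel(
    eye_position: Vec3,            # pupil position relative to the device, meters
    gaze_direction: Vec3,          # vector from the pupil toward the device (negative z here)
    screen_width_m: float = 0.07,  # assumed physical screen size
    screen_height_m: float = 0.15,
    resolution: Tuple[int, int] = (1080, 2340),
) -> Optional[Tuple[int, int]]:
    """Return the pixel the user is looking at, or None if the gaze misses the screen."""
    if gaze_direction[2] >= 0:                     # ray must travel toward the screen
        return None
    t = -eye_position[2] / gaze_direction[2]
    x = eye_position[0] + t * gaze_direction[0]    # meters, origin at the screen center
    y = eye_position[1] + t * gaze_direction[1]
    if abs(x) > screen_width_m / 2 or abs(y) > screen_height_m / 2:
        return None                                # gaze falls off the display
    px = int((x / screen_width_m + 0.5) * resolution[0])
    py = int((0.5 - y / screen_height_m) * resolution[1])
    return px, py


print(gaze_to_pixel((0.0, 0.02, 0.35), (-0.05, 0.08, -1.0)))  # e.g., (270, 421)
```

The resulting pixel coordinates can then be compared against the region where the second representation's eyes are rendered, as described above.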
The first device 408 may send data regarding the direction in which the first user 404 is looking to the second device 410 so that the first representation 402 for the first user 404 may be displayed in alignment with the first user 404. As shown in fig. 4, the second device 410 displays the first representation 402 of the first user 404 with eyes looking at the eyes of the second user 406. To do so, the second device 410 may analyze data from the sensor 414 (e.g., analyze images, depth data, etc.) to determine: (i) a position and/or orientation of the second device 410 relative to the second user 406 and/or relative to the frame of reference, (ii) a position and/or orientation of the second user 406 relative to the second device 410 and/or relative to the frame of reference, (iii) a position and/or orientation of the eyes of the second user 406 relative to the second device 410 and/or relative to the frame of reference, (iv) a position and/or orientation of any other body part of the second user 406 (e.g., ear, nose, hand, foot, etc.) relative to the second device 410 and/or relative to the frame of reference, etc. Specifically, the second device 410 may obtain face data regarding the face of the second user 406 and analyze the face data to identify a location of the eyes of the second user 406 on the face of the second user 406, a distance between the eyes of the second user 406, and/or the like. The second device 410 may then generate the first representation 402 with eyes gazing at the eyes of the second user 406.
In a similar manner, the second representation 416 of the second user 406 may be output to the first user 404. By performing such techniques, the first representation 402 may be aligned with the first user 404 and the second representation 416 may be aligned with the second user 406. This may avoid, for example, a user having to look directly at a sensor (e.g., a camera) in order to appear to be looking at another user's eyes. As shown in FIG. 4, the first representation 402 is looking directly at the eyes of the second user 406, since the first user 404 is looking directly at the eyes of the second representation 416.
In some examples, the representation of the user may contain a composite image and/or an actual image of the user. In one example, the first representation 402 of the first user 404 may contain an image (e.g., a real-time image) of the first user 404 with a composite image of the first user's 404 face overlaid on the real-time image of the first user 404 (e.g., a computer-generated face overlaid on the first user's 404 face). The composite image may contain the first user's face with the first user's 404 eyes (and other facial features) looking in the direction the first user 404 gazes relative to the second representation 416 (and corresponding second user 406). The composite image of the first user's 404 face may contain pixel values from the first user's 404 image (e.g., appear as if the first user 404 actually looks in that direction) or other pixel values for an avatar, cartoon character, animated expression, or the like. In another example, the first representation 402 may contain a fully synthetic representation (e.g., an avatar, cartoon character, animated expression, etc.).
Although the example of fig. 4 is discussed in the context of the first device 408 and the second device 410 performing particular operations, the operations may be performed by any number of devices, such as the first device 408, the second device 410, a service provider, or the like.
Fig. 5 illustrates an example of evaluating a representation and/or user to ensure that a communication session has established shared presence. In this example, a first user 502 uses a first device 504 to interact with a second user 506 using a second device 508. The first user 502 and the second user 506 are part of a communication session that operates to provide a shared presence. Here, a first user 502 may be associated with a first representation (not shown in fig. 5 for ease of discussion), and a second user 506 may be associated with a second representation 510. Computing device 512 (e.g., device 504, device 508, another device, a service provider, etc.) may evaluate second representation 510 of second user 506 to determine whether the location or direction of the output of second representation 510 as perceived by first user 502 matches the location or direction of the output of second representation 510 as perceived by second user 506. For ease of discussion, the following example will discuss the evaluation of the direction in which the eye of the second representation 510 is looking. However, a similar process may be performed to evaluate the first representation of the first user 502. Further, similar processing may be performed for other features of the user's body (e.g., other body parts).
To evaluate the second representation 510, the first apparatus 504 may identify a first direction in which an eye of the second representation 510 is gazing in the coordinate system displayed by the first apparatus 504. The first direction may be a direction that the first user 502 perceives (when viewing through the first apparatus 504) that the second representation 510 is gazing (e.g., relative to the first user 502) and/or a direction that the second representation 510 is gazing relative to a frame of reference. To illustrate, the first user 502 may perceive that the second representation 510 is looking at the eye of the first user 502. The first device 504 may generate first data 514 indicating a first direction. The first data 514 may include a line or vector 514(a) representing a first direction (e.g., a line or vector located in a coordinate system), an image 514(B) of a view of the second representation 510 from the perspective of the first user 502 (e.g., an image seen by the first user 502 through the first device 504), and so on. When the evaluation process is performed by another apparatus (e.g., apparatus 508, a service provider, etc.), first apparatus 504 may send first data 514 to the other apparatus.
The second apparatus 508 may identify a second direction in the coordinate system in which the eye of the second representation 510 is gazing. The second direction may be a direction in which the second user 506 gazes in the coordinate system via the second apparatus 508, relative to the first representation of the first user 502, a frame of reference, etc. To illustrate, the second user 506 may be gazing at an eye of the first representation of the first user 502 displayed via the second apparatus 508. The second apparatus 508 may generate second data 516 indicating the second direction. The second data 516 may include a line or vector 516(A) representing the second direction (e.g., a line or vector located in the coordinate system), an image 516(B) of an estimated view of the second representation 510 from the perspective of the first representation of the first user 502 (e.g., an image generated by the second apparatus 508 that estimates what would be seen from the perspective of the first representation of the first user 502), and so on. In other words, the image 516(B) may represent the view that the second apparatus 508 estimates the first user 502 would see, based on how the first representation of the first user 502 is currently presented via the second apparatus 508. To illustrate, if the second apparatus 508 presents the first representation of the first user 502 as gazing at the eyes of the second user 506, and the second user 506 is gazing at the eyes of the first representation, the second apparatus 508 will generate an image of the second representation 510 looking directly out of the image (from the perspective of the first representation). In any case, when the evaluation process is performed by another apparatus (e.g., apparatus 504, a service provider, etc.), the second apparatus 508 may send the second data 516 to the other apparatus.
The computing device 512 may evaluate the first direction in which the eye of the second representation 510 is gazing in the coordinate system displayed by the first device 504 and the second direction in which the eye of the second representation 510 is gazing in the coordinate system provided by the second device 508. In one example, at 518, the computing device 512 may compare the line or vector represented in the first data 514(A) with the line or vector represented in the second data 516(A). If the lines or vectors match (e.g., the two vectors are within a threshold of being close to each other, have the same angle within a threshold number of degrees, and/or have the same magnitude), then the communication session has a shared presence (or is able to establish a shared presence) (e.g., the eyes of users 502 and 506 are being portrayed accurately). Here, the communication session may continue with the shared presence, establish a shared presence, and so on. Alternatively, if the lines or vectors do not match, the communication session has lost shared presence (or failed to establish shared presence). In another example, at 520, the image represented by the first data 514(B) may be compared to the image represented by the second data 516(B). If the images match (e.g., the direction in which the eye of the second representation 510 is looking in the image of the first data 514(B) is within a threshold amount of similarity to the direction in which the eye of the second representation 510 is looking in the image of the second data 516(B), more than a threshold amount of the pixel values are the same, etc.), then the communication session has a shared presence (or is able to establish a shared presence). Alternatively, if the images do not match, the communication session has lost shared presence (or failed to establish shared presence).
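The two comparisons at 518 and 520 can be sketched as follows; the angular threshold and pixel-match fraction are assumed values for illustration, not thresholds taken from the patent.

```python
# Sketch of the vector comparison (518) and image comparison (520) used to decide
# whether the communication session still has a shared presence.
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
ANGLE_THRESHOLD_DEG = 5.0      # assumed threshold
PIXEL_MATCH_FRACTION = 0.95    # assumed threshold


def vectors_match(a: Vec3, b: Vec3) -> bool:
    """True if the two gaze vectors point the same way within the angular threshold."""
    dot = sum(x * y for x, y in zip(a, b))
    mags = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / mags))))
    return angle <= ANGLE_THRESHOLD_DEG


def images_match(pixels_a: List[int], pixels_b: List[int]) -> bool:
    """True if enough corresponding pixel values agree between the two rendered views."""
    same = sum(1 for pa, pb in zip(pixels_a, pixels_b) if pa == pb)
    return same / max(len(pixels_a), 1) >= PIXEL_MATCH_FRACTION


shared_presence = vectors_match((0.0, 0.0, -1.0), (0.02, -0.01, -0.99))
print("shared presence" if shared_presence else "restart eye tracking / fall back")
```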
If the communication session has lost shared presence (or is unable to establish shared presence), various operations may be performed. For example, the first apparatus 504 and/or the second apparatus 508 may be restarted; an eye tracking sensor, a depth sensor, or another component that may have encountered an error in providing eye data may be restarted; the communication session may be switched to one that does not share or align eye data (e.g., one that displays a more general representation of a user whose eyes are not aligned, displays only video of the user, etc.); and so on.
In some examples, the evaluation process may occur during calibration. In one example, the user 502 and the user 506 may be asked to gaze into each other's eyes when joining a communication session (e.g., one user gazes at the eyes of the other user's representation), and when the user 502 (or the user 506) perceives that the other user's representation is gazing at the user 502, eye contact is confirmed via voice input, a button on a controller, or the like. In another example, the user 502 and the user 506 may be asked to gaze into each other's eyes, and when data from an eye tracking sensor indicates that the user 502 (or the user 506) is gazing at the eyes of the other user's representation, the apparatus 504 and/or the apparatus 508 may confirm that eye contact was made.
Although the example of fig. 5 is discussed in the context of evaluating eyes, other features (e.g., body parts) may be evaluated. For example, the techniques may evaluate a direction from which sound from a mouth of a representation is perceived, a location of a perceived touch on a representation, and so on. Further, although the example of fig. 5 is discussed in the context of evaluating a representation of a user, a similar process may be performed to evaluate a direction in a coordinate system at which the user's eyes are gazing, as discussed in further detail below.
Fig. 6 illustrates an example apparatus 602 that may perform the techniques discussed herein. For example, the device 602 may represent any of the devices 108, 110, 112, 210, 316, 318, 408, 410, 504, 508, etc.
The device 602 may be implemented as any of multiple types of computing devices configured to perform one or more operations. For example, the device 602 may be implemented as a notebook computer, a desktop computer, a smart phone, an electronic reader device, a mobile handset, a Personal Digital Assistant (PDA), a portable navigation device, a portable gaming device, a VR device (e.g., a VR headset, such as an HTC Vive, an Oculus Rift, etc.), a tablet computer, a wearable computer (e.g., a watch, an Optical Head-Mounted Display (OHMD), etc.), a portable media player, a television, a set-top box, a computer system in an automobile, a home appliance, a camera, a robot, a hologram system, a security system, a home computer system (e.g., an intercom system, a home media system, etc.), a projector, an Automated Teller Machine (ATM), and so forth.
Device 602 may include a processor 604, a speaker 606, a microphone 608, a display 610, a camera 612, a depth sensor 614, an accelerometer 616, a magnetometer 618, a gyroscope 620, a Global Navigation Satellite System (GNSS) component 622 (e.g., a GNSS receiver such as a GPS receiver), a battery 624, a lens 626, a touchpad 628, a button 630, a haptic device 632, an eye tracking device 634, a memory 636, and/or a network interface 638. Although not illustrated in fig. 6, in some examples, the device 602 also includes a power cord, a strap, or other means for securing the device 602 to the user (e.g., a headband in the case of a VR headset), an olfactory sensor (e.g., to detect odors), a projector, and so forth.
The one or more processors 604 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a microprocessor, a digital signal processor, and the like. The speaker 606 may output audio. In some examples, multiple speakers 606 are implemented. To illustrate, the apparatus 602 may include a speaker 606 configured to be placed on each ear of the user. Here, the speakers 606 may be adjustable so that they may be positioned according to the desires of the user. The microphone 608 may receive sound and generate audio signals. In some examples, the one or more microphones 608 (or processing of audio signals from the one or more microphones 608) may identify a direction from which sound is received (e.g., relative to the apparatus 602).
The display 610 may include a touch screen, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, an organic LED display, a plasma display, an electronic paper display, or any other type of technology. In some examples, display 610 is aligned with lens 626. To illustrate, display 610 may be aligned with an axis through lens 626. The lens 626 may be formed of plastic, glass, or the like. Lens 626 may be adjustable. To illustrate, a user may move lens 626 closer to or farther from display 610 to make an image appear closer/farther and/or to focus the image. In another example, a user may adjust lens 626 to match the distance between the user's eyes (e.g., an interpupillary distance (IPD)). Additionally, in some examples, lens 626 may contain a sensor (e.g., a potentiometer) for measuring the IPD. In some examples, such data regarding the IPD may be used to generate body data and/or sent to another apparatus so that a representation of the user using apparatus 602 may be accurately presented.
The camera 612 may capture an image. The images may include still images, video, and the like. The camera 612 may include a front camera, a rear camera, and the like. In some examples, the apparatus 602 includes one or more cameras to capture surroundings about the user and/or one or more cameras to capture images of the user. For example, the apparatus 602 may be implemented as a VR headset that includes an outward camera for capturing images of a physical environment in which the user is located and an inward camera for capturing images of the user's face or other body parts.
Depth sensor 614 (also referred to as a range sensor) may implement a variety of techniques to generate depth data that indicates a distance to a point in the surrounding environment. In some examples, the apparatus 602 includes a depth sensor 614 facing the user's environment (e.g., to obtain depth data about the user's physical environment) and a depth sensor 614 facing the user (e.g., to obtain facial data about the user's face, hand data about the user's hand, other data about other parts of the user's body, etc.). In one example, the depth sensor 614 may include a time-of-flight (ToF) camera, which measures the time of flight of an optical signal between the ToF camera and objects in the environment. In another example, the depth sensor 614 may include a structured light 3D scanner (e.g., an infrared emitter and an infrared camera) to implement structured light techniques that project a known pattern (e.g., structured light) onto a surface and capture an image. In other examples, the depth sensor 614 may implement other techniques, such as surface laser triangulation, stereo triangulation, interferometry, and so forth. As a non-limiting example, the depth sensor 614 may be implemented using techniques and/or components from various commercially available depth-sensing cameras and systems. In some examples, the depth sensor 614 includes a camera 612 to capture images. Depth sensor 614 may generate depth data, such as range images, depth maps, and the like. The depth data may indicate one or more respective distances to one or more points represented in the depth data. In some examples, the depth data may be used to identify distances to points in the environment, identify objects or surfaces in the environment, and/or position and/or maintain a representation of a user or other content relative to the objects or surfaces as the device 602 moves within the environment (e.g., in AR or VR implementations).
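As an illustration of how depth data such as a depth map can yield distances to points in the environment, the following sketch back-projects a depth map through a simple pinhole-camera model; the intrinsic parameters and array shapes are assumptions for the example, not values from the disclosure.

```python
import numpy as np

def depth_map_to_points(depth, fx, fy, cx, cy):
    """Convert an (H, W) depth map in meters into an (H, W, 3) array of 3D points
    in the sensor's coordinate frame, using pinhole-camera intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column/row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

# Distance from the sensor to the point seen at a given pixel (hypothetical intrinsics).
# points = depth_map_to_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
# distance = np.linalg.norm(points[row, col])
```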
Accelerometer 616 may determine the acceleration of device 602. The magnetometer 618 may measure a magnetic field (e.g., to determine a heading or orientation of the device 602). Additionally, gyroscope 620 may determine the orientation and/or angular velocity of device 602. In some cases, data from accelerometer 616, magnetometer 618, and/or gyroscope 620 may be used to determine the position of device 602 and/or track the position of device 602 over time. To illustrate, data from accelerometer 616, magnetometer 618, and/or gyroscope 620 may be used to determine how far device 602 has traveled from an initial known position (e.g., based on GNSS data or other data) and/or to determine a direction of travel of device 602.
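As a rough sketch of such dead reckoning, the class below integrates acceleration twice to track position from a known starting point; it assumes the acceleration has already been rotated into the world frame (e.g., using the orientation from the gyroscope 620 and magnetometer 618) and gravity-compensated, and it is illustrative only since drift accumulates quickly in practice.

```python
import numpy as np

class DeadReckoner:
    """Track position from an initial known position by integrating acceleration."""

    def __init__(self, initial_position, initial_velocity=(0.0, 0.0, 0.0)):
        self.position = np.asarray(initial_position, float)
        self.velocity = np.asarray(initial_velocity, float)

    def update(self, world_accel, dt):
        """world_accel: gravity-compensated acceleration in the world frame (m/s^2).
        dt: time elapsed since the previous sample (seconds)."""
        self.velocity = self.velocity + np.asarray(world_accel, float) * dt
        self.position = self.position + self.velocity * dt
        return self.position
```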
GNSS component 622 may determine the geographic location of device 602. For example, GNSS component 622 may receive information from a GNSS (e.g., GPS) and calculate a geographic location of device 602 based on the information. Touchpad 628 and/or button 630 may be configured to receive input (e.g., touch input, etc.). Additionally, in some examples, touch pad 628 and/or buttons 630 may provide output such as tactile feedback.
Haptic devices 632 may include haptic sensors, haptic feedback devices, haptic styluses or tools, haptic suits (whole body, glove, torso, etc.), wearable devices, and the like. In some examples, haptic devices 632 may reconstruct touch sensations by applying forces, motions, and/or vibrations. Haptic devices 632 may adjust the amount of force, motion, and/or vibration applied (e.g., to a user), provide force, motion, and/or vibration at a particular location (e.g., on a haptic suit), etc. Further, in some examples, haptic device 632 may detect a touch or other physical interaction. Haptic device 632 may measure the amount of force received (e.g., to the user), determine the location of the force (e.g., input), and so on.
The eye tracking device 634 may detect and/or track eye movements and/or head movements of the user. In some examples, the eye tracking device 634 detects and/or tracks the position of the pupil of the eye. The eye tracking device 634 may generate eye tracking data.
Memory 636 (as well as all other memory described herein) can comprise one or a combination of computer-readable media. Computer-readable media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer-readable media includes, but is not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transitory medium that can be used to store information for access by a computing device. As defined herein, a computer readable medium does not include a communication medium such as a modulated data signal or a carrier wave. Thus, a computer-readable medium is a non-transitory medium.
The network interface 638 may be configured to communicate with one or more devices over a network in a wireless and/or wired manner.
In some examples, one or more of the components 604-638 can be implemented as part of a controller communicatively coupled to the remaining components. Here, the device 602 may be implemented as two devices that facilitate the functionality described herein. For example, a controller may contain its own (or shared) processor 604, accelerometer 616, magnetometer 618, gyroscope 620, touchpad 628 and/or buttons 630, haptic device 632, memory 636, network interface 638, or the like.
In some examples, the apparatus 602 may be configured to receive user input, such as gesture input (e.g., via the camera 612), touch input, audio or voice input, and so forth. Further, the apparatus 602 may be configured to output content, such as audio, images, video, and so forth.
As shown, the memory 636 may contain a shared presence component 640. The shared presence component 640 may represent software and/or hardware. While one component is illustrated as an example for performing various functionality, that functionality and/or similar functionality may be arranged differently (e.g., broken down into a larger number of components, etc.). In some cases where hardware is implemented, any or all of the functions may be implemented (e.g., performed) in whole or in part by hardware logic components. By way of example, and not limitation, illustrative types of hardware logic components that may be used include Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and so forth.
The shared presence component 640 may generally perform operations to establish and/or evaluate a shared presence for a communication session between users. In one example, shared presence component 640 may receive data 642 from microphone 608, display 610, camera 612, depth sensor 614, accelerometer 616, magnetometer 618, gyroscope 620, Global Navigation Satellite System (GNSS) component 622, touchpad 628, button 630, haptic device 632, and/or eye tracking device 634. The shared presence component 640 can analyze the data 642 and perform various operations based on the analysis. In some examples, the analysis includes computer vision processing and/or image processing. To illustrate, based on the analysis of the data 642, the shared presence component 640 may identify a position of a frame of reference, a user, or a device (e.g., device 602) relative to the frame of reference and/or another user or device, identify an object or surface in the environment in which the device 602 is located, identify a position, velocity, orientation (e.g., gesture, angle, etc.) of the user and/or the device 602, identify a position, velocity, orientation, etc. of a feature (e.g., a body part) of the user, identify a direction in which an eye of the user or a representation of the user is gazing, and/or perform various other operations. In some examples, the shared presence component 640 may communicate with another apparatus to negotiate a shared frame of reference, such as where the two apparatuses are in the same physical or virtual environment. The frame of reference may be maintained during the communication session, and/or updated periodically or when an event occurs (e.g., the apparatus 602 changes location (e.g., moves from one room to the next), the apparatus 602 moves more than a threshold distance from an existing frame of reference, etc.).
In some examples, data 642 includes body data, such as face data about a face of the user, hand data about a hand of the user, data about another body part, and so forth. For example, the body data may be captured by the camera 612, the depth sensor 614, the eye tracking device 634, and/or any other component of the device 602. In some cases, the body data may be represented in a format such as MPEG-4 facial and body animation data. The device 602 may capture physical data of a user that is using the device 602 and transmit the physical data to another device so that the other device may present a representation of the user. In some examples, the body data indicates a position, size, shape, orientation, etc. of a body part of the user (e.g., a size of a hand of the user, an IPD of an eye of the user, a position of a mouth, ear, nose, etc. of the user, etc.). In one illustration, the body data may indicate a direction in which the user's eyes are gazing (e.g., a direction in which the pupils are gazing).
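One way to picture the body data exchanged between apparatuses is as a small structured record; the field names below are illustrative assumptions and are not meant to reproduce the MPEG-4 facial and body animation format mentioned above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class BodyData:
    """Per-frame body data captured by one device and sent to its peers (illustrative)."""
    head_position: Vec3                    # position of the head in the shared coordinate system
    head_orientation: Vec3                 # e.g., Euler angles or an axis-angle vector
    gaze_direction: Vec3                   # direction in which the user's pupils are gazing
    ipd_m: Optional[float] = None          # interpupillary distance, if measured (meters)
    left_hand_position: Optional[Vec3] = None
    right_hand_position: Optional[Vec3] = None
    mouth_position: Optional[Vec3] = None
    timestamp_s: float = 0.0
```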
In some examples, the shared presence component 640 may generate human model data 644 that describes a human model (sometimes referred to as a "model"). For example, based on the data 642 (e.g., body data, etc.), the shared presence component 640 can generate a human model that represents the user. The human model may contain features that represent the user (e.g., a hand sized to the size of the user's hand, an IPD that matches the user's IPD, an oral cavity, an ear, a nose, etc., located where the user's oral cavity, ear, nose, etc., are located, etc.). In some cases, the human model may be user-specific. In other cases, various data is collected over time to create a universal human model representing multiple users.
In some examples, the shared presence component 640 may align the representation of the user within a coordinate system while maintaining spatial alignment of the user in the physical environment and/or maintaining spatial alignment of the representation in another coordinate system. To do so, the shared presence component 640 can generate spatial data for the user and/or representation (e.g., based on an analysis of the data 642), generate composite spatial data 646, and/or use the spatial data and/or the composite spatial data 646 to locate the representation of the user in a coordinate system. In some examples, the shared presence component 640 can generate (e.g., present) a representation of the user based on the data 642 (e.g., body data), the human model data 644, the composite spatial data 646, and/or the like.
The coordinate system may be a set of standardized measurements or coordinates fixed to a frame of reference. The coordinate system may describe the geometric state of an object (e.g., relative to the frame of reference). The coordinate system may represent a virtual environment, a number line, a Cartesian coordinate system, a polar coordinate system, etc.
In some cases, composite spatial data 646 is implemented in the context of a state machine. For example, as each user joins a communication session, the user may be associated with a state machine indicating that the user has joined (or lost connection), spatial data and/or a representation of the user, whether physical data of the user is detected and/or exchanged between devices, and the like. Composite spatial data 646 may represent state machines of various users as part of a communication session. In some examples, composite spatial data 646 may contain spatial data for each user or device that is part of a communication session.
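A minimal sketch of the per-user state machine and the composite spatial data it feeds is shown below, under the assumption that each user's spatial data is simply an identifier for a frame of reference plus a pose relative to it; the class and state names are illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict

class UserState(Enum):
    JOINED = auto()
    BODY_DATA_EXCHANGED = auto()
    CONNECTION_LOST = auto()

@dataclass
class SpatialData:
    frame_of_reference: str   # e.g., an anchor identifier or "virtual_origin"
    pose: tuple               # position/orientation relative to that frame of reference

@dataclass
class CompositeSpatialData:
    """Spatial data and state for every user that is part of the communication session."""
    spatial: Dict[str, SpatialData] = field(default_factory=dict)
    states: Dict[str, UserState] = field(default_factory=dict)

    def join(self, user_id: str, spatial_data: SpatialData) -> None:
        self.spatial[user_id] = spatial_data
        self.states[user_id] = UserState.JOINED

    def mark_body_data_exchanged(self, user_id: str) -> None:
        self.states[user_id] = UserState.BODY_DATA_EXCHANGED

    def disconnect(self, user_id: str) -> None:
        # Removing the spatial data frees the user's entry in the composite data.
        self.states[user_id] = UserState.CONNECTION_LOST
        self.spatial.pop(user_id, None)
```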
In some examples, the shared presence component 640 may evaluate shared presence. To illustrate, a direction at which the first user perceives the representation of the second user to be gazed (as displayed by an apparatus associated with the first user) may be compared to a direction at which the apparatus of the second user indicates the representation of the gazed for the second user. If the directions are the same within a threshold amount, the communication session has a shared presence. And if the directions are not the same within a threshold amount, the communication session has lost shared presence.
Fig. 7 illustrates an example service provider 702 that may perform the techniques discussed herein. For example, service provider 702 may represent service provider 130, computing device 512, and the like.
Service provider 702 may be implemented as one or more computing devices, such as one or more desktop computers, notebook computers, servers, and the like. One or more computing devices may be configured in a cluster, a data center, a cloud computing environment, or a combination thereof. In one example, one or more computing devices provide cloud computing resources, including computing resources, network resources, storage resources, and the like, that operate remotely from the device. To illustrate, service provider 702 may implement a cloud computing platform/infrastructure for building, deploying, and/or managing applications and/or services.
As shown, service provider 702 includes one or more processors 704, memory 706, and one or more network interfaces 708. The one or more processors 704 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a microprocessor, a digital signal processor, and so forth. The one or more network interfaces 708 may communicate with other devices in a wireless or wired manner.
The memory 706 (as well as all other memories described herein) may comprise one or a combination of computer-readable media. Computer-readable media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer-readable media includes, but is not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transitory medium that can be used to store information for access by a computing device. As defined herein, a computer readable medium does not include a communication medium such as a modulated data signal or a carrier wave. Thus, a computer-readable medium is a non-transitory medium.
In some examples, service provider 702 may perform any of the operations discussed with reference to apparatus 602. For example, service provider 702 may perform operations to facilitate a communication session (e.g., receive data from device 602, analyze the data, generate the data, forward the data onto another device to provide a representation of the user, etc.). Thus, service provider 702 may contain shared presence components 640, data 642, human model data 644, and/or composite spatial data 646.
Example processes
Fig. 8-16 illustrate example processes 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, and 1600 for using techniques described herein. For ease of illustration, processes 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, and 1600 will be described as being performed by a computing device. For example, one or more of the individual operations of processes 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, and 1600 may be performed by apparatus 602, service provider 702, and/or any other apparatus. However, processes 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, and 1600 may be performed in other architectures. Further, the architecture 100 may be used to perform other processes.
Processes 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, and 1600 (and each process described herein) are illustrated as logical flow diagrams, each operation of which represents a sequence of operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the operations represent computer-readable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer readable instructions contain routines, programs, objects, components, data structures, etc. that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the described processes. In addition, any number of the described operations may be omitted.
FIG. 8 illustrates an example process 800 of generating composite spatial data to maintain spatial alignment of a user in a physical environment and/or spatial alignment of representations in a coordinate system.
At 802, a computing device can facilitate a communication session with a first device and a second device. For example, a computing device may initiate a communication session between a first device and a second device. The first device may be associated with a first user and the second device may be associated with a second user. The first user may be represented in the communication session in a first representation and the second user may be represented in the communication session in a second representation. In some cases, the first user and the second user are located in the same physical environment, while in other cases, the first user and the second user are located in separate physical environments. Although two users are discussed, any number of users may be part of a communication session. To illustrate, a communication session may include a first user and a second user located in the same physical environment and a third user located in a different physical environment. The communication session may be conducted in a point-to-point (P2P) manner, with a service provider, or in any other manner.
At 804, the computing device may identify first spatial data associated with a first user. For example, the computing device may receive first spatial data from a first device (e.g., when the service provider is performing process 800), determine the first spatial data (e.g., when the first device or the service provider is performing process 800), and so on. The first spatial data may indicate a first frame of reference for the first user, a location of the first user or the first representation relative to the first frame of reference, and/or the like. The frame of reference may include a virtual frame of reference (e.g., the origin of the coordinate system of the virtual environment) and/or a physical frame of reference (e.g., an anchor point in the physical environment, such as a person, an object, etc.).
At 806, the computing device may identify second spatial data associated with a second user. For example, the computing device may receive the second spatial data from the second device (e.g., when process 800 is performed by the first device or a service provider). The second spatial data may indicate a second frame of reference for the second user, a location of the second user or the second representation relative to the second frame of reference, and so on. In some cases, the first frame of reference and the second frame of reference may be the same (e.g., a common spatial anchor point or virtual point) when the first user and the second user are located in the same physical or virtual environment.
At 808, the computing device may generate composite spatial data. The composite spatial data may contain and/or may be based on the first spatial data and the second spatial data. The composite spatial data may indicate a virtual point (e.g., in a coordinate system) shared by the first spatial data and the second spatial data, a location of the first user or the first representation relative to the virtual point, a location of the second user or the second representation relative to the virtual point, and so on. The composite spatial data may contain any number of users or representations of spatial data as part of a communication session.
At 810, the computing device may cause display of a first representation of a first user. For example, the computing device may cause the first representation to be displayed to the second user via the second device (e.g., send data to be displayed via the second device, present the first representation, etc.). The first representation may be positioned within the coordinate system based on the composite spatial data (e.g., first spatial data included in the composite spatial data). The first representation may be positioned within a coordinate system to maintain a position of the first user relative to the first frame of reference.
At 812, the computing device may cause display of a second representation of a second user. For example, the computing device may cause the second representation to be displayed to the first user via the first device (e.g., send data to be displayed via the first device, present the second representation, etc.). The second representation may be positioned within the coordinate system based on the composite spatial data (e.g., second spatial data included in the composite spatial data). The second representation may be positioned within a coordinate system to maintain a position of the second user relative to the second frame of reference. In some cases where the first user and the second user are in the same physical environment, the second representation may be positioned in a coordinate system relative to the first representation such that the position of the first representation relative to the second representation is scaled to the position of the first user relative to the second user in the physical environment.
At 814, the computing device may maintain the spatial data in the composite spatial data when the device is part of a communication session. For example, the computing device may maintain the first spatial data of the first user in the composite spatial data while the first device is connected to the communication session. The first spatial data may be removed from the composite spatial data if the first device disconnects from the communication session. This may allow the position of the first user or the first representation relative to the first frame of reference to be maintained.
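The positioning described at 808-814 can be pictured as chaining each user's offset from their own frame of reference with that frame's offset from the virtual point shared in the composite spatial data; the sketch below makes that assumption explicit, and all names and values are illustrative.

```python
import numpy as np

def position_in_coordinate_system(user_offset_from_frame, frame_offset_from_virtual_point):
    """Place a representation in the shared coordinate system by combining the user's
    position relative to their frame of reference with that frame's position relative
    to the virtual point indicated by the composite spatial data."""
    return (np.asarray(frame_offset_from_virtual_point, float)
            + np.asarray(user_offset_from_frame, float))

# Illustrative composite spatial data for two users anchored to different frames of reference.
composite = {
    "virtual_point": (0.0, 0.0, 0.0),
    "user_1": {"frame_offset": (1.0, 0.0, 0.0), "user_offset": (0.2, 0.0, 0.5)},
    "user_2": {"frame_offset": (4.0, 0.0, 0.0), "user_offset": (-0.3, 0.0, 0.1)},
}

first_representation_position = position_in_coordinate_system(
    composite["user_1"]["user_offset"], composite["user_1"]["frame_offset"])
second_representation_position = position_in_coordinate_system(
    composite["user_2"]["user_offset"], composite["user_2"]["frame_offset"])
```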
Fig. 9 illustrates an example process 900 of evaluating eye alignment of users and/or representations participating in a communication session.
At 902, a computing device can facilitate a communication session with a first device and a second device. For example, a computing device may initiate a communication session between a first device and a second device. The first device may be associated with a first user and the second device may be associated with a second user. The first user may be represented in the communication session in a first representation and the second user may be represented in the communication session in a second representation. Although two users are discussed, any number of users may be part of a communication session.
At 904, the computing device may determine a first direction in which the first user's eye is gazing (e.g., within a physical environment). For example, the computing device may receive or capture eye tracking data (from the first device) indicating a first direction in which the first user's eye is gazing. The eye tracking data may be generated by an eye tracking sensor on the first device. The first direction may be relative to a frame of reference, e.g., an origin of a coordinate system, the second user, etc. In some cases, at 904, the computing device may determine whether the first direction in which the first user's eye is gazing matches a direction in which the eye of the first representation is gazing within the virtual environment (e.g., relative to the frame of reference).
At 906, the computing device may cause display of the first representation via the second device. For example, the computing device may cause the first representation to be displayed to the second user via the second device (e.g., send data to be displayed via the second device, present the first representation, etc.). The first representation may include an eye that is looking in a direction associated with a first direction in which the first user's eye is looking in the physical environment. In other words, the eyes of the first representation may look in the same direction as the first user (e.g., relative to the frame of reference).
At 908, the computing device may cause display of the second representation via the first device. For example, the computing device may cause the second representation to be displayed to the first user via the first device (e.g., send data to be displayed via the first device, present the second representation, etc.). The second representation may include an eye that is gazing in a direction that is related to a direction in which the second user's eye is gazing in the physical environment.
At 910, the computing device may determine a second direction in which the second representation or the second user's eye is gazing in the virtual environment, as provided or determined by the first device. For example, the computing device may receive (from the first device) data indicating a second direction in which the second representation or the second user's eye is looking in the virtual environment (e.g., as displayed by the first device). The second direction may be relative to the first representation, the first user, or another frame of reference. In some cases, the data may include a first image of a view of the second representation from the perspective of the first representation or the first user (e.g., an image viewed by the first user through the first device).
At 912, the computing device may determine a third direction in which the second representation or the second user's eye is gazing in the virtual environment as provided or determined by the second device. For example, the computing device may receive (from the second device) data indicating a third direction in which the eye of the second representation is looking in the virtual environment (e.g., as displayed by the second device). The third direction may be relative to the first representation or another frame of reference. In some cases, the data may contain a second image that represents an estimated view of the second representation from the perspective of the first representation.
At 914, the computing device may evaluate the second direction and the third direction. In some cases, the computing device may represent the second direction with a first vector/line (e.g., a vector/line originating from and coming from the eyes of the second representation) and represent the third direction with a second vector/line (e.g., a vector/line originating from and coming from the eyes of the second representation). The computing device may then compare the first vector/line to the second vector/line. Additionally, in some cases, the computing device may compare a first image of a view of the second representation from the perspective of the first representation or the first user to a second image of an estimated view of the second representation from the perspective of the first representation.
At 916, the computing device may determine whether the second direction matches the third direction within a threshold amount (e.g., the second direction and the third direction are substantially aligned). The determination at 916 may be based on the evaluation at 914. In some examples, the determination at 916 may include determining whether a first vector/line representing a second direction (the second representation or the second user's eye is gazing in the second direction, as provided or determined by the first apparatus) matches a second vector/line representing a third direction (the second representation or the second user's eye is gazing in the third direction, as provided or determined by the second apparatus) by at least a threshold amount. This may include determining whether the first vector/line and the second vector/line are within a threshold distance of each other (e.g., starting from the same position), have the same angle within a threshold number of degrees (e.g., relative to the origin of the coordinate system), have the same magnitude within a threshold amount, and so on. Additionally, in some cases, the determination at 916 may include determining whether a direction in which an eye of the second representation gazes in the first image (of the view of the second representation from the perspective of the first representation or the first user) matches a direction in which an eye of the second representation gazes in the second image (of the estimated view of the second representation from the perspective of the first representation or the first user).
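For the image-based portion of the check at 916, one crude way to decide whether two images match is the fraction of pixel values that agree within a tolerance; the sketch below is a stand-in for whatever image or gaze-direction comparison an implementation actually uses, and the thresholds are assumptions.

```python
import numpy as np

def images_match(image_a, image_b, pixel_tol=10, min_matching_fraction=0.95):
    """Return True if more than a threshold fraction of pixel values are the same
    (within pixel_tol) between two equally sized images."""
    a = np.asarray(image_a, dtype=np.int32)
    b = np.asarray(image_b, dtype=np.int32)
    if a.shape != b.shape:
        return False
    return (np.abs(a - b) <= pixel_tol).mean() >= min_matching_fraction
```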
If the computing device determines that the second direction matches the third direction within a threshold amount, process 900 may proceed to 918. Alternatively, if the computing device determines that the second direction does not match the third direction within a threshold amount, process 900 may proceed to 920.
At 918, the computing device can maintain a current state of the communication session (e.g., allow the communication session to proceed as it is currently ongoing). However, in other examples, the computing device may perform other operations at 918, such as providing a notification to a user or system indicating that the communication session has a shared presence or that the user/representation is aligned.
At 920, the computing device may perform an operation associated with eye misalignment. For example, the communication session may be a first type of communication session associated with displaying a representation based on eye data (e.g., aligned eyes). Here, at 920, the computing device may switch from a first type of communication session to a second type of communication session that is not associated with eye data (e.g., does not align eyes). The second type of communication session may be a more general type of communication session that allows the first user and the second user to communicate without eye alignment, e.g., a communication session that uses a representation of the users without eye alignment, a communication session that uses video or voice (e.g., video conferencing, voice conferencing, etc.). In another example, at 920, the computing device may cause the first device and/or the second device to reboot, cause an eye tracking sensor, a depth sensor, or other sensor associated with the first device and/or the second device to reboot, and so on. This may assist in resetting components participating in the communication session in an attempt to reestablish the shared presence including eye alignment.
In some examples, process 900 may be performed periodically and/or in response to another event, such as a user joining a communication session. Further, process 900 may be performed for each device or user that is part of a communication session (e.g., to maintain shared presence for any number of users).
Fig. 10 illustrates an example process 1000 of evaluating communication sessions to determine whether users or representations are gazing at each other.
At 1002, a computing device may facilitate a communication session with a first device and a second device. For example, the computing device may calibrate the first device and the second device to have eye alignment for the communication session. The first device may be associated with a first user and the second device may be associated with a second user. The first user may be represented in a first representation within a coordinate system and the second user may be represented in a second representation within the coordinate system.
At 1004, the computing device may determine that the first user perceives that the eye of the second representation is gazing at the eye of the first user. In one example, a computing device may receive user input (e.g., originating from a first user) indicating that the first user perceives that an eye of the second representation is gazing at an eye of the first user. In another example, a computing device may receive (or capture) eye tracking data of a first user from an eye tracking sensor associated with a first device. Based on the eye tracking data, the computing device may determine a direction in which the first user's eyes are gazing. The computing device may determine that the direction in which the first user's eyes are looking is toward the eyes of the second representation displayed by the first device.
At 1006, the computing device may evaluate a first direction in which the second representation of the eye is gazing and a second direction in which the first representation of the eye is gazing. In one example, the computing device may evaluate a first direction in which the second represented eye is gazing in the coordinate system displayed by the first device and a second direction in which the first represented eye is gazing in the coordinate system displayed by the second device. To do so, the computing device may represent the first direction with a first vector/line (e.g., a vector/line originating from and coming from the eyes of the second representation) and represent the second direction with a second vector/line (e.g., a vector/line originating from and coming from the eyes of the first representation). The computing device may then compare the first vector/line to the second vector/line.
At 1008, the computing device may determine whether the first user has made eye contact with the second representation. In one example, the computing device may determine, based on the evaluation at 1006, whether the first vector/line matches the second vector/line within a threshold amount (e.g., whether the first vector/line is aligned with or pointed at the second vector/line, e.g., within a threshold number of degrees along the same axis).
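One way to realize the eye-contact check at 1008 is to treat each gaze as a ray and ask whether each ray points at the other party's eyes within a threshold number of degrees; the following sketch assumes eye positions and gaze directions are available in the shared coordinate system, and its names and thresholds are illustrative.

```python
import numpy as np

def angle_between_deg(v1, v2):
    v1, v2 = np.asarray(v1, float), np.asarray(v2, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def eye_contact_made(first_eye_pos, first_gaze_dir,
                     second_eye_pos, second_gaze_dir,
                     max_angle_deg=5.0):
    """True if each gaze ray points toward the other party's eyes within the threshold."""
    first_to_second = np.asarray(second_eye_pos, float) - np.asarray(first_eye_pos, float)
    return (angle_between_deg(first_gaze_dir, first_to_second) <= max_angle_deg and
            angle_between_deg(second_gaze_dir, -first_to_second) <= max_angle_deg)
```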
If the computing device determines that the first user has made eye contact with the second representation, process 1000 may proceed to 1010. Alternatively, if the computing device determines that the first user has not made eye contact with the second representation, process 1000 may proceed to 1012.
At 1010, the computing device can utilize eye alignment to facilitate the communication session. For example, the computing device may allow the communication session to proceed with eye alignment (e.g., initiate the communication session with eye alignment). However, in other examples, the computing device may perform other operations at 1010, such as providing a notification to a user or system indicating that the communication session has established eye alignment.
At 1012, the computing device may perform an operation associated with eye misalignment. For example, the computing device may provide a notification to the user or system indicating that the communication session has not established eye alignment. Alternatively or additionally, the computing device may initiate a communication session without eye alignment, cause the first device and/or the second device to restart, cause the eye tracking sensor, the depth sensor, or other sensors associated with the first device and/or the second device to restart, etc.
In some examples, process 1000 occurs as part of a calibration process when the first device and/or the second device join the communication session. For example, the first user and the second user may be required to look at the eyes of each other (e.g., one user looking at the eyes of the representation of the other user) and confirm eye contact through voice input, buttons on a controller, or the like. In other examples, process 1000 may be performed at other times.
Fig. 11 illustrates an example process 1100 of evaluating an output of a representation as part of a communication session.
At 1102, a computing device can facilitate a communication session with a first device and a second device. For example, a computing device may initiate a communication session between a first device and a second device. The first device may be associated with a first user and the second device may be associated with a second user. The first user may be represented in a first representation within a coordinate system and the second user may be represented in a second representation within the coordinate system.
At 1104, the computing device may identify a first location or orientation of the output of the second representation as perceived by the first user. For example, the computing device may receive data from the first device indicating (or may determine) a first location or orientation of the output of the second representation as perceived by the first user. The output may include sound, touch, displayed eyes, and the like.
At 1106, the computing device may identify a second location or orientation of the output of the second representation as perceived by the second user. For example, the computing device may receive data from the second device indicating (or may determine) a second location or orientation of the output of the second representation as perceived by the second user. The output may include sound, touch, displayed eyes, and the like.
At 1108, the computing device may evaluate a first position or orientation of the output and a second position or orientation of the output. For example, the computing device may compare a first position or orientation of the output to a second position or orientation of the output.
At 1110, the computing device may determine whether the first position or orientation matches the second position or orientation. For example, based on the evaluation at 1108, the computing device may determine whether the first location or orientation matches the second location or orientation within a threshold amount.
If the computing device determines that the first position or orientation matches the second position or orientation, process 1100 may proceed to 1112. Alternatively, if the computing device determines that the first position or orientation does not match the second position or orientation, process 1100 may proceed to 1114.
At 1112, the computing device can utilize the alignment to facilitate the communication session. For example, the computing device may allow the communication session to proceed with alignment (e.g., initiate the communication session with alignment). However, in other examples, the computing device may perform other operations at 1112, such as providing a notification to a user or system indicating that the communication session has established alignment.
At 1114, the computing device can perform an operation associated with the misalignment. For example, the computing device may provide a notification to the user or system indicating that the communication session has not established alignment or is out of alignment. Alternatively or additionally, the computing device may initiate or continue the communication session without alignment, cause the first device and/or the second device to restart, cause the eye tracking sensor, the depth sensor, the haptic suit, the speaker or other sensors/components associated with the first device and/or the second device to restart, or the like.
In some examples, process 1100 occurs as part of a calibration process when the first device and/or the second device join the communication session. In other examples, process 1100 may be performed at other times, such as any time during a communication session.
Fig. 12 illustrates an example process 1200 of evaluating sound associated with a representation as part of a communication session.
At 1202, a computing device can facilitate a communication session utilizing a first device and a second device. For example, a computing device may initiate a communication session between a first device and a second device. The first device may be associated with a first user and the second device may be associated with a second user. The first user may be represented in a first representation within a coordinate system and the second user may be represented in a second representation within the coordinate system.
At 1204, the computing device may identify a first direction in which sound from the oral cavity of the second representation is perceived by the first user within the coordinate system. For example, the computing device may receive (or determine) data from the first device indicating the first direction. In some cases, the data describes a vector/line representing the first direction. To illustrate, the computing device may analyze audio data indicative of a location to which audio from a speaker of the first device is directed and/or oral cavity data indicative of a location, within the coordinate system, of the oral cavity of the second representation as perceived by the first user. Based on such analysis, the computing device may determine the first direction.
At 1206, the computing device may identify a second direction in which the second device outputs sound from the oral cavity of the second representation within the coordinate system. For example, the computing device may receive (or determine) data from the second device indicating the second direction. In some cases, the data describes a vector/line representing the second direction. To illustrate, the computing device may analyze audio data sent by the second device describing the location from which sound for the oral cavity of the second representation is output and/or oral cavity data sent by the second device describing the location of the oral cavity of the second representation within the coordinate system.
At 1208, the computing device may evaluate the first direction and the second direction. For example, the computing device may represent the first direction with a first vector/line (e.g., a vector/line originating from the oral cavity of the second representation as perceived by the first device) and represent the second direction with a second vector/line (e.g., a vector/line originating from the oral cavity of the second representation as provided by the second device). The computing device may then compare the first vector/line to the second vector/line. In some examples, the first vector may contain a magnitude corresponding to a loudness of the sound of the oral cavity from the second representation perceived by the first user or output by the first device. At the same time, the second vector may have a magnitude corresponding to the loudness of the sound from the oral cavity of the second representation provided by the second device.
At 1210, the computing device may determine whether the first direction matches the second direction. For example, based on the evaluation at 1208, the computing device may determine whether the first direction matches the second direction within a threshold amount. In some cases, the determination at 1210 may include determining whether the first vector/line matches the second vector/line by at least a threshold amount. This may include determining whether the first vector/line and the second vector/line are within a threshold distance of each other (e.g., starting from the same position), have the same angle within a threshold number of degrees (e.g., relative to the origin of the coordinate system), have the same magnitude within a threshold amount, and so on.
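The comparison at 1208-1210 can reuse the same ray-matching idea; in the sketch below the direction and the loudness (which the description above folds into the vector's magnitude) are compared separately for readability, and the units and thresholds are assumptions.

```python
import numpy as np

def sound_output_matches(dir_perceived, loudness_perceived_db,
                         dir_output, loudness_output_db,
                         max_angle_deg=10.0, max_loudness_diff_db=6.0):
    """Compare the perceived direction/loudness of sound from the representation's mouth
    (first device) with the direction/loudness the second device says it output."""
    v1, v2 = np.asarray(dir_perceived, float), np.asarray(dir_output, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return (angle <= max_angle_deg and
            abs(loudness_perceived_db - loudness_output_db) <= max_loudness_diff_db)
```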
If the computing device determines that the first direction matches the second direction, process 1200 may proceed to 1212. Alternatively, if the computing device determines that the first direction does not match the second direction, process 1200 may proceed to 1214.
At 1212, the computing device may facilitate the communication session with the alignment. For example, the computing device may allow the communication session to proceed with alignment (e.g., initiate the communication session with alignment). However, in other examples, the computing device may perform other operations at 1212, such as providing a notification to a user or system indicating that the communication session has established alignment.
At 1214, the computing device may perform an operation associated with the misalignment. For example, the computing device may provide a notification to the user or system indicating that the communication session has not established alignment or is out of alignment. Alternatively or additionally, the computing device may initiate or continue the communication session without alignment, cause the first device and/or the second device to restart, cause the eye tracking sensor, the depth sensor, the haptic device, the speaker, the microphone, or other sensors/components associated with the first device and/or the second device to restart, or the like.
In some examples, the process 1200 occurs as part of a calibration process when the first apparatus and/or the second apparatus join the communication session. In other examples, process 1200 may be performed at other times, such as any time during a communication session.
FIG. 13 illustrates an example process 1300 of evaluating a touch associated with a representation as part of a communication session.
At 1302, a computing device can facilitate a communication session with a first device and a second device. For example, a computing device may initiate a communication session between a first device and a second device. The first device may be associated with a first user and the second device may be associated with a second user. The first user may be represented in a first representation within a coordinate system and the second user may be represented in a second representation within the coordinate system. In some examples, the first device and/or the second device include a haptic device configured to detect a touch from a user and/or apply a force/vibration to the user to simulate a touch or other contact on the user.
At 1304, the computing device may identify a first location at which the touch is perceived from the second representation. For example, the computing device may receive (or determine) data from the first device indicative of the first location. In some cases, the data describes a vector/line representing the first location. To illustrate, the computing device may receive data from the first device indicating a location on the first user's body where a touch is perceived from the second representation (by the first user via the first device) within the coordinate system. In other words, the data may indicate where the second representation or second user touched the first user's body. In some examples, the computing device may analyze an image of a view seen by the first user via the first device to determine where the second representation appears to touch the first user or the first representation.
At 1306, the computing device may identify a second location on the first representation at which the touch is provided by the second representation. For example, the computing device may receive (or determine) data indicative of the second location from the second device. In some cases, the data describes a vector/line representing the second location. To illustrate, the computing device may receive, from the second device, data indicative of a location on the body of the first representation where the second user or the second representation contacted the first representation. In other words, the data may indicate where the second representation or second user touched the first user's body. In some examples, the computing device may analyze an image of a view seen by a second user via the second device to determine where the second representation appears to touch the first user or the first representation.
At 1308, the computing device may evaluate the first location and the second location. For example, the computing device may represent the first location with a first vector/line (e.g., a vector/line originating from a finger of the second representation as perceived by the first device) and the second location with a second vector/line (e.g., a vector/line originating from a finger of the second representation as provided by the second device). The computing device may then compare the first vector/line to the second vector/line. In some examples, the first vector may contain a magnitude corresponding to a force of a touch sensed on the first representation. Meanwhile, the second vector may have a magnitude corresponding to a force provided by the second representation or second user (e.g., an amount of force provided by the second user through a haptic suit or other input device).
At 1310, the computing device may determine whether the first location matches the second location. For example, the computing device may determine whether the first location matches the second location within a threshold amount based on the evaluation at 1308. In some cases, the determination at 1310 may include determining whether the first vector/line matches the second vector/line by at least a threshold amount. This may include determining whether the first vector/line and the second vector/line are within a threshold distance of each other (e.g., starting from the same location), have the same angle within a threshold number of degrees (e.g., relative to the origin of the coordinate system), have the same magnitude within a threshold amount, and so on. In some cases, the direction of the touch may correspond to the direction of the applied force, e.g., one representation pushes the other representation in one direction.
If the computing device determines that the first location matches the second location, process 1300 may proceed to 1312. Alternatively, if the computing device determines that the first location does not match the second location, process 1300 may proceed to 1314.
At 1312, the computing device may utilize the alignment to facilitate the communication session. For example, the computing device may allow the communication session to proceed with alignment (e.g., initiate the communication session with alignment). However, in other examples, the computing device may perform other operations at 1312, such as providing a notification to a user or system indicating that the communication session has established alignment.
At 1314, the computing device may perform an operation associated with the misalignment. For example, the computing device may provide a notification to the user or system indicating that the communication session has not established alignment or is out of alignment. Alternatively or additionally, the computing device may initiate or continue the communication session without alignment, cause the first device and/or the second device to restart, cause the eye tracking sensor, the depth sensor, the haptic device, the speaker, the microphone, or other sensors/components associated with the first device and/or the second device to restart, or the like.
In some examples, the process 1300 occurs as part of a calibration process when the first device and/or the second device joins the communication session. In other examples, process 1300 may be performed at other times, such as at any time during a communication session.
FIG. 14 illustrates an example process 1400 of creating a human model and using the human model to evaluate user interactions.
At 1402, a computing device may capture data about a user. The data may include an image of the user, depth data of the user, eye tracking data of the user, and the like.
At 1404, the computing device can create a human model (e.g., human model data) based on the data captured at 1402. The human model may represent one or more body parts of the user. For example, the body parts of the human model may be scaled to the size, shape, location, etc. of the corresponding body parts of the user.
At 1406, the computing device may cause a representation of the user to be displayed via a device. In some examples, the device is the computing device itself, while in other examples it is another device. The representation may be based on the human model. For example, the body parts of the representation may be scaled to the size, shape, location, etc. of the body parts of the human model. In some cases, if the human model contains only information about the user's head and hands, the representation may contain only a head and hands.
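As one simplified way to picture 1402-1406, the sketch below builds a minimal human model from captured measurements and scales a default representation to it. The field names, data layout, and scaling approach are illustrative assumptions, not the method described above.

```python
from dataclasses import dataclass, field

@dataclass
class BodyPart:
    name: str
    size: float      # e.g., measured length in meters
    position: tuple  # (x, y, z) relative to the capturing device's coordinate system

@dataclass
class HumanModel:
    parts: dict = field(default_factory=dict)  # name -> BodyPart

def build_human_model(measurements: dict) -> HumanModel:
    """Create a human model from measurements derived from image/depth/eye-tracking data."""
    model = HumanModel()
    for name, (size, position) in measurements.items():
        model.parts[name] = BodyPart(name, size, position)
    return model

def scale_representation(model: HumanModel, default_sizes: dict) -> dict:
    """Compute per-part scale factors so the displayed representation matches the modeled user."""
    return {name: model.parts[name].size / default_sizes[name]
            for name in model.parts if name in default_sizes}

# Example: only head and hands were captured, so only those parts are represented.
measurements = {"head": (0.24, (0.0, 1.7, 0.0)),
                "left_hand": (0.19, (-0.3, 1.1, 0.2)),
                "right_hand": (0.19, (0.3, 1.1, 0.2))}
model = build_human_model(measurements)
print(scale_representation(model, {"head": 0.22, "left_hand": 0.18, "right_hand": 0.18}))
```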
At 1408, the computing device may determine that a signal indicative of movement has not been received from the input device for a period of time. For example, because the computing device has not received a signal from the input device indicative of movement of the input device, the computing device may determine that the input device has not moved for more than a threshold amount of time. The input device may be a controller or any other input device.
At 1410, the computing device may determine a position of the input device relative to a device associated with the input device (e.g., a device paired with or otherwise associated with the input device). For example, in response to determining that a signal indicative of movement has not been received from the input device within a period of time at 1408, the computing device may determine a distance/proximity of the input device to a device associated with the input device.
At 1412, the computing device may determine that the input device is located more than a threshold distance from the device. For example, based on the human model and the position of the input device relative to the device, the computing device may determine that the input device is located more than a threshold distance from the device or from a user using the device. Here, the computing device may reference the human model to determine that the body part that uses the input device is relatively far from the device (e.g., the controller is relatively far from the headset).
At 1414, the computing device may cause the representation to be displayed without displaying the body part associated with the input device. For example, if the input device is a hand controller and the device is a headset, and it has been determined at 1412 that the hand controller is far from the headset (e.g., indicating that the user has set the hand controller down), the computing device may cause the representation to be displayed without displaying the hand. In some cases, for example when the communication session is conducted by a service provider, this includes sending instructions to the output device to not display the body part representation.
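The following sketch illustrates the 1408-1414 logic: when a hand controller has reported no movement for a while and sits farther from the headset than the user's modeled arm reach, the hand representation is omitted. The timeout, distance margin, and function names are assumptions for illustration only.

```python
import time
from typing import Optional

IDLE_TIMEOUT_S = 5.0    # assumed: how long without movement before checking distance
REACH_MARGIN_M = 0.15   # assumed: slack added to the modeled arm length

def should_hide_hand(last_movement_ts: float,
                     controller_to_headset_m: float,
                     modeled_arm_length_m: float,
                     now: Optional[float] = None) -> bool:
    """Return True when the controller appears to have been set down by the user."""
    now = time.time() if now is None else now
    idle = (now - last_movement_ts) >= IDLE_TIMEOUT_S
    out_of_reach = controller_to_headset_m > (modeled_arm_length_m + REACH_MARGIN_M)
    return idle and out_of_reach

# Example: no movement for 8 s and the controller sits 1.4 m from the headset,
# while the human model gives a 0.7 m arm length -> hide the hand representation.
print(should_hide_hand(last_movement_ts=100.0, controller_to_headset_m=1.4,
                       modeled_arm_length_m=0.7, now=108.0))  # True
```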
FIG. 15 illustrates an example process 1500 of causing a representation to be displayed with an eye representation gazing in a direction corresponding to the direction in which a user is gazing in a physical environment.
At 1502, a computing device can facilitate a communication session with a first device and a second device. For example, a computing device may initiate a communication session between a first device and a second device. The first device may be associated with a first user and the second device may be associated with a second user. The first user may be represented by a first representation and the second user may be represented by a second representation. The communication session may comprise a video telephony session, a virtual reality session, an augmented reality session, or any other session.
In some examples, the first representation includes data from a first image of the first user and/or the second representation includes data from a second image of the second user. Additionally, in some examples, the first representation is displayed overlaid on a face depicted in a real-time image of the first user and/or the second representation is displayed overlaid on a face depicted in a real-time image of the second user. Further, in some examples, the first representation represents a face of the first user and/or the second representation represents a face of the second user.
At 1504, the computing device may cause display of the second representation via the first device and/or cause display of the first representation via the second device. For example, the computing device may generate the first representation and/or transmit the first representation to the second device. Additionally, in some examples, when the computing device determines that the direction in which the first user's eye is gazing is at the eye representation of the second representation displayed by the first device, the computing device may cause the first representation to be displayed via the second device, wherein the eye representation of the first representation gazes at the eye of the second user.
At 1506, the computing device may receive first data for the first user from the first device. The first data may comprise a first image, first depth data, etc.
At 1508, the computing device may determine a first direction in which the first user's eye is gazing relative to the second representation displayed by the first device. This may be based on the first data.
At 1510, the computing device may receive second data for a second user from a second device. The second data may comprise a second image, second depth data, etc.
At 1512, the computing device may determine a position and orientation of the second user relative to the second device based on the second data.
At 1514, the computing device may cause the first representation to be displayed via the second device with an eye representation gazing, relative to the second user, in a second direction that is aligned with the first direction in which the eye of the first user gazes relative to the second representation displayed by the first device. This may be based on the first direction in which the first user's eye is gazing relative to the second representation displayed by the first device and/or the position and orientation of the second user relative to the second device.
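One simplified way to realize 1508-1514 is to note where the first user's gaze lands on the displayed second representation (e.g., its eyes or mouth) and then aim the first representation's eyes at the corresponding point on the real second user. The sketch below is a geometric illustration under that assumption; the anchor-point scheme and names are not taken from the description above.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def gaze_target_on_representation(gaze_origin, gaze_dir, anchors):
    """Pick the displayed anchor (e.g., 'eyes', 'mouth') closest to the first user's gaze ray."""
    def ray_distance(p):
        # distance from anchor p to the ray (origin + t * dir), t >= 0
        d = tuple(pi - oi for pi, oi in zip(p, gaze_origin))
        t = max(0.0, sum(di * gi for di, gi in zip(d, gaze_dir)))
        closest = tuple(oi + t * gi for oi, gi in zip(gaze_origin, gaze_dir))
        return math.dist(p, closest)
    return min(anchors, key=lambda name: ray_distance(anchors[name]))

def retarget_eyes(representation_eye_pos, second_user_anchors, target_name):
    """Return the unit direction the first representation's eyes should gaze on the second device."""
    target = second_user_anchors[target_name]
    return normalize(tuple(t - e for t, e in zip(target, representation_eye_pos)))

# Example: the first user looks at the second representation's eyes, so the first
# representation (as shown to the second user) gazes at the second user's eyes.
displayed_anchors = {"eyes": (0.0, 1.6, 2.0), "mouth": (0.0, 1.5, 2.0)}
target = gaze_target_on_representation((0.0, 1.6, 0.0), (0.0, 0.0, 1.0), displayed_anchors)
real_user_anchors = {"eyes": (0.2, 1.65, 1.2), "mouth": (0.2, 1.55, 1.2)}
print(retarget_eyes((0.0, 1.6, 0.0), real_user_anchors, target))
```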
FIG. 16 illustrates an example process 1600 of causing a representation to be displayed with an eye representation gazing in a direction corresponding to the direction in which a user is gazing in a physical environment.
At 1602, a first device can facilitate a communication session between the first device and a second device. The first device may be associated with a first user and the second device may be associated with a second user. The first user may be represented by a first representation and the second user may be represented by a second representation.
At 1604, the first device may capture first data of the first user using the sensor. The first data may comprise an image, depth data, etc.
At 1606, the first device may receive second data from the second device and/or a service provider, the second data indicating a direction in which an eye representation of the second representation should gaze relative to the first user. The direction may be aligned with another direction in which the second user's eye is gazing relative to the first representation displayed via the second device.
At 1608, the first device can determine a position and orientation of the first user relative to a sensor of the first device. This may be based on the first data.
At 1610, the first device may cause the second representation to be displayed via the first device, wherein the eye representation of the second representation gazes in the direction indicated in the second data. This may be based on the first data and/or the direction in which the eye representation of the second representation should gaze relative to the first user.
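On the device side (1604-1610), the received gaze direction is expressed relative to the first user, so the first device must transform it using the first user's pose before animating the second representation's eyes. The sketch below is a simplified, yaw-only illustration under that assumption; the function names and frame conventions are hypothetical.

```python
import math

def rotate_yaw(v, yaw_rad):
    """Rotate a 3D vector about the vertical (y) axis."""
    x, y, z = v
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return (c * x + s * z, y, -s * x + c * z)

def to_display_frame(gaze_dir_user_frame, user_yaw_rad, representation_position):
    """Convert a gaze direction given relative to the first user into the device's display frame.

    gaze_dir_user_frame: direction received in the second data, relative to the first user.
    user_yaw_rad: orientation of the first user determined from the sensor data at 1608.
    representation_position: where the second representation's eyes are rendered by the first device.
    """
    world_dir = rotate_yaw(gaze_dir_user_frame, user_yaw_rad)
    # Return the eye origin and the direction to animate the displayed eyes toward.
    return representation_position, world_dir

# Example: the second data says "gaze straight at the first user"; with the first user
# rotated 90 degrees, the rendered eyes are animated along the rotated direction.
origin, direction = to_display_frame((0.0, 0.0, 1.0), math.pi / 2, (0.0, 1.6, 2.0))
print(origin, direction)
```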
The foregoing may also be understood in view of the following:
1. a method, comprising:
initiating a communication session utilizing a first device associated with a first user and a second device associated with a second user, the first user being represented in a first representation in the communication session and the second user being represented in a second representation in the communication session, the communication session being associated with at least one virtual environment;
determining, by the first device, a first direction in which the first user's eye is gazing within a physical environment;
causing, by the first device, display, via the second device, the first representation including an eye gazed in a second direction related to the first direction in which the eye of the first user gazed in the physical environment;
causing, by the first device, display of a second representation within the at least one virtual environment;
determining, by the first apparatus, a third direction of gaze of the eye of the second representation relative to the first representation in the at least one virtual environment as displayed by the first apparatus, the third direction being related to a fourth direction of gaze of the eye of the second user in the physical environment or another physical environment;
receiving data from the second apparatus indicating a fifth direction in which the eye of the second representation gazes relative to the first representation in the at least one virtual environment as provided by the second apparatus; and
determining that the third direction of the eye gaze of the second representation as displayed by the first apparatus matches the fifth direction of the eye gaze of the second representation as provided by the second apparatus.
2. The method of clause 1, wherein the determining that the third direction matches the fifth direction comprises:
representing the third direction of the eye gaze of the second representation as displayed by the first device as a first vector;
representing the fifth direction of the eye gaze of the second representation as provided by the second apparatus as a second vector;
comparing the first vector to the second vector; and
determining, based at least in part on the comparison, that the first vector matches the second vector.
3. The method of clause 1, further comprising:
capturing a first image of a view of the second representation from the perspective of the first representation,
wherein the receiving the data indicating the fifth direction from the second device comprises: receiving a second image from the second device, the second image representing an estimated view of the second representation from the perspective of the first representation; and
wherein the determining that the third direction matches the fifth direction comprises:
comparing the first image with the second image; and
determining that a direction in which the eye of the second representation is gazed in the first image matches a direction in which the eye of the second representation is gazed in the second image.
4. The method of clause 1, wherein the first user and the second user are located in the physical environment.
5. The method of clause 4, further comprising:
receiving data from the second apparatus indicating the fourth direction, the fourth direction being relative to the first user; and
determining that the fifth direction of the eye gaze of the second representation as provided by the second apparatus matches the fourth direction in which the eye of the second user gazes relative to the first user within the physical environment.
6. The method of clause 1, wherein the second user is located in the other physical environment.
7. A system, comprising:
one or more processors; and
a memory communicatively coupled to the one or more processors and storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
facilitating a communication session utilizing a first device associated with a first user and a second device associated with a second user, the first user represented in a first representation within a coordinate system and the second user represented in a second representation within the coordinate system;
determining a first direction in the coordinate system in which the second representation or at least one eye of the second user is looking;
receiving data from the second device indicating a second direction in the coordinate system in which the second representation or the at least one eye of the second user is looking; and
evaluating the first direction and the second direction.
8. The system of clause 7, wherein the operations further comprise:
determining, based at least in part on the evaluation, that the first direction matches the second direction; and
maintaining the first device and the second device in the communication session.
9. The system of clause 7, wherein the communication session is a first type of communication session associated with displaying the second representation based at least in part on eye data, and the operations further comprise:
determining, based at least in part on the evaluation, that the first direction does not match the second direction; and
switching the communication session from the first type of communication session to a second type of communication session that does not display the second representation based at least in part on the eye data.
10. The system of clause 7, wherein the communication session is a first type of communication session associated with displaying the second representation based at least in part on eye data, and the operations further comprise:
determining, based at least in part on the evaluation, that the first direction does not match the second direction; and
causing at least one of the first device or the second device to be re-activated or causing an eye tracking sensor associated with at least one of the first device or the second device to be re-activated.
11. The system of clause 7, wherein the evaluating comprises:
representing the first direction with a first vector;
representing the second direction with a second vector; and
comparing the first vector to the second vector.
12. The system of clause 7, wherein:
the determining the first direction comprises capturing a first image of a view of the second representation from a perspective of the first representation;
the receiving the data from the second device comprises: receiving a second image from the second device, the second image representing an estimated view of the second representation from the perspective of the first representation; and
the evaluating includes comparing the first image to the second image.
13. The system of clause 7, wherein the operations further comprise:
receiving data about the first user, the data comprising at least one of an image of the first user, depth data of the first user, or eye tracking data of the first user;
creating a human model representing the first user based at least in part on the data; and
causing display, via the second device, of the first representation for the first user, the first representation based at least in part on the human model.
14. The system of clause 7, wherein the operations further comprise:
determining that a signal indicative of movement has not been received from an input device for a period of time, the input device being associated with the first device;
in response to determining that the signal indicative of movement has not been received from the input device within the period of time, determining a position of the first device relative to the input device;
determining that the input device is located more than a threshold distance from the first user based at least in part on a human model and the position of the first device relative to the input device; and
causing display of the first representation via the second device without displaying a body part representation associated with the input device.
15. A system, comprising:
one or more processors; and
a memory communicatively coupled to the one or more processors and storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
initiating a communication session with a first device associated with a first user and a second device associated with a second user, the first user being associated with a first representation within a coordinate system and the second user being associated with a second representation within the coordinate system;
determining that the first user perceives the eye of the second representation to be gazing at the first user's eye;
evaluating a first direction in which the eye of the second representation as displayed by the first device is gazing in the coordinate system and a second direction in which the eye of the first representation as displayed by the second device is gazing in the coordinate system; and
determining, based at least in part on the evaluation, whether the first user has made eye contact with the second representation.
16. The system of clause 15, wherein the determining that the first user perceives the eye of the second representation to be gazing at the eye of the first user comprises: receiving an input from the first user indicating that the first user perceives the eye of the second representation to be gazing at the eye of the first user.
17. The system of clause 15, wherein the determining that the first user perceives the eye of the second representation to be gazing at the eye of the first user comprises:
receiving eye tracking data for the first user from an eye tracking sensor associated with the first apparatus;
determining a direction in which the eye of the first user is gazing based at least in part on the eye tracking data; and
determining that the direction in which the eye of the first user is gazing is toward the eye of the second representation displayed by the first apparatus.
18. The system of clause 15, wherein the evaluating occurs as part of a calibration process for at least one of the first device or the second device.
19. One or more non-transitory computer-readable media storing executable instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
facilitating a communication session utilizing a first device associated with a first user and a second device associated with a second user, the first user associated with a first representation within a coordinate system and the second user associated with a second representation within the coordinate system;
identifying a first position or orientation of an output of the second representation as perceived by the first user;
receiving data from the second device indicative of a second position or orientation of the output of the second representation as perceived by the second user; and
evaluating the first location or orientation and the second location or orientation to determine whether the first location or orientation matches the second location or orientation.
20. The one or more non-transitory computer-readable media of clause 19, wherein the output of the second representation comprises at least one of a sound, a touch, or a displayed eye.
21. The one or more non-transitory computer-readable media of clause 19, wherein the first device and the second device are located in the same physical environment.
22. A system, comprising:
one or more processors; and
a memory communicatively coupled to the one or more processors and storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
initiating a communication session with a first device associated with a first user and a second device associated with a second user, the first user being associated with a first representation within a coordinate system and the second user being associated with a second representation within the coordinate system;
determining a first direction in which sound from the mouth of the second representation is perceived by the first user within the coordinate system;
receiving, from the second device, data indicative of a second direction in which the second device outputs the sound from the mouth of the second representation within the coordinate system; and
evaluating the first direction and the second direction to determine whether the first direction matches the second direction.
23. The system of clause 22, wherein the evaluating the first direction and the second direction comprises:
representing the first direction as a first vector;
representing the second direction as a second vector;
comparing the first vector to the second vector; and
determining whether the first vector and the second vector match based at least in part on the comparison.
24. The system of clause 23, wherein:
the magnitude of the first vector comprises a loudness of the sound from the mouth of the second representation as perceived by the first user; and
the magnitude of the second vector comprises a loudness of the sound from the mouth of the second representation as output by the second device.
25. A system, comprising:
one or more processors; and
a memory communicatively coupled to the one or more processors and storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
initiating a communication session with a first device associated with a first user and a second device associated with a second user, the first user being associated with a first representation within a coordinate system and the second user being associated with a second representation within the coordinate system;
determining a first location on the first representation at which a touch is perceived from the second representation;
receiving, from the second device, data indicative of a second location on the first representation at which a touch was provided by the second representation; and
evaluating the first location and the second location to determine whether the first location matches the second location.
26. The system of clause 25, wherein the evaluating the first location and the second location comprises:
representing the first location as a first vector;
representing the second location as a second vector;
comparing the first vector to the second vector; and
determining whether the first vector and the second vector match based at least in part on the comparison.
27. The system of clause 26, wherein:
the magnitude of the first vector comprises a force of the touch perceived on the first representation; and
the magnitude of the second vector comprises the force of the touch provided by the second representation.
28. A method, comprising:
initiating a communication session utilizing a first device associated with a first user, a second device associated with a second user, and a third device associated with a third user, the first device and the second device being located in a first physical environment, and the third device being located in a second physical environment, the first user being associated with a first representation, the second user being associated with a second representation, and the third user being associated with a third representation;
identifying, by the first device, first spatial data indicative of a first frame of reference for the first user and the second user, a location of the first user or the first representation relative to the first frame of reference, and a location of the second user or the second representation relative to the first frame of reference, the first frame of reference comprising at least one of a common spatial anchor point or virtual point in the first physical environment;
receiving, by the first device and from the second device, second spatial data indicating a second frame of reference for the third user and a position of the third user or the third representation relative to the second frame of reference, the second frame of reference comprising at least one of an anchor point or another virtual point in the second physical environment;
generating, by the first device, composite spatial data, the composite spatial data including the first spatial data and the second spatial data;
presenting, by the first device, the second representation for the second user within a coordinate system, the second representation for the second user positioned within the coordinate system based at least in part on the first spatial data included in the composite spatial data; and
presenting, by the first device, the third representation for the third user within the coordinate system, the third representation for the third user positioned within the coordinate system based at least in part on second spatial data included in the composite spatial data.
29. The method of clause 28, wherein the presenting the second representation for the second user comprises positioning the second representation of the second user in the coordinate system relative to the first representation of the first user such that the position of the first user relative to the first frame of reference and the position of the second user relative to the first frame of reference are maintained.
30. The method of clause 28, wherein the presenting the second representation for the second user comprises positioning the second representation of the second user in the coordinate system relative to the first representation of the first user such that a location of the first representation relative to the second representation is scaled to a location of the first user relative to the second user in the first physical environment.
31. The method of clause 28, wherein the composite spatial data indicates:
a virtual point shared by the first spatial data and the second spatial data;
the location of the first user or the first representation relative to the virtual point of the composite spatial data;
the location of the second user or the second representation relative to the virtual point of the composite spatial data; and
the location of the third user or the third representation relative to the virtual point of the composite spatial data.
32. The method of clause 28, further comprising:
maintaining the third spatial data in the composite spatial data when the third device is part of the communication session.
33. The method of clause 28, wherein the first device, the second device, and the third device each comprise at least one of a virtual reality headset or a mobile device.
34. The method of clause 28, wherein the communication session comprises a peer-to-peer communication session.
35. The method of clause 28, wherein the communication session is implemented at least in part via a service provider.
36. A system, comprising:
one or more processors; and
a memory communicatively coupled to the one or more processors and storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
initiating a communication session utilizing a first device associated with a first user, a second device associated with a second user, and a third device associated with a third user, the first device and the second device being located in a first physical environment, and the third device being located in a second physical environment, the first user being associated with a first representation, the second user being associated with a second representation, and the third user being associated with a third representation;
determining a location of the first user in the first physical environment relative to a location of the second user in the first physical environment;
causing display of the second representation for the second user within a coordinate system, the second representation positioned within the displayed view of the first device based at least in part on the location of the first user in the first physical environment relative to the location of the second user in the first physical environment; and
causing display of a third representation for the third user within the coordinate system.
37. The system of clause 36, wherein the operations further comprise:
generating composite spatial data indicative of a first frame of reference for the first user and the second user, a position of the first user or the first representation relative to the first frame of reference, a position of the second user or the second representation relative to the first frame of reference, a second frame of reference for the third user, and a position of the third user or a third representation of the third user relative to the second frame of reference,
wherein said causing of display of said second representation and said causing of display of said third representation are based, at least in part, on said composite spatial data.
38. The system of clause 37, wherein the composite spatial data indicates:
virtual points;
the location of the first user or the first representation relative to the virtual point;
the location of the second user or the second representation relative to the virtual point; and
the location of the third user or the third representation relative to the virtual point.
39. The system of clause 36, wherein the operations further comprise:
generating composite spatial data comprising spatial data for the first user, spatial data for the second user, and spatial data for the third user,
wherein said causing of display of said second representation and said causing of display of said third representation are based, at least in part, on said composite spatial data.
40. The system of clause 36, wherein the operations further comprise:
capturing data about the first user, the data comprising at least one of an image of the first user, depth data of the first user, or eye tracking data of the first user;
creating a human model representing the first user based at least in part on the data; and
causing display of the first representation via the second device, the first representation based at least in part on the human model.
41. The system of clause 40, wherein the first representation comprises a hand representation for the first user's hand.
42. The system of clause 41, wherein the operations further comprise:
determining that a signal indicative of movement has not been received from an input device for a period of time, the input device being associated with the first device;
in response to determining that the signal indicative of movement has not been received from the input device within the period of time, determining a position of the first device relative to the input device;
determining that the input device is located greater than a threshold distance from the first user based at least in part on the human model and the position of the first device relative to the input device; and
causing display of the first representation via the second device without displaying a body part representation associated with the input device.
43. The system of clause 42, wherein the body part representation comprises a hand representation and the input device comprises a hand controller.
44. The system of clause 36, wherein the operations further comprise:
determining that the first user is wearing a headset and that the first user is holding an input device in a hand; and
based at least in part on the determination that the first user is wearing a headset and the first user is holding an input device with a hand, causing display of the first representation with a head and a hand and without other body parts via the second device.
45. The system of clause 36, wherein the operations further comprise:
receiving data from the second device indicating that the second user is wearing a headset and the second user is holding an input device with a hand,
wherein the causing of display of the second representation comprises causing display of the second representation having a head and hands without other body parts.
46. A system, comprising:
one or more processors; and
a memory communicatively coupled to the one or more processors and storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
facilitating a communication session utilizing a first device associated with a first user, a second device associated with a second user, and a third device associated with a third user, the first device and the second device being located in a first physical environment, and the third device being located in a second physical environment, the first user being associated with a first representation, the second user being associated with a second representation, and the third user being associated with a third representation;
receiving, from the second device, first spatial data indicating a first frame of reference for the second user and a position of the second user or the second representation relative to the first frame of reference, the first frame of reference comprising at least one of an anchor point or a virtual point in the first physical environment;
receiving, from the third device, second spatial data indicating a second frame of reference for the third user and a position of the third user or the third representation relative to the second frame of reference, the second frame of reference comprising at least one of an anchor point or another virtual point in the second physical environment;
generating composite spatial data, the composite spatial data comprising the first spatial data and the second spatial data;
causing display of the second representation for the second user, the second representation for the second user positioned within a coordinate system based at least in part on the first spatial data included in the composite spatial data; and
causing display of the third representation for the third user, the third representation positioned within the coordinate system based at least in part on the second spatial data included in the composite spatial data.
47. The system of clause 46, wherein the operations further comprise:
maintaining the position of the second user or the second representation relative to the first frame of reference in the composite spatial data when the second device is part of the communication session.
48. A method, comprising:
facilitating, by a service provider, a communication session utilizing a first device associated with a first user and a second device associated with a second user, the first user associated with a first representation and the second user associated with a second representation;
causing, by the service provider, display of the second representation via the first device;
receiving, by the service provider and from the first device, first data for the first user, the first data comprising at least one of a first image or first depth data;
determining, by the service provider, a first direction of gaze of the first user's eye relative to the second representation displayed by the first apparatus based at least in part on the first data;
receiving, by the service provider and from the second device, second data for the second user, the second data comprising at least one of a second image or second depth data;
determining, by the service provider, a location and orientation of the second user relative to the second apparatus based at least in part on the second data; and
based at least in part on the first direction in which the first user's eye gazes relative to the second representation displayed by the first device and the position and orientation of the second user relative to the second device, causing display of the first representation via the second device, the first representation having an eye representation gazing, relative to the second user, in a second direction aligned with the first direction in which the first user's eye gazes relative to the second representation displayed by the first device.
49. The method of clause 48, wherein the first representation comprises data from the first image of the first user and the second representation comprises data from the second image of the second user.
50. The method of clause 48, further comprising:
determining that the first direction in which the first user's eye is gazing is at an eye representation of the second representation displayed by the first device,
wherein the causing comprises causing display of the first representation via the second device, wherein the eye representation of the first representation gazes at an eye of the second user.
51. The method of clause 48, wherein the causing comprises generating the first representation and sending the first representation to the second device.
52. The method of clause 48, wherein the causing comprises causing the first representation to be overlaid on a real-time image of the first user.
53. The method of clause 52, wherein the first representation represents a face of the first user.
54. A method, comprising:
initiating a communication session utilizing a first device associated with a first user and a second device associated with a second user, the first user associated with a first representation and the second user associated with a second representation;
capturing, by a sensor of the first device, first data of the first user, the first data comprising at least one of image or depth data;
receiving, by the first device and from at least one of the second device or a service provider, second data indicating a direction in which an eye representation of the second representation should be gazed relative to the first user, the direction being aligned with another direction in which an eye of the second user is gazed relative to the first representation displayed via the second device;
determining, by the first device, a position and an orientation of the first user relative to the first device based at least in part on the first data; and
causing display of the second representation via the first device based at least in part on the position and orientation of the first user relative to the sensor of the first device and the direction in which the eye representation of the second representation should gaze relative to the first user, wherein the eye representation of the second representation gazes in the direction indicated in the second data.
55. The method of clause 54, wherein the second representation includes data from an image of the second user.
56. The method of clause 54, wherein:
the other direction in which the eye of the second user is gazing is at an eye representation of the first representation displayed by the second apparatus; and
the causing comprises causing display of the second representation via the first device, wherein the eye representation of the second representation gazes at an eye of the first user.
57. The method of clause 54, wherein the causing comprises causing the second representation to be overlaid on a real-time image of the second user.
58. The method of clause 57, wherein the second representation represents a face of the second user.
59. A system, comprising:
one or more processors; and
a memory communicatively coupled to the one or more processors and storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform the method of clause 54.
60. A system, comprising:
one or more processors; and
a memory communicatively coupled to the one or more processors and storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform:
initiating a communication session utilizing a first device associated with a first user and a second device associated with a second user, the first user associated with a first representation and the second user associated with a second representation;
receiving first data for the first user from the first device, the first data comprising at least one of a first image or first depth data;
determining, based at least in part on the first data, a first direction in which the first user's eye is gazing relative to the second representation displayed by the first apparatus;
receiving second data for the second user from the second device, the second data comprising at least one of a second image or second depth data;
determining a position and orientation of the second user relative to the second apparatus based at least in part on the second data; and
based at least in part on the first direction in which the first user's eye gazes relative to the second representation displayed by the first device and the position and orientation of the second user relative to the second device, causing display of the first representation via the second device, the first representation having an eye representation gazing, relative to the second user, in a second direction aligned with the first direction in which the first user's eye gazes relative to the second representation displayed by the first device.
61. The system of clause 60, wherein the first representation comprises data from the first image of the first user and the second representation comprises data from the second image of the second user.
62. The system of clause 60, wherein the executable instructions, when executed by the one or more processors, further cause the one or more processors to perform:
determining that the first direction in which the first user's eye is gazing is at an eye representation of the second representation displayed by the first device,
wherein the causing comprises causing display of the first representation via the second device, wherein the eye representation of the first representation gazes at an eye of the second user.
63. The system of clause 60, wherein the causing comprises generating the first representation and sending the first representation to the second device.
64. The system of clause 60, wherein the first representation represents a face of the first user.
65. The system of clause 64, wherein the first representation is displayed in an overlaid manner on a face depicted in a real-time image of the first user.
66. The system of clause 60, wherein the communication session comprises a video telephony session.
67. The system of clause 60, wherein the communication session comprises at least one of a virtual reality session or an augmented reality session.
Conclusion
Although embodiments have been described in language specific to structural features and/or methodological acts, it is to be understood that the disclosure is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed herein as illustrative forms of implementing the described embodiments.

Claims (20)

1. A method, comprising:
initiating a communication session utilizing a first device associated with a first user and a second device associated with a second user, the first user being represented in a first representation in the communication session and the second user being represented in a second representation in the communication session, the communication session being associated with at least one virtual environment;
determining, by the first device, a first direction in which the first user's eye is gazing within a physical environment;
causing, by the first device, display, via the second device, the first representation including an eye gazed in a second direction related to the first direction in which the eye of the first user gazed in the physical environment;
causing, by the first device, display of a second representation within the at least one virtual environment;
determining, by the first apparatus, a third direction of gaze of the eye of the second representation relative to the first representation in the at least one virtual environment as displayed by the first apparatus, the third direction being related to a fourth direction of gaze of the eye of the second user in the physical environment or another physical environment;
receiving data from the second apparatus indicating a fifth direction in which the eye of the second representation gazes relative to the first representation in the at least one virtual environment as provided by the second apparatus; and
determining that the third direction of the eye gaze of the second representation as displayed by the first apparatus matches the fifth direction of the eye gaze of the second representation as provided by the second apparatus.
2. The method of claim 1, wherein the determining that the third direction matches the fifth direction comprises:
representing the third direction of the eye gaze of the second representation as displayed by the first device as a first vector;
representing the fifth direction of the eye gaze of the second representation as provided by the second apparatus as a second vector;
comparing the first vector to the second vector; and
determining, based at least in part on the comparison, that the first vector matches the second vector.
3. The method of claim 1, further comprising:
capturing a first image of a view of the second representation from the perspective of the first representation,
wherein the receiving the data indicating the fifth direction from the second device comprises: receiving a second image from the second device, the second image representing an estimated view of the second representation from the perspective of the first representation; and
wherein the determining that the third direction matches the fifth direction comprises:
comparing the first image with the second image; and
determining that a direction in which the eye of the second representation is gazed in the first image matches a direction in which the eye of the second representation is gazed in the second image.
4. The method of claim 1, wherein the first user and the second user are located in the physical environment.
5. The method of claim 4, further comprising:
receiving data from the second apparatus indicating the fourth direction, the fourth direction being relative to the first user; and
determining that the fifth direction of the eye gaze of the second representation as provided by the second apparatus matches the fourth direction in which the eye of the second user gazes relative to the first user within the physical environment.
6. The method of claim 1, wherein the second user is located in the other physical environment.
7. A system, comprising:
one or more processors; and
a memory communicatively coupled to the one or more processors and storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
facilitating a communication session utilizing a first device associated with a first user and a second device associated with a second user, the first user represented in a first representation within a coordinate system and the second user represented in a second representation within the coordinate system;
determining a first direction in the coordinate system in which the second representation or at least one eye of the second user is looking;
receiving data from the second device indicating a second direction in the coordinate system in which the second representation or the at least one eye of the second user is looking; and
evaluating the first direction and the second direction.
8. The system of claim 7, wherein the operations further comprise:
determining, based at least in part on the evaluation, that the first direction matches the second direction; and
maintaining the first device and the second device in the communication session.
9. The system of claim 7, wherein the communication session is a first type of communication session associated with displaying the second representation based at least in part on eye data, and the operations further comprise:
determining, based at least in part on the evaluation, that the first direction does not match the second direction; and
switching the communication session from the first type of communication session to a second type of communication session that does not display the second representation based at least in part on the eye data.
10. The system of claim 7, wherein the communication session is a first type of communication session associated with displaying the second representation based at least in part on eye data, and the operations further comprise:
determining, based at least in part on the evaluation, that the first direction does not match the second direction; and
causing at least one of the first device or the second device to be re-activated or causing an eye tracking sensor associated with at least one of the first device or the second device to be re-activated.
11. The system of claim 7, wherein the evaluating comprises:
representing the first direction with a first vector;
representing the second direction with a second vector; and
comparing the first vector to the second vector.
12. The system of claim 7, wherein:
the determining the first direction comprises capturing a first image of a view of the second representation from a perspective of the first representation;
the receiving the data from the second device comprises: receiving a second image from the second device, the second image representing an estimated view of the second representation from the perspective of the first representation; and
the evaluating includes comparing the first image to the second image.
13. The system of claim 7, wherein the operations further comprise:
receiving data about the first user, the data comprising at least one of an image of the first user, depth data of the first user, or eye tracking data of the first user;
creating a human model representing the first user based at least in part on the data; and
causing display, via the second device, of the first representation for the first user, the first representation based at least in part on the human model.
14. The system of claim 7, wherein the operations further comprise:
determining that a signal indicative of movement has not been received from an input device for a period of time, the input device being associated with the first device;
in response to determining that the signal indicative of movement has not been received from the input device within the period of time, determining a position of the first device relative to the input device;
determining that the input device is located more than a threshold distance from the first user based at least in part on a human model and the position of the first device relative to the input device; and
causing display of the first representation via the second device without displaying a body part representation associated with the input device.
15. A method, comprising:
initiating a communication session utilizing a first device associated with a first user, a second device associated with a second user, and a third device associated with a third user, the first device and the second device being located in a first physical environment, and the third device being located in a second physical environment, the first user being associated with a first representation, the second user being associated with a second representation, and the third user being associated with a third representation;
identifying, by the first device, first spatial data indicative of a first frame of reference for the first user and the second user, a location of the first user or the first representation relative to the first frame of reference, and a location of the second user or the second representation relative to the first frame of reference, the first frame of reference comprising at least one of a common spatial anchor point or virtual point in the first physical environment;
receiving, by the first device and from the second device, second spatial data indicating a second frame of reference for the third user and a position of the third user or the third representation relative to the second frame of reference, the second frame of reference comprising at least one of an anchor point or another virtual point in the second physical environment;
generating, by the first device, composite spatial data, the composite spatial data including the first spatial data and the second spatial data;
presenting, by the first device, the second representation for the second user within a coordinate system, the second representation for the second user positioned within the coordinate system based at least in part on the first spatial data included in the composite spatial data; and
presenting, by the first device, the third representation for the third user within the coordinate system, the third representation for the third user positioned within the coordinate system based at least in part on second spatial data included in the composite spatial data.
16. The method of claim 15, wherein the presenting the second representation for the second user comprises positioning the second representation of the second user in the coordinate system relative to the first representation of the first user such that the position of the first user relative to the first frame of reference and the position of the second user relative to the first frame of reference are maintained.
17. The method of claim 15, wherein the presenting the second representation for the second user comprises positioning the second representation of the second user in the coordinate system relative to the first representation of the first user such that a location of the first representation relative to the second representation is scaled to a location of the first user relative to the second user in the first physical environment.
18. The method of claim 15, wherein the composite spatial data indicates:
a virtual point shared by the first spatial data and the second spatial data;
the location of the first user or the first representation relative to the virtual point of the composite spatial data;
the location of the second user or the second representation relative to the virtual point of the composite spatial data; and
the location of the third user or the third representation relative to the virtual point of the composite spatial data.
19. The method of claim 15, further comprising:
maintaining the third spatial data in the composite spatial data when the third device is part of the communication session.
20. The method of claim 15, wherein the first device, the second device, and the third device each comprise at least one of a virtual reality headset or a mobile device.
CN201980066268.6A 2018-10-10 2019-10-09 Evaluating alignment of inputs and outputs of a virtual environment Pending CN113039508A (en)

Applications Claiming Priority (7)

Application Number | Priority Date | Filing Date | Title
US16/156,818 | 2018-10-10 | |
US16/156,738 (US10838488B2) | 2018-10-10 | 2018-10-10 | Evaluating alignment of inputs and outputs for virtual environments
US16/156,776 | 2018-10-10 | |
US16/156,776 (US10678323B2) | 2018-10-10 | 2018-10-10 | Reference frames for virtual environments
US16/156,818 (US10516853B1) | 2018-10-10 | 2018-10-10 | Aligning virtual representations to inputs and outputs
US16/156,738 | 2018-10-10 | |
PCT/US2019/055462 (WO2020076997A1) | | 2019-10-09 | Evaluating alignment of inputs and outputs for virtual environments

Publications (1)

Publication Number | Publication Date
CN113039508A | 2021-06-25

Family

ID=70163885

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201980066268.6A (CN113039508A, pending) | Evaluating alignment of inputs and outputs of a virtual environment | 2018-10-10 | 2019-10-09

Country Status (3)

Country Link
EP (1) EP3853699A4 (en)
CN (1) CN113039508A (en)
WO (1) WO2020076997A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI836498B (en) * 2021-11-03 2024-03-21 HTC Corporation (宏達國際電子股份有限公司) Method, system and recording medium for accessory pairing

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8259117B2 (en) * 2007-06-18 2012-09-04 Brian Mark Shuster Avatar eye control in a multi-user animation environment
KR101151962B1 (en) * 2011-02-16 2012-06-01 김석중 Virtual touch apparatus and method without pointer on the screen
US9538133B2 (en) * 2011-09-23 2017-01-03 Jie Diao Conveying gaze information in virtual conference
US9996150B2 (en) * 2012-12-19 2018-06-12 Qualcomm Incorporated Enabling augmented reality using eye gaze tracking
AU2015274283B2 (en) * 2014-06-14 2020-09-10 Magic Leap, Inc. Methods and systems for creating virtual and augmented reality
US20180196506A1 (en) * 2016-12-28 2018-07-12 Colopl, Inc. Information processing method and apparatus, information processing system, and program for executing the information processing method on computer

Also Published As

Publication number Publication date
EP3853699A4 (en) 2022-08-24
WO2020076997A1 (en) 2020-04-16
EP3853699A1 (en) 2021-07-28

Similar Documents

Publication Publication Date Title
US11366518B2 (en) Evaluating alignment of inputs and outputs for virtual environments
US10516853B1 (en) Aligning virtual representations to inputs and outputs
US10678323B2 (en) Reference frames for virtual environments
US11838518B2 (en) Reprojecting holographic video to enhance streaming bandwidth/quality
US11800059B2 (en) Environment for remote communication
US11755122B2 (en) Hand gesture-based emojis
WO2019043568A1 (en) Assisted augmented reality
US11032537B2 (en) Movable display for viewing and interacting with computer generated environments
US10613703B2 (en) Collaborative interaction with virtual reality video
US20220335638A1 (en) Depth estimation using a neural network
US11968056B2 (en) Avatar spatial modes
CN113039508A (en) Evaluating alignment of inputs and outputs of a virtual environment
CN113260954A (en) User group based on artificial reality
US20220261085A1 (en) Measurement based on point selection
US20230300250A1 (en) Selectively providing audio to some but not all virtual conference participants represented in a same virtual space
KR20230070308A (en) Location identification of controllable devices using wearable devices
WO2018076927A1 (en) Operating method and device applicable to space system, and storage medium
US11706266B1 (en) Systems and methods for assisting users of artificial reality platforms
WO2023242981A1 (en) Head-mounted display, head-mounted display system, and display method for head-mounted display
Tsai Geometry-aware augmented reality for remote collaboration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40055561
Country of ref document: HK