WO2018005235A1 - System and method for spatial interaction using automatically positioned cameras

Info

Publication number
WO2018005235A1
Authority
WO
Grant status
Application
Patent type
Prior art keywords
sight
user
virtual
line
real
Prior art date
Application number
PCT/US2017/038820
Other languages
French (fr)
Inventor
Seppo T. VALLI
Pekka K. SILTANEN
Original Assignee
Pcms Holdings, Inc.

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/142 Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/194 Transmission of image signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment ; Cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, TV cameras, video cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
    • H04N5/225 Television cameras ; Cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, camcorders, webcams, camera modules specially adapted for being embedded in other devices, e.g. mobile phones, computers or vehicles
    • H04N5/232 Devices for controlling television cameras, e.g. remote control ; Control of cameras comprising an electronic image sensor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H04N7/157 Conference systems defining a virtual conference space and using avatars or agents

Abstract

Methods and systems are described for generating a virtual geometry based on position information for at least a first user in a first physical location, a second user in a second physical location, and at least a third user in a third location, determining a virtual line-of-sight in the virtual geometry between the first user and the second user, determining a real-world line-of-sight associated with the virtual line-of-sight by mapping the virtual line-of-sight into the first physical location, automatically moving at least one moveable camera of a plurality of moveable cameras in the first location to a position and orientation along the determined real-world line-of-sight, and displaying a rendered version of the second user along the determined real-world line-of-sight.

Description

SYSTEM AND METHOD FOR SPATIAL INTERACTION USING AUTOMATICALLY POSITIONED CAMERAS

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a non-provisional filing of, and claims benefit under 35 U.S.C. §119(e) from, U.S. Provisional Patent Application Serial No. 62/357,060, entitled "System and Method for Spatial Interaction Using Automatically Positioned Cameras," filed June 30, 2016, the entirety of which is incorporated herein by reference.

BACKGROUND

[0002] Videoconferencing and telepresence solutions are becoming increasingly important in supporting environmentally friendly and efficient ways of work and life. Augmented reality (AR) is a concept using a set of technologies for merging real and virtual elements to produce new visualizations (oftentimes video) where physical and digital objects co-exist and interact in real time.

[0003] 3D models and animations are examples of virtual elements that can be visualized in AR. However, AR objects may be any digital information for which spatiality (3D position and orientation in space) provides added value. Some examples of AR objects include pictures, videos, graphics, text, and audio.

[0004] Augmented Reality visualizations utilize a means of seeing augmented virtual elements as a part of a physical view. This can be implemented using Augmented Reality glasses (e.g., video-see-through or optical-see-through, monocular or stereoscopic) to capture video from the user's environment and show it together with virtual elements on a display.

[0005] AR visualizations can be seen correctly from different viewpoints, so that when the user changes his/her viewpoint, virtual elements stay or act as if they are part of the real-world physical scene. Tracking technologies have been developed for deriving 3D properties of the environment for AR content production and for tracking the viewer's camera position with respect to the environment when viewing the content. The viewer's position can be tracked e.g. by tracking known objects in the video stream shown on the viewer's AR glasses or using one or more depth cameras within the user's environment.

[0006] Existing telepresence solutions for supporting spatiality and eye-contact are complicated and expensive, and they generally work only for fixed device and user configurations. Some exemplary systems include Viewport, described in Zhang et al., "Viewport: A Distributed, Immersive Teleconferencing System with Infrared Dot Pattern," IEEE Multimedia, vol. 20, no. 1, pp. 17-27, Jan.-March 2013; ViewCast, described in Yang et al., "Enabling Multi-party 3D Tele-immersive Environments with ViewCast," ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 6, No. 2, March 2010, pp. 111-139; 3DPresence, described in "Specification of Multi-View Acquisition System, v 1.0", Deliverable D1.1, EU-FP7 project 3DPresence, 2008, 48p.; and U.S. Patent No. 7,515,174 "Multi-user video conferencing with perspective correct eye-to-eye contact."

[0007] Three-dimensional reconstruction or modelling based telepresence solutions have been predicted to solve the needs for immersive and natural telepresence. Systems in which users are captured in real-time, reconstructed in 3D, and transmitted to remote sites to be rendered for other participants are described in Fuchs et al., "Immersive 3D Telepresence", IEEE COMPUTER, July 2014, pp. 46-52, and in P. Eisert, "Immersive 3-D Video Conferencing: Challenges, Concepts, and Implementations", Proc. SPIE Visual Communications and Image Processing (VCIP), July 2003. The system of Eisert is called a shared virtual table environment (SVTE). A unifying feature is typically the meeting table, which is perceived to be shared between the participants. Note that in SVTE-type implementations, remote partners are not really immersed or teleported as parts of the local space.

[0008] In some solutions, a window paradigm has been used where the remote users are seen through a naturally behaving window, allowing users to experience motion parallax and stereoscopic 3D perception. In these solutions, in order to enable correct perception of gaze and gestures, physical parameters are standardized across meeting sites (geometry, meeting table, display assembly, etc.). Those parameters may also specify the number and position of the collaborating partners, which is very restricting. These solutions are based on extending each local meeting space by those used by remote partners.

[0009] 3D virtual worlds, such as Second Life and OpenQwaq (formerly known as Teleplace), are a well-known way of enabling interaction between people represented by avatars. Attempts have been made to bring naturalness to the interaction by making avatars and environments close to their real-world exemplars. Avatars share the same spatial environment, which removes the problem of inconsistent meeting environments. Lack of physical perception of objects and spaces does not seem to be bothersome especially in game and entertainment type of use, but for most people, the result is too unnatural to replace video conferencing.

[0010] An important difference to real-world interaction is the way avatars are controlled. In virtual worlds, their movement is not directly copied from humans, but instead avatars are remote controlled, e.g., through mouse and keyboard interaction. Such movement tends to alienate the user experience from being real and in person. Further, in order for such an avatar to represent a distant person, he/she should be captured also for facial gestures, which is difficult when the person is wearing virtual glasses.

[0011] Virtual window and SVTE solutions were extended by showing that people can even be brought to visit each other's physical spaces, by teleporting their 3D modelled and animated avatars to those environments. An example is described in Kantonen et al., "Mixed Reality in Virtual World Teleconferencing," Proc. IEEE Virtual Reality, Waltham, Massachusetts, USA, March 20 - 24, 2010, pp. 179-182.

[0012] The "Hydra" telepresence system is described in Buxton, W. (1992), "Telepresence: integrating shared task and person spaces." Proceedings of Graphics Interface '92, 123-129. Another telepresence system enabling collaboration on Augmented Reality objects is presented in Wang et al., "Mutual awareness in collaborative design: An Augmented Reality integrated telepresence system", Computers in Industry 65 (2014) 314-324. In Wang et al., the collaboration system is basically a three-party Hydra type setup in fixed regular geometry, enabling true perception of directions, extended by an AR table display for markers to remotely augment and interact with 3D models. The AR table display is also used to overlay indications of remote participants' hands with respect to augmented object(s).

[0013] Some current solutions relate to the problem of supporting non-spatial remote AR interaction over a network. In some solutions, the basic assumption is that producing 3D augmentations remotely over a network does not require assistance or in-advance preparations in the target location to which the augmentations are to be bound.

[0014] Assistance or in-advance preparations in the target location can be avoided by capturing in real-time, over the network, enough 3D data from the target location so that a 3D object can be placed accurately, including its position, orientation, and scale. In some solutions, this is supported by using a fixed distributed sensor setup as part of the system in the target location, preferably identical at all collaborating sites.

[0015] Note that avoiding assistance and off-line scanning rules out the use of most common methods for 3D feature capture, namely those based on a single moving camera or depth sensor, for example SLAM algorithms (Klein, G., & Murray, D. (2007), "Parallel tracking and mapping for small AR workspaces", 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, ISMAR, 2007) and basic Kinect Fusion algorithms.

[0016] A common problem in current remote AR systems is that only one camera is used at each site, which does not give enough view-points or information to remotely place an augmentation accurately in a desired 3D position. Furthermore, existing remote AR systems do not support remote augmentation in a multi-point setup.

SUMMARY

[0017] Described herein are methods and systems for an immersive remote interaction system that provides a virtual meeting between multiple meeting sites and participants. The users' spatial awareness of each other is supported by providing a virtual line-of-sight between each pair of users. The system determines the geometry of the virtual setup based on the number of meeting sites and the positions of the participants. The same virtual geometry is perceived by all meeting participants. In some embodiments, the users wear AR glasses to see their remote counterparts augmented in their virtual but realistic positions. The actual positions of users in their environments are tracked in order to augment the videos of the remote participants into the local participant's view. Naturalistic virtual view-points are provided using moving cameras near the apparent position of the eyes of the augmented remote participants.

[0018] Methods and systems are described for generating a virtual geometry based on position information for at least a first user in a first physical location, a second user in a second physical location, and at least a third user in a third location, determining a virtual line-of-sight in the virtual geometry between the first user and the second user, determining a real-world line-of-sight associated with the virtual line-of-sight by mapping the virtual line-of-sight into the first physical location, automatically moving at least one moveable camera of a plurality of moveable cameras in the first location to a position and orientation along the determined real-world line-of-sight, and displaying a rendered version of the second user proximate to the position of the at least one moveable camera.

[0019] In some embodiments, the system is configured to move the camera to a position along a virtual line-of-sight between local and remote users, to pan and zoom cameras automatically and/or at the direction of the local user, and to send the video to the remote users to be augmented. The system supports a variable number of meeting sites and users, as well as users' mobility within the room. In some embodiments, multi-user 3D displays may be employed instead of AR glasses, as well as various other technologies for providing AR services.

[0020] Embodiments described below allow a minimal number of cameras to be used while achieving a more precise line-of-sight by moving the cameras into "optimal" positions.

[0021] Further, methods and systems are disclosed for enabling remote augmented reality (AR) objects in an interaction setup for multiple participants and environments. Based on a unified geometry (coordinate system), all participants have individual natural viewpoints both to each other and to the AR objects. Instead of a separate 3D capture setup, multiple views to a target position for augmentation are provided by the same physical array of cameras which is used for capturing meeting participants.

[0022] Some embodiments for the augmented/mixed reality collaboration avoid problems caused by non-uniform meeting room geometries and fittings by extending each meeting site with the remote ones using a window paradigm (applied e.g. by shared virtual table environments (SVTE)). This choice allows the physical characteristics of spaces to differ. In addition, the system does not need to provide arbitrary virtual viewpoints from inside a remote space, which avoids the use of a complex setup for 3D capture and reconstruction for virtual viewpoint calculations. In the following disclosure, systems and methods are described which, in addition to supporting gaze and spatial orientation between remote users, also support sharing of physical objects as 3D models. This type of system may be referred to as a spatial tele-interaction system, as distinct from more ordinary telepresence systems.

BRIEF DESCRIPTION OF THE DRAWINGS

[0023] A more detailed understanding may be had from the following description, presented by way of example in conjunction with the accompanying drawings, wherein:

[0024] FIG. 1A depicts an example communications system in which one or more disclosed embodiments may be implemented.

[0025] FIG. 1B depicts an example client device that may be used within the communications system of FIG. 1A.

[0026] FIG. 1C illustrates an exemplary network entity that may be employed as a server in accordance with some embodiments.

[0027] FIG. 2 illustrates various users in different parts of the world participating in a virtual meeting.

[0028] FIG. 3 illustrates a plurality of participants in a physical meeting, and how each participant is seen from different view-points from each of participants 302 and 304.

[0029] FIG. 4 illustrates local participants in a physical meeting, and how gaze awareness for each remote participant differs from view-points of each local participant.

[0030] FIG. 5 illustrates a virtual layout for a meeting between five participants on three sites, in accordance with some embodiments.

[0031] FIG. 6 illustrates an example of vertical parallax.

[0032] FIG. 7 illustrates exemplary system components and data flows between them, in accordance with some embodiments.

[0033] FIGs. 8A and 8B show a flowchart of a process, in accordance with some embodiments.

[0034] FIG. 9 illustrates three separate physical meeting sites, in accordance with some embodiments.

[0035] FIG. 10 illustrates an exemplary virtual geometry for the three meeting sites of FIG. 9, in accordance with some embodiments.

[0036] FIG. 11 illustrates virtual lines-of-sight and camera positioning for the virtual geometry of FIG. 10, in accordance with some embodiments.

[0037] FIG. 12 illustrates virtual lines-of-sight and camera positioning for the virtual geometry of FIG. 10, in accordance with some embodiments.

[0038] FIG. 13 illustrates an example of how various remote videos are transformed with respect to a local participant, in accordance with some embodiments.

[0039] FIG. 14 illustrates an example of camera positioning, in accordance with some embodiments.

[0040] FIG. 15 illustrates a virtual meeting with virtual objects, in accordance with some embodiments.

[0041] FIG. 16 illustrates a camera array, in accordance with some embodiments.

[0042] FIG. 17 illustrates an exemplary perspective of a local user, in accordance with some embodiments.

[0043] FIG. 18 illustrates a location of a virtual object in a local participant's local meeting site, in accordance with some embodiments.

[0044] FIG. 19 is a call-flow diagram in accordance with some embodiments.

[0045] FIGs. 20A and 20B are a flowchart of a process, in accordance with some embodiments.

[0046] FIG. 21 is a flowchart of a method, in accordance with some embodiments.

DETAILED DESCRIPTION

[0047] A detailed description of illustrative embodiments will now be provided with reference to the various Figures. Although this description provides detailed examples of possible implementations, it should be noted that the provided details are intended to be by way of example and in no way limit the scope of the application. The systems and methods relating to telepresence may be used with the wired and wireless communication systems described with respect to FIGs. 1A-1C. Descriptions for FIGs. 1A-1C will be provided at the end of this document.

Telepresence Sessions with Movable Cameras.

[0048] Telepresence (remote interaction) systems bring users seemingly closer to each other. In basic systems, this is accomplished by one camera in each local meeting room shared by all remote participants. In more advanced systems supporting directions and spatiality, multiple physical or virtual cameras may be used in each meeting space. FIG. 2 illustrates various users in different parts of the world participating in a virtual meeting. As shown in FIG. 2, users 202, 204, and 206 are participating from various parts of the country. Embodiments herein lay out virtual geometries to form a virtual meeting 210 between users 202/204/206, as if the users were all sitting around a virtual conference table 212.

[0049] The manner in which remote meeting spaces are brought together makes a significant difference to system implementation and abilities. Further, in order for an interaction system to provide consistent viewing directions between participants and to various objects, the system benefits from use of a consistent geometry, e.g., a unified coordinate system.

[0050] Various problems stem from the way existing tele-interaction systems combine meeting spaces and geometries. In particular, there are variations where: 1) the system aims to bring meeting spaces as neighbors (adjacent to each other), or 2) the system tries to merge spaces by overlaying their geometries. Systems using a window paradigm (e.g. physical displays) fall into the first category, while Mixed Reality systems belong to the second category.

[0051] Inconsistency of geometries causes loss of consistency of directions, loss of awareness of positions, loss of continuity between meeting spaces, and loss of gaze awareness. Lack of 3D information from remote spaces may cause loss of scale (as position is not enough for scaling) and loss of support for remote augmentations. Inconsistency of geometries and lack of 3D information may cause loss of natural interaction with remote parties and loss of natural interaction using 3D models.

[0052] A virtual world is an example of a shared space with consistent geometry and scale. In virtual worlds, participants are represented by virtual avatars and experience the spatial geometries in a unified way. Real-world physics may be copied in virtual space as it makes interaction with the environment more natural, e.g., walking, and using furniture and objects. Traveling is naturally not a problem as participants can move (be teleported) easily from place to place.

[0053] On the other hand, lacking tactile feedback based on touching or feeling objects, as well as not sensing gravity, pose, and weight of objects, are examples of problems hindering experiencing naturalness, immersion and presence in virtual worlds. Although game players are used to moving and controlling their virtual representative by mouse interaction, for example, it is not as natural as using one's own body motion for walking, etc.

[0054] In addition to problems in body control, supporting free viewpoints makes use of virtual camera views from the desired locations of the virtual world or the real-time reconstructed physical world. The latter option is in use in many recent telepresence solutions and requires advanced algorithms and high computation power.

[0055] The above difficulties relating to avatar control and viewpoint formation are even more prominent in Mixed Reality based systems using avatars. Taking virtual models into physical scenes uses real-time capture of 3D information, which in many cases requires reconstructing the scene in 3D. In some earlier demonstrated systems, augmentation is supported more simply by graphical markers, manual off-line work, and a fixed meeting setup. In such systems, visual disturbances are produced and considerable restrictions are placed on system use.

[0056] Problems related to Mixed Reality solutions include computational challenges, such as complexity in making real-time 3D reconstructions, providing virtual camera views to the reconstruction result, solving the calibration and fusion of multiple depth sensors, and coding and transmission of the resulting high amount of data. Optionally, if local assistance and off-line preparations are used, restrictions in the system may include disturbances, such as visual markers, advance preparations and manual work, and restrictions in meeting setups.

[0057] Several existing telepresence solutions rely on making strong assumptions of meeting geometry and the number of users. In some cases, these assumptions represent a way to get consistent geometry across meeting sites and enable gaze awareness and correct lines-of-sight between participants. Such solutions may either be video or 3D reconstruction based. The latter supports more inherently the capture of 3D information needed for remote AR.

[0058] The Hydra system referenced above is a partial solution supporting eye-contact and approximate gaze directions. Again, by making assumptions on system geometry including the same regular geometry and number of terminals across all sites, Hydra can also be made to support true lines-of-sight. A problem in the above approaches is the inflexibility of the system to support a varying number of meeting sites and several participants per site.

[0059] Most Augmented/Mixed reality applications are local, so that creation and consumption (visualisation) of AR content is made in the same local space. However, for example in remote maintenance and consultancy, it is very useful to make augmentations remotely, over the network. In collaboration and interaction systems, remote augmentation can support delivering and sharing of physical objects as 3D models.

[0060] If traditional graphical markers are used in the space, it is straightforward to do the augmentation from a remote location using videos containing those markers. If graphical markers are not acceptable due to causing visual disturbances, or if attaching graphical markers cannot be done in advance (assisted), the above method for supporting remote AR may not work well. In such cases, natural features of the physical environment, a point-cloud, or even a full 3D reconstruction can be captured and taken to a remote place for making the augmentation. In addition to high bitrate requirement and other technical limitations, this approach may be limited by the need to make advance preparations.

[0061] Embodiments below include supporting ad-hoc remote augmentation of 3D models in a spatial, multi-point, multi-party setup, which includes providing collaborating participants individual view-points also to the augmented objects, and an equal, symmetrical means for generating and placing them. As a summary, supporting remote augmentation includes support for ad-hoc use, symmetrical usage, individual view-points to augmented objects, and support for both local and remote viewing of augmented objects.

[0062] Existing options for telepresence include systems supporting: 1) Video based collaboration, 2) Collaboration in Virtual Environments, 3) Collaboration in Mixed Reality spaces, etc. It is desirable for a tele-interaction system to support all these options and in addition to enable spatial remote AR interaction functionalities in multi-point settings. Further, the system should preferably support interaction in local and remote spaces, as well as in a shared virtual space.

[0063] FIG. 3 illustrates a plurality of participants in a physical meeting and how each participant is seen from different view-points from each of the remaining participants. Example lines-of-sight are shown only for participants 302 and 304. Current collaboration systems poorly support spatial properties of a physical meeting, e.g. eye-contact and view-points (lines-of-sight) between multiple participants. For example, in a physical meeting, if participant 302 turns his/her head to look at the other participant, or even moves his/her position around a meeting table, all other participants immediately notice the change. These spatial properties cannot be communicated in current collaboration systems.

[0064] FIG. 4 illustrates one solution for a teleconference system. As shown, users 402 and 404 may be in the same physical location, and fields-of-view along lines of sight are illustrated to each remote participant shown on displays, showing differences in gaze awareness.

[0065] Eye-contact is known to help communication e.g. by expressing emotions, indicating turn taking, and building trust between participants. Eye-contact is a result of gazing in 3D space, thus spatiality should be supported. Spatiality helps to understand and memorize a message and builds awareness about the identities, roles, actions, and information flow between participants.

[0066] Imperfect eye-contact and poor spatial support in general are significant reasons why videoconferencing systems lack the naturalness of face-to-face meetings. Eye-contact is disturbed by so-called parallax distortions, both in horizontal and vertical directions.

[0067] Horizontal distortion arises when only one camera or at most a few cameras are used as a proxy for the eyes of all remote participants in the meeting. Vertical distortion on the other hand stems from the displacement of a camera (a proxy for the participant's eyes) from his/her face on the display. An example of vertical parallax is illustrated in FIG. 6, where the user 602 staring at the center of monitor 604 causes vertical parallax if the camera 606 is located at the top of monitor 604.
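
As a rough illustration of the vertical distortion described above, the angular error can be estimated from the camera's displacement from the displayed eyes and the user's viewing distance. The following is a minimal sketch under a simple flat-geometry assumption; the function name and the example dimensions (0.2 m displacement, 0.6 m viewing distance) are hypothetical and are not taken from this disclosure.

```python
import math

def vertical_parallax_deg(camera_offset_m, viewing_distance_m):
    """Angle between the user's gaze (toward the displayed eyes) and the
    direction toward the camera, for a camera displaced vertically by
    camera_offset_m on a display viewed from viewing_distance_m.
    Both arguments are illustrative assumptions, in metres."""
    return math.degrees(math.atan2(camera_offset_m, viewing_distance_m))

# Hypothetical example: camera 0.2 m above the displayed eyes, user 0.6 m away.
angle = vertical_parallax_deg(0.2, 0.6)
```

Under these assumptions the error shrinks as the camera approaches the displayed eyes, which is the motivation for moving a camera onto the line-of-sight.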

[0068] The purpose of many collaboration systems is to bring the video(s) from the remote meeting site(s) and their people to one or several displays in the local meeting room. Oftentimes only one camera per meeting site is used, which results in distortions, so that the facial or eye directions of the remote participants do not correspond to those of a physical meeting, due to "shared eyes" or view-points. A way to approach a physical meeting situation is to represent each remote participant with a separate electronic representative (e.g., proxy or surrogate) in each local meeting space, as if those people were present in the local space. This electronic surrogate consists of a video display with an attached camera to replace the remote participant's face and eyes, a microphone, and a loudspeaker. So-called telepresence robots have been suggested for such a surrogate.

[0069] FIG. 5 illustrates a virtual layout for a meeting between five participants at three sites 502/504/506, in accordance with some embodiments. In FIG. 5, the system produces a virtual meeting geometry where the participants shown as dots are arranged in a circle, illustrated by the dashed circle 510. The room walls have camera arrays for providing lines-of-sight between each participant, shown by bold dashed lines. The dashed triangle in the middle is a virtual space between real physical spaces. After the system defines the meeting setup geometry, both real and virtual positions of users (for local and remote participants, respectively) are known and can be used for calculating the view-points.
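
The circular arrangement described above can be sketched as follows. This is a minimal illustration only: it assumes a 2D floor plan, equal angular spacing between participants, and a hypothetical radius of 1.5 m, none of which is specified in this disclosure. The identifiers `circle_layout` and `line_of_sight`, and the participant labels, are likewise hypothetical.

```python
import math

def circle_layout(participant_ids, radius=1.5):
    """Place participants at equal angles on a circle centred at the origin.
    Equal spacing and the radius value are assumptions for illustration."""
    n = len(participant_ids)
    positions = {}
    for k, pid in enumerate(participant_ids):
        angle = 2.0 * math.pi * k / n
        positions[pid] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions

def line_of_sight(positions, a, b):
    """Return the unit direction and length of the virtual line-of-sight
    from participant a to participant b."""
    ax, ay = positions[a]
    bx, by = positions[b]
    dx, dy = bx - ax, by - ay
    dist = math.hypot(dx, dy)
    return (dx / dist, dy / dist), dist

# Five participants on three sites (labels are hypothetical).
layout = circle_layout(["A1", "A2", "B1", "C1", "C2"])
direction, distance = line_of_sight(layout, "A1", "B1")
```

Once every participant has a position in the common geometry, a line-of-sight exists for each ordered pair, which is what the view-point calculation needs as input.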

[0070] Embodiments described herein provide the ability to communicate using video connections between participants in a local site (referred to as local participants) and remote sites (referred to as remote participants). A system gives the participants a sensation of a meeting in a common environment. In some embodiments, the number of local participants can vary, as well as the number of meeting sites. In some embodiments, the system operates to calculate a geometry between the meeting sites so that the participants have virtual lines-of-sight to each other. The system determines real-world lines-of-sight by mapping the virtual lines-of-sight onto the physical location, and moves one or more moveable cameras to positions on the lines-of-sight. Furthermore, the system augments the videos so that when the local participant is looking at a given augmented video, his or her eyes are gazing towards the corresponding moveable camera on the corresponding line-of-sight. In such embodiments, the system may augment the received video streams in the local environment so that the remote participant's eyes in the augmented video substantially align with the lens of the moveable camera to reduce parallax.
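
One possible way to carry out the mapping and camera placement described above is sketched below. It is an illustration under stated assumptions, not the disclosed implementation: the geometry is reduced to a 2D floor plan, the room's pose within the virtual geometry (`room_origin`, `room_angle_rad`) is assumed to be known from calibration, and the camera is assumed to travel along a straight wall segment. All function and parameter names are hypothetical.

```python
import math

def virtual_to_room(point, room_origin, room_angle_rad):
    """Express a virtual-geometry point in room coordinates, given the room
    frame's origin and rotation within the virtual geometry (assumed known)."""
    px, py = point[0] - room_origin[0], point[1] - room_origin[1]
    c, s = math.cos(-room_angle_rad), math.sin(-room_angle_rad)
    return (c * px - s * py, s * px + c * py)

def camera_pose_on_wall(local_user, remote_user, wall_start, wall_end):
    """Intersect the line-of-sight local_user -> remote_user with a wall
    segment. Returns ((x, y), pan_deg) with the camera panned toward the
    local user, or None if the line-of-sight does not cross this wall."""
    ux, uy = local_user
    dx, dy = remote_user[0] - ux, remote_user[1] - uy
    wx, wy = wall_start
    ex, ey = wall_end[0] - wx, wall_end[1] - wy
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-12:
        return None  # line-of-sight is parallel to the wall
    t = ((wx - ux) * ey - (wy - uy) * ex) / denom  # fraction along line-of-sight
    s = ((wx - ux) * dy - (wy - uy) * dx) / denom  # fraction along wall
    if not (0.0 <= t <= 1.0 and 0.0 <= s <= 1.0):
        return None
    cx, cy = ux + t * dx, uy + t * dy
    pan_deg = math.degrees(math.atan2(uy - cy, ux - cx))
    return (cx, cy), pan_deg

# Hypothetical room: local user at (1, 2), remote user's mapped position at
# (4, 5) beyond a wall running from (3, 0) to (3, 6).
pose = camera_pose_on_wall((1.0, 2.0), (4.0, 5.0), (3.0, 0.0), (3.0, 6.0))
```

In this example the camera lands at (3, 4) on the wall and is panned back toward the local user; a rendered view of the remote user placed at the same point would then put that user's eyes near the camera lens.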

[0071] In some embodiments, the system includes the following components:

1) Augmented reality glasses, used by the collaboration participants.

2) A set of cameras that operate to move and film local participants from different angles. In some embodiments, the cameras may be wired or wireless miniature robot cameras moving on the wall surface. In some embodiments, the wireless miniature robot cameras may be attached to a horizontal rail, moving horizontally (not correcting completely vertical distortion). In some embodiments, the miniature robot cameras may be attached to the surface by a mechanism such as a linear motor mount that allows movement in two dimensions. In some embodiments, the cameras may include cameras attached to a three-dimensional mechanism, or cameras with flying ability, allowing the camera movement also in the third dimension, e.g. allowing camera movement in front or inside of a holographic display. In some embodiments, the cameras may be moveable wide-angle cameras to reduce the amount of movement by capturing a wide field of view and cropping the portion to transmit to the other participants.

3) Software and hardware to track position of each participant. In some embodiments, the tracking may be based on depth information from a depth camera. In some embodiments, the tracking may be based on tracking features in the videos filmed by the cameras in the AR glasses.

4) Software and hardware to augment the video (and spatial audio) from the remote participants to the local participant's view (AR glasses), based on common geometry formed by the participant positions in the local sites.

5) Audio system producing and capturing spatial audio so that a remote participant's sound seems to be coming from the direction of the corresponding video stream. In some embodiments, the audio system may be implemented by positioning moving speakers in the same positions as the cameras. In some embodiments, the audio system may include a directional multi-speaker system.

6) Software to compute the virtual geometry and the positions of the cameras so that correct lines-of-sight are supported between all local and remote participants.

7) Software to transmit video and position information between the plurality of physical meeting sites.

8) Software that is configurable to manipulate the video, e.g. augmenting virtual objects to the video or changing background of the video.

[0072] Potential advantages of described embodiments are that they support spatiality and approximate individual lines-of-sight ("face awareness") in a very flexible way. The flexibility includes letting a participant choose one's sitting point freely and allowing mobility to occur inside the meeting room.

[0073] In an exemplary embodiment, realistic lines-of-sight are supported between all participants in the meeting. As the currently available AR glasses cover the users' eye-region, the system may support "face awareness" rather than "gaze-awareness" (true eye-contact). True eye-contact may be supported using either highly transparent AR glasses/HMDs or some other sophisticated means. Optical see-through AR glasses may be used in embodiments described herein.

[0074] As described above, horizontal and vertical camera-display parallax distortions are important reasons why videoconferencing systems lack the naturalness of face-to-face meeting. In at least one embodiment, horizontal distortion is prevented by moving a camera to line-of-sight between the remote participants. In some embodiments, vertical distortion is prevented by aligning each remote participant's augmented video in such a way that the display of the remote participant's eyes coincides with the lens of the camera.

[0075] In order to support the free sitting order and/or mobility of the participants, the cameras may be moved in real time to support substantially optimal positioning. This may be done in embodiments that track the users' locations at each moment, e.g. using the positions of HMD cameras with respect to the camera setup.

[0076] FIG. 7 illustrates exemplary system components and data flows between them. For the sake of clarity, supporting components, such as session management components, e.g. user management and multipoint control unit for enabling connections between participants have been excluded. Also, components for controlling spatial audio have not been illustrated. The control of spatial audio may be performed using techniques analogous to those described herein for video.

[0077] In some embodiments, each user has a similar set of hardware components at each local site: video cameras 702, AR glasses 704, and a tracking system 706. These are connected to a web server 708 via, e.g., a wireless connection. The server 708 receives information from the local components, generates 710 a virtual geometry based on the participant position information, and implements possible video manipulation 712, e.g. augmentation of virtual objects from database 714 into the videos or removal of video backgrounds. In some embodiments, the virtual geometry is further based on stored virtual geometries in a database for virtual geometry 716. Implementing those operations in the server is only an example; they can be implemented in the local site system or a remote site system.

[0078] In some embodiments, the tracking system is used to track 3D positions of each user in each remote site. In some embodiments, tracking is implemented using depth cameras. The tracking system calculates the position of each local user in the local site's coordinate system, e.g. distance and position of the local users with respect to the local room wall. This position information is transmitted to the virtual geometry generation system. Using the generated virtual geometries, the video cameras 702 may be positioned using camera positioning 718, and videos may be displayed on AR glasses 704 based on video positioning 720.

[0079] In some embodiments, the virtual geometry generation system takes as input participant positions from each of the sites and generates a virtual geometry e.g. as shown in FIGs. 9 and 10. As shown in FIG. 9, there are three sites: Site 1, Site 2, and Site 3. Site 1 and 2 each include 1 user: user 901 and 906, respectively. Site 3 includes users 902 and 904. As shown in FIG. 10, virtual positions of all the participants are represented as being along sides of triangle 1010, which may represent a virtual space in between each physical location. Further, positions of the cameras at each site may be represented, as shown in FIG. 11. This virtual geometry may be used as an input for camera and augmented video positioning systems. Note that generating virtual geometry here does not refer to 3D model generation. In some embodiments, it is enough to calculate and transmit the positions of each user and camera, instead of calculating and transmitting whole 3D model of the local site environment.

[0080] In some embodiments, a virtual geometry may be represented in terms of a set of relationships with respect to physical geometries. For example, a three-dimensional position within a physical geometry may be represented as an ordered triplet of three values [x, y, z] within a particular coordinate system. Thus, where an object is present in a first location, the position of that object within the first location may be represented by coordinates [x1, y1, z1]. Those coordinates [x1, y1, z1] can be transformed using, for example, multiplication by a rotation matrix R1 and addition of an offset vector V1, into a different coordinate system, such as a coordinate system of a virtual geometry, where the position of the object may be represented by the coordinates [x1', y1', z1'], where [x1', y1', z1'] = [x1, y1, z1] R1 + V1. Using the matrix R1 and the vector V1 (or equivalent mathematical components), positions and orientations can be transformed from a coordinate system that represents a real location (the first location) to a coordinate system that represents a virtual geometry. Similar matrices and vectors (e.g. R2 and V2, R3 and V3, and so on) may be defined that represent the positions and orientations of other physical locations (e.g. a second and third location) with respect to a single virtual geometry. This also allows positions and orientations to be transformed from one physical location into another based on the virtual geometry. For example, the coordinates [x1, y1, z1] may be transformed using R1 and V1 into the coordinates [x1', y1', z1'] of the virtual geometry, and then [x1', y1', z1'] may be transformed, using the inverse of R2 and the negative of V2, into the coordinates [x2, y2, z2]. Thus, given a virtual geometry, an object that is physically located at [x1, y1, z1] in a first location is considered to be virtually located at [x2, y2, z2] in the second physical location.
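The transformation described in this paragraph can be illustrated with a minimal sketch. It assumes the row-vector convention of the formula above (position times matrix, plus offset) and, as a simplification not stated in the source, a rotation about the vertical axis only. The function names are hypothetical and chosen for illustration.

```python
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]


def rotation_about_z(angle_rad: float) -> Mat:
    """Rotation about the vertical (z) axis, laid out for row vectors (p R)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]


def _row_times_matrix(p: Vec, m: Mat) -> Vec:
    """Multiply a 3-element row vector by a 3x3 matrix."""
    return [sum(p[k] * m[k][j] for k in range(3)) for j in range(3)]


def to_virtual(p_local: Vec, R: Mat, V: Vec) -> Vec:
    """Local site coordinates to virtual geometry: p' = p R + V."""
    rotated = _row_times_matrix(p_local, R)
    return [rotated[i] + V[i] for i in range(3)]


def to_local(p_virtual: Vec, R: Mat, V: Vec) -> Vec:
    """Virtual geometry back to local coordinates: p = (p' - V) R^-1.

    For a rotation matrix the inverse equals the transpose.
    """
    shifted = [p_virtual[i] - V[i] for i in range(3)]
    r_inverse = [list(column) for column in zip(*R)]
    return _row_times_matrix(shifted, r_inverse)


def site1_to_site2(p1: Vec, R1: Mat, V1: Vec, R2: Mat, V2: Vec) -> Vec:
    """Where a point in the first location virtually appears in the second."""
    return to_local(to_virtual(p1, R1, V1), R2, V2)
```

Chaining `to_virtual` for one site with `to_local` for another corresponds to the two-step mapping from [x1, y1, z1] to [x2, y2, z2] described above.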

[0081] In an exemplary embodiment, a first user is in a first physical location, a second user is in a second physical location, and a third user is in a third physical location. In a local coordinate system of the first physical location, the head of the first user is at coordinates [x1, y1, z1]. In a local coordinate system of the second physical location, the head of the second user is at coordinates [x2, y2, z2]. In a local coordinate system of the third physical location, the head of the third user is at coordinates [x3, y3, z3]. The directions toward which the users are facing may also be represented by coordinates (e.g. by Euler angles). A virtual geometry is then defined by generating matrices R1, R2, R3 and offset vectors V1, V2, V3 that can be used to transform the different local coordinate systems into a common coordinate system. In the common coordinate system of the virtual geometry, the heads of the three users are located at coordinates [x1', y1', z1'], [x2', y2', z2'], and [x3', y3', z3'], respectively.

[0082] The virtual geometry may be defined by selecting a set of matrices R1, R2, R3 and offset vectors V1, V2, V3 such that the points [x1', y1', z1'], [x2', y2', z2'], and [x3', y3', z3'] are at the vertices of an equilateral triangle, with the Euclidean distance between points being a natural meeting distance (e.g. 2-3m). In situations involving more users, different geometric arrangements may be predetermined (e.g. four users arranged at the vertices of a square, five users at the vertices of a regular pentagon, and so on). The selected Euclidean distance between adjacent points may be reduced as the number of users increases (analogous to participants sitting closer together around a more crowded conference table). Where the z values represent the vertical dimension (e.g. height off the floor of each local user's head), the virtual geometry may be defined such that each user's head has the same altitude in the virtual geometry (so that all users are at the same eye level), or the virtual geometry may be defined such that zi'=zi, so that floor heights (which may be at zi=0) are unchanged in the virtual geometry, and so that different heights of different users are reflected in the virtual geometry. The matrices R1, R2, R3 and offset vectors V1, V2, V3 may further be selected such that, in the common coordinate system, the users are all facing in a central direction, e.g. toward the center of the triangle (or other polygon as appropriate).
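As an illustration of the polygon arrangement, the following sketch places a given number of users at the vertices of a regular polygon with a chosen side length and turns each user toward the center. The default spacing of 2.5 m (within the 2-3m range mentioned above), the common eye height, and the function name are assumptions made for the example.

```python
import math
from typing import List, Tuple

Placement = Tuple[Tuple[float, float, float], float]  # ((x, y, z), yaw in radians)


def polygon_placements(n_users: int,
                       spacing_m: float = 2.5,
                       eye_height_m: float = 1.25) -> List[Placement]:
    """Place users at the vertices of a regular polygon, facing its center."""
    if n_users < 2:
        raise ValueError("at least two users are needed")
    # Circumradius of a regular n-gon whose side length is spacing_m.
    radius = spacing_m / (2.0 * math.sin(math.pi / n_users))
    placements = []
    for k in range(n_users):
        angle = 2.0 * math.pi * k / n_users
        x, y = radius * math.cos(angle), radius * math.sin(angle)
        yaw_to_center = math.atan2(-y, -x)  # heading from the vertex to the origin
        placements.append(((x, y, eye_height_m), yaw_to_center))
    return placements
```

Calling `polygon_placements(3)` gives three head positions whose pairwise distances all equal the chosen spacing, i.e. an equilateral triangle.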

[0083] Within the common coordinate system of the virtual geometry, different virtual lines-of-sight extend between pairs of users. For example, the two points [x1', y1', z1'] and [x2', y2', z2'] define a virtual line-of-sight between the first and second users. This virtual line-of-sight corresponds to respective real-world lines-of-sight in the first and second locations. For example, the virtual line-of-sight may be transformed into a real-world line-of-sight in the first physical location using R1 and V1, or the virtual line-of-sight may be transformed into a real-world line-of-sight in the second physical location using R2 and V2.

[0084] In an exemplary embodiment, the real-world lines-of-sight are used to select a location and orientation of a camera in the physical location. For example, a camera may be mounted so as to be movable across a two-dimensional plane, such as a wall surface, in the first physical location. The position and orientation of this two-dimensional plane may be expressed using parameters in the local coordinate system. A point of intersection is determined between the plane and the real-world line-of-sight. The camera is then automatically moved to the point of intersection and is oriented along the direction of the real-world line-of-sight. This camera provides the second user with a realistic line-of-sight of the first user. Analogous steps are performed to generate other real-world lines-of-sight and to position and orient cameras along those real-world lines-of-sight.
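The intersection step can be sketched as a standard line-plane intersection. In this sketch the plane is given by a point on it and a normal vector, both in the local coordinate system, and the remote user's head position is assumed to have already been mapped into local coordinates. Pointing the camera back toward the local user is an assumption about which way along the line-of-sight the camera faces; the names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def camera_pose_on_plane(local_head: Sequence[float],
                         remote_head_in_local: Sequence[float],
                         plane_point: Sequence[float],
                         plane_normal: Sequence[float]
                         ) -> Optional[Tuple[Vec3, Vec3]]:
    """Return (camera position, unit viewing direction), or None if no hit."""
    direction = _sub(remote_head_in_local, local_head)
    denom = _dot(plane_normal, direction)
    if abs(denom) < 1e-9:
        return None  # line-of-sight runs parallel to the plane
    t = _dot(plane_normal, _sub(plane_point, local_head)) / denom
    if t <= 0.0:
        return None  # the plane lies behind the local user on this line
    position = (local_head[0] + t * direction[0],
                local_head[1] + t * direction[1],
                local_head[2] + t * direction[2])
    length = math.sqrt(_dot(direction, direction))
    # The camera looks back along the line-of-sight, toward the local user.
    look = (-direction[0] / length, -direction[1] / length, -direction[2] / length)
    return position, look
```

For a wall at y = 0, a local user at (1, 2, 1.2) and a remote user virtually at (1, -2, 1.2), the camera lands at (1, 0, 1.2) and faces the local user.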

[0085] In some embodiments, all of the participants wear AR glasses, e.g. see-through head mounted displays. The video streams received from the remote sites are augmented according to the local participant's view, so that the user sees them at the position, orientation and size defined by the video positioning system. The video position, orientation and size are selected so that the remote participants appear to be located at positions defined in the virtual geometry, such that the remote participants appear in the correct perspective.

[0086] The video cameras include a mechanism that allows them to move in at least one dimension. In some embodiments, the cameras may be miniature robotic cameras that can move independently on a wall surface, or they can be attached to rods that are moved using motor servo mechanism. Regardless of the way the camera movement is implemented, the system is configurable to position the cameras in the position calculated by the camera positioning system. In some embodiments, the cameras are wide-angle or fish-eye cameras and the camera view towards local participant is cropped from the video. In some embodiments, the mechanism also turns the camera so that the video view pans to the direction of the local participant and zooms the camera so that the participant appears to be in correct size even if the camera distance from the participant changes.
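The pan and zoom behavior described here can be sketched in two dimensions (top-down view). The sketch assumes a pinhole camera model, under which apparent size falls off inversely with distance, so that a zoom factor proportional to distance keeps the participant at a constant apparent size. The reference distance and the function name are assumptions made for the example.

```python
import math
from typing import Sequence, Tuple


def pan_and_zoom(camera_xy: Sequence[float],
                 user_xy: Sequence[float],
                 wall_normal_xy: Sequence[float],
                 reference_distance_m: float) -> Tuple[float, float]:
    """Return (pan angle in radians, zoom factor) for a wall-mounted camera.

    The pan angle is measured from the wall normal to the direction of the
    user, counterclockwise positive. A zoom factor of 1.0 corresponds to the
    user standing at reference_distance_m from the camera.
    """
    dx = user_xy[0] - camera_xy[0]
    dy = user_xy[1] - camera_xy[1]
    distance = math.hypot(dx, dy)
    if distance == 0.0 or reference_distance_m <= 0.0:
        raise ValueError("user must be away from the camera; reference must be positive")
    nx, ny = wall_normal_xy
    pan = math.atan2(nx * dy - ny * dx, nx * dx + ny * dy)
    zoom = distance / reference_distance_m
    return pan, zoom
```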

[0087] FIGs. 8A and 8B illustrate an outline of a process flow, in accordance with some embodiments. As shown in FIG. 8A, for each user 802, a 3D position is tracked 804 with respect to the cameras, until the 3D position 806 for all users has been determined 808. A virtual meeting setup is formed 810 based on all of the user positions 806 and basic geometries 812 associated with a number of remote sites and users. Forming the virtual meeting setup includes forming a geometry 814 of the virtual meeting setup. FIG. 8B continues from FIG. 8A, and for each site and for a given user 816, the method moves/pans 818 one or more cameras to each user according to a line of sight determined by the geometry of the virtual meeting setup. The at least one camera captures 820 video for the given user. The user video is augmented 822 according to camera position. The moving/panning of the cameras 818, the video capturing 820, and the augmenting 822 may all be performed according to the geometry 824 of the virtual meeting setup. After all users have been captured and all videos have been processed 826, the method checks 828 to see if any changes have occurred. If so, the virtual meeting setup may be updated 832, unless the meeting has ended 830.
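The change check at step 828 could, for instance, compare the latest tracked positions against those used to form the current geometry. The following sketch treats any user joining or leaving, or any head movement beyond a tolerance, as a change. The 5 cm tolerance, the dictionary layout, and the function name are assumptions, not values from the described embodiments.

```python
import math
from typing import Dict, Tuple

Position = Tuple[float, float, float]


def geometry_needs_update(previous: Dict[str, Position],
                          current: Dict[str, Position],
                          tolerance_m: float = 0.05) -> bool:
    """Return True if the virtual meeting setup should be recomputed."""
    if previous.keys() != current.keys():
        return True  # a participant joined or left
    # Any participant who moved farther than the tolerance triggers an update.
    return any(math.dist(previous[user], current[user]) > tolerance_m
               for user in current)
```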

[0088] In some embodiments, the local participant wears AR glasses and the videos of the remote participants are augmented according to the views seen through the glasses. In alternative embodiments, the local participant has a large display and the remote participant videos are shown on the display. The system can also be implemented as a composite in some embodiments, where participants at different sites have different terminals: some sites may be equipped with AR glasses, some with other displays (e.g. non-wearable displays such as TV screens or computer monitors).

[0089] In some embodiments, the display variation includes the same components as described above, except the following. Note that in the sites where displays are used, AR glasses may not be needed. In some embodiments, the large display may be a multi-view display, if there are several local participants who all will see video of remote participants from their own angle. In some embodiments, the large display may be a planar, curved display, or may be composed of several smaller displays.

[0090] In some embodiments, the system includes a set of cameras configurable to move and film local participants from different angles. In some embodiments, the set of cameras includes wireless miniature robot cameras moving on the display surface (preferably small enough so as not to significantly obstruct the display), that can be fastened to the display e.g. by a magnet and e.g. being controlled by a magnetic mechanism behind the display. In some embodiments, the set of cameras includes moving cameras that can be installed to the back of the transparent display e.g. using mirror systems or semitransparent displays etc. In some embodiments where the display is composed of several smaller displays, the cameras may be installed in the seams between the displays. In such embodiments, the camera may be positioned to the location which is nearest to the actual computed position on the display.

[0091] It should be noted that some embodiments may utilize a set of cameras including one or more cameras from each of the above-mentioned types, as well as various other moving camera types that would be known to one of skill in the art.

[0092] One way to implement directional multi-user display is to use head position tracking with lenticular autostereoscopic displays (S3D). Holographic (e.g. light field) displays are examples of high-end solutions. Directional display technologies are becoming more affordable and even large directional displays may become feasible e.g. in living-room use.

[0093] This variation has its advantages and disadvantages. Large (e.g., wall size) displays give local participants good spatial experience without having to wear any disturbing gadgets, such as AR glasses. Further, real eye-contact may be achieved, because the AR glasses do not obstruct a participant's eyes. However, camera positioning is more difficult, as cameras on top of the display surface disturb the user and the displays that can have cameras installed behind the screen are rare and expensive.

[0094] FIG. 9 illustrates an exemplary set-up of a conference between three sites, with the participant positions relative to a planar reference surface (e.g. a wall) at each site. The system uses the participants' positions and live video from remote sites to give the local participant the illusion that all the participants are in the same meeting space. A local user's system receives video from the remote sites, together with position information of the remote participants.

[0095] The local system's positioning software computes a virtual scene, where the positions of the virtual counterparts of remote participants are placed. The positions of these virtual remote participants can be computed using known techniques. FIG. 10 illustrates an exemplary virtual geometry where participant positions relative to the walls in FIG. 9 are used to place the virtual participants relative to "virtual walls". In the case of three remote sites, the virtual participants can be positioned around a triangle. In general, N sites can be positioned around an N-sided polygon. If the walls are not planar, the virtual geometry may be constructed so that the shape of the virtual walls corresponds to the wall shape.

[0096] FIG. 11 illustrates exemplary virtual lines-of-sight that may be calculated according to the virtual geometry of FIG. 10, in accordance with some embodiments. As shown, there are virtual lines-of-sight (dotted lines) between any two users not in the same local meeting site. Based on the virtual positions of the users, the positioning software augments the video streams received from the remote sites so that the position of the remote participant's eyes on the video is on a respective virtual line-of-sight between the local participant's eyes and remote virtual participant's eyes. In the exemplary virtual geometry, camera positions in the virtual geometry are calculated as intersection points of lines-of-sight and the virtual walls. Since, in the example, the positions in the real world geometry correspond to the virtual geometry, the camera positions and remote video augmentation positions in the real world can be calculated. If the participant positions change, the system automatically repositions the cameras and the augmented remote videos so that the participant position change in the real world is reflected to the virtual geometry.

[0097] FIG. 12 illustrates an example of recalculating virtual geometry after participant 906 has moved. Six cameras and the related augmented video streams have been relocated, in response to the change in geometry. In particular, as illustrated in FIG. 12, cameras 1202/1204/1206 that are local to participant 906 (indicated by solid circles) are all relocated, as are cameras 1208/1210/1212 that are remote to participant 906 and that capture video to be used by participant 906 (indicated by dashed circles). In some embodiments, the sites continually exchange information identifying the location of each user's head in the common coordinate system of the virtual geometry, allowing each site to convert those coordinates into a point in their respective local coordinate system. Each site may, if necessary, move the local cameras to a position along the line-of-sight between the local user and the virtual position of the remote user.

[0098] Each remote video's size and shape may be transformed so that the video appears to be orthogonal to the line-of-sight, while the size of each video is corrected according to the distance of the corresponding virtual participant. FIG. 13 illustrates an example of how various remote videos 1302, 1304, and 1306 are transformed with respect to participant 1301. As shown in FIG. 13, local participant 1301 may correspond to user 901 shown in FIGs. 9-11 while remote videos 1302, 1304, and 1306 are displaying users 902, 904, and 906, respectively. In some embodiments, the application may also remove background of the participant videos, in order to create the impression that all the participants have similar backgrounds. The system may also augment virtual elements, such as a common virtual meeting table, into each participant's view.
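One way to read the size correction is through similar triangles: a remote participant of a given real height, virtually located at some distance along the line-of-sight, subtends the same visual angle as a smaller image rendered closer to the viewer on the same line. The sketch below computes that rendered height. The parameter names and the choice of rendering distance (for example, the distance to the wall) are assumptions made for the example.

```python
def rendered_height_m(real_height_m: float,
                      virtual_distance_m: float,
                      render_distance_m: float) -> float:
    """Height at which to draw a remote video so its visual angle is preserved.

    real_height_m: physical height covered by the remote video.
    virtual_distance_m: distance to the remote participant in the virtual geometry.
    render_distance_m: distance from the viewer to where the video is augmented.
    """
    if virtual_distance_m <= 0.0 or render_distance_m <= 0.0:
        raise ValueError("distances must be positive")
    # Similar triangles: height / distance is the same at both depths.
    return real_height_m * render_distance_m / virtual_distance_m
```

For example, a 1.0 m tall view of a participant virtually 3.0 m away, rendered on a wall 1.5 m from the viewer, would be drawn 0.5 m tall.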

[0099] In order to create the impression that the participants are looking at each other, the camera positioning system may track the remote participants' faces from the videos and position cameras on the display so that they are positioned to coincide with the eyes in the remote participant's augmented video. In order to transmit stereo video to the remote site, cameras may be positioned at both eyes; however, one camera per remote participant may be used, as shown in FIG. 14. As shown in FIG. 14, the augmented videos 1302, 1304, and 1306 each have a single camera coinciding with the remote user's eye, thus reducing vertical parallax that may otherwise occur.

[0100] Another aspect in described embodiments is spatial audio. It may be beneficial for a local participant to realize which one of the remote participants is talking, based on the direction of the audio. For this purpose, the system may be equipped with a spatial audio system, allowing the audio from the remote participants to be mixed so that a remote participant's sound seems to be coming from the direction of the corresponding video stream. Such embodiments may be achieved by a multi-speaker audio system in the local site and a multi-microphone system in the remote site.
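As a simplified illustration of directional mixing, the sketch below uses constant-power (sine/cosine) panning between two speakers. This is a common stereo technique substituted here for the multi-speaker system mentioned above, not a method taken from the described embodiments. The azimuth convention (zero straight ahead, positive to the right) and the function name are assumptions.

```python
import math
from typing import Tuple


def constant_power_gains(azimuth_rad: float) -> Tuple[float, float]:
    """Return (left gain, right gain) for a source at the given azimuth.

    Azimuths beyond +/-90 degrees are clamped to the nearest speaker.
    The squared gains always sum to 1, so perceived loudness stays constant.
    """
    clamped = max(-math.pi / 2, min(math.pi / 2, azimuth_rad))
    theta = (clamped + math.pi / 2) / 2.0  # maps [-90, +90] degrees to [0, 90]
    return math.cos(theta), math.sin(theta)
```

The azimuth would be taken from the direction of the corresponding augmented video as seen by the local participant.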

[0101] In some embodiments, a method for positioning one of a plurality of cameras at a second location for use in generating a line-of-sight view for a first conference participant at a first physical location of a second conference participant at a second physical location in a multisite video conference includes determining information related to a virtual conference geometry comprising information regarding the positions of a first conference participant at a first physical location, a second conference participant at a second physical location, and a third conference participant at a third physical location, determining a real world line-of-sight of the second conference participant at a second physical location corresponding to a virtual line of sight for the first conference participant at a first physical location of the second conference participant at a second physical location in the determined virtual conference geometry, adjusting the location and orientation of a first camera of the plurality of cameras at the second location to a position and direction substantially along the determined real world line of sight of the second conference participant at a second physical location corresponding to a virtual line of sight for the first conference participant at a first physical location of the second conference participant at a second physical location in the determined virtual conference geometry, and displaying a rendered representation of the second user to the first conference participant at a first physical location based at least in part on imagery captured by the relocated and reoriented first camera.

[0102] In some embodiments, the method further includes panning and/or tilting the first camera to establish the virtual line-of-sight between the first and second users.

[0103] In some embodiments, at least one camera of the plurality of cameras at a second location is one of: a monoscopic, stereoscopic, 180, 360, RGBD, or light-field camera.

[0104] In some embodiments, at least one camera of the plurality of cameras at a second location provides the data to create a 3D reconstructed model of the conferencing session at each site to then produce a virtual camera video stream from the particular position best matching the viewing orientation of the remote session participants.

[0105] In some embodiments, the rendered representation of the second user is displayed to the first conference participant using one of a HMD, AR viewport, AR glasses, and multi-view display.

[0106] FIG. 19 illustrates a call-flow diagram, in accordance with some embodiments. As shown, a local user initiates 1902 a call (e.g., using an HMD), through a local tele-conference system, which connects 1904 to a remote tele-conference system. The local tele-conference system calibrates 1906 the conference room makeup using local cameras looking around 1912 the room as well as in some embodiments the user looking 1910 around the room, having cameras on the HMD. The remote tele-conference system performs similar calibration actions 1908 with a remote user in a remote meeting site. The local and remote cameras begin streaming 1914/1916 video data via local and remote tele-conference systems, respectively. Further, the local and remote user positions are streamed 1918/1920, respectively. A shared virtual geometry is created 1922 and shared 1924 and the video feeds from the local and remote systems are shared 1928/1926 with each other.

Augmenting Telepresence Sessions.

[0107] As described above, traditional video conferencing systems allow users to see and hear what happens in a remote site, but the users cannot interact with the remote environment. Remote AR systems exist, allowing users to interact with virtual objects augmented to the remote environment, at least to some extent. An example of such a system is described by Siltanen et al. (Siltanen, Pekka; Valli, Seppo; Ylikerala, Markus; Honkamaa, Petri (2015), "An Architecture for Remote Guidance Service", CE2015, 2015, 10p).

[0108] 3D models and animations are virtual elements that may be visualized in AR. However, AR objects can be any digital information for which 3D position and orientation gives added value, for example pictures, videos, graphics, text, and audio.

[0109] In some spatial videoconferencing systems providing spatial viewpoints for augmented 3D objects, information on the remote space is captured using a fixed distributed 3D capture setup. Further, many AR solutions are primarily local. Remote AR solutions may be two-point and unsymmetrical, for example, remote guidance, where only the remote user (expert/consultant) is able to make augmentations. Symmetrical remote AR solutions are rare, and symmetrical, spatial, multi-point, remote AR solutions are almost non-existent.

[0110] In some embodiments, AR objects are remotely augmented into a setup of multiple participants and environments. The systems described above may be applied and extended for producing and showing 3D objects by and for the participants of the interaction session. In some embodiments, augmentation of 3D objects is made in real-time, without the need for local assistance.

[0111] Due to the underlying unified geometry (coordinate system) produced using methods described above, all participants have individual natural viewpoints to each other. In some embodiments, unified geometry also enables individual viewpoints to AR objects. FIG. 15 illustrates an augmented interaction 1500 between three participants 1502/1504/1506 and three added 3D objects 1508/1510/1512. The hashed triangle in the middle represents virtual space between the real physical spaces. The positions of participants 1502/1504/1506 are known by tracking and positions of the objects 1508/1510/1512 are known due to them being placed by one or more of the users. The system provides individual viewpoints (lines-of-sight) between all actors (humans and objects). Lines-of-sight are provided by camera arrays or moveable cameras on the walls, the walls illustrated by the dashed lines.

[0112] FIG. 17 illustrates augmented interaction, seen using AR glasses, for example, from the perspective of the rightmost user 1504 in FIG. 15. The 3D building model 1512 in the shared virtual space blocks the view of the meeting site with one of the remote participants 1506. Meeting room interiors in the augmented sub-views (inside smaller quadrangles) are not shown in FIG. 17 for simplicity. In some embodiments, each line-of-sight between participants is supported by a pair of local and remote cameras defined by the meeting geometry. For seeing augmented or virtual objects, one-way connection is sufficient. In some embodiments, the line-of-sight may be established using configurations with moving cameras, as described above.

[0113] In alternative embodiments, each participant may wear augmented reality HMDs or glasses (either optical or video see-through) with an embedded camera. The embedded camera locates each user with respect to one or more fiducials (markers or natural images), which in the exemplary setup are attached on a nearby wall. Such markers are illustrated as markers 1602/1604/1606 in FIG. 16. Combining this information, the system naturally also knows the local users' positions with respect to each other. In addition to the markers, a horizontal array of wide-angle cameras 1608 may be located on the wall, in relation to the markers, at an average eye-level of a sitting person (approximately 125cm above the floor level). FIG. 16 illustrates an arrangement of markers and cameras, in accordance with some embodiments. FIG. 16 illustrates a system before augmenting remote sites and AR objects. A camera array is attached on the wall. Here, by way of example, the user 1504 is positioned by tracking markers 1602/1604/1606 with the embedded camera in the user's AR glasses. The spacing between neighboring cameras 1608 is small enough to provide each user with separate virtual lines-of-sight to all remote participants. As an alternative to camera arrays, one or more movable cameras may be used as described above. Remote participants are augmented spatially to the closest camera positions defined by the lines-of-sight in the unified virtual geometry. Instead of a separate 3D capture setup, a camera array may provide varying viewpoints to the position where a remote user wants to add a 3D object.
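With a fixed horizontal camera array, choosing the camera for a given line-of-sight reduces to picking the array element closest to the point where the line-of-sight meets the wall. The sketch below assumes the camera positions and the intersection point are expressed as horizontal coordinates along the wall, in meters; the function name is hypothetical.

```python
from typing import Sequence


def nearest_camera_index(camera_positions_m: Sequence[float],
                         line_of_sight_hit_m: float) -> int:
    """Index of the array camera closest to the line-of-sight intersection."""
    if not camera_positions_m:
        raise ValueError("the camera array is empty")
    # Compare horizontal distances along the wall and keep the smallest.
    return min(range(len(camera_positions_m)),
               key=lambda i: abs(camera_positions_m[i] - line_of_sight_hit_m))
```

The remote participant's video would then be augmented at the position of the selected camera, as described above.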

[0114] In some embodiments, capturing the positions of the users is based optionally on electronic sensors (acceleration sensors, depth sensors, etc.). In such embodiments the AR glasses or HMDs do not necessarily need a camera for visual positioning (for detecting markers). Instead of camera arrays, lines-of-sight may be provided by different kinds of setups with reduced number of cameras.

[0115] In some embodiments, users are positioned and tracked, brought into one consistent, unambiguous geometry, where lines-of-sight to other participants are provided using the geometry. Objects are placed by users, with no need to position or track. In some embodiments, the object positions are chosen by users. The objects are brought into the same consistent, unambiguous geometry, and lines-of-sight are provided from participants to the objects using the geometry. Users each have individual viewpoints also to the shared virtual space. In some embodiments, the virtual space includes e.g., a table, and 3D models for collaborative viewing. In some embodiments, users interact with the 3D models and are provided with the ability to move/turn the 3D models. In some embodiments, side views are provided using a camera array/setup (either array of static cameras or an array of moving cameras), with an associated UI for seeing side views to specified location, e.g. by "sliding along" the camera array. In some embodiments, a UI is presented for placing, rotating, and scaling 3D augmentations using side views. In some embodiments, the UI may include a floor plan/map for inserting and moving objects in the common layout. In some embodiments, objects can be placed in each participant's physical room or in the shared virtual space between rooms. In some embodiments, interaction to move 3D objects between spaces may be supported, providing a continuum of local (physical), remote (AR), and virtual spaces.

[0116] Objects may be treated the same way as users. Object location (environment) is captured by line-of-sight camera based on virtual geometry and the view is displayed on the receiving site using AR on line-of-sight. Remote augmentation of 3D objects is supported by providing side-views to the chosen and to-be-adjusted object position using e.g. a camera array. In some embodiments, spatial/3D audio sources are assigned relating the positions of physical or augmented objects.

[0117] Some embodiments include a shared virtual space, such that users have individual viewpoints also to the virtual space. In some embodiments, the virtual space can contain e.g. a table, and a 3D model for collaborative viewing. Through use of UIs described above, the users may interact with the 3D model to move/scale/rotate the 3D model around any of the 3 axes.

[0118] FIGs. 20A and 20B illustrate exemplary flowcharts of a process 2000, in accordance with some embodiments. FIGs. 20A and 20B are similar to FIGs. 8A and 8B, with the exception that 3D objects are processed in addition to the users. As shown in FIG. 20A, for each site, user, and additional AR object 2002, each user's 3D position is tracked 2004 with respect to cameras in the user's location. The position of the AR object is obtained 2008 if any AR objects are present in the virtual meeting. The user and object positions are stored 2006 as coordinates for use in forming 2012 the geometry for each meeting site once all users and objects have been processed 2010. The geometry of each meeting site is stored 2014. The method continues in FIG. 20B, where for each site, user, and AR object 2016, video is captured, linearized, and cropped to the user/object direction 2018. The user video/AR object is augmented on each line-of-sight 2020 until all users, objects, and lines-of-sight are processed 2022. If any changes occur in the meeting setup 2024 and the meeting has not ended 2026, the meeting site geometry is updated 2030 and the geometry of each meeting site 2028 is updated for use in capturing and augmenting 2018/2020, respectively.

[0119] In some embodiments, a system includes cameras, microphones, and loudspeakers for visual and audio communication. The set of cameras, microphones, and speakers may be reasonably large (but tolerable) in order to support a flexible number of users and realistic lines-of-sight. In some embodiments, the cameras are in a fixed array, while alternatively the cameras may be moving cameras, or a combination of both.

[0120] The system may include a spatial audio system. In some embodiments, audio sources are augmented in user's environment. In some embodiments, the augmented audio sources are associated with either physical or augmented 3D objects. In some embodiments, playback is accomplished in relation to chosen 3D positions by using a spatial audio system with multiple channels and loudspeakers as in audio surround systems. An example of this is given in US 2015/0215351, "Control of enhanced communication between remote participants using augmented and virtual reality".

[0121] An exemplary system includes AR scene generation, e.g. rendered AR views for showing remote participants and augmented 3D objects. In some embodiments, a maximum number of remote participants is the same as or less than a number of physical cameras in the room. Note that more remote participants may be served in embodiments in which a camera happens to coincide with two or more lines-of-sight.

[0122] In some embodiments, each participant is wearing AR glasses to render AR views of remote parties as well as AR objects. Alternatively, displays such as external screens may be used.

[0123] An exemplary system includes a tracking method and device for tracking and positioning of users. The tracking device is used for serving multiple users per site. The tracking device is configured for positioning and tracking (e.g. pan-tilt-zoom) for cropping of the line-of-sight camera view correctly according to one consistent virtual geometry. In some embodiments, tracking includes AR tracking performed by AR glasses.

[0124] The system uses geometry calculations and definition to use captured (and tracked) positions of all users. The geometry is consistently defined for each user using rules. In some embodiments, line-of-sight cameras are allocated to provide real line-of-sight views. The system may also create a graphical layout of the meeting setup for an AR editor. In some embodiments, the virtual geometry is implemented in a client terminal or on a server for creating and managing connections, and storing AR objects.

[0125] In some embodiments, the system includes cameras for remote AR editing, as well as an associated AR editor. Cameras for visual communication can be used also.

[0126] In some embodiments, users are positioned and tracked, brought into one consistent, unambiguous geometry, where lines-of-sight to other participants are provided using the geometry.

[0127] In some embodiments, the consistent unambiguous geometrical arrangement results in three types of subspaces, which have different possibilities and requirements for interaction. The first subspace includes local spaces, which are the physical meeting spaces for one or more users. In some embodiments, the physical environments are used to experience either locally or remotely made objects or avatars. In some embodiments, augmentations in local spaces can be seen by local participants using AR glasses, for example. In some embodiments, augmentations in local spaces can be seen by remote participants over a network by using AR glasses.

[0128] The second type of subspaces are remote spaces. The remote space corresponds to meeting spaces in defined spatial arrangement seen by other participants over the network. Users are shown according to their detected and tracked positions in their respective remote spaces. In some embodiments, the remote spaces represent environments to place remote augmentations. In order to do so, a layout/map type graphical UI and a camera array setup may be used to choose the position of the object to be augmented.

[0129] The third type of subspace is a virtual space, which corresponds to a shared space in between the geometrically arranged meeting sites (local/remote spaces). In some embodiments, the virtual space is a virtually modelled and furnished space used to form a visually connecting element between meeting rooms. In some embodiments, the virtual space is used for showing virtual object(s) for collaborative viewing.

[0130] As described above, the interaction space may include three basic types of subspaces. A user can place and move a 3D model in and between any of these spaces. Placement of the object preferably starts with the graphical layout of the interaction setup, which is used to make an initial estimate of the 3D location for the augmentation. Having this initial estimate, the system can point and zoom (by virtual pan-tilt-zoom) the array of cameras towards the initial position and show the object rendered in each of the camera views. FIG. 18 illustrates a remote user (not shown in picture) augmenting local user's 1504 site with an AR object 1804. The cameras know the object's 3D position (as the position is defined by the remote user) and are pointed (virtual pan-tilt-zoom) to provide multiple perspective views to aid remote placing, rotating, and scaling of the AR object. In some embodiments, the user can drag the viewpoint to the object position along the camera array by mouse interaction and see how the augmentation appears from different angles. Based on these side views, the user can adjust the position, orientation, and size of the 3D object until he/she is satisfied with the result. Exemplary embodiments also allow local users to position AR objects using the perspective views that are accessible to remote users.
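The virtual pan-tilt-zoom pointing described above can be illustrated with a small geometric sketch. It is not taken from the disclosure: the function name `pan_tilt_towards` and the coordinate convention (x along the wall, y up, z into the room, all in meters) are assumptions made for illustration only. Given a camera position and the estimated 3D position of the AR object, it returns the pan and tilt angles at which a crop of the wide-angle view would be centered.

```python
import math

def pan_tilt_towards(camera_pos, target_pos):
    """Return (pan_deg, tilt_deg) that point a camera at a 3D target.

    Assumed (hypothetical) axes: x along the wall, y up, z into the room.
    Pan is measured around the vertical axis from the wall normal (+z);
    tilt is the elevation above the horizontal plane.
    """
    dx = target_pos[0] - camera_pos[0]
    dy = target_pos[1] - camera_pos[1]
    dz = target_pos[2] - camera_pos[2]
    horizontal = math.hypot(dx, dz)
    if horizontal == 0.0 and dy == 0.0:
        raise ValueError("target coincides with the camera position")
    pan_deg = math.degrees(math.atan2(dx, dz))
    tilt_deg = math.degrees(math.atan2(dy, horizontal))
    return pan_deg, tilt_deg

# Example: a camera at eye level (1.25 m) looking at an object 1 m to the
# right and 1 m into the room, at the same height.
pan, tilt = pan_tilt_towards((0.0, 1.25, 0.0), (1.0, 1.25, 1.0))
```

Evaluating the same function for every camera in the array yields one perspective view per camera, which corresponds to the side views a remote user would slide through while adjusting the object.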

[0131] A UI component to use for moving of objects in some embodiments is the graphical layout of the adjacent meeting spaces as defined by the system. As the coordinate system extends over all these spaces, moving an object over subspace boundaries may be visualized as a continuous process, so that an object may gradually leave one space and appear respectively gradually in another space according to a user's action.

[0132] The objects are treated similarly as to how human participants are treated. There are some important differences, however, such as objects are seen as part of the physical (or shared virtual) environment in the same geometry as human participants. Cameras may be used to capture the physical region containing an augmented object. Further, objects do not need to receive views of other participants. Unidirectional video capture and connection is enough for objects. For human participants, connections are bidirectional, as described above.
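The difference between bidirectional connections for human participants and unidirectional connections for objects can be sketched as a small connection planner. This is only an illustration under stated assumptions: the function `plan_streams`, the entity dictionary layout, and the restriction to streams between different sites are all hypothetical and not part of the disclosure.

```python
def plan_streams(entities):
    """Return the set of (source, viewer) video streams to set up.

    `entities` maps a name to (site, kind), where kind is "human" or
    "object" (hypothetical layout). A stream is created from every entity
    to every human at a different site. Objects are never viewers, so
    their connections are unidirectional, while two humans at different
    sites end up with one stream in each direction.
    """
    streams = set()
    for source, (source_site, _source_kind) in entities.items():
        for viewer, (viewer_site, viewer_kind) in entities.items():
            if source_site != viewer_site and viewer_kind == "human":
                streams.add((source, viewer))
    return streams

# Example: two humans at different sites plus one augmented object.
example = {
    "alice": ("site1", "human"),
    "bob": ("site2", "human"),
    "model": ("site2", "object"),
}
streams = plan_streams(example)
```

In this example the planner yields streams in both directions between the two humans and a single stream from the object to the remote human, with no stream directed towards the object.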

[0133] A user may choose to participate in the collaboration from her physical meeting location. In addition, the user can choose to augment and see his/her own avatar in some of the remote meeting rooms. An object augmented in some of the physical spaces can thus also be a human avatar. In some embodiments, the user (the exemplar of an avatar) sees and controls his/her avatar only as an outsider, in a third person view. Although a common paradigm in computer games, this is likely to reduce the perceived feeling of presence and immersion.

[0134] The participants may choose to use the shared virtual space in the same way as a normal virtual environment. Differing from the above, avatars in the virtual space can see each other also in a first person mode, as well as through "windows" in the virtual space to users in environments participating via video.

[0135] Although the system is configurable to support a user appearing simultaneously in several spaces, in many embodiments it is desirable to restrict his/her appearance to one modality (e.g. either a real video or a virtual avatar) in order not to confuse other participants with multiple simultaneous appearances.

[0136] In some embodiments, a user interacts with other users using only the shared virtual space. In some embodiments, the user uses an avatar representing himself/herself in the shared space. Window views from participants in their physical surroundings (around the virtual space) can be brought to the avatar from the correct view-point using knowledge about the defined system geometry.

[0137] In some embodiments, a user participates in an interactive session using existing video conferencing application, running e.g. on a laptop or some other terminal with a hardware display. In addition to normal videoconferencing, a virtual mode using the shared virtual environment is straightforwardly supported.

[0138] Apart from the spatial perception provided to some extent when using the above-mentioned shared virtual space, a user having reduced terminal capabilities loses the spatial support (natural spatial viewpoints) provided by the disclosed system. The user can see the remote sites, users, and the possibly augmented objects in them, in a video mosaic type of arrangement. The user can also see the objects someone has augmented in remote parties' meeting rooms and can place augmentations there; however, the remote parties cannot obtain varying viewpoints to make remote augmentations to the user's environment.

[0139] As described, a user of the disclosed system does not have the need for a full-fledged system with all described components or functionalities. This type of downward compatibility is a benefit considering the take-up and penetration of the disclosed system.

[0140] In embodiments where there are several users per site and/or a number of augmented objects, spatially augmented remote views for them occupy an increasingly large area of the receiver's field of view. It is a reasonable option to collect all the views (stitches) coming from the same remote site and stitch them into a large panorama seen from the receiving user's view-point.

[0141] In some embodiments, all views (for humans or objects) coming from the same remote site have different parts of the same physical background. As the views come from different cameras from different viewpoints, even from slightly different distances, those parts appear in different perspectives and scales.

[0142] Forming a 3D reconstruction from all line-of-sight cameras is used in some embodiments to get the combined panorama view from a correct view-point. An alternative option is to apply some simplified approximation using the views from the array of cameras. In some embodiments, the system stitches neighboring camera views into a panorama so that the distortions on image borders stay small. A panorama may be generated using the techniques described in Yang et al., "A Real-Time Distributed Light Field Camera", Eurographics Workshop on Rendering (2002), pp. 1-10. In Yang et al., a variant of a light field rendering method is presented in which the panorama is collated from a set of vertical image slices received from an array of cameras.
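The idea of collating a panorama from vertical image slices can be shown in its simplest form. The sketch below is not the method of Yang et al., which selects slices according to the desired viewpoint; it only takes a fixed central strip from each camera image and joins the strips side by side. The function name `stitch_center_slices` and the representation of an image as a list of pixel rows are assumptions made for illustration.

```python
def stitch_center_slices(images, slice_width):
    """Collate a panorama from the central vertical slice of each image.

    `images` is a list of equally tall images ordered left to right along
    the camera array; each image is a list of rows of pixel values.
    Simplified illustration only: every camera contributes the same
    central strip regardless of the viewer's position.
    """
    if not images:
        raise ValueError("at least one image is required")
    height = len(images[0])
    panorama = [[] for _ in range(height)]
    for image in images:
        if len(image) != height:
            raise ValueError("all images must have the same height")
        width = len(image[0])
        if slice_width > width:
            raise ValueError("slice is wider than the image")
        start = (width - slice_width) // 2
        for row_index, row in enumerate(image):
            panorama[row_index].extend(row[start:start + slice_width])
    return panorama

# Example: two 2x4 images, each contributing its two central columns.
left = [[0, 1, 2, 3], [4, 5, 6, 7]]
right = [[10, 11, 12, 13], [14, 15, 16, 17]]
panorama = stitch_center_slices([left, right], 2)
```

Narrower slices taken from more densely spaced cameras would reduce the perspective jumps at slice borders, which is consistent with the aim of keeping distortions at image borders small.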

[0143] Some exemplary embodiments use a window paradigm (windows being augmented to virtual glasses) to provide lines-of-sight to other participants using the defined common virtual geometry and real-world coordinate system. This can be referred to as outside-in viewing approach. Each participant thus sees the other participants through a window to their environments, not being themselves inside that space. The latter can in principle be supported by capturing each remote site by a sensor setup, reconstructing the sites in 3D, and forming virtual camera views (view-points) from a desired position. Respectively, this approach can be referred to as inside-out approach. The latter approach is favorable in Mixed Reality based systems supporting visits by avatars, and expecting that a user must see from the remote space with the eyes of his/her avatar, for example supported by virtual cameras using the 3D reconstructed space.

[0144] FIG. 21 illustrates a flowchart of a method 2100, in accordance with some embodiments. As shown, method 2100 includes generating 2102 a virtual geometry based on position information for at least a first user in a first physical location, a second user in a second physical location, and at least a third user in a third location. A virtual line-of-sight is determined 2104 in the virtual geometry between the first user and the second user, and responsively a real-world line-of-sight is determined 2106, the real-world line-of-sight being associated with the virtual line-of-sight by mapping the virtual line-of-sight into the first physical location. At least one moveable camera of a plurality of moveable cameras is automatically moved to a position and orientation along the determined real-world line-of-sight, and a rendered version of the second user is displayed 2110 proximate to the position of the at least one moveable camera. In some embodiments, the rendered version of the second user is displayed along the real-world line-of-sight. In such embodiments, eyes of the second user may be displayed along the real-world line-of-sight to eliminate vertical parallax. The rendered version may be displayed to the first user such that the rendered version of the second user appears substantially along the real-world line-of-sight.
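The step of mapping a virtual line-of-sight into the first physical location can be illustrated in two dimensions. The sketch below rests on assumptions that are not stated in the disclosure: the local wall carrying the cameras lies on the line y = 0 of a local floor-plan frame, the local user is in front of the wall (y > 0), and the remote user has already been mapped to a point behind the wall (y < 0). The function names `wall_crossing` and `nearest_camera` are hypothetical.

```python
def wall_crossing(local_user, remote_user):
    """Return the x coordinate where the line-of-sight crosses the wall y = 0.

    Both arguments are (x, y) floor-plan points in the local frame
    (assumed convention: local user at y > 0, remote user at y < 0).
    """
    ux, uy = local_user
    rx, ry = remote_user
    if uy <= 0 or ry >= 0:
        raise ValueError("users must be on opposite sides of the wall")
    t = uy / (uy - ry)  # fraction of the way from the local to the remote user
    return ux + t * (rx - ux)

def nearest_camera(crossing_x, camera_xs):
    """Return the index of the camera closest to the crossing point."""
    return min(range(len(camera_xs)), key=lambda i: abs(camera_xs[i] - crossing_x))

# Example: local user 2 m in front of the wall, remote user mapped 4 m
# behind it and 3 m to the side.
target_x = wall_crossing((0.0, 2.0), (3.0, -4.0))
camera_index = nearest_camera(target_x, [-1.0, 0.0, 0.9, 2.0])
```

With a moveable camera, `target_x` would serve as the position to move the camera to; with a fixed array, `nearest_camera` picks the closest available camera, at the cost of a small deviation from the ideal line-of-sight.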

[0145] In some embodiments, video captured by the at least one moveable camera is transmitted to the second user in the second physical location for display to the second user. In some embodiments, the rendered version of the second user corresponds to a received 3D reconstructed model formed using a plurality of cameras at the second location. Alternatively, the rendered version of the second user may correspond to a live video of the second user captured at the second location.

[0146] In some embodiments, the at least one moveable camera is further panned to align a field of view of the moveable camera with the determined real-world line-of-sight.

[0147] In some embodiments, information about a position of an augmented reality object in the second location is obtained, and a second virtual line-of-sight in the virtual geometry is determined between the position of the augmented reality object in the second location and the first user in the first location. Subsequently a second real-world line-of-sight associated with the second virtual line-of-sight is determined by mapping the second virtual line-of-sight into the first physical location, and a rendered version of the augmented reality object is displayed on the second real-world line of sight.

[0148] In some embodiments, in response to at least one of the first and second users moving in the first and second locations respectively, the virtual line-of-sight is automatically updated and the at least one moveable camera is repositioned along an updated real-world line-of-sight determined from the updated virtual line-of-sight.

[0149] In some embodiments, generating the virtual geometry includes obtaining information about N locations, wherein N is an integer greater than two, and forming the virtual geometry by assigning each location to a respective side of an N-sided virtual polygon representing a virtual meeting space.
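One way to picture the N-sided virtual polygon is the following sketch, which places each site on one side of a regular polygon and converts local room coordinates into the shared virtual frame. The regular shape, the equal side length, the choice of the side midpoint as local origin, and the assumption that each room extends outward from its side are illustrative assumptions; the names `polygon_side_frames` and `local_to_virtual` are hypothetical.

```python
import math

def polygon_side_frames(n_sites, side_length):
    """Return one (midpoint, tangent, outward_normal) frame per polygon side.

    The polygon is regular and centered on the origin of the virtual
    geometry; site k is assigned to side k.
    """
    if n_sites < 3:
        raise ValueError("an N-sided polygon needs N greater than two")
    apothem = side_length / (2.0 * math.tan(math.pi / n_sites))
    frames = []
    for k in range(n_sites):
        theta = 2.0 * math.pi * k / n_sites
        outward = (math.cos(theta), math.sin(theta))
        tangent = (-math.sin(theta), math.cos(theta))
        midpoint = (apothem * outward[0], apothem * outward[1])
        frames.append((midpoint, tangent, outward))
    return frames

def local_to_virtual(frame, along_wall, from_wall):
    """Map a local position (offset along the wall, distance from it) into
    the virtual geometry, assuming the room extends outward from its side."""
    midpoint, tangent, outward = frame
    x = midpoint[0] + along_wall * tangent[0] + from_wall * outward[0]
    y = midpoint[1] + along_wall * tangent[1] + from_wall * outward[1]
    return (x, y)

# Example: four sites around a square virtual meeting space with 2 m sides;
# a user at site 0 sits 1.5 m from the wall, centered on it.
frames = polygon_side_frames(4, 2.0)
user_in_virtual = local_to_virtual(frames[0], 0.0, 1.5)
```

Once every user is expressed in this common frame, a virtual line-of-sight between two users is simply the segment joining their virtual positions.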

[0150] In some embodiments, a similar method may be performed for the third user in the third physical location.

[0151] In some embodiments, at the second (or third) physical location, a second real-world line-of-sight is determined associated with the virtual line-of-sight by mapping the virtual line-of- sight into the second (or third) physical location, and at least one moveable camera of a plurality of moveable cameras in the second (or third) location is automatically moved to a position and orientation along the determined second real-world line-of-sight. A rendered version of the first user may be displayed proximate to the position of the at least one moveable camera of the plurality of moveable cameras in the second location.

[0152] In some embodiments, the method further includes updating the virtual geometry in response to obtaining position information about at least a fourth user in a fourth location. In such embodiments, in response to updating the virtual geometry, the virtual line-of-sight may be updated based on the updated virtual geometry, and the method may include determining an updated real-world line-of-sight based on the updated virtual line-of-sight and automatically moving the at least one moveable camera to a position and orientation along the updated real-world line-of-sight.

[0153] In some embodiments, the method further includes panning and/or tilting the first camera to establish the virtual line-of-sight between the first and second users.

[0154] In some embodiments, at least one camera of the plurality of cameras at a second location is one of: a monoscopic, stereoscopic, 180, 360, RGBD, or light-field camera.

[0155] In some embodiments, at least one camera of the plurality of cameras at a second location provides the data to create a 3D reconstructed model of the conferencing session at each site to then produce a virtual camera video stream from the particular position best matching the viewing orientation of the remote session participants.

[0156] In some embodiments, a rendered representation of the second user is displayed to the first conference participant using at least one of an HMD, an AR viewport, AR glasses, and a multi-view display.

[0157] The above approach does not inherently support bodily (e.g. motion capture based) control of the avatar based on the user's own motions. At its best, the experience could be that of virtual games, with mouse control and no sensory feedback. Likely there are people who would accept such an approach, but it is still challenged by the technical complexity of real-time 3D reconstruction with high quality, without producing excessive amounts of data.

[0158] The above-mentioned difficulties in forming and controlling an avatar and supporting inside-out virtual view-points are avoided by the approach taken in embodiments described above. Supporting remote augmentation of 3D objects can be made in a much less demanding way by using the same camera array used for providing lines-of-sight between remote users.

[0159] The above-described embodiments provide several solutions to problems associated with remote tele-interaction systems. The solutions include, but are not limited to, providing a continuum of local, virtual, and remote spaces, providing perception of gaze, directions, and scale, providing remote augmentation of 3D objects to remote (physical) spaces, providing local augmentation and viewing of 3D objects in local (physical) spaces, providing local viewing of remotely augmented (received) 3D objects, providing bodily first person control of viewpoint to remote spaces (physical and/or virtual), providing virtual visits into shared virtual space as avatars, providing induction of 3D objects into shared virtual space, and allowing human participants, objects, the virtual environment, etc. to be shown on separate layers if desired.

[0160] Virtual and Mixed Reality worlds are an attempt to overcome the difficulties of teleporting actual objects. They do, however, lack naturalness and physical senses, and do not enable experiencing a remote space by one's own free movements, as users are restricted by their actual physical environment.

[0161] Virtual/Mixed Reality solutions require ways to control one's digital clone or avatar, most favorably by controlling yourself. However, it is very difficult to copy one's natural motion, gestures and mimics especially when one is using virtual glasses. A clone also benefits from senses. Hearing is easy to implement, but seeing requires viewpoints from inside the space the clone is visiting. It is difficult to provide a digital clone with tactile senses, which is a big hindrance for experiencing physically remote environments.

[0162] Due to the above perceptual and technical difficulties, no good solutions for teleporting exist. Those most complicated solutions based on 3D reconstruction, virtual cameras etc. do not meet many of the important requirements.

[0163] Embodiments described above provide a spatial tele-interaction system supporting natural spatial viewpoints to remote participants without compromising the freedom of moving around. The embodiments also support taking objects to remote spaces as augmented 3D models. The embodiments avoid the need for computationally demanding 3D reconstruction and virtual camera views.

Exemplary Communications Framework.

[0164] FIG. 1A is a diagram of an example communications system 100 in which one or more disclosed embodiments may be implemented. The communications system 100 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, and the like, to multiple wireless users. The communications system 100 may enable multiple wired and wireless users to access such content through the sharing of system resources, including wired and wireless bandwidth. For example, the communications system 100 may employ one or more channel-access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), and the like. The communications system 100 may also employ one or more wired communications standards (e.g., Ethernet, DSL, radio frequency (RF) over coaxial cable, fiber optics, and the like).

[0165] As shown in FIG. 1A, the communications system 100 may include client devices 102a, 102b, 102c, and/or 102d, Radio Access Networks (RAN) 103/104/105, a core network 106/107/109, a public switched telephone network (PSTN) 108, the Internet 110, and other networks 112, and communication links 115/116/117, and 119, though it will be appreciated that the disclosed embodiments contemplate any number of client devices, base stations, networks, and/or network elements. Each of the client devices 102a, 102b, 102c, 102d may be any type of device configured to operate and/or communicate in a wired or wireless environment. By way of example, the client device 102a is depicted as a tablet computer, the client device 102b is depicted as a smart phone, the client device 102c is depicted as a computer, and the client device 102d is depicted as a television.

[0166] The communications system 100 may also include a base station 114a and a base station 114b. Each of the base stations 114a, 114b may be any type of device configured to wirelessly interface with at least one of the client devices 102a, 102b, 102c, 102d to facilitate access to one or more communication networks, such as the core network 106/107/109, the Internet 110, and/or the networks 112. By way of example, the base stations 114a, 114b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a site controller, an access point (AP), a wireless router, and the like. While the base stations 114a, 114b are each depicted as a single element, it will be appreciated that the base stations 114a, 114b may include any number of interconnected base stations and/or network elements.

[0167] The base station 114a may be part of the RAN 103/104/105, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, and the like. The base station 114a and/or the base station 114b may be configured to transmit and/or receive wireless signals within a particular geographic region, which may be referred to as a cell (not shown). The cell may further be divided into sectors. For example, the cell associated with the base station 114a may be divided into three sectors. Thus, in one embodiment, the base station 114a may include three transceivers, i.e., one for each sector of the cell. In another embodiment, the base station 114a may employ multiple-input multiple-output (MIMO) technology and, therefore, may utilize multiple transceivers for each sector of the cell.

[0168] The base stations 114a, 114b may communicate with one or more of the client devices 102a, 102b, 102c, and 102d over an air interface 115/116/117, or communication link 119, which may be any suitable wired or wireless communication link (e.g., radio frequency (RF), microwave, infrared (IR), ultraviolet (UV), visible light, and the like). The air interface 115/116/117 may be established using any suitable radio access technology (RAT).

[0169] More specifically, as noted above, the communications system 100 may be a multiple access system and may employ one or more channel-access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like. For example, the base station 114a in the RAN 103/104/105 and the client devices 102a, 102b, 102c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 115/116/117 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High-Speed Uplink Packet Access (HSUPA).

[0170] In another embodiment, the base station 114a and the client devices 102a, 102b, 102c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 115/116/117 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A).

[0171] In other embodiments, the base station 114a and the client devices 102a, 102b, 102c may implement radio technologies such as IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.

[0172] The base station 114b in FIG. 1A may be a wired router, a wireless router, Home Node B, Home eNode B, or access point, as examples, and may utilize any suitable wired transmission standard or RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, and the like. In one embodiment, the base station 114b and the client devices 102c, 102d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN). In another embodiment, the base station 114b and the client devices 102c, 102d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN). In yet another embodiment, the base station 114b and the client devices 102c, 102d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, and the like) to establish a picocell or femtocell. In yet another embodiment, the base station 114b communicates with client devices 102a, 102b, 102c, and 102d through communication links 119. As shown in FIG. 1A, the base station 114b may have a direct connection to the Internet 110. Thus, the base station 114b may not be required to access the Internet 110 via the core network 106/107/109.

[0173] The RAN 103/104/105 may be in communication with the core network 106/107/109, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the client devices 102a, 102b, 102c, 102d. As examples, the core network 106/107/109 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, and the like, and/or perform high-level security functions, such as user authentication. Although not shown in FIG. 1A, it will be appreciated that the RAN 103/104/105 and/or the core network 106/107/109 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 103/104/105 or a different RAT. For example, in addition to being connected to the RAN 103/104/105, which may be utilizing an E-UTRA radio technology, the core network 106/107/109 may also be in communication with another RAN (not shown) employing a GSM radio technology.

[0174] The core network 106/107/109 may also serve as a gateway for the client devices 102a, 102b, 102c, 102d to access the PSTN 108, the Internet 110, and/or other networks 112. The PSTN 108 may include circuit-switched telephone networks that provide plain old telephone service (POTS). The Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and IP in the TCP/IP Internet protocol suite. The networks 112 may include wired and/or wireless communications networks owned and/or operated by other service providers. For example, the networks 112 may include another core network connected to one or more RANs, which may employ the same RAT as the RAN 103/104/105 or a different RAT.

[0175] Some or all of the client devices 102a, 102b, 102c, 102d in the communications system 100 may include multi-mode capabilities, i.e., the client devices 102a, 102b, 102c, 102d may include multiple transceivers for communicating with different wired or wireless networks over different communication links. For example, the client device 102c shown in FIG. 1A may be configured to communicate with the base station 114a, which may employ a cellular-based radio technology, and with the base station 114b, which may employ an IEEE 802 radio technology.

[0176] FIG. 1B depicts an example client device that may be used within the communications system of FIG. 1A. In particular, FIG. 1B is a system diagram of an example client device 102. As shown in FIG. 1B, the client device 102 may include a processor 118, a transceiver 120, a transmit/receive element 122, a speaker/microphone 124, a keypad 126, a display/touchpad 128, a non-removable memory 130, a removable memory 132, a power source 134, a global positioning system (GPS) chipset 136, and other peripherals 138. It will be appreciated that the client device 102 may represent any of the client devices 102a, 102b, 102c, and 102d, and include any subcombination of the foregoing elements while remaining consistent with an embodiment. Also, embodiments contemplate that the base stations 114a and 114b, and/or the nodes that base stations 114a and 114b may represent, such as but not limited to a base transceiver station (BTS), a Node-B, a site controller, an access point (AP), a home Node-B, an evolved home Node-B (eNodeB), a home evolved Node-B (HeNB), a home evolved Node-B gateway, and proxy nodes, among others, may include some or all of the elements depicted in FIG. 1B and described herein.

[0177] The processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the client device 102 to operate in a wired or wireless environment. The processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. 1B depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip.

[0178] The transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 114a) over the air interface 115/116/117 or communication link 119. For example, in one embodiment, the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals. In another embodiment, the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, as examples. In yet another embodiment, the transmit/receive element 122 may be configured to transmit and receive both RF and light signals. In yet another embodiment, the transmit/receive element may be a wired communication port, such as an Ethernet port. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wired or wireless signals.

[0179] In addition, although the transmit/receive element 122 is depicted in FIG. 1B as a single element, the client device 102 may include any number of transmit/receive elements 122. More specifically, the client device 102 may employ MIMO technology. Thus, in one embodiment, the WTRU 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 115/116/117.

[0180] The transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122. As noted above, the client device 102 may have multi-mode capabilities. Thus, the transceiver 120 may include multiple transceivers for enabling the client device 102 to communicate via multiple RATs, such as UTRA and IEEE 802.11, as examples.

[0181] The processor 118 of the client device 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128. In addition, the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132. The non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 118 may access information from, and store data in, memory that is not physically located on the client device 102, such as on a server or a home computer (not shown).

[0182] The processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the client device 102. The power source 134 may be any suitable device for powering the WTRU 102. As examples, the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), and the like), solar cells, fuel cells, a wall outlet, and the like.

[0183] The processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the client device 102. In addition to, or in lieu of, the information from the GPS chipset 136, the WTRU 102 may receive location information over the air interface 115/116/117 from a base station (e.g., base stations 114a, 114b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the client device 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment. In accordance with an embodiment, the client device 102 does not comprise a GPS chipset and does not acquire location information.

[0184] The processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.

[0185] FIG. 1C depicts an exemplary network entity 190 that may be used in embodiments of the present disclosure, for example as a server. As depicted in FIG. 1C, network entity 190 includes a communication interface 192, a processor 194, and non-transitory data storage 196, all of which are communicatively linked by a bus, network, or other communication path 198.

[0186] Communication interface 192 may include one or more wired communication interfaces and/or one or more wireless-communication interfaces. With respect to wired communication, communication interface 192 may include one or more interfaces such as Ethernet interfaces, as an example. With respect to wireless communication, communication interface 192 may include components such as one or more antennae, one or more transceivers/chipsets designed and configured for one or more types of wireless (e.g., LTE) communication, and/or any other components deemed suitable by those of skill in the relevant art. And further with respect to wireless communication, communication interface 192 may be equipped at a scale and with a configuration appropriate for acting on the network side, as opposed to the client side, of wireless communications (e.g., LTE communications, Wi-Fi communications, and the like). Thus, communication interface 192 may include the appropriate equipment and circuitry (perhaps including multiple transceivers) for serving multiple mobile stations, UEs, or other access terminals in a coverage area.

[0187] Processor 194 may include one or more processors of any type deemed suitable by those of skill in the relevant art, some examples including a general-purpose microprocessor and a dedicated DSP.

[0188] Data storage 196 may take the form of any non-transitory computer-readable medium or combination of such media, some examples including flash memory, read-only memory (ROM), and random-access memory (RAM) to name but a few, as any one or more types of non-transitory data storage deemed suitable by those of skill in the relevant art could be used. As depicted in FIG. 1C, data storage 196 contains program instructions 197 executable by processor 194 for carrying out various combinations of the various network-entity functions described herein.

[0189] Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable media include electronic signals (transmitted over wired or wireless connections) and computer-readable storage media. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.
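
By way of a non-limiting illustration of such a computer program, the following sketch shows one way a virtual geometry might be formed by assigning each of N locations to a respective side of an N-sided virtual polygon. The function names `polygon_sides` and `assign_sites`, the use of a regular polygon centered at the origin, and the `radius` parameter are assumptions introduced here for illustration only; the disclosure does not prescribe them.

```python
import math


def polygon_sides(n, radius=1.0):
    """Return the side midpoints and inward normals of a regular n-gon.

    Assumption: the virtual meeting space is a regular polygon centered
    at the origin with the given circumradius (hypothetical choice).
    """
    if n < 3:
        raise ValueError("N must be an integer greater than two")
    # Distance from the polygon center to the midpoint of each side.
    apothem = radius * math.cos(math.pi / n)
    sides = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        midpoint = (apothem * math.cos(theta), apothem * math.sin(theta))
        # Unit vector pointing from the side toward the polygon center.
        inward_normal = (-math.cos(theta), -math.sin(theta))
        sides.append({"midpoint": midpoint, "inward_normal": inward_normal})
    return sides


def assign_sites(site_ids, radius=1.0):
    """Assign each location (site) to a respective side of the polygon."""
    sides = polygon_sides(len(site_ids), radius)
    return dict(zip(site_ids, sides))


if __name__ == "__main__":
    geometry = assign_sites(["site_1", "site_2", "site_3"])
    for site, side in geometry.items():
        print(site, side["midpoint"], side["inward_normal"])
```

In this sketch each location faces the center of the virtual meeting space along its inward normal, so that users at different locations are arranged around a shared center. Other layouts could equally be used.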

Claims

We claim:
1. A method comprising:
generating a virtual geometry based on position information for at least a first user in a first physical location, a second user in a second physical location, and at least a third user in a third location;
determining a virtual line-of-sight in the virtual geometry between the first user and the second user;
determining a real-world line-of-sight associated with the virtual line-of-sight by mapping the virtual line-of-sight into the first physical location;
automatically moving at least one moveable camera of a plurality of moveable cameras in the first location to a position and orientation along the determined real-world line-of-sight; and
displaying, to the first user, a rendered version of the second user along the real-world line-of-sight.
2. The method of claim 1, further comprising transmitting video captured by the at least one moveable camera to the second user in the second physical location for display to the second user.
3. The method of claim 1, wherein the rendered version of the second user corresponds to a received 3D reconstructed model formed using a plurality of cameras at the second physical location.
4. The method of claim 1, wherein the rendered version of the second user corresponds to a live video of the second user captured at the second physical location.
5. The method of claim 1, wherein moving the at least one moveable camera further comprises panning the at least one moveable camera to align a field of view of the moveable camera with the determined real-world line-of-sight.
6. The method of claim 1, further comprising:
obtaining information about a position of an augmented reality object in the first physical location;
determining a second virtual line-of-sight in the virtual geometry between the position of the augmented reality object in the first physical location and the second user in the second physical location;
determining a second real-world line-of-sight associated with the second virtual line-of-sight by mapping the second virtual line-of-sight into the first physical location;
automatically moving a second moveable camera of the plurality of moveable cameras in the first physical location to a position and orientation along the second real-world line-of-sight; and
transmitting video captured by the second moveable camera to the second user.
7. The method of claim 1, further comprising automatically updating the virtual line-of-sight in response to a change in position of at least one of the first and second users; and
automatically repositioning the at least one moveable camera along an updated real-world line-of-sight determined from the updated virtual line-of-sight.
8. The method of claim 1, wherein generating the virtual geometry comprises:
obtaining information about N locations, wherein N is an integer greater than two; and
forming the virtual geometry by assigning each location to a respective side of an N-sided virtual polygon representing a virtual meeting space.
9. The method of claim 1, further comprising:
determining a second virtual line-of-sight between the first user and the third user in the virtual geometry;
determining a second real-world line-of-sight associated with the second virtual line-of-sight by mapping the second virtual line-of-sight into the first physical location;
automatically moving at least a second moveable camera of the plurality of moveable cameras in the first physical location to a position and orientation along the determined second real-world line-of-sight; and
displaying a rendered version of the third user along the determined second real-world line-of-sight.
10. The method of claim 1, further comprising:
determining a second real-world line-of-sight associated with the virtual line-of-sight by mapping the virtual line-of-sight into the second physical location;
automatically moving at least one moveable camera of a plurality of moveable cameras in the second physical location to a position and orientation along the determined second real-world line-of-sight; and
displaying, to the second user, a rendered version of the first user along the determined second real-world line-of-sight.
11. The method of claim 1, wherein the rendered version of the second user is displayed to the first user via an augmented reality (AR) head-mounted display (HMD).
12. The method of claim 11, further comprising capturing, using an augmented reality (AR) head-mounted display (HMD), images of one or more visual markers using cameras on the AR HMD, the captured images of the one or more visual markers used for calculating the position information for the first user in the first physical location.
13. The method of claim 1, wherein displaying the rendered version of the second user along the real-world line-of-sight comprises aligning eyes of the rendered version of the second user along the real-world line-of-sight.
14. The method of claim 1, further comprising updating the virtual geometry in response to obtaining position information about at least a fourth user in a fourth location.
15. The method of claim 14, further comprising:
updating the virtual line-of-sight based on the updated virtual geometry, and responsively determining an updated real-world line-of-sight based on the updated virtual line-of-sight; and
automatically moving the at least one moveable camera to a position and orientation along the updated real-world line-of-sight.
16. A system comprising a non-transitory computer readable medium for carrying one or more instructions, wherein the one or more instructions, when executed by one or more processors, cause the one or more processors to perform the steps of:
generating a virtual geometry based on position information for at least a first user in a first physical location, a second user in a second physical location, and at least a third user in a third location;
determining a virtual line-of-sight in the virtual geometry between the first user and the second user;
determining a real-world line-of-sight associated with the virtual line-of-sight by mapping the virtual line-of-sight into the first physical location;
automatically moving at least one moveable camera of a plurality of moveable cameras in the first physical location to a position and orientation along the determined real-world line-of-sight; and
displaying, to the first user, a rendered version of the second user along the determined real-world line-of-sight.
17. The system of claim 16, wherein the non-transitory computer readable medium further comprises instructions for transmitting video captured by the at least one moveable camera to the second user in the second physical location for display to the second user.
18. The system of claim 16, wherein the non-transitory computer readable medium further comprises instructions for:
automatically updating the virtual line-of-sight in response to a change in position of at least one of the first and second users; and
automatically repositioning the at least one moveable camera along an updated real-world line-of-sight determined from the updated virtual line-of-sight.
19. The system of claim 16, wherein the non-transitory computer readable medium comprises instructions for generating the virtual geometry by:
obtaining information about N locations, wherein N is an integer greater than two; and
forming the virtual geometry by assigning each location to a respective side of an N-sided virtual polygon representing a virtual meeting space.
20. The system of claim 16, wherein the non-transitory computer readable medium further comprises instructions for:
determining a second virtual line-of-sight between the first user and the third user in the virtual geometry;
determining a second real-world line-of-sight associated with the second virtual line-of-sight by mapping the second virtual line-of-sight into the first physical location;
automatically moving at least a second moveable camera of the plurality of moveable cameras in the first physical location to a position and orientation along the determined second real-world line-of-sight; and
displaying a rendered version of the third user proximate to the position of the at least second moveable camera.
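
The steps of determining a virtual line-of-sight, mapping it into the first physical location, and positioning a camera along it can be sketched as follows. This is a minimal two-dimensional illustration under stated assumptions, not the claimed implementation: the functions `virtual_to_room` and `camera_pose_on_line_of_sight`, the representation of the first physical location by an origin and a rotation angle within the virtual geometry, and the `distance` parameter are all hypothetical.

```python
import math


def virtual_to_room(point, room_origin, room_rotation):
    """Map a point from virtual-geometry coordinates into room coordinates.

    Assumption: the room's frame sits at room_origin in the virtual
    geometry and is rotated by room_rotation radians relative to it.
    """
    dx = point[0] - room_origin[0]
    dy = point[1] - room_origin[1]
    c, s = math.cos(room_rotation), math.sin(room_rotation)
    # Apply the inverse rotation to express the offset in the room frame.
    return (c * dx + s * dy, -s * dx + c * dy)


def camera_pose_on_line_of_sight(first_user, second_user,
                                 room_origin, room_rotation, distance):
    """Return a camera position and pan angle along the real-world line-of-sight.

    The camera is placed `distance` from the first user toward the
    (virtual) second user and panned to look back at the first user.
    """
    a = virtual_to_room(first_user, room_origin, room_rotation)
    b = virtual_to_room(second_user, room_origin, room_rotation)
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("users coincide; line-of-sight is undefined")
    ux, uy = dx / length, dy / length
    position = (a[0] + distance * ux, a[1] + distance * uy)
    # Pan angle (radians) of the direction from the camera to the first user.
    pan = math.atan2(a[1] - position[1], a[0] - position[0])
    return position, pan


if __name__ == "__main__":
    position, pan = camera_pose_on_line_of_sight(
        first_user=(0.0, 0.0), second_user=(0.0, 4.0),
        room_origin=(0.0, 0.0), room_rotation=0.0, distance=2.0)
    print(position, math.degrees(pan))
```

Calling `camera_pose_on_line_of_sight` again whenever a user's tracked position changes would yield an updated pose to which the camera could be repositioned.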
PCT/US2017/038820 2016-06-30 2017-06-22 System and method for spatial interaction using automatically positioned cameras WO2018005235A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US201662357060 2016-06-30 2016-06-30
US62/357,060 2016-06-30

Publications (1)

Publication Number Publication Date
WO2018005235A1 (en) 2018-01-04

Family

ID=59295332

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/038820 WO2018005235A1 (en) 2016-06-30 2017-06-22 System and method for spatial interaction using automatically positioned cameras

Country Status (1)

Country Link
WO (1) WO2018005235A1 (en)

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17736831

Country of ref document: EP

Kind code of ref document: A1