WO2008125593A2 - Virtual reality-based teleconferencing - Google Patents

Virtual reality-based teleconferencing

Info

Publication number
WO2008125593A2
Authority
WO
WIPO (PCT)
Prior art keywords
user
virtual
sound
objects
audio
Prior art date
Application number
PCT/EP2008/054359
Other languages
French (fr)
Other versions
WO2008125593A3 (en)
Inventor
Philipp Christian Berndt
Burckhardt Ruben Joseph Jason Bonello
Matthias Welk
Paul Jonathan Mccabe
Marc Werner Fleischmann
Reinhard KÖHN
Original Assignee
Musecom Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/735,463 (US20080252637A1)
Priority claimed from US11/751,152 (US20080294721A1)
Priority claimed from US11/774,556 (US20080256452A1)
Priority claimed from US11/833,432 (US20080253547A1)
Priority claimed from US11/875,836 (US20090106670A1)
Application filed by Musecom Ltd.
Priority to EP08736079A (EP2145465A2)
Priority to CN200880012055A (CN101690150A)
Publication of WO2008125593A2
Publication of WO2008125593A3

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/56Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/567Multimedia conference systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/56Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/568Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M7/00Arrangements for interconnection between switching centres
    • H04M7/006Networks other than PSTN/ISDN providing telephone service, e.g. Voice over Internet Protocol (VoIP), including next generation networks with a packet-switched transport layer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/487Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M7/00Arrangements for interconnection between switching centres
    • H04M7/12Arrangements for interconnection between switching centres for working between exchanges having different types of switching equipment, e.g. power-driven and step by step or decimal and non-decimal
    • H04M7/1205Arrangements for interconnection between switching centres for working between exchanges having different types of switching equipment, e.g. power-driven and step by step or decimal and non-decimal where the types of switching equipment comprises PSTN/ISDN equipment and switching equipment of networks other than PSTN/ISDN, e.g. Internet Protocol networks
    • H04M7/1225Details of core network interconnection arrangements
    • H04M7/123Details of core network interconnection arrangements where the packet-switched network is an Internet Protocol Multimedia System-type network

Definitions

  • the invention relates to a virtual reality environment being applied to teleconferencing such that the environment is used to enter into a teleconference.
  • the invention provides a method of controlling volume of sound data during a teleconference, the method comprising providing a virtual representation including objects that represent users in the teleconference; and controlling the volume of the sound data according to how the users change location and relative orientation of their objects in the virtual representation.
  • the method preferably comprises the step of changing other audio characteristics of the sound data according to how the users interact with the virtual representation.
  • objects in the virtual representation also have audio ranges, whereby the volume of the sound data is also controlled according to the audio ranges.
  • the audio ranges are preferably adjustable.
  • the virtual representation is preferably a virtual environment; and wherein the users are represented by avatars.
  • volume of sound data between two users is a function of relative orientation of their avatars.
  • the virtual representation is preferably provided by a server system that computes a sound coefficient for each object that is a sound source with respect to a drain; and wherein for each user, controlling the volume includes applying those sound coefficients to the sound data of their corresponding objects, mixing the modified sound data and supplying the mixed sound data to the drain.
  • the sound data is, for example, mixed according to the formula given below.
  • a method comprising: providing a virtual representation; establishing phone connections with a plurality of users, the users represented by objects in the virtual representation, each user representative object being both sound drain and sound source; and for each drain, mixing sound data from different sound sources and providing the mixed data to the user associated with the drain, where volume of sound data from a source is adjusted according to a topology metric of the source with respect to the drain; whereby the users are not directly connected, but instead communicate through a synthesized auditory environment.
  • the mixing the sound data for each drain preferably includes computing audio parameters for each paired source, each audio parameter controlling sound volume as a function of closeness of its corresponding source to the drain; and adjusting sound data of each paired source with the corresponding audio parameter, mixing the adjusted sound data of the paired sources, and providing the mixed sound data to the user associated with the drain.
  • the virtual representation preferably includes other objects that are sound sources, where volume of sound data from a source is adjusted according to a topology metric of the source with respect to the drain; and wherein adjusted sound data from the other objects is also mixed and supplied to the drain.
  • the objects preferably include audio ranges.
  • the topology metric is, for example, virtual distance between a source and a drain, and may include both distance and orientation. Audio is preferably clustered to reduce the computational burden.
  • the mixed sound data for a drain d is, for example, V_d(t) = vol_d · Σ_{s=1..n} c_{s,d} · V_s(t), where vol_d is the overall volume of drain d, c_{s,d} is the sound coefficient of source s with respect to drain d, V_s(t) is the sound data of source s at time t, and n is the number of contributing sources.
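A minimal sketch of this mixing step, assuming the per-source coefficients have already been computed (the function name mix_for_drain and the final clipping step are illustrative assumptions, not from the patent):

    import numpy as np

    def mix_for_drain(drain_volume, coefficients, source_frames):
        """Mix one audio frame for a drain: V_d = vol_d * sum(c_s * V_s)."""
        mixed = np.zeros_like(source_frames[0], dtype=np.float64)
        for c, frame in zip(coefficients, source_frames):
            mixed += c * frame              # scale each source by its coefficient
        mixed *= drain_volume               # apply the drain's overall volume
        return np.clip(mixed, -1.0, 1.0)    # keep samples within full scale

    # Example: a nearby source (c=0.9) and a distant one (c=0.1)
    frames = [np.array([0.2, 0.4]), np.array([0.5, -0.5])]
    print(mix_for_drain(1.0, [0.9, 0.1], frames))   # -> [0.23 0.31]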
  • the sound data is mixed preferably only for those sound sources making a significant contribution.
  • the audio ranges of certain objects are preferably automatically set at or near zero, whereby the sound data of those certain objects are excluded from the mixing. It is also preferred that a minimum distance between objects is imposed to reduce the computation burden of mixing the sound data.
  • some sound data is premixed to reduce the computation burden of mixing the sound data; wherein the premixing includes mixing sound data from a group of sound drains and assigning a single coefficient per drain to the group. It is also encompassed by the invention that direct connections are made between a source and a drain to reduce the computation burden of mixing the sound data.
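A sketch of two of these load-reduction ideas, dropping sources that make no significant contribution and premixing a group so a single coefficient covers many sources; the 0.01 threshold and the mean-based premix are assumptions, as the patent does not specify them:

    import numpy as np

    SIGNIFICANCE = 0.01  # assumed threshold for a "significant contribution"

    def significant(pairs):
        """Keep only (coefficient, frame) pairs that contribute audibly."""
        return [(c, f) for c, f in pairs if c >= SIGNIFICANCE]

    def premix(group_frames):
        """Premix a group of sources once; each drain then applies a single
        coefficient to the premixed signal instead of one per source."""
        return np.mean(group_frames, axis=0)

    crowd = premix([np.array([0.1, 0.2]), np.array([0.0, -0.1])])
    print(significant([(0.9, crowd), (0.001, crowd)]))  # second pair is dropped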
  • the first aspect of the invention also encompasses a communications system comprising phone-based teleconferencing means; and means for providing a virtual representation including objects that represent participants in a teleconference, the virtual representation allowing participants to use the phone-based teleconferencing means to enter into teleconferences and to control volume during the teleconferences, the volume controlled according to how the users change location and relative orientation of their objects in the virtual representation.
  • An alternative communications system according to the first aspect comprises a server system for providing a virtual representation; and a teleconferencing system for establishing phone connections with a plurality of users, the users represented by objects in the virtual representation, the teleconferencing system controlling volume during a teleconference according to how the users change location and relative orientation of their representative objects in the virtual representation.
  • each user representative object is both sound drain and sound source; and wherein for each drain, sound data from different sound sources is mixed and the mixed data is provided to the user associated with the drain, where volume of sound data from a source is adjusted according to a topology metric of the source with respect to the drain.
  • the invention provides a method comprising applying a virtual reality environment to teleconferencing such that the environment is used to enter into a teleconference.
  • the environment preferably allows a user to enter without knowing any other in the environment, yet enables the user to meet and hold a teleconference with at least one other.
  • the step of applying the virtual reality environment preferably includes presenting the virtual reality environment to a user, presenting representations of the user and others in the virtual reality environment, and enabling the user's representation to experience the virtual reality environment, meet the others, and enter into teleconferences. It is also preferred that the virtual reality environment enables a user to teleconference via a phone or via a VoIP device.
  • the step of applying the environment may also include starting a session with a user, presenting a virtual reality environment to the user, recognizing a phone call from the user, and adding the phone call to the session.
  • applying the environment may include starting a first session with a user, presenting a virtual reality environment to the user, starting a second session in response to a phone call, and merging the first and second sessions if the phone call is made by the user.
  • the method according to the second aspect of the invention preferably further comprises calling the user at the user's request so the user can be voice-enabled in the virtual reality environment.
  • the virtual reality environment preferably enables a user with only a device that cannot display the virtual reality environment to enter into teleconferences and experience the sounds, but not the sights, of the virtual reality environment.
  • more than one virtual reality environment can be applied to the teleconferencing. A user can then move into and out of different virtual reality environments.
  • the virtual reality environments are preferably linked, and each of the virtual reality environments is preferably uniquely addressable.
  • a virtual reality environment can be private.
  • the virtual reality environment may also have a persistent state, and may overlap a real space.
  • a user preferably establishes a connection with a location in the virtual reality environment.
  • a user has an audio range in the virtual reality environment.
  • the audio range is dynamically adjustable.
  • audio between users is attenuated as a function of closeness between the users.
  • a user is represented by an avatar in the virtual reality environment, and wherein the user can control its avatar to move around the virtual reality environment.
  • the method may allow a user to meet another through intuitive actions of the user's avatar.
  • the method may further comprise accepting control inputs from the user to control gestures of the user's avatar.
  • the volume of sound between the user and another preferably can be a function of relative orientation of their representations in the virtual reality environment.
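As an illustration, the coefficient between a source and a drain could combine a distance roll-off with an orientation gain. The sketch below assumes an inverse-linear roll-off inside the audio range and a dot-product orientation term; the patent does not prescribe these particular functions:

    import numpy as np

    def sound_coefficient(src_pos, src_facing, drain_pos, audio_range):
        """Illustrative coefficient: distance roll-off times orientation gain."""
        offset = np.asarray(drain_pos, float) - np.asarray(src_pos, float)
        distance = np.linalg.norm(offset)
        if distance >= audio_range:
            return 0.0                          # outside the source's audio range
        falloff = 1.0 - distance / audio_range  # linear roll-off with distance
        direction = offset / distance if distance > 0 else np.zeros_like(offset)
        facing = np.asarray(src_facing, float)
        facing = facing / np.linalg.norm(facing)
        orientation = 0.5 * (1.0 + direction @ facing)  # loudest when facing drain
        return falloff * orientation

    # Source at the origin facing +x, drain 5 units away, audio range 10
    print(sound_coefficient((0, 0), (1, 0), (5, 0), audio_range=10.0))  # 0.5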
  • a user establishes a connection with a location in the virtual reality environment, and a connection is also established with a multimedia source.
  • the user and others preferably view the same multimedia by each viewing a window that displays the multimedia and, at the same time, discussing the displayed multimedia via the teleconferencing.
  • the user and another may share the multimedia view by co-browsing.
  • a user shares a multimedia source with another by drag-and-dropping a multimedia representation proximate the other's representation.
  • the method of the second aspect may further comprise mixing Internet content with phone links, whereby a user can access content on the Internet via a phone interface.
  • Additional virtual reality environments are preferably available to a user, wherein the user is instead assigned to one of the additional environments based on a characteristic of the user.
  • a user may have multiple profiles, each profile representing a different aspect of the user, wherein the user can switch between multiple profiles.
  • the user may have a profile that can be made public. However, the user has an option of remaining anonymous.
  • the method may further comprise providing service agents in the virtual reality environment.
  • the invention also provides an apparatus for applying a virtual reality environment to teleconferencing to enable a user to enter the virtual reality environment without knowing any other in the virtual reality environment, yet enable the user to meet and hold a teleconference with others in the virtual reality environment.
  • the second aspect provides a system comprising: means for teleconferencing; and means for coupling an immersive virtual reality environment with the teleconferencing.
  • the system is preferably web-based.
  • the invention provides a teleconferencing method, comprising entering a virtual reality environment provided by a service provider; navigating an avatar around the virtual reality environment; establishing a phone call with the service provider to become voice-enabled; and talking to voice-enabled others who are represented in the virtual reality environment.
  • the invention provides a communications system comprising a server system for providing a virtual representation including at least one object; and a teleconferencing system for establishing audio communications with an audio-only device; an object in the virtual representation controlled in response to signals from the audio-only device.
  • At least one of the objects is movable and represents a user of an audio-only device. More preferably, an object representing a user of an audio-only device is an avatar; and wherein signals from the audio-only device cause the avatar to move about the virtual representation. It is preferred that signals from the audio-only device cause the object to move about the virtual representation; and wherein the teleconferencing system allows a user of the audio-only device to speak with other users represented in the virtual representation, but not see the virtual representation.
  • the server system preferably provides additional virtual representations, and a signal from the audio-only device causes an object representing the user of an audio-only device to go to a different virtual representation. It is preferred that an object representing a user of an audio-only device can be assigned to the virtual representation by dialing directly to that virtual representation.
  • the virtual representation preferably is a virtual environment, and signals from the audio-only device allow a user to interact with the virtual environment.
  • the audio-only device may be a phone, and the signals may be phone signals.
  • the signals may be dial tone (DTMF) signals, or voice commands.
  • the communications system preferably further comprises means for providing an audio description of the virtual representation to the audio-only device.
  • the objects that are closer to a user's representative object in the virtual representation are preferably described in greater detail.
  • the virtual representation may also be described from a first person perspective.
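A hypothetical mapping from phone signals to object commands; the digit assignments below are invented for illustration, since the patent does not fix a keypad layout:

    # Assumed DTMF-to-command table for controlling an avatar from a phone.
    COMMANDS = {
        "2": "move forward",
        "8": "move backward",
        "4": "turn left",
        "6": "turn right",
        "5": "describe surroundings",              # request an audio description
        "#": "go to next virtual representation",
    }

    def handle_dtmf(digit):
        """Translate a received DTMF digit into an avatar command."""
        return COMMANDS.get(digit, "unrecognized input")

    print(handle_dtmf("2"))  # -> "move forward"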
  • a first object in the virtual representation represents an Internet resource; and wherein a user of an audio-only device can access the Internet by controlling the state of the first object.
  • the teleconferencing system may include a VoIP system for establishing VoIP connections with network- connected devices.
  • the user of the audio-only device is represented in the virtual representation for others to see; and the user's representative object indicates audio-only capability.
  • the third aspect of the invention also provides a system comprising means for providing a virtual representation including objects; means for receiving signals from audio-only devices; and means for controlling states of the objects in response to the signals.
  • the third aspect of the invention also provides a communications system for providing a virtual environment including a plurality of objects, the objects having changeable states; and for establishing audio communications with audio-only devices; the system controlling the states of the objects in the virtual representation in response to signals from the audio-only devices, such that users of the audio devices can interact with the virtual environment.
  • the third aspect of the invention also provides a method of controlling objects in a virtual environment comprising receiving signals from audio-only devices; and controlling states of the objects in response to the signals.
  • the method preferably further comprises providing an audio description of the virtual environment to the audio-only device.
  • the invention provides a method of providing a service comprising: providing a network-accessible virtual environment including objects that represent users of the service; allowing the users to control their representative objects in the virtual environment to personally interact with other users represented in the virtual environment and also to become voice-enabled; and enabling those users who are voice-enabled to speak with other voice-enabled users via phones.
  • the users preferably control their representative objects via client devices; and allowing the users to control their objects includes receiving commands from the client devices and moving the representative objects in response to the commands.
  • the users may also be allowed to control their representative objects via the Internet; and wherein those users who are voice-enabled are enabled to speak with each other via a public switched telephone network.
  • the fourth aspect of the invention also encompasses interacting with the virtual environment to control audio characteristics in the virtual environment.
  • Objects in the virtual environment have audio ranges, whereby the volume of the sound data is also controlled according to the audio ranges.
  • the users interact as a function of how close together they are.
  • the closeness between two users is measured as a distance between web pages concurrently viewed by those two users.
  • closeness between two users is measured as a distance between two coordinates on a web page that is concurrently viewed by those two users.
  • the users are represented by avatars; and wherein volume of sound data between two users is a function of relative orientation of their avatars.
  • the method preferably allows certain users to personally interact with other users in the virtual environment without seeing the virtual environment.
  • the method also preferably comprises calling a user at the user's request so the user can be voice-enabled in the virtual environment.
  • when a user calls another who is not represented in the virtual environment, a representative object of that other user is preferably added to the virtual environment.
  • multiple virtual environments are provided; and a user can move into and out of different virtual environments.
  • Communication may also be performed by shared, modifiable objects.
  • the users may also be allowed to communicate through intuitive actions of their avatars.
  • users share a multimedia connection by each viewing a window that displays the multimedia and, at the same time, discussing the displayed multimedia via phones.
  • the users may share multimedia by co-browsing.
  • a user may also share a multimedia source with another user by drag-and-dropping a multimedia representation proximate the other user's representative object.
  • Additional virtual environments may be available to a user, wherein the user is instead assigned to one of the additional virtual environments based on a characteristic of the user.
  • a user may have multiple profiles, each profile representing a different aspect of the user, wherein the user can switch between multiple profiles.
  • the fourth aspect of the invention also provides a system comprising means for providing a network-accessible virtual environment including objects that represent system users; means for allowing the users to control their representative objects in the virtual environment to personally interact with other users represented in the virtual environment and also to become voice-enabled; and means for enabling those users who are voice-enabled to speak with other voice-enabled users via phones.
  • the fourth aspect of the invention also provides a system comprising a server system for providing a virtual environment including objects that represent users of the system, the server system allowing the users to control their representative objects in the virtual environment to interact with other users represented in the virtual environment; and a phone system for enabling those users who are voice-enabled to speak with other voice-enabled users via phones.
  • the server system is preferably web-based; the server system receives commands from client devices to control objects in the virtual environment; and the phone system enables at least some users to speak via a public switched telephone network.
  • the invention provides a communications system comprising a teleconferencing system for hosting teleconferences; and a server system for providing a virtual representation for the teleconferencing system, the virtual representation including objects whose states can be commanded to transition gradually, the server system providing clients to client devices, each client causing its client device to display the virtual representation; each client device capable of generating a command for gradually transitioning an object to a new state in the virtual representation and sending the command to the server system; the server system commanding the clients to transition an object to its new state by a specified time.
  • the server system preferably causes the teleconferencing system to control audio characteristics in a manner that is consistent with the virtual representation.
  • the teleconferencing system may include a phone system.
  • when a client device commands an object to transition gradually to a new state, the server system receives the command and generates an event that commands all of the clients to transition the object to the new state by a specified time.
  • the server system also preferably keeps track of objects that transition abruptly; and wherein when a client device commands an object to transition abruptly to a new state, the server system receives the command and generates an event that commands all of the clients to show the object at the new state at a specified time.
  • At least some of the objects are movable and represent users.
  • the virtual representation is an immersive virtual environment.
  • the server system preferably manages a master model of object states in time so as to regulate state transitions of the objects in the virtual representation. More preferably, the server system, in response to a command, determines a first time at which an object should start transitioning from a current state and a second time at which the object should reach the new state; and wherein the server system sends the start and stop times and the new state to the clients.
  • the server system also computes a movement path including waypoints and arrival times at the waypoints, and sends the movement path to the clients.
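A sketch of such a waypoint computation, assuming straight-line movement at constant speed (the patent does not specify the path-planning method):

    import numpy as np

    def plan_path(start, goal, speed, start_time, step=1.0):
        """Compute (arrival_time, position) waypoints for a gradual transition.

        The server can send this list to every client, which interpolates so
        the object reaches each waypoint at its arrival time.
        """
        start, goal = np.asarray(start, float), np.asarray(goal, float)
        total = np.linalg.norm(goal - start)
        n_steps = max(1, int(np.ceil(total / (speed * step))))
        waypoints = []
        for i in range(n_steps + 1):
            frac = i / n_steps
            pos = start + frac * (goal - start)
            waypoints.append((start_time + frac * total / speed, tuple(pos)))
        return waypoints

    # Walk an avatar from (0,0) to (3,4) at 1 unit/s, starting at t=10s
    for t, pos in plan_path((0, 0), (3, 4), speed=1.0, start_time=10.0):
        print(f"t={t:.1f}s -> {pos}")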
  • a client may also compute a transition path and send the transition path to the server system.
  • the server system is preferably web-based. It is also preferred that the clients are run on virtual machines.
  • the clients may be Flash clients.
  • the teleconferencing system may include a VoIP system for establishing VoIP connections with network-connected devices.
  • the communications system preferably further comprises a sound system for generating sounds for objects in the virtual representation, wherein the server system also synchronizes the objects in the virtual representation with the sounds; and wherein the sound system mixes the synchronized sounds with audio from a teleconference.
  • the server system preferably includes a world server for generating data for varying audio characteristics in time of audio between users during a teleconference.
  • the virtual representation is a virtual environment
  • the communications system further comprises means for allowing audio-only devices to control objects in the virtual environment.
  • the means responds to phone signals to control the objects in the virtual environment.
  • the system may further comprise means for providing an audio description of the virtual representation to audio-only devices.
  • the teleconferencing system hosts multiple teleconferences among different groups of users; wherein the server system provides additional independent virtual representations and regulates state transitions of the objects in each virtual representation; and wherein the server system filters communications with the clients, sending communications only to those clients needing to transition an object in a particular virtual representation.
  • the fifth aspect of the invention also provides a communications system for a plurality of client devices, comprising first means for hosting teleconferences; and second means for providing virtual representations that enable the teleconferences, each virtual representation including objects whose states transition gradually, the second means providing clients to at least some of the client devices, each client causing its client device to display a virtual representation; each client device capable of generating a command for gradually transitioning an object to a new state in a virtual representation and sending the command to the second means; the second means commanding the clients to transition an object to roughly the same state at roughly the same time, the second means causing the first means to control audio characteristics of the teleconferences to be consistent with the virtual representations.
  • the fifth aspect of the invention also provides a method of providing a communications service, the method comprising: hosting a teleconference; providing clients to a plurality of client devices, each client causing its client device to display a virtual representation of the teleconference, the virtual representation including objects whose states transition gradually; waiting for object state transition commands from a client, each object state transition command for gradually transitioning an object to a new state in the virtual representation; and generating an event in response to a command, the event causing each of the clients to transition an object to roughly the same state at roughly the same time.
  • FIG. 1 is an illustration of a system in accordance with an embodiment of the present invention.
  • FIG. 2 is an illustration of a system in accordance with an embodiment of the present invention.
  • Fig. 3 is an illustration of a method in accordance with an embodiment of the present invention.
  • Fig. 4 is an illustration of a virtual reality environment in accordance with an embodiment of the present invention.
  • Fig. 5 is an illustration of a state diagram of a virtual reality environment.
  • Figs 6-7 are illustrations of a method of supplying audio to a user in accordance with an embodiment of the present invention.
  • Fig. 8 is an illustration of two avatars facing each other.
  • Figs 9-10 are illustrations of a method in accordance with an embodiment of the present invention.
  • Fig. 11 is an illustration of methods of reducing the computational burden of sound mixing in accordance with embodiments of the present invention.
  • Figs 12a-12c are illustrations of sound mixing in accordance with embodiments of the present invention.
  • Fig. 13 is an illustration of a method in accordance with an embodiment of the present invention.
  • Fig. 14 is an illustration of services provided by a service provider in accordance with an embodiment of the present invention.
  • Fig. 15 is an illustration of a method in accordance with an embodiment of the present invention.
  • Fig. 16 is an illustration of a system in accordance with an embodiment of the present invention.
  • Fig. 17 is an illustration of a method in accordance with an embodiment of the present invention.
  • Fig. 18 is an illustration of a method of mixing sound in accordance with an embodiment of the present invention.
  • Fig. 19 is an illustration of a system in accordance with an embodiment of the present invention.
  • Fig. 20 is an illustration of a method in accordance with an embodiment of the present invention.
  • Fig. 21 is an illustration of a method in accordance with an embodiment of the present invention.
  • Fig. 22 is an illustration of a method in accordance with an embodiment of the present invention.
  • Fig. 23 is an illustration of a timeline in accordance with an embodiment of the present invention.
  • Fig. 24 is an illustration of a method of mixing sound in accordance with an embodiment of the present invention.
  • Figs 25-26 are illustrations of a method of computing waypoints for a moving object in accordance with an embodiment of the present invention.
  • Figs 27a-27d are illustrations of different topologies for a communications system in accordance with the present invention.
  • Fig. 28 is an illustration of a system in accordance with an embodiment of the present invention.
  • Fig. 29 is an illustration of a portion of a system in accordance with an embodiment of the present invention.
  • Figure 1 illustrates a teleconferencing system 100 that includes a provider 110 of a teleconferencing service.
  • the service provider 110 applies a virtual reality environment to teleconferencing such that the environment is used to enter into a teleconference.
  • the environment enables a user to enter the environment without knowing any others in the environment, yet enables the user to meet and hold a teleconference with others in the environment.
  • the term “user” refers to an entity that utilizes the teleconferencing service. The entity could be an individual, a group of people who are collectively represented as a single unit (e.g., a family, a corporation), etc.
  • the term “another” refers to another user, and “others” refers to other users.
  • a user can connect to the service provider 110 with a user device 120 that has a graphical user interface.
  • user devices 120 include, without limitation, computers, tablet PCs, VoIP phones, gaming consoles, televisions with set-top boxes, certain cell phones, and personal digital assistants.
  • a computer can connect to the service provider 110 via the Internet or other network, and its user can enter into the virtual reality environment and take part in a teleconference.
  • a user can connect to the service provider 110 with a user device 130 that does not have a graphical user interface.
  • user devices 130 include, without limitation, traditional telephones (e.g., touch tone phones, rotary phones), cell phones, VoIP phones, and other devices that have a telephone interface but no graphical user interface.
  • a traditional phone can connect to the service provider 110 via a PSTN network, and its user can enter into the virtual reality environment and take part in a teleconference.
  • a user can utilize both devices 120 and 130 during a single teleconference.
  • a user might use a device 120 such as a computer to enter and navigate the virtual reality environment, and a touch tone telephone 130 to take part in a teleconference.
  • Figure 21 illustrates a method of providing a service that allows for personal interaction between users.
  • the method includes providing a network-accessible virtual environment including objects that represent users of the service (block 2110), allowing the users to control their representative objects in the virtual environment to personally interact with other users represented in the virtual environment and also to become voice-enabled (block 2120), and enabling those users who are voice-enabled to speak with other voice-enabled users via phones (block 2130).
  • Phones are not limited to any particular type. Examples of phones include PSTN phones (e.g., touch-tone phones) and VoIP phones including soft phones.
  • When a user becomes voice-enabled, that user can speak with other voice-enabled users who are represented in the virtual environment.
  • a user of a traditional phone can become voice-enabled by placing a call to the service provider.
  • a user can become voice-enabled by receiving a call from the service provider.
  • the user interacts with the virtual environment to control audio characteristics in the virtual environment (block 2140).
  • the volume of sound data can be controlled.
  • volume of sound between one user and another is a function of distance between and relative orientation of their representative objects.
  • the representative objects also have audio ranges.
  • Audio characteristics other than volume may also be controlled according to how users interact with the virtual environment. For example, filters can be applied to sound data to add reverb, distort sounds, etc.
  • An object's audio characteristics might be changed by applying filters (e.g. reverb, room acoustics) to the object's sound data. Examples of changing audio characteristics include the following. As an avatar walks from a carpeted room into a stone hall, a parameter of a reverb filter is adjusted to add more reverb to the user's voice and avatar's footsteps. As an avatar walks into a metallic chamber, a parameter of an effect filter is adjusted so the user's voice and avatar's footsteps are distorted to sound metallic.
  • a filter (e.g., a band pass filter) is applied to the avatar's sound data so the user's voice sounds as if it's coming from a loudspeaker system or telephone.
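A toy illustration of adjusting a reverb parameter per room type; the per-room settings and the simple feedback-delay "reverb" are assumptions for demonstration only:

    import numpy as np

    # Assumed wet-mix levels per room; the patent only says a reverb
    # parameter is adjusted as an avatar moves between room types.
    ROOM_REVERB = {"carpeted room": 0.1, "stone hall": 0.6, "metallic chamber": 0.8}

    def apply_reverb(samples, wet, delay=220):
        """Toy feedback-delay 'reverb': mix a delayed copy of the signal back in."""
        out = np.array(samples, dtype=np.float64)
        for i in range(delay, len(out)):
            out[i] += wet * out[i - delay]
        return np.clip(out, -1.0, 1.0)

    voice = np.random.uniform(-0.1, 0.1, 44100)          # one second at 44.1 kHz
    in_hall = apply_reverb(voice, ROOM_REVERB["stone hall"])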
  • FIG. 15 illustrates a method of controlling volume of sound data during a teleconference.
  • the method includes providing a virtual representation including objects (e.g., avatars) that represent participants (i.e., users) in the teleconference (block 1510), and controlling the volume of the sound data according to how the users change locations and relative orientation of their objects in the virtual representation (block 1520).
  • the users' objects have audio ranges.
  • An audio range limits the distance that sound can be received and/or broadcasted.
  • the audio ranges facilitate multiple teleconferences in a single virtual representation.
  • Audio characteristics other than volume may also be controlled according to how users interact with the virtual representation (block 1530). For example, filters can be applied to sound data to add reverb, distort sounds, etc. Examples are provided below.
  • a virtual representation is not limited to any particular type.
  • a first type of virtual representation could be similar to the visual metaphorical representations illustrated in Figures 3-5 and 8a-8b of Singer et al. U.S. Patent No. 5,889,843 (a graphical user interface displays icons on a planar surface, where the icons represent audio sources).
  • a second type of virtual representation is a virtual environment.
  • a virtual environment includes a scene and sounds.
  • a virtual environment is not limited to any particular type of scene or sounds.
  • a virtual environment includes a beach scene with blue water, white sand and blue sky.
  • the virtual environment includes an audio representation of a beach (e.g. waves crashing against the shore, sea gulls cries).
  • a virtual environment includes a club scene, complete with bar, dance floor, and dance music (an exemplary bar scene 310 is depicted in Figure 4).
  • a virtual environment includes a park with a microphone and loudspeakers, where sounds picked up by the microphone are played over the speakers.
  • a virtual representation includes objects.
  • An object in a virtual environment has properties that allow a user to perform certain actions on it (e.g., sit on, move, and open).
  • An object (e.g., a Flash® object) in a virtual environment may obey certain specifications (e.g., an API).
  • At least some of the objects represent users of the communications system 110.
  • These user representative objects could be images, avatars, live video, recorded sound samples, name tags, logos, user profiles, etc. In the case of avatars, live video or photos could be projected on them.
  • the users' representative objects allow their users to see and communicate with other users in a virtual representation. In some situations, a user cannot see his own representative object, but rather sees the virtual representation as his representative object would see it (that is, from a first person perspective).
  • the virtual representation is a virtual environment, and the users are represented by avatars.
  • volume of sound between one user and another is a function of distance between and relative orientation of their avatars.
  • the avatars also have audio ranges.
  • a client device 120 refers to a device that can run a client and provide a graphical interface.
  • client devices 120 are not limited to any particular type. Examples of client devices 120 include, but are not limited to computers, tablet PCs, VoIP phones, gaming consoles, televisions with set-top boxes, certain cell phones, and personal digital assistants.
  • Another example of a client device 120 is a device running a Telnet program.
  • Audio-only devices 130 refer to devices that provide audio but, for whatever reason, do not display a virtual representation. Examples of audio-only devices 130 include traditional phones (e.g., touch-tone phones) and VoIP phones.
  • a user can utilize both a client device 120 and an audio-only device 130 during a teleconference.
  • the client device 120 is used to interact with the virtual representation and help the user enter into teleconferences.
  • the client device 120 also interacts with the virtual representation to control volume of sound data during a teleconference.
  • the audio-only device 130 is used to speak with at least one other user during a teleconference.
  • the communications system 110 includes a teleconferencing system 140 for hosting teleconferences.
  • the teleconferencing system 140 may include a phone system for establishing phone connections with traditional phones (landline and cellular), VoIP phones, and other audio-only devices 130. For example, a user of a traditional phone can connect with the teleconferencing system 140 by placing a call to it.
  • the teleconferencing system 140 may also include means for establishing connections with client devices 120 that have teleconferencing capability (e.g., a computer equipped with a microphone, speakers and teleconferencing software).
  • a teleconference is not limited to conversations between two users.
  • a teleconference may involve many users.
  • the teleconferencing system 140 can host one or more teleconferences at any given time.
  • the communications system 110 further includes a server system 150 for providing clients 160 to those users having client devices 120.
  • Each client 160 causes its client device 120 to display a virtual representation.
  • a virtual representation provides a vehicle by which a user can enter into a teleconference (e.g., initiate a teleconference, join a teleconference already in progress), even if that user knows no other users represented in the virtual representation.
  • the communications system 110 allows a user to listen in on one or more teleconferences. Even while engaged in one teleconference, a user has the ability to listen in on other teleconferences, and seamlessly leave the one teleconference and join another teleconference. A user could even be involved in a chain of teleconferences (e.g., a line of people where person C hears B and D, and person D hears C and E, and so on).
  • Each client 160 enables its client device 120 to move the user's representative object within the virtual representation.
  • a user can move nearby other representative objects to listen in on conversations and meet other users.
  • a user can experience the sights and sounds that the virtual environment offers.
  • user representative objects have states that can be changed.
  • an avatar has states such as location and orientation.
  • the avatar can be commanded to walk (that is, make a gradual transition) from its current location (current state) to a new location (new state).
  • Other objects in the virtual environment, which don't represent users, also have states that can be changed; those states might transition gradually or abruptly.
  • a user can take part in a virtual volleyball game, where a volleyball is represented by an object. Hitting the volleyball causes the volleyball to follow a path towards a new location.
  • a balloon is represented by an object.
  • the balloon may start uninflated (e.g., a current state) and expand gradually to a fully inflated size (new state).
  • an object represents a jukebox having methods (actions) such as play/stop/pause, and properties such as volume, song list, and song selection.
  • an object represents an Internet object, such as a uniform resource identifier (URI) (e.g., a web address). Clicking on the Internet object opens an Internet connection.
  • the sounds of a jukebox might include different songs in a playlist.
  • the sounds of an avatar might include walking sounds. Yet even the walking sounds of different avatars might be different. For instance, the walking sound of an avatar with high heels might be different than that of one wearing flip-flop sandals. Walking sounds may also change subject to the terrain. For instance the walking sound on parquet flooring may be different than that on snow.
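An illustrative lookup of footstep sounds keyed by footwear and terrain; the sample file names are invented:

    # Hypothetical footstep-sample table; the patent only gives the idea that
    # walking sounds vary with the avatar's footwear and the terrain.
    FOOTSTEPS = {
        ("high heels", "parquet"): "heels_on_parquet.wav",
        ("high heels", "snow"): "heels_on_snow.wav",
        ("flip-flops", "parquet"): "flipflops_on_parquet.wav",
        ("flip-flops", "snow"): "flipflops_on_snow.wav",
    }

    def footstep_sample(footwear, terrain):
        """Pick the walking sound for an avatar, with a generic fallback."""
        return FOOTSTEPS.get((footwear, terrain), "default_step.wav")

    print(footstep_sample("high heels", "parquet"))  # -> "heels_on_parquet.wav"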
  • one user can change an object's state, and other users will experience the state change. For example, one user can turn down the volume of a jukebox, and everyone represented in the virtual representation will hear the lower volume.
  • the virtual environment is network-accessible.
  • the virtual environment may be accessed via the Internet or a local area network (LAN).
  • a client device refers to a device that can run a client and provide a graphical interface.
  • For example, a client can be a Flash® client.
  • Client devices are not limited to any particular type. Examples of client devices include, but are not limited to computers, tablet PCs, gaming consoles, televisions with set-top boxes, certain cell phones, and personal digital assistants.
  • Another example of a client device is a device running a textual user interface, such as a Telnet program.
  • Another example is a mobile phone, such as an iPhone, running a chat client such as Google Talk.
  • Each client causes its client device to display a virtual environment, including the objects within.
  • a client device generates commands, and those objects are controlled in response to the commands.
  • By moving his representative object around a virtual environment a user can experience the sights and sounds that the virtual environment offers.
  • By moving his representative object around a virtual environment a user can interact with other users. For instance, a voice-enabled user may interact with another voice-enabled user by moving into the other user's audio range.
  • An audio range limits the distance that sound can be received and/or broadcasted. The audio ranges facilitate multiple conversations in a single virtual environment.
  • In general, interaction is a function of "closeness" between two users. Closeness may be measured in terms of distance between two representative objects in a virtual environment. However, closeness is not so limited. Another topology metric may be used to measure closeness. For example, closeness could be Euclidean distance between two representative objects. The distance may even be a real distance between the user and another user or real life object. For instance, the real distance might be the distance between a user in New York City and another user in Berlin. Another topology metric may measure closeness as the distance (in hyperlinks) between web pages currently being viewed by two users. Yet another topology metric may measure closeness as the distance (e.g., pixel distance) between two coordinates on a web page (for example, the distance between two coordinates that are pointed at by two users with their mouse pointers).
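A sketch of three such closeness metrics: Euclidean distance, hyperlink distance by breadth-first search over a link graph, and pixel distance between pointer coordinates. The BFS computation is an assumption; the patent names the metrics but not how to compute them:

    import math
    from collections import deque

    def euclidean(a, b):
        """Distance between two points (object positions or pixel coordinates)."""
        return math.dist(a, b)

    def link_distance(graph, page_a, page_b):
        """Hyperlink hops between two web pages (BFS over a link graph)."""
        seen, queue = {page_a}, deque([(page_a, 0)])
        while queue:
            page, hops = queue.popleft()
            if page == page_b:
                return hops
            for nxt in graph.get(page, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
        return math.inf  # pages not connected by hyperlinks

    links = {"home": ["news", "shop"], "news": ["article"]}
    print(link_distance(links, "home", "article"))  # -> 2
    print(euclidean((120, 45), (300, 200)))         # pixel distance on a page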
  • a virtual environment could overlap real space.
  • a scene of a real place is displayed (e.g., a map of a city or country, a room).
  • Locations of people in that real place can be determined, for example with GPS-equipped phones.
  • the users whose real locations are known are represented virtually by avatars in their respective locations in the virtual environment. Or, the place might be real, but the locations are not. Instead, a user's avatar wanders to different places to meet different people.
  • a user can also become voice-enabled via a client device.
  • a user of a client device initiates a phone connection by pressing a "Call me" button, upon which the service provider calls the user's phone.
  • a client device could command a co-installed VoIP soft-phone (e.g. via XML sockets) to establish a VoIP connection with the service provider.
  • an integrated client/phone such as a graphical Flash® client could have built-in VoIP capabilities.
  • a mobile phone could run a GUI+voice application.
  • a blind user could use a textual (telnet/Braille) client to issue a text command upon which the service provider calls the user's phone.
  • a user can utilize both a client device and a phone to interact with other users.
  • the client device is used to interact with the virtual environment and help the user meet other users.
  • the phone is used to speak with at least one other user.
  • some phones (e.g., certain VoIP phones) may also have the functionality of a client device.
  • the service provider runs an on-line service that allows a user to start a teleconferencing session (block 200).
  • the service provider provides teleconferencing services via a web site. Using a web browser, the user enters the web site, and logs into the service, and the service provider starts a session.
  • a virtual reality environment is presented to the user (block 210). If, for example, the service provider runs a web site, a web browser can download and display a virtual reality environment to the user.
  • the virtual reality environment includes a scene and (optionally) sounds.
  • a virtual reality environment is not limited to any particular type of scene or sounds.
  • a virtual reality environment includes a beach scene, with blue water, white sand and blue sky.
  • the virtual reality environment includes an audio representation of a beach (e.g. waves crashing against the shore, sea gulls cries).
  • a virtual reality environment provides a club scene, complete with bar, dance floor, and dance music (an exemplary bar scene 310 is depicted in Figure 4).
  • a scene in a virtual reality environment is not limited to any particular number of dimensions.
  • a scene could be depicted in two dimensions, three dimensions, or higher.
  • representations of the user and others are also presented in the virtual reality environment.
  • the representations could be images, avatars, live video, recorded sound samples, name tags, logos, user profiles, etc. In the case of avatars, live video or photos could be projected on them.
  • the service provider assigns to each representation a location within a virtual reality environment.
  • Each user has the ability to see and communicate with others in the virtual reality environment. In some embodiments, the user cannot see his own representation, but rather sees the virtual reality environment as his representation would see it (that is, from a first person perspective).
  • a user can control its representation to move around a virtual reality environment.
  • the user can experience the different sights and sounds that the virtual reality environment provides (block 220). For instance, a representative object could turn on a jukebox and select songs from a playlist. The jukebox would play the selected songs.
  • Figure 4 depicts a virtual reality environment including a club scene 310.
  • the club scene 310 includes a bar 320, and dance floor 330.
  • the user is represented by an avatar 340.
  • Others in the club scene 310 are represented by other avatars.
  • An avatar could be moved from its current location to a new location by clicking on the new location in the virtual environment, pressing a key on a keyboard, entering text, entering a voice command, etc.
  • Dance music is projected from speakers (not shown) near the dance floor 330.
  • As the user's avatar 340 approaches the speakers, the music becomes louder. The music is loudest when the user's avatar 340 is in front of the speakers. As the avatar 340 moves away from the speakers, the dance music becomes softer.
  • Audio representation might involve changing the speaker's audio characteristics by applying filters (e.g. reverb, club acoustics) to the object's sound data. Examples for changing audio characteristics include the following. As an avatar walks from a carpeted room into a stone hall, a parameter of a reverb filter is adjusted to add more reverb to the user's voice and avatar's footsteps.
  • As an avatar walks into a metallic chamber, a parameter of an effect filter is adjusted so the user's voice and avatar's footsteps are distorted to sound metallic.
  • A filter (e.g., a band pass filter) is applied to the avatar's sound data so the user's voice sounds as if it's coming from a loudspeaker system or telephone.
  • the user might not know any of the other users represented in the club scene 310. However, the user can enter into a teleconference with another user by becoming voice-enabled, and causing his avatar 340 to approach that other user's avatar (the users can start speaking with each other as soon as both avatars are within audio range of each other). Users can use their audio-only devices 130 to speak with each other (each audio-only device 130 makes a connection with the teleconferencing system 140, and the teleconferencing system 140 completes the connection between the audio-only devices 130). The user can command his avatar 340 to leave that teleconference, wander around the club scene 310, and approach other avatars so as to listen in on other conversations and speak with other people. The user can listen in on one or more conversations simultaneously.
  • a user Even while engaged in one conversation, a user has the ability to listen in on other conversations, and seamlessly leave the one conversation and join another conversation.
  • a user could even be involved in a chain of conversations (e.g., a line of people where person C hears B and D, and person D hears C and E, and so on).
  • the communications system 110 can host multiple virtual representations simultaneously.
  • the communications system 110 can host multiple teleconferences in each virtual representation.
  • Each teleconference can include two or more people.
  • the user can move in and out of the different virtual representations.
  • Each of the virtual representations can be uniquely addressable via a unique phone number.
  • the server system 150 can then place each user directly into the selected virtual representation.
  • Users can reserve and enter private virtual representations to hold private conversations. Users can also reserve and enter private areas of virtual representations to hold private conversations.
  • This interaction is unlike that of a conventional teleconference.
  • In a conventional teleconference, several parties schedule a teleconference in advance. When the time comes, the participants call a number, wait for verification, and then talk. When the participants are finished talking, they hang up.
  • teleconferencing according to the present invention is dynamic. Multiple teleconferences might be occurring between different groups of people. The teleconferences can occur without advance planning. A user can listen in on one or more teleconferences simultaneously, enter into and leave a teleconference at will, and hop from one teleconference to another.
  • The virtual reality environment just described is considered "immersive."
  • An “immersive” environment is defined herein as an environment with which a user can interact.
  • a user can also move its representation around a virtual reality environment to engage others represented in the virtual reality environment (block 220).
  • the user's representation may be moved by clicking on a location in the virtual reality environment, pressing a key on a keyboard, pressing a key on a telephone, entering text, entering a voice command, etc.
  • Another way a user can engage others is by text messaging, video chat, etc. Another way is by clicking on another's representation, whereby a profile is displayed. The profile provides information about the person behind the representation. In some embodiments, images (e.g., profile photos, live webcam feeds) of others who are close by will automatically appear. Still another way is to become voice-enabled via phone (block 230). Becoming voice-enabled allows the user to have teleconferences with others who are voice-enabled. For example, the user wants to have a teleconference using a phone. The phone could be a traditional phone or a VoIP phone. To enter into a teleconference, the user can call the service provider.
  • the user can call a virtual reality environment (e.g., by calling a unique phone number, or by calling a general number and entering a user ID and PIN via DTMF, or by entering a code that the user can find on a web page).
  • the user can call the virtual reality environment by calling its unique SIP address.
  • a user could be authenticated by appending credentials to the SIP address.
• The service provider can join the phone call with the session in progress if it can recognize the user's phone number (block 232). If the service provider cannot recognize the user's phone number, the user starts a new session via the phone (block 234), and the service provider then merges the new phone session with the session already in progress (block 236). For instance, a representative object could turn on a jukebox and select songs from a playlist; the jukebox would play the selected songs.
• A sidebar includes a "CALL button" that the user clicks to become voice-enabled.
• Once voice-enabled, the user can walk up to another who is voice-enabled, and start talking immediately.
• A telephone icon over the head of an avatar could be used to indicate that its user is voice-enabled, and/or another graphical sign, such as sound waves, could be displayed near an avatar (e.g., in front of its face) to indicate that it is speaking or making other sounds.
  • the user has the option of becoming voice-enabled immediately after starting a session (block 230).
  • This option allows the user to immediately enter into teleconferences with others who are voice-enabled (block 240).
  • a voice-enabled user could even call a person who has not yet entered the virtual reality environment, thereby pulling that person into the virtual reality environment (block 240).
  • the user remains voice-enabled until the user discontinues the call (e.g., hangs up the phone).
  • a user can connect to the service provider with only a single device 120 (e.g., a computer with a microphone and speakers, a VoIP phone) that can navigate the virtual reality environment and also be used for teleconferences.
  • a user connects to the web site via the Internet, is automatically voice-enabled, meets others in the virtual reality environment, and enters into teleconferences (indicated by the line that goes directly from block 210 to block 240).
• VoIP offers certain advantages. VoIP on a broadband connection enables a truly seamless, persistent connection that allows a user to "hang out" casually in one or more environments for a long time. Every now and then, something interesting might be heard, or someone's voice might be recognized, whereupon the user can pay more attention and just walk over to chat. Yet another advantage of VoIP is that stereo sound connections can be easily established.
  • the service provider runs a web site, but allows a user to log into the teleconferencing service and enter into a teleconference without accessing the web site (block 260).
  • a user might only have access to a touch-tone telephone or other device 130 that can't access the web site or display the virtual reality environment. Or the user might have access to a single device that can either access the web site or make phone calls, but not both (e.g., a cell phone).
• Consider a user with only a traditional telephone. With only the telephone, the user can call a telephone number and connect to the service provider. The service provider can then create a representation of the user in the virtual reality environment.
• Via telephone signals (e.g., DTMF, voice control), the user can move its representation around in the virtual reality environment, listen to other conversations, meet other people, and experience the sounds (but not sights) of the virtual reality environment. Although the user cannot see its representation, others who access the web site can see the user's representation.
  • a teleconference is not limited to conversations between a user and another (e.g., a single person).
• A teleconference can involve many others (e.g., a group). Moreover, others can be added to a teleconference as they meet and engage those already in the teleconference. And once engaged in one teleconference, a person has the ability to "listen in" on other teleconferences, and seamlessly leave the one teleconference and join another teleconference.
  • a user could even be involved in a chain of teleconferences (e.g., a line of people where person C hears B and D, and person D hears C and E, and so on).
  • Each of the virtual reality environments can be uniquely addressable via an Internet address or a unique phone number.
  • the service provider can then place each user directly into the selected target virtual reality environment. Users can reserve and enter private virtual reality environments to hold private conversations. Users can also reserve and enter private areas of public environments to hold private conversations.
  • a web browser or other graphical user interface could include a sidebar or other means for indicating different environments that are available to a user. The sidebar allows a user to move into and out of different virtual reality environments, and to reserve and enter private areas of a virtual reality environment.
  • a service provider can host multiple teleconferences in a virtual reality environment. A service provider can host multiple virtual reality environments simultaneously. In some embodiments a user can be in more than one virtual reality environment simultaneously.
  • FIG. 5 illustrates a state diagram of a virtual reality environment (directed arrows in the diagram indicate actions).
  • the state of a virtual reality environment may be persistent in that it continues to exist throughout many user sessions and it continues to exist through the actions of different users. This allows a virtual reality environment to be modified by one user, and the modifications observed by others. For example, graffiti can be written on walls, a light switch in a virtual reality environment could be switched on and off, etc.
  • Objects in the virtual reality environment can be added, removed, and moved by users.
• Objects include sound sources (e.g., music boxes, bubbling fish tanks), data objects (e.g., a modifiable book with text and pictures), visualized music objects, etc.
• Objects can have properties that allow a user to perform certain actions on them. A user could sit on a chair, open a window or operate a jukebox. Objects could have profiles too. For example, a car in a virtual show room could have a make, model, year, top speed, number of cylinders, etc.
  • the persistent state also allows "things" to be put on top of each other.
  • a file can be dropped onto a user or dropped onto the floor as a way of sharing the file with the user.
  • a music or sound file could be dropped on a jukebox.
  • a picture or video could be dropped on a projector device to trigger playback / display.
  • a multimedia sample (e.g., an audio clip or video clip containing a message) could be "pinned" to a whiteboard.
  • the persistent state also allows for meta-representations of files.
  • These meta-representations may be icons that offer previews of an actual file. For example, an audio file might be depicted as a disk; an image file might be depicted as a small picture (maybe in a frame), etc.
  • a virtual reality environment could overlap real space.
  • a scene of a real place is displayed (e.g., a map of a city or country, a room).
  • Locations of people in that real place can be determined, for example with GPS phones.
  • the participating people whose real locations are known are represented virtually by avatars in their respective locations in the virtual reality environment.
  • the place might be real, but the locations are not. Instead, a user's avatar wanders to different places to meet different people.
  • Virtual reality environments could be linked to form a continuous open environment, or different virtual reality environments could be linked in the same way web pages are linked. There can be links from one virtual reality environment to another environment. There could be links from a virtual reality environment, object or avatar to the web, and vice versa. As examples, a link from a user's avatar could lead to a web version of that user's profile. A link from a web page or a unique phone number could lead to a user's favorite virtual reality environment or a jukebox play list.
  • Figure 6 illustrates how a user experiences audio in a virtual reality environment.
  • the user has a location in the environment and establishes an audio connection with that location.
  • Sound sources include objects in the virtual reality environment (e.g., a jukebox, speakers, a running stream of water), and representations of those users who are talking.
  • closeness of each sound source to the user's representation is determined.
  • the closeness is a function of a topology metric.
  • the metric could be Euclidean distance between the user and the sound source.
  • the distance may even be a real distance between the user and the source.
  • the real distance might be the distance between a user in New York City and a sound source (e.g., another user) in Berlin.
  • audio streams from the sound sources are weighted as a function of closeness to the user's representation. Sound sources closer to the user's representation would receive higher weights (sound louder) than sound sources farther from the user's representation.
• The weighted streams are combined and presented to the user. Sounds from all sources available to the user are processed (e.g., attenuated, filtered, phase-shifted), mixed together, and supplied to the user. The sounds do not include the user's own voice.
  • the audio range of the user and each sound source can have a geometric shape or a shape that simulates real life attenuation.
  • FIG. 7 illustrates the use of an audio range to perform additional attenuation of sound in a virtual reality environment.
• A user's representative object is at location P_W and three other objects are at locations P_X, P_Y and P_Z.
• MIX_W, the sound heard by the user at location P_W, may be expressed as

  MIX_W = a·V_X + b·V_Y + c·V_Z

where V_X, V_Y and V_Z are sound data from the objects at locations P_X, P_Y and P_Z, and where a, b and c are sound coefficients.
• The volume of sound data V_X is adjusted by coefficient a, the volume of sound data V_Y by coefficient b, and the volume of sound data V_Z by coefficient c.
  • each coefficient may be inversely proportional to the distance between the corresponding sound source and the user's representative object. As such, sound gets louder as the user's object and the sound source move closer together, and sound gets softer as they move farther apart.
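As a concrete illustration of the mix just described, the following is a minimal sketch, not the patented implementation: the inverse-distance law 1/(1+d) and all names are illustrative assumptions; the description above only requires that a coefficient fall as the distance between source and drain grows.

```python
import math

def coefficient(drain_pos, source_pos):
    # Inverse-distance weighting: 1.0 at zero distance, falling toward 0.
    # The exact law is an assumption; any monotonically decreasing
    # function of distance fits the description above.
    return 1.0 / (1.0 + math.dist(drain_pos, source_pos))

def mix_for_drain(drain_pos, sources):
    # sources: list of (position, samples) pairs, one chunk of sound
    # data per source. Computes MIX_W = a*V_X + b*V_Y + c*V_Z.
    length = len(sources[0][1])
    mix = [0.0] * length
    for pos, samples in sources:
        c = coefficient(drain_pos, pos)
        for i, v in enumerate(samples):
            mix[i] += c * v
    return mix

# Drain at P_W = (0, 0); sources at P_X, P_Y, P_Z.
chunk = mix_for_drain((0, 0), [((1, 0), [0.2, 0.1]),
                               ((3, 4), [0.5, 0.4]),
                               ((9, 0), [0.3, 0.3])])
```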
  • the server system generates the sound coefficients.
  • the volume control is not limited to a topology metric such as distance. That is, closeness of two objects is not limited to distance.
  • Each object may have an audio range.
  • the audio range is used to determine whether sound is cut off.
• The audio ranges of the objects at locations P_W and P_Z are indicated by circles E_W and E_Z.
• Audio ranges of the representations at locations P_X and P_Y are indicated by ellipses E_X and E_Y.
  • the elliptical shape of an audio range indicates that the sound from its audio source is directional or asymmetric.
• The circular shape indicates that the sound is omni-directional (that is, projected equally in all directions).
• Coefficient c = 0 when location P_Z is outside the range E_W.
  • a coefficient may vary between 0 and 1. For instance, a coefficient might equal a value of zero at the perimeter of the range, a value of one at the location of the user's representative object, and a fractional value therebetween.
  • topology metrics might be used in combination with the audio range. For example, a sound will fade as the distance between the source and the user's representative object increases, and the sound will be cut off as soon as the sound source is out of range.
• The audio range may be a receiving range or a broadcasting range. If a receiving range, a user will hear other sources within that range. Thus, the user will hear other users whose representative objects are at locations P_X and P_Y, since the audio ranges E_X and E_Y intersect the range E_W. The user will not hear another whose representative object is at location P_Z, since the audio range E_W does not intersect the range E_Z.
• If the audio range is a broadcasting range, a user hears those sources in whose broadcasting range he is.
• The user will hear the user whose representative object is at location P_X, since location P_W is within the ellipse E_X.
• The user will not hear those users whose representative objects are at locations P_Y and P_Z, since the location P_W is outside of the ellipses E_Y and E_Z.
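The two range semantics can be captured in a few lines. A hedged sketch, assuming circular ranges for brevity (the elliptical, directional ranges of Figure 7 would replace the simple distance tests); the function and parameter names are illustrative.

```python
import math

def audible(drain, source, mode):
    """drain and source are (position, range_radius) pairs.
    mode 'receiving': the drain hears a source whose range intersects
    its own receiving range. mode 'broadcasting': the drain hears a
    source only if the drain sits inside the source's range."""
    (d_pos, d_radius), (s_pos, s_radius) = drain, source
    gap = math.dist(d_pos, s_pos)
    if mode == "receiving":
        return gap <= d_radius + s_radius   # the two ranges intersect
    return gap <= s_radius                  # drain inside broadcast range
```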
  • the user's audio range is fixed.
  • the user's audio range can be dynamically adjusted. For instance, the audio range can be reduced if a virtual environment becomes too crowded.
• Some embodiments might have a function that allows for private conversations. That function may be realized by reducing the audio range (e.g., to a whisper) or by forming a disconnected "sound bubble."
  • Some embodiments might have a "do not disturb" function, which may be realized by reducing the audio range.
  • Different audio ranges may have different shapes and sizes, different attenuation functions, directionality/orientation, state dependent attenuation, etc.
  • avatars offer certain advantages over other types of objects.
  • Avatars allow one user to interact with another.
  • sounds from a user may be projected equally in all directions (that is, sound is omni-directional). In other embodiments, the sound projection may be directional or asymmetric.
  • avatars offer certain advantages.
  • Avatars allow one user to meet another user through intuitive actions. All a user need do is control its avatar to walk up to another avatar and face it. The user can then introduce himself, and invite another to enter into a teleconference.
  • gestures of the avatars can be controlled by pressing buttons on a keyboard or keypad. Different buttons might correspond to gestures such as waving, kissing, smiling, frowning etc.
  • the gestures of the user can be monitored via a webcam, corresponding control signals can be generated, and the control signals can be sent to the service provider. The service provider can then use those control signals to control the gesture of an avatar.
  • the volume of sound between two users may be a function of relative orientation of the two avatars.
  • Avatars facing each other will hear each other better than one avatar facing away from the other, and much better than two avatars facing in different directions.
  • Figure 8 shows two avatars A and B facing in the directions of the arrows.
• The avatars A and B are facing each other directly if the angles α and β between the avatars' attitudes and their connecting line AB equal zero.
  • avatar A is a speaker and avatar B is a listener.
• The value of the attenuation function can vary differently for changes to α and β. In this case the attenuation is asymmetrical.
• One advantage of orientation-based attenuation is that it allows a user to take part in one conversation while casually hearing other conversations.
  • the attenuation may also be a function of the distance between avatars A and B.
  • the distance between avatars A and B may be taken along line AB.
  • a sound model may be based on direction, orientation, distance and states of the objects associated with the sound sources and sound drains. As an example of a state, the volume or audio range of sound data might be reduced if an object is in a whisper mode, or the volume or audio range might be increased if the object is in yell mode. The volume heard by an object or its receiving range could be reduced if that object is in a do-not-disturb mode.
• A sound model may also consider other factors that influence the volume of sound data. For instance, a user's broadcasting audio range could be increased when he is detected to be shouting and reduced when he is detected to be whispering.
• Let V_dw(t) be the sound heard by the user represented by the object at location P_W and associated with sound drain w. V_dw(t) may be expressed as

  V_dw(t) = vol_dw · Σ_{n=1..s_max} f_wn(d_wn, α_nw, β_nw, u_n, u_w) · vol_sn · V_sn(t)

where vol_dw is the drain gain of sound drain w, s_max is the total number of sound sources in the environment, V_sn(t) is the sound produced by sound source n, vol_sn is the source gain of sound source n, f_wn(d_wn, α_nw, β_nw, u_n, u_w) is an attenuation function determining how source n is attenuated for drain w, d_wn is the distance between w and n, α_nw is the angle between the sound emission direction (speaking direction) and the connecting line of user w and sound source n, β_nw is the angle between that connecting line and the sound reception direction (hearing direction), u_n is the state of the object associated with sound source n, and u_w is the state of the object associated with sound drain w.
  • the state U n of the object associated with sound source n reflects any other factor or set of factors that influence the volume of sound from the sound source n. For instance, the state U n might reduce the volume if the object associated with sound source n is in a whisper mode, or it might increase the volume if the object associated with sound source n is in a yell mode.
  • the state of the object u w associated with sound drain w reflects any other factor or set of factors that influence the volume of sound heard by the sound drain w. For instance, the state u w could reduce the volume of the sound heard by the sound drain w if the object associated with sound drain w is in a do-not-disturb mode.
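The formula above translates into code almost directly. A minimal sketch: the particular attenuation law (inverse distance times cosine falloff) and the whisper/yell/do-not-disturb factors are assumptions standing in for the otherwise unspecified function f_wn.

```python
import math

def f_wn(d, alpha, beta, u_source, u_drain):
    # One possible attenuation function; the text leaves f_wn open.
    if u_drain.get("do_not_disturb"):
        return 0.0
    gain = 1.0 / (1.0 + d)                  # distance falloff
    gain *= max(0.0, math.cos(alpha))       # speaking direction
    gain *= max(0.0, math.cos(beta))        # hearing direction
    if u_source.get("whisper"):
        gain *= 0.3                         # illustrative factor
    if u_source.get("yell"):
        gain *= 2.0                         # illustrative factor
    return gain

def v_dw(drain, sources, t):
    # V_dw(t) = vol_dw * sum_n f_wn(...) * vol_sn * V_sn(t)
    total = 0.0
    for s in sources:
        total += (f_wn(s["d"], s["alpha"], s["beta"],
                       s["state"], drain["state"])
                  * s["vol"] * s["signal"](t))
    return drain["vol"] * total
```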
  • FIG. 9 illustrates a first approach for controlling the volume of sound data in a teleconference.
  • the server system generates sound coefficients, and the teleconferencing system uses the sound coefficients to vary the audio characteristics (e.g., audio volume) of sound data that goes from sound sources to a sound drain.
  • a sound drain refers to the representative object of a user who can hear sounds in the virtual environment.
  • a sound coefficient can vary the audio volume or other audio characteristics as a function of closeness of a sound source and a sound drain.
  • a virtual environment is provided (block 710), and phone connections are established with a plurality of users (block 720).
  • the users are represented by objects in the virtual environment.
  • Each user representative object can be both sound drain and sound source.
  • Sound sources include objects that can provide sound in a virtual environment (e.g., a jukebox, speakers, a running stream of water, users' representative objects).
  • a sound source could be multimedia from an Internet connection (e.g., audio from a YouTube video).
  • the following functions are performed for each sound drain in the virtual environment.
  • closeness of each sound source to a drain is determined. This function is performed for each sound drain in the virtual environment.
  • the server system can perform this function, since it keeps track of the object states.
  • a coefficient for each drain/source pair is computed.
  • Each coefficient varies the volume of sound from a source as a function of its closeness to the drain.
  • the closeness is not limited to distance. This function may also be performed by the server system, since it maintains information about closeness of the objects.
  • the server system supplies the sound coefficients to the teleconferencing system.
  • the sound from a source to a drain can be cut off (that is, not heard) if the drain is outside of an audio range of the source (in the case of a broadcasting range).
  • the sound coefficient would reflect such cut-off (e.g., by being set to zero or close to zero).
  • the server system can determine the range, and whether cutoff occurs, since it manages the object states.
  • sound data from each sound source is adjusted with its corresponding coefficient.
  • the sound data from the sound sources are weighted as a function of closeness to a drain.
  • the weighted sound data is combined and sent back on a phone line or VoIP channel to a user.
  • an auditory environment is synthesized from the sounds of different objects, and the synthesized environment is heard by the user.
  • the process at blocks 730-750 is performed continuously, since locations, orientations and other states in the virtual representation are changed continuously.
  • the process at blocks 760-770 is also performed continuously, as the sound data is streamed continuously (e.g., in chunks of 100ms).
  • the sound data is mixed only for those sound sources making a significant contribution.
  • the subset includes the loudest sound sources (i.e., those with the highest coefficients).
  • the subset includes only those representative objects whose users are actually talking.
• Sound sources that are not active (i.e., sound sources that are not providing sound data) can be excluded from the mix.
  • audio ranges of certain objects may be automatically set at or near zero, so that their coefficients are set at or near zero.
  • the sound data from these objects would be excluded at block 1010.
  • a minimum distance between objects may be enforced. This policy would prevent users from forming dense crowds.
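The selection step described in the preceding bullets might be sketched as follows; the threshold, the cutoff count and all names are assumptions.

```python
def significant_sources(coeffs, active_ids, max_sources=4, floor=0.05):
    # coeffs: source id -> coefficient for one drain.
    # active_ids: sources currently providing sound data.
    # Drop silent sources and near-zero coefficients, then keep only
    # the loudest few, as described above.
    loud = {sid: c for sid, c in coeffs.items()
            if sid in active_ids and c > floor}
    ranked = sorted(loud.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:max_sources])
```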
  • the teleconferencing system could also premix sound data for groups of sound sources.
  • the premixed sound data of a group could be mixed with other audio data for a sound drain.
  • An example of premixing is illustrated in Figure 12c.
  • the teleconferencing system could make direct connections between a source and a drain. This might be done if the server system determines that two users can essentially only hear each other. Making direct connections can preserve computing power and decrease latencies.
• Figure 12a shows a line of sound sources (Source0 to Source3) and five objects (Drain5 to Drain9) listening to those sound sources.
• The five drains (Drain5 to Drain9) are in different positions with respect to the line of sound sources.
• Figure 12b illustrates a sound mixer 1110 that mixes sound data from the line of sources (Source0 to Source3) without premixing.
• Each sound source (Source0 to Source3) has a coefficient for each sound drain (the coefficients are represented by filled circles and exemplary values are also provided).
  • the sound mixer 1110 performs four mixing operations per sound drain for a total of 20 mixing operations.
• Figure 12c illustrates an alternative sound mixer 1120, which premixes the sound data from the line of sources (Source0 to Source3).
• The sound sources (Source0 to Source3) are grouped, and the sound mixer 1120 mixes the sound data from the group. Four mixing operations are performed during premixing.
  • the sound mixer 1120 computes a single coefficient for each drain and performs one mixing operation per drain.
  • the value of a coefficient may be a function of distance from its drain to the group (e.g., distance from a drain to a centroid of the group).
  • the sound mixer 1120 performs an additional five mixing operations for a total of nine mixing operations.
• The coefficients that premix sound data into a single sound source for a group could be determined with respect to a certain point such as a centroid (such coefficients are indicated by values 0.8, 0.9, 0.9, and 0.8), or some other metric.
• The values could all be set to one, which means that each drain would hear the same volume from each sound source (Source0 to Source3). However, different drains would still hear different volumes from the group (as indicated by the different coefficients 0.97, 0.84, 0.75, 0.61 and 0.50).
  • Sound sources may be grouped in a way that minimizes the mixing operations, yet keeps the deviation from the ideal sound (that is, sound without pre-mixing) at an acceptable level.
  • Various clustering algorithms can be used to group the sound sources (e.g., a K-means algorithm; or by iteratively clustering the mutual nearest neighbors).
  • Figure 12c illustrates a fifth sound source (Source4) that is not grouped with the line of sound sources.
  • the fifth sound source is assigned its own coefficients for Drain3 and Drain7. Thus, a single mixing operation is performed for Drain3, and two mixing operations are performed for Drain7.
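The premixing of Figure 12c might be sketched as follows. The group coefficients (relative to, e.g., a centroid) and the per-drain coefficient mirror the example values quoted above; the helper names are illustrative. Note the operation count matches the text: four premix operations plus one per drain.

```python
def premix(group, group_coeffs):
    # Mix a line of sources (e.g., Source0..Source3) into one combined
    # source; one mixing operation per grouped source.
    out = [0.0] * len(group[0])
    for samples, c in zip(group, group_coeffs):
        for i, v in enumerate(samples):
            out[i] += c * v
    return out

def drain_mix(premixed, drain_coeff, ungrouped=()):
    # One mixing operation per drain for the group, plus one per
    # ungrouped source (like Source4 in Figure 12c).
    out = [drain_coeff * v for v in premixed]
    for c, samples in ungrouped:
        for i, v in enumerate(samples):
            out[i] += c * v
    return out

group = premix([[0.1, 0.2]] * 4, [0.8, 0.9, 0.9, 0.8])  # 4 operations
chunk = drain_mix(group, 0.97)                          # 1 per drain
```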
  • Connections are not limited to audio sources. Connections can also be made with multimedia sources (block 1310). Examples of such multimedia include, without limitation, video streams, text chat messages, instant messenger messages, avatar gestures or moves, mood expressions, emoticons, and web pages.
  • Multimedia sources could be displayed (e.g., viewed, listened to) from within a virtual reality environment (block 1320).
  • a video clip could be viewed on a screen inside a virtual reality environment.
  • Sound could be played from within a virtual reality environment.
  • Multimedia sources could be viewed in separate popup windows (block 1330). For example, another instance of a web browser is opened, and a video clip is played in it.
  • the virtual reality environment facilitates sharing the multimedia (block 1340).
  • Multiple users can share a media presentation (e.g., view it, edit it, browse, listen to it), and, at the same time, discuss the presentation via teleconferencing.
• In some embodiments, one of the users can control the presentation of the multimedia. This feature allows all of the browsers to be synchronized, so all users can watch a presentation at the same time. In other embodiments, each user has control over the presentation, and the browsers are not synchronized.
• A multimedia connection can be shared in a variety of ways. One user can share a media connection with another user by dragging and dropping a multimedia representation onto the other user's avatar, or by causing its avatar to hand the multimedia representation to the other user's avatar.
• A first user's avatar drops a video file, photo or document on a second user's avatar. Both the first and second users then view it in a browser or media player, while discussing it via teleconferencing.
  • a first user's avatar drops a URL on a second user's avatar.
  • a web browser for each user opens, and downloads content at the URL.
  • the first and second users can then co-browse, while discussing the content via teleconferencing.
  • a user presents something to the surrounding avatars. All users within range get to see the presentation (first, however, they might be asked whether they want to see the presentation).
  • the multimedia connection provides another advantage: it allows telephones and other devices without browsers to access content on the Internet.
  • a multimedia connection could provide streaming audio to a virtual reality environment.
  • the streaming audio would be an audio source that has a specific location in the virtual reality environment. A user with only a standard telephone can wander around the virtual reality environment and find the audio source. Consequently, the user can listen to the streaming audio over the telephone.
  • the service provider 1400 could provide other services.
  • One service is automatically assigning a user to certain virtual reality environments based on a characteristic of the user (block 1410).
  • the characteristic may be a parameter in the user's profile, or an interest of the user, or a mood of the user, or some other characteristic.
  • a user may have multiple profiles. Each profile represents a different aspect of the user. Different profiles give the user access to certain virtual reality environments. A user can switch between profiles during a session.
  • the profile can state a need. For example, a profile might reveal that the user is shopping for an automobile. The user could be automatically assigned to a virtual show room, including representations of automobiles, and representations of salesmen.
  • user profiles can be made public, so they can be viewed by others. For instance, a first user can click on the avatar of a second user, and the profile of that second user appears as a form of introduction. Or, a first user might wander around a virtual reality environment, looking for people to meet. The first user could learn about a second user by clicking on the avatar of that second user. In response, the second user's profile would be displayed to the first user. If the profile does not disclose the user's real name and phone number, the second user stays anonymous.
  • Another service is providing agents (e.g. operators, security, experts) that offer services to those in the virtual reality environment (block 1420).
  • users might converse while watching a movie, while an agent finds information about the cast.
  • a user chats with another person, and the person requests an agent to look up something with a search engine.
  • an agent identifies lonely participants that seem to match and introduces them to each other.
  • Another service is providing a video chat service (block 1440).
  • the service provider might receive web camera data from different users, and associate the web camera data with the different users such that a user's web camera data can be viewed by certain other users.
  • Yet another service is hosting different functions in different virtual reality environments (block 1430).
  • Examples of different functions include, without limitation, social networking, business conferencing, business-to-business services, business-to-customers services, trade fairs, conferences, work and recreation places, virtual stores, promoting gifts, on-line gambling and casinos, virtual game and entertainment shows, virtual schools and universities, on-line teaching, tutoring sessions, karaoke, pluggable (team) games, casinos, award-based contests, clubs, concerts, virtual galleries, museums, and demonstrations or any scenario available in real life.
  • a virtual reality environment could be used to host a television show or movie.
  • FIG. 16 illustrates an exemplary web-based communications system 1600.
  • the communications system 1600 includes a VE server system 1610.
  • the "VE” refers to virtual environment.
  • the VE server system 1610 hosts a website, which includes a collection of web pages, images, videos and other digital assets.
  • the VE server system 1610 includes a web server 1612 for serving web pages, and a media server 1614 for storing video, images, and other digital assets.
  • One or more of the web pages embed client files.
  • Files for a Flash® client are made up of several separate Flash® objects (.swf files) that are served by the web server 1612 (some of which can be loaded dynamically when they are needed).
  • a client is not limited to a Flash® client.
  • Other browser-based clients include, without limitation, JavaTM applets, Microsoft® SilverlightTM clients, .NET applets, Shockwave® clients, scripts such as JavaScript, etc.
  • a downloadable, installable program could even be used.
  • a client device downloads web pages from the web server 1612 and then downloads the embedded client files from the web server 1612.
  • the client files are loaded into the client device, and the client is started.
  • the client starts running the client files and loads the remaining parts of the client files (if any) from the web server 1612.
  • An entire client or a portion thereof may be provided to a client device.
  • a Flash® client including a Flash® player and one or more Flash® objects
  • the Flash® player is already installed on a client device.
  • the Flash® player causes the client device to display a virtual environment.
  • the client also accepts inputs (e.g., keyboard inputs, mouse inputs) that command a user's representative object to move about and experience the virtual environment.
  • the server system 1610 also includes a world server 1616.
  • the "world” refers to all virtual representations provided by the server system 1610.
  • the server system 1610 selects a description of a virtual environment and sends the selected description to the client.
  • the selected description contains links to graphics and other media for the virtual environment.
  • the description also contains coordinates and appearances of all objects in the virtual environment.
  • the client loads media (e.g., images) from the media server 1614, and projects the images (e.g., in isometric, 3-D).
  • the client displays objects in the virtual environment. Some of these objects are user representative objects such as avatars.
• The animated views of an object could comprise pre-rendered images or just-in-time rendered 3D models and textures; that is, objects could be loaded as individual Shockwave® objects, parameterized generic Shockwave® objects, images, movies, 3D models optionally including textures, and animations. Users could have unique/personal avatars or share generic avatars.
  • Objects can be loaded on demand, which reduces the initial loading time. Also low quality or generic representations could be loaded first, for example, when an avatar is far away from another object, and higher quality representations could be loaded later, as the avatar gets closer to the object.
  • a client device wants an object to move to a new location in the virtual environment, its client determines the coordinates of the new location and a desired time to start moving the object, and generates a request. The request is sent to the world server 1616.
• The world server 1616 receives a request and updates the data structure representing the "world."
  • the world server 1616 manages each object state in one or more virtual environments, and updates the states that change. Examples of states include avatar state, objects they're carrying, user state (account, permissions, rights, audio range, etc.), and call management.
  • the world server 1616 can also manage objects that transition gradually or abruptly.
  • a client device commands an object to transition to a new state
  • the world server 1616 receives the command and generates an event that causes all of the clients to show the object at the new state at a specified time.
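A minimal sketch of this request/event flow; the class, field and method names are illustrative, not the patent's.

```python
class WorldServer:
    def __init__(self):
        self.world = {}     # object id -> current state
        self.clients = []   # all clients viewing the environment

    def handle_request(self, obj_id, new_state, start_time):
        # Update the authoritative "world" data structure, then emit
        # an event so every client shows the object at the new state
        # at the specified time.
        self.world[obj_id] = new_state
        event = {"object": obj_id, "state": new_state, "at": start_time}
        for client in self.clients:
            client.apply(event)
```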
  • the communications system 1600 also includes a teleconferencing system 1620.
  • the teleconferencing system 1620 may include a telephony server 1622 for establishing calls with traditional telephones.
  • the telephony server 1622 may include PBX or ISDN cards for making connections for users with traditional telephones (e.g., touch-tone phones) and digital phones.
  • the telephony server 1622 may include mobile network or analog network connectors.
• The cards act as the terminal side of a PBX or ISDN line and, in cooperation with associated software, perform all low-level signaling for establishing phone connections. Events (e.g., ringing, connect, disconnect) and audio data in chunks (of, e.g., 100 ms) are passed from a card to a sound system 1626.
• The sound system 1626 mixes the audio between users in a teleconference, mixes in any external sounds (e.g., the sound of a jukebox, a person walking, etc.), and passes the mixed (drain) chunks back to the card and, therefore, to a user.
  • Some embodiments of the teleconferencing system 1620 may transcode calls into VoIP, or receive VoIP streams directly from third parties (e.g., telecommunication companies). In those embodiments, events would originate not from the cards, but transparently from an IP network.
  • Some embodiments of the teleconferencing system 1620 may include a VoIP server 1624 for establishing connections with users who call in with VoIP phones.
• A client (e.g., the client 160 of Figure 1) may contain functionality by which it tries to connect to a VoIP soft-phone audio-only device using, for example, an XML-socket connection. If the client detects the VoIP phone, it enables VoIP functionality for the user. The user can then (e.g., by the click of a button) cause the client to establish a connection by issuing a CALL command via the socket to the VoIP phone, which calls the VoIP server 1624 while including information necessary to authenticate the VoIP connection.
  • the world server 1616 associates each authenticated VoIP connection with a client connection.
  • the world server 1616 associates each authenticated PBX connection with a client connection.
  • a user could establish a Telnet session to receive information, questions and options, and also to enter commands.
  • the means 1617 could provide a written description of a virtual environment.
  • the telephony system 1622 can also allow users of audio-only devices to control objects in a virtual environment.
  • a user with only an audio-only device alone can experience sounds of the virtual environment as well as speak with others, but cannot see sights of the virtual environment.
  • the telephony system 1622 can use phone signals (e.g., DTMF, voice commands) from phones to control the actions of their corresponding representation in the virtual environment.
  • the audio-only device generates signals for selecting and controlling objects in the virtual representation, and the telephony system 1622 translates the signals and informs the server system to take action, such as changing the state of an object.
  • the signals may be dial tone (DTMF) signals, voice signals, or some other type of phone signal.
• On a touch-tone phone, certain buttons can correspond to commands.
• A user with a touch-tone phone or DTMF-enabled VoIP phone can execute a command by entering that command using DTMF tones.
  • Each command can be supplied with one or more arguments.
  • An argument could be a phone number or other number sequence.
  • voice commands could be interpreted and used.
  • the server system can also include a means 1617 for providing an audio description of the virtual environment.
  • a virtual environment can be described to a user from the perspective of the user's avatar. Objects that are closer to the user's avatar might be described in greater detail.
  • the description may include or leave out detail to keep the overall length of the description approximately constant.
  • the user can request more detailed descriptions of certain objects, upon which additional details are revealed.
  • the server system can also generate an audio description of options in response to a command.
  • the teleconferencing system mixes the audio description (if any) and other audio, and supplies the mixed sound data to the user's audio-only device.
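One way such a description generator could work, as a sketch: sort objects by closeness to the avatar, describe nearer objects in more detail, and stop when a word budget is spent so the overall length stays roughly constant. The fields, phrasing and budget are assumptions.

```python
import math

def describe(avatar_pos, objects, word_budget=40):
    # objects: list of dicts with "name", "detail" and "pos" fields.
    ordered = sorted(objects, key=lambda o: math.dist(avatar_pos, o["pos"]))
    parts, words = [], 0
    for obj in ordered:
        detailed = words < word_budget // 2   # nearer objects: more detail
        sentence = (f"Nearby is {obj['name']}, {obj['detail']}."
                    if detailed else f"Further off is {obj['name']}.")
        if words + len(sentence.split()) > word_budget:
            break                             # keep overall length constant
        parts.append(sentence)
        words += len(sentence.split())
    return " ".join(parts)
```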
• The means 1617 could provide a written description of a virtual environment. The means 1617 could include a speech synthesis system for providing a spoken description, which is heard on the audio-only device.
  • a sound system 1626 can play sound clips, such as sounds in the virtual environment. The sound clips are synchronized with state changes of the objects in the virtual environment. The sound system 1626 starts and stops the sound clips at the state transition start and stop times indicated by the world server 1616.
  • the sound system 1626 can mix sounds of the virtual environment with audio from the teleconferencing. Sound mixing is not limited to any particular approach, and may be performed as described above.
• The teleconferencing system may receive a list of patches (sets of coefficients) and go through the list.
  • the teleconferencing system can also use heuristics to determine whether it has enough time to patch all connections. If not enough time is available, packets are dropped.
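A sketch of that heuristic; the time budget and callback names are assumptions.

```python
import time

def process_patch_list(patches, budget_s, apply_patch, drop_packet):
    # Go through the list of patches (sets of coefficients); if the
    # time budget for this audio chunk runs out, the remaining
    # packets are dropped, as described above.
    start = time.monotonic()
    for index, patch in enumerate(patches):
        if time.monotonic() - start > budget_s:
            for late in patches[index:]:
                drop_packet(late)
            return
        apply_patch(patch)
```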
  • the VE server system 1610 may also include one or more servers that offer additional services.
  • a web container 1618 might be used to implement servlet and JavaServer Pages (JSP) specifications to provide an environment for Java code to run in cooperation with the web server 1612.
  • Figure 29 shows several other services that can be provided by the VE server system 1610 additionally or alternatively to the services shown in Figure 16.
  • a service repository 1632 provides information about the other services offered by the VE server system.
• A context repository 1634 provides information for bootstrapping and configuring the client at startup and at runtime, such as how to set up the virtual environment, dependencies between modules, and the behavior of objects. Preferably this is done using a domain-specific language.
• An object repository 1636 provides the clients with information about the environment and the objects it contains. For this, it holds the information about the objects in a format (e.g., XML) that allows different object implementations, or different versions thereof, to use and modify it.
  • An authentication service 1641 verifies the user's credentials, such as nickname and password, and supplies the client with a token that can be used for authenticating with the other services.
  • the token may be stored in a cookie.
• A profile service 1642 provides user profile data and actions, such as asking a user to become one's friend via the user's profile.
  • An account service 1643 provides information about the user's available funds.
• A room service 1644 manages rooms by providing methods for clients to enter or leave a virtual environment. Furthermore, it tracks all state changes inside virtual environments. All current states of the room and the avatars can be retrieved by clients to get a current snapshot. If a user wants to log into a room that is currently not handled, the service opens the room and the user is logged in. Clients can connect to the messaging service to be informed about room changes and to send changes to other clients. The room service also computes the sound coefficients, which it sends to the sound system 1626, and controls the playback of audio samples.
  • a call service 1645 provides information about phone rates and can be used by the client to initiate phone connections.
  • a mail service 1646 can be used by the client to send messages to a specific set of destinations, e.g. services that further process those messages.
• All servers in the communications system 1600 can be run on the same machine, or distributed over different machines. Communication may be performed by a remote invocation call, for example over an HTTP- or HTTPS-based protocol (e.g., SOAP).
  • the world server 1616 generates data such as sound coefficients, which the sound system 1626 uses to vary the audio characteristics (e.g., audio volume).
  • the sound coefficients or other data vary the audio volume or other audio characteristics as a function of closeness of object pairs.
  • Sound sources include objects in a virtual environment (e.g., a jukebox, speakers, a running stream of water). Sound sources also include those users who are talking.
  • a sound source could be multimedia from an Internet connection (e.g., audio from a YouTube video).
  • a drain refers to the representation of a user who can hear sounds in the virtual environment.
  • closeness of each sound source to a drain is determined. This function is performed for each sound drain in the virtual environment. The closeness is not limited to distance. The world server 1616 can perform this function, since it maintains the information about location of the sound sources.
  • a coefficient for each drain/source pair is generated.
  • Each coefficient varies the volume of sound from a source as a function of its closeness to the drain. This function may also be performed by the world server 1616, since it maintains information about locations of the objects.
  • the sound from a source to a drain can be cut off (that is, not heard) if the source is outside of the audio range of the drain.
  • the coefficient would reflect such cut-off (e.g., by being set to zero or close to zero).
  • the world server 1616 can determine the range, and whether cut-off occurs, since it keeps track of the object states.
  • audio streams from the audio sources are weighted as a function of closeness to the drain, and the weighted streams are combined and sent back on a phone line or VoIP channel to a user.
• The sound system 1626 may include a processor that receives a list of patches (sets of coefficients) and goes through the list. The processor can also use heuristics to determine whether it has enough time to patch all connections. If not enough time is available, packets are dropped.
  • the world server 1616 generates sound coefficients, which the sound system 1626 uses to vary the audio characteristics (e.g., audio volume) of sound data that goes from sound sources to sound drains.
  • a sound drain refers to the representative object of a user who can hear sounds in the virtual environment.
  • a sound coefficient can vary the audio volume or other audio characteristics as a function of closeness of a source and a drain.
  • Sound sources include objects in a virtual environment (e.g., a jukebox, speakers, a running stream of water). Sound sources also include the representative objects of those users who are talking.
  • a sound source could be multimedia from an Internet connection (e.g., audio from a YouTube video).
  • the following functions are performed for each drain in the virtual environment.
  • closeness of each sound source to a drain is determined. This function is performed for each sound drain in the virtual environment. The closeness is not limited to distance.
  • the world server 1616 can perform this function, since it maintains the information about location of the sound sources.
  • a coefficient for each drain/source pair is computed.
  • Each coefficient varies the volume of sound from a source as a function of its closeness to the drain. This function may also be performed by the world server 1616, since it maintains information about locations of the objects.
  • the world server 1616 supplies the sound coefficients to the sound system 1626.
  • the sound from a source to a drain can be cut off (that is, not heard) if the source is outside of an audio range of the drain.
  • the coefficient would reflect such cut-off (e.g., by being set to zero or close to zero).
  • the world server 1616 can determine the range, and whether cut-off occurs, since it keeps track of the object states.
  • sound data from each sound source is adjusted with its corresponding coefficient.
  • the sound data from the sound sources are weighted as a function of closeness to a drain.
  • the weighted sound data is combined and sent back on a phone line or VoIP channel to a user.
• The sound system 1626 may include a processor that receives a list of patches (sets of coefficients) and goes through the list. The processor can also use heuristics to determine whether it has enough time to patch all connections. If not enough time is available, packets are dropped.
  • the teleconferencing system 1620 could switch together source/drain pairs to direct connections. This might be done if the world server 1616 determines that two users can essentially only hear each other.
• The teleconferencing system 1620 could also premix some or all sources for several drains whose coefficients are similar. In the latter case, each user's own source may have to be subtracted from the joined drain to yield his drain.
  • a user can utilize both a client device 120 and an audio-only device 130 during a teleconference.
  • the client device 120 is used to interact with the virtual representation and find others to speak with.
  • the audio-only device 130 is used to speak with others.
• Some users might only have access to audio-only devices. Yet, such users can still control objects in a virtual representation. For example, such users can move their representative objects around a virtual representation to listen in on teleconferences, and approach and speak with other users. By moving its representative object around a virtual environment, a user having only an audio-only device can hear the sounds, but not see the sights, that the virtual environment offers.
  • an audio-only device establishes audio communications with the teleconferencing system (block 1710).
  • the user can call a virtual representation (e.g., by calling a unique phone number, or by calling a general number and entering additional data such as a user ID and PIN, via DTMF).
  • a user could for instance call a virtual representation by calling its unique VoIP address.
  • the teleconferencing system informs the server system of the session (block 1715).
  • the server system assigns the user to a location within a virtual representation (block 1720).
  • the audio-only device generates signals for selecting and controlling objects in the virtual representation
  • the signals are not limited to any particular type. As examples, the signals may be dial tone (DTMF) signals, voice signals, or some other type of phone signal.
  • buttons on the phone can correspond to commands.
• A user with a touch-tone phone or DTMF-enabled VoIP phone can execute a command by entering that command using DTMF tones.
  • Each command can be supplied with one or more arguments.
  • An argument could be a phone number or other number sequence.
  • voice commands could be interpreted and used.
  • a command argument might expect a value from a list of options.
  • the options may be structured in a tree so that the user selects a first group with one digit and is then presented the resulting subsets of remaining options and so on. The most probable options could be listed first.
  • a user could press '0' to enter a command menu where all available commands are read to the user.
  • the user can then enter a CALL command (e.g., 2255) followed by the # sign.
  • the user may then be asked to identify the person to call, e.g., by saying that person's name, entering that person's phone number, entering a code corresponding to that person, etc.
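A sketch of how such DTMF input might be parsed. The grammar (a leading '0' opens the command menu, the command code ends with '#', remaining digits are the argument) follows the example above; everything else, including the command table, is an assumption.

```python
COMMANDS = {"2255": "CALL"}   # digit codes for commands (2255 spells CALL)

def parse_dtmf(digits):
    # e.g. "02255#5551234" -> ("CALL", "5551234"): '0' opens the
    # command menu, '2255#' selects the CALL command, and the rest
    # identifies the person to call.
    if not digits.startswith("0"):
        return None
    code, sep, argument = digits[1:].partition("#")
    if not sep or code not in COMMANDS:
        return None
    return COMMANDS[code], argument
```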
  • the user could speak a catchword, such as "Computer.”
  • the teleconferencing system could also detect, process and act upon audio signals before a user enters a command menu.
  • the teleconferencing system could analyze the user's voice and detect a mood change and communicate it to the server system.
  • the server system in response, might modify the user's representative object to reflect that mood change.
  • Another command could cause an object to move within its virtual environment.
  • Arguments of that command could specify direction, distance, new location, etc.
  • Another command could allow a user to switch to another virtual environment, and an argument of that command could specify the virtual environment.
  • Another command could allow a user to join a teleconference.
  • Another command could allow a user to request information about the environment or about other users.
  • Another command could allow one user's avatar to take another user's avatar by the hand, whereby the latter avatar would follow (be piggybacked to) the former avatar.
  • Another command could allow a user to select an object representing an Internet resource, such as a web page.
  • Arguments could specify certain links, URLs or bookmarks.
  • a list of available links could be read to the user, who enters an argument to select a link (e.g., an Internet radio site).
  • telephones and other devices without browsers can be used to access content on the Internet.
  • a virtual environment includes an Internet object. When the object is selected, a connection is made to a site that provides streaming audio. The server system supplies the streaming audio to the teleconferencing system, which mixes the streaming audio on the user's phone line.
  • Another command could allow a user to give another user or a group of users certain rights or access to one or more of his files or directories.
  • Another command could allow a user to transfer objects (e.g., files, tokens or currency units) to other users.
  • Another command could allow a user to record and leave voice messages for other users (voice messages could be converted to text and left as text messages).
  • Another command could allow a user to present media (such as videos, sound samples and images) to other users (e.g., on a virtual screen), change its representative object (e.g., change the mood of an avatar), initiate or participate in polls or play games.
  • the teleconferencing system receives and translates the signals and informs the server system to take action (block 1740) such as changing the state of an object.
  • the teleconferencing system translates the signals and tells the server system to change the state.
  • the teleconferencing system can play audio clips, such as sounds in the virtual environment (block 1750).
  • the server system can also synchronize the sound clips with state changes of the virtual representation.
  • the server system can also provide an audio description of the virtual environment (block 1750).
  • a virtual environment can be described to a user from the perspective of the user's avatar.
  • the description may include or leave out detail to keep the overall length of the description approximately constant.
  • the user can request more detailed descriptions of certain objects, upon which additional details are revealed.
  • the server system can also generate an audio description of options in response to a command (block 1750).
  • the teleconferencing system mixes those audio descriptions with the other audio for the user and supplies the mixed sound data to the user's audio-only device (block 1760).
• The server system can also generate data for controlling audio characteristics over time (block 1770). For example, the volume of a conversation between two users is a function of distance and/or orientation of their two avatars in the virtual environment. In this example, sound gets louder as the avatars move closer together, and sound gets softer as the avatars move further apart.
  • the server system generates sound coefficients that vary the volume of sound between two users, as a function of the distance between the two users. The coefficients are used by the teleconferencing system to vary sound volume over time (block 1780). In this manner, the server system commands the teleconferencing system to attenuate or modify sounds so the conversation is consistent with the virtual environment.
  • Figure 19 illustrates an exemplary web-based system 1900 similar to the one shown in Fig. 16.
  • the communications system 1900 includes a VE server system 1910.
  • the "VE" refers to virtual environment.
  • Client devices are referenced by numeral 1902.
  • Phones are referenced by numeral 1904.
  • the VE server system 1910 hosts a website, which includes a collection of web pages, images, videos and other digital assets.
  • the VE server system 1910 includes one or more web servers 1912 for serving web pages, and one or more media servers 1914 for storing video, images, and other digital assets.
  • One or more of the web pages embed client files.
  • Files for a Flash® client are made up of one or more separate Flash® objects (.swf files) that are served by the web server 1912 (some of which can be loaded dynamically when they are needed).
  • a client is not limited to a Flash® client.
  • Other browser-based clients include, without limitation, JavaTM applets, Microsoft® SilverlightTM clients, .NET applets, Shockwave® clients, scripts such as JavaScript, etc.
  • a downloadable, installable program could even be used.
  • a client device 1902 downloads web pages from a web server 1912 and then downloads the embedded client files from a web server 1912.
  • the client files are loaded into the client device, and the client is started.
  • the client starts running the client files and loads the remaining parts of the client files (if any) from a web server 1912.
  • An entire client or a portion thereof may be provided to a client device.
  • An example is a Flash® client, which includes a Flash® player and one or more Flash® objects.
  • the Flash® player is already installed on a client device.
  • the Flash® player causes the client device to display a virtual environment.
  • the client also accepts inputs (e.g., keyboard inputs, mouse inputs) that command a user's representative object to move about and experience the virtual environment.
  • the server system 1910 also includes one or more world servers 1916.
  • the "world" refers to a set of representations of the virtual environment provided by the server system 1910.
  • the server system 1910 selects a description of a virtual environment and sends the selected description to the client.
  • the selected description contains links to graphics and other media for the virtual environment.
  • the description also contains coordinates and appearances of all objects in the virtual environment.
  • the client loads media (e.g., images) from a media server 1914, and projects the images (e.g., in isometric, 3-D).
  • the client displays objects in the virtual environment. Some of these objects are user representative objects such as avatars.
  • the animated views of an object could comprise pre-rendered images or just-in-time rendered 3D models and textures; that is, objects could be loaded as individual Shockwave® objects, parameterized generic Shockwave® objects, images, movies, 3D models (optionally including textures), and animations. Users could have unique/personal avatars or share generic avatars.
  • When a client device 1902 wants an object to move to a new location in the virtual environment, its client determines the coordinates of the new location and a desired time to start moving the object, and generates a request. The request is sent to the world server 1916.
  • the world server 1916 receives a request and updates the data structure representing the "world.”
  • the world server 1916 manages each object state in one or more virtual environments, and updates the states that change. Examples of states include avatar state, objects they're carrying, user state (account, permissions, rights, audio range, etc.), and call management.
  • the world server 1916 commands all clients represented in the virtual environment to transition the state of that object, so client devices display the object at roughly the same state at roughly the same time.
  • the world server 1916 may also perform collision detection and avoidance, path finding, and ensure, in general, consistent (e.g. physically correct) behavior.
  • the world server 1916 can also manage objects that transition gradually or abruptly.
  • When a client device commands an object to transition to a new state, the world server 1916 receives the command and generates an event that causes all of the clients to show the object at the new state at a specified time.
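Purely as a sketch, the event generated by the world server might carry data along these lines; the field names and the transport are assumptions, since the description does not fix a wire format:

```python
from dataclasses import dataclass

@dataclass
class StateChangeEvent:
    """Event broadcast to every client displaying the object."""
    object_id: str
    new_state: dict      # e.g. {"x": 12, "y": 7} or {"power": "on"}
    start_time: float    # when the transition should begin
    end_time: float      # when every client should show new_state

def broadcast(event, clients):
    # Sending one timed event to all clients lets them show the object
    # at roughly the same state at roughly the same time.
    for client in clients:
        client.send(event)   # `send` is a stand-in for the real transport
```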
  • the world server 1916 generates coefficients for the sound model. For example, the world server 1916 keeps track of distances between objects, and generates the coefficients as a function of the distance between the objects. The world server 1916 supplies the coefficients to a phone system 1920, which applies the coefficients to the audio data.
  • the phone system 1920 establishes phone connections with traditional phones (landline and cellular), VoIP phones, and other phones 1904.
  • Some embodiments of the phone system 1920 may include one or more telephony servers 1922 for establishing calls with phones via a public switched telephone network (PSTN).
  • a telephony server 1922 may include PBX or ISDN cards for making connections for users with traditional telephones (e.g., touch-tone phones) and digital phones.
  • the telephony server 1922 may include mobile network or analog network connectors. These cards act as the terminal side of a PBX or ISDN line and, in cooperation with associated software, perform all low-level signaling for establishing phone connections.
  • the sound system 1926 mixes the audio between users in a teleconference, mixes in any external sounds (e.g., the sound of a jukebox, a person walking, etc.) and passes the mixed (drain) chunks back to the card and, therefore, to a user.
  • Some embodiments of the phone system 1920 may include one or more VoIP servers 1924 for establishing connections with users who call in with VoIP phones.
  • a client (e.g., the client 160 of Figure 1) may contain functionality by which it tries to connect to a VoIP soft-phone using, for example, an XML-socket connection. If the client detects the VoIP phone, it enables VoIP functionality for the user. The user can then (e.g., by the click of a button) cause the client to establish a connection by issuing a CALL command via the socket to the VoIP phone, which calls a VoIP server 1924 while including the information necessary to authenticate the VoIP connection.
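A rough sketch of that handshake follows; the local port, the terminating null byte and the XML command shape are all assumptions (the description only says "for example, an xml-socket connection"):

```python
import socket

SOFTPHONE_PORT = 8787   # assumed local soft-phone port, for illustration

def detect_and_call(voip_address, auth_token):
    """Probe for a local VoIP soft-phone; if found, issue a CALL command."""
    try:
        sock = socket.create_connection(("127.0.0.1", SOFTPHONE_PORT),
                                        timeout=1.0)
    except OSError:
        return False    # no soft-phone detected: leave VoIP disabled
    # Element and attribute names below are illustrative only.
    command = f'<call target="{voip_address}" auth="{auth_token}"/>\x00'
    sock.sendall(command.encode("utf-8"))
    sock.close()
    return True         # the soft-phone now calls the VoIP server 1924
```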
  • Some embodiments of the phone system 1920 may transcode calls into VoIP, or receive (and possibly transcode) VoIP streams directly from third parties (e.g., telecommunication companies). In those embodiments, events would originate not from the cards, but transparently from an IP network.
  • the world servers 1916 can associate each authenticated VoIP connection with a client connection, if existent.
  • the world servers 1916 can associate each authenticated PBX connection with a client connection, if existent.
  • the server system 1910 can provide the same virtual representation to different kinds of client devices 1902, possibly with different visual representations (e.g. 3D, isometric, and textual), whereby users of those different client devices 1902 can still interact with each other.
  • a user could establish a text session to receive information, questions and options, and also to enter commands.
  • For textual devices, a written description of a virtual environment could be provided.
  • the phone system 1920 can also allow users of phones to control objects in a virtual environment.
  • a user without a client device and with only a phone can experience sounds of the virtual environment as well as speak with other users (having or not having client devices 1902), even if that user cannot see sights of the virtual environment.
  • the phone system 1920 can accept phone signals (e.g., DTMF, voice commands) from phones to control the actions of their corresponding representation in the virtual environment.
  • the phone system 1920 could also receive SMS or MMS to control these actions.
  • a phone 1904 generates signals for selecting and controlling objects in the virtual representation, and the phone system 1920 translates the signals and informs the server system to take action, such as changing the state of an object.
  • the signals may be dial tone (DTMF) signals, voice signals, or some other type of phone signal.
  • On a touch-tone phone, certain buttons can correspond to commands.
  • a user with a touch-tone phone or DTMF-enabled VoIP phone can execute a command by entering that command using DTMF tones.
  • the telephony server 1922 detects the (in-band) DTMF tones and converts them into (out-of-band) control signals which are passed to the world server 1916.
  • Each command can be supplied with one or more arguments.
  • An argument could be a phone number or other number sequence.
  • voice commands could be interpreted and used.
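To illustrate the DTMF path, the sketch below maps in-band digit sequences to out-of-band control signals. Apart from 2255, which the description later gives as an example CALL command, the digit codes and the signal format are assumptions:

```python
# Digit sequences spell the command on a phone keypad (CALL = 2255,
# MOVE = 6683, JOIN = 5646); only 2255 appears in the description.
COMMANDS = {"2255": "CALL", "6683": "MOVE", "5646": "JOIN"}

def translate_dtmf(digits):
    """Convert a '#'-terminated DTMF sequence into a control signal
    that the telephony server can pass to the world server."""
    body = digits.rstrip("#")
    for code, action in COMMANDS.items():
        if body.startswith(code):
            return {"action": action, "args": body[len(code):]}
    return {"action": "UNKNOWN", "args": body}
```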
  • the server system 1910 can also include a server 1917 for providing an audio description of a virtual environment.
  • a virtual environment can be described to a user from the perspective of the user's avatar. Objects that are closer to the user's avatar might be described in greater detail.
  • the description may include or leave out detail to keep the overall length of the description approximately constant.
  • the user can request more detailed descriptions of certain objects, upon which additional details are revealed.
  • the server system 1910 can also generate an audio description of options in response to a command.
  • the phone system 1920 mixes the audio description (if any) and other audio, and supplies the mixed sound data to the user's phone.
  • the sound system 1926 can play sound clips, such as sounds in the virtual environment.
  • the sound clips are synchronized with state changes of the objects in the virtual environment.
  • the sound system 1926 starts and stops the sound clips at the state transition start and stop times indicated by the world server 1916.
  • the sound system 1926 can mix sounds of the virtual environment with audio from the phones 1904. Sound mixing is not limited to any particular sound model.
  • the phone system 1920 may receive a list of patches (sets of coefficients) and go through the list.
  • the VE server system 1910 may also include one or more servers that offer additional services.
  • one or more web containers 1918 might be used to implement servlet and JavaServer Pages (JSP) specifications to provide an environment for Java code to run in cooperation with the web servers 1912.
  • All servers in the system 1900 can be run on the same machine, or distributed over different machines. Communication may be performed using remote invocation or an HTTP- or HTTPS-based protocol (e.g., SOAP).
  • a user is allowed to start a session. For example, using a web browser, a user enters a web site, and logs into the system 1900. The provider of the service starts the session.
  • a virtual environment is presented to the user (block 2010). If, for example, the service provider runs a web site, a web browser can download and display a virtual environment to the user.
  • a user can control its representative object to move around a virtual environment to experience the different sights and sounds that the virtual environment provides (block 2020). For instance, a representative object could turn on a jukebox and select songs from a playlist. The jukebox would play the selected songs. Users could also drag and drop songs from a shared or local file folder onto the jukebox to have the songs uploaded and played.
  • a user can also move its representative object around a virtual environment to interact with other users represented in the virtual environment (block 2040).
  • the user's representative object may be moved by clicking on a location in the virtual environment, pressing a key on a keyboard, pressing a key on a telephone, entering text, entering a voice command, etc.
  • the user can then participate in a conversation or otherwise interact with others (block 2040) by becoming voice-enabled via phone (block 2030).
  • Becoming voice-enabled allows the user to speak with others who are voice-enabled.
  • Suppose the user wants to have a teleconference using a phone.
  • the user uses the phone to call the communications system.
  • the user can call the virtual environment that he is in (e.g., by calling a unique phone number, or by calling a general number and entering additional data such as user ID and PIN, via DTMF).
  • With a VoIP phone, a user could call a virtual environment by calling its unique VoIP address.
  • the service provider can join the phone call with the session in progress if it can recognize the user's phone number (block 2032). If the service provider cannot recognize the user's phone number, the user starts a new session via the phone (block 2034), the user identifies himself (e.g., by entering additional data such as a user ID and PIN via DTMF) and then the service provider merges the new phone session with the session already in progress (block 2036). Instead of the user calling the service provider, the user can request the service provider to call the user (block 2038).
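A minimal sketch of blocks 2032-2036 appears below; the session objects, the caller-ID lookup and the DTMF authentication callable are assumptions standing in for real telephony plumbing:

```python
class Session:
    """Minimal stand-in for a user session."""
    def __init__(self, user_id=None):
        self.user_id = user_id
        self.call = None            # attached phone call, if any

def voice_enable(caller_id, sessions_by_phone, sessions_by_user, read_dtmf):
    """Attach an incoming phone call to a session in progress."""
    session = sessions_by_phone.get(caller_id)
    if session is not None:
        session.call = caller_id    # number recognized: join (block 2032)
        return session
    user_id = read_dtmf()           # new phone session; the caller identifies
                                    # himself via user ID and PIN (block 2034)
    session = sessions_by_user.setdefault(user_id, Session(user_id))
    session.call = caller_id        # merge with the web session (block 2036)
    return session
```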
  • Once voice-enabled (block 2030), the user can use a phone to talk to others who are voice-enabled, and remains voice-enabled until the user discontinues the call (e.g., hangs up the phone).
  • the system 1900 allows a user to log into the teleconferencing service and enter into a conversation without accessing the web site (block 2060).
  • a user might only have access to a touch-tone telephone or other phone 1904 that can't display a virtual environment.
  • With only a traditional telephone, the user can call a telephone number and connect to the service provider.
  • the service provider can then add the user's representative object to the virtual environment.
  • Via telephone signals (e.g., DTMF, voice control), the user can move its representative object about the virtual environment, listen to other conversations, meet other people and experience the sounds (but not sights) of the virtual environment.
  • Although the user cannot see its representative object, others viewing the virtual environment can see the user's representative object.
  • More than one virtual environment may be hosted at any given time. If more than one virtual reality environment is available to a user, the user can move into and out of the different virtual environments, and thereby interact with even more people.
  • Each of the virtual environments can be uniquely addressable via an Internet address or a unique phone number.
  • the service provider can then place each user directly into the selected target virtual environment. Users can reserve and enter private virtual environments to hold private conversations. Users can also reserve and enter private areas of public environments to hold private conversations.
  • a web browser or other graphical user interface could include a sidebar, a browser extension or other means for indicating different environments that are available to a user. The sidebar allows a user to move into and out of different virtual environments, and to reserve and enter private areas of a virtual environment.
  • Communication between users is not limited only to conversations via phones. Communication can occur in other ways. Examples include, without limitation, video streams, text chat messages, instant messenger messages, avatar gestures or moves, mood expressions, emoticons, and web pages.
  • the state of a virtual environment may be persistent in that it continues to exist throughout many user sessions and it continues to exist through the actions of different users. This allows a virtual environment to be modified by one user, and the modifications observed by others. For example, graffiti can be written on walls, a light switch in a virtual reality environment could be switched on and off, etc., as a way of signaling to another user.
  • Objects in the virtual environment can be added, removed, moved and modified by a user as a way of signaling to another user. Examples of objects include sound sources (e.g., music boxes, bubbling fish tanks), data objects (e.g., a modifiable book with text and pictures), visualized music objects, etc.
  • Communication between users may be performed by sharing certain objects.
  • the persistent state also allows "things" to be put on top of each other.
  • a file can be dropped onto a user or dropped onto the floor as a way of sharing the file with the user.
  • a music or sound file could be dropped on a jukebox.
  • a picture or video could be dropped on a projector device to trigger playback/display.
  • a multimedia sample (e.g., an audio clip or video clip containing a message) could be "pinned" to a whiteboard.
  • a virtual representation and a teleconference are generated by two different systems 140 and 150.
  • the different clients 160 that display the virtual representation might not communicate directly with each other (in a pure client-server system, they won't).
  • the communication system 110 ensures that the clients 160 display roughly the same object transitions in a virtual representation at roughly the same time.
  • If a user commands a new object state in a virtual representation, his client does not directly inform other clients of the new state, nor does the client immediately transition the object to the new state. Instead, the client sends a request to the server system 150 and awaits instructions from the server system 150.
  • the server system 150 causes all of the clients displaying a virtual representation to gradually transition an object to its new state by a specified time.
  • the server system 150 informs all necessary clients of the change. In this manner, the server system 150 ensures that all client devices 120 show roughly the same object transition in the virtual representation at roughly the same time.
  • the communications system 110 can host multiple virtual representations simultaneously.
  • the communications system 110 can host multiple teleconferences in each virtual representation.
  • Each of the virtual representations can be uniquely addressable via an Internet address or a unique phone number.
  • the server system 150 can then place each user directly into the selected virtual representation. Users can reserve and enter private virtual representations to hold private conversations. Users can also reserve and enter private areas of virtual representations to hold private conversations.
  • a web browser or other graphical user interface could include a sidebar or other means for indicating different virtual representations that are available to a user.
  • a user can make use of both a client device 120 and an audio-only device 130 during a teleconference.
  • the client device 120 is used to interact with the virtual representation and find others to speak with.
  • the audio-only device 130 is used to speak with others.
  • Figure 22 illustrates an example of how the communications system 110 manages the state of an object when a client device requests a new state for that object.
  • the object will be described as an avatar that represents a user, and the new state will be a new location of the avatar.
  • the client receives an input to change the state of the object (block 2210).
  • the new location for an object is received by clicking on the new location in the virtual representation.
  • the client 160 computes coordinates in the virtual representation from the clicked screen coordinates of the new location (block 2215) and sends a state change request to the server system (block 2220).
  • the state change request includes the coordinates of the new location.
  • the state change request may also include a desired time at which the avatar should start moving toward the new location (block 2215). The desired time should be slightly in the future so that an event can be communicated to all clients 160 before the time arrives. Then, the client 160 goes into a wait state (block 2225).
  • the server system 150 validates the request (block 2230). For example, the server system 150 checks whether the virtual representation contains a path that allows the avatar to move to the new location. This may include determining whether the coordinates of the new location lie within a walkable space and whether the avatar is allowed to walk there from its current location at the specified time. If the time has already passed or doesn't allow time to communicate, the starting time is shifted slightly into the future as necessary.
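As a small illustration of the time check in block 2230, assuming a fixed margin for notifying clients (the margin value and names are assumptions):

```python
import time

COMM_MARGIN = 0.25   # assumed seconds needed to notify all clients

def validated_start_time(requested_start, now=None):
    """Shift a requested start time slightly into the future if it has
    already passed or leaves no time to communicate the event."""
    now = time.time() if now is None else now
    return max(requested_start, now + COMM_MARGIN)
```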
  • the server system 150 can also compute a path and arrival time for the representation to transition from the current state to the new state (block 2230). For example, the server system 150 may use a wayfinding algorithm to compute a walking route with waypoints and arrival times for each waypoint. An exemplary wayfinding algorithm is described below.
  • the server system 150 updates a master model, which is a data structure that contains all object states in time (block 2235). For example, the server system 150 adds the avatar's waypoints and their arrival times to the master model.
  • the server system 150 then generates an event, which notifies all clients 160 of the updated object state (block 2240).
  • the event includes the start and stop times for each waypoint in the avatar's walking path. All of those clients 160 displaying the virtual representation will move the avatar to each of the waypoints at the same arrival times. Thus, all of those clients 160 will show roughly the same avatar motion at roughly the same time (roughly due, for instance, to imperfectly synchronized system clocks or system latencies).
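For illustration, a client could interpolate the avatar's position against such timed waypoints as sketched here; the tuple layout is an assumption:

```python
def position_at(waypoints, now):
    """Position along a timed walking path.

    `waypoints` is a list of (arrival_time, x, y) tuples taken from the
    event; clients evaluating this against roughly synchronized clocks
    show roughly the same motion at roughly the same time.
    """
    if now <= waypoints[0][0]:
        return waypoints[0][1:]
    for (t0, x0, y0), (t1, x1, y1) in zip(waypoints, waypoints[1:]):
        if t0 <= now <= t1:
            f = (now - t0) / (t1 - t0)   # fraction of this leg completed
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
    return waypoints[-1][1:]
```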
  • the server system 150 can command the teleconferencing system 140 to play movement sounds at the appropriate time (block 2260).
  • the teleconferencing system 140 plays the sound clip(s) at the designated time(s) (block 2270).
  • the server system 150 can provide a sound clip of the sound of footsteps as an avatar walks to a new location, and the teleconferencing system 140 plays the sound clip to the user whose avatar is walking.
  • the server system 150 also synchronizes the sound clips with the movement and state changes in the virtual representation.
  • the server system 150 can also generate data for controlling audio characteristics over time (block 2280). For example, volume of a conversation between two users is a function of distance and/or orientation of their two avatars in the virtual environment. In this example, sound gets louder as the avatars move closer together, and sound gets softer as the avatars move further apart.
  • the server system 150 generates sound coefficients that vary the volume of sound between two users, as a function of the distance between the two users. The coefficients are used by the teleconferencing system 140 to vary sound volume over time (block 2290). In this manner, the server system 150 commands the teleconferencing system 140 to attenuate or modify sounds so the conversation is consistent with the virtual environment. In this manner, the server system 150 can also command the teleconferencing system 140 to play sound clips, record user speech or modify operational parameters affecting sound quality.
  • an object in a virtual representation has properties that allow a user to perform certain actions on it (e.g., sit on, move, open).
  • An object (e.g., a Flash® object) obeys certain specifications (e.g., an API).
  • an object can be a jukebox having methods (actions) such as play/stop/pause, and properties such as volume, song list, and song selection.
  • the server system 150 would generate an event when the jukebox is turned on and a song selected.
  • the server system 150 would command the teleconferencing system to play the selected clip.
  • a client 160 can optionally compute a transition path at block 2215 and send the transition path to the server system 150. This might be done to ease the workload (at block 2230) on the server system 150, which wouldn't have to compute the transition path.
  • the teleconferencing system 1620 could switch source/drain pairs over to direct connections. This might be done if the world server 1616 determines that two users can essentially only hear each other. The teleconferencing system 1620 could also premix some or all sources for several drains whose coefficients are similar. In the latter case, each user's own source may have to be subtracted from the joined drain to yield his drain.
  • the telephony system 1622 can also allow users of audio-only devices to control objects in a virtual environment, and move from one virtual environment to another.
  • a user with only an audio-only device can experience sounds of the virtual environment as well as speak with others, but cannot experience sights of the virtual environment.
  • the telephony system 1622 can use phone signals (e.g., DTMF, voice commands) from phones to control the actions of their corresponding representation in the virtual environment.
  • buttons on a phone can correspond to commands.
  • a user with a touch phone or DTMF-enabled VoIP phone can execute a command by entering that command using DTMF tones.
  • Each command can be supplied with one or more arguments.
  • An argument could be a phone number or other number sequence.
  • voice commands could be interpreted and used.
  • a command argument might expect a value from a list of options.
  • the options may be structured in a tree so that the user selects a first group with one digit and is then presented the resulting subsets of remaining options and so on. The most probable options could be listed first.
  • a user could press '0' to enter a command menu where all available commands are read to the user.
  • the user can then enter a CALL command (e.g., 2255) followed by the # sign.
  • the user may then be asked to identify the person to call, e.g., by saying that person's name, entering that person's phone number, entering a code corresponding to that person, etc.
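A sketch of such a tree-structured voice menu is given below; the menu contents, prompts and telephony callables are assumptions:

```python
# Illustrative option tree; most probable options listed first.
MENU = {
    "1": ("call a user", {"1": "by name", "2": "by phone number"}),
    "2": ("move avatar", {"1": "forward", "2": "to a new location"}),
}

def run_menu(menu, get_digit, speak):
    """Walk the option tree one DTMF digit at a time.

    `get_digit` returns the caller's next key press; `speak` reads a
    prompt to the caller. Both stand in for real telephony plumbing.
    """
    node = menu
    while isinstance(node, dict):
        for digit, option in node.items():
            label = option[0] if isinstance(option, tuple) else option
            speak(f"Press {digit} to {label}")
        option = node.get(get_digit())
        if option is None:
            speak("Unknown option")
            continue
        node = option[1] if isinstance(option, tuple) else option
    return node   # the selected leaf option
```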
  • Another command could cause an avatar to move within its virtual environment. Arguments of that command could specify direction, distance, new location, etc. Another command could allow a user to switch to another virtual environment, and an argument of that command could specify the virtual environment. Another command could allow a user to join a teleconference. Another command could allow a user to request information about the environment or about other users. Another command could allow one user's avatar to take another user's avatar by the hand, whereby the latter avatar would follow (be piggybacked to) the former avatar.
  • For devices that are enabled to run Telnet sessions, a user could establish a Telnet session to receive information, questions and options, and also to enter commands.
  • Certain client devices could include Braille terminals. Braille devices can be used like text terminals.
  • the server system 1610 could include means 1617 for providing an alternative description of virtual environment.
  • the means 1617 could provide a written description of a virtual environment.
  • the system could include a speech synthesis system for providing a spoken description, which is heard on the audio- only device.
  • a virtual environment can be described to a user from the perspective of the user's avatar. Objects that are closer to the user's avatar might be described in greater detail.
  • the description may include or leave out detail to keep the overall length of the description approximately constant. The user can request more detailed descriptions of certain objects, upon which additional details are revealed.
  • the communications system 1610 provides teleconferencing without requiring the user to install software or to acquire special equipment (e.g., microphone, PC speakers, headset). If the system 1610 is web-based, a web browser can be used to connect to the VE server system 1610, download and run a client, and display the virtual environment. This makes it easy to connect and use the communications system 1600.
  • Figures 25 and 26 illustrate a method of computing waypoints for a moving object. This method may be performed by the world server 1616 or by the client 160.
  • the space is represented by a polygonal boundary 2510 and two polygonal obstacles 2520.
  • the boundary 2510 has vertices A, B, C, D, E and F.
  • One obstacle 2520 has vertices G, H, I, J and K, and the other obstacle 2520 has vertices L, M and N.
  • the boundary 2510 might delineate the bounds of a virtual representation, whereas the obstacles 2520 might represent movable objects and fixtures in the virtual representation.
  • the goal of the method is to find a path from a current location S to a new location T that is not obstructed by either the boundary 2510 or the obstacles 2520.
  • a line segment is obstructed by the boundary 2510 if any portion of the line segment lies outside the boundary 2510.
  • a line segment is obstructed by an obstacle 2520 if any portion of the segment lies inside the corresponding open polygon 2520.
  • the path may be formed by one or more line segments (i.e., a piecewise linear path).
  • Internal vertices (i.e. excluding S and T) of the path are vertices of the boundary 2510 and obstacle(s) 2520.
  • Vertices such as K are excluded as internal vertices, since a path formed by line segment GJ is shorter than a path formed by line segments GK and KJ.
  • Vertices such as A, B, D, E and F are also excluded as internal vertices, since shorter paths can be formed with other vertices.
  • a path could even follow a boundary (e.g., a line segment along vertices H and I).
  • the world server computes a visibility graph (block 2610), for example, using a planar-sweep algorithm.
  • the visibility graph includes vertices of the boundary 2510 and vertices of each obstacle 2520. Between each pair of vertices, the visibility graph also includes an edge, but only if the line segment between them is not obstructed by the boundary 2510 or by any obstacle 2520.
  • the visibility graph is updated whenever an obstacle 2520 moves or a new obstacle appears (block 2620). For instance, the visibility graph is updated if a new avatar enters a virtual representation, or if an object (e.g., a chair) in a virtual environment is moved.
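The description leaves the search over the visibility graph open; one common choice is Dijkstra's algorithm, sketched here under the assumption that S and T have already been inserted into the graph with edges to every vertex they can see:

```python
import heapq
import math

def shortest_path(graph, start, goal):
    """Shortest piecewise-linear path over a visibility graph.

    `graph` maps a vertex (x, y) to an iterable of visible neighbours;
    edge weights are Euclidean distances. Assumes goal is reachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue                      # stale heap entry
        for v in graph[u]:
            nd = d + math.dist(u, v)      # Euclidean edge length
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [goal]
    while path[-1] != start:              # walk predecessors back to S
        path.append(prev[path[-1]])
    return path[::-1]                     # waypoints from S to T
```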
  • the shortest path determination can also include collision avoidance.
  • One approach toward avoiding collisions between two moving objects is to transport each object instantly to its new location.
  • collision avoidance is optional, as objects could be allowed to collide (e.g., pass through each other).
  • a system according to the present invention is not limited to a single virtual representation. Rather, a system according to the present invention can host a plurality of independent virtual representations, assign different users to different representations, allow one or more teleconferences per virtual representation, and manage the state of (e.g., regulate motion of) the objects in each virtual representation.
  • a system could provide a first virtual environment including a club scene, and a second virtual environment including a beach scene. Some users could be assigned to the first virtual environment, experience sights and sounds of the club scene, and have teleconferences with those users represented by avatars in a virtual club. Other users could be assigned to the second virtual environment, experience sights and sounds of the beach scene, and have teleconferences with those users represented by avatars on a virtual beach.
  • the server system would manage objects in both environments.
  • the server system can filter communications with the clients, sending communications only to those clients needing to change the state of an object in a particular virtual representation.
  • the world server 1616 may perform either or both of these filtering functions.
  • the filtering reduces communication overhead. As a result, traffic between the world server and clients is reduced.
  • a communications system according to the present invention is not limited to any particular topology.
  • Several exemplary topologies are illustrated in Figures 27a-27d. These topologies offer different ways in which clients can communicate with a server system. These figures 27a-27d do not illustrate topologies in which audio-only devices communicate with a teleconferencing system.
  • FIG. 27a illustrates a pure client-server topology.
  • Clients are represented by circles, and a server system is illustrated by a triangle.
  • the server system may include one or more servers.
  • FIG. 27b illustrates a topology including clients (represented by circles), a server system (represented by a big triangle), and super nodes (represented by small triangles).
  • Each super node functions as both a client and server.
  • Functioning as a server, a super node serves data to all connected clients.
  • Functioning as a client, a super node can display and interact with a virtual representation.
  • the server system and super nodes coordinate to keep track of the objects in a virtual representation.
  • the super nodes may be operated by users or by the communications service provider.
  • FIG. 27c illustrates a topology including peers (represented by hexagons), and a server system (represented by a triangle).
  • Each peer connects to the server system, as indicated by a dashed line, to display and interact with a virtual representation.
  • the peers are also interconnected, as indicated by the solid lines, and as such, can bypass the server system and pass certain data among themselves.
  • Examples of such data include, but are not limited to, audio files (e.g., sound clips), static files (e.g., background images, user pictures), and live data (e.g., webcam).
  • One of the peers can originate such data and/or receive such data from the server system, and pass the data to its peers.
  • peers can also exchange data concerning a virtual environment as well as object commands.
  • a transition path and times could be computed by a peer that commanded an object state change, and the path and times could be distributed to its peers.
  • Such data could also be sent to the server system, so the server system can keep track of the objects in the virtual representation.
  • FIG. 27d illustrates a topology including a server system (represented by a triangle), clients (represented by circles), super-nodes (represented by squares), and peers (represented by hexagons).
  • Peers exchange data among each other, and clients connect to one or more super-nodes (both illustrated with solid lines). If such a connection fails, a client can connect to a fallback peer (illustrated by dotted lines) which will then become a super-node.
  • the clients and the peers may also connect or exchange data with the server system, as illustrated by the dashed lines.
  • Figure 27d offers advantages of peer-to-peer communication, including reducing computational load and traffic on the server while still allowing simple clients to participate.
  • a peer or super node might require installation.
  • Figure 28 illustrates an exemplary communications system 2800 including a server system 2810 and teleconferencing system 2820 that communicate with peers 2850.
  • the server system 2810 provides a virtual representation to each peer 2850, and ensures that each peer 2850 displays roughly the same object transitions at roughly the same time.
  • the peers 2850 use peer-to-peer communication to exchange data among each other.
  • Each peer 2850 includes a graphical user interface 2852, a sound mixer 2854, and audio input/output hardware 2856 (e.g., a microphone and speakers).
  • Each peer 2850 can generate an audio stream with the audio I/O hardware 2856 and distribute that audio stream to one or more other peers 2850.
  • the sound mixer 2854 of a peer 2850 weights the audio streams from other peers and audio streams from other audio sources. The weighted streams are combined in the sound mixer 2854, and the combined stream is outputted on the audio I/O hardware 2856. Sound coefficients for weighting the audio streams could be computed by a peer's graphical user interface 2852 or by the server system 2810. Peers 2850 could also send combined audio streams to other peers to preserve bandwidth.
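A minimal sketch of the weighting step follows; plain Python lists stand in for real audio buffers, and the coefficient lookup is an assumption:

```python
def mix(streams, coefficients, own_id=None):
    """Weighted mix of equal-length sample blocks from peers.

    `streams` maps peer id -> list of float samples; `coefficients`
    maps peer id -> weight (computed by the GUI 2852 or supplied by
    the server system 2810). A peer's own source is excluded so a
    user does not hear his own voice fed back.
    """
    length = len(next(iter(streams.values())))
    out = [0.0] * length
    for peer_id, samples in streams.items():
        if peer_id == own_id:
            continue
        c = coefficients.get(peer_id, 0.0)
        if c == 0.0:
            continue                      # inaudible source: skip the work
        for i, s in enumerate(samples):
            out[i] += c * s
    return out
```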
  • Peer communication can also be used to exchange data such as files and events instead of, or in addition to, loading it from the server system 2810.
  • a peer-to-peer file sharing protocol such as BitTorrent can be used to transport static files. This reduces traffic on the server system's media servers because, optimally, each file is downloaded only once from the server system 2810.
  • Each user's media (e.g., representation/avatar graphics, profile pictures, files) could be seeded from the user's peer 2850 and distributed among the peers 2850.
  • State change commands, text messages and webcam data or pictures could also be loaded from the server system 2810 only once and distributed in a peer-to-peer fashion to reduce traffic on the server system 2810.
  • the system of Figure 1 can be implemented as a client-server system.
  • the service provider includes one or more servers, and the different user devices are client devices.
  • Certain types of client devices (e.g., computers) can connect to the servers via a network such as the Internet.
  • Other types of client devices can connect via different networks. For instance, a traditional telephone can connect via PSTN lines, VoIP phones can connect through the Internet, etc.
  • Teleconferencing can be performed conveniently. Entering into a teleconference can be as simple as going to a web site, and clicking a mouse button (maybe a few times).
  • Special hardware (e.g., web cameras, soundcards, and microphones) is not required; voice communication can be provided by a telephone.
  • Communication is intuitive and, therefore, easy to learn.
  • Audio-visual dynamic multi-group communication is enabled. A user can move from one group to another and thereby change whom they are communicating with.
  • a system according to the present invention allows for a convergence and integration of different communication technologies. Teleconferences can be held by users having traditional phones, VoIP phones, devices with GUI interfaces and Internet connectivity, etc.


Abstract

A virtual reality environment is applied to teleconferencing such that the environment is used to enter into a teleconference. According to a first aspect, the invention provides a method of controlling volume of sound data during a teleconference, the method comprising providing a virtual representation including objects that represent users in the teleconference; and controlling the volume of the sound data according to how the users change location and relative orientation of their objects in the virtual representation. It is also preferred according to a second aspect that a user is represented by an avatar in the virtual reality environment, and wherein the user can control its avatar to move around the virtual reality environment. The method may allow a user to meet another through intuitive actions of the user's avatar. The method may further comprise accepting control inputs from the user to control gestures of the user's avatar.

Description

VIRTUAL REALITY-BASED TELECONFERENCING
FIELD OF THE INVENTION
The invention relates to a virtual reality environment being applied to teleconferencing such that the environment is used to enter into a teleconference.
SUMMARY OF THE INVENTION
According to a first aspect, the invention provides a method of controlling volume of sound data during a teleconference, the method comprising providing a virtual representation including objects that represent users in the teleconference; and controlling the volume of the sound data according to how the users change location and relative orientation of their objects in the virtual representation.
The method preferably comprises the step of changing other audio characteristics of the sound data according to how the users interact with the virtual representation.
It is preferred that objects in the virtual representation also have audio ranges, whereby the volume of the sound data is also controlled according to the audio ranges. The audio ranges are preferably adjustable.
According to the method of the invention, the virtual representation is preferably a virtual environment; and wherein the users are represented by avatars. In such preferred embodiment, volume of sound data between two users is a function of relative orientation of their avatars.
The virtual representation is preferably provided by a server system that computes a sound coefficient for each object that is a sound source with respect to a drain; and wherein for each user, controlling the volume includes applying those sound coefficients to the sound data of their corresponding objects, mixing the modified sound data and supplying the mixed sound data to the drain. The sound data is for example mixed according to
V_drain(t) = vol_d · Σ_{n=1}^{N} c_n · V_{s_n}(t)
According to an alternative embodiment of the first aspect of the invention there is provided a method comprising: providing a virtual representation; establishing phone connections with a plurality of users, the users represented by objects in the virtual representation, each user representative object being both sound drain and sound source; and for each drain, mixing sound data from different sound sources and providing the mixed data to the user associated with the drain, where volume of sound data from a source is adjusted according to a topology metric of the source with respect to the drain; whereby the users are not directly connected, but instead communicate through a synthesized auditory environment.
The mixing the sound data for each drain preferably includes computing audio parameters for each paired source, each audio parameter controlling sound volume as a function of closeness of its corresponding source to the drain; and adjusting sound data of each paired source with the corresponding audio parameter, mixing the adjusted sound data of the paired sources, and providing the mixed sound data to the user associated with the drain.
The virtual representation preferably includes other objects that are sound sources, where volume of sound data from a source is adjusted according to a topology metric of the source with respect to the drain; and wherein adjusted sound data from the other objects is also mixed and supplied to the drain. The objects preferably include audio ranges.
The topology metric is, for example, virtual distance between a source and a drain, and may include distance and orientation. Audio is preferably clustered to reduce computational burden. As in the first embodiment, sound is mixed according to V_drain(t) = vol_d · Σ_{n=1}^{N} c_n · V_{s_n}(t).
In order to reduce the computation burden of mixing the sound data for each drain, the sound data is mixed preferably only for those sound sources making a significant contribution. The audio ranges of certain objects are preferably automatically set at or near zero, whereby the sound data of those certain objects are excluded from the mixing. It is also preferred that a minimum distance between objects is imposed to reduce the computation burden of mixing the sound data. Preferably some sound data is premixed to reduce the computation burden of mixing the sound data; wherein the premixing includes mixing sound data from a group of sound drains and assigning a single coefficient per drain to the group. It is also encompassed by the invention that direct connections are made between a source and a drain to reduce the computation burden of mixing the sound data.
The first aspect of the invention also encompasses a communications system comprising phone-based teleconferencing means; and means for providing a virtual representation including objects that represent participants in a teleconference, the virtual representation allowing participants to use the phone-based teleconferencing means to enter into teleconferences and to control volume during the teleconferences, the volume controlled according to how the users change location and relative orientation of their objects in the virtual representation. An alternative communications system according to the first aspect comprises a server system for providing a virtual representation; and a teleconferencing system for establishing phone connections with a plurality of users, the users represented by objects in the virtual representation, the teleconferencing system controlling volume during a teleconference according to how the users change location and relative orientation of their representative objects in the virtual representation. It is preferred that each user representative object is both sound drain and sound source; and wherein for each drain, sound data from different sound sources is mixed and the mixed data is provided to the user associated with the drain, where volume of sound data from a source is adjusted according to a topology metric of the source with respect to the drain.
According to a second aspect, the invention provides a method comprising applying a virtual reality environment to teleconferencing such that the environment is used to enter into a teleconference. The environment preferably allows a user to enter without knowing any other in the environment, yet enable the user to meet and hold a teleconference with at least one other.
The step of applying the virtual reality environment preferably includes presenting the virtual reality environment to a user, presenting representations of the user and others in the virtual reality environment, and enabling the user's representation to experience the virtual reality environment, meet the others, and enter into teleconferences. It is also preferred that the virtual reality environment enables a user to teleconference via a phone or via a VoIP device.
The step of applying the environment may also include starting a session with a user, presenting a virtual reality environment to the user, recognizing a phone call from the user, and adding the phone call to the session. Alternatively, applying the environment may include starting a first session with a user, presenting a virtual reality environment to the user, starting a second session in response to a phone call, and merging the first and second sessions if the phone call is made by the user.
The method according to the second aspect of the invention preferably further comprises calling the user at the user's request so the user can be voice-enabled in the virtual reality environment.
It is also preferred that when a user calls another not represented in the virtual reality environment, a representation of said another is added to the virtual reality environment.
The virtual reality environment preferably enables a user with only a device that cannot display the virtual reality environment to enter into teleconferences and experience sounds but not sights of the virtual reality environment. According to a preferred embodiment, more than one virtual reality environment can be applied to the teleconferencing. A user can then move into and out of different virtual reality environments. The virtual reality environments are preferably linked, and each of the virtual reality environments is preferably uniquely addressable.
It is also preferred that at least some portion of a virtual reality environment can be private. The virtual reality environment may also have a persistent state, and may overlap a real space.
A user preferably establishes a connection with a location in the virtual reality environment.
It is preferred that a user has an audio range in the virtual reality environment. The audio range is dynamically adjustable.
According to a preferred embodiment, audio between users is attenuated as a function of closeness between the users.
It is also preferred according to the second aspect that a user is represented by an avatar in the virtual reality environment, and wherein the user can control its avatar to move around the virtual reality environment. The method may allow a user to meet another through intuitive actions of the user's avatar. The method may further comprise accepting control inputs from the user to control gestures of the user's avatar.
The volume of sound between the user and another preferably can be a function of relative orientation of their representations in the virtual reality environment.
A user establishes a connection with a location in the virtual reality environment, and a connection is also established with a multimedia source. The user and others preferably view the same multimedia by each viewing a window that displays the multimedia and, at the same time, discussing the displayed multimedia via the teleconferencing. The user and another may share the multimedia view by co-browsing. Alternatively, a user shares a multimedia source with another by drag-and-dropping a multimedia representation proximate the other's representation.
The method of the second aspect may further comprise mixing Internet content with phone links, whereby a user can access content on the Internet via a phone interface.
Additional virtual reality environments are preferably available to a user, wherein the user is instead assigned to one of the additional environments based on a characteristic of the user. A user may have multiple profiles, each profile representing a different aspect of the user, wherein the user can switch between multiple profiles. The user may have a profile that can be made public. However, the user has an option of remaining anonymous.
The method may further comprise providing service agents in the virtual reality environment.
According to the second aspect, the invention also provides an apparatus for applying a virtual reality environment to teleconferencing to enable a user to enter the virtual reality environment without knowing any other in the virtual reality environment, yet enable the user to meet and hold a teleconference with others in the virtual reality environment.
Furthermore, the second aspect provides a system comprising: means for teleconferencing; and means for coupling an immersive virtual reality environment with the teleconferencing. The system is preferably web-based.
According to another embodiment, the invention provides a teleconferencing method, comprising entering a virtual reality environment provided by a service provider; navigating an avatar around the virtual reality environment; establishing a phone call with the service provider to become voice-enabled; and talking to voice-enabled others who are represented in the virtual reality environment.
According to a third aspect, the invention provides a communications system comprising a server system for providing a virtual representation including at least one object; and a teleconferencing system for establishing audio communications with an audio-only device; an object in the virtual representation controlled in response to signals from the audio-only device.
It is preferred that at least one of the objects is movable and represents a user of an audio-only device. More preferably, an object representing a user of an audio-only device is an avatar; and wherein signals from the audio-only device cause the avatar to move about the virtual representation. It is preferred that signals from the audio-only device cause the object to move about the virtual representation; and wherein the teleconferencing system allows a user of the audio-only device to speak with other users represented in the virtual representation, but not see the virtual representation.
The server system preferably provides additional virtual representations, and a signal from the audio-only device causes an object representing the user of an audio-only device to go to a different virtual representation. It is preferred that an object representing a user of an audio-only device can be assigned to the virtual representation by dialing directly to that virtual representation.
The virtual representation preferably is a virtual environment, and signals from the audio-only device allow a user to interact with the virtual environment. The audio-only device may be a phone, and the signals may be phone signals. The signals may be dial tone (DTMF) signals, or voice commands.
The communications system preferably further comprises means for providing an audio description of the virtual representation to the audio-only device. The objects that are closer to a user's representative object in the virtual representation are preferably described in greater detail. The virtual representation may also be described from a first person perspective.
It is preferred that a first object in the virtual representation represents an Internet resource; and wherein a user of an audio-only device can access the Internet by controlling the state of the first object.
The teleconferencing system may include a VoIP system for establishing VoIP connections with network-connected devices.
According to a preferred embodiment of the communications system, the user of the audio-only device is represented in the virtual representation for others to see; and the user's representative object indicates audio-only capability.
The third aspect of the invention also provides a system comprising means for providing a virtual representation including objects; means for receiving signals from audio-only devices; and means for controlling states of the objects in response to the signals.
The third aspect of the invention also provides a communications system for providing a virtual environment including a plurality of objects, the objects having changeable states; and for establishing audio communications with audio-only devices; the system controlling the states of the objects in the virtual representation in response to signals from the audio-only devices, such that users of the audio devices can interact with the virtual environment.
Furthermore, the third aspect of the invention also provides a method of controlling objects in a virtual environment comprising receiving signals from audio-only devices; and controlling states of the objects in response to the signals. The method preferably further comprises providing an audio description of the virtual environment to the audio-only device. According to a fourth aspect, the invention provides a method of providing a service comprising: providing a network-accessible virtual environment including objects that represent users of the service; allowing the users to control their representative objects in the virtual environment to personally interact with other users represented in the virtual environment and also to become voice-enabled; and enabling those users who are voice-enabled to speak with other voice-enabled users via phones.
The users preferably control their representative objects via client devices; and allowing the users to control their objects includes receiving commands from the client devices and moving the representative objects in response to the commands. The users may also be allowed to control their representative objects via the Internet; and wherein those users who are voice-enabled are enabled to speak with each other via a public switched telephone network.
The fourth aspect of the invention also encompasses interacting with the virtual environment to control audio characteristics in the virtual environment. Objects in the virtual environment have audio ranges, whereby the volume of the sound data is also controlled according to the audio ranges. The users interact as a function of how close together they are. The closeness between two users is measured as a distance between web pages concurrently viewed by those two users. Alternatively, closeness between two users is measured as a distance between two coordinates on a web page that is concurrently viewed by those two users.
It is also preferred according to the fourth aspect of the invention that the users are represented by avatars; and wherein volume of sound data between two users is a function of relative orientation of their avatars.
The method preferably allows certain users to personally interact with other users in the virtual environment without seeing the virtual environment. The method also preferably comprises calling a user at the user's request so the user can be voice-enabled in the virtual environment. When a user calls another not represented in the virtual environment, a representative object of said another is preferably added to the virtual environment.
Preferably, multiple virtual environments are provided; and a user can move into and out of different virtual environments.
Communication may also be performed via shared, modifiable objects. The users may also be allowed to communicate through intuitive actions of their avatars.
According to a preferred embodiment, users share a multimedia connection by each viewing a window that displays the multimedia connection and, at the same time, discussing the displayed multimedia via phones. The users may share multimedia by co-browsing. A user may also share a multimedia source with another user by drag-and-dropping a multimedia representation proximate the other user's representative object.
Additional virtual environments may be available to a user, wherein the user is instead assigned to one of the additional virtual environments based on a characteristic of the user. A user may have multiple profiles, each profile representing a different aspect of the user, wherein the user can switch between multiple profiles.
The fourth aspect of the invention also provides a system comprising means for providing a network-accessible virtual environment including objects that represent system users; means for allowing the users to control their representative objects in the virtual environment to personally interact with other users represented in the virtual environment and also to become voice-enabled; and means for enabling those users who are voice-enabled to speak with other voice-enabled users via phones.
The fourth aspect of the invention also provides a system comprising a server system for providing a virtual environment including objects that represent users of the system, the server system allowing the users to control their representative objects in the virtual environment to interact with other users represented in the virtual environment; and a phone system for enabling those users who are voice-enabled to speak with other voice-enabled users via phones. The server system is preferably web-based; the server system receives commands from client devices to control objects in the virtual environment; and the phone system enables at least some users to speak via a public switched telephone network.
According to a fifth aspect, the invention provides a communications system comprising a teleconferencing system for hosting teleconferences; and a server system for providing a virtual representation for the teleconferencing system, the virtual representation including objects whose states can be commanded to transition gradually, the server system providing clients to client devices, each client causing its client device to display the virtual representation; each client device capable of generating a command for gradually transitioning an object to a new state in the virtual representation and sending the command to the server system; the server system commanding the clients to transition an object to its new state by a specified time.
The server system preferably causes the teleconferencing system to control audio characteristics in a manner that is consistent with the virtual representation. The teleconferencing system may include a phone system. According to a preferred embodiment of the communications system, when a client device commands an object to transition gradually to a new state, the server system receives the command and generates an event that commands all of the clients to transition the object to the new state by a specified time. The server system also preferably keeps track of objects that transition abruptly; and wherein when a client device commands an object to transition abruptly to a new state, the server system receives the command and generates an event that commands all of the clients to show the object at the new state at a specified time.
Preferably, at least some of the objects are movable and represent users.
In this aspect of the invention it is preferred that the virtual representation is an immersive virtual environment.
The server system preferably manages a master model of object states in time so as to regulate state transitions of the objects in the virtual representation. More preferably, the server system, in response to a command, determines a first time at which an object should start transitioning from a current state and a second time at which the object should reach the new state; and wherein the server system sends start and stop times and the new state to the clients. The server system also computes a movement path including waypoints and arrival times at the waypoints, and sends the movement path to the clients. A client may also compute a transition path and send the transition path to the server system.
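By way of illustration only, the following Python sketch shows how a server system might compute such a movement path of waypoints and arrival times for a gradually transitioning object. The function name, speed, and step granularity are assumptions for illustration, not a definitive implementation of the claimed system.

```python
import math
import time

def compute_movement_path(start, end, speed=2.0, step=1.0):
    """Compute waypoints and arrival times for an object moving from
    `start` to `end` (2-D coordinates) at a constant `speed` (units/s).
    Returns a list of (x, y, arrival_time) tuples that the server can
    send to every client so all clients show the transition in sync."""
    t0 = time.time()
    dx, dy = end[0] - start[0], end[1] - start[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return [(end[0], end[1], t0)]
    n_steps = max(1, int(dist / step))
    path = []
    for i in range(1, n_steps + 1):
        frac = i / n_steps
        path.append((start[0] + frac * dx,
                     start[1] + frac * dy,
                     t0 + frac * dist / speed))
    return path
```

Because every waypoint carries an absolute arrival time, clients that receive the path late can still interpolate the object to roughly the same state at roughly the same time.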
The server system is preferably web-based. It is also preferred that the clients are run on virtual machines. The clients may be Flash clients.
The teleconferencing system may include a VoIP system for establishing VoIP connections with network-connected devices.
The communications system preferably further comprises a sound system for generating sounds for objects in the virtual representation, wherein the server system also synchronizes the objects in the virtual representation with the sounds; and wherein the sound system mixes the synchronized sounds with audio from a teleconference.
In the communications system of the invention, the server system preferably includes a world server for generating data for varying audio characteristics in time of audio between users during a teleconference.
According to the fifth aspect it is also preferred that the virtual representation is a virtual environment, and wherein the communications system further comprises means for allowing audio-only devices to control objects in the virtual environment. The means responds to phone signals to control the objects in the virtual environment. The system may further comprise means for providing an audio description of the virtual representation to audio-only devices.
It is preferred that the teleconferencing system hosts multiple teleconferences among different groups of users; wherein the server system provides additional independent virtual representations and regulates state transitions of the objects in each virtual representation; and wherein the server system filters communications with the clients, sending communications only to those clients needing to transition an object in a particular virtual representation.
The fifth aspect of the invention also provides a communications system for a plurality of client devices, comprising first means for hosting teleconferences; and second means for providing virtual representations that enable the teleconferences, each virtual representation including objects whose states transition gradually, the second means providing clients to at least some of the client devices, each client causing its client device to display a virtual representation; each client device capable of generating a command for gradually transitioning an object to a new state in a virtual representation and sending the command to the second means; the second means commanding the clients to transition an object to roughly the same state at roughly the same time, the second means causing the first means to control audio characteristics of the teleconferences to be consistent with the virtual representations.
The fifth aspect of the invention also provides a method of providing a communications service, the method comprising: hosting a teleconference; providing clients to a plurality of client devices, each client causing its client device to display a virtual representation of the teleconference, the virtual representation including objects whose states transition gradually; waiting for object state transition commands from a client, each object state transition command for gradually transitioning an object to a new state in the virtual representation; and generating an event in response to a command, the event causing each of the clients to transition an object to roughly the same state at roughly the same time.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is an illustration of a system in accordance with an embodiment of the present invention.
Fig. 2 is an illustration of a system in accordance with an embodiment of the present invention.
Fig. 3 is an illustration of a method in accordance with an embodiment of the present invention.
Fig. 4 is an illustration of a virtual reality environment in accordance with an embodiment of the present invention.
Fig. 5 is an illustration of a state diagram of a virtual reality environment.
Figs 6-7 are illustrations of a method of supplying audio to a user in accordance with an embodiment of the present invention.
Fig. 8 is an illustration of two avatars facing each other.
Figs 9-10 are illustrations of a method in accordance with an embodiment of the present invention.
Fig. 11 is an illustration of methods of reducing the computational burden of sound mixing in accordance with embodiments of the present invention.
Figs 12a-12c are illustrations of sound mixing in accordance with embodiments of the present invention.
Fig. 13 is an illustration of a method in accordance with an embodiment of the present invention.
Fig. 14 is an illustration of services provided by a service provider in accordance with an embodiment of the present invention.
Fig. 15 is an illustration of a method in accordance with an embodiment of the present invention.
Fig. 16 is an illustration of a system in accordance with an embodiment of the present invention.
Fig. 17 is an illustration of a method in accordance with an embodiment of the present invention.
Fig. 18 is an illustration of a method of mixing sound in accordance with an embodiment of the present invention.
Fig. 19 is an illustration of a system in accordance with an embodiment of the present invention.
Fig. 20 is an illustration of a method in accordance with an embodiment of the present invention.
Fig. 21 is an illustration of a method in accordance with an embodiment of the present invention.
Fig. 22 is an illustration of a method in accordance with an embodiment of the present invention.
Fig. 23 is an illustration of a timeline in accordance with an embodiment of the present invention.
Fig. 24 is an illustration of a method of mixing sound in accordance with an embodiment of the present invention.
Figs 25-26 are illustrations of a method of computing waypoints for a moving object in accordance with an embodiment of the present invention.
Figs 27a-27d are illustrations of different topologies for a communications system in accordance with the present invention.
Fig. 28 is an illustration of a system in accordance with an embodiment of the present invention.
Fig. 29 is an illustration of a portion of a system in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION
Reference is made to Figure 1, which illustrates a teleconferencing system 100 that includes a provider 110 of a teleconferencing service. The service provider 110 applies a virtual reality environment to teleconferencing such that the environment is used to enter into a teleconference. In some embodiments, the environment enables a user to enter the environment without knowing any others in the environment, yet enables the user to meet and hold a teleconference with others in the environment. The term "user" refers to an entity that utilizes the teleconferencing service. The entity could be an individual, a group of people who are collectively represented as a single unit (e.g., a family, a corporation), etc. The term "another" (when used alone) refers to another user. The term "others" refers to other users.
A user can connect to the service provider 110 with a user device 120 that has a graphical user interface. Such user devices 120 include, without limitation, computers, tablet PCs, VoIP phones, gaming consoles, televisions with set-top boxes, certain cell phones, and personal digital assistants. For instance, a computer can connect to the service provider 110 via the Internet or other network, and its user can enter into the virtual reality environment and take part in a teleconference.
A user can connect to the service provider 110 with a user device 130 that does not have a graphical user interface. Such user devices 130 include, without limitation, traditional telephones (e.g., touch tone phones, rotary phones), cell phones, VoIP phones, and other devices that have a telephone interface but no graphical user interface. For instance, a traditional phone can connect to the service provider 110 via a PSTN network, and its user can enter into the virtual reality environment and take part in a teleconference.
A user can utilize both devices 120 and 130 during a single teleconference. For instance, a user might use a device 120 such as a computer to enter and navigate the virtual reality environment, and a touch tone telephone 130 to take part in a teleconference.
Reference is made to Figure 21, which illustrates a method of providing a service that allows for personal interaction between users. The method includes providing a network-accessible virtual environment including objects that represent users of the service (block 2110), allowing the users to control their representative objects in the virtual environment to personally interact with other users represented in the virtual environment and also to become voice-enabled (block 2120), and enabling those users who are voice-enabled to speak with other voice-enabled users via phones (block 2130).
Phones are not limited to any particular type. Examples of phones include PSTN phones (e.g., touch-tone phones) and VoIP phones including soft phones.
When a user becomes voice-enabled, that user can speak with other voice-enabled users who are represented in the virtual environment. As a first example of becoming voice-enabled, a user of a traditional phone can become voice-enabled by placing a call to the service provider. As a second example, a user can become voice-enabled by receiving a call from the service provider. According to Figure 21, the user interacts with the virtual environment to control audio characteristics in the virtual environment (block 2140). For example, the volume of sound data can be controlled. In some embodiments, volume of sound between one user and another is a function of distance between and relative orientation of their representative objects. In some embodiments, the representative objects also have audio ranges.
Audio characteristics other than volume may also be controlled according to how users interact with the virtual environment. For example, filters can be applied to sound data to add reverb, distort sounds, etc. An object's audio characteristics might be changed by applying filters (e.g. reverb, room acoustics) to the object's sound data. Examples of changing audio characteristics include the following. As an avatar walks from a carpeted room into a stone hall, a parameter of a reverb filter is adjusted to add more reverb to the user's voice and avatar's footsteps. As an avatar walks into a metallic chamber, a parameter of an effect filter is adjusted so the user's voice and avatar's footsteps are distorted to sound metallic. When an avatar speaks into a virtual microphone or virtual telephone, a filter (e.g. band pass filter) is applied to the avatar's sound data so the user's voice sounds as if it is coming from a loudspeaker system or telephone.
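As a rough illustration of such filtering, the Python sketch below applies a telephone-style band-pass and a crude feedback-delay reverb to a block of samples. The sample rate, cut-off frequencies, and gain values are assumptions for illustration; a production system would use more sophisticated DSP.

```python
import numpy as np
from scipy.signal import butter, lfilter

FS = 8000  # sample rate in Hz; an assumption for this sketch

def telephone_filter(samples):
    """Band-pass 300-3400 Hz so a voice sounds as if it comes
    through a telephone or loudspeaker system."""
    b, a = butter(4, [300 / (FS / 2), 3400 / (FS / 2)], btype="band")
    return lfilter(b, a, samples)

def simple_reverb(samples, delay_s=0.05, gain=0.4):
    """Crude feedback-delay reverb: a larger `gain` suggests a more
    reverberant room (a stone hall rather than a carpeted room)."""
    d = int(delay_s * FS)
    out = np.asarray(samples, dtype=float).copy()
    for i in range(d, len(out)):
        out[i] += gain * out[i - d]
    return out
```

In this scheme, walking from one room type to another would simply swap or re-parameterize the filter applied to the avatar's sound data.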
Reference is made to Figure 15, which illustrates a method of controlling volume of sound data during a teleconference. The method includes providing a virtual representation including objects (e.g., avatars) that represent participants (i.e., users) in the teleconference (block 1510), and controlling the volume of the sound data according to how the users change locations and relative orientation of their objects in the virtual representation (block 1520).
In some embodiments, the users' objects have audio ranges. An audio range limits the distance that sound can be received and/or broadcasted. The audio ranges facilitate multiple teleconferences in a single virtual representation.
Audio characteristics other than volume may also be controlled according to how users interact with the virtual representation (block 1530). For example, filters can be applied to sound data to add reverb, distort sounds, etc. Examples are provided below.
A virtual representation is not limited to any particular type. A first type of virtual representation could be similar to the visual metaphorical representations illustrated in Figures 3-5 and 8a-8b of Singer et al. U.S. Patent No. 5,889,843 (a graphical user interface displays icons on a planar surface, where the icons represent audio sources).
A second type of virtual representation is a virtual environment. A virtual environment includes a scene and sounds. A virtual environment is not limited to any particular type of scene or sounds. As a first example, a virtual environment includes a beach scene with blue water, white sand and blue sky. In addition, the virtual environment includes an audio representation of a beach (e.g. waves crashing against the shore, sea gulls cries). As a second example, a virtual environment includes a club scene, complete with bar, dance floor, and dance music (an exemplary bar scene 310 is depicted in Figure 4). As a third example, a virtual environment includes a park with a microphone and loudspeakers, where sounds picked up by the microphone are played over the speakers.
A virtual representation includes objects. An object in a virtual environment has properties that allow a user to perform certain actions on them (e.g., sit on, move, and open). An object (e.g., a Flash® object) in a virtual environment may obey certain specifications (e.g., an API).
At least some of the objects represent users of the communications system 110. These user representative objects could be images, avatars, live video, recorded sound samples, name tags, logos, user profiles, etc. In the case of avatars, live video or photos could be projected on them. The users' representative objects allow their users to see and communicate with other users in a virtual representation. In some situations, a user cannot see his own representative object, but rather sees the virtual representation as his representative object would see it (that is, from a first person perspective).
In some embodiments, the virtual representation is a virtual environment, and the users are represented by avatars. In some embodiments, volume of sound between one user and another is a function of distance between and relative orientation of their avatars. In some embodiments, the avatars also have audio ranges.
Reference is made to Figure 2, which illustrates an exemplary communications system 110 for providing a teleconferencing service. The teleconferencing service may be provided to users having client devices 120 and audio-only devices 130. A client device 120 refers to a device that can run a client and provide a graphical interface. One example of a client is a Flash® client. Client devices 120 are not limited to any particular type. Examples of client devices 120 include, but are not limited to computers, tablet PCs, VoIP phones, gaming consoles, televisions with set-top boxes, certain cell phones, and personal digital assistants. Another example of a client device 120 is a device running a Telnet program.
Audio-only devices 130 refer to devices that provide audio but, for whatever reason, do not display a virtual representation. Examples of audio-only devices 130 include traditional phones (e.g., touch-tone phones) and VoIP phones.
A user can utilize both a client device 120 and an audio-only device 130 during a teleconference. The client device 120 is used to interact with the virtual representation and help the user enter into teleconferences. The client device 120 also interacts with the virtual representation to control volume of sound data during a teleconference. The audio-only device 130 is used to speak with at least one other user during a teleconference.
The communications system 110 includes a teleconferencing system 140 for hosting teleconferences. The teleconferencing system 140 may include a phone system for establishing phone connections with traditional phones (landline and cellular), VoIP phones, and other audio-only devices 130. For example, a user of a traditional phone can connect with the teleconferencing system 140 by placing a call to it. The teleconferencing system 140 may also include means for establishing connections with client devices 120 that have teleconferencing capability (e.g., a computer equipped with a microphone, speakers and teleconferencing software).
A teleconference is not limited to conversations between two users. A teleconference may involve many users. Moreover, the teleconferencing system 140 can host one or more teleconferences at any given time.
The communications system 110 further includes a server system 150 for providing clients 160 to those users having client devices 120. Each client 160 causes its client device 120 to display a virtual representation. A virtual representation provides a vehicle by which a user can enter into a teleconference (e.g., initiate a teleconference, join a teleconference already in progress), even if that user knows no other users represented in the virtual representation. The communications system 110 allows a user to listen in on one or more teleconferences. Even while engaged in one teleconference, a user has the ability to listen in on other teleconferences, and seamlessly leave the one teleconference and join another teleconference. A user could even be involved in a chain of teleconferences (e.g., a line of people where person C hears B and D, and person D hears C and E, and so on).
Each client 160 enables its client device 120 to move the user's representative object within the virtual representation. By moving his representative object around a virtual representation, a user can move nearby other representative objects to listen in on conversations and meet other users. By moving his representative object around a virtual environment, a user can experience the sights and sounds that the virtual environment offers.
In a virtual environment, user representative objects have states that can be changed. For instance, an avatar has states such as location and orientation. The avatar can be commanded to walk (that is, make a gradual transition) from its current location (current state) to a new location (new state). Other objects (that don't represent users) in a virtual environment might have states that transition gradually or abruptly. Other objects in the virtual environment have states that can be changed. As a first example, a user can take part in a virtual volleyball game, where a volleyball is represented by an object. Hitting the volleyball causes the volleyball to follow a path towards a new location. As a second example, a balloon is represented by an object. The balloon may start uninflated (e.g., a current state) and expand gradually to a fully inflated size (new state). As a third example, an object represents a jukebox having methods (actions) such as play/stop/pause, and properties such as volume, song list, and song selection. As a fourth example, an object represents an Internet object, such as a uniform resource identifier (URI) (e.g., a web address). Clicking on the Internet object opens an Internet connection.
Different objects can provide different sounds. The sounds of a jukebox might include different songs in a playlist. The sounds of an avatar might include walking sounds. Yet even the walking sounds of different avatars might be different. For instance, the walking sound of an avatar with high heels might be different than that of one wearing flip-flop sandals. Walking sounds may also change subject to the terrain. For instance the walking sound on parquet flooring may be different than that on snow.
With an object in general, one user can change its state, and other users will experience the state change. For example, one user can turn down the volume of a jukebox, and everyone represented in the virtual representation will hear the lower volume.
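A minimal sketch of this shared-state behaviour, using a simple observer pattern, might look as follows; the class and method names are hypothetical and chosen only for illustration.

```python
class Jukebox:
    """Shared virtual-environment object: any user may change its
    state, and every subscribed client is notified of the change."""
    def __init__(self):
        self.volume = 1.0
        self.playlist = []
        self.listeners = []          # one callback per connected client

    def subscribe(self, callback):
        self.listeners.append(callback)

    def set_volume(self, volume):
        self.volume = max(0.0, min(1.0, volume))
        for notify in self.listeners:   # everyone hears the new volume
            notify("volume", self.volume)
```

A client that subscribes later simply reads the current state, which is consistent with the persistent-state behaviour described later in this specification.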
The virtual environment is network-accessible. For example, the virtual environment may be accessed via the Internet or a local area network (LAN).
The users may control the objects in a virtual environment with client devices. A client device refers to a device that can run a client and provide a graphical interface. One example of a client is a Flash® client. Client devices are not limited to any particular type. Examples of client devices include, but are not limited to computers, tablet PCs, gaming consoles, televisions with set-top boxes, certain cell phones, and personal digital assistants. Another example of a client device is a device running a textual user interface, such as a Telnet program. Yet another example is a mobile phone such as an iPhone running a chat-client such as Google-Talk.
Each client causes its client device to display a virtual environment, including the objects within. A client device generates commands, and those objects are controlled in response to the commands. By moving his representative object around a virtual environment, a user can experience the sights and sounds that the virtual environment offers. By moving his representative object around a virtual environment, a user can interact with other users. For instance, a voice-enabled user may interact with another voice-enabled user by moving into the other user's audio range. An audio range limits the distance that sound can be received and/or broadcasted. The audio ranges facilitate multiple conversations in a single virtual environment.
In general, interaction is a function of "closeness" between two users. Closeness may be measured in terms of distance between two representative objects in a virtual environment. However, closeness is not so limited. Another topology metric may be used to measure closeness. For example, closeness could be Euclidean distance between two representative objects. The distance may even be a real distance between the user and another user or real life object. For instance, the real distance might be the distance between a user in New York City and another user in Berlin. Another topology metric may measure closeness as the distance (in hyperlinks) between web pages currently being viewed by two users. Yet another topology metric may measure closeness as the distance (e.g., pixel distance) between two coordinates on a web page (for example, the distance between two coordinates that are pointed at by two users with their mouse pointers).
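The following sketch illustrates three such closeness metrics side by side: Euclidean distance between representative objects, hyperlink distance between concurrently viewed web pages, and pixel distance between pointer coordinates. The graph representation and function names are assumptions for illustration.

```python
import math
from collections import deque

def euclidean_closeness(pos_a, pos_b):
    """Distance between two representative objects in the environment."""
    return math.hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1])

def hyperlink_closeness(link_graph, page_a, page_b):
    """Distance in hyperlinks between the pages two users are viewing.
    `link_graph` maps each URL to an iterable of URLs it links to."""
    seen, queue = {page_a}, deque([(page_a, 0)])
    while queue:
        page, hops = queue.popleft()
        if page == page_b:
            return hops
        for nxt in link_graph.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return float("inf")

def pixel_closeness(cursor_a, cursor_b):
    """Distance between two users' mouse pointers on a shared page."""
    return math.hypot(cursor_a[0] - cursor_b[0], cursor_a[1] - cursor_b[1])
```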
A virtual environment could overlap real space. For example, a scene of a real place is displayed (e.g., a map of a city or country, a room). Locations of people in that real place can be determined, for example with GPS-equipped phones. The users whose real locations are known are represented virtually by avatars in their respective locations in the virtual environment. Or, the place might be real, but the locations are not. Instead, a user's avatar wanders to different places to meet different people.
A user can also become voice-enabled via a client device. As a first example, a user initiates a phone connection by pressing a "Call me" button on the client device, upon which the service provider calls the user's phone.
As a second example, a client device could command a co-installed VoIP soft-phone (e.g. via XML sockets) to establish a VoIP connection with the service provider. As a third example, an integrated client/phone such as a graphical Flash® client could have built-in VoIP capabilities. As a fourth example, a mobile phone could run a GUI+voice application. As a fifth example, a blind user could use a textual (telnet/Braille) client to issue a text command upon which the service provider calls the user's phone.
If one user wants to talk to some others who are not in the virtual environment at that time, that user could request the service provider to send invitations (e.g., via email, instant messaging or SMS messages) to those users. In case of email or instant messaging, a recipient may simply click on a link in the message to load the client and participate in a conversation. Thus, a user can utilize both a client device and a phone to interact with other users. The client device is used to interact with the virtual environment and help the user meet other users. The phone is used to speak with at least one other user. However, some phones (e.g., certain VoIP phones) may also have the functionality of a client device.
Reference is now made to Figure 3, which illustrates an example of how the virtual reality environment can be applied to teleconferencing. In this example, the service provider runs an on-line service that allows a user to start a teleconferencing session (block 200). In some embodiments, the service provider provides teleconferencing services via a web site. Using a web browser, the user enters the web site, and logs into the service, and the service provider starts a session.
After the session is started, a virtual reality environment is presented to the user (block 210). If, for example, the service provider runs a web site, a web browser can download and display a virtual reality environment to the user.
The virtual reality environment includes a scene and (optionally) sounds. A virtual reality environment is not limited to any particular type of scene or sounds. As a first example, a virtual reality environment includes a beach scene, with blue water, white sand and blue sky. In addition to this visualization, the virtual reality environment includes an audio representation of a beach (e.g. waves crashing against the shore, sea gulls cries). As a second example, a virtual reality environment provides a club scene, complete with bar, dance floor, and dance music (an exemplary bar scene 310 is depicted in Figure 4).
A scene in a virtual reality environment is not limited to any particular number of dimensions. A scene could be depicted in two dimensions, three dimensions, or higher.
Included in the virtual reality environment are representations of the user and others. The representations could be images, avatars, live video, recorded sound samples, name tags, logos, user profiles, etc. In the case of avatars, live video or photos could be projected on them. The service provider assigns to each representation a location within a virtual reality environment. Each user has the ability to see and communicate with others in the virtual reality environment. In some embodiments, the user cannot see his own representation, but rather sees the virtual reality environment as his representation would see it (that is, from a first person perspective).
A user can control its representation to move around a virtual reality environment. By moving around a virtual reality environment, the user can experience the different sights and sounds that the virtual reality environment provides (block 220). For instance, a representative object could turn on a jukebox and select songs from a playlist. The jukebox would play the selected songs.
Additional reference is made to Figure 4, which depicts a virtual reality environment including a club scene 310. The club scene 310 includes a bar 320, and dance floor 330. The user is represented by an avatar 340. Others in the club scene 310 are represented by other avatars. An avatar could be moved from its current location to a new location by clicking on the new location in the virtual environment, pressing a key on a keyboard, entering text, entering a voice command, etc. Dance music is projected from speakers (not shown) near the dance floor 330. As the user's avatar 340 approaches the dance floor 330, the music becomes louder. The music is loudest when the user's avatar 340 is in front of the speakers. As the user's avatar 340 is moved away from the speakers, the dance music becomes softer. If the user's avatar 340 is moved to the bar 320, the user hears background conversation (which might be actual conversations between others at the bar 320). The user might hear other background sounds at the bar 320, such as a bartender washing glasses or mixing drinks. Audio representation might involve changing the speaker's audio characteristics by applying filters (e.g. reverb, club acoustics) to the object's sound data. Examples of changing audio characteristics include the following. As an avatar walks from a carpeted room into a stone hall, a parameter of a reverb filter is adjusted to add more reverb to the user's voice and avatar's footsteps. As an avatar walks into a metallic chamber, a parameter of an effect filter is adjusted so the user's voice and avatar's footsteps are distorted to sound metallic. When an avatar speaks into a virtual microphone or virtual telephone, a filter (e.g. band pass filter) is applied to the avatar's sound data so the user's voice sounds as if it is coming from a loudspeaker system or telephone.
The user might not know any of the other users represented in the club scene 310. However, the user can enter into a teleconference with another user by becoming voice enabled, and causing his avatar 340 to approach that other user's avatar (the users can start speaking with each other as soon as both avatars are within audio range of each other). Users can use their audio-only devices 130 to speak with each other (each audio-only device 130 makes a connection with the teleconferencing system 140, and the teleconferencing system 140 completes the connection between the audio-only devices 130). The user can command his avatar 340 to leave that teleconference, wander around the club scene 310, and approach other avatars so as to listen in on other conversations and speak with other people. The user can listen in on one or more conversations simultaneously. Even while engaged in one conversation, a user has the ability to listen in on other conversations, and seamlessly leave the one conversation and join another conversation. A user could even be involved in a chain of conversations (e.g., a line of people where person C hears B and D, and person D hears C and E, and so on).
The communications system 110 can host multiple virtual representations simultaneously. The communications system 110 can host multiple teleconferences in each virtual representation. Each teleconference can include two or more people.
If more than one virtual representation is available to a user, the user can move in and out of the different virtual representations. Each of the virtual representations can be uniquely addressable via a unique phone number. The server system 150 can then place each user directly into the selected virtual representation.
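One hypothetical way to realize such unique dial-in addressing is a routing table from dialled numbers to virtual representations; the numbers and environment names below are invented purely for illustration.

```python
# Hypothetical routing table: each virtual representation is reachable
# under its own dialled number (numbers invented for illustration).
ENVIRONMENTS_BY_NUMBER = {
    "+10000000001": "beach",
    "+10000000002": "club",
}

def route_incoming_call(dialled_number, caller_id):
    """Place the caller directly into the representation they dialled."""
    env = ENVIRONMENTS_BY_NUMBER.get(dialled_number)
    if env is None:
        raise LookupError("no virtual representation for this number")
    return {"environment": env, "user": caller_id}
```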
Users can reserve and enter private virtual representations to hold private conversations. Users can also reserve and enter private areas of virtual representations to hold private conversations.
This interaction is unlike that of a conventional teleconference. In a conventional teleconference, several parties schedule a teleconference in advance. When the time comes, the participants call a number, wait for verification, and then talk. When the participants are finished talking, they hang up. In contrast, teleconferencing according to the present invention is dynamic. Multiple teleconferences might be occurring between different groups of people. The teleconferences can occur without advance planning. A user can listen in on one or more teleconferences simultaneously, enter into and leave a teleconference at will, and hop from one teleconference to another.
The virtual reality environment just described is considered "immersive." An "immersive" environment is defined herein as an environment with which a user can interact.
Reference is once again made to Figure 3. A user can also move its representation around a virtual reality environment to engage others represented in the virtual reality environment (block 220). The user's representation may be moved by clicking on a location in the virtual reality environment, pressing a key on a keyboard, pressing a key on a telephone, entering text, entering a voice command, etc.
There are various ways in which the user can engage others in the virtual reality environment. One way is by wandering around the virtual reality environment and hearing conversations that are already in progress. As the user moves its representation around the virtual reality environment, that user can hear voices and other sounds.
Another way a user can engage others is by text messaging, video chat, etc. Another way is by clicking on another's representation, whereby a profile is displayed. The profile provides information about the person behind the representation. In some embodiments, images (e.g., profile photos, live webcam feeds) of others who are close by will automatically appear. Still another way is to become voice-enabled via phone (block 230). Becoming voice-enabled allows the user to have teleconferences with others who are voice-enabled. For example, the user wants to have a teleconference using a phone. The phone could be a traditional phone or a VoIP phone. To enter into a teleconference, the user can call the service provider. When making the call by traditional telephone, the user can call a virtual reality environment (e.g., by calling a unique phone number, or by calling a general number and entering a user ID and PIN via DTMF, or by entering a code that the user can find on a web page).
When making the call by VoIP phone, the user can call the virtual reality environment by calling its unique SIP address. A user could be authenticated by appending credentials to the SIP address.
The service provider can join the phone call with the session in progress if it can recognize the user's phone number (block 232). If the service provider cannot recognize the user's phone number, the user starts a new session via the phone (block 234), and the service provider then merges the new phone session with the session already in progress (block 236).
Instead of the user calling the service provider, the user can request the service provider to call the user (block 238). For example, a sidebar includes a "CALL button" that the user clicks to become voice-enabled. Once voice-enabled, the user can walk up to another who is voice-enabled, and start talking immediately. A telephone icon over the head of an avatar could be used to indicate that its user is voice-enabled, and/or another graphical sign, such as sound waves, could be displayed near an avatar (e.g. in front of its face) to indicate that it is speaking or making other sounds.
In some embodiments, the user has the option of becoming voice-enabled immediately after starting a session (block 230). This option allows the user to immediately enter into teleconferences with others who are voice-enabled (block 240). A voice-enabled user could even call a person who has not yet entered the virtual reality environment, thereby pulling that person into the virtual reality environment (block 240). Once voice-enabled (block 230), the user remains voice-enabled until the user discontinues the call (e.g., hangs up the phone).
In some embodiments, a user can connect to the service provider with only a single device 120 (e.g., a computer with a microphone and speakers, a VoIP phone) that can navigate the virtual reality environment and also be used for teleconferences. For instance, a user connects to the web site via the Internet, is automatically voice-enabled, meets others in the virtual reality environment, and enters into teleconferences (indicated by the line that goes directly from block 210 to block 240). VoIP offers certain advantages. VoIP on a broadband connection enables a truly seamless persistent connection that allows a user to "hang out" casually in one or more environments for a long time. Every now and then, something interesting might be heard, or someone's voice might be recognized, whereby the user can pay more attention and just walk over to chat. Yet another advantage of VoIP is that stereo sound connections can be easily established.
In some embodiments, the service provider runs a web site, but allows a user to log into the teleconferencing service and enter into a teleconference without accessing the web site (block 260). A user might only have access to a touch-tone telephone or other device 130 that can't access the web site or display the virtual reality environment. Or the user might have access to a single device that can either access the web site or make phone calls, but not both (e.g., a cell phone). Consider a traditional telephone. With only the telephone, the user can call a telephone number and connect to the service provider. The service provider can then create a representation of the user in the virtual reality environment. Via telephone signals (e.g., DTMF, voice control), the user can move its representation around in the virtual reality environment, listen to other conversations, meet other people and experience the sounds (but not sights) of the virtual reality environment. Although the user cannot see its representation, others who access the web site can see the user's representation.
A teleconference is not limited to conversations between a user and another (e.g., a single person). A teleconference can involve many others (e.g., a group). Moreover, others can be added to a teleconference as they meet and engage those already in the teleconference. And once engaged in one teleconference, a person has the ability to "listen in" on other teleconferences, and seamlessly leave the one teleconference and join another teleconference. A user could even be involved in a chain of teleconferences (e.g., a line of people where person C hears B and D, and person D hears C and E, and so on).
If more than one virtual reality environment is available to a user, the user can move into and out of the different environments, and thereby meet even more different groups of people. Each of the virtual reality environments can be uniquely addressable via an Internet address or a unique phone number. The service provider can then place each user directly into the selected target virtual reality environment. Users can reserve and enter private virtual reality environments to hold private conversations. Users can also reserve and enter private areas of public environments to hold private conversations. A web browser or other graphical user interface could include a sidebar or other means for indicating different environments that are available to a user. The sidebar allows a user to move into and out of different virtual reality environments, and to reserve and enter private areas of a virtual reality environment. A service provider can host multiple teleconferences in a virtual reality environment. A service provider can host multiple virtual reality environments simultaneously. In some embodiments a user can be in more than one virtual reality environment simultaneously.
Reference is now made to Figure 5, which illustrates a state diagram of a virtual reality environment (directed arrows in the diagram indicate actions). The state of a virtual reality environment may be persistent in that it continues to exist throughout many user sessions and it continues to exist through the actions of different users. This allows a virtual reality environment to be modified by one user, and the modifications observed by others. For example, graffiti can be written on walls, a light switch in a virtual reality environment could be switched on and off, etc.
Objects in the virtual reality environment can be added, removed, and moved by users. Examples of objects include sound sources (e.g., music boxes, bubbling fish tanks), data objects (e.g., a modifiable book with text and pictures), visualized music objects, etc. Objects can have properties that allow a user to perform certain actions on them. A user could sit on a chair, open a window or operate a juke box. Objects could have profiles too. For example, a car in a virtual show room could have a make, model, year, top speed, number of cylinders, etc.
The persistent state also allows "things" to be put on top of each other. A file can be dropped onto a user or dropped onto the floor as a way of sharing the file with the user. A music or sound file could be dropped on a jukebox. A picture or video could be dropped on a projector device to trigger playback / display. A multimedia sample (e.g., an audio clip or video clip containing a message) could be "pinned" to a whiteboard.
The persistent state also allows for meta-representations of files. These meta-representations may be icons that offer previews of an actual file. For example, an audio file might be depicted as a disk; an image file might be depicted as a small picture (maybe in a frame), etc.
A virtual reality environment could overlap real space. For example, a scene of a real place is displayed (e.g., a map of a city or country, a room). Locations of people in that real place can be determined, for example with GPS phones. The participating people whose real locations are known are represented virtually by avatars in their respective locations in the virtual reality environment. Or, the place might be real, but the locations are not. Instead, a user's avatar wanders to different places to meet different people.
Different virtual reality environments could be linked together. Virtual reality environments could be linked to form a continuous open environment, or different virtual reality environments could be linked in the same way web pages are linked. There can be links from one virtual reality environment to another environment. There could be links from a virtual reality environment, object or avatar to the web, and vice versa. As examples, a link from a user's avatar could lead to a web version of that user's profile. A link from a web page or a unique phone number could lead to a user's favorite virtual reality environment or a jukebox play list.
Reference is now made to Figure 6, which illustrates how a user experiences audio in a virtual reality environment. The user has a location in the environment and establishes an audio connection with that location.
At block 510, locations of all sound sources in the virtual reality environment are determined. Sound sources include objects in the virtual reality environment (e.g., a jukebox, speakers, a running stream of water), and representations of those users who are talking.
At block 512, closeness of each sound source to the user's representation is determined. The closeness is a function of a topology metric. In the virtual reality environment, the metric could be Euclidean distance between the user and the sound source. The distance may even be a real distance between the user and the source. For instance, the real distance might be the distance between a user in New York City and a sound source (e.g., another user) in Berlin.
At block 514, audio streams from the sound sources are weighted as a function of closeness to the user's representation. Sound sources closer to the user's representation would receive higher weights (sound louder) than sound sources farther from the user's representation.
At block 516, the weighted streams are combined and presented to the user. Sounds from all sources available to the user are processed (e.g. alienated, filtered, phase-shifted) and mixed together and supplied to the user. The sounds do not include the user's own voice. The audio range of the user and each sound source can have a geometric shape or a shape that simulates real life attenuation.
Additional reference is made to Figure 7, which illustrates the use of an audio range to perform additional attenuation of sound in a virtual reality environment. A user's representative object is at location PW and three other objects are at locations PX, PY and PZ. Let MIXW be the sound heard by the user represented at location PW. In a simple sound model, MIXW may be expressed as
MIXW = a·VX + b·VY + c·VZ

where VX, VY and VZ are sound data from the objects at locations PX, PY and PZ, and where a, b and c are sound coefficients. In this simple model, the volume of sound data VX is adjusted by coefficient a, the volume of sound data VY is adjusted by coefficient b, and the volume of sound data VZ is adjusted by coefficient c.
The value of each coefficient may be inversely proportional to the distance between the corresponding sound source and the user's representative object. As such, sound gets louder as the user's object and the sound source move closer together, and sound gets softer as they move farther apart. The server system generates the sound coefficients. However, the volume control is not limited to a topology metric such as distance. That is, closeness of two objects is not limited to distance.
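A minimal sketch of this simple sound model, assuming the inverse-distance coefficients just described and equal-length sample buffers, is shown below; all names are illustrative.

```python
import numpy as np

def inverse_distance_coefficient(drain_pos, source_pos, min_dist=1.0):
    """Coefficient inversely proportional to drain-source distance,
    clamped so nearby sources cannot blow up the mix."""
    d = np.hypot(drain_pos[0] - source_pos[0], drain_pos[1] - source_pos[1])
    return 1.0 / max(d, min_dist)

def mix_for_drain(drain_pos, sources):
    """MIXW = a*VX + b*VY + c*VZ, generalized to any number of sources.
    `sources` is a list of (position, samples) pairs; all sample arrays
    are assumed to have the same length."""
    mix = np.zeros_like(sources[0][1], dtype=float)
    for pos, samples in sources:
        mix += inverse_distance_coefficient(drain_pos, pos) * samples
    return mix
```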
Each object may have an audio range. The audio range is used to determine whether sound is cut off. The audio ranges of the objects at locations PW and PZ are indicated by circles EW and EZ. Audio ranges of the representations at locations PX and PY are indicated by ellipses EX and EY. The elliptical shape of an audio range indicates that the sound from its audio source is directional or asymmetric. The circular shape indicates that the sound is omni-directional (that is, projected equally in all directions).
In some embodiments, coefficient c = 0 when location PZ is outside the range EW, and coefficients a = 1 and b = 1 when locations PX and PY are within the range EW. In other embodiments, a coefficient may vary between 0 and 1. For instance, a coefficient might equal a value of zero at the perimeter of the range, a value of one at the location of the user's representative object, and a fractional value therebetween.
In some embodiments, topology metrics might be used in combination with the audio range. For example, a sound will fade as the distance between the source and the user's representative object increases, and the sound will be cut off as soon as the sound source is out of range.
The audio range may be a receiving range or a broadcasting range. If a receiving range, a user will hear other sources within that range. Thus, the user will hear other users whose representative objects are at locations PX and PY, since the audio ranges EX and EY intersect the range EW. The user will not hear another whose representative object is at location PZ, since the audio range EW does not intersect the range EZ.
If the audio range is a broadcasting range, a user hears those sources in whose broadcasting range he is. Thus, the user will hear the user whose representative object is at location PX, since location PW is within the ellipse EX. The user will not hear those users whose representative objects are at locations PY and PZ, since the location PW is outside of the ellipses EY and EZ. In some embodiments, the user's audio range is fixed. In other embodiments, the user's audio range can be dynamically adjusted. For instance, the audio range can be reduced if a virtual environment becomes too crowded. Some embodiments might have a function that allows for private conversations. That function may be realized by reducing the audio range (e.g. to a whisper) or by forming a disconnected "sound bubble." Some embodiments might have a "do not disturb" function, which may be realized by reducing the audio range.
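The receiving-range and broadcasting-range cut-off tests might look as follows, assuming circular audio ranges for simplicity; elliptical or directional ranges would need a richer in-range test.

```python
import math

def audible(drain, source, mode="receiving"):
    """Cut-off test for one drain/source pair.  Each object is a dict
    with a 'pos' coordinate and a circular 'audio_range' radius."""
    d = math.hypot(drain["pos"][0] - source["pos"][0],
                   drain["pos"][1] - source["pos"][1])
    if mode == "receiving":    # drain hears sources inside its own range
        return d <= drain["audio_range"]
    # "broadcasting": the drain must stand inside the source's range
    return d <= source["audio_range"]
```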
Different audio ranges may have different shapes and sizes, different attenuation functions, directionality/orientation, state dependent attenuation, etc.
As for objects representing users, avatars offer certain advantages over other types of objects. Avatars allow one user to interact with another.
In some embodiments, metrics might be used in combination with the audio range. For example, a sound will fade as the distance between the source and the user increases, and the sound will be cut off as soon as the audio source is out of range.
In some embodiments, sounds from a user may be projected equally in all directions (that is, sound is omni-directional). In other embodiments, the sound projection may be directional or asymmetric.
User representations are not limited to avatars. However, avatars offer certain advantages. Avatars allow one user to meet another user through intuitive actions. All a user need do is control its avatar to walk up to another avatar and face it. The user can then introduce himself, and invite another to enter into a teleconference.
Another intuitive action is realized by controlling the gestures of the avatars. This can be done to convey information from one user to another. For instance, gestures can be controlled by pressing buttons on a keyboard or keypad. Different buttons might correspond to gestures such as waving, kissing, smiling, frowning etc. In some embodiments, the gestures of the user can be monitored via a webcam, corresponding control signals can be generated, and the control signals can be sent to the service provider. The service provider can then use those control signals to control the gesture of an avatar.
Yet another intuitive action is realized by the orientation of two avatars. For instance, the volume of sound between two users may be a function of relative orientation of the two avatars. Avatars facing each other will hear each other better than one avatar facing away from the other, and much better than two avatars facing in different directions. Reference is made to Figure 8, which shows two avatars A and B facing in the directions of the arrows. The avatars A and B are facing each other directly if angles α and β between the avatars' attitude and their connecting line AB equal zero. Assume avatar A is a speaker and avatar B is a listener. The value of the attenuation function can vary differently for changes to α and β. In this case the attenuation is asymmetrical. One advantage of orientation-based attenuation is allowing a user to take part in one conversation, while casually hearing other conversations.
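One plausible attenuation function of the angles α and β could be built as below; it is illustrative only, not the prescribed function. The cosine shaping, the exponents (which make the attenuation asymmetrical between speaker and listener), and the floor value are assumptions chosen so that avatars facing each other hear each other best while conversations behind a listener remain faintly audible.

```python
import math

def orientation_attenuation(pos_a, heading_a, pos_b, heading_b):
    """Attenuation between speaker A and listener B as a function of
    alpha (A's facing vs. the line AB) and beta (B's facing vs. the
    line BA).  Both avatars facing each other -> factor 1.0; facing
    away -> the factor approaches the floor value."""
    line_ab = math.atan2(pos_b[1] - pos_a[1], pos_b[0] - pos_a[0])
    line_ba = math.atan2(pos_a[1] - pos_b[1], pos_a[0] - pos_b[0])
    # atan2(sin, cos) wraps each angle difference into (-pi, pi]
    alpha = abs(math.atan2(math.sin(heading_a - line_ab),
                           math.cos(heading_a - line_ab)))
    beta = abs(math.atan2(math.sin(heading_b - line_ba),
                          math.cos(heading_b - line_ba)))
    floor = 0.2  # lets a user casually overhear conversations behind him
    g = ((0.5 * (1 + math.cos(alpha))) ** 1.5
         * (0.5 * (1 + math.cos(beta))) ** 0.5)
    return floor + (1 - floor) * g
```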
The attenuation may also be a function of the distance between avatars A and B. The distance between avatars A and B may be taken along line AB.
A sound model may be based on direction, orientation, distance and states of the objects associated with the sound sources and sound drains. As an example of a state, the volume or audio range of sound data might be reduced if an object is in a whisper mode, or the volume or audio range might be increased if the object is in yell mode. The volume heard by an object or its receiving range could be reduced if that object is in a do-not-disturb mode. A sound model may also consider others factors that influence the volume of sound data. For instance a user's broadcasting audio range could be increased when he is detected to be shouting and reduced when he is detected to be whispering.
Let Vdw(t) be the sound heard by the user represented by the object at location Pw and associated with sound drain w. In such a model, Vdw(t) may be expressed as
Vdw(t) = vold · Σ (n = 1 to smax) sn · fwn(dwn, αwn, βnw, un, uw) · Vsn(t)

where
vold is the drain gain of sound drain w,
smax is the total number of sound sources in the environment,
Vsn(t) is the sound produced by sound source n,
sn is the source gain of sound source n,
fwn(dwn, αwn, βnw, un, uw) is an attenuation function determining how source n is attenuated for drain w,
dwn is the distance between w and n,
αwn is the angle between the sound emission direction (speaking direction) and the connecting line of user w and sound source n,
βnw is the angle between the connecting line of user w and sound source n and the sound reception direction (hearing direction),
un is the state of the object associated with sound source n, and
uw is the state of the object associated with sound drain w.
The state un of the object associated with sound source n reflects any other factor or set of factors that influence the volume of sound from the sound source n. For instance, the state un might reduce the volume if the object associated with sound source n is in a whisper mode, or it might increase the volume if the object associated with sound source n is in a yell mode. Similarly, the state uw of the object associated with sound drain w reflects any other factor or set of factors that influence the volume of sound heard by the sound drain w. For instance, the state uw could reduce the volume of the sound heard by the sound drain w if the object associated with sound drain w is in a do-not-disturb mode.
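Putting the pieces together, a sketch of the full sound model might look as follows. The dictionary layout and the example attenuation function (which folds in whisper and do-not-disturb states) are assumptions for illustration; a real fwn would also use the orientation angles computed as in the earlier sketch.

```python
import math
import numpy as np

def sample_f_wn(drain, src):
    """Example attenuation fwn: inverse distance, halved if the source
    is whispering, zero if the drain is in do-not-disturb mode."""
    if drain["state"].get("do_not_disturb"):
        return 0.0
    d = math.hypot(drain["pos"][0] - src["pos"][0],
                   drain["pos"][1] - src["pos"][1])
    f = 1.0 / max(d, 1.0)
    if src["state"].get("whisper"):
        f *= 0.5
    return f

def drain_mix(drain, sources, f_wn=sample_f_wn):
    """Vdw(t) = vold * sum over n of sn * fwn * Vsn(t)."""
    total = np.zeros_like(sources[0]["samples"], dtype=float)
    for src in sources:
        total += src["gain"] * f_wn(drain, src) * src["samples"]
    return drain["gain"] * total
```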
Reference is made to Figures 9 and 10, which illustrate a first approach for controlling the volume of sound data in a teleconference. The server system generates sound coefficients, and the teleconferencing system uses the sound coefficients to vary the audio characteristics (e.g., audio volume) of sound data that goes from sound sources to a sound drain. A sound drain refers to the representative object of a user who can hear sounds in the virtual environment. A sound coefficient can vary the audio volume or other audio characteristics as a function of closeness of a sound source and a sound drain.
A virtual environment is provided (block 710), and phone connections are established with a plurality of users (block 720). The users are represented by objects in the virtual environment. Each user representative object can be both a sound drain and a sound source.
At block 730, locations of all sound sources and sound drains in the virtual environment are determined. Sound sources include objects that can provide sound in a virtual environment (e.g., a jukebox, speakers, a running stream of water, users' representative objects). A sound source could be multimedia from an Internet connection (e.g., audio from a YouTube video).
The following functions are performed for each sound drain in the virtual environment. At block 740, closeness of each sound source to a drain is determined. This function is performed for each sound drain in the virtual environment. The server system can perform this function, since it keeps track of the object states.
At block 750, a coefficient for each drain/source pair is computed. Each coefficient varies the volume of sound from a source as a function of its closeness to the drain. The closeness is not limited to distance. This function may also be performed by the server system, since it maintains information about closeness of the objects. The server system supplies the sound coefficients to the teleconferencing system.
The sound from a source to a drain can be cut off (that is, not heard) if the drain is outside of an audio range of the source (in the case of a broadcasting range). The sound coefficient would reflect such cut-off (e.g., by being set to zero or close to zero). The server system can determine the range, and whether cutoff occurs, since it manages the object states.
At block 760, sound data from each sound source is adjusted with its corresponding coefficient. As a result, the sound data from the sound sources are weighted as a function of closeness to a drain.
At block 770, the weighted sound data is combined and sent back on a phone line or VoIP channel to a user. Thus, an auditory environment is synthesized from the sounds of different objects, and the synthesized environment is heard by the user.
The process at blocks 730-750 is performed continuously, since locations, orientations and other states in the virtual representation are changed continuously. The process at blocks 760-770 is also performed continuously, as the sound data is streamed continuously (e.g., in chunks of 100ms).
Consider a virtual environment in which there are n sound sources for each of n drains. The computation effort for mixing sound data from all n sources for each drain will be on the order of n² (i.e., O(n²)). This can pose a large scaling problem, especially for large teleconferences and dense crowds.
Reference is now made to Figure 11. Any of the following approaches, alone or in combination, could be used to reduce the computation burden.
At block 1010, for each drain, the sound data is mixed only for those sound sources making a significant contribution. As a first example, the subset includes the loudest sound sources (i.e., those with the highest coefficients). As a second example, the subset includes only those representative objects whose users are actually talking. As a third example, sound sources that are not active (i.e., sound sources that are not providing sound data) are excluded. If a user's object is not voice-enabled, it can be excluded. If a play feature of a jukebox is off, the jukebox can be excluded.
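A minimal sketch of this selection step is given below, assuming each source has already been rated with a coefficient for the drain; the record fields, the 0.01 floor and the limit are illustrative assumptions.

    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collectors;

    /** Sketch of block 1010: keep, for one drain, only the loudest active
     *  sources making a significant contribution. */
    public final class SourceSelection {

        public record RatedSource(String id, double coefficient, boolean active) {}

        /** Returns at most maxSources active sources, highest coefficient first. */
        public static List<RatedSource> significantSources(List<RatedSource> all,
                                                           int maxSources) {
            return all.stream()
                    .filter(RatedSource::active)          // skip silent or disabled sources
                    .filter(s -> s.coefficient() > 0.01)  // near-zero coefficients are inaudible
                    .sorted(Comparator.comparingDouble(RatedSource::coefficient).reversed())
                    .limit(maxSources)
                    .collect(Collectors.toList());
        }
    }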
At block 1008, audio ranges of certain objects may be automatically set at or near zero, so that their coefficients are set at or near zero. The sound data from these objects would be excluded at block 1010.
At block 1020, a minimum distance between objects may be enforced. This policy would prevent users from forming dense crowds.
At block 1030, the teleconferencing system could also premix sound data for groups of sound sources. The premixed sound data of a group could be mixed with other audio data for a sound drain. An example of premixing is illustrated in Figure 12c.
At block 1040, in addition to or instead of sound mixing illustrated in Figures 9 and 10 (that is, instead of generating a synthesized environment), the teleconferencing system could make direct connections between a source and a drain. This might be done if the server system determines that two users can essentially only hear each other. Making direct connections can preserve computing power and decrease latencies.
Reference is now made to Figure 12a, which shows a line of sound sources (Source0 to Source3) and five objects (Drain5 to Drain9) listening to those sound sources. The five drains (Drain5 to Drain9) are in different positions with respect to the line of sound sources.
Figure 12b illustrates a sound mixer 1110 that mixes sound data from the line of sources (Source0 to Source3) without premixing. Each sound source (Source0 to Source3) has a coefficient for each sound drain (the coefficients are represented by filled circles, and exemplary values are also provided). The sound mixer 1110 performs four mixing operations per sound drain, for a total of 20 mixing operations.
Figure 12c illustrates an alternative sound mixer 1120, which premixes the sound data from the line of sources (Source0 to Source3). The sound sources (Source0 to Source3) are grouped, and the sound mixer 1120 mixes the sound data from the group. Four mixing operations are performed during premixing.
The sound mixer 1120 computes a single coefficient for each drain and performs one mixing operation per drain. The value of a coefficient may be a function of distance from its drain to the group (e.g., distance from a drain to a centroid of the group). Thus, the sound mixer 1120 performs an additional five mixing operations, for a total of nine mixing operations. The coefficients that premix sound data into a single sound source for a group could be determined with respect to a certain point such as a centroid (such coefficients are indicated by the values 0.8, 0.9, 0.9, and 0.8), or with respect to some other metric. Alternatively, those values could all be set to one, which means that each drain would hear the same volume from each sound source (Source0 to Source3). However, different drains would still hear different volumes from the group (as indicated by the different coefficients 0.97, 0.84, 0.75, 0.61 and 0.50).
Sound sources may be grouped in a way that minimizes the mixing operations, yet keeps the deviation from the ideal sound (that is, sound without premixing) at an acceptable level. Various clustering algorithms can be used to group the sound sources (e.g., a K-means algorithm, or iterative clustering of mutual nearest neighbors).
Additional sources can be mixed without premixing. Figure 12c illustrates a fifth sound source (Source4) that is not grouped with the line of sound sources. The fifth sound source is assigned its own coefficients for Drain3 and Drain7. Thus, a single mixing operation is performed for Drain3, and two mixing operations are performed for Drain7.
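The premixing scheme of Figure 12c might be sketched as follows: the group is mixed once with per-source group coefficients, and each drain then applies a single per-drain coefficient to the premixed chunk (four operations for the group plus one per drain, instead of one per source per drain). The array layout and chunk handling are illustrative.

    /** Sketch of premixing a group of sound sources (Figure 12c). */
    public final class Premixer {

        /** Mixes the group once: one operation per source in the group. */
        public static double[] premixGroup(double[][] sourceChunks, double[] groupCoeffs) {
            int len = sourceChunks[0].length;
            double[] premix = new double[len];
            for (int n = 0; n < sourceChunks.length; n++) {
                for (int t = 0; t < len; t++) {
                    premix[t] += groupCoeffs[n] * sourceChunks[n][t];
                }
            }
            return premix;
        }

        /** One further operation per drain instead of one per source. */
        public static double[] mixForDrain(double[] premix, double drainCoeff) {
            double[] out = new double[premix.length];
            for (int t = 0; t < premix.length; t++) {
                out[t] = drainCoeff * premix[t];
            }
            return out;
        }
    }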
Reference is now made to Figure 13, which illustrates different activities that may be facilitated by a service provider. Connections are not limited to audio sources. Connections can also be made with multimedia sources (block 1310). Examples of such multimedia include, without limitation, video streams, text chat messages, instant messenger messages, avatar gestures or moves, mood expressions, emoticons, and web pages.
Multimedia sources could be displayed (e.g., viewed, listened to) from within a virtual reality environment (block 1320). For example, a video clip could be viewed on a screen inside a virtual reality environment. Sound could be played from within a virtual reality environment.
Multimedia sources could be viewed in separate popup windows (block 1330). For example, another instance of a web browser is opened, and a video clip is played in it.
The virtual reality environment facilitates sharing the multimedia (block 1340). Multiple users can share a media presentation (e.g., view it, edit it, browse it, listen to it) and, at the same time, discuss the presentation via teleconferencing. In some embodiments, one of the users can control the presentation of the multimedia. This feature allows all of the browsers to be synchronized, so all users can watch a presentation at the same time. In other embodiments, each user has control over the presentation, and the browsers are not synchronized. A multimedia connection can be shared in a variety of ways. One user can share a media connection with another user by dragging and dropping a multimedia representation onto the other user's avatar, or by causing his avatar to hand the multimedia representation to the other user's avatar.
As a first example, a first user's avatar drops a video file, photo or document on a second user's avatar. The first and second users then view the file in a browser or media player, while discussing it via teleconferencing.
As a second example, a first user's avatar drops a URL on a second user's avatar. A web browser for each user opens, and downloads content at the URL. The first and second users can then co-browse, while discussing the content via teleconferencing.
As a third example, a user presents something to the surrounding avatars. All users within range get to see the presentation (first, however, they might be asked whether they want to see the presentation).
The multimedia connection provides another advantage: it allows telephones and other devices without browsers to access content on the Internet. For example, a multimedia connection could provide streaming audio to a virtual reality environment. The streaming audio would be an audio source that has a specific location in the virtual reality environment. A user with only a standard telephone can wander around the virtual reality environment and find the audio source. Consequently, the user can listen to the streaming audio over the telephone.
Reference is now made to Figure 14. The service provider 1400 could provide other services. One service is automatically assigning a user to certain virtual reality environments based on a characteristic of the user (block 1410). The characteristic may be a parameter in the user's profile, or an interest of the user, or a mood of the user, or some other characteristic.
A user may have multiple profiles. Each profile represents a different aspect of the user. Different profiles give the user access to certain virtual reality environments. A user can switch between profiles during a session.
The profile can state a need. For example, a profile might reveal that the user is shopping for an automobile. The user could be automatically assigned to a virtual show room, including representations of automobiles, and representations of salesmen. In some embodiments, user profiles can be made public, so they can be viewed by others. For instance, a first user can click on the avatar of a second user, and the profile of that second user appears as a form of introduction. Or, a first user might wander around a virtual reality environment, looking for people to meet. The first user could learn about a second user by clicking on the avatar of that second user. In response, the second user's profile would be displayed to the first user. If the profile does not disclose the user's real name and phone number, the second user stays anonymous.
Another service is providing agents (e.g., operators, security personnel, experts) that offer services to those in the virtual reality environment (block 1420). As a first example, users might converse while watching a movie, while an agent finds information about the cast. As a second example, a user chats with another person, and the person requests an agent to look up something with a search engine. As a third example, an agent identifies lonely participants who seem to match and introduces them to each other.
Another service is providing a video chat service (block 1440). For instance, the service provider might receive web camera data from different users, and associate the web camera data with the different users such that a user's web camera data can be viewed by certain other users.
Yet another service is hosting different functions in different virtual reality environments (block 1430). Examples of different functions include, without limitation, social networking, business conferencing, business-to-business services, business-to-customer services, trade fairs, conferences, work and recreation places, virtual stores, promoting gifts, on-line gambling and casinos, virtual game and entertainment shows, virtual schools and universities, on-line teaching, tutoring sessions, karaoke, pluggable (team) games, award-based contests, clubs, concerts, virtual galleries, museums, and demonstrations, or any scenario available in real life. A virtual reality environment could be used to host a television show or movie.
Reference is made to Figure 16, which illustrates an exemplary web-based communications system 1600. The communications system 1600 includes a VE server system 1610. The "VE" refers to virtual environment.
The VE server system 1610 hosts a website, which includes a collection of web pages, images, videos and other digital assets. The VE server system 1610 includes a web server 1612 for serving web pages, and a media server 1614 for storing video, images, and other digital assets.
One or more of the web pages embed client files. Files for a Flash® client, for instance, are made up of several separate Flash® objects (.swf files) that are served by the web server 1612 (some of which can be loaded dynamically when they are needed). A client is not limited to a Flash® client. Other browser-based clients include, without limitation, Java™ applets, Microsoft® Silverlight™ clients, .NET applets, Shockwave® clients, scripts such as JavaScript, etc. A downloadable, installable program could even be used.
Using a web browser, a client device downloads web pages from the web server 1612 and then downloads the embedded client files from the web server 1612. The client files are loaded into the client device, and the client is started. The client starts running the client files and loads the remaining parts of the client files (if any) from the web server 1612.
An entire client or a portion thereof may be provided to a client device. Consider the example of a Flash® client including a Flash® player and one or more Flash® objects. The Flash® player is already installed on a client device. When .swf files are sent to and loaded into the Flash® player, the Flash® player causes the client device to display a virtual environment. The client also accepts inputs (e.g., keyboard inputs, mouse inputs) that command a user's representative object to move about and experience the virtual environment.
The server system 1610 also includes a world server 1616. The "world" refers to all virtual representations provided by the server system 1610. When a client starts running, it opens a connection with the world server 1616. The server system 1610 selects a description of a virtual environment and sends the selected description to the client. The selected description contains links to graphics and other media for the virtual environment. The description also contains coordinates and appearances of all objects in the virtual environment. The client loads media (e.g., images) from the media server 1614, and projects the images (e.g., in an isometric or 3-D view).
The client displays objects in the virtual environment. Some of these objects are user representative objects such as avatars. The animated views of an object could comprise pre-rendered images or just-in-time rendered 3D models and textures; that is, objects could be loaded as individual Shockwave® objects, parameterized generic Shockwave® objects, images, movies, 3D models optionally including textures, and animations. Users could have unique/personal avatars or share generic avatars.
Objects can be loaded on demand, which reduces the initial loading time. Also, low-quality or generic representations could be loaded first, for example, when an avatar is far away from another object, and higher-quality representations could be loaded later, as the avatar gets closer to the object.

When a client device wants an object to move to a new location in the virtual environment, its client determines the coordinates of the new location and a desired time to start moving the object, and generates a request. The request is sent to the world server 1616.
The world server 1616 receives a request and updates the data structure representing the "world." The world server 1616 manages each object state in one or more virtual environments, and updates the states that change. Examples of states include an avatar's state, the objects it is carrying, the user's state (account, permissions, rights, audio range, etc.), and call management. When a user commands an object in a virtual environment to a new state, the world server 1616 commands all clients represented in the virtual environment to transition the state of that object, so client devices display the object at roughly the same state at roughly the same time.
The world server 1616 can also manage objects that transition gradually or abruptly. When a client device commands an object to transition to a new state, the world server 1616 receives the command and generates an event that causes all of the clients to show the object at the new state at a specified time.
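A minimal sketch of such an event fan-out is given below, assuming a simple text message format and a short scheduling lead; the ClientConnection interface and the 200 ms figure are illustrative assumptions, not the specification's protocol.

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    /** Sketch of a world server fanning out a state change so all clients
     *  show the transition at roughly the same time. */
    public final class WorldServerSketch {

        public interface ClientConnection {
            void send(String message);
        }

        private final List<ClientConnection> clients = new CopyOnWriteArrayList<>();

        public void register(ClientConnection c) {
            clients.add(c);
        }

        /** Schedules the transition slightly in the future so the event can
         *  reach every client before the start time arrives. */
        public void requestStateChange(String objectId, String newState) {
            long startTime = System.currentTimeMillis() + 200; // 200 ms lead, illustrative
            String event = objectId + ";" + newState + ";" + startTime;
            for (ClientConnection c : clients) {
                c.send(event); // every client transitions the object at startTime
            }
        }
    }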
The communications system 1600 also includes a teleconferencing system 1620. Some embodiments of the teleconferencing system 1620 may include a telephony server 1622 for establishing calls with traditional telephones. For instance, the telephony server 1622 may include PBX or ISDN cards for making connections for users with traditional telephones (e.g., touch-tone phones) and digital phones. The telephony server 1622 may include mobile network or analog network connectors. The cards act as the terminal side of a PBX or ISDN line and, in cooperation with associated software, perform all low-level signaling for establishing phone connections. Events (e.g., ringing, connect, disconnect) and audio data in chunks (of, e.g., 100 ms) are passed from a card to a sound system 1626. The sound system 1626, among other things, mixes the audio between users in a teleconference, mixes in any external sounds (e.g., the sound of a jukebox, a person walking, etc.) and passes the mixed (drain) chunks back to the card and, therefore, to a user.
Some embodiments of the teleconferencing system 1620 may transcode calls into VoIP, or receive VoIP streams directly from third parties (e.g., telecommunication companies). In those embodiments, events would originate not from the cards, but transparently from an IP network.
Some embodiments of the teleconferencing system 1620 may include a VoIP server 1624 for establishing connections with users who call in with VoIP phones. In this case, a client (e.g., the client 160 of Figure 1) may contain functionality by which it tries to connect to a VoIP soft-phone using, for example, an XML-socket connection. If the client detects the VoIP phone, it enables VoIP functionality for the user. The user can then (e.g., by the click of a button) cause the client to establish a connection by issuing a CALL command via the socket to the VoIP phone, which calls the VoIP server 1624 while including information necessary to authenticate the VoIP connection.
The world server 1616 associates each authenticated VoIP connection with a client connection. The world server 1616 associates each authenticated PBX connection with a client connection.
For devices that are enabled to run Telnet sessions, a user could establish a Telnet session to receive information, questions and options, and also to enter commands. For Telnet-enabled devices, the means 1617 could provide a written description of a virtual environment.
The telephony system 1622 can also allow users of audio-only devices to control objects in a virtual environment. A user with only an audio-only device can experience sounds of the virtual environment as well as speak with others, but cannot see sights of the virtual environment. The telephony system 1622 can use phone signals (e.g., DTMF, voice commands) from phones to control the actions of their corresponding representation in the virtual environment.
The audio-only device generates signals for selecting and controlling objects in the virtual representation, and the telephony system 1622 translates the signals and informs the server system to take action, such as changing the state of an object. As examples, the signals may be dial-tone (DTMF) signals, voice signals, or some other type of phone signal. Consider a touch-tone phone. Certain buttons on the phone can correspond to commands. A user with a touch-tone phone or DTMF-enabled VoIP phone can execute a command by entering that command using DTMF tones. Each command can be supplied with one or more arguments. An argument could be a phone number or other number sequence. In some embodiments, voice commands could be interpreted and used.
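A sketch of such signal translation for DTMF input is given below. The CALL code 2255 follows the example given later in the text; the MOVE code 6683 (which also spells the word on a keypad) and the dispatch interface are hypothetical assumptions.

    import java.util.HashMap;
    import java.util.Map;

    /** Sketch of translating DTMF digit sequences into server commands:
     *  a command code, then an optional argument, each terminated by '#'. */
    public final class DtmfCommandTranslator {

        public interface WorldServerApi {
            void execute(String command, String argument);
        }

        private static final Map<String, String> COMMANDS = new HashMap<>();
        static {
            COMMANDS.put("2255", "CALL"); // 2255 spells CALL on a keypad
            COMMANDS.put("6683", "MOVE"); // hypothetical: 6683 spells MOVE
        }

        private final WorldServerApi server;

        public DtmfCommandTranslator(WorldServerApi server) {
            this.server = server;
        }

        /** Accepts, e.g., "2255#4915551234#" and dispatches CALL with the number. */
        public void onDigits(String digits) {
            String[] parts = digits.split("#");
            String command = COMMANDS.get(parts[0]);
            if (command != null) {
                String argument = parts.length > 1 ? parts[1] : "";
                server.execute(command, argument);
            }
        }
    }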
The server system can also include a means 1617 for providing an audio description of the virtual environment. For example, a virtual environment can be described to a user from the perspective of the user's avatar. Objects that are closer to the user's avatar might be described in greater detail. The description may include or leave out detail to keep the overall length of the description approximately constant. The user can request more detailed descriptions of certain objects, upon which additional details are revealed. The server system can also generate an audio description of options in response to a command. The teleconferencing system mixes the audio description (if any) and other audio, and supplies the mixed sound data to the user's audio-only device. For Telnet-enabled devices, the means 1617 could provide a written description of a virtual environment. For other audio-only devices, the means 1617 could include a speech synthesis system for providing a spoken description, which is heard on the audio-only device.

A sound system 1626 can play sound clips, such as sounds in the virtual environment. The sound clips are synchronized with state changes of the objects in the virtual environment. The sound system 1626 starts and stops the sound clips at the state transition start and stop times indicated by the world server 1616.
The sound system 1626 can mix sounds of the virtual environment with audio from the teleconferencing. Sound mixing is not limited to any particular approach, and may be performed as described above. The teleconferencing system may receive a list of patches (sets of coefficients) and go through the list. The teleconferencing system can also use heuristics to determine whether it has enough time to patch all connections. If not enough time is available, packets are dropped.
The VE server system 1610 may also include one or more servers that offer additional services. For example, a web container 1618 might be used to implement servlet and JavaServer Pages (JSP) specifications to provide an environment for Java code to run in cooperation with the web server 1612.
Figure 29 shows several other services that can be provided by the VE server system 1610 additionally or alternatively to the services shown in Figure 16.
A service repository 1632 provides information about the other services offered by the VE server system.
A context repository 1634 provides information for bootstrapping and configuring the client at startup and at runtime, such as how to set up the virtual environment, dependencies between modules, and the behavior of objects. Preferably, this is done using a domain-specific language.
An object repository 1636 provides the clients with information about the environment and the objects it contains. For this purpose, it holds the information about the objects in a format that allows different object implementations, or different versions thereof, to use and modify it (e.g., an XML format).
An authentication service 1641 verifies the user's credentials, such as nickname and password, and supplies the client with a token that can be used for authenticating with the other services. The token may be stored in a cookie.
A profile service 1642 provides user profile data and actions, such as asking a user, from that user's profile, to become one's friend.
An account service 1643 provides information about the user's available funds.

A room service 1644 manages rooms by providing methods for clients to enter or leave a virtual environment. Furthermore, it tracks all state changes inside virtual environments. All current states of the room and the avatars can be retrieved by clients to get a current snapshot. If a user wants to log into a room that is not currently handled, the service opens the room and logs the user in. Clients can connect to the messaging service to be informed about room changes and to send changes to other clients. The room service also computes the sound coefficients, which it sends to the sound system 1626, and controls the playback of audio samples.
A call service 1645 provides information about phone rates and can be used by the client to initiate phone connections.
A mail service 1646 can be used by the client to send messages to a specific set of destinations, e.g. services that further process those messages.
All servers in the communications system 1600 can be run on the same machine, or distributed over different machines. Communication may be performed by a remote invocation call. For example, an HTTP or HTTPS-based protocol (e.g. SOAP) can be used by the server(s) and network-connected devices to transport the clients and communicate with the clients.
Reference is made to Figures 9 and 24, which illustrate a first approach for mixing sound. The world server 1616 generates data such as sound coefficients, which the sound system 1626 uses to vary the audio characteristics (e.g., audio volume). The sound coefficients or other data vary the audio volume or other audio characteristics as a function of closeness of object pairs.
At block 2410, locations of all sound sources in a virtual environment are determined. Sound sources include objects in a virtual environment (e.g., a jukebox, speakers, a running stream of water). Sound sources also include those users who are talking. A sound source could be multimedia from an Internet connection (e.g., audio from a YouTube video).
The following functions are performed for each drain in the virtual environment. A drain refers to the representation of a user who can hear sounds in the virtual environment. At block 2420, closeness of each sound source to a drain is determined. This function is performed for each sound drain in the virtual environment. The closeness is not limited to distance. The world server 1616 can perform this function, since it maintains the information about location of the sound sources.
At block 2430, a coefficient for each drain/source pair is generated. Each coefficient varies the volume of sound from a source as a function of its closeness to the drain. This function may also be performed by the world server 1616, since it maintains information about locations of the objects. The world server 1616 supplies the sound coefficients to the sound system 1626.
The sound from a source to a drain can be cut off (that is, not heard) if the source is outside of the audio range of the drain. The coefficient would reflect such cut-off (e.g., by being set to zero or close to zero). The world server 1616 can determine the range, and whether cut-off occurs, since it keeps track of the object states.
At block 2440, audio streams from the audio sources are weighted as a function of closeness to the drain, and the weighted streams are combined and sent back on a phone line or VoIP channel to a user. The sound system 1626 may include a processor that receives a list of patches (sets of coefficients) and goes through the list. The processor can also use heuristics to determine whether it has enough time to patch all connections. If not enough time is available, packets are dropped.
A more preferred approach is shown in Fig. 18. Again, the world server 1616 generates sound coefficients, which the sound system 1626 uses to vary the audio characteristics (e.g., audio volume) of sound data that goes from sound sources to sound drains. A sound drain refers to the representative object of a user who can hear sounds in the virtual environment. A sound coefficient can vary the audio volume or other audio characteristics as a function of closeness of a source and a drain.
At block 1810, locations of all sound sources in a virtual environment are determined. Sound sources include objects in a virtual environment (e.g., a jukebox, speakers, a running stream of water). Sound sources also include the representative objects of those users who are talking. A sound source could be multimedia from an Internet connection (e.g., audio from a YouTube video).
The following functions are performed for each drain in the virtual environment. At block 1820, closeness of each sound source to a drain is determined. This function is performed for each sound drain in the virtual environment. The closeness is not limited to distance. The world server 1616 can perform this function, since it maintains the information about location of the sound sources.
At block 1830, a coefficient for each drain/source pair is computed. Each coefficient varies the volume of sound from a source as a function of its closeness to the drain. This function may also be performed by the world server 1616, since it maintains information about locations of the objects. The world server 1616 supplies the sound coefficients to the sound system 1626.
The sound from a source to a drain can be cut off (that is, not heard) if the source is outside of an audio range of the drain. The coefficient would reflect such cut-off (e.g., by being set to zero or close to zero). The world server 1616 can determine the range, and whether cut-off occurs, since it keeps track of the object states.
At block 1840, sound data from each sound source is adjusted with its corresponding coefficient. As a result, the sound data from the sound sources are weighted as a function of closeness to a drain.
At block 1850, the weighted sound data is combined and sent back on a phone line or VoIP channel to a user. The sound system 1626 may include a processor that receives a list of patches (sets of coefficients) and goes through the list. The processor can also use heuristics to determine whether it has enough time to patch all connections. If not enough time is available, packets are dropped.
In addition to or instead of the sound mixing illustrated in Figures 9 and 18, to preserve computing power and decrease latencies, the teleconferencing system 1620 could switch source/drain pairs together into direct connections. This might be done if the world server 1616 determines that two users can essentially only hear each other. The teleconferencing system 1620 could also premix some or all sources for several drains whose coefficients are similar. In the latter case, each user's own source may have to be subtracted from the joined drain to yield his drain.
A user can utilize both a client device 120 and an audio-only device 130 during a teleconference. The client device 120 is used to interact with the virtual representation and find others to speak with. The audio-only device 130 is used to speak with others.
However, some users might only have access to audio-only devices. Yet such users can still control objects in a virtual representation. For example, such users can move their representative objects around a virtual representation to listen in on teleconferences, and approach and speak with other users. By moving its representative object around a virtual environment, a user having only an audio-only device can hear the sounds, but not see the sights, that the virtual environment offers.
Reference is now made to Figure 17. To start a session with only an audio-only device, an audio-only device establishes audio communications with the teleconferencing system (block 1710). With a traditional telephone, the user can call a virtual representation (e.g., by calling a unique phone number, or by calling a general number and entering additional data such as a user ID and PIN, via DTMF). With a VoIP phone, a user could for instance call a virtual representation by calling its unique VoIP address.
The teleconferencing system informs the server system of the session (block 1715). The server system assigns the user to a location within a virtual representation (block 1720). The audio-only device generates signals for selecting and controlling objects in the virtual representation (block 1730). The signals are not limited to any particular type. As examples, the signals may be dial-tone (DTMF) signals, voice signals, or some other type of phone signal.
Consider a touch-tone phone. Certain buttons on the phone can correspond to commands. A user with a touch-tone phone or DTMF-enabled VoIP phone can execute a command by entering that command using DTMF tones. Each command can be supplied with one or more arguments. An argument could be a phone number or other number sequence. In some embodiments, voice commands could be interpreted and used.
A command argument might expect a value from a list of options. The options may be structured in a tree so that the user selects a first group with one digit and is then presented the resulting subsets of remaining options and so on. The most probable options could be listed first.
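A tree of options navigated one digit at a time might be sketched as follows; the node labels and digit assignments are illustrative. For example, a root menu could offer '1' for calling a user and '2' for moving the avatar, with each choice opening a narrower submenu.

    import java.util.LinkedHashMap;
    import java.util.Map;

    /** Illustrative option tree: each digit selects a subtree whose
     *  remaining options are then read to the user. Insertion order is
     *  preserved, so the most probable options can be listed first. */
    public final class OptionTree {

        public final String label;
        public final Map<Character, OptionTree> children = new LinkedHashMap<>();

        public OptionTree(String label) {
            this.label = label;
        }

        /** Adds a child option reachable by pressing the given digit. */
        public OptionTree addOption(char digit, String childLabel) {
            OptionTree node = new OptionTree(childLabel);
            children.put(digit, node);
            return node;
        }

        /** Follows one digit; an unknown digit leaves the menu unchanged. */
        public OptionTree select(char digit) {
            return children.getOrDefault(digit, this);
        }
    }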
For example, a user could press '0' to enter a command menu where all available commands are read to the user. The user can then enter a CALL command (e.g., 2255) followed by the # sign. The user may then be asked to identify the person to call, e.g., by saying that person's name, entering that person's phone number, entering a code corresponding to that person, etc. Instead of pressing a button to enter the command menu, the user could speak a catchword, such as "Computer." The teleconferencing system could also detect, process and act upon audio signals before a user enters a command menu. For example, the teleconferencing system could analyze the user's voice, detect a mood change, and communicate it to the server system. The server system, in response, might modify the user's representative object to reflect that mood change.
Another command could cause an object to move within its virtual environment. Arguments of that command could specify direction, distance, new location, etc.
Another command could allow a user to switch to another virtual environment, and an argument of that command could specify the virtual environment. Another command could allow a user to join a teleconference. Another command could allow a user to request information about the environment or about other users. Another command could allow one user's avatar to take another user's avatar by the hand, whereby the latter avatar would follow (be piggybacked to) the former avatar.
Another command could allow a user to select an object representing an Internet resource, such as a web page. Arguments could specify certain links, URLs or bookmarks. For example, a list of available links could be read to the user, who enters an argument to select a link (e.g., an Internet radio site). In this manner, telephones and other devices without browsers can be used to access content on the Internet. For example, a virtual environment includes an Internet object. When the object is selected, a connection is made to a site that provides streaming audio. The server system supplies the streaming audio to the teleconferencing system, which mixes the streaming audio on the user's phone line.
Another command could allow a user to give another user or a group of users certain rights or access to one or more of his files or directories. Another command could allow a user to transfer objects (e.g., files, tokens or currency units) to other users. Another command could allow a user to record and leave voice messages for other users (voice messages could be converted to text and left as text messages). Another command could allow a user to present media (such as videos, sound samples and images) to other users (e.g., on a virtual screen), change its representative object (e.g., change the mood of an avatar), initiate or participate in polls or play games.
The teleconferencing system receives and translates the signals and informs the server system to take action (block 1740), such as changing the state of an object.
The teleconferencing system can play audio clips, such as sounds in the virtual environment (block 1750).
The server system can also synchronize the sound clips with state changes of the virtual representation.
The server system can also provide an audio description of the virtual environment (block 1750). For example, a virtual environment can be described to a user from the perspective of the user's avatar.
Objects that are closer to the user's avatar might be described in greater detail. The description may include or leave out detail to keep the overall length of the description approximately constant. The user can request more detailed descriptions of certain objects, upon which additional details are revealed. The server system can also generate an audio description of options in response to a command (block 1750).
The teleconferencing system mixes those audio descriptions with the other audio for the user and supplies the mixed sound data to the user's audio-only device (block 1760).
The server system can also generate data for controlling audio characteristics over time (block 1770). For example, the volume of a conversation between two users is a function of distance and/or orientation of their two avatars in the virtual environment. In this example, sound gets louder as the avatars move closer together, and sound gets softer as the avatars move further apart. The server system generates sound coefficients that vary the volume of sound between two users as a function of the distance between the two users. The coefficients are used by the teleconferencing system to vary sound volume over time (block 1780). In this manner, the server system commands the teleconferencing system to attenuate or modify sounds so the conversation is consistent with the virtual environment.

Reference is made to Figure 19, which illustrates an exemplary web-based system 1900 similar to the one shown in Fig. 16. The communications system 1900 includes a VE server system 1910. The "VE" refers to virtual environment.
Client devices are referenced by numeral 1902. Phones are referenced by numeral 1904.
The VE server system 1910 hosts a website, which includes a collection of web pages, images, videos and other digital assets. The VE server system 1910 includes one or more web servers 1912 for serving web pages, and one or more media servers 1914 for storing video, images, and other digital assets.
One or more of the web pages embed client files. Files for a Flash® client, for instance, are made up of one or more separate Flash® objects (.swf files) that are served by the web server 1912 (some of which can be loaded dynamically when they are needed).
A client is not limited to a Flash® client. Other browser-based clients include, without limitation, Java™ applets, Microsoft® Silverlight™ clients, .NET applets, Shockwave® clients, scripts such as JavaScript, etc. A downloadable, installable program could even be used.
Using a web browser, a client device 1902 downloads web pages from a web server 1912 and then downloads the embedded client files from a web server 1912. The client files are loaded into the client device, and the client is started. The client starts running the client files and loads the remaining parts of the client files (if any) from a web server 1912.
An entire client or a portion thereof may be provided to a client device. Consider the example of a Flash® client including a Flash® player and one or more Flash® objects. The Flash® player is already installed on a client device. When .swf files are sent to and loaded into the Flash® player, the Flash® player causes the client device to display a virtual environment. The client also accepts inputs (e.g., keyboard inputs, mouse inputs) that command a user's representative object to move about and experience the virtual environment.
The server system 1910 also includes one or more world servers 1916. The "world" refers to a set of representations of the virtual environment provided by the server system 1910. When a client starts running, it opens a connection with a world server 1916. The server system 1910 selects a description of a virtual environment and sends the selected description to the client. The selected description contains links to graphics and other media for the virtual environment. The description also contains coordinates and appearances of all objects in the virtual environment. The client loads media (e.g., images) from a media server 1914, and projects the images (e.g., in an isometric or 3-D view).
The client displays objects in the virtual environment. Some of these objects are user representative objects such as avatars. The animated views of an object could comprise pre-rendered images or just-in-time rendered 3D models and textures; that is, objects could be loaded as individual Shockwave® objects, parameterized generic Shockwave® objects, images, movies, 3D models optionally including textures, and animations. Users could have unique/personal avatars or share generic avatars.
When a client device 1902 wants an object to move to a new location in the virtual environment, its client determines the coordinates of the new location and a desired time to start moving the object, and generates a request. The request is sent to the world server 1916.
The world server 1916 receives a request and updates the data structure representing the "world." The world server 1916 manages each object state in one or more virtual environments, and updates the states that change. Examples of states include an avatar's state, the objects it is carrying, the user's state (account, permissions, rights, audio range, etc.), and call management. When a user commands an object in a virtual environment to a new state, the world server 1916 commands all clients represented in the virtual environment to transition the state of that object, so client devices display the object at roughly the same state at roughly the same time. The world server 1916 may also perform collision detection and avoidance and path finding, and ensure, in general, consistent (e.g., physically correct) behavior.
The world server 1916 can also manage objects that transition gradually or abruptly. When a client device commands an object to transition to a new state, the world server 1916 receives the command and generates an event that causes all of the clients to show the object at the new state at a specified time.
The world server 1916 generates coefficients for the sound model. For example, the world server 1916 keeps track of distances between objects, and generates the coefficients as a function of the distance between the objects. The world server 1916 supplies the coefficients to a phone system 1920, which applies the coefficients to the audio data.
The phone system 1920 establishes phone connections with traditional phones (landline and cellular), VoIP phones, and other phones 1904. Some embodiments of the phone system 1920 may include one or more telephony servers 1922 for establishing calls with phones via a public switched telephone network (PSTN). For instance, a telephony server 1922 may include PBX or ISDN cards for making connections for users with traditional telephones (e.g., touch-tone phones) and digital phones. The telephony server 1922 may include mobile network or analog network connectors. These cards act as the terminal side of a PBX or ISDN line and, in cooperation with associated software, perform all low-level signaling for establishing phone connections. Events (e.g., ringing, connect, disconnect) and audio data in chunks (of, e.g., 100 ms) are passed from a card to a sound system 1926. The sound system 1926, among other things, mixes the audio between users in a teleconference, mixes in any external sounds (e.g., the sound of a jukebox, a person walking, etc.) and passes the mixed (drain) chunks back to the card and, therefore, to a user.
Some embodiments of the phone system 1920 may include one or more VoIP servers 1924 for establishing connections with users who call in with VoIP phones. In this case, a client (e.g., the client 160 of Figure 1) may contain functionality by which it tries to connect to a VoIP soft-phone using, for example, an XML-socket connection. If the client detects the VoIP phone, it enables VoIP functionality for the user. The user can then (e.g., by the click of a button) cause the client to establish a connection by issuing a CALL command via the socket to the VoIP phone, which calls a VoIP server 1924 while including information necessary to authenticate the VoIP connection.
Some embodiments of the phone system 1920 may transcode calls into VoIP, or receive (and possibly transcode) VoIP streams directly from third parties (e.g., telecommunication companies). In those embodiments, events would originate not from the cards, but transparently from an IP network.
The world servers 1916 can associate each authenticated VoIP connection with a client connection, if existent. The world servers 1916 can associate each authenticated PBX connection with a client connection, if existent.
The server system 1910 can provide the same virtual representation to different kinds of client devices 1902, possibly with different visual representations (e.g. 3D, isometric, and textual), whereby users of those different client devices 1902 can still interact with each other. For devices that are enabled to run text sessions, such as Telnet sessions, a user could establish a text session to receive information, questions and options, and also to enter commands. For textual devices, a written description of a virtual environment could be provided.
The phone system 1920 can also allow users of phones to control objects in a virtual environment. A user without a client device and with only a phone can experience sounds of the virtual environment as well as speak with other users (having or not having client devices 1902), even if that user cannot see sights of the virtual environment. The phone system 1920 can accept phone signals (e.g., DTMF, voice commands) from phones to control the actions of their corresponding representation in the virtual environment. The phone system 1920 could also receive SMS or MMS messages to control these actions.

A phone 1904 generates signals for selecting and controlling objects in the virtual representation, and the phone system 1920 translates the signals and informs the server system to take action, such as changing the state of an object. As examples, the signals may be dial-tone (DTMF) signals, voice signals, or some other type of phone signal. Consider a touch-tone phone. Certain buttons on the phone can correspond to commands. A user with a touch-tone phone or DTMF-enabled VoIP phone can execute a command by entering that command using DTMF tones. The telephony server 1922 detects the (in-band) DTMF tones and converts them into (out-of-band) control signals, which are passed to the world server 1916. Each command can be supplied with one or more arguments. An argument could be a phone number or other number sequence. In some embodiments, voice commands could be interpreted and used.
The server system 1910 can also include a server 1917 for providing an audio description of a virtual environment. For example, a virtual environment can be described to a user from the perspective of the user's avatar. Objects that are closer to the user's avatar might be described in greater detail. The description may include or leave out detail to keep the overall length of the description approximately constant. The user can request more detailed descriptions of certain objects, upon which additional details are revealed. The server system 1910 can also generate an audio description of options in response to a command. The phone system 1920 mixes the audio description (if any) and other audio, and supplies the mixed sound data to the user's phone.
The sound system 1926 can play sound clips, such as sounds in the virtual environment. The sound clips are synchronized with state changes of the objects in the virtual environment. The sound system 1926 starts and stops the sound clips at the state transition start and stop times indicated by the world server 1916.
The sound system 1926 can mix sounds of the virtual environment with audio from the phones 1904. Sound mixing is not limited to any particular sound model. The phone system 1920 may receive a list of patches (sets of coefficients) and go through the list.
The VE server system 1910 may also include one or more servers that offer additional services. For example, one or more web containers 1918 might be used to implement servlet and JavaServer Pages (JSP) specifications to provide an environment for Java code to run in cooperation with the web servers 1912.
All servers in the system 1900 can be run on the same machine, or distributed over different machines. Communication may be performed using remote invocation. For example, an HTTP or HTTPS-based protocol (e.g., SOAP) can be used by the server(s) and network-connected devices to transport the clients and communicate with the clients.

Reference is now made to Figure 20, which illustrates an example of using the system 1900. At block 2000, a user is allowed to start a session. For example, using a web browser, a user enters a web site and logs into the system 1900. The provider of the service starts the session.
After the session is started, a virtual environment is presented to the user (block 2010). If, for example, the service provider runs a web site, a web browser can download and display a virtual environment to the user.
A user can control its representative object to move around a virtual environment to experience the different sights and sounds that the virtual environment provides (block 2020). For instance, a representative object could turn on a jukebox and select songs from a playlist. The jukebox would play the selected songs. Users could also drag and drop songs from a shared or local file folder onto the jukebox to have the songs uploaded and played.
A user can also move its representative object around a virtual environment to interact with other users represented in the virtual environment (block 2040). The user's representative object may be moved by clicking on a location in the virtual environment, pressing a key on a keyboard, pressing a key on a telephone, entering text, entering a voice command, etc.
There are various ways in which the user can interact with other users in the virtual environment. One way is by wandering around the virtual environment and hearing conversations that are already in progress. As the user moves its representative object around the virtual environment, that user can hear voices and other sounds.
The user can then participate in a conversation or otherwise interact with others (block 2040) by becoming voice-enabled via phone (block 2030). Becoming voice-enabled allows the user to speak with others who are voice-enabled. For example, the user wants to have a teleconference using a phone. To enter into a teleconference, the user uses the phone to call the communications system. Using a traditional telephone, the user can call the virtual environment that he is in (e.g., by calling a unique phone number, or by calling a general number and entering additional data such as user ID and PIN, via DTMF). Using a VoIP phone, a user could call a virtual environment by calling its unique VoIP address.
The service provider can join the phone call with the session in progress if it can recognize the user's phone number (block 2032). If the service provider cannot recognize the user's phone number, the user starts a new session via the phone (block 2034), the user identifies himself (e.g., by entering additional data such as a user ID and PIN via DTMF) and then the service provider merges the new phone session with the session already in progress (block 2036). Instead of the user calling the service provider, the user can request the service provider to call the user (block 2038).
Once voice-enabled (block 2030), the user can use a phone to talk to others who are voice-enabled. Once voice-enabled (block 2030), the user remains voice-enabled until the user discontinues the call (e.g., hangs up the phone).
In some embodiments, the system 1900 allows a user to log into the teleconferencing service and enter into a conversation without accessing the web site (block 2060). A user might only have access to a touch-tone telephone or other phone 1904 that cannot display a virtual environment. Consider a traditional telephone. With only the telephone, the user can call a telephone number and connect to the service provider. The service provider can then add the user's representative object to the virtual environment. Via telephone signals (e.g., DTMF, voice control), the user can move its representative object about the virtual environment, listen to other conversations, meet other people and experience the sounds (but not sights) of the virtual environment. Although the user cannot see its representative object, others viewing the virtual environment can see the user's representative object.
More than one virtual environment may be hosted at any given time. If more than one virtual reality environment is available to a user, the user can move into and out of the different virtual environments, and thereby interact with even more people. Each of the virtual environments can be uniquely addressable via an Internet address or a unique phone number. The service provider can then place each user directly into the selected target virtual environment. Users can reserve and enter private virtual environments to hold private conversations. Users can also reserve and enter private areas of public environments to hold private conversations. A web browser or other graphical user interface could include a sidebar, a browser extension or other means for indicating different environments that are available to a user. The sidebar allows a user to move into and out of different virtual environments, and to reserve and enter private areas of a virtual environment.
Communication between users is not limited only to conversations via phones. Communication can occur in other ways. Examples include, without limitation, video streams, text chat messages, instant messenger messages, avatar gestures or moves, mood expressions, emoticons, and web pages.
The state of a virtual environment may be persistent in that it continues to exist throughout many user sessions and it continues to exist through the actions of different users. This allows a virtual environment to be modified by one user, and the modifications observed by others. For example, graffiti can be written on walls, a light switch in a virtual reality environment could be switched on and off, etc., as a way of signaling to another user. Objects in the virtual environment can be added, removed, moved and modified by a user as a way of signaling to another user. Examples of objects include sound sources (e.g., music boxes, bubbling fish tanks), data objects (e.g., a modifiable book with text and pictures), visualized music objects, etc.
Communication between users may be performed by sharing certain objects. The persistent state also allows "things" to be put on top of each other. A file can be dropped onto a user or dropped onto the floor as a way of sharing the file with the user. A music or sound file could be dropped on a jukebox. A picture or video could be dropped on a projector device to trigger playback / display. A multimedia sample (e.g., an audio clip or video clip containing a message) could be "pinned" to a whiteboard.
Referring back to Fig. 2, a virtual representation and a teleconference are generated by two different systems 140 and 150. In addition, the different clients 160 that display the virtual representation might not communicate directly with each other (in a pure client-server system, they won't). Yet the communications system 110 ensures that the clients 160 display roughly the same object transitions in a virtual representation at roughly the same time.
If a user commands a new object state in a virtual representation, his client does not directly inform other clients of the new state. Nor does the client immediately transition the object to the new state. Instead, the client sends a request to the server system 150 and awaits instructions from the server system 150.
The server system 150 causes all of the clients displaying a virtual representation to gradually transition an object to its new state by a specified time. When a state of an object in an environment has changed, the server system 150 informs all necessary clients of the change. In this manner, the server system 150 ensures that all client devices 120 show roughly the same object transition in the virtual representation at roughly the same time.
The communications system 110 can host multiple virtual representations simultaneously. The communications system 110 can host multiple teleconferences in each virtual representation.
If more than one virtual representation is available to a user, the user can move in and out of the different virtual representations. Each of the virtual representations can be uniquely addressable via an Internet address or a unique phone number. The server system 150 can then place each user directly into the selected virtual representation. Users can reserve and enter private virtual representations to hold private conversations. Users can also reserve and enter private areas of virtual representations to hold private conversations. A web browser or other graphical user interface could include a sidebar or other means for indicating different virtual representations that are available to a user.
Thus, a user can make use of both a client device 120 and an audio-only device 130 during a teleconference. The client device 120 is used to interact with the virtual representation and find others to speak with. The audio-only device 130 is used to speak with others.
Additional reference is made to Figure 22, which illustrates an example of how the communications system 110 manages the state of an object when a client device requests a new state for that object. To further illustrate this example, the object will be described as an avatar that represents a user, and the new state will be a new location of the avatar.
On the client side, the client receives an input to change the state of the object (block 2210). For example, a new location for the object is received when the user clicks on that location in the virtual representation.
In response, the client 160 computes coordinates in the virtual representation from the clicked screen coordinates of the new location (block 2215) and sends a state change request to the server system (block 2220). The state change request includes the coordinates of the new location. The state change request may also include a desired time at which the avatar should start moving toward the new location (block 2215). The desired time should be slightly in the future so that an event can be communicated to all clients 160 before the time arrives. Then, the client 160 goes into a wait state (block 2225).
The server system 150 validates the request (block 2230). For example, the server system 150 checks whether the virtual representation contains a path that allows the avatar to move to the new location. This may include determining whether the coordinates of the new location lie within a walkable space and whether the avatar is allowed to walk there from its current location at the specified time. If the desired time has already passed, or does not leave enough time to communicate the event, the starting time is shifted slightly into the future as necessary.
If the request is validated, the server system 150 can also compute a path and arrival time for the representation to transition from the current state to the new state (block 2230). For example, the server system 150 may use a wayfinding algorithm to compute a walking route with waypoints and arrival times for each waypoint. An exemplary wayfinding algorithm is described below. The server system 150 updates a master model, which is a data structure that contains all object states in time (block 2235). For example, the server system 150 adds the avatar's waypoints and their arrival times to the master model.
The server system 150 then generates an event, which notifies all clients 160 of the updated object state (block 2240). For example, the event includes the start and stop times for each waypoint in the avatar's walking path. All of those clients 160 displaying the virtual representation will move the avatar to each of the waypoints at the same arrival times. Thus, all of those clients 160 will show roughly the same avatar motion at roughly the same time (roughly due, for instance, to imperfectly synchronized system clocks or system latencies).
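A minimal sketch of this synchronization idea follows; the event payload format and interpolation routine are assumptions for illustration, not the patent's wire protocol. Each client evaluates the same absolute waypoint times against its own (roughly synchronized) clock, which is why all clients show roughly the same motion at roughly the same time.

```python
import time

# Hypothetical event payload: waypoints with absolute arrival times,
# as broadcast by the server to every client showing this environment.
event = {
    "object_id": "avatar-42",
    "waypoints": [
        {"x": 0.0, "y": 0.0, "t": time.time() + 0.5},  # start slightly in the future
        {"x": 3.0, "y": 0.0, "t": time.time() + 2.0},
        {"x": 3.0, "y": 4.0, "t": time.time() + 4.5},
    ],
}

def position_at(waypoints, now):
    """Linearly interpolate the object's position at wall-clock time `now`."""
    if now <= waypoints[0]["t"]:
        return waypoints[0]["x"], waypoints[0]["y"]
    for a, b in zip(waypoints, waypoints[1:]):
        if now <= b["t"]:
            f = (now - a["t"]) / (b["t"] - a["t"])
            return (a["x"] + f * (b["x"] - a["x"]),
                    a["y"] + f * (b["y"] - a["y"]))
    last = waypoints[-1]
    return last["x"], last["y"]

# Each client renders the avatar at the interpolated position each frame.
x, y = position_at(event["waypoints"], time.time())
```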
In addition, the server system 150 can command the teleconferencing system 140 to play movement sounds at the appropriate time (block 2260). The teleconferencing system 140 plays the sound clip(s) at the designated time(s) (block 2270). For example, the server system 150 can provide a sound clip of the sound of footsteps as an avatar walks to a new location, and the teleconferencing system 140 plays the sound clip to the user whose avatar is walking.
The server system 150 also synchronizes the sound clips with the movement and state changes in the virtual representation.
The server system 150 can also generate data for controlling audio characteristics over time (block 2280). For example, the volume of a conversation between two users is a function of distance and/or orientation of their two avatars in the virtual environment. In this example, sound gets louder as the avatars move closer together, and softer as they move further apart. The server system 150 generates sound coefficients that vary the volume of sound between two users as a function of the distance between them. The coefficients are used by the teleconferencing system 140 to vary sound volume over time (block 2290). In this manner, the server system 150 commands the teleconferencing system 140 to attenuate or modify sounds so that the conversation is consistent with the virtual environment. Similarly, the server system 150 can command the teleconferencing system 140 to play sound clips, record user speech or modify operational parameters affecting sound quality.
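One plausible reading of the coefficient computation is sketched below; the linear falloff law, the audio_range default and the orientation weighting are assumptions, since the disclosure does not fix a specific formula.

```python
import math

def sound_coefficient(source, drain, audio_range=10.0):
    """Illustrative coefficient in [0, 1] for one source as heard by one drain.

    Volume falls off linearly with virtual distance and is scaled by how
    directly the drain's avatar faces the source; the falloff law here is
    an assumption, not prescribed by the specification."""
    dx, dy = source["x"] - drain["x"], source["y"] - drain["y"]
    distance = math.hypot(dx, dy)
    if distance >= audio_range:
        return 0.0  # out of audio range: excluded from the mix entirely
    distance_gain = 1.0 - distance / audio_range
    # Orientation factor: 1.0 when facing the source, 0.5 when facing away.
    facing = math.atan2(dy, dx)
    deviation = abs((facing - drain["heading"] + math.pi) % (2 * math.pi) - math.pi)
    orientation_gain = 1.0 - 0.5 * (deviation / math.pi)
    return distance_gain * orientation_gain
```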
In general, an object in a virtual representation has properties that allow a user to perform certain actions on it (e.g., sit on, move, open). An object (e.g., a Flash object) obeys certain specifications (e.g., an API). As but one example, an object can be a jukebox having methods (actions) such as play/stop/pause, and properties such as volume, song list, and song selection. The server system 150 would generate an event when the jukebox is turned on and a song selected. The server system 150 would command the teleconferencing system to play the selected clip. A client 160 can optionally compute a transition path at block 2215 and send the transition path to the server system 150. This might be done to ease the workload (at block 2230) on the server system 150, which wouldn't have to compute the transition path.
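A jukebox object of the kind described above might expose an interface along these lines; the class shape, method names and the dispatch_event hook are hypothetical, chosen only to mirror the play/stop/pause actions and volume/song-list properties just named.

```python
class Jukebox:
    """Hypothetical scriptable object obeying a play/stop/pause API."""

    def __init__(self, server, song_list):
        self.server = server          # server system that dispatches events
        self.song_list = list(song_list)
        self.volume = 0.8
        self.selection = None

    def play(self, song):
        if song not in self.song_list:
            raise ValueError(f"unknown song: {song}")
        self.selection = song
        # The server generates an event so every client sees the jukebox
        # turn on, and commands the teleconferencing system to play the clip.
        self.server.dispatch_event("jukebox.play", song=song, volume=self.volume)

    def pause(self):
        self.server.dispatch_event("jukebox.pause")

    def stop(self):
        self.selection = None
        self.server.dispatch_event("jukebox.stop")
```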
Reference is now again made to Fig. 24. In addition to or instead of the sound mixing illustrated in Figure 24, to conserve computing power and decrease latencies, the teleconferencing system 1620 could switch source/drain pairs over to direct connections. This might be done if the world server 1616 determines that two users can essentially hear only each other. The teleconferencing system 1620 could also premix some or all sources for several drains whose coefficients are similar. In the latter case, each user's own source may have to be subtracted from the joint mix to yield that user's drain.
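The premixing-with-subtraction idea can be sketched as follows, assuming one shared coefficient per group and NumPy audio frames; both assumptions are for illustration.

```python
import numpy as np

def premixed_drains(sources, coefficient):
    """Sketch of premixing for drains whose coefficients are all similar.

    sources: dict mapping user id -> one audio frame (1-D numpy array).
    Returns one output frame per user: the joint mix minus that user's
    own source, so nobody hears themselves."""
    joint_mix = coefficient * sum(sources.values())
    return {
        user: joint_mix - coefficient * frame
        for user, frame in sources.items()
    }

# Example: three 4-sample frames mixed with a shared coefficient of 0.5.
frames = {u: np.random.randn(4) for u in ("alice", "bob", "carol")}
outputs = premixed_drains(frames, 0.5)
```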
The telephony system 1622 (see Figure 16) can also allow users of audio-only devices to control objects in a virtual environment, and to move from one virtual environment to another. A user with only an audio-only device can experience sounds of the virtual environment as well as speak with others, but cannot experience sights of the virtual environment. The telephony system 1622 can use phone signals (e.g., DTMF, voice commands) from phones to control the actions of their corresponding representations in the virtual environment.
Certain buttons on a phone can correspond to commands. A user with a touch-tone phone or DTMF-enabled VoIP phone can execute a command by entering that command using DTMF tones. Each command can be supplied with one or more arguments. An argument could be a phone number or other number sequence. In some embodiments, voice commands could be interpreted and used.
A command argument might expect a value from a list of options. The options may be structured in a tree so that the user selects a first group with one digit and is then presented the resulting subsets of remaining options and so on. The most probable options could be listed first.
For example, a user could press '0' to enter a command menu where all available commands are read to the user. The user can then enter a CALL command (e.g., 2255) followed by the # sign. The user may then be asked to identify the person to call, e.g., by saying that person's name, entering that person's phone number, entering a code corresponding to that person, etc.
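A sketch of such a DTMF command reader follows; the CALL code (2255) comes from the example above, while the terminator handling, menu behavior and data structures are assumptions.

```python
# Sketch of a DTMF command reader. '0' opens the menu, digits accumulate
# into a command code, and '#' terminates the code -- matching the CALL
# example above (2255#). Prompt handling is hypothetical.
COMMANDS = {
    "2255": "CALL",  # from the example above
    # further codes (move, switch environment, join, info, ...) would go here
}

def read_commands(digits):
    """Consume an iterable of DTMF digits and yield completed commands."""
    buffer = ""
    for d in digits:
        if d == "0" and not buffer:
            yield ("MENU", None)  # read all available commands aloud
        elif d == "#":
            yield (COMMANDS.get(buffer, "UNKNOWN"), buffer)
            buffer = ""
        else:
            buffer += d

for cmd, code in read_commands("2255#"):
    print(cmd)  # -> CALL
```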
Another command could cause an avatar to move within its virtual environment. Arguments of that command could specify direction, distance, new location, etc. Another command could allow a user to switch to another virtual environment, and an argument of that command could specify the virtual environment. Another command could allow a user to join a teleconference. Another command could allow a user to request information about the environment or about other users. Another command could allow one user's avatar to take another user's avatar by the hand, whereby the latter avatar would follow (be piggybacked to) the former avatar.
For devices that are enabled to run Telnet sessions, a user could establish a Telnet session to receive information, questions and options, and also to enter commands.
Certain client devices could include Braille terminals. Braille devices can be used like text terminals.
For users that have only audio-only devices, the server system 1610 could include means 1617 for providing an alternative description of the virtual environment. For Telnet-enabled devices, the means 1617 could provide a written description of a virtual environment. For other audio-only devices, the system could include a speech synthesis system for providing a spoken description, which is heard on the audio-only device.
For example, a virtual environment can be described to a user from the perspective of the user's avatar. Objects that are closer to the user's avatar might be described in greater detail. The description may include or leave out detail to keep the overall length of the description approximately constant. The user can request more detailed descriptions of certain objects, upon which additional details are revealed.
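One way to keep the description length approximately constant is to rank objects by distance from the avatar and spend a fixed word budget in that order; the sketch below is only one assumption about how the means 1617 might do this.

```python
def describe_scene(objects, avatar_pos, word_budget=60):
    """Build a description of bounded length, nearest objects first.

    objects: list of (position, short_text, detailed_text) tuples.
    Closer objects get their detailed text while the budget lasts;
    farther objects get the short text or are omitted entirely."""
    def squared_dist(obj):
        (x, y), _, _ = obj
        ax, ay = avatar_pos
        return (x - ax) ** 2 + (y - ay) ** 2

    parts, used = [], 0
    for _, short, detailed in sorted(objects, key=squared_dist):
        text = detailed if used < word_budget // 2 else short
        words = len(text.split())
        if used + words > word_budget:
            break
        parts.append(text)
        used += words
    return " ".join(parts)
```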
The communications system 1600 provides teleconferencing without requiring the user to install software or to acquire special equipment (e.g., microphone, PC speakers, headset). If the system is web-based, a web browser can be used to connect to the VE server system 1610, download and run a client, and display the virtual environment. This makes it easy to connect to and use the communications system 1600.
Figures 25 and 26 illustrate a method of computing waypoints for a moving object. This method may be performed by the world server 1616 or by the client 160. Consider the exemplary space illustrated in Figure 25. The space is represented by a polygonal boundary 2510 and two polygonal obstacles 2520. The boundary 2510 has vertices A, B, C, D, E and F. One obstacle 2520 has vertices G, H, I, J and K, and the other obstacle 2520 has vertices L, M and N. The boundary 2510 might delineate the bounds of a virtual representation, whereas the obstacles 2520 might represent movable objects and fixtures in the virtual representation. The goal of the method is to find a path from a current location S to a new location T that is not obstructed by either the boundary 2510 or the obstacles 2520. A line segment is obstructed by the boundary 2510 if any portion of the line segment lies outside the boundary 2510. A line segment is obstructed by an obstacle 2520 if any portion of the segment lies inside the corresponding open polygon 2520.
The path may be formed by one or more line segments (i.e., a piecewise linear path). Internal vertices (i.e. excluding S and T) of the path are vertices of the boundary 2510 and obstacle(s) 2520. Vertices such as K are excluded as internal vertices, since a path formed by line segment GJ is shorter than a path formed by line segments GK and KJ. Vertices such as A, B, D, E and F are also excluded as internal vertices, since shorter paths can be formed with other vertices. A path could even follow a boundary (e.g., a line segment along vertices H and I).
Referring to Figure 26, the world server computes a visibility graph (block 2610), for example, using a planar sweep algorithm. The visibility graph includes vertices of the boundary 2510 and vertices of each obstacle 2520. Between each pair of vertices, the visibility graph also includes an edge, but only if the line segment between them is not obstructed by the boundary 2510 or by any obstacle 2520.
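A minimal sketch of the edge test follows: two vertices are joined in the visibility graph only if the segment between them properly crosses no boundary or obstacle edge. Degenerate cases (segments grazing a vertex, running along a boundary, or lying entirely outside the boundary) are deliberately simplified away here.

```python
def cross(o, a, b):
    """Signed area of triangle (o, a, b); the sign gives the turn direction."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def properly_intersects(p1, p2, q1, q2):
    """True if open segments p1p2 and q1q2 cross at an interior point."""
    d1, d2 = cross(q1, q2, p1), cross(q1, q2, p2)
    d3, d4 = cross(p1, p2, q1), cross(p1, p2, q2)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def visible(a, b, polygons):
    """Simplified edge test for the visibility graph: reject the edge a-b
    if it properly crosses any edge of the boundary or of an obstacle."""
    for poly in polygons:
        for i in range(len(poly)):
            q1, q2 = poly[i], poly[(i + 1) % len(poly)]
            if properly_intersects(a, b, q1, q2):
                return False
    return True
```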
The visibility graph is updated whenever an obstacle 2520 moves or a new obstacle appears (block 2620). For instance, the visibility graph is updated if a new avatar enters a virtual representation, or if an object (e.g., a chair) in a virtual environment is moved.
When an object is commanded to move, the new and current locations for the object are added to the visibility graph (block 2630), and the shortest path between the current and new locations is found (block 2640). An algorithm such as Dijkstra's algorithm may be run on the visibility graph to identify the edges of the shortest path.
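Given the visibility graph, the shortest-path step can be the textbook Dijkstra's algorithm with Euclidean edge weights, as in this sketch (which assumes the goal is reachable).

```python
import heapq
import math

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on a visibility graph.

    graph: dict mapping vertex -> iterable of visible neighboring vertices,
    where vertices are (x, y) tuples and the edge weight is the Euclidean
    distance. Returns the list of waypoints from start to goal."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v in graph[u]:
            nd = d + math.dist(u, v)
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    # Walk back from goal to start (assumes the goal was reached).
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return path[::-1]
```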
At any given time, multiple objects might be moving within a virtual representation. The shortest path determination (block 2640) can also include collision avoidance. One approach toward avoiding collisions between two moving objects is to transport each object instantly to its new location. However, collision avoidance is optional, as objects could be allowed to collide (e.g., pass through each other).
As mentioned above, a system according to the present invention is not limited to a single virtual representation. Rather, a system according to the present invention can host a plurality of independent virtual representations, assign different users to different representations, allow one or more teleconferences per virtual representation, and manage the state of (e.g., regulate motion of) the objects in each virtual representation. For example, a system could provide a first virtual environment including a club scene, and a second virtual environment including a beach scene. Some users could be assigned to the first virtual environment, experience sights and sounds of the club scene, and have teleconferences with those users represented by avatars in a virtual club. Other users could be assigned to the second virtual environment, experience sights and sounds of the beach scene, and have teleconferences with those users represented by avatars on a virtual beach. The server system would manage objects in both environments.
The server system can filter communications with the clients, sending communications only to those clients needing to change the state of an object in a particular virtual representation. The world server 1616 may perform either or both of the following functions:
(1) Create sessions, handle session timeouts, and destroy sessions. For instance, once a client is closed and the timeout has passed, that user's session is terminated.
(2) Dispatch events from the world server 1616 to only those clients that will be affected by the events. For instance, users in one virtual environment are not affected by events in another virtual environment. Therefore, the world server 1616 sends events affecting a virtual environment only to those clients represented in that virtual environment, as sketched below.
The filtering reduces communication overhead and, as a result, the traffic between the world server and the clients.
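The session handling and per-environment dispatch might be sketched as below; the session bookkeeping, timeout value and event shape are assumptions for illustration.

```python
import time

class WorldServer:
    """Sketch of session tracking and per-environment event dispatch."""

    SESSION_TIMEOUT = 120.0  # seconds; illustrative value only

    def __init__(self):
        # client_id -> {"env": env_id, "last_seen": timestamp, "send": callable}
        self.sessions = {}

    def dispatch(self, env_id, event):
        # Send the event only to clients represented in the affected
        # environment; clients in other environments never see it.
        for session in self.sessions.values():
            if session["env"] == env_id:
                session["send"](event)

    def reap_timeouts(self):
        now = time.time()
        for client_id, session in list(self.sessions.items()):
            if now - session["last_seen"] > self.SESSION_TIMEOUT:
                del self.sessions[client_id]  # destroy the timed-out session
```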
A communications system according to the present invention is not limited to any particular topology. Several exemplary topologies are illustrated in Figures 27a-27d. These topologies offer different ways in which clients can communicate with a server system. Figures 27a-27d do not illustrate topologies in which audio-only devices communicate with a teleconferencing system.
Reference is made to Figure 27a, which illustrates a pure client-server topology. Clients are represented by circles, and a server system is illustrated by a triangle. The server system may include one or more servers.
Reference is now made to Figure 27b, which illustrates a topology including clients (represented by circles), a server system (represented by a big triangle), and super nodes (represented by small triangles). Each super node functions as both a client and server. Functioning as a server, a super node serves data to all connected clients. Functioning as a client, a super node can display and interact with a virtual representation. The server system and super nodes coordinate to keep track of the objects in a virtual representation. The super nodes may be operated by users or by the communications service provider.
Reference is now made to Figure 27c, which illustrates a topology including peers (represented by hexagons), and a server system (represented by a triangle). Each peer connects to the server system, as indicated by a dashed line, to display and interact with a virtual representation. However, the peers are also interconnected, as indicated by the solid lines, and as such, can bypass the server system and pass certain data among themselves. Examples of such data include, but are not limited to, audio files (e.g., sound clips), static files (e.g., background images, user pictures), and live data (e.g., webcam streams). One of the peers can originate such data and/or receive such data from the server system, and pass the data to its peers.
To take load off the server system, the peers can also exchange data concerning a virtual environment as well as object commands. A transition path and times could be computed by a peer that commanded an object state change, and the path and times could be distributed to its peers. Such data could also be sent to the server system, so the server system can keep track of the objects in the virtual representation.
Reference is now made to Figure 27d, which illustrates a topology including a server system (represented by a triangle), clients (represented by circles), super-nodes (represented by squares), and peers (represented by hexagons). Peers exchange data among each other, and clients connect to one or more super-nodes (both illustrated with solid lines). If such a connection fails, a client can connect to a fallback peer (illustrated by dotted lines) which will then become a super-node. The clients and the peers may also connect or exchange data with the server system, as illustrated by the dashed lines.
The topology of Figure 27d offers advantages of peer-to-peer communication, including reducing computational load and traffic on the server while still allowing simple clients to participate. In contrast to a client that runs in a virtual machine, a peer or super node might require installation.
Reference is made to Figure 28, which illustrates an exemplary communications system 2800 including a server system 2810 and teleconferencing system 2820 that communicate with peers 2850. The server system 2810 provides a virtual representation to each peer 2850, and ensures that each peer 2850 displays roughly the same object transitions at roughly the same time.
The peers 2850 use peer-to-peer communication to exchange data among each other. Each peer 2850 includes a graphical user interface 2852, a sound mixer 2854, and audio input/output hardware 2856 (e.g., a microphone and speakers). Each peer 2850 can generate an audio stream with the audio I/O hardware 2856 and distribute that audio stream to one or more other peers 2850. The sound mixer 2854 of a peer 2850 weights the audio streams from other peers and audio streams from other audio sources. The weighted streams are combined in the sound mixer 2854, and the combined stream is outputted on the audio I/O hardware 2856. Sound coefficients for weighting the audio streams could be computed by a peer's graphical user interface 2852 or by the server system 2810. Peers 2850 could also send combined audio streams to other peers to preserve bandwidth.
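On the peer side, the sound mixer 2854 amounts to a weighted sum of incoming frames; the NumPy formulation below and the clipping step are assumptions for illustration.

```python
import numpy as np

def mix_frame(streams, coefficients):
    """Weight and combine one audio frame from each connected peer.

    streams: dict peer_id -> 1-D numpy array (one frame of samples).
    coefficients: dict peer_id -> float weight, e.g. computed from avatar
    distance by the GUI 2852 or supplied by the server system 2810."""
    out = np.zeros_like(next(iter(streams.values())), dtype=float)
    for peer_id, frame in streams.items():
        out += coefficients.get(peer_id, 0.0) * frame
    # Clip to the valid sample range before handing to the audio hardware.
    return np.clip(out, -1.0, 1.0)
```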
Peer communication can also be used to exchange data such as files and events instead of, or in addition to, loading it from the server system 2810. A peer-to-peer file sharing protocol such as BitTorrent can be used to transport static files. This reduces traffic on the server system's media servers because, optimally, each file is downloaded only once from the server system 2810.
Each user's media (e.g. representation/avatar graphics, profile pictures, files) could be seeded from the user's peer 2850 and distributed among the peers 2850. State change commands, text messages and webcam data or pictures could also be loaded from the server system 2810 only once and distributed in a peer-to-peer fashion to reduce traffic on the server system 2810.
The system is not limited to any particular architecture. For example, the system of Figure 1 can be implemented as a client-server system. In such a system, the service provider includes one or more servers, and the different user devices are client devices. Certain types of client devices (e.g., computers) can connect to the servers via a network such as the Internet. Other types of client devices can connect via different networks. For instance, a traditional telephone can connect via PSTN lines, VoIP phones can connect through the Internet, etc.
Teleconferencing according to the present invention can be performed conveniently. Entering into a teleconference can be as simple as going to a web site, and clicking a mouse button (maybe a few times).
Phone numbers do not have to be reserved. Pre-conference introductions do not have to be made.
Special hardware (e.g., web cameras, soundcards, and microphones) is not needed, since voice communication can be provided by a telephone. Communication is intuitive and, therefore, easy to learn.
Audio-visual dynamic multi-group communication is enabled. A user can move from one group to another and thereby change with whom they are communicating.
A system according to the present invention allows for a convergence and integration of different communication technologies. Teleconferences can be held by users having traditional phones, VoIP phones, devices with GUI interfaces and Internet connectivity, etc.

Claims

1. A method of controlling volume of sound data during a teleconference, the method comprising providing a virtual representation including objects that represent users in the teleconference; and controlling the volume of the sound data according to how the users change location and relative orientation of their objects in the virtual representation.
2. The method of claim 1, further comprising changing other audio characteristics of the sound data according to how the users interact with the virtual representation.
3. The method of claim 1, wherein objects in the virtual representation also have audio ranges, whereby the volume of the sound data is also controlled according to the audio ranges.
4. The method of claim 3, wherein the audio ranges are adjustable.
5. The method of claim 1, wherein the virtual representation is a virtual environment; and wherein the users are represented by avatars.
6. The method of claim 5, wherein volume of sound data between two users is a function of relative orientation of their avatars.
7. The method of claim 1, wherein the virtual representation is provided by a server system that computes a sound coefficient for each object that is a sound source with respect to a drain; and wherein for each user, controlling the volume includes applying those sound coefficients to the sound data of their corresponding objects, mixing the modified sound data and supplying the mixed sound data to the drain.
8. The method of claim 7, wherein the sound data is mixed according to V_drain(t) = Σ_{n=1..N} c_n · V_n(t), where c_n is the sound coefficient of the n-th source and V_n(t) its sound data.
9. A method comprising: providing a virtual representation; establishing phone connections with a plurality of users, the users represented by objects in the virtual representation, each user representative object being both sound drain and sound source; and for each drain, mixing sound data from different sound sources and providing the mixed data to the user associated with the drain, where volume of sound data from a source is adjusted according to a topology metric of the source with respect to the drain; whereby the users are not directly connected, but instead communicate through a synthesized auditory environment.
10. The method of claim 9, wherein mixing the sound data for each drain includes computing audio parameters for each paired source, each audio parameter controlling sound volume as a function of closeness of its corresponding source to the drain; and adjusting sound data of each paired source with the corresponding audio parameter, mixing the adjusted sound data of the paired sources, and providing the mixed sound data to the user associated with the drain.
11. The method of claim 9, wherein the virtual representation includes other objects that are sound sources, where volume of sound data from a source is adjusted according to a topology metric of the source with respect to the drain; and wherein adjusted sound data from the other objects is also mixed and supplied to the drain.
12. The method of claim 9, wherein the objects include audio ranges.
13. The method of claim 9, wherein the topology metric is virtual distance between a source and a drain.
14. The method of claim 9, wherein the topology metric includes distance and orientation.
15. The method of claim 9, whereby audio is clustered to reduce computational burden.
16. The method of claim 9, wherein sound is mixed according to V_drain(t) = vol_drain · Σ_{n=1..N} c_n · V_n(t), where vol_drain is the volume of the drain, c_n is the sound coefficient of the n-th source and V_n(t) its sound data.
17. The method of claim 9, wherein to reduce the computation burden of mixing the sound data for each drain, the sound data is mixed only for those sound sources making a significant contribution.
18. The method of claim 17, wherein audio ranges of certain objects are automatically set at or near zero, whereby the sound data of those certain objects are excluded from the mixing.
19. The method of claim 9, wherein a minimum distance between objects is imposed to reduce the computation burden of mixing the sound data.
20. The method of claim 9, wherein at least some sound data is premixed to reduce the computation burden of mixing the sound data; wherein the premixing includes mixing sound data from a group of sound drains and assigning a single coefficient per drain to the group.
21. The method of claim 9, wherein direct connections are made between a source and a drain to reduce the computation burden of mixing the sound data.
22. A communications system comprising: phone-based teleconferencing means; and means for providing a virtual representation including objects that represent participants in a teleconference, the virtual representation allowing participants to use the phone-based teleconferencing means to enter into teleconferences and to control volume during the teleconferences, the volume controlled according to how the users change location and relative orientation of their objects in the virtual representation.
23. A communications system comprising: a server system for providing a virtual representation; and a teleconferencing system for establishing phone connections with a plurality of users, the users represented by objects in the virtual representation, the teleconferencing system controlling volume during a teleconference according to how the users change location and relative orientation of their representative objects in the virtual representation.
24. The system of claim 23, wherein each user representative object is both sound drain and sound source; and wherein, for each drain, sound data from different sound sources is mixed and the mixed data provided to the user associated with the drain, the volume of sound data from a source being adjusted according to a topology metric of the source with respect to the drain.
25. A method comprising applying a virtual reality environment to teleconferencing such that the environment is used to enter into a teleconference.
26. Apparatus for applying a virtual reality environment to teleconferencing to enable a user to enter the virtual reality environment without knowing any other user in the virtual reality environment, yet enable the user to meet and hold a teleconference with others in the virtual reality environment.
27. A system comprising: means for teleconferencing; and means for coupling an immersive virtual reality environment with the teleconferencing.
28. A communications system comprising: a server system for providing a virtual representation including at least one object; and a teleconferencing system for establishing audio communications with an audio-only device; an object in the virtual representation controlled in response to signals from the audio-only device.
29. A system comprising: means for providing a virtual representation including objects; means for receiving signals from audio-only devices; and means for controlling states of the objects in response to the signals.
30. A communications system for providing a virtual environment including a plurality of objects, the objects having changeable states; and for establishing audio communications with audio-only devices; the system controlling the states of the objects in the virtual representation in response to signals from the audio-only devices, such that users of the audio-only devices can interact with the virtual environment.
31. A method of controlling objects in a virtual environment comprising: receiving signals from audio-only devices; and controlling states of the objects in response to the signals.
32. A method of providing a service comprising: providing a network-accessible virtual environment including objects that represent users of the service; allowing the users to control their representative objects in the virtual environment to personally interact with other users represented in the virtual environment and also to become voice-enabled; and enabling those users who are voice-enabled to speak with other voice-enabled users via phones.
33. A system comprising: means for providing a network-accessible virtual environment including objects that represent system users; means for allowing the users to control their representative objects in the virtual environment to personally interact with other users represented in the virtual environment and also to become voice-enabled; and means for enabling those users who are voice-enabled to speak with other voice-enabled users via phones.
34. A communications system comprising: a teleconferencing system for hosting teleconferences; and a server system for providing a virtual representation for the teleconferencing system, the virtual representation including objects whose states can be commanded to transition gradually, the server system providing clients to client devices, each client causing its client device to display the virtual representation; each client device capable of generating a command for gradually transitioning an object to a new state in the virtual representation and sending the command to the server system; the server system commanding the clients to transition an object to its new state by a specified time.
35. A communications system for a plurality of client devices, comprising: first means for hosting teleconferences; and second means for providing virtual representations that enable the teleconferences, each virtual representation including objects whose states transition gradually, the second means providing clients to at least some of the client devices, each client causing its client device to display a virtual representation; each client device capable of generating a command for gradually transitioning an object to a new state in a virtual representation and sending the command to the second means; the second means commanding the clients to transition an object to roughly the same state at roughly the same time; the second means causing the first means to control audio characteristics of the teleconferences to be consistent with the virtual representations.
PCT/EP2008/054359 2007-04-14 2008-04-10 Virtual reality-based teleconferencing WO2008125593A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP08736079A EP2145465A2 (en) 2007-04-14 2008-04-10 Virtual reality-based teleconferencing
CN200880012055A CN101690150A (en) 2007-04-14 2008-04-10 virtual reality-based teleconferencing

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
US11/735,463 2007-04-14
US11/735,463 US20080252637A1 (en) 2007-04-14 2007-04-14 Virtual reality-based teleconferencing
US11/751,152 2007-05-21
US11/751,152 US20080294721A1 (en) 2007-05-21 2007-05-21 Architecture for teleconferencing with virtual representation
US11/774,556 2007-07-06
US11/774,556 US20080256452A1 (en) 2007-04-14 2007-07-06 Control of an object in a virtual representation by an audio-only device
US11/833,432 US20080253547A1 (en) 2007-04-14 2007-08-03 Audio control for teleconferencing
US11/833,432 2007-08-03
US11/875,836 US20090106670A1 (en) 2007-10-20 2007-10-20 Systems and methods for providing services in a virtual environment
US11/875,836 2007-10-20

Publications (2)

Publication Number Publication Date
WO2008125593A2 true WO2008125593A2 (en) 2008-10-23
WO2008125593A3 WO2008125593A3 (en) 2009-01-15

Family

ID=39673479

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2008/054359 WO2008125593A2 (en) 2007-04-14 2008-04-10 Virtual reality-based teleconferencing

Country Status (3)

Country Link
EP (1) EP2145465A2 (en)
CN (1) CN101690150A (en)
WO (1) WO2008125593A2 (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9137734B2 (en) * 2011-03-30 2015-09-15 Microsoft Technology Licensing, Llc Mobile device configuration based on status and location
US20160189726A1 (en) * 2013-03-15 2016-06-30 Intel Corporation Mechanism for facilitating dynamic adjustment of audio input/output (i/o) setting devices at conferencing computing devices
CN104065911B (en) * 2013-03-18 2019-01-15 联想(北京)有限公司 Display control method and device
EP3254453B1 (en) 2015-02-03 2019-05-08 Dolby Laboratories Licensing Corporation Conference segmentation based on conversational dynamics
CN105138554B (en) * 2015-07-20 2019-06-14 广东小天才科技有限公司 A kind of method and device of mobile terminal talking
EP3356912A1 (en) * 2015-09-28 2018-08-08 Microsoft Technology Licensing, LLC Unified virtual reality platform
JP6834971B2 (en) * 2015-10-26 2021-02-24 ソニー株式会社 Signal processing equipment, signal processing methods, and programs
CN105487657A (en) * 2015-11-24 2016-04-13 小米科技有限责任公司 Sound loudness determination method and apparatus
JP6798502B2 (en) * 2015-12-11 2020-12-09 ソニー株式会社 Information processing equipment, information processing methods, and programs
CN105827610B (en) * 2016-03-31 2020-01-31 联想(北京)有限公司 information processing method and electronic equipment
WO2017173457A1 (en) * 2016-04-01 2017-10-05 Incontext Solutions, Inc. Virtual reality platform for retail environment simulation
CN105933790A (en) * 2016-04-29 2016-09-07 乐视控股(北京)有限公司 Video play method, device and system based on virtual movie theater
US10609018B2 (en) * 2016-12-05 2020-03-31 Google Llc Gesture-based access control in virtual environments
CN107292050A (en) * 2017-07-07 2017-10-24 四川云图瑞科技有限公司 House ornamentation design system based on illusory 4 engine technique
CN107632703A (en) * 2017-09-01 2018-01-26 广州励丰文化科技股份有限公司 Mixed reality audio control method and service equipment based on binocular camera
CN110164464A (en) * 2018-02-12 2019-08-23 北京三星通信技术研究有限公司 Audio-frequency processing method and terminal device
CN114531640A (en) 2018-12-29 2022-05-24 华为技术有限公司 Audio signal processing method and device
CN110035250A (en) * 2019-03-29 2019-07-19 维沃移动通信有限公司 Audio-frequency processing method, processing equipment, terminal and computer readable storage medium
WO2020240512A1 (en) * 2019-05-31 2020-12-03 Chain Technology Development Co., Ltd. Collaborative immersive cave network
CN111246014B (en) * 2020-01-13 2021-04-06 维沃移动通信有限公司 Communication method, head-mounted device, and medium
US11321892B2 (en) 2020-05-21 2022-05-03 Scott REILLY Interactive virtual reality broadcast systems and methods
CN112492231B (en) * 2020-11-02 2023-03-21 重庆创通联智物联网有限公司 Remote interaction method, device, electronic equipment and computer readable storage medium
CN113973103B (en) * 2021-10-26 2024-03-12 北京达佳互联信息技术有限公司 Audio processing method, device, electronic equipment and storage medium
CN117014544A (en) * 2022-05-31 2023-11-07 腾讯科技(深圳)有限公司 Interaction method and device based on conversation, computer equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6448965B1 (en) * 1998-06-19 2002-09-10 George Mason University Voice-controlled immersive virtual reality system
GB2349055A (en) * 1999-04-16 2000-10-18 Mitel Corp Virtual audio-visual conference
US20070071204A1 (en) * 2005-09-13 2007-03-29 Hitachi, Ltd. Voice call system and method of providing contents during a voice call

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9176579B2 (en) 2008-12-29 2015-11-03 Avaya Inc. Visual indication of user interests in a computer-generated virtual environment
WO2010075620A1 (en) * 2008-12-29 2010-07-08 Nortel Networks Limited Visual indication of user interests in a computer-generated virtual environment
EP3076665A1 (en) * 2010-10-19 2016-10-05 Microsoft Technology Licensing, LLC Optimized telepresence using mobile device gestures
US8958567B2 (en) 2011-07-07 2015-02-17 Dolby Laboratories Licensing Corporation Method and system for split client-server reverberation processing
WO2015180584A1 (en) * 2014-05-28 2015-12-03 Tencent Technology (Shenzhen) Company Limited Sound effect playback method and apparatus
US10057707B2 (en) 2015-02-03 2018-08-21 Dolby Laboratories Licensing Corporation Optimized virtual scene layout for spatial meeting playback
US10567185B2 (en) 2015-02-03 2020-02-18 Dolby Laboratories Licensing Corporation Post-conference playback system having higher perceived quality than originally heard in the conference
US10674304B2 (en) 2015-10-09 2020-06-02 Sony Corporation Signal processing apparatus and signal processing method
WO2021247767A1 (en) * 2020-06-02 2021-12-09 Preciate Inc. Dynamic virtual environment
US11575531B2 (en) 2020-06-02 2023-02-07 Preciate Inc. Dynamic virtual environment
US11522925B2 (en) 2020-09-14 2022-12-06 NWR Corporation Systems and methods for teleconferencing virtual environments
WO2022056492A3 (en) * 2020-09-14 2022-04-21 NWR Corporation Systems and methods for teleconferencing virtual environments
US10952006B1 (en) 2020-10-20 2021-03-16 Katmai Tech Holdings LLC Adjusting relative left-right sound to provide sense of an avatar's position in a virtual space, and applications thereof
US11095857B1 (en) 2020-10-20 2021-08-17 Katmai Tech Holdings LLC Presenter mode in a three-dimensional virtual conference space, and applications thereof
US11290688B1 (en) 2020-10-20 2022-03-29 Katmai Tech Holdings LLC Web-based videoconference virtual environment with navigable avatars, and applications thereof
US11076128B1 (en) 2020-10-20 2021-07-27 Katmai Tech Holdings LLC Determining video stream quality based on relative position in a virtual space, and applications thereof
US11457178B2 (en) 2020-10-20 2022-09-27 Katmai Tech Inc. Three-dimensional modeling inside a virtual video conferencing environment with a navigable avatar, and applications thereof
US11070768B1 (en) 2020-10-20 2021-07-20 Katmai Tech Holdings LLC Volume areas in a three-dimensional virtual conference space, and applications thereof
US10979672B1 (en) 2020-10-20 2021-04-13 Katmai Tech Holdings LLC Web-based videoconference virtual environment with navigable avatars, and applications thereof
US11184362B1 (en) 2021-05-06 2021-11-23 Katmai Tech Holdings LLC Securing private audio in a virtual conference, and applications thereof
WO2022235916A1 (en) * 2021-05-06 2022-11-10 Katmai Tech Inc. Securing private audio in a virtual conference, and applications thereof
US11743430B2 (en) 2021-05-06 2023-08-29 Katmai Tech Inc. Providing awareness of who can hear audio in a virtual conference, and applications thereof
WO2023014903A1 (en) * 2021-08-04 2023-02-09 Google Llc Video conferencing systems featuring multiple spatial interaction modes
WO2023014900A1 (en) * 2021-08-04 2023-02-09 Google Llc Video conferencing systems featuring multiple spatial interaction modes
US11637991B2 (en) 2021-08-04 2023-04-25 Google Llc Video conferencing systems featuring multiple spatial interaction modes
US11849257B2 (en) 2021-08-04 2023-12-19 Google Llc Video conferencing systems featuring multiple spatial interaction modes
US11954404B2 (en) 2022-06-10 2024-04-09 Qualcomm Incorporated Verbal communication in a virtual world
WO2023239978A1 (en) * 2022-06-10 2023-12-14 Qualcomm Incorporated Verbal communication in a virtual world
US12022235B2 (en) 2022-07-20 2024-06-25 Katmai Tech Inc. Using zones in a three-dimensional virtual environment for limiting audio and video
US11651108B1 (en) 2022-07-20 2023-05-16 Katmai Tech Inc. Time access control in virtual environment application
US12009938B2 (en) 2022-07-20 2024-06-11 Katmai Tech Inc. Access control in zones
US11928774B2 (en) 2022-07-20 2024-03-12 Katmai Tech Inc. Multi-screen presentation in a virtual videoconferencing environment
US11876630B1 (en) 2022-07-20 2024-01-16 Katmai Tech Inc. Architecture to control zones
US11700354B1 (en) 2022-07-21 2023-07-11 Katmai Tech Inc. Resituating avatars in a virtual environment
US11741664B1 (en) 2022-07-21 2023-08-29 Katmai Tech Inc. Resituating virtual cameras and avatars in a virtual environment
US11562531B1 (en) 2022-07-28 2023-01-24 Katmai Tech Inc. Cascading shadow maps in areas of a three-dimensional environment
US11776203B1 (en) 2022-07-28 2023-10-03 Katmai Tech Inc. Volumetric scattering effect in a three-dimensional virtual environment with navigable video avatars
US11711494B1 (en) 2022-07-28 2023-07-25 Katmai Tech Inc. Automatic instancing for efficient rendering of three-dimensional virtual environment
US11956571B2 (en) 2022-07-28 2024-04-09 Katmai Tech Inc. Scene freezing and unfreezing
US11704864B1 (en) 2022-07-28 2023-07-18 Katmai Tech Inc. Static rendering for a combination of background and foreground objects
US11682164B1 (en) 2022-07-28 2023-06-20 Katmai Tech Inc. Sampling shadow maps at an offset
US11593989B1 (en) 2022-07-28 2023-02-28 Katmai Tech Inc. Efficient shadows for alpha-mapped models
US11748939B1 (en) 2022-09-13 2023-09-05 Katmai Tech Inc. Selecting a point to navigate video avatars in a three-dimensional environment

Also Published As

Publication number Publication date
EP2145465A2 (en) 2010-01-20
CN101690150A (en) 2010-03-31
WO2008125593A3 (en) 2009-01-15

Similar Documents

Publication Publication Date Title
WO2008125593A2 (en) Virtual reality-based teleconferencing
US20090106670A1 (en) Systems and methods for providing services in a virtual environment
US20080252637A1 (en) Virtual reality-based teleconferencing
US20080294721A1 (en) Architecture for teleconferencing with virtual representation
US8848025B2 (en) Flow-control based switched group video chat and real-time interactive broadcast
CN107409060B (en) Apparatus and method for adjacent resource pooling in video/audio telecommunications
CN106105246B (en) Display methods, apparatus and system is broadcast live
US7574474B2 (en) System and method for sharing and controlling multiple audio and video streams
JP6404912B2 (en) Live broadcasting system
US20060008117A1 (en) Information source selection system and method
US7346654B1 (en) Virtual meeting rooms with spatial audio
US20170134831A1 (en) Flow Controlled Based Synchronized Playback of Recorded Media
TWI554317B (en) System and method for managing audio and video channels for video game players and spectators
US20080253547A1 (en) Audio control for teleconferencing
US20150121252A1 (en) Combined Data Streams for Group Calls
CN110910860B (en) Online KTV implementation method and device, electronic equipment and storage medium
US20050271194A1 (en) Conference phone and network client
US20100153858A1 (en) Uniform virtual environments
US20240187553A1 (en) Integration of remote audio into a performance venue
US20100110160A1 (en) Videoconferencing Community with Live Images
WO2022111599A1 (en) Call interaction method and apparatus, and device and storage medium
US20080256452A1 (en) Control of an object in a virtual representation by an audio-only device
JP2003223407A (en) Contents sharing support system, user terminal, contents sharing support server, method and program for sharing contents among users, and recording medium for the program
TW201141226A (en) Virtual conversing method
JP7143874B2 (en) Information processing device, information processing method and program

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase; Ref document number: 200880012055.7; Country of ref document: CN
121 Ep: the epo has been informed by wipo that ep was designated in this application; Ref document number: 08736079; Country of ref document: EP; Kind code of ref document: A2
NENP Non-entry into the national phase; Ref country code: DE
WWE Wipo information: entry into national phase; Ref document number: 2008736079; Country of ref document: EP