CN116018803A

CN116018803A - Web-based video conference virtual environment with navigable avatar and application thereof

Info

Publication number: CN116018803A
Application number: CN202180037563.6A
Authority: CN
Inventors: G·C·克罗尔; E·S·布朗德
Original assignee: Caterpillar Technologies
Current assignee: Caterpillar Technologies
Priority date: 2020-10-20
Filing date: 2021-10-20
Publication date: 2023-04-25
Also published as: CA3181367C; IL298268B1; JP2023534092A; EP4122192A1; AU2021366657A1; IL298268A; WO2022087147A1; AU2021366657B2; JP7318139B1; KR20230119261A; JP2023139110A; CA3181367A1; KR20220160699A; AU2023229565A1; KR102580110B1; IL308489A; BR112022024836A2

Abstract

Disclosed herein is a web-based video conferencing system that allows video avatars (102A, 102B) to navigate within a virtual environment. The system has a presentation mode that allows texture mapping of presentation streams to presenter screens (104A, 104B) located within the virtual environment. The opposite left and right sounds are adjusted to provide a sense of the position of the avatar in the virtual space. The sound is further adjusted based on the region in which the avatar is located and the region in which the virtual camera is located. The video stream quality is adjusted based on the relative position in the virtual space. Three-dimensional modeling is available within a virtual videoconferencing environment.

Description

Web-based video conference virtual environment with navigable avatar and application thereof

Cross Reference to Related Applications

The present application claims priority from the following U.S. patent applications: U.S. patent application Ser. No. 17/075,338, issued 10/20/2020, with an authorized date of 2021, 4/13, and an authorized bulletin number of US10,979,672; U.S. patent application Ser. No. 17/198,32, filed on day 3 and 11 of 2021; U.S. patent application Ser. No. 17/075,362, issued 10/20/2020, with an authorized date of 2021, 8/17, and an authorized bulletin number of US11,095,857; U.S. patent No. US17/075,390, issued 10/20 in 2020, grant day 2021, 3/16, grant bulletin No. US10,952,006; U.S. patent application Ser. No. 17/075,408, issued 10/20/2020, 2021/7/20, and US11,070,768; U.S. patent application Ser. No. 17/075,428, issued 10/20/2020, with an authorized date of 2021, 7/27, and an authorized bulletin number of US11,076,128; and U.S. patent application Ser. No. 17/075,454, filed 10/20/2020. The contents of each of these applications are incorporated by reference herein in their entirety.

Technical Field

The field relates generally to video conferencing.

Background

Videoconferencing involves users receiving and transmitting audio-video signals at different locations for real-time communication from person to person. Videoconferencing is widely used on many computing devices for a variety of different services, including the Zoom service available from the Zoom carrier of San Jose, CA. Some video conferencing software, such as the FaceTime application available from Apple inc (Apple inc.) of Cupertino (CA), california, is a standard configuration for mobile devices.

Generally, these applications operate through video and output audio displayed by other conference participants. When there are multiple participants, the screen may be divided into rectangular boxes, each displaying video of one participant. Sometimes these services display the video of the speaking person by using a larger box. When different people speak, the box will switch between speakers. The application captures video from a camera integrated with the user's device and captures audio from a microphone integrated with the user's device. The application then transmits the audio and video to other applications running on other users' devices.

Many of these video conferencing applications have screen sharing functionality. When a user decides to share his screen (or a portion of his screen), the stream with his screen content is transferred to the other user's device. In some cases, other users may even control sharing of content on the user's screen. In this way, the user may collaborate on the project or demonstrate to other conference participants.

Recently, video conferencing technology has become increasingly important. Many workplaces, trade shows, gatherings, meetings, schools and religious venues have been shut down or encouraged to not attend to avoid the spread of disease, particularly covd-19. Virtual conferences using video conferencing technology are increasingly replacing physical conferences. In addition, this technique provides the advantage of avoiding travel and commute compared to physical meetings.

However, in general, the use of such video conferencing techniques can result in a loss of locale. In terms of experience, people get experience when physically meeting in the same place, and lose that experience when virtually meeting. In social terms, people can express themselves through gesture actions and can look at their partners. This sense of experience is important in creating relationships and social connections. However, this sensation is no longer present in conventional video conferences.

Furthermore, these video conferencing techniques can present additional problems when several participants begin to enter a conference. In an entity meeting, people may have a one-way conversation. You can make your voice so that only people close to your can hear you say. In some cases you can even conduct private conversations in larger gatherings. However, in a virtual conference, when multiple people speak at the same time, the software may mix the two audio streams substantially evenly, resulting in participants breaking each other. Thus, when a plurality of persons are conferred with a virtual conference, a private conversation is impossible, and the conversation tends to take the form of a one-to-many lecture. Here, the virtual meeting also loses the opportunity for participants to more effectively create social connections, communicate, and establish a relationship network.

Furthermore, due to network bandwidth and computing hardware limitations, the performance of many video conferencing systems begins to degrade when large amounts of streams are put into conferences. Many computing devices, while equipped with the capability to process video streams from several participants, are not capable enough to process video streams from tens or more of participants. Since many schools operate entirely virtually, a 25 person class conducting a virtual meeting may severely slow down the operation of computing devices distributed by the schools.

Massive multiplayer online games (MMOG or MMO) can typically have more than 25 participants. These games typically have hundreds or thousands of players on a single server. MMOs typically allow a player to navigate an avatar in a virtual world. Sometimes these MMOs allow users to speak to each other or send messages to each other. Examples include the robusta (Roblox) game available from Luo Bule x (Roblox) corporation of San Mateo, CA, california, and my world (MINECRAFT) game available from the praise (Mojang) studio of Stockholm, sweden.

Having only avatars interact with each other also has limitations in terms of social interactions. These avatars often fail to convey facial expressions that people often do inadvertently. These facial expressions can be observed in video conferences. Some publications may describe placing video on an avatar in a virtual world. However, these systems typically require specialized software and have other applications that limit them.

There is a need for a method of improving video conferencing.

Disclosure of Invention

In an embodiment, an apparatus enables a video conference between a first user and a second user. The device includes a processor coupled to a memory, a display screen, a network interface, and a web browser. The network interface is configured to receive: (i) data specifying a three-dimensional virtual space, (ii) a position and an orientation in the three-dimensional virtual space, the position and orientation being entered by a first user, and (iii) a video stream captured from a camera on a device of the first user. The camera of the first user is positioned to capture a photographic image of the first user. A web browser implemented on the processor is configured to download a web application from the server and execute the web application. The web application includes a texture mapper and a renderer. The texture mapper is configured to texture map the video stream onto a three-dimensional model of the avatar. The renderer is configured to render from a perspective of the virtual camera of the second user to display a three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including a texture-mapped three-dimensional model of the avatar located at the location and oriented in the direction. By managing texture mapping within a web application, embodiments avoid the need to install specialized software.

In an embodiment, a computer-implemented method allows presentation in a virtual conference that includes multiple participants. In a method, data specifying a three-dimensional virtual space is received. The position and orientation in the three-dimensional virtual space is also received. The location and direction are entered into the conference by a first participant of the plurality of participants. Finally, a video stream captured from a camera on a device of the first participant is received. The camera is positioned to capture a photographic image of the first participant. The video stream is texture mapped onto the three-dimensional model of the avatar. Further, a presentation stream is received from a device of the first participant. The presentation flow is texture mapped onto the three-dimensional model of the presentation screen. Finally, rendering a three-dimensional virtual space from a perspective of a virtual camera of a second participant of the plurality of participants for display to the second participant, the three-dimensional virtual space having a texture-mapped avatar and a texture-mapped presentation screen participant. In this way, embodiments allow presentation in a social conferencing environment.

In an embodiment, a computer-implemented method provides audio for a virtual conference that includes a plurality of participants. In a method, a three-dimensional virtual space is rendered from a perspective of a virtual camera of a first user for display to the first user, the three-dimensional virtual space including an avatar having a texture-mapped video of a second user. The virtual camera is at a first location in the three-dimensional virtual space and the avatar is at a second location in the three-dimensional virtual space. An audio stream is received from a microphone of a device of a second user. A microphone is positioned to capture speech of the second user. The volume of the received audio streams is adjusted to determine the left audio stream and the right audio stream to provide a perception of the position of the second location in the three-dimensional virtual space relative to the first location. The left audio stream and the right audio stream are output to be played to the first user in stereo.

In an embodiment, a computer-implemented method provides audio for a virtual meeting. In a method, a three-dimensional virtual space is rendered from a perspective of a virtual camera of a first user for display to the first user, the three-dimensional virtual space including an avatar having a texture-mapped video of a second user. The virtual camera is at a first location in the three-dimensional virtual space and the avatar is at a second location in the three-dimensional virtual space. An audio stream is received from a microphone of a device of a second user. It is determined whether the virtual camera and the avatar are located in the same area of the plurality of areas. When it is determined that the virtual camera and the avatar are not located in the same area, the audio stream is attenuated. The attenuated audio stream is output for playback to the first user. In this way, embodiments allow private conversations or unidirectional conversations in a virtual video conferencing environment.

In an embodiment, a computer-implemented method effectively streams video for a virtual conference. In a method, a distance between a first user and a second user in a virtual meeting space is determined. A video stream captured from a camera on a device of a first user is received. The camera is positioned to capture a photographic image of the first user. The resolution or bit rate of the video stream is reduced based on the determined distance such that a shorter distance results in a greater resolution than a longer distance. The video stream is transmitted to the second user's device at a reduced resolution or bit rate for display to the second user within the virtual conference space. The video stream will be texture mapped onto the avatar of the first user for display to the second user within the virtual conference space. In this way, embodiments can efficiently allocate bandwidth and computing resources even when there are a large number of conference participants.

In an embodiment, a computer-implemented method allows modeling in a virtual video conference. In a method, a three-dimensional model of a virtual environment, a grid representing the three-dimensional model of an object, and a video stream from a participant of a virtual video conference are received. The video stream is texture mapped to an avatar that can be navigated by the participant. A texture mapped avatar and a grid representing a three-dimensional model of the object within the virtual environment are rendered for display.

System, device and computer program product embodiments are also disclosed.

Other embodiments, features, and advantages of the present inventions, as well as the structure and operation of the various embodiments, are described in detail below with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate the present disclosure and, together with the description, further serve to explain the principles of the disclosure and to enable a person skilled in the pertinent art to make and use the disclosure.

Fig. 1 is a diagram illustrating an example interface for providing a video conference in a virtual environment in which video streams are mapped onto avatars.

Fig. 2 is a diagram illustrating a three-dimensional model used to render a virtual environment having an avatar for a video conference.

Fig. 3 is a diagram illustrating a system for providing video conferencing in a virtual environment.

Fig. 4A-4C illustrate how data is transferred between the various components of the system in fig. 3 to provide a video conference.

Fig. 5 is a flow chart illustrating a method for adjusting relative left and right volume to provide position perception in a virtual environment during a video conference.

Fig. 6 is a graph illustrating how the volume rolls off as the distance between avatars increases.

Fig. 7 is a flow chart illustrating a method for adjusting relative volume to provide different volume regions in a virtual environment during a video conference.

Fig. 8A to 8B are diagrams illustrating different volume areas in a virtual environment during a video conference.

Fig. 9A to 9C are diagrams illustrating a hierarchical structure of traversing volume areas in a virtual environment during a video conference.

FIG. 10 illustrates an interface with a three-dimensional model in a three-dimensional virtual environment.

Fig. 11 illustrates presentation screen sharing in a three-dimensional virtual environment for video conferencing.

Fig. 12 is a flowchart illustrating a method for allocating available bandwidth based on the relative positions of avatars within a three-dimensional virtual environment.

Fig. 13 is a graph illustrating how the priority value decreases as the distance between avatars increases.

Fig. 14 is a chart illustrating how allocated bandwidth may vary based on relative priority.

Fig. 15 is a diagram illustrating components of an apparatus to provide video conferencing within a virtual environment.

The drawing in which an element first appears is typically indicated by the leftmost digit(s) in the corresponding reference number. In the drawings, like reference numbers may indicate identical or functionally similar elements.

Detailed Description

Video conferencing with avatars in a virtual environment

Fig. 1 is a diagram illustrating an example of an interface 100 for providing a video conference in a virtual environment, in which video streams are mapped onto avatars.

The interface 100 may be displayed to participants of a video conference. For example, the interface 100 may be rendered for display to participants and may be updated continuously as the video conference proceeds. The user may control the direction of his virtual camera using, for example, keyboard input. In this way, a user may navigate through the virtual environment. In an embodiment, different inputs may change the X and Y positions and pan and tilt angles of the virtual camera in the virtual environment. In further embodiments, the user may use the input to change the height (Z coordinate) or deflection of the virtual camera. In still other embodiments, the user may enter input to "jump" the virtual camera while returning to its original position to simulate gravity. Inputs that may be used to navigate the virtual camera may include, for example, keyboard inputs and mouse inputs, such as the WASD keyboard key to move the virtual camera forward, backward, left, right in the X-Y plane, the space key to "jump" the virtual camera, and changes in mouse movement specifying pan and tilt angles.

Interface 100 includes

avatars

102A and 102B, each representing a different participant in a video conference. The

avatars

102A and 102B have texture mapped

video streams

104A and 104B from the devices of the first participant and the second participant, respectively. Texture mapping is an image applied (mapped) to a surface of a shape or polygon. Here, the images are frames of video. The camera devices capturing the video streams 104A and 104B are positioned to capture the faces of the respective participants. In this way, the avatar has a facial moving image texture-mapped onto it as the participant speaks and listens in the conference.

Similar to the control of the virtual camera through the user viewing interface 100, the position and orientation of the

avatars

102A and 102B are controlled by the individual participants represented thereby. The

avatars

102A and 102B are three-dimensional models represented by grids. Each

avatar

102A and 102B may have a participant name below the avatar.

The

respective avatars

102A and 102B are controlled by the respective users. They may each be positioned at a location point within the virtual environment corresponding to their own virtual camera. Just as the user viewing interface 100 may be moved around the virtual camera, each user may move their

respective avatars

102A and 102B.

The virtual environment rendered in the interface 100 includes a background image 120 of the venue and a three-dimensional model 118. The venue may be a venue or a building in which a video conference should be conducted. The venue may include a floor area bounded by walls. The three-dimensional model 118 may include a mesh and texture. Other ways of mathematically representing the surface of the three dimensional model 118 are also possible. For example, polygon modeling, curve modeling, and digital texturing are possible. For example, the three-dimensional model 118 may be shown by voxels, splines, geometric primitives, polygons, or any other possible representation in three-dimensional space. The three-dimensional model 118 may also include specifications for the light source. The light sources may include, for example, point light sources, directional light sources, spotlight sources, and ambient light sources. The object may also have certain properties describing how it reflects light. In an example, the attributes may include diffusion, ambient, and spectral lighting interactions.

In addition to locales, virtual environments may also include various other three-dimensional models that account for different components of the environment. For example, the three-dimensional environment may include a decoration model 114, a speaker model 116, and a presentation screen model 122. Just like the models 118, these models may be represented using any mathematical means that represents geometric surfaces in three-dimensional space. These models may be separate from the model 118 or combined into a single representation of the virtual environment.

A decorative model, such as model 114, is used to enhance the authenticity of the venue and to increase the aesthetics of the venue. The speaker model 116 may virtually emit sounds, such as presentation music and background music, as will be described in more detail below with respect to fig. 5 and 7. The presentation screen model 122 may be used to provide an outlet to present a presentation. The presenter screen share or video texture of the presenter screen share may be mapped onto the presentation screen model 122.

Button 108 may provide the user with a list of participants. In one example, after the user selects button 108, the user may chat with other participants by sending text messages alone or in groups.

The buttons 110 may enable a user to change the properties of the virtual camera used to render the interface 100. For example, a virtual camera may have a field of view that specifies the angle at which data is rendered for display. Modeling data within the camera field of view is rendered, while modeling data outside the camera field of view may not be rendered. By default, the field of view of the virtual camera may be set somewhere between 60 ° and 110 °, commensurate with the wide-angle lens and human vision. However, selecting button 110 may cause the virtual camera to increase the field of view beyond 170 ° commensurate with the fisheye lens. This may enable a user to have a wider peripheral perception of their surroundings in the virtual environment.

Finally, button 112 causes the user to exit the virtual environment. Selecting button 112 may cause a notification to be sent to a device belonging to another participant that is signaled to cease displaying avatars corresponding to users of previous viewing interface 100.

In this way, the interface virtual 3D space is used for video conferencing. Each user controls the avatar, which may control the avatar to move around, look around, jump, or do other things that change position or direction. The virtual camera presents the virtual 3D environment and other avatars to the user. The avatar of the other user has as an integral part a virtual display showing the user's webcam image.

By giving users a sense of space and allowing users to see each other's faces, embodiments provide more social experience than traditional web conferences or traditional MMO games. More social experiences have a variety of applications. For example, it may be used in online shopping. For example, interface 100 has the following applications: providing virtual groceries, churches, trade shows, B2B sales, B2C sales, school education, restaurants or canteens, product releases, job site surveys (e.g., for architects, engineers, contractors), office spaces (e.g., people are virtually "working at desks"), remote control machines (boats, vehicles, planes, submarines, drones, drilling equipment, etc.), factory/factory control rooms, medical operations, garden designs, travel with guided virtual buses, music concerts (e.g., concerts), lectures (e.g., TED lectures), political party gatherings, directors, underwater studies, studies on difficult to reach sites, emergency training (e.g., fire), cooking, shopping (including checkout and cargo transportation), virtual art and technology (e.g., painting and crock), marital, funeral, washing, remote sports training, consultation, treatment of phobia (e.g., countermeasure), fashion shows, amusement parks, home decoration, viewing sporting events, viewing electronic games, viewing performances captured using three-dimensional cameras, playing table games and role playing games, browsing/traversing medical images, viewing geological data, learning language, meeting in the vision-impaired space, meeting in the hearing impaired space, engaging people who are not normally able to walk or stand in activities, broadcasting news or weather, talk shows, ticketing, voting, MMOs, purchasing/selling virtual addresses (e.g., in some MMOs, such as Lin Deng Research corporation (Linden Research available from San Francisco, CA), inc.) SECOND person (SECOND LIFE) available in games), flea market, garage sales, travel agencies, banks, computer flow management, fencing/rapier/martial arts, reproduction (e.g. reproduction of crime scene and or accident), color-lined real events (e.g. wedding, presentation, performance, space walking), evaluation or viewing of real events captured with three-dimensional cameras, animal performances, zoos, experiential LIFE (e.g. modified video stream or still image of the virtual world to simulate the view of the user's hope of experiencing a reaction), job interviews, game shows, interactive stories (e.g. murder mystery)), virtual fishing, virtual navigation, psychological research, behavioral analysis, virtual sports (e.g. climbing/bearing stone movement), control of lamps etc. (home automation (domics)), memory palace, archaeology, gift shop, virtual employment such that they must be able to make their real world, virtual business transaction positions (e.g. virtual reality, etc.), virtual reality market, virtual reality, and market, etc. if they must be integrated with each other, and only in real time, if they are required to be able to complete their real-time transaction positions (e.g. to be integrated with each other, financial market, etc.), and augmented reality where you project a person's face onto their AR headset (or helmet) so that you can see their facial expression (e.g., for military, law enforcement, fire department, special forces), and make reservations (e.g., to reserve a vacation house/car/etc.).

Fig. 2 is a diagram 200 illustrating a three-dimensional model used to render a virtual environment having an avatar for a video conference. As illustrated in FIG. 1, the virtual environment herein includes a three-dimensional venue 118 and various three-dimensional models, including three-

dimensional models

114 and 122. As also illustrated in FIG. 1, the diagram 200 includes

avatars

102A and 102B navigating in a virtual environment.

As described above, the interface 100 in fig. 1 is rendered from the perspective of the virtual camera. The virtual camera is illustrated in diagram 200 as virtual camera 204. As mentioned above, the user viewing interface 100 in fig. 1 may control the virtual camera 204 and navigate the virtual camera in three-dimensional space. The interface 100 is continually updated based on the new position of the virtual camera 204 and any changes to the model within the field of view of the virtual camera 204. As described above, the field of view of the virtual camera 204 may be a cone defined at least in part by horizontal and vertical field of view angles.

As described above with respect to fig. 1, the background image or texture may define at least a portion of the virtual environment. The background image may capture aspects of the virtual environment that are intended to appear at a distance. The background image may be texture mapped onto sphere 202. The virtual camera 204 may be at the origin of the sphere 202. In this way, remote features of the virtual environment may be efficiently rendered.

In other embodiments, other shapes may be used instead of sphere 202 to texture map the background image. In various alternative embodiments, the shape may be a cylinder, a cube, a rectangular prism, or any other three-dimensional geometry.

Fig. 3 is a diagram illustrating a system 300 for providing video conferencing in a virtual environment. The system 300 includes a server 302 coupled to devices 306A and 306B via one or more networks 304.

Server 302 provides services for connecting video conference sessions between devices 306A and 306B. As will be described in more detail below, server 302 transmits notifications to devices (e.g., devices 306A-306B) of conference participants when new participants join the conference and when existing participants leave the conference. The server 302 transmits a message describing the position and orientation of the virtual cameras of the individual participants within the three-dimensional virtual space in the three-dimensional virtual space. Server 302 also transmits video streams and audio streams between the respective devices of the participants (e.g., devices 306A through 306B). Finally, the server 302 stores and transmits data describing the specified three-dimensional virtual space to the respective devices 306A to 306B.

In addition to the data necessary for the virtual conference, server 302 may also provide executable information that instructs devices 306A and 306B how to render the data to provide the interactive conference.

Server 302 responds to the request by responding. Server 302 may be a web server. Web servers are software and hardware that use the hypertext transfer protocol (HTTP) and other protocols to respond to client requests made on the world wide web. The main job of web servers is to display web site content by storing web pages, processing web pages, and delivering web pages to users.

In alternative embodiments, communication between devices 306A-306B does not occur through server 302, but rather on a peer-to-peer basis. In the described embodiment, data describing the location and orientation of each participant, notifications regarding new participants and exited participants, and video and audio streams of each participant are not transmitted through server 302, but are transmitted directly between devices 306A through 306B.

Network 304 enables communication between various devices 306A-306B and server 302. Network 304 may be a temporary network, an intranet, an extranet, a Virtual Private Network (VPN), a Local Area Network (LAN), a Wireless LAN (WLAN), a Wide Area Network (WAN), a Wireless Wide Area Network (WWAN), a Metropolitan Area Network (MAN), a portion of the internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, a wireless network, a WiFi network, a WiMax network, any other type of network, or any combination of two or more such networks.

Devices 306A through 306B are each devices of the respective participants of the virtual conference. Devices 306A-306B each receive data necessary to conduct a virtual conference and render data necessary to provide the virtual conference. As will be described in more detail below, devices 306A-306B include a display to present rendered meeting information, input to allow a user to control a virtual camera, speakers (e.g., a headset) to provide audio to the user for a meeting, a microphone to capture voice input of the user, and a camera positioned to capture facial video of the user.

Devices 306A-306B may be any type of computing device, including a laptop computer, desktop computer, smart phone, or tablet computer, or a wearable computer (e.g., a smart watch or an augmented reality or virtual reality headset).

The web browsers 308A-308B may retrieve network resources (e.g., web pages) addressed by link identifiers (e.g., uniform resource locators or URLs) and present the network resources for display. In particular, web browsers 308A-308B are software applications for accessing information on the world Wide Web. Typically, web browsers 308A-308B make this request using the Hypertext transfer protocol (HTTP or HTTPS). When a user requests a web page from a particular website, the web browser retrieves the desired content from the web server, interprets and executes the content, and then displays the page shown as client/corresponding conferencing application 310A-310B on the display on devices 306A-306B. In an example, the content may have HTML and client script processing, such as JavaScript. Once displayed, the user may enter information and make selections on the page, which may cause web browsers 308A-308B to make further requests.

Conference applications 310A-310B may be web applications downloaded from server 302 and configured to be executed by respective web browsers 308A-308B. In an embodiment, conference applications 310A-310B may be JavaScript applications. In one example, the meeting applications 310A-310B can be written in a higher-level language (e.g., typescript language) and translated or compiled into JavaScript. Conference applications 310A-310B are configured to interact with WebGL JavaScript application programming interfaces. It may have control code specified in JavaScript and shader code written in the OpenGL ES shading language (GLSL ES). Using the WebGL API, conferencing applications 310A-310B may utilize a graphics processing unit (not shown) of devices 306A-306B. Furthermore, openGL renders interactive two-dimensional graphics and interactive three-dimensional graphics without using plug-ins.

Conference applications 310A-310B receive data describing the location and orientation of other avatars and three-dimensional modeling information describing the virtual environment from server 302. In addition, conference applications 310A-310B receive video streams and audio streams of other conference participants from server 302.

Conference applications 310A through 310B render three-dimensional modeling data, including data describing a three-dimensional environment and data representing the avatars of individual participants. Such rendering may involve rasterization, texture mapping, ray tracing, shading, or other rendering techniques. In an embodiment, rendering may involve ray tracing based on characteristics of the virtual camera. Ray tracing involves creating an image by tracing the path of light as pixels in the image plane and simulating its effect on encountering a virtual object. In some embodiments, to enhance realism, ray tracing may simulate optical effects such as reflection, refraction, scattering and dispersion.

In this way, the user uses web browsers 308A-308B to enter virtual space. The scene is displayed on the screen of the user. The web camera video stream and microphone audio stream of the user are sent to the server 302. When other users enter the virtual space, an avatar model is created for them. The position of this avatar is transmitted to the server and received by other users. Other users also obtain notifications from the server 302 that audio/video streams are available. The user's video stream is placed on the avatar created for the user. The audio stream is played as a location from the avatar.

Fig. 4A-4C illustrate how data is transferred between the various components of the system in fig. 3 to provide a video conference. Similar to fig. 3, each of fig. 4A-4C depicts a connection between server 302 and devices 306A and 306B. In particular, fig. 4A-4C illustrate example data flows between devices.

Fig. 4A illustrates a diagram 400 of how server 302 may transmit data describing a virtual environment to devices 306A and 306B. In particular, devices 306A and 306B each receive three-dimensional locale 404, background texture 402, spatial hierarchy 408, and any other three-dimensional modeling information 406 from server 302.

As described above, the background texture 402 is an image that illustrates the remote features of the virtual environment. The image may be regular (e.g., brick wall) or irregular. The background texture 402 may be encoded in any common image file format, such as bitmap, JPEG, GIF, or other file image formats. Which describes a background image to be rendered for e.g. a sphere at a distance.

The three-dimensional venue 404 is a three-dimensional model of the space in which the meeting is to be conducted. As described above, the three-dimensional locale may include, for example, a grid and its own texture information to be mapped onto the three-dimensional primitives it describes. Which may define a space within which the virtual camera and the respective avatar may navigate within the virtual environment. Thus, it may be bounded by an edge (e.g., a wall or fence) that describes the perimeter of the navigable virtual environment to the user.

Spatial hierarchy 408 is data that specifies partitions in a virtual environment. These partitions are used to determine the sound handling before transmitting sound between participants. As will be described below, this partition data may be hierarchical and may describe sound processing to allow participants of a virtual conference to have areas of private conversation or unidirectional conversation.

Three-dimensional model 406 is any other three-dimensional modeling information required to conduct a conference. In one embodiment, this may include information describing the individual avatars. Alternatively or additionally, this information may include a product display.

As the information needed to conduct a meeting is sent to the participants, fig. 4B-4C illustrate how the server 302 forwards the information from one device to another. Fig. 4B illustrates a diagram 420 showing how server 302 receives information from respective devices 306A and 306B, and fig. 4C illustrates a diagram 460 showing how server 302 transmits information to respective devices 306B and 306A. In particular, device 306A transmits location and direction 422A, video stream 424A, and audio stream 426A to server 302, which transmits location and direction 422A, video stream 424A, and audio stream 426A to device 306B. And device 306B transmits location and direction 422B, video stream 424B, and audio stream 426B to server 302, which transmits location and direction 422B, video stream 424B, and audio stream 426B to device 306A.

Locations and directions 422A-422B describe the location and direction of a virtual camera using the user of device 306A. As described above, the position may be coordinates (e.g., x, y, z coordinates) in three-dimensional space, and the direction may be a direction (e.g., pan, tilt, roll) in three-dimensional space. In some embodiments, the user may not have control over the roll of the virtual camera, so the direction may only specify pan and tilt angles. Similarly, in some embodiments, the user may not be able to change the coordinates of the avatar (because the avatar is constrained by virtual gravity), so the z-coordinate may be unnecessary. In this way, the positions and directions 422A-422B may each include at least coordinates on a horizontal plane in the three-dimensional virtual space, as well as pan and tilt values. Alternatively or additionally, the user may be able to "jump" the avatar, so the Z-position may be specified only by an indication of whether the user is jumping his avatar.

In different examples, the locations and directions 422A-422B may be transmitted and received using HTTP request responses or using socket messaging.

Video streams 424A-424B are video data captured from cameras of respective devices 306A and 306B. The video may be compressed. For example, the video may use any commonly known video codec, including MPEG-4, VP8, or H.264. Video may be captured and transmitted in real time.

Similarly, audio streams 426A through 426B are audio data captured from the microphones of the respective devices. The audio may be compressed. For example, the audio may use any commonly known audio codec, including MPEG-4 or vorbis. Audio may be captured and transmitted in real time. Video stream 424A and audio stream 426A are captured, transmitted, and presented in synchronization with each other. Similarly, video stream 424B and audio stream 426B are captured, transmitted, and presented in synchronization with each other.

Video streams 424A-424B and audio streams 426A-426B may be transmitted using WebRTC application programming interfaces. WebRTC is an API available in JavaScript. As described above, devices 306A and 306B download and run web applications as conference applications 310A and 310B, and conference applications 310A and 310B may be implemented in JavaScript. Conference applications 310A and 310B may use WebRTC to receive and transmit video streams 424A-424B and audio streams 426A-426B by making API calls from their JavaScript.

As mentioned above, when a user leaves the virtual meeting, this departure is communicated to all other users. For example, if device 306A exits the virtual meeting, server 302 will communicate the departure to device 306B. Accordingly, device 306B will stop rendering the avatar corresponding to device 306A, thereby removing the avatar from the virtual space. In addition, device 306B will cease receiving video stream 424A and audio stream 426A.

As described above, conference applications 310A and 310B may periodically or intermittently re-render virtual space based on new information from respective video streams 424A and 424B, locations and directions 422A and 422B, and new information about the three-dimensional environment. For simplicity, each of these updates will now be described from the perspective of device 306A. However, the skilled artisan will appreciate that device 306B will perform similarly given the similar changes.

When device 306A receives video stream 424B, device 306A maps the frame texture from video stream 424A to an avatar corresponding to device 306B. The texture mapped avatar is re-rendered within the three-dimensional virtual space and presented to the user of device 306A.

When device 306A receives the new location and new direction 422B, device 306A generates an avatar corresponding to device 306B that is at the new location and oriented in the new direction. The generated avatar is re-rendered within the three-dimensional virtual space and presented to the user of device 306A.

In some embodiments, server 302 may send updated model information describing the three-dimensional virtual environment. For example, server 302 may send updated

information

402, 404, 406, or 408. When this occurs, device 306A will re-render the virtual environment based on the updated information. This may be useful when the environment changes over time. For example, outdoor activities may change from daytime to dusk as the activities progress.

Again, when device 306B exits the virtual conference, server 302 sends a notification to device 306A indicating that device 306B is no longer participating in the conference. In this case, device 306A would re-render the virtual environment without the avatar of device 306B.

Although fig. 4A-4C and 3 are illustrated with two apparatuses for simplicity, the skilled artisan will appreciate that the techniques described herein may be extended to any number of apparatuses. Also, while fig. 4A-4C and 3 illustrate a single server 302, a skilled artisan will appreciate that the functionality of the server 302 may be distributed among multiple computing devices. In an embodiment, the data transmitted in fig. 4A may be from one network address of the server 302, while the data transmitted in fig. 4B through 4C may be transmitted to/from another network address of the server 302.

In one embodiment, participants may set their webcam, microphone, speaker, and graphics settings prior to entering the virtual meeting. In an alternative embodiment, after launching the application, the user may enter a virtual hall where they are welcome by an avatar controlled by a real person. The person is able to view and modify the graphic settings of the webcam, microphone, speaker and user. The staff may also instruct the user how to use the virtual environment, for example by instructing the user to view, move around, and interact. When the user is ready, the user automatically leaves the virtual waiting room and joins the real virtual environment.

Adjusting volume of video conferences in a virtual environment

Embodiments also adjust the volume to provide a perception of position and space within the virtual meeting. This is illustrated in fig. 5 to 7, 8A to 8B, and 9A to 9C, for example, each of which is described below.

Fig. 5 is a flow chart illustrating a method 500 for adjusting relative left-right volume to provide a perception of location in a virtual environment during a video conference.

At step 502, the volume is adjusted based on the distance between avatars. As described above, an audio stream is received from a microphone of another user's device. The volumes of the first audio stream and the second audio stream are adjusted based on the distance between the second location to the first location. This is illustrated in fig. 6.

Fig. 6 shows a graph 600 illustrating how the volume decreases as the distance between the multiple avatars increases. Graph 600 illustrates volume 602 on its x-axis and y-axis. As the distance between users increases, the volume remains constant until the reference distance 606 is reached. At this point, the volume begins to decrease. In this way, a closer user will typically sound louder than a farther user, all other things being equal.

The speed of sound reduction depends on the roll-off factor. This may be a coefficient built into the settings of the videoconferencing system or client device. As illustrated by

lines

608 and 610, a larger roll-off factor will cause the volume to degrade faster than a smaller roll-off factor.

Returning to fig. 5, at step 504, the relative left and right audio is adjusted based on the direction in which the avatar is located. That is, the volume of audio output on a user's speaker (e.g., headphones) will vary to provide a perception of where the talking user's avatar is located. The relative volumes of the left and right audio streams are adjusted based on the direction of the location of the user generating the audio stream (e.g., the location of the avatar of the speaking user) relative to the location of the user receiving the audio (e.g., the location of the virtual camera). The location may be on a horizontal plane within the three-dimensional virtual space. The relative volumes of the left and right audio streams provide a perception of the position of the second location relative to the first location in the three-dimensional virtual space.

For example, at step 504, the audio corresponding to the avatar on the left side of the virtual camera will be adjusted such that the audio is output at a higher volume at the receiving user's left ear than the right ear. Similarly, the audio corresponding to the avatar on the right side of the virtual camera will be adjusted so that the audio is output at a higher volume at the receiving user's right ear than the left ear.

At step 506, the relative left and right audio is adjusted based on the direction in which one avatar is oriented relative to the other avatar. The relative volumes of the left and right audio streams are adjusted based on the angle between the direction in which the virtual camera faces and the direction in which the avatar faces, such that a more perpendicular angle tends to have a larger difference in volume between the left and right audio streams.

For example, when the avatar is directly facing the virtual camera, the relative left-right volume of the corresponding audio stream of the avatar may not be adjusted at all in step 506. When the avatar faces the left side of the virtual camera, the relative left and right volumes of the corresponding audio streams of the avatar may be adjusted such that the left volume is louder than the right volume. And, when the avatar faces the right side of the virtual camera, the relative left and right volumes of the corresponding audio streams of the avatar may be adjusted such that the right volume is louder than the left volume.

In an example, the calculation in step 506 may involve a vector product of the angle that the avatar faces and the angle that the avatar faces. The angle may be the direction in which it faces in the horizontal plane.

In an embodiment, a check may be made to determine the audio output device being used by the user. The adjustments in

steps

504 and 506 may not occur if the audio output device is not a set of headphones or another type of speaker that provides a stereo effect.

Steps 502-506 are repeated for each audio stream received from each other participant. Based on the calculations in steps 502-506, a left audio gain and a right audio gain for each of the other participants are calculated.

In this way, the audio streams of each participant are adjusted to provide a perception of where the avatar of the participant is located in the three-dimensional virtual environment.

Not only is the audio stream adjusted to provide a look at the location of the avatar, but in some embodiments, the audio stream may be adjusted to provide a private or semi-private volume area. In this way, the virtual environment enables the user to conduct a private conversation. Also, the virtual environment enables users to communicate with each other and allows individual, unidirectional conversations to occur, which is not possible with conventional video conferencing software. This is explained with respect to fig. 7, for example.

Fig. 7 is a flow chart illustrating a method 700 for adjusting relative volume to provide different volume regions in a virtual environment during a video conference.

As described above, the server may provide a specification of sound areas or volume areas to the client device. The virtual environment may be divided into different volume regions. At step 702, the device determines in which sound zones the respective avatars and virtual cameras are located.

For example, fig. 8A-8B are diagrams illustrating different volume regions in a virtual environment during a video conference. Fig. 8A illustrates a diagram 800 having a volume area 802 that allows a semi-private conversation or unidirectional conversation between a user controlling an avatar 806 and a user controlling a virtual camera. In this way, users around conference table 810 may conduct conversations without interfering with other users in the room. The sound from the user controlling avatar 806 in the virtual camera may drop as it exits volume area 802, but not completely disappear. This allows passers-by to join the conversation if they wish.

Interface 800 also includes

buttons

804, 806, and 808, which will be described below.

Fig. 8B illustrates a diagram 800 with a volume area 804, the volume area 804 allowing a private conversation to be conducted between a user controlling an avatar 808 and a user controlling a virtual camera. Once inside the volume area 804, audio from the user controlling the avatar 808 and the user controlling the virtual camera may be output only to those users inside the volume area 804. Since no audio at all is played in the conference from those users to other users, their audio streams may not even be transmitted to other user devices.

The volume space may be hierarchical, as illustrated in fig. 9A and 9B. Fig. 9B is a diagram 930 showing a layout with different volume regions arranged into a sub-structure.

Volume areas

934 and 935 are within volume area 933, and

volume areas

933 and 932 are within volume area 931. These volume areas are represented in a hierarchical tree, as illustrated in fig. 900 and 9A.

In graph 900, node 901 represents volume area 931 and is the root of the tree.

Nodes

902 and 903 are children of node 901 and represent

volume areas

932 and 933.

Nodes

904 and 906 are children of node 903 and represent

volume areas

934 and 935.

If a user located in region 934 attempts to listen to a user located in region 932, the audio stream must traverse several different virtual "walls," each of which attenuates the audio stream. Specifically, sound must pass through the wall of region 932, the wall of region 933, and the wall of region 934. Each wall is weakened by a specific factor. This calculation is described with respect to step 704 and step 706 in fig. 7.

At step 704, the hierarchy is traversed to determine which different sound regions exist between avatars. This is illustrated, for example, in fig. 9C. Starting with a node corresponding to the virtual area of the speaking voice (in this case, node 904), a path is determined to the node of the receiving user (in this case, node 902). To determine the path, links 952 traveling between nodes are determined. In this way, a subset of regions between the region including the avatar and the region including the virtual camera is determined.

At step 706, the audio streams from the speaking user are attenuated based on the respective wall transmission factors for the subset of regions. Each respective wall transmission factor specifies a degree to which the audio stream is attenuated.

Additionally or alternatively, different regions have different roll-off factors, in which case the distance-based computation shown in method 600 may be applied to the respective regions based on the respective roll-off factors. In this way, different areas of the virtual environment emit sound at different rates. The audio gain determined in the method described above with respect to fig. 5 may be applied to the audio stream to determine left and right audio accordingly. In this way, the wall transmission factor, roll-off factor, and left-right adjustment that provides a sense of sound direction may be applied together to provide a comprehensive audio experience.

Different audio regions may have different functionalities. For example, the volume area may be a podium area. If the user is located in a podium area, some or all of the attenuation described with respect to fig. 5 or 7 may not occur. For example, no attenuation due to roll-off factors or wall transport factors may occur. In some embodiments, the relative left and right audio may still be adjusted to provide a sense of direction.

For exemplary purposes, the methods described with respect to fig. 5 and 7 describe audio streams from users having corresponding avatars. However, the same method may be applied to other sound sources than the avatar. For example, the virtual environment may have a three-dimensional model of speakers. Sound may be emitted from the speakers in the same manner as the avatar model described above due to the presentation or just to provide background music.

As mentioned above, wall transmission factors may be used to completely isolate audio. In an embodiment, this may be used to create a virtual office. In one example, each user may have a monitor in their physical (possibly home) office that continuously displays the meeting application and logs into the virtual office. There may be features that allow the user to indicate whether they are in the office or they are undisturbed. If the do-not-disturb indicator is off, then the colleague or manager may enter into the virtual space and knock or walk in as it did while in the physical office. If the worker is not in her office, the interviewer can leave a note. When the worker returns, she will be able to read the notes left by the interviewee. The virtual office may have a whiteboard and/or interface that displays user messages. The message may be an email and/or from a messaging application, such as a Slack application available from Slack technologies company of san francisco, california.

Users can customize or personalize their virtual offices. For example, they may be able to post poster models or other wall ornaments. They may be able to change the model or orientation of a table or ornament (e.g., a plant). They can change illumination or look outside from the window.

Returning to fig. 8A, interface 800 includes

various buttons

804, 806, and 808. The attenuation described above with respect to the methods in fig. 5 and 7 may not occur, or may occur only in a small amount, when the user presses button 804. In this case, the user's voice is output uniformly to other users, allowing the users to speak to all participants in the conference. User video may also be output on a presentation screen within the virtual environment, as will be described below. When the user presses button 806, the speaker mode is enabled. In this case, audio is output from sound sources within the virtual environment in order to play background music. When the user presses button 808, the screen sharing mode may be enabled, thereby enabling the user to share the content of the screen or window on his device with other users. The content may be presented on a presentation model. This will also be described below.

Demonstration in a three-dimensional environment

FIG. 10 illustrates an interface 1000 with a three-dimensional model 1004 in a three-dimensional virtual environment. As described above with respect to fig. 1, interface 1000 may be displayed to a user that may navigate around a virtual environment. As illustrated in interface 1000, the virtual environment includes an avatar 1004 and a three-dimensional model 1002.

The three-dimensional model 1002 is a 3D model of a product placed inside a virtual space. One can join this virtual space to observe the model and walk around it. The product may have localized sounds to enhance the experience.

More particularly, when a presenter in virtual space wants to present a 3D model, it selects the desired model from the interface. This sends a message to the server to update the details (including the name and path of the model). This will be automatically transmitted to the client. In this way, the three-dimensional model may be rendered for synchronous display when the video stream is presented. The user may navigate the virtual camera around the three-dimensional model of the product.

In different examples, the object may be a product display or may be an advertisement for a product.

Fig. 11 illustrates an interface 1100 with presentation screen sharing in a three-dimensional virtual environment for video conferencing. As described above with respect to fig. 1, interface 1100 may be displayed to a user that may navigate around a virtual environment. As illustrated in interface 1100, the virtual environment includes avatar 1104 and presentation screen 1106.

In this embodiment, a presentation stream is received from a device of a participant in a conference. The presentation flow is texture mapped onto a three-dimensional model of the presentation screen 1106. In one embodiment, the presentation stream may be a video stream from a camera on the user's device. In another embodiment, the presentation stream may be a screen share from the user's device, where the monitor or window is shared. The presentation video stream and the presentation audio stream may also come from external sources, such as live campaigns, through screen sharing or otherwise. When the user enables the presenter mode, the user's presentation stream (and audio stream) is published to the server and marked with the name of the screen the user wants to use. Other clients are notified that a new flow is available.

The presenter may also be able to control the position and orientation of audience members. For example, a presenter may have the option of selecting to rearrange all other participants to a meeting so that they are positioned and oriented to face the presentation screen.

The audio stream is captured synchronously with the presentation stream and from a microphone of the first participant's device. The audio stream from the user's microphone may be heard by other users as an audio stream from the presentation screen 1106. In this way, the presentation screen 1106 may be a sound source as described above. Because the user's audio stream is projected from the presentation screen 1106, it may be suppressed by the user's avatar. In this way, the audio stream is output to be played in synchronization with the display of the presentation stream on the screen 1106 within the three-dimensional virtual space.

Allocating bandwidth based on distance between users

Fig. 12 is a flow chart illustrating a method 1200 for allocating available bandwidth based on the relative positions of avatars within a three dimensional virtual environment.

At step 1202, a distance between a first user and a second user in a virtual meeting space is determined. The distance may be a distance between the first user and the second user on a horizontal plane in three-dimensional space.

At step 1204, the received video streams are prioritized such that the video streams of closer users are prioritized over the video streams from farther users. The priority value may be determined as illustrated in fig. 13.

Fig. 13 shows a chart 1300 showing priority 1306 on the y-axis, as well as distance 1302. As illustrated by line 1306, the priority status remains at a constant level until the reference distance 1304 is reached. After the reference distance is reached, the priority starts to drop.

At step 1206, available bandwidth between the various video streams is allocated to the user device. This may be done based on the priority value determined in step 1204. For example, the priorities may be scaled such that all priorities sum to 1. For any video in which the available bandwidth is insufficient, the relative priority may be reduced to zero. Subsequently, the priority is again adjusted for the remainder of the video stream. Bandwidth is allocated based on these relative priority values. In addition, bandwidth may be reserved for audio streams. This is illustrated in fig. 14.

Fig. 14 illustrates a graph 1400 in which the y-axis represents bandwidth 1406 and the x-axis represents relative priority. After the effective minimum bandwidth 1406 is allocated to the video, the bandwidth 1406 allocated to the video stream is scaled up with its relative priority.

Once the allocated bandwidth is determined, the client may request the video from the server by selecting and allocating bandwidth/bit rate/frame rate/resolution to the video. This may begin a negotiation process between the client and the server to begin streaming video at the specified bandwidth. In this way, the available video bandwidth and the available audio bandwidth are divided fairly for all users, where users with twice as many priority will get twice as much bandwidth.

In one possible implementation, all clients send multiple video streams with different bit rates and resolutions to the server using simulcast. The other clients may then indicate to the server which of the streams they focus on and want to receive.

At step 1208, a determination is made as to whether the available bandwidth between the first user and the second user in the virtual meeting space invalidates the video display at the remote location. This determination may be done by the client or the server. If completed by the client, the client sends a message to the server to stop transmitting video to the client. If the video display is not valid, transmission of the video stream to the second user's device is paused and the second user's device is notified to replace the video stream with a still image. The still image may be just the last video frame (or one of the last video frames) received.

In one embodiment, a similar process may be performed for audio, reducing quality according to the size of the audio reserved portion. In another embodiment, the bandwidth of each audio stream is uniform.

In this way, embodiments improve performance for all users, and servers may reduce the quality of video and audio streams for users that are far away and/or less important. This is not done when there is sufficient bandwidth budget available. The reduction is done in bit rate and resolution. This improves video quality because the encoder can more efficiently utilize the available bandwidth of the user.

Independently of this, the video resolution is scaled down based on distance, with twice as far users having half the resolution. In this way, unnecessary resolution may not be downloaded in view of limitations in screen resolution. Thus saving bandwidth.

Fig. 15 is a diagram illustrating a system 1500 to provide components of a device for video conferencing within a virtual environment. In various embodiments, system 1500 may operate according to the methods described above.

Device 306A is a user computing device. Device 306A may be a desktop or laptop computer, a smart phone, a tablet computer, or a wearable device (e.g., a watch or a head-mounted device). Device 306A includes microphone 1502, camera 1504, stereo speakers 1506, input devices 1512. Although not shown, the apparatus 306A also includes a processor and persistent, non-transitory, and volatile memory. The processor may include one or more central processing units, graphics processing units, or any combination thereof.

Microphone 1502 converts sound into electrical signals. Microphone 1502 is positioned to capture voice of a user of device 306A. In different examples, the microphone 1502 may be a capacitive microphone, an electret microphone, a moving-coil microphone, a ribbon microphone, a carbon microphone, a piezoelectric microphone, a fiber optic microphone, a laser microphone, a water microphone, or a MEMs microphone.

The camera 1504 captures image data by capturing light substantially through one or more lenses. The camera 1504 is positioned to capture a photographic image of the user of the device 306A. The camera 1504 includes an image sensor (not shown). The image sensor may be, for example, a Charge Coupled Device (CCD) sensor or a Complementary Metal Oxide Semiconductor (CMOS) sensor. The image sensor may include one or more photodetectors that detect light and convert the light into electrical signals. These electrical signals captured together over a similar time frame include still photographic images. A series of still photographic images captured at regular intervals together constitute a video. In this way, the camera 1504 captures images and video.

The stereo speaker 1506 is a device that converts electrical audio signals into corresponding left and right sounds. The stereo speakers 1506 output left and right audio streams that are generated by an audio processor 1520 (hereinafter), which will be played in stereo to a user of the device 306A. Stereo speakers 1506 include both ambient speakers and headphones designed to play sound directly into the user's left and right ears. Example speakers include moving iron loudspeakers, piezoelectric loudspeakers, magnetostatic loudspeakers, electrostatic loudspeakers, ribbon and planar magnetic loudspeakers, bending wave loudspeakers, planar loudspeakers, halr pneumatic transducers, transparent ion conducting loudspeakers, plasma arc loudspeakers, thermoacoustic loudspeakers, rotary woofers, moving coil armatures, electrostatic armatures, electret armatures, planar magnetic armatures, and balanced armatures.

The network interface 1508 is a software interface or a hardware interface between two pieces of equipment or two protocol layers in a computer network. Network interface 1508 receives video streams of the various participants of the meeting from server 302. The video stream is captured from a camera on a device of another participant in the video conference. The network interface 1508 also receives data from the server 302 specifying the three-dimensional virtual space and any models therein. For each of the other participants, the network interface 1508 receives a position and a direction in the three-dimensional virtual space. The position and direction are entered by each of the various other participants.

The network interface 1508 also transmits data to the server 302. The network interface transmits the location of the user virtual camera of device 306A used by renderer 1518 and transmits the video and audio streams from camera 1504 and microphone 1502.

The display 1510 is an output device for presenting electronic information in visual or tactile form (e.g., using tactile form in the example of a tactile electronic display for use by the blind). The display 1510 may be a television, a computer display, a head-mounted display, a head-up display, an output of an augmented reality headset or virtual reality headset, a broadcast reference monitor, a medical monitor mobile display (for a mobile device), a smartphone display (for a smartphone). To present information, the display 1510 may include an Electroluminescent (ELD) display, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) backlight LCD, a Thin Film Transistor (TFT) LCD, a Light Emitting Diode (LED) display, an OLED display, an AMOLED display, a plasma (PDP) display, a quantum dot (QLED) display.

Input device 1512 is equipment for providing data and control signals to an information handling system, such as a computer or information device. The input device 1512 allows a user to input a new desired location of the virtual camera used by the renderer 1518, thereby enabling navigation in a three-dimensional environment. Examples of input devices include keyboards, mice, scanners, navigation bars, and touch screens.

Web browser 308A and Web application 310A are described above with respect to FIG. 3. The web application 310A includes a screen capturer 1514, a texture mapper 1516, a renderer 1518, and an audio processor 1520.

The screen capturer 1514 captures a presentation stream, particularly screen sharing. The screen capturer 1514 may interact with an API provided by the web browser 308A. By invoking the functionality available from the API, the screen capturer 1514 may cause the web browser 308A to ask the user which window or screen they want to share. Based on the answer to the query, web browser 308A may return a video stream corresponding to the screen share to screen capturer 1514, which screen capturer 1514 passes the video stream on to network interface 1508 for transmission to server 302 and ultimately to the other participants' devices.

The texture mapper 1516 texture maps the video stream onto a three-dimensional model corresponding to the avatar. The texture mapper 1516 may texture map individual frames from the video to the avatar. In addition, the texture mapper 1516 may texture map the presentation stream to a three-dimensional model of the presentation screen.

The renderer 1518 renders from the perspective of the virtual camera of the user of the device 306A to output a three-dimensional virtual space to the display 1510, the output three-dimensional virtual space including a texture-mapped three-dimensional model of the avatar of each participant at the received corresponding location and oriented in a direction. The renderer 1518 also renders any other three-dimensional model including, for example, a presentation screen.

The audio processor 1520 adjusts the volume of the received audio streams to determine the left audio stream and the right audio stream to provide a perception of the position of the second position relative to the first position in the three-dimensional virtual space. In one embodiment, the audio processor 1520 adjusts the volume based on the distance between the second position to the first position. In another embodiment, the audio processor 1520 adjusts the volume based on the direction of the second position to the first position. In yet another embodiment, the audio processor 1520 adjusts the volume based on the direction of the second position relative to the first position on a horizontal plane within the three dimensional virtual space. In yet another embodiment, the audio processor 1520 adjusts the volume based on the direction in which the virtual camera faces in the three-dimensional virtual space such that the left audio stream tends to have a higher volume when the avatar is located at the left side of the virtual camera and the right audio stream tends to have a higher volume when the avatar is located at the right side of the virtual camera. Finally, in yet another embodiment, the audio processor 1520 adjusts the volume based on the angle between the direction in which the avatar faces and the direction in which the avatar faces such that angles more perpendicular to the direction in which the avatar faces tend to have a larger difference in volume between the left audio stream and the right audio stream.

The audio processor 1520 may also adjust the volume of the audio stream based on the region in which the speaker is located relative to the region in which the virtual camera is located. In this embodiment, the three-dimensional virtual space is divided into a plurality of areas. These regions may be hierarchical. When the speaker and the virtual camera are located in different areas, a wall transmission factor may be applied to attenuate the volume of the speaking audio stream.

Server 302 includes presence notifier 1522, stream adjuster 1524, and stream repeater 1526.

The presence notifier 1522 notifies conference participants as they join and leave the conference. When a new participant joins the conference, the presence notifier 1522 sends a message to the device of another participant of the conference indicating that the new participant has joined. The presence notifier 1522 signals the stream forwarder 1526 to start forwarding video, audio and location/direction information to other participants.

The stream adjuster 1524 receives a video stream captured from a camera on the first user's device. The flow adjuster 1524 determines the available bandwidth to transmit data for the virtual conference to the second user. The flow adjuster determines a distance between a first user and a second user in the virtual meeting space. And it allocates the available bandwidth between the first video stream and the second video stream based on the relative distance. In this way, the stream adjuster 1524 prioritizes the video stream of the closer user over the video stream from the farther user. Additionally or alternatively, the stream adjuster 1524 may be located on the device 306A as part of the web application 310A.

The stream repeater 1526 broadcasts the received position/orientation information, video, audio, and screen sharing (adjusted by the stream adjuster 1524). The stream forwarder 1526 may send information to the device 306A in response to a request from the conference application 310A. Conference application 310A may send a request in response to a notification from presence notifier 1522.

The network interface 1528 is a software interface or a hardware interface between two pieces of equipment or two protocol layers in a computer network. The network interface 1528 transmits the model information to the devices of the individual participants. The network interface 1528 receives video, audio, and screen sharing screens from the various participants.

The screen capturer 1514, texture mapper 1516, renderer 1518, audio processor 1520, presence notifier 1522, stream adjuster 1524, and stream repeater 1526 may each be implemented in hardware, software, firmware, or any combination thereof.

Identifiers such as "(a)", "(b)", "(i)", "(ii)", etc. are sometimes used in different elements or steps. These identifiers are used for clarity and do not necessarily specify an order of elements or steps.

The invention has been described above with the aid of functional building blocks illustrating embodiments of the specified functions and relationships thereof. For ease of description, boundaries of these functional building blocks have been arbitrarily defined herein. Alternate boundaries may be defined so long as the specified functions and relationships thereof are appropriately performed.

The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the art, readily modify and/or adapt for various applications such specific embodiments without undue experimentation, without departing from the general concept of the present invention. Accordingly, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.

The breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

1. A system for enabling a video conference between a first user and a second user, the system comprising:

a processor coupled to the memory;

a display screen;

a network interface configured to receive: (i) Data specifying a three-dimensional virtual space, (ii) a location and a direction in the three-dimensional virtual space, the location and the direction being input by the first user, and a video stream captured from a camera on a device of the first user, the camera being positioned to capture a photographic image of the first user;

A web browser implemented on the processor, the web browser configured to download a web application from a server and execute the web application, wherein the web application comprises:

a texture mapper configured to texture map the video stream onto a three-dimensional model of an avatar, and

a renderer, the renderer configured to:

(i) Rendering from a perspective of a virtual camera of the second user to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space comprising a texture-mapped three-dimensional model of the avatar located at the location and oriented in the direction,

(ii) Upon receiving an input from the second user indicating a desire to change the perspective of the virtual camera, changing the perspective of the virtual camera of the second user, an

(iii) Re-rendering from the changed perspective of the virtual camera to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including a texture-mapped three-dimensional model of the avatar located at the location and oriented in the direction.

2. The system of claim 1, wherein the device further comprises a graphics processing unit, and wherein the texture mapper and the renderer comprise WebGL application calls that enable the web page application to texture map or render using the graphics processing unit.

3. A computer-implemented method for effectuating a video conference between a first user and a second user, the computer-implemented method comprising:

transmitting a web application to a first client device of the first user and to a second client device of the second user;

receiving from the first client device executing the web application (i) a location and an orientation in a three-dimensional virtual space, wherein the location and orientation are entered by the first user, and (ii) a video stream captured from a camera on the first client device, the camera being positioned to capture a photographic image of the first user; and

transmitting the location and the direction and the video stream to the second client device of the second user, wherein the web application comprises executable instructions that when executed on a web browser cause the second client device to:

(i) Mapping the video stream from the perspective texture of the second user's virtual camera onto the avatar's three-dimensional model,

(ii) Rendering to display said three-dimensional virtual space to said second user, the displayed three-dimensional virtual space comprising a texture-mapped three-dimensional model of said avatar located at said location and oriented in said direction,

4. The method of claim 3, wherein the web application comprises a WebGL application call that enables the web application to be texture mapped or rendered using a graphics processing unit of the second client device.

5. A computer-implemented method for effectuating a video conference between a first user and a second user, the computer-implemented method comprising:

receiving data specifying a three-dimensional virtual space;

receiving a position and a direction in the three-dimensional virtual space, wherein the position and the direction are input by the first user;

receiving a video stream captured from a camera on a device of the first user, the camera positioned to capture photographic images of the first user;

texture mapping the video stream onto a three-dimensional model of an avatar by a web application implemented on a web browser; and

rendering, by the web application implemented on the web browser, from a perspective of a virtual camera of the second user to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including a texture-mapped three-dimensional model of the avatar located at the location and oriented in the direction;

upon receiving an input from the second user indicating a desire to change the perspective of the virtual camera:

changing the perspective of the virtual camera of the second user; and

Re-rendering from the changed perspective of the virtual camera to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including a texture-mapped three-dimensional model of the avatar located at the location and oriented in the direction.

6. The method according to claim 5, wherein the method further comprises:

receiving an audio stream captured in synchronization with the video stream from a microphone of the device of the first user, the microphone positioned to capture speech of the first user; and

and outputting the audio stream synchronously with the display of the video stream in the three-dimensional virtual space for playing to the second user.

7. The method of claim 5, wherein the perspective of the virtual camera is defined by at least one coordinate on a horizontal plane in the three-dimensional virtual space and a pan value and a tilt value.

8. The method of claim 5, further comprising, upon receiving a new location and a new direction of the first user in the three-dimensional virtual space:

re-rendering to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including a texture-mapped three-dimensional model of the avatar located at the new position and oriented in the new direction.

9. The method of claim 5, wherein the texture mapping comprises iteratively mapping pixels onto a three-dimensional model of the avatar for each frame of the video stream.

10. The method of claim 5, wherein the data, the location and the direction, and the video stream are received from a server at a web browser, and wherein the texture mapping and rendering are performed by the web browser.

11. The method according to claim 10, wherein the method further comprises:

receiving a notification from the server indicating that the first user is no longer available; and

re-rendering to display the three-dimensional virtual space on the web browser to the second user, the three-dimensional virtual space having no texture-mapped three-dimensional model of the avatar.

12. The method according to claim 11, wherein the method further comprises:

receiving a notification from the server indicating that a third user has entered the three-dimensional virtual space;

receiving a second position and a second direction of the third user in the three-dimensional virtual space;

Receiving a second video stream captured from a camera on the third user's device, the camera positioned to capture photographic images of the third user;

texture mapping the second video stream onto a second three-dimensional model of a second avatar; and

rendering from the perspective of the virtual camera of the second user to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including a second texture-mapped three-dimensional model located at the second location and oriented in the second direction.

13. The method of claim 5, wherein receiving data specifying the three-dimensional virtual space comprises receiving a grid specifying a conference space and receiving a background image, wherein rendering comprises texture mapping the background image onto a sphere.

14. A non-transitory, tangible computer-readable device storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations for implementing a video conference between a first user and a second user, the operations comprising:

Receiving data specifying a three-dimensional virtual space;

receiving a position and a direction in the three-dimensional virtual space, the position and direction being input by the first user;

texture mapping the video stream onto a three-dimensional model of an avatar;

rendering from a perspective of a virtual camera of the second user to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including a texture-mapped three-dimensional model of the avatar located at the location and oriented in the direction,

changing the perspective of the virtual camera of the second user; and

re-rendering from the changed perspective of the virtual camera to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including a texture-mapped three-dimensional model of the avatar located at the location and oriented in the direction,

Wherein the data, the location and the direction, and the video stream are received from a server at a web browser, and wherein the texture mapping and rendering are performed by the web browser.

15. The apparatus of claim 14, wherein the operations further comprise:

16. The apparatus of claim 14, wherein the perspective of the virtual camera is defined by at least one coordinate on a horizontal plane in the three-dimensional virtual space and a pan value and a tilt value.

17. The apparatus of claim 14, wherein the operations further comprise, upon receiving a new location and a new direction of the first user in the three-dimensional virtual space:

18. The apparatus of claim 14, wherein the texture mapping comprises iteratively mapping pixels onto a three-dimensional model of the avatar for each frame of the video stream.

19. The apparatus of claim 14, wherein the operations further comprise:

re-rendering to display the three-dimensional virtual space to the second user on the web browser, the displayed three-dimensional virtual space having no texture-mapped three-dimensional model of the avatar.

20. The apparatus of claim 19, wherein the operations further comprise:

Rendering from a perspective of the virtual camera of the second user to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including a second texture-mapped three-dimensional model located at the second location and oriented in the second direction.

21. The apparatus of claim 14, wherein the receiving data specifying the three-dimensional virtual space comprises receiving a grid specifying a conference space and receiving a background image, wherein the rendering comprises texture mapping the background image onto a sphere.

22. A computer-implemented method for effectuating a video conference between a first user and a second user, the computer-implemented method comprising:

receiving data specifying a three-dimensional virtual space;

receiving a position and a direction in the three-dimensional virtual space, the position and the direction being input by the first user;

texture mapping the video stream onto a three-dimensional model of an avatar by a web application implemented on a web browser;

receiving a notification from a server indicating that the first user is no longer available; and

re-rendering to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space having no texture-mapped three-dimensional model of the avatar.

23. The method according to claim 22, wherein the method further comprises:

24. A system for enabling a video conference between a first user and a second user, the system comprising:

a processor coupled to the memory;

a display screen;

a network interface configured to receive: (i) Data specifying a three-dimensional virtual space, (ii) a location and a direction in the three-dimensional virtual space, wherein the location and the direction are input by the first user, and a video stream captured from a camera on a device of the first user, the camera positioned to capture a photographic image of the first user;

a mapper configured to map the video stream onto a three-dimensional model of an avatar, and

A renderer configured to render from a perspective of a virtual camera of the second user to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including a three-dimensional model of the avatar with a mapped video stream located at the location and oriented in the direction.

25. The system of claim 24, wherein the device further comprises a graphics processing unit, and wherein the mapper and the renderer comprise WebGL application calls that enable the web application to map or render using the graphics processing unit.

26. A computer-implemented method for effectuating a video conference between a first user and a second user, the computer-implemented method comprising:

receiving from the first client device executing the web application (i) a location and an orientation in a three-dimensional virtual space, the location and the orientation being input by the first user, and (ii) a video stream captured from a camera on the first client device, the camera being positioned to capture a photographic image of the first user; and

Transmitting the location and the direction and the video stream to the second client device of the second user, wherein the web application comprises executable instructions that when executed on a web browser map the video stream onto a three-dimensional model of an avatar and render from a perspective of a virtual camera of the second user to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space comprising the three-dimensional model of the avatar at the location and oriented in the direction mapped with the video stream.

27. The method of claim 26, wherein the web application comprises a WebGL application call that enables the web application to map or render using a graphics processing unit of the second client device.

28. A computer-implemented method for effectuating a video conference between a first user and a second user, the computer-implemented method comprising:

receiving data specifying a three-dimensional virtual space;

mapping the video stream onto the three-dimensional model of the avatar by a web application implemented on a web browser; and

rendering, by the web application implemented on the web browser, from a perspective of a virtual camera of the second user to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including a three-dimensional model of the avatar located at the location and oriented in the direction.

29. The method of claim 28, wherein the method further comprises:

30. The method of claim 28, further comprising, upon receiving an input from the second user indicating a desire to change the perspective of the virtual camera:

Changing a perspective of the virtual camera of the second user; and

re-rendering from the changed perspective of the virtual camera to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including a three-dimensional model of the avatar located at the location and oriented in the direction.

31. The method of claim 30, wherein the perspective of the virtual camera is defined by at least one coordinate on a horizontal plane in the three-dimensional virtual space and a pan value and a tilt value.

32. The method of claim 28, further comprising, upon receiving a new location and a new direction of the first user in the three-dimensional virtual space:

re-rendering to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including the three-dimensional model of the avatar located at the new position and oriented in the new direction.

33. The method of claim 28, wherein the mapping comprises repeatedly mapping pixels onto the three-dimensional model of the avatar for each frame of the video stream.

34. The method of claim 28, wherein the data, the location and the direction, and the video stream are received from a server at a web browser, and wherein the mapping and rendering are performed by the web browser.

35. The method of claim 34, wherein the method further comprises:

re-rendering to display the three-dimensional virtual space to the second user on the web browser, the displayed three-dimensional virtual space having no three-dimensional model of the avatar.

36. The method according to claim 35, wherein the method further comprises:

mapping the second video stream onto a second three-dimensional model of a second avatar; and

Rendering from a perspective of the virtual camera of the second user to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including the second three-dimensional model located at the second location and oriented in the second direction.

37. The method of claim 28, wherein receiving data specifying the three-dimensional virtual space comprises receiving a grid specifying a conference space and receiving a background image, wherein rendering comprises mapping the background image onto a sphere.

38. A non-transitory, tangible computer-readable device storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations for enabling a video conference between a first user and a second user, the operations comprising:

receiving data specifying a three-dimensional virtual space;

Mapping the video stream onto a three-dimensional model of an avatar; and

rendering from a perspective of a virtual camera of the second user to display the three-dimensional virtual space to the second user, the displayed three-dimensional virtual space including a three-dimensional model of the avatar located at the location and oriented in the direction.

39. The apparatus of claim 38, wherein the operations further comprise:

40. The device of claim 38, wherein the operations further comprise, upon receiving an input from the second user indicating a desire to change the perspective of the virtual camera:

changing a perspective of the virtual camera of the second user; and

41. The apparatus of claim 40, wherein the perspective of the virtual camera is defined by at least one coordinate on a horizontal plane in the three-dimensional virtual space and a pan value and a tilt value.

42. The apparatus of claim 38, wherein the operations further comprise, upon receiving a new location and a new direction of the first user in the three-dimensional virtual space:

43. The apparatus of claim 38, wherein the mapping comprises repeatedly mapping pixels onto the three-dimensional model of the avatar for each frame of the video stream.

44. The apparatus of claim 38, wherein the data, the location and the direction, and the video stream are received from a server at a web browser, and wherein the mapping and the rendering are performed by the web browser.

45. The apparatus of claim 44, wherein the operations further comprise:

re-rendering to display the three-dimensional virtual space to the second user at the web browser, the displayed three-dimensional virtual space having no three-dimensional model of the avatar.

46. The apparatus of claim 45, wherein the operations further comprise:

47. The apparatus of claim 38, wherein receiving data specifying the three-dimensional virtual space comprises receiving a grid specifying a conference space and receiving a background image, wherein the rendering comprises mapping the background image onto a sphere.

48. A computer-implemented method for presentation in a virtual conference comprising a plurality of participants, the computer-implemented method comprising:

receiving data specifying a three-dimensional virtual space;

receiving a position and a direction in the three-dimensional virtual space, the position and the direction being input to the virtual conference by a first participant of the plurality of participants;

receiving a video stream captured from a camera on a device of the primary participant, the camera being positioned to capture photographic images of the primary participant;

mapping the video stream onto a three-dimensional model of an avatar;

receiving a presentation stream from the device of the primary participant;

mapping the demonstration stream to a three-dimensional model of a demonstration screen; and

rendering from a perspective of a virtual camera of a second participant of the plurality of participants to display the three-dimensional virtual space to the second participant, the displayed three-dimensional virtual space having a mapped avatar and a mapped presentation screen.

49. The method of claim 48, further comprising:

receiving an audio stream captured in synchronization with the presentation stream from a microphone of the device of the primary participant, the microphone positioned to capture speech of the primary participant; and

And outputting the audio stream synchronously with the display of the presentation stream in the three-dimensional virtual space so as to be played to the second participant.

50. The method of claim 48, further comprising:

receiving the position of a third participant in the plurality of participants in the three-dimensional virtual space;

receiving an audio stream from a microphone of the tertiary participant's device, the microphone positioned to capture speech of the tertiary participant; and

adjusting the audio stream to provide a perception of a location of receipt of the tertiary participant in the three-dimensional virtual space relative to a location of the virtual camera,

wherein the rendering comprises rendering to display the three-dimensional virtual space to the secondary participant, the displayed three-dimensional virtual space having an avatar of the tertiary participant at the receiving location.

51. The method of claim 48, further comprising:

receiving the position of the first participant in the three-dimensional virtual space;

receiving an audio stream from a microphone of the primary participant's device, the microphone positioned to capture speech of the primary participant;

Adjusting the audio stream to provide a perception of a location of reception of the primary participant in the three-dimensional virtual space relative to a location of the virtual camera;

rendering to display the three-dimensional virtual space to the secondary participant, the displayed three-dimensional virtual space having an avatar of the tertiary participant at the receiving location; and

when a presentation mode is entered, the audio stream is adjusted to provide a perception of the mapped presentation screen position relative to the position of the virtual camera.

52. The method of claim 48 wherein the presentation stream is a video of the primary participant.

53. The method of claim 48 wherein the presentation stream is a screen share of the primary participant.

54. The method of claim 48 wherein mapping the video stream comprises mapping frames of the video stream onto a three-dimensional model of the avatar to present moving images of the first participant's face on the avatar.

55. The method of claim 54, wherein the avatar comprises a surface, and wherein the mapping comprises mapping the individual frames onto the surface.

56. The method of claim 55, wherein the rendering comprises rendering a mapped avatar at the location and the direction in the three-dimensional virtual space, wherein the primary participant is capable of changing the location and direction of the mapped avatar within the rendered three-dimensional virtual space based on changes in the location and the direction entered by the primary participant.

57. The method of claim 55, wherein the rendering comprises rendering such that the avatar is located at the position in the three-dimensional virtual space and the surface is oriented in the direction in the three-dimensional virtual space, the method further comprising:

receiving a new direction in the three-dimensional virtual space, the new direction being input by the primary participant;

re-rendering from the perspective of the virtual camera of the secondary participant to display the three-dimensional virtual space to the secondary participant such that the surface of the texture-mapped avatar is oriented in the new direction.

58. The method of claim 57 wherein when the primary participant enters the new direction, the primary participant's virtual camera is changed in accordance with the new direction, the primary participant's virtual camera specifying how to render the three-dimensional virtual space for display to the primary participant.

59. A non-transitory, tangible computer-readable device storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations for presentation in a virtual conference comprising a plurality of participants, the operations comprising:

receiving data specifying a three-dimensional virtual space;

mapping the video stream onto a three-dimensional model of an avatar;

receiving a presentation stream from a device of the first meeting party;

60. The apparatus of claim 59, wherein the operations further comprise:

61. The apparatus of claim 59, wherein the operations further comprise:

62. The apparatus of claim 59, wherein the operations further comprise:

when entering the demonstration mode, the audio stream is adjusted to provide a perception of the position of the mapped demonstration screen relative to the position of the virtual camera.

63. The apparatus of claim 59 wherein the presentation stream is a video of the primary participant.

64. The apparatus of claim 59 wherein the presentation stream is a screen share of the primary participant.

65. A system for conducting a presentation in a virtual conference comprising a plurality of participants, the system comprising:

A processor coupled to the memory;

a network interface configured to receive: (i) data specifying a three-dimensional virtual space, (ii) a location and a direction in the three-dimensional virtual space, the location and the direction being input to the virtual conference by a first participant of the plurality of participants, (iii) a video stream captured from a camera on a device of the first participant, the camera being positioned to capture photographic images of the first participant, and (iv) a presentation stream from the device of the first participant;

a mapper implemented on the processor, the mapper configured to map the video stream onto a three-dimensional model of an avatar and to map the presentation stream onto a three-dimensional model of a presentation screen; and

a renderer implemented on the processor, the renderer configured to render from a perspective of a virtual camera of a second participant of the plurality of participants to display the three-dimensional virtual space to the second participant, the displayed three-dimensional virtual space having a mapped avatar and a mapped presentation screen.

66. The system of claim 65 wherein the presentation stream is a video of the primary participant.

67. The system of claim 65 wherein the presentation stream is a screen share of the primary participant.

68. A computer-implemented method for presentation in a virtual conference comprising a plurality of participants, the computer-implemented method comprising:

receiving from a first device of a first participant in the plurality of participants of the virtual conference (i) a location and a direction in the three-dimensional virtual space, the location and the direction being input by the first participant, (ii) a video stream captured from a camera on the first device, the camera being positioned to capture a photographic image of the first participant, and (iii) a presentation stream; and

transmitting the presentation stream to a second device of a second participant of the plurality of participants, wherein the second device is configured to (i) map the presentation stream onto a three-dimensional model of a presentation screen, (ii) map the video stream onto an avatar, and (iii) render from a perspective of a virtual camera of the second participant of the plurality of participants to display the three-dimensional virtual space to the second participant, the displayed three-dimensional virtual space having a mapped presentation screen space and a mapped three-dimensional model of the avatar located at the location and oriented in the direction.

69. The method of claim 68 wherein the presentation stream is a video of the primary participant.

70. The method of claim 68 wherein the presentation stream is a screen share of the primary participant.

71. The method of claim 68, further comprising:

a web application having executable code is transmitted to the second device, the executable code specifying how the second device is to map and render the presentation screen.

72. A computer-implemented method for providing a virtual conference comprising a plurality of participants, the computer-implemented method comprising:

rendering from a perspective of a first user's virtual camera to display to the first user a three-dimensional virtual space comprising an avatar having a second user's texture-mapped video, the virtual camera being at a first location in the three-dimensional virtual space and the avatar being at a second location in the three-dimensional virtual space;

receiving an audio stream from a microphone of the second user's device, the microphone positioned to capture speech of the second user; and

Adjusting the volume of the received audio streams to determine left and right audio streams to provide a perception of the position of the second location in the three-dimensional virtual space relative to the first location; and

the left audio stream and the right audio stream are output to be played to the first user in stereo.

73. The method of claim 72, wherein the adjusting comprises adjusting the relative volumes of the left audio stream and the right audio stream based on the direction of the second position to the first position.

74. The method of claim 73, wherein the adjusting comprises adjusting the relative volumes of the left audio stream and the right audio stream based on a direction of the second position to the first position on a horizontal plane within the three-dimensional virtual space.

75. The method of claim 72, wherein the adjusting comprises adjusting the relative volumes of the left audio stream and the right audio stream based on the direction of the second location relative to the first location on a horizontal plane within the three-dimensional virtual space.

76. The method of claim 72, wherein the adjusting comprises adjusting the relative volumes of the left audio stream and the right audio stream based on the direction in which the avatar faces in the three-dimensional virtual space such that the left audio stream tends to have a higher volume when the avatar is located on the left side of the avatar and the right audio stream tends to have a higher volume when the avatar is located on the right side of the avatar.

77. The method of claim 76 wherein the adjusting comprises adjusting the relative volumes of the left and right audio streams based on an angle between the direction in which the virtual camera faces and the direction in which the avatar faces such that an angle that is more perpendicular to the direction in which the avatar faces tends to have a larger difference in volume between the left and right audio streams.

78. The method of claim 72, wherein the adjusting comprises adjusting the volume of the first audio stream and the second audio stream based on a distance between the second location and the first location.

79. A non-transitory, tangible computer-readable device storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations for presentation in a virtual conference comprising a plurality of participants, the operations comprising:

rendering from a perspective of a first user's virtual camera to display a three-dimensional virtual space to the first user, the displayed three-dimensional virtual space comprising an avatar having a texture-mapped video of a second user, the virtual camera being at a first location in the three-dimensional virtual space and the avatar being at a second location in the three-dimensional virtual space;

80. The device of claim 79, wherein the adjusting comprises adjusting the relative volumes of the left audio stream and the right audio stream based on the direction of the second location to the first location.

81. The device of claim 79, wherein the adjusting comprises adjusting the relative volumes of the left audio stream and the right audio stream based on a direction of the second location to the first location on a horizontal plane within the three-dimensional virtual space.

82. The device of claim 79, wherein the adjusting comprises adjusting the relative volumes of the left audio stream and the right audio stream based on the direction of the second location relative to the first location on a horizontal plane within the three-dimensional virtual space.

83. The device of claim 82, wherein the adjusting comprises adjusting the relative volumes of the left audio stream and the right audio stream based on a direction in which the avatar faces in the three-dimensional virtual space such that the left audio stream tends to have a higher volume when the avatar is located on the left side of the avatar and the right audio stream tends to have a higher volume when the avatar is located on the right side of the avatar.

84. The device of claim 83, wherein the adjusting comprises adjusting the relative volumes of the left and right audio streams based on an angle between the direction in which the virtual camera faces and the direction in which the avatar faces such that an angle that is more perpendicular to the direction in which the avatar faces tends to have a larger difference in volume between the left and right audio streams.

85. The device of claim 79, wherein the adjusting comprises adjusting the volume of a first audio stream and a second audio stream based on a distance between the second location to the first location.

86. A system for providing a virtual conference comprising a plurality of participants, the system comprising:

A processor coupled to the memory;

a renderer implemented on the processor, the renderer configured to render from a perspective of a virtual camera of a first user to display a three-dimensional virtual space to the first user, the displayed three-dimensional virtual space including an avatar having a texture-mapped video of a second user, the virtual camera being at a first location in the three-dimensional virtual space and the avatar being at a second location in the three-dimensional virtual space;

a network interface configured to receive an audio stream from a microphone of the second user's device, the microphone being positioned to capture speech of the second user;

an audio processor configured to adjust the volume of the received audio streams to determine left and right audio streams to provide a perception of the position of the second location in the three-dimensional virtual space relative to the first location; and

a stereo speaker that outputs the left audio stream and the right audio stream to be played to the first user in stereo.

87. The system of claim 86, wherein the audio processor is configured to adjust the relative volumes of the left audio stream and the right audio stream based on the direction of the second location to the first location.

88. The system of claim 87, wherein the audio processor is configured to adjust the relative volumes of the left audio stream and the right audio stream based on the direction of the second location to the first location on a horizontal plane within the three-dimensional virtual space.

89. The system of claim 86, wherein the audio processor is configured to adjust the relative volumes of the left audio stream and the right audio stream based on the direction of the second location relative to the first location on a horizontal plane within the three-dimensional virtual space.

90. The system of claim 89, wherein the audio processor is configured to adjust the relative volumes of the left audio stream and the right audio stream based on the direction in which the avatar faces in the three-dimensional virtual space such that the left audio stream tends to have a higher volume when the avatar is located on the left side of the avatar and the right audio stream tends to have a higher volume when the avatar is located on the right side of the avatar.

91. The system of claim 86, wherein the audio processor is configured to adjust the relative volumes of the left and right audio streams based on an angle between the direction in which the virtual camera is facing and the direction in which the avatar is facing such that angles that are more perpendicular to the direction in which the avatar is facing tend to have a larger difference in volume between the left and right audio streams.

92. The system of claim 86, wherein the audio processor is configured to adjust the volume of the first and second audio streams based on the distance between the second location and the first location.

93. A computer-implemented method for providing audio for a virtual conference, the computer-implemented method comprising:

(a) Rendering from a perspective of a first user's virtual camera to display to the first user at least a portion of a three-dimensional virtual space including an avatar representing a second user, the virtual camera being at a first location in the three-dimensional virtual space and the avatar being at a second location in the three-dimensional virtual space, wherein the three-dimensional virtual space is partitioned into a plurality of regions;

(b) Receiving an audio stream from a microphone of the second user's device, the microphone positioned to capture speech of the second user;

(c) Determining whether the virtual camera and the avatar are located in a same region of the plurality of regions;

(d) Determining whether the avatar is in a podium area among the plurality of areas;

(e) Weakening the audio stream when it is determined that the virtual camera and the avatar are not located in the same area and it is determined that the avatar is not in the podium area; and

(f) Outputting the audio stream for playing to the first user.

94. The computer-implemented method of claim 93, wherein the audio stream is a first audio stream, wherein the three-dimensional virtual space comprises a second avatar representing a third user, wherein determining (c) comprises determining that the virtual camera and the avatar are located in the same area, the computer-implemented method further comprising:

(g) Receiving a second audio stream from a microphone of the first user's device, the microphone positioned to capture speech of the first user;

(h) Determining that the second avatar and the virtual camera are located in a different region of the three-dimensional virtual space than the same region in which the virtual camera is located; and

(i) Attenuating the first audio stream and the second audio stream to prevent the first audio stream and the second audio stream from being heard by the third user, thereby enabling a private conversation between the first user and the second user.

95. The computer-implemented method of claim 93, wherein each of the plurality of regions has a wall transmission factor that specifies a degree of attenuation of the audio stream in (e).

96. The computer-implemented method of claim 93, wherein each of the plurality of regions has a distance transmission factor, the computer-implemented method further comprising:

(g) Determining a distance between the virtual camera and the avatar in the three-dimensional virtual space;

(h) Determining at least one region between the virtual camera and the avatar; and

(i) Weakening the audio stream based on the distance determined in (g) and the distance transmission factor corresponding to the at least one region determined in (h).

97. The computer-implemented method of claim 93, wherein the plurality of regions are structured as a hierarchy.

98. The computer-implemented method of claim 97, wherein each of the plurality of areas has a wall transmission factor, the computer-implemented method further comprising:

(g) Traversing the hierarchy to determine a subset of regions in a plurality of regions between a region including the avatar and a region including the avatar; and

(h) Attenuating the audio stream based on respective wall transmission factors corresponding to the subset of regions determined in (g).

99. A non-transitory, tangible computer-readable device storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations for providing audio for a virtual meeting, the operations comprising:

(f) Outputting the audio stream for playing to the first user.

100. The device of claim 99, wherein the audio stream is a first audio stream, wherein the three-dimensional virtual space comprises a second avatar representing a third user, wherein the determining (c) comprises determining that the virtual camera and the avatar are located in the same area, the operations further comprising:

101. The apparatus of claim 99, wherein each of the plurality of regions has a wall transmission factor that specifies a degree of attenuation of the audio stream in (e).

102. The apparatus of claim 99, wherein each of the plurality of regions has a distance transmission factor, the operations further comprising:

103. The apparatus of claim 99, wherein the plurality of regions are structured as a hierarchy.

104. The apparatus of claim 103, wherein the respective ones of the plurality of areas have wall transmission factors, the operations further comprising:

105. A system for providing audio for a virtual conference, the system comprising:

a processor coupled to the memory;

a renderer implemented on the processor and configured to render from a perspective of a virtual camera of a first user to display at least a portion of a three-dimensional virtual space to the first user, the displayed three-dimensional virtual space including an avatar representing a second user, the virtual camera being at a first location in the three-dimensional virtual space and the avatar being at a second location in the three-dimensional virtual space, wherein the three-dimensional virtual space is partitioned into a plurality of regions, the plurality of regions being structured as a hierarchy;

a network interface configured to receive an audio stream from a microphone of the second user's device, the microphone being positioned to capture speech of the second user; and

an audio processor configured to determine whether the virtual camera and the avatar are located in a same region of the plurality of regions and whether the avatar is in a podium region of the plurality of regions, and to attenuate the audio stream and output the audio stream to play to the first user when it is determined that the virtual camera and the avatar are not located in the same region and it is determined that the avatar is not in the podium region.

106. The system of claim 105, wherein each of the plurality of regions has a wall transmission factor that specifies a degree of attenuation of the audio stream.

107. The system of claim 105, wherein each of the plurality of regions has a distance transmission factor, the audio processor configured to: (i) determining a distance between the virtual camera and the avatar in the three-dimensional virtual space, (ii) determining at least one region between the virtual camera and the avatar, and (iii) weakening the audio stream based on the determined distance and the distance transmission factor corresponding to the determined at least one region.

108. The system of claim 105, wherein each of the plurality of regions has a wall transmission factor, wherein the audio processor is configured to traverse the hierarchy to determine a subset of regions among the plurality of regions between the region including the avatar and the region including the virtual camera, and to attenuate the audio stream based on each wall transmission factor corresponding to the determined subset of regions.

109. A computer-implemented method for streaming video for virtual conferences, the computer-implemented method comprising:

(a) Determining a distance between a first user and a second user in a virtual conference space;

(b) Receiving a video stream captured from a camera on a device of the first user, the camera positioned to capture photographic images of the first user;

(c) Selecting a reduced resolution or bit rate of the video stream based on the determined distance such that a closer distance results in a greater resolution or bit rate than a farther distance; and

(d) Requesting to transmit a video stream at a reduced resolution or bit rate to a device of the second user for display to the second user within the virtual conference space, the video stream to be mapped on an avatar of the first user for display to the second user within the virtual conference space.

110. The method of claim 109, further comprising:

(e) Receiving a second video stream captured from a camera on a third user's device, the camera positioned to capture photographic images of the third user;

(f) Determining an available bandwidth for transmitting data of the virtual conference to the second user;

(g) Determining a second distance between the first user and the second user in the virtual conference space; and

(h) Allocating available bandwidth between the video stream received in (b) and the second video stream received in (e) based on the relative relationship of the second distance determined in (g) to the distance determined in (a).

111. The method of claim 110, wherein the assigning (h) comprises prioritizing video streams of closer users over video streams from farther users.

112. The method of claim 110, further comprising:

(i) Receiving a first audio stream from a device of the first user;

(j) Receiving a second audio stream from a device of the third user, wherein the assigning (h) comprises reserving a portion of the first audio stream and the second audio stream.

113. The method of claim 112, further comprising:

(k) The quality of the first audio stream and the second audio stream is reduced according to the size of the reserved portion.

114. The method of claim 113, wherein said reducing (k) comprises reducing said mass independently of a relative relationship of said second distance determined in (g) to a distance determined in (a).

115. The method of claim 109, further comprising:

(e) Determining the distance between the first user and the second user in the virtual conference space to invalidate video display at the distance;

in response to the determination in (e):

(f) Suspending the transmission of the video stream to the device of the second user; and

(g) Notifying the second user that the device replaced the video stream with a still image.

116. The method of claim 109, wherein the video stream at the reduced resolution is mapped by the device of the second user onto an avatar for display to the second user, the avatar to be rendered at the location of the second user within the virtual conference space.

117. A non-transitory, tangible computer-readable device storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations for streaming video of a virtual conference, the operations comprising:

118. The apparatus of claim 117, wherein the operations further comprise:

119. The apparatus of claim 118, wherein the assigning (h) comprises prioritizing video streams of closer users over video streams from farther users.

120. The apparatus of claim 118, wherein the operations further comprise:

(i) Receiving a first audio stream from a device of the first user;

121. The apparatus of claim 120, wherein the operations further comprise:

122. The apparatus of claim 121, wherein the reducing (k) comprises reducing the mass independent of the second distance determined in (g) versus the distance determined in (a).

123. The apparatus of claim 117, wherein the operations further comprise:

In response to the determination in (e):

124. The device of claim 117, wherein the video stream at the reduced resolution is mapped by the device of the second user onto an avatar to be displayed to the second user, the avatar to be rendered at the location of the second user within the virtual conference space.

125. A system for streaming video for virtual conferences, the system comprising:

a processor coupled to the memory;

a network interface that receives a video stream captured from a camera on a device of a first user, the camera being positioned to capture a photographic image of the first user;

a stream adjuster configured to determine a distance between a first user and a second user in the virtual conference space, and to reduce a resolution of the video stream based on the determined distance, such that a closer distance results in a greater resolution or bit rate than a farther distance,

Wherein the network interface is configured to transmit a video stream at a reduced resolution or bit rate to the device of the second user for display to the second user within the virtual conference space, the video stream to be mapped on the avatar of the first user for display to the second user within the virtual conference space.

126. The system of claim 125 wherein the network interface receives a second video stream captured from a camera on a device of a third user, the camera positioned to capture photographic images of the third user,

wherein the stream adjuster (i) determines an available bandwidth for transmitting data of the virtual conference to the second user, (ii) determines a second distance between a first user and a second user in a virtual conference space, and (iii) allocates the available bandwidth between the video stream and the second video stream based on a relative relationship of the second distance to the distance.

127. The system of claim 126, wherein the stream adjuster is configured to prioritize video streams from closer users over video streams from farther users.

128. The system of claim 126, wherein the network interface is configured to receive a first audio stream from the first user's device and a second audio stream from the third user's device, wherein the stream adjuster is configured to retain a portion of the first audio stream and the second audio stream.

129. A computer-implemented method for streaming video for virtual video conferencing, the computer-implemented method comprising:

receiving a three-dimensional model of a virtual environment;

receiving a mesh representing a three-dimensional model of an object;

receiving a video stream of a first participant of the virtual video conference, the video stream comprising a plurality of frames;

mapping each of the plurality of frames of the video stream onto a three-dimensional model represented by a grid to produce an avatar navigable by the primary participant, wherein the grid is produced independently of the video stream; and

rendering from the perspective of a second participant's virtual camera to display the mapped avatar and the mesh representing the three-dimensional model of the object to the second participant in the virtual environment.

130. The method of claim 129 wherein the object is a product and wherein the secondary participant is able to navigate the virtual camera around a three-dimensional model of the product.

131. The method of claim 129, wherein the object is an advertisement.

132. The method of claim 129, further comprising:

The product is presented in the virtual meeting space.

133. The method of claim 129, wherein the virtual environment is a building.

134. The method of claim 129, wherein the grid is a first grid, the method further comprising:

a request from the second participant is transmitted for the first grid, wherein receiving the first grid occurs in response to the request.

135. The method of claim 129, further comprising:

receiving the position and the direction of the first user in the three-dimensional virtual space;

mapping the video stream onto a three-dimensional model of an avatar, wherein the rendering includes rendering the virtual conference to include the mapped three-dimensional model of the avatar at the location and oriented in the direction.

136. A non-transitory, tangible computer-readable device storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations for streaming video of a virtual conference, the operations comprising:

Receiving a three-dimensional model of a virtual environment;

receiving a mesh representing a three-dimensional model of an object;

137. The non-transitory, tangible computer readable device of claim 136, wherein the object is a product, and wherein the secondary participant is capable of navigating the virtual camera around the three-dimensional model of the product.

138. The non-transitory, tangible computer readable device of claim 136, wherein the object is an advertisement.

139. The non-transitory, tangible computer-readable device of claim 137, wherein the operations further comprise:

The product is presented in the virtual meeting space.

140. The non-transitory, tangible computer readable device of claim 136, wherein the virtual environment is a building.

141. The non-transitory, tangible computer-readable device of claim 136, wherein the grid is a first grid, the operations further comprising:

142. The non-transitory, tangible computer-readable device of claim 136, wherein the operations further comprise:

143. A system for streaming video for virtual video conferencing, the system comprising:

a processor coupled to the memory;

a network interface configured to receive (i) a three-dimensional model of a virtual environment, (ii) a grid representing the three-dimensional model of an object, (iii) a video stream of a first and a meeting of the virtual video conference, the video stream comprising a plurality of frames;

a mapper to map each of the plurality of frames of the video stream onto a three-dimensional model represented by a grid to produce an avatar navigable by the primary participant, wherein the grid is produced independent of the video stream; and

a renderer configured to render from a perspective of a virtual camera of a secondary participant to display the mapped avatar and the mesh representing the three-dimensional model of the object to the secondary participant in the virtual environment.

144. The system of claim 143 wherein the object is a product and wherein the secondary participant is capable of navigating the virtual camera around the three-dimensional model of the product.

145. The system of claim 143, wherein the object is an advertisement.

146. The system of claim 143, wherein the virtual environment is a building.