WO2024020452A1 - Multi-screen presentation in a virtual videoconferencing environment - Google Patents

Multi-screen presentation in a virtual videoconferencing environment

Info

Publication number
WO2024020452A1
WO2024020452A1 PCT/US2023/070509 US2023070509W WO2024020452A1 WO 2024020452 A1 WO2024020452 A1 WO 2024020452A1 US 2023070509 W US2023070509 W US 2023070509W WO 2024020452 A1 WO2024020452 A1 WO 2024020452A1
Authority
WO
WIPO (PCT)
Prior art keywords
presentation
dimensional
participants
stream
participant
Prior art date
Application number
PCT/US2023/070509
Other languages
French (fr)
Inventor
James DONAHOWER
Gerard Cornelis Krol
Petr Polyakov
Erik Stuart Braund
Original Assignee
Katmai Tech Inc.
Priority date
Filing date
Publication date
Priority claimed from US17/813,657 external-priority patent/US11928774B2/en
Priority claimed from US17/813,708 external-priority patent/US20240031531A1/en
Application filed by Katmai Tech Inc. filed Critical Katmai Tech Inc.
Publication of WO2024020452A1 publication Critical patent/WO2024020452A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H04N7/157 Conference systems defining a virtual conference space and using avatars or agents

Definitions

  • This field is generally related to videoconferencing.
  • Video conferencing involves the reception and transmission of audio-video signals by users at different locations for communication between people in real time.
  • Videoconferencing is widely available on many computing devices from a variety of different services, including the ZOOM service available from Zoom Communications Inc. of San Jose, CA.
  • Some videoconferencing software such as the FaceTime application available from Apple Inc. of Cupertino, CA, comes standard with mobile devices.
  • these applications operate by displaying video and outputting audio of other conference participants.
  • the screen may be divided into a number of rectangular frames, each displaying video of a participant.
  • these services operate by having a larger frame that presents video of the person speaking. As different individuals speak, that frame will switch between speakers.
  • the application captures video from a camera integrated with the user’s device and audio from a microphone integrated with the user’s device. The application then transmits that audio and video to other applications running on other users’ devices.
  • Massively multiplayer online games generally can handle quite a few more than 25 participants. These games often have hundreds or thousands of players on a single server. MMOs often allow players to navigate avatars around a virtual world. Sometimes these MMOs allow users to speak with one another or send messages to one another. Examples include the ROBLOX game available from Roblox Corporation of San Mateo, CA, and the MINECRAFT game available from Mojang Studios of Sweden.
  • a computer-implemented method allows users to simultaneously share two screens in a three-dimensional virtual environment.
  • the method comprises receiving data specifying a three-dimensional virtual space.
  • the three-dimensional virtual space comprises a plurality of participants, an avatar representing each of the plurality of participants, and a plurality of three-dimensional models of a plurality of presentation screens.
  • the method further comprises receiving a first selection of a first three-dimensional model of the plurality of three-dimensional models of a first presentation screen of the plurality of presentation screens and receiving a first presentation stream from a first client device of a first participant of the plurality of participants.
  • the method comprises mapping the first presentation stream onto the first three-dimensional model of the first presentation screen.
  • the method further comprises receiving a second selection of a second three-dimensional model of the plurality of three-dimensional models of a second presentation screen of the plurality of presentation screens and receiving a second presentation stream from a second client device of a second participant of the plurality of participants.
  • the method further comprises mapping the second presentation stream onto the second three-dimensional model of the second presentation screen while the first presentation stream is mapped on the first three-dimensional model of the first presentation screen.
  • the method comprises, from a perspective of a virtual camera of a third participant of the plurality of participants, rendering for display to the third participant the three-dimensional virtual space with the first three-dimensional model of the first presentation screen including the first presentation stream and the second three-dimensional model of the second presentation screen including the second presentation stream.
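  • The bookkeeping behind the two-screen method above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the names `PresentationScreen`, `VirtualSpace`, `map_stream`, and `visible_streams` are hypothetical, streams are represented only by identifiers, and the rule that an occupied screen rejects a second stream is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PresentationScreen:
    """A 3-D screen model; stream_id is the stream currently mapped onto it."""
    screen_id: str
    stream_id: Optional[str] = None


@dataclass
class VirtualSpace:
    """Hypothetical container for the presentation screens of one virtual space."""
    screens: Dict[str, PresentationScreen] = field(default_factory=dict)

    def map_stream(self, screen_id: str, stream_id: str) -> None:
        """Map a participant's presentation stream onto the selected screen."""
        screen = self.screens[screen_id]
        if screen.stream_id is not None:
            # Assumption for this sketch: one stream per screen at a time.
            raise ValueError(f"screen {screen_id} is already in use")
        screen.stream_id = stream_id

    def visible_streams(self) -> Dict[str, str]:
        """Screens with an active stream, as a third participant's renderer needs."""
        return {
            s.screen_id: s.stream_id
            for s in self.screens.values()
            if s.stream_id is not None
        }


space = VirtualSpace({sid: PresentationScreen(sid) for sid in ("screen-1", "screen-2")})
# The first and second participants each select a different screen.
space.map_stream("screen-1", "stream-from-participant-1")
space.map_stream("screen-2", "stream-from-participant-2")
```

  • Because each screen holds its own stream reference, the second mapping leaves the first untouched, so a renderer for a third participant sees both streams at once.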
  • a computer-implemented method allows sharing a presentation stream in a two-dimensional view.
  • the method comprises receiving data specifying a three-dimensional virtual space.
  • the three-dimensional virtual space comprises a plurality of participants and an avatar representing each of the plurality of participants.
  • the method further comprises receiving a presentation stream from a first client device of a first participant of the plurality of participants and mapping the presentation stream onto a three-dimensional model of a presentation screen in the three-dimensional virtual space.
  • the method further comprises receiving a selection of the presentation screen from a second participant of the plurality of participants and rendering for display to the second participant a two-dimensional view of the presentation stream.
  • Figure 1 is a diagram illustrating an example interface that provides videoconferencing in a virtual environment with video streams being mapped onto avatars.
  • Figure 2 is a diagram illustrating a three-dimensional model used to render a virtual environment with avatars for videoconferencing.
  • Figure 3 is a diagram illustrating a system that provides videoconferences in a virtual environment.
  • Figures 4A-C illustrate how data is transferred between various components of the system in figure 3 to provide videoconferencing.
  • Figures 5A-B are diagrams illustrating different volume areas in a virtual environment during a videoconference.
  • Figures 6A-E illustrate an interface with example presentation screens in a three-dimensional virtual environment used for videoconferencing.
  • Figure 7 is a flowchart illustrating a method for sharing a presentation stream on a presentation screen in a virtual conference room, according to some embodiments.
  • Figure 8 is a flowchart illustrating a method for rendering a 2-D view of a presentation stream, according to some embodiments.
  • Figure 9 is a diagram illustrating components of devices used to provide videoconferencing within a virtual environment.
  • Figure 1 is a diagram illustrating an example of an interface 100 that provides videoconferences in a virtual environment with video streams being mapped onto avatars.
  • Interface 100 may be displayed to a participant to a videoconference.
  • interface 100 may be rendered for display to the participant and may be constantly updated as the videoconference progresses.
  • a user may control the orientation of their virtual camera using, for example, keyboard inputs. In this way, the user can navigate around a virtual environment.
  • different inputs may change the virtual camera’s X and Y position and pan and tilt angles in the virtual environment.
  • a user may use inputs to alter height (the Z coordinate) or yaw of the virtual camera.
  • a user may enter inputs to cause the virtual camera to “hop” up while returning to its original position, simulating gravity.
  • the inputs available to navigate the virtual camera may include, for example, keyboard and mouse inputs, such as WASD keyboard keys to move the virtual camera forward, backward, left, or right on an X-Y plane, a space bar key to “hop” the virtual camera, and mouse movements specifying changes in pan and tilt angles.
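  • The navigation inputs described above can be sketched as a small camera state object. This is only an illustration under stated assumptions: the class `VirtualCamera`, its methods, the convention that a pan of zero faces the positive X axis, and the clamping of tilt to a quarter turn up or down are all hypothetical choices, not details given in the description.

```python
import math
from dataclasses import dataclass


@dataclass
class VirtualCamera:
    """Hypothetical camera state: position on the X-Y plane plus pan and tilt."""
    x: float = 0.0
    y: float = 0.0
    pan: float = 0.0   # radians, rotation about the vertical axis
    tilt: float = 0.0  # radians, positive looks up

    def move(self, key: str, step: float = 1.0) -> None:
        """Move on the X-Y plane relative to the current pan angle (WASD keys)."""
        forward = (math.cos(self.pan), math.sin(self.pan))
        left = (-math.sin(self.pan), math.cos(self.pan))
        dx, dy = {
            "w": forward,
            "s": (-forward[0], -forward[1]),
            "a": left,
            "d": (-left[0], -left[1]),
        }[key.lower()]
        self.x += dx * step
        self.y += dy * step

    def look(self, d_pan: float, d_tilt: float) -> None:
        """Apply mouse deltas; tilt is clamped so the camera cannot flip over."""
        self.pan = (self.pan + d_pan) % (2 * math.pi)
        limit = math.pi / 2
        self.tilt = max(-limit, min(limit, self.tilt + d_tilt))


cam = VirtualCamera()
cam.move("w", 2.0)          # walk forward along +X
cam.look(math.pi / 2, 0.0)  # turn a quarter turn to the left
cam.move("w", 1.0)          # forward is now along +Y
```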
  • Interface 100 includes avatars 102A and B, which each represent different participants to the videoconference.
  • Avatars 102A and B, respectively, have texture mapped video streams 104A and B from devices of the first and second participants.
  • a texture map is an image applied (mapped) to the surface of a shape or polygon.
  • the images are respective frames of the video.
  • the camera devices capturing video streams 104A and B are positioned to capture faces of the respective participants. In this way, the avatars have moving images of faces texture mapped onto them as participants in the meeting talk and listen.
  • avatars 102A and B are controlled by the respective participants that they represent.
  • Avatars 102A and B are three-dimensional models represented by a mesh. Each avatar 102A and B may have the participant’s name underneath the avatar.
  • the respective avatars 102A and B are controlled by the various users. They each may be positioned at a point corresponding to where their own virtual cameras are located within the virtual environment. Just as the user viewing interface 100 can move around the virtual camera, the various users can move around their respective avatars 102A and B.
  • the virtual environment rendered in interface 100 includes background image 120 and a three-dimensional model 118 of an arena.
  • the arena may be a venue or building in which the videoconference should take place.
  • the arena may include a floor area bounded by walls.
  • Three-dimensional model 118 can include a mesh and texture. Other ways to mathematically represent the surface of three-dimensional model 118 may be possible as well. For example, polygon modeling, curve modeling, and digital sculpting may be possible.
  • three-dimensional model 118 may be represented by voxels, splines, geometric primitives, polygons, or any other possible representation in three-dimensional space.
  • Three-dimensional model 118 may also include specification of light sources.
  • the light sources can include for example, point, directional, spotlight, and ambient.
  • the objects may also have certain properties describing how they reflect light.
  • the properties may include diffuse, ambient, and specular lighting interactions.
  • the virtual environment can include various other three-dimensional models that illustrate different components of the environment.
  • the three-dimensional environment can include a decorative model 114, a speaker model 116, and a presentation screen model 122.
  • these models can be represented using any mathematical way to represent a geometric surface in three-dimensional space. These models may be separate from three-dimensional model 118 or combined into a single representation of the virtual environment.
  • Decorative models such as decorative model 114, serve to enhance the realism and increase the aesthetic appeal of the arena.
  • Speaker model 116 may virtually emit sound, such as presentation and background music, as will be described in greater detail below with respect to figures 5 and 7.
  • Presentation screen model 122 can serve to provide an outlet to present a presentation. Video of the presenter or a presentation screen share may be texture mapped onto presentation screen model 122.
  • Button 108 may provide the user a list of participants. In one example, after a user selects button 108, the user could chat with other participants by sending text messages, individually or as a group.
  • Button 110 may enable a user to change attributes of the virtual camera used to render interface 100.
  • the virtual camera may have a field of view specifying the angle at which the data is rendered for display. Modeling data within the camera field of view is rendered, while modeling data outside the camera’s field of view may not be.
  • the virtual camera’s field of view may be set somewhere between 60 and 110°, which is commensurate with a wide-angle lens and human vision.
  • selecting button 110 may cause the virtual camera to increase the field of view to exceed 170°, commensurate with a fisheye lens. This may enable a user to have broader peripheral awareness of their surroundings in the virtual environment.
  • button 112 causes the user to exit the virtual environment. Selecting button 112 may cause a notification to be sent to devices belonging to the other participants signaling to their devices to stop displaying the avatar corresponding to the user previously viewing interface 100.
  • a virtual 3D space is used to conduct videoconferencing. Every user controls an avatar, which they can move around, turn to look around, make jump, or otherwise direct in ways that change its position or orientation.
  • a virtual camera shows the user the virtual 3D environment and the other avatars.
  • the avatars of the other users have as an integral part a virtual display, which shows the webcam image of the user.
  • embodiments provide a more social experience than conventional web conferencing or conventional MMO gaming. That more social experience has a variety of applications. For example, it can be used in online shopping.
  • interface 100 has applications in providing virtual grocery stores, houses of worship, trade shows, B2B sales, B2C sales, schooling, restaurants or lunchrooms, product releases, construction site visits (e.g., for architects, engineers, contractors), office spaces (e.g., people work “at their desks” virtually), controlling machinery remotely (ships, vehicles, planes, submarines, drones, drilling equipment, etc.), plant/factory control rooms, medical procedures, garden designs, virtual bus tours with guide, music events (e.g., concerts), lectures (e.g., TED talks), meetings of political parties, board meetings, underwater research, research on hard to reach places, training for emergencies (e.g., fire), cooking, shopping (with checkout and delivery), virtual arts and crafts (e.g., painting and pottery), marriages, funerals, baptisms, remote sports training, counseling, treating fears (e.g., confrontation therapy), fashion shows, amusement parks, home decoration, watching sports, watching esports, watching performances captured using a three-dimensional camera, playing board and role playing games,
  • Figure 2 is a diagram 200 illustrating a three-dimensional model used to render a virtual environment with avatars for videoconferencing.
  • the virtual environment here includes a three-dimensional arena 118, and various three-dimensional models, including three-dimensional models 114 and 122.
  • diagram 200 includes avatars 102A and B navigating around the virtual environment.
  • interface 100 in figure 1 is rendered from the perspective of a virtual camera. That virtual camera is illustrated in diagram 200 as virtual camera 204.
  • the user viewing interface 100 in figure 1 can control virtual camera 204 and navigate the virtual camera in three-dimensional space.
  • Interface 100 is constantly being updated according to the new position of virtual camera 204 and any changes of the models within the field of view of virtual camera 204.
  • the field of view of virtual camera 204 may be a frustum defined, at least in part, by horizontal and vertical field of view angles.
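  • A frustum bounded by horizontal and vertical field of view angles can be tested with a few lines of trigonometry. The sketch below is illustrative only: the function name `in_frustum`, the camera-space axis convention (x to the right, y up, z forward), and the near and far distances are assumptions, and the approach only applies to fields of view below 180°.

```python
import math
from typing import Tuple


def in_frustum(
    point: Tuple[float, float, float],
    h_fov_deg: float,
    v_fov_deg: float,
    near: float = 0.1,
    far: float = 1000.0,
) -> bool:
    """Return True if a camera-space point lies inside the view frustum.

    Assumed convention: x points right, y points up, z points forward
    from the virtual camera, which sits at the origin.
    """
    x, y, z = point
    if not (near <= z <= far):
        return False  # behind the camera, too close, or too far away
    half_h = math.radians(h_fov_deg) / 2
    half_v = math.radians(v_fov_deg) / 2
    # Compare the horizontal and vertical angles off the view axis
    # against half of the respective field of view.
    return abs(math.atan2(x, z)) <= half_h and abs(math.atan2(y, z)) <= half_v


# A point about 50 degrees off-axis is outside a 90 degree field of view
# but inside a 170 degree, fisheye-like field of view.
off_axis = (6.0, 0.0, 5.0)
narrow = in_frustum(off_axis, h_fov_deg=90.0, v_fov_deg=60.0)
wide = in_frustum(off_axis, h_fov_deg=170.0, v_fov_deg=60.0)
```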
  • a background image, or texture, may define at least part of the virtual environment.
  • the background image may capture aspects of the virtual environment that are meant to appear at a distance.
  • the background image may be texture mapped onto a sphere 202.
  • the virtual camera 204 may be at an origin of the sphere 202. In this way, distant features of the virtual environment may be efficiently rendered.
  • other shapes instead of sphere 202 may be used to texture map the background image.
  • the shape may be a cylinder, cube, rectangular prism, or any other three-dimensional geometry.
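  • One common way to texture map a background image onto a sphere centered on the camera is to convert each view direction into image coordinates. The sketch below assumes an equirectangular (longitude and latitude) image layout and a Z-up coordinate system; neither is specified in the description, and the function name `direction_to_uv` is hypothetical.

```python
import math
from typing import Tuple


def direction_to_uv(dx: float, dy: float, dz: float) -> Tuple[float, float]:
    """Map a view direction to (u, v) in an equirectangular background image.

    Assumptions: Z is up, u runs left to right over longitude, and v runs
    top to bottom over latitude, both normalized to the range 0..1.
    """
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        raise ValueError("direction must be non-zero")
    dx, dy, dz = dx / length, dy / length, dz / length
    dz = max(-1.0, min(1.0, dz))  # guard asin against rounding error
    u = 0.5 + math.atan2(dy, dx) / (2 * math.pi)
    v = 0.5 - math.asin(dz) / math.pi
    return u, v


horizon_uv = direction_to_uv(1.0, 0.0, 0.0)  # straight ahead along +X
zenith_uv = direction_to_uv(0.0, 0.0, 2.0)   # straight up
```

  • Because the camera sits at the center of the sphere, only the view direction matters, which is why distant features can be drawn without any per-object geometry.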
  • Figure 3 is a diagram illustrating a system 300 that provides videoconferences in a virtual environment.
  • System 300 includes a server 302 coupled to devices 306A and B via one or more networks 304.
  • Server 302 provides the services to connect a videoconference session between devices 306A and 306B.
  • server 302 communicates notifications to devices of conference participants (e.g., devices 306A-B) when new participants join the conference and when existing participants leave the conference.
  • Server 302 communicates messages describing a position and direction in a three-dimensional virtual space for respective participant’s virtual cameras within the three-dimensional virtual space.
  • Server 302 also communicates video and audio streams between the respective devices of the participants (e.g., devices 306A-B).
  • server 302 stores and transmits data specifying a three-dimensional virtual space to the respective devices 306A-B.
  • server 302 may provide executable information that instructs the devices 306A and 306B on how to render the data to provide the interactive conference.
  • Server 302 responds to requests with a response.
  • Server 302 may be a web server.
  • a web server is software and hardware that uses HTTP (Hypertext Transfer Protocol) and other protocols to respond to client requests made over the World Wide Web.
  • the main job of a web server is to display website content through storing, processing and delivering webpages to users.
  • communication between devices 306A-B happens not through server 302 but on a peer-to-peer basis.
  • one or more of the data describing the respective participants’ location and direction, the notifications regarding new and exiting participants, and the video and audio streams of the respective participants are communicated not through server 302 but directly between devices 306A-B.
  • Network 304 enables communication between the various devices 306A-B and server 302.
  • Network 304 may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless wide area network (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, a wireless network, a WiFi network, a WiMax network, any other type of network, or any combination of two or more such networks.
  • Devices 306A-B are each devices of respective participants to the virtual conference. Devices 306A-B each receive data necessary to conduct the virtual conference and render the data necessary to provide the virtual conference. As will be described in greater detail below, devices 306A-B include a display to present the rendered conference information, inputs that allow the user to control the virtual camera, a speaker (such as a headset) to provide audio to the user for the conference, a microphone to capture a user’s voice input, and a camera positioned to capture video of the user’s face.
  • Devices 306A-B can be any type of computing device, including a laptop, a desktop, a smartphone, a tablet computer, or a wearable computer (such as a smartwatch or an augmented reality or virtual reality headset).
  • Web browser 308A-B can retrieve a network resource (such as a webpage) addressed by a link identifier (such as a uniform resource locator, or URL) and present the network resource for display.
  • web browser 308A-B is a software application for accessing information on the World Wide Web.
  • web browser 308A-B makes this request using the hypertext transfer protocol (HTTP or HTTPS).
  • the web browser retrieves the necessary content from a web server, interprets and executes the content, and then displays the page on a display of device 306A-B, shown as client/counterpart conference application 310A-B.
  • the content may have HTML and client-side scripting, such as JavaScript.
  • Conference application 310A-B may be a web application downloaded from server 302 and configured to be executed by the respective web browsers 308A-B.
  • conference application 310A-B may be a JavaScript application.
  • conference application 310A-B may be written in a higher-level language, such as TypeScript, and translated or compiled into JavaScript.
  • Conference application 310A-B is configured to interact with the WebGL JavaScript application programming interface. It may have control code specified in JavaScript and shader code written in OpenGL ES Shading Language (GLSL ES).
  • conference application 310A-B may be able to utilize a graphics processing unit (not shown) of device 306A-B.
  • WebGL, which is based on OpenGL ES, enables rendering of interactive two-dimensional and three-dimensional graphics without the use of plug-ins.
  • Conference application 310A-B receives the data from server 302 describing position and direction of other avatars and three-dimensional modeling information describing the virtual environment. In addition, conference application 310A-B receives video and audio streams of other conference participants from server 302.
  • Conference application 310A-B renders three-dimensional modeling data, including data describing the three-dimensional environment and data representing the respective participant avatars.
  • This rendering may involve rasterization, texture mapping, ray tracing, shading, or other rendering techniques.
  • the rendering may involve ray tracing based on the characteristics of the virtual camera.
  • Ray tracing involves generating an image by tracing a path of light as pixels in an image plane and simulating the effects of its encounters with virtual objects.
  • the ray tracing may simulate optical effects such as reflection, refraction, scattering, and dispersion.
  • the user uses web browser 308A-B to enter a virtual space.
  • the scene is displayed on the screen of the user.
  • the webcam video stream and microphone audio stream of the user are sent to server 302.
  • an avatar model is created for them.
  • the position of this avatar is sent to the server and received by the other users.
  • Other users also get a notification from server 302 that an audio/video stream is available.
  • the video stream of a user is placed on the avatar that was created for that user.
  • the audio stream is played back as coming from the position of the avatar.
  • Figures 4A-C illustrate how data is transferred between various components of the system in figure 3 to provide videoconferencing. Like figure 3, each of figures 4A-C depicts the connection between server 302 and devices 306A and B. In particular, figures 4A-C illustrate example data flows between those devices.
  • Figure 4A is a diagram 400 illustrating how server 302 transmits data describing the virtual environment to devices 306A and 306B.
  • both devices 306A and 306B receive from server 302 the three-dimensional arena 404, background texture 402, space hierarchy 408, and any other three-dimensional modeling information 406.
  • background texture 402 is an image illustrating distant features of the virtual environment.
  • the image may be regular (such as a brick wall) or irregular.
  • Background texture 402 may be encoded in any common image file format, such as bitmap, JPEG, GIF, or other file image format. It describes the background image to be rendered against, for example, a sphere at a distance.
  • Three-dimensional arena 404 is a three-dimensional model of the space in which the conference is to take place. As described above, it may include, for example, a mesh and possibly its own texture information to be mapped upon the three-dimensional primitives it describes. It may define the space in which the virtual camera and respective avatars can navigate within the virtual environment. Accordingly, it may be bounded by edges (such as walls or fences) that illustrate to users the perimeter of the navigable virtual environment.
  • Space hierarchy 408 is data specifying partitions in the virtual environment. These partitions are used to determine how sound is processed before being transferred between participants. As will be described below, this partition data may be hierarchical and may describe sound processing to allow for areas where participants to the virtual conference can have private conversations or side conversations.
  • Three-dimensional model 406 is any other three-dimensional modeling information needed to conduct the conference. In one embodiment, this may include information describing the respective avatars. Alternatively or additionally, this information may include product demonstrations.
  • Figures 4B-C illustrate how server 302 forwards information from one device to another.
  • Figure 4B illustrates a diagram 420 showing how server 302 receives information from respective devices 306A and B.
  • Figure 4C illustrates a diagram 420 showing how server 302 transmits the information to respective devices 306B and A.
  • device 306A transmits position and direction 422A, video stream 424A, and audio stream 426A to server 302, which transmits position and direction 422A, video stream 424A, and audio stream 426A to device 306B.
  • device 306B transmits position and direction 422B, video stream 424B, and audio stream 426B to server 302, which transmits position and direction 422B, video stream 424B, and audio stream 426B to device 306A.
  • Position and direction 422A-B describe the position and direction of the virtual cameras for the users of respective devices 306A and B.
  • the position may be a coordinate in three-dimensional space (e.g., x, y, z coordinate) and the direction may be a direction in three-dimensional space (e.g., pan, tilt, roll).
  • the user may be unable to control the virtual camera’s roll, so the direction may only specify pan and tilt angles.
  • the user may be unable to change the avatar’s z coordinate (as the avatar is bounded by virtual gravity), so the z coordinate may be unnecessary.
  • position and direction 422A-B each may include at least a coordinate on a horizontal plane in the three-dimensional virtual space and a pan and tilt value.
  • the user may be able to “jump” their avatar, so the Z position may be specified only by an indication of whether the user is jumping their avatar.
  • position and direction 422A-B may be transmitted and received using HTTP request responses or using socket messaging.
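  • A position and direction update of the reduced form described above (a coordinate on the horizontal plane, pan and tilt values, and a jump indication) could be serialized as a small message. The sketch below is a guess at one workable shape: the class `PositionAndDirection`, its field names, and the choice of JSON as the wire format are assumptions, not details from the description.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PositionAndDirection:
    """Hypothetical message body for one participant's virtual camera."""
    participant_id: str
    x: float             # horizontal-plane coordinate
    y: float             # horizontal-plane coordinate
    pan: float           # radians
    tilt: float          # radians
    jumping: bool = False  # stands in for an explicit Z coordinate

    def to_message(self) -> str:
        """Serialize for an HTTP body or a socket message."""
        return json.dumps(asdict(self))

    @classmethod
    def from_message(cls, raw: str) -> "PositionAndDirection":
        """Rebuild the update on the receiving device."""
        return cls(**json.loads(raw))


update = PositionAndDirection("device-306A", x=1.5, y=-2.0, pan=0.25, tilt=-0.1, jumping=True)
wire = update.to_message()
received = PositionAndDirection.from_message(wire)
```

  • Omitting roll and the Z coordinate keeps each update small, which matters because these messages are sent whenever a participant moves.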
  • Video stream 424A-B is video data captured from a camera of the respective devices 306A and B.
  • the video may be compressed.
  • the video may use any commonly known video codecs, including MPEG-4, VP8, or H.264.
  • the video may be captured and transmitted in real time.
  • audio stream 426A-B is audio data captured from a microphone of the respective devices.
  • the audio may be compressed.
  • the audio may use any commonly known audio codecs, including MPEG-4 or Vorbis.
  • the audio may be captured and transmitted in real time.
  • Video stream 424A and audio stream 426A are captured, transmitted, and presented synchronously with one another.
  • video stream 424B and audio stream 426B are captured, transmitted, and presented synchronously with one another.
  • the video stream 424A-B and audio stream 426A-B may be transmitted using the WebRTC application programming interface.
  • WebRTC is an API available in JavaScript.
  • devices 306A and B download and run web applications, as conference applications 310A and B, and conference applications 310A and B may be implemented in JavaScript.
  • Conference applications 310A and B may use WebRTC to receive and transmit video stream 424A-B and audio stream 426A-B by making API calls from its JavaScript.
  • conference applications 310A and B may periodically or intermittently re-render the virtual space based on new information from respective video streams 424A and B, position and direction 422A and B, and new information relating to the three-dimensional environment.
  • each of these updates is now described from the perspective of device 306A.
  • device 306B would behave similarly given similar changes.
  • device 306A texture maps frames from video stream 424B onto an avatar corresponding to device 306B. That texture mapped avatar is re-rendered within the three-dimensional virtual space and presented to a user of device 306A.
  • As device 306A receives a new position and direction 422B, device 306A generates the avatar corresponding to device 306B positioned at the new position and oriented at the new direction. The generated avatar is re-rendered within the three-dimensional virtual space and presented to the user of device 306A.
  • server 302 may send updated model information describing the three-dimensional virtual environment.
  • server 302 may send updated information 402, 404, 406, or 408.
  • device 306A will re-render the virtual environment based on the updated information. This may be useful when the environment changes over time. For example, an outdoor event may change from daylight to dusk as the event progresses.
  • server 302 sends a notification to device 306A indicating that device 306B is no longer participating in the conference. In that case, device 306A would re-render the virtual environment without the avatar for device 306B.
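  • The update cases described from the perspective of device 306A (a new video frame, a new position and direction, and a participant leaving) can be sketched as handlers on a small scene object. This is an illustrative outline only: `Scene`, `Avatar`, the handler names, and the `dirty` flag that marks when a re-render is needed are all hypothetical, and actual rendering is replaced by a stub that lists the avatars to draw.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Avatar:
    """Hypothetical client-side state for one remote participant."""
    position: Tuple[float, float] = (0.0, 0.0)
    pan: float = 0.0
    tilt: float = 0.0
    latest_frame: Optional[bytes] = None  # most recent video frame to texture map


@dataclass
class Scene:
    avatars: Dict[str, Avatar] = field(default_factory=dict)
    dirty: bool = False  # True when the next render pass must redraw

    def on_video_frame(self, device_id: str, frame: bytes) -> None:
        """A new frame arrived: it becomes the avatar's texture."""
        self.avatars.setdefault(device_id, Avatar()).latest_frame = frame
        self.dirty = True

    def on_position(self, device_id: str, position: Tuple[float, float],
                    pan: float, tilt: float) -> None:
        """A new position and direction arrived: move and re-orient the avatar."""
        avatar = self.avatars.setdefault(device_id, Avatar())
        avatar.position, avatar.pan, avatar.tilt = position, pan, tilt
        self.dirty = True

    def on_leave(self, device_id: str) -> None:
        """The server reported that a participant left: drop the avatar."""
        self.avatars.pop(device_id, None)
        self.dirty = True

    def render(self) -> List[str]:
        """Stub renderer: returns the avatars that would be drawn."""
        self.dirty = False
        return sorted(self.avatars)


scene = Scene()
scene.on_position("306B", (1.0, 2.0), pan=0.5, tilt=0.0)
scene.on_video_frame("306B", b"frame-bytes")
drawn_before = scene.render()
scene.on_leave("306B")
drawn_after = scene.render()
```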
  • while figure 3 and figures 4A-C are illustrated with two devices for simplicity, a skilled artisan would understand that the techniques described herein can be extended to any number of devices. Also, while figure 3 and figures 4A-C illustrate a single server 302, a skilled artisan would understand that the functionality of server 302 can be spread out among a plurality of computing devices.
  • the data transferred in figure 4A may come from one network address for server 302, while the data transferred in figures 4B-C can be transferred to/from another network address for server 302.
  • participants can set their webcam, microphone, speakers and graphical settings before entering the virtual conference.
  • users may enter a virtual lobby where they are greeted by an avatar controlled by a real person. This person is able to view and modify the webcam, microphone, speakers and graphical settings of the user.
  • the attendant can also instruct the user on how to use the virtual environment, for example by teaching them about looking, moving around and interacting. When they are ready, the user automatically leaves the virtual waiting room and joins the real virtual environment.
  • Embodiments also adjust volume to provide a sense of position and space within the virtual conference.
  • Figures 5A-B are diagrams illustrating different volume areas in a virtual environment during a videoconference.
  • the server may provide specification of sound or volume areas to the client devices.
  • The virtual environment may be partitioned into different volume areas.
  • Figure 5A illustrates a diagram 500 with a volume area 502 that allows for a semiprivate or side conversation between a user controlling avatar 506 and the user controlling the virtual camera.
  • the users around conference table 510 can have a conversation without disturbing others in the room.
  • the sound from the users controlling avatar 506 and the virtual camera may fall off as it exits volume area 502, but not entirely. That allows passersby to join the conversation if they’d like.
  • Interface 500 also includes buttons 504, 506, and 508, which will be described below.
  • Figure 5B illustrates a diagram 500 with a volume area 504 that allows for a private conversation between a user controlling avatar 508 and the user controlling the virtual camera.
  • audio from the user controlling avatar 508 and the user controlling the virtual camera may only be output to those inside volume area 504. As no audio at all is played from those users to others in the conference, their audio streams may not even be transmitted to the other user devices.
  • the different areas have different roll off factors. In that case, the distance-based attenuation for individual areas is based on the respective roll off factors.
  • different areas of the virtual environment project sound at different rates.
  • the audio gains may be applied to the audio stream to determine left and right audio accordingly.
  • wall transmission factors, roll off factors, and left-right adjustments that provide a sense of direction for the sound may all be applied together to provide a comprehensive audio experience.
  • a volume area may be a podium area. If the user is located in the podium area, no attenuation may occur because of roll off factors or wall transmission factors. In some embodiments, the relative left-right audio may still be adjusted to provide a sense of direction.
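  • As a non-limiting illustration, the combination of roll off factors, wall transmission factors, and the podium exception may be sketched as follows. The function name `attenuated_gain`, the inverse-distance formula, and the treatment of walls as multiplicative factors are assumptions made for this sketch; the passages above do not specify a particular formula.

```python
def attenuated_gain(distance, rolloff, walls_crossed=(), in_podium=False):
    """Return a volume multiplier in [0, 1] for one sound source.

    Assumptions (not from the source): an inverse-distance falloff
    1 / (1 + rolloff * distance), and one multiplicative transmission
    factor per wall between the source and the listener.
    """
    if in_podium:
        # Podium area: no roll off or wall attenuation is applied.
        return 1.0
    gain = 1.0 / (1.0 + rolloff * max(distance, 0.0))
    for transmission in walls_crossed:
        gain *= transmission  # a factor of 0.0 isolates the audio entirely
    return gain
```

  • In this sketch, a larger roll off factor makes sound fade faster with distance, and left-right adjustments would be applied separately to the returned gain.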
  • the same methods may be applied to other sound sources, other than avatars.
  • the virtual environment may have three-dimensional models of speakers. Sound may be emitted from the speakers in the same way as the avatar models described above, either because of a presentation or just to provide background music.
  • wall transmission factors may be used to isolate audio entirely.
  • this can be used to create virtual offices.
  • each user may have in their physical (perhaps home) office a monitor displaying the conference application constantly on and logged into the virtual office.
  • There may be a feature that allows the user to indicate whether he’s in the office or should not be disturbed. If the do-not-disturb indicator is off, a coworker or manager may come around within the virtual space and knock or walk in as they would in a physical office. The visitor may be able to leave a note if the worker is not present in her office. When the worker returns, she would be able to read the note left by the visitor.
  • the virtual office may have a whiteboard and/or an interface that displays messages for the user.
  • the messages may be email and/or from a messaging application such as the SLACK application available from Slack Technologies, Inc. of San Francisco, CA.
  • Users may be able to customize or personalize their virtual offices. For example, they may be able to put up models of posters or other wall ornaments. They may be able to change models or orientation of desks or decorative ornaments, such as plantings. They may be able to change lighting or view out the window.
  • the interface 500 includes various buttons 504, 506, and 508.
  • the attenuation may not occur, or may occur only in smaller amounts. In that situation, the user’s voice is output uniformly to other users, allowing for the user to provide a talk to all participants in the meeting.
  • the user video may also be output on a presentation screen within the virtual environment as well, as will be described below.
  • a speaker mode is enabled. In that case, audio is output from sound sources within the virtual environment, such as to play background music.
  • a screen share mode may be enabled, enabling the user to share contents of a screen or window on their device with other users. The contents may be presented on a presentation model. This too will be described below.
  • Embodiments also allow users in the three-dimensional virtual environment to share a presentation stream on a presentation screen in the three-dimensional virtual environment used for video conferencing.
  • Figures 6A-E illustrate an interface 600 with presentation screens in a three-dimensional virtual environment used for videoconferencing.
  • interface 600 may be displayed to a user who can navigate around the virtual environment.
  • the virtual environment includes a virtual conference room with multiple presentation screens.
  • FIG. 6A illustrates interface 600 with a three-dimensional model of presentation screen 602.
  • interface 600 may include a virtual conference room. Users represented by avatars 604-606 may conduct a meeting in the virtual conference room. The users may be participants of the meeting.
  • the virtual conference room may include the three-dimensional model of presentation screen 602. Presentation screen 602 may be positioned at a central location of the virtual conference room.
  • the virtual conference room may include a virtual conference table with additional three-dimensional models of presentation screens distributed around the virtual conference table. The virtual conference table with the additional three-dimensional models of presentation screens will be described in further detail with respect to Figures 6B-6C.
  • Presentation screen 602 may be a main presentation screen.
  • presentation screen 602 may be larger than the additional presentation screens distributed around the virtual conference table.
  • presentation screen 602 may be positioned such that it is visible to all of the participants in the virtual conference room.
  • a first participant in the virtual conference room may want to share a presentation stream captured by their device (e.g., device 306A or device 306B) on the three-dimensional model of presentation screen 602.
  • the server e.g., server 302
  • the share screen button may cause a prompt to be rendered on the first participant’s device asking the user to select a screen and/or window of their device to share on presentation screen 602.
  • the first participant may select a share screen button on interface 600.
  • Selecting the share screen button on interface 600 may cause a prompt to be rendered on the first participant’s device asking the first participant to select a screen and/or window of their device to share on the three-dimensional model of presentation screen 602. Furthermore, selecting the share screen button on interface 600 may cause a prompt to be rendered on the first participant’s device asking the user to select a presentation screen in the virtual conference room.
  • the server may receive a presentation stream from the first participant’s device. Specifically, the selection of presentation screen 602 and presentation stream may be published to the server.
  • the presentation stream may include audio and video data.
  • the presentation stream may be a video stream from a camera on first participant’s device.
  • the presentation stream may be a screen share from the user’s device, where a monitor or window is shared. Through screen share or otherwise, the presentation video and audio stream could also be from an external source, for example a livestream of an event.
  • the presentation stream (and audio stream) of the first participant is published to the server tagged with the name of the screen the user wants to use.
  • the presentation stream is texture mapped onto the three-dimensional model of a presentation screen 602.
  • a presentation screen may take a variety of forms, such as a poster, a view out of a window, a view of a control panel, or a surface of a table with objects placed on it.
  • the participants in the virtual conference room can consume the presentation stream by viewing presentation screen 602. In other words, from the perspective of the virtual cameras of the participants in the virtual conference room, the presentation stream may be rendered for display to the users in the virtual conference room on presentation screen 602.
  • An audio stream is captured synchronously with the presentation stream and from a microphone of the device of the first participant.
  • the audio stream from the microphone of the first participant may be heard by other participants as coming from presentation screen 602.
  • presentation screen 602 may be a sound source as described above. Because the first participant’s audio stream is projected from the presentation screen 602, it may be suppressed coming from the first participant’s avatar. In this way, the audio stream is outputted to play synchronously with display of the presentation stream on screen 602 within the three-dimensional virtual space.
  • the first participant may also be able to control the location and orientation of the audience members.
  • the first participant may have an option to select to re-arrange all the other participants to the meeting to be positioned and oriented to face the presentation screen.
  • FIG. 6B illustrates interface 600 with multiple presentation screens around a virtual conference table.
  • the virtual conference room may include a three-dimensional model of a virtual conference table 611.
  • three-dimensional models of presentation screens 610-616 may be positioned around the virtual conference table 611.
  • Each of the presentation screens 610-616 may include a ‘share screen’ button.
  • presentation screen 610 may include ‘share screen’ button 618.
  • presentation screens may also include the ‘share screen’ button.
  • the first participant may select ‘share screen’ button 618 located on the three-dimensional model of presentation screen 610 to share a presentation stream from their device (e.g., device 306A or 306B).
  • the share screen button may cause a prompt to be rendered on the first participant’s device asking the first participant to select a screen and/or window of their device to share on presentation screen 610.
  • the first participant may select a share screen button on interface 600. Selecting the share screen button on interface 600 may cause a prompt to be rendered on the first participant’s device asking the user to select a screen and/or window of their device to share on presentation screen 610.
  • selecting the share screen button on interface 600 may cause a prompt to be rendered on the first participant’s device asking the first participant to select one of presentation screens 610-616.
  • the first participant may also select presentation screen 602, as shown in figure 6A.
  • the server may automatically select one of presentation screens 610-616 (or 602) based on the availability of the presentation screen, the size of the presentation screen, or the proximity of the first participant’s avatar to a given presentation screen in the three-dimensional virtual environment.
  • the server may receive a presentation stream from the first participant’s device. Specifically, the selection of the three-dimensional model of presentation screen 610 and presentation stream may be published to the server. Other clients are notified that a new stream is available.
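  • As a non-limiting illustration, automatic selection of a presentation screen based on availability, proximity, and size may be sketched as follows. The `Screen` record, the `pick_screen` function, and the particular ordering of the criteria (nearest available screen first, larger screen as a tie-breaker) are assumptions made for this sketch.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Screen:
    name: str
    position: Vec3
    size: float            # hypothetical size measure in virtual-world units
    available: bool = True


def pick_screen(screens: Sequence[Screen], avatar_pos: Vec3) -> Optional[Screen]:
    """Pick the nearest available screen; break ties in favor of larger ones."""
    candidates = [s for s in screens if s.available]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda s: (math.dist(s.position, avatar_pos), -s.size),
    )
```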
  • FIG. 6C illustrates the first participant’s presentation stream being shared on presentation screen 610.
  • the first participant’s presentation stream 620 is texture mapped onto the three-dimensional model of presentation screen 610.
  • the participants in the virtual conference room can consume presentation stream 620 by viewing presentation screen 610.
  • presentation stream 620 may be rendered for display to the participants in the virtual conference room on the three-dimensional model of a presentation screen 610. As such, the participants will have to navigate their respective avatars with respect to presentation screen 610 in the virtual conference room to consume presentation stream 620.
  • An audio stream is captured synchronously with presentation stream 620 and from a microphone of the device of the first participant.
  • the audio stream from the microphone of the user may be heard by other users as coming from presentation screen 610.
  • presentation screen 610 may be a sound source as described above.
  • because the first participant’s audio stream is projected from presentation screen 610, it may be suppressed coming from the user’s avatar. In this way, the audio stream is outputted to play synchronously with display of presentation stream 620 on presentation screen 610 within the three-dimensional virtual space.
  • the first participant or other participants in the virtual conference room may share other presentation streams on remaining available presentation screens (e.g., presentation screens 612-616 or 602).
  • a single participant may concurrently share a plurality of streams, or multiple participants may each share concurrently a presentation stream in the virtual conference room.
  • audio from the various presentation streams is combined (along with audio from other sources, such as from other users’ avatars), as described above with respect to FIGs. 5A-B.
  • the audio is positional, including optional falloff.
  • a user such as a presenter, participant, or administrator, may have more control over the audio.
  • a user interface element may enable the user to mute all participants or all presentation streams except a specific one.
  • the first participant can choose to stop sharing the presentation stream by selecting a button on interface 600.
  • the server can end the presentation stream being rendered on a presentation screen.
  • the server may end the presentation stream being rendered on a presentation screen based on the first participant leaving the three-dimensional virtual environment or the first participant’s avatar being more than a predetermined threshold distance from the presentation screen. For example, if the first participant leaves the virtual conference room, the server may end the presentation stream being rendered on a presentation screen.
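  • As a non-limiting illustration, the two conditions under which the server may end a presentation stream may be sketched as follows. The function name `should_end_stream` and its parameters are hypothetical, and the threshold value is left to the caller.

```python
import math


def should_end_stream(presenter_in_environment, avatar_pos, screen_pos, max_distance):
    """Return True when the server would stop rendering the stream."""
    if not presenter_in_environment:
        # The presenter has left the three-dimensional virtual environment.
        return True
    # The presenter's avatar is beyond the threshold distance from the screen.
    return math.dist(avatar_pos, screen_pos) > max_distance
```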
  • the virtual conference room may be a volume area. As such, users outside the virtual conference room may not be able to consume the presentation streams being shared by the participants in the virtual conference room.
  • the server, in response to the presentation stream being published to the server, may determine which participants in the virtual conference room may consume the presentation stream. For example, the server may determine which participants in the virtual conference room may consume the presentation stream based on position/title, security clearance, position of the participant’s avatar in the virtual conference room/three-dimensional virtual environment, etc.
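  • As a non-limiting illustration, determining which participants may consume a presentation stream may be sketched as follows. The `Participant` record, the integer `clearance` field, and the `permitted_viewers` function are hypothetical; an actual embodiment may use any of the criteria listed above.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    clearance: int   # hypothetical numeric security clearance
    in_room: bool    # whether the avatar is inside the virtual conference room


def permitted_viewers(participants, required_clearance):
    """Return the names of participants allowed to consume the stream."""
    return [
        p.name
        for p in participants
        if p.in_room and p.clearance >= required_clearance
    ]
```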
  • a participant e.g., first participant
  • Figure 6D illustrates a 2-D view of the presentation stream.
  • a participant in the virtual conference room may want to view presentation stream 620 being rendered on a presentation screen in a 2-D view. The participant may select the presentation stream 620 (e.g., by clicking on the presentation stream).
  • the server or the participant’s device may cause a 2-D view 630 of presentation stream 620 to be rendered on the participant’s device.
  • 2-D view 630 of presentation stream 620 may be rendered over the three-dimensional virtual environment on the participant’s device. As such, the three-dimensional virtual environment may be visible to the participant around 2-D view 630. Presentation stream 620 may continue to be rendered on presentation screen 610 while 2-D view 630 of presentation stream 620 is rendered on the participant’s device.
  • Figure 6E illustrates a 2-D view of a participant’s view of the three-dimensional virtual space.
  • a participant may share presentation stream 650 on presentation screen 660 in the virtual conference room.
  • Presentation stream 650 may include the participant’s view of the three-dimensional virtual environment.
  • presentation stream 650 may include the avatars and virtual conference room from the participant’s perspective.
  • presentation stream 650 may be mapped to presentation screen 660.
  • Presentation stream 650 may be rendered in 2-D on presentation screen 660.
  • Presentation stream 650 is continuously updated as the perspective of the participant is updated in the virtual conference room.
  • Presentation stream 650 may provide a honeycomb view of the virtual conference room. The participants in the virtual conference room may consume presentation stream 650.
  • Figure 7 is a flow chart illustrating a method 700 for sharing a presentation stream on a presentation screen in a virtual conference room, according to some embodiments.
  • a three-dimensional virtual environment is rendered for users.
  • the users may navigate around the three-dimensional virtual environment using avatars.
  • the three-dimensional virtual environment may include virtual conference rooms.
  • the users may also attend a meeting in a virtual conference room.
  • the users may be participants of the meeting.
  • a selection of a first three-dimensional model of a first presentation screen is received from a first participant.
  • the virtual conference room may include multiple three-dimensional models of presentation screens.
  • the presentation screens may be of varying sizes.
  • a first participant may select the first presentation screen by clicking a ‘share screen’ button on the first three-dimensional model of a first presentation screen.
  • the first participant may select the first presentation screen by selecting from a list of available presentation screens in the virtual conference room.
  • the first participant’s device or the server may automatically select the first presentation screen for the first participant based on the availability of the first presentation screen, the attributes of the presentation screen (e.g., size, location, etc.), or the proximity of the first participant’s avatar to the presentation screen.
  • the first presentation screen may be selected to share a presentation stream.
  • the presentation stream is received.
  • the presentation stream is audio and video data.
  • the presentation stream may be sharing a screen or window rendered on the first participant’s device.
  • the presentation stream is published to the server (e.g., server 302).
  • the server informs the other devices of other participants in the virtual conference room of the presentation stream. In some embodiments, the server informs the participants permitted to consume the presentation stream.
  • the presentation stream is mapped to the first three-dimensional model of the first presentation screen.
  • the presentation stream may be texture mapped to the first three-dimensional model of the first presentation screen.
  • the first three-dimensional model of the first presentation screen including the first presentation stream is rendered for display for the other participants.
  • the other participants in the virtual conference room may consume the presentation stream.
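  • As a non-limiting illustration, the publish-and-notify portion of method 700 may be sketched as follows. The `StreamRegistry` class, its method names, and the choice to reject a second stream on an occupied screen are assumptions made for this sketch and are not details of the disclosure.

```python
from typing import Callable, Dict, List


class StreamRegistry:
    """Hypothetical server-side map from screen name to published stream id."""

    def __init__(self) -> None:
        self.streams: Dict[str, str] = {}
        self.listeners: List[Callable[[str, str], None]] = []

    def subscribe(self, listener: Callable[[str, str], None]) -> None:
        self.listeners.append(listener)

    def publish(self, screen_name: str, stream_id: str) -> None:
        # The stream is tagged with the name of the selected screen.
        if screen_name in self.streams:
            raise ValueError(f"screen {screen_name} is already in use")
        self.streams[screen_name] = stream_id
        for notify in self.listeners:
            # Other clients learn that a stream is available for this screen.
            notify(screen_name, stream_id)

    def unpublish(self, screen_name: str) -> None:
        self.streams.pop(screen_name, None)
```

  • A client receiving the notification would then texture map the named stream onto the matching three-dimensional model and render it.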
  • Figure 8 is a flow chart illustrating a method 800 for rendering a 2-D view of a presentation stream, according to some embodiments.
  • data specifying a three-dimensional virtual space is received.
  • a three-dimensional virtual environment is rendered for users.
  • the users may navigate around the three-dimensional virtual environment using avatars.
  • the three-dimensional virtual environment may include virtual conference rooms.
  • the users may also attend a meeting in a virtual conference room.
  • the users may be participants of the meeting.
  • the presentation stream is received from a first participant.
  • the presentation stream may be sharing a screen or window rendered on the first participant’s device.
  • the presentation stream is mapped to a three-dimensional model of a presentation screen in the virtual conference room.
  • the presentation stream may be texture mapped to the three-dimensional model of the presentation screen.
  • a selection of the presentation screen is received from a second participant.
  • the second participant may select to view a 2-D view of the presentation stream by selecting (e.g., clicking on) the presentation screen.
  • a 2-D view of the presentation stream is rendered for display for the second participant.
  • a 2-D window including the presentation stream may be rendered on the second participant’s device.
  • the 2-D window may be overlaid on the three-dimensional virtual environment.
  • the presentation stream may continue to be rendered on the presentation screen while the 2-D view of the presentation stream is rendered on the second participant’s device.
  • Figure 9 is a diagram of a system 900 illustrating components of devices used to provide videoconferencing within a virtual environment.
  • system 900 can operate according to the methods described above.
  • Device 306A is a user computing device.
  • Device 306A could be a desktop or laptop computer, smartphone, tablet, or wearable (e.g., watch or head mounted device).
  • Device 306A includes a microphone 902, camera 904, stereo speaker 906, and input device 912.
  • device 306A also includes a processor and persistent, non-transitory and volatile memory.
  • the processors can include one or more central processing units, graphic processing units or any combination thereof.
  • Microphone 902 converts sound into an electrical signal. Microphone 902 is positioned to capture speech of a user of device 306A.
  • microphone 902 could be a condenser microphone, electret microphone, moving-coil microphone, ribbon microphone, carbon microphone, piezo microphone, fiber-optic microphone, laser microphone, water microphone, or MEMS microphone.
  • Camera 904 captures image data by capturing light, generally through one or more lenses. Camera 904 is positioned to capture photographic images of a user of device 306A. Camera 904 includes an image sensor (not shown).
  • the image sensor may, for example, be a charge coupled device (CCD) sensor or a complementary metal oxide semiconductor (CMOS) sensor.
  • the image sensor may include one or more photodetectors that detect light and convert to electrical signals. These electrical signals captured together in a similar timeframe comprise a still photographic image. A sequence of still photographic images captured at regular intervals together comprise a video. In this way, camera 904 captures images and videos.
  • Stereo speaker 906 is a device which converts an electrical audio signal into a corresponding left-right sound. Stereo speaker 906 outputs the left audio stream and the right audio stream generated by an audio processor 920 (below) to be played to device 306A’s user in stereo. Stereo speaker 906 includes both ambient speakers and headphones that are designed to play sound directly into a user’s left and right ears.
  • Example speakers include moving-iron loudspeakers, piezoelectric speakers, magnetostatic loudspeakers, electrostatic loudspeakers, ribbon and planar magnetic loudspeakers, bending wave loudspeakers, flat panel loudspeakers, Heil air motion transducers, transparent ionic conduction speakers, plasma arc speakers, thermoacoustic speakers, rotary woofers, moving-coil, electrostatic, electret, planar magnetic, and balanced armature.
  • Network interface 908 is a software or hardware interface between two pieces of equipment or protocol layers in a computer network.
  • Network interface 908 receives a video stream from server 302 for respective participants for the meeting. The video stream is captured from a camera on a device of another participant to the video conference.
  • Network interface 908 also receives data specifying a three-dimensional virtual space and any models therein from server 302. For each of the other participants, network interface 908 receives a position and direction in the three-dimensional virtual space. The position and direction are input by each of the respective other participants.
  • Network interface 908 also transmits data to server 302. It transmits the position of device 306A’s user’s virtual camera used by renderer 918 and it transmits video and audio streams from camera 904 and microphone 902.
  • Display 910 is an output device for presentation of electronic information in visual or tactile form (the latter used for example in tactile electronic displays for blind people).
  • Display 910 could be a television set, computer monitor, head-mounted display, heads-up display, output of an augmented reality or virtual reality headset, broadcast reference monitor, medical monitor, mobile display (for mobile devices), or smartphone display (for smartphones).
  • display 910 may include an electroluminescent (ELD) display, liquid crystal display (LCD), light-emitting diode (LED) backlit LCD, thin-film transistor (TFT) LCD, light-emitting diode (LED) display, OLED display, AMOLED display, plasma (PDP) display, or quantum dot (QLED) display.
  • Input device 912 is a piece of equipment used to provide data and control signals to an information processing system such as a computer or information appliance. Input device 912 allows a user to input a new desired position of a virtual camera used by renderer 918, thereby enabling navigation in the three-dimensional environment. Examples of input devices include keyboards, mice, scanners, joysticks, and touchscreens.
  • Web browser 308A and web application 310A were described above with respect to Figure 3.
  • Web application 310A includes screen capturer 914, texture mapper 916, renderer 918, and audio processor 920.
  • Screen capturer 914 captures a presentation stream, in particular a screen share.
  • Screen capturer 914 may interact with an API made available by web browser 308A. By calling a function available from the API, screen capturer 914 may cause web browser 308A to ask the user which window or screen the user would like to share. Based on the answer to that query, web browser 308A may return a video stream corresponding to the screen share to screen capturer 914, which passes it on to network interface 908 for transmission to server 302 and ultimately to other participants’ devices.
  • Texture mapper 916 texture maps the video stream onto a three-dimensional model corresponding to an avatar. Texture mapper 916 may texture map respective frames from the video to the avatar. In addition, texture mapper 916 may texture map a presentation stream to a three-dimensional model of a presentation screen.
  • Renderer 918 renders, from a perspective of a virtual camera of the user of device 306A, for output to display 910 the three-dimensional virtual space including the texture-mapped three-dimensional models of the avatars for respective participants located at the received, corresponding position and oriented at the direction. Renderer 918 also renders any other three-dimensional models including for example the presentation screen.
  • Audio processor 920 adjusts volume of the received audio stream to determine a left audio stream and a right audio stream to provide a sense of where the second position is in the three-dimensional virtual space relative to the first position. In one embodiment, audio processor 920 adjusts the volume based on a distance between the second position and the first position. In another embodiment, audio processor 920 adjusts the volume based on a direction of the second position to the first position. In yet another embodiment, audio processor 920 adjusts the volume based on a direction of the second position relative to the first position on a horizontal plane within the three-dimensional virtual space.
  • audio processor 920 adjusts the volume based on a direction where the virtual camera is facing in the three-dimensional virtual space such that the left audio stream tends to have a higher volume when the avatar is located to the left of the virtual camera and the right audio stream tends to have a higher volume when the avatar is located to the right of the virtual camera.
  • audio processor 920 adjusts the volume based on an angle between the direction where the virtual camera is facing and a direction where the avatar is facing such that the angle being more normal to where the avatar is facing tends to have a greater difference in volume between the left and right audio streams.
  • Audio processor 920 can also adjust an audio stream’s volume based on the area where the speaker is located relative to an area where the virtual camera is located.
  • the three-dimensional virtual space is segmented into a plurality of areas. These areas may be hierarchical. When the speaker and virtual camera are located in different areas, a wall transmission factor may be applied to attenuate the speaking audio stream’s volume.
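  • As a non-limiting illustration, deriving left and right volumes from where the virtual camera is facing may be sketched as follows. The function name `stereo_gains`, the coordinate convention (positions given as (x, z) on the horizontal plane, with +x to the right of a camera facing +z), and the constant-power pan law are assumptions made for this sketch; the passages above do not specify a particular formula.

```python
import math


def stereo_gains(camera_pos, camera_facing, source_pos):
    """Return (left, right) gains from the source's bearing on the x-z plane.

    Assumptions (not from the source): positions and the facing vector are
    (x, z) pairs, +x is to the right of a camera facing +z, and a
    constant-power pan law is used.
    """
    fx, fz = camera_facing
    dx = source_pos[0] - camera_pos[0]
    dz = source_pos[1] - camera_pos[1]
    if dx == 0 and dz == 0:
        # Source at the camera position: play equally in both ears.
        return (math.sqrt(0.5), math.sqrt(0.5))
    # Signed angle from the facing direction to the source; positive is right.
    angle = math.atan2(fz * dx - fx * dz, fx * dx + fz * dz)
    pan = math.sin(angle)               # -1 = hard left, +1 = hard right
    theta = (pan + 1.0) * math.pi / 4   # map the pan value to [0, pi/2]
    return (math.cos(theta), math.sin(theta))
```

  • In this sketch, a source directly ahead of or behind the camera yields equal left and right gains, and the returned gains would be multiplied by any distance or wall attenuation.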
  • Server 302 includes an attendance notifier 922, a stream adjuster 924, and a stream forwarder 926.
  • Attendance notifier 922 notifies conference participants when participants join and leave the meeting. When a new participant joins the meeting, attendance notifier 922 sends a message to the devices of the other participants to the conference indicating that a new participant has joined. Attendance notifier 922 signals stream forwarder 926 to start forwarding video, audio, and position/direction information to the other participants.
  • Stream adjuster 924 receives a video stream captured from a camera on a device of a first user. Stream adjuster 924 determines an available bandwidth to transmit data for the virtual conference to the second user. It determines a distance between a first user and a second user in a virtual conference space. And, it apportions the available bandwidth between the first video stream and the second video stream based on the relative distance. In this way, stream adjuster 924 prioritizes video streams of closer users over video streams from farther ones. Additionally or alternatively, stream adjuster 924 may be located on device 306A, perhaps as part of web application 310A.
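  • As a non-limiting illustration, apportioning available bandwidth so that closer users receive a larger share may be sketched as follows. The function name `apportion_bandwidth` and the weighting 1 / (1 + distance) are assumptions made for this sketch; the passage above states only that bandwidth is apportioned based on relative distance.

```python
def apportion_bandwidth(total_kbps, distances):
    """Split total_kbps across streams, weighting closer users more heavily.

    Assumption (not from the source): each stream's weight is
    1 / (1 + distance), where distance is measured in the virtual space.
    """
    if not distances:
        return {}
    weights = {user: 1.0 / (1.0 + max(d, 0.0)) for user, d in distances.items()}
    total_weight = sum(weights.values())
    return {user: total_kbps * w / total_weight for user, w in weights.items()}
```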
  • Stream forwarder 926 broadcasts position/direction information, video, audio, and screen share screens received (with adjustments made by stream adjuster 924).
  • Stream forwarder 926 may send information to the device 306A in response to a request from conference application 310A.
  • Conference application 310A may send that request in response to the notification from attendance notifier 922.
  • Network interface 928 is a software or hardware interface between two pieces of equipment or protocol layers in a computer network.
  • Network interface 928 transmits the model information to devices of the various participants.
  • Network interface 928 receives video, audio, and screen share screens from the various participants.
  • Screen capturer 914, texture mapper 916, renderer 918, audio processor 920, attendance notifier 922, stream adjuster 924, and stream forwarder 926 can each be implemented in hardware, software, firmware, or any combination thereof.
  • Identifiers such as “(a),” “(b),” “(i),” “(ii),” etc., are sometimes used for different elements or steps. These identifiers are used for clarity and do not necessarily designate an order for the elements or steps.
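The bandwidth apportionment attributed to stream adjuster 924 above can be illustrated with a minimal sketch. The description states only that bandwidth is apportioned based on relative distance so that closer users are prioritized; the inverse-distance weighting, the function name `apportion_bandwidth`, and the `min_distance` clamp are assumptions made for illustration.

```python
from typing import Dict


def apportion_bandwidth(
    available_kbps: float,
    distances: Dict[str, float],
    min_distance: float = 1.0,
) -> Dict[str, float]:
    """Split available bandwidth across video streams, favoring closer users.

    Assumption: weights are inversely proportional to distance in the
    virtual space. The source only says closer users are prioritized.
    """
    if not distances:
        return {}
    # Clamp distances so a user at distance zero does not get infinite weight.
    weights = {uid: 1.0 / max(d, min_distance) for uid, d in distances.items()}
    total = sum(weights.values())
    return {uid: available_kbps * w / total for uid, w in weights.items()}
```

Under this weighting, a stream from a user at distance 1 receives twice the share of a stream from a user at distance 2. Other weightings (for example, stepwise quality tiers) would satisfy the description equally well.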

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Processing Or Creating Images (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Disclosed herein is a web-based videoconference system that allows for multi-screen sharing. In some embodiments, data specifying a three-dimensional virtual space is received. The three-dimensional virtual space comprises a plurality of participants and an avatar representing each of the plurality of participants and three-dimensional models of a plurality of presentation screens. Multiple presentation streams may be shared on different presentation screens simultaneously.

Description

MULTI-SCREEN PRESENTATION IN A VIRTUAL VIDEOCONFERENCING ENVIRONMENT
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Application No. 17/813,657, filed on July 20, 2022, and U.S. Application No. 17/813,708, filed on July 20, 2022, the contents of which are incorporated by reference herein in their entireties.
BACKGROUND
Field
[0002] This field is generally related to videoconferencing.
Related Art
[0003] Video conferencing involves the reception and transmission of audio-video signals by users at different locations for communication between people in real time. Videoconferencing is widely available on many computing devices from a variety of different services, including the ZOOM service available from Zoom Communications Inc. of San Jose, CA. Some videoconferencing software, such as the FaceTime application available from Apple Inc. of Cupertino, CA, comes standard with mobile devices.
[0004] In general, these applications operate by displaying video and outputting audio of other conference participants. When there are multiple participants, the screen may be divided into a number of rectangular frames, each displaying video of a participant. Sometimes these services operate by having a larger frame that presents video of the person speaking. As different individuals speak, that frame will switch between speakers. The application captures video from a camera integrated with the user’s device and audio from a microphone integrated with the user’s device. The application then transmits that audio and video to other applications running on other users’ devices.
[0005] Many of these videoconferencing applications have a screen share functionality. When a user decides to share their screen (or a portion of their screen), a stream is transmitted to the other users’ devices with the contents of their screen. In some cases, other users can even control what is on the user’s screen. In this way, users can collaborate on a project or make a presentation to the other meeting participants.
[0006] Recently, videoconferencing technology has gained importance. Especially since the COVID-19 pandemic, many workplaces, trade shows, meetings, conferences, schools, and places of worship now operate at least partially online. Virtual conferences using videoconferencing technology are increasingly replacing physical conferences. In addition, this technology offers an advantage over meeting physically in that it avoids travel and commuting.
[0007] However, often, use of this videoconferencing technology causes loss of a sense of place. There is an experiential aspect to meeting in person physically, being in the same place, that is lost when conferences are conducted virtually. There is a social aspect to being able to posture yourself and look at your peers. This feeling of experience is important in creating relationships and social connections. Yet, this feeling is lacking when it comes to conventional videoconferences.
[0008] Moreover, when the conference starts to get several participants, additional problems occur with these videoconferencing technologies. In physical meeting conferences, people can have side conversations. You can project your voice so that only people close to you can hear what you’re saying. In some cases, you can even have private conversations in the context of a larger meeting. However, with virtual conferences, when multiple people are speaking at the same time, the software mixes the two audio streams substantially equally, causing the participants to speak over one another. Thus, when multiple people are involved in a virtual conference, private conversations are impossible, and the dialogue tends to be more in the form of speeches from one to many. Here, too, virtual conferences lose an opportunity for participants to create social connections and to communicate and network more effectively.
[0009] Moreover, due to limitations in the network bandwidth and computing hardware, when a lot of streams are placed in the conference, the performance of many videoconferencing systems begins to slow down. Many computing devices, while equipped to handle a video stream from a few participants, are ill-equipped to handle a video stream from a dozen or more participants. With many schools operating entirely virtually, classes of 25 can severely slow down the school-issued computing devices.
[0010] Massively multiplayer online games (MMOG, or MMO) generally can handle quite a few more than 25 participants. These games often have hundreds or thousands of players on a single server. MMOs often allow players to navigate avatars around a virtual world. Sometimes these MMOs allow users to speak with one another or send messages to one another. Examples include the ROBLOX game available from Roblox Corporation of San Mateo, CA, and the MINECRAFT game available from Mojang Studios of Stockholm, Sweden.
[0011] Having bare avatars interact with one another also has limitations in terms of social interaction. These avatars usually cannot communicate facial expressions, which people often make inadvertently. These facial expressions are observable on videoconference. Some publications may describe having video placed on an avatar in a virtual world. However, these systems typically require specialized software and have other limitations that limit their usefulness.
[0012] Improved methods are needed for videoconferencing.
BRIEF SUMMARY
[0013] In an embodiment, a computer-implemented method allows users to simultaneously share two screens in a three-dimensional virtual environment. The method comprises receiving data specifying a three-dimensional virtual space. The three-dimensional virtual space comprises a plurality of participants, an avatar representing each of the plurality of participants, and a plurality of three-dimensional models of a plurality of presentation screens. The method further comprises receiving a first selection of a first three-dimensional model of the plurality of three-dimensional models of a first presentation screen of the plurality of presentation screens and receiving a first presentation stream from a first client device of a first participant of the plurality of participants. Moreover, the method comprises mapping the first presentation stream onto the first three-dimensional model of the first presentation screen. The method further comprises receiving a second selection of a second three-dimensional model of the plurality of three-dimensional models of a second presentation screen of the plurality of presentation screens and receiving a second presentation stream from a second client device of a second participant of the plurality of participants. The method further comprises mapping the second presentation stream onto the second three-dimensional model of the second presentation screen while the first presentation stream is mapped on the first three-dimensional model of the first presentation screen. Furthermore, the method comprises, from a perspective of a virtual camera of a third participant of the plurality of participants, rendering for display to the third participant the three-dimensional virtual space with the first three-dimensional model of the first presentation screen including the first presentation stream and the second three-dimensional model of the second presentation screen including the second presentation stream.
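The bookkeeping implied by this embodiment can be sketched as follows. This is only an illustration under stated assumptions: the class names `PresentationScreen` and `VirtualSpace`, the string identifiers, and the `scene_for_render` helper are hypothetical and do not come from the disclosure, and actual texture mapping and rendering are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PresentationScreen:
    """Hypothetical stand-in for a 3D model of a presentation screen."""
    screen_id: str
    stream_id: Optional[str] = None  # presentation stream mapped onto the model


@dataclass
class VirtualSpace:
    """Hypothetical container for the screens in a 3D virtual space."""
    screens: Dict[str, PresentationScreen] = field(default_factory=dict)

    def map_stream(self, screen_id: str, stream_id: str) -> None:
        """Map a participant's presentation stream onto the selected screen."""
        self.screens[screen_id].stream_id = stream_id

    def scene_for_render(self) -> Dict[str, Optional[str]]:
        """Return screen -> stream pairs that a renderer would texture map."""
        return {sid: s.stream_id for sid, s in self.screens.items()}


space = VirtualSpace({
    "screen-1": PresentationScreen("screen-1"),
    "screen-2": PresentationScreen("screen-2"),
})
# The first and second participants each select a different screen.
space.map_stream("screen-1", "stream-from-first-participant")
space.map_stream("screen-2", "stream-from-second-participant")
```

Because each screen holds its own stream reference, mapping the second stream leaves the first mapping in place, so a third participant's renderer would see both streams at once.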
[0014] In an embodiment, a computer-implemented method allows sharing a presentation stream in a two-dimensional view. The method comprises receiving data specifying a three-dimensional virtual space. The three-dimensional virtual space comprises a plurality of participants and an avatar representing each of the plurality of participants. The method further comprises receiving a presentation stream from a first client device of a first participant of the plurality of participants and mapping the presentation stream onto a three-dimensional model of a presentation screen in the three-dimensional virtual space. The method further comprises receiving a selection of the presentation screen from a second participant of the plurality of participants and rendering for display to the second participant a two-dimensional view of the presentation stream.
[0015] System, device, and computer program product embodiments are also disclosed.
[0016] Further embodiments, features, and advantages of the invention, as well as the structure and operation of the various embodiments, are described in detail below with reference to accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] The accompanying drawings, which are incorporated herein and form part of the specification, illustrate the present disclosure and, together with the description, further serve to explain the principles of the disclosure and to enable a person skilled in the relevant art to make and use the disclosure.
[0018] Figure 1 is a diagram illustrating an example interface that provides videoconferencing in a virtual environment with video streams being mapped onto avatars.
[0019] Figure 2 is a diagram illustrating a three-dimensional model used to render a virtual environment with avatars for videoconferencing.
[0020] Figure 3 is a diagram illustrating a system that provides videoconferences in a virtual environment.
[0021] Figures 4A-C illustrate how data is transferred between various components of the system in figure 3 to provide videoconferencing.
[0022] Figures 5A-B are diagrams illustrating different volume areas in a virtual environment during a videoconference.
[0023] Figures 6A-E illustrate an interface with example presentation screens in a three-dimensional virtual environment used for videoconferencing.
[0024] Figure 7 is a flowchart illustrating a method for sharing a presentation stream on a presentation screen in a virtual conference room, according to some embodiments.
[0025] Figure 8 is a flowchart illustrating a method for rendering a 2-D view of a presentation stream, according to some embodiments.
[0026] Figure 9 is a diagram illustrating components of devices used to provide videoconferencing within a virtual environment.
[0027] The drawing in which an element first appears is typically indicated by the leftmost digit or digits in the corresponding reference number. In the drawings, like reference numbers may indicate identical or functionally similar elements.
DETAILED DESCRIPTION
Video Conference with Avatars in a Virtual Environment
[0028] Figure 1 is a diagram illustrating an example of an interface 100 that provides videoconferences in a virtual environment with video streams being mapped onto avatars.
[0029] Interface 100 may be displayed to a participant to a videoconference. For example, interface 100 may be rendered for display to the participant and may be constantly updated as the videoconference progresses. A user may control the orientation of their virtual camera using, for example, keyboard inputs. In this way, the user can navigate around a virtual environment. In an embodiment, different inputs may change the virtual camera’s X and Y position and pan and tilt angles in the virtual environment. In further embodiments, a user may use inputs to alter height (the Z coordinate) or yaw of the virtual camera. In still further embodiments, a user may enter inputs to cause the virtual camera to “hop” up while returning to its original position, simulating gravity. The inputs available to navigate the virtual camera may include, for example, keyboard and mouse inputs, such as WASD keyboard keys to move the virtual camera forward, backward, left, or right on an X-Y plane, a space bar key to “hop” the virtual camera, and mouse movements specifying changes in pan and tilt angles.
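The navigation inputs described above can be sketched as a small camera model. This is a minimal sketch, not the disclosed implementation: the class name `VirtualCamera`, the angle conventions (pan of zero faces +Y, positive pan turns toward +X), the step size, and the tilt clamp are all assumptions, and the "hop" input is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class VirtualCamera:
    x: float = 0.0
    y: float = 0.0
    pan: float = 0.0   # radians, rotation about the vertical axis
    tilt: float = 0.0  # radians, looking up or down

    def move(self, key: str, step: float = 1.0) -> None:
        """Move on the X-Y plane relative to the current pan angle."""
        # Forward direction for the current pan angle (assumed convention:
        # pan = 0 faces +Y, positive pan turns toward +X).
        fx, fy = math.sin(self.pan), math.cos(self.pan)
        # Right direction is forward rotated 90 degrees clockwise.
        rx, ry = fy, -fx
        dx, dy = {
            "w": (fx, fy),
            "s": (-fx, -fy),
            "d": (rx, ry),
            "a": (-rx, -ry),
        }[key.lower()]
        self.x += dx * step
        self.y += dy * step

    def look(self, d_pan: float, d_tilt: float) -> None:
        """Apply mouse deltas, clamping tilt to straight up or straight down."""
        self.pan = (self.pan + d_pan) % (2 * math.pi)
        self.tilt = max(-math.pi / 2, min(math.pi / 2, self.tilt + d_tilt))
```

Movement is computed relative to the pan angle so that pressing W always moves the camera in the direction it faces, which matches the forward, backward, left, and right behavior described for the WASD keys.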
[0030] Interface 100 includes avatars 102A and B, which each represent different participants to the videoconference. Avatars 102A and B, respectively, have texture mapped video streams 104A and B from devices of the first and second participant. A texture map is an image applied (mapped) to the surface of a shape or polygon. Here, the images are respective frames of the video. The camera devices capturing video streams 104A and B are positioned to capture faces of the respective participants. In this way, the avatars have texture mapped thereon moving images of faces as participants in the meeting talk and listen.
[0031] Similar to how the virtual camera is controlled by the user viewing interface 100, the location and direction of avatars 102A and B are controlled by the respective participants that they represent. Avatars 102A and B are three-dimensional models represented by a mesh. Each avatar 102A and B may have the participant’s name underneath the avatar.
[0032] The respective avatars 102A and B are controlled by the various users. They each may be positioned at a point corresponding to where their own virtual cameras are located within the virtual environment. Just as the user viewing interface 100 can move around the virtual camera, the various users can move around their respective avatars 102A and B.
[0033] The virtual environment rendered in interface 100 includes background image 120 and a three-dimensional model 118 of an arena. The arena may be a venue or building in which the videoconference should take place. The arena may include a floor area bounded by walls. Three-dimensional model 118 can include a mesh and texture. Other ways to mathematically represent the surface of three-dimensional model 118 may be possible as well. For example, polygon modeling, curve modeling, and digital sculpting may be possible. For example, three-dimensional model 118 may be represented by voxels, splines, geometric primitives, polygons, or any other possible representation in three-dimensional space. Three-dimensional model 118 may also include specification of light sources. The light sources can include, for example, point, directional, spotlight, and ambient. The objects may also have certain properties describing how they reflect light. In examples, the properties may include diffuse, ambient, and specular lighting interactions.
[0034] In addition to the arena, the virtual environment can include various other three-dimensional models that illustrate different components of the environment. For example, the three-dimensional environment can include a decorative model 114, a speaker model 116, and a presentation screen model 122. Just like three-dimensional model 118, these models can be represented using any mathematical way to represent a geometric surface in three-dimensional space. These models may be separate from three-dimensional model 118 or combined into a single representation of the virtual environment.
[0035] Decorative models, such as decorative model 114, serve to enhance the realism and increase the aesthetic appeal of the arena. Speaker model 116 may virtually emit sound, such as presentation and background music, as will be described in greater detail below with respect to figures 5 and 7. Presentation screen model 122 can serve to provide an outlet to present a presentation. Video of the presenter or a presentation screen share may be texture mapped onto presentation screen model 122.
[0036] Button 108 may provide the user a list of participants. In one example, after a user selects button 108, the user could chat with other participants by sending text messages, individually or as a group.
[0037] Button 110 may enable a user to change attributes of the virtual camera used to render interface 100. For example, the virtual camera may have a field of view specifying the angle at which the data is rendered for display. Modeling data within the camera’s field of view is rendered, while modeling data outside the camera’s field of view may not be. By default, the virtual camera’s field of view may be set somewhere between 60 and 110°, which is commensurate with a wide-angle lens and human vision. However, selecting button 110 may cause the virtual camera to increase the field of view to exceed 170°, commensurate with a fisheye lens. This may enable a user to have broader peripheral awareness of their surroundings in the virtual environment.
[0038] Finally, button 112 causes the user to exit the virtual environment. Selecting button 112 may cause a notification to be sent to devices belonging to the other participants signaling to their devices to stop displaying the avatar corresponding to the user previously viewing interface 100.
[0039] In this way, a virtual 3D space is used to conduct videoconferencing. Every user controls an avatar, which they can move around, turn to look around, jump, or otherwise manipulate to change its position or orientation. A virtual camera shows the user the virtual 3D environment and the other avatars. The avatars of the other users have, as an integral part, a virtual display that shows the webcam image of the user.
[0040] By giving users a sense of space and allowing users to see each other’s faces, embodiments provide a more social experience than conventional web conferencing or conventional MMO gaming. That more social experience has a variety of applications. For example, it can be used in online shopping. For example, interface 100 has applications in providing virtual grocery stores, houses of worship, trade shows, B2B sales, B2C sales, schooling, restaurants or lunchrooms, product releases, construction site visits (e.g., for architects, engineers, contractors), office spaces (e.g., people work “at their desks” virtually), controlling machinery remotely (ships, vehicles, planes, submarines, drones, drilling equipment, etc.), plant/factory control rooms, medical procedures, garden designs, virtual bus tours with guide, music events (e.g., concerts), lectures (e.g., TED talks), meetings of political parties, board meetings, underwater research, research on hard to reach places, training for emergencies (e.g., fire), cooking, shopping (with checkout and delivery), virtual arts and crafts (e.g., painting and pottery), marriages, funerals, baptisms, remote sports training, counseling, treating fears (e.g., confrontation therapy), fashion shows, amusement parks, home decoration, watching sports, watching esports, watching performances captured using a three-dimensional camera, playing board and role playing games, walking over/through medical imagery, viewing geological data, learning languages, meeting in a space for the visually impaired, meeting in a space for the hearing impaired, participation in events by people who normally can’t walk or stand up, presenting the news or weather, talk shows, book signings, voting, MMOs, buying/selling virtual locations (such as those available in some MMOs like the SECOND LIFE game available from Linden Research, Inc. of San Francisco, CA), flea markets, garage sales, travel agencies, banks, archives, computer process management, fencing/swordfighting/martial arts, reenactments (e.g., reenacting a crime scene and/or accident), rehearsing a real event (e.g., a wedding, presentation, show, space-walk), evaluating or viewing a real event captured with three-dimensional cameras, livestock shows, zoos, experiencing life as a tall/short/blind/deaf/white/black person (e.g., a modified video stream or still image for the virtual world to simulate the perspective that a user wishes to experience the reactions), job interviews, game shows, interactive fiction (e.g., murder mystery), virtual fishing, virtual sailing, psychological research, behavioral analysis, virtual sports (e.g., climbing/bouldering), controlling the lights, etc., in your house or other location (domotics), memory palace, archaeology, gift shop, virtual visit so customers will be more comfortable on their real visit, virtual medical procedures to explain the procedures and have people feel more comfortable, and virtual trading floor/financial marketplace/stock market (e.g., integrating real-time data and video feeds into the virtual world, real-time transactions and analytics), virtual location people have to go as part of their work so they will actually meet each other organically (e.g., if you want to create an invoice, it is only possible from within the virtual location) and augmented reality where you project the face of the person on top of their AR headset (or helmet) so you can see their facial expressions (e.g., useful for military, law enforcement, firefighters, special ops), and making reservations (e.g., for a certain holiday home/car/etc.).
[0041] Figure 2 is a diagram 200 illustrating a three-dimensional model used to render a virtual environment with avatars for videoconferencing. Just as illustrated in figure 1, the virtual environment here includes a three-dimensional arena 118, and various three-dimensional models, including three-dimensional models 114 and 122. Also as illustrated in figure 1, diagram 200 includes avatars 102A and B navigating around the virtual environment.
[0042] As described above, interface 100 in figure 1 is rendered from the perspective of a virtual camera. That virtual camera is illustrated in diagram 200 as virtual camera 204. As mentioned above, the user viewing interface 100 in figure 1 can control virtual camera 204 and navigate the virtual camera in three-dimensional space. Interface 100 is constantly being updated according to the new position of virtual camera 204 and any changes of the models within the field of view of virtual camera 204. As described above, the field of view of virtual camera 204 may be a frustum defined, at least in part, by horizontal and vertical field of view angles.
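A frustum defined by horizontal and vertical field of view angles can be tested against a point with a short calculation. The sketch below is illustrative only: the function name `in_field_of_view`, the camera-space axis convention, and the `near` and `far` clipping distances are assumptions not stated in the description.

```python
import math


def in_field_of_view(
    point_cam: tuple,
    h_fov_deg: float,
    v_fov_deg: float,
    near: float = 0.1,
    far: float = 1000.0,
) -> bool:
    """Return True if a camera-space point lies inside the view frustum.

    Assumed convention: the camera looks down +Z in its own coordinate
    frame, with +X to the right and +Y up.
    """
    x, y, z = point_cam
    if not (near <= z <= far):
        return False
    # At depth z the frustum half-extents grow with tan(fov / 2).
    half_w = z * math.tan(math.radians(h_fov_deg) / 2)
    half_h = z * math.tan(math.radians(v_fov_deg) / 2)
    return abs(x) <= half_w and abs(y) <= half_h
```

With a 90° horizontal field of view, a point five units ahead of the camera is visible only if it lies within five units to either side, since tan(45°) = 1.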
[0043] As described above with respect to figure 1, a background image, or texture, may define at least part of the virtual environment. The background image may capture aspects of the virtual environment that are meant to appear at a distance. The background image may be texture mapped onto a sphere 202. The virtual camera 204 may be at an origin of the sphere 202. In this way, distant features of the virtual environment may be efficiently rendered.
[0044] In other embodiments, other shapes instead of sphere 202 may be used to texture map the background image. In various alternative embodiments, the shape may be a cylinder, cube, rectangular prism, or any other three-dimensional geometry.
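For the spherical case, one common way to look up the background image is to convert a view direction from the sphere's center into texture coordinates. The sketch below assumes an equirectangular image layout with +Z up; the description does not specify the projection, and the function name `direction_to_uv` is hypothetical.

```python
import math


def direction_to_uv(dx: float, dy: float, dz: float) -> tuple:
    """Map a view direction from the sphere's center to texture coordinates.

    Assumption: the background image uses an equirectangular layout with
    +Z up; u wraps around the horizon and v runs from bottom (0) to top (1).
    """
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        raise ValueError("direction must be non-zero")
    dx, dy, dz = dx / length, dy / length, dz / length
    # Clamp to guard against rounding slightly outside asin's domain.
    dz = max(-1.0, min(1.0, dz))
    u = (math.atan2(dy, dx) / (2 * math.pi)) % 1.0
    v = math.asin(dz) / math.pi + 0.5
    return u, v
```

Because the virtual camera sits at the origin of the sphere, only the view direction matters for the lookup, which is one reason distant features can be rendered cheaply this way.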
[0045] Figure 3 is a diagram illustrating a system 300 that provides videoconferences in a virtual environment. System 300 includes a server 302 coupled to devices 306A and B via one or more networks 304.
[0046] Server 302 provides the services to connect a videoconference session between devices 306A and 306B. As will be described in greater detail below, server 302 communicates notifications to devices of conference participants (e.g., devices 306A-B) when new participants join the conference and when existing participants leave the conference. Server 302 communicates messages describing a position and direction in a three-dimensional virtual space for the respective participants’ virtual cameras within the three-dimensional virtual space. Server 302 also communicates video and audio streams between the respective devices of the participants (e.g., devices 306A-B). Finally, server 302 stores and transmits data specifying a three-dimensional virtual space to the respective devices 306A-B.
[0047] In addition to the data necessary for the virtual conference, server 302 may provide executable information that instructs the devices 306A and 306B on how to render the data to provide the interactive conference.
[0048] Server 302 responds to requests with a response. Server 302 may be a web server. A web server is software and hardware that uses HTTP (Hypertext Transfer Protocol) and other protocols to respond to client requests made over the World Wide Web. The main job of a web server is to display website content through storing, processing and delivering webpages to users.
[0049] In an alternative embodiment, communication between devices 306A-B happens not through server 302 but on a peer-to-peer basis. In that embodiment, one or more of the data describing the respective participants’ location and direction, the notifications regarding new and exiting participants, and the video and audio streams of the respective participants are communicated not through server 302 but directly between devices 306A-B.
[0050] Network 304 enables communication between the various devices 306A-B and server 302. Network 304 may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless wide area network (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, a wireless network, a WiFi network, a WiMax network, any other type of network, or any combination of two or more such networks.
[0051] Devices 306A-B are each devices of respective participants to the virtual conference. Devices 306A-B each receive data necessary to conduct the virtual conference and render the data necessary to provide the virtual conference. As will be described in greater detail below, devices 306A-B include a display to present the rendered conference information, inputs that allow the user to control the virtual camera, a speaker (such as a headset) to provide audio to the user for the conference, a microphone to capture a user’s voice input, and a camera positioned to capture video of the user’s face.
[0052] Devices 306A-B can be any type of computing device, including a laptop, a desktop, a smartphone, a tablet computer, or a wearable computer (such as a smartwatch or an augmented reality or virtual reality headset).
[0053] Web browser 308A-B can retrieve a network resource (such as a webpage) addressed by a link identifier (such as a uniform resource locator, or URL) and present the network resource for display. In particular, web browser 308A-B is a software application for accessing information on the World Wide Web. Usually, web browser 308A-B makes this request using the hypertext transfer protocol (HTTP or HTTPS). When a user requests a web page from a particular website, the web browser retrieves the necessary content from a web server, interprets and executes the content, and then displays the page on a display on device 306A-B, shown as client/counterpart conference application 310A-B. In examples, the content may have HTML and client-side scripting, such as JavaScript. Once displayed, a user can input information and make selections on the page, which can cause web browser 308A-B to make further requests.
[0054] Conference application 310A-B may be a web application downloaded from server 302 and configured to be executed by the respective web browsers 308A-B. In an embodiment, conference application 310A-B may be a JavaScript application. In one example, conference application 310A-B may be written in a higher-level language, such as TypeScript, and translated or compiled into JavaScript. Conference application 310A-B is configured to interact with the WebGL JavaScript application programming interface. It may have control code specified in JavaScript and shader code written in OpenGL ES Shading Language (GLSL ES). Using the WebGL API, conference application 310A-B may be able to utilize a graphics processing unit (not shown) of device 306A-B. Moreover, WebGL allows rendering of interactive two-dimensional and three-dimensional graphics without the use of plug-ins.
[0055] Conference application 310A-B receives the data from server 302 describing position and direction of other avatars and three-dimensional modeling information describing the virtual environment. In addition, conference application 310A-B receives video and audio streams of other conference participants from server 302.
[0056] Conference application 310A-B renders the three-dimensional modeling data, including data describing the three-dimensional environment and data representing the respective participant avatars. This rendering may involve rasterization, texture mapping, ray tracing, shading, or other rendering techniques. In an embodiment, the rendering may involve ray tracing based on the characteristics of the virtual camera. Ray tracing involves generating an image by tracing a path of light as pixels in an image plane and simulating the effects of its encounters with virtual objects. In some embodiments, to enhance realism, the ray tracing may simulate optical effects such as reflection, refraction, scattering, and dispersion.
[0057] In this way, the user uses web browser 308A-B to enter a virtual space. The scene is displayed on the screen of the user. The webcam video stream and microphone audio stream of the user are sent to server 302. When other users enter the virtual space an avatar model is created for them. The position of this avatar is sent to the server and received by the other users. Other users also get a notification from server 302 that an audio/video stream is available. The video stream of a user is placed on the avatar that was created for that user. The audio stream is played back as coming from the position of the avatar.
[0058] Figures 4A-C illustrate how data is transferred between various components of the system in figure 3 to provide videoconferencing. Like figure 3, each of figures 4A-C depicts the connection between server 302 and devices 306A and B. In particular, figures 4A-C illustrate example data flows between those devices.
[0059] Figure 4A is a diagram 400 illustrating how server 302 transmits data describing the virtual environment to devices 306A and 306B. In particular, both devices 306A and 306B receive from server 302 the three-dimensional arena 404, background texture 402, space hierarchy 408, and any other three-dimensional modeling information 406.
[0060] As described above, background texture 402 is an image illustrating distant features of the virtual environment. The image may be regular (such as a brick wall) or irregular. Background texture 402 may be encoded in any common image file format, such as bitmap, JPEG, GIF, or other file image format. It describes the background image to be rendered against, for example, a sphere at a distance.
[0061] Three-dimensional arena 404 is a three-dimensional model of the space in which the conference is to take place. As described above, it may include, for example, a mesh and possibly its own texture information to be mapped upon the three-dimensional primitives it describes. It may define the space in which the virtual camera and respective avatars can navigate within the virtual environment. Accordingly, it may be bounded by edges (such as walls or fences) that illustrate to users the perimeter of the navigable virtual environment.
[0062] Space hierarchy 408 is data specifying partitions in the virtual environment. These partitions are used to determine how sound is processed before being transferred between participants. As will be described below, this partition data may be hierarchical and may describe sound processing to allow for areas where participants to the virtual conference can have private conversations or side conversations.
[0063] Three-dimensional model 406 is any other three-dimensional modeling information needed to conduct the conference. In one embodiment, this may include information describing the respective avatars. Alternatively or additionally, this information may include product demonstrations.
[0064] With the information needed to conduct the meeting sent to the participants, figures 4B-C illustrate how server 302 forwards information from one device to another. Figure 4B illustrates a diagram 420 showing how server 302 receives information from respective devices 306A and B, and Figure 4C illustrates a diagram 420 showing how server 302 transmits the information to respective devices 306B and A. In particular, device 306A transmits position and direction 422A, video stream 424A, and audio stream 426A to server 302, which transmits position and direction 422A, video stream 424A, and audio stream 426A to device 306B. And device 306B transmits position and direction 422B, video stream 424B, and audio stream 426B to server 302, which transmits position and direction 422B, video stream 424B, and audio stream 426B to device 306A.
[0065] Position and direction 422A-B describe the position and direction of the virtual camera for the users of respective devices 306A and B. As described above, the position may be a coordinate in three-dimensional space (e.g., x, y, z coordinate) and the direction may be a direction in three-dimensional space (e.g., pan, tilt, roll). In some embodiments, the user may be unable to control the virtual camera’s roll, so the direction may only specify pan and tilt angles. Similarly, in some embodiments, the user may be unable to change the avatar’s z coordinate (as the avatar is bounded by virtual gravity), so the z coordinate may be unnecessary. In this way, position and direction 422A-B each may include at least a coordinate on a horizontal plane in the three-dimensional virtual space and a pan and tilt value. Alternatively or additionally, the user may be able to “jump” their avatar, so the z position may be specified only by an indication of whether the user is jumping their avatar.
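As a non-limiting illustration, the reduced form of position and direction 422A-B described above (a horizontal-plane coordinate, pan and tilt values, and a jump indication in place of a z coordinate) might be represented as in the following sketch. The class name, field names, the use of degrees, and the JSON encoding are all assumptions made for illustration and are not specified by this disclosure.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PositionAndDirection:
    """Hypothetical message for position and direction 422A-B."""
    x: float               # coordinate on the horizontal plane
    y: float               # coordinate on the horizontal plane
    pan: float             # rotation about the vertical axis (degrees assumed)
    tilt: float            # up/down rotation (degrees assumed)
    jumping: bool = False  # stands in for an explicit z coordinate

    def to_json(self) -> str:
        # Serialize for transport, e.g. over HTTP or a socket message.
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(payload: str) -> "PositionAndDirection":
        # Rebuild the message on the receiving side.
        return PositionAndDirection(**json.loads(payload))


update = PositionAndDirection(x=3.5, y=-1.0, pan=90.0, tilt=-10.0)
restored = PositionAndDirection.from_json(update.to_json())
```

Omitting roll and z keeps each update small, which may matter because such updates are sent whenever a user moves or looks around.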
[0066] In different examples, position and direction 422A-B may be transmitted and received using HTTP request responses or using socket messaging.
[0067] Video stream 424A-B is video data captured from a camera of the respective devices 306 A and B. The video may be compressed. For example, the video may use any commonly known video codecs, including MPEG-4, VP8, or H.264. The video may be captured and transmitted in real time.
[0068] Similarly, audio stream 426A-B is audio data captured from a microphone of the respective devices. The audio may be compressed. For example, the audio may use any commonly known audio codecs, including MPEG-4 or Vorbis. The audio may be captured and transmitted in real time. Video stream 424A and audio stream 426A are captured, transmitted, and presented synchronously with one another. Similarly, video stream 424B and audio stream 426B are captured, transmitted, and presented synchronously with one another.
[0069] The video stream 424A-B and audio stream 426A-B may be transmitted using the WebRTC application programming interface. WebRTC is an API available in JavaScript. As described above, devices 306A and B download and run web applications, as conference applications 310A and B, and conference applications 310A and B may be implemented in JavaScript. Conference applications 310A and B may use WebRTC to receive and transmit video stream 424A-B and audio stream 426A-B by making API calls from their JavaScript.
[0070] As mentioned above, when a user leaves the virtual conference, this departure is communicated to all other users. For example, if device 306A exits the virtual conference, server 302 would communicate that departure to device 306B. Consequently, device 306B would stop rendering an avatar corresponding to device 306A, removing the avatar from the virtual space. Additionally, device 306B will stop receiving video stream 424A and audio stream 426A.
[0071] As described above, conference applications 310A and B may periodically or intermittently re-render the virtual space based on new information from respective video streams 424A and B, position and direction 422A and B, and new information relating to the three-dimensional environment. For simplicity, each of these updates is now described from the perspective of device 306A. However, a skilled artisan would understand device 306B would behave similarly given similar changes.
[0072] As device 306A receives video stream 424B, device 306A texture maps frames from video stream 424B onto an avatar corresponding to device 306B. That texture mapped avatar is re-rendered within the three-dimensional virtual space and presented to a user of device 306A.
[0073] As device 306A receives a new position and direction 422B, device 306A generates the avatar corresponding to device 306B positioned at the new position and oriented at the new direction. The generated avatar is re-rendered within the three-dimensional virtual space and presented to the user of device 306A.
[0074] In some embodiments, server 302 may send updated model information describing the three-dimensional virtual environment. For example, server 302 may send updated information 402, 404, 406, or 408. When that happens, device 306A will re-render the virtual environment based on the updated information. This may be useful when the environment changes over time. For example, an outdoor event may change from daylight to dusk as the event progresses.
[0075] Again, when device 306B exits the virtual conference, server 302 sends a notification to device 306A indicating that device 306B is no longer participating in the conference. In that case, device 306A would re-render the virtual environment without the avatar for device 306B.
[0076] While the system of figure 3 is illustrated in figures 4A-C with two devices for simplicity, a skilled artisan would understand that the techniques described herein can be extended to any number of devices. Also, while figures 3 and 4A-C illustrate a single server 302, a skilled artisan would understand that the functionality of server 302 can be spread out among a plurality of computing devices. In an embodiment, the data transferred in FIG. 4A may come from one network address for server 302, while the data transferred in FIGs. 4B-C can be transferred to/from another network address for server 302.
[0077] In one embodiment, participants can set their webcam, microphone, speakers and graphical settings before entering the virtual conference. In an alternative embodiment, after starting the application, users may enter a virtual lobby where they are greeted by an avatar controlled by a real person. This person is able to view and modify the webcam, microphone, speakers and graphical settings of the user. The attendant can also instruct the user on how to use the virtual environment, for example by teaching them about looking, moving around and interacting. When they are ready, the user automatically leaves the virtual waiting room and joins the real virtual environment.
Adjusting Volume for a Video Conference in a Virtual Environment
[0078] Embodiments also adjust volume to provide a sense of position and space within the virtual conference.
[0079] Figures 5A-B are diagrams illustrating different volume areas in a virtual environment during a videoconference.
[0080] The server may provide specification of sound or volume areas to the client devices. The virtual environment may be partitioned into different volume areas.
[0081] Figure 5A illustrates a diagram 500 with a volume area 502 that allows for a semiprivate or side conversation between a user controlling avatar 506 and the user controlling the virtual camera. In this way, the users around conference table 510 can have a conversation without disturbing others in the room. The sound from the users controlling avatar 506 and the virtual camera may fall off as it exits volume area 502, but not entirely. That allows passersby to join the conversation if they’d like.
[0082] Interface 500 also includes buttons 504, 506, and 508, which will be described below.
[0083] Figure 5B illustrates a diagram 500 with a volume area 504 that allows for a private conversation between a user controlling avatar 508 and the user controlling the virtual camera. Once inside volume area 504, audio from the user controlling avatar 508 and the user controlling the virtual camera may only be output to those inside volume area 504. As no audio at all is played from those users to others in the conference, their audio streams may not even be transmitted to the other user devices.
[0084] Additionally or alternatively, the different areas may have different roll off factors. In that case, the distance-based attenuation may be determined for individual areas based on their respective roll off factors. In this way, different areas of the virtual environment project sound at different rates. The audio gains may be applied to the audio stream to determine left and right audio accordingly. In this way, wall transmission factors, roll off factors, and left-right adjustments that provide a sense of direction for the sound may be applied together to provide a comprehensive audio experience.
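As a non-limiting illustration, the following sketch shows one way a roll off factor, a wall transmission factor, and a left-right adjustment could be combined into left and right gains. The attenuation curve used here is an assumption: it follows the "inverse" distance model of the Web Audio API rather than any formula stated in this disclosure, and the equal-power panning law, the function name `stereo_gains`, and its parameters are likewise hypothetical.

```python
import math


def stereo_gains(listener_xy, listener_pan, source_xy,
                 rolloff=1.0, ref_distance=1.0, wall_transmission=1.0):
    """Return (left_gain, right_gain) for one sound source.

    listener_pan is the facing direction in radians, measured
    counter-clockwise from the +x axis (an assumed convention).
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)

    # Assumed inverse-distance attenuation; rolloff=0 disables it.
    gain = ref_distance / (
        ref_distance + rolloff * (max(distance, ref_distance) - ref_distance)
    )
    # Assumed multiplicative wall transmission factor (0 isolates fully).
    gain *= wall_transmission

    # Angle of the source relative to where the listener is facing.
    relative = math.atan2(dy, dx) - listener_pan
    pan = -math.sin(relative)  # -1 = fully left, +1 = fully right

    # Equal-power panning keeps left^2 + right^2 equal to gain^2.
    angle = (pan + 1.0) * math.pi / 4.0
    return gain * math.cos(angle), gain * math.sin(angle)


# A source two units straight ahead is attenuated and centered.
ahead_left, ahead_right = stereo_gains((0.0, 0.0), 0.0, (2.0, 0.0))
```

Under these assumptions, a podium-style area can be modeled by passing `rolloff=0.0`, which removes distance attenuation while still leaving the left-right adjustment in place.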
[0085] Different audio areas may have different functionality. For example, a volume area may be a podium area. If the user is located in the podium area, no attenuation may occur because of roll off factors or wall transmission factors. In some embodiments, the relative left-right audio may still be adjusted to provide a sense of direction.
[0086] For exemplary purposes, the same methods may be applied to other sound sources, other than avatars. For example, the virtual environment may have three-dimensional models of speakers. Sound may be emitted from the speakers in the same way as the avatar models described above, either because of a presentation or just to provide background music.
[0087] As mentioned above, wall transmission factors may be used to isolate audio entirely. In an embodiment, this can be used to create virtual offices. In one example, each user may have in their physical (perhaps home) office a monitor displaying the conference application constantly on and logged into the virtual office. There may be a feature that allows the user to indicate whether he’s in the office or should not be disturbed. If the do-not-disturb indicator is off, a coworker or manager may come around within the virtual space and knock or walk in as they would in a physical office. The visitor may be able to leave a note if the worker is not present in her office. When the worker returns, she would be able to read the note left by the visitor. The virtual office may have a whiteboard and/or an interface that displays messages for the user. The messages may be email and/or from a messaging application such as the SLACK application available from Slack Technologies, Inc. of San Francisco, CA.
[0088] Users may be able to customize or personalize their virtual offices. For example, they may be able to put up models of posters or other wall ornaments. They may be able to change models or orientation of desks or decorative ornaments, such as plantings. They may be able to change lighting or view out the window.
[0089] Turning back to figure 5A, the interface 500 includes various buttons 504, 506, and 508. When a user presses the button 504, the attenuation may not occur, or may occur only in smaller amounts. In that situation, the user’s voice is output uniformly to other users, allowing for the user to provide a talk to all participants in the meeting. The user video may also be output on a presentation screen within the virtual environment as well, as will be described below. When a user presses the button 506, a speaker mode is enabled. In that case, audio is output from sound sources within the virtual environment, such as to play background music. When a user presses button 508, a screen share mode may be enabled, enabling the user to share contents of a screen or window on their device with other users. The contents may be presented on a presentation model. This too will be described below.
Presenting in a Three-dimensional Environment
[0090] Embodiments also allow users in the three-dimensional virtual environment to share a presentation stream on a presentation screen in the three-dimensional virtual environment used for video conferencing.
[0091] Figures 6A-E illustrate an interface 600 with presentation screens in a three-dimensional virtual environment used for videoconferencing. As described above with respect to figure 1, interface 600 may be displayed to a user who can navigate around the virtual environment. As illustrated in interface 600, the virtual environment includes a virtual conference room with multiple presentation screens.
[0092] Figure 6A illustrates interface 600 with a three-dimensional model of presentation screen 602. In this embodiment, interface 600 may include a virtual conference room. Users represented by avatars 604-606 may conduct a meeting in the virtual conference room. The users may be participants of the meeting. The virtual conference room may include the three-dimensional model of presentation screen 602. Presentation screen 602 may be positioned at a central location of the virtual conference room. The virtual conference room may include a virtual conference table with additional three-dimensional models of presentation screens distributed around the virtual conference table. The virtual conference table with the additional three-dimensional models of presentation screens will be described in further detail with respect to Figures 6B-6C.
[0093] Presentation screen 602 may be a main presentation screen. In this regard, presentation screen 602 may be larger than the additional presentation screens distributed around the virtual conference table. Furthermore, presentation screen 602 may be positioned such that it is visible to all of the participants in the virtual conference room.
[0094] A first participant in the virtual conference room may want to share a presentation stream captured by their device (e.g., device 306A or device 306B) on the three-dimensional model of presentation screen 602. The server (e.g., server 302) may receive a selection of presentation screen 602. In some embodiments, the first participant may select a share screen button located on the three-dimensional model of presentation screen 602. The share screen button may cause a prompt to be rendered on the first participant’s device asking the first participant to select a screen and/or window of their device to share on presentation screen 602. In other embodiments, the first participant may select a share screen button on interface 600. Selecting the share screen button on interface 600 may cause a prompt to be rendered on the first participant’s device asking the first participant to select a screen and/or window of their device to share on the three-dimensional model of presentation screen 602. Furthermore, selecting the share screen button on interface 600 may cause a prompt to be rendered on the first participant’s device asking the first participant to select a presentation screen in the virtual conference room.
[0095] In response to receiving a selection of the three-dimensional model of presentation screen 602, the server may receive a presentation stream from the first participant’s device. Specifically, the selection of presentation screen 602 and the presentation stream may be published to the server. The presentation stream may include audio and video data. In one embodiment, the presentation stream may be a video stream from a camera on the first participant’s device. In another embodiment, the presentation stream may be a screen share from the first participant’s device, where a monitor or window is shared. Through screen share or otherwise, the presentation video and audio stream could also be from an external source, for example a livestream of an event. When the first participant enables presenter mode, the presentation stream (and audio stream) of the first participant is published to the server tagged with the name of the screen the first participant wants to use. Other clients are notified that a new stream is available.
[0096] The presentation stream is texture mapped onto the three-dimensional model of presentation screen 602. A presentation screen may take a variety of forms, such as a poster, a view out of a window, a view of a control panel, or a surface of a table with objects placed on it. The participants in the virtual conference room can consume the presentation stream by viewing presentation screen 602. In other words, from the perspective of the virtual camera of each participant in the virtual conference room, the presentation stream may be rendered for display to that participant on presentation screen 602.
[0097] An audio stream is captured synchronously with the presentation stream and from a microphone of the device of the first participant. The audio stream from the microphone of the first participant may be heard by other participants as if it were coming from presentation screen 602. In this way, presentation screen 602 may be a sound source as described above. Because the first participant’s audio stream is projected from the presentation screen 602, it may be suppressed coming from the first participant’s avatar. In this way, the audio stream is outputted to play synchronously with display of the presentation stream on screen 602 within the three-dimensional virtual space.
[0098] In some embodiments, the first participant may also be able to control the location and orientation of the audience members. For example, the first participant may have an option to select to re-arrange all the other participants to the meeting to be positioned and oriented to face the presentation screen.
[0099] Figure 6B illustrates interface 600 with multiple presentation screens around a virtual conference table. As described above, the virtual conference room may include a three-dimensional model of a virtual conference table 611. In this embodiment, three-dimensional models of presentation screens 610-616 may be positioned around the virtual conference table 611. Each of the presentation screens 610-616 may include a ‘share screen’ button. For example, presentation screen 610 may include ‘share screen’ button 618. Similarly, presentation screens 612-616 may also include a ‘share screen’ button.
[0100] As a non-limiting example, the first participant may select ‘share screen’ button 618 located on the three-dimensional model of presentation screen 610 to share a presentation stream from their device (e.g., device 306A or 306B). The share screen button may cause a prompt to be rendered on the first participant’s device asking the first participant to select a screen and/or window of their device to share on presentation screen 610. In other embodiments, the first participant may select a share screen button on interface 600. Selecting the share screen button on interface 600 may cause a prompt to be rendered on the first participant’s device asking the first participant to select a screen and/or window of their device to share on presentation screen 610. Furthermore, selecting the share screen button on interface 600 may cause a prompt to be rendered on the first participant’s device asking the first participant to select one of presentation screens 610-616. The first participant may also select presentation screen 602, as shown in figure 6A. In other embodiments, the server may automatically select one of presentation screens 610-616 (or 602) based on the availability of the presentation screen, the size of the presentation screen, or the proximity of the first participant’s avatar to a given presentation screen in the three-dimensional virtual environment.
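As a non-limiting illustration of the automatic selection described above, the following sketch picks a screen using the three listed criteria: availability, size, and proximity to the first participant’s avatar. The ranking order (nearest available screen first, larger screen as a tie-breaker) is an assumption, as are the `Screen` class, its fields, and the example coordinates.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Screen:
    """Hypothetical record for a presentation screen model."""
    name: str
    position: Tuple[float, float]  # location on the horizontal plane
    area: float                    # relative size of the screen
    in_use: bool = False           # True while a stream is mapped to it


def auto_select_screen(screens: Sequence[Screen],
                       avatar_xy: Tuple[float, float]) -> Optional[Screen]:
    """Pick the nearest available screen; prefer larger screens on ties."""
    available = [s for s in screens if not s.in_use]
    if not available:
        return None  # every screen already carries a stream
    return min(
        available,
        key=lambda s: (math.dist(s.position, avatar_xy), -s.area),
    )


screens = [
    Screen("602", (0.0, 10.0), area=12.0),
    Screen("610", (1.0, 0.0), area=2.0, in_use=True),
    Screen("612", (2.0, 0.0), area=2.0),
]
chosen = auto_select_screen(screens, avatar_xy=(1.0, 0.0))
```

In this example, screen 610 is closest to the avatar but already in use, so the sketch falls back to screen 612 rather than the larger but more distant screen 602.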
[0101] In response to receiving a selection of the three-dimensional model of presentation screen 610, the server may receive a presentation stream from the first participant’s device. Specifically, the selection of the three-dimensional model of presentation screen 610 and presentation stream may be published to the server. Other clients are notified that a new stream is available.
[0102] Figure 6C illustrates the first participant’s presentation stream being shared on presentation screen 610. The first participant’s presentation stream 620 is texture mapped onto the three-dimensional model of presentation screen 610. The participants in the virtual conference room can consume presentation stream 620 by viewing presentation screen 610. In other words, from the perspective of the virtual camera of each participant in the virtual conference room, presentation stream 620 may be rendered for display to that participant on the three-dimensional model of presentation screen 610. As such, the participants will have to navigate their respective avatars with respect to presentation screen 610 in the virtual conference room to consume presentation stream 620.
[0103] An audio stream is captured synchronously with presentation stream 620 and from a microphone of the device of the first participant. The audio stream from the microphone of the first participant may be heard by other participants as if it were coming from presentation screen 610. In this way, presentation screen 610 may be a sound source as described above. Because the first participant’s audio stream is projected from the presentation screen 610, it may be suppressed coming from the first participant’s avatar. In this way, the audio stream is outputted to play synchronously with display of presentation stream 620 on presentation screen 610 within the three-dimensional virtual space.
[0104] The first participant or other participants in the virtual conference room may share other presentation streams on remaining available presentation screens (e.g., presentation screens 612-616 or 602). As such, a single participant may concurrently share a plurality of streams, or multiple participants may each concurrently share a presentation stream in the virtual conference room. When multiple presentation streams are shared, audio from the various presentation streams is combined (along with audio from other sources, such as from other users’ avatars), as described above with respect to FIGs. 5A-B. The audio is positional, including optional falloff. Alternatively or additionally, a user, such as a presenter, participant, or administrator, may have more control over the audio. For example, a user interface element may enable the user to mute all participants or all presentation streams except a specific one.
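As a non-limiting illustration of combining audio from several presentation streams and avatars, the following sketch sums gain-weighted samples and supports muting every source except one. The function name `mix`, the `solo` parameter, the source names, and the clamping of the result to the range [-1, 1] are assumptions for illustration; the per-source gain is taken to be the positional gain computed elsewhere.

```python
from typing import Dict, List, Optional, Tuple


def mix(sources: Dict[str, Tuple[float, List[float]]],
        solo: Optional[str] = None) -> List[float]:
    """Combine audio sources into one buffer.

    sources maps a source name to (gain, samples). If solo is given,
    every other source is muted.
    """
    length = max((len(samples) for _, samples in sources.values()), default=0)
    out = [0.0] * length
    for name, (gain, samples) in sources.items():
        if solo is not None and name != solo:
            continue  # muted by the "all except one" control
        for i, value in enumerate(samples):
            out[i] += gain * value
    # Clamp to the valid sample range to avoid overflow when summing.
    return [max(-1.0, min(1.0, v)) for v in out]


sources = {
    "screen610": (0.5, [1.0, 0.0]),  # presentation stream, attenuated
    "avatar604": (1.0, [0.5, 0.5]),  # nearby participant at full gain
}
combined = mix(sources)
only_screen = mix(sources, solo="screen610")
```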
[0105] The first participant can choose to stop sharing the presentation stream by selecting a button on interface 600. In response to the first participant selecting the button, the server can end the presentation stream being rendered on a presentation screen. In some embodiments, the server may end the presentation stream being rendered on a presentation screen based on the first participant leaving the three-dimensional virtual environment or the first participant’s avatar being more than a predetermined threshold distance from the presentation screen. For example, if the first participant leaves the virtual conference room, the server may end the presentation stream being rendered on the presentation screen.
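As a non-limiting illustration, the three conditions for ending a presentation stream described above (an explicit stop request, the presenter leaving, or the presenter’s avatar exceeding a threshold distance from the screen) could be checked as in the following sketch. The function name, parameter names, and example values are hypothetical.

```python
import math
from typing import Tuple


def should_end_stream(presenter_present: bool,
                      avatar_xy: Tuple[float, float],
                      screen_xy: Tuple[float, float],
                      max_distance: float,
                      stop_requested: bool = False) -> bool:
    """Return True when the server should stop rendering the stream."""
    if stop_requested:
        return True   # presenter selected the stop button
    if not presenter_present:
        return True   # presenter left the room or the environment
    # Presenter wandered beyond the predetermined threshold distance.
    return math.dist(avatar_xy, screen_xy) > max_distance


still_sharing = not should_end_stream(True, (1.0, 1.0), (0.0, 0.0), 5.0)
```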
[0106] As described above with respect to Figures 5 and 6, the virtual conference room may be a volume area. As such, users outside the virtual conference room may not be able to consume the presentation streams being shared by the participants in the virtual conference room.
[0107] In some embodiments, in response to the presentation stream being published to the server, the server may determine which participants in the virtual conference room may consume the presentation stream. For example, the server may determine which participants in the virtual conference room may consume the presentation stream based on position/title, security clearance, position of the participant’s avatar in the virtual conference room/three-dimensional virtual environment, etc. In some embodiments, a participant (e.g., first participant) may select which participants are to consume the presentation stream. The selection of the participants may be transmitted along with the presentation stream to the server.
[0108] Figure 6D illustrates a 2-D view of the presentation stream. A participant in the virtual conference room may want to view presentation stream 620 being rendered on a presentation screen in a 2-D view. The participant may select the presentation stream 620 (e.g., by clicking on the presentation stream). In response to selecting presentation stream 620, the server or the participant’s device may cause a 2-D view 630 of presentation stream 620 to be rendered on the participant’s device.
[0109] 2-D view 630 of presentation stream 620 may be rendered over the three-dimensional virtual environment on the participant’s device. As such, the three-dimensional virtual environment may be visible to the participant around 2-D view 630. Presentation stream 620 may continue to be rendered on presentation screen 610 while 2-D view 630 of presentation stream 620 is rendered on the participant’s device.
[0110] Figure 6E illustrates a 2-D view of a participant’s view of the three-dimensional virtual space. In some embodiments, a participant may share presentation stream 650 on presentation screen 660 in the virtual conference room. Presentation stream 650 may include the participant’s view of the three-dimensional virtual environment. As such, presentation stream 650 may include the avatars and virtual conference room from the participant’s perspective.
[0111] In response to selecting to share presentation stream 650, presentation stream 650 may be mapped to presentation screen 660. Presentation stream 650 may be rendered in 2-D on presentation screen 660. Presentation stream 650 is continuously updated as the perspective of the participant is updated in the virtual conference room. Presentation stream 650 may provide a honeycomb view of the virtual conference room. The participants in the virtual conference room may consume presentation stream 650.
[0112] Figure 7 is a flow chart illustrating a method 700 for sharing a presentation stream on a presentation screen in a virtual conference room, according to some embodiments.
[0113] In 702, data specifying a three-dimensional virtual space is received. As described above with respect to figure 1, a three-dimensional virtual environment is rendered for users. The users may navigate around the three-dimensional virtual environment using avatars. The three-dimensional virtual environment may include virtual conference rooms. The users may also attend a meeting in a virtual conference room. The users may be participants of the meeting.
[0114] In 704, a selection of a first three-dimensional model of a first presentation screen is received from a first participant. The virtual conference room may include multiple three-dimensional models of presentation screens. The presentation screens may be of varying sizes. A first participant may select the first presentation screen by clicking a ‘share screen’ button on the first three-dimensional model of a first presentation screen. Alternatively, the first participant may select the first presentation screen by selecting from a list of available presentation screens in the virtual conference room. In another example, the first participant’s device or the server may automatically select the first presentation screen for the first participant based on the availability of the first presentation screen, the attributes of the presentation screen (e.g., size, location, etc.), or the proximity of the first participant’s avatar to the presentation screen. The first presentation screen may be selected to share a presentation stream.
[0115] In 706, the presentation stream is received. The presentation stream is audio and video data. In one example, the presentation stream may be sharing a screen or window rendered on the first participant’s device. The presentation stream is published to the server (e.g., server 302). The server informs the other devices of other participants in the virtual conference room of the presentation stream. In some embodiments, the server informs the participants permitted to consume the presentation stream.
[0116] In 708, the presentation stream is mapped to the first three-dimensional model of the first presentation screen. The presentation stream may be texture mapped to the first three-dimensional model of the first presentation screen.
[0117] In 710, the first three-dimensional model of the first presentation screen including the first presentation stream is rendered for display for the other participants. The other participants in the virtual conference room may consume the presentation stream.
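As a non-limiting illustration of the server-side portion of method 700, the following sketch records which presenter is sharing on which screen and computes which other participants should be informed of the new stream, optionally restricted to a permitted set. The `ConferenceRoom` class, its fields, the error handling, and the participant names are assumptions for illustration; actual texture mapping and rendering occur on the participants’ devices and are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ConferenceRoom:
    """Hypothetical server-side bookkeeping for one conference room."""
    participants: Set[str]
    screens: Dict[str, Optional[str]]  # screen name -> presenter or None

    def publish(self, presenter: str, screen: str,
                allowed: Optional[Set[str]] = None) -> List[str]:
        """Tag a stream with its screen and return who gets notified."""
        if screen not in self.screens:
            raise KeyError(f"unknown presentation screen: {screen}")
        if self.screens[screen] is not None:
            raise ValueError(f"screen {screen} is already in use")
        self.screens[screen] = presenter        # stream now mapped here
        audience = self.participants - {presenter}
        if allowed is not None:
            audience = audience & allowed       # only permitted viewers
        return sorted(audience)


room = ConferenceRoom(
    participants={"alice", "bob", "carol"},
    screens={"602": None, "610": None},
)
notified = room.publish("alice", "610")
```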
[0118] Figure 8 is a flow chart illustrating a method 800 for rendering a 2-D view of a presentation stream, according to some embodiments.
[0119] In 802, data specifying a three-dimensional virtual space is received. As described above with respect to figure 1, a three-dimensional virtual environment is rendered for users. The users may navigate around the three-dimensional virtual environment using avatars. The three-dimensional virtual environment may include virtual conference rooms. The users may also attend a meeting in a virtual conference room. The users may be participants of the meeting.
[0120] In 804, the presentation stream is received from a first participant. In one example, the presentation stream may be sharing a screen or window rendered on the first participant’s device.
[0121] In 806, the presentation stream is mapped to a three-dimensional model of a presentation screen in the virtual conference room. The presentation stream may be texture mapped to the three-dimensional model of the presentation screen.
[0122] In 808, a selection of the presentation screen is received from a second participant. The second participant may select to view a 2-D view of the presentation stream by selecting (e.g., clicking on) the presentation screen.
[0123] In 810, a 2-D view of the presentation stream is rendered for display for the second participant. A 2-D window including the presentation stream may be rendered on the second participant’s device. The 2-D window may be overlaid on the three-dimensional virtual environment. The presentation stream may continue to be rendered on the presentation screen while the 2-D view of the presentation stream is rendered on the second participant’s device.
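As a non-limiting illustration of overlaying a 2-D window on the three-dimensional virtual environment, the following sketch computes a centered rectangle that preserves the aspect ratio of the presentation stream and leaves a margin so that the environment remains visible around it. The function name, the `fill` fraction, and the pixel dimensions are assumptions for illustration.

```python
from typing import Tuple


def overlay_rect(viewport_w: float, viewport_h: float,
                 stream_w: float, stream_h: float,
                 fill: float = 0.8) -> Tuple[float, float, float, float]:
    """Return (x, y, width, height) of a centered 2-D overlay.

    fill is the largest fraction of the viewport the overlay may
    occupy along either axis.
    """
    # Scale uniformly so the stream fits inside the allowed region.
    scale = min(viewport_w * fill / stream_w, viewport_h * fill / stream_h)
    width, height = stream_w * scale, stream_h * scale
    # Center the overlay; the remaining border shows the 3-D scene.
    return ((viewport_w - width) / 2, (viewport_h - height) / 2, width, height)


x, y, w, h = overlay_rect(1000, 1000, 1600, 800, fill=0.5)
```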
[0124] Figure 9 is a diagram of a system 900 illustrating components of devices used to provide videoconferencing within a virtual environment. In various embodiments, system 900 can operate according to the methods described above.
[0125] Device 306A is a user computing device. Device 306A could be a desktop or laptop computer, smartphone, tablet, or wearable (e.g., watch or head mounted device). Device 306A includes a microphone 902, camera 904, stereo speaker 906, and input device 912. Not shown, device 306A also includes a processor and persistent, non-transitory and volatile memory. The processor can include one or more central processing units, graphic processing units or any combination thereof.
[0126] Microphone 902 converts sound into an electrical signal. Microphone 902 is positioned to capture speech of a user of device 306A. In different examples, microphone 902 could be a condenser microphone, electret microphone, moving-coil microphone, ribbon microphone, carbon microphone, piezo microphone, fiber-optic microphone, laser microphone, water microphone, or MEMS microphone.
[0127] Camera 904 captures image data by capturing light, generally through one or more lenses. Camera 904 is positioned to capture photographic images of a user of device 306A. Camera 904 includes an image sensor (not shown). The image sensor may, for example, be a charge coupled device (CCD) sensor or a complementary metal oxide semiconductor (CMOS) sensor. The image sensor may include one or more photodetectors that detect light and convert to electrical signals. These electrical signals captured together in a similar timeframe comprise a still photographic image. A sequence of still photographic images captured at regular intervals together comprise a video. In this way, camera 904 captures images and videos.
[0128] Stereo speaker 906 is a device which converts an electrical audio signal into a corresponding left-right sound. Stereo speaker 906 outputs the left audio stream and the right audio stream generated by an audio processor 920 (below) to be played to device 306A’s user in stereo. Stereo speaker 906 includes both ambient speakers and headphones that are designed to play sound directly into a user’s left and right ears. Example speakers include moving-iron loudspeakers, piezoelectric speakers, magnetostatic loudspeakers, electrostatic loudspeakers, ribbon and planar magnetic loudspeakers, bending wave loudspeakers, flat panel loudspeakers, Heil air motion transducers, transparent ionic conduction speakers, plasma arc speakers, thermoacoustic speakers, rotary woofers, moving-coil, electrostatic, electret, planar magnetic, and balanced armature.
[0129] Network interface 908 is a software or hardware interface between two pieces of equipment or protocol layers in a computer network. Network interface 908 receives a video stream from server 302 for respective participants for the meeting. The video stream is captured from a camera on a device of another participant to the video conference. Network interface 908 also receives data specifying a three-dimensional virtual space and any models therein from server 302. For each of the other participants, network interface 908 receives a position and direction in the three-dimensional virtual space. The position and direction are input by each of the respective other participants.
[0130] Network interface 908 also transmits data to server 302. It transmits the position of device 306A’s user’s virtual camera used by renderer 918 and it transmits video and audio streams from camera 904 and microphone 902.
[0131] Display 910 is an output device for presentation of electronic information in visual or tactile form (the latter used, for example, in tactile electronic displays for blind people). Display 910 could be a television set, computer monitor, head-mounted display, heads-up display, output of an augmented reality or virtual reality headset, broadcast reference monitor, medical monitor, mobile display (for mobile devices), or smartphone display (for smartphones). To present the information, display 910 may include an electroluminescent (ELD) display, liquid crystal display (LCD), light-emitting diode (LED) backlit LCD, thin-film transistor (TFT) LCD, light-emitting diode (LED) display, OLED display, AMOLED display, plasma (PDP) display, or quantum dot (QLED) display.
[0132] Input device 912 is a piece of equipment used to provide data and control signals to an information processing system such as a computer or information appliance. Input device 912 allows a user to input a new desired position of a virtual camera used by renderer 918, thereby enabling navigation in the three-dimensional environment. Examples of input devices include keyboards, mice, scanners, joysticks, and touchscreens.
[0133] Web browser 308A and web application 310A were described above with respect to Figure 3. Web application 310A includes screen capturer 914, texture mapper 916, renderer 918, and audio processor 920.
[0134] Screen capturer 914 captures a presentation stream, in particular a screen share. Screen capturer 914 may interact with an API made available by web browser 308A. By calling a function available from the API, screen capturer 914 may cause web browser 308A to ask the user which window or screen the user would like to share. Based on the answer to that query, web browser 308A may return a video stream corresponding to the screen share to screen capturer 914, which passes it on to network interface 908 for transmission to server 302 and ultimately to other participants’ devices.
[0135] Texture mapper 916 texture maps the video stream onto a three-dimensional model corresponding to an avatar. Texture mapper 916 may texture map respective frames from the video to the avatar. In addition, texture mapper 916 may texture map a presentation stream to a three-dimensional model of a presentation screen.
[0136] Renderer 918 renders, from a perspective of a virtual camera of the user of device 306A, for output to display 910 the three-dimensional virtual space including the texture-mapped three-dimensional models of the avatars for respective participants located at the received, corresponding position and oriented at the direction. Renderer 918 also renders any other three-dimensional models including for example the presentation screen.
[0137] Audio processor 920 adjusts volume of the received audio stream to determine a left audio stream and a right audio stream to provide a sense of where the second position is in the three-dimensional virtual space relative to the first position. In one embodiment, audio processor 920 adjusts the volume based on a distance between the second position and the first position. In another embodiment, audio processor 920 adjusts the volume based on a direction of the second position to the first position. In yet another embodiment, audio processor 920 adjusts the volume based on a direction of the second position relative to the first position on a horizontal plane within the three-dimensional virtual space. In yet another embodiment, audio processor 920 adjusts the volume based on a direction where the virtual camera is facing in the three-dimensional virtual space such that the left audio stream tends to have a higher volume when the avatar is located to the left of the virtual camera and the right audio stream tends to have a higher volume when the avatar is located to the right of the virtual camera. Finally, in yet another embodiment, audio processor 920 adjusts the volume based on an angle between the direction where the virtual camera is facing and a direction where the avatar is facing such that the angle being more normal to where the avatar is facing tends to have a greater difference in volume between the left and right audio streams.
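The distance-based and direction-based adjustments described for audio processor 920 can be illustrated with a minimal sketch. The function name `stereo_gains`, the inverse-distance rolloff, the constant-power panning law, and the coordinate convention (positions given as (x, z) on the horizontal plane, yaw 0 facing +z with +x to the right) are all assumptions chosen for illustration; the specification does not prescribe any particular formula.

```python
import math


def stereo_gains(listener_pos, listener_yaw, speaker_pos, rolloff=1.0):
    """Return (left_gain, right_gain) for a speaker heard by a listener.

    Hypothetical conventions: positions are (x, z) pairs on the horizontal
    plane, and listener_yaw is the virtual camera heading in radians,
    where 0 faces +z and +x lies to the listener's right.
    """
    dx = speaker_pos[0] - listener_pos[0]
    dz = speaker_pos[1] - listener_pos[1]
    distance = math.hypot(dx, dz)

    # Assumed distance model: volume falls off with inverse distance.
    attenuation = 1.0 / (1.0 + rolloff * distance)
    if distance == 0.0:
        # Speaker and listener coincide, so there is no direction to pan.
        return attenuation, attenuation

    # Bearing of the speaker relative to where the virtual camera faces.
    bearing = math.atan2(dx, dz) - listener_yaw
    pan = math.sin(bearing)  # -1 is fully left, +1 is fully right

    # Assumed constant-power pan law to split the volume between channels.
    theta = (pan + 1.0) * math.pi / 4.0
    return attenuation * math.cos(theta), attenuation * math.sin(theta)


if __name__ == "__main__":
    # An avatar to the right of the camera yields a louder right channel.
    print(stereo_gains((0.0, 0.0), 0.0, (5.0, 0.0)))
```

In this sketch an avatar directly ahead produces equal left and right gains, and moving the avatar farther away lowers both gains together.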
[0138] Audio processor 920 can also adjust an audio stream’s volume based on the area where the speaker is located relative to an area where the virtual camera is located. In this embodiment, the three-dimensional virtual space is segmented into a plurality of areas. These areas may be hierarchical. When the speaker and virtual camera are located in different areas, a wall transmission factor may be applied to attenuate the speaking audio stream’s volume.
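One way the area-based attenuation might work is sketched below. The `Area` class, the per-area `wall_transmission` field, and the rule of multiplying the factors of every area boundary that separates the speaker from the virtual camera are hypothetical; the specification only states that areas may be hierarchical and that a wall transmission factor may be applied when the two are in different areas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Area:
    """A region of the virtual space; parent links form the hierarchy."""
    name: str
    parent: Optional["Area"] = None
    # Assumed meaning: fraction of volume that passes through this area's walls.
    wall_transmission: float = 1.0


def ancestors(area):
    """Return the chain from an area up to the root, starting with the area."""
    chain = []
    while area is not None:
        chain.append(area)
        area = area.parent
    return chain


def area_attenuation(speaker_area, listener_area):
    """Multiply the transmission factors of all walls between the two areas."""
    speaker_chain = ancestors(speaker_area)
    listener_chain = ancestors(listener_area)
    shared = {id(a) for a in speaker_chain} & {id(a) for a in listener_chain}
    factor = 1.0
    for area in speaker_chain + listener_chain:
        # Only areas that enclose one party but not the other act as walls.
        if id(area) not in shared:
            factor *= area.wall_transmission
    return factor


if __name__ == "__main__":
    building = Area("building")
    room_a = Area("room_a", building, 0.5)
    room_b = Area("room_b", building, 0.25)
    print(area_attenuation(room_a, room_b))
```

Under these assumptions, a speaker and virtual camera in the same area are not attenuated at all, while audio crossing two room boundaries is attenuated by both factors.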
[0139] Server 302 includes an attendance notifier 922, a stream adjuster 924, and a stream forwarder 926.
[0140] Attendance notifier 922 notifies conference participants when participants join and leave the meeting. When a new participant joins the meeting, attendance notifier 922 sends a message to the devices of the other participants to the conference indicating that a new participant has joined. Attendance notifier 922 signals stream forwarder 926 to start forwarding video, audio, and position/direction information to the other participants.
[0141] Stream adjuster 924 receives a video stream captured from a camera on a device of a first user. Stream adjuster 924 determines an available bandwidth to transmit data for the virtual conference to the second user. It determines a distance between a first user and a second user in a virtual conference space. And, it apportions the available bandwidth between the first video stream and the second video stream based on the relative distance. In this way, stream adjuster 924 prioritizes video streams of closer users over video streams from farther ones. Additionally or alternatively, stream adjuster 924 may be located on device 306A, perhaps as part of web application 310A.
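The apportioning step performed by stream adjuster 924 could be sketched as follows. The inverse-distance weighting, the `min_distance` clamp, and the function name `apportion_bandwidth` are assumptions made for illustration; the specification says only that bandwidth is apportioned based on relative distance so that closer users are prioritized.

```python
def apportion_bandwidth(available_kbps, distances, min_distance=1.0):
    """Split available bandwidth across video streams, favoring closer users.

    distances maps a participant identifier to that participant's distance
    from the receiving user in the virtual conference space.
    """
    # Assumed weighting: each stream's share is proportional to 1 / distance.
    # Clamping avoids division by zero and caps the share of very close users.
    weights = {
        pid: 1.0 / max(distance, min_distance)
        for pid, distance in distances.items()
    }
    total = sum(weights.values())
    if total == 0:
        return {}
    return {
        pid: available_kbps * weight / total
        for pid, weight in weights.items()
    }


if __name__ == "__main__":
    shares = apportion_bandwidth(1000.0, {"near": 2.0, "far": 8.0})
    for pid, kbps in shares.items():
        print(pid, round(kbps, 1))
```

The shares always sum to the available bandwidth, so a stream from a user who walks away gives up bandwidth to the streams of users who remain close.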
[0142] Stream forwarder 926 broadcasts position/direction information, video, audio, and screen share screens received (with adjustments made by stream adjuster 924). Stream forwarder 926 may send information to the device 306A in response to a request from conference application 310A. Conference application 310A may send that request in response to the notification from attendance notifier 922.
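The interaction between attendance notifier 922 and stream forwarder 926 can be pictured with a small in-memory sketch. The class and method names, the inbox lists standing in for network messages, and the dictionary of latest state per participant are all hypothetical stand-ins; a real server would send these messages over network interface 928.

```python
class StreamForwarder:
    """Keeps the latest state per participant and serves it on request."""

    def __init__(self):
        self.active = set()
        self.latest = {}  # participant id -> latest position/direction/media info

    def start_forwarding(self, participant_id):
        self.active.add(participant_id)

    def stop_forwarding(self, participant_id):
        self.active.discard(participant_id)
        self.latest.pop(participant_id, None)

    def publish(self, participant_id, state):
        # Ignore data from participants that are not in the meeting.
        if participant_id in self.active:
            self.latest[participant_id] = state

    def handle_request(self, requester_id):
        # A device asks for everyone else's state after being notified.
        return {
            pid: state
            for pid, state in self.latest.items()
            if pid != requester_id
        }


class AttendanceNotifier:
    """Tells existing participants about joins and leaves."""

    def __init__(self, forwarder):
        self.forwarder = forwarder
        self.inboxes = {}  # participant id -> list of pending notifications

    def join(self, participant_id):
        for inbox in self.inboxes.values():
            inbox.append(("joined", participant_id))
        self.inboxes[participant_id] = []
        self.forwarder.start_forwarding(participant_id)

    def leave(self, participant_id):
        self.inboxes.pop(participant_id, None)
        for inbox in self.inboxes.values():
            inbox.append(("left", participant_id))
        self.forwarder.stop_forwarding(participant_id)
```

In this sketch, a device that finds a "joined" notification in its inbox would then call `handle_request` to obtain the new participant's position, direction, and streams, mirroring the request and response sequence described above.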
[0143] Network interface 928 is a software or hardware interface between two pieces of equipment or protocol layers in a computer network. Network interface 928 transmits the model information to devices of the various participants. Network interface 928 receives video, audio, and screen share screens from the various participants.
[0144] Screen capturer 914, texture mapper 916, renderer 918, audio processor 920, attendance notifier 922, stream adjuster 924, and stream forwarder 926 can each be implemented in hardware, software, firmware, or any combination thereof.
[0145] Identifiers, such as “(a),” “(b),” “(i),” “(ii),” etc., are sometimes used for different elements or steps. These identifiers are used for clarity and do not necessarily designate an order for the elements or steps.
[0146] The present invention has been described above with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed.
[0147] The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the art, readily modify and/or adapt for various applications such as specific embodiments, without undue experimentation, and without departing from the general concept of the present invention. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.
[0148] The breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

WHAT IS CLAIMED IS:
1. A computer-implemented method for presenting in a virtual conference including a plurality of participants, comprising: receiving data specifying a three-dimensional virtual space, wherein the three-dimensional virtual space comprises the plurality of participants, an avatar representing each of the plurality of participants, and a plurality of three-dimensional models of a plurality of presentation screens; receiving a first selection of a first three-dimensional model of the plurality of three-dimensional models of a first presentation screen of the plurality of presentation screens; receiving a first presentation stream from a first client device of a first participant of the plurality of participants; mapping the first presentation stream onto the first three-dimensional model of the first presentation screen; and receiving a second selection of a second three-dimensional model of the plurality of three-dimensional models of a second presentation screen of the plurality of presentation screens; receiving a second presentation stream; mapping the second presentation stream onto the second three-dimensional model of the second presentation screen while the first presentation stream is mapped on the first three-dimensional model of the first presentation screen; from a perspective of a virtual camera, rendering for display the three-dimensional virtual space with the first three-dimensional model of the first presentation screen including the first presentation stream and the second three-dimensional model of the second presentation screen including the second presentation stream.
2. The computer-implemented method of claim 1, further comprising identifying a set of participants from the plurality of participants allowed to consume the first presentation stream based on security parameters or a position of a respective avatar of each of the set of participants with respect to the first three-dimensional model of the first presentation screen in the three-dimensional virtual space.
3. The computer-implemented method of claim 1, wherein the second presentation stream is received from a second client device of a second participant.
4. The computer-implemented method of claim 1, wherein receiving the first selection based on an input received by the first client device, availability of the first three-dimensional model of the first presentation screen, or a position of a first avatar of the first participant in the three-dimensional virtual space.
5. The computer-implemented method of claim 1, wherein the presentation screen comprises audio and video data captured by the first client device.
6. The computer-implemented method of claim 1, further comprising removing the first presentation stream from the first three-dimensional model of the first presentation screen based on a position of a first avatar of the first participant in the three-dimensional virtual space with respect to the first three-dimensional model of the first presentation screen or an input received by the first client device.
7. The computer-implemented method of claim 1, wherein the plurality of three-dimensional models of the plurality of presentation screens are of varying sizes.
8. The computer-implemented method of claim 1, wherein the first and second presentation stream are both received from the first client device of a first participant.
9. A non-transitory, tangible computer-readable device having instructions stored thereon that, when executed by at least one computing device, causes the at least one computing device to perform operations for presenting in a virtual conference including a plurality of participants, the operations comprising: receiving data specifying a three-dimensional virtual space, wherein the three-dimensional virtual space comprises a plurality of participants, an avatar representing each of the plurality of participants, and a plurality of three-dimensional models of a plurality of presentation screens; receiving a first selection of a first three-dimensional model of the plurality of three-dimensional models of a first presentation screen of the plurality of presentation screens; receiving a first presentation stream from a first client device of a first participant of the plurality of participants; mapping the first presentation stream onto the first three-dimensional model of the first presentation screen; and receiving a second selection of a second three-dimensional model of the plurality of three-dimensional models of a second presentation screen of the plurality of presentation screens; receiving a second presentation stream; mapping the second presentation stream onto the second three-dimensional model of the second presentation screen while the first presentation stream is mapped on the first three-dimensional model of the first presentation screen; from a perspective of a virtual camera, rendering for display the three-dimensional virtual space with the first three-dimensional model of the first presentation screen including the first presentation stream and the second three-dimensional model of the second presentation screen including the second presentation stream.
10. The device of claim 9, wherein the operations further comprise identifying a set of participants from the plurality of participants allowed to consume the first presentation stream based on security parameters or a position of a respective avatar of each of the set of participants with respect to the first three-dimensional model of the first presentation screen in the three-dimensional virtual space.
11. The device of claim 9, wherein the second presentation stream is received from a second client device of a second participant.
12. The device of claim 9, wherein receiving the first selection based on an input received by the first client device, availability of the first three-dimensional model of the first presentation screen, or a position of a first avatar of the first participant in the three-dimensional virtual space.
13. The device of claim 9, wherein the presentation screen comprises audio and video data captured by the first client device.
14. The device of claim 9, wherein the operations further comprise removing the first presentation stream from the first three-dimensional model of the first presentation screen based on a position of a first avatar of the first participant in the three-dimensional virtual space with respect to the first three-dimensional model of the first presentation screen or an input received by the first client device.
15. The device of claim 9, wherein the plurality of three-dimensional models of the plurality of presentation screens are of varying sizes.
16. A system for presenting in a virtual conference including a plurality of participants, comprising: a processor coupled to a memory; a network interface configured to (i) receive data specifying a three-dimensional virtual space, wherein the three-dimensional virtual space comprises the plurality of participants, an avatar representing each of the plurality of participants, and a plurality of three-dimensional models of a plurality of presentation screens; (ii) receive a first selection of a first three-dimensional model of the plurality of three-dimensional models of a first presentation screen of the plurality of presentation screens, (iii) receive a first presentation stream from a first client device of a first participant of the plurality of participants, (iv) receive a second selection of a second three-dimensional model of the plurality of three-dimensional models of a second presentation screen of the plurality of presentation screens, and (v) receive a second presentation stream from a second client device of the second participant; a mapper, implemented on the processor, configured to (i) map the first presentation stream onto the first three-dimensional model of the first presentation screen, and (ii) map the second presentation stream onto the second three-dimensional model of the second presentation screen while the first presentation stream is mapped on the first three-dimensional model of the first presentation screen; and a renderer, implemented on the processor, configured to render for display to the second participant, from a perspective of a virtual camera of a third participant of the plurality of participants, rendering for display to the third participant the three-dimensional virtual space with the first three-dimensional model of the first presentation screen including the first presentation stream and the second three-dimensional model of the second presentation screen including the second presentation stream.
17. The system of claim 16, wherein the network interface is further configured to (vi) identify a set of participants from the plurality of participants allowed to consume the first presentation stream based on security parameters or a position of a respective avatar of each of the set of participants with respect to the first three-dimensional model of the first presentation screen in the three-dimensional virtual space.
18. The system of claim 16, wherein the network interface is further configured to (vi) inform a third client device of the third participant of the presentation stream and the second client device requests the presentation stream from the communication server.
19. The system of claim 16, wherein receiving the first selection based on an input received by the first client device, availability of the first three-dimensional model of the first presentation screen, or a position of a first avatar of the first participant in the three-dimensional virtual space.
20. The system of claim 16, wherein the presentation screen comprises audio and video data captured by the first client device.
21. A computer-implemented method for presenting in a virtual conference including a plurality of participants, comprising: receiving data specifying a three-dimensional virtual space, wherein the three-dimensional virtual space comprises the plurality of participants and an avatar representing each of the plurality of participants; receiving a presentation stream from a first client device of a first participant of the plurality of participants; mapping the presentation stream onto a three-dimensional model of a presentation screen in the three-dimensional virtual space; receiving a selection of the presentation screen from a second participant of the plurality of participants; and rendering for display to the second participant a two-dimensional view of the presentation stream.
22. The computer-implemented method of claim 21, further comprising rendering the presentation stream onto the three-dimensional model of the presentation screen behind the two-dimensional view of the presentation stream in the three-dimensional virtual space.
23. The computer-implemented method of claim 21, wherein the selection is an input received by a second client device of the second participant.
24. The computer-implemented method of claim 21, wherein a communication server informs a second client device of the second participant of the presentation stream and the second client device requests the presentation stream from the communication server.
25. The computer-implemented method of claim 21, wherein the presentation stream comprises a first participant’s view of the three-dimensional virtual space.
26. The computer-implemented method of claim 21, wherein the presentation screen comprises audio and video data captured by the first client device.
27. The computer-implemented method of claim 21, further comprising removing the presentation stream from the first three-dimensional model of the first presentation screen based on a position of the avatar of the participant in the three-dimensional virtual space with respect to the three-dimensional model of the presentation screen or an input received by the first client device.
28. A non-transitory, tangible computer-readable device having instructions stored thereon that, when executed by at least one computing device, causes the at least one computing device to perform operations for presenting in a virtual conference including a plurality of participants, the operations comprising: receiving data specifying a three-dimensional virtual space, wherein the three-dimensional virtual space comprises the plurality of participants and an avatar representing each of the plurality of participants; receiving a presentation stream from a first client device of a first participant of the plurality of participants; mapping the presentation stream onto a three-dimensional model of a presentation screen in the three-dimensional virtual space; receiving a selection of the presentation screen from a second participant of the plurality of participants; and rendering for display to the second participant a two-dimensional view of the presentation stream.
29. The device of claim 28, wherein the operations further comprise rendering the presentation stream onto the three-dimensional model of the presentation screen behind the two-dimensional view of the presentation stream in the three-dimensional virtual space.
30. The device of claim 28, wherein the selection is an input received by a second client device of the second participant.
31. The device of claim 28, wherein a communication server informs a second client device of the second participant of the presentation stream and the second client device requests the presentation stream from the communication server.
32. The device of claim 28, wherein the presentation stream comprises a first participant’s view of the three-dimensional virtual space.
33. The device of claim 28, wherein the presentation screen comprises audio and video data captured by the first client device.
34. The device of claim 28, wherein the operations further comprise removing the presentation stream from the first three-dimensional model of the first presentation screen based on a position of the avatar of the participant in the three-dimensional virtual space with respect to the three-dimensional model of the presentation screen or an input received by the first client device.
35. A system for presenting in a virtual conference including a plurality of participants, comprising: a processor coupled to a memory; a network interface configured to (i) receive data specifying a three-dimensional virtual space, wherein the three-dimensional virtual space comprises the plurality of participants and an avatar representing each of the plurality of participants, (ii) receive a presentation stream from a first client device of a first participant of the plurality of participants, and (iii) receive a selection of the presentation screen from a second participant of the plurality of participants; a texture mapper, implemented on the processor, configured to texture map the presentation stream onto a three-dimensional model of the presentation screen; and a renderer, implemented on the processor, configured to render for display to a second participant of the plurality of participants, the three-dimensional virtual space including a two-dimensional view of the presentation stream.
36. The system of claim 35, wherein the renderer is further configured to render the presentation stream onto the three-dimensional model of the presentation screen behind the two-dimensional view of the presentation stream in the three-dimensional virtual space.
37. The system of claim 35, wherein the selection is an input received by a second client device of the second participant.
38. The system of claim 35, wherein a communication server informs a second client device of the second participant of the presentation stream and the second client device requests the presentation stream from the communication server.
39. The system of claim 35, wherein the presentation stream comprises a first participant’s view of the three-dimensional virtual space.
40. The system of claim 35, wherein the presentation screen comprises audio and video data captured by the first client device.
PCT/US2023/070509 2022-07-20 2023-07-19 Multi-screen presentation in a virtual videoconferencing environment WO2024020452A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US17/813,657 US11928774B2 (en) 2022-07-20 2022-07-20 Multi-screen presentation in a virtual videoconferencing environment
US17/813,708 2022-07-20
US17/813,708 US20240031531A1 (en) 2022-07-20 2022-07-20 Two-dimensional view of a presentation in a three-dimensional videoconferencing environment
US17/813,657 2022-07-20

Publications (1)

Publication Number Publication Date
WO2024020452A1 (en) 2024-01-25

Family

ID=89618634

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/070509 WO2024020452A1 (en) 2022-07-20 2023-07-19 Multi-screen presentation in a virtual videoconferencing environment

Country Status (1)

Country Link
WO (1) WO2024020452A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140040783A1 (en) * 2010-03-10 2014-02-06 Xmobb, Inc. Virtual social supervenue for sharing multiple video streams
US10491946B2 (en) * 2016-08-30 2019-11-26 The Directv Group, Inc. Methods and systems for providing multiple video content streams
US20200294312A1 (en) * 2015-04-09 2020-09-17 Cinemoi North America, LLC Systems and methods to provide interactive virtual environments
US20210248803A1 (en) * 2018-10-31 2021-08-12 Dwango Co., Ltd. Avatar display system in virtual space, avatar display method in virtual space, and computer program
WO2022087147A1 (en) * 2020-10-20 2022-04-28 Katmai Tech Holdings LLC A web-based videoconference virtual environment with navigable avatars, and applications thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23843858

Country of ref document: EP

Kind code of ref document: A1