WO2022087147A1 - A web-based videoconference virtual environment with navigable avatars, and applications thereof
- Publication number
- WO2022087147A1 (application PCT/US2021/055875)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- avatar
- virtual space
- camera
- dimensional
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/04—Texture mapping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/08—Volume rendering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
- H04M3/568—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/238—Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
- H04N21/2385—Channel allocation; Bandwidth allocation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/266—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
- H04N21/2662—Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4782—Web browsing, e.g. WebTV
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/8166—Monomedia components thereof involving executable data, e.g. software
- H04N21/8173—End-user applications, e.g. Web browser, game
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
- H04N7/157—Conference systems defining a virtual conference space and using avatars or agents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/42—Graphical user interfaces
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2203/00—Aspects of automatic or semi-automatic exchanges
- H04M2203/10—Aspects of automatic or semi-automatic exchanges related to the purpose or context of the telephonic communication
- H04M2203/1016—Telecontrol
- H04M2203/1025—Telecontrol of avatars
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
- H04M3/563—User guidance or feature selection
- H04M3/564—User guidance or feature selection whereby the feature is a sub-conference
Definitions
- This field is generally related to videoconferencing.
- Videoconferencing involves the reception and transmission of audio-video signals by users at different locations, allowing people to communicate in real time.
- Videoconferencing is widely available on many computing devices from a variety of services, including the ZOOM service available from Zoom Video Communications, Inc. of San Jose, CA.
- Some videoconferencing software, such as the FaceTime application available from Apple Inc. of Cupertino, CA, comes standard with mobile devices.
- In general, these applications operate by displaying video and outputting audio of other conference participants.
- The screen may be divided into a number of rectangular frames, each displaying video of a participant.
- Sometimes these services operate by having a larger frame that presents video of the person speaking. As different individuals speak, that frame switches between speakers.
- The application captures video from a camera integrated with the user’s device and audio from a microphone integrated with the user’s device. The application then transmits that audio and video to other applications running on other users’ devices.
- Massively multiplayer online games (MMOs) can generally handle far more than 25 participants. These games often have hundreds or thousands of players on a single server. MMOs often allow players to navigate avatars around a virtual world, and some allow users to speak with or send messages to one another. Examples include the ROBLOX game available from Roblox Corporation of San Mateo, CA, and the MINECRAFT game available from Mojang Studios of Sweden.
- A device enables videoconferencing between a first user and a second user.
- The device includes a processor coupled to a memory, a display screen, a network interface, and a web browser.
- The network interface is configured to receive: (i) data specifying a three-dimensional virtual space, (ii) a position and direction in the three-dimensional virtual space, input by the first user, and (iii) a video stream captured from a camera on a device of the first user.
- The first user’s camera is positioned to capture photographic images of the first user.
- The web browser, implemented on the processor, is configured to download a web application from a server and execute it.
- The web application includes a texture mapper and a renderer.
- The texture mapper is configured to texture map the video stream onto a three-dimensional model of an avatar.
- The renderer is configured to render the three-dimensional virtual space, including the texture-mapped three-dimensional model of the avatar located at the position and oriented in the direction, from the perspective of a virtual camera of the second user for display to the second user.
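As a minimal sketch of how a renderer might place the avatar "located at the position and oriented at the direction," the Python below builds a 4x4 model matrix from a position and a single yaw angle. It assumes Z is the vertical axis (as in the virtual-camera description, where Z is height) and reduces "direction" to yaw alone; `avatar_model_matrix` and `transform_point` are hypothetical names, not taken from the patent.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat4 = List[List[float]]


def avatar_model_matrix(position: Vec3, yaw: float) -> Mat4:
    """Row-major 4x4 matrix: rotate about the vertical Z axis by `yaw` radians,
    then translate to `position` (column-vector convention)."""
    x, y, z = position
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, x],
        [s, c, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(m: Mat4, p: Vec3) -> Vec3:
    """Apply a 4x4 matrix to a 3D point using homogeneous coordinates (w = 1)."""
    v = (p[0], p[1], p[2], 1.0)
    out = [sum(m[r][k] * v[k] for k in range(4)) for r in range(3)]
    return (out[0], out[1], out[2])


# A vertex one unit in front of the avatar's local origin (local +X is "forward").
m = avatar_model_matrix((10.0, 20.0, 0.0), math.pi / 2)
print(transform_point(m, (1.0, 0.0, 0.0)))  # approximately (10.0, 21.0, 0.0)
```

Every vertex of the avatar mesh, including the face that carries the video texture, would be transformed this way before the view and projection transforms of the second user's virtual camera are applied.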
- In another embodiment, a computer-implemented method allows for a presentation in a virtual conference that includes a plurality of participants.
- Data specifying a three-dimensional virtual space is received.
- A position and direction in the three-dimensional virtual space are also received. The position and direction were input by a first participant of the plurality of participants to the conference.
- A video stream captured from a camera on a device of the first participant is received. The camera was positioned to capture photographic images of the first participant.
- The video stream is texture mapped onto a three-dimensional model of an avatar.
- A presentation stream from the device of the first participant is received.
- The presentation stream is texture mapped onto a three-dimensional model of a presentation screen.
- The three-dimensional virtual space, with the texture-mapped avatar and the texture-mapped presentation screen, is rendered from the perspective of a virtual camera of a second participant of the plurality of participants for display to the second participant.
- In this way, embodiments allow for presentations in a social conference environment.
- In another embodiment, a computer-implemented method provides audio for a virtual conference that includes a plurality of participants.
- A three-dimensional virtual space including an avatar with texture-mapped video of a second user is rendered, from the perspective of a virtual camera of a first user, for display to the first user.
- The virtual camera is at a first position in the three-dimensional virtual space, and the avatar is at a second position in the three-dimensional virtual space.
- An audio stream from a microphone of a device of the second user is received.
- The microphone was positioned to capture speech of the second user.
- The volume of the received audio stream is adjusted to produce a left audio stream and a right audio stream that provide a sense of where the second position is in the three-dimensional virtual space relative to the first position.
- The left audio stream and the right audio stream are output to be played to the first user in stereo.
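The text does not say how the left and right volumes are derived from the two positions. One common choice, sketched below under that assumption, is constant-power panning: the source's angle relative to the listener's facing direction sets a pan angle, and the cosine and sine of that angle become the left and right gains, so total power stays constant as a speaker moves around. The function names, the Z-up convention, and the restriction to the horizontal plane are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def stereo_gains(listener_xy: Tuple[float, float], listener_yaw: float,
                 source_xy: Tuple[float, float]) -> Tuple[float, float]:
    """Constant-power pan gains (left, right) for a source heard by a listener.

    The listener faces along `listener_yaw` (radians, counterclockwise from +X,
    viewed from above with Z up), so +90 degrees is to the listener's left.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    azimuth = math.atan2(dy, dx) - listener_yaw
    # Lateral offset in [-1, 1]: +1 = fully left, -1 = fully right.
    # Front and back are not distinguished by left/right volume alone.
    lateral = math.sin(azimuth)
    # Map to a pan angle in [0, pi/2] and use cos/sin so left^2 + right^2 == 1.
    theta = (1.0 - lateral) * math.pi / 4.0
    return math.cos(theta), math.sin(theta)


def pan_mono(samples: Sequence[float], left: float,
             right: float) -> Tuple[List[float], List[float]]:
    """Split one mono stream into left and right streams with the given gains."""
    return [s * left for s in samples], [s * right for s in samples]


l, r = stereo_gains((0.0, 0.0), 0.0, (0.0, 5.0))  # speaker directly to the left
left_stream, right_stream = pan_mono([0.5, -0.25], l, r)
```

A distance-based rolloff, like the one charted in Figure 6, would typically be multiplied into both gains on top of this left/right split.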
- In another embodiment, a computer-implemented method provides audio for a virtual conference.
- A three-dimensional virtual space including an avatar with texture-mapped video of a second user is rendered, from the perspective of a virtual camera of a first user, for display to the first user.
- The virtual camera is at a first position in the three-dimensional virtual space, and the avatar is at a second position in the three-dimensional virtual space.
- An audio stream from a microphone of a device of the second user is received. The method determines whether the virtual camera and the avatar are located in the same area of a plurality of areas in the three-dimensional virtual space. When they are determined not to be in the same area, the audio stream is attenuated.
- The attenuated audio stream is output to be played to the first user. In this way, embodiments allow for private and side conversations in a virtual videoconferencing environment.
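A minimal sketch of the same-area check, assuming areas are axis-aligned rectangles on the floor plane and that "attenuated" means scaling by a fixed factor. The `Area` type, the 0.1 factor, and the room names are hypothetical; the nested hierarchy of areas shown in Figures 9A-C is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Area:
    """A hypothetical axis-aligned rectangular area on the floor plane."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def area_of(areas: Sequence[Area], pos: Tuple[float, float]) -> Optional[str]:
    """Name of the first area containing `pos`, or None if it lies in no area."""
    for area in areas:
        if area.contains(*pos):
            return area.name
    return None


def attenuate_if_separate(samples: Sequence[float], camera_pos: Tuple[float, float],
                          avatar_pos: Tuple[float, float], areas: Sequence[Area],
                          factor: float = 0.1) -> List[float]:
    """Scale the avatar's audio by `factor` when it is not in the listener's area.

    Two positions that both lie outside every area count as the same area here.
    """
    same_area = area_of(areas, camera_pos) == area_of(areas, avatar_pos)
    gain = 1.0 if same_area else factor
    return [s * gain for s in samples]


rooms = [Area("lobby", 0.0, 0.0, 10.0, 10.0), Area("meeting_room", 10.5, 0.0, 20.0, 10.0)]
```

Because the check only compares area membership, two avatars standing a short distance apart on opposite sides of a wall are attenuated, while two avatars far apart in the same room are not, which is what enables side conversations.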
- In another embodiment, a computer-implemented method efficiently streams video for a virtual conference.
- A distance between a first user and a second user in a virtual conference space is determined.
- A video stream captured from a camera on a device of the first user is received.
- The camera was positioned to capture photographic images of the first user.
- A resolution or bit rate of the video stream is reduced based on the determined distance, such that a closer distance results in a greater resolution than a farther distance.
- The video stream is transmitted at the reduced resolution or bit rate to a device of the second user for display to the second user within the virtual conference space.
- The video stream is to be texture mapped onto an avatar of the first user for display to the second user within the virtual conference space. In this way, embodiments allocate bandwidth and computing resources efficiently even when there are a large number of conference participants.
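The rule "a closer distance results in a greater resolution" can be sketched as a lookup over distance tiers. The tier boundaries, resolutions, and bit rates below are hypothetical values chosen for illustration, not figures from the patent.

```python
import math
from typing import Sequence, Tuple

Resolution = Tuple[int, int]

# Hypothetical tiers: (maximum distance in virtual-space units, resolution, bit rate in kbit/s).
TIERS: Sequence[Tuple[float, Resolution, int]] = (
    (5.0, (1280, 720), 1500),
    (15.0, (640, 360), 600),
    (40.0, (320, 180), 200),
)
FARTHEST: Tuple[Resolution, int] = ((160, 90), 80)


def avatar_distance(a: Tuple[float, float, float], b: Tuple[float, float, float]) -> float:
    """Euclidean distance between two avatar positions."""
    return math.dist(a, b)


def select_stream_params(distance: float) -> Tuple[Resolution, int]:
    """Pick the resolution and bit rate for a stream viewed from `distance` away.

    Tiers are checked nearest first, so a closer viewer never gets a lower
    resolution than a farther one.
    """
    for max_distance, resolution, kbps in TIERS:
        if distance <= max_distance:
            return resolution, kbps
    return FARTHEST
```

Fixed tiers are a simpler stand-in for the priority-based apportioning of available bandwidth described with Figures 12-14; they keep the monotonic relationship between distance and quality without modeling the total bandwidth budget.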
- In another embodiment, a computer-implemented method allows for modeling in a virtual videoconference.
- A three-dimensional model of a virtual environment, a mesh representing a three-dimensional model of an object, and a video stream from a participant of the virtual videoconference are received.
- The video stream is texture mapped onto an avatar navigable by the participant.
- The texture-mapped avatar and the mesh representing the three-dimensional model of the object are rendered within the virtual environment for display.
- Figure 1 is a diagram illustrating an example interface that provides videoconferencing in a virtual environment with video streams being mapped onto avatars.
- Figure 2 is a diagram illustrating a three-dimensional model used to render a virtual environment with avatars for videoconferencing.
- Figure 3 is a diagram illustrating a system that provides videoconferences in a virtual environment.
- Figures 4A-C illustrate how data is transferred between various components of the system in figure 3 to provide videoconferencing.
- Figure 5 is a flowchart illustrating a method for adjusting relative left-right volume to provide a sense of position in a virtual environment during a videoconference.
- Figure 6 is a chart illustrating how volume rolls off as distance between the avatars increases.
- Figure 7 is a flowchart illustrating a method for adjusting relative volume to provide different volume areas in a virtual environment during a videoconference.
- Figures 8A-B are diagrams illustrating different volume areas in a virtual environment during a videoconference.
- Figures 9A-C are diagrams illustrating traversal of a hierarchy of volume areas in a virtual environment during a videoconference.
- Figure 10 illustrates an interface with a three-dimensional model in a three-dimensional virtual environment.
- Figure 11 illustrates a presentation screen share in a three-dimensional virtual environment used for videoconferencing.
- Figure 12 is a flowchart illustrating a method for apportioning available bandwidth based on relative position of avatars within the three-dimensional virtual environment.
- Figure 13 is a chart illustrating how a priority value can fall off as distance between the avatars increases.
- Figure 14 is a chart illustrating how the bandwidth allocated can vary based on relative priority.
- Figure 15 is a diagram illustrating components of devices used to provide videoconferencing within a virtual environment.
- Figure 1 is a diagram illustrating an example of an interface 100 that provides videoconferences in a virtual environment with video streams being mapped onto avatars.
- Interface 100 may be displayed to a participant in a videoconference.
- Interface 100 may be rendered for display to the participant and may be constantly updated as the videoconference progresses.
- A user may control the orientation of their virtual camera using, for example, keyboard inputs. In this way, the user can navigate around the virtual environment.
- Different inputs may change the virtual camera’s X and Y position and its pan and tilt angles in the virtual environment.
- A user may also use inputs to alter the height (the Z coordinate) or yaw of the virtual camera.
- A user may enter inputs that cause the virtual camera to “hop” up and then return to its original position, simulating gravity.
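The navigation controls above can be sketched as a small camera state object: movement along the pan direction, pan and tilt changes, and a "hop" integrated under gravity until the camera lands back at its starting height. The speeds, the gravity constant, and the method names are hypothetical; the text specifies only which quantities the inputs change.

```python
import math
from dataclasses import dataclass

# Hypothetical tuning constants (virtual-space units and seconds).
MOVE_SPEED = 2.0
HOP_SPEED = 3.0
GRAVITY = 9.8


@dataclass
class VirtualCamera:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0        # height (the Z coordinate)
    pan: float = 0.0      # radians about the vertical axis; 0 faces +X
    tilt: float = 0.0     # radians; positive looks up
    vz: float = 0.0       # vertical velocity, nonzero only while hopping
    ground_z: float = 0.0

    def move(self, forward: float, strafe: float, dt: float) -> None:
        """Move along the pan direction (`forward`) and to the left (`strafe`)."""
        cos_p, sin_p = math.cos(self.pan), math.sin(self.pan)
        self.x += (forward * cos_p - strafe * sin_p) * MOVE_SPEED * dt
        self.y += (forward * sin_p + strafe * cos_p) * MOVE_SPEED * dt

    def look(self, d_pan: float, d_tilt: float) -> None:
        """Turn the camera, clamping tilt between straight down and straight up."""
        self.pan = (self.pan + d_pan) % (2 * math.pi)
        self.tilt = max(-math.pi / 2, min(math.pi / 2, self.tilt + d_tilt))

    def hop(self) -> None:
        """Start a hop only when standing on the ground."""
        if self.z <= self.ground_z:
            self.vz = HOP_SPEED

    def step(self, dt: float) -> None:
        """Integrate the hop under gravity and land back at the original height."""
        if self.vz == 0.0 and self.z <= self.ground_z:
            return
        self.vz -= GRAVITY * dt
        self.z += self.vz * dt
        if self.z <= self.ground_z:
            self.z, self.vz = self.ground_z, 0.0
```

Calling `step` once per rendered frame keeps the hop smooth regardless of frame rate, and each updated pose is what would be reported to the other participants so that they can move this user's avatar.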
- Interface 100 includes avatars 102A and B, which each represent a different participant in the videoconference.
- Avatars 102A and B have video streams 104A and B, respectively, texture mapped onto them from devices of the first and second participants.
- A texture map is an image applied (mapped) to the surface of a shape or polygon.
- Here, the images are respective frames of the video.
- The camera devices capturing video streams 104A and B are positioned to capture the faces of the respective participants. In this way, the avatars have moving images of faces texture mapped onto them as participants in the meeting talk and listen.
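To make "an image applied (mapped) to the surface" concrete, the sketch below performs a nearest-neighbor texture lookup: each point on the avatar's screen face carries a (u, v) coordinate, and its color is read from the current video frame at that coordinate. It assumes a frame stored as rows of RGB tuples with row 0 at the top and v running downward; the function names are hypothetical. In a browser this lookup is normally done by the GPU, for example after uploading each video frame with WebGL's `texImage2D`; the CPU version here only makes the mapping explicit.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
Frame = Sequence[Sequence[RGB]]  # rows of pixels; row 0 is the top of the image


def sample_nearest(frame: Frame, u: float, v: float) -> RGB:
    """Nearest-neighbor texture lookup with UV clamped to [0, 1].

    u runs left to right and v runs top to bottom in this sketch.
    """
    height, width = len(frame), len(frame[0])
    u = min(max(u, 0.0), 1.0)
    v = min(max(v, 0.0), 1.0)
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return frame[row][col]


def shade_face(frame: Frame, uvs: Sequence[Tuple[float, float]]) -> List[RGB]:
    """Look up the current video frame's color at each UV on the avatar's face."""
    return [sample_nearest(frame, u, v) for u, v in uvs]
```

Since the UVs stay fixed on the mesh while the frame changes every video tick, simply swapping in the newest frame is enough to make the face on the avatar move.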
- Avatars 102A and B are controlled by the respective participants they represent.
- Avatars 102A and B are three-dimensional models represented by a mesh. Each avatar 102A and B may have the participant’s name displayed underneath it.
- Each avatar 102A and B may be positioned at a point corresponding to where its user’s virtual camera is located within the virtual environment. Just as the user viewing interface 100 can move the virtual camera around, the other users can move their respective avatars 102A and B around.
- The virtual environment rendered in interface 100 includes background image 120 and a three-dimensional model 118 of an arena.
- The arena may be a venue or building in which the videoconference takes place.
- The arena may include a floor area bounded by walls.
- Three-dimensional model 118 can include a mesh and texture. Other ways to mathematically represent the surface of three-dimensional model 118 may be possible as well, for example, polygon modeling, curve modeling, and digital sculpting.
- Three-dimensional model 118 may be represented by voxels, splines, geometric primitives, polygons, or any other possible representation in three-dimensional space.
- Three-dimensional model 118 may also include a specification of light sources.
- The light sources can include, for example, point, directional, spotlight, and ambient lights.
- The objects may also have properties describing how they reflect light.
- The properties may include diffuse, ambient, and specular lighting interactions.
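The text does not name a lighting model. The Phong reflection model is one common way to combine ambient, diffuse, and specular terms, and the scalar sketch below assumes it for a single light; the coefficient names (`ka`, `kd`, `ks`, `shininess`) are conventional Phong parameters, not values from the patent.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    length = math.sqrt(sum(c * c for c in v))
    return (v[0] / length, v[1] / length, v[2] / length)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def phong_intensity(normal: Vec3, to_light: Vec3, to_viewer: Vec3,
                    ka: float, kd: float, ks: float, shininess: float,
                    ambient_light: float = 1.0, light: float = 1.0) -> float:
    """Scalar Phong reflection: ambient + diffuse + specular for one light."""
    n, l, v = _normalize(normal), _normalize(to_light), _normalize(to_viewer)
    n_dot_l = _dot(n, l)
    diffuse = max(n_dot_l, 0.0)
    specular = 0.0
    if n_dot_l > 0.0:
        # Mirror the light direction about the normal: r = 2(n.l)n - l.
        r = (2 * n_dot_l * n[0] - l[0],
             2 * n_dot_l * n[1] - l[1],
             2 * n_dot_l * n[2] - l[2])
        specular = max(_dot(r, v), 0.0) ** shininess
    return ka * ambient_light + light * (kd * diffuse + ks * specular)
```

A surface lit from behind keeps only its ambient term, which is why ambient lighting is listed alongside the point, directional, and spotlight sources.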
- The virtual environment can include various other three-dimensional models that illustrate different components of the environment.
- For example, the three-dimensional environment can include a decorative model 114, a speaker model 116, and a presentation screen model 122.
- Like model 118, these models can be represented using any mathematical representation of a geometric surface in three-dimensional space. They may be separate from model 118 or combined with it into a single representation of the virtual environment.
- Decorative models, such as model 114, serve to enhance the realism and aesthetic appeal of the arena.
- Speaker model 116 may virtually emit sound, such as presentation and background music, as will be described in greater detail below with respect to figures 5 and 7.
- Presentation screen model 122 can serve to provide an outlet to present a presentation. Video of the presenter or a presentation screen share may be texture mapped onto presentation screen model 122.
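Figure 6 shows volume rolling off as distance increases, but the text here does not give the curve. The sketch below assumes an inverse-distance rolloff with the same shape as the "inverse" distance model of the Web Audio API's `PannerNode`: full volume up to a reference distance, then falling off inversely. The parameter defaults are hypothetical.

```python
from typing import List, Sequence


def inverse_distance_gain(distance: float, ref_distance: float = 1.0,
                          rolloff: float = 1.0) -> float:
    """Gain that stays at 1.0 up to `ref_distance` and then falls off inversely."""
    d = max(distance, ref_distance)
    return ref_distance / (ref_distance + rolloff * (d - ref_distance))


def apply_gain(samples: Sequence[float], gain: float) -> List[float]:
    """Scale a block of audio samples by a single gain value."""
    return [s * gain for s in samples]
```

The same gain could be applied to audio emitted by speaker model 116 or to a participant's voice, using the distance between the sound source and the listener's virtual camera.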
- Button 108 may provide the user a list of participants. In one example, after a user selects button 108, the user could chat with other participants by sending text messages, individually or as a group.
- Button 110 may enable a user to change attributes of the virtual camera used to render interface 100.
- The virtual camera may have a field of view specifying the angle at which the data is rendered for display. Modeling data within the camera’s field of view is rendered, while modeling data outside it may not be.
- The virtual camera’s field of view may be set between 60° and 110°, which is commensurate with a wide-angle lens and human vision.
- Selecting button 110 may cause the virtual camera to increase its field of view beyond 170°, commensurate with a fisheye lens. This may give the user broader peripheral awareness of their surroundings in the virtual environment.
- Button 112 causes the user to exit the virtual environment. Selecting button 112 may cause a notification to be sent to the other participants’ devices, signaling them to stop displaying the avatar corresponding to the user previously viewing interface 100.
- In interface 100, a virtual 3D space is used to conduct videoconferencing. Every user controls an avatar, which they can move around, turn to look around, make jump, or otherwise reposition or reorient.
- A virtual camera shows the user the virtual 3D environment and the other avatars.
- The avatars of the other users have, as an integral part, a virtual display that shows the user’s webcam image.
- Embodiments provide a more social experience than conventional web conferencing or conventional MMO gaming. That more social experience has a variety of applications; for example, it can be used in online shopping.
- Interface 100 has applications in providing virtual grocery stores, houses of worship, trade shows, B2B sales, B2C sales, schooling, restaurants or lunchrooms, product releases, construction site visits (e.g., for architects, engineers, contractors), office spaces (e.g., people work “at their desks” virtually), controlling machinery remotely (ships, vehicles, planes, submarines, drones, drilling equipment, etc.), plant/factory control rooms, medical procedures, garden designs, virtual bus tours with a guide, music events (e.g., concerts), lectures (e.g., TED talks), meetings of political parties, board meetings, underwater research, research on hard-to-reach places, training for emergencies (e.g., fire), cooking, shopping (with checkout and delivery), virtual arts and crafts (e.g., painting and pottery), marriages, funerals, baptisms, remote sports training, counseling, treating fears (e.g., confrontation therapy), fashion shows, amusement parks, home decoration, watching sports, watching esports, watching performances captured using a three-dimensional camera, playing board and role-playing games, …
- Figure 2 is a diagram 200 illustrating a three-dimensional model used to render a virtual environment with avatars for videoconferencing.
- The virtual environment here includes a three-dimensional arena 118 and various three-dimensional models, including three-dimensional models 114 and 122.
- Diagram 200 also includes avatars 102A and B navigating around the virtual environment.
- Interface 100 in Figure 1 is rendered from the perspective of a virtual camera, illustrated in diagram 200 as virtual camera 204.
- The user viewing interface 100 in Figure 1 can control virtual camera 204 and navigate it in three-dimensional space.
- Interface 100 is constantly updated according to the new position of virtual camera 204 and any changes to the models within its field of view.
- The field of view of virtual camera 204 may be a frustum defined, at least in part, by horizontal and vertical field-of-view angles.
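A frustum defined by horizontal and vertical field-of-view angles can be tested point by point: express the point in the camera's frame, then check that it lies between the near and far planes and within half of each angle on either side of the view direction. The sketch assumes Z is up, ignores camera tilt, and uses hypothetical near and far distances.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def in_view_frustum(cam_pos: Vec3, cam_pan: float, point: Vec3,
                    h_fov_deg: float, v_fov_deg: float,
                    near: float = 0.1, far: float = 1000.0) -> bool:
    """True if `point` lies inside a symmetric view frustum.

    The camera looks along its pan direction with zero tilt; Z is up.
    """
    dx, dy, dz = (p - c for p, c in zip(point, cam_pos))
    cos_p, sin_p = math.cos(cam_pan), math.sin(cam_pan)
    forward = dx * cos_p + dy * sin_p    # distance along the view direction
    left = -dx * sin_p + dy * cos_p      # offset to the camera's left
    up = dz                              # offset above the camera
    if not near <= forward <= far:
        return False
    return (abs(left) <= forward * math.tan(math.radians(h_fov_deg) / 2)
            and abs(up) <= forward * math.tan(math.radians(v_fov_deg) / 2))
```

Widening `h_fov_deg` from a typical 90° toward the 170° fisheye setting mentioned for button 110 admits points far off to the side, which is the broader peripheral awareness that setting provides.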
- A background image, or texture, may define at least part of the virtual environment.
- The background image may capture aspects of the virtual environment that are meant to appear at a distance.
- The background image may be texture mapped onto a sphere 202.
- Virtual camera 204 may be at the origin of sphere 202. In this way, distant features of the virtual environment may be efficiently rendered.
- Other shapes may be used instead of sphere 202 to texture map the background image.
- For example, the shape may be a cylinder, cube, rectangular prism, or any other three-dimensional geometry.
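With the camera at the center of the background sphere, the part of the background image seen in any direction depends only on that direction. The sketch below assumes the background is stored as an equirectangular (longitude/latitude) image, which the text does not specify, and maps a view direction to (u, v) texture coordinates with Z as the vertical axis. The function name is hypothetical.

```python
import math
from typing import Tuple


def direction_to_equirect_uv(direction: Tuple[float, float, float]) -> Tuple[float, float]:
    """Map a view direction from the sphere's center to (u, v) in an equirectangular image.

    Z is up; u wraps around the horizon and v runs from the zenith (0) to the nadir (1).
    """
    x, y, z = direction
    r = math.sqrt(x * x + y * y + z * z)
    longitude = math.atan2(y, x)                      # -pi..pi around the vertical axis
    latitude = math.asin(max(-1.0, min(1.0, z / r)))  # -pi/2..pi/2 above/below the horizon
    u = 0.5 + longitude / (2 * math.pi)
    v = 0.5 - latitude / math.pi
    return u, v
```

Because translation does not affect the result, the background behaves as if it were infinitely far away, which is why it suits features meant to appear at a distance.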
- Figure 3 is a diagram illustrating a system 300 that provides videoconferences in a virtual environment.
- System 300 includes a server 302 coupled to devices 306A and B via one or more networks 304.
- Server 302 provides the services to connect a videoconference session between devices 306A and 306B.
- Server 302 communicates notifications to devices of conference participants (e.g., devices 306A-B) when new participants join the conference and when existing participants leave it.
- Server 302 communicates messages describing the position and direction, in the three-dimensional virtual space, of each participant’s virtual camera.
- Server 302 also communicates video and audio streams between the respective devices of the participants (e.g., devices 306A-B).
- Server 302 stores and transmits data specifying a three-dimensional virtual space to the respective devices 306A-B.
- Server 302 may also provide executable information that instructs devices 306A and 306B on how to render the data to provide the interactive conference.
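The text does not define a wire format for the position-and-direction messages. The sketch below assumes JSON with hypothetical field names (`participant_id`, `x`, `y`, `z`, `pan`, `tilt`) and shows a server-side relay that forwards one participant's update to every other participant; the transport used to deliver the messages is not shown.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class PoseUpdate:
    """Hypothetical message carrying one participant's virtual-camera pose."""
    participant_id: str
    x: float
    y: float
    z: float
    pan: float
    tilt: float

    def to_json(self) -> str:
        return json.dumps({"type": "pose", **asdict(self)})

    @staticmethod
    def from_json(text: str) -> "PoseUpdate":
        data = json.loads(text)
        if data.pop("type", None) != "pose":
            raise ValueError("not a pose message")
        return PoseUpdate(**data)


def fan_out(sender_id: str, message: str, participants: Iterable[str]) -> Dict[str, str]:
    """Server-side relay: forward a sender's message to every other participant."""
    return {pid: message for pid in participants if pid != sender_id}
```

Each receiving device would decode the update and move the sender's avatar to the new position and direction before rendering the next frame.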
- Server 302 responds to requests with a response.
- Server 302 may be a web server.
- A web server is software and hardware that uses HTTP (Hypertext Transfer Protocol) and other protocols to respond to client requests made over the World Wide Web.
- The main job of a web server is to deliver website content by storing, processing, and delivering webpages to users.
- In an alternative embodiment, communication between devices 306A-B happens not through server 302 but on a peer-to-peer basis.
- In that embodiment, one or more of the following are communicated directly between devices 306A-B rather than through server 302: the data describing the respective participants’ location and direction, the notifications regarding new and departing participants, and the participants’ video and audio streams.
- Network 304 enables communication between the various devices 306A-B and server 302.
- Network 304 may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless wide area network (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, a wireless network, a WiFi network, a WiMax network, any other type of network, or any combination of two or more such networks.
- Devices 306A-B are each devices of respective participants to the virtual conference. Devices 306A-B each receive data necessary to conduct the virtual conference and render the data necessary to provide the virtual conference. As will be described in greater detail below, devices 306A-B include a display to present the rendered conference information, inputs that allow the user to control the virtual camera, a speaker (such as a headset) to provide audio to the user for the conference, a microphone to capture a user’s voice input, and a camera positioned to capture video of the user’s face.
- Devices 306A-B can be any type of computing device, including a laptop, a desktop, a smartphone, a tablet computer, or a wearable computer (such as a smartwatch or an augmented reality or virtual reality headset).
- Web browser 308A-B can retrieve a network resource (such as a webpage) addressed by a link identifier (such as a uniform resource locator, or URL) and present the network resource for display.
- Web browser 308A-B is a software application for accessing information on the World Wide Web.
- Web browser 308A-B makes this request using the Hypertext Transfer Protocol (HTTP or HTTPS).
- The web browser retrieves the necessary content from a web server, interprets and executes the content, and then displays the page on a display of device 306A-B, shown as client/counterpart conference application 310A-B.
- The content may include HTML and client-side scripting, such as JavaScript.
- Conference application 310A-B may be a web application downloaded from server 302 and configured to be executed by the respective web browsers 308A-B.
- Conference application 310A-B may be a JavaScript application.
- Alternatively, conference application 310A-B may be written in a higher-level language, such as TypeScript, and translated or compiled into JavaScript.
- Conference application 310A-B is configured to interact with the WebGL JavaScript application programming interface. It may have control code specified in JavaScript and shader code written in OpenGL ES Shading Language (GLSL ES).
- Through WebGL, conference application 310A-B may be able to use a graphics processing unit (not shown) of device 306A-B.
- Conference application 310A-B receives the data from server 302 describing position and direction of other avatars and three-dimensional modeling information describing the virtual environment. In addition, conference application 310A-B receives video and audio streams of other conference participants from server 302.
- Conference application 310A-B renders the three-dimensional modeling data, including data describing the three-dimensional environment and data representing the respective participant avatars.
- This rendering may involve rasterization, texture mapping, ray tracing, shading, or other rendering techniques.
- The rendering may involve ray tracing based on the characteristics of the virtual camera.
- Ray tracing involves generating an image by tracing the path of light through pixels in an image plane and simulating the effects of its encounters with virtual objects.
- Ray tracing may simulate optical effects such as reflection, refraction, scattering, and dispersion.
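To make the role of the virtual camera concrete, the following Python sketch computes the direction of the primary ray that a ray tracer would cast through one pixel, given the camera's pan and tilt. It assumes a pinhole camera, a z-up world, pan measured counterclockwise from the +x axis, tilt measured upward from the horizontal plane, and a vertical field of view; the function and parameter names are illustrative, not part of the described system.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)


def camera_ray(pan: float, tilt: float, vertical_fov: float,
               width: int, height: int, px: int, py: int) -> Vec3:
    """Unit direction of the primary ray through the center of pixel (px, py).

    Assumed conventions: z-up world, pan counterclockwise from +x,
    tilt upward from the horizontal plane, all angles in radians.
    """
    # Camera basis: forward from pan/tilt, right = forward x world-up,
    # camera-up = right x forward.
    forward = (math.cos(tilt) * math.cos(pan),
               math.cos(tilt) * math.sin(pan),
               math.sin(tilt))
    right = (math.sin(pan), -math.cos(pan), 0.0)
    up = (right[1] * forward[2] - right[2] * forward[1],
          right[2] * forward[0] - right[0] * forward[2],
          right[0] * forward[1] - right[1] * forward[0])
    # Map the pixel center to [-1, 1] screen coordinates scaled by the FOV.
    half = math.tan(vertical_fov / 2)
    aspect = width / height
    sx = (2 * (px + 0.5) / width - 1) * half * aspect
    sy = (1 - 2 * (py + 0.5) / height) * half
    return _normalize(tuple(f + sx * r + sy * u
                            for f, r, u in zip(forward, right, up)))
```

Each returned ray would then be intersected with the arena mesh and avatar models, and the resulting hits shaded to produce the pixel color.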
- The user uses web browser 308A-B to enter a virtual space.
- The scene is displayed on the user’s screen.
- The user’s webcam video stream and microphone audio stream are sent to server 302.
- When another user enters the virtual space, an avatar model is created for that user.
- The position of this avatar is sent to the server and received by the other users.
- Other users also get a notification from server 302 that an audio/video stream is available.
- The video stream of a user is placed on the avatar that was created for that user.
- The audio stream is played back as coming from the position of the avatar.
- Figures 4A-C illustrate how data is transferred between various components of the system in figure 3 to provide videoconferencing. Like figure 3, each of figures 4A-C depicts the connection between server 302 and devices 306A and 306B. In particular, figures 4A-C illustrate example data flows between those devices.
- Figure 4A shows a diagram 400 illustrating how server 302 transmits data describing the virtual environment to devices 306A and 306B. In particular, both devices 306A and 306B receive from server 302 the three-dimensional arena 404, background texture 402, space hierarchy 408, and any other three-dimensional modeling information 406.
- Background texture 402 is an image illustrating distant features of the virtual environment.
- The image may be regular (such as a brick wall) or irregular.
- Background texture 402 may be encoded in any common image file format, such as bitmap, JPEG, GIF, or another image file format. It describes the background image to be rendered against, for example, a sphere at a distance.
- Three-dimensional arena 404 is a three-dimensional model of the space in which the conference is to take place. As described above, it may include, for example, a mesh and possibly its own texture information to be mapped upon the three-dimensional primitives it describes. It may define the space in which the virtual camera and respective avatars can navigate within the virtual environment. Accordingly, it may be bounded by edges (such as walls or fences) that illustrate to users the perimeter of the navigable virtual environment.
- Space hierarchy 408 is data specifying partitions in the virtual environment. These partitions are used to determine how sound is processed before being transferred between participants. As will be described below, this partition data may be hierarchical and may describe sound processing to allow for areas where participants to the virtual conference can have private conversations or side conversations.
- Three-dimensional model 406 is any other three-dimensional modeling information needed to conduct the conference. In one embodiment, this may include information describing the respective avatars. Alternatively or additionally, this information may include product demonstrations.
- Figures 4B-C illustrate how server 302 forwards information from one device to another.
- Figure 4B illustrates a diagram 420 showing how server 302 receives information from respective devices 306A and 306B.
- Figure 4C illustrates a diagram 420 showing how server 302 transmits the information to respective devices 306B and 306A.
- Device 306A transmits position and direction 422A, video stream 424A, and audio stream 426A to server 302, which transmits position and direction 422A, video stream 424A, and audio stream 426A to device 306B.
- Device 306B transmits position and direction 422B, video stream 424B, and audio stream 426B to server 302, which transmits position and direction 422B, video stream 424B, and audio stream 426B to device 306A.
- Position and direction 422A-B describe the position and direction of the virtual camera for the user of device 306A and 306B, respectively.
- The position may be a coordinate in three-dimensional space (e.g., an x, y, z coordinate), and the direction may be a direction in three-dimensional space (e.g., pan, tilt, roll).
- In some embodiments, the user may be unable to control the virtual camera’s roll, so the direction may only specify pan and tilt angles.
- Similarly, the user may be unable to change the avatar’s z coordinate (as the avatar is bounded by virtual gravity), so the z coordinate may be unnecessary.
- In this way, position and direction 422A-B may each include at least a coordinate on a horizontal plane in the three-dimensional virtual space and a pan and tilt value.
- Alternatively, the user may be able to make her avatar “jump,” so the z position may be specified only by an indication of whether the user is jumping her avatar.
- Position and direction 422A-B may be transmitted and received using HTTP requests and responses or using socket messaging.
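A minimal sketch of how position and direction 422A-B might be serialized for socket messaging, assuming a JSON payload. The `PoseMessage` class and its field names are hypothetical; it carries only the horizontal-plane coordinate, pan, tilt, and a jump flag, reflecting the reduced representation described above.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PoseMessage:
    """Hypothetical wire format for position and direction 422A-B.

    The z coordinate and roll are omitted; a jump flag stands in for z.
    """
    participant: str
    x: float          # horizontal-plane coordinate
    y: float          # horizontal-plane coordinate
    pan: float        # radians
    tilt: float       # radians
    jumping: bool = False

    def encode(self) -> str:
        return json.dumps(asdict(self))

    @staticmethod
    def decode(text: str) -> "PoseMessage":
        return PoseMessage(**json.loads(text))
```

Sending five numbers and a flag per update keeps these messages small, which matters because they may be sent many times per second as users move.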
- Video stream 424A-B is video data captured from a camera of the respective devices 306A and 306B.
- The video may be compressed.
- The video may use any commonly known video codec, including MPEG-4, VP8, or H.264.
- The video may be captured and transmitted in real time.
- Audio stream 426A-B is audio data captured from a microphone of the respective devices.
- The audio may be compressed.
- The audio may use any commonly known audio codec, including MPEG-4 or Vorbis.
- The audio may be captured and transmitted in real time.
- Video stream 424A and audio stream 426A are captured, transmitted, and presented synchronously with one another.
- Similarly, video stream 424B and audio stream 426B are captured, transmitted, and presented synchronously with one another.
- Video streams 424A-B and audio streams 426A-B may be transmitted using the WebRTC application programming interface.
- WebRTC is an API available in JavaScript.
- As described above, devices 306A and 306B download and run web applications as conference applications 310A and 310B, which may be implemented in JavaScript.
- Conference applications 310A and 310B may use WebRTC to receive and transmit video streams 424A-B and audio streams 426A-B by making API calls from their JavaScript.
- Conference applications 310A and 310B may periodically or intermittently re-render the virtual space based on new information from respective video streams 424A and 424B, position and direction 422A and 422B, and new information relating to the three-dimensional environment.
- Each of these updates is now described from the perspective of device 306A.
- Device 306B would behave similarly given similar changes.
- As device 306A receives video stream 424B, device 306A texture maps frames from video stream 424B onto an avatar corresponding to device 306B. That texture-mapped avatar is re-rendered within the three-dimensional virtual space and presented to a user of device 306A.
- As device 306A receives a new position and direction 422B, device 306A generates the avatar corresponding to device 306B positioned at the new position and oriented in the new direction. The generated avatar is re-rendered within the three-dimensional virtual space and presented to the user of device 306A.
- Server 302 may send updated model information describing the three-dimensional virtual environment.
- For example, server 302 may send updated information 402, 404, 406, or 408.
- When that happens, device 306A will re-render the virtual environment based on the updated information. This may be useful when the environment changes over time. For example, an outdoor event may change from daylight to dusk as the event progresses.
- When device 306B leaves, server 302 sends a notification to device 306A indicating that device 306B is no longer participating in the conference. In that case, device 306A would re-render the virtual environment without the avatar for device 306B.
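The update handling described above can be sketched as a small client-side state object that reacts to join, position, frame, and leave messages and flags when a re-render is needed. The message types and field names (`joined`, `pose`, `frame`, `left`) are hypothetical stand-ins for whatever protocol server 302 actually uses; texture mapping and rendering themselves are out of scope.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RemoteAvatar:
    participant: str
    x: float = 0.0
    y: float = 0.0
    pan: float = 0.0
    tilt: float = 0.0
    frame: Optional[bytes] = None  # latest video frame to texture map


class SceneState:
    """Client-side view of the other participants, updated from server messages."""

    def __init__(self) -> None:
        self.avatars: Dict[str, RemoteAvatar] = {}
        self.needs_render = False

    def handle(self, msg: dict) -> None:
        kind, pid = msg["type"], msg["participant"]
        if kind not in ("joined", "pose", "frame", "left"):
            return  # ignore message types this sketch does not model
        if kind == "left":
            # The next render omits the departed participant's avatar.
            self.avatars.pop(pid, None)
        else:
            avatar = self.avatars.setdefault(pid, RemoteAvatar(pid))
            if kind == "pose":
                avatar.x, avatar.y = msg["x"], msg["y"]
                avatar.pan, avatar.tilt = msg["pan"], msg["tilt"]
            elif kind == "frame":
                avatar.frame = msg["data"]
        self.needs_render = True
```

A render loop could check `needs_render` each frame, redraw the scene when it is set, and then clear it.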
- While figure 3 and figures 4A-C are illustrated with two devices for simplicity, a skilled artisan would understand that the techniques described herein can be extended to any number of devices. Also, while figure 3 and figures 4A-C illustrate a single server 302, a skilled artisan would understand that the functionality of server 302 can be spread out among a plurality of computing devices.
- For example, the data transferred in figure 4A may come from one network address for server 302, while the data transferred in figures 4B-C can be transferred to/from another network address for server 302.
- Participants can set their webcam, microphone, speakers, and graphical settings before entering the virtual conference.
- For example, users may enter a virtual lobby where they are greeted by an avatar controlled by a real person. This person is able to view and modify the user’s webcam, microphone, speakers, and graphical settings.
- The attendant can also instruct the user on how to use the virtual environment, for example by teaching them about looking, moving around, and interacting. When they are ready, the user automatically leaves the virtual waiting room and joins the main virtual environment.
- Embodiments also adjust volume to provide a sense of position and space within the virtual conference. This is illustrated, for example, in figures 5-7, 8A-B and 9A-C, each of which is described below.
- Figure 5 is a flowchart illustrating a method 500 for adjusting relative left-right volume to provide a sense of position in a virtual environment during a videoconference.
- Volume is adjusted based on the distance between the avatars.
- An audio stream from a microphone of a device of another user is received.
- The volume of both the first and second audio streams is adjusted based on the distance between the second position and the first position. This is illustrated in figure 6.
- Figure 6 shows a chart 600 illustrating how volume rolls off as distance between the avatars increases.
- Chart 600 illustrates volume 602 on its y-axis and distance on its x-axis. As distance between the users increases, the volume stays constant until a reference distance is reached. At that point, volume begins to drop off. In this way, all other things being equal, a closer user will often sound louder than a farther user.
- How fast the sound drops off depends on a roll-off factor. This may be a coefficient built into the settings of the videoconferencing system or the client device. As illustrated by line 608 and line 610, a greater roll-off factor will cause the volume to decrease more rapidly than a lesser one.
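The shape of chart 600 matches the inverse distance model defined for `PannerNode` in the Web Audio API specification, which this sketch borrows as an assumption; the text does not name a specific formula. Gain stays at 1 up to the reference distance and then falls off, faster for a larger roll-off factor.

```python
def distance_gain(distance: float, reference_distance: float = 1.0,
                  rolloff_factor: float = 1.0) -> float:
    """Inverse distance model: unity gain up to the reference distance,
    then a falloff whose steepness is set by the roll-off factor."""
    d = max(distance, reference_distance)
    return reference_distance / (
        reference_distance + rolloff_factor * (d - reference_distance))
```

For example, with a reference distance of 1, an avatar 3 units away gets a gain of 1/3 with a roll-off factor of 1, but 0.2 with a roll-off factor of 2.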
- Relative left-right audio is adjusted based on the direction where the avatar is located. That is, the volume of the audio output on the user’s speaker (e.g., headset) will vary to provide a sense of where the speaking user’s avatar is located.
- The relative volume of the left and right audio streams is adjusted based on the direction of the position where the user generating the audio stream is located (e.g., the location of the speaking user’s avatar) relative to the position where the user receiving the audio is located (e.g., the location of the virtual camera). The positions may be on a horizontal plane within the three-dimensional virtual space.
- The relative volume of the left and right audio streams is adjusted to provide a sense of where the second position is in the three-dimensional virtual space relative to the first position.
- For example, in step 504, audio corresponding to an avatar to the left of the virtual camera would be adjusted such that the audio is output on the receiving user’s left ear at a higher volume than on the right ear.
- Similarly, audio corresponding to an avatar to the right of the virtual camera would be adjusted such that the audio is output on the receiving user’s right ear at a higher volume than on the left ear.
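One way to realize the step 504 adjustment is an equal-power pan law driven by the bearing of the speaking avatar relative to the virtual camera's pan angle. The pan law and the angle conventions (pan measured counterclockwise from +x on the horizontal plane) are assumptions of this sketch, not details from the text.

```python
import math
from typing import Tuple


def position_pan_gains(listener_xy: Tuple[float, float], listener_pan: float,
                       source_xy: Tuple[float, float]) -> Tuple[float, float]:
    """(left, right) gains from where the source sits relative to the camera.

    Assumes pan is measured counterclockwise from +x on the horizontal
    plane and uses an equal-power pan law (an illustrative choice).
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    if dx == 0 and dy == 0:
        return (math.sqrt(0.5), math.sqrt(0.5))  # co-located: centered
    bearing = math.atan2(dy, dx) - listener_pan
    # p = -1 for a source fully to the left, +1 fully to the right.
    p = -math.sin(bearing)
    theta = (p + 1) * math.pi / 4
    return (math.cos(theta), math.sin(theta))
```

The equal-power law keeps left² + right² constant, so an avatar sounds equally loud overall as it moves around the listener; only the balance changes.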
- Relative left-right audio is also adjusted based on the direction in which one avatar is oriented relative to the other.
- The relative volume of the left and right audio streams is adjusted based on the angle between the direction the virtual camera is facing and the direction the avatar is facing, such that the closer the angle is to perpendicular, the greater the difference in volume between the left and right audio streams.
- For example, when the avatar is directly facing the virtual camera, the relative left-right volume of the avatar’s corresponding audio stream may not be adjusted at all in step 506.
- When the avatar is turned toward one side, the relative left-right volume of the avatar’s corresponding audio stream may be adjusted so that left is louder than right.
- When the avatar is turned toward the other side, the relative left-right volume of the avatar’s corresponding audio stream may be adjusted so that right is louder than left.
- The calculation in step 506 may involve taking the cross product of the vector in which the virtual camera is facing and the vector in which the avatar is facing. These directions may be the directions they are facing on a horizontal plane.
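The step 506 calculation can be sketched with the 2D cross product of the two horizontal facing vectors, which equals the sine of the angle between them: zero when the avatar faces the camera head-on and largest in magnitude when the directions are perpendicular. Which sign makes the left channel louder, and the `strength` scaling, are illustrative assumptions.

```python
import math
from typing import Tuple


def facing_cross(camera_pan: float, avatar_pan: float) -> float:
    """2D cross product of the camera's and avatar's unit facing vectors.

    Equals sin(avatar_pan - camera_pan): zero when the directions are
    parallel or opposite, +/-1 when they are perpendicular.
    """
    cx, cy = math.cos(camera_pan), math.sin(camera_pan)
    ax, ay = math.cos(avatar_pan), math.sin(avatar_pan)
    return cx * ay - cy * ax


def orientation_adjust(left: float, right: float, camera_pan: float,
                       avatar_pan: float,
                       strength: float = 0.5) -> Tuple[float, float]:
    """Tilt an existing (left, right) gain pair by the facing cross product.

    Mapping a positive cross product to a louder left channel, and the
    strength value, are assumptions of this sketch.
    """
    c = facing_cross(camera_pan, avatar_pan)
    return (left * (1 + strength * c), right * (1 - strength * c))
```

Applied after the step 504 pan, this yields the per-participant left and right gains that step 506 contributes.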
- A check may be conducted to determine the audio output device the user is using. If the audio output device is not a set of headphones or another type of speaker that provides a stereo effect, the adjustments in steps 504 and 506 may not occur.
- Steps 502-506 are repeated for every audio stream received from every other participant. Based on the calculations in steps 502-506, a left and right audio gain is calculated for every other participant.
- In this way, the audio streams for each participant are adjusted to provide a sense of where the participant’s avatar is located in the three-dimensional virtual environment.
- Audio streams can also be adjusted to provide private or semi-private volume areas.
- The virtual environment enables users to have private conversations. It also enables users to mingle with one another and allows separate side conversations to occur, something that is not possible with conventional videoconferencing software. This is illustrated, for example, with respect to figure 7.
- Figure 7 is a flowchart illustrating a method 700 for adjusting relative volume to provide different volume areas in a virtual environment during a videoconference.
- The server may provide a specification of sound or volume areas to the client devices.
- The virtual environment may be partitioned into different volume areas.
- A device determines in which sound areas the respective avatars and the virtual camera are located.
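A minimal sketch of locating an avatar or the virtual camera within nested volume areas, assuming each area is an axis-aligned rectangle on the horizontal plane (the text does not specify area geometry). `VolumeArea` and `innermost_area` are hypothetical names; the test layout loosely mirrors the nesting of figure 9B.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class VolumeArea:
    """Hypothetical volume area: an axis-aligned rectangle on the
    horizontal plane, with child areas nested inside it."""
    name: str
    min_xy: Tuple[float, float]
    max_xy: Tuple[float, float]
    children: List["VolumeArea"] = field(default_factory=list)

    def contains(self, x: float, y: float) -> bool:
        return (self.min_xy[0] <= x <= self.max_xy[0]
                and self.min_xy[1] <= y <= self.max_xy[1])


def innermost_area(root: VolumeArea, x: float, y: float) -> Optional[VolumeArea]:
    """Descend from the root into whichever child contains (x, y)."""
    if not root.contains(x, y):
        return None
    current = root
    while True:
        inner = next((c for c in current.children if c.contains(x, y)), None)
        if inner is None:
            return current
        current = inner
```

Running this for the speaking avatar and for the virtual camera gives the two areas whose relationship in the hierarchy determines how the audio is processed.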
- Figures 8A-B are diagrams illustrating different volume areas in a virtual environment during a videoconference.
- Figure 8A illustrates a diagram 800 with a volume area 802 that allows for a semi-private or side conversation between a user controlling avatar 806 and the user controlling the virtual camera.
- The sound from the users controlling avatar 806 and the virtual camera may fall off as it exits volume area 802, but not entirely. That allows passersby to join the conversation if they would like.
- Interface 800 also includes buttons 804, 806, and 808, which will be described below.
- Figure 8B illustrates a diagram 800 with a volume area 804 that allows for a private conversation between a user controlling avatar 808 and the user controlling the virtual camera.
- Audio from the user controlling avatar 808 and the user controlling the virtual camera may be output only to those inside volume area 804. As no audio at all is played from those users to others in the conference, their audio streams may not even be transmitted to the other user devices.
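Because audio inside a private volume area is not played to anyone outside it, a server could skip forwarding those streams entirely. The following hypothetical helper, `audio_recipients`, sketches that filtering decision under the assumption that each participant is tagged with the private area they occupy (or `None`).

```python
from typing import Dict, List, Optional, Set


def audio_recipients(sender: str, area_of: Dict[str, Optional[str]],
                     private_areas: Set[str]) -> List[str]:
    """Participants whose devices should receive the sender's audio stream.

    area_of maps each participant to the private area they are in, or None.
    A sender inside a private area is heard only by others in that same
    area, so the stream is not forwarded to anyone else at all.
    """
    sender_area = area_of.get(sender)
    others = [p for p in area_of if p != sender]
    if sender_area in private_areas:
        return [p for p in others if area_of[p] == sender_area]
    return others
```

Filtering at the forwarding step saves bandwidth for everyone outside the private area, in addition to keeping the conversation private.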
- Volume spaces may be hierarchical as illustrated in figures 9A and 9B.
- Figure 9B is a diagram 930 showing a layout with different volume areas arranged in a hierarchy. Volume areas 934 and 935 are within volume area 933, and volume areas 933 and 932 are within volume area 931. These volume areas are represented in a hierarchical tree, as illustrated in diagram 900 of figure 9A.
- Node 901 represents volume area 931 and is the root of the tree.
- Nodes 902 and 903 are children of node 901 and represent volume areas 932 and 933.
- Nodes 904 and 906 are children of node 903 and represent volume areas 934 and 935.
- The hierarchy is traversed to determine which sound areas lie between the avatars. This is illustrated, for example, in figure 9C. Starting from the node corresponding to the volume area of the speaking voice (in this case node 904), a path to the node of the receiving user (in this case node 902) is determined. To determine the path, the links 952 between the nodes are determined. In this way, a subset of areas between an area including the avatar and an area including the virtual camera is determined.
- The audio stream from the speaking user is attenuated based on the respective wall transmission factors of the subset of areas.
- Each respective wall transmission factor specifies how much the audio stream is attenuated.
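The traversal and attenuation can be sketched using the area numbers of figure 9B as tree nodes. This sketch assumes the sound crosses the wall of every area on the path except the lowest common ancestor, which both parties are inside, and that the wall transmission factors along the path multiply; the factor values in the example are made up.

```python
import math
from typing import Dict, List


def _to_root(area: int, parent: Dict[int, int]) -> List[int]:
    chain = [area]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def areas_between(src: int, dst: int, parent: Dict[int, int]) -> List[int]:
    """Areas whose walls the sound crosses going from src to dst.

    Walks both areas up to their lowest common ancestor; the ancestor
    itself is excluded because the sound stays inside it (an assumption).
    """
    up_src, up_dst = _to_root(src, parent), _to_root(dst, parent)
    on_dst_path = set(up_dst)
    common = next(a for a in up_src if a in on_dst_path)
    return (up_src[:up_src.index(common)]
            + list(reversed(up_dst[:up_dst.index(common)])))


def wall_gain(src: int, dst: int, parent: Dict[int, int],
              transmission: Dict[int, float]) -> float:
    """Multiply the wall transmission factors of every area crossed."""
    return math.prod(transmission[a] for a in areas_between(src, dst, parent))
```

For a speaker in area 934 and a listener in area 932, the path crosses areas 934, 933, and 932, matching the node path from 904 to 902 described for figure 9C. Two users in the same area get a gain of 1.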
- When different areas have different roll-off factors, the distance-based calculation illustrated in chart 600 may be applied to individual areas based on their respective roll-off factors. In this way, different areas of the virtual environment project sound at different rates.
- The audio gains determined in the method described above with respect to figure 5 may be applied to the audio stream to determine left and right audio accordingly. In this way, wall transmission factors, roll-off factors, and left-right adjustments that provide a sense of direction for the sound may all be applied together to provide a comprehensive audio experience.
- A volume area may be a podium area. If the user is located in the podium area, some or all of the attenuation described with respect to figures 5 or 7 may not occur. For example, no attenuation may occur because of roll-off factors or wall transmission factors. In some embodiments, the relative left-right audio may still be adjusted to provide a sense of direction.
- The methods described with respect to figures 5 and 7 describe audio streams from a user who has a corresponding avatar.
- However, the same methods may be applied to sound sources other than avatars.
- For example, the virtual environment may have three-dimensional models of speakers. Sound may be emitted from the speakers in the same way as from the avatar models described above, either as part of a presentation or just to provide background music.
- Wall transmission factors may be used to isolate audio entirely.
- This can be used to create virtual offices.
- For example, each user may have in their physical (perhaps home) office a monitor that constantly displays the conference application, logged into the virtual office.
- There may be a feature that allows the user to indicate whether he’s in the office or should not be disturbed. If the do-not-disturb indicator is off, a coworker or manager may come around within the virtual space and knock or walk in as they would in a physical office. The visitor may be able to leave a note if the worker is not present in her office. When the worker returns, she would be able to read the note left by the visitor.
- The virtual office may have a whiteboard and/or an interface that displays messages for the user.
- The messages may be email and/or from a messaging application such as the SLACK application available from Slack Technologies, Inc. of San Francisco, CA.
- Users may be able to customize or personalize their virtual offices. For example, they may be able to put up models of posters or other wall ornaments. They may be able to change models or orientation of desks or decorative ornaments, such as plantings. They may be able to change lighting or view out the window.
- Interface 800 includes various buttons 804, 806, and 808.
- When a user presses button 804, the attenuation described above with respect to the methods in figures 5 and 7 may not occur, or may occur only in smaller amounts. In that situation, the user’s voice is output uniformly to other users, allowing the user to give a talk to all participants in the meeting.
- The user’s video may also be output on a presentation screen within the virtual environment, as will be described below.
- A speaker mode may be enabled. In that case, audio is output from sound sources within the virtual environment, such as to play background music.
- A screen share mode may be enabled, allowing the user to share the contents of a screen or window on their device with other users. The contents may be presented on a presentation model. This too will be described below.
- Figure 10 illustrates an interface 1000 with a three-dimensional model 1002 in a three-dimensional virtual environment.
- Interface 1000 may be displayed to a user who can navigate around the virtual environment.
- As illustrated in interface 1000, the virtual environment includes an avatar 1004 and a three-dimensional model 1002.
- Three-dimensional model 1002 is a 3D model of a product which is placed inside a virtual space. People are able to join this virtual space to observe the model, and can walk around it. The product may have localized sound to enhance the experience.
- A three-dimensional model may be rendered for display simultaneously with presenting the video stream. Users can navigate the virtual camera around the three-dimensional model of the product.
- Figure 11 illustrates an interface 1100 with a presentation screen share in a three-dimensional virtual environment used for videoconferencing. As described above with respect to figure 1, interface 1100 may be displayed to a user who can navigate around the virtual environment. As illustrated in interface 1100, the virtual environment includes an avatar 1104 and a presentation screen 1106.
- A presentation stream from a device of a participant in the conference is received.
- The presentation stream is texture mapped onto a three-dimensional model of presentation screen 1106.
- The presentation stream may be a video stream from a camera on the user’s device.
- The presentation stream may be a screen share from the user’s device, where a monitor or window is shared.
- The presentation video and audio streams could also come from an external source, for example a livestream of an event.
- The presentation stream (and audio stream) of the user is published to the server, tagged with the name of the screen the user wants to use. Other clients are notified that a new stream is available.
- The presenter may also be able to control the location and orientation of the audience members. For example, the presenter may have an option to rearrange all the other participants in the meeting so that they are positioned and oriented to face the presentation screen.
- An audio stream is captured synchronously with the presentation stream from a microphone of the device of the first participant.
- The audio stream from the user’s microphone may be heard by other users as coming from presentation screen 1106.
- Presentation screen 1106 may be a sound source as described above. Because the user’s audio stream is projected from presentation screen 1106, it may be suppressed from the user’s avatar. In this way, the audio stream is output to play synchronously with display of the presentation stream on screen 1106 within the three-dimensional virtual space.
- Figure 12 is a flowchart illustrating a method 1200 for apportioning available bandwidth based on relative position of avatars within the three-dimensional virtual environment.
- A distance is determined between a first user and a second user in a virtual conference space. The distance may be the distance between them on a horizontal plane in three-dimensional space.
- Received video streams are prioritized such that video streams from closer users are prioritized over those from farther ones.
- A priority value may be determined as illustrated in figure 13.
- Figure 13 shows a chart 1300 with priority 1306 on the y-axis and distance 1302 on the x-axis. As illustrated by line 1306, priority stays at a constant level until a reference distance 1304 is reached. After the reference distance is reached, the priority starts to fall off.
- The available bandwidth to the user device is apportioned between the various video streams. This may be done based on the priority values determined in step 1204. For example, the priorities may be proportionally adjusted so that together they sum to 1. For any videos for which insufficient bandwidth is available, the relative priority may be brought to zero. Then, the priorities are adjusted again for the remaining video streams. The bandwidth is allocated based on these relative priority values. In addition, bandwidth may be reserved for the audio streams. This is illustrated in figure 14.
- Figure 14 illustrates a chart 1400 with a y-axis representing bandwidth 1406 and an x-axis representing relative priority. Once a video is allocated the minimum bandwidth 1406 it needs to be effective, the bandwidth 1406 allocated to a video stream increases proportionally with its relative priority.
- The client may request the video from the server at the bandwidth, bitrate, frame rate, and resolution selected and allocated for that video. This may start a negotiation process between the client and the server to begin streaming the video at the designated bandwidth. In this way, the available video and audio bandwidth is divided fairly over all users, where users with twice as much priority will get twice as much bandwidth.
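The apportionment steps above can be sketched as follows. The priority curve beyond the reference distance is assumed to be an inverse falloff (figure 13 shows only that it falls off), audio is reserved at a fixed hypothetical per-stream rate, and starved streams are dropped one at a time, lowest priority first, before renormalizing; all of these are illustrative choices.

```python
from typing import Dict


def priority(distance: float, reference_distance: float = 5.0,
             falloff: float = 1.0) -> float:
    """Constant priority up to the reference distance, then an assumed
    inverse falloff (the exact curve of figure 13 is not specified)."""
    d = max(distance, reference_distance)
    return reference_distance / (reference_distance
                                 + falloff * (d - reference_distance))


def apportion(total_kbps: float, priorities: Dict[str, float],
              min_video_kbps: float, audio_kbps: float) -> Dict[str, float]:
    """Split video bandwidth in proportion to priority.

    Audio is reserved first (audio_kbps per stream). A stream whose share
    would fall below the minimum useful bitrate is dropped (its priority
    set to zero) and the remaining priorities are renormalized.
    """
    budget = total_kbps - audio_kbps * len(priorities)
    active = {k: p for k, p in priorities.items() if p > 0}
    while active:
        total_priority = sum(active.values())
        shares = {k: budget * p / total_priority for k, p in active.items()}
        starved = [k for k, share in shares.items() if share < min_video_kbps]
        if not starved:
            return {k: shares.get(k, 0.0) for k in priorities}
        # Drop the lowest-priority starved stream and try again.
        del active[min(starved, key=active.get)]
    return {k: 0.0 for k in priorities}
```

With 2000 kbps, 50 kbps of audio per stream, a 200 kbps video minimum, and priorities 1.0, 0.5, and 0.1, the lowest-priority video is dropped and the remaining 1850 kbps is split 2:1 between the other two. A stream allocated zero would then fall back to a still image, as described in step 1208.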
- At step 1208, it is determined whether the bandwidth available between the first and second user in the virtual conference space is such that display of video at that distance is ineffective. This determination may be made by either the client or the server. If it is made by the client, the client sends a message for the server to cease transmission of the video to the client. If display is ineffective, transmission of the video stream to the device of the second user is halted, and the device of the second user is notified to substitute a still image for the video stream. The still image may simply be the last (or one of the last) video frames received.
- A similar process may be executed for audio, reducing the quality given the size of the reserved portion for the audio.
- Alternatively, each audio stream is given a consistent bandwidth.
- To increase performance for all users and for the server, the video and audio stream quality can be reduced for users that are farther away and/or less important. This is not done when there is enough bandwidth budget available. The reduction is made in both bitrate and resolution. This improves video quality, as the available bandwidth for that user can be used more efficiently by the encoder.
- The video resolution is scaled down based on distance, with users that are twice as far away having half the resolution. In this way, resolution that is unnecessary, given limitations in screen resolution, is not downloaded. Thus, bandwidth is conserved.
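The distance-based resolution rule above amounts to scaling the requested frame height by the ratio of a reference distance to the actual distance. The reference distance and the minimum height floor in this sketch are hypothetical parameters.

```python
def scaled_height(full_height: int, distance: float,
                  reference_distance: float = 5.0, min_height: int = 90) -> int:
    """Requested video height: full resolution up to the reference distance,
    then inversely proportional to distance (twice as far, half the height)."""
    scale = reference_distance / max(distance, reference_distance)
    return max(min_height, round(full_height * scale))
```

For a 720-line source with a reference distance of 5, a participant 10 units away would be requested at 360 lines and one 20 units away at 180 lines.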
- Figure 15 is a diagram of a system 1500 illustrating components of devices used to provide videoconferencing within a virtual environment.
- System 1500 can operate according to the methods described above.
- Device 306A is a user computing device.
- Device 306A could be a desktop or laptop computer, smartphone, tablet, or wearable (e.g., watch or head mounted device).
- Device 306A includes a microphone 1502, a camera 1504, a stereo speaker 1506, and an input device 1512.
- Device 306A also includes a processor and persistent, non-transitory, and volatile memory.
- The processors can include one or more central processing units, graphics processing units, or any combination thereof.
- Microphone 1502 converts sound into an electrical signal. Microphone 1502 is positioned to capture speech of a user of device 306A.
- Microphone 1502 could be a condenser microphone, electret microphone, moving-coil microphone, ribbon microphone, carbon microphone, piezo microphone, fiber-optic microphone, laser microphone, water microphone, or MEMS microphone.
- Camera 1504 captures image data by capturing light, generally through one or more lenses. Camera 1504 is positioned to capture photographic images of a user of device 306A. Camera 1504 includes an image sensor (not shown).
- The image sensor may, for example, be a charge-coupled device (CCD) sensor or a complementary metal oxide semiconductor (CMOS) sensor.
- The image sensor may include one or more photodetectors that detect light and convert it to electrical signals. These electrical signals, captured together in a similar timeframe, comprise a still photographic image. A sequence of still photographic images captured at regular intervals together comprises a video. In this way, camera 1504 captures images and videos.
- Stereo speaker 1506 is a device which converts an electrical audio signal into a corresponding left-right sound. Stereo speaker 1506 outputs the left audio stream and the right audio stream generated by an audio processor 1520 (below) to be played to device 306A’s user in stereo. Stereo speaker 1506 includes both ambient speakers and headphones that are designed to play sound directly into a user’s left and right ears.
- Example speakers include moving-iron loudspeakers, piezoelectric speakers, magnetostatic loudspeakers, electrostatic loudspeakers, ribbon and planar magnetic loudspeakers, bending wave loudspeakers, flat panel loudspeakers, Heil air motion transducers, transparent ionic conduction speakers, plasma arc speakers, thermoacoustic speakers, rotary woofers, and moving-coil, electrostatic, electret, planar magnetic, and balanced armature drivers.
- Network interface 1508 is a software or hardware interface between two pieces of equipment or protocol layers in a computer network.
- Network interface 1508 receives from server 302 a video stream for each of the other participants in the meeting. Each video stream is captured from a camera on the device of another participant in the videoconference.
- Network interface 1508 also receives from server 302 data specifying a three-dimensional virtual space and any models therein. For each of the other participants, network interface 1508 receives a position and direction in the three-dimensional virtual space, as input by that participant.
- Network interface 1508 also transmits data to server 302: the position of the virtual camera that renderer 1518 uses for device 306A's user, and the video and audio streams from camera 1504 and microphone 1502.
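The position and direction exchanged through network interface 1508 could travel as a small tagged message. The Python sketch below shows one possible JSON encoding. The message layout, field names, and units are hypothetical; the source does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PoseUpdate:
    """Position and direction of one participant's virtual camera.

    Hypothetical layout: (x, y) is the horizontal plane, z is height,
    and yaw/pitch are angles in radians.
    """
    user_id: str
    x: float
    y: float
    z: float
    yaw: float
    pitch: float


def encode_pose(update: PoseUpdate) -> str:
    # Tag the message so the receiver can tell pose updates from other traffic.
    return json.dumps({"type": "pose", **asdict(update)})


def decode_pose(message: str) -> PoseUpdate:
    data = json.loads(message)
    if data.pop("type", None) != "pose":
        raise ValueError("not a pose message")
    return PoseUpdate(**data)


message = encode_pose(PoseUpdate("alice", 1.0, 2.0, 1.6, 0.0, 0.0))
print(decode_pose(message) == PoseUpdate("alice", 1.0, 2.0, 1.6, 0.0, 0.0))  # True
```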
- Display 1510 is an output device for presentation of electronic information in visual or tactile form (the latter used for example in tactile electronic displays for blind people).
- Display 1510 could be a television set, a computer monitor, a head-mounted display, a heads-up display, the output of an augmented reality or virtual reality headset, a broadcast reference monitor, a medical monitor, a mobile display (for mobile devices), or a smartphone display.
- Display 1510 may include an electroluminescent display (ELD), liquid crystal display (LCD), light-emitting diode (LED) backlit LCD, thin-film transistor (TFT) LCD, light-emitting diode (LED) display, OLED display, AMOLED display, plasma display (PDP), or quantum dot display (QLED).
- Input device 1512 is a piece of equipment used to provide data and control signals to an information processing system such as a computer or information appliance. Input device 1512 allows a user to input a new desired position of the virtual camera used by renderer 1518, thereby enabling navigation in the three-dimensional environment. Examples of input devices include keyboards, mice, scanners, joysticks, and touchscreens.
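As an illustration of how keyboard input might move the virtual camera, here is a minimal Python sketch. The key bindings, step size, turn angle, and coordinate convention are all hypothetical: (x, y) is the horizontal plane and yaw is measured counterclockwise from +x.

```python
import math

STEP = 0.5               # hypothetical distance moved per key press
TURN = math.radians(15)  # hypothetical rotation per key press


def navigate(x: float, y: float, yaw: float, key: str) -> tuple[float, float, float]:
    """Return the new (x, y, yaw) of the virtual camera after one key press.

    The forward vector is (cos yaw, sin yaw); left is (-sin yaw, cos yaw).
    """
    fx, fy = math.cos(yaw), math.sin(yaw)
    if key == "w":            # move forward
        return x + STEP * fx, y + STEP * fy, yaw
    if key == "s":            # move backward
        return x - STEP * fx, y - STEP * fy, yaw
    if key == "a":            # strafe left
        return x - STEP * fy, y + STEP * fx, yaw
    if key == "d":            # strafe right
        return x + STEP * fy, y - STEP * fx, yaw
    if key == "ArrowLeft":    # turn left (counterclockwise)
        return x, y, yaw + TURN
    if key == "ArrowRight":   # turn right (clockwise)
        return x, y, yaw - TURN
    return x, y, yaw          # unmapped keys leave the camera unchanged


print(navigate(0.0, 0.0, 0.0, "w"))  # (0.5, 0.0, 0.0)
```

In a system like the one described, the updated pose would then be sent to the server so that other participants see the avatar move.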
- Web browser 308A and web application 310A were described above with respect to Figure 3.
- Web application 310A includes screen capturer 1514, texture mapper 1516, renderer 1518, and audio processor 1520.
- Screen capturer 1514 captures a presentation stream, in particular a screen share.
- Screen capturer 1514 may interact with an API made available by web browser 308A. By calling a function available from the API, screen capturer 1514 may cause web browser 308A to ask the user which window or screen the user would like to share. Based on the answer to that query, web browser 308A may return a video stream corresponding to the screen share to screen capturer 1514, which passes it on to network interface 1508 for transmission to server 302 and ultimately to other participants' devices.
- Texture mapper 1516 texture maps the video stream onto a three-dimensional model corresponding to an avatar. Texture mapper 1516 may texture map respective frames from the video to the avatar. In addition, texture mapper 1516 may texture map a presentation stream to a three-dimensional model of a presentation screen.
- Renderer 1518 renders the three-dimensional virtual space for output to display 1510, from the perspective of the virtual camera of device 306A's user. The rendered scene includes the texture-mapped three-dimensional models of the other participants' avatars, each located at its received position and oriented in its received direction. Renderer 1518 also renders any other three-dimensional models, including, for example, the presentation screen.
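The renderer's core geometric step is placing each avatar at its received position as seen from the user's virtual camera. The Python sketch below shows this as a view transform followed by a pinhole projection. It is a simplified illustration, not the patent's implementation: a browser renderer would normally delegate this work to WebGL or a scene-graph library. The yaw-only camera and the coordinate convention are assumptions.

```python
import math
from typing import Optional


def to_camera_space(point, camera, cam_yaw):
    """Return (right, up, forward) coordinates of a world point.

    Assumed convention: (x, y) is the horizontal plane, z is height, and yaw
    is measured counterclockwise from +x. Camera pitch is ignored.
    """
    dx, dy, dz = (p - c for p, c in zip(point, camera))
    fx, fy = math.cos(cam_yaw), math.sin(cam_yaw)
    forward = dx * fx + dy * fy
    right = dx * fy - dy * fx  # the right-hand vector is (sin yaw, -cos yaw)
    return right, dz, forward


def project(right, up, forward, focal_length=1.0) -> Optional[tuple[float, float]]:
    """Pinhole projection to normalized screen coordinates, or None if behind."""
    if forward <= 0:
        return None  # behind the camera: the avatar is not drawn
    return focal_length * right / forward, focal_length * up / forward


# An avatar 2 units ahead, 1 unit to the right, and 0.5 units up.
print(project(*to_camera_space((2.0, -1.0, 0.5), (0.0, 0.0, 0.0), 0.0)))  # (0.5, 0.25)
```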
- Audio processor 1520 adjusts the volume of the received audio stream to determine a left audio stream and a right audio stream. These streams give a sense of where the second position (that of the speaking participant's avatar) is in the three-dimensional virtual space relative to the first position (that of the user's virtual camera). In one embodiment, audio processor 1520 adjusts the volume based on the distance between the second position and the first position. In another embodiment, it adjusts the volume based on the direction of the second position relative to the first position. In yet another embodiment, it adjusts the volume based on the direction of the second position relative to the first position on a horizontal plane within the three-dimensional virtual space.
- Audio processor 1520 adjusts the volume based on the direction the virtual camera is facing in the three-dimensional virtual space. As a result, the left audio stream tends to be louder when the avatar is to the left of the virtual camera, and the right audio stream tends to be louder when the avatar is to the right of the virtual camera.
- Audio processor 1520 adjusts the volume based on the angle between the direction the virtual camera is facing and the direction the avatar is facing. The closer this angle is to perpendicular, the greater the difference in volume between the left and right audio streams.
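The distance and left/right adjustments above can be sketched in Python as an inverse-distance gain combined with a constant-power pan on the horizontal plane. The rolloff formula, the constant-power pan law, and the function names are assumptions. The avatar-facing-angle refinement is not modeled here.

```python
import math


def stereo_gains(listener, listener_yaw, speaker, rolloff=1.0):
    """Return (left_gain, right_gain) for a speaking avatar's audio stream.

    `listener` and `speaker` are (x, y) positions on the horizontal plane.
    `listener_yaw` is the direction the virtual camera faces, measured
    counterclockwise from +x.
    """
    dx, dy = speaker[0] - listener[0], speaker[1] - listener[1]
    distance = math.hypot(dx, dy)
    # Hypothetical inverse-distance rolloff: farther avatars are quieter.
    gain = 1.0 / (1.0 + rolloff * distance)
    # Signed share of the offset that points to the listener's right, in [-1, 1].
    if distance > 0:
        right = (dx * math.sin(listener_yaw) - dy * math.cos(listener_yaw)) / distance
    else:
        right = 0.0
    # Constant-power pan: angle 0 is fully left, pi/2 is fully right.
    theta = (right + 1.0) * math.pi / 4
    return gain * math.cos(theta), gain * math.sin(theta)


def apply_gains(samples, gains):
    """Split a mono stream into left and right streams using the given gains."""
    left_gain, right_gain = gains
    return [s * left_gain for s in samples], [s * right_gain for s in samples]
```

A constant-power pan keeps the combined loudness roughly steady as an avatar moves across the listener's field of view, which a linear crossfade would not.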
- Audio processor 1520 can also adjust an audio stream’s volume based on the area where the speaker is located relative to an area where the virtual camera is located.
- The three-dimensional virtual space is segmented into a plurality of areas, which may be hierarchical. When the speaker and the virtual camera are located in different areas, a wall transmission factor may be applied to attenuate the volume of the speaker's audio stream.
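One way to apply wall transmission factors across hierarchical areas is sketched below in Python. It assumes the hierarchy is a tree given as a child-to-parent map, that each area has a hypothetical transmission factor for its walls, and that the factors of every wall crossed are multiplied together. The source states only that a factor is applied when the areas differ; the multiplicative rule is an assumption.

```python
def _chain(area, parent):
    """Return [area, its parent, ..., root]; `parent` maps child -> parent (a tree)."""
    chain = []
    while area is not None:
        chain.append(area)
        area = parent.get(area)
    return chain


def wall_factor(speaker_area, listener_area, parent, transmission):
    """Multiply the transmission factors of every area wall crossed between
    the speaker's area and the area containing the listener's virtual camera."""
    speaker_chain = _chain(speaker_area, parent)
    listener_chain = _chain(listener_area, parent)
    # Areas in both chains enclose both parties, so their walls are not crossed.
    shared = set(speaker_chain) & set(listener_chain)
    factor = 1.0
    for area in speaker_chain + listener_chain:
        if area not in shared:
            factor *= transmission.get(area, 1.0)
    return factor


# Hypothetical layout: two rooms on one floor, with a booth inside room_a.
parent = {"room_a": "floor", "room_b": "floor", "booth": "room_a"}
transmission = {"room_a": 0.5, "room_b": 0.25, "booth": 0.1}
print(wall_factor("room_a", "room_b", parent, transmission))  # 0.125
```

The returned factor would scale the volume of the speaker's audio stream, for example by multiplying both the left and right gains.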
- Server 302 includes an attendance notifier 1522, a stream adjuster 1524, and a stream forwarder 1526.
- Attendance notifier 1522 notifies conference participants when participants join and leave the meeting. When a new participant joins, attendance notifier 1522 sends a message to the devices of the other participants in the conference indicating that a new participant has joined.
- Attendance notifier 1522 signals stream forwarder 1526 to start forwarding video, audio, and position/direction information to the other participants.
- Stream adjuster 1524 receives a first video stream captured from a camera on a device of a first user. It determines the bandwidth available to transmit data for the virtual conference to a second user, and it determines the distance between the first user and the second user in the virtual conference space. It then apportions the available bandwidth between the first video stream and a second video stream from another user, based on their relative distances. In this way, stream adjuster 1524 prioritizes video streams of nearer users over those of farther ones. Additionally or alternatively, stream adjuster 1524 may be located on device 306A, perhaps as part of web application 310A.
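A minimal Python sketch of distance-based bandwidth apportioning. It assumes each stream's share is inversely proportional to the sender's distance from the receiving user, with distances below a hypothetical minimum clamped. The source says only that nearer users are prioritized; the inverse-distance weighting and the names are illustrative.

```python
def apportion_bandwidth(available_kbps: float, distances: dict[str, float],
                        min_distance: float = 1.0) -> dict[str, float]:
    """Split the available bandwidth among senders' video streams.

    `distances` maps each sender to its distance from the receiving user in
    the virtual space. Nearer senders receive a larger share.
    """
    # Clamp tiny distances so one adjacent avatar cannot take all the bandwidth.
    weights = {uid: 1.0 / max(d, min_distance) for uid, d in distances.items()}
    total = sum(weights.values())
    return {uid: available_kbps * w / total for uid, w in weights.items()}


print(apportion_bandwidth(1500.0, {"alice": 1.0, "bob": 2.0}))
# {'alice': 1000.0, 'bob': 500.0}
```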
- Stream forwarder 1526 broadcasts the position/direction information, video, audio, and screen share streams it receives (with any adjustments made by stream adjuster 1524). Stream forwarder 1526 may send information to device 306A in response to a request from conference application 310A. Conference application 310A may send that request in response to the notification from attendance notifier 1522.
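The join notification and fan-out described above can be sketched in Python as an in-memory hub. This is a hypothetical stand-in for attendance notifier 1522 and stream forwarder 1526: each participant's outbox queue represents sends to that participant's device, and the class and method names are illustrative only.

```python
from collections import deque


class ConferenceHub:
    """Hypothetical in-memory model of attendance notification and forwarding."""

    def __init__(self) -> None:
        self.outboxes: dict[str, deque] = {}

    def join(self, user_id: str) -> None:
        # Tell everyone already present that a new participant has joined.
        for box in self.outboxes.values():
            box.append(("joined", user_id))
        self.outboxes[user_id] = deque()

    def leave(self, user_id: str) -> None:
        self.outboxes.pop(user_id, None)
        for box in self.outboxes.values():
            box.append(("left", user_id))

    def forward(self, sender: str, payload: object) -> None:
        # Broadcast pose, audio, video, or screen share data to all other participants.
        for user_id, box in self.outboxes.items():
            if user_id != sender:
                box.append(("data", sender, payload))
```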
- Network interface 1528 is a software or hardware interface between two pieces of equipment or protocol layers in a computer network.
- Network interface 1528 transmits the model information to devices of the various participants.
- Network interface 1528 receives video, audio, and screen share streams from the various participants.
- Screen capturer 1514, texture mapper 1516, renderer 1518, audio processor 1520, attendance notifier 1522, stream adjuster 1524, and stream forwarder 1526 can each be implemented in hardware, software, firmware, or any combination thereof.
- Identifiers such as “(a),” “(b),” “(i),” “(ii),” etc., are sometimes used for different elements or steps. These identifiers are used for clarity and do not necessarily designate an order for the elements or steps.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Computer Graphics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Information Transfer Between Computers (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Processing Or Creating Images (AREA)
- Geometry (AREA)
- User Interface Of Digital Computer (AREA)
- Image Generation (AREA)
- Stereophonic System (AREA)
Priority Applications (12)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021366657A AU2021366657B2 (en) | 2020-10-20 | 2021-10-20 | A web-based videoconference virtual environment with navigable avatars, and applications thereof |
JP2022562717A JP7318139B1 (en) | 2020-10-20 | 2021-10-20 | Web-based videoconferencing virtual environment with steerable avatars and its application |
KR1020237026431A KR20230119261A (en) | 2020-10-20 | 2021-10-20 | A web-based videoconference virtual environment with navigable avatars, and applications thereof |
CN202180037563.6A CN116018803A (en) | 2020-10-20 | 2021-10-20 | Web-based video conference virtual environment with navigable avatar and application thereof |
EP21807405.2A EP4122192A1 (en) | 2020-10-20 | 2021-10-20 | A web-based videoconference virtual environment with navigable avatars, and applications thereof |
BR112022024836A BR112022024836A2 (en) | 2020-10-20 | 2021-10-20 | WEB-BASED VIRTUAL VIDEOCONFERENCE ENVIRONMENT WITH NAVIGABLE AVATARS AND APPLICATIONS THEREOF |
IL308489A IL308489A (en) | 2020-10-20 | 2021-10-20 | A web-based videoconference virtual environment with navigable avatars, and applications thereof |
IL298268A IL298268B2 (en) | 2020-10-20 | 2021-10-20 | A web-based videoconference virtual environment with navigable avatars, and applications thereof |
CA3181367A CA3181367C (en) | 2020-10-20 | 2021-10-20 | A web-based videoconference virtual environment with navigable avatars, and applications thereof |
KR1020227039238A KR102580110B1 (en) | 2020-10-20 | 2021-10-20 | Web-based video conferencing virtual environment with navigable avatars and its applications |
JP2023117467A JP2023139110A (en) | 2020-10-20 | 2023-07-19 | Web-based video conference virtual environment with navigable avatar, and application thereof |
AU2023229565A AU2023229565B2 (en) | 2020-10-20 | 2023-09-14 | A web-based videoconference virtual environment with navigable avatars, and applications thereof |
Applications Claiming Priority (14)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/075,338 US10979672B1 (en) | 2020-10-20 | 2020-10-20 | Web-based videoconference virtual environment with navigable avatars, and applications thereof |
US17/075,390 US10952006B1 (en) | 2020-10-20 | 2020-10-20 | Adjusting relative left-right sound to provide sense of an avatar's position in a virtual space, and applications thereof |
US17/075,408 US11070768B1 (en) | 2020-10-20 | 2020-10-20 | Volume areas in a three-dimensional virtual conference space, and applications thereof |
US17/075,390 | 2020-10-20 | ||
US17/075,454 | 2020-10-20 | ||
US17/075,428 | 2020-10-20 | ||
US17/075,408 | 2020-10-20 | ||
US17/075,454 US11457178B2 (en) | 2020-10-20 | 2020-10-20 | Three-dimensional modeling inside a virtual video conferencing environment with a navigable avatar, and applications thereof |
US17/075,362 | 2020-10-20 | ||
US17/075,338 | 2020-10-20 | ||
US17/075,362 US11095857B1 (en) | 2020-10-20 | 2020-10-20 | Presenter mode in a three-dimensional virtual conference space, and applications thereof |
US17/075,428 US11076128B1 (en) | 2020-10-20 | 2020-10-20 | Determining video stream quality based on relative position in a virtual space, and applications thereof |
US17/198,323 US11290688B1 (en) | 2020-10-20 | 2021-03-11 | Web-based videoconference virtual environment with navigable avatars, and applications thereof |
US17/198,323 | 2021-03-11 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022087147A1 (en) | 2022-04-28 |
Family
ID=81289363
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2021/055875 WO2022087147A1 (en) | 2020-10-20 | 2021-10-20 | A web-based videoconference virtual environment with navigable avatars, and applications thereof |
Country Status (9)
Country | Link |
---|---|
EP (1) | EP4122192A1 (en) |
JP (2) | JP7318139B1 (en) |
KR (2) | KR102580110B1 (en) |
CN (1) | CN116018803A (en) |
AU (2) | AU2021366657B2 (en) |
BR (1) | BR112022024836A2 (en) |
CA (1) | CA3181367C (en) |
IL (2) | IL298268B2 (en) |
WO (1) | WO2022087147A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024020452A1 (en) * | 2022-07-20 | 2024-01-25 | Katmai Tech Inc. | Multi-screen presentation in a virtual videoconferencing environment |
WO2024020562A1 (en) * | 2022-07-21 | 2024-01-25 | Katmai Tech Inc. | Resituating virtual cameras and avatars in a virtual environment |
US11928774B2 (en) | 2022-07-20 | 2024-03-12 | Katmai Tech Inc. | Multi-screen presentation in a virtual videoconferencing environment |
WO2024053845A1 (en) * | 2022-09-08 | 2024-03-14 | 삼성전자주식회사 | Electronic device and method for providing content sharing based on object |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110225039A1 (en) * | 2010-03-10 | 2011-09-15 | Oddmobb, Inc. | Virtual social venue feeding multiple video streams |
US20130321564A1 (en) * | 2012-05-31 | 2013-12-05 | Microsoft Corporation | Perspective-correct communication window with motion parallax |
US20200099891A1 (en) * | 2017-06-09 | 2020-03-26 | Pcms Holdings, Inc. | Spatially faithful telepresence supporting varying geometries and moving users |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2006217569A1 (en) * | 2005-02-23 | 2006-08-31 | Craig Summers | Automatic scene modeling for the 3D camera and 3D video |
US7576766B2 (en) * | 2005-06-30 | 2009-08-18 | Microsoft Corporation | Normalized images for cameras |
WO2013119802A1 (en) * | 2012-02-11 | 2013-08-15 | Social Communications Company | Routing virtual area based communications |
US8994780B2 (en) | 2012-10-04 | 2015-03-31 | Mcci Corporation | Video conferencing enhanced with 3-D perspective control |
US9524588B2 (en) | 2014-01-24 | 2016-12-20 | Avaya Inc. | Enhanced communication between remote participants using augmented and virtual reality |
JP7415940B2 (en) | 2018-11-09 | 2024-01-17 | ソニーグループ株式会社 | Information processing device and method, and program |
JP6684952B1 (en) | 2019-06-28 | 2020-04-22 | 株式会社ドワンゴ | Content distribution device, content distribution program, content distribution method, content display device, content display program, and content display method |
2021
- 2021-10-20 BR BR112022024836A patent/BR112022024836A2/en unknown
- 2021-10-20 KR KR1020227039238A patent/KR102580110B1/en active IP Right Grant
- 2021-10-20 IL IL298268A patent/IL298268B2/en unknown
- 2021-10-20 AU AU2021366657A patent/AU2021366657B2/en active Active
- 2021-10-20 WO PCT/US2021/055875 patent/WO2022087147A1/en unknown
- 2021-10-20 IL IL308489A patent/IL308489A/en unknown
- 2021-10-20 EP EP21807405.2A patent/EP4122192A1/en active Pending
- 2021-10-20 CA CA3181367A patent/CA3181367C/en active Active
- 2021-10-20 CN CN202180037563.6A patent/CN116018803A/en active Pending
- 2021-10-20 JP JP2022562717A patent/JP7318139B1/en active Active
- 2021-10-20 KR KR1020237026431A patent/KR20230119261A/en active Application Filing
2023
- 2023-07-19 JP JP2023117467A patent/JP2023139110A/en active Pending
- 2023-09-14 AU AU2023229565A patent/AU2023229565B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
KR20230119261A (en) | 2023-08-16 |
IL298268B1 (en) | 2024-01-01 |
KR102580110B1 (en) | 2023-09-18 |
AU2021366657B2 (en) | 2023-06-15 |
CA3181367A1 (en) | 2022-04-28 |
BR112022024836A2 (en) | 2023-05-09 |
KR20220160699A (en) | 2022-12-06 |
AU2023229565B2 (en) | 2024-08-15 |
CA3181367C (en) | 2023-11-21 |
JP7318139B1 (en) | 2023-07-31 |
JP2023139110A (en) | 2023-10-03 |
CN116018803A (en) | 2023-04-25 |
IL308489A (en) | 2024-01-01 |
JP2023534092A (en) | 2023-08-08 |
AU2023229565A1 (en) | 2023-10-05 |
IL298268B2 (en) | 2024-05-01 |
EP4122192A1 (en) | 2023-01-25 |
IL298268A (en) | 2023-01-01 |
AU2021366657A1 (en) | 2022-12-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11290688B1 (en) | Web-based videoconference virtual environment with navigable avatars, and applications thereof | |
US10952006B1 (en) | Adjusting relative left-right sound to provide sense of an avatar's position in a virtual space, and applications thereof | |
US11076128B1 (en) | Determining video stream quality based on relative position in a virtual space, and applications thereof | |
US11070768B1 (en) | Volume areas in a three-dimensional virtual conference space, and applications thereof | |
US11095857B1 (en) | Presenter mode in a three-dimensional virtual conference space, and applications thereof | |
US12081908B2 (en) | Three-dimensional modeling inside a virtual video conferencing environment with a navigable avatar, and applications thereof | |
US11184362B1 (en) | Securing private audio in a virtual conference, and applications thereof | |
AU2021366657B2 (en) | A web-based videoconference virtual environment with navigable avatars, and applications thereof | |
US11743430B2 (en) | Providing awareness of who can hear audio in a virtual conference, and applications thereof | |
US11700354B1 (en) | Resituating avatars in a virtual environment | |
US12028651B1 (en) | Integrating two-dimensional video conference platforms into a three-dimensional virtual environment | |
US20240087236A1 (en) | Navigating a virtual camera to a video avatar in a three-dimensional virtual environment, and applications thereof | |
US20240087213A1 (en) | Selecting a point to navigate video avatars in a three-dimensional environment | |
US11928774B2 (en) | Multi-screen presentation in a virtual videoconferencing environment | |
US20240031531A1 (en) | Two-dimensional view of a presentation in a three-dimensional videoconferencing environment | |
US11741664B1 (en) | Resituating virtual cameras and avatars in a virtual environment | |
US20240007593A1 (en) | Session transfer in a virtual videoconferencing environment | |
WO2024020452A1 (en) | Multi-screen presentation in a virtual videoconferencing environment | |
WO2022235916A1 (en) | Securing private audio in a virtual conference, and applications thereof | |
WO2024020562A1 (en) | Resituating virtual cameras and avatars in a virtual environment |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21807405; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2022562717; Country of ref document: JP; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 2021807405; Country of ref document: EP; Effective date: 20221017 |
ENP | Entry into the national phase | Ref document number: 3181367; Country of ref document: CA |
ENP | Entry into the national phase | Ref document number: 20227039238; Country of ref document: KR; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 2021366657; Country of ref document: AU; Date of ref document: 20211020; Kind code of ref document: A |
REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112022024836; Country of ref document: BR |
ENP | Entry into the national phase | Ref document number: 112022024836; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20221205 |
NENP | Non-entry into the national phase | Ref country code: DE |