KR20220160699A - Web-based video conferencing virtual environment with navigable avatars and its applications

Info

Publication number: KR20220160699A
Application number: KR1020227039238A
Authority: KR
Inventors: Gerard Cornelis Krol; Erik Stuart Braund
Original assignee: Katmai Tech Inc.
Priority date: 2020-10-20
Filing date: 2021-10-20
Publication date: 2022-12-06
Also published as: IL298268B2; CN116018803A; AU2021366657B2; CA3181367C; AU2023229565B2; WO2022087147A1; IL308489A; KR20230119261A; CA3181367A1; AU2023229565A1; BR112022024836A2; IL298268B1; IL298268A; EP4122192A1; JP2023534092A; JP7318139B1; JP2023139110A; AU2021366657A1; KR102580110B1

Abstract

A web-based video conferencing system that enables video avatars to navigate within a virtual environment is disclosed herein. The system has a presentation mode that allows a presentation stream to be texture mapped onto a presenter screen located within the virtual environment. Relative left-right sound is adjusted to provide a sense of an avatar's position in the virtual space. Sound is further adjusted based on the area in which the avatar is located and the area in which the virtual camera is located. Video stream quality is adjusted based on relative position in the virtual space. Three-dimensional modeling is available inside the virtual video conferencing environment.

Description

Web-based video conferencing virtual environment with navigable avatars and its applications

[Cross-Reference to Related Applications]

This application claims priority to U.S. Utility Patent Application No. 17/075,338, filed October 20, 2020, now issued as U.S. Patent No. 10,979,672 on April 13, 2021; U.S. Utility Patent Application No. 17/198,323, filed March 11, 2021; U.S. Utility Patent Application No. 17/075,362, filed October 20, 2020, now issued as U.S. Patent No. 11,095,857 on August 17, 2021; U.S. Utility Patent Application No. 17/075,390, filed October 20, 2020, now issued as U.S. Patent No. 10,952,006 on March 16, 2021; U.S. Utility Patent Application No. 17/075,408, filed October 20, 2020, now issued as U.S. Patent No. 11,070,768 on July 20, 2021; U.S. Utility Patent Application No. 17/075,428, filed October 20, 2020, now issued as U.S. Patent No. 11,076,128 on July 27, 2021; and U.S. Utility Patent Application No. 17/075,454, filed October 20, 2020. The contents of each of these applications are incorporated herein by reference in their entirety.

[Technical Field]

This field generally relates to video conferencing.

[Related Art]

Video conferencing involves the reception and transmission of audio-video signals by users at different locations for communication between people in real time. Video conferencing is widely available on many computing devices from a variety of different services, including the ZOOM service available from Zoom Communications Inc. of San Jose, California. Some video conferencing software, such as the FaceTime application available from Apple Inc. of Cupertino, California, comes standard with mobile devices.

In general, these applications operate by displaying video and outputting audio of other conference participants. When there are multiple participants, the screen may be divided into a number of rectangular frames, each displaying video of a participant. Sometimes these services operate by having a larger frame that presents video of the person speaking. As different individuals speak, that frame switches between speakers. The application captures video from a camera integrated with the user's device and audio from a microphone integrated with the user's device. The application then transmits that audio and video to other applications running on other users' devices.

Many of these video conferencing applications have a screen share functionality. When a user decides to share their screen (or a portion of their screen), a stream containing the contents of the screen is transmitted to the other users' devices. In some cases, other users can even control what is on the user's screen. In this way, users can collaborate on a project or give a presentation to the other meeting participants.

Recently, video conferencing technology has gained importance. Many workplaces, trade shows, meetings, conferences, schools, and places of worship have closed or have encouraged people not to attend for fear of spreading disease, in particular COVID-19. Virtual conferences using video conferencing technology are increasingly replacing physical conferences. In addition, this technology offers advantages over meeting physically because it avoids travel and commuting.

Often, however, use of this video conferencing technology causes a loss of a sense of place. There is an experiential aspect to meeting in person physically, being in the same place, that is lost when conferences are conducted virtually. There is a social aspect to being able to posture yourself and look at your peers. This feeling of experience is important in creating relationships and social connections. Yet this feeling is lacking when it comes to conventional video conferences.

Moreover, when a conference starts to gather several participants, additional problems arise with these video conferencing technologies. In physical meetings, people can have side conversations. You can project your voice so that only people close to you can hear what you are saying. In some cases, you can even have private conversations in the context of a larger meeting. In virtual conferences, however, when multiple people are speaking at the same time, the software mixes the audio streams substantially equally, causing the participants to speak over one another. Thus, when multiple people are involved in a virtual conference, private conversations are impossible, and the dialogue tends to take the form of speeches from one to many. Here, too, virtual conferences lose an opportunity for participants to create social connections, to communicate more effectively, and to network.

Moreover, due to limitations in network bandwidth and computing hardware, the performance of many video conferencing systems begins to slow down when many streams are placed in a conference. Many computing devices are equipped to handle video streams from a few participants but are ill-equipped to handle video streams from a dozen or more participants. With many schools operating entirely virtually, a class of 25 can severely slow down school-issued computing devices.

Massively multiplayer online games (MMOGs or MMOs) can generally handle considerably more than 25 participants. These games often have hundreds or thousands of players on a single server. MMOs often allow players to navigate avatars around a virtual world. Sometimes these MMOs allow users to speak with one another or send messages to one another. Examples include the ROBLOX game available from Roblox Corporation of San Mateo, California, and the MINECRAFT game available from Mojang Studios of Stockholm, Sweden.

Having bare avatars interact with one another also has limitations in terms of social interaction. These avatars generally cannot convey the facial expressions that people often make inadvertently. Such facial expressions are observable in a video conference. Some publications may describe placing video on an avatar in a virtual world. However, these systems typically require specialized software and have other limitations that restrict their usefulness.

Improved methods are needed for video conferencing.

In one embodiment, a device enables video conferencing between a first user and a second user. The device includes a processor coupled to a memory, a display screen, a network interface, and a web browser. The network interface is configured to receive: (i) data specifying a three-dimensional virtual space, (ii) a position and direction in the three-dimensional virtual space, the position and direction having been input by the first user, and (iii) a video stream captured from a camera on a device of the first user. The first user's camera is positioned to capture photographic images of the first user. The web browser, implemented on the processor, is configured to download a web application from a server and execute the web application. The web application includes a texture mapper and a renderer. The texture mapper is configured to texture map the video stream onto a three-dimensional model of an avatar. The renderer is configured to render, from the perspective of a virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the texture-mapped three-dimensional model of the avatar located at the position and oriented in the direction. By managing texture mapping within the web application, embodiments avoid the need to install specialized software.

In one embodiment, a computer-implemented method enables a presentation in a virtual conference that includes a plurality of participants. In the method, data specifying a three-dimensional virtual space is received. A position and direction in the three-dimensional virtual space are also received. The position and direction were input by a first participant of the plurality of participants in the conference. Finally, a video stream captured from a camera on a device of the first participant is received. The camera was positioned to capture photographic images of the first participant. The video stream is texture mapped onto a three-dimensional model of an avatar. In addition, a presentation stream is received from the device of the first participant. The presentation stream is texture mapped onto a three-dimensional model of a presentation screen. Finally, from the perspective of a virtual camera of a second participant of the plurality of participants, the three-dimensional virtual space with the texture-mapped avatar and the texture-mapped presentation screen is rendered for display to the second participant. In this way, embodiments enable presentations in a social conference environment.

In one embodiment, a computer-implemented method provides audio for a virtual conference that includes a plurality of participants. In the method, a three-dimensional virtual space including an avatar texture mapped with video of a second user is rendered, from the perspective of a virtual camera of a first user, for display to the first user. The virtual camera is at a first position in the three-dimensional virtual space and the avatar is at a second position in the three-dimensional virtual space. An audio stream from a microphone of a device of the second user is received. The microphone was positioned to capture speech of the second user. The volume of the received audio stream is adjusted to determine a left audio stream and a right audio stream that provide a sense of where the second position is relative to the first position in the three-dimensional virtual space. The left audio stream and the right audio stream are output to be played to the first user in stereo.
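The left-right adjustment described above can be sketched as follows. This is only a minimal illustration, assuming a constant-power pan law applied in a top-down X-Y plane; the function name `stereo_gains`, the pan law, and the angle convention are assumptions made for the example and are not stated in this summary.

```python
import math

def stereo_gains(cam_x, cam_y, cam_pan, src_x, src_y):
    """Return (left_gain, right_gain) for a sound source relative to a listener.

    cam_pan is the direction the virtual camera faces, in radians, measured
    counterclockwise from the +X axis (an assumed convention).
    """
    dx, dy = src_x - cam_x, src_y - cam_y
    if dx == 0 and dy == 0:
        # Source coincides with the listener: play equally in both ears.
        pan = 0.0
    else:
        # Angle of the source relative to the facing direction;
        # positive means counterclockwise, i.e. to the listener's left.
        rel = math.atan2(dy, dx) - cam_pan
        # pan ranges over [-1, 1]: -1 is fully left, +1 is fully right.
        pan = -math.sin(rel)
    # Constant-power pan law (assumption): left^2 + right^2 == 1.
    theta = (pan + 1.0) * math.pi / 4.0
    return math.cos(theta), math.sin(theta)

def to_stereo(samples, left_gain, right_gain):
    """Scale a mono sample list into separate left and right sample lists."""
    return [s * left_gain for s in samples], [s * right_gain for s in samples]
```

With the camera at the origin facing +X, an avatar at (0, 1) lies to the listener's left and is played almost entirely in the left stream, while an avatar straight ahead is played equally in both.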

In one embodiment, a computer-implemented method provides audio for a virtual conference. In the method, a three-dimensional virtual space including an avatar texture mapped with video of a second user is rendered, from the perspective of a virtual camera of a first user, for display to the first user. The virtual camera is at a first position in the three-dimensional virtual space and the avatar is at a second position in the three-dimensional virtual space. An audio stream from a microphone of a device of the second user is received. Whether the virtual camera and the avatar are located in the same area of a plurality of areas is determined. When the virtual camera and the avatar are determined not to be located in the same area, the audio stream is attenuated. The attenuated audio stream is output to be played to the first user. In this way, embodiments allow private and side conversations in a virtual video conferencing environment.
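The same-area test and attenuation described above might look like the sketch below. It assumes rectangular, non-overlapping areas and a single fixed attenuation factor; the `Area` class, the helper names, and the value 0.1 are hypothetical choices for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Area:
    """Axis-aligned rectangular region of the virtual space (assumed shape)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

def find_area(areas, x, y):
    """Return the first area containing (x, y), or None if no area does."""
    for area in areas:
        if area.contains(x, y):
            return area
    return None

def area_gain(areas, cam_pos, avatar_pos, attenuation=0.1):
    """Return 1.0 when camera and avatar share an area, else the attenuation.

    Two positions that fall outside every area are treated as sharing the
    same (unnamed) area; that is a simplification of this sketch.
    """
    if find_area(areas, *cam_pos) == find_area(areas, *avatar_pos):
        return 1.0
    return attenuation

def attenuate(samples, gain):
    """Apply the gain to each audio sample."""
    return [s * gain for s in samples]
```

A listener in one area would then hear a speaker in another area at the reduced gain, which is what makes a side conversation inside a single area possible.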

In one embodiment, a computer-implemented method efficiently streams video for a virtual conference. In the method, a distance between a first user and a second user in a virtual conference space is determined. A video stream captured from a camera on a device of the first user is received. The camera was positioned to capture photographic images of the first user. The resolution or bitrate of the video stream is reduced based on the determined distance, such that a closer distance results in a greater resolution than a farther distance. The video stream is transmitted at the reduced resolution or bitrate to a device of the second user for display to the second user within the virtual conference space. The video stream is texture mapped onto an avatar of the first user for display to the second user within the virtual conference space. In this way, embodiments efficiently allocate bandwidth and computing resources even when there are a large number of conference participants.
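One way to picture the distance-based reduction described above is a tier lookup, sketched below. The tier boundaries, frame heights, and function names are hypothetical values chosen for the example; this summary does not specify them.

```python
import math

# Hypothetical tiers: (maximum distance in virtual-space units, frame height in pixels).
RESOLUTION_TIERS = [(5.0, 720), (15.0, 360), (40.0, 180)]
MIN_HEIGHT = 90  # used beyond the last tier (assumed)

def avatar_distance(pos_a, pos_b):
    """Euclidean distance between two positions in the virtual space."""
    return math.dist(pos_a, pos_b)

def select_height(dist):
    """Pick a frame height so that closer avatars get a higher resolution."""
    for max_dist, height in RESOLUTION_TIERS:
        if dist <= max_dist:
            return height
    return MIN_HEIGHT
```

A sender could evaluate `select_height(avatar_distance(me, other))` for each recipient and downscale the outgoing stream accordingly, so that far-away avatars consume less bandwidth.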

In one embodiment, a computer-implemented method enables modeling in a virtual video conference. In the method, a three-dimensional model of a virtual environment, a mesh representing a three-dimensional model of an object, and a video stream from a participant of the virtual video conference are received. The video stream is texture mapped onto an avatar navigable by the participant. The texture-mapped avatar and the mesh representing the three-dimensional model of the object within the virtual environment are rendered for display.

System, device, and computer program product embodiments are also disclosed.

Further embodiments, features, and advantages of the invention, as well as the structure and operation of the various embodiments, are described in detail below with reference to the accompanying drawings.

The accompanying drawings, which are incorporated herein and form part of the specification, illustrate the present disclosure and, together with the description, further serve to explain the principles of the disclosure and to enable a person skilled in the relevant art to make and use the disclosure.
FIG. 1 is a diagram illustrating an example interface that provides video conferencing in a virtual environment with video streams mapped onto avatars.
FIG. 2 is a diagram illustrating a three-dimensional model used to render a virtual environment with avatars for video conferencing.
FIG. 3 is a diagram illustrating a system that provides video conferencing in a virtual environment.
FIGS. 4A to 4C illustrate how data is transferred between various components of the system in FIG. 3 to provide video conferencing.
FIG. 5 is a flowchart illustrating a method for adjusting relative left-right volume to provide a sense of position in a virtual environment during a video conference.
FIG. 6 is a chart illustrating how volume rolls off as the distance between avatars increases.
FIG. 7 is a flowchart illustrating a method for adjusting relative volume to provide different volume areas in a virtual environment during a video conference.
FIGS. 8A and 8B are diagrams illustrating different volume areas in a virtual environment during a video conference.
FIGS. 9A to 9C are diagrams illustrating traversing a hierarchy of volume areas in a virtual environment during a video conference.
FIG. 10 illustrates an interface with a three-dimensional model in a three-dimensional virtual environment.
FIG. 11 illustrates presentation screen sharing in a three-dimensional virtual environment used for video conferencing.
FIG. 12 is a flowchart illustrating a method for apportioning available bandwidth based on the relative positions of avatars within a three-dimensional virtual environment.
FIG. 13 is a chart illustrating how a priority value can fall off as the distance between avatars increases.
FIG. 14 is a chart illustrating how the allocated bandwidth can vary based on relative priority.
FIG. 15 is a diagram illustrating components of devices used to provide video conferencing within a virtual environment.
The drawing in which an element first appears is typically indicated by the leftmost digit or digits in the corresponding reference number. In the drawings, like reference numbers may indicate identical or functionally similar elements.

Video Conferencing with Avatars in a Virtual Environment

FIG. 1 is a diagram illustrating an example of an interface 100 that provides video conferencing in a virtual environment with video streams mapped onto avatars.

Interface 100 may be displayed to a participant in a video conference. For example, interface 100 may be rendered for display to the participant and may be continuously updated as the video conference progresses. A user may control the orientation of their virtual camera using, for example, keyboard inputs. In this way, the user can navigate around the virtual environment. In one embodiment, different inputs may change the virtual camera's X and Y position and its pan and tilt angles in the virtual environment. In further embodiments, a user may use inputs to alter the height (the Z coordinate) or the yaw of the virtual camera. In still further embodiments, a user may enter inputs that cause the virtual camera to "hop" up and then return to its original position, simulating gravity. The inputs available to navigate the virtual camera may include, for example, keyboard and mouse inputs, such as WASD keyboard keys to move the virtual camera forward, backward, left, and right in the X-Y plane, a space bar key to "hop" the virtual camera, and mouse movements specifying changes in pan and tilt angles.
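The navigation inputs described above can be illustrated with a small per-frame update routine. This is a hedged sketch rather than the disclosed implementation: the `VirtualCamera` fields, the speed, gravity, and mouse-sensitivity constants, and the assumption that the resting height is Z = 0 are all hypothetical.

```python
import math
from dataclasses import dataclass

SPEED = 2.0         # movement speed in units per second (assumed)
HOP_SPEED = 3.0     # initial upward velocity of a hop (assumed)
GRAVITY = 9.8       # downward acceleration (assumed)
MOUSE_SENS = 0.002  # radians of rotation per pixel of mouse travel (assumed)
MAX_TILT = math.pi / 2

@dataclass
class VirtualCamera:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0     # height above the resting position
    pan: float = 0.0   # radians, counterclockwise from +X
    tilt: float = 0.0  # radians, positive looks up
    vz: float = 0.0    # vertical velocity used while hopping

def step(cam, keys, mouse_dx, mouse_dy, dt):
    """Advance the camera by dt seconds given pressed keys and mouse deltas."""
    # Mouse movement changes pan and tilt; moving right turns clockwise.
    cam.pan -= mouse_dx * MOUSE_SENS
    cam.tilt = max(-MAX_TILT, min(MAX_TILT, cam.tilt - mouse_dy * MOUSE_SENS))

    # W/S move along the facing direction, A/D strafe, all in the X-Y plane.
    forward = ("w" in keys) - ("s" in keys)
    strafe = ("d" in keys) - ("a" in keys)
    fx, fy = math.cos(cam.pan), math.sin(cam.pan)
    rx, ry = fy, -fx  # right vector: forward rotated 90 degrees clockwise
    cam.x += (forward * fx + strafe * rx) * SPEED * dt
    cam.y += (forward * fy + strafe * ry) * SPEED * dt

    # Space bar starts a hop only while resting; gravity brings it back.
    if " " in keys and cam.z == 0.0 and cam.vz == 0.0:
        cam.vz = HOP_SPEED
    if cam.z > 0.0 or cam.vz > 0.0:
        cam.z += cam.vz * dt
        cam.vz -= GRAVITY * dt
        if cam.z <= 0.0:
            cam.z, cam.vz = 0.0, 0.0
    return cam
```

Calling `step(cam, {"w"}, 0, 0, 0.5)` on a fresh camera moves it one unit along +X, and a hop started with the space bar decays back to the resting height over subsequent calls.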

Interface 100 includes avatars 102A and 102B, each of which represents a different participant in the video conference. Avatars 102A and 102B are texture mapped with video streams 104A and 104B from the devices of the first and second participants, respectively. A texture map is an image applied (mapped) to the surface of a shape or polygon. Here, the images are the respective frames of the video. The camera devices capturing video streams 104A and 104B are positioned to capture the faces of the respective participants. In this way, the avatars are texture mapped with moving images of the participants' faces as they speak and listen in the conference.

Similar to how the virtual camera is controlled by the user viewing interface 100, the position and direction of avatars 102A and 102B are controlled by the respective participants that they represent. Avatars 102A and 102B are three-dimensional models represented by a mesh. Each avatar 102A and 102B may have the participant's name underneath the avatar.

The respective avatars 102A and 102B are controlled by the various users. Each avatar may be positioned at the point corresponding to where that user's own virtual camera is located within the virtual environment. Just as the user viewing interface 100 can move the virtual camera around, the various users can move their respective avatars 102A and 102B around.

The virtual environment rendered in interface 100 includes a background image 120 and a three-dimensional model 118 of an arena. The arena may be a venue or building in which the video conference is to take place. The arena may include a floor area bounded by walls. Three-dimensional model 118 may include a mesh and a texture. Other ways of mathematically representing the surface of three-dimensional model 118 may also be possible. For example, polygon modeling, curve modeling, and digital sculpting may be possible. For example, three-dimensional model 118 may be represented by voxels, splines, geometric primitives, polygons, or any other possible representation in three-dimensional space. Three-dimensional model 118 may also include a specification of light sources. The light sources may include, for example, point, directional, spotlight, and ambient light sources. Objects may also have certain properties describing how they reflect light. In examples, the properties may include diffuse, ambient, and spectral lighting interactions.

In addition to the arena, the virtual environment may include various other three-dimensional models that illustrate different components of the environment. For example, the three-dimensional environment may include a decorative model 114, a speaker model 116, and a presentation screen model 122. Like model 118, these models may be represented using any mathematical way of representing a geometric surface in three-dimensional space. These models may be separate from model 118 or may be combined into a single representation of the virtual environment.

Decorative models, such as model 114, serve to enhance realism and increase the aesthetic appeal of the arena. Speaker model 116 may virtually emit sound, such as presentation audio and background music, as will be described in greater detail below with respect to FIGS. 5 and 7. Presentation screen model 122 may serve as an outlet for presenting a presentation. Video of the presenter or a presentation screen share may be texture mapped onto presentation screen model 122.

Button 108 may provide the user with a list of participants. In one example, after a user selects button 108, the user can chat with other participants by sending text messages, individually or as a group.

Button 110 may enable a user to change attributes of the virtual camera used to render interface 100. For example, the virtual camera may have a field of view specifying the angle over which data is rendered for display. Modeling data within the camera's field of view is rendered, while modeling data outside the camera's field of view may not be. By default, the virtual camera's field of view may be set somewhere between 60° and 110°, which is commensurate with a wide-angle lens and human vision. However, selecting button 110 may cause the virtual camera to increase the field of view to exceed 170°, commensurate with a fisheye lens. This may enable a user to have broader peripheral awareness of their surroundings in the virtual environment.
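The effect of the field-of-view setting on what gets rendered can be illustrated with a simple horizontal visibility test. This sketch assumes a top-down X-Y plane and considers only the horizontal angle; the function name `in_horizontal_fov` and its parameters are hypothetical, and a real renderer would cull against a full view frustum.

```python
import math

def in_horizontal_fov(cam_pos, cam_pan, fov_deg, point):
    """Return True if point lies within the camera's horizontal field of view.

    cam_pan is the facing direction in radians, counterclockwise from +X
    (an assumed convention); fov_deg is the full field-of-view angle.
    """
    dx, dy = point[0] - cam_pos[0], point[1] - cam_pos[1]
    if dx == 0 and dy == 0:
        return True  # a point at the camera position is treated as visible
    angle = math.atan2(dy, dx) - cam_pan
    # Wrap the relative angle into [-pi, pi] before comparing.
    angle = math.atan2(math.sin(angle), math.cos(angle))
    return abs(angle) <= math.radians(fov_deg) / 2
```

For a camera at the origin facing +X, an object at (1, 1) sits 45° off-axis: it falls outside a 60° field of view but inside a 110° or 170° one, which is the broader peripheral awareness the wider setting provides.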

Finally, button 112 causes the user to exit the virtual environment. Selecting button 112 may cause a notification to be sent to the devices belonging to the other participants, signaling those devices to stop displaying the avatar corresponding to the user who was previously viewing interface 100.

In this way, the virtual 3D space of the interface is used to conduct video conferencing. Every user controls an avatar, which the user can direct to move, look around, jump, or do other things that change its position or direction. A virtual camera shows the user the virtual 3D environment and the other avatars. The avatars of the other users have, as an integral part, a virtual display that shows the webcam image of the corresponding user.

By providing users with a sense of space and allowing them to see each other's faces, embodiments provide a more social experience than conventional web conferencing or conventional MMO gaming. That more social experience has a variety of applications. For example, it can be used in online shopping. For example, interface 100 has applications in providing virtual grocery stores, houses of worship, trade shows, B2B sales, B2C sales, schooling, restaurants or lunchrooms, product releases, construction site visits (e.g., for architects, engineers, contractors), office spaces (e.g., people work "at their desks" virtually), remote control of machinery (ships, vehicles, planes, submarines, drones, drilling equipment, etc.), plant/factory control rooms, medical procedures, garden designs, virtual bus tours with a guide, music events (e.g., concerts), lectures (e.g., TED talks), meetings of political parties, board meetings, underwater research, research on hard-to-reach places, training for emergencies (e.g., fire), cooking, shopping (with checkout and delivery), virtual arts and crafts (e.g., painting and pottery), marriages, funerals, baptisms, remote sports training, counseling, treating fears (e.g., confrontation therapy), fashion shows, amusement parks, home decoration, watching sports, watching esports, watching performances captured using a three-dimensional camera, playing board and role-playing games, reviewing medical imagery, viewing geological data, learning languages, meeting in a space for the visually impaired, meeting in a space for the hearing impaired, participation in events by people who normally cannot walk or stand up, presenting the news or weather, talk shows, book signings, voting, MMOs, buying/selling virtual locations (such as those available in some MMOs like the SECOND LIFE game available from Linden Research, Inc. of San Francisco, CA), flea markets, garage sales, travel agencies, banks, archives, computer process management, fencing/sword fighting/martial arts, reenactments (e.g., reenacting a crime scene and/or accident), rehearsing a real event (e.g., a wedding, presentation, show, space-walk), evaluating or viewing a real event captured with three-dimensional cameras, livestock shows, zoos, experiencing life as a tall/short/blind/deaf/white/black person (e.g., a modified video stream or still image for the virtual world to simulate the perspective whose reactions a user wishes to experience), job interviews, game shows, interactive fiction (e.g., murder mystery), virtual fishing, virtual sailing, psychological research, behavioral analysis, virtual sports (e.g., climbing/bouldering), controlling the lights etc. in a house or other location (domotics), memory palaces, archaeology, gift shops, virtual visits so customers will be more comfortable on their real visit, virtual medical procedures to explain the procedures and have people feel more comfortable, virtual trading floors/financial marketplaces/stock markets (e.g., integrating real-time data and video feeds into the virtual world, real-time transactions and analytics), virtual locations people have to go to as part of their work so they will actually meet each other organically (e.g., if you want to create an invoice, it is only possible from within the virtual location), augmented reality in which the face of a person is projected on top of their AR headset (or helmet) so their facial expressions can be seen (e.g., useful for military, law enforcement, firefighters, special ops), and making reservations (e.g., for a certain holiday home/car/etc.).

FIG. 2 is a diagram 200 illustrating a three-dimensional model used to render a virtual environment with avatars for video conferencing. Just as illustrated in FIG. 1, the virtual environment here includes a three-dimensional arena 118 and various three-dimensional models, including three-dimensional models 114 and 122. Also as illustrated in FIG. 1, diagram 200 includes avatars 102A and 102B navigating around the virtual environment.

As described above, interface 100 in FIG. 1 is rendered from the perspective of a virtual camera. That virtual camera is illustrated in diagram 200 as virtual camera 204. As mentioned above, the user viewing interface 100 in FIG. 1 can control virtual camera 204 and navigate it in three-dimensional space. Interface 100 is continually updated according to the new position of virtual camera 204 and any changes to the models within its field of view. As described above, the field of view of virtual camera 204 may be a frustum defined, at least in part, by horizontal and vertical field-of-view angles.

As described above with respect to FIG. 1, a background image, or texture, may define at least part of the virtual environment. The background image may capture aspects of the virtual environment that are meant to appear at a distance. The background image may be a texture mapped onto a sphere 202. Virtual camera 204 may be at the origin of sphere 202. In this way, distant features of the virtual environment can be rendered efficiently.
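As a minimal sketch of how a background texture on a sphere centered at the camera might be sampled, the function below converts a view direction into texture coordinates. It assumes an equirectangular image layout and a z-up coordinate system; neither assumption comes from the disclosure, and the name `directionToUv` is hypothetical.

```typescript
// Hypothetical sketch: map a view direction from the sphere's origin to (u, v)
// coordinates of an equirectangular background image (assumed layout, z is up).
interface Uv { u: number; v: number; }

function directionToUv(dx: number, dy: number, dz: number): Uv {
  const len = Math.hypot(dx, dy, dz);
  if (len === 0) {
    throw new Error("direction must be non-zero");
  }
  // Longitude around the vertical axis maps to u in [0, 1].
  const u = 0.5 + Math.atan2(dy, dx) / (2 * Math.PI);
  // Elevation above the horizon maps to v, with v = 0 at the zenith.
  const v = 0.5 - Math.asin(dz / len) / Math.PI;
  return { u, v };
}
```

Because the camera sits at the sphere's origin, only the view direction matters, so the lookup is independent of where the camera is in the arena. A real WebGL application would typically do this per fragment in shader code or let the mesh's texture coordinates handle it.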

In other embodiments, shapes other than sphere 202 may be used to texture map the background image. In various alternative embodiments, the shape may be a cylinder, cube, rectangular prism, or any other three-dimensional geometry.

FIG. 3 is a diagram illustrating a system 300 that provides video conferencing in a virtual environment. System 300 includes a server 302 coupled to devices 306A and 306B via one or more networks 304.

Server 302 provides services to connect a video conference session between devices 306A and 306B. As will be described in greater detail below, server 302 communicates notifications to the devices of conference participants (e.g., devices 306A and 306B) when new participants join the conference and when existing participants leave it. Server 302 communicates messages describing the position and direction, in a three-dimensional virtual space, of the respective participants' virtual cameras. Server 302 also communicates video and audio streams between the respective devices of the participants (e.g., devices 306A and 306B). Finally, server 302 stores data specifying the three-dimensional virtual space and transmits it to the respective devices 306A and 306B.

In addition to the data necessary for the virtual conference, server 302 may provide executable information that instructs devices 306A and 306B on how to render the data to provide the interactive conference.

Server 302 responds to requests with responses. Server 302 may be a web server. A web server is software and hardware that uses HTTP (Hypertext Transfer Protocol) and other protocols to respond to client requests made over the World Wide Web. The main job of a web server is to display website content by storing, processing, and delivering web pages to users.

In an alternative embodiment, communication between devices 306A and 306B happens on a peer-to-peer basis rather than through server 302. In that embodiment, one or more of the data describing the respective participants' position and direction, the notifications regarding new and existing participants, and the respective participants' video and audio streams are communicated directly between devices 306A and 306B without passing through server 302.

Network 304 enables communication between the various devices 306A and 306B and server 302. Network 304 may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless wide area network (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, a wireless network, a WiFi network, a WiMax network, any other type of network, or any combination of two or more such networks.

Devices 306A and 306B are the respective devices of the participants in the virtual conference. Each of devices 306A and 306B receives the data necessary to conduct the virtual conference and renders the data necessary to provide it. As will be described in greater detail below, devices 306A and 306B include a display to present the rendered conference information, inputs that allow the user to control the virtual camera, a speaker (such as a headset) to provide audio to the user for the conference, a microphone to capture the user's voice input, and a camera positioned to capture video of the user's face.

Devices 306A and 306B can be any type of computing device, including a laptop, a desktop, a smartphone, a tablet computer, or a wearable computer (such as a smartwatch or an augmented reality or virtual reality headset).

Web browsers 308A and 308B can retrieve a network resource (such as a web page) addressed by a link identifier (such as a uniform resource locator, or URL) and present the network resource for display. In particular, web browsers 308A and 308B are software applications for accessing information on the World Wide Web. Usually, web browsers 308A and 308B make such requests using the hypertext transfer protocol (HTTP or HTTPS). When a user requests a web page from a particular website, the web browser retrieves the necessary content from a web server, interprets and executes the content, and then displays the page, shown as client/counterpart conferencing application 310A and 310B, on a display of device 306A or 306B. In examples, the content may have HTML and client-side scripting, such as JavaScript. Once the page is displayed, a user can input information and make selections on it, which can cause web browsers 308A and 308B to make further requests.

Conferencing applications 310A and 310B may be web applications downloaded from server 302 and configured to be executed by the respective web browsers 308A and 308B. In an embodiment, conferencing applications 310A and 310B may be JavaScript applications. In one example, they may be written in a higher-level language, such as TypeScript, and translated or compiled into JavaScript. Conferencing applications 310A and 310B are configured to interact with the WebGL JavaScript application programming interface. They may have control code specified in JavaScript and shader code written in OpenGL ES Shading Language (GLSL ES). Using the WebGL API, conferencing applications 310A and 310B may be able to utilize a graphics processing unit (not shown) of devices 306A and 306B. Moreover, the API enables rendering of interactive two-dimensional and three-dimensional graphics without the use of plug-ins.

Conferencing applications 310A and 310B receive, from server 302, data describing the position and direction of other avatars and three-dimensional modeling information describing the virtual environment. In addition, conferencing applications 310A and 310B receive video and audio streams of other conference participants from server 302.

Conferencing applications 310A and 310B render the three-dimensional modeling data, including data describing the three-dimensional environment and data representing the respective participant avatars. This rendering may involve rasterization, texture mapping, ray tracing, shading, or other rendering techniques. In an embodiment, the rendering may involve ray tracing based on the characteristics of the virtual camera. Ray tracing generates an image by tracing the path of light through the pixels of an image plane and simulating the effects of its encounters with virtual objects. In some embodiments, to enhance realism, the ray tracing may simulate optical effects such as reflection, refraction, scattering, and dispersion.

In this way, the user uses web browser 308A or 308B to enter a virtual space. The scene is displayed on the user's screen. The user's webcam video stream and microphone audio stream are sent to server 302. When other users enter the virtual space, an avatar model is created for them. The position of this avatar is sent to the server and received by the other users. The other users also receive a notification from server 302 that an audio/video stream is available. The video stream of a user is placed on the avatar that was created for that user. The audio stream is played back as coming from the position of the avatar.

FIGS. 4A-4C illustrate how data is transferred between the various components of the system in FIG. 3 to provide video conferencing. Like FIG. 3, each of FIGS. 4A-4C depicts the connection between server 302 and devices 306A and 306B. In particular, FIGS. 4A-4C illustrate example data flows between those devices.

FIG. 4A illustrates a diagram 400 showing how server 302 transmits data describing the virtual environment to devices 306A and 306B. In particular, both devices 306A and 306B receive from server 302 a three-dimensional arena 404, a background texture 402, a spatial hierarchy 408, and any other three-dimensional modeling information 406.

As described above, background texture 402 is an image illustrating distant features of the virtual environment. The image may be regular (such as a brick wall) or irregular. Background texture 402 may be encoded in any common image file format, such as bitmap, JPEG, or GIF. It describes the background image to be rendered against, for example, a sphere at a distance.

Three-dimensional arena 404 is a three-dimensional model of the space in which the conference is to take place. As described above, it may include, for example, a mesh and possibly its own texture information to be mapped onto the three-dimensional primitives it describes. It may define the space in which the virtual camera and the respective avatars can navigate within the virtual environment. Accordingly, it may be bounded by edges (such as walls or fences) that show users the perimeter of the navigable virtual environment.

Spatial hierarchy 408 is data specifying partitions in the virtual environment. These partitions are used to determine how sound is processed before being transferred between participants. As will be described below, this partition data may be hierarchical and may describe sound processing that allows for areas where participants in the virtual conference can have private conversations or side conversations.
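One way to picture hierarchical partition data is as a tree of nested areas. The sketch below is only an illustration under stated assumptions: it models each partition as an axis-aligned rectangle on the horizontal plane, and the `Area` type, the `locate` function, and the sample area names are all hypothetical. How the resulting path would drive sound processing is outside this sketch.

```typescript
// Hypothetical sketch: a tree of nested rectangular areas and a lookup that
// returns the chain of areas containing a point, outermost first.
interface Area {
  name: string;
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
  children: Area[];
}

function containsPoint(area: Area, x: number, y: number): boolean {
  return x >= area.minX && x <= area.maxX && y >= area.minY && y <= area.maxY;
}

function locate(root: Area, x: number, y: number): string[] {
  if (!containsPoint(root, x, y)) {
    return [];
  }
  for (const child of root.children) {
    const path = locate(child, x, y);
    if (path.length > 0) {
      return [root.name, ...path];
    }
  }
  return [root.name];
}

// Illustrative data only: an arena containing a lounge that contains a booth.
const sampleArena: Area = {
  name: "arena", minX: 0, minY: 0, maxX: 100, maxY: 100,
  children: [{
    name: "lounge", minX: 0, minY: 0, maxX: 20, maxY: 20,
    children: [
      { name: "booth", minX: 0, minY: 0, maxX: 5, maxY: 5, children: [] },
    ],
  }],
};
```

Comparing the paths returned for an avatar's position and for a listener's virtual camera would show whether the two share an area, which is the kind of question a hierarchical partition is suited to answer.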

Three-dimensional model 406 is any other three-dimensional modeling information needed to conduct the conference. In one embodiment, this may include information describing the respective avatars. Alternatively or additionally, this information may include product demonstrations.

With the information needed to conduct the meeting sent to the participants, FIGS. 4B and 4C illustrate how server 302 forwards information from one device to another. FIG. 4B illustrates a diagram 420 showing how server 302 receives information from the respective devices 306A and 306B, and FIG. 4C illustrates a diagram 420 showing how server 302 transmits the information to the respective devices 306B and 306A. In particular, device 306A transmits position and direction 422A, video stream 424A, and audio stream 426A to server 302, which forwards them to device 306B. Likewise, device 306B transmits position and direction 422B, video stream 424B, and audio stream 426B to server 302, which forwards them to device 306A.

Position and direction 422A and 422B describe the position and direction of the virtual cameras for the users of devices 306A and 306B, respectively. As described above, the position may be a coordinate in three-dimensional space (e.g., an x, y, z coordinate), and the direction may be a direction in three-dimensional space (e.g., pan, tilt, roll). In some embodiments, the user may be unable to control the virtual camera's roll, so the direction may specify only pan and tilt angles. Similarly, in some embodiments, the user may be unable to change the avatar's z coordinate (as the avatar is bounded by virtual gravity), so the z coordinate may be unnecessary. In this way, position and direction 422A and 422B each may include at least a coordinate on a horizontal plane in the three-dimensional virtual space and pan and tilt values. Alternatively or additionally, the user may be able to make the avatar "jump," in which case the z position may be specified only by an indication of whether the avatar is jumping.
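The reduced representation described above (a horizontal coordinate, pan and tilt angles, and a jump indication) could be carried in a small message. The following is a minimal sketch assuming a JSON encoding; the `PoseMessage` shape, its field names, and the `participantId` field are hypothetical and are not specified by the disclosure.

```typescript
// Hypothetical sketch: a position-and-direction message without roll or an
// explicit z coordinate, plus JSON encoding and validated decoding.
interface PoseMessage {
  participantId: string; // assumed identifier for the sending participant
  x: number;             // horizontal-plane coordinate
  y: number;             // horizontal-plane coordinate
  panDeg: number;
  tiltDeg: number;
  jumping: boolean;      // stands in for the z position
}

function encodePose(pose: PoseMessage): string {
  return JSON.stringify(pose);
}

function decodePose(raw: string): PoseMessage {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("pose must be a JSON object");
  }
  const fields = data as Record<string, unknown>;
  const num = (key: string): number => {
    const value = fields[key];
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new Error(`invalid or missing field: ${key}`);
    }
    return value;
  };
  const participantId = fields["participantId"];
  const jumping = fields["jumping"];
  if (typeof participantId !== "string") {
    throw new Error("invalid or missing field: participantId");
  }
  if (typeof jumping !== "boolean") {
    throw new Error("invalid or missing field: jumping");
  }
  return {
    participantId,
    x: num("x"),
    y: num("y"),
    panDeg: num("panDeg"),
    tiltDeg: num("tiltDeg"),
    jumping,
  };
}
```

Validating on decode keeps a malformed message from one participant from corrupting the scene state on another participant's device.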

In different examples, position and direction 422A and 422B may be transmitted and received using HTTP request/response exchanges or using socket messaging.

Video streams 424A and 424B are video data captured from the cameras of the respective devices 306A and 306B. The video may be compressed. For example, the video may use any commonly known video codec, including MPEG-4, VP8, or H.264. The video may be captured and transmitted in real time.

Similarly, audio streams 426A and 426B are audio data captured from the microphones of the respective devices. The audio may be compressed. For example, the audio may use any commonly known audio codec, including MPEG-4 or Vorbis. The audio may be captured and transmitted in real time. Video stream 424A and audio stream 426A are captured, transmitted, and presented synchronously with one another. Similarly, video stream 424B and audio stream 426B are captured, transmitted, and presented synchronously with one another.

Video streams 424A and 424B and audio streams 426A and 426B may be transmitted using the WebRTC application programming interface. WebRTC is an API available in JavaScript. As described above, devices 306A and 306B download and run web applications, as conferencing applications 310A and 310B, and conferencing applications 310A and 310B may be implemented in JavaScript. Conferencing applications 310A and 310B may use WebRTC to receive and transmit video streams 424A and 424B and audio streams 426A and 426B by making API calls from their JavaScript.

As mentioned above, when a user leaves the virtual conference, this departure is communicated to all other users. For example, if device 306A exits the virtual conference, server 302 communicates that departure to device 306B. Consequently, device 306B stops rendering the avatar corresponding to device 306A, removing the avatar from the virtual space. Additionally, device 306B stops receiving video stream 424A and audio stream 426A.

As described above, conferencing applications 310A and 310B may periodically or intermittently re-render the virtual space based on new information from the respective video streams 424A and 424B, position and direction 422A and 422B, and new information relating to the three-dimensional environment. For simplicity, each of these updates is now described from the perspective of device 306A. However, a skilled artisan would understand that device 306B would behave similarly given similar changes.

When device 306A receives video stream 424B, device 306A texture maps frames from video stream 424B onto the avatar corresponding to device 306B. That texture-mapped avatar is re-rendered within the three-dimensional virtual space and presented to the user of device 306A.

When device 306A receives a new position and direction 422B, device 306A generates the avatar corresponding to device 306B positioned at the new position and oriented in the new direction. The generated avatar is re-rendered within the three-dimensional virtual space and presented to the user of device 306A.

In some embodiments, server 302 may send updated model information describing the three-dimensional virtual environment. For example, server 302 may send updated information 402, 404, 406, or 408. When that happens, device 306A re-renders the virtual environment based on the updated information. This may be useful when the environment changes over time. For example, an outdoor event may change from daylight to dusk as the event progresses.

Again, when device 306B exits the virtual conference, server 302 sends a notification to device 306A indicating that device 306B is no longer participating in the conference. In that case, device 306A re-renders the virtual environment without the avatar for device 306B.
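The position updates and exit notifications described above can be sketched as a small state update on the receiving device. This is an illustration under stated assumptions rather than the disclosed implementation: the `ConferenceEvent` union, the `AvatarState` shape, and the `applyEvent` function are hypothetical, and actual re-rendering is left out.

```typescript
// Hypothetical sketch: client-side avatar bookkeeping driven by server events.
interface AvatarState {
  x: number;
  y: number;
  panDeg: number;
}

type ConferenceEvent =
  | { kind: "pose"; participantId: string; x: number; y: number; panDeg: number }
  | { kind: "left"; participantId: string };

// Returns a new map; the caller would re-render the scene from the result.
function applyEvent(
  avatars: ReadonlyMap<string, AvatarState>,
  event: ConferenceEvent,
): Map<string, AvatarState> {
  const next = new Map(avatars);
  switch (event.kind) {
    case "pose":
      // Place (or create) the avatar at the newly reported position and direction.
      next.set(event.participantId, { x: event.x, y: event.y, panDeg: event.panDeg });
      break;
    case "left":
      // The participant exited, so the avatar is no longer rendered.
      next.delete(event.participantId);
      break;
  }
  return next;
}
```

Returning a fresh map instead of mutating the old one makes it easy to compare the previous and next states when deciding whether a re-render is needed.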

While FIGS. 3 and 4A-C are illustrated with two devices for simplicity, a skilled artisan would understand that the techniques described herein can be extended to any number of devices. Also, while FIGS. 3 and 4A-C illustrate a single server 302, a skilled artisan would understand that the functionality of server 302 can be spread out among a plurality of computing devices. In an embodiment, the data transferred in FIG. 4A may come from one network address for server 302, while the data transferred in FIGS. 4B and 4C may be transferred to and from another network address for server 302.

In one embodiment, participants can configure their webcam, microphone, speakers, and graphical settings before entering the virtual conference. In an alternative embodiment, after starting the application, users may enter a virtual lobby where they are greeted by an avatar controlled by a real person. This person is able to view and modify the user's webcam, microphone, speakers, and graphical settings. The attendant can also instruct the user on how to use the virtual environment, for example by teaching them about looking, moving around, and interacting. When ready, the user automatically leaves the virtual lobby and joins the actual virtual environment.

Adjusting Volume for a Video Conference in a Virtual Environment

Embodiments also adjust volume to provide a sense of position and space within the virtual conference. This is illustrated, for example, in FIGS. 5-7, 8A-B, and 9A-C, each of which is described below.

FIG. 5 is a flowchart illustrating a method 500 for adjusting relative left-right volume to provide a sense of position in a virtual environment during a video conference.

At step 502, volume is adjusted based on the distance between the avatars. As described above, an audio stream from a microphone of another user's device is received. The volume of both the first and second audio streams is adjusted based on the distance between the second position and the first position. This is illustrated in FIG. 6.

FIG. 6 shows a chart 600 illustrating how volume rolls off as the distance between the avatars increases. Chart 600 plots volume 602 on its y-axis against the distance between avatars on its x-axis. As the distance between the users increases, the volume stays constant until a reference distance 602 is reached. At that point, the volume begins to drop off. In this way, all other things being equal, a closer user will often sound louder than a farther user.

How quickly the volume drops off depends on a roll-off factor. This may be a coefficient built into the settings of the video conferencing system or the client device. As illustrated by line 608 and line 610, a greater roll-off factor causes the volume to fall off more rapidly than a lesser one.
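The disclosure does not state which curve relates distance to volume. As a minimal sketch, the following assumes the "inverse" distance model used by the Web Audio API `PannerNode`; the function name `distance_gain` and its default values are illustrative assumptions.

```python
def distance_gain(distance: float,
                  reference_distance: float = 1.0,
                  rolloff_factor: float = 1.0) -> float:
    """Return a gain in (0, 1] for a source at the given distance.

    Assumed model (Web Audio "inverse" curve, not taken from the
    disclosure): the gain is constant up to the reference distance and
    then falls off, faster for larger roll-off factors.
    """
    if distance <= reference_distance:
        return 1.0  # flat region of the chart, before the reference distance
    return reference_distance / (
        reference_distance + rolloff_factor * (distance - reference_distance)
    )
```

Under this assumed curve, a source three units away with a reference distance of one is attenuated to one third of full volume with a roll-off factor of 1 and to one fifth with a roll-off factor of 2, which mirrors the steeper and shallower lines described for FIG. 6.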

Returning to FIG. 5, at step 504, relative left-right audio is adjusted based on the direction in which the avatar is located. That is, the volume of the audio output on the user's speaker (e.g., headset) varies to provide a sense of where the speaking user's avatar is located. The relative volume of the left and right audio streams is adjusted based on the direction of the position where the user generating the audio stream is located (e.g., the position of the speaking user's avatar) relative to the position where the user receiving the audio is located (e.g., the position of the virtual camera). The positions may be on a horizontal plane within the three-dimensional virtual space. The relative volume of the left and right audio streams is adjusted to provide a sense of where the second position is in the three-dimensional virtual space relative to the first position.

For example, at step 504, audio corresponding to an avatar to the left of the virtual camera is adjusted such that the audio is output at a higher volume on the receiving user's left ear than on the right ear. Similarly, audio corresponding to an avatar to the right of the virtual camera is adjusted such that the audio is output at a higher volume on the receiving user's right ear than on the left ear.
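One way to realize the left-right adjustment of step 504 is sketched below. It is an illustration under stated assumptions, not the disclosed implementation: positions are 2-D points on the horizontal plane, the camera's forward direction is a non-zero 2-D vector, and an equal-power pan law (a common audio technique that the disclosure does not name) converts the bearing into left and right gains. The function name `stereo_gains` is hypothetical.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def stereo_gains(camera_pos: Vec2, camera_forward: Vec2,
                 source_pos: Vec2) -> Tuple[float, float]:
    """Return (left_gain, right_gain) for a source seen from the camera.

    Assumes x/y are coordinates on the horizontal plane and that
    camera_forward is non-zero. Uses an equal-power pan law.
    """
    dx = source_pos[0] - camera_pos[0]
    dy = source_pos[1] - camera_pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        pan = 0.0  # source at the camera: keep it centered
    else:
        fx, fy = camera_forward
        norm = math.hypot(fx, fy)
        # Unit vector pointing to the camera's right (forward rotated clockwise).
        rx, ry = fy / norm, -fx / norm
        # pan is -1 for fully left, 0 for straight ahead or behind, +1 for fully right.
        pan = (dx * rx + dy * ry) / dist
    angle = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)
```

With the camera at the origin facing along +y, a source at (-5, 0) is heard almost entirely in the left channel and a source at (5, 0) almost entirely in the right, matching the example given for step 504.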

At step 506, relative left-right audio is adjusted based on the direction in which one avatar is oriented relative to the other. The relative volume of the left and right audio streams is adjusted based on the angle between the direction the virtual camera is facing and the direction the avatar is facing, such that the closer that angle is to perpendicular, the greater the volume difference between the left and right audio streams tends to be.

For example, when an avatar is directly facing the virtual camera, the relative left-right volume of the avatar's corresponding audio stream may not be adjusted at all in step 506. When the avatar is facing toward the left of the virtual camera, the relative left-right volume of the avatar's corresponding audio stream may be adjusted so that left is louder than right. And when the avatar is facing toward the right of the virtual camera, the relative left-right volume of the avatar's corresponding audio stream may be adjusted so that right is louder than left.

In an example, the calculation in step 506 may involve taking the cross product of the angle at which the virtual camera is facing and the angle at which the avatar is facing. The angles may be the directions they are facing on a horizontal plane.
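The cross-product calculation of step 506 can be illustrated as follows. This sketch assumes each facing direction is given as a yaw angle in radians, measured counterclockwise on the horizontal plane when viewed from above; the names `facing_bias` and `apply_facing_bias` and the `strength` parameter are hypothetical. For unit vectors, the 2-D cross product equals the sine of the angle between them, so it is zero when the avatar faces the camera head-on and largest in magnitude when the two directions are perpendicular.

```python
import math
from typing import Tuple


def facing_bias(camera_yaw: float, avatar_yaw: float) -> float:
    """2-D cross product of the camera's and avatar's unit facing vectors.

    Positive values mean the avatar faces toward the camera's left,
    negative values toward its right (under the assumed yaw convention).
    """
    cx, cy = math.cos(camera_yaw), math.sin(camera_yaw)
    ax, ay = math.cos(avatar_yaw), math.sin(avatar_yaw)
    return cx * ay - cy * ax


def apply_facing_bias(left: float, right: float, bias: float,
                      strength: float = 0.5) -> Tuple[float, float]:
    """Attenuate the channel opposite to the side the avatar faces."""
    if bias > 0.0:
        right *= 1.0 - strength * bias       # favor the left channel
    elif bias < 0.0:
        left *= 1.0 - strength * (-bias)     # favor the right channel
    return left, right
```

Attenuating only the opposite channel, rather than boosting the favored one, is a design choice made here so that the adjustment never raises a gain above its starting value.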

In an embodiment, a check may be conducted to determine the audio output device the user is using. If the audio output device is not a set of headphones or another type of speaker that provides a stereo effect, the adjustments in steps 504 and 506 may not occur.

Steps 502-506 are repeated for every audio stream received from every other participant. Based on the calculations in steps 502-506, a left and right audio gain is calculated for every other participant.

In this way, the audio streams for each participant are adjusted to provide a sense of where the participant's avatar is located in the three-dimensional virtual environment.

Not only are audio streams adjusted to provide a sense of where avatars are located, but in certain embodiments, audio streams can be adjusted to provide private or semi-private volume areas. In this way, the virtual environment enables users to have private conversations. It also enables users to mingle with one another and to have separate side conversations, which is not possible with conventional video conferencing software. This is illustrated, for example, with respect to FIG. 7.

FIG. 7 is a flowchart illustrating a method 700 for adjusting relative volume to provide different volume areas in a virtual environment during a video conference.

As described above, the server may provide a specification of sound or volume areas to the client devices. The virtual environment may be partitioned into different volume areas. At step 702, a device determines in which sound areas the respective avatars and the virtual camera are located.

For example, FIGS. 8A-B are diagrams illustrating different volume areas in a virtual environment during a video conference. FIG. 8A illustrates a diagram 800 with a volume area 802 that allows for a semi-private or side conversation between a user controlling avatar 806 and the user controlling the virtual camera. In this way, the users around conference table 810 can have a conversation without disturbing others in the room. The sound from the users controlling avatar 806 and the virtual camera may fall off as it exits volume area 802, but not entirely. This allows passersby to join the conversation if they would like.

Interface 800 also includes buttons 804, 806, and 808, which are described below.

FIG. 8B illustrates a diagram 800 with a volume area 804 that allows for a private conversation between a user controlling avatar 808 and the user controlling the virtual camera. Once inside volume area 804, audio from the user controlling avatar 808 and the user controlling the virtual camera may be output only to those inside volume area 804. Because no audio from those users is played to other users in the conference, their audio streams may not be transmitted to the other user devices at all.

Volume areas may be hierarchical, as illustrated in FIGS. 9A and 9B. FIG. 9B is a diagram 930 showing a layout with different volume areas arranged in a hierarchy. Volume areas 934 and 935 are within volume area 933, and volume areas 933 and 932 are within volume area 931. These volume areas are represented as a hierarchical tree, as illustrated in diagram 900 of FIG. 9A.

In diagram 900, node 901 represents volume area 931 and is the root of the tree. Nodes 902 and 903 are children of node 901 and represent volume areas 932 and 933. Nodes 904 and 906 are children of node 903 and represent volume areas 934 and 935.

If a user located in area 934 is trying to listen to a user speaking in area 932, the audio stream has to pass through a number of different virtual "walls," each of which attenuates the audio stream. In particular, the sound has to pass through the wall of area 932, the wall of area 933, and the wall of area 934. Each wall attenuates the sound by a particular factor. This calculation is described with respect to steps 704 and 706 in FIG. 7.

At step 704, the hierarchy is traversed to determine which sound areas lie between the avatars. This is illustrated, for example, in FIG. 9C. Starting from the node corresponding to the virtual area of the speaking voice (in this case node 904), a path to the node of the receiving user (in this case node 902) is determined. To determine the path, the links 952 going from node to node are determined. In this way, a subset of areas between the area containing the avatar and the area containing the virtual camera is determined.

At step 706, the audio stream from the speaking user is attenuated based on the respective wall transmission factors of the subset of areas. Each respective wall transmission factor specifies how much the audio stream is attenuated.
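Steps 704 and 706 can be sketched with a tree whose nodes carry a wall transmission factor. The following is an illustration only: the `VolumeArea` class, the convention that a factor of 1.0 is transparent and 0.0 is fully isolating, and the numeric factors used in the usage lines are assumptions, while the area names follow the layout described for FIG. 9B. The path between two areas is found by walking up from each area to their nearest common ancestor, and the walls of every area left or entered on the way are multiplied together; the common ancestor's own wall is not crossed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VolumeArea:
    name: str
    wall_transmission: float  # assumed: 1.0 = transparent, 0.0 = isolating
    parent: Optional["VolumeArea"] = None


def _ancestors(area: Optional[VolumeArea]) -> List[VolumeArea]:
    """Return the area followed by each enclosing area up to the root."""
    chain = []
    while area is not None:
        chain.append(area)
        area = area.parent
    return chain


def walls_between(speaker: VolumeArea, listener: VolumeArea) -> List[VolumeArea]:
    """Areas whose walls the sound crosses, in order of travel (step 704)."""
    up = _ancestors(speaker)
    down = _ancestors(listener)
    down_ids = {id(a) for a in down}
    crossed = []
    for area in up:
        if id(area) in down_ids:
            common = area  # nearest common enclosing area
            break
        crossed.append(area)  # sound leaves this area
    else:
        raise ValueError("areas do not belong to the same hierarchy")
    entered = []
    for area in down:
        if area is common:
            break
        entered.append(area)  # sound enters this area
    crossed.extend(reversed(entered))
    return crossed


def wall_attenuation(speaker: VolumeArea, listener: VolumeArea) -> float:
    """Multiply the transmission factors of all crossed walls (step 706)."""
    gain = 1.0
    for area in walls_between(speaker, listener):
        gain *= area.wall_transmission
    return gain


# Usage with the layout of FIG. 9B (the factors are made-up values).
area_931 = VolumeArea("931", 1.0)
area_932 = VolumeArea("932", 0.5, area_931)
area_933 = VolumeArea("933", 0.8, area_931)
area_934 = VolumeArea("934", 0.5, area_933)
area_935 = VolumeArea("935", 0.5, area_933)
gain_932_to_934 = wall_attenuation(area_932, area_934)
```

A transmission factor of 0.0 on any crossed wall drives the product to zero, which corresponds to a fully private area.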

Additionally or alternatively, where the different areas have different roll-off factors, the distance-based calculation illustrated in chart 600 may be applied to individual areas based on their respective roll-off factors. In this way, different areas of the virtual environment project sound at different rates. The audio gains determined in the method described above with respect to FIG. 5 may be applied to the audio stream to determine left and right audio accordingly. In this way, wall transmission factors, roll-off factors, and left-right adjustments that provide a sense of direction for the sound may all be applied together to provide a comprehensive audio experience.

Different audio areas may have different functionality. For example, a volume area may be a podium area. If a user is located in the podium area, some or all of the attenuation described with respect to FIG. 5 or FIG. 7 may not occur. For example, no attenuation may occur because of roll-off factors or wall transmission factors. In some embodiments, the relative left-right audio may still be adjusted to provide a sense of direction.

For exemplary purposes, the methods described with respect to FIGS. 5 and 7 describe audio streams from a user who has a corresponding avatar. However, the same methods may be applied to sound sources other than avatars. For example, the virtual environment may have three-dimensional models of speakers. Sound may be emitted from the speakers in the same way as from the avatar models described above, either because of a presentation or simply to provide background music.

As mentioned above, wall transmission factors may be used to isolate audio entirely. In an embodiment, this can be used to create virtual offices. In one example, each user may have in their physical (perhaps home) office a monitor displaying the conference application, constantly on and logged into the virtual office. There may be a feature that allows the user to indicate whether they are in the office or should not be disturbed. If the do-not-disturb indicator is off, a coworker or manager may come around within the virtual space and knock or walk in as they would in a physical office. The visitor may be able to leave a note if the worker is not present in their office. When the worker returns, they would be able to read the note left by the visitor. The virtual office may have a whiteboard and/or an interface that displays messages for the user. The messages may be email and/or from a messaging application such as the SLACK application available from Slack Technologies, Inc. of San Francisco, Calif.

Users may be able to customize or personalize their virtual offices. For example, they may be able to put up models of posters or other wall ornaments. They may be able to change the models or orientation of desks or decorative ornaments, such as plantings. They may be able to change the lighting or the view out the window.

Turning back to FIG. 8A, interface 800 includes various buttons 804, 806, and 808. When a user presses button 804, the attenuation described above with respect to the methods in FIGS. 5 and 7 may not occur, or may occur only in smaller amounts. In that situation, the user's voice is output uniformly to the other users, allowing the user to address all participants in the meeting. The user's video may also be output on a presentation screen within the virtual environment, as described below. When a user presses button 806, a speaker mode is enabled. In that case, audio is output from sound sources within the virtual environment, such as to play background music. When a user presses button 808, a screen share mode may be enabled, enabling the user to share the contents of a screen or window on their device with other users. The contents may be presented on a presentation model. This too is described below.

Presenting in a Three-Dimensional Environment

FIG. 10 illustrates an interface 1000 with a three-dimensional model 1002 in a three-dimensional virtual environment. As described above with respect to FIG. 1, interface 1000 may be displayed to a user who can navigate around the virtual environment. As illustrated in interface 1000, the virtual environment includes an avatar 1004 and a three-dimensional model 1002.

Three-dimensional model 1002 is a 3D model of a product that is placed inside the virtual space. People are able to join this virtual space to observe the model and can walk around it. The product may have localized sound to enhance the experience.

More particularly, when a presenter in the virtual space wants to show a 3D model, the presenter selects the desired model from the interface. This sends a message to the server to update the details (including the name and path of the model). This is automatically communicated to the clients. In this way, a three-dimensional model may be rendered for display simultaneously with presenting the video stream. Users can navigate the virtual camera around the three-dimensional model of the product.

In different examples, the object may be a product demonstration or an advertisement for a product.

FIG. 11 illustrates an interface 1100 with a presentation screen share in a three-dimensional virtual environment used for video conferencing. As described above with respect to FIG. 1, interface 1100 may be displayed to a user who can navigate around the virtual environment. As illustrated in interface 1100, the virtual environment includes an avatar 1104 and a presentation screen 1106.

In this embodiment, a presentation stream from a device of a participant in the conference is received. The presentation stream is texture mapped onto a three-dimensional model of presentation screen 1106. In one embodiment, the presentation stream may be a video stream from a camera on the user's device. In another embodiment, the presentation stream may be a screen share from the user's device, in which a monitor or window is shared. Through screen share or otherwise, the presentation video and audio stream may also come from an external source, for example a livestream of an event. When the user enables presenter mode, the user's presentation stream (and audio stream) is published to the server, tagged with the name of the screen the user wants to use. Other clients are notified that a new stream is available.

The presenter may also be able to control the position and orientation of the audience members. For example, the presenter may have an option to rearrange all the other participants in the meeting so that they are positioned and oriented to face the presentation screen.

An audio stream is captured from a microphone of the first participant's device in synchronization with the presentation stream. The audio stream from the user's microphone may be heard by other users as though it were coming from presentation screen 1106. In this way, presentation screen 1106 may be a sound source as described above. Because the user's audio stream is projected from presentation screen 1106, it may be suppressed from coming from the user's avatar. In this way, the audio stream is output to play in synchronization with the display of the presentation stream on screen 1106 within the three-dimensional virtual space.

Allocating Bandwidth Based on Distance Between Users

FIG. 12 is a flowchart illustrating a method 1200 for apportioning available bandwidth based on the relative position of avatars within the three-dimensional virtual environment.

At step 1202, the distance between a first user and a second user in the virtual conference space is determined. The distance may be the distance between them on a horizontal plane in the three-dimensional space.

At step 1204, received video streams are prioritized such that video streams from closer users are given priority over video streams from users who are farther away. A priority value may be determined as illustrated in FIG. 13.

FIG. 13 shows a chart 1300 that plots priority 1306 on the y-axis against distance 1302 on the x-axis. As illustrated by line 1306, the priority stays at a constant level until a reference distance 1304 is reached. After the reference distance is reached, the priority starts to fall off.
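The shape of the curve in FIG. 13 can be sketched as a piecewise function. The disclosure states only that priority is constant up to the reference distance and falls off afterward; the inverse fall-off, the `falloff` parameter, and the default values below are assumptions chosen for illustration.

```python
def stream_priority(distance: float,
                    reference_distance: float = 5.0,
                    falloff: float = 1.0) -> float:
    """Return a priority in (0, 1] for a video stream at the given distance.

    Assumed shape: flat at 1.0 up to the reference distance, then an
    inverse fall-off controlled by the (hypothetical) falloff parameter.
    """
    if distance <= reference_distance:
        return 1.0
    return 1.0 / (1.0 + falloff * (distance - reference_distance))
```

Any monotonically decreasing curve would serve the same purpose; the inverse form is used here only because it never reaches zero, so distant streams keep a small but nonzero priority.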

At step 1206, the bandwidth available to the user device is apportioned among the various video streams. This may be done based on the priority values determined in step 1204. For example, the priorities may be scaled proportionally so that they all sum to 1. For any videos for which insufficient bandwidth is available, the relative priority may be set to zero. The priorities are then adjusted again for the remaining video streams. Bandwidth is allocated based on these relative priority values. In addition, bandwidth may be reserved for the audio streams. This is illustrated in FIG. 14.

FIG. 14 illustrates a chart 1400 with a y-axis representing bandwidth 1406 and an x-axis representing relative priority. Once a video has been allocated the minimum bandwidth at which it is effective, the bandwidth 1406 allocated to the video stream increases in proportion to its relative priority.
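The apportioning of step 1206 can be sketched as an iterative normalization. The following is a minimal sketch under stated assumptions: bandwidth is expressed in kbps, a single fixed minimum (`min_video_kbps`) stands in for the minimum effective bandwidth, and the stream with the lowest priority is dropped first when a share falls below that minimum. The function and parameter names are hypothetical.

```python
from typing import Dict


def allocate_video_bandwidth(priorities: Dict[str, float],
                             total_kbps: float,
                             audio_reserve_kbps: float = 0.0,
                             min_video_kbps: float = 100.0) -> Dict[str, float]:
    """Split the bandwidth left after the audio reserve among video streams.

    Streams whose proportional share would fall below the minimum get a
    relative priority of zero, and the rest are renormalized.
    """
    budget = max(total_kbps - audio_reserve_kbps, 0.0)
    active = {k: p for k, p in priorities.items() if p > 0.0}
    shares: Dict[str, float] = {}
    while active:
        total = sum(active.values())
        # Scaling by the sum makes the relative priorities add up to 1.
        shares = {k: budget * p / total for k, p in active.items()}
        starved = [k for k, s in shares.items() if s < min_video_kbps]
        if not starved:
            break
        # Drop the lowest-priority starved stream, then renormalize.
        del active[min(starved, key=active.get)]
        shares = {}
    result = {k: 0.0 for k in priorities}
    result.update(shares)
    return result
```

For example, with priorities of 1.0, 0.5, and 0.01, a total of 1000 kbps, and 100 kbps reserved for audio, the third stream is dropped and the first two receive 600 and 300 kbps, so the stream with twice the priority receives twice the bandwidth.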

Once the allocated bandwidth is determined, the client may request the video from the server at the bandwidth, bitrate, frame rate, and resolution selected and allocated for that video. This may start a negotiation process between the client and the server to begin streaming the video at the designated bandwidth. In this way, the available video and audio bandwidth is divided fairly across all users, with users that have twice the priority receiving twice the bandwidth.

In one possible implementation, using simulcast, all clients send multiple video streams to the server, at different bitrates and resolutions. Other clients can then indicate to the server which of these streams they are interested in and would like to receive.
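With simulcast, a receiving client needs a rule for choosing among the streams a sender publishes. A minimal sketch of one such rule follows: pick the highest-bitrate layer that fits within the bandwidth allocated to that sender. The `SimulcastLayer` type, its fields, and the layer values in the usage lines are hypothetical and are not drawn from the disclosure or from any particular library.

```python
from typing import NamedTuple, Optional, Sequence


class SimulcastLayer(NamedTuple):
    name: str
    width: int
    height: int
    kbps: int


def pick_layer(layers: Sequence[SimulcastLayer],
               allocated_kbps: float) -> Optional[SimulcastLayer]:
    """Return the best layer that fits the allocation, or None if none fits."""
    affordable = [layer for layer in layers if layer.kbps <= allocated_kbps]
    if not affordable:
        return None  # caller may fall back to a still image instead
    return max(affordable, key=lambda layer: layer.kbps)


# Usage with made-up layers.
LAYERS = [
    SimulcastLayer("low", 320, 180, 150),
    SimulcastLayer("mid", 640, 360, 500),
    SimulcastLayer("high", 1280, 720, 1500),
]
chosen = pick_layer(LAYERS, 600)
```

Returning `None` when nothing fits leaves the fallback policy to the caller rather than baking it into the selection rule.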

At step 1208, it is determined whether the bandwidth available between the first user and the second user in the virtual conference space is such that display of the video at that distance is ineffective. This determination may be made by either the client or the server. If made by the client, the client sends a message to the server to cease transmission of the video to the client. If display is ineffective, transmission of the video stream to the second user's device is halted, and the second user's device is notified to substitute a still image for the video stream. The still image may simply be the last video frame (or one of the last video frames) received.

In one embodiment, a similar process may be executed for audio, reducing its quality given the size of the portion reserved for audio. In another embodiment, each audio stream is given a consistent bandwidth.

In this way, embodiments increase performance for all users and for the server, because video and audio stream quality can be reduced for users who are farther away and/or less important. This is not done when a sufficient bandwidth budget is available. The reduction is made in both bitrate and resolution. This improves video quality because the bandwidth available for a given user can be utilized more efficiently by the encoder.

Separately, video resolution is scaled down based on distance, with users who are twice as far away having half the resolution. In this way, given the limits of screen resolution, resolution that is unnecessary need not be downloaded. Bandwidth is thus conserved.
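The distance-based scaling above amounts to resolution that is inversely proportional to distance. A minimal sketch follows, assuming that "half the resolution" applies to each linear dimension and that users within a hypothetical `reference_distance` receive full resolution; neither assumption is stated in the specification.

```python
from typing import Tuple


def scaled_resolution(full_width: int, full_height: int, distance: float,
                      reference_distance: float = 1.0) -> Tuple[int, int]:
    """Scale resolution inversely with distance in the virtual space.

    At or within reference_distance the full resolution is kept;
    doubling the distance halves each dimension.
    """
    if distance < 0 or reference_distance <= 0:
        raise ValueError("distance must be >= 0 and reference_distance > 0")
    if distance <= reference_distance:
        return full_width, full_height
    factor = reference_distance / distance
    return (max(1, round(full_width * factor)),
            max(1, round(full_height * factor)))
```

For example, a 1280x720 stream viewed from twice the reference distance would be requested at 640x360.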

FIG. 15 is a diagram of a system 1500 illustrating components of devices used to provide videoconferencing within a virtual environment. In various embodiments, system 1500 may operate according to the methods described above.

Device 306A is a user computing device. Device 306A may be a desktop or laptop computer, smartphone, tablet, or wearable (e.g., a watch or head-mounted device). Device 306A includes a microphone 1502, a camera 1504, a stereo speaker 1506, and an input device 1512. Although not shown, device 306A also includes a processor and persistent, non-transitory, and volatile memory. The processors may include one or more central processing units, graphics processing units, or any combination thereof.

Microphone 1502 converts sound into an electrical signal. Microphone 1502 is positioned to capture the speech of the user of device 306A. In different examples, microphone 1502 may be a condenser microphone, an electret microphone, a moving-coil microphone, a ribbon microphone, a carbon microphone, a piezoelectric microphone, a fiber-optic microphone, a laser microphone, a water microphone, or a MEMS microphone.

Camera 1504 captures image data, generally by capturing light through one or more lenses. Camera 1504 is positioned to capture photographic images of the user of device 306A. Camera 1504 includes an image sensor (not shown). The image sensor may be, for example, a charge-coupled device (CCD) sensor or a complementary metal-oxide-semiconductor (CMOS) sensor. The image sensor may include one or more photodetectors that detect light and convert it into electrical signals. These electrical signals, captured together in a similar time frame, make up a still photographic image. A series of still photographic images captured at regular intervals together make up a video. In this way, camera 1504 captures images and videos.

Stereo speaker 1506 is a device that converts an electrical audio signal into corresponding left and right sound. Stereo speaker 1506 outputs the left audio stream and the right audio stream generated by audio processor 1520 (below) to be played in stereo to the user of device 306A. Stereo speaker 1506 includes both ambient speakers and headphones designed to play sound directly into the user's left and right ears. Example speakers include moving-iron loudspeakers, piezoelectric speakers, magnetostatic loudspeakers, electrostatic loudspeakers, ribbon and planar magnetic loudspeakers, bending wave loudspeakers, flat panel loudspeakers, Heil air motion transducers, transparent ionic conduction speakers, plasma arc speakers, thermoacoustic speakers, rotary woofers, moving-coil, electrostatic, electret, planar magnetic, and balanced armature.

Network interface 1508 is a software or hardware interface between two pieces of equipment or protocol layers in a computer network. Network interface 1508 receives a video stream from server 302 for each of the respective participants in the conference. The video stream is captured from a camera on the device of another participant in the videoconference. Network interface 1508 also receives, from server 302, data specifying the three-dimensional virtual space and any models therein. For each of the other participants, network interface 1508 receives a position and direction in the three-dimensional virtual space. The position and direction are input by each of the respective other participants.

Network interface 1508 also transmits data to server 302. It transmits the position of the virtual camera of the user of device 306A, as used by renderer 1518, and it transmits video and audio streams from camera 1504 and microphone 1502.

Display 1510 is an output device for presenting electronic information in visual or tactile form (the latter used, for example, in tactile electronic displays for the blind). Display 1510 may be a television set, computer monitor, head-mounted display, heads-up display, output of an augmented reality or virtual reality headset, broadcast reference monitor, medical monitor, mobile display (for mobile devices), or smartphone display (for smartphones). To present the information, display 1510 may include an electroluminescent (ELD) display, liquid crystal display (LCD), light-emitting diode (LED) backlit LCD, thin-film transistor (TFT) LCD, light-emitting diode (LED) display, OLED display, AMOLED display, plasma (PDP) display, or quantum dot (QLED) display.

Input device 1512 is a piece of equipment used to provide data and control signals to an information processing system such as a computer or information appliance. Input device 1512 allows the user to input a new desired position of the virtual camera used by renderer 1518, thereby enabling navigation in the three-dimensional environment. Examples of input devices include keyboards, mice, scanners, joysticks, and touchscreens.

Web browser 308A and web application 310A were described above with respect to FIG. 3. Web application 310A includes a screen capturer 1514, a texture mapper 1516, a renderer 1518, and an audio processor 1520.

Screen capturer 1514 captures a presentation stream, in particular a screen share. Screen capturer 1514 may interact with an API made available by web browser 308A. By calling a function available from the API, screen capturer 1514 may cause web browser 308A to ask the user which window or screen the user would like to share. Based on the answer to that query, web browser 308A may return a video stream corresponding to the screen share to screen capturer 1514, which passes it on to network interface 1508 for transmission to server 302 and ultimately to the other participants' devices.

Texture mapper 1516 texture maps the video stream onto a three-dimensional model corresponding to an avatar. Texture mapper 1516 may texture map respective frames from the video to the avatar. In addition, texture mapper 1516 may texture map a presentation stream to a three-dimensional model of a presentation screen.

Renderer 1518 renders, from the viewpoint of the virtual camera of the user of device 306A, for output to display 1510, the three-dimensional virtual space including the texture-mapped three-dimensional models of the avatars for the respective participants, located at the corresponding received positions and oriented in the corresponding directions. Renderer 1518 also renders any other three-dimensional models including, for example, the presentation screen.

Audio processor 1520 adjusts the volume of the received audio stream to determine a left audio stream and a right audio stream that provide a sense of where the second position is relative to the first position in the three-dimensional virtual space. In one embodiment, audio processor 1520 adjusts the volume based on the distance between the second position and the first position. In another embodiment, audio processor 1520 adjusts the volume based on the direction from the second position to the first position. In yet another embodiment, audio processor 1520 adjusts the volume based on the direction of the second position relative to the first position on a horizontal plane within the three-dimensional virtual space. In yet another embodiment, audio processor 1520 adjusts the volume based on the direction the virtual camera is facing in the three-dimensional virtual space, such that the left audio stream tends to have a higher volume when the avatar is located to the left of the virtual camera and the right audio stream tends to have a higher volume when the avatar is located to the right of the virtual camera. Finally, in yet another embodiment, audio processor 1520 adjusts the volume based on the angle between the direction the virtual camera is facing and the direction the avatar is facing, such that an angle that is more perpendicular to where the avatar is facing tends to produce a greater volume difference between the left and right audio streams.
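The following sketch combines two of the embodiments above: volume that falls off with distance, and a left/right balance that depends on where the avatar sits relative to the direction the virtual camera is facing. The constant-power pan law, the inverse-distance falloff, the `rolloff` parameter, and the function name are all assumptions chosen for illustration; the specification does not prescribe a particular formula.

```python
import math
from typing import Tuple


def stereo_gains(cam_pos: Tuple[float, float], cam_heading: float,
                 avatar_pos: Tuple[float, float],
                 rolloff: float = 1.0) -> Tuple[float, float]:
    """Return (left_gain, right_gain) for one avatar's audio stream.

    cam_pos, avatar_pos: (x, y) coordinates on the horizontal plane.
    cam_heading: direction the virtual camera faces, in radians,
    measured counterclockwise from the +x axis.
    """
    dx = avatar_pos[0] - cam_pos[0]
    dy = avatar_pos[1] - cam_pos[1]
    dist = math.hypot(dx, dy)

    # Unit vector pointing to the camera's right-hand side.
    right_x, right_y = math.sin(cam_heading), -math.cos(cam_heading)

    # pan = -1 is fully left, +1 fully right, 0 straight ahead or behind.
    pan = 0.0 if dist == 0 else (dx * right_x + dy * right_y) / dist
    pan = max(-1.0, min(1.0, pan))

    # Constant-power panning (assumed pan law).
    angle = (pan + 1.0) * math.pi / 4.0
    left, right = math.cos(angle), math.sin(angle)

    # Simple inverse-distance falloff (assumed attenuation model).
    gain = 1.0 / (1.0 + rolloff * dist)
    return left * gain, right * gain
```

With the camera at the origin facing the +y direction, an avatar at (-1, 0) yields a louder left stream, and an avatar at (1, 0) yields a louder right stream.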

Audio processor 1520 may also adjust the volume of an audio stream based on the area where the speaker is located relative to the area where the virtual camera is located. In this embodiment, the three-dimensional virtual space is divided into a plurality of areas. These areas may be hierarchical. When the speaker and the virtual camera are located in different areas, a wall transmission factor may be applied to attenuate the volume of the speaking audio stream.
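One way to picture the area-based attenuation is a tree of areas in which each area carries its own wall transmission factor, and the factors of every wall crossed between the speaker and the virtual camera are multiplied together. This is a sketch under assumptions: the `Area` class, the per-area factor, and the rule of multiplying factors up to the nearest common enclosing area are hypothetical and go beyond what the text states.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Area:
    name: str
    wall_transmission: float = 1.0  # fraction of volume passing this area's walls
    parent: Optional["Area"] = None


def ancestors(area: Optional[Area]) -> List[Area]:
    """Return the area followed by each enclosing area, innermost first."""
    chain = []
    while area is not None:
        chain.append(area)
        area = area.parent
    return chain


def area_attenuation(speaker_area: Area, camera_area: Area) -> float:
    """Multiply the transmission factors of every wall between two areas."""
    speaker_chain = ancestors(speaker_area)
    camera_chain = ancestors(camera_area)
    camera_ids = {id(a) for a in camera_chain}
    # Nearest area that encloses both the speaker and the camera.
    common = next((a for a in speaker_chain if id(a) in camera_ids), None)

    gain = 1.0
    for chain in (speaker_chain, camera_chain):
        for area in chain:
            if area is common:
                break
            gain *= area.wall_transmission
    return gain
```

When the speaker and the camera share an area, no wall is crossed and the factor is 1.0.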

Server 302 includes an attendance notifier 1522, a stream coordinator 1524, and a stream forwarder 1526.

Attendance notifier 1522 notifies conference participants when participants join and leave the conference. When a new participant joins the conference, attendance notifier 1522 sends a message to the devices of the other participants in the conference indicating that the new participant has joined. Attendance notifier 1522 signals stream forwarder 1526 to begin forwarding video, audio, and position/direction information to the other participants.

Stream coordinator 1524 receives a video stream captured from a camera on the first user's device. Stream coordinator 1524 determines the bandwidth available for transmitting data for the virtual conference to the second user. It determines the distance between the first user and the second user in the virtual conference space. It then apportions the available bandwidth between the first video stream and a second video stream based on the relative distances. In this way, stream coordinator 1524 prioritizes the video streams of closer users over the video streams of more distant users. Additionally or alternatively, stream coordinator 1524 may be located on device 306A, perhaps as part of web application 310A.
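The apportionment performed by the stream coordinator can be sketched as a weighted split in which closer users receive a larger share. Weighting by inverse distance and clamping with a hypothetical `min_distance` are assumptions; the text only requires that closer users be prioritized.

```python
from typing import Dict


def allocate_bandwidth(total_kbps: float, distances: Dict[str, float],
                       min_distance: float = 1.0) -> Dict[str, float]:
    """Split total_kbps across streams, favoring closer users.

    distances maps a stream id to that user's distance from the viewer
    in the virtual conference space.
    """
    if not distances:
        return {}
    # Closer users get larger weights; clamp to avoid division by zero.
    weights = {sid: 1.0 / max(d, min_distance) for sid, d in distances.items()}
    total_weight = sum(weights.values())
    return {sid: total_kbps * w / total_weight for sid, w in weights.items()}
```

For instance, with 3000 kbps available and two users at distances 1 and 2, the nearer stream would receive 2000 kbps and the farther one 1000 kbps.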

Stream forwarder 1526 broadcasts the position/direction information, video, audio, and screen-share screens it receives (with the adjustments made by stream coordinator 1524). Stream forwarder 1526 may send information to device 306A in response to a request from conference application 310A. Conference application 310A may send that request in response to a notification from attendance notifier 1522.

Network interface 1528 is a software or hardware interface between two pieces of equipment or protocol layers in a computer network. Network interface 1528 transmits the model information to the devices of the various participants. Network interface 1528 receives video, audio, and screen-share screens from the various participants.

Screen capturer 1514, texture mapper 1516, renderer 1518, audio processor 1520, attendance notifier 1522, stream coordinator 1524, and stream forwarder 1526 may each be implemented in hardware, software, firmware, or any combination thereof.

Identifiers, such as "(a)," "(b)," "(i)," "(ii)," etc., are sometimes used for different elements or steps. These identifiers are used for clarity and do not necessarily designate an order for the elements or steps.

The present invention has been described above with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed.

The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the art, readily modify and/or adapt such specific embodiments for various applications, without undue experimentation and without departing from the general concept of the present invention. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.

The breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the appended claims and their equivalents.

Claims

1. A system for enabling videoconferencing between a first user and a second user, comprising:
a processor coupled to a memory;
a display screen;
a network interface configured to receive: (i) data specifying a three-dimensional virtual space, (ii) a position and direction in the three-dimensional virtual space, the position and direction being input by the first user, and (iii) a video stream captured from a camera on a device of the first user, the camera positioned to capture photographic images of the first user; and
a web browser, implemented on the processor, configured to download a web application from a server and execute the web application, wherein the web application includes:
a texture mapper configured to texture map the video stream onto a three-dimensional model of an avatar; and
a renderer configured to:
(i) render, from a viewpoint of a virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the texture-mapped three-dimensional model of the avatar located at the position and oriented in the direction,
(ii) when input is received from the second user indicating a desire to change the viewpoint of the virtual camera, change the viewpoint of the virtual camera of the second user, and
(iii) re-render, from the changed viewpoint of the virtual camera, for display to the second user, the three-dimensional virtual space including the texture-mapped three-dimensional model of the avatar located at the position and oriented in the direction.

2. The system of claim 1, wherein the device further comprises a graphics processing unit, and wherein the texture mapper and the renderer include WebGL application calls that enable the web application to texture map or render using the graphics processing unit.

3. A computer-implemented method for enabling a video conference between a first user and a second user, comprising:
transmitting a web application to a first client device of the first user and a second client device of the second user;
receiving, from the first client device executing the web application, (i) a position and direction in a three-dimensional virtual space, the position and direction being input by the first user, and (ii) a video stream captured from a camera on the first client device, the camera positioned to capture photographic images of a user; and
transmitting the position and direction and the video stream to the second client device of the second user, wherein the web application includes executable instructions that, when executed on a web browser, cause the second client device to:
(i) texture map the video stream onto a three-dimensional model of an avatar and render, from a viewpoint of a virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the texture-mapped three-dimensional model of the avatar located at the position and oriented in the direction;
(ii) when input is received from the second user indicating a desire to change the viewpoint of the virtual camera, change the viewpoint of the virtual camera of the second user; and
(iii) re-render, from the changed viewpoint of the virtual camera, for display to the second user, the three-dimensional virtual space including the texture-mapped three-dimensional model of the avatar located at the position and oriented in the direction.

4. The method of claim 3, wherein the web application includes WebGL application calls that enable the web application to texture map or render using a graphics processing unit of the second client device.

5. A computer-implemented method for enabling a video conference between a first user and a second user, comprising:
receiving data specifying a three-dimensional virtual space;
receiving a position and direction in the three-dimensional virtual space, the position and direction being input by the first user;
receiving a video stream captured from a camera on a device of the first user, the camera positioned to capture photographic images of the first user;
texture mapping, by a web application implemented in a web browser, the video stream onto a three-dimensional model of an avatar;
rendering, by the web application implemented in the web browser, from a viewpoint of a virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the texture-mapped three-dimensional model of the avatar located at the position and oriented in the direction; and
when input is received from the second user indicating a desire to change the viewpoint of the virtual camera:
changing the viewpoint of the virtual camera of the second user; and
re-rendering, from the changed viewpoint of the virtual camera, for display to the second user, the three-dimensional virtual space including the texture-mapped three-dimensional model of the avatar located at the position and oriented in the direction.

6. The method of claim 5, further comprising:
receiving an audio stream captured, in synchronization with the video stream, from a microphone of the device of the first user, the microphone positioned to capture speech of the first user; and
outputting the audio stream for playback to the second user in synchronization with the display of the video stream in the three-dimensional virtual space.

7. The method of claim 5, wherein the viewpoint of the virtual camera is defined by at least coordinates on a horizontal plane in the three-dimensional virtual space and pan and tilt values.

8. The method of claim 5, further comprising, when a new position and direction of the first user in the three-dimensional virtual space are received:
re-rendering, for display to the second user, the three-dimensional virtual space including the texture-mapped three-dimensional model of the avatar located at the new position and oriented in the new direction.

9. The method of claim 5, wherein the texture mapping comprises, for each frame of the video stream, iteratively mapping pixels onto the three-dimensional model of the avatar.

10. The method of claim 5, wherein the data, the position and direction, and the video stream are received at a web browser from a server, and wherein the texture mapping and the rendering are performed by the web browser.

11. The method of claim 10, further comprising:
receiving, from the server, a notification indicating that the first user is no longer present; and
re-rendering, for display to the second user on the web browser, the three-dimensional virtual space without the texture-mapped three-dimensional model of the avatar.

12. The method of claim 11, further comprising:
receiving, from the server, a notification indicating that a third user has entered the three-dimensional virtual space;
receiving a second position and a second direction of the third user in the three-dimensional virtual space;
receiving a second video stream captured from a camera on a device of the third user, the camera positioned to capture photographic images of the third user;
texture mapping the second video stream onto a second three-dimensional model of a second avatar; and
rendering, from the viewpoint of the virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the second texture-mapped three-dimensional model located at the second position and oriented in the second direction.

13. The method of claim 5, wherein the receiving of the data specifying the three-dimensional virtual space comprises receiving a mesh specifying a conference space and receiving a background image, and wherein the rendering comprises constructing the background image and mapping the texture onto the image.

14. A non-transitory, tangible computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations for enabling a video conference between a first user and a second user, the operations comprising:
receiving data designating a 3-dimensional virtual space;
receiving a position and direction in the 3D virtual space, the position and direction being input by the first user;
receiving a video stream captured from a camera on the device of the first user, the camera arranged to capture photographic images of the first user;
texture mapping the video stream onto a 3D model of an avatar;
rendering, from a viewpoint of a virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the texture-mapped three-dimensional model of the avatar located at the position and oriented in the direction; and
when input is received from the second user indicating a desire to change the viewpoint of the virtual camera:
changing the viewpoint of the virtual camera of the second user; and
re-rendering, from the changed viewpoint of the virtual camera, for display to the second user, the three-dimensional virtual space including the texture-mapped three-dimensional model of the avatar located at the position and oriented in the direction,
wherein the data, the position and direction, and the video stream are received from a server to a web browser, and the texture mapping and rendering are performed by the web browser.

15. The device of claim 14, wherein the operations further comprise:
receiving an audio stream captured, in synchronization with the video stream, from a microphone of the device of the first user, the microphone positioned to capture speech of the first user; and
outputting the audio stream for playback to the second user in synchronization with the display of the video stream in the three-dimensional virtual space.

16. The device of claim 14, wherein the viewpoint of the virtual camera is defined by at least coordinates on a horizontal plane in the three-dimensional virtual space and pan and tilt values.

17. The device of claim 14, wherein the operations further comprise, when a new position and direction of the first user in the three-dimensional virtual space are received:
re-rendering, for display to the second user, the three-dimensional virtual space including the texture-mapped three-dimensional model of the avatar located at the new position and oriented in the new direction.

18. The device of claim 14, wherein the texture mapping comprises, for each frame of the video stream, iteratively mapping pixels onto the three-dimensional model of the avatar.

19. The device of claim 14, wherein the operations further comprise:
receiving, from the server, a notification indicating that the first user is no longer present; and
re-rendering, for display to the second user on the web browser, the three-dimensional virtual space without the texture-mapped three-dimensional model of the avatar.

20. The device of claim 19, wherein the operations further comprise:
receiving, from the server, a notification indicating that a third user has entered the three-dimensional virtual space;
receiving a second position and a second direction of the third user in the three-dimensional virtual space;
receiving a second video stream captured from a camera on a device of the third user, the camera positioned to capture photographic images of the third user;
texture mapping the second video stream onto a second three-dimensional model of a second avatar; and
rendering, from the viewpoint of the virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the second texture-mapped three-dimensional model located at the second position and oriented in the second direction.

21. The device of claim 14, wherein the receiving of the data specifying the three-dimensional virtual space comprises receiving a mesh specifying a conference space and receiving a background image, and wherein the rendering comprises constructing the background image and mapping the texture onto the image.

22. A computer-implemented method for enabling a video conference between a first user and a second user, comprising:
receiving data specifying a three-dimensional virtual space;
receiving a position and direction in the three-dimensional virtual space, the position and direction being input by the first user;
receiving a video stream captured from a camera on a device of the first user, the camera positioned to capture photographic images of the first user;
texture mapping, by a web application implemented in a web browser, the video stream onto a three-dimensional model of an avatar;
rendering, by the web application implemented in the web browser, from a viewpoint of a virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the texture-mapped three-dimensional model of the avatar located at the position and oriented in the direction;
receiving, from a server, a notification indicating that the first user is no longer present; and
re-rendering, for display to the second user on the web browser, the three-dimensional virtual space without the texture-mapped three-dimensional model of the avatar.

The method of claim 22, further comprising:
receiving, from the server, a notification indicating that a third user has entered the three-dimensional virtual space;
receiving a second location and a second direction of the third user in the 3D virtual space;
receiving a second video stream captured from a camera on the device of the third user, the camera arranged to capture photographic images of the third user;
texture mapping the second video stream onto a second three-dimensional model of a second avatar; and
rendering, from the viewpoint of the virtual camera of the second user, the three-dimensional virtual space including the second texture-mapped three-dimensional model located at the second position and oriented in the second direction, for display to the second user.

A system for enabling videoconferencing between a first user and a second user, comprising:
a processor coupled to a memory;
a display screen;
a network interface configured to receive (i) data specifying a three-dimensional virtual space, (ii) a position and direction in the three-dimensional virtual space, the position and direction being input by the first user, and (iii) a video stream captured from a camera on the first user's device, the camera arranged to capture photographic images of the first user; and
a web browser implemented on the processor and configured to download a web application from a server and execute the web application, wherein the web application comprises:
a mapper configured to map the video stream onto a three-dimensional model of an avatar; and
a renderer configured to render, from the viewpoint of the second user's virtual camera, the three-dimensional virtual space including the three-dimensional model of the avatar with the mapped video stream, located at the position and oriented in the direction, for display to the second user.

25. The system of claim 24, wherein the device further comprises a graphics processing unit, the mapper and the renderer including WebGL application calls that enable the web application to map or render using the graphics processing unit.

A computer implemented method for enabling a video conference between a first user and a second user, comprising:
transmitting a web application to a first client device of the first user and a second client device of the second user;
receiving, from the first client device running the web application, (i) a position and direction in a three-dimensional virtual space, the position and direction being input by the first user, and (ii) a video stream captured from a camera on the first client device, the camera arranged to capture photographic images of the first user; and
transmitting the position and direction and the video stream to the second client device of the second user, wherein the web application includes executable instructions that, when executed on a web browser, map the video stream onto a three-dimensional model of an avatar and render, from the viewpoint of the second user's virtual camera, the three-dimensional virtual space including the three-dimensional model of the avatar with the mapped video stream, located at the position and oriented in the direction, for display to the second user.

27. The method of claim 26, wherein the web application includes WebGL application calls that enable the web application to map or render using a graphics processing unit of the second client device.

A computer implemented method for enabling a video conference between a first user and a second user, comprising:
receiving data specifying a three-dimensional virtual space;
receiving a position and direction in the three-dimensional virtual space, wherein the position and direction are input by the first user;
receiving a video stream captured from a camera on the device of the first user, the camera arranged to capture photographic images of the first user;
mapping, by a web application implemented in a web browser, the video stream onto a three-dimensional model of an avatar; and
rendering, by the web application implemented in the web browser and from the viewpoint of the second user's virtual camera, the three-dimensional virtual space including the three-dimensional model of the avatar located at the position and oriented in the direction, for display to the second user.

The method of claim 28, further comprising:
receiving an audio stream captured in synchronization with the video stream from a microphone of the device of the first user, the microphone being arranged to capture utterances of the first user; and
outputting the audio stream for playback to the second user in synchronization with the display of the video stream within the three-dimensional virtual space.

29. The method of claim 28, further comprising, when input from the second user indicating a desire to change the viewpoint of the virtual camera is received:
changing the viewpoint of the virtual camera of the second user; and
re-rendering, from the changed viewpoint of the virtual camera, the three-dimensional virtual space including the three-dimensional model of the avatar located at the position and oriented in the direction, for display to the second user.

31. The method of claim 30, wherein the viewpoint of the virtual camera is defined by at least coordinates on a horizontal plane in the three-dimensional virtual space and pan and tilt values.
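
As a non-limiting illustration of the viewpoint recited above, the following Python sketch stores a virtual camera as coordinates on the horizontal plane together with pan and tilt values, and derives a forward-facing unit vector from them. The class name `Viewpoint`, the y-up axis convention, the choice of +z as the pan origin, and the use of radians are assumptions made for this example only; the claims do not specify them.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Viewpoint:
    """Hypothetical virtual-camera viewpoint: horizontal-plane coordinates plus pan and tilt."""
    x: float      # coordinate on the horizontal plane
    z: float      # coordinate on the horizontal plane
    pan: float    # rotation about the vertical (y) axis, radians, 0 = facing +z (assumed)
    tilt: float   # rotation above (+) or below (-) the horizontal plane, radians

    def forward(self) -> Tuple[float, float, float]:
        """Return the unit vector the camera is facing, as (x, y, z)."""
        horizontal = math.cos(self.tilt)
        return (
            math.sin(self.pan) * horizontal,
            math.sin(self.tilt),
            math.cos(self.pan) * horizontal,
        )
```

Under these assumptions, a change of viewpoint requested by the second user amounts to updating `x`, `z`, `pan`, or `tilt` and re-rendering from the new values.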

29. The method of claim 28, wherein when a new position and orientation of the first user in the three-dimensional virtual space is received:
re-rendering the three-dimensional virtual space including the three-dimensional model of the avatar positioned in the new location and oriented in the new direction for display to the second user.

29. The method of claim 28, wherein the mapping comprises, for respective frames of the video stream, iteratively mapping pixels onto the three-dimensional model of the avatar.

29. The method of claim 28, wherein the data, the position and direction, and the video stream are received from a server to a web browser, and wherein the mapping and rendering are performed by the web browser.

35. The method of claim 34, further comprising:
receiving, from the server, a notification indicating that the first user is no longer present; and
re-rendering the three-dimensional virtual space without the three-dimensional model of the avatar for display to the second user on the web browser.

The method of claim 35, further comprising:
receiving, from the server, a notification indicating that a third user has entered the three-dimensional virtual space;
receiving a second location and a second direction of the third user in the 3D virtual space;
receiving a second video stream captured from a camera on the device of the third user, the camera arranged to capture photographic images of the third user;
mapping the second video stream onto a second three-dimensional model of a second avatar; and
rendering, from the viewpoint of the virtual camera of the second user, the three-dimensional virtual space including the second three-dimensional model located at the second position and oriented in the second direction, for display to the second user.
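
By way of example only, the handling of server notifications recited in the two preceding claims (a user is no longer present, or a third user has entered) might be sketched as a small scene-state update in Python. The message fields `type`, `user`, `position`, and `direction`, and the names `Scene` and `AvatarState`, are hypothetical and chosen for this sketch; the claims do not define a message format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AvatarState:
    position: Tuple[float, float, float]
    direction: float  # assumed here to be a single heading angle


@dataclass
class Scene:
    """Hypothetical client-side record of which avatars should be rendered."""
    avatars: Dict[str, AvatarState] = field(default_factory=dict)

    def handle(self, message: dict) -> None:
        """Apply one server notification; the caller re-renders afterwards."""
        kind = message["type"]
        if kind in ("entered", "moved"):
            # A user entered or sent a new position and direction.
            self.avatars[message["user"]] = AvatarState(
                tuple(message["position"]), message["direction"]
            )
        elif kind == "left":
            # The user is no longer present: drop the avatar from the scene.
            self.avatars.pop(message["user"], None)

    def users_to_render(self) -> List[str]:
        return sorted(self.avatars)
```

In such a sketch, the renderer would draw one video-mapped model per entry returned by `users_to_render()` each time the scene changes.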

29. The method of claim 28, wherein the step of receiving data specifying the three-dimensional virtual space includes receiving a mesh specifying a conference space and receiving a background image, and wherein the rendering step comprises mapping the background image onto a sphere.

A non-transitory tangible computer readable device having stored thereon instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations for enabling a video conference between a first user and a second user, the operations comprising:
receiving data designating a 3-dimensional virtual space;
receiving a position and direction in the 3D virtual space, the position and direction being input by the first user;
receiving a video stream captured from a camera on the device of the first user, the camera arranged to capture photographic images of the first user;
mapping the video stream onto a 3D model of an avatar; and
rendering, from the viewpoint of the second user's virtual camera, the three-dimensional virtual space including the three-dimensional model of the avatar located at the position and oriented in the direction, for display to the second user.

39. The device of claim 38, wherein the operations further comprise:
receiving an audio stream captured in synchronization with the video stream from a microphone of the device of the first user, the microphone being positioned to capture utterances of the first user; and
and outputting the audio stream for playback to the second user in synchronism with the display of the video stream in the 3D virtual space.

39. The device of claim 38, wherein the operations further comprise, when input from the second user indicating a desire to change the viewpoint of the virtual camera is received:
changing the viewpoint of the virtual camera of the second user; and
re-rendering, from the changed viewpoint of the virtual camera, the three-dimensional virtual space including the three-dimensional model of the avatar located at the position and oriented in the direction, for display to the second user.

41. The device of claim 40, wherein the viewpoint of the virtual camera is defined by at least coordinates on a horizontal plane in the three-dimensional virtual space and pan and tilt values.

39. The device of claim 38, wherein the operations further comprise, when a new position and direction of the first user in the three-dimensional virtual space is received:
and re-rendering the three-dimensional virtual space including the three-dimensional model of the avatar located in the new location and oriented in the new direction for display to the second user.

39. The device of claim 38, wherein the mapping comprises, for respective frames of the video stream, iteratively mapping pixels onto the three-dimensional model of the avatar.

39. The device of claim 38, wherein the data, the position and direction, and the video stream are received from a server to a web browser, and wherein the mapping and rendering are performed by the web browser.

45. The device of claim 44, wherein the operations further comprise:
receiving, from the server, a notification indicating that the first user is no longer present; and
and re-rendering the three-dimensional virtual space without the three-dimensional model of the avatar for display to the second user on the web browser.

46. The device of claim 45, wherein the operations further comprise:
receiving, from the server, a notification indicating that a third user has entered the three-dimensional virtual space;
receiving a second position and a second direction of the third user in the 3D virtual space;
receiving a second video stream captured from a camera on the device of the third user, the camera arranged to capture photographic images of the third user;
mapping the second video stream onto a second 3-dimensional model of a second avatar; and
rendering, from the viewpoint of the virtual camera of the second user, the three-dimensional virtual space including the second three-dimensional model located at the second position and oriented in the second direction, for display to the second user.

39. The device of claim 38, wherein the receiving of the data specifying the three-dimensional virtual space includes receiving a mesh specifying a conference space and receiving a background image, and the rendering includes mapping the background image onto a sphere.

A computer implemented method for presenting in a virtual conference involving a plurality of participants, comprising:
receiving data specifying a three-dimensional virtual space;
receiving a position and direction in the three-dimensional virtual space, the position and direction being input by a first participant of the plurality of participants of the virtual conference;
receiving a video stream captured from a camera on the device of the first participant, the camera arranged to capture photographic images of the first participant;
mapping the video stream onto a three-dimensional model of an avatar;
receiving a presentation stream from the device of the first participant;
mapping the presentation stream onto a 3D model of a presentation screen; and
rendering, from the viewpoint of a virtual camera of a second participant of the plurality of participants, the three-dimensional virtual space having the mapped avatar and the mapped presentation screen, for display to the second participant.

The method of claim 48, further comprising:
receiving an audio stream captured in synchronization with the presentation stream from a microphone of the device of the first participant, the microphone being arranged to capture utterances of the first participant; and
outputting the audio stream for playback to the second participant in synchronization with the display of the presentation stream within the three-dimensional virtual space.

The method of claim 48, further comprising:
receiving a position of a third participant among the plurality of participants in the 3D virtual space;
receiving an audio stream from a microphone of the third participant's device, the microphone being arranged to capture the third participant's utterance; and
adjusting the audio stream to provide a sense of the received position of the third participant in the three-dimensional virtual space relative to the position of the virtual camera;
wherein the rendering comprises rendering the three-dimensional virtual space with an avatar for the third participant at the received location for display to the second participant.

The method of claim 48, further comprising:
receiving a position of the first participant in the 3-dimensional virtual space;
receiving an audio stream from a microphone of the first participant's device, the microphone being arranged to capture the first participant's utterance;
adjusting the audio stream to provide a sense of the received position of the first participant in the three-dimensional virtual space relative to the position of the virtual camera;
rendering the three-dimensional virtual space with an avatar for the third participant at the received location for display to the second participant; and
adjusting the audio stream to provide a sense of the location of the mapped presentation screen relative to the location of the virtual camera when entering presentation mode.

49. The method of claim 48, wherein the presentation stream is a video of the first participant.

49. The method of claim 48, wherein the presentation stream is the first participant's screen share.

49. The method of claim 48, wherein the step of mapping the video stream comprises mapping respective frames of the video stream onto the three-dimensional model of the avatar to present moving images of the face of the first participant on the avatar.

55. The method of claim 54, wherein the avatar comprises a surface, and wherein the mapping step comprises mapping the respective frames onto the surface.

56. The method of claim 55, wherein the rendering step comprises rendering the mapped avatar at the position and direction in the three-dimensional virtual space, such that the first participant can change the position and direction of the mapped avatar within the rendered three-dimensional virtual space by changing the position and direction input by the first participant.

56. The method of claim 55, wherein the rendering step comprises rendering the avatar positioned at the position in the three-dimensional virtual space with the surface oriented in the direction in the three-dimensional virtual space, the method further comprising:
receiving a new direction in the three-dimensional virtual space, the new direction being input by the first participant; and
re-rendering the three-dimensional virtual space from the viewpoint of the virtual camera of the second participant such that the surface of the texture-mapped avatar is oriented in the new direction, for display to the second participant.

58. The method of claim 57, wherein, when the new direction is input by the first participant, a virtual camera of the first participant changes according to the new direction, the virtual camera of the first participant specifying how the three-dimensional virtual space is rendered for display to the first participant.

A non-transitory tangible computer readable device having stored thereon instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations for a presentation in a virtual conference including a plurality of participants, the operations comprising:
receiving data designating a 3-dimensional virtual space;
receiving a position and direction in the three-dimensional virtual space, the position and direction being input by a first participant among the plurality of participants of the virtual conference;
receiving a video stream captured from a camera on the first participant's device, the camera arranged to capture photographic images of the first participant;
mapping the video stream onto a 3D model of an avatar;
receiving a presentation stream from the device of the first participant;
mapping the presentation stream onto a 3D model of a presentation screen; and
rendering, from the viewpoint of a virtual camera of a second participant of the plurality of participants, the three-dimensional virtual space having the mapped avatar and the mapped presentation screen, for display to the second participant.

60. The device of claim 59, wherein the operations further comprise:
receiving an audio stream captured in synchronization with the presentation stream from a microphone of the device of the first participant, the microphone being positioned to capture utterances of the first participant; and
and outputting the audio stream for playback to the second participant in synchronism with the display of the presentation stream within the 3D virtual space.

60. The device of claim 59, wherein the operations further comprise:
receiving a location of a third participant among the plurality of participants in the 3D virtual space;
receiving an audio stream from a microphone of the third participant's device, the microphone being positioned to capture the third participant's utterance; and
adjusting the audio stream to provide a sense of the received position of the third participant in the three-dimensional virtual space relative to the position of the virtual camera;
wherein the act of rendering comprises rendering the three-dimensional virtual space with an avatar for the third participant at the received location for display to the second participant.

60. The device of claim 59, wherein the operations further comprise:
receiving a location of the first participant in the 3D virtual space;
receiving an audio stream from a microphone of the first participant's device, the microphone being positioned to capture the first participant's utterance;
adjusting the audio stream to provide a sense of the received position of the first participant in the three-dimensional virtual space relative to the position of the virtual camera;
rendering the three-dimensional virtual space with an avatar for the third participant at the received location for display to the second participant; and
and adjusting the audio stream to provide a sense of the location of the mapped presentation screen relative to the location of the virtual camera when entering a presentation mode.

60. The device of claim 59, wherein the presentation stream is a video of the first participant.

60. The device of claim 59, wherein the presentation stream is the screen sharing of the first participant.

A system for presenting in a virtual conference involving a plurality of participants, comprising:
a processor coupled to a memory;
a network interface configured to receive (i) data specifying a three-dimensional virtual space, (ii) a position and direction in the three-dimensional virtual space, the position and direction being input by a first participant of the plurality of participants of the virtual conference, (iii) a video stream captured from a camera on the first participant's device, the camera being arranged to capture photographic images of the first participant, and (iv) a presentation stream from the first participant's device;
a mapper implemented on the processor and configured to map the video stream onto a three-dimensional model of an avatar and to map the presentation stream onto a three-dimensional model of a presentation screen; and
a renderer implemented on the processor and configured to render, from the viewpoint of a virtual camera of a second participant of the plurality of participants, the three-dimensional virtual space having the mapped avatar and the mapped presentation screen, for display to the second participant.

66. The system of claim 65, wherein the presentation stream is a video of the first participant.

66. The system of claim 65, wherein the presentation stream is the first participant's screen share.

A computer implemented method for presenting in a virtual conference involving a plurality of participants, comprising:
receiving, from a first device of a first participant of the plurality of participants of the virtual conference, (i) a position and direction in a three-dimensional virtual space, the position and direction being input by the first participant, (ii) a video stream captured from a camera on the first device, the camera arranged to capture photographic images of the first participant, and (iii) a presentation stream; and
transmitting the presentation stream to a second device of a second participant of the plurality of participants, wherein the second device is configured to (i) map the presentation stream onto a three-dimensional model of a presentation screen, (ii) map the video stream onto a three-dimensional model of an avatar, and (iii) render, from the viewpoint of a virtual camera of the second participant and for display to the second participant, the three-dimensional virtual space having the mapped three-dimensional model of the avatar, located at the position and oriented in the direction, and the mapped presentation screen.

69. The method of claim 68, wherein the presentation stream is a video of the first participant.

69. The method of claim 68, wherein the presentation stream is the first participant's screen share.

69. The method of claim 68, further comprising transmitting, to the second device, a web application having executable code specifying how the second device maps and renders the presentation screen.

A computer implemented method for providing a virtual conference involving a plurality of participants, comprising:
rendering, from the viewpoint of a virtual camera of a first user, a three-dimensional virtual space for display to the first user, the three-dimensional virtual space including an avatar having texture-mapped video of a second user, wherein the virtual camera is at a first position in the three-dimensional virtual space and the avatar is at a second position in the three-dimensional virtual space;
receiving an audio stream from a microphone of the second user's device, the microphone being arranged to capture the second user's utterance;
adjusting the volume of the received audio stream to determine a left audio stream and a right audio stream to provide a sense of where the second location is relative to the first location in the three-dimensional virtual space; and
and outputting the left audio stream and the right audio stream in stereo to be played back to the first user.

73. The method of claim 72, wherein the adjusting comprises adjusting the relative volumes of the left audio stream and the right audio stream based on a direction from the second location to the first location.

74. The method of claim 73, wherein the adjusting comprises adjusting the relative volumes of the left audio stream and the right audio stream based on a direction from the second location to the first location on a horizontal plane in the three-dimensional virtual space.

73. The method of claim 72, wherein the adjusting step comprises adjusting the relative volumes of the left audio stream and the right audio stream based on a direction of the second position relative to the first position on a horizontal plane in the three-dimensional virtual space.

73. The method of claim 72, wherein the adjusting step comprises adjusting, based on the direction the virtual camera is facing in the three-dimensional virtual space, the relative volumes of the left audio stream and the right audio stream such that the left audio stream tends to have a higher volume when the avatar is positioned to the left of the virtual camera and the right audio stream tends to have a higher volume when the avatar is positioned to the right of the virtual camera.
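
As a non-limiting illustration of the left/right adjustment recited above, the following Python sketch derives a pair of channel gains from the camera position, the direction the camera is facing, and the avatar position on the horizontal plane. The function name `stereo_gains`, the pan convention (0 faces +z, increasing toward +x), and the equal-power panning law are assumptions made for this example; the claims require only that the left stream tends to be louder when the avatar is to the left and the right stream louder when it is to the right.

```python
import math
from typing import Tuple


def stereo_gains(
    camera_xz: Tuple[float, float],
    camera_pan: float,
    avatar_xz: Tuple[float, float],
) -> Tuple[float, float]:
    """Return (left_gain, right_gain) for an avatar heard from the virtual camera."""
    dx = avatar_xz[0] - camera_xz[0]
    dz = avatar_xz[1] - camera_xz[1]
    dist = math.hypot(dx, dz)
    if dist == 0.0:
        side = 0.0  # avatar at the camera position: centered
    else:
        # Unit vector pointing to the camera's right (assumed convention).
        right_x, right_z = math.cos(camera_pan), -math.sin(camera_pan)
        # -1.0 = fully left, 0.0 = straight ahead or behind, +1.0 = fully right.
        side = (dx * right_x + dz * right_z) / dist
    side = max(-1.0, min(1.0, side))
    # Equal-power pan (an assumed choice): gains trace a quarter circle.
    theta = (side + 1.0) * math.pi / 4.0
    return math.cos(theta), math.sin(theta)
```

Multiplying the received audio samples by `left_gain` and `right_gain` would yield the left and right audio streams under these assumptions.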

77. The method of claim 76, wherein the adjusting step comprises adjusting the relative volumes of the left audio stream and the right audio stream based on an angle between the direction the virtual camera is facing and the direction the avatar is facing, such that the more nearly perpendicular the angle, the larger the volume difference between the left audio stream and the right audio stream tends to be.

73. The method of claim 72, wherein the adjusting step comprises adjusting the volume of both the first audio stream and the second audio stream based on a distance between the second location and the first location.
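
By way of example only, the distance-based adjustment recited above might be sketched in Python as a single gain applied equally to both channels. The inverse-distance roll-off used here, and the parameter names `ref_distance` and `rolloff`, are assumptions for this sketch (the formula has the same general form as the "inverse" distance model described for the Web Audio API); the claim states only that the volume of both streams depends on the distance.

```python
import math
from typing import Tuple


def distance_gain(
    camera_xz: Tuple[float, float],
    avatar_xz: Tuple[float, float],
    ref_distance: float = 1.0,
    rolloff: float = 1.0,
) -> float:
    """Return a gain in (0, 1] that falls off as the avatar moves away from the camera."""
    d = math.hypot(avatar_xz[0] - camera_xz[0], avatar_xz[1] - camera_xz[1])
    d = max(d, ref_distance)  # no boost when closer than the reference distance
    return ref_distance / (ref_distance + rolloff * (d - ref_distance))


def apply_distance(left: float, right: float, gain: float) -> Tuple[float, float]:
    """Scale both channel gains by the same distance gain."""
    return left * gain, right * gain
```

Because the same factor scales both channels, the left/right balance that conveys direction is preserved while overall loudness conveys distance.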

A non-transitory tangible computer readable device having stored thereon instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations for a presentation in a virtual conference including a plurality of participants, the operations comprising:
rendering, from the viewpoint of a first user's virtual camera, a three-dimensional virtual space for display to the first user, the three-dimensional virtual space including an avatar having texture-mapped video of a second user, wherein the virtual camera is at a first position in the three-dimensional virtual space and the avatar is at a second position in the three-dimensional virtual space;
receiving an audio stream from a microphone of the second user's device, the microphone being positioned to capture the second user's utterance;
adjusting the volume of the received audio stream to determine a left audio stream and a right audio stream to provide a sense of where the second location is relative to the first location in the three-dimensional virtual space; and
outputting the left audio stream and the right audio stream in stereo for playback to the first user.

80. The device of claim 79, wherein the adjusting comprises adjusting relative volumes of the left audio stream and the right audio stream based on a direction from the second location to the first location.

80. The device of claim 79, wherein the adjusting comprises adjusting relative volumes of the left audio stream and the right audio stream based on a direction from the second position to the first position on a horizontal plane in the three-dimensional virtual space.

80. The device of claim 79, wherein the adjusting operation comprises adjusting the relative volumes of the left audio stream and the right audio stream based on a direction of the second position relative to the first position on a horizontal plane in the three-dimensional virtual space.

83. The device of claim 82, wherein the adjusting operation comprises adjusting, based on the direction the virtual camera is facing in the three-dimensional virtual space, the relative volumes of the left audio stream and the right audio stream such that the left audio stream tends to have a higher volume when the avatar is positioned to the left of the virtual camera and the right audio stream tends to have a higher volume when the avatar is positioned to the right of the virtual camera.

84. The device of claim 83, wherein the adjusting operation comprises adjusting the relative volumes of the left audio stream and the right audio stream based on an angle between the direction the virtual camera is facing and the direction the avatar is facing, such that the more nearly perpendicular the angle, the larger the volume difference between the left audio stream and the right audio stream tends to be.

80. The device of claim 79, wherein the adjusting comprises adjusting the volume of both the first audio stream and the second audio stream based on a distance between the second location and the first location.

A system for providing a virtual conference involving a plurality of participants, comprising:
a processor coupled to a memory;
a renderer implemented on the processor and configured to render, from the viewpoint of a first user's virtual camera, a three-dimensional virtual space for display to the first user, the three-dimensional virtual space including an avatar having texture-mapped video of a second user, wherein the virtual camera is at a first position in the three-dimensional virtual space and the avatar is at a second position in the three-dimensional virtual space;
a network interface configured to receive an audio stream from a microphone of the second user's device, the microphone being arranged to capture speech of the second user;
an audio processor configured to adjust a volume of the received audio stream to determine a left audio stream and a right audio stream to provide a sense of where the second location is relative to the first location in the three-dimensional virtual space; and
and a stereo speaker outputting the left audio stream and the right audio stream in stereo to be reproduced to the first user.

87. The system of claim 86, wherein the audio processor is configured to adjust relative volumes of the left audio stream and the right audio stream based on a direction from the second location to the first location.

88. The system of claim 87, wherein the audio processor is configured to adjust relative volumes of the left audio stream and the right audio stream based on a direction from the second location to the first location on a horizontal plane within the three-dimensional virtual space.

87. The system of claim 86, wherein the audio processor is configured to adjust relative volumes of the left audio stream and the right audio stream based on a direction of the second location relative to the first location on a horizontal plane in the three-dimensional virtual space.

90. The system of claim 89, wherein the audio processor is configured to adjust, based on the direction the virtual camera is facing in the three-dimensional virtual space, the relative volumes of the left audio stream and the right audio stream such that the left audio stream tends to have a higher volume when the avatar is positioned to the left of the virtual camera and the right audio stream tends to have a higher volume when the avatar is positioned to the right of the virtual camera.

87. The system of claim 86, wherein the audio processor is configured to adjust the relative volumes of the left audio stream and the right audio stream based on an angle between the direction the virtual camera is facing and the direction the avatar is facing, such that the more nearly perpendicular the angle, the larger the volume difference between the left audio stream and the right audio stream tends to be.

92. The system of claim 86, wherein the audio processor is configured to adjust the volume of both the first audio stream and the second audio stream based on a distance between the second location and the first location.
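
As a non-limiting illustration, the left/right volume adjustment recited in the system claims above might be sketched as follows in Python. The function name `stereo_gains`, the equal-power pan law, the inverse-distance rolloff, and the convention that +x lies to the right of a virtual camera facing +z are all assumptions made for this sketch and are not taken from the specification.

```python
import math


def stereo_gains(camera_pos, camera_yaw, avatar_pos, rolloff=1.0):
    """Return (left_gain, right_gain) for an avatar's audio stream.

    camera_pos, avatar_pos: (x, z) positions on the horizontal plane.
    camera_yaw: direction the virtual camera faces, in radians, measured
                from the +z axis toward the +x axis (assumed convention).
    rolloff: hypothetical constant controlling how fast volume falls with distance.
    """
    dx = avatar_pos[0] - camera_pos[0]
    dz = avatar_pos[1] - camera_pos[1]
    distance = math.hypot(dx, dz)

    # Bearing of the avatar relative to the direction the camera is facing.
    bearing = math.atan2(dx, dz) - camera_yaw

    # pan is -1 when the avatar is directly to the left, +1 directly to the right,
    # and 0 when it is straight ahead or straight behind.
    pan = math.sin(bearing)

    # Overall volume decreases as the avatar moves away from the camera.
    overall = 1.0 / (1.0 + rolloff * distance)

    # Equal-power panning splits the overall volume between the two channels.
    theta = (pan + 1.0) * math.pi / 4.0
    return overall * math.cos(theta), overall * math.sin(theta)
```

With the camera at the origin facing +z, an avatar at (1, 0) yields a louder right channel, an avatar at (-1, 0) yields a louder left channel, and an avatar straight ahead yields equal channels.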

93. A computer-implemented method for providing audio for a virtual conference, comprising:
(a) rendering, from the viewpoint of a virtual camera of a first user, at least a portion of a three-dimensional virtual space for display to the first user, wherein the three-dimensional virtual space includes an avatar representing a second user, the virtual camera is at a first position in the three-dimensional virtual space, the avatar is at a second position in the three-dimensional virtual space, and the three-dimensional virtual space is divided into a plurality of areas;
(b) receiving an audio stream from a microphone of the second user's device, the microphone being arranged to capture the second user's utterance;
(c) determining whether the virtual camera and the avatar are located in the same area among the plurality of areas;
(d) determining whether the avatar is in a podium area of the plurality of areas;
(e) attenuating the audio stream when it is determined that the virtual camera and the avatar are not located in the same area and the avatar is not located in the podium area; and
(f) outputting the audio stream to be played back to the first user.
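
As a non-limiting illustration, the determinations of steps (c) and (d) and the attenuation of step (e) might be sketched as follows. The names `area_gain` and `apply_gain`, the string area identifiers, and the fixed attenuation value are hypothetical and chosen only for this sketch.

```python
def area_gain(camera_area, avatar_area, podium_areas, attenuation=0.1):
    """Return the gain applied to the avatar's audio stream.

    The stream is left at full volume when the avatar shares an area with
    the virtual camera or stands in a podium area; otherwise it is
    attenuated. The attenuation value is an illustrative assumption.
    """
    same_area = avatar_area == camera_area
    on_podium = avatar_area in podium_areas
    if same_area or on_podium:
        return 1.0
    return attenuation


def apply_gain(samples, gain):
    """Scale a sequence of audio samples by the given gain."""
    return [sample * gain for sample in samples]
```

For example, a speaker standing in a podium area is heard at full volume by a listener in another area, while a speaker in an ordinary adjoining area is attenuated.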

94. The method of claim 93, wherein the audio stream is a first audio stream, the three-dimensional virtual space includes a second avatar representing a third user, and the determining (c) includes determining that the virtual camera and the avatar are located in the same area, the method further comprising:
(g) receiving a second audio stream from a microphone of the first user's device, the microphone being arranged to capture speech of the first user;
(h) determining that the second avatar is located in an area of the three-dimensional virtual space different from the same area in which the virtual camera and the avatar are located; and
(i) attenuating the first audio stream and the second audio stream to prevent the third user from hearing the first audio stream and the second audio stream, thereby enabling a pseudo-private conversation between the first user and the second user.

95. The method of claim 93, wherein each area of the plurality of areas has a wall transmittance specifying how much the audio stream is attenuated in (e).

96. The method of claim 93, wherein each area of the plurality of areas has a distance transmittance, the method further comprising:
(g) determining a distance between the virtual camera and the avatar in the three-dimensional virtual space;
(h) determining at least one area between the virtual camera and the avatar; and
(i) attenuating the audio stream based on the distance determined in (g) and the distance transmittance corresponding to the at least one area determined in (h).
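
As a non-limiting illustration, attenuation based on a distance and a per-area distance transmittance might be sketched as follows. The exponential model, in which each unit of distance travelled through an area multiplies the volume by that area's transmittance, is an assumption of this sketch, as are the name `distance_gain` and the representation of the path as (area, length) segments.

```python
def distance_gain(segments, distance_transmittance):
    """Return the gain after sound travels through one or more areas.

    segments: list of (area_name, length) pairs describing how far the
              sound travels through each area between avatar and camera.
    distance_transmittance: maps an area name to the fraction of volume
              retained per unit of distance in that area (0.0 to 1.0).
    """
    gain = 1.0
    for area, length in segments:
        # Each unit of distance in this area scales the volume once more.
        gain *= distance_transmittance[area] ** length
    return gain
```

Under this model, two units of travel through an area whose transmittance is 0.5 leave one quarter of the original volume.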

97. The method of claim 93, wherein the plurality of areas are organized as a hierarchy.

98. The method of claim 97, wherein each area of the plurality of areas has a wall transmittance, the method further comprising:
(g) traversing the hierarchy to determine a subset of the plurality of areas between the area containing the avatar and the area containing the virtual camera; and
(h) attenuating the audio stream based on the respective wall transmittances corresponding to the subset of areas determined in (g).
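
As a non-limiting illustration, the traversal of step (g) and the attenuation of step (h) might be sketched as follows. The parent-pointer representation of the hierarchy, the names `ancestors`, `areas_between`, and `wall_gain`, and the rule that sound crosses the wall of every area it must leave or enter are assumptions of this sketch. The sketch also assumes the hierarchy has a single root.

```python
def ancestors(area, parent):
    """Return the chain from an area up to the root of the hierarchy."""
    chain = [area]
    while parent.get(area) is not None:
        area = parent[area]
        chain.append(area)
    return chain


def areas_between(avatar_area, camera_area, parent):
    """Return the areas whose walls lie between the avatar and the camera."""
    up = ancestors(avatar_area, parent)
    down = ancestors(camera_area, parent)
    down_set = set(down)
    # The first ancestor shared by both chains encloses both parties,
    # so its own wall is not crossed.
    common = next(area for area in up if area in down_set)
    return up[:up.index(common)] + down[:down.index(common)]


def wall_gain(avatar_area, camera_area, parent, wall_transmittance):
    """Multiply the wall transmittances of every area on the path."""
    gain = 1.0
    for area in areas_between(avatar_area, camera_area, parent):
        gain *= wall_transmittance[area]
    return gain
```

For two rooms on the same floor, only the two room walls are crossed. For rooms on different floors, the walls of both floors are crossed as well.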

99. A non-transitory tangible computer readable device having stored thereon instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations to provide audio for a virtual meeting, the operations comprising:
(a) rendering, from the viewpoint of a virtual camera of a first user, at least a portion of a three-dimensional virtual space for display to the first user, wherein the three-dimensional virtual space includes an avatar representing a second user, the virtual camera is at a first position in the three-dimensional virtual space, the avatar is at a second position in the three-dimensional virtual space, and the three-dimensional virtual space is divided into a plurality of areas;
(b) receiving an audio stream from a microphone of the second user's device, the microphone being positioned to capture the second user's utterance;
(c) determining whether the virtual camera and the avatar are located in the same area among the plurality of areas;
(d) determining whether the avatar is in a podium area among the plurality of areas;
(e) attenuating the audio stream when it is determined that the virtual camera and the avatar are not located in the same area and the avatar is not located in the podium area; and
(f) outputting the audio stream to be reproduced to the first user.

100. The device of claim 99, wherein the audio stream is a first audio stream, the three-dimensional virtual space includes a second avatar representing a third user, and the determining operation (c) includes determining that the virtual camera and the avatar are located in the same area, wherein the operations further comprise:
(g) receiving a second audio stream from a microphone of the first user's device, the microphone being positioned to capture speech of the first user;
(h) determining that the second avatar is located in an area of the three-dimensional virtual space different from the same area in which the virtual camera and the avatar are located; and
(i) attenuating the first audio stream and the second audio stream to prevent the third user from hearing the first audio stream and the second audio stream, thereby enabling a pseudo-private conversation between the first user and the second user.

101. The device of claim 99, wherein each area of the plurality of areas has a wall transmittance specifying how much the audio stream is attenuated in (e).

102. The device of claim 99, wherein each area of the plurality of areas has a distance transmittance, and the operations further comprise:
(g) determining a distance between the virtual camera and the avatar in the 3D virtual space;
(h) determining at least one area between the virtual camera and the avatar; and
(i) attenuating the audio stream based on the distance determined in (g) and the distance transmittance corresponding to the at least one area determined in (h).

103. The device of claim 99, wherein the plurality of areas are organized as a hierarchy.

104. The device of claim 103, wherein each area of the plurality of areas has a wall transmittance, and the operations further comprise:
(g) traversing the hierarchy to determine a subset of the plurality of areas between the area containing the avatar and the area containing the virtual camera; and
(h) attenuating the audio stream based on the respective wall transmittances corresponding to the subset of areas determined in (g).

105. A system for providing audio for a virtual conference, comprising:
a processor coupled to a memory;
a renderer implemented on the processor and configured to render, from the viewpoint of a virtual camera of a first user, at least a portion of a three-dimensional virtual space for display to the first user, wherein the three-dimensional virtual space includes an avatar representing a second user, the virtual camera is at a first position in the three-dimensional virtual space, the avatar is at a second position in the three-dimensional virtual space, and the three-dimensional virtual space is divided into a plurality of areas organized as a hierarchy;
a network interface configured to receive an audio stream from a microphone of the second user's device, the microphone being arranged to capture speech of the second user; and
an audio processor configured to determine whether the virtual camera and the avatar are in the same area of the plurality of areas and whether the avatar is in a podium area of the plurality of areas, to attenuate the audio stream when it is determined that the virtual camera and the avatar are not located in the same area and the avatar is not in the podium area, and to output the audio stream for playback to the first user.

106. The system of claim 105, wherein each area of the plurality of areas has a wall transmittance specifying how much the audio stream is attenuated.

107. The system of claim 105, wherein each area of the plurality of areas has a distance transmittance, and the audio processor is configured to: (i) determine a distance between the virtual camera and the avatar in the three-dimensional virtual space; (ii) determine at least one area between the virtual camera and the avatar; and (iii) attenuate the audio stream based on the determined distance and the distance transmittance corresponding to the determined at least one area.

108. The system of claim 105, wherein each area of the plurality of areas has a wall transmittance, and the audio processor is configured to traverse the hierarchy to determine a subset of the plurality of areas between the area containing the avatar and the area containing the virtual camera, and to attenuate the audio stream based on the respective wall transmittances corresponding to the determined subset of areas.

109. A computer-implemented method for streaming video for a virtual conference, comprising:
(a) determining a distance between a first user and a second user in the virtual meeting space;
(b) receiving a video stream captured from a camera on the device of the first user, the camera arranged to capture photographic images of the first user;
(c) selecting a reduced resolution or bitrate of the video stream based on the determined distance such that a closer distance results in a higher resolution or bitrate than a greater distance; and
(d) requesting transmission of the video stream at the reduced resolution or bitrate to the device of the second user for display to the second user within the virtual meeting space, wherein the video stream is mapped onto the first user's avatar within the virtual meeting space for display to the second user.
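
As a non-limiting illustration, the selection of step (c) might be sketched as a lookup over distance tiers. The tier boundaries, resolutions, bitrates, and the name `select_quality` are hypothetical values chosen only for this sketch.

```python
# Hypothetical tiers: (maximum distance in virtual-space units,
# (width, height) in pixels, bitrate in kbps), ordered from nearest to farthest.
QUALITY_TIERS = [
    (5.0, (1280, 720), 1500),
    (15.0, (640, 360), 600),
    (40.0, (320, 180), 150),
]


def select_quality(distance):
    """Return (resolution, bitrate_kbps) so that closer users get higher quality."""
    for max_distance, resolution, bitrate_kbps in QUALITY_TIERS:
        if distance <= max_distance:
            return resolution, bitrate_kbps
    # Beyond the last tier, fall back to the lowest quality in this sketch.
    _, resolution, bitrate_kbps = QUALITY_TIERS[-1]
    return resolution, bitrate_kbps
```

The returned values could then be included in the request of step (d) for the video stream at the reduced resolution or bitrate.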

110. The method of claim 109, further comprising:
(e) receiving a second video stream captured from a camera on a device of a third user, the camera arranged to capture photographic images of the third user;
(f) determining available bandwidth for transmitting data for the virtual conference to the second user;
(g) determining a second distance between the third user and the second user in the virtual meeting space; and
(h) apportioning the available bandwidth between the video stream received in (b) and the second video stream received in (e) based on the second distance determined in (g) relative to the distance determined in (a).

111. The method of claim 110, wherein the apportioning (h) comprises prioritizing video streams of closer users over video streams of more distant users.

112. The method of claim 110, further comprising:
(i) receiving a first audio stream from the device of the first user; and
(j) receiving a second audio stream from the device of the third user, wherein the apportioning (h) comprises reserving a portion of the available bandwidth for the first and second audio streams.

113. The method of claim 112, further comprising:
(k) reducing the quality of the first and second audio streams according to the size of the reserved portion.

114. The method of claim 113, wherein the reducing step (k) comprises reducing the quality independently of the second distance determined in (g) relative to the distance determined in (a).
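
As a non-limiting illustration, the apportioning of step (h), the reservation for audio, and the distance-independent audio quality reduction of step (k) might be sketched as follows. The inverse-distance weighting, the default audio share, the full audio bitrate, and the name `apportion_bandwidth` are assumptions of this sketch.

```python
def apportion_bandwidth(available_kbps, distances, audio_share=0.2, full_audio_kbps=32.0):
    """Split available bandwidth among audio and video streams.

    distances: maps a sender id to that sender's distance from the viewer.
    Returns (audio_kbps_per_stream, {sender: video_kbps}).
    """
    if not distances:
        return 0.0, {}
    count = len(distances)

    # Reserve a portion of the bandwidth for audio. Every audio stream gets
    # the same bitrate regardless of distance, reduced only when the
    # reserved portion is too small for full quality.
    reserved_kbps = available_kbps * audio_share
    audio_kbps = min(full_audio_kbps, reserved_kbps / count)

    # The remainder is shared among video streams, with closer senders
    # receiving a larger share.
    video_budget = available_kbps - audio_kbps * count
    weights = {sender: 1.0 / (1.0 + d) for sender, d in distances.items()}
    total = sum(weights.values())
    video_kbps = {sender: video_budget * w / total for sender, w in weights.items()}
    return audio_kbps, video_kbps
```

With 1000 kbps available and senders at distances 1 and 3, each audio stream keeps its full 32 kbps and the nearer sender receives twice the video bitrate of the farther one.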

115. The method of claim 109, further comprising:
(e) determining that the distance between the first user and the second user in the virtual meeting space makes display of video at the distance ineffective; and
in response to the determination in (e):
(f) stopping transmission of the video stream to the device of the second user; and
(g) notifying the device of the second user to replace the video stream with a still image.

116. The method of claim 109, wherein the video stream at the reduced resolution is mapped onto an avatar to be rendered at the second user's location within the virtual meeting space for display by the second user's device to the second user.

117. A non-transitory tangible computer readable device having stored thereon instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations for streaming video for a virtual meeting, the operations comprising:
(a) determining a distance between a first user and a second user in the virtual meeting space;
(b) receiving a video stream captured from a camera on the device of the first user, the camera arranged to capture photographic images of the first user;
(c) selecting a reduced resolution or bitrate of the video stream based on the determined distance such that a closer distance results in a higher resolution or bitrate than a greater distance; and
(d) requesting transmission of the video stream at the reduced resolution or bitrate to the device of the second user for display to the second user within the virtual meeting space, wherein the video stream is mapped onto the first user's avatar within the virtual meeting space for display to the second user.

118. The device of claim 117, wherein the operations further comprise:
(e) receiving a second video stream captured from a camera on a third user's device, the camera arranged to capture photographic images of the third user;
(f) determining an available bandwidth for transmitting data for the virtual conference to the second user;
(g) determining a second distance between the third user and the second user in the virtual meeting space; and
(h) apportioning the available bandwidth between the video stream received in (b) and the second video stream received in (e) based on the second distance determined in (g) relative to the distance determined in (a).

119. The device of claim 118, wherein the apportioning (h) comprises prioritizing video streams of closer users over video streams of more distant users.

120. The device of claim 118, wherein the operations further comprise:
(i) receiving a first audio stream from the device of the first user; and
(j) receiving a second audio stream from the device of the third user, wherein the apportioning (h) comprises reserving a portion of the available bandwidth for the first and second audio streams.

121. The device of claim 120, wherein the operations further comprise:
(k) reducing the quality of the first and second audio streams according to the size of the reserved portion.

122. The device of claim 121, wherein the reducing operation (k) comprises reducing the quality independently of the second distance determined in (g) relative to the distance determined in (a).

123. The device of claim 117, wherein the operations further comprise:
(e) determining that the distance between the first user and the second user in the virtual meeting space makes display of video at the distance ineffective; and
in response to the determination in (e):
(f) stopping transmission of the video stream to the device of the second user; and
(g) notifying the device of the second user to replace the video stream with a still image.

124. The device of claim 117, wherein the video stream at the reduced resolution is mapped onto an avatar to be rendered at the second user's location in the virtual meeting space for display by the second user's device to the second user.

125. A system for streaming video for a virtual conference, comprising:
a processor coupled to a memory;
a network interface configured to receive a video stream captured from a camera on a device of a first user, the camera arranged to capture photographic images of the first user; and
a stream coordinator configured to determine a distance between the first user and a second user in a virtual meeting space and to reduce the resolution of the video stream based on the determined distance such that a closer distance results in a greater resolution or bitrate than a greater distance,
wherein the network interface is configured to transmit the video stream at the reduced resolution to a device of the second user for display to the second user in the virtual meeting space, and wherein the video stream is mapped onto the first user's avatar for display to the second user in the virtual meeting space.

126. The system of claim 125, wherein the network interface is configured to receive a second video stream captured from a camera on a third user's device, the camera arranged to capture photographic images of the third user, and
the stream coordinator is configured to (i) determine an available bandwidth for transmitting data for the virtual meeting to the second user, (ii) determine a second distance between the third user and the second user in the virtual meeting space, and (iii) apportion the available bandwidth between the video stream and the second video stream based on the second distance relative to the distance.

127. The system of claim 126, wherein the stream coordinator is configured to prioritize video streams of closer users over video streams from more distant users.

128. The system of claim 126, wherein the network interface is configured to receive a first audio stream from the first user's device and to receive a second audio stream from the third user's device, and the stream coordinator is configured to reserve a portion of the available bandwidth for the first and second audio streams.

129. A computer-implemented method for streaming video for a virtual video conference, comprising:
receiving a three-dimensional model of a virtual environment;
receiving a mesh representing a three-dimensional model of an object;
receiving a video stream of a first participant of the virtual video conference, the video stream comprising a plurality of frames;
mapping respective frames of the plurality of frames of the video stream onto a three-dimensional model represented by a mesh to create an avatar navigable by the first participant, wherein the mesh is created independently of the video stream; and
rendering, from the viewpoint of a second participant's virtual camera, the mesh representing the mapped avatar and the three-dimensional model of the object in the virtual environment for display for the second participant.
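
As a non-limiting illustration, the mapping of each video frame onto a mesh that was created independently of the video stream might be sketched as a texture lookup through fixed texture coordinates. The `Mesh` class, the per-vertex nearest-neighbor sampling, and the convention that v = 1 is the top row of the frame are assumptions of this sketch. A practical renderer would typically interpolate texture coordinates across each triangle on graphics hardware.

```python
from dataclasses import dataclass


@dataclass
class Mesh:
    """Avatar geometry authored once, independently of any video frame."""
    vertices: list  # (x, y, z) positions
    uvs: list       # (u, v) texture coordinates in [0, 1], one per vertex


def sample_frame(frame, u, v):
    """Nearest-neighbor lookup of the frame pixel at texture coordinate (u, v)."""
    height = len(frame)
    width = len(frame[0])
    col = min(int(u * width), width - 1)
    row = min(int((1.0 - v) * height), height - 1)  # v = 1 is the top row
    return frame[row][col]


def map_frame_to_mesh(frame, mesh):
    """Return one color per vertex for the current video frame."""
    return [sample_frame(frame, u, v) for u, v in mesh.uvs]


def map_stream_to_mesh(frames, mesh):
    """Apply the same fixed texture coordinates to every frame of the stream."""
    for frame in frames:
        yield map_frame_to_mesh(frame, mesh)
```

Because the texture coordinates belong to the mesh rather than to the video, the same mesh can display any participant's stream without being rebuilt for each frame.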

130. The method of claim 129, wherein the object is a product and the second participant can navigate the virtual camera around the three-dimensional model of the product.

131. The method of claim 129, wherein the object is an advertisement.

132. The method of claim 129, further comprising demonstrating the product in the virtual meeting space.

133. The method of claim 129, wherein the virtual environment is a building.

134. The method of claim 129, wherein the mesh is a first mesh, the method further comprising sending a request for the first mesh from the second participant, wherein the receiving of the first mesh occurs in response to the request.

135. The method of claim 129, further comprising:
receiving a location and direction of the first user in the three-dimensional virtual space;
receiving a video stream captured from a camera on the device of the first user, the camera arranged to capture photographic images of the first user; and
mapping the video stream onto a three-dimensional model of an avatar, wherein the rendering comprises rendering the virtual meeting to include the mapped three-dimensional model of the avatar positioned at the location and oriented in the direction.

136. A non-transitory tangible computer readable device having stored thereon instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations for streaming video for a virtual meeting, the operations comprising:
receiving a 3D model of the virtual environment;
receiving a mesh representing a 3D model of an object;
receiving a video stream of a first participant of the virtual video conference, the video stream comprising a plurality of frames;
mapping respective frames of the plurality of frames of the video stream onto a three-dimensional model represented by a mesh to create an avatar navigable by the first participant, wherein the mesh is created independently of the video stream; and
rendering, from the viewpoint of a second participant's virtual camera, the mesh representing the mapped avatar and the three-dimensional model of the object in the virtual environment for display to the second participant.

137. The non-transitory tangible computer readable device of claim 136, wherein the object is a product and the second participant can navigate the virtual camera around the three-dimensional model of the product.

138. The non-transitory tangible computer readable device of claim 136, wherein the object is an advertisement.

139. The non-transitory tangible computer readable device of claim 137, wherein the operations further comprise demonstrating the product in the virtual meeting space.

140. The non-transitory tangible computer readable device of claim 136, wherein the virtual environment is a building.

141. The non-transitory tangible computer readable device of claim 136, wherein the mesh is a first mesh, and the operations further comprise sending a request for the first mesh from the second participant, wherein the receiving of the first mesh occurs in response to the request.

142. The non-transitory tangible computer readable device of claim 136, wherein the operations further comprise:
receiving a location and direction of the first user in the three-dimensional virtual space;
receiving a video stream captured from a camera on the device of the first user, the camera arranged to capture photographic images of the first user; and
mapping the video stream onto a three-dimensional model of an avatar, wherein the rendering comprises rendering the virtual meeting to include the mapped three-dimensional model of the avatar positioned at the location and oriented in the direction.

143. A system for streaming video for a virtual video conference, comprising:
a processor coupled to a memory;
a network interface configured to receive (i) a three-dimensional model of a virtual environment, (ii) a mesh representing a three-dimensional model of an object, and (iii) a video stream of a first participant of the virtual video conference, the video stream comprising a plurality of frames;
a mapper configured to map respective frames of the plurality of frames of the video stream onto a three-dimensional model represented by a mesh to create an avatar navigable by the first participant, wherein the mesh is created independently of the video stream; and
a renderer configured to render, from the viewpoint of a second participant's virtual camera, the mesh representing the mapped avatar and the three-dimensional model of the object in the virtual environment for display to the second participant.

144. The system of claim 143, wherein the object is a product and the second participant can navigate the virtual camera around the three-dimensional model of the product.

145. The system of claim 143, wherein the object is an advertisement.

146. The system of claim 143, wherein the virtual environment is a building.