KR102580110B1 - Web-based video conferencing virtual environment with navigable avatars and its applications

Info

Publication number: KR102580110B1
Application number: KR1020227039238A
Authority: KR
Inventors: 제라드 코넬리스 크롤; 에릭 스튜어트 브라운드
Original assignee: 카트마이 테크 인크.
Priority date: 2020-10-20
Filing date: 2021-10-20
Publication date: 2023-09-18
Also published as: JP2023139110A; IL298268B1; CA3181367A1; KR20220160699A; JP2023534092A; WO2022087147A1; BR112022024836A2; KR20230119261A; CN116018803A; IL308489A; AU2021366657A1; EP4122192A1; AU2023229565A1; IL298268A; CA3181367C; JP7318139B1; AU2021366657B2

Abstract

Disclosed herein is a web-based video conferencing system that allows video avatars to navigate within a virtual environment. The system has a presentation mode that allows a presentation stream to be texture mapped onto a presenter screen located within the virtual environment. Relative left-right sound is adjusted to provide a sense of an avatar's position in the virtual space. The sound is further adjusted based on the area in which the avatar is located and the area in which the virtual camera is located. Video stream quality is adjusted based on relative position in the virtual space. Three-dimensional modeling is available inside the virtual video conferencing environment.

Description

Web-based video conferencing virtual environment with navigable avatars and its applications

[Cross-reference to related applications]

This application claims priority to U.S. Utility Patent Application No. 17/075,338, filed October 20, 2020, now issued as U.S. Patent No. 10,979,672 on April 13, 2021; U.S. Utility Patent Application No. 17/198,323, filed March 11, 2021; U.S. Utility Patent Application No. 17/075,362, filed October 20, 2020, now issued as U.S. Patent No. 11,095,857 on August 17, 2021; U.S. Utility Patent Application No. 17/075,390, filed October 20, 2020, now issued as U.S. Patent No. 10,952,006 on March 16, 2021; U.S. Utility Patent Application No. 17/075,408, filed October 20, 2020, now issued as U.S. Patent No. 11,070,768 on July 20, 2021; U.S. Utility Patent Application No. 17/075,428, filed October 20, 2020, now issued as U.S. Patent No. 11,076,128 on July 27, 2021; and U.S. Utility Patent Application No. 17/075,454, filed October 20, 2020. The contents of each of these applications are incorporated herein by reference in their entirety.

[Technical field]

This field generally relates to video conferencing.

[Related art]

Video conferencing involves the reception and transmission of audio-video signals by users at different locations for communication between people in real time. Video conferencing is widely available on many computing devices from a variety of different services, including the ZOOM service available from Zoom Communications Inc. of San Jose, California. Some video conferencing software, such as the FaceTime application available from Apple Inc. of Cupertino, California, comes standard with mobile devices.

Typically, these applications work by displaying video and outputting audio of the other conference participants. When there are multiple participants, the screen may be divided into a number of rectangular frames, each displaying the video of one participant. Sometimes these services operate by having a larger frame that presents the video of the person speaking. As different individuals speak, that frame switches between speakers. The application captures video from a camera integrated with the user's device and audio from a microphone integrated with the user's device. The application then transmits that audio and video to other applications running on other users' devices.

Many of these video conferencing applications have a screen sharing capability. When a user decides to share their screen (or a portion of their screen), a stream carrying the contents of the screen is transmitted to the other users' devices. In some cases, other users can even control what is on the sharing user's screen. In this way, users can collaborate on a project or give a presentation to the other meeting participants.

Recently, video conferencing technology has gained importance. Many workplaces, trade shows, meetings, conferences, schools, and places of worship have closed or have encouraged people not to attend for fear of spreading disease, in particular COVID-19. Virtual conferences using video conferencing technology are increasingly replacing physical conferences. In addition, this technology provides advantages over meeting physically because it avoids travel and commuting.

Often, however, use of this video conferencing technology causes a loss of a sense of place. There is an experiential aspect to meeting in person, physically in the same place, that is lost when conferences are conducted virtually. There is a social aspect to being able to posture yourself and look at your peers. This feeling of experience is important in creating relationships and social connections. Yet this feeling is lacking in conventional video conferences.

Moreover, when a conference starts to include several participants, additional problems arise with these video conferencing technologies. In physical meetings, people can have side conversations. You can project your voice so that only people close to you can hear what you are saying. In some cases, you can even have private conversations in the context of a larger meeting. In virtual conferences, however, when multiple people are speaking at the same time, the software mixes the two audio streams substantially equally, causing the participants to talk over one another. Thus, when multiple people are involved in a virtual conference, private conversations are impossible, and the dialogue tends to take the form of speeches from one to many. Here, too, virtual conferences lose an opportunity for participants to create social connections, communicate more effectively, and network.

Moreover, due to limitations in network bandwidth and computing hardware, the performance of many video conferencing systems begins to degrade when a conference carries many streams. Many computing devices, while equipped to handle video streams from a few participants, are ill-equipped to handle video streams from a dozen or so participants. With many schools operating entirely virtually, a class of 25 can severely slow down school-issued computing devices.

Massively multiplayer online games (MMOGs or MMOs) can generally handle considerably more than 25 participants. These games often have hundreds or thousands of players on a single server. MMOs often allow players to navigate avatars around a virtual world. Sometimes these MMOs allow users to speak with one another or send messages to one another. Examples include the ROBLOX game available from Roblox Corporation of San Mateo, California, and the MINECRAFT game available from Mojang Studios of Stockholm, Sweden.

Having bare avatars interact with one another also has limitations in terms of social interaction. These avatars usually cannot convey the facial expressions that people often make inadvertently. Such facial expressions are observable in a video conference. Some publications may describe placing video on an avatar in a virtual world. However, these systems typically require specialized software and have other limitations that restrict their usefulness.

Improved methods are needed for video conferencing.

In an embodiment, a device enables video conferencing between a first user and a second user. The device includes a processor coupled to a memory, a display screen, a network interface, and a web browser. The network interface is configured to receive: (i) data specifying a three-dimensional virtual space, (ii) a position and direction in the three-dimensional virtual space, the position and direction input by the first user, and (iii) a video stream captured from a camera on a device of the first user. The first user's camera is positioned to capture photographic images of the first user. The web browser, implemented on the processor, is configured to download a web application from a server and execute the web application. The web application includes a texture mapper and a renderer. The texture mapper is configured to texture map the video stream onto a three-dimensional model of an avatar. The renderer is configured to render, from the perspective of a virtual camera of the second user and for display to the second user, the three-dimensional virtual space including the texture-mapped three-dimensional model of the avatar located at the position and oriented in the direction. By managing texture mapping within the web application, embodiments avoid the need to install specialized software.

In an embodiment, a computer-implemented method allows for a presentation in a virtual conference including a plurality of participants. In the method, data specifying a three-dimensional virtual space is received. A position and direction in the three-dimensional virtual space are also received. The position and direction were input by a first participant of the plurality of participants to the conference. A video stream captured from a camera on a device of the first participant is received as well. The camera was positioned to capture photographic images of the first participant. The video stream is texture mapped onto a three-dimensional model of an avatar. Additionally, a presentation stream is received from the device of the first participant. The presentation stream is texture mapped onto a three-dimensional model of a presentation screen. Finally, from the perspective of a virtual camera of a second participant of the plurality of participants, the three-dimensional virtual space with the texture-mapped avatar and the texture-mapped presentation screen is rendered for display to the second participant. In this way, embodiments allow for presentations in a social conference environment.

In an embodiment, a computer-implemented method provides audio for a virtual conference including a plurality of participants. In the method, a three-dimensional virtual space, including an avatar texture mapped with video of a second user, is rendered from the perspective of a virtual camera of a first user for display to the first user. The virtual camera is at a first position in the three-dimensional virtual space, and the avatar is at a second position in the three-dimensional virtual space. An audio stream from a microphone of a device of the second user is received. The microphone was positioned to capture speech of the second user. The volume of the received audio stream is adjusted to determine a left audio stream and a right audio stream that provide a sense of where the second position is in the three-dimensional virtual space relative to the first position. The left audio stream and the right audio stream are output to be played to the first user in stereo.
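The left-right volume adjustment described above can be sketched as follows. This is only an illustrative sketch, not the method claimed in the specification: the function name `stereo_gains`, the two-dimensional coordinates, the heading convention, and the constant-power pan law are all assumptions introduced here.

```python
import math


def stereo_gains(camera_pos, camera_pan, avatar_pos):
    """Return hypothetical (left_gain, right_gain) values in [0, 1].

    camera_pos, avatar_pos: (x, y) positions in the horizontal plane.
    camera_pan: camera heading in radians, counterclockwise from the +X axis
                (an assumed convention; +Y is then to the camera's left).
    """
    dx = avatar_pos[0] - camera_pos[0]
    dy = avatar_pos[1] - camera_pos[1]
    if dx == 0 and dy == 0:
        # Source coincides with the listener: play it centered.
        return (math.sqrt(0.5), math.sqrt(0.5))
    # Bearing of the avatar relative to where the camera faces;
    # a positive bearing means the avatar is to the left.
    bearing = math.atan2(dy, dx) - camera_pan
    # Pan value in [-1, 1]: -1 is fully left, +1 is fully right.
    pan = -math.sin(bearing)
    # Constant-power pan law (an assumption, chosen so that
    # left**2 + right**2 stays equal to 1 for every pan value).
    theta = (pan + 1.0) * math.pi / 4.0
    return (math.cos(theta), math.sin(theta))


# An avatar directly to the left of a camera facing +X.
left, right = stereo_gains((0.0, 0.0), 0.0, (0.0, 1.0))
print(round(left, 3), round(right, 3))
```

Each incoming audio sample would then be multiplied by the two gains to produce the left and right streams.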

In an embodiment, a computer-implemented method provides audio for a virtual conference. In the method, a three-dimensional virtual space, including an avatar texture mapped with video of a second user, is rendered from the perspective of a virtual camera of a first user for display to the first user. The virtual camera is at a first position in the three-dimensional virtual space, and the avatar is at a second position in the three-dimensional virtual space. An audio stream from a microphone of a device of the second user is received. Whether the virtual camera and the avatar are located in the same area from among a plurality of areas is determined. When the virtual camera and the avatar are determined not to be located in the same area, the audio stream is attenuated. The attenuated audio stream is output to be played to the first user. In this way, embodiments allow for private and side conversations in a virtual video conferencing environment.
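A minimal sketch of the area check follows, assuming rectangular areas and a fixed cross-area gain. The `Area` class, the helper names, and the gain of 0.1 are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Hypothetical axis-aligned rectangular area in the X-Y plane."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, pos):
        x, y = pos
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def find_area(areas, pos):
    """Return the first area containing pos, or None if no area does."""
    for area in areas:
        if area.contains(pos):
            return area
    return None


def attenuate_for_areas(samples, areas, camera_pos, avatar_pos,
                        cross_area_gain=0.1):
    """Attenuate audio samples when the camera and avatar are in different areas.

    cross_area_gain is an assumed value. Two positions that both lie
    outside every area are treated here as sharing the same (open) area.
    """
    if find_area(areas, camera_pos) == find_area(areas, avatar_pos):
        return list(samples)
    return [s * cross_area_gain for s in samples]


rooms = [Area("lobby", 0, 0, 10, 10), Area("booth", 12, 0, 16, 4)]
print(attenuate_for_areas([1.0, -0.5], rooms, (1, 1), (13, 1)))
```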

In an embodiment, a computer-implemented method efficiently streams video for a virtual conference. In the method, a distance between a first user and a second user in a virtual conference space is determined. A video stream captured from a camera on a device of the first user is received. The camera was positioned to capture photographic images of the first user. The resolution or bitrate of the video stream is reduced based on the determined distance, such that a closer distance results in a greater resolution than a farther distance. The video stream is transmitted at the reduced resolution or bitrate to a device of the second user for display to the second user within the virtual conference space. The video stream is texture mapped onto an avatar of the first user for display to the second user within the virtual conference space. In this way, embodiments allocate bandwidth and computing resources efficiently even when there are a large number of conference participants.
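One way to picture the distance-based reduction is a simple tier table, sketched below. The tier boundaries, the pixel heights, and the function names are hypothetical; the summary above only requires that closer avatars receive greater resolution than farther ones.

```python
import math

# Hypothetical tiers: (maximum distance in virtual-space units, video height in pixels).
DEFAULT_TIERS = ((5.0, 720), (15.0, 360), (40.0, 180))
FALLBACK_HEIGHT = 90  # assumed height used beyond the last tier


def avatar_distance(pos_a, pos_b):
    """Euclidean distance between two positions in the virtual space."""
    return math.dist(pos_a, pos_b)


def select_video_height(distance, tiers=DEFAULT_TIERS):
    """Pick a video height so that closer avatars get a higher resolution."""
    for max_distance, height in tiers:
        if distance <= max_distance:
            return height
    return FALLBACK_HEIGHT


d = avatar_distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))  # distance is 5.0
print(d, select_video_height(d))
```

The same lookup could return a target bitrate instead of a height, since the summary allows either quantity to be reduced.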

In an embodiment, a computer-implemented method allows for modeling in a virtual video conference. In the method, a three-dimensional model of a virtual environment, a mesh representing a three-dimensional model of an object, and a video stream from a participant of the virtual video conference are received. The video stream is texture mapped onto an avatar navigable by the participant. The texture-mapped avatar and the mesh representing the three-dimensional model of the object within the virtual environment are rendered for display.

System, device, and computer program product embodiments are also disclosed.

Further embodiments, features, and advantages of the invention, as well as the structure and operation of the various embodiments, are described in detail below with reference to the accompanying drawings.

The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate the present disclosure and, together with the description, further serve to explain the principles of the disclosure and to enable a person skilled in the relevant art to make and use the disclosure.
Figure 1 is a diagram illustrating an example interface that provides video conferencing in a virtual environment where video streams are mapped onto avatars.
Figure 2 is a diagram illustrating a three-dimensional model used to render a virtual environment with avatars for video conferencing.
Figure 3 is a diagram illustrating a system that provides video conferencing in a virtual environment.
Figures 4A-4C illustrate how data is transferred between the various components of the system in Figure 3 to provide video conferencing.
Figure 5 is a flowchart illustrating a method for adjusting relative left-right volume to provide a sense of position in a virtual environment during a video conference.
Figure 6 is a chart illustrating how volume rolls off as the distance between avatars increases.
Figure 7 is a flowchart illustrating a method for adjusting relative volume to provide different volume areas in a virtual environment during a video conference.
Figures 8A and 8B are diagrams illustrating different volume areas in a virtual environment during a video conference.
Figures 9A-9C are diagrams illustrating traversal of a hierarchy of volume areas in a virtual environment during a video conference.
Figure 10 illustrates an interface with a three-dimensional model in a three-dimensional virtual environment.
Figure 11 illustrates presentation screen sharing in a three-dimensional virtual environment used for video conferencing.
Figure 12 is a flowchart illustrating a method for apportioning available bandwidth based on the relative positions of avatars within a three-dimensional virtual environment.
Figure 13 is a chart illustrating how a priority value can fall off as the distance between avatars increases.
Figure 14 is a chart illustrating how allocated bandwidth can vary based on relative priority.
Figure 15 is a diagram illustrating components of devices used to provide video conferencing within a virtual environment.
The drawing in which an element first appears is typically indicated by the leftmost digit or digits in the corresponding reference number. In the drawings, like reference numbers may indicate identical or functionally similar elements.

Video conferencing using avatars in a virtual environment

Figure 1 is a diagram illustrating an example of an interface 100 that provides video conferencing in a virtual environment where video streams are mapped onto avatars.

Interface 100 may be displayed to a participant in a video conference. For example, interface 100 may be rendered for display to the participant and continuously updated as the video conference progresses. A user can control the orientation of their virtual camera using, for example, keyboard inputs. In this way, the user can navigate around the virtual environment. In an embodiment, different inputs can change the virtual camera's X and Y position and its pan and tilt angles in the virtual environment. In further embodiments, the user may use inputs to change the height (the Z coordinate) or yaw of the virtual camera. In still further embodiments, the user may enter inputs that cause the virtual camera to "hop" up and then return to its original position, simulating gravity. The inputs available to navigate the virtual camera may include, for example, keyboard and mouse inputs, such as the WASD keyboard keys to move the virtual camera forward, backward, left, and right in the X-Y plane, the space bar key to "hop" the virtual camera, and mouse movements specifying changes in the pan and tilt angles.
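The navigation inputs described above can be sketched as a small state update. The `VirtualCamera` class, the step size, the tilt clamp, and the heading convention are assumptions made for illustration only; the hop and height inputs are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class VirtualCamera:
    """Hypothetical camera state: X-Y position plus pan and tilt angles."""
    x: float = 0.0
    y: float = 0.0
    pan: float = 0.0   # radians, counterclockwise from the +X axis (assumed)
    tilt: float = 0.0  # radians, positive looks up (assumed)


STEP = 0.5               # assumed distance moved per key press
MAX_TILT = math.pi / 2   # assumed limit: straight up or straight down


def apply_key(cam, key):
    """Move the camera in the X-Y plane for a W, A, S, or D key press."""
    forward = (math.cos(cam.pan), math.sin(cam.pan))
    left = (-math.sin(cam.pan), math.cos(cam.pan))
    moves = {"w": (forward, 1.0), "s": (forward, -1.0),
             "a": (left, 1.0), "d": (left, -1.0)}
    if key in moves:
        (dx, dy), sign = moves[key]
        cam.x += sign * STEP * dx
        cam.y += sign * STEP * dy


def apply_mouse(cam, d_pan, d_tilt):
    """Apply mouse-driven changes to the pan and tilt angles."""
    cam.pan = (cam.pan + d_pan) % (2.0 * math.pi)
    cam.tilt = max(-MAX_TILT, min(MAX_TILT, cam.tilt + d_tilt))


cam = VirtualCamera()
apply_key(cam, "w")                    # one step along +X
apply_mouse(cam, math.pi / 2.0, 0.0)   # turn to face +Y
apply_key(cam, "w")                    # one step along +Y
print(round(cam.x, 3), round(cam.y, 3))
```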

Interface 100 includes avatars 102A and 102B, each of which represents a different participant in the video conference. Avatars 102A and 102B are texture mapped with video streams 104A and 104B from the devices of the first and second participants, respectively. A texture map is an image applied (mapped) to the surface of a shape or polygon. Here, the images are the respective frames of the video. The camera devices capturing video streams 104A and 104B are positioned to capture the faces of the respective participants. In this way, the avatars are texture mapped with moving images of the participants' faces as they speak and listen during the conference.
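To make the idea of applying a video frame to a surface concrete, the sketch below looks up the texel for a point on the surface from its texture coordinates. Nearest-neighbor sampling, the list-of-rows frame layout, and the convention that v = 1 is the top of the frame are assumptions; a browser-based renderer would normally delegate this lookup to the graphics hardware.

```python
def sample_texture(frame, u, v):
    """Nearest-neighbor lookup of a texel for texture coordinates (u, v).

    frame: list of rows, each row a list of pixel values (row 0 is the top).
    u, v:  coordinates in [0, 1]; v = 1 is taken to be the top of the frame
           (an assumed convention).
    """
    height = len(frame)
    width = len(frame[0])
    # Clamp the coordinates so out-of-range values map to the frame edge.
    u = max(0.0, min(1.0, u))
    v = max(0.0, min(1.0, v))
    col = min(int(u * width), width - 1)
    row = min(int((1.0 - v) * height), height - 1)
    return frame[row][col]


# A tiny 2x2 "video frame"; each new frame would simply replace this list.
frame = [["top-left", "top-right"],
         ["bottom-left", "bottom-right"]]
print(sample_texture(frame, 0.0, 1.0))
```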

Similar to how the virtual camera is controlled by the user viewing interface 100, the position and direction of avatars 102A and 102B are controlled by the respective participants that they represent. Avatars 102A and 102B are three-dimensional models represented by a mesh. Each of avatars 102A and 102B may have the participant's name displayed beneath it.

The respective avatars 102A and 102B are controlled by the various users. Each may be positioned at the point corresponding to where that user's own virtual camera is located within the virtual environment. Just as the user viewing interface 100 can move the virtual camera around, the various users can move their respective avatars 102A and 102B around.

The virtual environment rendered in interface 100 includes a background image 120 and a three-dimensional model 118 of an arena. The arena may be a venue or building in which the video conference is to take place. The arena may include a floor area bounded by walls. Three-dimensional model 118 may include a mesh and a texture. Other ways of mathematically representing the surface of three-dimensional model 118 may be possible as well. For example, polygon modeling, curve modeling, and digital sculpting may be possible. For example, three-dimensional model 118 may be represented by voxels, splines, geometric primitives, polygons, or any other possible representation in three-dimensional space. Three-dimensional model 118 may also include a specification of light sources. The light sources may include, for example, point, directional, spotlight, and ambient light sources. Objects may also have certain properties describing how they reflect light. In examples, the properties may include diffuse, ambient, and specular lighting interactions.

In addition to the arena, the virtual environment may include various other three-dimensional models that illustrate different components of the environment. For example, the three-dimensional environment may include a decorative model 114, a speaker model 116, and a presentation screen model 122. Like model 118, these models may be represented using any mathematical way of representing a geometric surface in three-dimensional space. These models may be separate from model 118 or combined with it into a single representation of the virtual environment.

Decorative models, such as model 114, serve to enhance realism and increase the aesthetic appeal of the arena. As will be described in greater detail below with respect to Figures 5 and 7, speaker model 116 may virtually emit sound, such as presentation audio and background music. Presentation screen model 122 may serve as an outlet for presenting a presentation. Video of the presenter or a presentation screen share may be texture mapped onto presentation screen model 122.

Button 108 may provide the user with a list of participants. In one example, after the user selects button 108, the user can chat with other participants by sending text messages, individually or as a group.

Button 110 may enable the user to change attributes of the virtual camera used to render interface 100. For example, the virtual camera may have a field of view specifying the angle over which data is rendered for display. Modeling data within the camera's field of view is rendered, while modeling data outside the camera's field of view may not be. By default, the virtual camera's field of view may be set somewhere between 60° and 110°, which is commensurate with a wide-angle lens and human vision. However, selecting button 110 may cause the virtual camera to increase its field of view to exceed 170°, commensurate with a fisheye lens. This may give the user broader peripheral awareness of their surroundings in the virtual environment.
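The effect of widening the field of view can be illustrated with a simplified horizontal check, sketched below. The function name and the two-dimensional treatment are assumptions; an actual renderer would typically test geometry against a full three-dimensional view frustum.

```python
import math


def in_horizontal_fov(camera_pos, camera_pan, fov_degrees, point):
    """Return True if point lies within the camera's horizontal field of view.

    camera_pos, point: (x, y) positions in the horizontal plane.
    camera_pan: heading in radians, counterclockwise from the +X axis (assumed).
    fov_degrees: full horizontal field of view in degrees.
    """
    dx = point[0] - camera_pos[0]
    dy = point[1] - camera_pos[1]
    if dx == 0 and dy == 0:
        return True  # a point at the camera position is treated as visible
    bearing = math.atan2(dy, dx) - camera_pan
    # Wrap the bearing into [-pi, pi] so that angles compare correctly.
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    return abs(bearing) <= math.radians(fov_degrees) / 2.0


# An object 45 degrees off-axis: outside an 80-degree view, inside a 175-degree one.
print(in_horizontal_fov((0.0, 0.0), 0.0, 80.0, (1.0, 1.0)))
print(in_horizontal_fov((0.0, 0.0), 0.0, 175.0, (1.0, 1.0)))
```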

Finally, button 112 causes the user to exit the virtual environment. Selecting button 112 may cause a notification to be sent to the devices of the other participants, signaling those devices to stop displaying the avatar corresponding to the user who was previously viewing interface 100.

In this way, the virtual 3D space of the interface is used to conduct video conferencing. Every user controls an avatar, which the user can direct to move around, look around, jump, or do other things that change its position or direction. A virtual camera shows the user the virtual 3D environment and the other avatars. The avatars of the other users have, as an integral part, a virtual display that shows the webcam image of the corresponding user.

By giving users a sense of space and allowing users to see each other's faces, embodiments provide a more social experience than conventional web conferencing or conventional MMO gaming. That more social experience has a variety of applications. For example, it can be used in online shopping. For example, interface 100 has applications in providing virtual grocery stores, houses of worship, trade shows, B2B sales, B2C sales, schooling, restaurants or lunchrooms, product releases, construction site visits (e.g., for architects, engineers, contractors), office spaces (e.g., people work "at their desks" virtually), controlling machinery remotely (ships, vehicles, planes, submarines, drones, drilling equipment, etc.), plant/factory control rooms, medical procedures, garden designs, guided virtual bus tours, music events (e.g., concerts), lectures (e.g., TED talks), meetings of political parties, board meetings, underwater research, research on hard-to-reach places, training for emergencies (e.g., fire), cooking, shopping (with checkout and delivery), virtual arts and crafts (e.g., painting and pottery), weddings, funerals, baptisms, remote sports training, counseling, treating fears (e.g., confrontation therapy), fashion shows, amusement parks, home decoration, watching sports, watching esports, watching performances captured using a three-dimensional camera, playing board and role-playing games, reviewing medical imagery, viewing geological data, learning languages, meeting in a space for the visually impaired, meeting in a space for the hearing impaired, participation in events by people who normally cannot walk or stand up, presenting the news or weather, talk shows, book signings, voting, MMOs, buying and selling virtual locations (such as those available in some MMOs like the SECOND LIFE game available from Linden Research, Inc. of San Francisco, CA), flea markets, garage sales, travel agencies, banks, archives, computer process management, fencing/sword fighting/martial arts, reenactments (e.g., reenacting a crime scene and/or accident), rehearsing a real event (e.g., a wedding, presentation, show, or spacewalk), evaluating or viewing a real event captured with three-dimensional cameras, livestock shows, zoos, experiencing life as a tall/short/blind/deaf/white/black person (e.g., a modified video stream or still image for the virtual world to simulate the perspective whose reactions a user wishes to experience), job interviews, game shows, interactive fiction (e.g., a murder mystery), virtual fishing, virtual sailing, psychological research, behavioral analysis, virtual sports (e.g., climbing/bouldering), controlling the lights and the like in a house or other location (domotics), memory palaces, archaeology, gift shops, virtual visits so that customers will be more comfortable on their real visit, virtual medical procedures that explain the procedures and help people feel more comfortable, virtual trading floors/financial marketplaces/stock markets (e.g., integrating real-time data and video feeds into the virtual world, real-time transactions and analytics), virtual locations that people have to go to as part of their work so that they actually meet each other organically (e.g., if a user wants to create an invoice, it is only possible from within the virtual location), augmented reality in which the face of a person is projected on top of that person's AR headset (or helmet) so that the person's facial expressions can be seen (e.g., useful for military, law enforcement, firefighters, and special operations), and making reservations (e.g., for a certain holiday home, car, etc.).

FIG. 2 is a diagram 200 illustrating a three-dimensional model used to render a virtual environment with avatars for video conferencing. As in FIG. 1, the virtual environment here includes a three-dimensional arena 118 and various three-dimensional models, including three-dimensional models 114 and 122. Also as in FIG. 1, diagram 200 includes avatars 102A and 102B navigating around the virtual environment.

As described above, interface 100 in FIG. 1 is rendered from the perspective of a virtual camera. That virtual camera is illustrated in diagram 200 as virtual camera 204. As mentioned above, the user viewing interface 100 in FIG. 1 can control virtual camera 204 and navigate it in three-dimensional space. Interface 100 is continuously updated according to the new position of virtual camera 204 and any changes to the models within its field of view. As described above, the field of view of virtual camera 204 may be a frustum defined, at least in part, by horizontal and vertical field-of-view angles.
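
To make the frustum idea concrete, the sketch below tests whether a point, expressed in camera coordinates, lies inside a frustum given by horizontal and vertical field-of-view angles. The `Frustum` type, the near and far distances, and the convention that the camera looks down the negative Z axis (common in WebGL code) are assumptions for illustration, not details taken from the specification.

```typescript
interface Vec3 { x: number; y: number; z: number; }

interface Frustum {
  hFovDeg: number; // horizontal field-of-view angle, degrees
  vFovDeg: number; // vertical field-of-view angle, degrees
  near: number;    // assumed near clipping distance
  far: number;     // assumed far clipping distance
}

// Point `p` is in camera space; the camera sits at the origin and is
// assumed to look down the negative Z axis.
function isInFrustum(p: Vec3, f: Frustum): boolean {
  const depth = -p.z;
  if (depth < f.near || depth > f.far) return false;
  // Half extents of the frustum cross-section at this depth.
  const halfWidth = depth * Math.tan((f.hFovDeg * Math.PI) / 360);
  const halfHeight = depth * Math.tan((f.vFovDeg * Math.PI) / 360);
  return Math.abs(p.x) <= halfWidth && Math.abs(p.y) <= halfHeight;
}
```

A renderer could use a test of this kind to skip models that cannot appear in the interface.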

As described above with respect to FIG. 1, a background image, or texture, may define at least part of the virtual environment. The background image may capture aspects of the virtual environment that are meant to appear at a distance. The background image may be a texture mapped onto a sphere 202. Virtual camera 204 may be at the origin of sphere 202. In this way, distant features of the virtual environment can be rendered efficiently.
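
One common way to map a background image onto a sphere centered on the camera is an equirectangular layout, in which each viewing direction corresponds to one texture coordinate. The sketch below assumes that layout, a camera looking down the negative Z axis with Y up, and texture coordinates in the range 0 to 1; the specification does not say which mapping is used, and `directionToUv` is a hypothetical name.

```typescript
interface Vec3 { x: number; y: number; z: number; }

// Convert a viewing direction from the sphere's center into (u, v)
// coordinates of an equirectangular background image (assumed layout).
function directionToUv(d: Vec3): { u: number; v: number } {
  const len = Math.sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
  if (len === 0) throw new Error("direction must be non-zero");
  const x = d.x / len;
  const y = d.y / len;
  const z = d.z / len;
  // Longitude around the vertical axis; the forward direction (-Z) maps to u = 0.5.
  const u = 0.5 + Math.atan2(x, -z) / (2 * Math.PI);
  // Latitude; straight up maps to v = 0 and straight down to v = 1.
  const v = 0.5 - Math.asin(y) / Math.PI;
  return { u, v };
}
```

Because the camera stays at the sphere's origin, the lookup depends only on viewing direction, which is what lets distant scenery be drawn without modeling it.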

In other embodiments, shapes other than sphere 202 may be used to texture map the background image. In various alternative embodiments, the shape may be a cylinder, a cube, a rectangular prism, or any other three-dimensional geometry.

FIG. 3 is a diagram illustrating a system 300 that provides video conferences in a virtual environment. System 300 includes a server 302 coupled to devices 306A and 306B via one or more networks 304.

Server 302 provides the services to connect a video conference session between devices 306A and 306B. As will be described in greater detail below, server 302 communicates notifications to the devices of conference participants (e.g., devices 306A and 306B) when new participants join the conference and when existing participants leave it. Server 302 communicates messages describing the position and direction, within the three-dimensional virtual space, of each participant's virtual camera. Server 302 also communicates video and audio streams between the participants' respective devices (e.g., devices 306A and 306B). Finally, server 302 stores data describing the three-dimensional virtual space and transmits it to the respective devices 306A and 306B.

In addition to the data necessary for the virtual conference, server 302 may provide executable information that instructs devices 306A and 306B on how to render the data to provide the interactive conference.

Server 302 receives requests and answers each with a response. Server 302 may be a web server. A web server is software and hardware that uses the Hypertext Transfer Protocol (HTTP) and other protocols to respond to client requests made over the World Wide Web. The main job of a web server is to display website content by storing, processing, and delivering web pages to users.

In an alternative embodiment, communication between devices 306A and 306B happens not through server 302 but on a peer-to-peer basis. In that embodiment, one or more of the data describing the respective participants' position and direction, the notifications regarding new and existing participants, and the video and audio streams of the respective participants are communicated directly between devices 306A and 306B rather than through server 302.

Network 304 enables communication between the various devices 306A and 306B and server 302. Network 304 may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless wide area network (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, a wireless network, a WiFi network, a WiMax network, any other type of network, or any combination of two or more such networks.

Devices 306A and 306B are the devices of respective participants in the virtual conference. Devices 306A and 306B each receive the data necessary to conduct the virtual conference and render the data necessary to provide it. As will be described in greater detail below, devices 306A and 306B include a display to present the rendered conference information, inputs that allow the user to control the virtual camera, a speaker (such as a headset) to provide audio to the user for the conference, a microphone to capture the user's voice input, and a camera positioned to capture video of the user's face.

Devices 306A and 306B can be any type of computing device, including a laptop, a desktop, a smartphone, a tablet computer, or a wearable computer (such as a smartwatch or an augmented reality or virtual reality headset).

Web browsers 308A and 308B can retrieve a network resource (such as a web page) addressed by a link identifier (such as a uniform resource locator, or URL) and present the network resource for display. In particular, web browsers 308A and 308B are software applications for accessing information on the World Wide Web. Usually, web browsers 308A and 308B make this request using the Hypertext Transfer Protocol (HTTP or HTTPS). When a user requests a web page from a particular website, the web browser retrieves the necessary content from a web server, interprets and executes the content, and then displays the page on a display of device 306A or 306B, shown as client/counterpart conference applications 308A and 308B. In examples, the content may have HTML and client-side scripting, such as JavaScript. Once displayed, a user can input information and make selections on the page, which can cause web browsers 308A and 308B to make further requests.

Conference applications 310A and 310B may be web applications downloaded from server 302 and configured to be executed by the respective web browsers 308A and 308B. In an embodiment, conference applications 310A and 310B may be JavaScript applications. In one example, conference applications 310A and 310B may be written in a higher-level language, such as the TypeScript language, and translated or compiled into JavaScript. Conference applications 310A and 310B are configured to interact with the WebGL JavaScript application programming interface. They may have control code specified in JavaScript and shader code written in the OpenGL ES Shading Language (GLSL ES). Using the WebGL API, conference applications 310A and 310B may be able to utilize a graphics processing unit (not shown) of devices 306A and 306B. Moreover, this allows rendering of interactive two-dimensional and three-dimensional graphics without the use of plug-ins.

Conference applications 310A and 310B receive, from server 302, the data describing the position and direction of other avatars and the three-dimensional modeling information describing the virtual environment. In addition, conference applications 310A and 310B receive video and audio streams of other conference participants from server 302.

Conference applications 310A and 310B render three-dimensional modeling data, including data describing the three-dimensional environment and data representing the respective participant avatars. This rendering may involve rasterization, texture mapping, ray tracing, shading, or other rendering techniques. In an embodiment, the rendering may involve ray tracing based on the characteristics of the virtual camera. Ray tracing involves generating an image by tracing a path of light as pixels in an image plane and simulating the effects of its encounters with virtual objects. In some embodiments, to enhance realism, the ray tracing may simulate optical effects such as reflection, refraction, scattering, and dispersion.
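
The ray tracing step can be illustrated with a deliberately small sketch: one ray is generated per pixel from the virtual camera's field of view, and the ray is tested against a single sphere standing in for a virtual object. The function names, the camera convention (at the origin, looking down the negative Z axis), and the choice of a sphere as the only object are assumptions for illustration; shading, reflection, and refraction are omitted.

```typescript
interface Vec3 { x: number; y: number; z: number; }

// Unit direction of the ray through the center of pixel (px, py).
// The camera is assumed to sit at the origin looking down -Z.
function primaryRay(px: number, py: number, width: number, height: number, vFovDeg: number): Vec3 {
  const aspect = width / height;
  const scale = Math.tan((vFovDeg * Math.PI) / 360);
  const x = ((2 * (px + 0.5)) / width - 1) * aspect * scale;
  const y = (1 - (2 * (py + 0.5)) / height) * scale;
  const len = Math.sqrt(x * x + y * y + 1);
  return { x: x / len, y: y / len, z: -1 / len };
}

// Distance from the origin to the nearest intersection of the ray with a
// sphere, or null on a miss. Simplification: a camera located inside the
// sphere is reported as a miss.
function hitSphere(dir: Vec3, center: Vec3, radius: number): number | null {
  // Solve |t * dir - center|^2 = radius^2 for t, with |dir| = 1.
  const b = dir.x * center.x + dir.y * center.y + dir.z * center.z;
  const c = center.x * center.x + center.y * center.y + center.z * center.z - radius * radius;
  const disc = b * b - c;
  if (disc < 0) return null;
  const t = b - Math.sqrt(disc);
  return t > 0 ? t : null;
}
```

A full renderer would repeat this per pixel against every object and then compute a color at the nearest hit.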

In this way, the user uses web browsers 308A and 308B to enter a virtual space. The scene is displayed on the screen of the user. The webcam video stream and microphone audio stream of the user are sent to server 302. When other users enter the virtual space, an avatar model is created for them. The position of this avatar is sent to the server and received by the other users. Other users also get a notification from server 302 that an audio/video stream is available. The video stream of a user is placed on the avatar that was created for that user. The audio stream is played back as coming from the position of the avatar.
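
One way to make audio seem to come from an avatar's position is to weight the left and right channels by where the avatar sits relative to the listener's viewing direction. The sketch below uses equal-power panning on the horizontal plane. The `Listener` type, the function name, the heading convention (radians, counterclockwise from the +X axis, so +Y is to the left when facing +X), and the panning law itself are assumptions; the specification describes its own sound processing elsewhere.

```typescript
interface Listener {
  x: number;
  y: number;
  headingRad: number; // viewing direction, counterclockwise from +X (assumed)
}

// Left/right channel gains for a sound source at (sourceX, sourceY).
// Sources directly ahead of or directly behind the listener are centered.
function stereoGains(listener: Listener, sourceX: number, sourceY: number): { left: number; right: number } {
  const bearing = Math.atan2(sourceY - listener.y, sourceX - listener.x);
  const relative = bearing - listener.headingRad;
  const pan = -Math.sin(relative); // -1 = fully left, +1 = fully right
  const angle = ((pan + 1) * Math.PI) / 4; // equal-power panning curve
  return { left: Math.cos(angle), right: Math.sin(angle) };
}
```

In a browser, a pan value computed this way could plausibly drive the `pan` parameter of a Web Audio `StereoPannerNode`, though that wiring is not shown here.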

FIGS. 4A-4C illustrate how data is transferred between the various components of the system in FIG. 3 to provide video conferencing. Like FIG. 3, each of FIGS. 4A-4C depicts the connection between server 302 and devices 306A and 306B. In particular, FIGS. 4A-4C illustrate example data flows between those devices.

FIG. 4A shows a diagram 400 illustrating how server 302 transmits data describing the virtual environment to devices 306A and 306B. In particular, both devices 306A and 306B receive from server 302 the three-dimensional arena 404, background texture 402, spatial hierarchy 408, and any other three-dimensional modeling information 406.

As described above, background texture 402 is an image illustrating distant features of the virtual environment. The image may be regular (such as a brick wall) or irregular. Background texture 402 may be encoded in any common image file format, such as bitmap, JPEG, GIF, or another image file format. It describes the background image to be rendered against, for example, a distant sphere.

Three-dimensional arena 404 is a three-dimensional model of the space in which the conference is to take place. As described above, it may include, for example, a mesh and possibly its own texture information to be mapped onto the three-dimensional primitives that it describes. It may define the space in which the virtual camera and the respective avatars can navigate within the virtual environment. Accordingly, it may be bounded by edges (such as walls or fences) that show users the perimeter of the navigable virtual environment.

Spatial hierarchy 408 is data specifying partitions in the virtual environment. These partitions are used to determine how sound is processed before being transferred between participants. As will be described below, this partition data may be hierarchical and may describe sound processing that allows for areas where participants in the virtual conference can have private conversations or side conversations.
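
As a rough illustration of hierarchical partition data, the sketch below models each partition as an axis-aligned rectangle on the horizontal plane with nested child partitions, and finds which partitions contain a given position. The `Partition` shape, the rectangular bounds, and both function names are assumptions made for this example; the actual structure of spatial hierarchy 408 and its sound processing are described later in the specification.

```typescript
interface Partition {
  name: string;
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
  children: Partition[];
}

function contains(p: Partition, x: number, y: number): boolean {
  return x >= p.minX && x <= p.maxX && y >= p.minY && y <= p.maxY;
}

// Names of the partitions containing (x, y), outermost first.
// Returns an empty array when the point lies outside the root.
function partitionPath(root: Partition, x: number, y: number): string[] {
  if (!contains(root, x, y)) return [];
  for (const child of root.children) {
    const sub = partitionPath(child, x, y);
    if (sub.length > 0) return [root.name, ...sub];
  }
  return [root.name];
}

// True when both positions fall in the same innermost partition, which a
// client might treat as "able to converse without attenuation".
function inSamePartition(root: Partition, ax: number, ay: number, bx: number, by: number): boolean {
  const a = partitionPath(root, ax, ay);
  const b = partitionPath(root, bx, by);
  return a.length > 0 && a.length === b.length && a.every((name, i) => name === b[i]);
}
```

Under these assumptions, two avatars inside the same nested room would share a path such as arena, room, booth, while an avatar in the open arena would not.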

Three-dimensional model 406 is any other three-dimensional modeling information needed to conduct the conference. In one embodiment, this may include information describing the respective avatars. Alternatively or additionally, this information may include product demonstrations.

With the information needed to conduct the conference sent to the participants, FIGS. 4B and 4C illustrate how server 302 forwards information from one device to another. FIG. 4B shows a diagram 420 illustrating how server 302 receives information from respective devices 306A and 306B, and FIG. 4C shows a diagram 420 illustrating how server 302 transmits the information to respective devices 306B and 306A. In particular, device 306A transmits position and direction 422A, video stream 424A, and audio stream 426A to server 302, which transmits position and direction 422A, video stream 424A, and audio stream 426A to device 306B. Likewise, device 306B transmits position and direction 422B, video stream 424B, and audio stream 426B to server 302, which transmits position and direction 422B, video stream 424B, and audio stream 426B to device 306A.
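
The forwarding pattern of FIGS. 4B and 4C can be sketched as a small relay that passes each participant's messages to every other participant and announces arrivals and departures. The `RelayHub` class, its method names, and the JSON message shapes are hypothetical; the sketch covers only position and direction data and notifications, and leaves out the video and audio streams.

```typescript
// A transport-agnostic "send" callback; in practice this might wrap a socket.
type Send = (message: string) => void;

class RelayHub {
  private readonly peers: Record<string, Send> = {};

  // Register a participant and tell the existing participants about it.
  join(id: string, send: Send): void {
    this.broadcast(id, { type: "joined", id });
    this.peers[id] = send;
  }

  // Remove a participant and tell the remaining participants.
  leave(id: string): void {
    if (!(id in this.peers)) return;
    delete this.peers[id];
    this.broadcast(id, { type: "left", id });
  }

  // Forward one participant's position and direction to all the others.
  relayPose(fromId: string, pose: unknown): void {
    this.broadcast(fromId, { type: "pose", id: fromId, pose });
  }

  private broadcast(exceptId: string, message: object): void {
    const text = JSON.stringify(message);
    for (const id of Object.keys(this.peers)) {
      if (id !== exceptId) this.peers[id](text);
    }
  }
}
```

The sender is excluded from each broadcast, mirroring how 422A goes to device 306B and 422B goes to device 306A.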

Position and direction 422A and 422B describe the position and direction of the virtual camera for the users of devices 306A and 306B, respectively. As described above, the position may be a coordinate in three-dimensional space (e.g., an x, y, z coordinate), and the direction may be a direction in three-dimensional space (e.g., pan, tilt, roll). In some embodiments, the user may be unable to control the virtual camera's roll, so the direction may specify only pan and tilt angles. Similarly, in some embodiments, the user may be unable to change the avatar's z coordinate (because the avatar is bounded by virtual gravity), so the z coordinate may be unnecessary. In this way, position and direction 422A and 422B may each include at least a coordinate on a horizontal plane in the three-dimensional virtual space and pan and tilt values. Alternatively or additionally, the user may be able to make the avatar "jump," so the z position may be specified only by an indication of whether the user is jumping the avatar.
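
A reduced position-and-direction message of the kind described above might carry only a horizontal coordinate, pan and tilt angles, and a jump indication. The sketch below encodes such a message as a compact JSON array and validates it on receipt. The `Pose` type, its field names, the use of degrees, and the JSON encoding are all assumptions for illustration rather than the format used by the described system.

```typescript
interface Pose {
  x: number;        // horizontal-plane coordinate
  y: number;        // horizontal-plane coordinate
  panDeg: number;   // rotation about the vertical axis
  tiltDeg: number;  // up/down look angle
  jumping: boolean; // stands in for an explicit z coordinate
}

function encodePose(p: Pose): string {
  return JSON.stringify([p.x, p.y, p.panDeg, p.tiltDeg, p.jumping ? 1 : 0]);
}

// Parse and validate a message produced by encodePose.
function decodePose(text: string): Pose {
  const v: unknown = JSON.parse(text);
  if (!Array.isArray(v) || v.length !== 5 || !v.every((n) => typeof n === "number" && isFinite(n))) {
    throw new Error("malformed pose message");
  }
  return { x: v[0], y: v[1], panDeg: v[2], tiltDeg: v[3], jumping: v[4] === 1 };
}
```

Omitting roll and z keeps each update small, which matters because these messages are sent whenever a participant moves or looks around.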

In different examples, position and direction 422A and 422B may be transmitted and received using HTTP request responses or using socket messaging.

Video streams 424A and 424B are video data captured from a camera of the respective devices 306A and 306B. The video may be compressed. For example, the video may use any commonly known video codec, including MPEG-4, VP8, or H.264. The video may be captured and transmitted in real time.

Similarly, audio streams 426A and 426B are audio data captured from a microphone of the respective devices. The audio may be compressed. For example, the audio may use any commonly known audio codec, including MPEG-4 or Vorbis. The audio may be captured and transmitted in real time. Video stream 424A and audio stream 426A are captured, transmitted, and presented synchronously with one another. Similarly, video stream 424B and audio stream 426B are captured, transmitted, and presented synchronously with one another.

Video streams 424A and 424B and audio streams 426A and 426B may be transmitted using the WebRTC application programming interface. WebRTC is an API available in JavaScript. As described above, devices 306A and 306B download and run web applications, as conference applications 310A and 310B, and conference applications 310A and 310B may be implemented in JavaScript. Conference applications 310A and 310B may use WebRTC to receive and transmit video streams 424A and 424B and audio streams 426A and 426B by making API calls from their JavaScript.

As mentioned above, when a user leaves the virtual conference, this departure is communicated to all other users. For example, if device 306A exits the virtual conference, server 302 communicates that departure to device 306B. Consequently, device 306B stops rendering the avatar corresponding to device 306A, removing the avatar from the virtual space. Additionally, device 306B stops receiving video stream 424A and audio stream 426A.

As described above, conference applications 310A and 310B may periodically or intermittently re-render the virtual space based on new information from the respective video streams 424A and 424B, position and direction 422A and 422B, and new information relating to the three-dimensional environment. For simplicity, each of these updates is now described from the perspective of device 306A. However, a skilled artisan would understand that device 306B would behave similarly given similar changes.

When device 306A receives video stream 424B, device 306A texture maps frames from video stream 424B onto the avatar corresponding to device 306B. That texture-mapped avatar is re-rendered within the three-dimensional virtual space and presented to the user of device 306A.

When device 306A receives a new position and direction 422B, device 306A generates the avatar corresponding to device 306B positioned at the new position and oriented in the new direction. The generated avatar is re-rendered within the three-dimensional virtual space and presented to the user of device 306A.

In some embodiments, server 302 may send updated model information describing the three-dimensional virtual environment. For example, server 302 may send updated information 402, 404, 406, or 408. When that happens, device 306A re-renders the virtual environment based on the updated information. This may be useful when the environment changes over time. For example, an outdoor event may change from daylight to dusk as the event progresses.

Again, when device 306B exits the virtual conference, server 302 sends a notification to device 306A indicating that device 306B is no longer participating in the conference. In that case, device 306A re-renders the virtual environment without the avatar for device 306B.

While FIG. 3 and FIGS. 4A-4C are illustrated with two devices for simplicity, a skilled artisan would understand that the techniques described herein can be extended to any number of devices. Also, while FIG. 3 and FIGS. 4A-4C illustrate a single server 302, a skilled artisan would understand that the functionality of server 302 can be spread out among a plurality of computing devices. In one embodiment, the data transferred in FIG. 4A may come from one network address for server 302, while the data transferred in FIGS. 4B and 4C may be transferred to and from another network address for server 302.

In one embodiment, participants can set their webcam, microphone, speakers, and graphical settings before entering the virtual conference. In an alternative embodiment, after starting the application, users may enter a virtual lobby where they are greeted by an avatar controlled by a real person. This attendant is able to view and modify the user's webcam, microphone, speakers, and graphical settings. The attendant can also instruct the user on how to use the virtual environment, for example by teaching the user about looking, moving around, and interacting. When ready, the user automatically leaves the virtual waiting room and joins the real virtual environment.

Adjusting Volume for Video Conferencing in a Virtual Environment

Embodiments also adjust volume to provide a sense of position and space within the virtual conference. This is illustrated, for example, in FIGS. 5-7, 8A-8B, and 9A-9C, each of which is described below.

FIG. 5 is a flowchart illustrating a method 500 for adjusting relative left-right volume to provide a sense of position in a virtual environment during a video conference.

At step 502, volume is adjusted based on the distance between the avatars. As described above, an audio stream from a microphone of another user's device is received. The volume of both the first and second audio streams is adjusted based on the distance between the second position and the first position. This is illustrated in FIG. 6.

FIG. 6 shows a chart 600 illustrating how volume rolls off as the distance between the avatars increases. Chart 600 plots volume 602 on its y-axis against distance on its x-axis. As the distance between the users increases, the volume stays constant until a reference distance 602 is reached. At that point, the volume begins to drop off. In this way, all other things being equal, a closer user will generally sound louder than a farther user.

How quickly the volume drops off depends on a roll-off factor. This may be a coefficient built into the settings of the video conferencing system or the client device. As illustrated by line 608 and line 610, a greater roll-off factor causes the volume to fall off more rapidly than a lesser one.
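
The roll-off behavior described for chart 600 can be sketched as follows. The description does not specify a formula; this sketch assumes the inverse distance model used by the Web Audio API, and the function name `distance_gain` and its default values are hypothetical.

```python
def distance_gain(distance: float,
                  reference_distance: float = 1.0,
                  rolloff_factor: float = 1.0) -> float:
    """Return a volume gain in (0, 1] for a source `distance` units away.

    Assumption: inverse distance model (not stated in the description).
    """
    if reference_distance <= 0:
        raise ValueError("reference_distance must be positive")
    if rolloff_factor < 0:
        raise ValueError("rolloff_factor must be non-negative")
    # Inside the reference distance the volume stays constant.
    excess = max(distance, reference_distance) - reference_distance
    # Beyond it, a larger roll-off factor makes the gain fall faster.
    return reference_distance / (reference_distance + rolloff_factor * excess)
```

With a reference distance of 1, a source 3 units away gets a gain of 1/3 at a roll-off factor of 1 and 1/5 at a roll-off factor of 2, which mirrors the relationship between the two lines in the chart.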

Returning to FIG. 5, at step 504, relative left-right audio is adjusted based on the direction in which the avatar is located. That is, the volume of the audio output on the user's speakers (e.g., a headset) varies to provide a sense of where the speaking user's avatar is located. The relative volume of the left and right audio streams is adjusted based on the direction of the position where the user generating the audio stream is located (e.g., the position of the speaking user's avatar) relative to the position where the user receiving the audio is located (e.g., the position of the virtual camera). The positions may be on a horizontal plane within the three-dimensional virtual space. The relative volume of the left and right audio streams is adjusted to provide a sense of where the second position is in the three-dimensional virtual space relative to the first position.

For example, at step 504, audio corresponding to an avatar to the left of the virtual camera is adjusted so that it is output at a higher volume in the receiving user's left ear than in the right ear. Similarly, audio corresponding to an avatar to the right of the virtual camera is adjusted so that it is output at a higher volume in the receiving user's right ear than in the left ear.
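
The left-right adjustment of step 504 might be sketched as below. This is an illustration under stated assumptions, not the method of the description: positions are 2D points on the horizontal plane, the camera's right-hand direction is taken as its facing vector rotated 90 degrees clockwise, and an equal-power pan law is used. The name `stereo_gains_for_position` is hypothetical.

```python
import math


def stereo_gains_for_position(camera_pos, camera_forward, source_pos):
    """Return (left_gain, right_gain) for a source seen from the camera.

    All arguments are (x, y) tuples on the horizontal plane.
    Assumption: equal-power panning; the description names no pan law.
    """
    dx = source_pos[0] - camera_pos[0]
    dy = source_pos[1] - camera_pos[1]
    dist = math.hypot(dx, dy)
    flen = math.hypot(camera_forward[0], camera_forward[1])
    if dist == 0 or flen == 0:
        pan = 0.0  # no usable direction: keep the source centered
    else:
        fx, fy = camera_forward[0] / flen, camera_forward[1] / flen
        right_x, right_y = fy, -fx  # facing vector rotated clockwise
        # pan is -1 for fully left, 0 for ahead or behind, +1 for fully right
        pan = (dx * right_x + dy * right_y) / dist
    theta = (pan + 1.0) * math.pi / 4.0
    return math.cos(theta), math.sin(theta)
```

A source straight ahead and a source straight behind both map to the center in this sketch; distinguishing them would need an additional cue that the description does not cover.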

At step 506, relative left-right audio is adjusted based on the direction in which one avatar is oriented relative to the other. The relative volume of the left and right audio streams is adjusted based on the angle between the direction the virtual camera is facing and the direction the avatar is facing, such that an angle closer to perpendicular tends to produce a greater difference in volume between the left and right audio streams.

For example, when the avatar is directly facing the virtual camera, the relative left-right volume of the avatar's corresponding audio stream may not be adjusted at all in step 506. When the avatar is facing the left side of the virtual camera, the relative left-right volume of the avatar's corresponding audio stream may be adjusted so that left is louder than right. And when the avatar is facing the right side of the virtual camera, the relative left-right volume of the avatar's corresponding audio stream may be adjusted so that right is louder than left.

In one example, the calculation in step 506 may involve taking the cross product of the angle the virtual camera is facing and the angle the avatar is facing. The angles may be the directions they are facing on a horizontal plane.
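
One way to read the cross-product calculation of step 506 is sketched below. It treats both facing directions as 2D unit vectors on the horizontal plane and uses the scalar 2D cross product, which is zero when the avatar faces the camera head-on and largest in magnitude when the two directions are perpendicular. The sign convention, the `strength` parameter, and the function names are assumptions made for illustration.

```python
import math


def _unit(v):
    """Normalize a 2D vector; a zero vector is returned unchanged."""
    length = math.hypot(v[0], v[1])
    return (v[0] / length, v[1] / length) if length else (0.0, 0.0)


def facing_cross(camera_facing, avatar_facing):
    """Scalar 2D cross product of the two facing directions, in [-1, 1]."""
    cx, cy = _unit(camera_facing)
    ax, ay = _unit(avatar_facing)
    return cx * ay - cy * ax


def stereo_gains_for_orientation(camera_facing, avatar_facing, strength=0.5):
    """Return (left_gain, right_gain) from relative orientation.

    Assumption: a positive cross product means the avatar faces toward the
    camera's left, so the right channel is attenuated, and vice versa.
    `strength` (0..1) is a hypothetical tuning knob.
    """
    cross = facing_cross(camera_facing, avatar_facing)
    left = 1.0 - strength * max(-cross, 0.0)
    right = 1.0 - strength * max(cross, 0.0)
    return left, right
```

For a camera facing (0, 1), an avatar facing (0, -1) yields equal gains, while an avatar facing (-1, 0) yields a louder left channel, matching the examples given for step 506 under this sign convention.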

In one embodiment, a check may be conducted to determine which audio output device the user is using. If the audio output device is not a set of headphones or another type of speaker that provides a stereo effect, the adjustments in steps 504 and 506 may not occur.

Steps 502-506 are repeated for every audio stream received from every other participant. Based on the calculations in steps 502-506, a left and right audio gain is calculated for every other participant.

In this way, the audio streams for each participant are adjusted to provide a sense of where the participant's avatar is located in the three-dimensional virtual environment.

Not only are audio streams adjusted to provide a sense of where avatars are located, but in certain embodiments, audio streams can be adjusted to provide private or semi-private volume areas. In this way, the virtual environment enables users to have private conversations. It also enables users to mingle with one another and to have separate side conversations, which is not possible with conventional video conferencing software. This is illustrated, for example, with respect to FIG. 7.

FIG. 7 is a flowchart illustrating a method 700 for adjusting relative volume to provide different volume areas in a virtual environment during a video conference.

As described above, the server may provide a specification of sound or volume areas to the client devices. The virtual environment may be partitioned into different volume areas. At step 702, a device determines in which sound areas the respective avatars and the virtual camera are located.

For example, FIGS. 8A and 8B are diagrams illustrating different volume areas in a virtual environment during a video conference. FIG. 8A illustrates a diagram 800 with a volume area 802 that allows for a semi-private or side conversation between the user controlling avatar 806 and the user controlling the virtual camera. In this way, the users around conference table 810 can have a conversation without disturbing others in the room. Sound from the user controlling avatar 806, as heard at the virtual camera, may fall off once the virtual camera exits volume area 802, though not entirely. This allows passersby to join the conversation if they wish.

Interface 800 also includes buttons 804, 806, and 808, which are described below.

FIG. 8B illustrates a diagram 800 with a volume area 804 that allows for a private conversation between the user controlling avatar 808 and the user controlling the virtual camera. Once inside volume area 804, audio from the user controlling avatar 808 and the user controlling the virtual camera may be output only to those inside volume area 804. Because no audio from those users is played to other users in the conference, their audio streams may not even be transmitted to the other user devices.

Volume areas may be hierarchical, as illustrated in FIGS. 9A and 9B. FIG. 9B is a diagram 930 showing a layout with different volume areas arranged in a hierarchy. Volume areas 934 and 935 are within volume area 933, and volume areas 933 and 932 are within volume area 931. These volume areas are represented as a hierarchical tree, as illustrated in diagram 900 of FIG. 9A.

In diagram 900, node 901 represents volume area 931 and is the root of the tree. Nodes 902 and 903 are children of node 901 and represent volume areas 932 and 933. Nodes 904 and 906 are children of node 903 and represent volume areas 934 and 935.

If a user located in area 934 is trying to listen to a user speaking in area 932, the audio stream has to pass through a number of different virtual "walls," each of which attenuates the audio stream. In particular, the sound has to pass through the wall for area 932, the wall for area 933, and the wall for area 934. Each wall attenuates the sound by a particular factor. This calculation is described with respect to steps 704 and 706 in FIG. 7.

At step 704, the hierarchy is traversed to determine which sound areas lie between the avatars. This is illustrated, for example, in FIG. 9C. Starting from the node corresponding to the virtual area of the speaking voice (in this case, node 904), a path to the node of the receiving user (in this case, node 902) is determined. To determine the path, the links 952 going between the nodes are determined. In this way, the subset of areas between the area containing the avatar and the area containing the virtual camera is determined.

At step 706, the audio stream from the speaking user is attenuated based on the respective wall transmission factors of the subset of areas. Each respective wall transmission factor specifies how much the audio stream is attenuated.
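
Steps 704 and 706 can be sketched as a walk up a parent-pointer tree. The sketch below assumes every area except the root lists its enclosing area in a `parent` mapping, that the walls crossed are those of every area on the path except the lowest common ancestor, and that each wall transmission factor is a multiplier between 0 and 1. The data layout and function names are hypothetical; the area labels follow the layout of diagram 930.

```python
from math import prod


def ancestors(area, parent):
    """Return the chain [area, its parent, ..., root]."""
    chain = []
    while area is not None:
        chain.append(area)
        area = parent[area]
    return chain


def walls_between(speaker_area, listener_area, parent):
    """Areas whose walls the sound crosses between speaker and listener."""
    speaker_chain = ancestors(speaker_area, parent)
    listener_chain = ancestors(listener_area, parent)
    listener_set = set(listener_chain)
    # Lowest common ancestor: first speaker-side area that also encloses
    # the listener. Its own wall is not crossed.
    common = next(a for a in speaker_chain if a in listener_set)
    crossed = speaker_chain[:speaker_chain.index(common)]
    crossed += listener_chain[:listener_chain.index(common)]
    return crossed


def wall_attenuation(speaker_area, listener_area, parent, transmission):
    """Multiply the transmission factors of every wall crossed."""
    return prod(transmission[a]
                for a in walls_between(speaker_area, listener_area, parent))


# Layout of diagram 930: 934 and 935 inside 933; 932 and 933 inside 931.
PARENT = {931: None, 932: 931, 933: 931, 934: 933, 935: 933}
```

For a speaker in area 932 and a listener in area 934, `walls_between` returns areas 932, 934, and 933, matching the three walls named in the text. Two users in the same area cross no walls, so the attenuation is 1.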

Additionally or alternatively, the different areas may have different roll-off factors. In that case, the distance-based calculation illustrated in chart 600 may be applied to individual areas based on their respective roll-off factors. In this way, different areas of the virtual environment project sound at different rates. The audio gains determined in the method described above with respect to FIG. 5 may then be applied to the audio stream to determine the left and right audio accordingly. In this way, wall transmission factors, roll-off factors, and left-right adjustments that provide a sense of the sound's direction may all be applied together to provide a comprehensive audio experience.

Different audio areas may have different functionality. For example, a volume area may be a podium area. If the user is located in the podium area, some or all of the attenuation described with respect to FIG. 5 or FIG. 7 may not occur. For example, no attenuation may occur because of roll-off factors or wall transmission factors. In some embodiments, the relative left-right audio may still be adjusted to provide a sense of direction.

For exemplary purposes, the methods described with respect to FIGS. 5 and 7 describe audio streams from a user who has a corresponding avatar. However, the same methods may be applied to sound sources other than avatars. For example, the virtual environment may have three-dimensional models of speakers. Sound may be emitted from the speakers in the same way as from the avatar models described above, whether for a presentation or simply to provide background music.

As mentioned above, wall transmission factors may be used to isolate audio entirely. In one embodiment, this can be used to create virtual offices. In one example, each user may have in their physical (perhaps home) office a monitor that is constantly on, displaying the conference application logged into the virtual office. There may be a feature that allows the user to indicate whether they are in the office or should not be disturbed. If the do-not-disturb indicator is off, a coworker or manager may come into the virtual space and knock or walk in, just as they would in a physical office. The visitor may leave a note if the worker is not in the office. When the worker returns, they can read the note left by the visitor. The virtual office may have a whiteboard and/or an interface that displays messages for the user. The messages may be email and/or may come from a messaging application, such as the SLACK application available from Slack Technologies, Inc. of San Francisco, California.

Users may be able to customize or personalize their virtual offices. For example, they may put up models of posters or other wall ornaments. They may change the models or orientation of desks or of decorative ornaments, such as plants. They may change the lighting or the view out the window.

Turning back to FIG. 8A, interface 800 includes various buttons 804, 806, and 808. When a user presses button 804, the attenuation described above with respect to the methods in FIGS. 5 and 7 may not occur, or may occur only to a lesser degree. In that situation, the user's voice is output uniformly to other users, allowing the user to speak to all participants in the meeting. The user's video may also be output on a presentation screen within the virtual environment, as described below. When a user presses button 806, a speaker mode is enabled. In that case, audio is output from sound sources within the virtual environment, such as to play background music. When a user presses button 808, a screen share mode may be enabled, allowing the user to share the contents of a screen or window on their device with other users. The contents may be presented on a presentation model. This too is described below.

Presenting in a Three-Dimensional Environment

FIG. 10 illustrates an interface 1000 with a three-dimensional model 1002 in a three-dimensional virtual environment. As described above with respect to FIG. 1, interface 1000 may be displayed to a user who can navigate around the virtual environment. As illustrated in interface 1000, the virtual environment includes an avatar 1004 and the three-dimensional model 1002.

Three-dimensional model 1002 is a 3D model of a product placed inside the virtual space. People are able to join this virtual space to observe the model and can walk around it. The product may have localized sound to enhance the experience.

More particularly, when a presenter in the virtual space wants to show a 3D model, the presenter selects the desired model from the interface. This sends a message to the server to update the details (including the name and path of the model). The update is automatically communicated to the clients. In this way, a three-dimensional model may be rendered for display simultaneously with presenting the video stream. Users can navigate the virtual camera around the three-dimensional model of the product.

In different examples, the object may be a product demonstration or an advertisement for a product.

FIG. 11 illustrates an interface 1100 with a presentation screen share in a three-dimensional virtual environment used for video conferencing. As described above with respect to FIG. 1, interface 1100 may be displayed to a user who can navigate around the virtual environment. As illustrated in interface 1100, the virtual environment includes an avatar 1104 and a presentation screen 1106.

In this embodiment, a presentation stream is received from the device of a participant in the conference. The presentation stream is texture mapped onto a three-dimensional model of presentation screen 1106. In one embodiment, the presentation stream may be a video stream from a camera on the user's device. In another embodiment, the presentation stream may be a screen share from the user's device, in which a monitor or window is shared. Through screen share or otherwise, the presentation video and audio streams may also come from an external source, for example a livestream of an event. When the user enables presenter mode, the user's presentation stream (and audio stream) is published to the server, tagged with the name of the screen the user wants to use. Other clients are notified that a new stream is available.

The presenter may also be able to control the position and orientation of the audience members. For example, the presenter may have an option to rearrange all the other participants in the meeting so that they are positioned and oriented to face the presentation screen.

An audio stream is captured from the microphone of the first participant's device in synchrony with the presentation stream. The audio stream from the user's microphone may be heard by other users as though it were coming from presentation screen 1106. In this way, presentation screen 1106 may be a sound source as described above. Because the user's audio stream is projected from presentation screen 1106, it may be suppressed from coming from the user's avatar. In this way, the audio stream is output for playback in synchrony with the display of the presentation stream on screen 1106 within the three-dimensional virtual space.

Allocating Bandwidth Based on Distance Between Users

FIG. 12 is a flowchart illustrating a method 1200 for apportioning available bandwidth based on the relative positions of avatars within the three-dimensional virtual environment.

At step 1202, the distance between a first user and a second user in the virtual conference space is determined. The distance may be the distance between them on a horizontal plane in the three-dimensional space.

At step 1204, the received video streams are prioritized such that video streams from closer users are given priority over video streams from users who are farther away. The priority value may be determined as illustrated in FIG. 13.

FIG. 13 shows a chart 1300 with priority 1306 on the y-axis and distance 1302 on the x-axis. As illustrated by line 1306, the priority stays at a constant level until a reference distance 1304 is reached. After the reference distance is reached, the priority starts to fall off.

At step 1206, the bandwidth available to the user device is apportioned among the various video streams. This may be done based on the priority values determined at step 1204. For example, the priorities may be proportionally adjusted so that they sum to 1. For any video for which insufficient bandwidth is available, the relative priority may be set to zero. The priorities are then adjusted again for the remaining video streams. Bandwidth is allocated based on these relative priority values. In addition, bandwidth may be reserved for audio streams. This is illustrated in FIG. 14.
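
The apportioning in step 1206 might look like the following sketch. It normalizes the priorities so they sum to 1, drops any stream whose share would fall below a minimum useful video bandwidth, and renormalizes the rest. The function name, the parameter names, the kbps units, and the choice to drop one starved stream per pass (lowest priority first) are assumptions made for illustration.

```python
def allocate_video_bandwidth(priorities, total_kbps,
                             audio_reserve_kbps=0.0, min_video_kbps=0.0):
    """Split the bandwidth left after the audio reserve among video streams.

    `priorities` maps a stream id to a non-negative priority value.
    Returns a dict mapping every stream id to its allocated kbps
    (0.0 for streams that were dropped).
    """
    budget = max(total_kbps - audio_reserve_kbps, 0.0)
    active = {k: p for k, p in priorities.items() if p > 0}
    shares = {}
    while active:
        total = sum(active.values())
        # Relative priorities sum to 1; each share is proportional.
        shares = {k: budget * p / total for k, p in active.items()}
        starved = [k for k, kbps in shares.items() if kbps < min_video_kbps]
        if not starved:
            break
        # Zero out the lowest-priority starved stream, then renormalize.
        del active[min(starved, key=lambda k: active[k])]
        shares = {}
    allocation = {k: 0.0 for k in priorities}
    allocation.update(shares)
    return allocation
```

With priorities of 2.0, 1.0, and 0.1, a 1100 kbps link, a 100 kbps audio reserve, and a 100 kbps video minimum, the third stream is dropped and the first receives twice the bandwidth of the second.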

FIG. 14 illustrates a chart 1400 with a y-axis representing bandwidth 1406 and an x-axis representing relative priority. Once a video has been allocated a minimum effective bandwidth 1406, the bandwidth 1406 allocated to the video stream increases in proportion to its relative priority.

Once the allocated bandwidth is determined, the client may request the video from the server at the bandwidth, bitrate, frame rate, and resolution selected and allocated for that video. This may start a negotiation process between the client and the server to begin streaming the video at the designated bandwidth. In this way, the available video and audio bandwidth is divided fairly across all users, with users who have twice the priority receiving twice the bandwidth.

In one possible implementation, using simulcast, every client sends multiple video streams to the server at different bitrates and resolutions. Other clients can then indicate to the server which of these streams they are interested in and would like to receive.
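
Under the simulcast approach, a receiving client could choose among the published streams roughly as follows. This is a sketch only: the layer names, the bitrates, and the rule of picking the highest-bitrate layer that fits the allocated bandwidth are illustrative assumptions, and the message actually sent to the server is outside the sketch.

```python
def pick_simulcast_layer(layers, allocated_kbps):
    """Return the name of the best layer that fits, or None if none fits.

    `layers` is a list of (name, bitrate_kbps) tuples describing the
    streams a sender publishes (hypothetical representation).
    """
    affordable = [layer for layer in layers if layer[1] <= allocated_kbps]
    if not affordable:
        return None  # caller may fall back to a still image
    return max(affordable, key=lambda layer: layer[1])[0]


# Hypothetical layers published by one sender.
LAYERS = [("low", 150), ("medium", 500), ("high", 1500)]
```

Returning `None` when nothing fits leaves room for the still-image fallback the description gives for streams that cannot be displayed effectively.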

At step 1208, it is determined whether the bandwidth available between the first user and the second user in the virtual conference space is such that displaying video at that distance is ineffective. This determination may be made by either the client or the server. If made by the client, the client sends a message to the server to stop transmitting the video to the client. If display is ineffective, transmission of the video stream to the second user's device is halted, and the second user's device is notified to substitute a still image for the video stream. The still image may simply be the last video frame received (or one of the last video frames received).

In one embodiment, a similar process may be executed for audio, reducing its quality according to the size of the portion reserved for audio. In another embodiment, each audio stream is given a consistent bandwidth.

In this way, embodiments increase performance for all users and, on the server side, video and audio stream quality can be reduced for users who are farther away and/or less important. This reduction is not performed when a sufficient bandwidth budget is available. The reduction is made in both bitrate and resolution. This improves video quality because the bandwidth available for that user can be used more efficiently by the encoder.

Independently of this, the video resolution is scaled down based on distance, so that users twice as far away have half the resolution. In this way, resolution that is unnecessary given the limits of the screen resolution need not be downloaded, and bandwidth is conserved.
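The inverse relationship between distance and resolution can be illustrated as follows. This is a sketch under assumptions: the `reference_distance` parameter (the distance at or below which full resolution is used) and the function name are not from the specification.

```python
def scaled_height(base_height: int, distance: float,
                  reference_distance: float = 1.0) -> int:
    """Scale vertical resolution inversely with distance.

    At or inside reference_distance the full resolution is kept;
    doubling the distance halves the resolution.
    """
    if distance <= reference_distance:
        return base_height
    return max(1, round(base_height * reference_distance / distance))
```

With a 720-line source, an avatar at twice the reference distance would be requested at 360 lines and one at four times the distance at 180 lines.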

FIG. 15 is a diagram of a system 1500 illustrating components of devices used to provide video conferencing within a virtual environment. In various embodiments, system 1500 can operate according to the methods described above.

Device 306A is a user computing device. Device 306A may be a desktop or laptop computer, a smartphone, a tablet, or a wearable (e.g., a watch or head-mounted device). Device 306A includes a microphone 1502, a camera 1504, a stereo speaker 1506, and an input device 1512. Although not shown, device 306A also includes a processor and persistent, non-transitory and volatile memory. The processor can include one or more central processing units, graphics processing units, or any combination thereof.

Microphone 1502 converts sound into an electrical signal. Microphone 1502 is positioned to capture the speech of the user of device 306A. In different examples, microphone 1502 may be a condenser microphone, an electret microphone, a moving-coil microphone, a ribbon microphone, a carbon microphone, a piezoelectric microphone, a fiber-optic microphone, a laser microphone, a water microphone, or a MEMS microphone.

Camera 1504 captures image data, generally by capturing light through one or more lenses. Camera 1504 is positioned to capture photographic images of the user of device 306A. Camera 1504 includes an image sensor (not shown). The image sensor may be, for example, a charge-coupled device (CCD) sensor or a complementary metal oxide semiconductor (CMOS) sensor. The image sensor may include one or more photodetectors that detect light and convert it into electrical signals. These electrical signals, captured together in a similar time frame, make up a still photographic image. A sequence of still photographic images captured at regular intervals together makes up a video. In this way, camera 1504 captures images and videos.

Stereo speaker 1506 is a device that converts an electrical audio signal into corresponding left and right sound. Stereo speaker 1506 outputs the left audio stream and the right audio stream generated by audio processor 1520 (described below) to be played in stereo to the user of device 306A. Stereo speaker 1506 includes both ambient speakers and headphones designed to play sound directly into the user's left and right ears. Example speakers include moving-iron loudspeakers, piezoelectric speakers, magnetostatic loudspeakers, electrostatic loudspeakers, ribbon and planar magnetic loudspeakers, bending wave loudspeakers, flat panel loudspeakers, Heil air motion transducers, transparent ionic conduction speakers, plasma arc speakers, thermoacoustic speakers, rotary woofers, and moving-coil, electrostatic, electret, planar magnetic, and balanced armature speakers.

Network interface 1508 is a software or hardware interface between two pieces of equipment or protocol layers in a computer network. Network interface 1508 receives, from server 302, a video stream for each respective participant in the conference. Each video stream is captured from a camera on the device of another participant in the video conference. Network interface 1508 also receives from server 302 data specifying the three-dimensional virtual space and any models within it. For each of the other participants, network interface 1508 receives a position and direction in the three-dimensional virtual space. The position and direction are input by each of the respective other participants.

Network interface 1508 also transmits data to server 302. It transmits the position of the virtual camera of the user of device 306A, which is used by renderer 1518, and it transmits the video and audio streams from camera 1504 and microphone 1502.

Display 1510 is an output device for presenting electronic information in visual or tactile form (the latter used, for example, in tactile electronic displays for blind people). Display 1510 may be a television set, a computer monitor, a head-mounted display, a heads-up display, the output of an augmented reality or virtual reality headset, a broadcast reference monitor, a medical monitor, a mobile display (for mobile devices), or a smartphone display (for smartphones). To present the information, display 1510 may include an electroluminescent (ELD) display, a liquid crystal display (LCD), a light-emitting diode (LED) backlit LCD, a thin-film transistor (TFT) LCD, a light-emitting diode (LED) display, an OLED display, an AMOLED display, a plasma (PDP) display, or a quantum dot (QLED) display.

Input device 1512 is a piece of equipment used to provide data and control signals to an information processing system such as a computer or information appliance. Input device 1512 allows a user to input a new desired position for the virtual camera used by renderer 1518, thereby enabling navigation in the three-dimensional environment. Examples of input devices include keyboards, mice, scanners, joysticks, and touchscreens.

Web browser 308A and web application 310A were described above with respect to FIG. 3. Web application 310A includes screen capturer 1514, texture mapper 1516, renderer 1518, and audio processor 1520.

Screen capturer 1514 captures a presentation stream, in particular a screen share. Screen capturer 1514 may interact with an API made available by web browser 308A. By calling a function available from the API, screen capturer 1514 may cause web browser 308A to ask the user which window or screen the user would like to share. Based on the answer to that query, web browser 308A may return a video stream corresponding to the screen share to screen capturer 1514, which passes it on to network interface 1508 for transmission to server 302 and ultimately to the other participants' devices.

Texture mapper 1516 texture maps the video stream onto a three-dimensional model corresponding to an avatar. Texture mapper 1516 may texture map respective frames from the video onto the avatar. In addition, texture mapper 1516 may texture map a presentation stream onto a three-dimensional model of a presentation screen.

Renderer 1518 renders, from the viewpoint of the virtual camera of the user of device 306A and for output to display 1510, the three-dimensional virtual space including the texture-mapped three-dimensional models of the avatars for the respective participants, each located at the corresponding received position and oriented in the received direction. Renderer 1518 also renders any other three-dimensional models, including, for example, the presentation screen.

Audio processor 1520 adjusts the volume of the received audio stream to determine a left audio stream and a right audio stream that provide a sense of where the second position is in the three-dimensional virtual space relative to the first position. In one embodiment, audio processor 1520 adjusts the volume based on the distance between the second position and the first position. In another embodiment, audio processor 1520 adjusts the volume based on the direction from the second position to the first position. In yet another embodiment, audio processor 1520 adjusts the volume based on the direction of the second position relative to the first position on a horizontal plane within the three-dimensional virtual space. In yet another embodiment, audio processor 1520 adjusts the volume based on the direction in which the virtual camera is facing in the three-dimensional virtual space, such that the left audio stream tends to have a higher volume when the avatar is located to the left of the virtual camera and the right audio stream tends to have a higher volume when the avatar is located to the right of the virtual camera. Finally, in yet another embodiment, audio processor 1520 adjusts the volume based on the angle between the direction in which the virtual camera is facing and the direction in which the avatar is facing, such that an angle closer to perpendicular to the direction in which the avatar is facing tends to produce a greater difference in volume between the left and right audio streams.
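The distance-based and camera-direction-based adjustments can be combined in a short sketch. Everything below is an assumption made for illustration: the `1 / (1 + distance)` roll-off, the constant-power pan law, the two-dimensional `(x, y)` horizontal-plane coordinates, and the convention that `cam_yaw` is the counterclockwise angle of the camera's facing direction from the +x axis. The specification only states which way the volumes tend to move, not a formula.

```python
import math
from typing import Tuple


def stereo_gains(cam_xy: Tuple[float, float], cam_yaw: float,
                 avatar_xy: Tuple[float, float]) -> Tuple[float, float]:
    """Return (left_gain, right_gain) for an avatar heard from the virtual camera."""
    dx = avatar_xy[0] - cam_xy[0]
    dy = avatar_xy[1] - cam_xy[1]
    dist = math.hypot(dx, dy)

    if dist == 0.0:
        pan = 0.0  # co-located: no left/right preference
    else:
        # Unit vector pointing to the camera's right on the horizontal plane.
        right_x, right_y = math.sin(cam_yaw), -math.cos(cam_yaw)
        # pan is -1 when the avatar is fully left, +1 when fully right.
        pan = (dx * right_x + dy * right_y) / dist

    falloff = 1.0 / (1.0 + dist)          # assumed distance roll-off
    theta = (pan + 1.0) * math.pi / 4.0   # constant-power pan angle in [0, pi/2]
    return falloff * math.cos(theta), falloff * math.sin(theta)
```

For a camera at the origin facing +y, an avatar at (3, 0) yields a right gain larger than the left gain, an avatar at (-3, 0) yields the opposite, and an avatar straight ahead yields equal gains.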

Audio processor 1520 may also adjust the volume of an audio stream based on the area in which the speaker is located relative to the area in which the virtual camera is located. In this embodiment, the three-dimensional virtual space is segmented into a plurality of areas. These areas may be hierarchical. When the speaker and the virtual camera are located in different areas, a wall transmission factor may be applied to attenuate the volume of the speaking audio stream.
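One way to picture hierarchical areas and a wall transmission factor is a tree of areas in which each boundary crossed between the speaker's area and the camera's area multiplies the volume by the factor. This is a sketch under assumptions: the area names, the single-rooted `PARENT` tree, the per-boundary multiplication, and the factor value 0.3 are all hypothetical and not taken from the specification.

```python
from typing import Dict, List, Optional

# Hypothetical area hierarchy: child -> parent (None marks the single root).
PARENT: Dict[str, Optional[str]] = {
    "building": None,
    "room_a": "building",
    "room_b": "building",
    "booth": "room_a",
}


def ancestors(area: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the area followed by each enclosing area up to the root."""
    chain: List[str] = []
    current: Optional[str] = area
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def walls_between(a: str, b: str, parent: Dict[str, Optional[str]]) -> int:
    """Count area boundaries crossed going from a to b via their common ancestor."""
    chain_a, chain_b = ancestors(a, parent), ancestors(b, parent)
    common = next(x for x in chain_a if x in chain_b)
    return chain_a.index(common) + chain_b.index(common)


def area_gain(speaker_area: str, camera_area: str,
              parent: Dict[str, Optional[str]],
              wall_transmission: float = 0.3) -> float:
    """Volume multiplier: one factor per boundary crossed (assumed rule)."""
    return wall_transmission ** walls_between(speaker_area, camera_area, parent)
```

Under these assumptions, a speaker in the same area as the camera is unattenuated, while a speaker in a sibling room is attenuated twice (once leaving one room, once entering the other).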

Server 302 includes attendance notifier 1522, stream coordinator 1524, and stream forwarder 1526.

Attendance notifier 1522 notifies conference participants when participants join and leave the conference. When a new participant joins the conference, attendance notifier 1522 sends a message to the devices of the other participants indicating that a new participant has joined. Attendance notifier 1522 signals stream forwarder 1526 to start forwarding video, audio, and position/direction information to the other participants.

Stream coordinator 1524 receives a video stream captured from a camera on the device of a first user. Stream coordinator 1524 determines the bandwidth available to transmit data for the virtual conference to the second user. It determines the distance between the first user and the second user in the virtual conference space, and it apportions the available bandwidth between the first video stream and a second video stream based on the relative distance. In this way, stream coordinator 1524 prioritizes the video streams of closer users over the video streams of users who are farther away. Additionally or alternatively, stream coordinator 1524 may be located on device 306A, perhaps as part of web application 310A.
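The apportioning step can be sketched with weights that are inversely proportional to distance, so closer users receive a larger share. The inverse-distance weighting, the `min_distance` clamp, and the user names are assumptions for this example; the specification says only that bandwidth is apportioned based on relative distance.

```python
from typing import Dict


def apportion_bandwidth(total_kbps: float, distances: Dict[str, float],
                        min_distance: float = 1.0) -> Dict[str, float]:
    """Split total_kbps across senders, favoring those closer to the receiver."""
    # Clamp very small distances so one nearby user cannot take everything.
    weights = {user: 1.0 / max(d, min_distance) for user, d in distances.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {}
    return {user: total_kbps * w / total_weight for user, w in weights.items()}
```

For example, with 3000 kbps available and two senders at distances 1 and 2, the closer sender would be allotted 2000 kbps and the farther one 1000 kbps.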

Stream forwarder 1526 broadcasts the position/direction information, video, audio, and screen share screens that it receives (with the adjustments made by stream coordinator 1524). Stream forwarder 1526 may send information to device 306A in response to a request from conference application 310A. Conference application 310A may send that request in response to a notification from attendance notifier 1522.

Network interface 1528 is a software or hardware interface between two pieces of equipment or protocol layers in a computer network. Network interface 1528 transmits the model information to the devices of the various participants. Network interface 1528 receives video, audio, and screen share screens from the various participants.

Screen capturer 1514, texture mapper 1516, renderer 1518, audio processor 1520, attendance notifier 1522, stream coordinator 1524, and stream forwarder 1526 can each be implemented in hardware, software, firmware, or any combination thereof.

Identifiers such as "(a)," "(b)," "(i)," "(ii)," etc., are sometimes used for different elements or steps. These identifiers are used for clarity and do not necessarily designate an order for the elements or steps.

The present invention has been described above with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed.

The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the art, readily modify and/or adapt such specific embodiments for various applications, without undue experimentation and without departing from the general concept of the present invention. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.

The breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the appended claims and their equivalents.

Claims

A system for enabling video conferencing between a first user and a second user, comprising:
a processor coupled to a memory;
a display screen;
a network interface configured to receive (i) data specifying a three-dimensional virtual space, (ii) a position and direction in the three-dimensional virtual space, the position and direction input by the first user, and (iii) a video stream captured from a camera on a device of the first user, the camera positioned to capture photographic images of the first user; and
a web browser, implemented on the processor, configured to download a web application from a server and execute the web application, wherein the web application includes:
a mapper configured to map the video stream onto a three-dimensional model of an avatar, and
a renderer configured to render, from the viewpoint of a virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar with the mapped video stream, located at the position and oriented in the direction.

The system of claim 1, wherein the device further includes a graphics processing unit, and wherein the mapper and the renderer include WebGL application calls that enable the web application to map or render using the graphics processing unit.

A computer-implemented method for enabling video conferencing between a first user and a second user, comprising:
transmitting a web application to a first client device of the first user and a second client device of the second user;
receiving, from the first client device executing the web application, (i) a position and direction in a three-dimensional virtual space, the position and direction input by the first user, and (ii) a video stream captured from a camera on the first client device, the camera positioned to capture photographic images of the first user; and
transmitting the position and direction and the video stream to the second client device of the second user,
wherein the web application includes executable instructions that, when executed on a web browser, map the video stream onto a three-dimensional model of an avatar and render, from the viewpoint of a virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar with the mapped video stream, located at the position and oriented in the direction.

The method of claim 3, wherein the web application includes WebGL application calls that enable the web application to map or render using a graphics processing unit of the second client device.

A computer-implemented method for enabling video conferencing between a first user and a second user, comprising:
receiving data specifying a three-dimensional virtual space;
receiving a position and direction in the three-dimensional virtual space, the position and direction input by the first user;
receiving a video stream captured from a camera on a device of the first user, the camera positioned to capture photographic images of the first user;
mapping, by a web application implemented in a web browser, the video stream onto a three-dimensional model of an avatar; and
rendering, by the web application implemented in the web browser, from the viewpoint of a virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar located at the position and oriented in the direction.

The method of claim 5, further comprising:
receiving an audio stream captured, in synchronization with the video stream, from a microphone of the device of the first user, the microphone positioned to capture speech of the first user; and
outputting the audio stream for playback to the second user in synchronization with display of the video stream within the three-dimensional virtual space.

The method of claim 5, further comprising, when an input indicating a desire to change the viewpoint of the virtual camera is received from the second user:
changing the viewpoint of the virtual camera of the second user; and
re-rendering, from the changed viewpoint of the virtual camera, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar located at the position and oriented in the direction.

The method of claim 7, wherein the viewpoint of the virtual camera is defined by at least coordinates on a horizontal plane in the three-dimensional virtual space and pan and tilt values.

The method of claim 5, further comprising, when a new position and a new direction of the first user in the three-dimensional virtual space are received:
re-rendering, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar located at the new position and oriented in the new direction.

The method of claim 5, wherein the mapping includes, for each frame of the video stream, repeatedly mapping pixels onto the three-dimensional model of the avatar.

The method of claim 5, wherein the data, the position and direction, and the video stream are received at the web browser from a server, and the mapping and rendering are executed by the web browser.

The method of claim 11, further comprising:
receiving, from the server, a notification indicating that the first user is no longer present; and
re-rendering, for display to the second user on the web browser, the three-dimensional virtual space without the three-dimensional model of the avatar.

The method of claim 12, further comprising:
receiving, from the server, a notification indicating that a third user has entered the three-dimensional virtual space;
receiving a second position and a second direction of the third user in the three-dimensional virtual space;
receiving a second video stream captured from a camera on a device of the third user, the camera positioned to capture photographic images of the third user;
mapping the second video stream onto a second three-dimensional model of a second avatar; and
rendering, from the viewpoint of the virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the second three-dimensional model located at the second position and oriented in the second direction.

The method of claim 5, wherein receiving the data specifying the three-dimensional virtual space includes receiving a mesh specifying a conference space and receiving a background image, and wherein the rendering includes mapping the background image onto a sphere.

A non-transitory, tangible computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations to enable video conferencing between a first user and a second user, the operations comprising:
receiving data specifying a three-dimensional virtual space;
receiving a position and direction in the three-dimensional virtual space, the position and direction input by the first user;
receiving a video stream captured from a camera on a device of the first user, the camera positioned to capture photographic images of the first user;
mapping the video stream onto a three-dimensional model of an avatar; and
rendering, from the viewpoint of a virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar located at the position and oriented in the direction.

The device of claim 15, wherein the operations further comprise:
receiving an audio stream captured, in synchronization with the video stream, from a microphone of the device of the first user, the microphone positioned to capture speech of the first user; and
outputting the audio stream for playback to the second user in synchronization with display of the video stream within the three-dimensional virtual space.

The device of claim 15, wherein the operations further comprise, when an input indicating a desire to change the viewpoint of the virtual camera is received from the second user:
changing the viewpoint of the virtual camera of the second user; and
re-rendering, from the changed viewpoint of the virtual camera, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar located at the position and oriented in the direction.

The device of claim 17, wherein the viewpoint of the virtual camera is defined by at least coordinates on a horizontal plane in the three-dimensional virtual space and pan and tilt values.

The device of claim 15, wherein the operations further comprise, when a new position and a new direction of the first user in the three-dimensional virtual space are received:
re-rendering, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar located at the new position and oriented in the new direction.

The device of claim 15, wherein the mapping includes, for each frame of the video stream, repeatedly mapping pixels onto the three-dimensional model of the avatar.

The device of claim 15, wherein the data, the position and direction, and the video stream are received at a web browser from a server, and the mapping and rendering are executed by the web browser.

The device of claim 21, wherein the operations further comprise:
receiving, from the server, a notification indicating that the first user is no longer present; and
re-rendering, for display to the second user on the web browser, the three-dimensional virtual space without the three-dimensional model of the avatar.

The device of claim 22, wherein the operations further comprise:
receiving, from the server, a notification indicating that a third user has entered the three-dimensional virtual space;
receiving a second position and a second direction of the third user in the three-dimensional virtual space;
receiving a second video stream captured from a camera on a device of the third user, the camera positioned to capture photographic images of the third user;
mapping the second video stream onto a second three-dimensional model of a second avatar; and
rendering, from the viewpoint of the virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the second three-dimensional model located at the second position and oriented in the second direction.

The device of claim 15, wherein receiving the data specifying the three-dimensional virtual space includes receiving a mesh specifying a conference space and receiving a background image, and wherein the rendering includes mapping the background image onto a sphere.

(Deleted)