JP7318139B1

JP7318139B1 - Web-based videoconferencing virtual environment with steerable avatars and its application

Info

Publication number: JP7318139B1
Application number: JP2022562717A
Authority: JP
Inventors: コーネリスクロール，ジェラルド; スチュアートブラウンド，エリック
Original assignee: カトマイテックインコーポレイテッド
Priority date: 2020-10-20
Filing date: 2021-10-20
Publication date: 2023-07-31
Anticipated expiration: 2041-10-20
Also published as: IL298268A; AU2021366657A1; CA3181367A1; WO2022087147A1; KR20220160699A; AU2021366657B2; BR112022024836A2; JP2023139110A; KR20230119261A; KR102580110B1; CN116018803A; CA3181367C; AU2023229565A1; EP4122192A1; IL298268B1; JP2023534092A; IL308489A

Abstract

Disclosed herein is a web-based videoconferencing system that allows video avatars (102A, 102B) to navigate within a virtual environment. The system has a presentation mode that allows a presentation stream to be texture mapped onto a presenter screen (104A, 104B) situated within the virtual environment. Relative left-right sound is adjusted to provide a sense of an avatar's position in the virtual space. The sound is further adjusted based on the area where the avatar is located and the area where the virtual camera is located. Video stream quality is adjusted based on relative position in the virtual space. Three-dimensional modeling is available inside the virtual videoconferencing environment.

Description

Cross-Reference to Related Applications
[0001] This application claims priority to U.S. Utility Patent Application No. 17/075,338, filed October 20, 2020, now issued as U.S. Patent No. 10,979,672 on April 13, 2021; U.S. Utility Patent Application No. 17/198,323, filed March 11, 2021; U.S. Utility Patent Application No. 17/075,362, filed October 20, 2020, now issued as U.S. Patent No. 11,095,857 on August 17, 2021; U.S. Utility Patent Application No. 17/075,390, filed October 20, 2020, now issued as U.S. Patent No. 10,952,006 on March 16, 2021; U.S. Utility Patent Application No. 17/075,408, filed October 20, 2020, now issued as U.S. Patent No. 11,070,768 on July 20, 2021; U.S. Utility Patent Application No. 17/075,428, filed October 20, 2020, now issued as U.S. Patent No. 11,076,128 on July 27, 2021; and U.S. Utility Patent Application No. 17/075,454, filed October 20, 2020. The contents of each of these applications are incorporated herein by reference in their entirety.

Background
Field
[0002] This field relates generally to videoconferencing.

Related Art
[0003] Videoconferencing involves the transmission and reception of audiovisual signals by multiple users at different locations for real-time communication between people. Videoconferencing is widely available on many computing devices from a variety of services, including the ZOOM service available from Zoom Communications Inc. of San Jose, CA. Some videoconferencing software, such as the FaceTime® application available from Apple Inc. of Cupertino, CA, comes standard with mobile devices.

[0004] In general, these applications operate by displaying video and outputting audio of other conference participants. When there are multiple participants, the screen may be divided into a number of rectangular frames, each displaying video of one participant. Sometimes these services operate by having a larger frame that presents video of the person speaking. As different individuals speak, that frame switches between speakers. The application captures video from a camera integrated with the user's device and audio from a microphone integrated with the user's device. The application then transmits that audio and video to other applications running on other users' devices.

[0005] Many of these videoconferencing applications have a screen share functionality. When a user decides to share their screen (or a portion of their screen), a stream carrying the contents of the screen is transmitted to the other users' devices. In some cases, other users can even control what is on that user's screen. In this way, users can collaborate on a project or give a presentation to the other meeting participants.

[0006] Recently, videoconferencing technology has gained importance. Out of fear of spreading disease, in particular COVID-19, many workplaces, trade shows, meetings, conferences, schools, and places of worship have closed or have discouraged in-person attendance. Virtual conferences using videoconferencing technology are increasingly replacing physical meetings. In addition, this technology provides advantages over meeting physically, because it avoids travel and commuting.

[0007] Often, however, use of this videoconferencing technology causes a loss of a sense of place. There is an experiential aspect to meeting in person physically, being in the same place, that is lost when conferences are conducted virtually. There is a social aspect to being able to posture oneself and look at one's peers. This feeling of experience is important in creating relationships and social connections. Yet this feeling is lacking when it comes to conventional videoconferences.

[0008] Moreover, additional problems arise with these videoconferencing technologies as a conference grows beyond a few participants. In physical meetings, people can have side conversations. A person can project their voice so that only people close by can hear what is being said. In some cases, people can even have private conversations in the context of a larger meeting. In a virtual conference, however, when multiple people speak at the same time, the software mixes the audio streams substantially equally, causing the participants to speak over one another. Thus, when multiple people are involved in a virtual conference, private conversations are impossible, and the dialogue tends to take the form of speech from one person to many. Here, too, virtual conferences lose the opportunity for participants to create social connections and to communicate and network more effectively.

[0009] Moreover, due to limitations in network bandwidth and computing hardware, the performance of many videoconferencing systems begins to slow down when many streams are present in the conference. Many computing devices are equipped to handle video streams from a few participants but are not equipped to handle video streams from a dozen or more participants. With many schools operating entirely virtually, a class of 25 can significantly slow down school-issued computing devices.

[0010] Massively multiplayer online games (MMOGs or MMOs) can generally handle considerably more than 25 participants. These games often have hundreds or thousands of players on a single server. MMOs often allow players to navigate avatars around a virtual world. Sometimes these MMOs allow users to speak with one another or send messages to one another. Examples include the ROBLOX game available from Roblox Corporation of San Mateo, CA, and the MINECRAFT game available from Mojang Studios of Stockholm, Sweden.

[0011] Having mere avatars interact with one another also has limitations in terms of social interaction. These avatars usually cannot communicate the facial expressions that people often make inadvertently. Those facial expressions are observable in a videoconference. Some publications may describe having video placed on an avatar in a virtual world. However, these systems typically require specialized software and have other limitations that restrict their usefulness.

[0012] Improved methods are needed for videoconferencing.

Summary
[0013] In an embodiment, a device enables videoconferencing between a first user and a second user. The device includes a processor coupled to a memory, a display screen, a network interface, and a web browser. The network interface is configured to receive (i) data specifying a three-dimensional virtual space, (ii) a position and direction in the three-dimensional virtual space, the position and direction being input by the first user, and (iii) a video stream captured from a camera on a device of the first user. The first user's camera is positioned to capture photographic images of the first user. The web browser, implemented on the processor, is configured to download a web application from a server and execute the web application. The web application includes a texture mapper and a renderer. The texture mapper is configured to texture map the video stream onto a three-dimensional model of an avatar. The renderer is configured to render, from the perspective of a virtual camera of the second user and for display to the second user, the three-dimensional virtual space including the texture-mapped three-dimensional model of the avatar located at the position and oriented in the direction. By managing texture mapping within the web application, embodiments avoid the need to install specialized software.

[0014] In an embodiment, a computer-implemented method allows for a presentation in a virtual conference including a plurality of participants. In the method, data specifying a three-dimensional virtual space is received. A position and direction in the three-dimensional virtual space are also received. The position and direction were input by a first participant of the plurality of participants to the conference. A video stream captured from a camera on a device of the first participant is also received. The camera was positioned to capture photographic images of the first participant. The video stream is texture mapped onto a three-dimensional model of an avatar. Additionally, a presentation stream from the device of the first participant is received. The presentation stream is texture mapped onto a three-dimensional model of a presentation screen. Finally, the three-dimensional virtual space with the texture-mapped avatar and the texture-mapped presentation screen is rendered, from the perspective of a virtual camera of a second participant of the plurality of participants, for display to the second participant. In this way, embodiments allow for presentations in a social conference environment.

[0015] In an embodiment, a computer-implemented method provides audio for a virtual conference including a plurality of participants. In the method, a three-dimensional virtual space including an avatar with texture-mapped video of a second user is rendered, from the perspective of a virtual camera of a first user, for display to the first user. The virtual camera is at a first position in the three-dimensional virtual space, and the avatar is at a second position in the three-dimensional virtual space. An audio stream from a microphone of a device of the second user is received. The microphone was positioned to capture speech of the second user. The volume of the received audio stream is adjusted to determine a left audio stream and a right audio stream that together provide a sense of where the second position is in the three-dimensional virtual space relative to the first position. The left audio stream and the right audio stream are output in stereo to be played to the first user.
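As an illustration only, the left-right volume adjustment described above might be sketched as follows. The sketch assumes positions on a two-dimensional floor plane, a camera yaw measured counterclockwise from the +X axis, and a constant-power pan law; none of these choices comes from the specification, and the function names `stereo_gains` and `apply_gains` are hypothetical.

```python
import math


def stereo_gains(camera_pos, camera_yaw, avatar_pos):
    """Return (left_gain, right_gain) for a speaker located at avatar_pos.

    camera_pos, avatar_pos: (x, y) points on the floor plane.
    camera_yaw: direction the virtual camera faces, in radians,
                counterclockwise from the +X axis.
    These conventions are assumptions made for this sketch.
    """
    dx = avatar_pos[0] - camera_pos[0]
    dy = avatar_pos[1] - camera_pos[1]
    # Bearing of the avatar relative to the camera's forward direction.
    relative = math.atan2(dy, dx) - camera_yaw
    # pan is -1 when the avatar is fully to the left, +1 fully to the right.
    pan = -math.sin(relative)
    # Constant-power pan law (an assumed choice): keeps the combined
    # power of the two channels constant as the source moves sideways.
    theta = (pan + 1.0) * math.pi / 4.0
    return math.cos(theta), math.sin(theta)


def apply_gains(samples, gains):
    """Split a mono sample list into left and right streams."""
    left_gain, right_gain = gains
    left = [s * left_gain for s in samples]
    right = [s * right_gain for s in samples]
    return left, right
```

An avatar directly ahead of the virtual camera yields equal gains in both channels, while an avatar off to one side shifts most of the volume to that side.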

[0016] In an embodiment, a computer-implemented method provides audio for a virtual conference. In the method, a three-dimensional virtual space including an avatar with texture-mapped video of a second user is rendered, from the perspective of a virtual camera of a first user, for display to the first user. The virtual camera is at a first position in the three-dimensional virtual space, and the avatar is at a second position in the three-dimensional virtual space. An audio stream from a microphone of a device of the second user is received. Whether the virtual camera and the avatar are located in the same area from among a plurality of areas is determined. When the virtual camera and the avatar are determined not to be located in the same area, the audio stream is attenuated. The attenuated audio stream is output to be played to the first user. In this way, embodiments allow for private and side conversations in a virtual videoconferencing environment.
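A minimal sketch of the same-area check follows. It assumes that areas are axis-aligned rectangles on the floor plane and that a fixed gain of 0.2 is applied across areas; the `Area` class, the helper names, and the gain value are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """A rectangular region of the floor plane (assumed shape)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, point):
        x, y = point
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def find_area(areas, point):
    """Return the first area containing point, or None if it is outside all."""
    for area in areas:
        if area.contains(point):
            return area
    return None


def area_gain(areas, camera_pos, avatar_pos, cross_area_gain=0.2):
    """Return 1.0 when camera and avatar share an area, else the reduced gain.

    Two positions that both lie outside every area are treated as sharing
    the same (unnamed) area, which is a simplification of this sketch.
    """
    if find_area(areas, camera_pos) == find_area(areas, avatar_pos):
        return 1.0
    return cross_area_gain
```

Multiplying each audio sample by the returned gain attenuates speech that originates in a different area, which is one way a side conversation could be kept from carrying across the room.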

[0017] In an embodiment, a computer-implemented method efficiently streams video for a virtual conference. In the method, a distance between a first user and a second user in a virtual conference space is determined. A video stream captured from a camera on a device of the first user is received. The camera was positioned to capture photographic images of the first user. The resolution or bitrate of the video stream is reduced based on the determined distance, such that a closer distance results in a higher resolution than a farther distance. The video stream is transmitted at the reduced resolution or bitrate to a device of the second user for display to the second user within the virtual conference space. The video stream is to be texture mapped onto an avatar of the first user for display to the second user within the virtual conference space. In this way, embodiments allocate bandwidth and computing resources efficiently even when there are a large number of conference participants.
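One way to picture the distance-based reduction is a table of distance tiers, sketched below. The tier boundaries and resolutions are invented for illustration and do not appear in the specification; `choose_resolution` is a hypothetical helper.

```python
import math

# (maximum distance, (width, height)) pairs, nearest tier first.
# All numbers are hypothetical and chosen only for illustration.
RESOLUTION_TIERS = [
    (5.0, (1280, 720)),
    (15.0, (640, 360)),
    (40.0, (320, 180)),
]
FARTHEST_RESOLUTION = (160, 90)


def choose_resolution(first_user_pos, second_user_pos):
    """Pick a video resolution that shrinks as the users move apart."""
    distance = math.dist(first_user_pos, second_user_pos)
    for max_distance, resolution in RESOLUTION_TIERS:
        if distance <= max_distance:
            return resolution
    return FARTHEST_RESOLUTION
```

Because the tiers are ordered from nearest to farthest, a closer pair of users always receives a resolution at least as high as a more distant pair.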

[0018] In an embodiment, a computer-implemented method allows for modeling in a virtual videoconference. In the method, a three-dimensional model of a virtual environment, a mesh representing a three-dimensional model of an object, and a video stream from a participant of the virtual videoconference are received. The video stream is texture mapped onto an avatar navigable by the participant. The texture-mapped avatar and the mesh representing the three-dimensional model of the object within the virtual environment are rendered for display.

[0019] System, device, and computer program product embodiments are also disclosed.

[0020] Further embodiments, features, and advantages of the invention, as well as the structure and operation of the various embodiments, are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings
[0021] The accompanying drawings, which are incorporated herein and form part of the specification, illustrate the present disclosure and, together with the description, further serve to explain the principles of the disclosure and to enable a person skilled in the relevant art to make and use the disclosure.

[0022] FIG. 1 is a diagram illustrating an example interface that provides videoconferencing in a virtual environment with video streams mapped onto avatars.
[0023] FIG. 2 is a diagram illustrating a three-dimensional model used to render a virtual environment with avatars for videoconferencing.
[0024] FIG. 3 is a diagram illustrating a system that provides videoconferencing in a virtual environment.
[0025] FIGS. 4A-C illustrate how data is transferred between the various components of the system of FIG. 3 to provide videoconferencing.
[0026] FIG. 5 is a flowchart illustrating a method for adjusting relative left-right volume to provide a sense of position in a virtual environment during a videoconference.
[0027] FIG. 6 is a chart illustrating how volume falls off as the distance between avatars increases.
[0028] FIG. 7 is a flowchart illustrating a method for adjusting relative volume to provide different volume areas in a virtual environment during a videoconference.
[0029] FIGS. 8A-B are diagrams illustrating different volume areas in a virtual environment during a videoconference.
[0030] FIGS. 9A-C are diagrams illustrating traversal of a hierarchy of volume areas in a virtual environment during a videoconference.
[0031] FIG. 10 illustrates interaction with a three-dimensional model in a three-dimensional virtual environment.
[0032] FIG. 11 illustrates presentation screen sharing in a three-dimensional virtual environment used for videoconferencing.
[0033] FIG. 12 is a flowchart illustrating a method for apportioning available bandwidth based on the relative positions of avatars within a three-dimensional virtual environment.
[0034] FIG. 13 is a chart illustrating how a priority value can fall off as the distance between avatars increases.
[0035] FIG. 14 is a chart illustrating how the bandwidth allocated can vary based on relative priority.
[0036] FIG. 15 is a diagram illustrating components of a device used to provide videoconferencing within a virtual environment.

[0037] The drawing in which an element first appears is typically indicated by the leftmost digit or digits in the corresponding reference number. In the drawings, like reference numbers may indicate identical or functionally similar elements.

Detailed Description
Videoconferencing with Avatars in a Virtual Environment
[0038] FIG. 1 is a diagram illustrating an example interface 100 that provides videoconferencing in a virtual environment with video streams mapped onto avatars.

[0039] Interface 100 may be displayed to a participant in a videoconference. For example, interface 100 may be rendered for display to the participant and may be constantly updated as the videoconference progresses. A user may control the orientation of their virtual camera using, for example, keyboard inputs. In this way, the user can navigate around the virtual environment. In an embodiment, different inputs may change the virtual camera's X and Y position and its pan and tilt angles in the virtual environment. In further embodiments, a user may use inputs to alter the height (Z coordinate) or yaw of the virtual camera. In still further embodiments, a user may enter inputs that cause the virtual camera to "hop" up and then return to its original position, simulating gravity. The inputs available to navigate the virtual camera may include, for example, keyboard and mouse inputs, such as the WASD keyboard keys to move the virtual camera forward, backward, left, and right on the X-Y plane, the space bar key to "hop" the virtual camera, and mouse movements that specify changes in pan and tilt angles.
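The navigation inputs described above can be sketched as a small state update, shown here purely as an illustration. The step size, hop speed, gravity constant, and the `VirtualCamera` class itself are assumptions of this sketch rather than details from the specification.

```python
import math
from dataclasses import dataclass


@dataclass
class VirtualCamera:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0      # height above the floor
    pan: float = 0.0    # radians, counterclockwise from the +X axis
    tilt: float = 0.0   # radians, positive looks up
    vz: float = 0.0     # vertical speed during a hop

    STEP = 0.5          # distance moved per key press (assumed)
    HOP_SPEED = 4.0     # initial upward speed of a hop (assumed)
    GRAVITY = 9.8       # downward acceleration (assumed)

    def press(self, key):
        """Handle a WASD or space bar key press."""
        forward = (math.cos(self.pan), math.sin(self.pan))
        left = (-forward[1], forward[0])
        moves = {
            "w": forward,
            "s": (-forward[0], -forward[1]),
            "a": left,
            "d": (-left[0], -left[1]),
        }
        if key in moves:
            self.x += moves[key][0] * self.STEP
            self.y += moves[key][1] * self.STEP
        elif key == " " and self.z == 0.0:
            self.vz = self.HOP_SPEED  # a hop can only start from the floor

    def mouse_move(self, d_pan, d_tilt):
        """Apply mouse movement as changes in pan and tilt angles."""
        self.pan = (self.pan + d_pan) % (2 * math.pi)
        limit = math.pi / 2
        self.tilt = max(-limit, min(limit, self.tilt + d_tilt))

    def tick(self, dt):
        """Advance a hop by dt seconds; gravity returns the camera to the floor."""
        if self.z > 0.0 or self.vz > 0.0:
            self.z += self.vz * dt
            self.vz -= self.GRAVITY * dt
            if self.z <= 0.0:
                self.z, self.vz = 0.0, 0.0
```

Movement is computed relative to the current pan angle, so pressing W always moves the camera in the direction it is facing on the X-Y plane.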

[0040] Interface 100 includes avatars 102A and 102B, each of which represents a different participant in the videoconference. Avatars 102A and 102B have texture-mapped video streams 104A and 104B, respectively, from the devices of the first and second participants. A texture map is an image applied (mapped) to the surface of a shape or polygon. Here, the images are the respective frames of the video. The camera devices capturing video streams 104A and 104B are positioned to capture the faces of the respective participants. In this way, the avatars have moving images of the participants' faces texture mapped onto them as the participants in the meeting talk and listen.
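To make the idea of a per-frame texture concrete, the following sketch looks up the pixel that a point on the avatar surface would show, given texture coordinates (u, v). It is a simplified illustration that assumes nearest-neighbor sampling and a top-left origin; in practice a browser renderer would perform this step on the graphics hardware. The names `sample_frame` and `AvatarVideoTexture` are hypothetical.

```python
def sample_frame(frame, u, v):
    """Nearest-neighbor lookup of the pixel at texture coordinates (u, v).

    frame: list of rows, each row a list of pixels.
    u, v:  coordinates in [0, 1], with (0, 0) at the top-left pixel
           (an assumed convention).
    """
    height, width = len(frame), len(frame[0])
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return frame[row][col]


class AvatarVideoTexture:
    """Holds the most recent decoded frame of one participant's video."""

    def __init__(self):
        self.frame = None

    def update(self, frame):
        # Each newly received video frame replaces the texture image.
        self.frame = frame

    def color_at(self, u, v):
        return sample_frame(self.frame, u, v)
```

Replacing the stored frame whenever a new one arrives is what makes the image on the avatar move along with the participant.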

[0041] Similar to how the virtual camera is controlled by the user viewing interface 100, the location and direction of avatars 102A and 102B are controlled by the respective participants that they represent. Avatars 102A and 102B are three-dimensional models represented by a mesh. Each avatar 102A and 102B may have the participant's name underneath the avatar.

[0042] The respective avatars 102A and 102B are controlled by the various users. Each avatar may be positioned at a point corresponding to where that user's own virtual camera is located within the virtual environment. Just as the user viewing interface 100 can move the virtual camera around, the various users can move their respective avatars 102A and 102B around.

[0043] The virtual environment rendered in interface 100 includes a background image 120 and a three-dimensional model 118 of an arena. The arena may be a venue or building in which the videoconference is to take place. The arena may include a floor area bounded by walls. Three-dimensional model 118 can include a mesh and texture. Other ways to mathematically represent the surface of three-dimensional model 118 may be possible as well. For example, polygon modeling, curve modeling, and digital sculpting may be possible. For example, three-dimensional model 118 may be represented by voxels, splines, geometric primitives, polygons, or any other possible representation in three-dimensional space. Three-dimensional model 118 may also include specification of light sources. The light sources can include, for example, point, directional, spotlight, and ambient light sources. The objects may also have certain properties describing how they reflect light. In examples, the properties may include diffuse, ambient, and spectral lighting interactions.
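As a rough illustration of how ambient and diffuse properties interact with a point light source, the sketch below computes the brightness of a surface point using Lambert's cosine law. The coefficient values and the single-light setup are assumptions of this sketch; the specification does not prescribe a particular shading model.

```python
import math


def normalize(v):
    """Scale a nonzero 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def shade(normal, surface_point, light_pos, ambient=0.1, diffuse=0.9):
    """Brightness in [0, 1] of a surface point lit by one point light.

    ambient: constant term applied regardless of light direction.
    diffuse: weight of the Lambertian term, which depends on the angle
             between the surface normal and the direction to the light.
    Both coefficients are hypothetical values.
    """
    n = normalize(normal)
    to_light = normalize(
        tuple(l - p for l, p in zip(light_pos, surface_point))
    )
    lambert = max(0.0, sum(a * b for a, b in zip(n, to_light)))
    return min(1.0, ambient + diffuse * lambert)
```

A surface facing the light directly receives the full diffuse contribution, while a surface facing away is lit only by the ambient term.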

[0044] In addition to the arena, the virtual environment can include various other three-dimensional models that illustrate different components of the environment. For example, the three-dimensional environment can include a decorative model 114, a speaker model 116, and a presentation screen model 122. Just like model 118, these models can be represented using any mathematical way to represent a geometric surface in three-dimensional space. These models may be separate from model 118 or combined into a single representation of the virtual environment.

[0045] Decorative models, such as model 114, serve to enhance realism and increase the aesthetic appeal of the arena. Speaker model 116 may virtually emit sound, such as presentation audio and background music, as described in greater detail below with respect to FIGS. 5 and 7. Presentation screen model 122 can serve to provide an outlet for presenting a presentation. Video of the presenter or a presentation screen share may be texture mapped onto presentation screen model 122.

[0046] Button 108 may provide the user with a list of participants. In one example, after a user selects button 108, the user can chat with other participants by sending text messages, individually or as a group.

[0047] Button 110 may enable a user to change attributes of the virtual camera used to render interface 100. For example, the virtual camera may have a field of view specifying the angle over which data is rendered for display. Model data within the camera's field of view is rendered, while model data outside the camera's field of view may not be. By default, the virtual camera's field of view may be set somewhere between 60° and 110°, which is commensurate with a wide-angle lens and human vision. However, selecting button 110 may cause the virtual camera to increase the field of view to exceed 170°, commensurate with a fisheye lens. This may enable a user to have broader peripheral awareness of their surroundings in the virtual environment.
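The effect of widening the field of view can be illustrated with a simplified horizontal visibility check. This sketch considers only the pan angle on the floor plane and ignores tilt, near and far planes, and occlusion, all of which a full renderer would take into account; `in_field_of_view` is a hypothetical helper.

```python
import math


def in_field_of_view(camera_pos, camera_pan, fov_degrees, point):
    """Return True when point lies within the camera's horizontal field of view.

    camera_pan is in radians, counterclockwise from the +X axis
    (an assumed convention); fov_degrees is the full viewing angle.
    """
    dx = point[0] - camera_pos[0]
    dy = point[1] - camera_pos[1]
    relative = math.atan2(dy, dx) - camera_pan
    # Wrap into [-pi, pi] so that both sides of the view axis compare evenly.
    relative = math.atan2(math.sin(relative), math.cos(relative))
    return abs(relative) <= math.radians(fov_degrees) / 2.0
```

For example, an avatar standing 60° off the viewing axis is outside a 90° field of view but inside a 170° one, which is the kind of peripheral awareness the wider setting provides.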

[0048] Finally, button 112 causes the user to exit the virtual environment. Selecting button 112 may cause a notification to be sent to the devices belonging to the other participants, signaling those devices to stop displaying the avatar corresponding to the user who was previously viewing interface 100.

[0049] In this way, the interface uses a virtual 3D space to conduct video conferencing. Every user controls an avatar, which the user can move around, turn to look around, make jump, or otherwise change in position or orientation. A virtual camera shows the user the virtual 3D environment and the other avatars. The avatars of the other users have, as an integral part, a virtual display that shows the webcam image of the user.

[0050] By giving users a sense of space and allowing them to see each other's faces, embodiments provide a more social experience than conventional web conferencing or conventional MMO gaming. That more social experience has a variety of applications. For example, it can be used in online shopping. For example, interface 100 has applications in providing virtual supermarkets, houses of worship, trade shows, B2B sales, B2C sales, schooling, restaurants or lunchrooms, product releases, construction site visits (e.g., for architects, engineers, contractors), office spaces (e.g., people work "at their desks" virtually), remote control of machinery (ships, vehicles, planes, submarines, drones, drilling equipment, etc.), plant/factory control rooms, medical procedures, garden designs, virtual bus tours with a guide, music events (e.g., concerts), lectures (e.g., TED talks), meetings of political parties, board meetings, underwater research, research into hard-to-reach places, training for emergencies (e.g., fire), cooking, shopping (with checkout and delivery), virtual arts and crafts (e.g., painting and pottery), marriages, funerals, baptisms, remote sports training, counseling, treating fears (e.g., confrontation therapy), fashion shows, amusement parks, home decoration, watching sports, watching esports, watching performances captured using a three-dimensional camera, playing board and role-playing games, walking over/through medical imagery, viewing geological data, learning languages, meeting in a space for the visually impaired, meeting in a space for the hearing impaired, participation in events by people who normally cannot walk or stand up, presenting the news or weather, talk shows, book signings, voting, MMOs, buying/selling virtual locations (such as those available in some MMOs like the SECOND LIFE game available from Linden Research, Inc. of San Francisco, CA), flea markets, garage sales, travel agencies, banks, archives, computer process management, fencing/sword fighting/martial arts, reenactments (e.g., reenacting a crime scene and/or accident), rehearsing a real event (e.g., a wedding, presentation, show, space walk), evaluating or viewing a real event captured with three-dimensional cameras, livestock shows, zoos, experiencing life as a tall/short/blind/deaf/white/black person (e.g., a modified video stream or still image for the virtual world to simulate the perspective whose reactions a user wishes to experience), job interviews, game shows, interactive fiction (e.g., murder mystery), virtual fishing, virtual sailing, psychological research, behavioral analysis, virtual sports (e.g., climbing/bouldering), controlling the lights etc. in a house or other location (home automation), memory palaces, archaeology, gift shops, virtual visits so that customers will be more comfortable on their real visit, virtual medical procedures to explain the procedure and make people feel more comfortable, virtual trading floors/financial marketplaces/stock markets (e.g., integrating real-time data and video feeds into the virtual world for real-time trading and analytics), virtual locations that people have to go to as part of their work so that they actually meet each other organically (e.g., a user who wants to create an invoice can only do so from within the virtual location), augmented reality in which a person's face is projected on top of an AR headset (or helmet) so that the person's facial expressions can be seen (e.g., useful for military, law enforcement, firefighters, special operations), and making reservations (e.g., for a certain holiday home/car/etc.).

[0051] FIG. 2 is a diagram 200 illustrating a three-dimensional model used to render a virtual environment with avatars for videoconferencing. Just as illustrated in FIG. 1, the virtual environment here includes a three-dimensional arena 118 and various three-dimensional models, including three-dimensional models 114 and 122. Also as illustrated in FIG. 1, diagram 200 includes avatars 102A and 102B navigating around the virtual environment.

[0052] As described above, interface 100 in FIG. 1 is rendered from the perspective of a virtual camera. That virtual camera is illustrated in diagram 200 as virtual camera 204. As mentioned above, the user viewing interface 100 in FIG. 1 can control virtual camera 204 and navigate it in three-dimensional space. Interface 100 is continually updated according to the new position of virtual camera 204 and any changes to the models within the field of view of virtual camera 204. As described above, the field of view of virtual camera 204 may be a frustum defined, at least in part, by horizontal and vertical field-of-view angles.
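
The frustum test described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the disclosed system: the point is assumed to be already expressed in camera space with the camera looking down the negative z-axis (a common WebGL convention), and the names `CameraSpacePoint`, `Frustum`, and `isInsideFrustum` as well as the near and far distances are hypothetical.

```typescript
// Hypothetical types: a point in camera space (camera at the origin,
// looking down -z, x to the right, y up).
interface CameraSpacePoint { x: number; y: number; z: number; }

// A frustum described by horizontal and vertical field-of-view angles.
// The near/far clipping distances are an added assumption.
interface Frustum {
  horizontalFovDeg: number;
  verticalFovDeg: number;
  near: number;
  far: number;
}

// Returns true when the point lies inside the frustum and would be rendered.
function isInsideFrustum(p: CameraSpacePoint, f: Frustum): boolean {
  const depth = -p.z; // distance in front of the camera
  if (depth < f.near || depth > f.far) return false;
  // Half-extent of the visible area at this depth, from half the FOV angle.
  const halfWidth = depth * Math.tan((f.horizontalFovDeg * Math.PI) / 360);
  const halfHeight = depth * Math.tan((f.verticalFovDeg * Math.PI) / 360);
  return Math.abs(p.x) <= halfWidth && Math.abs(p.y) <= halfHeight;
}
```

With a 90° horizontal field of view, a point far off to the side is culled, whereas widening the horizontal field of view toward 170° brings the same point into view, which matches the broader peripheral awareness described for button 110.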

[0053] As described above with respect to FIG. 1, a background image, or texture, may define at least part of the virtual environment. The background image may capture aspects of the virtual environment that are meant to appear at a distance. The background image may be texture mapped onto a sphere 202. Virtual camera 204 may be at the origin of sphere 202. In this way, distant features of the virtual environment may be rendered efficiently.
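
As a rough sketch of how a background image could be looked up on such a sphere, the function below maps a view direction from the camera (which sits at the sphere's origin) to texture coordinates. It assumes an equirectangular background image and a camera-forward direction of negative z; the source does not specify either, and the names `Direction`, `TexCoord`, and `directionToEquirectUv` are hypothetical.

```typescript
// Hypothetical types for a view direction and a texture coordinate in [0, 1].
interface Direction { x: number; y: number; z: number; }
interface TexCoord { u: number; v: number; }

// Maps a view direction to (u, v) on an equirectangular background image.
// Assumption: -z is "forward" and maps to the image center, +y is up.
function directionToEquirectUv(d: Direction): TexCoord {
  const len = Math.hypot(d.x, d.y, d.z);
  if (len === 0) throw new Error("direction must be non-zero");
  const x = d.x / len;
  const y = d.y / len;
  const z = d.z / len;
  const u = 0.5 + Math.atan2(x, -z) / (2 * Math.PI); // longitude
  const v = 0.5 - Math.asin(y) / Math.PI;            // latitude, top of image at v = 0
  return { u, v };
}
```

Because the camera stays at the center of the sphere, the lookup depends only on the view direction and not on the camera's position, which is one reason this kind of background is inexpensive to render.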

[0054] In other embodiments, a shape other than sphere 202 may be used to texture map the background image. In various alternative embodiments, the shape may be a cylinder, cube, rectangular prism, or any other three-dimensional geometry.

[0055] FIG. 3 is a diagram illustrating a system 300 that provides videoconferences in a virtual environment. System 300 includes a server 302 coupled to devices 306A and 306B via one or more networks 304.

[0056] Server 302 provides the services to connect a videoconference session between devices 306A and 306B. As described in greater detail below, server 302 communicates notifications to the devices of conference participants (e.g., devices 306A and 306B) when new participants join the conference and when existing participants leave the conference. Server 302 communicates messages describing the position and direction, within the three-dimensional virtual space, of each participant's virtual camera. Server 302 also communicates video and audio streams between the participants' respective devices (e.g., devices 306A and 306B). Finally, server 302 stores data describing the three-dimensional virtual space and transmits it to devices 306A and 306B.
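
The notifications and position messages described above could be modeled on the client as shown below. This is a minimal sketch under assumptions: the source does not define a wire format, so the message shapes (`ServerMessage`), the `Pose` fields, and the `applyServerMessage` helper are all hypothetical.

```typescript
// Hypothetical pose of a participant's virtual camera.
interface Pose { x: number; y: number; z: number; panDeg: number; tiltDeg: number; }

// Hypothetical message shapes for join, leave, and pose updates.
type ServerMessage =
  | { kind: "joined"; participantId: string }
  | { kind: "left"; participantId: string }
  | { kind: "pose"; participantId: string; pose: Pose };

// Avatars known to this client, keyed by participant id. A null pose means
// the participant has joined but has not reported a position yet.
type AvatarTable = Map<string, Pose | null>;

// Returns a new table reflecting the message; the input table is not mutated.
function applyServerMessage(avatars: AvatarTable, msg: ServerMessage): AvatarTable {
  const next: AvatarTable = new Map(avatars);
  switch (msg.kind) {
    case "joined":
      if (!next.has(msg.participantId)) next.set(msg.participantId, null);
      break;
    case "left":
      // The avatar is removed so it is no longer rendered.
      next.delete(msg.participantId);
      break;
    case "pose":
      // Ignore poses for participants that are not (or no longer) present.
      if (next.has(msg.participantId)) next.set(msg.participantId, msg.pose);
      break;
  }
  return next;
}
```

A client would re-render the scene from the returned table after each message, so a "left" message naturally results in the departed participant's avatar disappearing.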

[0057] In addition to the data needed for the virtual conference, server 302 may provide executable information that instructs devices 306A and 306B on how to render the data to provide the interactive conference.

[0058] Server 302 responds to requests with responses. Server 302 may be a web server. A web server is software and hardware that uses HTTP (Hypertext Transfer Protocol) and other protocols to respond to client requests made over the World Wide Web. The main job of a web server is to display website content by storing, processing, and delivering web pages to users.

[0059] In an alternative embodiment, communication between devices 306A and 306B happens not through server 302 but on a peer-to-peer basis. In that embodiment, one or more of the data describing each participant's location and direction, the notifications regarding new and existing participants, and each participant's video and audio streams are communicated directly between devices 306A and 306B rather than through server 302.

[0060] Network 304 enables communication between the various devices 306A and 306B and server 302. Network 304 may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless wide area network (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, a wireless network, a WiFi network, a WiMax network, any other type of network, or any combination of two or more such networks.

[0061] Devices 306A and 306B are the respective devices of the participants in the virtual conference. Devices 306A and 306B each receive the data necessary to conduct the virtual conference and render the data necessary to provide the virtual conference. As described in greater detail below, devices 306A and 306B include a display to present the rendered conference information, inputs that allow the user to control the virtual camera, a speaker (such as a headset) to provide audio to the user for the conference, a microphone to capture the user's voice input, and a camera positioned to capture video of the user's face.

[0062] Devices 306A and 306B can be any type of computing device, including a laptop, a desktop, a smartphone, a tablet computer, or a wearable computer (such as a smartwatch or an augmented reality or virtual reality headset).

[0063] Web browsers 308A and 308B can retrieve a network resource (such as a web page) addressed by a link identifier (such as a uniform resource locator, or URL) and present the network resource for display. In particular, web browsers 308A and 308B are software applications for accessing information on the World Wide Web. Usually, web browsers 308A and 308B request such resources using the Hypertext Transfer Protocol (HTTP or HTTPS). When a user requests a web page from a particular website, the web browser retrieves the necessary content from a web server, interprets and executes the content, and then displays the page on the display of device 306A or 306B, shown as client/counterpart conference applications 308A and 308B. In examples, the content may include HTML and client-side scripting, such as JavaScript. Once the page is displayed, a user can input information and make selections on the page, which can cause web browsers 308A and 308B to make further requests.

[0064] Conference applications 310A and 310B may be web applications downloaded from server 302 and configured to be executed by the respective web browsers 308A and 308B. In an embodiment, conference applications 310A and 310B may be JavaScript applications. In one example, conference applications 310A and 310B may be written in a higher-level language, such as TypeScript, and translated or compiled into JavaScript. Conference applications 310A and 310B are configured to interact with the WebGL JavaScript application programming interface. They may have control code written in JavaScript and shader code written in OpenGL ES Shading Language (GLSL ES). Using the WebGL API, conference applications 310A and 310B may be able to utilize a graphics processing unit (not shown) of devices 306A and 306B. Further, this enables rendering of interactive two-dimensional and three-dimensional graphics without the use of plug-ins.

[0065] Conference applications 310A and 310B receive from server 302 the data describing the position and direction of other avatars and the three-dimensional modeling information describing the virtual environment. In addition, conference applications 310A and 310B receive from server 302 the video and audio streams of other conference participants.

[0066] Conference applications 310A and 310B render the three-dimensional modeling data, including the data describing the three-dimensional environment and the data representing the respective participant avatars. This rendering may involve rasterization, texture mapping, ray tracing, shading, or other rendering techniques. In an embodiment, the rendering may involve ray tracing based on the characteristics of the virtual camera. Ray tracing involves generating an image by tracing a path of light as pixels in an image plane and simulating the effects of its encounters with virtual objects. In some embodiments, to enhance realism, the ray tracing may simulate optical effects such as reflection, refraction, scattering, and dispersion.
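
To make the notion of a ray "encountering" a virtual object concrete, the sketch below computes where a ray first hits a sphere, which is a standard building block of ray tracers. It is a generic textbook calculation offered as an illustration, not the renderer of the disclosed system; the names `V3` and `intersectSphere` are hypothetical.

```typescript
// Hypothetical 3D vector type with the two helpers the test needs.
interface V3 { x: number; y: number; z: number; }
const sub = (a: V3, b: V3): V3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const dot = (a: V3, b: V3): number => a.x * b.x + a.y * b.y + a.z * b.z;

// Returns the smallest non-negative ray parameter t at which the ray
// origin + t * dir hits the sphere, or null if the ray misses it.
// Assumes dir is non-zero.
function intersectSphere(origin: V3, dir: V3, center: V3, radius: number): number | null {
  const oc = sub(origin, center);
  // Solve |oc + t * dir|^2 = radius^2, a quadratic in t.
  const a = dot(dir, dir);
  const b = 2 * dot(oc, dir);
  const c = dot(oc, oc) - radius * radius;
  const discriminant = b * b - 4 * a * c;
  if (discriminant < 0) return null; // no real solution: the ray misses
  const root = Math.sqrt(discriminant);
  const tNear = (-b - root) / (2 * a);
  const tFar = (-b + root) / (2 * a);
  if (tNear >= 0) return tNear;
  if (tFar >= 0) return tFar; // origin is inside the sphere
  return null;                // sphere is entirely behind the ray
}
```

A ray tracer would cast one such ray per pixel from the virtual camera, keep the nearest hit among all objects, and then shade that hit point.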

[0067] In this way, the user uses web browsers 308A and 308B to enter a virtual space. The scene is displayed on the screen of the user. The webcam video stream and microphone audio stream of the user are sent to server 302. When other users enter the virtual space, an avatar model is created for them. The position of this avatar is sent to the server and received by the other users. Other users also receive a notification from server 302 that an audio/video stream is available. The video stream of a user is placed on the avatar that was created for that user. The audio stream is played back as coming from the position of the avatar.

[0068] FIGS. 4A-4C illustrate how data is transferred between the various components of the system in FIG. 3 to provide videoconferencing. Like FIG. 3, each of FIGS. 4A-4C depicts the connection between server 302 and devices 306A and 306B. In particular, FIGS. 4A-4C illustrate example data flows between those devices.

[0069] FIG. 4A shows a diagram 400 illustrating how server 302 transmits data describing the virtual environment to devices 306A and 306B. In particular, both devices 306A and 306B receive from server 302 the three-dimensional arena 404, background texture 402, space hierarchy 408, and any other three-dimensional modeling information 406.

[0070] As described above, background texture 402 is an image illustrating distant features of the virtual environment. The image may be regular (such as a brick wall) or irregular. Background texture 402 may be encoded in any common image file format, such as bitmap, JPEG, GIF, or another image file format. It describes the background image to be rendered against, for example, a sphere at a distance.

[0071] Three-dimensional arena 404 is a three-dimensional model of the space in which the conference is to take place. As described above, it may include, for example, a mesh and possibly its own texture information to be mapped onto the three-dimensional primitives it describes. It may define the space in which the virtual camera and the respective avatars can navigate within the virtual environment. Accordingly, it may be bounded by edges (such as walls or fences) that show users the perimeter of the navigable virtual environment.

[0072] Space hierarchy 408 is data specifying partitions in the virtual environment. These partitions are used to determine how sound is processed before it is transferred between participants. As described below, this partition data may be hierarchical and may describe sound processing that allows for areas where participants in the virtual conference can have private conversations or side conversations.
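
One way such hierarchical partition data could be represented is a tree of nested areas that is searched for the innermost area containing an avatar. The sketch below is an assumption-laden illustration: the source does not specify the shape of the partitions, so the axis-aligned rectangular bounds, the `Area` type, and the `findInnermostArea` function are all hypothetical.

```typescript
// Hypothetical partition node: a rectangular area on the horizontal plane
// that may contain nested sub-areas (for example, a table inside a room).
interface Area {
  name: string;
  minX: number;
  maxX: number;
  minY: number;
  maxY: number;
  children: Area[];
}

function containsPoint(a: Area, x: number, y: number): boolean {
  return x >= a.minX && x <= a.maxX && y >= a.minY && y <= a.maxY;
}

// Returns the most deeply nested area containing the point, or null when the
// point lies outside the root area altogether.
function findInnermostArea(root: Area, x: number, y: number): Area | null {
  if (!containsPoint(root, x, y)) return null;
  for (const child of root.children) {
    const hit = findInnermostArea(child, x, y);
    if (hit !== null) return hit;
  }
  return root;
}
```

Audio processing between two participants could then depend on whether their avatars resolve to the same area, for example attenuating sound that crosses the boundary of a private conversation area.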

[0073] Three-dimensional model 406 is any other three-dimensional modeling information needed to conduct the conference. In one embodiment, this may include information describing the respective avatars. Alternatively or additionally, this information may include product demonstrations.

[0074] With the information needed to conduct the conference sent to the participants, FIGS. 4B and 4C illustrate how server 302 forwards information from one device to another. FIG. 4B shows a diagram 420 illustrating how server 302 receives information from respective devices 306A and 306B, and FIG. 4C shows a diagram 420 illustrating how server 302 transmits the information to respective devices 306B and 306A. In particular, device 306A transmits position and direction 422A, video stream 424A, and audio stream 426A to server 302, which transmits position and direction 422A, video stream 424A, and audio stream 426A to device 306B. Likewise, device 306B transmits position and direction 422B, video stream 424B, and audio stream 426B to server 302, which transmits position and direction 422B, video stream 424B, and audio stream 426B to device 306A.

[0075] Position and direction 422A and 422B describe the position and direction of the virtual camera for the users of devices 306A and 306B, respectively. As described above, the position may be a coordinate in three-dimensional space (e.g., x, y, z coordinates) and the direction may be a direction in three-dimensional space (e.g., pan, tilt, roll). In some embodiments, the user may be unable to control the virtual camera's roll, so the direction may specify only pan and tilt angles. Similarly, in some embodiments, the user may be unable to change the avatar's z coordinate (because the avatar is bounded by virtual gravity), so the z coordinate may be unnecessary. In this way, position and direction 422A and 422B each may include at least a coordinate on a horizontal plane in the three-dimensional virtual space and a pan and tilt value. Alternatively or additionally, the user may be able to make the avatar "jump," so the z position may be specified only by an indication of whether the user is making the avatar jump.
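
The reduced position-and-direction record described above (horizontal coordinates, pan, tilt, and a jump indication in place of a z coordinate) might be serialized as in the sketch below. The JSON array layout, the `CompactPose` type, and the `encodePose`/`decodePose` functions are hypothetical; the source does not prescribe any encoding.

```typescript
// Hypothetical compact pose: no roll and no z coordinate, only a jump flag.
interface CompactPose {
  x: number;       // horizontal-plane coordinate
  y: number;       // horizontal-plane coordinate
  panDeg: number;  // rotation about the vertical axis
  tiltDeg: number; // rotation up or down
  jumping: boolean;
}

// Encodes the pose as a small JSON array: [x, y, pan, tilt, jumpFlag].
function encodePose(p: CompactPose): string {
  return JSON.stringify([p.x, p.y, p.panDeg, p.tiltDeg, p.jumping ? 1 : 0]);
}

// Decodes a pose message, returning null for malformed input instead of throwing.
function decodePose(text: string): CompactPose | null {
  let raw: unknown;
  try {
    raw = JSON.parse(text);
  } catch {
    return null;
  }
  if (!Array.isArray(raw) || raw.length !== 5) return null;
  const values: unknown[] = raw;
  if (!values.every((v) => typeof v === "number" && Number.isFinite(v))) return null;
  const [x, y, panDeg, tiltDeg, jump] = values as [number, number, number, number, number];
  return { x, y, panDeg, tiltDeg, jumping: jump === 1 };
}
```

Validating on decode matters because these messages arrive from other participants' devices; a malformed message is dropped rather than allowed to corrupt the rendered scene.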

[0076] In different examples, position and direction 422A and 422B may be transmitted and received using HTTP request/response or using socket messaging.

[0077] Video streams 424A and 424B are video data captured from the cameras of the respective devices 306A and 306B. The video may be compressed. For example, the video may use any commonly known video codec, including MPEG-4, VP8, or H.264. The video may be captured and transmitted in real time.

[0078] Similarly, audio streams 426A and 426B are audio data captured from the microphones of the respective devices. The audio may be compressed. For example, the audio may use any commonly known audio codec, including MPEG-4 or Vorbis. The audio may be captured and transmitted in real time. Video stream 424A and audio stream 426A are captured, transmitted, and presented synchronously with one another. Similarly, video stream 424B and audio stream 426B are captured, transmitted, and presented synchronously with one another.

[0079] Video streams 424A and 424B and audio streams 426A and 426B may be transmitted using the WebRTC application programming interface. WebRTC is an API available in JavaScript. As described above, devices 306A and 306B download and run web applications, as conference applications 310A and 310B, and conference applications 310A and 310B may be implemented in JavaScript. Conference applications 310A and 310B may use WebRTC to receive and transmit video streams 424A and 424B and audio streams 426A and 426B by making API calls from JavaScript.

[0080] As mentioned above, when a user leaves the virtual conference, this departure is communicated to all other users. For example, if device 306A exits the virtual conference, server 302 communicates that departure to device 306B. Consequently, device 306B stops rendering the avatar corresponding to device 306A and removes that avatar from the virtual space. Additionally, device 306B stops receiving video stream 424A and audio stream 426A.

[0081] As described above, conference applications 310A and 310B may periodically or intermittently re-render the virtual space based on new information from respective video streams 424A and 424B, position and direction 422A and 422B, and new information relating to the three-dimensional environment. For simplicity, each of these updates is described here from the perspective of device 306A. However, a skilled artisan would understand that device 306B behaves similarly given similar changes.

[0082] When device 306A receives video stream 424B, device 306A texture maps frames from video stream 424B onto the avatar corresponding to device 306B. That texture-mapped avatar is re-rendered within the three-dimensional virtual space and presented to the user of device 306A.

[0083] When device 306A receives a new position and direction 422B, device 306A generates the avatar corresponding to device 306B positioned at the new position and oriented in the new direction. The generated avatar is re-rendered within the three-dimensional virtual space and presented to the user of device 306A.

[0084] In some embodiments, server 302 may send updated model information describing the three-dimensional virtual environment. For example, server 302 may send updated information 402, 404, 406, or 408. When that happens, device 306A re-renders the virtual environment based on the updated information. This may be useful when the environment changes over time. For example, an outdoor event may change from daylight to dusk as the event progresses.

[0085] Again, when device 306B exits the virtual conference, server 302 sends a notification to device 306A indicating that device 306B is no longer participating in the conference. In that case, device 306A re-renders the virtual environment without the avatar for device 306B.

[0086] While FIG. 3 and FIGS. 4A-4C are illustrated with two devices for simplicity, a skilled artisan would understand that the techniques described herein can be extended to any number of devices. Also, while FIG. 3 and FIGS. 4A-4C illustrate a single server 302, a skilled artisan would understand that the functionality of server 302 can be spread out among a plurality of computing devices. In an embodiment, the data transferred in FIG. 4A may come from one network address for server 302, while the data transferred in FIGS. 4B and 4C can be transferred to and from another network address for server 302.

[0087] In one embodiment, participants can set their webcam, microphone, speakers, and graphical settings before entering the virtual conference. In an alternative embodiment, after starting the application, users may enter a virtual lobby where they are greeted by an avatar controlled by a real person. This person is able to view and modify the webcam, microphone, speakers, and graphical settings of the user. The attendant can also instruct the user on how to use the virtual environment, for example by teaching them about looking, moving around, and interacting. When they are ready, the user automatically leaves the virtual waiting room and joins the real virtual environment.

Volume control in video conferencing in a virtual environment
[0088] Embodiments also adjust volume to provide a sense of position and space within the virtual conference. This is illustrated, for example, in FIGS. 5-7, 8A, 8B, and 9A-9C, each of which is described below.

[0089] FIG. 5 is a flowchart illustrating a method 500 for adjusting relative left-right volume to provide a sense of position in a virtual environment during a video conference.

[0090] At step 502, volume is adjusted based on the distance between the avatars. As described above, an audio stream is received from the microphone of another user's device. The volume of both the first and second audio streams is adjusted based on the distance between the second position and the first position. This is illustrated in FIG. 6.

[0091] FIG. 6 shows a chart 600 illustrating how volume falls off as the distance between avatars increases. Chart 600 plots volume against the distance between users. As the distance between users increases, the volume remains constant until reference distance 602 is reached. Beyond reference distance 602, the volume begins to fall off. In this way, all other things being equal, a closer user sounds louder than a more distant user.

[0092] The rate at which the volume falls off depends on a drop factor. This may be a coefficient built into the settings of the videoconferencing system or the client device. As illustrated by lines 608 and 610, a larger drop factor causes the volume to fall off more rapidly than a smaller one.
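By way of illustration only, the behavior described for chart 600 can be sketched as a gain function that is flat up to the reference distance and falls off afterward. The specification does not give the shape of the curve; the inverse-distance formula below, and the names `distance_gain`, `reference_distance`, and `drop_factor`, are assumptions chosen for the sketch.

```python
def distance_gain(distance: float, reference_distance: float, drop_factor: float) -> float:
    """Return a volume multiplier in (0, 1] for a sound source at `distance`.

    Assumed inverse-distance model (not taken from the specification):
    the gain is constant up to the reference distance and then falls off,
    faster for larger drop factors. Requires reference_distance > 0.
    """
    if distance <= reference_distance:
        return 1.0  # constant volume inside the reference distance
    excess = distance - reference_distance
    return reference_distance / (reference_distance + drop_factor * excess)


# A nearby avatar is at full volume; a larger drop factor attenuates faster.
near = distance_gain(1.0, reference_distance=2.0, drop_factor=1.0)
far_slow = distance_gain(4.0, reference_distance=2.0, drop_factor=1.0)
far_fast = distance_gain(4.0, reference_distance=2.0, drop_factor=3.0)
```

With these hypothetical values, the two curves play the role of lines 608 and 610: both start at full volume and diverge only beyond the reference distance.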

[0093] Returning to FIG. 5, at step 504, the relative left-right audio is adjusted based on the direction in which the avatar is located. That is, the volume of the audio output on the user's speakers (e.g., a headset) is varied to provide a sense of where the speaking user's avatar is located. The relative volume of the left and right audio streams is adjusted based on the direction of the position where the user generating the audio stream is located (e.g., the location of the speaking user's avatar) relative to the position where the user receiving the audio is located (e.g., the location of the virtual camera). The positions may be on a horizontal plane within the three-dimensional virtual space. The relative volume of the left and right audio streams is adjusted to provide a sense of where the second position is in the three-dimensional virtual space relative to the first position.

[0094] For example, at step 504, audio corresponding to an avatar to the left of the virtual camera is adjusted such that the audio is output at a higher volume on the receiving user's left ear than on the right ear. Similarly, audio corresponding to an avatar to the right of the virtual camera is adjusted such that the audio is output at a higher volume on the receiving user's right ear than on the left ear.
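As a non-limiting sketch of step 504, the left and right gains can be derived from the bearing of the speaking avatar relative to the direction the virtual camera faces on the horizontal plane. The equal-power pan law and the function name `stereo_gains` are assumptions made for illustration; the specification does not prescribe a particular panning formula.

```python
import math
from typing import Tuple


def stereo_gains(listener_xy: Tuple[float, float],
                 listener_facing_rad: float,
                 source_xy: Tuple[float, float]) -> Tuple[float, float]:
    """Return (left_gain, right_gain) for a source on the horizontal plane.

    Angles are measured counterclockwise from the +x axis, so a source that
    lies counterclockwise of the facing direction is on the listener's left.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    if dx == 0 and dy == 0:
        return (math.sqrt(0.5), math.sqrt(0.5))  # co-located: centered
    bearing = math.atan2(dy, dx) - listener_facing_rad
    pan = -math.sin(bearing)           # -1 = fully left, +1 = fully right
    theta = (pan + 1.0) * math.pi / 4  # assumed equal-power pan law
    return (math.cos(theta), math.sin(theta))
```

A source directly ahead yields equal gains, while a source directly to the left of the virtual camera yields a left gain near 1 and a right gain near 0.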

[0095] At step 506, the relative left-right audio is adjusted based on the direction one avatar is facing relative to the other. The relative volume of the left and right audio streams is adjusted based on the angle between the direction the virtual camera is facing and the direction the avatar is facing, such that the closer the angle is to perpendicular, the greater the volume difference between the left and right audio streams tends to be.

[0096] For example, when an avatar is directly facing the virtual camera, the relative left-right volume of the avatar's corresponding audio stream may not be adjusted at all in step 506. When the avatar is facing the left side of the virtual camera, the relative left-right volume of the avatar's corresponding audio stream may be adjusted so that the left is louder than the right. And when the avatar is facing the right side of the virtual camera, the relative left-right volume of the avatar's corresponding audio stream may be adjusted so that the right is louder than the left.

[0097] In one example, the calculation in step 506 may involve taking the cross product of the angle the virtual camera is facing and the angle the avatar is facing. The angles may be the directions they are facing on a horizontal plane.
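One way to read the cross product in step 506, offered as a sketch under stated assumptions, is to treat each facing angle as a unit direction vector on the horizontal plane and take the scalar (z-component) cross product of the two vectors. The function name `facing_cross` is hypothetical, and how the sign maps to the left or right channel is an assumption left to the implementer.

```python
import math


def facing_cross(camera_facing_rad: float, avatar_facing_rad: float) -> float:
    """Scalar cross product of two unit facing vectors on the horizontal plane.

    The result equals sin(avatar angle - camera angle): it is zero when the
    directions are parallel or opposite (e.g. the avatar faces the camera
    head-on) and reaches magnitude 1 when they are perpendicular.
    """
    cx, cy = math.cos(camera_facing_rad), math.sin(camera_facing_rad)
    ax, ay = math.cos(avatar_facing_rad), math.sin(avatar_facing_rad)
    return cx * ay - cy * ax
```

Because the magnitude peaks when the two directions are perpendicular, the value can be used to scale the left-right volume difference, and its sign distinguishes the two sides.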

[0098] In one embodiment, a check may be made to determine which audio output device the user is using. If the audio output device is not a set of headphones or another type of speaker that provides a stereo effect, the adjustments in steps 504 and 506 may be skipped.

[0099] Steps 502-506 are repeated for every audio stream received from every other participant. Based on the calculations in steps 502-506, left and right audio gains are calculated for every other participant.

[0100] In this way, each participant's audio stream is adjusted to provide a sense of where the participant's avatar is located in the three-dimensional virtual environment.

[0101] Not only are the audio streams adjusted to provide a sense of where avatars are located, but in certain embodiments the audio streams may also be adjusted to provide private or semi-private volume areas. In this way, the virtual environment enables users to have private conversations. It also enables users to mingle with one another and hold separate side conversations, which is not possible with conventional videoconferencing software. This is illustrated, for example, with respect to FIG. 7.

[0102] FIG. 7 is a flowchart illustrating a method 700 for adjusting relative volume to provide different volume areas in a virtual environment during a video conference.

[0103] As described above, the server may provide a specification of sound or volume areas to the client devices. The virtual environment may be partitioned into different volume areas. At step 702, the device determines in which sound areas the respective avatars and the virtual camera are located.

[0104] For example, FIGS. 8A and 8B are diagrams illustrating different volume areas in a virtual environment during a video conference. FIG. 8A shows a diagram 800 with a volume area 802 that allows for a semi-private or side conversation between the user controlling avatar 806 and the user controlling the virtual camera. In this way, the users around conference table 810 can have a conversation without disturbing others in the room. Audio from the users controlling avatar 806 and the virtual camera may be attenuated as it leaves volume area 802, but it is not eliminated entirely. That allows passersby to join the conversation if they wish.

[0105] Interface 800 also includes buttons 804, 806, and 808, which are described below.

[0106] FIG. 8B shows a diagram 800 with a volume area 804 that allows for a private conversation between the user controlling avatar 808 and the user controlling the virtual camera. Once inside volume area 804, audio from the user controlling avatar 808 and the user controlling the virtual camera may be output only to users inside volume area 804. Because no audio from those users is played to other users in the conference, their audio streams may not even be transmitted to the other users' devices.

[0107] Volume spaces may be hierarchical, as illustrated in FIGS. 9A and 9B. FIG. 9B shows a layout with different volume areas arranged in a hierarchy. Volume areas 934 and 935 are within volume area 933, and volume areas 933 and 932 are within volume area 931. These volume areas are represented as a hierarchical tree, as illustrated in diagram 900 of FIG. 9A.

[0108] In diagram 900, node 901 represents volume area 931 and is the root of the tree. Nodes 902 and 903 are children of node 901 and represent volume areas 932 and 933. Nodes 904 and 906 are children of node 903 and represent volume areas 934 and 935.

[0109] If a user located in area 934 is trying to listen to a speaking user located in area 932, the audio stream has to pass through several different virtual "walls," each of which attenuates the audio stream. In particular, the sound has to pass through the wall of area 932, the wall of area 933, and the wall of area 934. Each wall attenuates by a particular factor. This calculation is described with respect to steps 704 and 706 of FIG. 7.

[0110] At step 704, the hierarchy is traversed to determine which sound areas lie between the avatars. This is illustrated, for example, in FIG. 9C. Starting from the node corresponding to the virtual area of the speaking user (in this case, node 904), a path to the node of the receiving user (in this case, node 902) is determined. To determine the path, the links 952 between the nodes are identified. In this way, the subset of areas between the area containing the avatar and the area containing the virtual camera is determined.

[0111] At step 706, the audio stream from the speaking user is attenuated based on the respective wall transmission coefficients of the subset of areas. Each wall transmission coefficient specifies the amount by which the audio stream is attenuated.
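Steps 704 and 706 can be sketched, purely for illustration, as a walk up the area tree from both nodes to their lowest common ancestor, multiplying together the wall transmission coefficients of every area whose wall is crossed. The node numbers follow diagram 900; the coefficient values, the treatment of each coefficient as a multiplicative gain, and the names `PARENT`, `TRANSMISSION`, `areas_between`, and `wall_gain` are assumptions.

```python
from typing import Dict, List, Optional

# Parent links of the tree in diagram 900 (node 901 is the root).
PARENT: Dict[int, Optional[int]] = {901: None, 902: 901, 903: 901, 904: 903, 906: 903}

# Hypothetical wall transmission coefficients (1.0 = no attenuation).
TRANSMISSION: Dict[int, float] = {901: 1.0, 902: 0.5, 903: 0.5, 904: 0.25, 906: 0.25}


def ancestors(node: int) -> List[int]:
    """Return the chain from `node` up to the root, inclusive."""
    chain: List[int] = []
    current: Optional[int] = node
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def areas_between(speaker: int, listener: int) -> List[int]:
    """Step 704: nodes whose walls lie on the path between the two areas."""
    up, down = ancestors(speaker), ancestors(listener)
    shared = set(down)
    common = next(n for n in up if n in shared)  # lowest common ancestor
    # The common ancestor encloses both parties, so its wall is not crossed.
    return up[:up.index(common)] + down[:down.index(common)]


def wall_gain(speaker: int, listener: int) -> float:
    """Step 706: combined attenuation of all walls on the path."""
    gain = 1.0
    for node in areas_between(speaker, listener):
        gain *= TRANSMISSION[node]
    return gain
```

For the example of FIG. 9C, the path between nodes 904 and 902 crosses the walls of nodes 904, 903, and 902, which correspond to areas 934, 933, and 932; two users in the same area cross no walls and are not attenuated.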

[0112] Additionally or alternatively, different areas may have different drop factors, in which case the distance-based calculation illustrated in chart 600 may be applied to each individual area based on its respective drop factor. In this way, different areas of the virtual environment carry sound at different rates. The audio gains determined in the method described above with respect to FIG. 5 may be applied to the audio stream to determine the left and right audio accordingly. In this way, the wall transmission coefficients, the drop factors, and the left-right adjustment that provides a sense of direction may all be applied together to provide a comprehensive audio experience.

[0113] Different audio areas may have different functionality. For example, a volume area may be a podium area. If a user is located in the podium area, some or all of the attenuation described with respect to FIG. 5 or FIG. 7 may not occur. For example, no attenuation may occur due to the drop factor or the wall transmission coefficients. In some embodiments, the relative left-right audio may still be adjusted to provide a sense of direction.

[0114] For illustrative purposes, the methods described with respect to FIGS. 5 and 7 refer to audio streams from users who have corresponding avatars. However, the same methods may be applied to sound sources other than avatars. For example, the virtual environment may have a three-dimensional model of a speaker. Sound may be emitted from the speaker in the same manner as from the avatar models described above, whether for a presentation or simply to provide background music.

[0115] As mentioned above, wall transmission coefficients may be used to isolate audio entirely. In one embodiment, this can be used to create virtual offices. In one example, each user may have in their physical (perhaps home) office a monitor that is always on and displays the conferencing application logged into the virtual office. There may be a feature that allows the user to indicate whether they are in the office and whether they should not be disturbed. If the do-not-disturb indicator is off, a coworker or manager may come by within the virtual space and knock or walk in as they would in a physical office. A visitor may be able to leave a note if the worker is not in the office. When the worker returns, the worker can read the note left by the visitor. The virtual office may have a whiteboard and/or an interface that displays messages for the user. The messages may be email and/or may come from a messaging application such as the SLACK application available from Slack Technologies, Inc. of San Francisco, CA.

[0116] Users may be able to customize or personalize their virtual offices. For example, a user may be able to put up models of posters or other wall ornaments. A user may be able to change the models or orientation of desks or of decorative ornaments such as plants. A user may be able to change the lighting or the view out the window.

[0117] Returning to FIG. 8A, interface 800 includes various buttons 804, 806, and 808. When the user presses button 804, the attenuation described above with respect to the methods of FIGS. 5 and 7 may not occur, or may occur only to a lesser extent. In that situation, the user's voice is output uniformly to the other users, allowing the user to address all participants in the meeting. The user's video may likewise be output on a presentation screen within the virtual environment, as described below. When the user presses button 806, a speaker mode is enabled. In that case, audio is output from sound sources within the virtual environment, for example to play background music. When the user presses button 808, screen sharing may be enabled, allowing the user to share the contents of a screen or window on their device with other users. The contents may be presented on a presentation model. This, too, is described below.

Presentation in a three-dimensional environment
[0118] FIG. 10 shows an interface 1000 with a three-dimensional model 1002 in a three-dimensional virtual environment. As described above with respect to FIG. 1, interface 1000 may be displayed to a user who can navigate around the virtual environment. As shown in interface 1000, the virtual environment includes an avatar 1004 and a three-dimensional model 1002.

[0119] Three-dimensional model 1002 is a 3D model of a product placed inside the virtual space. People can join this virtual space, observe the model, and walk around it. The product may have localized sound to enhance the experience.

[0120] More particularly, when a presenter in the virtual space wants to show a 3D model, the presenter selects the desired model from the interface. This sends a message to the server to update the details (including the name and path of the model). This is automatically communicated to the clients. In this way, the three-dimensional model may be rendered for display simultaneously with the presentation of the video stream. The user can navigate the virtual camera around the three-dimensional model of the product.

[0121] In different examples, the object may be a product demonstration or may be an advertisement for a product.

[0122] FIG. 11 shows an interface 1100 with presentation screen sharing in a three-dimensional virtual environment used for video conferencing. As described above with respect to FIG. 1, interface 1100 may be displayed to a user who can navigate around the virtual environment. As shown in interface 1100, the virtual environment includes an avatar 1104 and a presentation screen 1106.

[0123] In this embodiment, a presentation stream from a device of a participant in the conference is received. The presentation stream is texture mapped onto a three-dimensional model of presentation screen 1106. In one embodiment, the presentation stream may be a video stream from a camera on the user's device. In another embodiment, the presentation stream may be a screen share from the user's device, in which a monitor or window is shared. Through screen sharing or otherwise, the presentation video and audio stream may also come from an external source, for example a live stream of an event. When a user enables presentation mode, that user's presentation stream (and audio stream) is published to the server tagged with the name of the screen the user wants to use. Other clients are notified that a new stream is available.

[0124] The presenter may also be able to control the location and orientation of the audience members. For example, the presenter may have an option that, when selected, repositions all other participants in the meeting so that they are positioned and oriented to face the presentation screen.

[0125] An audio stream is captured from the microphone of the first participant's device in synchronization with the presentation stream. The audio stream from the user's microphone may be heard by other users as though it were coming from presentation screen 1106. In this way, presentation screen 1106 may be a sound source as described above. Because the user's audio stream is projected from presentation screen 1106, the audio coming from the user's avatar may be suppressed. In this way, the audio stream is output for playback in synchronization with the display of the presentation stream on screen 1106 within the three-dimensional virtual space.

Bandwidth allocation based on distance between users
[0126] FIG. 12 is a flowchart illustrating a method 1200 for distributing available bandwidth based on the relative positions of avatars within a three-dimensional virtual environment.

[0127] At step 1202, the distance between a first user and a second user in the virtual conference space is determined. The distance may be the distance between the users on a horizontal plane in the three-dimensional space.

[0128] At step 1204, the received video streams are prioritized such that video streams from closer users are given higher priority than video streams from users who are farther away. The priority values may be determined as illustrated in FIG. 13.

[0129] FIG. 13 shows a chart 1300 that plots priority 1306 on the y-axis against distance 1302. As indicated by line 1306, the priority remains constant until reference distance 1304 is reached. Beyond the reference distance, the priority begins to fall off.

[0130] At step 1206, the bandwidth available to the user device is distributed among the various video streams. This may be done based on the priority values determined in step 1204. For example, the priorities may be scaled proportionally so that together they sum to 1. For any video for which insufficient bandwidth is available, the relative priority may be set to zero. The priorities are then adjusted again for the remaining video streams. Bandwidth is allocated based on these relative priority values. In addition, bandwidth may be reserved for the audio streams. This is illustrated in FIG. 14.

[0131] FIG. 14 shows a chart 1400 with a y-axis representing bandwidth 1406 and an x-axis representing relative priority. Once a video has been allocated the minimum bandwidth 1406 needed for it to be effective, the bandwidth 1406 allocated to the video stream increases in proportion to its relative priority.
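The distribution of steps 1204 and 1206 can be sketched as follows, as one possible reading rather than a definitive implementation. The sketch reserves bandwidth for audio, shares the remainder in proportion to priority, and sets to zero any stream whose share would fall below a minimum useful bitrate before renormalizing. Dropping one starved stream per pass (the lowest-priority one) is a design choice of the sketch, and the names `allocate_bandwidth`, `audio_reserve_kbps`, and `min_video_kbps` are hypothetical.

```python
from typing import Dict


def allocate_bandwidth(priorities: Dict[str, float],
                       total_kbps: float,
                       audio_reserve_kbps: float,
                       min_video_kbps: float) -> Dict[str, float]:
    """Split the video budget across streams in proportion to priority."""
    budget = max(total_kbps - audio_reserve_kbps, 0.0)
    active = {user: p for user, p in priorities.items() if p > 0}
    while active:
        total_priority = sum(active.values())
        # Normalizing by the sum makes the relative priorities add up to 1.
        shares = {user: budget * p / total_priority for user, p in active.items()}
        starved = [user for user, kbps in shares.items() if kbps < min_video_kbps]
        if not starved:
            return {user: shares.get(user, 0.0) for user in priorities}
        # Zero out the lowest-priority starved stream, then renormalize.
        del active[min(starved, key=lambda user: active[user])]
    return {user: 0.0 for user in priorities}


allocation = allocate_bandwidth(
    {"near": 2.0, "mid": 1.0, "far": 0.1},
    total_kbps=1100.0, audio_reserve_kbps=100.0, min_video_kbps=100.0)
```

In this hypothetical run, the distant stream cannot reach the minimum and receives nothing, and the stream with twice the priority receives twice the bandwidth of the other remaining stream.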

[0132] Once the allocated bandwidth is determined, the client may request the video from the server at the bandwidth, bitrate, frame rate, and resolution selected and allocated for that video. This may start a negotiation process between the client and the server to begin streaming the video at the designated bandwidth. In this way, the available video and audio bandwidth is divided proportionally among all users, such that a user with twice the priority receives twice the bandwidth.

[0133] In one possible implementation, using simulcast, every client sends multiple video streams to the server at different bitrates and resolutions. Other clients can then indicate to the server which one of these streams they are interested in and would like to receive.
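Assuming the simulcast arrangement just described, a receiving client could choose among a sender's streams with a rule as simple as the following sketch, which picks the highest-bitrate layer that fits within the bandwidth allocated to that sender. The `Layer` type, its fields, and the example bitrates are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Layer(NamedTuple):
    """One simulcast encoding published by a sending client (hypothetical)."""
    name: str
    bitrate_kbps: int
    height: int


def pick_layer(layers: Sequence[Layer], allocated_kbps: float) -> Optional[Layer]:
    """Return the highest-bitrate layer within the allocation, if any."""
    fitting = [layer for layer in layers if layer.bitrate_kbps <= allocated_kbps]
    return max(fitting, key=lambda layer: layer.bitrate_kbps) if fitting else None


LAYERS = [Layer("low", 150, 180), Layer("mid", 500, 360), Layer("high", 1500, 720)]
choice = pick_layer(LAYERS, allocated_kbps=600)
```

When no layer fits, the function returns `None`, which a client might treat as the cue to stop requesting video from that sender.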

[0134] At step 1208, it is determined whether the bandwidth available between the first user and the second user in the virtual conference space is such that displaying video at that distance is inefficient. This determination may be made by either the client or the server. If it is made by the client, the client sends a message for the server to cease transmitting the video to the client. If displaying the video is inefficient, transmission of the video stream to the second user's device is halted, and the second user's device is notified to substitute a still image for the video stream. The still image may simply be the last video frame received (or one of the last video frames).

[0135] In one embodiment, a similar process may be carried out for audio, reducing the quality given the size of the portion reserved for audio. In another embodiment, each audio stream is given a fixed bandwidth.

[0136] In this way, embodiments can improve performance for all users and for the server by reducing the quality of the video and audio streams for users who are farther away and/or less important. This is not done when a sufficient bandwidth budget is available. The reduction is made in both bitrate and resolution. This improves video quality because the bandwidth available for that user can be utilized more efficiently by the encoder.

[0137] Independently of this, the video resolution is scaled down based on distance, such that a user who is twice as far away has half the resolution. In this way, resolution that is unnecessary, given the limits of the screen resolution, need not be downloaded. Bandwidth is thus conserved.
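The distance-based scaling can be illustrated with a short sketch in which the requested resolution is inversely proportional to distance. The reference distance at which full resolution is requested, the floor `min_height`, and the function name `target_height` are assumptions introduced for the example.

```python
def target_height(full_height: int, distance: float,
                  reference_distance: float, min_height: int = 90) -> int:
    """Return the vertical resolution to request for a user at `distance`.

    At or inside the reference distance the full resolution is requested;
    beyond it, doubling the distance halves the requested resolution.
    """
    if distance <= reference_distance:
        return full_height
    scaled = full_height * reference_distance / distance
    return max(int(scaled), min_height)
```

For instance, with a 720-line source and a reference distance of 2, a user at distance 4 would be requested at 360 lines and a user at distance 8 at 180 lines.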

[0138] FIG. 15 is a diagram of a system 1500 illustrating components of devices used to provide videoconferencing within a virtual environment. In various embodiments, system 1500 can operate according to the methods described above.

[0139] Device 306A is a user computing device. Device 306A can be a desktop or laptop computer, smartphone, tablet, or wearable (e.g., a watch or head-mounted display). Device 306A includes microphone 1502, camera 1504, stereo speakers 1506, and input device 1512. Though not shown, device 306A also includes a processor and persistent, non-transitory, and volatile memory. The processor can include one or more central processing units, graphics processing units, or any combination thereof.

[0140] Microphone 1502 converts sound into an electrical signal. Microphone 1502 is positioned to capture the speech of the user of device 306A. In different examples, microphone 1502 can be a condenser microphone, electret microphone, moving-coil microphone, ribbon microphone, carbon microphone, piezoelectric microphone, fiber-optic microphone, laser microphone, water microphone, or MEMS microphone.

[0141] Camera 1504 captures image data by capturing light, generally through one or more lenses. Camera 1504 is positioned to capture photographic images of the user of device 306A. Camera 1504 includes an image sensor (not shown). The image sensor may be, for example, a charge-coupled device (CCD) sensor or a complementary metal-oxide-semiconductor (CMOS) sensor. The image sensor may include one or more photodetectors that detect light and convert it into electrical signals. These electrical signals, captured together within a similar timeframe, make up a still photographic image. A sequence of still photographic images captured at regular intervals makes up a video. In this way, camera 1504 captures images and video.

[0142] Stereo speakers 1506 are a device that converts an electrical audio signal into corresponding left and right sound. Stereo speakers 1506 output the left audio stream and the right audio stream generated by audio processor 1520 (described below) to be played in stereo to the user of device 306A. Stereo speakers 1506 include both ambient speakers and headphones designed to play sound directly into the user's left and right ears. Examples of speakers include moving-iron loudspeakers, piezoelectric speakers, magnetostatic loudspeakers, electrostatic loudspeakers, ribbon and planar magnetic loudspeakers, bending wave loudspeakers, flat panel loudspeakers, Heil air motion transducers, transparent ionic conduction speakers, plasma arc speakers, thermoacoustic speakers, and rotary woofers, as well as moving-coil, electrostatic, electret, planar magnetic, and balanced armature drivers.

[0143] Network interface 1508 is a software or hardware interface between two pieces of equipment or between two protocol layers in a computer network. Network interface 1508 receives a video stream for each participant in the meeting from server 302. Each video stream is captured from the camera of another participant's device. Network interface 1508 also receives from server 302 data specifying the three-dimensional virtual space and any models within it. For each of the other participants, network interface 1508 receives a position and orientation in the three-dimensional virtual space. The position and orientation are input by each of the respective other participants.

[0144] Network interface 1508 also transmits data to server 302. It transmits the position of the virtual camera of the user of device 306A, which is used by renderer 1518, and it transmits the video and audio streams from camera 1504 and microphone 1502.

[0145] Display 1510 is an output device for presenting electronic information in visual or tactile form (the latter is used, for example, in tactile electronic displays for people who are visually impaired). Display 1510 may be a television set, a computer monitor, a head-mounted display, a heads-up display, the output of an augmented reality or virtual reality headset, a broadcast reference monitor, a medical monitor, a mobile display (for mobile devices), or a smartphone display (for smartphones). To present information, display 1510 may include an electroluminescent (ELD) display, a liquid crystal display (LCD), a light-emitting diode (LED) backlit LCD, a thin-film transistor (TFT) LCD, a light-emitting diode (LED) display, an OLED display, an AMOLED display, a plasma (PDP) display, or a quantum dot (QLED) display.

[0146] Input device 1512 is a piece of equipment used to provide data and control signals to an information processing system such as a computer or information appliance. Input device 1512 allows the user to input a new desired position for the virtual camera used by renderer 1518, thereby enabling navigation in the three-dimensional environment. Examples of input devices include keyboards, mice, scanners, joysticks, and touchscreens.

[0147] Web browser 308A and web application 310A were described above with respect to FIG. 3. Web application 310A includes screen capturer 1514, texture mapper 1516, renderer 1518, and audio processor 1520.

[0148] Screen capturer 1514 captures a presentation stream, in particular a screen share. Screen capturer 1514 may interact with an API made available by web browser 308A. By calling a function available from the API, screen capturer 1514 may cause web browser 308A to ask the user which window or screen the user would like to share. Based on the answer to that query, web browser 308A may return a video stream corresponding to the screen share to screen capturer 1514, which passes it to network interface 1508 for transmission to server 302 and ultimately to the other participants' devices.

[0149] Texture mapper 1516 texture maps the video stream onto a three-dimensional model corresponding to an avatar. Texture mapper 1516 may texture map each frame of the video onto the avatar. In addition, texture mapper 1516 may texture map a presentation stream onto a three-dimensional model of a presentation screen.

[0150] Renderer 1518 renders, from the perspective of the virtual camera of the user of device 306A and for output to display 1510, the three-dimensional virtual space including the texture-mapped three-dimensional models of the avatars of the respective participants, each located at the corresponding received position and oriented in the corresponding received direction. Renderer 1518 also renders any other three-dimensional models, including, for example, the presentation screen.

[0151] Audio processor 1520 adjusts the volume of the received audio stream to determine a left audio stream and a right audio stream that provide a sense of where the second position is in the three-dimensional virtual space relative to the first position. In one embodiment, audio processor 1520 adjusts the volume based on the distance between the second position and the first position. In another embodiment, audio processor 1520 adjusts the volume based on the direction of the second position relative to the first position. In yet another embodiment, audio processor 1520 adjusts the volume based on the direction of the second position relative to the first position on a horizontal plane within the three-dimensional virtual space. In yet another embodiment, audio processor 1520 adjusts the volume based on the direction the virtual camera is facing in the three-dimensional virtual space, such that the left audio stream tends to have a higher volume when the avatar is located to the left of the virtual camera and the right audio stream tends to have a higher volume when the avatar is located to the right of the virtual camera. Finally, in yet another embodiment, audio processor 1520 adjusts the volume based on the angle between the direction the virtual camera is facing and the direction the avatar is facing, such that the more perpendicular that angle is to the direction the avatar is facing, the greater the volume difference between the left and right audio streams tends to be.
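The distance-based and direction-based embodiments can be illustrated with a minimal sketch. The text does not give formulas, so everything concrete here is an assumption: the inverse-distance roll-off, the constant-power pan law, the coordinate convention (heading measured from +z toward +x, with +x to the camera's right when it faces +z), and the names `Pose` and `stereo_gains`. The embodiment based on the angle between the two facing directions is not covered.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Pose:
    x: float        # position on the horizontal plane
    z: float
    heading: float  # facing direction in radians, measured from +z toward +x


def stereo_gains(camera: Pose, avatar: Pose, ref_distance: float = 1.0) -> Tuple[float, float]:
    """Return (left_gain, right_gain) for the avatar's audio as heard at the camera."""
    dx = avatar.x - camera.x
    dz = avatar.z - camera.z
    distance = math.hypot(dx, dz)

    # Assumed roll-off: full volume within ref_distance, inverse-distance beyond it.
    attenuation = min(1.0, ref_distance / distance) if distance > 0 else 1.0

    # Bearing of the avatar relative to the direction the camera is facing.
    # A positive bearing means the avatar is to the camera's right.
    bearing = math.atan2(dx, dz) - camera.heading
    pan = math.sin(bearing)  # -1 = fully left, +1 = fully right

    # Constant-power pan law (an assumption, not taken from the text).
    theta = (pan + 1.0) * math.pi / 4.0
    return attenuation * math.cos(theta), attenuation * math.sin(theta)
```

With this convention, an avatar to the left of a camera facing +z yields a larger left gain, an avatar straight ahead yields equal gains, and both gains shrink as the avatar moves farther away.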

[0152] Audio processor 1520 may also adjust the volume of an audio stream based on the area in which the speaker (the participant who is speaking) is located relative to the area in which the virtual camera is located. In this embodiment, the three-dimensional virtual space is segmented into a plurality of areas. These areas may be hierarchical. When the speaker and the virtual camera are located in different areas, a wall transmission factor may be applied to attenuate the volume of the speaking audio stream.
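As a rough sketch of how a wall transmission factor might be applied across hierarchical areas, the example below multiplies the factor of every area boundary crossed between the speaking participant's area and the virtual camera's area. The `Area` class, the per-area `wall_transmission` value, and the rule of multiplying factors up to the nearest shared enclosing area are illustrative assumptions; the text states only that a factor may be applied when the two areas differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Area:
    name: str
    parent: Optional["Area"] = None
    wall_transmission: float = 1.0  # fraction of volume that passes through this area's wall


def ancestors(area: Area) -> List[Area]:
    """Return the chain [area, parent, grandparent, ...] up to the root."""
    chain = []
    node: Optional[Area] = area
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def area_gain(speaker_area: Area, camera_area: Area) -> float:
    """Gain applied to a speaking participant's audio for the given pair of areas."""
    if speaker_area is camera_area:
        return 1.0  # same area: no wall in between
    speaker_chain = ancestors(speaker_area)
    camera_chain = ancestors(camera_area)
    # Nearest area that encloses both the speaker and the camera, if any.
    common = next((a for a in speaker_chain if any(a is c for c in camera_chain)), None)
    gain = 1.0
    # Multiply the factor of every wall crossed on the way out of the speaker's
    # area and into the camera's area, stopping at the shared enclosing area.
    for chain in (speaker_chain, camera_chain):
        for area in chain:
            if area is common:
                break
            gain *= area.wall_transmission
    return gain
```

For example, with two sibling rooms whose walls pass 0.5 and 0.2 of the volume, audio travelling from one room to the other would be scaled by 0.1 under these assumptions.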

[0153] Server 302 includes attendee notifier 1522, stream adjuster 1524, and stream forwarder 1526.

[0154] Attendee notifier 1522 notifies conference participants when participants join and leave the meeting. When a new participant joins the meeting, attendee notifier 1522 sends a message to the devices of the other participants in the conference indicating that a new participant has joined. Attendee notifier 1522 signals stream forwarder 1526 to begin forwarding video, audio, and position/orientation information to the other participants.

[0155] Stream adjuster 1524 receives a video stream captured from a camera of a first user's device. Stream adjuster 1524 determines the bandwidth available to transmit data for the virtual conference to a second user. It determines the distance between the first user and the second user in the virtual conference space, and it apportions the available bandwidth between a first video stream and a second video stream based on relative distance. In this way, stream adjuster 1524 gives video streams of closer users higher priority than video streams from users who are farther away. Additionally or alternatively, stream adjuster 1524 may be located on device 306A, perhaps as part of web application 310A.
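One simple way to apportion bandwidth by relative distance is sketched below. The weighting function `1 / (1 + distance)`, the unit of kilobits per second, and the name `allocate_bandwidth` are assumptions made for illustration; the text requires only that closer users receive higher priority.

```python
from typing import Dict


def allocate_bandwidth(available_kbps: float, distances: Dict[str, float]) -> Dict[str, float]:
    """Split available_kbps among senders, favoring those closer to the viewer.

    distances maps a sender id to that sender's distance from the viewing
    user in the virtual conference space.
    """
    # Assumed weighting: closer senders get larger weights; never divide by zero.
    weights = {uid: 1.0 / (1.0 + max(d, 0.0)) for uid, d in distances.items()}
    total = sum(weights.values())
    if total == 0:
        return {}  # no senders to serve
    return {uid: available_kbps * w / total for uid, w in weights.items()}
```

Under this weighting, a sender at distance 1 would receive three times the share of a sender at distance 5, and the shares always add up to the available bandwidth.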

[0156] Stream forwarder 1526 broadcasts the position/orientation information, video, audio, and shared screens that it receives (with any adjustments made by stream adjuster 1524). Stream forwarder 1526 may send information to device 306A in response to a request from conferencing application 310A. Conferencing application 310A may send that request in response to a notification from attendee notifier 1522.
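The interplay between the attendee notifier and the stream forwarder can be sketched with a small in-memory model. This is a simplification under stated assumptions: the `ConferenceServer` class, the message dictionaries, and the per-participant inbox lists are hypothetical, and updates are pushed directly rather than sent in response to a client request as the text allows.

```python
from typing import Any, Dict, List


class ConferenceServer:
    """Toy stand-in for the server-side notifier and forwarder (illustrative only)."""

    def __init__(self) -> None:
        # participant id -> messages delivered to that participant's device
        self.participants: Dict[str, List[Dict[str, Any]]] = {}

    def join(self, user_id: str) -> None:
        # Attendee notifier: tell everyone already present about the newcomer.
        for inbox in self.participants.values():
            inbox.append({"type": "joined", "user": user_id})
        self.participants[user_id] = []

    def leave(self, user_id: str) -> None:
        # Attendee notifier: tell the remaining participants that someone left.
        self.participants.pop(user_id, None)
        for inbox in self.participants.values():
            inbox.append({"type": "left", "user": user_id})

    def forward(self, sender_id: str, payload: Dict[str, Any]) -> None:
        # Stream forwarder: relay position/orientation or media data to everyone else.
        for user_id, inbox in self.participants.items():
            if user_id != sender_id:
                inbox.append({"type": "update", "user": sender_id, "data": payload})
```

A client receiving a `joined` message would add an avatar for that user, and a client receiving `left` would re-render the space without it.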

[0157] Network interface 1528 is a software or hardware interface between two pieces of equipment or between two protocol layers in a computer network. Network interface 1528 transmits model information to the devices of the various participants. Network interface 1528 receives video, audio, and shared screens from the various participants.

[0158] Screen capturer 1514, texture mapper 1516, renderer 1518, audio processor 1520, attendee notifier 1522, stream adjuster 1524, and stream forwarder 1526 can each be implemented in hardware, software, firmware, or any combination thereof.

[0159] Identifiers such as "(a)," "(b)," "(i)," and "(ii)" are sometimes used for different elements or steps. These identifiers are used for clarity and do not necessarily designate an order for the elements or steps.

[0160] The present invention has been described above with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed.

[0161] The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the art, readily modify and/or adapt such specific embodiments for various applications, without undue experimentation and without departing from the general concept of the present invention. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology and terminology herein are for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.

[0162] The breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

1. A system for enabling video conferencing between a first user and a second user, comprising:
a processor coupled to a memory;
a display screen;
a network interface configured to receive (i) data specifying a three-dimensional virtual space, (ii) a position and orientation in the three-dimensional virtual space, the position and orientation input by the first user, and (iii) a video stream captured from a camera of a device of the first user, the camera positioned to capture photographic images of the first user; and
a web browser implemented on the processor, the web browser configured to download a web application from a server and execute the web application,
wherein the web application includes:
a mapper configured to map the video stream onto a three-dimensional model of an avatar; and
a renderer configured to render, from a perspective of a virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar onto which the video stream is mapped, positioned at the position and oriented in accordance with the orientation.

2. The system of claim 1, wherein the device further comprises a graphics processing unit, and wherein the mapper and the renderer include WebGL application calls that enable the web application to map or render using the graphics processing unit.

3. A computer-implemented method for enabling a video conference between a first user and a second user, comprising:
transmitting a web application to a first client device of the first user and a second client device of the second user;
receiving, from the first client device running the web application, (i) a position and orientation in a three-dimensional virtual space, the position and orientation input by the first user, and (ii) a video stream captured from a camera of the first client device, the camera positioned to capture photographic images of the first user; and
transmitting the position, the orientation, and the video stream to the second client device of the second user,
wherein the web application includes executable instructions that, when executed in a web browser, map the video stream onto a three-dimensional model of an avatar and render, from a perspective of a virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar onto which the video stream is mapped, positioned at the position and oriented in accordance with the orientation.

4. The method of claim 3, wherein the web application includes WebGL application calls that enable the web application to map or render using a graphics processing unit of the second client device.

5. A computer-implemented method for enabling a video conference between a first user and a second user, comprising:
receiving data specifying a three-dimensional virtual space;
receiving a position and orientation in the three-dimensional virtual space, the position and orientation input by the first user;
receiving a video stream captured from a camera of a device of the first user, the camera positioned to capture photographic images of the first user;
mapping, by a web application running in a web browser, the video stream onto a three-dimensional model of an avatar; and
rendering, by the web application running in the web browser, from a perspective of a virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar positioned at the position and oriented in accordance with the orientation.

6. The method of claim 5, further comprising:
receiving an audio stream captured synchronously with the video stream from a microphone of the device of the first user, the microphone positioned to capture speech of the first user; and
outputting the audio stream for playback to the second user in synchronization with display of the video stream in the three-dimensional virtual space.

7. The method of claim 5, further comprising, when input is received from the second user indicating a desire to change the viewpoint of the virtual camera:
changing the viewpoint of the virtual camera of the second user; and
re-rendering, from the changed viewpoint of the virtual camera, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar positioned at the position and oriented in accordance with the orientation.

8. The method according to claim 7, wherein said viewpoint of said virtual camera is defined by at least coordinates on a horizontal plane and pan and tilt values in said three-dimensional virtual space.

9. The method of claim 5, further comprising, when a new position and a new orientation of the first user in the three-dimensional virtual space are received, re-rendering, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar positioned at the new position and oriented in accordance with the new orientation.

10. The method of claim 5, wherein the mapping comprises, for each frame of the video stream, repeatedly mapping pixels onto the three-dimensional model of the avatar.

11. The method of claim 5, wherein the data, the position and orientation, and the video stream are received at the web browser from a server, and the mapping and rendering are performed by the web browser.

12. The method of claim 11, further comprising:
receiving a notification from the server indicating that the first user is no longer present; and
re-rendering, for display in the web browser to the second user, the three-dimensional virtual space without the three-dimensional model of the avatar.

13. The method of claim 12, further comprising:
receiving a notification from the server indicating that a third user has entered the three-dimensional virtual space;
receiving a second position and a second orientation of the third user in the three-dimensional virtual space;
receiving a second video stream captured from a camera of a device of the third user, the camera positioned to capture photographic images of the third user;
mapping the second video stream onto a second three-dimensional model of a second avatar; and
rendering, from the perspective of the virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the second three-dimensional model positioned at the second position and oriented in accordance with the second orientation.

14. The method of claim 5, wherein the receiving of the data specifying the three-dimensional virtual space includes receiving a mesh specifying a conference space and receiving a background image, and wherein the rendering includes mapping the background image onto a sphere.

15. A non-transitory tangible computer-readable device storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations for enabling a video conference between a first user and a second user, the operations comprising:
receiving data specifying a three-dimensional virtual space;
receiving a position and orientation in the three-dimensional virtual space, the position and orientation input by the first user;
receiving a video stream captured from a camera of a device of the first user, the camera positioned to capture photographic images of the first user;
mapping the video stream onto a three-dimensional model of an avatar; and
rendering, from a perspective of a virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar positioned at the position and oriented in accordance with the orientation.

16. The device of claim 15, wherein the operations further comprise:
receiving an audio stream captured synchronously with the video stream from a microphone of the device of the first user, the microphone positioned to capture speech of the first user; and
outputting the audio stream for playback to the second user in synchronization with display of the video stream in the three-dimensional virtual space.

17. The device of claim 15, wherein the operations further comprise, when input is received from the second user indicating a desire to change the viewpoint of the virtual camera:
changing the viewpoint of the virtual camera of the second user; and
re-rendering, from the changed viewpoint of the virtual camera, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar positioned at the position and oriented in accordance with the orientation.

18. The device of claim 17, wherein the viewpoint of the virtual camera is defined by at least horizontal coordinates and pan and tilt values in the three-dimensional virtual space.

19. The device of claim 15, wherein the operations further comprise, when a new position and a new orientation of the first user in the three-dimensional virtual space are received, re-rendering, for display to the second user, the three-dimensional virtual space including the three-dimensional model of the avatar positioned at the new position and oriented in accordance with the new orientation.

20. The device of claim 15, wherein the mapping comprises, for each frame of the video stream, repeatedly mapping pixels onto the three-dimensional model of the avatar.

21. The device of claim 15, wherein the data, the position and orientation, and the video stream are received at a web browser from a server, and wherein the mapping and rendering are performed by the web browser.

22. The device of claim 21, wherein the operations further comprise:
receiving a notification from the server indicating that the first user is no longer present; and
re-rendering, for display in the web browser to the second user, the three-dimensional virtual space without the three-dimensional model of the avatar.

23. The device of claim 22, wherein the operations further comprise:
receiving a notification from the server indicating that a third user has entered the three-dimensional virtual space;
receiving a second position and a second orientation of the third user in the three-dimensional virtual space;
receiving a second video stream captured from a camera of a device of the third user, the camera positioned to capture photographic images of the third user;
mapping the second video stream onto a second three-dimensional model of a second avatar; and
rendering, from the perspective of the virtual camera of the second user, for display to the second user, the three-dimensional virtual space including the second three-dimensional model positioned at the second position and oriented in accordance with the second orientation.

24. The device of claim 15, wherein the receiving of the data specifying the three-dimensional virtual space includes receiving a mesh specifying a conference space and receiving a background image, and wherein the rendering includes mapping the background image onto a sphere.