JP2024509416A

JP2024509416A - Camera control data for virtual cameras in virtual interactive scenes defined by streamed media data

Info

Publication number: JP2024509416A
Application number: JP2023552339A
Authority: JP
Inventors: Imed Bouazizi; Thomas Stockhammer
Original assignee: Qualcomm Incorporated
Priority date: 2021-03-10
Filing date: 2022-03-09
Publication date: 2024-03-01
Also published as: WO2022192885A1; TW202242677A; KR20230155444A; BR112023017524A2; EP4305847A1

Abstract

An example device for retrieving media data includes a memory configured to store media data, and one or more processors implemented in circuitry and configured to execute a presentation engine. The presentation engine is configured to:

- receive streamed media data representing a virtual three-dimensional scene that includes at least one virtual solid object;
- receive camera control data for the three-dimensional scene, the camera control data including data that defines constraints preventing a virtual camera from passing through the at least one virtual solid object;
- receive, from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and
- in response to the camera movement data, use the camera control data to prevent the virtual camera from passing through the at least one virtual solid object.

Description

This application claims priority to U.S. Patent Application No. 17/654,020, filed March 8, 2022, and to U.S. Provisional Application No. 63/159,379, filed March 10, 2021. The entire contents of each are incorporated herein by reference. U.S. Patent Application No. 17/654,020 claims the benefit of U.S. Provisional Application No. 63/159,379.

This disclosure relates to the storage and transport of encoded video data.

Digital video capabilities can be incorporated into a wide range of devices. These include digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, and video teleconferencing devices.

Digital video devices implement video compression techniques to transmit and receive digital video information more efficiently. Examples include the techniques described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and ITU-T H.265 (also referred to as High Efficiency Video Coding (HEVC)), and in extensions of such standards.

Video compression techniques perform spatial prediction and/or temporal prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video frame or slice may be partitioned into macroblocks, and each macroblock may be further partitioned.

Macroblocks in an intra-coded (I) frame or slice are encoded using spatial prediction with respect to neighboring macroblocks. Macroblocks in an inter-coded (P or B) frame or slice may use either of two kinds of prediction: spatial prediction with respect to neighboring macroblocks in the same frame or slice, or temporal prediction with respect to other reference frames.
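The sketch below illustrates only the partitioning step. It computes the grid of macroblocks that covers a frame, assuming the 16x16 luma macroblock size of H.264/AVC. Partial blocks at the right and bottom edges are rounded up to whole macroblocks. The function names are hypothetical, and prediction itself is not modeled.

```python
from typing import List, Tuple

MB_SIZE = 16  # H.264/AVC macroblocks cover 16x16 luma samples


def macroblock_grid(width: int, height: int, mb_size: int = MB_SIZE) -> Tuple[int, int]:
    """Return (columns, rows) of macroblocks needed to cover a width x height frame."""
    if width <= 0 or height <= 0 or mb_size <= 0:
        raise ValueError("dimensions must be positive")
    cols = -(-width // mb_size)   # ceiling division: partial blocks count as whole ones
    rows = -(-height // mb_size)
    return cols, rows


def macroblock_origins(width: int, height: int, mb_size: int = MB_SIZE) -> List[Tuple[int, int]]:
    """Top-left (x, y) sample position of every macroblock, in raster-scan order."""
    cols, rows = macroblock_grid(width, height, mb_size)
    return [(c * mb_size, r * mb_size) for r in range(rows) for c in range(cols)]
```

For a 1920x1080 frame, `macroblock_grid(1920, 1080)` gives 120 columns by 68 rows. Because 1080 is not a multiple of 16, the last row of macroblocks is only partly filled by the picture.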

After video data has been encoded, it may be packetized for transmission or storage. The video data may be assembled into a video file that conforms to any of a variety of standards. Examples include the International Organization for Standardization (ISO) base media file format and its extensions, such as AVC.

R. Fielding et al., RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1," Network Working Group, IETF, June 1999.
T. Paila et al., RFC 6726, "FLUTE - File Delivery over Unidirectional Transport," Network Working Group, November 2012.

In general, this disclosure describes techniques related to streaming interactive media data. Such interactive media data may be, for example, virtual reality, augmented reality, or other interactive content, such as other three-dimensional video content.

Recent MPEG scene description elements include support for timed media in glTF 2.0. A media access function (MAF) provides an application programming interface (API) to the presentation engine, through which the presentation engine may request timed media. A retrieval unit that executes the MAF may process the retrieved timed media data. It may then pass the processed media data to the presentation engine, in the desired format, through circular buffers.

Current MPEG scene descriptions allow a user to consume scene media data with six degrees of freedom (6DoF). As a result, a user is generally able to move freely within the 3D scene, even through walls displayed in the 3D scene. A content creator, however, may wish to limit viewer movement into certain areas, for example to prevent movement through displayed walls or other objects.

This disclosure describes techniques for imposing such limits. Preventing the user from passing through obstacles in the virtual world can make the experience more realistic, so these techniques may improve the user experience.
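The hand-off from the MAF to the presentation engine through circular buffers can be pictured as a small fixed-capacity ring buffer. This is a single-threaded sketch under assumed semantics, and the class and method names are hypothetical. Refusing a write when the buffer is full, rather than overwriting the oldest frame, is a design choice of this sketch. It is not behavior taken from the MPEG scene description specification.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class CircularBuffer(Generic[T]):
    """Fixed-capacity FIFO ring buffer (hypothetical MAF -> presentation engine hand-off)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots: List[Optional[T]] = [None] * capacity
        self._read = 0    # index of the oldest stored frame
        self._count = 0   # number of frames currently stored

    def write(self, frame: T) -> bool:
        """Producer side (MAF). Returns False instead of overwriting when the buffer is full."""
        if self._count == len(self._slots):
            return False
        write_idx = (self._read + self._count) % len(self._slots)
        self._slots[write_idx] = frame
        self._count += 1
        return True

    def read(self) -> Optional[T]:
        """Consumer side (presentation engine). Returns None when the buffer is empty."""
        if self._count == 0:
            return None
        frame = self._slots[self._read]
        self._slots[self._read] = None
        self._read = (self._read + 1) % len(self._slots)
        self._count -= 1
        return frame
```

With a capacity of 2, a third write fails until the presentation engine reads a frame. After that read, the freed slot is reused and frames still come out in arrival order.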

In one example, a method of retrieving media data includes:

- receiving, by a presentation engine, streamed media data representing a virtual three-dimensional scene that includes at least one virtual solid object;
- receiving, by the presentation engine, camera control data for the three-dimensional scene, the camera control data including data that defines constraints preventing a virtual camera from passing through the at least one virtual solid object;
- receiving, by the presentation engine and from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and
- in response to the camera movement data, preventing, by the presentation engine and using the camera control data, the virtual camera from passing through the at least one virtual solid object.

In another example, a device for retrieving media data includes a memory configured to store media data, and one or more processors implemented in circuitry and configured to execute a presentation engine. The presentation engine is configured to:

- receive streamed media data representing a virtual three-dimensional scene that includes at least one virtual solid object;
- receive camera control data for the three-dimensional scene, the camera control data including data that defines constraints preventing a virtual camera from passing through the at least one virtual solid object;
- receive, from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and
- in response to the camera movement data, use the camera control data to prevent the virtual camera from passing through the at least one virtual solid object.

In another example, a computer-readable storage medium has stored thereon instructions that, when executed, cause a processor of a client device to:

- receive streamed media data representing a virtual three-dimensional scene that includes at least one virtual solid object;
- receive camera control data for the three-dimensional scene, the camera control data including data that defines constraints preventing a virtual camera from passing through the at least one virtual solid object;
- receive, from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and
- in response to the camera movement data, use the camera control data to prevent the virtual camera from passing through the at least one virtual solid object.

In another example, a device for retrieving media data includes:

- means for receiving streamed media data representing a virtual three-dimensional scene that includes at least one virtual solid object;
- means for receiving camera control data for the three-dimensional scene, the camera control data including data that defines constraints preventing a virtual camera from passing through the at least one virtual solid object;
- means for receiving, from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and
- means for using the camera control data, in response to the camera movement data, to prevent the virtual camera from passing through the at least one virtual solid object.
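The sketch below shows one way a presentation engine might apply such camera control data. It assumes the constraints arrive as a list of axis-aligned boxes in which the camera is allowed to be, with the solid objects lying outside them. The engine walks the requested move in small steps and stops the camera at the last position that still satisfies the constraints. `Box`, `apply_camera_move`, and the box representation are hypothetical and are not taken from any MPEG syntax.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned region in which the camera is allowed to be (hypothetical constraint form)."""
    lo: Vec3
    hi: Vec3

    def contains(self, p: Vec3) -> bool:
        return all(l <= c <= h for l, c, h in zip(self.lo, p, self.hi))


def apply_camera_move(current: Vec3, requested: Vec3,
                      allowed: Sequence[Box], steps: int = 100) -> Vec3:
    """Advance toward `requested` in `steps` equal increments and stop at the last
    sampled position that lies inside at least one allowed region."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    last_ok = current
    for i in range(1, steps + 1):
        t = i / steps
        p = tuple(c + t * (r - c) for c, r in zip(current, requested))
        if not any(region.contains(p) for region in allowed):
            break  # the next sample would leave the permitted space
        last_ok = p
    return last_ok
```

Because the motion is sampled, the step size must be smaller than the thinnest forbidden gap between allowed regions. Otherwise a thin wall could be skipped between two samples.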

In another example, a method of retrieving media data includes:

- receiving, by a presentation engine, streamed media data representing a virtual three-dimensional scene that includes at least one virtual solid object;
- receiving, by the presentation engine, object collision data representing boundaries of the at least one virtual solid object;
- receiving, by the presentation engine and from a user, camera movement data requesting that a virtual camera move through the at least one virtual solid object; and
- in response to the camera movement data, preventing, by the presentation engine and using the object collision data, the virtual camera from passing through the at least one virtual solid object.

In another example, a device for retrieving media data includes a memory configured to store media data, and one or more processors implemented in circuitry and configured to execute a presentation engine. The presentation engine is configured to:

- receive streamed media data representing a virtual three-dimensional scene that includes at least one virtual solid object;
- receive object collision data representing boundaries of the at least one virtual solid object;
- receive, from a user, camera movement data requesting that a virtual camera move through the at least one virtual solid object; and
- in response to the camera movement data, use the object collision data to prevent the virtual camera from passing through the at least one virtual solid object.

In another example, a computer-readable storage medium stores instructions that, when executed, cause a processor of a client device to:

- receive streamed media data representing a virtual three-dimensional scene that includes at least one virtual solid object;
- receive object collision data representing boundaries of the at least one virtual solid object;
- receive, from a user, camera movement data requesting that a virtual camera move through the at least one virtual solid object; and
- in response to the camera movement data, use the object collision data to prevent the virtual camera from passing through the at least one virtual solid object.

In another example, a device for retrieving media data includes:

- means for receiving streamed media data representing a virtual three-dimensional scene that includes at least one virtual solid object;
- means for receiving object collision data representing boundaries of the at least one virtual solid object;
- means for receiving, from a user, camera movement data requesting that a virtual camera move through the at least one virtual solid object; and
- means for using the object collision data, in response to the camera movement data, to prevent the virtual camera from passing through the at least one virtual solid object.
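When the object collision data gives each solid object's boundary, the presentation engine can test the requested camera motion against it directly. The sketch below assumes the boundaries are axis-aligned boxes; real collision shapes could be meshes or other primitives. It uses the standard slab test for segment-box intersection and halts the camera just short of the first box the move would enter. The function names and the `margin` parameter are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
AABB = Tuple[Vec3, Vec3]  # (min corner, max corner) of a solid object's boundary


def segment_enters_box(p0: Vec3, p1: Vec3, box: AABB) -> Optional[float]:
    """Slab test: parametric t in [0, 1] where segment p0->p1 first touches the box, or None."""
    lo, hi = box
    t_enter, t_exit = 0.0, 1.0
    for axis in range(3):
        d = p1[axis] - p0[axis]
        if abs(d) < 1e-12:
            # Motion is parallel to this pair of planes: a miss unless already between them.
            if p0[axis] < lo[axis] or p0[axis] > hi[axis]:
                return None
            continue
        t0 = (lo[axis] - p0[axis]) / d
        t1 = (hi[axis] - p0[axis]) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter = max(t_enter, t0)
        t_exit = min(t_exit, t1)
        if t_enter > t_exit:
            return None
    return t_enter


def resolve_camera_move(p0: Vec3, p1: Vec3, solids: Sequence[AABB],
                        margin: float = 1e-3) -> Vec3:
    """Return p1 if the path is clear; otherwise stop just before the first solid hit.

    `margin` is a fraction of the requested move, so the stopping gap scales with its length."""
    hits: List[float] = []
    for box in solids:
        t = segment_enters_box(p0, p1, box)
        if t is not None:
            hits.append(t)
    if not hits:
        return p1
    t_stop = max(0.0, min(hits) - margin)
    return tuple(a + t_stop * (b - a) for a, b in zip(p0, p1))
```

Testing the whole swept segment, rather than only the destination, is what keeps a fast or long move from jumping over a thin wall.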

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, the drawings, and the claims.

FIG. 1 is a block diagram illustrating an example system that implements techniques for streaming media data over a network.

FIG. 2 is a block diagram illustrating an example set of components of the retrieval unit of FIG. 1 in greater detail.

FIG. 3 is a conceptual diagram illustrating elements of example multimedia content.

FIG. 4 is a block diagram illustrating elements of an example video file, which may correspond to a segment of a representation.

FIG. 5 is a conceptual diagram illustrating an example camera path segment with a bounding volume, in accordance with the techniques of this disclosure.

FIG. 6 is a conceptual diagram illustrating an example virtual object, in this example a chair.

FIG. 7 is a flowchart illustrating an example method of retrieving media data, in accordance with the techniques of this disclosure.

FIG. 8 is a flowchart illustrating an example method of retrieving media data, in accordance with the techniques of this disclosure.

Interactive media data may be streamed over a network. For example, a client device may retrieve interactive media data using unicast, broadcast, multicast, or the like. The interactive media data may be, for example, three-dimensional (3D) media data for extended reality (XR), augmented reality (AR), or virtual reality (VR). When the data is presented, the user can navigate a 3D virtual scene rendered according to the interactive media data.

An MPEG scene description may describe a three-dimensional (3D) scene for a virtual world or experience, for example an XR, VR, AR, or other interactive media experience. In accordance with the techniques of this disclosure, the MPEG scene description may describe objects in the 3D scene, such as chairs, walls, tables, counters, doors, windows, or other solid objects.

This disclosure describes techniques for enhancing an MPEG scene description (or another such descriptive set of data) so that it imposes restrictions on virtual camera movement. For example, such restrictions can prevent the camera from passing through solid objects such as walls.

In particular, the scene description may describe a set of paths along which the camera is permitted to move. A path may be described as a set of anchor points connected by path segments. To make camera control more expressive, each path segment may be enhanced with a bounding volume that permits some degree of freedom of movement along the path.
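The sketch below models a camera path of anchor points joined by segments, each segment with its own bounding volume. It assumes that each bounding volume is a capsule, meaning all points within `radius` of the segment. This passage does not fix the actual shape of the bounding volume, and the class and field names are hypothetical. A requested camera position that lies outside every capsule is pulled back to the nearest allowed point.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass
class PathSegment:
    start: int      # index of the anchor point where the segment begins
    end: int        # index of the anchor point where the segment ends
    radius: float   # capsule radius: allowed deviation from the segment (assumed shape)


@dataclass
class CameraPath:
    anchors: List[Vec3]
    segments: List[PathSegment]

    def constrain(self, p: Vec3) -> Vec3:
        """Return the allowed camera position closest to the requested position p."""
        if not self.segments:
            raise ValueError("path has no segments")
        best, best_dist = p, math.inf
        for seg in self.segments:
            a, b = self.anchors[seg.start], self.anchors[seg.end]
            ab = _sub(b, a)
            length_sq = _dot(ab, ab)
            # Project p onto the segment and clamp the projection to the segment's endpoints.
            t = 0.0 if length_sq == 0 else max(0.0, min(1.0, _dot(_sub(p, a), ab) / length_sq))
            closest = _add(a, _scale(ab, t))
            offset = _sub(p, closest)
            dist = math.sqrt(_dot(offset, offset))
            # Inside the capsule: keep p. Outside: move it onto the capsule surface.
            candidate = p if dist <= seg.radius else _add(closest, _scale(offset, seg.radius / dist))
            moved = math.dist(p, candidate)
            if moved < best_dist:
                best, best_dist = candidate, moved
        return best
```

Consecutive segments share anchor indices, so the capsules overlap at each anchor point. As a result, the camera can move from one segment to the next without leaving the permitted space.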

Additionally or alternatively, the scene description may describe virtual solid objects in the scene. For example, the scene description may provide information representing any of the following:

- the boundaries of an object;
- whether the object can be affected by collisions with the user or with other objects, for example whether it moves or remains stationary in response to such a collision;
- a material for the object, representing how colliding objects interact with it; and/or
- animation data representing an animation to be played back or applied to the object in response to a collision.
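The per-object information listed above could be held in a structure like the one below. This is a hypothetical sketch: the field names, the axis-aligned box used for the boundary, and the restitution and friction values standing in for the material are assumptions chosen for illustration. They are not the scene description syntax. The `respond_to_collision` helper shows how a presentation engine might turn those fields into a reaction.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class CollisionMaterial:
    restitution: float = 0.0   # hypothetical: 0 = no bounce, 1 = perfectly elastic
    friction: float = 0.5      # hypothetical sliding resistance


@dataclass(frozen=True)
class SolidObject:
    name: str
    bounds_min: Vec3                  # object boundary, here an axis-aligned box
    bounds_max: Vec3
    is_static: bool = True            # True: stays put when hit; False: may be moved
    material: CollisionMaterial = field(default_factory=CollisionMaterial)
    on_collision_animation: Optional[str] = None  # animation to play when hit


@dataclass(frozen=True)
class CollisionResponse:
    object_moves: bool
    bounce_speed: float
    animation: Optional[str]


def respond_to_collision(obj: SolidObject, impact_speed: float) -> CollisionResponse:
    """Derive a simple reaction from the object's collision properties."""
    return CollisionResponse(
        object_moves=not obj.is_static,
        bounce_speed=impact_speed * obj.material.restitution,
        animation=obj.on_collision_animation,
    )
```

A movable chair with a collision animation would react differently from a static wall with default properties. The wall neither moves nor triggers an animation, and the camera simply does not bounce off it.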

The techniques of this disclosure may be applied to video files containing video data encapsulated according to any of the following, or other similar video file formats:

- the ISO base media file format;
- the Scalable Video Coding (SVC) file format;
- the Advanced Video Coding (AVC) file format;
- the Third Generation Partnership Project (3GPP®) file format; and/or
- the Multiview Video Coding (MVC) file format.

In HTTP streaming, frequently used operations include HEAD, GET, and partial GET:

- The HEAD operation retrieves the header of the file associated with a given uniform resource locator (URL) or uniform resource name (URN), without retrieving the payload associated with that URL or URN.
- The GET operation retrieves the whole file associated with a given URL or URN.
- The partial GET operation receives a byte range as an input parameter and retrieves a contiguous run of bytes from a file, where the number of bytes corresponds to the received byte range.

Because a partial GET operation can retrieve one or more individual movie fragments, movie fragments may be provided for HTTP streaming. A movie fragment may contain several track fragments of different tracks. In HTTP streaming, a media presentation may be a structured collection of data that is accessible to the client. The client may request and download media data information to present a streaming service to a user.
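The three operations map directly onto HTTP request methods and headers. The sketch below only builds `urllib.request.Request` objects from the Python standard library, so it runs without a network connection. The URL is a placeholder and the helper names are hypothetical. A partial GET carries a `Range` header whose byte positions are inclusive, so `bytes=0-499` asks for the first 500 bytes.

```python
import urllib.request


def head_request(url: str) -> urllib.request.Request:
    """HEAD: fetch only the headers associated with the resource at url."""
    return urllib.request.Request(url, method="HEAD")


def get_request(url: str) -> urllib.request.Request:
    """GET: fetch the whole resource."""
    return urllib.request.Request(url, method="GET")


def partial_get_request(url: str, first_byte: int, last_byte: int) -> urllib.request.Request:
    """Partial GET: fetch the inclusive byte range [first_byte, last_byte], e.g. one movie fragment."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    return urllib.request.Request(
        url, headers={"Range": f"bytes={first_byte}-{last_byte}"}, method="GET"
    )
```

Sending one of these requests with `urllib.request.urlopen(req)` returns the server's response. A server that honors the range answers a partial GET with status 206 (Partial Content).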

When 3GPP data is streamed using HTTP streaming, there may be multiple representations of the video and/or audio data of multimedia content. As explained below, different representations may correspond to:

- different coding characteristics (e.g., different profiles or levels of a video coding standard);
- different coding standards or extensions of coding standards (such as multiview and/or scalable extensions); or
- different bitrates.

The manifest of such representations may be defined in a Media Presentation Description (MPD) data structure. A media presentation may correspond to a structured collection of data that is accessible to an HTTP streaming client device. The HTTP streaming client device may request and download media data information to present a streaming service to a user of the client device. A media presentation may be described in the MPD data structure, which may include updates of the MPD.

A media presentation may contain a sequence of one or more periods. Each period may extend until the start of the next period or, for the last period, until the end of the media presentation. Each period may contain one or more representations of the same media content.

A representation may be one of a number of alternative encoded versions of audio, video, timed text, or other such data. Representations may differ by encoding type: for video data, by bitrate, resolution, and/or codec, and for audio data, by bitrate, language, and/or codec. The term representation may be used to refer to a section of encoded audio or video data that corresponds to a particular period of the multimedia content and is encoded in a particular way.
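The rule that each period runs until the next one begins, and the last one until the presentation ends, can be written as a short function. The list-of-start-times input and the use of seconds are assumptions of this sketch.

```python
from typing import List, Tuple


def period_intervals(starts: List[float], presentation_end: float) -> List[Tuple[float, float]]:
    """Return (start, end) for each period of a media presentation.

    Each period ends where the next one starts; the last period ends at the end of
    the media presentation. Times are assumed to be seconds from presentation start."""
    if not starts:
        return []
    if any(b < a for a, b in zip(starts, starts[1:])):
        raise ValueError("period start times must be non-decreasing")
    if presentation_end < starts[-1]:
        raise ValueError("presentation ends before its last period starts")
    ends = starts[1:] + [presentation_end]
    return list(zip(starts, ends))
```

For periods starting at 0, 30, and 95 seconds in a 120-second presentation, the intervals are (0, 30), (30, 95), and (95, 120).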

The representations of a particular period may be assigned to a group, indicated by an attribute in the MPD that identifies the adaptation set to which the representations belong. Representations in the same adaptation set are generally considered alternatives to each other, because a client device can dynamically and seamlessly switch between them, for example to perform bandwidth adaptation. For example, each representation of video data for a particular period may be assigned to the same adaptation set. Any of those representations may then be selected for decoding to present the media data (such as video or audio data) of the multimedia content for that period.

In some examples, the media content within one period may be represented either by one representation from group 0, if present, or by a combination of at most one representation from each non-zero group. Timing data for each representation of a period may be expressed relative to the start time of the period.
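Switching among the representations of one adaptation set for bandwidth adaptation can be illustrated with a simple heuristic. The client picks the highest-bandwidth representation that fits the currently available throughput, and falls back to the lowest one when none fits. The names and the heuristic are assumptions for illustration; practical DASH clients typically also weigh buffer level and throughput history.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Representation:
    rep_id: str
    bandwidth_bps: int  # declared bitrate of this representation


def select_representation(adaptation_set: List[Representation],
                          available_bps: float) -> Representation:
    """Choose the best representation from one adaptation set for the available bandwidth."""
    if not adaptation_set:
        raise ValueError("adaptation set is empty")
    fitting = [r for r in adaptation_set if r.bandwidth_bps <= available_bps]
    if fitting:
        return max(fitting, key=lambda r: r.bandwidth_bps)
    # Nothing fits: take the cheapest alternative rather than stalling.
    return min(adaptation_set, key=lambda r: r.bandwidth_bps)
```

Because every member of the adaptation set is an interchangeable alternative, the client can call this again whenever its throughput estimate changes and switch at the next segment boundary.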

A representation may include one or more segments. Each representation may include an initialization segment, or each segment of a representation may be self-initializing. When present, the initialization segment may contain initialization information for accessing the representation. In general, the initialization segment does not contain media data.

A segment may be uniquely referenced by an identifier, such as a uniform resource locator (URL), uniform resource name (URN), or uniform resource identifier (URI). The MPD may provide the identifier for each segment. In some examples, the MPD may also provide byte ranges in the form of a range attribute, which may correspond to the data for a segment within a file accessible by the URL, URN, or URI.
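Segment identifiers and byte ranges of this kind appear in a DASH MPD. For example, a `SegmentList` may contain an `Initialization` element with a `range` attribute and `SegmentURL` elements with a `mediaRange` attribute. The sketch below parses a small hand-written MPD fragment with the standard library. It lists (URL, byte range) pairs for the initialization segment and each media segment. The sample MPD, its file name, and its byte ranges are invented for illustration, and only the `SegmentList` addressing mode is handled.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical MPD fragment: one representation stored in a single file, addressed by byte ranges.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period><AdaptationSet><Representation id="v1" bandwidth="2500000">
    <BaseURL>video_v1.mp4</BaseURL>
    <SegmentList>
      <Initialization range="0-863"/>
      <SegmentURL mediaRange="864-75121"/>
      <SegmentURL mediaRange="75122-150003"/>
    </SegmentList>
  </Representation></AdaptationSet></Period>
</MPD>"""


def segment_requests(mpd_xml: str) -> List[Tuple[str, Optional[str]]]:
    """Return (url, byte_range) for the initialization segment and each media segment."""
    root = ET.fromstring(mpd_xml)
    out: List[Tuple[str, Optional[str]]] = []
    for rep in root.iterfind(".//mpd:Representation", NS):
        base = rep.findtext("mpd:BaseURL", default="", namespaces=NS)
        seg_list = rep.find("mpd:SegmentList", NS)
        if seg_list is None:
            continue  # other addressing modes (SegmentTemplate, SegmentBase) not handled here
        init = seg_list.find("mpd:Initialization", NS)
        if init is not None:
            out.append((init.get("sourceURL", base), init.get("range")))
        for seg in seg_list.iterfind("mpd:SegmentURL", NS):
            # Without an explicit media URL, the segment lives in the BaseURL resource.
            out.append((seg.get("media", base), seg.get("mediaRange")))
    return out
```

Each resulting pair could then be fetched with an HTTP partial GET for that byte range of the file.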

Different representations may be selected for substantially simultaneous retrieval of different types of media data. For example, a client device may select an audio representation, a video representation, and a timed text representation from which to retrieve segments.

In some examples, the client device may select particular adaptation sets for performing bandwidth adaptation. That is, the client device may select an adaptation set containing video representations, an adaptation set containing audio representations, and/or an adaptation set containing timed text. Alternatively, the client device may select an adaptation set for one type of media (e.g., video) and directly select representations for other types of media (e.g., audio and/or timed text).

FIG. 1 is a block diagram illustrating an example system 10 that implements techniques for streaming media data over a network. In this example, system 10 includes content preparation device 20, server device 60, and client device 40. Client device 40 and server device 60 are communicatively coupled by network 74, which may comprise the Internet.

In some examples, content preparation device 20 and server device 60 may also be coupled by network 74 or another network, or may be directly communicatively coupled. In some examples, content preparation device 20 and server device 60 may comprise the same device.

In the example of FIG. 1, content preparation device 20 comprises audio source 22 and video source 24.

Audio source 22 may comprise, for example, a microphone that produces electrical signals representative of captured audio data to be encoded by audio encoder 26. Alternatively, audio source 22 may comprise a storage medium storing previously recorded audio data, an audio data generator such as a computerized synthesizer, or any other source of audio data.

Video source 24 may comprise a video camera that produces video data to be encoded by video encoder 28, a storage medium encoded with previously recorded video data, a video data generation unit such as a computer graphics source, or any other source of video data.

Content preparation device 20 is not necessarily communicatively coupled to server device 60 in all examples. Instead, it may store multimedia content to a separate medium that is read by server device 60.

Raw audio and video data may comprise analog or digital data. Analog data may be digitized before being encoded by audio encoder 26 and/or video encoder 28. Audio source 22 may obtain audio data from a speaking participant while the participant is speaking, and video source 24 may simultaneously obtain video data of the same participant.

In other examples, audio source 22 may comprise a computer-readable storage medium containing stored audio data, and video source 24 may comprise a computer-readable storage medium containing stored video data. In this manner, the techniques described in this disclosure may be applied to live, streaming, real-time audio and video data, or to archived, pre-recorded audio and video data.

An audio frame that corresponds to a video frame is generally an audio frame containing audio data that was captured (or generated) by audio source 22 contemporaneously with the video data, captured (or generated) by video source 24, that is contained within the video frame. For example, while a speaking participant generally produces audio data by speaking, audio source 22 captures the audio data, and video source 24 captures video data of the speaking participant at the same time, that is, while audio source 22 is capturing the audio data. Hence, an audio frame may temporally correspond to one or more particular video frames. Accordingly, an audio frame corresponding to a video frame generally corresponds to a situation in which audio data and video data were captured at the same time, and for which the audio frame and the video frame comprise, respectively, the audio data and the video data that were captured at the same time.

In some examples, audio encoder 26 may encode, in each encoded audio frame, a timestamp representing the time at which the audio data for the encoded audio frame was recorded. Similarly, video encoder 28 may encode, in each encoded video frame, a timestamp representing the time at which the video data for the encoded video frame was recorded. In such examples, an audio frame corresponding to a video frame may comprise an audio frame comprising a timestamp and a video frame comprising the same timestamp. Content preparation device 20 may include an internal clock from which audio encoder 26 and/or video encoder 28 may generate the timestamps, or that audio source 22 and video source 24 may use to associate audio and video data, respectively, with a timestamp.
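The correspondence rule above (an audio frame and a video frame correspond when they carry the same timestamp) can be sketched as a simple lookup. This is only an illustration under stated assumptions: the `Frame` type, its fields, and the integer clock units are hypothetical and not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Frame:
    kind: str        # "audio" or "video" (hypothetical field)
    timestamp: int   # capture time in ticks of a shared clock (assumed units)
    payload: bytes


def pair_by_timestamp(audio: List[Frame], video: List[Frame]) -> List[Tuple[Frame, Frame]]:
    """Return (audio, video) pairs whose timestamps are identical."""
    # Index audio frames by timestamp so each video frame is matched in O(1).
    audio_by_ts: Dict[int, Frame] = {f.timestamp: f for f in audio}
    pairs: List[Tuple[Frame, Frame]] = []
    for v in video:
        a = audio_by_ts.get(v.timestamp)
        if a is not None:
            pairs.append((a, v))
    return pairs
```

For example, with audio frames at ticks 0 and 3000 and video frames at ticks 0, 1500, and 3000, the first and third video frames are paired. The video frame at 1500 has no audio frame with an identical timestamp, so it is left unpaired.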

In some examples, audio source 22 may send data to audio encoder 26 corresponding to the time at which the audio data was recorded, and video source 24 may send data to video encoder 28 corresponding to the time at which the video data was recorded. In some examples, audio encoder 26 may encode a sequence identifier in the encoded audio data to indicate the relative temporal ordering of the encoded audio data, without necessarily indicating the absolute time at which the audio data was recorded. Similarly, video encoder 28 may use sequence identifiers to indicate the relative temporal ordering of encoded video data. Likewise, in some examples, a sequence identifier may be mapped to, or otherwise correlated with, a timestamp.

Audio encoder 26 generally produces a stream of encoded audio data, while video encoder 28 produces a stream of encoded video data. Each individual stream of data (whether audio or video) may be referred to as an elementary stream. An elementary stream is a single, digitally coded (possibly compressed) component of a representation. For example, the coded video or audio part of a representation can be an elementary stream. An elementary stream may be converted into a packetized elementary stream (PES) before being encapsulated within a video file. Within the same representation, a stream ID may be used to distinguish the PES packets belonging to one elementary stream from those belonging to other elementary streams. The basic unit of data of an elementary stream is a packetized elementary stream (PES) packet. Thus, coded video data generally corresponds to elementary video streams. Similarly, audio data corresponds to one or more respective elementary streams.
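The sketch below shows how a stream ID can separate PES packets belonging to different elementary streams. It assumes the fixed 6-byte PES header of ISO/IEC 13818-1 (MPEG-2 Systems): a `0x000001` start code prefix, a one-byte `stream_id`, and a two-byte `PES_packet_length`. Optional header fields and payload extraction are deliberately not handled. The function names are hypothetical.

```python
import struct
from collections import defaultdict
from typing import Dict, Iterable, List


def classify_stream_id(stream_id: int) -> str:
    # stream_id ranges from MPEG-2 Systems: 0xC0-0xDF audio, 0xE0-0xEF video.
    if 0xC0 <= stream_id <= 0xDF:
        return "audio"
    if 0xE0 <= stream_id <= 0xEF:
        return "video"
    return "other"


def parse_pes_header(packet: bytes) -> Dict[str, int]:
    """Read the fixed part of a PES header: start code prefix, stream_id, length."""
    if len(packet) < 6 or packet[:3] != b"\x00\x00\x01":
        raise ValueError("not a PES packet")
    stream_id = packet[3]
    (length,) = struct.unpack(">H", packet[4:6])  # big-endian 16-bit length
    return {"stream_id": stream_id, "pes_packet_length": length}


def group_by_stream(packets: Iterable[bytes]) -> Dict[int, List[bytes]]:
    """Group PES packets by stream_id, i.e., by the elementary stream they belong to."""
    streams: Dict[int, List[bytes]] = defaultdict(list)
    for pkt in packets:
        streams[parse_pes_header(pkt)["stream_id"]].append(pkt)
    return dict(streams)
```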

Many video coding standards, such as ITU-T H.264/AVC and the upcoming High Efficiency Video Coding (HEVC) standard, define the syntax, semantics, and decoding process for error-free bitstreams, any of which conform to a certain profile or level. Video coding standards typically do not specify the encoder, but the encoder is tasked with guaranteeing that the generated bitstreams are standard-compliant for a decoder. In the context of video coding standards, a "profile" corresponds to a subset of algorithms, features, or tools and the constraints that apply to them. As defined by the H.264 standard, for example, a "profile" is a subset of the entire bitstream syntax specified by the H.264 standard. A "level" corresponds to limitations on decoder resource consumption, such as decoder memory and computation, which are related to picture resolution, bit rate, and block processing rate. A profile may be signaled with a profile_idc (profile indicator) value, while a level may be signaled with a level_idc (level indicator) value.

The H.264 standard, for example, recognizes that, within the bounds imposed by the syntax of a given profile, it is still possible to require a large variation in the performance of encoders and decoders depending on the values taken by syntax elements in the bitstream, such as the specified size of the decoded pictures. The H.264 standard further recognizes that, in many applications, it is neither practical nor economical to implement a decoder capable of dealing with all hypothetical uses of the syntax within a particular profile. Accordingly, the H.264 standard defines a "level" as a specified set of constraints imposed on the values of the syntax elements in the bitstream. These constraints may be simple limits on values. Alternatively, these constraints may take the form of constraints on arithmetic combinations of values (e.g., picture width multiplied by picture height multiplied by the number of pictures decoded per second). The H.264 standard further provides that individual implementations may support a different level for each supported profile.
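To illustrate a constraint on an arithmetic combination of values, the sketch below checks two H.264 level limits. The first is the maximum frame size in macroblocks (MaxFS). The second is the maximum macroblock processing rate (MaxMBPS), which is essentially pictures per second multiplied by picture area. Only a few levels from Table A-1 of ITU-T H.264 are listed. A real conformance check involves further limits (DPB size, bit rate, motion vector range, and so on), so this is an illustration only. The function names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

# (MaxMBPS, MaxFS) for a subset of H.264 levels (ITU-T H.264, Table A-1),
# listed in ascending order of level.
H264_LEVEL_LIMITS: Dict[str, Tuple[int, int]] = {
    "3.1": (108_000, 3_600),
    "4": (245_760, 8_192),
    "4.2": (522_240, 8_704),
    "5.1": (983_040, 36_864),
}


def frame_size_in_mbs(width: int, height: int) -> int:
    # Pictures are coded in 16x16 macroblocks; partial macroblocks round up.
    return math.ceil(width / 16) * math.ceil(height / 16)


def fits_level(width: int, height: int, fps: float, level: str) -> bool:
    max_mbps, max_fs = H264_LEVEL_LIMITS[level]
    fs = frame_size_in_mbs(width, height)
    # MaxMBPS is the "product" constraint: picture size times pictures per second.
    return fs <= max_fs and fs * fps <= max_mbps


def lowest_level(width: int, height: int, fps: float) -> Optional[str]:
    """Return the lowest listed level whose limits admit the given format."""
    for level in H264_LEVEL_LIMITS:  # dicts preserve insertion (ascending) order
        if fits_level(width, height, fps, level):
            return level
    return None
```

For 1920x1080, the frame is 120 x 68 = 8160 macroblocks. At 30 pictures per second this gives 244,800 macroblocks per second, which fits level 4. At 60 pictures per second it requires level 4.2.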

A decoder conforming to a profile ordinarily supports all the features defined in the profile. For example, as a coding feature, B-picture coding is not supported in the baseline profile of H.264/AVC but is supported in other profiles of H.264/AVC. A decoder conforming to a level should be capable of decoding any bitstream that does not require resources beyond the limitations defined in the level. Definitions of profiles and levels may be helpful for interpretability. For example, during video transmission, a pair of profile and level definitions may be negotiated and agreed upon for a whole transmission session. More specifically, in H.264/AVC, a level may define limitations on the number of macroblocks that need to be processed, the decoded picture buffer (DPB) size, the coded picture buffer (CPB) size, the vertical motion vector range, the maximum number of motion vectors per two consecutive MBs, and whether a B-block can have sub-macroblock partitions smaller than 8×8 pixels. In this manner, a decoder may determine whether it is capable of properly decoding the bitstream.

In the example of FIG. 1, encapsulation unit 30 of content preparation device 20 receives elementary streams comprising coded video data from video encoder 28 and elementary streams comprising coded audio data from audio encoder 26. In some examples, video encoder 28 and audio encoder 26 may each include packetizers for forming PES packets from encoded data. In other examples, video encoder 28 and audio encoder 26 may each interface with respective packetizers for forming PES packets from encoded data. In still other examples, encapsulation unit 30 may include packetizers for forming PES packets from encoded audio and video data.

Video encoder 28 may encode video data of multimedia content in a variety of ways to produce different representations of the multimedia content at various bitrates and with various characteristics. These characteristics may include pixel resolutions, frame rates, conformance to various coding standards, conformance to various profiles and/or levels of profiles for various coding standards, representations having one or multiple views (e.g., for two-dimensional or three-dimensional playback), or other such characteristics. A representation, as used in this disclosure, may comprise one of audio data, video data, text data (e.g., for closed captions), or other such data. The representation may include an elementary stream, such as an audio elementary stream or a video elementary stream. Each PES packet may include a stream_id that identifies the elementary stream to which the PES packet belongs. Encapsulation unit 30 is responsible for assembling elementary streams into video files (e.g., segments) of the various representations.

Encapsulation unit 30 receives PES packets for elementary streams of a representation from audio encoder 26 and video encoder 28 and forms corresponding network abstraction layer (NAL) units from the PES packets. Coded video segments may be organized into NAL units, which provide a "network-friendly" video representation for applications such as video telephony, storage, broadcast, or streaming. NAL units can be categorized as video coding layer (VCL) NAL units and non-VCL NAL units. VCL units may contain the core compression engine and may include block, macroblock, and/or slice level data. Other NAL units may be non-VCL NAL units. In some examples, a coded picture in one time instance, normally presented as a primary coded picture, may be contained in an access unit, which may include one or more NAL units.
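The VCL/non-VCL distinction can be read directly from the one-byte H.264 NAL unit header, which packs `forbidden_zero_bit` (1 bit), `nal_ref_idc` (2 bits), and `nal_unit_type` (5 bits). The sketch below assumes H.264 only. HEVC uses a two-byte header and a different type table, so it would need separate handling.

```python
from typing import NamedTuple

# A few H.264 nal_unit_type values (ITU-T H.264, Table 7-1).
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "access unit delimiter",
}


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_ref_idc: int
    nal_unit_type: int


def parse_nal_header(first_byte: int) -> NalHeader:
    # Bit layout: forbidden_zero_bit(1) | nal_ref_idc(2) | nal_unit_type(5)
    return NalHeader((first_byte >> 7) & 0x1, (first_byte >> 5) & 0x3, first_byte & 0x1F)


def is_vcl(nal_unit_type: int) -> bool:
    # In H.264, types 1-5 carry coded slice data (VCL); the others are non-VCL.
    return 1 <= nal_unit_type <= 5
```

For example, the header byte `0x65` decodes to `nal_ref_idc = 3` and `nal_unit_type = 5`, which is an IDR slice and therefore a VCL NAL unit. The byte `0x67` decodes to type 7, which is an SPS and therefore a non-VCL parameter set NAL unit.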

Non-VCL NAL units may include parameter set NAL units and SEI NAL units, among others. Parameter sets may contain sequence-level header information (in sequence parameter sets (SPS)) and infrequently changing picture-level header information (in picture parameter sets (PPS)). With parameter sets (e.g., PPS and SPS), this infrequently changing information need not be repeated for each sequence or picture, so coding efficiency may be improved. Furthermore, the use of parameter sets may enable out-of-band transmission of important header information, avoiding the need for redundant transmissions for error resilience. In out-of-band transmission examples, parameter set NAL units may be transmitted on a different channel than other NAL units, such as SEI NAL units.

Supplemental enhancement information (SEI) may contain information that is not necessary for decoding the coded picture samples from VCL NAL units, but that may assist in processes related to decoding, display, error resilience, and other purposes. SEI messages may be contained in non-VCL NAL units. SEI messages are a normative part of some standard specifications, but they are not always mandatory for standard-compliant decoder implementations. SEI messages may be sequence-level SEI messages or picture-level SEI messages. Some sequence-level information may be contained in SEI messages, such as scalability information SEI messages in the example of SVC and view scalability information SEI messages in MVC. These example SEI messages may convey information on, for example, extraction of operation points and characteristics of the operation points. In addition, encapsulation unit 30 may form a manifest file, such as a media presentation description (MPD), that describes characteristics of the representations. Encapsulation unit 30 may format the MPD according to extensible markup language (XML).
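Since the MPD is an XML document, a manifest describing several representations of one adaptation set can be produced with the standard library. The sketch below uses the DASH MPD namespace and on-demand profile URN from ISO/IEC 23009-1. The attribute values and the input dictionary shape are hypothetical. It is not a complete or validated MPD (for example, no segment addressing is included).

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"


def build_mpd(representations: List[Dict]) -> str:
    """Build a minimal MPD; each dict has id, bandwidth, codecs, width, height (assumed shape)."""
    ET.register_namespace("", MPD_NS)  # serialize with MPD_NS as the default namespace
    mpd = ET.Element(f"{{{MPD_NS}}}MPD", {
        "type": "static",
        "profiles": "urn:mpeg:dash:profile:isoff-on-demand:2011",
        "minBufferTime": "PT2S",
        "mediaPresentationDuration": "PT60S",
    })
    period = ET.SubElement(mpd, f"{{{MPD_NS}}}Period")
    # Characteristics common to the whole adaptation set live on AdaptationSet.
    aset = ET.SubElement(period, f"{{{MPD_NS}}}AdaptationSet", {"mimeType": "video/mp4"})
    for rep in representations:
        # Per-representation characteristics (e.g., bitrate) live on child elements.
        ET.SubElement(aset, f"{{{MPD_NS}}}Representation", {
            "id": rep["id"],
            "bandwidth": str(rep["bandwidth"]),
            "codecs": rep["codecs"],
            "width": str(rep["width"]),
            "height": str(rep["height"]),
        })
    return ET.tostring(mpd, encoding="unicode")
```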

Encapsulation unit 30 may provide data for one or more representations of multimedia content, along with the manifest file (e.g., the MPD), to output interface 32. Output interface 32 may comprise a network interface or an interface for writing to a storage medium, such as a universal serial bus (USB) interface, a CD or DVD writer or burner, an interface to magnetic or flash storage media, or other interfaces for storing or transmitting media data. Encapsulation unit 30 may provide data of each of the representations of multimedia content to output interface 32, which may send the data to server device 60 via network transmission or storage media. In the example of FIG. 1, server device 60 includes storage medium 62 that stores various multimedia contents 64, each including a respective manifest file 66 and one or more representations 68A-68N (representations 68). In some examples, output interface 32 may also send data directly to network 74.

In some examples, representations 68 may be separated into adaptation sets. That is, various subsets of representations 68 may include respective common sets of characteristics. Examples of such characteristics include codec, profile and level, resolution, number of views, and file format for segments. They also include text type information, which may identify a language or other characteristics of text to be displayed with the representation and/or audio data to be decoded and presented (e.g., by speakers). Camera angle information may describe a camera angle or real-world camera perspective of a scene for representations in the adaptation set, and rating information may describe content suitability for particular audiences.

Manifest file 66 may include data indicative of the subsets of representations 68 corresponding to particular adaptation sets, as well as common characteristics for the adaptation sets. Manifest file 66 may also include data representative of individual characteristics, such as bitrates, for individual representations of adaptation sets. In this manner, an adaptation set may provide for simplified network bandwidth adaptation. Representations in an adaptation set may be indicated using child elements of an adaptation set element of manifest file 66.

Server device 60 includes request processing unit 70 and network interface 72. In some examples, server device 60 may include a plurality of network interfaces. Furthermore, any or all of the features of server device 60 may be implemented on other devices of a content delivery network, such as routers, bridges, proxy devices, switches, or other devices. In some examples, intermediate devices of a content delivery network may cache data of multimedia content 64 and include components that conform substantially to those of server device 60. In general, network interface 72 is configured to send and receive data via network 74.

Request processing unit 70 is configured to receive network requests from client devices, such as client device 40, for data of storage medium 62. For example, request processing unit 70 may implement hypertext transfer protocol (HTTP) version 1.1, as described in RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1," by R. Fielding et al., Network Working Group, IETF, June 1999. That is, request processing unit 70 may be configured to receive HTTP GET or partial GET requests and to provide data of multimedia content 64 in response to the requests. The requests may specify a segment of one of representations 68, for example, using a URL of the segment. In some examples, the requests may also specify one or more byte ranges of the segment, in which case they are partial GET requests. Request processing unit 70 may further be configured to service HTTP HEAD requests to provide header data of a segment of one of representations 68. In any case, request processing unit 70 may be configured to process the requests to provide the requested data to a requesting device, such as client device 40.
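From the client side, the three request kinds described above (GET, partial GET with a byte range, and HEAD) differ only in the method and an optional `Range` header. The sketch below builds such requests with Python's standard `urllib.request` but does not send them. The segment URL and the helper name `segment_request` are hypothetical.

```python
import urllib.request
from typing import Optional


def segment_request(url: str, first_byte: Optional[int] = None,
                    last_byte: Optional[int] = None,
                    head: bool = False) -> urllib.request.Request:
    """Build (but do not send) a GET, partial GET, or HEAD request for a segment URL."""
    req = urllib.request.Request(url, method="HEAD" if head else "GET")
    if first_byte is not None:
        # HTTP byte ranges are inclusive; an open-ended range omits the last byte.
        end = "" if last_byte is None else str(last_byte)
        req.add_header("Range", f"bytes={first_byte}-{end}")
    return req
```

Passing the request to `urllib.request.urlopen` would send it. A server that honors the byte range typically answers a partial GET with status 206 (Partial Content).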

Additionally or alternatively, request processing unit 70 may be configured to deliver media data via a broadcast or multicast protocol, such as eMBMS. Content preparation device 20 may create DASH segments and/or sub-segments in substantially the same way as described, but server device 60 may deliver these segments or sub-segments using eMBMS or another broadcast or multicast network transport protocol. For example, request processing unit 70 may be configured to receive a multicast group join request from client device 40. That is, server device 60 may advertise to client devices, including client device 40, an Internet protocol (IP) address associated with a multicast group that is associated with particular media content (e.g., a broadcast of a live event). Client device 40, in turn, may submit a request to join the multicast group. This request may be propagated throughout network 74, for example, to the routers making up network 74, so that the routers direct traffic destined for the IP address associated with the multicast group to subscribing client devices, such as client device 40.

As illustrated in the example of FIG. 1, multimedia content 64 includes manifest file 66, which may correspond to a media presentation description (MPD). Manifest file 66 may contain descriptions of different alternative representations 68 (e.g., video services with different qualities). The descriptions may include, for example, codec information, a profile value, a level value, a bitrate, and other descriptive characteristics of representations 68. Client device 40 may retrieve the MPD of a media presentation to determine how to access segments of representations 68.

In particular, retrieval unit 52 may retrieve configuration data (not shown) of client device 40 to determine the decoding capabilities of video decoder 48 and the rendering capabilities of video output 44. Video output 44 may be included in a display device for extended reality, augmented reality, or virtual reality, such as a headset. Likewise, the configuration data may indicate whether video output 44 is capable of rendering 3D video data, for example, for extended reality, augmented reality, or virtual reality. The configuration data may also include any or all of a language preference selected by a user of client device 40, one or more camera perspectives corresponding to depth preferences set by the user of client device 40, and/or a rating preference selected by the user of client device 40.

Retrieval unit 52 may comprise, for example, a web browser or a media client configured to submit HTTP GET and partial GET requests. Retrieval unit 52 may correspond to software instructions executed by one or more processors or processing units (not shown) of client device 40. In some examples, all or portions of the functionality described with respect to retrieval unit 52 may be implemented in hardware, or in a combination of hardware, software, and/or firmware, in which case the requisite hardware may be provided to execute the instructions for the software or firmware.

Retrieval unit 52 may compare the decoding and rendering capabilities of client device 40 to the characteristics of representations 68 indicated by information of manifest file 66. Retrieval unit 52 may initially retrieve at least a portion of manifest file 66 to determine the characteristics of representations 68. For example, retrieval unit 52 may request a portion of manifest file 66 that describes characteristics of one or more adaptation sets. Retrieval unit 52 may select a subset of representations 68 (e.g., an adaptation set) having characteristics that can be satisfied by the coding and rendering capabilities of client device 40. Retrieval unit 52 may then determine bitrates for the representations in the adaptation set, determine the currently available amount of network bandwidth, and retrieve segments from one of the representations having a bitrate that can be satisfied by the network bandwidth.
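The two-step selection just described can be sketched as follows. Step 1 filters adaptation sets by the client's decoding and rendering capabilities. Step 2 picks the highest-bitrate representation that the current bandwidth estimate can sustain. The data classes, the choice of codec and height as the compared capabilities, and the fallback to the lowest bitrate are all simplifying assumptions for illustration. They are not the disclosure's or any DASH client's actual logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Representation:
    rep_id: str
    bandwidth: int  # bits per second, as advertised in the manifest


@dataclass
class AdaptationSet:
    codec: str       # characteristic shared by all representations in the set
    max_height: int  # largest picture height in the set (assumed attribute)
    representations: List[Representation]


def select_representation(sets: List[AdaptationSet], supported_codecs: Set[str],
                          max_display_height: int,
                          available_bps: int) -> Optional[Representation]:
    # Step 1: keep adaptation sets the client can decode and render.
    usable = [s for s in sets
              if s.codec in supported_codecs and s.max_height <= max_display_height]
    if not usable or not usable[0].representations:
        return None
    reps = sorted(usable[0].representations, key=lambda r: r.bandwidth)
    # Step 2: the highest bitrate the bandwidth estimate can sustain; if none fits,
    # fall back to the lowest-bitrate representation.
    fitting = [r for r in reps if r.bandwidth <= available_bps]
    return fitting[-1] if fitting else reps[0]
```

Re-running step 2 as the bandwidth estimate changes gives the behavior of moving to higher-bitrate representations when bandwidth is high and to lower-bitrate ones when it drops.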

In general, higher-bitrate representations may yield higher-quality video playback, while lower-bitrate representations may provide sufficient-quality video playback when available network bandwidth decreases. Accordingly, when available network bandwidth is relatively high, retrieval unit 52 may retrieve data from relatively high-bitrate representations, whereas when available network bandwidth is low, retrieval unit 52 may retrieve data from relatively low-bitrate representations. In this manner, client device 40 may stream multimedia data over network 74 while also adapting to the changing network bandwidth availability of network 74.

Additionally or alternatively, retrieval unit 52 may be configured to receive data in accordance with a broadcast or multicast network protocol, such as eMBMS or IP multicast. In such examples, retrieval unit 52 may submit a request to join a multicast network group associated with particular media content. After joining the multicast group, retrieval unit 52 may receive data of the multicast group without issuing further requests to server device 60 or content preparation device 20. Retrieval unit 52 may submit a request to leave the multicast group when data of the multicast group is no longer needed, for example, to stop playback or to change channels to a different multicast group.
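For plain IPv4 multicast, joining and leaving a group maps onto the standard socket options `IP_ADD_MEMBERSHIP` and `IP_DROP_MEMBERSHIP`. When a socket joins, the host's network stack typically issues an IGMP membership report, which lets routers start forwarding the group's traffic. The sketch below assumes IPv4 and the default interface, and uses hypothetical helper names. It does not model eMBMS, which uses its own service announcement and bearer setup.

```python
import socket
import struct


def make_mreq(group_ip: str, local_ip: str = "0.0.0.0") -> bytes:
    # struct ip_mreq: 4-byte multicast group address, then 4-byte local interface address.
    return struct.pack("4s4s", socket.inet_aton(group_ip), socket.inet_aton(local_ip))


def join_group(group_ip: str, port: int) -> socket.socket:
    """Open a UDP socket bound to `port` and join an IPv4 multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group_ip))
    return sock


def leave_group(sock: socket.socket, group_ip: str) -> None:
    """Leave the group once its data is no longer needed, then close the socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, make_mreq(group_ip))
    sock.close()
```

After `join_group`, datagrams sent to the group arrive through `sock.recvfrom(...)` with no further requests to the server. Changing channels amounts to calling `leave_group` for the old group and `join_group` for the new one.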

Network interface 54 may receive data of segments of a selected representation and provide it to retrieval unit 52, which may in turn provide the segments to decapsulation unit 50. Decapsulation unit 50 may decapsulate elements of a video file into constituent PES streams and depacketize the PES streams to retrieve the encoded data. It may then send the encoded data to either audio decoder 46 or video decoder 48, depending on whether the encoded data is part of an audio stream or a video stream (e.g., as indicated by the PES packet headers of the stream). Audio decoder 46 decodes encoded audio data and sends the decoded audio data to audio output 42. Video decoder 48 decodes encoded video data and sends the decoded video data, which may include a plurality of views of a stream, to video output 44.

Video encoder 28, video decoder 48, audio encoder 26, audio decoder 46, encapsulation unit 30, retrieval unit 52, and decapsulation unit 50 each may be implemented as any of a variety of suitable processing circuitry, as applicable. Examples include one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware, or any combinations thereof. Each of video encoder 28 and video decoder 48 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined video encoder/decoder (CODEC). Likewise, each of audio encoder 26 and audio decoder 46 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined CODEC. An apparatus including video encoder 28, video decoder 48, audio encoder 26, audio decoder 46, encapsulation unit 30, retrieval unit 52, and/or decapsulation unit 50 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.

Client device 40, server device 60, and/or content preparation device 20 may be configured to operate in accordance with the techniques of this disclosure. For purposes of example, this disclosure describes these techniques with respect to client device 40 and server device 60. However, it should be understood that content preparation device 20 may be configured to perform these techniques instead of (or in addition to) server device 60.

Encapsulation unit 30 may form NAL units. Each NAL unit comprises a header that identifies the program to which the NAL unit belongs, as well as a payload, e.g., audio data, video data, or data that describes the transport or program stream to which the NAL unit corresponds. For example, in H.264/AVC, a NAL unit includes a 1-byte header and a payload of varying size. A NAL unit that includes video data in its payload may comprise various granularity levels of video data. For example, a NAL unit may comprise a block of video data, a plurality of blocks, a slice of video data, or an entire picture of video data. Encapsulation unit 30 may receive encoded video data from video encoder 28 in the form of PES packets of elementary streams. Encapsulation unit 30 may associate each elementary stream with a corresponding program.
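The 1-byte H.264/AVC NAL unit header mentioned above packs three fields: forbidden_zero_bit (1 bit), nal_ref_idc (2 bits), and nal_unit_type (5 bits). A minimal parser might look like the following. It assumes the NAL unit has already been extracted from the byte stream, ignores the longer headers used by the SVC/MVC extensions, and leaves emulation prevention bytes in the payload. The function and type names are illustrative.

```python
from typing import NamedTuple, Tuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_ref_idc: int
    nal_unit_type: int


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the single H.264/AVC NAL header byte into its three bit fields."""
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x1,
        nal_ref_idc=(first_byte >> 5) & 0x3,
        nal_unit_type=first_byte & 0x1F,
    )


def split_nal_unit(nal: bytes) -> Tuple[NalHeader, bytes]:
    """Return the parsed header and the variable-size payload that follows it."""
    if not nal:
        raise ValueError("empty NAL unit")
    return parse_nal_header(nal[0]), nal[1:]
```

For example, a first byte of 0x65 decodes to nal_ref_idc 3 and nal_unit_type 5 (a coded slice of an IDR picture). A first byte of 0x67 gives nal_unit_type 7 (a sequence parameter set).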

Encapsulation unit 30 may also assemble access units from a plurality of NAL units. In general, an access unit may comprise one or more NAL units that represent a frame of video data, as well as audio data corresponding to the frame when such audio data is available. An access unit generally includes all NAL units for one output time instance, e.g., all audio and video data for one time instance. For example, if each view has a frame rate of 20 frames per second (fps), each time instance may correspond to a time interval of 0.05 seconds. During this time interval, the specific frames for all views of the same access unit (the same time instance) may be rendered simultaneously. In one example, an access unit may comprise a coded picture in one time instance, which may be presented as a primary coded picture.
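The arithmetic and grouping above can be sketched as follows. At 20 fps, each time instance spans 1/20 = 0.05 s, and frames from different views that share a time instance are collected into one access unit. The tuple layout (view_id, frame_index, payload) and the function names are hypothetical. Exact fractions are used to avoid floating-point drift in the timestamps.

```python
from collections import defaultdict
from fractions import Fraction


def time_instance_interval(fps: int) -> Fraction:
    """Duration of one output time instance, in seconds."""
    return Fraction(1, fps)


def group_into_access_units(frames, fps: int) -> dict:
    """Group (view_id, frame_index, payload) tuples into {presentation_time: {view_id: payload}}."""
    interval = time_instance_interval(fps)
    units = defaultdict(dict)
    for view_id, frame_index, payload in frames:
        units[frame_index * interval][view_id] = payload
    # Sorted by presentation time; decoding order need not match this order.
    return dict(sorted(units.items()))
```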

Accordingly, an access unit may comprise all audio and video frames of a common temporal instance, e.g., all views corresponding to time X. This disclosure also refers to an encoded picture of a particular view as a "view component." That is, a view component may comprise an encoded picture (or frame) for a particular view at a particular time. Accordingly, an access unit may be defined as comprising all view components of a common temporal instance. The decoding order of access units need not be the same as the output or display order.

A media presentation may include a media presentation description (MPD), which may contain descriptions of different alternative representations (e.g., video services with different qualities). The description may include, e.g., codec information, a profile value, and a level value. An MPD is one example of a manifest file, such as manifest file 66. Client device 40 may retrieve the MPD of a media presentation to determine how to access movie fragments of various presentations. Movie fragments may be located in movie fragment boxes (moof boxes) of video files.
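As a rough illustration of how a client might read the alternative representations from an MPD, the sketch below uses Python's standard XML parser with the DASH MPD namespace (urn:mpeg:dash:schema:mpd:2011). The sample MPD is made up. The selection rule (pick the highest bandwidth that fits the measured throughput, otherwise the lowest) is a simple hypothetical heuristic and is not a method described in this disclosure.

```python
import xml.etree.ElementTree as ET

MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="video/mp4" codecs="avc1.64001f">
      <Representation id="low" bandwidth="500000" width="640" height="360"/>
      <Representation id="high" bandwidth="3000000" width="1920" height="1080"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def list_representations(mpd_xml: str) -> list:
    """Collect id, bandwidth, and codecs for every Representation in the MPD."""
    root = ET.fromstring(mpd_xml)
    reps = []
    for aset in root.iterfind(".//mpd:AdaptationSet", MPD_NS):
        for rep in aset.iterfind("mpd:Representation", MPD_NS):
            reps.append({
                "id": rep.get("id"),
                "bandwidth": int(rep.get("bandwidth", "0")),
                # codecs may be declared on the Representation or inherited from the AdaptationSet
                "codecs": rep.get("codecs") or aset.get("codecs"),
            })
    return reps


def pick_representation(reps: list, available_bps: float) -> dict:
    """Choose the highest-bandwidth representation that fits, else the lowest one."""
    fitting = [r for r in reps if r["bandwidth"] <= available_bps]
    if fitting:
        return max(fitting, key=lambda r: r["bandwidth"])
    return min(reps, key=lambda r: r["bandwidth"])
```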

Manifest file 66 (which may comprise, e.g., an MPD) may advertise the availability of segments of representations 68. That is, the MPD may include information indicating the wall-clock time at which a first segment of one of representations 68 becomes available, as well as information indicating the durations of segments within representations 68. In this manner, retrieval unit 52 of client device 40 may determine when each segment is available, based on the starting time and the durations of the segments that precede a particular segment.
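Following the description above, the availability time of each segment can be computed as the advertised availability time of the first segment plus the summed durations of all preceding segments. This is a simplified sketch: the function name and inputs are hypothetical, and real DASH availability rules involve additional MPD attributes that are not modeled here.

```python
from datetime import datetime, timedelta
from itertools import accumulate


def segment_availability_times(first_available: datetime, durations_s: list) -> list:
    """Availability time of segment i = first_available + sum of durations of segments 0..i-1."""
    if not durations_s:
        return []
    # Offsets 0, d0, d0+d1, ...: one per segment (the last segment's own duration is not needed).
    offsets = accumulate(durations_s[:-1], initial=0.0)
    return [first_available + timedelta(seconds=o) for o in offsets]
```

For three segments of 2 s, 2 s, and 4 s, the segments become available 0 s, 2 s, and 4 s after the first segment's advertised time.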

After encapsulation unit 30 has assembled NAL units and/or access units into a video file based on received data, encapsulation unit 30 passes the video file to output interface 32 for output. In some examples, encapsulation unit 30 may store the video file locally or send the video file to a remote server via output interface 32, rather than sending the video file directly to client device 40. Output interface 32 may comprise, for example, a transmitter, a transceiver, a device for writing data to a computer-readable medium (such as an optical drive or a magnetic media drive, e.g., a floppy drive), a universal serial bus (USB) port, a network interface, or another output interface. Output interface 32 outputs the video file to a computer-readable medium, such as a transmission signal, a magnetic medium, an optical medium, a memory, a flash drive, or another computer-readable medium.

Network interface 54 may receive NAL units or access units via network 74 and provide the NAL units or access units to decapsulation unit 50 via retrieval unit 52. Decapsulation unit 50 may decapsulate elements of a video file into constituent PES streams and depacketize the PES streams to retrieve encoded data. It may then send the encoded data to either audio decoder 46 or video decoder 48, depending on whether the encoded data is part of an audio stream or a video stream, e.g., as indicated by PES packet headers of the stream. Audio decoder 46 decodes encoded audio data and sends the decoded audio data to audio output 42. Video decoder 48 decodes encoded video data and sends the decoded video data, which may include a plurality of views of a stream, to video output 44.

In accordance with the techniques of this disclosure, a user of client device 40 may obtain media data related to a 3D virtual scene, e.g., for extended reality (XR), augmented reality (AR), virtual reality (VR), or the like. The user may navigate through the 3D virtual scene using one or more devices in communication with client device 40, such as a controller. Additionally or alternatively, client device 40 may include sensors, cameras, or the like for determining that the user has moved in real-world space, and client device 40 may translate such real-world movements into virtual-space movements.

The 3D virtual scene may include one or more virtual solid objects. Such objects may include, for example, walls, windows, tables, chairs, or any other such objects that may appear in the virtual scene. In accordance with the techniques of this disclosure, media data retrieved by retrieval unit 52 may include a scene description that describes such virtual solid objects. The scene description may conform to, for example, the MPEG scene description elements of glTF 2.0.

In some examples, the scene description may include a description of permitted camera movements. For example, the scene description may describe one or more bounding volumes within which the virtual camera is permitted to move (e.g., volumes shaped as spheres, cubes, cones, pyramids, or the like), such that the virtual camera is not permitted to move beyond the boundaries of the shape. That is, a bounding volume may describe a permitted camera movement volume within which the virtual camera is allowed to move. Additionally or alternatively, the scene description may describe one or more vertices or anchor points, as well as permitted paths (e.g., segments) between the vertices or anchor points. Client device 40 may permit the virtual camera to move only along the permitted paths and/or within the bounding volumes.
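One way a client might enforce these constraints is sketched below. A requested camera position is accepted if it lies inside any permitted bounding volume or on any permitted path segment between anchor points. Otherwise the camera stays where it was. Only spheres and axis-aligned boxes are modeled. The class names, the tolerance parameter, and the representation of paths as index pairs into an anchor list are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Sphere:
    center: Vec3
    radius: float

    def contains(self, p: Vec3) -> bool:
        return math.dist(p, self.center) <= self.radius


@dataclass(frozen=True)
class Box:  # axis-aligned bounding box
    min_corner: Vec3
    max_corner: Vec3

    def contains(self, p: Vec3) -> bool:
        return all(lo <= c <= hi for lo, c, hi in zip(self.min_corner, p, self.max_corner))


def distance_to_segment(p: Vec3, a: Vec3, b: Vec3) -> float:
    """Distance from p to the closest point on segment a-b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


def is_allowed(p: Vec3, volumes: Sequence, anchors: List[Vec3],
               segments: List[Tuple[int, int]], tolerance: float = 1e-6) -> bool:
    """True if p is inside a permitted volume or on a permitted path between two anchors."""
    if any(v.contains(p) for v in volumes):
        return True
    return any(distance_to_segment(p, anchors[i], anchors[j]) <= tolerance for i, j in segments)


def constrain_move(current: Vec3, requested: Vec3, volumes: Sequence,
                   anchors: List[Vec3], segments: List[Tuple[int, int]]) -> Vec3:
    """Accept the requested position only if permitted; otherwise keep the current one."""
    return requested if is_allowed(requested, volumes, anchors, segments) else current
```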

In some examples, the scene description may additionally or alternatively describe one or more virtual solid objects in the scene through which the virtual camera cannot pass.

FIG. 2 is a block diagram illustrating an example set of components of retrieval unit 52 of FIG. 1 in greater detail. In this example, retrieval unit 52 includes eMBMS middleware unit 100, DASH client 110, media application 112, and presentation engine 114.

In this example, eMBMS middleware unit 100 further includes eMBMS reception unit 106, cache 104, and proxy server unit 102. In this example, eMBMS reception unit 106 may receive data via eMBMS, e.g., according to File Delivery over Unidirectional Transport (FLUTE), described in T. Paila et al., "FLUTE - File Delivery over Unidirectional Transport," Network Working Group, RFC 6726, November 2012, available at tools.ietf.org/html/rfc6726. That is, eMBMS reception unit 106 may receive files via broadcast from, e.g., server device 60, which may act as a broadcast/multicast service center (BM-SC).

As eMBMS middleware unit 100 receives data for files, eMBMS middleware unit 100 may store the received data in cache 104. Cache 104 may comprise a computer-readable storage medium, such as flash memory, a hard disk, RAM, or any other suitable storage medium.

Proxy server unit 102 may act as a server for DASH client 110. For example, proxy server unit 102 may provide an MPD file or other manifest file to DASH client 110. Proxy server unit 102 may advertise availability times for segments in the MPD file, as well as hyperlinks from which the segments can be retrieved. These hyperlinks may include a localhost address prefix corresponding to client device 40 (e.g., 127.0.0.1 for IPv4). In this manner, DASH client 110 may request segments from proxy server unit 102 using HTTP GET or partial GET requests. For example, for a segment available from the link http://127.0.0.1/rep1/seg3, DASH client 110 may construct an HTTP GET request for http://127.0.0.1/rep1/seg3 and submit the request to proxy server unit 102. In response to such requests, proxy server unit 102 may retrieve the requested data from cache 104 and provide the data to DASH client 110.
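The request pattern above can be sketched with Python's standard urllib. A plain GET fetches a whole segment, and a partial GET is an ordinary GET that carries an HTTP Range header. The helper name and the optional byte-range parameter are hypothetical. The sketch only builds the request objects, so no proxy needs to be running to try it.

```python
import urllib.request
from typing import Optional, Tuple

PROXY_BASE = "http://127.0.0.1"  # localhost prefix advertised by the local proxy server


def build_segment_request(path: str,
                          byte_range: Optional[Tuple[int, int]] = None) -> urllib.request.Request:
    """Build an HTTP GET (or, with byte_range, a partial GET) for a segment served by the local proxy."""
    req = urllib.request.Request(PROXY_BASE + path, method="GET")
    if byte_range is not None:
        first, last = byte_range  # inclusive byte positions, as in the HTTP Range header
        req.add_header("Range", f"bytes={first}-{last}")
    return req
```

With a proxy actually listening, `urllib.request.urlopen(build_segment_request("/rep1/seg3"))` would return the segment. Passing `(0, 1023)` as the byte range would request only its first kilobyte.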

DASH client 110 provides retrieved media data to media application 112. Media application 112 may be, for example, a web browser, a game engine, or another application that receives and presents media data. Furthermore, presentation engine 114 represents an application that interacts with media application 112 to present retrieved media data in a 3D virtual environment. Presentation engine 114 may, for example, map two-dimensional media data onto a 3D projection. Presentation engine 114 may also receive input from other elements of client device 40 to determine the user's position within the 3D virtual environment and the orientation the user is facing at that position. For example, presentation engine 114 may determine the X, Y, and Z coordinates of the user's position, as well as the orientation in which the user is looking, to determine appropriate media data to display to the user. Moreover, presentation engine 114 may receive camera movement data representing real-world user movement and translate the real-world user movement data into 3D virtual-space movement data.
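The translation from real-world to virtual-space movement is not specified here. One simple possibility is sketched below under that assumption. The tracked real-world displacement is rotated about the vertical axis by a fixed yaw offset between the two coordinate frames, scaled, and added to the current virtual camera position. The parameter names, the choice of Y as the up axis, and the rotation convention are all hypothetical.

```python
import math


def real_to_virtual_delta(real_delta, scale=1.0, yaw_offset_rad=0.0):
    """Rotate an (x, y, z) real-world displacement about the Y (up) axis, then scale it."""
    x, y, z = real_delta
    c, s = math.cos(yaw_offset_rad), math.sin(yaw_offset_rad)
    # Right-handed rotation about Y: x' = c*x + s*z, z' = -s*x + c*z
    return (scale * (c * x + s * z), scale * y, scale * (-s * x + c * z))


def apply_camera_movement(position, real_delta, scale=1.0, yaw_offset_rad=0.0):
    """Return the new virtual camera position after a real-world movement."""
    dx, dy, dz = real_to_virtual_delta(real_delta, scale, yaw_offset_rad)
    px, py, pz = position
    return (px + dx, py + dy, pz + dz)
```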

In accordance with the techniques of this disclosure, eMBMS middleware unit 100 may receive media data (e.g., conforming to glTF 2.0) via broadcast or multicast, and DASH client 110 may then retrieve the media data from eMBMS middleware unit 100. The media data may include a scene description containing camera control information, which indicates how the virtual camera can move through the virtual scene. For example, the scene description may include data describing permitted paths through the virtual scene, e.g., defined paths between anchor points. Additionally or alternatively, the scene description may include data describing a bounding volume, i.e., a volume within which the virtual camera is permitted to move. Additionally or alternatively, the scene description may include data describing one or more solid virtual objects in the 3D virtual environment, such as walls, tables, chairs, or the like. For example, the data of the scene description may define collision boundaries for the 3D virtual objects. The scene description may further include data representing what happens when a collision with such an object occurs, e.g., whether the object is static (as in the case of a wall) or dynamic (as in the case of a chair), such that an animation is played using the object.
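To make the kinds of data listed above concrete, the sketch below parses a glTF-style JSON document that carries anchors, paths, bounding volumes, and per-node colliders. glTF 2.0 allows `extensions` and `extras` on objects, but the extension name `EXAMPLE_camera_control` and every key inside it and inside `collider` are hypothetical. They do not correspond to a published glTF or MPEG scene description extension.

```python
import json
from dataclasses import dataclass, field
from typing import List

# Hypothetical scene: "EXAMPLE_camera_control" and the "collider" extras are illustrative
# names only, not published glTF or MPEG scene description extensions.
EXAMPLE_SCENE = """{
  "asset": {"version": "2.0"},
  "extensionsUsed": ["EXAMPLE_camera_control"],
  "nodes": [
    {"name": "wall", "extras": {"collider": {"shape": "box", "min": [-5, 0, 4], "max": [5, 3, 4.2],
                                             "static": true}}},
    {"name": "chair", "extras": {"collider": {"shape": "box", "min": [1, 0, 1], "max": [1.5, 1, 1.5],
                                              "static": false, "onCollision": "push"}}}
  ],
  "extensions": {
    "EXAMPLE_camera_control": {
      "anchors": [[0, 1.6, 0], [0, 1.6, 3]],
      "paths": [[0, 1]],
      "boundingVolumes": [{"shape": "sphere", "center": [0, 1.6, 0], "radius": 2.0}]
    }
  }
}"""


@dataclass
class CameraControl:
    anchors: List[tuple] = field(default_factory=list)
    paths: List[tuple] = field(default_factory=list)
    bounding_volumes: List[dict] = field(default_factory=list)
    solid_objects: List[dict] = field(default_factory=list)


def parse_camera_control(scene_text: str) -> CameraControl:
    """Extract camera constraints and solid-object colliders from a glTF-style JSON document."""
    scene = json.loads(scene_text)
    ext = scene.get("extensions", {}).get("EXAMPLE_camera_control", {})
    solids = [
        {"node": index, "name": node.get("name"), **node["extras"]["collider"]}
        for index, node in enumerate(scene.get("nodes", []))
        if "collider" in node.get("extras", {})
    ]
    return CameraControl(
        anchors=[tuple(a) for a in ext.get("anchors", [])],
        paths=[tuple(p) for p in ext.get("paths", [])],
        bounding_volumes=ext.get("boundingVolumes", []),
        solid_objects=solids,
    )
```

The parsed anchors, paths, and volumes could be passed to whatever path and volume checks the presentation engine applies. The colliders could feed its collision handling.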

Presentation engine 114 may use the scene description to determine what to present in response to a collision with a 3D virtual object and/or an attempt to leave a permitted path or volume. For example, suppose the scene description includes data for permitted paths or bounding volumes and the user attempts to move beyond them. In that case, presentation engine 114 may simply refrain from updating the display, thereby indicating that such movement is not permitted. As another example, suppose the scene description includes data for a 3D virtual solid object and the user attempts to move through that object. If the object is static, presentation engine 114 may refrain from updating the display. If the object is not static, presentation engine 114 may determine an animation to display for the object, e.g., translational and/or rotational movement to be applied to the object. For example, if the 3D virtual solid object is a chair, the animation data may indicate that the chair is to be pushed along the floor or is to fall over when a collision occurs.
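The behavior described above can be sketched as a movement resolver. The straight-line move from the current camera position to the requested one is tested against each solid object's axis-aligned collision box using the standard slab method. A hit on a static object cancels the move, so the display is simply not updated. A hit on a dynamic object lets the move proceed and reports an animation to play. Box-shaped colliders, the SolidObject fields, and the animation labels are hypothetical. Letting the camera continue when it strikes a dynamic object is only one possible reading of the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class SolidObject:
    name: str
    min_corner: Vec3
    max_corner: Vec3
    static: bool = True
    animation: Optional[str] = None  # hypothetical label such as "push" or "fall"


def segment_hits_box(a: Vec3, b: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test: does the segment from a to b intersect the axis-aligned box [lo, hi]?"""
    t_enter, t_exit = 0.0, 1.0
    for ai, bi, lo_i, hi_i in zip(a, b, lo, hi):
        d = bi - ai
        if d == 0.0:
            if ai < lo_i or ai > hi_i:  # parallel to this slab and outside it
                return False
            continue
        t0, t1 = (lo_i - ai) / d, (hi_i - ai) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter, t_exit = max(t_enter, t0), min(t_exit, t1)
        if t_enter > t_exit:
            return False
    return True


def resolve_move(current: Vec3, requested: Vec3,
                 objects: List[SolidObject]) -> Tuple[Vec3, List[Tuple[str, str]]]:
    """Return the camera position to render and (object name, animation) pairs to play."""
    animations = []
    for obj in objects:
        if not segment_hits_box(current, requested, obj.min_corner, obj.max_corner):
            continue
        if obj.static:
            return current, []  # blocked: keep the previous view unchanged
        if obj.animation:
            animations.append((obj.name, obj.animation))
    return requested, animations
```

Testing the whole movement segment, rather than only the requested end point, keeps a large step from skipping over a thin object such as a wall.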

FIG. 3 is a conceptual diagram illustrating elements of example multimedia content 120. Multimedia content 120 may correspond to multimedia content 64 (FIG. 1) or to other multimedia content stored in storage medium 62. In the example of FIG. 3, multimedia content 120 includes media presentation description (MPD) 122 and a plurality of representations 124A-124N (representations 124). Representation 124A includes optional header data 126 and segments 128A-128N (segments 128), while representation 124N includes optional header data 130 and segments 132A-132N (segments 132). For convenience, the letter N is used to designate the last movie fragment in each of representations 124. In some examples, different representations 124 may contain different numbers of movie fragments.

MPD 122 may comprise a data structure separate from representations 124. MPD 122 may correspond to manifest file 66 of FIG. 1. Likewise, representations 124 may correspond to representations 68 of FIG. 1. In general, MPD 122 may include data that generally describes characteristics of representations 124. Examples include coding and rendering characteristics, adaptation sets, a profile to which MPD 122 corresponds, text type information, camera angle information, rating information, trick mode information (e.g., information indicating representations that include temporal sub-sequences), and/or information for retrieving remote periods (e.g., for inserting targeted advertisements into media content during playback).

Header data 126, when present, may describe characteristics of segments 128. Examples include temporal locations of random access points (RAPs, also referred to as stream access points (SAPs)), which of segments 128 include random access points, byte offsets to random access points within segments 128, uniform resource locators (URLs) of segments 128, or other aspects of segments 128. Header data 130, when present, may describe similar characteristics of segments 132. Additionally or alternatively, such characteristics may be fully included within MPD 122.

Segments 128, 132 include one or more coded video samples, each of which may include a frame or slice of video data. Each of the coded video samples of segments 128 may have similar characteristics, e.g., height, width, and bandwidth requirements. Such characteristics may be described by data of MPD 122, although such data is not illustrated in the example of FIG. 3. MPD 122 may include characteristics as described by the 3GPP Specification, with the addition of any or all of the signaled information described in this disclosure.

Each of segments 128, 132 may be associated with a unique uniform resource locator (URL). Thus, each of segments 128, 132 may be separately retrievable using a streaming network protocol, such as DASH. In this manner, a destination device, such as client device 40, may use an HTTP GET request to retrieve segments 128 or 132. In some examples, client device 40 may use HTTP partial GET requests to retrieve specific byte ranges of segments 128 or 132.

FIG. 4 is a block diagram illustrating elements of an example video file 150, which may correspond to a segment of a representation, such as one of segments 128, 132 of FIG. 3. Each of segments 128, 132 may include data that conforms substantially to the arrangement of data illustrated in the example of FIG. 4. Video file 150 may be said to encapsulate a segment. As described above, video files in accordance with the ISO base media file format and its extensions store data in a series of objects referred to as "boxes." In the example of FIG. 4, video file 150 includes file type (FTYP) box 152, movie (MOOV) box 154, segment index (sidx) box 162, movie fragment (MOOF) boxes 164, and movie fragment random access (MFRA) box 166. Although FIG. 4 represents an example of a video file, it should be understood that other media files may include other types of media data (e.g., audio data, timed text data, or the like). Such data may be structured similarly to the data of video file 150, in accordance with the ISO base media file format and its extensions.
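The box structure described above can be walked with a short parser. Every box starts with a 32-bit big-endian size and a four-character type. A size of 1 means a 64-bit size follows the type, and a size of 0 means the box runs to the end of its container. This is a minimal sketch with a hypothetical function name. It does not interpret box contents, and for 'uuid' boxes the 16-byte extended type is left at the start of the returned payload.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # 64-bit largesize follows the type field
            if pos + 16 > end:
                raise ValueError(f"truncated largesize box at offset {pos}")
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # box extends to the end of the enclosing data
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed box at offset {pos}")
        yield raw_type.decode("latin-1"), pos + header, pos + size
        pos += size
```

To descend into a container box such as MOOV, call `iter_boxes(data, payload_start, box_end)` with the values yielded for that box.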

File type (FTYP) box 152 generally describes a file type for video file 150. File type box 152 may include data that identifies a specification describing a best use for video file 150. File type box 152 may alternatively be placed before MOOV box 154, movie fragment boxes 164, and/or MFRA box 166.

In some examples, a segment, such as video file 150, may include an MPD update box (not shown) before FTYP box 152. The MPD update box may include information indicating that an MPD corresponding to a representation that includes video file 150 is to be updated, along with information for updating the MPD. For example, the MPD update box may provide a URI or URL for a resource to be used to update the MPD. As another example, the MPD update box may include data for updating the MPD. In some examples, the MPD update box may immediately follow a segment type (STYP) box (not shown) of video file 150, where the STYP box may define a segment type for video file 150.

In the example of FIG. 4, MOOV box 154 includes movie header (MVHD) box 156, track (TRAK) box 158, and one or more movie extends (MVEX) boxes 160. In general, MVHD box 156 may describe general characteristics of video file 150. For example, MVHD box 156 may include data describing when video file 150 was originally created, when video file 150 was last modified, a timescale for video file 150, a duration of playback for video file 150, or other data that generally describes video file 150.

TRAK box 158 may include data for a track of video file 150. TRAK box 158 may include a track header (TKHD) box that describes characteristics of the track corresponding to TRAK box 158. In some examples, TRAK box 158 may include coded video pictures. In other examples, the coded video pictures of the track may be included in movie fragments 164, which may be referenced by data of TRAK box 158 and/or sidx box 162.

In some examples, video file 150 may include more than one track. Accordingly, MOOV box 154 may include a number of TRAK boxes equal to the number of tracks in video file 150. TRAK box 158 may describe characteristics of a corresponding track of video file 150. For example, TRAK box 158 may describe temporal and/or spatial information for the corresponding track. When encapsulation unit 30 (FIG. 1) includes a parameter set track in a video file, such as video file 150, a TRAK box similar to TRAK box 158 of MOOV box 154 may describe characteristics of the parameter set track. Encapsulation unit 30 may signal the presence of sequence level SEI messages in the parameter set track within the TRAK box that describes the parameter set track.

MVEX boxes 160 may describe characteristics of corresponding movie fragments 164, e.g., to signal that video file 150 includes movie fragments 164 in addition to any video data included within MOOV box 154. In the context of streaming video data, coded video pictures may be included in movie fragments 164 rather than in MOOV box 154. Accordingly, all coded video samples may be included in movie fragments 164 rather than in MOOV box 154.

MOOV box 154 may include a number of MVEX boxes 160 equal to the number of movie fragments 164 in video file 150. Each of MVEX boxes 160 may describe characteristics of a corresponding one of movie fragments 164. For example, each MVEX box may include a movie extends header (MEHD) box that describes a temporal duration for the corresponding one of movie fragments 164.

As noted above, encapsulation unit 30 may store a sequence data set in a video sample that does not include actual coded video data. A video sample may generally correspond to an access unit, which is a representation of a coded picture at a specific time instance. In the context of AVC, the coded picture includes one or more VCL NAL units, which contain the information to construct all the pixels of the access unit, and other associated non-VCL NAL units, such as SEI messages. Accordingly, encapsulation unit 30 may include a sequence data set, which may include sequence level SEI messages, in one of movie fragments 164. Encapsulation unit 30 may further signal the presence of a sequence data set and/or sequence level SEI messages in one of movie fragments 164 within the one of MVEX boxes 160 corresponding to that movie fragment.

SIDX boxes 162 are optional elements of video file 150. That is, video files conforming to the 3GPP file format, or other such file formats, do not necessarily include SIDX boxes 162. In accordance with the example of the 3GPP file format, a SIDX box may be used to identify a sub-segment of a segment (e.g., a segment contained within video file 150). The 3GPP file format defines a sub-segment as "a self-contained set of one or more consecutive movie fragment boxes with corresponding Media Data box(es) and a Media Data Box containing data referenced by a Movie Fragment Box must follow that Movie Fragment box and precede the next Movie Fragment box containing information about the same track." The 3GPP file format also indicates that a SIDX box "contains a sequence of references to subsegments of the (sub)segment documented by the box. The referenced subsegments are contiguous in presentation time. Similarly, the bytes referred to by a Segment Index box are always contiguous within the segment. The referenced size gives the count of the number of bytes in the material referenced."

SIDX boxes 162 generally provide information representative of one or more sub-segments of a segment included in video file 150. For instance, such information may include playback times at which sub-segments begin and/or end, byte offsets for the sub-segments, whether the sub-segments include (e.g., start with) a stream access point (SAP), a type for the SAP (e.g., whether the SAP is an instantaneous decoder refresh (IDR) picture, a clean random access (CRA) picture, a broken link access (BLA) picture, or the like), a position of the SAP (in terms of playback time and/or byte offset) in the sub-segment, and the like.
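To make the byte-offset and timing information concrete, the sketch below parses the body of a segment index box and turns each reference into an absolute byte range and a start time, relying on the contiguity rules quoted above. It assumes the `sidx` field layout of ISO/IEC 14496-12 (version 0 uses 32-bit and version 1 uses 64-bit earliest_presentation_time and first_offset) and, as a simplification, ignores reference_type, so a reference that points to a nested SIDX box is treated like a media reference. The names `SubsegmentRef` and `parse_sidx` are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import List

@dataclass
class SubsegmentRef:
    byte_start: int        # absolute file offset of the first byte of the sub-segment
    byte_size: int         # referenced_size
    start_time: float      # presentation start time, in seconds
    duration: float        # subsegment_duration, in seconds
    starts_with_sap: bool
    sap_type: int          # SAP_type code as defined in ISO/IEC 14496-12

def parse_sidx(payload: bytes, sidx_end: int) -> List[SubsegmentRef]:
    """Parse the body of a 'sidx' box (everything after its 8-byte size/type header).
    sidx_end is the absolute offset of the first byte after the sidx box, which is the
    anchor from which first_offset is measured."""
    version = payload[0]              # followed by 24 bits of flags
    pos = 4
    _reference_id, timescale = struct.unpack_from(">II", payload, pos)
    pos += 8
    fmt = ">II" if version == 0 else ">QQ"
    earliest_pt, first_offset = struct.unpack_from(fmt, payload, pos)
    pos += struct.calcsize(fmt)
    _reserved, reference_count = struct.unpack_from(">HH", payload, pos)
    pos += 4

    refs: List[SubsegmentRef] = []
    offset, time = sidx_end + first_offset, earliest_pt
    for _ in range(reference_count):
        word0, duration, word2 = struct.unpack_from(">III", payload, pos)
        pos += 12
        size = word0 & 0x7FFFFFFF     # low 31 bits; the top bit (reference_type) is ignored
        refs.append(SubsegmentRef(offset, size, time / timescale, duration / timescale,
                                  bool(word2 >> 31), (word2 >> 28) & 0x7))
        offset += size                # referenced bytes are contiguous within the segment
        time += duration              # referenced sub-segments are contiguous in time
    return refs

# Example: version 0, timescale 1000, two 2-second sub-segments; only the first starts with a SAP.
example = (struct.pack(">B3xIIIIHH", 0, 1, 1000, 0, 0, 0, 2)
           + struct.pack(">III", 500, 2000, (1 << 31) | (1 << 28))
           + struct.pack(">III", 700, 2000, 0))
for r in parse_sidx(example, sidx_end=100):
    print(f"bytes={r.byte_start}-{r.byte_start + r.byte_size - 1}", r.start_time, r.starts_with_sap)
```

The printed `bytes=first-last` strings are in the form a client could place in an HTTP Range header to fetch a single sub-segment.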

Movie fragments 164 may include one or more coded video pictures. In some examples, movie fragments 164 may include one or more groups of pictures (GOPs), each of which may include a number of coded video pictures, e.g., frames or pictures. In addition, as described above, movie fragments 164 may include sequence data sets in some examples. Each of movie fragments 164 may include a movie fragment header box (MFHD, not shown in FIG. 4). The MFHD box may describe characteristics of the corresponding movie fragment, such as a sequence number for the movie fragment. Movie fragments 164 may be included in order of sequence number in video file 150.

MFRA box 166 may describe random access points within movie fragments 164 of video file 150. This may assist with performing trick modes, such as performing seeks to particular temporal locations (i.e., playback times) within a segment encapsulated by video file 150. MFRA box 166 is generally optional and, in some examples, need not be included in video files. Likewise, a client device, such as client device 40, does not necessarily need to reference MFRA box 166 to correctly decode and display video data of video file 150. MFRA box 166 may include a number of track fragment random access (TFRA) boxes (not shown) equal to the number of tracks of video file 150, or in some examples, equal to the number of media tracks (e.g., non-hint tracks) of video file 150.
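A seek during trick play typically resolves to the last random access point at or before the requested time. The sketch below assumes the TFRA entries of one track have already been parsed into `(time, moof_offset)` pairs sorted by time, in the track's timescale; the function name `find_seek_point` and that tuple layout are hypothetical.

```python
import bisect
from typing import List, Tuple

def find_seek_point(rap_entries: List[Tuple[int, int]], target_time: int) -> Tuple[int, int]:
    """Return the (time, moof_offset) of the last random access point at or before
    target_time; if target_time precedes every entry, return the first entry."""
    if not rap_entries:
        raise ValueError("no random access points available")
    times = [t for t, _ in rap_entries]
    i = bisect.bisect_right(times, target_time) - 1
    return rap_entries[max(i, 0)]

entries = [(0, 1000), (2000, 50000), (4000, 90000)]
print(find_seek_point(entries, 2500))
```

Starting decoding at the returned fragment offset means the first decoded picture is a random access point, so no earlier data is needed.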

In some examples, movie fragments 164 may include one or more stream access points (SAPs), such as IDR pictures. Likewise, MFRA box 166 may provide indications of the locations of the SAPs within video file 150. Accordingly, a temporal sub-sequence of video file 150 may be formed from the SAPs of video file 150. The temporal sub-sequence may also include other pictures, such as P-frames and/or B-frames that depend on the SAPs. Frames and/or slices of the temporal sub-sequence may be arranged within the segments such that frames/slices of the temporal sub-sequence that depend on other frames/slices of the sub-sequence can be properly decoded. For example, in a hierarchical arrangement of data, data used for prediction of other data may also be included in the temporal sub-sequence.

FIG. 5 is a conceptual diagram illustrating an example camera path segment 212 with a bounding volume, in accordance with the techniques of this disclosure. In particular, in 3D scene 200, camera 202 represents a viewpoint from which a user can view portions of 3D scene 200. In this example, path segment 212 is defined between point 204 and point 206. Moreover, a bounding volume is defined by extruding bounding box 208 along path segment 212 to bounding box 210. Thus, in this example, camera 202 is permitted to move within the bounding volume along path segment 212, but is prevented from moving beyond the bounding volume.

A scene description may describe a set of paths along which a camera, such as camera 202, is permitted to move. A path may be described as a set of anchor points, such as points 204 and 206, connected by path segments, such as path segment 212. In some examples, such as the example of FIG. 5, each path segment may be enhanced with a bounding volume that allows some freedom of motion along the path.
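Since a path is a set of anchor points joined by path segments, it can be viewed as a graph. The sketch below, which assumes segments may be traversed in either direction (the description does not say), checks whether the camera can travel from one anchor point to another using only path segments. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

def build_adjacency(num_anchors: int, segments: List[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Treat anchor points as graph nodes and path segments as undirected edges."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(num_anchors)}
    for i, j in segments:
        adj[i].add(j)
        adj[j].add(i)
    return adj

def reachable(adj: Dict[int, Set[int]], start: int, goal: int) -> bool:
    """Breadth-first search over path segments from one anchor point to another."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False

adj = build_adjacency(4, [(0, 1), (1, 2)])   # anchor point 3 is not on any segment
print(reachable(adj, 0, 2), reachable(adj, 0, 3))
```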

The scene camera, and consequently the viewer, may then move freely within the bounding volume along the path segments. Path segments may be described using more complex geometric forms to allow finer control over the path.

Furthermore, camera parameters may be constrained at each point along the path. The parameters may be provided for every anchor point and then used together with an interpolation function to calculate the corresponding parameters for every point along a path segment. The interpolation function may be applied to all parameters, including the bounding volume.
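The description does not fix a particular interpolation function. As one possibility, the sketch below assumes linear interpolation, with t running from 0 at the first anchor point to 1 at the second. The parameter names are hypothetical examples: `yfov` mirrors the glTF perspective camera field of view in radians, and `bv_radius` stands in for a bounding-volume radius.

```python
from typing import Dict

def interpolate_params(p0: Dict[str, float], p1: Dict[str, float], t: float) -> Dict[str, float]:
    """Linearly interpolate every camera parameter between the two anchor points of a
    path segment; t = 0 at the first anchor point and t = 1 at the second."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if p0.keys() != p1.keys():
        raise ValueError("both anchor points must carry the same parameters")
    return {name: p0[name] + (p1[name] - p0[name]) * t for name in p0}

start = {"yfov": 0.5, "bv_radius": 1.0}
end = {"yfov": 1.0, "bv_radius": 2.0}
print(interpolate_params(start, end, 0.25))  # {'yfov': 0.625, 'bv_radius': 1.25}
```

A smoother function, such as a cubic spline across several anchor points, could be substituted without changing how the result is consumed.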

The camera control extension mechanism of this disclosure may be realized as a glTF 2.0 extension that defines camera control for a scene. The camera control extension may be identified by the "MPEG_camera_control" tag, which may be included in the extensionsUsed element and may be included in the extensionsRequired element for the 3D scene.
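The sketch below shows where the tag would appear in a glTF 2.0 document, together with a loader-side check: a client is expected not to load an asset that lists an extension it does not implement in extensionsRequired. Only `boundingVolume` and `intrinsicParameters` are property names taken from the description of Table 1; their placement and the string value `"BV_CONE"` are assumptions, and `check_required_extensions` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable

def check_required_extensions(gltf: Dict[str, Any], supported: Iterable[str]) -> None:
    """Refuse to load an asset that requires an extension this client does not implement."""
    missing = set(gltf.get("extensionsRequired", [])) - set(supported)
    if missing:
        raise ValueError(f"unsupported required glTF extensions: {sorted(missing)}")

scene_doc = {
    "asset": {"version": "2.0"},
    "extensionsUsed": ["MPEG_camera_control"],
    "extensionsRequired": ["MPEG_camera_control"],
    "cameras": [{
        "type": "perspective",
        "perspective": {"yfov": 0.8, "znear": 0.1},
        # Assumed example contents; the full property set is defined by Table 1.
        "extensions": {"MPEG_camera_control": {"boundingVolume": "BV_CONE",
                                                "intrinsicParameters": False}},
    }],
}

check_required_extensions(scene_doc, supported={"MPEG_camera_control"})  # no error
print(json.dumps(scene_doc, indent=2))
```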

An example "MPEG_camera_control" extension is shown in Table 1 below, and may be defined in the "camera" element of the scene description.
Camera control information may be structured as follows:
- For each anchor point, the (x,y,z) coordinates of the anchor point may be represented using floating-point values.
- For each path segment, the (i,j) indices of the first and second anchor points of the path segment may be represented as integer values.
- For the bounding volume:
  o If boundingVolume is BV_CONE, the radii (r1,r2) of the circles at the first anchor point and the second anchor point may be given.
  o If boundingVolume is BV_FRUSTUM, ((x,y,z)_topleft,w,h) may be given for each anchor point of the path segment.
  o If boundingVolume is BV_SPHERE, the sphere radius r may be given for each anchor point of the path segment.
- If intrinsicParameters is true, the intrinsic parameters object may be modified.
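Following the structure listed above, a client could decode the camera control information into typed records and validate it before use. This sketch assumes the data arrives as flat arrays (for example, read through an accessor), with a fixed number of bounding-volume values per segment derived from the list; that layout, and the names `CameraControl` and `decode_camera_control`, are assumptions rather than the format defined in Table 1.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed number of floats per path segment for each bounding-volume type:
#   BV_CONE:    r1, r2                                      -> 2 values
#   BV_FRUSTUM: (x, y, z)_topleft, w, h at each anchor point -> 10 values
#   BV_SPHERE:  r at each anchor point                       -> 2 values
FLOATS_PER_SEGMENT = {"BV_CONE": 2, "BV_FRUSTUM": 10, "BV_SPHERE": 2}

@dataclass
class CameraControl:
    anchors: List[Tuple[float, float, float]]
    segments: List[Tuple[int, int]]
    bounding_volume: str
    bv_params: List[Tuple[float, ...]]   # one tuple per segment

def decode_camera_control(anchor_vals: Sequence[float], segment_vals: Sequence[int],
                          bounding_volume: str, bv_vals: Sequence[float]) -> CameraControl:
    if len(anchor_vals) % 3 != 0:
        raise ValueError("anchor data must consist of (x, y, z) triples")
    anchors = [(float(anchor_vals[k]), float(anchor_vals[k + 1]), float(anchor_vals[k + 2]))
               for k in range(0, len(anchor_vals), 3)]
    if len(segment_vals) % 2 != 0:
        raise ValueError("segment data must consist of (i, j) index pairs")
    segments = [(int(segment_vals[k]), int(segment_vals[k + 1]))
                for k in range(0, len(segment_vals), 2)]
    for i, j in segments:
        if not (0 <= i < len(anchors) and 0 <= j < len(anchors)):
            raise ValueError(f"segment ({i}, {j}) refers to a missing anchor point")
    n = FLOATS_PER_SEGMENT.get(bounding_volume)
    if n is None:
        raise ValueError(f"unknown bounding volume type {bounding_volume!r}")
    if len(bv_vals) != n * len(segments):
        raise ValueError("bounding-volume data does not match the number of segments")
    bv_params = [tuple(float(v) for v in bv_vals[k:k + n]) for k in range(0, len(bv_vals), n)]
    return CameraControl(anchors, segments, bounding_volume, bv_params)

cc = decode_camera_control([0, 0, 0, 10, 0, 0], [0, 1], "BV_CONE", [1.0, 2.0])
print(cc)
```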

A presentation engine (e.g., presentation engine 114 of FIG. 2, or another element of client device 40 that may be separate from the components shown in FIGS. 1 and 2) may support the MPEG_camera_control extension or another such data structure. When a scene provides camera control information, the presentation engine may restrict camera movement to the indicated paths, such that the (x,y,z) coordinates of the camera always lie on a path segment or within the bounding volume of a path segment. The presentation engine may provide visual, audio, and/or haptic feedback to the viewer as the viewer approaches the boundary of the bounding volume.
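The restriction can be sketched as follows: project the requested camera position onto each path segment, interpolate the segment's radius at that point, and accept the move only if the position is inside at least one segment's volume. This is a simplification that treats BV_CONE and BV_SPHERE alike as a radius varying linearly between the two anchor points (clamping to the segment ends adds rounded caps) and does not handle BV_FRUSTUM. Rejecting the move, rather than sliding along the boundary, is a design choice of the sketch. All names are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Segment = Tuple[Vec3, Vec3, float, float]   # (anchor_a, anchor_b, radius_at_a, radius_at_b)

def closest_on_segment(a: Vec3, b: Vec3, p: Vec3) -> Tuple[float, Vec3]:
    """Return (t, q) where q = a + t*(b - a) is the point of segment ab nearest p, t in [0, 1]."""
    ab = [b[k] - a[k] for k in range(3)]
    denom = sum(c * c for c in ab)
    if denom == 0.0:
        return 0.0, a
    t = sum((p[k] - a[k]) * ab[k] for k in range(3)) / denom
    t = min(1.0, max(0.0, t))
    return t, (a[0] + t * ab[0], a[1] + t * ab[1], a[2] + t * ab[2])

def boundary_margin(seg: Segment, p: Vec3) -> float:
    """Distance from p to the boundary of the segment's volume (negative means outside)."""
    a, b, r_a, r_b = seg
    t, q = closest_on_segment(a, b, p)
    return (r_a + (r_b - r_a) * t) - math.dist(p, q)

def constrain_move(path: List[Segment], current: Vec3, requested: Vec3,
                   warn_distance: float = 0.2) -> Tuple[Vec3, bool]:
    """Return (new_position, near_boundary) for a non-empty path. The requested position is
    accepted only if it lies inside the volume of at least one segment; otherwise the camera
    stays where it is. near_boundary can drive visual, audio, or haptic feedback."""
    best = max(boundary_margin(seg, requested) for seg in path)
    if best < 0.0:
        return current, True
    return requested, best < warn_distance

path = [((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), 1.0, 2.0)]   # radius grows from 1 to 2
print(constrain_move(path, (0.0, 0.0, 0.0), (5.0, 1.2, 0.0)))
print(constrain_move(path, (0.0, 0.0, 0.0), (5.0, 1.6, 0.0)))
```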

FIG. 6 is a conceptual diagram illustrating an example virtual object 220, which in this example is a chair. To provide an immersive experience to the viewer, it is important for the viewer to interact appropriately with objects in the scene. The viewer should not be able to walk through solid objects in the scene, such as walls, chairs, tables, or other such solid objects.

FIG. 6 shows a 3D mesh representation of the chair along with its collision boundaries, which are defined as a set of cuboids. An MPEG_mesh_collision extension data structure may be defined to provide a description of the collision boundaries of such a 3D mesh. The extension data structure may be defined on a mesh object as a set of cuboids around the mesh geometry. Table 2 below represents an example set of properties that may be included in such an extension data structure.

The mesh collision information may include cuboid vertex coordinates (x,y,z) for a cuboid boundary, or a sphere center and radius for a spherical boundary. The values may be given as floating-point numbers.

The presentation engine may support the MPEG_mesh_collision extension or another such data structure. The presentation engine may ensure that the camera position (x,y,z) never falls within any of the defined mesh cuboids. Collisions may be signaled to the viewer through visual, audio, and/or haptic feedback. The presentation engine may use the information about the boundaries of the nodes to initialize and configure a 3D physics engine that detects collisions.
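The sketch below keeps the camera out of cuboid and spherical collision boundaries. It simplifies each cuboid to the axis-aligned box spanned by its vertex coordinates, and it samples the straight move from the current to the requested position so that a large step cannot jump over a thin object. A full implementation would more likely delegate this to the 3D physics engine mentioned above. The class and function names, and the sample count, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

Vec3 = Tuple[float, float, float]

@dataclass
class Cuboid:
    vertices: List[Vec3]   # corner coordinates; two opposite corners suffice for this sketch

    def contains(self, p: Vec3) -> bool:
        # Simplification: treat the cuboid as the axis-aligned box spanned by its vertices.
        return all(min(v[k] for v in self.vertices) <= p[k] <= max(v[k] for v in self.vertices)
                   for k in range(3))

@dataclass
class Sphere:
    center: Vec3
    radius: float

    def contains(self, p: Vec3) -> bool:
        return math.dist(p, self.center) <= self.radius

Boundary = Union[Cuboid, Sphere]

def move_camera(boundaries: Sequence[Boundary], current: Vec3, requested: Vec3,
                steps: int = 16) -> Tuple[Vec3, bool]:
    """Return (new_position, collided). The camera stops at the last sampled position along
    the move that lies outside every boundary; collided can trigger viewer feedback."""
    last_free = current
    for s in range(1, steps + 1):
        t = s / steps
        p = tuple(current[k] + (requested[k] - current[k]) * t for k in range(3))
        if any(b.contains(p) for b in boundaries):
            return last_free, True
        last_free = p
    return requested, False

seat = Cuboid([(1.0, 0.0, 1.0), (2.0, 1.0, 2.0)])
print(move_camera([seat], (0.0, 0.5, 1.5), (3.0, 0.5, 1.5)))
```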

FIG. 7 is a flowchart illustrating an example method of retrieving media data in accordance with the techniques of this disclosure. The method of FIG. 7 is explained with respect to client device 40 of FIG. 1 and retrieval unit 52 of FIG. 2. Other such devices may be configured to perform this or a similar method.

Initially, client device 40 may retrieve media data (250). For example, retrieval unit 52 may retrieve media data conforming to glTF 2.0. In some examples, retrieval unit 52 may retrieve the media data directly via unicast, e.g., using DASH. In some examples, a middleware unit of retrieval unit 52, such as eMBMS middleware 100 of FIG. 2, may receive the media data via broadcast or multicast, and a DASH client, e.g., DASH client 110 of FIG. 2, may then retrieve the media data from the middleware unit.

The media data may include a scene description. Accordingly, retrieval unit 52 or another component of client device 40 may extract the scene description from the media data (252). The scene description may be an MPEG scene description that includes camera control data in accordance with the techniques of this disclosure. Retrieval unit 52 may provide the scene description to presentation engine 114. Presentation engine 114 may thus receive the scene description and determine, from the scene description, camera control data for the three-dimensional scene (254). The camera control data may conform to Table 1 above. That is, for example, the camera control data may include one or more anchor points for a camera path, one or more segments between the anchor points of the camera path, a bounding volume such as a cone, frustum, or sphere, intrinsic parameters that may be modified at each anchor point, and/or an accessor that provides the camera control information.

Presentation engine 114 may further determine movement constraints from the camera control data (256). For example, presentation engine 114 may determine two or more anchor points, and permitted paths between the anchor points, from the movement constraints of the camera control data. Additionally or alternatively, presentation engine 114 may determine a bounding volume, such as a cube, sphere, frustum, or cone, from the movement constraints of the camera control data. Presentation engine 114 may use the permitted paths to determine paths along which the virtual camera is permitted to move, and/or may permit the virtual camera to move within the bounding volume but not outside of it. The permitted paths and/or bounding volumes may be defined to ensure that the virtual camera does not pass through 3D solid virtual objects, such as walls. That is, a bounding volume or permitted path may be defined to lie within the space bounded by one or more 3D solid virtual objects, such as walls, floors, ceilings, or other objects in the 3D virtual scene.

Presentation engine 114 may then receive camera movement data (258). For example, presentation engine 114 may receive, from one or more controllers, such as a handheld controller and/or a headset including a display, data representing an orientation of the headset and movement of the headset and/or the virtual camera, such as directional and/or rotational movement. Presentation engine 114 may determine that the camera movement data requests camera movement through a 3D solid virtual object, e.g., beyond the limits of the bounding volume or along a path that is not one of the defined permitted paths (260). In response, presentation engine 114 may prevent the virtual camera from passing through the 3D solid virtual object (262).

In this manner, the method of FIG. 7 represents an example of a method of retrieving media data, including: receiving, by a presentation engine, streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the presentation engine, camera control data for the three-dimensional scene, the camera control data including data defining constraints to prevent a virtual camera from passing through the at least one virtual solid object; receiving, by the presentation engine and from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and, in response to the camera movement data, preventing, by the presentation engine and using the camera control data, the virtual camera from passing through the at least one virtual solid object.

FIG. 8 is a flowchart illustrating an example method of retrieving media data in accordance with the techniques of this disclosure. The method of FIG. 8 is explained with respect to client device 40 of FIG. 1 and retrieval unit 52 of FIG. 2. Other such devices may be configured to perform this or a similar method.

Initially, client device 40 may retrieve media data (280). For example, retrieval unit 52 may retrieve media data conforming to glTF 2.0. In some examples, retrieval unit 52 may retrieve the media data directly via unicast, e.g., using DASH. In some examples, a middleware unit of retrieval unit 52, such as eMBMS middleware 100 of FIG. 2, may receive the media data via broadcast or multicast, and a DASH client, e.g., DASH client 110 of FIG. 2, may then retrieve the media data from the middleware unit.

The media data may include a scene description. Accordingly, retrieval unit 52 or another component of client device 40 may extract the scene description from the media data (282). The scene description may be an MPEG scene description that includes object collision data in accordance with the techniques of this disclosure. Retrieval unit 52 may provide the scene description to presentation engine 114. Presentation engine 114 may thus receive the scene description and determine, from the scene description, object collision data for one or more 3D solid virtual objects (284). The object collision data may conform to Table 2 above. That is, the object collision data may include, for example, boundaries data representing an array of bounding shapes that define the collision boundaries of a mesh (3D virtual solid) object, data indicating whether the object is static (i.e., not movable), a material representing the collision material of the object, and/or an animation to be presented for the object when a collision occurs.

Presentation engine 114 may further determine object collision data from the camera control data (286). For example, presentation engine 114 may determine boundaries representing an array of bounding shapes that define the collision boundaries of a mesh (3D virtual solid) object, data indicating whether the object is static (i.e., not movable), a material representing the collision material of the object, and/or an animation to be presented for the object when a collision occurs. Presentation engine 114 may use the object collision data to determine how to react when a collision with a 3D solid virtual object occurs.

Presentation engine 114 may then receive camera movement data (288). For example, presentation engine 114 may receive, from one or more controllers, such as a handheld controller and/or a headset including a display, data representing an orientation of the headset and movement of the headset and/or the virtual camera, such as directional and/or rotational movement. Presentation engine 114 may determine that the camera movement data requests camera movement into and through a 3D solid virtual object defined by the object collision data (290). In response, presentation engine 114 may prevent the virtual camera from passing through the 3D solid virtual object (292). For example, if the object is static, as indicated by the object collision data, presentation engine 114 may prevent the virtual camera from moving into and through the object. As another example, if the object is not static (e.g., is movable), presentation engine 114 may determine a reaction from the object collision data, e.g., an animation to play for the object if the object is to fall over or move in response to the collision.
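The decision in steps 290 and 292 can be sketched as a small dispatch on the object collision data: the camera is blocked in every case, and a movable object may additionally play its collision animation. The `CollisionInfo` fields, the action strings, and the inclusion of a feedback action are hypothetical and only illustrate the control flow.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CollisionInfo:
    static: bool                      # True if the object cannot be moved
    animation: Optional[str] = None   # hypothetical: animation to play when hit

def handle_collision(obj: CollisionInfo) -> List[str]:
    """Return the actions a presentation engine might take when a requested camera move
    would enter obj."""
    actions = ["block_camera", "feedback"]   # the camera never passes through a solid object
    if not obj.static and obj.animation:
        actions.append(f"play_animation:{obj.animation}")   # e.g., the chair tips over
    return actions

print(handle_collision(CollisionInfo(static=True)))
print(handle_collision(CollisionInfo(static=False, animation="tip_over")))
```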

In this manner, the method of FIG. 8 represents an example of a method of retrieving media data, including: receiving, by a presentation engine, streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the presentation engine, object collision data representing boundaries of the at least one virtual solid object; receiving, by the presentation engine and from a user, camera movement data requesting that a virtual camera move through the at least one virtual solid object; and, in response to the camera movement data, preventing, by the presentation engine and using the object collision data, the virtual camera from passing through the at least one virtual solid object.

Some examples of the techniques of this disclosure are summarized in the following clauses.

Clause 1: A method of retrieving media data, the method comprising: receiving, by a presentation engine, streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the presentation engine, camera control data for the three-dimensional scene, the camera control data including data defining permitted locations for a virtual camera; receiving, by the presentation engine and from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and updating, by the presentation engine and using the camera control data, a location of the virtual camera to ensure that the virtual camera remains within the permitted locations.

Clause 2: The method of clause 1, wherein updating the location of the virtual camera comprises preventing the virtual camera from passing through the at least one virtual solid object.

Clause 3: The method of clause 1, wherein the streamed media data comprises glTF 2.0 media data.

Clause 4: The method of clause 1, wherein receiving the streamed media data comprises requesting the streamed media data from a retrieval unit via an application programming interface (API).

Clause 5: The method of clause 1, wherein the camera control data is included in an MPEG scene description.

Clause 6: The method of clause 1, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing permitted camera movement vectors for the virtual camera, and wherein updating the location of the virtual camera comprises permitting the virtual camera to traverse only the segments between the anchor points.

Clause 7: The method of clause 1, wherein the camera control data includes data defining a bounding volume representing a permitted camera movement volume for the virtual camera, and wherein updating the location of the virtual camera comprises permitting the virtual camera to traverse only the permitted camera movement volume.

Clause 8: The method of clause 7, wherein the data defining the bounding volume comprises data defining at least one of a cone, a frustum, or a sphere.

Clause 9: The method of clause 1, wherein the camera control data is included in an MPEG_camera_control extension.

Clause 10: The method of clause 9, wherein the MPEG_camera_control extension includes one or more of: anchor data representing a number of anchor points for a permitted path of the virtual camera; segment data representing a number of path segments of the permitted path between the anchor points; bounding volume data representing a bounding volume for the virtual camera; intrinsic parameters indicating whether camera parameters are modified at each of the anchor points; or accessor data representing an index of an accessor that provides the camera control data.

Clause 11: The method of clause 1, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 12: The method of clause 1, further comprising determining, from the camera control data, an allowed path for the virtual camera, wherein updating the location of the virtual camera comprises ensuring that the virtual camera moves only along a virtual path that lies within the allowed path defined in the camera control data.

Clause 13: The method of clause 1, wherein the camera control data is included in an MPEG_mesh_collision extension.

Clause 14: A device for retrieving media data, the device comprising: a memory configured to store media data; and one or more processors implemented in circuitry and configured to execute a presentation engine, the presentation engine being configured to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive camera control data for the three-dimensional scene, the camera control data including data defining allowed locations for a virtual camera; receive, from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and update a location of the virtual camera, using the camera control data, to ensure that the virtual camera remains within the allowed locations.

Clause 15: The device of clause 14, wherein the presentation engine is configured to prevent the virtual camera from passing through the at least one virtual solid object.

Clause 16: The device of clause 14, wherein the streamed media data comprises glTF 2.0 media data.

Clause 17: The device of clause 14, wherein the presentation engine is configured to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 18: The device of clause 14, wherein the camera control data is included in an MPEG scene description.

Clause 19: The device of clause 14, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing allowed camera movement vectors for the virtual camera, and wherein, to update the location of the virtual camera, the presentation engine is configured to allow the virtual camera to traverse only the segments between the anchor points.

Clause 20: The device of clause 14, wherein the camera control data includes data defining a bounding volume representing an allowed camera movement volume for the virtual camera, and wherein, to update the location of the virtual camera, the presentation engine is configured to allow the virtual camera to traverse only the allowed camera movement volume.

Clause 21: The device of clause 20, wherein the data defining the bounding volume includes data defining at least one of a cone, a frustum, or a sphere.

Clause 22: The device of clause 14, wherein the camera control data is included in an MPEG_camera_control extension.

Clause 23: The device of clause 22, wherein the MPEG_camera_control extension includes one or more of: anchor data representing a number of anchor points of an allowed path for the virtual camera; segment data representing a number of path segments of the allowed path between the anchor points; bounding volume data representing a bounding volume for the virtual camera; intrinsic parameter data indicating whether camera parameters are modified at each of the anchor points; or accessor data representing an index of an accessor that provides the camera control data.

Clause 24: The device of clause 14, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 25: The device of clause 14, wherein the presentation engine is further configured to determine, from the camera control data, an allowed path for the virtual camera, and wherein, to update the location of the virtual camera, the presentation engine is configured to ensure that the virtual camera moves only along a virtual path that lies within the allowed path defined in the camera control data.

Clause 26: The device of clause 14, wherein the camera control data is included in an MPEG_mesh_collision extension.

Clause 27: A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor executing a presentation engine to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive camera control data for the three-dimensional scene, the camera control data including data defining allowed locations for a virtual camera; receive, from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and update a location of the virtual camera, using the camera control data, to ensure that the virtual camera remains within the allowed locations.

Clause 28: The computer-readable storage medium of clause 27, wherein the instructions that cause the processor to update the location of the virtual camera comprise instructions that cause the processor to prevent the virtual camera from passing through the at least one virtual solid object.

Clause 29: The computer-readable medium of clause 27, wherein the streamed media data comprises glTF 2.0 media data.

Clause 30: The computer-readable medium of clause 27, wherein the instructions that cause the processor to receive the streamed media data comprise instructions that cause the processor to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 31: The computer-readable medium of clause 27, wherein the camera control data is included in an MPEG scene description.

Clause 32: The computer-readable medium of clause 27, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing allowed camera movement vectors for the virtual camera, and wherein the instructions that cause the processor to update the location of the virtual camera comprise instructions that cause the processor to allow the virtual camera to traverse only the segments between the anchor points.

Clause 33: The computer-readable medium of clause 27, wherein the camera control data includes data defining a bounding volume representing an allowed camera movement volume for the virtual camera, and wherein the instructions that cause the processor to update the location of the virtual camera comprise instructions that cause the processor to allow the virtual camera to traverse only the allowed camera movement volume.

Clause 34: The computer-readable medium of clause 33, wherein the data defining the bounding volume includes data defining at least one of a cone, a frustum, or a sphere.

Clause 35: The computer-readable medium of clause 27, wherein the camera control data is included in an MPEG_camera_control extension.

Clause 36: The computer-readable medium of clause 35, wherein the MPEG_camera_control extension includes one or more of: anchor data representing a number of anchor points of an allowed path for the virtual camera; segment data representing a number of path segments of the allowed path between the anchor points; bounding volume data representing a bounding volume for the virtual camera; intrinsic parameter data indicating whether camera parameters are modified at each of the anchor points; or accessor data representing an index of an accessor that provides the camera control data.

Clause 37: The computer-readable medium of clause 27, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 38: The computer-readable medium of clause 27, further comprising instructions that cause the processor to determine, from the camera control data, an allowed path for the virtual camera, wherein the instructions that cause the processor to update the location of the virtual camera comprise instructions that cause the processor to ensure that the virtual camera moves only along a virtual path that lies within the allowed path defined in the camera control data.

Clause 39: The computer-readable medium of clause 27, wherein the camera control data is included in an MPEG_mesh_collision extension.

Clause 40: A device for retrieving media data, the device comprising: means for receiving streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; means for receiving camera control data for the three-dimensional scene, the camera control data including data defining allowed locations for a virtual camera; means for receiving, from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and means for updating a location of the virtual camera, using the camera control data, to ensure that the virtual camera remains within the allowed locations.

Clause 41: A method of retrieving media data, the method comprising: receiving, by a presentation engine, streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the presentation engine, object collision data representing boundaries of the at least one virtual solid object; receiving, by the presentation engine and from a user, camera movement data requesting that a virtual camera move through the at least one virtual solid object; and in response to the camera movement data, updating, by the presentation engine and using the object collision data, a location of the virtual camera to ensure that the virtual camera remains outside the at least one virtual solid object.
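Clause 41 keeps the camera outside solid objects using their collision boundaries. One common way to do this, sketched in Python under the assumption that each boundary is approximated by an axis-aligned bounding box (the clauses themselves speak of boundaries and 3D meshes, not boxes), is to sweep the requested movement segment against each box with the slab method and stop the camera slightly before the first entry point. The function names and the `margin` parameter are hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (minimum corner, maximum corner)


def segment_aabb_entry(p0: Vec3, p1: Vec3, lo: Vec3, hi: Vec3) -> Optional[float]:
    """Slab test: parameter t in [0, 1] where p0->p1 first touches the box, else None."""
    t_enter, t_exit = 0.0, 1.0
    for axis in range(3):
        d = p1[axis] - p0[axis]
        if d == 0.0:
            # Movement is parallel to this pair of slabs; it misses if it starts outside them.
            if p0[axis] < lo[axis] or p0[axis] > hi[axis]:
                return None
            continue
        t_a = (lo[axis] - p0[axis]) / d
        t_b = (hi[axis] - p0[axis]) / d
        if t_a > t_b:
            t_a, t_b = t_b, t_a
        t_enter = max(t_enter, t_a)
        t_exit = min(t_exit, t_b)
        if t_enter > t_exit:
            return None
    return t_enter


def move_camera(current: Vec3, requested: Vec3, boxes: Iterable[Box],
                margin: float = 1e-3) -> Vec3:
    """Move toward `requested`, halting `margin` units before the first solid box.

    Assumes the camera starts outside every box; if it starts inside one,
    the entry parameter is 0 and the camera simply stays where it is.
    """
    length = math.dist(current, requested)
    hit: Optional[float] = None
    for lo, hi in boxes:
        t = segment_aabb_entry(current, requested, lo, hi)
        if t is not None and (hit is None or t < hit):
            hit = t
    if hit is None or length == 0.0:
        return requested
    t = max(0.0, hit - margin / length)
    return tuple(c + t * (r - c) for c, r in zip(current, requested))
```

With a wall spanning x = 4 to x = 5, a request to move from the origin to (10, 0, 0) leaves the camera at roughly x = 3.999, while a request that stops short of the wall is honored unchanged.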

Clause 42: The method of clause 41, wherein updating the location of the virtual camera comprises preventing the virtual camera from passing through the at least one virtual solid object.

Clause 43: The method of clause 41, wherein receiving the object collision data comprises receiving an MPEG_mesh_collision extension.

Clause 44: The method of clause 43, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.
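When the collision boundary is a 3D mesh (clause 44), an engine needs a segment-versus-triangle test. The Python sketch below applies the well-known Möller-Trumbore ray/triangle intersection, restricted to the segment from the current to the requested camera position, over a glTF-style indexed triangle list. It illustrates one possible technique and is not the method prescribed by the MPEG_mesh_collision extension; the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def segment_triangle_t(p0: Vec3, p1: Vec3, v0: Vec3, v1: Vec3, v2: Vec3,
                       eps: float = 1e-12) -> Optional[float]:
    """Möller-Trumbore test on segment p0->p1; returns t in [0, 1] or None."""
    d = _sub(p1, p0)
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    h = _cross(d, e2)
    a = _dot(e1, h)
    if abs(a) < eps:
        return None  # movement is parallel to the triangle's plane
    f = 1.0 / a
    s = _sub(p0, v0)
    u = f * _dot(s, h)  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(s, e1)
    v = f * _dot(d, q)  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = f * _dot(e2, q)  # position of the hit along the segment
    return t if 0.0 <= t <= 1.0 else None


def first_mesh_hit(p0: Vec3, p1: Vec3, positions: Sequence[Vec3],
                   indices: Sequence[int]) -> Optional[float]:
    """Earliest hit of p0->p1 against a triangle list (three indices per triangle)."""
    best: Optional[float] = None
    for k in range(0, len(indices) - 2, 3):
        t = segment_triangle_t(p0, p1, positions[indices[k]],
                               positions[indices[k + 1]], positions[indices[k + 2]])
        if t is not None and (best is None or t < best):
            best = t
    return best
```

A returned t can be used to stop the camera just short of the surface, for example at p0 + (t - ε)(p1 - p0) for a small ε; checking every triangle is linear in mesh size, so a production engine would typically add a spatial acceleration structure.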

Clause 45: The method of clause 44, wherein the MPEG_mesh_collision extension includes data defining at least one of: boundaries of a 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or an animation to be presented in response to the virtual camera contacting the 3D mesh.

Clause 46: The method of clause 41, wherein receiving the object collision data comprises receiving data including one or more of: boundary data representing one or more collision boundaries of the at least one virtual solid object; static data representing whether the at least one virtual solid object is affected by collisions; material data representing how colliding objects interact with the at least one virtual solid object; or animation data representing an animation triggered by a collision with the at least one virtual solid object.
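Clause 46 lists four kinds of collision data: boundary, static, material, and animation data. A hypothetical Python record for them, plus a response function that always blocks the camera (as the clauses require) while reporting whether the object itself should react and which animation to trigger, might look as follows. The property names `boundary`, `static`, `material` and `animation`, and the default values, are assumptions and not the extension's normative schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class CollisionData:
    boundary_mesh: int               # index of the mesh used as the collision boundary
    static: bool                     # True: the object is not affected by collisions
    material: Dict[str, Any]         # interaction properties (hypothetical keys)
    animation: Optional[int] = None  # animation to trigger on contact, if any


@dataclass(frozen=True)
class CollisionResponse:
    block_camera: bool
    move_object: bool
    trigger_animation: Optional[int]


def parse_collision(ext: Dict[str, Any]) -> CollisionData:
    """Read an assumed MPEG_mesh_collision-style JSON object."""
    return CollisionData(
        boundary_mesh=int(ext["boundary"]),
        static=bool(ext.get("static", True)),  # assumed default: immovable
        material=dict(ext.get("material", {})),
        animation=ext.get("animation"),
    )


def respond_to_camera_contact(data: CollisionData) -> CollisionResponse:
    """Decide what happens when the virtual camera touches the object."""
    return CollisionResponse(
        block_camera=True,            # the camera must stay outside solid objects
        move_object=not data.static,  # only non-static objects react physically
        trigger_animation=data.animation,
    )
```

Keeping the response separate from the physics and animation systems lets a presentation engine decide how to act on each flag; the material dictionary is passed through untouched because the clauses do not fix its contents.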

Clause 47: The method of clause 41, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 48: The method of clause 41, wherein the streamed media data comprises glTF 2.0 media data.

Clause 49: The method of clause 41, wherein receiving the streamed media data comprises requesting the streamed media data from a retrieval unit via an application programming interface (API).

Clause 50: The method of clause 41, wherein the object collision data is included in an MPEG scene description.

Clause 51: A device for retrieving media data, the device comprising: a memory configured to store media data; and one or more processors implemented in circuitry and configured to execute a presentation engine, the one or more processors being configured to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive object collision data representing boundaries of the at least one virtual solid object; receive, from a user, camera movement data requesting that a virtual camera move through the at least one virtual solid object; and in response to the camera movement data, update a location of the virtual camera, using the object collision data, to ensure that the virtual camera remains outside the at least one virtual solid object.

Clause 52: The device of clause 51, wherein, to update the location of the virtual camera, the presentation engine is configured to prevent the virtual camera from passing through the at least one virtual solid object.

Clause 53: The device of clause 51, wherein, to receive the object collision data, the presentation engine is configured to receive an MPEG_mesh_collision extension.

Clause 54: The device of clause 53, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.

Clause 55: The device of clause 54, wherein the MPEG_mesh_collision extension includes data defining at least one of: boundaries of a 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or an animation to be presented in response to the virtual camera contacting the 3D mesh.

Clause 56: The device of clause 51, wherein, to receive the object collision data, the presentation engine is configured to receive data including one or more of: boundary data representing one or more collision boundaries of the at least one virtual solid object; static data representing whether the at least one virtual solid object is affected by collisions; material data representing how colliding objects interact with the at least one virtual solid object; or animation data representing an animation triggered by a collision with the at least one virtual solid object.

Clause 57: The device of clause 51, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 58: The device of clause 51, wherein the streamed media data comprises glTF 2.0 media data.

Clause 59: The device of clause 51, wherein, to receive the streamed media data, the presentation engine is configured to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 60: The device of clause 51, wherein the object collision data is included in an MPEG scene description.

Clause 61: A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive object collision data representing boundaries of the at least one virtual solid object; receive, from a user, camera movement data requesting that a virtual camera move through the at least one virtual solid object; and in response to the camera movement data, update a location of the virtual camera, using the object collision data, to ensure that the virtual camera remains outside the at least one virtual solid object.

Clause 62: The computer-readable medium of clause 61, wherein the instructions that cause the processor to update the location of the virtual camera comprise instructions that cause the processor to prevent the virtual camera from passing through the at least one virtual solid object.

Clause 63: The computer-readable medium of clause 61, wherein the instructions that cause the processor to receive the object collision data comprise instructions that cause the processor to receive an MPEG_mesh_collision extension.

Clause 64: The computer-readable medium of clause 63, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.

Clause 65: The computer-readable medium of clause 63, wherein the MPEG_mesh_collision extension includes data defining at least one of: boundaries of a 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or an animation to be presented in response to the virtual camera contacting the 3D mesh.

Clause 66: The computer-readable medium of clause 61, wherein the instructions that cause the processor to receive the object collision data comprise instructions that cause the processor to receive data including one or more of: boundary data representing one or more collision boundaries of the at least one virtual solid object; static data representing whether the at least one virtual solid object is affected by collisions; material data representing how colliding objects interact with the at least one virtual solid object; or animation data representing an animation triggered by a collision with the at least one virtual solid object.

Clause 67: The computer-readable medium of clause 61, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 68: The computer-readable medium of clause 61, wherein the streamed media data comprises glTF 2.0 media data.

Clause 69: The computer-readable medium of clause 61, wherein the instructions that cause the processor to receive the streamed media data comprise instructions that cause the processor to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 70: The computer-readable medium of clause 61, wherein the object collision data is included in an MPEG scene description.

Clause 71: A device for retrieving media data, the device comprising: means for receiving streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; means for receiving object collision data representing boundaries of the at least one virtual solid object; means for receiving, from a user, camera movement data requesting that a virtual camera move through the at least one virtual solid object; and means for updating, in response to the camera movement data, a location of the virtual camera to ensure that the virtual camera remains outside the at least one virtual solid object.

Clause 72: A method of retrieving media data, the method comprising: receiving, by a presentation engine, streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the presentation engine, camera control data for the three-dimensional scene, the camera control data including data defining allowed locations for a virtual camera; receiving, by the presentation engine and from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and updating, by the presentation engine and using the camera control data, a location of the virtual camera to ensure that the virtual camera remains within the allowed locations.

Clause 73: The method of clause 72, wherein updating the location of the virtual camera comprises preventing the virtual camera from passing through the at least one virtual solid object.

Clause 74: The method of either of clauses 72 and 73, wherein the streamed media data comprises glTF 2.0 media data.

Clause 75: The method of any of clauses 72 to 74, wherein receiving the streamed media data comprises requesting the streamed media data from a retrieval unit via an application programming interface (API).

Clause 76: The method of any of clauses 72 to 75, wherein the camera control data is included in an MPEG scene description.

Clause 77: The method of any of clauses 72 to 76, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing allowed camera movement vectors for the virtual camera, and wherein updating the location of the virtual camera comprises allowing the virtual camera to traverse only the segments between the anchor points.

Clause 78: The method of any of clauses 72 to 77, wherein the camera control data includes data defining a bounding volume representing an allowed camera movement volume for the virtual camera, and wherein updating the location of the virtual camera comprises allowing the virtual camera to traverse only the allowed camera movement volume.

Clause 79: The method of clause 78, wherein the data defining the bounding volume includes data defining at least one of a cone, a frustum, or a sphere.

Clause 80: The method of any of clauses 72 to 79, wherein the camera control data is included in an MPEG_camera_control extension.

Clause 81: The method of clause 80, wherein the MPEG_camera_control extension includes one or more of: anchor data representing a number of anchor points of an allowed path for the virtual camera; segment data representing a number of path segments of the allowed path between the anchor points; bounding volume data representing a bounding volume for the virtual camera; intrinsic parameter data indicating whether camera parameters are modified at each of the anchor points; or accessor data representing an index of an accessor that provides the camera control data.

Clause 82: The method of any of clauses 72 to 81, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 83: The method of clause 72, further comprising determining, from the camera control data, an allowed path for the virtual camera, wherein updating the location of the virtual camera comprises ensuring that the virtual camera moves only along a virtual path that lies within the allowed path defined in the camera control data.

Clause 84: The method of any of clauses 72 to 83, wherein the camera control data is included in an MPEG_mesh_collision extension.

Clause 85: A device for retrieving media data, the device comprising: a memory configured to store media data; and one or more processors implemented in circuitry and configured to execute a presentation engine, the presentation engine being configured to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive camera control data for the three-dimensional scene, the camera control data including data defining allowed locations for a virtual camera; receive, from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and update a location of the virtual camera, using the camera control data, to ensure that the virtual camera remains within the allowed locations.

Clause 86: The device of clause 85, wherein the presentation engine is configured to prevent the virtual camera from passing through the at least one virtual solid object.

Clause 87: The device of any of clauses 85 and 86, wherein the streamed media data comprises glTF 2.0 media data.

Clause 88: The device of any of clauses 85 to 87, wherein the presentation engine is configured to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 89: The device of any of clauses 85 to 88, wherein the camera control data is included in an MPEG scene description.

Clause 90: The device of any of clauses 85 to 89, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing allowable camera movement vectors for the virtual camera, and wherein, to update the location of the virtual camera, the presentation engine is configured to allow the virtual camera to traverse only the segments between the anchor points.
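Clause 90 does not say how a presentation engine enforces the "segments only" rule. One plausible approach is to snap each requested camera position to the nearest point on any allowed segment. The sketch below assumes that anchor points are 3D positions and that each segment is a pair of anchor indices. Both representations were chosen for illustration and are not taken from the MPEG_camera_control definition.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def closest_point_on_segment(p: Vec3, a: Vec3, b: Vec3) -> Vec3:
    """Project p onto segment a-b, clamping the result to the end points."""
    ab = _sub(b, a)
    length2 = _dot(ab, ab)
    if length2 == 0.0:  # degenerate segment: both anchors coincide
        return a
    t = max(0.0, min(1.0, _dot(_sub(p, a), ab) / length2))
    return _add(a, _scale(ab, t))


def constrain_to_path(requested: Vec3,
                      anchors: Sequence[Vec3],
                      segments: Sequence[Tuple[int, int]]) -> Vec3:
    """Return the point on the allowed path (union of segments) nearest to requested."""
    best, best_d2 = anchors[0], math.inf
    for i, j in segments:  # each segment is a hypothetical (anchor_i, anchor_j) pair
        q = closest_point_on_segment(requested, anchors[i], anchors[j])
        d = _sub(requested, q)
        d2 = _dot(d, d)
        if d2 < best_d2:
            best, best_d2 = q, d2
    return best


anchors = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 10.0, 0.0)]
segments = [(0, 1), (1, 2)]
snapped = constrain_to_path((5.0, 3.0, 0.0), anchors, segments)
```

A global nearest-point snap can jump between segments that are not connected. An engine that tracks the camera's current segment and allows transitions only at shared anchor points would follow the clause's wording more closely.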

Clause 91: The device of any of clauses 85 to 90, wherein the camera control data includes data defining a bounding volume representing an allowable camera movement volume for the virtual camera, and wherein, to update the location of the virtual camera, the presentation engine is configured to allow the virtual camera to traverse only the allowable camera movement volume.

Clause 92: The device of clause 91, wherein the data defining the bounding volume includes data defining at least one of a cone, a frustum, or a sphere.
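The clause names the bounding-volume shapes but not their parameterization, which is defined in Table 1 outside this section. The sketch below makes two illustrative assumptions: a sphere is given by a center and radius, and a capped cone is given by two axis end points with a radius at each end. A frustum test is omitted. A move is accepted only if the requested position lies inside the volume. Otherwise the camera stays where it is.

```python
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def inside_sphere(p: Vec3, center: Vec3, radius: float) -> bool:
    d = _sub(p, center)
    return _dot(d, d) <= radius * radius


def inside_capped_cone(p: Vec3, base: Vec3, top: Vec3,
                       r_base: float, r_top: float) -> bool:
    """True if p lies in a cone whose radius varies linearly from r_base to r_top."""
    axis = _sub(top, base)
    length2 = _dot(axis, axis)
    if length2 == 0.0:
        return False
    v = _sub(p, base)
    t = _dot(v, axis) / length2  # normalized position along the axis
    if t < 0.0 or t > 1.0:       # beyond either cap
        return False
    radial2 = _dot(v, v) - t * t * length2  # squared distance from the axis
    r = r_base + t * (r_top - r_base)
    return radial2 <= r * r


def update_camera(current: Vec3, requested: Vec3,
                  contains: Callable[[Vec3], bool]) -> Vec3:
    """Accept the requested position only if it stays inside the allowed volume."""
    return requested if contains(requested) else current


in_ball = lambda p: inside_sphere(p, (0.0, 0.0, 0.0), 5.0)
```

Rejecting the whole move is the simplest policy. Clamping the position to the volume boundary would feel smoother, but it needs a closest-point routine for each volume type.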

Clause 93: The device of any of clauses 85 to 92, wherein the camera control data is included in an MPEG_camera_control extension.

Clause 94: The device of clause 93, wherein the MPEG_camera_control extension includes one or more of: anchor data representing a number of anchor points for an allowable path for the virtual camera; segment data representing a number of path segments for the allowable path between the anchor points; bounding volume data representing a bounding volume for the virtual camera; intrinsic parameter data indicating whether camera parameters are modified at each of the anchor points; or accessor data representing an index of an accessor providing the camera control data.
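The sketch below shows how a client might read the fields listed in clause 94 from a glTF camera object. Several details are assumptions made for this example:

- the property names (`anchors`, `segments`, `boundingVolume`, `intrinsicParameters`, `accessor`) and their types;
- the `BV_NONE` default;
- placing the extension on a camera object.

The normative definitions are in Table 1, which is not reproduced in this section.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class CameraControl:
    anchors: int                # number of anchor points on the allowed path
    segments: int               # number of path segments between anchor points
    bounding_volume: str        # assumed enum-like label, e.g. "BV_SPHERE"
    intrinsic_parameters: bool  # True if camera intrinsics change at each anchor
    accessor: int               # index of the glTF accessor holding the control data


def parse_camera_control(camera: Dict[str, Any]) -> Optional[CameraControl]:
    """Return the camera's MPEG_camera_control data (assumed layout), or None if absent."""
    ext = camera.get("extensions", {}).get("MPEG_camera_control")
    if ext is None:
        return None
    cc = CameraControl(
        anchors=int(ext.get("anchors", 0)),
        segments=int(ext.get("segments", 0)),
        bounding_volume=str(ext.get("boundingVolume", "BV_NONE")),
        intrinsic_parameters=bool(ext.get("intrinsicParameters", False)),
        accessor=int(ext["accessor"]),
    )
    if cc.segments > 0 and cc.anchors < 2:
        raise ValueError("a path segment needs at least two anchor points")
    return cc


scene_json = """
{"cameras": [{"type": "perspective",
              "perspective": {"yfov": 0.8, "znear": 0.1},
              "extensions": {"MPEG_camera_control": {
                  "anchors": 3, "segments": 2,
                  "boundingVolume": "BV_NONE",
                  "intrinsicParameters": false,
                  "accessor": 4}}}]}
"""
control = parse_camera_control(json.loads(scene_json)["cameras"][0])
```

In a full client, the `accessor` index would then be resolved through the glTF `accessors` and `bufferViews` arrays to obtain the actual anchor positions and segment definitions.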

Clause 95: The device of any of clauses 85 to 94, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 96: The device of any of clauses 85 to 95, wherein the presentation engine is further configured to determine, from the camera control data, an allowable path for the virtual camera, and wherein, to update the location of the virtual camera, the presentation engine is configured to ensure that the virtual camera moves only along virtual paths that are within the allowable path defined in the camera control data.

Clause 97: The device of any of clauses 85 to 96, wherein the camera control data is included in an MPEG_mesh_collision extension.

Clause 98: A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor executing a presentation engine to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive camera control data for the three-dimensional scene, the camera control data including data defining allowable locations for a virtual camera; receive, from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and update a location of the virtual camera, using the camera control data, to ensure that the virtual camera remains within the allowable locations.

Clause 99: The computer-readable storage medium of clause 98, wherein the instructions that cause the processor to update the location of the virtual camera comprise instructions that cause the processor to prevent the virtual camera from passing through the at least one virtual solid object.

Clause 100: The computer-readable medium of any of clauses 98 and 99, wherein the streamed media data comprises glTF 2.0 media data.

Clause 101: The computer-readable medium of any of clauses 98 to 100, wherein the instructions that cause the processor to receive the streamed media data comprise instructions that cause the processor to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 102: The computer-readable medium of any of clauses 98 to 101, wherein the camera control data is included in an MPEG scene description.

Clause 103: The computer-readable medium of any of clauses 98 to 102, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing allowable camera movement vectors for the virtual camera, and wherein the instructions that cause the processor to update the location of the virtual camera comprise instructions that cause the processor to allow the virtual camera to traverse only the segments between the anchor points.

Clause 104: The computer-readable medium of clause 103, wherein the camera control data includes data defining a bounding volume representing an allowable camera movement volume for the virtual camera, and wherein the instructions that cause the processor to update the location of the virtual camera comprise instructions that cause the processor to allow the virtual camera to traverse only the allowable camera movement volume.

Clause 105: The computer-readable medium of any of clauses 98 to 104, wherein the data defining the bounding volume includes data defining at least one of a cone, a frustum, or a sphere.

Clause 106: The computer-readable medium of clause 105, wherein the camera control data is included in an MPEG_camera_control extension.

Clause 107: The computer-readable medium of any of clauses 98 to 106, wherein the MPEG_camera_control extension includes one or more of: anchor data representing a number of anchor points for an allowable path for the virtual camera; segment data representing a number of path segments for the allowable path between the anchor points; bounding volume data representing a bounding volume for the virtual camera; intrinsic parameter data indicating whether camera parameters are modified at each of the anchor points; or accessor data representing an index of an accessor providing the camera control data.

Clause 108: The computer-readable medium of any of clauses 98 to 107, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 109: The computer-readable medium of any of clauses 98 to 108, further comprising instructions that cause the processor to determine, from the camera control data, an allowable path for the virtual camera, wherein the instructions that cause the processor to update the location of the virtual camera comprise instructions that cause the processor to ensure that the virtual camera moves only along virtual paths that are within the allowable path defined in the camera control data.

Clause 110: The computer-readable medium of any of clauses 98 to 109, wherein the camera control data is included in an MPEG_mesh_collision extension.

Clause 111: A device for retrieving media data, the device comprising: means for receiving streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; means for receiving camera control data for the three-dimensional scene, the camera control data including data defining allowable locations for a virtual camera; means for receiving, from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and means for updating a location of the virtual camera, using the camera control data, to ensure that the virtual camera remains within the allowable locations.

Clause 112: A method of retrieving media data, the method comprising: receiving, by a presentation engine, streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the presentation engine, object collision data representing boundaries of the at least one virtual solid object; receiving, by the presentation engine and from a user, camera movement data requesting that a virtual camera move through the at least one virtual solid object; and in response to the camera movement data, updating, by the presentation engine and using the object collision data, a location of the virtual camera to ensure that the virtual camera remains outside the at least one virtual solid object.
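Clause 112 requires the camera to stay outside solid objects even when the user asks to move through them. The sketch below makes a simplifying assumption: each object's collision boundary is reduced to an axis-aligned bounding box, whereas MPEG_mesh_collision itself carries a 3D mesh. It tests the whole movement segment with the standard slab method rather than only the end point, so a fast move cannot tunnel through a thin wall. The `margin` stand-off distance is a hypothetical tuning parameter.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner); stand-in for a collision mesh


def segment_entry_t(p0: Vec3, p1: Vec3, box: Box) -> Optional[float]:
    """Slab test: smallest t in [0, 1] where p0 + t*(p1 - p0) is inside box, else None."""
    lo, hi = box
    t_enter, t_exit = 0.0, 1.0
    for axis in range(3):
        d = p1[axis] - p0[axis]
        if d == 0.0:
            # Moving parallel to this slab: must already lie between its planes.
            if p0[axis] < lo[axis] or p0[axis] > hi[axis]:
                return None
            continue
        t0 = (lo[axis] - p0[axis]) / d
        t1 = (hi[axis] - p0[axis]) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter = max(t_enter, t0)
        t_exit = min(t_exit, t1)
        if t_enter > t_exit:
            return None
    return t_enter


def move_camera(current: Vec3, requested: Vec3,
                boxes: Sequence[Box], margin: float = 1e-3) -> Vec3:
    """Advance toward requested, stopping just short of the first solid box hit."""
    delta = tuple(r - c for r, c in zip(requested, current))
    length = math.sqrt(sum(x * x for x in delta))
    if length == 0.0:
        return current
    hits = [t for t in (segment_entry_t(current, requested, b) for b in boxes)
            if t is not None]
    if not hits:
        return requested
    t_stop = max(0.0, min(hits) - margin / length)
    return tuple(c + t_stop * d for c, d in zip(current, delta))


wall = ((4.0, -1.0, -1.0), (6.0, 1.0, 1.0))
stopped = move_camera((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), [wall])
```

If the camera starts inside a box, the entry parameter is 0 and the camera is simply held in place. A production engine would more likely push it back out and would test the mesh triangles rather than a box.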

Clause 113: A method comprising a combination of the method of any of clauses 72 to 84 and the method of clause 112.

Clause 114: The method of any of clauses 112 and 113, wherein updating the location of the virtual camera comprises preventing the virtual camera from passing through the at least one virtual solid object.

Clause 115: The method of any of clauses 112 to 114, wherein receiving the object collision data comprises receiving an MPEG_mesh_collision extension.

Clause 116: The method of clause 115, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.

Clause 117: The method of clause 116, wherein the MPEG_mesh_collision extension includes data defining at least one of boundaries of the 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or an animation to be presented in response to the virtual camera contacting the 3D mesh.

Clause 118: The method of any of clauses 112 to 117, wherein receiving the object collision data comprises receiving data including one or more of: boundary data representing one or more collision boundaries of the at least one virtual solid object; static data representing whether the at least one virtual solid object is affected by collisions; material data representing how colliding objects interact with the at least one virtual solid object; or animation data representing an animation triggered by a collision with the at least one virtual solid object.

Clause 119: The method of any of clauses 112 to 118, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 120: The method of any of clauses 112 to 119, wherein the streamed media data comprises glTF 2.0 media data.

Clause 121: The method of any of clauses 112 to 120, wherein receiving the streamed media data comprises requesting the streamed media data from a retrieval unit via an application programming interface (API).

Clause 122: The method of any of clauses 112 to 121, wherein the object collision data is included in an MPEG scene description.

Clause 123: A device for retrieving media data, the device comprising: a memory configured to store media data; and one or more processors implemented in circuitry and configured to execute a presentation engine, the one or more processors being configured to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive object collision data representing boundaries of the at least one virtual solid object; receive, from a user, camera movement data requesting that a virtual camera move through the at least one virtual solid object; and in response to the camera movement data, update a location of the virtual camera, using the object collision data, to ensure that the virtual camera remains outside the at least one virtual solid object.

Clause 124: A device comprising a combination of the device of any of clauses 85 to 97 and the device of clause 123.

Clause 125: The device of any of clauses 123 and 124, wherein, to update the location of the virtual camera, the presentation engine is configured to prevent the virtual camera from passing through the at least one virtual solid object.

Clause 126: The device of any of clauses 123 to 125, wherein, to receive the object collision data, the presentation engine is configured to receive an MPEG_mesh_collision extension.

Clause 127: The device of clause 126, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.

Clause 128: The device of clause 127, wherein the MPEG_mesh_collision extension includes data defining at least one of boundaries of the 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or an animation to be presented in response to the virtual camera contacting the 3D mesh.

Clause 129: The device of any of clauses 123 to 128, wherein, to receive the object collision data, the presentation engine is configured to receive data including one or more of: boundary data representing one or more collision boundaries of the at least one virtual solid object; static data representing whether the at least one virtual solid object is affected by collisions; material data representing how colliding objects interact with the at least one virtual solid object; or animation data representing an animation triggered by a collision with the at least one virtual solid object.

Clause 130: The device of any of clauses 123 to 129, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 131: The device of any of clauses 123 to 130, wherein the streamed media data comprises glTF 2.0 media data.

Clause 132: The device of any of clauses 123 to 131, wherein, to receive the streamed media data, the presentation engine is configured to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 133: The device of any of clauses 123 to 132, wherein the object collision data is included in an MPEG scene description.

Clause 134: A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive object collision data representing boundaries of the at least one virtual solid object; receive, from a user, camera movement data requesting that a virtual camera move through the at least one virtual solid object; and in response to the camera movement data, update a location of the virtual camera, using the object collision data, to ensure that the virtual camera remains outside the at least one virtual solid object.

Clause 135: A computer-readable storage medium comprising a combination of the computer-readable storage medium of any of clauses 98 to 110 and the computer-readable storage medium of clause 134.

Clause 136: The computer-readable medium of any of clauses 134 and 135, wherein the instructions that cause the processor to update the location of the virtual camera comprise instructions that cause the processor to prevent the virtual camera from passing through the at least one virtual solid object.

Clause 137: The computer-readable medium of any of clauses 134 to 136, wherein the instructions that cause the processor to receive the object collision data comprise instructions that cause the processor to receive an MPEG_mesh_collision extension.

Clause 138: The computer-readable medium of any of clauses 134 to 137, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.

Clause 139: The computer-readable medium of any of clauses 134 to 138, wherein the MPEG_mesh_collision extension includes data defining at least one of boundaries of the 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or an animation to be presented in response to the virtual camera contacting the 3D mesh.

Clause 140: The computer-readable medium of any of clauses 134 to 139, wherein the instructions that cause the processor to receive the object collision data comprise instructions that cause the processor to receive data including one or more of: boundary data representing one or more collision boundaries of the at least one virtual solid object; static data representing whether the at least one virtual solid object is affected by collisions; material data representing how colliding objects interact with the at least one virtual solid object; or animation data representing an animation triggered by a collision with the at least one virtual solid object.

Clause 141: The computer-readable medium of any of clauses 134 to 140, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 142: The computer-readable medium of any of clauses 134 to 141, wherein the streamed media data comprises glTF 2.0 media data.

Clause 143: The computer-readable medium of any of clauses 134 to 142, wherein the instructions that cause the processor to receive the streamed media data comprise instructions that cause the processor to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 144: The computer-readable medium of any of clauses 134 to 143, wherein the object collision data is included in an MPEG scene description.

Clause 145: A method of retrieving media data, the method comprising: receiving, by a presentation engine, streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the presentation engine, camera control data for the three-dimensional scene, the camera control data including data defining constraints for preventing a virtual camera from passing through the at least one virtual solid object; receiving, by the presentation engine and from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and in response to the camera movement data, preventing, using the camera control data, the virtual camera from passing through the at least one virtual solid object.

Clause 146: The method of clause 145, wherein the streamed media data comprises glTF 2.0 media data.

Clause 147: The method of any of clauses 145 and 146, wherein receiving the streamed media data comprises requesting the streamed media data from a retrieval unit via an application programming interface (API).

Clause 148: The method of any of clauses 145 to 147, wherein the camera control data is included in an MPEG scene description.

Clause 149: The method of any of clauses 145 to 148, wherein the camera control data is included in an MPEG_camera_control extension.

Clause 150: The method of clause 149, wherein the MPEG_camera_control extension includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing allowable camera movement vectors.

Clause 151: The method of any of clauses 149 and 150, wherein the MPEG_camera_control extension includes data defining a bounding volume representing an allowable camera movement volume.

Clause 152: The method of clause 151, wherein the data defining the bounding volume includes data defining at least one of a cone, a frustum, or a sphere.

Clause 153: The method of any of clauses 149 to 152, wherein the MPEG_camera_control extension conforms to the data of Table 1 above.

Clause 154: The method of any of clauses 149 to 153, wherein the at least one virtual solid object comprises a virtual wall.

Clause 155: The method of any of clauses 149 to 154, wherein preventing the virtual camera from passing through the at least one virtual solid object comprises preventing the virtual camera from moving along a virtual path that goes beyond an allowable path defined in the MPEG_camera_control extension.

Clause 156: The method of any of clauses 145 to 155, wherein the camera control data is included in an MPEG_mesh_collision extension.

Clause 157: The method of clause 156, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.

Clause 158: The method of clause 157, wherein the MPEG_mesh_collision extension includes data defining at least one of boundaries of the 3D mesh, a material for the 3D mesh, or an animation to be presented in response to the virtual camera contacting the 3D mesh.

Clause 159: The method of any of clauses 156 to 158, wherein the MPEG_mesh_collision extension conforms to Table 2 above.

Clause 160: The method of any of clauses 156 to 159, wherein preventing the virtual camera from passing through the at least one virtual solid object comprises using the MPEG_mesh_collision extension to prevent the virtual camera from entering the at least one virtual solid object.

Clause 161: A device for retrieving media data, the device comprising one or more means for performing the method of any of clauses 145 to 160.

Clause 162: The device of clause 161, wherein the one or more means comprise one or more processors implemented in circuitry.

Clause 163: The device of clause 161, wherein the device comprises at least one of an integrated circuit, a microprocessor, or a wireless communication device.

Clause 164: A device for retrieving media data, the device comprising: means for receiving streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; means for receiving camera control data for the three-dimensional scene, the camera control data including data defining constraints for preventing a virtual camera from passing through the at least one virtual solid object; means for receiving, from a user, camera movement data requesting that the virtual camera move through the at least one virtual solid object; and means for using the camera control data, in response to the camera movement data, to prevent the virtual camera from passing through the at least one virtual solid object.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media. They may also include communication media, including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

10 System
20 Content preparation device
22 Audio source
24 Video source
26 Audio encoder
28 Video encoder
30 Encapsulation unit
32 Output interface
40 Client device
42 Audio output
44 Video output
46 Audio decoder
48 Video decoder
50 Decapsulation unit
52 Retrieval unit
54 Network interface
60 Server device
62 Storage medium
64 Multimedia content
66 Manifest file
68 Representations
68A-68N Representations
70 Request processing unit
72 Network interface
74 Network
100 eMBMS middleware unit
102 Proxy server unit
104 Cache
106 eMBMS reception unit
110 DASH client
112 Media application
114 Presentation engine
120 Multimedia content
122 Media presentation description (MPD)
124 Representations
124A Representation
124N Representation
126 Header data
128 Segments
128A-128N Segments
130 Header data
132 Segments
132A-132N Segments
150 Video file
152 File type (FTYP) box
154 Movie (MOOV) box
156 Movie header (MVHD) box
158 Track (TRAK) box
160 Movie extends (MVEX) box
162 Segment index (SIDX) box
164 Movie fragment (MOOF) box
166 Movie fragment random access (MFRA) box
202 Camera

Claims

1. A method of retrieving media data, the method comprising:
receiving, by a presentation engine, streamed media data representing a virtual three-dimensional scene including at least one virtual solid object;
receiving, by the presentation engine, camera control data for the three-dimensional scene, the camera control data including data defining allowed locations for a virtual camera;
receiving, by the presentation engine, camera movement data from a user requesting that the virtual camera move through the at least one virtual solid object; and
updating, by the presentation engine, a location of the virtual camera using the camera control data to ensure that the virtual camera remains within the allowed locations.

2. The method of claim 1, wherein updating the location of the virtual camera comprises preventing the virtual camera from passing through the at least one virtual solid object.

3. The method of claim 1, wherein the streamed media data comprises glTF 2.0 media data.

4. The method of claim 1, wherein receiving the streamed media data comprises requesting the streamed media data from a retrieval unit via an application programming interface (API).

5. The method of claim 1, wherein the camera control data is included in an MPEG scene description.

6. The method of claim 1, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing allowed camera movement vectors for the virtual camera, and wherein updating the location of the virtual camera comprises allowing the virtual camera to traverse only the segments between the anchor points.

7. The method of claim 1, wherein the camera control data includes data defining a bounding volume representing an allowed camera movement volume for the virtual camera, and wherein updating the location of the virtual camera comprises allowing the virtual camera to traverse only the allowed camera movement volume.

8. The method of claim 7, wherein the data defining the bounding volume includes data defining at least one of a cone, a frustum, or a sphere.

9. The method of claim 1, wherein the camera control data is included in an MPEG_camera_control extension.

10. The method of claim 9, wherein the MPEG_camera_control extension includes:
anchor data representing a number of anchor points for an allowed path for the virtual camera;
segment data representing a number of path segments for the allowed path between the anchor points;
bounding volume data representing a bounding volume for the virtual camera;
an intrinsic parameter indicating whether camera parameters are modified at each of the anchor points; and
accessor data representing an index of an accessor that provides the camera control data.

11. The method of claim 1, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

12. The method of claim 1, further comprising determining an allowed path for the virtual camera from the camera control data, wherein updating the location of the virtual camera comprises ensuring that the virtual camera moves only along virtual paths that are within the allowed path defined in the camera control data.

13. The method of claim 1, wherein the camera control data is included in an MPEG_mesh_collision extension.

14. A device for retrieving media data, the device comprising:
a memory configured to store media data; and
one or more processors implemented in circuitry and configured to execute a presentation engine, the presentation engine being configured to:
receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object;
receive camera control data for the three-dimensional scene, the camera control data including data defining allowed locations for a virtual camera;
receive camera movement data from a user requesting that the virtual camera move through the at least one virtual solid object; and
update a location of the virtual camera using the camera control data to ensure that the virtual camera remains within the allowed locations.

15. The device of claim 14, wherein the presentation engine is configured to prevent the virtual camera from passing through the at least one virtual solid object.

16. The device of claim 14, wherein the streamed media data comprises glTF 2.0 media data.

17. The device of claim 14, wherein the presentation engine is configured to request the streamed media data from a retrieval unit via an application programming interface (API).

18. The device of claim 14, wherein the camera control data is included in an MPEG scene description.

19. The device of claim 14, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing allowed camera movement vectors for the virtual camera, and wherein to update the location of the virtual camera, the presentation engine is configured to allow the virtual camera to traverse only the segments between the anchor points.

20. The device of claim 14, wherein the camera control data includes data defining a bounding volume representing an allowed camera movement volume for the virtual camera, and wherein to update the location of the virtual camera, the presentation engine is configured to allow the virtual camera to traverse only the allowed camera movement volume.

21. The device of claim 20, wherein the data defining the bounding volume includes data defining at least one of a cone, a frustum, or a sphere.

22. The device of claim 14, wherein the camera control data is included in an MPEG_camera_control extension.

23. The device of claim 22, wherein the MPEG_camera_control extension includes:
anchor data representing a number of anchor points for an allowed path for the virtual camera;
segment data representing a number of path segments for the allowed path between the anchor points;
bounding volume data representing a bounding volume for the virtual camera;
an intrinsic parameter indicating whether camera parameters are modified at each of the anchor points; and
accessor data representing an index of an accessor that provides the camera control data.

24. The device of claim 14, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

25. The device of claim 14, wherein the presentation engine is further configured to determine an allowed path for the virtual camera from the camera control data, and wherein to update the location of the virtual camera, the presentation engine is configured to ensure that the virtual camera moves only along virtual paths that are within the allowed path defined in the camera control data.

26. The device of claim 14, wherein the camera control data is included in an MPEG_mesh_collision extension.

27. A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor executing a presentation engine to:
receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object;
receive camera control data for the three-dimensional scene, the camera control data including data defining allowed locations for a virtual camera;
receive camera movement data from a user requesting that the virtual camera move through the at least one virtual solid object; and
update a location of the virtual camera using the camera control data to ensure that the virtual camera remains within the allowed locations.

28. The computer-readable storage medium of claim 27, wherein the instructions that cause the processor to update the location of the virtual camera comprise instructions that cause the processor to prevent the virtual camera from passing through the at least one virtual solid object.

29. The computer-readable storage medium of claim 27, wherein the streamed media data comprises glTF 2.0 media data.

30. The computer-readable storage medium of claim 27, wherein the instructions that cause the processor to receive the streamed media data comprise instructions that cause the processor to request the streamed media data from a retrieval unit via an application programming interface (API).

31. The computer-readable storage medium of claim 27, wherein the camera control data is included in an MPEG scene description.

32. The computer-readable storage medium of claim 27, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing allowed camera movement vectors for the virtual camera, and wherein the instructions that cause the processor to update the location of the virtual camera comprise instructions that cause the processor to allow the virtual camera to traverse only the segments between the anchor points.

33. The computer-readable storage medium of claim 27, wherein the camera control data includes data defining a bounding volume representing an allowed camera movement volume for the virtual camera, and wherein the instructions that cause the processor to update the location of the virtual camera comprise instructions that cause the processor to allow the virtual camera to traverse only the allowed camera movement volume.

34. The computer-readable storage medium of claim 33, wherein the data defining the bounding volume includes data defining at least one of a cone, a frustum, or a sphere.

35. The computer-readable storage medium of claim 27, wherein the camera control data is included in an MPEG_camera_control extension.

36. The computer-readable storage medium of claim 35, wherein the MPEG_camera_control extension includes:
anchor data representing a number of anchor points for an allowed path for the virtual camera;
segment data representing a number of path segments for the allowed path between the anchor points;
bounding volume data representing a bounding volume for the virtual camera;
an intrinsic parameter indicating whether camera parameters are modified at each of the anchor points; and
accessor data representing an index of an accessor that provides the camera control data.

37. The computer-readable storage medium of claim 27, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

38. The computer-readable storage medium of claim 27, further comprising instructions that cause the processor to determine an allowed path for the virtual camera from the camera control data, wherein the instructions that cause the processor to update the location of the virtual camera comprise instructions that cause the processor to ensure that the virtual camera moves only along virtual paths that are within the allowed path defined in the camera control data.

39. The computer-readable storage medium of claim 27, wherein the camera control data is included in an MPEG_mesh_collision extension.

40. A device for retrieving media data, the device comprising:
means for receiving streamed media data representing a virtual three-dimensional scene including at least one virtual solid object;
means for receiving camera control data for the three-dimensional scene, the camera control data comprising data defining allowed locations for a virtual camera;
means for receiving camera movement data from a user requesting that the virtual camera move through the at least one virtual solid object; and
means for updating a location of the virtual camera using the camera control data to ensure that the virtual camera remains within the allowed locations.
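The claims describe two ways the camera control data can bound the virtual camera: segments between anchor points, and a bounding volume. The Python sketch below is a minimal illustration under assumed data layouts. It snaps a requested camera position to the nearest point on the allowed segments and clamps a position into a spherical bounding volume, the sphere being one of the bounding-volume shapes the claims name. `CameraPath`, `snap_to_path`, and `clamp_to_sphere` are hypothetical names; the actual MPEG_camera_control syntax is not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def closest_point_on_segment(p: Vec3, a: Vec3, b: Vec3) -> Vec3:
    """Project p onto segment a-b, clamping the parameter t to [0, 1]."""
    ab = _sub(b, a)
    denom = _dot(ab, ab)
    t = 0.0 if denom == 0.0 else max(0.0, min(1.0, _dot(_sub(p, a), ab) / denom))
    return (a[0] + t * ab[0], a[1] + t * ab[1], a[2] + t * ab[2])


@dataclass
class CameraPath:
    """Hypothetical decoded form of anchor-point/segment camera control data."""
    anchors: List[Vec3]
    segments: List[Tuple[int, int]]  # each segment is a pair of anchor indices


def snap_to_path(requested: Vec3, path: CameraPath) -> Vec3:
    """Replace a requested position by the nearest point on any allowed segment."""
    best: Optional[Vec3] = None
    best_d2 = math.inf
    for i, j in path.segments:
        q = closest_point_on_segment(requested, path.anchors[i], path.anchors[j])
        d = _sub(requested, q)
        d2 = _dot(d, d)
        if d2 < best_d2:
            best, best_d2 = q, d2
    if best is None:
        raise ValueError("camera path has no segments")
    return best


def clamp_to_sphere(requested: Vec3, center: Vec3, radius: float) -> Vec3:
    """Keep a requested position inside a spherical bounding volume."""
    d = _sub(requested, center)
    dist = math.sqrt(_dot(d, d))
    if dist <= radius:
        return requested
    s = radius / dist  # scale the offset back onto the sphere surface
    return (center[0] + d[0] * s, center[1] + d[1] * s, center[2] + d[2] * s)
```

For a path with anchors `(0, 0, 0)`, `(4, 0, 0)`, `(4, 0, 4)` joined by segments `(0, 1)` and `(1, 2)`, a request for `(2, 3, 0)` snaps to `(2.0, 0.0, 0.0)`. This sketch snaps or clamps the request; rejecting an out-of-bounds move outright would be an equally valid policy.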