TW202240431A - Object collision data for virtual camera in virtual interactive scene defined by streamed media data - Google Patents


Info

Publication number
TW202240431A
Authority
TW
Taiwan
Prior art keywords
data
virtual
camera
collision
solid object
Prior art date
Application number
TW111108833A
Other languages
Chinese (zh)
Inventor
Imed Bouazizi
Thomas Stockhammer
Original Assignee
Qualcomm Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from U.S. Application No. 17/654,023 (published as US 2022/0292770 A1)
Application filed by Qualcomm Incorporated
Publication of TW202240431A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/238 Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
    • H04N21/2387 Stream processing in response to a playback request from an end-user, e.g. for trick-play
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44012 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/003 Navigation within 3D models or images
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/21805 Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234318 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by decomposing into objects, e.g. MPEG-4 objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/443 OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
    • H04N21/4431 OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB characterized by the use of Application Program Interface [API] libraries
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/65 Transmission of management data between client and server
    • H04N21/658 Transmission by the client directed to the server
    • H04N21/6587 Control parameters, e.g. trick play commands, viewpoint selection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/85406 Content authoring involving a specific file format, e.g. MP4 format
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8541 Content authoring involving branching, e.g. to different story endings

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Processing Or Creating Images (AREA)

Abstract

An example device for retrieving media data includes a memory configured to store media data; and one or more processors implemented in circuitry and configured to execute a presentation engine, the presentation engine being configured to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive object collision data representing boundaries of the at least one virtual solid object; receive camera movement data from a user requesting that the virtual camera move through the at least one virtual solid object; and using the object collision data, prevent the virtual camera from passing through the at least one virtual solid object in response to the camera movement data.

Description

Object collision data for a virtual camera in a virtual interactive scene defined by streamed media data

This application claims the benefit of U.S. Provisional Application No. 63/159,379, filed March 10, 2021, the entire contents of which are hereby incorporated by reference.

This disclosure relates to the storage and transport of encoded video data.

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video teleconferencing devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263 or ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265 (also referred to as High Efficiency Video Coding (HEVC)), and extensions of such standards, to transmit and receive digital video information more efficiently.

Video compression techniques perform spatial prediction and/or temporal prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video frame or slice may be partitioned into macroblocks. Each macroblock can be further partitioned. Macroblocks in an intra-coded (I) frame or slice are encoded using spatial prediction with respect to neighboring macroblocks. Macroblocks in an inter-coded (P or B) frame or slice may use spatial prediction with respect to neighboring macroblocks in the same frame or slice, or temporal prediction with respect to other reference frames.

After video data has been encoded, the video data may be packetized for transmission or storage. The video data may be assembled into a video file conforming to any of a variety of standards, such as the International Organization for Standardization (ISO) base media file format and extensions thereof, such as AVC.

In general, this disclosure describes techniques related to streaming interactive media data. Such interactive media data may be, for example, virtual reality, augmented reality, or other such interactive content, such as other three-dimensional video content. Recent MPEG scene description elements include support for timed media in glTF 2.0. A media access function (MAF) offers an application programming interface (API) to a presentation engine, through which the presentation engine can request timed media. A retrieval unit executing the MAF may process retrieved timed media data and pass the processed media data in an expected format to the presentation engine through circular buffers. The current MPEG scene description allows a user to consume scene media data with six degrees of freedom (6DoF). Thus, the user is generally able to move freely within the 3D scene (e.g., through a wall displayed in the 3D scene). However, a content author may wish to impose restrictions on a viewer's movement in certain areas, for example, to prevent movement through a displayed wall or other object. This disclosure describes techniques for imposing such restrictions, which may improve a user's experience, because the experience may be made more realistic by preventing the user from passing through obstacles in the virtual world.
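The restriction described above, rejecting camera movement that would cross a solid object, can be illustrated with a minimal sketch. The data structures and function names here are invented for illustration; the patent does not prescribe a particular implementation, and real engines typically use more elaborate collision shapes than axis-aligned boxes.

```python
# Illustrative sketch only: names and the AABB representation are assumptions,
# not taken from the patent text or from any MPEG specification.
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class AABB:
    """Axis-aligned bounding box approximating a virtual solid object's boundary."""
    min_corner: Vec3
    max_corner: Vec3

    def contains(self, p: Vec3) -> bool:
        # True if p lies inside the box on every axis.
        return all(lo <= v <= hi
                   for lo, v, hi in zip(self.min_corner, p, self.max_corner))

def apply_camera_move(position: Vec3, requested: Vec3, solids: List[AABB]) -> Vec3:
    """Return the camera's new position, rejecting moves into any solid object."""
    if any(box.contains(requested) for box in solids):
        return position      # blocked: the camera would pass through a solid object
    return requested         # allowed

wall = AABB((0.0, 0.0, 0.0), (10.0, 3.0, 0.2))        # a thin wall in the scene
pos = (5.0, 1.5, -1.0)
pos = apply_camera_move(pos, (5.0, 1.5, 0.1), [wall])  # request crosses the wall
# pos is unchanged: the move was rejected
```

A production engine would also clip the movement to the wall surface rather than discarding it entirely, but the accept/reject test above is the core of the behavior the claims describe.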

In one example, a method of retrieving media data includes: receiving, by a presentation engine, streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the presentation engine, camera control data for the three-dimensional scene, the camera control data including data defining restrictions to prevent a virtual camera from passing through the at least one virtual solid object; receiving, by the presentation engine, camera movement data from a user requesting that the virtual camera move through the at least one virtual solid object; and using the camera control data, preventing, by the presentation engine, the virtual camera from passing through the at least one virtual solid object in response to the camera movement data.

In another example, a device for retrieving media data includes: a memory configured to store media data; and one or more processors implemented in circuitry and configured to execute a presentation engine, the presentation engine being configured to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive camera control data for the three-dimensional scene, the camera control data including data defining restrictions to prevent a virtual camera from passing through the at least one virtual solid object; receive camera movement data from a user requesting that the virtual camera move through the at least one virtual solid object; and using the camera control data, prevent the virtual camera from passing through the at least one virtual solid object in response to the camera movement data.

In another example, a computer-readable storage medium has stored thereon instructions that, when executed, cause a processor of a client device to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive camera control data for the three-dimensional scene, the camera control data including data defining restrictions to prevent a virtual camera from passing through the at least one virtual solid object; receive camera movement data from a user requesting that the virtual camera move through the at least one virtual solid object; and using the camera control data, prevent the virtual camera from passing through the at least one virtual solid object in response to the camera movement data.

In another example, a device for retrieving media data includes: means for receiving streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; means for receiving camera control data for the three-dimensional scene, the camera control data including data defining restrictions to prevent a virtual camera from passing through the at least one virtual solid object; means for receiving camera movement data from a user requesting that the virtual camera move through the at least one virtual solid object; and means for preventing, using the camera control data, the virtual camera from passing through the at least one virtual solid object in response to the camera movement data.

In another example, a method of retrieving media data includes: receiving, by a presentation engine, streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the presentation engine, object collision data representing boundaries of the at least one virtual solid object; receiving, by the presentation engine, camera movement data from a user requesting that a virtual camera move through the at least one virtual solid object; and using the object collision data, preventing, by the presentation engine, the virtual camera from passing through the at least one virtual solid object in response to the camera movement data.

In another example, a device for retrieving media data includes: a memory configured to store media data; and one or more processors implemented in circuitry and configured to execute a presentation engine, the presentation engine being configured to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive object collision data representing boundaries of the at least one virtual solid object; receive camera movement data from a user requesting that a virtual camera move through the at least one virtual solid object; and using the object collision data, prevent the virtual camera from passing through the at least one virtual solid object in response to the camera movement data.

In another example, a computer-readable storage medium has stored thereon instructions that, when executed, cause a processor of a client device to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive object collision data representing boundaries of the at least one virtual solid object; receive camera movement data from a user requesting that a virtual camera move through the at least one virtual solid object; and using the object collision data, prevent the virtual camera from passing through the at least one virtual solid object in response to the camera movement data.

In another example, a device for retrieving media data includes: means for receiving streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; means for receiving object collision data representing boundaries of the at least one virtual solid object; means for receiving camera movement data from a user requesting that a virtual camera move through the at least one virtual solid object; and means for preventing, using the object collision data, the virtual camera from passing through the at least one virtual solid object in response to the camera movement data.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

Interactive media data may be streamed over a network. For example, a client device may retrieve interactive media data using unicast, broadcast, multicast, or the like. The interactive media data may be, for example, three-dimensional (3D) media data for extended reality (XR), augmented reality (AR), virtual reality (VR), and the like. Thus, when the media data is presented to a user, the user can navigate a 3D virtual scene rendered from the interactive media data.

An MPEG scene description may describe a three-dimensional (3D) scene for a virtual world or experience, e.g., for an XR, VR, AR, or other interactive media experience. According to the techniques of this disclosure, an MPEG scene description may describe objects within the 3D scene, such as chairs, walls, tables, counters, doors, windows, or other solid objects. This disclosure describes techniques by which an MPEG scene description (or other such set of descriptive data) may be enhanced to impose restrictions on movements of a virtual camera, e.g., to prevent the camera from passing through solid objects, such as walls.

In particular, the scene description may describe a set of paths along which camera movement is allowed. A path may be described as a set of anchor points connected by path segments. To enhance the expressiveness of the camera control, each path segment may be enhanced with a bounding volume that allows for a certain freedom of movement along the path.
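The anchor-point model above can be sketched with a simple geometric test: a camera position is acceptable if it lies within a bounding radius of some segment joining consecutive anchor points. The cylindrical bounding volume and all names here are illustrative assumptions; the patent leaves the concrete volume shape open.

```python
# Illustrative sketch: camera movement restricted to within `radius` of a
# polyline of anchor points. Names and the cylinder-like volume are assumed,
# not taken from the patent text or any MPEG specification.
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def dist_point_segment(p: Vec3, a: Vec3, b: Vec3) -> float:
    """Distance from point p to the line segment a-b in 3D."""
    ab = tuple(bi - ai for ai, bi in zip(a, b))
    ap = tuple(pi - ai for ai, pi in zip(a, p))
    denom = sum(c * c for c in ab) or 1e-12          # guard degenerate segment
    t = max(0.0, min(1.0, sum(u * v for u, v in zip(ap, ab)) / denom))
    closest = tuple(ai + t * abi for ai, abi in zip(a, ab))
    return math.dist(p, closest)

def position_allowed(p: Vec3, anchors: List[Vec3], radius: float) -> bool:
    """True if p lies within `radius` of any segment between consecutive anchors."""
    return any(dist_point_segment(p, a, b) <= radius
               for a, b in zip(anchors, anchors[1:]))

path = [(0, 0, 0), (10, 0, 0), (10, 0, 10)]          # two connected segments
position_allowed((5, 0, 0.4), path, radius=0.5)      # near the first segment
```

Widening the radius per segment gives each segment its own bounding volume, which is the "certain freedom of movement along the path" that the scene description is said to express.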

Additionally or alternatively, the scene description may describe virtual solid objects in the scene. The scene description may provide information representing, for example, boundaries of the objects, whether an object may be affected by a collision with the user or another object (e.g., whether the object will move or remain stationary in response to such a collision), a material for the object representing how colliding objects interact with the object, and/or animation data representing an animation to be played or applied to the object in response to a collision.
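Since glTF 2.0 scene descriptions carry per-object metadata through JSON extensions, the collision properties listed above could plausibly be attached that way. The extension name and every field below are invented for illustration; the patent text does not define a concrete syntax.

```python
# Hypothetical illustration only: a glTF-style extension carrying the
# per-object collision properties described above. "EXT_collision_hypothetical"
# and its fields are invented names, not part of glTF or any MPEG document.
import json

scene_object = {
    "name": "wall_01",
    "extensions": {
        "EXT_collision_hypothetical": {
            "boundary": {"type": "box",          # object boundary for collisions
                         "min": [0, 0, 0],
                         "max": [10, 3, 0.2]},
            "static": True,                      # stays put when collided with
            "material": "concrete",              # how colliding objects interact
            "onCollisionAnimation": 3            # animation index to play on impact
        }
    }
}

print(json.dumps(scene_object["extensions"], indent=2))
```

A presentation engine parsing such a description would read the boundary into its collision system, treat `static` objects as immovable, and trigger the referenced animation when a collision is detected.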

The techniques of this disclosure may be applied to video files conforming to video data encapsulated according to any of the ISO base media file format, the Scalable Video Coding (SVC) file format, the Advanced Video Coding (AVC) file format, the Third Generation Partnership Project (3GPP) file format, and/or the Multiview Video Coding (MVC) file format, or other similar video file formats.

In HTTP streaming, frequently used operations include HEAD, GET, and partial GET. The HEAD operation retrieves a header of a file associated with a given uniform resource locator (URL) or uniform resource name (URN), without retrieving a payload associated with the URL or URN. The GET operation retrieves a whole file associated with a given URL or URN. The partial GET operation receives a byte range as an input parameter and retrieves a continuous number of bytes of a file, where the number of bytes corresponds to the received byte range. Thus, movie fragments may be provided for HTTP streaming, because a partial GET operation can obtain one or more individual movie fragments. Within a movie fragment, there can be several track fragments of different tracks. In HTTP streaming, a media presentation may be a structured collection of data that is accessible to a client. The client may request and download media data information to present a streaming service to a user.
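The HEAD and partial GET operations described above map directly onto standard HTTP requests. A small sketch using only Python's standard library (the URL is a placeholder, and merely building a request sends nothing over the network):

```python
# Sketch of the HTTP streaming operations described above. example.com is a
# placeholder; no network traffic occurs until urlopen() is actually called.
import urllib.request

def build_head_request(url: str) -> urllib.request.Request:
    """HEAD: fetch only the headers of the file at `url`, not its payload."""
    return urllib.request.Request(url, method="HEAD")

def build_range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Partial GET: fetch only bytes [start, end], e.g. one movie fragment."""
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})

req = build_range_request("https://example.com/movie.mp4", 0, 499)
# Sending it would return just the first 500 bytes of the file:
#   with urllib.request.urlopen(req) as resp:
#       fragment = resp.read()
```

A server that honors the `Range` header answers with status 206 (Partial Content), which is what lets a client fetch individual movie fragments without downloading the whole file.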

In the example of streaming 3GPP data using HTTP streaming, there may be multiple representations for video and/or audio data of multimedia content. As explained below, different representations may correspond to different coding characteristics (e.g., different profiles or levels of a video coding standard), different coding standards or extensions of coding standards (such as multiview and/or scalable extensions), or different bitrates. The manifest of such representations may be defined in a media presentation description (MPD) data structure. A media presentation may correspond to a structured collection of data that is accessible to an HTTP streaming client device. The HTTP streaming client device may request and download media data information to present a streaming service to a user of the client device. A media presentation may be described in the MPD data structure, which may include updates of the MPD.

A media presentation may contain a sequence of one or more periods. Each period may extend until the start of the next period, or until the end of the media presentation, in the case of the last period. Each period may contain one or more representations for the same media content. A representation may be one of a number of alternative encoded versions of audio, video, timed text, or other such data. The representations may differ by encoding types, e.g., by bitrate, resolution, and/or codec for video data, and bitrate, language, and/or codec for audio data. The term representation may be used to refer to a section of encoded audio or video data corresponding to a particular period of the multimedia content and encoded in a particular way.

Representations of a particular period may be assigned to a group indicated by an attribute in the MPD indicative of an adaptation set to which the representations belong. Representations in the same adaptation set are generally considered alternatives to each other, in that a client device can dynamically and seamlessly switch between these representations, e.g., to perform bandwidth adaptation. For example, each representation of video data for a particular period may be assigned to the same adaptation set, such that any of the representations may be selected for decoding to present media data, such as video data or audio data, of the multimedia content for the corresponding period. In some examples, the media content within one period may be represented by either one representation from group 0, if present, or the combination of at most one representation from each non-zero group. Timing data for each representation of a period may be expressed relative to the start time of the period.
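The bandwidth adaptation mentioned above can be sketched as a simple rate-selection rule over one adaptation set: pick the highest-bitrate representation that fits the measured bandwidth, falling back to the lowest otherwise. The representation IDs and bitrate numbers are invented for illustration; real DASH clients use considerably more elaborate throughput estimation.

```python
# Simplified, illustrative stand-in for DASH rate adaptation within one
# adaptation set. IDs and bitrates below are invented example values.
from typing import Dict, List

def select_representation(representations: List[Dict], available_bps: int) -> str:
    """Choose the highest-bitrate representation not exceeding the available bandwidth."""
    fitting = [r for r in representations if r["bandwidth"] <= available_bps]
    if fitting:
        chosen = max(fitting, key=lambda r: r["bandwidth"])
    else:
        # nothing fits: degrade to the cheapest representation rather than stall
        chosen = min(representations, key=lambda r: r["bandwidth"])
    return chosen["id"]

video_set = [
    {"id": "v-240p",  "bandwidth":   400_000},
    {"id": "v-720p",  "bandwidth": 2_500_000},
    {"id": "v-1080p", "bandwidth": 5_000_000},
]
select_representation(video_set, 3_000_000)   # → "v-720p"
```

Because every representation in the set carries the same content for the same period, the client can switch between them at segment boundaries without any visible discontinuity.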

A representation may include one or more segments. Each representation may include an initialization segment, or each segment of a representation may be self-initializing. When present, the initialization segment may contain initialization information for accessing the representation. In general, the initialization segment does not contain media data. A segment may be uniquely referenced by an identifier, such as a uniform resource locator (URL), uniform resource name (URN), or uniform resource identifier (URI). The MPD may provide the identifiers for each segment. In some examples, the MPD may also provide byte ranges in the form of a range attribute, which may correspond to the data for a segment within a file accessible by the URL, URN, or URI.
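The segment identifier plus optional byte range described above is exactly the information a client needs to issue a GET or partial GET for one segment. A small sketch (the dictionary keys mimic MPD attribute names but are illustrative, not a parser for real MPD XML):

```python
# Illustrative mapping from an MPD-style segment entry to request parameters.
# The dict keys ("media", "range") are modeled on MPD attribute names but this
# is not a real MPD parser.
from typing import Dict, Optional, Tuple

def segment_request_info(base_url: str, segment: Dict) -> Tuple[str, Optional[str]]:
    """Resolve a segment's URL and its optional byte range for a (partial) GET."""
    url = base_url + segment["media"]
    byte_range = segment.get("range")   # e.g. "0-499"; None means fetch the whole file
    return url, byte_range

# A segment addressed by URL plus byte range within a larger file:
info = segment_request_info("https://example.com/",
                            {"media": "seg1.m4s", "range": "0-499"})
```

When the returned byte range is not `None`, the client would send it as a `Range: bytes=...` header; otherwise a plain GET retrieves the whole segment file.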

Different representations may be selected for substantially simultaneous retrieval of different types of media data. For example, a client device may select an audio representation, a video representation, and a timed text representation from which to retrieve segments. In some examples, the client device may select particular adaptation sets for performing bandwidth adaptation. That is, the client device may select an adaptation set including video representations, an adaptation set including audio representations, and/or an adaptation set including timed text. Alternatively, the client device may select adaptation sets for certain types of media (e.g., video) and directly select representations for other types of media (e.g., audio and/or timed text).

FIG. 1 is a block diagram illustrating an example system 10 that implements techniques for streaming media data over a network. In this example, system 10 includes content preparation device 20, server device 60, and client device 40. Client device 40 and server device 60 are communicatively coupled by network 74, which may include the Internet. In some examples, content preparation device 20 and server device 60 may also be coupled by network 74 or another network, or may be directly communicatively coupled. In some examples, content preparation device 20 and server device 60 may comprise the same device.

In the example of FIG. 1, content preparation device 20 includes audio source 22 and video source 24. Audio source 22 may comprise, for example, a microphone that produces electrical signals representative of captured audio data to be encoded by audio encoder 26. Alternatively, audio source 22 may comprise a storage medium storing previously recorded audio data, an audio data generator such as a computerized synthesizer, or any other source of audio data. Video source 24 may comprise a video camera that produces video data to be encoded by video encoder 28, a storage medium encoded with previously recorded video data, a video data generation unit such as a computer graphics source, or any other source of video data. Content preparation device 20 is not necessarily communicatively coupled to server device 60 in all examples, but may store multimedia content to a separate medium that is read by server device 60.

Raw audio and video data may comprise analog or digital data. Analog data may be digitized before being encoded by audio encoder 26 and/or video encoder 28. Audio source 22 may obtain audio data from a speaking participant while the speaking participant is speaking, and video source 24 may simultaneously obtain video data of the speaking participant. In other examples, audio source 22 may comprise a computer-readable storage medium comprising stored audio data, and video source 24 may comprise a computer-readable storage medium comprising stored video data. In this manner, the techniques described in this disclosure may be applied to live, streaming, real-time audio and video data, or to archived, pre-recorded audio and video data.

Audio frames that correspond to video frames are generally audio frames containing audio data that was captured (or generated) by audio source 22 contemporaneously with the video data, captured (or generated) by video source 24, that is contained within the video frames. For example, while a speaking participant generally produces audio data by speaking, audio source 22 captures the audio data, and video source 24 captures video data of the speaking participant at the same time, that is, while audio source 22 is capturing the audio data. Hence, an audio frame may temporally correspond to one or more particular video frames. Accordingly, an audio frame corresponding to a video frame generally corresponds to a situation in which audio data and video data were captured at the same time and for which the audio frame and the video frame comprise, respectively, the audio data and the video data that was captured at the same time.

In some examples, audio encoder 26 may encode, in each encoded audio frame, a timestamp representing the time at which the audio data for the encoded audio frame was recorded, and similarly, video encoder 28 may encode, in each encoded video frame, a timestamp representing the time at which the video data for the encoded video frame was recorded. In such examples, an audio frame corresponding to a video frame may comprise an audio frame comprising a timestamp and a video frame comprising the same timestamp. Content preparation device 20 may include an internal clock from which audio encoder 26 and/or video encoder 28 may generate the timestamps, or which audio source 22 and video source 24 may use to associate audio and video data, respectively, with a timestamp.

In some examples, audio source 22 may send data corresponding to the time at which audio data was recorded to audio encoder 26, and video source 24 may send data corresponding to the time at which video data was recorded to video encoder 28. In some examples, audio encoder 26 may encode a sequence identifier in the encoded audio data to indicate a relative temporal ordering of the encoded audio data, without necessarily indicating the absolute time at which the audio data was recorded, and similarly, video encoder 28 may also use sequence identifiers to indicate a relative temporal ordering of the encoded video data. Similarly, in some examples, a sequence identifier may be mapped or otherwise correlated with a timestamp.

Audio encoder 26 generally produces a stream of encoded audio data, while video encoder 28 produces a stream of encoded video data. Each individual stream of data (whether audio or video) may be referred to as an elementary stream. An elementary stream is a single, digitally coded (possibly compressed) component of a representation. For example, the coded video or audio part of the representation can be an elementary stream. An elementary stream may be converted into a packetized elementary stream (PES) before being encapsulated within a video file. Within the same representation, a stream ID may be used to distinguish the PES packets belonging to one elementary stream from those belonging to another. The basic unit of data of an elementary stream is a packetized elementary stream (PES) packet. Thus, coded video data generally corresponds to an elementary video stream. Similarly, audio data corresponds to one or more respective elementary streams.
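As a rough sketch of the packetization described above, the fixed 6-byte start of a PES packet (the 3-byte start-code prefix, the 1-byte stream_id, and a 2-byte packet length) can be built as follows; this models only the fixed prefix, not the optional PES header fields that follow it in a real multiplex:

```python
import struct

def pes_header(stream_id, payload_len):
    """Build the first 6 bytes of a PES packet: start-code prefix
    0x000001, the 1-byte stream_id, and a 2-byte packet length."""
    return b"\x00\x00\x01" + struct.pack(">BH", stream_id, payload_len)

# In MPEG-2 systems, 0xE0 is the first video stream_id and 0xC0 the
# first audio stream_id; a demuxer uses the stream_id to route each
# packet to the correct elementary stream.
video_pkt = pes_header(0xE0, 184)
audio_pkt = pes_header(0xC0, 184)
print(video_pkt.hex())  # 000001e000b8
```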

Many video coding standards, such as ITU-T H.264/AVC and the upcoming High Efficiency Video Coding (HEVC) standard, define the syntax, semantics, and decoding process for error-free bitstreams, any of which conform to a certain profile or level. Video coding standards typically do not specify the encoder, but the encoder is tasked with guaranteeing that the generated bitstreams are standard-compliant for a decoder. In the context of video coding standards, a "profile" corresponds to a subset of algorithms, features, or tools and constraints that apply to them. As defined by the H.264 standard, for example, a "profile" is a subset of the entire bitstream syntax that is specified by the H.264 standard. A "level" corresponds to limitations on decoder resource consumption, such as, for example, decoder memory and computation, which are related to the resolution of the pictures, the bit rate, and the block processing rate. A profile may be signaled with a profile_idc (profile indicator) value, while a level may be signaled with a level_idc (level indicator) value.

The H.264 standard, for example, recognizes that, within the bounds imposed by the syntax of a given profile, it is still possible to require a large variation in the performance of encoders and decoders depending upon the values taken by syntax elements in the bitstream, such as the specified size of the decoded pictures. The H.264 standard further recognizes that, in many applications, it is neither practical nor economical to implement a decoder capable of dealing with all hypothetical uses of the syntax within a particular profile. Accordingly, the H.264 standard defines a "level" as a specified set of constraints imposed on values of the syntax elements in the bitstream. These constraints may be simple limits on values. Alternatively, these constraints may take the form of constraints on arithmetic combinations of values (e.g., picture width multiplied by picture height multiplied by the number of pictures decoded per second). The H.264 standard further provides that individual implementations may support a different level for each supported profile.

A decoder conforming to a profile ordinarily supports all the features defined in the profile. For example, as a coding feature, B-picture coding is not supported in the baseline profile of H.264/AVC but is supported in other profiles of H.264/AVC. A decoder conforming to a level should be capable of decoding any bitstream that does not require resources beyond the limitations defined in the level. Definitions of profiles and levels may be helpful for interoperability. For example, during video transmission, a pair of profile and level definitions may be negotiated and agreed upon for a whole transmission session. More specifically, in H.264/AVC, a level may define limitations on the number of macroblocks that need to be processed, the decoded picture buffer (DPB) size, the coded picture buffer (CPB) size, the vertical motion vector range, the maximum number of motion vectors per two consecutive MBs, and whether a B-block can have sub-macroblock partitions smaller than 8x8 pixels. In this manner, a decoder may determine whether the decoder is capable of properly decoding the bitstream.
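The level-conformance check described above can be sketched as follows. Only two of the constraints (frame size in macroblocks and macroblock processing rate) are modeled, and the limit values are an illustrative subset quoted from memory of the H.264 level tables, not a normative reproduction of them:

```python
# Illustrative per-level limits: max frame size (MaxFS, in macroblocks)
# and max macroblock processing rate (MaxMBPS, in MB/s).
LEVEL_LIMITS = {
    "3.0": {"max_fs": 1620, "max_mbps": 40500},
    "3.1": {"max_fs": 3600, "max_mbps": 108000},
    "4.0": {"max_fs": 8192, "max_mbps": 245760},
}

def fits_level(level, width, height, fps):
    """Check whether a stream's frame size and macroblock rate stay
    within the given level's limits (16x16-pixel macroblocks)."""
    mbs = ((width + 15) // 16) * ((height + 15) // 16)
    lim = LEVEL_LIMITS[level]
    return mbs <= lim["max_fs"] and mbs * fps <= lim["max_mbps"]

# 720p30: 80 * 45 = 3600 MBs per frame, 108000 MB/s -- exactly at the
# modeled limits of level 3.1, but beyond level 3.0.
print(fits_level("3.1", 1280, 720, 30))  # True
print(fits_level("3.0", 1280, 720, 30))  # False
```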

In the example of FIG. 1, encapsulation unit 30 of content preparation device 20 receives elementary streams comprising coded video data from video encoder 28 and elementary streams comprising coded audio data from audio encoder 26. In some examples, video encoder 28 and audio encoder 26 may each include packetizers for forming PES packets from the encoded data. In other examples, video encoder 28 and audio encoder 26 may each interface with respective packetizers for forming PES packets from the encoded data. In still other examples, encapsulation unit 30 may include packetizers for forming PES packets from the encoded audio and video data.

Video encoder 28 may encode video data of multimedia content in a variety of ways, to produce different representations of the multimedia content at various bit rates and with various characteristics, such as pixel resolutions, frame rates, conformance to various coding standards, conformance to various profiles and/or levels of profiles for various coding standards, representations having one or multiple views (e.g., for two-dimensional or three-dimensional playback), or other such characteristics. A representation, as used in this disclosure, may comprise one of audio data, video data, text data (e.g., for closed captions), or other such data. The representation may include an elementary stream, such as an audio elementary stream or a video elementary stream. Each PES packet may include a stream_id that identifies the elementary stream to which the PES packet belongs. Encapsulation unit 30 is responsible for assembling elementary streams into video files (e.g., segments) of the various representations.

Encapsulation unit 30 receives PES packets for the elementary streams of a representation from audio encoder 26 and video encoder 28 and forms corresponding network abstraction layer (NAL) units from the PES packets. Coded video segments may be organized into NAL units, which provide a "network-friendly" video representation addressing applications such as video telephony, storage, broadcast, or streaming. NAL units can be categorized as video coding layer (VCL) NAL units and non-VCL NAL units. VCL units may contain the core compression engine and may include block, macroblock, and/or slice level data. Other NAL units may be non-VCL NAL units. In some examples, a coded picture in one time instance, normally presented as a primary coded picture, may be contained in an access unit, which may include one or more NAL units.

Non-VCL NAL units may include parameter set NAL units and SEI NAL units, among others. Parameter sets may contain sequence-level header information (in sequence parameter sets (SPS)) and infrequently changing picture-level header information (in picture parameter sets (PPS)). With parameter sets (e.g., PPS and SPS), infrequently changing information need not be repeated for each sequence or picture; hence, coding efficiency may be improved. Furthermore, the use of parameter sets may enable out-of-band transmission of important header information, avoiding the need for redundant transmissions for error resilience. In out-of-band transmission examples, parameter set NAL units may be transmitted on a different channel than other NAL units, such as SEI NAL units.

Supplemental enhancement information (SEI) may contain information that is not necessary for decoding the coded picture samples from VCL NAL units, but may assist in processes related to decoding, display, error resilience, and other purposes. SEI messages may be contained in non-VCL NAL units. SEI messages are a normative part of some standard specifications, and thus are not always mandatory for standard-compliant decoder implementations. SEI messages may be sequence-level SEI messages or picture-level SEI messages. Some sequence-level information may be contained in SEI messages, such as scalability information SEI messages in the example of SVC, and view scalability information SEI messages in MVC. These example SEI messages may convey information on, e.g., the extraction of operation points and the characteristics of the operation points. In addition, encapsulation unit 30 may form a manifest file, such as a media presentation descriptor (MPD), that describes characteristics of the representations. Encapsulation unit 30 may format the MPD according to extensible markup language (XML).

Encapsulation unit 30 may provide data for one or more representations of multimedia content, along with the manifest file (e.g., the MPD), to output interface 32. Output interface 32 may comprise a network interface or an interface for writing to a storage medium, such as a universal serial bus (USB) interface, a CD or DVD writer or burner, an interface to magnetic or flash storage media, or other interfaces for storing or transmitting media data. Encapsulation unit 30 may provide data of each of the representations of the multimedia content to output interface 32, which may send the data to server device 60 via network transmission or storage media. In the example of FIG. 1, server device 60 includes storage medium 62 that stores various multimedia contents 64, each including a respective manifest file 66 and one or more representations 68A-68N (representations 68). In some examples, output interface 32 may also send data directly to network 74.

In some examples, representations 68 may be separated into adaptation sets. That is, various subsets of representations 68 may include respective common sets of characteristics, such as codec, profile and level, resolution, number of views, file format for segments, text type information that may identify a language or other characteristics of text to be displayed with the representation and/or audio data to be decoded and presented, e.g., by speakers, camera angle information that may describe a camera angle or real-world camera perspective of a scene for representations in the adaptation set, rating information that describes content suitability for particular audiences, or the like.

Manifest file 66 may include data indicative of the subsets of representations 68 corresponding to particular adaptation sets, as well as common characteristics for the adaptation sets. Manifest file 66 may also include data representative of individual characteristics, such as bit rates, of individual representations in the adaptation sets. In this manner, an adaptation set may provide for simplified network bandwidth adaptation. Representations in an adaptation set may be indicated using child elements of an adaptation set element of manifest file 66.

Server device 60 includes request processing unit 70 and network interface 72. In some examples, server device 60 may include a plurality of network interfaces. Furthermore, any or all of the features of server device 60 may be implemented on other devices of a content delivery network, such as routers, bridges, proxy devices, switches, or other devices. In some examples, intermediate devices of a content delivery network may cache data of multimedia content 64 and include components that conform substantially to those of server device 60. In general, network interface 72 is configured to send and receive data via network 74.

Request processing unit 70 is configured to receive network requests from client devices, such as client device 40, for data of storage medium 62. For example, request processing unit 70 may implement hypertext transfer protocol (HTTP) version 1.1, as described in RFC 2616, "Hypertext Transfer Protocol – HTTP/1.1," by R. Fielding et al., Network Working Group, IETF, June 1999. That is, request processing unit 70 may be configured to receive HTTP GET or partial GET requests and provide data of multimedia content 64 in response to the requests. The requests may specify a segment of one of representations 68, e.g., using a URL of the segment. In some examples, the requests may also specify one or more byte ranges of the segment, thus comprising partial GET requests. Request processing unit 70 may further be configured to service HTTP HEAD requests to provide header data of a segment of one of representations 68. In any case, request processing unit 70 may be configured to process the requests to provide requested data to a requesting device, such as client device 40.
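Server-side handling of a GET versus a partial GET can be sketched as follows. This models only single-range "bytes=first-last" requests over an in-memory segment, and is an illustrative sketch rather than the actual logic of request processing unit 70:

```python
import re

def serve_range(data, range_header=None):
    """Return (status, body) for a GET over `data`; a Range header of
    the form "bytes=first-last" turns it into a partial GET (206)."""
    if range_header is None:
        return 200, data  # ordinary GET: whole resource
    m = re.fullmatch(r"bytes=(\d+)-(\d+)", range_header)
    if m is None:
        return 400, b""  # malformed (only single closed ranges modeled)
    first, last = int(m.group(1)), int(m.group(2))
    if first >= len(data):
        return 416, b""  # range not satisfiable
    return 206, data[first:last + 1]

segment = bytes(range(16))
print(serve_range(segment)[0])            # 200
print(serve_range(segment, "bytes=4-7"))  # (206, b'\x04\x05\x06\x07')
```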

Additionally or alternatively, request processing unit 70 may be configured to deliver media data via a broadcast or multicast protocol, such as eMBMS. Content preparation device 20 may create DASH segments and/or sub-segments in substantially the same way as described, but server device 60 may deliver these segments or sub-segments using eMBMS or another broadcast or multicast network transport protocol. For example, request processing unit 70 may be configured to receive a multicast group join request from client device 40. That is, server device 60 may advertise, to client devices including client device 40, an Internet protocol (IP) address associated with a multicast group, which is associated with particular media content (e.g., a broadcast of a live event). Client device 40, in turn, may submit a request to join the multicast group. This request may be propagated throughout network 74, e.g., the routers making up network 74, such that the routers are caused to direct traffic destined for the IP address associated with the multicast group to subscribing client devices, such as client device 40.

As illustrated in the example of FIG. 1, multimedia content 64 includes manifest file 66, which may correspond to a media presentation description (MPD). Manifest file 66 may contain descriptions of different alternative representations 68 (e.g., video services with different qualities), and the description may include, e.g., codec information, a profile value, a level value, a bit rate, and other descriptive characteristics of representations 68. Client device 40 may retrieve the MPD of a media presentation to determine how to access segments of representations 68.

In particular, retrieval unit 52 may retrieve configuration data (not shown) of client device 40 to determine decoding capabilities of video decoder 48 and rendering capabilities of video output 44. Video output 44 may be included in a display device, such as a headset, for extended reality, augmented reality, or virtual reality. Likewise, the configuration data may indicate whether video output 44 is capable of presenting three-dimensional video data, e.g., for extended reality, augmented reality, virtual reality, or the like. The configuration data may also include any or all of a language preference selected by a user of client device 40, one or more camera perspectives corresponding to depth preferences set by the user of client device 40, and/or a rating preference selected by the user of client device 40.

Retrieval unit 52 may comprise, for example, a web browser or a media client configured to submit HTTP GET and partial GET requests. Retrieval unit 52 may correspond to software instructions executed by one or more processors or processing units (not shown) of client device 40. In some examples, all or portions of the functionality described with respect to retrieval unit 52 may be implemented in hardware, or in a combination of hardware, software, and/or firmware, where the requisite hardware may be provided to execute instructions for the software or firmware.

Retrieval unit 52 may compare the decoding and rendering capabilities of client device 40 to the characteristics of representations 68 indicated by information of manifest file 66. Retrieval unit 52 may initially retrieve at least a portion of manifest file 66 to determine characteristics of representations 68. For example, retrieval unit 52 may request a portion of manifest file 66 that describes characteristics of one or more adaptation sets. Retrieval unit 52 may select a subset of representations 68 (e.g., an adaptation set) having characteristics that can be satisfied by the coding and rendering capabilities of client device 40. Retrieval unit 52 may then determine bit rates for the representations in the adaptation set, determine a currently available amount of network bandwidth, and retrieve segments from one of the representations having a bit rate that can be satisfied by the network bandwidth.
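The bit-rate selection step described above can be sketched as follows; the safety margin is an illustrative client-side policy choice, not something mandated by the MPD:

```python
def select_representation(bitrates, available_bw, safety=0.8):
    """Pick the highest representation bit rate that fits within a
    fraction of the measured bandwidth; fall back to the lowest
    representation if none fits."""
    budget = available_bw * safety
    fitting = [b for b in bitrates if b <= budget]
    return max(fitting) if fitting else min(bitrates)

reps = [500_000, 1_000_000, 2_500_000, 5_000_000]  # bits per second
print(select_representation(reps, 4_000_000))  # 2500000 (budget 3.2 Mbps)
print(select_representation(reps, 300_000))    # 500000 (lowest fallback)
```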

In general, higher bit rate representations may yield higher quality video playback, while lower bit rate representations may provide sufficient quality video playback when the available network bandwidth decreases. Accordingly, when available network bandwidth is relatively high, retrieval unit 52 may retrieve data from relatively high bit rate representations, whereas when available network bandwidth is low, retrieval unit 52 may retrieve data from relatively low bit rate representations. In this manner, client device 40 may stream multimedia data over network 74 while also adapting to the changing network bandwidth availability of network 74.

Additionally or alternatively, retrieval unit 52 may be configured to receive data in accordance with a broadcast or multicast network protocol, such as eMBMS or IP multicast. In such examples, retrieval unit 52 may submit a request to join a multicast network group associated with particular media content. After joining the multicast group, retrieval unit 52 may receive data of the multicast group without further requests issued to server device 60 or content preparation device 20. Retrieval unit 52 may submit a request to leave the multicast group when data of the multicast group is no longer needed, e.g., to stop playback or to change channels to a different multicast group.

Network interface 54 may receive data of segments of the selected representation and provide the data to retrieval unit 52, which may in turn provide the segments to decapsulation unit 50. Decapsulation unit 50 may decapsulate elements of a video file into constituent PES streams, depacketize the PES streams to retrieve encoded data, and send the encoded data to either audio decoder 46 or video decoder 48, depending on whether the encoded data is part of an audio or video stream, e.g., as indicated by the PES packet headers of the stream. Audio decoder 46 decodes encoded audio data and sends the decoded audio data to audio output 42, while video decoder 48 decodes encoded video data and sends the decoded video data, which may include a plurality of views of a stream, to video output 44.

Video encoder 28, video decoder 48, audio encoder 26, audio decoder 46, encapsulation unit 30, retrieval unit 52, and decapsulation unit 50 each may be implemented as any of a variety of suitable processing circuitry, as applicable, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware, or any combinations thereof. Each of video encoder 28 and video decoder 48 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined video encoder/decoder (CODEC). Likewise, each of audio encoder 26 and audio decoder 46 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined CODEC. An apparatus including video encoder 28, video decoder 48, audio encoder 26, audio decoder 46, encapsulation unit 30, retrieval unit 52, and/or decapsulation unit 50 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.

Client device 40, server device 60, and/or content preparation device 20 may be configured to operate in accordance with the techniques of this disclosure. For purposes of example, this disclosure describes these techniques with respect to client device 40 and server device 60. However, it should be understood that content preparation device 20 may be configured to perform these techniques, instead of (or in addition to) server device 60.

Encapsulation unit 30 may form NAL units comprising a header that identifies a program to which the NAL unit belongs, as well as a payload, e.g., audio data, video data, or data that describes the transport or program stream to which the NAL unit corresponds. For example, in H.264/AVC, a NAL unit includes a 1-byte header and a payload of varying size. A NAL unit including video data in its payload may comprise various granularity levels of video data. For example, a NAL unit may comprise a block of video data, a plurality of blocks, a slice of video data, or an entire picture of video data. Encapsulation unit 30 may receive encoded video data from video encoder 28 in the form of PES packets of elementary streams. Encapsulation unit 30 may associate each elementary stream with a corresponding program.

Encapsulation unit 30 may also assemble access units from a plurality of NAL units. In general, an access unit may comprise one or more NAL units for representing a frame of video data, as well as audio data corresponding to the frame when such audio data is available. An access unit generally includes all NAL units for one output time instance, e.g., all audio and video data for one time instance. For example, if each view has a frame rate of 20 frames per second (fps), then each time instance may correspond to a time interval of 0.05 seconds. During this time interval, the specific frames for all views of the same access unit (the same time instance) may be rendered simultaneously. In one example, an access unit may comprise a coded picture in one time instance, which may be presented as a primary coded picture.
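The timing arithmetic above can be sketched as follows. This is a minimal illustration of the relationship between frame rate and access-unit time instances; the helper names are not taken from the patent, and the frame indices are illustrative values only.

```python
# At a frame rate of f fps, each output time instance spans 1/f seconds,
# and all views belonging to the same access unit share that time.

def time_instance_interval(fps: float) -> float:
    """Duration in seconds covered by one output time instance."""
    return 1.0 / fps

def presentation_time(frame_index: int, fps: float) -> float:
    """Presentation time of the access unit holding the given frame."""
    return frame_index * time_instance_interval(fps)

interval = time_instance_interval(20.0)      # 0.05 s per access unit
t_frame_40 = presentation_time(40, 20.0)     # frame 40 is presented at 2.0 s
```

At 20 fps, as in the example above, the 40th frame of every view belongs to the access unit presented two seconds into playback.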

Accordingly, an access unit may comprise all audio and video frames of a common temporal instance, e.g., all views corresponding to time X. This disclosure also refers to an encoded picture of a particular view as a "view component." That is, a view component may comprise an encoded picture (or frame) for a particular view at a particular time. Accordingly, an access unit may be defined as comprising all view components of a common temporal instance. The decoding order of access units need not necessarily be the same as the output or display order.

A media presentation may include a media presentation description (MPD), which may contain descriptions of different alternative representations (e.g., video services with different qualities), and the description may include, e.g., codec information, a profile value, and a level value. An MPD is one example of a manifest file, such as manifest file 66. Client device 40 may retrieve the MPD of a media presentation to determine how to access movie fragments of the various presentations. Movie fragments may be located in movie fragment boxes (moof boxes) of video files.

Manifest file 66 (which may comprise, for example, an MPD) may advertise availability of segments of representations 68. That is, the MPD may include information indicating the wall-clock time at which a first segment of one of representations 68 becomes available, as well as information indicating the durations of segments within representations 68. In this manner, retrieval unit 52 of client device 40 may determine when each segment is available, based on the starting time as well as the durations of the segments preceding a particular segment.
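The availability computation just described can be sketched in a few lines. This is a simplified model, not the normative DASH availability-time formula: it assumes segments become available back to back, starting from the advertised availability time of the first segment.

```python
# Derive per-segment availability times from the first segment's
# advertised availability start time plus the cumulative durations of
# all preceding segments, as a DASH client such as retrieval unit 52 might.

def segment_availability_times(first_available_at: float,
                               durations: list) -> list:
    """Wall-clock time (seconds) at which each segment becomes available."""
    times = []
    t = first_available_at
    for d in durations:
        times.append(t)
        t += d  # next segment is available once this one's duration elapses
    return times

# Three 2-second segments, the first available at t = 100 s.
times = segment_availability_times(100.0, [2.0, 2.0, 2.0])
```

With these inputs the segments become available at 100, 102, and 104 seconds, so the client can schedule its requests without polling the server.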

After encapsulation unit 30 has assembled NAL units and/or access units into a video file based on the received data, encapsulation unit 30 passes the video file to output interface 32 for output. In some examples, encapsulation unit 30 may store the video file locally or send the video file to a remote server via output interface 32, rather than sending the video file directly to client device 40. Output interface 32 may comprise, for example, a transmitter, a transceiver, a device for writing data to a computer-readable medium such as, for example, an optical drive or a magnetic media drive (e.g., a floppy drive), a universal serial bus (USB) port, a network interface, or another output interface. Output interface 32 outputs the video file to a computer-readable medium, such as, for example, a transmission signal, a magnetic medium, an optical medium, a memory, a flash drive, or another computer-readable medium.

Network interface 54 may receive a NAL unit or access unit via network 74 and provide the NAL unit or access unit to decapsulation unit 50 via retrieval unit 52. Decapsulation unit 50 may decapsulate elements of a video file into constituent PES streams, depacketize the PES streams to retrieve encoded data, and send the encoded data to either audio decoder 46 or video decoder 48, depending on whether the encoded data is part of an audio or a video stream, e.g., as indicated by the PES packet headers of the stream. Audio decoder 46 decodes encoded audio data and sends the decoded audio data to audio output 42, while video decoder 48 decodes encoded video data and sends the decoded video data, which may include a plurality of views of a stream, to video output 44.

In accordance with the techniques of this disclosure, a user of client device 40 may obtain media data related to a 3D virtual scene, e.g., for extended reality (XR), augmented reality (AR), virtual reality (VR), or the like. The user may navigate through the 3D virtual scene using one or more devices in communication with client device 40, such as controllers. Additionally or alternatively, client device 40 may include sensors, cameras, or the like for determining that the user has moved in real-world space, and client device 40 may translate such real-world movements into virtual-space movements.

The 3D virtual scene may include one or more virtual solid objects. Such objects may include, for example, walls, windows, tables, chairs, or any other such objects that may appear in the virtual scene. In accordance with the techniques of this disclosure, the media data retrieved by retrieval unit 52 may include a scene description describing such virtual solid objects. The scene description may conform to, e.g., the MPEG scene description elements of glTF 2.0.

In some examples, the scene description may include a description of permissible camera movements. For example, the scene description may describe one or more bounding volumes (e.g., volumes according to shapes, such as spheres, cubes, cones, frustums, or the like) within which the virtual camera is permitted to move, such that the virtual camera is not permitted to move beyond the boundaries of the shape. That is, a bounding volume may describe a permissible camera movement volume within which the virtual camera is permitted to move. Additionally or alternatively, the scene description may describe one or more vertices or anchor points and permissible paths (e.g., segments) between the vertices or anchor points. Client device 40 may permit the virtual camera to move only along the permissible paths and/or within the bounding volumes.
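A bounding-volume check of this kind can be sketched for the spherical case. This is a minimal, hypothetical enforcement routine, not part of glTF 2.0 or the MPEG scene description: a requested camera position outside the sphere is rejected and the camera simply stays where it is.

```python
# Enforce a spherical bounding volume: the camera may move to the
# requested position only if that position lies inside the sphere.
import math

def clamp_to_bounding_sphere(current, requested, center, radius):
    """Return the requested camera position if it is inside the bounding
    sphere; otherwise keep the current position (movement disallowed)."""
    if math.dist(requested, center) <= radius:
        return requested   # movement stays within the permissible volume
    return current         # movement would leave the volume; reject it

cam = (0.0, 0.0, 0.0)
cam = clamp_to_bounding_sphere(cam, (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), 2.0)  # allowed
cam = clamp_to_bounding_sphere(cam, (5.0, 0.0, 0.0), (0.0, 0.0, 0.0), 2.0)  # rejected
```

Analogous point-in-volume tests could be written for cubes, cones, or frusta; the path-based alternative would instead snap the requested position onto the nearest permissible segment between anchor points.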

In some examples, additionally or alternatively, the scene description may describe one or more virtual solid objects in the scene through which the virtual camera cannot pass.

FIG. 2 is a block diagram illustrating an example set of components of retrieval unit 52 of FIG. 1 in greater detail. In this example, retrieval unit 52 includes eMBMS middleware unit 100, DASH client 110, media application 112, and presentation engine 114.

In this example, eMBMS middleware unit 100 further includes eMBMS reception unit 106, cache 104, and proxy server unit 102. In this example, eMBMS reception unit 106 is configured to receive data via eMBMS, e.g., according to File Delivery over Unidirectional Transport (FLUTE), described in T. Paila et al., "FLUTE—File Delivery over Unidirectional Transport," Network Working Group, RFC 6726, Nov. 2012, available at tools.ietf.org/html/rfc6726. That is, eMBMS reception unit 106 may receive files via broadcast from, e.g., server device 60, which may act as a broadcast/multicast service center (BM-SC).

As eMBMS middleware unit 100 receives data for files, the eMBMS middleware unit may store the received data in cache 104. Cache 104 may comprise a computer-readable storage medium, such as flash memory, a hard disk, RAM, or any other suitable storage medium.

Proxy server unit 102 may act as a server for DASH client 110. For example, proxy server unit 102 may provide an MPD file or other manifest file to DASH client 110. Proxy server unit 102 may advertise availability times for segments in the MPD file, as well as hyperlinks from which the segments can be retrieved. These hyperlinks may include a localhost address prefix corresponding to client device 40 (e.g., 127.0.0.1 for IPv4). In this manner, DASH client 110 may request segments from proxy server unit 102 using HTTP GET or partial GET requests. For example, for a segment available from the link http://127.0.0.1/rep1/seg3, DASH client 110 may construct an HTTP GET request that includes a request for http://127.0.0.1/rep1/seg3, and submit the request to proxy server unit 102. Proxy server unit 102 may retrieve the requested data from cache 104 and provide the data to DASH client 110 in response to such a request.
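The localhost URL construction can be illustrated briefly. The "rep1"/"seg3" naming follows the example link in the text; the helper itself is a hypothetical sketch and not part of any DASH or eMBMS specification.

```python
# Build the localhost segment URL that a DASH client would place in an
# HTTP GET request to the local middleware proxy server.

LOCALHOST_PREFIX = "http://127.0.0.1"  # IPv4 loopback, per the example above

def segment_url(representation_id: str, segment_name: str) -> str:
    """URL for requesting one segment of one representation from the proxy."""
    return "%s/%s/%s" % (LOCALHOST_PREFIX, representation_id, segment_name)

url = segment_url("rep1", "seg3")  # "http://127.0.0.1/rep1/seg3"
```

Because the prefix is the loopback address, the request never leaves the device: the middleware answers it from cache 104, so the DASH client behaves exactly as if it were talking to a remote HTTP server.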

DASH client 110 provides the retrieved media data to media application 112. Media application 112 may be, for example, a web browser, a gaming engine, or another application that receives and presents media data. Presentation engine 114, in turn, represents an application that interacts with media application 112 to present the retrieved media data in a 3D virtual environment. Presentation engine 114 may, for example, map two-dimensional media data onto a 3D projection. Presentation engine 114 may also receive input from other elements of client device 40 to determine a position of the user in the 3D virtual environment and an orientation the user is facing at that position. For example, presentation engine 114 may determine X, Y, and Z coordinates of the user's position, as well as the orientation in which the user is looking, in order to determine appropriate media data to display to the user. Furthermore, presentation engine 114 may receive camera movement data representing real-world user movement data and translate the real-world user movement data into 3D virtual-space movement data.

In accordance with the techniques of this disclosure, eMBMS middleware unit 100 may receive media data (e.g., according to glTF 2.0) via broadcast or multicast, and DASH client 110 may then retrieve the media data from eMBMS middleware unit 100. The media data may include a scene description that includes camera control information indicating how a virtual camera can be moved through a virtual scene. For example, the scene description may include data describing permissible paths through the virtual scene, e.g., along defined paths between anchor points. Additionally or alternatively, the scene description may include data describing a bounding volume representing a volume within which the virtual camera is permitted to move. Additionally or alternatively, the scene description may include data describing one or more solid virtual objects in the 3D virtual environment, such as walls, tables, chairs, or the like. For example, data of the scene description may define collision boundaries of the 3D virtual objects. The scene description may further include data representing what happens in the case of a collision with such an object, such as an animation to be played using the object, and whether the object is static (e.g., in the case of a wall) or dynamic (e.g., in the case of a chair).

Presentation engine 114 may use the scene description to determine what to present in the event of a collision with a 3D virtual object and/or an attempted movement beyond a permissible path or volume. For example, if the scene description includes data for a permissible path or bounding volume, and the user attempts to move beyond the permissible path or bounding volume, presentation engine 114 may simply avoid updating the display, thereby indicating that such movement is not permitted. As another example, if the scene description includes data for a 3D virtual solid object, and the user attempts to move through the 3D virtual solid object, presentation engine 114 may avoid updating the display if the 3D virtual solid object is static. If the 3D virtual solid object is not static, presentation engine 114 may determine an animation to be displayed for the object, e.g., a translational and/or rotational movement to be applied to the object. For example, if the 3D virtual solid object is a chair, the animation data may indicate that the chair is to be pushed along the floor or tipped over in the event of a collision.
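The static-versus-dynamic collision handling just described can be sketched as a small decision routine. The field names are hypothetical, not drawn from any standardized scene-description schema: static objects block the camera with no display update, while dynamic objects select a collision animation declared for them.

```python
# Decide the presentation engine's response when the virtual camera
# collides with a solid object described in the scene description.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SolidObject:
    name: str
    static: bool
    collision_animation: Optional[str] = None  # animation to play, if dynamic

def on_collision(obj: SolidObject) -> str:
    """Return 'block' (no display update) or the animation to present."""
    if obj.static:
        return "block"  # e.g., a wall: movement is simply not rendered
    # Dynamic object: play its declared animation, or block if none given.
    return obj.collision_animation or "block"

wall = SolidObject("wall", static=True)
chair = SolidObject("chair", static=False, collision_animation="push_along_floor")

response_wall = on_collision(wall)    # "block"
response_chair = on_collision(chair)  # "push_along_floor"
```

A fuller implementation would also consume translational and rotational parameters for the animation, e.g., the direction in which the chair slides when pushed.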

FIG. 3 is a conceptual diagram illustrating elements of example multimedia content 120. Multimedia content 120 may correspond to multimedia content 64 (FIG. 1), or to another multimedia content stored in storage medium 62. In the example of FIG. 3, multimedia content 120 includes media presentation description (MPD) 122 and a plurality of representations 124A-124N (representations 124). Representation 124A includes optional header data 126 and segments 128A-128N (segments 128), while representation 124N includes optional header data 130 and segments 132A-132N (segments 132). The letter N is used to designate the last movie fragment in each of representations 124 as a matter of convenience. In some examples, there may be different numbers of movie fragments between representations 124.

MPD 122 may comprise a data structure separate from representations 124. MPD 122 may correspond to manifest file 66 of FIG. 1. Likewise, representations 124 may correspond to representations 68 of FIG. 1. In general, MPD 122 may include data that generally describes characteristics of representations 124, such as coding and rendering characteristics, adaptation sets, a profile to which MPD 122 corresponds, text type information, camera angle information, rating information, trick mode information (e.g., information indicative of representations that include temporal subsequences), and/or information for retrieving remote periods (e.g., for insertion of targeted advertisements into media content during playback).

Header data 126, when present, may describe characteristics of segments 128, e.g., temporal locations of random access points (RAPs, also referred to as stream access points (SAPs)), which of segments 128 includes random access points, byte offsets to random access points within segments 128, uniform resource locators (URLs) of segments 128, or other aspects of segments 128. Header data 130, when present, may describe similar characteristics of segments 132. Additionally or alternatively, such characteristics may be fully included within MPD 122.

Segments 128, 132 include one or more coded video samples, each of which may include frames or slices of video data. Each of the coded video samples of segments 128 may have similar characteristics, e.g., height, width, and bandwidth requirements. Such characteristics may be described by data of MPD 122, though such data is not illustrated in the example of FIG. 3. MPD 122 may include characteristics as described by the 3GPP specification, with the addition of any or all of the signaled information described in this disclosure.

Each of segments 128, 132 may be associated with a unique uniform resource locator (URL). Thus, each of segments 128, 132 may be independently retrievable using a streaming network protocol, such as DASH. In this manner, a destination device, such as client device 40, may use an HTTP GET request to retrieve segment 128 or 132. In some examples, client device 40 may use an HTTP partial GET request to retrieve a specific byte range of segment 128 or 132.
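The distinction between a full GET and a partial GET can be shown with the request headers involved. This is a generic HTTP illustration, not specific to the patent: a partial GET carries a standard `Range` header naming the inclusive byte range to retrieve.

```python
# Headers for a segment request: a partial GET adds a Range header, so
# only the named byte range of the segment is returned (HTTP 206).

def get_headers(byte_range=None):
    """Return request headers; byte_range is an inclusive (first, last)
    pair for a partial GET, or None for a full GET."""
    headers = {"Accept": "*/*"}
    if byte_range is not None:
        first, last = byte_range
        headers["Range"] = "bytes=%d-%d" % (first, last)
    return headers

full_get = get_headers()              # retrieves the whole segment
partial_get = get_headers((0, 1023))  # retrieves only the first 1024 bytes
```

A client might use such a partial GET, for example, to fetch only the sub-segment identified by a segment index box rather than the entire segment.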

FIG. 4 is a block diagram illustrating elements of an example video file 150, which may correspond to a segment of a representation, such as one of segments 128, 132 of FIG. 3. Each of segments 128, 132 may include data that conforms substantially to the arrangement of data illustrated in the example of FIG. 4. Video file 150 may be said to encapsulate a segment. As described above, video files in accordance with the ISO base media file format and extensions thereof store data in a series of objects, referred to as "boxes." In the example of FIG. 4, video file 150 includes file type (FTYP) box 152, movie (MOOV) box 154, segment index (sidx) boxes 162, movie fragment (MOOF) boxes 164, and movie fragment random access (MFRA) box 166. Although FIG. 4 represents an example of a video file, it should be understood that other media files may include other types of media data (e.g., audio data, timed text data, or the like) that is structured similarly to the data of video file 150, in accordance with the ISO base media file format and its extensions.

File type (FTYP) box 152 generally describes a file type for video file 150. File type box 152 may include data that identifies a specification that describes a best use for video file 150. File type box 152 may alternatively be placed before MOOV box 154, movie fragment boxes 164, and/or MFRA box 166.

In some examples, a segment, such as video file 150, may include an MPD update box (not shown) before FTYP box 152. The MPD update box may include information indicating that an MPD corresponding to a representation including video file 150 is to be updated, along with information for updating the MPD. For example, the MPD update box may provide a URI or URL for a resource to be used to update the MPD. As another example, the MPD update box may include data for updating the MPD. In some examples, the MPD update box may immediately follow a segment type (STYP) box (not shown) of video file 150, where the STYP box may define a segment type for video file 150.

In the example of FIG. 4, MOOV box 154 includes movie header (MVHD) box 156, track (TRAK) box 158, and one or more movie extends (MVEX) boxes 160. In general, MVHD box 156 may describe general characteristics of video file 150. For example, MVHD box 156 may include data that describes when video file 150 was originally created, when video file 150 was last modified, a timescale for video file 150, a duration of playback for video file 150, or other data that generally describes video file 150.

TRAK box 158 may include data for a track of video file 150. TRAK box 158 may include a track header (TKHD) box that describes characteristics of the track corresponding to TRAK box 158. In some examples, TRAK box 158 may include coded video pictures, while in other examples, the coded video pictures of the track may be included in movie fragments 164, which may be referenced by data of TRAK box 158 and/or sidx boxes 162.

In some examples, video file 150 may include more than one track. Accordingly, MOOV box 154 may include a number of TRAK boxes equal to the number of tracks in video file 150. TRAK box 158 may describe characteristics of a corresponding track of video file 150. For example, TRAK box 158 may describe temporal and/or spatial information for the corresponding track. When encapsulation unit 30 (FIG. 3) includes a parameter set track in a video file, such as video file 150, a TRAK box similar to TRAK box 158 of MOOV box 154 may describe characteristics of the parameter set track. Encapsulation unit 30 may signal the presence of sequence-level SEI messages in the parameter set track within the TRAK box describing the parameter set track.

MVEX boxes 160 may describe characteristics of corresponding movie fragments 164, e.g., to signal that video file 150 includes movie fragments 164, in addition to the video data (if any) included within MOOV box 154. In the context of streaming video data, coded video pictures may be included in movie fragments 164, rather than in MOOV box 154. Accordingly, all coded video samples may be included in movie fragments 164, rather than in MOOV box 154.

MOOV box 154 may include a number of MVEX boxes 160 equal to the number of movie fragments 164 in video file 150. Each of MVEX boxes 160 may describe characteristics of a corresponding one of movie fragments 164. For example, each MVEX box may include a movie extends header (MEHD) box that describes a temporal duration for the corresponding one of movie fragments 164.

As noted above, encapsulation unit 30 may store a sequence data set in a video sample that does not include actual coded video data. A video sample may generally correspond to an access unit, which is a representation of a coded picture at a specific time instance. In the context of AVC, a coded picture includes one or more VCL NAL units, which contain the information to construct all the pixels of the access unit, and other associated non-VCL NAL units, such as SEI messages. Accordingly, encapsulation unit 30 may include a sequence data set, which may include sequence-level SEI messages, in one of movie fragments 164. Encapsulation unit 30 may further signal the presence of the sequence data set and/or the sequence-level SEI messages as being present in the one of movie fragments 164, within the one of MVEX boxes 160 corresponding to that one of movie fragments 164.

SIDX boxes 162 are optional elements of video file 150. That is, video files conforming to the 3GPP file format, or other such file formats, do not necessarily include SIDX boxes 162. In accordance with the example of the 3GPP file format, a SIDX box may be used to identify a sub-segment of a segment (e.g., a segment contained within video file 150). The 3GPP file format defines a sub-segment as "a self-contained set of one or more consecutive movie fragment boxes with corresponding Media Data box(es), and a Media Data Box containing data referenced by a Movie Fragment Box must follow that Movie Fragment box and precede the next Movie Fragment box containing information about the same track." The 3GPP file format also indicates that a SIDX box "contains a sequence of references to subsegments of the (sub)segment documented by the box. The referenced subsegments are contiguous in presentation time. Similarly, the bytes referred to by a Segment Index box are always contiguous within the segment. The referenced size gives the count of the number of bytes in the material referenced."

SIDX boxes 162 generally provide information representative of one or more sub-segments of a segment included in video file 150. For instance, such information may include playback times at which sub-segments begin and/or end, byte offsets for the sub-segments, whether the sub-segments include (e.g., start with) a stream access point (SAP), a type for the SAP (e.g., whether the SAP is an instantaneous decoder refresh (IDR) picture, a clean random access (CRA) picture, a broken link access (BLA) picture, or the like), a position of the SAP in the sub-segment (in terms of playback time and/or byte offset), and the like.
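The per-sub-segment information enumerated above can be modeled as a simple record, and, because the referenced bytes are contiguous within the segment, byte ranges follow from cumulative referenced sizes. The field names below are illustrative and do not reproduce the exact box syntax of ISO/IEC 14496-12.

```python
# Model a SIDX reference entry and derive a sub-segment's byte range
# from the cumulative sizes of the entries that precede it.
from dataclasses import dataclass

@dataclass
class SubSegmentReference:
    referenced_size: int        # byte count of the referenced material
    subsegment_duration: float  # playback duration, in seconds
    starts_with_sap: bool       # whether the sub-segment starts with a SAP
    sap_type: int               # e.g., an IDR, CRA, or BLA picture

def byte_range(refs, index, first_offset=0):
    """Inclusive (start, end) byte range of the index-th sub-segment;
    sub-segments are contiguous, so offsets accumulate in order."""
    start = first_offset + sum(r.referenced_size for r in refs[:index])
    return start, start + refs[index].referenced_size - 1

refs = [SubSegmentReference(1000, 2.0, True, 1),
        SubSegmentReference(800, 2.0, True, 1)]
rng = byte_range(refs, 1)  # second sub-segment occupies bytes 1000-1799
```

Ranges derived this way are exactly what a client would place in the HTTP partial GET requests discussed with respect to segments 128, 132.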

Movie fragments 164 may include one or more coded video pictures. In some examples, movie fragments 164 may include one or more groups of pictures (GOPs), each of which may include a number of coded video pictures, e.g., frames or pictures. In addition, as described above, movie fragments 164 may include sequence data sets in some examples. Each of movie fragments 164 may include a movie fragment header box (MFHD, not shown in FIG. 4). The MFHD box may describe characteristics of the corresponding movie fragment, such as a sequence number for the movie fragment. Movie fragments 164 may be included in video file 150 in order of sequence number.

MFRA box 166 may describe random access points within movie fragments 164 of video file 150. This may assist with performing trick modes, such as performing seeks to particular temporal locations (i.e., playback times) within a segment encapsulated by video file 150. MFRA box 166 is generally optional, and need not be included in video files in some examples. Likewise, a client device, such as client device 40, does not necessarily need to reference MFRA box 166 to correctly decode and display the video data of video file 150. MFRA box 166 may include a number of track fragment random access (TFRA) boxes (not shown) equal to the number of tracks of video file 150, or in some examples, equal to the number of media tracks (e.g., non-hint tracks) of video file 150.

In some examples, movie fragments 164 may include one or more stream access points (SAPs), such as IDR pictures. Likewise, MFRA box 166 may provide indications of locations within video file 150 of the SAPs. Accordingly, a temporal sub-sequence of video file 150 may be formed from the SAPs of video file 150. The temporal sub-sequence may also include other pictures, such as P-frames and/or B-frames that depend from the SAPs. Frames and/or slices of the temporal sub-sequence may be arranged within the segments such that frames/slices of the temporal sub-sequence that depend on other frames/slices of the sub-sequence can be properly decoded. For example, in a hierarchical arrangement of data, data used for prediction of other data may also be included in the temporal sub-sequence.

FIG. 5 is a conceptual diagram illustrating an example camera path segment 212 having a bounding volume in accordance with the techniques of this disclosure. In particular, in 3D scene 200, camera 202 represents a viewpoint from which a user can view a portion of 3D scene 200. In this example, path segment 212 is defined between points 204 and 206. In addition, a bounding volume is defined by extruding points from bounding box 208 to bounding box 210 along path segment 212. Thus, in this example, camera 202 is permitted to move along path segment 212 within the bounding volume, but is restricted from moving beyond the bounding volume.

A scene description may describe a set of paths along which a camera, such as camera 202, is permitted to move. A path may be described as a set of anchor points (such as points 204, 206) that are connected through path segments (such as path segment 212). In some examples, such as the example of FIG. 5, each path segment may be augmented with a bounding volume that allows some freedom of movement along the path.

Thus, the scene camera, and consequently the viewer, will be able to move freely along the path segments within the bounding volume. Path segments may be described using more complex geometry to allow finer control over the path.

In addition, camera parameters may be constrained at every point along the path. Parameters may be provided for every anchor point, and then used together with an interpolation function to calculate corresponding parameters for every point along the path segment. The interpolation function may apply to all parameters, including the bounding volume.
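The per-anchor parameters and interpolation described above can be sketched as follows. This is a minimal illustration assuming simple linear interpolation between anchor parameters (the text above does not mandate a particular interpolation function), and all names are illustrative.

```python
def lerp(a, b, t):
    """Linearly interpolate between two scalar parameter values."""
    return a + (b - a) * t

def interpolate_segment(anchor_a, anchor_b, t):
    """Interpolate every camera parameter (including a bounding-volume
    radius) between the two anchor points of a path segment.

    anchor_a, anchor_b: dicts of per-anchor camera parameters.
    t: normalized position along the segment, 0.0 at anchor_a, 1.0 at anchor_b.
    """
    return {key: lerp(anchor_a[key], anchor_b[key], t) for key in anchor_a}

# Example anchor parameters: yfov (a perspective camera intrinsic) and the
# BV_SPHERE bounding-volume radius at each anchor point.
a = {"yfov": 0.60, "radius": 1.0}
b = {"yfov": 0.80, "radius": 2.0}
mid = interpolate_segment(a, b, 0.5)
# Halfway along the segment, each parameter is halfway between its anchors.
```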

The camera control extension mechanism of this disclosure may be realized as a glTF 2.0 extension that defines camera control for a scene. The camera control extension may be identified by an "MPEG_camera_control" tag, which may be included in the extensionsUsed element, and may be included in the extensionsRequired element for the 3D scene.

An example "MPEG_camera_control" extension is shown in Table 1 below, and may be defined on the "camera" element of a scene description.

Table 1
  anchors (number; default N/A): The number of anchor points in the camera path.
  segments (number; default N/A): The number of path segments in the camera path.
  boundingVolume (number; default BV_NONE): The type of bounding volume used for the path segments. Possible types are:
    - BV_NONE: no bounding volume.
    - BV_CONE: a capped cone bounding volume, defined by a circle at each of the anchor points.
    - BV_FRUSTUM: a frustum bounding volume, defined by two rectangles, each containing one of the anchor points.
    - BV_SPHERE: a spherical bounding volume around every point along the path segment, defined by the radius of the sphere.
  intrinsicParameters (boolean; default false): When set to true, indicates that the intrinsic camera parameters are modified at every anchor point. The parameters are to be provided, based on the camera type, as camera.perspective or camera.orthographic as defined in [glTF 2.0].
  accessor (number; default N/A): The index of the accessor or timed accessor that provides the camera control information.
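The extension properties of Table 1 would be carried inside an ordinary glTF 2.0 JSON document. The following sketch shows one possible encoding, built as a Python dict; the exact JSON property spellings and the numeric encoding of the bounding-volume types are assumptions here, since Table 1 defines the fields only descriptively.

```python
import json

# Assumed numeric encoding of the bounding-volume types (Table 1 gives the
# property type as "number" but does not fix the numbering).
BV_NONE, BV_CONE, BV_FRUSTUM, BV_SPHERE = 0, 1, 2, 3

camera = {
    "type": "perspective",
    "perspective": {"yfov": 0.7, "znear": 0.1},
    "extensions": {
        "MPEG_camera_control": {
            "anchors": 2,                  # anchor points in the camera path
            "segments": 1,                 # path segments between the anchors
            "boundingVolume": BV_SPHERE,   # spherical volume around the path
            "intrinsicParameters": False,  # intrinsics fixed at every anchor
            "accessor": 5,                 # accessor carrying the control data
        }
    },
}

gltf = {
    "asset": {"version": "2.0"},
    "extensionsUsed": ["MPEG_camera_control"],
    "extensionsRequired": ["MPEG_camera_control"],
    "cameras": [camera],
}

doc = json.dumps(gltf)
```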

The camera control information may be structured as follows:
- For every anchor point, the (x, y, z) coordinates of the anchor point may be represented using floating point values.
- For every path segment, the (i, j) indices of the first and second anchor points of the path segment may be represented as integer values.
- For the bounding volume:
  - If the bounding volume is BV_CONE, the (r1, r2) radii of the circles at the first and second anchor points may be provided.
  - If the bounding volume is BV_FRUSTUM, ((x, y, z)_topleft, w, h) may be provided for each anchor point of the path segment.
  - If the bounding volume is BV_SPHERE, r, the radius of the sphere, may be provided for each anchor point of the path segment.
- If intrinsicParameters is true, the intrinsic parameters object may be modified.
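As a rough illustration of how this control information could be laid out in a binary buffer exposed through an accessor, the following sketch packs the anchor coordinates as floats and the segment indices as integers, followed by per-anchor BV_SPHERE radii. The exact ordering, endianness, and layout are assumptions for illustration, not something the text above fixes.

```python
import struct

# Two anchor points (x, y, z) as floats, one segment as two int32 indices,
# and one BV_SPHERE radius per anchor, mirroring the list above.
anchors = [(0.0, 0.0, 0.0), (4.0, 0.0, 2.0)]
segments = [(0, 1)]
radii = [1.0, 1.5]

buf = b""
for x, y, z in anchors:
    buf += struct.pack("<3f", x, y, z)   # little-endian floats
for i, j in segments:
    buf += struct.pack("<2i", i, j)      # little-endian int32 indices
for r in radii:
    buf += struct.pack("<f", r)

# Reading the anchor coordinates back, e.g., in a presentation engine:
off = 0
decoded_anchors = []
for _ in range(len(anchors)):
    decoded_anchors.append(struct.unpack_from("<3f", buf, off))
    off += 12
```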

A presentation engine (e.g., presentation engine 114 of FIG. 2, or another element of client device 40, which may differ from the components shown in FIGS. 1 and 2) may support the MPEG_camera_control extension or other such data structures. If a scene provides camera control information, the presentation engine may restrict camera movement to the indicated paths, such that the (x, y, z) coordinates of the camera always lie on a path segment or within the bounding volume of a path segment. The presentation engine may provide visual, auditory, and/or haptic feedback to viewers as they approach the edge of a bounding volume.
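One way such a restriction could be enforced for a BV_SPHERE bounding volume is sketched below: the camera position is tested against the sphere swept along a path segment, projected back inside when it leaves, and flagged when near the edge so that feedback can be given. This is an illustrative sketch under simplifying assumptions (a constant radius along the segment, distinct anchor points); the helper names are not from the disclosure.

```python
import math

def clamp_to_segment_sphere(cam, a, b, radius):
    """Keep a camera position inside the BV_SPHERE bounding volume of the
    path segment from anchor a to anchor b (assumed distinct).
    Returns (allowed_position, near_edge_flag)."""
    ax, ay, az = a
    bx, by, bz = b
    cx, cy, cz = cam
    abx, aby, abz = bx - ax, by - ay, bz - az
    seg_len2 = abx * abx + aby * aby + abz * abz
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((cx - ax) * abx + (cy - ay) * aby + (cz - az) * abz) / seg_len2))
    px, py, pz = ax + t * abx, ay + t * aby, az + t * abz
    dx, dy, dz = cx - px, cy - py, cz - pz
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    if dist <= radius:
        # Inside the bounding volume; flag when within 10% of the edge so
        # the engine can give visual/auditory/haptic feedback.
        return (cx, cy, cz), dist >= 0.9 * radius
    # Outside: project the camera back onto the sphere around the path.
    s = radius / dist
    return (px + dx * s, py + dy * s, pz + dz * s), True
```

For example, a camera 3 units off a segment with a 1-unit radius is pulled back to 1 unit from the path, with the near-edge flag set.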

FIG. 6 is a conceptual diagram illustrating an example virtual object 220, which in this example is a chair. To offer an immersive experience to the viewer, it is important that the viewer interacts correctly with objects in the scene. The viewer should not be able to walk through solid objects in the scene, such as walls, chairs, and tables, or other such solid objects.

FIG. 6 depicts a 3D mesh representation of the chair, as well as collision boundaries defined as a set of cuboids. An MPEG_mesh_collision extension data structure may be defined to provide a description of the collision boundaries of such a 3D mesh. The extension data structure may be defined on mesh objects as a set of cuboids around the mesh geometry. Table 2 below represents an example set of attributes that may be included in such an extension data structure.

Table 2
  boundaries (array(object); default N/A): An array of boundary shapes that define the collision boundaries of the mesh object. A boundary may be a sphere or a cuboid.
  static (boolean; default true): Determines whether the object is affected by collisions. A static object will not be impacted by collisions, meaning that its position will not change when the viewer or another object collides with it.
  material (number; default N/A): An index of the collision material, which defines how a colliding object or viewer will interact with the object. This may include bounciness, friction, and the like.
  animations (array(object); default N/A): Defines animations that are triggered by a collision with, or an action on, the object. An animation may be restricted to a subset of other objects; for example, only the viewer may trigger the animation. It also contains a pointer to the animation to be executed when triggered.
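The attributes of Table 2 might appear on a mesh object as in the following sketch for the chair of FIG. 6. The JSON property spellings and the min/max-corner encoding of a cuboid are assumptions for illustration; Table 2 defines the attributes only descriptively.

```python
import json

# A mesh carrying the extension, with two cuboid collision boundaries.
chair_mesh = {
    "primitives": [{"attributes": {"POSITION": 0}}],
    "extensions": {
        "MPEG_mesh_collision": {
            # Each cuboid is given by its min/max corners here: one possible
            # encoding of the cuboid vertex coordinates.
            "boundaries": [
                {"cuboid": {"min": [-0.3, 0.0, -0.3], "max": [0.3, 0.5, 0.3]}},   # seat
                {"cuboid": {"min": [-0.3, 0.5, -0.3], "max": [0.3, 1.0, -0.2]}},  # backrest
            ],
            "static": True,   # the chair's position is unaffected by collisions
            "material": 0,    # index of a collision material (bounciness, friction)
            "animations": [], # no collision-triggered animations
        }
    },
}

doc = json.dumps({"meshes": [chair_mesh]})
```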

The mesh collision information may include the (x, y, z) coordinates of the cuboid vertices for a cuboid boundary, or the center and radius of the sphere for a spherical boundary. These values may be provided as floating point numbers.

The presentation engine may support the MPEG_mesh_collision extension or other such data structures. The presentation engine may ensure that the camera position (x, y, z) does not, at any point in time, become contained within one of the defined mesh cuboids. A collision may be signaled to the viewer through visual, auditory, and/or haptic feedback. The presentation engine may use the information about the boundaries of a node to initialize and configure a 3D physics engine that will detect collisions.
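A minimal version of that containment check could look as follows; the function names and the min/max-corner cuboid representation are illustrative assumptions, and a real engine would typically delegate this to its physics engine.

```python
def inside_cuboid(p, lo, hi):
    """Axis-aligned test: is point p strictly inside the cuboid spanned by
    the corners lo and hi?"""
    return all(lo[k] < p[k] < hi[k] for k in range(3))

def resolve_move(old_pos, new_pos, cuboids):
    """Return the camera position the presentation engine should use: the
    requested position if it lies in no collision cuboid, else the old one
    (the point at which feedback could also be triggered)."""
    for lo, hi in cuboids:
        if inside_cuboid(new_pos, lo, hi):
            return old_pos  # collision detected: refuse the move
    return new_pos

# One cuboid collision boundary, e.g., the seat of the chair in FIG. 6.
cuboids = [((-0.3, 0.0, -0.3), (0.3, 0.5, 0.3))]
blocked = resolve_move((1.0, 0.2, 0.0), (0.0, 0.2, 0.0), cuboids)
allowed = resolve_move((1.0, 0.2, 0.0), (2.0, 0.2, 0.0), cuboids)
```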

FIG. 7 is a flowchart illustrating an example method of retrieving media data in accordance with the techniques of this disclosure. The method of FIG. 7 is explained with respect to client device 40 of FIG. 1 and retrieval unit 52 of FIG. 2. Other such devices may be configured to perform this or a similar method.

Initially, client device 40 may retrieve media data (250). For example, retrieval unit 52 may retrieve media data conforming to, e.g., glTF 2.0. In some examples, retrieval unit 52 may retrieve the media data directly, e.g., via unicast, such as using DASH. In some examples, a middleware unit of retrieval unit 52 (such as eMBMS middleware 100 of FIG. 2) may receive the media data via broadcast or multicast, and then a DASH client (e.g., DASH client 110 of FIG. 2) may retrieve the media data from the middleware unit.

The media data may include a scene description. Thus, retrieval unit 52 or another component of client device 40 may extract the scene description from the media data (252). In accordance with the techniques of this disclosure, the scene description may be an MPEG scene description including camera control data. Retrieval unit 52 may provide the scene description to presentation engine 114. Presentation engine 114 may therefore receive the scene description and, in turn, determine camera control data for the three-dimensional scene from the scene description (254). The camera control data may conform to Table 1 above. That is, for example, the camera control data may include one or more anchor points for a camera path, one or more segments between the anchor points of the camera path, a bounding volume such as a cone, frustum, or sphere, intrinsic parameters that may be modified at each anchor point, and/or an accessor that provides the camera control information.

Presentation engine 114 may further determine movement restrictions from the camera control data (256). For example, presentation engine 114 may determine two or more anchor points, and permissible paths between the anchor points, from the movement restrictions of the camera control data. Additionally or alternatively, presentation engine 114 may determine a bounding volume, such as a cube, sphere, frustum, cone, or the like, from the movement restrictions of the camera control data. Presentation engine 114 may use the permissible paths to determine paths along which the virtual camera is permitted to move, and/or may permit the virtual camera to move within, but not outside of, the bounding volume. The permissible paths and/or bounding volumes may be defined to ensure that the virtual camera does not pass through 3D solid virtual objects, such as walls. That is, a bounding volume or permissible path may be defined to lie within the bounds of one or more 3D solid virtual objects, such as walls, a floor, a ceiling, or other objects of the 3D virtual scene.

Presentation engine 114 may then receive camera movement data (258). For example, presentation engine 114 may receive data from one or more controllers (such as handheld controllers and/or a headset including a display) representing an orientation of the headset and movement of the headset and/or the virtual camera (such as translational and/or rotational movement). Presentation engine 114 may determine that the camera movement data requests movement of the camera through a 3D solid virtual object, such as beyond a boundary of the bounding volume or along a path that is not one of the defined permissible paths (260). In response, presentation engine 114 may prevent the virtual camera from traversing the 3D solid virtual object (262).

In this manner, the method of FIG. 7 represents an example of a method of retrieving media data including: receiving, by a presentation engine, streamed media data, the streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the presentation engine, camera control data for the three-dimensional scene, the camera control data including data defining restrictions to prevent a virtual camera from traversing the at least one virtual solid object; receiving, by the presentation engine, camera movement data from a user, the camera movement data requesting movement of the virtual camera through the at least one virtual solid object; and preventing, by the presentation engine, using the camera control data, the virtual camera from traversing the at least one virtual solid object in response to the camera movement data.

FIG. 8 is a flowchart illustrating an example method of retrieving media data in accordance with the techniques of this disclosure. The method of FIG. 8 is explained with respect to client device 40 of FIG. 1 and retrieval unit 52 of FIG. 2. Other such devices may be configured to perform this or a similar method.

Initially, client device 40 may retrieve media data (280). For example, retrieval unit 52 may retrieve media data conforming to, e.g., glTF 2.0. In some examples, retrieval unit 52 may retrieve the media data directly, e.g., via unicast, such as using DASH. In some examples, a middleware unit of retrieval unit 52 (such as eMBMS middleware 100 of FIG. 2) may receive the media data via broadcast or multicast, and then a DASH client (e.g., DASH client 110 of FIG. 2) may retrieve the media data from the middleware unit.

The media data may include a scene description. Thus, retrieval unit 52 or another component of client device 40 may extract the scene description from the media data (282). In accordance with the techniques of this disclosure, the scene description may be an MPEG scene description including object collision data. Retrieval unit 52 may provide the scene description to presentation engine 114. Presentation engine 114 may therefore receive the scene description and, in turn, determine object collision data for one or more 3D solid virtual objects from the scene description (284). The object collision data may conform to Table 2 above. That is, the object collision data may include data representing, for example: boundaries representing an array of boundary shapes that define the collision boundaries of a mesh (3D virtual solid) object; data indicating whether the object is static (i.e., unaffected by collisions); a material representing a collision material for the object; and/or animations to be presented for the object in the event of a collision.

Presentation engine 114 may further evaluate the object collision data (286). For example, presentation engine 114 may determine boundaries representing an array of boundary shapes that define the collision boundaries of a mesh (3D virtual solid) object, data indicating whether the object is static, a material representing a collision material for the object, and/or animations to be presented for the object in the event of a collision. Presentation engine 114 may use the object collision data to determine how to react in the event of a collision with a 3D solid virtual object.

Presentation engine 114 may then receive camera movement data (288). For example, presentation engine 114 may receive data from one or more controllers (such as handheld controllers and/or a headset including a display) representing an orientation of the headset and movement of the headset and/or the virtual camera (such as translational and/or rotational movement). Presentation engine 114 may determine that the camera movement data requests movement of the camera through a 3D solid virtual object, such as into a 3D solid virtual object defined by the object collision data (290). In response, presentation engine 114 may prevent the virtual camera from traversing the 3D solid virtual object (292). For example, if the object is static (as indicated by the object collision data), presentation engine 114 may prevent the virtual camera from moving into and through the object. As another example, if the object is not static (e.g., is movable), presentation engine 114 may determine, from the object collision data, a reaction in response to the collision with the object, such as an animation to be played on the object, e.g., if the object is to tip over or move.
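The static/movable distinction described above can be sketched as a small dispatch. The callback names stand in for the presentation engine's actual feedback and animation hooks and are purely illustrative.

```python
def on_collision(obj, play_animation, block_camera):
    """Dispatch a collision response using Table 2 style collision data.
    obj is a dict with 'static' and 'animations' keys; the callbacks stand
    in for the presentation engine's feedback and physics hooks."""
    if obj["static"]:
        # Static object: its position is unchanged, camera movement is blocked.
        block_camera()
    else:
        # Movable object: trigger any collision animation (e.g., tipping over).
        for anim in obj["animations"]:
            play_animation(anim)

events = []
on_collision({"static": True, "animations": []},
             events.append, lambda: events.append("blocked"))
on_collision({"static": False, "animations": ["tip_over"]},
             events.append, lambda: events.append("blocked"))
# events is now ["blocked", "tip_over"]
```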

In this manner, the method of FIG. 8 represents an example of a method of retrieving media data including: receiving, by a presentation engine, streamed media data, the streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the presentation engine, object collision data representing boundaries of the at least one virtual solid object; receiving, by the presentation engine, camera movement data from a user, the camera movement data requesting movement of a virtual camera through the at least one virtual solid object; and preventing, by the presentation engine, using the object collision data, the virtual camera from traversing the at least one virtual solid object in response to the camera movement data.

Certain examples of the techniques of this disclosure are summarized in the following clauses:

Clause 1: A method of retrieving media data, the method comprising: receiving, by a presentation engine, streamed media data, the streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the presentation engine, camera control data for the three-dimensional scene, the camera control data including data defining permissible positions for a virtual camera; receiving, by the presentation engine, camera movement data from a user, the camera movement data requesting movement of the virtual camera through the at least one virtual solid object; and updating, by the presentation engine, a position of the virtual camera using the camera control data to ensure that the virtual camera remains within the permissible positions.

Clause 2: The method of clause 1, wherein updating the position of the virtual camera comprises preventing the virtual camera from traversing the at least one virtual solid object.

Clause 3: The method of clause 1, wherein the streamed media data comprises glTF 2.0 media data.

Clause 4: The method of clause 1, wherein receiving the streamed media data comprises requesting the streamed media data from a retrieval unit via an application programming interface (API).

Clause 5: The method of clause 1, wherein the camera control data is included in an MPEG scene description.

Clause 6: The method of clause 1, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing permissible camera movement vectors for the virtual camera, and wherein updating the position of the virtual camera comprises allowing the virtual camera to only traverse the segments between the anchor points.

Clause 7: The method of clause 1, wherein the camera control data includes data defining a bounding volume, the bounding volume representing a permissible camera movement volume for the virtual camera, and wherein updating the position of the virtual camera comprises allowing the virtual camera to only traverse the permissible camera movement volume.

Clause 8: The method of clause 7, wherein the data defining the bounding volume comprises data defining at least one of a cone, a frustum, or a sphere.

Clause 9: The method of clause 1, wherein the camera control data is included in an MPEG_camera_control extension.

Clause 10: The method of clause 9, wherein the MPEG_camera_control extension includes one or more of: anchor data representing a number of anchor points for a permissible path for the virtual camera; segment data representing a number of path segments for the permissible path between the anchor points; bounding volume data representing a bounding volume for the virtual camera; an intrinsic parameter indicating whether camera parameters are modified at each of the anchor points; or accessor data representing an index of an accessor that provides the camera control data.

Clause 11: The method of clause 1, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 12: The method of clause 1, further comprising determining a permissible path for the virtual camera from the camera control data, wherein updating the position of the virtual camera comprises ensuring that the virtual camera only moves along a virtual path within the permissible path defined in the camera control data.

Clause 13: The method of clause 1, wherein the camera control data is included in an MPEG_mesh_collision extension.

Clause 14: A device for retrieving media data, the device comprising: a memory configured to store media data; and one or more processors implemented in circuitry and configured to execute a presentation engine, the presentation engine being configured to: receive streamed media data, the streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive camera control data for the three-dimensional scene, the camera control data including data defining permissible positions for a virtual camera; receive camera movement data from a user, the camera movement data requesting movement of the virtual camera through the at least one virtual solid object; and update a position of the virtual camera using the camera control data to ensure that the virtual camera remains within the permissible positions.

Clause 15: The device of clause 14, wherein the presentation engine is configured to prevent the virtual camera from traversing the at least one virtual solid object.

Clause 16: The device of clause 14, wherein the streamed media data comprises glTF 2.0 media data.

Clause 17: The device of clause 14, wherein the presentation engine is configured to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 18: The device of clause 14, wherein the camera control data is included in an MPEG scene description.

Clause 19: The device of clause 14, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing permissible camera movement vectors for the virtual camera, and wherein, to update the position of the virtual camera, the presentation engine is configured to allow the virtual camera to only traverse the segments between the anchor points.

Clause 20: The device of clause 14, wherein the camera control data includes data defining a bounding volume, the bounding volume representing a permissible camera movement volume for the virtual camera, and wherein, to update the position of the virtual camera, the presentation engine is configured to allow the virtual camera to only traverse the permissible camera movement volume.

Clause 21: The device of clause 20, wherein the data defining the bounding volume comprises data defining at least one of a cone, a frustum, or a sphere.

Clause 22: The device of clause 14, wherein the camera control data is included in an MPEG_camera_control extension.

Clause 23: The device of clause 22, wherein the MPEG_camera_control extension includes one or more of: anchor data representing a number of anchor points for a permissible path for the virtual camera; segment data representing a number of path segments for the permissible path between the anchor points; bounding volume data representing a bounding volume for the virtual camera; an intrinsic parameter indicating whether camera parameters are modified at each of the anchor points; or accessor data representing an index of an accessor that provides the camera control data.

Clause 24: The device of clause 14, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 25: The device of clause 14, wherein the presentation engine is further configured to determine a permissible path for the virtual camera from the camera control data, and wherein, to update the position of the virtual camera, the presentation engine is configured to ensure that the virtual camera only moves along a virtual path within the permissible path defined in the camera control data.

條款26:如條款14之裝置,其中,該相機控制數據被包括在MPEG_mesh_collision延伸中。Clause 26: The apparatus of Clause 14, wherein the camera control data is included in an MPEG_mesh_collision extension.

Clause 27: A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor executing a presentation engine to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive camera control data for the three-dimensional scene, the camera control data including data defining allowable positions for a virtual camera; receive camera movement data from a user, the camera movement data requesting that the virtual camera move through the at least one virtual solid object; and update a position of the virtual camera using the camera control data to ensure that the virtual camera remains within the allowable positions.

Clause 28: The computer-readable storage medium of Clause 27, wherein the instructions that cause the processor to update the position of the virtual camera comprise instructions that cause the processor to prevent the virtual camera from passing through the at least one virtual solid object.

Clause 29: The computer-readable medium of Clause 27, wherein the streamed media data comprises glTF 2.0 media data.

Clause 30: The computer-readable medium of Clause 27, wherein the instructions that cause the processor to receive the streamed media data comprise instructions that cause the processor to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 31: The computer-readable medium of Clause 27, wherein the camera control data is included in an MPEG scene description.

Clause 32: The computer-readable medium of Clause 27, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing allowable camera movement vectors for the virtual camera, and wherein the instructions that cause the processor to update the position of the virtual camera comprise instructions that cause the processor to allow the virtual camera to traverse only the segments between the anchor points.
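The anchor-and-segment constraint of Clause 32 can be sketched as a simple geometric clamp: a requested camera position is projected onto the closest point of any segment joining consecutive anchor points. This is an assumed implementation for illustration, not the normative update algorithm.

```python
import math

def closest_on_segment(p, a, b):
    """Closest point to p on the segment from anchor a to anchor b."""
    ax, ay, az = a; bx, by, bz = b; px, py, pz = p
    abx, aby, abz = bx - ax, by - ay, bz - az
    denom = abx * abx + aby * aby + abz * abz
    if denom == 0.0:
        return a                            # degenerate segment: both anchors coincide
    t = ((px - ax) * abx + (py - ay) * aby + (pz - az) * abz) / denom
    t = max(0.0, min(1.0, t))               # clamp so the point stays between anchors
    return (ax + t * abx, ay + t * aby, az + t * abz)

def clamp_to_path(requested, anchors):
    """Snap a requested camera position onto the allowable anchor path."""
    best, best_d = None, math.inf
    for a, b in zip(anchors, anchors[1:]):  # consecutive anchor pairs form segments
        q = closest_on_segment(requested, a, b)
        d = sum((r - s) ** 2 for r, s in zip(requested, q))
        if d < best_d:
            best, best_d = q, d
    return best

anchors = [(0, 0, 0), (4, 0, 0), (4, 0, 4)]
assert clamp_to_path((2, 3, 0), anchors) == (2.0, 0.0, 0.0)
```

With this clamp in the update step, any camera movement data that points off the path resolves to the nearest allowed position rather than being applied directly.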

條款33:Clause 33: The computer-readable medium of Clause 27, wherein the camera control data includes data defining a bounding volume representing an allowable camera movement volume for the virtual camera, and wherein the instructions that cause the processor to update the position of the virtual camera comprise instructions that cause the processor to allow the virtual camera to traverse only the allowable camera movement volume.

Clause 34: The computer-readable medium of Clause 33, wherein the data defining the bounding volume comprises data defining at least one of a cone, a frustum, or a sphere.
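For the spherical case among the bounding volume shapes named above (cone, frustum, sphere), enforcement can be sketched as pulling an out-of-bounds request back onto the sphere surface. This is a minimal illustrative sketch, assuming a sphere given by a center and radius; cones and frusta would need their own containment tests.

```python
import math

def clamp_to_sphere(requested, center, radius):
    """Keep the camera inside a spherical allowable movement volume."""
    dx = [r - c for r, c in zip(requested, center)]
    dist = math.sqrt(sum(d * d for d in dx))
    if dist <= radius:
        return tuple(requested)            # already inside: accept the move as-is
    scale = radius / dist                  # outside: project back onto the surface
    return tuple(c + d * scale for c, d in zip(center, dx))

center, radius = (0.0, 0.0, 0.0), 5.0
assert clamp_to_sphere((1.0, 1.0, 1.0), center, radius) == (1.0, 1.0, 1.0)
assert clamp_to_sphere((6.0, 8.0, 0.0), center, radius) == (3.0, 4.0, 0.0)
```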

Clause 35: The computer-readable medium of Clause 27, wherein the camera control data is included in an MPEG_camera_control extension.

Clause 36: The computer-readable medium of Clause 35, wherein the MPEG_camera_control extension includes one or more of: anchor point data representing a number of anchor points for an allowable path for the virtual camera; segment data representing a number of path segments of the allowable path between the anchor points; bounding volume data representing a bounding volume for the virtual camera; intrinsic parameters indicating whether camera parameters are modified at each of the anchor points; and accessor data representing an index of an accessor that provides the camera control data.

Clause 37: The computer-readable medium of Clause 27, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 38: The computer-readable medium of Clause 27, further comprising instructions that cause the processor to determine an allowable path for the virtual camera from the camera control data, wherein the instructions that cause the processor to update the position of the virtual camera comprise instructions that cause the processor to ensure that the virtual camera moves only along virtual paths within the allowable path defined in the camera control data.

Clause 39: The computer-readable medium of Clause 27, wherein the camera control data is included in an MPEG_mesh_collision extension.

Clause 40: A device for retrieving media data, the device comprising: means for receiving streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; means for receiving camera control data for the three-dimensional scene, the camera control data including data defining allowable positions for a virtual camera; means for receiving camera movement data from a user, the camera movement data requesting that the virtual camera move through the at least one virtual solid object; and means for updating a position of the virtual camera using the camera control data to ensure that the virtual camera remains within the allowable positions.

Clause 41: A method of retrieving media data, the method comprising: receiving, by a presentation engine, streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the presentation engine, object collision data representing boundaries of the at least one virtual solid object; receiving, by the presentation engine, camera movement data from a user, the camera movement data requesting that a virtual camera move through the at least one virtual solid object; and updating, by the presentation engine, a position of the virtual camera using the object collision data to ensure that the virtual camera remains outside the at least one virtual solid object in response to the camera movement data.

Clause 42: The method of Clause 41, wherein updating the position of the virtual camera comprises preventing the virtual camera from passing through the at least one virtual solid object.

Clause 43: The method of Clause 41, wherein receiving the object collision data comprises receiving an MPEG_mesh_collision extension.

Clause 44: The method of Clause 43, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.

Clause 45: The method of Clause 44, wherein the MPEG_mesh_collision extension includes data defining at least one of: boundaries of the 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or an animation to be presented in response to the virtual camera contacting the 3D mesh.

Clause 46: The method of Clause 41, wherein receiving the object collision data comprises receiving data including one or more of: boundary data representing one or more collision boundaries of the at least one virtual solid object; static data representing whether the at least one virtual solid object is affected by collisions; material data representing how a colliding object interacts with the at least one virtual solid object; or animation data representing an animation triggered by a collision with the at least one virtual solid object.
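The per-object collision record of Clause 46 and the position update of Clause 41 can be sketched together as follows. The field names and the axis-aligned-box boundary are assumptions for illustration; a real MPEG_mesh_collision payload would carry mesh-level boundaries in its own normative form.

```python
# Hypothetical collision record for one virtual solid object (a wall-like
# box). Field names mirror the four data kinds of Clause 46 but are
# illustrative, not the normative extension schema.
solid_object = {
    "boundary": ((-1.0, 0.0, -1.0), (1.0, 2.0, 1.0)),  # AABB min/max corners
    "static": True,          # the object itself is unaffected by collisions
    "material": "concrete",  # how colliding objects interact with it
    "animation": None,       # no animation triggered on collision
}

def inside_aabb(p, box):
    """True if point p lies strictly inside the axis-aligned box."""
    lo, hi = box
    return all(l < c < h for c, l, h in zip(p, lo, hi))

def update_camera(current, requested, obj):
    """Reject a move that would place the camera inside the solid object."""
    if inside_aabb(requested, obj["boundary"]):
        return current       # collision: the camera stays where it was
    return requested

cam = (0.0, 1.0, -3.0)
cam = update_camera(cam, (0.0, 1.0, 0.0), solid_object)   # blocked by the object
assert cam == (0.0, 1.0, -3.0)
cam = update_camera(cam, (2.5, 1.0, 0.0), solid_object)   # moves around it
assert cam == (2.5, 1.0, 0.0)
```

Keeping the rejected move at the previous position is the simplest policy consistent with Clause 42; an engine could instead slide the camera along the boundary or trigger the object's collision animation.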

Clause 47: The method of Clause 41, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 48: The method of Clause 41, wherein the streamed media data comprises glTF 2.0 media data.

Clause 49: The method of Clause 41, wherein receiving the streamed media data comprises requesting the streamed media data from a retrieval unit via an application programming interface (API).

Clause 50: The method of Clause 41, wherein the object collision data is included in an MPEG scene description.

Clause 51: A device for retrieving media data, the device comprising: a memory for storing media data; and one or more processors implemented in circuitry and configured to execute a presentation engine, the presentation engine being configured to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive object collision data representing boundaries of the at least one virtual solid object; receive camera movement data from a user, the camera movement data requesting that a virtual camera move through the at least one virtual solid object; and update a position of the virtual camera using the object collision data to ensure that the virtual camera remains outside the at least one virtual solid object in response to the camera movement data.

Clause 52: The device of Clause 51, wherein, to update the position of the virtual camera, the presentation engine is configured to prevent the virtual camera from passing through the at least one virtual solid object.

Clause 53: The device of Clause 51, wherein, to receive the object collision data, the presentation engine is configured to receive an MPEG_mesh_collision extension.

Clause 54: The device of Clause 53, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.

Clause 55: The device of Clause 54, wherein the MPEG_mesh_collision extension includes data defining at least one of: boundaries of the 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or an animation to be presented in response to the virtual camera contacting the 3D mesh.

Clause 56: The device of Clause 51, wherein, to receive the object collision data, the presentation engine is configured to receive data including one or more of: boundary data representing one or more collision boundaries of the at least one virtual solid object; static data representing whether the at least one virtual solid object is affected by collisions; material data representing how a colliding object interacts with the at least one virtual solid object; or animation data representing an animation triggered by a collision with the at least one virtual solid object.

Clause 57: The device of Clause 51, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 58: The device of Clause 51, wherein the streamed media data comprises glTF 2.0 media data.

Clause 59: The device of Clause 51, wherein, to receive the streamed media data, the presentation engine is configured to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 60: The device of Clause 51, wherein the object collision data is included in an MPEG scene description.

Clause 61: A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive object collision data representing boundaries of the at least one virtual solid object; receive camera movement data from a user, the camera movement data requesting that a virtual camera move through the at least one virtual solid object; and update a position of the virtual camera using the object collision data to ensure that the virtual camera remains outside the at least one virtual solid object in response to the camera movement data.

Clause 62: The computer-readable medium of Clause 61, wherein the instructions that cause the processor to update the position of the virtual camera comprise instructions that cause the processor to prevent the virtual camera from passing through the at least one virtual solid object.

Clause 63: The computer-readable medium of Clause 61, wherein the instructions that cause the processor to receive the object collision data comprise instructions that cause the processor to receive an MPEG_mesh_collision extension.

Clause 64: The computer-readable medium of Clause 63, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.

Clause 65: The computer-readable medium of Clause 64, wherein the MPEG_mesh_collision extension includes data defining at least one of: boundaries of the 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or an animation to be presented in response to the virtual camera contacting the 3D mesh.

Clause 66: The computer-readable medium of Clause 61, wherein the instructions that cause the processor to receive the object collision data comprise instructions that cause the processor to receive data including one or more of: boundary data representing one or more collision boundaries of the at least one virtual solid object; static data representing whether the at least one virtual solid object is affected by collisions; material data representing how a colliding object interacts with the at least one virtual solid object; or animation data representing an animation triggered by a collision with the at least one virtual solid object.

Clause 67: The computer-readable medium of Clause 61, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 68: The computer-readable medium of Clause 61, wherein the streamed media data comprises glTF 2.0 media data.

Clause 69: The computer-readable medium of Clause 61, wherein the instructions that cause the processor to receive the streamed media data comprise instructions that cause the processor to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 70: The computer-readable medium of Clause 61, wherein the object collision data is included in an MPEG scene description.

Clause 71: A device for retrieving media data, the device comprising: means for receiving streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; means for receiving object collision data representing boundaries of the at least one virtual solid object; means for receiving camera movement data from a user, the camera movement data requesting that a virtual camera move through the at least one virtual solid object; and means for updating a position of the virtual camera using the object collision data to ensure that the virtual camera remains outside the at least one virtual solid object in response to the camera movement data.

Clause 72: A method of retrieving media data, the method comprising: receiving, by a presentation engine, streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the presentation engine, camera control data for the three-dimensional scene, the camera control data including data defining allowable positions for a virtual camera; receiving, by the presentation engine, camera movement data from a user, the camera movement data requesting that the virtual camera move through the at least one virtual solid object; and updating, by the presentation engine, a position of the virtual camera using the camera control data to ensure that the virtual camera remains within the allowable positions.

Clause 73: The method of Clause 72, wherein updating the position of the virtual camera comprises preventing the virtual camera from passing through the at least one virtual solid object.

Clause 74: The method of any of Clauses 72 and 73, wherein the streamed media data comprises glTF 2.0 media data.

Clause 75: The method of any of Clauses 72-74, wherein receiving the streamed media data comprises requesting the streamed media data from a retrieval unit via an application programming interface (API).

Clause 76: The method of any of Clauses 72-75, wherein the camera control data is included in an MPEG scene description.

Clause 77: The method of any of Clauses 72-76, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing allowable camera movement vectors for the virtual camera, and wherein updating the position of the virtual camera comprises allowing the virtual camera to traverse only the segments between the anchor points.

Clause 78: The method of any of Clauses 72-77, wherein the camera control data includes data defining a bounding volume representing an allowable camera movement volume for the virtual camera, and wherein updating the position of the virtual camera comprises allowing the virtual camera to traverse only the allowable camera movement volume.

Clause 79: The method of Clause 78, wherein the data defining the bounding volume comprises data defining at least one of a cone, a frustum, or a sphere.

Clause 80: The method of any of Clauses 72-79, wherein the camera control data is included in an MPEG_camera_control extension.

Clause 81: The method of Clause 80, wherein the MPEG_camera_control extension includes one or more of: anchor point data representing a number of anchor points for an allowable path for the virtual camera; segment data representing a number of path segments of the allowable path between the anchor points; bounding volume data representing a bounding volume for the virtual camera; intrinsic parameters indicating whether camera parameters are modified at each of the anchor points; and accessor data representing an index of an accessor that provides the camera control data.

Clause 82: The method of any of Clauses 72-81, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 83: The method of Clause 72, further comprising determining an allowable path for the virtual camera from the camera control data, wherein updating the position of the virtual camera comprises ensuring that the virtual camera moves only along virtual paths within the allowable path defined in the camera control data.

Clause 84: The method of any of Clauses 72-83, wherein the camera control data is included in an MPEG_mesh_collision extension.

Clause 85: A device for retrieving media data, the device comprising: a memory configured to store media data; and one or more processors implemented in circuitry and configured to execute a presentation engine, the presentation engine being configured to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive camera control data for the three-dimensional scene, the camera control data including data defining allowable positions for a virtual camera; receive camera movement data from a user, the camera movement data requesting that the virtual camera move through the at least one virtual solid object; and update a position of the virtual camera using the camera control data to ensure that the virtual camera remains within the allowable positions.

Clause 86: The device of Clause 85, wherein the presentation engine is configured to prevent the virtual camera from passing through the at least one virtual solid object.

Clause 87: The device of any of Clauses 85 and 86, wherein the streamed media data comprises glTF 2.0 media data.

Clause 88: The device of any of Clauses 85-87, wherein the presentation engine is configured to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 89: The device of any of Clauses 85-88, wherein the camera control data is included in an MPEG scene description.

Clause 90: The device of any of Clauses 85-89, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing allowable camera movement vectors for the virtual camera, and wherein, to update the position of the virtual camera, the presentation engine is configured to allow the virtual camera to traverse only the segments between the anchor points.

Clause 91: The device of any of Clauses 85-90, wherein the camera control data includes data defining a bounding volume representing an allowable camera movement volume for the virtual camera, and wherein, to update the position of the virtual camera, the presentation engine is configured to allow the virtual camera to traverse only the allowable camera movement volume.

Clause 92: The device of Clause 91, wherein the data defining the bounding volume comprises data defining at least one of a cone, a frustum, or a sphere.

Clause 93: The device of any of Clauses 85-92, wherein the camera control data is included in an MPEG_camera_control extension.

Clause 94: The device of Clause 93, wherein the MPEG_camera_control extension includes one or more of: anchor point data representing a number of anchor points for an allowable path for the virtual camera; segment data representing a number of path segments of the allowable path between the anchor points; bounding volume data representing a bounding volume for the virtual camera; intrinsic parameters indicating whether camera parameters are modified at each of the anchor points; and accessor data representing an index of an accessor that provides the camera control data.

Clause 95: The device of any of Clauses 85-94, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 96: The device of any of Clauses 85-95, wherein the presentation engine is further configured to determine an allowable path for the virtual camera from the camera control data, and wherein, to update the position of the virtual camera, the presentation engine is configured to ensure that the virtual camera moves only along virtual paths within the allowable path defined in the camera control data.

Clause 97: The device of any of Clauses 85-96, wherein the camera control data is included in an MPEG_mesh_collision extension.

條款98:一種具有儲存在其上的指令的計算機可讀儲存媒體,該指令在被執行時使得執行呈現引擎的處理器進行以下操作:接收串流媒體數據,該串流媒體數據表示包括至少一個虛擬固體物體的虛擬三維場景;接收用於該三維場景的相機控制數據,該相機控制數據包括定義用於虛擬相機的可允許位置的數據;從用戶接收相機移動數據,該相機移動數據請求該虛擬相機移動穿過該至少一個虛擬固體物體;以及使用該相機控制數據,更新該虛擬相機之位置,以確保該虛擬相機保持在該可允許位置內。Clause 98: A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor executing a rendering engine to: receive streaming media data representing at least one A virtual three-dimensional scene of virtual solid objects; receiving camera control data for the three-dimensional scene, the camera control data including data defining allowable positions for the virtual camera; receiving camera movement data from a user, the camera movement data requesting the virtual moving a camera through the at least one virtual solid object; and updating a position of the virtual camera using the camera control data to ensure that the virtual camera remains within the allowable position.

條款99:如條款98之計算機可讀儲存媒體,其中,使得該處理器更新該虛擬相機之該位置的該指令包含:使得該處理器防止該虛擬相機穿越該至少一個虛擬固體物體的指令。Clause 99: The computer-readable storage medium of Clause 98, wherein the instructions causing the processor to update the position of the virtual camera comprise instructions causing the processor to prevent the virtual camera from passing through the at least one virtual solid object.

條款100:如條款98及99中任一項之計算機可讀媒體,其中,該串流媒體數據包含glTF 2.0媒體數據。Clause 100: The computer-readable medium of any one of clauses 98 and 99, wherein the streaming media data comprises glTF 2.0 media data.

條款101:如條款98-100中任一項之計算機可讀媒體,其中,使得該處理器接收該串流媒體數據的該指令包含使得該處理器經由應用程式介面(API)從檢索單元請求該串流媒體數據的指令。Clause 101: The computer-readable medium of any one of clauses 98-100, wherein the instructions causing the processor to receive the streaming media data comprise causing the processor to request the stream from a retrieval unit via an application programming interface (API). Instructions for streaming media data.

條款102:如條款98-101中任一項之計算機可讀媒體,其中,該相機控制數據被包括在MPEG場景描述中。Clause 102: The computer-readable medium of any one of Clauses 98-101, wherein the camera control data is included in an MPEG scene description.

條款103:如條款98-102中任一項之計算機可讀媒體,其中,該相機控制數據包括定義兩個或更多個錨點及該等錨點之間的一個或多個分段的數據,該分段表示用於該虛擬相機的可允許相機移動向量,並且其中,使得該處理器更新該虛擬相機之該位置的該指令包含使得該處理器允許該虛擬相機僅越過該等錨點之間的該分段的指令。Clause 103: The computer-readable medium of any one of clauses 98-102, wherein the camera control data includes data defining two or more anchor points and one or more segments between the anchor points , the segment represents an allowable camera movement vector for the virtual camera, and wherein the instruction causing the processor to update the position of the virtual camera includes causing the processor to allow the virtual camera to pass only between the anchor points Instructions for this segment between.

Clause 104: The computer-readable medium of clause 103, wherein the camera control data includes data defining a bounding volume representing an allowable camera movement volume for the virtual camera, and wherein the instructions that cause the processor to update the position of the virtual camera comprise instructions that cause the processor to allow the virtual camera to move only within the allowable camera movement volume.

Clause 105: The computer-readable medium of any of clauses 98-104, wherein the data defining the bounding volume comprises data defining at least one of a cone, a frustum, or a sphere.

Clause 106: The computer-readable medium of clause 105, wherein the camera control data is included in an MPEG_camera_control extension.

Clause 107: The computer-readable medium of any of clauses 98-106, wherein the MPEG_camera_control extension includes one or more of: anchor point data representing a number of anchor points of an allowable path for the virtual camera; segment data representing a number of path segments of the allowable path between the anchor points; bounding volume data representing a bounding volume for the virtual camera; intrinsic parameters indicating whether camera parameters are modified at each of the anchor points; and accessor data representing an index of an accessor that provides the camera control data.
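The fields enumerated in clause 107 can be pictured as a small record attached to the scene description. The sketch below mirrors those fields in a plain dictionary and adds a sphere containment test for the bounding volume of clause 105; the key names and values are assumptions for illustration, not the normative MPEG_camera_control syntax:

```python
# Hypothetical MPEG_camera_control-style payload; field names follow clause 107
# but are illustrative only.
camera_control = {
    "anchors": 4,          # number of anchor points on the allowable path
    "segments": 3,         # number of path segments between those anchors
    "boundary": {"type": "sphere", "center": [0.0, 1.5, 0.0], "radius": 3.0},
    "intrinsics": False,   # camera parameters are not modified at anchors
    "accessor": 7,         # index of the accessor providing the control data
}

def is_inside_sphere(point, boundary):
    """Test a candidate camera position against a spherical bounding volume."""
    d2 = sum((p - c) ** 2 for p, c in zip(point, boundary["center"]))
    return d2 <= boundary["radius"] ** 2
```

Cone and frustum volumes would need their own containment tests, but the update rule is the same: reject any requested position for which the test fails.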

Clause 108: The computer-readable medium of any of clauses 98-107, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 109: The computer-readable medium of any of clauses 98-108, further comprising instructions that cause the processor to determine an allowable path for the virtual camera according to the camera control data, wherein the instructions that cause the processor to update the position of the virtual camera comprise instructions that cause the processor to ensure that the virtual camera moves only along virtual paths within the allowable path defined in the camera control data.

Clause 110: The computer-readable medium of any of clauses 98-109, wherein the camera control data is included in an MPEG_mesh_collision extension.

Clause 111: An apparatus for retrieving media data, the apparatus comprising: means for receiving streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; means for receiving camera control data for the three-dimensional scene, the camera control data including data defining allowable positions for a virtual camera; means for receiving camera movement data from a user, the camera movement data requesting that the virtual camera move through the at least one virtual solid object; and means for updating a position of the virtual camera using the camera control data to ensure that the virtual camera remains within the allowable positions.

Clause 112: A method of retrieving media data, the method comprising: receiving, by a rendering engine, streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the rendering engine, object collision data representing boundaries of the at least one virtual solid object; receiving, by the rendering engine, camera movement data from a user, the camera movement data requesting that a virtual camera move through the at least one virtual solid object; and updating, by the rendering engine, a position of the virtual camera using the object collision data to ensure that the virtual camera remains outside the at least one virtual solid object in response to the camera movement data.
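The update step of clause 112 can be sketched as a collision test of the requested position against bounds derived from the object collision data; a move that would land inside the solid object is rejected and the previous position kept. A simplified illustration using an axis-aligned bounding box (a real renderer would test against the object's 3D mesh):

```python
def inside_aabb(p, box_min, box_max):
    """True if point p lies inside the axis-aligned collision bounds."""
    return all(lo <= c <= hi for c, lo, hi in zip(p, box_min, box_max))

def update_camera(current, requested, box_min, box_max):
    """Keep the virtual camera outside the solid object's collision bounds."""
    if inside_aabb(requested, box_min, box_max):
        return current          # reject the move; camera stays put
    return requested            # move is outside the object; accept it
```

Smoother behavior (sliding along the surface rather than stopping) is possible, but the clause only requires that the camera remain outside the object.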

Clause 113: A method comprising a combination of the method of any of clauses 72-84 and the method of clause 112.

Clause 114: The method of any of clauses 112 and 113, wherein updating the position of the virtual camera comprises preventing the virtual camera from passing through the at least one virtual solid object.

Clause 115: The method of any of clauses 112-114, wherein receiving the object collision data comprises receiving an MPEG_mesh_collision extension.

Clause 116: The method of clause 115, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.

Clause 117: The method of clause 116, wherein the MPEG_mesh_collision extension includes data defining at least one of: boundaries of the 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or an animation to be rendered in response to the virtual camera contacting the 3D mesh.

Clause 118: The method of any of clauses 112-117, wherein receiving the object collision data comprises receiving data including one or more of: boundary data representing one or more collision boundaries of the at least one virtual solid object; static data representing whether the at least one virtual solid object is affected by collisions; material data representing how colliding objects interact with the at least one virtual solid object; or animation data representing an animation triggered by a collision with the at least one virtual solid object.
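The four kinds of object collision data listed in clause 118 can be pictured as one record per solid object. The sketch below uses assumed key names (not the normative MPEG_mesh_collision syntax) to show how a renderer might consume such a record:

```python
# Illustrative collision record mirroring the four fields of clause 118;
# all key names are assumptions for this sketch.
collision_data = {
    "boundaries": [{"min": [-1.0, 0.0, -1.0], "max": [1.0, 2.0, 1.0]}],
    "static": True,        # the object itself is unaffected by collisions
    "material": "rigid",   # colliding objects do not penetrate the surface
    "animation": None,     # no animation is triggered on contact
}

def triggers_animation(record):
    """A collision triggers an animation only when animation data is present."""
    return record["animation"] is not None
```

A wall would typically be static and rigid with no animation, while, say, a curtain might carry animation data to play when the camera brushes against it.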

Clause 119: The method of any of clauses 112-118, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 120: The method of any of clauses 112-119, wherein the streamed media data comprises glTF 2.0 media data.

Clause 121: The method of any of clauses 112-120, wherein receiving the streamed media data comprises requesting the streamed media data from a retrieval unit via an application programming interface (API).

Clause 122: The method of any of clauses 112-121, wherein the object collision data is included in an MPEG scene description.

Clause 123: An apparatus for retrieving media data, the apparatus comprising: a memory configured to store media data; and one or more processors implemented in circuitry and configured to execute a rendering engine, the rendering engine being configured to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive object collision data representing boundaries of the at least one virtual solid object; receive camera movement data from a user, the camera movement data requesting that a virtual camera move through the at least one virtual solid object; and update a position of the virtual camera using the object collision data to ensure that the virtual camera remains outside the at least one virtual solid object in response to the camera movement data.

Clause 124: An apparatus comprising a combination of the apparatus of any of clauses 85-97 and the apparatus of clause 123.

Clause 125: The apparatus of any of clauses 123 and 124, wherein, to update the position of the virtual camera, the rendering engine is configured to prevent the virtual camera from passing through the at least one virtual solid object.

Clause 126: The apparatus of any of clauses 123-125, wherein, to receive the object collision data, the rendering engine is configured to receive an MPEG_mesh_collision extension.

Clause 127: The apparatus of clause 126, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.

Clause 128: The apparatus of clause 127, wherein the MPEG_mesh_collision extension includes data defining at least one of: boundaries of the 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or an animation to be rendered in response to the virtual camera contacting the 3D mesh.

Clause 129: The apparatus of any of clauses 123-128, wherein, to receive the object collision data, the rendering engine is configured to receive data including one or more of: boundary data representing one or more collision boundaries of the at least one virtual solid object; static data representing whether the at least one virtual solid object is affected by collisions; material data representing how colliding objects interact with the at least one virtual solid object; or animation data representing an animation triggered by a collision with the at least one virtual solid object.

Clause 130: The apparatus of any of clauses 123-129, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 131: The apparatus of any of clauses 123-130, wherein the streamed media data comprises glTF 2.0 media data.

Clause 132: The apparatus of any of clauses 123-131, wherein, to receive the streamed media data, the rendering engine is configured to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 133: The apparatus of any of clauses 123-132, wherein the object collision data is included in an MPEG scene description.

Clause 134: A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to: receive streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receive object collision data representing boundaries of the at least one virtual solid object; receive camera movement data from a user, the camera movement data requesting that a virtual camera move through the at least one virtual solid object; and update a position of the virtual camera using the object collision data to ensure that the virtual camera remains outside the at least one virtual solid object in response to the camera movement data.

Clause 135: A computer-readable storage medium comprising a combination of the computer-readable storage medium of any of clauses 98-110 and the computer-readable storage medium of clause 134.

Clause 136: The computer-readable medium of any of clauses 134 and 135, wherein the instructions that cause the processor to update the position of the virtual camera comprise instructions that cause the processor to prevent the virtual camera from passing through the at least one virtual solid object.

Clause 137: The computer-readable medium of any of clauses 134-136, wherein the instructions that cause the processor to receive the object collision data comprise instructions that cause the processor to receive an MPEG_mesh_collision extension.

Clause 138: The computer-readable medium of any of clauses 134-137, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.

Clause 139: The computer-readable medium of any of clauses 134-138, wherein the MPEG_mesh_collision extension includes data defining at least one of: boundaries of the 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or an animation to be rendered in response to the virtual camera contacting the 3D mesh.

Clause 140: The computer-readable medium of any of clauses 134-139, wherein the instructions that cause the processor to receive the object collision data comprise instructions that cause the processor to receive data including one or more of: boundary data representing one or more collision boundaries of the at least one virtual solid object; static data representing whether the at least one virtual solid object is affected by collisions; material data representing how colliding objects interact with the at least one virtual solid object; or animation data representing an animation triggered by a collision with the at least one virtual solid object.

Clause 141: The computer-readable medium of any of clauses 134-140, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

Clause 142: The computer-readable medium of any of clauses 134-141, wherein the streamed media data comprises glTF 2.0 media data.

Clause 143: The computer-readable medium of any of clauses 134-142, wherein the instructions that cause the processor to receive the streamed media data comprise instructions that cause the processor to request the streamed media data from a retrieval unit via an application programming interface (API).

Clause 144: The computer-readable medium of any of clauses 134-143, wherein the object collision data is included in an MPEG scene description.

Clause 145: A method of retrieving media data, the method comprising: receiving, by a rendering engine, streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the rendering engine, camera control data for the three-dimensional scene, the camera control data including data defining restrictions that prevent a virtual camera from passing through the at least one virtual solid object; receiving, by the rendering engine, camera movement data from a user, the camera movement data requesting that the virtual camera move through the at least one virtual solid object; and preventing, using the camera control data, the virtual camera from passing through the at least one virtual solid object in response to the camera movement data.

Clause 146: The method of clause 145, wherein the streamed media data comprises glTF 2.0 media data.

Clause 147: The method of any of clauses 145 and 146, wherein receiving the streamed media data comprises requesting the streamed media data from a retrieval unit via an application programming interface (API).

Clause 148: The method of any of clauses 145-147, wherein the camera control data is included in an MPEG scene description.

Clause 149: The method of any of clauses 145-148, wherein the camera control data is included in an MPEG_camera_control extension.

Clause 150: The method of clause 149, wherein the MPEG_camera_control extension includes data defining two or more anchor points and one or more segments between the anchor points, the segments representing allowable camera movement vectors.

Clause 151: The method of any of clauses 145-148, wherein the MPEG_camera_control extension includes data defining a bounding volume representing an allowable camera movement volume.

Clause 152: The method of clause 151, wherein the data defining the bounding volume comprises data defining at least one of a cone, a frustum, or a sphere.

Clause 153: The method of any of clauses 149-152, wherein the MPEG_camera_control extension conforms to the data of Table 1 above.

Clause 154: The method of any of clauses 149-153, wherein the at least one virtual solid object comprises a virtual wall.

Clause 155: The method of any of clauses 149-154, wherein preventing the virtual camera from passing through the at least one virtual solid object comprises preventing the virtual camera from moving along virtual paths beyond the allowable path defined in the MPEG_camera_control extension.

Clause 156: The method of any of clauses 145-155, wherein the camera control data is included in an MPEG_mesh_collision extension.

Clause 157: The method of any of clauses 145-155, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.

Clause 158: The method of clause 157, wherein the MPEG_mesh_collision extension includes data defining at least one of: boundaries of the 3D mesh, a material for the 3D mesh, or an animation to be rendered in response to the virtual camera contacting the 3D mesh.

Clause 159: The method of any of clauses 156-158, wherein the MPEG_mesh_collision extension conforms to Table 2 above.

Clause 160: The method of any of clauses 156-159, wherein preventing the virtual camera from passing through the at least one virtual solid object comprises using the MPEG_mesh_collision extension to prevent the virtual camera from entering the at least one virtual solid object.

Clause 161: An apparatus for retrieving media data, the apparatus comprising one or more means for performing the method of any of clauses 145-160.

Clause 162: The apparatus of clause 161, wherein the one or more means comprise one or more processors implemented in circuitry.

Clause 163: The apparatus of clause 161, wherein the apparatus comprises at least one of: an integrated circuit; a microprocessor; or a wireless communication device.

Clause 164: An apparatus for retrieving media data, the apparatus comprising: means for receiving streamed media data representing a virtual three-dimensional scene including at least one virtual solid object; means for receiving camera control data for the three-dimensional scene, the camera control data including data defining restrictions that prevent a virtual camera from passing through the at least one virtual solid object; means for receiving camera movement data from a user, the camera movement data requesting that the virtual camera move through the at least one virtual solid object; and means for preventing, using the camera control data, the virtual camera from passing through the at least one virtual solid object in response to the camera movement data.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

10: system; 20: content preparation device; 22: audio source; 24: video source; 26: audio encoder; 28: video encoder; 30: encapsulation unit; 32: output interface; 40: client device; 42: audio output; 44: video output; 46: audio decoder; 48: video decoder; 50: decapsulation unit; 52: retrieval unit; 54: network interface; 60: server device; 62: storage medium; 64: multimedia content; 66: manifest file; 68A-68N: representations; 70: request processing unit; 72: network interface; 74: network; 100: eMBMS middleware unit; 102: proxy server unit; 104: cache; 106: eMBMS reception unit; 110: DASH client; 112: media application; 114: rendering engine; 120: multimedia content; 122: media presentation description; 124A-124N: representations; 126, 130: header data; 128A-128N, 132A-132N: segments; 150: video file; 152: file type (FTYP) box; 154: movie (MOOV) box; 156: movie header (MVHD) box; 158: track (TRAK) box; 160: movie extends (MVEX) box; 162: segment index (SIDX) box; 164: movie fragment (MOOF) box; 166: movie fragment random access (MFRA) box; 200: 3D scene; 202: camera; 204, 206: points; 208, 210: bounding boxes; 212: path segment; 220: virtual object; 250: retrieve media data; 252: extract scene description; 254: determine camera control data from scene description; 256: determine movement restrictions from camera control data; 258: receive camera movement data; 260: determine that camera movement data requests movement through 3D solid virtual object; 262: prevent virtual camera from passing through 3D solid virtual object; 280: retrieve media data; 282: extract scene description; 284: determine camera control data from scene description; 286: determine object collision data from camera control data; 288: receive camera movement data; 290: determine that camera movement data requests movement through 3D solid virtual object; 292: prevent virtual camera from passing through 3D solid virtual object

FIG. 1 is a block diagram illustrating an example system that implements techniques for streaming media data over a network.

FIG. 2 is a block diagram illustrating an example set of components of the retrieval unit of FIG. 1 in greater detail.

FIG. 3 is a conceptual diagram illustrating elements of example multimedia content.

FIG. 4 is a block diagram illustrating elements of an example video file, which may correspond to a segment of a representation.

FIG. 5 is a conceptual diagram illustrating example camera path segments with bounding volumes, in accordance with the techniques of this disclosure.

FIG. 6 is a conceptual diagram illustrating an example virtual object, in this example a chair.

FIG. 7 is a flowchart illustrating an example method of retrieving media data in accordance with the techniques of this disclosure.

FIG. 8 is a flowchart illustrating an example method of retrieving media data in accordance with the techniques of this disclosure.


Claims (31)

一種檢索媒體數據之方法,該方法包含: 由呈現引擎接收串流媒體數據,該串流媒體數據表示包括至少一個虛擬固體物體的虛擬三維場景; 由該呈現引擎接收表示該至少一個虛擬固體物體之邊界的物體碰撞數據; 由該呈現引擎從用戶接收相機移動數據,該相機移動數據請求虛擬相機移動穿過該至少一個虛擬固體物體;以及 使用該物體碰撞數據,由該呈現引擎更新該虛擬相機之位置,以確保該虛擬相機響應於該相機移動數據而保持在該至少一個虛擬固體物體之外。 A method of retrieving media data, the method comprising: receiving, by the rendering engine, streaming media data representing a virtual three-dimensional scene including at least one virtual solid object; receiving, by the rendering engine, object collision data representing a boundary of the at least one virtual solid object; receiving, by the rendering engine, camera movement data from a user requesting movement of a virtual camera through the at least one virtual solid object; and Using the object collision data, the position of the virtual camera is updated by the rendering engine to ensure that the virtual camera remains outside the at least one virtual solid object in response to the camera movement data. 如請求項1之方法,其中,更新該虛擬相機之該位置包含:防止該虛擬相機穿越該至少一個虛擬固體物體。The method of claim 1, wherein updating the position of the virtual camera includes: preventing the virtual camera from passing through the at least one virtual solid object. 如請求項1之方法,其中,接收該物體碰撞數據包含:接收MPEG_mesh_collision延伸。The method of claim 1, wherein receiving the object collision data includes: receiving an MPEG_mesh_collision extension. 如請求項3之方法,其中,該MPEG_mesh_collision延伸包括定義用於該至少一個虛擬固體物體的至少一個3D網格的數據。The method of claim 3, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object. 如請求項4之方法,其中,該MPEG_mesh_collision延伸包括定義以下各項中的至少一項的數據:用於該至少一個虛擬固體物體的3D網格之邊界、用於該3D網格的材料、或將響應於該虛擬相機接觸該3D網格而呈現的動畫。The method of claim 4, wherein the MPEG_mesh_collision extension includes data defining at least one of: a boundary of a 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or An animation that will be rendered in response to the virtual camera touching the 3D mesh. 
6. The method of claim 1, wherein receiving the object collision data comprises receiving data including one or more of:
boundary data representing one or more collision boundaries of the at least one virtual solid object;
static data representing whether the at least one virtual solid object is affected by collisions;
material data representing how colliding objects interact with the at least one virtual solid object; or
animation data representing an animation triggered by a collision with the at least one virtual solid object.

7. The method of claim 1, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

8. The method of claim 1, wherein the streaming media data comprises glTF 2.0 media data.

9. The method of claim 1, wherein receiving the streaming media data comprises requesting the streaming media data from a retrieval unit via an application programming interface (API).

10. The method of claim 1, wherein the object collision data is included in an MPEG scene description.
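Since claim 8 recites glTF 2.0 media data and claim 3 recites an MPEG_mesh_collision extension, the collision data of claim 6 could plausibly ride on a glTF node as an extension object. The extension name comes from the claims, but the exact schema is not given here, so the field names below (`boundary`, `static`, `material`, `animation`) are illustrative assumptions that mirror the four data categories recited in claim 6.

```python
# Hypothetical glTF 2.0 node carrying an MPEG_mesh_collision extension.
# Field names inside the extension are assumptions for illustration only.
virtual_wall_node = {
    "name": "virtual_wall",
    "mesh": 0,
    "extensions": {
        "MPEG_mesh_collision": {
            "boundary": 0,           # index of data defining the collision boundary
            "static": True,          # the wall itself is unaffected by collisions
            "material": "concrete",  # how colliding objects interact with the wall
            "animation": 2,          # animation triggered by a collision
        }
    },
}

def has_collision_data(node):
    """Check whether a glTF node declares collision behavior via the extension."""
    return "MPEG_mesh_collision" in node.get("extensions", {})
```

A rendering engine parsing the scene description would gather all nodes for which `has_collision_data` is true and feed their boundaries into the camera-update logic.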
11. A device for retrieving media data, the device comprising:
a memory configured to store media data; and
one or more processors implemented in circuitry and configured to execute a rendering engine, the rendering engine being configured to:
receive streaming media data, the streaming media data representing a virtual three-dimensional scene including at least one virtual solid object;
receive object collision data representing a boundary of the at least one virtual solid object;
receive camera movement data from a user, the camera movement data requesting that a virtual camera move through the at least one virtual solid object; and
update, using the object collision data, a position of the virtual camera to ensure that the virtual camera remains outside of the at least one virtual solid object in response to the camera movement data.

12. The device of claim 11, wherein to update the position of the virtual camera, the rendering engine is configured to prevent the virtual camera from passing through the at least one virtual solid object.

13. The device of claim 11, wherein to receive the object collision data, the rendering engine is configured to receive an MPEG_mesh_collision extension.

14. The device of claim 13, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.
15. The device of claim 14, wherein the MPEG_mesh_collision extension includes data defining at least one of: a boundary of the 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or an animation to be presented in response to the virtual camera contacting the 3D mesh.

16. The device of claim 11, wherein to receive the object collision data, the rendering engine is configured to receive data including one or more of:
boundary data representing one or more collision boundaries of the at least one virtual solid object;
static data representing whether the at least one virtual solid object is affected by collisions;
material data representing how colliding objects interact with the at least one virtual solid object; or
animation data representing an animation triggered by a collision with the at least one virtual solid object.

17. The device of claim 11, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

18. The device of claim 11, wherein the streaming media data comprises glTF 2.0 media data.

19. The device of claim 11, wherein to receive the streaming media data, the rendering engine is configured to request the streaming media data from a retrieval unit via an application programming interface (API).

20. The device of claim 11, wherein the object collision data is included in an MPEG scene description.
21. A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to:
receive streaming media data, the streaming media data representing a virtual three-dimensional scene including at least one virtual solid object;
receive object collision data representing a boundary of the at least one virtual solid object;
receive camera movement data from a user, the camera movement data requesting that a virtual camera move through the at least one virtual solid object; and
update, using the object collision data, a position of the virtual camera to ensure that the virtual camera remains outside of the at least one virtual solid object in response to the camera movement data.

22. The computer-readable medium of claim 21, wherein the instructions that cause the processor to update the position of the virtual camera comprise instructions that cause the processor to prevent the virtual camera from passing through the at least one virtual solid object.

23. The computer-readable medium of claim 21, wherein the instructions that cause the processor to receive the object collision data comprise instructions that cause the processor to receive an MPEG_mesh_collision extension.

24. The computer-readable medium of claim 22, wherein the MPEG_mesh_collision extension includes data defining at least one 3D mesh for the at least one virtual solid object.
25. The computer-readable medium of claim 23, wherein the MPEG_mesh_collision extension includes data defining at least one of: a boundary of the 3D mesh for the at least one virtual solid object, a material for the 3D mesh, or an animation to be presented in response to the virtual camera contacting the 3D mesh.

26. The computer-readable medium of claim 21, wherein the instructions that cause the processor to receive the object collision data comprise instructions that cause the processor to receive data including one or more of:
boundary data representing one or more collision boundaries of the at least one virtual solid object;
static data representing whether the at least one virtual solid object is affected by collisions;
material data representing how colliding objects interact with the at least one virtual solid object; or
animation data representing an animation triggered by a collision with the at least one virtual solid object.

27. The computer-readable medium of claim 21, wherein the at least one virtual solid object comprises one of a virtual wall, a virtual chair, or a virtual table.

28. The computer-readable medium of claim 21, wherein the streaming media data comprises glTF 2.0 media data.

29. The computer-readable medium of claim 21, wherein the instructions that cause the processor to receive the streaming media data comprise instructions that cause the processor to request the streaming media data from a retrieval unit via an application programming interface (API).
30. The computer-readable medium of claim 21, wherein the object collision data is included in an MPEG scene description.

31. A device for retrieving media data, the device comprising:
means for receiving streaming media data, the streaming media data representing a virtual three-dimensional scene including at least one virtual solid object;
means for receiving object collision data representing a boundary of the at least one virtual solid object;
means for receiving camera movement data from a user, the camera movement data requesting that a virtual camera move through the at least one virtual solid object; and
means for updating, using the object collision data, a position of the virtual camera to ensure that the virtual camera remains outside of the at least one virtual solid object in response to the camera movement data.
TW111108833A 2021-03-10 2022-03-10 Object collision data for virtual camera in virtual interactive scene defined by streamed media data TW202240431A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202163159379P 2021-03-10 2021-03-10
US63/159,379 2021-03-10
US17/654,023 2022-03-08
US17/654,023 US20220292770A1 (en) 2021-03-10 2022-03-08 Object collision data for virtual camera in virtual interactive scene defined by streamed media data

Publications (1)

Publication Number Publication Date
TW202240431A true TW202240431A (en) 2022-10-16

Family

ID=80978776

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111108833A TW202240431A (en) 2021-03-10 2022-03-10 Object collision data for virtual camera in virtual interactive scene defined by streamed media data

Country Status (6)

Country Link
EP (1) EP4305848A1 (en)
JP (1) JP2024509524A (en)
KR (1) KR20230155445A (en)
BR (1) BR112023017541A2 (en)
TW (1) TW202240431A (en)
WO (1) WO2022192886A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI833560B (en) * 2022-11-25 2024-02-21 大陸商立訊精密科技(南京)有限公司 Image scene construction method, apparatus, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8044953B2 (en) * 2002-06-28 2011-10-25 Autodesk, Inc. System for interactive 3D navigation for proximal object inspection
JP4489800B2 (en) * 2007-08-30 2010-06-23 株式会社スクウェア・エニックス Image generating apparatus and method, program, and recording medium

Also Published As

Publication number Publication date
EP4305848A1 (en) 2024-01-17
JP2024509524A (en) 2024-03-04
WO2022192886A1 (en) 2022-09-15
BR112023017541A2 (en) 2024-01-23
KR20230155445A (en) 2023-11-10

Similar Documents

Publication Publication Date Title
TWI774744B (en) Signaling important video information in network video streaming using mime type parameters
KR102252238B1 (en) The area of interest in the image
KR102342274B1 (en) Advanced signaling of regions of most interest in images
US11405699B2 (en) Using GLTF2 extensions to support video and audio data
JP2019521584A (en) Signaling of Virtual Reality Video in Dynamic Adaptive Streaming over HTTP
TWI703854B (en) Enhanced high-level signaling for fisheye virtual reality video in dash
TW201830974A (en) Signaling data for prefetching support for streaming media data
KR102247404B1 (en) Enhanced high-level signaling for fisheye virtual reality video
CN111034203A (en) Processing omnidirectional media with dynamic zone-by-zone encapsulation
KR102339197B1 (en) High-level signaling for fisheye video data
TWI820227B (en) Initialization set for network streaming of media data
TW202240431A (en) Object collision data for virtual camera in virtual interactive scene defined by streamed media data
US20220295034A1 (en) Camera control data for virtual camera in virtual interactive scene defined by streamed media data
KR102117805B1 (en) Media data processing using omni-directional media format
CN116918339A (en) Camera control data for virtual cameras in streaming media data defined virtual interaction scenarios
TW202249493A (en) Anchoring a scene description to a user environment for streaming immersive media content
TW202242677A (en) Camera control data for virtual camera in virtual interactive scene defined by streamed media data
CN116918338A (en) Object collision data for virtual cameras in virtual interaction scenarios defined by streaming media data
US20220335694A1 (en) Anchoring a scene description to a user environment for streaming immersive media content