TW201813372A - Method and system for signaling of 360-degree video information

Info

Publication number: TW201813372A
Application number: TW106129782A
Authority: TW
Inventors: 菲利普漢哈特; 何玉文; 言葉
Original assignee: 美商Ｖｉｄ衡器股份有限公司
Priority date: 2016-09-02
Filing date: 2017-08-31
Publication date: 2018-04-01
Also published as: US11284089B2; JP2019530311A; CN109644279A; US20220174289A1; WO2018045108A1; CN117201817A; KR20190054060A; US20190200023A1; EP3507985A1; US11876981B2; CN109644279B

Abstract

Coding techniques for 360-degree video are described. An encoder selects a projection format and maps the 360-degree video to a 2D planar video using the selected projection format. The encoder encodes the 2D planar video in a bitstream and further signals, in the bitstream, parameters identifying the projection format. The parameters identifying the projection format may be signaled in a video parameter set, sequence parameter set, and/or picture parameter set of the bitstream. Different projection formats that may be signaled include formats using geometries such as equirectangular, cubemap, equal-area, octahedron, icosahedron, cylinder, and user-specified polygon. Other parameters that may be signaled include different arrangements of geometric faces or different encoding quality for different faces. Corresponding decoders are also described. In some embodiments, projection parameters may further include relative geometry rotation parameters that define an orientation of the projection geometry.

Description

Method and system for signaling of 360-degree video information

Cross-Reference to Related Applications

This application claims the benefit, under 35 U.S.C. § 119(e), of U.S. Provisional Patent Application Serial No. 62/383,367, entitled "Method and System for Signaling of 360-Degree Video Information", filed September 2, 2016, and of U.S. Provisional Patent Application Serial No. 62/407,337, entitled "Method and System for Signaling of 360-Degree Video Information", filed October 12, 2016, the entire contents of which are incorporated herein by reference.

Virtual reality (VR) is moving out of research laboratories and into daily life. VR has many application areas: healthcare, education, social networking, industrial design and training, gaming, movies, shopping, entertainment, and more. It has drawn strong attention from industry and consumers because it can provide an immersive viewing experience. It creates a virtual environment surrounding the viewer and may generate a true sense of "being there." Providing a fully realistic feeling in the VR environment is important for the user's experience. For example, a VR system should support interaction through posture, gesture, eye gaze, voice, and the like. To allow the user to interact with objects in the VR world in a natural way, the system may also provide haptic feedback to the user.

Today's VR systems use 360-degree video to give users the ability to view a scene from 360-degree angles in the horizontal direction and 180-degree angles in the vertical direction. At the same time, VR and 360-degree video are regarded as a future direction for media consumption beyond Ultra High Definition (UHD) services. To improve the quality of 360-degree video in VR and to standardize the processing chain for client interoperability, an ad hoc group belonging to MPEG-A (Multimedia Application Format) Part-19 was set up in ISO/IEC/MPEG in early 2016 to work on the requirements and potential technologies for an omnidirectional media application format. Another ad hoc group, Free View TV (FTV), issued exploration experiments for 360-degree 3D video applications. One major goal of FTV is to test the performance of two solutions: (1) systems based on 360-degree (omnidirectional) video; and (2) multi-view based systems. The Joint Video Exploration Team (JVET) from MPEG and ITU-T, which is exploring new technologies for the next-generation video coding standard, issued a call for test sequences that includes VR. At the June 2016 meeting, an ad hoc group (AHG8) was established with the mandate to work out common test conditions, test sequence formats, and evaluation criteria for 360-degree video coding. AHG8 will also study the effect on compression of applying different projection methods, as well as the conversion software.

The industry is working to improve quality and user experience in every part of the VR processing chain, including capture, processing, display, and applications. On the capture side, a VR system uses a multi-camera rig to capture a scene from different divergent views (e.g., in some cases, approximately 6 to 12 views). These views are stitched together to form a high-resolution (e.g., 4K or 8K) 360-degree video. On the client or user side, current virtual reality systems typically include a computation platform, a head-mounted display (HMD), and head tracking sensors. The computation platform is responsible for receiving and decoding the 360-degree video and for generating the viewport for display. Two pictures, one for each eye, are rendered for the viewport. The two pictures are displayed in the HMD for stereo viewing. A lens may be used to magnify the image displayed in the HMD for better viewing. The head tracking sensor continually tracks the viewer's head orientation and feeds the orientation information to the system so that the viewport picture for that orientation is displayed.

Some VR systems provide a specialized touch device with which the viewer can interact with objects in the virtual world. Several VR systems are available on the market. One is the Rift provided by Oculus, along with the Gear VR from Samsung and Oculus. The Rift is driven by a powerful workstation with good GPU support. Gear VR is a light VR system that uses a smartphone as the computation platform, HMD display, and head tracking sensor. A second VR system is the HTC Vive. The Rift and the Vive have similar performance: the spatial HMD resolution is 2160×1200, the refresh rate is 90 Hz, and the field of view (FOV) is about 110 degrees. The sampling rate of the head tracking sensor is 1000 Hz, which can capture very fast movement. Google also has a simple VR system called Cardboard. Google Cardboard consists of a lens and a cardboard assembly and, like Gear VR, is driven by a smartphone. Sony also provides PlayStation VR for gaming. In terms of 360-degree video streaming services, YouTube and Facebook are among the early providers.

In these current VR systems, the quality of experience in areas such as interactivity and haptic feedback still needs further improvement. For example, today's HMDs are still too large and not convenient to wear. In addition, the current resolution of 2160×1200 for the stereoscopic views provided by HMDs is not sufficient and may cause dizziness and discomfort for some users. Increasing the resolution would therefore be beneficial. Further, combining the visual sensation of the VR environment with force feedback in the real world is one option for enhancing the VR experience. A VR roller coaster is an example application.

Many companies are working on 360-degree video compression and delivery systems, and they have their own solutions. For example, Google's YouTube provides a channel for DASH-based 360-degree video streaming. Facebook also has solutions for 360-degree video delivery.

The systems and methods described herein are directed to issues related to the encoding and decoding of 360-degree video data.

In an exemplary method of encoding 360-degree video, an encoder selects a projection format, where the projection format includes information such as a geometry type and/or a geometry orientation. The encoder maps the 360-degree video to a 2D planar video using the selected projection format. The encoder encodes the 2D planar video in a bitstream and further signals, in the bitstream, parameters identifying the projection format. Various geometry types may be used and may be signaled in the bitstream, including equirectangular, cubemap, equal-area, octahedron, icosahedron, cylinder, and user-specified polygon. For geometry types that are associated with a plurality of faces, frame-packing parameters may be signaled to identify the positions and/or orientations of those faces in the 2D planar video. Different faces may be encoded with different sizes and/or different levels of quality. Parameters identifying the geometry orientation may include at least one of a yaw parameter, a pitch parameter, and a roll parameter.

The parameters identifying the projection format may be signaled in a video parameter set, sequence parameter set, and/or picture parameter set of the bitstream. The projection parameters may be selected based on rate-distortion optimization. Different pictures or different sequences in the video may be encoded using different projection formats (e.g., when different pictures or sequences achieve better rate-distortion performance with different projection formats), with the projection format parameters signaled at the appropriate parameter set level(s). Corresponding decoding techniques are also described.

Described in the present disclosure is exemplary syntax for use with 360-degree video coding. Syntax elements may be used to specify the projection geometry and/or to specify the arrangement of faces in a frame-packed picture using a grid system. Faces can have different sizes and/or orientations. In some embodiments, the face arrangement on the 2D plane may have various characteristics, such as constant face width/height along each row/column. In some embodiments, exemplary syntax is described for a user-specified geometry using any polygon-based representation. Additional features employed in some embodiments may include a flag to skip samples used for padding the frame-packed picture, signaling of a delta quantization parameter (QP) at the face level, a flag to enable/disable loop filters across particular faces, and/or syntax to code only a particular region of the 360-degree video.

In some embodiments, the projection parameters may further include relative geometry rotation parameters. Such parameters may define an orientation of the projection geometry. The projection geometry may be selectively oriented such that an object of interest is substantially entirely contained within a single face of the projection geometry. In embodiments in which different faces are encoded with different levels of quality (e.g., different QP values), the projection geometry may be oriented such that the object of interest is substantially entirely contained within a face that is encoded with a relatively high quality level.

A detailed description of illustrative embodiments will now be provided with reference to the various figures. Although this description provides detailed examples of possible implementations, it should be noted that the details are intended to be exemplary and in no way limit the scope of the application.

360-Degree Video Encoding and Decoding

One technique for 360-degree video delivery is to represent the 360-degree information using a sphere geometry. For example, the synchronized views captured by multiple cameras are stitched onto the sphere as one integral structure. The sphere information is then projected onto a 2D planar surface using a given geometry conversion process, for example the equirectangular projection (ERP) method. FIG. 1A shows sphere sampling in longitude (φ) and latitude (θ), and FIG. 1B shows the sphere projected onto a 2D plane using equirectangular projection. In aviation, the longitude φ in the range [-π, π] is known as yaw, and the latitude θ in the range [-π/2, π/2] is known as pitch, where π is the ratio of a circle's circumference to its diameter. For ease of explanation, (x, y, z) is used to represent the coordinates of a point in 3D space, and (ue, ve) is used to represent the coordinates of a point in the 2D plane under equirectangular projection. Equirectangular projection can be represented mathematically by equations (1) and (2):

ue = (φ/(2π) + 0.5) * W    (1)
ve = (0.5 - θ/π) * H    (2)

where W and H are the width and height of the 2D planar picture. As shown in FIG. 1A, the point P, the intersection of longitude L4 and latitude A1 on the sphere, is mapped to a unique point q in the 2D plane (FIG. 1B) using equations (1) and (2). The point q in the 2D plane can be projected back to the point P on the sphere via inverse projection. The field of view (FOV) in FIG. 1B shows an example in which the FOV on the sphere is mapped to the 2D plane with a view angle along the X axis of about 110 degrees.
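As an illustration of equations (1) and (2), the following is a minimal Python sketch of the forward and inverse equirectangular mapping. The function names `sphere_to_erp` and `erp_to_sphere` are hypothetical and chosen only for this example; angles are assumed to be in radians.

```python
import math

def sphere_to_erp(phi, theta, width, height):
    """Map yaw phi in [-pi, pi] and pitch theta in [-pi/2, pi/2]
    to ERP coordinates (ue, ve) following equations (1) and (2)."""
    ue = (phi / (2.0 * math.pi) + 0.5) * width
    ve = (0.5 - theta / math.pi) * height
    return ue, ve

def erp_to_sphere(ue, ve, width, height):
    """Inverse projection: solve equations (1) and (2) for (phi, theta)."""
    phi = (ue / width - 0.5) * 2.0 * math.pi
    theta = (0.5 - ve / height) * math.pi
    return phi, theta

# The point at yaw 0, pitch 0 lands at the center of the picture.
center = sphere_to_erp(0.0, 0.0, 4096, 2048)
```

With these definitions, the picture center corresponds to φ = 0 and θ = 0, the left and right picture edges to φ = -π and φ = π, and the top and bottom edges to the north and south poles.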

With ERP, 360-degree video can be mapped to a normal 2D video. It can be encoded with an existing video codec, such as H.264 or HEVC, and then delivered to the client. At the client side, the equirectangular video is decoded and rendered based on the user's viewport by projecting and displaying, within the HMD, the portion of the equirectangular picture that belongs to the FOV. Although spherical video can be transformed into a 2D planar picture for encoding with equirectangular projection, the characteristics of an equirectangular 2D picture differ from those of a conventional 2D picture (also called rectilinear video). FIG. 1C is a schematic representation of an example equirectangular picture of the interior of a room. Because equirectangular sampling in the 2D spatial domain is uneven, the top portion of the picture, corresponding to the north pole, and the bottom portion, corresponding to the south pole, are stretched compared to the middle portion of the picture, corresponding to the equator. The motion field in the 2D equirectangular picture also becomes complicated along the temporal direction compared to the motion in normal 2D video.

Video codecs such as MPEG-2, H.264, and HEVC use a translational model to describe the motion field and cannot efficiently represent the shape-varying movement found in equirectangular-projected 2D planar pictures. Another drawback of equirectangular projection is that, compared to the areas closer to the equator, the areas closer to the poles may be less interesting to viewers and/or content providers. For example, a viewer may not focus on the top and bottom regions for any substantial length of time. However, because of the warping effect, these areas are stretched to occupy a large portion of the 2D plane after equirectangular projection, and compressing these regions may take a substantial number of bits.

Based on these observations, some processing methods are being investigated to improve equirectangular picture coding, for example by applying pre-processing such as smoothing to the pole areas to reduce the bandwidth required to code them. In addition, different geometric projections have been proposed to represent 360-degree video, such as cubemap, equal-area, cylinder, pyramid, octahedron, and others. Among these projection methods, the most compression-friendly geometry may be the cubemap, which has 6 faces in total, each of them a planar square. FIG. 2A shows an example of the cubemap geometry. The cubemap consists of 6 square faces. If the radius of the tangent sphere is 1, the side length of each face (square) of the cubemap is 2. FIG. 2B shows one packing method that places the 6 faces into a rectangle, which can be used for encoding and delivery. A schematic illustration of an example picture with cubemap projection is shown in FIG. 2C. The blank regions (20) are padded regions that fill out the rectangular picture. Within each face, the picture looks the same as a normal 2D picture. However, the boundary of each face is not continuous. A straight line crossing two neighboring faces, such as line 22 representing the junction between a wall and the ceiling, will be bent at the boundary of those two faces. This means that motion at the face boundaries will also be discontinuous.

FIGS. 3A-3B show an example of the geometry structure of equal-area projection. Unlike equirectangular projection, the vertical sampling on the sphere is not based on an even interval of pitch. Instead, the projections of the sampled latitudes onto the Y axis are evenly distributed so that each sample on the sphere covers the same area. Vertical sampling becomes sparser for the regions close to the poles, which also means that there are more samples around the equator. In practical situations this is preferable, because users generally view the region near the equator more frequently than the regions near the poles. FIG. 3C is a schematic illustration of an example picture with equal-area projection. Compared to FIG. 1C, the regions around the equator in FIG. 3C are scaled up, while the regions around the poles are squeezed.
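The text above gives no formula for equal-area projection. The sketch below is an illustration under a stated assumption: picture rows are spaced uniformly in sin(θ), which is how a cylindrical equal-area mapping distributes latitudes evenly along the Y axis. The function names `erp_row` and `equal_area_row` are hypothetical.

```python
import math

def erp_row(theta, height):
    """Vertical ERP coordinate from equation (2): uniform in pitch."""
    return (0.5 - theta / math.pi) * height

def equal_area_row(theta, height):
    """Assumed equal-area vertical coordinate: uniform in sin(pitch)."""
    return (0.5 - math.sin(theta) / 2.0) * height

# Rows (measured from the top of a 1000-row picture) at 60 degrees of pitch.
pitch = math.radians(60.0)
rows_erp = erp_row(pitch, 1000)          # roughly one sixth of the way down
rows_ea = equal_area_row(pitch, 1000)    # much closer to the top edge
```

Under this assumption, the band between 60 degrees of pitch and the pole occupies fewer rows in the equal-area picture than in the equirectangular picture, which matches the squeezing of the polar regions described above.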

FIG. 4A shows an example of the geometry structure of octahedron projection. The octahedron consists of 8 equilateral triangular faces. If the radius of the tangent sphere is 1, the side length of each triangle is √6. FIG. 4B shows one packing method that arranges the eight triangles into one rectangle. FIG. 4C schematically illustrates an example picture with octahedron projection. Warping distortion is observed at the corners of the shared boundary of two neighboring triangles, as seen for example in the distortion of doorway 402.

To compare the coding efficiency of different geometric projection methods, Yu et al. proposed a latitude-based PSNR (L-PSNR) in M. Yu, H. Lakshman, B. Girod, "A Framework to Evaluate Omnidirectional Video Coding Scheme", IEEE International Symposium on Mixed and Augmented Reality, 2015. It considers two factors: (1) uniform sampling on the sphere; and (2) the viewer's viewing behavior. It defines a number of samples that are uniformly distributed on the sphere, and it defines the weight of each sample based on its latitude. The distortion is measured as a weighted mean square error (MSE) over all of these uniformly distributed samples. The weights are derived by tracking the view angles of viewers as they watch training sequences. A sample's weight is larger if it is viewed more frequently. According to these statistics, the weights around the equator are larger than those near the poles, because the most interesting content is located around the equator. Using these uniformly distributed samples on the sphere provides a measure for comparing the performance of different projection methods. However, the predefined sphere samples may not project to integer sampling positions when different projections are applied. If an interpolation-filter-based resampling method is applied, additional interpolation error is introduced. If nearest-neighbor sampling is applied, uniform sampling is no longer guaranteed. Objective and subjective quality evaluation methods therefore remain open topics for 360-degree video coding.
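The weighted-MSE idea behind L-PSNR can be sketched as follows. This is a minimal illustration, not the cited paper's implementation: the function name `weighted_psnr`, the sample values, and the weights are hypothetical, whereas the weights in the paper are derived from measured viewing statistics.

```python
import math

def weighted_psnr(reference, reconstructed, weights, peak=255.0):
    """PSNR computed from a weighted mean square error over sphere samples.

    reference, reconstructed: sample values at the same sphere positions.
    weights: one non-negative weight per sample (hypothetical values here).
    """
    weighted_error = sum(
        w * (a - b) ** 2 for a, b, w in zip(reference, reconstructed, weights)
    )
    wmse = weighted_error / sum(weights)
    if wmse == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / wmse)

# Two samples: an equatorial one (weight 3) and a polar one (weight 1).
# The error sits on the polar sample, so it is discounted.
score = weighted_psnr([100, 100], [100, 110], [3.0, 1.0])
```

Moving the same error onto the heavily weighted equatorial sample would lower the score, which is the behavior the latitude weighting is meant to capture.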

The equirectangular format is widely supported by 360-degree cameras and stitching software. To encode a 360-degree video in cubemap geometry, the equirectangular format must be converted to the cubemap format. The relationship between equirectangular and cubemap is as follows. In FIG. 2A, each face is referred to by one of the three axes going from the center of the sphere to the center of the face. With "P" standing for positive and "N" standing for negative, PX denotes the direction along the positive X axis from the center of the sphere, NX is the reverse direction of PX, and PY, NY, PZ, and NZ are labeled analogously. There are then 6 faces (PX, NX, PY, NY, PZ, NZ), corresponding to the front, back, top, bottom, right, and left faces respectively, and those faces are indexed from 0 to 5. Let Ps (X_s, Y_s, Z_s) be a point on the sphere with radius 1. It can be represented in terms of yaw φ and pitch θ as follows:

X_s = cos(θ)cos(φ)    (3)
Y_s = sin(θ)    (4)
Z_s = -cos(θ)sin(φ)    (5)

Let Pf be the point on the cube obtained by extending the line from the sphere center through Ps. Without loss of generality, let Pf be on face NZ. The coordinates of Pf (X_f, Y_f, Z_f) can be calculated as:

X_f = X_s/|Z_s|    (6)
Y_f = Y_s/|Z_s|    (7)
Z_f = -1    (8)

where |x| is the absolute value of the variable x. The coordinates of Pf (uc, vc) in the 2D plane of face NZ are then calculated as:

uc = W*(1 - X_f)/2    (9)
vc = H*(1 - Y_f)/2    (10)

Equations (3) through (10) establish the relationship between the coordinates (uc, vc) on a particular face of the cubemap and the coordinates (φ, θ) on the sphere. The relationship between an equirectangular point (ue, ve) and the point (φ, θ) on the sphere is known from equations (1) and (2). The relationship between the equirectangular geometry and the cubemap geometry can therefore be found. The geometry mapping from cubemap to equirectangular can be summarized as follows. Given a point (uc, vc) on one face of the cubemap, the output (ue, ve) on the equirectangular plane can be calculated as:

1) Calculate the coordinates of the 3D point P_f on the face from (uc, vc) according to the relationship in equations (9) and (10);
2) Calculate the coordinates of the 3D point P_s on the sphere from P_f according to the relationship in equations (6), (7), and (8);
3) Calculate (φ, θ) on the sphere from P_s according to the relationship in equations (3), (4), and (5);
4) Calculate the coordinates of the point (ue, ve) on the equirectangular picture from (φ, θ) according to the relationship in equations (1) and (2).
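The four steps above can be sketched in Python for face NZ, the only face for which this passage gives explicit equations. The function name `cube_nz_to_erp` and its parameter names are hypothetical; the other five faces would need analogous relations that are not spelled out here.

```python
import math

def cube_nz_to_erp(uc, vc, face_w, face_h, erp_w, erp_h):
    """Map a point (uc, vc) on cubemap face NZ to ERP coordinates (ue, ve)."""
    # Step 1: invert equations (9) and (10) to get the 3D point P_f on the face.
    x_f = 1.0 - 2.0 * uc / face_w
    y_f = 1.0 - 2.0 * vc / face_h
    z_f = -1.0  # equation (8)

    # Step 2: P_f = P_s / |Z_s| by equations (6)-(8), so P_s is P_f
    # normalized onto the unit sphere.
    norm = math.sqrt(x_f * x_f + y_f * y_f + z_f * z_f)
    x_s, y_s, z_s = x_f / norm, y_f / norm, z_f / norm

    # Step 3: invert equations (3)-(5) to recover pitch and yaw.
    theta = math.asin(y_s)
    phi = math.atan2(-z_s, x_s)

    # Step 4: apply equations (1) and (2).
    ue = (phi / (2.0 * math.pi) + 0.5) * erp_w
    ve = (0.5 - theta / math.pi) * erp_h
    return ue, ve

# The center of face NZ points along the negative Z axis (yaw = pi/2, pitch = 0).
ue, ve = cube_nz_to_erp(256, 256, 512, 512, 4096, 2048)
```

For a 512x512 face and a 4096x2048 equirectangular picture, the face center maps to three quarters of the picture width on the equator row, consistent with a yaw of π/2 in equation (1).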

To represent the 360-degree video in one 2D picture using cubemap, the 6 faces of the cubemap can be packed into one rectangular area, which is known as frame packing. The frame-packed picture is then treated (e.g., coded) as one normal 2D picture. There are different frame-packing configurations, such as 3x2 and 4x3. In the 3x2 configuration, the 6 faces are packed into 2 rows, with 3 faces in each row. In the 4x3 configuration, the 4 faces PX, NZ, NX, and PZ are packed into one row (e.g., the center row), and the faces PY and NY are packed separately into two different rows (e.g., the top and bottom rows). FIG. 2C uses the 4x3 frame packing and corresponds to the equirectangular picture in FIG. 1C.
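The two frame-packing configurations can be sketched as grids of face labels. This is only an illustration: the text fixes the rows (for 4x3, PX, NZ, NX, PZ in the center row, PY on top, NY at the bottom) but not the order of faces in the 3x2 layout or the column holding PY and NY, so those placements, along with the names `LAYOUT_3X2`, `LAYOUT_4X3`, `face_offsets`, and `packed_size`, are assumptions made for the example.

```python
# Each layout is a list of rows; None marks a padded (empty) cell.
LAYOUT_3X2 = [
    ["PX", "NX", "PY"],   # face order within the rows is assumed
    ["NY", "PZ", "NZ"],
]
LAYOUT_4X3 = [
    [None, "PY", None, None],   # column of PY is assumed
    ["PX", "NZ", "NX", "PZ"],   # center row as described in the text
    [None, "NY", None, None],   # column of NY is assumed
]

def packed_size(layout, face_size):
    """Width and height in samples of the frame-packed picture."""
    return len(layout[0]) * face_size, len(layout) * face_size

def face_offsets(layout, face_size):
    """Top-left (x, y) sample position of every face in the packed picture."""
    return {
        face: (col * face_size, row * face_size)
        for row, cells in enumerate(layout)
        for col, face in enumerate(cells)
        if face is not None
    }

size_4x3 = packed_size(LAYOUT_4X3, 512)
offsets_4x3 = face_offsets(LAYOUT_4X3, 512)
```

With 512x512 faces, the 3x2 picture has no padded cells, while half of the 12 cells of the 4x3 picture are padding, which is the kind of area the blank regions in FIG. 2C represent.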

In an exemplary scenario, 360-degree video in equirectangular format is taken as input, and it is desired to convert the input to the cubemap format. The following steps are applied:

1) For each sample position (uc, vc) in the cubemap format, calculate the corresponding coordinates (ue, ve) in the equirectangular format using the method introduced above.
2) If the coordinates (ue, ve) thus calculated are not at an integer sample position, an interpolation filter may be applied to obtain the sample value at that fractional position from the samples at its neighboring integer positions.
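The two conversion steps can be sketched for a single face as follows. Several choices here are assumptions made for illustration: only face NZ is generated, bilinear interpolation stands in for the unspecified interpolation filter, sample positions are taken at integer (uc, vc) without a half-sample center offset, and the names `nz_face_to_erp`, `bilinear_sample`, and `erp_to_nz_face` are hypothetical. The picture is a plain list of rows of numbers.

```python
import math

def nz_face_to_erp(uc, vc, face_size, erp_w, erp_h):
    """Step 1: ERP coordinates for a sample of face NZ (equations (1)-(10))."""
    x_f = 1.0 - 2.0 * uc / face_size
    y_f = 1.0 - 2.0 * vc / face_size
    z_f = -1.0
    norm = math.sqrt(x_f * x_f + y_f * y_f + z_f * z_f)
    x_s, y_s, z_s = x_f / norm, y_f / norm, z_f / norm
    theta = math.asin(y_s)
    phi = math.atan2(-z_s, x_s)
    return (phi / (2.0 * math.pi) + 0.5) * erp_w, (0.5 - theta / math.pi) * erp_h

def bilinear_sample(image, x, y):
    """Step 2: bilinear interpolation at fractional position (x, y).
    Columns wrap around (longitude is periodic); rows are clamped."""
    height, width = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def at(col, row):
        return image[min(max(row, 0), height - 1)][col % width]

    top = (1.0 - fx) * at(x0, y0) + fx * at(x0 + 1, y0)
    bottom = (1.0 - fx) * at(x0, y0 + 1) + fx * at(x0 + 1, y0 + 1)
    return (1.0 - fy) * top + fy * bottom

def erp_to_nz_face(erp, face_size):
    """Build face NZ by sampling the ERP picture at each mapped position."""
    erp_h, erp_w = len(erp), len(erp[0])
    return [
        [bilinear_sample(erp, *nz_face_to_erp(uc, vc, face_size, erp_w, erp_h))
         for uc in range(face_size)]
        for vc in range(face_size)
    ]

face = erp_to_nz_face([[7.0] * 8 for _ in range(4)], 4)
```

A production converter would typically use a longer interpolation filter and handle all six faces and the chroma planes; this sketch only shows how the geometry mapping and the resampling step fit together.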

One workflow for a 360-degree video system is depicted in FIG. 5. It includes 360-degree video capture 502, for example using multiple cameras to capture videos covering the whole sphere space. Those videos are then stitched together (504), for example in an equirectangular geometry structure. The equirectangular geometry structure can be converted to another geometry structure (506), such as cubemap, for encoding, for example with an existing video codec. Frame packing 508 may be performed prior to encoding 510. The coded video is delivered to the client via, for example, dynamic streaming or broadcasting. At the receiver, the video is decoded (512), and the decompressed frame is unpacked (514) and converted (516) to a display geometry (e.g., equirectangular). It can then be used for rendering 518 via viewport projection according to the user's viewing angle and displayed on a head-mounted display 520.

In professional and/or consumer video applications, the chroma components are often subsampled to a smaller resolution than that of the luma component. Chroma subsampling reduces the amount of video data to be encoded (and thus saves bandwidth and computing power) without significantly affecting video quality. For example, one widely used chroma format is the 4:2:0 chroma format, in which both chroma components are subsampled to 1/4 of the luma resolution (1/2 horizontally and 1/2 vertically). After chroma subsampling, the chroma sampling grid may have become different from the luma sampling grid. In FIG. 5, throughout the processing flow, the 360-degree video processed at each stage may be in a chroma format in which the chroma components have been subsampled.
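The data reduction from chroma subsampling is simple arithmetic, sketched below. The function name `samples_per_frame` and the 3840x1920 picture size are hypothetical; the 4:4:4 and 4:2:2 entries are included only for comparison and are not discussed in the text above.

```python
def samples_per_frame(width, height, chroma_format="4:2:0"):
    """Total luma plus chroma samples in one picture."""
    # (horizontal, vertical) subsampling factors of each chroma plane.
    factors = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}
    sub_x, sub_y = factors[chroma_format]
    luma = width * height
    chroma = 2 * (width // sub_x) * (height // sub_y)  # two chroma planes
    return luma + chroma

full = samples_per_frame(3840, 1920, "4:4:4")   # no subsampling
half = samples_per_frame(3840, 1920, "4:2:0")   # each chroma plane at 1/4 size
```

In 4:2:0, each chroma plane holds a quarter as many samples as the luma plane, so a picture carries 1.5 samples per luma position instead of 3, which is half the raw data of 4:4:4.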

Figure 6 is a block diagram of one embodiment of a generic block-based hybrid video encoding system. The input video signal 102 is processed block by block. In HEVC, extended block sizes (called "coding units" or CUs) are used to efficiently compress high-resolution (e.g., 1080p and beyond) video signals. In HEVC, a CU can be up to 64x64 pixels. A CU can be further partitioned into prediction units (PUs), to which separate prediction methods are applied. For each input video block (MB or CU), spatial prediction (160) and/or temporal prediction (162) may be performed. Spatial prediction (or "intra prediction") uses pixels from already-coded neighboring blocks in the same video picture/slice to predict the current video block. Spatial prediction reduces the spatial redundancy inherent in the video signal. Temporal prediction (also called "inter prediction" or "motion-compensated prediction") uses pixels from already-coded video pictures to predict the current video block. Temporal prediction reduces the temporal redundancy inherent in the video signal. The temporal prediction signal for a given video block is usually signaled by one or more motion vectors that indicate the amount and direction of motion between the current block and its reference block. In addition, if multiple reference pictures are supported (as is the case for recent video coding standards such as H.264/AVC and HEVC), a reference picture index is also sent for each video block; the reference index identifies which reference picture in the reference picture store (164) the temporal prediction signal comes from. After spatial and/or temporal prediction, the mode decision block (180) in the encoder chooses the best prediction mode, for example based on a rate-distortion optimization method. The prediction block is then subtracted from the current video block (116), and the prediction residual is decorrelated using a transform (104) and quantized (106) to achieve the target bit rate. The quantized residual coefficients are inverse quantized (110) and inverse transformed (112) to form the reconstructed residual, which is then added back to the prediction block (126) to form the reconstructed video block. Further in-loop filtering, such as a deblocking filter and adaptive loop filters, may be applied (166) to the reconstructed video block before it is put in the reference picture store (164) and used to code future video blocks. To form the output video bitstream 120, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit (108) to be further compressed and packed to form the bitstream.
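As a rough illustration of the residual coding loop, the following sketch subtracts a prediction, quantizes the residual, and reconstructs a four-sample block. It is a deliberately simplified toy: the transform and entropy coding stages are omitted, and the names `encode_block`, `reconstruct_block`, and `qstep` as well as the sample values are hypothetical.

```python
def encode_block(current, prediction, qstep):
    """Quantize the prediction residual of a block (transform stage omitted)."""
    return [round((c - p) / qstep) for c, p in zip(current, prediction)]


def reconstruct_block(levels, prediction, qstep):
    """Inverse-quantize the levels and add them back to the prediction."""
    return [p + level * qstep for level, p in zip(levels, prediction)]


current = [101, 104, 109, 116]      # samples of the block being coded
prediction = [98, 103, 110, 111]    # spatial or temporal prediction of that block
qstep = 4                           # larger steps lower the bit rate and the quality

levels = encode_block(current, prediction, qstep)
recon = reconstruct_block(levels, prediction, qstep)
print(levels)  # [1, 0, 0, 1]
print(recon)   # [102, 103, 110, 115]
```

The decoder runs only `reconstruct_block`, which is why the encoder must form its references from the same reconstructed samples rather than from the original input.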

Figure 7 is a general block diagram of a block-based video decoder. The video bitstream 202 is first unpacked and entropy decoded at the entropy decoding unit 208. The coding mode and prediction information are sent to either the spatial prediction unit 260 (if intra coded) or the temporal prediction unit 262 (if inter coded) to form the prediction block. The residual transform coefficients are sent to the inverse quantization unit 210 and the inverse transform unit 212 to reconstruct the residual block. The prediction block and the residual block are then added together at 226. The reconstructed block may further go through in-loop filtering before it is stored in the reference picture store 264. The reconstructed video in the reference picture store is then sent out to drive a display device, and is also used to predict future video blocks.

Overview of Exemplary Embodiments

360-degree video data may be projected onto a 2D plane so that the information can be encoded using conventional 2D planar video coding. Because many geometric projections can be used to represent 360-degree data, and the projected data can be packed in different configurations, several issues arise.

One issue is that, in order to properly reconstruct the 360-degree video from the decoded 2D planar video, the geometry and frame packing parameters should be available to the decoder so that it can unpack the data and project it from 2D space back to 3D space. For example, the cubemap format can be represented using different arrangements, such as 3x2, 4x3, 1x6, or 6x1, with different face orders, different face rotations, or different face sizes. In addition, if the receiver uses a format different from the coding format, the geometry and frame packing parameters are also needed to convert the coding format to the required format. For example, if the coding format is cubemap but the display format is equirectangular, a conversion has to be performed. In practice, when a file format multiplexer multiplexes these elementary streams, it is better to extract the frame packing arrangement information from the video itself than to rely on external metadata.

Another issue is that, for some frame packing configurations, it may be beneficial for storage or compression purposes to pad the unfolded faces so that the resulting frame-packed picture forms a rectangular area. For example, in the cubemap 4x3 format, additional pixels have to be added at the top-right and bottom-right edges (see Figures 2B and 2C). Encoding these additional pixels consumes bits but conveys no meaningful information. Bit rate savings can therefore be achieved if the encoder skips these pixels. In that case, the use of the compact configuration should be signaled to the decoder so that the 360-degree video is reconstructed correctly. Furthermore, unlike conventional 2D planar video, only a portion of the 360-degree video (e.g., a viewport) is rendered and displayed to the user at any time during playback (see Figure 5). Statistics show that the viewing probability is generally higher around the equator than around the poles, and higher near the front view than near the back view. Information identifying the projection format therefore allows the encoder to identify these areas (i.e., equator vs. poles and front vs. back) in the projected 2D video and to apply different encoding strategies to them (e.g., spending more bits and/or applying more sophisticated optimization strategies on areas corresponding to the equator and/or the front, and spending fewer bits and/or applying simpler optimization strategies on areas corresponding to the poles and/or the back), exploiting the user's viewing behavior to allocate bits and/or computing resources more intelligently.

Another issue is that existing codecs, such as MPEG-2, H.264, and HEVC, are designed for conventional 2D video and do not take into account any property of the 360-degree data representation. To achieve better compression efficiency, advanced 360-degree video coding tools may take advantage of the full 3D representation, but because coding is performed on the projected 2D planar video, these tools may benefit from information about the geometry and frame packing. Information about the geometry and frame packing parameters may therefore be made available to both the encoder and the decoder so that the 360-degree video can be encoded and decoded properly and more efficiently. For example, in the cubemap format, an unfolded face has only a few correctly positioned neighboring faces in the 2D planar video, which limits the codec's ability to exploit redundant information between neighboring faces. If, however, the codec has information about the 3D representation, in which each face of the cube has exactly 4 neighboring faces, more redundant information can be exploited to reduce the amount of data that has to be encoded.
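The gap between 3D and 2D neighborhoods can be made concrete with a small sketch. The face labels, the axis assignment, and the cross-shaped 4x3 arrangement below are assumptions chosen for illustration; they do not reproduce the default face numbering of this disclosure.

```python
from itertools import combinations

# Hypothetical labels; each face is identified by its outward normal.
FACE_NORMALS = {
    "front": (1, 0, 0), "back": (-1, 0, 0),
    "top": (0, 1, 0), "bottom": (0, -1, 0),
    "right": (0, 0, 1), "left": (0, 0, -1),
}


def neighbors_3d():
    """Faces with perpendicular normals share an edge on the cube."""
    adj = {name: set() for name in FACE_NORMALS}
    for a, b in combinations(FACE_NORMALS, 2):
        dot = sum(x * y for x, y in zip(FACE_NORMALS[a], FACE_NORMALS[b]))
        if dot == 0:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def neighbors_2d(layout):
    """Faces that touch horizontally or vertically in the packed grid."""
    adj = {face: set() for row in layout for face in row if face}
    for r, row in enumerate(layout):
        for c, a in enumerate(row):
            if not a:
                continue
            # Look at the right and lower grid positions only; links are symmetric.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < len(layout) and cc < len(row) and layout[rr][cc]:
                    adj[a].add(layout[rr][cc])
                    adj[layout[rr][cc]].add(a)
    return adj


# Assumed cross-shaped 4x3 arrangement (None marks an empty grid position).
CROSS = [
    [None, "top", None, None],
    ["left", "front", "right", "back"],
    [None, "bottom", None, None],
]

adj3d = neighbors_3d()
adj2d = neighbors_2d(CROSS)
edges_3d = sum(len(v) for v in adj3d.values()) // 2
edges_2d = sum(len(v) for v in adj2d.values()) // 2
print(edges_3d, edges_2d)  # 12 5
```

Under these assumptions only 5 of the 12 shared cube edges survive as adjacencies in the packed picture, which is the redundancy a geometry-aware codec could recover.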

A further issue is that the geometry and frame packing parameters may change over the duration of the 360-degree video. If they do change over time, these parameters should be available to both the encoder and the decoder for every frame of the 360-degree video. For example, the coding format may change from cubemap to equirectangular at a particular time instant to achieve better compression performance, or the size of a particular set of cubemap faces may change to accommodate lower or higher bandwidth requirements during a particular video segment.

The systems and methods disclosed herein address these issues and others.

In some embodiments, one or more of the issues set forth above for 360-degree video coding are addressed by signaling the geometry and frame packing parameters in the bitstream by means of additional high-level syntax elements. In particular, the projection geometry type can be specified, including different parameters for the geometry faces to locate them on the 2D planar video. The 360-degree video parameters can be signaled at different levels. One section below describes how projection format parameters can be stored at the video level (e.g., the video parameter set or VPS level) to minimize the amount of information that has to be transmitted when different layers and/or sequences and/or pictures use the same projection format. Another section below describes how the projection format can be signaled at the sequence level (e.g., the sequence parameter set or SPS level), allowing different sequences of the same video to use different projection formats or to change the parameters related to a given projection format. Another section below describes how the projection format can be signaled at the picture level (e.g., the picture parameter set or PPS level), allowing different pictures of the same sequence to use different projection formats or to change the parameters related to a given projection format. Another aspect of the systems and methods disclosed herein is to enable different geometry faces to be encoded with different quality factors. For example, in the cubemap format, the front, back, left, and right faces may be coded with higher quality, whereas the top and bottom faces may be coded with lower quality. This is because viewers are more likely to look at areas around the horizon than at areas near the poles. In this way, the 360-degree video can be coded more efficiently.

In some embodiments, systems and methods are described for specifying the rotation of the geometry coordinate system relative to an absolute coordinate system. These systems and methods may be used to rotate the 3D geometry so that an object or region of interest is projected onto a face, or set of faces, that may be encoded with higher quality. Similarly, if an object or region of interest is split over several faces, which may reduce the redundancy within each face, geometry rotation may be used to define a different orientation in which one or more important objects fall within a single face, so that better compression efficiency can be achieved. In some cases this is not possible, for example when an object is large and/or close enough that it spans over 90 degrees in one or both of the horizontal and vertical directions; the faces may then be rotated so that as much of the important object as possible is placed within one face. Because of the inherent nature of 3D geometry, when an object spans more than one face, its geometric structure is "distorted" at the transition from one face to another, which reduces correlation and coding efficiency. Being able to specify a projection orientation that maximizes object continuity within a face can improve coding efficiency.

Signaling of 360-Degree Video Properties at the Video Level

Different projection geometries have different characteristics. For example, the equirectangular and equal-area projections have only one face; there are no face boundary issues, although the picture is stretched. The cubemap has six faces and has many face boundaries in the frame-packed picture. Each picture may be coded with a different projection geometry, or with the same geometry but with a different face arrangement, size, or quality. For this purpose, a new parameter set may be introduced for 360-degree video in some embodiments, as shown in Table 1.

Table 1. Video parameter set RBSP

In an exemplary embodiment, the flag vps_360_extension_flag may have the following semantics.

vps_360_extension_flag: Specifies whether the video is a 360-degree video, in which case specific parameters and tools for efficient representation and compression of 360-degree video may be used. When not present, the value of vps_360_extension_flag may be inferred to be equal to 0.

At the video level, the total number of projection formats used in the different sequences and/or layers may be signaled in some embodiments according to Table 2.

Table 2. Video parameter set 360 extension syntax

In an exemplary embodiment, the parameters of Table 2 may have the following semantics.

vps_num_360_formats_minus1: Specifies the number (minus 1) of projection formats used in the different sequences and/or layers. When not present, the value of vps_num_360_formats_minus1 may be inferred to be equal to 0, indicating that only one projection format is used.

360_format_idx_present_flag: Specifies whether the syntax elements vps_360_format_idx[i] are present. When not present, the value of 360_format_idx_present_flag may be inferred to be equal to 0.

vps_360_format_idx[i]: Specifies the index, into the list of 360_format() syntax structures in the VPS, of the 360_format() syntax structure that applies to the layer with nuh_layer_id equal to layer_id_in_nuh[i]. When not present, the value of vps_360_format_idx[i] may be inferred to be equal to Min(i, vps_num_360_formats_minus1).
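The inference rule for absent indices can be sketched as follows. This is a minimal illustration that assumes the rule applies to vps_360_format_idx[i] and is bounded by vps_num_360_formats_minus1; the function name `infer_360_format_idx` and the `signaled` mapping are hypothetical and not part of any bitstream syntax.

```python
def infer_360_format_idx(num_layers, vps_num_360_formats_minus1, signaled=None):
    """Return the 360_format() index used by each layer.

    `signaled` maps a layer position i to an explicitly signaled index;
    layers without an entry fall back to Min(i, vps_num_360_formats_minus1).
    """
    signaled = signaled or {}
    return [
        signaled.get(i, min(i, vps_num_360_formats_minus1))
        for i in range(num_layers)
    ]


# Four layers sharing two projection formats, nothing signaled explicitly.
print(infer_360_format_idx(4, 1))          # [0, 1, 1, 1]
# Same stream, but layer 2 explicitly points back at format 0.
print(infer_360_format_idx(4, 1, {2: 0}))  # [0, 1, 0, 1]
```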

With this proposed syntax structure, the projection format may differ from layer to layer in a multi-layer video stream. For example, rate-distortion optimization may be used at the encoder to determine the projection format of each layer. The encoder may encode the current layer with every available projection format and then measure the rate-distortion cost of each. If the current layer is an enhancement layer, it may be encoded not only with intra and inter prediction within the same layer, but also with inter-layer prediction from another layer (e.g., a reference layer) of the same or a different projection format. When the projection format of the reference layer differs from that of the current layer, the inter-layer prediction process may further include projection format conversion. Finally, the projection format that results in the minimum rate-distortion cost may be selected for the final coding.
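A minimal sketch of this selection is given below, assuming the common Lagrangian cost J = D + λ·R. The cost form, the function name `select_projection_format`, and the distortion and rate figures are all assumptions for illustration; the disclosure does not prescribe a particular cost function.

```python
def select_projection_format(candidates, lam):
    """Pick the format with the smallest rate-distortion cost J = D + lam * R."""
    costs = {name: d + lam * r for name, (d, r) in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs


# Hypothetical (distortion, rate) measurements from one trial encode per format.
trial = {
    "equirectangular": (120.0, 500),
    "cubemap": (130.0, 420),
    "octahedron": (150.0, 450),
}

best, costs = select_projection_format(trial, lam=0.5)
print(best)   # cubemap
print(costs)  # {'equirectangular': 370.0, 'cubemap': 340.0, 'octahedron': 375.0}
```

In this made-up example the cubemap wins despite its higher distortion, because its lower rate outweighs the difference at the chosen λ.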

In some embodiments, the properties of each projection format and the associated parameters may be signaled according to Table 3.

Table 3. 360-degree representation format syntax

In an exemplary embodiment, the parameters of Table 3 may have the following semantics.

projection_geometry: Specifies the mapping index in Table 4 of the projection geometry used.

geometry_rotation_param_present_flag: Specifies whether the syntax elements geometry_rotation_yaw, geometry_rotation_pitch, and geometry_rotation_roll are present. When not present, the value of geometry_rotation_param_present_flag may be inferred to be equal to 0.

geometry_rotation_yaw: Specifies the rotation around the Y axis (see Figure 2A) of the geometry coordinate system relative to the absolute coordinate system. When not present, the value of geometry_rotation_yaw may be inferred to be equal to 0.

geometry_rotation_pitch: Specifies the rotation around the Z axis (see Figure 2A) of the geometry coordinate system relative to the absolute coordinate system. When not present, the value of geometry_rotation_pitch may be inferred to be equal to 0.

geometry_rotation_roll: Specifies the rotation around the X axis (see Figure 2A) of the geometry coordinate system relative to the absolute coordinate system. When not present, the value of geometry_rotation_roll may be inferred to be equal to 0.
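The three rotation parameters can be combined into a single rotation matrix, as sketched below. Only the axis assignment (yaw about Y, pitch about Z, roll about X) comes from the semantics above. The angle unit (degrees), the right-handed convention, the composition order, and all function names are assumptions, since this section does not specify them.

```python
import math


def _rot(axis, degrees):
    """3x3 right-handed rotation matrix about the X, Y, or Z axis."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return {
        "x": [[1, 0, 0], [0, c, -s], [0, s, c]],
        "y": [[c, 0, s], [0, 1, 0], [-s, 0, c]],
        "z": [[c, -s, 0], [s, c, 0], [0, 0, 1]],
    }[axis]


def _matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def geometry_rotation_matrix(yaw=0.0, pitch=0.0, roll=0.0):
    """Compose yaw (Y axis), pitch (Z axis), and roll (X axis).

    The order yaw * pitch * roll is an assumption made for this sketch.
    """
    return _matmul(_rot("y", yaw), _matmul(_rot("z", pitch), _rot("x", roll)))


def rotate(matrix, vector):
    """Apply a 3x3 matrix to a 3-component vector."""
    return [sum(matrix[i][k] * vector[k] for k in range(3)) for i in range(3)]


# Under these conventions a 90-degree yaw maps the +X direction onto -Z.
rotated = rotate(geometry_rotation_matrix(yaw=90), [1, 0, 0])
```

When all three parameters are absent they are inferred to be 0, and the composed matrix reduces to the identity.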

compact_representation_enabled_flag: Specifies whether the samples or blocks used for padding the frame-packed picture into a rectangular picture are skipped by the encoder. When not present, the value of compact_representation_enabled_flag may be inferred to be equal to 0.

loop_filter_across_faces_enabled_flag: Specifies whether in-loop filtering operations may be performed across face boundaries. When not present, the value of loop_filter_across_faces_enabled_flag may be inferred to be equal to 1.

num_face_rows: Specifies the number of face rows in the frame-packed picture. When not present, the value of num_face_rows may be inferred to be equal to 1.

num_face_columns: Specifies the number of face columns in the frame-packed picture. When not present, the value of num_face_columns may be inferred to be equal to 1.

Note that num_face_rows_minus1 and num_face_columns_minus1 may be signaled instead of num_face_rows and num_face_columns to reduce the number of bits needed to code these syntax elements.

equal_face_size_flag: Specifies whether all faces share the same size (same width and same height). When not present, the value of equal_face_size_flag may be inferred to be equal to 0. When equal_face_size_flag is set to 1, the width and height of all faces in the frame-packed picture may be inferred from the projection geometry. For example, with the cubemap projection, the width in luma samples of all faces in the frame-packed picture may be inferred to be equal to pic_width_in_luma_samples / num_face_columns, and the height in luma samples of all faces in the frame-packed picture may be inferred to be equal to pic_height_in_luma_samples / num_face_rows. Note that the width and height in luma samples of all faces in the frame-packed picture shall not be equal to 0 and shall be an integer multiple of MinCbSizeY.
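The inference for equal_face_size_flag equal to 1 can be sketched as below. The function name `infer_equal_face_size`, the default MinCbSizeY of 8, and the choice to reject picture sizes that do not divide evenly are assumptions made for this illustration.

```python
def infer_equal_face_size(pic_width, pic_height, num_face_rows,
                          num_face_columns, min_cb_size_y=8):
    """Infer the common face size of a frame-packed picture, in luma samples."""
    if pic_width % num_face_columns or pic_height % num_face_rows:
        raise ValueError("picture size is not divisible by the face grid")
    face_w = pic_width // num_face_columns
    face_h = pic_height // num_face_rows
    # Faces shall be non-empty and aligned to the minimum coding block size.
    if face_w == 0 or face_h == 0:
        raise ValueError("face size must not be 0")
    if face_w % min_cb_size_y or face_h % min_cb_size_y:
        raise ValueError("face size must be a multiple of MinCbSizeY")
    return face_w, face_h


# Hypothetical cubemap packed as 2 rows by 3 columns in a 2880x1920 picture.
print(infer_equal_face_size(2880, 1920, num_face_rows=2, num_face_columns=3))
# (960, 960)
```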

face_qp_offset_enabled_flag: Specifies whether different QPs are used for different faces. When not present, the value of face_qp_offset_enabled_flag may be inferred to be equal to 0.

face_idx[i][j]: Specifies the index of the face located at the i-th row and j-th column in the frame-packed picture. For simple geometries with only a single face, such as equirectangular or equal-area, the only face is face #0. For other geometries, a default numbering and positioning of the faces may be used, as illustrated in Table 5 for the cubemap and octahedron geometries.

face_width_in_luma_samples[i][j]: Specifies the width in luma samples of the face located at the i-th row and j-th column in the frame-packed picture. Techniques may be employed to prevent ambiguity with respect to the frame-packed picture width. For example, it may be enforced that the sum of the different face widths along each row is equal to the frame-packed picture width. face_width_in_luma_samples[i][j] shall not be equal to 0 and shall be an integer multiple of MinCbSizeY.

face_height_in_luma_samples[i][j]: Specifies the height in luma samples of the face located at the i-th row and j-th column in the frame-packed picture. Techniques may be employed to prevent ambiguity with respect to the frame-packed picture height. For example, it may be enforced that the sum of the different face heights along each column is equal to the frame-packed picture height. face_height_in_luma_samples[i][j] shall not be equal to 0 and shall be an integer multiple of MinCbSizeY.
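One way to enforce the consistency constraints on per-face sizes is sketched here, under the reading that widths are summed along each row of the face grid and heights along each column. The function name `check_face_grid` and the sample sizes are hypothetical.

```python
def check_face_grid(face_w, face_h, pic_width, pic_height):
    """Check that per-face sizes tile the frame-packed picture exactly.

    face_w[i][j] and face_h[i][j] hold the size of the face at row i, column j.
    """
    rows, cols = len(face_w), len(face_w[0])
    # Widths along every row must add up to the picture width.
    widths_ok = all(sum(face_w[i]) == pic_width for i in range(rows))
    # Heights along every column must add up to the picture height.
    heights_ok = all(
        sum(face_h[i][j] for i in range(rows)) == pic_height
        for j in range(cols)
    )
    return widths_ok and heights_ok


# Hypothetical 2x3 cubemap with 960x960 faces in a 2880x1920 picture.
w = [[960, 960, 960], [960, 960, 960]]
h = [[960, 960, 960], [960, 960, 960]]
print(check_face_grid(w, h, 2880, 1920))  # True
```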

face_rotation_idc[i][j]: Specifies the mapping index in Table 6 of the rotation between the picture coordinate system and the face coordinate system of the face located at the i-th row and j-th column in the frame-packed picture. When not present, the value of face_rotation_idc[i][j] may be inferred to be equal to 0.

face_rotation[i][j]: Specifies the degree of rotation between the picture coordinate system and the face coordinate system of the face located at the i-th row and j-th column in the frame-packed picture.

face_vertical_flip_flag[i][j]: Specifies whether the face located at the i-th row and j-th column in the frame-packed picture is flipped vertically after rotation. When not present, the value of face_vertical_flip_flag[i][j] may be inferred to be equal to 0.

face_qp_offset[i][j]: Specifies the difference to be added to the sequence-level QP when determining the QP value of the face located at the i-th row and j-th column in the frame-packed picture.

Table 4. Projection geometry indices

Table 5. Default face definitions

Table 6. Rotation indices

Considering the frame-packed picture as a grid of faces, these parameters can be used for very flexible yet powerful signaling of the geometry format. For projection geometries that result in a single face, such as equirectangular, equal-area, or cylinder, the parameters num_face_rows, num_face_columns, face_idx, face_width_in_luma_samples, face_height_in_luma_samples, and face_rotation can be inferred from the geometry and the picture size. For other geometries, such as cubemap, octahedron, or icosahedron, however, it is preferable to specify these parameters, since the faces may be arranged in different ways or have different sizes. For example, as illustrated in Figures 9A-9C, the same cubemap projection can be packed in different ways, such as (a) a 3x4 grid (Figure 9A) or (b) a 2x3 grid (Figure 9B). In the case of the 3x4 grid, face_idx can be set to a value higher than the actual number of faces, which can be inferred from the geometry, to indicate positions in the grid that do not contain an actual face. For example, the parameters can be set as follows:

projection_geometry = 1 // cubemap
face_idx[0][0] = 2 // face #2
face_idx[0][1] = 6 // invalid face
face_idx[0][2] = 7 // invalid face
face_idx[0][3] = 8 // invalid face
face_idx[1][0] = 1 // face #1
face_idx[1][1] = 4 // face #4
face_idx[1][2] = 0 // face #0
face_idx[1][3] = 5 // face #5
face_idx[2][0] = 3 // face #3
face_idx[2][1] = 9 // invalid face
face_idx[2][2] = 10 // invalid face
face_idx[2][3] = 11 // invalid face
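A decoder-side reading of this 3x4 example might look like the following sketch. The face_idx values are the ones listed above; the helper name `locate_faces` and the constant `NUM_CUBE_FACES` are hypothetical, and treating every index of 6 or more as an empty position follows the rule that such values exceed the face count of the geometry.

```python
NUM_CUBE_FACES = 6  # number of actual faces implied by the cubemap geometry

# face_idx[i][j] for the 3x4 grid, row i and column j.
face_idx = [
    [2, 6, 7, 8],
    [1, 4, 0, 5],
    [3, 9, 10, 11],
]


def locate_faces(face_idx, num_faces):
    """Split grid positions into real faces and empty (invalid) positions."""
    faces, empty = {}, []
    for i, row in enumerate(face_idx):
        for j, idx in enumerate(row):
            if idx < num_faces:
                faces[idx] = (i, j)   # face index -> (row, column)
            else:
                empty.append((i, j))  # position holds no actual face
    return faces, empty


faces, empty = locate_faces(face_idx, NUM_CUBE_FACES)
print(faces[0])    # (1, 2)
print(len(empty))  # 6
```

Half of the 12 grid positions are empty, which is the padding that a compact representation would avoid coding.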

To provide better detail in certain directions, some faces may be coded with higher resolution. This is because viewers are more likely to look at some areas than at others, especially areas close to the front direction. In this way, the 360-degree video can be coded more efficiently. For this purpose, the face_width_in_luma_samples and face_height_in_luma_samples parameters can be used to specify different sizes for different faces. For example, in the cubemap format, the front face can be coded with a higher resolution than the other faces, as illustrated in Figure 9C, and the parameters can be set as follows:

projection_geometry = 1

where W is the face width in luma samples, and H is the face height in luma samples, of all faces other than face 0 (the front face).

From these parameters, it can be inferred that the front face spans 4 grid positions, because its size is twice that of the other faces, and the information can be retrieved correctly.

Faces may be arranged with different orientations. For example, for the cubemap projection, faces "2", "1", and "3" are rotated by 90 degrees counter-clockwise in the 2x3 grid of Figure 9B compared with the 3x4 grid of Figure 9A. The face_rotation_idc parameter can be used to specify the rotation between the face coordinate system and the frame-packed picture coordinate system.

The grid system can also be used for geometries with non-square faces, such as triangular faces, as illustrated in Figures 11 and 12 for the octahedron and icosahedron, respectively. Because some triangular faces are split into two parts for compact representation (see Figures 11B and 12B), a triangular face may be defined using two right-angled triangles instead of one isosceles or equilateral triangle. A basic right-angled triangle may be defined as illustrated in Figure 10A. Since rotation alone is not sufficient to construct an isosceles or equilateral triangle from two right-angled triangles, rotation may be combined with vertical flip (or, in some embodiments, horizontal flip). With this representation, the same syntax can be used for both compact and non-compact representations with great flexibility. For example, to signal the compact octahedron illustrated in Figure 11B, the parameters may be set as follows:

The face_qp_offset parameter can be used to specify whether a particular face is coded at a higher or lower quality. Similar results could be obtained, for example, by adapting the quality at the slice or coding unit level. However, a slice may cover several faces, and a face will most likely contain several coding units, so it may be more efficient to signal the quality difference directly for each face.
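Deriving a per-face QP from the sequence-level QP and the per-face QP offset (face_qp_offset[i][j] in the Table 3 semantics) can be sketched as follows. The clamping range of 0 to 51, the function name `face_qp`, and the offset values are assumptions for illustration; the disclosure only states that the offset is added to the sequence-level QP.

```python
def face_qp(sequence_qp, face_qp_offset, qp_min=0, qp_max=51):
    """Return the QP of each face: sequence-level QP plus its signaled offset.

    The result is clamped to [qp_min, qp_max]; the clamp is an assumption.
    """
    return [
        [max(qp_min, min(qp_max, sequence_qp + offset)) for offset in row]
        for row in face_qp_offset
    ]


# Hypothetical 2x3 cubemap: negative offsets (higher quality) for four faces,
# positive offsets (lower quality) for the two faces assumed to be top and bottom.
offsets = [[-2, -2, -2], [4, -2, 4]]
print(face_qp(32, offsets))  # [[30, 30, 30], [36, 30, 36]]
```

A lower QP means finer quantization, so a negative offset spends more bits on a face and a positive offset spends fewer.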

For a regular frame packing grid, composed of faces that have the same width along each column (but different widths across columns) and the same height along each row (but different heights across rows), the face properties can also be signaled using fewer parameters, as illustrated in Table 7.

Table 7. 360-degree representation format alternative syntax

In an exemplary embodiment, the parameters of Table 7 may have the following semantics.

num_face_rows_minus1: Specifies the number (minus 1) of face rows in the frame-packed picture. When not present, the value of num_face_rows_minus1 may be inferred to be equal to 0.

num_face_columns_minus1: Specifies the number (minus 1) of face columns in the frame-packed picture. When not present, the value of num_face_columns_minus1 may be inferred to be equal to 0.

row_height_in_luma_samples[i]: specifies the height, in luma samples, of the faces located in the i-th row of the frame-packed picture. For the last row, the height may be inferred to be equal to pic_height_in_luma_samples minus the sum of row_height_in_luma_samples[i] over all preceding rows. row_height_in_luma_samples[i] shall not be equal to 0 and shall be an integer multiple of MinCbSizeY.

column_width_in_luma_samples[j]: specifies the width, in luma samples, of the faces located in the j-th column of the frame-packed picture. For the last column, the width may be inferred to be equal to pic_width_in_luma_samples minus the sum of column_width_in_luma_samples[j] over all preceding columns. column_width_in_luma_samples[j] shall not be equal to 0 and shall be an integer multiple of MinCbSizeY.
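The grid semantics above can be sketched as follows. This is an illustration under stated assumptions rather than a normative derivation: the helper names `infer_sizes` and `face_rectangles`, the MinCbSizeY value of 8, and the 1536×1024 picture size are all hypothetical, and the inferred last size is checked against the same constraints as the signaled ones.

```python
def infer_sizes(signaled, total, min_cb_size=8):
    """Append the inferred last size (total minus the signaled sizes) and validate.

    Every size must be non-zero and a multiple of min_cb_size (MinCbSizeY).
    """
    sizes = list(signaled) + [total - sum(signaled)]
    for size in sizes:
        if size <= 0 or size % min_cb_size != 0:
            raise ValueError(f"invalid face size {size}")
    return sizes


def face_rectangles(row_heights, column_widths):
    """Return (x, y, width, height) of every face in raster-scan order."""
    rects = []
    y = 0
    for height in row_heights:
        x = 0
        for width in column_widths:
            rects.append((x, y, width, height))
            x += width
        y += height
    return rects


# Hypothetical 3x2 cubemap packing in a 1536x1024 picture:
# num_face_rows_minus1 = 1, num_face_columns_minus1 = 2.
rows = infer_sizes([512], total=1024)           # last row height is inferred
columns = infer_sizes([512, 512], total=1536)   # last column width is inferred
rects = face_rectangles(rows, columns)
print(rects)
```

Only one row height and two column widths are signaled in this example, compared with a width and height for each of the six faces in a per-face description.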

Face properties may also be signaled in face index order for irregular face shapes. Table 8 shows an example.

Table 8. 360-degree representation format alternative syntax

In an exemplary embodiment, the parameters of Table 8 may have the following semantics.

num_faces: specifies the number of faces in the frame-packed picture. When not present, the value of num_faces may be inferred to be equal to 1.

Note that, instead of signaling num_faces, num_faces_minus1 may be signaled to reduce the number of bits needed to code this syntax element.
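To illustrate the possible saving, the sketch below assumes that the syntax element is coded as an unsigned Exp-Golomb code, ue(v), which is common for such counts in HEVC-style parameter sets. The descriptor is not stated in this passage, so this is an assumption, and `ue_bits` is a hypothetical helper.

```python
def ue_bits(value: int) -> int:
    """Length in bits of the unsigned Exp-Golomb code ue(v) for a non-negative value."""
    if value < 0:
        raise ValueError("ue(v) encodes non-negative integers only")
    return 2 * (value + 1).bit_length() - 1


for num_faces in (1, 2, 6, 7, 20):
    print(num_faces, ue_bits(num_faces), ue_bits(num_faces - 1))
```

Under this assumption the minus1 form is never longer and is shorter whenever num_faces + 1 is a power of two, for example for a single face.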

num_face_vertices[i]: specifies the number of vertices of the i-th face. When not present, the value of num_face_vertices[i] may be inferred to be equal to 4, because the quadrilateral is the most common face polygon type.

vertex_2D_pos_x[i][j]: specifies the x coordinate, in the frame-packed picture, of the j-th vertex of the i-th face.

vertex_2D_pos_y[i][j]: specifies the y coordinate, in the frame-packed picture, of the j-th vertex of the i-th face.

vertex_3D_pos_x[i][j]: specifies the x coordinate, in the 3D coordinate system, of the j-th vertex of the i-th face.

vertex_3D_pos_y[i][j]: specifies the y coordinate, in the 3D coordinate system, of the j-th vertex of the i-th face.

vertex_3D_pos_z[i][j]: specifies the z coordinate, in the 3D coordinate system, of the j-th vertex of the i-th face.
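As a hedged illustration of how the 2D and 3D vertex positions of a face could be combined, the sketch below maps a sample position inside a triangular face of the frame-packed picture to a point on the corresponding 3D face using barycentric coordinates. The interpolation method is not specified by the syntax, so the affine mapping, the helper names, and the vertex values are assumptions.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate face")
    w0 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    w1 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return w0, w1, 1.0 - w0 - w1


def map_2d_to_3d(p, verts_2d, verts_3d):
    """Map a frame-packed position to 3D by reusing its 2D weights on the 3D vertices."""
    weights = barycentric(p, *verts_2d)
    return tuple(sum(w * v[k] for w, v in zip(weights, verts_3d)) for k in range(3))


# Hypothetical face i: (vertex_2D_pos_x, vertex_2D_pos_y) and
# (vertex_3D_pos_x, vertex_3D_pos_y, vertex_3D_pos_z) for its three vertices.
verts_2d = [(0, 0), (4, 0), (0, 4)]
verts_3d = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
point_3d = map_2d_to_3d((1, 1), verts_2d, verts_3d)
print(point_3d)
```

A quadrilateral face could be handled in the same spirit by splitting it into two triangles or by using bilinear interpolation.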

The vertex_3D_pos_x[i][j], vertex_3D_pos_y[i][j], and vertex_3D_pos_z[i][j] parameters may be used to define a user-specified polygon-based geometry in 3D space. These parameters may be used to map a sample from its position in the frame-packed picture to the corresponding position in the 3D geometry. This information may be exploited by advanced 360 video coding to achieve better compression efficiency. For example, a codec may exploit redundant information between faces that are adjacent in the 3D representation but are not collocated in the frame-packed picture.

Sequence-level signaling of 360-degree video properties

At the sequence level, the projection format in use may be signaled. To this end, a new parameter set may be introduced for 360 video, as shown in Table 9.

Table 9. General sequence parameter set RBSP syntax

In an exemplary embodiment, the parameters of Table 9 may have the following semantics.

sps_360_extension_flag: specifies whether the sequence is a 360 video, in which case specific parameters and tools for efficient compression of 360 video may be used.

The projection format in use may be signaled according to Table 10.

Table 10. Sequence parameter set 360 extension syntax

In an exemplary embodiment, the parameters of Table 10 may have the following semantics.

sps_num_360_formats_minus1: specifies the number of projection formats (minus 1) used in the sequence. When not present, the value of sps_num_360_formats_minus1 may be inferred to be equal to 0, indicating that only one projection format is used.

sps_360_format_idx[i]: specifies a list of indices, into the list of 360_format() syntax structures in the VPS, of the 360_format() syntax structures used in the sequence. The value of sps_360_format_idx[i] may be in the range of 0 to vps_num_360_formats_minus1, inclusive.

Note that all 360-video-related parameters defined at the VPS level may be changed at the SPS level. Although not shown in Table 10, instead of using sps_360_format_idx to index into the set of 360 video formats sent in the VPS, syntax elements similar to those defined in Table 3 (e.g., projection_geometry, face dimension parameters, face QP offset, etc.) may be signaled directly as part of the SPS extension to indicate the 360 video parameters of the video sequences that refer to this SPS.

Picture-level signaling of 360-degree video properties

In some embodiments, to provide greater coding optimization, a sequence may be encoded using different projection formats for different frames. In this case, the projection format may be signaled at the picture level via an index into the set of projection formats already signaled at the VPS or SPS level. To this end, in some embodiments, a new parameter set may be introduced for 360 video, as shown in Table 11.

Table 11. General picture parameter set RBSP syntax

In an exemplary embodiment, the parameters of Table 11 may have the following semantics.

pps_360_extension_flag: specifies whether pictures referring to this PPS contain specific parameters related to 360-degree video coding. When not present, the value of pps_360_extension_flag may be inferred to be equal to sps_360_extension_flag.

Examples of the PPS extension for 360 video are provided in Tables 12 and 13.

Table 12. Picture parameter set 360 extension syntax

Table 13. Coding region table syntax

In an exemplary embodiment, the parameters of Tables 12 and 13 may have the following semantics.

pps_360_format_idx: specifies an index into the set of projection geometries defined in the SPS referred to by this PPS. The value of pps_360_format_idx shall be in the range of 0 to sps_num_360_formats_minus1, inclusive. When not present, the value of pps_360_format_idx may be inferred to be equal to 0.

The pps_360_format_idx parameter is used to specify the projection format of the current picture among the available projection formats listed at the sequence level. For example, if only equirectangular and equal-area are available in the sps_360_format_idx list of the SPS, and index "0" denotes equal-area while index "1" denotes equirectangular, the parameter may be set as follows:

pps_360_format_idx = 0 // all pictures referring to this PPS are coded in the equal-area format

pps_360_format_idx = 1 // all pictures referring to this PPS are coded in the equirectangular format
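The index chain described above (a PPS index into the SPS list, whose entries are in turn indices into the VPS list) can be sketched as follows. The contents of the VPS list and the function name `resolve_projection_format` are hypothetical; only the two-level indexing follows the semantics given above.

```python
# Hypothetical list of 360_format() structures carried in the VPS.
vps_360_formats = ["equirectangular", "cubemap", "equal-area", "octahedron"]

# sps_360_format_idx[i]: SPS entry 0 selects equal-area and entry 1 selects
# equirectangular, matching the example in the text.
sps_360_format_idx = [2, 0]


def resolve_projection_format(pps_360_format_idx, sps_idx_list, vps_formats):
    """Follow PPS -> SPS -> VPS indices and return the projection format of a picture."""
    if not 0 <= pps_360_format_idx < len(sps_idx_list):
        raise ValueError("pps_360_format_idx exceeds sps_num_360_formats_minus1")
    vps_idx = sps_idx_list[pps_360_format_idx]
    if not 0 <= vps_idx < len(vps_formats):
        raise ValueError("sps_360_format_idx exceeds vps_num_360_formats_minus1")
    return vps_formats[vps_idx]


print(resolve_projection_format(0, sps_360_format_idx, vps_360_formats))
print(resolve_projection_format(1, sps_360_format_idx, vps_360_formats))
```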

Within the same video sequence, if different pictures are allowed to have different projection geometry formats, temporal motion-compensated prediction using a translational motion model with two motion parameters (horizontal and vertical displacement, respectively) or an affine-based motion model with four or six motion parameters may no longer work very efficiently. Instead, if the projection geometry of the current picture differs from that of its temporal reference picture, a geometry conversion may be performed to align the projection geometry between the current picture and its temporal reference before existing temporal motion-compensated prediction is applied. This may increase temporal prediction efficiency, albeit at the cost of higher computational complexity. When more than one temporal reference picture is used in motion-compensated prediction (e.g., bi-prediction), the projection geometry may be aligned between the current picture and all of its reference pictures before motion-compensated prediction is performed.
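The alignment step can be outlined as below. This is only a control-flow sketch: `align_references` and `tag_conversion` are hypothetical names, pictures are represented by strings, and the actual geometry conversion (re-projection of samples from one format to another) is replaced by a stand-in that records which conversion would be performed.

```python
def align_references(current_format, references, convert):
    """Return reference pictures expressed in the projection format of the current picture.

    references is a list of (picture, projection_format) pairs. convert is called
    only for references whose format differs from current_format.
    """
    aligned = []
    for picture, fmt in references:
        if fmt != current_format:
            picture = convert(picture, fmt, current_format)
        aligned.append(picture)
    return aligned


def tag_conversion(picture, src_format, dst_format):
    """Stand-in for a real geometry conversion: records the requested conversion."""
    return f"{picture}[{src_format}->{dst_format}]"


# Bi-prediction with two references, one of which uses a different projection format.
refs = [("ref0", "equirectangular"), ("ref1", "cubemap")]
aligned = align_references("cubemap", refs, tag_conversion)
print(aligned)
```

Converting only the references whose format differs keeps the extra computation limited to pictures where the format actually changes.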

In an exemplary embodiment, the semantics of the coding_region_table() syntax structure may be as follows:

full_sphere_range_coding_flag: specifies whether the full sphere range is coded or whether only a portion of it is coded. When not present, the value of full_sphere_range_coding_flag may be inferred to be equal to 1.

pos_x_in_360_packed_frame: specifies the x coordinate of the top-left corner of the coded picture in the frame-packed picture.

pos_y_in_360_packed_frame: specifies the y coordinate of the top-left corner of the coded picture in the frame-packed picture.

Due to various constraints, such as bandwidth or memory limitations or decoding capabilities, only a portion of the full sphere may be coded. This information may be signaled using full_sphere_range_coding_flag and the associated pos_x_in_360_packed_frame and pos_y_in_360_packed_frame parameters. When full_sphere_range_coding_flag is set to 0, only a rectangular portion of the full frame-packed picture is coded. The associated pos_x_in_360_packed_frame and pos_y_in_360_packed_frame parameters are then used to signal the top-left corner of the coded picture within the frame-packed picture.
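A minimal sketch of how a decoder might interpret these three parameters is shown below. The `CodingRegion` container, the `coded_rectangle` helper, and the 2048×1536 packed-frame size with a 512×512 coded picture are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class CodingRegion:
    """Values of the coding_region_table() parameters, with their inferred defaults."""
    full_sphere_range_coding_flag: int = 1
    pos_x_in_360_packed_frame: int = 0
    pos_y_in_360_packed_frame: int = 0


def coded_rectangle(region, coded_w, coded_h, packed_w, packed_h):
    """Return (x, y, width, height) of the coded picture inside the frame-packed picture."""
    if region.full_sphere_range_coding_flag:
        return (0, 0, packed_w, packed_h)
    x = region.pos_x_in_360_packed_frame
    y = region.pos_y_in_360_packed_frame
    if x < 0 or y < 0 or x + coded_w > packed_w or y + coded_h > packed_h:
        raise ValueError("coded picture does not fit inside the frame-packed picture")
    return (x, y, coded_w, coded_h)


# Hypothetical example: only one 512x512 face of a 2048x1536 packed frame is coded.
region = CodingRegion(full_sphere_range_coding_flag=0,
                      pos_x_in_360_packed_frame=1024,
                      pos_y_in_360_packed_frame=512)
rect = coded_rectangle(region, 512, 512, 2048, 1536)
print(rect)
```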

Figures 13A and 13B illustrate the use of limited sphere range coding for the cubemap (Figure 13A) and equirectangular (Figure 13B) projections. In these examples, only the front region is coded. Note that when limited sphere range coding is used, the constraints linking the face width/height to the coded picture width/height should be disabled. In Figure 13A, the full picture represents the frame-packed picture, and rectangle 1305 delimits the coded region. In Figure 13B, the full picture represents the frame-packed picture, and rectangle 1310 delimits the coded region.

The coding_region_table() may also be signaled at the VPS and/or PPS level for each projection format.

Note that some of the parameters defined at the SPS and/or VPS level may alternatively or additionally be signaled at the PPS level. For example, it may be particularly advantageous to signal the face QP offset parameters at the PPS level rather than at the VPS or SPS level, because this allows more flexibility to adjust the quality of each individual face at the picture level. For instance, it allows the quality of each individual face to be adjusted flexibly according to the temporal level of the current frame-packed picture in a hierarchical B prediction structure. At higher temporal levels, the face QP offset may be set to a larger value for non-front faces and to a smaller value (e.g., 0) for the front face. This can ensure that the front face is always coded at a relatively high, constant quality regardless of the temporal level of the current picture, while the other faces of higher temporal level pictures may be quantized more heavily to save bits.
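One possible rule for deriving picture-level face QP offsets from the temporal level is sketched below. The linear rule, the step of 2 per temporal level, and the choice of face index 4 as the front face are assumptions made for illustration; an encoder is free to choose the offsets differently.

```python
def face_qp_offsets(temporal_level, num_faces, front_face, step=2):
    """Per-face QP offsets for one picture.

    The front face keeps offset 0 at every temporal level; every other face is
    offset by step * temporal_level, so it is quantized more coarsely at higher levels.
    """
    return [0 if face == front_face else step * temporal_level
            for face in range(num_faces)]


# Offsets that could be carried in the PPS for temporal levels 0 to 3 of a
# hierarchical B structure, for a six-face cubemap whose front face has index 4.
for level in range(4):
    print(level, face_qp_offsets(level, num_faces=6, front_face=4))
```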

Similarly, the geometry rotation parameters (e.g., geometry_rotation_yaw, geometry_rotation_pitch, and geometry_rotation_roll) may be defined and signaled at the PPS level rather than at the VPS or SPS level, because this allows more flexibility to adjust the geometry rotation at the picture level. In some embodiments, a recommended viewing direction is selected for the content being encoded (e.g., selected by the director of the video content), and the recommended viewing direction may change over the course of the video. In such embodiments, the geometry rotation parameters may be set according to the recommended viewing direction and coupled with the face QP offset parameters so that the object or region of interest is projected onto the face coded at the highest quality.
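The following sketch shows how three rotation angles could be turned into a rotation of the projection geometry. The axis convention is an assumption: yaw is taken about the z axis, pitch about the y axis, and roll about the x axis, composed as Rz(yaw)·Ry(pitch)·Rx(roll), with angles in degrees. The text does not fix these conventions, and the function names are hypothetical.

```python
import math


def rotation_matrix(yaw_deg, pitch_deg, roll_deg):
    """3x3 rotation matrix Rz(yaw) @ Ry(pitch) @ Rx(roll), under the assumed convention."""
    y, p, r = (math.radians(angle) for angle in (yaw_deg, pitch_deg, roll_deg))
    cy, sy = math.cos(y), math.sin(y)
    cp, sp = math.cos(p), math.sin(p)
    cr, sr = math.cos(r), math.sin(r)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def rotate(matrix, vector):
    """Apply a 3x3 matrix to a 3D vector."""
    return tuple(sum(matrix[i][k] * vector[k] for k in range(3)) for i in range(3))


# Rotating the geometry by 90 degrees of yaw moves the +x axis onto the +y axis.
rotated = rotate(rotation_matrix(90, 0, 0), (1.0, 0.0, 0.0))
print(rotated)
```

With such a matrix, an encoder could orient the geometry so that the recommended viewing direction falls at the center of the face that carries the smallest QP offset.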

Figures 14A and 14B illustrate exemplary alternative arrangements of faces in a frame-packed picture. Figures 14A and 14B each show an arrangement of six faces, such as may be used in conjunction with a cubemap projection. The arrangements of faces in Figures 14A and 14B may be employed as user-specified geometries using the embodiments disclosed herein.

The exemplary embodiments disclosed herein may be implemented using one or more wired and/or wireless network nodes, such as a wireless transmit/receive unit (WTRU) or other network entity.

Figure 15 is a system diagram of an exemplary WTRU 1502, which may be employed as an encoder or decoder in the embodiments described herein. As shown in Figure 15, the WTRU 1502 may include a processor 1518, a communication interface 1519 including a transceiver 1520, a transmit/receive element 1522, a speaker/microphone 1524, a keypad 1526, a display/touchpad 1528, a non-removable memory 1530, a removable memory 1532, a power source 1534, a global positioning system (GPS) chipset 1536, and sensors 1538. It will be appreciated that the WTRU 1502 may include any sub-combination of the foregoing elements while remaining consistent with the embodiments above.

The processor 1518 may be a general-purpose processor, a special-purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) circuit, any other type of integrated circuit (IC), a state machine, and the like. The processor 1518 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 1502 to operate in a wireless environment. The processor 1518 may be coupled to the transceiver 1520, which may be coupled to the transmit/receive element 1522. Although Figure 15 depicts the processor 1518 and the transceiver 1520 as separate components, it will be appreciated that the processor 1518 and the transceiver 1520 may be integrated together in an electronic package or chip.

The transmit/receive element 1522 may be configured to transmit signals to, or receive signals from, a base station over the air interface 1516. For example, in one embodiment, the transmit/receive element 1522 may be an antenna configured to transmit and/or receive RF signals. In another embodiment, the transmit/receive element 1522 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example. In yet another embodiment, the transmit/receive element 1522 may be configured to transmit and receive both RF and light signals. It will be appreciated that the transmit/receive element 1522 may be configured to transmit and/or receive any combination of wireless signals.

Although the transmit/receive element 1522 is depicted in Figure 15 as a single element, the WTRU 1502 may include any number of transmit/receive elements 1522. More specifically, the WTRU 1502 may employ MIMO technology. Thus, in one embodiment, the WTRU 1502 may include two or more transmit/receive elements 1522 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 1516.

The transceiver 1520 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 1522 and to demodulate the signals that are received by the transmit/receive element 1522. As noted above, the WTRU 1502 may have multi-mode capabilities. Thus, the transceiver 1520 may include multiple transceivers for enabling the WTRU 1502 to communicate via multiple RATs, such as UTRA and IEEE 802.11.

The processor 1518 of the WTRU 1502 may be coupled to, and may receive user input data from, the speaker/microphone 1524, the keypad 1526, and/or the display/touchpad 1528 (e.g., a liquid crystal display (LCD) display unit or an organic light-emitting diode (OLED) display unit). The processor 1518 may also output user data to the speaker/microphone 1524, the keypad 1526, and/or the display/touchpad 1528. In addition, the processor 1518 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 1530 and/or the removable memory 1532. The non-removable memory 1530 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 1532 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 1518 may access information from, and store data in, memory that is not physically located on the WTRU 1502, such as on a server or a home computer (not shown).

The processor 1518 may receive power from the power source 1534 and may be configured to distribute and/or control the power to the other components in the WTRU 1502. The power source 1534 may be any suitable device for powering the WTRU 1502. For example, the power source 1534 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.

The processor 1518 may also be coupled to the GPS chipset 1536, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 1502. In addition to, or in lieu of, the information from the GPS chipset 1536, the WTRU 1502 may receive location information over the air interface 1516 from a base station and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 1502 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.

The processor 1518 may further be coupled to other peripherals 1538, which may include one or more software and/or hardware modules that provide additional features, functionality, and/or wired or wireless connectivity. For example, the peripherals 1538 may include sensors such as an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs and/or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands-free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.

Figure 16 depicts an exemplary network entity 1590 that may be used in embodiments of the present disclosure, for example as an encoder or decoder. As shown in Figure 16, the network entity 1590 includes a communication interface 1592, a processor 1594, and non-transitory data storage 1596, all of which are communicatively linked by a bus, network, or other communication path 1598.

The communication interface 1592 may include one or more wired communication interfaces and/or one or more wireless communication interfaces. With respect to wired communication, the communication interface 1592 may include one or more interfaces such as Ethernet interfaces, as an example. With respect to wireless communication, the communication interface 1592 may include components such as one or more antennas, one or more transceivers/chipsets designed and configured for one or more types of wireless (e.g., LTE) communication, and/or any other components deemed suitable by those of skill in the relevant art. Further with respect to wireless communication, the communication interface 1592 may be equipped at a scale and with a configuration appropriate for acting on the network side, as opposed to the client side, of wireless communications (e.g., LTE communications, Wi-Fi communications, and the like). Thus, the communication interface 1592 may include the appropriate equipment and circuitry (perhaps including multiple transceivers) for serving multiple mobile stations, UEs, or other access terminals in a coverage area.

The processor 1594 may include one or more processors of any type deemed suitable by those of skill in the relevant art, some examples including a general-purpose microprocessor and a dedicated DSP.

The data storage 1596 may take the form of any non-transitory computer-readable medium or combination of such media, some examples including flash memory, read-only memory (ROM), and random-access memory (RAM), to name but a few, as any one or more types of non-transitory data storage deemed suitable by those of skill in the relevant art may be used. As shown in Figure 16, the data storage 1596 contains program instructions 1597 executable by the processor 1594 for carrying out various combinations of the various network-entity functions described herein.

Note that various hardware elements of one or more of the described embodiments are referred to as "modules" that carry out (i.e., perform, execute, and the like) various functions that are described herein in connection with the respective modules. As used herein, a module includes hardware (e.g., one or more processors, one or more microprocessors, one or more microcontrollers, one or more microchips, one or more application-specific integrated circuits (ASICs), one or more field programmable gate arrays (FPGAs), one or more memory devices) deemed suitable by those of skill in the relevant art for a given implementation. Each described module may also include instructions executable for carrying out the one or more functions described as being carried out by the respective module. Those instructions may take the form of or include hardware (i.e., hardwired) instructions, firmware instructions, software instructions, and/or the like, and may be stored in any suitable non-transitory computer-readable medium or media, such as those commonly referred to as RAM, ROM, etc.

Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable storage medium for execution by a computer or processor. Examples of computer-readable storage media include, but are not limited to, read-only memory (ROM), random-access memory (RAM), registers, cache memory, semiconductor memory devices, magnetic media (e.g., internal hard disks and removable disks), magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.

20‧‧‧Blank area
22‧‧‧Line
102‧‧‧Input video signal
104‧‧‧Transform
106‧‧‧Quantization
108, 208‧‧‧Entropy coding unit
110, 210‧‧‧Inverse quantization
112, 212‧‧‧Inverse transform
116‧‧‧Video block
120, 202‧‧‧Bitstream
126‧‧‧Prediction block
160, 260‧‧‧Spatial prediction
162, 262‧‧‧Temporal prediction
164, 264‧‧‧Reference picture store
166‧‧‧Video block application
180‧‧‧Mode decision block
402‧‧‧Doorway
502‧‧‧360-degree video capture
508‧‧‧Frame packing
510‧‧‧Encoding
520‧‧‧Head-mounted display
1305, 1310‧‧‧Rectangle
1502‧‧‧Wireless transmit/receive unit (WTRU)
1516‧‧‧Air interface
1522‧‧‧Transmit/receive element
1518, 1594‧‧‧Processor
1520‧‧‧Transceiver
1519, 1592‧‧‧Communication interface
1524‧‧‧Speaker/microphone
1526‧‧‧Keypad
1528‧‧‧Display/touchpad
1530‧‧‧Non-removable memory
1532‧‧‧Removable memory
1534‧‧‧Power source
1536‧‧‧Global positioning system (GPS) chipset
1538‧‧‧Sensors, peripherals
1590‧‧‧Network entity
1596‧‧‧Non-transitory data storage
1597‧‧‧Instructions
A0, A1, A2, A3, A4, A5, A6‧‧‧Horizontal lines
FOV‧‧‧Field of view
P, Pf, Ps, q‧‧‧Points
NX, NXNY, NXPY, NY, NYNZ, NYPZ, NZ, PX, PXNY, PXPY, PY, PYNZ, PYPZ, PZ‧‧‧Faces
φ‧‧‧Longitude
θ‧‧‧Latitude

A more detailed understanding may be had from the following description, presented by way of example in conjunction with the accompanying drawings, in which: Figure 1A shows sphere sampling in longitude and latitude on a sphere geometry for equirectangular projection. Figure 1B shows the 2D planar equirectangular projection for the sampling in Figure 1A, in which a point P on the sphere in Figure 1A is projected to a point q in the 2D plane. Figure 1C is a schematic illustration of an example picture with equirectangular projection. Figure 2A shows a cubemap projection on a 3D geometry with faces PX (0), NX (1), PY (2), NY (3), PZ (4), and NZ (5). Figure 2B shows the 2D plane for the six faces defined in Figure 2A. Figure 2C schematically shows an example picture with cubemap projection.
Figure 3A shows sphere sampling in an equal-area manner for equal-area projection. Figure 3B shows the 2D plane of the equal-area projection of Figure 3A, in which a point p on the sphere is projected to a point q in the 2D plane, and the latitudes of the horizontal lines (A0, A1, A2, etc.) are unequally spaced. Figure 3C schematically shows an example picture with equal-area projection. Figure 4A shows an octahedron projection with a 3D geometry. Figure 4B shows the 2D planar packing of the 3D structure of Figure 4A. Figure 4C schematically shows an example picture with octahedron projection. Figure 5 shows one embodiment of a 360-degree video processing workflow. Figure 6 shows one embodiment of a functional block diagram of a block-based video encoder. Figure 7 shows one embodiment of a functional block diagram of a video decoder. Figure 8A shows one embodiment of a physical layout of a cubemap projection format. Figure 8B shows one embodiment of a physical layout of an octahedron projection format. Figure 9A shows a cubemap represented in a 4×3 format. Figure 9B shows a cubemap represented in a 3×2 format. Figure 9C shows a cubemap represented in a 3×3 format in which the front face is twice the size (four times the area) of the other faces; in this case, the front face extends over two rows and two columns. Figures 10A through 10H show the definition of face rotation for triangular faces: Figure 10A: 0° rotation; Figure 10B: 90° rotation; Figure 10C: 180° rotation; Figure 10D: 270° rotation; Figure 10E: 0° rotation followed by a vertical flip; Figure 10F: 90° rotation followed by a vertical flip; Figure 10G: 180° rotation followed by a vertical flip; Figure 10H: 270° rotation followed by a vertical flip. Figure 11A shows a non-compact frame-packing format for an octahedron. Figure 11B shows a compact frame-packing format for an octahedron. Figure 12A shows a non-compact frame-packing format for an icosahedron. Figure 12B shows a compact frame-packing format for an icosahedron.
Figure 13A shows limited sphere range coding for a cubemap, in which the full picture represents the frame-packed picture and the rectangle delimits the coded area. Figure 13B shows limited sphere range coding for equirectangular projection, in which the full picture represents the frame-packed picture and the rectangle delimits the coded area. Figures 14A and 14B show exemplary alternative arrangements of faces in a frame-packed picture, each showing an arrangement of six faces such as may be used in conjunction with cubemap projection. Figure 15 shows an exemplary wireless transmit/receive unit (WTRU) that may be used as an encoder or decoder in some embodiments. Figure 16 shows an exemplary network entity that may be used as an encoder or decoder in some embodiments.
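The equirectangular mapping of Figures 1A and 1B, in which a point on the sphere with longitude φ and latitude θ is projected to a point in the 2D plane, can be illustrated with a minimal Python sketch. The normalization used here (u = φ/(2π) + 0.5, v = 0.5 − θ/π, with φ in [−π, π] and θ in [−π/2, π/2]) is a commonly used convention and is an assumption of this sketch, not a formula taken from this passage; the function names are hypothetical.

```python
import math

def sphere_to_erp(phi, theta, width, height):
    """Map a sphere point (longitude phi, latitude theta, in radians) to
    continuous pixel coordinates (x, y) in an equirectangular picture.

    Assumed convention: phi in [-pi, pi], theta in [-pi/2, pi/2],
    x increasing with longitude, y increasing downward from the north pole.
    """
    u = phi / (2 * math.pi) + 0.5   # normalized horizontal coordinate in [0, 1]
    v = 0.5 - theta / math.pi       # normalized vertical coordinate in [0, 1]
    return u * width, v * height

def erp_to_sphere(x, y, width, height):
    """Inverse mapping: pixel coordinates back to (phi, theta) in radians."""
    phi = (x / width - 0.5) * 2 * math.pi
    theta = (0.5 - y / height) * math.pi
    return phi, theta

if __name__ == "__main__":
    # The point at longitude 0, latitude 0 lands at the picture center.
    print(sphere_to_erp(0.0, 0.0, 4096, 2048))  # (2048.0, 1024.0)
```

Because longitude and latitude map linearly to the horizontal and vertical axes, rows near the poles are sampled far more densely on the sphere than rows near the equator, which is one motivation for the alternative geometries shown in the later figures.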

Claims

A method for decoding 360-degree video encoded in a bitstream, the method comprising: receiving a bitstream encoding a 2D planar video, the bitstream including parameters identifying a projection geometry format; and mapping the 2D planar video to a 360-degree video using the identified projection geometry format.

The method of claim 1, wherein the bitstream further includes a parameter indicating whether the bitstream encodes 360-degree video, and wherein the mapping of the 2D planar video to the 360-degree video is performed only if the parameter indicates that the bitstream represents 360-degree video.

The method of claim 1, wherein the projection format comprises a projection geometry type, and wherein the parameter identifying the projection format comprises a parameter identifying the projection geometry type.

The method of claim 3, wherein the parameter identifying the projection geometry type comprises an index of the identified projection geometry type.

The method of claim 3, wherein the parameter identifying the projection geometry type identifies a geometry type selected from one or more of the following: equirectangular, cubemap, equal-area, octahedron, icosahedron, cylinder, and user-specified polygon.

The method of claim 3, wherein the identified projection geometry type has a plurality of faces, and wherein the parameter identifying the projection geometry type includes an indication of the number of faces.

The method of claim 3, wherein the identified projection geometry type has a plurality of faces, and wherein the parameter identifying the projection geometry type comprises frame-packing parameters identifying an arrangement of the plurality of faces in the 2D planar video.

The method of claim 1, wherein the identified projection format has a plurality of faces, and wherein the bitstream further comprises parameters identifying quality levels of the plurality of faces in the 2D planar video.

The method of claim 1, wherein the projection geometry format comprises a geometric orientation of a projection geometry, and wherein the parameters identifying the projection geometry format comprise a parameter identifying the geometric orientation.

The method of claim 9, wherein the parameter identifying the geometric orientation comprises at least one of: a yaw parameter, a pitch parameter, and a roll parameter.

The method of claim 9, wherein the parameter identifying the geometric orientation comprises a parameter identifying the geometric orientation of an equirectangular projection, and wherein the mapping of the 2D planar video to the 360-degree video is performed using an equirectangular projection with the identified geometric orientation.

The method of any one of claims 1 to 11, wherein the parameter identifying the projection format is received in at least one video parameter set of the bitstream.

The method of any one of claims 1 to 11, wherein the parameter identifying the projection format is received in at least one sequence parameter set of the bitstream.

A method for encoding 360-degree video, the method comprising: selecting a projection geometry format; mapping the 360-degree video to a 2D planar video using the selected projection geometry format; encoding the 2D planar video in a bitstream; and signaling, in the bitstream, parameters identifying the projection geometry format.

The method of claim 14, further comprising transmitting, in the bitstream, a parameter indicating that the bitstream encodes 360-degree video.

The method of claim 14, wherein selecting the projection geometry format comprises selecting a geometric orientation of a projection geometry, and wherein the parameters signaled in the bitstream include a parameter identifying the selected geometric orientation.

The method of claim 16, wherein the parameter identifying the geometric orientation comprises a parameter identifying the geometric orientation of an equirectangular projection, and wherein the mapping of the 360-degree video to the 2D planar video is performed using an equirectangular projection with the identified geometric orientation.

The method of claim 16, wherein the projection geometry comprises a plurality of faces, and wherein the geometric orientation of the projection geometry is selected so as to substantially maximize the portion of a selected region of interest within the 360-degree video that falls within a selected one of the plurality of faces.

The method of claim 16, wherein the projection geometry comprises a plurality of faces including at least one face encoded with a higher quality level than at least one other face, and wherein the geometric orientation of the projection geometry is selected such that a substantial portion of a selected region of interest within the 360-degree video falls within the face having the higher quality level.

The method of claim 14, wherein selecting the projection format comprises selecting a geometry type, and wherein the parameter communicated in the bitstream includes a parameter identifying the selected geometry type.