HK1237170B - Design of sample entry and operation point signalling in a layered video file format
- Publication number: HK1237170B
- Application number: HK17111039.1A
- Authority: HK (Hong Kong)
Description
This application claims the benefit of U.S. Provisional Patent Application No. 62/115,075, filed February 11, 2015, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to video coding.
Background Art
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones (so-called "smartphones"), video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard currently under development, and extensions to these standards. By implementing these video compression techniques, video devices can more efficiently transmit, receive, encode, decode, and/or store digital video information.
Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.
After the video data has been encoded, the video data may be packetized for transmission or storage. The video data may be assembled into a video file that conforms to any of a variety of standards, such as the International Organization for Standardization (ISO) base media file format and extensions thereof, such as AVC.
Summary of the Invention
Generally, the present invention relates to the storage of video content in files. In some examples, the techniques of this invention are based on the International Organization for Standardization (ISO) Base Media File Format (ISOBMFF). Some examples of this invention relate to storage of video streams containing multiple coded layers, where each layer can be a scalable layer, a texture view, a depth view, etc., and the methods are applicable to storing Multi-View High Efficiency Video Coding (MV-HEVC), Scalable HEVC (SHVC), Three-Dimensional HEVC (3D-HEVC), and other types of video data.
In one example, a method of processing multi-layer video data includes obtaining the multi-layer video data; storing the multi-layer video data in a file format; storing representation format information for each operation point of the multi-layer video data in an operation point information (oinf) box of the file format; and generating a file of video data formatted according to the file format.
In another example, a method of processing multi-layer video data includes obtaining a file of multi-layer video data formatted according to a file format; determining representation format information for each operation point of the multi-layer video data in an operation point information (oinf) box of the file format; and decoding the multi-layer video data based on the determined representation format information.
In another example, a video device for processing multi-layer video data includes: a data storage medium configured to store the multi-layer video data; and one or more processors configured to: obtain the multi-layer video data; store the multi-layer video data in a file format; store representation format information for each operation point of the multi-layer video data in an operation point information (oinf) box of the file format; and generate a file of video data formatted according to the file format.
In another example, a video device for processing multi-layer video data includes: a data storage medium configured to store the multi-layer video data; and one or more processors configured to: obtain a file of multi-layer video data formatted according to a file format; determine representation format information for each operation point of the multi-layer video data in an operation point information (oinf) box of the file format; and decode the multi-layer video data based on the determined representation format information.
In another example, a video device for processing multi-layer video data includes: means for obtaining the multi-layer video data; means for storing the multi-layer video data in a file format; means for storing representation format information for each operation point of the multi-layer video data in an operation point information (oinf) box of the file format; and means for generating a file of video data formatted according to the file format.
In another example, a computer-readable storage medium stores instructions that, when executed, cause one or more processors to: obtain multi-layer video data; store the multi-layer video data in a file format; store representation format information for each operation point of the multi-layer video data in an operation point information (oinf) box of the file format; and generate a file of video data formatted according to the file format.
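As a concrete illustration of the storing step in the examples above, the following Python sketch serializes per-operation-point representation format information and wraps it in a box labeled 'oinf'. The field layout (an output-layer-set index followed by width, height, and bit depth) and the helper names are our own assumptions for illustration only, not the normative 'oinf' box syntax.

```python
import struct

def build_oinf_payload(operation_points):
    """Serialize representation format information for each operation
    point (hypothetical layout): a 16-bit entry count, then per entry a
    16-bit output-layer-set index, 16-bit width, 16-bit height, and an
    8-bit bit depth, all big-endian as is conventional in ISOBMFF."""
    out = bytearray(struct.pack(">H", len(operation_points)))
    for op in operation_points:
        out += struct.pack(">HHHB", op["ols_idx"], op["width"],
                           op["height"], op["bit_depth"])
    return bytes(out)

def wrap_in_box(box_type, payload):
    """Prefix a payload with a standard ISOBMFF box header
    (32-bit size including the header, then the 4-byte type)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

ops = [{"ols_idx": 0, "width": 1920, "height": 1080, "bit_depth": 8},
       {"ols_idx": 1, "width": 3840, "height": 2160, "bit_depth": 10}]
box = wrap_in_box(b"oinf", build_oinf_payload(ops))
print(len(box))  # → 24 (8-byte header + 2-byte count + two 7-byte entries)
```

A reader that knows this layout can locate the representation format of any operation point from the box alone, without parsing the coded bitstream.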
The details of one or more examples of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may use the techniques described in this disclosure.
FIG. 2 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.
FIG. 3 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.
FIG. 4 is a block diagram illustrating an example set of devices forming part of a network.
FIG. 5A is a conceptual diagram illustrating an example structure of a file, in accordance with one or more techniques of this disclosure.
FIG. 5B is a conceptual diagram illustrating an example structure of a file, in accordance with one or more techniques of this disclosure.
FIG. 6 is a conceptual diagram illustrating an example structure of a file, in accordance with one or more techniques of this disclosure.
FIG. 7 is a flowchart illustrating example operation of a file generation device, in accordance with one or more techniques of this disclosure.
FIG. 8 is a flowchart illustrating example operation of a file reading device, in accordance with one or more techniques of this disclosure.
DETAILED DESCRIPTION
The ISO Base Media File Format (ISOBMFF) is a file format for storing media data. ISOBMFF is extensible to support the storage of video data conforming to specific video coding standards. For example, ISOBMFF has previously been extended to support the storage of video data conforming to the H.264/AVC and High Efficiency Video Coding (HEVC) video coding standards. Furthermore, ISOBMFF has previously been extended to support the storage of video data conforming to the Multi-View Coding (MVC) and Scalable Video Coding (SVC) extensions of H.264/AVC. MV-HEVC, 3D-HEVC, and SHVC are extensions of the HEVC video coding standard that support multi-layer video data. The features added to ISOBMFF for the storage of video data conforming to the MVC and SVC extensions of H.264/AVC are insufficient for the efficient storage of video data conforming to MV-HEVC, 3D-HEVC, and SHVC. In other words, various problems might arise if the ISOBMFF extensions designed for storing video data compliant with the MVC and SVC extensions of H.264/AVC were used to store video data compliant with MV-HEVC, 3D-HEVC, and SHVC.
For example, unlike bitstreams conforming to the MVC or SVC extensions of H.264/AVC, bitstreams conforming to MV-HEVC, 3D-HEVC, or SHVC may include access units containing both intra random access point (IRAP) pictures and non-IRAP pictures. Access units containing IRAP pictures and non-IRAP pictures can be used for random access in MV-HEVC, 3D-HEVC, and SHVC. However, ISOBMFF and its existing extensions do not provide a way to identify such access units. This situation can hinder the ability of computing devices to perform random access, layer switching, and other such functions associated with multi-layer video data.
Although much of the description of the techniques of this disclosure describes MV-HEVC, 3D-HEVC, and SHVC, the reader should appreciate that the techniques of this disclosure may be applicable to other video coding standards and/or extensions thereof.
As explained in more detail below, files conforming to the HEVC file format may contain a series of objects, referred to as boxes. A box may be an object-oriented building block defined by a unique type identifier and a length. This disclosure describes techniques related to generating files according to the file format, and more specifically, describes techniques for locating certain types of information within certain boxes to potentially improve a playback device's ability to process files containing multiple operation points.
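The box structure just described (a unique type identifier plus a length) can be read with a few lines of code. The following Python sketch, whose helper names are our own, parses the generic box header of ISOBMFF (a 32-bit big-endian size followed by a four-character type, with a size of 1 signalling a 64-bit "largesize") and walks the top-level boxes of a buffer:

```python
import struct

def parse_box_header(data, offset=0):
    """Parse an ISOBMFF box header: 32-bit big-endian size, then a
    4-character type code. A size of 1 means a 64-bit largesize follows."""
    size, = struct.unpack_from(">I", data, offset)
    box_type = data[offset + 4:offset + 8].decode("ascii")
    header_len = 8
    if size == 1:  # 64-bit largesize escape
        size, = struct.unpack_from(">Q", data, offset + 8)
        header_len = 16
    return box_type, size, header_len

def iter_top_level_boxes(data):
    """Yield (type, payload) for each top-level box in the buffer."""
    offset = 0
    while offset < len(data):
        box_type, size, header_len = parse_box_header(data, offset)
        yield box_type, data[offset + header_len:offset + size]
        offset += size

# A tiny hand-built file: an 'ftyp' box followed by an empty 'mdat' box.
ftyp = struct.pack(">I4s", 16, b"ftyp") + b"isom" + struct.pack(">I", 0)
mdat = struct.pack(">I4s", 8, b"mdat")
boxes = list(iter_top_level_boxes(ftyp + mdat))
print([t for t, _ in boxes])  # → ['ftyp', 'mdat']
```

Because every box is self-describing in this way, a reader can skip boxes whose type it does not recognize, which is what makes the format extensible.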
FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may use the techniques described in this disclosure. As shown in FIG. 1, system 10 includes a source device 12 that generates encoded video data to be later decoded by a destination device 14. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets (such as so-called "smart" phones), so-called "smart" pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication. Source device 12 and destination device 14 may be considered video devices.
In the example of FIG. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. In some cases, output interface 22 may include a modulator/demodulator (modem) and/or a transmitter. In source device 12, video source 18 may include a source such as a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface for receiving video from a video content provider, and/or a computer graphics system for generating computer graphics data as source video, or a combination of such sources. However, the techniques described in this disclosure may be applicable to video coding in general and may be applied to wireless and/or wired applications.
Video encoder 20 may encode captured, pre-captured, or computer-generated video. Source device 12 may transmit the encoded video data directly to destination device 14 via output interface 22 of source device 12. The encoded video data may also (or alternatively) be stored on storage device 33 for later access by destination device 14 or other devices for decoding and/or playback.
Destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In some cases, input interface 28 may include a receiver and/or a modem. Input interface 28 of destination device 14 receives encoded video data via link 16. The encoded video data communicated via link 16 or provided on storage device 33 may include a variety of syntax elements generated by video encoder 20 for use by a video decoder, such as video decoder 30, in decoding the video data. Such syntax elements may be included within the encoded video data transmitted on a communication medium, stored on a storage medium, or stored on a file server.
Display device 32 may be integrated with destination device 14 or external to destination device 14. In some examples, destination device 14 may include an integrated display device and may also be configured to interface with an external display device. In other examples, destination device 14 may be a display device. In general, display device 32 displays decoded video data to a user and may include any of a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the techniques are implemented in part in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in the respective device.
Destination device 14 may receive the encoded video data to be decoded via link 16. Link 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, link 16 may comprise a communication medium that enables source device 12 to transmit the encoded video data directly to destination device 14 in real time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment suitable for facilitating communication from source device 12 to destination device 14.
Alternatively, output interface 22 may output the encoded data to storage device 33. Similarly, input interface 28 may access the encoded data from storage device 33. Storage device 33 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In another example, storage device 33 may correspond to a file server or another intermediate storage device that can hold the encoded video generated by source device 12. Destination device 14 may access the stored video data from storage device 33 via streaming or downloading. A file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, a network attached storage (NAS) device, or a local disk drive. Destination device 14 may access the encoded video data via any standard data connection, including an Internet connection. This data connection may include a wireless channel (e.g., a Wi-Fi connection) suitable for accessing encoded video data stored on a file server, a wired connection (e.g., DSL, cable modem, etc.), or a combination of both. The transmission of the encoded video data from storage device 33 may be a streaming transmission, a download transmission, or a combination of both.
The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applicable to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the Internet), encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
Furthermore, in the example of FIG. 1, video coding system 10 may include file generation device 34. File generation device 34 may receive the encoded video data generated by source device 12 and generate a file that includes the encoded video data. Destination device 14 may receive the file generated by file generation device 34 directly or via storage device 33. In various examples, file generation device 34 may comprise various types of computing devices. For example, file generation device 34 may comprise a media-aware network element (MANE), a server computing device, a personal computing device, a special-purpose computing device, a commercial computing device, or another type of computing device. In some examples, file generation device 34 is part of a content delivery network. File generation device 34 may receive the encoded video data from source device 12 via a channel such as link 16. Furthermore, destination device 14 may receive the file from file generation device 34 via a channel such as link 16.
In some configurations, file generation device 34 may be a video device separate from source device 12 and destination device 14, while in other configurations, file generation device 34 may be implemented as a component of source device 12 or destination device 14. In implementations where file generation device 34 is a component of source device 12 or destination device 14, file generation device 34 may share some of the same resources, such as memory, processors, and other hardware, utilized by video encoder 20 and video decoder 30. In implementations where file generation device 34 is a separate device, the file generation device may include its own memory, processors, and other hardware units.
In other examples, source device 12 or another computing device may generate a file that includes the encoded video data. However, for ease of explanation, this disclosure describes file generation device 34 as generating the file. Nevertheless, it should be understood that such descriptions apply to computing devices in general.
Video encoder 20 and video decoder 30 may operate according to a video compression standard such as the High Efficiency Video Coding (HEVC) standard or its extensions. The HEVC standard may also be referred to as ISO/IEC 23008-2. The design of HEVC was recently finalized by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The latest HEVC draft specification (hereinafter referred to as HEVC WD) is available at http://phenix.int-evry.fr/jct/doc_end_user/documents/14_Vienna/wg11/JCTVC-N1003-v1.zip. A multi-view extension to HEVC, MV-HEVC, is also being developed by JCT-3V. The most recent working draft (WD) of MV-HEVC, entitled "MV-HEVC Draft Text 5" and hereinafter referred to as MV-HEVC WD5, is available at http://phenix.it-sudparis.eu/jct2/doc_end_user/documents/5_Vienna/wg11/JCT3V-E1004-v6.zip. A scalable extension to HEVC, SHVC, is also being developed by JCT-VC. The most recent working draft (WD) of SHVC, entitled "High efficiency video coding (HEVC) scalable extension draft 3" and hereinafter referred to as SHVC WD3, is available at http://phenix.it-sudparis.eu/jct/doc_end_user/documents/14_Vienna/wg11/JCTVC-N1008-v3.zip. A recent working draft (WD) of the range extensions of HEVC is available at http://phenix.int-evry.fr/jct/doc_end_user/documents/14_Vienna/wg11/JCTVC-N1005-v3.zip. A recent working draft (WD) of the 3D extension of HEVC (3D-HEVC), entitled "3D-HEVC Draft Text 1," is available at http://phenix.int-evry.fr/jct2/doc_end_user/documents/5_Vienna/wg11/JCT3V-E1001-v3.zip. Video encoder 20 and video decoder 30 may operate according to one or more of these standards.
Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4 Part 10, Advanced Video Coding (AVC), or extensions of such standards. However, the techniques of this disclosure are not limited to any particular coding standard. Other examples of video compression standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multi-view video coding (MVC) extensions.
Although not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder and may include appropriate MUX-DEMUX units or other hardware and software to handle the encoding of both audio and video in a common data stream or separate data streams. If applicable, in some examples, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).
The JCT-VC developed the HEVC standard. The HEVC standardization effort was based on an evolving model of a video coding device, referred to as the HEVC Test Model (HM). The HM presumes several additional capabilities of video coding devices relative to existing devices according to, e.g., ITU-T H.264/AVC. For example, whereas H.264/AVC provides nine intra-prediction encoding modes, the HM may provide as many as thirty-three intra-prediction encoding modes.
In general, the working model of the HM describes that a video frame or picture may be divided into a sequence of treeblocks, or largest coding units (LCUs), that include both luma and chroma samples. A treeblock may also be referred to as a coding tree unit (CTU). A treeblock serves a similar purpose to a macroblock of the H.264/AVC standard. A slice includes a number of consecutive treeblocks in coding order. A video frame or picture may be partitioned into one or more slices. Each treeblock may be split into coding units (CUs) according to a quadtree. For example, a treeblock, as a root node of the quadtree, may be split into four child nodes, and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, as a leaf node of the quadtree, comprises a coding node (i.e., a coded video block). Syntax data associated with a coded bitstream may define a maximum number of times a treeblock may be split, and may also define a minimum size of the coding nodes.
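The recursive quadtree partitioning just described can be sketched as follows. The split decision is abstracted into a callback, since in a real encoder it would come from rate-distortion optimization; the function and parameter names are ours, for illustration only.

```python
def split_quadtree(x, y, size, min_size, should_split):
    """Recursively split a treeblock into coding units: each node either
    becomes a leaf CU or splits into four equal child nodes, down to a
    minimum CU size, mirroring the quadtree described above."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):       # visit children in z-scan order
            for dx in (0, half):
                cus += split_quadtree(x + dx, y + dy, half,
                                      min_size, should_split)
        return cus
    return [(x, y, size)]  # leaf node: one coding unit

# Split a 64x64 treeblock once at the root, leaving four 32x32 CUs.
cus = split_quadtree(0, 0, 64, 8, lambda x, y, s: s == 64)
print(cus)  # → [(0, 0, 32), (32, 0, 32), (0, 32, 32), (32, 32, 32)]
```

The `min_size` floor plays the role of the minimum coding node size that the bitstream syntax data may signal, and the recursion depth is bounded by the maximum split count.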
CU包含译码节点以及与所述译码节点相关联的预测单元(PU)及变换单元(TU)。CU的大小对应于译码节点的大小,且形状必须为正方形。CU的大小可在从8×8像素高达具有最大64×64像素或大于64×64像素的树型块的大小的范围内。每一CU可含有一或多个PU及一或多个TU。与CU相关联的语法数据可描述(例如)CU到一或多个PU的分割。分割模式可在CU是经跳过或经直接模式编码、经帧内预测模式编码还是经帧间预测模式编码之间不同。PU可经分割成非正方形形状。与CU相关联的语法数据还可描述(例如)CU根据四分树到一或多个TU的分割。TU的形状可为正方形或非正方形。A CU comprises a coding node and prediction units (PUs) and transform units (TUs) associated with the coding node. The size of the CU corresponds to the size of the coding node and must be square in shape. The size of a CU can range from 8×8 pixels up to the size of a treeblock with a maximum of 64×64 pixels or larger. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, the partitioning of the CU into one or more PUs. The partitioning mode may differ depending on whether the CU is skip or direct mode coded, intra-prediction mode coded, or inter-prediction mode coded. A PU may be partitioned into a non-square shape. Syntax data associated with a CU may also describe, for example, the partitioning of the CU into one or more TUs according to a quadtree. The shape of a TU may be square or non-square.
HEVC标准允许根据TU进行变换,所述变换对于不同CU可不同。通常基于针对经分割LCU所定义的给定CU内的PU的大小而对TU设置大小,但可并非总是此状况。TU的大小通常与PU相同或比PU小。在一些实例中,可使用被称为“残余四分树”(RQT)的四分树结构而将对应于CU的残余样本再分为较小单元。RQT的叶节点可被称作TU。与TU相关联的像素差值可经变换以产生可加以量化的变换系数。The HEVC standard allows for transforms to be performed on a per-TU basis, which may be different for different CUs. TUs are typically sized based on the size of the PUs within a given CU defined for a partitioned LCU, but this may not always be the case. TUs are typically the same size as or smaller than the PUs. In some examples, a quadtree structure known as a "residual quadtree" (RQT) may be used to subdivide the residual samples corresponding to a CU into smaller units. The leaf nodes of the RQT may be referred to as TUs. Pixel difference values associated with a TU may be transformed to produce transform coefficients that may be quantized.
一般来说,PU包含与预测程序相关的数据。举例来说,当PU经帧内模式编码时,PU可包含描述用于PU的帧内预测模式的数据。作为另一实例,当PU经帧间模式编码时,PU可包含定义PU的运动向量的数据。定义PU的运动向量的数据可描述(例如)运动向量的水平分量、运动向量的垂直分量、运动向量的分辨率(例如,四分之一像素精度或八分之一像素精度)、运动向量所指向的参考图片,及/或运动向量的参考图片列表(例如,列表0、列表1或列表C)。In general, a PU includes data related to the prediction process. For example, when the PU is intra-mode encoded, the PU may include data describing the intra-prediction mode used for the PU. As another example, when the PU is inter-mode encoded, the PU may include data defining a motion vector for the PU. The data defining the motion vector for the PU may describe, for example, the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (e.g., quarter-pixel precision or eighth-pixel precision), the reference picture to which the motion vector points, and/or the reference picture list for the motion vector (e.g., List 0, List 1, or List C).
一般来说,TU用于变换及量化程序。具有一或多个PU的给定CU还可包含一或多个变换单元(TU)。在预测之后,视频编码器20可计算对应于PU的残余值。残余值包括像素差值,所述像素差值可变换成变换系数,经量化,且使用TU进行扫描以产生串行化变换系数以供用于熵译码。本发明通常使用术语“视频块”来指CU的译码节点(即,译码块)。在一些特定状况下,本发明还可使用术语“视频块”来指树型块(即,LCU或CU),其包含译码节点以及PU及TU。In general, TUs are used for transform and quantization processes. A given CU having one or more PUs may also include one or more transform units (TUs). After prediction, video encoder 20 may calculate residual values corresponding to the PUs. The residual values include pixel difference values, which can be transformed into transform coefficients, quantized, and scanned using the TUs to produce serialized transform coefficients for use in entropy coding. This disclosure generally uses the term "video block" to refer to a coding node (i.e., a coding block) of a CU. In some specific cases, this disclosure may also use the term "video block" to refer to a treeblock (i.e., an LCU or CU), which includes a coding node as well as a PU and a TU.
视频序列通常包含视频帧或图片系列。图片群组(GOP)大体上包括一系列视频图片中的一或多者。GOP可在GOP的标头、图片中的一或多者的标头或别处中包含语法数据,所述语法数据描述包含在GOP中的图片的数目。图片的每一切片可包含描述所述相应切片的编码模式的切片语法数据。视频编码器20通常对个别视频切片内的视频块进行操作,以便编码视频数据。视频块可对应于CU内的译码节点。视频块可具有固定或变化的大小,且可根据指定译码标准而大小不同。A video sequence typically comprises a series of video frames or pictures. A group of pictures (GOP) generally includes one or more of a series of video pictures. A GOP may include syntax data in a header of the GOP, a header of one or more of the pictures, or elsewhere that describes the number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes the coding mode for the respective slice. Video encoder 20 typically operates on video blocks within individual video slices to encode the video data. A video block may correspond to a coding node within a CU. A video block may have a fixed or varying size and may be sized differently according to a specified coding standard.
作为一实例,HM支持以各种PU大小进行的预测。假设特定CU的大小为2N×2N,则HM支持以2N×2N或N×N的PU大小进行的帧内预测,及以2N×2N、2N×N、N×2N或N×N的对称PU大小进行的帧间预测。HM还支持以2N×nU、2N×nD、nL×2N及nR×2N的PU大小进行的帧间预测的不对称分割。在不对称分割中,CU的一个方向未分割,而另一方向分割成25%及75%。CU的对应于25%分割的部分由“n”其后接着“上(Up)”、“下(Down)”、“左(Left)”或“右(Right)”的指示来指示。因此,例如,“2N×nU”指水平上以顶部的2N×0.5N PU及底部的2N×1.5N PU分割的2N×2N CU。As an example, the HM supports prediction with various PU sizes. Assuming a particular CU is 2N×2N, the HM supports intra prediction with PU sizes of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. The HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In an asymmetric partitioning, the CU is unpartitioned in one direction and partitioned into 25% and 75% in the other direction. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by an indication of "Up," "Down," "Left," or "Right." Thus, for example, "2N×nU" refers to a 2N×2N CU partitioned horizontally with a 2N×0.5N PU at the top and a 2N×1.5N PU at the bottom.
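The symmetric and asymmetric PU partition geometries named above can be made concrete with a small table. The mode names follow the text; the function itself is illustrative, not part of any codec API.

```python
# PU partition shapes for a 2Nx2N CU, as (width, height) pairs.
# Asymmetric modes split one direction into 25% (n/2) and 75% (3n/2);
# the suffix (U/D/L/R) says where the 25% part sits.

def pu_sizes(mode, n):
    two_n = 2 * n
    table = {
        "2Nx2N": [(two_n, two_n)],
        "2NxN":  [(two_n, n), (two_n, n)],
        "Nx2N":  [(n, two_n), (n, two_n)],
        "NxN":   [(n, n)] * 4,
        "2NxnU": [(two_n, n // 2), (two_n, 3 * n // 2)],
        "2NxnD": [(two_n, 3 * n // 2), (two_n, n // 2)],
        "nLx2N": [(n // 2, two_n), (3 * n // 2, two_n)],
        "nRx2N": [(3 * n // 2, two_n), (n // 2, two_n)],
    }
    return table[mode]

# For a 32x32 CU (N = 16), 2NxnU gives a 32x8 PU on top of a 32x24 PU.
parts = pu_sizes("2NxnU", 16)
```

In every mode the PU areas sum back to the CU area, which is an easy sanity check on the geometry.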
在本发明中,“N×N”与“N乘N”可互换地使用以指视频块在垂直维度与水平维度方面的像素尺寸,例如,16×16像素或16乘16像素。一般来说,16×16块在垂直方向上具有16个像素(y=16)且在水平方向上具有16个像素(x=16)。同样地,N×N块通常在垂直方向上具有N个像素且在水平方向上具有N个像素,其中N表示非负整数值。可按行及列来布置块中的像素。此外,块未必需要在水平方向上与垂直方向上具有同一数目个像素。举例来说,块可包括N×M个像素,其中M未必等于N。In this disclosure, "NxN" and "N by N" may be used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16x16 pixels or 16 by 16 pixels. Generally, a 16x16 block has 16 pixels in the vertical direction (y=16) and 16 pixels in the horizontal direction (x=16). Similarly, an NxN block typically has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Furthermore, a block need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise NxM pixels, where M is not necessarily equal to N.
在使用CU的PU进行帧内预测性或帧间预测性译码之后,视频编码器20可计算CU的TU的残余数据。PU可包括空间域中(也被称作像素域)的像素数据,且TU可包括在将变换(例如,离散余弦变换(DCT)、整数变换、小波变换或概念上类似的变换)应用于残余视频数据之后的变换域中的系数。所述残余数据可对应于未经编码图片的像素与对应于PU的预测值之间的像素差。视频编码器20可形成包含CU的残余数据的TU,且接着变换所述TU以产生CU的变换系数。After performing intra-predictive or inter-predictive coding using the PUs of a CU, video encoder 20 may calculate residual data for the TUs of the CU. A PU may include pixel data in the spatial domain (also referred to as the pixel domain), and a TU may include coefficients in the transform domain after applying a transform (e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) to the residual video data. The residual data may correspond to pixel differences between pixels of an unencoded picture and prediction values corresponding to the PU. Video encoder 20 may form TUs including the residual data for the CU, and then transform the TUs to produce transform coefficients for the CU.
在进行用以产生变换系数的任何变换之后,视频编码器20可对变换系数执行量化。量化通常指量化变换系数以可能减少用以表示系数的数据的量,从而提供进一步压缩的程序。量化程序可减少与一些或所有系数相关联的位深度。举例来说,可在量化期间将n位值降值舍位到m位值,其中n大于m。After performing any transforms to produce transform coefficients, video encoder 20 may perform quantization on the transform coefficients. Quantization generally refers to the process of quantizing the transform coefficients to potentially reduce the amount of data used to represent the coefficients, thereby providing further compression. The quantization process may reduce the bit depth associated with some or all coefficients. For example, an n-bit value may be truncated to an m-bit value during quantization, where n is greater than m.
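The n-bit-to-m-bit truncation described above can be illustrated with a plain right shift. This is a sketch of the bit-depth-reduction idea only, not the HEVC quantizer (which divides by a quantization step derived from a QP value).

```python
# Toy quantization: dropping (n - m) low-order bits of a coefficient's
# magnitude reduces an n-bit value to roughly m bits, losing precision.

def quantize(coeff, n_bits, m_bits):
    shift = n_bits - m_bits
    sign = -1 if coeff < 0 else 1
    return sign * (abs(coeff) >> shift)

def dequantize(level, n_bits, m_bits):
    shift = n_bits - m_bits
    sign = -1 if level < 0 else 1
    return sign * (abs(level) << shift)

# A 12-bit coefficient truncated to an 8-bit level and scaled back:
level = quantize(1234, 12, 8)      # 1234 >> 4 == 77
approx = dequantize(level, 12, 8)  # 77 << 4 == 1232, close to 1234
```

The gap between 1234 and 1232 is the irreversible quantization error that buys the compression.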
在一些实例中,视频编码器20可使用预定义扫描次序来扫描经量化变换系数以产生可经熵编码的串行化向量。在其它实例中,视频编码器20可执行自适应性扫描。在扫描经量化变换系数以形成一维向量之后,视频编码器20可(例如)根据内容脉络自适应性可变长度译码(CAVLC)、内容脉络自适应性二进制算术译码(CABAC)、基于语法的内容脉络自适应性二进制算术译码(SBAC)、机率区间分割熵(PIPE)译码或另一熵编码方法来熵编码一维向量。视频编码器20还可熵编码与经编码视频数据相关联的语法元素以供视频解码器30在解码视频数据时使用。In some examples, video encoder 20 may scan the quantized transform coefficients using a predefined scan order to generate a serialized vector that can be entropy encoded. In other examples, video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, for example, according to context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 when decoding the video data.
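The predefined scan that serializes a 2-D coefficient block into a 1-D vector can be sketched with a zigzag ordering. The zigzag here is illustrative (HEVC itself uses up-right diagonal scans over coefficient groups); the point is that low-frequency, likely-nonzero coefficients land at the front of the vector.

```python
# Serialize a square block of quantized coefficients along anti-diagonals,
# alternating direction, so top-left (low-frequency) values come first.

def zigzag_scan(block):
    n = len(block)
    order = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: (rc[0] + rc[1],
                                   rc[1] if (rc[0] + rc[1]) % 2 else rc[0]))
    return [block[r][c] for (r, c) in order]

block = [[9, 4, 1, 0],
         [5, 2, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
vec = zigzag_scan(block)
# The nonzero values cluster at the head; the tail is a run of zeros,
# which is what makes the vector friendly to entropy coding.
```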
为执行CABAC,视频编码器20可将内容脉络模型内的内容脉络指派给待发射的符号。所述内容脉络可能涉及(例如)符号的相邻值是否为非零。为执行CAVLC,视频编码器20可选择用于待发射的符号的可变长度码。可变长度译码(VLC)中的码字可经建构以使得相对较短码对应于更可能的符号,而较长码对应于较不可能的符号。以此方式,相对于(例如)针对待发射的每一符号使用相等长度码字,使用VLC可达成位节省。机率确定可基于指派给符号的内容脉络而进行。To perform CABAC, video encoder 20 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of a symbol are nonzero. To perform CAVLC, video encoder 20 may select a variable length code for the symbol to be transmitted. Codewords in variable length coding (VLC) may be constructed such that relatively shorter codes correspond to more likely symbols, while longer codes correspond to less likely symbols. In this way, using VLC may achieve bit savings relative to, for example, using equal-length codewords for each symbol to be transmitted. Probability determinations may be made based on the context assigned to the symbol.
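The bit-saving principle of VLC described above, shorter codewords for more probable symbols, can be shown with a made-up prefix-free code. The table below is invented for the example and is not CAVLC.

```python
# Toy prefix-free VLC table: "a" is assumed most frequent, so it gets
# the shortest codeword; no codeword is a prefix of another, so the
# bitstream is uniquely decodable.

vlc_table = {"a": "0", "b": "10", "c": "110", "d": "111"}

def vlc_encode(symbols):
    return "".join(vlc_table[s] for s in symbols)

# Six symbols dominated by "a" cost 9 bits here, versus 12 bits
# with a fixed 2-bit code per symbol.
bits = vlc_encode("aaabac")
```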
视频编码器20可输出包含形成经译码图片及相关联数据的表示的位序列的位流。术语“位流”可为用以指网络抽象层(NAL)单元串流(例如,一连串NAL单元)或字节串流(例如,含有开始码前缀的NAL单元串流及如由HEVC标准的附录B指定的NAL单元的囊封)的集合性术语。NAL单元为含有NAL单元中的数据的类型的指示及含有那个数据的呈按需要穿插有仿真阻止位的原始字节序列有效负载(RBSP)的形式的字节的语法结构。NAL单元中的每一者可包含NAL单元标头且可囊封RBSP。NAL单元标头可包含指示NAL单元类型码的语法元素。通过NAL单元的NAL单元标头指定的NAL单元类型码指示NAL单元的类型。RBSP可为含有囊封在NAL单元内的整数数目个字节的语法结构。在一些情况下,RBSP包含零个位。Video encoder 20 may output a bitstream that includes a sequence of bits that form a representation of a coded picture and associated data. The term "bitstream" may be a collective term used to refer to a stream of network abstraction layer (NAL) units (e.g., a series of NAL units) or a stream of bytes (e.g., a stream of NAL units containing a start code prefix and the encapsulation of NAL units as specified by Annex B of the HEVC standard). A NAL unit is a syntax structure that contains an indication of the type of data in the NAL unit and bytes containing that data in the form of a raw byte sequence payload (RBSP), interspersed with emulation prevention bits as needed. Each NAL unit may include a NAL unit header and may encapsulate an RBSP. The NAL unit header may include a syntax element that indicates a NAL unit type code. The NAL unit type code specified by the NAL unit header of a NAL unit indicates the type of the NAL unit. The RBSP may be a syntax structure that contains an integer number of bytes encapsulated within the NAL unit. In some cases, the RBSP comprises zero bits.
不同类型的NAL单元可囊封不同类型的RBSP。举例来说,第一类型的NAL单元可囊封PPS的RBSP、第二类型的NAL单元可囊封切片片段的RBSP,第三类型的NAL单元可囊封SEI的RBSP,等等。囊封视频译码数据的RBSP(与参数集及SEI消息的RBSP相对比)的NAL单元可被称作视频译码层(VCL)NAL单元。含有参数集(例如,VPS、SPS、PPS等)的NAL单元可被称作参数集NAL单元。Different types of NAL units can encapsulate different types of RBSPs. For example, a first type of NAL unit can encapsulate an RBSP for a PPS, a second type of NAL unit can encapsulate an RBSP for a slice segment, a third type of NAL unit can encapsulate an RBSP for SEI, and so on. NAL units that encapsulate RBSPs for video coding data (as opposed to RBSPs for parameter sets and SEI messages) can be referred to as video coding layer (VCL) NAL units. NAL units that contain parameter sets (e.g., VPS, SPS, PPS, etc.) can be referred to as parameter set NAL units.
本发明可将囊封切片片段的RBSP的NAL单元称作经译码切片NAL单元。如HEVC WD中所定义,切片片段为在图像块扫描中经连续排序且含在单一NAL单元中的整数数目个CTU。相比来说,在HEVC WD中,切片可为含在一个独立切片片段及同一存取单元内的在下一独立切片片段(如果存在)之前的所有后续相关切片片段(如果存在)中的整数数目个CTU。独立切片片段为切片片段标头的语法元素的值并非从先前切片片段的值予以推断的切片片段。相依切片片段为切片片段标头的一些语法元素的值是从按解码次序先前独立切片片段的值予以推断的切片片段。经译码切片NAL单元的RBSP可包含切片片段标头及切片数据。切片片段标头为经译码切片片段中的含有与表示在切片片段中的第一或所有CTU有关的数据元素的一部分。切片标头为独立切片片段的切片片段标头,所述独立切片片段为当前切片片段或按解码次序位于当前相依切片片段之前的最近独立切片片段。This disclosure may refer to a NAL unit that encapsulates an RBSP of a slice segment as a coded slice NAL unit. As defined in HEVC WD, a slice segment is an integer number of CTUs that are sequentially ordered in a tile scan and contained in a single NAL unit. In contrast, in HEVC WD, a slice may be an integer number of CTUs contained in an independent slice segment and all subsequent dependent slice segments (if any) preceding the next independent slice segment (if any) within the same access unit. An independent slice segment is a slice segment in which the values of the syntax elements of the slice segment header are not inferred from the values of the previous slice segment. A dependent slice segment is a slice segment in which the values of some syntax elements of the slice segment header are inferred from the values of the previous independent slice segment in decoding order. The RBSP of a coded slice NAL unit may include a slice segment header and slice data. The slice segment header is a portion of a coded slice segment that contains data elements related to the first or all CTUs represented in the slice segment. The slice header is a slice segment header of an independent slice segment, which is either the current slice segment or the most recent independent slice segment that precedes the current dependent slice segment in decoding order.
VPS为包括适用于零或多个完整经译码视频序列(CVS)的语法元素的语法结构。SPS为含有适用于零或多个完整CVS的语法元素的语法结构。SPS可包含识别在SPS处于作用中时在作用中的VPS的语法元素。因此,VPS的语法元素可比SPS的语法元素更一般化地适用。A VPS is a syntax structure that includes syntax elements applicable to zero or more complete coded video sequences (CVSs). An SPS is a syntax structure that contains syntax elements applicable to zero or more complete CVSs. An SPS may include syntax elements that identify the VPS that is active when the SPS is active. Thus, the syntax elements of a VPS may be more generally applicable than the syntax elements of an SPS.
参数集(例如,VPS、SPS、PPS等)可含有直接或间接从切片的切片标头参考的识别。参考程序被称为“启动”。因此,当视频解码器30正解码特定切片时,由所述特定切片的切片标头中的语法元素直接或间接参考的参数集据称为“经启动”。取决于参数集类型,启动可基于每一图片或基于每一序列发生。举例来说,切片的切片标头可包含识别PPS的语法元素。因此,当视频译码器译码切片时,可启动PPS。此外,PPS可包含识别SPS的语法元素。因此,当识别SPS的PPS经启动时,可启动SPS。SPS可包含识别VPS的语法元素。因此,当识别VPS的SPS经启动时,启动VPS。A parameter set (e.g., VPS, SPS, PPS, etc.) may contain an identification referenced directly or indirectly from a slice header of a slice. The reference procedure is referred to as "activation." Thus, when video decoder 30 is decoding a particular slice, a parameter set referenced directly or indirectly by a syntax element in the slice header of that particular slice is said to be "activated." Depending on the parameter set type, activation may occur on a per-picture or per-sequence basis. For example, a slice header of a slice may include a syntax element identifying a PPS. Thus, when the video coder decodes the slice, the PPS may be activated. Furthermore, the PPS may include a syntax element identifying an SPS. Thus, when the PPS identifying the SPS is activated, the SPS may be activated. The SPS may include a syntax element identifying a VPS. Thus, when the SPS identifying the VPS is activated, the VPS is activated.
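The activation chain just described, slice header names a PPS, the PPS names an SPS, the SPS names a VPS, can be sketched as a chain of lookups. The dict-based "decoder state" and field names below are illustrative, not HEVC syntax.

```python
# Hypothetical decoder-side parameter-set stores, keyed by the IDs that
# would be signalled in the bitstream.
pps_store = {0: {"sps_id": 3}}
sps_store = {3: {"vps_id": 1}}
vps_store = {1: {"max_layers": 2}}

def activate_parameter_sets(slice_header):
    """Resolve the PPS, SPS, and VPS activated by a slice header:
    the slice header references a PPS directly, and the SPS and VPS
    indirectly through the chain of identifiers."""
    pps = pps_store[slice_header["pps_id"]]
    sps = sps_store[pps["sps_id"]]
    vps = vps_store[sps["vps_id"]]
    return pps, sps, vps

# Decoding a slice whose header names PPS 0 activates SPS 3 and VPS 1.
pps, sps, vps = activate_parameter_sets({"pps_id": 0})
```

This mirrors why VPS syntax elements apply more generally than SPS elements: many SPSs can point at one VPS, and many PPSs at one SPS.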
视频解码器30可接收由视频编码器20产生的位流。此外,视频解码器30可剖析位流以从所述位流获得语法元素。视频解码器30可至少部分基于从位流获得的语法元素而重建构视频数据的图片。重建构视频数据的程序可与由视频编码器20执行的程序大体互逆。举例来说,视频解码器30可使用PU的运动向量确定当前CU的PU的预测性块。此外,视频解码器30可反量化当前CU的TU的系数块。视频解码器30可对系数块执行反变换,以重建构当前CU的TU的变换块。通过将当前CU的PU的预测性块的样本添加到当前CU的TU的变换块的对应样本,视频解码器30可重建构当前CU的译码块。通过重建构图片的每一CU的译码块,视频解码器30可重建构图片。Video decoder 30 may receive a bitstream generated by video encoder 20. Furthermore, video decoder 30 may parse the bitstream to obtain syntax elements from the bitstream. Video decoder 30 may reconstruct a picture of the video data based, at least in part, on the syntax elements obtained from the bitstream. The process of reconstructing the video data may be generally inverse to the process performed by video encoder 20. For example, video decoder 30 may use the motion vectors of the PUs to determine the predictive blocks of the PUs of the current CU. Furthermore, video decoder 30 may inversely quantize the coefficient blocks of the TUs of the current CU. Video decoder 30 may perform an inverse transform on the coefficient blocks to reconstruct the transform blocks of the TUs of the current CU. Video decoder 30 may reconstruct the coding blocks of the current CU by adding samples of the predictive blocks of the PUs of the current CU to corresponding samples of the transform blocks of the TUs of the current CU. By reconstructing the coding blocks of each CU of the picture, video decoder 30 may reconstruct the picture.
在HEVC WD中,CVS可开始于瞬时解码刷新(IDR)图片,或断链存取(BLA)图片,或为位流中的第一图片的清洁随机存取(CRA)图片,包含并非IDR或BLA图片的所有后续图片。IDR图片仅含有I切片(即,仅使用帧内预测的切片)。IDR图片可为按解码次序在位流中的第一图片,或可稍后出现在位流中。每一IDR图片为按解码次序CVS的第一图片。在HEVC WD中,IDR图片可为帧内随机存取点(IRAP)图片,对于所述图片,每一VCL NAL单元具有等于IDR_W_RADL或IDR_N_LP的nal_unit_type。In HEVC WD, a CVS can start with an instantaneous decoding refresh (IDR) picture, a broken link access (BLA) picture, or a clean random access (CRA) picture that is the first picture in the bitstream, including all subsequent pictures that are not IDR or BLA pictures. IDR pictures contain only I slices (i.e., slices that use only intra prediction). IDR pictures can be the first picture in the bitstream in decoding order, or they can appear later in the bitstream. Each IDR picture is the first picture in the CVS in decoding order. In HEVC WD, an IDR picture can be an intra random access point (IRAP) picture, for which each VCL NAL unit has nal_unit_type equal to IDR_W_RADL or IDR_N_LP.
IDR图片可用于随机存取。然而,按解码次序在IDR图片之后的图片不可使用在IDR图片之前解码的图片作为参考。因此,依赖于IDR图片用于随机存取的位流与使用额外类型的随机存取图片的位流相比可具有显著较低的译码效率。在至少一些实例中,IDR存取单元为含有IDR图片的存取单元。IDR pictures can be used for random access. However, pictures that follow an IDR picture in decoding order cannot use pictures decoded before the IDR picture as references. Therefore, a bitstream that relies on IDR pictures for random access can have significantly lower coding efficiency than a bitstream that uses additional types of random access pictures. In at least some examples, an IDR access unit is an access unit that contains an IDR picture.
在HEVC中引入CRA图片的概念以允许按解码次序在CRA图片之后但按输出次序在CRA图片之前的图片将在所述CRA图片之前解码的图片用于参考。按解码次序在CRA图片之后,但按输出次序在CRA图片之前的图片被称作与CRA图片相关联的前置图片(或CRA图片的前置图片)。即,为改进译码效率,在HEVC中引入CRA图片的概念,以允许按解码次序在CRA图片之后但按输出次序在CRA图片之前的图片将在CRA图片之前解码的图片用于参考。CRA存取单元为经译码图片为CRA图片的存取单元。在HEVC WD中,CRA图片为帧内随机存取图片,对于所述图片,每一VCL NAL单元具有等于CRA_NUT的nal_unit_type。HEVC introduces the concept of CRA pictures to allow pictures that follow a CRA picture in decoding order but precede it in output order to use pictures decoded before the CRA picture for reference. Pictures that follow a CRA picture in decoding order but precede it in output order are referred to as leading pictures associated with the CRA picture (or leading pictures of the CRA picture). That is, to improve coding efficiency, HEVC introduces the concept of CRA pictures to allow pictures that follow a CRA picture in decoding order but precede it in output order to use pictures decoded before the CRA picture for reference. A CRA access unit is an access unit whose coded picture is a CRA picture. In HEVC WD, a CRA picture is an intra random access picture, for which each VCL NAL unit has a nal_unit_type equal to CRA_NUT.
CRA图片的前置图片在解码开始于IDR图片或按解码次序在所述CRA图片之前出现的CRA图片的情况下可正确地解码。然而,在发生从CRA图片的随机存取时,CRA图片的前置图片可为非可解码的。因此,视频解码器通常在随机存取解码期间丢弃CRA图片的前置图片。为防止从取决于解码开始于何处而可能不可用的参考图片的误差传播,按解码次序及输出次序两者在CRA图片之后的图片不可将按解码次序或输出次序在CRA图片之前的任何图片(其包含前置图片)用于参考。The leading pictures of a CRA picture are correctly decodable if decoding starts with an IDR picture or a CRA picture that occurs before the CRA picture in decoding order. However, when random access occurs from a CRA picture, the leading pictures of the CRA picture may be non-decodable. Therefore, a video decoder typically discards the leading pictures of a CRA picture during random access decoding. To prevent error propagation from reference pictures that may be unavailable depending on where decoding starts, pictures that follow the CRA picture in both decoding order and output order may not use any pictures that precede the CRA picture in decoding order or output order (including the leading pictures) for reference.
BLA图片的概念是在引入CRA图片之后在HEVC中引入的,且是基于CRA图片的概念。BLA图片通常源自在CRA图片的位置处拼接的位流,且在所述拼接的位流中,将拼接点CRA图片改变到BLA图片。因此,BLA图片可为在原始位流处的CRA图片,且CRA图片由位流拼接器在所述CRA图片的位置处的位流拼接之后改变为BLA图片。在一些情况下,含有RAP图片的存取单元可在本文中被称作RAP存取单元。BLA存取单元为含有BLA图片的存取单元。在HEVC WD中,BLA图片可为帧内随机存取图片,对于所述图片,每一VCL NAL单元具有等于BLA_W_LP、BLA_W_RADL或BLA_N_LP的nal_unit_type。The concept of BLA pictures was introduced in HEVC after the introduction of CRA pictures and is based on the concept of CRA pictures. BLA pictures are usually derived from a bitstream spliced at the position of a CRA picture, and in the spliced bitstream, the splicing point CRA picture is changed to a BLA picture. Therefore, a BLA picture may be a CRA picture at the original bitstream, and the CRA picture is changed to a BLA picture by the bitstream splicer after the bitstream is spliced at the position of the CRA picture. In some cases, an access unit containing a RAP picture may be referred to herein as a RAP access unit. A BLA access unit is an access unit containing a BLA picture. In HEVC WD, a BLA picture may be an intra-frame random access picture, for which each VCL NAL unit has a nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.
一般来说,IRAP图片仅含有I切片,且可为BLA图片、CRA图片或IDR图片。举例来说,HEVC WD指示IRAP图片可为每一VCL NAL单元具有在BLA_W_LP到RSV_IRAP_VCL23的范围中(包含BLA_W_LP及RSV_IRAP_VCL23)的nal_unit_type的经译码图片。此外,HEVC WD指示按解码次序在位流中的第一图片必须为IRAP图片。HEVC WD的表7-1展示NAL单元类型码及NAL单元类型类别。以下再现HEVC WD的表7-1。In general, an IRAP picture contains only I slices and can be a BLA picture, a CRA picture, or an IDR picture. For example, HEVC WD indicates that an IRAP picture can be a coded picture for which each VCL NAL unit has a nal_unit_type in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive. Furthermore, HEVC WD indicates that the first picture in the bitstream in decoding order must be an IRAP picture. Table 7-1 of HEVC WD shows the NAL unit type code and NAL unit type category. Table 7-1 of HEVC WD is reproduced below.
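The nal_unit_type ranges the text describes (IRAP pictures spanning BLA_W_LP through RSV_IRAP_VCL23, inclusive) can be sketched as a small classifier. The numeric values below are taken from HEVC Table 7-1 as commonly published; verify them against the specification before relying on them.

```python
# nal_unit_type constants per HEVC Table 7-1 (assumed values; check spec).
BLA_W_LP, BLA_W_RADL, BLA_N_LP = 16, 17, 18
IDR_W_RADL, IDR_N_LP = 19, 20
CRA_NUT = 21
RSV_IRAP_VCL23 = 23

def is_irap(nal_unit_type):
    # IRAP: nal_unit_type in [BLA_W_LP, RSV_IRAP_VCL23], inclusive.
    return BLA_W_LP <= nal_unit_type <= RSV_IRAP_VCL23

def is_idr(nal_unit_type):
    return nal_unit_type in (IDR_W_RADL, IDR_N_LP)

def is_bla(nal_unit_type):
    return nal_unit_type in (BLA_W_LP, BLA_W_RADL, BLA_N_LP)

# A CRA picture is IRAP but neither IDR nor BLA:
checks = (is_irap(CRA_NUT), is_idr(CRA_NUT), is_bla(CRA_NUT))
```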
表7-1-NAL单元类型码及NAL单元类型类别Table 7-1 - NAL unit type code and NAL unit type category
BLA图片与CRA图片之间的一个差异如下。对于CRA图片,如果解码开始于按解码次序在CRA图片之前的RAP图片,那么相关联的前置图片可正确地解码。然而,当发生从CRA图片的随机存取时(即,当解码开始于所述CRA图片时,或换句话说,当所述CRA图片为位流中的第一图片时),与所述CRA图片相关联的前置图片不可正确地解码。相比之下,可能不存在与BLA图片相关联的前置图片可解码的情形,甚至当解码开始于按解码次序在BLA图片之前的RAP图片时也如此。One difference between BLA pictures and CRA pictures is as follows. For a CRA picture, if decoding starts with a RAP picture that precedes the CRA picture in decoding order, the associated preceding pictures can be correctly decoded. However, when random access occurs from a CRA picture (i.e., when decoding starts with the CRA picture, or in other words, when the CRA picture is the first picture in the bitstream), the preceding pictures associated with the CRA picture cannot be correctly decoded. In contrast, there may not be a situation where the preceding pictures associated with a BLA picture are decodable, even when decoding starts with a RAP picture that precedes the BLA picture in decoding order.
与特定CRA图片或特定BLA图片相关联的前置图片中的一些图片可为可正确解码的,甚至当所述特定CRA图片或所述特定BLA图片为位流中的第一图片时也如此。此等前置图片可被称作可解码前置图片(DLP)或随机存取可解码前置(RADL)图片。在HEVC WD中,RADL图片可为每一VCL NAL单元具有等于RADL_R或RADL_N的nal_unit_type的经译码图片。此外,HEVC WD指示所有RADL图片为前置图片且不将RADL图片用作用于同一相关联的IRAP图片的后置图片的解码程序的参考图片。当存在时,所有RADL图片按解码次序在同一相关联的IRAP图片的所有后置图片之前。HEVC WD指示RADL存取单元可为经译码图片为RADL图片的存取单元。后置图片可为按输出次序在相关联的IRAP图片(即,按解码次序的先前IRAP图片)之后的图片。Some of the leading pictures associated with a particular CRA picture or a particular BLA picture may be correctly decodable, even when the particular CRA picture or the particular BLA picture is the first picture in the bitstream. Such leading pictures may be referred to as decodable leading pictures (DLPs) or random access decodable leading (RADL) pictures. In HEVC WD, a RADL picture may be a coded picture with nal_unit_type equal to RADL_R or RADL_N for each VCL NAL unit. Furthermore, HEVC WD indicates that all RADL pictures are leading pictures and that RADL pictures are not used as reference pictures for the decoding of trailing pictures for the same associated IRAP picture. When present, all RADL pictures precede all trailing pictures for the same associated IRAP picture in decoding order. HEVC WD indicates that a RADL access unit may be an access unit whose coded picture is a RADL picture. A trailing picture may be a picture that follows the associated IRAP picture (i.e., the previous IRAP picture in decoding order) in output order.
其它前置图片可被称作非可解码前置图片(NLP)或随机存取跳过前置(RASL)图片。在HEVC WD中,RASL图片可为每一VCL NAL单元具有等于RASL_R或RASL_N的nal_unit_type的经译码图片。所有RASL图片都为相关联的BLA图片或CRA图片的前置图片。Other preceding pictures may be referred to as non-decodable preceding pictures (NLPs) or random access skip preceding (RASL) pictures. In HEVC WD, RASL pictures may be coded pictures with nal_unit_type equal to RASL_R or RASL_N for each VCL NAL unit. All RASL pictures are preceding pictures of the associated BLA or CRA pictures.
假设必要参数集在其需要启动时可用,那么IRAP图片及按解码次序所有后续非RASL图片可正确地解码,而不执行按解码次序在IRAP图片之前的任何图片的解码程序。在位流中可存在仅含有并非IRAP图片的I切片的图片。Assuming the necessary parameter sets are available when they are needed for activation, an IRAP picture and all subsequent non-RASL pictures in decoding order can be correctly decoded without performing the decoding process for any pictures that precede the IRAP picture in decoding order. There may be pictures in the bitstream that contain only I slices that are not IRAP pictures.
在多视图译码中,可存在来自不同视点的同一场景的多个视图。术语“存取单元”可用以指对应于同一时间实例的图片集。因此,视频数据可经概念化为随时间发生的一系列存取单元。“视图分量”可为单一存取单元中的视图的经译码表示。在本发明中,“视图”可指与同一视图识别符相关联的一连串或一组视图分量。视图分量可含有纹理视图分量及深度视图分量。在本发明中,“视图”可指与同一视图识别符相关联的一组或一连串一或多个视图分量。In multi-view coding, there may be multiple views of the same scene from different viewpoints. The term "access unit" may be used to refer to a set of pictures corresponding to the same time instance. Thus, video data may be conceptualized as a series of access units occurring over time. A "view component" may be a coded representation of a view in a single access unit. In this disclosure, a "view" may refer to a sequence or group of view components associated with the same view identifier. A view component may contain a texture view component and a depth view component. In this disclosure, a "view" may refer to a group or sequence of one or more view components associated with the same view identifier.
纹理视图分量(即,纹理图片)可为单一存取单元中的视图的纹理的经译码表示。纹理视图可为与视图次序索引的相同值相关联的一连串纹理视图分量。视图的视图次序索引可指示所述视图相对于其它视图的相机位置。深度视图分量(即,深度图片)可为单一存取单元中的视图的深度的经译码表示。深度视图可为与视图次序索引的相同值相关联的一组或一连串一或多个深度视图分量。A texture view component (i.e., a texture picture) may be a coded representation of the texture of a view in a single access unit. A texture view may be a sequence of texture view components associated with the same value of a view order index. The view order index of a view may indicate the camera position of the view relative to other views. A depth view component (i.e., a depth picture) may be a coded representation of the depth of a view in a single access unit. A depth view may be a group or sequence of one or more depth view components associated with the same value of a view order index.
在MV-HEVC、3D-HEVC及SHVC中,视频编码器可产生包括一系列NAL单元的位流。位流的不同NAL单元可与位流的不同层相关联。可将层定义为具有同一层识别符的VCL NAL单元及相关联的非VCL NAL单元的集合。层可等效于多视图视频译码中的视图。在多视图视频译码中,层可含有具有不同时间实例的同一层的所有视图分量。每一视图分量可为属于特定时间实例处的特定视图的视频场景的经译码图片。在3D视频译码的一些实例中,层可含有特定视图的所有经译码深度图片或特定视图的经译码纹理图片。在3D视频译码的其它实例中,层可含有特定视图的纹理视图分量及深度视图分量两者。类似地,在可调式视频译码的情况下,层通常对应于具有不同于其它层中的经译码图片的视频特性的经译码图片。此类视频特性通常包含空间分辨率及质量等级(例如,信噪比)。在HEVC及其扩展中,可在一个层内通过将具有特定时间位准的图片群组定义为子层来达成时间可按比例调整性。In MV-HEVC, 3D-HEVC, and SHVC, a video encoder may generate a bitstream comprising a series of NAL units. Different NAL units of the bitstream may be associated with different layers of the bitstream. A layer may be defined as a set of VCL NAL units and associated non-VCL NAL units with the same layer identifier. A layer may be equivalent to a view in multi-view video coding. In multi-view video coding, a layer may contain all view components of the same layer at different time instances. Each view component may be a coded picture of a video scene belonging to a specific view at a specific time instance. In some examples of 3D video coding, a layer may contain all coded depth pictures of a specific view or coded texture pictures of a specific view. In other examples of 3D video coding, a layer may contain both texture view components and depth view components of a specific view. Similarly, in the case of scalable video coding, a layer typically corresponds to a coded picture having different video characteristics than coded pictures in other layers. Such video characteristics typically include spatial resolution and quality level (e.g., signal-to-noise ratio). In HEVC and its extensions, temporal scalability can be achieved within a layer by defining groups of pictures with specific temporal levels as sub-layers.
对于位流的每一相应层,可在不参考任何较高层中的数据的情况下解码较低层中的数据。在可调式视频译码中,例如,可在不参考增强层中的数据的情况下解码基础层中的数据。一般来说,NAL单元可仅囊封单一层的数据。因此,可从位流移除囊封位流的最高剩余层的数据的NAL单元而不影响位流的剩余层中的数据的可解码性。在多视图译码及3D-HEVC中,较高层可包含额外视图分量。在SHVC中,较高层可包含信噪比(SNR)增强数据、空间增强数据及/或时间增强数据。在MV-HEVC、3D-HEVC及SHVC中,如果视频解码器可在不参考任何其它层的数据的情况下解码层中的图片,那么所述层可被称作“基础层”。基础层可符合HEVC基础规格(例如,HEVC WD)。For each respective layer of a bitstream, data in lower layers can be decoded without reference to data in any higher layers. In scalable video coding, for example, data in a base layer can be decoded without reference to data in enhancement layers. In general, a NAL unit may encapsulate data for only a single layer. Thus, a NAL unit encapsulating data for the highest remaining layer of the bitstream can be removed from the bitstream without affecting the decodability of data in the remaining layers of the bitstream. In multi-view coding and 3D-HEVC, higher layers may include additional view components. In SHVC, higher layers may include signal-to-noise ratio (SNR) enhancement data, spatial enhancement data, and/or temporal enhancement data. In MV-HEVC, 3D-HEVC, and SHVC, a layer may be referred to as a "base layer" if a video decoder can decode pictures in the layer without reference to data in any other layer. The base layer may conform to the HEVC base specification (e.g., HEVC WD).
在SVC中,除基础层外的层可被称作“增强层”,且可提供增强从位流解码的视频数据的视觉质量的信息。SVC可增强空间分辨率、信噪比(即,质量)或时间速率。在可调式视频译码(例如,SHVC)中,“层表示”可为单一存取单元中的空间层的经译码表示。为易于解释,本发明可将视图分量及/或层表示称作“视图分量/层表示”或简单地称作“图片”。In SVC, layers other than the base layer may be referred to as "enhancement layers" and may provide information that enhances the visual quality of video data decoded from the bitstream. SVC may enhance spatial resolution, signal-to-noise ratio (i.e., quality), or temporal rate. In scalable video coding (e.g., SHVC), a "layer representation" may be a coded representation of a spatial layer in a single access unit. For ease of explanation, this disclosure may refer to view components and/or layer representations as "view components/layer representations" or simply as "pictures."
为实施HEVC中的层,NAL单元的标头包含nuh_layer_id语法元素,其先前被称作在最终HEVC标准之前的各种工作草案中的nuh_reserved_zero_6bits语法元素。在基础HEVC标准中,nuh_layer_id语法元素限于值0。然而,在MV-HEVC、3D-HEVC及SHVC中,nuh_layer_id语法元素可大于0以指定层的识别符。位流的具有指定不同值的nuh_layer_id语法元素的NAL单元属于位流的不同层。To implement layers in HEVC, the header of a NAL unit includes the nuh_layer_id syntax element, previously referred to as the nuh_reserved_zero_6bits syntax element in various working drafts prior to the final HEVC standard. In the base HEVC standard, the nuh_layer_id syntax element is limited to a value of 0. However, in MV-HEVC, 3D-HEVC, and SHVC, the nuh_layer_id syntax element can be greater than 0 to specify a layer identifier. NAL units of a bitstream that specify different values for the nuh_layer_id syntax element belong to different layers of the bitstream.
在一些实例中,如果NAL单元与多视图译码(例如,MV-HEVC)、3DV译码(例如,3D-HEVC)或可调式视频译码(例如,SHVC)中的基础层有关,那么所述NAL单元的nuh_layer_id语法元素等于0。可在不参考位流的任何其它层中的数据的情况下解码位流的基础层中的数据。如果NAL单元不与多视图译码、3DV或可调式视频译码中的基础层有关,那么所述NAL单元的nuh_layer_id语法元素可具有非零值。In some examples, if a NAL unit is associated with a base layer in multi-view coding (e.g., MV-HEVC), 3DV coding (e.g., 3D-HEVC), or scalable video coding (e.g., SHVC), the nuh_layer_id syntax element of the NAL unit is equal to 0. Data in the base layer of the bitstream can be decoded without reference to data in any other layer of the bitstream. If a NAL unit is not associated with a base layer in multi-view coding, 3DV, or scalable video coding, the nuh_layer_id syntax element of the NAL unit may have a non-zero value.
此外,层内的一些视图分量/层表示可在不参考同一层内的其它视图分量/层表示的情况下加以解码。因此,囊封层的某些视图分量/层表示的数据的NAL单元可从位流移除,而不影响所述层中的其它视图分量/层表示的可解码性。移除囊封此类视图分量/层表示的数据的NAL单元可减小位流的帧速率。可在不参考在层内的其它视图分量/层表示的情况下解码的在所述层内的视图分量/层表示的子集可在本文中被称作“子层”或“时间子层”。In addition, some view components/layer representations within a layer may be decoded without reference to other view components/layer representations within the same layer. Thus, NAL units that encapsulate data for certain view components/layer representations of a layer can be removed from the bitstream without affecting the decodability of other view components/layer representations in that layer. Removing the NAL units that encapsulate data for such view components/layer representations can reduce the frame rate of the bitstream. A subset of the view components/layer representations within a layer that can be decoded without reference to other view components/layer representations within the layer may be referred to herein as a "sub-layer" or "temporal sub-layer."
NAL单元可包含指定NAL单元的时间识别符(即,TemporalId)的temporal_id语法元素。NAL单元的时间识别符识别NAL单元所属于的子层。因此,位流的每一子层可具有不同时间识别符。一般来说,如果层的第一NAL单元的时间识别符小于同一层的第二NAL单元的时间识别符,那么可在不参考由第二NAL单元囊封的数据的情况下解码由第一NAL单元囊封的数据。A NAL unit may include a temporal_id syntax element that specifies the temporal identifier (i.e., TemporalId) of the NAL unit. The temporal identifier of a NAL unit identifies the sublayer to which the NAL unit belongs. Thus, each sublayer of a bitstream may have a different temporal identifier. In general, if the temporal identifier of the first NAL unit of a layer is less than the temporal identifier of the second NAL unit of the same layer, the data encapsulated by the first NAL unit can be decoded without reference to the data encapsulated by the second NAL unit.
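To make the two fields discussed above concrete, the following sketch parses the two-byte HEVC NAL unit header (layout per ISO/IEC 23008-2: a forbidden zero bit, a 6-bit nal_unit_type, a 6-bit nuh_layer_id spanning both bytes, and a 3-bit nuh_temporal_id_plus1). The function name and return shape are illustrative, not part of any standard:

```python
def parse_hevc_nal_header(nal: bytes) -> dict:
    """Parse the 2-byte HEVC NAL unit header into its three fields."""
    if len(nal) < 2:
        raise ValueError("NAL unit too short for an HEVC header")
    b0, b1 = nal[0], nal[1]
    nal_unit_type = (b0 >> 1) & 0x3F                 # 6 bits after the forbidden zero bit
    nuh_layer_id = ((b0 & 0x01) << 5) | (b1 >> 3)    # 6 bits spanning both bytes
    nuh_temporal_id_plus1 = b1 & 0x07                # 3 bits; TemporalId is this minus 1
    return {
        "nal_unit_type": nal_unit_type,
        "nuh_layer_id": nuh_layer_id,
        "temporal_id": nuh_temporal_id_plus1 - 1,
    }
```

For instance, a base-layer VPS NAL unit begins with the bytes 0x40 0x01, which this sketch decodes as nal_unit_type 32 in layer 0, temporal sub-layer 0.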
位流可与多个操作点相关联。位流的每一操作点与层识别符集合(例如,nuh_layer_id值的集合)及时间识别符相关联。可将所述层识别符集合表示为OpLayerIdSet,且可将时间识别符表示为TemporalID。如果NAL单元的层识别符在操作点的层识别符集合中,且NAL单元的时间识别符小于或等于操作点的时间识别符,那么NAL单元与操作点相关联。因此,操作点可对应于所述位流中的NAL单元的子集。HEVC将操作点定义为位流,其是通过子位流提取程序的操作而从另一位流产生,其中所述另一位流、目标最高TemporalId及目标层识别符列表作为输入。A bitstream can be associated with multiple operation points. Each operation point of the bitstream is associated with a set of layer identifiers (e.g., a set of nuh_layer_id values) and a temporal identifier. The layer identifier set can be represented as OpLayerIdSet, and the temporal identifier can be represented as TemporalID. A NAL unit is associated with an operation point if its layer identifier is in the operation point's layer identifier set and its temporal identifier is less than or equal to the operation point's temporal identifier. Thus, an operation point can correspond to a subset of the NAL units in the bitstream. HEVC defines an operation point as a bitstream that is created from another bitstream by operation of the sub-bitstream extraction process, with the other bitstream, a target highest TemporalId, and a target layer identifier list as inputs.
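The association rule above (a NAL unit belongs to the operation point if its layer identifier is in the OpLayerIdSet and its TemporalId does not exceed the operation point's TemporalId) amounts to a simple filter over the bitstream. In the sketch below, representing each NAL unit as a (nuh_layer_id, temporal_id, payload) tuple is an assumption made for illustration:

```python
def extract_operation_point(nal_units, op_layer_ids, op_temporal_id):
    """Sub-bitstream extraction sketch: keep only the NAL units whose layer
    identifier is in the operation point's layer identifier set and whose
    TemporalId is less than or equal to the operation point's TemporalId.

    nal_units: iterable of (nuh_layer_id, temporal_id, payload) tuples.
    """
    return [nal for nal in nal_units
            if nal[0] in op_layer_ids and nal[1] <= op_temporal_id]
```

For example, extracting the base-layer operation point (OpLayerIdSet = {0}, TemporalId = 1) drops every enhancement-layer NAL unit and every NAL unit of a higher temporal sub-layer.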
如上所介绍,本发明涉及基于ISO基本媒体文件格式(ISOBMFF)将视频内容存储在文件中。换句话说,本发明描述用于存储含有多个经译码层的视频串流的各种技术,其中每一层可为可调式层、纹理视图、深度视图或其它类型的层或视图。本发明的技术可适用于(例如)存储MV-HEVC视频数据、SHVC视频数据、3D-HEVC视频数据及/或其它类型的视频数据。As introduced above, this disclosure relates to storing video content in files based on the ISO Base Media File Format (ISOBMFF). In other words, this disclosure describes various techniques for storing a video stream containing multiple coded layers, where each layer may be a scalable layer, a texture view, a depth view, or other types of layers or views. The techniques of this disclosure may be applicable, for example, to storing MV-HEVC video data, SHVC video data, 3D-HEVC video data, and/or other types of video data.
现将简要地论述文件格式及文件格式标准。文件格式标准包含ISO基本媒体文件格式(ISOBMFF、ISO/IEC 14496-12,下文为“ISO/IEC 14496-12”)及从ISOBMFF导出的其它文件格式标准,包含MPEG-4文件格式(ISO/IEC 14496-14)、3GPP文件格式(3GPP TS 26.244)及AVC文件格式(ISO/IEC 14496-15,下文为“ISO/IEC 14496-15”)。因此,ISO/IEC 14496-12指定ISO基本媒体文件格式。其它文件针对特定应用扩展ISO基本媒体文件格式。举例来说,ISO/IEC 14496-15描述呈ISO基本媒体文件格式的NAL单元结构化视频的携载。H.264/AVC及HEVC以及其扩展为NAL单元结构化视频的实例。ISO/IEC 14496-15包含描述H.264/AVC NAL单元的携载的章节。另外,ISO/IEC 14496-15的第8章描述HEVC NAL单元的携载。File formats and file format standards will now be briefly discussed. File format standards include the ISO base media file format (ISOBMFF, ISO/IEC 14496-12, hereinafter "ISO/IEC 14496-12") and other file format standards derived from ISOBMFF, including the MPEG-4 file format (ISO/IEC 14496-14), the 3GPP file format (3GPP TS 26.244), and the AVC file format (ISO/IEC 14496-15, hereinafter "ISO/IEC 14496-15"). Thus, ISO/IEC 14496-12 specifies the ISO base media file format. Other documents extend the ISO base media file format for specific applications. For example, ISO/IEC 14496-15 describes the carriage of NAL unit structured video in the ISO base media file format. H.264/AVC and HEVC, and their extensions, are examples of NAL unit structured video. ISO/IEC 14496-15 includes a section describing the carriage of H.264/AVC NAL units. In addition, Chapter 8 of ISO/IEC 14496-15 describes the carriage of HEVC NAL units.
将ISOBMFF用作用于许多编码解码器囊封格式(诸如,AVC文件格式)以及用于许多多媒体容器格式(诸如,MPEG-4文件格式、3GPP文件格式(3GP)及DVB文件格式)的基础。除诸如音频及视频的连续媒体外,诸如图像的静态媒体以及元数据可存储在符合ISOBMFF的文件中。根据ISOBMFF结构化的文件可用于许多目的,包含本端媒体文件播放、远程文件的渐进下载、用于经由HTTP的动态自适应性串流发射(DASH)的片段、用于待串流发射的内容及其包化指令的容器,及所接收的实时媒体串流的记录。因此,尽管最初针对存储而设计,但ISOBMFF已证明对串流发射(例如,用于渐进下载或DASH)有价值。出于串流发射目的,可使用在ISOBMFF中定义的电影分段。ISOBMFF serves as the foundation for many codec encapsulation formats, such as the AVC file format, and for many multimedia container formats, such as the MPEG-4 file format, the 3GPP file format (3GP), and the DVB file format. In addition to continuous media, such as audio and video, static media, such as images, and metadata can be stored in files conforming to ISOBMFF. Files structured according to ISOBMFF can be used for many purposes, including local media file playback, progressive download of remote files, segments for Dynamic Adaptive Streaming over HTTP (DASH), containers for content to be streamed and its packetization instructions, and recording of received real-time media streams. Thus, although originally designed for storage, ISOBMFF has proven valuable for streaming, such as for progressive download or DASH. For streaming purposes, the movie fragments defined in ISOBMFF can be used.
符合HEVC文件格式的文件可包括一系列称作方框的对象。方框可为由唯一类型识别符及长度定义的面向对象式建置块。举例来说,方框可为ISOBMFF中的基本语法结构,包含四字符译码方框类型、方框的字节计数及有效负载。换句话说,方框可为包括经译码方框类型、方框的字节计数及有效负载的语法结构。在一些情况下,在符合HEVC文件格式的文件中的所有数据可含在方框内,且在不处于方框中的文件中可能不存在数据。因此,ISOBMFF文件可由一连串方框组成,且方框可含有其它方框。举例来说,方框的有效负载可包含一或多个额外方框。在本发明中别处详细描述的图5A、图5B及图6展示根据本发明的一或多种技术的文件内的实例方框。A file conforming to the HEVC file format may include a series of objects called boxes. A box may be an object-oriented building block defined by a unique type identifier and a length. For example, a box may be a basic syntax structure in ISOBMFF, comprising a four-character coded box type, a byte count for the box, and a payload. In other words, a box may be a syntax structure that includes a coded box type, a byte count for the box, and a payload. In some cases, all data in a file conforming to the HEVC file format may be contained within boxes, and there may be no data in the file that is not in a box. Therefore, an ISOBMFF file may consist of a series of boxes, and a box may contain other boxes. For example, a box's payload may include one or more additional boxes. Figures 5A, 5B, and 6, described in detail elsewhere in this disclosure, show example boxes within a file according to one or more techniques of this disclosure.
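A minimal sketch of walking the sequence of boxes described above follows. It relies only on the generic ISOBMFF box header layout (a 32-bit size followed by a four-character type, with the standard conventions that size 1 means a 64-bit largesize follows and size 0 means the box extends to the end of the file); the function name and return shape are ours:

```python
import struct

def parse_boxes(buf: bytes, offset: int = 0, end: int = None):
    """Walk the boxes in an ISOBMFF buffer, returning
    (type, payload_offset, payload_size) for each box found."""
    end = len(buf) if end is None else end
    boxes = []
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, offset)
        header = 8
        if size == 1:                      # 64-bit 'largesize' follows the type
            size, = struct.unpack_from(">Q", buf, offset + 8)
            header = 16
        elif size == 0:                    # box extends to the end of the buffer
            size = end - offset
        if size < header:                  # malformed size; stop rather than loop
            break
        boxes.append((box_type.decode("ascii"), offset + header, size - header))
        offset += size
    return boxes
```

Because a box's payload may itself contain boxes, the same function can be re-applied to a payload range (e.g., to the payload of a "moov" box) to walk its child boxes.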
符合ISOBMFF的文件可包含各种类型的方框。举例来说,符合ISOBMFF的文件可包含文件类型方框、媒体数据方框、电影方框、电影分段方框等。在此实例中,文件类型方框包含文件类型及兼容性信息。媒体数据方框可含有样本(例如,经译码图片)。电影方框(“moov”)含有用于存在于文件中的连续媒体串流的元数据。可将连续媒体串流中的每一者在文件中表示为播放轨。举例来说,电影方框可含有关于电影的元数据(例如,样本之间的逻辑及时序关系,以及指向样本的位置的指针)。电影方框可包含若干类型的子方框。电影方框中的子方框可包含一或多个播放轨方框。播放轨方框可包含关于电影的个别播放轨的信息。播放轨方框可包含指定单一播放轨的总体信息的播放轨标头方框。此外,播放轨方框可包含含有媒体信息方框的媒体方框。媒体信息方框可包含含有媒体样本在播放轨中的数据索引的样本表方框。样本表方框中的信息可用以按时间(且对于播放轨的样本中的每一者,按类型、大小、容器及到样本的那个容器的偏移)定位样本。因此,将用于播放轨之元数据围封在播放轨方框(“trak”)中,而将播放轨的媒体内容围封在媒体数据方框(“mdat”)中或直接围封在单独文件中。用于播放轨的媒体内容包括一连串样本(例如,由一连串样本组成),诸如音频或视频存取单元。An ISOBMFF-compliant file may contain various types of boxes. For example, an ISOBMFF-compliant file may include a file type box, a media data box, a movie box, a movie fragment box, and so on. In this example, the file type box contains the file type and compatibility information. The media data box may contain samples (e.g., coded pictures). The movie box ("moov") contains metadata for the continuous media streams present in the file. Each of these continuous media streams may be represented in the file as a track. For example, a movie box may contain metadata about the movie (e.g., the logical and temporal relationships between samples, and pointers to the locations of the samples). A movie box may contain several types of sub-boxes. The sub-boxes within a movie box may include one or more track boxes. A track box may contain information about an individual track of the movie. A track box may include a track header box that specifies overall information for a single track. Furthermore, a track box may include a media box containing a media information box. The media information box may include a sample table box containing data indices of the media samples in the track. The information in the sample table box can be used to locate samples in time (and, for each of the samples in the track, by type, size, container, and offset into that container of the sample). Thus, the metadata for a track is enclosed in a track box ("trak"), while the media content of the track is enclosed in a media data box ("mdat") or directly in a separate file. The media content for a track includes (e.g., consists of) a sequence of samples, such as audio or video access units.
ISOBMFF指定以下类型的播放轨:媒体播放轨,其含有基本媒体串流;提示播放轨,其包含媒体发射指令或表示所接收的包串流;及计时元数据播放轨,其包括时间同步的元数据。用于每一播放轨的元数据包含样本描述条目的列表,每一条目提供在播放轨中使用的译码或囊封格式及对于处理彼格式所需要的初始化数据。每一样本与播放轨的样本描述条目中的一者相关联。ISOBMFF specifies the following types of tracks: media tracks, which contain elementary media streams; hint tracks, which contain media transmission instructions or represent received packet streams; and timed metadata tracks, which comprise time-synchronized metadata. The metadata for each track includes a list of sample description entries, each providing the coding or encapsulation format used in the track and the initialization data needed to process that format. Each sample is associated with one of the sample description entries of the track.
ISOBMFF实现通过各种机制指定样本特定元数据。样本表方框(“stbl”)内的特定方框已经标准化以对共同需求作出响应。举例来说,同步样本方框(“stss”)为样本表方框内的方框。同步样本方框用以列出播放轨的随机存取样本。本发明可将由同步样本方框列出的样本称作同步样本。在另一实例中,样本分群机制实现根据四字符分群类型将样本映像成共享指定为文件中的样本群组描述条目的同一性质的样本的群组。已在ISOBMFF中指定若干分群类型。ISOBMFF enables specifying sample-specific metadata through various mechanisms. Certain boxes within the sample table box ("stbl") have been standardized to respond to common needs. For example, the sync sample box ("stss") is a box within the sample table box. The sync sample box is used to list the random access samples of a track. This disclosure may refer to the samples listed by the sync sample box as sync samples. In another example, a sample grouping mechanism enables mapping samples, according to a four-character grouping type, into groups of samples sharing the same property specified as a sample group description entry in the file. Several grouping types have been specified in ISOBMFF.
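For example, the payload of a sync sample box ("stss") described above is a full box body carrying an entry count followed by the sample numbers of the track's random-access samples (per ISO/IEC 14496-12). The following is a minimal sketch with error handling omitted:

```python
import struct

def parse_stss(payload: bytes) -> list:
    """Parse a SyncSampleBox ('stss') payload: a 4-byte version/flags word,
    a 32-bit entry_count, then entry_count 32-bit sample numbers identifying
    the track's sync (random-access) samples."""
    _version_and_flags, entry_count = struct.unpack_from(">II", payload, 0)
    return list(struct.unpack_from(">%dI" % entry_count, payload, 8))
```

A player seeking to a given time would pick the nearest listed sample number at or before the target sample and begin decoding there.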
样本表方框可包含一或多个SampleToGroup方框及一或多个样本群组描述方框(即,SampleGroupDescription方框)。SampleToGroup方框可用以确定样本所属于的样本群组,连同所述样本群组的相关联描述。换句话说,SampleToGroup方框可指示样本所属于的群组。SampleToGroup方框可具有“sbgp”的方框类型。SampleToGroup方框可包含分群类型元素(例如,grouping_type)。分群类型元素可为识别样本分群的类型(即,用以形成样本群组的准则)的整数。此外,SampleToGroup方框可包含一或多个条目。SampleToGroup方框中的每一条目可与播放轨中的一系列不同的非重叠连续样本相关联。每一条目可指示样本计数元素(例如,sample_count)及群组描述索引元素(例如,group_description_index)。条目的样本计数元素可指示与所述条目相关联的样本的数目。换句话说,条目的样本计数元素可为给出具有相同样本群组描述符的连续样本的数目的整数。群组描述索引元素可识别含有与所述条目相关联的样本的描述的SampleGroupDescription方框。多个条目的群组描述索引元素可识别同一SampleGroupDescription方框。The Sample Table box may include one or more SampleToGroup boxes and one or more Sample Group Description boxes (i.e., SampleGroupDescription boxes). The SampleToGroup box can be used to identify the sample group to which a sample belongs, along with an associated description of the sample group. In other words, the SampleToGroup box may indicate the group to which the sample belongs. The SampleToGroup box may have a box type of "sbgp." The SampleToGroup box may include a grouping type element (e.g., grouping_type). The grouping type element may be an integer that identifies the type of sample grouping (i.e., the criteria used to form the sample group). In addition, the SampleToGroup box may include one or more entries. Each entry in the SampleToGroup box may be associated with a series of different, non-overlapping, consecutive samples in a track. Each entry may indicate a sample count element (e.g., sample_count) and a group description index element (e.g., group_description_index). The sample count element of an entry may indicate the number of samples associated with the entry. In other words, the sample count element of an entry may be an integer that specifies the number of consecutive samples with the same sample group descriptor. The group description index element may identify a SampleGroupDescription box containing a description of the samples associated with the entry. The group description index elements of multiple entries may identify the same SampleGroupDescription box.
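The SampleToGroup layout described above can likewise be sketched as a small parser. This illustration assumes the "sbgp" payload layout from ISO/IEC 14496-12: a version/flags word, a four-character grouping_type, an optional grouping_type_parameter for version 1, and then runs of (sample_count, group_description_index):

```python
import struct

def parse_sbgp(payload: bytes):
    """Parse a SampleToGroupBox ('sbgp') payload into its grouping type and
    its (sample_count, group_description_index) runs over consecutive samples."""
    version = payload[0]
    grouping_type = payload[4:8].decode("ascii")
    offset = 8
    if version == 1:                     # version 1 adds grouping_type_parameter
        offset += 4
    entry_count, = struct.unpack_from(">I", payload, offset)
    offset += 4
    entries = [struct.unpack_from(">II", payload, offset + 8 * i)
               for i in range(entry_count)]
    return grouping_type, entries
```

Each run assigns sample_count consecutive samples to the group described by the group_description_index-th entry of the matching SampleGroupDescription box (an index of 0 conventionally meaning "no group").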
当前文件格式设计可具有一或多个问题。为基于ISOBMFF存储特定视频编码解码器的视频内容,可能需要关于那个视频编码解码器的文件格式规格。为存储含有诸如MV-HEVC及SHVC的多个层的视频串流,可重新使用来自SVC及MVC文件格式的概念中的一些概念。然而,许多部分不能直接用于SHVC及MV-HEVC视频串流。HEVC文件格式的直接应用具有至少下列缺点:SHVC及MV-HEVC位流可开始于含有基础层中的IRAP图片但也可含有其它层中的其它非IRAP图片的存取单元,或反之亦然。同步样本当前不允许指示此点用于随机存取。The current file format design may have one or more problems. To store video content for a specific video codec based on ISOBMFF, a file format specification for that video codec may be needed. To store video streams containing multiple layers such as MV-HEVC and SHVC, some concepts from the SVC and MVC file formats can be reused. However, many parts cannot be directly used for SHVC and MV-HEVC video streams. Direct application of the HEVC file format has at least the following disadvantages: SHVC and MV-HEVC bitstreams may start with an access unit that contains an IRAP picture in the base layer but may also contain other non-IRAP pictures in other layers, or vice versa. Sync samples currently do not allow indicating this point for random access.
本发明描述对以上问题的潜在解决方案,以及提供其它潜在改进,以实现含有多个层的视频串流的高效且灵活存储。本发明中所描述的技术潜在地适用于用于存储由任何视频编码解码器译码的此视频内容的任何文件格式,但所述描述对基于HEVC文件格式存储SHVC及MV-HEVC视频串流为特定的,其在ISO/IEC 14496-15的条款8中指定。This disclosure describes potential solutions to the above problems, as well as other potential improvements, for efficient and flexible storage of video streams containing multiple layers. The techniques described in this disclosure are potentially applicable to any file format for storing such video content decoded by any video codec, but the description is specific to storing SHVC and MV-HEVC video streams based on the HEVC file format, which is specified in clause 8 of ISO/IEC 14496-15.
以下描述本发明的一些技术的实例实施。以下描述的实例实施是基于在MPEG输出文件W13478中的14496-15的最新集成规格。以下包含对附录A的改变(通过下划线展示)及添加的章节(第9章针对SHVC,且第10章针对MV-HEVC)。换句话说,本发明的特定实例可修改ISO/IEC 14496-15的附录A,且可将第9章及/或第10章添加到ISO/IEC 14496-15。通过下划线及双下划线展示的文字可具有与本发明的实例的特定相关性。尽管在本文中描述的实例中各处使用术语SHVC,但本发明的设计实际上不仅将仅支持SHVC编码解码器,而是可支持包含MV-HEVC、3D-HEVC的所有多层编码解码器,除非另外明确地提及。The following describes example implementations of some of the techniques of the present invention. The example implementations described below are based on the latest integrated specification of 14496-15 in the MPEG output file W13478. The following includes changes to Appendix A (shown by underlining) and added chapters (Chapter 9 for SHVC and Chapter 10 for MV-HEVC). In other words, specific examples of the present invention may modify Appendix A of ISO/IEC 14496-15, and Chapter 9 and/or Chapter 10 may be added to ISO/IEC 14496-15. Text shown by underlining and double underlining may have specific relevance to the examples of the present invention. Although the term SHVC is used throughout the examples described herein, the design of the present invention will actually support not only SHVC codecs, but all multi-layer codecs including MV-HEVC and 3D-HEVC, unless explicitly mentioned otherwise.
ISOBMFF规格指定适用于DASH的六种类型的串流存取点(SAP)。前两个SAP类型(类型1及类型2)对应于H.264/AVC及HEVC中的IDR图片。第三SAP类型(类型3)对应于开放GOP随机存取点,因此对应于HEVC中的BLA或CRA图片。第四SAP类型(类型4)对应于GDR随机存取点。The ISOBMFF specification specifies six types of stream access points (SAPs) for DASH. The first two SAP types (Type 1 and Type 2) correspond to IDR pictures in H.264/AVC and HEVC. The third SAP type (Type 3) corresponds to an open GOP random access point, and therefore corresponds to a BLA or CRA picture in HEVC. The fourth SAP type (Type 4) corresponds to a GDR random access point.
在当前L-HEVC文件格式中,一些高层级信息(例如,位流中的层、位速率、帧速率、时间子层、平行度、操作点等的信息)是在LHEVCSampleEntry、HEVCLHVCSampleEntry、LHVCDecoderConfigurationRecord、播放轨内容信息('tcon')及OperationPointsInformationBox('oinf')中用信号发送。在一个实例中,上述方框的语法设计如下:In the current L-HEVC file format, some high-level information (e.g., information about the layers in the bitstream, bit rate, frame rate, temporal sub-layers, parallelism, operation points, etc.) is signaled in LHEVCSampleEntry, HEVCLHVCSampleEntry, LHVCDecoderConfigurationRecord, TrackContentInfo ('tcon'), and OperationPointsInformationBox ('oinf'). In one example, the syntax of the above boxes is as follows:
基于以上方框的当前结构及其中所含的信息,为播放文件中的内容,播放器可经配置以首先寻找“oinf”方框(在文件中仅一者)以知晓包含何些操作点,且接着选择所述操作点中的一者待播放。视频播放器接着可检查“tcon”方框(含有L-HEVC视频的每一播放轨中的一者)以知晓哪些播放轨含有所选择操作点的层。Based on the current structure of the above boxes and the information contained therein, to play the content in the file, the player can be configured to first look for the "oinf" box (there is only one in the file) to know which operation points are included, and then select one of the operation points to play. The video player can then check the "tcon" box (one for each track containing L-HEVC video) to know which tracks contain the layers of the selected operation point.
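The player workflow described above — pick an operation point from the single "oinf" box, then use each track's "tcon" box to locate the tracks carrying that operation point's layers — might be sketched as follows. The data shapes and the choose callback are hypothetical, standing in for the parsed contents of the two boxes:

```python
def select_tracks_for_playback(oinf_operation_points, tcon_by_track, choose):
    """Sketch of operation-point-driven track selection.

    oinf_operation_points: list of dicts, one per operation point in the
        'oinf' box, each with a 'layer_ids' list (hypothetical shape).
    tcon_by_track: mapping of track ID -> layer IDs carried, as would be
        read from each track's 'tcon' box (hypothetical shape).
    choose: callback picking one operation point, e.g. the highest level
        the device can decode.
    """
    op = choose(oinf_operation_points)
    needed_layers = set(op["layer_ids"])
    tracks = [track_id for track_id, layers in tcon_by_track.items()
              if needed_layers & set(layers)]   # track carries a needed layer
    return tracks, op
```

A player would then feed the selected tracks' samples, filtered to the chosen operation point's layers and temporal sub-layers, to the decoder.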
牢记当前设计的上述基本使用,本发明提议将更多信息(诸如,表示格式(其包含空间分辨率、位深度及色彩格式)、位速率及帧速率)包含到“oinf”方框中以实现操作点的选择。每一播放轨中的样本条目包含此类信息的一个集合,但仅针对特定操作点。当多个操作点含在一个播放轨中时,其它操作点的信息遗漏。Keeping in mind the basic usage of the current design, the present invention proposes to include more information (such as the representation format (which includes spatial resolution, bit depth, and color format), bit rate, and frame rate) in the "oinf" box to enable the selection of an operation point. The sample entry in each playback track contains a set of this information, but only for a specific operation point. When multiple operation points are included in a playback track, information for other operation points is omitted.
另一问题涉及LHEVCDecoderConfigurationRecord中的许多字段的语义不清晰且其中的一些字段令人混淆的实情。举例来说,配置文件、阶层及层级(PTL)、chromaFormat、bitDepthLumaMinus8及bitDepthChromaMinus8为层特定性质,但当前据称适用于通过operationPointIdx指示的操作点。当操作点含有一个以上层时,语义简单而言是不清晰的。Another issue relates to the fact that the semantics of many fields in LHEVCDecoderConfigurationRecord are unclear, and some of them are confusing. For example, Profile, Tier, and Level (PTL), chromaFormat, bitDepthLumaMinus8, and bitDepthChromaMinus8 are layer-specific properties, but are currently said to apply to the operation point indicated by operationPointIdx. When the operation point contains more than one layer, the semantics are simply unclear.
实际上,基于设计的常规基本使用的步骤,样本条目中的信息中的一些信息实际上无用,尤其在“oinf”方框中存在足够信息用于操作点选择时。In practice, based on the steps of conventional basic use of the design, some of the information in the sample entries is not actually useful, especially when there is enough information in the "oinf" box for operating point selection.
又一问题为在SHVC及MV-HEVC中,仅针对每一必要层(即,为输出层的层或通过操作点内的输出层直接或间接参考的层或其两者)而非针对任何不必要层(并非必要层的层)用信号发送PTL。因此,在文件格式设计中,针对不必要层用信号发送PTL可能并非必要的。Another problem is that in SHVC and MV-HEVC, PTL is signaled only for each essential layer (i.e., a layer that is an output layer or a layer directly or indirectly referenced by an output layer in an operation point, or both), and not for any unnecessary layer (a layer that is not an essential layer). Therefore, in file format design, signaling PTL for unnecessary layers may not be necessary.
下文列出对本发明中所描述的方法及技术的概述。实例详述实施是在稍后章节中提供。本发明的方法及技术可独立地应用或可以组合方式应用。The following is an overview of the methods and techniques described in the present invention. Detailed examples are provided in later sections. The methods and techniques of the present invention can be applied independently or in combination.
本发明的第一技术包含移除在LHEVC样本条目及HEVCLHVC样本条目内MPEG4BitRateBox()在LHEVCConfigurationBox之后的发信。实情为,实现针对“oinf”方框中的每一操作点用信号发送位速率信息。The first technique of this disclosure includes removing the signaling of MPEG4BitRateBox() after LHEVCConfigurationBox in LHEVC sample entries and HEVC LHVC sample entries. Instead, the bit rate information is signaled for each operation point in the "oinf" box.
本发明的第二技术包含针对“oinf”方框中的每一操作点用信号发送关于表示格式(其包含空间分辨率、位深度及色彩格式)的信息。A second technique of this disclosure includes signaling information about the representation format (which includes spatial resolution, bit depth, and color format) for each operation point in the "oinf" box.
本发明的第三技术包含从LHEVCDecoderConfigurationRecord移除已提供在“oinf”方框中或提议待添加到“oinf”方框的PTL信息、表示格式信息及帧速率信息。LHEVCDecoderConfigurationRecord中的剩余信息适用于播放轨中所含的所有层。在第三技术的另一实例中,重建构LHEVCDecoderConfigurationRecord的设计,以使得针对每一层用信号发送表示格式信息及帧速率信息及可能额外参数/信息(例如,平行度信息)。LHEVCDecoderConfigurationRecord中的语法元素无正负号int(2) parallelismType可指示平行解码特征的何类型可用于解码层中的图片。图像块、波前及切片为可用于促进平行处理的图片分割机制的实例。A third technique of the present disclosure includes removing the PTL information, representation format information, and frame rate information that has been provided in the "oinf" box or proposed to be added to the "oinf" box from the LHEVCDecoderConfigurationRecord. The remaining information in the LHEVCDecoderConfigurationRecord applies to all layers contained in the track. In another example of the third technique, the design of the LHEVCDecoderConfigurationRecord is restructured so that the representation format information and frame rate information, and possibly additional parameters/information (e.g., parallelism information), are signaled for each layer. The syntax element unsigned int(2) parallelismType in the LHEVCDecoderConfigurationRecord may indicate what type of parallel decoding feature can be used to decode the pictures of a layer. Tiles, wavefronts, and slices are examples of picture partitioning mechanisms that can be used to facilitate parallel processing.
本发明的第四技术包含从LHEVCDecoderConfigurationRecord移除operationPointIdx。在第四技术的另一实例中,实现与播放轨相关联的操作点索引的列表在LHEVCDecoderConfigurationRecord中的发信。The fourth technique of this disclosure includes removing operationPointIdx from LHEVCDecoderConfigurationRecord. In another example of the fourth technique, a list of operation point indices associated with a track is signaled in LHEVCDecoderConfigurationRecord.
本发明的第五技术包含改变“oinf”方框中的layer_count字段的语义以仅对操作点的必要层进行计数。A fifth technique of this disclosure involves changing the semantics of the layer_count field in the "oinf" box to count only the necessary layers for the operation point.
下文描述本发明的方法及技术的实例实施。在以下实例中,展示相对于HEVC及LHEVC文件格式的文字改变。在识别符[START INSERTION]与[END INSERTION]之间展示添加的文字。在识别符[START DELETION]与[END DELETION]之间展示删除的文字。The following describes an example implementation of the methods and techniques of this disclosure. In the following example, text changes relative to the HEVC and LHEVC file formats are shown. Added text is shown between the identifiers [START INSERTION] and [END INSERTION]. Deleted text is shown between the identifiers [START DELETION] and [END DELETION].
下文描述第一实施。The first implementation is described below.
此章节描述对本发明技术1、2、3(不包含其实例a.)、4(不包含其实例a.)及5的LHEVCSampleEntry、HEVCLHVCSampleEntry、LHVCDecoderConfigurationRecord及OperationPointsInformationBox('oinf')的发信的详细修改。This section describes detailed modifications to the signaling of LHEVCSampleEntry, HEVCLHVCSampleEntry, LHVCDecoderConfigurationRecord, and OperationPointsInformationBox ('oinf') of techniques 1, 2, 3 (excluding example a. thereof), 4 (excluding example a. thereof), and 5 of this disclosure.
layer_count:此字段指示为[START INSERTION]所述[END INSERTION][START DELETION]一[END DELETION]操作点的一部分的[START INSERTION]必要[END INSERTION]层的数目。layer_count: This field indicates the number of [START INSERTION]necessary [END INSERTION]layers that are part of [START INSERTION]the [END INSERTION][START DELETION]an [END DELETION]operation point.
……
[START INSERTION][START INSERTION]
minPicWidth指定如通过用于操作点的串流的ISO/IEC 23008-2中的pic_width_in_luma_samples参数所定义的明度宽度指示符的最小值。minPicWidth specifies the minimum value of the luma width indicator, as defined by the pic_width_in_luma_samples parameter in ISO/IEC 23008-2, for the stream of the operation point.
minPicHeight指定如通过用于操作点的串流的ISO/IEC 23008-2中的pic_height_in_luma_samples参数所定义的明度高度指示符的最小值。minPicHeight specifies the minimum value of the luma height indicator, as defined by the pic_height_in_luma_samples parameter in ISO/IEC 23008-2, for the stream of the operation point.
maxPicWidth指定如通过用于操作点的串流的ISO/IEC 23008-2中的pic_width_in_luma_samples参数所定义的明度宽度指示符的最大值。maxPicWidth specifies the maximum value of the luma width indicator, as defined by the pic_width_in_luma_samples parameter in ISO/IEC 23008-2, for the stream of the operation point.
maxPicHeight指定如通过用于操作点的串流的ISO/IEC 23008-2中的pic_height_in_luma_samples参数所定义的明度高度指示符的最大值。maxPicHeight specifies the maximum value of the luma height indicator, as defined by the pic_height_in_luma_samples parameter in ISO/IEC 23008-2, for the stream of the operation point.
maxChromaFormat指定如通过用于操作点的串流的ISO/IEC 23008-2中的chroma_format_idc参数所定义的chroma_format指示符的最大值。maxChromaFormat specifies the maximum value of the chroma_format indicator as defined by the chroma_format_idc parameter in ISO/IEC 23008-2 for the stream of the operation point.
maxBitDepthMinus8指定如分别通过用于操作点的串流的ISO/IEC 23008-2中的bit_depth_luma_minus8及bit_depth_chroma_minus8参数所定义的明度及色度位深度指示符的最大值。maxBitDepthMinus8 specifies the maximum value of the luma and chroma bit depth indicators, as defined by the bit_depth_luma_minus8 and bit_depth_chroma_minus8 parameters, respectively, in ISO/IEC 23008-2, for the stream of the operation point.
frame_rate_info_flag等于0指示针对操作点不存在帧速率信息。值1指示针对操作点存在帧速率信息。A frame_rate_info_flag equal to 0 indicates that no frame rate information is present for the operation point. A value of 1 indicates that frame rate information is present for the operation point.
bit_rate_info_flag等于0指示针对操作点不存在位速率信息。值1指示针对操作点存在位速率信息。bit_rate_info_flag equal to 0 indicates that no bit rate information is present for the operation point. A value of 1 indicates that bit rate information is present for the operation point.
avgFrameRate给出操作点的以帧/(256秒)为单位的平均帧速率。值0指示未指定的平均帧速率。avgFrameRate gives the average frame rate in frames/(256 seconds) for the operating point. A value of 0 indicates an unspecified average frame rate.
constantFrameRate等于1指示操作点的串流具有恒定帧速率。值2指示操作点的串流中的每一时间层的表示具有恒定帧速率。值0指示操作点的串流可能或可能不具有恒定帧速率。A constantFrameRate value of 1 indicates that the stream of the operation point has a constant frame rate. A value of 2 indicates that the representation of each temporal layer in the stream of the operation point has a constant frame rate. A value of 0 indicates that the stream of the operation point may or may not have a constant frame rate.
maxBitRate给出在一秒的任何窗口内的操作点的串流的以位/秒计的最大位速率。maxBitRate gives the maximum bit rate in bits per second for the stream at the operating point within any window of one second.
avgBitRate给出操作点的串流的以位/秒计的平均位速率。avgBitRate gives the average bit rate of the stream in bits per second at the operating point.
……
[END INSERTION][END INSERTION]
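For illustration only, the per-operation-point fields proposed above for the "oinf" box could be gathered into a record like the following. The field names mirror the specification text above; the class itself and its helper method are not part of any standard:

```python
from dataclasses import dataclass

@dataclass
class OperationPointInfo:
    """Illustrative record of the per-operation-point fields proposed for 'oinf'."""
    min_pic_width: int            # minimum pic_width_in_luma_samples in the stream
    min_pic_height: int           # minimum pic_height_in_luma_samples
    max_pic_width: int            # maximum pic_width_in_luma_samples
    max_pic_height: int           # maximum pic_height_in_luma_samples
    max_chroma_format: int        # maximum chroma_format_idc
    max_bit_depth_minus8: int     # maximum luma/chroma bit depth minus 8
    avg_frame_rate: int = 0       # frames per 256 seconds; 0 = unspecified
    constant_frame_rate: int = 0  # 0 = unknown, 1 = constant, 2 = per temporal layer
    max_bit_rate: int = 0         # bits/second over any one-second window
    avg_bit_rate: int = 0         # average bits/second

    def frames_per_second(self) -> float:
        """Convert the frames-per-256-seconds convention to frames per second."""
        return self.avg_frame_rate / 256.0
```

For example, a 30 fps operation point would carry avgFrameRate = 7680, since the field counts frames per 256 seconds.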
下文描述第二实施。The second implementation is described below.
此章节描述对本发明实例3(a)的LHVCDecoderConfigurationRecord的发信的详细修改。This section describes the detailed modifications to the signaling of LHVCDecoderConfigurationRecord of Example 3(a) of the present invention.
num_layers指定播放轨中的层的数目。num_layers specifies the number of layers in the playback track.
layer_id指定层ID值,针对所述层ID值而提供此回圈中的信息。layer_id specifies the layer ID value for which the information in this loop is provided.
[END INSERTION][END INSERTION]
下文描述第三实施。The third implementation is described below.
此章节描述对本发明实例4(a)的LHVCDecoderConfigurationRecord的发信的详细修改。This section describes the detailed modifications to the signaling of LHVCDecoderConfigurationRecord of Example 4(a) of the present invention.
[START INSERTION]numOperationPoints:此字段用信号发送可供用于播放轨的操作点的数目。[END INSERTION][START INSERTION]numOperationPoints: This field signals the number of operation points available for the playback track. [END INSERTION]
operationPointIdx:此字段用信号发送在操作点信息方框中记载的操作点的索引。[START DELETION]在LHEVCDecoderConfigurationRecord中的general_profile_space、general_tier_flag、general_profile_idc、general_profile_compatibility_flags、general_constraint_indicator_flag及general_level_idc的值应与操作点信息方框中的第operationPointIdx个操作点的相应值相同。[END DELETION][START INSERTION]操作点信息方框中的第operationPointIdx个操作点中的max_temporal_id的值应小于或等于numTemporalLayers的值。[END INSERTION]operationPointIdx: This field signals the index of the operation point recorded in the Operation Point Information box. [START DELETION] The values of general_profile_space, general_tier_flag, general_profile_idc, general_profile_compatibility_flags, general_constraint_indicator_flag, and general_level_idc in the LHEVCDecoderConfigurationRecord shall be the same as those for the operationPointIdx-th operation point in the Operation Point Information box. [END DELETION] [START INSERTION] The value of max_temporal_id of the operationPointIdx-th operation point in the Operation Point Information box shall be less than or equal to the value of numTemporalLayers. [END INSERTION]
注意,播放轨可与一个或[START DELETION]表示[END DELETION]一个以上输出层集合[START DELETION]且因此与一个以上配置文件[END DELETION]相关联。播放器可通过调查针对操作点信息方框中的第operationPointIdx个操作点而提供的信息来找出对应于[START INSERTION]具有索引operationPointIdx的所选择操作点的[END INSERTION]LHEVCDecoderConfigurationRecord中的配置文件信息的哪些层待解码及哪些层待输出。Note that a track may be associated with one or [START DELETION]represent [END DELETION]more than one output layer set[START DELETION] and therefore with more than one profile[END DELETION]. By examining the information provided for the operationPointIdx-th operation point in the Operation Point Information box, the player can find out, for the profile information in the LHEVCDecoderConfigurationRecord corresponding to [START INSERTION]the selected operation point with index operationPointIdx[END INSERTION], which layers are to be decoded and which layers are to be output.
注意,对于包含在播放轨中的每一辅助图片层,建议在nalUnit内包含含有指定辅助图片层的特性的宣告性SEI消息(诸如,用于深度辅助图片层的深度表示信息SEI消息)的SEI NAL单元。Note that for each auxiliary picture layer included in a playback track, it is recommended to include a SEI NAL unit within the nalUnit containing a declarative SEI message specifying the characteristics of the auxiliary picture layer (such as a depth representation information SEI message for a depth auxiliary picture layer).
图2为说明可实施本发明中所描述的技术的实例视频编码器20的框图。视频编码器20可经配置以输出单一视图、多视图、可调式、3D及其它类型的视频数据。视频编码器20可经配置以将视频输出到后处理实体27。后处理实体27旨在表示可处理来自视频编码器20的经编码视频数据的视频实体(诸如,MANE或拼接/编辑装置)的实例。在一些情况下,后处理实体可为网络实体的实例。在一些视频编码系统中,后处理实体27及视频编码器20可为单独装置的部分,而在其它情况下,关于后处理实体27描述的功能性可由包括视频编码器20的同一装置执行。后处理实体27可为视频装置。在一些实例中,后处理实体27可与图1的文件产生装置34相同。FIG2 is a block diagram illustrating an example video encoder 20 that may implement the techniques described in this disclosure. Video encoder 20 may be configured to output single-view, multi-view, scalable, 3D, and other types of video data. Video encoder 20 may be configured to output video to a post-processing entity 27. Post-processing entity 27 is intended to represent an example of a video entity, such as a MANE or splicing/editing device, that may process the encoded video data from video encoder 20. In some cases, the post-processing entity may be an example of a network entity. In some video coding systems, post-processing entity 27 and video encoder 20 may be parts of separate devices, while in other cases, the functionality described with respect to post-processing entity 27 may be performed by the same device that includes video encoder 20. Post-processing entity 27 may be a video device. In some examples, post-processing entity 27 may be the same as file generation device 34 of FIG1 .
视频编码器20可执行视频切片内的视频块的帧内译码及帧间译码。帧内译码依赖于空间预测以减少或移除给定视频帧或图片内的视频中的空间冗余。帧间译码依赖于时间预测以减少或移除视频序列的邻近帧或图片内的视频中的时间冗余。帧内模式(I模式)可指若干基于空间的压缩模式中的任一者。帧间模式(诸如,单向预测(P模式)或双向预测(B模式))可指若干基于时间的压缩模式中的任一者。Video encoder 20 may perform intra- and inter-coding of video blocks within a video slice. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I-mode) may refer to any of several spatial-based compression modes. Inter-mode, such as unidirectional prediction (P-mode) or bidirectional prediction (B-mode), may refer to any of several temporal-based compression modes.
在图2的实例中,视频编码器20包含分割单元37、预测处理单元41、滤波器单元63、参考图片存储器64、求和器50、变换处理单元52、量化单元54及熵编码单元56。预测处理单元41包含运动估计单元42、运动补偿单元44及帧内预测处理单元46。为进行视频块重建构,视频编码器20亦包含反量化单元58、反变换处理单元60及求和器62。滤波器单元63旨在表示一或多个回路滤波器,诸如解块滤波器、自适应性回路滤波器(ALF)及样本自适应性偏移(SAO)滤波器。尽管滤波器单元63在图2中展示为回路滤波器,但在其它配置中,滤波器单元63可实施为回路后滤波器。In the example of FIG2 , video encoder 20 includes partitioning unit 37, prediction processing unit 41, filter unit 63, reference picture memory 64, summer 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Prediction processing unit 41 includes motion estimation unit 42, motion compensation unit 44, and intra-prediction processing unit 46. To perform video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform processing unit 60, and summer 62. Filter unit 63 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although filter unit 63 is shown in FIG2 as a loop filter, in other configurations, filter unit 63 may be implemented as a post-loop filter.
视频编码器20的视频数据存储器35可存储待由视频编码器20的组件编码的视频数据。存储在视频数据存储器35中的视频数据可(例如)从视频源18获得。参考图片存储器64可为存储参考视频数据以供视频编码器20在编码视频数据时(例如,在帧内或帧间译码模式中)使用的参考图片存储器。视频数据存储器35及参考图片存储器64可由多种存储器装置中的任一者形成,诸如动态随机存取存储器(DRAM)(包含同步DRAM(SDRAM))、磁阻式RAM(MRAM)、电阻式RAM(RRAM)或其它类型的存储器装置。视频数据存储器35及参考图片存储器64可由同一存储器装置或单独存储器装置来提供。在各种实例中,视频数据存储器35可与视频编码器20的其它组件一起在芯片上,或相对于那些组件在芯片外。Video data memory 35 of video encoder 20 may store video data to be encoded by the components of video encoder 20. The video data stored in video data memory 35 may be obtained, for example, from video source 18. Reference picture memory 64 may be a reference picture memory that stores reference video data for use by video encoder 20 when encoding video data (e.g., in intra- or inter-coding modes). Video data memory 35 and reference picture memory 64 may be formed from any of a variety of memory devices, such as dynamic random access memory (DRAM) (including synchronous DRAM (SDRAM)), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 35 and reference picture memory 64 may be provided by the same memory device or separate memory devices. In various examples, video data memory 35 may be on-chip with other components of video encoder 20, or off-chip relative to those components.
如图2中所展示,视频编码器20接收视频数据,且分割单元37将数据分割成视频块。此分割还可包含分割成切片、图像块或其它较大单元,以及(例如)根据LCU及CU的四分树结构的视频块分割。视频编码器20大体上说明编码待编码视频切片内的视频块的组件。可将切片划分成多个视频块(且可能划分成被称作图像块的视频块集合)。预测处理单元41可基于误差结果(例如,译码速率及失真程度)而为当前视频块选择多个可能译码模式中的一者(诸如,多个帧内译码模式中的一者或多个帧间译码模式中的一者)。预测处理单元41可将所得经帧内或经帧间译码块提供到求和器50以产生残余块数据,且提供到求和器62以重建构经编码块以供用作参考图片。As shown in FIG2 , video encoder 20 receives video data, and partitioning unit 37 partitions the data into video blocks. This partitioning may also include partitioning into slices, tiles, or other larger units, as well as, for example, video block partitioning according to a quadtree structure of LCUs and CUs. Video encoder 20 generally illustrates components that encode video blocks within a video slice to be encoded. A slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). Prediction processing unit 41 may select one of multiple possible coding modes (such as one of multiple intra-coding modes or one of multiple inter-coding modes) for the current video block based on error results (e.g., coding rate and distortion level). Prediction processing unit 41 may provide the resulting intra- or inter-coded block to summer 50 to generate residual block data, and to summer 62 to reconstruct the encoded block for use as a reference picture.
预测处理单元41内的帧内预测处理单元46可执行当前视频块相对于在与待译码的当前块相同的帧或切片中的一或多个相邻块的帧内预测性译码,以提供空间压缩。预测处理单元41内的运动估计单元42及运动补偿单元44执行当前视频块相对于一或多个参考图片中的一或多个预测性块的帧间预测性译码,以提供时间压缩。Intra-prediction processing unit 46 within prediction processing unit 41 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded to provide spatial compression. Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 may perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression.
运动估计单元42可经配置以根据视频序列的预定图案来确定用于视频切片的帧间预测模式。预定图案可将序列中的视频切片指明为P切片、B切片或GPB切片。运动估计单元42及运动补偿单元44可高度集成,但为概念目的而分开来说明。由运动估计单元42执行的运动估计为产生运动向量的程序,所述运动向量估计视频块的运动。举例来说,运动向量可指示当前视频帧或图片内的视频块的PU相对于参考图片内的预测性块的移位。Motion estimation unit 42 may be configured to determine the inter-prediction mode for a video slice based on a predetermined pattern for a video sequence. The predetermined pattern may designate a video slice in the sequence as a P slice, a B slice, or a GPB slice. Motion estimation unit 42 and motion compensation unit 44 may be highly integrated but are illustrated separately for conceptual purposes. Motion estimation performed by motion estimation unit 42 is the process of generating motion vectors, which estimate the motion of video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference picture.
预测性块为就像素差而言被发现紧密地匹配待译码的视频块的PU的块,所述像素差可由绝对差和(SAD)、平方差和(SSD)或其它差量度确定。在一些实例中,视频编码器20可计算存储在参考图片存储器64中的参考图片的子整数像素位置的值。举例来说,视频编码器20可内插参考图片的四分之一像素位置、八分之一像素位置或其它分数像素位置的值。因此,运动估计单元42可执行关于全像素位置及分数像素位置的运动搜寻且输出具有分数像素精度的运动向量。A predictive block is a block that is found to closely match a PU of the video block to be coded, in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of squared difference (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of a reference picture stored in reference picture memory 64. For example, video encoder 20 may interpolate values for quarter-pixel positions, eighth-pixel positions, or other fractional pixel positions of the reference picture. Thus, motion estimation unit 42 may perform motion searches with respect to full-pixel positions and fractional pixel positions and output motion vectors with fractional pixel precision.
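The pixel-difference metrics named above can be sketched in a few lines; the 2×2 blocks are purely illustrative:

```python
# Minimal sketch of the block-matching metrics described above: sum of
# absolute differences (SAD) and sum of squared differences (SSD)
# between a current block and a candidate predictive block.

def sad(block_a, block_b):
    # sum of absolute pixel differences
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def ssd(block_a, block_b):
    # sum of squared pixel differences
    return sum((a - b) ** 2 for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

cur = [[10, 12], [14, 16]]
cand = [[11, 12], [13, 18]]
print(sad(cur, cand))  # 1 + 0 + 1 + 2 = 4
print(ssd(cur, cand))  # 1 + 0 + 1 + 4 = 6
```

A motion search evaluates such a metric at many candidate positions and keeps the position with the lowest cost.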
运动估计单元42通过比较PU的位置与参考图片的预测性块的位置而计算经帧间译码切片中的视频块的PU的运动向量。参考图片可选自第一参考图片列表(列表0)或第二参考图片列表(列表1),其中的每一者识别存储在参考图片存储器64中的一或多个参考图片。运动估计单元42将计算出的运动向量发送到熵编码单元56及运动补偿单元44。Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.
由运动补偿单元44执行的运动补偿可涉及基于由运动估计确定的运动向量而提取或产生预测性块,可能执行达子像素精确度的内插。在接收到当前视频块的PU的运动向量后,运动补偿单元44即可在参考图片列表中的一者中定位运动向量所指向的预测性块。视频编码器20可通过从正被译码的当前视频块的像素值减去预测性块的像素值从而形成像素差值来形成残余视频块。像素差值形成块的残余数据,且可包含明度差分量及色度差分量两者。求和器50表示执行此减法运算的一或多个组件。运动补偿单元44还可产生与视频块及视频切片相关联的语法元素以供视频解码器30在解码视频切片的视频块时使用。Motion compensation performed by motion compensation unit 44 may involve extracting or generating a predictive block based on the motion vector determined by motion estimation, possibly performing interpolation to sub-pixel accuracy. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block pointed to by the motion vector in one of the reference picture lists. Video encoder 20 may form a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, thereby forming pixel difference values. The pixel difference values form residual data for the block and may include both luma difference components and chroma difference components. Summer 50 represents the one or more components that perform this subtraction operation. Motion compensation unit 44 may also generate syntax elements associated with the video block and video slice for use by video decoder 30 when decoding the video block of the video slice.
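The motion-compensation step described above, locating the predictive block via a motion vector and subtracting it to form the residual, can be sketched as follows (integer-pel positions only; a real codec also interpolates sub-pixel positions):

```python
# Sketch: use the motion vector (mvx, mvy) to locate the predictive block
# in a reference frame, then subtract it pixel-wise from the current block
# to form the residual data. Frames are plain 2-D lists of luma samples.

def predict_and_residual(ref_frame, cur_block, x, y, mv):
    mvx, mvy = mv
    h, w = len(cur_block), len(cur_block[0])
    # predictive block pointed to by the motion vector
    pred = [row[x + mvx : x + mvx + w]
            for row in ref_frame[y + mvy : y + mvy + h]]
    # residual = current pixel values minus predictive pixel values
    residual = [[c - p for c, p in zip(cr, pr)]
                for cr, pr in zip(cur_block, pred)]
    return pred, residual

ref = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
cur = [[5, 6], [9, 10]]                    # block at position (x=1, y=1)
pred, res = predict_and_residual(ref, cur, 1, 1, (1, 0))
print(pred)  # [[6, 7], [10, 11]]
print(res)   # [[-1, -1], [-1, -1]]
```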
如上文所描述,作为由运动估计单元42及运动补偿单元44所执行的帧间预测的替代例,帧内预测处理单元46可对当前块进行帧内预测。换句话说,帧内预测处理单元46可确定帧内预测模式以用以编码当前块。在一些实例中,帧内预测处理单元46可(例如)在单独编码遍次期间使用各种帧内预测模式来编码当前块,且帧内预测单元46(或在一些实例中,模式选择单元40)可从受测模式中选择待使用的适当帧内预测模式。举例来说,帧内预测处理单元46可使用对于各种受测帧内预测模式的速率-失真分析来计算速率-失真值,且在受测模式当中选择具有最好速率-失真特性的帧内预测模式。速率-失真分析大体上确定经编码块与原始未经编码块(其经编码以产生经编码块)之间的失真(或误差)的量,以及用以产生经编码块的位速率(即,位的数目)。帧内预测处理单元46可从各种经编码块的失真及速率计算比率以确定哪一帧内预测模式展现块的最好速率-失真值。As described above, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, intra-prediction processing unit 46 may intra-predict the current block. In other words, intra-prediction processing unit 46 may determine an intra-prediction mode to use to encode the current block. In some examples, intra-prediction processing unit 46 may encode the current block using various intra-prediction modes, for example, during separate encoding passes, and intra-prediction unit 46 (or, in some examples, mode selection unit 40) may select an appropriate intra-prediction mode to use from the tested modes. For example, intra-prediction processing unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes and select the intra-prediction mode with the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between the encoded block and the original, unencoded block (which was encoded to produce the encoded block), as well as the bit rate (i.e., the number of bits) used to produce the encoded block. Intra-prediction processing unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
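One common concrete form of the rate-distortion comparison described above is a Lagrangian cost J = D + λ·R, where D is the distortion of a candidate mode, R its bit rate, and λ a Lagrange multiplier. The candidate modes and numbers below are purely illustrative, not taken from any real encoder:

```python
# Sketch of rate-distortion mode selection: among the tested intra modes,
# pick the one minimising the Lagrangian cost J = D + lambda * R.
# (mode_name, distortion, rate_bits) triples are hypothetical examples.

def best_mode(candidates, lam):
    # candidates: list of (mode_name, distortion, rate_bits)
    return min(candidates, key=lambda m: m[1] + lam * m[2])[0]

modes = [("DC", 120.0, 10), ("planar", 90.0, 24), ("angular_10", 70.0, 40)]
print(best_mode(modes, lam=1.0))  # 'angular_10' (J = 110 vs 114 vs 130)
print(best_mode(modes, lam=4.0))  # 'DC' (J = 160 vs 186 vs 230)
```

Note how a larger λ penalises rate more heavily, shifting the decision toward cheaper modes even at higher distortion.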
在任何状况下,在选择用于块的帧内预测模式之后,帧内预测处理单元46可将指示用于块的所选择帧内预测模式的信息提供到熵编码单元56。熵编码单元56可根据本发明的技术编码指示所选择帧内预测模式的信息。视频编码器20可在所发射的位流中包含以下各者:配置数据,其可包含多个帧内预测模式索引表及多个经修改的帧内预测模式索引表(也称作码字映射表);各种块的编码内容脉络的定义;及待用于所述内容脉络中的每一者的最有可能的帧内预测模式、帧内预测模式索引表及经修改的帧内预测模式索引表的指示。In any case, after selecting an intra-prediction mode for a block, intra-prediction processing unit 46 may provide information indicating the selected intra-prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode according to the techniques of this disclosure. Video encoder 20 may include the following in the transmitted bitstream: configuration data, which may include multiple intra-prediction mode index tables and multiple modified intra-prediction mode index tables (also called codeword mapping tables); definitions of encoding contexts for various blocks; and indications of the most probable intra-prediction mode, intra-prediction mode index tables, and modified intra-prediction mode index tables to be used for each of the content contexts.
在预测处理单元41经由帧间预测或帧内预测产生当前视频块的预测性块之后,视频编码器20可通过从当前视频块减去预测性块而形成残余视频块。残余块中的残余视频数据可包含在一或多个TU中且被应用于变换处理单元52。变换处理单元52使用诸如离散余弦变换(DCT)或概念上类似变换的变换将残余视频数据变换成残余变换系数。变换处理单元52可将残余视频数据从像素域转换到变换域(诸如,频域)。After prediction processing unit 41 generates a predictive block for the current video block via inter-prediction or intra-prediction, video encoder 20 may form a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform processing unit 52. Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. Transform processing unit 52 may convert the residual video data from the pixel domain to a transform domain, such as the frequency domain.
变换处理单元52可将所得变换系数发送到量化单元54。量化单元54量化变换系数以进一步减小位速率。量化程序可减小与一些或所有系数相关联的位深度。可通过调整量化参数来修改量化程度。在一些实例中,量化单元54可接着执行对包含经量化变换系数的矩阵的扫描。替代地,熵编码单元56可执行扫描。Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix containing the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
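A hedged sketch of the quantization step follows. The QP-to-step-size mapping in which the step roughly doubles for every 6 QP values is shown for illustration (it follows the H.264/HEVC convention); the rounding offsets and scaling matrices of a real encoder are omitted:

```python
# Illustrative uniform quantisation controlled by a quantisation
# parameter (QP). Qstep(4) = 1 and Qstep doubles every 6 QP values,
# as in H.264/HEVC-style codecs; everything else is simplified.

def qstep(qp):
    return 2 ** ((qp - 4) / 6.0)  # approximate step size for a given QP

def quantize(coeffs, qp):
    step = qstep(qp)
    # reduce coefficient magnitude (and hence bit depth / bit rate)
    return [[int(round(c / step)) for c in row] for row in coeffs]

print(qstep(10))                       # 2.0
print(quantize([[8, -6], [4, 0]], 10)) # [[4, -3], [2, 0]]
```

Increasing QP enlarges the step size, driving more coefficients toward zero and lowering the bit rate at the cost of distortion.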
在量化之后,熵编码单元可熵编码表示经量化变换系数的语法元素。举例来说,熵编码单元56可执行内容脉络自适应性可变长度译码(CAVLC)、内容脉络自适应性二进制算术译码(CABAC)、基于语法的内容脉络自适应性二进制算术译码(SBAC)、机率区间分割熵(PIPE)译码或另一熵编码方法或技术。在由熵编码单元56进行熵编码之后,经编码位流可被发射到视频解码器30,或经存盘以供视频解码器30稍后发射或检索。熵编码单元56还可熵编码正经译码的当前视频切片的运动向量及其它语法元素。After quantization, the entropy encoding unit may entropy encode syntax elements representing the quantized transform coefficients. For example, entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method or technique. After entropy encoding by entropy encoding unit 56, the encoded bitstream may be transmitted to video decoder 30 or saved for later transmission or retrieval by video decoder 30. Entropy encoding unit 56 may also entropy encode motion vectors and other syntax elements for the current video slice being coded.
反量化单元58及反变换处理单元60分别应用反量化及反变换以在像素域中重建构残余块,从而供稍后用作参考图片的参考块。运动补偿单元44可通过将残余块加到参考图片列表中的一者内的参考图片中的一者的预测性块来计算参考块。运动补偿单元44还可将一或多个内插滤波器应用到经重建构残余块,以计算子整数像素值来在运动估计中使用。求和器62将经重建构残余块加到由运动补偿单元44所产生的经运动补偿预测块以产生用于存储在参考图片存储器64中的参考块。参考块可由运动估计单元42及运动补偿单元44用作参考块以帧间预测后续视频帧或图片中的块。Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reference block for storage in reference picture memory 64. The reference block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or picture.
视频编码器20表示经配置以产生可使用本发明中所描述的文件格式技术存储的视频数据的视频译码器的实例。Video encoder 20 represents an example of a video coder configured to generate video data that may be stored using the file format techniques described in this disclosure.
图3为说明可实施本发明中所描述的技术的实例视频解码器30的框图。视频解码器30可经配置以解码单一视图、多视图、可调式、3D及其它类型的视频数据。在图3的实例中,视频解码器30包含熵解码单元80、预测处理单元81、反量化单元86、反变换处理单元88、求和器90、滤波器单元91,及参考图片存储器92。预测处理单元81包含运动补偿单元82及帧内预测处理单元84。在一些实例中,视频解码器30可执行大体上互逆于关于来自图2的视频编码器20所描述的编码遍次的解码遍次。FIG3 is a block diagram illustrating an example video decoder 30 that may implement the techniques described in this disclosure. Video decoder 30 may be configured to decode single-view, multi-view, scalable, 3D, and other types of video data. In the example of FIG3 , video decoder 30 includes an entropy decoding unit 80, a prediction processing unit 81, an inverse quantization unit 86, an inverse transform processing unit 88, a summer 90, a filter unit 91, and a reference picture memory 92. Prediction processing unit 81 includes a motion compensation unit 82 and an intra-prediction processing unit 84. In some examples, video decoder 30 may perform a decoding pass that is generally the inverse of the encoding pass described with respect to video encoder 20 from FIG2 .
经译码图片缓冲器(CPB)79可接收及存储位流的经编码视频数据(例如,NAL单元)。存储在CPB 79中的视频数据可(例如)从链路16、(例如)从诸如相机的本端视频源、经由视频数据的有线或无线网络通信或通过存取物理数据存储媒体而获得。CPB 79可形成存储来自经编码视频位流的经编码视频数据的视频数据存储器。CPB 79可为存储参考视频数据以供视频解码器30在解码视频数据时(例如,在帧内或帧间译码模式中)使用的参考图片存储器。CPB 79及参考图片存储器92可由多种存储器装置中的任一者形成,诸如动态随机存取存储器(DRAM)(包含同步DRAM(SDRAM))、磁阻式RAM(MRAM)、电阻式RAM(RRAM)或其它类型的存储器装置。CPB 79及参考图片存储器92可由同一存储器装置或单独存储器装置提供。在各种实例中,CPB 79可与视频解码器30的其它组件一起在芯片上,或相对于那些组件在芯片外。Coded picture buffer (CPB) 79 may receive and store encoded video data (e.g., NAL units) of a bitstream. The video data stored in CPB 79 may be obtained, for example, from link 16, from a local video source such as a camera, via a wired or wireless network for communicating video data, or by accessing physical data storage media. CPB 79 may form a video data memory that stores encoded video data from an encoded video bitstream. CPB 79 may be a reference picture memory that stores reference video data for use by video decoder 30 when decoding the video data (e.g., in intra- or inter-coding modes). CPB 79 and reference picture memory 92 may be formed from any of a variety of memory devices, such as dynamic random access memory (DRAM) (including synchronous DRAM (SDRAM)), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. CPB 79 and reference picture memory 92 may be provided by the same memory device or separate memory devices. In various examples, CPB 79 may be on-chip with other components of video decoder 30 , or off-chip relative to those components.
在解码程序期间,视频解码器30从视频编码器20接收表示经编码视频切片的视频块及相关联语法元素的经编码视频位流。视频解码器30可从网络实体29接收经编码视频位流。网络实体29可(例如)为服务器、MANE、视频编辑器/拼接器或经配置以实施上文所描述的技术中的一或多者的其它此类装置。网络实体29可包含或可不包含视频编码器,诸如视频编码器20。本发明中所描述的技术中的一些可由网络实体29在网络实体29将经编码视频位流发射到视频解码器30之前实施。在一些视频解码系统中,网络实体29及视频解码器30可为单独装置的部分,而在其它情况下,关于网络实体29描述的功能性可由包括视频解码器30的同一装置执行。可将网络实体29视为视频装置。此外,在一些实例中,网络实体29为图1的文件产生装置34。During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks and associated syntax elements of an encoded video slice from video encoder 20. Video decoder 30 may receive the encoded video bitstream from network entity 29. Network entity 29 may, for example, be a server, a MANE, a video editor/splicer, or other such device configured to implement one or more of the techniques described above. Network entity 29 may or may not include a video encoder, such as video encoder 20. Some of the techniques described in this disclosure may be implemented by network entity 29 before network entity 29 transmits the encoded video bitstream to video decoder 30. In some video decoding systems, network entity 29 and video decoder 30 may be parts of separate devices, while in other cases, the functionality described with respect to network entity 29 may be performed by the same device that includes video decoder 30. Network entity 29 may be considered a video device. Furthermore, in some examples, network entity 29 is file generation device 34 of FIG1 .
视频解码器30的熵解码单元80熵解码位流的特定语法元素以产生经量化系数、运动向量及其它语法元素。熵解码单元80将运动向量及其它语法元素转递到预测处理单元81。视频解码器30可在视频切片层级及/或视频块层级接收语法元素。Entropy decoding unit 80 of video decoder 30 entropy decodes specific syntax elements of the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction processing unit 81. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.
当视频切片经译码为经帧内译码(I)切片时,预测处理单元81的帧内预测处理单元84可基于来自当前帧或图片的先前经解码块的经用信号发送的帧内预测模式及数据而产生用于当前视频切片的视频块的预测数据。当视频帧经译码为经帧间译码(即,B、P或GPB)切片时,预测处理单元81的运动补偿单元82基于从熵解码单元80接收的运动向量及其它语法元素而产生当前视频切片的视频块的预测性块。预测性块可从参考图片列表中的一者内的参考图片中的一者产生。视频解码器30可基于存储在参考图片存储器92中的参考图片使用默认建构技术来建构参考帧列表:列表0及列表1。When the video slice is coded as an intra-coded (I) slice, intra-prediction processing unit 84 of prediction processing unit 81 may generate prediction data for the video block of the current video slice based on the signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B, P, or GPB) slice, motion compensation unit 82 of prediction processing unit 81 generates predictive blocks for the video block of the current video slice based on motion vectors and other syntax elements received from entropy decoding unit 80. The predictive blocks may be generated from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct reference frame lists: List 0 and List 1 using a default construction technique based on the reference pictures stored in reference picture memory 92.
运动补偿单元82通过剖析运动向量及其它语法元素来确定用于当前视频切片的视频块的预测信息,且使用所述预测信息以产生正经解码的当前视频块的预测性块。举例来说,运动补偿单元82使用所接收的语法元素中的一些以确定用以译码视频切片的视频块的预测模式(例如,帧内预测或帧间预测)、帧间预测切片类型(例如,B切片、P切片或GPB切片)、切片的参考图片列表中的一或多者的建构信息、切片的每一经帧间编码视频块的运动向量、切片的每一经帧间译码视频块的帧间预测状态及用以解码当前视频切片中的视频块的其它信息。Motion compensation unit 82 determines prediction information for a video block of the current video slice by parsing motion vectors and other syntax elements, and uses the prediction information to produce a predictive block for the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine a prediction mode (e.g., intra-prediction or inter-prediction) used to code the video block of the video slice, an inter-prediction slice type (e.g., a B slice, a P slice, or a GPB slice), construction information for one or more of the slice's reference picture lists, a motion vector for each inter-coded video block of the slice, an inter-prediction state for each inter-coded video block of the slice, and other information used to decode the video blocks in the current video slice.
运动补偿单元82还可执行基于内插滤波器的内插。运动补偿单元82可使用如由视频编码器20在视频块的编码期间所使用的内插滤波器,以计算参考块的子整数像素的内插值。在此状况下,运动补偿单元82可从所接收语法元素确定由视频编码器20所使用的内插滤波器,且可使用所述内插滤波器以产生预测性块。Motion compensation unit 82 may also perform interpolation based on interpolation filters. Motion compensation unit 82 may use interpolation filters as used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 82 may determine the interpolation filters used by video encoder 20 from the received syntax elements and may use the interpolation filters to produce the predictive blocks.
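The role of the interpolation filter can be illustrated with a toy 2-tap bilinear filter for half-pel positions. HEVC's actual filters are longer (7- and 8-tap); the point here is only that the decoder must apply the same filter the encoder used, or the predictive blocks will not match:

```python
# Toy sub-pixel interpolation: a 2-tap bilinear filter producing values
# at half-pel horizontal positions between neighbouring integer pixels.
# Real codecs use longer separable filters; this is illustration only.

def half_pel_row(row):
    # value midway between each pair of neighbouring integer pixels,
    # with +1 for rounding before the integer divide
    return [(row[i] + row[i + 1] + 1) // 2 for i in range(len(row) - 1)]

print(half_pel_row([10, 20, 40]))  # [15, 30]
```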
反量化单元86反量化(即,解量化)位流中所提供且由熵解码单元80解码的经量化变换系数。反量化程序可包含使用由视频编码器20针对视频切片中的每一视频块计算的量化参数,以确定量化程度及(同样地)应该应用的反量化程度。反变换处理单元88将反变换(例如,反DCT、反整数变换或概念上类似的反变换程序)应用于变换系数,以便在像素域中产生残余块。Inverse quantization unit 86 inverse quantizes (i.e., dequantizes) the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 80. The inverse quantization process may include using quantization parameters calculated by video encoder 20 for each video block in the video slice to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied. Inverse transform processing unit 88 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients to produce residual blocks in the pixel domain.
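Inverse quantization can be sketched as scaling the quantized levels back up by the step size derived from the same QP the encoder signalled. The QP-to-step mapping below follows the H.264/HEVC convention for illustration; note that reconstruction is generally lossy, so the scaled values only approximate the original coefficients:

```python
# Illustrative inverse quantisation (dequantisation): multiply each
# quantised level by the step size for the signalled QP.

def dequantize(levels, qp):
    step = 2 ** ((qp - 4) / 6.0)  # same QP-to-step mapping the encoder used
    return [[l * step for l in row] for row in levels]

print(dequantize([[4, -3], [2, 0]], 10))  # [[8.0, -6.0], [4.0, 0.0]]
```

The inverse transform is then applied to these reconstructed coefficients to produce the residual block in the pixel domain.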
在运动补偿单元82基于运动向量及其它语法元素而产生当前视频块的预测性块之后,视频解码器30通过将来自反变换处理单元88的残余块与由运动补偿单元82所产生的对应预测性块求和而形成经解码视频块。求和器90表示执行此求和运算的一或多个组件。如果需要,还可使用回路滤波器(在译码回路中或在译码回路后)以使像素转变平滑,或以其它方式改进视频质量。滤波器单元91旨在表示一或多个回路滤波器(诸如,解块滤波器、自适应性回路滤波器(ALF)及样本自适应性偏移(SAO)滤波器)。尽管滤波器单元91在图3中展示为回路滤波器,但在其它配置中,滤波器单元91可实施为回路后滤波器。接着将给定帧或图片中的经解码视频块存储在参考图片存储器92中,所述参考图片存储器存储用于后续运动补偿的参考图片。参考图片存储器92还存储用于稍后在显示装置(诸如,图1的显示装置32)上呈现的经解码视频。After motion compensation unit 82 generates a predictive block for the current video block based on the motion vector and other syntax elements, video decoder 30 forms a decoded video block by summing the residual block from inverse transform processing unit 88 with the corresponding predictive block generated by motion compensation unit 82. Summer 90 represents one or more components that perform this summing operation. If desired, loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions or otherwise improve video quality. Filter unit 91 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although filter unit 91 is shown in FIG3 as a loop filter, in other configurations, filter unit 91 may be implemented as a post-loop filter. The decoded video blocks in a given frame or picture are then stored in reference picture memory 92, which stores reference pictures used for subsequent motion compensation. Reference picture memory 92 also stores decoded video for later presentation on a display device, such as display device 32 of FIG1 .
图3的视频解码器30表示经配置以解码可使用本发明中所描述的文件格式技术存储的视频数据的视频解码器的实例。Video decoder 30 of FIG. 3 represents an example of a video decoder configured to decode video data that may be stored using the file format techniques described in this disclosure.
图4为说明形成网络100的部分的实例装置集合的框图。在此实例中,网络100包含路由装置104A、104B(路由装置104)及转码装置106。路由装置104及转码装置106旨在表示可形成网络100的部分的少数装置。诸如交换器、集线器、网关、防火墙、网桥及其它此类装置的其它网络装置还可包含在网络100内。此外,可沿着服务器装置102与客户端装置108之间的网络路径提供额外网络装置。在一些实例中,服务器装置102可对应于源装置12(图1),而客户端装置108可对应于目的地装置14(图1)。FIG4 is a block diagram illustrating an example set of devices that form part of network 100. In this example, network 100 includes routing devices 104A, 104B (routing devices 104) and transcoding device 106. Routing devices 104 and transcoding device 106 are intended to represent a small number of the devices that may form part of network 100. Other network devices, such as switches, hubs, gateways, firewalls, bridges, and other such devices, may also be included within network 100. Furthermore, additional network devices may be provided along the network path between server device 102 and client device 108. In some examples, server device 102 may correspond to source device 12 (FIG. 1), while client device 108 may correspond to destination device 14 (FIG. 1).
一般来说,路由装置104实施一或多个路由协议以经由网络100交换网络数据。在一些实例中,路由装置104可经配置以执行代理或快取操作。因此,在一些实例中,路由装置104可被称作代理装置。一般来说,路由装置104执行路由协议以发现经由网络100的路线。通过执行此类路由协议,路由装置104B可发现从本身经由路由装置104A到服务器装置102的网络路线。Generally speaking, routing device 104 implements one or more routing protocols to exchange network data via network 100. In some instances, routing device 104 may be configured to perform proxy or caching operations. Therefore, in some instances, routing device 104 may be referred to as a proxy device. Generally speaking, routing device 104 executes routing protocols to discover routes through network 100. By executing such routing protocols, routing device 104B may discover a network route from itself via routing device 104A to server device 102.
本发明的技术可由诸如路由装置104及转码装置106的网络装置实施,但也可由客户端装置108实施。以此方式,路由装置104、转码装置106及客户端装置108表示经配置以执行本发明的技术的装置的实例。此外,图1的装置以及图2中所说明的编码器20及图3中所说明的解码器30也为可经配置以执行本发明的技术中的一或多者的装置的实例。The techniques of this disclosure may be implemented by network devices such as routing device 104 and transcoding device 106, but may also be implemented by client device 108. In this manner, routing device 104, transcoding device 106, and client device 108 represent examples of devices configured to perform the techniques of this disclosure. Furthermore, the devices of FIG. 1 , as well as encoder 20 illustrated in FIG. 2 and decoder 30 illustrated in FIG. 3 , are also examples of devices that may be configured to perform one or more of the techniques of this disclosure.
图5A为说明根据本发明的一或多种技术的文件300的实例结构的概念图。在图5A的实例中,文件300包含一电影方框302及多个媒体数据方框304。尽管在图5A的实例中说明为在同一文件中,但在其它实例中,电影方框302及媒体数据方框304可在单独文件中。如上文所指示,方框可为由唯一类型识别符及长度定义的面向对象式建构块。举例来说,方框可为ISOBMFF中的基本语法结构,包含四字符译码方框类型、方框的字节计数及有效负载。FIG5A is a conceptual diagram illustrating an example structure of a file 300 in accordance with one or more techniques of this disclosure. In the example of FIG5A , file 300 includes a movie box 302 and a plurality of media data boxes 304. Although illustrated in the example of FIG5A as being in the same file, in other examples, movie box 302 and media data boxes 304 may be in separate files. As indicated above, a box may be an object-oriented building block defined by a unique type identifier and length. For example, a box may be a basic syntactic structure in ISOBMFF, including a four-character encoding of the box type, the box's byte count, and the payload.
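The box layout described above, a 32-bit size, a four-character type, and a payload, can be walked with a few lines of code. This sketch omits the 64-bit "largesize" (size == 1) and the "extends to end of file" (size == 0) cases of the full ISOBMFF syntax:

```python
import struct

# Minimal sketch of iterating top-level ISOBMFF boxes: each box begins
# with a 32-bit big-endian size (covering the 8-byte header) and a
# four-character type, followed by the payload.

def iter_boxes(data):
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        yield box_type.decode("ascii"), data[pos + 8 : pos + size]
        pos += size

# two toy boxes: 'moov' with an empty payload, 'mdat' with 4 payload bytes
blob = (struct.pack(">I4s", 8, b"moov")
        + struct.pack(">I4s", 12, b"mdat") + b"\x00\x01\x02\x03")
print([(t, len(p)) for t, p in iter_boxes(blob)])  # [('moov', 0), ('mdat', 4)]
```

Container boxes such as 'moov' hold further boxes in their payload, so the same walk can be applied recursively to reach the track, media, and sample table boxes described below.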
电影方框302可含有用于文件300的播放轨的元数据。文件300的每一播放轨可包括媒体数据的连续串流。媒体数据方框304中的每一者可包含一或多个样本305。样本305中的每一者可包括音频或视频存取单元。如在本发明中别处所描述,在多视图译码(例如,MV-HEVC及3D-HEVC)及可调式视频译码(例如,SHVC)中,每一存取单元可包括多个经译码图片。举例来说,存取单元可针对每一层包含一或多个经译码图片。Movie block 302 may contain metadata for a track of file 300. Each track of file 300 may comprise a continuous stream of media data. Each of media data blocks 304 may include one or more samples 305. Each of samples 305 may comprise an audio or video access unit. As described elsewhere in this disclosure, in multi-view coding (e.g., MV-HEVC and 3D-HEVC) and scalable video coding (e.g., SHVC), each access unit may comprise multiple coded pictures. For example, an access unit may include one or more coded pictures for each layer.
此外,在图5A的实例中,电影方框302包含播放轨方框306。播放轨方框306可围封用于文件300的播放轨的元数据。在其它实例中,电影方框302可包含用于文件300的不同播放轨的多个播放轨方框。播放轨方框306包含媒体方框307。媒体方框307可含有宣告关于播放轨内的媒体数据的信息的所有对象。媒体方框307包含媒体信息方框308。媒体信息方框308可含有宣告播放轨的媒体的特性信息的所有对象。媒体信息方框308包含样本表方框309。样本表方框309可指定样本特定元数据。Furthermore, in the example of FIG5A , movie box 302 includes track box 306. Track box 306 may enclose metadata for a track of file 300. In other examples, movie box 302 may include multiple track boxes for different tracks of file 300. Track box 306 includes media box 307. Media box 307 may contain all objects that declare information about the media data within the track. Media box 307 includes media information box 308. Media information box 308 may contain all objects that declare information about the characteristics of the media of the track. Media information box 308 includes sample table box 309. Sample table box 309 may specify sample-specific metadata.
In the example of FIG. 5A, sample table box 309 includes a SampleToGroup box 310 and a SampleGroupDescription box 312, and SampleGroupDescription box 312 includes an oinf box 316. In other examples, sample table box 309 may include other boxes in addition to SampleToGroup box 310 and SampleGroupDescription box 312, and/or may include multiple SampleToGroup boxes and SampleGroupDescription boxes. SampleToGroup box 310 may map samples (e.g., particular ones of samples 305) to a group of samples. SampleGroupDescription box 312 may specify a property shared by the samples in the group of samples (i.e., the sample group). Furthermore, sample table box 309 may include a plurality of sample entry boxes 311. Each of sample entry boxes 311 may correspond to a sample in the group of samples. In some examples, sample entry boxes 311 are instances of a random accessible sample entry class that extends a base sample group description class.
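As a sketch of how a SampleToGroup box associates samples with a sample group, the run-length pairs of (sample_count, group_description_index) it carries can be expanded into a per-sample mapping; the helper and the example entries below are hypothetical:

```python
def expand_sample_to_group(entries):
    # Each SampleToGroup ('sbgp') entry is a run-length pair
    # (sample_count, group_description_index); expanding the runs gives
    # the group index of every sample in decoding order.
    mapping = []
    for sample_count, group_index in entries:
        mapping.extend([group_index] * sample_count)
    return mapping

# Hypothetical track: the first 3 samples map to group description 1,
# the next 2 samples belong to no group (index 0).
print(expand_sample_to_group([(3, 1), (2, 0)]))  # → [1, 1, 1, 0, 0]
```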
In accordance with one or more techniques of this disclosure, SampleGroupDescription box 312 may specify that each sample of the sample group contains at least one IRAP picture. In this way, file generation device 34 may generate a file that includes track box 306 containing metadata for a track in file 300. The media data for the track includes a sequence of samples 305. Each of the samples may be a video access unit of multi-layer video data (e.g., SHVC, MV-HEVC, or 3D-HEVC video data). Furthermore, as part of generating file 300, file generation device 34 may generate, in file 300, an additional box (i.e., sample table box 309) that lists all of samples 305 containing at least one IRAP picture. In other words, the additional box identifies all of samples 305 containing at least one IRAP picture. In the example of FIG. 5A, the additional box defines a sample group that lists (e.g., identifies) all of samples 305 containing at least one IRAP picture. In other words, the additional box specifies that samples 305 containing at least one IRAP picture belong to the sample group.
In accordance with the techniques of this disclosure, SampleGroupDescription box 312 may include oinf box 316. The oinf box may store representation format information for each operation point of the video data. The representation format information may include one or more of a spatial resolution, a bit depth, or a color format. Additionally, the oinf box may store a layer count indicating the number of necessary layers of an operation point of the video data. The oinf box may additionally store bit rate information for each operation point of the video data. Hence, owing to the bit rate information being signaled in the oinf box, there may be no need to signal a bit rate box after the configuration box.
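A minimal sketch of the per-operation-point information the oinf box is described as storing (representation format, layer count, and bit rate) might look as follows; the record type, field names, and sample values are hypothetical, not the box's actual syntax:

```python
from dataclasses import dataclass

@dataclass
class OperationPointInfo:
    # Hypothetical per-operation-point record mirroring what the text
    # says the oinf box stores: representation format (resolution, bit
    # depth, color format), a necessary-layer count, and bit rates.
    width: int
    height: int
    bit_depth: int
    chroma_format_idc: int   # 1 corresponds to 4:2:0 in HEVC
    layer_count: int         # number of necessary layers
    max_bit_rate: int        # bits per second
    avg_bit_rate: int

base_plus_enhancement = OperationPointInfo(1920, 1080, 8, 1, 2,
                                           6_000_000, 4_500_000)
print(base_plus_enhancement.layer_count)  # → 2
```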
Additionally, there may be no need to store profile, tier, and level (PTL) information, representation format information, and frame rate information in a decoder configuration record of the file format. All other information in the decoder configuration record may be associated with all layers of the video data in a track. The decoder configuration record for each layer of the video data may store representation format information and frame rate information. The decoder configuration record may store parallelism information for each layer of the video data. A file typically includes only one decoder configuration record for a track, but a track may contain one or more layers and one or more operation points. PTL information, representation format information, and frame rate information may be associated with each layer or each operation point (OP). Therefore, unlike the HEVC file format, which supports only one layer, the decoder configuration record may not be able to properly facilitate this association for the LHEVC file format, which supports multiple layers.
The decoder configuration record may not store an operation point index in the decoder configuration record, where the operation point index refers to an index of an operation point documented in the operation point information box. Storing the operation point index in the decoder configuration record may cause a device playing the track associated with the decoder configuration record to play the operation point referenced by that operation point index. However, more operation points may be available. Removing the operation point index may better enable a playback device to identify all operation points supported by the file. The decoder configuration record may store a list of operation point indices associated with the track of the video data. The decoder configuration record may, for example, be derived from information in sample entry boxes 311 of FIG. 5A.
The decoder configuration record stores information such as the size of the length field used in each sample to indicate the length of the NAL units it contains, as well as parameter sets, if stored in the sample entry. The decoder configuration record may, for example, be externally framed (e.g., its size must be supplied by the structure that contains it). The decoder configuration record may also contain a version field to identify the version of the specification being followed, where an incompatible change to the record is indicated by a change of the version number. In contrast, compatible extensions to this record may not require a change of the configuration version code. The decoder configuration record may also include the values of several HEVC syntax elements, such as general_profile_space, general_tier_flag, general_profile_idc, general_profile_compatibility_flags, general_constraint_indicator_flags, general_level_idc, min_spatial_segmentation_idc, chroma_format_idc, bit_depth_luma_minus8, and bit_depth_chroma_minus8, which are defined in HEVC. The decoder configuration record may contain general information associated with the track containing the configuration record, such as the number of temporal sub-layers, segmentation information, the supported parallelism types, and parameter set NAL units (e.g., VPS, SPS, PPS, SEI, etc.).
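As an informal sketch of how the first of the HEVC syntax elements listed above are packed, the leading bytes of an HEVC decoder configuration record hold configurationVersion followed by general_profile_space (2 bits), general_tier_flag (1 bit), and general_profile_idc (5 bits) in a single byte; the parser below covers only this prefix, and the sample bytes are hypothetical:

```python
def parse_config_record_prefix(data):
    # First two bytes of an HEVC decoder configuration record:
    # configurationVersion, then general_profile_space (2 bits),
    # general_tier_flag (1 bit), and general_profile_idc (5 bits)
    # packed into one byte (field widths per ISO/IEC 14496-15).
    second = data[1]
    return {
        "configurationVersion": data[0],
        "general_profile_space": second >> 6,
        "general_tier_flag": (second >> 5) & 0x1,
        "general_profile_idc": second & 0x1F,
    }

# Hypothetical record: version 1, profile space 0, Main tier (0),
# general_profile_idc 1 (Main profile).
print(parse_config_record_prefix(bytes([1, 0x01])))
```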
Furthermore, in accordance with one or more techniques of this disclosure, each of sample entry boxes 311 may include a value (e.g., all_pics_are_IRAP) indicating whether all coded pictures in the corresponding sample are IRAP pictures. In some examples, the value being equal to 1 specifies that all coded pictures in the sample are IRAP pictures. The value being equal to 0 specifies that it is not required that every coded picture in each sample of the sample group is an IRAP picture.
In some examples, when not all coded pictures in a particular sample are IRAP pictures, file generation device 34 may include, in one of sample entry boxes 311 for the particular sample, a value indicating the number of IRAP pictures in the particular sample (e.g., num_IRAP_pics). Additionally, file generation device 34 may include, in the sample entry for the particular sample, values indicating the layer identifiers of the IRAP pictures in the particular sample. File generation device 34 may also include, in the sample entry for the particular sample, a value indicating the NAL unit type of the VCL NAL units in the IRAP pictures of the particular sample.
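The identification of IRAP-containing samples described above can be sketched as a scan over the NAL unit types of each sample; in HEVC, VCL NAL unit types 16 through 23 are the IRAP types. The helper and the three-sample example are hypothetical:

```python
def is_irap_nal(nal_unit_type):
    # In HEVC, VCL NAL unit types 16..23 (BLA_W_LP through
    # RSV_IRAP_VCL23) are the IRAP picture types.
    return 16 <= nal_unit_type <= 23

def samples_containing_irap(sample_nal_types):
    # Return the indices of samples (access units) that contain at
    # least one IRAP picture -- the samples such a sample group would
    # identify.
    return [i for i, nal_types in enumerate(sample_nal_types)
            if any(is_irap_nal(t) for t in nal_types)]

# Hypothetical three-sample track: an IDR picture (type 19), a trailing
# picture (type 1), and a CRA picture (type 21) plus a trailing picture.
print(samples_containing_irap([[19], [1], [21, 1]]))  # → [0, 2]
```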
Furthermore, in the example of FIG. 5A, sample table box 309 includes a subsample information box 314. Although the example of FIG. 5A shows only one subsample information box, sample table box 309 may include multiple subsample information boxes. In general, a subsample information box is designed to contain subsample information. A subsample is a contiguous range of bytes of a sample. ISO/IEC 14496-12 indicates that the specific definition of a subsample shall be supplied for a given coding system, such as H.264/AVC or HEVC.
Section 8.4.8 of ISO/IEC 14496-15 specifies a definition of a subsample for HEVC. In particular, Section 8.4.8 of ISO/IEC 14496-15 specifies that, for the use of the subsample information box (8.7.7 of ISO/IEC 14496-12) in an HEVC stream, a subsample is defined on the basis of the value of the flags field of the subsample information box. In accordance with one or more techniques of this disclosure, if the flags field in subsample information box 314 is equal to 5, a subsample corresponding to subsample information box 314 contains one coded picture and the associated non-VCL NAL units. The associated non-VCL NAL units may include NAL units containing SEI messages applicable to the coded picture and NAL units containing parameter sets (e.g., VPS, SPS, PPS, etc.) applicable to the coded picture.
Thus, in one example, file generation device 34 may generate a file (e.g., file 300) that includes a track box (e.g., track box 306) containing metadata for a track in the file. In this example, the media data for the track includes a sequence of samples, each of which is a video access unit of multi-layer video data (e.g., SHVC, MV-HEVC, or 3D-HEVC video data). Furthermore, in this example, as part of file generation device 34 generating the file, file generation device 34 may generate, in the file, a subsample information box (e.g., subsample information box 314) that contains flags specifying the type of subsample information given in the subsample information box. When the flags have a particular value, a subsample corresponding to the subsample information box contains exactly one coded picture and zero or more non-VCL NAL units associated with the coded picture.
Furthermore, in accordance with one or more techniques of this disclosure, if the flags field of subsample information box 314 is equal to 0, subsample information box 314 further includes a DiscardableFlag value, a NoInterLayerPredFlag value, a LayerId value, and a TempId value. If the flags field of subsample information box 314 is equal to 5, subsample information box 314 may include a DiscardableFlag value, a VclNalUnitType value, a LayerId value, a TempId value, a NoInterLayerPredFlag value, a SubLayerRefNalUnitFlag value, and a reserved value.
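The dependence of the subsample information box's field list on its flags value, as described above, can be summarized with a small hypothetical helper (only the field names from the text are modeled, not the bit layout):

```python
def subsample_fields(flags):
    # Hypothetical helper listing which per-subsample values the text
    # above assigns to each value of the subsample information box's
    # flags field.
    if flags == 0:
        return ["DiscardableFlag", "NoInterLayerPredFlag",
                "LayerId", "TempId"]
    if flags == 5:  # one coded picture per subsample
        return ["DiscardableFlag", "VclNalUnitType", "LayerId",
                "TempId", "NoInterLayerPredFlag",
                "SubLayerRefNalUnitFlag", "reserved"]
    raise ValueError("flags value not covered by this sketch")

print("VclNalUnitType" in subsample_fields(5))  # → True
```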
SubLayerRefNalUnitFlag equal to 0 indicates that all NAL units in the subsample are VCL NAL units of a sub-layer non-reference picture as specified in ISO/IEC 23008-2 (i.e., HEVC). SubLayerRefNalUnitFlag equal to 1 indicates that all NAL units in the subsample are VCL NAL units of a sub-layer reference picture as specified in ISO/IEC 23008-2 (i.e., HEVC). Thus, when file generation device 34 generates subsample information box 314 and the flags have the particular value (e.g., 5), file generation device 34 includes, in subsample information box 314, an additional flag indicating whether all NAL units in the subsample are VCL NAL units of a sub-layer non-reference picture.
The DiscardableFlag value indicates the value of the discardable_flag values of the VCL NAL units in the subsample. As specified in Section A.4 of ISO/IEC 14496-15, the discardable_flag value shall be set to 1 if and only if all the extracted or aggregated NAL units have the discardable_flag set to 1, and set to 0 otherwise. A NAL unit may have discardable_flag set to 1 if the bitstream containing the NAL unit can be correctly decoded without the NAL unit. Thus, a NAL unit may be "discardable" if the bitstream containing the NAL unit can be correctly decoded without the NAL unit. All VCL NAL units in the subsample shall have the same discardable_flag value. Thus, when file generation device 34 generates subsample information box 314 and the flags have the particular value (e.g., 5), file generation device 34 includes, in subsample information box 314, an additional flag (e.g., discardable_flag) indicating whether all of the VCL NAL units of the subsample are discardable.
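The aggregation rule quoted above (DiscardableFlag is 1 if and only if every extracted or aggregated NAL unit has discardable_flag equal to 1) can be sketched directly:

```python
def aggregate_discardable(nal_discardable_flags):
    # The subsample's DiscardableFlag is 1 if and only if every
    # aggregated NAL unit has discardable_flag equal to 1; otherwise 0.
    return 1 if all(nal_discardable_flags) else 0

print(aggregate_discardable([1, 1, 1]))  # → 1
print(aggregate_discardable([1, 0, 1]))  # → 0
```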
The NoInterLayerPredFlag value indicates the value of the inter_layer_pred_enabled_flag of the VCL NAL units in the subsample. The inter_layer_pred_enabled_flag shall be set to 1 if and only if all the extracted or aggregated VCL NAL units have the inter_layer_pred_enabled_flag set to 1, and set to 0 otherwise. All VCL NAL units in the subsample shall have the same inter_layer_pred_enabled_flag value. Thus, when file generation device 34 generates subsample information box 314 and the flags have the particular value (e.g., 5), file generation device 34 includes, in subsample information box 314, an additional value (e.g., inter_layer_pred_enabled_flag) indicating whether inter-layer prediction is enabled for all VCL NAL units of the subsample.
LayerId indicates the nuh_layer_id value of the NAL units in the subsample. All NAL units in the subsample shall have the same nuh_layer_id value. Thus, when file generation device 34 generates subsample information box 314 and the flags have the particular value (e.g., 5), file generation device 34 includes, in subsample information box 314, an additional value (e.g., LayerId) indicating the layer identifier of each NAL unit of the subsample.
TempId indicates the TemporalId value of the NAL units in the subsample. All NAL units in the subsample shall have the same TemporalId value. Thus, when file generation device 34 generates subsample information box 314 and the flags have the particular value (e.g., 5), file generation device 34 includes, in subsample information box 314, an additional value (e.g., TempId) indicating the temporal identifier of each NAL unit of the subsample.
VclNalUnitType indicates the nal_unit_type syntax element of the VCL NAL units in the subsample. The nal_unit_type syntax element is a syntax element in the NAL unit header of a NAL unit. The nal_unit_type syntax element specifies the type of RBSP contained in the NAL unit. All VCL NAL units in the subsample shall have the same nal_unit_type value. Thus, when file generation device 34 generates subsample information box 314 and the flags have the particular value (e.g., 5), file generation device 34 includes, in subsample information box 314, an additional value (e.g., VclNalUnitType) indicating the NAL unit type of the VCL NAL units of the subsample. All VCL NAL units of the subsample have the same NAL unit type.
FIG. 5B is a conceptual diagram illustrating an alternative example structure of file 300 in accordance with one or more techniques of this disclosure. In the example of FIG. 5B, oinf box 316 is included in media information box 308 as a box separate from sample table box 309, rather than being included in sample group description box 312 as shown in FIG. 5A. The contents and functions of the various boxes in FIG. 5B may otherwise be the same as described with respect to FIG. 5A.
FIG. 6 is a conceptual diagram illustrating an example structure of a file 400 in accordance with one or more techniques of this disclosure. As specified in Section 8.4.9 of ISO/IEC 14496-15, HEVC allows file format samples that are used only for reference and are not output. For example, HEVC allows non-displayed reference pictures in video.
Furthermore, Section 8.4.9 of ISO/IEC 14496-15 specifies that when any such non-output sample is present in a track, the file shall be constrained as follows.
1. A non-output sample shall be given a composition time outside the time range of the samples that are output.
2. An edit list shall be used to exclude the composition times of the non-output samples.
3. When the track includes a CompositionOffsetBox ('ctts'),
a. version 1 of the CompositionOffsetBox shall be used,
b. the value of sample_offset shall be set equal to -2^31 for each non-output sample,
c. the CompositionToDecodeBox ('cslg') shall be contained in the SampleTableBox ('stbl') of the track, and
d. when the CompositionToDecodeBox is present for the track, the value of the leastDecodeToDisplayDelta field in the box shall be equal to the smallest composition offset in the CompositionOffsetBox excluding the sample_offset values of the non-output samples.
NOTE: Hence, leastDecodeToDisplayDelta is greater than -2^31.
As specified in ISO/IEC 14496-12, the CompositionOffsetBox provides the offset between decoding time and composition time. The CompositionOffsetBox includes a set of sample_offset values. Each of the sample_offset values is a non-negative integer that gives the offset between the composition time and the decoding time. The composition time refers to the time at which a sample is to be output. The decoding time refers to the time at which a sample is to be decoded.
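The relationship the CompositionOffsetBox expresses can be sketched as follows: each sample's composition time is its decoding time plus its sample_offset. The decode times and offsets below are hypothetical:

```python
def composition_times(decode_times, sample_offsets):
    # Per the CompositionOffsetBox, each sample's composition (output)
    # time is its decoding time plus its sample_offset.
    return [dt + off for dt, off in zip(decode_times, sample_offsets)]

# Hypothetical reordered stream (I, P, B in decoding order): the B
# picture is displayed between the I and P pictures.
print(composition_times([0, 1, 2], [1, 2, 0]))  # → [1, 3, 2]
```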
As indicated above, a coded slice NAL unit may include a slice segment header. The slice segment header may be a part of a coded slice segment and may contain data elements pertaining to the first or all CTUs in the slice segment. In HEVC, the slice segment header includes a pic_output_flag syntax element. In general, the pic_output_flag syntax element is included in the first slice segment header of a slice of a picture. Hence, this disclosure may refer to the pic_output_flag of the first slice segment header of a slice of a picture as the pic_output_flag of the picture.
As specified in Section 7.4.7.1 of the HEVC WD, the pic_output_flag syntax element affects the decoded picture output and removal processes as specified in Annex C of the HEVC WD. In general, if the pic_output_flag syntax element of a slice segment header for a slice segment is 1, a picture that includes the slice corresponding to the slice segment header is output. Otherwise, if the pic_output_flag syntax element of the slice segment header for the slice segment is 0, the picture that includes the slice corresponding to the slice segment header may be decoded for use as a reference picture, but is not output.
In accordance with one or more techniques of this disclosure, references to HEVC in Section 8.4.9 of ISO/IEC 14496-15 may be replaced with corresponding references to SHVC, MV-HEVC, or 3D-HEVC. Furthermore, in accordance with one or more techniques of this disclosure, when an access unit contains some coded pictures that have pic_output_flag equal to 1 and some other coded pictures that have pic_output_flag equal to 0, at least two tracks must be used to store the stream. For each respective one of the tracks, all coded pictures in each sample of the respective track have the same pic_output_flag value. Thus, all coded pictures in a first one of the tracks have pic_output_flag equal to 0, and all coded pictures in a second one of the tracks have pic_output_flag equal to 1.
Thus, in the example of FIG. 6, file generation device 34 may generate a file 400. Similar to file 300 in the example of FIG. 5A, file 400 includes a movie box 402 and one or more media data boxes 404. Each of media data boxes 404 may correspond to a different track of file 400. Movie box 402 may contain metadata for tracks of file 400. Each track of file 400 may comprise a continuous stream of media data. Each of media data boxes 404 may include one or more samples 405. Each of samples 405 may comprise an audio or video access unit.
As indicated above, in some examples, when an access unit contains some coded pictures that have pic_output_flag equal to 1 and some other coded pictures that have pic_output_flag equal to 0, at least two tracks must be used to store the stream. Thus, in the example of FIG. 6, movie box 402 includes a track box 406 and a track box 408. Each of track boxes 406 and 408 encloses metadata for a different track of file 400. For instance, track box 406 may enclose metadata for a track that has coded pictures with pic_output_flag equal to 0 and no pictures with pic_output_flag equal to 1. Track box 408 may enclose metadata for a track that has coded pictures with pic_output_flag equal to 1 and no pictures with pic_output_flag equal to 0.
Thus, in one example, file generation device 34 may generate a file (e.g., file 400) that includes a media data box (e.g., media data box 404) that encloses (e.g., includes) media content. The media content includes a sequence of samples (e.g., samples 405). Each of the samples may be an access unit of multi-layer video data. In this example, when file generation device 34 generates the file in response to a determination that at least one access unit of the bitstream includes a coded picture that has a picture output flag equal to 1 and a coded picture that has a picture output flag equal to 0, file generation device 34 may use at least two tracks to store the bitstream in the file. For each respective track from the at least two tracks, all coded pictures in each sample of the respective track have the same picture output flag value. Pictures that have the picture output flag equal to 1 are allowed to be output, and pictures that have the picture output flag equal to 0 are allowed to be used as reference pictures but are not allowed to be output.
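The track-splitting rule described above can be sketched as a partition of each access unit's coded pictures by pic_output_flag; representing pictures as (layer_id, pic_output_flag) pairs is a hypothetical simplification:

```python
def split_by_pic_output_flag(access_units):
    # Partition each access unit's coded pictures into two tracks so
    # that every sample of a track carries a single pic_output_flag
    # value, as required when a stream mixes output and non-output
    # pictures. Pictures are (layer_id, pic_output_flag) pairs here.
    output_track, non_output_track = [], []
    for au in access_units:
        output_track.append([p for p in au if p[1] == 1])
        non_output_track.append([p for p in au if p[1] == 0])
    return output_track, non_output_track

# Hypothetical access unit: base layer (0) is not output, enhancement
# layer (1) is output.
out, non_out = split_by_pic_output_flag([[(0, 0), (1, 1)]])
print(out, non_out)  # → [[(1, 1)]] [[(0, 0)]]
```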
FIG. 7 is a flowchart illustrating an example operation of file generation device 34 in accordance with one or more techniques of this disclosure. The operation of FIG. 7, along with operations illustrated in other flowcharts of this disclosure, are examples. Other example operations in accordance with the techniques of this disclosure may include more, fewer, or different actions.
In the example of FIG. 7, file generation device 34 generates a file. As part of generating the file, file generation device 34 obtains multi-layer video data (170) and stores the multi-layer video data in a file format (172). File generation device 34 stores representation format information for each operation point of the multi-layer video data in an oinf box of the file format (174). File generation device 34 generates a file of the video data formatted according to the file format (176). The representation format information may include one or more of a spatial resolution, a bit depth, or a color format. File generation device 34 may additionally or alternatively store bit rate information for each operation point of the multi-layer video data in the oinf box of the file format and/or may not signal a bit rate box after a configuration box of the file format. File generation device 34 may additionally or alternatively not store profile, tier, and level (PTL) information, representation format information, and frame rate information in a decoder configuration record of the file format, and may associate all other information in the decoder configuration record with all layers of the multi-layer video data in a track. File generation device 34 may additionally or alternatively store a layer count in the oinf box of the file format, where the layer count indicates the number of necessary layers of an operation point of the multi-layer video data.
The oinf box may be included in a media information box, and the oinf box may be included in a sample group description box. The sample group description box may be included in a sample table box, and the sample table box may be included in the media information box.
File generation device 34 may store representation format information and frame rate information in a decoder configuration record for each layer of the multi-layer video data. File generation device 34 may additionally or alternatively store parallelism information in the decoder configuration record for each layer of the multi-layer video data. File generation device 34 may not store an operation point index in the decoder configuration record of the file format. File generation device 34 may additionally or alternatively store, in the decoder configuration record of the file format, a list of operation point indices associated with a track of the multi-layer video data.
FIG. 8 is a flowchart illustrating an example operation of a file reading device, such as destination device 14, post-processing entity 27, or network entity 29. The operation of FIG. 8, along with operations illustrated in other flowcharts of this disclosure, are examples. Other example operations in accordance with the techniques of this disclosure may include more, fewer, or different actions.
在图8的实例中，文件读取装置获得根据文件格式而格式化的多层视频数据的文件(180)。文件读取装置针对文件格式确定文件格式的oinf方框中的用于多层视频数据的每一操作点的表示格式信息(182)。文件读取装置可能结合诸如视频解码器30的视频解码器基于所确定的表示格式信息而解码多层视频数据(184)。In the example of FIG. 8, a file reading device obtains a file of multi-layer video data formatted according to a file format (180). The file reading device determines representation format information for each operation point of the multi-layer video data in an oinf box of the file format (182). The file reading device, possibly in conjunction with a video decoder such as video decoder 30, decodes the multi-layer video data based on the determined representation format information (184).
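The reader-side steps (180)–(184) amount to locating the oinf payload and decoding its per-operation-point fields before configuring a decoder. The byte layout below (a 16-bit operation-point count, then 16-bit width/height, 8-bit bit depth, and 8-bit chroma format index per operation point) is an assumed illustration, not the normative syntax.

```python
import struct

# Assumed illustrative 'oinf' payload for two operation points.
payload = struct.pack(">H", 2)
payload += struct.pack(">HHBB", 1920, 1080, 8, 1)
payload += struct.pack(">HHBB", 3840, 2160, 10, 1)

def read_representation_formats(data: bytes):
    """Decode per-operation-point representation format fields."""
    (count,) = struct.unpack_from(">H", data, 0)
    offset, formats = 2, []
    for _ in range(count):
        w, h, depth, chroma = struct.unpack_from(">HHBB", data, offset)
        offset += 6
        formats.append({"width": w, "height": h,
                        "bit_depth": depth, "chroma_format": chroma})
    return formats

fmts = read_representation_formats(payload)
# A decoder would then be configured from fmts[i] for the chosen
# operation point i, per step (184).
```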
在一或多个实例中,所描述功能可以硬件、软件、固件或其任何组合来实施。如果以软件实施,那么所述功能可作为一或多个指令或程序代码而在计算机可读媒体上存储或经由计算机可读媒体发射,且由基于硬件的处理单元执行。计算机可读媒体可包含计算机可读存储媒体(其对应于诸如数据存储媒体的有形媒体)或通信媒体,通信媒体包含(例如)根据通信协议促进计算机程序从一处传送到另一处的任何媒体。以此方式,计算机可读媒体大体可对应于(1)为非暂时性的有形计算机可读存储媒体,或(2)通信媒体,诸如信号或载波。数据存储媒体可为可由一或多个计算机或一或多个处理器存取以检索用于实施本发明中所描述的技术的指令、代码及/或数据结构的任何可用媒体。计算机程序产品可包含计算机可读媒体。In one or more examples, the described functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted via a computer-readable medium as one or more instructions or program codes and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media (which corresponds to tangible media such as data storage media) or communication media, which includes, for example, any media that facilitates the transfer of a computer program from one place to another according to a communication protocol. In this manner, computer-readable media may generally correspond to (1) tangible computer-readable storage media that is non-transitory, or (2) communication media, such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, codes, and/or data structures for implementing the techniques described in this disclosure. A computer program product may include computer-readable media.
通过实例而非限制,此类计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储器、磁盘存储器或其它磁性存储装置、闪存或可用于存储呈指令或数据结构形式的所要程序代码且可由计算机存取的任何其它媒体。又,将任何连接适当地称为计算机可读媒体。举例来说,如果使用同轴缆线、光纤缆线、双绞线、数字用户线(DSL)或无线技术(诸如,红外线、无线电及微波)从网站、服务器或其它远程源发射指令,那么同轴缆线、光纤缆线、双绞线、DSL或无线技术(诸如,红外线、无线电及微波)包含在媒体的定义中。然而,应理解,计算机可读存储媒体及数据存储媒体不包含连接、载波、信号或其它暂时性媒体,而是有关非暂时性有形存储媒体。如本文中所使用,磁盘及光盘包含紧密光盘(CD)、激光光盘、光学光盘、数字多功能光盘(DVD)、软性磁盘及蓝光(Blu-ray)光盘,其中磁盘通常以磁性方式再现数据,而光盘用激光以光学方式再现数据。以上各者的组合亦应包含在计算机可读媒体的范围内。By way of example, and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but rather refer to non-transitory, tangible storage media. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks typically reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
可由诸如一或多个数字信号处理器(DSP)、一般用途微处理器、专用集成电路(ASIC)、场可编程逻辑数组(FPGA)或其它等效集成或离散逻辑电路的一或多个处理器来执行指令。因此,如本文中所使用的术语“处理器”可指上述结构或适合于实施本文中所描述的技术的任何其它结构中的任一者。此外,在一些方面中,本文中所描述的功能性可提供在经配置用于编码及解码的专用硬件及/或软件模块内,或并入组合式编码解码器中。此外,所述技术可完全实施在一或多个电路或逻辑组件中。Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Thus, the term "processor," as used herein, may refer to any of the aforementioned structures or any other structure suitable for implementing the techniques described herein. Furthermore, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Furthermore, the techniques may be fully implemented in one or more circuits or logic components.
本发明的技术可在广泛多种装置或设备中实施，所述装置或设备包含无线手机、集成电路(IC)或IC的集合(例如，芯片集合)。本发明中描述各种组件、模块或单元以强调经配置以执行所揭示技术的装置的功能方面，但未必要求由不同硬件单元来实现。确切地说，如上文所描述，可将各种单元组合在编码解码器硬件单元中，或通过互操作性硬件单元(包含如上文所描述的一或多个处理器)的集合结合合适软件及/或固件来提供所述单元。The techniques of this disclosure can be implemented in a wide variety of devices or apparatuses, including wireless handsets, integrated circuits (ICs), or collections of ICs (e.g., chipsets). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require implementation by different hardware units. Rather, as described above, the various units can be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with appropriate software and/or firmware.
已描述各种实例。此等及其它实例处于以下权利要求书的范围内。Various embodiments have been described. These and other embodiments are within the scope of the following claims.
Claims (32)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US62/115,075 | 2015-02-11 | ||
| US15/019,634 | 2016-02-09 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1237170A1 (en) | 2018-04-06 |
| HK1237170B (en) | 2021-02-11 |