HK40080098B

HK40080098B - Methods, apparatus and storage medium for decoding video data

Info

Publication number: HK40080098B
Application number: HK62023069256.3A
Authority: HK
Inventors: Madhu Peringassery Krishnan; Xin Zhao; Shan Liu
Original assignee: Tencent America LLC
Priority date: 2021-04-16
Filing date: 2022-01-28
Publication date: 2025-06-20

Description

Methods, apparatus, and storage media for decoding video data

Incorporation by Reference

This application is based on and claims the benefit of priority to U.S. Provisional Application No. 63/175,897, filed April 16, 2021, and U.S. Non-Provisional Application No. 17/568,275, filed January 4, 2022, both of which are incorporated herein by reference in their entirety.

Technical Field

This disclosure describes a set of advanced video coding techniques. More specifically, the disclosed techniques relate to the interaction between transform partitioning schemes and primary/secondary transform type selection in video encoding and decoding.

Background

The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing of this application, is neither expressly nor impliedly admitted as prior art against the present disclosure.

Video encoding and decoding can be performed using inter-picture prediction with motion compensation. Uncompressed digital video can comprise a series of pictures, each picture having a spatial dimension of, for example, 1920×1080 luma samples and associated fully sampled or subsampled chroma samples. The series of pictures can have a fixed or variable picture rate (alternatively referred to as frame rate) of, for example, 60 pictures per second (60 frames per second). Uncompressed video has substantial bitrate requirements for streaming or data processing. For example, video with a pixel resolution of 1920×1080, a frame rate of 60 frames per second, 4:2:0 chroma subsampling, and 8 bits per sample per color channel requires close to 1.5 Gbit/s of bandwidth. One hour of such video requires more than 600 GB of storage space.
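The bandwidth and storage figures above follow from simple arithmetic. The Python sketch below reproduces them. The helper name `raw_bitrate_bps` and the 1.5 samples-per-pixel factor for 4:2:0 (one luma sample per pixel plus two chroma samples shared by each 2×2 group of pixels) are illustrative assumptions, not part of the source.

```python
def raw_bitrate_bps(width, height, fps, bits_per_sample, samples_per_pixel):
    """Uncompressed bitrate in bits per second.

    samples_per_pixel is 1.5 for 4:2:0: one luma sample per pixel plus two
    chroma samples (Cb, Cr) shared by each 2x2 group of pixels.
    """
    return width * height * samples_per_pixel * bits_per_sample * fps


rate = raw_bitrate_bps(1920, 1080, 60, 8, 1.5)
bytes_per_hour = rate * 3600 / 8

print(f"{rate / 1e9:.2f} Gbit/s")          # 1.49 Gbit/s
print(f"{bytes_per_hour / 1e9:.0f} GB/h")  # 672 GB per hour
```

The result (about 1.49 Gbit/s and roughly 672 GB per hour) is consistent with the "close to 1.5 Gbit/s" and "more than 600 GB" figures in the text.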

One purpose of video encoding and decoding can be the reduction of redundancy in the uncompressed input video signal through compression. Compression can help reduce the aforementioned bandwidth and/or storage space requirements, in some cases by two orders of magnitude or more. Lossless compression, lossy compression, and combinations thereof can be employed. Lossless compression refers to techniques in which an exact copy of the original signal can be reconstructed from the compressed signal through a decoding process. Lossy compression refers to encoding/decoding processes in which the original video information is not fully retained during encoding and cannot be fully recovered during decoding. When lossy compression is used, the reconstructed signal may not be identical to the original signal, but the distortion between the original and reconstructed signals is small enough for the reconstructed signal to be useful for the intended application, albeit with some information loss. In the case of video, lossy compression is widely employed in many applications. The amount of tolerable distortion depends on the application; for example, users of certain consumer video streaming applications may tolerate higher distortion than users of cinematic or television broadcasting applications. The compression ratio achievable by a particular coding algorithm can be selected or adjusted to reflect various distortion tolerances: higher tolerable distortion generally allows a coding algorithm to yield higher losses and higher compression ratios.

Video encoders and decoders can utilize techniques and steps from several broad categories, including, for example, motion compensation, Fourier transform, quantization, and entropy coding.

Video codec technologies can include techniques known as intra coding. In intra coding, sample values are represented without reference to samples or other data from previously reconstructed reference pictures. In some video codecs, a picture is spatially subdivided into blocks of samples. When all blocks of samples are coded in intra mode, that picture can be referred to as an intra picture. Intra pictures and their derivatives (for example, independent decoder refresh pictures) can be used to reset the decoder state and can therefore be used as the first picture in a coded video bitstream and a video session, or as a still image. The samples of a block after intra prediction can then be transformed into the frequency domain, and the transform coefficients so generated can be quantized before entropy coding. Intra prediction can be a technique that minimizes sample values in the pre-transform domain. In some cases, the smaller the DC value after the transform and the smaller the AC coefficients, the fewer bits are required at a given quantization step size to represent the block after entropy coding.
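The following sketch illustrates why small, smooth residuals after intra prediction need fewer bits. It applies an orthonormal 1-D DCT-II and a uniform quantizer to two hypothetical 4-sample residual rows. The DCT-II is only a stand-in here: practical codecs use 2-D integer approximations, and the step size and residual values are assumed for illustration. The flat residual leaves only a DC level. The noisy one produces several non-zero levels, which would cost more bits after entropy coding.

```python
import math


def dct_ii(samples):
    """Orthonormal 1-D DCT-II of a sequence of samples."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs


def quantize(coeffs, step):
    """Uniform scalar quantization: round each coefficient to a level index."""
    return [round(c / step) for c in coeffs]


flat_residual = [4, 4, 4, 4]         # small, smooth prediction error
noisy_residual = [40, -35, 38, -30]  # large, noisy prediction error

print(quantize(dct_ii(flat_residual), step=2))   # [4, 0, 0, 0]: DC only
print(quantize(dct_ii(noisy_residual), step=2))  # several non-zero levels
```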

Traditional intra coding, such as that known from MPEG-2 generation coding technologies, does not use intra prediction. However, some newer video compression technologies include techniques that attempt to encode/decode a block based on, for example, surrounding sample data and/or metadata obtained during the encoding and/or decoding of spatially neighboring data that precedes, in decoding order, the block being intra encoded or decoded. Such techniques are henceforth called "intra prediction" techniques. Note that, at least in some cases, intra prediction uses only reference data from the current picture under reconstruction and not reference data from reference pictures.

Intra prediction can take many different forms. When more than one such technique is available in a given video coding technology, the technique in use can be referred to as an intra prediction mode. One or more intra prediction modes may be provided in a particular codec. In certain cases, modes can have submodes and/or be associated with various parameters, and the mode/submode information and intra coding parameters of a video block can be coded individually or included collectively in a mode codeword. Which codeword is used for a given combination of mode, submode, and/or parameters can affect the coding efficiency gain achieved through intra prediction, and so can the entropy coding technique used to translate the codewords into a bitstream.

A form of intra prediction was introduced in H.264, refined in H.265, and further refined in newer coding technologies such as the Joint Exploration Model (JEM), Versatile Video Coding (VVC), and the Benchmark Set (BMS). In general, for intra prediction, a predictor block can be formed using neighboring sample values that have already become available. For example, the available values of a particular set of neighboring samples can be copied into the predictor block along a certain direction and/or line. A reference to the direction in use can be coded in the bitstream or can itself be predicted.

Referring to Figure 1A, depicted in the lower right is a subset of nine prediction directions out of the 33 possible intra prediction directions of H.265 (corresponding to the 33 angular modes of the 35 intra modes specified in H.265). The point where the arrows converge (101) represents the sample being predicted. The arrows indicate the direction from which neighboring samples are used to predict the sample at (101). For example, arrow (102) indicates that sample (101) is predicted from one or more neighboring samples to the upper right, at a 45-degree angle from the horizontal. Similarly, arrow (103) indicates that sample (101) is predicted from one or more neighboring samples to the lower left of sample (101), at a 22.5-degree angle from the horizontal.

Still referring to Figure 1A, depicted in the upper left is a square block (104) of 4×4 samples (indicated by a bold dashed line). The square block (104) includes 16 samples, each labeled with "S", its position in the Y dimension (e.g., row index), and its position in the X dimension (e.g., column index). For example, sample S21 is the second sample in the Y dimension (from the top) and the first sample in the X dimension (from the left). Similarly, sample S44 is the fourth sample in block (104) in both the Y and X dimensions. Since the block is 4×4 samples in size, S44 is at the bottom right. Example reference samples that follow a similar numbering scheme are also shown. A reference sample is labeled with "R", its Y position (e.g., row index), and its X position (column index) relative to block (104). In both H.264 and H.265, prediction samples adjacent to the block under reconstruction are used.

Intra picture prediction of block (104) may begin by copying reference sample values from the neighboring samples according to a signaled prediction direction. For example, assume that the coded video bitstream includes signaling that, for this block (104), indicates the prediction direction of arrow (102), that is, samples are predicted from one or more prediction samples to the upper right, at a 45-degree angle from the horizontal. In that case, samples S41, S32, S23, and S14 are predicted from the same reference sample R05. Sample S44 is then predicted from reference sample R08.
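Here is a minimal sketch of the 45-degree case. It assumes the labeling convention of Figure 1A: S<row><column> inside the block, and R0<column> for the reference row directly above it. The function names are hypothetical. The sketch reproduces only the sample-to-reference mapping described above.

```python
def ref_for_diagonal_45(row, col):
    """Label of the top reference sample used to predict sample S<row><col>
    along the 45-degree up-right direction of arrow (102).

    Rows and columns are 1-based inside the block and the reference row sits
    at row 0. Each step up and to the right increases the column by one, so
    after `row` steps the reference column is col + row.
    """
    return f"R0{col + row}"


def predict_block_45(top_refs, size=4):
    """Predicted size x size block; top_refs[k] holds the value of R0k."""
    return [[top_refs[c + r] for c in range(1, size + 1)]
            for r in range(1, size + 1)]


# S41, S32, S23 and S14 all share R05; S44 uses R08.
print([ref_for_diagonal_45(r, c) for r, c in [(4, 1), (3, 2), (2, 3), (1, 4)]])
print(ref_for_diagonal_45(4, 4))

# With assumed reference values R00..R08 = 0, 10, ..., 80:
pred = predict_block_45([10 * k for k in range(9)])
```

For a 4×4 block, this direction reaches at most R08, so only nine top reference values (R00 to R08) are needed.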

In certain cases, the values of multiple reference samples may be combined, for example through interpolation, to calculate a reference sample, especially when the direction is not evenly divisible by 45 degrees.
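When the projected position falls between two reference samples, a weighted average of the two neighbors can serve as the reference value. The sketch below assumes 1/32-sample precision and two-tap linear weights with rounding, similar in spirit to H.265 angular prediction. The function name and the choice of precision are assumptions for illustration.

```python
def interpolate_reference(ref, pos_32nds):
    """Two-tap linear interpolation between neighbouring reference samples.

    pos_32nds is the projected position along the reference row in units of
    1/32 sample. Integer positions return the reference sample itself.
    """
    idx, frac = divmod(pos_32nds, 32)
    if frac == 0:
        return ref[idx]
    # Weight the two neighbours by distance, add 16 to round, divide by 32.
    return ((32 - frac) * ref[idx] + frac * ref[idx + 1] + 16) >> 5


row = [100, 132]
print(interpolate_reference(row, 8))   # a quarter of the way from 100 to 132
print(interpolate_reference(row, 16))  # halfway between the two samples
```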

The number of possible directions has increased as video coding technology has developed. In H.264 (2003), for example, nine different directions are available for intra prediction. This increased to 33 in H.265 (2013), and, at the time of this disclosure, JEM/VVC/BMS can support up to 65 directions. Experimental studies have been conducted to help identify the most suitable intra prediction directions, and certain entropy coding techniques can be used to encode those most suitable directions in a small number of bits, accepting a higher bit cost for less likely directions. Furthermore, the direction itself can sometimes be predicted from the neighboring directions used in the intra prediction of already decoded neighboring blocks.

Figure 1B shows a schematic diagram (180) depicting 65 intra prediction directions according to JEM, illustrating how the number of prediction directions has increased across coding technologies developed over time.

The way in which the bits representing an intra prediction direction in a coded video bitstream are mapped to that direction can differ from one video coding technology to another. For example, the mapping can range from a simple direct mapping of prediction direction to intra prediction mode, to codewords, to complex adaptive schemes involving most probable modes, and similar techniques. In all cases, however, there may be certain intra prediction directions that are statistically less likely to occur in video content than certain other directions. Because the goal of video compression is the reduction of redundancy, in a well-designed video coding technology those less likely directions may be represented by more bits than the more likely directions.
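As a toy illustration of giving likely directions shorter codes, the hypothetical scheme below first sends a one-bit flag. It then sends either a truncated-unary index into a short most-probable-mode (MPM) list, or a fixed-length index over the remaining directions. This is not the binarization of any particular standard. The MPM list contents and the 35-mode count in the usage lines are assumed values.

```python
def direction_codeword(direction, mpm_list, num_directions):
    """Bit string for an intra direction under a hypothetical MPM scheme."""
    if direction in mpm_list:
        i = mpm_list.index(direction)
        # Truncated unary: i ones, then a terminating zero unless i is last.
        unary = "1" * i + ("0" if i < len(mpm_list) - 1 else "")
        return "1" + unary
    # Non-MPM directions: flag 0, then a fixed-length index.
    remaining = [d for d in range(num_directions) if d not in mpm_list]
    width = max(1, (len(remaining) - 1).bit_length())
    return "0" + format(remaining.index(direction), f"0{width}b")


mpm = [26, 10, 1]  # assumed most probable directions for the current block
for d in (26, 10, 1, 0, 34):
    print(d, direction_codeword(d, mpm, 35))
```

With these assumptions, the three likely directions cost two or three bits, while each of the other 32 directions costs six.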

Inter-picture prediction, or inter prediction, can be based on motion compensation. In motion compensation, sample data from a previously reconstructed picture or part thereof (the reference picture), after being spatially shifted in a direction indicated by a motion vector (henceforth MV), can be used to predict a newly reconstructed picture or picture part (e.g., a block). In some cases, the reference picture can be the same as the picture currently under reconstruction. An MV can have two dimensions, X and Y, or three dimensions, the third being an indication of the reference picture in use (akin to a temporal dimension).
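The copy-with-offset step can be sketched as follows. The sketch assumes whole-sample MVs; real codecs also support fractional MVs through interpolation filters. It also assumes edge clamping for positions outside the reference picture. The function name and the clamping rule are illustrative assumptions.

```python
def motion_compensate(ref_picture, x, y, w, h, mv):
    """Predict the w x h block at (x, y) from the block displaced by mv.

    ref_picture is a list of rows of samples; mv = (dx, dy) in whole samples.
    Positions outside the picture are clamped to the nearest edge sample.
    """
    dx, dy = mv
    height, width = len(ref_picture), len(ref_picture[0])

    def sample(px, py):
        return ref_picture[min(max(py, 0), height - 1)][min(max(px, 0), width - 1)]

    return [[sample(x + dx + i, y + dy + j) for i in range(w)] for j in range(h)]


ref = [[10 * r + c for c in range(8)] for r in range(8)]  # sample = 10*row + col
print(motion_compensate(ref, 2, 2, 2, 2, (1, -1)))  # block copied from (3, 1)
```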

In some video compression techniques, the MV applicable to a certain area of sample data can be predicted from other MVs. For example, it can be predicted from MVs that relate to another area of sample data spatially adjacent to the area under reconstruction and that precede the current MV in decoding order. Doing so can substantially reduce the total amount of data required to code the MVs by removing redundancy among correlated MVs, thereby increasing compression efficiency. MV prediction can work effectively because, for example, when coding an input video signal derived from a camera (known as natural video), there is a statistical likelihood that areas larger than the area to which a single MV applies move in a similar direction in the video sequence. In some cases, such a larger area can therefore be predicted using a similar motion vector derived from the MVs of neighboring areas. As a result, the actual MV for a given area tends to be similar or identical to the MV predicted from the surrounding MVs. After entropy coding, that MV can in turn be represented with fewer bits than would be needed if the MV were coded directly rather than predicted from neighboring MVs. In some cases, MV prediction can be an example of lossless compression of a signal (namely, the MVs) derived from the original signal (namely, the sample stream). In other cases, MV prediction itself can be lossy, for example because of rounding errors when a predictor is computed from several surrounding MVs.
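One concrete form of MV prediction is the component-wise median of three neighboring MVs, as used in H.264. Only the difference between the actual MV and the predictor is then entropy coded. The sketch below shows that round trip. Neighbor selection and availability handling are simplified, and the function names are hypothetical.

```python
def median3(a, b, c):
    return sorted((a, b, c))[1]


def predict_mv(left, above, above_right):
    """Component-wise median of three neighbouring MVs given as (x, y)."""
    return (median3(left[0], above[0], above_right[0]),
            median3(left[1], above[1], above_right[1]))


def mv_difference(mv, predictor):
    """Encoder side: the value transmitted instead of the MV itself."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])


def reconstruct_mv(mvd, predictor):
    """Decoder side: add the received difference back to the predictor."""
    return (mvd[0] + predictor[0], mvd[1] + predictor[1])


pred = predict_mv((4, 2), (5, 1), (6, 3))  # predictor from three neighbours
mvd = mv_difference((5, 3), pred)          # small residual, cheap to code
print(pred, mvd, reconstruct_mv(mvd, pred))
```

Because neighboring MVs tend to be similar, the difference is usually small, which is where the bit savings described above come from.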

Various MV prediction mechanisms are described in H.265/HEVC (Recommendation ITU-T H.265, "High Efficiency Video Coding", December 2016). Of the many MV prediction mechanisms specified in H.265, described below is a technique henceforth referred to as "spatial merge".

Specifically, referring to Figure 2, a current block (201) comprises samples that the encoder found during the motion search process to be predictable from a previous block of the same size that has been spatially shifted. Instead of coding that MV directly, the MV can be derived from metadata associated with one or more reference pictures. For example, it can be derived from the most recent reference picture (in decoding order), using the MV associated with any one of five surrounding samples denoted A0, A1, and B0, B1, B2 (202 through 206, respectively). In H.265, MV prediction can use predictors from the same reference picture that the neighboring block is using.
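A simplified construction of a spatial merge candidate list might look like the sketch below. It loosely follows the H.265 checking order A1, B1, B0, A0, B2 and keeps at most four spatial candidates. Removing every duplicate is a simplification of the standard's limited pairwise pruning. Representing motion as (mv_x, mv_y, ref_idx) tuples is likewise an assumption.

```python
def spatial_merge_candidates(neighbours, max_spatial=4):
    """Simplified spatial merge candidate list.

    neighbours maps 'A0', 'A1', 'B0', 'B1', 'B2' to a motion tuple
    (mv_x, mv_y, ref_idx), or to None when that neighbour is unavailable
    (for example outside the picture or intra coded).
    """
    candidates = []
    for pos in ("A1", "B1", "B0", "A0", "B2"):
        if len(candidates) == max_spatial:
            break  # B2 is reached only if fewer than four were found
        motion = neighbours.get(pos)
        if motion is not None and motion not in candidates:
            candidates.append(motion)
    return candidates


nb = {"A0": (1, 0, 0), "A1": (1, 0, 0), "B0": None,
      "B1": (2, -1, 0), "B2": (3, 3, 1)}
cands = spatial_merge_candidates(nb)
merge_idx = 1                 # assumed index signaled in the bitstream
print(cands[merge_idx])       # motion inherited by the current block
```

In this scheme, only the short merge index is transmitted. The decoder rebuilds the same list and copies the indexed motion.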

Summary

This disclosure describes various embodiments of methods, apparatuses, and computer-readable storage media for video encoding and/or decoding.

According to one aspect, an embodiment of this disclosure provides a method for encoding/decoding video data in a decoder. The method includes the following steps. First, receiving an encoded video bitstream of a data block. Second, extracting, from the encoded video bitstream, a transform partition type associated with the data block. Third, in response to the transform partition type belonging to a subset of a predetermined set of transform partition types, where each transform partition type in the predetermined set specifies a split pattern for dividing the data block into transform blocks: extracting a transform type, signaled in the encoded video bitstream, of a transform associated with a transform block split from the data block, the transform type belonging to a first predetermined set of transform types; and performing an inverse transform on the transform block according to the transform type.
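The conditional parsing in this aspect can be sketched as decoder-side logic. All names below are hypothetical, because the source does not enumerate the transform partition types or transform types. `read_symbol` stands in for an entropy decoder that returns the next syntax element. The sketch also assumes that one signaled transform type applies to every transform block of the partition.

```python
# Hypothetical names: the source does not enumerate these sets.
SIGNALED_PARTITION_SUBSET = {"NO_SPLIT", "QUAD_SPLIT"}
TX_BLOCKS_PER_PARTITION = {"NO_SPLIT": 1, "QUAD_SPLIT": 4}
FIRST_TX_TYPE_SET = ("DCT_DCT", "ADST_DCT", "DCT_ADST", "ADST_ADST")


def decode_transform_info(read_symbol, inverse_transform):
    """Parse the partition type, then (conditionally) a signaled transform type.

    read_symbol() returns the next decoded syntax element;
    inverse_transform(block_index, tx_type) inverts one transform block.
    """
    partition = read_symbol()
    if partition not in SIGNALED_PARTITION_SUBSET:
        return partition, None  # this aspect does not cover other partitions
    tx_type = FIRST_TX_TYPE_SET[read_symbol()]  # index into the first set
    # Assumption: the same signaled type applies to every transform block.
    for block_index in range(TX_BLOCKS_PER_PARTITION[partition]):
        inverse_transform(block_index, tx_type)
    return partition, tx_type


calls = []
symbols = iter(["QUAD_SPLIT", 3])
print(decode_transform_info(lambda: next(symbols),
                            lambda i, t: calls.append((i, t))))
print(calls)  # one inverse-transform call per transform block
```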

According to another aspect, an embodiment of this disclosure provides a method for encoding/decoding video data. The method includes: receiving an encoded video bitstream of a data block; extracting, from the encoded video bitstream, a transform partition type associated with the data block of the video data; in response to the transform partition type belonging to a subset of a predetermined set of transform partition types, extracting, from the encoded video bitstream, a transform type associated with a transform block of the data block; and, in response to the transform partition type not belonging to the predetermined set of transform partition types, identifying the transform type of the data block in a default manner.
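The default path of this aspect can be sketched as below. Two assumptions apply, and neither comes from the source, which says only "in a default manner". First, any partition type outside the signaling subset takes the default path. Second, the default is a fixed, hypothetical `DCT_DCT`. The key property illustrated is that no bits are consumed when the transform type is inferred.

```python
# Hypothetical names and default; the source does not specify them.
SIGNALED_PARTITION_SUBSET = {"NO_SPLIT", "QUAD_SPLIT"}
TX_TYPE_SET = ("DCT_DCT", "ADST_DCT", "DCT_ADST", "ADST_ADST")
DEFAULT_TX_TYPE = "DCT_DCT"


def transform_type_for_block(partition, read_symbol):
    """Return the transform type, reading the bitstream only when signaled."""
    if partition in SIGNALED_PARTITION_SUBSET:
        return TX_TYPE_SET[read_symbol()]
    # Not signaled: infer the default without consuming any symbol.
    return DEFAULT_TX_TYPE


symbols = iter([2])
print(transform_type_for_block("NO_SPLIT", lambda: next(symbols)))  # parsed
print(transform_type_for_block("HORZ_SPLIT", lambda: next(symbols)))  # default
```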

According to another aspect, an embodiment of this disclosure provides a method for encoding/decoding video data. The method includes: receiving an encoded video bitstream of a data block; extracting, from the encoded video bitstream, a transform type of a transform associated with a transform block of the data block; and, in response to the transform type belonging to a predetermined set of transform types, extracting, from the encoded video bitstream, a transform partition type associated with the data block.
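This aspect reverses the parsing dependency: the transform type is read first, and the partition type is read only for certain transform types. The sketch below illustrates that order. The set of transform types is hypothetical. Inferring a fixed `NO_SPLIT` partition when none is signaled is also an assumption, since the source does not state what happens in that case.

```python
# Hypothetical set of transform types after which a partition type is signaled.
PARTITION_SIGNALING_TX_TYPES = {"DCT_DCT", "ADST_ADST"}
INFERRED_PARTITION = "NO_SPLIT"  # assumed value when nothing is signaled


def parse_type_then_partition(read_symbol):
    """Read the transform type, then the partition type only if required."""
    tx_type = read_symbol()
    if tx_type in PARTITION_SIGNALING_TX_TYPES:
        partition = read_symbol()
    else:
        partition = INFERRED_PARTITION
    return tx_type, partition


symbols = iter(["DCT_DCT", "QUAD_SPLIT"])
print(parse_type_then_partition(lambda: next(symbols)))
```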

According to another aspect, an embodiment of this disclosure provides an apparatus for video encoding and/or decoding. The apparatus includes a memory storing instructions and a processor in communication with the memory. When the processor executes the instructions, the processor is configured to cause the apparatus to perform the above methods for video decoding and/or encoding.

According to yet another aspect, an embodiment of this disclosure provides a non-transitory computer-readable medium storing instructions which, when executed by a computer for video decoding and/or encoding, cause the computer to perform the above methods for video decoding and/or encoding.

The above and other aspects and their implementations are described in greater detail in the drawings, the description, and the claims.

Brief Description of the Drawings

Further features, the nature, and various advantages of the disclosed subject matter will be more apparent from the following detailed description and the accompanying drawings, in which:

Figure 1A shows a schematic illustration of an exemplary subset of intra prediction directional modes.

Figure 1B shows an illustration of exemplary intra prediction directions.

Figure 2 shows a schematic illustration of a current block and its surrounding spatial merge candidates for motion vector prediction in one example.

Figure 3 shows a schematic illustration of a simplified block diagram of a communication system (300) according to an exemplary embodiment.

Figure 4 shows a schematic illustration of a simplified block diagram of a communication system (400) according to an exemplary embodiment.

Figure 5 shows a schematic illustration of a simplified block diagram of a video decoder according to an exemplary embodiment.

Figure 6 shows a schematic illustration of a simplified block diagram of a video encoder according to an exemplary embodiment.

Figure 7 shows a block diagram of a video encoder according to another exemplary embodiment.

Figure 8 shows a block diagram of a video decoder according to another exemplary embodiment.

Figure 9 shows directional intra prediction modes according to an exemplary embodiment of the present disclosure.

Figure 10 shows non-directional intra prediction modes according to an exemplary embodiment of the present disclosure.

Figure 11 shows a recursive intra prediction mode according to an exemplary embodiment of the present disclosure.

Figure 12 shows transform block partitioning and scanning of an intra prediction block according to an exemplary embodiment of the present disclosure.

Figure 13 shows transform block partitioning and scanning of an inter prediction block according to an exemplary embodiment of the present disclosure.

Figure 14 shows a low-frequency non-separable transform process according to an exemplary embodiment of the present disclosure.

Figure 15 shows an intra prediction scheme based on various reference lines according to an exemplary embodiment of the present disclosure.

Figure 16 shows a non-recursive block partitioning scheme according to an exemplary embodiment of the present disclosure.

Figure 17 shows a flowchart according to an exemplary embodiment of the present disclosure.

Figure 18 shows a schematic illustration of a computer system according to an exemplary embodiment of the present disclosure.

Detailed Description

Figure 3 shows a simplified block diagram of a communication system (300) according to an embodiment of the present disclosure. The communication system (300) includes a plurality of terminal devices that can communicate with each other via, for example, a network (350). For example, the communication system (300) includes a first pair of terminal devices (310) and (320) interconnected via the network (350). In the example of Figure 3, the first pair of terminal devices (310) and (320) may perform unidirectional data transmission. For example, the terminal device (310) may encode video data (e.g., video data of a stream of video pictures captured by the terminal device (310)) for transmission over the network (350) to the other terminal device (320). The encoded video data can be transmitted in the form of one or more encoded video bitstreams. The terminal device (320) may receive the encoded video data from the network (350), decode the encoded video data to recover the video pictures, and display the video pictures according to the recovered video data. Unidirectional data transmission may be implemented in applications such as media serving.

In another example, the communication system (300) includes a second pair of terminal devices (330) and (340) that perform bidirectional transmission of encoded video data, which may occur, for example, during a video conferencing application. For bidirectional data transmission, in an example, each of the terminal devices (330) and (340) may encode video data (e.g., video data of a stream of video pictures captured by that terminal device) for transmission over the network (350) to the other of the terminal devices (330) and (340). Each of the terminal devices (330) and (340) may also receive the encoded video data transmitted by the other, decode the encoded video data to recover the video pictures, and display the video pictures on an accessible display device according to the recovered video data.

In the example of Figure 3, the terminal devices (310), (320), (330), and (340) may be implemented as servers, personal computers, and smartphones, but the applicability of the underlying principles of this disclosure is not so limited. Embodiments of this disclosure may be implemented in desktop computers, laptop computers, tablet computers, media players, wearable computers, dedicated video conferencing equipment, and the like. The network (350) represents any number of networks that convey encoded video data among the terminal devices (310), (320), (330), and (340), including, for example, wireline (wired) and/or wireless communication networks. The communication network (350) may exchange data in circuit-switched, packet-switched, and/or other types of channels. Representative networks include telecommunications networks, local area networks, wide area networks, and/or the Internet. For the purposes of the present discussion, the architecture and topology of the network (350) may be immaterial to the operation of this disclosure unless explicitly explained herein.

As an example application of the disclosed subject matter, Figure 4 illustrates the placement of a video encoder and a video decoder in a video streaming environment. The disclosed subject matter is equally applicable to other video applications, including, for example, video conferencing, digital TV, broadcasting, gaming, virtual reality, and the storage of compressed video on digital media including CDs, DVDs, memory sticks, and the like.

The video streaming system may include a video capture subsystem (413) that can include a video source (401), for example a digital camera, for creating a stream of uncompressed video pictures or images (402). In an example, the stream of video pictures (402) includes samples recorded by the digital camera of the video source (401). The stream of video pictures (402) is depicted as a bold line to emphasize its high data volume compared with the encoded video data (404) (or encoded video bitstream). It can be processed by an electronic device (420) that includes a video encoder (403) coupled to the video source (401). The video encoder (403) can include hardware, software, or a combination thereof to enable or implement aspects of the disclosed subject matter as described in more detail below. The encoded video data (404) (or encoded video bitstream (404)) is depicted as a thin line to emphasize its lower data volume compared with the uncompressed stream of video pictures (402). It can be stored on a streaming server (405) for future use or directly on a downstream video device (not shown). One or more streaming client subsystems, such as the client subsystems (406) and (408) in Figure 4, can access the streaming server (405) to retrieve copies (407) and (409) of the encoded video data (404). The client subsystem (406) can include, for example, a video decoder (410) in an electronic device (430). The video decoder (410) decodes the incoming copy (407) of the encoded video data and creates an outgoing stream of uncompressed video pictures (411) that can be rendered on a display (412) (e.g., a display screen) or other rendering device (not depicted). The video decoder (410) may be configured to perform some or all of the various functions described in this disclosure. In some streaming systems, the encoded video data (404), (407), and (409) (e.g., video bitstreams) can be encoded according to certain video coding/compression standards. Examples of these standards include Recommendation ITU-T H.265. In an example, a video coding standard under development is informally known as Versatile Video Coding (VVC). The disclosed subject matter may be used in the context of VVC and other video coding standards.

It should be noted that the electronic devices (420) and (430) may include other components (not shown). For example, the electronic device (420) may include a video decoder (not shown), and the electronic device (430) may also include a video encoder (not shown).

FIG. 5 shows a block diagram of a video decoder (510) according to any embodiment of the present disclosure. The video decoder (510) may be included in an electronic device (530). The electronic device (530) may include a receiver (531) (e.g., receiving circuitry). The video decoder (510) may be used in place of the video decoder (410) in the example of FIG. 4.

The receiver (531) may receive one or more encoded video sequences to be decoded by the video decoder (510). In the same or another embodiment, one encoded video sequence may be decoded at a time, where the decoding of each encoded video sequence is independent of other encoded video sequences. Each video sequence may be associated with multiple video frames or images.

The encoded video sequence may be received from a channel (501). The channel may be a hardware/software link to a storage device that stores the encoded video data, or to a streaming source that transmits it. The receiver (531) may receive the encoded video data together with other data, such as encoded audio data and/or ancillary data streams, which may be forwarded to their respective processing circuitry (not depicted). The receiver (531) may separate the encoded video sequence from the other data.

To combat network jitter, a buffer memory (515) may be placed between the receiver (531) and an entropy decoder/parser (520) (hereinafter "parser (520)"). In some applications, the buffer memory (515) may be implemented as part of the video decoder (510). In other applications, it may be outside of and separate from the video decoder (510) (not depicted). In still other applications, there may be two buffers: one (not depicted) outside the video decoder (510), for example to combat network jitter, and another buffer memory (515) inside the video decoder (510), for example to handle playback timing.

When the receiver (531) receives data from a store/forward device of sufficient bandwidth and controllability, or from an isochronous network, the buffer memory (515) may not be needed or may be small. For use on best-effort packet networks such as the Internet, a buffer memory (515) of sufficient size may be required, and that size may be comparatively large. Such a buffer memory may be implemented with an adaptive size. It may also be implemented at least partially in an operating system or a similar element (not depicted) outside the video decoder (510).
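To illustrate the role of the buffer memory (515), the following is a minimal Python sketch of a receive-side jitter buffer. It assumes that each packet carries a sequence number. The class name `JitterBuffer` and its methods are hypothetical and not part of the disclosure, and the fixed `depth` stands in for the adaptive sizing mentioned above.

```python
import heapq


class JitterBuffer:
    """Hypothetical receive buffer placed between a receiver and a parser.

    Packets may arrive out of order and at irregular times. They are held
    until more than `depth` packets are queued, then released in
    sequence-number order.
    """

    def __init__(self, depth: int):
        self.depth = depth
        self._heap: list[tuple[int, bytes]] = []  # min-heap keyed by sequence number

    def push(self, seq: int, payload: bytes) -> None:
        heapq.heappush(self._heap, (seq, payload))

    def pop_ready(self) -> list[bytes]:
        """Release the lowest-numbered packets while more than `depth` are buffered."""
        out = []
        while len(self._heap) > self.depth:
            out.append(heapq.heappop(self._heap)[1])
        return out

    def flush(self) -> list[bytes]:
        """Release everything that is left, in sequence order (e.g. at end of stream)."""
        out = []
        while self._heap:
            out.append(heapq.heappop(self._heap)[1])
        return out
```

A larger `depth` absorbs more variation in arrival times but adds latency. This is the trade-off behind using a comparatively large buffer on best-effort networks and a small buffer, or none, on isochronous links.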

The video decoder (510) may include a parser (520) to reconstruct symbols (521) from the encoded video sequence. Categories of these symbols include information used to manage the operation of the video decoder (510). They may also include information to control a rendering device such as a display (512) (e.g., a display screen). That device may or may not be an integral part of the electronic device (530), but it can be coupled to the electronic device (530), as shown in FIG. 5. The control information for the rendering device may be in the form of Supplemental Enhancement Information (SEI) messages or Video Usability Information (VUI) parameter set fragments (not depicted).

The parser (520) may parse/entropy-decode the encoded video sequence it receives. The entropy coding of the encoded video sequence may be in accordance with a video coding technology or standard. It may follow various principles, including variable-length coding, Huffman coding, arithmetic coding with or without context sensitivity, and so forth.

The parser (520) may extract from the encoded video sequence a set of subgroup parameters for at least one of the subgroups of pixels in the video decoder, based on at least one parameter corresponding to the subgroup. Subgroups may include groups of pictures (GOPs), pictures, tiles, slices, macroblocks, coding units (CUs), blocks, transform units (TUs), prediction units (PUs), and so forth. The parser (520) may also extract from the encoded video sequence information such as transform coefficients (e.g., Fourier transform coefficients), quantizer parameter values, motion vectors, and so forth.
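One concrete instance of the variable-length coding mentioned above is the unsigned Exp-Golomb code, ue(v), which H.265 uses for many header syntax elements. The sketch below decodes such codewords from a string of '0'/'1' characters. The function names and the string-based bit representation are illustrative assumptions. A real parser would read from a byte buffer and would also implement context-adaptive arithmetic decoding.

```python
def read_ue(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one unsigned Exp-Golomb codeword starting at bits[pos].

    Returns (value, new_pos). A codeword is N leading zeros, a '1',
    then N suffix bits; value = 2**N - 1 + suffix.
    """
    leading_zeros = 0
    while bits[pos] == "0":
        leading_zeros += 1
        pos += 1
    pos += 1  # skip the terminating '1'
    suffix = bits[pos:pos + leading_zeros]
    pos += leading_zeros
    value = (1 << leading_zeros) - 1 + (int(suffix, 2) if suffix else 0)
    return value, pos


def read_all_ue(bits: str) -> list[int]:
    """Decode a concatenation of ue(v) codewords until the string is exhausted."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = read_ue(bits, pos)
        values.append(value)
    return values
```

For example, `read_all_ue("101001100100")` splits the input into the codewords `1`, `010`, `011`, and `00100`, and returns `[0, 1, 2, 3]`.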

The parser (520) may perform entropy decoding/parsing operations on the video sequence received from the buffer memory (515) to create the symbols (521).

Reconstruction of the symbols (521) may involve multiple different processing or functional units. This depends on the type of the encoded video picture or a part thereof (such as inter and intra pictures, or inter and intra blocks) and on other factors. Which units are involved, and how, may be controlled by subgroup control information parsed from the encoded video sequence by the parser (520). For simplicity, the flow of such subgroup control information between the parser (520) and the processing or functional units below is not depicted.

Beyond the functional blocks already mentioned, the video decoder (510) may be conceptually subdivided into a number of functional units as described below. In a practical implementation operating under commercial constraints, many of these functional units interact closely with each other and may be at least partly integrated with each other. However, for the purpose of clearly describing the various functions of the disclosed subject matter, the remainder of this disclosure adopts this conceptual subdivision into functional units.

A first unit may include the scaler/inverse transform unit (551). The scaler/inverse transform unit (551) may receive quantized transform coefficients as symbols (521) from the parser (520). It also receives control information, including which type of inverse transform to use, the block size, quantization factors/parameters, quantization scaling matrices, and so forth. The scaler/inverse transform unit (551) may output blocks comprising sample values that can be input into the aggregator (555).
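A simplified sketch of what the scaler/inverse transform unit (551) computes is shown below. It assumes a single uniform quantization step `qstep` and a floating-point orthonormal inverse DCT. Actual standards such as H.265 use integer transform approximations, QP-dependent scaling, and optional scaling matrices, so this sketch is illustrative only. The function names are hypothetical.

```python
import math


def idct_1d(coeffs):
    """Orthonormal inverse DCT-II (i.e. a scaled DCT-III) of a 1-D list."""
    n = len(coeffs)
    out = []
    for x in range(n):
        s = 0.0
        for k, c in enumerate(coeffs):
            alpha = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += alpha * c * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
        out.append(s)
    return out


def scale_and_inverse_transform(levels, qstep):
    """Dequantize a 2-D block of levels by qstep, then apply a separable 2-D inverse DCT."""
    coeffs = [[level * qstep for level in row] for row in levels]  # scaling step
    rows = [idct_1d(row) for row in coeffs]                        # horizontal pass
    cols = [idct_1d(list(col)) for col in zip(*rows)]              # vertical pass
    return [[round(v) for v in row] for row in zip(*cols)]         # transpose back, round
```

A block whose only non-zero level is the DC term turns into a flat residual. For example, a DC level of 2 with `qstep=8` in a 4×4 block yields a residual of 4 at every sample.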

In some cases, the output samples of the scaler/inverse transform unit (551) may pertain to an intra-coded block. Such a block does not use predictive information from previously reconstructed pictures, but can use predictive information from previously reconstructed parts of the current picture. This predictive information may be provided by an intra picture prediction unit (552). In some cases, the intra picture prediction unit (552) may use surrounding, already reconstructed information stored in the current picture buffer (558) to generate a block of the same size and shape as the block being reconstructed. The current picture buffer (558) buffers, for example, the partially reconstructed current picture and/or the fully reconstructed current picture. In some implementations, the aggregator (555) may add, on a per-sample basis, the prediction information generated by the intra prediction unit (552) to the output sample information provided by the scaler/inverse transform unit (551).
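Below is a minimal sketch of one intra prediction mode, DC prediction, together with the per-sample addition performed by the aggregator (555). It assumes square N×N blocks, with N a power of two and both top and left neighbours available. The DC rounding follows the general form used in H.265, but the boundary smoothing that some standards apply is omitted. The function names are hypothetical.

```python
def dc_intra_predict(top, left):
    """Predict an N x N block as the rounded mean of the N top and N left neighbours."""
    n = len(top)
    if len(left) != n:
        raise ValueError("top and left neighbour arrays must have the same length")
    dc = (sum(top) + sum(left) + n) // (2 * n)  # rounded average of 2N samples
    return [[dc] * n for _ in range(n)]


def aggregate(prediction, residual, bit_depth=8):
    """Add residual to prediction sample by sample, clipping to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(prediction, residual)
    ]
```

With top neighbours `[100, 102, 104, 106]` and left neighbours `[98, 100, 102, 104]`, the predicted block is filled with (816 + 4) // 8 = 102.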

In other cases, the output samples of the scaler/inverse transform unit (551) may pertain to an inter-coded, and potentially motion-compensated, block. In such a case, a motion compensation prediction unit (553) may access the reference picture memory (557) to fetch samples used for inter picture prediction. The fetched samples are motion-compensated in accordance with the symbols (521) pertaining to the block. The aggregator (555) may then add them to the output of the scaler/inverse transform unit (551) to generate output sample information; the output of unit (551) may be referred to as residual samples or a residual signal.

The addresses within the reference picture memory (557) from which the motion compensation prediction unit (553) fetches prediction samples may be controlled by motion vectors. These motion vectors are available to the motion compensation prediction unit (553) in the form of symbols (521), which may have, for example, X and Y components (offset) and a reference picture component (time). When sub-sample accurate motion vectors are in use, motion compensation may also include interpolation of sample values fetched from the reference picture memory (557). Motion compensation may also be associated with motion vector prediction mechanisms, and so forth.
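The fetch-and-add path described above can be sketched as follows for integer-sample motion vectors. The sketch makes these assumptions:

- reference pictures are stored as 2-D lists, indexed by a reference index (the "reference picture component");
- positions outside the picture are padded by clamping to the nearest edge sample;
- the sub-sample interpolation filtering that real codecs apply for fractional motion vectors is omitted.

All names are hypothetical.

```python
def motion_compensate(ref_pictures, ref_idx, mv, x0, y0, w, h):
    """Fetch a w x h prediction block from ref_pictures[ref_idx] displaced by integer mv=(dx, dy).

    Positions outside the reference picture are clamped to the nearest edge sample.
    """
    ref = ref_pictures[ref_idx]
    height, width = len(ref), len(ref[0])
    dx, dy = mv
    block = []
    for y in range(y0 + dy, y0 + dy + h):
        yc = min(max(y, 0), height - 1)
        row = []
        for x in range(x0 + dx, x0 + dx + w):
            xc = min(max(x, 0), width - 1)
            row.append(ref[yc][xc])
        block.append(row)
    return block


def add_residual(prediction, residual, bit_depth=8):
    """Aggregator step: prediction + residual per sample, clipped to the valid range."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(prediction, residual)
    ]
```

With a reference picture whose sample at (x, y) equals 10·y + x, the motion vector (1, 2) applied to the 2×2 block at (0, 0) fetches `[[21, 22], [31, 32]]`.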

The output samples of the aggregator (555) may be subject to various loop filtering techniques in the loop filter unit (556). Video compression technologies may include in-loop filter technologies controlled by parameters included in the encoded video sequence (also referred to as the encoded video bitstream). These parameters are made available to the loop filter unit (556) as symbols (521) from the parser (520). In-loop filters can also respond to meta-information obtained while decoding previous (in decoding order) parts of the encoded picture or encoded video sequence, and to previously reconstructed and loop-filtered sample values. Several types of loop filters may be included, in various orders, as part of the loop filter unit (556), as described in further detail below.

The output of the loop filter unit (556) may be a sample stream. This stream can be output to the rendering device (512) and also stored in the reference picture memory (557) for use in future inter picture prediction.

Certain encoded pictures, once fully reconstructed, can be used as reference pictures for future inter picture prediction. For example, suppose an encoded picture corresponding to a current picture is fully reconstructed and has been identified as a reference picture (for example, by the parser (520)). The current picture buffer (558) may then become part of the reference picture memory (557), and a fresh current picture buffer may be reallocated before reconstruction of the following encoded picture begins.
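The hand-over from the current picture buffer (558) to the reference picture memory (557) might be modelled as in the sketch below. The class, the fixed capacity `max_refs`, and the oldest-first eviction rule are assumptions made for illustration. H.265, for example, signals which pictures to keep through reference picture sets rather than through a fixed sliding window.

```python
from collections import OrderedDict


class PictureBuffers:
    """Hypothetical model of a current-picture buffer and a reference picture memory."""

    def __init__(self, max_refs: int):
        self.max_refs = max_refs
        self.references = OrderedDict()  # picture id -> 2-D samples, oldest first
        self.current = None

    def start_picture(self, pic_id, width, height):
        # Allocate a fresh current-picture buffer before decoding the next picture.
        self.current = (pic_id, [[0] * width for _ in range(height)])

    def finish_picture(self, is_reference: bool):
        pic_id, samples = self.current
        if is_reference:
            # The finished buffer itself becomes part of the reference memory (no copy).
            self.references[pic_id] = samples
            if len(self.references) > self.max_refs:
                self.references.popitem(last=False)  # evict the oldest reference
        self.current = None
        return samples
```

Moving the buffer object into the reference memory, rather than copying its samples, mirrors the idea that the current picture buffer "becomes part of" the reference picture memory.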

The video decoder (510) may perform decoding operations according to a predetermined video compression technology adopted in a standard such as ITU-T Recommendation H.265. The encoded video sequence may conform to the syntax specified by the video compression technology or standard being used. Conformance here means that the sequence adheres both to the syntax of the technology or standard and to the profiles documented in it. Specifically, a profile may select certain tools from all the tools available in the video compression technology or standard as the only tools available for use under that profile.

For conformance, the complexity of the encoded video sequence may also be required to stay within bounds defined by the level of the video compression technology or standard. In some cases, levels restrict the maximum picture size, maximum frame rate, maximum reconstruction sample rate (measured in, for example, megasamples per second), maximum reference picture size, and so on. In some cases, the limits set by levels may be further restricted through Hypothetical Reference Decoder (HRD) specifications and through metadata for HRD buffer management signaled in the encoded video sequence.
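A level check of the kind described above comes down to comparing derived quantities against per-level limits. The sketch below checks only two of them: picture size and luma sample rate. The numeric limits in the usage example are made-up placeholders, not values taken from H.265 or any other standard. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LevelLimits:
    max_luma_picture_size: int  # luma samples per picture
    max_luma_sample_rate: int   # luma samples per second


def conforms(width: int, height: int, fps: int, limits: LevelLimits) -> bool:
    """Return True if the picture size and luma sample rate are within the given limits."""
    picture_size = width * height
    sample_rate = picture_size * fps
    return (picture_size <= limits.max_luma_picture_size
            and sample_rate <= limits.max_luma_sample_rate)
```

With placeholder limits of 2,500,000 samples per picture and 130,000,000 samples per second, 1920×1080 at 60 fps passes (124,416,000 samples/s). The same resolution at 120 fps exceeds the rate limit, and 3840×2160 exceeds the picture-size limit.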

In some exemplary embodiments, the receiver (531) may receive additional (redundant) data along with the encoded video. The additional data may be included as part of the encoded video sequence. The video decoder (510) may use the additional data to properly decode the data and/or to reconstruct the original video data more accurately. The additional data may take the form of, for example, temporal, spatial, or signal-to-noise ratio (SNR) enhancement layers, redundant slices, redundant pictures, forward error correction codes, and so on.

FIG. 6 shows a block diagram of a video encoder (603) according to an exemplary embodiment of the present disclosure. The video encoder (603) may be included in an electronic device (620). The electronic device (620) may further include a transmitter (640) (e.g., transmitting circuitry). The video encoder (603) may be used in place of the video encoder (403) in the example of FIG. 4.

The video encoder (603) may receive video samples from a video source (601), which is not part of the electronic device (620) in the example of FIG. 6. The video source (601) may capture the video images to be encoded by the video encoder (603). In another example, the video source (601) may be implemented as part of the electronic device (620).

The video source (601) may provide the source video sequence to be encoded by the video encoder (603) in the form of a digital video sample stream. This stream can have:

- any suitable bit depth (for example, 8 bit, 10 bit, 12 bit, ...);
- any color space (for example, BT.601 YCrCb, RGB, XYZ, ...);
- any suitable sampling structure (for example, YCrCb 4:2:0, YCrCb 4:4:4).

In a media serving system, the video source (601) may be a storage device capable of storing previously prepared video. In a video conferencing system, the video source (601) may be a camera that captures local image information as a video sequence. Video data may be provided as a plurality of individual pictures or images that impart motion when viewed in sequence. The pictures themselves may be organized as spatial arrays of pixels. Each pixel can comprise one or more samples, depending on the sampling structure, color space, and so on in use. A person of ordinary skill in the art can readily understand the relationship between pixels and samples. The description below focuses on samples.
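The number of samples per pixel depends on the sampling structure. The sketch below counts samples per picture for three-component Y'CbCr video under common chroma subsampling schemes. An RGB or XYZ picture has the same count as 4:4:4. Rounding chroma dimensions up for odd picture sizes is an assumption of this sketch, and the names are hypothetical.

```python
# (horizontal, vertical) chroma subsampling factors
SUBSAMPLING = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
}


def samples_per_picture(width: int, height: int, sampling: str) -> int:
    """Luma samples plus the samples of two chroma planes."""
    sx, sy = SUBSAMPLING[sampling]
    chroma_w = (width + sx - 1) // sx  # round up for odd sizes
    chroma_h = (height + sy - 1) // sy
    return width * height + 2 * chroma_w * chroma_h


def bits_per_picture(width: int, height: int, sampling: str, bit_depth: int) -> int:
    """Uncompressed size of one picture in bits."""
    return samples_per_picture(width, height, sampling) * bit_depth
```

For a 1920×1080 picture, 4:2:0 gives 3,110,400 samples (1.5 samples per pixel), while 4:4:4 gives 6,220,800 (3 samples per pixel).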

According to some exemplary embodiments, the video encoder (603) may encode and compress the pictures of the source video sequence into an encoded video sequence (643). It may do so in real time or under any other time constraints required by the application. Enforcing the appropriate encoding speed is one function of the controller (650). In some embodiments, the controller (650) may be functionally coupled to, and control, other functional units as described below. The coupling is not depicted for simplicity. Parameters set by the controller (650) may include rate control-related parameters (picture skip, quantizer, lambda value of rate-distortion optimization techniques, ...), picture size, group of pictures (GOP) layout, maximum motion vector search range, and so forth. The controller (650) may be configured with other suitable functions that pertain to a video encoder (603) optimized for a certain system design.

In some exemplary embodiments, the video encoder (603) may be configured to operate in a coding loop. As an oversimplified description, in one example, the coding loop may include:

- a source coder (630), responsible, for example, for creating symbols, such as a symbol stream, based on an input picture to be encoded and on reference pictures;
- a (local) decoder (633) embedded in the video encoder (603).

The decoder (633) reconstructs the symbols to create sample data in the same manner as a (remote) decoder would. It does so even though the embedded decoder (633) processes the video stream coded by the source coder (630) without entropy coding. This is possible because, in the video compression technologies considered in the disclosed subject matter, any compression between symbols and the encoded video bitstream in entropy coding can be lossless. The reconstructed sample stream (sample data) is input to the reference picture memory (634).

Decoding a symbol stream leads to bit-exact results independent of decoder location (local or remote). Therefore, the contents of the reference picture memory (634) are also bit-exact between the local encoder and the remote decoder. In other words, the prediction part of the encoder "sees" as reference picture samples exactly the same sample values that the decoder will "see" when it uses prediction during decoding. This fundamental principle of reference picture synchronization is used to improve coding quality. If synchronization cannot be maintained, for example because of channel errors, drift results.
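The principle of reference picture synchronization described above can be illustrated with a toy 1-D example. Only the quantized levels are transmitted, and the encoder reconstructs from those same levels with the same integer arithmetic. Its local reconstruction therefore matches the remote decoder's exactly, even though both differ from the source. The scalar quantizer and function names below are hypothetical simplifications, not the coding tools of any particular standard.

```python
def quantize(residual, qstep):
    """Lossy step: round each residual to the nearest multiple of qstep (integer math only)."""
    return [(1 if r >= 0 else -1) * ((abs(r) + qstep // 2) // qstep) for r in residual]


def reconstruct(prediction, levels, qstep):
    """Shared by the encoder's embedded (local) decoder and the remote decoder."""
    return [p + level * qstep for p, level in zip(prediction, levels)]


def encode_block(source, prediction, qstep):
    """Return the transmitted levels and the encoder-side reconstruction."""
    residual = [s - p for s, p in zip(source, prediction)]
    levels = quantize(residual, qstep)
    local_recon = reconstruct(prediction, levels, qstep)  # embedded "local" decoder
    return levels, local_recon
```

With source `[100, 104, 97, 120]`, prediction `[98, 98, 98, 98]`, and `qstep=5`, the encoder and the remote decoder both reconstruct `[98, 103, 98, 118]`.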

The operation of the "local" decoder (633) may be the same as that of a "remote" decoder, such as the video decoder (510) already described in detail above in conjunction with FIG. 5. Referring briefly to FIG. 5 again, however, symbols are available, and the entropy coder (645) and the parser (520) can encode/decode symbols to/from an encoded video sequence losslessly. As a result, the entropy decoding parts of the video decoder (510), including the buffer memory (515) and the parser (520), may not be fully implemented in the local decoder (633) in the encoder.

An observation can be made at this point: any decoder technology, except the parsing/entropy decoding that may only be present in a decoder, also necessarily needs to be present, in substantially identical functional form, in a corresponding encoder. For this reason, the disclosed subject matter may at times focus on decoder operation, which also applies to the decoding portion of the encoder. The description of encoder technologies can thus be abbreviated, as they are the inverse of the comprehensively described decoder technologies. A more detailed description of the encoder is provided below only for certain areas or aspects.

During operation, in some exemplary implementations, the source coder (630) may perform motion-compensated predictive coding. This codes an input picture predictively with reference to one or more previously encoded pictures from the video sequence that were designated as "reference pictures." In this manner, the coding engine (632) codes differences (or residuals) in the color channels between pixel blocks of an input picture and pixel blocks of reference pictures that may be selected as prediction references for that input picture. The term "residue" and its adjective form "residual" may be used interchangeably.

The local video decoder (633) may decode encoded video data of pictures that may be designated as reference pictures, based on symbols created by the source coder (630). Operations of the coding engine (632) may advantageously be lossy processes. When the encoded video data is decoded at a video decoder (not shown in FIG. 6), the reconstructed video sequence typically may be a replica of the source video sequence with some errors. The local video decoder (633) replicates the decoding processes that the video decoder may perform on reference pictures, and may cause reconstructed reference pictures to be stored in a reference picture cache (634). In this manner, the video encoder (603) may store local copies of reconstructed reference pictures. These copies have, absent transmission errors, the same content as the reconstructed reference pictures that a far-end (remote) video decoder will obtain.

The predictor (635) may perform prediction searches for the coding engine (632). For a new picture to be encoded, the predictor (635) may search the reference picture memory (634) for data that may serve as an appropriate prediction reference for the new picture. That data may be sample data (as candidate reference pixel blocks) or certain metadata, such as reference picture motion vectors, block shapes, and so on. The predictor (635) may operate on a block-by-block basis to find appropriate prediction references. In some cases, as determined by the search results obtained by the predictor (635), an input picture may have prediction references drawn from multiple reference pictures stored in the reference picture memory (634).
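One common way for a predictor such as (635) to search for a prediction reference is exhaustive block matching with a sum of absolute differences (SAD) cost. The sketch below is a minimal integer-sample full search. The following are assumptions of the sketch:

- the function names;
- the SAD criterion;
- restricting candidates to positions fully inside the reference picture.

Practical encoders use faster search patterns and rate-aware costs.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def block_at(pic, x, y, w, h):
    """Extract the w x h block whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in pic[y:y + h]]


def full_search(cur, ref, x0, y0, w, h, search_range):
    """Exhaustive integer motion search minimizing SAD; returns (best_mv, best_cost)."""
    target = block_at(cur, x0, y0, w, h)
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or x + w > width or y + h > height:
                continue  # keep candidates fully inside the reference picture
            cost = sad(target, block_at(ref, x, y, w, h))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

If the current picture is the reference picture shifted by (2, 1), the search over a ±3 window recovers the motion vector `(2, 1)` with a SAD of 0.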

The controller (650) may manage the coding operations of the source coder (630), including, for example, setting the parameters and subgroup parameters used for encoding the video data.

The outputs of all the aforementioned functional units may be entropy coded in the entropy coder (645). The entropy coder (645) translates the symbols generated by the various functional units into an encoded video sequence. It does so by compressing the symbols losslessly according to technologies such as Huffman coding, variable-length coding, arithmetic coding, and so forth.
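As a small example of variable-length coding, the sketch below produces unsigned Exp-Golomb codewords. This is the ue(v) code that H.265 uses for many header syntax elements, and it gives smaller values shorter codewords. The sketch is illustrative only and the function name is hypothetical. Coded slice data in H.265 is instead entropy coded with context-adaptive binary arithmetic coding (CABAC).

```python
def write_ue(value: int) -> str:
    """Unsigned Exp-Golomb codeword for a non-negative integer, as a '0'/'1' string."""
    if value < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    code = bin(value + 1)[2:]              # binary of value + 1, without the '0b' prefix
    return "0" * (len(code) - 1) + code    # prefix with (length - 1) zeros
```

The values 0 through 4 map to `1`, `010`, `011`, `00100`, and `00101`, so a symbol stream dominated by small values compresses well.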

The transmitter (640) may buffer the encoded video sequence created by the entropy coder (645) in preparation for transmission via a communication channel (660). The channel may be a hardware/software link to a storage device that would store the encoded video data. The transmitter (640) may merge encoded video data from the video encoder (603) with other data to be transmitted, for example, encoded audio data and/or ancillary data streams (sources not shown).

The controller (650) may manage the operation of the video encoder (603). During coding, the controller (650) may assign a certain encoded picture type to each encoded picture, which may affect the coding techniques that can be applied to the respective picture. For example, a picture may often be assigned one of the following picture types:

An intra picture (I picture) may be one that can be encoded and decoded without using any other picture in the sequence as a source of prediction. Some video codecs allow for different types of intra pictures, including, for example, Instantaneous Decoding Refresh ("IDR") pictures. A person of ordinary skill in the art is aware of these variants of I pictures and their respective applications and features.

A predictive picture (P picture) may be one that can be encoded and decoded using intra prediction or inter prediction that uses at most one motion vector and reference index to predict the sample values of each block.

A bi-directionally predictive picture (B picture) may be one that can be encoded and decoded using intra prediction or inter prediction that uses at most two motion vectors and reference indices to predict the sample values of each block. Similarly, multiple-predictive pictures can use more than two reference pictures and the associated metadata to reconstruct a single block.

Source pictures commonly may be subdivided spatially into a plurality of sample coding blocks (for example, blocks of 4×4, 8×8, 4×8, or 16×16 samples each) and coded block by block. Blocks may be coded predictively with reference to other (already coded) blocks, as determined by the coding assignment applied to each block's picture:

- Blocks of I pictures may be coded non-predictively, or predictively with reference to already coded blocks of the same picture (spatial prediction or intra prediction).
- Pixel blocks of P pictures may be coded predictively, via spatial prediction or via temporal prediction with reference to one previously coded reference picture.
- Blocks of B pictures may be coded predictively, via spatial prediction or via temporal prediction with reference to one or two previously coded reference pictures.

Source pictures or intermediate processed pictures may be subdivided into other types of blocks for other purposes. The partitioning of coding blocks and of other types of blocks may or may not follow the same scheme, as described in further detail below.
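The block subdivision described above can be sketched in two ways:

- a fixed grid of equal-size blocks, with cropped blocks at the picture edges;
- a recursive quadtree split driven by a caller-supplied decision function.

Both functions are hypothetical illustrations. The split criterion an encoder actually uses, typically a rate-distortion decision, is outside the scope of this sketch.

```python
def grid_blocks(width, height, block):
    """Yield (x, y, w, h) for block x block tiles covering the picture; edge tiles are cropped."""
    for y in range(0, height, block):
        for x in range(0, width, block):
            yield x, y, min(block, width - x), min(block, height - y)


def quadtree(x, y, size, should_split, min_size=4):
    """Recursively split a square block into four quadrants while should_split(x, y, size) holds.

    Returns the leaf blocks as (x, y, size) tuples in raster order within each split.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree(x + dx, y + dy, half, should_split, min_size)
        return leaves
    return [(x, y, size)]
```

For example, `list(grid_blocks(20, 8, 8))` returns two full 8×8 tiles followed by a cropped 4×8 tile at the right edge.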

The video encoder (603) may perform coding operations according to a predetermined video coding technology or standard, such as ITU-T Recommendation H.265. In its operation, the video encoder (603) may perform various compression operations, including predictive coding operations that exploit temporal and spatial redundancies in the input video sequence. The encoded video data may accordingly conform to the syntax specified by the video coding technology or standard being used.

In some exemplary embodiments, the transmitter (640) may transmit additional data along with the encoded video. The source coder (630) may include such data as part of the encoded video sequence. The additional data may comprise temporal/spatial/SNR enhancement layers, other forms of redundant data such as redundant pictures and slices, SEI messages, VUI parameter set fragments, and so on.

A captured video may be a plurality of source pictures (video pictures) in a temporal sequence. Intra-picture prediction (often abbreviated to intra prediction) exploits spatial correlation within a given picture, whereas inter-picture prediction exploits temporal or other correlation between pictures. For example, a specific picture under encoding/decoding, referred to as the current picture, may be partitioned into blocks. When a block in the current picture is similar to a reference block in a reference picture that was previously coded and is still buffered, the block in the current picture may be coded by a vector referred to as a motion vector. The motion vector points to the reference block in the reference picture and, when multiple reference pictures are in use, may have a third dimension identifying the reference picture.
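As a minimal sketch of how a motion vector addresses a reference block, the Python below copies a block out of one of several buffered reference pictures. It assumes integer-sample motion vectors, a reference index `ref_idx` standing in for the motion vector's third dimension, and clamping to the picture edge for out-of-picture positions; sub-sample interpolation is omitted, and all names are hypothetical.

```python
from typing import List

Picture = List[List[int]]  # samples indexed as picture[y][x]

def motion_compensate(refs: List[Picture], ref_idx: int, x0: int, y0: int,
                      w: int, h: int, mv_x: int, mv_y: int) -> Picture:
    """Fetch the w x h block of refs[ref_idx] displaced by (mv_x, mv_y) from (x0, y0)."""
    ref = refs[ref_idx]  # third "dimension": which buffered reference picture
    height, width = len(ref), len(ref[0])
    pred: Picture = []
    for y in range(h):
        ry = min(max(y0 + y + mv_y, 0), height - 1)  # clamp to the picture edge (assumed)
        row = []
        for x in range(w):
            rx = min(max(x0 + x + mv_x, 0), width - 1)
            row.append(ref[ry][rx])
        pred.append(row)
    return pred

ref = [[10 * y + x for x in range(8)] for y in range(8)]
print(motion_compensate([ref], 0, 2, 2, 2, 2, 1, -1))  # [[13, 14], [23, 24]]
```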

In some example embodiments, a bi-prediction technique may be used in inter-picture prediction. According to such a technique, two reference pictures are used, such as a first reference picture and a second reference picture that both precede the current picture in the video in decoding order (but may be in the past or in the future, respectively, in display order). A block in the current picture may be coded by a first motion vector pointing to a first reference block in the first reference picture and a second motion vector pointing to a second reference block in the second reference picture. The block may be jointly predicted by a combination of the first reference block and the second reference block.
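The text leaves the "combination" of the two reference blocks open. A minimal sketch, assuming the simplest choice of an equal-weight average with round-half-up integer arithmetic (codecs may also use unequal weights), could look like this; the function name is hypothetical.

```python
from typing import List

Block = List[List[int]]

def bi_predict(pred0: Block, pred1: Block) -> Block:
    """Jointly predict a block as the rounded average of two reference blocks."""
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)]
            for r0, r1 in zip(pred0, pred1)]

print(bi_predict([[10, 11]], [[20, 20]]))  # [[15, 16]]
```

A weighted variant would replace the `+ 1 >> 1` average with per-reference weights that sum to a power of two.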

In addition, a merge mode technique may be used in inter-picture prediction to improve coding efficiency.

According to some example embodiments of the present disclosure, predictions such as inter-picture predictions and intra-picture predictions are performed in units of blocks. For example, a picture in a sequence of video pictures is partitioned into coding tree units (CTUs) for compression, and the CTUs in a picture may have the same size, such as 64×64 pixels, 32×32 pixels, or 16×16 pixels. In general, a CTU may include three parallel coding tree blocks (CTBs): one luma CTB and two chroma CTBs. Each CTU may be recursively split in a quadtree into one or more coding units (CUs). For example, a CTU of 64×64 pixels may be split into one CU of 64×64 pixels or into four CUs of 32×32 pixels. Each of one or more of the 32×32 blocks may be further split into four CUs of 16×16 pixels. In some example embodiments, each CU may be analyzed during encoding to determine, among various prediction types, the prediction type for the CU, such as an inter prediction type or an intra prediction type. The CU may be split into one or more prediction units (PUs) depending on temporal and/or spatial predictability. Generally, each PU includes a luma prediction block (PB) and two chroma PBs. In an embodiment, prediction operations in coding (encoding/decoding) are performed in units of prediction blocks. The split of a CU into PUs (or into PBs of different color channels) may be performed in various spatial patterns. A luma or chroma PB, for example, may include a matrix of values (e.g., luma values) for samples, such as 8×8 pixels, 16×16 pixels, 8×16 pixels, 16×8 pixels, and the like.
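The recursive quadtree split of a CTU into CUs can be sketched as follows. The split decision is passed in as a callback, since in an encoder it would come from analysis such as rate-distortion search; the callback, the minimum CU size, and the function names are assumptions for illustration only.

```python
from typing import Callable, List, Tuple

CU = Tuple[int, int, int]  # (x, y, size) in luma samples, relative to the CTU

def split_ctu(x: int, y: int, size: int, min_size: int,
              should_split: Callable[[int, int, int], bool]) -> List[CU]:
    """Recursively quadtree-split a square CTU; returns leaf CUs in z-scan order."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    cus: List[CU] = []
    for dy in (0, half):          # top row of quadrants first, then bottom row
        for dx in (0, half):
            cus += split_ctu(x + dx, y + dy, half, min_size, should_split)
    return cus

def decide(x: int, y: int, size: int) -> bool:
    # Hypothetical decision: split the 64x64 CTU, then only its top-left 32x32 CU.
    return size == 64 or (size == 32 and x == 0 and y == 0)

cus = split_ctu(0, 0, 64, 16, decide)
print(len(cus), cus[:5])  # 7 [(0, 0, 16), (16, 0, 16), (0, 16, 16), (16, 16, 16), (32, 0, 32)]
```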

Figure 7 shows a diagram of a video encoder (703) according to another example embodiment of the present disclosure. The video encoder (703) is configured to receive a processing block (e.g., a prediction block) of sample values within a current video picture in a sequence of video pictures, and to encode the processing block into a coded picture that is part of a coded video sequence. In the example, the video encoder (703) may be used in place of the video encoder (403) in the example of Figure 4.

For example, the video encoder (703) receives a matrix of sample values for a processing block, such as a prediction block of 8×8 samples. The video encoder (703) then determines whether the processing block is best coded using intra mode, inter mode, or bi-prediction mode, using, for example, rate-distortion optimization (RDO). When the processing block is determined to be coded in intra mode, the video encoder (703) may use an intra prediction technique to encode the processing block into the coded picture; and when the processing block is determined to be coded in inter mode or bi-prediction mode, the video encoder (703) may use an inter prediction or bi-prediction technique, respectively, to encode the processing block into the coded picture. In some example embodiments, a merge mode may be used as a submode of inter-picture prediction, in which the motion vector is derived from one or more motion vector predictors without the benefit of a coded motion vector component outside the predictors. In some other example embodiments, a motion vector component applicable to the subject block may be present. Accordingly, the video encoder (703) may include components not explicitly shown in Figure 7, such as a mode decision module for determining the prediction mode of the processing blocks.
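The text names RDO without giving a cost function. A minimal sketch, assuming the common Lagrangian form J = D + λ·R with sum-of-squared-errors distortion, shows how such a mode decision picks among intra, inter, and bi-prediction; the distortion and rate numbers are made up for illustration and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

def sse(a: List[List[int]], b: List[List[int]]) -> int:
    """Sum of squared errors between two equally sized blocks (a common distortion D)."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def choose_mode(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the mode whose Lagrangian cost J = D + lam * R is smallest.

    candidates maps a mode name to (distortion D, rate R in bits).
    """
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])

# Hypothetical measurements for one processing block.
trial = {"intra": (1200.0, 40.0), "inter": (900.0, 65.0), "bi": (850.0, 90.0)}
print(choose_mode(trial, lam=10.0))  # inter  (costs 1600, 1550, 1750)
print(choose_mode(trial, lam=1.0))   # bi     (costs 1240, 965, 940)
```

A larger λ favors cheaper-to-signal modes; a smaller λ favors lower distortion.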

In the example of Figure 7, the video encoder (703) includes an inter encoder (730), an intra encoder (722), a residual calculator (723), a switch (726), a residual encoder (724), a general controller (721), and an entropy encoder (725), coupled together as shown in the example arrangement of Figure 7.

The inter encoder (730) is configured to receive the samples of the current block (e.g., a processing block), compare the block to one or more reference blocks in reference pictures (e.g., blocks in previous and later pictures in display order), generate inter prediction information (e.g., a description of redundant information according to the inter coding technique, motion vectors, merge mode information), and calculate inter prediction results (e.g., a predicted block) based on the inter prediction information using any suitable technique. In some examples, the reference pictures are decoded reference pictures that are decoded based on the encoded video information using the decoding unit 633 embedded in the example encoder 620 of Figure 6 (shown as the residual decoder 728 of Figure 7, as described in further detail below).

The intra encoder (722) is configured to receive the samples of the current block (e.g., a processing block), compare the block to blocks already coded in the same picture, generate quantized coefficients after transform, and in some cases also generate intra prediction information (e.g., intra prediction direction information according to one or more intra coding techniques). The intra encoder (722) may calculate intra prediction results (e.g., a predicted block) based on the intra prediction information and reference blocks in the same picture.

The general controller (721) may be configured to determine general control data and to control other components of the video encoder (703) based on the general control data. In an example, the general controller (721) determines the prediction mode of a block and provides a control signal to the switch (726) based on the prediction mode. For example, when the prediction mode is the intra mode, the general controller (721) controls the switch (726) to select the intra mode result for use by the residual calculator (723), and controls the entropy encoder (725) to select the intra prediction information and include it in the bitstream; and when the prediction mode for the block is the inter mode, the general controller (721) controls the switch (726) to select the inter prediction result for use by the residual calculator (723), and controls the entropy encoder (725) to select the inter prediction information and include it in the bitstream.

The residual calculator (723) may be configured to calculate the difference (residual data) between the received block and the prediction result for the block selected from the intra encoder (722) or the inter encoder (730). The residual encoder (724) may be configured to encode the residual data to generate transform coefficients. For example, the residual encoder (724) may be configured to convert the residual data from the spatial domain to the frequency domain to generate the transform coefficients. The transform coefficients are then subjected to quantization to obtain quantized transform coefficients. In various example embodiments, the video encoder (703) also includes a residual decoder (728). The residual decoder (728) is configured to perform an inverse transform and generate decoded residual data. The decoded residual data may be suitably used by the intra encoder (722) and the inter encoder (730). For example, the inter encoder (730) may generate decoded blocks based on the decoded residual data and the inter prediction information, and the intra encoder (722) may generate decoded blocks based on the decoded residual data and the intra prediction information. The decoded blocks are suitably processed to generate decoded pictures, which may be buffered in a memory circuit (not shown) and used as reference pictures.
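The residual path described above (residual, spatial-to-frequency transform, quantization, and the matching inverse steps in the residual decoder) can be sketched in pure Python. This assumes a square block, an orthonormal 2-D DCT-II as the transform, and a uniform scalar quantizer with step `qstep`; actual codecs use integer transforms and QP-derived scaling, and the function names here are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis function."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]

def encode_residual(orig: Matrix, pred: Matrix, qstep: float) -> List[List[int]]:
    """Residual -> 2-D DCT (spatial to frequency domain) -> uniform quantization."""
    residual = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    c = dct_matrix(len(residual))
    coeffs = matmul(matmul(c, residual), transpose(c))
    return [[round(v / qstep) for v in row] for row in coeffs]

def decode_residual(levels: List[List[int]], qstep: float) -> Matrix:
    """Inverse quantization, then inverse 2-D DCT (frequency to spatial domain)."""
    coeffs = [[lv * qstep for lv in row] for row in levels]
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

levels = encode_residual([[104] * 4 for _ in range(4)], [[100] * 4 for _ in range(4)], 4.0)
print(levels[0])  # [4, 0, 0, 0]: a flat residual leaves only the DC coefficient
```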

The entropy encoder (725) may be configured to format the bitstream to include the encoded block and to perform entropy coding. The entropy encoder (725) is configured to include various information in the bitstream. For example, the entropy encoder (725) may be configured to include the general control data, the selected prediction information (e.g., intra prediction information or inter prediction information), the residual information, and other suitable information in the bitstream. When a block is coded in the merge submode of either the inter mode or the bi-prediction mode, there may be no residual information.

Figure 8 shows a diagram of an example video decoder (810) according to another embodiment of the present disclosure. The video decoder (810) is configured to receive coded pictures that are part of a coded video sequence and to decode the coded pictures to generate reconstructed pictures. In an example, the video decoder (810) may be used in place of the video decoder (410) in the example of Figure 4.

In the example of Figure 8, the video decoder (810) includes an entropy decoder (871), an inter decoder (880), a residual decoder (873), a reconstruction module (874), and an intra decoder (872), coupled together as shown in the example arrangement of Figure 8.

The entropy decoder (871) may be configured to reconstruct, from the coded picture, certain symbols that represent the syntax elements of which the coded picture is made up. Such symbols may include, for example, the mode in which a block is coded (e.g., intra mode, inter mode, bi-prediction mode, merge submode, or another submode), prediction information (e.g., intra prediction information or inter prediction information) that can identify certain samples or metadata used for prediction by the intra decoder (872) or the inter decoder (880), and residual information in the form of, for example, quantized transform coefficients. In an example, when the prediction mode is the inter or bi-prediction mode, the inter prediction information is provided to the inter decoder (880); and when the prediction type is the intra prediction type, the intra prediction information is provided to the intra decoder (872). The residual information may be subjected to inverse quantization and provided to the residual decoder (873).

The inter decoder (880) may be configured to receive the inter prediction information and to generate inter prediction results based on it.

The intra decoder (872) may be configured to receive the intra prediction information and to generate prediction results based on it.

The residual decoder (873) may be configured to perform inverse quantization to extract dequantized transform coefficients, and to process the dequantized transform coefficients to convert the residual from the frequency domain to the spatial domain. The residual decoder (873) may also use certain control information (including the quantizer parameter (QP)), which may be provided by the entropy decoder (871) (the data path is not depicted, as this is only low-volume control information).

The reconstruction module (874) may be configured to combine, in the spatial domain, the residual output by the residual decoder (873) and the prediction results (output by the inter or intra prediction module, as the case may be) to form a reconstructed block, which forms part of the reconstructed picture, which in turn is part of the reconstructed video. Note that other suitable operations, such as a deblocking operation, may also be performed to improve visual quality.
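The spatial-domain combination performed by the reconstruction module can be sketched as below. Rounding the sum and clipping it to the valid sample range [0, 2^bit_depth - 1] are assumptions (common practice, not stated in the text), and deblocking or other in-loop filtering is left out; the function name is hypothetical.

```python
from typing import List

def reconstruct(pred: List[List[int]], residual: List[List[float]],
                bit_depth: int = 8) -> List[List[int]]:
    """Add the decoded residual to the prediction in the spatial domain and clip."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(round(p + r), 0), max_val) for p, r in zip(rp, rr)]
            for rp, rr in zip(pred, residual)]

print(reconstruct([[250, 3, 100]], [[10, -5, 4.4]]))  # [[255, 0, 104]]
```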

Note that the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) may be implemented using any suitable technique. In some example embodiments, the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) may be implemented using one or more integrated circuits. In another embodiment, the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) may be implemented using one or more processors that execute software instructions.

Returning to the intra prediction process: samples in a block (e.g., a luma or chroma prediction block, or a coding block if it is not further split into prediction blocks) are predicted by samples of a neighboring line, a next-neighboring line, one or more other lines, or combinations thereof, to generate a prediction block. The residual between the actual block being coded and the prediction block may then be processed by a transform followed by quantization. Various intra prediction modes may be made available, and parameters related to intra mode selection, as well as other parameters, may be signaled in the bitstream. The various intra prediction modes may, for example, concern the line position or positions used for predicting samples, the direction along which prediction samples are selected from the prediction line or lines, and other special intra prediction modes.

For example, a set of intra prediction modes (interchangeably referred to as "intra modes") may include a predefined number of directional intra prediction modes. As described above in relation to the example implementation of Figure 1, these intra prediction modes may correspond to a predefined number of directions along which out-of-block samples are selected as predictions for the samples being predicted in a particular block. In another particular example implementation, eight (8) main directional modes, corresponding to angles from 45 to 207 degrees relative to the horizontal axis, may be supported and predefined.

In some other implementations of intra prediction, to further exploit more varieties of spatial redundancy in directional textures, the directional intra modes may be extended to an angle set with finer granularity. For example, the 8-angle implementation above may be configured to provide eight nominal angles, referred to as V_PRED, H_PRED, D45_PRED, D135_PRED, D113_PRED, D157_PRED, D203_PRED, and D67_PRED, as shown in Figure 9, and each nominal angle may be expanded into a predefined number (e.g., 7) of finer angles. With such an extension, a larger total number of directional angles (e.g., 56 in this example) may be available for intra prediction, corresponding to the same number of predefined directional intra modes. A prediction angle may be represented by a nominal intra angle plus an angle delta. For the particular example above with 7 finer angular directions per nominal angle, the angle delta may range from -3 to 3 multiplied by a step size of 3 degrees.
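The angle arithmetic above (nominal angle plus delta × 3 degrees, with deltas -3 to 3) can be checked with a short sketch. The degree value of each nominal angle is read off its mode name (e.g., D67_PRED as 67 degrees, V_PRED as 90, H_PRED as 180), which is an assumption about the naming convention; the dictionary and function names are hypothetical.

```python
# Nominal angles in degrees from the horizontal axis, read off the mode names.
NOMINAL_ANGLES = {
    "V_PRED": 90, "H_PRED": 180, "D45_PRED": 45, "D135_PRED": 135,
    "D113_PRED": 113, "D157_PRED": 157, "D203_PRED": 203, "D67_PRED": 67,
}
ANGLE_STEP = 3                # degrees per delta step
ANGLE_DELTAS = range(-3, 4)   # -3..3: seven fine angles per nominal angle

def prediction_angle(mode: str, delta: int) -> int:
    """Prediction angle = nominal angle + delta * step."""
    if delta not in ANGLE_DELTAS:
        raise ValueError("angle delta must lie in [-3, 3]")
    return NOMINAL_ANGLES[mode] + delta * ANGLE_STEP

ALL_ANGLES = sorted(prediction_angle(m, d) for m in NOMINAL_ANGLES for d in ANGLE_DELTAS)
print(len(ALL_ANGLES), ALL_ANGLES[:7])  # 56 [36, 39, 42, 45, 48, 51, 54]
```

Because neighboring nominal angles are at least 22 degrees apart while each family spans only ±9 degrees, the 56 angles are all distinct.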

In some implementations, as an alternative to or in addition to the directional intra modes above, a predefined number of non-directional intra prediction modes may also be predefined and made available. For example, five non-directional intra modes, referred to as smooth intra prediction modes, may be specified. These non-directional intra prediction modes may specifically be referred to as the DC, PAETH, SMOOTH, SMOOTH_V, and SMOOTH_H intra modes. Figure 10 illustrates the prediction of samples of a particular block under these example non-directional modes. As an example, Figure 10 shows a 4×4 block 1002 being predicted by samples from a top neighboring line and/or a left neighboring line. A particular sample 1010 in block 1002 may correspond to the sample 1004 directly above sample 1010 in the top neighboring line of block 1002, the top-left sample 1006 of sample 1010 at the intersection of the top and left neighboring lines, and the sample 1008 directly to the left of sample 1010 in the left neighboring line of block 1002. For the example DC intra prediction mode, the average of the left neighboring sample 1008 and the above neighboring sample 1004 may be used as the predictor of sample 1010. For the example PAETH intra prediction mode, the top, left, and top-left reference samples 1004, 1008, and 1006 may be fetched, and whichever of these three reference samples is closest to (top + left - top-left) may be set as the predictor of sample 1010. For the example SMOOTH_V intra prediction mode, sample 1010 may be predicted by quadratic interpolation in the vertical direction of the top-left neighboring sample 1006 and the left neighboring sample 1008. For the example SMOOTH_H intra prediction mode, sample 1010 may be predicted by quadratic interpolation in the horizontal direction of the top-left neighboring sample 1006 and the top neighboring sample 1004. For the example SMOOTH intra prediction mode, sample 1010 may be predicted by the average of the quadratic interpolations in the vertical and horizontal directions. The non-directional intra mode implementations above are merely non-limiting examples. Other neighboring lines, other non-directional selections of samples, and other ways of combining predicting samples to predict a particular sample in a prediction block are also contemplated.
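The PAETH rule described above (pick whichever of top, left, and top-left is closest to top + left - top-left) can be sketched as follows. The tie-break order left, then top, then top-left is an assumption, since the text does not specify how ties are resolved; the function names are hypothetical.

```python
from typing import List

def paeth_pixel(top: int, left: int, top_left: int) -> int:
    """Return the reference sample closest to the gradient estimate top + left - top_left."""
    base = top + left - top_left
    # min() keeps the first of equally close candidates, so ties resolve
    # in the order left, top, top-left (an assumed tie-break order).
    return min((left, top, top_left), key=lambda v: abs(base - v))

def paeth_block(above: List[int], left_col: List[int], top_left: int) -> List[List[int]]:
    """PAETH prediction for a block of width len(above) and height len(left_col)."""
    return [[paeth_pixel(tv, lv, top_left) for tv in above] for lv in left_col]

# Flat top row equal to the corner sample: every row copies its left neighbor.
print(paeth_block([10, 10, 10, 10], [20, 30, 40, 50], 10)[2])  # [40, 40, 40, 40]
```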

The encoder's selection of a particular intra prediction mode from the directional or non-directional modes above may be signaled in the bitstream at various coding levels (picture, slice, block, unit, etc.). In some example implementations, the 8 example nominal directional modes together with the 5 non-angular smooth modes (13 options in total) may be signaled first. Then, if the signaled mode is one of the 8 nominal angular intra modes, an index is further signaled to indicate the selected angle delta relative to the corresponding signaled nominal angle. In some other example implementations, all intra prediction modes may be indexed together for signaling (e.g., 56 directional modes plus 5 non-directional modes, yielding 61 intra prediction modes).

In some example implementations, the 56 example directional intra prediction modes (or another number of them) may be implemented with a unified directional predictor that projects each sample of the block to a reference sub-sample location and interpolates the reference samples with a 2-tap bilinear filter.
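A minimal sketch of the unified directional predictor, restricted to angles between 45 and 90 degrees so that every sample projects onto the row above the block: each sample (x, y) maps to the sub-sample position x + (y + 1)/tan(angle) on that row, and the two nearest reference samples are blended with a 2-tap bilinear filter. The floating-point positions, Python's `round()`, and clamping to the last available reference sample are simplifications (real codecs use fixed-point positions), and the function name is hypothetical.

```python
import math
from typing import List

def directional_predict(above: List[int], w: int, h: int, angle_deg: float) -> List[List[int]]:
    """Predict a w x h block from the row above it, for angles in [45, 90] degrees.

    above[i] is the reconstructed sample at column i of the row just above the
    block (i = 0 .. w + h), so 90 degrees copies straight down and 45 degrees
    reads above[x + y + 1].
    """
    if not 45 <= angle_deg <= 90:
        raise ValueError("this sketch only handles 45..90 degrees (top-row projection)")
    dx_per_row = 1.0 / math.tan(math.radians(angle_deg))  # horizontal shift per row
    pred = []
    for y in range(h):
        row = []
        for x in range(w):
            pos = x + (y + 1) * dx_per_row     # sub-sample position on the above row
            i = math.floor(pos)
            frac = pos - i
            a = above[min(i, len(above) - 1)]
            b = above[min(i + 1, len(above) - 1)]
            row.append(round((1 - frac) * a + frac * b))  # 2-tap bilinear filter
        pred.append(row)
    return pred

print(directional_predict(list(range(0, 90, 10)), 4, 4, 45)[0])  # [10, 20, 30, 40]
```

Angles outside this range would project onto the left column (or both edges) in the same way.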

In some implementations, to capture the decaying spatial correlation with references on the edges, additional filter modes referred to as FILTER INTRA modes may be designed. For these modes, in addition to out-of-block samples, predicted samples within the block may be used as intra prediction reference samples for some patches within the block. These modes may, for example, be predefined and made available for intra prediction of at least luma blocks (or only luma blocks). A predefined number (e.g., five) of filter intra modes may be pre-designed, each represented by a set of n-tap filters (e.g., 7-tap filters) reflecting the correlation between the samples in, for example, a 4×2 patch and the n neighbors adjacent to it. In other words, the weighting factors of an n-tap filter may be position dependent. Taking an 8×8 block, 4×2 patches, and 7-tap filtering as an example, as shown in Figure 11, the 8×8 block 1102 may be split into eight 4×2 patches, denoted B0, B1, B2, B3, B4, B5, B6, and B7 in Figure 11. For each patch, its 7 neighbors (indicated by R0 to R7 in Figure 11) may be used to predict the samples in the current patch. For patch B0, all of the neighbors may already be reconstructed. For other patches, however, some of the neighbors lie in the current block and thus may not yet be reconstructed; the predicted values of those immediate neighbors are then used as references instead. For example, not all of the neighbors of patch B7 indicated in Figure 11 are reconstructed, so the prediction samples of those neighbors are used instead.
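To make the B0 versus B7 distinction concrete, the sketch below walks the eight 4×2 patches of an 8×8 block and reports which neighbors fall inside the block (and so must come from predicted rather than reconstructed samples). It assumes the 7 neighbors are the top-left corner sample, the four samples above, and the two samples to the left of each patch, and that B0 to B7 are numbered in raster order; Figure 11 may arrange them differently. The filter taps themselves are not given in the text, so they are not modeled, and the function names are hypothetical.

```python
from typing import Iterator, List, Tuple

Pos = Tuple[int, int]  # (x, y) relative to the block's top-left sample

def patch_neighbors(px: int, py: int, pw: int = 4, ph: int = 2) -> List[Pos]:
    """Assumed 7 neighbors of a 4x2 patch: top-left corner, 4 above, 2 to the left."""
    return ([(px - 1, py - 1)]
            + [(px + i, py - 1) for i in range(pw)]
            + [(px - 1, py + j) for j in range(ph)])

def filter_intra_patches(bw: int = 8, bh: int = 8, pw: int = 4,
                         ph: int = 2) -> Iterator[Tuple[int, Pos, List[Pos]]]:
    """Yield (patch index, patch origin, neighbors inside the block) in raster order."""
    idx = 0
    for py in range(0, bh, ph):
        for px in range(0, bw, pw):
            # Neighbors lie above or to the left, so "inside" just means x >= 0 and y >= 0.
            inside = [(x, y) for (x, y) in patch_neighbors(px, py, pw, ph)
                      if x >= 0 and y >= 0]
            yield idx, (px, py), inside
            idx += 1

for idx, origin, inside in filter_intra_patches():
    print(f"B{idx} at {origin}: {len(inside)} of 7 neighbors not yet reconstructed")
```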

In some implementations of intra prediction, one color component may be predicted using one or more other color components. A color component may be any component of the YCrCb, RGB, XYZ, or a similar color space. For example, a chroma component (e.g., a chroma block) may be predicted from a luma component (e.g., luma reference samples); this is referred to as chroma from luma, or CfL. In some example implementations, cross-color prediction may be allowed only from luma to chroma. For example, the chroma samples in a chroma block may be modeled as a linear function of the coincident reconstructed luma samples. CfL prediction may be implemented as follows:

$$\mathrm{CfL}(\alpha) = \alpha \times L^{AC} + DC \qquad (1)$$

where $L^{AC}$ denotes the AC contribution of the luma component, α denotes the parameter of the linear model, and DC denotes the DC contribution of the chroma component. For example, the AC components are obtained for each sample of the block, whereas the DC component is obtained for the entire block. Specifically, the reconstructed luma samples may be subsampled to the chroma resolution, and the average luma value (the DC of luma) may then be subtracted from each luma value to form the AC contribution of luma. The AC contribution of luma is then used in the linear model of Equation (1) to predict the AC values of the chroma component. To approximate or predict the chroma AC component from the luma AC contribution without requiring the decoder to calculate the scaling parameter, an example CfL implementation may determine the parameter α based on the original chroma samples and signal it in the bitstream. This reduces decoder complexity and yields more precise predictions. As for the DC contribution of the chroma component, in some example implementations it may be computed using the intra DC mode within the chroma component.
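Equation (1) can be sketched directly: subsample the reconstructed luma to chroma resolution, subtract its average to obtain $L^{AC}$, scale by the signaled α, and add the chroma DC. The sketch assumes 4:2:0 sampling with 2×2 averaging as the subsampling filter, treats `alpha` and `chroma_dc` as already known (signaled α and the chroma DC-mode value), and ignores fixed-point arithmetic and clipping; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def subsample_420(luma: Matrix) -> Matrix:
    """Average each 2x2 luma group down to chroma resolution (4:2:0 assumed)."""
    return [[(luma[2 * y][2 * x] + luma[2 * y][2 * x + 1]
              + luma[2 * y + 1][2 * x] + luma[2 * y + 1][2 * x + 1]) / 4
             for x in range(len(luma[0]) // 2)]
            for y in range(len(luma) // 2)]

def cfl_predict(recon_luma: Matrix, alpha: float, chroma_dc: float) -> Matrix:
    """Equation (1): CfL(alpha) = alpha * L_AC + DC for every chroma sample."""
    sub = subsample_420(recon_luma)
    luma_dc = sum(sum(row) for row in sub) / sum(len(row) for row in sub)
    # L_AC is each subsampled luma value minus the block's luma average.
    return [[alpha * (v - luma_dc) + chroma_dc for v in row] for row in sub]

luma = [[10, 10, 30, 30], [10, 10, 30, 30], [50, 50, 70, 70], [50, 50, 70, 70]]
print(cfl_predict(luma, 0.5, 128.0))  # [[113.0, 123.0], [133.0, 143.0]]
```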

In some example implementations of reference lines, multi-line intra prediction may be used. In these implementations, more than one reference line is available for intra prediction, and the encoder decides and signals which reference line is used to generate the intra predictor. The reference line index may be signaled before the intra prediction modes, and only the most probable modes are allowed when a nonzero reference line index is signaled. Figure 15 depicts an example of four reference lines (reference line 0 to reference line 3) along with the top-left reference sample, where each reference line is composed of six segments, namely segments A to F (indicated by 1502-1512). In addition, segments A and F may be padded with the closest samples from segments B and E, respectively.

The residual of an intra-predicted or inter-predicted block may then be transformed, followed by quantization of the transform coefficients. To perform the transform, intra-coded and inter-coded blocks may be further partitioned into multiple transform blocks before the transform (sometimes interchangeably referred to as "transform units", even though the term "unit" is normally used for a collection of the three color channels; for example, a "coding unit" may include a luma coding block and chroma coding blocks). In some implementations, a maximum partitioning depth of the coded blocks (or prediction blocks) may be specified (the term "coded block" may be used interchangeably with "coding block"). For example, such partitioning may not exceed 2 levels. The division of a prediction block into transform blocks may be handled differently for intra-predicted and inter-predicted blocks. In some implementations, however, the division may be similar for both.

In some exemplary implementations, for intra-coded blocks, transform partitioning may be performed such that all transform blocks have the same size, and the transform blocks are coded in raster scan order. Figure 12 shows an example of such transform block partitioning for an intra-coded block: coding block 1202 is partitioned, via an intermediate-level quadtree split 1204, into 16 transform blocks of equal size, as shown at 1206. An exemplary raster scan order for coding is indicated by the ordered arrows in Figure 12.

In some exemplary implementations, for inter-coded blocks, transform unit partitioning may be performed recursively, with the partitioning depth limited to a predetermined number of levels (e.g., two levels). As shown in Figure 13, the split may stop, or continue recursively, for any sub-partition at any level. Specifically, Figure 13 shows an example in which block 1302 is split into four quadtree sub-blocks 1304, one of which is further split into four second-level transform blocks, while partitioning of the other sub-blocks stops after the first level, yielding a total of seven transform blocks of two different sizes. An exemplary raster scan order for coding is again indicated by the ordered arrows in Figure 13. Although Figure 13 shows an exemplary quadtree split of square transform blocks of up to two levels, in some more general implementations the transform partitioning may support 1:1 (square), 1:2/2:1, and 1:4/4:1 transform block shapes, with sizes ranging from 4×4 to 64×64. In some exemplary implementations, if the coding block is smaller than or equal to 64×64, transform block partitioning may be applied only to the luma component (in other words, under this condition the chroma transform block may be the same as the coding block). Otherwise, if the coding block width or height is greater than 64, the luma and chroma coding blocks may be implicitly split into multiple min(W, 64)×min(H, 64) and min(W, 32)×min(H, 32) transform blocks, respectively.
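The implicit split of an oversized coding block can be sketched as follows. This is a minimal Python sketch, not the disclosed implementation: the helper name `implicit_tiles` is hypothetical, block dimensions are assumed to be multiples of the resulting tile size (true for power-of-two sizes), and for chroma the caller is assumed to pass chroma-sample dimensions (e.g., half the luma dimensions for 4:2:0).

```python
def implicit_tiles(width, height, max_size):
    """Split a width x height block into min(W, max_size) x min(H, max_size) tiles.

    Returns (x, y, w, h) tuples in raster order. Assumes width and height are
    multiples of the resulting tile size, as with power-of-two block sizes.
    """
    tile_w = min(width, max_size)
    tile_h = min(height, max_size)
    return [(x, y, tile_w, tile_h)
            for y in range(0, height, tile_h)   # rows of tiles, top to bottom
            for x in range(0, width, tile_w)]   # tiles within a row, left to right


# Example: a 128x128 luma coding block (limit 64) and its assumed 4:2:0 chroma
# counterpart of 64x64 chroma samples (limit 32).
luma = implicit_tiles(128, 128, 64)
chroma = implicit_tiles(64, 64, 32)
print(len(luma), luma[0], len(chroma), chroma[-1])
```

Both components end up with four transform blocks here, so the luma and chroma tiles stay spatially aligned.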

In some exemplary implementations, Figure 16 shows another exemplary scheme for partitioning a coding block or prediction block into transform blocks. Instead of recursive transform partitioning, a predetermined set of partition types may be applied to a coding block according to its transform type. In the particular example of Figure 16, one of six exemplary partition types may be applied to split a coding block into various numbers of transform blocks. This scheme may be applied to either coding blocks or prediction blocks. In this disclosure, the term "partition type" generally refers to the manner in which a block (e.g., a prediction block or a coding block) is partitioned, and may refer to a "transform partition type", a "prediction block partition type", or a "coding block partition type". Concepts described under "transform partition type" apply equally to "coding block partition type", and vice versa.

In more detail, the partitioning scheme of Figure 16 provides up to six partition types for any given transform type. In this scheme, each coding block or prediction block may be assigned a transform type based on, for example, rate-distortion cost. In one example, the partition type assigned to a coding block or prediction block may be determined based on that block's transform type. A particular partition type corresponds to a transform block split size and pattern, as illustrated by the partition types in Figure 16. The correspondence between the transform types and the partition types may be predefined. An exemplary correspondence is shown below, where the uppercase labels denote the partition types (each implying a transform size) that may be assigned to a coding block or prediction block based on rate-distortion cost:

• PARTITION_NONE: assigns a transform size equal to the block size.

• PARTITION_SPLIT: assigns a transform size with half the width and half the height of the block.

• PARTITION_HORZ: assigns a transform size with the same width as the block and half its height.

• PARTITION_VERT: assigns a transform size with half the width of the block and the same height.

• PARTITION_HORZ4: assigns a transform size with the same width as the block and one quarter of its height.

• PARTITION_VERT4: assigns a transform size with one quarter of the width of the block and the same height.
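The list above maps each partition type to a uniform transform size, which the following minimal Python sketch encodes as width and height divisors. The dictionary and the helper name `transform_layout` are illustrative assumptions, not identifiers from any codec.

```python
# (width divisor, height divisor) implied by each partition type listed above.
PARTITION_DIVISORS = {
    "PARTITION_NONE": (1, 1),
    "PARTITION_SPLIT": (2, 2),
    "PARTITION_HORZ": (1, 2),
    "PARTITION_VERT": (2, 1),
    "PARTITION_HORZ4": (1, 4),
    "PARTITION_VERT4": (4, 1),
}


def transform_layout(partition_type, block_w, block_h):
    """Return (tx_w, tx_h, num_tx_blocks) for a uniform split of the block."""
    div_w, div_h = PARTITION_DIVISORS[partition_type]
    return block_w // div_w, block_h // div_h, div_w * div_h


for p in PARTITION_DIVISORS:
    print(p, transform_layout(p, 32, 16))
```

For a 32×16 block, PARTITION_VERT4 yields four 8×16 transform blocks and PARTITION_HORZ4 yields four 32×4 transform blocks.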

In the examples above, each partition type shown in Figure 16 yields transform blocks of a uniform size. This is merely an example, not a limitation; in some other implementations, mixed transform block sizes may be used for the transform blocks within a particular partition type (or pattern).

A primary transform may then be applied to each transform block obtained as described above. The primary transform essentially moves the residual in the transform block from the spatial domain to the frequency domain. In some implementations of the primary transform, to support the exemplary extended coding block partitions described above, multiple transform sizes (ranging from 4 points to 64 points in each of the two dimensions) and transform shapes (square, and rectangular with aspect ratios of 2:1/1:2 and 4:1/1:4) may be allowed.

Turning to the primary transform itself, in some exemplary implementations the 2-D transform process may use hybrid transform kernels (e.g., a hybrid transform kernel may consist of a different 1-D transform for each dimension of the coded residual transform block). Exemplary 1-D transform kernels may include, but are not limited to: a) 4-point, 8-point, 16-point, 32-point, and 64-point DCT-2; b) 4-point, 8-point, and 16-point asymmetric DST (DST-4, DST-7) and their flipped versions; and c) 4-point, 8-point, 16-point, and 32-point identity transforms. The transform kernel for each dimension may be selected based on a rate-distortion (RD) criterion. For example, the basis functions of the DCT-2 and asymmetric DSTs that may be implemented are listed in Table 1.

Table 1: Exemplary primary transform basis functions (DCT-2, DST-4, and DST-7 for N-point inputs)
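The entries of Table 1 are not reproduced here. As a stand-in, the minimal Python sketch below builds the three N-point bases from their commonly used orthonormal definitions and checks that each is orthonormal. The formulas (in the docstrings) are an assumption and may differ in scaling or indexing from the table itself; the function names are hypothetical.

```python
import math


def dct2(n):
    """N-point DCT-2: T[i][j] = w0 * sqrt(2/N) * cos(pi * i * (2j + 1) / (2N))."""
    return [[(math.sqrt(0.5) if i == 0 else 1.0) * math.sqrt(2.0 / n)
             * math.cos(math.pi * i * (2 * j + 1) / (2 * n))
             for j in range(n)] for i in range(n)]


def dst4(n):
    """N-point DST-4: T[i][j] = sqrt(2/N) * sin(pi * (2i + 1) * (2j + 1) / (4N))."""
    return [[math.sqrt(2.0 / n)
             * math.sin(math.pi * (2 * i + 1) * (2 * j + 1) / (4 * n))
             for j in range(n)] for i in range(n)]


def dst7(n):
    """N-point DST-7: T[i][j] = sqrt(4/(2N+1)) * sin(pi * (2i + 1) * (j + 1) / (2N + 1))."""
    return [[math.sqrt(4.0 / (2 * n + 1))
             * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for i in range(n)]


def is_orthonormal(t, tol=1e-9):
    """True if the rows of t are orthonormal, i.e. T * T^T equals the identity."""
    n = len(t)
    return all(abs(sum(t[a][k] * t[b][k] for k in range(n)) - (a == b)) <= tol
               for a in range(n) for b in range(n))


print([is_orthonormal(f(8)) for f in (dct2, dst4, dst7)])
print([round(v, 3) for v in dst7(4)[0]])   # first DST-7 basis function
```

The first DST-7 basis function starts small and rises with the sample index j, unlike the flat first DCT-2 basis function; this asymmetry is what makes the DSTs suited to residuals that grow away from the prediction boundary.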

In some exemplary implementations, the availability of hybrid transform kernels for a particular primary transform implementation may depend on the transform block size and the prediction mode. Exemplary dependencies are listed in Table 2. For the chroma components, transform type selection may be performed implicitly. For example, for intra prediction residuals, the transform type may be selected according to the intra prediction mode, as specified in Table 3. For inter prediction residuals, the transform type of a chroma block may be selected according to the transform type selected for the co-located luma block. Therefore, no transform type is signaled in the bitstream for the chroma components.

Table 2: AV1 hybrid transform kernels and their availability based on prediction mode and block size. Here, → and ↓ indicate the horizontal and vertical dimensions, and √ and × indicate whether a kernel is available for that block size and prediction mode.

Table 3: Transform type selection for chroma intra prediction residuals

Intra prediction mode | Vertical transform | Horizontal transform
DC_PRED | DCT | DCT
V_PRED | ADST | DCT
H_PRED | DCT | ADST
D45_PRED | DCT | DCT
D135_PRED | ADST | ADST
D113_PRED | ADST | DCT
D157_PRED | DCT | ADST
D203_PRED | DCT | ADST
D67_PRED | ADST | DCT
SMOOTH_PRED | ADST | ADST
SMOOTH_V_PRED | ADST | DCT
SMOOTH_H_PRED | DCT | ADST
PAETH_PRED | ADST | ADST
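Because the chroma transform pair in Table 3 is a fixed function of the intra prediction mode, a decoder can derive it with a table lookup and no parsing. The minimal Python sketch below encodes Table 3 that way; the dictionary and function names are hypothetical.

```python
# Table 3 as a lookup: intra mode -> (vertical transform, horizontal transform).
CHROMA_INTRA_TX = {
    "DC_PRED": ("DCT", "DCT"),
    "V_PRED": ("ADST", "DCT"),
    "H_PRED": ("DCT", "ADST"),
    "D45_PRED": ("DCT", "DCT"),
    "D135_PRED": ("ADST", "ADST"),
    "D113_PRED": ("ADST", "DCT"),
    "D157_PRED": ("DCT", "ADST"),
    "D203_PRED": ("DCT", "ADST"),
    "D67_PRED": ("ADST", "DCT"),
    "SMOOTH_PRED": ("ADST", "ADST"),
    "SMOOTH_V_PRED": ("ADST", "DCT"),
    "SMOOTH_H_PRED": ("DCT", "ADST"),
    "PAETH_PRED": ("ADST", "ADST"),
}


def chroma_intra_tx_type(mode):
    """Derive the chroma (vertical, horizontal) transform pair; nothing is parsed."""
    return CHROMA_INTRA_TX[mode]


print(chroma_intra_tx_type("V_PRED"))   # vertical prediction -> ADST vertically
```

In the table, ADST tends to appear along the dimension in which the mode predicts from a reference edge (vertically for V_PRED, horizontally for H_PRED).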

In some implementations, a secondary transform may be applied to the primary transform coefficients. For example, as shown in Figure 14, the LFNST (low-frequency non-separable transform), also known as the reduced secondary transform, may be applied between the forward primary transform and quantization (at the encoder) and between dequantization and the inverse primary transform (at the decoder) to further decorrelate the primary transform coefficients. In essence, LFNST takes a portion of the primary transform coefficients, e.g., the low-frequency portion (hence "reduced" relative to the full set of primary transform coefficients of the transform block), for the secondary transform. In an exemplary LFNST, either a 4×4 or an 8×8 non-separable transform may be applied, depending on the transform block size. For example, a 4×4 LFNST may be applied to smaller transform blocks (e.g., min(width, height) < 8), while an 8×8 LFNST may be applied to larger transform blocks (e.g., min(width, height) ≥ 8). As another example, if a 4×4 LFNST is applied to an 8×8 transform block, only the low-frequency 4×4 portion of the 8×8 primary transform coefficients undergoes the secondary transform.

Specifically, as shown in Figure 14, the transform block may be 8×8 (or 16×16). The forward primary transform 1402 of the transform block then produces an 8×8 (or 16×16) primary transform coefficient matrix 1404, in which each square cell represents a 2×2 (or 4×4) portion. The input to the forward LFNST need not be the complete 8×8 (or 16×16) set of primary transform coefficients. For example, a 4×4 (or 8×8) LFNST may be used for the secondary transform, in which case only the 4×4 (or 8×8) low-frequency primary transform coefficients of matrix 1404, indicated by the shaded (top-left) portion 1406, serve as the LFNST input. The rest of the primary transform coefficient matrix may not undergo the secondary transform. Thus, after the secondary transform, the portion of the primary transform coefficients subjected to LFNST becomes secondary transform coefficients, while the remaining portion not subjected to LFNST (e.g., the unshaded portion of matrix 1404) keeps the corresponding primary transform coefficients. In some exemplary implementations, the remaining portion not subjected to the secondary transform may be set entirely to zero coefficients.
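The split between the LFNST input region and the primary-only coefficients can be sketched as below. This minimal Python sketch assumes the size rule stated above (4×4 region when min(width, height) < 8, otherwise 8×8) and that the LFNST region is always the top-left corner; the helper name `lfnst_split` is hypothetical.

```python
def lfnst_split(coeffs):
    """Separate the LFNST input region from a primary-coefficient block.

    coeffs is a list of rows (height x width). The region is the top-left
    4x4 block when min(width, height) < 8, else the top-left 8x8 block.
    Returns (size, region, rest), where rest is a copy of coeffs with the
    region zeroed, i.e. the primary-only coefficients that bypass LFNST.
    """
    height, width = len(coeffs), len(coeffs[0])
    size = 4 if min(width, height) < 8 else 8
    region = [row[:size] for row in coeffs[:size]]
    rest = [[0 if (r < size and c < size) else v for c, v in enumerate(row)]
            for r, row in enumerate(coeffs)]
    return size, region, rest


block = [[16 * r + c for c in range(16)] for r in range(8)]   # 8 rows x 16 cols
size, region, rest = lfnst_split(block)
print(size, region[0][:4], rest[0][:10])
```

In the variant where the primary-only part is zeroed, `rest` would be discarded entirely; keeping it separate shows which coefficients bypass the secondary transform.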

An example of applying the non-separable transform used in LFNST is described below. To apply an exemplary 4×4 LFNST, the 4×4 input block X (e.g., representing the 4×4 low-frequency portion of the primary transform coefficient block, such as the shaded portion 1406 of the primary transform matrix 1404 in Figure 14) may be represented as:

$$X = \begin{bmatrix} X_{00} & X_{01} & X_{02} & X_{03} \\ X_{10} & X_{11} & X_{12} & X_{13} \\ X_{20} & X_{21} & X_{22} & X_{23} \\ X_{30} & X_{31} & X_{32} & X_{33} \end{bmatrix}$$

This 2-D input matrix may first be linearized, or scanned, into a vector in an exemplary order:

$$\vec{X} = \begin{bmatrix} X_{00} & X_{01} & X_{02} & X_{03} & X_{10} & X_{11} & X_{12} & X_{13} & X_{20} & X_{21} & X_{22} & X_{23} & X_{30} & X_{31} & X_{32} & X_{33} \end{bmatrix}^{T}$$

The non-separable transform of the 4×4 LFNST may then be computed as $\vec{F} = T \cdot \vec{X}$, where $\vec{F}$ denotes the output transform coefficient vector and T is a 16×16 transform matrix. The resulting 16×1 coefficient vector $\vec{F}$ is then inverse scanned into a 4×4 block using the scan order of that block (e.g., horizontal, vertical, or diagonal); coefficients with smaller indices may be placed at smaller scan indices in the 4×4 coefficient block. In this way, redundancy in the primary transform coefficients X may be further exploited by the secondary transform T, providing additional compression gain.
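The 4×4 LFNST steps above (row-major linearization, $\vec{F} = T \cdot \vec{X}$, inverse scan) can be sketched as follows. This is a minimal Python sketch under stated assumptions: real LFNST kernels are trained offline, so a Householder reflection stands in for T here only because it is orthonormal and non-separable; the up-right diagonal scan is an assumed choice among the scan orders mentioned; all function names are hypothetical. The inverse uses the transpose of T.

```python
def householder(v):
    """Orthonormal, non-separable matrix I - 2 v v^T / (v^T v).

    Stand-in for a trained 16x16 LFNST kernel, used only for illustration.
    """
    n, vv = len(v), sum(x * x for x in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv for j in range(n)]
            for i in range(n)]


def diagonal_scan(n):
    """Up-right diagonal scan of an n x n block as (row, col) pairs (assumed order)."""
    order = []
    for s in range(2 * n - 1):
        for r in range(min(s, n - 1), max(0, s - n + 1) - 1, -1):
            order.append((r, s - r))
    return order


def lfnst4x4_forward(block, t, scan):
    """X (4x4) -> row-major vector -> F = T X -> inverse scan into a 4x4 block."""
    x_vec = [block[r][c] for r in range(4) for c in range(4)]
    f = [sum(t[i][k] * x_vec[k] for k in range(16)) for i in range(16)]
    out = [[0.0] * 4 for _ in range(4)]
    for k, (r, c) in enumerate(scan):   # coefficient k goes to scan position k
        out[r][c] = f[k]
    return out


def lfnst4x4_inverse(coeff_block, t, scan):
    """Gather F in scan order, apply T^T (T orthonormal), and un-linearize to 4x4."""
    f = [coeff_block[r][c] for r, c in scan]
    x_vec = [sum(t[i][k] * f[i] for i in range(16)) for k in range(16)]
    return [x_vec[4 * r: 4 * r + 4] for r in range(4)]


T = householder([float(i + 1) for i in range(16)])
SCAN = diagonal_scan(4)
X = [[float(4 * r + c) for c in range(4)] for r in range(4)]
Y = lfnst4x4_forward(X, T, SCAN)
X_rec = lfnst4x4_inverse(Y, T, SCAN)
```

Because T is orthonormal here, the decoder-side inverse recovers X exactly up to floating-point error; in a real codec the coefficients would also be quantized in between.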

The exemplary LFNST described above is based on a direct matrix multiplication approach to applying the non-separable transform, so that LFNST is completed in a single pass without multiple iterations. In some further exemplary implementations, the dimension of the non-separable transform matrix (T) of the exemplary 4×4 LFNST may be further reduced to minimize the computational complexity and the memory needed to store the transform coefficients. Such an implementation may be called a reduced non-separable transform (RST). More specifically, the main idea of RST is to map an N-dimensional vector (in the example above, N = 4×4 = 16, while for an 8×8 block N may be 64) to an R-dimensional vector in a different space, where N/R (R < N) is the dimension reduction factor. Therefore, instead of an N×N transform matrix, the RST matrix becomes an R×N matrix, as follows:

$$T_{R \times N} = \begin{bmatrix} t_{11} & t_{12} & t_{13} & \cdots & t_{1N} \\ t_{21} & t_{22} & t_{23} & \cdots & t_{2N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ t_{R1} & t_{R2} & t_{R3} & \cdots & t_{RN} \end{bmatrix}$$

Here, the R rows of the transform matrix form a reduced basis of R vectors for the N-dimensional space. The transform thus converts an N-dimensional input vector into a reduced R-dimensional output vector. Accordingly, as shown in Figure 14, the secondary transform coefficients 1408 obtained from the primary coefficients 1406 are reduced in dimension by a factor of N/R. The three squares surrounding 1408 in Figure 14 may be filled with zeros.
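The reduction from an N×N to an R×N matrix can be illustrated with the minimal Python sketch below. The toy basis (the first R rows of the identity) is an assumption chosen so the effect is easy to verify; a trained RST kernel would use learned rows. The inverse is taken as the transpose, as described for RST below.

```python
def rst_forward(t, x):
    """y = T x, with T an R x N matrix given as R rows of length N."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in t]


def rst_inverse(t, y):
    """x_hat = T^T y: the inverse uses the transpose of the R x N forward matrix."""
    n = len(t[0])
    return [sum(t[i][k] * y[i] for i in range(len(t))) for k in range(n)]


# Toy reduced basis: R = 8 orthonormal rows taken from the 16x16 identity,
# so the dimension reduction factor N/R is 2.
N, R = 16, 8
T = [[1.0 if k == i else 0.0 for k in range(N)] for i in range(R)]
x = [float(k) for k in range(N)]
y = rst_forward(T, x)          # 8 coefficients instead of 16
x_hat = rst_inverse(T, y)      # components outside the R-dim basis are lost
print(len(y), R * N, N * N)    # stored matrix entries: R*N instead of N*N
```

The storage cost scales with R×N rather than N×N, which is the memory saving the text attributes to RST; the price is that any part of the input outside the R-dimensional basis is discarded.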

The inverse RST matrix may be the transpose of its forward matrix. For an exemplary 8×8 LFNST (which admits more variations than the 4×4 LFNST described above), an exemplary reduction factor of 4 may be applied, so that the 64×64 direct non-separable transform matrix is reduced to a 16×64 direct matrix. Furthermore, in some implementations, only part of the input primary coefficients, rather than all of them, may be linearized into the LFNST input vector. For example, only a portion of the exemplary 8×8 input primary transform coefficients may be linearized into the input vector X described above. In one particular example, among the four 4×4 quadrants of the 8×8 primary transform coefficient matrix, the bottom-right quadrant (high-frequency coefficients) may be ignored, and only the other three quadrants are linearized, using a predetermined scan order, into a 48×1 vector rather than a 64×1 vector. In such an implementation, the non-separable transform matrix may be further reduced from 16×64 to 16×48.

Accordingly, an exemplary reduced 48×16 inverse RST matrix may be used at the decoder side to generate the top-left, top-right, and bottom-left 4×4 quadrants of the 8×8 core (primary) transform coefficients. Specifically, when the further reduced 16×48 RST matrix (instead of a 16×64 RST matrix) is applied with the same transform set configuration, the non-separable secondary transform takes as input 48 vectorized matrix elements from the three 4×4 quadrant blocks of the 8×8 primary coefficient block (excluding the bottom-right 4×4 block). In such an implementation, the omitted bottom-right 4×4 primary transform coefficients are ignored by the secondary transform. The further reduced transform converts the 48×1 vector into a 16×1 output vector, which is inverse scanned into a 4×4 matrix to fill 1408 in Figure 14. The three squares of secondary transform coefficients surrounding 1408 may be filled with zeros.

This dimension reduction of RST reduces the memory needed to store all LFNST matrices. For example, in the example above, memory usage may be reduced from 10 KB to 8 KB relative to an implementation without dimension reduction, with only a small performance loss.

In some implementations, to reduce complexity, LFNST may be further restricted so that it is applied only if all coefficients outside the portion of primary transform coefficients subjected to LFNST (e.g., outside portion 1406 of 1404 in Figure 14) are non-significant. Hence, when LFNST is applied, all primary-only transform coefficients (e.g., the unshaded portion of the primary coefficient matrix 1404 in Figure 14) are zero. This restriction allows LFNST index signaling to be conditioned on the last significant position, thereby avoiding the extra coefficient scanning that would otherwise be needed to check for significant coefficients at particular positions. In some implementations, to bound the worst-case complexity of LFNST (in terms of multiplications per pixel), the non-separable transforms for 4×4 and 8×8 blocks may be restricted to 8×16 and 8×48 transforms, respectively. In those cases, the last significant scan position must be less than 8 when LFNST is applied; for other sizes, it must be less than 16. For blocks of shape 4×N and N×4 with N > 8, this restriction implies that LFNST is applied only once, and only to the top-left 4×4 region. Because all primary-only coefficients are zero when LFNST is applied, the number of operations needed for the primary transform is reduced in these cases. From the encoder's perspective, coefficient quantization is simplified when LFNST transforms are tested: rate-distortion optimized quantization needs to be performed at most for the first 16 coefficients (in scan order), and the remaining coefficients may be forced to zero.
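The last-significant-position condition above can be expressed as a small predicate. This is a minimal Python sketch under the limits stated in the text; the function name is hypothetical, and `last_scan_pos` is assumed to be a 0-based index in coefficient scan order.

```python
def lfnst_allowed_by_last_pos(width, height, last_scan_pos):
    """True if the last significant scan position permits LFNST.

    The limit is 8 for 4x4 and 8x8 blocks (8x16 and 8x48 transforms) and 16
    for other block sizes, per the worst-case restriction described above.
    """
    limit = 8 if (width, height) in ((4, 4), (8, 8)) else 16
    return last_scan_pos < limit


print(lfnst_allowed_by_last_pos(8, 8, 7), lfnst_allowed_by_last_pos(8, 8, 8))
```

Because this check depends only on the already-parsed last position, a decoder can decide whether to parse an LFNST index without scanning the remaining coefficients.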

In some exemplary implementations, the available RST kernels may be specified as multiple transform sets, each including multiple non-separable transform matrices. For example, there may be 4 transform sets in total, each with 2 non-separable transform matrices (kernels) for use by LFNST. These kernels may be trained offline in advance and are thus data-driven. The offline-trained transform kernels may be stored in memory or hard-coded in the encoding or decoding device for use during encoding/decoding. The transform set used during encoding or decoding may be selected according to the intra prediction mode, using a predefined mapping from intra prediction modes to transform sets, an example of which is shown in Table 4. For example, as shown in Table 4, if the current block uses one of the three cross-component linear model (CCLM) modes (INTRA_LT_CCLM, INTRA_T_CCLM, or INTRA_L_CCLM), i.e., 81 ≤ predModeIntra ≤ 83, transform set 0 may be selected for the current chroma block. Within each transform set, the selected non-separable secondary transform candidate may be further specified by an explicitly signaled LFNST index. For example, the index may be signaled in the bitstream once per intra CU, after the transform coefficients.

Table 4: Transform selection table

IntraPredMode | Transform set index
IntraPredMode < 0 | 1
0 ≤ IntraPredMode ≤ 1 | 0
2 ≤ IntraPredMode ≤ 12 | 1
13 ≤ IntraPredMode ≤ 23 | 2
24 ≤ IntraPredMode ≤ 44 | 3
45 ≤ IntraPredMode ≤ 55 | 2
56 ≤ IntraPredMode ≤ 80 | 1
81 ≤ IntraPredMode ≤ 83 | 0
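Table 4 translates directly into a range lookup. The minimal Python sketch below is one way to encode it; the function name is hypothetical, and modes outside the listed ranges (above 83) are treated as an error because the table does not define them.

```python
def lfnst_transform_set(pred_mode_intra):
    """Map IntraPredMode to an LFNST transform set index per Table 4."""
    if pred_mode_intra < 0:
        return 1
    ranges = [(0, 1, 0), (2, 12, 1), (13, 23, 2), (24, 44, 3),
              (45, 55, 2), (56, 80, 1), (81, 83, 0)]   # (low, high, set index)
    for low, high, set_idx in ranges:
        if low <= pred_mode_intra <= high:
            return set_idx
    raise ValueError("IntraPredMode %d is not covered by Table 4" % pred_mode_intra)


print([lfnst_transform_set(m) for m in (-1, 0, 18, 34, 50, 66, 82)])
```

The ranges 13 to 23 and 45 to 55 sit symmetrically about mode 34 and share set 2, so mirrored directional modes reuse the same kernels.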

Because, in the exemplary implementations above, LFNST is restricted to be applicable only when all coefficients outside the first coefficient subgroup (or portion) are non-significant, LFNST index coding depends on the position of the last significant coefficient. In addition, the LFNST index may be context coded without depending on the intra prediction mode, with only the first bin context coded. Furthermore, LFNST may be applied to intra CUs in both intra slices and inter slices, and to both luma and chroma. If dual tree is enabled, separate LFNST indices may be signaled for luma and chroma. For inter slices (where dual tree is disabled), a single LFNST index may be signaled and used for both luma and chroma.

In some exemplary implementations, when the intra sub-partitions (ISP) mode is selected, LFNST may be disabled and the RST index not signaled, because the performance improvement may be marginal even if RST were applied to every feasible partition block. Moreover, disabling RST for ISP-predicted residuals reduces encoding complexity. In some further implementations, LFNST may likewise be disabled, and the RST index not signaled, when the multiple linear regression intra prediction (MIP) mode is selected.

Because a large CU greater than 64×64 (or any other predetermined size representing the maximum transform block size) is implicitly split (e.g., by TU tiling) due to the existing maximum transform size restriction (e.g., 64×64), an LFNST index search could quadruple the data buffering for a certain number of decoding pipeline stages. Therefore, in some implementations, the maximum block size for which LFNST is allowed may be restricted to, for example, 64×64. In some implementations, LFNST may be enabled only when DCT2 is the primary transform.
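The enabling restrictions of the preceding paragraphs (no ISP or MIP, DCT2 primary transform, size cap) can be combined into one gate. This minimal Python sketch is an assumption about how the optional conditions might be combined; the function and parameter names are hypothetical, and the 64×64 cap is the example value from the text.

```python
def lfnst_enabled(width, height, primary_is_dct2, isp, mip, max_size=64):
    """Hypothetical gate combining the restrictions described above."""
    if isp or mip:              # ISP / MIP blocks: LFNST disabled, no index signaled
        return False
    if not primary_is_dct2:     # LFNST only on top of a DCT2 primary transform
        return False
    return width <= max_size and height <= max_size   # avoid TU-tiled large CUs


print(lfnst_enabled(32, 32, True, False, False), lfnst_enabled(128, 64, True, False, False))
```

Each check can be evaluated before any LFNST-related parsing, so a block that fails one of them never carries an LFNST index.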

In some other implementations, an intra secondary transform (IST) is provided for the luma component by defining, for example, 12 secondary transform sets, each containing, for example, 3 kernels. An intra-mode-dependent index may be used to select the transform set, and the kernel within the selected set may be chosen based on a signaled syntax element. IST may be enabled when DCT2 or ADST is used as both the horizontal and the vertical primary transform. In some implementations, a 4×4 or an 8×8 non-separable transform may be selected depending on the block size: a 4×4 IST may be selected if min(tx_width, tx_height) < 8, and an 8×8 IST may be used for larger blocks. Here, tx_width and tx_height correspond to the width and height of the transform block, respectively. The input to IST may be the low-frequency primary transform coefficients in zig-zag scan order.
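The IST size choice and zig-zag gathering of its input can be sketched as below. This minimal Python sketch assumes a JPEG-style zig-zag (the exact zig-zag convention is not specified in the text) and hypothetical function names; `coeffs` is a list of rows with tx_height rows and tx_width columns.

```python
def zigzag(n):
    """JPEG-style zig-zag order of an n x n block as (row, col) pairs (assumed)."""
    order = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)   # even anti-diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def ist_input(coeffs, tx_width, tx_height):
    """Pick the IST size from the transform block size and gather its input."""
    size = 4 if min(tx_width, tx_height) < 8 else 8
    return size, [coeffs[r][c] for r, c in zigzag(size)]


block = [[100 * r + c for c in range(8)] for r in range(16)]   # 16 rows x 8 cols
size, vec = ist_input(block, tx_width=8, tx_height=16)
print(size, len(vec), vec[:4])
```

With an 8×16 transform block, min(tx_width, tx_height) is 8, so the 8×8 IST is chosen and 64 low-frequency coefficients form its input.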

When only separable transform schemes are used, the various transforms in video encoding or decoding (e.g., the primary transform of the samples in a residual block, or the secondary transform of a block of primary transform coefficients) may not be very effective at capturing directional texture patterns, such as edges along a 45-degree direction (i.e., directions substantially away from horizontal or vertical). As described above, in some exemplary implementations, one or more non-separable transform designs may be used for the secondary transform of the primary transform coefficients.

The transform block partitioning and the transform types applied to the partitioned transform blocks may be correlated; certain transform types may be better suited to particular partition types. For example, compared with recursive partitioning (such as that described above with respect to Figure 13), the transform partitioning scheme shown in Figure 16 and described above provides non-recursive partition types. If all available transform types are allowed for the transform blocks produced under every available partitioning mode (e.g., the transform partition types of Figure 16), the encoder must search a large parameter space when deciding which transform partition type to use to obtain the transform blocks and which transform type to use for each partitioned transform block. In practice, a given set of transform types is often better suited than others to a particular transform partition type. The various implementations described below take this interaction between transform partition types and transform types into account and use it to restrict the transform types allowed for a particular partition type and, similarly, the partition types allowed for a particular transform type. Such implementations can reduce the encoder's optimization space when determining the transform partitioning mode and the transform type for each partitioned transform block, particularly when non-recursive transform partitioning is used.

These exemplary implementations may be used separately or combined in any order or manner. In the disclosure above and below, the terms "coded block", "coding block", and the like may refer to the picture unit on which prediction or transformation is performed. A coding block may be a luma coding block or a chroma coding block. In some cases, a coded/coding block may refer to a prediction block. The term "block size" refers to the width or height of the coding block, the maximum or minimum of its width and height, its area (width × height), or its aspect ratio (width:height or height:width).

Multiple candidate primary transform types

In one embodiment, there may be multiple candidate primary transform types for a block, where the block may include transform blocks produced by partitioning. Selecting and/or signaling the primary transform type may be restricted to a predetermined set of transform partition types, which may be a subset of a larger set of available transform partition types (e.g., the complete set of partition types in Figure 16). In other words, the primary transform is selected and signaled only if the transform partition type of the block belongs to the predetermined set of transform partition types. Otherwise, for example if the transform partition type is some other type, a default transform type may be used instead of selecting and signaling a primary transform type.

In one implementation, the primary transform types may include at least one of the following: discrete cosine transform (DCT) type 1 through DCT type 8; asymmetric discrete sine transform (ADST); discrete sine transform (DST) type 1 through DST type 8; line graph transform (LGT); or Karhunen-Loève transform (KLT).

In one implementation, the predetermined set of transform partition types includes only PARTITION_NONE among the transform partition types of, e.g., Figure 16; that is, the transform block size equals the prediction block (or coding block) size. Therefore, the primary transform type may be selected and/or signaled only if the transform partition type belongs to this predetermined set.
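
A minimal decoder-side sketch of this gating, assuming the predetermined set contains only PARTITION_NONE. The default primary type ("DCT2") and the read_signaled_type callback, which stands in for entropy decoding, are hypothetical.

```python
PARTITION_NONE = "PARTITION_NONE"
SIGNALED_PARTITIONS = {PARTITION_NONE}  # predetermined subset from the text
DEFAULT_PRIMARY = "DCT2"  # hypothetical default primary transform type


def primary_transform_type(partition_type, read_signaled_type):
    """Return the primary transform type for a block.

    read_signaled_type is a caller-supplied callable (hypothetical) that
    parses the signaled type; it is invoked only when the partition type
    belongs to SIGNALED_PARTITIONS.
    """
    if partition_type in SIGNALED_PARTITIONS:
        return read_signaled_type()
    # Nothing is parsed for other partition types: fall back to the default.
    return DEFAULT_PRIMARY


print(primary_transform_type("PARTITION_SPLIT", lambda: "ADST"))  # DCT2
```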

In one implementation, the number of partitions may also be taken into account when determining the predetermined set of transform partition types. For example, a particular transform partition type may be included in the predetermined set, for which the primary transform type may be selected and/or signaled, only if the number of partitions it produces is less than or equal to a predetermined threshold. In one implementation, the predetermined threshold may be an integer from 1 to 16.
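
The threshold rule can be expressed as a filter over the available partition types. The partition names and the number of transform blocks each produces are hypothetical placeholders for the types in Figure 16.

```python
# Hypothetical partition types and the number of transform blocks each yields.
PARTITION_COUNTS = {
    "PARTITION_NONE": 1,
    "PARTITION_HORZ": 2,
    "PARTITION_VERT": 2,
    "PARTITION_SPLIT": 4,
    "PARTITION_HORZ4": 4,
    "PARTITION_VERT4": 4,
}


def signaled_partition_set(threshold):
    """Partition types whose partition count does not exceed the threshold;
    only these would carry a signaled transform type."""
    if not 1 <= threshold <= 16:
        raise ValueError("threshold must be an integer in [1, 16]")
    return {name for name, n in PARTITION_COUNTS.items() if n <= threshold}


print(sorted(signaled_partition_set(2)))
# ['PARTITION_HORZ', 'PARTITION_NONE', 'PARTITION_VERT']
```

With a threshold of 1, the set reduces to PARTITION_NONE alone, which matches the single-type implementation described in the previous paragraph.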

Multiple candidate secondary transform types

In one embodiment, multiple candidate secondary transform types may exist for a block. The block may include transform blocks resulting from partitioning. Selecting and/or signaling the secondary transform type may be applied only to a predetermined set of transform partition types, which may be a subset of a larger set of available transform partition types (e.g., the complete set of transform partition types in Figure 16). In other words, the secondary transform is selected and signaled only if the block's transform partition type belongs to the predetermined set of transform partition types. Otherwise, for example, if the transform partition type is some other type, a default secondary transform type may be used instead of selecting and signaling the secondary transform type, or no secondary transform may be performed.

In one implementation, the secondary transform types may include the KLT, which may be configured with different kernels.
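
As an illustration of a secondary transform with selectable kernels, the sketch below applies an orthonormal kernel, chosen by an index, to a short vector of primary transform coefficients and inverts it with the transpose. The two 4-point kernels are hypothetical stand-ins: an actual KLT kernel would be derived from the statistics of the signal being coded (its basis vectors are the eigenvectors of the covariance matrix), and the disclosure does not specify kernel sizes or values.

```python
import math

# Hypothetical 4-point orthonormal kernels indexed by a KLT index. These are
# stand-ins for illustration, not trained KLT kernels.
_H = 0.5
_C, _S = math.cos(math.pi / 8), math.sin(math.pi / 8)
KERNELS = {
    0: [[_H, _H, _H, _H],
        [_H, _H, -_H, -_H],
        [_H, -_H, -_H, _H],
        [_H, -_H, _H, -_H]],
    1: [[_C, -_S, 0.0, 0.0],
        [_S, _C, 0.0, 0.0],
        [0.0, 0.0, _C, -_S],
        [0.0, 0.0, _S, _C]],
}


def forward_secondary(coeffs, klt_index):
    """y = K x: apply the selected kernel to the primary coefficients."""
    k = KERNELS[klt_index]
    if len(coeffs) != len(k[0]):
        raise ValueError("coefficient count does not match kernel size")
    return [sum(k[i][j] * coeffs[j] for j in range(len(coeffs)))
            for i in range(len(k))]


def inverse_secondary(coeffs, klt_index):
    """x = K^T y: for an orthonormal kernel, the inverse is the transpose."""
    k = KERNELS[klt_index]
    return [sum(k[j][i] * coeffs[j] for j in range(len(k)))
            for i in range(len(k[0]))]


print(forward_secondary([1.0, 1.0, 1.0, 1.0], 0))  # [2.0, 0.0, 0.0, 0.0]
```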

In one implementation, the predetermined set of transform partition types may include only PARTITION_NONE among the transform partition types of, e.g., Figure 16; that is, the transform block size equals the prediction block (or coding block) size. Therefore, the secondary transform type may be selected and/or signaled only if the transform partition type belongs to this predetermined set.

In one implementation, the number of partitions may also be taken into account when determining the predetermined set of transform partition types. For example, a particular transform partition type may be included in the predetermined set, for which the secondary transform type may be selected and/or signaled, only if the number of partitions it produces is less than or equal to a predetermined threshold. In one implementation, the predetermined threshold may be an integer from 1 to 16.

In one implementation, selecting and/or signaling the secondary transform type may depend on the combination of the transform partition type and the primary transform type, where the combination comprises a transform partition type from a predetermined set of transform partition types and a primary transform type from a predetermined set of transform types. For example, the secondary transform type may need to be selected and/or signaled only if the transform partition type is PARTITION_NONE and the primary transform type used by the block is DCT or ADST. Otherwise, for example, if the transform partition type or the primary transform type falls outside its predetermined set, a default secondary transform type may be used instead of selecting and signaling the secondary transform type, or no secondary transform may be performed.
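
The combined condition in the example above (PARTITION_NONE together with a DCT or ADST primary transform) can be sketched as follows. Returning None to mean "no secondary transform" and the callback standing in for bitstream parsing are assumptions of this sketch.

```python
PARTITION_NONE = "PARTITION_NONE"
SECONDARY_PRIMARY_SET = {"DCT", "ADST"}  # primary types from the text's example
DEFAULT_SECONDARY = None  # assumption: None means no secondary transform


def secondary_transform_type(partition_type, primary_type, read_signaled_type):
    """Parse the secondary type only for an allowed (partition, primary) pair."""
    if partition_type == PARTITION_NONE and primary_type in SECONDARY_PRIMARY_SET:
        return read_signaled_type()  # hypothetical parsing callback
    return DEFAULT_SECONDARY


print(secondary_transform_type("PARTITION_NONE", "ADST", lambda: "KLT_1"))  # KLT_1
```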

Transform-related signaling

This disclosure presents various signaling mechanisms that aim to improve signaling efficiency by taking into account the order of transform-related syntax elements/parameters.

In one embodiment, the transform partition type information may be signaled before the primary/secondary transform type selection information. The primary/secondary transform type selection needs to be signaled only if the transform partition type belongs to a predetermined set of transform partition types (e.g., PARTITION_NONE). Otherwise, if the transform partition type does not belong to the predetermined set, the primary/secondary transform type selection may not need to be signaled; instead, the primary/secondary transform type may be derived as a predetermined default transform type.
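
This parsing order (partition type first, transform types only when needed) can be sketched with a queue of already entropy-decoded syntax values. The queue-based reader, the default types, and the symbol strings are hypothetical; a real decoder would read these values through its entropy decoder.

```python
from collections import deque

SIGNALED_PARTITIONS = {"PARTITION_NONE"}  # predetermined set from the text
DEFAULT_PRIMARY, DEFAULT_SECONDARY = "DCT2", None  # hypothetical defaults


def parse_partition_first(symbols):
    """symbols: deque of already-decoded syntax values (hypothetical reader)."""
    partition = symbols.popleft()
    if partition in SIGNALED_PARTITIONS:
        # Transform type selections are present in the bitstream.
        primary = symbols.popleft()
        secondary = symbols.popleft()
    else:
        # Not signaled: derive the predetermined defaults, consume nothing.
        primary, secondary = DEFAULT_PRIMARY, DEFAULT_SECONDARY
    return partition, primary, secondary


print(parse_partition_first(deque(["PARTITION_SPLIT", "next"])))
# ('PARTITION_SPLIT', 'DCT2', None)
```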

In one embodiment, the primary/secondary transform type selection information may be signaled before the transform partition type information. In this case, selecting and/or signaling the transform partition type may depend on the primary/secondary transform type selection information.

In one implementation, the transform partition type information may need to be signaled only if the primary transform type belongs to a predetermined set of transform types; otherwise, it may not need to be signaled. For example, the predetermined set of transform types may include, but is not limited to, DCT type 1 through DCT type 8, ADST, DST type 1 through DST type 8, LGT, and KLT.

In one implementation, the transform partition type information may need to be signaled only if the secondary transform type belongs to a predetermined set of transform types. As an example, the predetermined set of transform types may include, but is not limited to, a specific KLT whose kernel is associated with a predetermined KLT index. Otherwise, if the secondary transform type does not belong to the predetermined set, the transform partition type information may not need to be signaled; instead, it may be derived as a predetermined default transform partition type (e.g., PARTITION_NONE).
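
The reverse order, in which the transform types are parsed first and gate the transform partition type, can be sketched the same way. The gating sets below are hypothetical examples of the predetermined sets (the primary-type gate mirrors the first implementation above, the secondary-type gate mirrors the second), and the sketch assumes that both transform types are always present before the partition type. Falling back to PARTITION_NONE follows the example given for the secondary-type gate; using the same default for the primary-type gate is an assumption.

```python
from collections import deque

# Hypothetical predetermined sets; the disclosure gives only broad examples.
PRIMARY_GATE_SET = {"DCT2", "ADST"}
SECONDARY_GATE_SET = {"KLT_0"}  # e.g. a KLT kernel with a predetermined index
DEFAULT_PARTITION = "PARTITION_NONE"


def parse_transform_first(symbols, gate_on="secondary"):
    """Parse the primary and secondary transform types, then the partition
    type only if the gating transform type is in its predetermined set."""
    primary = symbols.popleft()
    secondary = symbols.popleft()
    if gate_on == "primary":
        signaled = primary in PRIMARY_GATE_SET
    elif gate_on == "secondary":
        signaled = secondary in SECONDARY_GATE_SET
    else:
        raise ValueError(f"unknown gate: {gate_on}")
    # Parse the partition type when signaled; otherwise derive the default.
    partition = symbols.popleft() if signaled else DEFAULT_PARTITION
    return primary, secondary, partition


print(parse_transform_first(deque(["DCT2", "KLT_3", "x"])))
# ('DCT2', 'KLT_3', 'PARTITION_NONE')
```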

Figure 17 illustrates an exemplary method 1700 for decoding video data. Method 1700 may include some or all of the following steps: step 1710, receiving an encoded video stream of a data block; step 1720, extracting, from the encoded video stream, a transform partition type associated with the data block; and step 1730, in response to the transform partition type belonging to a subset of a predetermined set of transform partition types, where each transform partition type in the predetermined set specifies a splitting pattern for dividing the data block into transform blocks: extracting a transform type, signaled in the encoded video stream, of a transform associated with a transform block split from the data block, wherein the transform type belongs to a first predetermined set of transform types; and performing an inverse transform on the transform block according to the transform type.
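
The steps of method 1700 can be strung together in a compact sketch. To keep it self-contained, it assumes that step 1710 has already turned the bitstream into a queue of syntax values plus one 1-D list of dequantized coefficients per transform block, and it supports only two hypothetical transform types: an orthonormal inverse DCT-II and an identity transform. The partition names, sub-block counts, default type, and first predetermined type set are illustrative assumptions.

```python
import math
from collections import deque

SIGNALED_PARTITIONS = {"PARTITION_NONE"}  # subset from the text
FIRST_TYPE_SET = {"DCT2", "IDTX"}         # hypothetical first predetermined set
DEFAULT_TYPE = "DCT2"                      # hypothetical default type
SUBBLOCKS = {"PARTITION_NONE": 1, "PARTITION_HORZ": 2, "PARTITION_SPLIT": 4}


def idct2(coeffs):
    """Orthonormal 1-D inverse DCT-II."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(s)
    return out


INVERSE = {"DCT2": idct2, "IDTX": list}  # IDTX: identity (copy)


def decode_block(symbols, coeff_blocks):
    """Steps 1720-1730 on pre-parsed inputs (step 1710 assumed done)."""
    partition = symbols.popleft()  # step 1720: transform partition type
    if len(coeff_blocks) != SUBBLOCKS[partition]:
        raise ValueError("coefficient blocks do not match the partition type")
    residuals = []
    for coeffs in coeff_blocks:
        if partition in SIGNALED_PARTITIONS:  # step 1730: type is signaled
            ttype = symbols.popleft()
            if ttype not in FIRST_TYPE_SET:
                raise ValueError(f"transform type {ttype!r} not allowed")
        else:  # type is not signaled: use the default
            ttype = DEFAULT_TYPE
        residuals.append(INVERSE[ttype](coeffs))  # inverse transform
    return partition, residuals


print(decode_block(deque(["PARTITION_NONE", "IDTX"]), [[3, 1]]))
# ('PARTITION_NONE', [[3, 1]])
```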

In embodiments of this disclosure, any steps and/or operations may be combined or arranged in any number or order as needed. Two or more of the steps and/or operations may be performed in parallel.

The embodiments of this disclosure may be used individually or combined in any order. Furthermore, each of the methods (or embodiments), the encoder, and the decoder may be implemented by processing circuitry (e.g., one or more processors or one or more integrated circuits). In one example, the one or more processors execute a program stored in a non-transitory computer-readable medium. The embodiments of this disclosure may be applied to luma blocks or chroma blocks.

The techniques described above may be implemented as computer software using computer-readable instructions and physically stored in one or more computer-readable media. For example, Figure 18 shows a computer system (1800) suitable for implementing certain embodiments of the disclosed subject matter.

The computer software may be coded using any suitable machine code or computer language, which may be subject to assembly, compilation, linking, or similar mechanisms to create code comprising instructions that can be executed directly, or through interpretation, microcode execution, and the like, by one or more computer central processing units (CPUs), graphics processing units (GPUs), and the like.

The instructions may be executed on various types of computers or components thereof, including, for example, personal computers, tablet computers, servers, smartphones, gaming devices, Internet of Things devices, and the like.

The components shown in Figure 18 for the computer system (1800) are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of this disclosure. Nor should the configuration of components be interpreted as having any dependency or requirement relating to any one or combination of the components illustrated in the exemplary embodiment of the computer system (1800).

The computer system (1800) may include certain human interface input devices. Such a human interface input device may respond to input from one or more human users through, for example, tactile input (such as keystrokes, swipes, data glove movements), audio input (such as voice, clapping), visual input (such as gestures), or olfactory input (not depicted). The human interface devices may also be used to capture certain media not necessarily directly related to conscious human input, such as audio (e.g., speech, music, ambient sound), images (e.g., scanned images, photographic images obtained from a still-image camera), and video (e.g., two-dimensional video, three-dimensional video including stereoscopic video).

Human interface input devices may include one or more of the following (only one of each is depicted): keyboard (1801), mouse (1802), trackpad (1803), touch screen (1810), data glove (not shown), joystick (1805), microphone (1806), scanner (1807), and camera (1808).

The computer system (1800) may also include certain human interface output devices. Such output devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. They may include tactile output devices (e.g., tactile feedback from the touch screen (1810), the data glove (not shown), or the joystick (1805), although there may also be tactile feedback devices that do not serve as input devices), audio output devices (e.g., speakers (1809), headphones (not depicted)), visual output devices (e.g., screens (1810), including CRT, LCD, plasma, and OLED screens, each with or without touch-screen input capability and each with or without tactile feedback capability, some of which may be capable of outputting two-dimensional visual output or more-than-three-dimensional output through means such as stereographic output; virtual reality glasses (not depicted); holographic displays and smoke tanks (not depicted)), and printers (not depicted).

The computer system (1800) may also include human-accessible storage devices and their associated media, such as optical media including a CD/DVD ROM/RW (1820) with CD/DVD or similar media (1821), thumb drives (1822), removable hard drives or solid-state drives (1823), legacy magnetic media such as tape and floppy disks (not depicted), specialized ROM/ASIC/PLD-based devices such as security dongles (not depicted), and the like.

Those skilled in the art should also understand that the term "computer-readable media" as used in connection with the presently disclosed subject matter does not encompass transmission media, carrier waves, or other transitory signals.

The computer system (1800) may also include an interface (1854) to one or more communication networks (1855). The networks may be, for example, wireless, wired, or optical. The networks may further be local, wide-area, metropolitan, vehicular and industrial, real-time, delay-tolerant, and so on. Examples of networks include local area networks such as Ethernet and wireless LANs; cellular networks including GSM, 3G, 4G, 5G, LTE, and the like; TV wired or wireless wide-area digital networks including cable TV, satellite TV, and terrestrial broadcast TV; and vehicular and industrial networks including CANBus. Certain networks commonly require external network interface adapters attached to certain general-purpose data ports or peripheral buses (1849) (e.g., USB ports of the computer system (1800)); others are commonly integrated into the core of the computer system (1800) by attachment to a system bus as described below (e.g., an Ethernet interface in a PC computer system or a cellular network interface in a smartphone computer system). Using any of these networks, the computer system (1800) can communicate with other entities. Such communication can be receive-only unidirectional (e.g., broadcast TV), send-only unidirectional (e.g., CANBus to certain CANBus devices), or bidirectional, for example to other computer systems using local or wide-area digital networks. As described above, certain protocols and protocol stacks can be used on each of those networks and network interfaces.

The aforementioned human interface devices, human-accessible storage devices, and network interfaces may be attached to a core (1840) of the computer system (1800).

The core (1840) may include one or more central processing units (CPUs) (1841), graphics processing units (GPUs) (1842), specialized programmable processing units in the form of field-programmable gate arrays (FPGAs) (1843), hardware accelerators (1844) for certain tasks, graphics adapters (1850), and so forth. These devices, along with read-only memory (ROM) (1845), random-access memory (RAM) (1846), and internal mass storage (1847) such as internal non-user-accessible hard drives and SSDs, may be connected through a system bus (1848). In some computer systems, the system bus (1848) may be accessible in the form of one or more physical plugs to enable extension by additional CPUs, GPUs, and the like. Peripheral devices may be attached either directly to the core's system bus (1848) or through a peripheral bus (1849). In one example, the screen (1810) may be connected to the graphics adapter (1850). Architectures for a peripheral bus include PCI, USB, and the like.

The CPUs (1841), GPUs (1842), FPGAs (1843), and accelerators (1844) may execute certain instructions that, in combination, can make up the aforementioned computer code. That computer code may be stored in the ROM (1845) or RAM (1846). Transitional data may also be stored in the RAM (1846), whereas permanent data may be stored, for example, in the internal mass storage (1847). Fast storage and retrieval to any of the memory devices may be enabled through the use of cache memory, which may be closely associated with one or more CPUs (1841), GPUs (1842), mass storage (1847), ROM (1845), RAM (1846), and the like.

The computer-readable media may have computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of this disclosure, or they may be of the kind well known and available to those skilled in the computer software arts.

As a non-limiting example, a computer system having the architecture (1800), and specifically the core (1840), may provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, and the like) executing software embodied in one or more tangible computer-readable media. Such computer-readable media may be media associated with the user-accessible mass storage introduced above, as well as certain storage of the core (1840) that is of a non-transitory nature, such as the core-internal mass storage (1847) or ROM (1845). Software implementing various embodiments of this disclosure may be stored in such devices and executed by the core (1840). A computer-readable medium may include one or more memory devices or chips, according to particular needs. The software may cause the core (1840), and specifically the processors therein (including CPUs, GPUs, FPGAs, and the like), to execute particular processes or particular parts of particular processes described herein, including defining data structures stored in the RAM (1846) and modifying such data structures according to the processes defined by the software. In addition or as an alternative, the computer system may provide functionality as a result of logic hardwired or otherwise embodied in a circuit (e.g., the accelerator (1844)), which may operate in place of or together with software to execute particular processes or particular parts of particular processes described herein. Reference to software may encompass logic, and vice versa, where appropriate. Reference to computer-readable media may encompass a circuit (such as an integrated circuit (IC)) storing software for execution, a circuit embodying logic for execution, or both, where appropriate. This disclosure encompasses any suitable combination of hardware and software.

While this disclosure has described several exemplary embodiments, there are alterations, permutations, and various substitute equivalents that fall within the scope of this disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods that, although not explicitly shown or described herein, embody the principles of this disclosure and are thus within its spirit and scope.

Appendix A: Acronyms

JEM: Joint Exploration Model

VVC: Versatile Video Coding

BMS: Benchmark Set

MV: Motion Vector

HEVC: High Efficiency Video Coding

SEI: Supplementary Enhancement Information

VUI: Video Usability Information

GOP: Group of Pictures

TU: Transform Unit

PU: Prediction Unit

CTU: Coding Tree Unit

CTB: Coding Tree Block

PB: Prediction Block

HRD: Hypothetical Reference Decoder

SNR: Signal-to-Noise Ratio

CPU: Central Processing Unit

GPU: Graphics Processing Unit

CRT: Cathode Ray Tube

LCD: Liquid-Crystal Display

OLED: Organic Light-Emitting Diode

CD: Compact Disc

DVD: Digital Video Disc

ROM: Read-Only Memory

RAM: Random Access Memory

ASIC: Application-Specific Integrated Circuit

PLD: Programmable Logic Device

LAN: Local Area Network

GSM: Global System for Mobile communications

LTE: Long-Term Evolution

CANBus: Controller Area Network Bus

USB: Universal Serial Bus

PCI: Peripheral Component Interconnect

FPGA: Field Programmable Gate Array

SSD: Solid-State Drive

IC: Integrated Circuit

HDR: High Dynamic Range

SDR: Standard Dynamic Range

JVET: Joint Video Exploration Team

MPM: Most Probable Mode

WAIP: Wide-Angle Intra Prediction

CU: Coding Unit

PDPC: Position Dependent Prediction Combination

ISP: Intra Sub-Partitions

SPS: Sequence Parameter Set

PPS: Picture Parameter Set

APS: Adaptation Parameter Set

VPS: Video Parameter Set

DPS: Decoding Parameter Set

ALF: Adaptive Loop Filter

SAO: Sample Adaptive Offset

CC-ALF: Cross-Component Adaptive Loop Filter

CDEF: Constrained Directional Enhancement Filter

CCSO: Cross-Component Sample Offset

LSO: Local Sample Offset

LR: Loop Restoration Filter

AV1: AOMedia Video 1

AV2: AOMedia Video 2

Claims

1. A method for decoding video data, characterized in that the method comprises:

Receiving an encoded video stream of a data block;

Extracting, from the encoded video stream, a transform partition type associated with the data block; and

In response to the transform partition type belonging to a subset of a predetermined set of transform partition types, wherein the subset includes only PARTITION_NONE, under which no transform block partitioning is performed, and each transform partition type in the predetermined set of transform partition types specifies a splitting mode for dividing the data block into transform blocks:

Extracting a transform type associated with a transform block split from the data block, the transform type being signaled in the encoded video stream, wherein the transform type belongs to a first predetermined transform type set, the transform is a secondary transform, and the first predetermined transform type set includes the Karhunen-Loève transform (KLT); and

In response to the transform partition type not belonging to the subset of the predetermined set of transform partition types:

Determining the transform type associated with the transform block as a predetermined default transform type;

Performing an inverse transform on the transform block according to the transform type.

2. The method according to claim 1, wherein, when the transform is a primary transform, the first predetermined transform type set includes:

Discrete Cosine Transform (DCT) types 1 to 8;

Asymmetric Discrete Sine Transform (ADST);

Discrete Sine Transform (DST) types 1 to 8;

Line graph transform (LGT); and

Karhunen-Loève transform (KLT).

3. The method according to claim 1, characterized in that the method further comprises:

The number of transform partitions associated with each transform partition type in the subset of the predetermined set of transform partition types is less than or equal to a predetermined threshold.

4. The method according to claim 3, wherein the predetermined threshold is an integer from 1 to 16, inclusive.

5. The method according to claim 1, wherein the transform type further indicates that the type of the primary transform associated with the transform block belongs to a second predetermined transform type set.

6. The method according to claim 5, wherein the second predetermined transform type set includes DCT and ADST.

7. An apparatus for decoding video data, comprising a memory and a processor, the memory storing instructions that cause the processor to perform the method according to any one of claims 1 to 6.

8. A method for encoding video data, characterized in that the method comprises:

Obtaining a video bitstream of a data block;

Extracting, from the video bitstream, a transform partition type associated with the data block; and

Extracting a transform type associated with a transform block split from the data block, the transform type being signaled in the video bitstream, wherein the transform type belongs to a first predetermined transform type set, the transform is a secondary transform, and the first predetermined transform type set includes the Karhunen-Loève transform (KLT); and

9. An apparatus for encoding video data, comprising a memory and a processor, the memory storing instructions that cause the processor to perform the method according to claim 8.

10. A non-transitory computer-readable medium storing instructions that, when executed by a computer for video decoding, cause the computer to perform the method according to any one of claims 1 to 6, and when executed by a computer for video encoding, cause the computer to perform the method according to claim 8.

11. A method for processing a video stream, characterized in that the video stream is decoded based on the method of any one of claims 1 to 6, or generated according to the method of claim 8.