WO2022257134A1 - Video encoding/decoding method, device and system, and storage medium - Google Patents


Info

Publication number
WO2022257134A1
WO2022257134A1 (PCT/CN2021/099827)
Authority
WO
WIPO (PCT)
Prior art keywords
current block
mode
block
intra
autoencoder
Application number
PCT/CN2021/099827
Other languages
French (fr)
Chinese (zh)
Inventor
戴震宇
Original Assignee
Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司
Priority to CN202180098992.4A (CN117441336A)
Priority to PCT/CN2021/099827 (WO2022257134A1)
Publication of WO2022257134A1


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/157: Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N 19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/593: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Abstract

A video encoding/decoding method, device and system, and a storage medium. During decoding, an identifier of the intra prediction mode of a current block is parsed from a received bitstream, and the intra prediction mode of the current block is determined from a plurality of selectable intra prediction modes according to the identifier, where the selectable intra prediction modes include an autoencoder mode. During encoding, the intra prediction mode of a current block is selected from a plurality of selectable intra prediction modes that include an autoencoder mode; intra prediction is performed on the current block on the basis of that mode, and an identifier of the mode is encoded and written into the bitstream. According to embodiments of the present disclosure, autoencoder-based image compression is introduced into a block-based video codec, thereby improving codec performance.

Description

A video encoding/decoding method, device, system, and storage medium
Technical Field
Embodiments of the present disclosure relate to, but are not limited to, video compression technology, and in particular to a video encoding/decoding method, device, system, and storage medium.
Background
Digital video compression technology compresses the huge amount of digital image and video data so that it can be transmitted and stored efficiently. With the proliferation of Internet video and ever-increasing demands for video quality, better digital video compression techniques are still needed to improve video quality and to reduce the bandwidth and traffic pressure of digital video transmission.
Summary of the Invention
The following is an overview of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
An embodiment of the present disclosure provides a video decoding method, including:
parsing an identifier of an intra prediction mode of a current block from a received bitstream, and determining the intra prediction mode of the current block from a plurality of selectable intra prediction modes according to the identifier, wherein the plurality of selectable intra prediction modes include an autoencoder mode; and
performing intra prediction on the current block according to the intra prediction mode of the current block.
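The two decoding steps above can be sketched as follows. This is an illustrative sketch only: the identifier values and mode names in the table are assumptions for demonstration; the disclosure only requires that the selectable modes include an autoencoder mode.

```python
# Hypothetical mode table: identifier parsed from the bitstream -> intra
# prediction mode. The numeric values and names are illustrative.
SELECTABLE_INTRA_MODES = {
    0: "PLANAR",
    1: "DC",
    2: "ANGULAR",
    3: "AUTOENCODER",  # the autoencoder mode introduced by this disclosure
}

def determine_intra_mode(mode_id: int) -> str:
    """Map the identifier parsed from the bitstream to an intra mode."""
    if mode_id not in SELECTABLE_INTRA_MODES:
        raise ValueError(f"unknown intra prediction mode identifier: {mode_id}")
    return SELECTABLE_INTRA_MODES[mode_id]
```

A decoder would then dispatch to the prediction routine for the returned mode, for example invoking the autoencoder's decoding network when the returned mode is "AUTOENCODER".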
An embodiment of the present disclosure further provides a video encoding method, including:
selecting an intra prediction mode of a current block from a plurality of selectable intra prediction modes, the plurality of selectable intra prediction modes including an autoencoder mode; and
performing intra prediction on the current block based on the intra prediction mode of the current block, and encoding an identifier of the intra prediction mode of the current block and writing it into a bitstream.
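On the encoding side, the mode choice is typically made by comparing a cost across the candidate modes. The sketch below assumes a rate-distortion cost per mode; the cost values and mode names are invented for illustration and are not taken from the disclosure.

```python
# Choose the intra mode with the smallest rate-distortion (RD) cost; the
# chosen mode's identifier is what gets entropy-coded into the bitstream.
def select_intra_mode(rd_costs: dict) -> str:
    """Return the name of the candidate mode with the lowest RD cost."""
    return min(rd_costs, key=rd_costs.get)

# Dummy RD costs for a block where the autoencoder mode happens to win.
rd_costs = {"PLANAR": 11.9, "DC": 12.4, "ANGULAR": 10.8, "AUTOENCODER": 9.7}
chosen = select_intra_mode(rd_costs)
```

In a real encoder, the RD cost of the autoencoder mode would account for the bits spent on the supplementary information as well as the distortion of the nonlinear prediction.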
An embodiment of the present disclosure further provides a video decoding device, including a processor and a memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the video decoding method of any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a video decoding device, including an intra prediction processing unit, wherein the intra prediction processing unit includes:
a mode selection unit, configured to parse an identifier of the intra prediction mode of a current block from a received bitstream and determine the intra prediction mode of the current block from a plurality of selectable intra prediction modes according to the identifier, the plurality of selectable intra prediction modes including an autoencoder mode, and to activate, according to the intra prediction mode of the current block, a corresponding prediction unit to perform intra prediction on the current block; and
an autoencoder prediction unit, configured to, when the intra prediction mode of the current block is the autoencoder mode, perform, based on the decoding network of the autoencoder corresponding to the autoencoder mode, a nonlinear transform on the supplementary information of the current block, or on the supplementary information of the current block together with reconstructed reference information adjacent to the current block, to obtain predicted values of the pixels in the current block.
An embodiment of the present disclosure further provides a video encoding device, including a processor and a memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the video encoding method of any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a video encoding device, including an intra prediction processing unit, wherein the intra prediction processing unit includes:
a mode selection unit, configured to select the intra prediction mode of a current block from a plurality of selectable intra prediction modes, activate, according to the intra prediction mode of the current block, a corresponding prediction unit to perform intra prediction on the current block, and encode an identifier of the intra prediction mode of the current block and write it into a bitstream, wherein the plurality of selectable intra prediction modes include an autoencoder mode; and
an autoencoder prediction unit, configured to, when the intra prediction mode of the current block is the autoencoder mode, perform, based on the encoding network of the autoencoder, a first nonlinear transform on the original values of the pixels in the current block and/or reconstructed reference information adjacent to the current block, to obtain supplementary information of the current block; and, based on the decoding network of the autoencoder, perform a second nonlinear transform on the supplementary information of the current block, or on the supplementary information of the current block together with the adjacent reconstructed reference information, to obtain predicted values of the pixels in the current block.
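The two transforms described above can be illustrated with a toy stand-in: a tiny "encoding network" maps the current block to low-dimensional supplementary information, and a "decoding network" maps that information back to a predicted block. The single tanh layers, random weights, and dimensions are all illustrative assumptions; the disclosure does not specify the network architecture here, and for simplicity this sketch feeds only the current block's pixels to the encoding network, although the disclosure also allows neighboring reconstructed reference information as input.

```python
import numpy as np

rng = np.random.default_rng(0)
BLOCK = 4    # 4x4 current block (illustrative size)
LATENT = 2   # size of the supplementary information (illustrative)

# Random, untrained weights standing in for the trained networks.
W_enc = rng.standard_normal((LATENT, BLOCK * BLOCK))
W_dec = rng.standard_normal((BLOCK * BLOCK, LATENT))

def first_transform(block: np.ndarray) -> np.ndarray:
    """Encoding network: current block -> supplementary information."""
    return np.tanh(W_enc @ block.flatten())

def second_transform(side_info: np.ndarray) -> np.ndarray:
    """Decoding network: supplementary information -> predicted block."""
    return np.tanh(W_dec @ side_info).reshape(BLOCK, BLOCK)

block = rng.random((BLOCK, BLOCK))        # stand-in for original pixel values
side_info = first_transform(block)        # written into the bitstream
prediction = second_transform(side_info)  # reproducible at the decoder
```

The key property this sketch demonstrates is that the second transform uses only the (small) supplementary information, so the decoder can reproduce the same prediction from the bitstream without access to the original pixels.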
An embodiment of the present disclosure further provides a video encoding and decoding system, including the video encoding device of any embodiment of the present disclosure and the video decoding device of any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the video encoding method or the video decoding method of any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a bitstream, wherein the bitstream is generated according to the video encoding method of an embodiment of the present disclosure, and the bitstream includes codewords obtained by encoding the identifier of the intra prediction mode of the current block and the supplementary information of the current block; or the bitstream includes codewords obtained by encoding the identifier of the intra prediction mode of the current block, the supplementary information of the current block, and one or more of the following: a network parameter identifier of the current block, residual values of the pixels in the current block, and a residual identifier of the current block.
Other aspects will become apparent upon reading and understanding the accompanying drawings and the detailed description.
Brief Description of the Drawings
The accompanying drawings are provided to aid understanding of the embodiments of the present disclosure; they constitute a part of the specification and, together with the embodiments, serve to explain the technical solutions of the present disclosure, but they do not limit those technical solutions.
FIG. 1 is a structural block diagram of a video encoding and decoding system usable in embodiments of the present disclosure;
FIG. 2 is a structural block diagram of a video encoder usable in embodiments of the present disclosure;
FIG. 3 is a structural block diagram of a video decoder usable in embodiments of the present disclosure;
FIG. 4 is a schematic diagram of traditional intra prediction modes;
FIG. 5 is a framework diagram of autoencoder-based image compression;
FIG. 6 is a schematic structural diagram of an autoencoder transplanted into the video codec framework according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of reference pixels adjacent to the current block according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of the intra prediction processing unit of a video encoder after the autoencoder has been transplanted in, according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of the modules participating in encoding when residual processing is skipped in the first autoencoder mode in a video encoder according to an embodiment of the present disclosure;
FIG. 10 is a flowchart of a video encoding method according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of the intra prediction processing unit of a video decoder after the autoencoder has been transplanted in, according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of the modules participating in decoding when residual processing is skipped in the second autoencoder mode in a video decoder according to an embodiment of the present disclosure;
FIG. 13 is a flowchart of a video decoding method according to an embodiment of the present disclosure;
FIG. 14 is a schematic structural diagram of a video encoder according to an embodiment of the present disclosure;
FIG. 15A to FIG. 15D, FIG. 16A to FIG. 16D, and FIG. 17A to FIG. 17D are three sets of comparison diagrams between reconstructed blocks obtained with different intra prediction methods and the original blocks.
Detailed Description
The present disclosure describes a number of embodiments, but the description is illustrative rather than restrictive, and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein.
In the description of the present disclosure, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment described as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments. Herein, "and/or" describes an association between related objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. "A plurality of" means two or more. In addition, to describe the technical solutions of the embodiments clearly, words such as "first" and "second" are used to distinguish identical or similar items having substantially the same functions and effects. Those skilled in the art will understand that words such as "first" and "second" limit neither quantity nor execution order, nor do they require the items to be different.
In describing representative exemplary embodiments, the specification may have presented a method and/or process as a particular sequence of steps. However, to the extent that the method or process does not depend on the particular order of steps described herein, it should not be limited to that particular sequence; as those of ordinary skill in the art will appreciate, other orders of steps are possible. Therefore, the particular order of the steps set forth in the specification should not be construed as a limitation on the claims. Furthermore, claims directed to the method and/or process should not be limited to performing their steps in the order written; those skilled in the art can readily appreciate that the order may be varied while remaining within the spirit and scope of the embodiments of the present disclosure.
Internationally, mainstream video coding standards include H.264/Advanced Video Coding (AVC), H.265/High Efficiency Video Coding (HEVC), H.266/Versatile Video Coding (VVC), the MPEG (Moving Picture Experts Group) standards, the AOM (Alliance for Open Media) formats, AVS (Audio Video coding Standard), extensions of these standards, and any other custom standards. Through video compression techniques, these standards reduce the amount of data that must be transmitted and stored, achieving more efficient video encoding, decoding, transmission, and storage.
The mainstream video coding standards above all adopt a block-based hybrid coding scheme. Using the block as the basic unit, this scheme first performs intra prediction or inter prediction, then transforms and quantizes the prediction residual (also simply called the residual), and finally entropy-codes the syntax elements related to partitioning and prediction together with the quantized residual, producing an encoded video bitstream (herein simply called the bitstream; also known as the bit stream or code stream). During encoding and decoding, in-loop filtering can also be applied to improve the quality of the reconstructed picture.
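The hybrid coding loop just described can be sketched minimally as follows, with the transform and entropy-coding steps omitted for brevity: form a residual against the prediction, quantize it, and reconstruct. The block values and the quantization step size are arbitrary illustrative numbers.

```python
import numpy as np

def encode_block(original: np.ndarray, prediction: np.ndarray, qstep: float = 4.0):
    """Residual -> (simplified) quantization; the result would be entropy-coded."""
    residual = original - prediction
    return np.round(residual / qstep).astype(int)

def reconstruct_block(prediction: np.ndarray, quantized: np.ndarray, qstep: float = 4.0):
    """Inverse quantization plus prediction, as the decoder would do."""
    return prediction + quantized * qstep

original = np.array([[100.0, 104.0], [96.0, 92.0]])  # toy 2x2 block
prediction = np.full((2, 2), 100.0)                  # toy intra prediction
q = encode_block(original, prediction)
recon = reconstruct_block(prediction, q)
```

Real codecs additionally transform the residual (e.g. with a DCT) before quantization and apply in-loop filtering to the reconstruction; this sketch only shows the predict/residual/quantize/reconstruct skeleton shared by the standards above.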
In H.264/AVC, the input picture is divided into fixed-size blocks that serve as the basic coding unit, called macroblocks (MB, Macro Block). A macroblock comprises one luma block and two chroma blocks, the luma block being 16×16. With 4:2:0 sampling, each chroma block is half the size of the luma block in each dimension. For prediction, a macroblock is further divided into smaller blocks depending on the prediction mode. In intra prediction, the macroblock can be divided into 16×16, 8×8, or 4×4 sub-blocks, each of which is intra-predicted separately. For transform and quantization, the macroblock is divided into 4×4 or 8×8 sub-blocks, and the residual in each sub-block is transformed and quantized separately to obtain the quantized coefficients.
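The 4:2:0 relationship above is simple arithmetic, shown here for concreteness: each chroma plane is subsampled by a factor of two horizontally and vertically.

```python
def chroma_block_size_420(luma_w: int, luma_h: int):
    """Chroma block dimensions for 4:2:0 sampling: half the luma size per axis."""
    return luma_w // 2, luma_h // 2

# A 16x16 luma macroblock therefore carries two 8x8 chroma blocks.
cb_size = chroma_block_size_420(16, 16)
```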
Compared with H.264/AVC, H.265/HEVC improves multiple coding stages. In H.265/HEVC, a picture is partitioned into Coding Tree Units (CTU, Coding Tree Unit); the CTU is the basic coding unit, corresponding to the macroblock in H.264/AVC. A CTU contains one luma Coding Tree Block (CTB, Coding Tree Block) and two chroma coding tree blocks; in the H.265/HEVC standard, the maximum CU size is generally 64×64. To adapt to diverse video content and characteristics, the CTU is iteratively partitioned in quadtree (QT, Quad Tree) fashion into a series of Coding Units (CU, Coding Unit); the CU is the basic unit of intra/inter coding. A CU contains one luma Coding Block (CB, Coding Block) and two chroma coding blocks together with the related syntax structures; the maximum CU size equals the CTU size and the minimum CU size is 8×8. Depending on the prediction method, the leaf-node CUs obtained from the coding-tree partitioning fall into three types: intra CUs (intra prediction), inter CUs (inter prediction), and skipped CUs. A skipped CU can be regarded as a special case of an inter CU that carries neither motion information nor residual information.
A leaf-node CU contains one or more Prediction Units (PU, Prediction Unit); H.265/HEVC supports PUs from 4×4 to 64×64, with eight partitioning modes in total. For the intra coding mode there are two possible partitioning modes: Part_2Nx2N and Part_NxN. For the residual signal, a CU is partitioned into Transform Units (TU, Transform Unit) using a residual quadtree. A TU contains one luma Transform Block (TB, Transform Block) and two chroma transform blocks. Only square partitioning is allowed; a CB is divided into one or four PBs. A TU uses a single transform and quantization process, with supported sizes from 4×4 to 32×32. Unlike earlier coding standards, in inter prediction a TB may cross PB boundaries to further maximize the coding efficiency of inter coding.
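The quadtree CTU partitioning described above can be sketched as a simple recursion: split a square block into four equal sub-blocks until a split decision says stop or the minimum CU size of 8×8 is reached. The split-decision function here is a hypothetical stand-in; a real encoder decides splits by rate-distortion optimization.

```python
MIN_CU = 8  # minimum CU size in H.265/HEVC

def split_quadtree(x: int, y: int, size: int, should_split):
    """Return the leaf CUs of a quadtree partition as (x, y, size) tuples."""
    if size <= MIN_CU or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += split_quadtree(x + dx, y + dy, half, should_split)
    return leaves

# Example: split a 64x64 CTU once, keeping the four 32x32 children as leaves.
cus = split_quadtree(0, 0, 64, lambda x, y, s: s == 64)
```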
In VVC/H.266, a coded picture is first divided into coding tree units (CTUs) similar to those of HEVC, but the maximum size is increased from 64×64 to 128×128. H.266/VVC introduces quadtree plus nested Multi-Type Tree (MTT, Multi-Type Tree) partitioning, where the MTT includes the Binary Tree (BT) and the Ternary Tree (TT); it unifies the H.265/HEVC concepts of CU, PU, and TU and supports more flexible CU partition shapes. The CTU is partitioned according to a quadtree structure, and the leaf nodes are further partitioned by the MTT. The leaf nodes of the multi-type tree become coding units (CUs); when a CU is no larger than the maximum transform unit (64×64), no further partitioning is performed for subsequent prediction and transform. In most cases CU, PU, and TU have the same size. Considering the different characteristics of luma and chroma and the parallelism of practical implementations, in VVC/H.266 chroma may use a separate partition tree structure rather than following the luma partition tree. In H.266/VVC, the chroma partitioning of I frames uses a chroma separate tree, while the chroma partitioning of P and B frames follows the luma partitioning.
FIG. 1 is a block diagram of a video encoding and decoding system applicable to embodiments of the present disclosure. As shown in FIG. 1, the system is divided into an encoding-side device 1 and a decoding-side device 2; the encoding-side device 1 generates the bitstream, and the decoding-side device 2 decodes it. The encoding-side device 1 and the decoding-side device 2 may each include one or more processors and memory coupled to the processor(s), such as random access memory, electrically erasable programmable read-only memory, flash memory, or other media. The encoding-side device 1 and the decoding-side device 2 may be implemented with various devices, such as desktop computers, mobile computing devices, notebook computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, in-vehicle computers, or other similar devices.
The decoding-side device 2 may receive the bitstream from the encoding-side device 1 via a link 3. The link 3 includes one or more media or devices capable of moving the bitstream from the encoding-side device 1 to the decoding-side device 2. In one example, the link 3 includes one or more communication media that enable the encoding-side device 1 to send the bitstream directly to the decoding-side device 2. The encoding-side device 1 may modulate the bitstream according to a communication standard (for example, a wireless communication protocol) and send the modulated bitstream to the decoding-side device 2. The one or more communication media may include wireless and/or wired communication media, such as the radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (for example, the Internet). The one or more communication media may include routers, switches, base stations, or other equipment that facilitates communication from the encoding-side device 1 to the decoding-side device 2. In another example, the bitstream may also be output from the output interface 15 to a storage device, and the decoding-side device 2 may read the stored data from the storage device via streaming or downloading. The storage device may include any of a variety of distributed-access or locally-accessed data storage media, such as hard disk drives, Blu-ray discs, digital versatile discs, compact discs, flash memory, volatile or non-volatile memory, file servers, and so on.
In the example shown in FIG. 1, the encoding-side device 1 includes a data source 11, an encoder 13, and an output interface 15. In some examples, the data source 11 may include a video capture device (for example, a camera), an archive containing previously captured data, a feed interface for receiving data from a content provider, a computer graphics system for generating data, or a combination of these sources. The encoder 13 encodes the data from the data source 11 and outputs it to the output interface 15, which may include at least one of a regulator, a modem, and a transmitter.
In the example shown in FIG. 1, the decoding-side device 2 includes an input interface 21, a decoder 23, and a display device 25. In some examples, the input interface 21 includes at least one of a receiver and a modem. The input interface 21 may receive the bitstream via the link 3 or from a storage device. The decoder 23 decodes the received bitstream. The display device 25 is used to display the decoded data; it may be integrated with the other components of the decoding-side device 2 or provided separately. The display device 25 may be, for example, a liquid crystal display, a plasma display, an organic light-emitting diode display, or another type of display device. In other examples, the decoding-side device 2 may not include the display device 25, or may include other apparatus or equipment that makes use of the decoded data.
The encoder 13 and the decoder 23 of FIG. 1 may each be implemented with any one of, or any combination of, the following circuits: one or more microprocessors, digital signal processors, application-specific integrated circuits, field-programmable gate arrays, discrete logic, or hardware. If the present disclosure is implemented partly in software, the instructions for the software may be stored in a suitable non-transitory computer-readable storage medium and executed in hardware by one or more processors, thereby implementing the methods of the present disclosure.
FIG. 2 is a structural block diagram of an exemplary video encoder. In this example, the description is based mainly on the terminology and block-partitioning scheme of the H.265/HEVC standard, but the structure of this video encoder can also be used with H.264/AVC, VVC/H.266, MPEG, AOM, AVS and similar standards, as well as successors to and extensions of these standards.
As shown in the figure, the video encoder 20 is used to encode video data and generate a code stream. The video encoder 20 includes a prediction processing unit 100, a partitioning unit 101, a residual generation unit 102, a transform processing unit 104, a quantization unit 106, an inverse quantization unit 108, an inverse transform processing unit 110, a reconstruction unit 112, a filter unit 113, a decoded picture buffer 114, and an entropy coding unit 116. The prediction processing unit 100 includes an inter prediction processing unit 121 and an intra prediction processing unit 126. In other embodiments, the video encoder 20 may contain more, fewer or different functional components than in this example. The residual generation unit 102 and the reconstruction unit 112 are both represented in the figure by circles with plus signs.
The partitioning unit 101 cooperates with the prediction processing unit 100 to partition the received video data into slices, CTUs or other larger units. The video data received by the partitioning unit 101 may be a video sequence including video frames such as I-frames, P-frames or B-frames.
The prediction processing unit 100 may partition a CTU into CUs and perform intra predictive coding or inter predictive coding on the CUs. When intra coding a CU, a 2N×2N CU may be partitioned into 2N×2N or N×N prediction units (PUs) for intra prediction. When inter predicting a CU, a 2N×2N CU may be partitioned into PUs of 2N×2N, 2N×N, N×2N, N×N or other sizes for inter prediction; asymmetric partitioning of PUs may also be supported.
The inter prediction processing unit 121 may perform inter prediction on a PU and generate prediction data for the PU, the prediction data including the prediction block of the PU, the motion information of the PU and various syntax elements.
The intra prediction processing unit 126 may perform intra prediction on a PU and generate prediction data for the PU. The prediction data of a PU may include the prediction block of the PU and various syntax elements. The intra prediction processing unit 126 may try multiple selectable intra prediction modes and select the intra prediction mode with the lowest cost to perform intra prediction on the PU.
The residual generation unit 102 may generate the residual block of a CU based on the original block of the CU and the prediction blocks of the PUs into which the CU is partitioned.
The transform processing unit 104 may partition a CU into one or more transform units (TUs); the residual block associated with a TU is a sub-block obtained by partitioning the residual block of the CU. A coefficient block associated with the TU is generated by applying one or more transforms to the residual block associated with the TU. For example, the transform processing unit 104 may apply a discrete cosine transform (DCT), a directional transform or another transform to the residual block associated with the TU, converting the residual block from the pixel domain to the frequency domain. In some cases the transform processing may also be skipped.
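For illustration of the pixel-to-frequency conversion mentioned above, the following is a minimal sketch of a 1-D DCT-II in plain Python. It is not the integer transform any codec actually uses; real encoders apply scaled integer approximations of this transform separably to the rows and columns of the residual block.

```python
import math

def dct2_1d(residual):
    """Orthonormal 1-D DCT-II: for smooth inputs, energy concentrates
    in the low-frequency (leftmost) coefficients."""
    n = len(residual)
    coeffs = []
    for k in range(n):
        s = sum(residual[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs

# A flat residual row: all energy lands in the first (DC) coefficient.
row = [10.0, 10.0, 10.0, 10.0]
print([round(c, 3) for c in dct2_1d(row)])
```

A flat input compacts into a single nonzero coefficient, which is why transform coefficients of natural residuals cluster toward the top-left of the block.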
The quantization unit 106 may quantize the coefficients in a coefficient block based on a selected quantization parameter (QP). Quantization may introduce quantization loss; the degree of quantization applied to the coefficient block can be adjusted by adjusting the QP value. In some cases the quantization processing may also be skipped.
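The relationship between the QP value and the degree of quantization can be sketched as follows. In HEVC-style codecs the quantization step size roughly doubles for every increase of 6 in QP, Qstep ≈ 2^((QP−4)/6); the nearest-integer rounding below is a deliberate simplification of the encoder's actual quantization.

```python
def qstep(qp):
    # HEVC-style step size: doubles for every +6 in QP.
    return 2 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp):
    step = qstep(qp)
    return [round(c / step) for c in coeffs]

def dequantize(levels, qp):
    step = qstep(qp)
    return [lvl * step for lvl in levels]

coeffs = [100.0, -37.5, 8.0, 0.5]
levels = quantize(coeffs, 22)            # larger QP -> larger step -> coarser levels
recon = dequantize(levels, 22)
print(levels, [round(r, 2) for r in recon])
```

Raising QP from 22 to 28 doubles the step (8 to 16), halving the magnitude of the transmitted levels at the price of larger reconstruction error.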
The inverse quantization unit 108 and the inverse transform unit 110 may respectively apply inverse quantization and an inverse transform to the coefficient block to obtain the reconstructed residual block associated with the TU.
The reconstruction unit 112 may generate the reconstructed block of the CU based on the reconstructed residual block and the prediction block generated by the prediction processing unit 100.
The filter unit 113 performs in-loop filtering on the reconstructed block, which is then stored in the decoded picture buffer 114. The intra prediction processing unit 126 may extract reconstructed reference information adjacent to a PU from the reconstructed blocks cached in the decoded picture buffer 114 in order to perform intra prediction on the PU. The inter prediction processing unit 121 may use reference pictures containing reconstructed blocks cached in the decoded picture buffer 114 to perform inter prediction on PUs of other pictures.
The entropy coding unit 116 may perform entropy coding operations on the received data (such as syntax elements, quantized coefficient blocks and motion information), for example context-adaptive variable-length coding (CAVLC) or context-based adaptive binary arithmetic coding (CABAC), and output the code stream (i.e., the coded video bitstream).
FIG. 3 is a structural block diagram of an exemplary video decoder. In this example, the description is based mainly on the terminology and block-partitioning scheme of the H.265/HEVC standard, but the structure of this video decoder can also be used for video decoding under H.264/AVC, VVC/H.266 and other similar standards.
The video decoder 30 may decode the received code stream and output decoded video data. As shown in the figure, the video decoder 30 includes an entropy decoding unit 150, a prediction processing unit 152, an inverse quantization unit 154, an inverse transform processing unit 156, a reconstruction unit 158 (represented in the figure by a circle with a plus sign), a filter unit 159, and a picture buffer 160. In other embodiments, the video decoder 30 may contain more, fewer or different functional components.
The entropy decoding unit 150 may entropy-decode the received code stream to extract information such as syntax elements, quantized coefficient blocks and PU motion information. The prediction processing unit 152, the inverse quantization unit 154, the inverse transform processing unit 156, the reconstruction unit 158 and the filter unit 159 may all perform their corresponding operations based on the syntax elements extracted from the code stream.
As a functional component performing the reconstruction operation, the inverse quantization unit 154 may inverse-quantize the quantized coefficient block associated with a TU. The inverse transform processing unit 156 may apply one or more inverse transforms to the inverse-quantized coefficient block in order to generate the reconstructed residual block of the TU.
The prediction processing unit 152 includes an inter prediction processing unit 162 and an intra prediction processing unit 164. If a PU is coded using intra prediction, the intra prediction processing unit 164 may determine the intra prediction mode of the PU based on the syntax elements parsed from the code stream, and perform intra prediction according to the determined intra prediction mode and the reconstructed reference information adjacent to the PU obtained from the picture buffer 160, generating the prediction block of the PU. If the PU is coded using inter prediction, the inter prediction processing unit 162 may determine one or more reference blocks of the PU based on the motion information of the PU and the corresponding syntax elements, and generate the prediction block of the PU based on the reference blocks.
The reconstruction unit 158 may obtain the reconstructed block of the CU based on the reconstructed residual block associated with the TU and the prediction block of the PU generated by the prediction processing unit 152 (i.e., the intra prediction data or inter prediction data).
The filter unit 159 may perform in-loop filtering on the reconstructed block of the CU to obtain a reconstructed picture. The reconstructed pictures are stored in the picture buffer 160. The picture buffer 160 may provide reference pictures for subsequent motion compensation, intra prediction, inter prediction and so on, and may also output the reconstructed video data as decoded video data for presentation on a display device.
The basic idea of intra prediction is to use the correlation between adjacent pixels to remove spatial redundancy. In some video coding and decoding methods, intra prediction typically applies various intra prediction modes to the current block, selects the optimal intra prediction mode for the current block through rate-distortion optimization (RDO), performs intra predictive coding according to that optimal mode, and writes the information of that intra prediction mode into the code stream. The decoding end parses the intra prediction mode from the code stream and obtains the prediction data of the current block by predicting according to that intra prediction mode.
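The mode decision described above can be sketched as minimizing the rate-distortion cost J = D + λR over the candidate modes. The candidate names, distortion values and rate values below are hypothetical stand-ins for the encoder's real measurements, used only to show the selection loop.

```python
def select_intra_mode(candidates, lam):
    """Pick the mode minimizing J = D + lambda * R.
    `candidates` maps mode name -> (distortion, rate_in_bits)."""
    best_mode, best_cost = None, float("inf")
    for mode, (dist, rate) in candidates.items():
        cost = dist + lam * rate
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost

# Hypothetical measurements for three candidate modes.
cands = {"planar": (120.0, 10), "dc": (150.0, 8), "angular_34": (90.0, 30)}
print(select_intra_mode(cands, lam=2.0))
```

Note that the cheapest mode to signal ("dc") and the lowest-distortion mode ("angular_34") both lose here: λ trades the two quantities off, and "planar" gives the best balance at λ = 2.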
Through successive generations of digital video coding standards, the non-angular modes of intra prediction have remained relatively stable, comprising the mean (DC) mode and the planar mode, while the number of angular modes has kept increasing as the standards have evolved. Taking the international H-series digital video coding standards as an example, the H.264/AVC standard has only 8 angular prediction modes and 1 non-angular prediction mode, while H.265/HEVC extends these to 33 angular prediction modes and 2 non-angular prediction modes. In H.266/VVC, the intra prediction modes are extended further. For luma blocks, the selectable intra prediction modes include mode 0 (planar mode), mode 1 (DC mode), and modes 2 to 66 (all angular modes), as shown in FIG. 4. For chroma blocks, in addition to the planar, DC and angular modes, VTM also provides the matrix weighted intra prediction (MIP) mode and the cross-component linear model prediction (CCLM) mode. The MIP mode is unique to VVC, while counterparts of the CCLM mode also exist in other advanced standards, for example the Chroma from Luma (CfL) mode of AV1 and the Two Step Cross-component Prediction Mode (TSCPM) of AVS3.
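For reference, the simplest of the non-angular modes listed above, the DC (mean) mode, fills the whole prediction block with the average of the neighboring reconstructed samples. The sketch below follows that idea; the exact reference-sample selection and rounding differ per standard.

```python
def dc_predict(top_row, left_col, width, height):
    """DC intra prediction: every predicted pixel is the rounded mean of
    the reconstructed reference samples above and to the left of the block."""
    refs = list(top_row) + list(left_col)
    dc = (sum(refs) + len(refs) // 2) // len(refs)  # integer rounding
    return [[dc] * width for _ in range(height)]

pred = dc_predict(top_row=[100, 102, 104, 106],
                  left_col=[98, 100, 102, 104],
                  width=4, height=4)
print(pred[0])
```

This also illustrates why DC works well only for flat regions: any texture or gradient in the block is flattened to a single value, which is exactly the limitation the angular and learned modes address.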
In some video coding and decoding methods, the transform can be divided into a primary transform and a secondary transform. The transform is carried out after the prediction stage. In currently popular codecs the transform is applied to the residual, the aim being to concentrate the energy of the residual as much as possible in the top-left corner of the block. The transform may also offer multiple selectable modes; taking VVC as an example, the transform kernel of the primary transform can be DCT2-DCT2, DST7-DST7, DCT8-DCT8, DST7-DCT8 or DCT8-DST7.
In some video coding and decoding methods, quantization is applied to the transformed residual block. Commonly used quantization methods include the rate-distortion optimized quantization (RDOQ) algorithm and trellis-coded quantization (TCQ). Both RDOQ and TCQ perform quantization by means of scalar quantizers.
End-to-end image compression based on deep learning has approached or surpassed many traditional algorithms. At present, most end-to-end image compression uses an autoencoder (AE) structure. The autoencoder here can be understood as a neural network whose input and learning target are the same, its purpose being to make the output close to the input; it can be used to extract features. The AE treats the encoding end and the decoding end as a whole and trains and optimizes them jointly. The network structure of an exemplary autoencoder is shown in FIG. 5.
At the encoding end, z is the image to be compressed; z is turned into a vector x by a set transformation, expressed as x = g_p(z) (this step is optional). x undergoes a nonlinear transformation through the trained encoding network g_a to obtain y, i.e. y = g_a(x; φ), where φ denotes the network parameters of the encoding network g_a to be trained, such as weights and biases. The floating-point y is turned into q by the set quantization method (not shown in the figure). R denotes the number of bits consumed by q after entropy coding. Since entropy coding is lossless, entropy coding and entropy decoding need not appear in training, and their processing is not shown in the figure.
At the decoding end, q is inverse-quantized to ŷ. A nonlinear transformation through the trained decoding network g_s yields x̂, i.e. x̂ = g_s(ŷ; θ), where θ denotes the network parameters of the decoding network g_s to be trained, such as weights and biases. x̂ is then turned into the reconstructed image ẑ through the set transformation, expressed as ẑ = g_p⁻¹(x̂). The loss between the image to be compressed z and the reconstructed image ẑ is D, expressed as D = d(z, ẑ). Through training on the training set until the expected Lagrangian rate-distortion cost is reached, the network parameters φ of the encoding network and the network parameters θ of the decoding network can be obtained.
Through study of the above traditional video coding/decoding methods and the autoencoder image compression method, the inventors of the present application found that in traditional video coding/decoding methods intra prediction uses linear computation, and for some blocks with complex detail features the subjective quality of the reconstructed block is poor. End-to-end video coding/decoding methods use autoencoder-based image compression technology and implement a complete video codec framework with neural networks, showing good results in regions with complex detail features, but in regions with simple detail features they are usually inferior to traditional predictive coding. In addition, in end-to-end video coding/decoding methods quantization is implemented with a simple scalar quantizer, which cannot be adjusted flexibly and makes it difficult to achieve ideal performance.
The embodiments of the present disclosure introduce the autoencoder-based image compression method into a block-level video codec, extending traditional intra prediction by adding, on top of the traditional intra prediction modes, an intra prediction mode implemented with an autoencoder (referred to as the autoencoder mode). This exploits the good coding performance of autoencoder-based image compression in regions with complex detail features while retaining the good coding performance of traditional predictive coding in regions with simple detail features, improving the performance of the video codec.
In order to introduce the autoencoder-based image compression method into a block-level video codec, an embodiment of the present disclosure provides an autoencoder network suitable for intra prediction in video coding/decoding, as shown in FIG. 6. The autoencoder network consists of an encoding network 51 (corresponding to the encoding network g_a in the figure) and a decoding network 52 (corresponding to the decoding network g_s in the figure). The encoding network 51 and the decoding network 52 may each be composed of a series of hidden layers and activation functions. A hidden layer may be composed of nonlinear computations such as convolutions and/or fully connected computations. The encoding network 51 and the decoding network 52 in the figure each include 4 hidden layers and 3 activation functions arranged between adjacent hidden layers, but this is merely exemplary, and the encoding network 51 and the decoding network 52 may be set to other numbers of layers. In other examples, the encoding network 51 and the decoding network 52 of the autoencoder may also be implemented with other forms of structure.
When training the autoencoder, in the embodiment shown in FIG. 6, sample blocks used in video coding/decoding may be selected, and the original values of the pixels within a sample block together with the reconstructed reference information adjacent to the sample block serve as the input of the encoding network 51, where a first nonlinear transformation is performed to obtain the supplementary information. In this embodiment, the supplementary information and the reconstructed reference information adjacent to the sample block together serve as the input of the decoding network 52, where a second nonlinear transformation is performed to obtain the predicted values of the pixels within the sample block.
The encoding network of the autoencoder may be a dimension-reducing neural network, and the decoding network may be a dimension-raising neural network. In one example, sample blocks between 4×4 and 32×32 may be used during training. For blocks of different sizes, supplementary information with different numbers of elements may be used, with larger blocks using supplementary information with more elements. For example, for blocks smaller than 16×16 (4×8, 4×16, 16×4, 8×4, 4×4, 8×16, 16×8, 8×8), the supplementary information obtained by dimension reduction through the encoding network is a 1×2 vector; for other blocks equal to or larger than 16×16, the obtained supplementary information is a 1×4 vector. But this is merely exemplary; the present disclosure places no restriction on the dimensions of the supplementary information, which may for example also be a matrix composed of elements in multiple rows and columns.
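The mapping from block size to supplementary-information length in this example can be sketched as a simple rule. The function name is illustrative, and "smaller than 16×16" is interpreted here as a sample area below 256, which matches the block list given above (all of 4×4 through 16×8 have fewer than 256 samples).

```python
def supplementary_len(width, height):
    """Number of elements in the latent (supplementary information)
    vector per the example above: 1x2 for blocks smaller than 16x16,
    1x4 for blocks equal to or larger than 16x16 (area threshold
    of 256 samples is an interpretation of the text's block list)."""
    return 2 if width * height < 256 else 4

print(supplementary_len(8, 8), supplementary_len(16, 16))
```

The design intuition is that the latent vector carries only what the adjacent reconstructed references cannot supply, so larger blocks, with more content per reference sample, are allotted a few more latent elements.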
In an exemplary embodiment of the present disclosure, the size of the sample block used for training the autoencoder is 16×16, so there are 256 pixels in the block, i.e., 256 original pixel values. The reconstructed reference information of a sample block includes the reconstructed values of reference pixels adjacent to the sample block. As shown in FIG. 7, the reference pixels adjacent to the sample block 61 may include one or more of the following: one or more rows of pixels 62 above the sample block, one or more rows of pixels 65 above and to the right of the sample block, one or more columns of pixels 63 to the left of the sample block, one or more columns of pixels 66 below and to the left of the sample block, and one or more pixels 64 at the top-left corner of the sample block. In the example shown in FIG. 7, the reconstructed values of the 4 rows of pixels above the sample block, the 4 columns of pixels to the left of the sample block, and the 4×4 pixels at the top-left corner of the sample block are used as the reconstructed reference information of the sample block. For a 16×16 sample block, the number of reconstructed pixel values is thus 4×16 (above) + 4×16 (left) + 4×4 (top-left) = 144. But this is merely exemplary. The regions in which the reference pixels adjacent to the sample block are located, and the number of rows and columns of reference pixels in each region, can be adjusted according to factors such as the size of the sample block. For example, when the sample block size is 4×4, the reconstructed values of the 1 row of pixels above the sample block, the 1 row of pixels above and to the right of the sample block, the 1 column of pixels to the left of the sample block, the 1 column of pixels below and to the left of the sample block, and the 1 pixel at the top-left corner of the sample block may be used as the reconstructed reference information of that sample block.
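The reference-sample count worked out above (144 values for a 16×16 block with 4 reference lines) generalizes as follows. This helper is illustrative and covers only the above/left/top-left-corner layout of FIG. 7, not the variants that also use above-right and below-left references.

```python
def num_reference_samples(block_w, block_h, lines):
    """Count reconstructed reference values for the FIG. 7 layout:
    `lines` rows above the block, `lines` columns to its left, and a
    lines x lines square at the top-left corner."""
    return lines * block_w + lines * block_h + lines * lines

print(num_reference_samples(16, 16, 4))  # 4*16 + 4*16 + 4*4 = 144
```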
Although the embodiment shown in FIG. 6 uses the reconstructed reference information adjacent to the sample block, in another embodiment of the present disclosure the training for intra prediction based on the autoencoder mode may also be performed without using the reconstructed reference information adjacent to the sample block; that is, during training the original values of the pixels within the sample block serve as the input of the encoding network 51, and the supplementary information output by the encoding network 51 serves as the input of the decoding network 52.
In the training process of the autoencoder, blocks with different target bit rates may be trained separately, obtaining multiple sets of network parameters corresponding respectively to blocks with different target bit rates (a set of network parameters includes the network parameters of the encoding network and the network parameters of the decoding network). By setting different target bit rates during training, different network parameters can be trained.
In the training process of the autoencoder, blocks of different sizes may be trained separately with sample sets of blocks of the corresponding sizes, obtaining multiple sets of network parameters corresponding respectively to blocks of different sizes.
In the training process of the autoencoder, blocks of different shapes may be trained separately with different sample sets, obtaining multiple sets of network parameters corresponding respectively to blocks of different shapes.
In the training process of the autoencoder, blocks of different types (such as luma blocks and chroma blocks) may be trained separately with sample sets of blocks of the corresponding types, obtaining multiple sets of network parameters corresponding respectively to blocks of different types. For example, the luma block uses a first set of network parameters and the two chroma blocks use a second set of network parameters; or the luma block uses a first set of network parameters, the first chroma block (Cr component) uses a second set of network parameters, and the second chroma block (Cb component) uses a third set of network parameters.
That is to say, multiple sets of network parameters may be trained separately according to the different characteristics of blocks, and the characteristics of a block may be determined according to one or more of the block's target bit rate, size, shape and type. For example, in one example the characteristics of a block are determined according to its target bit rate, size and type; assuming there are 16 combinations of target bit rate, size and type, there can be 16 kinds of blocks with different characteristics, and training the autoencoder with the sample sets of these 16 kinds of blocks yields 16 sets of network parameters, each set corresponding to blocks of one characteristic. When using the autoencoder for intra prediction, the set of network parameters corresponding to the current block that the autoencoder should use can then be determined according to the characteristics of the current block.
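Selecting which trained parameter set the autoencoder loads can be sketched as a lookup keyed on the block's characteristics. The key fields, the table contents and the size threshold below are all hypothetical; only the pattern of "characteristics in, parameter-set id out" reflects the text.

```python
# Hypothetical table: (target_rate, size_class, block_type) -> parameter-set id.
PARAM_SETS = {
    ("low", "small", "luma"): 0,
    ("low", "small", "chroma"): 1,
    ("low", "large", "luma"): 2,
    ("high", "small", "luma"): 3,
    # ... one entry per trained combination of characteristics
}

def pick_param_set(target_rate, width, height, block_type):
    """Map a block's characteristics to the id of the trained
    (encoding + decoding) network parameter set it should use."""
    size_class = "small" if width * height < 256 else "large"
    key = (target_rate, size_class, block_type)
    if key not in PARAM_SETS:
        raise KeyError(f"no parameters trained for {key}")
    return PARAM_SETS[key]

print(pick_param_set("low", 8, 8, "luma"))
```

Because the encoder and decoder derive the same characteristics from already-signaled information (QP, block dimensions, component), both sides resolve the same parameter set without any extra bits in the stream.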
The coding/decoding methods of the embodiments of the present disclosure are not limited to any specific coding standard. However, as described above, the block partitioning schemes, names and so on differ between video coding standards. For uniformity of terminology, the present disclosure collectively refers to the coding units for which an intra prediction mode can be independently selected and intra prediction can be performed during video encoding and decoding as "blocks". A "block" may be a video coding unit (including luma blocks and chroma blocks), or a luma block contained in a video coding unit, or a chroma block contained in a video coding unit, and the video coding unit may be, but is not limited to, a macroblock (MB), a coding tree unit (CTB), a coding unit (CU), a prediction unit (PU) or a transform unit (TU). Herein, the block currently being processed is referred to as the current block. For example, in encoding, the current block refers to the block currently being encoded; in decoding, the current block refers to the block currently being decoded.
In order to introduce the autoencoder-based image compression method into a block-level video codec, the embodiments of the present disclosure transplant the trained autoencoder into a traditional video coding/decoding framework and treat the autoencoder mode as an independent intra prediction mode.
An embodiment of the present disclosure transplants the trained autoencoder into the intra prediction processing unit of the video coding framework; its structure is shown in FIG. 8. The transplanted intra prediction processing unit 126' includes:
a mode selection unit 1261, configured to select the intra prediction mode of the current block from multiple selectable intra prediction modes, activate the corresponding prediction unit to perform intra prediction on the current block according to the selected mode, and encode an identifier of the intra prediction mode of the current block into the bitstream; the multiple selectable intra prediction modes include an autoencoder mode; and
prediction units of multiple modes, each configured to perform intra prediction on the current block based on the corresponding intra prediction mode to obtain predicted values of the pixels in the current block. The prediction units of the multiple modes include:
an autoencoder prediction unit 1263, configured to, when the intra prediction mode of the current block is the autoencoder mode, apply a first nonlinear transform, based on the encoding network of the autoencoder, to the original values of the pixels in the current block and/or the reconstructed reference information adjacent to the current block, obtaining supplementary information of the current block; and apply a second nonlinear transform, based on the decoding network of the autoencoder, to the supplementary information of the current block, or to the supplementary information together with the adjacent reconstructed reference information, obtaining the predicted values of the pixels in the current block (i.e., generating the prediction block of the current block). These predicted values are the prediction result output in the autoencoder mode.
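The data flow through the autoencoder prediction unit 1263 described above can be sketched as follows. This is only a minimal illustration: `encode_net` and `decode_net` are toy hand-written stand-ins for the trained nonlinear transforms, and all function names and the 2x2 block size are hypothetical, not taken from the disclosure.

```python
import math

def encode_net(original_pixels, reference_pixels):
    # First nonlinear transform: original block and/or adjacent reconstructed
    # references -> compact supplementary information. A toy stand-in that
    # produces two nonlinear features, rounded to integers because the
    # supplementary information is entropy-coded as integers.
    s = original_pixels + reference_pixels
    f1 = math.tanh(sum(s) / len(s) / 255.0)
    f2 = math.tanh(max(s) / 255.0)
    return [round(f1 * 10), round(f2 * 10)]

def decode_net(supplementary, reference_pixels):
    # Second nonlinear transform: supplementary information (optionally with
    # the adjacent references) -> predicted pixel values for the block.
    base = sum(reference_pixels) / len(reference_pixels)
    scale = 1.0 + 0.01 * supplementary[0]
    value = min(255, max(0, round(base * scale)))
    return [value] * 4  # a flattened 2x2 prediction block

def autoencoder_intra_predict(original_pixels, reference_pixels):
    supplementary = encode_net(original_pixels, reference_pixels)  # written to bitstream
    predicted = decode_net(supplementary, reference_pixels)
    return supplementary, predicted
```

The key structural point the sketch shows is that, unlike the traditional modes, the encoder-side prediction consumes the original block and emits side information that the decoder will need.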
The intra prediction modes corresponding to the prediction units of the other modes may include one or more of: planar mode, horizontal mode, vertical mode, DC mode, angular mode, cross-component linear model CCLM mode, or matrix-weighted intra prediction MIP mode. The figure takes the planar mode prediction unit 1262 as an example and omits the prediction units of the other modes.
Although the example shown in FIG. 8 has a single autoencoder prediction unit, in another example multiple autoencoder prediction units may be set in the intra prediction processing unit, each performing intra prediction with a different autoencoder (the autoencoders differing in structure). The selectable intra prediction modes of the embodiments of the present disclosure may thus include multiple autoencoder modes; different autoencoder modes may use different autoencoders to perform intra prediction, or may use the same autoencoder with different sets of network parameters.
As can be seen from FIG. 8, in the video encoder, intra prediction in the autoencoder mode differs from traditional intra prediction. Compared with traditional intra prediction, intra prediction in the autoencoder mode generates supplementary information, which must be additionally encoded into the bitstream and is sent to the entropy encoding unit 116 together with the syntax elements. In addition, intra prediction in the autoencoder mode uses the original values of the pixels in the current block (i.e., the original block of the current block) rather than predicting solely from reconstructed reference information, so the supplementary information contains information extracted from the original block. Traditional intra prediction, by contrast, computes the difference between the original and predicted values of the pixels in the current block to obtain residual values (i.e., generates the residual block of the current block), and the information of the original block is conveyed through these residual values.
The autoencoder is trained with the rate-distortion cost between its output predicted values and the original values as the objective, so when the autoencoder mode is used for intra prediction, the distortion of the predicted values of the pixels in the current block relative to their original values is controllable. Residual processing can therefore be skipped when the autoencoder mode is used for intra prediction; here, residual processing includes residual calculation and the transform, quantization, entropy coding, inverse quantization, and inverse transform of the residual. In an exemplary embodiment of the present disclosure, the video encoder uses a first autoencoder mode to perform intra prediction on the current block; the modules that need to perform the corresponding encoding processing are shown in FIG. 9, and the processing performed by the other modules (including the residual generation unit 102, the transform processing unit 104, the quantization unit 106, the inverse quantization unit 108, and the inverse transform processing unit 110) can all be skipped. The reconstructed residual values of the pixels in the current block default to 0. After receiving the predicted values of the pixels in the current block output by the intra prediction processing unit 126, the reconstruction unit 112 may skip filtering and store the predicted values in the decoded picture buffer 114 as the reconstructed values of the pixels in the current block; alternatively, they may first be filtered by the filter unit 113 and then stored in the decoded picture buffer 114. Skipping residual processing when the intra prediction mode is the first autoencoder mode avoids encoding the residual, which saves overhead, greatly reduces encoding complexity, and lightens the encoder's load.
In another exemplary embodiment of the present disclosure, when a second autoencoder mode is used to perform intra prediction on the current block, residual encoding is performed. Referring to FIG. 8 and FIG. 2, the supplementary information output by the encoding network in the autoencoder prediction unit 1263 must be sent to the entropy encoding unit 116 for encoding, while the predicted values output by the decoding network in the autoencoder prediction unit 1263, like those produced by the prediction units of the traditional modes, must be sent to the residual generation unit 102 to generate residual information; the residual information is subsequently transformed, quantized, inverse quantized, and inverse transformed, and the quantized residual information is encoded into the bitstream. Compared with the foregoing embodiment that skips residual processing, this embodiment involves more complex encoding operations but can achieve better image quality.
The first autoencoder mode and the second autoencoder mode, which differ in how the residual is handled, may serve as two different intra prediction modes for the intra prediction processing unit in the video encoder to choose from, or as two sub-modes of a single autoencoder mode. The encoder compares the coding costs of the two sub-modes: when the coding cost of the current block with residual processing skipped is less than or equal to its coding cost with residual processing, the first autoencoder mode is used, i.e., residual processing is skipped, saving overhead and simplifying computation; when the coding cost with residual processing skipped is greater than that with residual processing, the second autoencoder mode is used, i.e., residual processing is performed, to guarantee video quality.
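The sub-mode decision described above reduces to a single comparison. A minimal sketch, assuming the two coding costs of the current block have already been evaluated (the function and mode names are illustrative, not from the disclosure):

```python
def choose_autoencoder_submode(cost_skip_residual, cost_with_residual):
    # First sub-mode: skip residual processing when its coding cost is
    # less than or equal to the cost of coding the residual.
    # Second sub-mode: code the residual, trading complexity for quality.
    if cost_skip_residual <= cost_with_residual:
        return "first_autoencoder_mode"   # residual processing skipped
    return "second_autoencoder_mode"      # residual coded
```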
An embodiment of the present disclosure provides a video encoding method, and also an intra prediction method for video encoding, as shown in FIG. 10, including:
Step 610: selecting the intra prediction mode of the current block from multiple selectable intra prediction modes, where the multiple selectable intra prediction modes include an autoencoder mode;
Step 620: performing intra prediction on the current block based on its intra prediction mode, and encoding an identifier of the intra prediction mode of the current block into the bitstream.
The video encoding method of this embodiment introduces the autoencoder mode as one of the intra prediction modes, adding an intra prediction mode based on nonlinear computation, which can improve the video encoder's coding performance on images with complex detail features.
In an exemplary embodiment of the present disclosure, when the intra prediction mode of the current block is the autoencoder mode, performing intra prediction on the current block based on its intra prediction mode includes:
obtaining the supplementary information of the current block through an encoding process, the encoding process including applying, based on the encoding network of the autoencoder corresponding to the autoencoder mode, a first nonlinear transform to the original values of the pixels in the current block and/or the reconstructed reference information adjacent to the current block; and
applying, based on the decoding network of the autoencoder corresponding to the autoencoder mode, a second nonlinear transform to the supplementary information of the current block, or to the supplementary information of the current block together with the reconstructed reference information adjacent to the current block, to obtain the predicted values of the pixels in the current block.
In an exemplary embodiment of the present disclosure, the multiple selectable intra prediction modes further include one or more of the following modes: planar mode, horizontal mode, vertical mode, DC mode, angular mode, cross-component linear model CCLM mode, and matrix-weighted intra prediction MIP mode. The present disclosure places no restriction on the other intra prediction modes; they may be intra prediction modes already adopted in existing codec standards, or intra prediction modes not yet adopted in any existing codec standard.
In an exemplary embodiment of the present disclosure, the identifier of the intra prediction mode includes an autoencoder mode flag, where one value of the flag indicates that the intra prediction mode is the autoencoder mode and the other value indicates that it is not; alternatively, the identifier of the intra prediction mode includes an index number of the intra prediction mode.
In an exemplary embodiment of the present disclosure, the reconstructed reference information adjacent to the current block includes the reconstructed values of reference pixels adjacent to the current block, where the reference pixels adjacent to the current block include one or more of the following:
one or more rows of pixels above the current block;
one or more rows of pixels above and to the right of the current block;
one or more columns of pixels to the left of the current block;
one or more columns of pixels below and to the left of the current block;
one or more pixels at the top-left corner of the current block.
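The neighboring reference pixels listed above can be gathered from a reconstructed picture as in the following sketch. It is a simplified assumption that one reference line is used and that all neighbors are available (real codecs must handle picture boundaries and unavailable neighbors); the function name and layout are illustrative.

```python
def gather_reference_pixels(recon, x0, y0, w, h):
    """Collect one line of reconstructed reference pixels around the block
    at (x0, y0) of size w x h: the row above, the row above-right, the
    column to the left, the column below-left, and the top-left corner.
    `recon` is a 2D list of reconstructed samples indexed as recon[y][x]."""
    above       = [recon[y0 - 1][x] for x in range(x0, x0 + w)]
    above_right = [recon[y0 - 1][x] for x in range(x0 + w, x0 + 2 * w)]
    left        = [recon[y][x0 - 1] for y in range(y0, y0 + h)]
    below_left  = [recon[y][x0 - 1] for y in range(y0 + h, y0 + 2 * h)]
    top_left    = [recon[y0 - 1][x0 - 1]]
    return above + above_right + left + below_left + top_left
```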
As described above, in end-to-end video codecs quantization is implemented with a simple scalar quantizer that cannot be adjusted flexibly; after it is transplanted into the intra prediction module of the video encoder framework, this limits the achievable gain in coding performance. Typically, the supplementary information consists of multiple integer values, and these values affect both the number of bits consumed in encoding the supplementary information and the magnitude of the distortion between the predicted and original values; the integer values can be adjusted to minimize the bit overhead and distortion. Therefore, in an embodiment of the present disclosure, the encoding process of the intra prediction further includes: determining multiple fine-tuning schemes for the element values output by the first nonlinear transform; computing the coding cost (e.g., the rate-distortion cost) of the current block under each of the fine-tuning schemes; fine-tuning the element values according to the scheme with the smallest coding cost; and using the fine-tuned element values as the supplementary information of the current block. This embodiment fine-tunes the element values output by the autoencoder's encoding network and selects the optimal fine-tuning scheme by coding cost, which increases quantization flexibility and improves coding performance. The coding costs of the current block under the multiple fine-tuning schemes may be computed separately with a transform mode already selected; alternatively, the multiple fine-tuning schemes may be combined with the selectable transform modes, and the fine-tuning scheme in the combination with the smallest coding cost is taken as the scheme with the smallest coding cost.
Because neural network training requires precise gradient computation, parameters such as network weights are floating-point numbers, so the output of each network layer is also floating-point; the supplementary information, however, must undergo entropy coding, and entropy coding operates on integers. The present disclosure therefore proposes two approaches: the first quantizes the trained network weights so that the autoencoder's encoding network outputs integers when in use; the second quantizes the floating-point output of the autoencoder's encoding network into integers.
For these two cases, the following two fine-tuning approaches can be adopted, respectively:
First: when the element values output by the first nonlinear transform are integers, the multiple fine-tuning schemes include some or all of the combinations obtained by combining the possible fine-tuned values of some or all of the element values output by the first nonlinear transform, where the possible fine-tuned values of an element value include the element value itself, the element value plus 1, and the element value minus 1.
In this approach, the weights of the autoencoder's encoding and decoding networks are integers, meaning the trained network weights have already been quantized, so the supplementary information output by the encoding network is already integer-valued. Because weight quantization introduces distortion, this output is not optimal. The present disclosure therefore tries, element by element from the outside in, adding 1 to, subtracting 1 from, and keeping each element of the supplementary information, using the Lagrangian rate-distortion cost as the criterion to decide the optimal supplementary information. For example, for the 1x2 vector in this implementation, the supplementary information has 3² combinations in total; the bit overhead and the distortion of the reconstructed block are computed for each combination, and the optimal combination is selected as the final supplementary information written into the bitstream. This can improve video coding performance.
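The exhaustive ±1 fine-tuning described above can be sketched as follows; `coding_cost` is a hypothetical placeholder for the Lagrangian rate-distortion evaluation of one candidate, not the codec's actual cost function.

```python
from itertools import product

def fine_tune_integer(elements, coding_cost):
    # For each integer element output by the encoding network, try the
    # value itself, the value plus 1, and the value minus 1; evaluate
    # every combination and keep the one with the smallest coding cost.
    candidates = [(v - 1, v, v + 1) for v in elements]
    return min(product(*candidates), key=coding_cost)
```

For a two-element vector this enumerates 3² = 9 candidates, matching the combination count given above.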
Second: when the element values output by the first nonlinear transform are floating-point numbers, the multiple fine-tuning schemes include some or all of the combinations obtained by combining the possible fine-tuned values of some or all of the element values output by the first nonlinear transform, where the possible fine-tuned values of an element value include the value obtained by rounding the element value up and the value obtained by rounding it down.
For the output of the autoencoder's encoding network, one optional quantization method is rounding to the nearest integer, but quantization brings a loss of precision. Instead of the rounding operation, this embodiment of the present disclosure tries rounding each element value contained in the supplementary information both up and down, combining the attempts into multiple fine-tuning schemes and using the Lagrangian rate-distortion cost as the criterion to decide the optimal supplementary information. For example, for the 1x2 vector in this implementation, the supplementary information has 2² combinations in total; the rate-distortion cost is computed for each combination, and the optimal combination is selected as the final supplementary information written into the bitstream. This can improve video coding performance.
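For the floating-point case, the floor/ceiling fine-tuning can be sketched analogously, again with a hypothetical placeholder cost function standing in for the Lagrangian rate-distortion evaluation:

```python
import math
from itertools import product

def fine_tune_float(elements, coding_cost):
    # For each floating-point element, consider rounding down and rounding
    # up; evaluate all 2^n integer combinations and keep the cheapest one,
    # instead of committing to plain nearest-integer rounding.
    candidates = [(math.floor(v), math.ceil(v)) for v in elements]
    return min(product(*candidates), key=coding_cost)
```

For a two-element vector this enumerates the 2² = 4 candidates mentioned above; note the winning combination need not be the nearest-integer rounding of the original values.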
In an embodiment of the present disclosure, selecting the intra prediction mode of the current block from the multiple selectable intra prediction modes includes:
determining which of the multiple selectable intra prediction modes require cost computation;
computing the coding cost of each intra prediction mode that requires cost computation; and
selecting the intra prediction mode with the smallest coding cost as the intra prediction mode of the current block;
where, for any block, the autoencoder mode is treated as an intra prediction mode requiring cost computation; or the autoencoder mode is treated as requiring cost computation only if the characteristics of the current block match the configured characteristics of blocks permitted to use the autoencoder mode, the characteristics being determined by one or more of target bitrate, size, shape, and type. In one example, the autoencoder mode is treated as an intra prediction mode requiring cost computation only for blocks with sizes between 4x4 and 32x32; for blocks of other sizes, the autoencoder mode is not treated as an intra prediction mode requiring cost computation.
In the embodiments of the present disclosure, the coding cost is measured by the Lagrangian rate-distortion cost J = λR + D, where R is the bit overhead of the codeword after encoding the current block (for the autoencoder mode, the overhead of the supplementary information must additionally be counted relative to the other modes), D is the distortion of the reconstructed block of the current block relative to the original block, and λ is determined from the quantization parameter. When the rate-distortion cost of the autoencoder mode is the smallest, the autoencoder mode flag of the current block may be set to 1 and the supplementary information encoded into the bitstream.
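The rate-distortion driven mode decision above can be sketched as follows; the bit counts, distortions, and mode names are hypothetical inputs, and `lmbda` stands for the λ derived from the quantization parameter:

```python
def lagrangian_cost(bits, distortion, lmbda):
    # J = lambda * R + D, with R the bit overhead of the coded block (for
    # the autoencoder mode, R additionally includes the bits spent on the
    # supplementary information) and D the reconstruction distortion.
    return lmbda * bits + distortion

def select_intra_mode(mode_stats, lmbda):
    # mode_stats maps each candidate mode requiring a full rate-distortion
    # evaluation to its (bits, distortion) pair; pick the smallest J.
    costs = {m: lagrangian_cost(r, d, lmbda) for m, (r, d) in mode_stats.items()}
    return min(costs, key=costs.get)
```

Note how λ arbitrates the trade-off: a larger λ penalizes the extra bits of the supplementary information more heavily, so the autoencoder mode wins only when its distortion reduction justifies its rate overhead.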
In the embodiments of the present disclosure, the autoencoder mode is added to the most probable mode list of traditional intra prediction. The most probable mode list contains multiple screened intra prediction modes that require full rate-distortion optimization; for each intra prediction mode in the list, the rate-distortion cost of the current block under that mode must be computed for comparison. By adding the autoencoder mode to the list and comparing its cost against the other intra prediction modes, the advantages of the autoencoder mode can be fully exploited to optimize the performance of the video encoder.
In one example of this embodiment, when the autoencoder mode includes a first autoencoder mode, in computing the coding cost corresponding to the first autoencoder mode from the Lagrangian rate-distortion formula, the bit overhead in the formula includes the overhead of the supplementary information of the current block and of the identifier of the intra prediction mode, and the distortion in the formula is determined from the difference between the original and predicted values of the pixels in the current block.
In another example of this embodiment, when the autoencoder mode is a second autoencoder mode, in computing the rate-distortion cost corresponding to the second autoencoder mode from the Lagrangian rate-distortion formula, the bit overhead in the formula includes the overhead of the supplementary information of the current block, the residual values of the pixels in the current block, and the identifier of the intra prediction mode of the current block; the distortion in the formula is determined from the original and reconstructed values of the pixels in the current block, where the reconstructed value of a pixel in the current block equals its predicted value plus the reconstructed residual value (the reconstructed residual value being output by the inverse quantization unit).
In one example of this embodiment, restrictions may be imposed on which current blocks may select the autoencoder mode; for example, only blocks with sizes between 4x4 and 32x32 are allowed to use the autoencoder mode, and blocks of different sizes may produce supplementary information of different data volumes. For example, for blocks smaller than 16x16 the supplementary information contains two elements, while for other blocks, of size 16x16 or larger, it contains four elements. Block size is correlated with the effectiveness of intra coding in the autoencoder mode; restricting the sizes of blocks that use the autoencoder makes full use of the mode's advantages while simplifying the computation during mode selection.
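The size-based restrictions in this example can be expressed as a simple helper; the 4x4, 16x16, and 32x32 thresholds are the ones stated above, and the interpretation of "smaller than 16x16" as both dimensions under 16 is an assumption made for the sketch.

```python
def autoencoder_mode_params(width, height):
    """Return None when the block may not use the autoencoder mode;
    otherwise return the number of elements in its supplementary
    information."""
    if not (4 <= width <= 32 and 4 <= height <= 32):
        return None        # outside 4x4..32x32: autoencoder mode disallowed
    if width < 16 and height < 16:
        return 2           # smaller than 16x16: two elements
    return 4               # 16x16 and larger: four elements
```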
In one example of this embodiment, for the luma block and chroma blocks at the same spatial position (e.g., in the same video coding unit), the intra prediction mode selected for the luma block may be used as the intra prediction mode of the chroma blocks at that spatial position. If the autoencoder mode is selected for the luma block of a video coding unit, the autoencoder mode is also selected for the chroma blocks of that video coding unit.
In an embodiment of the present disclosure, the autoencoder has one trained set of network parameters.
In an embodiment of the present disclosure, the autoencoder has multiple trained sets of network parameters, each set containing the network parameters of the autoencoder's encoding network and decoding network. The different sets are trained separately on sample sets of the blocks to which they correspond; the blocks corresponding to different sets of network parameters have different characteristics, determined by one or more of target bitrate, size, shape, and type. In this embodiment, when the intra prediction mode of the current block is the autoencoder mode, the network parameters of the autoencoder's encoding and decoding networks may be determined as follows: according to the characteristics of the current block, the set of network parameters corresponding to the current block is found among the multiple sets, and the network parameters of the encoding and decoding networks are determined from that set. In addition, in this embodiment, the video encoding method further includes: encoding a network parameter identifier of the current block into the bitstream, where the network parameter identifier of the current block indicates which of the multiple sets of network parameters corresponds to the current block.
Adding the network parameter identifier to the bitstream allows the video decoder to quickly determine the network parameters of the autoencoder's decoding network. In another example, the intra prediction processing unit in the video decoder may instead look up the corresponding network parameter identifier from the characteristics of the current block, saving the overhead of encoding the network parameter identifier.
In an embodiment of the present disclosure, when the current block is a video coding unit containing one luma block and two chroma blocks, the autoencoders used for the one luma block and two chroma blocks of the video coding unit may be arranged in three different ways:
The first way uses a single autoencoder, trained with the sample data of the one luma block and two chroma blocks as its input data; the supplementary information output by its encoding network is shared by the one luma block and two chroma blocks. Correspondingly, when performing intra prediction encoding, obtaining the supplementary information of the current block through the encoding process includes: based on the encoding network of the one autoencoder of the autoencoder mode corresponding to the video coding unit, applying a first nonlinear transform to the original values of the pixels in the one luma block and two chroma blocks and/or the reconstructed reference information adjacent to them, to obtain the supplementary information shared by the one luma block and two chroma blocks. Using one autoencoder to perform intra prediction on one luma block and two chroma blocks simultaneously improves computational efficiency and speeds up prediction.
The second way is to use two autoencoders. One autoencoder is trained with sample data related to the one luma block as input data; the supplementary information output by its encoding network is the supplementary information of the one luma block. The other autoencoder is trained with sample data related to the two chroma blocks as input data; the supplementary information output by its encoding network is the supplementary information shared by the two chroma blocks. Correspondingly, when intra prediction encoding is performed, obtaining the supplementary information of the current block through encoding processing includes: based on the encoding network of the one autoencoder corresponding to the luma block in the autoencoder mode, performing a first nonlinear transformation on the original values of the pixels in the one luma block and/or the reconstructed reference information adjacent to the one luma block, to obtain the supplementary information of the one luma block; and, based on the encoding network of the one autoencoder corresponding to the chroma blocks in the autoencoder mode, performing a first nonlinear transformation on the original values of the pixels in the two chroma blocks and/or the reconstructed reference information adjacent to the two chroma blocks, to obtain the supplementary information shared by the two chroma blocks.
The third way is to use three autoencoders. The first autoencoder is trained with sample data related to the one luma block as input data; the supplementary information output by its encoding network is the supplementary information of the one luma block. The second autoencoder is trained with sample data related to the first chroma block of the two chroma blocks as input data; the supplementary information output by its encoding network is the supplementary information of the first chroma block. The third autoencoder is trained with sample data related to the second chroma block of the two chroma blocks as input data; the supplementary information output by its encoding network is the supplementary information of the second chroma block.
Correspondingly, when intra prediction encoding is performed, obtaining the supplementary information of the current block through encoding processing includes: based on the encoding network of the one autoencoder corresponding to the luma block in the autoencoder mode, performing a first nonlinear transformation on the original values of the pixels in the one luma block and/or the reconstructed reference information adjacent to the one luma block, to obtain the supplementary information of the one luma block; based on the encoding network of the one autoencoder corresponding to the first chroma block of the two chroma blocks in the autoencoder mode, performing a first nonlinear transformation on the original values of the pixels in the first chroma block and/or the adjacent reconstructed reference information, to obtain the supplementary information of the first chroma block; and, based on the encoding network of the one autoencoder corresponding to the second chroma block of the two chroma blocks in the autoencoder mode, performing a first nonlinear transformation on the original values of the pixels in the second chroma block and/or the adjacent reconstructed reference information, to obtain the supplementary information of the second chroma block.
If the current block is a luma block included in a video coding unit, when the autoencoder mode is selected, one autoencoder can be used to perform intra prediction on the luma block. If the current block is a chroma block included in a video coding unit, and the two chroma blocks included in the same video coding unit jointly undergo intra prediction mode selection and intra prediction, one autoencoder can be trained with the sample data of the two chroma blocks, and that autoencoder can be used to perform intra prediction on the two chroma blocks to obtain supplementary information shared by the two chroma blocks. If the current block is a chroma block included in a video coding unit, but intra prediction mode selection and intra prediction are performed separately for the two chroma blocks included in the same video coding unit, two autoencoders can be trained separately with the sample data of the two chroma blocks and used to perform intra prediction on the two chroma blocks respectively; the specific manner is similar to the foregoing embodiments and is not repeated here.
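The three arrangements above differ only in how the components of the coding unit are grouped per autoencoder. As a minimal sketch (the function and component names are illustrative, not part of the disclosure), the grouping can be written as:

```python
def autoencoder_groups(arrangement):
    """Return, for each autoencoder, the list of components whose samples
    it is trained on and whose supplementary information it produces."""
    if arrangement == 1:                       # one shared autoencoder
        return [["Y", "Cb", "Cr"]]
    if arrangement == 2:                       # luma / joint chroma
        return [["Y"], ["Cb", "Cr"]]
    if arrangement == 3:                       # one autoencoder per component
        return [["Y"], ["Cb"], ["Cr"]]
    raise ValueError("arrangement must be 1, 2 or 3")

for n in (1, 2, 3):
    print(n, autoencoder_groups(n))            # arrangement n uses n autoencoders
```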
In an embodiment of the present disclosure, the encoding network of the autoencoder is a dimension-reducing neural network.
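To illustrate the dimension-reducing encoding network, the following sketch (hypothetical, not from the disclosure; the layer sizes, ReLU activation, and toy weights are all assumptions) maps a block's original pixels together with adjacent reconstructed reference pixels to a much shorter supplementary-information vector through a small fully connected network:

```python
import numpy as np

def encode_supplementary(block_pixels, ref_pixels, w1, b1, w2, b2):
    """Hypothetical encoding network of the autoencoder: a two-layer MLP
    that applies the first nonlinear transformation and reduces the
    dimension of (block pixels + reference pixels) to a short
    supplementary-information vector."""
    x = np.concatenate([block_pixels.ravel(), ref_pixels.ravel()])
    h = np.maximum(0.0, w1 @ x + b1)          # nonlinear hidden layer (ReLU)
    return w2 @ h + b2                        # low-dimensional output

# Toy shapes: a 4x4 block plus 9 reference pixels -> 4 supplementary values.
rng = np.random.default_rng(0)
block = rng.uniform(0, 255, (4, 4))
refs = rng.uniform(0, 255, 9)
w1, b1 = rng.standard_normal((8, 25)), np.zeros(8)
w2, b2 = rng.standard_normal((4, 8)), np.zeros(4)

side_info = encode_supplementary(block, refs, w1, b1, w2, b2)
print(side_info.shape)  # far fewer values than the 16 block pixels
```

In a real codec the weights would be the trained network parameters and the output would be entropy coded into the stream; the point of the sketch is only the dimension reduction.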
In an embodiment of the video encoding method of the present disclosure, when the intra prediction mode of the current block is the autoencoder mode, the video encoding method further includes: encoding the supplementary information of the current block and writing it into the code stream.
In an example of the video encoding method of the present disclosure, the autoencoder mode includes a first autoencoder mode; when the intra prediction mode of the current block is the first autoencoder mode, the video encoding method further includes: assuming by default that the reconstructed residual values of the pixels in the current block are equal to 0, and skipping residual processing. This method can save the overhead of residual coding, simplify the computational complexity of intra prediction encoding, and reduce the occupation of video encoder resources.
In an example of the video encoding method of the present disclosure, the autoencoder mode includes a second autoencoder mode; when the intra prediction mode of the current block is the second autoencoder mode, the video encoding method further includes: obtaining the residual values of the pixels in the current block according to the differences between the original values and the predicted values of the pixels in the current block; and encoding the residual values of the pixels in the current block and writing them into the code stream. Encoding the residual values includes entropy coding, and may further include one or more of transform and quantization. This method retains residual processing in the autoencoder mode, which can improve the quality of the decoded video. Since intra prediction in the autoencoder mode requires encoding the supplementary information, a larger quantization step size can be used for encoding the residual information.
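A minimal sketch of the residual path in this second autoencoder mode follows. The step value and names are illustrative assumptions, and a real codec would normally apply a transform before quantization and entropy code the levels; the sketch only shows why a larger quantization step is tolerable here (the reconstruction error stays bounded by half the step):

```python
import numpy as np

def code_residual(orig, pred, qstep):
    """Quantize the prediction residual with step qstep, then reconstruct
    as the decoder would."""
    resid = orig - pred                       # residual = original - predicted
    levels = np.round(resid / qstep)          # quantization levels for the stream
    recon_resid = levels * qstep              # dequantized residual at the decoder
    return levels, pred + recon_resid         # reconstructed pixel values

orig = np.array([100.0, 103.0, 98.0, 101.0])
pred = np.array([ 99.0, 100.0, 99.0, 100.0])
levels, recon = code_residual(orig, pred, qstep=2.0)
print(levels)   # levels that would be entropy coded
print(recon)    # reconstruction error is at most qstep / 2
```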
In an embodiment of the video encoding method of the present disclosure, when intra prediction is performed on the current block based on the autoencoder mode, the video encoding method further includes:
when J1 < J2, performing residual processing on the current block, where the residual processing includes: obtaining the residual values of the pixels in the current block according to the differences between the original values and the predicted values of the pixels in the current block, and encoding the residual values of the pixels in the current block and writing them into the code stream;
when J1 ≥ J2, skipping the residual processing of the current block;
where J1 is the coding cost of the current block when residual processing is performed, and J2 is the coding cost of the current block when residual processing is skipped.
The selection between the two residual processing manners in the present disclosure is not limited to the above manner. For example, a threshold-based decision can also be used: when J2 is less than or equal to a set threshold, the residual processing of the current block is skipped, and when J2 is greater than the set threshold, residual processing is performed on the current block.
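The two selection rules above can be sketched as follows (the function names are illustrative; in practice J1 and J2 would be rate-distortion costs of the form J = D + λ·R):

```python
def choose_by_cost(j1, j2):
    """Compare the coding cost with residual processing (j1) against the
    cost with residual processing skipped (j2); code the residual only
    when doing so is strictly cheaper."""
    return "code_residual" if j1 < j2 else "skip_residual"

def choose_by_threshold(j2, threshold):
    """Alternative rule: skip residual processing whenever the skip cost
    j2 is already at or below a preset threshold."""
    return "skip_residual" if j2 <= threshold else "code_residual"

print(choose_by_cost(120.0, 150.0))        # residual processing is cheaper
print(choose_by_threshold(150.0, 200.0))   # skip: j2 is within the threshold
```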
In an embodiment of the video encoding method of the present disclosure, when intra prediction is performed on the current block based on the autoencoder mode and either skipping or performing residual processing can be selected, the video encoding method further includes:
generating a residual identifier of the current block, where, when residual processing is performed on the current block, the residual identifier indicates that residual data of the current block is present in the code stream, and when the residual processing of the current block is skipped, the residual identifier indicates that no residual data of the current block is present in the code stream; and
encoding the residual identifier of the current block and writing it into the code stream.
In one example, the above residual identifier can be represented by one bit, where one value of the bit indicates that residual data of the current block is present in the code stream, and the other value indicates that no residual data of the current block is present in the code stream. In another example, the above residual identifier can be merged with the identifier of the intra prediction mode. For example, when the identifier of the intra prediction mode indicates that the intra prediction mode of the current block is the first autoencoder mode (which may also be a sub-mode of the autoencoder mode), it indicates that no residual data of the current block is present in the code stream; when the identifier of the intra prediction mode indicates that the intra prediction mode of the current block is the second autoencoder mode (which may also be another sub-mode of the autoencoder mode), it indicates that residual data of the current block is present in the code stream.
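Both signalling variants can be sketched as follows (the bit values and mode index values are hypothetical, chosen only for illustration):

```python
# Variant 1: a dedicated one-bit residual identifier.
def write_residual_flag(has_residual):
    """One bit: 1 means residual data is present in the code stream."""
    return 1 if has_residual else 0

# Variant 2: the identifier is folded into the intra prediction mode itself,
# using two autoencoder sub-modes (these index values are assumptions).
AE_MODE_NO_RESIDUAL = 34   # first autoencoder mode: no residual data
AE_MODE_RESIDUAL    = 35   # second autoencoder mode: residual data present

def residual_present(mode_id):
    """Infer residual presence from the signalled mode; no extra bit needed."""
    return mode_id == AE_MODE_RESIDUAL

print(write_residual_flag(True), residual_present(AE_MODE_NO_RESIDUAL))
```

Variant 2 saves the dedicated bit at the cost of one extra mode value in the mode identifier alphabet.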
In an embodiment of the present disclosure, the trained autoencoder is ported into the intra prediction processing unit of the video decoder, whose structure is shown in FIG. 11. The ported intra prediction processing unit 164' includes:
a mode selection unit 1641, configured to parse the identifier of the intra prediction mode of the current block from the received code stream, determine the intra prediction mode of the current block from multiple selectable intra prediction modes according to the identifier, and activate the corresponding prediction unit according to the intra prediction mode of the current block to perform intra prediction on the current block, where the multiple selectable intra prediction modes include an autoencoder mode; and
prediction units of multiple modes, configured to perform intra prediction on the current block based on the corresponding intra prediction modes to obtain the predicted values of the pixels in the current block, including:
an autoencoder prediction unit 1643, configured to, when the intra prediction mode of the current block is the autoencoder mode, perform a nonlinear transformation on the supplementary information of the current block, or on the supplementary information of the current block and the reconstructed reference information adjacent to the current block, based on the decoding network of the autoencoder corresponding to the autoencoder mode, to obtain the predicted values of the pixels in the current block (that is, to generate the prediction block of the current block). In one example, the autoencoder prediction unit 1643 in the video decoder uses the same decoding network as the autoencoder prediction unit 1263 in the video encoder.
Prediction units of other modes may include, for example, prediction units of one or more of planar mode, horizontal mode, vertical mode, DC mode, angular mode, cross-component linear model (CCLM) mode, or matrix-weighted intra prediction (MIP) mode. In the figure, the planar mode prediction unit 1642 is shown as an example, and the other prediction units are omitted.
The identifier of the intra prediction mode and the supplementary information described above are obtained by parsing the code stream; specifically, they can be obtained by the entropy decoding unit decoding the code stream and then be passed to the intra prediction processing unit 164'.
Although the example shown in FIG. 11 shows one autoencoder prediction unit, in another example, the embodiments of the present disclosure may also provide multiple autoencoder prediction units in the intra prediction processing unit, which perform intra prediction using different autoencoders respectively. The selectable intra prediction modes in the embodiments of the present disclosure may include multiple autoencoder modes (such as the first autoencoder mode and the second autoencoder mode); different autoencoder modes may use different autoencoders to perform intra prediction, or may use the same autoencoder configured with different network parameters to perform intra prediction in the different autoencoder modes.
As can be seen from FIG. 11, in the video decoder, the input of intra prediction in the autoencoder mode includes not only the reconstructed reference information but also the supplementary information parsed from the code stream. As described above, the supplementary information contains information extracted from the original block. Similar to the encoding side, when the video decoder performs intra prediction in the autoencoder mode, residual processing can also be skipped, that is, entropy decoding, inverse quantization, inverse transform, and other processing of the residual are no longer performed. In an exemplary embodiment of the present disclosure, when the video decoder performs intra prediction on the current block in the first autoencoder mode, the modules that need to perform the corresponding decoding processing are shown in FIG. 12, and the processing performed by the other modules (including the inverse quantization unit 154 and the inverse transform processing unit 155) can be skipped. The reconstructed residual values of the pixels in the current block are assumed by default to be equal to 0; after the reconstruction unit 158 receives the predicted values of the pixels in the current block output by the intra prediction processing unit 164, the filtering processing can be omitted (filtering processing may also be performed), and the predicted values are stored in the picture buffer 160 as the reconstructed values of the pixels in the current block. Skipping residual processing can reduce the coding overhead and the decoding complexity, and lighten the burden on the decoder.
In another exemplary embodiment of the present disclosure, the video decoder performs intra prediction on the current block in the second autoencoder mode and performs residual processing (such as entropy decoding, inverse quantization, and inverse transform). Referring to FIG. 11 and FIG. 3, the predicted values of the pixels in the current block output by the autoencoder prediction unit 1643 need to be sent to the reconstruction unit 158 and added to the reconstructed residual values of the pixels in the current block obtained through entropy decoding, inverse quantization, and inverse transform, to produce the reconstructed values of the pixels in the current block, which are then filtered by the filter unit 159 (the filtering processing can be skipped in intra coding) and stored in the picture buffer 160. The intra prediction decoding operation corresponding to the second autoencoder mode is relatively complex, but the image quality is relatively good.
The first autoencoder mode and the second autoencoder mode, which differ in their residual processing as described above, can be provided as two different intra prediction modes for the intra prediction processing unit in the video decoder to select from, or as two sub-modes of one autoencoder mode. The coding costs of the two sub-modes are compared: when the coding cost of the current block with residual processing skipped is less than or equal to the coding cost of the current block with residual processing performed, the first autoencoder mode is used, that is, residual processing is skipped to save overhead and simplify computation; when the coding cost of the current block with residual processing skipped is greater than the coding cost of the current block with residual processing performed, the second autoencoder mode is used, that is, residual processing is performed to ensure video quality.
An embodiment of the present disclosure further provides a video decoding method, and also provides an intra prediction method for video decoding, as shown in FIG. 13, including:
Step 710: parsing the identifier of the intra prediction mode of the current block from the received code stream, and determining the intra prediction mode of the current block from multiple selectable intra prediction modes according to the identifier, where the multiple selectable intra prediction modes include an autoencoder mode; and
Step 720: performing intra prediction on the current block according to the intra prediction mode of the current block.
The identifier of the intra prediction mode of the current block in the code stream can be parsed out through entropy decoding.
The video decoding method of this embodiment introduces the autoencoder mode as one of the intra prediction modes, adding an intra prediction mode based on nonlinear computation at the decoding side, which can enhance the performance of the codec in decoding images with complex detail features.
In an exemplary embodiment of the present disclosure, the multiple selectable intra prediction modes further include one or more of planar mode, horizontal mode, vertical mode, DC mode, angular mode, cross-component linear model (CCLM) mode, or matrix-weighted intra prediction (MIP) mode.
In an exemplary embodiment of the present disclosure, the identifier of the intra prediction mode includes an autoencoder mode flag bit, where one value of the autoencoder mode flag bit indicates that the intra prediction mode is the autoencoder mode and the other value indicates that the intra prediction mode is not the autoencoder mode; or, the identifier of the intra prediction mode includes an index number of the intra prediction mode. The supplementary information may be parsed after it is determined, according to the identifier of the intra prediction mode, that the intra prediction mode of the current block is the autoencoder mode.
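The two forms of the mode identifier can be sketched on the parsing side as follows (the syntax layout, mode list, and reserved index value are hypothetical; a real bitstream would entropy code these symbols):

```python
def parse_mode_flag_first(bits, conventional_modes):
    """Variant 1: a leading autoencoder-mode flag bit; when the flag is 0,
    a conventional mode index follows (this layout is an assumption)."""
    if bits[0] == 1:
        return "autoencoder"
    return conventional_modes[bits[1]]

def parse_mode_index(index, autoencoder_index):
    """Variant 2: a single mode index, one value of which is reserved for
    the autoencoder mode (the reserved value is an assumption)."""
    return "autoencoder" if index == autoencoder_index else "conventional"

modes = ["planar", "dc", "horizontal", "vertical"]
print(parse_mode_flag_first([1], modes))       # autoencoder mode selected
print(parse_mode_flag_first([0, 2], modes))    # a conventional mode follows
print(parse_mode_index(34, autoencoder_index=34))
```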
In an exemplary embodiment of the present disclosure, the reconstructed reference information adjacent to the current block includes the reconstructed values of reference pixels adjacent to the current block, and the reference pixels adjacent to the current block include one or more of the following pixels:
one or more rows of pixels above the current block;
one or more rows of pixels above and to the right of the current block;
one or more columns of pixels to the left of the current block;
one or more columns of pixels to the lower left of the current block;
one or more pixels at the upper-left corner of the current block.
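The listed reference positions can be gathered from a reconstructed picture as in the sketch below (the names are illustrative; one reference row/column is used and all the listed neighbours are assumed to lie inside the picture, whereas a real codec pads or substitutes unavailable samples):

```python
import numpy as np

def gather_references(recon, y, x, h, w):
    """Collect reconstructed reference pixels around the h x w block whose
    top-left pixel is at (y, x): one row above, one row above-right, one
    column to the left, one column below-left, and the top-left corner."""
    above       = recon[y - 1, x : x + w]          # row above the block
    above_right = recon[y - 1, x + w : x + 2 * w]  # row above and to the right
    left        = recon[y : y + h, x - 1]          # column to the left
    below_left  = recon[y + h : y + 2 * h, x - 1]  # column to the lower left
    corner      = recon[y - 1, x - 1]              # upper-left corner pixel
    return np.concatenate([above, above_right, left, below_left, [corner]])

recon = np.arange(16 * 16, dtype=float).reshape(16, 16)
refs = gather_references(recon, y=4, x=4, h=4, w=4)
print(refs.size)  # 4 + 4 + 4 + 4 + 1 = 17 reference samples
```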
In an exemplary embodiment of the present disclosure, the decoding network of the autoencoder is a dimension-raising neural network.
In an exemplary embodiment of the present disclosure, when the intra prediction mode of the current block is the autoencoder mode, intra prediction on the decoding side no longer needs the encoding network of the autoencoder; it is sufficient to use the decoding network to perform a nonlinear transformation on the supplementary information to obtain the predicted values. On the decoding side, performing intra prediction on the current block according to the intra prediction mode of the current block includes:
parsing the supplementary information of the current block from the code stream, where the supplementary information can be obtained by performing entropy decoding on the code stream; and
based on the decoding network of the autoencoder corresponding to the autoencoder mode, performing a nonlinear transformation on the supplementary information of the current block, or on the supplementary information of the current block and the reconstructed reference information adjacent to the current block, to obtain the predicted values of the pixels in the current block.
In an exemplary embodiment of the present disclosure, the decoding network has one trained group of network parameters; or, the decoding network has multiple trained groups of network parameters, where the different groups of network parameters are trained separately on the sample sets of their respective corresponding blocks, and the blocks corresponding to the different groups of network parameters have different features, the features being determined according to one or more of the target code rate, size, shape, and type. When the decoding network has multiple trained groups of network parameters, the network parameters of the decoding network are determined in one of the following manners: finding, according to the features of the current block, the group of network parameters corresponding to the current block from the multiple groups of network parameters, and determining the network parameters of the decoding network according to that group; or, parsing the network parameter identifier of the current block from the code stream, and determining the parameters of the decoding network according to the group of network parameters corresponding to the current block indicated by the network parameter identifier. The network parameters of the decoding network can be quickly determined according to the network parameter identifier, reducing the computational complexity.
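The two ways of resolving the parameter group can be sketched as follows (the feature keys, identifier values, and table contents are hypothetical placeholders for trained weight sets):

```python
# Hypothetical table of trained parameter groups, keyed by block features.
PARAM_GROUPS = {
    ("8x8",   "luma"):   {"id": 0, "weights": "w_8x8_luma"},
    ("8x8",   "chroma"): {"id": 1, "weights": "w_8x8_chroma"},
    ("16x16", "luma"):   {"id": 2, "weights": "w_16x16_luma"},
}

def params_by_features(size, block_type):
    """Derive the group from the current block's features; no identifier
    needs to be carried in the code stream."""
    return PARAM_GROUPS[(size, block_type)]

def params_by_identifier(param_id):
    """Look the group up directly by the parsed network parameter
    identifier; faster, at the cost of signalling the identifier."""
    return next(g for g in PARAM_GROUPS.values() if g["id"] == param_id)

print(params_by_features("8x8", "chroma")["id"])
print(params_by_identifier(2)["weights"])
```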
In an exemplary embodiment of the present disclosure, when the current block is a video coding unit and the video coding unit includes one luma block and two chroma blocks, there are, corresponding to the encoding side, three possible intra prediction manners. In this case, performing intra prediction on the current block based on the decoding network of the autoencoder corresponding to the autoencoder mode, that is, performing a nonlinear transformation on the supplementary information of the current block or on the supplementary information of the current block and the reconstructed reference information adjacent to the current block to obtain the predicted values of the pixels in the current block, includes:
based on the decoding network of the one autoencoder corresponding to the video coding unit in the autoencoder mode, performing a nonlinear transformation on the supplementary information shared by the one luma block and the two chroma blocks, or on the supplementary information shared by the one luma block and the two chroma blocks and the adjacent reconstructed reference information, to obtain the predicted values of the pixels in the one luma block and the two chroma blocks; or
based on the decoding network of the one autoencoder corresponding to the luma block in the autoencoder mode, performing a nonlinear transformation on the supplementary information of the one luma block, or on the supplementary information of the one luma block and the adjacent reconstructed reference information, to obtain the predicted values of the pixels in the one luma block; and, based on the decoding network of the one autoencoder corresponding to the chroma blocks in the autoencoder mode, performing a nonlinear transformation on the supplementary information shared by the two chroma blocks, or on the supplementary information shared by the two chroma blocks and the adjacent reconstructed reference information, to obtain the predicted values of the pixels in the two chroma blocks; or
based on the decoding network of the one autoencoder corresponding to the luma block in the autoencoder mode, performing a nonlinear transformation on the supplementary information of the one luma block, or on the supplementary information of the one luma block and the adjacent reconstructed reference information, to obtain the predicted values of the pixels in the one luma block; based on the decoding network of the autoencoder corresponding to the first chroma block of the two chroma blocks in the autoencoder mode, performing a nonlinear transformation on the supplementary information of the first chroma block, or on the supplementary information of the first chroma block and the adjacent reconstructed reference information, to obtain the predicted values of the pixels in the first chroma block; and, based on the decoding network of the autoencoder corresponding to the second chroma block of the two chroma blocks in the autoencoder mode, performing a nonlinear transformation on the supplementary information of the second chroma block, or on the supplementary information of the second chroma block and the adjacent reconstructed reference information, to obtain the predicted values of the pixels in the second chroma block.
In an embodiment of the video decoding method of the present disclosure, the autoencoder mode includes a first autoencoder mode; when the intra prediction mode of the current block is the first autoencoder mode, the video decoding method further includes: assuming by default that the reconstructed residual values of the pixels in the current block are equal to 0, and using the predicted values of the pixels in the current block as the reconstructed values of the pixels in the current block. This intra prediction decoding method corresponds to the intra prediction encoding method, adopted on the encoding side, in the first autoencoder mode in which residual processing is skipped.
In an embodiment of the video decoding method of the present disclosure, the autoencoder mode includes a second autoencoder mode; when the intra prediction mode of the current block is the second autoencoder mode, the video decoding method further includes: parsing the bitstream to obtain reconstructed residual values of pixels in the current block, and adding the predicted values of the pixels in the current block to the reconstructed residual values to obtain reconstructed values of the pixels in the current block. This intra prediction decoding method corresponds to the intra prediction encoding method, adopted on the encoding side, in the second autoencoder mode in which residual processing is performed. In the parsing of the bitstream to obtain the reconstructed residual values of the pixels in the current block, the parsing may include entropy decoding, inverse quantization and inverse transform, and the inverse quantization and/or the inverse transform may also be skipped.
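As an illustrative sketch only (the function and variable names below are hypothetical and do not appear in the disclosure), the decoder-side reconstruction under the two autoencoder modes described above can be expressed as follows: in the first autoencoder mode the reconstructed residual defaults to 0, while in the second autoencoder mode a residual parsed from the bitstream is added to the prediction.

```python
def reconstruct_block(pred, mode, parsed_residual=None):
    """Reconstruct an autoencoder-mode block (2-D list of pixel values).

    mode 1 (first autoencoder mode): residual processing was skipped on the
    encoding side, so the reconstructed residual defaults to 0 and the
    prediction is used directly as the reconstruction.
    mode 2 (second autoencoder mode): a residual parsed from the bitstream
    (entropy decoded, possibly dequantized / inverse transformed) is added.
    """
    if mode == 1:
        return [row[:] for row in pred]  # residual == 0 by default
    if mode == 2:
        # reconstruction = prediction + reconstructed residual, clipped to 8 bits
        return [[min(255, max(0, p + r)) for p, r in zip(prow, rrow)]
                for prow, rrow in zip(pred, parsed_residual)]
    raise ValueError("unknown autoencoder mode")
```

Note that the clipping range (8-bit samples) is an assumption for the sketch; actual bit depth is codec-dependent.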
In an embodiment of the video decoding method of the present disclosure, when it is determined that the intra prediction mode of the current block is an autoencoder mode, the video decoding method further includes:
parsing a residual flag of the current block from the bitstream, where the residual flag is used to indicate whether residual data of the current block is present in the bitstream;
when the residual flag indicates that residual data of the current block is present in the bitstream, parsing the residual data of the current block;
when the residual flag indicates that no residual data of the current block is present in the bitstream, skipping the parsing of the residual data of the current block and taking the reconstructed residual values of pixels in the current block to be 0 by default.
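The residual-flag handling above can be sketched as follows; the `BitReader` class is a minimal hypothetical stand-in for the codec's entropy decoder and is not an API from the disclosure.

```python
class BitReader:
    """Minimal illustrative stand-in for a bitstream entropy decoder."""
    def __init__(self, flags, residuals=None):
        self.flags = list(flags)            # pre-decoded flag bits
        self.residuals = list(residuals or [])  # pre-decoded residual blocks

    def read_flag(self):
        return self.flags.pop(0) == 1

    def read_residual(self, shape):
        # A real decoder would entropy-decode (and possibly dequantize and
        # inverse-transform) residual data here; we pop a prepared block.
        return self.residuals.pop(0)


def parse_block_residual(reader, block_shape):
    """Parse the per-block residual flag and, only when it signals that
    residual data is present, parse that data; otherwise default the
    reconstructed residual of every pixel in the block to 0."""
    if reader.read_flag():
        return reader.read_residual(block_shape)
    h, w = block_shape
    return [[0] * w for _ in range(h)]
```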
An embodiment of the present disclosure provides a video decoding device. As shown in FIG. 14, the device includes a processor 5 and a memory 6 storing a computer program executable on the processor, where the processor 5, when executing the computer program, implements the video decoding method described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a video decoding device including an intra prediction processing unit, where the intra prediction processing unit includes:
a mode selection unit, configured to parse an identifier of the intra prediction mode of a current block from a received bitstream, determine the intra prediction mode of the current block from a plurality of selectable intra prediction modes according to the identifier, the plurality of selectable intra prediction modes including an autoencoder mode, and activate, according to the intra prediction mode of the current block, a corresponding prediction unit to perform intra prediction on the current block; and
an autoencoder prediction unit, configured to, when the intra prediction mode of the current block is the autoencoder mode, nonlinearly transform the supplementary information of the current block, or the supplementary information of the current block and reconstructed reference information adjacent to the current block, based on a decoding network of the autoencoder corresponding to the autoencoder mode, to obtain predicted values of pixels in the current block.
An embodiment of the present disclosure further provides a video encoding device (see also FIG. 14), including a processor and a memory storing a computer program executable on the processor, where the processor, when executing the computer program, implements the video encoding method described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a video encoding device including an intra prediction processing unit, where the intra prediction processing unit includes:
a mode selection unit, configured to select the intra prediction mode of a current block from a plurality of selectable intra prediction modes, activate, according to the intra prediction mode of the current block, a corresponding prediction unit to perform intra prediction on the current block, and encode an identifier of the intra prediction mode of the current block and write it into a bitstream, where the plurality of selectable intra prediction modes include an autoencoder mode; and
an autoencoder prediction unit, configured to, when the intra prediction mode of the current block is the autoencoder mode, perform a first nonlinear transform on the original values of pixels in the current block and/or reconstructed reference information adjacent to the current block, based on an encoding network of the autoencoder, to obtain supplementary information of the current block; and perform a second nonlinear transform on the supplementary information of the current block, or on the supplementary information of the current block and the adjacent reconstructed reference information, based on a decoding network of the autoencoder, to obtain predicted values of the pixels in the current block.
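As a toy illustration of the two-transform structure of the autoencoder prediction unit (the network sizes and untrained random weights below are placeholders; the disclosure uses trained encoding and decoding networks, and every name here is hypothetical), a first nonlinear transform maps the original block and adjacent reconstructed reference pixels to supplementary information, and a second nonlinear transform maps that supplementary information plus the reference pixels to a prediction:

```python
import math
import random

random.seed(0)

BLOCK = 4   # toy 4x4 block
REF = 8     # toy number of adjacent reconstructed reference pixels
SIDE = 3    # toy number of supplementary-information elements

# Toy, untrained weights standing in for the trained encoding/decoding networks.
W_enc = [[random.gauss(0, 0.1) for _ in range(BLOCK * BLOCK + REF)] for _ in range(SIDE)]
W_dec = [[random.gauss(0, 0.1) for _ in range(SIDE + REF)] for _ in range(BLOCK * BLOCK)]

def layer(x, w):
    # One nonlinear layer: matrix multiply followed by tanh.
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]

def autoencoder_intra_predict(block, ref):
    """First nonlinear transform (encoding network): original block pixels plus
    adjacent reconstructed reference pixels -> supplementary information.
    Second nonlinear transform (decoding network): supplementary information
    plus the same reference pixels -> predicted block."""
    x = [p / 255.0 for row in block for p in row] + [p / 255.0 for p in ref]
    side = layer(x, W_enc)                         # supplementary information
    y = layer(side + [p / 255.0 for p in ref], W_dec)
    # Map tanh output [-1, 1] back to 8-bit pixel values.
    pred = [[min(255, max(0, round((y[i * BLOCK + j] + 1) * 127.5)))
             for j in range(BLOCK)] for i in range(BLOCK)]
    return side, pred
```

The decoder side only needs `side` (parsed from the bitstream) and the reference pixels to run the second transform, which is why the supplementary information is what gets encoded.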
An embodiment of the present disclosure further provides a video encoding and decoding system, including the video encoding device described in any embodiment of the present disclosure and the video decoding device described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the video encoding method or the video decoding method described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a bitstream, where the bitstream is generated according to the video encoding method of an embodiment of the present disclosure, and the bitstream includes codewords obtained by encoding the identifier of the intra prediction mode of the current block and the supplementary information of the current block; or the bitstream includes codewords obtained by encoding the identifier of the intra prediction mode of the current block, the supplementary information of the current block, and one or more of the following: a network parameter identifier of the current block, residual values of pixels in the current block, and a residual flag of the current block.
With the intra prediction and video encoding/decoding methods of the above embodiments of the present disclosure, both subjective and objective prediction quality and coding performance can be improved. FIG. 15A to FIG. 15D, FIG. 16A to FIG. 16D, and FIG. 17A to FIG. 17D are three groups of comparisons between reconstructed blocks and original blocks obtained with different intra prediction methods; each group uses a 32x32 luma block, but with different content. FIG. 15A, FIG. 16A and FIG. 17A show the original luma block and its adjacent reconstructed pixels; FIG. 15B, FIG. 16B and FIG. 17B show the predicted block of the luma block and the adjacent reconstructed pixels obtained by intra prediction in the autoencoder mode; FIG. 15C, FIG. 16C and FIG. 17C show the predicted block of the luma block and the adjacent reconstructed pixels when the supplementary information of the autoencoder is set to 0; and FIG. 15D, FIG. 16D and FIG. 17D show the predicted block of the luma block and the adjacent reconstructed pixels obtained by intra prediction in DC mode. It can be seen that intra prediction in the autoencoder mode yields less distortion.
The embodiments of the present disclosure introduce autoencoder-based image compression into a block-level video codec, extending conventional intra prediction and exploiting the good coding performance of end-to-end compression on complex structures to improve codec performance. In addition, in previous autoencoder networks, the supplementary information output by the autoencoder's encoding network was not further optimized; it was fed directly into the autoencoder's decoding network for decoding. The embodiments of the present disclosure propose fine-tuning the supplementary information so as to minimize the Lagrangian rate-distortion cost and thereby optimize performance.
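The fine-tuning idea can be sketched as a search over candidate adjustments of the supplementary-information elements (here the integer case, where each element is tried as-is, plus 1, and minus 1); `rd_cost` is a hypothetical caller-supplied callable standing in for the Lagrangian cost J = D + λ·R, since computing actual rate and distortion requires the full codec.

```python
import itertools

def finetune_side_info(side, rd_cost):
    """Exhaustively evaluate the candidate adjustments of each integer element
    of the supplementary information -- the element value itself, value + 1,
    and value - 1 -- and keep the combination whose Lagrangian rate-distortion
    cost J = D + lambda * R (computed by the caller-supplied rd_cost) is
    the smallest."""
    candidates = [(v, v + 1, v - 1) for v in side]
    best = min(itertools.product(*candidates), key=rd_cost)
    return list(best)
```

For N elements this evaluates 3^N combinations, so in practice one would restrict the search to a subset of the combinations, as the embodiments allow ("some or all of the combinations").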
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, for example according to a communication protocol. In this manner, computer-readable media may generally correspond to non-transitory tangible computer-readable storage media or to communication media such as signals or carrier waves. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Moreover, any connection may also properly be termed a computer-readable medium. For example, if instructions are transmitted from a website, server or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or to any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The technical solutions of the embodiments of the present disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in the embodiments of the present disclosure to emphasize functional aspects of devices configured to perform the described techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.

Claims (39)

  1. A video decoding method, comprising:
    parsing an identifier of an intra prediction mode of a current block from a received bitstream, and determining the intra prediction mode of the current block from a plurality of selectable intra prediction modes according to the identifier, wherein the plurality of selectable intra prediction modes comprise an autoencoder mode; and
    performing intra prediction on the current block according to the intra prediction mode of the current block.
  2. The video decoding method according to claim 1, wherein:
    the plurality of selectable intra prediction modes further comprise one or more of a planar mode, a horizontal mode, a vertical mode, a DC mode, an angular mode, a cross-component linear model (CCLM) mode, or a matrix weighted intra prediction (MIP) mode.
  3. The video decoding method according to claim 1, wherein:
    when the intra prediction mode of the current block is the autoencoder mode, performing intra prediction on the current block according to the intra prediction mode of the current block comprises:
    parsing supplementary information of the current block from the bitstream; and
    based on a decoding network of an autoencoder corresponding to the autoencoder mode, nonlinearly transforming the supplementary information of the current block, or the supplementary information of the current block and reconstructed reference information adjacent to the current block, to obtain predicted values of pixels in the current block.
  4. The video decoding method according to claim 3, wherein:
    the autoencoder mode comprises a first autoencoder mode; and
    when the intra prediction mode of the current block is the first autoencoder mode, the video decoding method further comprises: taking reconstructed residual values of pixels in the current block to be 0 by default, and using the predicted values of the pixels in the current block as reconstructed values of the pixels in the current block.
  5. The video decoding method according to claim 3, wherein:
    the autoencoder mode comprises a second autoencoder mode; and
    when the intra prediction mode of the current block is the second autoencoder mode, the video decoding method further comprises: parsing the bitstream to obtain reconstructed residual values of pixels in the current block, and adding the predicted values of the pixels in the current block to the reconstructed residual values to obtain reconstructed values of the pixels in the current block.
  6. The video decoding method according to claim 3, wherein:
    the decoding network has one trained set of network parameters; or
    the decoding network has a plurality of trained sets of network parameters, different sets of network parameters being trained separately on sample sets of their respective corresponding blocks, the blocks corresponding to different sets of network parameters having different features, the features being determined according to one or more of target bit rate, size, shape and type.
  7. The video decoding method according to claim 6, wherein:
    when the decoding network has a plurality of trained sets of network parameters, the network parameters of the decoding network are determined in one of the following ways:
    finding, according to the features of the current block, a set of network parameters corresponding to the current block from the plurality of sets of network parameters, and determining the network parameters of the decoding network according to the found set of network parameters; or
    parsing a network parameter identifier of the current block from the bitstream, and determining the parameters of the decoding network according to the set of network parameters corresponding to the current block indicated by the network parameter identifier.
  8. The video decoding method according to claim 3, wherein:
    the current block is a video coding unit, a luma block contained in a video coding unit, or a chroma block contained in a video coding unit, the video coding unit being a macroblock (MB), a coding tree unit (CTB), a coding unit (CU), a prediction unit (PU), or a transform unit (TU).
  9. The video decoding method according to claim 8, wherein:
    when the current block is a video coding unit and the video coding unit contains one luma block and two chroma blocks, performing intra prediction on the current block based on the decoding network of the autoencoder corresponding to the autoencoder mode, by nonlinearly transforming the supplementary information of the current block, or the supplementary information of the current block and the reconstructed reference information adjacent to the current block, to obtain the predicted values of the pixels in the current block, comprises:
    based on a decoding network of one autoencoder of the autoencoder mode corresponding to the video coding unit, nonlinearly transforming supplementary information shared by the one luma block and the two chroma blocks, or the shared supplementary information and adjacent reconstructed reference information, to obtain predicted values of pixels in the one luma block and the two chroma blocks; or
    based on a decoding network of one autoencoder of the autoencoder mode corresponding to the luma block, nonlinearly transforming the supplementary information of the one luma block, or the supplementary information of the one luma block and adjacent reconstructed reference information, to obtain predicted values of pixels in the one luma block; and, based on a decoding network of one autoencoder of the autoencoder mode corresponding to the chroma blocks, nonlinearly transforming supplementary information shared by the two chroma blocks, or the shared supplementary information and adjacent reconstructed reference information, to obtain predicted values of pixels in the two chroma blocks; or
    based on a decoding network of one autoencoder of the autoencoder mode corresponding to the luma block, nonlinearly transforming the supplementary information of the one luma block, or the supplementary information of the one luma block and adjacent reconstructed reference information, to obtain predicted values of pixels in the one luma block; based on a decoding network of the autoencoder of the autoencoder mode corresponding to a first chroma block of the two chroma blocks, nonlinearly transforming the supplementary information of the first chroma block, or the supplementary information of the first chroma block and adjacent reconstructed reference information, to obtain predicted values of pixels in the first chroma block; and, based on a decoding network of the autoencoder of the autoencoder mode corresponding to a second chroma block of the two chroma blocks, nonlinearly transforming the supplementary information of the second chroma block, or the supplementary information of the second chroma block and adjacent reconstructed reference information, to obtain predicted values of pixels in the second chroma block.
  10. The video decoding method according to claim 1, wherein:
    the identifier of the intra prediction mode comprises an autoencoder mode flag bit, one value of the autoencoder mode flag bit indicating that the intra prediction mode is the autoencoder mode, and the other value indicating that the intra prediction mode is not the autoencoder mode; or
    the identifier of the intra prediction mode comprises an index number of the intra prediction mode.
  11. The video decoding method according to claim 3, wherein:
    the reconstructed reference information adjacent to the current block comprises reconstructed values of reference pixels adjacent to the current block, the reference pixels adjacent to the current block comprising one or more of the following:
    one or more rows of pixels above the current block;
    one or more rows of pixels above and to the right of the current block;
    one or more columns of pixels to the left of the current block;
    one or more columns of pixels to the lower left of the current block;
    one or more pixels at the upper left corner of the current block.
  12. The video decoding method according to claim 3, wherein:
    when it is determined that the intra prediction mode of the current block is the autoencoder mode, the video decoding method further comprises:
    parsing a residual flag of the current block from the bitstream, the residual flag being used to indicate whether residual data of the current block is present in the bitstream;
    when the residual flag indicates that residual data of the current block is present in the bitstream, parsing the residual data of the current block; and
    when the residual flag indicates that no residual data of the current block is present in the bitstream, skipping the parsing of the residual data of the current block and taking reconstructed residual values of pixels in the current block to be 0 by default.
  13. A video encoding method, comprising:
    selecting an intra prediction mode of a current block from a plurality of selectable intra prediction modes, the plurality of selectable intra prediction modes comprising an autoencoder mode; and
    performing intra prediction on the current block based on the intra prediction mode of the current block, and encoding an identifier of the intra prediction mode of the current block and writing it into a bitstream.
  14. The video encoding method according to claim 13, wherein:
    the plurality of selectable intra prediction modes further comprise one or more of the following modes: a planar mode, a horizontal mode, a vertical mode, a DC mode, an angular mode, a cross-component linear model (CCLM) mode, and a matrix weighted intra prediction (MIP) mode.
  15. The video encoding method according to claim 13, wherein:
    when the intra prediction mode of the current block is the autoencoder mode, performing intra prediction on the current block based on the intra prediction mode of the current block comprises:
    obtaining supplementary information of the current block through an encoding process, the encoding process comprising: performing, based on an encoding network of an autoencoder corresponding to the autoencoder mode, a first nonlinear transform on original values of pixels in the current block and/or reconstructed reference information adjacent to the current block; and
    based on a decoding network of the autoencoder corresponding to the autoencoder mode, performing a second nonlinear transform on the supplementary information of the current block, or on the supplementary information of the current block and the reconstructed reference information adjacent to the current block, to obtain predicted values of the pixels in the current block.
  16. The video encoding method according to claim 15, wherein:
    when the intra prediction mode of the current block is the autoencoder mode, the video encoding method further comprises: encoding the supplementary information of the current block and writing it into the bitstream.
  17. The video encoding method according to claim 15, wherein:
    the autoencoder mode comprises a first autoencoder mode; and
    when the intra prediction mode of the current block is the first autoencoder mode, the video encoding method further comprises: taking reconstructed residual values of pixels in the current block to be 0 by default, and skipping residual processing.
  18. The video encoding method according to claim 15, wherein:
    the autoencoder mode comprises a second autoencoder mode; and
    when the intra prediction mode of the current block is the second autoencoder mode, the video encoding method further comprises: obtaining residual values of pixels in the current block from differences between the original values and the predicted values of the pixels in the current block; and encoding the residual values of the pixels in the current block and writing them into the bitstream.
  19. The video encoding method according to claim 15, wherein:
    the encoding process further comprises:
    determining a plurality of fine-tuning ways for the element values output by the first nonlinear transform;
    separately calculating the coding cost of the current block under each of the plurality of fine-tuning ways; and
    fine-tuning the element values according to the fine-tuning way with the smallest coding cost, and using the fine-tuned element values as the supplementary information of the current block.
  20. 如权利要求19所述的视频编码方法,其中:The video coding method as claimed in claim 19, wherein:
    所述第一非线性变换输出的元素值为整数；所述多种微调方式包括对所述第一非线性变换输出的部分或全部元素值的可能微调值进行组合得到的部分或全部组合方式，其中，一个元素值的可能微调值包括该元素值，该元素值加1得到的值，及该元素值减1得到的值；或者The element values output by the first nonlinear transformation are integers; the multiple fine-tuning modes include some or all of the combinations obtained by combining the possible fine-tuning values of some or all of the element values output by the first nonlinear transformation, wherein the possible fine-tuning values of an element value include the element value itself, the element value plus 1, and the element value minus 1; or
    所述第一非线性变换输出的元素值为浮点数，所述多种微调方式包括对所述第一非线性变换输出的部分或全部元素值的可能微调值进行组合得到的部分或全部组合方式，其中，一个元素值的可能微调值包括对该元素值向上取整得到的值，及对该元素值向下取整得到的值。The element values output by the first nonlinear transformation are floating-point numbers; the multiple fine-tuning modes include some or all of the combinations obtained by combining the possible fine-tuning values of some or all of the element values output by the first nonlinear transformation, wherein the possible fine-tuning values of an element value include the value obtained by rounding the element value up and the value obtained by rounding it down.
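The fine-tuning search of claims 19 and 20 can be sketched as an exhaustive enumeration over per-element candidates. The `cost_fn` here is a hypothetical stand-in for the encoding cost of claim 19; a real encoder would evaluate a rate-distortion cost for each candidate combination.

```python
import math
from itertools import product

def best_fine_tuning(elements, cost_fn, integer=True):
    """Enumerate the fine-tuning combinations of claims 19-20 and keep the
    one with the smallest coding cost.  Integer element values may stay,
    go up by 1 or down by 1; floating-point values are rounded up or down."""
    if integer:
        candidates = [(v - 1, v, v + 1) for v in elements]
    else:
        candidates = [(math.floor(v), math.ceil(v)) for v in elements]
    return min(product(*candidates), key=cost_fn)

# Hypothetical cost: squared distance to values the decoder predicts well.
target = (3, -1, 5)
cost = lambda tup: sum((a - b) ** 2 for a, b in zip(tup, target))
tuned = best_fine_tuning([2, -1, 5], cost, integer=True)

tuned_float = best_fine_tuning([2.4, -0.6],
                               lambda t: sum(x * x for x in t),
                               integer=False)
```

Enumerating all combinations is exponential in the number of elements, which is presumably why the claim allows restricting the search to "some" of the element values and combinations.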
  21. 如权利要求16至20中任一所述的视频编码方法,其中:The video coding method according to any one of claims 16 to 20, wherein:
    所述从多种可选择的帧内预测模式中选取当前块的帧内预测模式,包括:The selection of the intra-frame prediction mode of the current block from a variety of selectable intra-frame prediction modes includes:
    确定所述多种可选择的帧内预测模式中需进行代价计算的帧内预测模式;determining an intra-frame prediction mode that requires cost calculation among the multiple selectable intra-frame prediction modes;
    分别计算需进行代价计算的所述帧内预测模式的编码代价;Calculating the encoding costs of the intra prediction modes that require cost calculation;
    将编码代价最小的帧内预测模式选取为所述当前块的帧内预测模式;Selecting the intra-frame prediction mode with the smallest encoding cost as the intra-frame prediction mode of the current block;
    其中，对于任意的块，将自编码器模式作为需进行代价计算的帧内预测模式；或者，如所述当前块的特征属于设定的可使用自编码器模式的块的特征，再将自编码器模式作为需进行代价计算的帧内预测模式，所述特征根据目标码率、大小、形状和类型中的一种或多种确定。Wherein, for any block, the autoencoder mode is taken as an intra prediction mode that requires cost calculation; or, the autoencoder mode is taken as an intra prediction mode that requires cost calculation only if the features of the current block belong to the set features of blocks that may use the autoencoder mode, the features being determined according to one or more of target code rate, size, shape and type.
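The gated mode decision of claim 21, sketched with hypothetical per-mode costs and an assumed feature test (block area); the claim leaves both the cost measure and the exact feature set (target code rate, size, shape, type) open.

```python
def select_intra_mode(block, modes, cost_of, can_use_autoencoder=None):
    """Mode decision of claim 21: optionally gate the autoencoder mode on
    block features, compute the encoding cost of every remaining candidate,
    and pick the cheapest."""
    candidates = [m for m in modes
                  if m != "autoencoder"
                  or can_use_autoencoder is None
                  or can_use_autoencoder(block)]
    return min(candidates, key=lambda m: cost_of(block, m))

# Hypothetical per-mode costs; the gate admits the autoencoder mode only
# for blocks of at least 256 pixels.
costs = {"planar": 10.0, "dc": 12.0, "autoencoder": 8.0}
cost_of = lambda b, m: costs[m]
gate = lambda b: b["w"] * b["h"] >= 256

pick_large = select_intra_mode({"w": 32, "h": 32},
                               ["planar", "dc", "autoencoder"], cost_of, gate)
pick_small = select_intra_mode({"w": 4, "h": 4},
                               ["planar", "dc", "autoencoder"], cost_of, gate)
```

Gating on features keeps the expensive autoencoder cost evaluation out of the loop for blocks unlikely to benefit from it.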
  22. 如权利要求21所述的视频编码方法,其中:The video coding method as claimed in claim 21, wherein:
    所述编码代价根据拉格朗日率失真公式计算；在所述自编码器模式包括第一自编码器模式的情况下，根据拉格朗日率失真公式计算所述第一自编码器模式对应的编码代价时，所述拉格朗日率失真公式中的比特开销包括所述当前块的补充信息和帧内预测模式的标识的开销，所述拉格朗日率失真公式中的失真根据所述当前块中像素的原始值和预测值确定。The encoding cost is calculated according to the Lagrangian rate-distortion formula. In the case where the autoencoder mode includes the first autoencoder mode, when the encoding cost corresponding to the first autoencoder mode is calculated according to the Lagrangian rate-distortion formula, the bit overhead in the Lagrangian rate-distortion formula includes the overhead of the supplementary information of the current block and of the identifier of the intra prediction mode, and the distortion in the Lagrangian rate-distortion formula is determined according to the original values and predicted values of the pixels in the current block.
  23. 如权利要求21所述的视频编码方法,其中:The video coding method as claimed in claim 21, wherein:
    所述编码代价根据拉格朗日率失真公式计算；在所述自编码器模式为第二自编码器模式的情况下，根据拉格朗日率失真公式计算所述第二自编码器模式对应的率失真代价时，所述拉格朗日率失真公式中的比特开销包括所述当前块的补充信息、所述当前块中像素的残差值和所述当前块的帧内预测模式的标识的开销；所述拉格朗日率失真公式中的失真根据所述当前块中像素的原始值和重建值确定，所述当前块中像素的重建值等于所述当前块中像素的预测值加上重建的残差值。The encoding cost is calculated according to the Lagrangian rate-distortion formula. In the case where the autoencoder mode is the second autoencoder mode, when the rate-distortion cost corresponding to the second autoencoder mode is calculated according to the Lagrangian rate-distortion formula, the bit overhead in the Lagrangian rate-distortion formula includes the overhead of the supplementary information of the current block, the residual values of the pixels in the current block and the identifier of the intra prediction mode of the current block; the distortion in the Lagrangian rate-distortion formula is determined according to the original values and reconstructed values of the pixels in the current block, where the reconstructed value of a pixel in the current block equals the predicted value of that pixel plus the reconstructed residual value.
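The two Lagrangian costs of claims 22 and 23 differ only in what the rate and distortion terms count. A sketch with made-up numbers: the bit counts and the multiplier λ are illustrative assumptions.

```python
def rd_cost_mode1(distortion_pred, bits_side_info, bits_mode_id, lam):
    """First autoencoder mode (claim 22): no residual is coded, the rate
    counts only the supplementary information and the mode identifier,
    and distortion is measured against the prediction itself."""
    return distortion_pred + lam * (bits_side_info + bits_mode_id)

def rd_cost_mode2(distortion_recon, bits_side_info, bits_residual,
                  bits_mode_id, lam):
    """Second autoencoder mode (claim 23): the residual bits join the
    rate term, and distortion is measured against the reconstruction
    (prediction plus reconstructed residual)."""
    return distortion_recon + lam * (bits_side_info + bits_residual
                                     + bits_mode_id)

lam = 0.5
j1 = rd_cost_mode1(distortion_pred=40.0, bits_side_info=16,
                   bits_mode_id=2, lam=lam)                    # 40 + 0.5 * 18
j2 = rd_cost_mode2(distortion_recon=10.0, bits_side_info=16,
                   bits_residual=50, bits_mode_id=2, lam=lam)  # 10 + 0.5 * 68
```

With these numbers the residual mode is cheaper: spending 50 residual bits buys a 30-point distortion reduction, which λ = 0.5 still rewards.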
  24. 如权利要求15所述的视频编码方法,其中:The video coding method as claimed in claim 15, wherein:
    所述自编码器具有训练出的一组网络参数;或者The autoencoder has a trained set of network parameters; or
    所述自编码器具有训练出的多组网络参数，每组网络参数包含所述自编码器的编码网络和解码网络的网络参数，不同组的网络参数根据各自对应的块的样本集分别训练得到，不同组的网络参数对应的块具有不同的特征，所述特征根据目标码率、大小、形状和类型中的一种或多种确定。The autoencoder has multiple trained sets of network parameters, each set of network parameters containing the network parameters of the encoding network and the decoding network of the autoencoder; different sets of network parameters are trained separately on sample sets of their respective corresponding blocks, and the blocks corresponding to different sets of network parameters have different features, the features being determined according to one or more of target code rate, size, shape and type.
  25. 如权利要求24所述的视频编码方法,其中:The video coding method as claimed in claim 24, wherein:
    在所述当前块的帧内预测模式为自编码器模式的情况下，所述自编码器的编码网络和解码网络的网络参数通过以下方式确定：根据所述当前块的特征从所述多组网络参数中找到与所述当前块对应的一组网络参数，根据该组网络参数确定所述自编码器的编码网络和解码网络的网络参数。In the case where the intra prediction mode of the current block is the autoencoder mode, the network parameters of the encoding network and the decoding network of the autoencoder are determined as follows: a set of network parameters corresponding to the current block is found among the multiple sets of network parameters according to the features of the current block, and the network parameters of the encoding network and the decoding network of the autoencoder are determined according to that set of network parameters.
  26. 如权利要求25所述的视频编码方法,其中:The video coding method as claimed in claim 25, wherein:
    所述视频编码方法还包括：对所述当前块的网络参数标识编码并写入码流，所述当前块的网络参数标识用于指示所述多组网络参数中与所述当前块对应的一组网络参数。The video encoding method further includes: encoding the network parameter identifier of the current block and writing it into the code stream, where the network parameter identifier of the current block is used to indicate the set of network parameters, among the multiple sets, that corresponds to the current block.
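Claims 24 to 26 amount to a lookup from block features to one of several trained parameter sets, with the chosen set's identifier signalled per claim 26. A sketch with hypothetical feature buckets and a two-entry table; the thresholds, key names and weight placeholders are all assumptions.

```python
def block_feature_key(block, target_rate):
    """Feature buckets of claims 24-26: one trained parameter set per
    combination of rate bucket, size bucket, shape and block type.  The
    bucket thresholds here are illustrative assumptions."""
    area = block["w"] * block["h"]
    return (
        "high_rate" if target_rate > 1.0 else "low_rate",
        "large" if area >= 256 else "small",
        "square" if block["w"] == block["h"] else "rect",
        block["type"],  # e.g. "luma" or "chroma"
    )

# Hypothetical table: feature key -> (network parameter identifier that
# would be written to the code stream per claim 26, trained weights).
param_sets = {
    ("low_rate", "small", "square", "luma"): (0, "weights_a"),
    ("high_rate", "large", "square", "luma"): (1, "weights_b"),
}

key = block_feature_key({"w": 32, "h": 32, "type": "luma"}, target_rate=2.0)
param_id, weights = param_sets[key]
```

When the features are derivable at the decoder (size, shape, type), the lookup can be repeated there and the explicit identifier of claim 26 becomes redundant; signalling it covers the cases where they are not.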
  27. 如权利要求13所述的视频编码方法,其中:The video coding method as claimed in claim 13, wherein:
    所述当前块为视频编码单元,或者为视频编码单元包含的亮度块,或者为视频编码单元包含的色度块,所述视频编码单元为宏块MB、编码树单元CTB、编码单元CU、预测单元PU、或变换单元TU。The current block is a video coding unit, or a luma block contained in a video coding unit, or a chrominance block contained in a video coding unit, and the video coding unit is a macroblock MB, a coding tree unit CTB, a coding unit CU, a prediction Unit PU, or transform unit TU.
  28. 如权利要求15所述的视频编码方法,其中:The video coding method as claimed in claim 15, wherein:
    在所述当前块为视频编码单元，所述视频编码单元包含一个亮度块和两个色度块的情况下，所述通过编码处理得到所述当前块的补充信息，包括：In the case where the current block is a video coding unit and the video coding unit contains one luma block and two chroma blocks, obtaining the supplementary information of the current block through the encoding processing includes:
    基于所述自编码器模式的与视频编码单元对应的一个自编码器的编码网络，对所述一个亮度块和两个色度块中像素的原始值和/或所述一个亮度块和两个色度块邻近的已重建参考信息进行第一非线性变换，得到所述一个亮度块和两个色度块共用的补充信息；或者Based on the encoding network of one autoencoder of the autoencoder mode corresponding to the video coding unit, performing the first nonlinear transformation on the original values of the pixels in the one luma block and the two chroma blocks and/or the reconstructed reference information adjacent to the one luma block and the two chroma blocks, to obtain supplementary information shared by the one luma block and the two chroma blocks; or
    基于所述自编码器模式的与亮度块对应的一个自编码器的编码网络，对所述一个亮度块中像素的原始值和/或所述一个亮度块邻近的已重建参考信息进行第一非线性变换，得到所述一个亮度块的补充信息；及，基于所述自编码器模式的与色度块对应的一个自编码器的编码网络，对所述两个色度块中像素的原始值和/或所述两个色度块邻近的已重建参考信息进行第一非线性变换，得到所述两个色度块共用的补充信息；或者Based on the encoding network of one autoencoder of the autoencoder mode corresponding to luma blocks, performing the first nonlinear transformation on the original values of the pixels in the one luma block and/or the reconstructed reference information adjacent to the one luma block, to obtain supplementary information of the one luma block; and, based on the encoding network of one autoencoder of the autoencoder mode corresponding to chroma blocks, performing the first nonlinear transformation on the original values of the pixels in the two chroma blocks and/or the reconstructed reference information adjacent to the two chroma blocks, to obtain supplementary information shared by the two chroma blocks; or
    基于所述自编码器模式的与亮度块对应的一个自编码器的编码网络，对所述一个亮度块中像素的原始值和/或所述一个亮度块邻近的已重建参考信息进行第一非线性变换，得到所述一个亮度块的补充信息；基于所述自编码器模式的与所述两个色度块中第一色度块对应的一个自编码器的编码网络，对所述第一色度块中像素的原始值和/或邻近的已重建参考信息进行第一非线性变换，得到所述第一色度块的补充信息；及，基于所述自编码器模式的与所述两个色度块中第二色度块对应的一个自编码器的编码网络，对所述第二色度块中像素的原始值和/或邻近的已重建参考信息进行第一非线性变换，得到所述第二色度块的补充信息。Based on the encoding network of one autoencoder of the autoencoder mode corresponding to luma blocks, performing the first nonlinear transformation on the original values of the pixels in the one luma block and/or the reconstructed reference information adjacent to the one luma block, to obtain supplementary information of the one luma block; based on the encoding network of one autoencoder of the autoencoder mode corresponding to the first chroma block of the two chroma blocks, performing the first nonlinear transformation on the original values of the pixels in the first chroma block and/or the adjacent reconstructed reference information, to obtain supplementary information of the first chroma block; and, based on the encoding network of one autoencoder of the autoencoder mode corresponding to the second chroma block of the two chroma blocks, performing the first nonlinear transformation on the original values of the pixels in the second chroma block and/or the adjacent reconstructed reference information, to obtain supplementary information of the second chroma block.
  29. 如权利要求13所述的视频编码方法,其中:The video coding method as claimed in claim 13, wherein:
    所述帧内预测模式的标识包括自编码器模式标识位，所述自编码器模式标识位的一个取值表示帧内预测模式为自编码器模式，另一个取值表示帧内预测模式非自编码器模式；或者The identifier of the intra prediction mode includes an autoencoder mode flag bit, one value of which indicates that the intra prediction mode is the autoencoder mode and the other value of which indicates that the intra prediction mode is not the autoencoder mode; or
    所述帧内预测模式的标识包括帧内预测模式的索引号。The identifier of the intra-frame prediction mode includes an index number of the intra-frame prediction mode.
  30. 如权利要求15所述的视频编码方法,其中:The video coding method as claimed in claim 15, wherein:
    所述当前块邻近的已重建参考信息包括所述当前块邻近的参考像素的重建值,所述当前块邻近的参考像素包括以下像素中的一种或多种:The reconstructed reference information adjacent to the current block includes reconstruction values of reference pixels adjacent to the current block, and the reference pixels adjacent to the current block include one or more of the following pixels:
    所述当前块上方的一行或多行像素;one or more rows of pixels above the current block;
    所述当前块右上方的一行或多行像素;One or more rows of pixels at the upper right of the current block;
    所述当前块左侧的一列或多列像素;one or more columns of pixels to the left of the current block;
    所述当前块左下侧的一列或多列像素;One or more columns of pixels on the lower left side of the current block;
    所述当前块左上角的一个或多个像素。One or more pixels in the upper left corner of the current block.
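The five neighbouring reference-pixel groups of claim 30, gathered from a toy reconstructed frame. One row or column of each group is taken here for simplicity, although the claim allows several, and the coordinate convention is an assumption of this sketch.

```python
import numpy as np

def gather_reference_pixels(recon, x, y, w, h):
    """Neighbouring reconstructed reference samples of claim 30: the row
    above, the row above-right, the column to the left, the column
    below-left and the top-left corner sample."""
    return {
        "above":       recon[y - 1, x:x + w],
        "above_right": recon[y - 1, x + w:x + 2 * w],
        "left":        recon[y:y + h, x - 1],
        "below_left":  recon[y + h:y + 2 * h, x - 1],
        "corner":      recon[y - 1, x - 1],
    }

recon = np.arange(64, dtype=int).reshape(8, 8)  # toy reconstructed frame
refs = gather_reference_pixels(recon, x=2, y=2, w=2, h=2)
```

A real codec would additionally pad or substitute these samples when a neighbour falls outside the picture or has not been reconstructed yet, which this sketch omits.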
  31. 如权利要求15所述的视频编码方法,其中:The video coding method as claimed in claim 15, wherein:
    在基于自编码器模式对所述当前块进行帧内预测的情况下,所述视频编码方法还包括:In the case of performing intra prediction on the current block based on an autoencoder mode, the video encoding method further includes:
    在J1<J2的情况下，对所述当前块进行残差处理，所述残差处理包括：根据所述当前块中像素的原始值和预测值之差得到所述当前块中像素的残差值，及，对所述当前块中像素的残差值编码并写入码流；In the case of J1<J2, performing residual processing on the current block, where the residual processing includes: obtaining the residual values of the pixels in the current block from the difference between the original values and the predicted values of the pixels in the current block; and encoding the residual values of the pixels in the current block and writing them into the code stream;
    在J1≥J2的情况下,跳过所述当前块的残差处理;In the case of J1≥J2, skip the residual processing of the current block;
    其中,J1是进行残差处理时所述当前块的编码代价,J2是跳过残差处理时所述当前块的编码代价。Wherein, J1 is the encoding cost of the current block when residual processing is performed, and J2 is the encoding cost of the current block when residual processing is skipped.
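The J1/J2 comparison of claim 31, combined with the residual identifier of claim 32; note that on ties (J1 ≥ J2) the claim skips residual processing. The flag values 1/0 are an assumption of this sketch — the claim only requires that presence and absence of residual data be distinguishable.

```python
def residual_decision(j_with_residual, j_skip):
    """Code the residual only when strictly cheaper (J1 < J2, claim 31);
    the choice is signalled with the residual identifier of claim 32
    (1: residual data present in the code stream, 0: absent)."""
    code_residual = j_with_residual < j_skip
    return code_residual, (1 if code_residual else 0)

decision_cheaper = residual_decision(44.0, 49.0)  # residual pays off
decision_tie = residual_decision(49.0, 49.0)      # tie -> skip residual
```

Preferring skip on ties saves the residual bits whenever they buy nothing, at the cost of one flag per block.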
  32. 如权利要求17或18或31所述的视频编码方法,其中:The video coding method as claimed in claim 17 or 18 or 31, wherein:
    在基于自编码器模式对所述当前块进行帧内预测的情况下,所述视频编码方法还包括:In the case of performing intra prediction on the current block based on an autoencoder mode, the video encoding method further includes:
    生成所述当前块的残差标识，其中，在对所述当前块进行残差处理的情况下，所述残差标识用于指示码流中存在所述当前块的残差数据，在跳过所述当前块的残差处理的情况下，所述残差标识用于指示码流中不存在所述当前块的残差数据；Generating a residual identifier of the current block, where, in the case where residual processing is performed on the current block, the residual identifier is used to indicate that residual data of the current block is present in the code stream, and, in the case where residual processing of the current block is skipped, the residual identifier is used to indicate that no residual data of the current block is present in the code stream;
    对所述当前块的残差标识编码并写入码流。Encode the residual identifier of the current block and write it into a code stream.
  33. 一种视频解码装置，包括处理器以及存储有可在所述处理器上运行的计算机程序的存储器，其中，所述处理器执行所述计算机程序时实现如权利要求1至12中任一所述的视频解码方法。A video decoding apparatus, including a processor and a memory storing a computer program runnable on the processor, wherein the processor, when executing the computer program, implements the video decoding method according to any one of claims 1 to 12.
  34. 一种视频解码装置,包括帧内预测处理单元,其中,所述帧内预测处理单元包括:A video decoding device, comprising an intra prediction processing unit, wherein the intra prediction processing unit includes:
    模式选取单元，用于从接收的码流中解析出当前块的帧内预测模式的标识，根据所述标识从多种可选择的帧内预测模式中确定所述当前块的帧内预测模式，所述多种可选择的帧内预测模式包括自编码器模式；及，根据所述当前块的帧内预测模式激活相应的预测单元执行对当前块的帧内预测；a mode selection unit, configured to parse the identifier of the intra prediction mode of the current block from a received code stream and determine the intra prediction mode of the current block from multiple selectable intra prediction modes according to the identifier, where the multiple selectable intra prediction modes include an autoencoder mode; and to activate, according to the intra prediction mode of the current block, a corresponding prediction unit to perform intra prediction on the current block;
    自编码器预测单元，用于在当前块的帧内预测模式为自编码器模式的情况下，基于自编码器模式对应的自编码器的解码网络，对所述当前块的补充信息或者对所述当前块的补充信息和所述当前块邻近的已重建参考信息进行非线性变换，得到所述当前块中像素的预测值。an autoencoder prediction unit, configured to, in the case where the intra prediction mode of the current block is the autoencoder mode, perform, based on the decoding network of the autoencoder corresponding to the autoencoder mode, a nonlinear transformation on the supplementary information of the current block, or on the supplementary information of the current block and the reconstructed reference information adjacent to the current block, to obtain the predicted values of the pixels in the current block.
  35. 一种视频编码装置，包括处理器以及存储有可在所述处理器上运行的计算机程序的存储器，其中，所述处理器执行所述计算机程序时实现如权利要求13至32中任一所述的视频编码方法。A video encoding apparatus, including a processor and a memory storing a computer program runnable on the processor, wherein the processor, when executing the computer program, implements the video encoding method according to any one of claims 13 to 32.
  36. 一种视频编码装置,包括帧内预测处理单元,其中,所述帧内预测处理单元包括:A video encoding device, including an intra prediction processing unit, wherein the intra prediction processing unit includes:
    模式选取单元，用于从多种可选择的帧内预测模式中选取当前块的帧内预测模式，根据当前块的帧内预测模式激活相应的预测单元执行对当前块的帧内预测，及对当前块的帧内预测模式的标识编码并写入码流；其中，所述多种可选择的帧内预测模式包括自编码器模式；a mode selection unit, configured to select the intra prediction mode of the current block from multiple selectable intra prediction modes, activate a corresponding prediction unit to perform intra prediction on the current block according to the intra prediction mode of the current block, and encode the identifier of the intra prediction mode of the current block and write it into the code stream, where the multiple selectable intra prediction modes include an autoencoder mode;
    自编码器预测单元，用于在当前块的帧内预测模式为自编码器模式的情况下，基于自编码器的编码网络，对当前块中像素的原始值和/或当前块邻近的已重建参考信息进行第一非线性变换，得到当前块的补充信息；及，基于自编码器的解码网络，对当前块的补充信息或者对当前块的补充信息和邻近的已重建参考信息进行第二非线性变换，得到当前块中像素的预测值。an autoencoder prediction unit, configured to, in the case where the intra prediction mode of the current block is the autoencoder mode, perform, based on the encoding network of the autoencoder, a first nonlinear transformation on the original values of the pixels in the current block and/or the reconstructed reference information adjacent to the current block, to obtain supplementary information of the current block; and to perform, based on the decoding network of the autoencoder, a second nonlinear transformation on the supplementary information of the current block, or on the supplementary information of the current block and the adjacent reconstructed reference information, to obtain the predicted values of the pixels in the current block.
  37. 一种视频编解码系统,包括如权利要求35或36所述的视频编码装置,以及如权利要求33或34所述的视频解码装置。A video encoding and decoding system, comprising the video encoding device as claimed in claim 35 or 36, and the video decoding device as claimed in claim 33 or 34.
  38. 一种非瞬态计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，其中，所述计算机程序被处理器执行时实现如权利要求1至31中任一所述的方法。A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 31.
  39. 一种码流，其中，所述码流根据如权利要求13至32中任一所述的视频编码方法生成，所述码流中包括对所述当前块的帧内预测模式的标识和所述当前块的补充信息编码得到的码字；或者，所述码流中包括对所述当前块的帧内预测模式的标识、所述当前块的补充信息以及以下信息中的一种或多种编码得到的码字：所述当前块的网络参数标识、所述当前块中像素的残差值、所述当前块的残差标识。A code stream, wherein the code stream is generated according to the video encoding method of any one of claims 13 to 32, and the code stream includes codewords obtained by encoding the identifier of the intra prediction mode of the current block and the supplementary information of the current block; or the code stream includes codewords obtained by encoding the identifier of the intra prediction mode of the current block, the supplementary information of the current block, and one or more of the following: the network parameter identifier of the current block, the residual values of the pixels in the current block, and the residual identifier of the current block.
PCT/CN2021/099827 2021-06-11 2021-06-11 Video encoding/decoding method, device and system, and storage medium WO2022257134A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180098992.4A CN117441336A (en) 2021-06-11 2021-06-11 Video encoding and decoding method, device, system and storage medium
PCT/CN2021/099827 WO2022257134A1 (en) 2021-06-11 2021-06-11 Video encoding/decoding method, device and system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/099827 WO2022257134A1 (en) 2021-06-11 2021-06-11 Video encoding/decoding method, device and system, and storage medium

Publications (1)

Publication Number Publication Date
WO2022257134A1

Family

ID=84425620

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/099827 WO2022257134A1 (en) 2021-06-11 2021-06-11 Video encoding/decoding method, device and system, and storage medium

Country Status (2)

Country Link
CN (1) CN117441336A (en)
WO (1) WO2022257134A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200137384A1 (en) * 2018-10-24 2020-04-30 City University Of Hong Kong Generative adversarial network based intra prediction for video coding
US20200186809A1 (en) * 2018-12-05 2020-06-11 Google Llc Hybrid Motion-Compensated Neural Network with Side-Information Based Video Coding
CN111466115A (en) * 2017-10-13 2020-07-28 弗劳恩霍夫应用研究促进协会 Intra prediction mode concept for block-wise picture coding
CN112333451A (en) * 2020-11-03 2021-02-05 中山大学 Intra-frame prediction method based on generation countermeasure network
CN112514381A (en) * 2019-06-25 2021-03-16 Oppo广东移动通信有限公司 Image encoding and decoding method, encoder, decoder, and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
B. CHOI (TENCENT), Z. LI (TENCENT), W. WANG (TENCENT), W. JIANG (TENCENT), X. XU (TENCENT), S. LIU (TENCENT): "AHG11: Information on inter-prediction coding tool with deep neural network", 20. JVET MEETING; 20201007 - 20201016; TELECONFERENCE; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), no. JVET-T0058, 6 October 2020 (2020-10-06), XP030289824 *

Also Published As

Publication number Publication date
CN117441336A (en) 2024-01-23

Similar Documents

Publication Publication Date Title
TWI745594B (en) Intra filtering applied together with transform processing in video coding
CN110393010B (en) Intra-filtering flags in video coding
RU2723568C2 (en) Determining prediction parameters for non-square video coding units
RU2565365C2 (en) Encoding transform coefficients for video coding
TWI624172B (en) Palette-based video coding
CN103190147B (en) For combined decoding method and the equipment of the syntactic element of video coding
JP6162150B2 (en) Residual quadtree (RQT) coding for video coding
JP5869108B2 (en) Memory efficient context modeling
TWI827606B (en) Trellis coded quantization coefficient coding
US20110317757A1 (en) Intra prediction mode signaling for finer spatial prediction directions
US10356418B2 (en) Video encoding method and apparatus therefor, and video decoding method and apparatus therefor, in which edge type offset is applied
JP2014532375A (en) Sample adaptive offset merged with adaptive loop filter in video coding
JP5937205B2 (en) Run mode based coefficient coding for video coding
SG190202A1 (en) Separately coding the position of a last significant coefficient of a video block in video coding
JP2015511472A (en) Coding loop filter parameters using codebook in video coding
JP2023542841A (en) Multiple neural network models for filtering during video coding
CA3221507A1 (en) Coding and decoding methods, coder and decoder, and storage medium
WO2022257134A1 (en) Video encoding/decoding method, device and system, and storage medium
WO2022257142A1 (en) Video decoding and coding method, device and storage medium
WO2023004590A1 (en) Video decoding and encoding methods and devices, and storage medium
WO2023225854A1 (en) Loop filtering method and device, and video coding/decoding method, device and system
TWI829424B (en) Decoding method, encoding method and apparatus
WO2023039856A1 (en) Video decoding method and device, video encoding method and device, and storage medium
RU2801327C2 (en) Intra prediction method and device for a video sequence
TWI821013B (en) Mehtods and devices for video encoding and decoding

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21944641

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE