CN111225206B - Video decoding method and video decoder

Info

Publication number: CN111225206B
Application number: CN201811409192.6A
Authority: CN
Inventors: 林永兵; 郑建铧
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2018-11-23
Filing date: 2018-11-23
Publication date: 2021-10-26
Anticipated expiration: 2038-11-23
Also published as: WO2020103800A1; CN111225206A

Abstract

The application discloses a video decoding method and a video decoder in the technical field of video coding and decoding. The method comprises the following steps: parsing the bitstream to obtain a target transformation matrix pair index value of the current block to be subjected to inverse transformation processing and a quantization coefficient of the current block; performing inverse quantization processing on the quantization coefficient of the current block to obtain an inverse quantization coefficient of the current block; determining a target transformation matrix pair from the candidate transformation matrix pairs according to the target transformation matrix pair index value and the correspondence between target transformation matrix pair index values and candidate transformation matrix pairs; performing inverse transformation processing on the inverse quantization coefficient of the current block according to the target transformation matrix pair to obtain a reconstructed residual block of the current block; and obtaining a reconstructed block of the current block according to the reconstructed residual block of the current block. The method and the decoder can reduce the computational complexity of the inverse transformation.

Description

Video decoding method and video decoder

Technical Field

The present application relates to the field of video coding and decoding technology, and more particularly, to a video decoding method and a decoder.

Background

Digital video capabilities can be incorporated into a wide variety of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), laptop or desktop computers, tablet computers, electronic book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones (so-called "smart phones"), video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 part 10 Advanced Video Coding (AVC), the video coding standard H.265/High Efficiency Video Coding (HEVC), and extensions of such standards. Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.

Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into image blocks, which may also be referred to as treeblocks, Coding Units (CUs), and/or coding nodes. An image block in an intra-coded (I) slice of an image is encoded using spatial prediction with respect to reference samples in neighboring blocks in the same image. An image block in an inter-coded (P or B) slice of an image may use spatial prediction with respect to reference samples in neighboring blocks in the same image or temporal prediction with respect to reference samples in other reference images. A picture may be referred to as a frame and a reference picture may be referred to as a reference frame.

When an image block is encoded, the block is first predicted to obtain residual coefficients, and the residual coefficients are then transformed, quantized and entropy-coded to obtain a bitstream. During transformation, the encoder may try transforming the residual coefficients with different transformation matrix pairs and then select a suitable transformation matrix pair according to the coding cost.

In addition to the DCT2 matrix, the conventional scheme may use the DCT8 matrix and the DCT7 matrix for transformation. However, the computational complexity is high when the DCT8 matrix and the DCT7 matrix are used for transformation.

Disclosure of Invention

The present application provides a video decoding method, a video encoding method, a video decoder and a video encoder to reduce the computational complexity of the inverse transformation/transformation.

In a first aspect, a video decoding method is provided, the method comprising: parsing the bitstream to obtain a target transformation matrix pair index value of a current block to be subjected to inverse transformation processing and a quantized coefficient of the current block; performing inverse quantization processing on the quantized coefficient of the current block to obtain an inverse quantized coefficient of the current block; determining a target transformation matrix pair from the candidate transformation matrix pairs according to the target transformation matrix pair index value and a correspondence between target transformation matrix pair index values and candidate transformation matrix pairs, wherein each candidate transformation matrix pair comprises a horizontal direction transformation matrix and a vertical direction transformation matrix, the horizontal direction transformation matrix and the vertical direction transformation matrix are each one of two preset transformation matrices, the first transformation matrix of the two preset transformation matrices is a DCT2' matrix, and the DCT2' matrix is the transposed matrix of the DCT2 matrix; performing inverse transformation processing on the inverse quantized coefficient of the current block according to the target transformation matrix pair to obtain a reconstructed residual block of the current block; and obtaining a reconstructed block of the current block according to the reconstructed residual block of the current block.

It should be understood that, in the present application, inverse quantization may also be referred to as dequantization, and inverse transformation may also be referred to as inverse transform.
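
The decoding steps of the first aspect can be illustrated with a minimal Python sketch. It is not the implementation of this application: the uniform quantization step `step`, the floating-point orthonormal matrices, and all function names are assumptions made for illustration, and an orthonormal DCT2 matrix stands in for both matrices of the target pair.

```python
import math


def dct2_matrix(n):
    """Orthonormal n x n DCT2 matrix; row k holds the k-th cosine basis vector."""
    return [
        [math.sqrt((1.0 if k == 0 else 2.0) / n)
         * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
         for j in range(n)]
        for k in range(n)
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dequantize(levels, step):
    """Assumed uniform inverse quantization: scale every level by `step`."""
    return [[v * step for v in row] for row in levels]


def forward_transform(residual, hor, ver):
    """Separable 2D transform: `ver` acts on columns, `hor` acts on rows."""
    return matmul(matmul(ver, residual), transpose(hor))


def inverse_transform(coeffs, hor, ver):
    """Inverse of forward_transform when `hor` and `ver` are orthonormal."""
    return matmul(matmul(transpose(ver), coeffs), hor)


def reconstruct(prediction, residual):
    """Add the reconstructed residual to the prediction, sample by sample."""
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(prediction, residual)]


# Round trip on a block of height 2 and width 4, with quantization step 1.
hor, ver = dct2_matrix(4), dct2_matrix(2)
residual = [[4.0, -2.0, 0.0, 1.0], [3.0, 3.0, -1.0, 0.0]]
coeffs = dequantize(forward_transform(residual, hor, ver), 1.0)
restored = inverse_transform(coeffs, hor, ver)
block = reconstruct([[128.0] * 4, [128.0] * 4], restored)
```

Because the sketch uses orthonormal matrices, the inverse transform is just multiplication by the transposes, which is the property that lets a transposed matrix reuse the same arithmetic structure.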

In the application, since the candidate transformation matrix pair includes the DCT2' matrix and the DCT2' matrix inverse transformation process has a fast algorithm, the fast algorithm can be adopted when the DCT2' is adopted for inverse transformation, and the computational complexity of the inverse transformation process can be reduced.

In addition, since the DCT2' matrix is the transpose of the DCT2 matrix, the DCT2' matrix can reuse the inverse transform implementation circuit of the DCT2 matrix in the inverse transform process, and the hardware implementation cost can be reduced.

With reference to the first aspect, in certain implementations of the first aspect, the DCT2' matrix is derived from the DCT2 matrix.

In the application, since the DCT2' matrix is the transpose matrix of the DCT2 matrix, the matrix coefficients of the DCT2' matrix can be derived from the matrix coefficients of the DCT2 matrix, and the storage overhead can be reduced without additionally storing the matrix coefficients of the DCT2' matrix.

With reference to the first aspect, in certain implementations of the first aspect, a second transformation matrix of the two preset transformation matrices is derived from a DCT2 matrix.

In the application, since the second transformation matrix of the two preset transformation matrices can be derived according to the DCT2 matrix, it is not necessary to store the matrix coefficients of the second transformation matrix additionally, and the storage overhead can be reduced.

With reference to the first aspect, in certain implementations of the first aspect, a second transformation matrix of the two preset transformation matrices is a DCT2' FS matrix or a DCT2' F matrix.

Here, F in the DCT2' FS matrix and the DCT2' F matrix denotes mirroring, and S in the DCT2' FS matrix denotes sign transformation. The DCT2' F matrix is obtained by mirroring the DCT2' matrix, and the DCT2' FS matrix is obtained by first mirroring the DCT2' matrix and then applying the sign transformation to the mirrored matrix.

Specifically, when the second transformation matrix is a DCT2' FS matrix or a DCT2' F matrix, it may be derived from a DCT2 matrix of the corresponding size. For example, a 16 × 16 DCT2' FS matrix or a 16 × 16 DCT2' F matrix may each be derived from a 16 × 16 DCT2 matrix.

Optionally, the mirror image is a left-right flip mirror image.

Flipping left and right to mirror a matrix may refer to mirroring the matrix coefficients on the left side of the matrix to the right side and mirroring the matrix coefficients on the right side of the matrix to the left side.

Optionally, the above sign transformation means that only matrix coefficients of even rows of the matrix are sign-inverted, while matrix coefficients of odd rows of the matrix remain unchanged.

Alternatively, the above sign transformation means that only matrix coefficients of odd rows of the matrix are sign-inverted, while matrix coefficients of even rows of the matrix remain unchanged.

Optionally, the sign transformation is inverting the signs of all matrix coefficients in the matrix.
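
The derivations described above (transpose, left-right mirroring, sign transformation) can be sketched as follows. This is an illustration under stated assumptions rather than the matrices of this application: the DCT2 matrix is a floating-point orthonormal one, rows are numbered from 0 (so "odd rows" means rows 1, 3, ...), and the function names and the `mode` parameter are hypothetical.

```python
import math


def dct2_matrix(n):
    """Orthonormal n x n DCT2 matrix; row k holds the k-th cosine basis vector."""
    return [
        [math.sqrt((1.0 if k == 0 else 2.0) / n)
         * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
         for j in range(n)]
        for k in range(n)
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def mirror_left_right(m):
    """Left-right flip: the coefficients of every row are reversed."""
    return [row[::-1] for row in m]


def sign_transform(m, mode="odd_rows"):
    """Negate selected rows; row numbering is 0-based here (an assumption)."""
    if mode not in ("odd_rows", "even_rows", "all"):
        raise ValueError("mode must be 'odd_rows', 'even_rows' or 'all'")
    out = []
    for i, row in enumerate(m):
        flip = (
            mode == "all"
            or (mode == "odd_rows" and i % 2 == 1)
            or (mode == "even_rows" and i % 2 == 0)
        )
        out.append([-v for v in row] if flip else list(row))
    return out


dct2 = dct2_matrix(4)
dct2_t = transpose(dct2)                # DCT2'
dct2_t_f = mirror_left_right(dct2_t)    # DCT2' F
dct2_t_fs = sign_transform(dct2_t_f)    # DCT2' FS
```

Since every step only rearranges or negates existing coefficients, the derived matrices need no separate coefficient storage in this sketch.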

Optionally, the configuration in which the first of the two preset transformation matrices is a DCT2' matrix and the second is a DCT2' FS matrix or a DCT2' F matrix is applicable to all transformation sizes.

Specifically, the configuration in which the first of the two preset transformation matrices is a DCT2' matrix and the second is a DCT2' FS matrix or a DCT2' F matrix is suitable for transformation sizes of 4 points, 8 points, 16 points, and 32 points.

In the application, because fast algorithms exist for the DCT2' FS matrix and the DCT2' F matrix in the inverse transformation process, the computational complexity of the inverse transformation can be reduced.

Specifically, when the first of the two preset transformation matrices is a DCT2' matrix and the second is a DCT2' FS matrix or a DCT2' F matrix, fast algorithms exist for the DCT2' matrix, the DCT2' FS matrix and the DCT2' F matrix, so the computational complexity of the inverse transformation can be reduced when the inverse quantization coefficients of the current block are inversely transformed using a target transformation matrix pair (which is formed by combining the two preset transformation matrices).

With reference to the first aspect, in certain implementations of the first aspect, a horizontal transform matrix in the target pair of transform matrices is a DCT2' matrix, and a vertical transform matrix in the target pair of transform matrices is a DCT2' FS matrix or a DCT2' F matrix when the height of the current block is greater than or equal to M points; when the height of the current block is less than M points, the vertical transformation matrix in the target transformation matrix pair is a DST4 matrix, wherein M is a positive integer.

With reference to the first aspect, in certain implementations of the first aspect, a vertical transform matrix in the target pair of transform matrices is a DCT2' matrix, and a horizontal transform matrix in the target pair of transform matrices is a DCT2' FS matrix or a DCT2' F matrix when the width of the current block is greater than or equal to M points; when the width of the current block is less than M points, the horizontal transformation matrix in the target transformation matrix pair is a DST4 matrix, wherein M is a positive integer.

Optionally, M is 32.

It is to be understood that a large-size transform scene may be considered when the transform size is greater than or equal to M points, and a small-size transform scene may be considered when the transform size is less than M points.

In the application, the transformation matrix with the rapid algorithm is adopted as the horizontal transformation matrix or the vertical transformation matrix in the target transformation matrix pair in the large-size transformation scene, and the transformation matrix suitable for the small-size transformation scene is adopted as the horizontal transformation matrix or the vertical transformation matrix in the target transformation matrix pair in the small-size transformation scene, so that the complexity of inverse transformation can be obviously reduced in the large-size transformation scene, the transformation performance can be ensured in the small-size scene, and the balance between the reduction of the complexity of inverse transformation and the guarantee of the inverse transformation performance can be achieved.

With reference to the first aspect, in certain implementations of the first aspect, a horizontal transform matrix in the target pair of transform matrices is a DCT2' matrix, and a vertical transform matrix in the target pair of transform matrices is a DST7 matrix when the height of the current block is greater than or equal to N points; when the height of the current block is less than N points, the vertical transformation matrix in the target transformation matrix pair is a DST4 matrix, wherein N is a positive integer.

With reference to the first aspect, in certain implementations of the first aspect, a vertical transform matrix in the target pair of transform matrices is a DCT2' matrix, and a horizontal transform matrix in the target pair of transform matrices is a DST7 matrix when the width of the current block is greater than or equal to N points; when the width of the current block is less than N points, the horizontal transformation matrix in the target transformation matrix pair is a DST4 matrix, wherein N is a positive integer.

Optionally, N = 16.

It is to be understood that a large-size transform scene may be considered when the transform size is greater than or equal to N points, and a small-size transform scene may be considered when the transform size is less than N points.

In the application, the DCT2' matrix and the DST7 matrix (suitable for inverse transformation in a large-size scene) are adopted as the target transformation matrix pair in a large-size transformation scene, and the DCT2' matrix and the DST4 matrix (suitable for inverse transformation in a small-size scene) are adopted as the target transformation matrix pair in a small-size scene, so that the complexity of inverse transformation can be reduced, and the performance of inverse transformation can be improved.
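
The size-dependent choices described above (threshold M with a DCT2' FS or DCT2' F matrix, or threshold N with a DST7 matrix) can be summarized in a small sketch. The function name, its parameters, and the use of plain strings as matrix labels are assumptions made for illustration only.

```python
def target_pair(width, height, fixed="horizontal", threshold=32,
                large_matrix="DCT2' FS"):
    """Return (horizontal, vertical) matrix labels for a block.

    The direction named by `fixed` always uses DCT2'. The other direction uses
    `large_matrix` when its size reaches `threshold`, and DST4 otherwise.
    """
    if fixed not in ("horizontal", "vertical"):
        raise ValueError("fixed must be 'horizontal' or 'vertical'")
    # The vertical transform spans the block height, the horizontal one its width.
    size = height if fixed == "horizontal" else width
    other = large_matrix if size >= threshold else "DST4"
    return ("DCT2'", other) if fixed == "horizontal" else (other, "DCT2'")


# Threshold M = 32 with DCT2' FS for large sizes.
large_block = target_pair(width=8, height=32)
small_block = target_pair(width=8, height=16)
# Threshold N = 16 with DST7 for large sizes, DCT2' fixed in the vertical direction.
dst7_block = target_pair(width=64, height=8, fixed="vertical",
                         threshold=16, large_matrix="DST7")
```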

The DST4 matrix may also be derived from the DCT2 matrix.

Specifically, an L × L DST4 matrix may be derived from (this may also be described as transformed from) a 2L × 2L DCT2 matrix. For example, a 4 × 4 DST4 matrix may be derived from an 8 × 8 DCT2 matrix, a 16 × 16 DST4 matrix may be derived from a 32 × 32 DCT2 matrix, and a 32 × 32 DST4 matrix may be derived from a 64 × 64 DCT2 matrix.

It should be understood that the aforementioned L × L DST4 matrix is derived from a 2L × 2L DCT2 matrix, and may specifically refer to extracting partial matrix coefficients in the DCT2 matrix as matrix coefficients of the DST4 matrix.
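
One way such an extraction can work is sketched below. The text does not say which coefficients are extracted, so the selection used here (odd-numbered rows and the right half of the columns of the 2L × 2L DCT2 matrix, with an alternating row sign) is an assumption based on the trigonometric identity cos(θ + (2j+1)π/2) = (-1)^(j+1) sin θ. Normalization factors are omitted and the function names are hypothetical.

```python
import math


def dct2_unscaled(size):
    """Unscaled DCT2 basis: entry [k][n] = cos(pi * (2n + 1) * k / (2 * size))."""
    return [
        [math.cos(math.pi * (2 * n + 1) * k / (2 * size)) for n in range(size)]
        for k in range(size)
    ]


def dst4_unscaled(size):
    """Unscaled DST4 basis: entry [j][n] = sin(pi * (2j + 1) * (2n + 1) / (4 * size))."""
    return [
        [math.sin(math.pi * (2 * j + 1) * (2 * n + 1) / (4 * size))
         for n in range(size)]
        for j in range(size)
    ]


def dst4_from_dct2(dct2_double):
    """Build an L x L DST4 from a 2L x 2L DCT2 by picking existing coefficients.

    Row j of the result is row 2j + 1 of the DCT2 matrix, restricted to columns
    L .. 2L - 1, with its sign flipped when j is even.
    """
    size = len(dct2_double) // 2
    return [
        [(-1) ** (j + 1) * dct2_double[2 * j + 1][size + n] for n in range(size)]
        for j in range(size)
    ]


derived = dst4_from_dct2(dct2_unscaled(8))   # 4 x 4 DST4 from an 8 x 8 DCT2
reference = dst4_unscaled(4)
```

In this sketch the derived matrix agrees with the directly computed DST4 basis up to floating-point rounding, so no separate DST4 table would be needed.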

Optionally, the method further includes: parsing the bitstream to obtain a multi-core transform flag. In this case, determining a target transformation matrix pair from the candidate transformation matrix pairs according to the target transformation matrix pair index value and the correspondence between target transformation matrix pair index values and candidate transformation matrix pairs includes: when the value of the multi-core transform flag is a first value, determining the target transformation matrix pair from the candidate transformation matrix pairs according to the target transformation matrix pair index value and the correspondence between target transformation matrix pair index values and candidate transformation matrix pairs; and when the value of the multi-core transform flag is a second value, determining the DCT2 matrix as both the horizontal direction transformation matrix and the vertical direction transformation matrix in the target transformation matrix pair.

That is, in the present application, the multi-core transform may be performed only when the value of the multi-core transform flag is the first value; when the value of the multi-core transform flag is the second value, the residual coefficients may be inversely transformed by directly using the DCT2 matrix as the target transformation matrix.

The multi-core transform flag may specifically be MTS_flag. MTS_flag equal to 1 may indicate that the multi-core transform is performed, and MTS_flag equal to 0 may indicate that the multi-core transform is not performed (the meanings indicated by 1 and 0 may also be interchanged).
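
The flag-controlled selection can be sketched as below. The index-to-pair table is purely hypothetical (the actual correspondence is not given in this passage), as are the function name and the choice of 1 as the first value.

```python
# Hypothetical index-to-pair table; each entry is (horizontal, vertical).
CANDIDATE_PAIRS = {
    0: ("DCT2'", "DCT2'"),
    1: ("DCT2' FS", "DCT2'"),
    2: ("DCT2'", "DCT2' FS"),
    3: ("DCT2' FS", "DCT2' FS"),
}


def select_target_pair(mts_flag, pair_index=None, first_value=1):
    """Pick the target pair from the flag and, if needed, the parsed index."""
    if mts_flag == first_value:
        # Multi-core transform enabled: look the pair up by its index value.
        return CANDIDATE_PAIRS[pair_index]
    # Multi-core transform disabled: DCT2 in both directions, no index needed.
    return ("DCT2", "DCT2")


with_mts = select_target_pair(mts_flag=1, pair_index=2)
without_mts = select_target_pair(mts_flag=0)
```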

In a second aspect, a video encoding method is provided, the method comprising: obtaining a residual block of an image block to be processed; obtaining candidate transformation matrix pairs of the residual block according to preset mapping relation information, wherein each candidate transformation matrix pair comprises a horizontal direction transformation matrix and a vertical direction transformation matrix, the horizontal direction transformation matrix and the vertical direction transformation matrix are each one of two preset transformation matrices, the first transformation matrix of the two preset transformation matrices is a DCT2' matrix, and the DCT2' matrix is the transposed matrix of the DCT2 matrix; selecting the transformation matrix pair with the minimum rate-distortion cost from the candidate transformation matrix pairs as a target transformation matrix pair; transforming the residual block according to the target transformation matrix pair to obtain a transformation coefficient of the image block to be processed; and writing the target transformation matrix pair index value corresponding to the target transformation matrix pair into the bitstream.

The mapping relationship information may include a target transformation matrix pair index value and a transformation matrix pair corresponding to the target index value.
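
The encoder-side selection step can be sketched as a minimum search over the candidate pairs. The cost function is left as a caller-supplied placeholder, since how the cost is measured is not specified here; the function name and the toy cost in the usage lines are assumptions for illustration only.

```python
def choose_target_pair(residual, candidates, cost_fn):
    """Return (index, pair) for the candidate with the smallest cost.

    `candidates` maps index values to (horizontal, vertical) pairs, and
    `cost_fn(residual, pair)` returns a number. Ties go to the lowest index.
    """
    best_index = min(sorted(candidates),
                     key=lambda i: cost_fn(residual, candidates[i]))
    return best_index, candidates[best_index]


# Toy usage: a stand-in cost that is NOT a real rate-distortion measure.
toy_candidates = {0: ("DCT2'", "DCT2'"), 1: ("DCT2' FS", "DCT2'")}
toy_cost = lambda residual, pair: len(pair[0])
chosen_index, chosen_pair = choose_target_pair([[1, 2], [3, 4]], toy_candidates, toy_cost)
```

The returned index is the value that would then be written into the bitstream so that the decoder can look up the same pair.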

In the application, since the candidate transformation matrix pair includes the DCT2' matrix and the DCT2' matrix has a fast algorithm in the transformation process, the fast algorithm can be adopted when the DCT2' matrix is adopted for transformation, and the calculation complexity of the transformation process can be reduced.

In combination with the second aspect, in some implementations of the second aspect, the DCT2' matrix is derived from the DCT2 matrix.

With reference to the second aspect, in certain implementations of the second aspect, a second transformation matrix of the two preset transformation matrices is derived from a DCT2 matrix.

With reference to the second aspect, in certain implementations of the second aspect, a second transformation matrix of the two preset transformation matrices is a DCT2' FS matrix or a DCT2' F matrix.

Optionally, the mirror image is a left-right flip mirror image.

In the application, because fast algorithms exist for the DCT2' FS matrix and the DCT2' F matrix in the transformation process, the computational complexity of the transformation can be reduced.

Specifically, when the first of the two preset transformation matrices is a DCT2' matrix and the second is a DCT2' FS matrix or a DCT2' F matrix, fast algorithms exist for the DCT2' matrix, the DCT2' FS matrix and the DCT2' F matrix, so the computational complexity of the transformation can be reduced when the residual block is transformed using a target transformation matrix pair (which is formed by combining the two preset transformation matrices).

With reference to the second aspect, in certain implementations of the second aspect, the horizontal transform matrix in the target pair of transform matrices is a DCT2' matrix, and the vertical transform matrix in the target pair of transform matrices is a DCT2' FS matrix or a DCT2' F matrix when the height of the residual block is greater than or equal to M points; when the height of the residual block is less than M points, the vertical transformation matrix in the target transformation matrix pair is a DST4 matrix, where M is a positive integer.

With reference to the second aspect, in certain implementations of the second aspect, the vertical transform matrix in the target pair of transform matrices is a DCT2' matrix, and when the width of the residual block is greater than or equal to M points, the horizontal transform matrix in the target pair of transform matrices is a DCT2' FS matrix or a DCT2' F matrix; when the width of the residual block is smaller than M points, the horizontal transformation matrix in the target transformation matrix pair is a DST4 matrix, where M is a positive integer.

Optionally, M is 32.

In the application, the transformation matrix with the rapid algorithm is adopted as the horizontal transformation matrix or the vertical transformation matrix in the target transformation matrix pair in the large-size transformation scene, and the transformation matrix suitable for the small-size transformation scene is adopted as the horizontal transformation matrix or the vertical transformation matrix in the target transformation matrix pair in the small-size transformation scene, so that the transformation complexity can be obviously reduced in the large-size transformation scene, the transformation performance can be ensured in the small-size scene, and the balance can be obtained between the reduction of the transformation complexity and the guarantee of the transformation performance.

With reference to the second aspect, in certain implementations of the second aspect, the horizontal transform matrix in the target pair of transform matrices is a DCT2' matrix, and the vertical transform matrix in the target pair of transform matrices is a DST7 matrix when the height of the residual block is greater than or equal to N points; when the height of the residual block is less than N points, the vertical transformation matrix in the target transformation matrix pair is a DST4 matrix, wherein N is a positive integer.

With reference to the second aspect, in certain implementations of the second aspect, the vertical transform matrix in the target pair of transform matrices is a DCT2' matrix, and the horizontal transform matrix in the target pair of transform matrices is a DST7 matrix when the width of the residual block is greater than or equal to N points; when the width of the residual block is smaller than N points, the horizontal transformation matrix in the target transformation matrix pair is a DST4 matrix, where N is a positive integer.

Optionally, N = 16.

In the application, the DCT2' matrix and the DST7 matrix (suitable for transformation in a large-size scene) are adopted as the target transformation matrix pair in a large-size transformation scene, and the DCT2' matrix and the DST4 matrix (suitable for transformation in a small-size scene) are adopted as the target transformation matrix pair in a small-size scene, so that the transformation complexity can be reduced, and the transformation performance can be improved.

The DST4 matrix may also be derived from the DCT2 matrix.

In a third aspect, a decoder is provided, which includes means for performing the method of the first aspect or any one of the implementations of the first aspect.

In a fourth aspect, there is provided an encoder comprising means for performing the method of the second aspect or any one of the implementations of the second aspect.

In a fifth aspect, a decoder is provided, comprising: a memory and a processor, the processor invoking program code stored in the memory to perform part or all of the steps of the first aspect or any implementation of the first aspect.

Optionally, the memory is a non-volatile memory.

Optionally, the memory is coupled with the processor.

In a sixth aspect, there is provided an encoder comprising: a memory and a processor that invokes program code stored in the memory to perform part or all of the steps of the method of the second aspect or any implementation of the second aspect.

Optionally, the memory is a non-volatile memory.

Optionally, the memory is coupled with the processor.

In a seventh aspect, a computer-readable storage medium is provided, which stores program code, wherein the program code includes instructions for performing part or all of the steps of the method in the first aspect or any one of the implementation manners of the first aspect.

In an eighth aspect, a computer readable storage medium is provided, which stores program code, wherein the program code comprises instructions for performing part or all of the steps of the method in the second aspect or any one of the implementations of the second aspect.

In a ninth aspect, there is provided a computer program product for causing a computer to perform some or all of the steps of the method of the first aspect or any one of its implementations when the computer program product is run on the computer.

In a tenth aspect, there is provided a computer program product for causing a computer to perform some or all of the steps of the method of the second aspect or any one of its implementations when the computer program product is run on the computer.

Drawings

FIG. 1 is a block diagram of an example video encoding system for implementing an embodiment of the present application;

FIG. 2 is a block diagram of an example structure of a video encoder for implementing embodiments of the present application;

FIG. 3 is a block diagram of an example architecture of a video decoder implementing an embodiment of the present application;

FIG. 4 shows a block diagram of an example structure including encoder 20 of FIG. 2 and decoder 30 of FIG. 3;

FIG. 5 shows a block diagram of another example of an encoding apparatus or a decoding apparatus;

FIG. 6 is a schematic flow chart diagram of a video decoding method of an embodiment of the present application;

FIG. 7 is a schematic diagram of a process of deriving a DCT2' transform matrix and a DCT2' FS transform matrix from a DCT2 transform matrix;

FIG. 8 is a schematic flow chart of a video encoding method of an embodiment of the present application;

FIG. 9 is a schematic diagram of a butterfly fast algorithm circuit implementation of a 16 × 16 DCT2 matrix in HEVC;

FIG. 10 is a schematic diagram of a 32 × 32 inverse transform implementation circuit according to an embodiment of the present application;

FIG. 11 is a schematic block diagram of a video decoder of an embodiment of the present application.

Detailed Description

The technical solution in the present application will be described below with reference to the accompanying drawings.

In the following description, reference is made to the accompanying drawings which form a part hereof and in which is shown by way of illustration specific aspects of embodiments of the application or in which specific aspects of embodiments of the application may be employed. It should be understood that embodiments of the present application may be used in other ways and may include structural or logical changes not depicted in the drawings. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present application is defined by the appended claims.

For example, it should be understood that the disclosure in connection with the described methods may equally apply to the corresponding apparatus or system for performing the methods, and vice versa. For example, if one or more particular method steps are described, the corresponding apparatus may comprise one or more units, such as functional units, to perform the described one or more method steps (e.g., a unit performs one or more steps, or multiple units, each of which performs one or more of the multiple steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, for example, if a particular apparatus is described based on one or more units, such as functional units, the corresponding method may comprise one step to perform the functionality of the one or more units (e.g., one step performs the functionality of the one or more units, or multiple steps, each of which performs the functionality of one or more of the plurality of units), even if such one or more steps are not explicitly described or illustrated in the figures. Further, it is to be understood that features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless explicitly stated otherwise.

Video coding generally refers to processing a sequence of pictures that form a video or video sequence. In the field of video coding, the terms "picture", "frame" or "image" may be used as synonyms. Video coding as used in this application (or this disclosure) refers to video encoding or video decoding. Video encoding is performed on the source side and typically includes processing (e.g., compressing) the original video pictures to reduce the amount of data required to represent them (and thus store and/or transmit them more efficiently). Video decoding is performed on the destination side and typically involves inverse processing relative to the encoder to reconstruct the video pictures. References in the embodiments to "coding" of video pictures (or pictures in general, as will be explained below) should be understood to refer to "encoding" or "decoding" of a video sequence. The combination of the encoding part and the decoding part is also called a codec (encoding and decoding).

In the case of lossless video coding, the original video picture can be reconstructed, i.e., the reconstructed video picture has the same quality as the original video picture (assuming no transmission loss or other data loss during storage or transmission). In the case of lossy video coding, the amount of data needed to represent the video picture is reduced by performing further compression, e.g., by quantization, while the decoder side cannot fully reconstruct the video picture, i.e., the quality of the reconstructed video picture is lower or worse than the quality of the original video picture.

Several video coding standards since H.261 belong to the group of "lossy hybrid video codecs" (i.e., they combine spatial and temporal prediction in the sample domain with 2D transform coding for applying quantization in the transform domain). Each picture of a video sequence is typically partitioned into a set of non-overlapping blocks, and coding is typically performed at the block level. In other words, the encoder side typically processes, i.e., encodes, video at the block (video block) level, e.g., generates a prediction block by spatial (intra-picture) prediction and temporal (inter-picture) prediction, subtracts the prediction block from the current block (the block currently being processed or to be processed) to obtain a residual block, and transforms the residual block and quantizes it in the transform domain to reduce the amount of data to be transmitted (compression), while the decoder side applies the inverse processing relative to the encoder to the encoded or compressed block to reconstruct the current block for representation. In addition, the encoder replicates the decoder processing loop such that the encoder and decoder generate the same prediction (e.g., intra-prediction and inter-prediction) and/or reconstruction for processing, i.e., coding, subsequent blocks.

As used herein, the term "block" may be a portion of a picture or frame. For ease of description, embodiments of the present application are described with reference to Versatile Video Coding (VVC) or High-Efficiency Video Coding (HEVC), the latter developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). Those of ordinary skill in the art understand that the embodiments of the present application are not limited to HEVC or VVC. A block may refer to a CU, a PU, or a TU. In HEVC, a CTU is split into CUs by using a quadtree structure represented as a coding tree. A decision is made at the CU level whether to encode a picture region using inter-picture (temporal) or intra-picture (spatial) prediction. Each CU may be further split into one, two, or four PUs according to the PU split type. The same prediction process is applied within one PU and the relevant information is transmitted to the decoder on a PU basis. After obtaining the residual block by applying a prediction process based on the PU split type, the CU may be partitioned into Transform Units (TUs) according to another quadtree structure similar to the coding tree used for the CU. In recent developments of video compression techniques, the coding blocks are partitioned using a Quad-tree and binary tree (QTBT) partitioning framework. In the QTBT block structure, a CU may be square or rectangular in shape. In VVC, a Coding Tree Unit (CTU) is first divided by a quadtree structure. The quadtree leaf nodes are further partitioned by a binary tree structure. The binary tree leaf nodes are called Coding Units (CUs), and this segmentation is used for prediction and transform processing without any further partitioning. This means that the block sizes of CU, PU and TU in the QTBT coding block structure are the same. It has also been proposed to use multiple partition types, such as ternary tree partitioning, together with the QTBT block structure.
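
For illustration, recursive quadtree splitting of a CTU into CUs may be sketched as follows. The split decision `should_split` is a hypothetical callback introduced only for this example; an actual encoder would typically decide by comparing coding costs.

```python
# Sketch of recursive quadtree partitioning of a CTU into CUs.
# `should_split` is a hypothetical decision callback, not part of the source.

def quadtree_partition(x, y, size, min_size, should_split):
    """Return a list of (x, y, size) leaf blocks covering the square."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four quadrants in raster order.
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_partition(
                    x + dx, y + dy, half, min_size, should_split
                )
        return leaves
    return [(x, y, size)]


def split_rule(x, y, size):
    # Example rule: split the 64x64 CTU, then split only its top-left quadrant.
    return size == 64 or (size == 32 and x == 0 and y == 0)


cus = quadtree_partition(0, 0, 64, 8, split_rule)
```

With this rule the CTU yields four 16x16 CUs in the top-left quadrant and three 32x32 CUs elsewhere, which together cover the CTU without overlap.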

For a preliminary understanding and appreciation of the video codec process, embodiments of the encoder 20, decoder 30, and codec systems 10, 40 are described below in conjunction with fig. 1-4 (before describing embodiments of the present application in more detail based on fig. 10).

Fig. 1 is a conceptual or schematic block diagram depicting an exemplary encoding system 10, such as a video encoding system 10 that may utilize the techniques of the present application (this disclosure). Encoder 20 (e.g., video encoder 20) and decoder 30 (e.g., video decoder 30) of video encoding system 10 represent examples of devices that may be used to perform techniques for video encoding or video decoding methods according to various examples described in this application. As shown in fig. 1, encoding system 10 includes a source device 12 for providing encoded data 13, e.g., encoded pictures 13, to a destination device 14 that decodes encoded data 13, for example.

The source device 12 comprises an encoder 20 and may additionally, i.e. optionally, comprise a picture source 16, a pre-processing unit 18, e.g. a picture pre-processing unit 18, and a communication interface or unit 22.

The picture source 16 may include or may be any type of picture capture device for capturing real-world pictures, for example, and/or any type of picture or comment generation device (for screen content encoding, some text on the screen is also considered part of the picture or image to be encoded), for example, a computer graphics processor for generating computer animated pictures, or any type of device for obtaining and/or providing real-world pictures, computer animated pictures (e.g., screen content, Virtual Reality (VR) pictures), and/or any combination thereof (e.g., Augmented Reality (AR) pictures).

A (digital) picture is or can be seen as a two-dimensional array or matrix of sample points having intensity values. The sample points in the array may also be referred to as pixels (short for picture elements) or pels. The number of sample points of the array or picture in the horizontal and vertical directions (or axes) defines the size and/or resolution of the picture. To represent color, three color components are typically employed, i.e., a picture may be represented as or contain three sample arrays. In the RGB format or color space, a picture includes corresponding red, green, and blue sample arrays. However, in video coding, each pixel is typically represented in a luminance/chrominance format or color space, e.g., YCbCr, comprising a luminance component indicated by Y (sometimes also indicated by L) and two chrominance components indicated by Cb and Cr. The luminance (luma) component Y represents the luminance or gray level intensity (e.g., both are the same in a gray scale picture), while the two chrominance (chroma) components Cb and Cr represent the chrominance or color information components. Accordingly, a picture in YCbCr format includes a luminance sample array of luminance sample values (Y), and two chrominance sample arrays of chrominance values (Cb and Cr). Pictures in RGB format may be converted or transformed into YCbCr format and vice versa, a process also known as color transformation or conversion. If the picture is black and white, the picture may include only an array of luminance samples.
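
As a non-limiting illustration of such a color conversion, the following sketch converts a single 8-bit pixel between RGB and YCbCr. It assumes the full-range BT.601 coefficients used by JFIF; this application does not prescribe any particular conversion matrix, and the helper names are chosen only for the example.

```python
# Sketch of RGB <-> YCbCr conversion for one 8-bit pixel.
# Assumption: full-range BT.601 (JFIF) coefficients; other matrices exist.

def clamp8(value):
    # Round to the nearest integer and clip to the 8-bit range.
    return max(0, min(255, int(round(value))))


def rgb_to_ycbcr(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)


def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


gray = rgb_to_ycbcr(128, 128, 128)  # a gray pixel carries no chroma offset
```

For a gray pixel both chroma components sit at the mid value 128, which is why a black and white picture can be represented by the luminance array alone.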

Picture source 16 (e.g., video source 16) may be, for example, a camera for capturing pictures, a memory, such as a picture store, that includes or stores previously captured or generated pictures, and/or any type of (internal or external) interface for obtaining or receiving pictures. The camera may be, for example, a local camera or a camera integrated in the source device, and the memory may be, for example, a local memory or a memory integrated in the source device. The interface may be, for example, an external interface that receives pictures from an external video source, for example, an external picture capturing device such as a camera, an external memory, or an external picture generating device, for example, an external computer graphics processor, computer, or server. The interface may be any kind of interface according to any proprietary or standardized interface protocol, e.g., a wired or wireless interface or an optical interface. The interface for obtaining picture data 17 may be the same interface as communication interface 22 or part of communication interface 22.

To distinguish it from pre-processing unit 18 and the processing performed by pre-processing unit 18, picture or picture data 17 (e.g., video data 16) may also be referred to as raw picture or raw picture data 17.

Pre-processing unit 18 is configured to receive (raw) picture data 17 and perform pre-processing on picture data 17 to obtain a pre-processed picture 19 or pre-processed picture data 19. For example, the pre-processing performed by pre-processing unit 18 may include trimming, color format conversion (e.g., from RGB to YCbCr), toning, or denoising. It is to be understood that the pre-processing unit 18 may be an optional component.

Encoder 20, e.g., video encoder 20, is used to receive pre-processed picture data 19 and provide encoded picture data 21 (details will be described further below, e.g., based on fig. 2 or fig. 4).

Communication interface 22 of source device 12 may be used to receive encoded picture data 21 and transmit to other devices, e.g., destination device 14 or any other device for storage or direct reconstruction, or to process encoded picture data 21 prior to correspondingly storing encoded data 13 and/or transmitting encoded data 13 to other devices, e.g., destination device 14 or any other device for decoding or storage.

Destination device 14 includes a decoder 30 (e.g., a video decoder 30), and may additionally, that is, optionally, include a communication interface or unit 28, a post-processing unit 32, and a display device 34.

Communication interface 28 of destination device 14 is used, for example, to receive encoded picture data 21 or encoded data 13 directly from source device 12 or any other source, such as a storage device, such as an encoded picture data storage device.

Communication interface 22 and communication interface 28 may be used to transmit or receive encoded picture data 21 or encoded data 13 by way of a direct communication link between source device 12 and destination device 14, such as a direct wired or wireless connection, or by way of any type of network, such as a wired or wireless network or any combination thereof, or any type of private and public networks, or any combination thereof.

Communication interface 22 may, for example, be used to encapsulate encoded picture data 21 into a suitable format, such as a packet, for transmission over a communication link or communication network.

Communication interface 28, which forms a corresponding part of communication interface 22, may for example be used for decapsulating encoded data 13 to obtain encoded picture data 21.

Both communication interface 22 and communication interface 28 may be configured as a unidirectional communication interface, as indicated by the arrow from source device 12 to destination device 14 for encoded picture data 13 in fig. 1, or as a bidirectional communication interface, and may be used, for example, to send and receive messages to establish a connection, acknowledge and exchange any other information related to a communication link and/or a data transmission, for example, an encoded picture data transmission.

Decoder 30 is used to receive encoded picture data 21 and provide decoded picture data 31 or decoded picture 31 (details will be described further below, e.g., based on fig. 3 or fig. 5).

Post-processor 32 of destination device 14 is used to post-process decoded picture data 31 (also referred to as reconstructed picture data), e.g., decoded picture 31, to obtain post-processed picture data 33, e.g., post-processed picture 33. Post-processing performed by post-processing unit 32 may include, for example, color format conversion (e.g., from YCbCr to RGB), toning, cropping, or resampling, or any other processing for, for example, preparing decoded picture data 31 for display by display device 34.

Display device 34 of destination device 14 is used to receive post-processed picture data 33 to display a picture to, for example, a user or viewer. Display device 34 may be or may include any type of display for presenting the reconstructed picture, such as an integrated or external display or monitor. For example, the display may include a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS), a Digital Light Processor (DLP), or any other display of any kind.

Although fig. 1 depicts source device 12 and destination device 14 as separate devices, a device embodiment may also include both source device 12 and destination device 14 or the functionality of both, i.e., source device 12 or corresponding functionality and destination device 14 or corresponding functionality. In such embodiments, source device 12 or corresponding functionality and destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, or using separate hardware and/or software, or any combination thereof.

It will be apparent to those skilled in the art from this description that the existence and (exact) division of the functionality of the different units, or of the functionality of source device 12 and/or destination device 14 shown in fig. 1, may vary depending on the actual device and application.

Encoder 20 (e.g., video encoder 20) and decoder 30 (e.g., video decoder 30) may each be implemented as any of a variety of suitable circuitry, such as one or more microprocessors, Digital Signal Processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combinations thereof. If the techniques are implemented in part in software, an apparatus may store instructions of the software in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing, including hardware, software, a combination of hardware and software, etc., may be considered one or more processors. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.

Source device 12 may be referred to as a video encoding device or a video encoding apparatus. Destination device 14 may be referred to as a video decoding device or a video decoding apparatus. Source device 12 and destination device 14 may be examples of video coding devices or video coding apparatuses.

Source device 12 and destination device 14 may comprise any of a variety of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, a mobile phone, a smart phone, a tablet computer, a camcorder, a desktop computer, a set-top box, a television, a display device, a digital media player, a video game console, a video streaming device (e.g., a content service server or a content distribution server), a broadcast receiver device, a broadcast transmitter device, etc., and may use no operating system or any type of operating system.

In some cases, source device 12 and destination device 14 may be equipped for wireless communication. Thus, source device 12 and destination device 14 may be wireless communication devices.

In some cases, the video encoding system 10 shown in fig. 1 is merely an example, and the techniques of this application may be applicable to video encoding settings (e.g., video encoding or video decoding) that do not necessarily involve any data communication between the encoding and decoding devices. In other examples, the data may be retrieved from local storage, streamed over a network, and so on. A video encoding device may encode and store data to a memory, and/or a video decoding device may retrieve and decode data from a memory. In some examples, the encoding and decoding are performed by devices that do not communicate with each other, but merely encode data to and/or retrieve data from memory and decode data.

It should be understood that for each of the examples described above with reference to video encoder 20, video decoder 30 may be used to perform the reverse process. With respect to signaling syntax elements, video decoder 30 may be configured to receive and parse such syntax elements and decode the associated video data accordingly. In some examples, video encoder 20 may entropy encode one or more syntax elements defined … … into an encoded video bitstream. In such instances, video decoder 30 may parse such syntax elements and decode the relevant video data accordingly.

Encoder and encoding method

Fig. 2 shows a schematic/conceptual block diagram of an example of a video encoder 20 for implementing the techniques of this application. In the example of fig. 2, video encoder 20 includes a residual calculation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transform processing unit 212, a reconstruction unit 214, a buffer 216, a loop filter unit 220, a Decoded Picture Buffer (DPB) 230, a prediction processing unit 260, and an entropy encoding unit 270. Prediction processing unit 260 may include inter prediction unit 244, intra prediction unit 254, and mode selection unit 262. Inter prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown). The video encoder 20 shown in fig. 2 may also be referred to as a hybrid video encoder or a video encoder according to a hybrid video codec.

For example, the residual calculation unit 204, the transform processing unit 206, the quantization unit 208, the prediction processing unit 260, and the entropy encoding unit 270 form a forward signal path of the encoder 20, and, for example, the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the Decoded Picture Buffer (DPB) 230, the prediction processing unit 260 form a backward signal path of the encoder, wherein the backward signal path of the encoder corresponds to a signal path of a decoder (see the decoder 30 in fig. 3).

Encoder 20 receives picture 201 or block 203 of picture 201, e.g., a picture in a sequence of pictures forming a video or video sequence, e.g., via input 202. Picture block 203 may also be referred to as a current picture block or a picture block to be encoded, and picture 201 may be referred to as a current picture or a picture to be encoded (especially when the current picture is distinguished from other pictures in video encoding, such as previously encoded and/or decoded pictures in the same video sequence, i.e., a video sequence that also includes the current picture).

Segmentation

An embodiment of encoder 20 may include a partitioning unit (not shown in fig. 2) for partitioning picture 201 into a plurality of blocks, such as block 203, typically into a plurality of non-overlapping blocks. The partitioning unit may be used to use the same block size for all pictures in a video sequence and a corresponding grid defining the block size, or to alter the block size between pictures or subsets or groups of pictures and partition each picture into corresponding blocks.
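
A minimal sketch of such a partitioning unit, assuming a fixed block size laid out on a regular grid, is given below. Cropping the blocks at the right and bottom picture borders is an assumption made for this example; padding the picture instead would be an equally valid choice.

```python
# Sketch of partitioning a picture into non-overlapping blocks on a grid.
# Assumption: blocks at the right/bottom border are cropped to the picture.

def partition_picture(width, height, block_size):
    """Return (x, y, w, h) tuples of the blocks in raster-scan order."""
    blocks = []
    for y in range(0, height, block_size):
        for x in range(0, width, block_size):
            w = min(block_size, width - x)
            h = min(block_size, height - y)
            blocks.append((x, y, w, h))
    return blocks


grid = partition_picture(100, 70, 32)  # 4 columns x 3 rows of blocks
```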

In one example, prediction processing unit 260 of video encoder 20 may be used to perform any combination of the above-described segmentation techniques.

Like picture 201, block 203 is also or can be viewed as a two-dimensional array or matrix of sample points having intensity values (sample values), although smaller in size than picture 201. In other words, the block 203 may comprise, for example, one sample array (e.g., a luma array in the case of a black and white picture 201) or three sample arrays (e.g., a luma array and two chroma arrays in the case of a color picture) or any other number and/or class of arrays depending on the color format applied. The number of sampling points in the horizontal and vertical directions (or axes) of the block 203 defines the size of the block 203.

The encoder 20 as shown in fig. 2 is used to encode a picture 201 block by block, e.g., performing encoding and prediction for each block 203.

Residual calculation

The residual calculation unit 204 is configured to calculate a residual block 205 based on the picture block 203 and the prediction block 265 (further details of the prediction block 265 are provided below), e.g., by subtracting sample values of the prediction block 265 from sample values of the picture block 203 on a sample-by-sample (pixel-by-pixel) basis to obtain the residual block 205 in the sample domain.
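
For illustration, the sample-by-sample residual calculation (current block minus prediction block) may be sketched as follows, with blocks represented as lists of rows; the function name is chosen only for this example.

```python
# Sketch of the residual calculation: current block minus prediction block.

def residual_block(current, prediction):
    """Return the sample-wise difference of two equally sized blocks."""
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current, prediction)
    ]


current = [[52, 55], [61, 59]]
prediction = [[50, 50], [60, 60]]
residual = residual_block(current, prediction)  # [[2, 5], [1, -1]]
```

Residual samples may be negative, so an implementation working on unsigned 8-bit samples would need a wider signed type for the residual.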

Transform

The transform processing unit 206 is configured to apply a transform, such as a Discrete Cosine Transform (DCT) or a Discrete Sine Transform (DST), on the sample values of the residual block 205 to obtain transform coefficients 207 in a transform domain. The transform coefficients 207 may also be referred to as transform residual coefficients and represent the residual block 205 in the transform domain.
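
As a non-limiting sketch, a separable two-dimensional DCT of a square residual block may be written as follows. The sketch uses a floating-point orthonormal DCT-II and plain Python lists; both are assumptions made for readability, and practical codecs use integer approximations instead.

```python
# Sketch of a separable 2-D DCT-II on a square block (floating point).
# Assumption: orthonormal basis, so the inverse is the transposed matrix.
import math


def dct_matrix(n):
    """Return the orthonormal n x n DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_dct2d(block):
    t = dct_matrix(len(block))
    return matmul(matmul(t, block), transpose(t))  # T * X * T^T


def inverse_dct2d(coeffs):
    t = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(t), coeffs), t)  # T^T * Y * T


flat = [[4] * 4 for _ in range(4)]
coeffs = forward_dct2d(flat)  # all energy ends up in the DC coefficient
```

A flat residual block concentrates all of its energy in the single DC coefficient, which is the property that makes the transform domain convenient for the subsequent quantization.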

The transform processing unit 206 may be used to apply integer approximations of DCT/DST, such as the transform specified for HEVC/h.265. Such integer approximations are typically scaled by some factor compared to the orthogonal DCT transform. To maintain the norm of the residual block processed by the forward transform and the inverse transform, an additional scaling factor is applied as part of the transform process. The scaling factor is typically selected based on certain constraints, e.g., the scaling factor is a power of 2 for a shift operation, a trade-off between bit depth of transform coefficients, accuracy and implementation cost, etc. For example, a specific scaling factor may be specified on the decoder 30 side for the inverse transform by, for example, inverse transform processing unit 212 (and on the encoder 20 side for the corresponding inverse transform by, for example, inverse transform processing unit 212), and correspondingly, a corresponding scaling factor may be specified on the encoder 20 side for the forward transform by transform processing unit 206.
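
The scaling of such an integer approximation can be illustrated with the 4-point integer matrix commonly reproduced as the HEVC core transform. The one-dimensional round trip below, with a single combined right shift `SHIFT`, is a simplification assumed for this example; it is not the intermediate-shift procedure of any standard.

```python
# Sketch: scaling of an integer DCT approximation (4-point, 1-D).
# Assumption: a single combined rounding shift replaces the per-stage shifts
# that an actual codec would apply.

HEVC_DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]
SHIFT = 14  # each row has a norm close to 2**7, so forward + inverse is ~2**14


def forward_1d(x):
    # Integer forward transform: one dot product per basis row.
    return [sum(m * s for m, s in zip(row, x)) for row in HEVC_DCT4]


def inverse_1d(y):
    # Transposed matrix followed by a rounding right shift to undo the scale.
    out = []
    for i in range(4):
        acc = sum(HEVC_DCT4[k][i] * y[k] for k in range(4))
        out.append((acc + (1 << (SHIFT - 1))) >> SHIFT)
    return out


row_energy = [sum(v * v for v in row) for row in HEVC_DCT4]
coeffs = forward_1d([10, 20, 30, 40])
restored = inverse_1d(coeffs)
```

The squared row norms are 16384 or 16370, i.e., close to 2**14, which is why a power-of-two shift is sufficient to bring the samples back to their original range.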

Quantization

Quantization unit 208 is used to quantize transform coefficients 207, e.g., by applying scalar quantization or vector quantization, to obtain quantized transform coefficients 209. Quantized transform coefficients 209 may also be referred to as quantized residual coefficients 209. The quantization process may reduce the bit depth associated with some or all of transform coefficients 207. For example, an n-bit transform coefficient may be rounded down to an m-bit transform coefficient during quantization, where n is greater than m. The degree of quantization may be modified by adjusting a Quantization Parameter (QP). For example, for scalar quantization, different scales may be applied to achieve finer or coarser quantization. Smaller quantization step sizes correspond to finer quantization and larger quantization step sizes correspond to coarser quantization. An appropriate quantization step size may be indicated by the quantization parameter. For example, the quantization parameter may be an index into a predefined set of suitable quantization step sizes. For example, a smaller quantization parameter may correspond to fine quantization (a smaller quantization step size) and a larger quantization parameter may correspond to coarse quantization (a larger quantization step size), or vice versa. Quantization may comprise division by a quantization step size, and the corresponding inverse quantization, e.g., performed by inverse quantization unit 210, may comprise multiplication by the quantization step size. Embodiments according to some standards, such as HEVC, may use a quantization parameter to determine the quantization step size. In general, the quantization step size may be calculated based on the quantization parameter using a fixed point approximation of an equation that includes division. Additional scaling factors may be introduced for quantization and dequantization to restore the norm of the residual block, which may be modified by the scaling used in the fixed point approximation of the equation relating the quantization step size and the quantization parameter. In one example implementation, the scaling of the inverse transform and of the inverse quantization may be combined. Alternatively, a custom quantization table may be used and signaled from the encoder to the decoder, e.g., in a bitstream. Quantization is a lossy operation, where the larger the quantization step size, the greater the loss.
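
By way of example, scalar quantization controlled by a quantization parameter may be sketched as follows. The relation Qstep = 2^((QP - 4) / 6), commonly cited for H.264/HEVC-style codecs, and the plain rounding rule are assumptions of this sketch; this application does not fix either of them, and practical codecs use fixed point arithmetic.

```python
# Sketch of QP-controlled scalar quantization and inverse quantization.
# Assumptions: Qstep = 2 ** ((QP - 4) / 6) and simple rounding to nearest.

def qstep_from_qp(qp):
    # The step size doubles every time QP increases by 6.
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    step = qstep_from_qp(qp)
    return [int(round(c / step)) for c in coeffs]  # division by the step size


def dequantize(levels, qp):
    step = qstep_from_qp(qp)
    return [lvl * step for lvl in levels]  # multiplication by the step size


coeffs = [101, -37, 5, 0]
fine = quantize(coeffs, 22)    # step size 8
coarse = quantize(coeffs, 34)  # step size 32
restored = dequantize(fine, 22)
```

With the larger quantization parameter the small coefficients collapse to zero, which illustrates both the data reduction and the growing loss as the step size increases.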

The inverse quantization unit 210 is configured to apply the inverse quantization of the quantization unit 208 on the quantized coefficients to obtain inverse quantized coefficients 211, e.g., to apply the inverse of the quantization scheme applied by the quantization unit 208, based on or using the same quantization step size as the quantization unit 208. The dequantized coefficients 211 may also be referred to as dequantized residual coefficients 211 and correspond to transform coefficients 207, although, because of the loss introduced by quantization, they are typically not identical to the transform coefficients.

The inverse transform processing unit 212 is configured to apply an inverse transform of the transform applied by the transform processing unit 206, for example, an inverse Discrete Cosine Transform (DCT) or an inverse Discrete Sine Transform (DST), to obtain an inverse transform block 213 in the sample domain. The inverse transform block 213 may also be referred to as an inverse transform dequantized block 213 or an inverse transform residual block 213.

The reconstruction unit 214 (e.g., summer 214) is used to add the inverse transform block 213 (i.e., the reconstructed residual block 213) to the prediction block 265 to obtain the reconstructed block 215 in the sample domain, e.g., to add sample values of the reconstructed residual block 213 to sample values of the prediction block 265.

Optionally, a buffer unit 216 (or simply "buffer" 216), such as a line buffer 216, is used to buffer or store the reconstructed block 215 and corresponding sample values, for example, for intra prediction. In other embodiments, the encoder may be used to use the unfiltered reconstructed block and/or corresponding sample values stored in buffer unit 216 for any class of estimation and/or prediction, such as intra prediction.

For example, an embodiment of encoder 20 may be configured such that buffer unit 216 is used not only to store reconstructed blocks 215 for intra prediction 254, but also for loop filter unit 220 (not shown in fig. 2), and/or such that buffer unit 216 and decoded picture buffer unit 230 form one buffer, for example. Other embodiments may be used to use filtered block 221 and/or blocks or samples from decoded picture buffer 230 (neither shown in fig. 2) as input or basis for intra prediction 254.

The loop filter unit 220 (or simply "loop filter" 220) is used to filter the reconstructed block 215 to obtain a filtered block 221, e.g., to smooth pixel transitions or otherwise improve video quality. Loop filter unit 220 is intended to represent one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or other filters, such as a bilateral filter, an Adaptive Loop Filter (ALF), a sharpening or smoothing filter, or a collaborative filter. Although loop filter unit 220 is shown in fig. 2 as an in-loop filter, in other configurations, loop filter unit 220 may be implemented as a post-loop filter. The filtered block 221 may also be referred to as a filtered reconstructed block 221. The decoded picture buffer 230 may store the reconstructed encoded block after the loop filter unit 220 performs a filtering operation on the reconstructed encoded block.

Embodiments of encoder 20 (correspondingly, loop filter unit 220) may be configured to output loop filter parameters (e.g., sample adaptive offset information), e.g., directly or after entropy encoding by entropy encoding unit 270 or any other entropy encoding unit, e.g., such that decoder 30 may receive and apply the same loop filter parameters for decoding.

Decoded Picture Buffer (DPB) 230 may be a reference picture memory that stores reference picture data for use by video encoder 20 in encoding video data. DPB 230 may be formed from any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM) including Synchronous DRAM (SDRAM), Magnetoresistive RAM (MRAM), Resistive RAM (RRAM), or other types of memory devices. The DPB 230 and the buffer 216 may be provided by the same memory device or separate memory devices. In a certain example, a Decoded Picture Buffer (DPB) 230 is used to store filtered blocks 221. Decoded picture buffer 230 may further be used to store other previous filtered blocks, such as previous reconstructed and filtered blocks 221, of the same current picture or of a different picture, such as a previous reconstructed picture, and may provide the complete previous reconstructed, i.e., decoded picture (and corresponding reference blocks and samples) and/or the partially reconstructed current picture (and corresponding reference blocks and samples), e.g., for inter prediction. In a certain example, if reconstructed block 215 is reconstructed without in-loop filtering, Decoded Picture Buffer (DPB) 230 is used to store reconstructed block 215.

Prediction processing unit 260, also referred to as block prediction processing unit 260, is used to receive or obtain block 203 (current block 203 of current picture 201) and reconstructed picture data, e.g., reference samples of the same (current) picture from buffer 216 and/or reference picture data 231 of one or more previously decoded pictures from decoded picture buffer 230, and to process such data for prediction, i.e., to provide prediction block 265, which may be inter-predicted block 245 or intra-predicted block 255.

The mode selection unit 262 may be used to select a prediction mode (e.g., intra or inter prediction mode) and/or a corresponding prediction block 245 or 255 used as the prediction block 265 to calculate the residual block 205 and reconstruct the reconstructed block 215.

Embodiments of mode selection unit 262 may be used to select the prediction mode (e.g., from those supported by prediction processing unit 260) that provides the best match or the smallest residual (a smaller residual means better compression for transmission or storage), or that provides the smallest signaling overhead (a smaller signaling overhead means better compression for transmission or storage), or that considers or balances both. The mode selection unit 262 may be configured to determine a prediction mode based on Rate Distortion Optimization (RDO), i.e., to select the prediction mode that provides the minimum rate distortion, or to select a prediction mode whose associated rate distortion at least meets a prediction mode selection criterion.
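
One common way to balance residual and signaling overhead is a Lagrangian cost J = D + lambda * R, where D is the distortion and R the rate in bits. The sketch below assumes this cost form and uses made-up candidate figures; the mode names and the multiplier `lam` are hypothetical and serve only to illustrate the selection.

```python
# Sketch of rate-distortion based mode selection with cost J = D + lam * R.
# Assumption: candidate (distortion, rate) figures are already measured.

def select_mode(candidates, lam):
    """candidates: list of (mode_name, distortion, rate_bits) tuples."""
    best = min(candidates, key=lambda c: c[1] + lam * c[2])
    return best[0]


candidates = [
    ("intra_dc", 400, 20),
    ("intra_planar", 300, 60),
    ("inter", 150, 120),
]
low_lambda_choice = select_mode(candidates, 1.0)   # distortion dominates
high_lambda_choice = select_mode(candidates, 5.0)  # rate dominates
```

A small multiplier favors the mode with the lowest distortion, whereas a large multiplier favors the mode that is cheapest to signal.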

The prediction processing performed by the example of the encoder 20 (e.g., by the prediction processing unit 260) and the mode selection performed (e.g., by the mode selection unit 262) will be explained in detail below.

As described above, the encoder 20 is configured to determine or select the best or optimal prediction mode from a set of (predetermined) prediction modes. The prediction mode set may include, for example, intra prediction modes and/or inter prediction modes.

The intra prediction mode set may include 35 different intra prediction modes, for example, non-directional modes such as DC (or mean) mode and planar mode, or directional modes as defined in h.265, or may include 67 different intra prediction modes, for example, non-directional modes such as DC (or mean) mode and planar mode, or directional modes as defined in h.266 under development.

The set of (possible) inter prediction modes depends on the available reference pictures (i.e., at least partially decoded pictures stored in the DPB 230, e.g., as described above) and other inter prediction parameters, e.g., on whether the best matching reference block is searched using the entire reference picture or only a part of the reference picture, e.g., a search window area surrounding the area of the current block, and/or, e.g., on whether pixel interpolation such as half-pixel and/or quarter-pixel interpolation is applied.

In addition to the above prediction mode, a skip mode and/or a direct mode may also be applied.

The prediction processing unit 260 may further be configured to partition the block 203 into smaller block partitions or sub-blocks, for example, by iteratively using quad-tree (QT) partitioning, binary-tree (BT) partitioning, or ternary-tree (TT) partitioning, or any combination thereof, and to perform prediction for each of the block partitions or sub-blocks, for example, wherein mode selection includes selecting a tree structure of the partitioned block 203 and selecting a prediction mode to apply to each of the block partitions or sub-blocks.
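
For illustration, the sub-blocks produced by one quad-tree, binary-tree, or ternary-tree split of a block may be computed as follows. The mode labels and the 1:2:1 ratio of the ternary split are assumptions of this sketch, and block dimensions are assumed to be divisible by four.

```python
# Sketch of one-level QT / BT / TT splitting of a block (x, y, w, h).
# Assumptions: hypothetical mode labels, 1:2:1 ternary ratio, and
# dimensions divisible by 4.

def split_block(x, y, w, h, mode):
    """Return the list of (x, y, w, h) sub-blocks for the given split mode."""
    if mode == "QT":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_VER":
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "BT_HOR":
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "TT_VER":
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "TT_HOR":
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError("unknown split mode: " + mode)


ternary = split_block(0, 0, 32, 32, "TT_VER")
```

Applying such splits iteratively to the resulting sub-blocks yields the tree structure among which the mode selection chooses.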

The inter prediction unit 244 may include a Motion Estimation (ME) unit (not shown in fig. 2) and a Motion Compensation (MC) unit (not shown in fig. 2). The motion estimation unit is used to receive or obtain picture block 203 (current picture block 203 of current picture 201) and decoded picture 231, or at least one or more previously reconstructed blocks, e.g., reconstructed blocks of one or more other/different previously decoded pictures 231, for motion estimation. For example, the video sequence may comprise the current picture and the previously decoded pictures 231, or in other words, the current picture and the previously decoded pictures 231 may be part of, or form, a sequence of pictures forming the video sequence.

For example, the encoder 20 may be configured to select a reference block from a plurality of reference blocks of the same picture or of different pictures among a plurality of other pictures, and to provide the reference picture (or reference picture index) and/or an offset (spatial offset) between the position (X, Y coordinates) of the reference block and the position of the current block to the motion estimation unit (not shown in fig. 2) as inter prediction parameters. This offset is also called a Motion Vector (MV).
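As a rough illustration of how a motion vector arises as the spatial offset between the current block and its best matching reference block, the following sketch performs an exhaustive block-matching search inside a small search window. The function names, the sum-of-absolute-differences (SAD) cost, and the toy pictures are assumptions made for illustration; the text above does not prescribe a particular search strategy.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def estimate_motion(cur, ref, bx, by, size, search_range):
    """Return the motion vector (dx, dy) with the smallest SAD in the window."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference picture.
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv


# Hypothetical 6x6 pictures: the current picture is the reference picture
# with its content shifted one sample to the left.
ref = [[(3 * x + 7 * y) % 23 for x in range(6)] for y in range(6)]
cur = [[ref[y][min(x + 1, 5)] for x in range(6)] for y in range(6)]
mv = estimate_motion(cur, ref, bx=2, by=2, size=2, search_range=1)
```

Restricting `search_range` corresponds to searching only a window around the current block rather than the entire reference picture.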

The motion compensation unit is used to obtain, e.g., receive, inter-prediction parameters and perform inter-prediction based on or using the inter-prediction parameters to obtain the inter-prediction block 245. The motion compensation performed by the motion compensation unit (not shown in fig. 2) may involve taking or generating a prediction block based on a motion/block vector determined by motion estimation (possibly performing interpolation to sub-pixel precision). Interpolation filtering may generate additional pixel samples from known pixel samples, potentially increasing the number of candidate prediction blocks that may be used to encode a picture block. Upon receiving the motion vector for the PU of the current picture block, motion compensation unit 246 may locate the prediction block in one reference picture list to which the motion vector points. Motion compensation unit 246 may also generate syntax elements associated with the blocks and video slices for use by video decoder 30 in decoding picture blocks of the video slices.

The intra prediction unit 254 is used to obtain, e.g., receive, the picture block 203 (current picture block) of the same picture and one or more previously reconstructed blocks, e.g., reconstructed neighboring blocks, for intra estimation. For example, the encoder 20 may be configured to select an intra-prediction mode from a plurality of (predetermined) intra-prediction modes.

Embodiments of encoder 20 may be used to select an intra prediction mode based on optimization criteria, such as a minimum residual (e.g., the intra prediction mode that provides the prediction block 255 most similar to current picture block 203) or a minimum rate distortion.
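The minimum-residual criterion can be sketched as follows. This is a minimal example that assumes the candidate prediction blocks are already available and uses SAD as the similarity measure; the mode names and sample values are hypothetical. A rate-distortion criterion would additionally weigh the bits needed to signal each mode, which is not shown here.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def select_intra_mode(current, candidates):
    """Pick the mode whose prediction block is most similar to `current`.

    `candidates` maps a mode name to its prediction block.
    """
    return min(candidates, key=lambda mode: sad(current, candidates[mode]))


current = [[10, 10], [20, 20]]
candidates = {
    "DC": [[15, 15], [15, 15]],          # flat mean-value prediction
    "horizontal": [[10, 10], [20, 20]],  # rows copied from left neighbors
    "vertical": [[10, 20], [10, 20]],    # columns copied from top neighbors
}
best_mode = select_intra_mode(current, candidates)
```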

The intra-prediction unit 254 is further configured to determine the intra-prediction block 255 based on the intra-prediction parameters, e.g., the selected intra-prediction mode. In any case, after selecting the intra-prediction mode for the block, intra-prediction unit 254 is also used to provide intra-prediction parameters, i.e., information indicating the selected intra-prediction mode for the block, to entropy encoding unit 270. In one example, intra-prediction unit 254 may be used to perform any combination of the intra-prediction techniques described below.

Entropy encoding unit 270 is configured to apply an entropy encoding algorithm or scheme (e.g., a Variable Length Coding (VLC) scheme, a Context Adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or other entropy encoding methods or techniques) to any or all (or none) of the quantized residual coefficients 209, inter-prediction parameters, intra-prediction parameters, and/or loop filter parameters to obtain encoded picture data 21 that may be output by output 272, for example, in the form of encoded bitstream 21. The encoded bitstream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30. Entropy encoding unit 270 may also be used to entropy encode other syntax elements of the current video slice being encoded.

Other structural variations of video encoder 20 may be used to encode the video stream. For example, the non-transform based encoder 20 may quantize the residual signal directly without the transform processing unit 206 for certain blocks or frames. In another embodiment, encoder 20 may have quantization unit 208 and inverse quantization unit 210 combined into a single unit.

Fig. 3 illustrates an exemplary video decoder 30 for implementing the techniques of the present application. Video decoder 30 is used to receive encoded picture data (e.g., an encoded bitstream) 21, e.g., encoded by encoder 20, to obtain a decoded picture 231. During the decoding process, video decoder 30 receives video data, such as an encoded video bitstream representing picture blocks of an encoded video slice and associated syntax elements, from video encoder 20.

In the example of fig. 3, decoder 30 includes entropy decoding unit 304, inverse quantization unit 310, inverse transform processing unit 312, reconstruction unit 314 (e.g., summer 314), buffer 316, loop filter 320, decoded picture buffer 330, and prediction processing unit 360. The prediction processing unit 360 may include an inter prediction unit 344, an intra prediction unit 354, and a mode selection unit 362. In some examples, video decoder 30 may perform a decoding pass that is substantially reciprocal to the encoding pass described with reference to video encoder 20 of fig. 2.

Entropy decoding unit 304 is used to perform entropy decoding on encoded picture data 21 to obtain, for example, quantized coefficients 309 and/or decoded encoding parameters (not shown in fig. 3), e.g., any or all of the decoded inter-prediction parameters, intra-prediction parameters, loop filter parameters, and/or other syntax elements. The entropy decoding unit 304 is further used to forward the inter-prediction parameters, the intra-prediction parameters, and/or other syntax elements to the prediction processing unit 360. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.

Inverse quantization unit 310 may be functionally identical to inverse quantization unit 210, inverse transform processing unit 312 may be functionally identical to inverse transform processing unit 212, reconstruction unit 314 may be functionally identical to reconstruction unit 214, buffer 316 may be functionally identical to buffer 216, loop filter 320 may be functionally identical to loop filter 220, and decoded picture buffer 330 may be functionally identical to decoded picture buffer 230.

Prediction processing unit 360 may include inter prediction unit 344 and intra prediction unit 354, where inter prediction unit 344 may be functionally similar to inter prediction unit 244 and intra prediction unit 354 may be functionally similar to intra prediction unit 254. The prediction processing unit 360 is typically used to perform block prediction and/or to obtain a prediction block 365 from the encoded data 21, as well as to receive or obtain (explicitly or implicitly) prediction related parameters and/or information about the selected prediction mode from, for example, the entropy decoding unit 304.

When the video slice is encoded as an intra-coded (I) slice, intra-prediction unit 354 of prediction processing unit 360 is used to generate a prediction block 365 for the picture block of the current video slice based on the signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When a video frame is encoded as an inter-coded (i.e., B or P) slice, inter prediction unit 344 (e.g., a motion compensation unit) of prediction processing unit 360 is used to generate a prediction block 365 for the video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 304. For inter prediction, a prediction block may be generated from one reference picture within one reference picture list. Video decoder 30 may construct the reference frame lists, list 0 and list 1, using default construction techniques based on the reference pictures stored in DPB 330.

Prediction processing unit 360 is used to determine prediction information for the video blocks of the current video slice by parsing the motion vectors and other syntax elements, and to generate a prediction block for the current video block being decoded using the prediction information. For example, prediction processing unit 360 uses some of the received syntax elements to determine a prediction mode (e.g., intra or inter prediction) used to encode the video blocks of a video slice, an inter prediction slice type (e.g., B-slice, P-slice, or GPB-slice), construction information for one or more of the reference picture lists of the slice, a motion vector for each inter-coded video block of the slice, an inter prediction state for each inter-coded video block of the slice, and other information to decode the video blocks of the current video slice.

Inverse quantization unit 310 may be used to inverse quantize (i.e., dequantize) the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 304. The inverse quantization process may include using quantization parameters calculated by video encoder 20 for each video block in the video slice to determine the degree of quantization that should be applied and, likewise, the degree of inverse quantization that should be applied.
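How a quantization parameter controls the degree of inverse quantization can be sketched as below. The example assumes a step size that doubles every 6 QP units and equals 1 at QP 4, which is the relationship commonly associated with H.264/H.265; this relation, the function names, and the sample levels are assumptions and are not stated in the present text.

```python
def quantization_step(qp):
    """Assumed step size: doubles every 6 QP units and equals 1.0 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def inverse_quantize(levels, qp):
    """Scale each quantized level back to an approximate transform coefficient."""
    step = quantization_step(qp)
    return [[level * step for level in row] for row in levels]


levels = [[3, -1], [0, 2]]
coeffs = inverse_quantize(levels, qp=22)  # step = 2 ** 3 = 8
```

A practical decoder would typically use integer scaling tables and shifts rather than floating-point arithmetic; the floating-point form is used here only to keep the relationship visible.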

Inverse transform processing unit 312 is used to apply an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to produce a block of residuals in the pixel domain.

The reconstruction unit 314 (e.g., summer 314) is used to add the inverse transform block 313 (i.e., reconstructed residual block 313) to the prediction block 365 to obtain the reconstructed block 315 in the sample domain, e.g., by adding sample values of the reconstructed residual block 313 to sample values of the prediction block 365.
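The sample-wise addition performed by the reconstruction unit can be sketched as follows. Clipping the result to the valid sample range for a given bit depth is an assumption added for illustration; the paragraph above only describes the addition itself.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add residual samples to prediction samples, clipping to the sample range."""
    max_value = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


prediction = [[100, 250], [5, 128]]
residual = [[-3, 10], [-9, 0]]
reconstructed = reconstruct_block(prediction, residual)
```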

Loop filter unit 320 (either in the coding loop or after the coding loop) is used to filter reconstructed block 315 to obtain filtered block 321, in order to smooth pixel transitions or otherwise improve video quality. In one example, loop filter unit 320 may be used to perform any combination of the filtering techniques described below. Loop filter unit 320 is intended to represent one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or other filters, such as a bilateral filter, an Adaptive Loop Filter (ALF), a sharpening or smoothing filter, or a collaborative filter. Although loop filter unit 320 is shown in fig. 3 as an in-loop filter, in other configurations, loop filter unit 320 may be implemented as a post-loop filter.

Decoded video block 321 in a given frame or picture is then stored in decoded picture buffer 330, which stores reference pictures for subsequent motion compensation.

Decoder 30 is used to output decoded picture 31, e.g., via output 332, for presentation to or viewing by a user.

Other variations of video decoder 30 may be used to decode the compressed bitstream. For example, decoder 30 may generate an output video stream without loop filter unit 320. For example, the non-transform based decoder 30 may directly inverse quantize the residual signal without the inverse transform processing unit 312 for certain blocks or frames. In another embodiment, video decoder 30 may have inverse quantization unit 310 and inverse transform processing unit 312 combined into a single unit.

Fig. 4 is an illustration of an example of a video encoding system 40 including encoder 20 of fig. 2 and/or decoder 30 of fig. 3, according to an example embodiment. System 40 may implement a combination of the various techniques of the present application. In the illustrated embodiment, video encoding system 40 may include an imaging device 41, video encoder 20, video decoder 30 (and/or a video encoder implemented by logic 47 of processing unit 46), an antenna 42, one or more processors 43, one or more memories 44, and/or a display device 45.

As shown, the imaging device 41, the antenna 42, the processing unit 46, the logic circuit 47, the video encoder 20, the video decoder 30, the processor 43, the memory 44, and/or the display device 45 are capable of communicating with each other. As discussed, although video encoding system 40 is depicted with video encoder 20 and video decoder 30, in different examples, video encoding system 40 may include only video encoder 20 or only video decoder 30.

In some examples, as shown, video encoding system 40 may include an antenna 42. For example, the antenna 42 may be used to transmit or receive an encoded bitstream of video data. Additionally, in some examples, video encoding system 40 may include a display device 45. Display device 45 may be used to present video data. In some examples, logic 47 may be implemented by processing unit 46, as shown. The processing unit 46 may comprise application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, or the like. Video encoding system 40 may also include an optional processor 43, which similarly may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, or the like. In some examples, the logic 47 may be implemented in hardware, such as video encoding specific hardware, and the processor 43 may be implemented by general-purpose software, an operating system, and so on. In addition, the memory 44 may be any type of memory, such as a volatile memory (e.g., Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), etc.) or a nonvolatile memory (e.g., flash memory, etc.). In a non-limiting example, memory 44 may be implemented by a cache memory. In some instances, logic circuitry 47 may access memory 44 (e.g., to implement an image buffer). In other examples, logic 47 and/or processing unit 46 may include memory (e.g., cache, etc.) for implementing image buffers, etc.

In some examples, video encoder 20 implemented by logic circuitry may include an image buffer (e.g., implemented by processing unit 46 or memory 44) and a graphics processing unit (e.g., implemented by processing unit 46). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include video encoder 20 implemented by logic circuitry 47 to implement the various modules discussed with reference to fig. 2 and/or any other encoder system or subsystem described herein. Logic circuitry may be used to perform various operations discussed herein.

Video decoder 30 may be implemented in a similar manner by logic circuitry 47 to implement the various modules discussed with reference to decoder 30 of fig. 3 and/or any other decoder system or subsystem described herein. In some examples, video decoder 30 implemented by logic circuitry may include an image buffer (e.g., implemented by processing unit 46 or memory 44) and a graphics processing unit (e.g., implemented by processing unit 46). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include video decoder 30 implemented by logic circuitry 47 to implement the various modules discussed with reference to fig. 3 and/or any other decoder system or subsystem described herein.

In some examples, antenna 42 of video encoding system 40 may be used to receive an encoded bitstream of video data. As discussed, the encoded bitstream may include data related to the encoded video frame, indicators, index values, mode selection data, etc., discussed herein, such as data related to the encoding partition (e.g., transform coefficients or quantized transform coefficients, (as discussed) optional indicators, and/or data defining the encoding partition). Video encoding system 40 may also include a video decoder 30 coupled to antenna 42 and configured to decode the encoded bitstream. The display device 45 is used to present video frames.

Fig. 5 is a simplified block diagram of an apparatus 500 that may be used as either or both of source device 12 and destination device 14 in fig. 1, according to an example embodiment. Apparatus 500 may implement the techniques of this application, and apparatus 500 may take the form of a computing system including multiple computing devices, or a single computing device such as a mobile phone, tablet computer, laptop computer, notebook computer, desktop computer, or the like.

The processor 502 in the apparatus 500 may be a central processor. Alternatively, processor 502 may be any other type of device or devices now or later developed that is capable of manipulating or processing information. As shown, although the disclosed embodiments may be practiced using a single processor, such as processor 502, speed and efficiency advantages may be realized using more than one processor.

In one embodiment, the memory 504 of the apparatus 500 may be a Read Only Memory (ROM) device or a Random Access Memory (RAM) device. Any other suitable type of storage device may be used for memory 504. The memory 504 may include code and data 506 that is accessed by the processor 502 using a bus 512. The memory 504 may further include an operating system 508 and application programs 510, the application programs 510 including at least one program that permits the processor 502 to perform the methods described herein. For example, applications 510 may include applications 1 through N, which further include a video coding application that performs the methods described herein. The apparatus 500 may also include additional memory in the form of a secondary memory 514, which may be, for example, a memory card for use with a mobile computing device. Because a video communication session may contain a large amount of information, this information may be stored in whole or in part in the secondary memory 514 and loaded into the memory 504 for processing as needed.

Apparatus 500 may also include one or more output devices, such as a display 518. In one example, display 518 may be a touch-sensitive display that combines a display and a touch-sensitive element operable to sense touch inputs. Display 518 may be coupled to the processor 502 via the bus 512. Other output devices that permit a user to program apparatus 500 or otherwise use apparatus 500 may be provided in addition to display 518, or as an alternative to display 518. When the output device is or includes a display, the display may be implemented in different ways, including by a Liquid Crystal Display (LCD), a Cathode Ray Tube (CRT) display, a plasma display, or a Light Emitting Diode (LED) display, such as an Organic LED (OLED) display.

The apparatus 500 may also include or be in communication with an image sensing device 520, the image sensing device 520 being, for example, a camera or any other image sensing device 520 now or later developed that can sense an image, such as an image of a user running the apparatus 500. The image sensing device 520 may be placed directly facing the user running the apparatus 500. In an example, the position and optical axis of image sensing device 520 may be configured such that its field of view includes an area proximate display 518 and display 518 is visible from that area.

The apparatus 500 may also include or be in communication with a sound sensing device 522, such as a microphone or any other sound sensing device now known or later developed that can sense sound in the vicinity of the apparatus 500. The sound sensing device 522 may be positioned to face directly the user operating the apparatus 500 and may be used to receive sounds, such as speech or other utterances, emitted by the user while operating the apparatus 500.

Although the processor 502 and memory 504 of the apparatus 500 are depicted in fig. 5 as being integrated in a single unit, other configurations may also be used. The operations of processor 502 may be distributed among multiple directly couplable machines (each machine having one or more processors), or distributed in a local area or other network. Memory 504 may be distributed among multiple machines, such as a network-based memory or a memory among multiple machines running apparatus 500. Although only a single bus is depicted here, the bus 512 of the device 500 may be formed from multiple buses. Further, the secondary memory 514 may be directly coupled to other components of the apparatus 500 or may be accessible over a network and may comprise a single integrated unit, such as one memory card, or multiple units, such as multiple memory cards. Accordingly, the apparatus 500 may be implemented in a variety of configurations.

In the present application, the DCT2 matrix may also be abbreviated as DCT2, the DCT2' matrix may also be abbreviated as DCT2', the DST4 matrix may also be abbreviated as DST4, the DCT2' FS matrix may also be abbreviated as DCT2' FS, the DCT2' F matrix may also be abbreviated as DCT2' F, and the DST7 matrix may also be abbreviated as DST7.

The video decoding method according to the embodiment of the present application will be described in detail with reference to fig. 6.

Fig. 6 is a schematic flow chart of a video decoding method of an embodiment of the present application. The method shown in fig. 6 includes steps 1001 to 1005, which are described in detail below.

1001. Parse the code stream to obtain a target transformation matrix pair index value of the current block to be subjected to inverse transformation processing and a quantization coefficient of the current block.

In step 1001, after the code stream is acquired, the quantization coefficient of the current block and the target transform matrix pair index value may be acquired by performing operations such as entropy decoding on the code stream.

Optionally, the prediction mode corresponding to the current block may also be obtained by performing operations such as entropy decoding on the code stream.

1002. Perform inverse quantization processing on the quantization coefficient of the current block to obtain an inverse quantization coefficient of the current block.

1003. Determine a target transformation matrix pair from the candidate transformation matrix pairs according to the target transformation matrix pair index value and the correspondence between the target transformation matrix pair index value and the candidate transformation matrix pairs.

Each candidate transformation matrix pair includes a horizontal direction transformation matrix and a vertical direction transformation matrix. Both the horizontal direction transformation matrix and the vertical direction transformation matrix are one of two preset transformation matrices. The first of the two preset transformation matrices is the DCT2' matrix, and the DCT2' matrix is the transpose of the DCT2 matrix.

The candidate transformation matrix pairs may include the following specific transformation matrix pairs:

(1) (transform matrix A, transform matrix A);

(2) (transform matrix A, transform matrix B);

(3) (transform matrix B, transform matrix A);

(4) (transform matrix B, transform matrix B).

Wherein the first transformation matrix and the second transformation matrix in the parentheses may be a vertical direction transformation matrix and a horizontal direction transformation matrix, respectively.

The transformation matrix B may be the DCT2' matrix.

The target transformation matrix pair may be one of the four transformation matrix pairs above. Which of the candidate transformation matrix pairs it is may be determined according to the target transformation matrix pair index value and the correspondence between the target transformation matrix pair index value and the candidate transformation matrix pairs.

Since a fast algorithm exists for the transform/inverse transform with the DCT2' matrix, the DCT2' matrix is used to replace the DCT8 matrix/DCT4 matrix of the prior art (for which no fast algorithm exists), so that the implementation of the transform/inverse transform can be simplified. Meanwhile, since the DCT2' matrix is the transpose of the DCT2 matrix, the DCT2' matrix can reuse the circuit (such as the multipliers) that implements the inverse transform of the DCT2 matrix, which improves the utilization efficiency of the hardware circuit.

For example, the 4x4 DCT2 matrix is specified as follows:

Transposing this matrix yields the 4x4 DCT2' matrix, which is specifically as follows:

The correspondence (which may also be referred to as a mapping) between the target transformation matrix pair index value and the candidate transformation matrix pairs may be as shown in table 1, where DCT2' is the first of the two preset transformation matrices and transformation matrix A is the second of the two preset transformation matrices.

TABLE 1

As shown in table 1, after obtaining the target transform matrix pair index value, the decoding end may determine the target transform matrix pair according to the correspondence shown in table 1. For example, if the decoding end parses the code stream and obtains a target transform matrix pair index value of 1, the decoding end can determine from the correspondence shown in table 1 that the target transform matrix pair consists of transform matrix A and the DCT2' matrix, where transform matrix A is the vertical transform matrix in the target transform matrix pair and the DCT2' matrix is the horizontal transform matrix in the target transform matrix pair.

It should be understood that table 1 is only one specific form of the correspondence between target transformation matrix pair index values and candidate transformation matrix pairs, and the candidate transformation matrix pair corresponding to each index value is likewise only an example. In the present application, the correspondence is not limited to the specific form shown in table 1 and may also be represented in forms other than a table.

In addition, the index values in table 1 are only a specific case; the present application does not limit which transformation matrix pair corresponds to a given target transformation matrix pair index value. The same applies to the tables below that indicate the correspondence between target transformation matrix pair index values and candidate transformation matrix pairs: each is only a specific case, and the correspondence is not limited to the form of a table.
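Step 1003 amounts to a lookup from the parsed index value to a (vertical, horizontal) matrix pair, which can be sketched as follows. The mapping below is hypothetical: only the entry for index value 1 follows the example given above, and the remaining entries are placeholders chosen for illustration, since the present application does not fix which pair corresponds to which index value.

```python
# Hypothetical index-to-pair mapping; each pair is (vertical, horizontal).
# Only the entry for index 1 follows the example in the text; the other
# entries are placeholders.
CANDIDATE_PAIRS = {
    0: ("DCT2'", "DCT2'"),
    1: ("A", "DCT2'"),
    2: ("DCT2'", "A"),
    3: ("A", "A"),
}


def select_transform_pair(index):
    """Return the (vertical, horizontal) transform pair for a parsed index value."""
    try:
        return CANDIDATE_PAIRS[index]
    except KeyError:
        raise ValueError(f"unknown transform matrix pair index: {index}") from None


vertical, horizontal = select_transform_pair(1)
```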

1004. Perform inverse transformation processing on the inverse quantization coefficient of the current block according to the target transformation matrix pair to obtain a reconstructed residual block of the current block.

1005. Obtain a reconstructed block of the current block according to the reconstructed residual block of the current block.

Specifically, assuming that the target transform matrix pair is (A, B) and the obtained transform coefficient block is F, the inverse transform may be performed according to equation (1) to obtain a residual block R, where A is the horizontal direction transformation matrix and B is the vertical direction transformation matrix:

R = B' * F * A (1)

Here B' denotes the transpose of matrix B. Since B is an orthogonal matrix, transposing B is equivalent to computing the inverse of B.
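Equation (1) can be checked numerically with a small sketch. The example below assumes orthonormal floating-point DCT-II matrices for both A and B and assumes that the encoder-side forward transform is F = B * R * A'; neither assumption is stated in the text, and the helper names are hypothetical. Under these assumptions, applying equation (1) to F recovers R up to rounding error.

```python
import math


def dct2_matrix(n):
    """Orthonormal n-point DCT-II matrix; each row is one basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def inverse_transform(f, a, b):
    """Equation (1): R = B' * F * A."""
    return matmul(matmul(transpose(b), f), a)


def forward_transform(r, a, b):
    """Assumed encoder-side counterpart: F = B * R * A'."""
    return matmul(matmul(b, r), transpose(a))


a = dct2_matrix(4)  # horizontal direction transformation matrix
b = dct2_matrix(4)  # vertical direction transformation matrix
residual = [[4, -2, 0, 1], [3, 3, -1, 0], [0, 1, 2, -2], [5, 0, 0, 1]]
coefficients = forward_transform(residual, a, b)
recovered = inverse_transform(coefficients, a, b)
```

The round trip works because B' * B and A * A' are identity matrices when A and B are orthogonal, which is the property the paragraph above relies on.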

Furthermore, since the DCT2' matrix is the transpose of the DCT2 matrix, the DCT2' matrix can reuse the inverse transform implementation circuit of the DCT2 matrix during the inverse transform, which can reduce the hardware implementation cost.

Optionally, the DCT2' matrix is derived from the DCT2 matrix.

The decoding end can store the DCT2 matrix (that is, its matrix coefficients) in advance, and when an inverse transform is required, the decoding end can derive the matrix coefficients of the DCT2' matrix from the matrix coefficients of the DCT2 matrix.

In the application, since the DCT2' matrix is the transpose of the DCT2 matrix, the matrix coefficients of the DCT2' matrix can be derived from the matrix coefficients of the DCT2, and the matrix coefficients of the DCT2' matrix do not need to be stored additionally, which can reduce the storage overhead.

How to derive the DCT2' matrix from the DCT2 matrix is described below with reference to specific examples.

For example, the 4x4 DCT2 matrix is specified as follows:

Then, by transposing the 4x4 DCT2 matrix, a 4x4 DCT2' matrix can be obtained, and the obtained 4x4 DCT2' matrix is specifically as follows:
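As an illustration of this derivation, the sketch below transposes a 4x4 DCT2 matrix at run time instead of storing a second table. The coefficient values are those of the 4x4 integer DCT2 matrix of H.265/HEVC and are assumed here for illustration only; they are not necessarily the values used in this application.

```python
# 4x4 integer DCT2 matrix of H.265/HEVC (assumed for illustration).
DCT2_4X4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def transpose(matrix):
    """Swap rows and columns: result[i][j] == matrix[j][i]."""
    return [list(column) for column in zip(*matrix)]


DCT2P_4X4 = transpose(DCT2_4X4)  # the DCT2' matrix
for row in DCT2P_4X4:
    print(row)
```

Because the DCT2' coefficients are read from the stored DCT2 coefficients with swapped indices, no additional coefficient storage is needed.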

Optionally, as an embodiment, the second transformation matrix of the two preset transformation matrices is a DCT2' FS matrix or a DCT2' F matrix.

The mirroring used to derive these matrices from the DCT2' matrix may be a left-right flip.

Optionally, the configuration in which the first of the two preset transformation matrices is a DCT2' matrix and the second is a DCT2' FS matrix or a DCT2' F matrix is applicable to all transform sizes (for example, the transform sizes may include 4 points, 8 points, 16 points, 32 points, and so on).

When the first of the two preset transformation matrices is a DCT2' matrix and the second is a DCT2' FS matrix, the correspondence between the target transformation matrix pair index values and the candidate transformation matrix pairs may be as shown in table 2.

TABLE 2

To illustrate more clearly the effect of using the DCT2' matrix and the DCT2' FS matrix as the target transform matrix pair at any transform size, the computational complexity of the inverse transform is analyzed below, taking 4-point to 32-point DCT2' matrices and 4-point to 32-point DCT2' FS matrices as examples. The analysis results are shown in table 3.

TABLE 3

Transformation matrix	Fast algorithm (ODD matrix size)	Number of multiplications required for the transform
32-point DCT2' FS matrix	ODD16+ODD8+ODD4+ODD2	16x16+8x8+4x4+2x2
16-point DCT2' FS matrix	ODD8+ODD4+ODD2	8x8+4x4+2x2
8-point DCT2' FS matrix	ODD4+ODD2	4x4+2x2
4-point DCT2' FS matrix	ODD2	2x2
32-point DCT2' matrix	ODD16+ODD8+ODD4+ODD2	16x16+8x8+4x4+2x2
16-point DCT2' matrix	ODD8+ODD4+ODD2	8x8+4x4+2x2
8-point DCT2' matrix	ODD4+ODD2	4x4+2x2
4-point DCT2' matrix	ODD2	2x2

As can be seen from table 3, for the 4-point, 8-point, 16-point, and 32-point DCT2' matrices and DCT2' FS matrices, the matrix multiplication size (ODD matrix size) does not exceed 16 × 16, which greatly reduces the number of multiplications and thus the computational complexity.
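The multiplication counts in table 3 can be tallied with a short sketch. The comparison against a plain n x n matrix-vector product (n * n multiplications) is added here as a baseline for illustration and is not part of table 3; the function names are hypothetical.

```python
def odd_matrix_sizes(n):
    """Sizes of the ODD sub-matrices listed in table 3 for an n-point transform."""
    sizes = []
    size = n // 2
    while size >= 2:
        sizes.append(size)
        size //= 2
    return sizes


def fast_multiplications(n):
    """Multiplication count following the expressions in table 3."""
    return sum(size * size for size in odd_matrix_sizes(n))


def direct_multiplications(n):
    """Multiplications for a plain n x n matrix-vector product (baseline)."""
    return n * n


for n in (4, 8, 16, 32):
    print(n, odd_matrix_sizes(n), fast_multiplications(n), direct_multiplications(n))
```

For the 32-point case this gives 16x16 + 8x8 + 4x4 + 2x2 = 340 multiplications, compared with 1024 for the direct product.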

In addition, analysis of the transform/inverse transform process shows that the computational complexity comes mainly from large-size transforms. Therefore, transformation matrices that have fast algorithms may be used in large-size transform scenarios, while the transformation matrices applied in some conventional schemes may still be used in small-size transform scenarios.

Optionally, as an embodiment, the horizontal transformation matrix in the target transformation matrix pair is a DCT2' matrix, and when the height of the current block is greater than or equal to M points, the vertical transformation matrix in the target transformation matrix pair is a DCT2' FS matrix or a DCT2' F matrix; when the height of the current block is less than M points, the vertical transformation matrix in the target transformation matrix pair is a DST4 matrix, wherein M is a positive integer.

Optionally, as an embodiment, the vertical transform matrix in the target transform matrix pair is a DCT2' matrix, and when the width of the current block is greater than or equal to M points, the horizontal transform matrix in the target transform matrix pair is a DCT2' FS matrix or a DCT2' F matrix; when the width of the current block is less than M points, the horizontal transform matrix in the target transform matrix pair is a DST4 matrix, where M is a positive integer.

The above M may be 32.

When the above M is 32, if the transform size is 32 or more, the target transform matrix pair may be composed of a DCT2' matrix and a DCT2' FS matrix (or a DCT2' F matrix), and at this time, the correspondence between the target transform matrix pair index value and the candidate transform matrix pair may be as shown in table 4.

TABLE 4

When the above M is 32, if the transform size is less than 32, the candidate transform matrix pair may be composed of a DCT2' matrix and a DST4 matrix, and at this time, the correspondence between the target transform matrix pair index value and the candidate transform matrix pair may be as shown in table 5.

TABLE 5

Optionally, as an embodiment, the second transformation matrix of the two preset transformation matrices is derived from the DCT2 matrix.

In the present application, since the DCT2' FS matrix, the DCT2' F matrix, and the DST4 matrix can all be derived from the DCT2 matrix, the coefficients of these matrices do not need to be stored separately, which reduces the storage overhead.

In the present application, the DCT2' FS matrix, the DCT2' F matrix, and the DST4 matrix, whether 32-point or smaller, can all be derived from the 64-point DCT2 transform matrix.

For example, for a 32-point DCT2' FS matrix, a 32-point DCT2' matrix may first be obtained from the 32-point DCT2 matrix, and mirroring and sign transformation may then be applied to the 32-point DCT2' matrix to obtain the DCT2' FS matrix (a 32-point DCT2' F matrix is obtained by mirroring the DCT2' matrix only).

For example, for a 16-point (or 8-point, 4-point) DST4 matrix, a DCT4 matrix may be extracted from the 32-point (or 16-point, 8-point) DCT2 matrix, and operations such as mirroring and sign transformation may then be applied to it to obtain the DST4 matrix.
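The DST4 derivation can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses unnormalized floating-point cosine and sine basis functions rather than the scaled integer matrices of a codec, it takes the DCT4 block from the left half of the odd-indexed rows (0-based) of the 2N-point DCT2 matrix, and it applies the sign change to odd-indexed rows. The function names are hypothetical.

```python
import math


def dct2(n):
    """Unnormalized n-point DCT2: entry [k][j] = cos(pi * (2j + 1) * k / (2n))."""
    return [[math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]
            for k in range(n)]


def dct4_from_dct2(n):
    """Extract the n-point DCT4 from the 2n-point DCT2 (rows 1, 3, 5, ..., left half)."""
    big = dct2(2 * n)
    return [row[:n] for row in big[1::2]]


def dst4_from_dct4(dct4):
    """Mirror each row left to right, then negate the odd-indexed rows."""
    return [[(-1) ** k * value for value in reversed(row)]
            for k, row in enumerate(dct4)]


def dst4_reference(n):
    """Unnormalized n-point DST4: entry [k][j] = sin(pi * (2k + 1) * (2j + 1) / (4n))."""
    return [[math.sin(math.pi * (2 * k + 1) * (2 * j + 1) / (4 * n)) for j in range(n)]
            for k in range(n)]


def max_abs_diff(x, y):
    return max(abs(a - b) for rx, ry in zip(x, y) for a, b in zip(rx, ry))


for n in (4, 8, 16):
    derived = dst4_from_dct4(dct4_from_dct2(n))
    print(n, max_abs_diff(derived, dst4_reference(n)))
```

The differences printed are at floating-point rounding level, which is consistent with the statement that the DST4 coefficients need not be stored separately.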

In summary, when the transform size is greater than or equal to 32, the two preset transform matrices may be the DCT2' matrix and the DCT2' FS matrix (or the DCT2' F matrix); when the transform size is less than 32, the two preset transform matrices may be the DCT2' matrix and the DST4 matrix.

How to derive the second of the two transformation matrices from the DCT2 matrix is described below with reference to specific examples.

The following describes in detail the process of deriving the DCT2' FS matrix and the DCT2' F matrix from the DCT2 matrix, taking a 4 × 4 matrix as an example.

For example, the 4 × 4 DCT2 matrix is specifically as follows:

Then, by transposing the 4 × 4 DCT2 matrix, a 4 × 4 DCT2' matrix can first be obtained. The resulting 4 × 4 DCT2' matrix is specifically as follows:

After the 4 × 4 DCT2' matrix is obtained, the 4 × 4 DCT2' F matrix can be obtained by mirroring the coefficients of the 4 × 4 DCT2' matrix left to right. The 4 × 4 DCT2' F matrix is specifically as follows:

When the second of the two preset transform matrices is the DCT2' F matrix, it is obtained through the above derivation process.

When the second of the two preset transform matrices is the DCT2' FS matrix, the 4 × 4 DCT2' F matrix may be sign-transformed to obtain the 4 × 4 DCT2' FS matrix. The 4 × 4 DCT2' FS matrix is specifically as follows:
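The derivation chain above (transpose, left-right mirror, sign transformation) can be sketched in a few lines. Two assumptions are made here and are not taken from this text: the 4 × 4 DCT2 matrix is assumed to be the integer core transform matrix used in HEVC, and the sign transformation is assumed to negate every second row. The actual sign pattern may differ, so `negate_odd_rows` is only a placeholder.

```python
# Assumption: the 4 x 4 DCT2 matrix is the HEVC integer core transform matrix.
DCT2_4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def transpose(m):
    """DCT2 -> DCT2': swap rows and columns."""
    return [list(col) for col in zip(*m)]


def flip_left_right(m):
    """DCT2' -> DCT2' F: mirror every row left to right."""
    return [row[::-1] for row in m]


def negate_odd_rows(m):
    """DCT2' F -> DCT2' FS under the ASSUMED sign pattern (rows 1, 3, ... negated)."""
    return [[-v if r % 2 else v for v in row] for r, row in enumerate(m)]


dct2_t = transpose(DCT2_4)
dct2_t_f = flip_left_right(dct2_t)
dct2_t_fs = negate_odd_rows(dct2_t_f)

for name, matrix in (("DCT2'", dct2_t), ("DCT2' F", dct2_t_f), ("DCT2' FS", dct2_t_fs)):
    print(name)
    for row in matrix:
        print("  ", row)
```

Under these assumptions, the first row of DCT2' is 64, 83, 64, 36 and the first row of DCT2' F is 36, 64, 83, 64.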

To illustrate more clearly how using the DCT2' matrix and the DCT2' FS matrix as the target transform matrix pair in large-size scenarios, and the DCT2' matrix and the DST4 matrix in small-size scenarios, reduces the computational complexity of the transform, the complexity is analyzed below for the case where the target transform matrix pair is composed of a DCT2' matrix and a DST4 matrix at 4 to 16 points and of a DCT2' matrix and a DCT2' FS matrix at 32 points. The analysis results are shown in table 6.

TABLE 6

Transformation matrix	Fast algorithm (ODD matrix size)	Number of multiplications required for transformation
32-point DCT2' FS matrix	ODD16+ODD8+ODD4+ODD2	16x16+8x8+4x4+2x2
4-16 point DST4 matrix	Without fast algorithms, using matrix multiplication	16x16x16+8x8x8+4x4x4
32-point DCT2' matrix	ODD16+ODD8+ODD4+ODD2	16x16+8x8+4x4+2x2
16-point DCT2' matrix	ODD8+ODD4+ODD2	8x8+4x4+2x2
8-point DCT2' matrix	ODD4+ODD2	4x4+2x2
4-point DCT2' matrix	ODD2	2x2

As can be seen from table 6, for the 4-point to 16-point DST4 and DCT2' matrices and the 32-point DCT2' and DCT2' FS matrices, the matrix multiplication size (ODD matrix size) does not exceed 16 × 16, which greatly reduces the number of multiplications and thus the computational complexity.

In the present application, when the transform size is 32 points, the DCT2' matrix may be a 32 × 32 matrix, and the DCT2' F matrix may be a 32 × 32 matrix.

The 32 × 32 DCT2' matrix is specifically as follows:

As can be seen from the above example, the first-row coefficients of the DCT2' matrix are arranged from large to small (except for the first coefficient), which is close or similar to the ordering of the first-row coefficients of the DCT8 matrix and the DCT4 matrix. The first-row coefficients of the DCT2' F matrix are arranged from small to large (except for the last coefficient), which is close or similar to the ordering of the first-row coefficients of the DST7 matrix and the DST4 matrix. Therefore, in the embodiment of the present application, using transform matrices that are similar to those used in the existing scheme keeps the performance loss in the transform/inverse transform process small.

In the present application, when the target transform matrix pair is composed of a DCT2' matrix and a DCT2' FS matrix (or a DCT2' F matrix or a DST4 matrix), the transform matrix in the target transform matrix pair may be derived from the DCT2 matrix for any transform size.

The following describes in detail the process of deriving the 4 × 4 DCT2' matrix and the 4 × 4 DCT2' FS matrix from the 8 × 8 DCT2 matrix, taking a transform size of 4 × 4 as an example.

As shown in fig. 7, partial coefficients are extracted from the 8 × 8 DCT2 matrix (the coefficients of the odd rows in the left half of the 8 × 8 DCT2 matrix, forming a 4 × 4 block) to form a 4 × 4 matrix (i.e., the 4 × 4 DCT2 transform matrix), which is then transposed to obtain the 4 × 4 DCT2' matrix.

As shown in fig. 7, partial coefficients are extracted from the 8 × 8 DCT2 matrix (the coefficients of the even rows of the 8 × 8 DCT2 matrix, forming a 4 × 4 block) to form a 4 × 4 matrix, which is then transposed and sign-transformed (inverting the coefficients by rows or columns) to obtain the 4 × 4 DCT2' FS matrix.
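The extraction step can be illustrated with the following sketch. It assumes that the 8 × 8 DCT2 matrix is the HEVC integer core transform matrix, and that "odd rows" and "even rows" are counted from 1, so they correspond to 0-based row indices 0, 2, 4, 6 and 1, 3, 5, 7. The even-row block is taken from the left half as well, which is an assumption, and the sign transformation is not reproduced because the text above does not fix its pattern.

```python
# Assumption: the 8 x 8 DCT2 matrix is the HEVC integer core transform matrix.
DCT2_8 = [
    [64, 64, 64, 64, 64, 64, 64, 64],
    [89, 75, 50, 18, -18, -50, -75, -89],
    [83, 36, -36, -83, -83, -36, 36, 83],
    [75, -18, -89, -50, 50, 89, 18, -75],
    [64, -64, -64, 64, 64, -64, -64, 64],
    [50, -89, 18, 75, -75, -18, 89, -50],
    [36, -83, 83, -36, -36, 83, -83, 36],
    [18, -50, 75, -89, 89, -75, 50, -18],
]


def extract_left_half(matrix, first_row):
    """Take every second row starting at first_row, keeping only the left half."""
    half = len(matrix) // 2
    return [row[:half] for row in matrix[first_row::2]]


def transpose(m):
    return [list(col) for col in zip(*m)]


odd_rows_block = extract_left_half(DCT2_8, 0)   # rows 1, 3, 5, 7 counted from 1
even_rows_block = extract_left_half(DCT2_8, 1)  # rows 2, 4, 6, 8 counted from 1

dct2_t_4 = transpose(odd_rows_block)            # 4 x 4 DCT2' matrix

print(odd_rows_block)
print(dct2_t_4)
print(even_rows_block)
```

The odd-row block equals the 4 × 4 DCT2 matrix of the same integer family, which is why no separate 4 × 4 table needs to be stored.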

To analyze the effect of the present application, performance was tested for the case where the target transform matrix pair is composed of a DCT2' matrix and a DCT2' FS matrix (or a DCT2' F matrix) when the transform size is greater than or equal to 32, and of a DCT2' matrix and a DST4 matrix when the transform size is less than 32. The test results, compared with the existing scheme that uses the DCT8 matrix and the DST7 matrix, are shown in table 7 and table 8.

TABLE 7

Test video sequence	Y	U	V	EncT	DecT
Sequence A1	0.79%	1.14%	1.18%	95%	91%
Sequence A2	0.34%	0.33%	0.31%	95%	92%
Sequence B	0.19%	0.29%	0.30%	95%	93%
Sequence C	-0.23%	-0.09%	0.08%	97%	95%
Sequence E	0.16%	0.25%	0.17%	96%	96%
Complete sequence	0.22%	0.35%	0.38%	96%	93%
Sequence D	-0.31%	-0.07%	-0.42%	97%	97%

TABLE 8

Test video sequence	Y	U	V	EncT	DecT
Sequence A1	0.43%	0.69%	0.70%	99%	98%
Sequence A2	0.20%	0.23%	0.12%	99%	99%
Sequence B	0.18%	0.17%	0.38%	99%	98%
Sequence C	-0.06%	0.05%	0.07%	99%	98%
Complete sequence	0.17%	0.25%	0.31%	99%	98%
Sequence D	-0.11%	-0.26%	0.09%	99%	98%

Table 7 shows the test results obtained in the intra prediction mode, and table 8 shows the test results obtained in the random access mode (in which either intra prediction or inter prediction may be used). Y denotes the luminance component of the video image, and U and V denote the chrominance components. The values under Y, U, and V denote the percentage increase in coded bits at the same video image quality; a negative value denotes a decrease in coded bits. EncT and DecT denote the encoding time and the decoding time, respectively.

As can be seen from table 7, compared with the existing scheme, the present application increases the number of coded bits by 0.22% (looking mainly at the luminance component Y), while reducing the encoding time by 4% and the decoding time by 7%. As can be seen from table 8, compared with the existing scheme, the present application increases the number of coded bits by 0.17% (looking mainly at the luminance component Y), while reducing the encoding time by 1% and the decoding time by 2%.

As can be seen from tables 7 and 8, compared with the existing scheme, the present invention can simplify the calculation process, reduce the encoding and decoding time, and improve the encoding and decoding efficiency.

As can also be seen from tables 7 and 8, the number of coded bits increases more for sequences A1 and A2. To improve performance, the DST7 matrix or the DST4 matrix may therefore be used as a transform matrix in the target transform matrix pair.

Optionally, as an embodiment, the horizontal transformation matrix in the target transformation matrix pair is a DCT2' matrix, and when the height of the current block is greater than or equal to N points, the vertical transformation matrix in the target transformation matrix pair is a DST7 matrix; when the height of the current block is less than N points, the vertical transformation matrix in the target transformation matrix pair is a DST4 matrix, where N is a positive integer.

Optionally, as an embodiment, the vertical transform matrix in the target transform matrix pair is a DCT2' matrix, and when the width of the current block is greater than or equal to N points, the horizontal transform matrix in the target transform matrix pair is a DST7 matrix; when the width of the current block is less than N points, the horizontal transformation matrix in the target transformation matrix pair is a DST4 matrix, where N is a positive integer.

Optionally, N is 16.

Since the leading coefficients of the DST7 matrix change with a gentler slope than those of the DST4 matrix, better coding and decoding performance can be obtained by applying the DST7 matrix in large-size transform scenarios and the DST4 matrix in small-size transform scenarios.
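The two size-dependent schemes described so far (threshold M = 32 with a DCT2' FS matrix, and threshold N = 16 with a DST7 matrix) follow the same selection rule, which the sketch below makes explicit. The scheme names `"fast"` and `"dst7"` and the function name are hypothetical labels introduced only for this illustration.

```python
# Hypothetical scheme labels; thresholds and matrix names follow the text.
SCHEMES = {
    "fast": {"threshold": 32, "large": "DCT2'FS", "small": "DST4"},  # M = 32
    "dst7": {"threshold": 16, "large": "DST7", "small": "DST4"},     # N = 16
}


def preset_matrices(size, scheme):
    """Return the two preset matrices for a given transform size (in points)."""
    config = SCHEMES[scheme]
    second = config["large"] if size >= config["threshold"] else config["small"]
    return ("DCT2'", second)


for size in (4, 8, 16, 32):
    print(size, preset_matrices(size, "fast"), preset_matrices(size, "dst7"))
```

In both schemes the first preset matrix stays DCT2', and only the second matrix changes with the block width or height.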

When the transform size of the current block is greater than or equal to 16 points, the target transform matrix pair may be composed of a DCT2' matrix and a DST7 matrix, and at this time, the correspondence between the target transform matrix pair index value and the candidate transform matrix pair may be as shown in table 9.

TABLE 9

When the transform size of the current block is less than 16 points, the target transform matrix pair may be composed of a DCT2' matrix and a DST4 matrix, and at this time, the correspondence between the target transform matrix pair index value and the candidate transform matrix pair may be as shown in table 10.

TABLE 10

The 4 × 4 DST4 transform matrix and the 4 × 4 DST7 transform matrix may be specifically as follows:

4 × 4 DST4 transform matrix:

4 × 4 DST7 transform matrix:

To analyze the effect of the present application, the performance of the transform/inverse transform was tested for the case where the target transform matrix pair is composed of a DCT2' matrix and a DST7 matrix when the transform size is greater than or equal to 16, and of a DCT2' matrix and a DST4 matrix when the transform size is less than 16. The test results, compared with the existing scheme that uses the DCT8 matrix and the DST7 matrix, are shown in table 11 and table 12.

TABLE 11

Test video sequence	Y	U	V	EncT	DecT
Sequence A1	0.10%	0.44%	-0.06%	99%	99%
Sequence A2	-0.15%	-0.56%	-0.15%	98%	99%
Sequence B	-0.33%	0.11%	-0.53%	98%	102%
Sequence C	-0.36%	-0.13%	0.08%	101%	102%
Sequence E	-0.27%	-0.25%	-0.24%	100%	98%
Complete sequence	-0.22%	-0.06%	-0.20%	99%	100%
Sequence D	-0.28%	-1.44%	0.03%	101%	100%

TABLE 12

Test video sequence	Y	U	V	EncT	DecT
Sequence A1	0.00%	-0.18%	0.33%	100%	101%
Sequence A2	-0.11%	-0.09%	-0.12%	100%	98%
Sequence B	-0.21%	-0.30%	-0.26%	99%	99%
Sequence C	-0.17%	0.39%	0.12%	100%	95%
Complete sequence	-0.13%	-0.05%	-0.01%	100%	98%
Sequence D	-0.11%	-0.10%	1.19%	100%	102%

Table 11 shows the test results obtained under the intra prediction mode test condition, and table 12 shows the test results obtained under the random access mode test condition (in which either intra prediction or inter prediction may be used). Y denotes the luminance component of the video image, and U and V denote the chrominance components. The values under Y, U, and V denote the percentage increase in coded bits at the same video image quality; a negative value denotes a decrease in coded bits. EncT and DecT denote the encoding time and the decoding time, respectively.

As can be seen from table 11, compared with the existing scheme, the present application reduces the number of coded bits by 0.22% (looking mainly at the luminance component Y), which is equivalent to a performance improvement of 0.22%, and reduces the encoding time by 1%. As can be seen from table 12, compared with the existing scheme, the present application reduces the number of coded bits by 0.13% (looking mainly at the luminance component Y), which is equivalent to a performance improvement of 0.13%, and reduces the decoding time by 2%.

As can be seen from tables 11 and 12, compared with the existing scheme, the present application can simplify the calculation process, reduce the encoding and decoding time, and improve the encoding and decoding performance.

Optionally, as an embodiment, the method shown in fig. 6 further includes: parsing the code stream to obtain a multi-core transform flag. Determining the target transform matrix pair from the candidate transform matrix pairs according to the target transform matrix pair index value and the correspondence between the target transform matrix pair index value and the candidate transform matrix pairs then includes: when the value of the multi-core transform flag is a first value, determining the target transform matrix pair from the candidate transform matrix pairs according to the target transform matrix pair index value and that correspondence; and when the value of the multi-core transform flag is a second value, determining the DCT2 matrix as the target transform matrix pair.

Here, determining the DCT2 matrix as the target transform matrix pair may refer to taking the DCT2 matrix as a transform matrix in the horizontal direction and a transform matrix in the vertical direction in the target transform matrix pair.

The multi-core transform flag may specifically be MTS_flag. When MTS_flag is the first value, multi-core transform is performed; when MTS_flag is the second value, multi-core transform is not performed.

The first value and the second value may be 1 and 0, respectively, or may be 0 and 1, respectively.
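The flag-controlled selection can be sketched as follows. This is a minimal illustration: the first value is assumed to be 1, and `CANDIDATE_PAIRS` is a hypothetical stand-in for the index-to-pair correspondence (its entries, the (horizontal, vertical) ordering, and the label `SECOND` for the second preset matrix are placeholders, not the contents of any table in this document).

```python
FIRST_VALUE = 1  # Assumption: the first value of MTS_flag is 1, the second is 0.

# Hypothetical correspondence between index values and (horizontal, vertical) pairs.
# "SECOND" stands for whichever second preset matrix the embodiment uses.
CANDIDATE_PAIRS = {
    0: ("DCT2'", "DCT2'"),
    1: ("SECOND", "DCT2'"),
    2: ("DCT2'", "SECOND"),
    3: ("SECOND", "SECOND"),
}


def select_target_pair(mts_flag, pair_index=None):
    """Pick the target pair from the parsed flag and, if present, the index value."""
    if mts_flag == FIRST_VALUE:
        # Multi-core transform is used: look the pair up by its index value.
        return CANDIDATE_PAIRS[pair_index]
    # Multi-core transform is not used: DCT2 in both directions.
    return ("DCT2", "DCT2")


print(select_target_pair(0))
print(select_target_pair(1, 2))
```

When the flag takes the second value, no index lookup is needed, so `pair_index` is left optional in this sketch.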

The video decoding method according to the embodiment of the present application has been described in detail above with reference to fig. 6. The video encoding method according to the embodiment of the present application is described below with reference to fig. 8. It should be understood that the video encoding method shown in fig. 8 corresponds to the video decoding method shown in fig. 6 (the code stream finally obtained by the video encoding method shown in fig. 8 can be processed by the video decoding method shown in fig. 6). To avoid unnecessary repetition, repeated descriptions are appropriately omitted below.

Fig. 8 is a schematic flow chart of a video encoding method of an embodiment of the present application. The method shown in fig. 8 may be executed by an encoding end device and includes steps 2001 to 2005, which are described below in turn.

2001. Acquire a residual block of the image block to be processed.

The residual block may be obtained by subtracting the prediction block from the image block to be processed (as shown in fig. 1), and an inter prediction mode or an intra prediction mode may be adopted in the process of obtaining the prediction block of the image block to be processed.

2002. Obtain the candidate transformation matrix pairs of the residual block according to preset mapping relation information.

Each candidate transformation matrix pair comprises a horizontal direction transformation matrix and a vertical direction transformation matrix, each of which is one of two preset transformation matrices. The first of the two preset transformation matrices is a DCT2' matrix, and the DCT2' matrix is the transposed matrix of the DCT2 matrix.

The mapping relationship information may include a target transformation matrix pair index value and a transformation matrix pair corresponding to the target index value, and the mapping relationship information may be pre-stored at the encoding end and the decoding end.

The correspondence (which may also be referred to as a mapping) between the target transform matrix pair index value and the candidate transform matrix pair may be as shown in table 13, where DCT2' is the first of the two preset transform matrices and transform matrix A is the second of the two preset transform matrices.

TABLE 13

According to the mapping relationship information shown in table 13, the candidate transformation matrix pairs are composed of the transformation matrix A and the DCT2' matrix.

2003. Select the transformation matrix pair with the minimum rate-distortion cost from the candidate transformation matrix pairs as the target transformation matrix pair.

For example, the encoding end may try 4 transform matrix pairs in table 13 during the transformation process, calculate the rate-distortion cost corresponding to each transform matrix pair, and then select the transform matrix pair with the smallest rate-distortion cost as the target transform matrix pair.

2004. Transform the residual block according to the target transformation matrix pair to obtain the transformation coefficients of the image block to be processed.

Specifically, assuming that the target transform matrix pair is (A, B) and the residual block is R, the transform (i.e., matrix multiplication) may be performed according to equation (2) to obtain the transform coefficients F of the residual block, where A is the transform matrix in the horizontal direction and B is the transform matrix in the vertical direction:

F = B * R * A' (2)

where A' denotes the transpose of matrix A. Since A is an orthogonal matrix, transposing A is equivalent to inverting A.
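Equation (2) and its inverse can be checked with a small round trip. The sketch below is an illustration under stated assumptions: both A and B are taken to be orthonormal floating-point DCT2 matrices as stand-ins (not the DCT2' or second preset matrices of this application), and the inverse R = B' * F * A follows from the orthogonality noted above. The helper names are hypothetical.

```python
import math


def orthonormal_dct2(n):
    """Orthonormal n-point DCT2, used here only as a stand-in orthogonal matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_transform(residual, a, b):
    """Equation (2): F = B * R * A'."""
    return matmul(matmul(b, residual), transpose(a))


def inverse_transform(coeffs, a, b):
    """Inverse for orthogonal A and B: R = B' * F * A."""
    return matmul(matmul(transpose(b), coeffs), a)


# A 4-row by 8-column residual block: B acts on the height, A on the width.
residual = [[float((3 * r + 5 * c) % 7 - 3) for c in range(8)] for r in range(4)]
A = orthonormal_dct2(8)  # horizontal transform matrix
B = orthonormal_dct2(4)  # vertical transform matrix

coeffs = forward_transform(residual, A, B)
restored = inverse_transform(coeffs, A, B)
error = max(abs(x - y) for rx, ry in zip(residual, restored) for x, y in zip(rx, ry))
print(error)
```

The reconstruction error stays at floating-point rounding level; in a real codec, quantization between the two steps makes the reconstruction lossy.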

2005. Write the target transformation matrix pair index value corresponding to the target transformation matrix pair into the code stream.

For example, if calculating the rate-distortion cost of each candidate transformation matrix pair shown in table 13 shows that the pair (DCT2' matrix, transformation matrix A) has the lowest rate-distortion cost, that pair can be used as the target transformation matrix pair. The residual block may then be transformed according to this pair, and the target transformation matrix pair index value corresponding to (DCT2' matrix, transformation matrix A), specifically 2, is written into the code stream. After parsing the code stream, the decoding end can then determine the target transformation matrix pair according to the index value 2 and the pre-stored correspondence between target transformation matrix pair index values and candidate transformation matrix pairs.
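The encoder-side selection in steps 2003 to 2005 can be sketched as a simple search over the candidate pairs. Everything specific in this sketch is an assumption made for illustration: the cost is a proxy (sum of absolute transform coefficients) standing in for a true rate-distortion cost, and the two toy candidate pairs (identity and a 2-point Hadamard-type matrix) stand in for the pairs of table 13.

```python
import math


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_transform(residual, a, b):
    """F = B * R * A', with A horizontal and B vertical."""
    return matmul(matmul(b, residual), transpose(a))


def proxy_cost(coeffs):
    """ASSUMED stand-in for rate-distortion cost: sum of absolute coefficients."""
    return sum(abs(v) for row in coeffs for v in row)


def choose_pair_index(residual, candidates, cost=proxy_cost):
    """Try every candidate pair and return the index value with the lowest cost."""
    return min(candidates,
               key=lambda index: cost(forward_transform(residual, *candidates[index])))


S = 1 / math.sqrt(2)
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
HADAMARD = [[S, S], [S, -S]]

# Toy candidates: index value -> (horizontal matrix A, vertical matrix B).
candidates = {0: (IDENTITY, IDENTITY), 1: (HADAMARD, HADAMARD)}
residual = [[4.0, 4.0], [4.0, 4.0]]

index_to_signal = choose_pair_index(residual, candidates)
print(index_to_signal)
```

For this flat residual the Hadamard-type pair concentrates the energy into one coefficient, so its index value is the one that would be written into the code stream.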

Alternatively, as an embodiment, the DCT2' matrix is derived from the DCT2 matrix.

Optionally, the mirror image is a left-right flip mirror image.

Optionally, as an embodiment, the horizontal transformation matrix in the target transformation matrix pair is a DCT2' matrix, and when the height of the residual block is greater than or equal to M points, the vertical transformation matrix in the target transformation matrix pair is a DCT2' FS matrix or a DCT2' F matrix; when the height of the residual block is less than M points, the vertical transformation matrix in the target transformation matrix pair is a DST4 matrix, where M is a positive integer.

Optionally, as an embodiment, the vertical transform matrix in the target transform matrix pair is a DCT2' matrix, and when the width of the residual block is greater than or equal to M points, the horizontal transform matrix in the target transform matrix pair is a DCT2' FS matrix or a DCT2' F matrix; when the width of the residual block is smaller than M points, the horizontal transformation matrix in the target transformation matrix pair is a DST4 matrix, where M is a positive integer.

Optionally, M is 32.

Optionally, as an embodiment, the second transformation matrix of the two preset transformation matrices is derived from a DCT2 matrix.

Optionally, as an embodiment, the horizontal transformation matrix in the target transformation matrix pair is a DCT2' matrix, and when the height of the residual block is greater than or equal to N points, the vertical transformation matrix in the target transformation matrix pair is a DST7 matrix; when the height of the residual block is less than N points, the vertical transformation matrix in the target transformation matrix pair is a DST4 matrix, wherein N is a positive integer.

Optionally, as an embodiment, the vertical transform matrix in the target transform matrix pair is a DCT2' matrix, and when the width of the residual block is greater than or equal to N points, the horizontal transform matrix in the target transform matrix pair is a DST7 matrix; when the width of the residual block is smaller than N points, the horizontal transformation matrix in the target transformation matrix pair is a DST4 matrix, where N is a positive integer.

Optionally, N is 16.

For a more detailed understanding of the transformation matrices in the present application, the basic structure of the transformation kernels corresponding to commonly used transformation matrices (from which the corresponding transformation matrices can be derived) is described below. The basis functions of these transformation kernels are shown in table 14.

TABLE 14

In order to make clear that the implementation circuit of the transform matrix in the embodiment of the present application may multiplex a transform/inverse transform implementation circuit corresponding to the 2Nx2N DCT2 matrix, the circuit multiplexing is specifically described as follows.

Fig. 9 illustrates a butterfly fast algorithm circuit implementation of the 16 × 16 DCT2 matrix in HEVC. As can be seen from fig. 9, the butterfly fast algorithm circuit of the 16 × 16 DCT2 matrix contains the implementation circuits of the 4 × 4 DCT2 matrix, the 8 × 8 DCT2 matrix, the 4 × 4 DCT4 matrix, and the 8 × 8 DCT4 matrix; that is, the circuit implementation of the 16 × 16 DCT2 matrix can be directly multiplexed when implementing the transforms of these four matrices. However, only the 4 × 4 DCT2 matrix and the 8 × 8 DCT2 matrix can multiplex the butterfly fast algorithm circuit; the 4 × 4 DCT4 matrix and the 8 × 8 DCT4 matrix can multiplex the implementation circuit of the 16 × 16 DCT2 matrix but do not use the butterfly fast algorithm.

In addition, a partial butterfly method for fast implementation of the inverse transform circuit is disclosed in "Core Transform Design in the High Efficiency Video Coding (HEVC) Standard". The inverse DCT2 transform can be implemented by decomposing it into three modules, EVEN, ODD, and ADDSUB, where EVEN denotes the column transform using the matrix composed of the odd-row coefficients of the DCT2 matrix, ODD denotes the column transform using the matrix composed of the even-row coefficients of the DCT2 matrix, and ADDSUB denotes the addition and subtraction module.

For example, fig. 10 depicts a 32 × 32 inverse transform implementation circuit, in which the EVEN4 module, the ODD4 module, and the ADDSUB4 module constitute the inverse transform implementation circuit 701 of the 4 × 4 matrix; the inverse transform implementation circuit 701 of the 4 × 4 matrix, the ODD8 module, and the ADDSUB8 module constitute the inverse transform implementation circuit 702 of the 8 × 8 matrix; the inverse transform implementation circuit 702 of the 8 × 8 matrix, the ODD16 module, and the ADDSUB16 module constitute the inverse transform implementation circuit 703 of the 16 × 16 matrix; and the inverse transform implementation circuit 703 of the 16 × 16 matrix, the ODD32 module, and the ADDSUB32 module constitute the inverse transform implementation circuit 704 of the 32 × 32 matrix.
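The nesting of EVEN, ODD, and ADDSUB modules can be sketched in software for a one-dimensional 8-point inverse DCT2. This is only an illustration, not the circuit of fig. 10: it assumes the HEVC integer 8 × 8 DCT2 matrix, omits the rounding and shifting stages of a real codec, and uses hypothetical function names. The recursion mirrors the structure described above, with the EVEN part being the next smaller inverse transform.

```python
# Assumption: the 8 x 8 DCT2 matrix is the HEVC integer core transform matrix.
DCT2_8 = [
    [64, 64, 64, 64, 64, 64, 64, 64],
    [89, 75, 50, 18, -18, -50, -75, -89],
    [83, 36, -36, -83, -83, -36, 36, 83],
    [75, -18, -89, -50, 50, 89, 18, -75],
    [64, -64, -64, 64, 64, -64, -64, 64],
    [50, -89, 18, 75, -75, -18, 89, -50],
    [36, -83, 83, -36, -36, 83, -83, 36],
    [18, -50, 75, -89, 89, -75, 50, -18],
]


def inverse_direct(coeffs, m):
    """Reference: full matrix product, out[j] = sum over k of coeffs[k] * m[k][j]."""
    n = len(m)
    return [sum(coeffs[k] * m[k][j] for k in range(n)) for j in range(n)]


def inverse_partial_butterfly(coeffs, m):
    """Inverse DCT2 via EVEN (recursive), ODD (small matrix product), and ADDSUB."""
    n = len(m)
    if n == 1:
        return [coeffs[0] * m[0][0]]
    half = n // 2
    # EVEN: rows 0, 2, 4, ... (left half) form the embedded half-size DCT2 matrix.
    even_matrix = [row[:half] for row in m[0::2]]
    even = inverse_partial_butterfly(coeffs[0::2], even_matrix)
    # ODD: rows 1, 3, 5, ... (left half), applied as a half x half matrix product.
    odd_matrix = [row[:half] for row in m[1::2]]
    odd = [sum(coeffs[2 * k + 1] * odd_matrix[k][j] for k in range(half))
           for j in range(half)]
    # ADDSUB: combine using the symmetry of even rows and antisymmetry of odd rows.
    out = [0] * n
    for j in range(half):
        out[j] = even[j] + odd[j]
        out[n - 1 - j] = even[j] - odd[j]
    return out


coeffs = [10, -3, 5, 0, 2, 7, -1, 4]
print(inverse_partial_butterfly(coeffs, DCT2_8))
print(inverse_direct(coeffs, DCT2_8))
```

Both calls print the same eight values, while the butterfly version multiplies only with the 4 × 4, 2 × 2, and 1 × 1 ODD blocks plus a single EVEN coefficient instead of the full 8 × 8 matrix.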

The video decoding method and the video encoding method of the embodiments of the present application have been described in detail above with reference to fig. 1 to 10. The video decoder of the embodiment of the present application is described below with reference to fig. 11. The video decoder shown in fig. 11 is capable of executing the steps of the video decoding method of the embodiment of the present application, and the above limitations regarding the video decoding method also apply to the video decoder shown in fig. 11. To avoid unnecessary repetition, repeated descriptions are appropriately omitted when describing the video decoder below.

Fig. 11 is a schematic block diagram of a video decoder of an embodiment of the present application. The video decoder 300 shown in fig. 11 includes:

an entropy decoding unit 310, configured to parse the code stream to obtain a target transform matrix pair index value of a current block subjected to inverse transform processing and a quantization coefficient of the current block;

an inverse quantization unit 320, configured to perform inverse quantization on the quantized coefficient of the current block to obtain an inverse quantized coefficient of the current block;

an inverse transform processing unit 330, configured to determine a target transform matrix pair from the candidate transform matrix pairs according to the target transform matrix pair index value and the correspondence between the target transform matrix pair index value and the candidate transform matrix pairs, where each candidate transform matrix pair includes a horizontal direction transform matrix and a vertical direction transform matrix, each of which is one of two preset transform matrices, the first of the two preset transform matrices is a DCT2' matrix, and the DCT2' matrix is the transposed matrix of the DCT2 matrix;

the inverse transform processing unit 330 is further configured to perform inverse transform processing on the inverse quantization coefficient of the current block according to the target transform matrix pair to obtain a reconstructed residual block of the current block;

a reconstructing unit 340, configured to obtain a reconstructed block of the current block according to the reconstructed residual block of the current block.

Optionally, as an embodiment, the second transformation matrix of the two preset transformation matrices is a DCT2' FS matrix or a DCT2' F matrix.

Optionally, as an embodiment, the horizontal transformation matrix in the target transformation matrix pair is a DCT2' matrix, and the vertical transformation matrix in the target transformation matrix pair is a DST7 matrix when the height of the current block is greater than or equal to N points; when the height of the current block is less than N points, the vertical transformation matrix in the target transformation matrix pair is a DST4 matrix, wherein N is a positive integer.

Optionally, as an embodiment, the vertical transform matrix in the target transform matrix pair is a DCT2' matrix, and the horizontal transform matrix in the target transform matrix pair is a DST7 matrix when the width of the current block is greater than or equal to N points; when the width of the current block is less than N points, the horizontal transformation matrix in the target transformation matrix pair is a DST4 matrix, wherein N is a positive integer.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A video decoding method, comprising:

parsing the code stream to obtain a target transformation matrix pair index value of a current block to be subjected to inverse transformation processing and a quantization coefficient of the current block;

performing inverse quantization processing on the quantization coefficient of the current block to obtain an inverse quantization coefficient of the current block;

determining a target transformation matrix pair from candidate transformation matrix pairs according to the target transformation matrix pair index value and a corresponding relation between the target transformation matrix pair index value and the candidate transformation matrix pairs, wherein each candidate transformation matrix pair comprises a horizontal direction transformation matrix and a vertical direction transformation matrix, the horizontal direction transformation matrix and the vertical direction transformation matrix are each one of two preset transformation matrices, a first transformation matrix of the two preset transformation matrices is a DCT2' matrix, the DCT2' matrix is a transposed matrix of a DCT2 matrix, and a second transformation matrix of the two preset transformation matrices is derived according to the DCT2 matrix;

performing inverse transformation processing on the inverse quantization coefficient of the current block according to the target transformation matrix pair to obtain a reconstructed residual block of the current block;

and obtaining a reconstructed block of the current block according to the reconstructed residual block of the current block.

2. The method of claim 1, wherein the second transformation matrix of the two preset transformation matrices is a DCT2' FS matrix or a DCT2' F matrix, wherein F in the DCT2' FS matrix and the DCT2' F matrix represents mirroring, S in the DCT2' FS matrix represents sign transformation, the DCT2' F matrix is a matrix obtained by mirroring the DCT2' matrix, and the DCT2' FS matrix is a matrix obtained by first mirroring the DCT2' matrix and then performing sign transformation on the mirrored matrix.

3. The method of claim 1 or 2, wherein a horizontal transform matrix in the target transform matrix pair is a DCT2' matrix; when the height of the current block is greater than or equal to M points, a vertical transform matrix in the target transform matrix pair is a DCT2' FS matrix or a DCT2' F matrix; when the height of the current block is smaller than M points, the vertical transform matrix in the target transform matrix pair is a DST4 matrix, wherein M is a positive integer, F in the DCT2' FS matrix and the DCT2' F matrix represents mirroring, S in the DCT2' FS matrix represents sign transformation, the DCT2' F matrix is a matrix obtained by mirroring the DCT2' matrix, and the DCT2' FS matrix is a matrix obtained by first mirroring the DCT2' matrix and then performing sign transformation on the mirrored matrix.

4. The method of claim 1 or 2, wherein a vertical transform matrix in the target transform matrix pair is a DCT2' matrix; when the width of the current block is greater than or equal to M points, a horizontal transform matrix in the target transform matrix pair is a DCT2' FS matrix or a DCT2' F matrix; when the width of the current block is smaller than M points, the horizontal transform matrix in the target transform matrix pair is a DST4 matrix, wherein M is a positive integer, F in the DCT2' FS matrix and the DCT2' F matrix represents mirroring, S in the DCT2' FS matrix represents sign transformation, the DCT2' F matrix is a matrix obtained by mirroring the DCT2' matrix, and the DCT2' FS matrix is a matrix obtained by first mirroring the DCT2' matrix and then performing sign transformation on the mirrored matrix.

5. The method of claim 1, wherein a horizontal transform matrix in the target transform matrix pair is a DCT2' matrix; when the height of the current block is greater than or equal to N points, a vertical transform matrix in the target transform matrix pair is a DST7 matrix; when the height of the current block is less than N points, the vertical transform matrix in the target transform matrix pair is a DST4 matrix, wherein N is a positive integer.

6. The method of claim 1, wherein a vertical transform matrix in the target transform matrix pair is a DCT2' matrix; when the width of the current block is greater than or equal to N points, a horizontal transform matrix in the target transform matrix pair is a DST7 matrix; when the width of the current block is less than N points, the horizontal transform matrix in the target transform matrix pair is a DST4 matrix, wherein N is a positive integer.

7. A video decoder, comprising:

an entropy decoding unit, configured to parse the code stream to obtain a target transformation matrix pair index value of a current block to be subjected to inverse transformation processing and a quantization coefficient of the current block;

an inverse quantization unit, configured to perform inverse quantization processing on the quantization coefficient of the current block to obtain an inverse quantization coefficient of the current block;

an inverse transformation processing unit, configured to determine a target transform matrix pair from candidate transform matrix pairs according to the target transform matrix pair index value and a corresponding relationship between the target transform matrix pair index value and the candidate transform matrix pairs, where each candidate transform matrix pair includes a horizontal direction transform matrix and a vertical direction transform matrix, the horizontal direction transform matrix and the vertical direction transform matrix are each one of two preset transform matrices, a first transform matrix of the two preset transform matrices is a DCT2' matrix, the DCT2' matrix is a transposed matrix of a DCT2 matrix, and a second transform matrix of the two preset transform matrices is derived according to the DCT2 matrix;

the inverse transformation processing unit is further configured to perform inverse transformation processing on the inverse quantization coefficient of the current block according to the target transform matrix pair to obtain a reconstructed residual block of the current block;

and a reconstruction unit, configured to obtain a reconstructed block of the current block according to the reconstructed residual block of the current block.

8. The video decoder of claim 7, wherein the second transform matrix of the two preset transform matrices is a DCT2' FS matrix or a DCT2' F matrix, wherein F in the DCT2' FS matrix and the DCT2' F matrix represents mirroring, S in the DCT2' FS matrix represents sign transformation, the DCT2' F matrix is a matrix obtained by mirroring the DCT2' matrix, and the DCT2' FS matrix is a matrix obtained by first mirroring the DCT2' matrix and then performing sign transformation on the mirrored matrix.

9. The video decoder of claim 7 or 8, wherein a horizontal transform matrix in the target transform matrix pair is a DCT2' matrix; when the height of the current block is greater than or equal to M points, a vertical transform matrix in the target transform matrix pair is a DCT2' FS matrix or a DCT2' F matrix; when the height of the current block is smaller than M points, the vertical transform matrix in the target transform matrix pair is a DST4 matrix, wherein M is a positive integer, F in the DCT2' FS matrix and the DCT2' F matrix represents mirroring, S in the DCT2' FS matrix represents sign transformation, the DCT2' F matrix is a matrix obtained by mirroring the DCT2' matrix, and the DCT2' FS matrix is a matrix obtained by first mirroring the DCT2' matrix and then performing sign transformation on the mirrored matrix.

10. The video decoder of claim 7 or 8, wherein a vertical transform matrix in the target transform matrix pair is a DCT2' matrix; when the width of the current block is greater than or equal to M points, a horizontal transform matrix in the target transform matrix pair is a DCT2' FS matrix or a DCT2' F matrix; when the width of the current block is smaller than M points, the horizontal transform matrix in the target transform matrix pair is a DST4 matrix, wherein M is a positive integer, F in the DCT2' FS matrix and the DCT2' F matrix represents mirroring, S in the DCT2' FS matrix represents sign transformation, the DCT2' F matrix is a matrix obtained by mirroring the DCT2' matrix, and the DCT2' FS matrix is a matrix obtained by first mirroring the DCT2' matrix and then performing sign transformation on the mirrored matrix.

11. The video decoder of claim 7, wherein a horizontal transform matrix in the target transform matrix pair is a DCT2' matrix, and a vertical transform matrix in the target transform matrix pair is a DST7 matrix when the height of the current block is greater than or equal to N points; when the height of the current block is less than N points, the vertical transformation matrix in the target transformation matrix pair is a DST4 matrix, wherein N is a positive integer.

12. The video decoder of claim 7, wherein a vertical transform matrix in the target transform matrix pair is a DCT2' matrix, and wherein a horizontal transform matrix in the target transform matrix pair is a DST7 matrix when the width of the current block is greater than or equal to N points; when the width of the current block is less than N points, the horizontal transformation matrix in the target transformation matrix pair is a DST4 matrix, wherein N is a positive integer.
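
The decoding steps recited in claim 1 (inverse quantization, lookup of the target matrix pair by index, separable inverse transformation, and reconstruction) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The names `decode_block`, `CANDIDATE_PAIRS`, and `second_matrix` are hypothetical. The index-to-pair table, the uniform scalar inverse quantization with a single `step`, the orthonormal floating-point DCT2 matrix, and the use of a column-reversed DCT2' matrix as a stand-in for the second preset matrix are all assumptions chosen only for illustration; in particular, column reversal is just one possible reading of mirroring. Code stream parsing is omitted, so the index and quantization coefficients are taken as already parsed.

```python
import math


def dct2(size):
    """Orthonormal floating-point DCT-II matrix; row k is the k-th basis function."""
    rows = []
    for k in range(size):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / size)
        rows.append([scale * math.cos(math.pi * (2 * n + 1) * k / (2 * size))
                     for n in range(size)])
    return rows


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def dct2_prime(size):
    """DCT2' matrix: the transposed matrix of the DCT2 matrix."""
    return transpose(dct2(size))


def second_matrix(size):
    """ASSUMPTION: stand-in for the second preset matrix (column-reversed DCT2')."""
    return [row[::-1] for row in dct2_prime(size)]


# ASSUMPTION: hypothetical table, index -> (horizontal builder, vertical builder).
CANDIDATE_PAIRS = {
    0: (dct2_prime, dct2_prime),
    1: (second_matrix, dct2_prime),
    2: (dct2_prime, second_matrix),
    3: (second_matrix, second_matrix),
}


def decode_block(pair_index, quantized, prediction, step):
    """Reconstruct one block from parsed data (all inputs are lists of rows)."""
    height, width = len(quantized), len(quantized[0])
    # Inverse quantization (assumed here to be uniform scalar scaling).
    dequantized = [[q * step for q in row] for row in quantized]
    # Determine the target matrix pair from the candidates by index.
    make_horizontal, make_vertical = CANDIDATE_PAIRS[pair_index]
    horizontal, vertical = make_horizontal(width), make_vertical(height)
    # Separable inverse transform: the vertical matrix acts on columns,
    # the horizontal matrix acts on rows.
    residual = matmul(matmul(vertical, dequantized), transpose(horizontal))
    # Reconstruction: prediction plus reconstructed residual.
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]
```

In this sketch, a 4x4 block whose only nonzero quantization coefficient is 8 at the DC position, decoded with an assumed step of 2 and index 0, yields a residual of 4 at every sample, which is then added to the prediction.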