WO2024007116A1 - Decoding method, encoding method, decoder, and encoder - Google Patents

Decoding method, encoding method, decoder, and encoder

Info

Publication number
WO2024007116A1
WO2024007116A1, PCT/CN2022/103654, CN2022103654W
Authority
WO
WIPO (PCT)
Prior art keywords
mode
mip
intra prediction
prediction mode
current block
Prior art date
Application number
PCT/CN2022/103654
Other languages
English (en)
French (fr)
Inventor
谢志煌
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Priority to PCT/CN2022/103654 (WO2024007116A1)
Priority to TW112123269 (TW202404370)
Publication of WO2024007116A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/567Motion estimation based on rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Definitions

  • the embodiments of the present application relate to the technical field of image and video encoding and decoding, and more specifically, to a decoding method, an encoding method, a decoder, and an encoder.
  • Digital video compression technology mainly compresses huge digital image and video data to facilitate transmission and storage.
  • Although digital video compression standards can implement video compression and decompression, there is still a need to pursue better digital video compression technology to improve compression efficiency.
  • Embodiments of the present application provide a decoding method, encoding method, decoder and encoder, which can improve compression efficiency.
  • In a first aspect, embodiments of the present application provide a decoding method, including:
  • the first intra prediction mode includes any one of the following: an intra prediction mode derived by using the decoder-side intra mode derivation (DIMD) mode on the prediction block of the current block, an intra prediction mode derived by using the DIMD mode on the output vector of the optimal matrix-based intra prediction (MIP) mode of the current block, an intra prediction mode derived by using the DIMD mode on the reconstructed samples within the first template region adjacent to the current block, or an intra prediction mode derived by the template-based intra mode derivation (TIMD) mode;
  • a reconstructed block of the current block is determined based on the prediction block of the current block and the residual block of the current block.
  • In a second aspect, embodiments of the present application provide an encoding method, including:
  • the first intra prediction mode includes any one of the following: an intra prediction mode derived by using the decoder-side intra mode derivation (DIMD) mode on the prediction block of the current block, an intra prediction mode derived by using the DIMD mode on the output vector of the optimal matrix-based intra prediction (MIP) mode of the current block, an intra prediction mode derived by using the DIMD mode on the reconstructed samples within the first template region adjacent to the current block, or an intra prediction mode derived by the template-based intra mode derivation (TIMD) mode;
  • the fourth transform coefficient is encoded.
  • embodiments of the present application provide a decoder, including:
  • a parsing unit, configured to parse the code stream of the current sequence to obtain the first transform coefficient of the current block;
  • a transform unit, configured to:
  • the first intra prediction mode includes any one of the following: an intra prediction mode derived by using the decoder-side intra mode derivation (DIMD) mode on the prediction block of the current block, an intra prediction mode derived by using the DIMD mode on the output vector of the optimal matrix-based intra prediction (MIP) mode of the current block, an intra prediction mode derived by using the DIMD mode on the reconstructed samples within the first template region adjacent to the current block, or an intra prediction mode derived by the template-based intra mode derivation (TIMD) mode;
  • a reconstruction unit configured to determine a reconstruction block of the current block based on the prediction block of the current block and the residual block of the current block.
  • an encoder including:
  • a residual unit, configured to obtain the residual block of the current block in the current sequence;
  • a transform unit, configured to:
  • the first intra prediction mode includes any one of the following: an intra prediction mode derived by using the decoder-side intra mode derivation (DIMD) mode on the prediction block of the current block, an intra prediction mode derived by using the DIMD mode on the output vector of the optimal matrix-based intra prediction (MIP) mode of the current block, an intra prediction mode derived by using the DIMD mode on the reconstructed samples within the first template region adjacent to the current block, or an intra prediction mode derived by the template-based intra mode derivation (TIMD) mode;
  • an encoding unit, configured to encode the fourth transform coefficient.
  • embodiments of the present application provide a decoder, including:
  • a processor adapted to implement computer instructions
  • a computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and the computer instructions are suitable for being loaded by the processor to execute the decoding method in the above-mentioned first aspect or its respective implementations.
  • In some embodiments, there are one or more processors and one or more memories.
  • the computer-readable storage medium may be integrated with the processor, or the computer-readable storage medium may be provided separately from the processor.
  • an encoder including:
  • a processor adapted to implement computer instructions
  • a computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and the computer instructions are suitable for being loaded by the processor to execute the encoding method in the above-mentioned second aspect or its respective implementations.
  • In some embodiments, there are one or more processors and one or more memories.
  • the computer-readable storage medium may be integrated with the processor, or the computer-readable storage medium may be provided separately from the processor.
  • embodiments of the present application provide a computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions.
  • When the computer instructions are read and executed by a processor of a computer device, the computer device executes the above-mentioned method.
  • embodiments of the present application provide a code stream, where the code stream is the code stream involved in the above-mentioned first aspect or the code stream involved in the above-mentioned second aspect.
  • Based on the above technical solutions, the decompression performance of the current block can be improved. When the decoder uses a non-traditional intra prediction mode to predict the current block, it can avoid directly using the transform set corresponding to the planar mode to perform the first transform; since the transform set corresponding to the first intra prediction mode can reflect the texture direction of the current block to a certain extent, the decompression performance of the current block is improved.
  • Figure 1 is a schematic block diagram of a coding framework provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of the MIP mode provided by the embodiment of the present application.
  • Figure 3 is a schematic diagram of a prediction mode derived based on DIMD provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of deriving prediction blocks based on DIMD provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a template used by TIMD provided by an embodiment of the present application.
  • Figure 6 is an example of LFNST provided by the embodiment of the present application.
  • Figure 7 is an example of a transformation set of LFNST provided by the embodiment of the present application.
  • Figure 8 is a schematic block diagram of a decoding framework provided by an embodiment of the present application.
  • Figure 9 is a schematic flowchart of a decoding method provided by an embodiment of the present application.
  • Figure 10 is a schematic flow chart of the encoding method provided by the embodiment of the present application.
  • Figure 11 is a schematic block diagram of a decoder provided by an embodiment of the present application.
  • Figure 12 is a schematic block diagram of an encoder provided by an embodiment of the present application.
  • Figure 13 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • the solutions provided by the embodiments of this application can be applied to the field of digital video coding technology, including but not limited to the fields of image coding and decoding, video coding and decoding, hardware video coding and decoding, dedicated circuit video coding and decoding, and real-time video coding and decoding.
  • the solution provided by the embodiments of the present application can be combined with the Audio Video Coding Standard (AVS), the second generation AVS standard (AVS2) or the third generation AVS standard (AVS3).
  • VVC: Versatile Video Coding
  • the solution provided by the embodiment of the present application can be used to perform lossy compression on images (lossy compression), or can also be used to perform lossless compression on images (lossless compression).
  • the lossless compression can be visually lossless compression (visually lossless compression) or mathematically lossless compression (mathematically lossless compression).
  • Video coding and decoding standards all adopt a block-based hybrid coding framework. Specifically, each image in the video is divided into largest coding units (LCU), also called coding tree units (CTU), of the same size (such as 128x128 or 64x64). Each largest coding unit or coding tree unit can be further divided into rectangular coding units (CU) according to rules, and a coding unit may be further divided into prediction units (PU), transform units (TU), and so on.
  • the hybrid coding framework includes modules such as prediction, transform, quantization, entropy coding and in-loop filtering.
  • the prediction module includes intra prediction and inter prediction. Inter-frame prediction includes motion estimation (motion estimation) and motion compensation (motion compensation).
  • the intra-frame prediction method is used in video encoding and decoding technology to eliminate the spatial redundancy between adjacent pixels.
  • Intra-frame prediction only refers to the information of the same image and predicts the pixel information within the current divided block. Since there is a strong similarity between adjacent images in the video, the interframe prediction method is used in video coding and decoding technology to eliminate the temporal redundancy between adjacent images, thereby improving coding efficiency.
  • Inter-frame prediction can refer to image information of different frames and use motion estimation to search for motion vector information that best matches the current divided block. The transformation converts the predicted image blocks into the frequency domain and redistributes the energy. Combined with quantization, it can remove information that is insensitive to the human eye and is used to eliminate visual redundancy.
  • Entropy coding can eliminate character redundancy based on the current context model and the probability information of the binary code stream.
  • the encoder can first read a black-and-white image or color image from the original video sequence, and then encode the black-and-white image or color image.
  • the black-and-white image may include pixels of the brightness component
  • the color image may include pixels of the chrominance component.
  • the color image may also include pixels with a brightness component.
  • the color format of the original video sequence can be luminance-chrominance (YCbCr, YUV) format or red-green-blue (Red-Green-Blue, RGB) format, etc.
  • After the encoder reads a black-and-white image or a color image, it divides the image into blocks and uses intra-frame prediction or inter-frame prediction for the current block to generate a prediction block of the current block.
  • The prediction block is subtracted from the original block of the current block to obtain a residual block; the residual block is transformed and quantized to obtain a quantized coefficient matrix, which is entropy encoded and output to the code stream.
  • the decoder uses intra prediction or inter prediction for the current block to generate a prediction block of the current block.
  • the decoding end decodes the code stream to obtain the quantization coefficient matrix, performs inverse quantization and inverse transformation on the quantization coefficient matrix to obtain the residual block, and adds the prediction block and the residual block to obtain the reconstruction block.
  • Reconstruction blocks can be used to compose a reconstructed image, and the decoder performs loop filtering on the reconstructed image based on images or blocks to obtain a decoded image.
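  • As a non-normative illustration of the above encode/decode flow, the following sketch (using NumPy and SciPy DCT routines as stand-ins for the real codec's transform, and a simple scalar quantizer) shows how the residual block is formed and coded at the encoding end and how the reconstruction block is formed at the decoding end; all function names here are illustrative.

```python
import numpy as np
from scipy.fft import dctn, idctn  # stand-ins for the codec's primary transform

def encode_block(original, prediction, qstep=16.0):
    """Illustrative encoder path: residual -> transform -> quantization."""
    residual = original.astype(np.float64) - prediction.astype(np.float64)
    coeffs = dctn(residual, norm="ortho")     # placeholder primary transform
    levels = np.round(coeffs / qstep)         # simple scalar quantization
    return levels                             # levels would then be entropy coded

def decode_block(levels, prediction, qstep=16.0):
    """Illustrative decoder path: inverse quantization -> inverse transform -> add prediction."""
    coeffs = levels * qstep
    residual = idctn(coeffs, norm="ortho")
    return np.clip(np.rint(prediction + residual), 0, 255).astype(np.uint8)
```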
  • the current block can be the current coding unit (CU) or the current prediction unit (PU), etc.
  • the encoding end also requires similar operations as the decoding end to obtain the decoded image.
  • the decoded image can be used as a reference image for inter-frame prediction of subsequent images.
  • the block division information determined by the encoding end, mode information or parameter information such as prediction, transformation, quantization, entropy coding, loop filtering, etc., need to be output to the code stream if necessary.
  • the decoding end determines, through parsing and analysis based on existing information, the same block division information and the same mode information or parameter information for prediction, transformation, quantization, entropy coding, loop filtering, etc. as the encoding end, thereby ensuring that the decoded image obtained by the encoding end is the same as the decoded image obtained by the decoding end.
  • the decoded image obtained at the encoding end is usually also called a reconstructed image.
  • the current block can be divided into prediction units during prediction, and the current block can be divided into transformation units during transformation.
  • the divisions between prediction units and transformation units can be the same or different.
  • the above is only the basic process of the video codec under the block-based hybrid coding framework. With the development of technology, some modules of the framework or some steps of the process may be optimized. This application is applicable to this block-based hybrid coding framework.
  • Figure 1 is a schematic block diagram of a coding framework 100 provided by an embodiment of the present application.
  • the coding framework 100 may include an intra prediction unit 180, an inter prediction unit 170, a residual unit 110, a transformation and quantization unit 120, an entropy coding unit 130, an inverse transformation and inverse quantization unit 140, and a loop filter unit 150.
  • the encoding framework 100 may also include a decoded image buffer unit 160. This coding framework 100 may also be called a hybrid framework coding mode.
  • the intra prediction unit 180 or the inter prediction unit 170 may predict the image block to be encoded to output the prediction block.
  • the residual unit 110 may calculate a residual block, that is, a difference value between the prediction block and the image block to be encoded, based on the prediction block and the image block to be encoded.
  • the transformation and quantization unit 120 is used to perform operations such as transformation and quantization on the residual block to remove information that is insensitive to the human eye, thereby eliminating visual redundancy.
  • the residual block before transformation and quantization by the transformation and quantization unit 120 may be called a time-domain residual block, and the time-domain residual block after transformation and quantization by the transformation and quantization unit 120 may be called a frequency residual block or a frequency-domain residual block.
  • the entropy encoding unit 130 may output a code stream based on the transform quantization coefficient. For example, the entropy encoding unit 130 may eliminate character redundancy according to the target context model and probability information of the binary code stream. For example, the entropy coding unit 130 may be used for context-based adaptive binary arithmetic entropy coding (CABAC). The entropy encoding unit 130 may also be called a header information encoding unit.
  • the image block to be encoded can also be called an original image block or a target image block, and a prediction block can also be called a predicted image block or image prediction block, and can also be called a prediction signal or prediction information.
  • the reconstruction block may also be called a reconstructed image block or an image reconstruction block, and may also be called a reconstructed signal or reconstructed information.
  • the image block to be encoded may also be called an encoding block or a coded image block, and for the decoding end, the image block to be encoded may also be called a decoding block or a decoded image block.
  • the image block to be encoded may be a CTU or a CU.
  • the encoding framework 100 calculates the residual between the prediction block and the image block to be encoded to obtain the residual block, and then transmits the residual block to the decoder through processes such as transformation and quantization.
  • the decoder receives and decodes the code stream, it obtains the residual block through steps such as inverse transformation and inverse quantization.
  • the prediction block predicted by the decoder is superimposed on the residual block to obtain the reconstructed block.
  • the inverse transform and inverse quantization unit 140, the loop filter unit 150 and the decoded image buffer unit 160 in the encoding framework 100 may be used to form a decoder.
  • the intra prediction unit 180 or the inter prediction unit 170 can predict the image block to be encoded based on the existing reconstructed block, thereby ensuring that the encoding end and the decoding end have consistent understanding of the reference image.
  • the encoder can replicate the decoder's processing loop and thus produce the same predictions as the decoder.
  • the quantized transform coefficients are inversely transformed and inversely quantized by the inverse transform and inverse quantization unit 140 to copy the approximate residual block at the decoding side.
  • After the approximate residual block is added to the prediction block, it can pass through the loop filtering unit 150 to smoothly filter out blocking artifacts and other effects caused by block-based processing and quantization.
  • the image blocks output by the loop filter unit 150 may be stored in the decoded image buffer unit 160 for use in prediction of subsequent images.
  • Figure 1 is only an example of the present application and should not be understood as a limitation of the present application.
  • the loop filtering unit 150 in the encoding framework 100 may include deblocking filter (DBF) and sample adaptive offset (SAO) filtering.
  • the encoding framework 100 may adopt a neural network-based loop filtering algorithm to improve video compression efficiency.
  • the coding framework 100 may be a video coding hybrid framework based on a deep learning neural network.
  • a model based on a convolutional neural network can be used to calculate the result of filtering the pixels based on the deblocking filter and sample adaptive compensation filtering.
  • the network structures of the loop filter unit 150 on the luminance component and the chrominance component may be the same or different. Considering that the brightness component contains more visual information, the brightness component can also be used to guide the filtering of the chroma component to improve the reconstruction quality of the chroma component.
  • inter-frame prediction can refer to the image information of different frames and use motion estimation to search for the motion vector information that best matches the image block to be encoded to eliminate temporal redundancy;
  • the frames used for inter-frame prediction can be P frames and/or B frames, where a P frame refers to a forward prediction frame and a B frame refers to a bidirectional prediction frame.
  • intra-frame prediction only refers to the information of the same image and predicts pixel information within the image block to be encoded to eliminate spatial redundancy;
  • the frame used for intra-frame prediction can be an I frame.
  • For example, the image block to be encoded can use the upper-left image block, the upper image block and the left image block as reference information for prediction, and the image block to be encoded in turn serves as reference information for the next image block, so that the entire image can be predicted.
  • If the input digital video is in a color format, such as the YUV 4:2:0 format, then every 4 pixels of each image frame of the digital video consist of 4 Y components and 2 UV components, and the encoding framework can encode the Y components (i.e., luma blocks) and the UV components (i.e., chroma blocks) separately.
  • the decoding end can also perform corresponding decoding according to the format.
  • Intra prediction can predict the image block to be encoded with the help of angle prediction modes and non-angle prediction modes to obtain the prediction block; based on the rate-distortion information calculated between the prediction block and the image block to be encoded, the optimal prediction mode of the image block to be encoded is selected, and the prediction mode is transmitted to the decoder through the code stream.
  • the decoder parses the prediction mode, predicts the prediction block of the target decoding block, and superimposes the time domain residual block obtained through code stream transmission to obtain the reconstruction block.
  • non-angle prediction modes have remained relatively stable, including the mean (DC) mode and the planar mode.
  • Angle prediction modes continue to increase with the evolution of digital video coding and decoding standards.
  • the H.264/AVC standard only has 8 angle prediction modes and 1 non-angle prediction mode;
  • in H.265/HEVC, they were expanded to 33 angle prediction modes and 2 non-angle prediction modes.
  • the intra-frame prediction mode has been further expanded.
  • For example, in addition to the traditional prediction modes, which include the planar mode, the DC mode and 65 angle prediction modes, non-traditional modes such as the matrix weighted intra-frame prediction (MIP) mode have been introduced.
  • planar mode is usually used to process some blocks with gradient textures
  • DC mode is usually used to process some flat areas
  • angle prediction mode is usually used to process blocks with obvious angular textures.
  • the current block used for intra prediction may be a square block or a rectangular block.
  • the decoder can convert the traditional angle prediction mode into the wide-angle prediction mode after receiving the signal, so that both the total number of intra prediction modes and the encoding method of intra modes can remain unchanged.
  • In some embodiments, the intra prediction mode to be used may be determined or selected based on the size of the current block; for example, the wide-angle prediction mode may be selected based on the size of the current block to perform intra prediction on the current block; for example, when the current block is a rectangular block (width not equal to height), the current block can be intra-predicted using the wide-angle prediction mode.
  • the aspect ratio of the current block may be used to determine which angle prediction modes are replaced by wide-angle prediction modes and which angle prediction modes are used after replacement. For example, when predicting the current block, any intra prediction mode whose angle does not exceed the diagonal of the current block (from the lower-left corner to the upper-right corner of the current block) may be selected as a replaced angle prediction mode.
  • the MIP mode can also be called the Matrix weighted Intra Prediction (Matrix weighted Intra Prediction) mode.
  • The process involved in the MIP mode can be divided into three main steps: the down-sampling process, the matrix multiplication process and the up-sampling process. Specifically, the spatially adjacent reconstructed samples are first down-sampled through the down-sampling process, and the down-sampled sample sequence is used as the input vector of the matrix multiplication process, that is, the output vector of the down-sampling process is the input vector of the matrix multiplication process; the input vector is multiplied by a preset matrix and an offset vector is added, and the calculated sample vector is output; the output vector of the matrix multiplication process is then used as the input vector of the up-sampling process, and the final prediction block is obtained through up-sampling.
  • FIG. 2 is a schematic diagram of the MIP mode provided by the embodiment of the present application.
  • As shown in FIG. 2, during the down-sampling process, the MIP mode obtains the upper adjacent down-sampled reconstructed sample vector bdry_top by averaging the reconstructed samples adjacent to the top of the current coding unit, and obtains the left adjacent down-sampled reconstructed sample vector bdry_left by averaging the reconstructed samples adjacent to the left. After bdry_top and bdry_left are obtained, they are combined into the input vector bdry_red of the matrix multiplication process, and the output sample vector is computed as A_k · bdry_red + b_k, where A_k is a preset matrix, b_k is a preset bias vector, and k is the index of the MIP mode.
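  • As a simplified sketch of the matrix-multiplication step described above: the averaged top and left boundary samples are concatenated into bdry_red, and the reduced prediction signal is obtained as A_k · bdry_red + b_k before up-sampling. The matrix and bias below are random placeholders, not the trained matrices of any standard, and the vector sizes are only illustrative.

```python
import numpy as np

def mip_matrix_step(bdry_top, bdry_left, A_k, b_k):
    """Matrix stage of MIP: averaged boundary samples -> reduced prediction signal."""
    bdry_red = np.concatenate([bdry_top, bdry_left])   # input vector of the matrix stage
    pred_red = A_k @ bdry_red + b_k                    # A_k * bdry_red + b_k
    return pred_red                                    # would then be up-sampled to W x H

# Illustrative dimensions: 4 averaged top + 4 averaged left samples in, 16 samples out.
rng = np.random.default_rng(0)
A_k = rng.integers(-8, 9, size=(16, 8)).astype(np.float64) / 16.0
b_k = np.zeros(16)
pred_red = mip_matrix_step(np.array([100.0, 102.0, 101.0, 99.0]),
                           np.array([98.0, 97.0, 99.0, 100.0]), A_k, b_k)
```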
  • MIP in order to predict a block with width W and height H, MIP requires H reconstructed pixels in the left column of the current block and W reconstructed pixels in the upper row of the current block as input.
  • MIP generates prediction blocks in the following three steps: reference pixel averaging (Averaging), matrix multiplication (Matrix Vector Multiplication) and interpolation (Interpolation).
  • the core of MIP is matrix multiplication, which can be thought of as a process of generating prediction blocks using input pixels (reference pixels) in a matrix multiplication manner.
  • MIP provides a variety of matrices. Different prediction methods can be reflected in different matrices. Using different matrices for the same input pixel will yield different results.
  • Reference pixel averaging and interpolation are a design trade-off between performance and complexity. For larger blocks, an effect similar to down-sampling can be achieved by reference pixel averaging, allowing the input to be adapted to a smaller matrix, while interpolation achieves an up-sampling effect. In this way, there is no need to provide MIP matrices for blocks of every size, but only matrices of one or several specific sizes. As the demand for compression performance increases and hardware capabilities improve, more complex versions of MIP may appear in the next generation of standards.
  • the MIP mode can be simplified from the neural network.
  • the matrix used can be obtained based on training. Therefore, the MIP mode has strong generalization ability and prediction effects that traditional prediction models cannot achieve.
  • the MIP mode can be a model obtained through multiple simplifications of hardware and software complexity for an intra-frame prediction model based on a neural network. Based on a large number of training samples, multiple prediction modes represent a variety of models and parameters, which can compare Good coverage of natural sequences of textures.
  • MIP is somewhat similar to planar mode, but obviously MIP is more complex and more flexible than planar mode.
  • The number of MIP prediction modes may differ for coding units of different sizes. For example, for a 4x4 coding unit, the MIP mode has 16 prediction modes; for an 8x8 coding unit, or a coding unit with width equal to 4 or height equal to 4, the MIP mode has 8 prediction modes; for coding units of other sizes, the MIP mode has 6 prediction modes.
  • MIP mode has a transpose function. For prediction modes that conform to the current size, MIP mode can try transposition calculations on the encoder side.
  • MIP mode not only requires a usage flag bit to indicate whether the current coding unit uses MIP mode, but also, if the current coding unit uses MIP mode, an additional transposition flag bit and index flag bit need to be transmitted to the decoder.
  • the transposed flag bit can be binarized by fixed-length encoding (Fixed Length, FL), with a length of 1.
  • the index flag bit is binarized by truncated binary encoding (TB). Taking the 4x4 size coding unit as an example, the MIP mode has 16 prediction modes.
  • the index flag bit can be a 5- or 6-bit truncated binary identifier.
  • The core point of the DIMD mode is that the decoder uses the same method as the encoder to derive the intra prediction mode, which avoids transmitting the intra prediction mode index of the current coding unit in the code stream and thereby saves bit overhead.
  • The specific process of the DIMD mode can be divided into the following two main steps:
  • Step 1: Derive the prediction mode.
  • Figure 3 is a schematic diagram of a prediction mode derived based on DIMD provided by an embodiment of the present application.
  • DIMD derives a prediction mode using pixels in the template in the reconstruction area (reconstruction pixels on the left and upper sides of the current block).
  • the template may include the three adjacent rows of reconstructed samples above the current block, the three adjacent columns of reconstructed samples on the left, and the corresponding adjacent reconstructed samples at the upper left.
  • the template can be configured according to a window (for example, the window shown in (a) of Figure 3 or the window shown in (b) of Figure 3), and multiple gradient values corresponding to multiple adjacent reconstructed samples are determined within the template, where each gradient value can be used to select an intra prediction mode adapted to its gradient direction.
  • the encoder can use the prediction modes adapted to the largest and the second largest gradient values among the multiple gradient values as the derived prediction modes. For example, as shown in (b) of Figure 3, for a 4x4 block size, all adjacent reconstructed samples whose gradient values need to be determined are analyzed and the corresponding histogram of gradients is obtained; as shown in (c) of Figure 3, for blocks of other sizes, all adjacent reconstructed samples whose gradient values need to be determined are analyzed and the corresponding gradient histograms are obtained; finally, the prediction modes corresponding to the largest and the second largest gradients in the gradient histogram are used as the derived prediction modes.
  • the gradient histogram in this application is only an example for determining the derived prediction mode, and can be implemented in a variety of simple forms, which is not specifically limited in this application.
  • this application does not limit the method of statistical gradient histograms.
  • the Sobel operator or other methods may be used to calculate the gradient histograms.
  • the gradient value involved in this application can also be equivalently replaced by a gradient amplitude value, which is not specifically limited in this application.
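  • The following sketch illustrates the gradient-histogram analysis described above: Sobel gradients are accumulated over the template samples, and the two modes with the largest accumulated amplitudes are returned. The angle_to_mode() mapping from gradient direction to an intra mode index is hypothetical, and the window shapes and histogram details of the actual DIMD design are not reproduced here.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64)

def angle_to_mode(theta, num_angular_modes=65):
    """Hypothetical mapping of a gradient direction onto an angular intra mode index (2..66)."""
    return 2 + int(round((theta % np.pi) / np.pi * (num_angular_modes - 1)))

def dimd_two_modes(samples):
    """Build a histogram of gradients over the samples and return the two strongest modes."""
    hist = {}
    for y in range(1, samples.shape[0] - 1):
        for x in range(1, samples.shape[1] - 1):
            win = samples[y - 1:y + 2, x - 1:x + 2].astype(np.float64)
            gx, gy = np.sum(win * SOBEL_X), np.sum(win * SOBEL_Y)
            amp = abs(gx) + abs(gy)                     # gradient amplitude
            if amp == 0:
                continue
            mode = angle_to_mode(np.arctan2(gy, gx))    # direction -> intra mode index
            hist[mode] = hist.get(mode, 0.0) + amp      # accumulate the histogram
    ranked = sorted(hist.items(), key=lambda kv: kv[1], reverse=True) + [(0, 0.0), (0, 0.0)]
    (mode1, amp1), (mode2, amp2) = ranked[0], ranked[1]
    return mode1, amp1, mode2, amp2
```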
  • Step 2: Derive the prediction block.
  • Figure 4 is a schematic diagram of deriving prediction blocks based on DIMD provided by an embodiment of the present application.
  • the encoder can weight the prediction values of 3 intra prediction modes (planar mode and 2 intra prediction modes derived based on DIMD).
  • the codec uses the same prediction block derivation method to obtain the prediction block of the current block. Assume that the prediction mode corresponding to the largest gradient value is prediction mode 1, and the prediction mode corresponding to the second largest gradient value is prediction mode 2.
  • Specifically, the encoder checks whether the following two conditions are both satisfied:
  • the gradient of prediction mode 2 is not 0;
  • Neither prediction mode 1 nor prediction mode 2 is planar mode or DC prediction mode.
  • If either of the above two conditions is not satisfied, only prediction mode 1 is used to calculate the prediction sample values of the current block, that is, the ordinary intra prediction process is applied to prediction mode 1; otherwise, that is, if both of the above conditions hold, the weighted averaging method is used to derive the prediction block of the current block.
  • The specific method is: the planar mode occupies 1/3 of the total weight, and the remaining 2/3 is the total weight of prediction mode 1 and prediction mode 2. For example, the gradient amplitude value of prediction mode 1 divided by the sum of the gradient amplitude values of prediction mode 1 and prediction mode 2 is used as the weighting factor of prediction mode 1 within that 2/3, and the gradient amplitude value of prediction mode 2 divided by the same sum is used as the weighting factor of prediction mode 2.
  • the decoder follows the same steps to obtain the prediction block.
  • Weight(mode1) = 2/3 * (amp1/(amp1+amp2));
  • Weight(mode2) = 1 - Weight(PLANAR) - Weight(mode1);
  • mode1 and mode2 represent prediction mode 1 and prediction mode 2 respectively
  • amp1 and amp2 represent the gradient amplitude value of prediction mode 1 and the gradient amplitude value of prediction mode 2 respectively.
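  • Under the weighting rule above (the planar mode fixed at 1/3 of the total weight, the remaining 2/3 split between the two derived modes in proportion to their gradient amplitudes), the fusion can be sketched as follows; predict_intra() is a trivial placeholder for an ordinary intra prediction routine, and the planar/DC mode indices are assumptions.

```python
import numpy as np

def predict_intra(block_ctx, mode):
    """Placeholder for an ordinary intra prediction routine (returns a flat block here)."""
    ref_mean, height, width = block_ctx
    return np.full((height, width), ref_mean, dtype=np.float64)

def dimd_predict(block_ctx, mode1, amp1, mode2, amp2, planar_mode=0, dc_mode=1):
    """Fuse planar + the two DIMD-derived modes, or fall back to mode 1 alone."""
    # Fall back to ordinary prediction with mode 1 if the two conditions above do not hold.
    if amp2 == 0 or mode1 in (planar_mode, dc_mode) or mode2 in (planar_mode, dc_mode):
        return predict_intra(block_ctx, mode1)
    w_planar = 1.0 / 3.0                               # Weight(PLANAR) = 1/3
    w_mode1 = (2.0 / 3.0) * (amp1 / (amp1 + amp2))     # Weight(mode1)
    w_mode2 = 1.0 - w_planar - w_mode1                 # Weight(mode2)
    return (w_planar * predict_intra(block_ctx, planar_mode)
            + w_mode1 * predict_intra(block_ctx, mode1)
            + w_mode2 * predict_intra(block_ctx, mode2))
```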
  • DIMD mode requires transmitting a flag bit to the decoder, which is used to indicate whether the current encoding unit uses DIMD mode.
  • DIMD uses gradient analysis of reconstructed pixels to screen intra prediction modes, and the two intra prediction modes plus the planar mode can be weighted according to the analysis results.
  • the advantage of DIMD is that if the DIMD mode is selected for the current block, there is no need to indicate which intra prediction mode is used in the code stream. Instead, the decoder itself derives it through the above process, which saves overhead to a certain extent.
  • The technical principles of the TIMD mode are similar to those of the above-mentioned DIMD mode: both use the same derivation operation at the encoder and the decoder to derive the prediction mode, so as to save the overhead of transmitting the mode index.
  • The TIMD mode can be understood as two main parts. First, the cost information of each prediction mode is calculated according to the template, and the prediction modes corresponding to the smallest cost and the second-smallest cost are selected; the prediction mode corresponding to the smallest cost is recorded as prediction mode 1, and the prediction mode corresponding to the second-smallest cost is recorded as prediction mode 2. If the ratio of the second-smallest cost value (costMode2) to the smallest cost value (costMode1) satisfies a preset condition, for example costMode2 < 2*costMode1, the prediction blocks corresponding to prediction mode 1 and prediction mode 2 can be weighted and fused according to the weights corresponding to prediction mode 1 and prediction mode 2 to obtain the final prediction block.
  • the corresponding weights of prediction mode 1 and prediction mode 2 are determined in the following manner:
  • weight1 = costMode2/(costMode1+costMode2)
  • weight2 = 1 - weight1
  • weight1 is the weight of the prediction block corresponding to prediction mode 1
  • weight2 is the weight of the prediction block corresponding to prediction mode 2.
  • Otherwise, weighted fusion between prediction blocks will not be performed, and the prediction block corresponding to prediction mode 1 will be used as the prediction block of the TIMD mode.
  • In addition, if the TIMD mode is used to perform intra prediction on the current block and the reconstructed sample template of the current block does not contain available adjacent reconstructed samples, the TIMD mode selects the planar mode to perform intra prediction on the current block, that is, weighted fusion is not performed. As with the DIMD mode, the TIMD mode needs to transmit a flag bit to the decoder to indicate whether the current coding unit uses the TIMD mode.
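  • A non-normative sketch of the TIMD selection and fusion logic described above: the template cost of each candidate mode is computed, the two cheapest modes are kept, and their predictions are fused only when costMode2 < 2*costMode1. compute_template_cost() is a crude stand-in for predicting the template from its reference samples and measuring SAD/SATD, and predict_intra() is the same placeholder as in the DIMD sketch.

```python
import numpy as np

def predict_intra(block_ctx, mode):                    # placeholder, as in the DIMD sketch
    ref_mean, height, width = block_ctx
    return np.full((height, width), ref_mean, dtype=np.float64)

def compute_template_cost(template_rec, template_ref, mode):
    """Stand-in: predict the template from its reference samples with `mode`
    and return a distortion (SAD here) against the reconstructed template."""
    predicted = np.full_like(template_rec, template_ref.mean())
    return float(np.abs(template_rec - predicted).sum())

def timd_predict(block_ctx, template_rec, template_ref, candidate_modes):
    costs = sorted((compute_template_cost(template_rec, template_ref, m), m)
                   for m in candidate_modes)
    (cost1, mode1), (cost2, mode2) = costs[0], costs[1]
    if cost2 < 2 * cost1:                              # fusion condition: costMode2 < 2*costMode1
        w1 = cost2 / (cost1 + cost2)                   # weight1 = costMode2/(costMode1+costMode2)
        w2 = 1.0 - w1                                  # weight2 = 1 - weight1
        return w1 * predict_intra(block_ctx, mode1) + w2 * predict_intra(block_ctx, mode2)
    return predict_intra(block_ctx, mode1)             # no fusion: use prediction mode 1 only
```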
  • the process of the encoder or decoder calculating the cost information of each prediction mode is mainly: performing intra mode prediction on the samples in the template area based on the reconstructed samples adjacent to the upper or left side of the template area.
  • The prediction process is the same as that of the ordinary intra prediction modes; for example, when the DC mode is used to perform intra mode prediction on the samples in the template area, the mean value of the entire coding unit is calculated, and when an angle prediction mode is used to perform intra mode prediction on the samples in the template area, the corresponding interpolation filter is selected according to the mode and the prediction samples are interpolated according to the rules. Then, based on the prediction samples and the reconstructed samples in the template area, the distortion between the prediction samples and the reconstructed samples in the area can be calculated, which is the cost information of the current prediction mode.
  • FIG. 5 is a schematic diagram of a template used by TIMD provided by an embodiment of the present application.
  • As shown in FIG. 5, the codec can select the reference template (Reference of template) of the current block from a region with a width equal to 2(M+L1)+1 and a height equal to 2(N+L2)+1, and use it to predict the template of the current block.
  • In some cases, the TIMD mode selects the planar mode to perform intra prediction on the current block.
  • For example, the template of the current block may be the samples adjacent to the left and upper sides of the current CU in Figure 5, i.e., the diagonally filled area; if there are no adjacent reconstructed samples available in the diagonally filled area, the TIMD mode selects the planar mode to perform intra prediction on the current block.
  • the left and upper sides of the current block can theoretically obtain reconstruction values, that is, the template of the current block contains available adjacent reconstruction samples.
  • Specifically, the decoder can use a certain intra prediction mode to predict on the template, and compare the prediction values with the reconstructed values to obtain the cost of that intra prediction mode on the template, for example, the Sum of Absolute Differences (SAD), the Sum of Absolute Transformed Differences (SATD) or the Sum of Squared Errors (SSE). Since the template and the current block are adjacent, the reconstructed samples in the template are correlated with the pixels in the current block.
  • the performance of a prediction mode on the template can be used to estimate the performance of this prediction mode on the current block.
  • TIMD predicts some candidate intra prediction modes on the template, obtains the cost of each candidate intra prediction mode on the template, and selects the one or two intra prediction modes with the lowest cost to generate the intra prediction value of the current block. If the template cost difference between the two intra prediction modes is not large, the compression performance can be improved by weighted averaging of the prediction values of the two intra prediction modes.
  • the weights of the predicted values of the two prediction modes are related to the above-mentioned costs. For example, the weights are inversely proportional to the costs.
  • TIMD uses the prediction effect of the intra prediction mode on the template to screen the intra prediction mode, and can weight the two intra prediction modes according to the cost on the template.
  • the advantage of TIMD is that if the TIMD mode is selected for the current block, there is no need to indicate which intra prediction mode is used in the code stream. Instead, the decoder itself derives it through the above process, which saves overhead to a certain extent.
  • From the above brief introduction to several intra prediction modes, it is not difficult to find that the technical principles of the DIMD mode are close to those of the TIMD mode: both rely on the decoder performing the same operation as the encoder to infer the prediction mode of the current coding unit.
  • This prediction mode can save the transmission of the index of the prediction mode when the complexity is acceptable, thereby saving overhead and improving compression efficiency.
  • The DIMD mode and the TIMD mode work better in large areas with consistent texture characteristics; if the texture changes slightly or the template area cannot cover it, the prediction effect of these prediction modes is poor.
  • Whether for the DIMD mode or the TIMD mode, prediction blocks obtained based on multiple traditional prediction modes are fused, or weighted prediction is performed on prediction blocks obtained based on multiple traditional prediction modes; the fusion of prediction blocks can produce effects that cannot be achieved by a single prediction mode.
  • The DIMD mode introduces the planar mode as an additional weighted prediction mode to increase the spatial correlation between adjacent reconstructed samples and prediction samples, thereby improving the prediction effect of intra prediction. However, since the prediction principle of the planar mode is relatively simple, for some prediction blocks with obvious differences between the upper-right corner and the lower-left corner, using the planar mode as an additional weighted prediction mode may be counterproductive.
  • When encoding, the current block will be predicted first.
  • Prediction uses spatial or temporal correlation to obtain an image that is the same as or similar to the current block.
  • For some blocks, the prediction block and the current block may be exactly the same, but it is difficult to guarantee that this is the case for all blocks in a video.
  • the predicted block is usually very similar to the current block, but there are differences.
  • Due to irregular motion, distortion, occlusion, brightness changes, etc. in the video, it is difficult to completely predict the current block.
  • the hybrid coding framework will subtract the predicted image from the original image of the current block to obtain the residual image, or the current block minus the predicted block will obtain the residual block.
  • Residual blocks are usually much simpler than the original image, so prediction can significantly improve compression efficiency.
  • the residual block is not directly encoded, but is usually transformed first. Transformation is to transform the residual image from the spatial domain to the frequency domain and remove the correlation of the residual image. After the residual image is transformed into the frequency domain, since most of the energy is concentrated in the low-frequency region, most of the transformed non-zero coefficients are concentrated in the upper left corner, and then quantization is used to further compress the residual block.
  • a larger quantization step size can be used in high-frequency areas.
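  • As a purely illustrative example of applying a larger quantization step size to high-frequency coefficients (the step sizes and the scaling rule below are arbitrary and not taken from any standard):

```python
import numpy as np

def quantize(coeffs, base_step=8.0, hf_scale=4.0):
    """Quantize a block of transform coefficients, using a larger step for coefficients
    that are farther from the DC (top-left) position."""
    h, w = coeffs.shape
    yy, xx = np.mgrid[0:h, 0:w]
    freq = (yy + xx) / float(h + w - 2)                  # 0 at DC, 1 at the highest frequency
    step = base_step * (1.0 + (hf_scale - 1.0) * freq)   # coarser step at high frequencies
    return np.round(coeffs / step), step                 # levels plus the step map for dequantization
```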
  • Image transformation technology is a transformation of the original image in order to be able to represent the original image with an orthogonal function or orthogonal matrix. This transformation is two-dimensional linear and reversible.
  • the original image is generally called a spatial domain image
  • the transformed image is called a converted domain image (also called frequency domain).
  • the converted domain image can be inversely transformed into a spatial domain image. After image transformation, on the one hand, it can more effectively reflect the characteristics of the image itself, on the other hand, it can also concentrate energy on a small amount of data, which is more conducive to the storage, transmission and processing of images.
  • Transformation methods include primary transformation and secondary transformation.
  • Main transformation methods include but are not limited to: Discrete Cosine Transform (DCT) and Discrete Sine Transform (DST).
  • DCTs that can be used in video codecs include but are not limited to DCT2 and DCT8 types;
  • DSTs that can be used in video codecs include but are not limited to DST7 types. Since DCT has strong energy concentration characteristics, only some areas (such as the upper left corner area) of the original image have non-zero coefficients after DCT transformation. Of course, in video coding and decoding, images are divided into blocks for processing, so transformation is also performed based on blocks.
  • The above-mentioned DCT2, DCT8 and DST7 transforms are usually decomposed into one-dimensional transforms in the horizontal and vertical directions, that is, performed in two steps: for example, the horizontal transform is performed first and then the vertical transform, or the vertical transform is performed first and then the horizontal transform.
  • the above transformation method is more effective for horizontal and vertical textures, but is less effective for oblique textures. Since horizontal and vertical textures are the most common, the above transformation method is very useful for improving compression efficiency.
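  • A small sketch of the separable primary transform described above, applying a 1-D DCT first along the horizontal direction and then along the vertical direction (SciPy's floating-point DCT-II is used as a stand-in for the integer DCT2 kernels of a real codec):

```python
import numpy as np
from scipy.fft import dct, idct

def forward_primary_transform(residual):
    """Separable 2-D transform: 1-D DCT over rows (horizontal), then over columns (vertical)."""
    horiz = dct(residual, type=2, norm="ortho", axis=1)
    return dct(horiz, type=2, norm="ortho", axis=0)

def inverse_primary_transform(coeffs):
    """Inverse transform in the reverse order: vertical first, then horizontal."""
    vert = idct(coeffs, type=2, norm="ortho", axis=0)
    return idct(vert, type=2, norm="ortho", axis=1)

# A block whose rows are identical (purely horizontal variation) compacts all of its
# energy into the first row of coefficients, illustrating why this scheme suits
# horizontal and vertical textures.
block = np.tile(np.arange(8, dtype=np.float64), (8, 1))
coeffs = forward_primary_transform(block)
```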
  • the encoder can perform a secondary transformation based on the primary transform to further improve compression efficiency.
  • the main transformation can be used to process textures in horizontal and vertical directions.
  • the main transformation can also be called a basic transformation.
  • the main transformation includes but is not limited to: the above-mentioned DCT2 type, DCT8 type, DST7 type transformation.
  • Secondary transformation is used to process oblique textures.
  • secondary transformation includes but is not limited to: low frequency non-separable transform (LFNST).
  • On the encoding side, the secondary transform is used after the primary transform and before quantization; on the decoding side, the (inverse) secondary transform is used after inverse quantization and before the inverse primary transform.
  • Figure 6 is an example of LFNST provided by the embodiment of the present application.
  • LFNST performs a secondary transformation on the low-frequency coefficients in the upper left corner after the basic transformation.
  • the main transform concentrates the energy to the upper left corner by decorrelating the image.
  • the secondary transformation decorrelates the low-frequency coefficients of the primary transformation.
  • When 16 coefficients are input to a 4x4 LFNST, 8 coefficients are output; when 64 coefficients are input to an 8x8 LFNST, 16 coefficients are output.
  • Conversely, when 8 coefficients are input to the 4x4 inverse LFNST, 16 coefficients are output; when 16 coefficients are input to the 8x8 inverse LFNST, 64 coefficients are output.
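  • Conceptually, the forward LFNST maps the low-frequency primary coefficients onto a shorter coefficient vector with one non-separable matrix multiplication, and the inverse LFNST maps back. The sketch below only reproduces the 16-in/8-out dimensionality of the 4x4 case described above; the kernel is a random orthonormal matrix, not a real LFNST kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((16, 8)))   # 16x8 with orthonormal columns
T = Q.T                                             # illustrative 8x16 LFNST-like kernel

def lfnst_forward_4x4(primary_coeffs_4x4):
    """4x4 LFNST: 16 low-frequency primary coefficients in, 8 secondary coefficients out."""
    x = primary_coeffs_4x4.reshape(16)              # scan the 4x4 low-frequency region
    return T @ x                                    # non-separable transform = matrix multiply

def lfnst_inverse_4x4(secondary_coeffs):
    """Inverse 4x4 LFNST: 8 coefficients in, 16 coefficients out (kernel transpose)."""
    return (T.T @ secondary_coeffs).reshape(4, 4)
```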
  • the encoder When the encoder performs secondary transformation on the current block in the current image, it can use a certain transformation kernel in the selected transformation set to transform the residual block of the current block.
  • The transformation set can refer to a collection of transformation kernels used to transform a certain type of oblique texture, or a collection of transformation kernels used to transform some similar oblique textures.
  • The transformation kernel can also be called, or equivalently replaced by, a transformation matrix, transformation kernel type, basis function or other terms with similar or identical meanings; the transformation set can also be called, or equivalently replaced by, a transformation matrix group, transformation kernel type group, basis function group or other terms with similar or identical meanings, which is not specifically limited in this application.
  • Figure 7 is an example of the transformation set of LFNST provided by the embodiment of the present application.
  • LFNST can have 4 transformation sets, and the transformation kernels in the same transformation set have similar oblique textures.
  • the transformation set shown in (a) in Figure 7 may be a transformation set with an index of 0
  • the transformation set shown in (b) of Figure 7 may be a transformation set with an index of 1
  • the transformation set shown in (c) of Figure 7 may be a transformation set with an index of 2
  • the transformation set shown in (d) in FIG. 7 may be a transformation set with an index of 3.
  • Intra-frame prediction uses the reconstructed pixels around the current block as a reference to predict the current block. Since current videos are encoded from left to right and from top to bottom, the reference pixels that can be used by the current block are usually on the left and upper sides. Angle prediction tiles the reference pixels onto the current block at the specified angle as the prediction value, which means that the prediction block will have an obvious directional texture, and the residual of the current block after angle prediction will also statistically reflect obvious angular characteristics. Therefore, the transformation set selected by LFNST can be bound to the intra prediction mode; that is, after the intra prediction mode is determined, LFNST can use a transformation set (Transform set) whose texture direction is adapted to the angular characteristics of the intra prediction mode, so as to save bit overhead.
  • LFNST has a total of 4 transformation sets, and each transformation set has 2 transformation kernels.
  • Table 1 gives the correspondence between intra prediction modes and transform sets.
  • intra prediction modes 0 to 81 can be associated with the indices of four transform sets.
  • the cross-component prediction modes used by chroma intra-frame prediction are 81 to 83, while luma intra-frame prediction does not have these modes.
  • the transformation set of LFNST can be transposed to handle more angles with one transformation set.
  • For example, intra prediction modes 13 to 23 and intra prediction modes 45 to 55 both correspond to transformation set 2, but intra prediction modes 13 to 23 are obviously close to the horizontal modes, while intra prediction modes 45 to 55 are obviously close to the vertical modes.
  • The transforms corresponding to intra prediction modes 45 to 55 are therefore adapted through transposition.
  • The encoding end can determine which transform set LFNST uses based on the intra prediction mode used by the current block, and then determine the transform kernel to be used within the determined transform set. Equivalently, the correlation between the intra prediction mode and the LFNST transform set can be exploited, thereby reducing the information about the LFNST transform set selection that needs to be transmitted in the code stream. Whether the current block uses LFNST, and if LFNST is used, whether the first or the second transform kernel in the transform set is used, can be determined through the code stream and certain conditions, as sketched below.
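  • The selection logic just described can be sketched as follows. The mode-to-set mapping is illustrative only (it follows the general pattern of Table 1 but is not the normative table), and the lfnst_idx value is assumed to have been parsed from the code stream.

```python
def lfnst_transform_set(intra_mode):
    """Illustrative mapping from an intra prediction mode to one of the 4 LFNST transform sets."""
    if intra_mode <= 1:                 # planar / DC
        return 0
    if intra_mode <= 12:                # example boundary for near-horizontal-side angles
        return 1
    if intra_mode <= 23:                # modes 13..23 (close to horizontal) -> set 2
        return 2
    if intra_mode <= 44:                # example boundary for near-diagonal angles
        return 3
    if intra_mode <= 55:                # modes 45..55 reuse set 2 via transposition
        return 2
    return 1                            # remaining modes (example)

def select_lfnst_kernel(intra_mode, lfnst_idx):
    """lfnst_idx is parsed from the code stream: 0 = LFNST off, 1 or 2 = kernel within the set."""
    if lfnst_idx == 0:
        return None                     # LFNST is not applied to this block
    return lfnst_transform_set(intra_mode), lfnst_idx - 1
```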
  • LFNST can also be designed to be more complex. For example, use larger transformation sets, more transformation sets, and use more transformation kernels per transformation set.
  • Table 2 shows another correspondence relationship between intra prediction modes and transformation sets.
  • each transformation set uses 3 transformation kernels.
  • The corresponding relationship between the transformation set and the intra prediction mode can be realized as follows: intra prediction modes 0 to 34 correspond forwardly to transformation sets 0 to 34, that is, the larger the number of the prediction mode, the larger the index of the transformation set; intra prediction modes 35 to 67, due to transposition, correspond inversely to transformation sets 2 to 33, that is, the larger the number of the prediction mode, the smaller the index of the transformation set; the remaining prediction modes can all uniformly correspond to the transformation set with index 2. In other words, if transposition is not considered, each intra prediction mode corresponds to one transformation set. According to this design, the residual corresponding to each intra prediction mode can obtain a more suitable transformation set, and the compression performance will also be improved.
  • LFNST is only an example of quadratic transformation and should not be understood as a limitation on quadratic transformation.
  • LFNST is a non-separable secondary transformation.
  • a separable secondary transformation can also be used to improve the compression efficiency of the residual of oblique textures. This application does not specifically limit this.
  • Figure 8 is a schematic block diagram of the decoding framework 200 provided by the embodiment of the present application.
  • the decoding framework 200 may include an entropy decoding unit 210, an inverse transform and inverse quantization unit 220, a residual unit 230, an intra prediction unit 240, an inter prediction unit 250, a loop filtering unit 260, and a decoded image buffer. Unit 270.
  • the entropy decoding unit 210 receives and parses the code stream to obtain the prediction block and the frequency domain residual block.
  • the inverse transform and inverse quantization unit 220 performs steps such as inverse transformation and inverse quantization on the frequency domain residual block, and the time domain residual block can be obtained.
  • the residual unit 230 superposes the prediction block predicted by the intra prediction unit 240 or the inter prediction unit 250 onto the time domain residual block obtained by the inverse transform and inverse quantization unit 220 through inverse transformation and inverse quantization, so that the reconstruction block can be obtained.
  • Figure 9 is a schematic flow chart of the decoding method 300 provided by the embodiment of the present application. It should be understood that the decoding method 300 can be performed by a decoder. For example, the decoding method 300 may be executed by the decoding framework 200 shown in FIG. 8. For the convenience of description, the following takes the decoder as an example.
  • the decoding method 300 may include:
  • the first intra prediction mode includes any one of the following: an intra prediction mode derived by using a decoder side intra mode derivation DIMD mode for the prediction block of the current block, an intra prediction mode derived by using the DIMD mode for the output vector of the optimal matrix-based intra prediction MIP mode used for predicting the current block, an intra prediction mode derived by using the DIMD mode for the reconstructed samples within the first template region adjacent to the current block, or an intra prediction mode derived from a template-based intra mode derivation TIMD mode;
  • S350 Determine the reconstruction block of the current block based on the prediction block of the current block and the residual block of the current block.
  • when the decoder uses the DIMD mode for the reconstructed samples in the first template area, the gradient values of the reconstructed samples in the first template area can be calculated first, and then the intra prediction mode that matches the gradient direction of the reconstructed sample with the largest gradient value among the reconstructed samples in the first template region is determined as the intra prediction mode derived using the DIMD mode.
  • alternatively, the decoder can calculate the gradient value corresponding to each intra prediction mode by traversing the intra prediction modes based on the reconstructed samples in the first template area, and determine the intra prediction mode with the largest gradient value as the intra prediction mode derived using the DIMD mode.
  • similarly, when the DIMD mode is used for the prediction block of the current block (or the output vector of the optimal MIP mode), the decoder may first calculate the gradient values of the prediction samples within the prediction block of the current block (or the output vector of the optimal MIP mode), and then determine the intra prediction mode that matches the gradient direction of the prediction sample with the largest gradient value as the intra prediction mode derived using the DIMD mode.
  • alternatively, the decoder can calculate the gradient value corresponding to each intra prediction mode by traversing the intra prediction modes based on the prediction samples in the prediction block of the current block (or the output vector of the optimal MIP mode), and determine the intra prediction mode with the largest gradient value as the intra prediction mode derived using the DIMD mode. A minimal sketch of this gradient-based derivation is given below.
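  • The following sketch illustrates one way such a gradient-based DIMD derivation can be organized: Sobel gradients are accumulated into a histogram over the angular modes and the mode with the largest accumulated amplitude is returned. The Sobel operators and the direction-to-mode binning are assumptions of this sketch, not the exact DIMD specification.

```python
import numpy as np

def dimd_derive_mode(samples: np.ndarray, num_angular_modes: int = 65) -> int:
    """Minimal DIMD-style sketch: accumulate gradient amplitudes per angular
    mode over a sample array (template or prediction block) and return the
    mode with the largest accumulated amplitude (illustrative binning)."""
    hist = np.zeros(num_angular_modes + 2)  # angular modes 2..66; 0/1 unused
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
    sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
    h, w = samples.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = samples[y - 1:y + 2, x - 1:x + 2].astype(np.int64)
            gx = int((win * sobel_x).sum())
            gy = int((win * sobel_y).sum())
            if gx == 0 and gy == 0:
                continue
            angle = np.arctan2(gy, gx) % np.pi          # gradient direction in [0, pi)
            mode = 2 + int(round(angle / np.pi * (num_angular_modes - 1)))
            mode = min(mode, num_angular_modes + 1)
            hist[mode] += abs(gx) + abs(gy)             # gradient amplitude
    return int(np.argmax(hist))
```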
  • the first transformation is used to process texture along an oblique direction in the current block.
  • the second transformation is used to process the texture along the horizontal direction and the texture along the vertical direction in the current block.
  • the first transformation is the inverse transformation of the secondary transformation at the coding end
  • the second transformation is the inverse transformation of the basic transformation at the coding end.
  • the first transformation may be an inverse LFNST
  • the second transformation may be an inverse DCT2 type, an inverse DCT8 type, or an inverse DST7 type, etc.
  • the TMMIP technology is not limited to LFNST and is also applicable to other secondary transformation methods.
  • LFNST is a non-separable secondary transformation.
  • TMMIP technology can also be applied to separable secondary transformations, which is not specifically limited in this application.
  • when the encoder or decoder predicts the current block with a mode such as the MIP mode, it is possible to use the transformation set corresponding to the PLANAR mode to perform LFNST.
  • the transformation kernels used by LFNST are obtained through deep-learning training on data sets associated with the traditional intra prediction modes. Therefore, in the ordinary intra prediction process, the transformation kernel used by LFNST is usually also a transformation kernel selected from the LFNST transformation set corresponding to the traditional intra prediction mode.
  • the encoder or decoder may use a non-traditional intra prediction mode to predict the current block. In this case, considering that the planar mode is usually used to process blocks with gradient textures, while LFNST is used to process blocks with oblique textures,
  • the texture information of the prediction block output by the non-traditional intra prediction mode and the texture information of the planar mode among the traditional intra prediction modes are usually processed as one type of texture; that is, when the encoder or decoder uses a non-traditional intra prediction mode, the transformation set corresponding to the planar mode is used to perform LFNST.
  • for example, when the encoder uses the MIP mode to predict the current block, it uses the transformation set corresponding to the planar mode to perform LFNST.
  • however, the meaning represented by the MIP mode is different from that of the traditional intra prediction modes, that is, a traditional intra prediction mode has obvious directionality, while the MIP mode is only an index of matrix coefficients.
  • in addition, the planar mode is used to process blocks with gradient textures, which do not necessarily conform to the texture information of the current block. That is, the texture direction of the transformation set used by LFNST does not necessarily conform to the texture direction of the current block, which reduces the decompression performance of the current block.
  • based on this, embodiments of the present application introduce the first intra prediction mode and perform the first transformation on the first transformation coefficient of the current block based on the transformation set corresponding to the first intra prediction mode, which can further improve the decompression performance of the current block.
  • in other words, when the decoder uses a non-traditional intra prediction mode to predict the current block, it can avoid directly using the transformation set corresponding to the planar mode to perform the first transformation, and the transformation set corresponding to the first intra prediction mode can reflect the texture direction of the current block to a certain extent, thereby improving the decompression performance of the current block.
  • Table 1 shows the results obtained by testing the test sequences when the optimal MIP mode and the suboptimal MIP mode are used to perform weighted prediction on the current block and the first intra prediction mode is designed as the intra prediction mode derived using the DIMD mode for the prediction block of the current block.
  • Table 2 shows the results obtained by testing the test sequences when the optimal MIP mode and the intra prediction mode derived using the DIMD mode for the reconstructed samples in the first template area adjacent to the current block are used to perform weighted prediction on the current block.
  • a negative delta bit rate represents a performance improvement of the solution provided by this application relative to the ECM2.0 test results. It can be seen from the test results that, under the general test conditions, the results in Table 1 and Table 2 provide an average luma performance gain of 0.20%, and the 4K sequences perform well. It is worth noting that the TIMD prediction mode integrated in ECM2.0 has a higher complexity than ECM1.0 and only brings a performance gain of 0.4%. In the current situation where intra coding performance gains are becoming increasingly difficult to obtain, the solution provided by this application can bring good performance gains without increasing the complexity of the decoder; especially for 4K video sequences, the performance gains are obvious. In addition, owing to server load, even if the encoding and decoding times fluctuate slightly, the decoding time will theoretically remain basically unchanged.
  • the output vector of the optimal MIP mode is the vector output by the optimal MIP mode before upsampling; or, the output vector of the optimal MIP mode is the upsampled vector output by the optimal MIP mode.
  • the process of using the DIMD mode to derive an intra prediction mode from the output vector of the optimal MIP mode may be performed before the vector output by the optimal MIP mode is upsampled, or may be performed after the vector output by the optimal MIP mode is upsampled, which is not specifically limited in this application.
  • after the decoder feeds the reference samples into the prediction matrix of the optimal MIP mode and obtains the output vector, the vector output by the optimal MIP mode has at most 64 prediction samples.
  • therefore, the decoder can reduce computational complexity by deriving the first intra prediction mode with the DIMD mode before upsampling the vector output by the optimal MIP mode, which in turn improves the decompression performance of the current block.
  • that is, the decoder can effectively reduce computational complexity by using the DIMD mode to calculate the gradient magnitude value of each traditional prediction mode before upsampling.
  • the S350 may include:
  • the decoder determines the first intra prediction mode based on a prediction mode used to predict the current block.
  • the decoder determines the first intra prediction mode based on a mode type of a prediction mode used to predict the current block.
  • the decoder determines the first intra prediction mode based on a derived mode of a prediction mode used to predict the current block.
  • the derivation mode of the prediction mode used to predict the current block includes but is not limited to: MIP mode, the DIMD mode and the TIMD mode.
  • in some embodiments, the decoder determines that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the prediction block of the current block, or the decoder determines that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the output vector of the optimal MIP mode.
  • that is to say, the decoder determines the first intra prediction mode as the intra prediction mode derived using the DIMD mode for the prediction block of the current block, or as the intra prediction mode derived using the DIMD mode for the output vector of the optimal MIP mode.
  • if the decoder gives priority to using the intra prediction mode derived using the DIMD mode for the prediction block of the current block as the first intra prediction mode, the texture direction of the transformation set corresponding to the first intra prediction mode can simultaneously fit the texture characteristics that the optimal MIP mode causes the prediction block of the current block to exhibit and the texture characteristics that the suboptimal MIP mode causes the prediction block of the current block to exhibit, which can improve the decompression performance of the current block as much as possible; if the decoder gives priority to using the intra prediction mode derived using the DIMD mode for the output vector of the optimal MIP mode as the first intra prediction mode, the output vector of the optimal MIP mode can be obtained directly in the process of determining the optimal MIP mode, which is equivalent to enabling the texture direction of the transformation set corresponding to the first intra prediction mode to fit the texture characteristics that the optimal MIP mode causes the prediction block of the current block to exhibit on the basis of reducing the decompression complexity, thereby improving the decompression performance of the current block as much as possible.
  • of course, in other alternative embodiments, the decoder may also determine the intra prediction mode derived using the DIMD mode for the reconstructed samples in the first template region, or the intra prediction mode derived from the TIMD mode, as the first intra prediction mode, which is not specifically limited in this application.
  • in some embodiments, if the prediction mode used to predict the current block includes the optimal MIP mode and an intra prediction mode derived from the TIMD mode, the decoder determines that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the prediction block of the current block, or the decoder determines that the first intra prediction mode is the intra prediction mode derived from the TIMD mode.
  • that is to say, the decoder determines the first intra prediction mode as the intra prediction mode derived using the DIMD mode for the prediction block of the current block, or as the intra prediction mode derived from the TIMD mode.
  • if the decoder preferentially uses the intra prediction mode derived using the DIMD mode for the prediction block of the current block as the first intra prediction mode, the texture direction of the transformation set corresponding to the first intra prediction mode can simultaneously fit the texture characteristics that the optimal MIP mode causes the prediction block of the current block to exhibit and the texture characteristics that the intra prediction mode derived from the TIMD mode causes the prediction block of the current block to exhibit, thereby improving the decompression performance of the current block as much as possible; if the decoder preferentially uses the intra prediction mode derived from the TIMD mode, i.e., the second intra prediction mode, the second intra prediction mode can be directly determined as the first intra prediction mode, which is equivalent to enabling the texture direction of the transformation set corresponding to the first intra prediction mode to fit the texture characteristics that the optimal MIP mode causes the prediction block of the current block to exhibit on the basis of reducing the decompression complexity, thereby improving the decompression performance of the current block as much as possible.
  • of course, in other alternative embodiments, the decoder may also determine the intra prediction mode derived using the DIMD mode for the output vector of the optimal MIP mode, or the intra prediction mode derived using the DIMD mode for the reconstructed samples in the first template area, as the first intra prediction mode, which is not specifically limited in this application.
  • in some embodiments, the decoder determines that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the prediction block of the current block, or the decoder determines that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the reconstructed samples within the first template region.
  • that is to say, the decoder determines the first intra prediction mode as the intra prediction mode derived using the DIMD mode for the prediction block of the current block, or as the intra prediction mode derived using the DIMD mode for the reconstructed samples within the first template region.
  • if the decoder gives priority to using the intra prediction mode derived using the DIMD mode for the prediction block of the current block as the first intra prediction mode, the texture direction of the transformation set corresponding to the first intra prediction mode can simultaneously fit the texture characteristics that the optimal MIP mode causes the prediction block of the current block to exhibit and the texture characteristics that the intra prediction mode derived using the DIMD mode causes the prediction block of the current block to exhibit, thereby improving the decompression performance of the current block as much as possible; if the decoder preferentially uses the intra prediction mode derived using the DIMD mode for the reconstructed samples in the first template area as the first intra prediction mode, the second intra prediction mode can be directly determined as the first intra prediction mode, which is equivalent to enabling the texture direction of the transformation set corresponding to the first intra prediction mode to fit the texture characteristics that the optimal MIP mode causes the prediction block of the current block to exhibit on the basis of reducing the decompression complexity, thereby improving the decompression performance of the current block as much as possible.
  • of course, in other alternative embodiments, the decoder may also determine the intra prediction mode derived using the DIMD mode for the output vector of the optimal MIP mode, or the intra prediction mode derived from the TIMD mode, as the first intra prediction mode, which is not specifically limited in this application. The selection among these options can be organized as a simple priority rule, as sketched below.
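  • The sketch below illustrates one way the above priority rules could be organized in code. The case split by the type of the second intra prediction mode, the argument names, and the preference flag are assumptions made for this sketch; they are not a normative description of the embodiments.

```python
from enum import Enum, auto

class SecondMode(Enum):
    SUBOPT_MIP = auto()     # suboptimal MIP mode
    TIMD = auto()           # intra mode derived from the TIMD mode
    DIMD_TEMPLATE = auto()  # DIMD applied to the first template region

def select_first_intra_mode(second_mode: SecondMode,
                            prefer_prediction_block: bool,
                            dimd_on_pred_block: int,
                            dimd_on_mip_output: int,
                            timd_mode: int,
                            dimd_on_template: int) -> int:
    """Illustrative priority rule for choosing the first intra prediction mode."""
    if prefer_prediction_block:
        # best fit to both texture characteristics of the fused prediction
        return dimd_on_pred_block
    # otherwise reuse a mode that is already available, to reduce complexity
    if second_mode is SecondMode.SUBOPT_MIP:
        return dimd_on_mip_output   # available while deriving the optimal MIP mode
    if second_mode is SecondMode.TIMD:
        return timd_mode            # available from the TIMD derivation
    return dimd_on_template         # available from the template DIMD derivation
```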
  • the method 300 may further include:
  • the second intra prediction mode includes any one of the following: a suboptimal MIP mode for predicting the current block, an intra prediction mode derived using the DIMD mode for the reconstructed samples in the first template region, or an intra prediction mode derived from the TIMD mode;
  • the current block is predicted based on the optimal MIP mode and the second intra prediction mode to obtain a prediction block of the current block.
  • in this application, the process in which the decoder predicts the current block based on the optimal MIP mode and determines the second intra prediction mode is referred to as the template matching MIP (Template Matching MIP, TMMIP) technology, the template-matching-based MIP prediction mode derivation method, or the TMMIP fusion enhancement technology.
  • that is to say, after obtaining the residual block of the current block, the decoder can predict the current block based on the derived optimal MIP mode and the second intra prediction mode, so as to enhance the performance of the prediction process.
  • the TMMIP technology can use the optimal MIP mode together with at least one of the following to enhance the performance of the prediction process of the current block: the suboptimal MIP mode, the intra prediction mode derived from the TIMD mode, and the intra prediction mode derived using the DIMD mode for the reconstructed samples within the first template region adjacent to the current block.
  • in this application, the decoder predicts the current block based on the optimal MIP mode and the second intra prediction mode; the optimal MIP mode is designed as the optimal MIP mode for predicting the current block determined based on the distortion costs of multiple MIP modes, and the second intra prediction mode is designed to include at least one of the following: the suboptimal MIP mode for predicting the current block determined based on the distortion costs of the plurality of MIP modes, the intra prediction mode derived using the DIMD mode for the reconstructed samples in the first template region, and the intra prediction mode derived from the TIMD mode.
  • compared with the traditional MIP technology, deriving the MIP mode in this way can effectively reduce the bit overhead at the coding unit level, thereby improving the decompression efficiency of the current block.
  • the bit overhead of the MIP mode is larger than that of other intra prediction modes: it not only requires a usage flag bit to indicate whether the MIP mode is used and a transposition flag bit to indicate whether the MIP mode is transposed, but also, as the largest part of the overhead, requires truncated binary encoding to represent the index of the MIP mode.
  • the MIP mode is a simplified technology based on neural network technology and is quite different from traditional interpolation-filter prediction technology. For some special textures, the MIP mode often works better than the traditional intra prediction modes; however, its large flag bit overhead is a drawback of the MIP mode.
  • for example, the MIP mode has 16 prediction modes, but its bit overhead includes 1 MIP mode usage flag, 1 MIP mode transposition flag, and a 5- or 6-bit truncated binary index.
  • in this application, the decoder independently determines the optimal MIP mode for predicting the current block and determines the intra prediction mode of the current block based on the optimal MIP mode, which can save up to 5 or 6 bits of overhead per coding unit; this effectively reduces the bit overhead at the coding unit level, thereby improving decompression efficiency.
  • the premise of saving up to 5 or 6 bits of overhead for each coding unit is that the derivation algorithm based on the template matching prediction mode is accurate enough. If the accuracy of the derivation algorithm based on the template matching prediction mode is too low, the MIP mode derived by the decoding end will be different from the MIP mode derived by the encoding end, thereby reducing the encoding and decoding performance. In other words, the encoding and decoding performance depends on the accuracy of the derivation algorithm based on template matching prediction modes.
  • at present, the accuracy of both the template-based derivation algorithm for the traditional intra prediction modes and the template-matching-based derivation algorithm for inter prediction is unsatisfactory.
  • although a derivation algorithm based on template matching prediction modes can save bit overhead and improve compression efficiency, with the development of such algorithms, the additional bit overhead they introduce at the coding unit level has made it difficult for subsequent technologies to rely solely on derivation algorithms based on template matching prediction modes to improve compression efficiency. Therefore, the derivation algorithm based on template matching prediction modes urgently needs to improve the encoding and decoding performance on the basis of improving compression efficiency.
  • this application addresses the problem by fusing the optimal MIP mode and the second intra prediction mode, that is, performing fusion prediction on the current block based on the optimal MIP mode and the second intra prediction mode.
  • this avoids completely replacing the optimal prediction mode calculated based on the rate-distortion cost with the optimal MIP mode, and can take both prediction accuracy and prediction diversity into account, thereby improving decompression performance.
  • the TMMIP technology predicts the current block by combining the optimal MIP mode and the second intra prediction mode, and the prediction blocks obtained by predicting the current block with different prediction modes may have different texture characteristics. Therefore, if the TMMIP technology is selected for the current block, the optimal MIP mode may cause the prediction block of the current block to exhibit one texture characteristic, and the second intra prediction mode may cause the prediction block of the current block to exhibit another texture characteristic.
  • in other words, after the current block is predicted, from a statistical point of view, the residual block of the current block will also show two texture characteristics, that is, the residual block of the current block does not necessarily conform to the rule that a single prediction mode can reflect.
  • based on this, the transformation set corresponding to the first intra prediction mode can simultaneously fit the texture characteristics that the optimal MIP mode causes the prediction block of the current block to exhibit and the texture characteristics that the second intra prediction mode causes the prediction block of the current block to exhibit, thereby improving the decompression performance of the current block.
  • in addition, when the first intra prediction mode is the intra prediction mode derived using the DIMD mode for the reconstructed samples in the first template area or the intra prediction mode derived from the TIMD mode, the first intra prediction mode can be determined directly during the process of determining the second intra prediction mode, which is equivalent to enabling, on the basis of reducing the decompression complexity, the texture direction of the transformation set corresponding to the first intra prediction mode to simultaneously fit the texture characteristics that the optimal MIP mode causes the prediction block of the current block to exhibit and the texture characteristics that the second intra prediction mode causes the prediction block of the current block to exhibit, thereby improving the decompression efficiency.
  • in some embodiments, the decoder first predicts the current block based on the optimal MIP mode to obtain a first prediction block, and predicts the current block based on the second intra prediction mode to obtain a second prediction block; then, based on the weight of the optimal MIP mode and the weight of the second intra prediction mode, the decoder performs weighting processing on the first prediction block and the second prediction block to obtain the prediction block of the current block.
  • the method 300 may further include:
  • if the prediction mode used to predict the current block includes the optimal MIP mode and a suboptimal MIP mode used to predict the current block or an intra prediction mode derived from the TIMD mode, the weight of the optimal MIP mode and the weight of the second intra prediction mode are determined based on the distortion cost of the optimal MIP mode and the distortion cost of the second intra prediction mode; if the prediction mode used to predict the current block includes the optimal MIP mode and the intra prediction mode derived using the DIMD mode for the reconstructed samples in the first template area, the weight of the optimal MIP mode and the weight of the second intra prediction mode are both preset values.
  • specifically, the decoder predicts the current block based on the optimal MIP mode to obtain a first prediction block, and predicts the current block based on the second intra prediction mode to obtain a second prediction block; then, the decoder performs weighting processing on the first prediction block and the second prediction block based on the weight of the optimal MIP mode and the weight of the second intra prediction mode to obtain the prediction block of the current block.
  • the decoder may directly perform intra prediction on the current block based on the optimal MIP mode to obtain the first prediction block.
  • the decoder can directly obtain the optimal prediction mode and the suboptimal prediction mode based on the TIMD mode, predict the current block, and obtain the second prediction block.
  • if the distortion cost of the suboptimal prediction mode is less than twice the distortion cost of the optimal prediction mode, a prediction block fusion operation is required; that is, the decoder can first perform intra prediction on the current block according to the optimal prediction mode to obtain the optimal prediction block; secondly, perform intra prediction on the current block based on the suboptimal prediction mode to obtain the suboptimal prediction block; then use the ratio between the distortion cost of the optimal prediction mode and the distortion cost of the suboptimal prediction mode to calculate the weight of the optimal prediction block and the weight of the suboptimal prediction block;
  • finally, the optimal prediction block and the suboptimal prediction block are weighted and fused to obtain the second prediction block.
  • if the optimal prediction mode or the suboptimal prediction mode is the planar mode or the DC mode, or the distortion cost of the suboptimal prediction mode is greater than twice the distortion cost of the optimal prediction mode, there is no need to perform a prediction block fusion operation, that is, the optimal prediction block obtained based on the optimal prediction mode may be directly used as the second prediction block.
  • the decoder After obtaining the first prediction block and the second prediction block, the decoder performs weighting processing on the first prediction block and the second prediction block to obtain the prediction block of the current block.
  • specifically, if the prediction mode used to predict the current block includes the optimal MIP mode and the suboptimal MIP mode or the intra prediction mode derived from the TIMD mode, the decoder determines the weight of the optimal MIP mode and the weight of the second intra prediction mode based on the distortion cost of the optimal MIP mode and the distortion cost of the second intra prediction mode; if the prediction mode used to predict the current block includes the optimal MIP mode and the intra prediction mode derived using the DIMD mode for the reconstructed samples in the first template region, the decoder determines that the weight of the optimal MIP mode and the weight of the second intra prediction mode are both preset values. A minimal sketch of this weighted fusion is given below.
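  • The sketch below shows one possible form of the weighted fusion: when both distortion costs are available the weights are derived from their ratio, otherwise preset equal weights are used. The cost-to-weight formula, the 10-bit sample range, and the equal preset weights are assumptions of this sketch rather than values taken from the embodiments.

```python
from typing import Optional
import numpy as np

def fuse_predictions(pred_mip: np.ndarray, pred_second: np.ndarray,
                     cost_mip: Optional[float] = None,
                     cost_second: Optional[float] = None) -> np.ndarray:
    """Weighted fusion of the first prediction block (optimal MIP mode)
    and the second prediction block (second intra prediction mode)."""
    if cost_mip is not None and cost_second is not None:
        # a mode with a smaller distortion cost receives a larger weight
        total = cost_mip + cost_second
        w_mip, w_second = cost_second / total, cost_mip / total
    else:
        w_mip = w_second = 0.5  # preset weights
    fused = w_mip * pred_mip.astype(np.float64) + w_second * pred_second.astype(np.float64)
    return np.clip(np.rint(fused), 0, 1023).astype(np.int32)  # 10-bit samples assumed
```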
  • the S320 may include:
  • the decoder parses the code stream of the current sequence to obtain a first identifier; if the first identifier is used to identify that the optimal MIP mode and the second intra prediction mode are allowed to be used to predict image blocks in the current sequence, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
  • if the value of the first identifier is a first numerical value, it is used to identify that the optimal MIP mode and the second intra prediction mode are allowed to be used to predict the image blocks in the current sequence; if the value of the first identifier is a second numerical value, it is used to identify that the optimal MIP mode and the second intra prediction mode are not allowed to be used to predict image blocks in the current sequence.
  • the first value is 1 and the second value is 0. In another implementation, the first value is 0 and the second value is 1.
  • the first numerical value and the second numerical value can also be other numerical values, which are not limited in this application.
  • the first identification is true, it is used to identify that the optimal MIP mode and the second intra prediction mode are allowed to be used to predict the image block in the current sequence; If the flag is false, it is used to flag that the optimal MIP mode and the second intra prediction mode are not allowed to be used to predict the image block in the current sequence.
  • for example, the decoder parses the block-level identifiers; if the current block adopts the intra prediction mode, it parses or obtains the first identifier, and if the first identifier is true, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
  • the first flag is recorded as sps_timd_enable_flag.
  • for example, the decoder parses or obtains sps_timd_enable_flag; if sps_timd_enable_flag is true, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
  • the first identifier is a sequence-level identifier.
  • the description that the first identifier is used to identify that the optimal MIP mode and the second intra prediction mode are allowed to be used to predict the image blocks in the current sequence can also be replaced by a description with a similar or identical meaning.
  • for example, it may be replaced by any one of the following: the first identifier is used to identify that the TMMIP technology is allowed to be used to determine the intra prediction mode of the image blocks in the current sequence, the first identifier is used to identify that the TMMIP technology is allowed to be used to perform intra prediction on the image blocks in the current sequence, the first identifier is used to identify that the image blocks in the current sequence are allowed to use the TMMIP technology, and the first identifier is used to identify that the MIP mode determined based on the multiple MIP modes is allowed to be used to predict the image blocks in the current sequence.
  • the permission flag bits of other technologies can also be used to indirectly indicate whether the current sequence is allowed to use the TMMIP technology.
  • taking the TIMD technology as an example, when the first identifier is used to indicate that the current sequence is allowed to use the TIMD technology, it means that the current sequence is also allowed to use the TMMIP technology; in other words, when the first identifier is used to indicate that the current sequence is allowed to use the TIMD technology, it means that the current sequence is allowed to use the TIMD technology and the TMMIP technology at the same time, so as to further save bit overhead.
  • in some embodiments, the decoder parses the code stream to obtain a second identifier; if the second identifier is used to identify that the optimal MIP mode and the second intra prediction mode are allowed to be used to predict the current block, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
  • for example, the decoder parses the block-level identifiers; if the current block adopts the intra prediction mode, the decoder parses or obtains the first identifier; if the first identifier is true, the decoder parses or obtains the second identifier; and if the second identifier is true, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
  • if the value of the second identifier is a third numerical value, it is used to identify that the optimal MIP mode and the second intra prediction mode are allowed to be used to predict the current block; if the value of the second identifier is a fourth numerical value, it is used to identify that the optimal MIP mode and the second intra prediction mode are not allowed to be used to predict the current block.
  • the third value is 1 and the fourth value is 0.
  • the third value is 0 and the fourth value is 1.
  • the third numerical value and the fourth numerical value can also be other numerical values, which are not specifically limited in this application.
  • if the second identifier is true, it is used to identify that the optimal MIP mode and the second intra prediction mode are allowed to be used to predict the current block; if the second identifier is false, it is used to identify that the optimal MIP mode and the second intra prediction mode are not allowed to be used to predict the current block.
  • for example, the decoder parses or obtains sps_timd_enable_flag; if sps_timd_enable_flag is true, the decoder can parse or obtain cu_timd_enable_flag, and if cu_timd_enable_flag is true, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
  • the second identification is a block-level identification or a coding unit-level identification.
  • the description that the second identifier is used to identify that the optimal MIP mode and the second intra prediction mode are allowed to be used to predict the current block may also be replaced by a description with a similar or identical meaning.
  • for example, it may be replaced by any one of the following: the second identifier is used to identify that the TMMIP technology is allowed to be used to determine the intra prediction mode of the current block, the second identifier is used to identify that the TMMIP technology is allowed to be used to perform intra prediction on the current block, the second identifier is used to identify that the current block is allowed to use the TMMIP technology, and the second identifier is used to identify that the MIP mode determined based on the multiple MIP modes is allowed to be used to predict the current block.
  • whether the current block is allowed to use the TMMIP technology can also be indirectly indicated through the permission flag bits of other technologies.
  • taking the TIMD technology as an example, when the second identifier is used to indicate that the current block is allowed to use the TIMD technology, it means that the current block is also allowed to use the TMMIP technology; in other words, when the second identifier is used to indicate that the current block is allowed to use the TIMD technology, it means that the current block is allowed to use the TIMD technology and the TMMIP technology at the same time, so as to further save bit overhead.
  • it should be noted that, when parsing the second identifier, the decoding end may parse the second identifier before parsing the residual block of the current block, or may parse the second identifier after parsing the residual block of the current block, which is not specifically limited in this application. A minimal sketch of the flag-based gating is given below.
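  • The sketch below shows how the sequence-level and block-level identifiers described above could gate the template-matching MIP derivation. The toy BitstreamReader and the decision to pass the sequence-level flag as a parameter are assumptions of this sketch; only the flag names sps_timd_enable_flag and cu_timd_enable_flag come from the text.

```python
class BitstreamReader:
    """Toy reader used only for this sketch: read_flag() returns the next
    1-bit flag from a pre-parsed list of bits."""
    def __init__(self, bits):
        self.bits = list(bits)
        self.pos = 0

    def read_flag(self) -> bool:
        flag = bool(self.bits[self.pos])
        self.pos += 1
        return flag

def tmmip_enabled_for_block(sps_timd_enable_flag: bool,
                            block_is_intra: bool,
                            reader: BitstreamReader) -> bool:
    """TMMIP is considered for the current block only if the sequence-level
    identifier is true, the block uses intra prediction, and the block-level
    identifier (cu_timd_enable_flag) is also true."""
    if not (sps_timd_enable_flag and block_is_intra):
        return False
    cu_timd_enable_flag = reader.read_flag()  # block/coding-unit-level identifier
    return cu_timd_enable_flag

# usage with a toy bit sequence containing a single block-level flag
print(tmmip_enabled_for_block(True, True, BitstreamReader([1])))  # True
```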
  • the method 300 may further include:
  • in some embodiments, the decoder determines the optimal MIP mode based on the distortion costs of multiple MIP modes, where the distortion costs of the multiple MIP modes include the distortion costs obtained by using the multiple MIP modes to predict the samples in the second template area adjacent to the current block.
  • before determining the optimal MIP mode for predicting the current block based on the distortion costs of the multiple MIP modes, the decoder needs to calculate the distortion cost of each of the multiple MIP modes and sort the multiple MIP modes according to the distortion cost of each MIP mode; the MIP mode with the smallest cost is the optimal prediction result.
  • the distortion cost involved in the decoder in this application is different from the rate distortion cost (RDcost) involved in the encoder.
  • the rate distortion cost is used by the encoding end to determine a certain intra prediction technology among multiple intra prediction technologies.
  • the rate distortion cost can be the cost value obtained by comparing the distorted image with the original image. Since the decoder cannot obtain the original image, the distortion cost involved at the decoder can be based on the difference between reconstructed samples and predicted samples,
  • for example the sum of absolute transformed differences (SATD) cost between reconstructed samples and predicted samples, or another cost that can be used to measure the difference between reconstructed samples and predicted samples.
  • in some embodiments, the decoder first determines the arrangement order of the multiple MIP modes based on the distortion costs of the MIP modes, then determines the encoding method used by the optimal MIP mode based on the arrangement order of the multiple MIP modes, and then decodes the code stream of the current sequence based on the encoding method used by the optimal MIP mode to obtain the index of the optimal MIP mode.
  • the codeword length of the encoding method used by the first n MIP modes in the arrangement order is smaller than the codeword length of the encoding method used by the MIP modes after the nth MIP mode in the arrangement order; and/or, the first n MIP modes use variable length encoding and the MIP modes after the nth MIP mode use truncated binary encoding.
  • n can be any value greater than or equal to 1.
  • the index of the MIP mode is usually binarized and written in a truncated binary manner. This encoding method is closer to equal probability encoding, that is, it divides all prediction modes into two segments.
  • in this application, the decoder may first calculate the distortion cost of each of the multiple MIP modes and sort the multiple MIP modes according to the distortion cost of each MIP mode; then, based on the sorting of the multiple MIP modes, the decoder can choose a more flexible variable length encoding method. Compared with the equal-probability encoding method, flexibly setting the encoding method of the MIP mode helps to save the bit overhead of the index of the MIP mode.
  • in some embodiments, the arrangement order is obtained by the decoder arranging the plurality of MIP modes in order of distortion cost from small to large. The smaller the distortion cost of a MIP mode, the greater the probability that the encoder will use it for intra prediction of the current block. Therefore, the codeword length of the encoding method used by the first n MIP modes in the arrangement order is designed to be smaller than the codeword length used by the MIP modes after the nth MIP mode, or the MIP modes after the nth MIP mode are designed to use truncated binary encoding; equivalently, the MIP modes that the encoder uses with high probability use a shorter codeword length or a variable length encoding method, which can save the bit overhead of the index of the MIP mode and improve the decompression performance. A minimal sketch of this ordering and codeword assignment is given below.
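  • The sketch below illustrates the ordering idea: candidate MIP modes are sorted by their template distortion cost and the first n modes receive shorter codewords, while the remaining modes keep truncated-binary-sized codewords. The concrete codeword lengths (a unary-like prefix plus a truncated-binary escape) are illustrative assumptions for this sketch.

```python
import math

def assign_codeword_lengths(satd_costs: dict, n: int = 2, total_modes: int = 16):
    """Sort MIP mode indices by template distortion cost and assign
    illustrative codeword lengths: the first n modes get short variable-length
    codewords, the rest keep truncated-binary-sized codewords."""
    ordered = sorted(satd_costs, key=satd_costs.get)   # smallest cost first
    tb_bits = math.ceil(math.log2(total_modes))        # truncated-binary upper bound
    lengths = {}
    for rank, mode in enumerate(ordered):
        if rank < n:
            lengths[mode] = rank + 1                   # e.g. '0', '10', ... (unary-like prefix)
        else:
            lengths[mode] = n + tb_bits                # prefix escape + truncated binary index
    return ordered, lengths

# usage: four candidate modes with toy SATD costs
order, lens = assign_codeword_lengths({5: 120, 9: 80, 12: 200, 3: 95}, n=2, total_modes=16)
print(order)  # [9, 3, 5, 12]
print(lens)   # {9: 1, 3: 2, 5: 6, 12: 6}
```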
  • the method 300 may further include:
  • in some embodiments, the decoder determines whether to use the suboptimal MIP mode to predict the current block based on the distortion cost of the optimal MIP mode and the distortion cost of the suboptimal MIP mode; if it determines not to use the suboptimal MIP mode, the decoder can directly predict the current block based on the optimal MIP mode; if it determines to use the suboptimal MIP mode, the decoder can predict the current block based on the optimal MIP mode and the suboptimal MIP mode to obtain the prediction block of the current block.
  • the decoder can directly predict the current block based on the optimal MIP mode to obtain the prediction block of the current block.
  • for example, if the second intra prediction mode is the suboptimal MIP mode, and the ratio between the distortion cost of the suboptimal MIP mode and the distortion cost of the optimal MIP mode is greater than or equal to a preset ratio, the decoder can directly predict the current block based on the optimal MIP mode to obtain the prediction block of the current block.
  • if the distortion cost of the suboptimal MIP mode is greater than or equal to a certain multiple (for example, twice) of the distortion cost of the optimal MIP mode, it can be interpreted that the suboptimal MIP mode has large distortion and is not suitable for the current block; that is, the fusion enhancement technology is not needed and only the optimal MIP mode is used to predict the current block.
  • in this application, the decoder determines whether to use the suboptimal MIP mode to predict the current block based on the distortion cost of the optimal MIP mode and the distortion cost of the suboptimal MIP mode, which is equivalent to the decoder determining, based on these distortion costs, whether to use the suboptimal MIP mode to enhance the performance of the optimal MIP mode. This avoids carrying in the code stream an identifier for determining whether to use the suboptimal MIP mode for performance enhancement, saves bit overhead, and thereby enhances the decompression performance; the decision is sketched below.
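  • A minimal sketch of this cost-ratio decision, assuming the example threshold of twice the optimal cost mentioned above (the function name and the default threshold are illustrative):

```python
def should_fuse_with_suboptimal(cost_opt: float, cost_subopt: float,
                                ratio_threshold: float = 2.0) -> bool:
    """Fuse the optimal and suboptimal MIP modes only when the suboptimal
    mode is not too much worse than the optimal one."""
    return cost_subopt < ratio_threshold * cost_opt

# usage
print(should_fuse_with_suboptimal(100.0, 150.0))  # True  -> weighted fusion
print(should_fuse_with_suboptimal(100.0, 260.0))  # False -> use only the optimal MIP mode
```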
  • the second template area and the first template area are the same or different.
  • the size of the second template area may be predefined according to the size of the current block.
  • for example, the width of the area in the second template area adjacent to the upper side of the current block is the width of the current block and its height is the height of at least one row of samples; for another example, the height of the area in the second template area adjacent to the left side of the current block is the height of the current block and its width is the width of two columns of samples.
  • the second template area can also be implemented as a second template area of other sizes or sizes, which is not specifically limited in this application.
  • the method 300 may further include:
  • in some embodiments, the decoder predicts the samples in the second template area based on the third identifier and the multiple MIP modes to obtain the distortion costs of the multiple MIP modes in each state of the third identifier; the third identifier is used to identify whether to transpose the input vector and the output vector of the MIP mode; the decoder then determines the optimal MIP mode based on the distortion costs of the multiple MIP modes in each state of the third identifier.
  • that is to say, before determining the optimal MIP mode, the decoder predicts the samples in the second template area based on the third identifier and the multiple MIP modes to obtain the distortion costs of the multiple MIP modes in each state of the third identifier.
  • MIP has more bit overhead than other intra prediction tools: it not only requires a flag bit to indicate whether the MIP technology is used and a flag bit to indicate whether the MIP is transposed, but also, as the largest part of the overhead, requires truncated binary encoding to represent the MIP prediction mode. MIP is a simplified technology based on neural network technology and is quite different from traditional interpolation-filter prediction technology. For some special textures, although MIP prediction is better than the traditional intra prediction modes, its larger flag overhead is a drawback of the MIP technology.
  • based on this, this application considers the transposition function of the MIP mode by traversing each state of the third identifier, which can save the cost of one MIP transposition identifier, thereby improving the decompression efficiency.
  • for example, the decoder traverses each state of the third identifier and the multiple MIP modes, determines the distortion costs of the multiple MIP modes in each state of the third identifier, and determines the optimal MIP mode based on the distortion costs of the multiple MIP modes in each state of the third identifier. That is to say, the decoding end may first traverse the multiple MIP modes, or may first traverse the states of the third identifier.
  • if the value of the third identifier is a fifth numerical value, it is used to identify that the input vector and the output vector of the MIP mode are transposed; if the value of the third identifier is a sixth numerical value, it is used to identify that the input vector and the output vector of the MIP mode are not transposed.
  • each state of the third identifier can also be replaced by each value of the third identifier.
  • the fifth value is 1 and the sixth value is 0.
  • the fifth value is 0 and the sixth value is 1.
  • the fifth numerical value and the sixth numerical value can also be other numerical values, which are not limited in this application.
  • if the third identifier is true, it is used to identify that the input vector and the output vector of the MIP mode are transposed; if the third identifier is false, it is used to identify that the input vector and the output vector of the MIP mode are not transposed. In this case, whether the third identifier is true or false is a state of the third identifier.
  • the third identification is a sequence level identification, a block level identification or a coding unit level identification.
  • the third identification may also be called transposition information, transposition identification, or MIP transposition identification bit.
  • the description that the third identifier is used to identify whether to transpose the input vector and the output vector of the MIP mode can also be replaced with a description with a similar or identical meaning.
  • for example, it may be replaced by any one of the following: the third identifier is used to identify whether the input and output of the MIP mode need to be transposed, the third identifier is used to identify whether the input vector and the output vector of the MIP mode are transposed, and the third identifier is used to indicate whether to transpose.
  • the method 300 may further include:
  • in some embodiments, if the size of the current block is a preset size, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
  • the preset size may include a size whose width is a preset width and whose height is a preset height. That is to say, if the width of the current block is the preset width and the height of the current block is the preset height, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
  • the preset size can be realized by pre-saving corresponding codes, tables, or other methods that can be used to indicate relevant information in the device (for example, a decoder or an encoder); this application does not limit the specific implementation thereof.
  • the preset size may refer to the size defined in the agreement.
  • the "protocol" may refer to a standard protocol in the field of coding and decoding technology, which may include, for example, VCC or ECM protocols and other related protocols.
  • the decoder may also use other methods to determine, based on the preset size, whether to determine the optimal MIP mode based on the distortion costs of the multiple MIP modes, which is not specifically limited in this application.
  • the decoder may determine whether to determine the optimal MIP mode based on the distortion costs of the plurality of MIP modes based solely on the width or height of the current block. In one implementation, if the width of the current block is the preset width or the height is the preset height, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes. For another example, the decoder may determine whether to determine the optimal MIP mode based on the distortion costs of the multiple MIP modes by comparing the size of the current block with the preset size. In one implementation, if the size of the current block is larger or smaller than a preset size, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
  • in one implementation, if the width of the current block is greater than or less than a preset width, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes; in another implementation, if the height of the current block is greater than or less than a preset height, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
  • the method 300 may further include:
  • in some embodiments, if the image frame in which the current block is located is an I frame and the size of the current block is the preset size, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
  • that is to say, only when the image frame in which the current block is located is an I frame does the decoder determine, based on the size of the current block, whether to determine the optimal MIP mode based on the distortion costs of the multiple MIP modes.
  • the method 300 may further include:
  • in some embodiments, if the image frame in which the current block is located is a B frame, the decoder determines the optimal MIP mode based on the distortion costs of the multiple MIP modes.
  • in other words, when the image frame in which the current block is located is a B frame, regardless of the size of the current block, the decoder can directly determine the optimal MIP mode based on the distortion costs of the multiple MIP modes.
  • the method 300 may further include:
  • the decoder obtains the MIP mode used by the adjacent block adjacent to the current block; the decoder determines the MIP mode used by the adjacent block as the multiple MIP modes.
  • the adjacent block may be an image block adjacent to at least one of the upper side, the left side, the lower left, the upper right, and the upper left of the current block.
  • the decoder may determine image blocks acquired in the order of upper, left, lower left, upper right, and upper left of the current block as the adjacent blocks.
  • the plurality of MIP modes may be used to construct the available MIP modes or an available MIP mode list that the decoder uses to predict the current block, so that the decoder determines the optimal MIP mode by predicting the samples in the second template area with the MIP modes in the available MIP modes or the available MIP mode list, as sketched below.
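  • The sketch below shows one way such an available MIP mode list could be assembled from the MIP modes of neighbouring blocks; the padding with remaining default modes and the scanning order are assumptions of this sketch.

```python
def build_candidate_mip_modes(neighbor_modes, default_modes=range(16)):
    """Take the MIP modes of the upper, left, lower-left, upper-right and
    upper-left neighbours (when available) first, then pad with the
    remaining default modes to form the available MIP mode list."""
    candidates = []
    for mode in neighbor_modes:          # e.g. in upper, left, ... order
        if mode is not None and mode not in candidates:
            candidates.append(mode)
    for mode in default_modes:           # fill up with the remaining modes
        if mode not in candidates:
            candidates.append(mode)
    return candidates

# usage: upper block uses MIP mode 7, left uses 3, others are unavailable
print(build_candidate_mip_modes([7, 3, None, None, None])[:5])  # [7, 3, 0, 1, 2]
```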
  • the method 300 may further include:
  • in some embodiments, the decoder performs reconstructed-sample filling on the adjacent reference area outside the second template area to obtain the reference row and the reference column of the second template area; the decoder uses the reference row and the reference column as input and uses the multiple MIP modes to respectively predict the samples in the second template area, obtaining multiple prediction blocks corresponding to the multiple MIP modes; the decoder then determines the distortion costs of the multiple MIP modes based on the multiple prediction blocks and the reconstructed block in the second template area.
  • the adjacent reference area outside the second template area is filled with reconstructed samples.
  • for example, the width of the area in the reference area adjacent to the upper side of the second template area is equal to the width of the second template area, and the height of the area in the reference area adjacent to the left side of the second template area is equal to the height of the second template area.
  • if the width of the area in the reference area adjacent to the upper side of the second template area is greater than the width of the second template area, the decoder may perform downsampling or dimensionality reduction processing on that area to obtain the reference row; if the height of the area in the reference area adjacent to the left side of the second template area is greater than the height of the second template area, the decoder may perform downsampling or dimensionality reduction processing on that area to obtain the reference column.
  • the second template area may be a template area used in the TIMD mode mentioned above, and the reference area may be a reference template (Reference of template) used in the TIMD mode.
  • that is to say, the decoder fills the reference area with reconstructed samples, downsamples or reduces the dimension of the filled reference area to obtain the reference row and the reference column, and then constructs the input vector of the MIP mode based on the reference row and the reference column.
  • after the decoder obtains the reference row and the reference column, the reference row and the reference column are used as input, and the samples in the second template area are respectively predicted using the multiple MIP modes to obtain multiple prediction blocks corresponding to the multiple MIP modes; that is to say, based on the reconstructed samples in the reference template of the current block, the decoder predicts the samples in the second template area by traversing the multiple MIP modes. Taking the currently traversed MIP mode as an example, the decoder uses the reference row, the reference column, the index of the currently traversed MIP mode, and the third identifier mentioned above as inputs to obtain the prediction block corresponding to the currently traversed MIP mode.
  • the reference row and the reference column are used to construct the input vector of the current traversal MIP mode; the index of the current traversal MIP mode is used to determine the matrix and/or offset vector of the current traversal MIP mode;
  • the third identifier is used to identify whether to transpose the input vector and the output vector of the MIP mode; for example, if the third identifier identifies that the input vector and the output vector of the MIP mode are not to be transposed, the reference column is spliced after the reference row to form the input vector of the currently traversed MIP mode; if the third identifier identifies that the input vector and the output vector of the MIP mode are to be transposed, the reference row is spliced after the reference column to form the input vector of the currently traversed MIP mode, and the decoder transposes the output of the currently traversed MIP mode to obtain the prediction block of the second template region.
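  • As an illustrative, non-normative sketch of the per-mode template prediction just described, the following Python snippet shows how one MIP mode could turn the reference row and reference column into a prediction of the template; the function name, the matrix/offset arguments and the shapes are hypothetical placeholders rather than values taken from this application.

```python
import numpy as np

def mip_predict_template(ref_row, ref_col, matrix, offset, transpose, out_h, out_w):
    """Predict a template block with one MIP mode (illustrative only).

    ref_row / ref_col : 1-D arrays of (downsampled) reference samples.
    matrix, offset    : coefficients selected by the index of the traversed MIP mode.
    transpose         : the third identifier; it swaps the splicing order of the
                        input vector and transposes the output block.
    """
    if not transpose:
        in_vec = np.concatenate([ref_row, ref_col])                # row first, then column
        pred = (matrix @ in_vec + offset).reshape(out_h, out_w)
    else:
        in_vec = np.concatenate([ref_col, ref_row])                # column first, then row
        pred = (matrix @ in_vec + offset).reshape(out_w, out_h).T  # undo the transposition
    return pred
```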
  • based on the distortion costs between the multiple prediction blocks and the reconstructed samples in the second template area, the decoder may select the MIP mode with the smallest cost according to the principle of minimum distortion cost and determine it as the optimal MIP mode of the current block under the template matching-based MIP mode.
  • when the decoder uses the multiple MIP modes to predict the samples in the second template region, it first downsamples the reference row and the reference column to obtain an input vector; then, taking the input vector as input, it predicts the samples in the second template area by traversing the multiple MIP modes to obtain the output vectors of the multiple MIP modes; finally, the output vectors of the multiple MIP modes are upsampled to obtain the prediction blocks corresponding to the multiple MIP modes.
  • the reference row and the reference column should satisfy the input conditions of the multiple MIP modes. If the reference row and the reference column do not meet the input conditions of the multiple MIP modes, the reference row and/or the reference column can first be processed into input samples that meet the input conditions of the multiple MIP modes, and the input vectors of the multiple MIP modes are then determined based on those input samples. For example, taking the input condition as a specified number of input samples: if the reference row and the reference column do not provide the specified number of input samples required by the MIP modes, the decoder can reduce the reference row and/or the reference column to the specified number of input samples by methods such as Haar downsampling, and determine the input vectors of the multiple MIP modes based on the specified number of input samples after dimensionality reduction.
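  • A minimal sketch of the Haar-style downsampling mentioned above, assuming a power-of-two reduction and simple averaging with rounding; the function name and the example values are illustrative, not part of this application.

```python
import numpy as np

def haar_downsample(samples, target_len):
    # average each group of consecutive samples (with rounding) to reach target_len
    samples = np.asarray(samples, dtype=np.int64)
    assert len(samples) % target_len == 0
    step = len(samples) // target_len
    return (samples.reshape(target_len, step).sum(axis=1) + step // 2) // step

# e.g. an 8-sample reference row reduced to the 4 input samples of a small MIP mode
print(haar_downsample([10, 12, 14, 16, 18, 20, 22, 24], 4))  # -> [11 15 19 23]
```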
  • the S320 may include:
  • the decoder determines the optimal MIP mode based on the sum of absolute transformed differences (SATD) of the multiple MIP modes on the second template region.
  • designing the distortion costs of the multiple MIP modes as their SATD on the second template area not only allows the optimal MIP mode to be determined based on those distortion costs, but also, compared with directly calculating the rate distortion costs of the multiple MIP modes, simplifies the computational complexity of the distortion costs, thereby improving the decompression performance of the decoder.
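  • The following is a hedged sketch of an SATD cost on a 4x4 difference block using a Hadamard transform; real codecs partition larger blocks into 4x4 (or 8x8) pieces and apply additional scaling, so this only illustrates the idea behind the template distortion cost.

```python
import numpy as np

H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]])   # 4x4 Hadamard matrix

def satd_4x4(pred, recon):
    diff = np.asarray(recon, dtype=np.int64) - np.asarray(pred, dtype=np.int64)
    transformed = H4 @ diff @ H4.T          # 2-D Hadamard of the residual
    return int(np.abs(transformed).sum())   # sum of absolute transformed differences
```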
  • the solution provided by this application proposes the idea of fusion enhancement based on the optimal MIP mode, that is, the decoder not only needs to determine the optimal MIP mode for predicting the current block, but also needs to fuse another prediction block to achieve different prediction effects. This not only saves bit overhead, but also creates a new prediction technology.
  • the fusion process is actually because the optimal MIP mode cannot completely replace the optimal prediction mode calculated based on the rate distortion cost at the encoding end.
  • a fusion approach is adopted to balance prediction accuracy with prediction diversity.
  • the main idea of the decoder's MIP pattern derivation method based on template matching can be divided into the following parts:
  • the reconstructed samples in the reference area (such as the reference template shown in FIG. 5) are filled, that is, the reference reconstructed samples required for predicting the samples in the second template area (such as the template shown in FIG. 5).
  • the width and height of the reference area do not need to exceed the width and height of the second template area. If the reference area is filled with samples that exceed the width and height of the second template area, downsampling or other methods need to be used to reduce the dimension to meet the MIP input dimension requirements.
  • the decoder uses the reference reconstructed samples in the reference area, the indexes of the multiple MIP modes, and the MIP transposition flag bits as inputs to predict the samples in the second template area to obtain the multiple Prediction block corresponding to MIP mode.
  • the reference reconstruction samples in the reference area need to meet MIP input conditions, such as Haar-downsampling, etc. to reduce the dimension to a specified number of input samples.
  • the indexes of the multiple MIP modes are used to determine the matrix index of the MIP technology, and then obtain the MIP prediction matrix coefficients.
  • the MIP transposition flag bit is used to identify whether input and output need to be transposed.
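  • Putting the parts listed above together, a schematic search loop could look as follows; it reuses the illustrative helpers mip_predict_template and satd_4x4 sketched earlier (so the template is assumed to be 4x4 here) and a hypothetical mode_table of matrix/offset pairs, none of which are normative.

```python
def derive_tmmip_mode(ref_row, ref_col, template_recon, mode_table, out_h=4, out_w=4):
    best = None
    for mode_idx, (matrix, offset) in enumerate(mode_table):
        for transpose in (False, True):                     # the MIP transposition flag
            pred = mip_predict_template(ref_row, ref_col, matrix, offset,
                                        transpose, out_h, out_w)
            cost = satd_4x4(pred, template_recon)           # distortion on the template
            if best is None or cost < best[0]:
                best = (cost, mode_idx, transpose)
    return best   # (minimum distortion cost, optimal MIP mode, its transposition)
```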
  • the decoder uses the optimal MIP mode and the second intra prediction mode to predict the current block to obtain the first prediction block and the second prediction block, and, according to the weight of the optimal MIP mode and the weight of the second intra prediction mode, performs a weighted calculation on the first prediction block and the second prediction block to obtain the prediction block of the current block.
  • the decoding method according to the embodiment of the present application has been described in detail from the perspective of the decoder above.
  • the encoding method according to the embodiment of the present application will be described from the perspective of the encoder with reference to FIG. 17 below.
  • Figure 10 is a schematic flow chart of the encoding method 400 provided by the embodiment of the present application. It should be understood that the encoding method 400 can be performed by an encoder. For example, it is applied to the coding framework 100 shown in FIG. 1 . For ease of description, the following uses an encoder as an example.
  • the encoding method 400 may include:
  • the first intra prediction mode includes any one of the following: an intra prediction mode derived using the decoder-side intra mode derivation (DIMD) mode for the prediction block of the current block, an intra prediction mode derived using the DIMD mode for the output vector of the optimal matrix-based intra prediction (MIP) mode of the current block, an intra prediction mode derived using the DIMD mode for the reconstructed samples within the first template region adjacent to the current block, or an intra prediction mode derived using the template-based intra mode derivation (TIMD) mode;
  • the first transformation at the decoding end is the inverse transformation of the fourth transformation at the encoding end
  • the second transformation at the decoding end is the inverse transformation of the third transformation at the encoding end.
  • the third transformation is the basic transformation or the main transformation mentioned above
  • the fourth transformation is the secondary transformation mentioned above
  • the first transformation is the inverse transformation (or reverse transformation) of the secondary transformation, and the second transformation may be the inverse transformation (or reverse transformation) of the basic transformation or the main transformation.
  • for example, the first transformation may be an inverse (reverse) LFNST, and the second transformation may be an inverse (reverse) DCT2 type, an inverse (reverse) DCT8 type, or an inverse (reverse) DST7 type, etc.
  • the third transformation may be DCT2 type, DCT8 type or DST7 type, etc.
  • the fourth transformation may be LFNST.
  • the output vector of the optimal MIP mode is the vector output by the optimal MIP mode before upsampling; or, the output vector of the optimal MIP mode is the upsampled vector output by the optimal MIP mode.
  • the S430 may include:
  • the first intra prediction mode is determined based on the prediction mode used to predict the current block.
  • if the prediction mode used to predict the current block includes the optimal MIP mode and a suboptimal MIP mode used to predict the current block, it is determined that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the prediction block of the current block, or it is determined that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the output vector of the optimal MIP mode.
  • if the prediction mode used to predict the current block includes the optimal MIP mode and the intra prediction mode derived from the TIMD mode, it is determined that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the prediction block of the current block, or it is determined that the first intra prediction mode is the intra prediction mode derived from the TIMD mode.
  • if the prediction mode used to predict the current block includes the optimal MIP mode and an intra prediction mode derived using the DIMD mode for the reconstructed samples in the first template region, it is determined that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the prediction block of the current block, or it is determined that the first intra prediction mode is the intra prediction mode derived using the DIMD mode for the reconstructed samples within the first template region.
  • the second template area and the first template area are the same or different.
  • the S410 may include:
  • the second intra prediction mode includes any one of the following: a suboptimal MIP mode used for predicting the current block, an intra prediction mode derived using the DIMD mode for the reconstructed samples in the first template region, or an intra prediction mode derived from the TIMD mode;
  • the current block is predicted based on the optimal MIP mode to obtain a first prediction block; the current block is predicted based on the second intra prediction mode to obtain a second prediction block; Based on the weight of the optimal MIP mode and the weight of the second intra prediction mode, the first prediction block and the second prediction block are weighted to obtain a prediction block of the current block.
  • if the prediction mode used to predict the current block includes the optimal MIP mode and either a suboptimal MIP mode used to predict the current block or the intra prediction mode derived from the TIMD mode, the weight of the optimal MIP mode and the weight of the second intra prediction mode are determined based on the distortion cost of the optimal MIP mode and the distortion cost of the second intra prediction mode; if the prediction mode used to predict the current block includes the optimal MIP mode and the intra prediction mode derived using the DIMD mode for the reconstructed samples in the first template area, the weight of the optimal MIP mode and the weight of the second intra prediction mode are both determined to be preset values.
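  • As a rough illustration of the two weighting branches described above, the sketch below assumes (as a convention of this example, not a rule stated here) that cost-based weights are inversely proportional to each mode's template distortion cost, while the DIMD case falls back to fixed preset weights.

```python
def fusion_weights(cost_mip, cost_second=None, preset=(0.5, 0.5)):
    # cost_second is None when the second mode is derived with DIMD (preset weights)
    if cost_second is None:
        return preset
    total = cost_mip + cost_second
    if total == 0:
        return preset
    return cost_second / total, cost_mip / total   # smaller cost -> larger weight

def fuse_blocks(pred_mip, pred_second, w_mip, w_second):
    # element-wise weighted combination of the two prediction blocks
    return [[w_mip * a + w_second * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(pred_mip, pred_second)]
```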
  • the encoder obtains a first identification; if the first identification is used to identify that the optimal MIP mode and the second intra prediction mode are allowed to be used to predict the image block in the current sequence , then determine the second intra prediction mode; wherein, the S450 may include:
  • the fourth transform coefficient and the first identifier are encoded.
  • the S450 may include:
  • if the first rate distortion cost is less than or equal to the minimum value of the at least one rate distortion cost, the second identifier is used to identify that the optimal MIP mode and the second intra prediction mode are used to predict the current block; if the first rate distortion cost is greater than the minimum value of the at least one rate distortion cost, the second identifier is used to identify that the optimal MIP mode and the second intra prediction mode are not used to predict the current block.
  • the method 400 may further include:
  • the distortion cost of the multiple MIP modes includes the distortion cost obtained by using the multiple MIP modes to predict samples in the second template area adjacent to the current block.
  • the second template area and the first template area are the same or different.
  • the samples in the second template area are predicted based on the third identifier and the multiple MIP modes to obtain the distortion costs of the multiple MIP modes in each state of the third identifier, where the third identifier is used to identify whether to transpose the input vector and the output vector of the MIP mode; based on the distortion costs of the multiple MIP modes in each state of the third identifier, the optimal MIP mode is determined.
  • the method 400 may further include:
  • the MIP modes used by the adjacent blocks are determined as the multiple MIP modes.
  • the method 400 may further include:
  • reconstruction sample filling is performed on the adjacent reference area outside the second template area to obtain the reference row and the reference column of the second template area; using the reference row and the reference column as input, the multiple MIP modes are used to predict the samples in the second template area respectively to obtain multiple prediction blocks corresponding to the multiple MIP modes; based on the multiple prediction blocks and the reconstruction block in the second template area, the distortion costs of the multiple MIP modes are determined.
  • the reference row and the reference column are down-sampled to obtain an input vector; using the input vector as input, the second template region is sampled by traversing the multiple MIP patterns. Predict the samples to obtain the output vectors of the multiple MIP modes; perform upsampling on the output vectors of the multiple MIP modes to obtain prediction blocks corresponding to the multiple MIP modes.
  • the optimal MIP mode is determined based on the sum of absolute transformed differences (SATD) of the multiple MIP modes on the second template region.
  • the encoding method can be understood as the reverse process of the decoding method. Therefore, for the specific solution of the encoding method 400, please refer to the relevant content of the decoding method 300. For the convenience of description, this application will not repeat it again.
  • the second intra prediction mode mentioned above is the suboptimal MIP mode, that is, the encoder or decoder can perform intra prediction on the current block based on the optimal MIP mode and the suboptimal MIP mode to obtain the current block. prediction block.
  • the encoder traverses the prediction mode. If the current block is in intra mode, the encoder obtains the sequence-level allowable flag bit, which is used to indicate whether the current sequence is allowed to use the MIP mode derivation technology based on template matching, which can be in the form of sps_tmmip_enable_flag. If the allowed flag bits of tmmip are all true, it means that the current encoder allows the use of TMMIP technology.
  • the encoder process can be implemented as the following process:
  • Step 1: If sps_tmmip_enable_flag is true, the encoder tries the TMMIP technology, that is, performs step 2; if sps_tmmip_enable_flag is false, the encoder does not try the TMMIP technology, that is, skips step 2 and proceeds directly to step 3.
  • Step 2: The encoder fills the adjacent rows and columns outside the second template region with reconstructed samples.
  • the filling process is the same as the filling method in the original intra prediction process. For example, the encoder can traverse and fill from the lower left corner to the upper right corner. If all reconstructed samples are available, the available reconstructed samples are filled in sequence; if none of the reconstructed samples are available, all positions are filled with the mean value; if only some of the reconstructed samples are available, the available reconstructed samples are filled in first, and for the remaining unavailable positions the encoder traverses in the above order from the lower left corner to the upper right corner until the first available reconstructed sample appears, after which the previously unavailable positions are filled with that first available reconstructed sample.
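  • The padding rule just described can be sketched roughly as below; the mean value used when nothing is available is assumed to be 1 << (bit_depth - 1), and gaps after the first available sample are assumed to copy the nearest earlier available sample, both of which are choices of this example rather than of this application.

```python
def fill_reference_samples(samples, bit_depth=10):
    """samples: reconstructed values, or None where unavailable, ordered from
    the lower-left corner to the upper-right corner."""
    if all(s is None for s in samples):
        return [1 << (bit_depth - 1)] * len(samples)     # nothing available: mean fill
    filled = list(samples)
    last = next(s for s in filled if s is not None)      # first available sample
    for i, s in enumerate(filled):
        if s is None:
            filled[i] = last     # leading gap gets the first available sample,
        else:                    # later gaps copy the previous available sample
            last = s
    return filled
```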
  • the encoder takes the filled reconstructed samples outside the second template area as input, and uses the allowable MIP mode to predict the samples in the second template area.
  • For example, for a 4x4 block size, the number of allowed MIP modes is 8; for blocks of other sizes, the number of allowed MIP modes is 6.
  • blocks of any size can use the MIP transpose function, and the prediction mode of TMMIP mentioned above is the same as the MIP technology.
  • the specific prediction calculation process includes: the encoder first performs Haar downsampling on the reconstructed samples; for example, the encoder determines the downsampling step size based on the block size. Then, the encoder adjusts the splicing order of the downsampled upper reconstructed samples and the downsampled left reconstructed samples based on the information about whether to transpose; if transposition is not required, the downsampled left reconstructed samples are spliced after the downsampled upper reconstructed samples, and the resulting vector is used as the input.
  • next, the encoder obtains the MIP matrix coefficients using the traversed prediction mode as an index and calculates the output vector from the input. Finally, the encoder upsamples the output vector according to the number of output samples and the current template size: if upsampling is not required, the vector is arranged in the horizontal direction and output as the template prediction block; if upsampling is required, upsampling is performed first in the horizontal direction and then in the vertical direction, up to the same size as the template, and the result is output as the prediction block of the second template area.
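  • A rough sketch of the horizontal-then-vertical upsampling step, using plain linear interpolation between predicted samples; the exact filter, rounding and boundary handling are simplified assumptions of this example.

```python
import numpy as np

def upsample_1d(line, factor):
    line = np.asarray(line, dtype=np.float64)
    idx = np.arange(len(line) * factor) / factor
    return np.interp(idx, np.arange(len(line)), line)

def upsample_block(block, target_h, target_w):
    block = np.asarray(block, dtype=np.float64)
    if block.shape[1] != target_w:     # horizontal direction first
        block = np.stack([upsample_1d(r, target_w // block.shape[1]) for r in block])
    if block.shape[0] != target_h:     # then the vertical direction
        block = np.stack([upsample_1d(c, target_h // block.shape[0])
                          for c in block.T], axis=1)
    return block
```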
  • the encoder calculates the distortion cost based on the prediction block of the second template area obtained by traversing each MIP mode and the reconstructed samples in the second template area, and records the distortion cost value under each prediction mode and transposition information. After traversing all allowed prediction modes and transposition information, according to the principle of minimum cost, the optimal MIP mode and its corresponding transposition information are selected, as well as the suboptimal MIP mode and its corresponding transposition information. The encoder determines whether fusion enhancement is needed based on the relationship between the cost value of the optimal MIP mode and the cost value of the suboptimal MIP mode.
  • if the cost value of the suboptimal MIP mode is less than twice the cost value of the optimal MIP mode, the optimal MIP prediction block and the suboptimal MIP prediction block need to be fused and enhanced; if the cost value of the suboptimal MIP mode is greater than or equal to twice the cost value of the optimal MIP mode, no fusion enhancement is required.
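  • The fusion decision described above reduces to a single comparison, as in this small illustrative helper.

```python
def need_fusion(cost_best, cost_second_best):
    # fuse only if the suboptimal cost is less than twice the optimal cost
    return cost_second_best < 2 * cost_best
```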
  • if fusion enhancement is required, the encoder obtains the prediction blocks corresponding to the optimal MIP mode and the suboptimal MIP mode based on the optimal MIP mode, the suboptimal MIP mode, the transposition information of the optimal MIP mode, and the transposition information of the suboptimal MIP mode. Specifically, the encoder first downsamples the reconstructed samples adjacent to the upper and left sides of the current block as needed and splices them according to the transposition information to form the input vector, reads the matrix coefficients of the current mode using the MIP mode as an index, and then obtains the output vector by calculating with the input vector and the matrix coefficients.
  • then the encoder can transpose the output according to the transposition information, and upsample the output vector according to the size of the current block and the number of samples in the output vector, to obtain the optimal MIP prediction block and the suboptimal MIP prediction block of the same size as the current block; based on the calculated weight value of the optimal MIP mode and the weight value of the suboptimal MIP mode, the optimal MIP prediction block and the suboptimal MIP prediction block are weighted and averaged, and the new prediction block obtained is used as the final prediction block of the current block. If fusion enhancement is not required, the encoder can calculate the optimal MIP prediction block based on the optimal MIP mode and its transposition information; the calculation process is the same as above, and finally the encoder uses the optimal MIP prediction block as the prediction block of the current block.
  • the encoder obtains the rate distortion cost of the current block and records it as cost1.
  • the encoder determines that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the prediction block of the current block, or the encoder determines that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the output vector of the optimal MIP mode.
  • the encoder continues to traverse other intra prediction techniques and calculates the corresponding rate-distortion cost as cost2...costN.
  • if cost1 is the minimum rate distortion cost, the current block uses the TMMIP technology, and the encoder sets the TMMIP usage flag of the current block to true and writes it into the code stream; if cost1 is not the minimum rate distortion cost, the current block uses another intra prediction technology, and the encoder sets the TMMIP usage flag of the current block to false and writes it into the code stream. It should be understood that information such as the flag bits or indexes of other intra prediction technologies is transmitted as defined and will not be elaborated here.
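  • Schematically, the encoder-side decision on the TMMIP usage flag is just a minimum search over the candidate costs; the helper below is illustrative and the variable names are not taken from this application.

```python
def decide_tmmip_flag(cost_tmmip, other_costs):
    # True (write TMMIP usage flag = true) only if TMMIP has the smallest RD cost
    return cost_tmmip <= min(other_costs)
```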
  • the encoder determines the residual block of the current block based on the prediction block of the current block and the original block of the current block, and then performs a basic transformation on the residual block of the current block and transforms the basic transformed transform coefficient based on the first intra prediction mode. Perform a secondary transformation, and then perform operations such as quantization, entropy coding, and loop filtering on the transformation coefficients after the secondary transformation. It should be understood that the specific quantification process can be found in the relevant content above. To avoid repetition, it will not be described again here.
  • the decoder parses the block-level type flag bit. If it indicates intra mode, the decoder parses or obtains the sequence-level allowed flag bit, which is used to indicate whether the current sequence is allowed to use the template matching-based MIP mode derivation technology; it may take the form of sps_tmmip_enable_flag. If the sequence-level tmmip allowed flag bit is true, it means that the current decoder allows the use of the TMMIP technology.
  • the decoder process can be implemented as the following process:
  • Step 1: If sps_tmmip_enable_flag is true, the decoder parses the TMMIP usage flag of the current block; otherwise, the current decoding process does not need to decode the block-level TMMIP usage flag, and the block-level TMMIP usage flag defaults to false. If the TMMIP usage flag of the current block is true, perform step 2; otherwise, perform step 3.
  • Step 2: The decoder fills the adjacent rows and columns outside the second template region with reconstructed samples.
  • the filling process is the same as the filling method in the original intra prediction process. For example, the decoder can traverse and fill from the lower left corner to the upper right corner. If all reconstructed samples are available, the available reconstructed samples are filled in sequence; if none of the reconstructed samples are available, all positions are filled with the mean value; if only some of the reconstructed samples are available, the available reconstructed samples are filled in first, and for the remaining unavailable positions the decoder traverses in the above order from the lower left corner to the upper right corner until the first available reconstructed sample appears, after which the previously unavailable positions are filled with that first available reconstructed sample.
  • the decoder takes the filled reconstructed samples outside the second template area as input, and uses the allowable MIP mode to predict the samples in the second template area.
  • For example, for a 4x4 block size, the number of allowed MIP modes is 8; for blocks of other sizes, the number of allowed MIP modes is 6.
  • blocks of any size can use the MIP transpose function, and the prediction mode of TMMIP mentioned above is the same as the MIP technology.
  • the specific prediction calculation process includes: the decoder first performs Haar downsampling on the reconstructed samples; for example, the decoder determines the downsampling step size based on the block size. Then, the decoder adjusts the splicing order of the downsampled upper reconstructed samples and the downsampled left reconstructed samples based on the information about whether to transpose; if transposition is not required, the downsampled left reconstructed samples are spliced after the downsampled upper reconstructed samples, and the resulting vector is used as the input.
  • next, the decoder obtains the MIP matrix coefficients using the traversed prediction mode as an index and calculates the output vector from the input. Finally, the decoder upsamples the output vector according to the number of output samples and the current template size: if upsampling is not required, the vector is arranged in the horizontal direction and output as the template prediction block; if upsampling is required, upsampling is performed first in the horizontal direction and then in the vertical direction, up to the same size as the template, and the result is output as the prediction block of the second template area.
  • the decoder calculates the distortion cost based on the prediction block of the second template area obtained by traversing each MIP mode and the reconstructed samples in the second template area, and records the distortion cost value under each prediction mode and transposition information. After traversing all allowed prediction modes and transposition information, according to the principle of minimum cost, the optimal MIP mode and its corresponding transposition information are selected, as well as the suboptimal MIP mode and its corresponding transposition information. The decoder determines whether fusion enhancement is needed based on the relationship between the cost value of the optimal MIP mode and the cost value of the suboptimal MIP mode.
  • if the cost value of the suboptimal MIP mode is less than twice the cost value of the optimal MIP mode, the optimal MIP prediction block and the suboptimal MIP prediction block need to be fused and enhanced; if the cost value of the suboptimal MIP mode is greater than or equal to twice the cost value of the optimal MIP mode, no fusion enhancement is required.
  • if fusion enhancement is required, the decoder obtains the prediction blocks corresponding to the optimal MIP mode and the suboptimal MIP mode based on the optimal MIP mode, the suboptimal MIP mode, the transposition information of the optimal MIP mode, and the transposition information of the suboptimal MIP mode. Specifically, the decoder first downsamples the reconstructed samples adjacent to the upper and left sides of the current block as needed and splices them according to the transposition information to form the input vector, reads the matrix coefficients of the current mode using the MIP mode as an index, and then obtains the output vector by calculating with the input vector and the matrix coefficients.
  • then the decoder can transpose the output according to the transposition information, and upsample the output vector according to the size of the current block and the number of samples in the output vector, to obtain the optimal MIP prediction block and the suboptimal MIP prediction block of the same size as the current block; based on the calculated weight value of the optimal MIP mode and the weight value of the suboptimal MIP mode, the optimal MIP prediction block and the suboptimal MIP prediction block are weighted and averaged, and the new prediction block obtained is used as the final prediction block of the current block. If fusion enhancement is not required, the decoder can calculate the optimal MIP prediction block based on the optimal MIP mode and its transposition information; the calculation process is the same as above, and finally the decoder uses the optimal MIP prediction block as the prediction block of the current block.
  • the decoder determines that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the prediction block of the current block, or the decoder determines that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the output vector of the optimal MIP mode.
  • the decoder continues to parse information such as usage flags or indexes of other intra prediction technologies, and obtains the final prediction block of the current block based on the parsed information.
  • the decoder parses the code stream to obtain the frequency domain residual block of the current block (also called frequency domain residual information), and performs inverse quantization and inverse transformation on the frequency domain residual block of the current block (based on the first intra prediction mode, the inverse transform of the secondary transform is performed first, followed by the inverse transform of the basic transform or the inverse transform of the main transform) to obtain the residual block of the current block (also known as the time-domain residual block or time-domain residual information); the decoder then superimposes the prediction block of the current block and the residual block of the current block to obtain a reconstructed sample block.
  • the reconstructed image can be used as video output or as a reference for subsequent decoding.
  • the size of the second template area used by the encoder or decoder in the TMMIP technology can be predefined according to the size of the current block.
  • the width of the upper area adjacent to the current block in the second template area is the width of the current block, and its height is the height of two rows of samples; the height of the left area adjacent to the left side of the current block in the second template area is the height of the current block, and its width is the width of two columns of samples.
  • the second template area can also be implemented with other sizes, and this application does not specifically limit this.
  • the second intra prediction mode mentioned above is the intra prediction mode derived from the TIMD mode, that is, the encoder or decoder can perform intra prediction on the current block based on the optimal MIP mode and the intra prediction mode derived from the TIMD mode to obtain the prediction block of the current block.
  • the MIP pattern derivation fusion enhancement technology based on template matching can not only fuse two derived MIP prediction blocks, but can also be fused with prediction blocks generated by other template matching-based derivation technologies.
  • This application integrates TMMIP technology and TIMD technology to obtain a derived fusion method of traditional prediction blocks and matrix-based prediction blocks.
  • TIMD uses the idea of template matching on the encoding and decoding end to derive the optimal traditional intra prediction mode, and this technology can also offset and expand the prediction mode to obtain an updated intra prediction mode.
  • TMMIP technology also uses the idea of template matching on the encoding and decoding end to derive the optimal MIP mode. By fusing these two optimal prediction modes, it can take into account the directionality of traditional prediction blocks and the unique texture characteristics of MIP prediction, resulting in A brand new prediction block to improve coding efficiency.
  • the encoder traverses the prediction mode. If the current block is in intra mode, the encoder obtains the sequence-level allowable flag bit, which is used to indicate whether the current sequence is allowed to use the MIP mode derivation technology based on template matching, which can be in the form of sps_tmmip_enable_flag. If the allowed flag bits of tmmip are all true, it means that the current encoder allows the use of TMMIP technology.
  • the encoder process can be implemented as the following process:
  • Step 1: If sps_tmmip_enable_flag is true, the encoder tries the TMMIP technology, that is, performs step 2; if sps_tmmip_enable_flag is false, the encoder does not try the TMMIP technology, that is, skips step 2 and proceeds directly to step 3.
  • Step 2: The encoder fills the adjacent rows and columns outside the second template region with reconstructed samples.
  • the filling process is the same as the filling method in the original intra prediction process. For example, the encoder can traverse and fill from the lower left corner to the upper right corner. If all reconstructed samples are available, the available reconstructed samples are filled in sequence; if none of the reconstructed samples are available, all positions are filled with the mean value; if only some of the reconstructed samples are available, the available reconstructed samples are filled in first, and for the remaining unavailable positions the encoder traverses in the above order from the lower left corner to the upper right corner until the first available reconstructed sample appears, after which the previously unavailable positions are filled with that first available reconstructed sample.
  • the encoder takes the filled reconstructed samples outside the second template area as input, and uses the allowable MIP mode to predict the samples in the second template area.
  • For example, for a 4x4 block size, the number of allowed MIP modes is 8; for blocks of other sizes, the number of allowed MIP modes is 6.
  • blocks of any size can use the MIP transpose function, and the prediction mode of TMMIP mentioned above is the same as the MIP technology.
  • the specific prediction calculation process includes: the encoder first performs Haar downsampling on the reconstructed samples; for example, the encoder determines the downsampling step size based on the block size. Then, the encoder adjusts the splicing order of the downsampled upper reconstructed samples and the downsampled left reconstructed samples based on the information about whether to transpose; if transposition is not required, the downsampled left reconstructed samples are spliced after the downsampled upper reconstructed samples, and the resulting vector is used as the input.
  • next, the encoder obtains the MIP matrix coefficients using the traversed prediction mode as an index and calculates the output vector from the input. Finally, the encoder upsamples the output vector according to the number of output samples and the current template size: if upsampling is not required, the vector is arranged in the horizontal direction and output as the template prediction block; if upsampling is required, upsampling is performed first in the horizontal direction and then in the vertical direction, up to the same size as the template, and the result is output as the prediction block of the second template area.
  • the encoder also needs to try the template matching calculation process of TIMD, obtain different interpolation filters according to different prediction mode indexes, and interpolate the reference samples to obtain prediction samples within the template.
  • the encoder calculates the distortion cost based on the predicted samples of the second template area and the reconstructed samples in the second template area obtained by traversing each MIP mode, and records the distortion cost value under each prediction mode and transposed information, and Based on the distortion cost of each prediction mode and transposed information, and according to the principle of minimum cost, the optimal MIP mode and its corresponding transposed information are selected.
  • the encoder also needs to traverse all intra prediction modes allowed by TIMD, calculate the prediction samples within the template, calculate the distortion cost against the reconstructed samples within the template, and, according to the principle of minimum cost, record the optimal prediction mode and the suboptimal prediction mode derived by the TIMD technology, the distortion cost value of the optimal prediction mode, and the distortion cost value of the suboptimal prediction mode.
  • based on the optimal MIP mode and its transposition information derived by the TMMIP technology, the encoder downsamples the reconstructed samples adjacent to the upper and left sides of the current block as needed and splices them according to the transposition information to form the input vector, reads the matrix coefficients of the current mode using the MIP mode as an index, and then obtains the output vector by calculating with the input vector and the matrix coefficients.
  • the encoder can perform output transposition according to the transposition information, and upsample the output vector according to the size of the current block and the number of samples of the output vector, and obtain an output with the same size as the current block as the optimal MIP prediction block of the current block.
  • for the optimal prediction mode and the suboptimal prediction mode derived by the TIMD technology: if neither the optimal prediction mode nor the suboptimal prediction mode is the mean (DC) mode or the planar (PLANAR) mode, and the distortion cost of the suboptimal prediction mode is less than twice the distortion cost of the optimal prediction mode, the encoder needs to perform a prediction block fusion operation. First, the encoder obtains the interpolation filter coefficients according to the optimal prediction mode and performs interpolation filtering on the upper and left adjacent reconstructed samples to obtain prediction samples at all positions in the current block, which are recorded as the optimal prediction block; secondly, the encoder obtains the interpolation filter coefficients according to the suboptimal prediction mode and performs interpolation filtering on the upper and left adjacent reconstructed samples to obtain prediction samples at all positions in the current block, which are recorded as the suboptimal prediction block.
  • the encoder uses the ratio between the optimal prediction mode cost value and the suboptimal prediction mode cost value to calculate the weight value belonging to the optimal prediction block and the weight value of the suboptimal prediction block.
  • the encoder performs a weighted fusion of the optimal prediction block and the suboptimal prediction block to obtain the prediction block of the current block as output.
  • otherwise, the encoder does not need to perform the prediction block fusion operation, and only the optimal prediction block obtained by interpolation filtering the upper and left adjacent reconstructed samples with the optimal prediction mode is used as the optimal TIMD prediction block of the current block.
  • finally, the encoder performs a weighted average of the optimal MIP prediction block and the optimal TIMD prediction block, and the new prediction block obtained is the prediction block of the current block.
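  • One plausible realization of the cost-ratio weighting mentioned above is to make each weight inversely proportional to its template cost, as in the sketch below; the exact formula is the one given in the TIMD description referenced earlier in this application, so this is only an assumption for illustration.

```python
def timd_blend_weights(cost_best, cost_second):
    total = cost_best + cost_second
    if total == 0:
        return 0.5, 0.5
    w_best = cost_second / total     # smaller cost -> larger weight
    w_second = cost_best / total
    return w_best, w_second
```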
  • the encoder obtains the rate distortion cost of the current block and records it as cost1.
  • the encoder determines that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the prediction block of the current block, or the encoder determines that the first intra prediction mode is the intra prediction mode derived from the TIMD mode.
  • the cost information of the template area of the TIMD technology and the second template area can be set to be the same, that is, the template areas used to calculate the distortion cost are the same, then the cost information of the template area of the TIMD technology and The cost information of the template area of the TMMIP technology can be equivalent or at the same comparison level. In this case, it can also be determined based on the cost information whether to fuse enhancement, which is not specifically limited in this application.
  • the encoder continues to traverse other intra prediction techniques and calculates the corresponding rate-distortion cost as cost2...costN.
  • if cost1 is the minimum rate distortion cost, the current block uses the TMMIP technology, and the encoder sets the TMMIP usage flag of the current block to true and writes it into the code stream; if cost1 is not the minimum rate distortion cost, the current block uses another intra prediction technology, and the encoder sets the TMMIP usage flag of the current block to false and writes it into the code stream. It should be understood that information such as the flag bits or indexes of other intra prediction technologies is transmitted as defined and will not be elaborated here;
  • the encoder determines the residual block of the current block based on the prediction block of the current block and the original block of the current block, and then performs a basic transformation on the residual block of the current block and transforms the basic transformed transform coefficient based on the first intra prediction mode. Perform a secondary transformation, and then perform operations such as quantization, entropy coding, and loop filtering on the transformation coefficients after the secondary transformation. It should be understood that the specific quantification process can be found in the relevant content above. To avoid repetition, it will not be described again here.
  • the decoder parses the block-level type flag bit. If it indicates intra mode, the decoder parses or obtains the sequence-level allowed flag bit, which is used to indicate whether the current sequence is allowed to use the template matching-based MIP mode derivation technology; it may take the form of sps_tmmip_enable_flag. If the sequence-level tmmip allowed flag bit is true, it means that the current decoder allows the use of the TMMIP technology.
  • the decoder process can be implemented as the following process:
  • Step 1: If sps_tmmip_enable_flag is true, the decoder parses the TMMIP usage flag of the current block; otherwise, the current decoding process does not need to decode the block-level TMMIP usage flag, and the block-level TMMIP usage flag defaults to false. If the TMMIP usage flag of the current block is true, perform step 2; otherwise, perform step 3.
  • Step 2: The decoder fills the adjacent rows and columns outside the second template region with reconstructed samples.
  • the filling process is the same as the filling method in the original intra prediction process. For example, the decoder can traverse and fill from the lower left corner to the upper right corner. If all reconstructed samples are available, the available reconstructed samples are filled in sequence; if none of the reconstructed samples are available, all positions are filled with the mean value; if only some of the reconstructed samples are available, the available reconstructed samples are filled in first, and for the remaining unavailable positions the decoder traverses in the above order from the lower left corner to the upper right corner until the first available reconstructed sample appears, after which the previously unavailable positions are filled with that first available reconstructed sample.
  • the decoder takes the filled reconstructed samples outside the second template area as input, and uses the allowable MIP mode to predict the samples in the second template area.
  • For example, for a 4x4 block size, the number of allowed MIP modes is 8; for blocks of other sizes, the number of allowed MIP modes is 6.
  • blocks of any size can use the MIP transpose function, and the prediction mode of TMMIP mentioned above is the same as the MIP technology.
  • the specific prediction calculation process includes: the decoder first performs Haar downsampling on the reconstructed samples; for example, the decoder determines the downsampling step size based on the block size. Then, the decoder adjusts the splicing order of the downsampled upper reconstructed samples and the downsampled left reconstructed samples based on the information about whether to transpose; if transposition is not required, the downsampled left reconstructed samples are spliced after the downsampled upper reconstructed samples, and the resulting vector is used as the input.
  • next, the decoder obtains the MIP matrix coefficients using the traversed prediction mode as an index and calculates the output vector from the input. Finally, the decoder upsamples the output vector according to the number of output samples and the current template size: if upsampling is not required, the vector is arranged in the horizontal direction and output as the template prediction block; if upsampling is required, upsampling is performed first in the horizontal direction and then in the vertical direction, up to the same size as the template, and the result is output as the prediction block of the second template area.
  • the decoder also needs to try the template matching calculation process of TIMD, obtain different interpolation filters according to different prediction mode indexes, and interpolate the reference samples to obtain prediction samples within the template.
  • the decoder calculates the distortion cost based on the predicted samples of the second template area and the reconstructed samples in the second template area obtained by traversing each MIP mode, and records the distortion cost value under each prediction mode and transposed information, and Based on the distortion cost of each prediction mode and transposed information, and according to the principle of minimum cost, the optimal MIP mode and its corresponding transposed information are selected.
  • the decoder also needs to traverse all intra prediction modes allowed by TIMD, calculate the prediction samples within the template, calculate the distortion cost against the reconstructed samples within the template, and, according to the principle of minimum cost, record the optimal prediction mode and the suboptimal prediction mode derived by the TIMD technology, the distortion cost value of the optimal prediction mode, and the distortion cost value of the suboptimal prediction mode.
  • based on the optimal MIP mode and its transposition information derived by the TMMIP technology, the decoder downsamples the reconstructed samples adjacent to the upper and left sides of the current block as needed and splices them according to the transposition information to form the input vector, reads the matrix coefficients of the current mode using the MIP mode as an index, and then obtains the output vector by calculating with the input vector and the matrix coefficients.
  • the decoder can perform output transposition according to the transposition information, and upsample the output vector according to the size of the current block and the number of samples of the output vector, and obtain an output with the same size as the current block as the optimal MIP prediction block of the current block.
  • for the optimal prediction mode and the suboptimal prediction mode derived by the TIMD technology: if neither the optimal prediction mode nor the suboptimal prediction mode is the mean (DC) mode or the planar (PLANAR) mode, and the distortion cost of the suboptimal prediction mode is less than twice the distortion cost of the optimal prediction mode, the decoder needs to perform a prediction block fusion operation. First, the decoder obtains the interpolation filter coefficients according to the optimal prediction mode and performs interpolation filtering on the upper and left adjacent reconstructed samples to obtain prediction samples at all positions in the current block, which are recorded as the optimal prediction block; secondly, the decoder obtains the interpolation filter coefficients according to the suboptimal prediction mode and performs interpolation filtering on the upper and left adjacent reconstructed samples to obtain prediction samples at all positions in the current block, which are recorded as the suboptimal prediction block.
  • the decoder uses the ratio between the optimal prediction mode cost value and the suboptimal prediction mode cost value to calculate the weight value belonging to the optimal prediction block and the weight value of the suboptimal prediction block.
  • the decoder performs a weighted fusion of the optimal prediction block and the suboptimal prediction block to obtain the prediction block of the current block as output.
  • otherwise, the decoder does not need to perform the prediction block fusion operation, and only the optimal prediction block obtained by interpolation filtering the upper and left adjacent reconstructed samples with the optimal prediction mode is used as the optimal TIMD prediction block of the current block.
  • finally, the decoder performs a weighted average of the optimal MIP prediction block and the optimal TIMD prediction block, and the new prediction block obtained is the prediction block of the current block.
  • the decoder determines that the first intra prediction mode is an intra prediction mode derived using the DIMD mode for the prediction block of the current block, or the decoder determines that the first intra prediction mode is the intra prediction mode derived from the TIMD mode.
  • the decoder continues to parse information such as usage flags or indexes of other intra prediction technologies, and obtains the final prediction block of the current block based on the parsed information.
  • the decoder parses the code stream to obtain the frequency domain residual block of the current block (also called frequency domain residual information), and performs inverse quantization and inverse transformation on the frequency domain residual block of the current block (based on the first intra prediction mode, the inverse transform of the secondary transform is performed first, followed by the inverse transform of the basic transform or the inverse transform of the main transform) to obtain the residual block of the current block (also known as the time-domain residual block or time-domain residual information); the decoder then superimposes the prediction block of the current block and the residual block of the current block to obtain a reconstructed sample block.
  • the reconstructed image can be used as video output or as a reference for subsequent decoding.
  • the calculation process of the weight value of the weighted fusion of TIMD prediction blocks can be referred to the content described above in the introduction to TIMD technology. To avoid duplication, it will not be described again here.
  • the encoder or decoder may determine whether to fuse enhancement based on the optimal prediction mode derived by TIMD; for example, if the optimal prediction mode derived by TIMD is DC mode or PLANAR mode, the encoder or decoder may not use fusion Enhancement, that is, only the prediction block generated by the optimal MIP mode derived from the TMMIP technology is used as the prediction block of the current block.
  • the size of the second template area used by the encoder or decoder in the TMMIP technology can be predefined according to the size of the current block.
  • the definition of the second template area in the TMMIP technology may be consistent with the definition of the template area in the TIMD technology, or may be different.
  • for example, if the width of the current block is less than or equal to 8, the height of the upper area adjacent to the current block in the second template area is the height of two rows of samples; otherwise, the height is the height of four rows of samples. If the height of the current block is less than or equal to 8, the width of the left area adjacent to the left side of the current block in the second template area is the width of two columns of samples; otherwise, the width is the width of four columns of samples.
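  • The sizing rule above could be written as the small helper below; treating the left side symmetrically with a threshold on the block height is an assumption of this example, since the text only states the condition for the block width explicitly.

```python
def second_template_size(block_w, block_h):
    top_rows = 2 if block_w <= 8 else 4    # height (in rows) of the above template area
    left_cols = 2 if block_h <= 8 else 4   # width (in columns) of the left template area
    return top_rows, left_cols
```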
  • the second intra prediction mode mentioned above is an intra prediction mode derived from the DIMD mode, that is, the encoder or decoder can perform intra prediction on the current block based on the optimal MIP mode and the intra prediction mode derived using the DIMD mode for the reconstructed samples in the first template region adjacent to the current block, to obtain the prediction block of the current block.
  • TMMIP technology can also be integrated and enhanced with DIMD technology.
  • the prediction modes derived by DIMD technology and TIMD technology are both traditional intra prediction modes, due to different derivation methods, the prediction modes obtained by the two are not necessarily the same.
  • the fusion enhancement of TMMIP technology and DIMD technology will be different from the fusion enhancement of TMMIP technology and TIMD technology.
  • the sizes of the second template areas of the TMMIP technology and the TIMD technology are generally the same, and the cost information is basically calculated as the sum of absolute transformed differences (SATD), which is also called the distortion cost value based on the Hadamard transform; therefore, the TMMIP technology and the TIMD technology can directly calculate the fusion weights based on this cost information.
  • however, the template area of the DIMD technology is generally not as large as the second template area of the TMMIP technology (or the TIMD technology), and the criterion used by DIMD to derive the prediction mode is measured based on the gradient amplitude value; the gradient amplitude value and the SATD cost value cannot be directly equated, so the weights cannot be calculated simply by referring to the solution used when fusing the TMMIP technology and the TIMD technology.
  • the encoder traverses the prediction mode. If the current block is in intra mode, the encoder obtains the sequence-level allowable flag bit, which is used to indicate whether the current sequence is allowed to use the MIP mode derivation technology based on template matching, which can be in the form of sps_tmmip_enable_flag. If the allowed flag bits of tmmip are all true, it means that the current encoder allows the use of TMMIP technology.
  • the encoder process can be implemented as the following process:
  • Step 1: If sps_tmmip_enable_flag is true, the encoder tries the TMMIP technology, that is, performs step 2; if sps_tmmip_enable_flag is false, the encoder does not try the TMMIP technology, that is, skips step 2 and proceeds directly to step 3.
  • Step 2: The encoder fills the adjacent rows and columns outside the second template region with reconstructed samples.
  • the filling process is the same as the filling method in the original intra prediction process. For example, the encoder can traverse and fill from the lower left corner to the upper right corner. If all reconstructed samples are available, the available reconstructed samples are filled in sequence; if none of the reconstructed samples are available, all positions are filled with the mean value; if only some of the reconstructed samples are available, the available reconstructed samples are filled in first, and for the remaining unavailable positions the encoder traverses in the above order from the lower left corner to the upper right corner until the first available reconstructed sample appears, after which the previously unavailable positions are filled with that first available reconstructed sample.
  • the encoder takes the filled reconstructed samples outside the second template area as input, and uses the allowable MIP mode to predict the samples in the second template area.
  • For example, for a 4x4 block size, the number of allowed MIP modes is 8; for blocks of other sizes, the number of allowed MIP modes is 6.
  • blocks of any size can use the MIP transpose function, and the prediction mode of TMMIP mentioned above is the same as the MIP technology.
  • the specific prediction calculation process is as follows: the encoder first performs Haar downsampling on the reconstructed samples; for example, the encoder determines the downsampling step size based on the block size. Then, the encoder adjusts the splicing order of the downsampled upper reconstructed samples and the downsampled left reconstructed samples according to the transposition information; if transposition is not required, the downsampled left reconstructed samples are spliced after the downsampled upper reconstructed samples, and the resulting vector is used as the input.
  • the encoder then uses the traversed prediction mode as an index to obtain the MIP matrix coefficients and computes the output vector from the input. Finally, the encoder upsamples the output vector according to the number of output samples and the current template size: if upsampling is not required, the output vector is arranged in the horizontal direction and output as the template prediction block; if upsampling is required, the output is first upsampled in the horizontal direction and then upsampled in the vertical direction to the same size as the template, and is then output as the prediction block of the second template area.
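A compact sketch of this template prediction step (the Haar averaging, the transpose-dependent splicing, the matrix multiplication and the upsampling are reduced to a few NumPy operations; the matrix shapes, step sizes and helper names are illustrative assumptions rather than the normative MIP derivation):

```python
import numpy as np

def haar_downsample(samples, step):
    """Average non-overlapping groups of `step` samples (Haar-style averaging).
    Assumes the boundary length is a multiple of `step`."""
    s = np.asarray(samples, dtype=np.float64)
    return s.reshape(-1, step).mean(axis=1)

def mip_template_predict(top, left, matrix, transpose, out_w, out_h):
    """Predict the second template area with one MIP-style matrix.

    top/left : reconstructed samples above / to the left of the template area
    matrix   : coefficient matrix selected by the traversed mode index
    transpose: splicing-order flag from the transposition information
    """
    top_red = haar_downsample(top, max(1, len(top) // 4))
    left_red = haar_downsample(left, max(1, len(left) // 4))
    # No transpose: left samples are spliced after the upper samples.
    bdry = np.concatenate([left_red, top_red] if transpose else [top_red, left_red])

    out = matrix @ bdry                       # matrix multiplication -> output vector
    side = int(round(np.sqrt(out.size)))      # assumes a square output vector
    pred = out.reshape(side, side)
    if transpose:
        pred = pred.T

    # Nearest-neighbour upsampling to the template size (a simplification of the
    # horizontal-then-vertical interpolation described above).
    if (out_h, out_w) != pred.shape:
        ys = np.round(np.linspace(0, side - 1, out_h)).astype(int)
        xs = np.round(np.linspace(0, side - 1, out_w)).astype(int)
        pred = pred[np.ix_(ys, xs)]
    return pred
```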
  • the encoder uses DIMD technology to derive the optimal intra prediction mode, which is the optimal DIMD mode.
  • the DIMD technology calculates the gradient value of the reconstructed sample in the first template area based on the Sobel operator, and converts the gradient value based on the angle values of different prediction modes to obtain the amplitude value in the corresponding prediction mode.
  • the encoder traverses the template prediction blocks obtained from each MIP mode, calculates the distortion cost with the reconstructed samples in the template, and records the optimal MIP mode and transposition information according to the minimum cost principle.
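A minimal illustration of this minimum-cost selection, using a Hadamard-based SATD as the distortion measure (the 4x4 Hadamard kernel, the block-wise accumulation and the assumption that the template dimensions are multiples of 4 are simplifications, not a normative definition):

```python
import numpy as np

# 4x4 Hadamard kernel used for the SATD (Hadamard-transform based) cost.
H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]], dtype=np.int64)

def satd(a, b):
    """Sum of absolute transformed differences, accumulated over 4x4 tiles."""
    diff = np.asarray(a, dtype=np.int64) - np.asarray(b, dtype=np.int64)
    cost = 0
    for y in range(0, diff.shape[0], 4):
        for x in range(0, diff.shape[1], 4):
            t = H4 @ diff[y:y + 4, x:x + 4] @ H4
            cost += int(np.abs(t).sum())
    return cost

def pick_best_mip_mode(template_rec, candidates):
    """candidates: iterable of (mode_index, transpose_flag, template_prediction)."""
    best = None
    for mode, transpose, pred in candidates:
        cost = satd(template_rec, pred)
        if best is None or cost < best[0]:
            best = (cost, mode, transpose)   # minimum-cost principle
    return best  # (min_cost, optimal_mip_mode, transposition_info)
```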
  • the encoder traverses all allowed intra-frame prediction modes, calculates the amplitude value in each intra-frame prediction mode, and records the optimal DIMD prediction mode according to the principle of maximum amplitude.
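The gradient-based DIMD derivation can be sketched as follows; the Sobel kernels match the description above, while the mapping from gradient orientation to an angular mode index is reduced to a uniform quantisation and is only an illustrative assumption:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64)

def best_dimd_mode(template_rec, num_angular_modes=65):
    """Derive the optimal DIMD mode from reconstructed samples of the first template area.

    Returns the angular mode index (2..66 in a VVC-style numbering) whose
    accumulated gradient amplitude is the largest (maximum-amplitude principle).
    """
    t = np.asarray(template_rec, dtype=np.float64)
    hist = np.zeros(num_angular_modes)
    for y in range(1, t.shape[0] - 1):
        for x in range(1, t.shape[1] - 1):
            win = t[y - 1:y + 2, x - 1:x + 2]
            gx = float((SOBEL_X * win).sum())
            gy = float((SOBEL_Y * win).sum())
            amp = abs(gx) + abs(gy)                 # gradient amplitude value
            if amp == 0.0:
                continue
            angle = np.arctan2(gy, gx) % np.pi      # texture orientation in [0, pi)
            bin_idx = int(angle / np.pi * num_angular_modes) % num_angular_modes
            hist[bin_idx] += amp                    # accumulate amplitude per mode
    return int(np.argmax(hist)) + 2
```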
  • the encoder downsamples the reconstructed samples adjacent to the upper and left sides of the current block as needed and splices them according to the transposition information to form the input vector, uses the optimal MIP mode as the index to read the matrix coefficients of that mode, and then obtains the output vector by multiplying the input vector by the matrix coefficients.
  • the encoder can perform output transposition according to the transposition information, and upsample the output vector according to the size of the current block and the number of samples of the output vector, and obtain an output with the same size as the current block as the optimal MIP prediction block of the current block.
  • the encoder obtains the corresponding interpolation filter coefficients, and performs interpolation filtering on the upper and left adjacent reconstructed samples to obtain prediction samples at all positions in the current block, which are recorded as the optimal DIMD prediction block.
  • the encoder performs a weighted average of the optimal MIP prediction block and the optimal DIMD prediction block for each prediction sample according to the preset weight, and the new prediction block obtained is the prediction block of the current block.
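The fusion itself is a per-sample weighted average. A sketch using the example weights given later in this description (5/9 for the optimal MIP prediction block and 4/9 for the optimal DIMD prediction block; any other preset split could be substituted, and 10-bit samples are assumed):

```python
import numpy as np

def fuse_prediction_blocks(mip_pred, dimd_pred, w_mip=5 / 9, w_dimd=4 / 9):
    """Per-sample weighted average of the two prediction blocks (weights sum to 1)."""
    mip_pred = np.asarray(mip_pred, dtype=np.float64)
    dimd_pred = np.asarray(dimd_pred, dtype=np.float64)
    fused = w_mip * mip_pred + w_dimd * dimd_pred
    return np.clip(np.rint(fused), 0, 1023).astype(np.int32)  # assumes 10-bit samples
```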
  • the encoder obtains the rate distortion cost of the current block and records it as cost1.
  • the encoder determines that the first intra prediction mode is the intra prediction mode derived by applying the DIMD mode to the prediction block of the current block, or the encoder determines that the first intra prediction mode is the intra prediction mode derived by applying the DIMD mode to the reconstructed samples within the first template region.
  • the encoder continues to traverse other intra prediction techniques and calculates the corresponding rate-distortion cost as cost2...costN.
  • if cost1 is the minimum rate-distortion cost, the current block uses the TMMIP technology, and the encoder sets the TMMIP usage flag of the current block to true and writes it into the bitstream; if cost1 is not the minimum rate-distortion cost, the current block uses another intra prediction technology, and the encoder sets the TMMIP usage flag of the current block to false and writes it into the bitstream. It should be understood that information such as the flags or indexes of the other intra prediction technologies is transmitted as defined and is not elaborated here;
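The flag decision reduces to a minimum search over the recorded rate-distortion costs; in the following sketch the dictionary keys and the bitstream-writing step are placeholders:

```python
def choose_intra_tool(costs):
    """costs: {'tmmip': cost1, 'tool_a': cost2, ..., 'tool_n': costN} rate-distortion costs."""
    best_tool = min(costs, key=costs.get)
    tmmip_flag = (best_tool == 'tmmip')   # true only if cost1 is the minimum cost
    return best_tool, tmmip_flag

# Usage sketch: the encoder writes tmmip_flag into the bitstream; when it is false,
# the flags/indexes of the selected alternative tool are written as defined elsewhere.
```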
  • the encoder determines the residual block of the current block based on the prediction block of the current block and the original block of the current block, then performs a basic (primary) transform on the residual block, performs a secondary transform on the primary-transformed coefficients based on the first intra prediction mode, and then performs operations such as quantization, entropy coding and loop filtering on the secondary-transformed coefficients. It should be understood that the specific quantization process can be found in the relevant content above; to avoid repetition, it is not described again here.
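A sketch of how the first intra prediction mode can steer the secondary transform at the encoder (the mode-to-set mapping, the 16x16 kernel shape and the 4x4 low-frequency region are placeholder assumptions; only the control flow, a primary transform followed by a mode-dependent secondary transform of the low-frequency coefficients, mirrors the description):

```python
import numpy as np

def transform_set_index(intra_mode, num_sets=4):
    """Map an intra prediction mode to a secondary-transform set (illustrative mapping)."""
    if intra_mode <= 1:                                   # PLANAR / DC
        return 0
    return 1 + (intra_mode - 2) * (num_sets - 1) // 65    # coarse angular grouping

def encode_residual(residual, first_intra_mode, primary_fwd, lfnst_kernels):
    """primary_fwd: callable performing the basic (e.g. DCT-2) forward transform.
    lfnst_kernels: lfnst_kernels[set_idx] is a 16x16 matrix applied to the low-frequency part."""
    coeff = primary_fwd(residual)                          # basic / primary transform
    set_idx = transform_set_index(first_intra_mode)        # set chosen by the first intra prediction mode
    low = coeff[:4, :4].reshape(-1)                        # low-frequency coefficients only
    coeff[:4, :4] = (lfnst_kernels[set_idx] @ low).reshape(4, 4)  # secondary transform
    return coeff                                           # then quantization, entropy coding, ...
```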
  • the decoder parses the block-level type flag. If the block is in intra mode, the decoder parses or obtains the sequence-level enable flag, which indicates whether the current sequence is allowed to use the template-matching-based MIP mode derivation technology; the flag can take the form sps_tmmip_enable_flag. If the tmmip enable flag is true, the current decoder is allowed to use the TMMIP technology.
  • the decoder process can be implemented as the following process:
  • step 1: If the current decoder is allowed to use the TMMIP technology, the decoder parses the TMMIP usage flag of the current block; otherwise, the current decoding process does not need to decode the block-level TMMIP usage flag, and the block-level TMMIP usage flag defaults to false. If the TMMIP usage flag of the current block is true, perform step 2; otherwise, perform step 3.
  • the decoder fills the adjacent rows and columns outside the second template region with reconstructed samples.
  • the filling process is the same as the filling method in the original intra prediction process. For example, the decoder can traverse and fill from the lower left corner to the upper right corner: if all reconstructed samples are available, the available reconstructed samples are filled in sequence; if none of the reconstructed samples are available, all positions are filled with the mean (default) value; if only some of the reconstructed samples are available, the available reconstructed samples are filled in first, and for the remaining unavailable positions the decoder traverses in the above order from the lower left corner to the upper right corner until the first available reconstructed sample appears, after which the previously unavailable positions are filled with that first available reconstructed sample.
  • the decoder takes the filled reconstructed samples outside the second template area as input, and uses the allowable MIP mode to predict the samples in the second template area.
  • for a 4x4 block size, the number of allowed MIP modes is 8; for blocks of other sizes, the number of allowed MIP modes is 6.
  • blocks of any size can use the MIP transpose function, and the prediction process of the TMMIP modes mentioned above is the same as that of the MIP technology.
  • the specific prediction calculation process is as follows: the decoder first performs Haar downsampling on the reconstructed samples; for example, the decoder determines the downsampling step size based on the block size. Then, the decoder adjusts the splicing order of the downsampled upper reconstructed samples and the downsampled left reconstructed samples according to the transposition information; if transposition is not required, the downsampled left reconstructed samples are spliced after the downsampled upper reconstructed samples, and the resulting vector is used as the input.
  • the decoder then uses the traversed prediction mode as an index to obtain the MIP matrix coefficients and computes the output vector from the input. Finally, the decoder upsamples the output vector according to the number of output samples and the current template size: if upsampling is not required, the output vector is arranged in the horizontal direction and output as the template prediction block; if upsampling is required, the output is first upsampled in the horizontal direction and then upsampled in the vertical direction to the same size as the template, and is then output as the prediction block of the second template area.
  • the decoder uses DIMD technology to derive the optimal intra prediction mode, which is the optimal DIMD mode.
  • the DIMD technology calculates the gradient value of the reconstructed sample in the first template area based on the Sobel operator, and converts the gradient value based on the angle values of different prediction modes to obtain the amplitude value in the corresponding prediction mode.
  • the decoder traverses the template prediction blocks obtained from each MIP mode, calculates the distortion cost with the reconstructed samples in the template, and records the optimal MIP mode and transposition information according to the principle of minimum cost.
  • the decoder traverses all allowed intra-frame prediction modes, calculates the amplitude value in each intra-frame prediction mode, and records the optimal DIMD prediction mode according to the principle of maximum amplitude.
  • the decoder downsamples the reconstructed samples adjacent to the upper and left sides of the current block as needed and splices them according to the transposition information to form the input vector, uses the optimal MIP mode as the index to read the matrix coefficients of that mode, and then obtains the output vector by multiplying the input vector by the matrix coefficients.
  • the decoder can perform output transposition according to the transposition information, and upsample the output vector according to the size of the current block and the number of samples of the output vector, and obtain an output with the same size as the current block as the optimal MIP prediction block of the current block.
  • the decoder obtains the corresponding interpolation filter coefficients, and performs interpolation filtering on the upper and left adjacent reconstructed samples to obtain prediction samples at all positions in the current block, which are recorded as the optimal DIMD prediction block.
  • the decoder performs a weighted average of the optimal MIP prediction block and the optimal DIMD prediction block for each prediction sample according to the preset weight, and the new prediction block obtained is the prediction block of the current block.
  • the decoder determines that the first intra prediction mode is the intra prediction mode derived by applying the DIMD mode to the prediction block of the current block, or the decoder determines that the first intra prediction mode is the intra prediction mode derived by applying the DIMD mode to the reconstructed samples within the first template region.
  • the decoder continues to parse information such as usage flags or indexes of other intra prediction technologies, and obtains the final prediction block of the current block based on the parsed information.
  • the decoder parses the bitstream to obtain the frequency-domain residual block of the current block (also called the frequency-domain residual information), and performs inverse quantization and inverse transformation on it (based on the first intra prediction mode, it performs the inverse of the secondary transform and then the inverse of the basic transform, also called the primary transform) to obtain the residual block of the current block (also called the time-domain residual block or time-domain residual information); the decoder then superimposes the prediction block of the current block and the residual block of the current block to obtain the reconstructed sample block.
  • the reconstructed image can be used as video output or as a reference for subsequent decoding.
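On the decoder side the same mode-dependent transform set is applied in the inverse direction before the prediction block is added. A sketch under the same assumptions as the encoder-side example above (the inverse-transform callables, kernel shapes and 10-bit clipping are illustrative):

```python
import numpy as np

def transform_set_index(intra_mode, num_sets=4):
    """Same illustrative mode-to-set mapping as in the encoder-side sketch."""
    return 0 if intra_mode <= 1 else 1 + (intra_mode - 2) * (num_sets - 1) // 65

def reconstruct_block(freq_residual, prediction, first_intra_mode,
                      lfnst_inv_kernels, primary_inv, dequantize):
    """Inverse quantization, inverse secondary transform (set chosen by the first intra
    prediction mode), inverse primary transform, then superposition with the prediction."""
    coeff = dequantize(freq_residual)
    set_idx = transform_set_index(first_intra_mode)              # same mapping as the encoder
    low = coeff[:4, :4].reshape(-1)
    coeff[:4, :4] = (lfnst_inv_kernels[set_idx] @ low).reshape(4, 4)  # inverse secondary transform
    residual = primary_inv(coeff)                                # inverse basic/primary transform
    recon = np.asarray(prediction, dtype=np.int64) + residual
    return np.clip(recon, 0, 1023)                               # assumes 10-bit samples
```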
  • the calculation process of the optimal DIMD prediction block can be referred to the content described above in the introduction to DIMD technology. To avoid repetition, it will not be described again here.
  • the fusion weight of the optimal MIP prediction block and the optimal DIMD prediction block can be a preset value, for example, the optimal MIP prediction block accounts for 5/9, and the optimal DIMD prediction block accounts for 4/9.
  • the fusion weight of the optimal MIP prediction block and the optimal DIMD prediction block can also be other values, which is not specifically limited in this application.
  • the above-mentioned second template area and the above-mentioned first template area may be the same or different, and this application does not specifically limit this.
  • the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present application.
  • Figure 11 is a schematic block diagram of the decoder 500 according to the embodiment of the present application.
  • the decoder 500 may include:
  • the parsing unit 510 is configured to parse the code stream of the current sequence to obtain the first transform coefficient of the current block;
  • Transformation unit 520 used for:
  • where the first intra prediction mode includes any one of the following: an intra prediction mode derived by applying the decoder-side intra mode derivation (DIMD) mode to the prediction block of the current block; an intra prediction mode derived by applying the DIMD mode to the output vector of the optimal matrix-based intra prediction (MIP) mode used for predicting the current block; an intra prediction mode derived by applying the DIMD mode to the reconstructed samples within the first template region adjacent to the current block; or an intra prediction mode derived by the template-based intra mode derivation (TIMD) mode;
  • the reconstruction unit 530 is configured to determine a reconstruction block of the current block based on the prediction block of the current block and the residual block of the current block.
  • the output vector of the optimal MIP mode is the vector output by the optimal MIP mode before upsampling; or, the output vector of the optimal MIP mode is the upsampled vector output by the optimal MIP mode.
  • the transformation unit 520 is specifically used to:
  • the first intra prediction mode is determined based on the prediction mode used to predict the current block.
  • the transformation unit 520 is specifically used to:
  • if the prediction modes used to predict the current block include the optimal MIP mode and the suboptimal MIP mode used to predict the current block, determine the first intra prediction mode to be the intra prediction mode derived by applying the DIMD mode to the prediction block of the current block, or determine the first intra prediction mode to be the intra prediction mode derived by applying the DIMD mode to the output vector of the optimal MIP mode.
  • the transformation unit 520 is specifically used to:
  • if the prediction modes used to predict the current block include the optimal MIP mode and the intra prediction mode derived from the TIMD mode, determine the first intra prediction mode to be the intra prediction mode derived by applying the DIMD mode to the prediction block of the current block, or determine the first intra prediction mode to be the intra prediction mode derived from the TIMD mode.
  • the transformation unit 520 is specifically used to:
  • if the prediction modes used to predict the current block include the optimal MIP mode and an intra prediction mode derived by applying the DIMD mode to the reconstructed samples in the first template region, determine the first intra prediction mode to be the intra prediction mode derived by applying the DIMD mode to the prediction block of the current block, or determine the first intra prediction mode to be the intra prediction mode derived by applying the DIMD mode to the reconstructed samples in the first template region.
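The three determination rules above can be condensed into a single selection function; the string identifiers and the preference for the prediction-block variant (the description allows either alternative in each branch) are illustrative assumptions:

```python
def determine_first_intra_mode(pred_modes, derived):
    """pred_modes: set of tools used to predict the current block.
    derived: dict of candidate modes already derived elsewhere, e.g.
             {'dimd_on_pred_block': m1, 'dimd_on_mip_output': m2,
              'dimd_on_template': m3, 'timd': m4}."""
    if pred_modes == {'optimal_mip', 'suboptimal_mip'}:
        # Either alternative is allowed; this sketch prefers the prediction-block variant.
        return derived['dimd_on_pred_block']   # or derived['dimd_on_mip_output']
    if pred_modes == {'optimal_mip', 'timd'}:
        return derived['dimd_on_pred_block']   # or derived['timd']
    if pred_modes == {'optimal_mip', 'dimd_on_template'}:
        return derived['dimd_on_pred_block']   # or derived['dimd_on_template']
    return derived['dimd_on_pred_block']       # fallback assumption for other combinations
```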
  • the reconstruction unit 530 is also used to:
  • where the second intra prediction mode includes any one of the following: a suboptimal MIP mode used to predict the current block, an intra prediction mode derived by applying the DIMD mode to the reconstructed samples in the first template region, or an intra prediction mode derived from the TIMD mode;
  • the current block is predicted based on the optimal MIP mode and the second intra prediction mode to obtain a prediction block of the current block.
  • the reconstruction unit 530 is specifically used to:
  • the first prediction block and the second prediction block are weighted to obtain a prediction block of the current block.
  • before the reconstruction unit 530 performs weighting processing on the first prediction block and the second prediction block based on the weight of the optimal MIP mode and the weight of the second intra prediction mode to obtain the prediction block of the current block, the reconstruction unit 530 is further configured to:
  • if the prediction modes used to predict the current block include the optimal MIP mode and either a suboptimal MIP mode used to predict the current block or an intra prediction mode derived from the TIMD mode, determine the weight of the optimal MIP mode and the weight of the second intra prediction mode based on the distortion cost of the optimal MIP mode and the distortion cost of the second intra prediction mode;
  • if the prediction modes used to predict the current block include the optimal MIP mode and an intra prediction mode derived by applying the DIMD mode to the reconstructed samples in the first template region, determine that the weight of the optimal MIP mode and the weight of the second intra prediction mode are both preset values.
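A sketch of the two weighting branches; the cost-based formula mirrors the TIMD-style weighting described elsewhere in this application (the lower-cost mode receives the larger weight), and the preset branch reuses the 5/9 and 4/9 example split:

```python
def fusion_weights(pred_modes, cost_optimal_mip=None, cost_second_mode=None):
    """Return (weight_of_optimal_mip_mode, weight_of_second_intra_prediction_mode)."""
    if 'dimd_on_template' in pred_modes:
        return 5 / 9, 4 / 9                       # preset values (example split)
    # Cost-based branch: weight of a mode is proportional to the other mode's cost.
    total = cost_optimal_mip + cost_second_mode
    w_mip = cost_second_mode / total
    return w_mip, 1.0 - w_mip
```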
  • the transformation unit 520 is specifically used to:
  • the second intra prediction mode is determined.
  • the transformation unit 520 is specifically used to:
  • if the first identifier identifies that the optimal MIP mode and the second intra prediction mode are allowed to be used to predict the image blocks in the current sequence, parse the code stream to obtain the second identifier;
  • the second intra prediction mode is determined.
  • the reconstruction unit 530 is also used to:
  • the distortion cost of the multiple MIP modes includes the distortion cost obtained by using the multiple MIP modes to predict samples in the second template area adjacent to the current block.
  • the second template area and the first template area are the same or different.
  • the reconstruction unit 530 is specifically used to:
  • predict the samples in the second template area based on the third identifier and the multiple MIP modes to obtain the distortion costs of the multiple MIP modes in each state of the third identifier, where the third identifier is used to identify whether the input vector and the output vector of the MIP mode are transposed;
  • the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes in each state of the third identifier.
  • before determining the optimal MIP mode based on the distortion costs of the multiple MIP modes, the reconstruction unit 530 is further configured to:
  • the MIP modes used by the adjacent blocks are determined as the multiple MIP modes.
  • before determining the optimal MIP mode based on the distortion costs of the multiple MIP modes, the reconstruction unit 530 is further configured to:
  • Distortion costs for the plurality of MIP modes are determined based on the plurality of prediction blocks and reconstruction blocks within the second template region.
  • the reconstruction unit 530 is specifically used to:
  • the output vectors of the multiple MIP modes are upsampled to obtain prediction blocks corresponding to the multiple MIP modes.
  • the reconstruction unit 530 is specifically used to:
  • the optimal MIP mode is determined based on the sum of absolute transformed differences (SATD) of the multiple MIP modes over the second template area.
  • Figure 12 is a schematic block diagram of the encoder 600 according to the embodiment of the present application.
  • the encoder 600 may include:
  • Residual unit 610 used to obtain the residual block of the current block in the current sequence
  • Transformation unit 620 used for:
  • where the first intra prediction mode includes any one of the following: an intra prediction mode derived by applying the decoder-side intra mode derivation (DIMD) mode to the prediction block of the current block; an intra prediction mode derived by applying the DIMD mode to the output vector of the optimal matrix-based intra prediction (MIP) mode used for predicting the current block; an intra prediction mode derived by applying the DIMD mode to the reconstructed samples within the first template region adjacent to the current block; or an intra prediction mode derived by the template-based intra mode derivation (TIMD) mode;
  • Encoding unit 630 configured to encode the fourth transform coefficient.
  • the output vector of the optimal MIP mode is the vector output by the optimal MIP mode before upsampling; or, the output vector of the optimal MIP mode is the upsampled vector output by the optimal MIP mode.
  • the transformation unit 620 is specifically used to:
  • the first intra prediction mode is determined based on the prediction mode used to predict the current block.
  • the transformation unit 620 is specifically used to:
  • if the prediction modes used to predict the current block include the optimal MIP mode and the suboptimal MIP mode used to predict the current block, determine the first intra prediction mode to be the intra prediction mode derived by applying the DIMD mode to the prediction block of the current block, or determine the first intra prediction mode to be the intra prediction mode derived by applying the DIMD mode to the output vector of the optimal MIP mode.
  • the transformation unit 620 is specifically used to:
  • if the prediction modes used to predict the current block include the optimal MIP mode and the intra prediction mode derived from the TIMD mode, determine the first intra prediction mode to be the intra prediction mode derived by applying the DIMD mode to the prediction block of the current block, or determine the first intra prediction mode to be the intra prediction mode derived from the TIMD mode.
  • the transformation unit 620 is specifically used to:
  • if the prediction modes used to predict the current block include the optimal MIP mode and an intra prediction mode derived by applying the DIMD mode to the reconstructed samples in the first template region, determine the first intra prediction mode to be the intra prediction mode derived by applying the DIMD mode to the prediction block of the current block, or determine the first intra prediction mode to be the intra prediction mode derived by applying the DIMD mode to the reconstructed samples in the first template region.
  • the residual unit 610 is specifically used to:
  • where the second intra prediction mode includes any one of the following: a suboptimal MIP mode used to predict the current block, an intra prediction mode derived by applying the DIMD mode to the reconstructed samples in the first template region, or an intra prediction mode derived from the TIMD mode;
  • the residual unit 610 is specifically used to:
  • the first prediction block and the second prediction block are weighted to obtain a prediction block of the current block.
  • before the residual unit 610 performs weighting processing on the first prediction block and the second prediction block based on the weight of the optimal MIP mode and the weight of the second intra prediction mode to obtain the prediction block of the current block, the residual unit 610 is further configured to:
  • if the prediction modes used to predict the current block include the optimal MIP mode and either a suboptimal MIP mode used to predict the current block or an intra prediction mode derived from the TIMD mode, determine the weight of the optimal MIP mode and the weight of the second intra prediction mode based on the distortion cost of the optimal MIP mode and the distortion cost of the second intra prediction mode;
  • if the prediction modes used to predict the current block include the optimal MIP mode and an intra prediction mode derived by applying the DIMD mode to the reconstructed samples in the first template region, determine that the weight of the optimal MIP mode and the weight of the second intra prediction mode are both preset values.
  • the residual unit 610 is specifically used to:
  • if the first identifier identifies that the optimal MIP mode and the second intra prediction mode are allowed to be used to predict the image blocks in the current sequence, determine the second intra prediction mode;
  • the encoding unit 630 is specifically used for:
  • the fourth transform coefficient and the first identifier are encoded.
  • the residual unit 610 is specifically used to:
  • if the first identifier identifies that the optimal MIP mode and the second intra prediction mode are allowed to be used to predict the image blocks in the current sequence, predict the current block based on the optimal MIP mode and the second intra prediction mode to obtain a first rate-distortion cost;
  • determine the prediction block obtained by predicting the current block based on the optimal MIP mode and the second intra prediction mode to be the prediction block of the current block;
  • the encoding unit 630 is specifically used for:
  • if the first rate-distortion cost is not greater than the minimum value of the at least one rate-distortion cost, the second identifier identifies that the optimal MIP mode and the second intra prediction mode are used to predict the current block; if the first rate-distortion cost is greater than the minimum value of the at least one rate-distortion cost, the second identifier identifies that the optimal MIP mode and the second intra prediction mode are not used to predict the current block.
  • the residual unit 610 is also used to:
  • the distortion cost of the multiple MIP modes includes the distortion cost obtained by using the multiple MIP modes to predict samples in the second template area adjacent to the current block.
  • the second template area and the first template area are the same or different.
  • the residual unit 610 is specifically used to:
  • predict the samples in the second template area based on the third identifier and the multiple MIP modes to obtain the distortion costs of the multiple MIP modes in each state of the third identifier, where the third identifier is used to identify whether the input vector and the output vector of the MIP mode are transposed;
  • the optimal MIP mode is determined based on the distortion costs of the multiple MIP modes in each state of the third identifier.
  • before determining the optimal MIP mode based on the distortion costs of the multiple MIP modes, the residual unit 610 is further configured to:
  • the MIP modes used by the adjacent blocks are determined as the multiple MIP modes.
  • before determining the optimal MIP mode based on the distortion costs of the multiple MIP modes, the residual unit 610 is further configured to:
  • Distortion costs for the plurality of MIP modes are determined based on the plurality of prediction blocks and reconstruction blocks within the second template region.
  • the residual unit 610 is specifically used to:
  • the output vectors of the multiple MIP modes are upsampled to obtain prediction blocks corresponding to the multiple MIP modes.
  • the residual unit 610 is specifically used to:
  • the optimal MIP mode is determined based on the sum of absolute transformed differences (SATD) of the multiple MIP modes over the second template area.
  • the device embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, they will not be repeated here.
  • the decoder 500 shown in Figure 11 may correspond to the corresponding subject that performs the method 300 of the embodiments of the present application, and the foregoing and other operations and/or functions of each unit in the decoder 500 are respectively intended to implement the corresponding processes in the method 300 and the other methods.
  • the encoder 600 shown in Figure 12 may correspond to the corresponding subject that performs the method 400 of the embodiments of the present application, that is, the foregoing and other operations and/or functions of each unit in the encoder 600 are respectively intended to implement the corresponding processes in the method 400 and the other methods.
  • each unit in the decoder 500 or the encoder 600 involved in the embodiments of the present application may be separately or entirely combined into one or several other units, or some of the units may be further split into multiple functionally smaller units, which can achieve the same operations without affecting the realization of the technical effects of the embodiments of the present application.
  • the above units are divided based on logical functions. In practical applications, the function of one unit can also be realized by multiple units, or the functions of multiple units can be realized by one unit. In other embodiments of the present application, the decoder 500 or the encoder 600 may also include other units. In practical applications, these functions may also be implemented with the assistance of other units, and may be implemented by multiple units in cooperation.
  • in other embodiments, the decoder 500 or the encoder 600 involved in the embodiments of the present application may be constructed by running, on a general-purpose computing device that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM) and a read-only memory (ROM), a computer program capable of executing each step involved in the corresponding method, so as to implement the encoding method or the decoding method of the embodiments of the present application.
  • the computer program can be recorded on, for example, a computer-readable storage medium, loaded into an electronic device through the computer-readable storage medium, and run therein to implement the corresponding methods of the embodiments of the present application.
  • the units mentioned above can be implemented in the form of hardware, can also be implemented in the form of instructions in the form of software, or can be implemented in the form of a combination of software and hardware.
  • each step of the method embodiments in the embodiments of the present application can be completed by integrated logic circuits of hardware in the processor and/or instructions in the form of software.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of the hardware and software modules in a decoding processor.
  • the software can be located in a mature storage medium in this field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, register, etc.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps in the above method embodiment in combination with its hardware.
  • FIG. 13 is a schematic structural diagram of an electronic device 700 provided by an embodiment of the present application.
  • the electronic device 700 at least includes a processor 710 and a computer-readable storage medium 720 .
  • the processor 710 and the computer-readable storage medium 720 may be connected through a bus or other means.
  • the computer-readable storage medium 720 is used to store a computer program 721
  • the computer program 721 includes computer instructions
  • the processor 710 is used to execute the computer instructions stored in the computer-readable storage medium 720.
  • the processor 710 is the computing core and the control core of the electronic device 700. It is suitable for implementing one or more computer instructions. Specifically, it is suitable for loading and executing one or more computer instructions to implement the corresponding method flow or corresponding functions.
  • the processor 710 may also be called a central processing unit (Central Processing Unit, CPU).
  • the processor 710 may include, but is not limited to: a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) Or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the computer-readable storage medium 720 can be a high-speed RAM memory, or a non-volatile memory (Non-Volatile Memory), such as at least one disk memory; optionally, it can also be at least one located far away from the aforementioned processor 710 Computer-readable storage media.
  • the computer-readable storage medium 720 includes, but is not limited to: volatile memory and/or non-volatile memory.
  • the non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), or an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM).
  • Volatile memory may be Random Access Memory (RAM), which is used as an external cache.
  • by way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (SLDRAM), and Direct Rambus RAM (DR RAM).
  • the electronic device 700 may be the encoder or the coding framework involved in the embodiments of the present application; the computer-readable storage medium 720 stores first computer instructions; the first computer instructions stored in the computer-readable storage medium 720 are loaded and executed by the processor 710 to implement the corresponding steps in the encoding method provided by the embodiments of the present application; in other words, the first computer instructions in the computer-readable storage medium 720 are loaded by the processor 710 to execute the corresponding steps, which are not repeated here to avoid repetition.
  • the electronic device 700 may be the decoder or the decoding framework involved in the embodiments of the present application; the computer-readable storage medium 720 stores second computer instructions; the second computer instructions stored in the computer-readable storage medium 720 are loaded and executed by the processor 710 to implement the corresponding steps in the decoding method provided by the embodiments of the present application; in other words, the second computer instructions in the computer-readable storage medium 720 are loaded by the processor 710 to execute the corresponding steps, which are not repeated here to avoid repetition.
  • embodiments of the present application also provide a coding and decoding system, including the above-mentioned encoder and decoder.
  • embodiments of the present application also provide a computer-readable storage medium (Memory).
  • the computer-readable storage medium is a memory device in the electronic device 700 and is used to store programs and data.
  • computer-readable storage medium 720 may include a built-in storage medium in the electronic device 700 , and of course may also include an extended storage medium supported by the electronic device 700 .
  • the computer-readable storage medium provides storage space that stores the operating system of the electronic device 700 .
  • one or more computer instructions suitable for being loaded and executed by the processor 710 are also stored in the storage space. These computer instructions may be one or more computer programs 721 (including program codes).
  • a computer program product or computer program is provided, the computer program product or computer program including computer instructions stored in a computer-readable storage medium.
  • the data processing device 700 can be a computer.
  • the processor 710 reads the computer instructions from the computer-readable storage medium 720.
  • the processor 710 executes the computer instructions, so that the computer executes the encoding method or the decoding method provided in the various optional implementations described above.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.


Abstract

Embodiments of the present application provide a decoding method, an encoding method, a decoder, and an encoder, relating to the technical field of image and video encoding and decoding. In the decoding method provided by the embodiments of the present application, a first transform is performed on the first transform coefficients of the current block using the transform set corresponding to a first intra prediction mode, and the first intra prediction mode is designed to include any one of the following: an intra prediction mode derived by applying the DIMD mode to the prediction block of the current block, an intra prediction mode derived by applying the DIMD mode to the output vector of the optimal MIP mode used for predicting the current block, an intra prediction mode derived by applying the DIMD mode to the reconstructed samples within the first template region adjacent to the current block, or an intra prediction mode derived by the TIMD mode, which can improve the decompression performance of the current block.

Description

解码方法、编码方法、解码器以及编码器 技术领域
本申请实施例涉及图像视频编解码技术领域,并且更具体地,涉及解码方法、编码方法、解码器以及编码器。
背景技术
数字视频压缩技术主要是将庞大的数字影像视频数据进行压缩,以便于传输以及存储等。随着互联网视频的激增以及人们对视频清晰度的要求越来越高,尽管已有的数字视频压缩标准能够实现视频的解压缩技术,但目前仍然需要追求更好的数字视频解压缩技术,以在提升压缩效率。
发明内容
本申请实施例提供了一种解码方法、编码方法、解码器以及编码器,能够提升压缩效率。
第一方面,本申请实施例提供了一种解码方法,包括:
解析当前序列的码流获取当前块的第一变换系数;
确定第一帧内预测模式;
其中,所述第一帧内预测模式包括以下中的任一项:对所述当前块的预测块使用由解码器侧帧内模式导出DIMD模式导出的帧内预测模式、对用于预测所述当前块的最优基于矩阵的帧内预测MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式、对与所述当前块相邻的第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由基于模板的帧内模式导出TIMD模式导出的帧内预测模式;
基于所述第一帧内预测模式所对应的变换集,对所述第一变换系数进行第一变换,得到所述当前块的第二变换系数;
对所述第二变换系数进行第二变换,得到所述当前块的残差块;
基于所述当前块的预测块和所述当前块的残差块,确定所述当前块的重建块。
第二方面,本申请实施例提供了一种编码方法,包括:
获取当前序列中当前块的残差块;
对所述当前块的残差块进行第三变换,得到所述当前块的第三变换系数;
确定第一帧内预测模式;
其中,所述第一帧内预测模式包括以下中的任一项:对所述当前块的预测块使用由解码器侧帧内模式导出DIMD模式导出的帧内预测模式、对用于预测所述当前块的最优基于矩阵的帧内预测MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式、对与所述当前块相邻的第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由基于模板的帧内模式导出TIMD模式导出的帧内预测模式;
基于所述第一帧内预测模式所对应的变换集,对所述第三变换系数进行第四变换,得到所述当前块的第四变换系数;
对所述第四变换系数进行编码。
第三方面,本申请实施例提供了一种解码器,包括:
解析单元,用于解析当前序列的码流获取当前块的第一变换系数;
变换单元,用于:
确定第一帧内预测模式;
其中,所述第一帧内预测模式包括以下中的任一项:对所述当前块的预测块使用由解码器侧帧内模式导出DIMD模式导出的帧内预测模式、对用于预测所述当前块的最优基于矩阵的帧内预测MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式、对与所述当前块相邻的第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由基于模板的帧内模式导出TIMD模式导出的帧内预测模式;
基于所述第一帧内预测模式所对应的变换集,对所述第一变换系数进行第一变换,得到所述当前块的第二变换系数;
对所述第二变换系数进行第二变换,得到所述当前块的残差块;
重建单元,用于基于所述当前块的预测块和所述当前块的残差块,确定所述当前块的重建块。
第四方面,本申请实施例提供了一种编码器,包括:
残差单元,用于获取当前序列中当前块的残差块;
变换单元,用于:
对所述当前块的残差块进行第三变换,得到所述当前块的第三变换系数;
确定第一帧内预测模式;
其中,所述第一帧内预测模式包括以下中的任一项:对所述当前块的预测块使用由解码器侧帧内模式导出DIMD模式导出的帧内预测模式、对用于预测所述当前块的最优基于矩阵的帧内预测MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式、对与所述当前块相邻的第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由基于模板的帧内模式导出TIMD模式导出的帧内预测模式;
基于所述第一帧内预测模式所对应的变换集,对所述第三变换系数进行第四变换,得到所述当前块的第四变换系数;
编码单元,用于对所述第四变换系数进行编码。
第五方面,本申请实施例提供了一种解码器,包括:
处理器,适于实现计算机指令;以及,
计算机可读存储介质,计算机可读存储介质存储有计算机指令,计算机指令适于由处理器加载并执行上述第一方面或其各实现方式中的解码方法。
在一种实现方式中,该处理器为一个或多个,该存储器为一个或多个。
在一种实现方式中,该计算机可读存储介质可以与该处理器集成在一起,或者该计算机可读存储介质与处理器分离设置。
第六方面,本申请实施例提供了一种编码器,包括:
处理器,适于实现计算机指令;以及,
计算机可读存储介质,计算机可读存储介质存储有计算机指令,计算机指令适于由处理器加载并执行上述第二方面或其各实现方式中的编码方法。
在一种实现方式中,该处理器为一个或多个,该存储器为一个或多个。
在一种实现方式中,该计算机可读存储介质可以与该处理器集成在一起,或者该计算机可读存储介质与处理器分离设置。
第七方面,本申请实施例提供了一种计算机可读存储介质,该计算机可读存储介质存储有计算机指令,该计算机指令被计算机设备的处理器读取并执行时,使得计算机设备执行上述第一方面涉及的解码方法或上述第二方面涉及的编码方法。
第八方面,本申请实施例提供了一种码流,该码流上述第一方面中涉及所述的码流或上述第二方面中涉及的码流。
基于以上技术方案,通过引入第一帧内预测模式,并基于所述第一帧内预测模式对应的变换集对所述当前块的第一变换系数进行第一变换,能够提升所述当前块的解压缩性能。尤其是,解码器采用非传统的帧内预测模式对当前块进行预测时,可以避免直接使用平面模式对应的变换集进行第一变换,而所述第一帧内预测模式对应的变换集在一定程度上能够反映当前块的纹理方向,进而能够提升当前块的解压缩性能。
附图说明
图1是本申请实施例提供的编码框架的示意性框图。
图2是本申请实施例提供的MIP模式的示意图。
图3是本申请实施例提供的基于DIMD导出预测模式的示意图。
图4是本申请实施例提供的基于DIMD导出预测块的示意图。
图5是本申请实施例提供的TIMD使用的模板的示意图。
图6是本申请实施例提供的LFNST的示例。
图7是本申请实施例提供的LFNST的变换集的示例。
图8是本申请实施例提供的解码框架的示意性框图。
图9是本申请实施例提供的解码方法的示意性流程图。
图10是本申请实施例提供的编码方法的示意性流程图。
图11是本申请实施例提供的解码器的示意性框图。
图12是本申请实施例提供的编码器的示意性框图。
图13是本申请实施例提供的电子设备的示意性框图。
具体实施方式
下面将结合附图,对本申请实施例中的技术方案进行描述。
本申请实施例提供的方案可应用于数字视频编码技术领域,例如,包括但不限于:图像编解码领域、视频编解码领域、硬件视频编解码领域、专用电路视频编解码领域以及实时视频编解码领域。此外,本申请实施例提供的方案可结合至音视频编码标准(Audio Video coding Standard,AVS)、第二代AVS标准(AVS2)或第三代AVS标准(AVS3)。例如,包括但不限于:H.264/音视频编码(Audio Video coding,AVC)标准、H.265/高效视频编码(High Efficiency Video Coding,HEVC)标准以及H.266/多功能视频编码(Versatile Video Coding,VVC)标准。另外,本申请实施例提供的方案可以用于对图像进行有损压缩(lossy compression),也可以用于对图像进行无损压缩(lossless compression)。其中该无损压缩可以是视觉无损压缩(visually lossless compression),也可以是数学无损压缩(mathematically lossless compression)。
视频编解码标准都采用基于块的混合编码框架。具体地,视频中的每一张图像被分割成相同大小(如128x128,64x64等)的正方形的最大编码单元(largest coding unit,LCU)或编码树单元(Coding Tree Unit,CTU)。每个最大编码单元或编码树单元可根据规则划分成矩形的编码单元(coding unit,CU)。编码单元可能还会划分为预测单元(prediction unit,PU),变换单元(transform unit,TU)等。混合编码框架包括预测(prediction)、变换(transform)、量化(quantization)、熵编码(entropy coding)、环路滤波(in loop filter)等模块。预测模块包括帧内预测(intra prediction)和帧间预测(inter prediction)。帧间预测包括运动估计(motion estimation)和运动补偿(motion compensation)。由于视频的一张图像中的相邻像素之间存在很强的相关性,在视频编解码技术中使用帧内预测的方法消除相邻像素之间的空间冗余。帧内预测只参考同一张图像的信息,预测当前划分块内的像素信息。由于视频中的相邻图像之间存在着很强的相似性,在视频编解码技术中使用帧间预测方法消除相邻图像之间的时间冗余,从而提高编码效率。帧间预测可以参考不同帧的图像信息,利用运动估计搜索最匹配当前划分块的运动矢量信息。变换将预测后的图像块转换到频率域,能量重新分布,结合量化可以将人眼不敏感的信息去除,用于消除视觉冗余。熵编码可以根据当前上下文模型以及二进制码流的概率信息消除字符冗余。
在数字视频编码过程中,编码器可以先从原始视频序列中读取一幅黑白图像或彩色图像,然后针对黑白图像或彩色图像进行编码。其中,黑白图像可以包括亮度分量的像素,彩色图像可以包括色度分量的像素。可选的,彩色图像还可以包括亮度分量的像素。原始视频序列的颜色格式可以是亮度色度(YCbCr,YUV)格式或红绿蓝(Red-Green-Blue,RGB)格式等。具体地,编码器读取一幅黑白图像或彩色图像之后,分别将其划分成块,并对当前块使用帧内预测或帧间预测产生当前块的预测块,当前块的原始块减去预测块得到残差块,对残差块进行变换、量化得到量化系数矩阵,对量化系数矩阵进行熵编码输出到码流中。在数字视频解码过程中,解码端对当前块使用帧内预测或帧间预测产生当前块的预测块。此外,解码端解码码流得到量化系数矩阵,对量化系数矩阵进行反量化、反变换得到残差块,将预测块和残差块相加得到重建块。重建块可用于组成重建图像,解码端基于图像或基于块对重建图像进行环路滤波得到解码图像。
当前块(current block)可以是当前编码单元(CU)或当前预测单元(PU)等。
需要说明的是,编码端同样需要和解码端类似的操作获得解码图像。解码图像可以作为后续图像帧间预测的参考图像。编码端确定的块划分信息,预测、变换、量化、熵编码、环路滤波等模式信息或者参数信息,如果有必要需要在输出到码流中。解码端通过解析及根据已有信息进行分析确定与编码端相同的块划分信息,预测、变换、量化、熵编码、环路滤波等模式信息或者参数信息,从而保证编码端获得的解码图像和解码端获得的解码图像相同。编码端获得的解码图像通常也叫做重建图像。在预测时可以将当前块划分成预测单元,在变换时可以将当前块划分成变换单元,预测单元和变换单元的划分可以相同也可以不同。当然,上述仅是基于块的混合编码框架下的视频编解码器的基本流程,随着技术的发展,框架的一些模块或流程的一些步骤可能会被优化,本申请适用于该基于块的混合编码框架下的视频编解码器的基本流程。
为了便于理解,先对本申请提供的编码框架进行简单介绍。
图1是本申请实施例提供的编码框架100的示意性框图。
如图1所示,该编码框架100可包括帧内预测单元180、帧间预测单元170、残差单元110、变换与量化单元120、熵编码单元130、反变换与反量化单元140、以及环路滤波单元150。可选的,该编码框架100还可包括解码图像缓冲单元160。该编码框架100也可称为混合框架编码模式。
其中,帧内预测单元180或帧间预测单元170可对待编码图像块进行预测,以输出预测块。残差单元110可基于预测块与待编码图像块计算残差块,即预测块和待编码图像块的差值。变换与量化单元120用于对残差块执行变换与量化等操作,以去除人眼不敏感的信息,进而消除视觉冗余。可选的,经过变换与量化单元120变换与量化之前的残差块可称为时域残差块,经过变换与量化单元120变换与量 化之后的时域残差块可称为频率残差块或频域残差块。熵编码单元130接收到变换与量化单元120输出的变换量化系数后,可基于该变换量化系数输出码流。例如,熵编码单元130可根据目标上下文模型以及二进制码流的概率信息消除字符冗余。例如,熵编码单元130可以用于基于上下文的自适应二进制算术熵编码(CABAC)。熵编码单元130也可称为头信息编码单元。可选的,在本申请中,该待编码图像块也可称为原始图像块或目标图像块,预测块也可称为预测图像块或图像预测块,还可以称为预测信号或预测信息,重建块也可称为重建图像块或图像重建块,还可以称为重建信号或重建信息。此外,针对编码端,该待编码图像块也可称为编码块或编码图像块,针对解码端,该待编码图像块也可称为解码块或解码图像块。该待编码图像块可以是CTU或CU。
编码框架100将预测块与待编码图像块计算残差得到残差块经由变换与量化等过程,将残差块传输到解码端。相应的,解码端接收并解码码流后,经过反变换与反量化等步骤得到残差块,将解码端预测得到的预测块叠加残差块后得到重建块。
需要说明的是,编码框架100中的反变换与反量化单元140、环路滤波单元150以及解码图像缓冲单元160可用于形成一个解码器。相当于,帧内预测单元180或帧间预测单元170可基于已有的重建块对待编码图像块进行预测,进而能够保证编码端和解码端的对参考图像的理解一致。换言之,编码器可复制解码器的处理环路,进而可与解码端产生相同的预测。具体而言,量化的变换系数通过反变换与反量化单元140反变换与反量化来复制解码端的近似残差块。该近似残差块加上预测块后可经过环路滤波单元150,以平滑滤除由于基于块处理和量化产生的块效应等影响。环路滤波单元150输出的图像块可存储在解码图像缓存单元160中,以便用于后续图像的预测。
应理解,图1仅为本申请的示例,不应理解为对本申请的限制。
例如,该编码框架100中的环路滤波单元150可包括去块滤波器(deblocking filter,DBF)和样点自适应补偿(Sample Adaptive Offset,SAO)滤波。DBF的作用是去块效应,SAO的作用是去振铃效应。在本申请的其他实施例中,该编码框架100可采用基于神经网络的环路滤波算法,以提高视频的压缩效率。或者说,该编码框架100可以是基于深度学习的神经网络的视频编码混合框架。在一种实现中,可以在去块滤波器和样点自适应补偿滤波基础上,采用基于卷积神经网络的模型计算对像素滤波后的结果。环路滤波单元150在亮度分量和色度分量上的网络结构可以相同,也可以有所不同。考虑到亮度分量包含更多的视觉信息,还可以采用亮度分量指导色度分量的滤波,以提升色度分量的重建质量。
下面对帧内预测和帧间预测的相关内容进行说明。
对于帧间预测,帧间预测可以参考不同帧的图像信息,利用运动估计搜索最匹配待编码图像块的运动矢量信息,用于消除时间冗余;帧间预测所使用的帧可以为P帧和/或B帧,P帧指的是向前预测帧,B帧指的是双向预测帧。
对于帧内预测,帧内预测只参考同一张图像的信息,预测待编码图像块内的像素信息,用于消除空间冗余;帧内预测所使用的帧可以为I帧。例如,可根据从左至右、从上到下的编码顺序,待编码图像块可以参考左上方图像块,上方图像块以及左侧图像块作为参考信息来预测待编码图像块,而待编码图像块又作为下一个图像块的参考信息,如此,可对整幅图像进行预测。若输入的数字视频为彩色格式,例如YUV 4:2:0格式,则该数字视频的每一图像帧的每4个像素点由4个Y分量和2个UV分量组成,编码框架可对Y分量(即亮度块)和UV分量(即色度块)分别进行编码。类似的,解码端也可根据格式进行相应的解码。
针对帧内预测过程,帧内预测可借助角度预测模式与非角度预测模式对待编码图像块进行预测,以得到预测块,根据预测块与待编码图像块计算得到的率失真信息,筛选出待编码图像块最优的预测模式,并将该预测模式经码流传输到解码端。解码端解析出预测模式,预测得到目标解码块的预测块并叠加经码流传输而获取的时域残差块,可得到重建块。
经过历代的数字视频编解码标准发展,非角度预测模式保持相对稳定,有均值模式和平面模式。角度预测模式则随着数字视频编解码标准的演进而不断增加。以国际数字视频编码标准H系列为例,H.264/AVC标准仅有8种角度预测模式和1种非角度预测模式;H.265/HEVC扩展到33种角度预测模式和2种非角度预测模式。在H.266/VVC中,帧内预测模式被进一步拓展,对于亮度块共有67种传统预测模式和非传统的预测模式矩阵加权帧内预测(Matrix weighted intra-frame prediction,MIP)模式,其中这67种传统预测模式包括:平面(planar)模式、直流(DC)模式和65种角度预测模式。其中,平面模式通常用于处理一些纹理存在渐变的块,DC模式顾名思义通常用于处理一些平坦区域,而角度预测模式通常用于处理角度纹理比较明显的块。
需要说明的是,本申请中,用于帧内预测的当前块可以是正方形块,也可以是矩形块。
进一步的,由于帧内预测块都是正方形的所以各个角度预测模式使用的概率是相等的,当前块的长宽不等时,对于水平类的块(宽大于高)上边的参考像素使用概率大于左边参考像素的使用概率,对于 垂直类的块(高大于宽)上边的参考像素使用概率小于左边参考像素的使用概率。在对矩形块预测时,将传统的角度预测模式转换为宽角度预测模式,可利用宽角度预测模式对矩形块进行预测时,当前块的预测角度范围大于利用传统角度预测模式对矩形块进行预测时的预测角度范围。可选的,使用宽角度预测模式时,可以仍然使用传统角度预测模式的索引发出信号,相应的,解码端在在收到信号后可将传统角度预测模式再转换为宽角度预测模式,由此,帧内预测模式的总数和帧内模式的编码方法均可以保持不变。
进一步的,可以基于当前块的尺寸来确定或选择要执行的帧内预测模式;例如,可以基于当前块的尺寸来确定或选择宽角度预测模式对当前块进行帧内预测;例如,在当前块是矩形块(宽度和高度具有不同的尺寸)时,可以使用宽角度预测模式对当前块进行帧内预测。其中,当前块的宽高比可以用于确定宽角度预测模式被替换的角度预测模式和替换后的角度预测模式。例如在预测当前块时,可以选择具有不超过当前块的对角(从当前块的左下角到右上角)的角度的任何帧内预测模式,作为替换后的角度预测模式。
下面对本申请涉及的其他帧内预测模式进行介绍:
(1)、基于矩阵的帧内预测(Matrix based Intra Prediction,MIP)模式。
MIP模式也可称为矩阵加权帧内预测(Matrix weighted Intra Prediction)模式,MIP模式涉及的流程可以分为三个主要步骤,其分别是下采样过程、矩阵相乘过程以及上采样过程。具体来说,首先通过下采样过程下采样空间相邻重建样本,然后,将得到下采样后的样本序列作为矩阵相乘过程的输入向量,即将下采样过程的输出向量作为矩阵相乘过程的输入向量,与预先设定好的矩阵相乘并加上偏置向量,并输出计算之后的样本向量;将矩阵相乘过程的输出向量作为上采样过程的输入向量,通过上采样得到最终的预测块。
图2是本申请实施例提供的MIP模式的示意图。
如图2所示,MIP模式在下采样过程中通过平均当前编码单元上边相邻的重建样本后得到上相邻下采样重建样本向量bdry top,通过平均左相邻的重建样本后得到左相邻下采样重建样本向量bdry left。得到bdry top和bdry left后,将其作为矩阵相乘过程的输入向量bdry red,具体地,可通基于bdry red的顶行向量bdry top red、bdry left、A k·bdry red+b k得到样本向量,其中A k为预先设定好的矩阵,b k为预先设定好的偏置向量,k为MIP模式的索引。得到样本向量后通过线性插值对其进行上采样,以得到与实际编码单元样本数相符的预测样本块。
换言之,为了对一个宽度为W高度为H的块进行预测,MIP需要当前块左侧一列的H个重建像素和当前块上侧一行的W个重建像素作为输入。MIP按如下3个步骤生成预测块:参考像素平均(Averaging),矩阵乘法(Matrix Vector Multiplication)和插值(Interpolation)。其中MIP的核心是矩阵乘法,其可以认为是用一种矩阵乘法的方式用输入像素(参考像素)生成预测块的过程。MIP提供了多种矩阵,预测方式的不同可以体现在矩阵的不同上,相同的输入像素使用不同的矩阵会得到不同的结果。而参考像素平均和插值的过程是一种性能和复杂度折中的设计。对于尺寸较大的块,可以通过参考像素平均来实现一种近似于降采样的效果,使输入能适配到比较小的矩阵,而插值则实现一种上采样的效果。这样就不需要对每一种尺寸的块都提供MIP的矩阵,而是只提供一种或几种特定的尺寸的矩阵即可。随着对压缩性能的需求的提高,以及硬件能力的提高,下一代的标准中也许会出现复杂度更高的MIP。
对于MIP模式而言,MIP模式可以由神经网络简化而来,例如其采用的矩阵可以是基于训练得到,因此,MIP模式拥有较强的泛化能力和传统预测模式达不到的预测效果。MIP模式可以是对一个基于神经网络的帧内预测模型经过多次硬件和软件复杂度简化而得到的模型,在大量训练样本的基础上,多种预测模式代表着多种模型和参数,能够较好的覆盖自然序列的纹理情况。
MIP有些类似于平面模式,但显然MIP比平面模式更复杂,灵活性也更强。
需要说明的是,对于不同块尺寸的编码单元,MIP模式的个数可以有所不同。示例性地,对于4x4大小的编码单元,MIP模式有16种预测模式;对于8x8、宽等于4或高等于4的编码单元,MIP模式有8种预测模式;其他尺寸的编码单元,MIP模式有6种预测模式。同时,MIP模式有一个转置功能,对于符合当前尺寸的预测模式,MIP模式在编码器侧可以尝试转置计算。因此,MIP模式不仅需要一个使用标志位来表示当前编码单元是否使用MIP模式,同时,若当前编码单元使用MIP模式,则额外还需要传输一个转置标志位和索引标志位到解码器。转置标志位可由定长编码方式(Fixed Length,FL)二值化,长度为1。索引标志位由截断二进制编码方式(Truncated Binary,TB)二值化,以4x4大小的编码单元为例,MIP模式有16种预测模式,索引标志位可以是5或6位的截断二进制标识。
(2)、解码器侧帧内模式导出(Decoder side Intra Mode Derivation,DIMD)模式。
DIMD模式主要核心点在于帧内预测的模式在解码器使用与编码器相同的方法导出帧内预测模式, 以此避免在码流中传输当前编码单元的帧内预测模式索引,达到节省比特开销的目的。
DIMD模式的具体流程可分为以下两个主要步骤:
步骤1:导出预测模式。
图3是本申请实施例提供的基于DIMD导出预测模式的示意图。
如图3的(a)所示,DIMD利用重建区域中模板中的像素(当前块左侧和上侧的重建像素)导出预测模式。例如,模板可以包括当前块的上方三行相邻重建样本、左侧三列相邻重建样本以及左上方对应相邻重建样本,基于此,可按照窗口(例如图3的(a)所示或如图3的(b)所示的窗口)在模板内确定出多个相邻重建样本对应的多个梯度值,其中每一个梯度值可用于适配出与其梯度方向相适应的一种帧内预测模式(Intra prediction mode,IPM),基于此,编码器可将多个梯度值中最大和次大的梯度值适配的预测模式作为导出的预测模式。例如,如图3的(b)所示,对于4×4大小的块,针对所有需要确定梯度值的相邻重建样本进行分析并得到的对应的梯度直方图(histogram of gradients),例如,如图3的(c)所示,对于其它大小的块,对所有需要确定梯度值的相邻重建样本进行分析并得到的对应的梯度直方图;最终,将梯度直方图中梯度最大和次大的梯度所对应的预测模式作为导出的预测模式。
当然,本申请中的梯度直方图仅为用于确定导出的预测模式的示例,具体实现时可以用多种简单的形式实现,本申请对此不作具体限定。此外,本申请对统计梯度直方图的方式不作限定,例如,可以利用可利用索贝尔算子或其他方式统计梯度直方图。另外,在其他可替代实施例中,本申请涉及的梯度值也可等同替换为梯度幅度值,本申请对此不作具体限定。
步骤2:导出预测块。
图4是本申请实施例提供的基于DIMD导出预测块的示意图。
如图4所示,编码器可对3个帧内预测模式(平面模式以及基于DIMD导出的2个帧内预测模式)的预测值进行加权。编解码器使用同样的预测块导出方式得到当前块的预测块。假设最大梯度值对应的预测模式为预测模式1,次大梯度值对应的预测模式为预测模式2,编码器判断以下两个条件:
1、预测模式2的梯度不为0;
2、预测模式1和预测模式2均不为平面模式或者DC预测模式。
若上述两个条件不同时成立,则仅使用预测模式1计算当前块的预测样本值,即对预测模式1应用普通预测预测过程;否则,即上述两个条件均成立,则使用加权求平均方式导出当前块的预测块。具体方法为:平面模式占据1/3的加权权重,剩下2/3为预测模式1和预测模式2的总权重,例如将预测模式1的梯度幅度值除以预测模式1的梯度幅度值和预测模式2的梯度幅度值的和作为预测模式1的加权权重,将预测模式2的梯度幅度值除以预测模式1的梯度幅度值和预测模式2的梯度幅度值的和作为预测模式2的加权权重;将基于上述三种预测模式得到的预测块,即对基于平面模式、预测模式1和预测模式2分别得到的预测块1、预测块2以及预测块3进行加权求平均得到当前编码单元的预测块。解码器以同样步骤得到预测块。
换言之,上述步骤2中具体权重计算如下:
Weight(PLANAR)=1/3;
Weight(mode1)=2/3*(amp1/(amp1+amp2));
Weight(mode2)=1–Weight(PLANAR)–Weight(mode1);
其中,mode1和mode2分别代表预测模式1和预测模式2,amp1和amp2分别代表预测模式1的梯度幅度值和预测模式2的梯度幅度值。DIMD模式需要传输一个标志位传输到解码器,该标志位用于表示当前编码单元是否使用DIMD模式。
当然,上述加权求平均方式仅为本申请的示例,不应理解为对本申请的限定。
总结来说,DIMD利用重建像素的梯度分析来筛选帧内预测模式,而且可以将2个帧内预测模式再加上平面模式根据分析结果进行加权。DIMD的好处在于如果当前块选择了DIMD模式,那么码流中不需要再去指示具体使用了哪种帧内预测模式,而是由解码器自己通过上述流程导出,一定程度上节省了开销。
(3)、基于模板的帧内模式导出(Template based Intra Mode Derivation,TIMD)模式。
TIMD模式的技术原理与上述DIMD模式的技术原理比较近似,都是利用编解码器同样操作导出预测模式的方式来节省传输模式索引开销。TIMD模式主要可以理解成两个主要部分,首先,根据模板计算各预测模式的代价信息,最小代价及次小代价对应的预测模式将被选中,最小代价对应的预测模式记为预测模式1,次小代价对应的预测模式记为预测模式2;若次小代价的数值(costMode2)与最小代价的数值(costMode1)比例满足预设条件,如costMode2<2*costMode1,则将预测模式1与预测模式2各对应的预测块可按照预测模式1与预测模式2各对应的权重进行加权融合,进而得到最终的预测块。
示例性地,预测模式1与预测模式2各对应的权重根据以下方式确定:
weight1=costMode2/(costMode1+costMode2);
weight2=1-weight1;
其中,weight1即为预测模式1对应预测块的加权权重,weight2即为预测模式2对应预测块的加权权重。但若次小代价的数值costMode2与最小代价的数值costMode1比例不满足预设条件,则不做预测块之间的加权融合,预测模式1对应预测块即为TIMD的预测块。
需要说明的是,若采用TIMD模式对当前块进行帧内预测时,若当前块的重建样本模板中不包含可用相邻重建样本,则TIMD模式选择平面模式对当前块进行帧内预测,即不进行不加权融合。与DIMD模式相同,TIMD模式需要传输一个标志位到解码器,以表示当前编码单元是否使用TIMD模式。
编码器或解码器计算各预测模式的代价信息过程主要为:根据与模板区域的上侧或左侧相邻重建样本对模板区域内的样本进行帧内模式预测,预测过程与原有帧内预测模式相同;例如利用DC模式对模板区域内的样本进行帧内模式预测时,计算整个编码单元的均值;再如利用角度预测模式对模板区域内的样本进行帧内模式预测时,根据模式选择对应的插值滤波器并根据规则插值出预测样本。此时,可根据模板区域内的预测样本和重建样本,计算该区域预测样本与重建样本之间的失真,即为当前预测模式的代价信息。
图5是本申请实施例提供的TIMD使用的模板的示意图。
如图5所示,若当前块为宽等于M且高等于N的编码单元,编解码器可以基于宽等于2(M+L1)+1且高等于2(N+L2)+1的编码单元中选择当前块的参考模板(Reference of template)计算当前块的模板,若当前块的模板中不包含可用相邻重建样本,则TIMD模式选择平面模式对当前块进行帧内预测。例如,所述当前块的模板可以是图5中与当前CU的左侧和上侧相邻的样本,即斜线填充区域中没有可用重建样本。也即是说,若斜线填充区域中没有可用相邻重建样本,则TIMD模式选择平面模式对当前块进行帧内预测。
值得注意的是,除了边界情况,在编解码当前块时,当前块的左侧和上侧理论上是可以得到重建值的,即当前块的模板中包含可用相邻重建样本。具体实现中,解码器可以使用某一个帧内预测模式在模板上进行预测,并且将预测值和重建值进行比较,以得到该帧内预测模式在模板上的代价。比如绝对误差和(Sum of Absolute Differences,SAD),绝对变换差和(Sum of Absolute Transformed Difference,SATD)或误差平方和(Sum of Squares for Error,SSE)等。由于模板和当前块是相邻的,因此,模板内的重建样本和当前块内的像素具有相关性,因此,可以用一个预测模式在模板上的表现来估计这个预测模式在当前块上的表现。TIMD将一些候选的帧内预测模式在模板上进行预测,得到候选的帧内预测模式在模板上的代价,取代价最低的一个或2个帧内预测模式作为当前块的帧内预测值。如果2个帧内预测模式在模板上的代价差距不大,将2个帧内预测模式的预测值进行加权平均可以得到压缩性能的提升。可选的,2个预测模式的预测值的权重跟上述的代价有关,例如权重跟代价成反比。
总结来说,TIMD利用帧内预测模式在模板上的预测效果来筛选帧内预测模式,而且可以将2个帧内预测模式根据模板上的代价进行加权。TIMD的好处在于如果当前块选择了TIMD模式,则码流中不需要再去指示具体使用了哪种帧内预测模式,而是由解码器自己通过上述流程导出,一定程度上节省了开销。
通过上述简单地对几个帧内预测模式的介绍不难发现,DIMD模式的技术原理和TIMD模式的技术原理接近,都是利用解码器执行与编码器相同的操作来推断出当前编码单元的预测模式。这种预测模式在复杂度可接受的情况下能够省去对预测模式的索引的传输,达到节省开销的作用,提高压缩效率。但受限于可参考信息的局限性和本身并没有太多提高预测质量的部分,DIMD模式和TIMD模式在大面积纹理特性一致的区域效果较好,若纹理略有变化或者模板区域不能覆盖,则这种预测模式的预测效果较差。
此外,不管针对DIMD模式还是针对TIMD模式,其都对基于多种传统预测模式得到的预测块进行了融合或者都对基于多种传统预测模式得到的预测块进行了加权处理,预测块的融合可以生成单一预测模式所达不到的效果,DIMD模式虽然引入平面模式作为额外加权预测模式,以增加相邻重建样本与预测样本之间的空间关联性,进而能够提升帧内预测的预测效果,但是,由于平面模式的预测原理相对简单,对于一些右上角与左下角相差明显的预测块,将平面模式作为额外加权预测模式可能会带来反作用。
下面对与残差块进行变换相关的内容进行说明。
在进行编码时,会先对当前块进行预测,预测利用空间或者时间上的相关性能得到一个跟当前块相同或相似的图像。对一个块来说,预测块和当前块是完全相同的情况是有可能出现的,但是很难保证一个视频中的所有块都如此。特别是对自然视频,或者说相机拍摄的视频,因为图像的纹理复杂,且图像有噪音的存在等因素,通常预测块和当前块很像,但是有差异。而且视频中不规则的运动,扭曲形变, 遮挡,亮度等的变化,当前块很难被完全预测。因此混合编码框架会将当前块的原始图像减去预测图像得到残差图像,或者说当前块减去预测块得到残差块。残差块通常要比原始图像简单很多,因而预测可以显著提升压缩效率。对残差块也不是直接进行编码,而是通常先进行变换。变换是把残差图像从空间域变换到频率域,去除残差图像的相关性。残差图像变换到频率域以后,由于能量大多集中在低频区域,变换后的非零系数大多集中在左上角,然后利用量化对残差块进行进一步压缩。可选的,由于人眼对高频不敏感,高频区域可以使用更大的量化步长。
图像变换技术是为了能够用正交函数或正交矩阵表示原始图像而对原图像所作的变换,该变换是二维线性可逆的。一般称原始图像为空间域图像,称变换后的图像为转换域图像(也称为频率域),转换域图像可反变换为空间域图像。经过图像变换后,一方面能够更有效地反映图像自身的特征,另一方面也可使能量集中在少量数据上,更有利于图像的存储、传输及处理。
下面对本申请涉及的与变换有关的技术进行说明。
结合至图像视频编码领域,编码器在得到残差块后,可对残差块进行变换。变换的方式包括主变换和二次变换。主变换的方式包括但不限于:离散余弦变换(Discrete Cosine Transform,DCT)和离散正弦变换(Discrete Sine Transform,DST)。视频编解码中可使用的DCT包括但不限于DCT2、DCT8型;视频编解码中可使用的DST包括但不限于DST7型。由于DCT具有很强的能量集中特性,因此,原始图像经过DCT变换以后只有部分区域(例如左上角区域)存在非零系数。当然,在视频编解码中,图像是分割成块来处理的,因而变换也是基于块来进行的。
值得注意的是,由于图像都是2维的,而直接进行二维的变换运算量和内存开销都是硬件条件所不能接受的,因此,上述DCT2型,DCT8型,DST7型变换通常都是拆分成水平方向和竖直方向的一维变换,即分成两步进行的。如先进行水平方向的变换再进行竖直方向的变换,或者先进行竖直方向的变换再进行水平方向的变换。上述变换方法对水平方向和竖直方向的纹理比较有效,但是对斜向的纹理效果就会差一些。由于水平和竖直方向的纹理是最常见的,因而,上述的变换方法对提升压缩效率是非常有用的。
编码器可以在主变换(primary transform)的基础上进行二次变换,以进一步提升压缩效率,。
主变换可用于处理水平和竖直方向的纹理,主变换也可称为基础变换,例如主变换包括但不限于:上述DCT2型,DCT8型,DST7型变换。二次变换用于处理斜向纹理,例如,二次变换包括但不限于:低频不可分变换(low frequency non-separable transform,LFNST)。在编码端,二次变换用于主变换之后且量化之前。在解码端,二次变换用于反量化之后且反主变换之前。
图6是本申请实施例提供的LFNST的示例。
如图6所示,在编码端,LFNST对基础变换后的左上角的低频系数进行二次变换。主变换通过对图像进行去相关性,把能量集中到左上角。而二次变换对主变换的低频系数再去相关性。在编码端,将16个系数输入到4x4的LFNST时,输出的是8个系数;将64个系数输入到8x8的LFNST时,输出的是16个系数。在解码端,将8个系数输入到4x4的反LFNST时,输出的是16个系数;将16个系数输入到8x8的反LFNST时,输出的是64个系数。
编码器对当前图像中的当前块进行二次变换时,可以采用选择的变换集中的某一个变换核对当前块的残差块进行变换。以二次变换为LFNST为例,变换核可以指用于对某个斜向纹理进行变换的变换核的集合,或变换集可以包括用于对某些类似的斜向纹理进行变换的变换核的集合。当然,在其他可替代实施例中,变换核也可称为或等同替换为变换矩阵、变换核类型或基函数等具有类似或相同含义的术语,变换集也可称为或等同替换为变换矩阵组、变换核类型组或基函数组等具有类似或相同含义的术语,本申请对此不作具体限定。
图7是本申请实施例提供的LFNST的变换集的示例。
如图7的(a)至(d)所示,LFNST可具有4个变换集,且同一个变换集中的变换核具有类似的斜向纹理。示例性地,图7中的(a)所示的变换集可以是索引为0的变换集,图7中的(b)所示的变换集可以是索引为1的变换集,图7中的(c)所示的变换集可以是索引为2的变换集,图7中的(d)所示的变换集可以是索引为3的变换集。
下面对LFNST应用于帧内编码的块的相关方案进行说明。
帧内预测使用当前块周边已重建的像素作为参考对当前块进行预测,由于目前视频都是从左向右从上向下编码的,因而当前块可使用的参考像素通常在左侧和上侧。角度预测按照指定的角度将参考像素平铺到当前块作为预测值,这意味着预测块会有明显的方向纹理,而当前块经过角度预测后的残差在统计上也会体现出明显的角度特性。因而,LFNST所选用的变换集可以跟帧内预测模式进行绑定,即确定了帧内预测模式以后,LFNST可以使用纹理方向与帧内预测模式的角度特征相适应的变换集(Transform set),以可以节省比特开销。
示例性地,假设LFNST总共有4个变换集,每个变换集有2个变换核。表1给出了帧内预测模式和变换集的对应关系。
表1
Figure PCTCN2022103654-appb-000001
如表1所示,帧内预测模式0~81可与4个变换集的索引关联。
值得注意的是,色度帧内预测使用的跨分量预测模式为81到83,亮度帧内预测并没有这几种模式。LFNST的变换集可以通过转置来用一个变换集对应处理更多的角度,举例来说,帧内预测模式13~23和帧内预测模式45~55的模式都对应变换集2,但是,帧内预测模式13~23明显是接近于水平的模式,而帧内预测模式45~55明显是接近于竖直的模式,帧内预测模式45~55的模式对应的变换通过转置来进行适配。
在具体实现中,由于LFNST共有4个变换集,编码端可根据当前块使用的帧内预测模式确定LFNST使用哪一个变换集,进而在确定的一个变换集中确定使用的变换核。相当于,可以利用帧内预测模式和LFNST的变换集之间的相关性,从而减少了选择LFNST的变换集在码流中的传输。而当前块是否会使用LFNST,以及如果使用LFNST,是使用一个变换集中的第一个还是第二个变换核,可以通过码流和一些条件来确定的。
当然,考虑到普通帧内预测模式有67种,而LFNST只有4个变换集,因此,多种相近的角度预测模式只能对应一个变换集,这是一种性能和复杂度折中的考虑,因为每个变换集都需要占用存储空间来保存变换集中的变换核的系数。而随着对压缩效率要求的提升,以及硬件能力的提升,LFNST也可以设计的更加复杂。比如使用更大的变换集,更多的变换集,以及每个变换集使用更多的变换核。
示例性地,表2给出了帧内预测模式和变换集的另一种对应关系。
表2
帧内预测模式 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1
变换集的索引 2 2 2 2 2 2 2 2 2 2 2 2 2 2
帧内预测模式 0 1 2 3 4 5 6 7 8 9 10 11 12 13
变换集的索引 0 1 2 3 4 5 6 7 8 9 10 11 12 13
帧内预测模式 14 15 16 17 18 19 20 21 22 23 24 25 26 27
变换集的索引 14 15 16 17 18 19 20 21 22 23 24 25 26 27
帧内预测模式 28 29 30 31 32 33 34 35 36 37 38 39 40 41
变换集的索引 28 29 30 31 32 33 34 33 32 31 30 29 28 27
帧内预测模式 42 43 44 45 46 47 48 49 50 51 52 53 54 55
变换集的索引 26 25 24 23 22 21 20 19 18 17 16 15 14 13
帧内预测模式 56 57 58 59 60 61 62 63 64 65 66 67 68 69
变换集的索引 12 11 10 9 8 7 6 5 4 3 2 2 2 2
帧内预测模式 70 71 72 73 74 75 76 77 78 79 80      
变换集的索引 2 2 2 2 2 2 2 2 2 2 2      
如表2所示,使用35个变换集,每个变换集使用3个变换核。变换集与帧内预测模式的对应关系可以实现为:对于帧内预测模式0~34,正向对应变换集0~34,即预测模式的编号越大则变换集的索引越大;对于帧内预测模式35~67,由于转置的原因,其反向对应2~33,即预测模式的编号越大则变换集的索引越小;对于剩余的预测模式,其均可以统一对应到索引为2的变换集。也就是说,如果不考虑转置,一种帧内预测模式对应一个变换集,按这种设计,每一个帧内预测模式对应的残差可以得到更适配的变换集,压缩性能也会提高。
当然,宽角度预测模式理论上也可以与变换集做到一对一,但是这种设计的性价比较低,本申请对此不再作具体说明。需要指出的是,LFNST仅是二次变换的一种示例,不应理解为对二次变换的限制。 例如,LFNST是不可分离的二次变换,在其他可替代实施例中,还可以采用可分离的二次变换提升斜向纹理的残差的压缩效率,本申请对此不作具体限定。
图8是本申请实施例提供的解码框架200的示意性框图。
如图8所示,该解码框架200可包括熵解码单元210、反变换反量化单元220、残差单元230、帧内预测单元240、帧间预测单元250、环路滤波单元260、解码图像缓存单元270。其中,熵解码单元210接收并解析码流后,获取预测块和频域残差块,针对频域残差块,通过反变换反量化单元220进行反变换与反量化等步骤,可获取时域残差块,残差单元230将帧内预测单元240或帧间预测单元250预测得到的预测块叠加至经反变换反量化单元220进行反变换与反量化之后得到的时域残差块,可得到重建块。
图9是本申请实施例提供的解码方法300的示意性流程图。应理解,该解码方法300可由解码器执行。例如该解码方法300可由图8所示的解码框架200执行。为便于描述,下面以解码器为例进行说明。
如图9所示,所述解码方法300可包括:
S310,解析当前序列的码流获取当前块的第一变换系数;
S320,确定第一帧内预测模式;
其中,所述第一帧内预测模式包括以下中的任一项:对所述当前块的预测块使用由解码器侧帧内模式导出DIMD模式导出的帧内预测模式、对用于预测所述当前块的最优基于矩阵的帧内预测MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式、对与所述当前块相邻的第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由基于模板的帧内模式导出TIMD模式导出的帧内预测模式;
S330,基于所述第一帧内预测模式所对应的变换集,对所述第一变换系数进行第一变换,得到所述当前块的第二变换系数;
S340,对所述第二变换系数进行第二变换,得到所述当前块的残差块;
S350,基于所述当前块的预测块和所述当前块的残差块,确定所述当前块的重建块。
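作为对S310至S350的直观示意,下面给出一个高度简化的Python草图:其中的反LFNST核与"帧内预测模式到变换集"的映射函数均为占位性的假设,仅用于展示"按第一帧内预测模式选择变换集、先做第一变换(反二次变换)、再做第二变换(反主变换)、最后与预测块相加得到重建块"的先后顺序,不能用于解码真实码流。
```python
import numpy as np

rng = np.random.default_rng(1)
NUM_SETS = 4
# 占位的反 LFNST 核:每个变换集一个 16x8 矩阵(随机数据,仅作示意)
inv_lfnst_kernels = [rng.standard_normal((16, 8)) for _ in range(NUM_SETS)]

def idct2_matrix(n: int) -> np.ndarray:
    m = np.array([[np.cos(np.pi * k * (2 * j + 1) / (2 * n)) for j in range(n)] for k in range(n)])
    m[0] *= np.sqrt(1 / n)
    m[1:] *= np.sqrt(2 / n)
    return m.T  # 正交变换的反变换矩阵为正变换矩阵的转置

def transform_set_of(first_intra_mode: int) -> int:
    return abs(first_intra_mode) % NUM_SETS   # 简化的"帧内预测模式 -> 变换集索引"映射,仅作占位

def decode_block(first_coeff_8, first_intra_mode, pred_4x4):
    set_idx = transform_set_of(first_intra_mode)                               # 按第一帧内预测模式选变换集
    second_coeff = (inv_lfnst_kernels[set_idx] @ first_coeff_8).reshape(4, 4)  # S330:第一变换(反二次变换)
    t = idct2_matrix(4)
    residual = t @ second_coeff @ t.T                                          # S340:第二变换(反主变换)
    return np.clip(pred_4x4 + residual, 0, 255)                                # S350:预测块 + 残差块 -> 重建块

print(decode_block(rng.standard_normal(8), 18, np.full((4, 4), 128.0)).shape)
```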
示例性地,解码器对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式时,可以先计算所述第一模板区域内的重建样本的梯度值,然后将与所述第一模板区域内的重建样本中梯度值最大的重建样本的梯度方向相匹配的帧内预测模式,确定为使用所述DIMD模式导出的帧内预测模式。或者说,解码器可以基于所述第一模板区域内的重建样本,通过遍历帧内预测模式的方式,计算每种帧内预测模式对应的梯度值,并将梯度值最大的帧内预测模式,确定为使用所述DIMD模式导出的帧内预测模式。
示例性地,解码器对所述当前块的预测块(或所述最优MIP模式的输出向量)使用所述DIMD模式导出的帧内预测模式时,可以先计算所述当前块的预测块(或所述最优MIP模式的输出向量)内的预测样本的梯度值,然后将与所述当前块的预测块(或所述最优MIP模式的输出向量)内的预测样本中梯度值最大的预测样本的梯度方向相匹配的帧内预测模式,确定为使用所述DIMD模式导出的帧内预测模式。或者说,解码器可以基于所述当前块的预测块(或所述最优MIP模式的输出向量)内的预测样本,通过遍历帧内预测模式的方式,计算每种帧内预测模式对应的梯度值,并将梯度值最大的帧内预测模式,确定为使用所述DIMD模式导出的帧内预测模式。
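下面给出一个按上述思路实现的示意性Python片段:对一块样本(可以是第一模板区域内的重建样本,也可以是当前块的预测块或最优MIP模式输出向量中的预测样本)逐点计算索贝尔梯度,把梯度方向映射到一组假设的角度模式并累计梯度幅度,幅度最大的模式即作为DIMD导出的帧内预测模式;其中角度模式的个数与映射方式均为示意性假设。
```python
import numpy as np

NUM_ANGULAR_MODES = 65  # 假设的角度模式个数(示意)

def derive_dimd_mode(samples: np.ndarray) -> int:
    """逐点计算索贝尔梯度,按方向直方图选出累计幅度最大的角度模式(示意实现)。"""
    hist = np.zeros(NUM_ANGULAR_MODES)
    h, w = samples.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = samples[y - 1:y + 2, x - 1:x + 2]
            gx = float(np.sum(win * np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])))
            gy = float(np.sum(win * np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])))
            if gx == 0 and gy == 0:
                continue
            angle = np.arctan2(gy, gx) % np.pi                            # 纹理方向(0~pi)
            mode = int(round(angle / np.pi * (NUM_ANGULAR_MODES - 1)))    # 映射到角度模式(示意)
            hist[mode] += abs(gx) + abs(gy)                               # 以梯度幅度累计
    return int(np.argmax(hist))

print(derive_dimd_mode(np.tile(np.arange(8.0), (8, 1))))  # 以水平渐变纹理为输入的简单示例
```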
示例性地,所述第一变换用于处理所述当前块中沿倾斜方向上的纹理。
示例性地,所述第二变换用于处理所述当前块中沿水平方向上的纹理和沿竖直方向上的纹理。
应当理解,所述第一变换为编码端的二次变换的反变换,所述第二变换为编码端的基础变换的反变换。例如,所述第一变换可以是反(逆)LFNST,所述第二变换可以是反(逆)DCT2型,反(逆)DCT8型或反(逆)DST7型等。
当然,TMMIP技术与LFNST的适配方法也适用于其他的二次变换方法。例如,LFNST是不可分离的二次变换,在其他可替代实施例中,TMMIP技术也可适用于可分离的二次变换,本申请对此不作具体限定。
值得注意的是,编码器或解码器对当前块进行预测时,有可能使用PLANAR模式对应的变换集进行LFNST,究其原因在于:LFNST使用的变换核是由传统的帧内预测模式的数据集进行深度学习训练得到,因此,在普通的帧内预测过程中,LFNST使用的变换核通常也是在传统的帧内预测模式对应的LFNST的变换集中选择的变换核。然而,编码器或解码器有可能采用非传统的帧内预测模式对当前块进行预测,此时考虑到平面(planar)模式通常用于处理一些纹理存在渐变的块,而LFNST用于处理斜向纹理,因此,通常会将平面模式输出的预测块的纹理信息和传统的帧内预测模式中的平面(planar)模式的纹理信息作为一类纹理处理,即编码器或解码器利用非传统的帧内预测模式对当前块进行预测时,都使用平面模式对应的变换集进行LFNST。例如,编码器采用MIP模式对当前块预测时,使用平面模式对应的变换集进行LFNST。然而,由于MIP模式所表示的意义与传统的帧内预测模式不同,即传统的帧内预测模式带有明显的方向性,而MIP模式仅仅是矩阵系数的索引,因此,平面模式虽然用于处理一些纹理存在渐变的块,但并不一定符合当前块的纹理信息,即LFNST使用的变换集的纹理方向并不一定符合当前块的纹理方向,降低了当前块的解压缩性能。
有鉴于此,本申请实施例通过引入第一帧内预测模式,并基于所述第一帧内预测模式对应的变换集对所述当前块的第一变换系数进行第一变换,能够进一步提升所述当前块的解压缩性能。尤其是,解码器采用非传统的帧内预测模式对当前块进行预测时,可以避免直接使用平面模式对应的变换集进行第一变换,而所述第一帧内预测模式对应的变换集在一定程度上能够反映当前块的纹理方向,进而能够提升当前块的解压缩性能。
下面结合表1至表2的测试结果对本申请提供的方案的有益效果进行说明。
其中,表1是利用所述最优MIP模式和次优MIP模式对当前块进行加权预测、且将所述第一帧内预测模式设计为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式时,对测试序列进行的测试得到的结果,表2是利用所述最优MIP模式和对与所述当前块相邻的第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式对当前块进行加权预测、且将所述第一帧内预测模式设计为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式时,对测试序列进行的测试得到的结果。
表1
(表1的测试结果以图像形式嵌入原文,给出通测条件下各测试序列的BD-rate数据,此处未逐项还原。)
表2
(表2的测试结果以图像形式嵌入原文,给出通测条件下各测试序列的BD-rate数据,此处未逐项还原。)
如表1至表2所示,增量比特率(BD-rate)为负代表基于本申请提供的方案相对于ECM2.0的测试结果的性能提升。从测试结果中可以看到,在通测条件下,表1至表2的测试结果均能够提供平均0.20%的亮度性能增益,且4K序列表现不俗。值得注意的是,ECM2.0集成的TIMD预测模式在ECM1.0的基础上有着较高的复杂度,同时仅有0.4%的性能增益,在当前帧内编码性能越来越难拿到的情况下,本申请提供的方案在不增加解码器复杂度的情况下,能够带来不错的性能增益,特别是对于4K类型的视频序列,性能增益明显。此外,由于服务器负载原因,即使编解码时间略有波动,理论上解码时间基本不会增加。
在一些实施例中,所述最优MIP模式的输出向量为所述最优MIP模式输出的上采样前的向量;或,所述最优MIP模式的输出向量为所述最优MIP模式输出的上采样后的向量。
换言之,对所述最优MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式的过程,可以在对所述最优MIP模式输出的向量进行上采样之前执行,也可以在对所述最优MIP模式输出的向量进行上采样之后执行,本申请对此不作具体限定。
解码器在将参考样本输入所述最优MIP模式的预测矩阵并得到输出的向量之后,所述最优MIP模式输出的向量最多有64个预测样本,相比于上采样后的预测块拥有的至多数千个预测样本而言,解码器在对所述最优MIP模式输出的向量进行上采样之前,使用所述DIMD模式导出的所述第一帧内预测模式,能够减小计算复杂度,进而提升当前块的解压缩性能。例如,解码器在上采样之前使用所述DIMD计算每个传统预测模式的梯度幅度值能有效减少计算复杂度。
在一些实施例中,所述S320可包括:
解码器基于用于对所述当前块进行预测的预测模式,确定所述第一帧内预测模式。
示例性地,解码器基于用于对所述当前块进行预测的预测模式的模式类型,确定所述第一帧内预测模式。
示例性地,解码器基于用于对所述当前块进行预测的预测模式的导出模式,确定所述第一帧内预测模式。
示例性地,所述用于对所述当前块进行预测的预测模式的导出模式包括但不限于:MIP模式、所述DIMD模式和所述TIMD模式。
在一些实施例中,若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和用于预测所述当前块的次优MIP模式,则解码器确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或解码器确定所述第一帧内预测模式为对所述最优MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式。
换言之,若解码器利用所述最优MIP模式和次优MIP模式对当前块进行加权预测,则解码器确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或解码器确定所述第一帧内预测模式为对所述最优MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式。
本实施例中,在所述用于对所述当前块进行预测的预测模式包括所述最优MIP模式和所述次优MIP模式的情况下,解码器优先将对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式作为所述第一帧内预测模式时,能够使得所述第一帧内预测模式对应的变换集的纹理方向能够同时贴合最优MIP模式让当前块的预测块表现出的纹理特性以及所述次优MIP模式让当前块的预测块表现出的纹理特性,进而能够尽可能的提升所述当前块的解压缩性能;解码器优先将对所述最优MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式作为所述第一帧内预测模式时,可以在确定所述最优MIP模式的过程中直接获取所述最优MIP模式的输出向量,相当于,能够在降低解压缩复杂度的基础上,使得所述第一帧内预测模式对应的变换集的纹理方向能够贴合最优MIP模式让当前块的预测块表现出的纹理特性,进而能够尽可能的提升所述当前块的解压缩性能。
当然,在其他可替代实施例中,所述用于对所述当前块进行预测的预测模式包括所述最优MIP模式和次优MIP模式时,解码器也可以将对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式或由所述TIMD模式导出的帧内预测模式,确定为所述第一帧内预测模式,本申请对此不作具体限定。
在一些实施例中,若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和由所述TIMD模式导出的帧内预测模式,则解码器确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或解码器确定所述第一帧内预测模式为由所述TIMD模式导出的帧内预测模式。
换言之,若所述第二帧内预测模式为使用所述TIMD模式导出的帧内预测模式,则解码器确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或解码器确定所述第一帧内预测模式为由所述TIMD模式导出的帧内预测模式。
本实施例中,在所述第二帧内预测模式包括由所述TIMD模式导出的帧内预测模式的情况下,解码器优先将对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式作为所述第一帧内预测模式时,能够使得所述第一帧内预测模式对应的变换集的纹理方向能够同时贴合最优MIP模式让当前块的预测块表现出的纹理特性以及由所述TIMD模式导出的帧内预测模式让当前块的预测块表现出的纹理特性,进而能够尽可能的提升所述当前块的解压缩性能;解码器优先将由所述TIMD模式导出的帧内预测模式作为所述第一帧内预测模式时,可以将所述第二帧内预测模式直接确定为所述第一帧内预测模式,相当于,能够在降低解压缩复杂度的基础上,使得所述第一帧内预测模式对应的变换集的纹理方向能够贴合最优MIP模式让当前块的预测块表现出的纹理特性,进而能够尽可能的提升所述当前块的解压缩性能。
当然,在其他可替代实施例中,所述第二帧内预测模式为所述TIMD模式导出的帧内预测模式时,解码器也可以将对所述最优MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式或对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式,确定为所述第一帧内预测模式,本申请对此不作具体限定。
在一些实施例中,若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式,则解码器确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或解码器确定所述第一帧内预测模式为对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式。
换言之,若所述第二帧内预测模式为使用DIMD模式导出的帧内预测模式,则解码器确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或解码器确定所述第一帧内预测模式为对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式。
本实施例中,在所述第二帧内预测模式包括对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式的情况下,解码器优先将对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式作为所述第一帧内预测模式时,能够使得所述第一帧内预测模式对应的变换集的纹理方向能够同时贴合最优MIP模式让当前块的预测块表现出的纹理特性以及由所述DIMD模式导出的帧内预测模式让当前块的预测块表现出的纹理特性,进而能够尽可能的提升所述当前块的解压缩性能;解码器优先将对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式作为所述第一帧内预测模式,可以将所述第二帧内预测模式直接确定为所述第一帧内预测模式,相当于,能够在降低解压缩复杂度的基础上,使得所述第一帧内预测模式对应的变换集的纹理方向能够贴合最优MIP模式让当前块的预测块表现出的纹理特性,进而能够尽可能的提升所述当前块的解压缩性能。
当然,在其他可替代实施例中,所述第二帧内预测模式为对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式时,解码器也可以将对所述最优MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式或由所述TIMD模式导出的帧内预测模式,确定为所述第一帧内预测模式,本申请对此不作具体限定。
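上述几种组合下第一帧内预测模式的取值可以归纳为如下示意性的选择函数,其中的模式名称用字符串占位表示,prefer_pred_block_dimd对应上文中"优先对当前块的预测块使用DIMD导出"的分支,均为对文中规则的示意性整理。
```python
def select_first_intra_mode(second_kind: str, prefer_pred_block_dimd: bool = True) -> str:
    """second_kind 表示与最优MIP模式一同使用的第二帧内预测模式的类型(示意)。"""
    if prefer_pred_block_dimd:
        # 三种组合下都可以优先对当前块的预测块使用 DIMD 导出第一帧内预测模式
        return "DIMD_ON_PREDICTION_BLOCK"
    if second_kind == "SECOND_BEST_MIP":
        return "DIMD_ON_MIP_OUTPUT_VECTOR"   # 可在确定最优MIP模式的过程中直接获取输出向量
    if second_kind == "TIMD_DERIVED":
        return "TIMD_DERIVED"                # 直接复用第二帧内预测模式
    if second_kind == "DIMD_ON_TEMPLATE":
        return "DIMD_ON_TEMPLATE"            # 直接复用第二帧内预测模式
    raise ValueError("unknown second prediction mode kind")

print(select_first_intra_mode("SECOND_BEST_MIP", prefer_pred_block_dimd=False))
```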
在一些实施例中,所述方法300还可包括:
确定第二帧内预测模式;
其中,所述第二帧内预测模式包括以下中的任一项:用于预测所述当前块的次优MIP模式、对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由所述TIMD模式导出的帧内预测模式;
基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,得到所述当前块的预测块。
示例性地,解码器基于所述最优MIP模式和所述第二帧内预测模式对当前块进行预测的过程简称为模板匹配的MIP(Template Matching MIP,TMMIP)技术、基于模板匹配的MIP预测模式导出方法、或TMMIP融合增强技术;也即是说,解码器在获取当前块的残差块后,可以基于导出的最优MIP模式和所述第二帧内预测模式对当前块的预测过程进行性能增强。或者说,TMMIP技术可利用最优MIP预测模式与以下中的至少一项对当前块的预测过程进行性能增强:次优MIP预测模式,由TIMD模式导出的帧内预测模式,对与所述当前块相邻的第一模板区域内的重建样本使用DIMD模式导出的帧内预测模式。
本实施例中,解码器基于最优MIP模式和第二帧内预测模式对当前块进行预测,并将所述最优MIP模式设计为基于多个MIP模式的失真代价确定用于预测所述当前块的最优MIP模式,将所述第二帧内预测模式设计为包括以下中的至少一项:基于所述多个MIP模式的失真代价确定的用于预测所述当前块的次优MIP模式、对与所述当前块相邻的第一模板区域内的重建样本使用DIMD模式导出的帧内预测模式、由TIMD模式导出的帧内预测模式;相当于,有利于避免解码器通过解析码流获取所述MIP模式,与传统的MIP技术相比,能够有效减少编码单元级的比特开销,进而能够提高当前块的解压缩效率。
具体地,MIP模式的比特开销相比于其他帧内预测模式较大,其不仅需要一个使用标志位表示是否使用MIP模式,还需要一个转置标志位表示是否转置使用MIP模式,最后也是最大开销部分,其需要使用截断二进制编码表示MIP模式的索引。MIP模式是基于神经网络技术简化而来的技术,与传统插值滤波预测技术有较大的不同,对于一些特殊的纹理,MIP模式往往比传统帧内预测模式效果更好。但其较大的标志位开销是MIP模式的缺陷,以4x4大小的编码单元为例,MIP模式有16种预测模式,但其比特开销包括1个MIP模式的使用标志位、1个MIP模式的转置标志位和5或6位的截断二进制标识。有鉴于此,本申请利用解码器自主确定用于预测当前块的最优MIP模式并基于最优MIP模式确定当前块的帧内预测模式的方式,能够节省最多5或6个比特位的开销,能够有效减少编码单元级的比特开销,进而能够提高解压缩效率。
此外,针对每个编码单元节省的至多5或6个比特位开销的前提是基于模板匹配预测模式的导出算法足够精确,若基于模板匹配预测模式的导出算法的准确率过低,则会导致解码端导出的MIP模式与编码端导出的MIP模式不同,进而降低编解码性能。或者说,编解码性能有赖于基于模板匹配预测模式的导出算法的准确率。
然而,无论是传统帧内预测模式的基于模板导出算法还是帧间的基于模板匹配导出算法的准确率都不尽人意,虽然能够节省比特开销并提升压缩效率,但随着基于模板匹配预测模式的导出算法的预测模式的数量的增加,基于模板匹配预测模式的导出算法带来的编码单元级额外比特开销已经不太能够使后续技术单纯依靠基于模板匹配预测模式的导出算法来提升压缩效率。因此,基于模板匹配预测模式的导出算法亟需在提升压缩效率的基础上提升编解码性能。作为一种可能的实现方式,可以在力求节省编码单元级的比特开销的同时,可以通过创造出不同的新颖的预测块来保证预测多样性以及选择多样化,进而提升编解码性能。有鉴于此,本申请通过融合所述最优MIP模式和所述第二帧内预测模式,即基于所述最优MIP模式和所述第二帧内预测模式对当前块进行融合预测,避免了将所述最优MIP模式完全取代基于率失真代价计算得到的最优预测模式,能够兼顾预测准确性与预测多样性,进而能够提升解压缩性能。
尤其是,由于TMMIP技术是结合最优MIP模式和第二帧内预测模式对当前块进行预测的,而利用不同的预测模式对当前块进行预测得到的预测块可能具有不同的纹理特征,因此,如果当前块选择了TMMIP技术,则说明最优MIP模式有可能会使得当前块的预测块会表现出一种纹理特性,而第二帧内预测模式有可能会使得当前块的预测块表现出另一种纹理特性;换言之,对当前块进行预测后,从统计角度来说,当前块的残差块也会表现出两种纹理特性,即当前块的残差块并不一定是符合某一种预测模式能够体现的规律。此时,针对TMMIP技术,所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式时,使得所述第一帧内预测模式对应的变换集的纹理方向能够同时贴合最优MIP模式让当前块的预测块表现出的纹理特性以及所述第二帧内预测模式让当前块的预测块表现出的纹理特性,提升了所述当前块的解压缩性能。进一步的,将所述第一帧内预测模式为对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式或由所述TIMD模式导出的帧内预测模式时,可以在确定所述第二帧内预测模式的过程中直接确定第一帧内预测模式,相当于,能够在降低解压缩复杂度的基础上,使得所述第一帧内预测模式对应的变换集的纹理方向能够同时贴合最优MIP模式让当前块的预测块表现出的纹理特性以及所述第二帧内预测模式让当前块的预测块表现出的纹理特性,提升了解压缩效率。
在一些实施例中,解码器先基于所述最优MIP模式对所述当前块进行预测,得到第一预测块;再基于所述第二帧内预测模式对所述当前块进行预测,得到第二预测块;然后基于所述最优MIP模式的权重和所述第二帧内预测模式的权重,对所述第一预测块和所述第二预测块进行加权处理,得到所述当前块的预测块。
在一些实施例中,所述基于所述最优MIP模式的权重和所述第二帧内预测模式的权重,对所述第一预测块和所述第二预测块进行加权处理,得到所述当前块的预测块之前,所述方法300还可包括:
若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和用于预测所述当前块的次优MIP模式或由所述TIMD模式导出的帧内预测模式,则基于所述最优MIP模式的失真代价和所述第二帧内预测模式的失真代价,确定所述最优MIP模式的权重和所述第二帧内预测模式的权重;若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式,则确定所述最优MIP模式的权重和所述第二帧内预测模式的权重均为预设值。
在一些实施例中,解码器基于所述最优MIP模式对所述当前块进行预测,得到第一预测块;并基于所述第二帧内预测模式对所述当前块进行预测,得到第二预测块;然后,解码器基于所述最优MIP模式的权重和所述第二帧内预测模式的权重,对所述第一预测块和所述第二预测块进行加权处理,得到所述当前块的预测块。
示例性地,解码器可直接基于所述最优MIP模式对所述当前块进行帧内预测,得到所述第一预测块。此外,解码器可直接基于所述TIMD模式得到最优预测模式和次优预测模式,对当前块进行预测,得到所述第二预测块。例如,若所述最优预测模式与次优预测模式都不是直流模式(也可称为均值模式)或者平面模式(也可称为平坦模式),且次优预测模式的失真代价小于两倍的最优预测模式的失真代价,则需要进行预测块融合操作;即解码器可以先根据最优预测模式对当前块进行帧内预测,以获取最优预测块;其次根据次优预测模式对当前块进行帧内预测,以获取次优预测块;再利用最优预测模式的失真代价与次优预测模式的失真代价之间的比例计算得到属于最优预测块的权重值与次优预测块的权重值;最后将最优预测块与次优预测块进行加权融合得到所述第二预测块。再如,若最优预测模式或次优预测模式为平面模式或直流模式,或者次优预测模式的失真代价大于两倍的最优预测模式的失真代价,则不需要进行预测块融合操作,即仅可以将基于最优预测模式得到的最优预测块直接作为所述第二预测块。解码器得到所述第一预测块和所述第二预测块后,对所述第一预测块和所述第二预测块进行加权处理,得到所述当前块的预测块。
在一些实施例中,若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和用于预测所述当前块的次优MIP模式或由所述TIMD模式导出的帧内预测模式,则解码器基于所述最优MIP模式的失真代价和所述第二帧内预测模式的失真代价,确定所述最优MIP模式的权重和所述第二帧内预测模式的权重;若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式,则解码器确定所述最优MIP模式的权重和所述第二帧内预测模式的权重均为预设值。
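下面给出一个按上述规则对两个预测块做加权的示意性Python片段:当第二帧内预测模式为次优MIP模式或由TIMD导出的帧内预测模式时,权重按两者失真代价的比例计算(代价越小权重越大);当第二帧内预测模式为对第一模板区域内的重建样本使用DIMD导出的帧内预测模式时,两者取预设权重,此处以各占一半为例(具体预设值为示意性假设)。
```python
import numpy as np

def fuse_prediction(pred1, pred2, cost1=None, cost2=None, use_preset=False):
    """对第一预测块与第二预测块做加权,得到当前块的预测块(示意实现)。"""
    if use_preset:
        w1 = w2 = 0.5                       # 预设权重(示意值)
    else:
        # 代价越小权重越大:w1 与 cost2 成比例,w2 与 cost1 成比例
        w1 = cost2 / (cost1 + cost2)
        w2 = cost1 / (cost1 + cost2)
    return w1 * np.asarray(pred1, dtype=float) + w2 * np.asarray(pred2, dtype=float)

p1 = np.full((4, 4), 100.0)
p2 = np.full((4, 4), 140.0)
print(fuse_prediction(p1, p2, cost1=100, cost2=300)[0, 0])  # 结果更接近代价较小的 p1
```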
在一些实施例中,所述S320可包括:
解码器解析所述当前序列的码流获取第一标识;若所述第一标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测,则解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。
示例性地,若所述第一标识的取值为第一数值,则用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测;若所述第一标识的取值为第二数值,则用于标识不允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测。在一种实现方式中,所述第一数值为1且所述第二数值为0,在另一种实现方式中,所述第一数值为0且所述第二数值为1。当然,所述第一数值和所述第二数值也可为其他数值,本申请对此不作限定。
示例性地,若所述第一标识为真,则用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测;若所述第一标识为假,则用于标识不允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测。
示例性地,解码器解析块级标识,若当前块采用帧内预测模式,则解析或获取所述第一标识,若所述第一标识为真,则解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。
示例性地,所述第一标识记为sps_timd_enable_flag,此时,解码器解析或获取sps_timd_enable_flag,若所述sps_timd_enable_flag为真,则解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。
示例性地,所述第一标识为序列级标识。
需要说明的是,所述第一标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测,也可替换为具有类似或相同含义的描述。例如,在其他可替代实施例中,所述第一标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测,也可替换为以下中的任一项:所述第一标识用于标识允许使用TMMIP技术确定当前序列中的图像块的帧内预测模式,所述第一标识用于标识允许使用TMMIP技术对当前序列中的图像块进行帧内预测,所述第一标识用于标识允许所述当前序列中的图像块使用TMMIP技术,所述第一标识用于标识允许使用基于所述多个MIP模式确定的MIP模式对当前序列中的图像块进行预测。
此外,在其他可替代实施例中,将TMMIP技术结合至其他技术时,也可通过其他技术的允许标志位来间接指示当前序列是否允许使用TMMIP技术。例如,以TIMD技术为例,所述第一标识用于指示当前序列允许使用TIMD技术时,则说明当前序列也允许使用TMMIP技术;或者说,所述第一标识用于指示当前序列允许使用TIMD技术时,则说明当前序列允许同时使用TIMD技术和TMMIP技术;以进一步节省比特开销。
在一些实施例中,若所述第一标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测,则解码器解析所述码流获取第二标识;若所述第二标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,则解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。
示例性地,解码器解析块级标识,若当前块采用帧内预测模式,则解析或获取所述第一标识,若所述第一标识为真,则解码器解析或获取所述第二标识,若所述第二标识为真,则解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。
示例性地,若所述第二标识的取值为第三数值,则用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测;若所述第二标识的取值为第四数值,则用于标识不允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测。在一种实现方式中,所述第三数值为1且所述第四数值为0,在另一种实现方式中,所述第三数值为0且所述第四数值为1。当然,所述第三数值和所述第四数值也可为其他数值,本申请对此不作具体限定。
示例性地,若所述第二标识为真,则用于标识允许使用所述最优MIP模式和所述第二帧内预测模 式对所述当前块进行预测;若所述第二标识为假,则用于标识不允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测。
示例性地,所述第一标识记为sps_timd_enable_flag,所述第二标识记为cu_timd_enable_flag,此时,解码器解析或获取sps_timd_enable_flag,若所述sps_timd_enable_flag为真,则解码器可以解析或获取cu_timd_enable_flag,若所述cu_timd_enable_flag为真,则解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。
示例性地,所述第二标识为块级标识或编码单元级标识。
需要说明的是,所述第二标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,也可替换为具有类似或相同含义的描述。例如,在其他可替代实施例中,所述第二标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,也可替换为以下中的任一项:所述第二标识用于标识允许使用TMMIP技术确定当前块的帧内预测模式,所述第二标识用于标识允许使用TMMIP技术对当前块进行帧内预测,所述第二标识用于标识允许所述当前块中的图像块使用TMMIP技术,所述第二标识用于标识允许使用基于所述多个MIP模式确定的MIP模式对当前块进行预测。
此外,在其他可替代实施例中,将TMMIP技术结合至其他技术时,也可通过其他技术的允许标志位来间接指示当前块是否允许使用TMMIP技术。例如,以TIMD技术为例,所述第二标识用于指示当前块允许使用TIMD技术时,则说明当前块也允许使用TMMIP技术;或者说,所述第二标识用于指示当前块允许使用TIMD技术时,则说明当前块允许同时使用TIMD技术和TMMIP技术;以进一步节省比特开销。
另外,解码端解析所述第二标识时,可以在解析当前块的残差块之前解析所述第二标识,也可以在解析当前块的残差块之后解析所述第二标识,本申请对此不作具体限定。
在一些实施例中,所述方法300还可包括:
解码器基于多个MIP模式的失真代价,确定所述最优MIP模式;
其中,所述多个MIP模式的失真代价包括利用所述多个MIP模式对与所述当前块相邻的第二模板区域内的样本进行预测得到的失真代价。
示例性地,解码器基于多个MIP模式的失真代价,在确定用于预测所述当前块的最优MIP模式之前,需要计算所述多个MIP模式中的每一个MIP模式的失真代价,并根据每一个MIP模式的失真代价对所述多个MIP模式进行排序,代价最小的MIP模式即为最优预测结果。
值得注意的是,本申请中的解码器涉及的失真代价不同于编码器涉及的率失真代价(RDcost),率失真代价为编码端用于在多种帧内预测技术中确定某一种帧内预测技术时使用的失真代价,率失真代价可以是失真图像和原始图像比较得到的代价值,由于解码器并不能获取原始图像,因此,解码器涉及的失真代价可以是重建样本和预测样本之间的失真代价,例如重建样本和预测样本之间的绝对变换差的和(Sum of Absolute Transformed Difference,SATD)代价或其他可用于计算重建样本和预测样本之间的差异的代价。
当然,在其他可替代实施例中,解码器先基于所述MIP模式的失真代价,确定所述多个MIP模式的排列顺序;然后基于所述多个MIP模式的排列顺序,确定所述最优MIP模式使用的编码方式;接着基于最优MIP模式使用的编码方式对所述当前序列的码流进行解码,得到最优MIP模式的索引。
例如,所述排列顺序中前n个MIP模式使用的编码方式的码字长度小于所述排列顺序中第n个MIP模式之后的MIP模式使用的编码方式的码字长度;和/或,所述前n个MIP模式使用变长编码方式且所述第n个MIP模式之后的MIP模式使用截断二进制编码方式。示例性地,n可以是大于或等于1的任意数值。需要说明的是,对于传统的MIP技术,对MIP模式的索引通常选择截断二进制的方式进行二值化并写入,这种编码方式比较接近等概率编码,即其将所有的预测模式分成两段,一段用N个码字表示,另一个由N+1个码字进行表示。有鉴于此,本申请中,解码器基于多个MIP模式的失真代价确定用于预测所述当前块的最优MIP模式之前,可以先计算所述多个MIP模式中的每一个MIP模式的失真代价,并根据每一个MIP模式的失真代价对所述多个MIP模式进行排序,最终,解码器可基于所述多个MIP模式的排序选择使用更灵活的变长编码方式,与等概率编码方式相比,通过灵活设置MIP模式的编码方式,有利于节省所述MIP模式的索引的比特开销。
再如,所述排列顺序为解码器按照失真代价由小到大的顺序对所述多个MIP模式进行排列得到的顺序。由于MIP模式的失真代价越小,编码器使用其对当前块进行帧内预测的概率越大,因此,将所述排列顺序中前n个MIP模式使用的编码方式的码字长度设计为小于所述排列顺序中第n个MIP模式之后的MIP模式使用的编码方式的码字长度;和/或,将所述前n个MIP模式使用的编码方式设计为变长编码方式且将所述第n个MIP模式之后的MIP模式使用的编码方式设计为截断二进制编码方式;相当于,编码器大概率使用的MIP模式使用较短码字长度或变长编码方式,能够节省MIP模式的索引的比特开销,提升解压缩性能。
在一些实施例中,所述方法300还可包括:
若所述第二帧内预测模式为所述次优MIP模式,则解码器基于所述最优MIP模式的失真代价和所述次优MIP模式的失真代价,确定是否采用所述次优MIP模式对所述当前块进行预测;若确定不采用所述次优MIP模式,则解码器可直接基于所述最优MIP模式对所述当前块进行预测;若确定采用所述次优MIP模式,则解码器可基于所述最优MIP模式和所述次优MIP模式对所述当前块进行预测,以得到所述当前块的预测块。
示例性地,所述第二帧内预测模式为次优MIP模式时,若所述最优MIP模式的失真代价和所述次优MIP模式的失真代价之间的比值小于或等于预设比例时,则解码器可直接基于所述最优MIP模式对所述当前块进行预测,得到所述当前块的预测块。或者说,所述第二帧内预测模式为次优MIP模式时,若所述次优MIP模式的失真代价和所述最优MIP模式的失真代价之间的比值大于或等于预设比例时,则解码器可直接基于所述最优MIP模式对所述当前块进行预测,得到所述当前块的预测块。例如,若次优MIP模式的失真代价大于或等于最优MIP模式的失真代价的某一个倍数(例如两倍)时,可以解释为次优MIP模式已有较大的失真并不适合当前块,即可以不用融合增强技术,仅使用最优MIP模式对当前块进行预测。
本实施例中,解码器基于所述最优MIP模式的失真代价和所述次优MIP模式的失真代价,确定是否采用所述次优MIP模式对所述当前块进行预测,相当于,解码器基于所述最优MIP模式的失真代价和所述次优MIP模式的失真代价,确定是否采用所述次优MIP模式对最优MIP模式进行性能增强,避免了在码流中携带用于确定是否采用所述次优MIP模式对最优MIP模式进行性能增强的标识,节省了比特开销,进而能够增强解压缩性能。
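该判断可以写成如下示意性的小函数,其中两倍的阈值沿用上文示例中的取值。
```python
def use_second_best_mip(best_cost: float, second_cost: float, ratio: float = 2.0) -> bool:
    """若次优MIP模式的失真代价达到最优MIP模式代价的 ratio 倍及以上,则不做融合增强。"""
    return second_cost < ratio * best_cost

assert use_second_best_mip(100, 150) is True    # 代价接近,进行融合增强
assert use_second_best_mip(100, 220) is False   # 次优模式失真过大,仅用最优MIP模式预测
```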
在一些实施例中,所述第二模板区域和所述第一模板区域相同或不同。
示例性地,所述第二模板区域的大小可以根据当前块的尺寸大小预先定义。例如,所述第二模板区域中与当前块相邻的上侧区域的宽为当前块的宽大小,且其高为至少一行样本高度;再如,所述第二模板区域中与当前块的左侧相邻的左侧区域的高为当前块的高度,其宽为两行样本的宽度。当然,在其他可替代实施例中,所述第二模板区域也可以实现为其他尺寸或大小的第二模板区域,本申请对此不作具体限定。
在一些实施例中,所述方法300还可包括:
解码器基于第三标识和所述多个MIP模式对所述第二模板区域内的样本进行预测,得到所述第三标识的每一个状态下所述多个MIP模式的失真代价;所述第三标识用于标识是否转置MIP模式的输入向量和输出向量;解码器基于所述第三标识的每一个状态下所述多个MIP模式的失真代价,确定所述最优MIP模式。
示例性地,解码器基于多个MIP模式的失真代价,确定所述最优MIP模式之前,基于第三标识和所述多个MIP模式对所述第二模板区域内的样本进行预测,得到所述第三标识的每一个状态下所述多个MIP模式的失真代价。
如前所述,对于传统的MIP技术,其比特开销相比于其他帧内预测工具要多一些,其不仅需要一个标识位去表示是否使用MIP技术,还需要一个标识位去表示是否转置使用MIP,最后也是最大开销部分,它需要使用截断二进制编码去表示MIP的预测模式。MIP是基于神经网络技术简化而来的技术,与传统插值滤波预测技术有较大的不同,对于一些特殊的纹理,虽然MIP预测比传统帧内预测模式的效果更好,但其较大的标识开销是MIP技术的缺陷,以4x4大小的编码单元为例,共有16个预测样本,但其比特开销包括1个MIP使用标识,1个MIP转置标识和5或6位的截断二进制标识。有鉴于此,本申请在确定最优MIP模式时,通过遍历所述第三标识的每一个状态,考虑了MIP模式的转置功能,能够节省1个MIP转置标识的开销,进而能够提高解压缩效率。
示例性地,解码器遍历所述第三标识的每一个状态和所述多个MIP模式,确定所述第三标识的每一个状态下所述多个MIP模式的失真代价,并基于所述第三标识的每一个状态下所述多个MIP模式的失真代价,确定所述最优MIP模式;或者,解码器遍历所述第三标识的每一个状态和所述多个MIP模式,确定所述多个MIP模式在所述第三标识的每一个状态下的失真代价,并基于所述多个MIP模式在所述第三标识的每一个状态下的失真代价,确定所述最优MIP模式。也即是说,解码端可以先遍历所述多个MIP模式,也可以先遍历所述第三标识的状态。
示例性地,若所述第三标识的取值为第五数值,则用于标识转置MIP模式的输入向量和输出向量;若所述第三标识的取值为第六数值,则用于标识不转置MIP模式的输入向量和输出向量。此时,所述第三标识的每一个状态也可替换为所述第三标识的每一个取值。在一种实现方式中,所述第五数值为1 且所述第六数值为0,在另一种实现方式中,所述第五数值为0且所述第六数值为1。当然,所述第五数值和所述第六数值也可为其他数值,本申请对此不作限定。
示例性地,若所述第三标识为真,则用于标识转置MIP模式的输入向量和输出向量;若所述第三标识为假,则用于标识不转置MIP模式的输入向量和输出向量。此时,所述第三标识为真或为假均为所述第三标识的一个状态。
示例性地,所述第三标识为序列级标识、块级标识或编码单元级标识。
示例性地,所述第三标识也可称为转置信息、转置标识、或MIP转置标识位。
需要说明的是,所述第三标识用于标识是否转置MIP模式的输入向量和输出向量,也可替换为具有类似或相同含义的描述。例如,在其他可替代实施例中,所述第三标识用于标识是否需要转置MIP模式的输入和输出,所述第三标识用于标识MIP模式的输入向量和输出向量是否为转置后的向量,所述第三标识用于表示是否转置。
在一些实施例中,所述方法300还可包括:
若所述当前块的尺寸为预设尺寸,则解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。
示例性地,所述预设尺寸可以包括宽度为预设宽度且高度为预设高度的尺寸。也即是说,若所述当前块的宽度为预设宽度且高度为预设高度,则解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。
示例性地,所述预设尺寸可以通过在设备(例如,包括解码器和编码器)中预先保存相应的代码、表格或其他可用于指示相关信息的方式来实现,本申请对于其具体的实现方式不做限定。比如,预设尺寸可以是指协议中定义的尺寸。可选地,所述"协议"可以指编解码技术领域的标准协议,例如可以包括VCC或ECM协议等相关协议。
当然,在其他可替代实施例中,解码器也可以通过其他方式,基于所述预设尺寸确定是否基于所述多个MIP模式的失真代价确定所述最优MIP模式,本申请对此不作具体限定。
例如,解码器可以仅基于所述当前块的宽度或高度,确定是否基于所述多个MIP模式的失真代价确定所述最优MIP模式。在一种实现方式中,若所述当前块的宽度为预设宽度或高度为预设高度,则解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。再如,解码器可以通过比较所述当前块的尺寸和所述预设尺寸,确定是否基于所述多个MIP模式的失真代价确定所述最优MIP模式。在一种实现方式中,若所述当前块的尺寸大于或小于预设尺寸,则解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。在另一种实现方式中,若所述当前块的宽度大于或小于预设宽度,则解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。在另一种实现方式中,若所述当前块的高度大于或小于预设高度,则所述解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。
在一些实施例中,所述方法300还可包括:
若所述当前块所在的图像帧为I帧、且所述当前块的尺寸为所述预设尺寸,则解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。
示例性地,若所述当前块所在的图像帧为I帧、所述当前块的宽度为预设宽度、且所述当前块的高度为预设高度,则解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。也即是说,只有在所述当前块所在的图像帧为I帧的情况下,解码器才基于所述当前块的尺寸,确定是否基于所述多个MIP模式的失真代价确定所述最优MIP模式。
在一些实施例中,所述方法300还可包括:
若所述当前块所在的图像帧为B帧,则解码器基于所述多个MIP模式的失真代价,确定所述最优MIP模式。
示例性地,若所述当前块所在的图像帧为B帧,则解码器可以直接基于所述多个MIP模式的失真代价,确定所述最优MIP模式。也即是说,在所述当前块所在的图像帧为B帧的情况下,不管当前块的尺寸为多少,解码器都可以直接基于所述多个MIP模式的失真代价确定所述最优MIP模式。
在一些实施例中,所述S320之前,所述方法300还可包括:
解码器获取与所述当前块相邻的相邻块使用的MIP模式;解码器将所述相邻块使用的MIP模式,确定为所述多个MIP模式。
示例性地,所述相邻块可以是与所述当前块的上侧、左侧、左下、右上及左上中的至少一项相邻的图像块。例如,解码器可将按照所述当前块的上侧、左侧、左下、右上及左上的顺序获取的图像块确定为所述相邻块。可选的,所述多个MIP模式可用于构建解码器确定用于预测当前块的可用MIP模式或可用MIP模式列表,以便解码器在所述可用MIP模式或可用MIP模式列表中通过对第二模板区域内的样本进行预测的方式确定所述最优MIP模式。
在一些实施例中,所述方法300还可包括:
解码器对所述第二模板区域外部相邻的参考区域进行重建样本填充,得到所述第二模板区域的参考行和参考列;解码器以所述参考行和所述参考列为输入,利用所述多个MIP模式分别对所述第二模板区域内的样本进行预测,得到所述多个MIP模式对应的多个预测块;解码器基于所述多个预测块和所述第二模板区域内的重建块,确定所述多个MIP模式的失真代价。
示例性地,解码器基于多个MIP模式的失真代价确定所述最优MIP模式之前,对所述第二模板区域外部相邻的参考区域进行重建样本填充。
示例性地,所述参考区域中与所述第二模板区域的上侧相邻的区域的宽度等于所述第二模板区域的宽度,所述参考区域中与所述第二模板区域的左侧相邻的区域的高度等于所述第二模板区域的高度;若所述参考区域中与所述第二模板区域的上侧相邻的区域的宽度大于所述第二模板区域的宽度,则解码器可以对所述参考区域中与所述第二模板区域的上侧相邻的区域进行下采样或降维处理,以得到所述参考行。若所述参考区域中与所述第二模板区域的左侧相邻的区域的高度大于所述第二模板区域的高度,则解码器可以对所述参考区域中与所述第二模板区域的左侧相邻的区域进行下采样或降维处理,以得到所述参考列。
示例性地,所述第二模板区域可以是上文涉及的TIMD模式中使用的模板(template)区域,所述参考区域可以是所述TIMD模式中使用的参考模板(Reference of template)。例如,结合图5来说,若当前块为宽等于M且高等于N的编码单元,解码器对宽等于2(M+L1)+1且高等于2(N+L2)+1的编码单元组成的参考区域进行重建样本的充填,并对填充后的参考区域进行降采样或降维处理,得到所述参考行和所述参考列,进而基于所述参考行和所述参考列构建MIP模式的输入向量。
示例性地,解码器获取所述参考行和所述参考列后,以所述参考行和所述参考列为输入,利用所述多个MIP模式分别对所述第二模板区域内的样本进行预测,得到所述多个MIP模式对应的多个预测块;也即是说,所述解码器基于当前块的参考模板内的重建样本,通过遍历所述多个MIP模式的方式对当前块的第二模板区域内的样本进行预测。以当前遍历MIP模式为例,解码器以所述参考行、所述参考列、所述当前遍历MIP模式的索引、上文涉及的第三标识为输入,得到所述当前遍历MIP模式对应的预测块;其中,所述参考行和所述参考列用于构建所述当前遍历MIP模式的输入向量;所述当前遍历MIP模式的索引用于确定当前遍历MIP模式的矩阵和/或偏置向量;所述第三标识用于标识是否转置MIP模式的输入向量和输出向量;例如,若所述第三标识用于标识不转置MIP模式的输入向量和输出向量,则将所述参考列拼接在所述参考行之后,以形成所述当前遍历MIP模式的输入向量;若所述第三标识用于标识转置MIP模式的输入向量和输出向量,则将所述参考行拼接在所述参考列之后,以形成所述当前遍历MIP模式的输入向量。相应的,若所述第三标识用于标识转置MIP模式的输入向量和输出向量,解码器对所述当前遍历MIP模式的输出进行转置,以得到所述第二模板区域的预测块。解码器遍历所述多个MIP模式的方式获取所述多个MIP模式对应的多个预测块后,可以基于所述多个预测块和所述第二模板区域内的重建样本之间的失真代价,根据失真代价最小原则选出代价最小的MIP模式,并将其确定为当前块基于模板匹配的MIP模式下最优MIP模式。
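上述"以参考行、参考列为输入,遍历MIP模式与第三标识(转置)的每一个状态,对第二模板区域进行预测并按失真代价最小原则选优"的过程,可以抽象为如下示意性的Python草图;其中的MIP矩阵为随机占位数据,代价计算也以SAD代替SATD,仅用于体现搜索结构。
```python
import numpy as np

rng = np.random.default_rng(2)
# 占位的 MIP 预测矩阵集合:每个模式把 8 个输入样本映射为 16 个模板预测样本(示意)
mip_matrices = [rng.standard_normal((16, 8)) for _ in range(6)]

def satd_like_cost(pred, recon):
    return float(np.abs(pred - recon).sum())     # 此处用 SAD 代替 SATD,仅作示意

def derive_best_mip(ref_row, ref_col, template_recon_4x4):
    best = (None, None, float("inf"))            # (模式索引, 转置标识, 代价)
    for mode_idx, mat in enumerate(mip_matrices):
        for transposed in (False, True):         # 遍历第三标识的每一个状态
            x = np.concatenate([ref_col, ref_row] if transposed else [ref_row, ref_col])
            pred = (mat @ x).reshape(4, 4)
            if transposed:
                pred = pred.T                    # 转置输出以适配模板方向
            cost = satd_like_cost(pred, template_recon_4x4)
            if cost < best[2]:
                best = (mode_idx, transposed, cost)
    return best

print(derive_best_mip(rng.standard_normal(4), rng.standard_normal(4),
                      rng.standard_normal((4, 4))))
```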
在一些实施例中,解码器利用所述多个MIP模式分别对所述第二模板区域内的样本进行预测时,先对所述参考行和所述参考列进行下采样,得到输入向量;再以所述输入向量为输入,通过遍历所述多个MIP模式的方式对所述第二模板区域内的样本进行预测,得到所述多个MIP模式的输出向量;最后对所述多个MIP模式的输出向量进行上采样,得到所述多个MIP模式对应的预测块。
示例性地,所述参考行和所述参考列满足所述多个MIP模式的输入条件。若所述参考行和所述参考列不满足所述多个MIP模式的输入条件,可以先将所述参考行和/或所述参考列处理为满足所述多个MIP模式的输入条件的输入样本,然后基于满足所述多个MIP模式的输入条件的输入样本,确定所述多个MIP模式的输入向量。例如,以所述输入条件为指定个数的输入样本为例,若所述参考行和所述参考列不满足MIP模式的输入样本个数时,解码器可对所述参考行和/或所述参考列进行哈尔下采样(Haar-downsampling)等方式降维到指定个数的输入样本,并基于降维后的指定个数的输入样本确定所述多个MIP模式的输入向量。
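哈尔下采样可以理解为对相邻参考样本两两求均值,反复进行直到样本个数满足MIP模式的输入维度要求;下面给出一个示意性实现,取整方式为假设,未必与具体标准一致。
```python
import numpy as np

def haar_downsample(samples: np.ndarray, target_len: int) -> np.ndarray:
    """对一行/一列参考样本反复做两两平均,直到长度降到 target_len(示意实现)。"""
    s = np.asarray(samples, dtype=float)
    while len(s) > target_len:
        assert len(s) % 2 == 0, "示意代码仅处理长度为2的幂次的情况"
        s = (s[0::2] + s[1::2] + 1) // 2       # 两两平均并近似四舍五入
    return s

print(haar_downsample(np.arange(16), 4))  # 16 个参考样本降到 4 个输入样本
```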
在一些实施例中,所述S320可包括:
解码器基于所述多个MIP模式在所述第二模板区域上的绝对变换差的和SATD,确定所述最优MIP模式。
本实施例中,解码器基于所述多个MIP模式在所述第二模板区域上的失真代价确定所述最优MIP模式时,将所述多个MIP模式的失真代价设计为所述多个MIP模式的SATD,与直接计算所述多个MIP模式的率失真代价相比,不仅能够实现基于所述多个MIP模式在所述第二模板区域上的失真代价确定所述最优MIP模式,还能够简化所述多个MIP模式的失真代价的计算复杂度,进而能够提升解码器的解压缩性能。
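采用SATD作为失真代价时,可以先对预测样本与重建样本的差值块做哈达玛变换,再对变换后的系数取绝对值求和;下面给出一个4x4块的示意性Python实现(未做归一化处理)。
```python
import numpy as np

H4 = np.array([[1, 1, 1, 1],
               [1, -1, 1, -1],
               [1, 1, -1, -1],
               [1, -1, -1, 1]])

def satd_4x4(pred: np.ndarray, recon: np.ndarray) -> float:
    """4x4 块的 SATD:对差值块做二维哈达玛变换后,对系数取绝对值求和(示意)。"""
    diff = np.asarray(recon, dtype=float) - np.asarray(pred, dtype=float)
    transformed = H4 @ diff @ H4.T
    return float(np.abs(transformed).sum())

print(satd_4x4(np.zeros((4, 4)), np.eye(4)))
```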
综上所述,本申请提供的方案在最优MIP模式的基础上提出融合增强的思想,即解码器不仅需要确定用于预测当前块的最优MIP模式,同时还需要融合另一个预测块以达到不同的预测效果。这样不单单是节省比特开销,还能创造出一个新的预测技术,而融合的过程,实际上也是因为最优MIP模式并不能完全取代编码端基于率失真代价计算得到的最优预测模式,所以采用融合方法来兼顾预测准确性与预测多样性。
示例性地,解码器基于模板匹配的MIP模式导出方法主要思路可分为以下几个部分:
首先,填充参考区域(例如图5所示的参考模板)内的重建样本,即用于对第二模板区域(如图5所述的模板)内的样本进行预测时所需要的参考重建样本。可选的,所述参考区域的宽高不需要超出第二模板区域的宽高。若所述参考区域填充了超出第二模板区域的宽高的样本,则需要使用下采样或其他方式降维达到MIP输入维度的要求。
然后,解码器以所述参考区域内的参考重建样本、所述多个MIP模式的索引、MIP转置标识位作为输入,对所述第二模板区域内的样本进行预测,得到所述多个MIP模式对应的预测块。可选的,所述参考区域内的参考重建样本需要满足MIP输入条件,如哈尔下采样(Haar-downsampling)等降维到指定个数的输入样本。所述多个MIP模式的索引用于确定MIP技术的矩阵索引,进而获取MIP预测矩阵系数。MIP转置标识位用于标识是否需要转置输入和输出。
接着,对于所述多个MIP模式对应的预测块而言,可以遍历所有MIP模式与MIP转置与否的组合情况,得到每一个MIP模式及MIP转置标识位的每一个状态下的第二模板区域的预测样本,并计算第二模板区域内预测样本与重建样本之间的失真并记录其代价信息;最后,根据失真最小原则选出代价最小的MIP模式及其对应的MIP转置信息,即为当前块基于模板匹配的MIP预测导出模式下最优MIP模式。
最后,解码器分别利用最优MIP预测模式和第二帧内预测模式对当前块进行预测,得到第一预测块和第二预测块,并根据最优MIP预测模式和第二帧内预测模式的加权权重,对所述第一预测块和所述第二预测块进行加权计算,得到当前块的预测块。
值得注意的是,本申请涉及的部分计算可以用查找表和移位的方式代替,虽然查找表的方式在结果上可能会与直接做除法有一定误差,但有利于硬件实现和控制编解码成本。例如,关于失真代价的计算或确定最优MIP模式中涉及到的计算。
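上文提到"用查找表和移位代替除法"的做法,可以用下面的示意性片段说明:预先存好各分母对应的倒数定点值,运行时用乘法加右移来近似除法;表的精度与规模均为示意性假设,结果与直接除法会有少量误差。
```python
SHIFT = 12
# 预先计算的倒数查找表:recip_lut[d] ≈ (1 << SHIFT) / d,d 为可能出现的分母(示意范围)
recip_lut = {d: (1 << SHIFT) // d for d in range(1, 513)}

def divide_by_lut(numerator: int, denominator: int) -> int:
    """用 查表 + 乘法 + 右移 近似整数除法,避免硬件实现中代价较高的除法运算。"""
    return (numerator * recip_lut[denominator]) >> SHIFT

# 例如按代价比例计算权重:w1 ≈ cost2 * 64 / (cost1 + cost2)
cost1, cost2 = 100, 300
w1 = divide_by_lut(cost2 * 64, cost1 + cost2)
print(w1)  # 输出 46,直接除法为 48,体现查表近似带来的少量误差
```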
上文中从解码器的角度详细描述了根据本申请实施例的解码方法,下面将结合图10,从编码器的角度描述根据本申请实施例的编码方法。
图10是本申请实施例提供的编码方法400的示意性流程图。应理解,该编码方法400可由编码器执行。例如应用于图1所示的编码框架100。为便于描述,下面以编码器为例进行说明。
如图10所示,所述编码方法400可包括:
S410,获取当前序列中当前块的残差块;
S420,对所述当前块的残差块进行第三变换,得到所述当前块的第三变换系数;
S430,确定第一帧内预测模式;
其中,所述第一帧内预测模式包括以下中的任一项:对所述当前块的预测块使用由解码器侧帧内模式导出DIMD模式导出的帧内预测模式、对用于预测所述当前块的最优基于矩阵的帧内预测MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式、对与所述当前块相邻的第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由基于模板的帧内模式导出TIMD模式导出的帧内预测模式;
S440,基于所述第一帧内预测模式所对应的变换集,对所述第三变换系数进行第四变换,得到所述当前块的第四变换系数;
S450,对所述第四变换系数进行编码。
应当理解,解码端的第一变换为编码端的第四变换的反变换,解码端的第二变换为编码端的第三变换的反变换。例如,所述第三变换为上文涉及的基础变换或主变换,所述第四变换为上文涉及的二次变换,相应的,所述第一变换为二次变换的反变换(或逆变换),所述第二变换可以为基础变换或主变换的反变换(或逆变换)。例如,所述第一变换可以是反(逆)LFNST,所述第二变换可以是反(逆)DCT2型,反(逆)DCT8型或反(逆)DST7型等;相应的,所述第三变换可以是DCT2型,DCT8型或DST7型等,所述第四变换可以是LFNST。
在一些实施例中,所述最优MIP模式的输出向量为所述最优MIP模式输出的上采样前的向量;或,所述最优MIP模式的输出向量为所述最优MIP模式输出的上采样后的向量。
在一些实施例中,所述S430可包括:
基于用于对所述当前块进行预测的预测模式,确定所述第一帧内预测模式。
在一些实施例中,若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和用于预测所述当前块的次优MIP模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为对所述最优MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式。
在一些实施例中,若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和由所述TIMD模式导出的帧内预测模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为由所述TIMD模式导出的帧内预测模式。
在一些实施例中,若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式。
在一些实施例中,所述第二模板区域和所述第一模板区域相同或不同。
在一些实施例中,所述S410可包括:
确定第二帧内预测模式;
其中,所述第二帧内预测模式包括以下中的任一项:用于预测所述当前块的次优MIP模式、对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由所述TIMD模式导出的帧内预测模式;
基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,得到所述当前块的预测块;
基于所述当前块的预测块,得到所述当前块的残差块。
在一些实施例中,基于所述最优MIP模式对所述当前块进行预测,得到第一预测块;基于所述第二帧内预测模式对所述当前块进行预测,得到第二预测块;基于所述最优MIP模式的权重和所述第二帧内预测模式的权重,对所述第一预测块和所述第二预测块进行加权处理,得到所述当前块的预测块。
在一些实施例中,若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和用于预测所述当前块的次优MIP模式或由所述TIMD模式导出的帧内预测模式,则基于所述最优MIP模式的失真代价和所述第二帧内预测模式的失真代价,确定所述最优MIP模式的权重和所述第二帧内预测模式的权重;若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式,则确定所述最优MIP模式的权重和所述第二帧内预测模式的权重均为预设值。
在一些实施例中,编码器获取第一标识;若所述第一标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测,则确定所述第二帧内预测模式;其中,所述S450可包括:
对所述第四变换系数和所述第一标识进行编码。
在一些实施例中,若所述第一标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测,则基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,得到第一率失真代价;基于至少一个帧内预测模式对所述当前块进行预测,得到至少一个率失真代价;若所述第一率失真代价小于或等于所述至少一个率失真代价中的最小值,则将基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测得到的预测块,确定为所述当前块的预测块;其中,所述S450可包括:
对所述第四变换系数、所述第一标识和第二标识进行编码;
其中,若所述第一率失真代价小于或等于所述至少一个率失真代价中的最小值,则所述第二标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测;若所述第一率失真代价大于所述至少一个率失真代价中的最小值,则所述第二标识用于标识不允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测。
在一些实施例中,所述方法400还可包括:
基于多个MIP模式的失真代价,确定所述最优MIP模式;
其中,所述多个MIP模式的失真代价包括利用所述多个MIP模式对与所述当前块相邻的第二模板区域内的样本进行预测得到的失真代价。
在一些实施例中,所述第二模板区域和所述第一模板区域相同或不同。
在一些实施例中,基于第三标识和所述多个MIP模式对所述第二模板区域内的样本进行预测,得到所述第三标识的每一个状态下所述多个MIP模式的失真代价;所述第三标识用于标识是否转置MIP模式的输入向量和输出向量;基于所述第三标识的每一个状态下所述多个MIP模式的失真代价,确定所述最优MIP模式。
在一些实施例中,基于多个MIP模式的失真代价确定所述最优MIP模式之前,所述方法400还可包括:
获取与所述当前块相邻的相邻块使用的MIP模式;
将所述相邻块使用的MIP模式,确定为所述多个MIP模式。
在一些实施例中,基于多个MIP模式的失真代价确定所述最优MIP模式之前,所述方法400还可包括:
对所述第二模板区域外部相邻的参考区域进行重建样本填充,得到所述第二模板区域的参考行和参考列;以所述参考行和所述参考列为输入,利用所述多个MIP模式分别对所述第二模板区域内的样本进行预测,得到所述多个MIP模式对应的多个预测块;基于所述多个预测块和所述第二模板区域内的重建块,确定所述多个MIP模式的失真代价。
在一些实施例中,对所述参考行和所述参考列进行下采样,得到输入向量;以所述输入向量为输入,通过遍历所述多个MIP模式的方式对所述第二模板区域内的样本进行预测,得到所述多个MIP模式的输出向量;对所述多个MIP模式的输出向量进行上采样,得到所述多个MIP模式对应的预测块。
在一些实施例中,基于所述多个MIP模式在所述第二模板区域上的绝对变换差的和SATD,确定所述最优MIP模式。
应当理解,编码方法可以理解为解码方法的逆过程,因此,所述编码方法400的具体方案可参见解码方法300的相关内容,为便于描述,本申请对此不再赘述。
下面结合具体实施例对本申请的方案进行说明。
实施例1:
本实施例中,上文涉及的第二帧内预测模式为次优MIP模式,即编码器或解码器可基于最优MIP模式和次优MIP模式对当前块进行帧内预测,以得到当前块的预测块。
编码器遍历预测模式,若当前块为帧内模式,则编码器获取序列级允许使用标志位,用于表示是否允许当前序列使用基于模板匹配的MIP模式导出技术,其可以如sps_tmmip_enable_flag的形式。若tmmip的允许使用标志位均为真,则表示当前编码器允许使用TMMIP技术。
示例性地,编码器的流程可实现为以下过程:
步骤1:
若sps_tmmip_enable_flag为真,则编码器尝试TMMIP技术,即执行步骤2;若sps_tmmip_enable_flag为假,则编码器不尝试TMMIP技术,即跳过步骤2直接执行步骤3。
步骤2:
首先,编码器对第二模板区域外部相邻的行和列进行重建样本填充。填充过程与原始帧内预测过程所填充的方法相同,例如,编码器可以自左下角往右上角进行遍历填充,若所有重建样本均可以用,则依次全部填充可用重建样本;若所有重建样本均不可用,则全部填充均值;若部分重建样本可用,则先填充可用重建样本,对于其余不可用重建样本,编码器可以根据上述自左下角往右上角的顺序进行遍历,直至出现第一个可用重建样本后,用第一个可用重建样本对先前不可用位置进行填充。
其次,编码器以填充完毕的第二模板区域外侧重建样本作为输入,利用可允许使用的MIP模式对第二模板区域内的样本进行预测。
示例性地,对于4x4大小的块,可允许使用的MIP模式为16个。对于宽或高为4,或8x8大小的块,可允许使用的MIP模式为8个。其他尺寸的块可允许使用的MIP模式为6个。此外,任意尺寸的块都可以使用MIP转置功能,上述TMMIP的预测模式与MIP技术相同。
示例性地,具体预测计算过程包括:编码器先对重建样本进行哈尔下采样,例如编码器根据块尺寸决定下采样步长。接着,编码器根据转置与否的信息调整上侧下采样后的重建样本与左侧下采样后的重建样本拼接顺序;若不需要转置则将左侧下采样后的重建样本拼接在上侧下采样后的重建样本之后,将得到的向量作为输入,若需要转置则将上侧下采样后的重建样本拼接在左侧下采样后的重建样本之后,将得到的向量作为输入。然后,编码器根据遍历的预测模式作为索引获取MIP矩阵系数,与输入计算得到输出向量。最后,编码器根据输出向量个数与当前模板尺寸情况,对输出向量进行上采样,若不需要上采样则向量以水平方向依次填充作为模板预测块输出,若需要上采样则先上采样水平方向再上采样垂直方向,上采样至与模板尺寸相同后作为第二模板区域的预测块进行输出。
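其中"先水平方向、再竖直方向上采样至目标尺寸"的过程可用如下示意性片段表示;此处用简单的线性插值近似上采样滤波,具体滤波系数以相应标准文本为准。
```python
import numpy as np

def upsample_1d(line: np.ndarray, target: int) -> np.ndarray:
    """把一行样本线性插值到 target 个样本(示意,未按标准滤波器实现)。"""
    x_old = np.linspace(0.0, 1.0, num=len(line))
    x_new = np.linspace(0.0, 1.0, num=target)
    return np.interp(x_new, x_old, line)

def upsample_mip_output(out_vec: np.ndarray, out_size: int, width: int, height: int) -> np.ndarray:
    block = out_vec.reshape(out_size, out_size)                      # 输出向量按水平方向依次填充为小块
    tmp = np.stack([upsample_1d(row, width) for row in block])       # 先水平方向上采样
    return np.stack([upsample_1d(col, height) for col in tmp.T]).T   # 再竖直方向上采样

pred = upsample_mip_output(np.arange(16.0), 4, 8, 8)
print(pred.shape)  # (8, 8),即上采样到与模板/块相同的尺寸
```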
接着,编码器基于通过遍历每个MIP模式得到的第二模板区域的预测块与第二模板区域内重建样本,计算失真代价,并记录每一个预测模式与转置信息下的失真代价值。遍历所有允许的预测模式与转置信息后,根据代价最小原则,选择出最优的MIP模式以及其对应的转置信息,以及次优MIP模式以及其对应的转置信息。编码器根据最优MIP模式的代价值和次优MIP模式的代价值的关系判断是否需要融合增强,若次优MIP模式的代价值小于最优MIP模式的代价值的两倍,则需要将最优MIP预测块与次优MIP预测块进行融合增强。若次优MIP模式的代价值大于或等于最优MIP模式的代价值的两倍,则不需要融合增强。
最后,若需要融合增强,则编码器根据最优MIP模式、次优MIP模式、最优MIP模式的转置信息以及次优MIP模式的转置信息,得到最优MIP模式对应的预测块和次优MIP模式对应的预测块。具体地,首先,编码器对当前块上侧和左侧相邻的重建样本视情况下采样并根据转置信息进行拼接作为输入向量,并根据MIP模式作为索引读取当前模式下的矩阵系数,然后,通过输入向量与矩阵系数的计算得到输出向量。编码器可根据转置信息进行输出转置,并根据当前块的尺寸和输出向量的样本数对输出向量进行上采样,得到与当前块相同尺寸的最优MIP预测块和次优MIP预测块,并根据计算得到的最优MIP模式的权重值和次优MIP模式的权重值,对最优MIP预测块和次优MIP预测块进行加权平均,得到一个新的预测块作为当前块的最终预测块。若不需要融合增强,则编码器可根据最优MIP模式及其转置信息,计算得到最优MIP预测块,计算过程与前述相同,最终,编码器将最优MIP预测块作为当前块的预测块。
此外,编码器得到当前块的率失真代价,将其记为cost1。
另外,编码器确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或编码器确定所述第一帧内预测模式为对所述最优MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式。
步骤3:
编码器继续遍历其他帧内预测技术并计算对应的率失真代价记为cost2…costN。
步骤4:
若cost1为所有率失真代价中最小,则当前块采用TMMIP技术,编码器将当前块的TMMIP使用标志位置真并写进码流;若cost1不为最小率失真代价,则当前块采用其他帧内预测技术,编码器将当前块的TMMIP使用标志位置假并写进码流。应当理解,其他帧内预测技术的标识位或索引等信息根据定义传输,此处不详细阐述。
步骤5:
编码器基于当前块的预测块和当前块的原始块确定当前块的残差块,然后对当前块的残差块进行基础变换并基于所述第一帧内预测模式对基础变换后的变换系数进行二次变换,接着对二次变换后的变换系数进行量化,熵编码以及环路滤波等操作。应当理解,其量化具体过程可参见上文相关内容,为避免重复,此处不再赘述。
下面对本实施例中解码器的相关方案进行说明。
解码器解析块级类型标志位,若为帧内模式,则解析或获取序列级允许使用标志位,用于表示是否允许当前序列使用基于模板匹配的MIP模式导出技术,其可以如sps_tmmip_enable_flag的形式。若tmmip的允许使用标志位均为真,则表示当前解码器允许使用TMMIP技术。
示例性地,解码器的流程可实现为以下过程:
步骤1:
若sps_tmmip_enable_flag为真,则解码器解析当前块的TMMIP使用标志位,否则,当前解码过程不需要解码块级的TMMIP使用标志位,块级的TMMIP使用标志位默认为否。若当前块的TMMIP使用标志位为真,则执行步骤2;否则,执行步骤3。
步骤2:
首先,解码器对第二模板区域外部相邻的行和列进行重建样本填充。填充过程与原始帧内预测过程所填充的方法相同,例如,解码器可以自左下角往右上角进行遍历填充,若所有重建样本均可以用,则依次全部填充可用重建样本;若所有重建样本均不可用,则全部填充均值;若部分重建样本可用,则先填充可用重建样本,对于其余不可用重建样本,解码器可以根据上述自左下角往右上角的顺序进行遍历,直至出现第一个可用重建样本后,用第一个可用重建样本对先前不可用位置进行填充。
其次,解码器以填充完毕的第二模板区域外侧重建样本作为输入,利用可允许使用的MIP模式对第二模板区域内的样本进行预测。
示例性地,对于4x4大小的块,可允许使用的MIP模式为16个。对于宽或高为4,或8x8大小的块,可允许使用的MIP模式为8个。其他尺寸的块可允许使用的MIP模式为6个。此外,任意尺寸的块都可以使用MIP转置功能,上述TMMIP的预测模式与MIP技术相同。
示例性地,具体预测计算过程包括:解码器先对重建样本进行哈尔下采样,例如解码器根据块尺寸决定下采样步长。接着,解码器根据转置与否的信息调整上侧下采样后的重建样本与左侧下采样后的重建样本拼接顺序;若不需要转置则将左侧下采样后的重建样本拼接在上侧下采样后的重建样本之后,将得到的向量作为输入,若需要转置则将上侧下采样后的重建样本拼接在左侧下采样后的重建样本之后,将得到的向量作为输入。然后,解码器根据遍历的预测模式作为索引获取MIP矩阵系数,与输入计算得到输出向量。最后,解码器根据输出向量个数与当前模板尺寸情况,对输出向量进行上采样,若不需要上采样则向量以水平方向依次填充作为模板预测块输出,若需要上采样则先上采样水平方向再上采样垂直方向,上采样至与模板尺寸相同后作为第二模板区域的预测块进行输出。
接着,解码器基于通过遍历每个MIP模式得到的第二模板区域的预测块与第二模板区域内重建样本,计算失真代价,并记录每一个预测模式与转置信息下的失真代价值。遍历所有允许的预测模式与转置信息后,根据代价最小原则,选择出最优的MIP模式以及其对应的转置信息,以及次优MIP模式以及其对应的转置信息。解码器根据最优MIP模式的代价值和次优MIP模式的代价值的关系判断是否需要融合增强,若次优MIP模式的代价值小于最优MIP模式的代价值的两倍,则需要将最优MIP预测块与次优MIP预测块进行融合增强。若次优预测模式的代价值大于或等于最优MIP模式的代价值的两倍,则不需要融合增强。
最后,若需要融合增强,则解码器根据最优MIP模式、次优MIP模式、最优MIP模式的转置信息以及次优MIP模式的转置信息,得到最优MIP模式对应的预测块和次优MIP模式对应的预测块。具体地,首先,解码器对当前块上侧和左侧相邻的重建样本视情况下采样并根据转置信息进行拼接作为输入向量,并根据MIP模式作为索引读取当前模式下的矩阵系数,然后,通过输入向量与矩阵系数的计算得到输出向量。解码器可根据转置信息进行输出转置,并根据当前块的尺寸和输出向量的样本数对输出向量进行上采样,得到与当前块相同尺寸的最优MIP预测块和次优MIP预测块,并根据计算得到的最优MIP模式的权重值和次优MIP模式的权重值,对最优MIP预测块和次优MIP预测块进行加权平均,得到一个新的预测块作为当前块的最终预测块。若不需要融合增强,则解码器可根据最优MIP模式及其转置信息,计算得到最优MIP预测块,计算过程与前述相同,最终,解码器将最优MIP预测块作为当前块的预测块。
另外,解码器确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或解码器确定所述第一帧内预测模式为对所述最优MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式。
步骤3:
解码器继续解析其他帧内预测技术的使用标识位或索引等信息,并根据解析到的信息求得当前块的最终预测块。
步骤4:
解码器解析码流并获取当前块的频域残差块(也称为频域残差信息),并对当前块的频域残差块进行反量化及反变换(先基于第一帧内预测模式进行二次变换的反变换,然后进行基础变换或主变换的反变换)得到当前块的残差块(也称为时域残差块或时域残差信息);然后解码器将当前块的预测块与当前块的残差块叠加得到重建样本块。
步骤5:
若当前图像中所有重建样本块经由环路滤波等技术后,得到最终的重建图像。
可选的,重建图像可以作为视频输出,也可以作为后面解码参考。
本实施例中,编码器或解码器在TMMIP技术中用到的第二模板区域的大小可以根据当前块的尺寸大小预先定义。例如,所述第二模板区域中与当前块相邻的上侧区域的宽为当前块的宽大小,且其高为两行样本高度;所述第二模板区域中与当前块的左侧相邻的左侧区域的高为当前块的高度,其宽为两行样本的宽度。当然,在其他可替代实施例中,也可以实现为其他尺寸的第二模板区域,本申请对此不作具体限定。
本实施例中,通过TMMIP技术与基于第一帧内预测模式的二次变换相结合,能够在节省编码单元级比特开销的同时提升当前块的压缩和解压缩性能。
实施例2:
本实施例中,上文涉及的第二帧内预测模式为由TIMD模式导出的帧内预测模式,即编码器或解码器可基于最优MIP模式和由TIMD模式导出的帧内预测模式对当前块进行帧内预测,以得到当前块的预测块。
也即是说,基于模板匹配的MIP模式导出融合增强技术不仅可以将两个导出的MIP预测块进行融合,也能与其他基于模板匹配的导出技术所产生的预测块进行融合。本申请将TMMIP技术与TIMD技术进行融合,得到一种导出的传统预测块与基于矩阵的预测块融合方法。TIMD在编解码端利用模板匹配的思想导出最优的传统帧内预测模式,且该技术还能够将该预测模式进行偏移拓展,得到一个更新的帧内预测模式。而TMMIP技术也是在编解码端利用模板匹配的思想导出最优的MIP模式,将这两个最优预测模式进行融合,则可以兼顾传统预测块具有的方向性与MIP预测独特的纹理特性,产生一个全新的预测块,提高编码效率。
编码器遍历预测模式,若当前块为帧内模式,则编码器获取序列级允许使用标志位,用于表示是否允许当前序列使用基于模板匹配的MIP模式导出技术,其可以如sps_tmmip_enable_flag的形式。若tmmip的允许使用标志位均为真,则表示当前编码器允许使用TMMIP技术。
示例性地,编码器的流程可实现为以下过程:
步骤1:
若sps_tmmip_enable_flag为真,则编码器尝试TMMIP技术,即执行步骤2;若sps_tmmip_enable_flag为假,则编码器不尝试TMMIP技术,即跳过步骤2直接执行步骤3。
步骤2:
首先,编码器对第二模板区域外部相邻的行和列进行重建样本填充。填充过程与原始帧内预测过程所填充的方法相同,例如,编码器可以自左下角往右上角进行遍历填充,若所有重建样本均可以用,则依次全部填充可用重建样本;若所有重建样本均不可用,则全部填充均值;若部分重建样本可用,则先填充可用重建样本,对于其余不可用重建样本,编码器可以根据上述自左下角往右上角的顺序进行遍历,直至出现第一个可用重建样本后,用第一个可用重建样本对先前不可用位置进行填充。
其次,编码器以填充完毕的第二模板区域外侧重建样本作为输入,利用可允许使用的MIP模式对第二模板区域内的样本进行预测。
示例性地,对于4x4大小的块,可允许使用的MIP模式为16个。对于宽或高为4,或8x8大小的块,可允许使用的MIP模式为8个。其他尺寸的块可允许使用的MIP模式为6个。此外,任意尺寸的块都可以使用MIP转置功能,上述TMMIP的预测模式与MIP技术相同。
示例性地,具体预测计算过程包括:编码器先对重建样本进行哈尔下采样,例如编码器根据块尺寸决定下采样步长。接着,编码器根据转置与否的信息调整上侧下采样后的重建样本与左侧下采样后的重建样本拼接顺序;若不需要转置则将左侧下采样后的重建样本拼接在上侧下采样后的重建样本之后,将得到的向量作为输入,若需要转置则将上侧下采样后的重建样本拼接在左侧下采样后的重建样本之后,将得到的向量作为输入。然后,编码器根据遍历的预测模式作为索引获取MIP矩阵系数,与输入计算得到输出向量。最后,编码器根据输出向量个数与当前模板尺寸情况,对输出向量进行上采样,若不需要上采样则向量以水平方向依次填充作为模板预测块输出,若需要上采样则先上采样水平方向再下采样垂直方向,上采样至与模板尺寸相同后作为第二模板区域的预测块进行输出。
此外,编码器还需要尝试TIMD的模板匹配计算过程,根据不同的预测模式索引获取不同的插值滤波器对参考样本进行插值得到模板内的预测样本。
接着,编码器基于通过遍历每个MIP模式得到的第二模板区域的预测样本与第二模板区域内重建样本,计算失真代价,并记录每一个预测模式与转置信息下的失真代价值,并基于每一个预测模式与转置信息下的失真代价值,根据代价最小原则,选择出最优的MIP模式以及其对应的转置信息。此外,编码器还需要遍历所有TIMD允许的帧内预测模式,计算得到模板内的预测样本并与模板内重建样本计算失真代价,并根据代价最小原则记录由TIMD技术导出的最优预测模式、次优预测模式、最优预测模式的失真代价值以及次优预测模式失真代价值。
最后,编码器根据得到的最优MIP模式和转置信息,对当前块上侧和左侧相邻的重建样本视情况下采样并根据转置信息进行拼接作为输入向量,并根据MIP模式作为索引读取当前模式下的矩阵系数,然后,通过输入向量与矩阵系数的计算得到输出向量。编码器可根据转置信息进行输出转置,并根据当前块的尺寸和输出向量的样本数对输出向量进行上采样,得到与当前块相同尺寸的输出作为当前块的最优MIP预测块。
对于由TIMD技术导出的最优预测模式和次优预测模式,若最优预测模式与次优预测模式都不是均值(DC)模式或者平坦(PLANAR)模式,且次优预测模式的失真代价小于两倍的最优预测模式失真代价,则编码器需要进行预测块融合操作。首先,编码器根据最优预测模式获取插值滤波系数,对上侧和左侧相邻重建样本进行插值滤波得到当前块内所有位置的预测样本,记为最优预测块;其次编码器根据次优预测模式获取插值滤波系数,对上侧和左侧相邻重建样本进行插值滤波得到当前块内所有位置的预测样本,记为次优预测块。接着,编码器利用最优预测模式代价值与次优预测模式代价值之间的比例计算得到属于最优预测块的权重值与次优预测块的权重值。最后,编码器将最优预测块与次优预测块进行加权融合得到当前块的预测块作为输出。若最优预测模式或次优预测模式为均值模式(DC)或平坦模式(PLANAR),或者次优预测模式的代价值大于两倍的最优预测模式代价值,则编码器不需要进行预测块融合操作,仅用最优预测模式对上侧和左侧相邻重建样本进行插值滤波得到的最优预测块作为当前 块的最优TIMD预测块。
最后,编码器基于计算得到的最优MIP模式的权重值和由TIMD技术导出的预测模式的权重值,将最优MIP预测块与最优TIMD预测块进行加权平均,得到新的预测块即为当前块的预测块。
此外,编码器得到当前块的率失真代价,将其记为cost1。
另外,编码器确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或编码器确定所述第一帧内预测模式为由所述TIMD模式导出的帧内预测模式。
需要说明的是,由于TIMD技术的模板区域与第二模板区域(即TMMIP技术的模板区域)可以设定相同,即用于计算失真代价的模板区域相同,则TIMD技术的模板区域的代价信息和TMMIP技术的模板区域的代价信息可以等效或处于同一个对比水平,此时,也可以基于代价信息确定是否融合增强,本申请对此不作具体限定。
步骤3:
编码器继续遍历其他帧内预测技术并计算对应的率失真代价记为cost2…costN。
步骤4:
若cost1为所有率失真代价中最小,则当前块采用TMMIP技术,编码器将当前块的TMMIP使用标志位置真并写进码流;若cost1不为最小率失真代价,则当前块采用其他帧内预测技术,编码器将当前块的TMMIP使用标志位置假并写进码流。应当理解,其他帧内预测技术的标识位或索引等信息根据定义传输,此处不详细阐述;
步骤5:
编码器基于当前块的预测块和当前块的原始块确定当前块的残差块,然后对当前块的残差块进行基础变换并基于所述第一帧内预测模式对基础变换后的变换系数进行二次变换,接着对二次变换后的变换系数进行量化,熵编码以及环路滤波等操作。应当理解,其量化具体过程可参见上文相关内容,为避免重复,此处不再赘述。
下面对本实施例中解码器的相关方案进行说明。
解码器解析块级类型标志位,若为帧内模式,则解析或获取序列级允许使用标志位,用于表示是否允许当前序列使用基于模板匹配的MIP模式导出技术,其可以如sps_tmmip_enable_flag的形式。若tmmip的允许使用标志位均为真,则表示当前解码器允许使用TMMIP技术。
示例性地,解码器的流程可实现为以下过程:
步骤1:
若sps_tmmip_enable_flag为真,则解码器解析当前块的TMMIP使用标志位,否则,当前解码过程不需要解码块级的TMMIP使用标志位,块级的TMMIP使用标志位默认为否。若当前块的TMMIP使用标志位为真,则执行步骤2;否则,执行步骤3。
步骤2:
首先,解码器对第二模板区域外部相邻的行和列进行重建样本填充。填充过程与原始帧内预测过程所填充的方法相同,例如,解码器可以自左下角往右上角进行遍历填充,若所有重建样本均可以用,则依次全部填充可用重建样本;若所有重建样本均不可用,则全部填充均值;若部分重建样本可用,则先填充可用重建样本,对于其余不可用重建样本,解码器可以根据上述自左下角往右上角的顺序进行遍历,直至出现第一个可用重建样本后,用第一个可用重建样本对先前不可用位置进行填充。
其次,解码器以填充完毕的第二模板区域外侧重建样本作为输入,利用可允许使用的MIP模式对第二模板区域内的样本进行预测。
示例性地,对于4x4大小的块,可允许使用的MIP模式为16个。对于宽或高为4,或8x8大小的块,可允许使用的MIP模式为8个。其他尺寸的块可允许使用的MIP模式为6个。此外,任意尺寸的块都可以使用MIP转置功能,上述TMMIP的预测模式与MIP技术相同。
示例性地,具体预测计算过程包括:解码器先对重建样本进行哈尔下采样,例如解码器根据块尺寸决定下采样步长。接着,解码器根据转置与否的信息调整上侧下采样后的重建样本与左侧下采样后的重建样本拼接顺序;若不需要转置则将左侧下采样后的重建样本拼接在上侧下采样后的重建样本之后,将得到的向量作为输入,若需要转置则将上侧下采样后的重建样本拼接在左侧下采样后的重建样本之后,将得到的向量作为输入。然后,解码器根据遍历的预测模式作为索引获取MIP矩阵系数,与输入计算得到输出向量。最后,解码器根据输出向量个数与当前模板尺寸情况,对输出向量进行上采样,若不需要上采样则向量以水平方向依次填充作为模板预测块输出,若需要上采样则先上采样水平方向再下采样垂直方向,上采样至与模板尺寸相同后作为第二模板区域的预测块进行输出。
此外,解码器还需要尝试TIMD的模板匹配计算过程,根据不同的预测模式索引获取不同的插值滤波器对参考样本进行插值得到模板内的预测样本。
接着,解码器基于通过遍历每个MIP模式得到的第二模板区域的预测样本与第二模板区域内重建样本,计算失真代价,并记录每一个预测模式与转置信息下的失真代价值,并基于每一个预测模式与转置信息下的失真代价值,根据代价最小原则,选择出最优的MIP模式以及其对应的转置信息。此外,解码器还需要遍历所有TIMD允许的帧内预测模式,计算得到模板内的预测样本并与模板内重建样本计算失真代价,并根据代价最小原则记录由TIMD技术导出的最优预测模式、次优预测模式、最优预测模式的失真代价值以及次优预测模式失真代价值。
最后,解码器根据得到的最优MIP模式和转置信息,对当前块上侧和左侧相邻的重建样本视情况下采样并根据转置信息进行拼接作为输入向量,并根据MIP模式作为索引读取当前模式下的矩阵系数,然后,通过输入向量与矩阵系数的计算得到输出向量。解码器可根据转置信息进行输出转置,并根据当前块的尺寸和输出向量的样本数对输出向量进行上采样,得到与当前块相同尺寸的输出作为当前块的最优MIP预测块。
对于由TIMD技术导出的最优预测模式和次优预测模式,若最优预测模式与次优预测模式都不是均值(DC)模式或者平坦(PLANAR)模式,且次优预测模式的失真代价小于两倍的最优预测模式失真代价,则解码器需要进行预测块融合操作。首先,解码器根据最优预测模式获取插值滤波系数,对上侧和左侧相邻重建样本进行插值滤波得到当前块内所有位置的预测样本,记为最优预测块;其次解码器根据次优预测模式获取插值滤波系数,对上侧和左侧相邻重建样本进行插值滤波得到当前块内所有位置的预测样本,记为次优预测块。接着,解码器利用最优预测模式代价值与次优预测模式代价值之间的比例计算得到属于最优预测块的权重值与次优预测块的权重值。最后,解码器将最优预测块与次优预测块进行加权融合得到当前块的预测块作为输出。若最优预测模式或次优预测模式为均值模式(DC)或平坦模式(PLANAR),或者次优预测模式的代价值大于两倍的最优预测模式代价值,则解码器不需要进行预测块融合操作,仅用最优预测模式对上侧和左侧相邻重建样本进行插值滤波得到的最优预测块作为当前块的最优TIMD预测块。
最后,解码器基于计算得到的最优MIP模式的权重值和由TIMD技术导出的预测模式的权重值,将最优MIP预测块与最优TIMD预测块进行加权平均,得到新的预测块即为当前块的预测块。
另外,解码器确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或解码器确定所述第一帧内预测模式为由所述TIMD模式导出的帧内预测模式。
步骤3:
解码器继续解析其他帧内预测技术的使用标识位或索引等信息,并根据解析到的信息求得当前块的最终预测块。
步骤4:
解码器解析码流并获取当前块的频域残差块(也称为频域残差信息),并对当前块的频域残差块进行反量化及反变换(先基于第一帧内预测模式进行二次变换的反变换,然后进行基础变换或主变换的反变换)得到当前块的残差块(也称为时域残差块或时域残差信息);然后解码器将当前块的预测块与当前块的残差块叠加得到重建样本块。
步骤5:
若当前图像中所有重建样本块经由环路滤波等技术后,得到最终的重建图像。
可选的,重建图像可以作为视频输出,也可以作为后面解码参考。
本实施例中,TIMD预测块加权融合的权重值计算过程可参见上文中对TIMD技术介绍所描述的内容,为避免重复,此处不再赘述。此外,编码器或解码器可以基于由TIMD导出的最优预测模式确定是否融合增强;例如,若由TIMD导出的最优预测模式为DC模式或PLANAR模式,则编码器或解码器可以不使用融合增强,即仅由TMMIP技术导出的最优MIP模式所产生的预测块作为当前块的预测块。另外,编码器或解码器在TMMIP技术中用到的第二模板区域的大小可以根据当前块的尺寸大小预先定义。例如,TMMIP技术中关于第二模板区域的定义可以与TIMD技术中关于模板区域的定义一致,也可以不同。例如,若当前块的宽小于等于8,则所述第二模板区域中与当前块相邻的上侧区域的高度为两行样本高度,否则高度为四行样本高度;同理,若当前块高小于等于8,则所述第二模板区域中与当前块的左侧相邻的左侧区域的宽度为两列样本宽度,否则宽度为四列样本宽度。
实施例3:
本实施例中,上文涉及的第二帧内预测模式为DIMD模式导出的帧内预测模式,即编码器或解码器可基于最优MIP模式和对与所述当前块相邻的第一模板区域内的重建样本使用DIMD模式导出的帧内预测模式对当前块进行帧内预测,以得到当前块的预测块。
与实施例2类似,TMMIP技术也可以和DIMD技术进行融合增强。
需要注意的是,虽然DIMD技术与TIMD技术导出的预测模式均为传统帧内预测模式,但由于导出方法不同,两者得到的预测模式并不一定相同。此外,TMMIP技术和DIMD技术融合增强的做法会与TMMIP技术和TIMD技术融合增强有所不同,例如,由于TMMIP技术和TIMD技术的第二模板区域大小一般相同,而且计算代价信息也基本为绝对变换差的和(Sum of Absolute Transformed Difference,SATD),其也称为基于哈达玛变换的失真代价值,因此,TMMIP技术和TIMD技术可以直接根据该代价信息计算融合权重,但是,DIMD技术的第二模板区域一般与TMMIP技术(或TIMD技术)的第二模板区域不一样大,且DIMD导出预测模式的准则是根据梯度幅度值来衡量的,梯度幅度值与SATD代价值不能直接等价,因此权重不能简单参考对TMMIP技术和TIMD技术进行融合时的方案来计算。
编码器遍历预测模式,若当前块为帧内模式,则编码器获取序列级允许使用标志位,用于表示是否允许当前序列使用基于模板匹配的MIP模式导出技术,其可以如sps_tmmip_enable_flag的形式。若tmmip的允许使用标志位均为真,则表示当前编码器允许使用TMMIP技术。
示例性地,编码器的流程可实现为以下过程:
步骤1:
若sps_tmmip_enable_flag为真,则编码器尝试TMMIP技术,即执行步骤2;若sps_tmmip_enable_flag为假,则编码器不尝试TMMIP技术,即跳过步骤2直接执行步骤3。
步骤2:
首先,编码器对第二模板区域外部相邻的行和列进行重建样本填充。填充过程与原始帧内预测过程所填充的方法相同,例如,编码器可以自左下角往右上角进行遍历填充,若所有重建样本均可以用,则依次全部填充可用重建样本;若所有重建样本均不可用,则全部填充均值;若部分重建样本可用,则先填充可用重建样本,对于其余不可用重建样本,编码器可以根据上述自左下角往右上角的顺序进行遍历,直至出现第一个可用重建样本后,用第一个可用重建样本对先前不可用位置进行填充。
其次,编码器以填充完毕的第二模板区域外侧重建样本作为输入,利用可允许使用的MIP模式对第二模板区域内的样本进行预测。
示例性地,对于4x4大小的块,可允许使用的MIP模式为16个。对于宽或高为4,或8x8大小的块,可允许使用的MIP模式为8个。其他尺寸的块可允许使用的MIP模式为6个。此外,任意尺寸的块都可以使用MIP转置功能,上述TMMIP的预测模式与MIP技术相同。
示例性地,具体预测计算过程包括:编码器先对重建样本进行哈尔下采样,例如编码器根据块尺寸决定下采样步长。接着,编码器根据转置与否的信息调整上侧下采样后的重建样本与左侧下采样后的重建样本拼接顺序;若不需要转置则将左侧下采样后的重建样本拼接在上侧下采样后的重建样本之后,将得到的向量作为输入,若需要转置则将上侧下采样后的重建样本拼接在左侧下采样后的重建样本之后,将得到的向量作为输入。然后,编码器根据遍历的预测模式作为索引获取MIP矩阵系数,与输入计算得到输出向量。最后,编码器根据输出向量个数与当前模板尺寸情况,对输出向量进行上采样,若不需要上采样则向量以水平方向依次填充作为模板预测块输出,若需要上采样则先上采样水平方向再下采样垂直方向,上采样至与模板尺寸相同后作为第二模板区域的预测块进行输出。
此外,编码器利用DIMD技术导出最优帧内预测模式即为最优DIMD模式。DIMD技术根据索贝尔算子计算第一模板区域内的重建样本的梯度值,根据不同预测模式的角度值对梯度值进行换算得到对应预测模式下的幅度值。
接着,编码器遍历每个MIP模式得到的模板预测块,与模板内重建样本计算失真代价,根据代价最小原则记录最优MIP模式、转置信息。此外,编码器遍历所有允许使用的帧内预测模式,计算得到各帧内预测模式下的幅度值,根据幅度最大原则记录最优DIMD预测模式。
最后,编码器根据得到的最优MIP模式和转置信息,对当前块上侧和左侧相邻的重建样本视情况下采样并根据转置信息进行拼接作为输入向量,并根据MIP模式作为索引读取当前模式下的矩阵系数,然后,通过输入向量与矩阵系数的计算得到输出向量。编码器可根据转置信息进行输出转置,并根据当前块的尺寸和输出向量的样本数对输出向量进行上采样,得到与当前块相同尺寸的输出作为当前块的最优MIP预测块。此外,对于最优DIMD预测模式,编码器获取对应的插值滤波系数,并对上侧和左侧相邻重建样本进行插值滤波得到当前块内所有位置的预测样本,记为最优DIMD预测块。编码器将最优MIP预测块与最优DIMD预测块根据预先设定好的权重对每个预测样本进行加权平均,得到新的预测块即为当前块的预测块。
此外,编码器得到当前块的率失真代价,将其记为cost1。
另外,编码器确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或编码器确定所述第一帧内预测模式为对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式。
步骤3:
编码器继续遍历其他帧内预测技术并计算对应的率失真代价记为cost2…costN。
步骤4:
若cost1为所有率失真代价中最小,则当前块采用TMMIP技术,编码器将当前块的TMMIP使用标志位置真并写进码流;若cost1不为最小率失真代价,则当前块采用其他帧内预测技术,编码器将当前块的TMMIP使用标志位置假并写进码流。应当理解,其他帧内预测技术的标识位或索引等信息根据定义传输,此处不详细阐述;
步骤5:
编码器基于当前块的预测块和当前块的原始块确定当前块的残差块,然后对当前块的残差块进行基础变换并基于所述第一帧内预测模式对基础变换后的变换系数进行二次变换,接着对二次变换后的变换系数进行量化,熵编码以及环路滤波等操作。应当理解,其量化具体过程可参见上文相关内容,为避免重复,此处不再赘述。
下面对本实施例中解码器的相关方案进行说明。
解码器解析块级类型标志位,若为帧内模式,则解析或获取序列级允许使用标志位,用于表示是否允许当前序列使用基于模板匹配的MIP模式导出技术,其可以如sps_tmmip_enable_flag的形式。若tmmip的允许使用标志位均为真,则表示当前解码器允许使用TMMIP技术。
示例性地,解码器的流程可实现为以下过程:
步骤1:
若sps_tmmip_enable_flag为真,则解码器解析当前块的TMMIP使用标志位,否则,当前解码过程不需要解码块级的TMMIP使用标志位,块级的TMMIP使用标志位默认为否。若当前块的TMMIP使用标志位为真,则执行步骤2;否则,执行步骤3。
步骤2:
首先,解码器对第二模板区域外部相邻的行和列进行重建样本填充。填充过程与原始帧内预测过程所填充的方法相同,例如,解码器可以自左下角往右上角进行遍历填充,若所有重建样本均可以用,则依次全部填充可用重建样本;若所有重建样本均不可用,则全部填充均值;若部分重建样本可用,则先填充可用重建样本,对于其余不可用重建样本,解码器可以根据上述自左下角往右上角的顺序进行遍历,直至出现第一个可用重建样本后,用第一个可用重建样本对先前不可用位置进行填充。
其次,解码器以填充完毕的第二模板区域外侧重建样本作为输入,利用可允许使用的MIP模式对第二模板区域内的样本进行预测。
示例性地,对于4x4大小的块,可允许使用的MIP模式为16个。对于宽或高为4,或8x8大小的块,可允许使用的MIP模式为8个。其他尺寸的块可允许使用的MIP模式为6个。此外,任意尺寸的块都可以使用MIP转置功能,上述TMMIP的预测模式与MIP技术相同。
示例性地,具体预测计算过程包括:解码器先对重建样本进行哈尔下采样,例如解码器根据块尺寸决定下采样步长。接着,解码器根据转置与否的信息调整上侧下采样后的重建样本与左侧下采样后的重建样本拼接顺序;若不需要转置则将左侧下采样后的重建样本拼接在上侧下采样后的重建样本之后,将得到的向量作为输入,若需要转置则将上侧下采样后的重建样本拼接在左侧下采样后的重建样本之后,将得到的向量作为输入。然后,解码器根据遍历的预测模式作为索引获取MIP矩阵系数,与输入计算得到输出向量。最后,解码器根据输出向量个数与当前模板尺寸情况,对输出向量进行上采样,若不需要上采样则向量以水平方向依次填充作为模板预测块输出,若需要上采样则先上采样水平方向再下采样垂直方向,上采样至与模板尺寸相同后作为第二模板区域的预测块进行输出。
此外,解码器利用DIMD技术导出最优帧内预测模式即为最优DIMD模式。DIMD技术根据索贝尔算子计算第一模板区域内的重建样本的梯度值,根据不同预测模式的角度值对梯度值进行换算得到对应预测模式下的幅度值。
接着,解码器遍历每个MIP模式得到的模板预测块,与模板内重建样本计算失真代价,根据代价最小原则记录最优MIP模式、转置信息。此外,解码器遍历所有允许使用的帧内预测模式,计算得到各帧内预测模式下的幅度值,根据幅度最大原则记录最优DIMD预测模式。
最后,解码器根据得到的最优MIP模式和转置信息,对当前块上侧和左侧相邻的重建样本视情况下采样并根据转置信息进行拼接作为输入向量,并根据MIP模式作为索引读取当前模式下的矩阵系数,然后,通过输入向量与矩阵系数的计算得到输出向量。解码器可根据转置信息进行输出转置,并根据当前块的尺寸和输出向量的样本数对输出向量进行上采样,得到与当前块相同尺寸的输出作为当前块的最优MIP预测块。此外,对于最优DIMD预测模式,解码器获取对应的插值滤波系数,并对上侧和左侧相邻重建样本进行插值滤波得到当前块内所有位置的预测样本,记为最优DIMD预测块。解码器将最 优MIP预测块与最优DIMD预测块根据预先设定好的权重对每个预测样本进行加权平均,得到新的预测块即为当前块的预测块。
另外,解码器确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或解码器确定所述第一帧内预测模式为对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式。
步骤3:
解码器继续解析其他帧内预测技术的使用标识位或索引等信息,并根据解析到的信息求得当前块的最终预测块。
步骤4:
解码器解析码流并获取当前块的频域残差块(也称为频域残差信息),并对当前块的频域残差块进行反量化及反变换(先基于第一帧内预测模式进行二次变换的反变换,然后进行基础变换或主变换的反变换)得到当前块的残差块(也称为时域残差块或时域残差信息);然后解码器将当前块的预测块与当前块的残差块叠加得到重建样本块。
步骤5:
若当前图像中所有重建样本块经由环路滤波等技术后,得到最终的重建图像。
可选的,重建图像可以作为视频输出,也可以作为后面解码参考。
本实施例中,最优DIMD预测块的计算过程可参见上文中对DIMD技术介绍所描述的内容,为避免重复,此处不再赘述。此外,最优MIP预测块与最优DIMD预测块的融合权重可以是预先设定值,如最优MIP预测块占比5/9,而最优DIMD预测块占比4/9。当然,在其他可替代实施例中,最优MIP预测块与最优DIMD预测块的融合权重也可以是其他数值,本申请对此不作具体限定。此外,上述第二模板区域和上述第一模板区域可以相同,也可以不同,本申请对此不作具体限定。
以上结合附图详细描述了本申请的优选实施方式,但是,本申请并不限于上述实施方式中的具体细节,在本申请的技术构思范围内,可以对本申请的技术方案进行多种简单变型,这些简单变型均属于本申请的保护范围。例如,在上述具体实施方式中所描述的各个具体技术特征,在不矛盾的情况下,可以通过任何合适的方式进行组合,为了避免不必要的重复,本申请对各种可能的组合方式不再另行说明。又例如,本申请的各种不同的实施方式之间也可以进行任意组合,只要其不违背本申请的思想,其同样应当视为本申请所公开的内容。还应理解,在本申请的各种方法实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
上文详细描述了本申请的方法实施例,下文结合图11至图13,详细描述本申请的装置实施例。
图11是本申请实施例的解码器500的示意性框图。
如图11所示,所述解码器500可包括:
解析单元510,用于解析当前序列的码流获取当前块的第一变换系数;
变换单元520,用于:
确定第一帧内预测模式;
其中,所述第一帧内预测模式包括以下中的任一项:对所述当前块的预测块使用由解码器侧帧内模式导出DIMD模式导出的帧内预测模式、对用于预测所述当前块的最优基于矩阵的帧内预测MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式、对与所述当前块相邻的第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由基于模板的帧内模式导出TIMD模式导出的帧内预测模式;
基于所述第一帧内预测模式所对应的变换集,对所述第一变换系数进行第一变换,得到所述当前块的第二变换系数;
对所述第二变换系数进行第二变换,得到所述当前块的残差块;
重建单元530,用于基于所述当前块的预测块和所述当前块的残差块,确定所述当前块的重建块。
在一些实施例中,所述最优MIP模式的输出向量为所述最优MIP模式输出的上采样前的向量;或,所述最优MIP模式的输出向量为所述最优MIP模式输出的上采样后的向量。
在一些实施例中,所述变换单元520具体用于:
基于用于对所述当前块进行预测的预测模式,确定所述第一帧内预测模式。
在一些实施例中,所述变换单元520具体用于:
若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和用于预测所述当前块的次优MIP模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为对所述最优MIP模式的输出向量使用所述DIMD模式导出的 帧内预测模式。
在一些实施例中,所述变换单元520具体用于:
若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和由所述TIMD模式导出的帧内预测模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为由所述TIMD模式导出的帧内预测模式。
在一些实施例中,所述变换单元520具体用于:
若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式。
在一些实施例中,所述重建单元530还用于:
确定第二帧内预测模式;
其中,所述第二帧内预测模式包括以下中的任一项:用于预测所述当前块的次优MIP模式、对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由所述TIMD模式导出的帧内预测模式;
基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,得到所述当前块的预测块。
在一些实施例中,所述重建单元530具体用于:
基于所述最优MIP模式对所述当前块进行预测,得到第一预测块;
基于所述第二帧内预测模式对所述当前块进行预测,得到第二预测块;
基于所述最优MIP模式的权重和所述第二帧内预测模式的权重,对所述第一预测块和所述第二预测块进行加权处理,得到所述当前块的预测块。
在一些实施例中,所述重建单元530基于所述最优MIP模式的权重和所述第二帧内预测模式的权重,对所述第一预测块和所述第二预测块进行加权处理,得到所述当前块的预测块之前,还用于:
若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和用于预测所述当前块的次优MIP模式或由所述TIMD模式导出的帧内预测模式,则基于所述最优MIP模式的失真代价和所述第二帧内预测模式的失真代价,确定所述最优MIP模式的权重和所述第二帧内预测模式的权重;
若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式,则确定所述最优MIP模式的权重和所述第二帧内预测模式的权重均为预设值。
在一些实施例中,所述变换单元520具体用于:
解析所述当前序列的码流获取第一标识;
若所述第一标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测,则确定所述第二帧内预测模式。
在一些实施例中,所述变换单元520具体用于:
若所述第一标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测,则解析所述码流获取第二标识;
若所述第二标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,则确定所述第二帧内预测模式。
在一些实施例中,所述重建单元530还用于:
基于多个MIP模式的失真代价,确定所述最优MIP模式;
其中,所述多个MIP模式的失真代价包括利用所述多个MIP模式对与所述当前块相邻的第二模板区域内的样本进行预测得到的失真代价。
在一些实施例中,所述第二模板区域和所述第一模板区域相同或不同。
在一些实施例中,所述重建单元530具体用于:
基于第三标识和所述多个MIP模式对所述第二模板区域内的样本进行预测,得到所述第三标识的每一个状态下所述多个MIP模式的失真代价;所述第三标识用于标识是否转置MIP模式的输入向量和输出向量;
基于所述第三标识的每一个状态下所述多个MIP模式的失真代价,确定所述最优MIP模式。
在一些实施例中,所述重建单元530基于多个MIP模式的失真代价,确定所述最优MIP模式之前,还用于:
获取与所述当前块相邻的相邻块使用的MIP模式;
将所述相邻块使用的MIP模式,确定为所述多个MIP模式。
在一些实施例中,所述重建单元530基于多个MIP模式的失真代价,确定所述最优MIP模式之前,还用于:
对所述第二模板区域外部相邻的参考区域进行重建样本填充,得到所述第二模板区域的参考行和参考列;
以所述参考行和所述参考列为输入,利用所述多个MIP模式分别对所述第二模板区域内的样本进行预测,得到所述多个MIP模式对应的多个预测块;
基于所述多个预测块和所述第二模板区域内的重建块,确定所述多个MIP模式的失真代价。
在一些实施例中,所述重建单元530具体用于:
对所述参考行和所述参考列进行下采样,得到输入向量;
以所述输入向量为输入,通过遍历所述多个MIP模式的方式对所述第二模板区域内的样本进行预测,得到所述多个MIP模式的输出向量;
对所述多个MIP模式的输出向量进行上采样,得到所述多个MIP模式对应的预测块。
在一些实施例中,所述重建单元530具体用于:
基于所述多个MIP模式在所述第二模板区域上的绝对变换差的和SATD,确定所述最优MIP模式。
图12是本申请实施例的编码器600的示意性框图。
如图12所示,所述编码器600可包括:
残差单元610,用于获取当前序列中当前块的残差块;
变换单元620,用于:
对所述当前块的残差块进行第三变换,得到所述当前块的第三变换系数;
确定第一帧内预测模式;
其中,所述第一帧内预测模式包括以下中的任一项:对所述当前块的预测块使用由解码器侧帧内模式导出DIMD模式导出的帧内预测模式、对用于预测所述当前块的最优基于矩阵的帧内预测MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式、对与所述当前块相邻的第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由基于模板的帧内模式导出TIMD模式导出的帧内预测模式;
基于所述第一帧内预测模式所对应的变换集,对所述第三变换系数进行第四变换,得到所述当前块的第四变换系数;
编码单元630,用于对所述第四变换系数进行编码。
在一些实施例中,所述最优MIP模式的输出向量为所述最优MIP模式输出的上采样前的向量;或,所述最优MIP模式的输出向量为所述最优MIP模式输出的上采样后的向量。
在一些实施例中,所述变换单元620具体用于:
基于用于对所述当前块进行预测的预测模式,确定所述第一帧内预测模式。
在一些实施例中,所述变换单元620具体用于:
若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和用于预测所述当前块的次优MIP模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为对所述最优MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式。
在一些实施例中,所述变换单元620具体用于:
若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和由所述TIMD模式导出的帧内预测模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为由所述TIMD模式导出的帧内预测模式。
在一些实施例中,所述变换单元620具体用于:
若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式。
在一些实施例中,所述残差单元610具体用于:
确定第二帧内预测模式;
其中,所述第二帧内预测模式包括以下中的任一项:用于预测所述当前块的次优MIP模式、对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由所述TIMD模式导出的帧内预测模式;
基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,得到所述当前块的预测块;
基于所述当前块的预测块,得到所述当前块的残差块。
在一些实施例中,所述残差单元610具体用于:
基于所述最优MIP模式对所述当前块进行预测,得到第一预测块;
基于所述第二帧内预测模式对所述当前块进行预测,得到第二预测块;
基于所述最优MIP模式的权重和所述第二帧内预测模式的权重,对所述第一预测块和所述第二预测块进行加权处理,得到所述当前块的预测块。
在一些实施例中,所述残差单元610基于所述最优MIP模式的权重和所述第二帧内预测模式的权重,对所述第一预测块和所述第二预测块进行加权处理,得到所述当前块的预测块之前,还用于:
若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和用于预测所述当前块的次优MIP模式或由所述TIMD模式导出的帧内预测模式,则基于所述最优MIP模式的失真代价和所述第二帧内预测模式的失真代价,确定所述最优MIP模式的权重和所述第二帧内预测模式的权重;
若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式,则确定所述最优MIP模式的权重和所述第二帧内预测模式的权重均为预设值。
在一些实施例中,所述残差单元610具体用于:
获取第一标识;
若所述第一标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测,则确定所述第二帧内预测模式;
其中,所述编码单元630具体用于:
对所述第四变换系数和所述第一标识进行编码。
在一些实施例中,所述残差单元610具体用于:
若所述第一标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测,则基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,得到第一率失真代价;
基于至少一个帧内预测模式对所述当前块进行预测,得到至少一个率失真代价;
若所述第一率失真代价小于或等于所述至少一个率失真代价中的最小值,则将基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测得到的预测块,确定为所述当前块的预测块;
其中,所述编码单元630具体用于:
对所述第四变换系数、所述第一标识和第二标识进行编码;
其中,若所述第一率失真代价小于或等于所述至少一个率失真代价中的最小值,则所述第二标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测;若所述第一率失真代价大于所述至少一个率失真代价中的最小值,则所述第二标识用于标识不允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测。
在一些实施例中,所述残差单元610还用于:
基于多个MIP模式的失真代价,确定所述最优MIP模式;
其中,所述多个MIP模式的失真代价包括利用所述多个MIP模式对与所述当前块相邻的第二模板区域内的样本进行预测得到的失真代价。
在一些实施例中,所述第二模板区域和所述第一模板区域相同或不同。
在一些实施例中,所述残差单元610具体用于:
基于第三标识和所述多个MIP模式对所述第二模板区域内的样本进行预测,得到所述第三标识的每一个状态下所述多个MIP模式的失真代价;所述第三标识用于标识是否转置MIP模式的输入向量和输出向量;
基于所述第三标识的每一个状态下所述多个MIP模式的失真代价,确定所述最优MIP模式。
在一些实施例中,所述残差单元610基于多个MIP模式的失真代价,确定所述最优MIP模式之前,还用于:
获取与所述当前块相邻的相邻块使用的MIP模式;
将所述相邻块使用的MIP模式,确定为所述多个MIP模式。
在一些实施例中,所述残差单元610基于多个MIP模式的失真代价,确定所述最优MIP模式之前,还用于:
对所述第二模板区域外部相邻的参考区域进行重建样本填充,得到所述第二模板区域的参考行和参 考列;
以所述参考行和所述参考列为输入,利用所述多个MIP模式分别对所述第二模板区域内的样本进行预测,得到所述多个MIP模式对应的多个预测块;
基于所述多个预测块和所述第二模板区域内的重建块,确定所述多个MIP模式的失真代价。
在一些实施例中,所述残差单元610具体用于:
对所述参考行和所述参考列进行下采样,得到输入向量;
以所述输入向量为输入,通过遍历所述多个MIP模式的方式对所述第二模板区域内的样本进行预测,得到所述多个MIP模式的输出向量;
对所述多个MIP模式的输出向量进行上采样,得到所述多个MIP模式对应的预测块。
在一些实施例中,所述残差单元610具体用于:
基于所述多个MIP模式在所述第二模板区域上的绝对变换差的和SATD,确定所述最优MIP模式。
应理解,装置实施例与方法实施例可以相互对应,类似的描述可以参照方法实施例。为避免重复,此处不再赘述。具体地,图11所示的解码器500可以对应于执行本申请实施例的方法300中的相应主体,并且解码器500中的各个单元的前述和其它操作和/或功能分别为了实现方法300等各个方法中的相应流程。类似的,图12所示的编码器600可以对应于执行本申请实施例的方法400中的相应主体,即编码器600中的各个单元的前述和其它操作和/或功能分别为了实现方法400等各个方法中的相应流程。
还应当理解,本申请实施例涉及的解码器500或编码器600中的各个单元可以分别或全部合并为一个或若干个另外的单元来构成,或者其中的某个(些)单元还可以再拆分为功能上更小的多个单元来构成,这可以实现同样的操作,而不影响本申请的实施例的技术效果的实现。上述单元是基于逻辑功能划分的,在实际应用中,一个单元的功能也可以由多个单元来实现,或者多个单元的功能由一个单元实现。在本申请的其它实施例中,该解码器500或编码器600也可以包括其它单元,在实际应用中,这些功能也可以由其它单元协助实现,并且可以由多个单元协作实现。根据本申请的另一个实施例,可以通过在包括例如中央处理单元(CPU)、随机存取存储介质(RAM)、只读存储介质(ROM)等处理元件和存储元件的通用计算机的通用计算设备上运行能够执行相应方法所涉及的各步骤的计算机程序(包括程序代码),来构造本申请实施例涉及的解码器500或编码器600,以及来实现本申请实施例的编码方法或解码方法。计算机程序可以记载于例如计算机可读存储介质上,并通过计算机可读存储介质装载于电子设备中,并在其中运行,来实现本申请实施例的相应方法。
换言之,上文涉及的单元可以通过硬件形式实现,也可以通过软件形式的指令实现,还可以通过软硬件结合的形式实现。具体地,本申请实施例中的方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路和/或软件形式的指令完成,结合本申请实施例公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件组合执行完成。可选地,软件可以位于随机存储器,闪存、只读存储器、可编程只读存储器、电可擦写可编程存储器、寄存器等本领域的成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法实施例中的步骤。
图13是本申请实施例提供的电子设备700的示意结构图。
如图13所示,该电子设备700至少包括处理器710以及计算机可读存储介质720。其中,处理器710以及计算机可读存储介质720可通过总线或者其它方式连接。计算机可读存储介质720用于存储计算机程序721,计算机程序721包括计算机指令,处理器710用于执行计算机可读存储介质720存储的计算机指令。处理器710是电子设备700的计算核心以及控制核心,其适于实现一条或多条计算机指令,具体适于加载并执行一条或多条计算机指令从而实现相应方法流程或相应功能。
作为示例,处理器710也可称为中央处理器(Central Processing Unit,CPU)。处理器710可以包括但不限于:通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等等。
作为示例,计算机可读存储介质720可以是高速RAM存储器,也可以是非不稳定的存储器(Non-VolatileMemory),例如至少一个磁盘存储器;可选的,还可以是至少一个位于远离前述处理器710的计算机可读存储介质。具体而言,计算机可读存储介质720包括但不限于:易失性存储器和/或非易失性存储器。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic  RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synch link DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。
在一种实现方式中,该电子设备700可以是本申请实施例涉及的编码器或编码框架;该计算机可读存储介质720中存储有第一计算机指令;由处理器710加载并执行计算机可读存储介质720中存放的第一计算机指令,以实现本申请实施例提供的编码方法中的相应步骤;换言之,计算机可读存储介质720中的第一计算机指令由处理器710加载并执行相应步骤,为避免重复,此处不再赘述。
在一种实现方式中,该电子设备700可以是本申请实施例涉及的解码器或解码框架;该计算机可读存储介质720中存储有第二计算机指令;由处理器710加载并执行计算机可读存储介质720中存放的第二计算机指令,以实现本申请实施例提供的解码方法中的相应步骤;换言之,计算机可读存储介质720中的第二计算机指令由处理器710加载并执行相应步骤,为避免重复,此处不再赘述。
根据本申请的另一方面,本申请实施例还提供了一种编解码系统,包括上文涉及的编码器和解码器。
根据本申请的另一方面,本申请实施例还提供了一种计算机可读存储介质(Memory),计算机可读存储介质是电子设备700中的记忆设备,用于存放程序和数据。例如,计算机可读存储介质720。可以理解的是,此处的计算机可读存储介质720既可以包括电子设备700中的内置存储介质,当然也可以包括电子设备700所支持的扩展存储介质。计算机可读存储介质提供存储空间,该存储空间存储了电子设备700的操作系统。并且,在该存储空间中还存放了适于被处理器710加载并执行的一条或多条的计算机指令,这些计算机指令可以是一个或多个的计算机程序721(包括程序代码)。
根据本申请的另一方面,提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。例如,计算机程序721。此时,数据处理设备700可以是计算机,处理器710从计算机可读存储介质720读取该计算机指令,处理器710执行该计算机指令,使得该计算机执行上述各种可选方式中提供的编码方法或解码方法。
换言之,当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行该计算机程序指令时,全部或部分地运行本申请实施例的流程或实现本申请实施例的功能。该计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。该计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质进行传输,例如,该计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元以及流程步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
最后需要说明的是,以上内容,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (42)

  1. 一种解码方法,其特征在于,包括:
    解析当前序列的码流获取当前块的第一变换系数;
    确定第一帧内预测模式;
    其中,所述第一帧内预测模式包括以下中的任一项:对所述当前块的预测块使用由解码器侧帧内模式导出DIMD模式导出的帧内预测模式、对用于预测所述当前块的最优基于矩阵的帧内预测MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式、对与所述当前块相邻的第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由基于模板的帧内模式导出TIMD模式导出的帧内预测模式;
    基于所述第一帧内预测模式所对应的变换集,对所述第一变换系数进行第一变换,得到所述当前块的第二变换系数;
    对所述第二变换系数进行第二变换,得到所述当前块的残差块;
    基于所述当前块的预测块和所述当前块的残差块,确定所述当前块的重建块。
  2. 根据权利要求1所述的方法,其特征在于,所述最优MIP模式的输出向量为所述最优MIP模式输出的上采样前的向量;或,所述最优MIP模式的输出向量为所述最优MIP模式输出的上采样后的向量。
  3. 根据权利要求1或2所述的方法,其特征在于,所述确定第一帧内预测模式,包括:
    基于用于对所述当前块进行预测的预测模式,确定所述第一帧内预测模式。
  4. 根据权利要求3所述的方法,其特征在于,所述基于用于对所述当前块进行预测的预测模式,确定所述第一帧内预测模式,包括:
    若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和用于预测所述当前块的次优MIP模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为对所述最优MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式。
  5. 根据权利要求3所述的方法,其特征在于,所述基于用于对所述当前块进行预测的预测模式,确定所述第一帧内预测模式,包括:
    若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和由所述TIMD模式导出的帧内预测模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为由所述TIMD模式导出的帧内预测模式。
  6. 根据权利要求3所述的方法,其特征在于,所述基于用于对所述当前块进行预测的预测模式,确定所述第一帧内预测模式,包括:
    若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式。
  7. 根据权利要求1至6中任一项所述的方法,其特征在于,所述方法还包括:
    确定第二帧内预测模式;
    其中,所述第二帧内预测模式包括以下中的任一项:用于预测所述当前块的次优MIP模式、对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由所述TIMD模式导出的帧内预测模式;
    基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,得到所述当前块的预测块。
  8. 根据权利要求7所述的方法,其特征在于,所述基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,得到所述当前块的预测块,包括:
    基于所述最优MIP模式对所述当前块进行预测,得到第一预测块;
    基于所述第二帧内预测模式对所述当前块进行预测,得到第二预测块;
    基于所述最优MIP模式的权重和所述第二帧内预测模式的权重,对所述第一预测块和所述第二预测块进行加权处理,得到所述当前块的预测块。
  9. 根据权利要求8所述的方法,其特征在于,所述基于所述最优MIP模式的权重和所述第二帧内预测模式的权重,对所述第一预测块和所述第二预测块进行加权处理,得到所述当前块的预测块之前,所述方法还包括:
    若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和用于预测所述当前块的次优MIP模式或由所述TIMD模式导出的帧内预测模式,则基于所述最优MIP模式的失真代价和所述第二帧内预测模式的失真代价,确定所述最优MIP模式的权重和所述第二帧内预测模式的权重;
    若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式,则确定所述最优MIP模式的权重和所述第二帧内预测模式的权重均为预设值。
  10. 根据权利要求7至9中任一项所述的方法,其特征在于,所述确定第二帧内预测模式,包括:
    解析所述当前序列的码流获取第一标识;
    若所述第一标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测,则确定所述第二帧内预测模式。
  11. 根据权利要求10所述的方法,其特征在于,所述确定第二帧内预测模式,包括:
    若所述第一标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测,则解析所述码流获取第二标识;
    若所述第二标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,则确定所述第二帧内预测模式。
  12. 根据权利要求1至11中任一项所述的方法,其特征在于,所述方法还包括:
    基于多个MIP模式的失真代价,确定所述最优MIP模式;
    其中,所述多个MIP模式的失真代价包括利用所述多个MIP模式对与所述当前块相邻的第二模板区域内的样本进行预测得到的失真代价。
  13. 根据权利要求12所述的方法,其特征在于,所述第二模板区域和所述第一模板区域相同或不同。
  14. 根据权利要求12或13所述的方法,其特征在于,所述基于多个MIP模式的失真代价,确定所述最优MIP模式,包括:
    基于第三标识和所述多个MIP模式对所述第二模板区域内的样本进行预测,得到所述第三标识的每一个状态下所述多个MIP模式的失真代价;所述第三标识用于标识是否转置MIP模式的输入向量和输出向量;
    基于所述第三标识的每一个状态下所述多个MIP模式的失真代价,确定所述最优MIP模式。
  15. 根据权利要求12至14中任一项所述的方法,其特征在于,所述基于多个MIP模式的失真代价,确定所述最优MIP模式之前,所述方法还包括:
    获取与所述当前块相邻的相邻块使用的MIP模式;
    将所述相邻块使用的MIP模式,确定为所述多个MIP模式。
  16. 根据权利要求12至15中任一项所述的方法,其特征在于,所述基于多个MIP模式的失真代价,确定所述最优MIP模式之前,所述方法还包括:
    对所述第二模板区域外部相邻的参考区域进行重建样本填充,得到所述第二模板区域的参考行和参考列;
    以所述参考行和所述参考列为输入,利用所述多个MIP模式分别对所述第二模板区域内的样本进行预测,得到所述多个MIP模式对应的多个预测块;
    基于所述多个预测块和所述第二模板区域内的重建块,确定所述多个MIP模式的失真代价。
  17. 根据权利要求16所述的方法,其特征在于,所述以所述参考行和所述参考列为输入,利用所述多个MIP模式分别对所述第二模板区域内的样本进行预测,得到所述多个MIP模式对应的多个预测块,包括:
    对所述参考行和所述参考列进行下采样,得到输入向量;
    以所述输入向量为输入,通过遍历所述多个MIP模式的方式对所述第二模板区域内的样本进行预测,得到所述多个MIP模式的输出向量;
    对所述多个MIP模式的输出向量进行上采样,得到所述多个MIP模式对应的预测块。
  18. 根据权利要求12至17中任一项所述的方法,其特征在于,所述基于多个MIP模式的失真代价,确定所述最优MIP模式,包括:
    基于所述多个MIP模式在所述第二模板区域上的绝对变换差的和SATD,确定所述最优MIP模式。
  19. 一种编码方法,其特征在于,所述方法适用于编码器,所述方法包括:
    获取当前序列中当前块的残差块;
    对所述当前块的残差块进行第三变换,得到所述当前块的第三变换系数;
    确定第一帧内预测模式;
    其中,所述第一帧内预测模式包括以下中的任一项:对所述当前块的预测块使用由解码器侧帧内模式导出DIMD模式导出的帧内预测模式、对用于预测所述当前块的最优基于矩阵的帧内预测MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式、对与所述当前块相邻的第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由基于模板的帧内模式导出TIMD模式导出的帧内预测模式;
    基于所述第一帧内预测模式所对应的变换集,对所述第三变换系数进行第四变换,得到所述当前块的第四变换系数;
    对所述第四变换系数进行编码。
  20. 根据权利要求19所述的方法,其特征在于,所述最优MIP模式的输出向量为所述最优MIP模式输出的上采样前的向量;或,所述最优MIP模式的输出向量为所述最优MIP模式输出的上采样后的向量。
  21. 根据权利要求19或20所述的方法,其特征在于,所述确定第一帧内预测模式,包括:
    基于用于对所述当前块进行预测的预测模式,确定所述第一帧内预测模式。
  22. 根据权利要求21所述的方法,其特征在于,所述基于用于对所述当前块进行预测的预测模式,确定所述第一帧内预测模式,包括:
    若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和用于预测所述当前块的次优MIP模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为对所述最优MIP模式的输出向量使用所述DIMD模式导出的帧内预测模式。
  23. 根据权利要求21所述的方法,其特征在于,所述基于用于对所述当前块进行预测的预测模式,确定所述第一帧内预测模式,包括:
    若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和由所述TIMD模式导出的帧内预测模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为由所述TIMD模式导出的帧内预测模式。
  24. 根据权利要求21所述的方法,其特征在于,所述基于用于对所述当前块进行预测的预测模式,确定所述第一帧内预测模式,包括:
    若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和对与所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式,则确定所述第一帧内预测模式为对所述当前块的预测块使用所述DIMD模式导出的帧内预测模式,或确定所述第一帧内预测模式为对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式。
  25. 根据权利要求19至22中任一项所述的方法,其特征在于,所述获取当前序列中当前块的残差块,包括:
    确定第二帧内预测模式;
    其中,所述第二帧内预测模式包括以下中的任一项:用于预测所述当前块的次优MIP模式、对所述第一模板区域内的重建样本使用所述DIMD模式导出的帧内预测模式、由所述TIMD模式导出的帧内预测模式;
    基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,得到所述当前块的预测块;
    基于所述当前块的预测块,得到所述当前块的残差块。
  26. 根据权利要求25所述的方法,其特征在于,所述基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,得到所述当前块的预测块,包括:
    基于所述最优MIP模式对所述当前块进行预测,得到第一预测块;
    基于所述第二帧内预测模式对所述当前块进行预测,得到第二预测块;
    基于所述最优MIP模式的权重和所述第二帧内预测模式的权重,对所述第一预测块和所述第二预测块进行加权处理,得到所述当前块的预测块。
  27. 根据权利要求26所述的方法,其特征在于,所述基于所述最优MIP模式的权重和所述第二帧内预测模式的权重,对所述第一预测块和所述第二预测块进行加权处理,得到所述当前块的预测块之前,所述方法还包括:
    若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和用于预测所述当前块的次优MIP模式或由所述TIMD模式导出的帧内预测模式,则基于所述最优MIP模式的失真代价和所述第二帧内预测模式的失真代价,确定所述最优MIP模式的权重和所述第二帧内预测模式的权重;
    若用于对所述当前块进行预测的预测模式包括所述最优MIP模式和对与所述第一模板区域内的重 建样本使用所述DIMD模式导出的帧内预测模式,则确定所述最优MIP模式的权重和所述第二帧内预测模式的权重均为预设值。
  28. 根据权利要求25至27中任一项所述的方法,其特征在于,所述确定第二帧内预测模式,包括:
    获取第一标识;
    若所述第一标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测,则确定所述第二帧内预测模式;
    其中,所述对所述第四变换系数进行编码,包括:
    对所述第四变换系数和所述第一标识进行编码。
  29. 根据权利要求28所述的方法,其特征在于,所述基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,得到所述当前块的预测块,包括:
    若所述第一标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前序列中的图像块进行预测,则基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测,得到第一率失真代价;
    基于至少一个帧内预测模式对所述当前块进行预测,得到至少一个率失真代价;
    若所述第一率失真代价小于或等于所述至少一个率失真代价中的最小值,则将基于所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测得到的预测块,确定为所述当前块的预测块;
    其中,所述对所述第四变换系数和所述第一标识进行编码,包括:
    对所述第四变换系数、所述第一标识和第二标识进行编码;
    其中,若所述第一率失真代价小于或等于所述至少一个率失真代价中的最小值,则所述第二标识用于标识允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测;若所述第一率失真代价大于所述至少一个率失真代价中的最小值,则所述第二标识用于标识不允许使用所述最优MIP模式和所述第二帧内预测模式对所述当前块进行预测。
  30. 根据权利要求19至29中任一项所述的方法,其特征在于,所述方法还包括:
    基于多个MIP模式的失真代价,确定所述最优MIP模式;
    其中,所述多个MIP模式的失真代价包括利用所述多个MIP模式对与所述当前块相邻的第二模板区域内的样本进行预测得到的失真代价。
  31. 根据权利要求30所述的方法,其特征在于,所述第二模板区域和所述第一模板区域相同或不同。
  32. 根据权利要求30或31所述的方法,其特征在于,所述基于多个MIP模式的失真代价,确定所述最优MIP模式,包括:
    基于第三标识和所述多个MIP模式对所述第二模板区域内的样本进行预测,得到所述第三标识的每一个状态下所述多个MIP模式的失真代价;所述第三标识用于标识是否转置MIP模式的输入向量和输出向量;
    基于所述第三标识的每一个状态下所述多个MIP模式的失真代价,确定所述最优MIP模式。
  33. 根据权利要求30至32中任一项所述的方法,其特征在于,所述基于多个MIP模式的失真代价,确定所述最优MIP模式之前,所述方法还包括:
    获取与所述当前块相邻的相邻块使用的MIP模式;
    将所述相邻块使用的MIP模式,确定为所述多个MIP模式。
  34. 根据权利要求30至33中任一项所述的方法,其特征在于,所述基于多个MIP模式的失真代价,确定所述最优MIP模式之前,所述方法还包括:
    对所述第二模板区域外部相邻的参考区域进行重建样本填充,得到所述第二模板区域的参考行和参考列;
    以所述参考行和所述参考列为输入,利用所述多个MIP模式分别对所述第二模板区域内的样本进行预测,得到所述多个MIP模式对应的多个预测块;
    基于所述多个预测块和所述第二模板区域内的重建块,确定所述多个MIP模式的失真代价。
  35. 根据权利要求34所述的方法,其特征在于,所述以所述参考行和所述参考列为输入,利用所述多个MIP模式分别对所述第二模板区域内的样本进行预测,得到所述多个MIP模式对应的多个预测块,包括:
    对所述参考行和所述参考列进行下采样,得到输入向量;
    以所述输入向量为输入,通过遍历所述多个MIP模式的方式对所述第二模板区域内的样本进行预测,得到所述多个MIP模式的输出向量;
    对所述多个MIP模式的输出向量进行上采样,得到所述多个MIP模式对应的预测块。
  36. 根据权利要求30至35中任一项所述的方法,其特征在于,所述基于多个MIP模式的失真代价,确定所述最优MIP模式,包括:
    基于所述多个MIP模式在所述第二模板区域上的绝对变换差的和SATD,确定所述最优MIP模式。
  37. A decoder, comprising:
    a parsing unit configured to parse a bitstream of a current sequence to obtain a first transform coefficient of a current block;
    a transform unit configured to:
    determine a first intra prediction mode;
    wherein the first intra prediction mode comprises any one of the following: an intra prediction mode derived by applying a decoder-side intra mode derivation (DIMD) mode to a prediction block of the current block, an intra prediction mode derived by applying the DIMD mode to an output vector of an optimal matrix-based intra prediction (MIP) mode used for predicting the current block, an intra prediction mode derived by applying the DIMD mode to reconstructed samples in a first template region adjacent to the current block, and an intra prediction mode derived by a template-based intra mode derivation (TIMD) mode;
    perform a first transform on the first transform coefficient based on a transform set corresponding to the first intra prediction mode to obtain a second transform coefficient of the current block;
    perform a second transform on the second transform coefficient to obtain a residual block of the current block; and
    a reconstruction unit configured to determine a reconstructed block of the current block based on the prediction block of the current block and the residual block of the current block.
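A rough sketch of the claim-37 transform unit: the derived intra prediction mode selects a transform set for the first (non-separable) transform, and a separable transform then returns the coefficients to the sample domain. The mode-to-set mapping, the orthogonal-kernel assumption, and the DCT-II second transform are illustrative choices, not the claimed design.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n (assumed second transform)."""
    k = np.arange(n).reshape(-1, 1)
    m = np.cos(np.pi * (2 * np.arange(n) + 1) * k / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def decode_residual(coeffs1, intra_mode, transform_sets):
    """Mode-dependent first transform followed by a separable second
    transform (sketch of the claim-37 transform unit).

    coeffs1: square block of first transform coefficients.
    transform_sets: list of orthogonal kernels, each of shape
    (coeffs1.size, coeffs1.size); the modulo mapping below is assumed.
    """
    kernel = transform_sets[intra_mode % len(transform_sets)]
    coeffs2 = (kernel.T @ coeffs1.flatten()).reshape(coeffs1.shape)
    d = dct_matrix(coeffs1.shape[0])
    # Inverse separable transform over rows and columns -> residual block.
    return d.T @ coeffs2 @ d
```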
  38. An encoder, comprising:
    a residual unit configured to obtain a residual block of a current block in a current sequence;
    a transform unit configured to:
    perform a third transform on the residual block of the current block to obtain a third transform coefficient of the current block;
    determine a first intra prediction mode;
    wherein the first intra prediction mode comprises any one of the following: an intra prediction mode derived by applying a decoder-side intra mode derivation (DIMD) mode to a prediction block of the current block, an intra prediction mode derived by applying the DIMD mode to an output vector of an optimal matrix-based intra prediction (MIP) mode used for predicting the current block, an intra prediction mode derived by applying the DIMD mode to reconstructed samples in a first template region adjacent to the current block, and an intra prediction mode derived by a template-based intra mode derivation (TIMD) mode;
    perform a fourth transform on the third transform coefficient based on a transform set corresponding to the first intra prediction mode to obtain a fourth transform coefficient of the current block; and
    an encoding unit configured to encode the fourth transform coefficient.
  39. An electronic device, comprising:
    a processor adapted to execute a computer program; and
    a computer-readable storage medium storing a computer program which, when executed by the processor, implements the method according to any one of claims 1 to 18 or the method according to any one of claims 19 to 36.
  40. A computer-readable storage medium for storing a computer program, wherein the computer program causes a computer to perform the method according to any one of claims 1 to 18 or the method according to any one of claims 19 to 36.
  41. A computer program product comprising a computer program/instructions, wherein the computer program/instructions, when executed by a processor, implement the method according to any one of claims 1 to 18 or the method according to any one of claims 19 to 36.
  42. A bitstream, wherein the bitstream is the bitstream in the method according to any one of claims 1 to 18, or a bitstream generated by the method according to any one of claims 19 to 36.
PCT/CN2022/103654 2022-07-04 2022-07-04 Decoding method, encoding method, decoder, and encoder WO2024007116A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2022/103654 WO2024007116A1 (zh) 2022-07-04 2022-07-04 Decoding method, encoding method, decoder, and encoder
TW112123269A TW202404370A (zh) 2023-06-20 Decoding method, encoding method, decoder, encoder, electronic device, computer-readable storage medium, computer program product, and bitstream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/103654 WO2024007116A1 (zh) 2022-07-04 2022-07-04 Decoding method, encoding method, decoder, and encoder

Publications (1)

Publication Number Publication Date
WO2024007116A1 (zh)

Family

ID=89454704

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/103654 WO2024007116A1 (zh) Decoding method, encoding method, decoder, and encoder

Country Status (2)

Country Link
TW (1) TW202404370A (zh)
WO (1) WO2024007116A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190166370A1 (en) * 2016-05-06 2019-05-30 Vid Scale, Inc. Method and system for decoder-side intra mode derivation for block-based video coding
US20190215521A1 (en) * 2016-09-22 2019-07-11 Mediatek Inc. Method and apparatus for video coding using decoder side intra prediction derivation
CN113557718A (zh) * 2019-06-04 2021-10-26 Tencent America LLC Method and apparatus for video coding and decoding
CN113632488A (zh) * 2019-04-17 2021-11-09 Huawei Technologies Co., Ltd. Encoder, decoder and corresponding methods harmonizing matrix-based intra prediction and secondary transform core selection
WO2022140718A1 (en) * 2020-12-22 2022-06-30 Qualcomm Incorporated Decoder side intra mode derivation for most probable mode list construction in video coding

Also Published As

Publication number Publication date
TW202404370A (zh) 2024-01-16

Similar Documents

Publication Publication Date Title
TWI834773B (zh) Method, apparatus, and computer-readable storage medium for encoding and decoding one or more portions of an image using an adaptive loop filter
US10298961B2 (en) Method, apparatus and system for de-blocking a block of video samples
JP7277616B2 (ja) Method, apparatus, and storage medium for processing video data
CN114223208A (zh) Context modeling of side information for reduced secondary transforms in video
CN112514401A (zh) Method and apparatus for loop filtering
WO2021185008A1 (zh) Encoding method, decoding method, encoder, decoder, and electronic device
CN117596413A (zh) Video processing method and apparatus
JP2022544438A (ja) In-loop filtering method and in-loop filtering apparatus
CN114830663A (zh) Transform method, encoder, decoder, and storage medium
CN113068026B (zh) Coding prediction method and apparatus, and computer storage medium
WO2022116085A1 (zh) Encoding method, decoding method, encoder, decoder, and electronic device
WO2022116113A1 (zh) Intra prediction method and apparatus, decoder, and encoder
JP7467687B2 (ja) Encoding/decoding method and apparatus
CN118044184A (zh) Method and system for performing combined inter and intra prediction
WO2024007116A1 (zh) Decoding method, encoding method, decoder, and encoder
WO2023193253A1 (zh) Decoding method, encoding method, decoder, and encoder
CN116250230A (zh) Offset-based refinement of intra prediction (ORIP) for video coding
WO2021134303A1 (zh) Transform method, encoder, decoder, and storage medium
WO2023193254A1 (zh) Decoding method, encoding method, decoder, and encoder
WO2023197179A1 (zh) Decoding method, encoding method, decoder, and encoder
WO2023197181A1 (zh) Decoding method, encoding method, decoder, and encoder
WO2021134327A1 (zh) Transform method, encoder, decoder, and storage medium
WO2023197180A1 (zh) Decoding method, encoding method, decoder, and encoder
WO2023070505A1 (zh) Intra prediction method, decoder, encoder, and encoding/decoding system
CN114175653B (zh) Method and apparatus for lossless coding modes in video coding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22949708

Country of ref document: EP

Kind code of ref document: A1