WO2023039856A1 - Video decoding and encoding method, device, and storage medium - Google Patents

Video decoding and encoding method, device, and storage medium

Info

Publication number
WO2023039856A1
Authority
WO
WIPO (PCT)
Prior art keywords
translation
prediction block
gradient
residual data
pixels
Application number
PCT/CN2021/119157
Other languages
English (en)
French (fr)
Inventor
王凡 (Wang Fan)
Original Assignee
Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司
Priority application: PCT/CN2021/119157
Related Chinese application: CN202180102264.6A, published as CN117957842A
Publication of WO2023039856A1


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/60: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N 19/85: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N 19/88: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving rearrangement of data among different coding units, e.g. shuffling, interleaving, scrambling or permutation of pixel data or permutation of transform coefficient data among different blocks

Definitions

  • Embodiments of the present disclosure relate to but are not limited to the technical field of video data processing, and in particular, relate to a video decoding method, an encoding method and device, and a storage medium.
  • Digital video compression technology mainly compresses huge digital image and video data to facilitate transmission and storage.
  • Although digital video compression standards can already save a great deal of video data, it is still necessary to pursue better digital video compression technology to reduce the bandwidth and traffic pressure of digital video transmission and to achieve more efficient video encoding/decoding, transmission, and storage.
  • An embodiment of the present disclosure provides a video decoding method, including: decoding the encoded video code stream to obtain initial residual data; determining a translation direction and a translation step for the initial residual data, and translating the initial residual data according to the translation direction and translation step; and obtaining a reconstructed image from the translated residual data.
  • An embodiment of the present disclosure also provides a video encoding method, including: encoding the video to be encoded to obtain residual data; determining a translation direction and a translation step for the residual data, and translating the residual data according to the translation direction and translation step; and obtaining the encoded code stream from the translated residual data.
  • An embodiment of the present disclosure also provides a video decoding device, including a processor and a memory storing a computer program that can run on the processor, wherein the processor, when executing the computer program, implements the video decoding method described in any embodiment of the present disclosure.
  • An embodiment of the present disclosure also provides a video encoding device, including a processor and a memory storing a computer program that can run on the processor, wherein the processor, when executing the computer program, implements the video encoding method described in any embodiment of the present disclosure.
  • An embodiment of the present disclosure further provides a video encoding and decoding system, which includes the video decoding device according to any embodiment of the present disclosure and/or the video encoding device according to any embodiment of the present disclosure.
  • An embodiment of the present disclosure also provides a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the video decoding method or the video encoding method described in any embodiment of the present disclosure.
  • An embodiment of the present disclosure further provides a code stream, wherein the code stream is generated according to the video encoding method described in any embodiment of the present disclosure.
  • FIG. 1 is a structural block diagram of a video codec system that can be used in an embodiment of the present disclosure
  • FIG. 2 is a structural block diagram of a video encoder that can be used in an embodiment of the present disclosure
  • FIG. 3 is a structural block diagram of a video decoder that can be used in an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of the effect of a DCT transformation that can be used in an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of a coding and decoding process that can be used in an embodiment of the present disclosure
  • FIG. 6 is a flowchart of a video decoding method that can be used in an embodiment of the present disclosure
  • FIG. 7 is a schematic diagram of a VVC intra prediction mode that can be used in an embodiment of the present disclosure.
  • FIG. 8 is a flow chart of a video encoding method that can be used in an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of residual data translation that can be used in embodiments of the present disclosure.
  • FIG. 10 is a schematic diagram of another residual data translation that can be used in embodiments of the present disclosure.
  • FIG. 11 is a schematic structural diagram of a video encoding/decoding device applicable to an embodiment of the present disclosure.
  • Each frame in the video is divided into square largest coding units (LCU) or coding tree units (CTU) of the same size (such as 128×128, 64×64, etc.).
  • Each largest coding unit or coding tree unit can be divided into rectangular coding units (CU, coding unit) according to rules.
  • The coding unit may be further divided into a prediction unit (PU, prediction unit), a transform unit (TU, transform unit), and so on.
  • The hybrid coding framework includes modules such as prediction, transform, quantization, entropy coding, and in-loop filtering.
  • the prediction module includes intra prediction and inter prediction.
  • Inter prediction includes motion estimation and motion compensation.
  • Due to the strong correlation between adjacent pixels, intra prediction is used in video coding and decoding technology to eliminate spatial redundancy between adjacent pixels; due to the strong similarity between adjacent frames in video, inter prediction is used to eliminate temporal redundancy between adjacent frames, thereby improving coding efficiency.
  • Mainstream video codec standards include H.264/AVC (Advanced Video Coding), H.265/HEVC (High Efficiency Video Coding), H.266/VVC (Versatile Video Coding), the MPEG (Moving Picture Experts Group) standards, the AOM (Alliance for Open Media) standards, AVS (Audio Video coding Standard), extensions of these standards, or any other custom standards. These standards reduce the amount of transmitted and stored data through video compression technology, so as to achieve more efficient video encoding/decoding, transmission, and storage.
  • The input image is divided into fixed-size blocks as the basic unit of encoding, called macroblocks (MB, Macro Block), each including one luma block and two chroma blocks; the luma block size is 16×16, and if 4:2:0 sampling is used, the chroma block size is half of the luma block size.
  • the macroblock is further divided into small blocks for prediction.
  • The macroblock can be divided into 16×16, 8×8, or 4×4 small blocks, and intra prediction is performed on each small block.
  • For transform coding, the macroblock is divided into 4×4 or 8×8 small blocks, and the prediction residual within each small block is transformed and quantized separately to obtain quantized coefficients.
  • Compared with H.264/AVC, H.265/HEVC introduces improvements in multiple coding stages.
  • In H.265/HEVC, an image is divided into coding tree units (CTU, Coding Tree Unit); the CTU is the basic unit of coding, corresponding to the macroblock in H.264/AVC.
  • A CTU includes one luma coding tree block (CTB, Coding Tree Block) and two chroma coding tree blocks.
  • The maximum size of a CU in the H.265/HEVC standard is generally 64×64.
  • The CTU is iteratively divided into a series of coding units (CU, Coding Unit) in the form of a quadtree (QT, Quad Tree).
  • CU is the basic unit of intra/inter coding.
  • a CU includes a luma coding block (CB, Coding Block) and two chroma coding blocks and related syntax structures.
  • The maximum CU size is the CTU size, and the minimum CU size is 8×8.
  • the leaf node CUs obtained through coding tree division can be divided into three types according to different prediction methods: intra CU for intra-frame prediction, inter CU for inter-frame prediction, and skipped CU.
  • the skipped CU can be regarded as a special case of the inter CU, which does not contain motion information and prediction residual information.
  • the leaf node CU contains one or more prediction units (PU, Prediction Unit).
  • H.265/HEVC supports PUs from 4×4 to 64×64 in size, with eight division modes in total.
  • For the intra coding mode there are two possible division modes: Part_2Nx2N and Part_NxN.
  • The CU is divided into transform units (TU, Transform Unit) according to a prediction residual quadtree.
  • A TU includes one luma transform block (TB, Transform Block) and two chroma transform blocks. Only square division is allowed, and one CB is divided into 1 or 4 TBs.
  • All blocks in the same TU share the same transform and quantization process, and the supported sizes range from 4×4 to 32×32.
  • TB can cross the boundary of PB to further maximize the coding efficiency of inter coding.
  • In H.266/VVC, coded images are first divided into coding tree units (CTU) similar to H.265/HEVC, but the maximum CTU size is increased from 64×64 to 128×128.
  • H.266/VVC introduces quadtree plus nested multi-type tree (MTT, Multi-Type Tree) division, where MTT includes the binary tree (BT, Binary Tree) and the ternary tree (TT, Ternary Tree). It unifies the H.265/HEVC concepts of CU, PU, and TU, and supports more flexible CU division shapes.
  • the CTU is divided according to the quadtree structure, and the leaf nodes are further divided by MTT.
  • the leaf nodes of the multi-type tree become the coding unit CU.
  • chrominance can adopt a separate partition tree structure instead of keeping the same with the luma partition tree.
  • In H.266/VVC, the chroma division of I frames adopts a separate chroma tree, while the chroma division of P frames and B frames is consistent with the luma division.
  • FIG. 1 is a block diagram of a video encoding and decoding system applicable to an embodiment of the present disclosure.
  • the system is divided into an encoding-side device 1 and a decoding-side device 2 , and the encoding-side device 1 encodes video images to generate a code stream.
  • the device 2 on the decoding side can decode the code stream to obtain a reconstructed video image.
  • The encoding side device 1 and the decoding side device 2 may include one or more processors and memory coupled to the one or more processors, such as random access memory, electrically erasable programmable read-only memory, flash memory, or other media.
  • The encoding side device 1 and the decoding side device 2 can be implemented with various devices, such as desktop computers, mobile computing devices, notebook computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, vehicle-mounted computers, or other similar devices.
  • the device 2 on the decoding side can receive the code stream from the device 1 on the encoding side via the link 3 .
  • the link 3 includes one or more media or devices capable of moving the code stream from the device 1 on the encoding side to the device 2 on the decoding side.
  • the link 3 includes one or more communication media that enable the device 1 on the encoding side to directly transmit the code stream to the device 2 on the decoding side.
  • the device 1 on the encoding side can modulate the code stream according to a communication standard (such as a wireless communication protocol), and can send the modulated code stream to the device 2 on the decoding side.
  • the one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
  • RF radio frequency
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet).
  • the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from device 1 on the encoding side to device 2 on the decoding side.
  • the code stream can also be output from the output interface 15 to a storage device, and the decoding-side device 2 can read the stored data from the storage device via streaming or downloading.
  • The storage device may comprise any of a variety of distributed-access or locally-accessed data storage media, such as hard disk drives, Blu-ray Discs, Digital Versatile Discs, CD-ROMs, flash memory, volatile or non-volatile memory, file servers, and so on.
  • the encoding side device 1 includes a data source 11 , an encoder 13 and an output interface 15 .
  • The data source 11 may include a video capture device (e.g., a video camera), an archive containing previously captured data, a feed interface to receive data from a content provider, a computer graphics system to generate data, or a combination of these sources.
  • The encoder 13 can encode the data from the data source 11 and output it to the output interface 15; the output interface 15 may include at least one of a modulator, a modem, and a transmitter.
  • the decoding side device 2 includes an input interface 21 , a decoder 23 and a display device 25 .
  • input interface 21 includes at least one of a receiver and a modem.
  • the input interface 21 can receive the code stream via the link 3 or from a storage device.
  • the decoder 23 decodes the received code stream.
  • the display device 25 is used for displaying the decoded data, and the display device 25 may be integrated with other devices of the decoding side device 2 or provided separately.
  • the display device 25 may be, for example, a liquid crystal display, a plasma display, an organic light emitting diode display or other types of display devices.
  • the device 2 on the decoding side may not include the display device 25 , or may include other devices or devices for applying the decoded data.
  • The encoder 13 and the decoder 23 of FIG. 1 can each be implemented with any one or any combination of the following circuits: one or more microprocessors, digital signal processors, application-specific integrated circuits, field-programmable gate arrays, discrete logic, or hardware. If the present disclosure is implemented partially in software, the instructions for the software may be stored in a suitable non-transitory computer-readable storage medium and executed in hardware using one or more processors, thereby implementing the disclosed methods.
  • Fig. 2 is a structural block diagram of an exemplary video encoder.
  • The description is mainly based on the terminology and block division of the H.265/HEVC standard, but the structure of the video encoder can also be used for video encoding under H.264/AVC, H.266/VVC, and other similar standards.
  • the video encoder 20 is used to encode video data to generate code streams.
  • The video encoder 20 includes a prediction processing unit 100, a division unit 101, a prediction residual generation unit 102, a transform processing unit 104, a quantization unit 106, an inverse quantization unit 108, an inverse transform processing unit 110, a reconstruction unit 112, a filter unit 113, a decoded picture buffer 114, and an entropy coding unit 116.
  • the prediction processing unit 100 includes an inter prediction processing unit 121 and an intra prediction processing unit 126 .
  • video encoder 20 may contain more, fewer or different functional components than this example. Both the prediction residual generation unit 102 and the reconstruction unit 112 are represented by circles with plus signs in the figure.
  • the division unit 101 cooperates with the prediction processing unit 100 to divide the received video data into slices (Slice), CTU or other larger units.
  • the video data received by the dividing unit 101 may be a video sequence including video frames such as I frames, P frames, or B frames.
  • the prediction processing unit 100 may divide a CTU into CUs, and perform intra-frame predictive coding or inter-frame predictive coding on the CUs.
  • For intra prediction, a 2N×2N CU can be divided into 2N×2N or N×N prediction units (PU, prediction unit).
  • For inter prediction, a 2N×2N CU can be divided into PUs of 2N×2N, 2N×N, N×2N, N×N, or other sizes; asymmetric PU division may also be supported.
  • the inter prediction processing unit 121 may perform inter prediction on the PU to generate prediction data of the PU, the prediction data including the prediction block of the PU, motion information of the PU and various syntax elements.
  • the intra prediction processing unit 126 may perform intra prediction on the PU to generate prediction data for the PU.
  • the prediction data for a PU may include the prediction block and various syntax elements for the PU.
  • the intra-frame prediction processing unit 126 may try multiple selectable intra-frame prediction modes, and select an intra-frame prediction mode with the least cost to perform intra-frame prediction on the PU.
  • the prediction residual generation unit 102 may generate a prediction residual block of the CU based on the original block of the CU and the prediction block of the PU into which the CU is divided.
  • the transform processing unit 104 may divide the CU into one or more transform units (TU: Transform Unit), and the prediction residual block associated with the TU is a sub-block obtained by dividing the prediction residual block of the CU.
  • a TU-associated coefficient block is generated by applying one or more transforms to the TU-associated prediction residual block.
  • the transform processing unit 104 may apply discrete cosine transform (DCT: Discrete Cosine Transform), directional transform or other transforms to the prediction residual block associated with the TU, and may convert the prediction residual block from the pixel domain to the frequency domain.
  • DCT discrete cosine transform
  • the quantization unit 106 can quantize the coefficients in the coefficient block based on a selected quantization parameter (QP). Quantization may cause quantization losses. By adjusting the QP value, the degree of quantization of the coefficient block can be adjusted.
  • QP quantization parameter
  • the inverse quantization unit 108 and the inverse transformation unit 110 may respectively apply inverse quantization and inverse transformation to the coefficient blocks to obtain TU-associated reconstructed prediction residual blocks.
  • the reconstruction unit 112 may generate a reconstructed block of the CU based on the reconstructed prediction residual block and the prediction block generated by the prediction processing unit 100 .
  • the filter unit 113 performs loop filtering on the reconstructed block and stores it in the decoded picture buffer 114 .
  • the intra prediction processing unit 126 may extract the reconstructed reference information adjacent to the PU from the reconstructed blocks cached in the decoded picture buffer 114 to perform intra prediction on the PU.
  • Inter prediction processing unit 121 may perform inter prediction on PUs of other pictures using reference pictures cached by decoded picture buffer 114 that contain reconstructed blocks.
  • The entropy coding unit 116 can perform entropy coding operations on the received data (such as syntax elements, quantized coefficient blocks, motion information, etc.), for example context-adaptive variable-length coding (CAVLC, Context Adaptive Variable Length Coding) or context-based adaptive binary arithmetic coding (CABAC, Context-based Adaptive Binary Arithmetic Coding), and output the code stream (that is, the encoded video code stream).
  • FIG. 3 is a structural block diagram of an exemplary video decoder.
  • The description is mainly based on the terminology and block division of the H.265/HEVC standard, but the structure of the video decoder can also be used for video decoding under H.264/AVC, H.266/VVC, and other similar standards.
  • the video decoder 30 can decode the received code stream and output decoded video data.
  • the video decoder 30 includes an entropy decoding unit 150, a prediction processing unit 152, an inverse quantization unit 154, an inverse transformation processing unit 156, a reconstruction unit 158 (indicated by a circle with a plus sign in the figure), a filter unit 159 , and the picture buffer 160.
  • video decoder 30 may contain more, fewer or different functional components.
  • the entropy decoding unit 150 may perform entropy decoding on the received code stream to extract information such as syntax elements, quantized coefficient blocks, and PU motion information.
  • the prediction processing unit 152 , the inverse quantization unit 154 , the inverse transform processing unit 156 , the reconstruction unit 158 and the filter unit 159 can all perform corresponding operations based on the syntax elements extracted from the code stream.
  • the inverse quantization unit 154 may inverse quantize the quantized TU-associated coefficient blocks.
  • Inverse transform processing unit 156 may apply one or more inverse transforms to the inverse quantized coefficient block in order to generate a reconstructed prediction residual block for the TU.
  • The prediction processing unit 152 includes an inter prediction processing unit 162 and an intra prediction processing unit 164. If the PU is encoded using intra prediction, the intra prediction processing unit 164 determines the intra prediction mode of the PU based on the syntax elements parsed from the code stream, and performs intra prediction according to the determined mode and the reconstructed reference information of adjacent PUs obtained from the picture buffer 160, producing the prediction block of the PU. If the PU is encoded using inter prediction, the inter prediction processing unit 162 determines one or more reference blocks for the PU based on the motion information of the PU and the corresponding syntax elements, to generate the prediction block of the PU.
  • the reconstruction unit 158 may obtain the reconstruction block of the CU based on the reconstruction prediction residual block associated with the TU and the prediction block of the PU generated by the prediction processing unit 152 (ie intra prediction data or inter prediction data).
  • the filter unit 159 may perform loop filtering on the reconstructed block of the CU to obtain a reconstructed picture.
  • the reconstructed pictures are stored in the picture buffer 160 .
  • the picture buffer 160 can provide reference pictures for subsequent motion compensation, intra prediction, inter prediction, etc., and can also output the reconstructed video data as decoded video data for presentation on a display device.
  • Encoding at the encoder end and decoding at the decoder end may both be referred to generically as coding; from the context of the relevant steps, those skilled in the art can tell whether a later mention of coding (decoding) refers to encoding at the encoder end or decoding at the decoder end.
  • The term "coding block" or "video block" may be used in this application to refer to one or more blocks of samples, as well as the syntax structures used to encode (decode) the one or more blocks of samples; example types include the CTU, CU, PU, TU, and subblock in H.265/HEVC, or macroblocks and macroblock partitions in other video codec standards.
  • CTU is the abbreviation of Coding Tree Unit, which is equivalent to the macroblock in H.264/AVC.
  • A coding tree unit includes one luma coding tree block (CTB) and two co-located chroma coding tree blocks (Cr, Cb).
  • the coding unit CU (Coding Unit) is the basic unit for various types of coding operations or decoding operations in the video coding and decoding process, such as CU-based prediction, transformation, entropy coding, and other operations.
  • CU refers to a two-dimensional sampling point array, which may be a square array or a rectangular array.
  • For example, a 4x8 CU can be regarded as a rectangular array composed of 4x8 = 32 sampling points.
  • A CU may also be called an image block.
  • a CU includes: one luma coding block and two chrominance (Cr, Cb) coding blocks, and related syntax structures.
  • a prediction unit also called a prediction block, includes: a luma prediction block and two chrominance (Cr, Cb) prediction blocks.
  • The residual block refers to the residual image block formed by subtracting the prediction block from the current block to be coded, after the prediction block of the current block is generated by inter prediction and/or intra prediction; it may also be called residual data or a residual image, and includes one luma residual block and two chroma (Cr, Cb) residual blocks.
  • The coefficient block is either a transform block containing transform coefficients, obtained by transforming the residual block, or, if the residual block is not transformed, the residual block itself containing the residual data (residual signal).
  • The coefficients are either the coefficients of the transform block obtained by transforming the residual block or the coefficients of the untransformed residual block; entropy coding the coefficients means entropy coding the quantized coefficients of the transform block or, if no transform is applied to the residual data, entropy coding the quantized coefficients of the residual block.
  • The untransformed residual signal and the transformed residual signal may also be collectively referred to as coefficients. For efficient compression, the coefficients need to be quantized, and the quantized coefficients can also be called levels.
  • the transform block TU includes: one luma transform block and two chrominance (Cr, Cb) transform blocks.
  • Quantization is usually used to reduce the dynamic range of coefficients, so as to achieve the purpose of expressing video with fewer codewords.
  • the quantized value is usually called a level.
  • The operation of quantization is usually to divide the coefficient by a quantization step size, where the quantization step size is determined by a quantization factor transmitted in the code stream; inverse quantization is done by multiplying the level by the quantization step size. For a block of size N×M, the quantization of all coefficients can be done independently.
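  • As an illustrative sketch of this scalar quantization and inverse quantization (simple rounding is assumed here; actual codecs use integer arithmetic with QP-derived rounding offsets):

```python
import numpy as np

def quantize(coeffs: np.ndarray, qstep: float) -> np.ndarray:
    # Divide each coefficient by the quantization step and round:
    # the resulting integers are the "levels" carried in the code stream.
    return np.round(coeffs / qstep).astype(np.int32)

def dequantize(levels: np.ndarray, qstep: float) -> np.ndarray:
    # Inverse quantization multiplies each level by the quantization step.
    # The round trip is lossy: dequantize(quantize(c)) only approximates c.
    return levels.astype(np.float64) * qstep
```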
  • This technology is widely used in many international video compression standards, such as H.265/HEVC, H.266/VVC, etc.
  • a specific scan order can transform a two-dimensional coefficient block into a one-dimensional coefficient stream.
  • The scan order can be zigzag (Z-type), horizontal, vertical, or any other defined order.
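  • As an illustration, the following sketch flattens a two-dimensional coefficient block into a one-dimensional stream in zigzag (Z-type) order; the exact scan pattern varies between standards, so this is one common variant:

```python
def zigzag_scan(block):
    """Flatten a 2-D coefficient block (list of lists) into a 1-D list in zigzag order."""
    n, m = len(block), len(block[0])
    out = []
    for s in range(n + m - 1):
        # Collect the s-th anti-diagonal: all (x, y) with x + y == s.
        diag = [(x, s - x) for x in range(n) if 0 <= s - x < m]
        if s % 2 == 0:
            diag.reverse()  # alternate traversal direction so the path snakes
        out.extend(block[x][y] for x, y in diag)
    return out

# Example: a 3x3 block is read out as 1, 2, 4, 7, 5, 3, 6, 8, 9.
print(zigzag_scan([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```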
  • the quantization operation can use the correlation between coefficients and the characteristics of the quantized coefficients to select a better quantization method, so as to achieve the purpose of optimizing quantization.
  • the residual block is usually much simpler than the original image, so determining the residual after prediction, and then encoding can significantly improve the compression efficiency.
  • the residual block is not directly encoded, but usually transformed first.
  • the transformation is to transform the residual image from the spatial domain to the frequency domain, and remove the correlation of the residual image. After the residual image is transformed into the frequency domain, since most of the energy is concentrated in the low-frequency region, most of the transformed non-zero coefficients are concentrated in the upper left corner.
  • Quantization is next used for further compression, and because the human eye is not sensitive to high frequencies, a larger quantization step size can be used in high-frequency regions to further improve compression efficiency.
  • The commonly used transform is the discrete cosine transform (DCT, Discrete Cosine Transform) of type DCT2; the DCT8 type and DST7 type can also be used in H.266/VVC. The transform formulas are as follows:
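  • For an N-point transform with output index i and input index j, the basis functions of these transforms (as given in the H.266/VVC specification) are:

$$\text{DCT-II: } T_i(j) = \omega_i \sqrt{\frac{2}{N}} \cos\left(\frac{\pi i (2j+1)}{2N}\right), \quad \omega_i = \begin{cases} \sqrt{1/2}, & i = 0 \\ 1, & i \neq 0 \end{cases}$$

$$\text{DCT-VIII: } T_i(j) = \sqrt{\frac{4}{2N+1}} \cos\left(\frac{\pi (2i+1)(2j+1)}{4N+2}\right)$$

$$\text{DST-VII: } T_i(j) = \sqrt{\frac{4}{2N+1}} \sin\left(\frac{\pi (2i+1)(j+1)}{2N+1}\right)$$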
  • The DCT2, DCT8, and DST7 transforms used in the relevant standards are all separable: the two-dimensional transform is split into one-dimensional transforms in the horizontal direction and the vertical direction, performed in two steps. For example, the horizontal transform is performed first and then the vertical transform, or the vertical transform is performed first and then the horizontal transform.
  • The implementation of the present disclosure provides an encoding method that processes the residual image so that the texture of the processed residual image is more suitable for the subsequent transform operation, or so that the coefficient matrix obtained after the processed residual image undergoes the subsequent transform operation is easier to compress, thereby more effectively improving encoding and compression efficiency.
  • the basic transformation used in the video codec standard is a transformation that separates the horizontal direction and the vertical direction.
  • The implementation of the present disclosure proposes an encoding and decoding scheme that performs deformation processing (translation and/or exchange) on the residual block before the transform, so as to obtain horizontal or vertical textures, or textures close to horizontal or vertical; transforming the deformed residual block then yields fewer transform coefficients, improving compression efficiency.
  • At the decoding end, the inverse-transformed residual image is subjected to deformation processing (translation and/or exchange) opposite to that performed during encoding, to obtain the residual block to be decoded.
  • During encoding (of the current block), the predicted image is subtracted from the original image to obtain the residual image; the residual image undergoes residual deformation to obtain the deformed residual image, which is then transformed, quantized, entropy coded, and so on.
  • During decoding (of the current block), entropy decoding, inverse quantization, and inverse transformation yield the deformed residual image; the deformed residual image is subjected to inverse residual deformation to obtain the residual image, and the residual image and the predicted image are combined (added) to obtain the reconstructed image.
  • The deformed residual image during encoding and the one during decoding are not necessarily identical, because the quantization and inverse quantization processes are generally lossy, and the transform and inverse transform processes may also be lossy.
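  • The position of the deformation in the two pipelines can be sketched as follows; transform, quantization, and their inverses are left abstract, and all function names are illustrative rather than taken from any standard:

```python
def encode_block(orig, pred, deform, transform, quantize):
    residual = orig - pred                   # prediction residual
    deformed = deform(residual)              # residual deformation (translation)
    return quantize(transform(deformed))     # then the usual transform + quantization

def decode_block(levels, pred, inverse_deform, inverse_transform, dequantize):
    deformed = inverse_transform(dequantize(levels))  # deformed residual (lossy copy)
    residual = inverse_deform(deformed)               # undo the translation
    return pred + residual                            # reconstructed block
```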
  • An embodiment of the present disclosure provides a video decoding method, as shown in FIG. 6 , including:
  • Step 601: decoding the encoded video code stream to obtain initial residual data;
  • Step 602: determining the translation direction and translation step of the initial residual data, and translating the initial residual data according to the translation direction and translation step;
  • Step 603: obtaining a reconstructed image according to the translated residual data.
  • step 603 includes:
  • a reconstructed image is obtained according to the prediction block corresponding to the initial residual data and the translated residual data.
  • The initial residual data in step 601 is obtained according to a relevant decoding scheme, including: performing entropy decoding, inverse quantization, and inverse transformation on the encoded video code stream to obtain the initial residual data.
  • Those skilled in the art can implement the entropy decoding, inverse quantization, and inverse transformation steps according to related solutions; the specific details are not within the scope claimed or limited by the present application.
  • the initial residual data that has not been translated in step 601 is also called a deformed residual block.
  • Determining the translation direction and translation step of the initial residual data in step 602 includes: determining a translation category of the initial residual data, where the translation category indicates the translation direction and the translation step.
  • In some embodiments, determining the translation category of the initial residual data in step 602 includes: in the case that the prediction block corresponding to the initial residual data is an intra prediction block, determining the translation category according to the intra prediction mode of the prediction block; otherwise, determining the translation category according to the gradient of the prediction block.
  • In some embodiments, determining the translation category of the initial residual data in step 602 includes: determining the translation category according to the gradient of the prediction block corresponding to the initial residual data.
  • the determining the translation type according to the intra prediction mode of the prediction block includes:
  • the translation category corresponding to the intra prediction mode of the prediction block is determined according to the set correspondence between the intra prediction mode and the translation category.
  • The wide-angle prediction mode enables prediction angles wider than those of a square block: modes 2 to 66 are the angles corresponding to the prediction modes of square blocks, while modes -1 to -14 and 67 to 80 represent the extended angles of the wide-angle prediction mode.
  • The correspondence between the preset intra prediction modes and the translation categories is shown in the following mapping table (Table 1):

    Intra prediction mode (wide-angle mode)        | Translation category
    0, 1, -14 to -12, 15 to 21, 47 to 53, 79, 80   | 0
    -11 to -8, 54 to 58                            | 1
    -7 to -4, 59 to 62                             | 2
    -3, -2, 63, 64                                 | 3
    -1, 2, 3, 65 to 67                             | 4
    4, 5, 68, 69                                   | 5
    6 to 9, 70 to 73                               | 6
    10 to 14, 74 to 78                             | 7
    42 to 46                                       | 8
    38 to 41                                       | 9
    36, 37                                         | 10
    33 to 35                                       | 11
    31, 32                                         | 12
    27 to 30                                       | 13
    23 to 26                                       | 14
  • Each translation category indicates the direction and step size to translate the initial residual data (deformed residual block).
  • The translation categories are defined by the following table:
  • The translation direction indicated by the same translation category is opposite at the decoding end and the encoding end. For example, if the decoding end shifts a row n pixels to the left, then for the same translation category the encoding end shifts it n pixels to the right, and pixels that move beyond the range of the current residual block wrap around to the other end of the row (for instance, encoding [a, b, c, d] with a right shift of 1 gives [d, a, b, c], and the decoder's left shift of 1 restores [a, b, c, d]). Likewise, if the decoding end moves a column up by n pixels, the encoding end moves it down by n pixels, and pixels that move beyond the range of the current residual block wrap around to the other end of the column.
  • the gradient of the predicted block includes: a gradient in a horizontal direction and a gradient in a vertical direction.
  • Determining the translation category according to the gradient of the prediction block includes: determining the translation category corresponding to the gradient parameter of the prediction block according to a set correspondence between gradient parameters and translation categories.
  • In some embodiments, the gradient parameter of the prediction block is determined in the following manner: determine the horizontal gradients of the pixels in the prediction block other than the outermost ring of pixels; determine the vertical gradients of those same pixels; and determine the gradient parameter according to a set gradient function P(Gh, Gv), where Gh is the horizontal gradient of the prediction block and Gv is the vertical gradient of the prediction block.
  • In some embodiments, the gradient parameter of the prediction block is determined in the following manner: determine the horizontal gradients of the pixels in the prediction block other than the outermost ring of pixels, and use the sum of these horizontal gradients as the horizontal gradient Gh of the prediction block; determine the vertical gradients of those same pixels, and use the sum of these vertical gradients as the vertical gradient Gv of the prediction block.
  • How the residual pixels are translated can be determined according to the magnitude and sign of the horizontal gradient and the vertical gradient.
  • For example, when the horizontal gradient and the vertical gradient are the same or approximately the same, it can be estimated that the texture in the prediction block tends toward 45°, and on this basis it can be determined how the residual pixels are shifted at the encoding end and the decoding end.
  • The horizontal gradient of each pixel in the prediction block, excluding the outermost ring of pixels, is determined as follows:
  • the difference between the two pixels horizontally adjacent to the pixel is divided by 2.
  • The vertical gradient of each pixel in the prediction block, excluding the outermost ring of pixels, is determined as follows:
  • the difference between the two pixels vertically adjacent to the pixel is divided by 2.
  • Here, the difference between two pixels is the difference between the pixel values of the two pixels.
  • For example, the pixel value I(x, y) is the luma component Y at pixel (x, y).
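  • A sketch of the gradient computation described above follows; since the disclosure leaves the gradient function P(Gh, Gv) unspecified, the angle-based gradient_parameter below is purely a hypothetical example:

```python
import math
import numpy as np

def block_gradients(pred: np.ndarray):
    """Sum central-difference gradients over all pixels except the outermost ring."""
    p = pred.astype(np.int64)  # avoid unsigned wrap-around when subtracting
    gh = gv = 0.0
    h, w = p.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gh += (p[y, x + 1] - p[y, x - 1]) / 2  # horizontal gradient at (x, y)
            gv += (p[y + 1, x] - p[y - 1, x]) / 2  # vertical gradient at (x, y)
    return gh, gv

def gradient_parameter(gh: float, gv: float) -> float:
    # Hypothetical gradient function P(Gh, Gv): an estimated orientation in
    # degrees, which a mapping table could then bucket into translation categories.
    return math.degrees(math.atan2(gv, gh))
```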
  • those skilled in the art may also use other methods to determine the horizontal or vertical gradient of each pixel point in the prediction block except the outermost circle of pixel points, which is not limited to the aspects exemplified in the disclosed embodiments.
  • those skilled in the art may also use other gradient functions P(Gh, Gv) to calculate the gradient parameters, and set the gradient parameters and translation category mapping table correspondingly, not limited to the aspects exemplified in the disclosed embodiments.
  • In the above embodiments, the translation category may be determined according to related attributes of the prediction block generated during the decoding process. That is, the decoding end performs inverse deformation on the deformed residual block according to the relevant attributes of the prediction block, translating the residual block in the direction opposite to the deformation at the encoding end to restore it, so that the subsequent decoding steps can be performed and decoding completed to obtain the reconstructed image.
  • step 602 includes: analyzing the coded video code stream to obtain the translation category of the initial residual data.
  • Parsing the encoded video code stream to obtain the translation category of the initial residual data includes parsing the translation category from one of the following: sequence-level syntax elements, frame-level syntax elements, slice-level syntax elements, coding tree unit (CTU) level syntax elements, or coding unit (CU) level syntax elements.
  • the translation category may be obtained by parsing the encoded video code stream. That is, the decoding end performs inverse deformation on the deformed residual block according to the instruction, and translates and restores the residual block in the opposite direction of the deformation at the encoding end, so as to perform subsequent decoding steps, and complete decoding to obtain a reconstructed image.
  • determining the translation category of the initial residual data in step 602 includes:
  • the translation category is determined according to the texture feature of the prediction block corresponding to the initial residual data.
  • In step 602, translating the initial residual data according to the translation direction and the translation step includes:
  • the pixels in the current row are translated according to the translation step in the current row indicated by the translation category.
  • The translation step indicated by the translation category is determined according to a translation step function f(n), where n represents the row number in the initial residual data and f(n) represents the translation step of the nth row.
  • the translation direction includes one of the following:
  • the translation step size function f(n) includes one of the following:
  • the translation steps corresponding to each row indicated by the set translation category may be equal or unequal.
  • other translation directions and corresponding other translation step functions may also be selected, and are not limited to the above-mentioned aspects exemplified in the implementation of the present disclosure.
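  • The row-wise translation with wrap-around described above can be sketched as follows; the concrete step functions shown are illustrative examples only:

```python
def translate_rows(block, f, direction):
    """Cyclically shift row n by f(n) pixels.

    direction: +1 shifts right, -1 shifts left; pixels pushed past the block
    edge wrap around to the opposite end of the same row.
    """
    out = []
    for n, row in enumerate(block):
        k = (direction * f(n)) % len(row)
        out.append(row[-k:] + row[:-k] if k else row[:])
    return out

# Example step functions (illustrative): equal steps for every row, or
# row-dependent steps.
f_constant = lambda n: 1
f_linear = lambda n: n
```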
  • An embodiment of the present disclosure also provides an encoding method, as shown in FIG. 8 , including:
  • Step 801: encoding the video to be encoded to obtain residual data;
  • Step 802: determining the translation direction and translation step of the residual data, and translating the residual data according to the translation direction and translation step;
  • Step 803: obtaining the encoded code stream according to the translated residual data.
  • The residual data in step 801 is obtained according to a relevant coding scheme, including: subtracting the predicted image (prediction block) from the original image of the current image to be encoded to obtain the residual data, also called a residual block.
  • the residual data after translation processing in step 802 is also called a deformed residual block.
  • In step 803, subsequent processing such as transform, quantization, and entropy coding is performed on the translated residual data to finally complete encoding and obtain the encoded code stream; those skilled in the art can implement the transform, quantization, and entropy coding steps according to relevant schemes, and the specific details are not within the scope claimed or limited by the present application.
  • Determining the translation direction and translation step of the residual data in step 802 includes: determining a translation category of the residual data, where the translation category indicates the translation direction and the translation step.
  • Determining the translation category of the residual data includes: in the case that the prediction block corresponding to the residual data is an intra prediction block, determining the translation category according to the intra prediction mode of the prediction block;
  • otherwise, determining the translation category according to the gradient of the prediction block.
  • the determining the translation category of the residual data includes: determining the translation category according to a gradient of a prediction block corresponding to the residual data.
  • the determining the translation type according to the intra prediction mode of the prediction block includes:
  • the translation category corresponding to the intra prediction mode of the prediction block is determined according to the set correspondence between the intra prediction mode and the translation category.
  • the intra prediction includes multiple modes.
  • The intra angle prediction modes can be understood as determining reference pixels according to a predetermined angle, and then calculating the pixel at the position to be predicted from the reference pixels. Because the prediction-mode angles in H.266/VVC are divided very finely, the reference pixel position corresponding to an angle to be predicted may be a sub-pixel position, and the corresponding reference pixel can be obtained by reference pixel interpolation or similar methods.
  • The texture direction of the residual has a certain correlation with the texture direction of the block itself (the original image block) and with the direction of the angular prediction. That is, for blocks using angle prediction, the residual texture has a certain correlation with the angle prediction mode, and for a certain proportion of blocks, the texture direction of the residual is the same as or close to the prediction angle.
  • the intra angle prediction mode can thus be used to determine how the residual pixels are translated.
  • A translation manner may be determined for each intra angle prediction mode, or in other words for each actual intra prediction angle. The translation can be by whole pixels, or by sub-pixels realized through interpolation filtering. Considering that the current intra angle prediction modes divide angles very finely, and that such fine granularity is not necessarily required for residual deformation, especially for relatively small blocks, a clustering approach can be used in some circumstances; that is, for certain block sizes, several intra angle prediction modes or intra prediction angles correspond to the same translation manner.
  • Table 1 shows the correspondence between preset intra prediction modes and translation categories.
  • each translation category indicates a direction and a step size for translation of the residual data.
  • The translation categories are defined by the following table:
  • Table 4 is the translation category table for the encoding end corresponding to Table 2; the translation direction indicated by the same translation category is opposite at the decoding end and the encoding end. Pixels that move beyond the range of the current residual block during translation wrap around to the other end.
  • the angle prediction mode 2 of a square block is taken as an example.
  • The prediction direction corresponding to mode 2 is 45°.
  • A block using this prediction mode is likely to have an obvious 45° texture, and its prediction residual is also likely to have a 45° or close-to-45° texture.
  • According to Table 1, the translation category corresponding to mode 2 is 4; according to Table 4, translation category 4 indicates that the nth row is moved horizontally to the right by n pixels.
  • In this way, the 45° texture is turned into a vertical texture. Taking the coordinates of the upper-left corner of a square block as (0, 0): the pixels in row 0 do not move, the pixels in row 1 move right by 1 pixel, the pixels in row 2 move right by 2 pixels, and the pixels in row n move right by n pixels; pixels that move beyond the current block range wrap around to the left end of the row in order.
  • At the decoding end, the operation opposite to that at the encoding end is performed: the pixels in the nth row are moved left by n pixels, and pixels that move beyond the range of the current block wrap around to the right end of the row in order.
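  • Using the translate_rows sketch above, the mode-2 example can be checked numerically (assuming the 45° texture of mode 2 runs from bottom-left to top-right):

```python
block = [
    [0, 0, 0, 9],
    [0, 0, 9, 0],  # a 45-degree diagonal texture
    [0, 9, 0, 0],
    [9, 0, 0, 0],
]
# Encoding end, translation category 4: row n moves right by n pixels.
encoded = translate_rows(block, f=lambda n: n, direction=+1)
# Every row of `encoded` is now [0, 0, 0, 9]: a purely vertical texture that
# the separable horizontal/vertical transform compacts into fewer coefficients.
# Decoding end: the opposite shift (row n moves left by n) restores the residual.
decoded = translate_rows(encoded, f=lambda n: n, direction=-1)
assert decoded == block
```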
  • the gradient of the prediction block includes: a gradient in a horizontal direction and a gradient in a vertical direction;
  • Determining the translation category according to the gradient of the prediction block includes: determining the translation category corresponding to the gradient parameter of the prediction block according to a set correspondence between gradient parameters and translation categories.
  • In some embodiments, the gradient parameter of the prediction block is determined in the following manner: determine the horizontal gradients of the pixels in the prediction block other than the outermost ring of pixels; determine the vertical gradients of those same pixels; and determine the gradient parameter according to a set gradient function P(Gh, Gv), where Gh is the horizontal gradient of the prediction block and Gv is the vertical gradient of the prediction block.
  • In some embodiments, the gradient parameter of the prediction block is determined in the following manner: determine the horizontal gradients of the pixels in the prediction block other than the outermost ring of pixels, and use the sum of these horizontal gradients as the horizontal gradient Gh of the prediction block; determine the vertical gradients of those same pixels, and use the sum of these vertical gradients as the vertical gradient Gv of the prediction block.
  • How the residual pixels are translated can be determined according to the magnitude and sign of the horizontal gradient and the vertical gradient.
  • For example, when the horizontal gradient and the vertical gradient are the same or approximately the same, it can be estimated that the texture in the prediction block tends toward 45°, and on this basis it can be determined how the residual pixels are shifted at the encoding end and the decoding end.
  • The correspondence between the gradient parameters and the translation categories set at the encoding end is likewise shown in Table 3.
  • The horizontal gradient of each pixel in the prediction block, excluding the outermost ring of pixels, is determined as follows:
  • the difference between the two pixels horizontally adjacent to the pixel is divided by 2.
  • The vertical gradient of each pixel in the prediction block, excluding the outermost ring of pixels, is determined as follows:
  • the difference between the two pixels vertically adjacent to the pixel is divided by 2.
  • Here, the difference between two pixels is the difference between the pixel values of the two pixels.
  • For example, the pixel value I(x, y) is the luma component Y at pixel (x, y).
  • those skilled in the art may also use other methods to determine the horizontal or vertical gradient of each pixel point in the prediction block except the outermost circle of pixel points, which is not limited to the aspects exemplified in the disclosed embodiments.
  • those skilled in the art may also use other gradient functions P(Gh, Gv) to calculate the gradient parameters, and set the gradient parameters and translation category mapping table correspondingly, not limited to the aspects exemplified in the disclosed embodiments.
  • the translation category may be determined according to related attributes of the prediction block generated during the encoding process. That is, the encoding end deforms the residual block according to the relevant attributes of the prediction block, continues to perform subsequent encoding steps according to the deformed residual block, and finally completes encoding to obtain an encoded code stream.
  • the determining the translation category of the residual data includes: determining the translation category according to the texture feature of the residual data; or, according to the texture of the prediction block corresponding to the residual data The feature determines the translation category.
  • the encoding method further includes:
  • Step 804 write the translation category into the encoded code stream.
  • Step 804 includes writing the translation category into one of the following: sequence-level syntax elements, frame-level syntax elements, slice-level syntax elements, coding tree unit (CTU) level syntax elements, or coding unit (CU) level syntax elements.
  • the encoding method provided by the embodiment of the present disclosure may also write the translation type into the encoded code stream by extending the code stream.
  • Correspondingly, the decoding end may perform inverse deformation on the deformed residual block obtained through decoding, according to the translation category obtained through parsing.
  • In step 802, translating the residual data according to the translation direction and the translation step includes:
  • the pixels in the current row are translated according to the translation step in the current row indicated by the translation category.
  • The translation step indicated by the translation category is determined according to the translation step function f(n), where n represents the row number in the residual data and f(n) represents the translation step of the nth row.
  • the translation direction includes one of the following:
  • the translation step size function f(n) includes one of the following:
  • the translation steps corresponding to each row indicated by the set translation category may be equal or unequal.
  • the translation category 1 indicates that the translation direction is horizontal to the left, and the translation step size function f(n) is
  • encoding and decoding are corresponding opposite processes.
  • Those skilled in the art can determine the corresponding aspects of the decoding scheme from what is recorded in the encoding scheme, or determine the corresponding aspects of the encoding scheme from what is recorded in the decoding scheme.
  • An embodiment of the present disclosure also provides a video encoding device, as shown in FIG. 11, including a processor and a memory storing a computer program runnable on the processor, where the processor, when executing the computer program, implements the video encoding method described in any embodiment of the present disclosure.
  • An embodiment of the present disclosure also provides a video decoding device, as shown in FIG. 11, including a processor and a memory storing a computer program runnable on the processor, where the processor, when executing the computer program, implements the video decoding method described in any embodiment of the present disclosure.
  • An embodiment of the present disclosure further provides a video encoding and decoding system, including the video encoding device described in any embodiment of the present disclosure and/or the video decoding device described in any embodiment of the present disclosure.
  • An embodiment of the present disclosure also provides a non-transitory computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the video decoding method or video encoding method described in any embodiment of the present disclosure.
  • An embodiment of the present disclosure further provides a code stream, wherein the code stream is generated according to the video encoding method described in any embodiment of the present disclosure.
  • with the residual deformation, the coefficient matrix obtained by transforming the deformed residual image is easier to compress, which can further improve compression efficiency.
  • the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit.
  • Computer-readable media may include computer-readable storage media that correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, a computer-readable medium may generally correspond to a non-transitory tangible computer-readable storage medium or a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure.
  • a computer program product may comprise a computer readable medium.
  • by way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • moreover, any connection may properly be termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or those wireless technologies are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media.
  • as used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • the instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein.
  • the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec.
  • the techniques may be fully implemented in one or more circuits or logic elements.
  • the technical solutions of the embodiments of the present disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
  • Various components, modules, or units are described in the disclosed embodiments to emphasize functional aspects of devices configured to perform the described techniques, but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (comprising one or more processors as described above) in combination with suitable software and/or firmware.
  • those of ordinary skill in the art can understand that all or some of the steps in the methods disclosed above, and the functional modules/units in the systems and devices, can be implemented as software, firmware, hardware, or appropriate combinations thereof.
  • in a hardware implementation, the division between the functional modules/units mentioned above does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation.
  • some or all of the components may be implemented as software executed by a processor such as a digital signal processor or microprocessor, as hardware, or as an integrated circuit such as an application-specific integrated circuit.
  • such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data.
  • computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
  • communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure provides a video decoding method, including: decoding an encoded video bitstream to obtain initial residual data; determining a translation direction and a translation step size of the initial residual data, and translating the initial residual data according to the translation direction and the translation step size; and obtaining a reconstructed image from the translated residual data. The present disclosure also provides a video encoding method, including: encoding a video to be encoded to obtain residual data; determining a translation direction and a translation step size of the residual data, and translating the residual data according to the translation direction and the translation step size; and obtaining an encoded bitstream from the translated residual data. The present disclosure further provides a device, a system, and a storage medium using the above encoding/decoding methods, and a bitstream generated according to the above video encoding method.

Description

Video decoding and encoding method and device, and storage medium

Technical Field
Embodiments of the present disclosure relate to, but are not limited to, the technical field of video data processing, and in particular to a video decoding method, an encoding method, a device, and a storage medium.
Background
Digital video compression technology mainly compresses huge amounts of digital image and video data to facilitate transmission and storage. With the surge of Internet video and ever-higher requirements on video definition, although existing digital video compression standards already save considerable video data, better digital video compression technology is still needed to reduce the bandwidth and traffic pressure of digital video transmission and to achieve more efficient video coding, decoding, transmission, and storage.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
An embodiment of the present disclosure provides a video decoding method, including:
decoding an encoded video bitstream to obtain initial residual data;
determining a translation direction and a translation step size of the initial residual data, and translating the initial residual data according to the translation direction and the translation step size;
obtaining a reconstructed image from the translated residual data.
An embodiment of the present disclosure further provides a video encoding method, including:
encoding a video to be encoded to obtain residual data;
determining a translation direction and a translation step size of the residual data, and translating the residual data according to the translation direction and the translation step size;
obtaining an encoded bitstream from the translated residual data.
An embodiment of the present disclosure further provides a video decoding device, including a processor and a memory storing a computer program runnable on the processor, where the processor, when executing the computer program, implements the video decoding method described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a video encoding device, including a processor and a memory storing a computer program runnable on the processor, where the processor, when executing the computer program, implements the video encoding method described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a video encoding and decoding system, including the video decoding device described in any embodiment of the present disclosure and/or the video encoding device described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the video decoding method or video encoding method described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a bitstream, where the bitstream is generated according to the video encoding method described in any embodiment of the present disclosure.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief Description of the Drawings
The drawings are provided for an understanding of the embodiments of the present disclosure, constitute a part of the specification, and together with the embodiments serve to explain the technical solutions of the present disclosure without limiting them.
FIG. 1 is a structural block diagram of a video encoding and decoding system usable in embodiments of the present disclosure;
FIG. 2 is a structural block diagram of a video encoder usable in embodiments of the present disclosure;
FIG. 3 is a structural block diagram of a video decoder usable in embodiments of the present disclosure;
FIG. 4 is a schematic diagram of the effect of a DCT transform usable in embodiments of the present disclosure;
FIG. 5 is a schematic diagram of an encoding/decoding flow usable in embodiments of the present disclosure;
FIG. 6 is a flowchart of a video decoding method usable in embodiments of the present disclosure;
FIG. 7 is a schematic diagram of the VVC intra prediction modes usable in embodiments of the present disclosure;
FIG. 8 is a flowchart of a video encoding method usable in embodiments of the present disclosure;
FIG. 9 is a schematic diagram of a residual data translation usable in embodiments of the present disclosure;
FIG. 10 is a schematic diagram of another residual data translation usable in embodiments of the present disclosure;
FIG. 11 is a structural diagram of a video encoding/decoding device usable in embodiments of the present disclosure.
Detailed Description
The present disclosure describes a number of embodiments, but the description is exemplary rather than limiting, and it will be apparent to those of ordinary skill in the art that there can be more embodiments and implementations within the scope of the embodiments described herein.
In this disclosure, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment described as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments.
In describing representative exemplary embodiments, the specification may have presented a method and/or process as a particular sequence of steps. However, to the extent that the method or process does not depend on the particular order of the steps set forth herein, it should not be limited to that particular sequence. As one of ordinary skill in the art will appreciate, other orders of steps are possible; therefore, the particular order of the steps set forth in the specification should not be construed as a limitation on the claims. In addition, claims directed to the method and/or process should not be limited to performing their steps in the order written; one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present disclosure.
Current mainstream video coding standards all adopt a block-based hybrid coding framework. Each frame of a video is partitioned into square largest coding units (LCU, largest coding unit) or coding tree units (CTU, Coding Tree Unit) of equal size (e.g., 128x128 or 64x64). Each LCU or CTU may be divided into rectangular coding units (CU, coding unit) according to certain rules, and a CU may be further divided into prediction units (PU, prediction unit), transform units (TU, transform unit), and so on. The hybrid coding framework includes modules such as prediction, transform, quantization, entropy coding, and in-loop filtering. The prediction module includes intra prediction and inter prediction, and inter prediction includes motion estimation and motion compensation. Since strong correlation exists between adjacent pixels within one frame of a video, intra prediction is used in video coding to remove spatial redundancy between adjacent pixels; since strong similarity exists between adjacent frames, inter prediction is used to remove temporal redundancy between adjacent frames, thereby improving coding efficiency.
Internationally, mainstream video coding standards include H.264/Advanced Video Coding (AVC), H.265/High Efficiency Video Coding (HEVC), H.266/Versatile Video Coding (VVC), MPEG (Moving Picture Experts Group), AOM (Alliance for Open Media), AVS (Audio Video coding Standard), extensions of these standards, and any other custom standards. These standards use video compression techniques to reduce the amount of data transmitted and stored, achieving more efficient video coding/decoding, transmission, and storage.
In H.264/AVC, an input image is divided into fixed-size blocks that serve as the basic unit of coding, called macroblocks (MB, Macro Block), each including one luma block and two chroma blocks; the luma block size is 16x16, and with 4:2:0 sampling the chroma block size is half the luma block size. In the prediction stage, a macroblock is further divided into smaller blocks for prediction depending on the prediction mode; in intra prediction, a macroblock can be divided into 16x16, 8x8, or 4x4 blocks, each intra-predicted separately. In the transform and quantization stages, the macroblock is divided into 4x4 or 8x8 blocks, and the prediction residual in each block is transformed and quantized to obtain quantized coefficients.
Compared with H.264/AVC, H.265/HEVC introduces improvements in multiple coding stages. In H.265/HEVC, an image is partitioned into coding tree units (CTU, Coding Tree Unit); the CTU is the basic unit of coding (corresponding to the macroblock in H.264/AVC). A CTU contains one luma coding tree block (CTB, Coding Tree Block) and two chroma coding tree blocks; the maximum CU size in H.265/HEVC is generally 64x64. To adapt to diverse video content and characteristics, a CTU is iteratively partitioned into a series of coding units (CU, Coding Unit) in a quadtree (QT, Quadro Tree) manner; the CU is the basic unit of intra/inter coding. A CU contains one luma coding block (CB, Coding Block), two chroma coding blocks, and associated syntax structures; the maximum CU size equals the CTU size, and the minimum CU size is 8x8. Depending on the prediction method, leaf CUs obtained from coding-tree partitioning fall into three types: intra-predicted intra CUs, inter-predicted inter CUs, and skipped CUs; a skipped CU can be seen as a special case of an inter CU that carries no motion information or prediction residual information. A leaf CU contains one or more prediction units (PU, Prediction Unit); H.265/HEVC supports PU sizes from 4x4 to 64x64, with eight partition modes in total. For intra coding there are two possible partition modes: Part_2Nx2N and Part_NxN. For the prediction residual signal, a CU is partitioned into transform units (TU, Transform Unit) using a residual quadtree; a TU contains one luma transform block (TB, Transform Block) and two chroma transform blocks. Only square partitioning is allowed, dividing a CB into 1 or 4 PBs. One TU shares the same transform and quantization process, with supported sizes from 4x4 to 32x32. Unlike previous coding standards, in inter prediction a TB may cross PB boundaries to further maximize inter coding efficiency.
In H.266/VVC, a video picture is first partitioned into coding tree units similar to those of H.265/HEVC, but the maximum size is increased from 64x64 to 128x128. H.266/VVC introduces quadtree plus nested multi-type tree (MTT, Multi-Type Tree) partitioning, where the MTT includes binary trees (BT, Binary Tree) and ternary trees (TT, Ternary Tree); it unifies the CU, PU, and TU concepts of H.265/HEVC and supports more flexible CU partition shapes. The CTU is partitioned according to the quadtree structure, and the leaf nodes are further partitioned by the MTT. Multi-type-tree leaf nodes become coding units (CU); when a CU is not larger than the maximum transform unit (64x64), subsequent prediction and transform do not partition it further, so in most cases CU, PU, and TU have the same size. Considering the different characteristics of luma and chroma and the parallelism of specific implementations, in H.266/VVC chroma may use a separate partitioning tree rather than following the luma tree: chroma partitioning of I frames uses a separate chroma tree, while chroma partitioning of P and B frames follows the luma partitioning.
FIG. 1 is a block diagram of a video encoding and decoding system usable in embodiments of the present disclosure. As shown in FIG. 1, the system is divided into an encoding-side device 1 and a decoding-side device 2; the encoding-side device 1 encodes video images to produce a bitstream, and the decoding-side device 2 can decode the bitstream to obtain reconstructed video images. The encoding-side device 1 and the decoding-side device 2 may each include one or more processors and memory coupled to the one or more processors, such as random access memory, electrically erasable programmable read-only memory, flash memory, or other media. The encoding-side device 1 and the decoding-side device 2 may be implemented with various apparatuses, such as desktop computers, mobile computing devices, notebook computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, vehicle-mounted computers, or other similar apparatuses.
The decoding-side device 2 may receive the bitstream from the encoding-side device 1 via a link 3. The link 3 includes one or more media or devices capable of moving the bitstream from the encoding-side device 1 to the decoding-side device 2. In one example, the link 3 includes one or more communication media enabling the encoding-side device 1 to send the bitstream directly to the decoding-side device 2; the encoding-side device 1 may modulate the bitstream according to a communication standard (e.g., a wireless communication protocol) and send the modulated bitstream to the decoding-side device 2. The one or more communication media may include wireless and/or wired communication media, such as the radio frequency (RF) spectrum or one or more physical transmission lines, and may form part of a packet-based network such as a local area network, a wide area network, or a global network (e.g., the Internet). The one or more communication media may include routers, switches, base stations, or other devices facilitating communication from the encoding-side device 1 to the decoding-side device 2. In another example, the bitstream may also be output from an output interface 15 to a storage device, from which the decoding-side device 2 can read the stored data via streaming or download. The storage device may include any of a variety of distributed or locally accessed data storage media, such as hard disk drives, Blu-ray discs, digital versatile discs, compact discs, flash memory, volatile or non-volatile memory, file servers, and so on.
In the example shown in FIG. 1, the encoding-side device 1 includes a data source 11, an encoder 13, and an output interface 15. In some examples, the data source 11 may include a video capture device (e.g., a camera), an archive containing previously captured data, a feed interface for receiving data from a content provider, a computer graphics system for generating data, or a combination of these sources. The encoder 13 may encode the data from the data source 11 and output it to the output interface 15; the output interface 15 may include at least one of a regulator, a modem, and a transmitter.
In the example shown in FIG. 1, the decoding-side device 2 includes an input interface 21, a decoder 23, and a display device 25. In some examples, the input interface 21 includes at least one of a receiver and a modem. The input interface 21 may receive the bitstream via the link 3 or from a storage device. The decoder 23 decodes the received bitstream. The display device 25 displays the decoded data; it may be integrated with the other devices of the decoding-side device 2 or provided separately, and may be, for example, a liquid crystal display, a plasma display, an organic light-emitting diode display, or another type of display device. In other examples, the decoding-side device 2 may not include the display device 25, or may include other devices or apparatuses that apply the decoded data.
The encoder 13 and decoder 23 of FIG. 1 may each be implemented using any one or any combination of the following circuits: one or more microprocessors, digital signal processors, application-specific integrated circuits, field-programmable gate arrays, discrete logic, hardware. If the present disclosure is implemented partly in software, the instructions for the software may be stored in a suitable non-volatile computer-readable storage medium, and one or more processors may be used to execute the instructions in hardware to carry out the methods of the present disclosure.
FIG. 2 is a structural block diagram of an exemplary video encoder. The example is described mainly in the terminology and block partitioning of the H.265/HEVC standard, but the structure of this video encoder can also be used for video encoding under H.264/AVC, H.266/VVC, and other similar standards.
As shown, the video encoder 20 encodes video data to generate a bitstream. The video encoder 20 includes a prediction processing unit 100, a partitioning unit 101, a prediction residual generation unit 102, a transform processing unit 104, a quantization unit 106, an inverse quantization unit 108, an inverse transform processing unit 110, a reconstruction unit 112, a filter unit 113, a decoded picture buffer 114, and an entropy encoding unit 116. The prediction processing unit 100 includes an inter prediction processing unit 121 and an intra prediction processing unit 126. In other embodiments, the video encoder 20 may include more, fewer, or different functional components than in this example. The prediction residual generation unit 102 and the reconstruction unit 112 are both drawn in the figure as circles with a plus sign.
The partitioning unit 101 cooperates with the prediction processing unit 100 to partition the received video data into slices, CTUs, or other larger units. The video data received by the partitioning unit 101 may be a video sequence including video frames such as I frames, P frames, or B frames.
The prediction processing unit 100 may partition a CTU into CUs and perform intra prediction coding or inter prediction coding on the CUs. For intra coding of a CU, a 2Nx2N CU may be partitioned into 2Nx2N or NxN prediction units (PU, prediction unit) for intra prediction. For inter prediction of a CU, a 2Nx2N CU may be partitioned into PUs of size 2Nx2N, 2NxN, Nx2N, NxN, or other sizes; asymmetric PU partitioning may also be supported.
The inter prediction processing unit 121 may perform inter prediction on a PU to produce prediction data of the PU, including a prediction block of the PU, motion information of the PU, and various syntax elements.
The intra prediction processing unit 126 may perform intra prediction on a PU to produce prediction data of the PU, which may include a prediction block of the PU and various syntax elements. The intra prediction processing unit 126 may try multiple selectable intra prediction modes and choose the one with the lowest cost to perform intra prediction on the PU.
The prediction residual generation unit 102 may generate a prediction residual block of a CU based on the original block of the CU and the prediction blocks of the PUs into which the CU is partitioned.
The transform processing unit 104 may partition a CU into one or more transform units (TU, Transform Unit); the prediction residual block associated with a TU is a sub-block of the CU's prediction residual block. A TU-associated coefficient block is produced by applying one or more transforms to the TU-associated prediction residual block. For example, the transform processing unit 104 may apply a discrete cosine transform (DCT, Discrete Cosine Transform), a directional transform, or other transforms to the TU-associated prediction residual block, converting it from the pixel domain to the frequency domain.
The quantization unit 106 may quantize the coefficients in a coefficient block based on a selected quantization parameter (QP); quantization may introduce quantization losses, and the degree of quantization of the coefficient block can be adjusted by adjusting the QP value.
The inverse quantization unit 108 and the inverse transform unit 110 may apply inverse quantization and inverse transform to the coefficient block, respectively, to obtain a TU-associated reconstructed prediction residual block.
The reconstruction unit 112 may generate a reconstructed block of the CU based on the reconstructed prediction residual block and the prediction block produced by the prediction processing unit 100.
The filter unit 113 performs in-loop filtering on the reconstructed block and stores it in the decoded picture buffer 114. The intra prediction processing unit 126 may extract reconstructed reference information neighbouring a PU from the reconstructed blocks buffered in the decoded picture buffer 114 to perform intra prediction on the PU. The inter prediction processing unit 121 may use reference pictures containing reconstructed blocks buffered in the decoded picture buffer 114 to perform inter prediction on PUs of other pictures.
The entropy encoding unit 116 may perform entropy encoding operations on received data (such as syntax elements, quantized coefficient blocks, motion information, etc.), such as context-adaptive variable length coding (CAVLC, Context Adaptive Variable Length Coding) or context-based adaptive binary arithmetic coding (CABAC, Context-based Adaptive Binary Arithmetic Coding), and output a bitstream (i.e., an encoded video bitstream).
FIG. 3 is a structural block diagram of an exemplary video decoder. The example is described mainly in the terminology and block partitioning of the H.265/HEVC standard, but the structure of this video decoder can also be used for video decoding under H.264/AVC, H.266/VVC, and other similar standards.
The video decoder 30 may decode a received bitstream and output decoded video data. As shown, the video decoder 30 includes an entropy decoding unit 150, a prediction processing unit 152, an inverse quantization unit 154, an inverse transform processing unit 156, a reconstruction unit 158 (drawn in the figure as a circle with a plus sign), a filter unit 159, and a picture buffer 160. In other embodiments, the video decoder 30 may include more, fewer, or different functional components.
The entropy decoding unit 150 may entropy-decode the received bitstream and extract information such as syntax elements, quantized coefficient blocks, and PU motion information. The prediction processing unit 152, the inverse quantization unit 154, the inverse transform processing unit 156, the reconstruction unit 158, and the filter unit 159 may all perform corresponding operations based on the syntax elements extracted from the bitstream.
As a functional component performing the reconstruction operation, the inverse quantization unit 154 may inverse-quantize the quantized TU-associated coefficient block. The inverse transform processing unit 156 may apply one or more inverse transforms to the inverse-quantized coefficient block to produce a reconstructed prediction residual block of the TU.
The prediction processing unit 152 includes an inter prediction processing unit 162 and an intra prediction processing unit 164. If a PU is intra-coded, the intra prediction processing unit 164 may determine the intra prediction mode of the PU based on syntax elements parsed from the bitstream, perform intra prediction according to the determined intra prediction mode and the reconstructed reference information neighbouring the PU obtained from the picture buffer 160, and produce the prediction block of the PU. If a PU is inter-coded, the inter prediction processing unit 162 may determine one or more reference blocks of the PU based on the PU's motion information and the corresponding syntax elements, and produce the prediction block of the PU from the reference blocks.
The reconstruction unit 158 may obtain a reconstructed block of the CU based on the TU-associated reconstructed prediction residual block and the PU prediction block (i.e., intra prediction data or inter prediction data) produced by the prediction processing unit 152.
The filter unit 159 may perform in-loop filtering on the reconstructed block of the CU to obtain a reconstructed picture, which is stored in the picture buffer 160. The picture buffer 160 may provide reference pictures for subsequent motion compensation, intra prediction, inter prediction, and so on, and may also output the reconstructed video data as decoded video data for presentation on a display device.
Because video coding comprises both encoding and decoding, for convenience of later description, encoding at the encoder and decoding at the decoder may also be collectively referred to as coding. From the context of the relevant steps, those skilled in the art can tell whether a later mention of coding refers to encoding at the encoder or decoding at the decoder. In this application, the terms "coding block" or "video block" may be used to refer to one or more blocks of samples and the syntax structures used to code the one or more blocks of samples; example types of coding blocks or video blocks include the CTU, CU, PU, TU, and subblock of H.265/HEVC, or the macroblocks, macroblock partitions, etc. of other video coding standards.
Some concepts involved in the embodiments of the present disclosure are introduced first. The embodiments use terminology from H.265/HEVC or H.266/VVC for ease of explanation, but the solutions provided by the embodiments are not limited to H.265/HEVC or H.266/VVC; in fact, they can also be implemented in H.264/AVC, MPEG, AOM, AVS, etc., and in successors and extensions of these standards.
CTU is the abbreviation of Coding Tree Unit, corresponding to the macroblock in H.264/AVC. According to the YUV sampling format, one coding tree unit (CTU) contains one luma coding tree block (CTB) and two chroma coding tree blocks (CTB) (Cr, Cb) at the same position.
The coding unit CU (Coding Unit) is the basic unit for the various types of encoding or decoding operations in video coding, such as CU-based prediction, transform, entropy coding, and so on. A CU is a two-dimensional array of sample points, which may be square or rectangular; for example, a 4x8 CU can be regarded as an array of 4x8 = 32 sample points. A CU may also be called an image block. A CU includes one luma coding block, two chroma (Cr, Cb) coding blocks, and associated syntax structures.
A prediction unit (Prediction Unit), also called a prediction block, includes one luma prediction block and two chroma (Cr, Cb) prediction blocks.
A residual block is the residual image block formed by subtracting the prediction block from the current block to be encoded, after the prediction block of the current block has been produced by inter prediction and/or intra prediction; it may also be called residual data or a residual image, and includes one luma residual block and two chroma (Cr, Cb) residual blocks.
A coefficient block includes a transform block (TU, Transform Unit) containing transform coefficients obtained by transforming a residual block, or, when no transform is applied to the residual block, the residual block containing residual data (the residual signal). In the embodiments of the present disclosure, coefficients include the coefficients of the transform block obtained by transforming a residual block, or the coefficients of the residual block; entropy-coding the coefficients includes entropy-coding the quantized coefficients of the transform block or, if no transform is applied to the residual data, entropy-coding the quantized coefficients of the residual block. The untransformed residual signal and the transformed residual signal may both be referred to as coefficients. For effective compression, coefficients generally need to be quantized, and quantized coefficients may also be called levels. A transform block TU includes one luma transform block and two chroma (Cr, Cb) transform blocks.
Quantization is usually used to reduce the dynamic range of coefficients, so that the video can be expressed with fewer codewords. The quantized value is usually called a level. Quantization is usually performed by dividing a coefficient by a quantization step size, which is determined by a quantization factor conveyed in the bitstream; inverse quantization is performed by multiplying the level by the quantization step size. For an NxM block, the quantization of all coefficients can be done independently; this technique is widely used in many international video compression standards, such as H.265/HEVC and H.266/VVC. A specific scan order can convert a two-dimensional coefficient block into a one-dimensional coefficient stream; the scan order may be zig-zag, horizontal, vertical, or any other order. In international video compression standards, the quantization operation can exploit the correlation between coefficients and use the characteristics of already-quantized coefficients to choose a better quantization method, thereby optimizing quantization.
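As a toy illustration of the divide-by-step / multiply-by-step relation described above (a simplified scalar quantizer; real standards add rounding offsets and integer scaling that are omitted here):

```python
def quantize(coeff, qstep):
    return round(coeff / qstep)    # level = coefficient / quantization step

def dequantize(level, qstep):
    return level * qstep           # reconstruction = level * quantization step

# Quantization is lossy: with step size 6, coefficient 37 -> level 6 -> 36.
assert dequantize(quantize(37, 6), 6) == 36
```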
It can be seen that a residual block is usually much simpler than the original image, so determining the residual after prediction and then encoding it can significantly improve compression efficiency. The residual block is usually not encoded directly either, but is generally transformed first. The transform converts the residual image from the spatial domain to the frequency domain, removing the correlation of the residual image. After the residual image is transformed to the frequency domain, most of the energy is concentrated in the low-frequency region, so most of the non-zero coefficients after the transform are concentrated in the top-left corner. Quantization is then used for further compression, and since the human eye is insensitive to high frequencies, larger quantization step sizes can be used in the high-frequency region to further improve compression efficiency.
Taking the DCT transform shown in FIG. 4 as an example, after the DCT transform of the original image, non-zero coefficients exist only in the top-left region. It should be noted that in this example the DCT is applied to the whole image, whereas in video coding the image is partitioned into blocks for processing, so the transform is also performed block by block.
DCT (Discrete Cosine Transform) type 2 is the most commonly used transform in video compression standards. H.266/VVC can additionally use DCT type 8 and DST type 7. For an N-point input, the basis functions of DCT2, DCT8, and DST7 are as follows (the original shows them as formula images; they are reconstructed here from the standard definitions):

$$T_i(j)=\omega_0\sqrt{\frac{2}{N}}\cos\!\left(\frac{\pi i(2j+1)}{2N}\right),\qquad \omega_0=\begin{cases}\sqrt{1/2}, & i=0\\[2pt] 1, & i\neq 0\end{cases}\qquad\text{(DCT-II)}$$

$$T_i(j)=\sqrt{\frac{4}{2N+1}}\cos\!\left(\frac{\pi(2i+1)(2j+1)}{4N+2}\right)\qquad\text{(DCT-VIII)}$$

$$T_i(j)=\sqrt{\frac{4}{2N+1}}\sin\!\left(\frac{\pi(2i+1)(j+1)}{2N+1}\right)\qquad\text{(DST-VII)}$$
Since images are two-dimensional, and a direct two-dimensional transform places heavy computational and memory demands on the hardware of the codec device, the DCT2, DCT8, and DST7 transforms used in the relevant standards are all split into one-dimensional transforms in the horizontal and vertical directions, performed in two steps: first the horizontal transform and then the vertical transform, or first the vertical transform and then the horizontal transform.
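As an illustration (a plain floating-point sketch built from the orthonormal DCT-II basis given above; real codecs use scaled integer approximations of these matrices), the two-step separable application can be written as:

```python
import numpy as np

def dct2_matrix(N):
    """Orthonormal N-point DCT-II basis matrix T, with T[i, j] as above."""
    i = np.arange(N)[:, None]
    j = np.arange(N)[None, :]
    T = np.sqrt(2.0 / N) * np.cos(np.pi * i * (2 * j + 1) / (2 * N))
    T[0, :] *= np.sqrt(0.5)        # omega_0 for the i = 0 (DC) row
    return T

def separable_transform(block):
    """1-D transform of every row, then 1-D transform of every column."""
    Th = dct2_matrix(block.shape[1])
    Tv = dct2_matrix(block.shape[0])
    return Tv @ (block @ Th.T)
```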
Research has found that the above transform methods are fairly effective for horizontal and vertical textures, but relatively less effective for diagonal textures. Generally speaking, horizontal and vertical textures are the most common, so the above transform methods are useful for improving compression efficiency. As the demand for compression efficiency keeps growing, if diagonal textures can be handled more effectively, compression efficiency can be further improved. Embodiments of the present disclosure provide an encoding method that processes the residual image so that the texture of the processed residual image is better suited to the subsequent transform operation, or in other words, so that the coefficient matrix obtained after the subsequent transform of the processed residual image is easier to compress, improving coding compression efficiency more effectively.
In related technical solutions, the basis transforms used in video coding standards are transforms separable in the horizontal and vertical directions. To further improve coding compression efficiency, embodiments of the present disclosure propose a coding scheme that deforms (translates and/or swaps) the residual block before the transform so as to obtain horizontal or vertical textures, or textures close to horizontal or vertical; transforming the deformed residual block yields fewer transform coefficients and thus improves compression efficiency. When decoding, the deformation (translation and/or swap) opposite to that applied at encoding is applied to the inverse-transformed residual image to obtain the residual block to be decoded. As shown in FIG. 5, at encoding, the prediction image is subtracted from the original image (of the current block) to obtain the residual image; the residual image undergoes residual deformation to obtain a deformed residual image, which then undergoes the subsequent transform, quantization, entropy coding, and so on. At decoding (for the current block), entropy decoding, inverse quantization, and inverse transform yield the deformed residual image; the deformed residual image undergoes inverse residual deformation to yield the residual image, and the residual image is combined with (added to) the prediction image to obtain the reconstructed image. It should be noted that the deformed residual images at encoding and at decoding are not necessarily identical, because the quantization/inverse-quantization process is generally lossy, and the transform/inverse-transform process is generally lossy as well.
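The FIG. 5 chain can be mimicked end to end in a few lines (a toy sketch with SciPy's floating-point DCT standing in for the codec transform, a crude scalar quantizer, and the f(n) = n category; all names are our own assumptions):

```python
import numpy as np
from scipy.fft import dctn, idctn

def deform(res, f, sign):
    """Shift row n by sign * f(n) pixels with wrap-around
    (sign = +1 at the encoder, -1 for the opposite move at the decoder)."""
    out = res.copy()
    for n in range(out.shape[0]):
        out[n] = np.roll(out[n], sign * int(f(n)))
    return out

f = lambda n: n
res = 10.0 * np.fliplr(np.eye(8))                 # toy 45-degree residual block

coeff = dctn(deform(res, f, +1), norm='ortho')    # encoder: deform, transform
level = np.round(coeff / 2.0)                     # coarse scalar quantization
rec = deform(idctn(level * 2.0, norm='ortho'), f, -1)   # decoder chain
print(np.abs(rec - res).max())   # small but nonzero: the quantization loss
```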
An embodiment of the present disclosure provides a video decoding method, as shown in FIG. 6, including:
Step 601: decoding an encoded video bitstream to obtain initial residual data;
Step 602: determining a translation direction and a translation step size of the initial residual data, and translating the initial residual data according to the translation direction and the translation step size;
Step 603: obtaining a reconstructed image from the translated residual data.
In an embodiment of the present disclosure, step 603 includes:
obtaining the reconstructed image from the prediction block corresponding to the initial residual data and the translated residual data.
It should be noted that the initial residual data in step 601 is obtained according to related decoding schemes, including entropy decoding, inverse quantization, and inverse transform of the encoded video bitstream. Those skilled in the art implement the entropy decoding, inverse quantization, and inverse transform steps according to the related schemes; the specific aspects are not within the scope protected or limited by this application. The initial residual data in step 601, which has not yet been translated, is also called the deformed residual block.
In an embodiment, determining the translation direction and translation step size of the initial residual data in step 602 includes:
determining a translation category of the initial residual data, and determining the translation direction and translation step size indicated by the set translation category as the translation direction and translation step size of the initial residual data.
In an embodiment, determining the translation category of the initial residual data in step 602 includes:
in a case where the prediction block corresponding to the initial residual data is an intra prediction block, determining the translation category according to the intra prediction mode of the prediction block;
in a case where the prediction block corresponding to the initial residual data is an inter prediction block, determining the translation category according to the gradient of the prediction block.
In an embodiment, determining the translation category of the initial residual data in step 602 includes:
determining the translation category according to the gradient of the prediction block corresponding to the initial residual data.
It should be noted that those skilled in the art perform the decoder-side prediction step according to related schemes to obtain the prediction block corresponding to the initial residual data; the specific steps are not within the scope protected or limited by this application.
In an embodiment, determining the translation category according to the intra prediction mode of the prediction block includes:
determining, according to a set correspondence between intra prediction modes and translation categories, the translation category corresponding to the intra prediction mode of the prediction block.
For example, taking the H.266/VVC coding framework as an example, there are 67 basic intra prediction modes; apart from mode 0 (Planar) and mode 1 (DC), there are 65 angular prediction modes. For non-square blocks, wide-angle prediction modes may also be used, which allow prediction angles beyond the angular range of square blocks. As shown in FIG. 7, modes 2 to 66 are the angles corresponding to the prediction modes of square blocks, and -1 to -14 as well as 67 to 80 represent the extended angles under wide-angle prediction.
In an embodiment of the present disclosure, under the H.266/VVC coding framework, the preset correspondence between intra prediction modes and translation categories is given by the following mapping table:

Table 1 – Mapping between intra prediction modes and translation categories

  Intra prediction mode (incl. wide-angle modes)    Translation category
  0, 1, -14~-12, 15~21, 47~53, 79, 80               0
  -8~-11, 54~58                                     1
  -4~-7, 59~62                                      2
  -2, -3, 63, 64                                    3
  -1, 2, 3, 65~67                                   4
  4, 5, 68, 69                                      5
  6~9, 70~73                                        6
  10~14, 74~78                                      7
  42~46                                             8
  38~41                                             9
  36, 37                                            10
  33~35                                             11
  31, 32                                            12
  27~30                                             13
  23~26                                             14
Each translation category indicates the direction and step size for translating the initial residual data (the deformed residual block), for example as defined in the following translation category table:

Table 2 – Translation category table (decoder side)

  Translation category    Description
  0                       no translation
  1                       row n moves horizontally left by n/4 pixels
  2                       row n moves horizontally left by n/2 pixels
  3                       row n moves horizontally left by 3n/4 pixels
  4                       row n moves horizontally left by n pixels
  5                       row n moves horizontally left by 4n/3 pixels
  6                       row n moves horizontally left by 2n pixels
  7                       row n moves horizontally left by 4n pixels
  8                       row n moves horizontally right by n/4 pixels
  9                       row n moves horizontally right by n/2 pixels
  10                      row n moves horizontally right by 3n/4 pixels
  11                      row n moves horizontally right by n pixels
  12                      row n moves horizontally right by 4n/3 pixels
  13                      row n moves horizontally right by 2n pixels
  14                      row n moves horizontally right by 4n pixels
It should be noted that, for the same translation category, the translation directions at the decoder and at the encoder are opposite: if the decoder translates left by n pixels, that category translates right by n pixels at the encoder, and points that move beyond the current residual block range wrap around to the tail of the row; if the decoder translates up by n pixels, that category translates down by n pixels at the encoder, with points beyond the current residual block range likewise wrapping to the tail.
Optionally, other definitions of the correspondence between intra prediction modes and translation categories may be used, not limited to the aspects shown in Table 1 of this embodiment; and the translation direction and translation step size indicated by each category may be defined otherwise, not limited to the aspects shown in Table 2 of this embodiment.
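Tables 1 and 2 lend themselves to a direct lookup. The sketch below encodes the decoder-side Table 2 as data (our own encoding of the table; exact rationals are kept with fractions.Fraction) and, together with the row-translation sketch given earlier, reproduces the deformation for any category:

```python
from fractions import Fraction as Fr

# Decoder-side Table 2: per-row multiplier for categories 1-7 (left) and
# 8-14 (right, mirrored amounts); index 0 of the list is unused.
MULTIPLIERS = [Fr(0), Fr(1, 4), Fr(1, 2), Fr(3, 4), Fr(1), Fr(4, 3), Fr(2), Fr(4)]

def category_spec(cat):
    """Return (direction, f) where f(n) is the shift of row n in pixels."""
    if cat == 0:
        return None, lambda n: 0                  # category 0: no translation
    direction = 'left' if cat <= 7 else 'right'
    m = MULTIPLIERS[cat if cat <= 7 else cat - 7]
    return direction, lambda n: n * m

direction, f = category_spec(4)      # category 4: row n moves left n pixels
assert direction == 'left' and f(3) == 3
```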
In an embodiment of the present disclosure, the gradient of the prediction block includes a horizontal gradient and a vertical gradient.
In an embodiment, determining the translation category according to the gradient of the prediction block includes:
determining a gradient parameter of the prediction block according to the horizontal gradient of the prediction block and the vertical gradient of the prediction block;
determining, according to a set correspondence between gradient parameters and translation categories, the translation category corresponding to the gradient parameter of the prediction block.
In an embodiment, the gradient parameter of the prediction block is determined as follows:
determining, from the prediction block, the horizontal gradients of the pixels in the prediction block other than the outermost ring of pixels;
determining the horizontal gradient of the prediction block from the horizontal gradients of all or some of those pixels;
determining, from the prediction block, the vertical gradients of the pixels other than the outermost ring;
determining the vertical gradient of the prediction block from the vertical gradients of all or some of those pixels;
determining the gradient parameter according to a set gradient function P(Gh, Gv), where Gh is the horizontal gradient of the prediction block and Gv is the vertical gradient of the prediction block.
In an embodiment, the gradient parameter of the prediction block is determined as follows:
determining, from the prediction block, the horizontal gradients of the pixels other than the outermost ring;
determining the horizontal gradient of the prediction block from the horizontal gradients of the subset of those pixels that satisfies a set sampling rule;
determining, from the prediction block, the vertical gradients of the pixels other than the outermost ring;
determining the vertical gradient of the prediction block from the vertical gradients of the subset of those pixels that satisfies the set sampling rule;
determining the gradient parameter according to the set gradient function P(Gh, Gv), where Gh is the horizontal gradient of the prediction block and Gv is the vertical gradient of the prediction block.
In an embodiment, the gradient parameter of the prediction block is determined as follows:
determining, from the prediction block, the horizontal gradients of the pixels other than the outermost ring, and taking the sum of those horizontal gradients as the horizontal gradient of the prediction block;
determining, from the prediction block, the vertical gradients of the pixels other than the outermost ring, and taking the sum of those vertical gradients as the vertical gradient of the prediction block;
determining the gradient parameter according to a set gradient function P(Gh, Gv) = Gh/Gv, where Gh is the horizontal gradient of the prediction block and Gv is the vertical gradient of the prediction block, and P(Gh, Gv) = 0 in a case where Gv is 0.
It should be noted that in the scheme proposed by the embodiments of the present disclosure, how the residual pixels are translated can be determined from the strength and sign of the horizontal gradient and the vertical gradient. In an embodiment, if the horizontal and vertical gradient strengths are the same or approximately the same, the texture in the prediction block can be estimated to lean towards 45°, and the translation of the residual pixels at the encoder and decoder is determined accordingly.
In an embodiment, the correspondence between gradient parameters and translation categories set at the decoder is shown in Table 3:

Table 3 – Mapping between gradient parameters and translation categories

  Gradient parameter       Translation category
  0, (-∞, -8), [8, +∞)     0
  [-3/8, -1/8)             1
  [-5/8, -3/8)             2
  [-7/8, -5/8)             3
  [-9/8, -7/8)             4
  [-3/2, -9/8)             5
  [-3, -3/2)               6
  [-8, -3)                 7
  [1/8, 3/8)               8
  [3/8, 5/8)               9
  [5/8, 7/8)               10
  [7/8, 9/8)               11
  [9/8, 3/2)               12
  [3/2, 3)                 13
  [3, 8)                   14
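Pulling the gradient path together, a decoder-side sketch (the per-block gradient sums, the ratio P = Gh/Gv, and a scan over the half-open ranges of Table 3; values the table leaves uncovered, e.g. 0 < |P| < 1/8, fall back to category 0 here, which is our reading of the table):

```python
# Table 3 transcribed as ((lo, hi), category) with lo <= P < hi
RANGES = [
    ((-3/8, -1/8), 1), ((-5/8, -3/8), 2), ((-7/8, -5/8), 3), ((-9/8, -7/8), 4),
    ((-3/2, -9/8), 5), ((-3, -3/2), 6),   ((-8, -3), 7),
    ((1/8, 3/8), 8),   ((3/8, 5/8), 9),   ((5/8, 7/8), 10),  ((7/8, 9/8), 11),
    ((9/8, 3/2), 12),  ((3/2, 3), 13),    ((3, 8), 14),
]

def gradient_category(gh_sum, gv_sum):
    p = 0 if gv_sum == 0 else gh_sum / gv_sum     # P(Gh, Gv) = Gh / Gv
    for (lo, hi), cat in RANGES:
        if lo <= p < hi:
            return cat
    return 0   # P == 0, (-inf, -8) and [8, +inf) all map to category 0
```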
In an embodiment of the present disclosure, the horizontal gradient of each pixel in the prediction block other than the outermost ring of pixels is determined as follows:
the difference between the two pixels horizontally adjacent to that pixel, divided by 2.
In an embodiment, the vertical gradient of each pixel other than the outermost ring is determined as follows:
the difference between the two pixels vertically adjacent to that pixel, divided by 2.
It should be noted that the difference between two pixels is the difference between their pixel values. In an embodiment, in YUV space, the pixel value I(x, y) is the luma component Y at pixel (x, y).
Optionally, those skilled in the art may determine the horizontal or vertical gradient of each pixel other than the outermost ring in other ways, not limited to the aspects exemplified in the disclosed embodiments. Optionally, other gradient functions P(Gh, Gv) may be used to compute the gradient parameter, with the mapping table between gradient parameters and translation categories set accordingly, not limited to the aspects exemplified in the disclosed embodiments.
It should be noted that in the above embodiments, the translation category can be determined from attributes of the prediction block generated during decoding. That is, the decoder inverse-deforms the deformed residual block according to the attributes of the prediction block, translating in the direction opposite to the encoder-side deformation to restore the residual block for the subsequent decoding steps, completing decoding to obtain the reconstructed image.
In an embodiment of the present disclosure, step 602 includes: parsing the encoded video bitstream to obtain the translation category of the initial residual data.
In an embodiment, parsing the encoded video bitstream to obtain the translation category of the initial residual data includes:
parsing the encoded video bitstream and obtaining the translation category from one of the following syntax elements:
sequence-level syntax elements, frame-level syntax elements, slice-level syntax elements, coding tree unit (CTU)-level syntax elements, and coding unit (CU)-level syntax elements.
It should be noted that in the above embodiment, the translation category can be obtained by parsing the encoded video bitstream. That is, the decoder inverse-deforms the deformed residual block according to this indication, translating in the direction opposite to the encoder-side deformation to restore the residual block for the subsequent decoding steps, completing decoding to obtain the reconstructed image.
In an embodiment, determining the translation category of the initial residual data in step 602 includes:
determining the translation category according to texture features of the prediction block corresponding to the initial residual data.
In an embodiment, translating the initial residual data according to the translation direction and translation step size in step 602 includes:
performing the following for each row of the initial residual data:
translating the pixels of the current row along the translation direction indicated by the translation category, by the translation step size indicated by the translation category for the current row.
In an embodiment, the translation step size indicated by the translation category is determined according to a translation step size function f(n), where n denotes the row index in the initial residual data and f(n) denotes the translation step size of the n-th row.
In an embodiment, the translation direction includes one of the following:
horizontally left, horizontally right, vertically up, vertically down.
In an embodiment, the translation step size function f(n) includes one of the following:
f(n)=0;
f(n)=n/4;
f(n)=n/2;
f(n)=3n/4;
f(n)=n;
f(n)=4n/3;
f(n)=2n;
f(n)=4n.
It should be noted that in an embodiment of the present disclosure, the translation step sizes that a set translation category indicates for the individual rows may be equal or unequal. Depending on the characteristics of the image data, other translation directions and corresponding other translation step size functions may also be chosen, not limited to the above aspects exemplified in this disclosure.
An embodiment of the present disclosure further provides an encoding method, as shown in FIG. 8, including:
Step 801: encoding a video to be encoded to obtain residual data;
Step 802: determining a translation direction and a translation step size of the residual data, and translating the residual data according to the translation direction and the translation step size;
Step 803: obtaining an encoded bitstream from the translated residual data.
It should be noted that the residual data in step 801 is obtained according to related encoding schemes, including subtracting the prediction image (prediction block) from the original image of the current image to be encoded to obtain the residual data, also called the residual block. Those skilled in the art implement the prediction step according to related schemes; the specific aspects are not within the scope protected or limited by this application. The residual data after the translation in step 802 is also called the deformed residual block. In step 803, the translated residual data undergoes transform, quantization, entropy coding, and other subsequent processing, finally completing encoding to obtain the encoded bitstream; those skilled in the art implement the transform, quantization, and entropy coding steps according to related schemes, and the specific aspects are not within the scope protected or limited by this application.
In an embodiment, determining the translation direction and translation step size of the residual data in step 802 includes:
determining a translation category of the residual data, and determining the translation direction and translation step size indicated by the set translation category as the translation direction and translation step size of the residual data.
In an embodiment, determining the translation category of the residual data includes: in a case where the prediction block corresponding to the residual data is an intra prediction block, determining the translation category according to the intra prediction mode of the prediction block;
in a case where the prediction block corresponding to the residual data is an inter prediction block, determining the translation category according to the gradient of the prediction block.
In an embodiment, determining the translation category of the residual data includes: determining the translation category according to the gradient of the prediction block corresponding to the residual data.
In an embodiment, determining the translation category according to the intra prediction mode of the prediction block includes:
determining, according to a set correspondence between intra prediction modes and translation categories, the translation category corresponding to the intra prediction mode of the prediction block.
In an embodiment, taking the H.266/VVC coding framework as an example, as shown in FIG. 7, intra prediction includes multiple modes. An intra angular prediction mode can be understood as determining reference pixels along a given angle and then further computing the pixels at the positions to be predicted from the reference pixels. Because the angles of the prediction modes in H.266/VVC are very finely divided, the reference pixel position corresponding to a position to be predicted along the angle may be a fractional pixel position, in which case the corresponding reference pixel can be obtained by methods such as reference pixel interpolation.
Research has found that for blocks of the original image with pronounced angular texture, clearly visible residual still remains even after angular prediction is used, and the texture direction of the residual correlates to some extent with the texture direction of the block itself (the original image block) and with the direction of the angular prediction. In other words, for blocks using angular prediction, the residual texture correlates with the angular prediction mode; for a certain proportion of blocks, the residual texture direction is the same as or close to the angular prediction direction. The intra angular prediction mode can therefore be used to determine how the residual pixels are translated.
In an embodiment, a translation manner can be determined for each intra angular prediction mode, i.e., for each actual intra prediction angle. The movement may be by whole pixels, or by fractional pixels realized via interpolation filtering. Considering that current intra angular prediction modes divide the angles very finely, while residual deformation does not necessarily need such fine granularity, particularly for blocks that are themselves small, some clustering methods can be used in certain cases: for certain block sizes, several intra angular prediction modes (intra prediction angles) correspond to the same translation manner. In an embodiment, the preset correspondence between intra prediction modes and translation categories is shown in Table 1.
In an embodiment, each translation category indicates the direction and step size for translating the residual data, for example as defined in the following translation category table:
Table 4 – Translation category table (encoder side)

  Translation category    Description
  0                       no translation
  1                       row n moves horizontally right by n/4 pixels
  2                       row n moves horizontally right by n/2 pixels
  3                       row n moves horizontally right by 3n/4 pixels
  4                       row n moves horizontally right by n pixels
  5                       row n moves horizontally right by 4n/3 pixels
  6                       row n moves horizontally right by 2n pixels
  7                       row n moves horizontally right by 4n pixels
  8                       row n moves horizontally left by n/4 pixels
  9                       row n moves horizontally left by n/2 pixels
  10                      row n moves horizontally left by 3n/4 pixels
  11                      row n moves horizontally left by n pixels
  12                      row n moves horizontally left by 4n/3 pixels
  13                      row n moves horizontally left by 2n pixels
  14                      row n moves horizontally left by 4n pixels
As can be seen, Table 4 is the encoder-side translation category table corresponding to Table 2; the translation direction indicated by the same category is opposite at the decoder and at the encoder. During translation, points that move beyond the current residual block range wrap around to the tail of the row.
Optionally, other definitions of the correspondence between intra prediction modes and translation categories may be used, not limited to the aspects shown in Table 1 of this embodiment; and the translation direction and translation step size indicated by each category may be defined otherwise, not limited to the aspects shown in Table 4 of this embodiment.
In an embodiment, take angular prediction mode 2 of a square block as an example. The prediction block corresponding to the residual data has a 45° prediction direction; a block using this prediction mode is very likely to have a pronounced 45° texture itself, and the residual data after prediction is also very likely to have a texture at or near 45°. At the encoder, according to Table 1, mode 2 corresponds to translation category 4, and according to Table 4 category 4 indicates: row n moves horizontally right by n pixels. As shown in FIG. 9, translating horizontally to the right turns the 45° texture into a vertical texture. Taking the top-left corner of a square block as coordinate (0, 0): the pixels of row 0 do not move, the pixels of row 1 all move right by 1 pixel, the pixels of row 2 all move right by 2 pixels, and the pixels of row n all move right by n pixels; pixels that move beyond the current block range are moved, in order, to the tail of the row on the left. At the decoder, the opposite operation is performed: the pixels of row n all move left by n pixels, and pixels beyond the current block range are moved, in order, to the tail of the row on the right.
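The mode-2 example can be checked numerically. In this sketch (illustrative values only; of the two possible 45° orientations, the one that the rightward shift straightens is chosen, matching FIG. 9), a diagonal texture becomes a vertical one after row n is moved n pixels to the right with wrap-around:

```python
import numpy as np

block = np.fliplr(np.eye(4, dtype=int))   # row n has its feature at column 3-n
out = np.empty_like(block)
for n in range(4):                        # encoder, category 4: row n right by n
    out[n] = np.roll(block[n], n)
print(out)      # every row is [0 0 0 1]: the 45-degree texture is now vertical
```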
In an embodiment of the present disclosure, the gradient of the prediction block includes a horizontal gradient and a vertical gradient;
determining the translation category according to the gradient of the prediction block includes:
determining a gradient parameter of the prediction block according to the horizontal gradient of the prediction block and the vertical gradient of the prediction block;
determining, according to a set correspondence between gradient parameters and translation categories, the translation category corresponding to the gradient parameter of the prediction block.
In an embodiment, the gradient parameter of the prediction block is determined as follows:
determining, from the prediction block, the horizontal gradients of the pixels in the prediction block other than the outermost ring of pixels;
determining the horizontal gradient of the prediction block from the horizontal gradients of all or some of those pixels;
determining, from the prediction block, the vertical gradients of the pixels other than the outermost ring;
determining the vertical gradient of the prediction block from the vertical gradients of all or some of those pixels;
determining the gradient parameter according to a set gradient function P(Gh, Gv), where Gh is the horizontal gradient of the prediction block and Gv is the vertical gradient of the prediction block.
In an embodiment, the gradient parameter of the prediction block is determined as follows:
determining, from the prediction block, the horizontal gradients of the pixels other than the outermost ring;
determining the horizontal gradient of the prediction block from the horizontal gradients of the subset of those pixels that satisfies a set sampling rule;
determining, from the prediction block, the vertical gradients of the pixels other than the outermost ring;
determining the vertical gradient of the prediction block from the vertical gradients of the subset of those pixels that satisfies the set sampling rule;
determining the gradient parameter according to the set gradient function P(Gh, Gv), where Gh is the horizontal gradient of the prediction block and Gv is the vertical gradient of the prediction block.
In an embodiment, the gradient parameter of the prediction block is determined as follows:
determining, from the prediction block, the horizontal gradients of the pixels other than the outermost ring, and taking the sum of those horizontal gradients as the horizontal gradient Gh of the prediction block;
determining, from the prediction block, the vertical gradients of the pixels other than the outermost ring, and taking the sum of those vertical gradients as the vertical gradient Gv of the prediction block;
determining the gradient parameter according to a set gradient function P(Gh, Gv) = Gh/Gv, where Gh is the horizontal gradient of the prediction block and Gv is the vertical gradient of the prediction block, and P(Gh, Gv) = 0 in a case where Gv is 0.
It should be noted that in the scheme proposed by the embodiments of the present disclosure, how the residual pixels are translated can be determined from the strength and sign of the horizontal gradient and the vertical gradient. In an embodiment, if the horizontal and vertical gradient strengths are the same or approximately the same, the texture in the prediction block can be estimated to lean towards 45°, and the translation of the residual pixels at the encoder and decoder is determined accordingly.
In an embodiment, the correspondence between gradient parameters and translation categories set at the encoder is also as shown in Table 3.
In an embodiment of the present disclosure, the horizontal gradient of each pixel in the prediction block other than the outermost ring of pixels is determined as follows:
the difference between the two pixels horizontally adjacent to that pixel, divided by 2.
In an embodiment, the vertical gradient of each pixel other than the outermost ring is determined as follows:
the difference between the two pixels vertically adjacent to that pixel, divided by 2.
It should be noted that the difference between two pixels is the difference between their pixel values. In an embodiment, in YUV space, the pixel value I(x, y) is the luma component Y at pixel (x, y).
Optionally, those skilled in the art may determine the horizontal or vertical gradient of each pixel other than the outermost ring in other ways, not limited to the aspects exemplified in the disclosed embodiments. Optionally, other gradient functions P(Gh, Gv) may be used to compute the gradient parameter, with the mapping table between gradient parameters and translation categories set accordingly, not limited to the aspects exemplified in the disclosed embodiments.
It should be noted that in the above embodiments, the translation category can be determined from attributes of the prediction block generated during encoding. That is, the encoder deforms the residual block according to the attributes of the prediction block, continues the subsequent encoding steps with the deformed residual block, and finally completes encoding to obtain the encoded bitstream.
In an embodiment of the present disclosure, determining the translation category of the residual data includes: determining the translation category according to texture features of the residual data; or, determining the translation category according to texture features of the prediction block corresponding to the residual data.
In an embodiment, the encoding method further includes:
Step 804: writing the translation category into the encoded bitstream.
In an embodiment, step 804 includes:
writing the translation category into one of the following syntax elements of the encoded bitstream:
sequence-level syntax elements, frame-level syntax elements, slice-level syntax elements, coding tree unit (CTU)-level syntax elements, and coding unit (CU)-level syntax elements.
As can be seen from step 804, the encoding method provided by the embodiments of the present disclosure can also write the translation category into the encoded bitstream by extending the bitstream; the decoding end can then inverse-deform the decoded deformed residual block according to the translation category obtained by parsing.
In an embodiment of the present disclosure, translating the residual data according to the translation direction and translation step size in step 802 includes:
performing the following for each row of the residual data:
translating the pixels of the current row along the translation direction indicated by the translation category, by the translation step size indicated by the translation category for the current row.
In an embodiment, the translation step size indicated by the translation category is determined according to a translation step size function f(n), where n denotes the row index in the residual data and f(n) denotes the translation step size of the n-th row.
In an embodiment, the translation direction includes one of the following:
horizontally left, horizontally right, vertically up, vertically down.
In an embodiment, the translation step size function f(n) includes one of the following:
f(n)=0;
f(n)=n/4;
f(n)=n/2;
f(n)=3n/4;
f(n)=n;
f(n)=4n/3;
f(n)=2n;
f(n)=4n.
In an embodiment, the translation step sizes that a set translation category indicates for the individual rows may be equal or unequal. For example, as shown in FIG. 10, translation category 1 indicates that the translation direction is horizontally to the left, with a translation step size function f(n) given by a piecewise formula (reproduced only as an image in the original publication).
Optionally, depending on the characteristics of the image data, other translation directions and corresponding other translation step size functions may be set, not limited to the above aspects exemplified in this disclosure.
It should be noted that encoding and decoding are corresponding inverse processes. For aspects of the encoding and decoding methods of the embodiments of the present disclosure whose detailed steps are not recorded one-to-one, those skilled in the art can determine the corresponding aspect of the decoding scheme from what is recorded in the encoding scheme, or determine the corresponding aspect of the encoding scheme from what is recorded in the decoding scheme.
An embodiment of the present disclosure further provides a video encoding device, as shown in FIG. 11, including a processor and a memory storing a computer program runnable on the processor, where the processor, when executing the computer program, implements the video encoding method described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a video decoding device, as shown in FIG. 11, including a processor and a memory storing a computer program runnable on the processor, where the processor, when executing the computer program, implements the video decoding method described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a video encoding and decoding system, including the video encoding device described in any embodiment of the present disclosure and/or the video decoding device described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the video decoding method or encoding method described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a bitstream, where the bitstream is generated according to the video encoding method described in any embodiment of the present disclosure.
As can be seen, with the encoding/decoding methods provided by the embodiments of the present disclosure, the residual deformation makes the coefficient matrix obtained by transforming the deformed residual image easier to compress, which can further improve compression efficiency.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media corresponding to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media may generally correspond to non-transitory tangible computer-readable storage media or to communication media such as signals or carrier waves. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer. Moreover, any connection may properly be termed a computer-readable medium: if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or those wireless technologies are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The technical solutions of the embodiments of the present disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in the embodiments of the present disclosure to emphasize functional aspects of devices configured to perform the described techniques, but they do not necessarily require realization by different hardware units; rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in combination with suitable software and/or firmware.
Those of ordinary skill in the art can understand that all or some of the steps of the methods disclosed above, and the functional modules/units in the systems and devices, may be implemented as software, firmware, hardware, and appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned above does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor such as a digital signal processor or microprocessor, as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims (33)

  1. A video decoding method, comprising:
    decoding an encoded video bitstream to obtain initial residual data;
    determining a translation direction and a translation step size of the initial residual data, and translating the initial residual data according to the translation direction and the translation step size;
    obtaining a reconstructed image from the translated residual data.
  2. The decoding method of claim 1, wherein
    determining the translation direction and translation step size of the initial residual data comprises:
    determining a translation category of the initial residual data, and determining the translation direction and translation step size indicated by the set translation category as the translation direction and translation step size of the initial residual data.
  3. The decoding method of claim 2, wherein
    determining the translation category of the initial residual data comprises:
    in a case where the prediction block corresponding to the initial residual data is an intra prediction block, determining the translation category according to the intra prediction mode of the prediction block;
    in a case where the prediction block corresponding to the initial residual data is an inter prediction block, determining the translation category according to the gradient of the prediction block;
    or,
    determining the translation category of the initial residual data comprises:
    determining the translation category according to the gradient of the prediction block corresponding to the initial residual data.
  4. The decoding method of claim 3, wherein
    determining the translation category according to the intra prediction mode of the prediction block comprises:
    determining, according to a set correspondence between intra prediction modes and translation categories, the translation category corresponding to the intra prediction mode of the prediction block.
  5. The decoding method of claim 3, wherein
    the gradient of the prediction block comprises a horizontal gradient and a vertical gradient;
    determining the translation category according to the gradient of the prediction block corresponding to the initial residual data comprises:
    determining a gradient parameter of the prediction block according to the horizontal gradient of the prediction block and the vertical gradient of the prediction block;
    determining, according to a set correspondence between gradient parameters and translation categories, the translation category corresponding to the gradient parameter of the prediction block.
  6. The decoding method of claim 5, wherein
    the gradient parameter of the prediction block is determined as follows:
    determining, from the prediction block, the horizontal gradients of the pixels in the prediction block other than the outermost ring of pixels;
    determining the horizontal gradient of the prediction block from the horizontal gradients of all or some of those pixels;
    determining, from the prediction block, the vertical gradients of the pixels other than the outermost ring;
    determining the vertical gradient of the prediction block from the vertical gradients of all or some of those pixels;
    determining the gradient parameter according to a set gradient function P(Gh, Gv), where Gh is the horizontal gradient of the prediction block and Gv is the vertical gradient of the prediction block.
  7. The decoding method of claim 5, wherein
    the gradient parameter of the prediction block is determined as follows:
    determining, from the prediction block, the horizontal gradients of the pixels other than the outermost ring, and taking the sum of those horizontal gradients as the horizontal gradient of the prediction block;
    determining, from the prediction block, the vertical gradients of the pixels other than the outermost ring, and taking the sum of those vertical gradients as the vertical gradient of the prediction block;
    determining the gradient parameter according to a set gradient function P(Gh, Gv) = Gh/Gv, where Gh is the horizontal gradient of the prediction block and Gv is the vertical gradient of the prediction block, and P(Gh, Gv) = 0 in a case where Gv is 0.
  8. The decoding method of claim 2, wherein
    determining the translation category of the initial residual data comprises:
    parsing the encoded video bitstream to obtain the translation category of the initial residual data.
  9. The decoding method of claim 8, wherein
    parsing the encoded video bitstream to obtain the translation category of the initial residual data comprises:
    parsing the encoded video bitstream and obtaining the translation category from one of the following syntax elements:
    sequence-level syntax elements, frame-level syntax elements, slice-level syntax elements, coding tree unit (CTU)-level syntax elements, and coding unit (CU)-level syntax elements.
  10. The decoding method of claim 2, wherein
    determining the translation category of the initial residual data comprises:
    determining the translation category according to texture features of the prediction block corresponding to the initial residual data.
  11. The decoding method of any one of claims 2-10, wherein
    translating the initial residual data according to the translation direction and translation step size comprises:
    performing the following for each row of the initial residual data:
    translating the pixels of the current row along the translation direction indicated by the translation category, by the translation step size indicated by the translation category for the current row.
  12. The decoding method of any one of claims 2-10, wherein
    the translation step size indicated by the translation category is determined according to a translation step size function f(n), where n denotes the row index in the initial residual data and f(n) denotes the translation step size of the n-th row.
  13. The decoding method of any one of claims 1-10, wherein
    the translation direction comprises one of the following:
    horizontally left, horizontally right, vertically up, vertically down.
  14. The decoding method of claim 12, wherein
    the translation step size function f(n) comprises one of the following:
    f(n)=0;
    f(n)=n/4;
    f(n)=n/2;
    f(n)=3n/4;
    f(n)=n;
    f(n)=4n/3;
    f(n)=2n;
    f(n)=4n.
  15. A video encoding method, comprising:
    encoding a video to be encoded to obtain residual data;
    determining a translation direction and a translation step size of the residual data, and translating the residual data according to the translation direction and the translation step size;
    obtaining an encoded bitstream from the translated residual data.
  16. The encoding method of claim 15, wherein
    determining the translation direction and translation step size of the residual data comprises:
    determining a translation category of the residual data, and determining the translation direction and translation step size indicated by the set translation category as the translation direction and translation step size of the residual data.
  17. The encoding method of claim 16, wherein
    determining the translation category of the residual data comprises:
    in a case where the prediction block corresponding to the residual data is an intra prediction block, determining the translation category according to the intra prediction mode of the prediction block;
    in a case where the prediction block corresponding to the residual data is an inter prediction block, determining the translation category according to the gradient of the prediction block;
    or,
    determining the translation category of the residual data comprises:
    determining the translation category according to the gradient of the prediction block corresponding to the residual data.
  18. The encoding method of claim 17, wherein
    determining the translation category according to the intra prediction mode of the prediction block comprises:
    determining, according to a set correspondence between intra prediction modes and translation categories, the translation category corresponding to the intra prediction mode of the prediction block.
  19. The encoding method of claim 17, wherein
    the gradient of the prediction block comprises a horizontal gradient and a vertical gradient;
    determining the translation category according to the gradient of the prediction block corresponding to the residual data comprises:
    determining a gradient parameter of the prediction block according to the horizontal gradient of the prediction block and the vertical gradient of the prediction block;
    determining, according to a set correspondence between gradient parameters and translation categories, the translation category corresponding to the gradient parameter of the prediction block.
  20. The encoding method of claim 19, wherein
    the gradient parameter of the prediction block is determined as follows:
    determining, from the prediction block, the horizontal gradients of the pixels in the prediction block other than the outermost ring of pixels;
    determining the horizontal gradient of the prediction block from the horizontal gradients of all or some of those pixels;
    determining, from the prediction block, the vertical gradients of the pixels other than the outermost ring;
    determining the vertical gradient of the prediction block from the vertical gradients of all or some of those pixels;
    determining the gradient parameter according to a set gradient function P(Gh, Gv), where Gh is the horizontal gradient of the prediction block and Gv is the vertical gradient of the prediction block.
  21. The encoding method of claim 19, wherein
    the gradient parameter of the prediction block is determined as follows:
    determining, from the prediction block, the horizontal gradients of the pixels other than the outermost ring, and taking the sum of those horizontal gradients as the horizontal gradient of the prediction block;
    determining, from the prediction block, the vertical gradients of the pixels other than the outermost ring, and taking the sum of those vertical gradients as the vertical gradient of the prediction block;
    determining the gradient parameter according to a set gradient function P(Gh, Gv) = Gh/Gv, where Gh is the horizontal gradient of the prediction block and Gv is the vertical gradient of the prediction block, and P(Gh, Gv) = 0 in a case where Gv is 0.
  22. The encoding method of claim 16, wherein
    determining the translation category of the residual data comprises:
    determining the translation category according to texture features of the residual data;
    or,
    determining the translation category according to texture features of the prediction block corresponding to the residual data.
  23. The encoding method of claim 16, wherein
    the method further comprises: writing the translation category into the encoded bitstream.
  24. The encoding method of claim 23, wherein
    writing the translation category into the encoded bitstream comprises:
    writing the translation category into one of the following syntax elements of the encoded bitstream:
    sequence-level syntax elements, frame-level syntax elements, slice-level syntax elements, coding tree unit (CTU)-level syntax elements, and coding unit (CU)-level syntax elements.
  25. The encoding method of any one of claims 16-24, wherein
    translating the residual data according to the translation direction and translation step size comprises:
    performing the following for each row of the residual data:
    translating the pixels of the current row along the translation direction indicated by the translation category, by the translation step size indicated by the translation category for the current row.
  26. The encoding method of any one of claims 16-24, wherein
    the translation step size indicated by the translation category is determined according to a translation step size function f(n), where n denotes the row index in the residual data and f(n) denotes the translation step size of the n-th row.
  27. The encoding method of any one of claims 15-24, wherein
    the translation direction comprises one of the following:
    horizontally left, horizontally right, vertically up, vertically down.
  28. The encoding method of claim 26, wherein
    the translation step size function f(n) comprises one of the following:
    f(n)=0;
    f(n)=n/4;
    f(n)=n/2;
    f(n)=3n/4;
    f(n)=n;
    f(n)=4n/3;
    f(n)=2n;
    f(n)=4n.
  29. A video decoding device, comprising a processor and a memory storing a computer program runnable on the processor, wherein the processor, when executing the computer program, implements the video decoding method of any one of claims 1 to 14.
  30. A video encoding device, comprising a processor and a memory storing a computer program runnable on the processor, wherein the processor, when executing the computer program, implements the video encoding method of any one of claims 15 to 28.
  31. A video encoding and decoding system, comprising the video decoding device of claim 29 and/or the video encoding device of claim 30.
  32. A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 28.
  33. A bitstream, wherein the bitstream is generated according to the video encoding method of any one of claims 15 to 28.
PCT/CN2021/119157 2021-09-17 2021-09-17 Video decoding and encoding method and device, and storage medium WO2023039856A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/119157 WO2023039856A1 (zh) 2021-09-17 2021-09-17 Video decoding and encoding method and device, and storage medium
CN202180102264.6A CN117957842A (zh) 2021-09-17 2021-09-17 Video decoding and encoding method and device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/119157 WO2023039856A1 (zh) 2021-09-17 2021-09-17 Video decoding and encoding method and device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023039856A1 true WO2023039856A1 (zh) 2023-03-23

Family

ID=85602308

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/119157 WO2023039856A1 (zh) 2021-09-17 2021-09-17 一种视频解码、编码方法及设备、存储介质

Country Status (2)

Country Link
CN (1) CN117957842A (zh)
WO (1) WO2023039856A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101895757A (zh) * 2010-07-15 2010-11-24 北京大学 Method and system for reordering and inverse reordering of prediction residual blocks
WO2011000691A2 (en) * 2009-06-29 2011-01-06 Thomson Licensing Method and device for encoding a residual and corresponding method and device for decoding a residual
CN102474625A (zh) * 2009-07-23 2012-05-23 瑞典爱立信有限公司 Method and apparatus for encoding and decoding of images
WO2020009460A1 (ko) * 2018-07-04 2020-01-09 에스케이텔레콤 주식회사 Residual signal rearrangement method and image decoding apparatus


Also Published As

Publication number Publication date
CN117957842A (zh) 2024-04-30


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21957137

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180102264.6

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE