CN115699759A - Method and apparatus for encoding and decoding video using SATD-based cost computation


Info

Publication number
CN115699759A
CN115699759A
Authority
CN
China
Prior art keywords
value, determining, transform coefficient, predefining, transform
Legal status
Pending
Application number
CN202180040503.XA
Other languages
Chinese (zh)
Inventor
陈伟
修晓宇
郭哲玮
陈漪纹
马宗全
朱弘正
王祥林
于冰
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co., Ltd.
Publication of CN115699759A

Classifications

    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
    • H04N19/18 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a set of transform coefficients
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding

Abstract

A video encoding and decoding method and apparatus, and a non-transitory computer-readable storage medium, are provided. The method includes determining a first weighted transform coefficient by applying a first weight to a first transform coefficient, and determining a second weighted transform coefficient by applying a second weight to a second transform coefficient. The method further includes determining a rate distortion based on the first and second weighted transform coefficients.

Description

Method and apparatus for encoding and decoding video using SATD-based cost computation
Cross Reference to Related Applications
This application claims priority to U.S. Provisional Application Serial No. 63/035,601, entitled "Improved SATD Based Cost Calculation," filed on June 5, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to video coding and compression. More particularly, the present application relates to methods and apparatus for improving coding efficiency, including but not limited to the coding efficiency of Sum of Absolute Transformed Differences (SATD) based cost calculation.
Background
Various video coding techniques may be used to compress video data. Video coding is performed according to one or more video coding standards. For example, video coding standards include Versatile Video Coding (VVC), High Efficiency Video Coding (HEVC, also known as H.265 or MPEG-H Part 2), and Advanced Video Coding (AVC, also known as H.264 or MPEG-4 Part 10), developed jointly by ISO/IEC MPEG and ITU-T VCEG. AOMedia Video 1 (AV1) was developed by the Alliance for Open Media (AOM) as a successor to its preceding standard VP9. Audio Video Coding (AVS), which refers to digital audio and digital video compression standards, is another video compression standard family developed by the Audio and Video Coding Standard Workgroup of China. Most existing video coding standards build on the well-known hybrid video coding framework, i.e., block-based prediction methods (e.g., inter prediction, intra prediction) are used to reduce redundancy present in video images or sequences, and transform coding is used to compact the energy of the prediction errors. An important goal of video coding techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality.
The first generation of the AVS standard includes the Chinese national standards "Information Technology, Advanced Audio Video Coding, Part 2: Video" (known as AVS1) and "Information Technology, Advanced Audio Video Coding, Part 16: Radio Television Video" (known as AVS+). It can offer about a 50% bit-rate saving compared with the MPEG-2 standard at the same perceptual quality. The video part of the AVS1 standard was published as a Chinese national standard in February 2006. The second generation of the AVS standard includes the Chinese national standard "Information Technology, Efficient Multimedia Coding" (known as AVS2), which mainly targets the transmission of ultra-high-definition TV programs. The coding efficiency of AVS2 is twice that of AVS+. In May 2016, AVS2 was published as a Chinese national standard.
Meanwhile, the video part of the AVS2 standard was submitted by the Institute of Electrical and Electronics Engineers (IEEE) as one international standard for applications. The AVS3 standard is a new generation of video coding standard for ultra-high-definition video applications, aiming to surpass the coding efficiency of the latest international standard HEVC. In March 2019, at the 68th AVS meeting, the AVS3-P2 baseline was finished, providing approximately a 30% bit-rate saving over the HEVC standard. Currently, a reference software, called the High Performance Model (HPM), is maintained by the AVS group to demonstrate a reference implementation of the AVS3 standard.
Disclosure of Invention
Examples of techniques related to video coding are provided.
According to a first aspect of the present application, a video encoding and decoding method is provided. The method may be applied in an encoder or a decoder. The method includes determining a first weighted transform coefficient by applying a first weight to a first transform coefficient, and determining a second weighted transform coefficient by applying a second weight to a second transform coefficient. The method further includes determining a rate distortion based on the first and second weighted transform coefficients.
According to a second aspect of the present application, a video encoding and decoding apparatus is provided. The apparatus includes one or more processors and a memory configured to store instructions executable by the one or more processors. Upon execution of the instructions, the one or more processors are configured to: determine a first weighted transform coefficient by applying a first weight to a first transform coefficient; determine a second weighted transform coefficient by applying a second weight to a second transform coefficient; and determine a rate distortion based on the first and second weighted transform coefficients.
According to a third aspect of the present application, a non-transitory computer-readable storage medium for video coding is provided, having stored therein computer-executable instructions that, when executed by one or more computer processors, cause the one or more computer processors to perform acts including: determining a first weighted transform coefficient by applying a first weight to a first transform coefficient; determining a second weighted transform coefficient by applying a second weight to a second transform coefficient; and determining a rate distortion based on the first and second weighted transform coefficients.
Drawings
A more particular description of examples of the application will be rendered by reference to specific examples thereof which are illustrated in the appended drawings. In view of the fact that these drawings depict only some examples and are therefore not to be considered limiting in scope, these examples will be described and explained with additional specificity and detail through the use of the accompanying drawings.
Fig. 1 illustrates a block diagram of an exemplary video encoder according to some embodiments of the present application.
Fig. 2A-2E illustrate one or more examples of a picture divided into multiple Coding Tree Units (CTUs) according to some embodiments of the present application.
Fig. 3 is a block diagram illustrating an exemplary video decoder according to some embodiments of the present application.
Fig. 4 illustrates an example of a Low-Frequency Non-Separable Transform (LFNST) applied between the forward primary transform and quantization at an encoder, and an inverse LFNST applied between de-quantization and the inverse primary transform at a decoder, according to some embodiments of the present application.
Fig. 5 is a block diagram illustrating an exemplary video codec device according to some embodiments of the present application.
Fig. 6 is a flow chart illustrating an exemplary video codec method according to some embodiments of the present application.
Detailed Description
Reference will now be made in detail to the present embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth in order to provide an understanding of the subject matter presented herein. It will be apparent, however, to one skilled in the art that various alternatives may be used and the subject matter may be practiced without these specific details without departing from the scope of the claims. For example, it will be apparent to one of ordinary skill in the art that the subject matter presented herein may be implemented on many types of electronic devices having digital video capabilities.
Reference throughout this specification to "one embodiment," "an example," "some embodiments," "some examples," or similar language means that a particular feature, structure, or characteristic described is included in at least one embodiment or example. Features, structures, elements, or characteristics described in connection with one or more embodiments may be applicable to other embodiments as well, unless stated otherwise.
Throughout this disclosure, the terms "first," "second," "third," and the like are used only as labels to refer to relevant elements, such as devices, components, compositions, steps, and the like, and do not imply any spatial or temporal order unless expressly indicated otherwise. For example, a "first device" and a "second device" may refer to two separately formed devices, or may refer to two parts, components, or operational states of the same device, and may be named arbitrarily.
The terms "module," "sub-module," "circuit," "sub-circuit," "circuitry," "sub-circuitry," "unit" or "sub-unit" may comprise memory (shared, dedicated, or combined) that stores code or instructions that may be executed by one or more processors. A module may comprise one or more circuits, with or without stored code or instructions. A module or circuit may include one or more components connected directly or indirectly. These components may or may not be physically attached to each other or positioned adjacent to each other.
As used herein, the term "if" or "when" may be understood to mean "upon" or "in response to," depending on the context. These terms, if they appear in a claim, may not indicate that the relevant limitations or features are conditional or optional. For example, a method may comprise steps of: i) performing a function or action X′ when or if a condition X is present, and ii) performing a function or action Y′ when or if a condition Y is present. The method may be implemented with both the capability of performing the function or action X′ and the capability of performing the function or action Y′. Thus, the functions X′ and Y′ may both be performed, at different times, over multiple executions of the method.
The units or modules may be implemented purely in software, purely in hardware or in a combination of hardware and software. In a purely software implementation, a unit or module may comprise functionally related code blocks or software components linked together, directly or indirectly, for performing specific functions, for example.
Fig. 1 illustrates a block diagram of a block-based hybrid video encoder 100 that may be used in connection with a variety of video codec standards that use block-based processing. In encoder 100, a video frame is partitioned into multiple video blocks for processing. For each given video block, a prediction is formed based on either an inter prediction method or an intra prediction method. In inter-prediction, one or more prediction values are formed by motion estimation and motion compensation based on pixels from a previously reconstructed frame. In intra prediction, a prediction value is formed based on reconstructed pixels in the current frame. Through the mode decision, the best predictor can be selected to predict the current block.
The prediction residual, which represents the difference between the current video block and its prediction value, is sent to the transform circuit 102. The transform coefficients are then sent from transform circuit 102 to quantization circuit 104 for entropy reduction. The quantized coefficients are then fed to an entropy coding circuit 106 to generate a compressed video bitstream. As shown in fig. 1, prediction related information 110, such as video block partition information, motion vectors, reference picture indices, and intra prediction modes, from intra prediction circuitry and/or inter prediction circuitry 112 is also fed through entropy encoding circuitry 106 and saved into compressed video bitstream 114.
In the encoder 100, decoder-related circuitry is also required to reconstruct the pixels for prediction. First, the prediction residual is reconstructed by inverse quantization 116 and inverse transform circuit 118. This reconstructed prediction residual is combined with the block predictor 120 to generate an unfiltered reconstructed pixel for the current video block.
Intra-prediction (also referred to as "spatial prediction") uses pixels from samples of already-coded neighboring blocks (referred to as reference samples) in the same video picture and/or slice to predict the current video block. Spatial prediction reduces the spatial redundancy inherent in video signals.
Inter prediction (also referred to as "temporal prediction") uses reconstructed pixels from an encoded video picture to predict a current video block. Temporal prediction reduces temporal redundancy inherent in video signals. The temporal prediction signal for a given Coding Unit (CU) or block is typically signaled by one or more Motion Vectors (MV) indicating the amount and direction of motion between the current CU and its temporal reference. Further, if multiple reference pictures are supported, one reference picture index is additionally sent to identify from which reference picture in the reference picture memory the temporal prediction signal comes.
After spatial and/or temporal prediction is performed, the intra/inter mode decision circuit 121 in the encoder 100 selects the best prediction mode, for example, based on a rate-distortion optimization method. The block predictor 120 is then subtracted from the current video block, and the resulting prediction residual is de-correlated using the transform circuit 102 and the quantization circuit 104. The resulting quantized residual coefficients are de-quantized by the de-quantization circuit 116 and inverse transformed by the inverse transform circuit 118 to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Furthermore, in-loop filtering 115, such as a deblocking filter, a Sample Adaptive Offset (SAO), and/or an Adaptive Loop Filter (ALF), may be applied to the reconstructed CU before it is placed in the reference picture memory of the picture buffer 117 and used to code future video blocks. To form the output video bitstream 114, the (inter or intra) coding mode, prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit 106 to be further compressed and packed to form the bitstream.
For example, a deblocking filter is available in AVC, HEVC, as well as the current version of VVC. In HEVC, an additional in-loop filter called SAO (Sample Adaptive Offset) is defined to further improve coding efficiency. In the current version of the VVC standard, yet another in-loop filter called ALF (Adaptive Loop Filter) is being actively investigated, and it has a good chance of being included in the final standard.
These in-loop filter operations are optional. Performing them helps to improve coding efficiency and visual quality. They may also be turned off as a decision rendered by the encoder 100 to save computational complexity.
It should be noted that if the encoder 100 turns on these filter options, intra prediction is typically based on unfiltered reconstructed pixels, while inter prediction is based on filtered reconstructed pixels.
Like HEVC, the AVS3 standard builds upon the block-based hybrid video coding framework. The input video signal is processed block by block (called a CU). Unlike HEVC, which partitions blocks based only on quad-trees, in AVS3 one CTU is split into CUs based on a quad-tree/binary-tree/extended-quad-tree structure to adapt to varying local characteristics. Additionally, the concept of multiple partition unit types in HEVC is removed, i.e., the separation of CU, Prediction Unit (PU), and Transform Unit (TU) does not exist in AVS3; instead, each CU is always used as the basic unit for both prediction and transform without further partitioning.
In the tree partition structure of AVS3, one CTU is first divided based on a quad tree structure. The leaf nodes of each quadtree may then be further partitioned based on the binary tree and the extended quadtree structure.
Fig. 2A-2E illustrate examples of pictures partitioned into CTUs according to some embodiments of the present application. As shown in fig. 2A-2E, there are five partition types, including quad partition 282 in fig. 2A, horizontal binary partition 284 in fig. 2B, vertical binary partition 286 in fig. 2C, horizontally extended quadtree partition 288 in fig. 2D, and vertically extended quadtree partition 290 in fig. 2E.
In fig. 1, spatial prediction and/or temporal prediction may be performed. Spatial prediction (or "intra prediction") uses pixels from samples of already coded neighboring blocks (called reference samples) in the same video picture/slice to predict the current video block. Spatial prediction reduces the spatial redundancy inherent in video signals.
Temporal prediction (also referred to as "inter prediction" or "motion compensated prediction") uses reconstructed pixels from already-coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal. The temporal prediction signal for a given CU is usually signaled by one or more MVs that indicate the amount and direction of motion between the current CU and its temporal reference. Furthermore, if multiple reference pictures are supported, one reference picture index is additionally transmitted to identify from which reference picture in the reference picture memory the temporal prediction signal comes. After spatial and/or temporal prediction, a mode decision block in the encoder selects the best prediction mode based on, for example, a rate-distortion optimization method. The prediction block is then subtracted from the current video block, and the prediction residual is de-correlated using a transform and then quantized. The quantized residual coefficients are inverse quantized and inverse transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. In-loop filtering, such as a deblocking filter, SAO, and/or ALF, may be applied to the reconstructed CU before it is placed in the reference picture memory and used as a reference to code future video blocks. To form the output video bitstream, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to an entropy coding unit to be further compressed and packed.
Fig. 3 is a block diagram illustrating an exemplary block-based video decoder 200 that may be used in conjunction with many video coding standards, according to some embodiments of the present application. The decoder 200 is similar to the reconstruction-related section of the encoder 100 of Fig. 1. In the decoder 200, an incoming video bitstream 201 is first decoded through entropy decoding 202 to derive quantized coefficient levels and prediction-related information. The quantized coefficient levels are then processed through inverse quantization 204 and inverse transform 206 to obtain the reconstructed prediction residual. Based on the decoded prediction information, a block predictor mechanism implemented in the intra/inter mode selector 212 is configured to perform either intra prediction 208 or motion compensation 210. A set of unfiltered reconstructed pixels is obtained by adding the reconstructed prediction residual from the inverse transform 206 to the predictive output generated by the block predictor mechanism, using an adder 214.
The reconstructed block may further pass through a loop filter 209 before it is stored in a picture buffer 213, which functions as a reference picture memory. The reconstructed video in the picture buffer 213 may be sent to drive a display device, as well as used to predict future video blocks. When the loop filter 209 is turned on, a filtering operation is performed on these reconstructed pixels to derive the final reconstructed video output 222.
The coding mode and prediction information are sent to a spatial prediction unit (if intra-coded) or a temporal prediction unit (if inter-coded) of the prediction block. The residual transform coefficients are sent to inverse quantization 204 and inverse transform 206 to reconstruct the residual block. The prediction block and the residual block are then added. The reconstructed block may be further loop filtered before being stored in the reference picture memory. The reconstructed video in the reference picture store is then sent out for display and used to predict future video blocks.
The present application improves coding efficiency when SATD is used as a cost metric in the current VVC and AVS3 standards. For purposes of describing the invention, the transform schemes used in the current VVC and AVS standards are introduced first below. The error metrics used in RDO MV search are then briefly reviewed, which also explains why SATD-based metrics are typically used for natural content video coding. Thereafter, some disadvantages of the current SATD calculation are identified. Finally, the proposed improvements of the SATD calculation are provided in detail.
Low-Frequency Non-Separable Transform (LFNST)
In the current VVC specification, the LFNST tool, commonly known as a secondary transform, is applied to further compact the energy of the transform coefficients of intra-coded blocks after the primary transform.
Fig. 4 illustrates an example of a Low-Frequency Non-Separable Transform (LFNST) applied between the forward primary transform and quantization at an encoder, and an inverse LFNST applied between de-quantization and the inverse primary transform at a decoder, according to some embodiments of the present application. As shown in Fig. 4, the LFNST 403 is applied between the forward primary transform 401 and quantization 405 at the encoder, and the inverse LFNST 404 is applied between de-quantization 402 and the inverse primary transform 406 at the decoder. Block B1 in Fig. 4 may involve 16 input coefficients for a 4×4 forward LFNST or 64 input coefficients for an 8×8 forward LFNST. Block B2 is associated with 8 input coefficients for a 4×4 inverse LFNST or 16 input coefficients for an 8×8 inverse LFNST.
In LFNST, a non-separable transform with a size selected based on the size of the coding block is applied, which can be described as the following matrix multiplication process. Assuming that LFNST is applied to a 4×4 block, the samples within the 4×4 block,

$$X = \begin{bmatrix} X_{00} & X_{01} & X_{02} & X_{03} \\ X_{10} & X_{11} & X_{12} & X_{13} \\ X_{20} & X_{21} & X_{22} & X_{23} \\ X_{30} & X_{31} & X_{32} & X_{33} \end{bmatrix},$$

are first serialized into a vector:

$$\vec{X} = \begin{bmatrix} X_{00} & X_{01} & X_{02} & X_{03} & X_{10} & X_{11} & X_{12} & X_{13} & X_{20} & X_{21} & X_{22} & X_{23} & X_{30} & X_{31} & X_{32} & X_{33} \end{bmatrix}^{T}$$

Then, LFNST is applied as

$$\vec{F} = T \cdot \vec{X}$$

where $\vec{F}$ denotes the transform coefficients after LFNST and $T$ is the transform kernel. In the above example, $T$ is a 16×16 matrix. The 16×1 coefficient vector $\vec{F}$ is subsequently reorganized into a 4×4 block according to a predefined scan order, where coefficients at the beginning of the vector are placed at the smaller scan indices in the 4×4 block.
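As a concrete illustration of the matrix-multiplication form above, the following is a minimal C++ sketch of a forward LFNST on one 4×4 block. The kernel contents and the type names here are hypothetical placeholders; the actual LFNST kernels are defined per intra-mode class and LFNST index in the VVC specification, and the final re-scan into a 4×4 block is omitted for brevity.

```cpp
#include <array>
#include <cstdint>

// Forward LFNST on a 4x4 block: serialize the block, then multiply by a
// 16x16 kernel T. The kernel values are placeholders, not the VVC kernels.
using Block4x4 = std::array<std::array<int32_t, 4>, 4>;
using Vec16    = std::array<int32_t, 16>;
using Kernel16 = std::array<std::array<int32_t, 16>, 16>;

Vec16 forwardLfnst4x4(const Block4x4& x, const Kernel16& T) {
    // Serialize the 4x4 block row by row into a 16x1 vector.
    Vec16 xVec{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            xVec[4 * r + c] = x[r][c];

    // F = T * xVec (matrix-vector product).
    Vec16 f{};
    for (int i = 0; i < 16; ++i)
        for (int j = 0; j < 16; ++j)
            f[i] += T[i][j] * xVec[j];
    return f;  // then reorganized into a 4x4 block by a predefined scan order
}
```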
Multiple Transform Selection (MTS)
In addition to the DCT2 transform that has been employed in HEVC, a Multiple Transform Selection (MTS) scheme is used in VVC for residual coding of both inter- and intra-coded blocks. Based on MTS, one transform is selected from multiple transforms of the DCT8 and DST7 types.
In the existing MTS design, two control flags are specified at the sequence level to enable MTS for intra and inter modes, respectively. When MTS is enabled at the sequence level, a CU-level flag is further signaled to indicate whether MTS is applied for the CU. According to the VVC specification, MTS is applied only to the luma component.
Furthermore, the MTS CU-level flag is signaled only when the following conditions are satisfied: 1) both the width and the height of the given block are less than or equal to 32; 2) the CBF flag is equal to 1; and 3) the horizontal and vertical coordinates of the last non-zero luma coefficient are both less than 16 (because the transform coefficients outside the top-left 16×16 region are forced to zero). If the MTS CU flag is equal to 0, DCT2 is applied in both directions. If the MTS CU flag is equal to 1, two other flags are additionally signaled to indicate the transform types in the horizontal and vertical directions, respectively. In terms of transform matrix precision, all MTS transform coefficients use 6-bit precision, the same as the DCT2 core transform.
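For illustration only, the three signaling conditions above can be collected into a single predicate, as in the following hedged C++ sketch; the function and variable names are assumptions, not names from the VVC reference software.

```cpp
// Sketch of the MTS CU-flag signaling conditions described above
// (hypothetical names, not VTM identifiers).
bool mtsFlagIsSignaled(int width, int height, bool cbf,
                       int lastCoeffX, int lastCoeffY) {
    return width <= 32 && height <= 32        // 1) block no larger than 32x32
        && cbf                                 // 2) CBF flag equals 1
        && lastCoeffX < 16 && lastCoeffY < 16; // 3) last non-zero coeff in top-left 16x16
}
```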
To keep all the transform sizes used in HEVC, all the transform kernels used in HEVC are kept the same in VVC, including the 4-point, 8-point, 16-point, and 32-point DCT-2 transforms and the 4-point DST-7 transform. Meanwhile, the VVC transform design also supports other transform kernels, including the 64-point DCT-2, the 4-point DCT-8, and the 8-point, 16-point, and 32-point DST-7 and DCT-8 transforms.
Error metrics in Rate Distortion Optimization (RDO) MV search
On the encoder side, RDO is typically used in order to search for good MVs efficiently. In RDO motion estimation (ME), the rate counts the bits for coding the MV, and the distortion is measured with some error metric on the prediction residual/error block. Commonly used error metrics include Sum of Squared Error (SSE), Sum of Absolute Difference (SAD), and SATD. SAD and SATD are less computationally complex and are therefore used more often than SSE. For natural content video coding, the transform is an efficient and necessary step in processing the residual for quantization. It is generally recognized and well known that applying a simple Hadamard transform to the prediction residual block and estimating the distortion on the transformed error in RDO more accurately accounts for the cost of the actual residual coding bits, in which transform coding will be involved, and thus results in better ME performance and better overall coding efficiency. Thus, for natural content video coding, SATD is more advantageous than SAD in RDO ME.
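To make the SATD metric concrete, the following is a minimal C++ sketch that computes a plain (unweighted) 4×4 SATD: the prediction residual is passed through a horizontal and a vertical 4-point Hadamard butterfly, and the absolute values of the transformed coefficients are summed. Omitting the usual normalization factor is a simplification; conventions vary across encoders.

```cpp
#include <cstdlib>

// Plain 4x4 SATD: forward 4x4 Hadamard transform of the residual, then the
// sum of absolute transformed coefficients.
int satd4x4(const int cur[4][4], const int pred[4][4]) {
    int d[4][4], m[4][4];
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            d[r][c] = cur[r][c] - pred[r][c];  // prediction residual

    // Horizontal 1-D Hadamard butterflies.
    for (int r = 0; r < 4; ++r) {
        int s0 = d[r][0] + d[r][2], s1 = d[r][1] + d[r][3];
        int t0 = d[r][0] - d[r][2], t1 = d[r][1] - d[r][3];
        m[r][0] = s0 + s1; m[r][1] = s0 - s1;
        m[r][2] = t0 + t1; m[r][3] = t0 - t1;
    }
    // Vertical 1-D Hadamard butterflies, then accumulate |coefficient|.
    int sum = 0;
    for (int c = 0; c < 4; ++c) {
        int s0 = m[0][c] + m[2][c], s1 = m[1][c] + m[3][c];
        int t0 = m[0][c] - m[2][c], t1 = m[1][c] - m[3][c];
        sum += std::abs(s0 + s1) + std::abs(s0 - s1)
             + std::abs(t0 + t1) + std::abs(t0 - t1);
    }
    return sum;  // often divided by 2 in practice to normalize
}
```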
For natural content video coding, SATD-based cost calculation more accurately accounts for the cost of the actual residual coding bits and is therefore commonly used for RDO ME. However, the SATD metric treats every transform coefficient as equally important to the RDO ME cost. In practice, the Direct Current (DC) coefficient is typically coded more efficiently than the Alternating Current (AC) coefficients, so this equal treatment in SATD leaves room for improvement.
In the present application, methods are proposed to improve the coding efficiency of SATD-based cost calculation by treating the DC and AC coefficients with different importance. In one example, the different importance may be represented by different weights applied to the DC and AC coefficients. These weights may be a set of predefined fixed values, or may be determined dynamically, e.g., depending on the content, on the CU size, or, if sub-block based SATD calculation is used, on the size ratio between the CU and its inner sub-blocks.
Weighting coefficients for SATD-based cost calculation
To more accurately account for the cost of the actual residual coding bits, the absolute values of the DC and AC coefficients are weighted differently after the SATD transform process.
In some examples, the weights applied to the DC and AC coefficients are predefined as a set of fixed values.
In some examples, the weights may be predefined empirically. For example, the DC coefficient may use a weight of 0.5, while the AC coefficients use a higher weight value, e.g., 0.75, when the AC coefficients have a greater influence on the cost calculation, or a lower value, e.g., 0.25, when the AC coefficients have a smaller influence on the cost calculation.
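As a hedged sketch of how such fixed weights could enter the cost, the following function applies one weight to the absolute value of the DC coefficient (index 0 of a row-major coefficient array, i.e., position (0,0) after the Hadamard transform) and another weight to all AC coefficients; the example values 0.5 and 0.75/0.25 come from the text above, and the function name is an assumption.

```cpp
#include <cstdlib>

// Weighted SATD cost: apply wDC to |DC| and wAC to every |AC| coefficient
// of an NxN transformed residual block (row-major). Example weights from
// the text: wDC = 0.5, wAC = 0.75 (or 0.25).
double weightedSatd(const int* coeff, int n, double wDC, double wAC) {
    double cost = 0.0;
    for (int i = 0; i < n * n; ++i) {
        double w = (i == 0) ? wDC : wAC;  // coeff[0] is the DC coefficient
        cost += w * std::abs(coeff[i]);
    }
    return cost;
}
```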
The weights may also be predefined as multiple sets of fixed values, and the selection among the sets may depend on certain conditions. In one example, for image content that is rich in detail (e.g., containing various sharp edges), the differences between the weight values may tend to be smaller, so that a set with close weight values may be selected.
If sub-block based SATD calculation is performed within a CU, the weights may be predefined with respect to the size ratio between the CU and its inner sub-blocks. In one example, the weight applied to the DC coefficient may be half of the weight applied to the AC coefficients if the total number of samples within a sub-block is greater than one-fourth of the total number of samples within the CU to which the sub-block belongs, and the weight applied to the DC coefficient may be one-fourth of the weight applied to the AC coefficients if the total number of samples within the sub-block is less than one-fourth of the total number of samples within the CU.
In another example, if the total number of samples within a sub-block is less than one-eighth of the total number of samples within the CU to which the sub-block belongs, the weight applied to the DC coefficient may be one-quarter of the weight applied to the AC coefficient.
The weights may also be predefined as a hierarchical ordering, as in the sketch below. For example, if a sub-block partition within a CU is larger than one-fourth of the size of the CU, the weight applied to the absolute value of the DC coefficient is the same as the weight applied to the AC coefficients. If the sub-block partition is smaller than one-fourth of the CU size but larger than one-sixteenth of the CU size, the weight applied to the DC coefficient is two-quarters of the weight applied to the AC coefficients. If the sub-block partition is smaller than one-sixteenth of the CU size, the weight applied to the DC coefficient is one-quarter of the weight applied to the AC coefficients.
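The hierarchical rule can be sketched as follows; the function name is hypothetical, the thresholds mirror the one-fourth and one-sixteenth boundaries just described, and the handling of exact-boundary cases is an assumption.

```cpp
// Hedged sketch of the hierarchical DC-weight rule: the DC weight is
// derived from the AC weight based on the sub-block to CU size ratio.
double dcWeightFromRatio(int subBlockSamples, int cuSamples, double wAC) {
    if (subBlockSamples * 4 > cuSamples)    // larger than 1/4 of the CU
        return wAC;                         // same weight as the AC coefficients
    if (subBlockSamples * 16 > cuSamples)   // between 1/16 and 1/4 of the CU
        return wAC * 2.0 / 4.0;             // "two-quarters" of the AC weight
    return wAC / 4.0;                       // 1/16 of the CU or smaller
}
```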
In some other examples, the weights applied to the DC and AC coefficients are dynamically determined.
The weights may be calculated by comparing the value of the DC coefficient from the sub-block SATD transform process with the value of the DC coefficient from the down-sampled CU to which the sub-block belongs. For example, if the absolute value of the DC coefficient from the sub-block SATD transform process is half of the DC coefficient value from the down-sampled current CU, the weight applied to the DC coefficient of the current sub-block may be two-quarters.
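A hedged sketch of this dynamic determination is shown below. Only the two-quarters example from the text is specified; the fallback for other ratios is an assumption and is marked as such in the code.

```cpp
#include <cstdlib>

// Dynamic DC weight: compare the sub-block's DC coefficient against the DC
// of the down-sampled CU it belongs to (hypothetical function name).
double dynamicDcWeight(int subBlockDc, int downsampledCuDc) {
    // Example from the text: if |sub-block DC| is half of the CU-level DC,
    // the DC weight is two-quarters. Other cases are not specified in the
    // text, so a neutral weight of 1.0 is used here as a placeholder.
    if (downsampledCuDc != 0 &&
        2 * std::abs(subBlockDc) == std::abs(downsampledCuDc))
        return 2.0 / 4.0;
    return 1.0;  // placeholder for unspecified cases
}
```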
The methods described above may be implemented using an apparatus comprising one or more circuits including an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components. The apparatus may use circuitry in combination with other hardware or software components to perform the above-described methods. Each module, sub-module, unit or sub-unit disclosed above may be implemented, at least in part, using one or more circuits.
Fig. 5 is a block diagram illustrating an exemplary video codec device according to some embodiments of the present application. The apparatus 500 may be a terminal, such as a mobile phone, a tablet device, a digital broadcast terminal, or a personal digital assistant.
As shown in fig. 5, the apparatus 500 may include one or more of the following components: processing component 502, memory 504, power component 506, multimedia component 508, audio component 510, input/output (I/O) interface 512, sensor component 514, and communication component 516.
The processing component 502 generally controls overall operation of the device 500, such as operations related to display, telephone calls, data communications, camera operations, and recording operations. The processing component 502 may include one or more processors 520 for executing instructions to perform all or a portion of the steps of the above-described method. Further, the processing component 502 can include one or more modules to facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store different types of data to support the operation of the apparatus 500. Examples of such data include instructions for any application or method operating on the apparatus 500, contact data, phonebook data, messages, pictures, videos, and so on. The memory 504 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power supply component 506 provides power to the various components of the device 500. The power components 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 500.
The multimedia component 508 includes a screen that provides an output interface between the device 500 and a user. In some examples, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen that receives an input signal from a user. The touch panel may include one or more touch sensors for sensing touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some examples, the multimedia component 508 may include a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 500 is in an operating mode, such as a shooting mode or a video mode.
The audio component 510 is configured to output and/or input audio signals. For example, audio component 510 includes a Microphone (MIC). The microphone is configured to receive external audio signals when the apparatus 500 is in an operational mode, such as a talk mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 504 or transmitted through the communication component 516. In some examples, audio component 510 also includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and a peripheral interface module. The peripheral interface module can be a keyboard, a click wheel, a button and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor assembly 514 includes one or more sensors for providing a status assessment of various aspects of the apparatus 500. For example, the sensor assembly 514 may detect the on/off state of the apparatus 500 and the relative positions of components, such as the display and the keypad of the apparatus 500. The sensor assembly 514 may also detect a change in position of the apparatus 500 or a component of the apparatus 500, the presence or absence of user contact with the apparatus 500, the orientation or acceleration/deceleration of the apparatus 500, and a change in temperature of the apparatus 500. The sensor assembly 514 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 514 may further include an optical sensor, such as a CMOS or CCD image sensor used in imaging applications. In some examples, the sensor assembly 514 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate wired or wireless communication between the apparatus 500 and other devices. The apparatus 500 may access a wireless network based on a communication standard such as WiFi, 4G, or a combination thereof. In one example, the communication component 516 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In one example, communications component 516 can further include a Near Field Communication (NFC) module for facilitating short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and the like.
In one example, the apparatus 500 may be implemented by an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components to perform the above-described methods.
The non-transitory computer-readable storage medium may be, for example, a Hard Disk Drive (HDD), a Solid State Drive (SSD), flash memory, a hybrid drive or Solid State Hybrid Drive (SSHD), read Only Memory (ROM), compact disc read only memory (CD-ROM), magnetic tape, a floppy disk, and the like.
Fig. 6 is a flow chart illustrating an exemplary video codec method according to some embodiments of the present application.
In step 602, the processor 520 determines a first weighted transform coefficient by applying a first weight to the first transform coefficient.
In step 604, processor 520 determines a second weighted transform coefficient by applying a second weight to the second transform coefficient.
In step 606, the processor 520 determines a rate distortion based on the first and second weighted transform coefficients.
In some examples, processor 520 may determine the rate distortion by summing absolute values of a plurality of weighted transform coefficients including the first weighted transform coefficient and the second weighted transform coefficient.
In some examples, processor 520 may first identify a corresponding block of the video block, which may be a sub-block in the CU, and then generate a residual block by calculating a sample-by-sample difference between the video block and the corresponding block. Further, processor 520 may generate a transform block by applying a transform to the residual block. Further, processor 520 may obtain the plurality of weighted transform coefficients by applying different weights to the plurality of transform coefficients of the transform block at different locations. For example, the plurality of transform coefficients may include the first and second transform coefficients. The first and second transform coefficients may correspond to different locations.
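Putting the steps of Fig. 6 together for one 4×4 sub-block, a hedged end-to-end sketch might look as follows; hadamard4x4Residual is an assumed helper that computes the residual between the video block and its corresponding block and applies the Hadamard transform, as in the earlier sketches.

```cpp
#include <cstdlib>

// Assumed helper (see the earlier SATD sketch): residual + 4x4 Hadamard,
// writing the 16 transform coefficients in row-major order.
void hadamard4x4Residual(const int cur[4][4], const int pred[4][4],
                         int coeff[16]);

// End-to-end sketch for one 4x4 sub-block:
// residual -> transform (steps 602/604 inputs) -> weighting -> cost (606).
double weightedSatdCost4x4(const int cur[4][4], const int pred[4][4],
                           double wDC, double wAC) {
    int coeff[16];
    hadamard4x4Residual(cur, pred, coeff);
    double cost = 0.0;
    for (int i = 0; i < 16; ++i)  // weighted sum of absolute coefficients
        cost += ((i == 0) ? wDC : wAC) * std::abs(coeff[i]);
    return cost;
}
```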
In some examples, the transform may be, but is not limited to, a Hadamard transform.
In some examples, processor 520 may determine the first and second transform coefficients based on a Sum of Absolute Transformed Differences (SATD) transform process.
In some examples, the first transform coefficient is a DC coefficient and the second transform coefficient is an AC coefficient.
In some examples, processor 520 may apply a second weight to the second transform coefficient by applying the second weight to a plurality of transform coefficients including the second transform coefficient, wherein the plurality of transform coefficients are AC coefficients.
In some examples, processor 520 may also determine a third weighted transform coefficient by applying a third weight to a third transform coefficient, where the first transform coefficient is a DC coefficient and the second and third transform coefficients are AC coefficients.
In some examples, processor 520 may determine the rate-distortion by determining the rate-distortion further based on the third weighted transform coefficient.
In some examples, processor 520 may predefine a first value and a second value for the first weight and the second weight, respectively.
In some examples, processor 520 may predefine the first value and the second value by determining a plurality of sets of fixed values, selecting a target set from the plurality of sets, and determining the first value and the second value from the target set.
In some examples, processor 520 may select a group having close fixed values as the target group in response to determining that the amount of detail of the associated image content is greater than a predetermined amount. For example, the difference between adjacent values in the group having close fixed values may be within a predetermined range, such as 0.1 or 0.05.
In some examples, processor 520 may predefine the first and second values by predefining the first and second values based on a size ratio between the CU and an inner sub-block within the CU.
In some examples, processor 520 may predefine the first value as half of the second value in response to determining that the total number of samples within the sub-block is greater than one-fourth of the total number of samples within the CU. Further, in response to determining that the total number of samples within the sub-block is less than one-fourth of the total number of samples within the CU, processor 520 may predefine the first value as one-fourth of the second value. Further, in response to determining that the total number of samples within the sub-block is less than one-eighth of the total number of samples within the CU, processor 520 may predefine the first value as one-fourth of the second value.
In some examples, processor 520 may predefine the first and second values based on a hierarchical order.
In some examples, processor 520 may predefine the first value to be the same as the second value in response to determining that sub-blocks within a CU are greater than one-fourth of the size of the CU. Further, processor 520 may predefine the first value as two-quarters of the second value in response to determining that sub-blocks within the CU are less than one-quarter of the size of the CU and greater than one-sixteenth of the size of the CU. Further, processor 520 may predefine the first value as one-fourth of the second value in response to determining that sub-blocks within the CU are less than one sixteenth of the size of the CU.
In some examples, processor 520 may dynamically determine a first value for the first weight and a second value for the second weight. Further, the processor may dynamically determine the first value based on a comparison between a third value of the first transform coefficient from the sub-block SATD transform process and a fourth value of the first transform coefficient from a downsampled CU that includes the sub-block.
In some examples, processor 520 may dynamically determine the first value to be two-quarters in response to determining that the third value is half of the fourth value.
In some examples, an apparatus for video coding is provided. The apparatus includes one or more processors 520 and memory 504, the memory 504 configured to store instructions executable by the one or more processors. The processor 520 is configured to perform the method illustrated in fig. 6 when executing instructions.
In some examples, a non-transitory computer-readable storage medium for video coding is provided. The non-transitory computer-readable storage medium stores computer-executable instructions that, when executed by the one or more computer processors 520, cause the one or more computer processors 520 to perform the method illustrated in fig. 6.
The description of the present application has been presented for purposes of illustration and is not intended to be exhaustive or limited to the application. Many modifications, variations and alternative embodiments will become apparent to those skilled in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings.
The example was chosen and described in order to explain the principles of the application and to enable others of ordinary skill in the art to understand the application for various implementations and to best utilize the general principles and various implementations with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of the application is not to be limited to the specific examples of the disclosed embodiments and that modifications and other embodiments are intended to be included within the scope of the application.

Claims (19)

1. A video encoding and decoding method, comprising:
determining a first weighted transform coefficient by applying a first weight to the first transform coefficient;
determining a second weighted transform coefficient by applying a second weight to the second transform coefficient; and
determining a rate distortion based on the first weighted transform coefficient and the second weighted transform coefficient.
2. The method of claim 1, further comprising:
the first weighted transform coefficient and the second transform coefficient are determined based on a sum of absolute transformed differences SATD transform process.
3. The method of claim 1, wherein the first transform coefficient is a Direct Current (DC) coefficient and the second transform coefficient is an Alternating Current (AC) coefficient.
4. The method of claim 1, wherein applying the second weight to the second transform coefficient comprises:
applying the second weight to a plurality of transform coefficients including the second transform coefficient,
wherein the plurality of transform coefficients are AC coefficients.
5. The method of claim 1, further comprising:
determining a third weighted transform coefficient by applying a third weight to the third transform coefficient,
wherein the first transform coefficient is a DC coefficient, and
the second transform coefficient and the third transform coefficient are AC coefficients.
6. The method of claim 5, wherein determining the rate distortion comprises:
determining the rate distortion further based on the third weighted transform coefficient.
7. The method of claim 1, further comprising:
a first value and a second value are predefined for the first weight and the second weight, respectively.
8. The method of claim 7, wherein predefining the first value and the second value comprises:
predefining the first value to be 0.5; and
predefining the second value as 0.75 or 0.25.
9. The method of claim 7, wherein predefining the first and second values comprises:
determining a plurality of groups of fixed values;
selecting a target group from the plurality of groups; and
determining the first value and the second value from the target set.
10. The method of claim 9, wherein selecting the target group comprises:
in response to determining that the amount of detail of the associated image content is greater than a predetermined amount, selecting a group having close fixed values as the target group.
11. The method of claim 7, wherein predefining the first and second values comprises:
predefining the first value and the second value based on a size ratio between a coding unit and an inner subblock within the coding unit.
12. The method of claim 11, wherein predefining the first value and the second value based on the size ratio comprises:
predefining the first value as half of the second value in response to determining that the total number of samples within the sub-block is greater than one-fourth of the total number of samples within the coding unit;
predefining the first value as one quarter of the second value in response to determining that the total number of samples within the sub-block is less than one quarter of the total number of samples within the coding unit; or
predefining the first value as one quarter of the second value in response to determining that the total number of samples within the sub-block is less than one eighth of the total number of samples within the coding unit.
13. The method of claim 7, wherein predefining the first and second values comprises:
predefining the first value and the second value based on a hierarchical order.
14. The method of claim 13, wherein predefining the first value and the second value based on the hierarchical order comprises:
predefining the first value to be the same as the second value in response to determining that a sub-block within a coding unit is greater than one-quarter of a size of the coding unit;
predefining the first value as two-quarters of the second value in response to determining that sub-blocks within the coding unit are less than one-quarter of the size of the coding unit and greater than one-sixteenth of the size of the coding unit; or
predefining the first value as one quarter of the second value in response to determining that a sub-block within the coding unit is less than one sixteenth of the size of the coding unit.
15. The method of claim 1, further comprising:
dynamically determining a first value for the first weight and a second value for the second weight.
16. The method of claim 15, wherein dynamically determining the first value and the second value comprises:
dynamically determining the first value based on a comparison between a third value of the first transform coefficient from a sub-block SATD transform process and a fourth value of the first transform coefficient from a down-sampled coding unit that includes the sub-block.
17. The method of claim 16, wherein dynamically determining the first value based on the comparison comprises:
determining the first value to be two-quarters in response to determining that the third value is half of the fourth value.
18. A video encoding and decoding apparatus, comprising:
one or more processors; and
a memory configured to store instructions executable by the one or more processors, wherein, upon execution of the instructions, the one or more processors are configured to perform the method of any of claims 1-17.
19. A non-transitory computer-readable storage medium for encoding and decoding video blocks having stored thereon computer-executable instructions that, when executed by one or more computer processors, cause the one or more computer processors to perform the method of any one of claims 1-17.
CN202180040503.XA 2020-06-05 2021-06-07 Method and apparatus for encoding and decoding video using SATD-based cost computation Pending CN115699759A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202063035601P 2020-06-05 2020-06-05
US63/035,601 2020-06-05
PCT/US2021/036241 WO2021248135A1 (en) 2020-06-05 2021-06-07 Methods and apparatuses for video coding using satd based cost calculation

Publications (1)

Publication Number Publication Date
CN115699759A (en)

Family

ID=78831617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180040503.XA Pending CN115699759A (en) 2020-06-05 2021-06-07 Method and apparatus for encoding and decoding video using SATD-based cost computation

Country Status (2)

Country Link
CN (1) CN115699759A (en)
WO (1) WO2021248135A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9161057B2 (en) * 2009-07-09 2015-10-13 Qualcomm Incorporated Non-zero rounding and prediction mode selection techniques in video encoding
US9538190B2 (en) * 2013-04-08 2017-01-03 Qualcomm Incorporated Intra rate control for video encoding based on sum of absolute transformed difference
KR102114252B1 (en) * 2013-07-05 2020-05-22 삼성전자 주식회사 Method and apparatus for deciding video prediction mode
KR101928185B1 (en) * 2017-05-15 2018-12-11 홍익대학교 산학협력단 Apparatus for hevc coding and method for predicting coding unit depth range using the same
US10779012B2 (en) * 2018-12-04 2020-09-15 Agora Lab, Inc. Error concealment in video communications systems

Also Published As

Publication number Publication date
WO2021248135A1 (en) 2021-12-09

Similar Documents

Publication Publication Date Title
TWI711300B (en) Signaling for illumination compensation
CN113784132B (en) Method and apparatus for motion vector rounding, truncation, and storage for inter prediction
CN113824959B (en) Method, apparatus and storage medium for video encoding
EP2890128A1 (en) Video encoder with block merging and methods for use therewith
US20240146950A1 (en) Methods and apparatuses for decoder-side motion vector refinement in video coding
CN116547972A (en) Network-based image filtering for video codec
CN114128263A (en) Method and apparatus for adaptive motion vector resolution in video coding and decoding
WO2021188598A1 (en) Methods and devices for affine motion-compensated prediction refinement
CN117223284A (en) Network-based image filtering for video codec
CN116491120A (en) Method and apparatus for affine motion compensated prediction refinement
CN116171576A (en) Method and apparatus for affine motion compensated prediction refinement
CN115699759A (en) Method and apparatus for encoding and decoding video using SATD-based cost computation
CN114342390B (en) Method and apparatus for prediction refinement for affine motion compensation
CN114080808A (en) Method and apparatus for decoder-side motion vector refinement in video coding
CN114402618A (en) Method and apparatus for decoder-side motion vector refinement in video coding and decoding
CN114051732A (en) Method and apparatus for decoder-side motion vector refinement in video coding
WO2022026480A1 (en) Weighted ac prediction for video coding
CN117643053A (en) Network-based video codec image filtering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination