CN113498609B - Picture resolution dependent configuration for video codec - Google Patents

Picture resolution dependent configuration for video codec

Info

Publication number
CN113498609B
CN113498609B (application CN201980092938.1A)
Authority
CN
China
Prior art keywords
motion vector
level
picture
motion
buffer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201980092938.1A
Other languages
Chinese (zh)
Other versions
CN113498609A (en)
Inventor
陈漪纹
王祥林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Publication of CN113498609A publication Critical patent/CN113498609A/en
Application granted granted Critical
Publication of CN113498609B publication Critical patent/CN113498609B/en

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119 - Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 - Selection of coding mode or of prediction mode
    • H04N19/109 - Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, characterised by memory arrangements
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 - Motion estimation or motion compensation
    • H04N19/513 - Processing of motion vectors
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 - Motion estimation or motion compensation
    • H04N19/513 - Processing of motion vectors
    • H04N19/517 - Processing of motion vectors by encoding
    • H04N19/52 - Processing of motion vectors by encoding by predictive encoding

Abstract

A video codec method is performed at a computing device having one or more processors and a memory storing a plurality of programs to be executed by the one or more processors. The method comprises the following steps: selecting a first temporal motion vector prediction compression scheme in response to any one of a first picture resolution, a first profile, or a first level; and selecting a second temporal motion vector prediction compression scheme in response to any one of a second picture resolution, a second profile, or a second level.

Description

Picture resolution dependent configuration for video codec
Cross Reference to Related Applications
The present application claims priority to U.S. Provisional Patent Application Serial No. 62/787,240, filed on December 31, 2018. The entire disclosure of the above application is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates generally to video coding and compression. More particularly, the present disclosure relates to systems and methods for performing video coding using inter prediction.
Background
This section provides background information related to the present disclosure. The information contained in this section should not necessarily be construed as prior art.
Video data may be compressed using any of a variety of video codec techniques. Video encoding and decoding may be performed according to one or more video coding standards. Some illustrative video coding standards include Versatile Video Coding (VVC), the Joint Exploration test Model (JEM), High Efficiency Video Coding (H.265/HEVC), Advanced Video Coding (H.264/AVC), and Moving Picture Experts Group (MPEG) coding. Video codecs typically utilize prediction methods (e.g., inter prediction, intra prediction, etc.) that exploit redundancy inherent in video pictures or sequences. One goal of video codec technology is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality.
Disclosure of Invention
This section provides a general summary of the disclosure, and is not a comprehensive disclosure of its full scope or all of its features.
According to a first aspect of the present disclosure, a video codec method is performed at a computing device having one or more processors and a memory storing a plurality of programs to be executed by the one or more processors. The method comprises the following steps: selecting a first temporal motion vector prediction compression scheme in response to any one of a first picture resolution, a first profile, or a first level; and selecting a second temporal motion vector prediction compression scheme in response to any one of a second picture resolution, a second profile, or a second level.
According to a second aspect of the present disclosure, a video codec method is performed at a computing device having one or more processors and a memory storing a plurality of programs to be executed by the one or more processors. The method comprises the following steps: selecting a first motion vector precision level for storing a first motion vector in a motion vector buffer, wherein the selecting is performed in response to any one of a first picture resolution, a first profile, or a first level associated with a first picture; and selecting a second motion vector precision level for storing a second motion vector in the motion vector buffer, wherein the selecting is performed in response to any one of a second picture resolution, a second profile, or a second level associated with a second picture; wherein the first motion vector precision level is different from the second motion vector precision level.
According to a third aspect of the present disclosure, a video codec method is performed at a computing device having one or more processors and a memory storing a plurality of programs to be executed by the one or more processors. The method comprises the following steps: selecting a first minimum allowable block size for performing motion compensation, wherein the selecting is performed in response to any one of a first picture resolution, a first profile, or a first level associated with a first picture; and selecting a second minimum allowable block size for performing motion compensation, wherein the selecting is performed in response to any one of a second picture resolution, a second profile, or a second level associated with a second picture; wherein the first minimum allowable block size is different from the second minimum allowable block size.
Drawings
Hereinafter, a set of illustrative, non-limiting embodiments of the present disclosure will be described in conjunction with the accompanying drawings. Variations in structures, methods, or functions may be implemented by one of ordinary skill in the relevant art based on the examples provided herein and such variations are intended to be included within the scope of the present disclosure. In the absence of conflict, the teachings of the different embodiments may, but need not, be combined with one another.
FIG. 1 is a block diagram illustrating an illustrative VVC Test Model 3 (VTM-3) encoder.
Fig. 2 is a graphical depiction of a picture divided into a plurality of Coding Tree Units (CTUs).
FIG. 3 illustrates a multi-type tree structure with multiple segmentation modes.
Fig. 4A shows an example of a block-based 4-parameter affine motion model for VTM-3.
Fig. 4B shows an example of a block-based 6-parameter affine motion model for VTM-3.
Fig. 5 is a graphical depiction of an affine Motion Vector Field (MVF) organized into a plurality of sub-blocks.
Fig. 6A shows a set of spatially neighboring blocks used by the sub-block based temporal motion vector prediction (SbTMVP) process in Versatile Video Coding.
Fig. 6B shows a sub-block based temporal motion vector prediction (SbTMVP) process for deriving sub-Coding Unit (CU) motion fields by applying motion shifts from spatial neighbors and scaling the motion information from the corresponding co-located sub-CUs.
Fig. 7A shows the representative Motion Vectors (MVs) for the 16:1 MV compression used in High Efficiency Video Coding (HEVC).
FIG. 7B illustrates a representative Motion Vector (MV) for 4:1 MV compression used in VTM-3.
Fig. 8A shows a representative Motion Vector (MV) for vertical 8:1 MV compression.
Fig. 8B shows a representative Motion Vector (MV) for horizontal 8:1 MV compression.
Detailed Description
The terminology used in the present disclosure is intended to be in the nature of examples rather than of limitations. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" also refer to the plural forms unless the context clearly dictates otherwise. It is to be understood that the term "and/or" as used herein refers to any one or all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms "first," "second," "third," etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one category of information from another. For example, first information may be referred to as second information without departing from the scope of the present disclosure; and similarly, second information may also be referred to as first information. As used herein, the term "if" may be understood to mean "when," "upon," or "in response to," depending on the context.
Reference throughout this specification to "one embodiment," "an embodiment," "another embodiment," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment," "in another embodiment," and the like in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
At the 10th Joint Video Experts Team (JVET) meeting, held April 10-20, 2018 in San Diego, California, JVET defined the first draft of Versatile Video Coding (VVC) and the VVC Test Model 1 (VTM-1) encoding method. It was decided to include a quadtree structure with nested multi-type trees, using both binary-split and ternary-split coding block structures, as the initial new coding feature of VVC. Since then, the reference software VTM, implementing the encoding method and the draft VVC decoding process, has been developed during subsequent JVET meetings. As in most previous standards, VVC has a block-based hybrid coding architecture that combines inter-picture and intra-picture prediction and transform coding with entropy coding.
FIG. 1 is a block diagram illustrating an illustrative VVC Test Model 3 (VTM-3) encoder 100. An input video 102 comprising a plurality of pictures is applied to the non-inverting input of a first adder 104 and a switch 106. The output of the first adder 104 is connected to the input of the transform/quantization block 108. The output of the transform/quantization block 108 is fed to an input of an entropy coding block 110 and is also fed to an input of an inverse quantization/inverse transform block 111. The output of the inverse quantization/inverse transform block 111 is fed to a first non-inverting input of a second adder 112. The output of the second adder 112 is connected to the input of the loop filter 120. The output of loop filter 120 is connected to the input of Decoded Picture Buffer (DPB) 122.
The switch 106 connects the input video 102 to an input of an intra prediction block 114 or to a first input of a motion estimation/compensation block 116. The output of the intra prediction block 114 and the output of the motion estimation/compensation block 116 are both connected to an inverting input of the first adder 104 and to a second non-inverting input of the second adder 112. The output of DPB 122 is connected to motion estimation/compensation block 116.
In operation, the encoder 100 divides or partitions an input picture into a sequence of Coding Tree Units (CTUs). The CTU concept is substantially similar to that utilized in High Efficiency Video Coding (HEVC). For a picture with three sample arrays in the 4:2:0 YUV chroma subsampling format, a CTU includes a 2N x 2N block of luma samples and two corresponding N x N blocks of chroma samples.
Fig. 2 is a graphical depiction of a picture divided or partitioned into a plurality of Coding Tree Units (CTUs) 201, 202, 203 using a tree structure in VVC. In HEVC, each CTU 201, 202, 203 is partitioned into Coding Units (CUs) by using a quad-tree structure, represented as a coding tree or quadtree, to accommodate various local characteristics. A decision is made at the leaf CU level whether to encode the picture region using inter-picture (temporal) prediction or intra-picture (spatial) prediction. Each leaf CU may be further partitioned into one, two, or four Prediction Units (PUs) according to PU partition type. Within one PU, the same prediction process is applied and related information is sent to the decoder based on the PU. After obtaining the residual block by applying a prediction process based on the PU partition type, the leaf CU may be partitioned into Transform Units (TUs) according to another quadtree structure similar to the coding tree of the CU. One feature of the HEVC structure is that it utilizes multiple partitioning concepts including CUs, PUs, and TUs.
In VVC, a quadtree with nested multi-type trees, using binary and ternary split structures, replaces the concept of multiple partition unit types. This removes the separation of the CU, PU, and TU concepts (except where a CU has a size too large for the maximum transform length) and supports greater flexibility in CU partition shapes. In the coding tree structure, a CU may have either a square or a rectangular shape. Each Coding Tree Unit (CTU) 201, 202, 203 is first partitioned by a quadtree structure. The quadtree leaf nodes may then be further partitioned by a multi-type tree structure.
FIG. 3 illustrates the multi-type tree structure with its four splitting modes: vertical binary split (SPLIT_BT_VER) 301, horizontal binary split (SPLIT_BT_HOR) 302, vertical ternary split (SPLIT_TT_VER) 303, and horizontal ternary split (SPLIT_TT_HOR) 304. The multi-type tree leaf nodes are called Coding Units (CUs). Unless a CU is too large for the maximum transform length, this segmentation is used for prediction and transform processing without any further partitioning. This means that, in most cases, the CU, PU, and TU have the same block size in the quadtree with nested multi-type tree coding block structure. An exception occurs when the maximum supported transform length is smaller than the width or height of a color component of the CU.
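As a rough illustration of these four split modes, the following sketch (our own C++ pseudocode, not VTM source code) computes the child block dimensions each mode produces; the binary splits halve one dimension, while the ternary splits divide it into 1/4, 1/2, and 1/4 parts.

```cpp
#include <vector>

// Illustrative sketch (not VTM source code): the child block sizes produced
// by the four multi-type tree split modes on a W x H parent block. Type and
// function names here are ours.
enum class SplitMode { BT_VER, BT_HOR, TT_VER, TT_HOR };

struct Block { int w, h; };

std::vector<Block> applySplit(const Block& b, SplitMode m) {
    switch (m) {
        case SplitMode::BT_VER:  // vertical binary split 301
            return { {b.w / 2, b.h}, {b.w / 2, b.h} };
        case SplitMode::BT_HOR:  // horizontal binary split 302
            return { {b.w, b.h / 2}, {b.w, b.h / 2} };
        case SplitMode::TT_VER:  // vertical ternary split 303
            return { {b.w / 4, b.h}, {b.w / 2, b.h}, {b.w / 4, b.h} };
        case SplitMode::TT_HOR:  // horizontal ternary split 304
            return { {b.w, b.h / 4}, {b.w, b.h / 2}, {b.w, b.h / 4} };
    }
    return {};
}
// e.g. a 32x32 block under TT_VER yields 8x32, 16x32, and 8x32 children.
```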
For each inter-predicted CU, motion parameters consisting of motion vectors, reference picture indices, and reference picture list usage index, together with any additional information needed by new coding features of VVC, are used for inter-prediction sample generation. The motion parameters may be signaled explicitly or implicitly. When a CU is coded in skip mode, the CU is associated with one PU and has no significant residual coefficients, no coded motion vector delta, and no reference picture index. A merge mode is specified whereby the motion parameters for the current CU are obtained from neighboring CUs, including spatial and temporal candidates, as well as additional candidates introduced in VVC. The merge mode can be applied to any inter-predicted CU, not only to skip mode. The alternative to merge mode is the explicit transmission of motion parameters, where the motion vector, the corresponding reference picture index for each reference picture list, the reference picture list usage flag, and other needed information are signaled explicitly for each CU.
In addition to the inter-coding features in HEVC, VTM3 includes a number of new and improved inter-prediction coding tools as listed below:
- Extended merge prediction
- Merge mode with MVD (MMVD)
- Affine motion compensated prediction
- Sub-block based temporal motion vector prediction (SbTMVP)
- Adaptive motion vector resolution (AMVR)
- Motion field storage: 1/16th-luma-sample MV storage and 8 x 8 motion field compression
- Bi-prediction with weighted average (BWA)
- Bi-directional optical flow (BDOF)
- Triangle partition prediction
- Combined inter and intra prediction (CIIP)
The following paragraphs provide details regarding selected inter prediction methods specified in VVC.
extended merge prediction is performed in VVC as follows. In VTM3, a merge candidate list is constructed by sequentially including the following five types of candidates:
1) Spatial MVP from spatially neighboring CUs
2) Temporal MVP from co-located CUs
3) History-based MVP from FIFO tables
4) Paired average MVP
5) Zero MV.
The size of the merge list is signaled in the slice header. In VTM-3, the maximum allowed size of the merge list is 6. For each CU coded in merge mode, the index of the best merge candidate is encoded using truncated unary binarization (TU). The first bin of the merge index is coded with context, and bypass coding is used for the other bins.
Affine motion compensated prediction in VVC is performed as follows. In HEVC, only a translational motion model is applied for Motion Compensated Prediction (MCP), despite the fact that there are many kinds of motion in the real world, e.g., zoom in/out, rotation, perspective motion, and other irregular motions.
Fig. 4A shows an example of the block-based 4-parameter affine motion model 401 of VTM-3, and fig. 4B shows an example of the block-based 6-parameter affine motion model 402 of VTM-3. Models 401 and 402 are used in conjunction with the motion compensation process of VTM-3. In the 4-parameter affine motion model 401, the affine motion field of a given block is described by the motion vectors of two control points, $v_0$ and $v_1$. In the 6-parameter affine motion model 402, the affine motion field of a given block is described by the motion vectors of three control points, $v_0$, $v_1$, and $v_2$.
For the 4-parameter affine motion model 401, the motion vector at sample position $(x, y)$ in a block is derived as:

$$\begin{cases} mv_x = \dfrac{mv_{1x} - mv_{0x}}{W}\,x - \dfrac{mv_{1y} - mv_{0y}}{W}\,y + mv_{0x} \\[4pt] mv_y = \dfrac{mv_{1y} - mv_{0y}}{W}\,x + \dfrac{mv_{1x} - mv_{0x}}{W}\,y + mv_{0y} \end{cases} \tag{1}$$

For the 6-parameter affine motion model 402, the motion vector at sample position $(x, y)$ in a block is derived as:

$$\begin{cases} mv_x = \dfrac{mv_{1x} - mv_{0x}}{W}\,x + \dfrac{mv_{2x} - mv_{0x}}{H}\,y + mv_{0x} \\[4pt] mv_y = \dfrac{mv_{1y} - mv_{0y}}{W}\,x + \dfrac{mv_{2y} - mv_{0y}}{H}\,y + mv_{0y} \end{cases} \tag{2}$$

where $(mv_{0x}, mv_{0y})$ is the motion vector of the top-left corner control point, $(mv_{1x}, mv_{1y})$ is the motion vector of the top-right corner control point, $(mv_{2x}, mv_{2y})$ is the motion vector of the bottom-left corner control point, and $W$ and $H$ are the width and height of the block.
Fig. 5 is a graphical depiction of an affine Motion Vector Field (MVF) 501 organized into a plurality of sub-blocks 502, 503 and 504. To simplify motion compensated prediction, a block-based affine transformation prediction process is applied. For purposes of illustration, each of the plurality of sub-blocks 502, 503, and 504 is assumed to be a 4 x 4 luminance sub-block. In order to derive a motion vector for each 4×4 luminance sub-block, a motion vector of a center sample of each sub-block as shown in fig. 5 is calculated according to the foregoing equations (1) and (2), and rounded to a fractional precision of 1/16. For example, the motion vector of the center sample of sub-block 502 is shown as motion vector 505. A motion compensated interpolation filter is then applied to generate a prediction for each sub-block with the derived motion vector. The sub-block size of the chrominance component is also set to 4×4. The Motion Vector (MV) for a 4 x 4 chroma sub-block is calculated as an average of the MVs of four corresponding 4 x 4 luma sub-blocks. As in the case of translational motion inter prediction, there are also two affine motion inter prediction modes: affine merge mode and affine AMVP mode.
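The following sketch illustrates how equations (1) and (2) might be applied to derive per-sub-block MVs as described above. It is a minimal C++ illustration with our own naming, not the VTM implementation, and it assumes control-point MVs are given in 1/16-pel units.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hedged sketch of applying equations (1) and (2): one MV is derived per
// 4x4 luma sub-block, evaluated at the sub-block's center sample and
// rounded to 1/16-pel precision.
struct Mv { int32_t x, y; };  // components in 1/16-pel units

std::vector<Mv> deriveAffineSubblockMvs(Mv mv0, Mv mv1, Mv mv2,
                                        int blkW, int blkH, bool sixParam) {
    std::vector<Mv> mvs;
    for (int y = 0; y < blkH; y += 4) {
        for (int x = 0; x < blkW; x += 4) {
            const double cx = x + 2.0, cy = y + 2.0;       // center of the 4x4 sub-block
            const double a = double(mv1.x - mv0.x) / blkW;  // horizontal gradient of mv_x
            const double b = double(mv1.y - mv0.y) / blkW;  // horizontal gradient of mv_y
            double c, d;                                    // vertical gradients
            if (sixParam) {
                c = double(mv2.x - mv0.x) / blkH;
                d = double(mv2.y - mv0.y) / blkH;
            } else {  // 4-parameter model: vertical gradients follow from rotation/zoom
                c = -b;
                d = a;
            }
            mvs.push_back({ int32_t(std::lround(a * cx + c * cy + mv0.x)),
                            int32_t(std::lround(b * cx + d * cy + mv0.y)) });
        }
    }
    return mvs;
}
```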
The VTM supports the sub-block based temporal motion vector prediction (SbTMVP) method in VVC. Similar to Temporal Motion Vector Prediction (TMVP) in HEVC, SbTMVP uses the motion field in the co-located picture to improve motion vector prediction and merge mode for CUs in the current picture. The same co-located picture used by TMVP is used for SbTMVP. SbTMVP differs from TMVP in two ways:
1. TMVP predicts motion at CU level, but SbTMVP predicts motion at sub-CU level;
2. the TMVP extracts a temporal motion vector from a co-located block in the co-located picture (the co-located block is the lower right block or center block relative to the current CU), while the SbTMVP applies a motion shift before extracting the temporal motion information from the co-located picture, where the motion shift is obtained from a motion vector from one of the spatially neighboring blocks of the current CU.
Fig. 6A shows the set of spatially neighboring blocks used by the sub-block based temporal motion vector prediction (SbTMVP) process in Versatile Video Coding, and fig. 6B shows the SbTMVP process deriving a sub-Coding-Unit (CU) motion field by applying a motion shift from a spatial neighbor and scaling the motion information from the corresponding co-located sub-CUs. SbTMVP predicts the motion vectors of the sub-CUs within the current CU in two steps. In the first step, the spatial neighbors in fig. 6A are checked in the order of A1 601, B1 604, B0 603, and A0 602. As soon as a first spatially neighboring block is identified that has a motion vector using the co-located picture as its reference picture, that motion vector is selected as the motion shift to be applied. If no such motion vector is identified among the spatial neighbors, the motion shift is set to (0, 0).
In the second step, as shown in fig. 6B, the motion shift identified in step 1 is applied (i.e., added to the coordinates of the current block) to obtain sub-CU-level motion information (motion vector and reference index) from the co-located picture. The example in fig. 6B assumes that the motion shift is set to the motion of block A1 601. Then, for each sub-CU, the motion information of its corresponding block (the smallest motion grid covering the center sample) in the co-located picture is used to derive the motion information for the sub-CU. After identifying the motion information of the co-located sub-CU, the motion information of the co-located sub-CU is converted to a motion vector and a reference index of the current sub-CU in a similar manner as the TMVP process of HEVC, wherein temporal motion scaling is applied to align the reference picture of the temporal motion vector with the reference picture of the current CU.
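A minimal sketch of the first step described above might look as follows; the data types and field names are assumptions made for illustration rather than VTM APIs.

```cpp
#include <array>

// Illustrative sketch of SbTMVP step 1: scan the spatial neighbors in the
// order A1, B1, B0, A0 and select the first motion vector whose reference
// picture is the co-located picture; otherwise use (0, 0).
struct Mv { int x, y; };

struct Neighbor {
    bool available;   // neighbor block exists and is inter-coded
    int  refPicIdx;   // index of the picture its MV points to
    Mv   mv;
};

Mv selectMotionShift(const std::array<Neighbor, 4>& nbrs /* A1, B1, B0, A0 */,
                     int colPicIdx) {
    for (const Neighbor& n : nbrs) {
        if (n.available && n.refPicIdx == colPicIdx)
            return n.mv;  // first match provides the motion shift
    }
    return {0, 0};  // no suitable neighbor: zero motion shift
}
```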
In VTM3, a combined sub-block based merge list containing both SbTMVP candidates and affine merge candidates is used for sub-block based merge mode signaling. The SbTMVP mode is enabled/disabled by a Sequence Parameter Set (SPS) flag. If the SbTMVP mode is enabled, the SbTMVP predictor is added as the first entry of the list of sub-block based merge candidates, followed by the affine merge candidates. The size of the sub-block based merge list is signaled in the SPS, and the maximum allowed size of the sub-block based merge list is 5 in VTM3.
The sub-CU size used in the SbTMVP is fixed to 8×8, and as is done with affine merge mode, the SbTMVP mode is only applicable to CUs with both width and height greater than or equal to 8. The coding logic of the additional SbTMVP merge candidates is the same as that of the other merge candidates, that is, an additional RD check is performed for each CU in the P-slice or B-slice to decide whether to use the SbTMVP candidates.
Profiles, tiers, and levels: Video coding standards such as H.264/AVC, H.265/HEVC, and VVC are designed to be generic in the sense that they serve a wide range of applications, bit rates, resolutions, qualities, and services. Applications cover, among other things, digital storage media, television broadcasting, and real-time communications. In creating such a specification, various requirements from typical applications are considered, the necessary algorithmic elements are developed, and these are integrated into a single syntax comprising multiple feature sets. These feature sets may be implemented independently or in any of various combinations. Accordingly, the specification facilitates video data interchange among a variety of different applications.
However, since implementing the full feature set of such a specification may be impractical, a limited number of subsets of these features are specified by means of "profiles", "tiers", and "levels". A "profile" is a subset of the overall bitstream syntax specified by the standard. Within the bounds imposed by the syntax of a given profile, the performance of encoders and decoders may still vary greatly depending on the values taken by syntax elements in the bitstream (such as the specified size of the decoded pictures). In many applications, it is neither practical nor economical to implement a decoder capable of dealing with all hypothetical uses of the syntax within a particular profile.
To address this problem, "tiers" and "levels" are specified within each profile. A level of a tier is a specified set of constraints imposed on the values of syntax elements in the bitstream. These constraints may be simple limits on values. Alternatively, they may take the form of constraints on arithmetic combinations of values (e.g., picture width times picture height times the number of pictures decoded per second). A level specified for a lower tier is more constrained than a level specified for a higher tier.
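As a concrete illustration of a constraint on an arithmetic combination of values, the following sketch checks a hypothetical limit on luma samples processed per second; the numeric limit in the usage note is invented for illustration and is not taken from any actual profile, tier, or level definition.

```cpp
#include <cstdint>

// Hedged example of a level-style constraint of the kind described above:
// an upper bound on picture width x picture height x pictures decoded per
// second. The limit value is a made-up placeholder, not a real level limit.
bool withinLevelLimit(uint64_t width, uint64_t height, uint64_t picturesPerSec,
                      uint64_t maxLumaSamplesPerSec) {
    return width * height * picturesPerSec <= maxLumaSamplesPerSec;
}

// e.g. withinLevelLimit(1920, 1080, 60, 125000000) is true, since
// 1920 * 1080 * 60 = 124,416,000 <= 125,000,000 (hypothetical limit).
```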
Due to the inherent nature of TMVP, all the motion information of reference pictures needs to be stored in order to perform temporal Motion Vector (MV) prediction. In HEVC, the smallest available block storing this motion information is 16 x 8/8 x 16. However, to reduce the size of the temporal MV buffer, a motion information compression scheme was introduced in HEVC. Under this scheme, each picture is divided into 16 x 16 blocks, and only the motion information of the top-left 4 x 4 block in each 16 x 16 block is used as the representative motion for all 4 x 4 blocks within that 16 x 16 block. Since one 4 x 4 MV is stored to represent sixteen 4 x 4 blocks, this method may be referred to as 16:1 MV compression.
Fig. 7A shows the representative Motion Vectors (MVs) for the 16:1 MV compression used in High Efficiency Video Coding (HEVC), and fig. 7B shows the representative MVs for the 4:1 MV compression used in VTM-3. As shown in fig. 7A, the representative 4 x 4 blocks of each 16 x 16 block are denoted as A 701, B 703, C 705, and D 707. In the current VVC (VTM-3.0), a 4:1 MV compression scheme is used. As shown in fig. 7B, the MV of the top-left 4 x 4 block (denoted as A 711, B 713, C 715, ..., P 717) of each 8 x 8 block is used to represent the MVs of all 4 x 4 blocks within the same 8 x 8 block.
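The representative-MV lookup common to both schemes can be sketched as follows; the function and type names are ours, and the region size parameter selects between the HEVC-style 16:1 scheme (16 x 16 regions) and the VTM-3-style 4:1 scheme (8 x 8 regions).

```cpp
// Illustrative sketch of representative-MV lookup under N:1 temporal MV
// compression: every 4x4 block maps to the top-left 4x4 block of its
// enclosing region. Names are ours, not from any reference software.
struct BlockPos { int x, y; };  // position in luma samples

BlockPos representativeBlock(BlockPos p, int regionSize /* 16 or 8 */) {
    // Snap the position down to the region origin; the MV stored there
    // represents all 4x4 blocks inside the region.
    return { (p.x / regionSize) * regionSize,
             (p.y / regionSize) * regionSize };
}

// e.g. representativeBlock({20, 12}, 16) yields {16, 0} under 16:1
// compression, while representativeBlock({20, 12}, 8) yields {16, 8}.
```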
In the current version of VVC, higher MV precision for MV storage typically requires a larger MV buffer to store the MVs. We propose several methods to reduce the MV buffer size when higher MV precision (e.g., 1/8 or 1/16) is enabled. On the other hand, when a fixed number of bits (e.g., 16 bits for each MV component) is used to store MVs, using lower MV precision for MV storage can increase the effective range of the stored MVs.
Furthermore, Motion Compensation (MC) is typically the largest consumer of memory access bandwidth in decoder implementations. Thus, for a video coding standard, placing reasonable limits on MC memory access bandwidth requirements is extremely important to promote cost-effective implementations and ensure cross-industry success. The memory access bandwidth requirement of MC is typically determined by the operating block size and the type of prediction (e.g., uni-directional or bi-directional) to be performed. In the current version of VVC, there is no limitation in this respect. As a result, the worst-case bandwidth requirement is more than twice the corresponding worst case for HEVC. In VVC, the worst case of MC memory access bandwidth occurs with bi-directional MC of 4 x 4 blocks, which is utilized by some coding modes described in more detail below.
The proposed method
Note that the proposed methods described herein may be applied independently, or in any of various combinations.
Adaptive MV compression
Fig. 8A shows the representative Motion Vectors (MVs) for vertical 8:1 MV compression, and fig. 8B shows the representative MVs for horizontal 8:1 MV compression. To provide an improved tradeoff between the required TMVP buffer size and coding efficiency, we propose using either of two compression schemes, denoted horizontal 8:1 MV compression and vertical 8:1 MV compression. As shown in fig. 8A, for a first 16 x 8/8 x 16 block 801, the MV of the top-left 4 x 4 block 811 is used as the representative MV. Likewise, as shown in fig. 8B, for a second 16 x 8/8 x 16 block 802, the MV of the top-left 4 x 4 block 821 is used as the representative MV. Furthermore, we propose applying any of a number of different-ratio temporal MV compression schemes (e.g., 16:1, 4:1, horizontal 8:1, or vertical 8:1) in response to one or more video parameters, such as picture resolution (sometimes referred to as picture size), profile, or level.
According to one set of examples, 4:1 or 16:1 MV compression is applied to the temporal MV buffer in response to any of the picture resolution, profile, or level. In one exemplary embodiment, when the picture resolution is less than or equal to (1280 x 720), 4:1 MV compression is applied to the temporal MV buffer; when the picture resolution is greater than (1280 x 720), 16:1 MV compression is applied to the temporal MV buffer.
According to another set of examples, 4:1 or vertical 8:1 MV compression is applied to the temporal MV buffer in response to the picture resolution, profile, or level. In one exemplary embodiment, 4:1 MV compression is applied to the temporal MV buffer for picture resolutions less than or equal to (1280 x 720); for picture resolutions greater than (1280 x 720), vertical 8:1 MV compression is applied to the temporal MV buffer.
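A sketch of this resolution-dependent selection, combining the two example sets above, is shown below. Interpreting "less than or equal to (1280 x 720)" as a comparison of luma sample counts is our assumption, and the enum and function names are illustrative.

```cpp
// Hedged sketch of resolution-dependent temporal MV compression selection,
// with 1280x720 as the threshold used in the examples above.
enum class MvCompression { RATIO_4_TO_1, RATIO_16_TO_1, VERTICAL_8_TO_1 };

MvCompression selectTemporalMvCompression(int width, int height,
                                          bool useVertical8to1ForLarge) {
    if (width * height <= 1280 * 720)
        return MvCompression::RATIO_4_TO_1;  // small pictures: lower ratio
    return useVertical8to1ForLarge ? MvCompression::VERTICAL_8_TO_1
                                   : MvCompression::RATIO_16_TO_1;  // large pictures
}
```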
Adaptive MV precision for MV storage
We propose storing MVs in the MV buffer at a predefined or signaled MV precision.
According to one set of illustrative examples, each of the MVs is stored in an MV buffer at a respective predefined MV precision in response to one or more video parameters, such as picture resolution (sometimes referred to as picture size), profile, or level. It should be noted that the MV buffers mentioned here include any of a spatial MV buffer, a temporal MV buffer, or a spatial MV line buffer. According to the proposed examples, each of a plurality of respective MV precision levels may be used to store MVs into any of a plurality of corresponding MV buffers. Further, the respective MV precision level used for MV storage may be selected in response to the corresponding picture resolution.
According to one set of examples, when high MV precision (e.g., 1/8 or 1/16) is enabled, the proposed method stores the MVs used for temporal MV prediction at any of a number of different MV precisions (such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel) based on picture resolution, profile, or level. In particular, when all the CUs within one picture/slice are reconstructed, the MVs of each of these CUs are stored in a buffer (referred to as the temporal MV buffer) to be used for temporal MV prediction of one or more subsequent pictures/slices. We propose to store each of the respective MVs into the temporal MV buffer using a corresponding MV precision selected in response to picture resolution, profile, or level. For example, when the picture resolution is less than or equal to (1280 x 720), 1/16-pel MV precision is used to store the MVs in the temporal MV buffer; when the picture resolution is greater than (1280 x 720), 1/4-pel MV precision is used.
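A minimal sketch of such precision-reduced storage follows; it assumes MVs are produced at 1/16-pel precision and that the storage precision is expressed as a right-shift amount, with all names our own.

```cpp
#include <cstdint>

// Hedged sketch of precision-reduced MV storage: an MV component produced
// at 1/16-pel precision is rounded and right-shifted to the storage
// precision chosen for the picture (e.g. shift 2 for 1/4-pel storage when
// the resolution exceeds 1280x720, per the example above).
int32_t toStoragePrecision(int32_t mv16thPel, int shift /* 0..4 */) {
    // shift 0 keeps 1/16-pel; 1 -> 1/8-pel; 2 -> 1/4-pel; 3 -> 1/2-pel; 4 -> 1-pel
    const int32_t offset = (shift > 0) ? (1 << (shift - 1)) : 0;
    return (mv16thPel + offset) >> shift;
}

int32_t fromStoragePrecision(int32_t stored, int shift) {
    return stored << shift;  // restore 1/16-pel units for use in prediction
}
```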
In another set of examples, the size of the MV line buffer is reduced by storing the MVs used for spatial MV prediction across CTU rows at any one of a plurality of different MV precisions (such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel) in response to any of the picture resolution, profile, or level. In yet another set of examples, each of the MVs stored in the spatial MV buffer is stored at any of a plurality of different MV precisions (such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel) in response to the picture resolution, profile, or level. In other words, some of the MVs generated by the averaging or scaling process may have higher MV precision (1/16-pel or 1/8-pel), but the MVs stored in the spatial MV buffer for MV prediction are stored using a different, and possibly lower, MV precision. If stored at such a lower precision, the buffer size may be reduced.
In yet another set of examples, each of the MVs stored in the MV buffers is stored at any of a plurality of different MV precisions (such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel) in response to the picture resolution, profile, or level. In other words, the MVs generated by the averaging or scaling process may have higher MV precision (1/16-pel or 1/8-pel), but the MVs stored in each of the MV buffers for MV prediction are kept at a different, and possibly lower, MV precision. If stored at such a lower precision, the buffer sizes may be reduced.
In yet another set of examples, the MV precision level used to store MVs into the historical MV table (also referred to as the historical MV buffer) may have a different MV precision than the MV precision used to store MVs in the temporal MV buffer or the spatial MV buffer or the MV line buffer. For example, even when using a lower MV precision level to store MVs in a temporal MV buffer or a spatial MV buffer, MVs may be stored in a historical MV buffer using a higher MV precision level (e.g., 1/16 pixel).
Minimum block size for motion compensation
According to one set of examples, the minimum block size for motion compensation is determined in response to a video parameter such as picture resolution (also referred to as picture size), profile, or level. In one example, 4 x 4 blocks may be used for motion compensation for each respective picture having a corresponding resolution less than or equal to (1280 x 720), and 4 x 4 blocks are not available for motion compensation for each respective picture having a corresponding resolution greater than (1280 x 720). These block size constraints may also include sub-block size constraints for sub-block based inter modes such as the affine motion mode and sub-block based temporal motion vector prediction.
In another example, the minimum block size for motion compensation is likewise determined based on a video parameter such as picture resolution (also referred to as picture size), profile, or level. In this example, 4 x 4 blocks may be used for both uni-directional and bi-directional motion compensation for each picture having a resolution less than or equal to (1280 x 720), and 4 x 4 blocks are not available for bi-directional motion compensation for each picture with a resolution greater than (1280 x 720). The block size constraints may also include sub-block size constraints for sub-block based inter modes such as the affine motion mode and sub-block based temporal motion vector prediction.
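A sketch of this block-size gating, mirroring the two examples above, might look as follows; it reflects only the described examples and is not drawn from the VVC specification text.

```cpp
// Hedged sketch: whether a 4x4 block may be motion-compensated at all, and
// whether it may use bi-directional prediction, given the 1280x720
// threshold used in the examples above.
bool allow4x4MotionCompensation(int picWidth, int picHeight) {
    return picWidth * picHeight <= 1280 * 720;  // first example
}

bool allow4x4BiPrediction(int picWidth, int picHeight) {
    // Second example: larger pictures restrict 4x4 blocks to, at most,
    // uni-directional motion compensation.
    return picWidth * picHeight <= 1280 * 720;
}
```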
According to a first aspect of the present disclosure, a video codec method is performed at a computing device having one or more processors and a memory storing a plurality of programs to be executed by the one or more processors. The method comprises the following steps: selecting a first temporal motion vector prediction compression scheme in response to any one of a first picture resolution, a first profile, or a first level; and selecting a second temporal motion vector prediction compression scheme in response to any one of a second picture resolution, a second profile, or a second level.
In some examples, the first temporal motion vector compression scheme uses a first compression ratio and the second temporal motion vector compression scheme uses a second compression ratio that is different from the first compression ratio.
In some examples, the first compression ratio is selected to be less than the second compression ratio in response to the first picture resolution being less than or equal to the second picture resolution.
In some examples, the first compression ratio is selected to be greater than the second compression ratio in response to the first picture resolution being greater than the second picture resolution.
In some examples, the first compression ratio includes at least one of 16:1, 4:1, horizontal 8:1, or vertical 8:1.
According to a second aspect of the present disclosure, a video codec method is performed at a computing device having one or more processors and a memory storing a plurality of programs to be executed by the one or more processors. The method comprises the following steps: selecting a first motion vector precision level for storing a first motion vector in a motion vector buffer, wherein the selecting is performed in response to any one of a first picture resolution, a first profile, or a first level associated with a first picture; and selecting a second motion vector precision level for storing a second motion vector in the motion vector buffer, wherein the selecting is performed in response to any one of a second picture resolution, a second profile, or a second level associated with a second picture; wherein the first motion vector precision level is different from the second motion vector precision level.
In some examples, the motion vector buffer includes at least one of a spatial motion vector buffer, a temporal motion vector buffer, or a spatial motion vector line buffer.
In some examples, the first motion vector accuracy level includes any of 1/16 pixels, 1/8 pixels, 1/4 pixels, 1/2 pixels, or 1 pixel.
In some examples, the plurality of coding units are reconstructed within the first picture or within a slice of the first picture; each of a plurality of motion vectors for each of a plurality of coding units is stored in a temporal motion vector buffer; and the temporal motion vector buffer is used to perform prediction for one or more consecutive pictures after the first picture or one or more consecutive slices after the slice of the first picture.
In some examples, the first motion vector precision level is selected to be less than the second motion vector precision level in response to the first picture resolution being less than or equal to the second picture resolution.
In some examples, the spatial motion vector line buffer stores a plurality of motion vectors across the coding tree unit, the plurality of motion vectors including at least a first motion vector and a second motion vector, wherein the first motion vector is stored in the spatial motion vector line buffer at a first motion vector precision level and the second motion vector is stored in the spatial motion vector line buffer at a second motion vector precision level.
In some examples, the averaging or scaling process generates one or more motion vectors including at least the first motion vector. The one or more motion vectors are generated at a first motion vector accuracy level. The one or more motion vectors are stored in the spatial motion vector line buffer at a second motion vector precision level.
In some examples, the second motion vector accuracy level is selected to be less than the first motion vector accuracy level.
In some examples, the averaging or scaling process generates one or more motion vectors including at least the first motion vector. The one or more motion vectors are generated at a first motion vector accuracy level. The one or more motion vectors are stored in the spatial motion vector buffer, the temporal motion vector buffer, and the spatial motion vector line buffer at a second level of motion vector precision.
In some examples, the second motion vector accuracy level is selected to be less than the first motion vector accuracy level.
In some examples, the historical motion vector buffer stores a plurality of motion vectors including at least the first motion vector at a first motion vector precision level. The plurality of motion vectors are stored in at least one of a spatial motion vector buffer, a temporal motion vector buffer, or a spatial motion vector line buffer at a second level of motion vector precision.
According to a third aspect of the present disclosure, a video codec method is performed at a computing device having one or more processors and a memory storing a plurality of programs to be executed by the one or more processors. The method comprises the following steps: selecting a first minimum allowable block size for performing motion compensation, wherein the selecting is performed in response to any one of a first picture resolution, a first profile, or a first level associated with a first picture; and selecting a second minimum allowable block size for performing motion compensation, wherein the selecting is performed in response to any one of a second picture resolution, a second profile, or a second level associated with a second picture; wherein the first minimum allowable block size is different from the second minimum allowable block size.
In some examples, the first minimum allowable block size and the second minimum allowable block size are selected in response to a sub-block size constraint for at least one of affine motion prediction or sub-block-based temporal motion vector prediction.
In some examples, the first minimum allowable block size and the second minimum allowable block size are selected in response to at least one constraint for performing bi-directional motion compensation or uni-directional motion compensation.
In some examples, the first minimum allowable block size is greater than 4 x 4 blocks when the first picture has a first picture resolution greater than 1280 x 720.
In one or more examples, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium, and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media corresponding to tangible media, such as data storage media, or communication media, including any medium that facilitates transfer of a computer program from one place to another, for example, according to a communication protocol. In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium or (2) a communication medium, such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the embodiments described herein. The computer program product may include a computer-readable medium.
Furthermore, the above methods may be implemented using an apparatus comprising one or more circuits comprising an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components. The apparatus may use circuitry in combination with other hardware or software components to perform the methods described above. Each module, sub-module, unit, or sub-unit disclosed above may be implemented, at least in part, using one or more circuits.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following its general principles and including such departures from the present disclosure as come within known or customary practice in the art. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise examples described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. It is intended that the scope of the invention be limited only by the claims appended hereto.

Claims (15)

1. A video encoding and decoding method, comprising:
selecting a first temporal motion vector prediction compression scheme to apply to the temporal motion vector buffer to store a first motion vector, in response to any one of a first picture resolution, a first profile, or a first level associated with the first picture; and
in response to any one of a second picture resolution, a second profile, or a second level associated with a second picture, selecting a second temporal motion vector prediction compression scheme to apply to the temporal motion vector buffer to store a second motion vector,
wherein a temporal motion vector prediction compression scheme uses the motion vector of the top-left sub-block in a block as the representative motion vector of all sub-blocks in the block, wherein the sizes of the block and the sub-blocks are determined according to the compression ratio used by the temporal motion vector prediction compression scheme, and a first compression ratio used by the first temporal motion vector prediction compression scheme is different from a second compression ratio used by the second temporal motion vector prediction compression scheme,
wherein the first motion vector and the second motion vector are generated using an averaging or scaling procedure, and the first motion vector has a higher level of motion vector precision when generated using an averaging or scaling procedure than when stored in the temporal motion vector buffer, and the second motion vector has a higher level of motion vector precision when generated using an averaging or scaling procedure than when stored in the temporal motion vector buffer.
2. The video coding method of claim 1, further comprising: in response to the first picture resolution being less than or equal to the second picture resolution, the first compression ratio is selected to be less than the second compression ratio.
3. The video coding method of claim 1, further comprising: in response to the first picture resolution being greater than the second picture resolution, the first compression ratio is selected to be greater than the second compression ratio.
4. The video coding method of claim 1, wherein the first compression ratio and the second compression ratio comprise at least one of 16:1, 4:1, horizontal 8:1, or vertical 8:1.
5. A video encoding and decoding method, comprising:
selecting a first motion vector precision level to store a first motion vector in a motion vector buffer, in response to any one of a first picture resolution, a first profile, or a first level associated with a first picture; and
selecting a second motion vector precision level to store a second motion vector in the motion vector buffer, in response to any one of a second picture resolution, a second profile, or a second level associated with a second picture;
wherein the first motion vector accuracy level is different from the second motion vector accuracy level,
wherein the first motion vector and the second motion vector are generated using an averaging or scaling process, and the first motion vector has a higher level of motion vector precision than the first motion vector precision level when generated using the averaging or scaling process, and the second motion vector has a higher level of motion vector precision than the second motion vector precision level when generated using the averaging or scaling process.
6. The video coding method of claim 5, wherein the motion vector buffer comprises at least one of a spatial motion vector buffer, a temporal motion vector buffer, or a spatial motion vector line buffer.
7. The video coding method of claim 6, wherein the first motion vector precision level comprises any one of 1/16 pixels, 1/8 pixels, 1/4 pixels, 1/2 pixels, or 1 pixel.
8. The video coding method of claim 7, further comprising:
reconstructing a plurality of coding units within the first picture or within a slice of the first picture;
storing each of a plurality of motion vectors of each of the plurality of coding units in the temporal motion vector buffer; and
The temporal motion vector buffer is used to perform prediction for one or more consecutive pictures following the first picture or to perform prediction for one or more consecutive slices in pictures following the first picture.
9. The video coding method of claim 7, further comprising: in response to the first picture resolution being less than or equal to the second picture resolution, the first motion vector precision level is selected to be less than the second motion vector precision level.
10. The video coding method of claim 6, further comprising: the spatial motion vector line buffer is utilized to store motion vectors across coding tree units, wherein the motion vectors are either first motion vectors or second motion vectors, wherein the first motion vectors are stored in the spatial motion vector line buffer at the first motion vector precision level and the second motion vectors are stored in the spatial motion vector line buffer at the second motion vector precision level.
11. The video coding method of claim 6, wherein the first motion vector is stored to a historical motion vector buffer using a different level of precision than the first motion vector level of precision; the second motion vector is stored to a historical motion vector buffer using a different level of precision than the second motion vector level of precision.
12. A video encoding and decoding method, comprising:
selecting a first minimum allowable block size for performing motion compensation, in response to any one of a first picture resolution, a first profile, or a first level associated with a first picture; and
selecting a second minimum allowable block size for performing motion compensation, in response to any one of a second picture resolution, a second profile, or a second level associated with a second picture;
wherein the first minimum allowable block size is different from the second minimum allowable block size,
wherein a first motion vector of a block in the first picture and a second motion vector of a block in the second picture are generated using an averaging or scaling procedure, and the first motion vector has a higher level of motion vector precision when generated using the averaging or scaling procedure than when stored in a motion vector buffer, and the second motion vector has a higher level of motion vector precision when generated using the averaging or scaling procedure than when stored in the motion vector buffer.
13. The video coding method of claim 12, wherein the block size constraint for performing motion compensation comprises a sub-block size constraint for at least one of affine motion prediction or sub-block-based temporal motion vector prediction.
14. The video coding method of claim 12, further comprising: the first minimum allowable block size and the second minimum allowable block size are selected in response to at least one constraint for performing bi-directional motion compensation or uni-directional motion compensation.
15. The video coding method of claim 12, wherein the first minimum allowable block size is greater than 4 x 4 blocks when the first picture has a first picture resolution greater than 1280 x 720.
CN201980092938.1A 2018-12-31 2019-12-30 Picture resolution dependent configuration for video codec Active CN113498609B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201862787240P 2018-12-31 2018-12-31
US62/787,240 2018-12-31
PCT/US2019/069009 WO2020142468A1 (en) 2018-12-31 2019-12-30 Picture resolution dependent configurations for video coding

Publications (2)

Publication Number Publication Date
CN113498609A (en) 2021-10-12
CN113498609B (en) 2023-06-20

Family

ID=71407416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980092938.1A Active CN113498609B (en) 2018-12-31 2019-12-30 Picture resolution dependent configuration for video codec

Country Status (2)

Country Link
CN (1) CN113498609B (en)
WO (1) WO2020142468A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05227525A (en) * 1991-10-31 1993-09-03 Toshiba Corp Picture encoder
JP2008053875A (en) * 2006-08-23 2008-03-06 Sony Corp Image processor and method, program, and program storage medium
GB201104035D0 (en) * 2011-03-09 2011-04-20 Canon Kk Video encoding and decoding
WO2013002105A1 (en) * 2011-06-28 2013-01-03 ソニー株式会社 Image processing device and method
WO2017157264A1 (en) * 2016-03-14 2017-09-21 Mediatek Singapore Pte. Ltd. Method for motion vector storage in video coding and apparatus thereof

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6487249B2 (en) * 1998-10-09 2002-11-26 Matsushita Electric Industrial Co., Ltd. Efficient down conversion system for 2:1 decimation
KR100924850B1 (en) * 2002-01-24 2009-11-02 가부시키가이샤 히타치세이사쿠쇼 Moving picture signal coding method and decoding method
EP1784985B1 (en) * 2004-07-20 2017-05-10 Qualcomm Incorporated Method and apparatus for motion vector prediction in temporal video compression
KR20110017302A (en) * 2009-08-13 2011-02-21 삼성전자주식회사 Method and apparatus for encoding/decoding image by using motion vector accuracy control
US8594200B2 (en) * 2009-11-11 2013-11-26 Mediatek Inc. Method of storing motion vector information and video decoding apparatus
KR101752418B1 (en) * 2010-04-09 2017-06-29 엘지전자 주식회사 A method and an apparatus for processing a video signal
US9325990B2 (en) * 2012-07-09 2016-04-26 Qualcomm Incorporated Temporal motion vector prediction in video coding extensions
KR20170084055A (en) * 2014-11-06 2017-07-19 삼성전자주식회사 Video encoding method and apparatus, video decoding method and apparatus
WO2016165069A1 (en) * 2015-04-14 2016-10-20 Mediatek Singapore Pte. Ltd. Advanced temporal motion vector prediction in video coding
US10979732B2 (en) * 2016-10-04 2021-04-13 Qualcomm Incorporated Adaptive motion vector precision for video coding
WO2018212578A1 (en) * 2017-05-17 2018-11-22 주식회사 케이티 Method and device for video signal processing


Also Published As

Publication number Publication date
WO2020142468A1 (en) 2020-07-09
CN113498609A (en) 2021-10-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant