CN113498609A - Picture resolution dependent configuration for video coding and decoding - Google Patents

Info

Publication number: CN113498609A (application CN201980092938.1A)
Authority: CN (China)
Prior art keywords: motion vector, level, picture, motion, buffer
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN113498609B (en)
Inventors: 陈漪纹, 王祥林
Original and current assignee: Beijing Dajia Internet Information Technology Co., Ltd.

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/109 Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/423 Implementation details or hardware specially adapted for video compression or decompression, characterised by memory arrangements
    • H04N19/513 Processing of motion vectors
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding

Abstract

A video coding method is performed at a computing device having one or more processors and memory storing a plurality of programs to be executed by the one or more processors. The method includes: selecting a first temporal motion vector prediction compression scheme in response to any of a first picture resolution, a first tier, or a first level; and selecting a second temporal motion vector prediction compression scheme in response to any of a second picture resolution, a second tier, or a second level.

Description

Picture resolution dependent configuration for video coding and decoding
Cross Reference to Related Applications
This application claims priority to U.S. provisional patent application serial No. 62/787,240, filed on December 31, 2018. The entire disclosure of the above application is incorporated herein by reference.
Technical Field
The present disclosure relates generally to video coding and decoding and compression. More particularly, the present disclosure relates to systems and methods for performing video coding using inter-prediction.
Background
This section provides background information related to the present disclosure. The information contained in this section should not necessarily be construed as prior art.
Video data may be compressed using any of a variety of video codec techniques. Video coding may be performed according to one or more video coding standards. Some illustrative video coding standards include Versatile Video Coding (VVC), the Joint Exploration test Model (JEM), High Efficiency Video Coding (H.265/HEVC), Advanced Video Coding (H.264/AVC), and Moving Picture Experts Group (MPEG) coding. Video codecs typically utilize prediction methods (e.g., inter-prediction, intra-prediction, etc.) that exploit the redundancy inherent in video images or sequences. One goal of video codec techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality.
Disclosure of Invention
This section provides a general summary of the disclosure, and is not a comprehensive disclosure of its full scope or all of its features.
According to a first aspect of the present disclosure, a video coding method is performed at a computing device having one or more processors and a memory storing a plurality of programs to be executed by the one or more processors. The method includes: selecting a first temporal motion vector prediction compression scheme in response to any of a first picture resolution, a first tier, or a first level; and selecting a second temporal motion vector prediction compression scheme in response to any of a second picture resolution, a second tier, or a second level.
According to a second aspect of the present disclosure, a video coding method is performed at a computing device having one or more processors and a memory storing a plurality of programs to be executed by the one or more processors. The method includes: selecting a first motion vector precision level for storing a first motion vector in a motion vector buffer, wherein the selecting is performed in response to any of a first picture resolution, a first tier, or a first level associated with a first picture; and selecting a second motion vector precision level for storing a second motion vector in the motion vector buffer, wherein the selecting is performed in response to any of a second picture resolution, a second tier, or a second level associated with a second picture; wherein the first motion vector precision level is different from the second motion vector precision level.
According to a third aspect of the present disclosure, a video coding method is performed at a computing device having one or more processors and a memory storing a plurality of programs to be executed by the one or more processors. The method includes: selecting a first minimum allowable block size for performing motion compensation, wherein the selecting is performed in response to any of a first picture resolution, a first tier, or a first level associated with a first picture; and selecting a second minimum allowable block size for performing motion compensation, wherein the selecting is performed in response to any of a second picture resolution, a second tier, or a second level associated with a second picture; wherein the first minimum allowable block size is different from the second minimum allowable block size.
Drawings
In the following, a set of illustrative, non-limiting embodiments of the present disclosure will be described in connection with the accompanying drawings. Variations in structure, method, or function may be implemented by persons of ordinary skill in the relevant art based on the examples provided herein, and such variations are included within the scope of the present disclosure. The teachings of the different embodiments may, but need not, be combined with each other in case of conflict.
FIG. 1 is a block diagram illustrating an illustrative VVC Test Model 3 (VTM-3) encoder.
Fig. 2 is a graphical depiction of a picture divided into multiple Coding Tree Units (CTUs).
FIG. 3 illustrates a multi-type tree structure having multiple split types.
FIG. 4A illustrates an example of a block-based 4-parameter affine motion model for VTM-3.
FIG. 4B illustrates an example of a block-based 6-parameter affine motion model for VTM-3.
Fig. 5 is a graphical depiction of an affine Motion Vector Field (MVF) organized as a plurality of sub-blocks.
Fig. 6A shows a set of spatially neighboring blocks used by a sub-block based temporal motion vector prediction (SbTMVP) process in the context of Versatile Video Coding.
Fig. 6B illustrates a sub-block based temporal motion vector prediction (SbTMVP) process for deriving sub-Coding Unit (CU) motion fields by applying motion shifts from spatial neighbors and scaling the motion information from the corresponding co-located sub-CU.
Fig. 7A shows representative motion vectors (MVs) for the 16:1 MV compression used in High Efficiency Video Coding (HEVC).

Fig. 7B shows representative motion vectors (MVs) for the 4:1 MV compression used in VTM-3.

Fig. 8A shows representative motion vectors (MVs) for vertical 8:1 MV compression.

Fig. 8B shows representative motion vectors (MVs) for horizontal 8:1 MV compression.
Detailed Description
The terminology used in the present disclosure is intended to be illustrative of particular examples and is not intended to be limiting of the present disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It is to be understood that the term "and/or" as used herein refers to any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms "first," "second," "third," etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may be referred to as second information without departing from the scope of the present disclosure; and similarly, second information may also be referred to as first information. As used herein, the term "if" may be understood to mean "when" or "upon" or "in response to," depending on the context.
Reference throughout this specification to "one embodiment," "an embodiment," "another embodiment," or the like, in the singular or in the plural, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment," "in another embodiment," and the like, in the singular and plural, in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
At the 10th Joint Video Experts Team (JVET) meeting, held in San Diego, California, on April 10 to 20, 2018, JVET defined the first draft of Versatile Video Coding (VVC) and the VVC Test Model 1 (VTM-1) encoding method. A quadtree structure with nested multi-type trees, using binary-split and ternary-split coding block structures, was adopted as the initial new coding feature of VVC. Since then, the reference software (VTM) implementing the encoding method and the draft VVC decoding process has been developed over the course of subsequent JVET meetings. As in most previous standards, VVC has a block-based hybrid codec architecture that combines inter-picture and intra-picture prediction and transform coding with entropy coding.
Fig. 1 is a block diagram illustrating an illustrative VVC Test Model 3 (VTM-3) encoder 100. An input video 102 comprising a plurality of pictures is applied to non-inverting inputs of a first adder 104 and a switch 106. The output of the first adder 104 is connected to the input of the transform/quantization block 108. The output of the transform/quantization block 108 is fed to the input of an entropy coding block 110 and also to the input of an inverse quantization/inverse transform block 111. The output of the inverse quantization/inverse transform block 111 is fed to a first non-inverting input of a second adder 112. The output of the second adder 112 is connected to an input of a loop filter 120. The output of the loop filter 120 is connected to an input of a Decoded Picture Buffer (DPB) 122.
The switch 106 connects the input video 102 to an input of an intra prediction block 114 or to a first input of a motion estimation/compensation block 116. The output of the intra prediction block 114 and the output of the motion estimation/compensation block 116 are both connected to an inverting input of the first adder 104 and to a second non-inverting input of the second adder 112. The output of the DPB 122 is connected to a motion estimation/compensation block 116.
In operation, the encoder 100 divides or partitions an input picture into a sequence of Coding Tree Units (CTUs). The CTU concept is substantially similar to that utilized in High Efficiency Video Coding (HEVC). For a picture with three sample arrays in the 4:2:0 YUV chroma subsampling format, a CTU comprises a 2N × 2N block of luma samples and two corresponding N × N blocks of chroma samples.
Fig. 2 is a graphical depiction of a picture partitioned or divided into a plurality of Coding Tree Units (CTUs) 201, 202, 203 using a tree structure in VVC. In HEVC, each CTU 201, 202, 203 is partitioned into Coding Units (CUs) by using a quadtree structure, denoted as a coding tree, to accommodate various local characteristics. The decision whether to code a picture region using inter-picture (temporal) prediction or intra-picture (spatial) prediction is made at the leaf-CU level. Each leaf CU may be further partitioned into one, two, or four Prediction Units (PUs) according to the PU partition type. Within one PU, the same prediction process is applied, and the relevant information is transmitted to the decoder on a PU basis. After obtaining the residual block by applying the prediction process based on the PU partition type, the leaf CU may be partitioned into Transform Units (TUs) according to another quadtree structure similar to the coding tree of the CU. One feature of the HEVC structure is that it utilizes multiple partitioning concepts, including the CU, PU, and TU.
In VVC, a quadtree with nested multi-type trees using binary-split and ternary-split structures replaces the concept of multiple partition unit types. That is, it removes the separation of the CU, PU, and TU concepts, except as needed for CUs whose size is too large for the maximum transform length, and supports greater flexibility for CU partition shapes. In the coding tree structure, a CU may have either a square or a rectangular shape. Each Coding Tree Unit (CTU) 201, 202, 203 is first partitioned by a quadtree structure. The quadtree leaf nodes may then be further partitioned by a multi-type tree structure.
FIG. 3 illustrates a multi-type tree structure with four split types: vertical binary split (SPLIT_BT_VER) 301, horizontal binary split (SPLIT_BT_HOR) 302, vertical ternary split (SPLIT_TT_VER) 303, and horizontal ternary split (SPLIT_TT_HOR) 304. The multi-type tree leaf nodes are called Coding Units (CUs). This segmentation is used for prediction and transform processing without any further partitioning, unless the CU is too large for the maximum transform length. This means that, in most cases, the CU, PU, and TU have the same block size in the quadtree with nested multi-type tree coding block structure. An exception occurs when the maximum supported transform length is smaller than the width or height of a color component of the CU.
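As an illustration of these four split types, the following minimal Python sketch (the function name is ours, not taken from any codec implementation) returns the child block sizes produced by each split, with ternary splits dividing one side in the 1:2:1 ratio used by VVC:

```python
def split_block(w, h, mode):
    """Return the child block sizes produced by one multi-type tree split.

    Binary splits halve the block; ternary splits divide one side in the
    1:2:1 ratio, yielding three children.
    """
    if mode == "SPLIT_BT_VER":   # vertical binary split 301
        return [(w // 2, h), (w // 2, h)]
    if mode == "SPLIT_BT_HOR":   # horizontal binary split 302
        return [(w, h // 2), (w, h // 2)]
    if mode == "SPLIT_TT_VER":   # vertical ternary split 303
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
    if mode == "SPLIT_TT_HOR":   # horizontal ternary split 304
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    raise ValueError("unknown split mode: " + mode)
```

For example, a 32 × 32 node under SPLIT_TT_VER yields 8 × 32, 16 × 32, and 8 × 32 children.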
For each inter-predicted CU, the motion parameters, including motion vectors, reference picture indices, and reference picture list usage indices, together with any additional information needed by the new coding features of VVC, are used for inter-prediction sample generation. The motion parameters may be signaled explicitly or implicitly. When a CU is coded in skip mode, the CU is associated with one PU and has no significant residual coefficients, no coded motion vector delta, and no reference picture index. A merge mode is specified whereby the motion parameters of the current CU are obtained from neighboring CUs, including spatial and temporal candidates, as well as additional schedules introduced in VVC. The merge mode can be applied to any inter-predicted CU, not only to skip mode. The alternative to merge mode is explicit transmission of motion parameters, where the motion vector, the corresponding reference picture index for each reference picture list, the reference picture list usage flag, and other needed information are signaled explicitly for each CU.
In addition to the inter-coding features in HEVC, VTM-3 includes a number of new and refined inter-prediction coding tools, listed below:
-extended merge prediction
Merge mode with MVD (MMVD)
-affine motion compensated prediction
Sub-block based temporal motion vector prediction (SbTMVP)
-Adaptive Motion Vector Resolution (AMVR)
- Motion field storage: 1/16th luma sample MV storage and 8 × 8 motion field compression
-bi-prediction with weighted average (BWA)
Bidirectional optical flow (BDOF)
-triangle partition prediction
-Combined Inter and Intra Prediction (CIIP)
The following paragraphs provide details regarding the selected inter-prediction methods specified in VVC.
Extended merge prediction is performed in VVC as follows. In VTM-3, the merge candidate list is constructed by including the following five types of candidates in order:
1) spatial MVP from spatially neighboring CUs
2) Temporal MVP from co-located CUs
3) History-based MVP from FIFO tables
4) Pairwise averaged MVP
5) Zero MVs.
The size of the merge list is signaled in the slice header. In VTM-3, the maximum allowed size of the merge list is 6. For each CU coded in merge mode, the index of the best merge candidate is encoded using truncated unary (TU) binarization. The first bin of the merge index is coded with context, and bypass coding is used for the remaining bins.
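The truncated unary binarization mentioned above can be sketched as follows (an illustrative helper of ours, not codec source code). With a merge list of size 6, the merge index ranges over 0 to 5:

```python
def truncated_unary(index, max_index):
    """Truncated unary binarization of `index` with largest value `max_index`:
    `index` 1-bins followed by a terminating 0-bin, except that the largest
    value omits the terminator.  In VTM-3 the first bin of the merge index
    is context coded and the remaining bins are bypass coded."""
    bins = [1] * index
    if index < max_index:
        bins.append(0)  # terminator bin, not needed for the last value
    return bins
```

For example, truncated_unary(2, 5) gives [1, 1, 0], while truncated_unary(5, 5) gives [1, 1, 1, 1, 1] with no terminator.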
Affine motion compensated prediction in VVC is performed as follows. In HEVC, only the translational motion model is applied to Motion Compensated Prediction (MCP), despite the fact that there are many types of motion in the real world, e.g., zoom in/out, rotation, perspective motion, and other irregular motion.
Fig. 4A shows an example of a block-based 4-parameter affine motion model 401 for VTM-3, and Fig. 4B shows an example of a block-based 6-parameter affine motion model 402 for VTM-3. Models 401 and 402 are used in conjunction with the motion compensation process of VTM-3. In the 4-parameter affine motion model 401, the affine motion field of a given block is described by the motion vectors of two control points, v0 and v1. In the 6-parameter affine motion model 402, the affine motion field of a given block is described by the motion vectors of three control points, v0, v1, and v2.
For the 4-parameter affine motion model 401, the motion vector at sample position (x, y) in the block is derived as follows:

    mvx = ((mv1x - mv0x) / W) * x - ((mv1y - mv0y) / W) * y + mv0x
    mvy = ((mv1y - mv0y) / W) * x + ((mv1x - mv0x) / W) * y + mv0y        (1)
For the 6-parameter affine motion model 402, the motion vector at sample position (x, y) in the block is derived as follows:

    mvx = ((mv1x - mv0x) / W) * x + ((mv2x - mv0x) / H) * y + mv0x
    mvy = ((mv1y - mv0y) / W) * x + ((mv2y - mv0y) / H) * y + mv0y        (2)
where (mv0x, mv0y) is the motion vector of the top-left corner control point, (mv1x, mv1y) is the motion vector of the top-right corner control point, (mv2x, mv2y) is the motion vector of the bottom-left corner control point, and W and H are the width and height of the block.
Fig. 5 is a graphical depiction of an affine Motion Vector Field (MVF)501 organized into a plurality of sub-blocks 502, 503, and 504. To simplify motion compensated prediction, a block-based affine transform prediction process is applied. For purposes of illustration, each of the plurality of sub-blocks 502, 503, and 504 is assumed to be a 4 × 4 luma sub-block. To derive the motion vector for each 4 × 4 luma sub-block, the motion vector for the center sample point of each sub-block as shown in fig. 5 is calculated according to the aforementioned equations (1) and (2) and rounded to fractional precision 1/16. For example, the motion vector of the center sample of sub-block 502 is shown as motion vector 505. Then, a motion compensated interpolation filter is applied to generate a prediction for each sub-block with the derived motion vector. The subblock size of the chrominance component is also set to 4 × 4. The Motion Vector (MV) for the 4 × 4 chroma sub-block is calculated as the average of the MVs of the four corresponding 4 × 4 luma sub-blocks. As in the case of translational motion inter prediction, there are also two affine motion inter prediction modes: affine merge mode and affine AMVP mode.
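A minimal Python sketch of this sub-block MV derivation (our own illustration of the two affine models described above, not VTM source code) computes the MV at the center sample of each 4 × 4 luma sub-block and rounds it to 1/16-sample precision:

```python
def affine_mv(cp_mvs, w, h, x, y):
    """MV at sample position (x, y) of a w x h block, from 2 control-point
    MVs (4-parameter model) or 3 control-point MVs (6-parameter model)."""
    mv0x, mv0y = cp_mvs[0]  # top-left control point
    mv1x, mv1y = cp_mvs[1]  # top-right control point
    if len(cp_mvs) == 2:
        mvx = (mv1x - mv0x) / w * x - (mv1y - mv0y) / w * y + mv0x
        mvy = (mv1y - mv0y) / w * x + (mv1x - mv0x) / w * y + mv0y
    else:
        mv2x, mv2y = cp_mvs[2]  # bottom-left control point
        mvx = (mv1x - mv0x) / w * x + (mv2x - mv0x) / h * y + mv0x
        mvy = (mv1y - mv0y) / w * x + (mv2y - mv0y) / h * y + mv0y
    # Round to 1/16 fractional-sample precision.
    return round(mvx * 16) / 16, round(mvy * 16) / 16

def subblock_mvs(cp_mvs, w, h, sub=4):
    """MV at the center sample of every sub x sub luma sub-block."""
    return {(bx, by): affine_mv(cp_mvs, w, h, bx + sub / 2, by + sub / 2)
            for by in range(0, h, sub) for bx in range(0, w, sub)}
```

With identical control-point MVs the model degenerates to pure translation, so every sub-block receives the same MV.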
The VTM supports the subblock-based temporal motion vector prediction (SbTMVP) method in VVC. Similar to Temporal Motion Vector Prediction (TMVP) in HEVC, SbTMVP uses the motion field in the co-located picture to improve motion vector prediction and merge mode for CUs in the current picture. The same co-located picture used by TMVP is used for SbTMVP. SbTMVP differs from TMVP in two ways:
1. TMVP predicts motion at CU level, but SbTMVP predicts motion at sub-CU level;
2. Whereas TMVP fetches the temporal motion vector from the co-located block in the co-located picture (the co-located block is the bottom-right or center block relative to the current CU), SbTMVP applies a motion shift, obtained from the motion vector of one of the spatially neighboring blocks of the current CU, before fetching the temporal motion information from the co-located picture.
Fig. 6A shows the set of spatially neighboring blocks used by the sub-block based temporal motion vector prediction (SbTMVP) process in the context of Versatile Video Coding, and Fig. 6B shows the sub-block based temporal motion vector prediction (SbTMVP) process for deriving sub-Coding-Unit (CU) motion fields by applying a motion shift from a spatial neighbor and scaling the motion information from the corresponding co-located sub-CU. SbTMVP predicts the motion vectors of the sub-CUs within the current CU in two steps. In the first step, the spatial neighbors in Fig. 6A are examined in the order A1 (601), B1 (604), B0 (603), and A0 (602). As soon as a first spatially neighboring block is identified that has a motion vector using the co-located picture as its reference picture, this motion vector is selected as the motion shift to be applied. If no such motion vector is identified among the spatial neighbors, the motion shift is set to (0, 0).
In the second step, as shown in Fig. 6B, the motion shift identified in step 1 is applied (i.e., added to the coordinates of the current block) to obtain sub-CU-level motion information (motion vectors and reference indices) from the co-located picture. The example in Fig. 6B assumes the motion shift is set to the motion of block A1 (601). Then, for each sub-CU, the motion information of its corresponding block (the smallest motion grid covering the center sample) in the co-located picture is used to derive the motion information of the sub-CU. After the motion information of the co-located sub-CU is identified, it is converted into the motion vectors and reference indices of the current sub-CU in a manner similar to the TMVP process of HEVC, where temporal motion scaling is applied to align the reference pictures of the temporal motion vectors with those of the current CU.
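The first step of this process can be sketched as follows (an illustrative fragment with an assumed data layout, not decoder code): scan the spatial neighbors in the fixed order and take the first MV whose reference picture is the co-located picture, falling back to (0, 0):

```python
def sbtmvp_motion_shift(neighbors, colocated_ref):
    """Step 1 of SbTMVP: derive the motion shift.

    `neighbors` maps the labels "A1", "B1", "B0", "A0" to dicts with keys
    "mv" and "ref_pic" (a layout we assume here for illustration)."""
    for label in ("A1", "B1", "B0", "A0"):
        cand = neighbors.get(label)
        if cand is not None and cand["ref_pic"] == colocated_ref:
            return cand["mv"]  # first neighbor referencing the co-located picture
    return (0, 0)              # no such neighbor: zero motion shift
```

The returned shift is then added to the current block's coordinates in the second step before fetching sub-CU motion from the co-located picture.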
In VTM-3, a combined subblock-based merge list, containing both the SbTMVP candidate and the affine merge candidates, is used for the signaling of the subblock-based merge mode. The SbTMVP mode is enabled/disabled by a Sequence Parameter Set (SPS) flag. If the SbTMVP mode is enabled, the SbTMVP predictor is added as the first entry of the list of subblock-based merge candidates, followed by the affine merge candidates. The size of the subblock-based merge list is signaled in the SPS, and the maximum allowed size of the subblock-based merge list is 5 in VTM-3.
The sub-CU size used in SbTMVP is fixed at 8 × 8, and, as for affine merge mode, the SbTMVP mode is applicable only to CUs whose width and height are both greater than or equal to 8. The encoding logic for the additional SbTMVP merge candidate is the same as for the other merge candidates; that is, for each CU in a P slice or a B slice, an additional RD check is performed to decide whether to use the SbTMVP candidate.
Profile, tier and level: Video codec standards such as H.264/AVC, H.265/HEVC, and VVC are designed to be generic in the sense that they serve a wide range of applications, bit rates, resolutions, qualities, and services. Applications cover, among other things, digital storage media, television broadcasting, and real-time communications. In creating such a specification, various requirements from typical applications have been considered, the necessary algorithmic elements have been developed, and these have been integrated into a single syntax comprising a plurality of feature sets. These feature sets may be implemented independently or in any of various combinations. Accordingly, the specification facilitates video data interchange among a variety of different applications.
However, considering the practicality of implementing the full set of features of such a specification, a limited number of subsets of these features are specified by means of "profiles", "tiers", and "levels". A "profile" is a subset of the entire bitstream syntax specified in the specification. Within the bounds imposed by the syntax of a given profile, the performance of encoders and decoders can still vary greatly depending on the values taken by syntax elements in the bitstream, such as the specified size of the decoded pictures. In many applications, it is currently neither practical nor economical to implement a decoder capable of handling all hypothetical uses of the syntax within a particular profile.
To address this problem, "tiers" and "levels" are specified within each profile. A level of a tier is a specified set of constraints imposed on the values of syntax elements in the bitstream. These constraints may be simple limits on values. Alternatively, they may take the form of constraints on arithmetic combinations of values (e.g., picture width multiplied by picture height multiplied by the number of pictures decoded per second). A level specified for a lower tier is more constrained than a level specified for a higher tier.
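A constraint of the arithmetic-combination kind mentioned above can be checked as in this sketch (the function and the default limit are our own illustration; the numeric limit is roughly the maximum luma sample rate of HEVC Level 4.1 and is used only as an example):

```python
def within_level(width, height, pics_per_sec, max_luma_sample_rate=133_693_440):
    """Check the 'picture width x picture height x pictures decoded per
    second' style of level constraint against a maximum luma sample rate."""
    return width * height * pics_per_sec <= max_luma_sample_rate
```

Under this illustrative limit, 1080p60 fits (1920 * 1080 * 60 = 124,416,000 samples/s) while 4K60 does not.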
Due to the inherent nature of TMVP, all motion information of reference pictures needs to be stored in order to perform temporal motion vector (MV) prediction. In HEVC, the smallest available block for storing this motion information is 16 × 8/8 × 16. However, to reduce the size of the temporal MV buffer, a motion information compression scheme was introduced in HEVC. Under this scheme, each picture is divided into 16 × 16 blocks, and only the motion information of the top-left 4 × 4 block within each 16 × 16 block is used as the representative motion for all 4 × 4 blocks within that 16 × 16 block. Since one 4 × 4 MV is stored to represent sixteen 4 × 4 blocks, this method may be referred to as 16:1 MV compression.
Fig. 7A shows the representative MVs for the 16:1 motion vector (MV) compression used in High Efficiency Video Coding (HEVC), and fig. 7B shows the representative MVs for the 4:1 MV compression used in VTM-3. As shown in fig. 7A, the representative 4 × 4 blocks for the 16 × 16 blocks are denoted A 701, B 703, C 705, and D 707. In the current VVC (VTM-3.0), a 4:1 MV compression scheme is used. As shown in fig. 7B, the MV of the top-left 4 × 4 block (denoted A 711, B 713, C 715, ..., P 717) of each 8 × 8 block is used to represent the MVs of all 4 × 4 blocks within the same 8 × 8 block.
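Purely as an illustrative, non-normative sketch, the representative-MV subsampling described above may be expressed as follows. The function name and data layout (an MV field indexed per 4 × 4 block) are assumptions made for illustration, not part of the specification:

```python
def compress_mv_field(mv_field, unit=16):
    """Subsample a per-4x4 MV field: the MV of the top-left 4x4 block
    of each unit x unit region represents every 4x4 block in the region.
    mv_field[r][c] holds the MV of the 4x4 block at row r, column c.
    unit=16 gives 16:1 compression (HEVC); unit=8 gives 4:1 (VTM-3)."""
    step = unit // 4  # number of 4x4 blocks per side of a compression unit
    rows, cols = len(mv_field), len(mv_field[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # snap to the top-left 4x4 block of the enclosing unit
            out[r][c] = mv_field[(r // step) * step][(c // step) * step]
    return out
```

With `unit=8` (4:1 compression), every 2 × 2 group of 4 × 4 blocks shares the MV of its top-left member, matching the VTM-3 behavior of fig. 7B.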
In the current version of VVC, higher MV precision for MV storage requires a larger MV buffer to store the MVs. We propose several methods to reduce the size of the MV buffer when higher MV precision (e.g., 1/8-pel or 1/16-pel) is enabled. On the other hand, when the MVs are stored using a fixed number of bits (e.g., 16 bits for each MV component), using a lower MV precision for MV storage increases the effective range of the stored MVs.
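For illustration, assuming each MV component is stored as a 16-bit signed integer (an assumption made here; the specification does not mandate this), the effective full-pel range at different storage precisions can be computed as follows:

```python
def mv_range_pixels(bits=16, precision_denom=16):
    """Effective MV range in full-pel units when each MV component is
    stored in `bits` signed bits at 1/precision_denom-pel precision."""
    max_units = 2 ** (bits - 1) - 1  # largest storable signed value
    return max_units / precision_denom

# 1/16-pel storage: 32767/16 ~ 2048 pixels of reach per component;
# 1/4-pel storage:  32767/4  ~ 8192 pixels, a 4x larger effective range.
```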
Furthermore, motion compensation (MC) is typically the largest consumer of memory access bandwidth in decoder implementations. Thus, for a video coding standard, placing reasonable limits on MC memory access bandwidth requirements is extremely important to facilitate cost-effective implementations and to ensure cross-industry success. The memory access bandwidth requirement of MC is mainly determined by the size of the operating block and the type of prediction (e.g., uni-directional or bi-directional) to be performed. In the current version of VVC, there is no limitation in this respect. As a result, the worst-case bandwidth requirement is more than 2 times the corresponding worst-case bandwidth of HEVC. In VVC, the worst case of MC memory access bandwidth occurs for 4 × 4 blocks with bi-directional MC, which are utilized by some coding modes described in more detail below.
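As a hedged back-of-the-envelope illustration (assuming an 8-tap separable luma interpolation filter, as used in HEVC and VVC, and worst-case fractional MVs), the reference samples fetched per predicted pixel can be estimated as follows. Under these assumptions a 4 × 4 bi-predicted block needs about 15.1 samples per pixel, roughly 2.15 times the 7.0 of an 8 × 8 bi-predicted block, consistent with the "more than 2 times" figure above:

```python
def mc_samples_per_pixel(w, h, taps=8, bi=True):
    """Reference luma samples fetched per predicted pixel for a w x h
    block, assuming a `taps`-tap separable interpolation filter and a
    worst-case fractional MV. Bi-prediction doubles the fetch."""
    fetch = (w + taps - 1) * (h + taps - 1)  # padded reference area
    if bi:
        fetch *= 2  # two reference blocks are fetched
    return fetch / (w * h)
```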
Proposed method
Note that the proposed methods described herein can be applied independently, or in any of various combinations.
Adaptive MV compression
Fig. 8A shows the representative MVs for vertical 8:1 motion vector (MV) compression, and fig. 8B shows the representative MVs for horizontal 8:1 MV compression. To provide an improved trade-off between the required size of the TMVP buffer and coding efficiency, we propose to use either of two compression schemes, denoted horizontal 8:1 MV compression and vertical 8:1 MV compression. As shown in fig. 8A, for the first 16 × 8/8 × 16 block 801, the MV of the top-left 4 × 4 block 811 is used as the representative MV. Likewise, as shown in fig. 8B, for the second 16 × 8/8 × 16 block 802, the MV of the top-left 4 × 4 block 821 is used as the representative MV. Furthermore, we propose to apply any one of a plurality of temporal MV compression schemes with different ratios (e.g., 16:1, 4:1, horizontal 8:1, or vertical 8:1) in response to one or more video parameters, such as picture resolution (sometimes referred to as picture size), profile, or level.
According to one set of examples, 4:1 or 16:1 MV compression is applied to the temporal MV buffer in response to any one of picture resolution, profile, or level. In one exemplary embodiment, when the picture resolution is less than or equal to 1280 × 720, 4:1 MV compression is applied to the temporal MV buffer; when the picture resolution is greater than 1280 × 720, 16:1 MV compression is applied to the temporal MV buffer.
According to another set of examples, 4:1 or vertical 8:1 MV compression is applied to the temporal MV buffer in response to picture resolution, profile, or level. In one exemplary embodiment, 4:1 MV compression is applied to the temporal MV buffer for picture resolutions less than or equal to 1280 × 720; for picture resolutions greater than 1280 × 720, vertical 8:1 MV compression is applied to the temporal MV buffer.
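The resolution-dependent selection in the two example embodiments above can be sketched as follows. Comparing resolutions by luma sample count against the 1280 × 720 cut-off, and the helper name itself, are assumptions made for illustration:

```python
def select_mv_compression(width, height, threshold=(1280, 720)):
    """Pick a temporal-MV compression ratio from picture resolution,
    following the first example embodiment above (illustrative only)."""
    if width * height <= threshold[0] * threshold[1]:
        return "4:1"   # smaller pictures keep finer MV granularity
    return "16:1"      # larger pictures get a smaller temporal MV buffer
```

The second embodiment would return "vertical 8:1" instead of "16:1" for the larger pictures; only the returned scheme label changes.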
Adaptive MV precision for MV storage
We propose to store MVs into MV buffers at a predefined or signaled MV precision.
According to one set of illustrative examples, each of the MVs is stored in the MV buffer at a respective predefined MV precision in response to one or more video parameters, such as picture resolution (sometimes referred to as picture size), profile, or level. It should be noted that references herein to MV buffers include any of spatial MV buffers, temporal MV buffers, or spatial MV line buffers. According to the proposed example, each of a plurality of respective MV precision levels may be used to store MVs into any of a plurality of corresponding MV buffers. Further, the respective MV precision level used for MV storage may be selected in response to the corresponding picture resolution.
According to one set of examples, when a high MV precision (e.g., 1/8-pel or 1/16-pel) is enabled, the proposed method stores the MVs used for temporal MV prediction at any of a number of different MV precisions, such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel, based on picture resolution, profile, or level. In particular, after all CUs within one picture/slice are reconstructed, the MV of each of these CUs is stored in a buffer (referred to as the temporal MV buffer) for use as temporal MV predictors for one or more subsequent pictures/slices. We propose to store each of the respective MVs into the temporal MV buffer using a corresponding MV precision selected in response to picture resolution, profile, or level. For example, when the picture resolution is less than or equal to 1280 × 720, 1/16-pel MV precision is used to store the MVs in the temporal MV buffer; when the picture resolution is greater than 1280 × 720, 1/4-pel MV precision is used.
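The precision switch in this example can be sketched as follows, with an MV component held internally at 1/16-pel precision being rounded down to the storage precision on write and expanded on read. The rounding rule (half away from zero) and the function names are illustrative assumptions; the specification does not fix a particular rounding here:

```python
def store_mv(mv, internal_denom=16, storage_denom=4):
    """Round an MV component from 1/internal_denom-pel precision to
    1/storage_denom-pel precision for buffer storage (round half away
    from zero; chosen here for illustration)."""
    shift = internal_denom // storage_denom  # e.g. 16 / 4 = 4
    sign = -1 if mv < 0 else 1
    return sign * ((abs(mv) + shift // 2) // shift)

def load_mv(stored, internal_denom=16, storage_denom=4):
    """Expand a stored MV component back to internal precision."""
    return stored * (internal_denom // storage_denom)
```

Storing at 1/4-pel instead of 1/16-pel drops two bits per component, which is what shrinks the buffer (or, at fixed bit width, quadruples the effective range, as noted earlier).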
In another set of examples, the MV line buffer size is reduced by storing the MVs used for spatial MV prediction across CTU rows at any of a plurality of different MV precisions (such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel) in response to any of picture resolution, profile, or level. In yet another set of examples, each of the MVs stored in the spatial MV buffer is stored at any one of a plurality of different MV precisions (such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel) in response to picture resolution, profile, or level. In other words, some of the MVs generated by the averaging or scaling processes may have a higher MV precision (1/16-pel or 1/8-pel), but the MVs stored in the spatial MV buffer for MV prediction are stored at a different, possibly lower, MV precision. If stored at such a lower precision, the buffer size may be reduced.
In yet another set of examples, each of the MVs stored in the MV buffers is stored at any one of a plurality of different MV precisions (such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel) in response to picture resolution, profile, or level. In other words, the MVs generated by the averaging or scaling processes may have a higher MV precision (1/16-pel or 1/8-pel), but the MVs stored in each of the MV buffers used for MV prediction are kept at a different, possibly lower, MV precision. If stored at such a lower precision, the buffer size may be reduced.
In yet another set of examples, the MV precision used to store MVs into the history-based MV table (also referred to as the history MV buffer) may differ from the MV precision used to store MVs in the temporal MV buffer, the spatial MV buffer, or the MV line buffer. For example, even when the MVs are stored in the temporal or spatial MV buffer at a lower MV precision, the MVs may be stored in the history MV buffer at a higher MV precision (e.g., 1/16-pel).
Minimum block size for motion compensation
According to one set of examples, the minimum block size for motion compensation is determined in response to a video parameter such as picture resolution (also referred to as picture size), profile, or level. In one example, 4 × 4 blocks may be used for motion compensation for each picture whose resolution is less than or equal to 1280 × 720, and 4 × 4 blocks are not available for motion compensation for each picture whose resolution is greater than 1280 × 720. These block size constraints may also include sub-block size constraints for sub-block-based inter modes, such as affine motion mode and sub-block-based temporal motion vector prediction.
In another example, the minimum block size for motion compensation is determined according to video parameters such as picture resolution (also referred to as picture size), profile, or level. In this example, 4 × 4 blocks may be used for both uni-directional and bi-directional motion compensation for each picture whose resolution is less than or equal to 1280 × 720, while 4 × 4 blocks are not available for bi-directional motion compensation for each picture whose resolution is greater than 1280 × 720. The block size constraints may also include sub-block size constraints for sub-block-based inter modes, such as affine motion mode and sub-block-based temporal motion vector prediction.
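The second example constraint above can be sketched as a simple admissibility check. Comparing resolutions by luma sample count against 1280 × 720 and the helper name are assumptions made for illustration, not the normative rule:

```python
def mc_block_allowed(w, h, width, height, bi, threshold=(1280, 720)):
    """Check whether a w x h block may use the requested motion
    compensation: for pictures larger than the threshold, 4x4 blocks
    are barred from bi-directional MC (illustrative sketch only)."""
    large_picture = width * height > threshold[0] * threshold[1]
    if large_picture and bi and w * h <= 16:  # 4x4 bi-pred disallowed
        return False
    return True
```

A 4 × 4 block in a 1080p picture would thus fall back to uni-directional MC, which caps the worst-case memory access bandwidth discussed earlier.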
According to a first aspect of the present disclosure, a video codec method is performed at a computing device having one or more processors and a memory storing a plurality of programs to be executed by the one or more processors. The method comprises the following steps: selecting a first temporal motion vector prediction compression scheme in response to any of a first picture resolution, a first profile, or a first level; and selecting a second temporal motion vector prediction compression scheme in response to any of a second picture resolution, a second profile, or a second level.
In some examples, the first temporal motion vector compression scheme uses a first compression ratio and the second temporal motion vector compression scheme uses a second compression ratio different from the first compression ratio.
In some examples, the first compression ratio is selected to be less than the second compression ratio in response to the first picture resolution being less than or equal to the second picture resolution.
In some examples, the first compression ratio is selected to be greater than the second compression ratio in response to the first picture resolution being greater than the second picture resolution.
In some examples, the first compression ratio comprises at least one of 16:1, 4:1, horizontal 8:1, or vertical 8:1.
According to a second aspect of the present disclosure, a video codec method is performed at a computing device having one or more processors and a memory storing a plurality of programs to be executed by the one or more processors. The method comprises the following steps: selecting a first motion vector precision level for storing a first motion vector in a motion vector buffer, wherein the selecting is performed in response to any of a first picture resolution, a first profile, or a first level associated with a first picture; and selecting a second motion vector precision level for storing a second motion vector in the motion vector buffer, wherein the selecting is performed in response to any of a second picture resolution, a second profile, or a second level associated with a second picture; wherein the first motion vector precision level is different from the second motion vector precision level.
In some examples, the motion vector buffer comprises at least one of a spatial motion vector buffer, a temporal motion vector buffer, or a spatial motion vector line buffer.
In some examples, the first motion vector precision level includes any of 1/16 pixels, 1/8 pixels, 1/4 pixels, 1/2 pixels, or 1 pixel.
In some examples, the plurality of coding units are reconstructed within the first picture or within slices of the first picture; storing each of a plurality of motion vectors for each of a plurality of coding units in a temporal motion vector buffer; and the temporal motion vector buffer is used to perform prediction for one or more consecutive pictures following the first picture or one or more consecutive slices following a slice of the first picture.
In some examples, the first motion vector precision level is selected to be less than the second motion vector precision level in response to the first picture resolution being less than or equal to the second picture resolution.
In some examples, the spatial motion vector line buffer stores a plurality of motion vectors across the coding tree unit, the plurality of motion vectors including at least a first motion vector and a second motion vector, wherein the first motion vector is stored in the spatial motion vector line buffer at a first motion vector precision level and the second motion vector is stored in the spatial motion vector line buffer at a second motion vector precision level.
In some examples, the averaging or scaling process generates one or more motion vectors that include at least the first motion vector. The one or more motion vectors are generated with a first level of motion vector precision. The one or more motion vectors are stored in a spatial motion vector line buffer with a second motion vector precision level.
In some examples, the second motion vector precision level is selected to be less than the first motion vector precision level.
In some examples, the averaging or scaling process generates one or more motion vectors that include at least the first motion vector. The one or more motion vectors are generated with a first motion vector precision level. The one or more motion vectors are stored in the spatial motion vector buffer, the temporal motion vector buffer, and the spatial motion vector line buffer at a second motion vector precision level.
In some examples, the second motion vector precision level is selected to be less than the first motion vector precision level.
In some examples, the historical motion vector buffer stores a plurality of motion vectors including at least the first motion vector at a first motion vector precision level. The plurality of motion vectors is stored in at least one of a spatial motion vector buffer, a temporal motion vector buffer, or a spatial motion vector line buffer with a second motion vector precision level.
According to a third aspect of the present disclosure, a video codec method is performed at a computing device having one or more processors and a memory storing a plurality of programs to be executed by the one or more processors. The method comprises the following steps: selecting a first minimum allowable block size for performing motion compensation, wherein the selecting is performed in response to any one of a first picture resolution, a first profile, or a first level associated with a first picture; and selecting a second minimum allowable block size for performing motion compensation, wherein the selecting is performed in response to any one of a second picture resolution, a second profile, or a second level associated with a second picture; wherein the first minimum allowable block size is different from the second minimum allowable block size.
In some examples, the first minimum allowable block size and the second minimum allowable block size are selected in response to a sub-block size constraint for at least one of affine motion prediction or sub-block based temporal motion vector prediction.
In some examples, the first minimum allowable block size and the second minimum allowable block size are selected in response to at least one constraint for performing bi-directional motion compensation or uni-directional motion compensation.
In some examples, the first minimum allowable block size is greater than 4 x 4 when the first picture has a first picture resolution greater than 1280 x 720.
In one or more examples, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. The computer readable medium may include a computer readable storage medium corresponding to a tangible medium, such as a data storage medium, or a communication medium including any medium that facilitates transfer of a computer program from one place to another, for example, according to a communication protocol. In this manner, the computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium or (2) a communication medium, such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the embodiments described herein. The computer program product may include a computer-readable medium.
Further, the above methods may be implemented using an apparatus comprising one or more circuits including an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components. The apparatus may use circuitry in combination with other hardware or software components to perform the above-described methods. Each module, sub-module, unit or sub-unit disclosed above may be implemented, at least in part, using one or more circuits.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles thereof, and including such departures from the present disclosure as come within known or customary practice within the art. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise examples described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. It is intended that the scope of the invention be limited only by the claims appended hereto.

Claims (20)

1. A video encoding and decoding method, comprising:
selecting a first temporal motion vector prediction compression scheme in response to any of a first picture resolution, a first profile, or a first level; and
selecting a second temporal motion vector prediction compression scheme in response to any of a second picture resolution, a second profile, or a second level.
2. The video coding and decoding method of claim 1, wherein the first temporal motion vector compression scheme uses a first compression ratio and the second temporal motion vector compression scheme uses a second compression ratio different from the first compression ratio.
3. The video coding and decoding method according to claim 2, further comprising: in response to the first picture resolution being less than or equal to the second picture resolution, selecting the first compression ratio to be less than the second compression ratio.
4. The video coding and decoding method according to claim 2, further comprising: in response to the first picture resolution being greater than the second picture resolution, selecting the first compression ratio to be greater than the second compression ratio.
5. The video coding method of claim 2, wherein the first compression ratio comprises at least one of 16:1, 4:1, horizontal 8:1, or vertical 8:1.
6. A video encoding and decoding method, comprising:
selecting a first motion vector precision level for storing a first motion vector in a motion vector buffer, wherein the selecting is performed in response to any of a first picture resolution, a first profile, or a first level associated with a first picture; and
selecting a second motion vector precision level for storing a second motion vector in the motion vector buffer, wherein the selecting is performed in response to any of a second picture resolution, a second profile, or a second level associated with a second picture;
wherein the first motion vector precision level is different from the second motion vector precision level.
7. The video coding and decoding method of claim 6, wherein the motion vector buffer comprises at least one of a spatial motion vector buffer, a temporal motion vector buffer, or a spatial motion vector line buffer.
8. The video coding method of claim 7, wherein the first motion vector precision level comprises any one of 1/16 pixels, 1/8 pixels, 1/4 pixels, 1/2 pixels, or 1 pixel.
9. The video coding and decoding method of claim 8, further comprising:
reconstructing a plurality of coding units within the first picture or within a slice of the first picture;
storing each of a plurality of motion vectors for each of the plurality of coding units in the temporal motion vector buffer; and
Performing prediction for one or more consecutive pictures following the first picture or one or more consecutive slices following the slice of the first picture using the temporal motion vector buffer.
10. The video coding and decoding method of claim 8, further comprising: in response to the first picture resolution being less than or equal to the second picture resolution, selecting the first motion vector precision level to be less than the second motion vector precision level.
11. The video coding and decoding method according to claim 7, further comprising: storing, with the spatial motion vector line buffer, a plurality of motion vectors across a coding tree unit, wherein the plurality of motion vectors includes at least a first motion vector and a second motion vector, wherein the first motion vector is stored in the spatial motion vector line buffer at the first motion vector precision level and the second motion vector is stored in the spatial motion vector line buffer at the second motion vector precision level.
12. The video coding and decoding method according to claim 7, further comprising: generating one or more motion vectors comprising at least a first motion vector using an averaging or scaling process, wherein the one or more motion vectors are generated at a first motion vector precision level, and storing the one or more motion vectors in the spatial motion vector line buffer at the second motion vector precision level.
13. The video coding and decoding method according to claim 12, further comprising: selecting the second motion vector precision level to be less than the first motion vector precision level.
14. The video coding and decoding method according to claim 7, further comprising: generating one or more motion vectors comprising at least a first motion vector using an averaging or scaling process, wherein the one or more motion vectors are generated with a first motion vector precision level, and storing the one or more motion vectors in the spatial motion vector buffer, the temporal motion vector buffer, and the spatial motion vector line buffer with the second motion vector precision level.
15. The video coding and decoding method of claim 14, further comprising: selecting the second motion vector precision level to be less than the first motion vector precision level.
16. The video coding and decoding method according to claim 7, further comprising: using a historical motion vector buffer to store a plurality of motion vectors including at least a first motion vector at the first motion vector precision level; and storing the plurality of motion vectors in at least one of the spatial motion vector buffer, the temporal motion vector buffer, or the spatial motion vector line buffer with the second motion vector precision level.
17. A video encoding and decoding method, comprising:
selecting a first minimum allowable block size for performing motion compensation in response to any one of a first picture resolution, a first profile, or a first level associated with a first picture; and
selecting a second minimum allowable block size for performing motion compensation in response to any of a second picture resolution, a second profile, or a second level associated with a second picture;
wherein the first minimum allowable block size is different from the second minimum allowable block size.
18. The video coding and decoding method of claim 17, further comprising: selecting the first minimum allowable block size and the second minimum allowable block size in response to a sub-block size constraint for at least one of affine motion prediction or sub-block based temporal motion vector prediction.
19. The video coding and decoding method of claim 17, further comprising: selecting the first minimum allowable block size and the second minimum allowable block size in response to at least one constraint for performing bi-directional motion compensation or uni-directional motion compensation.
20. The video coding method of claim 17, wherein the first minimum allowable block size is greater than 4 x 4 when the first picture has a first picture resolution greater than 1280 x 720.
CN201980092938.1A 2018-12-31 2019-12-30 Picture resolution dependent configuration for video codec Active CN113498609B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201862787240P 2018-12-31 2018-12-31
US62/787,240 2018-12-31
PCT/US2019/069009 WO2020142468A1 (en) 2018-12-31 2019-12-30 Picture resolution dependent configurations for video coding

Publications (2)

Publication Number Publication Date
CN113498609A true CN113498609A (en) 2021-10-12
CN113498609B CN113498609B (en) 2023-06-20

Family

ID=71407416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980092938.1A Active CN113498609B (en) 2018-12-31 2019-12-30 Picture resolution dependent configuration for video codec

Country Status (2)

Country Link
CN (1) CN113498609B (en)
WO (1) WO2020142468A1 (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05227525A (en) * 1991-10-31 1993-09-03 Toshiba Corp Picture encoder
US20010055340A1 (en) * 1998-10-09 2001-12-27 Hee-Yong Kim Efficient down conversion system for 2:1 decimation
JP2008053875A (en) * 2006-08-23 2008-03-06 Sony Corp Image processor and method, program, and program storage medium
US20080165856A1 (en) * 2002-01-24 2008-07-10 Hitachi, Ltd. Moving picture signal coding method, decoding method, coding apparatus, and decoding apparatus
GB201104035D0 (en) * 2011-03-09 2011-04-20 Canon Kk Video encoding and decoding
CN102065290A (en) * 2009-11-11 2011-05-18 联发科技股份有限公司 Video decoding apparatus, method of storing motion vector information and memory distribution method
CN102474611A (en) * 2009-08-13 2012-05-23 三星电子株式会社 Method and apparatus for encoding/decoding image by controlling accuracy of motion vector
CN102835113A (en) * 2010-04-09 2012-12-19 Lg电子株式会社 Method and apparatus for processing video signal
WO2013002105A1 (en) * 2011-06-28 2013-01-03 ソニー株式会社 Image processing device and method
WO2017157264A1 (en) * 2016-03-14 2017-09-21 Mediatek Singapore Pte. Ltd. Method for motion vector storage in video coding and apparatus thereof
US20180098089A1 (en) * 2016-10-04 2018-04-05 Qualcomm Incorporated Adaptive motion vector precision for video coding
US20180288410A1 (en) * 2014-11-06 2018-10-04 Samsung Electronics Co., Ltd. Video encoding method and apparatus, and video decoding method and apparatus
WO2018212578A1 (en) * 2017-05-17 2018-11-22 주식회사 케이티 Method and device for video signal processing

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AR049593A1 (en) * 2004-07-20 2006-08-16 Qualcomm Inc METHOD AND APPARATUS FOR PREDICTING THE MOTION VECTOR IN TEMPORARY VIDEO COMPRESSION.
US9325990B2 (en) * 2012-07-09 2016-04-26 Qualcomm Incorporated Temporal motion vector prediction in video coding extensions
WO2016165069A1 (en) * 2015-04-14 2016-10-20 Mediatek Singapore Pte. Ltd. Advanced temporal motion vector prediction in video coding


Also Published As

Publication number Publication date
CN113498609B (en) 2023-06-20
WO2020142468A1 (en) 2020-07-09

Similar Documents

Publication Publication Date Title
US10999597B2 (en) Image encoding method and image decoding method
CN110677648B (en) Method and device for processing video data and non-transitory storage medium
CN110662059B (en) Method and apparatus for storing previously encoded motion information using a lookup table and encoding subsequent blocks using the same
CN110662052B (en) Updating conditions of a look-up table (LUT)
US10491892B2 (en) Method and apparatus for processing a video signal
CN112400319B (en) Video encoding/decoding method and device
CN110662053A (en) Look-up table size
CN111064959A (en) How many HMVP candidates to check
CN110677668B (en) Spatial motion compression
TW202044837A (en) Signaling of triangle merge mode indexes in video coding
US20230396760A1 (en) Inter prediction encoding and decoding method using combination of prediction blocks, and computer-readable storage medium bitstream to be decoded thereby
US11871034B2 (en) Intra block copy for screen content coding
CN113366839A (en) Refined quantization step in video coding and decoding
WO2021219143A1 (en) Entropy coding for motion precision syntax
CN113498609B (en) Picture resolution dependent configuration for video codec
CN110677650A (en) Reducing complexity of non-adjacent Merge designs
WO2024010943A1 (en) Template matching prediction with block vector difference refinement
WO2023133245A1 (en) Boundary based asymmetric reference line offsets
KR20240046574A (en) Method and apparatus for implicitly indicating motion vector predictor precision
KR20240068697A (en) Method and apparatus for motion vector prediction based on subblock motion vectors
WO2023158765A1 (en) Methods and devices for geometric partitioning mode split modes reordering with pre-defined modes order
CN117730531A (en) Method and apparatus for decoder-side intra mode derivation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant