WO2020142468A1 - Picture resolution dependent configurations for video coding - Google Patents

Picture resolution dependent configurations for video coding Download PDF

Info

Publication number
WO2020142468A1
WO2020142468A1 · PCT/US2019/069009
Authority
WO
WIPO (PCT)
Prior art keywords
motion vector
picture
video coding
motion
selecting
Application number
PCT/US2019/069009
Other languages
French (fr)
Inventor
Yi-Wen Chen
Xianglin Wang
Original Assignee
Beijing Dajia Internet Information Technology Co., Ltd.
Application filed by Beijing Dajia Internet Information Technology Co., Ltd. filed Critical Beijing Dajia Internet Information Technology Co., Ltd.
Priority to CN201980092938.1A (CN113498609B)
Publication of WO2020142468A1

Links

Classifications

    • H — ELECTRICITY
      • H04 — ELECTRIC COMMUNICATION TECHNIQUE
        • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
            • H04N19/119 — Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
            • H04N19/109 — Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
            • H04N19/176 — Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
            • H04N19/423 — Implementation details or hardware specially adapted for video compression or decompression, characterised by memory arrangements
            • H04N19/513 — Predictive coding involving temporal prediction; motion estimation or motion compensation; processing of motion vectors
            • H04N19/52 — Processing of motion vectors by encoding, by predictive encoding

Definitions

  • Video coding standards such as H.264/AVC, H.265/HEVC, and VVC are designed to be generic in the sense that they serve a wide range of applications, bit rates, resolutions, qualities, and services. Applications cover, among other things, digital storage media, television broadcasting, and real-time communications.
  • In developing these standards, various requirements from typical applications have been considered, necessary algorithmic elements have been developed, and these have been integrated into a single syntax that includes a multiplicity of feature sets. These feature sets can be implemented independently or in any of various combinations. Hence, "profiles" are specified to designate such feature subsets, and "tiers" and "levels" are specified within each profile.
  • A level of a tier is a specified set of constraints imposed on values of the syntax elements in the bitstream. These constraints may be simple limits on values; alternatively, they may take the form of constraints on arithmetic combinations of values (e.g., picture width multiplied by picture height multiplied by the number of pictures decoded per second). A level specified for a lower tier is more constrained than a level specified for a higher tier.
  • For temporal motion vector storage in HEVC, each picture is divided into 16x16 blocks. Only motion information from the top-left 4x4 block in each 16x16 block is used as the representative motion for all of the 4x4 blocks within that 16x16 block. Since one MV is stored to represent sixteen 4x4 blocks, this approach may be referred to as 16:1 MV compression.
  • FIG. 7A illustrates the representative MVs for the 16:1 MV compression used in High-Efficiency Video Coding (HEVC), and FIG. 7B illustrates the representative MVs for the 4:1 MV compression used in VTM-3. In FIG. 7A, the representative 4x4 blocks for each 16x16 block are denoted as A 701, B 703, C 705 and D 707.
  • In VTM-3.0 (the current VVC test model), a 4:1 MV compression scheme is used: the MV of the top-left 4x4 block of each 8x8 block (the representative blocks in FIG. 7B are denoted as A 711, B 713, C 715, ..., P 717) is used to represent the MVs of all of the 4x4 blocks within the same 8x8 block.
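To make these fixed-ratio schemes concrete, the following illustrative sketch (not from the patent) looks up the representative MV for a given 4x4 block under 16:1 and 4:1 compression. The `mv_field` array, indexed in 4x4-block units, is a hypothetical data structure standing in for the uncompressed motion field.

```python
def representative_mv_16_to_1(mv_field, col, row):
    """HEVC-style 16:1 compression: all sixteen 4x4 blocks inside a 16x16
    region share the MV of the region's top-left 4x4 block."""
    return mv_field[(row // 4) * 4][(col // 4) * 4]

def representative_mv_4_to_1(mv_field, col, row):
    """VTM-3-style 4:1 compression: all four 4x4 blocks inside an 8x8
    region share the MV of the region's top-left 4x4 block."""
    return mv_field[(row // 2) * 2][(col // 2) * 2]
```

In both cases the buffer only needs to retain the representative entries; the lookup shows which stored MV any given 4x4 block maps to.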
  • Higher MV precision for MV storage generally requires a larger MV buffer to store the MVs. Conversely, using lower MV precision for MV storage can increase the valid range of the stored MVs when a fixed number of bits (e.g., 16 bits for each MV component) is used to store the MVs.
  • For motion compensation (MC), the memory access bandwidth requirement is usually determined by the operating block size and the type of prediction (e.g., uni-directional or bi-directional) to be performed. In VTM-3, the worst-case bandwidth requirement is more than 2x the corresponding worst-case bandwidth for HEVC. The worst case of MC memory access bandwidth occurs with bi-directional MC of a 4x4 block, which is utilized by some coding modes that are described in greater detail hereinafter.
  • FIG. 8A illustrates the representative MV for Vertical 8:1 MV compression, and FIG. 8B illustrates the representative MV for Horizontal 8:1 MV compression. As shown in FIG. 8A, for a first 16x8/8x16 block 801, the MV of the top-left 4x4 block 811 is used as the representative MV. Similarly, as shown in FIG. 8B, the MV of a top-left 4x4 block 821 is used as the representative MV.
  • In the proposed method, any of a plurality of different-ratio temporal MV compression schemes (e.g., 16:1, 4:1, Horizontal 8:1, or Vertical 8:1) is selected in response to one or more video parameters, such as a picture resolution (sometimes referred to as a picture size), a profile, or a parameter level.
  • In one example, 4:1 or 16:1 MV compression is applied to the temporal MV buffer in response to any of the picture resolution, the profile, or the parameter level: for a picture resolution smaller than or equal to (1280x720), 4:1 MV compression is applied to the temporal MV buffer, and for a picture resolution greater than (1280x720), 16:1 MV compression is applied to the temporal MV buffer.
  • In another example, 4:1 or Vertical 8:1 MV compression is applied to the temporal MV buffer in response to the picture resolution, the profile, or the parameter level: for a picture resolution greater than (1280x720), 4:1 MV compression is applied to the temporal MV buffer, and for a picture resolution smaller than or equal to (1280x720), Vertical 8:1 MV compression is applied to the temporal MV buffer.
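The selection logic of the first example can be sketched as follows, together with a representative-block mapping for the 8:1 schemes. This is a hedged illustration: interpreting "resolution smaller than or equal to (1280x720)" as an area comparison, and the exact orientation of the Vertical versus Horizontal 8:1 regions, are assumptions not pinned down by the text.

```python
def select_temporal_mv_compression(pic_width, pic_height):
    """First example above: 4:1 MV compression for pictures up to 1280x720,
    16:1 for larger pictures (area comparison is an assumption)."""
    return "4:1" if pic_width * pic_height <= 1280 * 720 else "16:1"

def representative_block_8_to_1(col, row, vertical=True):
    """One representative MV per eight 4x4 blocks. Here 'vertical' is
    assumed to group a 4-wide by 2-tall set of 4x4 blocks (a 16x8 region)
    and 'horizontal' the transpose; coordinates are in 4x4-block units."""
    if vertical:
        return ((col // 4) * 4, (row // 2) * 2)
    return ((col // 2) * 2, (row // 4) * 4)
```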
  • In the proposed method, each MV is stored in the MV buffers at a respective predefined MV precision selected in response to one or more video parameters, such as the picture resolution (sometimes referred to as the picture size), the profile, or the parameter level. The MV buffers referred to herein include any of the spatial MV buffer, the temporal MV buffer, or the spatial MV line buffer.
  • Each of a plurality of respective MV precision levels may be used to store the MVs into any of a plurality of corresponding MV buffers, and a respective MV precision level used for MV storage may be selected in response to a corresponding picture resolution.
  • The proposed method stores the MVs used for temporal MV prediction in any of a plurality of different MV precisions, such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel, based upon the picture resolution, the profile, or the parameter level. After a picture or slice is reconstructed, the MVs of each of the CUs are stored in an MV buffer (termed the temporal MV buffer) to be used as temporal MV predictions for one or more following pictures/slices.
  • Each respective MV is stored into the temporal MV buffer using a corresponding MV precision selected in response to the picture resolution, the profile, or the parameter level. For example, when the picture resolution is less than or equal to (1280x720), 1/16-pel MV precision is used to store the MVs in the temporal MV buffer. When the picture resolution is greater than (1280x720), 1/4-pel MV precision is used to store the MVs in the temporal MV buffer.
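A minimal sketch of this storage rule follows, assuming (as discussed above) that MVs are produced internally at 1/16-pel precision and that each stored component must fit in 16 bits; the rounding convention and the area comparison are assumptions.

```python
def store_temporal_mv(mv_x, mv_y, pic_width, pic_height):
    """Quantize a 1/16-pel MV to the resolution-dependent storage precision
    and clamp each component to a signed 16-bit range."""
    # 1/16-pel storage for pictures up to 1280x720, 1/4-pel otherwise.
    shift = 0 if pic_width * pic_height <= 1280 * 720 else 2

    def quantize(v):
        if shift:
            v = (v + (1 << (shift - 1))) >> shift  # round to the coarser precision
        return max(-32768, min(32767, v))          # clamp to 16-bit storage

    return quantize(mv_x), quantize(mv_y)
```

Storing at 1/4-pel divides the magnitude of each stored value by four, which is what widens the representable MV range for a fixed 16-bit field.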
  • The size of the MV line buffer is reduced by storing the MVs used for spatial MV prediction across a CTU row in any of a plurality of different MV precisions, such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel, in response to any of the picture resolution, the profile, or the parameter level.
  • Similarly, each of the MVs stored in the spatial MV buffer is stored in any of a plurality of different MV precisions, such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel, in response to the picture resolution, the profile, or the parameter level. Some of the MVs generated by the averaging or scaling process could have a higher MV precision (1/16-pel or 1/8-pel), but the MVs stored in the spatial MV buffers for MV prediction are stored using a different, and possibly lower, MV precision. If stored at such a lower precision, the buffer size may be reduced.
  • More generally, each of the MVs stored in the MV buffers is stored in any of a plurality of different MV precisions, such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel, in response to the picture resolution, the profile, or the parameter level. The MVs generated by the averaging or scaling process could have a higher MV precision (1/16-pel or 1/8-pel), but the stored MVs in each of the MV buffers for MV prediction are kept in a different, and possibly lower, MV precision, which may reduce the buffer size.
  • The MV precision level used to store the MVs into the history MV table may be different from the MV precision used to store the MVs in the temporal MV buffer, the spatial MV buffer, or the MV line buffer; for example, a higher MV precision level (e.g., 1/16-pel) may be used for the history MV table.
  • In the proposed method, a smallest block size for motion compensation is determined in response to the video parameters, such as the picture resolution (also referred to as the picture size), the profile, or the parameter level. In one example, a 4x4 block is available for motion compensation for each respective picture having a corresponding resolution smaller than or equal to (1280x720), and a 4x4 block is NOT available for motion compensation for each respective picture having a corresponding resolution larger than (1280x720). These block size constraints may also include a subblock size constraint for subblock-based inter modes, such as affine motion mode and subblock-based temporal motion vector prediction.
  • In another example, the smallest block size for motion compensation is determined according to the video parameters such that a 4x4 block is available for both uni-directional and bi-directional motion compensation for each picture having a resolution smaller than or equal to (1280x720), and a 4x4 block is NOT available for bi-directional motion compensation for each picture having a resolution larger than (1280x720). As above, the block size constraints may also include a subblock size constraint for subblock-based inter modes, such as affine motion mode and subblock-based temporal motion vector prediction.
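The second example can be sketched as a simple admissibility check; treating "larger than (1280x720)" as an area comparison is again an assumption, and the first example would simply return False for any 4x4 MC on a large picture.

```python
def mc_block_allowed(blk_w, blk_h, pic_w, pic_h, bi_directional):
    """Second example above: 4x4 bi-directional MC is disallowed for
    pictures larger than 1280x720, while uni-directional 4x4 MC and all
    larger block sizes remain allowed."""
    if (blk_w, blk_h) == (4, 4) and pic_w * pic_h > 1280 * 720:
        return not bi_directional
    return True
```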
  • According to the first aspect of the present disclosure (stated in the Summary below), in one or more embodiments, the first temporal motion vector compression scheme uses a first compression ratio, and the second temporal motion vector compression scheme uses a second compression ratio different from the first compression ratio.
  • The first compression ratio is selected to be smaller than the second compression ratio in response to the first picture resolution being smaller than or equal to the second picture resolution; equivalently, the first compression ratio is selected to be larger than the second compression ratio in response to the first picture resolution being greater than the second picture resolution.
  • The first compression ratio comprises at least one of 16:1, 4:1, Horizontal 8:1, or Vertical 8:1.
  • According to the second aspect of the present disclosure (stated in the Summary below), in one or more embodiments:
  • the motion vector buffer comprises at least one of a spatial motion vector buffer, a temporal motion vector buffer, or a spatial motion vector line buffer.
  • the first motion vector precision level comprises any of 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel.
  • A plurality of coding units are reconstructed within the first picture or within a slice of the first picture; each of a plurality of motion vectors for each of the plurality of coding units is stored in the temporal motion vector buffer; and the temporal motion vector buffer is used to perform a prediction for one or more successive pictures or successive slices that follow the first picture or the slice of the first picture.
  • the first motion vector precision level is selected to be smaller than the second motion vector precision level in response to the first picture resolution being smaller than or equal to the second picture resolution.
  • the spatial motion vector line buffer stores a plurality of motion vectors across a coding tree unit, the plurality of motion vectors including at least the first and second motion vectors, wherein the first motion vector is stored in the spatial motion vector line buffer at the first motion vector precision level, and the second motion vector is stored in the spatial motion vector line buffer at the second motion vector precision level.
  • an averaging or scaling process generates one or more motion vectors including at least the first motion vector.
  • the one or more motion vectors are generated at a first motion vector precision level.
  • the one or more motion vectors are stored in the spatial motion vector line buffer at the second motion vector precision level.
  • the second motion vector precision level is selected to be less than the first motion vector precision level.
  • an averaging or scaling process generates one or more motion vectors including at least the first motion vector.
  • the one or more motion vectors are generated at a first motion vector precision level.
  • the one or more motion vectors are stored in the spatial motion vector buffer, the temporal motion vector buffer, and the spatial motion vector line buffer at the second motion vector precision level.
  • the second motion vector precision level is selected to be less than the first motion vector precision level.
  • a history motion vector buffer stores a plurality of motion vectors, including at least the first motion vector, at the first motion vector precision level.
  • the plurality of motion vectors are stored in at least one of the spatial motion vector buffer, the temporal motion vector buffer, or the spatial motion vector line buffer, at the second motion vector precision level.
  • According to the third aspect of the present disclosure (stated in the Summary below), in one or more embodiments:
  • the first minimum allowable block size and the second minimum allowable block size are selected in response to a subblock size constraint for at least one of affine motion prediction or subblock-based temporal motion vector prediction.
  • the first minimum allowable block size and the second minimum allowable block size are selected in response to at least one constraint for performing bi directional or uni-directional motion compensation.
  • The first minimum allowable block size is greater than 4x4 when the first picture has a first picture resolution larger than (1280x720).
  • Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory, or (2) a communication medium such as a signal or carrier wave.
  • a computer program product may include a computer- readable medium.
  • the above methods may be implemented using an apparatus that includes one or more circuitries, which include application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components.
  • the apparatus may use the circuitries in combination with the other hardware or software components for performing the above described methods.
  • Each module, sub-module, unit, or sub-unit disclosed above may be implemented at least partially using the one or more circuitries.

Abstract

A video coding method is performed at a computing device having one or more processors and memory storing a plurality of programs to be executed by the one or more processors. The method includes selecting a first temporal motion vector prediction compression scheme in response to any of a first picture resolution, a first profile, or a first level; and selecting a second temporal motion vector prediction compression scheme in response to any of a second picture resolution, a second profile, or a second level.

Description

Picture Resolution Dependent Configurations for Video Coding
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims priority to U.S. provisional patent application Ser. No.
62/787,240 filed on Dec. 31, 2018. The entire disclosure of the aforementioned application is incorporated herein by reference in its entirety.
FIELD
[0002] The present disclosure relates generally to video coding and compression. More specifically, this disclosure relates to systems and methods for performing video coding using inter prediction.
BACKGROUND
[0003] This section provides background information related to the present disclosure. The information contained within this section should not necessarily be construed as prior art.
[0004] Any of various video coding techniques may be used to compress video data. Video coding can be performed according to one or more video coding standards. Some illustrative video coding standards include versatile video coding (VVC), joint exploration test model (JEM), high-efficiency video coding (H.265/HEVC), advanced video coding (H.264/AVC), and moving picture experts group (MPEG) coding. Video coding generally utilizes predictive methods (e.g., inter-prediction, intra-prediction, or the like) that take advantage of redundancy inherent in video images or sequences. One goal of video coding techniques is to compress video data into a form that uses a lower bit rate, while avoiding or minimizing degradations to video quality.
SUMMARY
[0005] This section provides a general summary of the disclosure, and is not a
comprehensive disclosure of its full scope or all of its features.
[0006] According to a first aspect of the present disclosure, a video coding method is performed at a computing device having one or more processors and memory storing a plurality of programs to be executed by the one or more processors. The method includes selecting a first temporal motion vector prediction compression scheme in response to any of a first picture resolution, a first profile, or a first level; and selecting a second temporal motion vector prediction compression scheme in response to any of a second picture resolution, a second profile, or a second level.

[0007] According to a second aspect of the present disclosure, a video coding method is performed at a computing device having one or more processors and memory storing a plurality of programs to be executed by the one or more processors. The method includes selecting a first motion vector precision level for storing a first motion vector in a motion vector buffer, wherein the selecting is performed in response to any of a first picture resolution, a first profile, or a first level associated with a first picture; and selecting a second motion vector precision level for storing a second motion vector in the motion vector buffer, wherein the selecting is performed in response to any of a second picture resolution, a second profile, or a second level associated with a second picture; wherein the first motion vector precision level is different from the second motion vector precision level.
[0008] According to a third aspect of the present disclosure, a video coding method is performed at a computing device having one or more processors and memory storing a plurality of programs to be executed by the one or more processors. The method includes selecting a first minimum allowable block size for performing motion compensation, wherein the selecting is performed in response to any of a first picture resolution, a first profile, or a first level associated with a first picture; and selecting a second minimum allowable block size for performing motion compensation, wherein the selecting is performed in response to any of a second picture resolution, a second profile, or a second level associated with a second picture; wherein the first minimum allowable block size is different from the second minimum allowable block size.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] Hereinafter, sets of illustrative, non-limiting embodiments of the present disclosure will be described in conjunction with the accompanying drawings. Variations of structure, method, or functionality may be implemented by those of ordinary skill in the relevant art based on the examples presented herein, and such variations are all contained within the scope of the present disclosure. In cases where no conflict is present, the teachings of different embodiments may, but need not, be combined with one another.
[0010] FIG. 1 is a block diagram setting forth an illustrative Versatile Video Coding Test Model 3 (VTM-3) encoder.
[0011] FIG. 2 is a graphical depiction of a picture divided into a plurality of Coding Tree Units (CTUs).
[0012] FIG. 3 illustrates a multi-type tree structure with a plurality of splitting modes.

[0013] FIG. 4A shows an example of a block-based, 4-parameter affine motion model for VTM-3.
[0014] FIG. 4B shows an example of a block-based, 6-parameter affine motion model for VTM-3.
[0015] FIG. 5 is a graphical depiction of an affine Motion Vector Field (MVF) organized into a plurality of sub-blocks.
[0016] FIG. 6A illustrates a set of spatially neighboring blocks used by a subblock-based temporal motion vector prediction (SbTMVP) process in the context of Versatile Video Coding.
[0017] FIG. 6B illustrates a subblock-based temporal motion vector prediction (SbTMVP) process for deriving a sub-Coding Unit (CU) motion field by applying a motion shift from a spatial neighbor, and scaling motion information from a corresponding collocated sub-CU.
[0018] FIG. 7A illustrates a representative Motion Vector (MV) for 16: 1 MV compression used in High-Efficiency Video Coding (HEVC).
[0019] FIG. 7B illustrates a representative Motion Vector (MV) for 4: 1 MV compression used in VTM-3.
[0020] FIG. 8 A illustrates a representative Motion Vector (MV) for Vertical 8: 1 MV compression.
[0021] FIG. 8B illustrates a representative Motion Vector (MV) for Horizontal 8: 1 MV compression.
DETAILED DESCRIPTION
[0022] The terms used in the present disclosure are directed to illustrating particular examples, rather than to limit the present disclosure. The singular forms "a," "an," and "the" as used in the present disclosure as well as the appended claims also refer to plural forms unless other meanings are definitely contained in the context. It should be appreciated that the term "and/or" as used herein refers to any or all possible combinations of one or more associated listed items.

[0023] It shall be understood that, although the terms "first," "second," "third," etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one category of information from another. For example, without departing from the scope of the present disclosure, first information may be termed as second information; and similarly, second information may also be termed as first information. As used herein, the term "if" may be understood to mean "when" or "upon" or "in response to," depending on the context.
[0024] Reference throughout this specification to "one embodiment," "an embodiment," "another embodiment," or the like in the singular or plural means that one or more particular features, structures, or characteristics described in connection with an embodiment are included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases "in one embodiment," "in an embodiment," "in another embodiment," or the like in the singular or plural in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics in one or more embodiments may be combined in any suitable manner.
[0025] At the 10th Joint Video Experts Team (JVET) meeting, held in San Diego, California on April 10-20, 2018, JVET defined the first draft of Versatile Video Coding (VVC) and the VVC Test Model 1 (VTM-1) encoding method. It was decided to include a quaternary tree (quadtree) structure with a nested multi-type tree using binary and ternary splits coding block structure as the initial new coding feature of VVC. Since then, the reference software VTM to implement the encoding method and the draft VVC decoding process has been developed during the JVET meetings. As in most preceding standards, VVC has a block-based hybrid coding architecture, combining inter-picture and intra-picture prediction and transform coding with entropy coding.
[0026] FIG. 1 is a block diagram setting forth an illustrative Versatile Video Coding Test Model 3 (VTM-3) encoder 100. Input video 102, comprising a plurality of pictures, is applied to a non-inverting input of a first summer 104 and a switch 106. An output of the first summer 104 is connected to an input of a transform/quantization 108 block. An output of the transform/quantization 108 block is fed to an input of an entropy coding 110 block, and also to an input of an inverse quantization/inverse transform 111 block. The output of the inverse quantization/inverse transform 111 block is fed to a first non-inverting input of a second summer 112. An output of the second summer 112 is connected to an input of an in loop filter 120. An output of the in-loop filter 120 is connected to an input of a decoded picture buffer (DPB) 122.
[0027] The switch 106 connects the input video 102 to an input of an intra prediction 114 block, or to a first input of a motion estimation/compensation 116 block. The output of the intra prediction block 114, and the output of the motion estimation/compensation 116 block, are both connected to an inverting input of the first summer 104, as well as to a second non- inverting input of the second summer 112. An output of the DPB 122 is connected to the motion estimation/compensation 116 block.
[0028] In operation, the encoder 100 divides or partitions incoming pictures into a sequence of coding tree units (CTUs). The CTU concept is substantially similar to that utilized in High Efficiency Video Coding (HEVC). For a picture that has three sample arrays, a CTU includes a 2Nx2N block of luma samples, together with two corresponding NxN blocks of chroma samples, when a YUV chroma subsampling format of 4:2:0 is used.
[0029] FIG. 2 is a graphical depiction of a picture divided or partitioned into a plurality of Coding Tree Units (CTUs) 201, 202, 203 using a tree structure in VVC. In HEVC, each CTU 201, 202, 203 is split into coding units (CUs) by using a quaternary-tree structure, denoted as a coding tree or a quadtree, to adapt to various local characteristics. The decision of whether to code a picture area using inter-picture (temporal), versus intra-picture (spatial) prediction, is made at a leaf CU level. Each leaf CU can be further split into one, two or four prediction units (PUs) according to a PU splitting type. Inside one PU, the same prediction process is applied and the relevant information is transmitted to the decoder on a PU basis. After obtaining the residual block by applying the prediction process based on the PU splitting type, a leaf CU can be partitioned into transform units (TUs) according to another quaternary-tree structure similar to the coding tree for the CU. One feature of the HEVC structure is that it utilizes multiple partition concepts including CU, PU, and TU.
[0030] In VVC, a quaternary tree (quadtree) with a nested multi-type tree using a binary and ternary splits segmentation structure replaces the concept of multiple partition unit types. Thus, the quadtree removes the separation of the CU, PU and TU concepts except as needed for CUs that have a size too large for the maximum transform length, while supporting more flexibility for CU partition shapes. In the coding tree structure, a CU can have either a square or a rectangular shape. Each coding tree unit (CTU) 201, 202, 203 is first partitioned by the quadtree structure. Then the quadtree leaf nodes can be further partitioned by a multi-type tree structure.
[0031] FIG. 3 illustrates a multi-type tree structure with a plurality of splitting modes. Four splitting types exist in the multi-type tree structure of FIG. 3, namely, vertical binary splitting (SPLIT_BT_VER) 301, horizontal binary splitting (SPLIT_BT_HOR) 302, vertical ternary splitting (SPLIT_TT_VER) 303, and horizontal ternary splitting (SPLIT_TT_HOR) 304. The multi-type tree leaf nodes are called coding units (CUs). Unless the CU is too large for the maximum transform length, this segmentation is used for prediction and transform processing without any further partitioning. This means that, in most cases, the CU, PU and TU have the same block size in the quadtree with the nested multi-type tree coding block structure. The exception occurs when the maximum supported transform length is smaller than the width or height of the color component of the CU.
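As an illustration of these four modes, the sketch below (not part of any reference software) computes the child rectangles each split produces; ternary splits divide one dimension in a 1:2:1 ratio.

```python
def mtt_split(x, y, w, h, mode):
    """Return the child rectangles (x, y, width, height) produced by one
    multi-type tree split of the block at (x, y) with size w x h."""
    if mode == "SPLIT_BT_VER":    # vertical binary: two w/2 x h children
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "SPLIT_BT_HOR":    # horizontal binary: two w x h/2 children
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "SPLIT_TT_VER":    # vertical ternary: w/4, w/2, w/4 columns
        return [(x, y, w // 4, h),
                (x + w // 4, y, w // 2, h),
                (x + 3 * w // 4, y, w // 4, h)]
    if mode == "SPLIT_TT_HOR":    # horizontal ternary: h/4, h/2, h/4 rows
        return [(x, y, w, h // 4),
                (x, y + h // 4, w, h // 2),
                (x, y + 3 * h // 4, w, h // 4)]
    raise ValueError(f"unknown split mode: {mode}")
```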
[0032] For each inter-predicted CU, motion parameters including motion vectors, reference picture indices and a reference picture list usage index, and any additional information needed for the new coding feature of VVC, are used for inter-predicted sample generation. The motion parameter can be signalled in an explicit or implicit manner. When a CU is coded with a skip mode, the CU is associated with one PU and has no significant residual coefficients, no coded motion vector delta or reference picture index. A merge mode is specified whereby the motion parameters for the current CU are obtained from neighbouring CUs, including spatial and temporal candidates, and additional schedules are introduced in VVC. The merge mode can be applied to any inter-predicted CU, not only for skip mode. The alternative to merge mode is the explicit transmission of motion parameters, where the motion vector, a corresponding reference picture index for each reference picture list, and a reference picture list usage flag and other needed information are signalled explicitly per each CU.
[0033] Beyond the inter coding features in HEVC, the VTM3 includes a number of new and refined inter prediction coding tools listed as follows:
- Extended merge prediction
- Merge mode with MVD (MMVD)
- Affine motion compensated prediction
- Subblock-based temporal motion vector prediction (SbTMVP)
- Adaptive motion vector resolution (AMVR)
- Motion field storage: 1/16th luma sample MV storage and 8x8 motion field compression
- Bi-prediction with weighted averaging (BWA)
- Bi-directional optical flow (BDOF)
- Triangle partition prediction
- Combined inter and intra prediction (CIIP)
[0034] The following paragraphs provide details regarding the selected inter prediction methods specified in VVC.
[0035] Extended merge prediction is performed in VVC as follows. In VTM3, the merge candidate list is constructed by including the following five types of candidates in order:
1) Spatial MVP from spatial neighbour CUs
2) Temporal MVP from collocated CUs
3) History-based MVP from a FIFO table
4) Pairwise average MVP
5) Zero MVs.
[0036] The size of the merge list is signalled in a slice header. The maximum allowed size of the merge list is 6 in VTM-3. For each CU coded in merge mode, an index of the best merge candidate is encoded using truncated unary binarization (TU). The first bin of the merge index is coded with context, and bypass coding is used for the other bins.
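For illustration, truncated unary binarization of the merge index can be sketched as below, with a maximum index of 5 (merge list size 6, as stated above). Per the text, only the first bin would be context coded; the remaining bins are bypass coded.

```python
def merge_index_tu_bins(idx, max_idx=5):
    """Truncated unary: idx '1' bins followed by a terminating '0' bin,
    which is omitted when idx equals the maximum allowed value."""
    bins = [1] * idx
    if idx < max_idx:
        bins.append(0)
    return bins

# merge_index_tu_bins(0) -> [0]
# merge_index_tu_bins(5) -> [1, 1, 1, 1, 1]   (terminating zero omitted)
```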
[0037] Affine motion compensated prediction in VVC is performed as follows. In HEVC, only a translational motion model is applied for motion compensation prediction (MCP), despite the fact that, in the real world, there are many kinds of motion, e.g., zoom in/out, rotation, perspective motion, and other irregular motions.
[0038] FIG. 4A shows an example of a block-based, 4-parameter affine motion model 401 for VTM-3, and FIG. 4B shows an example of a block-based, 6-parameter affine motion model 402 for VTM-3. The models 401 and 402 are used in conjunction with a motion compensation procedure for VTM-3. In the case of the 4-parameter affine motion model 401, the affine motion field of a given block is described using motion information from two control points v0 and v1. In the case of the 6-parameter affine motion model 402, the affine motion field of a given block is described using motion information from three control points v0, v1, and v2.
[0039] For the 4-parameter affine motion model 401, a motion vector at sample location (x, y) in a block is derived as:
$$\begin{cases} mv_x = \dfrac{mv_{1x}-mv_{0x}}{W}\,x - \dfrac{mv_{1y}-mv_{0y}}{W}\,y + mv_{0x} \\[4pt] mv_y = \dfrac{mv_{1y}-mv_{0y}}{W}\,x + \dfrac{mv_{1x}-mv_{0x}}{W}\,y + mv_{0y} \end{cases} \tag{1}$$
For the 6-parameter affine motion model 402, a motion vector at the sample location (x, y) in a block is derived as:
$$\begin{cases} mv_x = \dfrac{mv_{1x}-mv_{0x}}{W}\,x + \dfrac{mv_{2x}-mv_{0x}}{H}\,y + mv_{0x} \\[4pt] mv_y = \dfrac{mv_{1y}-mv_{0y}}{W}\,x + \dfrac{mv_{2y}-mv_{0y}}{H}\,y + mv_{0y} \end{cases} \tag{2}$$
where (mv0x, mv0y) is the motion vector of the top-left corner control point, (mv1x, mv1y) is the motion vector of the top-right corner control point, and (mv2x, mv2y) is the motion vector of the bottom-left corner control point.

[0040] FIG. 5 is a graphical depiction of an affine Motion Vector Field (MVF) 501 organized into a plurality of sub-blocks 502, 503, and 504. In order to simplify motion compensation prediction, a block-based affine transform prediction procedure is applied. For purposes of illustration, assume that each of the plurality of sub-blocks 502, 503 and 504 is a 4x4 luma sub-block. To derive a motion vector for each 4x4 luma sub-block, the motion vector of the center sample of each sub-block, as shown in FIG. 5, is calculated according to the foregoing equations (1) and (2), and rounded to a fractional accuracy of 1/16. For example, the motion vector of the center sample of sub-block 502 is shown as a motion vector 505. Then the motion compensation interpolation filters are applied to generate the prediction of each sub-block with the derived motion vector. The sub-block size of chroma components is also set to 4x4. The motion vector (MV) for a 4x4 chroma sub-block is calculated as the average of the MVs of the four corresponding 4x4 luma sub-blocks. As in the case of translational motion inter prediction, there are also two affine motion inter prediction modes: affine merge mode and affine AMVP mode.
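To connect equations (1) and (2) with the sub-block derivation of FIG. 5, the following floating-point sketch derives the center-sample MV of each 4x4 luma sub-block and rounds it to 1/16 accuracy. The VTM software itself uses fixed-point integer arithmetic; this restatement is purely illustrative.

```python
def affine_subblock_mvs(cpmvs, W, H, sb=4):
    """cpmvs: two control-point MVs (v0, v1) for the 4-parameter model, or
    three (v0, v1, v2) for the 6-parameter model, each an (mvx, mvy) pair;
    W, H: block width and height in samples."""
    (v0x, v0y), (v1x, v1y) = cpmvs[0], cpmvs[1]
    mvs = {}
    for y in range(sb // 2, H, sb):          # center sample of each sub-block
        for x in range(sb // 2, W, sb):
            if len(cpmvs) == 2:              # 4-parameter model, equation (1)
                mv_x = (v1x - v0x) / W * x - (v1y - v0y) / W * y + v0x
                mv_y = (v1y - v0y) / W * x + (v1x - v0x) / W * y + v0y
            else:                            # 6-parameter model, equation (2)
                v2x, v2y = cpmvs[2]
                mv_x = (v1x - v0x) / W * x + (v2x - v0x) / H * y + v0x
                mv_y = (v1y - v0y) / W * x + (v2y - v0y) / H * y + v0y
            # round to 1/16 fractional accuracy
            mvs[(x // sb, y // sb)] = (round(mv_x * 16) / 16,
                                       round(mv_y * 16) / 16)
    return mvs
```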
[0041] VTM supports the subblock-based temporal motion vector prediction (SbTMVP) method in VVC. Similar to the temporal motion vector prediction (TMVP) in HEVC, SbTMVP uses the motion field in the collocated picture to improve motion vector prediction and merge mode for CUs in the current picture. The same collocated picture used by TMVP is used for SbTMVP. SbTMVP differs from TMVP in the following two aspects:
1. TMVP predicts motion at the CU level, but SbTMVP predicts motion at a sub-CU level;
2. Whereas TMVP fetches the temporal motion vectors from the collocated block in the collocated picture (the collocated block is the bottom-right or center block relative to the current CU), SbTMVP applies a motion shift before fetching the temporal motion information from the collocated picture, where the motion shift is obtained from the motion vector from one of the spatial neighboring blocks of the current CU.
[0042] FIG. 6A illustrates a set of spatially neighboring blocks used by a subblock-based temporal motion vector prediction (SbTMVP) process in the context of Versatile Video Coding, and FIG. 6B illustrates a subblock-based temporal motion vector prediction
(SbTMVP) process for deriving a sub-Coding Unit (CU) motion field by applying a motion shift from a spatial neighbor, and scaling motion information from a corresponding collocated sub-CU. SbTMVP predicts the motion vectors of the sub-CUs within the current CU in two steps. In the first step, the spatial neighbors in FIG. 6A are examined in the order of A1 601, B1 604, B0 603, and A0 602. As soon as the first spatial neighboring block having a motion vector that uses the collocated picture as its reference picture is identified, this motion vector is selected as the motion shift to be applied. If no such motion is identified from the spatial neighbors, the motion shift is set to (0, 0).
[0043] In the second step, the motion shift identified in Step 1 is applied (i.e. added to the current block’s coordinates) to obtain sub-CU-level motion information (motion vectors and reference indices) from the collocated picture, as shown in FIG. 6B. The example in FIG. 6B assumes the motion shift is set to the motion of block A1 601. Then, for each sub-CU, the motion information of its corresponding block (the smallest motion grid that covers the center sample) in the collocated picture is used to derive the motion information for the sub-CU. After the motion information of the collocated sub-CU is identified, it is converted to the motion vectors and reference indices of the current sub-CU in a manner similar to the TMVP process of HEVC, where temporal motion scaling is applied to align the reference pictures of the temporal motion vectors to those of the current CU.
[0044] In VTM3, a combined sub-block-based merge list, which contains both the SbTMVP candidate and the affine merge candidates, is used for the signalling of sub-block-based merge mode. The SbTMVP mode is enabled/disabled by a sequence parameter set (SPS) flag. If the SbTMVP mode is enabled, the SbTMVP predictor is added as the first entry of the list of sub-block-based merge candidates, followed by the affine merge candidates. The size of the sub-block-based merge list is signalled in the SPS, and the maximum allowed size of the sub-block-based merge list is 5 in VTM3.
[0045] The sub-CU size used in SbTMVP is fixed to be 8x8 and, as is done for affine merge mode, SbTMVP mode is only applicable to CUs whose width and height are both greater than or equal to 8. The encoding logic for the additional SbTMVP merge candidate is the same as for the other merge candidates; that is, for each CU in a P or B slice, an additional RD check is performed to decide whether to use the SbTMVP candidate.
[0046] Profiles, tiers, and levels: Video Coding Standards such as H.264/AVC, H.265/HEVC and VVC are designed to be generic in the sense that they serve a wide range of applications, bit rates, resolutions, qualities and services. Applications should cover, among other things, digital storage media, television broadcasting, and real-time communications. In the course of creating this Specification, various requirements from typical applications have been considered, necessary algorithmic elements have been developed, and these have been integrated into a single syntax that includes a multiplicity of feature sets. These feature sets can be implemented independently, or in any of various combinations. Hence, this
Specification facilitates video data interchange among a variety of different applications.

[0047] Considering the practicality of implementing the full feature set of this Specification, however, it is possible to stipulate a limited number of subsets of these features by means of "profiles", "tiers" and "levels". A "profile" is a subset of the entire bitstream syntax that is specified in this Specification. Within the bounds imposed by the syntax of a given profile, it is still possible to have a very large variation in the performance of encoders and decoders, depending upon the values taken by syntax elements in the bitstream, such as the specified size of the decoded pictures. In many applications, it is currently neither practical nor economical to implement a decoder capable of dealing with all hypothetical uses of the syntax within a particular profile.
[0048] In order to overcome the issues inherent in implementing all hypothetical uses of the syntax of a given profile, "tiers" and "levels" are specified within each profile. A level of a tier is a specified set of constraints imposed on values of the syntax elements in the bitstream. These constraints may be simple limits on values. Alternatively, they may take the form of constraints on arithmetic combinations of values (e.g., picture width multiplied by picture height multiplied by number of pictures decoded per second). A level specified for a lower tier is more constrained than a level specified for a higher tier.
[0049] Due to the inherent nature of TMVP, all of the motion information for the reference pictures needs to be stored in order to perform temporal motion vector (MV) prediction. In HEVC, the smallest available block in which to store this motion information is 16x8/8x16. However, to reduce the size of the temporal MV buffer, a motion information compression scheme was introduced in HEVC. Pursuant to this approach, each picture is divided into 16x16 blocks. Only the motion information from the top-left 4x4 block in each 16x16 block is used as the representative motion for all of the 4x4 blocks within that 16x16 block. Since one 4x4 MV is stored to represent sixteen 4x4 blocks, this approach may be referred to as 16:1 MV compression.
[0050] FIG. 7A illustrates representative Motion Vectors (MVs) for the 16:1 MV compression used in High-Efficiency Video Coding (HEVC), and FIG. 7B illustrates representative Motion Vectors (MVs) for the 4:1 MV compression used in VTM-3. As shown in FIG. 7A, the representative 4x4 blocks for each 16x16 block are denoted as A 701, B 703, C 705 and D 707. In the current VVC (VTM-3.0), a 4:1 MV compression scheme is used. As shown in FIG. 7B, the MV of the top-left 4x4 block (denoted as A 711, B 713, C 715, ..., P 717) of each 8x8 block is used to represent the MVs of all of the 4x4 blocks within the same 8x8 block.
[0051] In the current version of VVC, higher MV precision for MV storage generally requires a larger MV buffer to store the MVs. We propose several methods to reduce the size of the MV buffer when a higher MV precision (e.g. 1/8- or 1/16-pel) is enabled. In another aspect, using a lower MV precision for MV storage can increase the valid range of the stored MVs when a fixed number of bits (e.g. 16 bits per MV component) is used to store the MVs.
[0052] Moreover, motion compensation (MC) is usually the largest consumer of memory access bandwidth in a decoder implementation. Therefore, for a video coding standard, placing a reasonable limit on the MC memory access bandwidth requirement is extremely important to facilitate cost-effective implementation and to ensure its success across the industry. The memory access bandwidth requirement for MC is usually determined by the operating block size and the type of prediction (e.g. uni-directional or bi-directional) to be performed. In the current version of VVC, there is no limitation in this regard. As a result, the worst-case bandwidth requirement is more than 2x the corresponding worst-case bandwidth of HEVC. In VVC, the worst case of MC memory access bandwidth occurs with bi-directional MC of a 4x4 block, which is utilized by some coding modes described in greater detail hereinafter.
[0053] Proposed Methods
[0054] It is noted that the proposed methods described herein could be applied independently, or in any of various combinations.
[0055] Adaptive MV Compression
[0056] FIG. 8A illustrates a representative Motion Vector (MV) for Vertical 8:1 MV compression, and FIG. 8B illustrates a representative Motion Vector (MV) for Horizontal 8:1 MV compression. In order to provide an improved tradeoff between the required size of the TMVP buffer and coding efficiency, we propose using either of two compression schemes, denoted as Horizontal 8:1 MV compression and Vertical 8:1 MV compression. As shown in FIG. 8A, for a first 16x8/8x16 block 801, the MV of a top-left 4x4 block 811 is used as the representative MV. Likewise, as shown in FIG. 8B, for a second 16x8/8x16 block 802, the MV of a top-left 4x4 block 821 is used as the representative MV. Further, we propose to apply any of a plurality of different-ratio temporal MV compression schemes (e.g. 16:1, 4:1, Horizontal 8:1 or Vertical 8:1) in response to one or more video parameters, such as a picture resolution (sometimes referred to as a picture size), a profile, or a parameter level.
[0057] Pursuant to one set of examples, 4:1 or 16:1 MV compression is applied to the temporal MV buffer in response to any of the picture resolution, the profile, or the parameter level. In one exemplary implementation, when the picture resolution is smaller than or equal to (1280x720), 4:1 MV compression is applied to the temporal MV buffer. When the picture resolution is larger than (1280x720), 16:1 MV compression is applied to the temporal MV buffer.
[0058] Pursuant to another set of examples, 4:1 or Vertical 8:1 MV compression is applied to the temporal MV buffer in response to the picture resolution, the profile, or the parameter level. In one exemplary implementation, for a picture resolution smaller than or equal to (1280x720), 4:1 MV compression is applied to the temporal MV buffer. For a picture resolution larger than (1280x720), Vertical 8:1 MV compression is applied to the temporal MV buffer.
[0059] Adaptive MV Precision for MV Storage
[0060] We propose to store the MVs in the MV buffers at a predefined or signaled MV precision. Pursuant to one set of illustrative examples, each MV is stored in the MV buffers at a respective predefined MV precision in response to one or more video parameters, such as the picture resolution (sometimes referred to as picture size), the profile, or the parameter level. It should be noted that the MV buffers referred to herein include any of the spatial MV buffer, the temporal MV buffer, or the spatial MV line buffer. According to the proposed examples, each of a plurality of respective MV precision levels may be used to store the MVs in any of a plurality of corresponding MV buffers. Moreover, the MV precision level used for MV storage may be selected in response to a corresponding picture resolution.
[0061] Pursuant to one set of examples, when a high level of MV precision is enabled (e.g. 1/8- or 1/16-pel), the proposed method stores the MVs used for temporal MV prediction at any of a plurality of different MV precisions, such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel or 1-pel, based upon the picture resolution, the profile, or the parameter level. Specifically, when all of the CUs within one picture/slice are reconstructed, the MVs of each of the CUs are stored in a buffer (termed the temporal MV buffer) to be used for temporal MV prediction of one or more following pictures/slices. We propose to store each respective MV in the temporal MV buffer at a corresponding MV precision in response to the picture resolution, the profile, or the parameter level. For example, when the picture resolution is less than or equal to (1280x720), 1/16-pel MV precision is used to store the MVs in the temporal MV buffer. When the picture resolution is greater than (1280x720), 1/4-pel MV precision is used to store the MVs in the temporal MV buffer.
[0062] In another set of examples, the size of the MV line buffer is reduced by storing the MVs used for spatial MV prediction across a CTU row at any of a plurality of different MV precisions, such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel or 1-pel, in response to any of the picture resolution, the profile, or the parameter level. In yet another set of examples, each of the MVs stored in the spatial MV buffer is stored at any of a plurality of different MV precisions, such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel or 1-pel, in response to the picture resolution, the profile, or the parameter level. In other words, some of the MVs generated by the averaging or scaling process may have a higher MV precision (1/16-pel or 1/8-pel), but the MVs stored in the spatial MV buffers for MV prediction are stored at a different, and possibly lower, MV precision. If stored at such a lower precision, the buffer size may be reduced.
[0063] In yet another set of examples, each of the MVs stored in the MV buffers is stored at any of a plurality of different MV precisions, such as 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel, in response to the picture resolution, the profile, or the parameter level. In other words, the MVs generated by the averaging or scaling process may have a higher MV precision (1/16-pel or 1/8-pel), but the MVs stored in each of the MV buffers for MV prediction are kept at a different, and possibly lower, MV precision. If stored at such a lower precision, the buffer size may be reduced.
[0064] In yet another set of examples, the MV precision used to store the MVs in the history MV table (also referred to as a history MV buffer) may differ from the MV precision used to store the MVs in the temporal MV buffer, the spatial MV buffer, or the MV line buffer. For example, a higher MV precision (e.g. 1/16-pel) can be used to store the MVs in the history MV buffer, even when a lower MV precision is used to store the MVs in the temporal MV buffer or the spatial MV buffer.
[0065] Smallest Block Size for Motion Compensation
[0066] Pursuant to one set of examples, a smallest block size for motion compensation is determined in response to video parameters such as the picture resolution (also referred to as the picture size), the profile, or the parameter level. In one example, a 4x4 block is available for motion compensation for each picture having a resolution smaller than or equal to (1280x720), and a 4x4 block is NOT available for motion compensation for each picture having a resolution larger than (1280x720). These block size constraints may also include a subblock size constraint for subblock-based inter modes, such as affine motion mode and subblock-based temporal motion vector prediction.
[0067] In one example, the smallest block size for motion compensation is determined according to video parameters such as the picture resolution (also referred to as the picture size), the profile, or the parameter level. In one example, a 4x4 block is available for both uni-directional and bi-directional motion compensation for each picture having a resolution smaller than or equal to (1280x720), and a 4x4 block is NOT available for bi-directional motion compensation for each picture having a resolution larger than (1280x720). The block size constraints may also include a subblock size constraint for subblock-based inter modes, such as affine motion mode and subblock-based temporal motion vector prediction.
[0068] According to a first aspect of the present disclosure, a video coding method is performed at a computing device having one or more processors and memory storing a plurality of programs to be executed by the one or more processors. The method includes selecting a first temporal motion vector prediction compression scheme in response to any of a first picture resolution, a first profile, or a first level; and selecting a second temporal motion vector prediction compression scheme in response to any of a second picture resolution, a second profile, or a second level.
[0069] In some examples, the first temporal motion vector compression scheme uses a first compression ratio, and the second temporal motion vector compression scheme uses a second compression ratio different from the first compression ratio.
[0070] In some examples, the first compression ratio is selected to be smaller than the second compression ratio in response to the first picture resolution being smaller than or equal to the second picture resolution.
[0071] In some examples, the first compression ratio is selected to be larger than the second compression ratio in response to the first picture resolution being greater than the second picture resolution.
[0072] In some examples, the first compression ratio comprises at least one of 16:1, 4:1, Horizontal 8:1, or Vertical 8:1.
[0073] According to a second aspect of the present disclosure, a video coding method is performed at a computing device having one or more processors and memory storing a plurality of programs to be executed by the one or more processors. The method includes selecting a first motion vector precision level for storing a first motion vector in a motion vector buffer, wherein the selecting is performed in response to any of a first picture resolution, a first profile, or a first level associated with a first picture; and selecting a second motion vector precision level for storing a second motion vector in the motion vector buffer, wherein the selecting is performed in response to any of a second picture resolution, a second profile, or a second level associated with a second picture; wherein the first motion vector precision level is different from the second motion vector precision level.

[0074] In some examples, the motion vector buffer comprises at least one of a spatial motion vector buffer, a temporal motion vector buffer, or a spatial motion vector line buffer.
[0075] In some examples, the first motion vector precision level comprises any of 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel.
[0076] In some examples, a plurality of coding units are reconstructed within the first picture or within a slice of the first picture; each of a plurality of motion vectors for each of the plurality of coding units is stored in the temporal motion vector buffer; and the temporal motion vector buffer is used to perform a prediction for one or more successive pictures or successive slices that follow the first picture or the slice of the first picture.
[0077] In some examples, the first motion vector precision level is selected to be smaller than the second motion vector precision level in response to the first picture resolution being smaller than or equal to the second picture resolution.
[0078] In some examples, the spatial motion vector line buffer stores a plurality of motion vectors across a coding tree unit, the plurality of motion vectors including at least the first and second motion vectors, wherein the first motion vector is stored in the spatial motion vector line buffer at the first motion vector precision level, and the second motion vector is stored in the spatial motion vector line buffer at the second motion vector precision level.
[0079] In some examples, an averaging or scaling process generates one or more motion vectors including at least the first motion vector. The one or more motion vectors are generated at a first motion vector precision level. The one or more motion vectors are stored in the spatial motion vector line buffer at the second motion vector precision level.
[0080] In some examples, the second motion vector precision level is selected to be less than the first motion vector precision level.
[0081] In some examples, an averaging or scaling process generates one or more motion vectors including at least the first motion vector. The one or more motion vectors are generated at a first motion vector precision level. The one or more motion vectors are stored in the spatial motion vector buffer, the temporal motion vector buffer, and the spatial motion vector line buffer at the second motion vector precision level.
[0082] In some examples, the second motion vector precision level is selected to be less than the first motion vector precision level.
[0083] In some examples, a history motion vector buffer stores a plurality of motion vectors, including at least the first motion vector, at the first motion vector precision level. The plurality of motion vectors are stored in at least one of the spatial motion vector buffer, the temporal motion vector buffer, or the spatial motion vector line buffer, at the second motion vector precision level.
[0084] According to a third aspect of the present disclosure, a video coding method is performed at a computing device having one or more processors and memory storing a plurality of programs to be executed by the one or more processors. The method includes selecting a first minimum allowable block size for performing motion compensation, wherein the selecting is performed in response to any of a first picture resolution, a first profile, or a first level associated with a first picture; and selecting a second minimum allowable block size for performing motion compensation, wherein the selecting is performed in response to any of a second picture resolution, a second profile, or a second level associated with a second picture; wherein the first minimum allowable block size is different from the second minimum allowable block size.
[0085] In some examples, the first minimum allowable block size and the second minimum allowable block size are selected in response to a subblock size constraint for at least one of affine motion prediction or subblock-based temporal motion vector prediction.
[0086] In some examples, the first minimum allowable block size and the second minimum allowable block size are selected in response to at least one constraint for performing bi-directional or uni-directional motion compensation.
[0087] In some examples, the first minimum allowable block size is greater than a 4 x 4 block when the first picture has a first picture resolution larger than 1280 x 720.
[0088] In one or more examples, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the implementations described in the present application. A computer program product may include a computer-readable medium.

[0089] Further, the above methods may be implemented using an apparatus that includes one or more circuitries, which include application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components. The apparatus may use the circuitries in combination with the other hardware or software components for performing the above-described methods. Each module, sub-module, unit, or sub-unit disclosed above may be implemented at least partially using the one or more circuitries.
[0090] Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the invention following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
[0091] It will be appreciated that the present invention is not limited to the exact examples described above and illustrated in the accompanying drawings, and that various modifications and changes can be made without departing from the scope thereof. It is intended that the scope of the invention only be limited by the appended claims.

Claims

We Claim:
1. A video coding method comprising:
selecting a first temporal motion vector prediction compression scheme in response to any of a first picture resolution, a first profile, or a first level; and
selecting a second temporal motion vector prediction compression scheme in response to any of a second picture resolution, a second profile, or a second level.
2. The video coding method of claim 1, wherein the first temporal motion vector
compression scheme uses a first compression ratio, and the second temporal motion vector compression scheme uses a second compression ratio different from the first compression ratio.
3. The video coding method of claim 2, further comprising selecting the first compression ratio to be smaller than the second compression ratio in response to the first picture resolution being smaller than or equal to the second picture resolution.
4. The video coding method of claim 2, further comprising selecting the first compression ratio to be larger than the second compression ratio in response to the first picture resolution being greater than the second picture resolution.
5. The video coding method of claim 2, wherein the first compression ratio comprises at least one of 16:1, 4:1, Horizontal 8:1, or Vertical 8:1.
6. A video coding method comprising:
selecting a first motion vector precision level for storing a first motion vector in a motion vector buffer, wherein the selecting is performed in response to any of a first picture resolution, a first profile, or a first level associated with a first picture; and
selecting a second motion vector precision level for storing a second motion vector in the motion vector buffer, wherein the selecting is performed in response to any of a second picture resolution, a second profile, or a second level associated with a second picture;
wherein the first motion vector precision level is different from the second motion vector precision level.
7. The video coding method of claim 6, wherein the motion vector buffer comprises at least one of a spatial motion vector buffer, a temporal motion vector buffer, or a spatial motion vector line buffer.
8. The video coding method of claim 7, wherein the first motion vector precision level comprises any of 1/16-pel, 1/8-pel, 1/4-pel, 1/2-pel, or 1-pel.
9. The video coding method of claim 8, further comprising:
reconstructing a plurality of coding units within the first picture or within a slice of the first picture;
storing each of a plurality of motion vectors for each of the plurality of coding units in the temporal motion vector buffer; and
using the temporal motion vector buffer to perform a prediction for one or more successive pictures or successive slices that follow the first picture or the slice of the first picture.
10. The video coding method of claim 8, further comprising selecting the first motion vector precision level to be smaller than the second motion vector precision level in response to the first picture resolution being smaller than or equal to the second picture resolution.
11. The video coding method of claim 7, further comprising utilizing the spatial motion vector line buffer to store a plurality of motion vectors across a coding tree unit, the plurality of motion vectors including at least the first and second motion vectors, wherein the first motion vector is stored in the spatial motion vector line buffer at the first motion vector precision level, and the second motion vector is stored in the spatial motion vector line buffer at the second motion vector precision level.
12. The video coding method of claim 7, further comprising using an averaging or scaling process to generate one or more motion vectors including at least the first motion vector, the one or more motion vectors being generated at a first motion vector precision level, and storing the one or more motion vectors in the spatial motion vector line buffer at the second motion vector precision level.
13. The video coding method of claim 12, further comprising selecting the second motion vector precision level to be less than the first motion vector precision level.
14. The video coding method of claim 7, further comprising using an averaging or scaling process to generate one or more motion vectors including at least the first motion vector, the one or more motion vectors being generated at a first motion vector precision level, and storing the one or more motion vectors in the spatial motion vector buffer, the temporal motion vector buffer, and the spatial motion vector line buffer at the second motion vector precision level.
15. The video coding method of claim 14, further comprising selecting the second motion vector precision level to be less than the first motion vector precision level.
16. The video coding method of claim 7, further comprising using a history motion vector buffer to store a plurality of motion vectors, including at least the first motion vector, at the first motion vector precision level; and storing the plurality of motion vectors in at least one of the spatial motion vector buffer, the temporal motion vector buffer, or the spatial motion vector line buffer, at the second motion vector precision level.
17. A video coding method comprising:
selecting a first minimum allowable block size for performing motion compensation, wherein the selecting is performed in response to any of a first picture resolution, a first profile, or a first level associated with a first picture; and
selecting a second minimum allowable block size for performing motion compensation, wherein the selecting is performed in response to any of a second picture resolution, a second profile, or a second level associated with a second picture;
wherein the first minimum allowable block size is different from the second minimum allowable block size.
18. The video coding method of claim 17, further comprising selecting the first minimum allowable block size and the second minimum allowable block size in response to a subblock size constraint for at least one of affine motion prediction or subblock-based temporal motion vector prediction.
19. The video coding method of claim 17, further comprising selecting the first minimum allowable block size and the second minimum allowable block size in response to at least one constraint for performing bi-directional or uni-directional motion compensation.
20. The video coding method of claim 17, wherein the first minimum allowable block size is greater than a 4 x 4 block when the first picture has a first picture resolution larger than 1280 x 720.
PCT/US2019/069009 2018-12-31 2019-12-30 Picture resolution dependent configurations for video coding WO2020142468A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201980092938.1A CN113498609B (en) 2018-12-31 2019-12-30 Picture resolution dependent configuration for video codec

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862787240P 2018-12-31 2018-12-31
US62/787,240 2018-12-31

Publications (1)

Publication Number Publication Date
WO2020142468A1 true WO2020142468A1 (en) 2020-07-09

Family

ID=71407416

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/069009 WO2020142468A1 (en) 2018-12-31 2019-12-30 Picture resolution dependent configurations for video coding

Country Status (2)

Country Link
CN (1) CN113498609B (en)
WO (1) WO2020142468A1 (en)

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3152765B2 (en) * 1991-10-31 2001-04-03 株式会社東芝 Image coding device
US6487249B2 (en) * 1998-10-09 2002-11-26 Matsushita Electric Industrial Co., Ltd. Efficient down conversion system for 2:1 decimation
KR100962759B1 (en) * 2002-01-24 2010-06-09 가부시키가이샤 히타치세이사쿠쇼 Moving picture signal coding method and decoding method
JP2008053875A (en) * 2006-08-23 2008-03-06 Sony Corp Image processor and method, program, and program storage medium
KR20110017302A (en) * 2009-08-13 2011-02-21 삼성전자주식회사 Method and apparatus for encoding/decoding image by using motion vector accuracy control
US8594200B2 (en) * 2009-11-11 2013-11-26 Mediatek Inc. Method of storing motion vector information and video decoding apparatus
KR101752418B1 (en) * 2010-04-09 2017-06-29 엘지전자 주식회사 A method and an apparatus for processing a video signal
GB2488817B (en) * 2011-03-09 2014-11-26 Canon Kk Video encoding and decoding
JP2013012860A (en) * 2011-06-28 2013-01-17 Sony Corp Image processing device and method
EP3217663A4 (en) * 2014-11-06 2018-02-14 Samsung Electronics Co., Ltd. Video encoding method and apparatus, and video decoding method and apparatus
WO2017156669A1 (en) * 2016-03-14 2017-09-21 Mediatek Singapore Pte. Ltd. Methods for motion vector storage in video coding
US10979732B2 (en) * 2016-10-04 2021-04-13 Qualcomm Incorporated Adaptive motion vector precision for video coding
CN116866583A (en) * 2017-05-17 2023-10-10 株式会社Kt Method for decoding and encoding video and apparatus for storing compressed video data

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060018381A1 (en) * 2004-07-20 2006-01-26 Dexiang Luo Method and apparatus for motion vector prediction in temporal video compression
US20140016701A1 (en) * 2012-07-09 2014-01-16 Qualcomm Incorporated Temporal motion vector prediction in video coding extensions
WO2016165069A1 (en) * 2015-04-14 2016-10-20 Mediatek Singapore Pte. Ltd. Advanced temporal motion vector prediction in video coding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BENJAMIN BROSS ET AL.: "Versatile Video Coding (Draft 3)", JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, JVET-L1001-V9, 12TH MEETING, 12 October 2018 (2018-10-12), Macao, CN, pages 1 - 233, XP030198629, Retrieved from the Internet <URL:http://phenix.int-evry.fr/jvet> [retrieved on 20200514] *
HAITAO YANG ET AL.: "CE4: Summary report on inter prediction and motion vector coding", JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, JVET-L0024-V2, 12TH MEETING, 12 October 2018 (2018-10-12), Macao, CN, pages 1 - 48, XP030192387, Retrieved from the Internet <URL:http://phenix.int-evry.fr/jvet> [retrieved on 20200514] *

Also Published As

Publication number Publication date
CN113498609A (en) 2021-10-12
CN113498609B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
AU2018283967B2 (en) Motion vector prediction
AU2017340631B2 (en) Motion vector prediction for affine motion models in video coding
US20180160137A1 (en) Method for encoding and decoding image information and device using same
WO2019002166A1 (en) Method and apparatus for most probable mode (mpm) sorting and signaling in video encoding and decoding
EP3479576A1 (en) Method and apparatus for video coding with automatic motion information refinement
EP3479577A1 (en) Video coding with adaptive motion information refinement
CN110677658B (en) Non-adjacent Merge design based on priority
EP3756352A1 (en) Simplified local illumination compensation
WO2019002169A1 (en) Method and apparatus for most probable mode (mpm) sorting and signaling in video encoding and decoding
WO2020118191A1 (en) Spatio-temporal motion vector prediction patterns for video coding
WO2018206396A1 (en) Method and apparatus for intra prediction in video encoding and decoding
US11871034B2 (en) Intra block copy for screen content coding
WO2020142468A1 (en) Picture resolution dependent configurations for video coding
CN110677650A (en) Reducing complexity of non-adjacent Merge designs
WO2024076549A1 (en) Candidate list selection for template matching prediction
WO2024010943A1 (en) Template matching prediction with block vector difference refinement
WO2023133245A1 (en) Boundary based asymmetric reference line offsets
WO2023055840A1 (en) Decoder-side intra prediction mode derivation with extended angular modes
WO2023055710A1 (en) Adaptive offset multiple reference line coding
WO2023076700A1 (en) Motion compensation considering out-of-boundary conditions in video coding
WO2023205283A1 (en) Methods and devices for enhanced local illumination compensation
WO2023287967A1 (en) Position dependent reference sample smoothing for multiple reference lines
WO2023287966A1 (en) Position dependent reference sample smoothing
WO2023101990A1 (en) Motion compensation considering out-of-boundary conditions in video coding
WO2023014478A1 (en) Mode dependent intra smoothing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19907525

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19907525

Country of ref document: EP

Kind code of ref document: A1