CN118101969A - Method and apparatus for syntax processing for video codec system - Google Patents


Info

Publication number
CN118101969A
Authority
CN
China
Prior art keywords
current
block
motion vector
sub
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410210285.5A
Other languages
Chinese (zh)
Inventor
赖贞延
林芷仪
陈庆晔
庄子德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MediaTek Inc
Original Assignee
MediaTek Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MediaTek Inc filed Critical MediaTek Inc
Publication of CN118101969A publication Critical patent/CN118101969A/en
Pending legal-status Critical Current

Classifications

    • H04N19/53 Multi-resolution motion estimation; Hierarchical motion estimation
    • H04N7/24 Systems for the transmission of television signals using pulse code modulation
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/109 Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/184 Adaptive coding characterised by the coding unit, the unit being bits, e.g. of the compressed video stream
    • H04N19/523 Motion estimation or motion compensation with sub-pixel accuracy
    • H04N19/593 Predictive coding involving spatial prediction techniques
    • H04N19/70 Characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/96 Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a method and apparatus for a video coding system using the current picture referencing (CPR) mode. According to one method, when the current reference picture is equal to the current picture, the integer motion vector flag is inferred to be true without signaling or parsing the integer motion vector flag. According to another method, when all motion vector differences of the current block are equal to zero, the integer motion vector flag is inferred to be true without signaling or parsing the integer motion vector flag. According to yet another method, when all reference pictures of the current block are equal to the current picture, the sub-block prediction coding mode is disabled and the current block is encoded or decoded with the sub-block prediction coding mode disabled. Alternatively, the derived motion vectors associated with the sub-blocks of the current block may be converted to integer motion vectors.

Description

Method and apparatus for syntax processing for video codec system
Related references
The present invention claims priority to U.S. provisional patent application No. 62/629,204, filed on February 12, 2018, U.S. provisional patent application No. 62/742,474, filed on October 8, 2018, and U.S. provisional patent application No. 62/747,170, filed on October 18, 2018, the disclosures of which are incorporated herein by reference in their entirety.
Technical Field
The present invention relates to video coding using the current picture referencing (CPR) coding tool. More specifically, the present invention discloses syntax signaling for a coding system that uses the CPR coding tool along with other coding tools, such as adaptive motion vector resolution (AMVR), sub-block based temporal motion vector prediction (sbTMVP), or affine prediction.
Background
The High Efficiency Video Coding (HEVC) standard was developed under the joint video project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardization bodies, in a partnership known as the Joint Collaborative Team on Video Coding (JCT-VC). In HEVC, one slice is partitioned into multiple coding tree units (CTUs). In the main profile, the minimum and maximum sizes of a CTU are specified by syntax elements in the sequence parameter set (SPS). The allowed CTU size can be 8x8, 16x16, 32x32, or 64x64. For each slice, the CTUs within the slice are processed in raster scan order.
The CTU is further partitioned into multiple coding units (CUs) to adapt to various local characteristics. A quadtree, denoted as the coding tree, is used to partition the CTU into multiple CUs. Let the CTU size be MxM, where M is one of the values 64, 32, or 16. The CTU can be a single CU (i.e., no splitting) or can be split into four smaller units of equal size (i.e., M/2 x M/2 each), which correspond to the nodes of the coding tree. If a unit is a leaf node of the coding tree, it becomes a CU. Otherwise, the quadtree splitting process can be iterated until the size of a node reaches the minimum allowed CU size specified in the SPS. This representation results in a recursive structure specified by the coding tree (also referred to as a partition tree structure) 120 in Fig. 1. The partitioning of CTU 110 is shown in Fig. 1, where the solid lines indicate CU boundaries. The decision whether to code a picture area using inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level. Since the minimum CU size can be 8x8, the minimum granularity for switching between different basic prediction types is 8x8.
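The recursive CTU-to-CU quadtree split described above can be sketched as follows. This is an illustrative sketch only: the split decision (here a simple callback) would in practice come from rate-distortion optimization, and all function and parameter names are this sketch's own, not standard syntax.

```python
# Hypothetical sketch of the recursive quadtree split of a CTU into CUs.
# A node either becomes a leaf CU or splits into four equal quadrants,
# iterating until the minimum allowed CU size is reached.

def quadtree_split(x, y, size, min_cu_size, split_decision):
    """Return the list of leaf CUs as (x, y, size) tuples."""
    if size <= min_cu_size or not split_decision(x, y, size):
        return [(x, y, size)]          # leaf node becomes a CU
    half = size // 2                   # split into four M/2 x M/2 quadrants
    cus = []
    for dy in (0, half):
        for dx in (0, half):
            cus += quadtree_split(x + dx, y + dy, half,
                                  min_cu_size, split_decision)
    return cus

# Example: split a 64x64 CTU once at the top level only.
leaves = quadtree_split(0, 0, 64, 8, lambda x, y, s: s == 64)
print(leaves)  # four 32x32 CUs
```

With the minimum CU size of 8x8 noted above, recursion stops automatically once a node reaches 8x8 even if the decision callback asks for a further split.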
Furthermore, according to HEVC, each CU can be partitioned into one or more prediction units (PUs). Together with the CU, the PU works as a basic representative block for sharing prediction information. Inside each PU, the same prediction process is applied, and the relevant information is transmitted to the decoder on a PU basis. A CU can be split into one, two, or four PUs according to the PU partition type. As shown in Fig. 2, HEVC defines eight shapes for splitting a CU into PUs, including the partition types 2Nx2N, 2NxN, Nx2N, NxN, 2NxnU, 2NxnD, nLx2N, and nRx2N. Unlike the CU, the PU can be split only once according to HEVC. The partitions shown in the second row correspond to asymmetric partitions, where the two partitioned parts have different sizes.
After obtaining the residual block through the prediction process based on the PU partition type, the prediction residual of the CU is partitioned into Transform Units (TUs) according to another quadtree structure similar to the coding tree of the CU as shown in fig. 1. The solid lines represent CU boundaries and the dashed lines represent TU boundaries. A TU is a basic representative block with residual or transform coefficients for applying integer transform (integer transform) and quantization. For each TU, an integer transform of the same size is applied to the TU to obtain residual coefficients. These coefficients are transmitted to a decoder after TU-based quantization.
The terms coding tree block (CTB), coding block (CB), prediction block (PB), and transform block (TB) are defined to specify the 2-D sample array of one color component associated with a CTU, CU, PU, and TU, respectively. Thus, a CTU consists of one luma CTB, two chroma CTBs, and associated syntax elements. A similar relationship is valid for the CU, PU, and TU. The tree partitioning is generally applied simultaneously to both luma and chroma, though exceptions apply when certain minimum sizes are reached for chroma.
Alternatively, in JCTVC-P1005 (D. Flynn, et al., "HEVC Range Extensions Draft 6", Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 16th Meeting: San Jose, US, 9–17 January 2014, Document: JCTVC-P1005), a binary tree block partitioning structure is proposed. In the proposed binary tree partitioning structure, a block can be recursively split into two smaller blocks using various binary splitting types, as shown in Fig. 3. The most efficient and simplest ones are the symmetric horizontal and vertical splitting types, shown as the first two splitting types in Fig. 3. For a given block of size MxN, a flag is signaled to indicate whether the given block is split into two smaller blocks. If yes, another syntax element is signaled to indicate which splitting type is used. If the horizontal splitting is used, the given block is split into two blocks of size Mx(N/2). Otherwise, the given block is split into two blocks of size (M/2)xN. The binary tree splitting process can be iterated; since the binary tree has two splitting types (i.e., horizontal and vertical), the minimum allowed block width and block height should both be indicated. A value 0 may indicate a horizontal split and a value 1 may indicate a vertical split.
A binary tree structure can be used to partition an image area into multiple smaller blocks, for example, partitioning a slice into CTUs, a CTU into CUs, a CU into PUs, or a CU into TUs, and so on. A binary tree can be used to partition a CTU into CUs, where the root node of the binary tree is the CTU and the leaf nodes of the binary tree are the CUs. The leaf nodes can be further processed by prediction and transform coding. For simplification, there is no further partitioning from CU to PU or from CU to TU, which means that CU is equal to PU and PU is equal to TU. Therefore, the leaf nodes of the binary tree are the basic units for prediction and transform coding.
The binary tree structure is more flexible than the quadtree structure since more partition shapes can be supported, which is also a source of coding efficiency improvement. However, the encoding complexity also increases in order to select the best partition shape. To balance complexity and coding efficiency, a method combining the quadtree and binary tree structures, known as the quadtree plus binary tree (QTBT) structure, has been disclosed. According to the QTBT structure, a block is first partitioned by a quadtree structure, and the quadtree partitioning can be iterated until the size of the partitioned block reaches the minimum allowed quadtree leaf node size. If the leaf quadtree block is not larger than the maximum allowed binary tree root node size, it can be further partitioned by a binary tree structure, and the binary tree partitioning can be iterated until the size (width or height) of the partitioned block reaches the minimum allowed binary tree leaf node size (width or height) or the binary tree depth reaches the maximum allowed binary tree depth. In the QTBT structure, the minimum allowed quadtree leaf node size, the maximum allowed binary tree root node size, the minimum allowed binary tree node width and height, and the maximum allowed binary tree depth can be indicated in the high-level syntax, such as in the SPS. Fig. 5 shows an example of the partitioning of block 510 and its corresponding QTBT 520. The solid lines indicate quadtree splitting and the dashed lines indicate binary tree splitting. In each splitting node (i.e., non-leaf node) of the binary tree, one flag indicates which splitting type (horizontal or vertical) is used: 0 may indicate horizontal splitting and 1 may indicate vertical splitting.
The QTBT structure described above can be used to partition an image area (e.g., a slice, CTU, or CU) into multiple smaller blocks, such as partitioning a slice into CTUs, a CTU into CUs, a CU into PUs, or a CU into TUs, and so on. For example, QTBT can be used to partition a CTU into CUs, where the root node of the QTBT is the CTU, which is partitioned into multiple CUs by the QTBT structure, and the CUs are then further processed by prediction and transform coding. For simplification, there is no further partitioning from CU to PU or from CU to TU; this means that CU is equal to PU and PU is equal to TU. Therefore, the leaf nodes of the QTBT structure are the basic units for prediction and transform.
An example of the QTBT structure is shown as follows. For a CTU of size 128x128, the minimum allowed quadtree leaf node size is set to 16x16, the maximum allowed binary tree root node size is set to 64x64, the minimum allowed binary tree leaf node width and height are both set to 4, and the maximum allowed binary tree depth is set to 4. First, the CTU is partitioned by a quadtree structure, and a leaf quadtree unit may have a size from 16x16 (i.e., the minimum allowed quadtree leaf node size) to 128x128 (equal to the CTU size, i.e., no splitting). If the leaf quadtree unit is 128x128, it cannot be further split by the binary tree since its size exceeds the maximum allowed binary tree root node size of 64x64. Otherwise, the leaf quadtree unit can be further split by the binary tree. The leaf quadtree unit is also the root binary tree unit, and its binary tree depth is 0. When the binary tree depth reaches 4 (i.e., the maximum allowed binary tree depth as indicated), no further splitting is implied. When the width of the block of the corresponding binary tree node is equal to 4, no vertical splitting is implied; when the height of the block is equal to 4, no horizontal splitting is implied. The leaf nodes of the QTBT are further processed by prediction (intra-picture or inter-picture) and transform coding.
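The size and depth constraints in this example can be sketched as a small helper function. This is a sketch under the common convention that a vertical binary split halves the width and a horizontal binary split halves the height (naming conventions differ across documents); all names (allowed_splits, in_qt_phase, etc.) are this sketch's own, not standard syntax.

```python
# Illustrative check of the QTBT constraints from the example above:
# CTU 128x128, MinQTSize 16, MaxBTSize 64, MinBTSize 4, MaxBTDepth 4.

MIN_QT_SIZE, MAX_BT_SIZE, MIN_BT_SIZE, MAX_BT_DEPTH = 16, 64, 4, 4

def allowed_splits(width, height, bt_depth, in_qt_phase):
    splits = []
    # Quadtree splitting is only possible for square blocks above MinQTSize,
    # and only before any binary split has been made.
    if in_qt_phase and width == height and width > MIN_QT_SIZE:
        splits.append("quad")
    # A block may (continue to) binary-split only if it does not exceed the
    # maximum binary tree root node size and the depth limit is not reached.
    can_bt = (width <= MAX_BT_SIZE and height <= MAX_BT_SIZE
              and bt_depth < MAX_BT_DEPTH)
    if can_bt and width > MIN_BT_SIZE:
        splits.append("bt_vertical")     # halves the width
    if can_bt and height > MIN_BT_SIZE:
        splits.append("bt_horizontal")   # halves the height
    return splits

print(allowed_splits(128, 128, 0, True))   # only a quadtree split is possible
print(allowed_splits(32, 32, 0, True))     # quad or either binary split
print(allowed_splits(8, 4, 2, False))      # height is at minimum already
```

Note how the 128x128 leaf quadtree unit from the example is barred from binary splitting because it exceeds the 64x64 maximum binary tree root node size.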
For I-slices, the QTBT tree structure is typically applied with separate luma/chroma coding. For example, the QTBT tree structure is applied separately to the luma and chroma components for I-slices, and applied simultaneously to both luma and chroma for P- and B-slices (except when certain minimum sizes are reached for chroma). In other words, in an I-slice, the luma CTB has its QTBT-structured block partitioning, and the two chroma CTBs have another QTBT-structured block partitioning. In another example, the two chroma CTBs may also each have their own QTBT-structured block partitioning.
For block-based coding, it is always necessary to partition an image into blocks (e.g., CUs, PUs, and TUs) for the coding purpose. As known in the field, the image may be partitioned into smaller image areas, such as slices, tiles, CTU rows, or CTUs, before the block partitioning is applied. The process of partitioning an image into blocks for the coding purpose is referred to as partitioning the image using a coding unit (CU) structure. The particular partitioning method adopted by HEVC to generate CUs, PUs, and TUs is one example of a CU structure. The QTBT tree structure is another example of a CU structure.
Current image reference
Motion estimation/compensation is a well-known key technology in hybrid video coding, which exploits the pixel correlation between adjacent pictures. In a video sequence, the object movement between neighboring frames is small, and the object movement can be modeled as a two-dimensional translational motion. Accordingly, the patterns corresponding to objects or background in a frame are displaced to form corresponding objects in a subsequent frame, or correlated with other patterns within the current frame. With the estimation of the displacement (e.g., using block matching techniques), the pattern can mostly be reproduced without the need to re-code it. Similarly, block matching and copying have been tried to allow the reference block to be selected from within the same picture. This concept was observed to be inefficient when applied to videos captured by a camera. Part of the reason is that the texture pattern in a spatially neighboring area may be similar to the current coding block but usually exhibits some gradual change over space. It is thus difficult for a block to find an exact match within the same picture of camera-captured video, and the improvement in coding performance is therefore limited.
However, the spatial correlation among pixels within the same picture is different for screen content. For a typical video with text and graphics, there are usually repetitive patterns within the same picture. Hence, intra (picture) block compensation has been observed to be very effective. A new prediction mode, the intra block copy (IBC) mode, also referred to as current picture referencing (CPR), has been introduced for screen content coding to utilize this characteristic. In the CPR mode, a prediction unit (PU) is predicted from a previously reconstructed block within the same picture. Further, a displacement vector (called a block vector, or BV) is used to signal the relative displacement from the position of the current block to that of the reference block. The prediction errors are then coded using transform, quantization, and entropy coding. An example of CPR compensation is illustrated in Fig. 6, where area 610 corresponds to a picture, a slice, or a picture area to be coded, and blocks 620 and 630 correspond to two blocks to be coded. In this example, each block can find a corresponding block (i.e., 622 and 632, respectively) in the previously coded area of the current picture. According to this technique, the reference samples correspond to the reconstructed samples of the current decoded picture prior to the in-loop filter operations, which in HEVC include both the deblocking filter and the sample adaptive offset (SAO) filter.
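The block-copy operation just described can be illustrated with a toy sketch: the predictor for a block is simply the previously reconstructed samples at the position displaced by the block vector within the same picture. The function name, the flat frame layout, and the example values are this sketch's own assumptions, not from any standard.

```python
# Minimal sketch of CPR (intra block copy) prediction: the predictor for a
# w x h block at (x, y) is the block of previously reconstructed samples
# at (x + bv_x, y + bv_y) in the same picture (before loop filtering).

def cpr_predict(recon, stride, x, y, w, h, bv_x, bv_y):
    """recon: flat list of reconstructed luma samples of the current picture."""
    pred = []
    for j in range(h):
        row_off = (y + bv_y + j) * stride + (x + bv_x)
        pred.append(recon[row_off:row_off + w])
    return pred

# 8x2 toy "picture": sample values encode position as 10*row + col.
recon = [10 * r + c for r in range(2) for c in range(8)]
# Predict a 2x2 block at (4, 0) from the block 4 samples to its left.
print(cpr_predict(recon, 8, 4, 0, 2, 2, -4, 0))
```

A real codec additionally enforces that the referenced block lies entirely inside the already-reconstructed area, as discussed in the conformance constraints below.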
In JCTVC-M0350 (Madhukar Budagavi et al., "AHG8: Video coding using Intra motion compensation", Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting: Incheon, KR, 18–26 Apr. 2013, Document: JCTVC-M0350), an early version of CPR was disclosed, which was submitted as a candidate technology for HEVC range extensions (RExt) development. In JCTVC-M0350, the CPR compensation was limited to within a small local area, the search was limited to 1-D block vectors, and the block size was limited to 2Nx2N only. Later, a more advanced CPR method was developed during the standardization of HEVC screen content coding (SCC).
In order to signal the block vector (BV) efficiently, the BV is signaled predictively using a BV predictor (BVP), in a manner similar to MV coding. Accordingly, as shown in Fig. 7, the BV difference (BVD) is signaled and the BV can be reconstructed according to BV = BVP + BVD, where reference block 720 is selected as the intra block copy (IntraBC) prediction for the current block 710 (i.e., a CU). One BVP is determined for the current CU. Methods for deriving the motion vector predictor (MVP) are known in the field; a similar derivation can be applied to the BVP derivation.
In JCTVC-N0256 (Pang et al., "Non-RCE3: Intra Motion Compensation with 2-D MVs", Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 14th Meeting: Vienna, AT, 25 July–2 Aug. 2013, Document: JCTVC-N0256), 2-D intra MC is further combined with pipeline-friendly approaches:
1. No interpolation filters are used;
2. The MV search area is restricted, with two cases:
a. The search area is the current CTU and the left CTU, or
b. The search area is the current CTU and the rightmost four columns of samples of the left CTU.
Among the methods proposed in JCTVC-N0256, the removal of the interpolation filters and the restriction of the search area to the current CTU and the left CTU were adopted for 2-D intra MC. The other aspects were either rejected or suggested for further study.
Spatial advanced motion vector prediction (AMVP) is disclosed in JCTVC-O0218 ("Evaluation of Palette Mode Coding on HM-12.0+RExt-4.1", 15th Meeting: Geneva, CH, 23 Oct.–1 Nov. 2013, Document: JCTVC-O0218). Fig. 8 shows a number of possible block vector candidates at previously coded neighboring block positions according to JCTVC-O0218. The positions are described in detail in Table 1.
TABLE 1

Position   Description
0          Below-left position, at the lower-left of the bottom-left corner of the current block
1          Left position, at the bottom-left corner of the current block
2          Above-right position, at the upper-right of the top-right corner of the current block
3          Above position, at the top-right corner of the current block
4          Above-left position, at the upper-left of the top-left corner of the current block
5          Left position, at the top-left corner of the current block
6          Above position, at the top-left corner of the current block
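One plausible coordinate reading of the Table 1 neighbor positions can be sketched as follows, with (x, y) the current block's top-left sample and w, h its width and height. The exact sample offsets are this sketch's interpretation of the table, not normative positions.

```python
# Hypothetical coordinates of the seven spatial BV candidate positions of
# Table 1, relative to a current block with top-left sample (x, y).

def bv_candidate_positions(x, y, w, h):
    return {
        0: (x - 1, y + h),      # below-left of the bottom-left corner
        1: (x - 1, y + h - 1),  # left, at the bottom-left corner
        2: (x + w, y - 1),      # above-right of the top-right corner
        3: (x + w - 1, y - 1),  # above, at the top-right corner
        4: (x - 1, y - 1),      # above-left of the top-left corner
        5: (x - 1, y),          # left, at the top-left corner
        6: (x, y - 1),          # above, at the top-left corner
    }

# Candidate positions for an 8x8 block whose top-left sample is (16, 16).
print(bv_candidate_positions(16, 16, 8, 8))
```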
In HEVC, in addition to spatial AMVP prediction, temporal MV prediction is also used for inter-slice motion compensation. As shown in Fig. 9, the temporal predictor is derived from a block (TBR or TCTR) in a co-located picture, where the co-located picture is the first reference picture in reference list 0 or reference list 1. Since the block where a temporal MVP is located may have two MVs, one from reference list 0 and one from reference list 1, the temporal MVP is derived from the MV of reference list 0 or reference list 1 according to the following rules:
1. The MV that crosses the current picture is selected first.
2. If both MVs cross the current picture or neither does, the MV with the same reference list as the current list is selected.
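The two selection rules above can be sketched directly. The flags indicating whether each MV's trajectory crosses the current picture are assumed to be computed elsewhere (from picture order counts); all names here are illustrative, not standard syntax.

```python
# Sketch of the temporal MVP list-selection rules described above.

def select_temporal_mv(mv_l0, mv_l1, crosses_l0, crosses_l1, current_list):
    # Rule 1: prefer the MV whose trajectory crosses the current picture.
    if crosses_l0 != crosses_l1:
        return mv_l0 if crosses_l0 else mv_l1
    # Rule 2: both or neither cross the current picture, so pick the MV
    # from the same reference list as the current list.
    return mv_l0 if current_list == 0 else mv_l1

# Only the list-0 MV crosses the current picture, so it wins.
print(select_temporal_mv((1, 2), (3, 4), True, False, current_list=1))
```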
When CPR is used, only part of the current image can be used as a reference image. Some bitstream conformance constraints are applied to adjust the effective MV values of the reference current picture.
First, one of the following two equations must be true:

BV_x + offsetX + nPbSw + xPbs - xCbs <= 0, and (1)
BV_y + offsetY + nPbSh + yPbs - yCbs <= 0. (2)

Second, the following wavefront parallel processing (WPP) condition must be true:

(xPbs + BV_x + offsetX + nPbSw - 1)/CtbSizeY - xCbs/CtbSizeY <= yCbs/CtbSizeY - (yPbs + BV_y + offsetY + nPbSh - 1)/CtbSizeY (3)

In equations (1) through (3), (BV_x, BV_y) is the luma block vector (i.e., the motion vector for CPR) of the current PU; nPbSw and nPbSh are the width and height of the current PU; (xPbs, yPbs) is the location of the top-left pixel of the current PU relative to the current picture; (xCbs, yCbs) is the location of the top-left pixel of the current CU relative to the current picture; and CtbSizeY is the size of the CTU. Taking into account the chroma sample interpolation for CPR mode, offsetX and offsetY are two adjustment offsets in the two dimensions:

offsetX = ( BVC_x & 0x7 ) ? 2 : 0, (4)
offsetY = ( BVC_y & 0x7 ) ? 2 : 0. (5)

Here, (BVC_x, BVC_y) is the chroma block vector, in 1/8-pel resolution in HEVC.

Third, the reference block for CPR must be within the same tile/slice boundary.
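A sketch of the first two conformance checks, transcribing equations (1) through (5) directly; variable names follow the text, and the function name and example values are this sketch's own. The third (tile/slice) check is omitted since it needs picture-partitioning context not given here.

```python
# Hedged sketch of the CPR bitstream-conformance checks of
# equations (1)-(5). Integer division stands in for the "/" in (3).

def is_valid_cpr_bv(bv_x, bv_y, x_pbs, y_pbs, n_pb_sw, n_pb_sh,
                    x_cbs, y_cbs, ctb_size_y, bvc_x=0, bvc_y=0):
    # Chroma-interpolation adjustment offsets, equations (4) and (5).
    offset_x = 2 if (bvc_x & 0x7) else 0
    offset_y = 2 if (bvc_y & 0x7) else 0
    # At least one of equations (1) and (2) must hold.
    eq1 = bv_x + offset_x + n_pb_sw + x_pbs - x_cbs <= 0
    eq2 = bv_y + offset_y + n_pb_sh + y_pbs - y_cbs <= 0
    if not (eq1 or eq2):
        return False
    # Wavefront-parallel-processing condition, equation (3).
    return ((x_pbs + bv_x + offset_x + n_pb_sw - 1) // ctb_size_y
            - x_cbs // ctb_size_y
            <= y_cbs // ctb_size_y
            - (y_pbs + bv_y + offset_y + n_pb_sh - 1) // ctb_size_y)

# A 16x16 PU at (64, 64) copying from the block directly to its left.
print(is_valid_cpr_bv(-16, 0, 64, 64, 16, 16, 64, 64, 64))
```

A BV pointing rightward or downward into the not-yet-reconstructed area fails both (1) and (2) and is rejected.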
Affine motion compensation
Affine models can be used to describe 2D block rotations, as well as 2D deformations of squares (rectangles) into parallelograms. The model can be described as follows:

x' = a0 + a1 * x + a2 * y,
y' = b0 + b1 * x + b2 * y. (6)

In this model, 6 parameters need to be determined. For each pixel A(x, y) in the area of interest, the motion vector is determined as the difference between the location of the given pixel (A) and the location of its corresponding pixel (A') in the reference block, i.e., MV = A' - A = (a0 + (a1 - 1) * x + a2 * y, b0 + b1 * x + (b2 - 1) * y). Therefore, the motion vector of each pixel is location dependent.
According to this model, the above parameters can be solved if the motion vectors of three different locations are known; this condition is equivalent to the 6 parameters being known. Each location with a known motion vector is referred to as a control point. The 6-parameter affine model thus corresponds to a 3-control-point model.
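The 3-control-point formulation can be sketched as follows: with the motion vectors v0, v1, v2 known at the top-left, top-right, and bottom-left corners of a w x h block, the per-pixel motion vector follows by linear interpolation. This is a standard reformulation consistent with model (6); the function name and example values are this sketch's own.

```python
# Sketch of the 6-parameter (3-control-point) affine motion model:
# per-pixel MV interpolated from the three corner (control-point) MVs.

def affine_mv(v0, v1, v2, w, h, x, y):
    """Motion vector at position (x, y) relative to the block's top-left."""
    mv_x = v0[0] + (v1[0] - v0[0]) * x / w + (v2[0] - v0[0]) * y / h
    mv_y = v0[1] + (v1[1] - v0[1]) * x / w + (v2[1] - v0[1]) * y / h
    return (mv_x, mv_y)

# Pure translation: all control points share one MV, so every pixel does.
print(affine_mv((4, -2), (4, -2), (4, -2), 16, 16, 7, 11))   # (4.0, -2.0)
```

When the three control-point MVs differ, the interpolated MV varies with (x, y), reproducing the location-dependent motion field described above.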
Some exemplary implementations of affine motion compensation are described in the technical literature by Li et al. ("An affine motion compensation framework for high efficiency video coding", 2015 IEEE International Symposium on Circuits and Systems (ISCAS), 24–27 May 2015, pages 525–528) and Huang et al. ("Control-Point Representation and Differential Coding Affine-Motion Compensation", IEEE Transactions on Circuits, Systems and Video Technology (CSVT), Vol. 23, No. 10, pages 1651–1660, Oct. 2013). In the work of Li et al., an affine flag is signaled for 2Nx2N block partitions when the current block is coded in either merge mode or AMVP mode. If the flag is true, the derivation of the motion vector of the current block follows the affine model; if the flag is false, the derivation follows the traditional translational model. When the affine AMVP mode is used, three control points (3 motion vectors) are signaled. At each control-point location, the MV is predictively coded, and the coded MVDs of these control points are then transmitted. In the work of Huang et al., different control-point locations and predictive coding of the MVs at the control points are studied.
A syntax table for an affine motion compensation implementation is shown in Table 2. As indicated by notes (2-1) to (2-3) for the merge mode, a syntax element use_affine_flag is transmitted if at least one merge candidate is affine coded and the partition mode is 2Nx2N (i.e., PartMode == PART_2Nx2N). As indicated by notes (2-4) to (2-6) for the B slice, the syntax element use_affine_flag is transmitted if the current block size is greater than 8x8 (i.e., log2CbSize > 3) and the partition mode is 2Nx2N (i.e., PartMode == PART_2Nx2N). As indicated by notes (2-7) to (2-9), if use_affine_flag indicates that the affine model is used (i.e., use_affine_flag has value 1), information of the other two control points is transmitted for reference list L0; and, as indicated by notes (2-10) to (2-12), information of the other two control points is transmitted for reference list L1.
TABLE 2
Palette coding. In screen content coding, a palette is used to represent a given video block (e.g., a CU). Guo et al. disclose a palette coding method in JCTVC-O0218 ("Evaluation of Palette Mode Coding on HM-12.0+RExt-4.1", 15th Meeting: Geneva, CH, 23 Oct. - 1 Nov. 2013, Document: JCTVC-O0218), which proceeds as follows:
1. Transmission of the palette: the palette size is transmitted first, followed by the palette elements.
2. Transmission of pixel values: the pixels of the CU are encoded in sequential scan order. For each position, a flag is first transmitted to indicate whether the "run mode" or the "copy above mode" is used:
2.1 "Run mode", in which the palette index is first sent, followed by the value "palette_run" (say, M). No further information needs to be transmitted for the current position and the following M positions, as they all have the same palette index as the one transmitted. The palette index (e.g., i) is shared by all three color components, which means that the reconstructed pixel value is (Y, U, V) = (paletteY[i], paletteU[i], paletteV[i]) (assuming that the color space is YUV).
2.2 "Copy above mode", in which a value "copy_run" (say, N) is sent to indicate that, for the next N positions, including the current one, the palette index is equal to the palette index at the same position in the row above.
3. Transmission of the residual: the palette indices transmitted in stage 2 are converted back into pixel values and used as the prediction. The residual information is transmitted using HEVC residual coding and added to the prediction for reconstruction.
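The index-reconstruction part of stage 2 can be sketched as follows (an illustrative sketch, not the normative process; the element encoding as tuples and the function name are assumptions made here):

```python
def decode_palette_indices(elements, width):
    """Rebuild a block's palette indices from run-mode / copy-above
    elements in raster-scan order.

    ('run', i, m)  places palette index i at the current position and
                   the next m positions (m+1 positions in total).
    ('copy', n)    copies the index from the same column of the row
                   above for n positions, including the current one.
    """
    idx = []
    for elem in elements:
        if elem[0] == 'run':                 # "run mode"
            _, pal_idx, run = elem
            idx.extend([pal_idx] * (run + 1))
        else:                                # "copy above mode"
            _, run = elem
            for _ in range(run):
                idx.append(idx[len(idx) - width])
    return idx
```

The resulting indices are then mapped through the palette to pixel values, and the residual of stage 3 is added on top.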
In JCTVC-N0247 (Guo et al., "RCE3: Results of Test 3.1 on Palette Mode for Screen Content Coding", Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 14th Meeting: Vienna, AT, 25 July - 2 Aug. 2013, Document: JCTVC-N0247), a palette is constructed for each color component, and the palette may be predicted (shared) from its left neighboring CU to reduce the bit rate. It was later proposed by Qualcomm that each element in the palette be a triplet representing a particular combination of the three color components, and that predictive coding of the palette across CUs be eliminated.
Coding based on the dominant color (or palette). Palette coding is another tool for screen content coding, in which a palette for each color component is created and transmitted. The palette may be predicted from the palette of the left CU. For palette prediction, each entry in the palette may be predicted from the corresponding palette entry in the upper CU or the left CU.
In particular, three line modes, namely a horizontal mode, a vertical mode and a normal mode, are used for coding the pixel lines.
Further, according to JCTVC-O0182, pixels are classified into main color pixels and escape pixels. For the main color pixels, the decoder reconstructs the pixel values from the main color indices (i.e., palette indices) and the palette. For escape pixels, the encoder additionally transmits their pixel values.
In the present invention, the problems of various aspects of CPR encoding with QTBT structures or luminance/chrominance separation encoding are addressed.
Disclosure of Invention
The present invention proposes a method and apparatus of syntax signaling for video encoding and decoding systems in which the current picture referencing (CPR) and adaptive motion vector resolution (AMVR) codec tools are enabled. According to a proposed embodiment of the present invention, a current reference picture of a current block in a current picture is first determined. When the current reference picture is equal to the current picture, the integer motion vector flag is inferred to be true, without the need to issue the integer motion vector flag in the bitstream at the encoder side or to parse the integer motion vector flag of the current block from the bitstream at the decoder side. The integer motion vector flag being true indicates that the current motion vector (MV) is represented in integer precision, and the flag being false indicates that the current MV is represented in fractional precision.
According to an embodiment of the present invention, when the integer motion vector flag is true, an additional indication may further be issued in the bitstream at the encoder side or parsed from the bitstream at the decoder side. The additional indication is used to indicate whether the integer mode or the 4-pixel mode is used. In one embodiment, the integer motion vector flag may be inferred to be true regardless of the motion vector difference (MVD) between the current MV and the MV predictor. In another embodiment, when the current reference picture is not equal to the current picture, the integer motion vector flag is inferred to be false, without the need to issue the integer motion vector flag in the bitstream at the encoder side or to parse the integer motion vector flag of the current block from the bitstream at the decoder side.
According to a second embodiment of the present invention, the motion vector differences of a current block in a current picture are determined. When all motion vector differences of the current block are equal to zero, the integer motion vector flag is inferred to be false, without the need to issue the integer motion vector flag in the bitstream at the encoder side or to parse the integer motion vector flag of the current block from the bitstream at the decoder side. The integer motion vector flag being true indicates that the current motion vector (MV) is represented in integer precision, and the flag being false indicates that the current MV is represented in fractional precision.
According to the second embodiment of the present invention, when all motion vector differences of the current block are equal to zero, the integer motion vector flag may be inferred to be false regardless of whether the selected reference picture associated with the current MV is equal to the current picture. In another embodiment, the integer motion vector flag is inferred to be true if any motion vector difference of the current block is not equal to zero and the selected reference picture associated with the current MV is equal to the current picture. In yet another embodiment, if any motion vector difference of the current block is not equal to zero and the selected reference picture associated with the current MV is not equal to the current picture, the integer motion vector flag is issued in the bitstream at the encoder side or the integer motion vector flag of the current block is parsed from the bitstream at the decoder side. In yet another embodiment, the integer motion vector flag is inferred to be false only if the selected reference picture associated with the current MV is not equal to the current picture.
According to a third embodiment of the present invention, syntax signaling for a video encoding system and a video decoding system is disclosed, wherein a current picture reference (CPR) codec tool and a sub-block prediction codec mode are enabled. All reference pictures of a current block in a current picture are determined. When all the reference pictures of the current block are equal to the current picture, the sub-block prediction codec mode is disabled, and the current block is encoded at the encoder side or decoded at the decoder side with the sub-block prediction codec mode disabled.
According to the third embodiment of the present invention, when all reference pictures are equal to the current picture, a syntax element indicating the sub-block prediction codec mode is not issued in the bitstream at the encoder side, nor is the syntax element of the current block parsed from the bitstream at the decoder side. In another embodiment, the syntax element indicating the sub-block prediction codec mode is inferred to be false when all reference pictures of the current block are equal to the current picture. In yet another embodiment, the syntax element indicating the sub-block prediction codec mode is constrained to indicate that the sub-block prediction codec mode is disabled when all reference pictures of the current block are equal to the current picture. The sub-block prediction codec mode may be associated with an affine prediction codec tool or a sub-block based temporal motion vector prediction (sbTMVP) codec tool.
According to a fourth embodiment of the present invention, syntax transmission of a video coding system and a video decoding system is disclosed, wherein a current picture reference (current picture referencing, abbreviated CPR) codec tool and a sub-block prediction codec mode are enabled. All reference pictures of the current block in the current picture are determined. When all reference pictures of the current block are equal to the current picture: the derived motion vector associated with the sub-block of the current block is converted to an integer motion vector; and, the current motion vector of the current block is encoded at the encoder side or decoded at the decoder side using the integer motion vector as a motion vector predictor.
Drawings
Fig. 1 is an example of block partitioning, illustrating the division of a coding tree unit (CTU) into coding units (CUs) using a quadtree structure.
Fig. 2 illustrates asymmetric motion partitioning (AMP) according to High Efficiency Video Coding (HEVC), wherein AMP defines eight shapes for partitioning a CU into PUs.
FIG. 3 is an example of various binary partition types used to illustrate a binary tree partition structure, where a partition type may be used to recursively partition a block into two smaller blocks.
Fig. 4 is an example showing a block partition and its corresponding binary tree, where in each partition node (i.e., non-leaf node) of the binary tree, a syntax is used to indicate which partition type (horizontal or vertical) is used, where 0 represents horizontal partition and 1 represents vertical partition.
Fig. 5 is an example showing a block segmentation and a quadtree plus binary tree (quadtree plus binary tree, QTBT) structure, where the solid line represents the quadtree segmentation and the dashed line represents the binary tree segmentation.
Fig. 6 is an example showing CPR compensation, wherein the region 610 corresponds to an image, slice or image region to be encoded. Blocks 620 and 630 correspond to two blocks to be encoded.
Fig. 7 is a diagram illustrating an example of predictive Block Vector (BV) encoding in which BV differences (block vector difference, BVD for short) corresponding to the difference between the current BV and BV predictors are signaled.
Fig. 8 illustrates a number of possible block vector candidates at previously coded neighboring block positions used for spatial Advanced Motion Vector Prediction (AMVP).
Fig. 9 shows that the temporal predictor is derived from a block (TBR or TCTR) located in a co-located (co-located) picture, wherein the co-located picture is the first reference picture in reference list 0 or reference list 1.
Fig. 10 shows a flowchart of an exemplary encoding system with current image reference (CPR) and Adaptive Motion Vector Resolution (AMVR) codec tools according to an embodiment of the invention.
Fig. 11 shows a flowchart of an exemplary encoding system with current image reference (CPR) and Adaptive Motion Vector Resolution (AMVR) codec tools according to another embodiment of the present invention.
Fig. 12 shows a flowchart of an exemplary encoding system with a Current Picture Reference (CPR) codec tool and an enabled sub-block predictive codec mode according to another embodiment of the present invention.
Fig. 13 shows a flowchart of an exemplary encoding system with a Current Picture Reference (CPR) codec tool and an enabled sub-block predictive codec mode according to another embodiment of the present invention.
Detailed Description
The following description is of the best mode for carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by the claims.
In video coding based on the original quadtree plus binary tree (quad-tree plus binary tree, QTBT) structure and luma/chroma separation coding, luma and chroma are coded separately for all intra frames (e.g., I slices). In the following, various aspects of luma/chroma separation coding and syntax transmission using CPR mode are disclosed.
CPR with affine motion compensation
If affine motion compensation is enabled and the affine flag is sent before the reference picture index, the reference picture index (ref-idx) for list 0 or list 1, or for both list 0 and list 1, needs to be sent at the encoder side or parsed at the decoder side. However, according to an embodiment of the invention, the current picture is removed from the reference picture list when affine mode is selected, because the current picture cannot serve as the reference picture in affine mode. Accordingly, the codeword length of the reference picture index is reduced and coding efficiency is improved.
CPR with adaptive motion resolution
In video coding systems that support adaptive motion vector resolution (AMVR), a motion vector (MV) or its derivatives (i.e., the motion vector difference (MVD) or motion vector predictor (MVP)) may be represented in various resolutions (i.e., integer or fractional). A flag (i.e., imv-flag) is used to indicate the selection. The integer MV flag (imv-flag) being true indicates that integer MVs are used. In this case, an integer MV mode or a 4-pixel MV mode may be used, and an additional bit is used to indicate which of the two is selected. If the integer MV flag (imv-flag) is true, the MVP needs to be rounded to an integer.
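The MVP rounding step can be sketched as follows (an illustrative sketch only: MVs are assumed to be stored in quarter-pel units and rounding toward zero is assumed; the normative rounding rule of a given codec may differ):

```python
def round_mvp(mvp, four_pel=False):
    """Round an MVP stored in quarter-pel units to integer-pel
    (or 4-pel) precision, as required when imv-flag is true."""
    shift = 4 if four_pel else 2       # 16 or 4 quarter-pel units per step
    def rnd(v):
        m = (abs(v) >> shift) << shift  # magnitude truncated to the grid
        return m if v >= 0 else -m      # rounding toward zero (assumed)
    return tuple(rnd(v) for v in mvp)
```

For example, a predictor of (1.25, -1.75) pel, stored as (5, -7) quarter-pel units, rounds to (1, -1) pel in integer MV mode.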
Furthermore, the present invention discloses 3 different types of AMVR signaling. In the first type of AMVR signaling, the imv-flag and imv-mode are issued, where the imv-flag signaling is independent of the MVD. An example of a syntax design according to the first type of AMVR signaling is as follows:
In the above example, bold characters are used to represent the coded syntax elements. The imv-flag is still issued when MVD = 0. In this case, the MV may be the original MVP (i.e., imv-flag is false) or the rounded MVP (i.e., imv-flag is true).
In the second type of AMVR signaling, the imv-flag, imv-mode, and MVD are issued, where the imv-flag signaling depends on the MVD. An example of a syntax design according to the second type of AMVR signaling is as follows:
In the above case, when MVD = 0, imv-flag is inferred to be 0 and the MV can only be the MVP.
In the third type of AMVR signaling, the imv-flag, imv-mode, and MVD are issued. An example of a syntax design according to the third type of AMVR signaling is as follows:
In the above case, the ref-idx, MVP, and MVD are coded after the imv-flag.
In conventional syntax designs, there may be some redundancy. To improve coding efficiency, various syntax designs related to CPR and AMVR are disclosed. In one embodiment, if adaptive motion vector resolution (AMVR) is enabled for list 0 or list 1, or for both list 0 and list 1, and the AMVR syntax is issued before the reference picture index, then the reference picture index needs to be issued or parsed for list 0 or list 1, or for both list 0 and list 1.
If the reference picture index for list 0 or list 1, or for both list 0 and list 1, is issued before the integer MV flag (imv-flag), and the reference picture for list 0 or list 1, or for both list 0 and list 1, is equal to the current picture, then according to an embodiment of the present invention the integer MV flag imv-flag is inferred to be true. Thus, the integer MV flag imv-flag need not be issued for list 0 or list 1, or for both list 0 and list 1.
When a 4-pixel integer MV mode is employed as one of the integer MV modes, an integer MV index (imv_idx) may be issued according to one embodiment of the present invention. When imv_idx is 0, a fractional MV (e.g., a quarter-pel MV) is used; when imv_idx is 1, an integer MV is used; when imv_idx is 2, a 4-pixel MV is used.
The above-described embodiments may be implemented by modifying existing signaling designs. For example, the first type of syntax design may be modified as follows:
In the above example, the syntax imv-flag is transmitted when the reference picture is not equal to the current picture (i.e., "if (ref != CPR)"). Otherwise (i.e., the "else" case), imv-flag is inferred to be true (i.e., "imv-flag = true"). The above embodiment may also be implemented by modifying the second type of syntax design as follows:
In the above example, when the reference picture is not equal to the current picture (i.e., "if (ref != CPR)"), the syntax imv-flag is transmitted if the MVD is not equal to zero (i.e., "if (MVD != 0)"); otherwise (i.e., the "else" case), imv-flag is inferred to be false (i.e., "imv-flag = false"). When the reference picture is equal to the current picture (i.e., the "else" case of "if (ref != CPR)"), imv-flag is inferred to be true (i.e., "imv-flag = true").
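The inference rule of this modified second-type design can be summarized in a short sketch (hypothetical names; `parse_flag` stands in for reading one flag from the bitstream):

```python
def imv_flag_value(ref_is_current_picture, mvd, parse_flag):
    """imv-flag handling of the modified second-type AMVR design:
    inferred true when the reference picture is the current picture
    (CPR); otherwise parsed only when the MVD is nonzero, and
    inferred false when all MVD components are zero."""
    if ref_is_current_picture:
        return True                     # inferred, not coded
    if any(v != 0 for v in mvd):
        return parse_flag()             # present in the bitstream
    return False                        # inferred, not coded
```

Only the middle branch consumes bits, which is the source of the coding-efficiency gain described above.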
If the reference picture index for list 0 or list 1, or for both list 0 and list 1, is issued before the integer MV flag (imv-flag), and the reference picture for list 0 or list 1, or for both list 0 and list 1, is equal to the current picture, then imv_idx can only be greater than 0, i.e., 1 or 2. In one embodiment, a binary symbol (bin) is issued to indicate whether imv_idx is equal to 1 or 2.
In the above example, the syntaxes "imv-flag" and "imv-mode" are combined into a new syntax "imv_idx". When imv_idx is 0, a fractional MV (e.g., a quarter-pel MV) is used; when imv_idx is 1, an integer MV is used; when imv_idx is 2, a 4-pixel MV is used.
In the above example, if imv-flag is inferred to be true, imv_idx should be 1 or 2; if imv-flag is inferred to be false, imv_idx is 0.
In one embodiment, imv_idx is binarized using truncated binary codes, such as the 1-bit code "0" for imv_idx = 0, the 2-bit code "10" for imv_idx = 1, and the 2-bit code "11" for imv_idx = 2. The imv-flag may be considered the first bin of imv_idx and the imv-mode the second bin of imv_idx. The above "imv-flag" and "imv-mode" syntax signaling may thus be converted to "imv_idx" syntax signaling. For example, the following pseudocode may be used to implement the described embodiments:
When ref equals CPR, imv_idx should be 1 or 2 (i.e., the first bin of imv_idx is inferred to be 1, because imv-flag is inferred to be 1). In another embodiment, if imv-flag is inferred to be 0, imv_idx is inferred to be 0.
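A decoder-side sketch of this binarization follows (illustrative only; `read_bin` stands in for reading one context-coded bin from the bitstream):

```python
def decode_imv_idx(read_bin, ref_is_current_picture):
    """Parse imv_idx binarized as '0' -> 0, '10' -> 1, '11' -> 2.
    The first bin corresponds to imv-flag and the second to imv-mode;
    when the reference is the current picture (CPR), the first bin is
    not coded and is inferred to be 1."""
    first = 1 if ref_is_current_picture else read_bin()
    if first == 0:
        return 0                 # fractional (quarter-pel) MV
    return 1 + read_bin()        # 1: integer MV, 2: 4-pixel MV
```

Under CPR, only one bin is ever read, reflecting the constraint that imv_idx can only be 1 or 2 in that case.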
In other embodiments, if the reference picture index is issued before the integer MV flag, the reference picture is equal to the current picture, and the MVDs in list 0 or list 1, or in both list 0 and list 1, are equal to zero, then the integer MV flag for list 0 or list 1, or for both list 0 and list 1, is inferred to be false. Therefore, the integer MV flag need not be issued at the encoder side or parsed at the decoder side. In other words, if the integer MV flag in list 0 or list 1, or in both list 0 and list 1, is false and the reference picture in list 0 or list 1, or in both list 0 and list 1, is equal to the current picture, the MVD for that target reference picture is equal to zero. In this disclosure, the phrase "issue or parse a syntax element" may be used for convenience; it should be understood as shorthand for "issue the syntax element at the encoder side or parse the syntax element at the decoder side".
The above-described embodiments may be implemented by modifying the second type of syntax design as follows:
In another embodiment, the integer MV flag is inferred to be false only when the MVD in list 0 or list 1, or in both list 0 and list 1, is zero and the selected reference picture is not equal to the current picture. This embodiment may be implemented by modifying the second type of syntax design as follows:
Another exemplary signaling design for this embodiment is as follows:
In yet another embodiment, the integer MV flag is inferred to be true when the selected reference picture is equal to the current picture, regardless of the MVD. The described embodiment may be implemented by modifying the second type of syntax design as follows:
CPR with adaptive motion resolution and affine motion compensation
If the affine flag for list 0 or list 1, or for both list 0 and list 1, is issued at the encoder side or parsed at the decoder side before the integer MV flag and the reference picture index, then both the integer MV flag and the reference picture index need to be issued or parsed for list 0 or list 1, or for both list 0 and list 1. However, if affine mode is used (e.g., the affine flag equals 1), the current picture may be removed from the reference picture list. Accordingly, the codeword length of the reference picture index can be reduced.
If the integer MV flag is issued or parsed before the affine flag and the reference picture index, the affine flag and the reference picture index need to be issued or parsed. Similarly, if a fractional MV mode is used (e.g., integer MVs are disabled), the current picture may be removed from the reference picture list. Accordingly, the length of the codeword of the reference picture index can be reduced.
If the reference picture index for list 0 or list 1, or for both list 0 and list 1, is issued or parsed before the affine flag and/or the integer MV flag, and the reference picture is equal to the current picture, the affine flag is inferred to be false. Thus, there is no need to issue or parse the affine flag for list 0 or list 1, or for both list 0 and list 1. Likewise, the integer MV flag for list 0 or list 1, or for both list 0 and list 1, is inferred to be true (or imv_idx is equal to 1 or 2 according to an embodiment of the present invention). However, in other embodiments, under the above conditions, if the MVD for list 0 or list 1, or for both list 0 and list 1, is equal to zero, the integer MV flag is inferred to be false.
CPR with sub-block mode
A sub-block mode (e.g., sbTMVP (sub-block based temporal motion vector prediction) (or also referred to as alternative temporal motion vector prediction (ALTERNATIVE TEMPORAL MOTION VECTOR PREDICTION, short ATMVP) or sub-block temporal merging mode/candidate) or affine prediction) may be used to improve coding efficiency. For these types of sub-block patterns, they may be collected to be shared as a candidate list, referred to as a sub-block pattern candidate list. In skip mode coding, merge mode coding, or AMVP mode coding (i.e., inter mode coding), a flag may be issued to indicate whether to use the sub-block mode. If the sub-block mode is used, a candidate index is issued or inferred to select one of the sub-block candidates. The sub-block candidates may include sub-block temporal merging candidates, affine candidates and/or planar MV mode candidates. In one embodiment, if a CPR mode (which may be implicitly indicated or explicitly indicated using a flag or any other syntax element) is used or selected, and there are no other inter reference pictures (e.g., all reference pictures are current pictures, meaning that the current picture is the only reference picture for the current block), the sub-block mode is disabled. In some embodiments, if a flag indicating CPR mode is selected, it is inferred that the current image is the only reference image of the current block. In the syntax design, the sub-block mode syntax is not sent (e.g., the sub-block mode flag is inferred to be false) or the sub-block mode syntax is constrained to disable the sub-block mode (e.g., the sub-block mode flag is constrained to be false, as a bitstream conformance requirement, the sub-block mode flag is false). The sub-block mode is limited to be applied in the skip mode and the merge mode. 
In another embodiment, when sub-block mode is used in CPR (e.g., CPR mode is used and no other inter reference picture or selected reference picture is the current picture), the derived motion vector for each sub-block is also rounded to an integer MV. The above proposed method may be implemented in an encoder and/or decoder. For example, the proposed method may be implemented in an inter prediction module of an encoder and/or an inter prediction module of a decoder.
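The two CPR/sub-block interactions described above can be sketched together (an illustrative sketch with hypothetical names; MVs are assumed to be stored in quarter-pel units, rounding toward zero is assumed, and the rounding branch assumes the selected reference is the current picture):

```python
def cpr_subblock_mvs(ref_list, current_pic, sub_mvs):
    """Sketch of the CPR/sub-block rules: sub-block mode is disabled
    (None returned) when the current picture is the only reference
    picture of the current block; otherwise each derived sub-block MV
    is rounded to an integer MV when used with CPR."""
    if all(r == current_pic for r in ref_list):
        return None                       # sub-block mode disabled
    def rnd(v):
        m = (abs(v) >> 2) << 2            # quarter-pel -> integer-pel grid
        return m if v >= 0 else -m        # rounding toward zero (assumed)
    return [(rnd(x), rnd(y)) for x, y in sub_mvs]
```

In an encoder or decoder this logic would sit in the inter prediction module, as noted above.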
New conditions for dual tree coding allowing use of intra copy (intraBC) mode
In the HEVC SCC extension, if intraBC mode is enabled for an I-slice, the I-slice is encoded as an inter slice. The on/off state of intraBC mode may be indicated by checking the reference frame list: if the current frame is inserted into the reference frame list, intraBC mode is enabled.
Furthermore, in the BMS2.1 reference software, a dual tree is enabled for I-slices, with separate coding unit partitions applied to the luma and chroma signals. To better integrate intraBC mode and dual-tree coding, dual-tree coding in inter slices (e.g., P-slices or B-slices) is allowed if only one reference frame is put into the reference list and that reference frame is the current frame.
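The condition above can be sketched as a simple predicate (illustrative only; the function name and slice-type representation are assumptions made here):

```python
def dual_tree_allowed(slice_type, ref_list, current_frame):
    """Condition sketch for allowing dual-tree (separate luma/chroma
    partitioning): always for I-slices; for inter slices (P/B) only
    when the current frame is the one and only entry in the
    reference frame list."""
    if slice_type == 'I':
        return True
    return len(ref_list) == 1 and ref_list[0] == current_frame
```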
In the present invention, CPR, affine prediction, AMVR, ATMVP and intraBC are techniques for video coding. These techniques are also referred to as codec tools in this disclosure.
The above disclosed invention may be incorporated in various forms in various video encoding or decoding systems. For example, the invention may be implemented using hardware-based methods such as application-specific integrated circuits (ICs), field programmable gate arrays (FPGAs), digital signal processors (DSPs), central processing units (CPUs), and the like. The present invention may also be implemented using software code or firmware code executed on a computer, laptop, or mobile device such as a smartphone. Furthermore, the software code or firmware code may be executed on a hybrid platform such as a CPU with a dedicated processor (e.g., a video encoding engine or co-processor).
Fig. 10 shows a flow chart of an exemplary codec system with current image reference (current picture referencing, abbreviated CPR) and adaptive motion vector resolution (adaptive motion vector resolution, abbreviated AMVR) codec tools according to an embodiment of the invention. The steps shown in the flowcharts, as well as other subsequent flowcharts in this disclosure, may be implemented as program code executable on one or more processors (e.g., one or more CPUs) on the encoder side and/or decoder side. The steps shown in the flowcharts may also be implemented based on hardware, such as one or more electronic devices or processors arranged to perform the steps in the flowcharts. According to the method, in step 1010, a current reference picture for a current block in a current picture is determined. In step 1020, when the current reference picture is equal to the current picture, the integer motion vector flag is inferred to be true, no integer motion vector flag needs to be sent in the bitstream at the encoder side, or no integer motion vector flag needs to be parsed from the bitstream at the decoder side for the current block, where the integer motion vector flag is true to indicate that the current Motion Vector (MV) is represented in an integer and the integer motion vector flag is false to indicate that the current Motion Vector (MV) is represented in a fraction.
Fig. 11 shows a flowchart of an exemplary codec system with current picture reference (CPR) and adaptive motion vector resolution (AMVR) codec tools according to another embodiment of the present invention. According to the method, in step 1110, a current reference picture for a current block in a current picture is determined. In step 1120, when all motion vector differences of the current block are equal to zero, the integer motion vector flag is inferred to be false, and no integer motion vector flag needs to be sent in the bitstream at the encoder side or parsed from the bitstream for the current block at the decoder side, where the integer motion vector flag being true indicates that the current motion vector (MV) is represented in integer precision and the flag being false indicates that the current MV is represented in fractional precision.
Fig. 12 shows a flow chart of an exemplary codec system with a Current Picture Reference (CPR) codec tool and a sub-block predictive codec mode according to yet another embodiment of the present invention. According to the method, at least one reference picture for a current block in a current picture is determined in step 1210. In step 1220, when the current picture is the only reference picture for the current block: the sub-block predictive codec mode is disabled; and encodes the current motion vector of the current block at the encoder side or decodes the current motion vector at the decoder side by disabling the subblock predictive coding mode.
Fig. 13 shows a flow chart of an exemplary codec system with a Current Picture Reference (CPR) codec and a sub-block predictive codec mode according to yet another embodiment of the present invention. According to the method, at least one reference picture for a current block in a current picture is determined in step 1310. In step 1320, when the current picture is the only reference picture for the current block: the derived motion vectors associated with the sub-blocks in the current block are converted to integer motion vectors; and, the current motion vector of the current block is encoded at the encoder side or decoded at the decoder side using the integer motion vector as a motion vector predictor.
The flowcharts shown are intended to illustrate examples of video coding according to the present invention. One of ordinary skill in the art may modify each step, rearrange the steps, split a step, or combine steps to practice the invention without departing from the spirit of the invention. In this disclosure, specific syntax and semantics have been used to illustrate examples of implementing embodiments of the invention. Those of ordinary skill in the art may practice the invention by substituting equivalent syntax and semantics for those described without departing from the spirit of the invention.
The previous description is presented to enable any person skilled in the art to make or use the invention in the context of a particular application and its requirements. Various modifications to the described embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments. Thus, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein. In the previous detailed description, numerous specific details were set forth in order to provide a thorough understanding of the present invention. However, those of ordinary skill in the art will appreciate that the present invention may be practiced without these specific details.
Embodiments of the invention as described above may be implemented in various hardware, software code, or a combination of both. For example, an embodiment of the invention may be one or more circuits integrated into a video compression chip, or program code integrated into video compression software, to perform the processes described herein. An embodiment of the invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processes described herein. The invention may also involve a number of functions performed by a computer processor, a digital signal processor, a microprocessor, or a Field Programmable Gate Array (FPGA). These processors may be configured to perform particular tasks according to the invention by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles, and languages of software code, and other ways of configuring code to perform tasks consistent with the invention, will not depart from the spirit and scope of the invention.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
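As a non-normative illustration of the signaling behavior claimed below, the following decoder-side sketch shows a sub-block prediction flag that is not parsed from the bitstream, and is inferred to be false, when the current picture is the only reference picture of the current block. The function and parameter names are hypothetical and are not drawn from any standard's syntax tables.

```python
from typing import Callable


def parse_subblock_flag(read_bit: Callable[[], int],
                        current_picture_is_only_reference: bool) -> bool:
    """Return the sub-block prediction mode flag for the current block.

    When the current picture is the only reference picture, the flag is
    absent from the bitstream and inferred to be false (mode disabled);
    otherwise it is parsed normally.
    """
    if current_picture_is_only_reference:
        return False
    return bool(read_bit())
```

Because the encoder is constrained in the same way, the decoder never consumes a bit for this flag in the CPR-only case, keeping the bitstreams of the two sides aligned.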

Claims (8)

1. A method for syntax processing in a video encoding system and a video decoding system, wherein a current picture reference codec tool and a sub-block prediction codec mode are enabled, the method comprising:
determining at least one reference picture for a current block in a current picture; and
when the current picture is the only reference picture for the current block:
disabling the sub-block prediction codec mode; and
encoding the current block at an encoder side or decoding the current block at a decoder side with the sub-block prediction codec mode disabled.
2. The method for syntax processing in a video encoding system and a video decoding system according to claim 1, wherein when the current picture is the only reference picture for the current block, a syntax element used to indicate the sub-block prediction codec mode need not be signaled in a bitstream at the encoder side or parsed from the bitstream at the decoder side for the current block.
3. The method for syntax processing in a video encoding system and a video decoding system according to claim 1, wherein a syntax element used to indicate the sub-block prediction codec mode is inferred to be false when the current picture is the only reference picture for the current block.
4. The method for syntax processing in a video encoding system and a video decoding system according to claim 1, wherein when the current picture is the only reference picture for the current block, a syntax element used to indicate the sub-block prediction codec mode is constrained to indicate that the sub-block prediction codec mode is disabled.
5. The method for syntax processing in a video encoding system and a video decoding system according to claim 1, wherein the sub-block prediction codec mode is associated with an affine prediction codec tool or a sub-block based temporal motion vector prediction codec tool.
6. An apparatus for syntax processing in a video encoding system and a video decoding system, wherein a current picture reference codec tool and a sub-block prediction codec mode are enabled, the apparatus comprising one or more electronic circuits or one or more processors configured to:
determine at least one reference picture for a current block in a current picture; and
when the current picture is the only reference picture for the current block:
disable the sub-block prediction codec mode; and
encode the current block at an encoder side or decode the current block at a decoder side with the sub-block prediction codec mode disabled.
7. A method for syntax processing in a video encoding system and a video decoding system, wherein a current picture reference codec tool and a sub-block prediction codec mode are enabled, the method comprising:
determining at least one reference picture for a current block in a current picture; and
when the current picture is the only reference picture for the current block:
converting a derived motion vector associated with a sub-block of the current block into an integer motion vector; and
encoding a current motion vector of the current block at an encoder side or decoding the current motion vector of the current block at a decoder side using the integer motion vector as a motion vector predictor.
8. An apparatus for syntax processing in a video encoding system and a video decoding system, wherein a current picture reference codec tool and a sub-block prediction codec mode are enabled, the apparatus comprising one or more electronic circuits or one or more processors configured to:
determine at least one reference picture for a current block in a current picture; and
when the current picture is the only reference picture for the current block:
convert a derived motion vector associated with a sub-block of the current block into an integer motion vector; and
encode a current motion vector of the current block at an encoder side or decode the current motion vector of the current block at a decoder side using the integer motion vector as a motion vector predictor.
CN202410210285.5A 2018-02-12 2019-02-11 Method and apparatus for syntax processing for video codec system Pending CN118101969A (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US201862629204P 2018-02-12 2018-02-12
US62/629,204 2018-02-12
US201862742474P 2018-10-08 2018-10-08
US62/742,474 2018-10-08
US201862747170P 2018-10-18 2018-10-18
US62/747,170 2018-10-18
CN201980012198.6A CN111869216B (en) 2018-02-12 2019-02-11 Method and apparatus for syntax processing for video codec system
PCT/CN2019/074783 WO2019154417A1 (en) 2018-02-12 2019-02-11 Method and apparatus of current picture referencing for video coding using adaptive motion vector resolution and sub-block prediction mode

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201980012198.6A Division CN111869216B (en) 2018-02-12 2019-02-11 Method and apparatus for syntax processing for video codec system

Publications (1)

Publication Number Publication Date
CN118101969A true CN118101969A (en) 2024-05-28

Family

ID=67548807

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201980012198.6A Active CN111869216B (en) 2018-02-12 2019-02-11 Method and apparatus for syntax processing for video codec system
CN202410210285.5A Pending CN118101969A (en) 2018-02-12 2019-02-11 Method and apparatus for syntax processing for video codec system

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201980012198.6A Active CN111869216B (en) 2018-02-12 2019-02-11 Method and apparatus for syntax processing for video codec system

Country Status (8)

Country Link
US (1) US11109056B2 (en)
KR (1) KR102483602B1 (en)
CN (2) CN111869216B (en)
AU (1) AU2019217409B2 (en)
CA (1) CA3090562C (en)
GB (1) GB2585304B (en)
TW (1) TWI692973B (en)
WO (1) WO2019154417A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020169103A1 (en) 2019-02-24 2020-08-27 Beijing Bytedance Network Technology Co., Ltd. Independent coding of palette mode usage indication
US11025948B2 (en) * 2019-02-28 2021-06-01 Tencent America LLC Method and apparatus for motion prediction in video coding
US11343525B2 (en) * 2019-03-19 2022-05-24 Tencent America LLC Method and apparatus for video coding by constraining sub-block motion vectors and determining adjustment values based on constrained sub-block motion vectors
US11109041B2 (en) * 2019-05-16 2021-08-31 Tencent America LLC Method and apparatus for video coding
CN114175662B (en) 2019-07-20 2023-11-24 北京字节跳动网络技术有限公司 Condition dependent codec with palette mode usage indication
CN117221536A (en) 2019-07-23 2023-12-12 北京字节跳动网络技术有限公司 Mode determination for palette mode coding and decoding
WO2021018166A1 (en) 2019-07-29 2021-02-04 Beijing Bytedance Network Technology Co., Ltd. Scanning order improvements for palette mode coding
WO2021052506A1 (en) * 2019-09-22 2021-03-25 Beijing Bytedance Network Technology Co., Ltd. Transform unit based combined inter intra prediction
US11184632B2 (en) 2020-01-20 2021-11-23 Tencent America LLC Method and apparatus for palette based coding mode under local dual tree structure
US11582491B2 (en) * 2020-03-27 2023-02-14 Qualcomm Incorporated Low-frequency non-separable transform processing in video coding

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100585710B1 (en) 2002-08-24 2006-06-02 엘지전자 주식회사 Variable length coding method for moving picture
US20120287999A1 (en) * 2011-05-11 2012-11-15 Microsoft Corporation Syntax element prediction in error correction
KR101444675B1 (en) 2011-07-01 2014-10-01 에스케이 텔레콤주식회사 Method and Apparatus for Encoding and Decoding Video
CN104768015B (en) 2014-01-02 2018-10-26 寰发股份有限公司 Method for video coding and device
US10531116B2 (en) * 2014-01-09 2020-01-07 Qualcomm Incorporated Adaptive motion vector resolution signaling for video coding
KR102413529B1 (en) * 2014-06-19 2022-06-24 마이크로소프트 테크놀로지 라이센싱, 엘엘씨 Unified intra block copy and inter prediction modes
WO2016119104A1 (en) 2015-01-26 2016-08-04 Mediatek Inc. Motion vector regularization
US20160127731A1 (en) * 2014-11-03 2016-05-05 National Chung Cheng University Macroblock skip mode judgement method for encoder
KR20170084251A (en) * 2014-11-20 2017-07-19 에이치에프아이 이노베이션 인크. Method of motion vector and block vector resolution control
US20160337662A1 (en) 2015-05-11 2016-11-17 Qualcomm Incorporated Storage and signaling resolutions of motion vectors
GB2539213A (en) * 2015-06-08 2016-12-14 Canon Kk Schemes for handling an AMVP flag when implementing intra block copy coding mode
EP3449630B1 (en) * 2016-05-28 2024-07-10 Mediatek Inc. Method and apparatus of current picture referencing for video coding
EP3264769A1 (en) * 2016-06-30 2018-01-03 Thomson Licensing Method and apparatus for video coding with automatic motion information refinement
WO2020058886A1 (en) * 2018-09-19 2020-03-26 Beijing Bytedance Network Technology Co., Ltd. Fast algorithms for adaptive motion vector resolution in affine mode
CN113412623A (en) * 2019-01-31 2021-09-17 北京字节跳动网络技术有限公司 Recording context of affine mode adaptive motion vector resolution

Also Published As

Publication number Publication date
TWI692973B (en) 2020-05-01
GB202013536D0 (en) 2020-10-14
GB2585304B (en) 2023-03-08
CN111869216A (en) 2020-10-30
AU2019217409A1 (en) 2020-09-17
TW201935930A (en) 2019-09-01
US11109056B2 (en) 2021-08-31
KR102483602B1 (en) 2022-12-30
KR20200117017A (en) 2020-10-13
CA3090562C (en) 2023-03-14
AU2019217409B2 (en) 2021-04-22
WO2019154417A1 (en) 2019-08-15
CN111869216B (en) 2024-05-28
CA3090562A1 (en) 2019-08-15
GB2585304A (en) 2021-01-06
US20200374545A1 (en) 2020-11-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination