CN113302936A - Control method for Merge with MVD - Google Patents


Info

Publication number
CN113302936A
CN113302936A (application CN202080008242.9A)
Authority
CN
China
Prior art keywords
resolution
mmvd
distance
pixels
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202080008242.9A
Other languages
Chinese (zh)
Other versions
CN113302936B (en)
Inventor
张凯 (Kai Zhang)
张莉 (Li Zhang)
刘鸿彬 (Hongbin Liu)
许继征 (Jizheng Xu)
王悦 (Yue Wang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Original Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd, ByteDance Inc
Publication of CN113302936A
Application granted
Publication of CN113302936B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51: Motion estimation or motion compensation
    • H04N 19/513: Processing of motion vectors
    • H04N 19/517: Processing of motion vectors by encoding
    • H04N 19/52: Processing of motion vectors by encoding by predictive encoding
    • H04N 19/523: Motion estimation or motion compensation with sub-pixel accuracy
    • H04N 19/46: Embedding additional information in the video signal during the compression process
    • H04N 19/463: Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
    • H04N 19/70: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Systems, methods, and devices for video processing are disclosed. An exemplary method for video processing comprises: performing a conversion between a current video block of a video and a bitstream representation of the video, the current video block being converted as a Coding Unit (CU) in a Merge with Motion Vector Difference (MMVD) mode, and the conversion comprising determining MMVD side information for the current video block based on an indication from a level above the CU level.

Description

Control method for Merge with MVD
The present application claims in time the priority to and benefits of International Patent Application No. PCT/CN2019/070636 filed on January 7, 2019 and International Patent Application No. PCT/CN2019/071159 filed on January 10, 2019, in accordance with applicable patent laws and/or rules pursuant to the Paris Convention. The entire disclosures of the foregoing applications are incorporated by reference as part of the disclosure of the present application.
Technical Field
This document relates to video and image encoding and decoding.
Background
Digital video accounts for the largest bandwidth usage on the internet and other digital communication networks. As the number of networked user devices capable of receiving and displaying video increases, the demand for bandwidth for digital video usage is expected to continue to grow.
Disclosure of Invention
This document discloses video codec tools that, in one example aspect, improve signaling of motion vectors for video and image codecs.
In one aspect, a method for video processing is disclosed, comprising: performing a conversion between a current video block of a video and a bitstream representation of the video, wherein the current video block is converted as a Coding Unit (CU) in a Merge with Motion Vector Difference (MMVD) mode and the conversion comprises determining MMVD side information of the current video block based on an indication from a level above the CU level, wherein, in the MMVD mode, at least one Merge candidate is selected and further refined based on the MMVD side information.
In another aspect, a method for video processing is disclosed, comprising: determining a ratio of similar or identical blocks within at least one of a picture, a slice, a sequence, a group of Coding Tree Units (CTUs), or a group of blocks; and determining whether the at least one of the picture, the slice, the sequence, the group of CTUs, or the group of blocks is screen content based on the ratio.
In one aspect, an apparatus in a video system is disclosed, the apparatus comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method in any of the above examples.
In one aspect, a computer program product stored on a non-transitory computer readable medium is disclosed, the computer program product comprising program code for performing the method in any of the above examples.
In yet another example aspect, the above method may be implemented by a video encoder apparatus or a video decoder apparatus including a processor.
In yet another example aspect, the methods may be embodied in the form of processor-executable instructions and stored on a computer-readable program medium.
These and other aspects are further described in this document.
Drawings
Fig. 1 shows an example of a simplified affine motion model.
Fig. 2 shows an example of an affine Motion Vector Field (MVF) of each sub-block.
Fig. 3A and 3B show examples of a 4-parameter affine model and a 6-parameter affine model, respectively.
Fig. 4 shows an example of Motion Vector Predictors (MVP) for AF_INTER mode.
Fig. 5A and 5B show examples of candidates for AF_MERGE mode.
Fig. 6 shows an example of candidate positions of the affine Merge mode.
FIG. 7 illustrates an example of a distance index and distance offset mapping.
Fig. 8 shows an example of an Ultimate Motion Vector Expression (UMVE) search process.
FIG. 9 illustrates an example of UMVE search points.
FIG. 10 is a flow diagram of an example method for video processing.
FIG. 11 is a flow diagram of another example method for video processing.
FIG. 12 shows an example of a hardware platform for implementing the techniques described in this document.
Detailed Description
This document provides various techniques that may be used by a decoder of a video bitstream to improve the quality of decompressed or decoded digital video. In addition, the video encoder may also implement these techniques during the encoding process in order to reconstruct the decoded frames for further encoding.
Section headings are used in this document for clarity, and do not limit embodiments and techniques to corresponding sections. As such, embodiments of one section may be combined with embodiments of other sections.
1. Overview
This patent document relates to video coding and decoding technologies. In particular, it relates to motion compensation in video coding. It may be applied to existing video coding standards, such as HEVC, or to upcoming standards (e.g., Versatile Video Coding (VVC)). It may also be applicable to future video coding standards or video codecs.
2. Introductory notes
Video coding standards have evolved mainly through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and H.265/HEVC standards. Since H.262, video coding standards have been based on the hybrid video coding structure, wherein temporal prediction plus transform coding are utilized. To explore future video coding technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded by VCEG and MPEG jointly in 2015. Since then, many new methods have been adopted by JVET and put into a reference software named Joint Exploration Model (JEM). In April 2018, the Joint Video Expert Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard, targeting a 50% bitrate reduction compared to HEVC.
2.1 affine motion compensated prediction
In HEVC, only a translational motion model is applied for Motion Compensation Prediction (MCP), while in the real world there are many kinds of motion, e.g., zoom in/out, rotation, perspective motions, and other irregular motions. In JEM, a simplified affine transform motion compensation prediction is applied. As shown in Fig. 1, the affine motion field of a block is described by two control point motion vectors.
The Motion Vector Field (MVF) of a block is described by the following equation:
mvx(x, y) = (v1x - v0x) / w * x - (v1y - v0y) / w * y + v0x
mvy(x, y) = (v1y - v0y) / w * x + (v1x - v0x) / w * y + v0y      (1)

where (v0x, v0y) is the motion vector of the top-left corner control point, and (v1x, v1y) is the motion vector of the top-right corner control point.
To further simplify the motion compensation prediction, sub-block based affine transform prediction is applied. The sub-block size M x N is derived by equation (2), where MvPre is the motion vector fraction accuracy (1/16 in JEM), and (v2x, v2y) is the motion vector of the bottom-left control point, calculated according to equation (1).

M = clip3(4, w, (w * MvPre) / max(abs(v1x - v0x), abs(v1y - v0y)))
N = clip3(4, h, (h * MvPre) / max(abs(v2x - v0x), abs(v2y - v0y)))      (2)

After being derived by equation (2), M and N should be adjusted downward, if necessary, to make them divisors of w and h, respectively.
To derive the motion vector for each M × N sub-block, the motion vector for the center sample of each sub-block is calculated according to equation 1 and rounded to a fractional precision of 1/16, as shown in fig. 2.
After MCP, the high precision motion vector of each sub-block is rounded and saved to the same precision as the normal motion vector.
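As an illustration, the per-sub-block MV derivation above can be sketched in Python. This is a simplified sketch with illustrative names: fixed 4x4 sub-blocks are assumed and the M x N derivation of equation (2) is omitted.

```python
# Hypothetical sketch of the 4-parameter affine MV field of equation (1),
# evaluated at sub-block centers and rounded to 1/16 fractional precision.

def affine_mv(x, y, v0, v1, w):
    """MV at position (x, y) from top-left CPMV v0 and top-right CPMV v1."""
    v0x, v0y = v0
    v1x, v1y = v1
    mvx = (v1x - v0x) / w * x - (v1y - v0y) / w * y + v0x
    mvy = (v1y - v0y) / w * x + (v1x - v0x) / w * y + v0y
    return mvx, mvy

def subblock_mvs(v0, v1, w, h, m=4, n=4):
    """MV of the center sample of each m x n sub-block (assumed 4x4 here)."""
    mvs = {}
    for by in range(0, h, n):
        for bx in range(0, w, m):
            cx, cy = bx + m / 2, by + n / 2          # sub-block center
            mvx, mvy = affine_mv(cx, cy, v0, v1, w)
            # round to 1/16 fractional precision (Python's round as a stand-in)
            mvs[(bx, by)] = (round(mvx * 16) / 16, round(mvy * 16) / 16)
    return mvs
```

Note that at (0, 0) the field reproduces v0 and at (w, 0) it reproduces v1, as the control-point definition requires.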
2.1.1 AF_INTER mode
In JEM, there are two affine motion modes: AF_INTER mode and AF_MERGE mode. For CUs with both width and height larger than 8, AF_INTER mode can be applied. An affine flag at the CU level is signaled in the bitstream to indicate whether AF_INTER mode is used. In this mode, a candidate list with motion vector pairs {(v0, v1) | v0 = {vA, vB, vC}, v1 = {vD, vE}} is constructed using the neighboring blocks. As shown in Fig. 4, v0 is selected from the motion vectors of block A, B, or C. The motion vector from the neighboring block is scaled according to the reference list and the relationship among the POC of the reference for the neighboring block, the POC of the reference for the current block, and the POC of the current CU. The approach to select v1 from neighboring blocks D and E is similar. If the number of candidates in the candidate list is smaller than 2, the list is padded with motion vector pairs composed by duplicating each of the AMVP candidates. When the candidate list is larger than 2, the candidates are first sorted according to the consistency of the neighboring motion vectors (the similarity of the two motion vectors in a candidate pair), and only the first two candidates are kept. An RD cost check is used to determine which motion vector pair is selected as the Control Point Motion Vector Prediction (CPMVP) of the current CU, and an index indicating the position of the CPMVP in the candidate list is signaled in the bitstream. After the CPMVP of the current affine CU is determined, affine motion estimation is applied and the Control Point Motion Vector (CPMV) is found. Then the difference between the CPMV and the CPMVP is signaled in the bitstream.
FIG. 3A shows an example of a 4-parameter affine model. FIG. 3B shows an example of a 6-parameter affine model.
In AF_INTER mode, when the 4/6-parameter affine mode is used, 2/3 control points are required, and therefore 2/3 MVDs need to be coded for these control points, as shown in Fig. 3A. In an example, it is proposed to derive the MVs such that mvd1 and mvd2 are predicted from mvd0:

mv0 = mvp0 + mvd0
mv1 = mvp1 + mvd1 + mvd0
mv2 = mvp2 + mvd2 + mvd0

where mvpi, mvdi and mvi are the predicted motion vector, the motion vector difference, and the motion vector of the top-left pixel (i = 0), top-right pixel (i = 1), or bottom-left pixel (i = 2), respectively, as shown in Fig. 3B. Note that the addition of two motion vectors (e.g., mvA(xA, yA) and mvB(xB, yB)) is equal to the summation of the two components separately, i.e., newMV = mvA + mvB, with the two components of newMV set to (xA + xB) and (yA + yB), respectively.
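A minimal sketch of this control-point MVD prediction, with assumed names; the decoder-side reconstruction simply adds mvd0 back to mvd1 and mvd2:

```python
# Illustrative sketch: reconstruct control-point MVs from predicted MVs and
# signaled MVDs, where mvd1/mvd2 were predicted from mvd0 at the encoder.

def reconstruct_cpmvs(pred_mvs, coded_mvds):
    """pred_mvs: list of predicted CPMVs (x, y); coded_mvds: signaled MVDs."""
    mvs = []
    for i, ((px, py), (dx, dy)) in enumerate(zip(pred_mvs, coded_mvds)):
        if i == 0:
            mvs.append((px + dx, py + dy))               # mv0 = mvp0 + mvd0
        else:
            d0x, d0y = coded_mvds[0]
            mvs.append((px + dx + d0x, py + dy + d0y))   # mvi = mvpi + mvdi + mvd0
    return mvs
```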
2.1.2 Fast affine ME algorithm in AF_INTER mode
In affine mode, the MVs of 2 or 3 control points need to be determined jointly. Directly searching the multiple MVs jointly is computationally complex. A fast affine ME algorithm is proposed and adopted into VTM/BMS.

The fast affine ME algorithm is described for the 4-parameter affine model; the idea can be extended to the 6-parameter affine model:

x' = a * x + b * y + c
y' = -b * x + a * y + d      (3)

mvh(x, y) = x' - x = (a - 1) * x + b * y + c
mvv(x, y) = y' - y = -b * x + (a - 1) * y + d      (4)

Replacing (a - 1) with a', the motion vector can be rewritten as:

mvh(x, y) = a' * x + b * y + c
mvv(x, y) = -b * x + a' * y + d      (5)

Assuming that the motion vectors of the two control points (0, 0) and (w, 0) are known, the affine parameters can be derived from equation (5):

c = mvh(0, 0)    d = mvv(0, 0)
a' = (mvh(w, 0) - c) / w    b = -(mvv(w, 0) - d) / w      (6)

The motion vector can be rewritten in vector form as:

MV(P) = A(P) * MV_C^T      (7)

where P = (x, y) is the pixel position,

A(P) = | 1  x  0   y |
       | 0  y  1  -x |

and MV_C = (c, a', d, b) collects the affine parameters.
At the encoder, the MVD of AF_INTER is derived iteratively. Denote MV^i(P) as the MV derived in the i-th iteration for position P, and denote dMV_C^i as the delta updated for MV_C in the i-th iteration. Then, in the (i+1)-th iteration,

MV^{i+1}(P) = A(P) * (MV_C^i + dMV_C^i)^T = MV^i(P) + A(P) * (dMV_C^i)^T      (8)

Denote Pic_ref as the reference picture and Pic_cur as the current picture, and denote Q = P + MV^i(P). Supposing MSE is used as the matching criterion, we need to minimize:

min sum_P |Pic_cur(P) - Pic_ref(P + MV^{i+1}(P))|^2
  = min sum_P |Pic_cur(P) - Pic_ref(Q + A(P) * (dMV_C^i)^T)|^2      (9)

Supposing dMV_C^i is small enough, Pic_ref(Q + A(P) * (dMV_C^i)^T) can be rewritten approximately with a first-order Taylor expansion as:

Pic_ref(Q + A(P) * (dMV_C^i)^T) ~= Pic_ref(Q) + Pic_ref'(Q) * A(P) * (dMV_C^i)^T      (10)

where Pic_ref'(Q) = (dPic_ref(Q)/dx, dPic_ref(Q)/dy). Denoting E^{i+1}(P) = Pic_cur(P) - Pic_ref(Q), the objective becomes:

min sum_P |E^{i+1}(P) - Pic_ref'(Q) * A(P) * (dMV_C^i)^T|^2      (11)

dMV_C^i can be derived by setting the derivative of the error function to zero. The delta MVs of the control points (0, 0) and (w, 0) can then be calculated according to A(P):

dMVh(0, 0) = dMV_C^i[0]
dMVh(w, 0) = dMV_C^i[1] * w + dMV_C^i[0]
dMVv(0, 0) = dMV_C^i[2]
dMVv(w, 0) = -dMV_C^i[3] * w + dMV_C^i[2]      (12)
Supposing such an MVD derivation process is iterated n times, the final MVD is calculated as follows:

fmvdh(0, 0) = sum_{i=0}^{n-1} dMV_C^i[0]
fmvdh(w, 0) = sum_{i=0}^{n-1} (dMV_C^i[1] * w + dMV_C^i[0])
fmvdv(0, 0) = sum_{i=0}^{n-1} dMV_C^i[2]
fmvdv(w, 0) = sum_{i=0}^{n-1} (-dMV_C^i[3] * w + dMV_C^i[2])      (13)

In an example, the delta MV of the control point (0, 0), denoted by mvd0, is used to predict the delta MV of the control point (w, 0), denoted by mvd1, so what is actually encoded for mvd1 is:

(fmvdh(w, 0) - fmvdh(0, 0), fmvdv(w, 0) - fmvdv(0, 0))
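The least-squares step at the heart of this iteration can be sketched as follows. This is an illustrative sketch, not the VTM/BMS implementation: A(P) and MV_C = (c, a', d, b) follow the vector form above, and numpy's generic solver stands in for the closed-form normal equations.

```python
import numpy as np

# Illustrative sketch: solve for the increment dMV_C minimizing
# sum_P (E(P) - G(Q) @ A(P) @ dMV_C)^2, where G is the spatial gradient
# of the reference picture at Q. All names are assumptions.

def affine_me_step(errors, grads, positions):
    """errors: E(P) per pixel; grads: (gx, gy) per pixel; positions: (x, y)."""
    rows, rhs = [], []
    for e, (gx, gy), (x, y) in zip(errors, grads, positions):
        # A(P) rows for MV_C = (c, a', d, b):
        #   mv_h = c + a'*x       + b*y
        #   mv_v =     a'*y + d   - b*x
        a_p = np.array([[1.0, x, 0.0, y],
                        [0.0, y, 1.0, -x]])
        rows.append(np.array([gx, gy]) @ a_p)   # one 1x4 row per pixel
        rhs.append(e)
    d_mvc, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return d_mvc  # increment (dc, da', dd, db) for this iteration
```

With enough independent pixel rows the system is fully determined and the step recovers the exact parameter increment, which is the zero-derivative solution of equation (11).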
2.1.3 AF_MERGE mode
When a CU is applied in AF_MERGE mode, it gets the first block coded with affine mode from the valid neighboring reconstructed blocks, the selection order of the candidate blocks being from left, above, above-right, bottom-left to above-left, as shown in Fig. 5A. If the neighboring bottom-left block A is coded in affine mode, as shown in Fig. 5B, the motion vectors v2, v3 and v4 of the top-left corner, above-right corner and bottom-left corner of the CU containing block A are derived, and the motion vector v0 of the top-left corner of the current CU is calculated according to v2, v3 and v4. After that, the motion vector v1 of the above-right of the current CU is calculated.
After the CPMVs v0 and v1 of the current CU are derived, the MVF of the current CU is generated according to the simplified affine motion model of equation (1). In order to identify whether the current CU is coded with AF_MERGE mode, an affine flag is signaled in the bitstream when there is at least one neighboring block coded in affine mode.
In an example in which the scheme is adopted into VTM-3.0, the affine Merge candidate list is constructed with the following steps:
1) inserting inherited affine candidates
Inherited affine candidates means candidates derived from the affine motion model of a valid neighboring affine-coded block. In the common base, as shown in Fig. 6, the scan order for the candidate positions is: A1, B1, B0, A0 and B2.
After deriving the candidates, a full pruning process is performed to check if the same candidate has been inserted into the list. If the same candidate exists, the derived candidate is discarded.
2) Insertion-built affine candidates
If the number of candidates in the affine Merge candidate list is less than MaxNumAffineCand (set to 5 herein), constructed affine candidates are inserted into the candidate list. A constructed affine candidate means the candidate is constructed by combining the neighboring motion information of each control point.
The motion information of the control points is first derived from the specified spatial and temporal neighbors shown in Fig. 5B. CPk (k = 1, 2, 3, 4) represents the k-th control point. A0, A1, A2, B0, B1, B2 and B3 are spatial positions for predicting CPk (k = 1, 2, 3); T is the temporal position for predicting CP4.
The coordinates of CP1, CP2, CP3 and CP4 are (0, 0), (W, 0), (0, H) and (W, H), respectively, where W and H are the width and height of the current block.
FIG. 6 shows an example of candidate positions for affine Merge mode
Motion information for each control point is obtained according to the following priority order:
for CP1, the checking priority is B2->B3->A2. If B is present2Can be used, then B is used2. Otherwise, if B2Not available, then B is used3. If B is present2And B3Are all unusable, use A2. If all three candidates are not available, no motion information for CP1 can be obtained.
For CP2, the checking priority is B1->B0
For CP3, the inspection priority is A1->A0
For CP4, T is used.
Next, affine Merge candidates are constructed using combinations of control points.
Motion information of three control points is needed to construct a 6-parameter affine candidate. The three control points can be selected from one of the following four combinations: {CP1, CP2, CP4}, {CP1, CP2, CP3}, {CP2, CP3, CP4}, {CP1, CP3, CP4}. The combinations {CP1, CP2, CP3}, {CP2, CP3, CP4} and {CP1, CP3, CP4} are converted to a 6-parameter motion model represented by the top-left, top-right and bottom-left control points.
Motion information of two control points is needed to construct a 4-parameter affine candidate. The two control points can be selected from one of the following six combinations: {CP1, CP4}, {CP2, CP3}, {CP1, CP2}, {CP2, CP4}, {CP1, CP3}, {CP3, CP4}. The combinations {CP1, CP4}, {CP2, CP3}, {CP2, CP4}, {CP1, CP3} and {CP3, CP4} are converted to a 4-parameter motion model represented by the top-left and top-right control points.
The combinations of constructed affine candidates are inserted into the candidate list in the following order: {CP1, CP2, CP3}, {CP1, CP2, CP4}, {CP1, CP3, CP4}, {CP2, CP3, CP4}, {CP1, CP2}, {CP1, CP3}, {CP2, CP3}, {CP1, CP4}, {CP2, CP4}, {CP3, CP4}.
For a combined reference list X (X is 0 or 1), the reference index with the highest usage in the control points is selected as the reference index for list X, and the motion vectors pointing to different reference pictures will be scaled.
After deriving the candidates, a full pruning process is performed to check if the same candidate has been inserted into the list. If the same candidate exists, the derived candidate will be discarded.
3) Filling with zero motion vectors
If the number of candidates in the affine Merge candidate list is less than 5, a zero motion vector with a zero reference index is inserted into the candidate list until the list is full.
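The three list-construction steps above can be sketched as follows. This is a simplified sketch: candidates are reduced to opaque tuples, and the real pruning compares motion parameters rather than object identity.

```python
# Illustrative sketch of affine Merge list construction: inherited candidates,
# then constructed candidates, then zero-MV fill, with full pruning.

MAX_AFFINE_CAND = 5

def build_affine_merge_list(inherited, constructed):
    cand_list = []
    for cand in inherited:                  # step 1: inherited candidates
        if cand not in cand_list:           # full pruning against the list
            cand_list.append(cand)
    for cand in constructed:                # step 2: constructed candidates
        if len(cand_list) >= MAX_AFFINE_CAND:
            break
        if cand not in cand_list:
            cand_list.append(cand)
    zero = ((0, 0), (0, 0), 0)              # step 3: zero MVs, zero ref index
    while len(cand_list) < MAX_AFFINE_CAND:
        cand_list.append(zero)
    return cand_list
```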
2.2 affine Merge mode with prediction bias
In an example, UMVE is extended to affine Merge mode; we refer to this as UMVE affine mode hereafter. The proposed method selects the first available affine Merge candidate as the base predictor. Then it applies a motion vector offset to each control point's motion vector value from the base predictor. If there is no affine Merge candidate available, the proposed method is not used.
The inter prediction direction of the selected base predictor, and the reference index of each direction, are used without change.
In the current implementation, the affine model of the current block is assumed to be a 4-parameter model, and only 2 control points need to be derived. Thus, only the first 2 control points of the base predictor are used as control point predictors.
For each control point, a zero_MVD flag is used to indicate whether the control point of the current block has the same MV value as the corresponding control point predictor. If the zero_MVD flag is true, no further signaling is needed for the control point. Otherwise, a distance index and an offset direction index are signaled for the control point.
A distance offset table with a size of 5 is used, as shown in the table below. A distance index is signaled to indicate which distance offset to use. The mapping of the distance index and the distance offset value is shown in Fig. 7.

Table: Distance offset table

Distance IDX    | 0       | 1     | 2     | 3     | 4
Distance offset | 1/2-pel | 1-pel | 2-pel | 4-pel | 8-pel
The direction index can represent four directions as shown below, where only the x or y direction may have an MV difference, but not both directions.

Offset direction IDX | 00 | 01 | 10 | 11
x-dir-factor         | +1 | -1 |  0 |  0
y-dir-factor         |  0 |  0 | +1 | -1
If the inter prediction is uni-directional, the signaled distance offset is applied in the signaled offset direction of each control point predictor, and the result is the MV value of each control point.
For example, when the base predictor is uni-directional and the motion vector value of a control point is MVP(vpx, vpy), once the distance offset and the direction index are signaled, the motion vector of the corresponding control point of the current block is calculated as follows:
MV(vx, vy) = MVP(vpx, vpy) + MV(x-dir-factor * distance-offset, y-dir-factor * distance-offset)
If the inter prediction is bi-directional, the signaled distance offset is applied in the signaled offset direction of the control point predictor's L0 motion vector, and the same distance offset with the opposite direction is applied to the control point predictor's L1 motion vector. The result is the MV value of each control point in each inter prediction direction.
For example, when the base predictor is bi-directional, the motion vector value of a control point on L0 is MVPL0(v0px, v0py) and the motion vector of that control point on L1 is MVPL1(v1px, v1py). Once the distance offset and the direction index are signaled, the motion vectors of the corresponding control point of the current block are calculated as follows:
MVL0(v0x, v0y) = MVPL0(v0px, v0py) + MV(x-dir-factor * distance-offset, y-dir-factor * distance-offset);
MVL1(v1x, v1y) = MVPL1(v1px, v1py) + MV(-x-dir-factor * distance-offset, -y-dir-factor * distance-offset).
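Both cases can be sketched together as follows. This is an illustrative sketch: control points are handled one at a time, and mvp_l1 is None for uni-prediction.

```python
# Illustrative sketch: apply the signaled distance offset to one control
# point predictor; for bi-prediction the L1 offset is mirrored.

DIR_FACTORS = {0b00: (+1, 0), 0b01: (-1, 0), 0b10: (0, +1), 0b11: (0, -1)}

def apply_cp_offset(mvp_l0, mvp_l1, distance_offset, direction_idx):
    fx, fy = DIR_FACTORS[direction_idx]
    off = (fx * distance_offset, fy * distance_offset)
    mv_l0 = (mvp_l0[0] + off[0], mvp_l0[1] + off[1])
    if mvp_l1 is None:                       # uni-prediction: L0 only
        return mv_l0, None
    # bi-prediction: same offset with opposite direction on L1
    mv_l1 = (mvp_l1[0] - off[0], mvp_l1[1] - off[1])
    return mv_l0, mv_l1
```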
2.3 Ultimate motion vector expression
In an example, an Ultimate Motion Vector Expression (UMVE) is proposed. UMVE is used for either skip or Merge modes with a proposed motion vector expression method.
UMVE reuses the same Merge candidates as those included in the regular Merge candidate list in VVC. Among the Merge candidates, a base candidate may be selected and further expanded by the proposed motion vector expression method.
UMVE provides a new Motion Vector Difference (MVD) representation method in which the MVD is represented using a starting point, a motion magnitude, and a motion direction.
FIG. 8 illustrates an example of a UMVE search process.
FIG. 9 illustrates an example of UMVE search points.
The proposed technique uses the Merge candidate list as it is, but only candidates of the default Merge type (MRG_TYPE_DEFAULT_N) are considered for UMVE's expansion.
The base candidate index defines a starting point. The base candidate index indicates the best candidate among the candidates in the list, as shown below.
Table 1. Base candidate IDX

Base candidate IDX | 0       | 1       | 2       | 3
N-th MVP           | 1st MVP | 2nd MVP | 3rd MVP | 4th MVP
If the number of basic candidates is equal to 1, the basic candidate IDX is not signaled.
The distance index is motion magnitude information and indicates a pre-defined distance from the starting point. The pre-defined distances are as follows:

Table 2a. Distance IDX

Distance IDX   | 0       | 1       | 2     | 3     | 4     | 5     | 6      | 7
Pixel distance | 1/4-pel | 1/2-pel | 1-pel | 2-pel | 4-pel | 8-pel | 16-pel | 32-pel
In the entropy coding process, the distance IDX is binarized into bins with a truncated unary code:

Table 2b. Distance IDX binarization

Distance IDX | 0 | 1  | 2   | 3    | 4     | 5      | 6       | 7
Bin string   | 0 | 10 | 110 | 1110 | 11110 | 111110 | 1111110 | 1111111
In arithmetic coding, the first bin is coded with a probability context, and the following bins are coded with the equal-probability model, also known as bypass coding.
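The truncated unary binarization above can be sketched as follows, assuming a maximum distance index of 7 so that the largest codeword drops its terminating zero:

```python
# Illustrative sketch of truncated unary binarization of the distance index:
# index k becomes k ones followed by a zero, except the maximum value.

def truncated_unary(value, c_max=7):
    if value < c_max:
        return "1" * value + "0"
    return "1" * c_max          # largest value: no terminating zero
```

Only the first bin of the resulting string would be context-coded; the rest are bypass-coded.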
The direction index indicates the direction of the MVD with respect to the starting point. The direction index may represent four directions as shown below.
Table 3. Direction IDX

Direction IDX | 00  | 01  | 10  | 11
x-axis        | +   | -   | N/A | N/A
y-axis        | N/A | N/A | +   | -
The UMVE flag is signaled right after sending the skip flag and the Merge flag. If the skip and Merge flags are true, the UMVE flag is parsed. If the UMVE flag is equal to 1, the UMVE syntaxes are parsed; if not, the AFFINE flag is parsed. If the AFFINE flag is equal to 1, it is AFFINE mode; if not, the skip/Merge index is parsed for VTM's skip/Merge mode.
No additional line buffer is needed due to UMVE candidates, since the software's skip/Merge candidates are used directly as base candidates. Using the input UMVE index, the supplement of the MV is decided right before motion compensation, and there is no need to hold a long line buffer for this.
Under current general test conditions, the first or second Merge candidate in the Merge candidate list may be selected as a basic candidate.
UMVE is also known as Merge with MVD (MMVD).
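Putting the three signaled elements together, the UMVE/MMVD motion vector reconstruction can be sketched as follows. Offsets are in luma samples per Table 2a; the internal integer 1/4-pel scaling is omitted for clarity.

```python
# Illustrative sketch: MMVD MV = base candidate MV + signed offset, where the
# distance index selects the magnitude and the direction index the axis/sign.

DISTANCES = [1/4, 1/2, 1, 2, 4, 8, 16, 32]   # in luma samples (Table 2a)
DIRECTIONS = {0b00: (+1, 0), 0b01: (-1, 0), 0b10: (0, +1), 0b11: (0, -1)}

def mmvd_mv(base_mv, distance_idx, direction_idx):
    sx, sy = DIRECTIONS[direction_idx]
    d = DISTANCES[distance_idx]
    return (base_mv[0] + sx * d, base_mv[1] + sy * d)
```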
2.4 generalized Bi-prediction
In conventional bi-directional prediction, the predictors from L0 and L1 are averaged with the equal weight 0.5 to generate the final predictor. The predictor generation formula is shown in equation (1):

P_TraditionalBiPred = (P_L0 + P_L1 + RoundingOffset) >> shiftNum      (1)

In equation (1), P_TraditionalBiPred is the final predictor of conventional bi-directional prediction, P_L0 and P_L1 are the predictors from L0 and L1, respectively, and RoundingOffset and shiftNum are used to normalize the final predictor.
Generalized Bi-prediction (GBI) is proposed to allow applying different weights to the predictors from L0 and L1. The predictor generation is shown in equation (2):

P_GBi = ((1 - w1) * P_L0 + w1 * P_L1 + RoundingOffset_GBi) >> shiftNum_GBi      (2)

In equation (2), P_GBi is the final predictor of GBi, and (1 - w1) and w1 are the selected GBI weights applied to the predictors of L0 and L1, respectively. RoundingOffset_GBi and shiftNum_GBi are used to normalize the final predictor in GBi.
The supported weights of w1 are {-1/4, 3/8, 1/2, 5/8, 5/4}. One equal-weight set and four unequal-weight sets are supported. For the equal-weight case, the process to generate the final predictor is exactly the same as in the conventional bi-directional prediction mode. For the true bi-directional prediction cases under random access (RA) conditions, the number of candidate weight sets is reduced to three.
For the Advanced Motion Vector Prediction (AMVP) mode, the weight selection in GBI is explicitly signaled at the CU level if the CU is coded with bi-prediction. For the Merge mode, the weight selection is inherited from the Merge candidate. In this proposal, GBI supports DMVR to generate the weighted average of the template as well as the final predictor of BMS-1.0.
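An integer sketch of equation (2), with the weights expressed in 1/8 units (so w1 = 5 means 5/8); shiftNum_GBi = 3 and RoundingOffset_GBi = 4 are assumed here for illustration:

```python
# Illustrative sketch of GBI weighted prediction in integer arithmetic.
# Weights are in 1/8 units, so the supported set {-1/4, 3/8, 1/2, 5/8, 5/4}
# maps to w1_eighths in {-2, 3, 4, 5, 10}.

def gbi_predict(p_l0, p_l1, w1_eighths):
    shift = 3                                # assumed shiftNum_GBi
    rounding = 1 << (shift - 1)              # assumed RoundingOffset_GBi
    return ((8 - w1_eighths) * p_l0 + w1_eighths * p_l1 + rounding) >> shift
```

With w1_eighths = 4 (the equal-weight case) this reduces to the conventional rounded average of equation (1).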
2.5 adaptive motion vector difference resolution
In HEVC, when use_integer_mv_flag is equal to 0 in the slice header, the motion vector difference (MVD) (between the motion vector and the predicted motion vector of a PU) is signaled in units of quarter luma samples. In VTM-3.0, a locally adaptive motion vector resolution (LAMVR) is introduced. In JEM, the MVD can be coded in units of quarter luma samples, integer luma samples, or four luma samples. The MVD resolution is controlled at the coding unit (CU) level, and MVD resolution flags are conditionally signaled for each CU that has at least one non-zero MVD component.
For a CU with at least one non-zero MVD component, a first flag is signaled to indicate whether quarter luma sample MV precision is used in the CU. When the first flag (equal to 1) indicates that quarter-luma sample MV precision is not used, another flag is signaled to indicate whether integer-luma sample MV precision or four-luma sample MV precision is used.
When the first MVD resolution flag of a CU is zero, or not coded for the CU (meaning all MVDs in the CU are zero), the quarter luma sample MV resolution is used for the CU. When a CU uses integer luma sample MV precision or four luma sample MV precision, the MVPs in the AMVP candidate list of the CU are rounded to the corresponding precision.
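The rounding of MVPs to the selected precision can be sketched as follows, assuming MVs are stored internally in 1/4 luma sample units. Note that the arithmetic right shift rounds negative values toward minus infinity; a real codec pins down this behavior explicitly.

```python
# Illustrative sketch: round an MVP component (stored in 1/4-pel units) to
# the CU's MVD resolution: shift 0 keeps 1/4-pel, shift 2 gives integer-pel,
# shift 4 gives four-luma-sample precision.

def round_mvp(mv_quarter, shift):
    if shift == 0:
        return mv_quarter
    offset = 1 << (shift - 1)                # rounding offset
    return ((mv_quarter + offset) >> shift) << shift
```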
In arithmetic coding, the first MVD resolution flag is coded using one of three probability contexts: C0, C1, or C2; and the second MVD resolution flag is coded using a fourth probability context, C3. The probability context Cx for the first MVD resolution flag is derived as follows (L denotes the left neighboring block and A denotes the above neighboring block):
if L is available, inter-coded, and its first MVD resolution flag is not equal to 0, xL is set equal to 1; otherwise, xL is set equal to 0.
If A is available, inter-coded, and its first MVD resolution flag is not equal to 0, xA is set equal to 1; otherwise, xA is set equal to 0.
x is set equal to xL + xA.
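The context derivation above can be sketched as follows. This is an illustrative model, not decoder source code; the neighbor representation (a small dict) and the function name are assumptions made for the sketch.

```python
# Hypothetical sketch of the probability-context derivation for the first
# MVD resolution flag. A neighbor contributes 1 when it is available,
# inter-coded, and its first MVD resolution flag is non-zero.
def first_flag_context(left, above):
    """Return the context index x in {0, 1, 2}, selecting C0, C1, or C2.

    `left`/`above` are dicts like
    {"available": bool, "inter": bool, "first_mvd_res_flag": int} or None.
    """
    def contributes(nb):
        return (nb is not None and nb.get("available")
                and nb.get("inter") and nb.get("first_mvd_res_flag", 0) != 0)
    xL = 1 if contributes(left) else 0   # left neighboring block L
    xA = 1 if contributes(above) else 0  # above neighboring block A
    return xL + xA
```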
In the encoder, RD checking at the CU level is used to determine which MVD resolution is to be used for the CU. That is, RD checking at the CU level is performed three times for each MVD resolution. To speed up the encoder speed, the following encoding scheme is applied in JEM:
during the RD check of a CU with normal quarter-luma sample MVD resolution, the motion information of the current CU (integer luma sample precision) is stored. The stored motion information (after rounding) is used as a starting point for further small-range motion vector refinement during RD-checking for the same CU with integer luma samples and 4 luma samples MVD resolution, so that the time-consuming motion estimation process is not repeated three times.
The RD check of the CU with 4 luma samples MVD resolution is conditionally invoked. For a CU, when the RD cost of the integer-luma sample MVD resolution is much greater than the RD cost of the quarter-luma sample MVD resolution, the RD check for the 4-luma sample MVD resolution of the CU is skipped.
In VTM-3.0, LAMVR is also called Integer Motion Vector (IMV).
2.6 Current Picture reference
Decoder side:
in this approach, the currently (partially) decoded picture is considered as a reference picture. The current picture is placed at the last position of the reference picture list 0. Therefore, for a slice using the current picture as the only reference picture, its slice type is considered as P slice. In this method, the bitstream syntax follows the same syntax structure as used for inter-coding, and the decoding process is unified with inter-coding. The only significant difference is that the block vector (the motion vector pointing to the current picture) always uses integer-pixel resolution.
The variations from the block-level CPR_flag method are:
when searching for this mode in the encoder, both the width and the height of the block are less than or equal to 16.
When the luma block vector is an odd integer, chroma interpolation is enabled.
When the SPS flag is on, adaptive motion vector resolution (AMVR) for the CPR mode is enabled. In this case, when AMVR is used, the block vector may be switched between 1-pixel integer resolution and 4-pixel integer resolution at the block level.
Encoder side:
the encoder performs an RD check on blocks whose width or height is not greater than 16. For non-Merge mode, a block vector search is first performed using a hash-based search. If no valid candidate is found from the hash search, a local search based on block matching will be performed.
In the hash-based search, hash key matching (32-bit CRC) between the current block and a reference block is extended to all allowed block sizes. The hash key calculation for every position in the current picture is based on 4x4 blocks. For a larger current block, a hash key match with a reference block occurs when all of its 4x4 blocks match the hash keys at the corresponding reference positions. If multiple reference blocks are found to match the current block with the same hash key, the block vector cost of each candidate is calculated, and the one with the minimum cost is selected.
In the block matching search, a search range is set to 64 pixels on the left and top of the current block, and the search range is limited within the current CTU.
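The 4x4-based hash matching described above can be sketched as follows. This is a simplified illustration, not the encoder's implementation: zlib's CRC-32 stands in for the 32-bit CRC, the picture is a plain 2D list of luma samples, and block positions are assumed to be 4-aligned.

```python
import zlib

def block_hashes(picture, x, y, w, h):
    """Collect 32-bit CRC hash keys of each 4x4 sub-block of the w*h block
    whose top-left corner is at column x, row y."""
    keys = []
    for by in range(y, y + h, 4):
        for bx in range(x, x + w, 4):
            data = bytes(picture[by + j][bx + i]
                         for j in range(4) for i in range(4))
            keys.append(zlib.crc32(data))
    return keys

def hash_match(picture, cur, ref, w, h):
    """True when every 4x4 sub-block of the current block matches the hash
    key of the co-positioned 4x4 sub-block at the reference position."""
    return block_hashes(picture, *cur, w, h) == block_hashes(picture, *ref, w, h)
```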
2.7 Merge List design in one example
Three different Merge list construction processes are supported in the VVC:
1) Sub-block Merge candidate list: it includes ATMVP and affine Merge candidates. The affine mode and the ATMVP mode share one Merge list construction process. Here, the ATMVP and affine Merge candidates may be added in order. The sub-block Merge list size is signaled in the slice header and has a maximum value of 5.
2) Uni-directional prediction TPM Merge list: for the triangle prediction mode, the two partitions share one Merge list construction process, even though the two partitions can select their own Merge candidate indices. When building the Merge list, spatial neighboring blocks and two temporal blocks are checked. The motion information derived from spatial neighboring blocks and temporal blocks is referred to herein as regular motion candidates. These regular motion candidates are further used to derive a plurality of TPM candidates. Note that this conversion is performed at the whole-block level, even though the two partitions may use different motion vectors to generate their own prediction blocks.
In some embodiments, the unidirectional prediction TPM Merge list size is fixed to 5.
3) Regular Merge list: the remaining coding blocks share one Merge list construction process. Here, the spatial/temporal/HMVP candidates, the pairwise combined bi-directional prediction Merge candidates, and the zero motion candidates may be inserted in order. The regular Merge list size is signaled in the slice header and has a maximum value of 6.
Subblock Merge candidate list
It is proposed to put all sub-block related motion candidates in a separate Merge list in addition to the regular Merge list for non sub-block Merge candidates.
The sub-block related motion candidates are put into a separate Merge list named "sub-block Merge candidate list".
In one example, the sub-block Merge candidate list includes affine Merge candidates, and ATMVP candidates and/or sub-block-based STMVP candidates.
In this context, the ATMVP Merge candidate in the regular Merge list is moved to the first position of the affine Merge list, so that all Merge candidates in the new list (i.e., the sub-block based Merge candidate list) are based on sub-block coding tools.
Constructing an affine Merge candidate list by:
1) inserting inherited affine candidates
Inherited affine candidates refer to candidates that are derived from the affine motion models of their valid neighboring affine-coded blocks. At most two inherited affine candidates are derived from the affine motion models of the neighboring blocks and inserted into the candidate list. For the left predictor, the scan order is {A0, A1}; for the above predictor, the scan order is {B0, B1, B2}.
2) Insertion-built affine candidates
If the candidate number in the affine Merge candidate list is less than MaxNumAffineCand (set to 5), the constructed affine candidates are inserted into the candidate list. A constructed affine candidate refers to a candidate constructed by combining the neighboring motion information of each control point.
The motion information of the control points is first derived from the assigned spatial and temporal neighbors shown in Fig. 7. CPk (k = 1, 2, 3, 4) denotes the k-th control point. A0, A1, A2, B0, B1, B2, and B3 are spatial positions for predicting CPk (k = 1, 2, 3); T is the temporal position for predicting CP4.
The coordinates of CP1, CP2, CP3, and CP4 are (0,0), (W,0), (0,H), and (W,H), respectively, where W and H are the width and height of the current block.
Motion information for each control point is obtained according to the following priority order:
for CP1, the checking priority is B2->B3->A2. If B2 is available, B2 is used. Otherwise, if B2 is not available, B3 is used. If neither B2 nor B3 is available, A2 is used. If all three candidates are unavailable, no motion information for CP1 can be obtained.
For CP2, the checking priority is B1->B0.
For CP3, the checking priority is A1->A0.
For CP4, T is used.
Next, affine Merge candidates are constructed using combinations of control points.
Motion information of three control points is required to construct a 6-parameter affine candidate. The three control points may select one from the following four combinations ({ CP1, CP2, CP4}, { CP1, CP2, CP3}, { CP2, CP3, CP4}, { CP1, CP3, CP4 }). The combinations CP1, CP2, CP3, CP2, CP3, CP4, CP1, CP3, CP4 will be converted into a 6-parameter motion model represented by upper-left, upper-right and lower-left control points.
Motion information for two control points is needed to construct a 4-parameter affine candidate. The two control points may select one from the following two combinations ({ CP1, CP2}, { CP1, CP3 }). These two combinations will be converted into a 4-parameter motion model represented by the upper left and upper right control points.
The combination of the constructed affine candidates is inserted into the candidate list in the following order:
{CP1,CP2,CP3}、{CP1,CP2,CP4}、{CP1,CP3,CP4}、{CP2,CP3,CP4}、{CP1,CP2}、{CP1,CP3}。
only when the CP has the same reference index, an available combination of motion information of the CP is added to the affine Merge list.
3) Filling with zero motion vectors
If the number of candidates in the affine Merge candidate list is less than 5, a zero motion vector with a zero reference index is inserted into the candidate list until the list is full.
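The construction order above (constructed candidates tried in a fixed combination order, accepted only when all control points share the same reference index, then zero-MV padding up to 5) can be sketched as follows. This is an illustrative model with hypothetical data structures, not VTM source code; control-point motion information is modeled as a (ref_idx, mv) tuple.

```python
# Combination order from the text; each entry names the control points used.
COMBOS = [("CP1", "CP2", "CP3"), ("CP1", "CP2", "CP4"), ("CP1", "CP3", "CP4"),
          ("CP2", "CP3", "CP4"), ("CP1", "CP2"), ("CP1", "CP3")]

def build_constructed_list(cps, max_cand=5):
    """cps maps "CP1".."CP4" to (ref_idx, mv) or is missing the key when the
    control point's motion information is unavailable."""
    cand_list = []
    for combo in COMBOS:
        infos = [cps.get(cp) for cp in combo]
        if any(i is None for i in infos):
            continue                            # a control point is missing
        ref_idxs = {ref for ref, _mv in infos}
        if len(ref_idxs) == 1:                  # same reference index required
            cand_list.append(combo)
        if len(cand_list) == max_cand:
            break
    while len(cand_list) < max_cand:            # zero-MV padding until full
        cand_list.append("ZERO_MV")
    return cand_list
```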
2.8 MMVD with affine Merge candidates in the example
For example, the MMVD idea is applied to affine Merge candidates (referred to as affine Merge with prediction offset). It is an extension in which an MVD (otherwise known as a "distance" or "offset") is signaled after the affine Merge candidate (the base candidate) is signaled. The MVD is added to all CPMVs to obtain new CPMVs. The distance table is specified as:
Distance IDX      0        1       2       3       4
Distance offset   1/2-pel  1-pel   2-pel   4-pel   8-pel
In some embodiments, a POC distance based offset mirroring method is used for bi-prediction. When the base candidate is bi-predicted, the offset applied to L0 is signaled, and the offset on L1 depends on the temporal position of the reference pictures on list 0 and list 1.
If both reference pictures are on the same temporal side of the current picture, the same distance offset and the same offset direction are applied to the CPMVs of L0 and L1.
When the two reference pictures are on different sides of the current picture, the CPMV of L1 will apply a distance offset in opposite offset directions.
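The POC-distance-based offset mirroring described above can be sketched as follows. This is an illustrative model, not VTM code: the signaled offset is applied to the list-0 CPMVs directly, and to list 1 with the same or the opposite sign depending on whether the two reference pictures lie on the same temporal side of the current picture. The distance table values come from the text; the function and variable names are assumptions.

```python
# Distance table from the text (distance index -> offset magnitude in pixels).
DISTANCE_TABLE = {0: 0.5, 1: 1, 2: 2, 3: 4, 4: 8}

def apply_mmvd_offset(cpmv_l0, cpmv_l1, offset, poc_cur, poc_ref0, poc_ref1):
    """Apply the signaled (dx, dy) offset to the L0 CPMVs; mirror it on L1
    when the two reference pictures are on different sides of the current
    picture in POC order."""
    same_side = (poc_ref0 - poc_cur) * (poc_ref1 - poc_cur) > 0
    sign = 1 if same_side else -1
    new_l0 = [(mx + offset[0], my + offset[1]) for mx, my in cpmv_l0]
    new_l1 = [(mx + sign * offset[0], my + sign * offset[1]) for mx, my in cpmv_l1]
    return new_l0, new_l1
```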
3. Examples of problems addressed by the disclosed embodiments
There are some potential problems in the design of MMVD:
the encoding/decoding/parsing process for UMVE information may not be efficient because it uses a truncated unary binarization method to encode and decode distance (MVD precision) information and fixed length to bypass the encoding and decoding direction index. This is based on the assumption that 1/4 pixel accuracy is the highest percentage. However, this is not true for all types of sequences.
The possible distance design may not be efficient.
4. Examples of techniques implemented by various embodiments
The following list should be considered as an example to explain the general concept. These inventions should not be construed narrowly. Furthermore, these techniques may be combined in any manner.
Resolution of distance index (e.g., MVD precision index)
1. It is proposed that the Distance Index (DI) used in UMVE is binarized with a code other than the truncated unary code.
a. In one example, the DI may be binarized with a fixed length code, an exponential-Golomb code, a truncated exponential-Golomb code, a Rice code, or any other code.
2. The distance index may be signaled with more than one syntax element.
3. It is proposed to classify the set of allowed distances into a plurality of subsets, e.g. K subsets (K being larger than 1). The subset index (first syntax) is signaled first, followed by the distance index within the subset (second syntax).
a. For example, first, mmvd_distance_subset_idx is signaled, and then mmvd_distance_idx_in_subset.
i. In one example, mmvd_distance_idx_in_subset may be binarized with unary code, truncated unary code, fixed length code, exponential-Golomb code, truncated exponential-Golomb code, Rice code, or any other code.
(i) In particular, if there are only two possible distances in the subset, mmvd_distance_idx_in_subset may be binarized as a flag.
(ii) In particular, if there is only one possible distance in the subset, mmvd_distance_idx_in_subset is not signaled.
(iii) In particular, if mmvd_distance_idx_in_subset is binarized as a truncated code, the maximum value is set to the number of possible distances in the subset minus 1.
b. In one example, there are two subsets (e.g., K = 2).
i. In one example, one subset includes all fractional MVD precisions (e.g., 1/4 pixels, 1/2 pixels). Another subset includes all integer MVD precision (e.g., 1 pixel, 2 pixels, 4 pixels, 8 pixels, 16 pixels, 32 pixels).
in one example, one subset may have only one distance (e.g., 1/2 pixels) and another subset has all remaining distances.
c. In one example, there are three subsets (e.g., K = 3).
i. In one example, the first subset includes a fractional MVD precision (e.g., 1/4 pixels, 1/2 pixels); the second subset includes integer MVD precision of less than 4 pixels (e.g., 1 pixel, 2 pixels); and the third subset includes all other MVD precisions (e.g., 4 pixels, 8 pixels, 16 pixels, 32 pixels).
d. In one example, there are K subsets and the number of K is set equal to the MVD precision allowed in LAMVR.
i. Optionally, in addition, the signaling of subset indices may reuse that of LAMVR (e.g., reusing the way in which context offset indices are derived, reusing contexts, etc.).
ii. The distance within a subset may be determined by the associated LAMVR index (e.g., AMVR_mode in the specification).
e. In one example, how the subset is defined and/or how many subsets can be predefined or dynamically adapted.
f. In one example, the first syntax may be coded with a truncated unary code, a fixed length code, an exponential-Golomb code, a truncated exponential-Golomb code, a Rice code, or any other code.
g. In one example, the second syntax may be coded with a truncated unary code, a fixed length code, an exponential-Golomb code, a truncated exponential-Golomb code, a Rice code, or any other code.
h. In one example, the subset index (e.g., the first syntax) may not be explicitly coded in the bitstream. Optionally, furthermore, the subset index may be dynamically derived, e.g. based on coding information (e.g. block dimensions) of the current block and/or a previously coded block.
i. In one example, the distance indices within the subset (e.g., the second syntax) may not be explicitly coded in the bitstream.
i. In one example, when the subset has only one distance, no further signaling of the distance index is required.
Optionally, furthermore, the second syntax may be dynamically derived, e.g. based on coding information (e.g. block dimensions) of the current block and/or a previously coded block.
j. In one example, the first resolution bit is signaled to indicate whether DI is less than a predetermined number T. Optionally, the first resolution bit is signaled to indicate whether the distance is less than a predetermined number.
i. In one example, two syntax elements are used to represent the distance index. First, mmvd_resolution_flag is signaled, followed by mmvd_distance_idx_in_subset.
ii. In one example, three syntax elements are used to represent the distance index. mmvd_resolution_flag is first signaled, followed by mmvd_short_distance_idx_in_subset when it is equal to 0, and by mmvd_long_distance_idx_in_subset when it is equal to 1.
iii. In one example, the distance index number T corresponds to a 1-pixel distance. For example, in Table 2a defined in VTM-3.0, T = 2.
iv. In one example, the distance index number T corresponds to a 1/2-pixel distance. For example, in Table 2a defined in VTM-3.0, T = 1.
v. In one example, the distance index number T corresponds to a W-pixel distance. For example, in Table 2a defined in VTM-3.0, T = 3 corresponds to a 2-pixel distance.
In one example, if DI is less than T, the first resolution bit is equal to 0. Alternatively, if DI is less than T, the first resolution bit is equal to 1.
In one example, if DI is less than T, a code of the short-range index is further signaled after the first resolution bit to indicate the value of DI.
(i) In one example, the DI is signaled. The DI may be binarized with unary codes, truncated unary codes, fixed length codes, exponential-Golomb codes, truncated exponential-Golomb codes, Rice codes, or any other code.
a. When DI is binarized into a truncated code, such as a truncated unary code, the maximum codec value is T-1.
(ii) In one example, S = T-1-DI is signaled. T-1-DI may be binarized with unary codes, truncated unary codes, fixed length codes, exponential-Golomb codes, truncated exponential-Golomb codes, Rice codes, or any other code.
a. When T-1-DI is binarized into a truncated code (such as a truncated unary code), the maximum codec value is T-1.
b. After S is parsed, DI is reconstructed as DI = T-S-1.
In one example, if DI is not less than T, a code of the long-distance index is further signaled after the first resolution bit to indicate the value of DI.
(i) In one example, B = DI-T is signaled. DI-T may be binarized with unary codes, truncated unary codes, fixed length codes, exponential-Golomb codes, truncated exponential-Golomb codes, Rice codes, or any other code.
a. When DI-T is binarized into a truncated code (such as a truncated unary code), the maximum codec value is DMax-T, where DMax is the maximum allowable distance, such as 7 in VTM-3.0.
b. After B is parsed, DI is reconstructed as DI = B+T.
(ii) In one example, B' = DMax-DI is signaled, where DMax is the maximum allowed distance, such as 7 in VTM-3.0. DMax-DI may be binarized with a unary code, a truncated unary code, a fixed length code, an exponential-Golomb code, a truncated exponential-Golomb code, a Rice code, or any other code.
a. When DMax-DI is binarized into a truncated code (such as a truncated unary code), the maximum codec value is DMax-T.
b. After B' is parsed, DI is reconstructed as DI = DMax-B'.
k. In one example, the first resolution bit is signaled to indicate whether DI is greater than a predetermined number T. Optionally, the first resolution bit is signaled to indicate whether the distance is greater than a predetermined number.
i. In one example, the distance index number T corresponds to a 1-pixel distance. For example, in Table 2a defined in VTM-3.0, T = 2.
ii. In one example, the distance index number T corresponds to a 1/2-pixel distance. For example, in Table 2a defined in VTM-3.0, T = 1.
iii. In one example, the distance index number T corresponds to a W-pixel distance. For example, in Table 2a defined in VTM-3.0, T = 3 corresponds to a 2-pixel distance.
in one example, if DI is greater than T, the first resolution bit is equal to 0. Alternatively, if DI is greater than T, the first resolution bit is equal to 1.
v. in one example, if DI is not greater than T, further signaling a code of the short range index after the first resolution bit to indicate the value of DI.
(i) In one example, the DI is signaled. The DI may be binarized with unary codes, truncated unary codes, fixed length codes, exponential-Golomb codes, truncated exponential-Golomb codes, Rice codes, or any other code.
a. When DI is binarized into a truncated code, such as a truncated unary code, the maximum codec value is T.
(ii) In one example, S = T-DI is signaled. T-DI may be binarized with unary codes, truncated unary codes, fixed length codes, exponential-Golomb codes, truncated exponential-Golomb codes, Rice codes, or any other code.
a. When T-DI is binarized into a truncated code, such as a truncated unary code, the maximum codec value is T.
b. After S is parsed, DI is reconstructed as DI = T-S.
In one example, if DI is greater than T, a code of the long-distance index is further signaled after the first resolution bit to indicate the value of DI.
(i) In one example, B = DI-1-T is signaled. DI-1-T may be binarized with unary codes, truncated unary codes, fixed length codes, exponential-Golomb codes, truncated exponential-Golomb codes, Rice codes, or any other code.
a. When DI-1-T is binarized into a truncated code (such as a truncated unary code), the maximum codec value is DMax-1-T, where DMax is the maximum allowable distance, such as 7 in VTM-3.0.
b. After B is parsed, DI is reconstructed as DI = B+T+1.
(ii) In one example, B' = DMax-DI is signaled, where DMax is the maximum allowed distance, such as 7 in VTM-3.0. DMax-DI may be binarized with a unary code, a truncated unary code, a fixed length code, an exponential-Golomb code, a truncated exponential-Golomb code, a Rice code, or any other code.
a. When DMax-DI is binarized into a truncated code (such as a truncated unary code), the maximum codec value is DMax-1-T.
b. After B' is parsed, DI is reconstructed as DI = DMax-B'.
Several possible binarization methods for the distance index are listed below. (Note that two binarization methods should be considered identical if flipping every "1" to "0" and every "0" to "1" in one method yields the same codewords as the other method.)
[Table of example codewords for each binarization method; rendered as an image in the original document.]
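The scheme of items 3.j and 3.k above can be made concrete with a small round-trip sketch: a first resolution bit indicates whether DI is below a threshold T, and a short- or long-distance codeword then refines DI. The values T = 2 and DMax = 7 are assumptions matching the 1-pixel example for Table 2a of VTM-3.0; the code is illustrative only.

```python
# Hypothetical two-part coding of the distance index DI:
# (resolution bit, refinement index), with DI < T meaning "short distance".
T, DMAX = 2, 7

def encode_di(di):
    if di < T:
        return (0, di)        # resolution bit 0, short index = DI
    return (1, di - T)        # resolution bit 1, long index B = DI - T

def decode_di(res_bit, idx):
    """Reconstruct DI from the resolution bit and the refinement index."""
    return idx if res_bit == 0 else idx + T
```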
4. It is proposed to codec a first syntax using one or more probability contexts.
a. In one example, the first syntax is the first resolution bit mentioned above.
b. In one example, which probability context to use is derived from the first resolution bits of the neighboring blocks.
c. In one example, which probability context to use is derived from the LAMVR_mode value (e.g., AMVR_mode value) of the neighboring block.
5. It is proposed to codec the second syntax using one or more probability contexts.
a. In one example, the second syntax is the short-range index mentioned above.
i. In one example, the first bin used to codec the short distance index is coded with a probability context and the other bins are bypassed.
in one example, the first N bins for coding the short distance index are coded with a probability context and the other bins are bypassed.
in one example, all bins used for coding the short distance index are coded with a probability context.
in one example, different bins may have different probability contexts.
v. in one example, several bins share a single probability context.
(i) In one example, the bins are contiguous.
In one example, which probability context to use is derived from the short-range indices of neighboring blocks.
b. In one example, the second syntax is the long-range index mentioned above.
i. In one example, the first bin used to encode and decode the long-range index is encoded and decoded with a probability context and the other bins are bypass encoded and decoded.
in one example, the first N bins for coding the long-distance index are coded with a probability context and the other bins are bypass coded.
in one example, all bins used for coding long-distance indices are coded with probability context.
in one example, different bins may have different probability contexts.
v. in one example, several bins share a single probability context.
(i) In one example, the bins are contiguous.
In one example, which probability context to use is derived from the long-range indices of neighboring blocks.
Interaction with LAMVR
6. It is proposed to codec a first syntax (e.g., the first resolution bit) according to a probabilistic model for coding LAMVR information.
a. In one example, the first resolution bit is coded in the same manner as the first MVD resolution flag is coded (e.g., sharing context, or the same context index derivation method, but the LAMVR information of neighboring blocks is replaced by MMVD information).
i. In one example, which probability context to use to encode the first resolution bit is derived from the LAMVR information of neighboring blocks.
(i) In one example, which probability context to use to encode the first resolution bit is derived from the first MVD resolution flags of neighboring blocks.
b. Optionally, the first MVD resolution flag is coded and decoded and used as the first resolution bit when the distance index is coded and decoded.
c. In one example, which probability model to use to encode the first resolution bit may depend on the encoded LAMVR information.
i. For example, which probability model to use to encode the first resolution bit may depend on the MV resolution of the neighboring blocks.
7. It is proposed that the first bit for coding the short distance index is coded with a probability context.
a. In one example, the first bit for coding the short distance index is coded in the same way as the second MVD resolution flag is coded (e.g., sharing context, or the same context index derivation method, but the LAMVR information of the neighboring block is replaced by MMVD information).
b. Alternatively, the second MVD resolution flag is coded and decoded and used as the first bit for coding the short distance index when the distance index is coded and decoded.
c. In one example, which probability model to use to encode the first bit for encoding the short distance index may depend on the encoded LAMVR information.
i. For example, which probability model to use to encode the first bit for encoding the short distance index may depend on the MV resolution of the neighboring blocks.
8. It is proposed that the first bit for coding a long-distance index is coded with a probability context.
a. In one example, the first bit for coding the long-distance index is coded in the same manner as the second MVD resolution flag is coded (e.g., sharing context, or the same context index derivation method, but the LAMVR information of the neighboring block is replaced with MMVD information).
b. Alternatively, the second MVD resolution flag is coded and decoded and used as the first bit for coding the long-distance index when the distance index is coded and decoded.
c. In one example, which probability model to use to encode the first bit for encoding the long-distance index may depend on the encoded LAMVR information.
i. For example, which probability model to use to encode the first bit for encoding the long-distance index may depend on the MV resolution of the neighboring blocks.
9. For the LAMVR mode, in arithmetic coding, the first MVD resolution flag is coded with one of three probability contexts (C0, C1, or C2); and the second MVD resolution flag is coded using a fourth probability context C3. An example of deriving a probability context for coding a decoding distance index is described below.
a. The probability context Cx for the first resolution bit is derived as follows (L denotes the left neighboring block and A denotes the above neighboring block):
if L is available, inter-coded, and its first MVD resolution flag is not equal to 0, xL is set equal to 1; otherwise, xL is set equal to 0.
If A is available, inter-coded, and its first MVD resolution flag is not equal to 0, xA is set equal to 1; otherwise, xA is set equal to 0.
x is set equal to xL + xA.
b. The probability context of coding the first bit of the long-distance index is C3.
c. The probability context of coding the first bit of the short-range index is C3.
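The context assignment of item 9 can be sketched as a small dispatch: the first resolution bit picks one of C0/C1/C2 from the neighbor-derived sum xL + xA (as for the LAMVR first flag), while the first bin of either the short- or the long-distance index reuses the single context C3. The function name and the string tags are assumptions made for the sketch.

```python
# Hypothetical context selection for the MMVD distance-index bins of item 9.
def mmvd_bin_context(bin_kind, xL=0, xA=0):
    if bin_kind == "resolution":
        return xL + xA                       # 0, 1, or 2 -> C0, C1, C2
    if bin_kind in ("short_first", "long_first"):
        return 3                             # first bin of either index -> C3
    raise ValueError(f"unknown bin kind: {bin_kind}")
```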
10. It is proposed that the LAMVR MVD resolution is signaled when MMVD mode is applied.
a. It is proposed to reuse the syntax used for the LAMVR MVD resolution signaling when encoding and decoding the side information of MMVD modes.
b. When the signaled LAMVR MVD resolution is 1/4 pixels, a short-range index is signaled to indicate the MMVD distance in the first subset. For example, the short distance index may be 0 or 1 to represent an MMVD distance of 1/4 pixels or 1/2 pixels, respectively.
c. When the signaled LAMVR MVD resolution is 1 pixel, the medium distance index is signaled to indicate the MMVD distance in the second subset. For example, the medium distance index may be 0 or 1 to represent a MMVD distance of 1 pixel or 2 pixels, respectively.
d. When the signaled LAMVR MVD resolution is 4 pixels, a long distance index is signaled to indicate the MMVD distance in the third subset. For example, the long distance index may be X to represent an MMVD distance of (4 << X) pixels.
e. In the following disclosure, the subset distance index may refer to a short distance index, a middle distance index, or a long distance index.
i. In one example, the subset distance index may be binarized with a unary code, a truncated unary code, a fixed length code, an exponential-Golomb code, a truncated exponential-Golomb code, a Rice code, or any other code.
(i) In particular, if there are only two possible distances in the subset, the subset distance index may be binarized as a flag.
(ii) In particular, if there is only one possible distance in the subset, the subset distance index is not signaled.
(iii) In particular, if the subset distance index is binarized to a truncated code, the maximum value is set to the number of possible distances in the subset minus 1.
in one example, the first bin for encoding the subset distance index is coded with a probability context and the other bins are bypass coded.
iii. In one example, the first N bins for coding the subset distance index are coded with a probability context, and the other bins are bypass coded.
in one example, all bins used to encode the subset distance index are encoded with a probability context.
v. in one example, different bins may have different probability contexts.
In one example, several bins share a single probability context.
(i) In one example, the bins are contiguous.
It is proposed that one distance cannot occur in two different distance subsets.
In one example, more distances may be signaled in the short distance subset.
(i) For example, a distance signaled in the short distance subset must be a sub-pixel distance, not an integer-pixel distance. For example, 5/4 pixels, 3/2 pixels, and 7/4 pixels may be in the short distance subset, but 3 pixels cannot be in the short distance subset.
in one example, more distances may be signaled in the medium-distance subset.
(i) For example, the distance signaled in the medium-distance subset must be an integer number of pixels, but not of the 4N form, where N is an integer. For example, 3 pixels, 5 pixels may be in the medium-distance subset, but 24 pixels may not.
In one example, more distances may be signaled in the long distance subset.
(i) For example, a distance signaled in the long distance subset must be an integer-pixel distance of the form 4N, where N is an integer. For example, 4 pixels, 8 pixels, 16 pixels, or 24 pixels may be in the long distance subset.
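The subset scheme of items 10.b-10.d can be sketched as a lookup: the signaled LAMVR MVD resolution selects a distance subset, and a per-subset index refines the MMVD distance. The subset contents below are the examples given in the text (distances in pixels); the string tags and function name are assumptions.

```python
# Hypothetical mapping from the signaled LAMVR MVD resolution plus a
# per-subset index to the MMVD distance, per items 10.b-10.d.
def mmvd_distance(lamvr_resolution, subset_idx):
    if lamvr_resolution == "1/4-pel":     # first subset: fractional distances
        return (0.25, 0.5)[subset_idx]
    if lamvr_resolution == "1-pel":       # second subset: small integers
        return (1, 2)[subset_idx]
    if lamvr_resolution == "4-pel":       # third subset: (4 << X) pixels
        return 4 << subset_idx
    raise ValueError(f"unknown resolution: {lamvr_resolution}")
```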
11. It is proposed that the variable for storing the MV resolution of the current block may be determined by the UMVE distance.
a. In one example, if the UMVE distance < T1 or <= T1, the MV resolution of the current block is set to 1/4 pixel.
b. In one example, if the UMVE distance < T1 or <= T1, the first and second MVD resolution flags of the current block are set to 0.
c. In one example, if the UMVE distance > T1 or >= T1, the MV resolution of the current block is set to 1 pixel.
d. In one example, if the UMVE distance > T1 or >= T1, the first MVD resolution flag of the current block is set to 1 and the second MVD resolution flag of the current block is set to 0.
e. In one example, if the UMVE distance > T2 or >= T2, the MV resolution of the current block is set to 4 pixels.
f. In one example, if the UMVE distance > T2 or >= T2, the first and second MVD resolution flags of the current block are set to 1.
g. In one example, if the UMVE distance > T1 or >= T1 and the UMVE distance < T2 or <= T2, the MV resolution of the current block is set to 1 pixel.
h. In one example, if the UMVE distance > T1 or >= T1 and the UMVE distance < T2 or <= T2, the first MVD resolution flag of the current block is set to 1 and the second MVD resolution flag of the current block is set to 0.
T1 and T2 may be any number. For example, T1 ═ 1 pixel, and T2 ═ 4 pixels.
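A minimal sketch of item 11, choosing the ">=" comparison variants and the example thresholds T1 = 1 pixel and T2 = 4 pixels; the function name and the returned flag pair are illustrative:

```python
def mv_resolution_from_umve_distance(distance, t1=1, t2=4):
    """Derive the stored MV resolution (in pixels) and the two MVD
    (LAMVR-style) resolution flags from the UMVE distance."""
    if distance < t1:
        return 0.25, (0, 0)   # quarter-pel: both flags 0
    if distance < t2:
        return 1, (1, 0)      # integer-pel: first flag 1, second flag 0
    return 4, (1, 1)          # 4-pel: both flags 1
```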
12. It is proposed that the variable for storing the MV resolution of the current block may be determined by the UMVE distance index.
a. In one example, if the UMVE distance index < T1 or <= T1, the MV resolution of the current block is set to 1/4 pixel.
b. In one example, if the UMVE distance index < T1 or <= T1, the first and second MVD resolution flags of the current block are set to 0.
c. In one example, if the UMVE distance index > T1 or >= T1, the MV resolution of the current block is set to 1 pixel.
d. In one example, if the UMVE distance index > T1 or >= T1, the first MVD resolution flag of the current block is set to 1 and the second MVD resolution flag of the current block is set to 0.
e. In one example, if the UMVE distance index > T2 or >= T2, the MV resolution of the current block is set to 4 pixels.
f. In one example, if the UMVE distance index > T2 or >= T2, the first and second MVD resolution flags of the current block are set to 1.
g. In one example, if the UMVE distance index > T1 or >= T1 and the UMVE distance index < T2 or <= T2, the MV resolution of the current block is set to 1 pixel.
h. In one example, if the UMVE distance index > T1 or >= T1 and the UMVE distance index < T2 or <= T2, the first MVD resolution flag of the current block is set to 1 and the second MVD resolution flag of the current block is set to 0.
T1 and T2 may be any numbers. For example, T1 = 2 and T2 = 3, or T1 = 2 and T2 = 4.
13. the variables used to store the MV resolution of a UMVE codec block may be used to codec subsequent blocks that are coded with the LAMVR mode.
a. Alternatively, the variable used to store the MV resolution of a UMVE codec block may be used to codec subsequent blocks that are coded in UMVE mode.
b. Alternatively, the MV precision of a LAMVR codec block may be used to codec a subsequent UMVE codec block.
14. The above items may also be applied to the coding of the direction index.
Mapping between distance index and distance
15. It is proposed that the relationship between the distance index (DI) and the distance is not the exponential relationship of VTM-3.0 (distance = 1/4 pixel × 2^DI).
a. In one example, the mapping may be piecewise.
i. For example, when T0 <= DI < T1, distance = f1(DI); when T1 <= DI < T2, distance = f2(DI); ...; when Tn-1 <= DI < Tn, distance = fn(DI).
(i) For example, when DI < T1, distance = 1/4 pixel × 2^DI; when T1 <= DI < T2, distance = a × DI + b; when DI >= T2, distance = c × 2^DI. In one example, T1 = 4, a = 1, b = 1, T2 = 6, and c = 1/8.
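With the example parameters T1 = 4, a = 1, b = 1, T2 = 6, and c = 1/8, the piecewise mapping can be sketched as follows (treating the last segment as DI >= T2 so every index is covered; the helper name is illustrative):

```python
def distance_from_index(di, t1=4, a=1, b=1, t2=6, c=0.125):
    """Piecewise distance-index-to-distance mapping, in pixels."""
    if di < t1:
        return 0.25 * (1 << di)   # VTM-3.0 exponential segment: 1/4 * 2^DI
    if di < t2:
        return a * di + b         # linear segment
    return c * (1 << di)          # exponential segment scaled by c
```

This yields the distances 1/4, 1/2, 1, 2, 5, 6, 8, 16 for DI = 0..7, instead of the purely exponential 1/4 × 2^DI.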
16. It is proposed that the distance table size may be larger than 8, such as 9, 10, 12, or 16.
17. It is proposed that distances smaller than 1/4 pixel, such as 1/8 pixel, 1/16 pixel, or 3/8 pixel, may be included in the distance table.
18. It is proposed that distances not of the form 2^X pixels, such as 3 pixels, 5 pixels, 6 pixels, etc., may be included in the distance table.
19. It is proposed that the distance table may be different for different directions.
a. Accordingly, the parsing process of the distance index may be different for different directions.
b. In one example, four directions with direction indices of 0, 1,2, and 3 have different distance tables.
c. In one example, two x-directions with direction indices of 0 and 1 have the same distance table.
d. In one example, two y-directions with direction indices of 2 and 3 have the same distance table.
e. In one example, the x-direction and the y-direction may have two different distance tables.
i. Accordingly, the parsing process of the distance index may be different for the x-direction and the y-direction.
ii. In one example, the y-direction distance table may have fewer possible distances than the x-direction distance table.
iii. In one example, the shortest distance in the y-direction distance table may be shorter than the shortest distance in the x-direction distance table.
iv. In one example, the longest distance in the y-direction distance table may be shorter than the longest distance in the x-direction distance table.
20. It is proposed that different distance tables may be used for different block widths and/or heights.
a. In one example, different distance tables may be used for different block widths when the direction is along the x-axis.
b. In one example, different distance tables may be used for different block heights when the direction is along the y-axis.
21. It is proposed that different distance tables may be used when the POC differences are different. The POC difference is calculated as |POC of the current picture - POC of the reference picture|.
22. It is proposed that different distance tables can be used for different basic candidates.
23. It is proposed that the ratio of two distances (MVD precisions) with consecutive indices is not fixed to 2.
a. In one example, the ratio of two distances (MVD precisions) with consecutive indices is fixed to M (e.g., M = 4).
b. In one example, the increment (rather than the ratio) of two distances (MVD precision) with consecutive indices may be fixed for all indices. Alternatively, the increment of the two distances (MVD precision) with consecutive indices may be different for different indices.
c. In one example, the ratio of two distances with consecutive indices (MVD precision) may be different for different indices.
i. In one example, a set of distances such as 1 pixel, 2 pixels, 4 pixels, 8 pixels, 16 pixels, 32 pixels, 48 pixels, 64 pixels may be used.
ii. In one example, a set of distances such as 1 pixel, 2 pixels, 4 pixels, 8 pixels, 16 pixels, 32 pixels, 64 pixels, 96 pixels may be used.
iii. In one example, a set of distances such as 1 pixel, 2 pixels, 3 pixels, 4 pixels, 5 pixels, 16 pixels, 32 pixels may be used.
24. Signaling of the MMVD side information can be done as follows:
a. When the current block is in inter mode and non-Merge mode (which may include, for example, non-skip, non-sub-block, non-triangle, non-MHIntra), the MMVD flag may be signaled first, followed by the subset index of the distance, the distance index within the subset, and the direction index. Here, MMVD is considered as a mode different from the Merge mode.
b. Alternatively, when the current block is in the Merge mode, the MMVD flag may be further signaled, followed by the subset index of the distance, the distance index within the subset, and the direction index. Here, MMVD is considered a special Merge mode.
25. The direction of the MMVD and the distance of the MMVD may be signaled jointly.
a. In one example, whether and how the MMVD distance is signaled may depend on the MMVD direction.
b. In one example, whether and how the MMVD direction is signaled may depend on the MMVD distance.
c. In one example, a joint codeword is signaled with one or more syntax elements. The MMVD distance and MMVD direction can be derived from the codeword. For example, the codeword is equal to MMVD distance index + MMVD direction index × 7. In another example, an MMVD codeword table is designed, where each codeword corresponds to a unique combination of MMVD distance and MMVD direction.
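The exact codeword formula is ambiguous in the source text, so the sketch below uses a direction-major layout as one illustrative bijection between codewords and (distance, direction) pairs; the table sizes are assumptions:

```python
NUM_DISTANCES = 8    # assumed distance table size
NUM_DIRECTIONS = 4   # assumed number of MMVD directions

def encode_joint(distance_idx, direction_idx):
    """Map a (distance, direction) pair to a unique joint codeword."""
    return direction_idx * NUM_DISTANCES + distance_idx

def decode_joint(codeword):
    """Recover the MMVD distance index and direction index from the codeword."""
    return codeword % NUM_DISTANCES, codeword // NUM_DISTANCES
```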
26. Some exemplary UMVE distance tables are listed below:
a. The table size is 9:
(distance table shown as an image in the original document)
b. The table size is 10:
(distance table shown as an image in the original document)
c. The table size is 12:
(distance table shown as an image in the original document)
27. It is proposed that the MMVD distance can be signaled with a granular signaling method. The distance is signaled first by an index with coarse granularity, followed by one or more indices with finer granularity.
a. For example, the first index F1 represents a distance in an ordered set M1, and the second index F2 represents a distance in an ordered set M2. The final distance is calculated as, for example, M1[F1] + M2[F2].
b. For example, the first index F1 represents a distance in an ordered set M1; the second index F2 represents a distance in an ordered set M2; and so on, until the n-th index Fn represents a distance in an ordered set Mn. The final distance is calculated as M1[F1] + M2[F2] + ... + Mn[Fn].
c. For example, Fk may depend on the signaled Fk-1.
i. In one example, when Fk-1 does not point to the maximum index of Mk-1, Mk[Fk] must be less than Mk-1[Fk-1 + 1] - Mk-1[Fk-1], where 1 < k <= n.
d. For example, Fk may depend on the signaling or binarization of the signaled Fs for all 1 <= s < k.
i. In one example, when 1 < k <= n, Mk[Fk] must be less than Ms[Fs + 1] - Ms[Fs] for all 1 <= s < k.
e. In one example, if Fk points to the maximum index of Mk, then Fk+1 is no longer signaled, and the final distance is calculated as M1[F1] + M2[F2] + ... + Mk[Fk], where 1 <= k <= n.
f. In one example, the entries in Mk may depend on the signaled Fk-1.
g. In one example, the entries in Mk may depend on the signaled Fs for all 1 <= s < k.
h. For example, n = 2 and M1 = {1/4 pixel, 1 pixel, 4 pixels, 8 pixels, 16 pixels, 32 pixels, 64 pixels, 128 pixels}.
i. When F1 = 0 (M1[F1] = 1/4 pixel), M2 = {0 pixels, 1/4 pixel};
ii. When F1 = 1 (M1[F1] = 1 pixel), M2 = {0 pixels, 1 pixel, 2 pixels};
iii. When F1 = 2 (M1[F1] = 4 pixels), M2 = {0 pixels, 1 pixel, 2 pixels, 3 pixels};
iv. When F1 = 3 (M1[F1] = 8 pixels), M2 = {0 pixels, 2 pixels, 4 pixels, 6 pixels};
v. When F1 = 4 (M1[F1] = 16 pixels), M2 = {0 pixels, 4 pixels, 8 pixels, 12 pixels};
vi. When F1 = 5 (M1[F1] = 32 pixels), M2 = {0 pixels, 8 pixels, 16 pixels, 24 pixels};
vii. When F1 = 6 (M1[F1] = 64 pixels), M2 = {0 pixels, 16 pixels, 32 pixels, 48 pixels}.
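The two-level example in item 27.h can be sketched as follows; the entry F1 = 7 (128 pixels) has no refinement set listed above, so it is omitted here:

```python
# Coarse-granularity set M1, in pixels
M1 = [0.25, 1, 4, 8, 16, 32, 64, 128]

# Finer-granularity set M2, selected by the signaled coarse index F1
M2_BY_F1 = {
    0: [0, 0.25],
    1: [0, 1, 2],
    2: [0, 1, 2, 3],
    3: [0, 2, 4, 6],
    4: [0, 4, 8, 12],
    5: [0, 8, 16, 24],
    6: [0, 16, 32, 48],
}

def final_distance(f1, f2):
    """Coarse-then-fine MMVD distance: M1[F1] + M2[F2], where the
    refinement set M2 depends on the signaled F1."""
    return M1[f1] + M2_BY_F1[f1][f2]
```

For example, F1 = 3 and F2 = 2 give a final distance of 8 + 4 = 12 pixels.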
Slice/picture level control
28. It is proposed that how to signal the MMVD side information (e.g., the MMVD distance) and/or how to interpret the signaled MMVD side information (e.g., the distance corresponding to the distance index) may depend on information signaled or inferred at a level higher than the CU level (e.g., the sequence level, picture level, slice level, or slice group level, such as in the VPS/SPS/PPS/slice header/picture header/slice group header).
a. In one example, a code table index is signaled or inferred at a higher level. The particular code table is determined by the table index. The distance index may be signaled with the methods disclosed in items 1-26. The distance is then derived by looking up the entry in the particular code table with the signaled distance index.
b. In one example, a parameter X is signaled or inferred at a higher level. The distance index may be signaled with the methods disclosed in items 1-26. A distance D' is then derived by looking up the entry in the code table with the signaled distance index. The final distance D is calculated as D = f(D', X), where f may be any function. For example, f(D', X) = D' << X, or f(D', X) = D' × X, or f(D', X) = D' + X, or f(D', X) = D' >> X (with or without rounding).
c. In one example, the effective MV resolution is signaled or inferred at a higher level. Only MMVD distances with valid MV resolution can be signaled.
i. For example, the signaling method of MMVD information at CU level may depend on the effective MV resolution signaled at higher level.
(i) For example, the signaling method of MMVD distance resolution information at CU level may depend on the effective MV resolution signaled at higher level.
(ii) For example, the number of distance subsets may depend on the effective MV resolution signaled at a higher level.
(iii) For example, the meaning of each subset may depend on the effective MV resolution signaled at a higher level.
For example, the minimum MV resolution (such as 1/4 pixels or 1 pixel or 4 pixels) is signaled.
(i) For example, when the minimum MV resolution is 1/4 pixels, the distance index is signaled, as described in items 1-26.
(ii) For example, when the minimum MV resolution is 1 pixel, the flag (such as the first resolution bit in LAMVR) signaling whether the distance resolution is 1/4 pixel is not signaled. Only the medium-distance index and the long-distance index disclosed in item 10 may be signaled after the LAMVR information.
(iii) For example, when the minimum MV resolution is 4 pixels, the flag (such as the first resolution bit in LAMVR) signaling whether the distance resolution is 1/4 pixel is not signaled, and the flag (such as the second resolution bit in LAMVR) signaling whether the distance resolution is 1 pixel is not signaled. Only the long-distance index disclosed in item 10 may be signaled after the LAMVR information.
(iv) For example, when the minimum MV resolution is 1 pixel, the distance resolution is signaled in the same manner as when the minimum MV resolution is 1/4 pixel, but the meaning of the distance subsets may be different.
a. For example, the short-distance subset represented by the short-distance index is redefined as a very-long-distance subset. For example, the two distances that can be signaled within this very-long-distance subset are 64 pixels and 128 pixels.
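Sub-item 28.b above, where a higher-level parameter X adjusts the looked-up distance D', can be sketched as follows (the mode strings are illustrative):

```python
def adjust_distance(d_prime, x, mode="shift_left"):
    """Final distance D = f(D', X) for the example functions f listed in
    item 28.b; D' is the table entry, X the higher-level parameter."""
    if mode == "shift_left":
        return d_prime << x      # f(D', X) = D' << X
    if mode == "multiply":
        return d_prime * x       # f(D', X) = D' * X
    if mode == "add":
        return d_prime + x       # f(D', X) = D' + X
    return d_prime >> x          # f(D', X) = D' >> X (no rounding)
```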
It is proposed that the encoder can decide whether a slice/picture/sequence/CTU group/block group is screen content by checking the ratio of blocks having one or more similar or identical blocks within the same slice/picture/sequence/CTU group/block group.
a. In one example, if the ratio is greater than a threshold, it is considered screen content.
b. In one example, if the ratio is greater than a first threshold and less than a second threshold, it is considered screen content.
c. In one example, a slice/picture/sequence/CTU group/block group may be partitioned into M×N non-overlapping blocks. For each M×N block, the encoder checks whether one (or more) other M×N blocks are similar or identical to it. For example, M×N is equal to 4×4.
d. In one example, only partial blocks are examined when calculating the ratio. For example, only blocks in even rows and even columns are examined.
e. In one example, a key value, e.g., a Cyclic Redundancy Check (CRC) code, may be generated for each M×N block, and the key values of two blocks are compared to check whether the two blocks are identical.
i. In one example, key values may be generated using only some of the color components of the block. For example, the key value is generated using only the luma component.
ii. In one example, key values may be generated using only some pixels of the block. For example, only the even rows of the block are used.
f. In one example, SAD/SATD/SSE or mean removal SAD/SATD/SSE may be used to measure the similarity of two blocks.
i. In one example, the SAD/SATD/SSE or mean removal SAD/SATD/SSE may be calculated for only some pixels. For example, SAD/SATD/SSE or mean-removed SAD/SATD/SSE is computed for even rows only.
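The screen-content decision described above can be sketched as follows, assuming an 8-bit luma frame given as a list of rows; the 4×4 block size matches the example, while the 0.5 threshold and the helper name are illustrative choices:

```python
import zlib

def is_screen_content(frame, m=4, n=4, threshold=0.5):
    """Split the frame into non-overlapping m x n blocks, key each block
    with a CRC-32, and report screen content when the fraction of blocks
    having at least one identical sibling exceeds the threshold."""
    h, w = len(frame), len(frame[0])
    keys = []
    for y in range(0, h - n + 1, n):
        for x in range(0, w - m + 1, m):
            block = bytes(frame[y + j][x + i] for j in range(n) for i in range(m))
            keys.append(zlib.crc32(block))
    counts = {}
    for k in keys:
        counts[k] = counts.get(k, 0) + 1
    duplicated = sum(1 for k in keys if counts[k] > 1)
    return duplicated / len(keys) > threshold
```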
Affine MMVD
It is proposed that the indication of the use of affine MMVD can be signaled only when the Merge index of the sub-block Merge list is greater than K (where K is 0 or 1).
a. Alternatively, when there are separate lists for the affine Merge list and other Merge lists (such as ATMVP lists), the indication of the use of affine MMVD may be signaled only when affine mode is enabled. Further, optionally, the indication of the use of affine MMVD may be signaled only when affine mode is enabled and there is more than one basic affine candidate.
30. It is proposed that the MMVD method can be applied to other subblock-based codec tools, such as ATMVP mode, in addition to affine mode. In one example, if the current CU applies ATMVP and the MMVD on/off flag is set to 1, then MMVD is applied to ATMVP.
a. In one example, a set of MMVD side-information may be applied to all sub-blocks, in which case a set of MMVD side-information is signaled. Alternatively, different sub-blocks may select different groups, in which case multiple sets of MMVD side-information may be signaled.
b. In one embodiment, the MV of each sub-block is added to the signaled MVD (also referred to as offset or distance).
c. In one embodiment, when the sub-block Merge candidate is an ATMVP Merge candidate, the method of signaling MMVD information is the same as when the sub-block Merge candidate is an affine Merge candidate.
d. In one embodiment, when the subblock Merge candidates are ATMVP Merge candidates, a POC distance based offset mirroring method is used for bi-directional prediction to add MVD to the MV of each subblock.
31. It is proposed that when the subblock Merge candidates are affine Merge candidates, the MV of each subblock is added to the signaled MVD (also referred to as offset or distance).
32. It is proposed that the MMVD signaling method disclosed in items 1-28 can also be applied to signal the MVDs used by the affine MMVD mode.
a. In one embodiment, the LAMVR information used to signal MMVD information for affine MMVD may be different from the LAMVR information used to signal MMVD information for non-affine MMVD modes.
i. For example, the LAMVR information used to signal MMVD information for an affine MMVD mode is also used to signal MV precision used in an affine inter-frame mode; the LAMVR information used to signal MMVD information for non-affine MMVD modes is used to signal MV precision for use in non-affine inter-frame modes.
33. It is proposed that MVD information in MMVD mode of sub-block Merge candidate should be signaled in the same way as MVD information in MMVD mode of regular Merge candidate.
a. For example, they share the same distance table;
b. for example, they share the same mapping between distance index and distance.
c. For example, they share the same directions.
d. For example, they share the same binarization method.
e. For example, they share the same arithmetic codec context.
34. It is proposed that MMVD side information signaling may depend on the codec mode, such as affine or normal Merge or triangle Merge mode or ATMVP mode.
35. It is proposed that the predetermined MMVD side information may depend on the codec mode, such as affine, regular Merge, triangle Merge, or ATMVP mode.
36. It is proposed that the predetermined MMVD side information may depend on the color sub-sampling method (e.g., 4:2:0, 4:2:2, 4:4:4) and/or the color component.
Triangular MMVD
37. It is proposed that MMVD can be applied to triangle prediction modes.
a. After signaling the TPM Merge candidate, the MMVD information is signaled. The signaled TPM Merge candidate is regarded as the base Merge candidate.
i. For example, the MMVD information is signaled with the same signaling method as the MMVD of regular Merge;
ii. For example, the MMVD information is signaled with the same signaling method as the MMVD of affine Merge or other kinds of sub-block Merge;
iii. For example, the MMVD information is signaled with a signaling method different from the MMVD of regular Merge, affine Merge, or other kinds of sub-block Merge;
b. in one example, the MV of each triangle partition is added to the signaled MVD;
c. In one example, the MVs of one triangle partition are added to the signaled MVD, and the MVs of the other triangle partition are added to f(signaled MVD), where f may be any function.
i. In one example, f depends on the reference picture POCs or reference indices of the two triangle partitions.
ii. In one example, if the reference picture of one triangle partition precedes the current picture in display order and the reference picture of the other triangle partition follows the current picture in display order, then f(MVD) = -MVD.
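Item 37.c can be sketched as follows: the first triangle partition takes the signaled MVD directly, and the second takes f(MVD), mirroring the MVD when the two reference pictures lie on opposite sides of the current picture in display order (passing it through unchanged otherwise is an assumed fallback):

```python
def triangle_partition_mvds(signaled_mvd, poc_cur, poc_ref0, poc_ref1):
    """Return the MVDs added to the two triangle partitions."""
    mvd0 = signaled_mvd
    if (poc_ref0 - poc_cur) * (poc_ref1 - poc_cur) < 0:
        # references on opposite sides of the current picture: f(MVD) = -MVD
        mvd1 = (-signaled_mvd[0], -signaled_mvd[1])
    else:
        mvd1 = signaled_mvd
    return mvd0, mvd1
```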
38. It is proposed that the MMVD signaling methods disclosed in items 1-28 can also be applied to signal the MVDs used by the triangle MMVD mode.
a. In one embodiment, the LAMVR information used to signal MMVD information for affine MMVD may be different from the LAMVR information used to signal MMVD information for non-affine MMVD modes.
i. For example, the LAMVR information used to signal MMVD information for affine MMVD mode is also used to signal MV precision used in affine inter-frame mode; the LAMVR information used to signal MMVD information for non-affine MMVD modes is used to signal MV precision for use in non-affine inter-frame modes.
39. For all the above items, the MMVD side information may include, for example, an offset table (distance) and direction information.
5. Example embodiments
This section shows some embodiments of the improved MMVD design.
5.1 Embodiment #1 (MMVD distance index codec)
In one embodiment, to codec the MMVD distance, a first resolution bit is coded. For example, it may be coded with the same probability context as the first flag of the MV resolution.
- If the resolution bit is 0, a following flag is coded. For example, it may be coded with another probability context to indicate the short-distance index. If the flag is 0, the index is 0; if the flag is 1, the index is 1.
- Otherwise (the resolution bit is 1), the long-distance index L is coded as a truncated unary code with a maximum value of MaxDI - 2, where MaxDI is the largest possible distance index, equal to 7 in this embodiment.
After L is parsed, the distance index is reconstructed as L + 2. Exemplary C-style code:
(code shown as an image in the original document)
The first bin of the long-distance index is coded with a probability context, and the other bins are bypass coded. Exemplary C-style code:
(code shown as an image in the original document)
The proposed syntax changes are highlighted, and the deleted portions are marked with strikethrough.
(syntax table shown as an image in the original document)
In one example, mmvd_distance_subset_idx denotes the resolution index as described above, and mmvd_distance_idx_in_subset denotes the short-distance or long-distance index, according to the resolution index. A truncated unary code may be used to codec mmvd_distance_idx_in_subset.
This embodiment can achieve an average coding gain of 0.15%, and a gain of 0.34% for UHD sequences (class A1), in the random access test under common test conditions.
(results table shown as an image in the original document)
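The binarization of embodiment #1 (whose C code appears only as an image in the source) can be sketched in Python, with MaxDI = 7 as stated:

```python
MAX_DI = 7  # largest possible distance index in this embodiment

def binarize_distance_idx(idx):
    """Bins (list of 0/1) for an MMVD distance index: a resolution bit,
    then either a one-bit short-distance index or a truncated-unary
    long-distance index L = idx - 2 with maximum MAX_DI - 2."""
    if idx < 2:
        return [0, idx]                 # resolution bit 0 + short index flag
    l = idx - 2
    bins = [1] + [1] * l                # resolution bit 1 + unary prefix
    if l < MAX_DI - 2:
        bins.append(0)                  # terminating 0 unless at the maximum
    return bins

def parse_distance_idx(bins):
    """Inverse mapping: reconstruct the distance index from the bins."""
    if bins[0] == 0:
        return bins[1]
    l = 0
    while l < MAX_DI - 2 and bins[1 + l] == 1:
        l += 1
    return l + 2                        # distance index reconstructed as L + 2
```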
5.2 Embodiment #2 (MMVD side information codec)
MMVD is considered a separate mode rather than a Merge mode. Therefore, the MMVD flag can be further coded only when the Merge flag is 0.
(syntax table shown as an image in the original document)
In one embodiment, the MMVD information is signaled as:
(syntax table shown as an image in the original document)
mmvd_distance_idx_in_subset[x0][y0] is binarized into a truncated unary code. If amvr_mode[x0][y0] < 2, the maximum value of the truncated unary code is 1; otherwise (amvr_mode[x0][y0] equals 2), the maximum value is set to 3.
mmvd_distance_idx[x0][y0] is set equal to mmvd_distance_idx_in_subset[x0][y0] + 2 × amvr_mode[x0][y0].
Which probability contexts are used by mmvd_distance_idx_in_subset[x0][y0] depends on amvr_mode[x0][y0].
5.3 Embodiment #3 (MMVD slice-level control)
In the slice header, a syntax element mmvd_integer_flag is signaled.
The syntax changes are described below, with the newly added parts highlighted in italics.
7.3.2.1 Sequence parameter set RBSP syntax
(syntax table shown as an image in the original document)
7.3.3.1 General slice header syntax
(syntax table shown as an image in the original document)
7.4.3.1 Sequence parameter set RBSP semantics
sps_fracmmvd_enabled_flag equal to 1 specifies that slice_fracmmvd_flag is present in the slice header syntax of B slices and P slices. sps_fracmmvd_enabled_flag equal to 0 specifies that slice_fracmmvd_flag is not present in the slice header syntax of B slices and P slices.
7.4.4.1 General slice header semantics
slice_fracmmvd_flag specifies the distance table used to derive MmvdDistance[x0][y0]. When not present, the value of slice_fracmmvd_flag is inferred to be 1.
(syntax table shown as an image in the original document)
In one embodiment, the MMVD information is signaled as:
(syntax table shown as an image in the original document)
mmvd_distance_idx_in_subset[x0][y0] is binarized into a truncated unary code. If amvr_mode[x0][y0] < 2, the maximum value of the truncated unary code is 1; otherwise (amvr_mode[x0][y0] equals 2), the maximum value is set to 3.
mmvd_distance_idx[x0][y0] is set equal to mmvd_distance_idx_in_subset[x0][y0] + 2 × amvr_mode[x0][y0]. In one example, the probability context used by mmvd_distance_idx_in_subset[x0][y0] depends on amvr_mode[x0][y0].
The array indices x0, y0 specify the position (x0, y0) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture. The relationship between mmvd_distance_idx[x0][y0] and MmvdDistance[x0][y0] is specified as follows:
Table 7-9 - Specification of MmvdDistance[x0][y0] based on mmvd_distance_idx[x0][y0] when slice_fracmmvd_flag is equal to 1
mmvd_distance_idx[x0][y0] MmvdDistance[x0][y0]
0 1
1 2
2 4
3 8
4 16
5 32
6 64
7 128
Table 7-9 - Specification of MmvdDistance[x0][y0] based on mmvd_distance_idx[x0][y0] when slice_fracmmvd_flag is equal to 0
mmvd_distance_idx[x0][y0] MmvdDistance[x0][y0]
0 4
1 8
2 16
3 32
4 64
5 128
6 256
7 512
When mmvd_integer_flag is equal to 1, mmvd_distance = mmvd_distance << 2.
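The two distance tables of Tables 7-9 and the slice-level switch can be sketched as follows; note that the integer-pel table is exactly the fractional-pel table left-shifted by 2, matching the mmvd_distance << 2 rule above:

```python
# Distance tables from Tables 7-9 (units of 1/4 luma sample)
DISTANCE_TABLE_FRAC = [1, 2, 4, 8, 16, 32, 64, 128]      # slice_fracmmvd_flag == 1
DISTANCE_TABLE_INT  = [4, 8, 16, 32, 64, 128, 256, 512]  # slice_fracmmvd_flag == 0

def lookup_mmvd_distance(distance_idx, slice_fracmmvd_flag):
    """MmvdDistance lookup controlled by the slice-level flag."""
    table = DISTANCE_TABLE_FRAC if slice_fracmmvd_flag else DISTANCE_TABLE_INT
    return table[distance_idx]
```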
Fig. 10 is a flow diagram of an example method 1000 for video processing. Method 1000 includes performing (1002) a conversion between a current video block of a video and a bitstream representation of the video, wherein the current video block is converted as a Codec Unit (CU) in a merge (MMVD) mode with a motion vector difference, and the conversion includes determining (1004) MMVD side information for the current video block based on an indication from a level higher than the CU level.
Fig. 11 is a flow diagram of an example method 1100 for video processing. The method 1100 comprises determining (1102) a ratio of similar or identical blocks within at least one of a picture, a slice, a sequence, a group of Codec Tree Units (CTUs) or a group of blocks; and determining (1104) whether at least one of a picture, a slice, a sequence, a group of Codec Tree Units (CTUs), or a group of blocks is screen content based on the ratio.
Some examples of motion vector signaling are described in section 4 of this document with reference to methods 1000, 1100, and the aforementioned methods may include the features and steps described below.
In one aspect, a method for video processing is disclosed, comprising: performing a conversion between a current video block of a video and a bitstream representation of the video, wherein the current video block is converted as a Codec Unit (CU) in a merge (MMVD) mode with a motion vector difference, and the conversion comprises determining MMVD side information of the current video block based on an indication from a level higher than the CU level; wherein, in the MMVD mode, at least one Merge candidate is selected and further refined based on the MMVD side information.
In one example, the level includes at least one of a sequence level, a picture level, a slice group level.
In one example, the indication is signaled or inferred in at least one of a Sequence Parameter Set (SPS), a Video Parameter Set (VPS), a Picture Parameter Set (PPS), a picture header, a slice header, and a slice group header.
In one example, the MMVD side information includes one or more MMVD distances that are included as entries in one or more tables.
In one example, wherein one or more tables are determined based on the indication, and each entry is retrieved with the corresponding distance index.
In one example, the indication includes a table index indicating each of the one or more tables, and each entry is retrieved with a corresponding distance index.
In one example, the method further includes deriving a final distance for the current video block from the entries retrieved with the corresponding distance index.
In one example, the method also includes deriving a final distance for the current video block based on a functional relationship of a parameter value indicated by the indication and an entry retrieved with a corresponding distance index.
In one example, the functional relationship includes at least one of:
f(D', X) = D' << X;
f(D', X) = D' × X;
f(D', X) = D' + X; and
f(D', X) = D' >> X,
where D' represents the entry retrieved with the corresponding distance index, X represents the parameter value, and "<<" and ">>" represent left and right shifts, respectively, with or without rounding.
In one example, X is equal to 0 or 2 and is indicated from a level higher than the CU level.
In one example, the indication indicates a Motion Vector (MV) resolution for determining MMVD side information.
In one example, the definition of the one or more tables depends on MV resolution, wherein the MMVD side information includes one or more MMVD distances included as entries in the one or more tables, and each entry is retrieved with a corresponding distance index.
In one example, the number of the one or more tables depends on the MV resolution, wherein the MMVD side information includes one or more MMVD distances included as entries in the one or more tables, and each entry is retrieved with a corresponding distance index.
In one example, the signaling of the one or more MMVD distances included in the MMVD side information of the current video block depends on the MV resolution.
In one example, the MV resolution includes a first sample precision and a second sample precision, and the one or more tables are selected based on whether the MV resolution is the first sample precision or the second sample precision, wherein the MMVD side information includes one or more MMVD distances included as entries in the one or more tables, and each entry is retrieved with a corresponding distance index.
In one example, the first sampling precision is 1/4 samples and the second sampling precision is 1 sample.
In one example, the MV resolution comprises a minimum MV resolution.
In one example, a flag indicating an MV resolution less than the minimum MV resolution is not signaled for the current video block.
In one example, if the minimum MV resolution is 1 pixel, a first flag indicating that the MV resolution of the current video block is 1/4 pixels is not signaled.
In one example, the first flag represents a first resolution bit of an Adaptive Motion Vector Resolution (AMVR) and does not signal a short range index indicating an entry of less than 1 pixel.
In one example, if the minimum MV resolution is 4 pixels, a first flag indicating that the MV resolution of the current video block is 1/4 pixels is not signaled, and a second flag indicating that the MV resolution of the current video block is 1 pixel is not signaled.
In one example, the first flag and the second flag represent a first resolution bit and a second resolution bit, respectively, of an Adaptive Motion Vector Resolution (AMVR).
In one example, if the minimum MV resolution is 4 pixels, the short distance index indicating an entry of less than 1 pixel is not signaled, and the medium distance index indicating an entry of less than 4 pixels is not signaled.
In one example, if the minimum MV resolution is 1 pixel, the subset indicated with the short-range index is redefined as a very long-range subset comprising two large-sized entries.
In one example, the two entries are 64 pixels and 128 pixels, respectively.
In one aspect, a video processing method is disclosed, comprising:
determining a ratio of similar or identical blocks within at least one of a picture, a slice, a sequence, a group of Codec Tree Units (CTUs), or a group of blocks; and
determining whether at least one of a picture, a slice, a sequence, a group of CTUs, or a group of blocks is screen content based on the ratio.
In one example, if the ratio is greater than a threshold, at least one of a picture, a slice, a sequence, a group of CTUs, or a group of blocks is determined to be screen content.
In one example, if the ratio is greater than a first threshold and less than a second threshold, at least one of the picture, the slice, the sequence, the group of CTUs, or the group of blocks is determined to be screen content.
In one example, at least one of a picture, a slice, a sequence, a group of CTUs, or a group of blocks is partitioned into a plurality of non-overlapping blocks of size M×N, and all or a portion of the non-overlapping blocks are examined to determine a ratio of similar or identical blocks therein.
In one example, M = N = 4.
In one example, the portion of the non-overlapping blocks is located in even rows and even columns.
In one example, a key value is generated for a block to be examined, the method including comparing the generated key value of the block with a key value of another block to determine whether the two blocks are similar or identical.
In one example, the key value represents a Cyclic Redundancy Check (CRC) code.
In one example, a key value is generated based on at least one color component of a block.
In one example, the at least one color component is a luminance component.
In one example, a key value is generated based on a portion of pixels of a block.
In one example, the similarity of two blocks is measured based on at least one of a Sum of Absolute Differences (SAD), a Sum of Absolute Transformed Differences (SATD), a Sum of Squared Errors (SSE), and a mean-removed SAD/SATD/SSE.
In one example, the SAD, SATD, SSE, or mean-removed SAD/SATD/SSE is computed for only a portion of the pixels of the block.
In one example, the portion of pixels is in even rows of the block.
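One way to realize the ratio-based screen-content decision described above is sketched below. The CRC-based key generation, the even-row/even-column subsampling of examined blocks, and the 4×4 block size follow the examples above; the threshold value and the function signature are illustrative assumptions.

```python
import zlib


def is_screen_content(luma, width, height, threshold=0.3, m=4, n=4):
    """Sketch of the ratio-based screen-content decision.

    luma: row-major list of luma samples for the picture/region.
    A CRC key is generated per non-overlapping M x N block (only blocks
    at even block rows and columns are examined, per one example above);
    the ratio of blocks whose key occurs more than once approximates
    the ratio of identical blocks, and is compared against a threshold.
    """
    keys = []
    for by in range(0, height // n, 2):        # even block rows only
        for bx in range(0, width // m, 2):     # even block columns only
            y0, x0 = by * n, bx * m
            block = bytes(
                luma[(y0 + dy) * width + x0 + dx]
                for dy in range(n) for dx in range(m)
            )
            keys.append(zlib.crc32(block))     # CRC code as the key value
    if not keys:
        return False
    counts = {}
    for k in keys:
        counts[k] = counts.get(k, 0) + 1
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(keys) > threshold
```

A flat or text-like region yields many identical 4×4 blocks and a ratio near 1, so it is classified as screen content; camera-captured noise yields mostly distinct keys and a ratio near 0.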
In one example, the converting includes encoding the current video block into a bitstream representation of the current video block, or decoding the current video block from the bitstream representation of the current video block.
In one aspect, an apparatus in a video system is disclosed, the apparatus comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method in any of the above examples.
In one aspect, a computer program product stored on a non-transitory computer readable medium is disclosed, the computer program product comprising program code for performing the method in any of the above examples.
Fig. 12 is a block diagram of the video processing apparatus 1200. Apparatus 1200 may be used to implement one or more of the methods described herein. The apparatus 1200 may be embodied in a smartphone, tablet, computer, Internet of Things (IoT) receiver, and/or the like. The apparatus 1200 may include one or more processors 1202, one or more memories 1204, and video processing hardware 1206. The processor(s) 1202 may be configured to implement one or more of the methods described in this document. The memory(ies) 1204 may be used for storing data and code for implementing the methods and techniques described herein. The video processing hardware 1206 may be used to implement some of the techniques described in this document in hardware circuitry and may be partially or completely part of the processor 1202 (e.g., a graphics processing unit (GPU) or other signal processing circuitry).
In this document, the term "video processing" may refer to video encoding, video decoding, video compression, or video decompression. For example, a video compression algorithm may be applied during the conversion from a pixel representation to a corresponding bitstream representation of the video, and vice versa. The bitstream representation of the current video block may, for example, correspond to bits collocated or distributed at different locations within the bitstream, as defined by the syntax. For example, a macroblock may be encoded from transformed and encoded error residual values, and may also be encoded using bits in headers and other fields in the bitstream.
It should be appreciated that several techniques have been disclosed that would benefit video encoder and decoder embodiments incorporated in video processing devices such as smart phones, laptops, desktops, and similar devices by allowing the use of virtual motion candidates constructed based on the various rules disclosed in this document.
The disclosed and other solutions, examples, embodiments, modules, and functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments may be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described herein can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily require such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any subject matter or claim, but rather as descriptions of features specific to particular embodiments of particular technologies. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only a few embodiments and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims (42)

1. A method for video processing, comprising:
performing a conversion between a current video block of a video and a bitstream representation of the video, wherein the current video block is coded as a Codec Unit (CU) in a Merge with Motion Vector Difference (MMVD) mode, and the conversion comprises determining MMVD side information for the current video block based on an indication from a level above a CU level;
wherein, in the MMVD mode, at least one Merge candidate is selected and further refined based on the MMVD side information.
2. The method of claim 1, wherein the level comprises at least one of a sequence level, a picture level, or a slice group level.
3. The method of claim 2, wherein the indication is signaled or inferred in at least one of a Sequence Parameter Set (SPS), a Video Parameter Set (VPS), a Picture Parameter Set (PPS), a picture header, a slice header, and a slice group header.
4. The method of any of claims 1-3, wherein the MMVD side information includes one or more MMVD distances that are included as entries in one or more tables.
5. The method of claim 4, wherein the one or more tables are determined based on the indication and each entry is retrieved with a corresponding distance index.
6. The method of claim 4 or 5, wherein the indication comprises a table index indicating each of the one or more tables, and each entry is retrieved with a corresponding distance index.
7. The method of claim 5 or 6, further comprising:
deriving a final distance for the current video block from the entries retrieved with the corresponding distance index.
8. The method of claim 1, further comprising:
deriving a final distance for the current video block based on a functional relationship of a parameter value indicated by the indication and an entry retrieved with the corresponding distance index.
9. The method of claim 8, wherein the functional relationship comprises at least one of:
f(D’,X)=D’<<X;
f(D’,X)=D’*X;
f(D’,X)=D’+X; and
f(D’,X)=D’>>X,
where D' represents the entry retrieved with the corresponding distance index, X represents the parameter value, and "< <" and "> >" represent a left shift and a right shift, respectively, with or without rounding.
10. The method of claim 9, wherein X is equal to 0 or 2 and is indicated from a level above the CU level.
11. The method of any one of claims 1-10, wherein the indication indicates a Motion Vector (MV) resolution for determining the MMVD side information.
12. The method of claim 11, wherein the definition of one or more tables depends on the MV resolution, wherein the MMVD side information includes one or more MMVD distances included as entries in one or more tables, and each entry is retrieved with the corresponding distance index.
13. The method of claim 11 or 12, wherein the number of the one or more tables depends on the MV resolution, wherein the MMVD side information comprises one or more MMVD distances included as entries in one or more tables, and each entry is retrieved with a corresponding distance index.
14. The method of any of claims 11-13, wherein the signaling of the one or more MMVD distances included in the MMVD side information of the current video block depends on the MV resolution.
15. The method of any of claims 11-14, wherein the MV resolution comprises a first sample precision and a second sample precision, and one or more tables are selected based on whether the MV resolution is the first sample precision or the second sample precision, wherein the MMVD side information comprises one or more MMVD distances included as entries in the one or more tables, and each entry is retrieved with a corresponding distance index.
16. The method of claim 15, wherein the first sample precision is 1/4 sample and the second sample precision is 1 sample.
17. The method of any of claims 11-14, wherein the MV resolution comprises a minimum MV resolution.
18. The method of claim 17, wherein, for the current video block, a flag indicating an MV resolution less than the minimum MV resolution is not signaled.
19. The method of claim 18, wherein if the minimum MV resolution is 1 pixel, a first flag indicating that the MV resolution of the current video block is 1/4 pixels is not signaled.
20. The method of claim 19, wherein the first flag represents a first resolution bit of Adaptive Motion Vector Resolution (AMVR), and a short-distance index indicating an entry of less than 1 pixel is not signaled.
21. The method of claim 18, wherein if the minimum MV resolution is 4 pixels, then a first flag indicating that the MV resolution of the current video block is 1/4 pixels is not signaled, and a second flag indicating that the MV resolution of the current video block is 1 pixel is not signaled.
22. The method of claim 21, wherein the first and second flags represent first and second resolution bits of an Adaptive Motion Vector Resolution (AMVR), respectively.
23. The method of claim 22, wherein if the minimum MV resolution is 4 pixels, a short-distance index indicating entries of less than 1 pixel is not signaled, and a medium-distance index indicating entries of less than 4 pixels is not signaled.
24. The method of claim 18, wherein if the minimum MV resolution is 1 pixel, the subset indicated with the short-range index is redefined as a very-long-range subset comprising two large-size entries.
25. The method of claim 24, wherein the two entries are 64 pixels and 128 pixels, respectively.
26. A method for video processing, comprising:
determining a ratio of similar or identical blocks within at least one of a picture, a slice, a sequence, a group of Codec Tree Units (CTUs), or a group of blocks; and
determining whether at least one of a picture, a slice, a sequence, a group of CTUs, or a group of blocks is screen content based on the ratio.
27. The method of claim 26, wherein if the ratio is greater than a threshold, at least one of a picture, a slice, a sequence, a group of CTUs, or a group of blocks is determined to be screen content.
28. The method of claim 26, wherein at least one of a picture, a slice, a sequence, a group of CTUs, or a group of blocks is determined to be screen content if the ratio is greater than a first threshold and less than a second threshold.
29. The method of any of claims 26-28, wherein at least one of a picture, a slice, a sequence, a group of CTUs, or a group of blocks is partitioned into a plurality of non-overlapping blocks of size M×N, and all or a portion of the non-overlapping blocks are examined to determine a ratio of similar or identical blocks therein.
30. The method of claim 29, wherein M = N = 4.
31. The method of claim 29 or 30, wherein a portion of the non-overlapping block is located in even rows and even columns.
32. The method of any one of claims 29-31, wherein a key value is generated for a block to be examined, the method comprising comparing the generated key value of the block with a key value of another block to determine whether the two blocks are similar or identical.
33. The method of claim 32, wherein the key value represents a Cyclic Redundancy Check (CRC) code.
34. The method of claim 32 or 33, wherein the key value is generated based on at least one color component of a block.
35. The method of claim 34, wherein the at least one color component is a luminance component.
36. The method of any of claims 32-35, wherein the key value is generated based on a portion of pixels of a block.
37. The method of any one of claims 26-36, wherein the similarity of two blocks is measured based on at least one of a Sum of Absolute Differences (SAD), a Sum of Absolute Transformed Differences (SATD), a Sum of Squared Errors (SSE), and a mean-removed SAD/SATD/SSE.
38. The method of claim 37, wherein the SAD, SATD, SSE, or mean-removed SAD/SATD/SSE is calculated for only a portion of the pixels of the block.
39. The method of claim 36 or 38, wherein the portion of pixels is in even rows of the block.
40. The method of any one of claims 1-25, wherein the converting comprises encoding the current video block into a bitstream representation of the current video block, or decoding the current video block from the bitstream representation of the current video block.
41. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of claims 1 to 40.
42. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing the method of any of claims 1 to 40.
CN202080008242.9A 2019-01-07 2020-01-07 Control method for Merge with MVD Active CN113302936B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
CN2019070636 2019-01-07
CNPCT/CN2019/070636 2019-01-07
CNPCT/CN2019/071159 2019-01-10
CN2019071159 2019-01-10
PCT/CN2020/070772 WO2020143643A1 (en) 2019-01-07 2020-01-07 Control method for merge with mvd

Publications (2)

Publication Number Publication Date
CN113302936A true CN113302936A (en) 2021-08-24
CN113302936B CN113302936B (en) 2024-03-19

Family

ID=71521713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080008242.9A Active CN113302936B (en) 2019-01-07 2020-01-07 Control method for Merge with MVD

Country Status (2)

Country Link
CN (1) CN113302936B (en)
WO (1) WO2020143643A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023131047A1 (en) * 2022-01-05 2023-07-13 Beijing Bytedance Network Technology Co., Ltd. Method, apparatus, and medium for video processing

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117501688A (en) * 2021-06-15 2024-02-02 抖音视界有限公司 Method, apparatus and medium for video processing

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120328209A1 (en) * 2011-06-27 2012-12-27 Hisao Sasai Image decoding method, image coding method, image decoding apparatus, image coding apparatus, and image coding and decoding apparatus
CN103583048A (en) * 2011-06-30 2014-02-12 松下电器产业株式会社 Image decoding method, image encoding method, image decoding device, image encoding device, and image encoding/decoding device
WO2015106121A1 (en) * 2014-01-10 2015-07-16 Qualcomm Incorporated Block vector coding for intra block copy in video coding
CN106031175A (en) * 2013-12-20 2016-10-12 三星电子株式会社 Interlayer video encoding method using brightness compensation and device thereof, and video decoding method and device thereof
CN106358029A (en) * 2016-10-18 2017-01-25 北京字节跳动科技有限公司 Video image processing method and device
CN107113424A (en) * 2014-11-18 2017-08-29 联发科技股份有限公司 Bidirectional predictive video coding method based on the motion vector from single directional prediction and merging candidate
US20180077417A1 (en) * 2016-09-14 2018-03-15 Mediatek Inc. Method and Apparatus of Encoding Decision for Encoder Block Partition
US20180091816A1 (en) * 2016-09-29 2018-03-29 Qualcomm Incorporated Motion vector coding for video coding
CN108353184A (en) * 2015-11-05 2018-07-31 联发科技股份有限公司 The method and apparatus of the inter-prediction using average motion vector for coding and decoding video
CN108432250A (en) * 2016-01-07 2018-08-21 联发科技股份有限公司 The method and device of affine inter-prediction for coding and decoding video
US20180352223A1 (en) * 2017-05-31 2018-12-06 Mediatek Inc. Split Based Motion Vector Operation Reduction
CN109792532A (en) * 2016-10-04 2019-05-21 高通股份有限公司 Adaptive motion vector precision for video coding

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107371025B (en) * 2011-06-24 2020-04-14 威勒斯媒体国际有限公司 Decoding method and decoding device
WO2017039117A1 (en) * 2015-08-30 2017-03-09 LG Electronics Inc. Method for encoding/decoding image and device therefor
EP3355578B1 (en) * 2015-09-24 2020-12-09 LG Electronics Inc. Motion vector predictor derivation and candidate list construction
US11399187B2 (en) * 2017-03-10 2022-07-26 Intel Corporation Screen content detection for adaptive encoding
CN109120928B (en) * 2018-04-18 2022-02-01 北方工业大学 Improved intra block copying method and device based on character segmentation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHI-YI LIN et al.: "CE4.2.3: Affine merge mode", JVET *


Also Published As

Publication number Publication date
CN113302936B (en) 2024-03-19
WO2020143643A1 (en) 2020-07-16

Similar Documents

Publication Publication Date Title
CN113039790B (en) Method, apparatus and non-transitory computer readable medium for video processing
CN113196747B (en) Information signaling in current picture reference mode
CN110944191A (en) Signaling of motion vector accuracy indication with adaptive motion vector resolution
CN111418210A (en) Ordered motion candidate list generation using geometric partitioning patterns
CN113273207A (en) Merge with Motion Vector Difference (MVD) based on geometric partitioning
CN113016183A (en) Construction method for spatial domain motion candidate list
CN113545069A (en) Motion vector management for decoder-side motion vector refinement
CN113906738A (en) Adaptive motion vector difference resolution for affine mode
CN115280774A (en) Differencing merge with motion vector in affine mode
CN113302936B (en) Control method for Merge with MVD
CN113424534A (en) Multiple syntax elements for adaptive motion vector resolution
CN113826394A (en) Improvement of adaptive motion vector difference resolution in intra block copy mode
CN113348667B (en) Resolution method of distance index under Merge with MVD
CN113273187B (en) Affine-based Merge with Motion Vector Difference (MVD)
CN113273208A (en) Improvement of affine prediction mode
CN113557720A (en) Adaptive weights in multi-hypothesis prediction in video coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant