WO2019072187A1 - Motion model candidate list pruning for inter-prediction - Google Patents

Motion model candidate list pruning for inter-prediction

Info

Publication number: WO2019072187A1
Authority: WO (WIPO/PCT)
Prior art keywords: motion, candidate list, model, parameters, motion model
Application number: PCT/CN2018/109618
Other languages: English (en)
Inventors: Huanbang Chen, Haitao Yang, Shan Gao, Yin Zhao, Jiantong Zhou, Shan Liu
Original Assignee: Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2019072187A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51: Motion estimation or motion compensation
    • H04N 19/513: Processing of motion vectors
    • H04N 19/517: Processing of motion vectors by encoding
    • H04N 19/52: Processing of motion vectors by encoding by predictive encoding
    • H04N 19/527: Global motion vector estimation
    • H04N 19/537: Motion estimation other than block-based
    • H04N 19/54: Motion estimation other than block-based using feature points or meshes
    • H04N 19/567: Motion estimation based on rate distortion criteria

Definitions

  • the present disclosure is generally related to video coding, and is specifically related to generation of motion model candidate lists for coding video blocks via inter-prediction in video coding.
  • video data is generally compressed before being communicated across modern day telecommunications networks.
  • the size of a video could also be an issue when the video is stored on a storage device because memory resources may be limited.
  • Video compression devices often use software and/or hardware at the source to code the video data prior to transmission or storage, thereby decreasing the quantity of data needed to represent digital video images.
  • the compressed data is then received at the destination by a video decompression device that decodes the video data.
  • the disclosure includes a method implemented in a decoder.
  • the method comprises receiving, by a receiver of the decoder, a bitstream with a plurality of coded blocks including a current block and neighbor coded blocks.
  • the method also includes determining, by a processor of the decoder, motion vectors for control points of the current block from the neighbor coded blocks.
  • the method also includes generating, by the processor, a plurality of motion models based on the motion vectors for the control points.
  • the method generates, by the processor, a candidate list. Generating a candidate list includes inserting a current motion model into the candidate list when a difference between parameters for the current motion model and corresponding parameters of each motion model in the candidate list is greater than a threshold.
  • Generating the candidate list also includes rejecting the current motion model from the candidate list when a difference between parameters for the current motion model and corresponding parameters of at least one motion model in the candidate list is less than or equal to the threshold.
  • the method also includes parsing the bitstream, by the processor, to obtain a candidate index indicating a selected motion model from the candidate list.
  • the method also includes reconstructing, by the processor, the current block by performing motion compensation of the current block based on the selected motion model.
  • the method also includes forwarding, by the processor, a video frame including the current block for display as part of a video sequence.
  • Removing redundant motion models decreases the number of bits/bins employed to uniquely indicate a selected motion model. Hence, removing redundant motion models decreases the size of the coded video file, and hence increases compression. Further, removing redundant motion models reduces the number of motion models considered at the encoder during video coding. Hence, removing redundant motion models also decreases encoder complexity and reduces the time needed to encode files at the encoder.
  • another implementation of the aspect includes, wherein the parameters are transform parameters, and wherein the difference between the parameters for the current motion model and corresponding parameters of motion models in the candidate list is compared to a threshold according to:
    |ai − bi| ≤ Thi for each parameter index i, where:
  • ai are the transform parameters for the current motion model
  • bi are the transform parameters of a motion model in the candidate list
  • i is a parameter index
  • Thi is the threshold for the corresponding parameter index. Comparing transform parameters allows the motion models to be inserted into or rejected from the candidate list without expending processing time generating the motion vectors of redundant models.
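To make the parameter comparison concrete, the following is a minimal C++ sketch of the redundancy test. The MotionModel container and its params, refIdx, and refList fields are assumptions introduced here for illustration, not names from the disclosure.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for a motion model candidate: the transform
// parameters a1..aN plus bookkeeping used by later pre-checks.
struct MotionModel {
    std::vector<double> params;  // transform parameters a1..aN
    int refIdx = 0;              // reference index (assumed field)
    int refList = 0;             // reference picture list (assumed field)
};

// Returns true when |a_i - b_i| <= Th_i for every parameter index i,
// i.e. the current model is considered redundant with the candidate.
// 'th' must hold at least as many thresholds as there are parameters.
bool isRedundant(const MotionModel& cur, const MotionModel& cand,
                 const std::vector<double>& th) {
    if (cur.params.size() != cand.params.size()) return false;
    for (std::size_t i = 0; i < cur.params.size(); ++i) {
        if (std::abs(cur.params[i] - cand.params[i]) > th[i]) return false;
    }
    return true;
}
```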
  • another implementation of the aspect includes, wherein the plurality of motion models includes a translation model according to:
    x′ = x + a1 and y′ = y + a2, where:
  • (x, y) is a coordinate of a pixel in the current block and (vx, vy) = (x′ − x, y′ − y) is the motion vector for the pixel
  • (x′, y′) is a coordinate of a corresponding pixel in a reference block
  • a1 and a2 are transform parameters of the translation motion model determined based on a motion vector for a control point of the current block.
  • another implementation of the aspect includes, wherein the plurality of motion models includes a four-parameter affine model according to:
    x′ = (1 + a3)·x + a4·y + a1 and y′ = −a4·x + (1 + a3)·y + a2, where:
  • (x, y) is a coordinate of a pixel in the current block and (vx, vy) = (x′ − x, y′ − y) is the motion vector for the pixel
  • (x′, y′) is a coordinate of a corresponding pixel in a reference block
  • a1, a2, a3, and a4 are transform parameters of the four-parameter affine model determined based on motion vectors for two control points of the current block, and wherein the four-parameter affine model degrades to a translation model when a3 is zero and a4 is zero.
  • another implementation of the aspect includes, wherein the plurality of motion models includes a six-parameter affine model according to:
    x′ = (1 + a3)·x + a4·y + a1 and y′ = a5·x + (1 + a6)·y + a2, where:
  • (x, y) is a coordinate of a pixel in the current block and (vx, vy) = (x′ − x, y′ − y) is the motion vector for the pixel
  • (x′, y′) is a coordinate of a corresponding pixel in a reference block
  • a1, a2, a3, a4, a5, and a6 are transform parameters of the six-parameter affine model determined based on motion vectors for three control points of the current block, and wherein the six-parameter affine model degrades to a four-parameter affine model when a3 minus a6 is zero and a4 plus a5 is zero.
  • another implementation of the aspect includes, wherein the plurality of motion models includes an eight-parameter bilinear model according to:
    x′ = (1 + a3)·x + a4·y + a7·x·y + a1 and y′ = a5·x + (1 + a6)·y + a8·x·y + a2, where:
  • (x, y) is a coordinate of a pixel in the current block and (vx, vy) = (x′ − x, y′ − y) is the motion vector for the pixel
  • (x′, y′) is a coordinate of a corresponding pixel in a reference block
  • a1, a2, a3, a4, a5, a6, a7, and a8 are transform parameters of the eight-parameter bilinear model determined based on motion vectors for four control points of the current block, and wherein the eight-parameter bilinear model degrades to a six-parameter affine model when a7 and a8 are zero.
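Since each lower order model above is the higher order model with its trailing parameters set to zero, the degradation checks can be written directly against the reconstructed parameterization. The sketch below is illustrative only; it assumes a four-parameter model is stored with a5 = −a4 and a6 = a3, and uses a small epsilon in place of exact-zero tests on integerized parameters.

```cpp
#include <cmath>

// Model parameters a1..a8 in one array (index 0 unused); lower order
// models leave the trailing entries at zero.
struct ModelParams { double a[9] = {}; };

// Maps pixel (x, y) in the current block to (xp, yp) in the reference
// block using the eight-parameter bilinear form; with a7 = a8 = 0 this
// is the six-parameter affine model, and so on down to translation.
void mapPixel(const ModelParams& m, double x, double y,
              double& xp, double& yp) {
    xp = (1.0 + m.a[3]) * x + m.a[4] * y + m.a[7] * x * y + m.a[1];
    yp = m.a[5] * x + (1.0 + m.a[6]) * y + m.a[8] * x * y + m.a[2];
}

// Degradation tests matching the conditions stated above.
bool degradesToSixParam(const ModelParams& m, double eps = 1e-9) {
    return std::abs(m.a[7]) < eps && std::abs(m.a[8]) < eps;   // a7 = a8 = 0
}
bool degradesToFourParam(const ModelParams& m, double eps = 1e-9) {
    return std::abs(m.a[3] - m.a[6]) < eps                     // a3 - a6 = 0
        && std::abs(m.a[4] + m.a[5]) < eps;                    // a4 + a5 = 0
}
bool degradesToTranslation(const ModelParams& m, double eps = 1e-9) {
    return std::abs(m.a[3]) < eps && std::abs(m.a[4]) < eps;   // a3 = a4 = 0
}
```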
  • another implementation of the aspect includes, wherein a higher order motion model is rejected from the candidate list when the higher order motion model degrades to a lower order motion model with the same number of transform parameters as a lower order motion model in the candidate list, and the difference between the transform parameters of the degraded motion model and the corresponding transform parameters of the lower order motion model in the candidate list is less than or equal to the threshold.
  • another implementation of the aspect includes, wherein the parameters are the motion vectors for the control points of the current block, and wherein the difference between the parameters for the current motion model and corresponding parameters of motion models in the candidate list is compared to a threshold according to:
    |vxai − vxbi| ≤ Thxi and |vyai − vybi| ≤ Thyi for each parameter index i, where:
  • (vxai, vyai) are the control point motion vector parameters for the current motion model
  • (vxbi, vybi) are the control point motion vector parameters for a motion model in the candidate list
  • i is a parameter index
  • Thxi and Thyi are the threshold values for the corresponding parameter index.
  • Comparisons for generating the candidate list can also be made by determining motion vector fields for the models and comparing the motion vector fields. Comparison of such motion vector fields can be used as an accurate basis for determining redundancy.
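A sketch of the control point motion vector comparison follows; the Mv type and its quarter-pel units are assumptions of this illustration, not definitions from the disclosure.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

struct Mv { int x = 0; int y = 0; };  // e.g. quarter-pel units (assumed)

// Returns true when |vxa_i - vxb_i| <= Thx_i and |vya_i - vyb_i| <= Thy_i
// for every control point index i, i.e. the two models are similar.
bool controlPointMvsSimilar(const std::vector<Mv>& a,
                            const std::vector<Mv>& b,
                            const std::vector<Mv>& th) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::abs(a[i].x - b[i].x) > th[i].x ||
            std::abs(a[i].y - b[i].y) > th[i].y) return false;
    }
    return true;
}
```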
  • another implementation of the aspect includes, wherein generating the candidate list includes only comparing motion models with motion models in the candidate list that include a common number of parameters. Avoiding degrading motion models can reduce complexity, and may be beneficial for some files.
  • another implementation of the aspect includes, wherein generating the candidate list includes only comparing motion models with motion models in the candidate list that include a common reference index to a common reference block. Motion models that reference different reference blocks can be assumed to create different results. Hence, such models may not be compared in some aspects in order to reduce processing time.
  • another implementation of the aspect includes, wherein generating the candidate list includes only comparing motion models with motion models in the candidate list that reference a common picture list for the current block.
  • Motion models that reference different picture lists also reference different reference blocks. Such models can be assumed to create different results. Accordingly, such models may not be compared in some aspects in order to reduce processing time.
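The parameter-count, reference-index, and picture-list pre-checks described above can be folded into one gate that runs before any threshold comparison, as in this sketch (reusing the assumed MotionModel fields from the earlier sketch).

```cpp
// Only models with the same parameter count, the same reference index,
// and the same reference picture list are worth comparing; everything
// else can be assumed to produce different results and is skipped.
bool comparable(const MotionModel& cur, const MotionModel& cand) {
    return cur.params.size() == cand.params.size()  // same model order
        && cur.refIdx == cand.refIdx                // same reference block
        && cur.refList == cand.refList;             // same picture list
}
```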
  • the disclosure includes a method implemented in an encoder.
  • the method includes determining, by a processor of the encoder, motion vectors for control points of a current block from neighbor coded blocks in a video frame.
  • the method also includes generating, by the processor, a plurality of motion models based on the motion vectors for the control points.
  • the method also includes generating, by the processor, a candidate list. Generating the candidate list includes inserting a current motion model into the candidate list when a difference between parameters for the current motion model and corresponding parameters of each motion model in the candidate list is greater than a threshold.
  • Generating the candidate list also includes rejecting the current motion model from the candidate list when a difference between parameters for the current motion model and corresponding parameters of at least one motion model in the candidate list is less than or equal to the threshold.
  • the method also includes performing rate distortion optimization to select a selected motion model from the candidate list to encode the current block.
  • the method also includes encoding a candidate index in a bitstream, the candidate index indicating the selected motion model from the candidate list.
  • the method also includes transmitting, by a transmitter of the encoder, the bitstream toward a decoder for reconstruction as a video sequence.
  • Removing redundant motion models decreases the number of bits/bins employed to uniquely indicate a selected motion model. Hence, removing redundant motion models decreases the size of the coded video file, and hence increases compression. Further, removing redundant motion models reduces the number of motion models considered at the encoder during video coding. Hence, removing redundant motion models also decreases encoder complexity and reduces the time needed to encode files at the encoder.
  • another implementation of the aspect includes, employing the preceding candidate list generation aspects in conjunction with the method implemented in the encoder.
  • the disclosure includes a non-transitory computer readable medium comprising a computer program product for use by a video coding device, the computer program product comprising computer executable instructions stored on the non-transitory computer readable medium such that when executed by a processor cause the video coding device to perform the method of any of the preceding aspects.
  • the disclosure includes an apparatus comprising a receiver configured to receive a bitstream with a plurality of coded blocks including a current block and neighbor coded blocks.
  • the apparatus also includes a processor coupled to the receiver.
  • the processor is configured to determine motion vectors for control points of the current block from the neighbor coded blocks.
  • the processor is also configured to generate a plurality of motion models based on the motion vectors for the control points.
  • the processor generates the candidate list by inserting a current motion model into the candidate list when a difference between parameters for the current motion model and corresponding parameters of each motion model in the candidate list is greater than a threshold.
  • the processor also generates the candidate list by rejecting the current motion model from the candidate list when a difference between parameters for the current motion model and corresponding parameters of at least one motion model in the candidate list is less than or equal to the threshold.
  • the processor is also configured to parse the bitstream to obtain a candidate index indicating a selected motion model from the candidate list.
  • the processor is also configured to reconstruct the current block by performing motion compensation of the current block based on the selected motion model.
  • the processor is also configured to forward a video frame including the current block for display on a display as part of a video sequence.
  • removing redundant motion models decreases the size of the coded video file, and hence increases compression. Further, removing redundant motion models reduces the number of motion models considered at the encoder during video coding. Hence, removing redundant motion models also decreases encoder complexity and reduces the time needed to encode files at the encoder.
  • the disclosure includes an apparatus comprising a processor.
  • the processor is configured to determine motion vectors for control points of a current block from neighbor coded blocks in a video frame.
  • the processor is also configured to generate a plurality of motion models based on the motion vectors for the control points.
  • the processor generates the candidate list by inserting a current motion model into the candidate list when a difference between parameters for the current motion model and corresponding parameters of each motion model in the candidate list is greater than a threshold.
  • the processor generates the candidate list by rejecting the current motion model from the candidate list when a difference between parameters for the current motion model and corresponding parameters of at least one motion model in the candidate list is less than or equal to the threshold.
  • the processor is also configured to perform rate distortion optimization to select a selected motion model from the candidate list to encode the current block.
  • the processor is also configured to encode a candidate index in a bitstream, the candidate index indicating the selected motion model from the candidate list.
  • the apparatus also includes a transmitter coupled to the processor. The transmitter is configured to transmit the bitstream toward a decoder for reconstruction as a video sequence.
  • another implementation of the aspect includes, any of the preceding apparatuses, wherein the parameters are transform parameters, and wherein the difference between the parameters for the current motion model and corresponding parameters of motion models in the candidate list is compared to a threshold according to:
    |ai − bi| ≤ Thi for each parameter index i, where:
  • ai are the transform parameters for the current motion model
  • bi are the transform parameters of a motion model in the candidate list
  • i is a parameter index
  • Thi is the threshold for the corresponding parameter index. Comparing transform parameters allows the motion models to be inserted into or rejected from the candidate list without expending processing time generating the motion vectors of redundant models.
  • another implementation of the aspect includes, any of the preceding apparatuses, wherein the plurality of motion models includes a translation model according to:
    x′ = x + a1 and y′ = y + a2, where:
  • (x, y) is a coordinate of a pixel in the current block and (vx, vy) = (x′ − x, y′ − y) is the motion vector for the pixel
  • (x′, y′) is a coordinate of a corresponding pixel in a reference block
  • a1 and a2 are transform parameters of the translation motion model determined based on a motion vector for a control point of the current block.
  • another implementation of the aspect includes, any of the preceding apparatuses, wherein the plurality of motion models includes a four-parameter affine model according to:
    x′ = (1 + a3)·x + a4·y + a1 and y′ = −a4·x + (1 + a3)·y + a2, where:
  • (x, y) is a coordinate of a pixel in the current block and (vx, vy) = (x′ − x, y′ − y) is the motion vector for the pixel
  • (x′, y′) is a coordinate of a corresponding pixel in a reference block
  • a1, a2, a3, and a4 are transform parameters of the four-parameter affine model determined based on motion vectors for two control points of the current block, and wherein the four-parameter affine model degrades to a translation model when a3 is zero and a4 is zero.
  • another implementation of the aspect includes, any of the preceding apparatuses, wherein the plurality of motion models includes a six-parameter affine model according to:
    x′ = (1 + a3)·x + a4·y + a1 and y′ = a5·x + (1 + a6)·y + a2, where:
  • (x, y) is a coordinate of a pixel in the current block and (vx, vy) = (x′ − x, y′ − y) is the motion vector for the pixel
  • (x′, y′) is a coordinate of a corresponding pixel in a reference block
  • a1, a2, a3, a4, a5, and a6 are transform parameters of the six-parameter affine model determined based on motion vectors for three control points of the current block, and wherein the six-parameter affine model degrades to a four-parameter affine model when a3 minus a6 is zero and a4 plus a5 is zero.
  • another implementation of the aspect includes, any of the preceding apparatuses, wherein the plurality of motion models includes an eight-parameter bilinear model according to:
    x′ = (1 + a3)·x + a4·y + a7·x·y + a1 and y′ = a5·x + (1 + a6)·y + a8·x·y + a2, where:
  • (x, y) is a coordinate of a pixel in the current block and (vx, vy) = (x′ − x, y′ − y) is the motion vector for the pixel
  • (x′, y′) is a coordinate of a corresponding pixel in a reference block
  • a1, a2, a3, a4, a5, a6, a7, and a8 are transform parameters of the eight-parameter bilinear model determined based on motion vectors for four control points of the current block, and wherein the eight-parameter bilinear model degrades to a six-parameter affine model when a7 and a8 are zero.
  • another implementation of the aspect includes, any of the preceding apparatuses, wherein a higher order motion model is rejected from the candidate list when the higher order motion model degrades to a lower order motion model with the same number of transform parameters as a lower order motion model in the candidate list, and the difference between the transform parameters of the degraded motion model and the corresponding transform parameters of the lower order motion model in the candidate list is less than or equal to the threshold.
  • another implementation of the aspect includes, any of the preceding apparatuses, wherein the parameters are the motion vectors for the control points of the current block, and wherein the difference between the parameters for the current motion model and corresponding parameters of motion models in the candidate list is compared to a threshold according to:
    |vxai − vxbi| ≤ Thxi and |vyai − vybi| ≤ Thyi for each parameter index i, where:
  • (vxai, vyai) are the control point motion vector parameters for the current motion model
  • (vxbi, vybi) are the control point motion vector parameters for a motion model in the candidate list
  • i is a parameter index
  • Thxi and Thyi are the threshold values for the corresponding parameter index.
  • Comparisons for generating the candidate list can also be made by determining motion vector fields for the models and comparing the motion vector fields. Comparison of such motion vector fields can be used as an accurate basis for determining redundancy.
  • another implementation of the aspect includes, any of the preceding apparatuses, wherein generating the candidate list includes only comparing motion models with motion models in the candidate list that include a common number of parameters. Avoiding degrading motion models can reduce complexity, and may be beneficial for some files.
  • another implementation of the aspect includes, any of the preceding apparatuses, wherein generating the candidate list includes only comparing motion models with motion models in the candidate list that include a common reference index to a common reference block. Motion models that reference different reference blocks can be assumed to create different results. Hence, such models may not be compared in some aspects in order to reduce processing time.
  • another implementation of the aspect includes, any of the preceding apparatuses, wherein generating the candidate list includes only comparing motion models with motion models in the candidate list that reference a common picture list for the current block.
  • Motion models that reference different picture lists also reference different reference blocks. Such models can be assumed to create different results. Accordingly, such models may not be compared in some aspects in order to reduce processing time.
  • the disclosure includes an apparatus.
  • the apparatus comprises a receiving means for receiving a bitstream with a plurality of coded blocks including a current block and neighbor coded blocks.
  • the apparatus also comprises a processing means for determining motion vectors for control points of the current block from the neighbor coded blocks.
  • the processing means is also for generating a plurality of motion models based on the motion vectors for the control points.
  • the processing means is also for generating a candidate list by inserting a current motion model into the candidate list when a difference between parameters for the current motion model and corresponding parameters of each motion model in the candidate list is greater than a threshold.
  • the processing means is also for generating a candidate list by rejecting the current motion model from the candidate list when a difference between parameters for the current motion model and corresponding parameters of at least one motion model in the candidate list is less than or equal to the threshold.
  • the processing means is also for parsing the bitstream to obtain a candidate index indicating a selected motion model from the candidate list.
  • the processing means is also for reconstructing the current block by performing motion compensation of the current block based on the selected motion model.
  • the processing means is also for forwarding a video frame including the current block for display on a display as part of a video sequence.
  • Removing redundant motion models decreases the number of bits/bins employed to uniquely indicate a selected motion model. Hence, removing redundant motion models decreases the size of the coded video file, and hence increases compression. Further, removing redundant motion models reduces the number of motion models considered at the encoder during video coding. Hence, removing redundant motion models also decreases encoder complexity and reduces the time needed to encode files at the encoder.
  • the disclosure includes an apparatus.
  • the apparatus comprises a processing means for determining motion vectors for control points of a current block from neighbor coded blocks in a video frame.
  • the processing means is also for generating a plurality of motion models based on the motion vectors for the control points.
  • the processing means is also for generating a candidate list by inserting a current motion model into the candidate list when a difference between parameters for the current motion model and corresponding parameters of each motion model in the candidate list is greater than a threshold.
  • the processing means is also for generating a candidate list by rejecting the current motion model from the candidate list when a difference between parameters for the current motion model and corresponding parameters of at least one motion model in the candidate list is less than or equal to the threshold.
  • the processing means is also for performing rate distortion optimization to select a selected motion model from the candidate list to encode the current block.
  • the processing means is also for encoding a candidate index in a bitstream, the candidate index indicating the selected motion model from the candidate list.
  • the apparatus also comprises a transmitting means for transmitting the bitstream toward a decoder for reconstruction as a video sequence.
  • another implementation of the aspect includes, any of the preceding apparatuses, wherein the processing means is further configured to perform the method of any of the preceding aspects.
  • any one of the foregoing embodiments may be combined with any one or more of the other foregoing embodiments to create a new embodiment within the scope of the present disclosure.
  • FIG. 1 is a flowchart of an example method of coding a video signal.
  • FIG. 2 is a schematic diagram of an example coding and decoding (codec) system for video coding.
  • FIG. 3 is a schematic diagram illustrating an example video encoder that may generate a motion model candidate list for inter-prediction.
  • FIG. 4 is a schematic diagram illustrating an example video decoder that may generate a motion model candidate list for inter-prediction.
  • FIG. 5 is a schematic diagram illustrating an example of unidirectional inter-prediction.
  • FIG. 6 is a schematic diagram illustrating an example of bidirectional inter-prediction.
  • FIG. 7 is a schematic diagram illustrating an example of an affine motion model for affine inter-prediction.
  • FIG. 8 is a schematic diagram illustrating an example of control points employed in complex merge mode.
  • FIG. 9 is a flowchart of an example method of pruning a motion model candidate list used for complex merge mode based inter-prediction.
  • FIG. 10 is a schematic diagram of an example video coding device.
  • FIG. 11 is an embodiment of a device for pruning a motion model candidate list used for complex merge mode based inter-prediction.
  • Video coding involves a combination of compression by inter-prediction and intra-prediction.
  • the present disclosure focuses on increasing the coding efficiency of inter-prediction, which is a mechanism to encode the position of an object in a frame based on the position of the object in a different frame.
  • a motion vector can indicate a direction of movement of an object over time as depicted over multiple frames of a sequence of video.
  • an object in a reference frame and a motion vector can be encoded and then employed by a decoder to partially reconstruct one or more frames that are temporally adjacent to the reference frame.
  • Inter-prediction can employ unidirectional inter-prediction and/or bidirectional inter-prediction.
  • Unidirectional inter-prediction uses a single motion vector to a single reference frame to predict the location of an object in a current frame.
  • Bidirectional inter-prediction uses a preceding motion vector pointing towards a preceding reference frame and a subsequent motion vector pointing towards a subsequent reference frame.
  • Affine inter-prediction is a type of inter-prediction that is applied when an object visually changes shape between frames. For example, camera zooming in and/or out, rotations, perspective motion, and/or other irregular motion may cause an object to appear to change shape between frames.
  • Affine inter-prediction distorts a reference frame so that the motion vectors point in the correct directions for the various sub-portions of the object.
  • An affine transformation may preserve points, straight lines, planes, and/or parallel relationships between lines, while distorting angles between lines and distances between points.
  • Affine inter-prediction may involve employing motion vectors for a current block to generate a motion vector field, partitioning a current block into a plurality of sub-blocks based on motion vectors in the motion vector field, and then determining a motion vector for each sub-block based on the motion vector field.
  • Complex merge mode is an inter-prediction mechanism that employs affine based motion models to perform bidirectional and unidirectional inter-prediction.
  • an encoder can generate motion models by employing one, two, three, or four motion vectors (e.g., bidirectional and/or unidirectional vectors) for a current block. The encoder can then employ rate distortion optimization to select the motion model that encodes a current block with the best balance of compression and image quality loss.
  • a motion model candidate list can be created, which is hereinafter referred to as a candidate list.
  • the encoder obtains motion vectors from previously coded neighbor blocks.
  • the candidate list includes a list of motion models that can be generated from such motion vectors obtained from the neighbor blocks.
  • the encoder can then select a motion model and signal a candidate index for the selected motion model in a bitstream.
  • the decoder can then obtain the candidate index from the bitstream, reconstruct the candidate list by employing the same mechanism as the encoder, and employ the candidate index to determine the selected motion model.
  • the selected motion model can then be employed to reconstruct the current block.
  • Such a process includes certain inefficiencies. Specifically, higher order motion models can degrade to lower order motion models when certain motion vectors are employed. For example, a four vector motion model may provide the same motion vector field as a three motion vector model when one of the motion vectors in the four vector motion model provides little or no change to the motion vector field. Likewise, a three vector motion model may degrade to a two vector motion model, and a two vector motion model can degrade to a one vector motion model in certain cases. Accordingly, the encoder’s rate distortion optimization process may consider multiple motion vector fields from multiple motion models that are substantially the same, which wastes processing resources. Further, a longer candidate list results in signaling a longer candidate index value to uniquely identify the selected motion model, which may unnecessarily increase coding size when multiple motion models generate substantially the same motion vector field.
  • the disclosed mechanisms employ a translational motion model, a four-parameter motion model, a six-parameter motion model, and an eight-parameter bilinear motion model, which generate a motion vector field based on one motion vector, two motion vectors, three motion vectors, and four motion vectors, respectively.
  • the motion vectors are inherited from neighboring coded blocks based on a predetermined priority.
  • the motion vectors are positioned at control points at the four corners of the current block.
  • motion models are generated using various permutations of the motion vectors at the control points.
  • the motion models are also scaled based on one or more reference indices (e.g., unidirectional or bidirectional inter-prediction) to reference block(s) in corresponding reference frame(s).
  • Higher order motion models are degraded to lower order motion models when possible.
  • higher order/lower order indicates the number of motion vectors and/or parameters of a motion model relative to the number of motion vectors and/or parameters of another motion model.
  • the motion models are inserted into a candidate list in a predetermined order (e.g., lower order to higher order, higher order to lower order, etc.). During insertion, a current motion model is compared to the motion models already included in the candidate list.
  • motion models are compared for similarity by comparing each parameter of the current motion model with a corresponding parameter of another motion model. When the parameters are the same, or the difference in parameters is less than or equal to one or more thresholds, the motion models are considered to be the same or similar. In another example, the motion vectors of the motion models are compared relative to one or more thresholds to determine similarity. In some examples, various checks can be employed to reduce comparison time, as in the sketch below.
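Assembled from the earlier sketches (the assumed MotionModel, comparable, and isRedundant helpers), candidate list construction with pruning might look like the following; maxSize stands in for whatever list-length limit applies.

```cpp
#include <cstddef>
#include <vector>

// Appends each generated model (already in the predetermined order) only
// if no model already in the list is both comparable and redundant.
std::vector<MotionModel> buildCandidateList(
        const std::vector<MotionModel>& generated,
        const std::vector<double>& th, std::size_t maxSize) {
    std::vector<MotionModel> list;
    for (const MotionModel& cur : generated) {
        bool redundant = false;
        for (const MotionModel& cand : list) {
            if (comparable(cur, cand) && isRedundant(cur, cand, th)) {
                redundant = true;
                break;
            }
        }
        if (!redundant) list.push_back(cur);
        if (list.size() == maxSize) break;
    }
    return list;
}
```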
  • inter-prediction employs reference picture lists to refer to unidirectional inter-prediction based on a preceding reference frame, unidirectional inter-prediction based on a subsequent reference frame, and bidirectional inter-prediction based on both the preceding and the subsequent reference frame. If motion models are based on different reference picture lists, the motion models are based on different reference blocks.
  • Hence, such motion models are different, and a comparison to the threshold(s) can be skipped.
  • motion models that employ bidirectional inter-prediction may continue to be compared to unidirectional motion models in case a bidirectional motion model degrades to a unidirectional motion model.
  • the rate distortion optimization process selects the best motion model for inter-prediction and encodes a corresponding candidate list index in a bitstream.
  • a corresponding process is employed to generate the same candidate list as employed by the encoder.
  • the candidate list index from the bitstream is then employed to select the proper motion model to reconstruct the current block via a motion compensation process.
  • the reconstructed current block can then be included in a video sequence displayed for a user.
  • FIG. 1 is a flowchart of an example method 100 of coding a video signal.
  • a video signal is encoded at an encoder.
  • the encoding process compresses the video signal by employing various mechanisms to reduce the video file size. A smaller file size allows the compressed video file to be transmitted toward a user, while reducing associated bandwidth overhead.
  • the decoder then decodes the compressed video file to reconstruct the original video signal for display to an end user.
  • the decoding process generally mirrors the encoding process to allow the decoder to consistently reconstruct the video signal.
  • the video signal is input into the encoder.
  • the video signal may be an uncompressed video file stored in memory.
  • the video file may be captured by a video capture device, such as a video camera, and encoded to support live streaming of the video.
  • the video file may include both an audio component and a video component.
  • the video component contains a series of image frames that, when viewed in a sequence, gives the visual impression of motion.
  • the frames contain pixels that are expressed in terms of light, referred to herein as luma components, and color, which is referred to as chroma components.
  • the frames may also contain depth values to support three dimensional viewing.
  • the video is partitioned into blocks.
  • Partitioning includes subdividing the pixels in each frame into square and/or rectangular blocks for compression.
  • coding trees may be employed to divide and then recursively subdivide blocks until configurations are achieved that support further encoding.
  • the blocks may be referred to as coding tree units in High Efficiency Video Coding (HEVC) (also known as H.265 and MPEG-H Part 2).
  • luma components of a frame may be subdivided until the individual blocks contain relatively homogenous lighting values.
  • chroma components of a frame may be subdivided until the individual blocks contain relatively homogenous color values. Accordingly, partitioning mechanisms vary depending on the content of the video frames.
  • inter-prediction and/or intra-prediction may be employed.
  • Inter-prediction is designed to take advantage of the fact that objects in a common scene tend to appear in successive frames. Accordingly, a block depicting an object in a reference frame need not be repeatedly described in adjacent frames. Specifically, an object, such as a table, may remain in a constant position over multiple frames. Hence the table is described once and adjacent frames can refer back to the reference frame.
  • Pattern matching mechanisms may be employed to match objects over multiple frames. Further, moving objects may be represented across multiple frames, for example due to object movement or camera movement. As a particular example, a video may show an automobile that moves across the screen over multiple frames.
  • Motion vectors can be employed to describe such movement.
  • a motion vector is a two-dimensional vector that provides an offset from the coordinates of an object in a frame to the coordinates of the object in a reference frame.
  • inter-prediction can encode an image block in a current frame as a set of motion vectors indicating an offset from a corresponding block in a reference frame.
  • Intra-prediction encodes blocks in a common frame. Intra-prediction takes advantage of the fact that luma and chroma components tend to cluster in a frame. For example, a patch of green in a portion of a tree tends to be positioned adjacent to similar patches of green. Intra-prediction employs multiple directional prediction modes (e.g., thirty-three in HEVC), a planar mode, and a direct current (DC) mode. The directional modes indicate that a current block is similar/the same as samples of a neighbor block in a corresponding direction. Planar mode indicates that a series of blocks along a row/column (e.g., a plane) can be interpolated based on neighbor blocks at the edges of the row.
  • Planar mode, in effect, indicates a smooth transition of light/color across a row/column by employing a relatively constant slope in changing values.
  • DC mode is employed for boundary smoothing and indicates that a block is similar/the same as an average value associated with samples of all the neighbor blocks associated with the angular directions of the directional prediction modes.
  • intra-prediction blocks can represent image blocks as various relational prediction mode values instead of the actual values.
  • inter-prediction blocks can represent image blocks as motion vector values instead of the actual values. In either case, the prediction blocks may not exactly represent the image blocks in some cases. Any differences are stored in residual blocks. Transforms may be applied to the residual blocks to further compress the file.
  • various filtering techniques may be applied.
  • the filters are applied according to an in-loop filtering scheme.
  • the block based prediction discussed above may result in the creation of blocky images at the decoder. Further, the block based prediction scheme may encode a block and then reconstruct the encoded block for later use as a reference block.
  • the in-loop filtering scheme iteratively applies noise suppression filters, de-blocking filters, adaptive loop filters, and sample adaptive offset (SAO) filters to the blocks/frames. These filters mitigate such blocking artifacts so that the encoded file can be accurately reconstructed. Further, these filters mitigate artifacts in the reconstructed reference blocks so that artifacts are less likely to create additional artifacts in subsequent blocks that are encoded based on the reconstructed reference blocks.
  • the bitstream includes the data discussed above as well as any signaling data desired to support proper video signal reconstruction at the decoder.
  • data may include partition data, prediction data, residual blocks, and various flags providing coding instructions to the decoder.
  • the bitstream may be stored in memory for transmission toward a decoder upon request.
  • the bitstream may also be broadcast and/or multicast toward a plurality of decoders.
  • the creation of the bitstream is an iterative process. Accordingly, steps 101, 103, 105, 107, and 109 may occur continuously and/or simultaneously over many frames and blocks.
  • FIG. 1 is presented for clarity and ease of discussion, and is not intended to limit the video coding process to a particular order.
  • the decoder receives the bitstream and begins the decoding process at step 111. Specifically, the decoder employs an entropy decoding scheme to convert the bitstream into corresponding syntax and video data. The decoder employs the syntax data from the bitstream to determine the partitions for the frames at step 111. The partitioning should match the results of block partitioning at step 103. Entropy encoding/decoding as employed in step 111 is now described. The encoder makes many choices during the compression process, such as selecting block partitioning schemes from several possible choices based on the spatial positioning of values in the input image(s). Signaling the exact choices may employ a large number of bins.
  • a bin is a binary value that is treated as a variable (e.g., a bit value that may vary depending on context) .
  • Entropy coding allows the encoder to discard any options that are clearly not viable for a particular case, leaving a set of allowable options.
  • Each allowable option is then assigned a code word.
  • the length of the code words is based on the number of allowable options (e.g., one bin for two options, two bins for three to four options, etc.). The encoder then encodes the code word for the selected option.
  • This scheme reduces the size of the code words, as the code words are only as large as needed to uniquely indicate a selection from a small sub-set of allowable options, as opposed to uniquely indicating the selection from a potentially large set of all possible options.
  • the decoder then decodes the selection by determining the set of allowable options in a similar manner to the encoder. By determining the set of allowable options, the decoder can read the code word and determine the selection made by the encoder.
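For fixed-length codes, n allowable options require ceil(log2(n)) bins, which is why pruning even a single redundant candidate can shorten every signaled index. A small illustrative helper:

```cpp
#include <cstdint>

// Minimum number of bins that uniquely indexes one of n options with a
// fixed-length code: the smallest b with 2^b >= n.
std::uint32_t binsNeeded(std::uint32_t n) {
    std::uint32_t bins = 0;
    while ((std::uint32_t{1} << bins) < n) ++bins;
    return bins;
}
// binsNeeded(2) == 1, binsNeeded(4) == 2, binsNeeded(5) == 3.
```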
  • the decoder performs block decoding. Specifically, the decoder employs reverse transforms to generate residual blocks. Then the decoder employs the residual blocks and corresponding prediction blocks to reconstruct the image blocks according to the partitioning.
  • the prediction blocks may include both intra-prediction blocks and inter-prediction blocks as generated at the encoder at step 105.
  • the reconstructed image blocks are then positioned into frames of a reconstructed video signal according to the partitioning data determined at step 111. Syntax for step 113 may also be signaled in the bitstream via entropy coding as discussed above.
  • step 115 filtering is performed on the frames of the reconstructed video signal in a manner similar to step 107 at the encoder. For example, noise suppression filters, de-blocking filters, adaptive loop filters, and SAO filters may be applied to the frames to remove blocking artifacts. Once the frames are filtered, the video signal can be output to a display at step 117 for viewing by an end user.
  • the present disclosure relates to modifications to decrease the computational complexity of affine inter-prediction in complex merge mode.
  • the present disclosure introduces a mechanism to reduce the size of (e.g., prune) a candidate list employed to signal motion models. This reduces the complexity of block compression at the encoder, as well as the encoding size at both the encoder and the decoder.
  • the inter-prediction mechanisms described in the FIGS. below impact the operation of block compression at step 105, bitstream encoding and transmission at step 109, and block decoding at step 113.
  • FIG. 2 is a schematic diagram of an example coding and decoding (codec) system 200 for video coding.
  • codec system 200 provides functionality to support the implementation of method 100.
  • Codec system 200 is generalized to depict components employed in both an encoder and a decoder.
  • Codec system 200 receives and partitions a video signal as discussed with respect to steps 101 and 103 in method 100, which results in a partitioned video signal 201.
  • Codec system 200 then compresses the partitioned video signal 201 into a coded bitstream when acting as an encoder as discussed with respect to steps 105, 107, and 109 in method 100.
  • When acting as a decoder, codec system 200 generates an output video signal from the bitstream, as discussed with respect to steps 111, 113, 115, and 117 in method 100.
  • the codec system 200 includes a general coder control component 211, a transform scaling and quantization component 213, an intra-picture estimation component 215, an intra-picture prediction component 217, a motion compensation component 219, a motion estimation component 221, a scaling and inverse transform component 229, a filter control analysis component 227, an in-loop filters component 225, a decoded picture buffer component 223, and a header formatting and context adaptive binary arithmetic coding (CABAC) component 231.
  • the components of codec system 200 may all be present in the encoder.
  • the decoder may include a subset of the components of codec system 200.
  • the decoder may include the intra-picture prediction component 217, the motion compensation component 219, the scaling and inverse transform component 229, the in-loop filters component 225, and the decoded picture buffer component 223. These components are now described.
  • the partitioned video signal 201 is a captured video sequence that has been partitioned into blocks of pixels by a coding tree.
  • a coding tree employs various split modes to subdivide a block of pixels into smaller blocks of pixels. These blocks can then be further subdivided into smaller blocks.
  • the blocks may be referred to as nodes on the coding tree. Larger parent nodes are split into smaller child nodes. The number of times a node is subdivided is referred to as the depth of the node/coding tree.
  • the divided blocks are referred to as coding units (CUs) in some cases.
  • the split modes may include a binary tree (BT), a triple tree (TT), and a quad tree (QT) employed to partition a node into two, three, or four child nodes, respectively, of varying shapes depending on the split mode employed.
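As a minimal illustration, each split mode fixes the number of child nodes a parent produces; the names below are illustrative rather than taken from any particular codec API.

```cpp
// Split modes and the number of child nodes each produces.
enum class SplitMode { BinaryTree, TripleTree, QuadTree };

int childCount(SplitMode mode) {
    switch (mode) {
        case SplitMode::BinaryTree: return 2;  // BT: two children
        case SplitMode::TripleTree: return 3;  // TT: three children
        case SplitMode::QuadTree:   return 4;  // QT: four children
    }
    return 0;  // unreachable with a valid mode
}
```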
  • the partitioned video signal 201 is forwarded to the general coder control component 211, the transform scaling and quantization component 213, the intra-picture estimation component 215, the filter control analysis component 227, and the motion estimation component 221 for compression.
  • the general coder control component 211 is configured to make decisions related to coding of the images of the video sequence into the bitstream according to application constraints. For example, the general coder control component 211 manages optimization of bitrate/bitstream size versus reconstruction quality. Such decisions may be made based on storage space/bandwidth availability and image resolution requests.
  • the general coder control component 211 also manages buffer utilization in light of transmission speed to mitigate buffer underrun and overrun issues. To manage these issues, the general coder control component 211 manages partitioning, prediction, and filtering by the other components. For example, the general coder control component 211 may dynamically increase compression complexity to increase resolution and increase bandwidth usage or decrease compression complexity to decrease resolution and bandwidth usage.
  • the general coder control component 211 controls the other components of codec system 200 to balance video signal reconstruction quality with bit rate concerns.
  • the general coder control component 211 creates control data, which controls the operation of the other components.
  • the control data is also forwarded to the header formatting and CABAC component 231 to be encoded in the bitstream to signal parameters for decoding at the decoder.
  • the partitioned video signal 201 is also sent to the motion estimation component 221 and the motion compensation component 219 for inter-prediction.
  • a frame or slice of the partitioned video signal 201 may be divided into multiple video blocks.
  • Motion estimation component 221 and the motion compensation component 219 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction.
  • Codec system 200 may perform multiple coding passes, e.g., to select an appropriate coding mode for each block of video data.
  • Motion estimation component 221 and motion compensation component 219 may be highly integrated, but are illustrated separately for conceptual purposes.
  • Motion estimation performed by motion estimation component 221, is the process of generating motion vectors, which estimate motion for video blocks.
  • a motion vector, for example, may indicate the displacement of a coded object relative to a predictive block.
  • a predictive block is a block that is found to closely match the block to be coded, in terms of pixel difference.
  • a predictive block may also be referred to as a reference block.
  • Such pixel difference may be determined by a sum of absolute differences (SAD), a sum of squared differences (SSD), or other difference metrics.
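For concreteness, the two metrics can be sketched as follows; the block layout (width, height, and row stride over 8-bit samples) is an assumption of this illustration.

```cpp
#include <cstdint>
#include <cstdlib>

// Sum of absolute differences between two w x h blocks of 8-bit samples.
int sad(const std::uint8_t* a, const std::uint8_t* b,
        int w, int h, int stride) {
    int acc = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            acc += std::abs(int(a[y * stride + x]) - int(b[y * stride + x]));
    return acc;
}

// Sum of squared differences over the same layout.
int ssd(const std::uint8_t* a, const std::uint8_t* b,
        int w, int h, int stride) {
    int acc = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int d = int(a[y * stride + x]) - int(b[y * stride + x]);
            acc += d * d;
        }
    return acc;
}
```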
  • HEVC employs several coded objects including a coding tree unit (CTU) , coding tree blocks (CTBs) , and CUs.
  • a CTU can be divided into CTBs, which can then be divided into CUs, which can be further sub-divided as desired.
  • a CU can be encoded as a prediction unit (PU) containing prediction data and/or a transform unit (TU) containing transformed residual data for the CU.
  • the motion estimation component 221 generates motion vectors, PUs, and TUs by using a rate-distortion analysis as part of a rate distortion optimization process. For example, the motion estimation component 221 may determine multiple reference blocks, multiple motion vectors, etc. for a current block/frame, and may select the reference blocks, motion vectors, etc. having the best rate-distortion characteristics. The best rate-distortion characteristics balance both quality of video reconstruction (e.g., amount of data loss by compression) with coding efficiency (e.g., size of the final encoding) .
  • codec system 200 may calculate values for sub-integer pixel positions of reference pictures stored in decoded picture buffer component 223. For example, video codec system 200 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation component 221 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision. The motion estimation component 221 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. Motion estimation component 221 outputs the calculated motion vector as motion data to the header formatting and CABAC component 231 for encoding, and to the motion compensation component 219.
  • Motion compensation performed by motion compensation component 219, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation component 221. Again, motion estimation component 221 and motion compensation component 219 may be functionally integrated, in some examples. Upon receiving the motion vector for the PU of the current video block, motion compensation component 219 may locate the predictive block to which the motion vector points. A residual video block is then formed by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values. In general, motion estimation component 221 performs motion estimation relative to luma components, and motion compensation component 219 uses motion vectors calculated based on the luma components for both chroma components and luma components. The predictive block and residual block are forwarded to transform scaling and quantization component 213.
  • the partitioned video signal 201 is also sent to intra-picture estimation component 215 and intra-picture prediction component 217.
  • intra-picture estimation component 215 and intra-picture prediction component 217 may be highly integrated, but are illustrated separately for conceptual purposes.
  • the intra-picture estimation component 215 and intra-picture prediction component 217 intra-predict a current block relative to blocks in a current frame, as an alternative to the inter-prediction performed by motion estimation component 221 and motion compensation component 219 between frames, as described above.
  • the intra-picture estimation component 215 determines an intra-prediction mode to use to encode a current block.
  • intra-picture estimation component 215 selects an appropriate intra-prediction mode to encode a current block from multiple tested intra-prediction modes. The selected intra-prediction modes are then forwarded to the header formatting and CABAC component 231 for encoding.
  • the intra-picture estimation component 215 calculates rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and selects the intra-prediction mode having the best rate-distortion characteristics among the tested modes.
  • Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original unencoded block that was encoded to produce the encoded block, as well as a bitrate (e.g., a number of bits) used to produce the encoded block.
  • the intra-picture estimation component 215 calculates ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
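• For illustration, selection by rate-distortion cost can be sketched as follows, assuming the common Lagrangian formulation J = D + λ·R; the function name select_best_mode, the candidate tuple layout, and the example numbers are hypothetical and not part of the disclosure.

```python
# Minimal sketch of rate-distortion mode selection (hypothetical helper).
def select_best_mode(candidates, lam):
    """Pick the candidate with the lowest Lagrangian cost J = D + lam * R.

    candidates: iterable of (mode, distortion, rate_bits) tuples, where
    distortion measures reconstruction error and rate_bits is the coded size.
    """
    best_mode, best_cost = None, float("inf")
    for mode, distortion, rate_bits in candidates:
        cost = distortion + lam * rate_bits  # rate-distortion trade-off
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode

# Example: three tested intra-prediction modes (illustrative values).
modes = [("planar", 1200.0, 38), ("dc", 1500.0, 30), ("angular_26", 900.0, 55)]
print(select_best_mode(modes, lam=10.0))  # -> "angular_26"
```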
  • intra-picture estimation component 215 may be configured to code depth blocks of a depth map using a depth modeling mode (DMM) based on rate-distortion optimization (RDO) .
  • the intra-picture prediction component 217 may generate a residual block from the predictive block based on the selected intra-prediction modes determined by intra-picture estimation component 215 when implemented on an encoder or read the residual block from the bitstream when implemented on a decoder.
  • the residual block includes the difference in values between the predictive block and the original block, represented as a matrix.
  • the residual block is then forwarded to the transform scaling and quantization component 213.
  • the intra-picture estimation component 215 and the intra-picture prediction component 217 may operate on both luma and chroma components.
  • the transform scaling and quantization component 213 is configured to further compress the residual block.
  • the transform scaling and quantization component 213 applies a transform, such as a discrete cosine transform (DCT) , a discrete sine transform (DST) , or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. Wavelet transforms, integer transforms, sub-band transforms or other types of transforms could also be used.
  • the transform may convert the residual information from a pixel value domain to a transform domain, such as a frequency domain.
  • the transform scaling and quantization component 213 is also configured to scale the transformed residual information, for example based on frequency.
  • Such scaling involves applying a scale factor to the residual information so that different frequency information is quantized at different granularities, which may affect final visual quality of the reconstructed video.
  • the transform scaling and quantization component 213 is also configured to quantize the transform coefficients to further reduce bit rate.
  • the quantization process may reduce the bit depth associated with some or all of the coefficients.
  • the degree of quantization may be modified by adjusting a quantization parameter.
  • the transform scaling and quantization component 213 may then perform a scan of the matrix including the quantized transform coefficients.
  • the quantized transform coefficients are forwarded to the header formatting and CABAC component 231 to be encoded in the bitstream.
  • the scaling and inverse transform component 229 applies a reverse operation of the transform scaling and quantization component 213 to support motion estimation.
  • the scaling and inverse transform component 229 applies inverse scaling, transformation, and/or quantization to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block which may become a predictive block for another current block.
  • the motion estimation component 221 and/or motion compensation component 219 may calculate a reference block by adding the residual block back to a corresponding predictive block for use in motion estimation of a later block/frame. Filters are applied to the reconstructed reference blocks to mitigate artifacts created during scaling, quantization, and transform. Such artifacts could otherwise cause inaccurate prediction (and create additional artifacts) when subsequent blocks are predicted.
  • the filter control analysis component 227 and the in-loop filters component 225 apply the filters to the residual blocks and/or to reconstructed image blocks.
  • the transformed residual block from scaling and inverse transform component 229 may be combined with a corresponding prediction block from intra-picture prediction component 217 and/or motion compensation component 219 to reconstruct the original image block.
  • the filters may then be applied to the reconstructed image block.
  • the filters may instead be applied to the residual blocks.
  • the filter control analysis component 227 and the in-loop filters component 225 are highly integrated and may be implemented together, but are depicted separately for conceptual purposes. Filters applied to the reconstructed reference blocks are applied to particular spatial regions and include multiple parameters to adjust how such filters are applied.
  • the filter control analysis component 227 analyzes the reconstructed reference blocks to determine where such filters should be applied and sets corresponding parameters. Such data is forwarded to the header formatting and CABAC component 231 as filter control data for encoding.
  • the in-loop filters component 225 applies such filters based on the filter control data.
  • the filters may include a deblocking filter, a noise suppression filter, a SAO filter, and an adaptive loop filter. Such filters may be applied in the spatial/pixel domain (e.g., on a reconstructed pixel block) or in the frequency domain, depending on the example.
  • the filtered reconstructed image block, residual block, and/or prediction block are stored in the decoded picture buffer component 223 for later use in motion estimation as discussed above.
  • the decoded picture buffer component 223 stores and forwards the reconstructed and filtered blocks toward a display as part of an output video signal.
  • the decoded picture buffer component 223 may be any memory device capable of storing prediction blocks, residual blocks, and/or reconstructed image blocks.
  • the header formatting and CABAC component 231 receives the data from the various components of codec system 200 and encodes such data into a coded bitstream for transmission toward a decoder. Specifically, the header formatting and CABAC component 231 generates various headers to encode control data, such as general control data and filter control data. Further, prediction data, including intra-prediction and motion data, as well as residual data in the form of quantized transform coefficient data are all encoded in the bitstream. The final bitstream includes all information desired by the decoder to reconstruct the original partitioned video signal 201.
  • Such information may also include intra-prediction mode index tables (also referred to as codeword mapping tables) , definitions of encoding contexts for various blocks, indications of most probable intra-prediction modes, an indication of partition information, etc.
• Such data may be encoded by employing entropy coding.
  • the information may be encoded by employing context adaptive variable length coding (CAVLC) , CABAC, syntax-based context-adaptive binary arithmetic coding (SBAC) , probability interval partitioning entropy (PIPE) coding, or another entropy coding technique.
  • the coded bitstream may be transmitted to another device (e.g., a video decoder) or archived for later transmission or retrieval.
  • the present disclosure relates to modifications to decrease the computational complexity of affine inter-prediction in complex merge mode.
• the present disclosure introduces a mechanism to reduce the size of (e.g., prune) a candidate list employed to signal motion models. This reduces the complexity of block compression at the encoder as well as the encoding size at both the encoder and the decoder.
  • the inter-prediction mechanisms described in the FIGS. below impact the operation of motion estimation component 221, motion compensation component 219, and/or header formatting and CABAC component 231.
  • FIG. 3 is a block diagram illustrating an example video encoder 300 that may generate a motion model candidate list for inter-prediction.
  • Video encoder 300 may be employed to implement the encoding functions of codec system 200 and/or implement steps 101, 103, 105, 107, and/or 109 of method 100.
  • Encoder 300 partitions an input video signal, resulting in a partitioned video signal 301, which is substantially similar to the partitioned video signal 201.
  • the partitioned video signal 301 is then compressed and encoded into a bitstream by components of encoder 300.
  • the partitioned video signal 301 is forwarded to an intra-picture prediction component 317 for intra-prediction.
  • the intra-picture prediction component 317 may be substantially similar to intra-picture estimation component 215 and intra-picture prediction component 217.
  • the partitioned video signal 301 is also forwarded to a motion compensation component 321 for inter-prediction based on reference blocks in a decoded picture buffer component 323.
  • the motion compensation component 321 may be substantially similar to motion estimation component 221 and motion compensation component 219.
  • the prediction blocks and residual blocks from the intra-picture prediction component 317 and the motion compensation component 321 are forwarded to a transform and quantization component 313 for transform and quantization of the residual blocks.
  • the transform and quantization component 313 may be substantially similar to the transform scaling and quantization component 213.
  • the transformed and quantized residual blocks and the corresponding prediction blocks (along with associated control data) are forwarded to an entropy coding component 331 for coding into a bitstream.
  • the entropy coding component 331 may be substantially similar to the header formatting and CABAC component 231.
  • the transformed and quantized residual blocks and/or the corresponding prediction blocks are also forwarded from the transform and quantization component 313 to an inverse transform and quantization component 329 for reconstruction into reference blocks for use by the motion compensation component 321.
  • the inverse transform and quantization component 329 may be substantially similar to the scaling and inverse transform component 229.
  • In-loop filters in an in-loop filters component 325 are also applied to the residual blocks and/or reconstructed reference blocks, depending on the example.
  • the in-loop filters component 325 may be substantially similar to the filter control analysis component 227 and the in-loop filters component 225.
  • the in-loop filters component 325 may include multiple filters as discussed with respect to in-loop filters component 225.
  • the filtered blocks are then stored in a decoded picture buffer component 323 for use as reference blocks by the motion compensation component 321.
  • the decoded picture buffer component 323 may be substantially similar to the decoded picture buffer component 223.
  • Affine inter-prediction is a particular type of inter-prediction employed in encoding and decoding by step 105, step 113, motion compensation component 219, motion estimation component 221, and/or motion compensation component 321.
  • Inter-prediction employs a motion vector and a reference block in a reference frame to encode blocks for one or more frames that are temporally adjacent to the reference frame. As discussed above, this allows an object to be coded with respect to the reference frame without recoding the object repeatedly for every frame.
  • Affine inter-prediction is employed when an object visually changes shape between frames, which may occur due to camera zoom, camera rotations, perspective motion, and/or other irregular motion.
  • the motion compensation component 321 distorts the reference frame in order to project the shape and location of the object in temporally adjacent frames.
  • the motion vectors for a current block can be described in terms of a motion vector field (MVF) generated based on control point motion vectors for the current block.
  • the current block is subdivided into sub-blocks of sizes selected based on the MVF and then the motion vectors for the sub-blocks can be determined based on the MVF.
  • the resulting motion vectors for the sub-blocks can be filtered and weighted by the motion compensation component 321 and/or the in-loop filters component 325 to generate prediction information (e.g., PUs) and residual information, which can be transformed and/or encoded by the transform and quantization component 313 and the entropy coding component 331, respectively.
  • the motion compensation component 321 may first determine control point vectors for a current block as part of a rate distortion optimization process.
  • the motion vectors may be inherited from neighboring coded blocks based on a predetermined priority.
  • the motion compensation component 321 may also determine the MVF based on the control point vectors.
  • the motion compensation component 321 may then determine the size of the various sub-blocks based on the motion vectors in the MVF.
  • the motion compensation component 321 may then determine the relevant motion vector for each sub-block.
  • the motion compensation component 321 may employ such a process as part of both a unidirectional inter-prediction and a bidirectional inter-prediction.
  • the motion compensation component 321 may attempt both unidirectional inter-prediction and bidirectional inter-prediction during rate distortion optimization and then select the approach that results in the best balance of coding size and video quality.
• In unidirectional prediction, a current block is predicted by a single reference frame.
• In bidirectional prediction, a current block is predicted by a temporally preceding reference frame and a temporally subsequent reference frame.
  • the determined control point vectors and resulting MVF may be referred to as a motion model.
• a motion model can be generated by the motion compensation component 321 based on a single motion vector (e.g., non-affine inter-prediction), two motion vectors, three motion vectors, and/or four motion vectors. These models may be referred to as a translation motion model, a four-parameter affine motion model, a six-parameter affine motion model, and an eight-parameter bilinear motion model, respectively.
  • the motion vectors can be positioned at the corners of a current block (e.g., as control point vectors) and combined to create the MVF. As four control point vectors are available for modeling purposes, four possible single motion vector models can be employed to describe a current block.
  • six possible two motion vector motion models can be employed to describe a current block based on various permutations of pairs of the four control point vectors.
  • Four possible three motion vector motion models can be employed to describe a current block based on various permutations of three vector groups of the four control point vectors.
  • a single four motion vector motion model can be employed to describe a current block by employing all four control point vectors.
  • the motion models can be scaled based on the reference block/frame referenced by the corresponding motion vector (s) , and then used to predict the current block.
  • the motion compensation component 321 can attempt to encode the current block with each of the motion models. The motion compensation component 321 can then select the motion model with the best balance of image quality and encoding size.
  • the motion compensation component 321 can generate a candidate list of the possible motion models and determine a candidate index that indicates the selected motion model.
  • the candidate index can then be transmitted by the entropy coding component 331 toward the decoder for use in reconstructing the current block.
  • motion models described above may be redundant in some cases.
  • a two motion vector motion model may result in substantially the same MVF as another two motion vector motion model when the motion vector pairs are similar.
  • a higher order motion model may be substantially similar to a lower order motion model when one of the motion vectors of the higher order motion model has only a small effect on the MVF.
• Motion models that generate the same or similar MVFs result in substantially the same encoding. Accordingly, attempting to encode the current block with more than one motion model with the same MVF wastes processing resources at the motion compensation component 321.
• including redundant motion models in a candidate list increases the length of the candidate list. Hence, the length of the candidate index also increases in order to uniquely identify the selected motion model. This results in longer encodings and reduced coding efficiency. For example, with a fixed-length binarization, a fifteen-entry candidate list requires a four-bit index, while a list pruned to eight or fewer entries requires at most three bits.
  • the motion compensation component 321 is configured to prune redundant motion models from the candidate list in order to reduce rate distortion optimization complexity and/or encoding size for the candidate index. For example, the motion compensation component 321 is configured to degrade higher order motion vector models into lower order motion vector models when possible to support comparison. As used herein higher order/lower order indicates the number of motion vectors and/or parameters of a motion model relative to the number of motion vectors and/or parameters of another motion model. Once the motion model (s) are degraded, the motion compensation component 321 compares the motion models to determine redundant motion models for pruning. For example, the motion compensation component 321 may iteratively insert motion models into the candidate list.
  • the motion compensation component 321 may compare the current motion model with each motion model already in the candidate list. If the current motion model is the same as any of the motion models already in the candidate list or if the difference between the motion models is lower than a threshold, then the current motion model is not added and may not be considered further for the rate distortion optimization process. Such a comparison process may be completed by comparing motion model parameters, motion model motion vectors, and/or resulting motion vector fields. These aspects are discussed in greater detail with respect to the FIGS. below.
  • the motion compensation component 321 can perform rate distortion optimization to select the best motion model from the candidate list.
  • the motion compensation component 321 can then signal a candidate index indicating the selected motion model to the entropy coding component 331.
  • the entropy coding component 331 encodes the candidate index as prediction information in the bitstream for transmission to a decoder.
  • the entropy coding component 331 may not encode the candidate list in the bitstream, in which case the decoder generates the candidate list using a similar process to the process used by the encoder 300.
  • FIG. 4 is a block diagram illustrating an example video decoder 400 that may generate a motion model candidate list for inter-prediction.
  • Video decoder 400 may be employed to implement the decoding functions of codec system 200 and/or implement steps 111, 113, 115, and/or 117 of method 100.
  • Decoder 400 receives a bitstream, for example from an encoder 300, and generates a reconstructed output video signal based on the bitstream for display to an end user.
  • the bitstream is received by an entropy decoding component 433.
  • the entropy decoding component 433 is configured to implement an entropy decoding scheme, such as CAVLC, CABAC, SBAC, PIPE coding, or other entropy coding techniques.
  • the entropy decoding component 433 may employ header information to provide a context to interpret additional data encoded as codewords in the bitstream.
  • the decoded information includes any desired information to decode the video signal, such as general control data, filter control data, partition information, motion data, prediction data, and quantized transform coefficients from residual blocks.
  • the quantized transform coefficients are forwarded to an inverse transform and quantization component 429 for reconstruction into residual blocks.
  • the inverse transform and quantization component 429 may be similar to inverse transform and quantization component 329.
  • the reconstructed residual blocks and/or prediction blocks are forwarded to intra-picture prediction component 417 for reconstruction into image blocks based on intra-prediction operations.
• the intra-picture prediction component 417 may be similar to intra-picture estimation component 215 and intra-picture prediction component 217. Specifically, the intra-picture prediction component 417 employs prediction modes to locate a reference block in the frame and applies a residual block to the result to reconstruct intra-predicted image blocks.
  • the reconstructed intra-predicted image blocks and/or the residual blocks and corresponding inter-prediction data are forwarded to a decoded picture buffer component 423 via in-loop filters component 425, which may be substantially similar to decoded picture buffer component 223 and in-loop filters component 225, respectively.
  • the in-loop filters component 425 filters the reconstructed image blocks, residual blocks and/or prediction blocks, and such information is stored in the decoded picture buffer component 423.
  • Reconstructed image blocks from decoded picture buffer component 423 are forwarded to a motion compensation component 421 for inter-prediction.
  • the motion compensation component 421 may be substantially similar to motion estimation component 221 and/or motion compensation component 219. Specifically, the motion compensation component 421 employs motion vectors from a reference block to generate a prediction block and applies a residual block to the result to reconstruct an image block.
  • the resulting reconstructed blocks may also be forwarded via the in-loop filters component 425 to the decoded picture buffer component 423.
  • the decoded picture buffer component 423 continues to store additional reconstructed image blocks, which can be reconstructed into frames via the partition information. Such frames may also be placed in a sequence. The sequence is output toward a display as a reconstructed output video signal.
  • affine inter-prediction is applied by motion compensation component 421 as part of performing inter-prediction.
  • the motion compensation component 421 is configured to employ the prediction information in the bitstream to reconstruct current blocks.
  • the motion compensation component 421 receives the candidate index from the bitstream via the entropy decoding component 433.
  • the motion compensation component 421 employs the same process as the encoder 300 to generate a candidate list, for example by determining control point motion vectors, constructing the motion models, scaling the motion models, degrading motion models when possible, and comparing motion models to prune redundant models.
  • the motion compensation component 421 can then employ the candidate index from the bitstream to determine the selected motion model from the candidate list.
  • the motion compensation component 421 can determine the MVF based on the selected motion model and the control point motion vectors for the current block.
  • the control point motion vectors are inherited from neighboring coded blocks based on a predetermined priority.
  • the motion compensation component 421 can reconstruct the current block by employing motion compensation.
  • Motion compensation includes determining the size of sub-blocks for the current block. The size of the sub-blocks may be signaled in the bitstream, set to a default value known to the decoder, and/or derived based on the control point motion vectors and current block dimensions.
  • the motion compensation component 421 can determine motion vectors for the sub-blocks based on the MVF.
  • the motion vectors for the sub-blocks can then be employed to reconstruct the prediction information for the current block.
  • the prediction information can be combined with residual information, if any, to generate a reconstructed block of pixels for the current block.
  • the current block can also be filtered and combined with other blocks to generate reconstructed frames. Such reconstructed frames can be stored in the decoded picture buffer component 423 for display as part of a reconstructed video sequence in the output video signal.
  • FIG. 5 is a schematic diagram illustrating an example of unidirectional inter-prediction 500, for example as performed to determine motion vectors (MVs) at block compression step 105, block decoding step 113, motion estimation component 221, motion compensation component 219, motion compensation component 321, and/or motion compensation component 421.
  • unidirectional inter-prediction 500 can be employed to determine motion vectors for a block in inter-prediction modes and/or to determine motion vectors for sub-blocks in affine inter-prediction mode.
  • Unidirectional inter-prediction 500 employs a reference frame 530 with a reference block 531 to predict a current block 511 in a current frame 510.
  • the reference frame 530 may be temporally positioned after the current frame 510 as shown (e.g., as a subsequent reference frame) , but may also be temporally positioned before the current frame 510 (e.g., as a preceding reference frame) in some examples.
  • the current frame 510 is an example frame/picture being encoded/decoded at a particular time.
  • the current frame 510 contains an object in the current block 511 that matches an object in the reference block 531 of the reference frame 530.
  • the reference frame 530 is a frame that is employed as a reference for encoding a current frame 510, and a reference block 531 is a block in the reference frame 530 that contains an object also contained in the current block 511 of the current frame 510.
  • the current block 511 is any coding unit that is being encoded/decoded at a specified point in the coding process.
  • the current block 511 may be an entire partitioned block, or may be a sub-block in the affine inter-prediction case.
  • the current frame 510 is separated from the reference frame 530 by some temporal distance (TD) 533.
  • the TD 533 indicates an amount of time between the current frame 510 and the reference frame 530 in a video sequence, and may be measured in units of frames.
  • the prediction information for the current block 511 may reference the reference frame 530 and/or reference block 531 by a reference index indicating the direction and temporal distance between the frames.
  • the object in the current block 511 moves from a position in the current frame 510 to another position in the reference frame 530 (e.g., the position of the reference block 531) .
  • the object may move along a motion trajectory 513, which is a direction of movement of an object over time.
  • a motion vector 535 describes the direction and magnitude of the movement of the object along the motion trajectory 513 over the TD 533.
• an encoded motion vector 535 and a reference block 531 provide information sufficient to reconstruct a current block 511 and position the current block 511 in the current frame 510.
  • the object changes shape between the current frame 510 and the reference frame 530.
  • the current block 511 is sub-divided into sub-blocks that each include a corresponding motion vector 535, for example as defined by an MVF.
  • FIG. 6 is a schematic diagram illustrating an example of bidirectional inter-prediction 600, for example as performed to determine MVs at block compression step 105, block decoding step 113, motion estimation component 221, motion compensation component 219, motion compensation component 321, and/or motion compensation component 421.
  • bidirectional inter-prediction 600 can be employed to determine motion vectors for a block in inter-prediction modes and/or to determine motion vectors for sub-blocks in affine inter-prediction mode.
  • Bidirectional inter-prediction 600 is similar to unidirectional inter-prediction 500, but employs a pair of reference frames to predict a current block 611 in a current frame 610.
  • current frame 610 and current block 611 are substantially similar to current frame 510 and current block 511, respectively.
  • the current frame 610 is temporally positioned between a preceding reference frame 620, which occurs before the current frame 610 in the video sequence, and a subsequent reference frame 630, which occurs after the current frame 610 in the video sequence.
  • Preceding reference frame 620 and subsequent reference frame 630 are otherwise substantially similar to reference frame 530.
  • the current block 611 is matched to a preceding reference block 621 in the preceding reference frame 620 and to a subsequent reference block 631 in the subsequent reference frame 630. Such a match indicates that, over the course of the video sequence, an object moves from a position at the preceding reference block 621 to a position at the subsequent reference block 631 along a motion trajectory 613 and via the current block 611.
  • the current frame 610 is separated from the preceding reference frame 620 by some preceding temporal distance (TD0) 623 and separated from the subsequent reference frame 630 by some subsequent temporal distance (TD1) 633.
  • the TD0 623 indicates an amount of time between the preceding reference frame 620 and the current frame 610 in the video sequence in units of frames.
• the TD1 633 indicates an amount of time between the current frame 610 and the subsequent reference frame 630 in the video sequence in units of frames.
  • the object moves from the preceding reference block 621 to the current block 611 along the motion trajectory 613 over a time period indicated by TD0 623.
  • the object also moves from the current block 611 to the subsequent reference block 631 along the motion trajectory 613 over a time period indicated by TD1 633.
  • the prediction information for the current block 611 may reference the preceding reference frame 620 and/or preceding reference block 621 and the subsequent reference frame 630 and/or subsequent reference block 631 by a pair of reference indices indicating the direction and temporal distance between the frames.
  • a preceding motion vector (MV0) 625 describes the direction and magnitude of the movement of the object along the motion trajectory 613 over the TD0 623 (e.g., between the preceding reference frame 620 and the current frame 610) .
  • a subsequent motion vector (MV1) 635 describes the direction and magnitude of the movement of the object along the motion trajectory 613 over the TD1 633 (e.g., between the current frame 610 and the subsequent reference frame 630) .
  • the current block 611 can be coded and reconstructed by employing the preceding reference block 621 and/or the subsequent reference block 631, MV0 625, and MV1 635.
  • the motion models discussed above may employ unidirectional inter-prediction 500 and/or bidirectional inter-prediction 600.
  • the direction and type of inter-prediction may depend on the reference index of the relevant motion vector.
  • the type of inter-prediction employed for a motion model may depend on the control point motion vector (s) , which may be determined based on neighbor coded blocks.
  • FIG. 7 is a schematic diagram illustrating an example of an affine motion model 700 for affine inter-prediction.
  • Affine motion model 700 may be used for both unidirectional inter-prediction 500 and bidirectional inter-prediction 600.
  • affine motion model 700 can be applied to determine motion vectors at block compression step 105, block decoding step 113, motion estimation component 221, motion compensation component 219, motion compensation component 321, and/or motion compensation component 421.
  • affine inter-prediction distorts the reference frame (s) so that a current block 701 can be predicted despite certain shape changes while the corresponding object moves between the corresponding frames. Accordingly, the motion vectors for a current block 701 vary across the current block 701.
  • the motion vectors for the current block 701 are described in terms of control point motion vectors. In the example depicted, two control point motion vectors v0 702 and v1 703 are shown for simplicity of discussion.
  • the control point motion vector v0 702 is positioned at the top left corner of the current block 701
  • the control point motion vector v1 703 is positioned at the top right corner of the current block 701.
  • Motion vector v0 702 and motion vector v1 703 contain horizontal (x) components and vertical (y) components that indicate the magnitude of the vectors. Hence, motion vector v0 702 can be described as (v0x, v0y) and motion vector v1 703 can be described as (v1x, v1y) , respectively. Motion vector v0 702 and motion vector v1 703 can be employed to determine an MVF 741 for the entire current block 701.
  • the MVF 741 is a field of vectors that change based on position.
  • a simplified example of the MVF 741 is depicted by dashed arrows calculated from motion vector v0 702 and motion vector v1 703.
  • the current block 701 is divided into sub-blocks 740.
• the size of the sub-blocks 740 may be denoted as M×N, where M indicates the sub-block 740 width and N indicates the sub-block 740 height.
• the sub-blocks 740 size may be determined by many mechanisms. For example, the sub-blocks 740 size can be set to a default value known to the encoder/decoder, such as 4×4. In another example, the sub-blocks 740 size can be selected by the encoder during rate distortion optimization and signaled in the bitstream. In some examples, the sub-blocks 740 size is derived based on the motion vector differences of the control points v0 702 and v1 703 as well as the width and height of current block 701.
  • a motion vector for each of the sub-blocks 740 can be determined from the MVF 741.
  • the motion vector for a sub-block may be selected as the value of the MVF 741 at a center pixel of the sub-block.
• a smaller sub-block 740 size results in more granular motion compensation.
• motion compensation is performed at the pixel level when M×N is set to 1×1.
  • motion of a current block 701 can be modeled based on the MVF 741, which can be determined based on control point motion vectors, such as v0 702 and v1 703.
  • a motion model includes one or more equations with parameters set based on a number of control point motion vectors, such as v0 702 and v1 703.
  • Motion models may be generated based on one, two, three, or four control point motion vectors as discussed below.
  • FIG. 8 is a schematic diagram illustrating an example of control points employed in complex merge mode 800.
  • Complex merge mode 800 is employed to generate motion models, such as affine motion model 700, for inter-prediction, such as unidirectional inter-prediction 500 and bidirectional inter-prediction 600.
  • Complex merge mode 800 can be employed by an encoder 300 and a decoder 400.
  • complex merge mode can be employed to determine motion vectors at block compression step 105 and/or block decoding step 113, for example by a motion estimation component 221, a motion compensation component 219, a motion compensation component 321, and/or motion compensation component 421.
  • Complex merge mode 800 is applied to a current block 801 to be encoded/decoded via inter-prediction.
  • Current block 801 may be used as a current block 511, 611, and/or 701.
  • the current block 801 may include a first control point (CP 1 ) 851, a second control point (CP 2 ) 852, a third control point (CP 3 ) 853, and/or a fourth control point (CP 4 ) 854.
  • the control points 851-854 include motion vectors derived from motion information contained in neighboring coded blocks. Accordingly, complex merge mode 800 may not signal motion parameters generated according to motion estimation at the encoder.
  • the motion vectors of the control points 851-854 of the current block 801 are derived from motion vectors employed in neighbor coded blocks, which are denoted as A 0 860, A 1 861, A 2 862, B 0 863, B 1 864, B 2 865, B 3 866, and T r 867.
  • the neighbor coded blocks 860-867 are blocks that may be coded according to inter-prediction, and hence may already include at least one motion vector as prediction information. In the event a neighbor coded block 860-867 is not coded according to inter-prediction and/or does not contain a motion vector, such a block is ignored for purposes of the complex merge mode 800.
  • the neighbor coded blocks A 0 860, A 1 861, A 2 862 are positioned to the left side of the current block 801 in the same frame, and the neighbor coded blocks B 0 863, B 1 864, B 2 865, B 3 866 are positioned above the current block 801 in the same frame.
  • the neighbor coded block T r 867 is positioned in a temporally adjacent frame to the frame containing the current block 801 (e.g., a preceding frame or a subsequent frame) .
  • the neighbor coded block T r 867 is depicted in a dashed line to indicate that T r 867 is not positioned in the same frame as the current block 801.
  • T r 867 may be positioned at the same coordinates as the current block 801, and may be positioned in a preceding and/or subsequent frame.
• the coordinates of CP 1 851, CP 2 852, CP 3 853 and CP 4 854 are (0, 0), (W, 0), (0, H), and (W, H), respectively, where W and H are the width and height of current block 801.
  • the motion information of each control point can be obtained according to a priority order of availability. For example, a block coded by inter-prediction mode is considered available, while a position/block that has not been coded by inter-prediction mode is considered unavailable.
• CP 1 851 inherits motion information from B 2 865, A 2 862, or B 3 866, in that priority order. For example, CP 1 851 inherits motion information from B 2 865 when B 2 865 contains such information. When B 2 865 contains no motion information (e.g., is an intra-prediction block, has not been encoded yet, is not a valid position, etc.), CP 1 851 inherits motion information from A 2 862. When A 2 862 also contains no motion information, CP 1 851 inherits motion information from B 3 866.
  • CP 2 852 inherits motion information from B 0 863, if available, and otherwise inherits motion information from B 1 864.
  • CP 3 853 inherits motion information from A 0 860, if available, and otherwise inherits motion information from A 1 861.
  • CP 4 854 inherits motion information from T r 867.
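• The priority ordering above can be sketched as follows; this is a minimal illustration, assuming neighbor motion information is available as a simple mapping, with unavailable (non-inter-coded) neighbors mapped to None. The function and variable names are hypothetical.

```python
# Sketch of control point motion inheritance by the priority order described above.
def derive_control_points(neighbors):
    """neighbors: dict mapping a neighbor label to its motion info, or None
    when that neighbor is unavailable (e.g., intra-coded or out of bounds)."""
    priority = {
        "CP1": ["B2", "A2", "B3"],  # top-left corner
        "CP2": ["B0", "B1"],        # top-right corner
        "CP3": ["A0", "A1"],        # bottom-left corner
        "CP4": ["Tr"],              # bottom-right corner (temporal neighbor)
    }
    control_points = {}
    for cp, order in priority.items():
        # First available neighbor in priority order, else None (CP unavailable).
        control_points[cp] = next(
            (neighbors[n] for n in order if neighbors.get(n) is not None), None
        )
    return control_points
```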
  • a merge candidate list containing different motion models can be constructed with different combinations of the control points 851-854.
  • an encoder can first determine the motion vector (s) associated with the control points 851-854 based on corresponding neighbor coded blocks, in priority order, as discussed above. The encoder can then determine motion models for various combinations of the control points 851-854. The motion models are candidates for inter-prediction, and hence act as a candidate list. Once the motion models are determined, the encoder can perform rate distortion optimization to select a motion model from the candidate list to encode the current block 801. For example, the motion model for the current block 801 can be selected by sum of absolute transformed differences (SATD) criterion.
  • SATD absolute transformed differences
  • the encoder can then determine a candidate index corresponding to the position of the selected motion model in the candidate list.
  • the encoder can then signal the selected motion model by signaling the candidate index as a bin in the bitstream.
  • the decoder can generate the same candidate list by employing the same process employed by the encoder.
  • the decoder can then determine the selected motion model based on the candidate index.
  • the decoder can then reconstruct the current block 801 by employing the selected motion model, for example by deriving a motion vector field for the current block and performing motion compensation.
  • the motion models employed in the candidate list are discussed in detail below.
  • a motion model is a set of parameters for representing motion vectors of pixels in current block 801.
• the present disclosure considers a translation motion model, a four-parameter affine motion model, a six-parameter affine motion model, and an eight-parameter bilinear motion model, for clarity of discussion. However, other motion models may also be used without departing from the scope of the present disclosure.
  • a translation motion model employs a single motion vector to represent the movement of all pixels in the current block.
  • four translation motion models can be generated as candidates for the candidate list (e.g., one translation motion model for each control point) .
• a translation model can be generated according to Equation Set 1:

x′ = x + a1
y′ = y + a2 (Equation Set 1)

• where (x, y) is a coordinate of a pixel in the current block, (vx, vy) = (x′ − x, y′ − y) is the motion vector for the pixel, (x′, y′) is a coordinate of the corresponding pixel in a reference block, and a1 and a2 are transform parameters of the translation motion model determined based on a motion vector for a corresponding control point of the current block.
• the motion vectors for all pixels inside the entire block 801 are the same, which renders a uniform motion vector field across the current block 801.
  • the motion vector is (a 1 , a 2 ) , where a 1 is the change in x value and a 2 is the change in y value of the pixel when moving between the current block 801 and the reference block as predicted by the corresponding control point.
  • Motion vectors of two control points are employed to compute transform parameters in a four-parameter affine model.
• the two control points can be selected from one of the following six combinations: {CP 1 851, CP 4 854}, {CP 2 852, CP 3 853}, {CP 1 851, CP 2 852}, {CP 2 852, CP 4 854}, {CP 1 851, CP 3 853}, or {CP 3 853, CP 4 854}.
  • the candidate list can include up to six four-parameter affine models.
• a four-parameter affine model can be generated according to Equation Set 2:

x′ = a1 + a3·x + a4·y
y′ = a2 − a4·x + a3·y (Equation Set 2)

• where (x, y) is a coordinate of a pixel in the current block, (vx, vy) = (x′ − x, y′ − y) is the motion vector for the pixel, (x′, y′) is a coordinate of the corresponding pixel in a reference block, and a1, a2, a3, and a4 are transform parameters of the four-parameter affine model determined based on motion vectors for the two corresponding control points of the current block 801.
  • the motion vector field of the four-parameter affine motion model (e.g., motion vector field 741) can be represented by two motion vectors of two corresponding control points and their coordinates.
• Taking CP 1 851 and CP 2 852 as an example, a four-parameter affine model can be represented according to Equation Set 3:

vx = ((v1x − v0x) / W)·x − ((v1y − v0y) / W)·y + v0x
vy = ((v1y − v0y) / W)·x + ((v1x − v0x) / W)·y + v0y (Equation Set 3)

• where (vx, vy) is the motion vector at pixel coordinate (x, y), (v0x, v0y) is the motion vector of CP 1 851 with the coordinate (0, 0), (v1x, v1y) is the motion vector of CP 2 852 with the coordinate (W, 0), and W is the width of current block 801.
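• As a minimal sketch of Equation Set 3, the following evaluates the four-parameter affine motion vector field at the center of each M×N sub-block of a W×H current block; the function names and the default sub-block size are illustrative assumptions.

```python
# Sketch: evaluate the four-parameter affine MVF (Equation Set 3) per sub-block.
def four_param_mvf(x, y, v0, v1, W):
    """Motion vector at pixel (x, y) from control point MVs v0 (at (0, 0))
    and v1 (at (W, 0)) of a block of width W."""
    (v0x, v0y), (v1x, v1y) = v0, v1
    vx = (v1x - v0x) / W * x - (v1y - v0y) / W * y + v0x
    vy = (v1y - v0y) / W * x + (v1x - v0x) / W * y + v0y
    return vx, vy

def sub_block_mvs(v0, v1, W, H, M=4, N=4):
    """Assign each M x N sub-block the MVF value at its center pixel."""
    return {
        (bx, by): four_param_mvf(bx + M / 2, by + N / 2, v0, v1, W)
        for bx in range(0, W, M)
        for by in range(0, H, N)
    }
```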
  • Motion vectors of three control points are employed to compute transform parameters in a six-parameter affine model.
• the three control points can be selected from one of the following four combinations: {CP 1 851, CP 2 852, CP 4 854}, {CP 1 851, CP 2 852, CP 3 853}, {CP 2 852, CP 3 853, CP 4 854}, or {CP 1 851, CP 3 853, CP 4 854}.
  • the candidate list can include up to four six-parameter affine models.
• a six-parameter affine model can be generated according to Equation Set 4:

x′ = a1 + a3·x + a4·y
y′ = a2 + a5·x + a6·y (Equation Set 4)

• where (x, y) is a coordinate of a pixel in the current block, (vx, vy) = (x′ − x, y′ − y) is the motion vector for the pixel, (x′, y′) is a coordinate of the corresponding pixel in a reference block, and a1, a2, a3, a4, a5, and a6 are transform parameters of the six-parameter affine model determined based on motion vectors for three control points of the current block. It should be noted that the six-parameter affine model degrades to a four-parameter affine model when a3 minus a6 is zero and a4 plus a5 is zero.
• the motion vector field of the six-parameter affine motion model can be represented by the three motion vectors of the three corresponding control points and their coordinates. Taking CP 1 851, CP 2 852, and CP 3 853 as an example, the six-parameter affine model can be represented according to Equation Set 5:

vx = ((v1x − v0x) / W)·x + ((v2x − v0x) / H)·y + v0x
vy = ((v1y − v0y) / W)·x + ((v2y − v0y) / H)·y + v0y (Equation Set 5)

• where (v0x, v0y) is the motion vector of CP 1 851 with the coordinate (0, 0), (v1x, v1y) is the motion vector of CP 2 852 with the coordinate (W, 0), (v2x, v2y) is the motion vector of CP 3 853 with the coordinate (0, H), and W and H are the width and height of current block 801, respectively.
  • Motion vectors of all four control points CP 1 851-CP 4 854 are employed to compute transform parameters in an eight-parameter bilinear motion model.
  • the candidate list can include up to one eight-parameter bilinear motion model.
• an eight-parameter bilinear motion model can be generated according to Equation Set 6:

x′ = a1 + a3·x + a4·y + a7·x·y
y′ = a2 + a5·x + a6·y + a8·x·y (Equation Set 6)

• where (x, y) is a coordinate of a pixel in the current block, (vx, vy) = (x′ − x, y′ − y) is the motion vector for the pixel, (x′, y′) is a coordinate of the corresponding pixel in a reference block, and a1, a2, a3, a4, a5, a6, a7, and a8 are transform parameters of the eight-parameter bilinear model determined based on motion vectors for four control points of the current block. It should be noted that the eight-parameter bilinear model degrades to a six-parameter affine model when a7 and a8 are zero.
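• The two degradation conditions above can be sketched as follows; the parameter ordering (a1 through a8) follows Equation Sets 2, 4, and 6, and the tolerance eps is an illustrative assumption (an exact-zero test corresponds to eps = 0).

```python
# Sketch of motion model degradation per the conditions noted above.
def degrade(params, eps=0.0):
    """params: [a1, a2, ...] transform parameters; returns the lowest-order
    equivalent parameter list."""
    p = list(params)
    if len(p) == 8 and abs(p[6]) <= eps and abs(p[7]) <= eps:
        p = p[:6]  # eight-parameter bilinear -> six-parameter affine (a7 = a8 = 0)
    if len(p) == 6 and abs(p[2] - p[5]) <= eps and abs(p[3] + p[4]) <= eps:
        p = p[:4]  # six-parameter -> four-parameter affine (a3 = a6, a4 = -a5)
    return p
```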
  • the motion vector field for the bilinear motion model can be represented by the four motion vectors of CP 1 851-CP 4 854 and their coordinates.
• the motion vector field for the bilinear motion model can be represented according to Equation Set 7:

vx = ((v1x − v0x) / W)·x + ((v2x − v0x) / H)·y + ((v3x − v2x − v1x + v0x) / (W·H))·x·y + v0x
vy = ((v1y − v0y) / W)·x + ((v2y − v0y) / H)·y + ((v3y − v2y − v1y + v0y) / (W·H))·x·y + v0y (Equation Set 7)

• where (v0x, v0y) is the motion vector of CP 1 851 with the coordinate (0, 0), (v1x, v1y) is the motion vector of CP 2 852 with the coordinate (W, 0), (v2x, v2y) is the motion vector of CP 3 853 with the coordinate (0, H), (v3x, v3y) is the motion vector of CP 4 854 with the coordinate (W, H), and W and H are the width and height of current block 801, respectively.
• When the motion information of all the chosen control points can be derived, the motion model is valid. Otherwise, the motion model is rejected and not considered further (e.g., not included in the candidate index list).
  • the motion vectors for the motion models can then be scaled according to a corresponding reference index.
  • the reference index indicates which frame a current block 801 references.
  • the motion vectors in the motion vector field of a motion model can be scaled according to the reference index of the current block 801 so that motion models can be compared when the motion models employ different reference frames.
  • the reference index of the current block 801 can be inherited from the control points employed by the motion model (s) .
  • the reference index may be specified explicitly in the bitstream.
• To determine the reference index of current block 801, the reference index with the highest utilization rate among the reference indices of CP 1 851-CP 4 854 is selected. It should be noted that there may be more than one reference index with the highest utilization rate. In this situation, the smallest such reference index can be selected as the reference index of current block 801.
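• As a minimal sketch of this selection rule (most-used reference index, smallest index on a tie), assuming the control point reference indices are collected in a list:

```python
# Sketch: pick the block reference index by highest utilization, then smallest.
from collections import Counter

def select_reference_index(cp_ref_indices):
    counts = Counter(cp_ref_indices)          # utilization per reference index
    best = max(counts.values())
    return min(idx for idx, n in counts.items() if n == best)

print(select_reference_index([0, 1, 1, 0]))   # tie between 0 and 1 -> 0
```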
• Once the reference index of current block 801 is determined, the motion vector of each control point can be scaled according to Equation 8:

MVs = MV × (CurPoc − DesPoc) / (CurPoc − SrcPoc) (Equation 8)

• where CurPoc denotes the picture order count (POC) of the current frame/picture containing the current block 801, DesPoc denotes the POC of the reference picture of current block 801, SrcPoc denotes the POC of the reference picture of the control point, MV denotes the motion vector of the control point, and MVs denotes the scaled MV.
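• A minimal sketch of this POC-distance scaling, using the Equation 8 form above; real codecs use fixed-point arithmetic with rounding, which is omitted here for clarity.

```python
# Sketch of POC-distance motion vector scaling (Equation 8, floating point).
def scale_mv(mv, cur_poc, des_poc, src_poc):
    """Scale a control point MV from its own reference distance (CurPoc - SrcPoc)
    to the current block's reference distance (CurPoc - DesPoc)."""
    scale = (cur_poc - des_poc) / (cur_poc - src_poc)
    return (mv[0] * scale, mv[1] * scale)

# Example: the block's reference is half as far away as the control point's.
print(scale_mv((8.0, -4.0), cur_poc=10, des_poc=9, src_poc=8))  # -> (4.0, -2.0)
```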
  • the motion models for a current block 801 can be generated based on various combinations of CP 1 851-CP 4 854.
  • the motion models can then be scaled based on current block 801 reference index and inserted into a candidate list.
• the candidate list can contain up to fifteen motion models (four translation models, six four-parameter affine models, four six-parameter affine models, and one eight-parameter bilinear model), which are then iteratively considered by the encoder to determine a selected motion model to predict the current block 801.
• different motion models represented by different control points CP 1 851-CP 4 854 may produce substantially the same motion vector field, which results in redundancy, as a common motion vector field produces the same predictive value from an encoding standpoint.
  • a redundant candidate list increases the complexity of the encoder selection process and increases the bits for signaling the candidate index.
  • a pruning mechanism is disclosed to remove redundant motion models.
  • the motion models for the current block 801 are generated, scaled, and inserted into the candidate list. During insertion, the current motion model is compared with motion models already contained in the candidate list. When a current motion model is the same or substantially similar to a motion model already in the candidate list, the current motion model is rejected from the candidate list and not considered further.
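• The insertion-time pruning loop can be sketched as follows; is_redundant is a placeholder for any of the comparison mechanisms described below, and the maximum list size of fifteen follows from the model counts above.

```python
# Sketch of candidate list construction with insertion-time pruning.
def build_candidate_list(motion_models, is_redundant, max_size=15):
    """motion_models: models already generated and scaled, in insertion order.
    is_redundant(a, b): True when model a is the same as or similar to model b."""
    candidate_list = []
    for model in motion_models:
        if any(is_redundant(model, kept) for kept in candidate_list):
            continue  # prune: redundant with a model already in the list
        candidate_list.append(model)
        if len(candidate_list) == max_size:
            break
    return candidate_list
```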
  • Various example mechanisms for determining when a motion model is the same or substantially similar to another motion model are disclosed.
  • the parameters of the motion models can be compared directly.
• the parameters of the motion models can be derived from the coordinates of the control points and the scaled motion vectors. For example, denote a current motion model as A and a motion model already in the candidate list as B. When the parameters of A are the same as or similar to the parameters of B, A is considered redundant relative to B and is discarded. Otherwise, A is inserted into the candidate list.
  • the number of parameters in A is the same as the number of parameters in B (e.g., the motion models are of the same order) .
  • the parameters of A are compared to corresponding parameters of B, and the result of the comparison is compared to a threshold. If the difference is less than the threshold, the motion models are considered redundant.
• the parameters can be compared to a threshold according to Equation 9:

|ai − bi| < Thi, for all i = 1, 2, 3, … (Equation 9)

• where ai are the transform parameters for the motion model A, bi are the transform parameters of a motion model B in the candidate list, i is a parameter index, and Thi is the threshold for the corresponding parameter index.
  • the threshold Th i may be predefined or specified in a parameter set of a coded video sequence (e.g., a sequence parameter set, a picture parameter set, a slice parameter set, etc. ) .
  • the threshold may vary for different parameters based on index.
  • the same threshold is employed for all parameters.
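• A minimal sketch of the Equation 9 test, assuming the parameters of both models are available as equal-length lists and the per-index thresholds as a third list; orders that differ are handled by the degradation step described next.

```python
# Sketch of the Equation 9 parameter comparison.
def params_redundant(a_params, b_params, thresholds):
    """True when every |a_i - b_i| is below its threshold Th_i."""
    if len(a_params) != len(b_params):
        return False  # different orders: degrade first (see below), then retest
    return all(abs(a - b) < th
               for a, b, th in zip(a_params, b_params, thresholds))
```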
• In other examples, the number of parameters in A may be different than the number of parameters in B. If A has more parameters than B, A can be degraded to the same number of parameters as B, and if |ai − bi| < Thi (i = 1, 2, 3, …) is true for all parameters, then A is redundant relative to B. Further, if B has more parameters than A, B can be degraded to the same number of parameters as A, and if |ai − bi| < Thi (i = 1, 2, 3, …) is true for all parameters, then A is redundant relative to B.
  • generating the candidate list may include only comparing motion models with motion models in the candidate list that include a common number of transform parameters without considering degradation. In such an example, the transform parameters of motion models are only compared when the number of parameters is the same for A and B without considering degradation. Otherwise, the motion models are considered to not be redundant.
  • generating the candidate list includes only comparing motion models with motion models in the candidate list that include a common reference index to a common reference block. In such an example, transform parameters of the motion models are only compared when the reference index of the current motion model is the same as the reference index of the motion model in the candidate list. Otherwise, the motion models are considered to not be redundant.
  • the motion vectors of the control points for the motion models are compared.
  • the parameters of the motion models can be derived by the coordinates of control points and the scaled motion vectors.
  • the motion vectors of the control points for the current motion model are converted to the same coordinates as the motion vectors of the control points for the motion model in the candidate list.
  • the resulting motion vectors are compared relative to a threshold.
• For example, A can be selected as a six-parameter affine model with control points CP 1 851, CP 2 852, and CP 3 853.
• Further, B can be selected as a six-parameter affine model with control points CP 1 851, CP 2 852, and CP 4 854.
  • the motion vectors of A and B can be calculated according to Equation Set 4 and/or 5.
• motion vectors for the control points of the current block 801, as determined for the current motion model, can then be compared to the motion vectors as determined for the motion model in the candidate list.
• the difference between the motion vectors of A and B can be compared to a threshold according to Equation Set 10:

|vxai − vxbi| < Thxi and |vyai − vybi| < Thyi, for all i = 1, 2, 3, … (Equation Set 10)

• where (vxai, vyai) are the control point motion vector parameters for the motion model A, (vxbi, vybi) are the control point motion vector parameters for a motion model B in the candidate list, i is a parameter index, and Thxi and Thyi are the threshold values for the corresponding parameter index.
  • the threshold Thx i and Thy i could be predefined or specified in a parameter set of a coded video sequence (e.g., sequence parameter set, picture parameter set, slice parameter set, etc. )
  • the threshold may vary for different motion vectors based on index. In other examples, the same threshold is employed for all motion vectors.
• When Equation Set 10 holds for all control point motion vectors, A is considered redundant relative to B and is discarded. Otherwise, A is inserted into the candidate list.
• In other examples, the number of parameters in A may be different than the number of parameters in B. In such a case, A or B is degraded, when possible, so that A and B have the same number of parameters. The results can then be compared according to Equation Set 10. Hence, if |vxai − vxbi| < Thxi and |vyai − vybi| < Thyi (i = 1, 2, 3) is true for all motion vectors, A is redundant relative to B and can be rejected from the candidate list.
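• A minimal sketch of the Equation Set 10 test, assuming both models have already been expressed at the same control point coordinates (and degraded to the same order where possible); the argument layout and threshold lists are illustrative assumptions.

```python
# Sketch of the Equation Set 10 control point motion vector comparison.
def cp_mvs_redundant(a_mvs, b_mvs, thx, thy):
    """a_mvs, b_mvs: lists of (vx, vy) per control point; thx, thy: per-point
    thresholds. True when every component difference is below its threshold."""
    return all(abs(ax - bx) < tx and abs(ay - by) < ty
               for (ax, ay), (bx, by), tx, ty in zip(a_mvs, b_mvs, thx, thy))
```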
  • generating the candidate list may include only comparing motion models with motion models in the candidate list that include a common number of parameters without considering degradation. In such an example, motion models are only compared according to Equation Set 10 when the number of parameters is the same without considering degradation. Otherwise, the motion models are considered to not be redundant.
  • generating the candidate list includes only comparing motion models with motion models in the candidate list that include a common reference index to a common reference block.
  • motion models are only compared according to Equation Set 10 when the reference index of the current motion model is the same as the reference index of the motion model in the candidate list. Otherwise, the motion models are considered to not be redundant.
  • generating the candidate list includes only comparing motion models with motion models in the candidate list that reference a common picture list for the current block.
  • inter-prediction employs multiple reference picture lists for the current block 801.
  • a List0 includes references to preceding frames
  • a List1 includes references to subsequent frames
  • a combination List0 and List1 reference both preceding and subsequent frames for bidirectional inter-prediction.
  • Such lists may be denoted as PRED_L0, PRED_L1, and PRED_BI, respectively.
  • the transform parameters of the motion models or the motion vectors of the motion models are only compared when the motion models reference the same list (e.g., based on the reference indices).
  • the motion models are compared as discussed above. Otherwise, the motion models may be considered to not be redundant. In yet another example, the motion models may be compared if one of the motion models references PRED_BI and the other motion model references PRED_L0 or PRED_L1.
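The three restrictions above can be read as a gate that runs before any parameter comparison. Below is a minimal sketch, assuming each motion model is a simple record carrying a parameter count, reference index, and picture list; the record layout, field names, and the optional bidirectional cross-comparison flag are all assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class MotionModel:
    num_params: int   # 2 (translational), 4/6 (affine), or 8 (bilinear)
    ref_idx: int      # reference index to the reference block
    pred_list: str    # "PRED_L0", "PRED_L1", or "PRED_BI"

def can_compare(a: MotionModel, b: MotionModel, allow_bi_cross: bool = False) -> bool:
    """Gate the redundancy check: models that fail any of these tests are
    simply treated as not redundant, and no parameter comparison is run."""
    if a.num_params != b.num_params:   # common number of parameters
        return False
    if a.ref_idx != b.ref_idx:         # common reference index
        return False
    if a.pred_list != b.pred_list:     # common picture list...
        # ...unless a bidirectional model is optionally allowed to be
        # compared against a unidirectional one, per the last example.
        return allow_bi_cross and "PRED_BI" in (a.pred_list, b.pred_list)
    return True
```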
  • the best motion model for predicting the current block 801 is selected, for example by a SATD criterion.
  • Motion compensation can be performed by the encoder to obtain a prediction block for the current block 801 based on each motion model in the candidate list. The differences between the prediction block and the current block 801 can then be determined. Such information can then be employed to select the motion model. Such information can also be signaled to the decoder as residual information. The candidate index of the selected motion model in the candidate list is then signaled toward the decoder via the bitstream.
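To make the SATD criterion concrete, here is a minimal sketch of SATD-based candidate selection; the 4×4 Hadamard transform size, the `predict` callback standing in for motion compensation, and all names are assumptions for illustration rather than the patent's implementation:

```python
import numpy as np

# 4x4 Hadamard matrix used by the SATD cost.
H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]])

def satd4x4(diff):
    """Sum of absolute Hadamard-transformed differences for one 4x4 residual."""
    return np.abs(H4 @ diff @ H4.T).sum()

def select_model(current_block, candidate_list, predict):
    """Return (candidate index, model) minimizing SATD between the current
    4x4 block and the motion compensated prediction of each candidate."""
    costs = [satd4x4(current_block - predict(m)) for m in candidate_list]
    best = int(np.argmin(costs))
    return best, candidate_list[best]
```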
  • a selected motion model can be employed as prediction information, and motion estimation can be performed to find a more accurate motion model, referred to as a refined motion model. The differences between the refined motion model and the selected motion model can then be signaled in the bitstream.
  • the candidate list is generated by employing the same mechanisms as the encoder. This ensures the candidate list at the decoder contains the same list of motion models as the candidate list at the encoder.
  • the decoder parses the candidate index from the bitstream.
  • the decoder determines the selected motion model based on the candidate list and the candidate index.
  • differences between the refined motion model and the selected motion model can be parsed and added to the selected motion model to obtain the refined motion model at the decoder.
  • the selected and/or refined motion model is then used to derive the motion vector field of the current block 801 and perform motion compensation to reconstruct the current block 801 for use in a video sequence.
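Putting the decoder-side steps together, a minimal sketch is shown below; every helper name here is hypothetical, and `build_candidate_list` must mirror the encoder's construction exactly, including the pruning described above, so the parsed index points at the same model:

```python
def decode_selected_model(bitstream, block_ctx):
    # Build the list exactly as the encoder did, pruning included.
    candidates = build_candidate_list(block_ctx)
    idx = bitstream.parse_candidate_index()
    model = candidates[idx]
    # Optional refinement: parsed differences are added back to the
    # selected model to recover the refined motion model.
    if bitstream.has_refinement():
        model = model.apply_deltas(bitstream.parse_model_deltas())
    return model  # then drives MV field derivation and motion compensation
```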
  • both the encoder and the decoder may employ motion compensation.
  • the encoder employs motion compensation to reconstruct a current block based on motion models in the candidate list and selects the motion model that provides the best result according to predetermined criteria (e.g., SATD) .
  • the decoder employs motion compensation to reconstruct a current block 801 based on a selected motion model.
  • Motion compensation includes generating a motion vector field based on the motion vectors at CP 1 851, CP 2 852, CP 3 853 and/or CP 4 854, depending on the motion model.
  • the motion compensation process also determines a size of a motion compensation unit, which may be referred to as a sub-block, such as sub-block 740.
  • Each sub-block is assigned a motion vector from a corresponding point in the motion vector field.
  • the sub-block size is denoted as M×N and may be determined by many mechanisms. For example, the sub-block size can be set to 4×4 as a default value. As another example, the sub-block size can be derived based on motion vector differences of the control points CP 1 851, CP 2 852, CP 3 853 and/or CP 4 854 and the width and height of the current block. Sub-block size affects the granularity at which the motion vector field from the motion model is applied to the current block 801. For example, when M×N is set to 1×1, the motion compensation is performed at the pixel level. Larger M×N sizes result in larger areas of the current block 801 receiving the same motion vector from the motion vector field.
  • the motion vector of a pixel at a predefined location in a sub-block is calculated according to the motion model of the current block 801. That motion vector is then used as the motion vector representing the motion of the entire sub-block.
  • the predefined location could be, for an M×N sub-block, the center pixel (e.g., M/2, N/2), the top-left pixel (0, 0), the top-right pixel (M-1, 0), or another location in the sub-block.
  • the center location is assumed hereinafter for illustration.
  • the center pixel coordinates are calculated according to Equation 11:
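The equation itself was lost in extraction; a plausible reconstruction of Equation 11, assuming the sub-block indices i and j start at 0 and consistent with the definitions that follow, is:

$$x_{(i,j)} = M \cdot i + \frac{M}{2}, \qquad y_{(i,j)} = N \cdot j + \frac{N}{2}$$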
  • M indicates sub-block width
  • N indicates sub-block height
  • i indicates the ith sub-block counted from left to right
  • j indicates the jth sub-block counted from top to bottom
  • (x (i, j), y (i, j)) indicates the central pixel of the (i, j) sub-block relative to the top-left pixel of the current block.
  • the coordinates of CP 1 851, CP 2 852, CP 3 853 and/or CP 4 854 and the scaled motion vectors are used to derive motion vectors of each sub-block for the selected motion model.
  • the parameters a 1, a 2, a 3 and a 4 are derived from the coordinates of the control points and the scaled motion vectors according to Equation Set 2.
  • the motion vector (vx (i, j) , vy (i, j) ) of each sub-block is calculated according to the Equation Set 2 and rounded to the predefined motion vector precision, using the coordinates (x (i, j) , y (i, j) ) as input.
  • Equation Set 3 can be used directly if the motion model is represented by CP 1 851 and CP 2 852.
  • the parameters a 1, a 2, a 3, a 4, a 5 and a 6 are derived from the coordinates of the control points and the scaled motion vectors according to Equation Set 4. Then the motion vector (vx (i, j), vy (i, j)) of each sub-block is calculated according to Equation Set 4 and rounded to the predefined motion vector precision, using the coordinates (x (i, j), y (i, j)) as input.
  • Equation Set 5 can be used directly if the motion model is represented by CP 1 851, CP 2 852, and CP 3 853. Accordingly, motion compensation derives the motion vector (vx (i, j), vy (i, j)) of each sub-block according to the selected motion model of the current block 801.
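As an illustration of the sub-block derivation, here is a minimal sketch for a two-control-point model; the exact Equation Sets 2–5 are not reproduced in this extraction, so the formula below is the standard four-parameter affine construction from top-left and top-right control point motion vectors, stated as an assumption:

```python
def subblock_motion_field(v0, v1, W, H, M, N):
    """Derive one motion vector per M x N sub-block of a W x H block from
    the top-left (v0) and top-right (v1) control point motion vectors,
    sampling the affine field at each sub-block center (Equation 11)."""
    (v0x, v0y), (v1x, v1y) = v0, v1
    a = (v1x - v0x) / W   # horizontal rate of change of vx
    b = (v1y - v0y) / W   # horizontal rate of change of vy
    field = {}
    for j in range(H // N):
        for i in range(W // M):
            x, y = i * M + M / 2, j * N + N / 2    # sub-block center
            vx = a * x - b * y + v0x
            vy = b * x + a * y + v0y
            field[(i, j)] = (round(vx), round(vy))  # round to MV precision
    return field

# Example: a 16x16 block with 4x4 sub-blocks yields 16 motion vectors.
mv_field = subblock_motion_field((0, 0), (8, 0), W=16, H=16, M=4, N=4)
```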
  • the sub-block at the specified location may be copied and used as prediction information for the current sub-block.
  • an interpolation filter may be applied to the sub-block at the specified location to obtain the prediction of the current sub-block.
  • FIG. 9 is a flowchart of an example method 900 of pruning a motion model candidate list used for complex merge mode based inter-prediction, such as complex merge mode 800.
  • Method 900 may be employed to generate motion models, such as affine motion model 700, for inter-prediction, such as unidirectional inter-prediction 500 and bidirectional inter-prediction 600.
  • Method 900 can be employed by an encoder 300 and/or a decoder 400.
  • method 900 can be employed to determine motion vectors at block compression step 105 and/or block decoding step 113, for example by a motion estimation component 221, a motion compensation component 219, a motion compensation component 321, and/or motion compensation component 421.
  • Step 901 can be employed at a decoder or an encoder, but includes different mechanisms at the decoder than the encoder.
  • a decoder employs a receiver to receive a bitstream with a plurality of coded blocks including a current block and neighbor coded blocks. Data from the bitstream indicates that the current block should be decoded by employing complex merge mode based on the neighbor coded blocks.
  • the encoder partitions a frame into a plurality of blocks and encodes neighbor blocks prior to encoding a current block. The encoder determines to encode the current block by employing complex merge mode based on the neighbor coded blocks.
  • motion vectors for control points of the current block are determined based on the neighbor coded blocks.
  • the control points may include CP 1 851, CP 2 852, CP 3 853 and/or CP 4 854 as discussed with respect to complex merge mode 800. Further, the motion vectors for the control points can be determined based on neighbor coded blocks such as blocks A 0 860, A 1 861, A 2 862, B 0 863, B 1 864, B 2 865, B 3 866, and T r 867 as discussed with respect to complex merge mode 800.
  • the method 900 may proceed to step 905.
  • a plurality of motion models is generated based on the motion vectors for the control points.
  • the motion models may include a translation model, a four-parameter affine model, a six-parameter affine model, and/or an eight-parameter bilinear model as described by Equation Set 1, Equation Sets 2-3, Equation Sets 4-5, and Equation Sets 6-7, respectively.
  • the motion vectors at various combinations of the control points are employed to generate motion models. Further, the motion models may be scaled according to the reference index of the current block, for example according to Equation 8.
  • a candidate list is generated. Generating the candidate list includes inserting the motion models generated at step 905 into the candidate list in a predetermined order. Prior to inserting a current motion model into the candidate list, the current motion model is compared to the motion models already included in the candidate list. The current motion model is inserted into the candidate list when a difference between parameters for the current motion model and corresponding parameters of each motion model in the candidate list is greater than a threshold. The current motion model is rejected from the candidate list when a difference between parameters for the current motion model and corresponding parameters of at least one motion model in the candidate list is less than or equal to the threshold.
  • the parameters of the current motion model can be compared to motion model (s) in the candidate list and a threshold according to Equation 9.
  • the compared parameters are transform parameters, and each transform parameter of the current motion model is compared to a corresponding transform parameter of a motion model in the candidate list and a difference between the transform parameters is compared to a threshold.
  • the parameters of the current motion model can be compared to motion model (s) in the candidate list and a threshold according to Equation Set 10.
  • the compared parameters are the motion vectors for the control points of the current block, and each control point motion vector component of the current motion model is compared to a corresponding control point motion vector component of a motion model in the candidate list and a difference between the control point motion vectors components is compared to a threshold.
  • a higher order motion model can be degraded to a lower order motion model for comparison at step 907.
  • a degraded higher order motion model may be compared with a lower order motion model in the candidate list with a common number of parameters. The difference between transform parameters for the degraded motion model and corresponding transform parameters of the lower order motion model in the candidate list can then be compared with a threshold. When the difference is less than or equal to the threshold, the higher order motion model is rejected from the candidate list.
  • a higher order motion model in the candidate list could also be degraded for comparison with a lower order motion model being prepared for insertion into the candidate list.
  • generating the candidate list includes only comparing motion models with motion models in the candidate list when both models include a common number of parameters. In such a case, the higher order motion model is not degraded even when degradation is possible.
  • the motion models are only compared at step 907 when the current motion model and the corresponding motion model in the candidate list include a common reference index to a common reference block. In such a case, the motion models can be considered not redundant during candidate list creation when the motion models employ different reference indices and hence reference different reference blocks.
  • the motion models are only compared at step 907 when the current motion model and the motion model in the candidate list reference a common picture list for the current block. For example, a motion model that references a preceding reference frame and a motion model that references a subsequent frame can be considered not redundant during candidate list creation.
  • a bidirectional motion model may still be compared to a preceding reference unidirectional motion model and/or a subsequent reference unidirectional motion model.
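Tying the pieces of step 907 together, here is a minimal sketch of the pruned candidate list construction, reusing the hypothetical helpers sketched earlier (`can_compare`, `is_redundant`); the `cp_mvs` field, the `degrade_to_common_order` helper, and the list size cap of 5 are additional assumptions for illustration:

```python
def build_pruned_candidate_list(models, thx, thy, max_size=5, use_degradation=True):
    """Insert motion models in their predetermined order, skipping any
    model found redundant against one already in the list (step 907)."""
    candidates = []
    for cur in models:
        redundant = False
        for cand in candidates:
            a, b = cur, cand
            if a.num_params != b.num_params:
                if not use_degradation:
                    continue   # different orders: treated as not redundant
                # Degrade the higher order model so both share a parameter
                # count (degrade_to_common_order is a hypothetical helper).
                a, b = degrade_to_common_order(a, b)
            if can_compare(a, b) and is_redundant(a.cp_mvs, b.cp_mvs, thx, thy):
                redundant = True
                break
        if not redundant:
            candidates.append(cur)
        if len(candidates) == max_size:
            break
    return candidates
```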
  • a selected motion model is determined based on the candidate list.
  • a rate distortion optimization process can be employed to select a motion model from the candidate list.
  • the selected motion model can then be employed to encode the current block.
  • a candidate index for the selected motion model can then be determined for transmission to the decoder as prediction information.
  • step 909 includes parsing the bitstream to obtain the candidate index indicating a selected motion model from the candidate list.
  • the decoder can then obtain the selected motion model from the candidate list based on the candidate index.
  • Step 910 can be employed at a decoder or an encoder, but includes different mechanisms at the decoder than the encoder.
  • the current block is reconstructed by performing motion compensation of the current block based on the selected motion model.
  • the reconstructed block is included in a reconstructed frame.
  • the reconstructed frame is then included in a video sequence.
  • the decoder forwards the video frame including the current block for display as part of a video sequence.
  • the candidate index determined at step 909 is encoded in a bitstream to indicate the selected motion model from the candidate list to the decoder.
  • the encoder can then transmit the bitstream including the candidate index and any other prediction information toward a decoder to support reconstruction of the current block as part of a video sequence.
  • FIG. 10 is a schematic diagram of an example video coding device 1000 according to an embodiment of the disclosure.
  • the video coding device 1000 is suitable for implementing the disclosed examples/embodiments as described herein.
  • the video coding device 1000 comprises downstream ports 1020, upstream ports 1050, and/or transceiver units (Tx/Rx) 1010, including transmitters and/or receivers for communicating data upstream and/or downstream over a network.
  • the video coding device 1000 also includes a processor 1030 including a logic unit and/or central processing unit (CPU) to process the data and a memory 1032 for storing the data.
  • the video coding device 1000 may also comprise optical-to-electrical (OE) components, electrical-to-optical (EO) components, and/or wireless communication components coupled to the upstream ports 1050 and/or downstream ports 1020 for communication of data via optical or wireless communication networks.
  • the video coding device 1000 may also include input and/or output (I/O) devices 1060 for communicating data to and from a user.
  • the I/O devices 1060 may include output devices such as a display for displaying video data, speakers for outputting audio data, etc.
  • the I/O devices 1060 may also include input devices, such as a keyboard, mouse, trackball, etc. and/or corresponding interfaces for interacting with such output devices.
  • the processor 1030 is implemented by hardware and software.
  • the processor 1030 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor) , field-programmable gate arrays (FPGAs) , application specific integrated circuits (ASICs) , and digital signal processors (DSPs) .
  • the processor 1030 is in communication with the downstream ports 1020, Tx/Rx 1010, upstream ports 1050, and memory 1032.
  • the processor 1030 comprises a coding module 1014.
  • the coding module 1014 implements the disclosed embodiments described above, such as methods 100, and/or 900, unidirectional inter-prediction 500, bidirectional inter-prediction 600, affine motion model 700, complex merge mode 800, and/or any other method/mechanism described herein. Further, the coding module 1014 may implement a codec system 200, an encoder 300, and/or a decoder 400. Accordingly, coding module 1014 can be employed to generate a candidate list. Specifically, the coding module 1014 may generate a plurality of motion models based on motion vectors for control points of a current block. The coding module 1014 may iteratively insert the motion models into a candidate list while employing a pruning mechanism.
  • the coding module 1014 inserts a current motion model into the candidate list when a difference between parameters for the current motion model and corresponding parameters of each motion model in the candidate list is greater than a threshold. Also, the coding module 1014 rejects a current motion model from the candidate list when a difference between parameters for the current motion model and corresponding parameters of at least one motion model in the candidate list is less than or equal to the threshold. Further, coding module 1014 effects a transformation of the video coding device 1000 to a different state. Alternatively, the coding module 1014 can be implemented as instructions stored in the memory 1032 and executed by the processor 1030 (e.g., as a computer program product stored on a non-transitory medium) .
  • the memory 1032 comprises one or more memory types such as disks, tape drives, solid-state drives, read only memory (ROM) , random access memory (RAM) , flash memory, ternary content-addressable memory (TCAM) , static random-access memory (SRAM) , etc.
  • the memory 1032 may be used as an over-flow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.
  • FIG. 11 is an embodiment of a device 1100 for pruning a motion model candidate list used for complex merge mode based inter-prediction, such as complex merge mode 800.
  • Device 1100 may be employed to implement method 900.
  • device 1100 may be employed to generate motion models, such as affine motion model 700, for inter-prediction, such as unidirectional inter-prediction 500 and bidirectional inter-prediction 600.
  • Device 1100 can be employed as an encoder 300 and/or a decoder 400.
  • device 1100 can be employed to determine motion vectors at block compression step 105 and/or block decoding step 113, for example by a motion estimation component 221, a motion compensation component 219, a motion compensation component 321, and/or motion compensation component 421.
  • the device 1100 may include a receiver 1101 configured to receive a bitstream with a plurality of coded blocks including a current block and neighbor coded blocks.
  • the device 1100 also includes a motion vector determination module 1103 configured to determine motion vectors for control points of the current block from the neighbor coded blocks.
  • the device 1100 also includes a motion model generation module 1105 configured to generate a plurality of motion models based on the motion vectors for the control points.
  • the device 1100 also includes a candidate list module 1107 configured to generate a candidate list. Specifically, the candidate list module 1107 is configured to insert a current motion model into the candidate list when a difference between parameters for the current motion model and corresponding parameters of each motion model in the candidate list is greater than a threshold.
  • the candidate list module 1107 is configured to reject the current motion model from the candidate list when a difference between parameters for the current motion model and corresponding parameters of at least one motion model in the candidate list is less than or equal to the threshold.
  • the device 1100 may include a parsing module 1109 configured to parse the bitstream to obtain a candidate index indicating a selected motion model from the candidate list.
  • the device 1100 may also include a reconstruction module 1111 configured to reconstruct the current block by performing motion compensation of the current block based on the selected motion model, and forward a video frame including the current block for display as part of a video sequence.
  • device 1100 is depicted as a decoder device. However, device 1100 may be converted to an encoder device by omitting the receiver 1101, the parsing module 1109, and the reconstruction module 1111. In an encoder, such modules could be replaced by an encoding module (e.g., coding module 1014) configured to perform rate distortion optimization to select a selected motion model from the candidate list to encode the current block, and to encode a candidate index in a bitstream, the candidate index indicating the selected motion model from the candidate list. As an encoder, the device 1100 also includes a transmitter (e.g., Tx/Rx 1010) configured to transmit the bitstream toward a decoder for reconstruction as a video sequence.
  • a first component is directly coupled to a second component when there are no intervening components, except for a line, a trace, or another medium between the first component and the second component.
  • the first component is indirectly coupled to the second component when there are intervening components other than a line, a trace, or another medium between the first component and the second component.
  • the term “coupled” and its variants include both directly coupled and indirectly coupled. The use of the term “about” means a range including ±10% of the subsequent number unless otherwise stated.

Abstract

The invention relates to a mechanism for pruning a motion model candidate list for inter-prediction. The mechanism includes determining motion vectors for control points of a current block from neighbor coded blocks. A plurality of motion models are generated based on the motion vectors for the control points. A candidate list is generated. A motion model is inserted into the candidate list when a difference between parameters for a current motion model and corresponding parameters of each motion model in the candidate list is greater than a threshold. Otherwise, the current motion model is rejected from the candidate list. A candidate index indicates a selected motion model from the candidate list for encoding/decoding.
PCT/CN2018/109618 2017-10-13 2018-10-10 Motion model candidate list pruning for inter-prediction WO2019072187A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201762572214P 2017-10-13 2017-10-13
US62/572,214 2017-10-13
US201862724387P 2018-08-29 2018-08-29
US62/724,387 2018-08-29

Publications (1)

Publication Number Publication Date
WO2019072187A1 true WO2019072187A1 (fr) 2019-04-18

Family

ID=66100428

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/109618 WO2019072187A1 (fr) Motion model candidate list pruning for inter-prediction

Country Status (1)

Country Link
WO (1) WO2019072187A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112868234A (zh) * 2019-09-24 2021-05-28 SZ DJI Technology Co., Ltd. Motion estimation method, system, and storage medium
CN114071147A (zh) * 2020-07-29 2022-02-18 Sichuan University VVC motion compensation method based on a bilinear model
WO2022088003A1 (fr) * 2020-10-30 2022-05-05 Huawei Technologies Co., Ltd. Information transmission method, lightweight processing method, and related communication apparatus
WO2023078430A1 (fr) * 2021-11-05 2023-05-11 Beijing Bytedance Network Technology Co., Ltd. Method, apparatus, and medium for video processing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104935938A (zh) * 2015-07-15 2015-09-23 Harbin Institute of Technology Inter-frame prediction method in a hybrid video coding standard
WO2016008408A1 (fr) * 2014-07-18 2016-01-21 Mediatek Singapore Pte. Ltd. Method of motion vector derivation for video coding
WO2017026681A1 (fr) * 2015-08-07 2017-02-16 LG Electronics Inc. Inter prediction method and device in video coding system
WO2017118411A1 (fr) * 2016-01-07 2017-07-13 Mediatek Inc. Method and apparatus for affine inter prediction for video coding system
WO2017148345A1 (fr) * 2016-03-01 2017-09-08 Mediatek Inc. Method and apparatus of video coding with affine motion compensation

Similar Documents

Publication Publication Date Title
US11146809B2 (en) Adaptive interpolation filter
US10609384B2 (en) Restriction on sub-block size derivation for affine inter prediction
US11877006B2 (en) Intra-prediction using a cross-component linear model in video coding
US11917130B2 (en) Error mitigation in sub-picture bitstream based viewpoint dependent video coding
US10595019B2 (en) Noise suppression filter parameter estimation for video coding
US20190007699A1 (en) Decoder Side Motion Vector Derivation in Video Coding
US11109026B2 (en) Luma and chroma block partitioning
US10841794B2 (en) Adaptive motion vector resolution
US20180295364A1 (en) Noise Suppression Filter
US11606571B2 (en) Spatial varying transform for video coding
WO2019079611A1 Neighbor block availability dependent motion vector candidate list generation
WO2019072187A1 Motion model candidate list pruning for inter-prediction
WO2020057516A1 Partitioning with high level constraint
WO2020069652A1 Candidate MV construction method for HMVP mode

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18866373

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18866373

Country of ref document: EP

Kind code of ref document: A1