CN116491120A - Method and apparatus for affine motion compensated prediction refinement

Info

Publication number: CN116491120A
Application number: CN202180070773.5A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 陈伟, 修晓宇, 陈漪纹, 马宗全, 朱弘正, 郭哲玮, 王祥林, 于冰
Applicant and current assignee: Beijing Dajia Internet Information Technology Co Ltd

Classifications

    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/513 Processing of motion vectors
    • H04N19/527 Global motion vector estimation
    • H04N19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H04N19/625 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]

Abstract

A method and apparatus for affine motion compensated prediction refinement (AMPR) are provided. The method comprises the following steps: determining whether AMPR is applied to a Coding Unit (CU), and adjusting coefficients of a motion compensated interpolation filter used to generate intermediate prediction samples in response to determining that AMPR is applied to the CU.

Description

Method and apparatus for affine motion compensated prediction refinement
Cross Reference to Related Applications
The present application claims priority to U.S. Provisional Application No. 63/091,892, entitled "Affine Motion-Compensated Prediction Refinement," filed on October 14, 2020, the entire contents of which are incorporated herein by reference for all purposes.
Technical Field
The present disclosure relates to video coding (coding) and compression, and in particular, but not limited to, methods and apparatus for affine motion compensated prediction refinement (AMPR) in video coding.
Background
Various video codec techniques may be used to compress video data. Video coding is performed according to one or more video coding standards. For example, some well-known video codec standards today include Versatile Video Coding (VVC), High Efficiency Video Coding (HEVC, also known as H.265 or MPEG-H Part 2), and Advanced Video Coding (AVC, also known as H.264 or MPEG-4 Part 10), developed jointly by ISO/IEC MPEG and ITU-T VCEG. AOMedia Video 1 (AV1) was developed by the Alliance for Open Media (AOM) as a successor to its preceding standard VP9. Audio Video coding Standard (AVS), which refers to digital audio and digital video compression standards, is another family of video compression standards developed by the Audio and Video coding Standard workgroup of China. Most existing video codec standards are built upon the well-known hybrid video codec framework, i.e., they use block-based prediction methods, such as inter prediction and intra prediction, to reduce the redundancy present in video pictures or sequences, and transform coding to compact the energy of the prediction errors. An important goal of video codec technology is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality.
The first generation of AVS standards includes the Chinese national standard "Information Technology, Advanced Audio Video Coding, Part 2: Video" (known as AVS1) and "Information Technology, Advanced Audio Video Coding, Part 16: Broadcast Television Video" (known as AVS+). It can offer around a 50% bit-rate saving compared to the MPEG-2 standard at the same perceptual quality. The video part of the AVS1 standard was issued as a Chinese national standard in February 2006. The second generation of AVS standards includes the Chinese national standard series "Information Technology, Efficient Multimedia Coding" (known as AVS2), which mainly targets the transmission of additional high-definition TV programs. The coding efficiency of AVS2 is twice that of AVS+. AVS2 was issued as a Chinese national standard in May 2016. Meanwhile, the video part of the AVS2 standard was submitted by the Institute of Electrical and Electronics Engineers (IEEE) as one international standard for applications. The AVS3 standard is a new generation video codec standard for UHD video applications, aiming at surpassing the coding efficiency of the latest international standard HEVC. In March 2019, at the 68th AVS meeting, the AVS3-P2 baseline was finished, providing an approximately 30% bit-rate saving relative to the HEVC standard. Currently, there is one reference software, called the High Performance Model (HPM), maintained by the AVS group to demonstrate one reference implementation of the AVS3 standard.
Disclosure of Invention
The present disclosure provides examples relating to techniques for AMPR of the AVS3 standard.
According to a first aspect of the present disclosure, a method for AMPR is provided. The method comprises the following steps: determining whether AMPR is applied to a Coding Unit (CU); and in response to determining that AMPR is applied to the CU, adjusting coefficients of a motion compensated interpolation filter used to generate the intermediate prediction samples.
According to a second aspect of the present disclosure, there is provided a method for predicting a sample at a pixel location in a sub-block by implementing AMPR. The method comprises the following steps: generating a plurality of affine motion compensated predictions at the pixel location in the sub-block and at a plurality of neighboring pixel locations; obtaining initialized coefficients of a filter by initializing the coefficients of the filter to a set of predefined values; scaling the initialized coefficients using one or more online scaling factors to obtain a scaling filter; and obtaining a refined prediction of the sample at the pixel location in the sub-block based on the plurality of affine motion compensated predictions using the scaling filter.
According to a third aspect of the present disclosure, a method for AMPR is provided. The method comprises the following steps: determining whether a neighboring sample relative to a sample location in a sub-block is outside the sub-block; and in response to determining that the neighboring sample is outside the sub-block, determining a padding sample based on one or more reference samples at one or more integer locations and copying the padding sample to the neighboring sample location for a filter-based AMPR implementation.
According to a fourth aspect of the present disclosure, there is provided an apparatus for AMPR. The device comprises: one or more processors, and a memory configured to store instructions executable by the one or more processors. When executing the instructions, the one or more processors are configured to perform any method according to the first aspect.
According to a fifth aspect of the present disclosure, there is provided an apparatus for predicting a sample at a pixel location in a sub-block by implementing AMPR. The device comprises: one or more processors, and a memory configured to store instructions executable by the one or more processors. When executing the instructions, the one or more processors are configured to perform any method according to the second aspect.
According to a sixth aspect of the present disclosure there is provided an apparatus for AMPR. The device comprises: one or more processors, and a memory configured to store instructions executable by the one or more processors. When executing the instructions, the one or more processors are configured to perform any method according to the third aspect.
According to a seventh aspect of the present disclosure there is provided a non-transitory computer-readable storage medium for AMPR storing computer-executable instructions which, when executed by one or more computer processors, cause the one or more computer processors to perform any method according to the first aspect.
According to an eighth aspect of the present disclosure there is provided a non-transitory computer-readable storage medium storing computer-executable instructions for predicting a sample point at a pixel location in a sub-block by implementing AMPR, which instructions, when executed by one or more computer processors, cause the one or more computer processors to perform any method according to the second aspect.
According to a ninth aspect of the present disclosure there is provided a non-transitory computer-readable storage medium for AMPR storing computer-executable instructions which, when executed by one or more computer processors, cause the one or more computer processors to perform any method according to the third aspect.
Drawings
A more particular description of the examples of the disclosure will be rendered by reference to specific examples that are illustrated in the appended drawings. Considering that these drawings depict only some examples and are therefore not to be considered limiting of scope, these examples will be described and explained with additional specificity and detail through the use of these drawings.
Fig. 1 is a block diagram illustrating a video encoder according to some implementations of the present disclosure.
Fig. 2 is a block diagram illustrating a video decoder according to some implementations of the present disclosure.
Fig. 3A-3E are diagrams illustrating multi-type tree splitting patterns according to some implementations of the present disclosure.
Fig. 4 is a schematic diagram illustrating one example of a bi-directional optical flow (BIO) model in accordance with some implementations of the present disclosure.
Fig. 5A is a schematic diagram illustrating one example of a four-parameter affine model according to some implementations of the present disclosure.
Fig. 5B is a schematic diagram illustrating one example of a four-parameter affine model according to some implementations of the present disclosure.
Fig. 6 is a schematic diagram illustrating one example of a six-parameter affine model according to some implementations of the present disclosure.
Fig. 7 illustrates a prediction refinement with optical flow (PROF) process for affine mode according to some implementations of the present disclosure.
Fig. 8 illustrates one example of the calculation of horizontal and vertical offsets from a sample position to a particular position of a sub-block at which the sub-block MV is derived, according to some implementations of the present disclosure.
Fig. 9 illustrates one example of the sub-blocks inside one affine CU in accordance with some implementations of the present disclosure.
Fig. 10A illustrates one example of a five tap diamond filter according to some implementations of the present disclosure.
Fig. 10B illustrates one example of a nine tap diamond filter according to some implementations of the present disclosure.
Fig. 11A illustrates one example of a five-tap diamond filter scaled by MV differences in the horizontal and vertical directions according to some implementations of the present disclosure.
Fig. 11B illustrates one example of a nine-tap diamond filter scaled by MV differences in the horizontal and vertical directions according to some implementations of the present disclosure.
Fig. 12 illustrates a spatial relationship of one fractional position and its four adjacent integer positions according to some implementations of the present disclosure.
Fig. 13 illustrates one example of extending boundary prediction samples of one sub-block to its extended region, according to some implementations of the present disclosure.
Fig. 14 is a block diagram illustrating an apparatus for AMPR in accordance with some implementations of the present disclosure.
Fig. 15 is a flow chart illustrating one AMPR procedure in accordance with some implementations of the present disclosure.
Fig. 16 is a flow chart illustrating a process for predicting a sample at a pixel location in a sub-block by implementing AMPR, in accordance with some implementations of the present disclosure.
Fig. 17 is a flow chart illustrating one AMPR procedure in accordance with some implementations of the present disclosure.
Detailed Description
Reference will now be made in detail to specific implementations, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific, non-limiting details are set forth in order to provide an understanding of the subject matter presented herein. However, it will be apparent to those of ordinary skill in the art that a variety of different alternatives may be used. For example, it will be apparent to those of ordinary skill in the art that the subject matter presented herein may be implemented on many types of electronic devices having digital video capabilities.
Reference throughout this specification to "one embodiment," "an example," "some embodiments," "some examples," or similar language means that a particular feature, structure, or characteristic described is included in at least one embodiment or example. Features, structures, elements, or characteristics described with respect to one or some embodiments may also be applicable to other embodiments unless explicitly stated otherwise.
Throughout this disclosure, unless explicitly stated otherwise, the terms "first," "second," "third," and the like, are used as names for reference only to related elements, such as devices, components, compositions, steps, etc., and do not imply any spatial or temporal order. For example, a "first device" and a "second device" may refer to two separately formed devices, or two portions, components, or operational states of the same device, and may be arbitrarily named.
The terms "module," "sub-module," "circuit," "sub-circuit," "circuitry," "sub-circuitry," "unit," or "sub-unit" may include a memory (shared, dedicated, or group) that stores code or instructions that may be executed by one or more processors. A module may include one or more circuits with or without stored code or instructions. The module or circuit may include one or more components connected directly or indirectly. These components may or may not be physically attached to each other or adjacent to each other.
As used herein, the terms "if" or "when" may be understood to mean "upon" or "in response to," depending on the context. These terms, if they appear in a claim, may not indicate that the relevant limitations or features are conditional or optional. For example, a method may comprise the steps of: i) when or if condition X is present, performing a function or action X', and ii) when or if condition Y is present, performing a function or action Y'. The method may be implemented with both the capability of performing function or action X' and the capability of performing function or action Y'. Thus, functions X' and Y' may both be performed, at different times, over multiple executions of the method.
The units or modules may be implemented solely by software, solely by hardware or by a combination of hardware and software. In a software-only implementation, for example, a unit or module may include functionally related code blocks or software components that are directly or indirectly linked together in order to perform a specific function.
Fig. 1 shows a block diagram illustrating a block-based hybrid video encoder 100, the hybrid video encoder 100 may be used in conjunction with many video codec standards that use block-based processing. In encoder 100, a video frame is partitioned into a plurality of video blocks for processing. For each given video block, a prediction is formed based on either an inter prediction method or an intra prediction method. In inter prediction, one or more prediction values are formed by motion estimation and motion compensation based on pixels from a previously reconstructed frame. In intra prediction, a prediction value is formed based on pixels reconstructed in the current frame. Through mode decision, the best predictor may be selected to predict the current block.
Intra prediction (also referred to as "spatial prediction") predicts a current video block using pixels from samples (referred to as reference samples) of neighboring blocks already encoded in the same video picture and/or slice. Spatial prediction reduces the spatial redundancy inherent in video signals.
Inter prediction (also referred to as "temporal prediction") predicts a current video block using reconstructed pixels from an already encoded video picture. Temporal prediction reduces the inherent temporal redundancy in video signals. The temporal prediction signal for a given Coding Unit (CU) or coding block is typically signaled by one or more Motion Vectors (MVs) that indicate the amount and direction of motion between the current CU and its temporal reference. In addition, if multiple reference pictures are supported, one reference picture index is additionally transmitted, which is used to identify from which reference picture in the reference picture store the temporal prediction signal came.
After spatial and/or temporal prediction is performed, the intra/inter mode decision circuitry 121 in the encoder 100 selects the best prediction mode, for example, based on a rate-distortion optimization method. The block predictor 120 is then subtracted from the current video block, and the resulting prediction residual is decorrelated using the transform circuitry 102 and quantized by the quantization circuitry 104. The resulting quantized residual coefficients are dequantized by the dequantization circuitry 116 and inverse transformed by the inverse transform circuitry 118 to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Loop filtering 115, such as a deblocking filter, a Sample Adaptive Offset (SAO) filter, and/or an Adaptive Loop Filter (ALF), may further be applied to the reconstructed CU before it is placed in the reference picture store of the picture buffer 117 and used to code future video blocks. To form the output video bitstream 114, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit 106 to be further compressed and packed to form the bitstream.
For example, deblocking filters are available in AVC, HEVC, and current versions of VVC. In HEVC, an additional loop filter called SAO (sample adaptive offset) is defined to further improve the codec efficiency. In the current version of the VVC standard, still another loop filter called ALF (adaptive loop filter) is being actively studied, and has a high possibility of being incorporated into the final standard.
These loop filter operations are optional. Performing these operations helps to improve codec efficiency and visual quality. As decisions made by the encoder 100, they may also be turned off to save computational complexity.
It should be noted that intra prediction is typically based on unfiltered reconstructed pixels, whereas inter prediction is based on filtered reconstructed pixels if the encoder 100 turns on these filter options.
Fig. 2 is a block diagram illustrating a block-based video decoder 200 that may be used in connection with many video codec standards. The decoder 200 is similar to the reconstruction-related portion located in the encoder 100 of fig. 1. In decoder 200, an incoming video bitstream 201 is first decoded by entropy decoding 202 to derive quantized coefficient levels and prediction related information. The quantized coefficient levels are then processed through inverse quantization 204 and inverse transformation 206 to obtain a reconstructed prediction residual. The block predictor mechanism implemented in the intra/inter mode selector 212 is configured to perform intra prediction 208 or motion compensation 210 based on the decoded prediction information. By using adder 214, the reconstructed prediction residual from inverse transform 206 and the prediction output generated by the block predictor mechanism are added to obtain a set of unfiltered reconstructed pixels.
The reconstructed block may further pass through a loop filter 209 before it is stored in a picture buffer 213, which serves as a reference picture store. The reconstructed video in the picture buffer 213 may then be sent out to drive the display device and used to predict future video blocks. With loop filter 209 turned on, a filtering operation is performed on these reconstructed pixels to derive the final reconstructed video output 222.
The above-mentioned video coding standards, such as HEVC and AVS3, are conceptually similar. For example, they all use a block-based hybrid video codec framework. The block partitioning schemes in some standards are described in detail below.
HEVC partitions blocks based on quadtrees only. The basic unit for compression is called a Coding Tree Unit (CTU). Each CTU may contain one Coding Unit (CU) or be recursively split into four smaller CUs until a predefined minimum CU size is reached. Each CU, also referred to as a leaf CU, includes one or more Prediction Units (PUs) and a Transform Unit (TU) tree.
In AVS3, one Coding Tree Unit (CTU) is split into CUs based on a quadtree/binary-tree/extended-quadtree structure to adapt to varying local characteristics. Furthermore, the concept of multiple partition unit types in HEVC is removed, i.e., the separation of CU, PU, and TU does not exist in AVS3. Instead, each CU always serves as the basic unit for both prediction and transform, without further partitioning. In the tree partition structure of AVS3, one CTU is first partitioned based on the quadtree structure. Each quadtree leaf node may then be further partitioned based on the binary-tree and extended-quadtree structures.
Fig. 3A-3E are diagrams illustrating multi-type tree splitting patterns according to some implementations of the present disclosure. As shown in fig. 3A-3E, there are five split types in the multi-type tree structure: quaternary segmentation 301, vertical binary segmentation 302, horizontal binary segmentation 303, vertical extended quaternary segmentation 304, and horizontal extended quaternary segmentation 305.
In the current VVC and AVS3 standards, block-based motion compensation may be applied to achieve a tradeoff among coding efficiency, complexity, and memory access bandwidth. The average prediction accuracy is lower than that of pixel-based prediction because all pixels within each block or sub-block share the same block-level motion vector. To improve the prediction accuracy at each pixel, PROF for affine mode is adopted as a coding tool in the current VVC standard. In AVS3, no similar tool exists.
As described in detail below, some examples of the present disclosure provide alternative optical flow-based methods to improve affine pattern prediction accuracy.
Affine mode
In HEVC, only the translational motion model is applied for motion compensated prediction. In the real world, however, there are many kinds of motion, e.g., zoom in/out, rotation, perspective motions, and other irregular motions. In VVC and AVS3, affine motion compensated prediction is applied by signaling one flag for each inter coding block to indicate whether the translational motion model or the affine motion model is applied for inter prediction. In the current VVC and AVS3 designs, two affine modes, including a four-parameter affine mode and a six-parameter affine mode, are supported for one affine coding block.
The four-parameter affine model may have the following parameters: two parameters for translational movement in the horizontal and vertical directions, respectively, one parameter for zoom motion, and one parameter for rotational motion, shared by both directions. In the four-parameter affine model, the horizontal zoom parameter may be equal to the vertical zoom parameter, and the horizontal rotation parameter may be equal to the vertical rotation parameter. To achieve a better accommodation of the motion vectors and affine parameters, those affine parameters may be derived from two MVs, also referred to as Control Point Motion Vectors (CPMVs), located at the top-left and top-right corners of the current block.
Figs. 5A-5B are diagrams illustrating examples of the four-parameter affine model according to some implementations of the present disclosure. As shown in Figs. 5A-5B, the affine motion field of the block is described by two control point MVs $(V_0, V_1)$. Based on the control point motion, the motion field $(v_x, v_y)$ of one affine coded block is described as in the following formula (1):

$$v_x = \frac{(v_{1x} - v_{0x})}{w}x - \frac{(v_{1y} - v_{0y})}{w}y + v_{0x}, \qquad v_y = \frac{(v_{1y} - v_{0y})}{w}x + \frac{(v_{1x} - v_{0x})}{w}y + v_{0y} \tag{1}$$

where $(v_{0x}, v_{0y})$ and $(v_{1x}, v_{1y})$ are the top-left and top-right control point MVs of the current block, respectively, and $w$ is the width of the block.
The six-parameter affine mode may have the following parameters: two parameters for translational movement in the horizontal and vertical directions, respectively, two parameters for zoom motion and rotational motion in the horizontal direction, respectively, and another two parameters for zoom motion and rotational motion in the vertical direction, respectively. The six-parameter affine motion model is coded with three CPMVs.

Fig. 6 is a schematic diagram illustrating one example of the six-parameter affine model according to some implementations of the present disclosure. As shown in Fig. 6, the three control points of one six-parameter affine block 601 are located at the top-left, top-right, and bottom-left corners of the block. The motion at the top-left control point is related to the translational motion, the motion at the top-right control point is related to the rotation and zoom motions in the horizontal direction, and the motion at the bottom-left control point is related to the rotation and zoom motions in the vertical direction. Compared to the four-parameter affine motion model, in the six-parameter model the rotation and zoom motions in the horizontal direction may be different from those in the vertical direction.
In some examples, when $(V_0, V_1, V_2)$ are the MVs at the top-left, top-right, and bottom-left corners of the current block in Fig. 6, the motion vector $(v_x, v_y)$ of each sub-block is derived using the three MVs at the control points as in the following equation (2):

$$v_x = v_{0x} + \frac{(v_{1x} - v_{0x})}{w}x + \frac{(v_{2x} - v_{0x})}{h}y, \qquad v_y = v_{0y} + \frac{(v_{1y} - v_{0y})}{w}x + \frac{(v_{2y} - v_{0y})}{h}y \tag{2}$$

where $(v_{0x}, v_{0y})$, $(v_{1x}, v_{1y})$, and $(v_{2x}, v_{2y})$ are the top-left, top-right, and bottom-left control point MVs of the current block, and $w$ and $h$ are the width and height of the block, respectively.
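As an illustration, the following minimal Python sketch (the helper names are ours, not from this disclosure) evaluates the motion fields of equations (1) and (2) at one pixel position (x, y):

```python
# Minimal sketch of the affine motion fields in equations (1) and (2).
# v0, v1, v2 are control-point MVs given as (vx, vy) tuples; names are illustrative.

def affine_mv_4param(v0, v1, w, x, y):
    """Four-parameter model, equation (1): CPMVs at the top-left and top-right corners."""
    a = (v1[0] - v0[0]) / w   # combined zoom/rotation term, shared by both directions
    b = (v1[1] - v0[1]) / w
    return (a * x - b * y + v0[0], b * x + a * y + v0[1])

def affine_mv_6param(v0, v1, v2, w, h, x, y):
    """Six-parameter model, equation (2): CPMVs at top-left, top-right, bottom-left."""
    vx = v0[0] + (v1[0] - v0[0]) / w * x + (v2[0] - v0[0]) / h * y
    vy = v0[1] + (v1[1] - v0[1]) / w * x + (v2[1] - v0[1]) / h * y
    return (vx, vy)
```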
Prediction Refinement (PROF) using optical flow for affine mode
In order to improve affine motion compensation accuracy, a PROF is employed in VVC, which refines sub-block-based affine motion compensation based on an optical flow model. Specifically, after performing sub-block-based affine motion compensation, each luminance prediction sample of one affine block is modified by one sample refinement value derived based on an optical flow equation. In some examples, the operation of the PROF can be summarized in the following four steps.
In the first step, sub-block based affine motion compensation is performed to generate the sub-block prediction $I(i,j)$ using the sub-block MV derived from equation (1) above for the four-parameter affine model or equation (2) above for the six-parameter affine model.
Furthermore, in the second step, the spatial gradients $g_x(i,j)$ and $g_y(i,j)$ of each prediction sample are calculated as in the following formula (3):

$$g_x(i,j) = I(i+1,j) - I(i-1,j), \qquad g_y(i,j) = I(i,j+1) - I(i,j-1) \tag{3}$$
thus, to calculate the gradient, one additional row and/or column of prediction samples needs to be generated on each of the four sides of one sub-block, which expands the 4x4 sub-block into a 6x6 sub-block. To reduce memory bandwidth and complexity, samples on the extended boundaries are copied from the nearest integer pixel locations in the reference picture to avoid additional interpolation processes.
Further, in the third step, the luma prediction refinement value is calculated by the following equation (4):

$$\Delta I(i,j) = g_x(i,j) \cdot \Delta v_x(i,j) + g_y(i,j) \cdot \Delta v_y(i,j) \tag{4}$$

where $\Delta v(i,j)$ is the difference between the pixel MV computed for sample location $(i,j)$, denoted by $v(i,j)$, and the sub-block MV of the sub-block to which pixel $(i,j)$ belongs.
Fig. 7 illustrates the PROF process for affine mode according to some implementations of the present disclosure. In PROF, after the prediction refinement is added to the original prediction samples, a clipping operation "clip3" is performed to clip the values of the refined prediction samples to within 15 bits, as shown in the following equations:

$$I_r(i,j) = I(i,j) + \Delta I(i,j)$$
$$I_r(i,j) = \mathrm{clip3}(-dILimit,\ dILimit - 1,\ I_r(i,j))$$
$$dILimit = 1 \ll \max(13,\ \mathrm{BitDepth} + 1)$$

where $I(i,j)$ and $I_r(i,j)$ are the original and refined prediction sample values at position $(i,j)$, respectively. The function $\mathrm{clip3}(\min, \max, val)$ limits a given value $val$ to the range $[\min, \max]$.
Since the affine model parameters and the pixel positions relative to the center of a sub-block do not change from sub-block to sub-block, $\Delta v(i,j)$ can be calculated for the first sub-block and reused for the other sub-blocks in the same CU. Letting $\Delta x$ and $\Delta y$ be the horizontal and vertical offsets from the sample location $(i,j)$ to the center of the sub-block to which the sample belongs, $\Delta v(i,j)$ can be derived as shown in the following formula (5):

$$\Delta v_x(i,j) = c \cdot \Delta x + d \cdot \Delta y, \qquad \Delta v_y(i,j) = e \cdot \Delta x + f \cdot \Delta y \tag{5}$$

Based on the affine sub-block MV derivation equations (1) and (2), the parameters $c$, $d$, $e$, and $f$ in equation (5) can be derived. Specifically, for the four-parameter affine model, the parameters $c$, $d$, $e$, and $f$ can be derived as shown in the following equation:

$$c = f = \frac{v_{1x} - v_{0x}}{w}, \qquad e = -d = \frac{v_{1y} - v_{0y}}{w}$$

Furthermore, for the six-parameter affine model, the parameters $c$, $d$, $e$, and $f$ may be derived as shown in the following equation:

$$c = \frac{v_{1x} - v_{0x}}{w}, \quad d = \frac{v_{2x} - v_{0x}}{h}, \quad e = \frac{v_{1y} - v_{0y}}{w}, \quad f = \frac{v_{2y} - v_{0y}}{h}$$

where $(v_{0x}, v_{0y})$, $(v_{1x}, v_{1y})$, and $(v_{2x}, v_{2y})$ are the top-left, top-right, and bottom-left control point MVs of the current coding block, and $w$ and $h$ are the width and height of the block. In PROF, the MV differences $\Delta v_x$ and $\Delta v_y$ are always derived at the precision of 1/32 pixel.
Finally, in the fourth step, the luma prediction refinement $\Delta I(i,j)$ is added to the sub-block prediction $I(i,j)$. The final prediction $I'(i,j)$ for the sample at position $(i,j)$ is generated as shown in the following equation (6):

$$I'(i,j) = I(i,j) + \Delta I(i,j) \tag{6}$$
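As a minimal illustration of steps two to four, the following Python sketch (an informal rendering, not the normative process; the one-sample extension of the sub-block prediction is assumed to be available as input) applies equations (3), (4), and (6) together with the clipping described above:

```python
import numpy as np

def prof_refine(I, dvx, dvy, bit_depth=10):
    """I: (h+2, w+2) sub-block prediction extended by one sample on each side;
    dvx/dvy: (h, w) per-sample MV differences at 1/32-pel precision.
    Returns the refined (h, w) prediction."""
    gx = I[1:-1, 2:] - I[1:-1, :-2]            # equation (3); columns are the x axis
    gy = I[2:, 1:-1] - I[:-2, 1:-1]            # equation (3); rows are the y axis
    Ir = I[1:-1, 1:-1] + gx * dvx + gy * dvy   # equations (4) and (6)
    limit = 1 << max(13, bit_depth + 1)
    return np.clip(Ir, -limit, limit - 1)      # clip3(-dILimit, dILimit-1, Ir)
```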
Bidirectional optical flow (BIO)
Bi-prediction in video coding is a simple combination of two temporal prediction blocks obtained from a reference picture. However, the motion vectors received at the decoder side may not be so accurate due to the signaling cost and accuracy tradeoff of the motion vectors. As a result, there may be small residual motion that may be observed between the two prediction blocks, which may reduce the efficiency of motion compensated prediction. To solve this problem, a BIO tool is employed in both VVC and AVS3 standards to compensate for such motion for each sample within a block. In particular, BIO is a point-by-point motion refinement that is performed on block-based motion compensated prediction when bi-directional prediction is used.
In the BIO design, the derivation of a refined motion vector for each sample in a block is based on the classical optical flow model. Let $I^{(k)}(x,y)$ be the sample value at coordinates $(x,y)$ of the prediction block derived from reference picture list $k$ ($k = 0, 1$), and let $\partial I^{(k)}/\partial x$ and $\partial I^{(k)}/\partial y$ be the horizontal and vertical gradients of the sample. Assuming the optical flow model is valid, the motion refinement $(v_x, v_y)$ at $(x, y)$ can be derived from the following optical flow equation (7):

$$\frac{\partial I^{(k)}(x,y)}{\partial t} + v_x \frac{\partial I^{(k)}(x,y)}{\partial x} + v_y \frac{\partial I^{(k)}(x,y)}{\partial y} = 0 \tag{7}$$

As shown in Fig. 4, combining the optical flow equation (7) with the interpolation of the prediction blocks along the motion trajectory, the BIO prediction can be obtained as shown in the following equation (8):

$$\mathrm{pred}_{BIO}(x,y) = \frac{1}{2}\left(I^{(0)}(x,y) + I^{(1)}(x,y) + \frac{v_x}{2}\left(\frac{\partial I^{(1)}}{\partial x} - \frac{\partial I^{(0)}}{\partial x}\right) + \frac{v_y}{2}\left(\frac{\partial I^{(1)}}{\partial y} - \frac{\partial I^{(0)}}{\partial y}\right)\right) \tag{8}$$
Fig. 4 is a schematic diagram illustrating one example of a BIO model according to some implementations of the present disclosure. As shown in Fig. 4, $(MV_{x0}, MV_{y0})$ and $(MV_{x1}, MV_{y1})$ indicate the block-level motion vectors that are used to generate the two prediction blocks $I^{(0)}$ and $I^{(1)}$. Furthermore, the motion refinement $(v_x, v_y)$ at the sample point $(x, y)$ is calculated by minimizing the difference $\Delta$ between the values of the samples after motion refinement compensation (i.e., A and B in Fig. 4), as shown in the following equation (9):

$$\Delta(x,y) = I^{(0)}(x,y) - I^{(1)}(x,y) + v_x\left(\frac{\partial I^{(0)}}{\partial x} + \frac{\partial I^{(1)}}{\partial x}\right) + v_y\left(\frac{\partial I^{(0)}}{\partial y} + \frac{\partial I^{(1)}}{\partial y}\right) \tag{9}$$

Furthermore, to ensure the regularity of the derived motion refinement, it is assumed that the motion refinement is consistent within a local surrounding region centered at $(x, y)$; thus, in the BIO design of AVS3, the value of $(v_x, v_y)$ is derived by minimizing $\Delta$ within the 4x4 window $\Omega$ around the current sample at $(x, y)$, as shown in equation (10) below:

$$(v_x, v_y) = \arg\min_{(v_x, v_y)} \sum_{(i,j)\in\Omega} \Delta^2(i,j) \tag{10}$$
as shown in equations (8) and (10), in addition to the block level MC, a block compensation for each motion is required in the BIO, i.e., I (0) And I (1) To derive a gradient in order to derive a local motion refinement and to generate a final prediction at that sample point. In AVS3, gradients are calculated by a two-dimensional (2D) separable Finite Impulse Response (FIR) filtering process that defines a set of eight-tap filters and is based on block-level motion vectors (e.g., (MV) in fig. 4 x0 ,MV y0 ) Sum (MV) x1 ,MV y1 ) With a horizontal and vertical gradient derived from the accuracy of the model). Table 1 illustrates the coefficients of the gradient filter used by the BIO.
TABLE 1

Fractional position    Gradient filter
0                      {-4, 11, -39, -1, 41, -14, 8, -2}
1/4                    {-2, 6, -19, -31, 53, -12, 7, -2}
1/2                    {0, -1, 0, -50, 50, 0, 1, 0}
3/4                    {2, -7, 12, -53, 31, 19, -6, 2}
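As an informal illustration of how a Table 1 row may be used, the sketch below (the tap alignment and helper names are assumptions, not normative AVS3 code) selects the filter by the fractional MV component and applies it as an eight-tap FIR at one sample:

```python
# Illustrative helper showing one Table 1 row applied as an 8-tap FIR to derive
# a horizontal gradient at one sample. The tap alignment (x-3 .. x+4) is an
# assumption for illustration.

GRAD_FILTERS = {
    "0":   [-4, 11, -39, -1, 41, -14, 8, -2],
    "1/4": [-2, 6, -19, -31, 53, -12, 7, -2],
    "1/2": [0, -1, 0, -50, 50, 0, 1, 0],
    "3/4": [2, -7, 12, -53, 31, 19, -6, 2],
}

def horizontal_gradient(ref_row, x, frac):
    """Apply the 8-tap gradient filter around integer position x of one reference row."""
    taps = GRAD_FILTERS[frac]
    return sum(t * ref_row[x - 3 + k] for k, t in enumerate(taps))
```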
Finally, BIO is applied only to bi-predicted blocks, which are predicted by two reference blocks from temporally neighboring pictures. Furthermore, the BIO is enabled without sending additional information from the encoder to the decoder. Specifically, the BIO is applied to all bi-predictive blocks having both forward and backward predictive signals.
Ultimate motion vector expression (UMVE)
The UMVE mode in the AVS3 standard is the same tool as the merge mode with motion vector differences (MMVD) in the VVC standard. In addition to the traditional merge mode, in which the motion information is derived from the spatial/temporal neighbors of the current block, the MMVD/UMVE mode is introduced as one special merge mode in the VVC and AVS standards.
Specifically, in both VVC and AVS3, its usage is signaled at the coding block level by one MMVD flag. In the MMVD mode, two base merge candidates are first generated as the first two candidates of the regular merge mode. After one base merge candidate is selected and signaled, additional syntax elements are signaled to indicate the MVD that is added to the motion of the selected merge candidate. The MMVD syntax elements include a merge candidate flag to select the base merge candidate, a distance index to specify the MVD magnitude, and a direction index to indicate the MVD direction.
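As an informal illustration, the final MV under MMVD/UMVE may be composed as in the following sketch (the direction table and the distance value are placeholders, not the normative VVC/AVS3 tables):

```python
# Illustrative composition of an MMVD/UMVE motion vector from the three signaled
# syntax elements described above. The direction table and distance values are
# placeholders, not the normative tables.

MVD_DIRECTIONS = {0: (1, 0), 1: (-1, 0), 2: (0, 1), 3: (0, -1)}  # +x, -x, +y, -y

def mmvd_motion_vector(base_mv, distance, direction_idx):
    """base_mv: MV of the selected base merge candidate; distance: MVD magnitude."""
    dx, dy = MVD_DIRECTIONS[direction_idx]
    return (base_mv[0] + distance * dx, base_mv[1] + distance * dy)

# Example: base candidate (12, -3), distance 4, direction index 2 (+y) -> (12, 1).
```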
In the AVS3 standard, similar to the VVC standard, sub-block-based affine motion compensation (affine mode) is used to generate inter-prediction pixel values. This sub-block based prediction is a trade-off between codec efficiency, complexity and memory access bandwidth. The average prediction accuracy is lower than pixel-based prediction because all pixels within each sub-block share the same motion vector. Unlike the VVC standard, the AVS3 standard has no pixel level refinement after sub-block based motion compensation in affine mode.
The present disclosure provides a method for improving affine mode prediction accuracy. After conventional sub-block based affine motion compensation, the prediction value of each pixel is refined by adding a differential value derived from the optical flow equation. The proposed method is referred to as AMPR. AMPR can achieve pixel-level prediction accuracy without significantly increasing complexity, and it also keeps the worst-case memory access bandwidth comparable to that of conventional sub-block based motion compensation in affine mode. Although AMPR is also built upon optical flow, it is distinct from PROF in the VVC standard in the following aspects.
First, a gradient is calculated at each pixel. Unlike PROF, which extends the sub-block prediction by one pixel on each side, AMPR uses interpolation-based filtering for the gradient calculation at each pixel, which allows for a unified design between the AMPR and BIO workflows in AVS3.
Second, MV difference is calculated at each pixel. Unlike PROF, where MV difference is always calculated based on pixel position relative to the center of the sub-block, in AMPR MV difference can be calculated based on pixel position relative to different positions within the sub-block.
Third is early termination. Unlike the PROF procedure, which is always invoked at the decoder side when the coding block is predicted by affine mode, AMPR can be adaptively skipped at the decoder side based on certain defined conditions that indicate that applying AMPR would not provide a good performance and/or complexity tradeoff.
In some examples, the early termination method may also be used to simplify encoder-side operation. Some encoder-side optimization methods for the AMPR procedure are also presented in examples of the disclosure to reduce its latency and power consumption, such as skipping AMPR for affine UMVE, checking best mode selection before applying AMPR at parent CU, skipping AMPR for motion estimation at certain block sizes, checking magnitude of pixel MV difference before applying AMPR, skipping AMPR for some picture types, e.g. low-latency pictures or non-low-latency pictures, etc.
Exemplary workflow of AMPR
The AMPR method may include five steps, as explained below. In the first step, conventional sub-block based affine motion compensation is performed to generate the sub-block prediction $I(i,j)$ at each pixel position $(i,j)$.
In the second step, the horizontal and vertical spatial gradients $g_x(i,j)$ and $g_y(i,j)$ of the sub-block prediction are calculated at each pixel position using interpolation-based filtering. In some examples, both the horizontal and vertical gradients of the affine prediction samples are calculated directly from the reference samples at integer positions in the temporal reference picture. One advantage of this is that, for each affine sub-block, its gradient values can be generated at the same time as its prediction samples are generated. Another design benefit of such a gradient calculation method is that it is also consistent with the gradient calculation process used by other coding tools, such as BIO in the AVS standard. Sharing the same process among different modules in the standard is friendly to pipelining and/or parallel design in practical hardware codec implementations.
In particular, the inputs to the gradient derivation process are the same reference samples as used for the motion compensation of the affine sub-block, together with the fractional components (fracX, fracY) of the sub-block motion $(MV_x, MV_y)$. To derive the gradient values at each sample position, in addition to the default eight-tap FIR filter $h_L$ used for affine prediction, another new set of FIR filters $h_G$ is introduced in the proposed method to calculate the gradient values.
Furthermore, the order in which the filters $h_G$ and $h_L$ are applied differs depending on the direction of the derived gradient. When deriving the horizontal gradient $g_x(i,j)$, the gradient filter $h_G$ is first applied in the horizontal direction to derive the horizontal gradient values at the horizontal fractional position fracX; then, the interpolation filter $h_L$ is applied vertically to interpolate the gradient values at the vertical fractional position fracY.

Conversely, when deriving the vertical gradient $g_y(i,j)$, the interpolation filter $h_L$ is first applied horizontally to interpolate the intermediate samples at the horizontal fractional position fracX, followed by the gradient filter $h_G$ applied in the vertical direction to derive the vertical gradient values at the vertical fractional position fracY from the intermediate interpolated samples.
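The following minimal sketch (the filter tables and helper names are illustrative assumptions, not AVS3 definitions) shows the two filtering orders described above for the horizontal and vertical gradients:

```python
import numpy as np

def filter_rows(block, taps):
    """Apply a 1-D FIR along the horizontal axis of a 2-D block."""
    return np.stack([np.convolve(r, taps[::-1], mode="same") for r in block])

def filter_cols(block, taps):
    """Apply a 1-D FIR along the vertical axis of a 2-D block."""
    return filter_rows(block.T, taps).T

def horizontal_gradient(ref, hG_x, hL_y):
    tmp = filter_rows(ref, hG_x)    # gradient filter h_G horizontally at fracX
    return filter_cols(tmp, hL_y)   # interpolation filter h_L vertically at fracY

def vertical_gradient(ref, hL_x, hG_y):
    tmp = filter_rows(ref, hL_x)    # interpolation filter h_L horizontally at fracX
    return filter_cols(tmp, hG_y)   # gradient filter h_G vertically at fracY
```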
In some examples, gradient filters may be generated with different filter coefficient precision and with different numbers of taps, which may provide various trade-offs between gradient computation precision and computation complexity. For example, gradient filters with more filter taps and/or with higher filter coefficient accuracy may generally result in better codec efficiency, but at the cost of more computational operations, such as more numbers of additions, multiplications, and shifts, due to the gradient computation process. In one example, the following eight-tap filter is proposed for horizontal and/or vertical gradient calculations of AMPR, as shown in table 2.
Table 2 shows the predefined eight-tap interpolation filter coefficients $f_{grad}[p]$ used to generate the spatial gradients, based on 1/16-pixel precision of the input sample values.

TABLE 2

In another example, to reduce the complexity of the gradient calculation, a four-tap FIR filter as shown in Table 3 below is used for the gradient generation of the proposed AMPR method. Table 3 shows the predefined four-tap interpolation filter coefficients $f_{grad}[p]$ used to generate the spatial gradients, based on 1/16-pixel precision of the input sample values.
In the third step, an MV difference $\Delta v(i,j)$ is calculated at each pixel location $(i,j)$, which is the difference between the MV of each pixel and the MV of the sub-block to which that pixel belongs. Fig. 8 illustrates one example of the calculation of the horizontal and vertical offsets from a sample position to the particular position of the sub-block at which the sub-block MV is derived. As shown in Fig. 8, the horizontal offset $\Delta x$ and the vertical offset $\Delta y$ are calculated from the sample position $(i,j)$ to the particular position $(i',j')$ of the sub-block at which the sub-block MV is derived. In some examples, the particular position may not always be the center of the sub-block. As shown in Fig. 8, $\Delta v(i,j)$ is calculated by equation (5) based on the pixel position relative to a particular position within the sub-block.
TABLE 3
In one example, let $(i,j)$ be the pixel position/coordinates within the sub-block to which the pixel belongs, and let $w$ and $h$ be the width and height of the sub-block for affine mode (e.g., $w = h = 4$ for a 4x4 sub-block, and $w = h = 8$ for an 8x8 sub-block). The horizontal offset $\Delta x$ and the vertical offset $\Delta y$ for each pixel $(i,j)$, with $\Delta x$ and $\Delta y$ as defined in equation (5), $i = 0\ldots(w-1)$, and $j = 0\ldots(h-1)$, can then be derived as follows.
In one example, if the sub-block MV is derived at the sub-block center located at an integer position, $\Delta x$ and $\Delta y$ may be calculated by the equations shown below:

$$\Delta x = i - (w \gg 1), \qquad \Delta y = j - (h \gg 1)$$

Alternatively, if the sub-block MV is derived at the sub-block center located at a fractional position, $\Delta x$ and $\Delta y$ may be calculated by the equations shown below:

$$\Delta x = i - ((w \gg 1) - 0.5), \qquad \Delta y = j - ((h \gg 1) - 0.5)$$
in another example, if the sub-block MV is derived from the upper left corner position of the sub-block, Δx and Δy may be calculated by the equations shown below:
in another example, if the sub-block MV is derived from the upper right corner position within the sub-block, Δx and Δy may be calculated by the equations shown below:
alternatively, if the sub-block MV is derived from the upper right corner position outside the sub-block, Δx and Δy may be calculated by the equations shown below:
in another example, if the sub-block MV is derived from the lower left corner position within the sub-block, Δx and Δy may be calculated by the equations shown below:
alternatively, if the sub-block MV is derived from the lower left corner position outside the sub-block, Δx and Δy can be calculated as:
in another example, Δv (i, j) can be calculated by equation (5), Δx and Δy being the horizontal and vertical offsets from the sample position (i, j) to the sample position (pilot) of the sub-block to which the sample belongs. The trial sample position refers to a sample position within a sub-block that is used to derive MVs for generating sub-block-based prediction samples for the sub-block. In one example, the values of Δx and Δy are derived as follows based on the locations of the sub-blocks within the current CU.
Fig. 9 illustrates one example of the sub-blocks inside one affine CU in accordance with some implementations of the present disclosure. For the top-left sub-block, sub-block A in Fig. 9, $\Delta x = i$, $\Delta y = j$. For the top-right sub-block, sub-block B in Fig. 9, $\Delta x = i - w + 1$, $\Delta y = j$. For the bottom-left sub-block, sub-block C in Fig. 9, when the six-parameter affine model is applied, $\Delta x = i$, $\Delta y = j - h + 1$; and when the four-parameter affine model is applied, $\Delta x = i - ((w \gg 1) - 0.5)$, $\Delta y = j - ((h \gg 1) - 0.5)$. For the other sub-blocks, $\Delta x = i - ((w \gg 1) - 0.5)$, $\Delta y = j - ((h \gg 1) - 0.5)$.
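The position-dependent offsets just described can be summarized by the following sketch (an informal rendering of the Fig. 9 rules; the names are illustrative):

```python
# Offsets (dx, dy) for a sample (i, j) inside a w x h sub-block, depending on
# where the sub-block sits inside the affine CU (Fig. 9) and on the affine model.

def offsets(i, j, sub_block, w, h, six_param):
    if sub_block == "top_left":                    # sub-block A
        return i, j
    if sub_block == "top_right":                   # sub-block B
        return i - w + 1, j
    if sub_block == "bottom_left" and six_param:   # sub-block C, 6-parameter model
        return i, j - h + 1
    # all remaining sub-blocks (and C under the 4-parameter model): center-based
    return i - ((w >> 1) - 0.5), j - ((h >> 1) - 0.5)
```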
Once the horizontal offset $\Delta x$ and the vertical offset $\Delta y$ are calculated, $\Delta v(i,j)$ can be derived by the following equation (11):

$$\Delta v_x(i,j) = c \cdot \Delta x + d \cdot \Delta y, \qquad \Delta v_y(i,j) = e \cdot \Delta x + f \cdot \Delta y \tag{11}$$

where $c$, $d$, $e$, and $f$ are the affine model parameters, which are known because the current block is an affine mode coded block. Equation (11) is similar to equation (5) of the PROF tool in the VVC standard.
In the fourth step, the prediction refinement value is calculated by equation (4).

In the fifth step, the prediction refinement is added to the sub-block prediction $I(i,j)$. The final prediction $I'(i,j)$ is generated as in equation (6).
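Steps three to five can be summarized at one pixel by the following sketch (an informal rendering combining equations (11), (4), and (6); all names are illustrative, and the gradients gx/gy are assumed to come from the interpolation-based filtering of step two):

```python
# Per-pixel AMPR refinement at one position: c, d, e, f are the affine model
# parameters; (dx, dy) are the offsets of step three; I is the sub-block
# prediction sample from step one.

def ampr_refine(I, gx, gy, c, d, e, f, dx, dy):
    dvx = c * dx + d * dy          # equation (11)
    dvy = e * dx + f * dy
    dI = gx * dvx + gy * dvy       # equation (4)
    return I + dI                  # equation (6)
```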
In the present disclosure, the proposed AMPR workflow may be applied to luminance components and/or chrominance components. In one example, to achieve a good performance/complexity tradeoff, the proposed AMPR is applied only to refine affine prediction samples of the luma component, while chroma prediction samples are still generated based on existing sub-block based affine motion compensation.
In another example, to keep the refinement consistent across components, both the luminance component and the chrominance components are refined by the proposed AMPR procedure. In this case, the sample-wise MV difference $\Delta v(i,j)$ can be derived in different ways.
In one example, when the prediction refinement value is calculated in the fourth step above, the sample-wise MV difference $\Delta v(i,j)$ may always be derived only once based on the luminance sub-block and then reused for the chrominance components. In this case, the value of $\Delta v(i,j)$ used by the chrominance components may be scaled according to the sample grid ratio between the co-located luminance and chrominance coding blocks. For example, for 4:2:0 video, the value of the reused $\Delta v(i,j)$ may be halved before being used by the chrominance components, while for 4:4:4 video, the same value of the reused $\Delta v(i,j)$ may be used by the chrominance components. For 4:2:2 video, the horizontal component of $\Delta v(i,j)$ may be halved while the vertical component is kept unchanged before being used by the chrominance components.
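The scaling rule described above can be sketched as follows (an illustrative helper, not normative code):

```python
# Rescaling of the luma-derived per-sample MV difference for chroma reuse,
# according to the chroma sampling format described above.

def scale_dv_for_chroma(dvx, dvy, chroma_format):
    if chroma_format == "4:2:0":       # both dimensions subsampled: halve both
        return dvx / 2, dvy / 2
    if chroma_format == "4:2:2":       # only horizontal subsampling: halve dvx
        return dvx / 2, dvy
    return dvx, dvy                    # 4:4:4 grids are aligned: reuse as-is
```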
In another example, the sample-wise MV difference Δv (i, j) can be derived separately for the luminance and chrominance components, where the derivation process can be the same as the third step described above.
In another example, it is proposed to adaptively switch between a method of reusing luminance motion refinement Δv (i, j) for chrominance and a method of separately deriving luminance and chrominance motion refinements based on the chroma sampling format of the input video. For example, for 4:2:0 and 4:2:2 video, assuming that the sampling grids for the luminance and chrominance components are not aligned, a separate derivation of motion refinement for the luminance and chrominance components may be applied. On the other hand, when the input video is in the 4:4:4 chroma sampling format, motion refinement need only be derived once, i.e., for luminance, which is then reused for the other two color components because the sampling grids of the three color components are perfectly aligned.
In another example, a flag is signaled to indicate whether AMPR is applied to the chroma components at various coding levels, e.g., at the sequence level, picture level, slice level, etc. Further, if the above enabling/disabling flag is true, another flag may be signaled from the encoder to the decoder to indicate whether the chroma motion refinement is recalculated from the corresponding control point motion vectors or directly borrowed from the corresponding motion refinement of the luma component.
Alternative workflow for AMPR
An alternative implementation of AMPR is to approximate the multiplications of the optical flow equation by a filtering process. In particular, it is proposed to replace the multiplication of the gradient value and the motion vector difference value at each sample position by performing a filtering process on the conventional sub-block based affine motion prediction. This can be formulated by the following equation:

$$P_{AMPR}(i,j) = \sum_{(m,n)} w_{m,n} \cdot P_{AFF}(i+m,\, j+n) \tag{12}$$

where $P_{AFF}(x,y)$ are the sub-block based motion compensated prediction samples, $w_{m,n}$ are the filter coefficients, and $P_{AMPR}(i,j)$ is the filtered affine prediction sample. In practice, different numbers of filter taps and filter shapes may be applied to achieve different tradeoffs between complexity and coding performance.
In one or more examples, the filtering process may be performed by a cross-shaped filter, which may also be referred to as a diamond filter. For example, the diamond filter may be a combination of the vertical and horizontal forms of a three-tap filter [-1, 1] or a five-tap filter [-1, -2, 4, 2, 1], as shown in Figs. 10A-10B. The cross-shaped filter may approximate the gradient calculation process described in the optical-flow-based refinement process above.
In order to capture a Motion Vector (MV) difference Δv (i, j) between the MV of each pixel and the MV of the sub-block to which the pixel belongs, filter coefficients in the selected diamond filter may be calculated based on the values of Δv (i, j) in the horizontal and vertical directions. In other words, a scaled diamond filter may be used to compensate for the motion vector difference at each sample point. Fig. 11A-11B illustrate examples of diamond filters scaled by MV difference in horizontal and vertical directions according to some implementations of the present disclosure. Fig. 11A illustrates a five tap scaled diamond filter. Fig. 11B illustrates a nine tap scaled diamond filter.
In another example, the filtering process may be performed by a square filter. For example, the square filter may be a 3x3 or 5x5 shape filter, wherein the importance of each coefficient of the square filter may depend on the distance between the position of each coefficient and the center of the filter, which means that the center coefficient may have a maximum value in the filter. Similar to the diamond filter mentioned above, the filter coefficients in the selected square filter may be scaled by the values of Δv (i, j) in the horizontal and vertical directions.
Once a particular type of filter is selected, such as diamond or square, a corresponding scaling filter is calculated at each sample location. The scaling value is a Motion Vector (MV) difference Δv (i, j) at each sample point, which is the difference between the MV of each pixel and the MV of the sub-block to which the pixel belongs. The calculation process is the same as for the optical flow based implementation of AMPR, in which the value of Δv (i, j) is determined based on whether the associated sub-block MV is derived from the sub-block center position at an integer position, from the sub-block upper left corner position, from the sub-block upper right corner position, or from the sub-block lower left corner position.
In one particular example, when a three-tap cross filter is applied in the proposed scheme, the corresponding filtered affine prediction samples can be calculated as:
$$P_{AMPR}(i,j) = \big(M \cdot P_{AFF}(i,j) + N \cdot \big(P_{AFF}(i,j+1) \cdot \Delta_x(i,j+1) - P_{AFF}(i,j-1) \cdot \Delta_x(i,j-1)\big) + N \cdot \big(P_{AFF}(i+1,j) \cdot \Delta_y(i+1,j) - P_{AFF}(i-1,j) \cdot \Delta_y(i-1,j)\big)\big) \div M \tag{13}$$
where $M$ and $N$ are constant-valued initialization coefficients, and $\Delta_x$ and $\Delta_y$ are scaling factors that can be used to adjust the importance of the neighboring samples relative to the filtered sample at one current location. In a specific embodiment, it is proposed to set $M = 2N$, for example $M = 16$ and $N = 8$.
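The following sketch renders equation (13) informally (with M = 16 and N = 8 as in the example above; the rounding behavior of the final division is an assumption):

```python
# Three-tap cross filtering of equation (13). P is the (padded) sub-block
# prediction; dvx/dvy hold the per-sample scaling factors; the neighbor indexing
# follows equation (13) as written.

def ampr_cross_filter(P, dvx, dvy, i, j, M=16, N=8):
    acc = M * P[i][j]
    acc += N * (P[i][j + 1] * dvx[i][j + 1] - P[i][j - 1] * dvx[i][j - 1])
    acc += N * (P[i + 1][j] * dvy[i + 1][j] - P[i - 1][j] * dvy[i - 1][j])
    return acc // M   # M is a power of two, so this can be a right shift
```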
In some examples, the scaling factors may be a set of integers, which results in the multiplications shown in equation (13). It should be noted that the set of integers may be empirical numbers predefined offline, or derived dynamically online based on the prediction samples, e.g., adapted to the video content. Furthermore, the scaling factors, whether derived online or defined offline, may be a set of integers that are powers of 2, such that the multiplications in the filtering process of the scaling filter can be converted into left-shift operations for complexity reduction.
Once the scaling filter coefficients are calculated, the filtering process may be performed on the conventional sub-block-based affine motion prediction samples. For neighboring sample positions that lie outside the current block/sub-block, a padding process may be required. In one or more examples, the padded sample value may be copied from the nearest integer-position reference sample. In another example, the padded sample value may be a copy of an integer-position reference sample used by the current block/sub-block. In one example, the integer samples closest to the current boundary samples of the current CU, which may be at fractional positions, are used to fill the extended region of the current CU. In another example, the integer samples whose positions are not greater than those of the corresponding boundary samples of the current CU, i.e., obtained by applying the floor() operation to the fractional positions, are used to fill the samples in the extended region of the current CU.
It should be noted that the integer samples may be used for padding with a one-to-one mapping or a many-to-one mapping. For example, in the case of a many-to-one mapping, 4 adjacent integer samples may be averaged or weighted-averaged to generate a single sample that is used to fill a sample location outside the current block/sub-block.
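The following sketch illustrates both padding variants under simplifying assumptions: pad_one_to_one replicates the nearest boundary sample of a block, and pad_many_to_one averages the four integer reference samples around a fractional position taken from a hypothetical ref array (bounds checking omitted).

```python
def pad_one_to_one(block):
    # extend a rectangular block by one ring, replicating the nearest sample
    padded = [[row[0]] + row + [row[-1]] for row in block]
    padded.insert(0, padded[0][:])   # replicate top row
    padded.append(padded[-1][:])     # replicate bottom row
    return padded

def pad_many_to_one(ref, x, y):
    # rounded average of the 4 integer samples around fractional (x, y)
    x0, y0 = int(x), int(y)
    s = ref[y0][x0] + ref[y0][x0 + 1] + ref[y0 + 1][x0] + ref[y0 + 1][x0 + 1]
    return (s + 2) >> 2
```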
Reference sample filling
As described in the above "alternative workflow of AMPR", a method is proposed to achieve AMPR sample refinement by replacing the sample-by-sample optical-flow refinement, i.e., the multiplication of gradients and local motion refinements, with a convolution filter. However, due to the length of the applied filter, such a filtering operation requires access to additional rows and columns of prediction samples around each sub-block, i.e., around the basic unit of conventional affine motion compensation. When AMPR is implemented in hardware, such a design may severely complicate the pipeline design due to the interdependence between prediction samples in different sub-blocks. To illustrate the problem, assume there are two adjacent sub-blocks A and B in the current affine CU, where A is to the left of B, and the filter size applied to the affine prediction samples is 3x3. In such a case, the refinement of the prediction samples located at the right boundary of sub-block A cannot start until the prediction samples at the left boundary of sub-block B have been completely generated.
In order to solve this latency problem, various methods are proposed below to remove the dependency between different sub-blocks in the affine motion compensation stage. In one example, it is proposed to directly use the reference samples at integer positions in the reference picture to fill the prediction samples in the extension region around each sub-block. Since there are multiple adjacent integer sample positions around one fractional sample position, there may be different ways of selecting the corresponding integer position.
In one or more examples, it is proposed to always select an integer reference sample that is located horizontally to the left of the prediction sample and vertically above the prediction sample, i.e., integer sample LT in fig. 12, to fill the extension region of each sub-block. In some examples, it is proposed to always select an integer reference sample that is located horizontally to the right of the prediction sample and vertically above the prediction sample, i.e., integer sample RT in fig. 12, to fill the extension region of each sub-block.
In one example, it is proposed to always select an integer reference sample located to the left of the prediction sample in the horizontal direction and below the prediction sample in the vertical direction, i.e., an integer sample LB in fig. 12, to fill the extension region of each sub-block. In yet another example, it is proposed to always select an integer reference sample located to the right of the prediction sample in the horizontal direction and below the prediction sample in the vertical direction, i.e., an integer sample RB in fig. 12, to fill the extension region of each sub-block. In some examples, the integer reference samples closest to the prediction samples are used for affine prediction filtering.
Alternatively or in addition, instead of selecting only one specific integer sample point, the average of the reference samples at multiple integer sample points may be used to fill the extension samples of one sub-block. For example, depending on the respective fractional position of the sub-block motion vector, two integer samples closest to the fractional position may be used, and the average may be used to populate the extension samples. In another example, all four integer reference samples are always averaged to calculate the corresponding extension samples, as shown in the following equation:
P_ext = (LT + RT + LB + RB + 2) >> 2
in another example, to increase the accuracy of the expanded samples, instead of averaging, other interpolation filters are used to generate the predicted samples in the expanded region. In one particular example, a bilinear filter may be applied that interpolates the extended samples according to the distances of the four integer reference samples to the fractional positions, as shown below:
P_ext = LT·(1 − X_frac)·(1 − Y_frac) + RT·X_frac·(1 − Y_frac) + LB·(1 − X_frac)·Y_frac + RB·X_frac·Y_frac
where X_frac and Y_frac are the fractional sample positions in the x-direction and y-direction, which are floating-point numbers within the range [0, 1).
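A small sketch of the bilinear variant above, assuming the four integer reference samples LT, RT, LB, and RB around the fractional position are given as scalars; x_frac and y_frac are the fractional offsets in [0, 1).

```python
def pad_bilinear(lt, rt, lb, rb, x_frac, y_frac):
    # weights follow the distance of (x_frac, y_frac) to each integer corner
    return (lt * (1 - x_frac) * (1 - y_frac)
            + rt * x_frac * (1 - y_frac)
            + lb * (1 - x_frac) * y_frac
            + rb * x_frac * y_frac)

# e.g., pad_bilinear(100, 120, 110, 130, 0.25, 0.5) -> 110.0
```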
In another example, instead of using reference samples at integer sample positions, predicted samples at four boundaries of a sub-block are directly padded out to their extension regions, as shown in fig. 13.
Selective enabling of AMPR
Prediction refinement derived by applying AMPR may not always be beneficial and/or necessary. The significance of the derived ΔI(i, j) is determined by the accuracy and magnitude of the derived Δv(i, j) and g(i, j) according to equation (4).
In some examples, the AMPR operation may be conditionally applied based on certain conditions. This may be achieved by signaling a flag for each block to indicate whether AMPR is applied. It can also be implemented by using the same conditions on the encoder side and the decoder side to enable AMPR operation without additional signaling.
One motivation for such conditional application of the AMPR operation is that if the CPMVs of the CU are not accurate, or the derived affine model, e.g., a two-parameter, four-parameter, or six-parameter affine model, is not accurate, then the resulting Δv(i, j) derived by equation (11) may also be inaccurate. In this case, the AMPR operation may not help and may even hurt coding performance, and thus it is better to skip the AMPR operation for the block. Another motivation is that in some cases the benefit of applying AMPR may be trivial, and from a computational-complexity point of view it is then also better to turn the operation off.
In one or more examples, the AMPR operation may be applied depending on whether the CPMV is explicitly signaled. In affine merge mode, where the CPMV is not explicitly signaled but implicitly derived from spatially neighboring CUs, AMPR may be skipped for the current CU, as the CPMV in this mode may not be accurate.
In another example, AMPR may be skipped if the magnitude of derived Δv (i, j) or/and g (i, j) is small, for example, compared to some predefined or dynamically determined threshold. Such a threshold may be determined based on a variety of different factors (e.g., CU aspect ratio and/or sub-block size, etc.). Such examples may be implemented in different ways as described below.
In one example, if the absolute value of the derived Δv(i, j) for all pixels within a sub-block is less than a threshold, then AMPR may be skipped for that sub-block. This condition may have different implementation variants. For example, checking the absolute value of the derived Δv(i, j) for all pixels may be simplified by checking only the four corners of the current sub-block, where the maximum absolute value of the derived Δv(i, j) over all pixels within the sub-block can be found, as in equation (14), where the pixel location (i, j) may be any pixel coordinate in the sub-block, or may be taken from the four corners (0, 0), (w−1, 0), (0, h−1), and (w−1, h−1).
In another example, the maximum absolute value of all Δv(i, j) may be obtained by taking the sample positions (i, j) at the four corners of each sub-block of the CU, except for the upper-left sub-block (sub-block A in fig. 9), the upper-right sub-block (sub-block B in fig. 9), and the lower-left sub-block (sub-block C in fig. 9). The coordinates of the four corner pixels within a sub-block are (0, 0), (w−1, 0), (0, h−1), and (w−1, h−1), and |x| denotes the function that takes the absolute value of x.
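A hedged sketch of this corner-based early-skip test follows; dv_at is a hypothetical callable returning (Δv_x, Δv_y) at a pixel position, and thresh_x / thresh_y stand in for the thresholds threshv_x / threshv_y introduced below.

```python
def should_skip_ampr(w, h, dv_at, thresh_x, thresh_y):
    # evaluate |dv| only at the four sub-block corners instead of all pixels
    corners = [(0, 0), (w - 1, 0), (0, h - 1), (w - 1, h - 1)]
    max_x = max(abs(dv_at(i, j)[0]) for i, j in corners)
    max_y = max(abs(dv_at(i, j)[1]) for i, j in corners)
    return max_x < thresh_x and max_y < thresh_y  # True -> skip AMPR
```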
In another example, the check of the derived Δv(i, j) for all pixels within a sub-block may be performed jointly, as in equation (15) below, or separately for the horizontal and vertical directions, as in equation (16) below. In equations (15) and (16), the different closed-form expressions of ΔI(i, j) represent different simplification methods. For example, in the case where ΔI(i, j) = g_y(i, j)·Δv_y(i, j), the calculation of g_x and Δv_x can be skipped.
In another example, the check on the derived Δv(i, j) may be combined with the non-simplified AMPR operation. In this case, equation (14) may be combined with equation (4) to calculate the prediction refinement value.
In some examples, threshv_x and threshv_y may have different or the same values. In some examples, when Δv(i, j) is derived for a sub-block, the values of threshv_x and threshv_y may be determined depending on which position is used to derive the sub-block MV. In other words, if the MVs of two sub-blocks are derived using different positions, different or the same pairs of threshv_x and threshv_y values may be determined for them. For example, for a sub-block whose sub-block-level MV is derived based on the sub-block center, its threshv_x and threshv_y values may be the same as or different from those of a sub-block whose sub-block-level MV is derived based on the position of the upper-left corner of the sub-block.
In some examples, the values of threshv_x and threshv_y may be defined within the range [1/32, 1/16] in pixel units. For example, (1/16)×(10/16), (1/16)×(12/16), or (1/16)×(14/16) pixel may be used as the threshold. In this case, expressed in units of 1/16 pixel, the threshold is a fractional value, e.g., 0.625, 0.75, or 0.875.
In some examples, the values of threshv_x and threshv_y may be defined based on the picture type. For low-delay pictures, the derived affine model parameters may have smaller magnitudes than for other, non-low-delay pictures, because low-delay pictures tend to have smaller and/or smoother motion, and thus smaller values may be preferred for those thresholds.
In some examples, the values of threshv_x and threshv_y may be the same regardless of the picture type. In some examples, if the absolute values of most of the derived g(i, j) for all pixels within a sub-block are less than a threshold, AMPR may be skipped for that sub-block. One example of such a case is a sub-block containing a smooth surface, e.g., a flat texture with no or little high-frequency detail.
In some examples, the significance of Δv (i, j) and g (i, j) may be considered jointly or used in a mixed manner to decide whether or not AMPR should be skipped for the current sub-block or CU.
Selective enabling of AMPR may also be performed in the case of an implementation that simulates AMPR by using the filtering process described above. For example, based on equation (15), if the maximum absolute value of Δv_x(i, j) and/or Δv_y(i, j) over the sub-block is less than the predefined threshold threshv_x and/or threshv_y, the corresponding scaling factor in the selected filter may become 0, which means that the filtering process may be simplified from 2D to a one-dimensional (1D) filtering process. Taking the five-tap filter in fig. 10A as an example: when only the maximum absolute value of Δv_x(i, j) is less than threshv_x, the 1D filter [1, 2, 1] is applied only in the vertical direction; when only the maximum absolute value of Δv_y(i, j) is less than threshv_y, the 1D filter [1, 2, 1] is applied only in the horizontal direction; when neither maximum is less than its corresponding threshold threshv_x or threshv_y, the 2D filter is applied in both the horizontal and vertical directions; and when both maxima are smaller than their corresponding thresholds, no filter is applied at all, i.e., AMPR is not applied.
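The decision logic just described can be summarized in the sketch below, assuming the per-direction maxima max_dvx and max_dvy of |Δv_x(i, j)| and |Δv_y(i, j)| over the sub-block have already been computed.

```python
def select_filter(max_dvx, max_dvy, threshv_x, threshv_y):
    if max_dvx < threshv_x and max_dvy < threshv_y:
        return "none"           # skip AMPR entirely
    if max_dvx < threshv_x:
        return "1d_vertical"    # apply [1, 2, 1] in the vertical direction only
    if max_dvy < threshv_y:
        return "1d_horizontal"  # apply [1, 2, 1] in the horizontal direction only
    return "2d"                 # apply the full 2D filter in both directions
```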
Computational complexity reduction for AMPR
When AMPR is enabled, whether optical-flow-based or based on secondary filtering, additional computational operations, such as multiplications and additions, need to be performed before the refined prediction samples of an affine CU are obtained. Tables 4 and 5 compare the number of multiplications per sample performed for different CU sizes before and after applying AMPR. Table 4 shows the number of multiplications performed in affine mode for different CU sizes without applying AMPR.
TABLE 4

CU_w   CU_h   P0       P1       Total    Total (per sample)
16     16     5888     5888     11776    46
32     16     11776    11776    23552    46
32     32     23552    23552    47104    46
16     64     23552    23552    47104    46
64     16     23552    23552    47104    46
32     64     47104    47104    94208    46
64     32     47104    47104    94208    46
64     64     94208    94208    188416   46
64     128    188416   188416   376832   46
128    64     188416   188416   376832   46
Table 5 shows the number of multiplications performed in affine mode for different CU sizes when AMPR is applied.
TABLE 5
Based on the comparison between tables 4 and 5, enabling AMPR increases the number of multiplications per sample in affine mode from 46 to 54, i.e., by about 17%. Furthermore, during the optical-flow or secondary-filtering stage, the additional multiplication operations can potentially increase the bit width of the multipliers used for affine motion compensation. Taking secondary filtering as an example, as shown in equation (13), the prediction samples need to be multiplied by the horizontal and vertical ΔMV values, i.e., Δx and Δy. Assuming the internal coding bit depth is 10 bits and ΔMV is represented as a six-bit signed integer, the output of the multiplications in the AMPR secondary filter will be a 16-bit signed integer. It may therefore be beneficial to reduce the dynamic range of the input parameters of the multiplications involved in the AMPR process, so as to reduce the corresponding hardware implementation cost. For the two reasons above, methods for reducing the complexity of the existing AMPR implementations are presented from two aspects.
The first aspect relates to low-complexity motion compensated interpolation for AMPR. As with a conventional affine CU (i.e., one to which AMPR is not applied), a default eight-tap interpolation filter is always applied to generate the intermediate prediction samples of an affine CU when AMPR is applied. This makes the overall computational complexity of AMPR higher than that of the conventional affine mode, due to the additional computation introduced by the optical-flow or secondary filtering. To reduce the AMPR complexity, other motion compensated interpolation filters requiring fewer computations are applied to generate the intermediate prediction samples when AMPR is applied. There may be a variety of ways to derive such motion compensated interpolation filters.
TABLE 6
In one example, the new interpolation filter for AMPR is derived directly from the existing interpolation filter of the regular affine mode. Specifically, assuming the derived interpolation filter is six-tap, the new filter may be derived from the default eight-tap filter by summing the two leftmost and the two rightmost filter coefficients of the eight-tap filter, respectively, into a single coefficient. According to this example, table 6 shows the corresponding motion compensated interpolation filter for AMPR by listing the coefficients of the proposed six-tap interpolation filter.
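A minimal sketch of this coefficient merging follows; the eight-tap coefficients in the example call are placeholders for illustration, not the normative filter values of any standard.

```python
def eight_to_six_tap(c8):
    # merge the two outermost coefficients on each side into single taps
    assert len(c8) == 8
    return [c8[0] + c8[1]] + list(c8[2:6]) + [c8[6] + c8[7]]

print(eight_to_six_tap([-1, 4, -11, 40, 40, -11, 4, -1]))
# -> [3, -11, 40, 40, -11, 3]; the coefficient sum (filter gain) is preserved
```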
In another example, a new set of interpolation filters is derived based on a Discrete Cosine Transform (DCT) kernel, and table 7 shows the corresponding interpolation coefficients for a filter length of six taps by listing the coefficients of the proposed six-tap interpolation filter for AMPR.
TABLE 7
Meanwhile, there are multiple ways to switch between the default eight-tap interpolation filter and the proposed six-tap filter. In one example, when the AMPR operation is enabled for one affine CU, whether horizontally, vertically, or in both directions, the proposed six-tap filter is used in place of the corresponding eight-tap filter for affine motion compensation in both the horizontal and vertical directions.
In another example, it is proposed to enable the proposed six-tap interpolation filter only in the direction in which AMPR is applied, and to apply the default eight-tap filter in the direction in which AMPR is not applied. For example, when AMPR is enabled in the horizontal direction but disabled in the vertical direction, the proposed six-tap filter will only be used for motion compensated interpolation of CUs in the horizontal direction, and an eight-tap filter is still applied in the vertical direction in the motion compensated interpolation stage.
In another example, it is proposed to enable a six-tap interpolation filter only in the direction opposite to the direction in which AMPR is applied, and apply a default eight-tap filter to the interpolation process in the direction in which AMPR is applied.
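As a sketch of these switching rules, the helper below picks the filter taps per direction; the enable flags and the rule names ("same", "opposite") are illustrative labels for the examples above, not terminology from the specification.

```python
EIGHT_TAP, SIX_TAP = "8-tap", "6-tap"

def pick_filters(ampr_h, ampr_v, rule="same"):
    # rule "same": 6-tap in the direction(s) where AMPR is applied
    # rule "opposite": 6-tap in the direction opposite to where AMPR is applied
    if rule == "same":
        return (SIX_TAP if ampr_h else EIGHT_TAP,
                SIX_TAP if ampr_v else EIGHT_TAP)
    return (SIX_TAP if ampr_v else EIGHT_TAP,
            SIX_TAP if ampr_h else EIGHT_TAP)

# e.g., AMPR enabled horizontally only, rule "same":
# pick_filters(True, False) -> ("6-tap", "8-tap")
```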
A second aspect relates to the bit-width control of the multipliers used for AMPR. The additional multiplication operations involved in AMPR can potentially increase the worst-case bit width of the multiplications in affine mode. To reduce the multiplication complexity, it is proposed to apply certain right-shift operations, i.e., to reduce the input dynamic range, to the input parameters of the AMPR multiplications. Taking secondary-filter-based AMPR as an example, in one example, the dynamic range of the input horizontal and vertical ΔMV values, i.e., Δx and Δy, is reduced by applying a right shift prior to the multiplication. In another example, the dynamic range of the input prediction samples is reduced prior to the multiplication.
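The sketch below illustrates the idea for the ΔMV inputs; the shift amount and the rounding offset are illustrative assumptions rather than values fixed by the scheme.

```python
def narrow_dmv(dmv, shift=2):
    # reduce the dynamic range of a signed delta-MV before multiplication
    offset = 1 << (shift - 1) if shift > 0 else 0
    return (dmv + offset) >> shift

# e.g., a 6-bit signed delta-MV becomes roughly 4-bit after shift=2,
# trimming the product bit width by the same two bits
```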
Encoder-side optimization
In the AVS3 standard, the affine UMVE mode is computationally intensive for the encoder, as it involves selecting the best distance index for each merge-mode candidate. When calculating the sum of absolute transformed differences (SATD) for each candidate distance index, conventional affine motion compensation is always applied. If AMPR is applied on top of affine motion compensation, the computation may increase dramatically.
In one example, the AMPR operation is skipped during the SATD-based cost calculation for the affine UMVE mode at the encoder side. It has been found through experiments that although the best index is selected based on the best SATD cost, whether AMPR is applied during the SATD calculation generally does not change the ranking of the SATD costs. Thus, with the proposed method, enabling the AMPR mode does not incur significant encoder complexity for the affine UMVE mode.
Motion estimation is another major overhead on the encoder side. In another example, the AMPR procedure may be skipped depending on certain conditions. These conditions indicate that the best coding mode of the CU is unlikely to be an affine mode after the mode selection process.
One example of such a condition is whether the current CU has a parent CU that has been determined to be coded in explicit affine mode or affine merge mode. This is due to the strong correlation of coding mode selection between a CU and its parent CU: if the above condition is true, the best coding mode for the current CU is more likely also an explicit affine mode.
Another exemplary condition for enabling AMPR is whether the parent CU of the current CU has been determined to be inter predicted using explicit affine mode. If true, AMPR is applied during affine motion estimation of the current CU; otherwise, AMPR is skipped during affine motion estimation of the current CU.
Small-size CUs, such as 16x16 CUs, have a much higher average per-pixel computation cost when AMPR is applied than large-size CUs, such as 64x64 CUs. To effectively save computational complexity, in another example of the present disclosure, AMPR may be skipped for small-size CUs during motion estimation. The size of a CU may be defined as its total number of pixels. A pixel-count threshold, such as 16x16, 16x32, or 32x32, may be defined, and for blocks whose size is smaller than the defined threshold, AMPR may be skipped during the affine motion estimation process for the block.
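A one-line sketch of this size gate during affine motion estimation; the 16x32 threshold below is one of the example values above, chosen here only for illustration.

```python
def ampr_allowed_in_me(cu_w, cu_h, min_pixels=16 * 32):
    # skip AMPR in motion estimation for CUs below the pixel-count threshold
    return cu_w * cu_h >= min_pixels
```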
Encoder-side optimization may also be performed in the case of an implementation that simulates AMPR by using the filtering process described above. For example, the filtering process may not be performed for the affine UMVE mode. Another encoder optimization for the AMPR-based filtering process is to check whether the parent CU of the current CU has been determined to be inter predicted with explicit affine mode. If so, the AMPR filtering process is applied during affine motion estimation of the current CU; otherwise, the filtering process is skipped during affine motion estimation of the current CU.
Fig. 14 is a block diagram illustrating an apparatus for AMPR in accordance with some implementations of the present disclosure. The apparatus 1400 may be a terminal such as a mobile phone, tablet, digital broadcast terminal, tablet device, or personal digital assistant.
As shown in fig. 14, the apparatus 1400 may include one or more of the following: a processing component 1402, a memory 1404, a power component 1406, a multimedia component 1408, an audio component 1410, an input/output (I/O) interface 1412, a sensor component 1414, and a communication component 1416.
The processing component 1402 generally controls overall operations of the device 1400, such as operations related to display, telephone calls, data communications, camera operations, and recording operations. The processing unit 1402 may include one or more processors 1420 for executing instructions to perform all or part of the steps of the above methods. Further, the processing component 1402 may include one or more modules that facilitate interactions between the processing component 1402 and other components. For example, processing component 1402 may include multimedia modules that facilitate interactions between multimedia component 1408 and processing component 1402.
The memory 1404 is configured to store different types of data that support the operation of the apparatus 1400. Examples of such data include instructions for any application or method operating on the apparatus 1400, contact data, phonebook data, messages, pictures, video, and so forth. The memory 1404 may be implemented by any type or combination of volatile or nonvolatile memory devices, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power supply component 1406 provides power to the different components of the device 1400. Power supply component 1406 can include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for device 1400.
Multimedia component 1408 includes a screen that provides an output interface between device 1400 and a user. In some examples, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen that receives input signals from a user. The touch panel may include one or more touch sensors for sensing touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or sliding action, but also the duration and pressure associated with the touch or sliding operation. In some examples, the multimedia component 1408 can include a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 1400 is in an operational mode, such as a photographing mode or a video mode.
The audio component 1410 is configured to output and/or input audio signals. For example, the audio component 1410 includes a Microphone (MIC). The microphone is configured to receive external audio signals when the apparatus 1400 is in an operation mode such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may further be stored in the memory 1404 or transmitted via the communication component 1416. In some examples, the audio component 1410 further includes a speaker for outputting audio signals.
I/O interface 1412 provides an interface between processing unit 1402 and peripheral interface modules. The above peripheral interface module may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 1414 includes one or more sensors that provide status assessment of various aspects of the apparatus 1400. For example, the sensor component 1414 may detect the on/off state of the device 1400 and the relative positioning of the components. These components are, for example, the display and keypad of device 1400. The sensor component 1414 can also detect a change in position of the device 1400 or a component of the device 1400, the presence or absence of user contact on the device 1400, an orientation or acceleration/deceleration of the device 1400, and a change in temperature of the device 1400. The sensor component 1414 may include a proximity sensor configured to detect the presence of nearby objects without any physical touch. The sensor component 1414 may further include an optical sensor such as a CMOS or CCD image sensor used in imaging applications. In some examples, the sensor component 1414 may further include an acceleration sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1416 is configured to facilitate wired or wireless communication between the apparatus 1400 and other devices. The device 1400 may access the wireless network based on a communication standard such as WiFi, 4G, or a combination thereof. In one example, the communication component 1416 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one example, the communication component 1416 may further include a Near Field Communication (NFC) module for facilitating short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In one example, the apparatus 1400 may be implemented by one or more of the following: an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components that perform the above methods.
A non-transitory computer readable storage medium may be, for example, a Hard Disk Drive (HDD), a Solid State Drive (SSD), flash memory, a hybrid drive, or a Solid State Hybrid Drive (SSHD), read-only memory (ROM), compact disk read-only memory (CD-ROM), magnetic tape, floppy disk, etc.
Fig. 15 is a flow chart illustrating a process for AMPR in accordance with some implementations of the present disclosure.
In step 1502, the processor 1420 determines whether AMPR is applied to the CU.
In step 1504, the processor 1420 adjusts coefficients of a motion compensated interpolation filter used to generate intermediate prediction samples in response to determining that AMPR is applied to the CU.
In some examples, the coefficients of the motion compensated interpolation filter are adjusted by obtaining the motion compensated interpolation filter based on a conventional affine mode interpolation filter. For example, the conventional affine mode interpolation filter may be an eight-tap filter, and the derived motion compensated interpolation filter may be a six-tap filter used for the secondary-filtering-based AMPR.
In some examples, one coefficient of the motion compensated interpolation filter may be obtained by adding a plurality of coefficients of a conventional affine mode interpolation filter. For example, a six-tap filter may be obtained from the default eight-tap filter by separately adding the two leftmost and/or rightmost filter coefficients of the eight-tap filter to one single coefficient, as shown in table 6.
In some examples, coefficients of the motion compensated interpolation filter may be obtained based on a DCT kernel, as shown in table 7.
In some examples, processor 1420 may apply motion compensated interpolation filters with adjusted coefficients in the horizontal and vertical directions in place of conventional affine mode interpolation filters for affine motion compensation.
In some examples, processor 1420 may apply a motion compensated interpolation filter with adjusted coefficients in a first direction and a conventional affine mode interpolation filter in a second direction when AMPR is enabled in the first direction and AMPR is disabled in the second direction, where the first and second directions may be horizontal and vertical directions, or vertical and horizontal directions, respectively.
In some examples, when AMPR is disabled in a first direction and AMPR is enabled in a second direction, a motion compensated interpolation filter with adjusted coefficients is applied in the first direction and in the second direction, where the first and second directions may be horizontal and vertical directions, respectively, or vertical and horizontal directions.
In some examples, processor 1420 may apply one or more shift operations to the input parameters of the AMPR in response to determining that the AMPR is applied to the CU.
In some examples, the processor 1420 may apply a shift operation to parameters related to MV differences in the horizontal and vertical directions. For example, these shift operations may include a right shift operation.
In some examples, processor 1420 may apply a shift operation to the input prediction samples. For example, these shift operations may include a right shift operation.
Fig. 16 is a flow chart illustrating a process for predicting a sample at a pixel location in a sub-block by implementing AMPR in accordance with some implementations of the disclosure.
In step 1602, the processor 1420 generates a plurality of affine motion compensated predictions at pixel locations in the sub-block and a plurality of neighboring pixel locations.
For example, the pixel location may be a location (i, j) within the sub-block, and the affine motion compensated prediction at the pixel location (i, j) may be the sub-block prediction I(i, j) generated in the first step of the AMPR method. The neighboring pixel locations may include pixel locations adjacent to the pixel location (i, j). For example, as shown in fig. 12, the integer samples LT, RT, LB, and RB are at neighboring pixel positions relative to the fractional position.
In step 1604, the processor 1420 obtains the initialized coefficients for the filter by initializing the coefficients for the filter to a set of predefined values. The initialization coefficients of the filter may be obtained to simulate the gradient computation process.
For example, the coefficients of the filter are initialized to predefined empirical numbers as shown in equation (13) or as shown in fig. 10A-10B.
In some examples, the set of predefined values may include a set of integers predefined offline. For example, the coefficients of the filter may be initialized using the predefined set of integers.
In some examples, the filter may be cross-shaped or square-shaped.
In step 1606, the processor 1420 scales the coefficients using one or more online scaling factors to obtain a scaling filter.
For example, the one or more online scaling factors are dynamically derived from the sample-by-sample MV difference Δv (i, j). In some examples, the scaling filter may be a scaling diamond filter as shown in fig. 11A-11B.
In step 1608, the processor 1420 obtains, at the pixel location, a refined prediction of the sample at the pixel location in the sub-block based on the multiple affine motion compensated predictions using a scaling filter.
For example, instead of performing multiplication of gradient values and motion vector differences at each sample position in the AMPR as shown in equation (4), a refined prediction of the sample at pixel position (i, j) is directly calculated using a scaling filter based on the multiple affine motion compensation predictions obtained in step 1602.
In some examples, processor 1420 may dynamically determine one or more online scaling factors from samples at pixel locations in a sub-block. For example, instead of determining (i.e., predefining) one or more online scaling factors offline, one or more online scaling factors may be dynamically determined from MV differences at each of a plurality of adjacent pixel locations.
Fig. 17 is a flow chart illustrating a process for AMPR in accordance with some implementations of the present disclosure.
In step 1702, the processor 1420 determines whether the neighboring sample locations in the sub-block relative to the sample location are outside of the sub-block.
For example, the sample position may be a position (i, j) within a sub-block. As shown in fig. 12-13, the adjacent sample positions may be positions relative to the sample points (i, j).
In step 1704, in response to determining that the neighboring sample points are outside of the sub-block, for one filter-based AMPR implementation, the processor 1420 determines the fill sample points based on the one or more reference sample points at the one or more integer locations and copies the fill sample points to the neighboring sample point locations.
In some examples, processor 1420 may determine the reference samples at integer positions in the reference picture and determine the reference samples as padding samples.
In some examples, the integer positions may include the integer position to the upper left of the prediction sample corresponding to the sample, the integer position to the upper right, the integer position to the lower left, or the integer position to the lower right, as shown by the integer samples LT, RT, LB, and RB in fig. 12. The prediction sample may be at a fractional position, as shown in fig. 12.
In some examples, processor 1420 may determine an average sample based on the plurality of reference samples at the integer locations and determine the average sample as the fill sample.
In some examples, the average sample point may comprise an average of multiple reference samples or a weighted average of multiple reference samples.
In some examples, the plurality of reference samples may include the integer reference samples to the upper left, upper right, lower left, and lower right of the prediction sample corresponding to the sample, as shown by the integer samples LT, RT, LB, and RB in fig. 12. The prediction sample may be at a fractional position, as shown in fig. 12.
In some examples, an apparatus for AMPR is provided. The apparatus includes one or more processors 1420 and a memory 1404 configured to store instructions executable by the one or more processors; wherein upon execution of the instructions, the processor is configured to perform any of the methods as described in fig. 15 and above.
In some examples, an apparatus for AMPR is provided. The apparatus includes one or more processors 1420 and a memory 1404 configured to store instructions executable by the one or more processors; wherein upon execution of the instructions, the processor is configured to perform any of the methods as described in fig. 16 and above.
In some examples, an apparatus for AMPR is provided. The apparatus includes one or more processors 1420 and a memory 1404 configured to store instructions executable by the one or more processors; wherein upon executing the instructions the processor is configured to perform any of the methods as described in fig. 17 and above.
In some other examples, a non-transitory computer-readable storage medium 1404 having instructions stored therein is provided. These instructions, when executed by one or more processors 1420, cause the processors to perform any of the methods as described above and in fig. 15.
In some other examples, a non-transitory computer-readable storage medium 1404 having instructions stored therein is provided. These instructions, when executed by one or more processors 1420, cause the processors to perform any of the methods as described in fig. 16 and above.
In some other examples, a non-transitory computer-readable storage medium 1404 having instructions stored therein is provided. These instructions, when executed by one or more processors 1420, cause the processors to perform any of the methods as described in fig. 17 and above.
The description of the present disclosure has been presented for purposes of illustration and is not intended to be exhaustive or limited to the disclosure. Many modifications, variations and alternative implementations will become apparent to those skilled in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings.
The examples were chosen and described in order to explain the principles of the present disclosure and to enable others of ordinary skill in the art to understand the present disclosure for various implementations and with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of the disclosure should not be limited to the specific examples of implementations disclosed and that modifications and other implementations are intended to be included within the scope of the disclosure.

Claims (28)

1. A method for affine motion compensated prediction refinement (AMPR), comprising:
determining whether the AMPR is applied to a Coding Unit (CU); and
In response to determining that the AMPR is applied to the CU, coefficients of a motion compensated interpolation filter used to generate intermediate prediction samples are adjusted.
2. The method of claim 1, wherein adjusting coefficients of a motion compensated interpolation filter used to generate intermediate prediction samples comprises:
the motion compensated interpolation filter is obtained based on a conventional affine mode interpolation filter.
3. The method of claim 2, wherein adjusting coefficients of a motion compensated interpolation filter used to generate intermediate prediction samples comprises:
one coefficient of the motion compensation interpolation filter is obtained by adding a plurality of coefficients of the conventional affine mode interpolation filter.
4. The method of claim 1, wherein adjusting coefficients of a motion compensated interpolation filter used to generate intermediate prediction samples comprises:
coefficients of the motion compensated interpolation filter are obtained based on a Discrete Cosine Transform (DCT) kernel.
5. The method of claim 1, further comprising:
the motion compensated interpolation filter with adjusted coefficients is applied in the horizontal and vertical directions instead of the conventional affine mode interpolation filter for affine motion compensation.
6. The method of claim 1, further comprising:
the motion compensated interpolation filter with adjusted coefficients is applied in a first direction and the regular affine mode interpolation filter is applied in a second direction.
7. The method of claim 6 wherein the AMPR is enabled in the first direction and the AMPR is disabled in the second direction, and the first direction and the second direction comprise a horizontal direction and a vertical direction.
8. The method of claim 6 wherein the AMPR is disabled in the first direction and the AMPR is enabled in the second direction, and the first direction and the second direction comprise a horizontal direction and a vertical direction.
9. The method of claim 1, further comprising:
in response to determining that the AMPR is applied to the CU, one or more shift operations are applied to input parameters of the AMPR.
10. The method of claim 9, wherein applying the one or more shift operations to the input parameters comprises: the shift operation is applied to parameters related to a Motion Vector (MV) difference value in the horizontal direction and the vertical direction.
11. The method of claim 1, further comprising:
In response to determining that the AMPR is applied to the CU, a shift operation is applied to input prediction samples.
12. A method for predicting samples at pixel locations in a sub-block by implementing affine motion compensated prediction refinement (AMPR), comprising:
generating a plurality of affine motion compensated predictions at the pixel location and a plurality of neighboring pixel locations in the sub-block;
obtaining initialized coefficients of a filter by initializing the coefficients of the filter to a set of predefined values;
scaling the initialization coefficients using one or more online scaling factors to obtain a scaled filter; and
at the pixel location, a refined prediction of the sample at the pixel location in the sub-block is obtained based on the plurality of affine motion compensated predictions using the scaling filter.
13. The method of claim 12, wherein the set of predefined values comprises a set of integers predefined offline.
14. The method of claim 12, further comprising:
the one or more online scaling factors are dynamically determined from the samples at the pixel locations in the sub-block.
15. The method of claim 14, wherein dynamically determining the one or more online scaling factors from the samples at the pixel locations in the sub-block comprises:
The one or more online scaling factors are dynamically determined from Motion Vector (MV) differences at each of the plurality of adjacent pixel locations.
16. The method of claim 12, wherein the filter comprises a cross or square shape.
17. A method for affine motion compensated prediction refinement (AMPR), comprising:
determining whether neighboring sample locations relative to sample locations in a sub-block are outside the sub-block; and
in response to determining that the neighboring sample point is outside the sub-block, a fill sample point is determined based on one or more reference sample points at one or more integer locations, and the fill sample point is copied to the neighboring sample point for filter-based AMPR implementation.
18. The method of claim 17, further comprising:
determining a reference sample point at an integer position in a reference picture; and
and determining the reference sample point as the filling sample point.
19. The method of claim 18, wherein the integer positions comprise an integer position to the upper left of the predicted sample corresponding to the sample, an integer position to the upper right of the predicted sample, an integer position to the lower left of the predicted sample, or an integer position to the lower right of the predicted sample.
20. The method of claim 17, further comprising:
determining an average sample point based on the plurality of reference sample points at the integer positions; and
and determining the average sampling point as the filling sampling point.
21. The method of claim 20, wherein the average sample point comprises an average of the plurality of reference sample points or a weighted average of the plurality of reference sample points.
22. The method of claim 21, wherein the plurality of reference samples comprises integer reference samples at the positions to the upper left, upper right, lower left, and lower right of the predicted sample corresponding to the sample.
23. An apparatus for affine motion compensated prediction refinement (AMPR), comprising:
one or more processors; and
a memory configured to store instructions executable by the one or more processors;
wherein the one or more processors are configured, when executing the instructions, to perform the method of any of claims 1-11.
24. An apparatus for predicting samples at pixel locations in a sub-block by implementing affine motion compensated prediction refinement (AMPR), comprising:
One or more processors; and
a memory configured to store instructions executable by the one or more processors;
wherein the one or more processors are configured, when executing the instructions, to perform the method of any of claims 12-16.
25. An apparatus for affine motion compensated prediction refinement (AMPR), comprising:
one or more processors; and
a memory configured to store instructions executable by the one or more processors;
wherein the one or more processors are configured, when executing the instructions, to perform the method of any of claims 17-22.
26. A non-transitory computer-readable storage medium storing computer-executable instructions for affine motion compensated prediction refinement (AMPR), which when executed by one or more computer processors, cause the one or more computer processors to perform the method of any one of claims 1-11.
27. A non-transitory computer-readable storage medium storing computer-executable instructions for predicting samples at pixel locations in sub-blocks by implementing affine motion compensated prediction refinement (AMPR), which when executed by one or more computer processors, cause the one or more computer processors to perform the method of any of claims 12-16.
28. A non-transitory computer-readable storage medium storing computer-executable instructions for affine motion compensated prediction refinement (AMPR), which when executed by one or more computer processors, cause the one or more computer processors to perform the method of any one of claims 17-22.
CN202180070773.5A 2020-10-14 2021-10-14 Method and apparatus for affine motion compensated prediction refinement Pending CN116491120A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202063091892P 2020-10-14 2020-10-14
US63/091892 2020-10-14
PCT/US2021/055032 WO2022081878A1 (en) 2020-10-14 2021-10-14 Methods and apparatuses for affine motion-compensated prediction refinement

Publications (1)

Publication Number Publication Date
CN116491120A true CN116491120A (en) 2023-07-25

Family

ID=81209466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180070773.5A Pending CN116491120A (en) 2020-10-14 2021-10-14 Method and apparatus for affine motion compensated prediction refinement

Country Status (2)

Country Link
CN (1) CN116491120A (en)
WO (1) WO2022081878A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116669104A (en) * 2023-07-24 2023-08-29 南京创芯慧联技术有限公司 Data transmission compression method, device, computer equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8265163B2 (en) * 2001-12-21 2012-09-11 Motorola Mobility Llc Video shape padding method
CN112236995A (en) * 2018-02-02 2021-01-15 苹果公司 Multi-hypothesis motion compensation techniques
WO2020125628A1 (en) * 2018-12-17 2020-06-25 Beijing Bytedance Network Technology Co., Ltd. Shape dependent interpolation filter
WO2020187198A1 (en) * 2019-03-17 2020-09-24 Beijing Bytedance Network Technology Co., Ltd. Prediction refinement based on optical flow
AU2020240048B2 (en) * 2019-03-18 2022-12-22 Tencent America LLC Method and apparatus for video coding

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116669104A (en) * 2023-07-24 2023-08-29 南京创芯慧联技术有限公司 Data transmission compression method, device, computer equipment and storage medium
CN116669104B (en) * 2023-07-24 2023-09-29 南京创芯慧联技术有限公司 Data transmission compression method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
WO2022081878A1 (en) 2022-04-21

Similar Documents

Publication Publication Date Title
CN113784132B (en) Method and apparatus for motion vector rounding, truncation, and storage for inter prediction
CN117221532B (en) Method, apparatus and storage medium for video decoding
CN115280779A (en) Method and apparatus for affine motion compensated prediction refinement
JP2024016288A (en) Method and apparatus for decoder side motion vector correction in video coding
CN114128263A (en) Method and apparatus for adaptive motion vector resolution in video coding and decoding
CN116491120A (en) Method and apparatus for affine motion compensated prediction refinement
CN116171576A (en) Method and apparatus for affine motion compensated prediction refinement
CN114342390B (en) Method and apparatus for prediction refinement for affine motion compensation
US20240098290A1 (en) Methods and devices for overlapped block motion compensation for inter prediction
WO2021248135A1 (en) Methods and apparatuses for video coding using satd based cost calculation
CN117083861A (en) Overlapped block motion compensation for inter prediction
WO2021007133A1 (en) Methods and apparatuses for decoder-side motion vector refinement in video coding
WO2021021698A1 (en) Methods and apparatuses for decoder-side motion vector refinement in video coding
CN114402618A (en) Method and apparatus for decoder-side motion vector refinement in video coding and decoding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination