CN114556944A - Inter-frame prediction method, encoder, decoder, and computer storage medium

Publication number: CN114556944A
Authority: CN (China)
Prior art keywords: prediction, sub-block, motion vector, block, determining
Legal status: Pending
Application number: CN202180005841.XA
Other languages: Chinese (zh)
Inventor: 杨宁
Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202210644545.0A (publication CN114866783A)
Publication of CN114556944A

Classifications

    • H04N Pictorial communication, e.g. television
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/182 Adaptive coding characterised by the coding unit, the unit being a pixel
    • H04N19/513 Motion estimation or motion compensation: processing of motion vectors
    • H04N19/523 Motion estimation or motion compensation with sub-pixel accuracy
    • H04N19/533 Motion estimation using multistep search, e.g. 2D-log search or one-at-a-time search [OTS]


Abstract

Embodiments of the present application disclose an inter-frame prediction method, an encoder, a decoder, and a computer storage medium. The decoder parses the code stream to obtain a prediction mode parameter of the current block; when the prediction mode parameter indicates that the inter prediction value of the current block is determined using an inter prediction mode, it determines a first motion vector of a sub-block of the current block, where the current block comprises a plurality of sub-blocks; based on the first motion vector, it determines a first prediction value of the sub-block and the motion vector deviation between each pixel position and the sub-block, where a pixel position is the position of a pixel point within the sub-block; it then determines the filter coefficients of a two-dimensional filter according to the motion vector deviation, where the two-dimensional filter performs secondary prediction according to a preset shape; finally, it determines a second prediction value of the sub-block based on the filter coefficients and the first prediction value, and takes the second prediction value as the inter prediction value of the sub-block.

Description

Inter-frame prediction method, encoder, decoder, and computer storage medium
Cross Reference to Related Applications
The present application claims priority to Chinese patent application No. 202010746227.6, filed on July 29, 2020 and entitled "Inter prediction method, encoder, decoder, and computer storage medium", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of video encoding and decoding technologies, and in particular, to an inter-frame prediction method, an encoder, a decoder, and a computer storage medium.
Background
In the field of video encoding and decoding, in order to balance performance and cost, affine prediction in Versatile Video Coding (VVC) and in China's Audio Video coding Standard (AVS) is generally implemented on a sub-block basis. Recently, prediction refinement with optical flow (PROF) has been proposed; it uses the optical flow principle to correct sub-block-based affine prediction and thereby improves compression performance.
However, the optical flow calculation used by PROF is only effective when the deviation between the motion vector of a pixel position within the sub-block and the sub-block motion vector is very small. Because PROF depends on the horizontal and vertical gradients at the reference position, those gradients cannot truly reflect the gradients between the reference position and the actual position when the actual position is far from the reference position; the method is therefore much less effective when the deviation between the motion vector of a pixel position within the sub-block and the sub-block motion vector is large.
Disclosure of Invention
The present application provides an inter-frame prediction method, an encoder, a decoder, and a computer storage medium, which can greatly improve coding performance and thereby improve coding and decoding efficiency.
The technical solutions of the present application are implemented as follows:
in a first aspect, an embodiment of the present application provides an inter-frame prediction method applied to a decoder, where the method includes:
analyzing the code stream to obtain the prediction mode parameter of the current block;
determining a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode; wherein the current block comprises a plurality of sub-blocks;
determining, based on the first motion vector, a first prediction value of the sub-block, and determining a motion vector deviation between the pixel position and the sub-block; wherein the pixel position is the position of a pixel point within the sub-block;
determining a filter coefficient of a two-dimensional filter according to the motion vector deviation; wherein the two-dimensional filter is used for performing secondary prediction processing according to a preset shape;
determining a second prediction value of the sub-block based on the filter coefficient and the first prediction value, and determining the second prediction value as the inter prediction value of the sub-block.
In a second aspect, an embodiment of the present application provides an inter-frame prediction method applied to an encoder, where the method includes:
determining a prediction mode parameter of a current block;
determining a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode; wherein the current block comprises a plurality of sub-blocks;
determining, based on the first motion vector, a first prediction value of the sub-block, and determining a motion vector deviation between the pixel position and the sub-block; wherein the pixel position is the position of a pixel point within the sub-block;
determining a filter coefficient of a two-dimensional filter according to the motion vector deviation; wherein the two-dimensional filter is configured to perform secondary prediction according to a preset shape;
determining a second prediction value of the sub-block based on the filter coefficient and the first prediction value, and determining the second prediction value as the inter prediction value of the sub-block.
In a third aspect, an embodiment of the present application provides a decoder, where the decoder includes a parsing portion and a first determining portion;
the parsing portion is configured to parse the code stream to obtain the prediction mode parameter of the current block;
the first determining portion is configured to: determine a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, wherein the current block comprises a plurality of sub-blocks; determine, based on the first motion vector, a first prediction value of the sub-block and a motion vector deviation between the pixel position and the sub-block, wherein the pixel position is the position of a pixel point within the sub-block; determine a filter coefficient of a two-dimensional filter according to the motion vector deviation, wherein the two-dimensional filter is used for performing secondary prediction processing according to a preset shape; and determine a second prediction value of the sub-block based on the filter coefficient and the first prediction value, and determine the second prediction value as the inter prediction value of the sub-block.
In a fourth aspect, an embodiment of the present application provides a decoder comprising a first processor and a first memory storing instructions executable by the first processor; when the instructions are executed, the first processor implements the inter prediction method described above.
In a fifth aspect, an embodiment of the present application provides an encoder, which includes a second determining portion;
the second determining portion is configured to: determine a prediction mode parameter of the current block; determine a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, wherein the current block comprises a plurality of sub-blocks; determine, based on the first motion vector, a first prediction value of the sub-block and a motion vector deviation between the pixel position and the sub-block, wherein the pixel position is the position of a pixel point within the sub-block; determine a filter coefficient of a two-dimensional filter according to the motion vector deviation, wherein the two-dimensional filter is configured to perform secondary prediction according to a preset shape; and determine a second prediction value of the sub-block based on the filter coefficient and the first prediction value, and determine the second prediction value as the inter prediction value of the sub-block.
In a sixth aspect, an embodiment of the present application provides an encoder comprising a second processor and a second memory storing instructions executable by the second processor; when the instructions are executed, the second processor implements the inter prediction method described above.
In a seventh aspect, an embodiment of the present application provides a computer storage medium storing a computer program; when the computer program is executed by a first processor or a second processor, the inter prediction method described above is implemented.
According to the inter-frame prediction method, the encoder, the decoder, and the computer storage medium provided by the embodiments of the present application, the decoder parses the code stream to obtain the prediction mode parameter of the current block; determines a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that the inter prediction value of the current block is determined using the inter prediction mode, where the current block comprises a plurality of sub-blocks; determines, based on the first motion vector, a first prediction value of the sub-block and the motion vector deviation between the pixel position and the sub-block, where the pixel position is the position of a pixel point within the sub-block; determines the filter coefficients of the two-dimensional filter according to the motion vector deviation, where the two-dimensional filter performs secondary prediction according to a preset shape; and determines a second prediction value of the sub-block based on the filter coefficients and the first prediction value, taking the second prediction value as the inter prediction value of the sub-block. The encoder first determines the prediction mode parameter of the current block and then proceeds in the same way: it determines the first motion vector of each sub-block, determines the first prediction value and the motion vector deviation based on the first motion vector, determines the filter coefficients of the two-dimensional filter according to the motion vector deviation, and determines the second prediction value as the inter prediction value of the sub-block. That is, with the inter prediction method proposed in the present application, after sub-block-based prediction, point-based secondary prediction can be performed on top of the sub-block-based first prediction value for those pixel positions whose motion vectors deviate from the sub-block motion vector, yielding the second prediction value, as sketched below. The inter prediction method provided by the present application is well suited to all scenarios, greatly improves coding performance, and thus improves coding and decoding efficiency.
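To make the two-stage structure concrete, the following is a minimal sketch of point-based secondary prediction. It is not the patent's normative algorithm: the 2x2 bilinear coefficient derivation shown here is a hypothetical choice made only for illustration, and all names are illustrative.

#include <stdint.h>

/* Hypothetical sketch: derive 2x2 filter coefficients from the motion vector
 * deviation (dmv_x, dmv_y), given in 1/16-pixel units, and filter the
 * sub-block-based first prediction 'first' (assumed padded so that all
 * accessed neighbors exist) to obtain the second prediction of one pixel. */
static int16_t second_pred(const int16_t *first, int stride,
                           int x, int y, int dmv_x, int dmv_y)
{
    int ix = dmv_x >> 4, iy = dmv_y >> 4;   /* integer part (arithmetic shift assumed) */
    int fx = dmv_x & 15, fy = dmv_y & 15;   /* fractional part, 0..15 */
    const int16_t *p = first + (y + iy) * stride + (x + ix);
    /* 2x2 bilinear weights derived from the fractional deviation; sum = 256 */
    int w00 = (16 - fx) * (16 - fy), w01 = fx * (16 - fy);
    int w10 = (16 - fx) * fy,        w11 = fx * fy;
    return (int16_t)((w00 * p[0] + w01 * p[1] +
                      w10 * p[stride] + w11 * p[stride + 1] + 128) >> 8);
}

The preset shape here is a 2x2 square; the patent's actual two-dimensional filter shape and coefficient derivation are defined by its claims and embodiments, not by this sketch.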
Drawings
FIG. 1 is a diagram of an affine model one;
FIG. 2 is a second schematic diagram of an affine model;
FIG. 3 is a schematic diagram of interpolation of a pixel;
FIG. 4 is a first schematic diagram of sub-block interpolation;
FIG. 5 is a second schematic diagram of sub-block interpolation;
FIG. 6 is a diagram of motion vectors for each sub-block;
FIG. 7 is a schematic view of a sample position;
FIG. 8 is a diagram illustrating the location of affine prediction luma samples;
FIG. 9 is a schematic diagram of the positions of affine predicted integer-pixel samples and sub-pixel samples;
FIG. 10 is a diagram illustrating the locations of affine predicted chroma samples;
FIG. 11 is a schematic diagram of the positions of affine predicted integer-pixel samples and sub-pixel samples;
FIG. 12 is a schematic diagram of a flow of affine prediction;
FIG. 13 is a first flowchart illustrating a process of correcting a prediction value by a PROF;
FIG. 14 is a second flowchart of the PROF correction prediction value;
fig. 15 is a block diagram illustrating a video coding system according to an embodiment of the present application;
fig. 16 is a block diagram illustrating a video decoding system according to an embodiment of the present application;
FIG. 17 is a first flowchart illustrating an implementation of an inter-frame prediction method;
FIG. 18 is a first schematic diagram of a two-dimensional filter;
FIG. 19 is a second schematic diagram of a two-dimensional filter;
FIG. 20 is a flowchart illustrating a second implementation of the inter-frame prediction method;
FIG. 21 is a third flowchart illustrating an implementation of an inter-frame prediction method;
FIG. 22 is a fourth flowchart illustrating an implementation of the inter-frame prediction method;
FIG. 23 is a schematic diagram of filtering boundary pixel locations;
FIG. 24 is an expanded view of the boundary;
FIG. 25 is a fifth flowchart illustrating an implementation of the inter-frame prediction method;
FIG. 26 is a first block diagram of a decoder;
FIG. 27 is a second block diagram of the decoder;
FIG. 28 is a first block diagram of the encoder;
fig. 29 is a schematic diagram of the second encoder.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant application and are not limiting of the application. It should be noted that, for the convenience of description, only the parts related to the related applications are shown in the drawings.
In a video image, a first image component, a second image component, and a third image component are generally used to characterize a current block (coding block, CB). The three image components are a luminance component, a blue chrominance component, and a red chrominance component; specifically, the luminance component is usually denoted by the symbol Y, the blue chrominance component by Cb or U, and the red chrominance component by Cr or V. Thus, the video image can be represented in the YCbCr format or in the YUV format.
Current mainstream video coding standards adopt a block-based hybrid coding framework. Each frame of a video image is divided into square largest coding units (LCUs) of identical size (e.g., 128x128 or 64x64), and each largest coding unit may be further divided into rectangular coding units (CUs) according to certain rules; a coding unit may in turn be divided into smaller prediction units (PUs). Specifically, the hybrid coding framework may include modules such as prediction, transform, quantization, entropy coding, and in-loop filtering; the prediction module may include intra prediction and inter prediction, and inter prediction may include motion estimation and motion compensation. Because strong correlation exists between adjacent pixels within one frame of a video image, intra prediction can be used in video coding to eliminate the spatial redundancy between adjacent pixels; and because there is strong similarity between adjacent frames, inter prediction is used to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency. The following detailed description of the present application focuses on inter prediction.
Inter-frame prediction uses an already encoded/decoded frame to predict the part to be encoded/decoded in the current frame; in a block-based coding framework, that part is usually a coding unit or a prediction unit. The coding unit or prediction unit to be encoded/decoded is collectively referred to herein as the current block. Translational motion is a common and simple motion pattern in video, so prediction of translational motion is also a traditional prediction method in video coding. Translational motion in a video can be understood as a portion of content moving from a location in one frame to a location in another frame over time. A simple uni-directional prediction of translation can be represented by a motion vector (MV) between a certain frame and the current frame. Through motion information comprising a reference frame and a motion vector, the current block can find a reference block of the same size on the reference frame, and that reference block is taken as the prediction block of the current block. In ideal translational motion, the content of the current block undergoes no deformation or rotation and no change in luminance or color between frames; however, the content in a video does not always conform to this ideal situation. Bi-directional prediction can solve this problem to some extent. General bi-directional prediction refers to bi-directional translational prediction: using motion information comprising two reference frames and two motion vectors, two reference blocks of the same size as the current block are found on the two reference frames (which may be the same frame), and the two reference blocks are used to generate the prediction block of the current block, by averaging, weighted averaging, or other calculations, as illustrated in the sketch below.
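As a simple illustration of the last point, a sketch (not standard text; the weight scale and rounding offset are assumptions of this illustration) of combining the two reference blocks per pixel:

#include <stdint.h>

/* Sketch: combine two reference blocks into a bi-directional prediction
 * block by weighted averaging. Here w0 + w1 == 8 is assumed, so plain
 * averaging corresponds to w0 == w1 == 4. */
static void bi_pred(const int16_t *ref0, const int16_t *ref1,
                    int16_t *pred, int n, int w0, int w1)
{
    for (int i = 0; i < n; i++)
        pred[i] = (int16_t)((w0 * ref0[i] + w1 * ref1[i] + 4) >> 3);
}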
In this application, prediction may be considered part of motion compensation; some documents refer to what this application calls prediction as motion compensation, and refer to affine prediction as affine motion compensation.
Rotation, zooming, warping, and the like are also common changes in video, but ordinary translational prediction cannot handle them well, so affine prediction models have been applied in video codecs, e.g., in VVC and AVS3, whose affine prediction models are similar. Under rotation, enlargement, reduction, warping, or other deformation, the current block can no longer use the same MV for all points, so an MV needs to be derived for each point. The affine prediction model derives the MV of each point by calculation from a small number of parameters. The affine prediction models of VVC and AVS3 both use 2-control-point (4-parameter) and 3-control-point (6-parameter) models: the 2 control points are at the top-left and top-right corners of the current block, and the 3 control points are at the top-left, top-right, and bottom-left corners. Illustratively, fig. 1 and fig. 2 are schematic diagrams of affine models. Since each MV comprises an x-component and a y-component, 2 control points give 4 parameters and 3 control points give 6 parameters. The per-position MV derivation is sketched after this paragraph.
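For concreteness, a minimal sketch of per-position MV derivation from the control-point MVs (the familiar 4-/6-parameter affine equations, written in floating point for clarity; real codecs such as VVC and AVS3 use fixed-point arithmetic, and the names here are illustrative):

/* Sketch: derive the motion vector at position (x, y) inside a w x h block.
 * mv0 = top-left control point, mv1 = top-right, mv2 = bottom-left (only
 * used by the 6-parameter, 3-control-point model). */
typedef struct { double x, y; } MV;

static MV affine_mv(MV mv0, MV mv1, MV mv2, int w, int h,
                    double x, double y, int six_param)
{
    double dHorX = (mv1.x - mv0.x) / w, dHorY = (mv1.y - mv0.y) / w;
    double dVerX, dVerY;
    if (six_param) {        /* 3 control points */
        dVerX = (mv2.x - mv0.x) / h;
        dVerY = (mv2.y - mv0.y) / h;
    } else {                /* 4 parameters: rotation + zoom only */
        dVerX = -dHorY;
        dVerY = dHorX;
    }
    MV mv = { mv0.x + dHorX * x + dVerX * y,
              mv0.y + dHorY * x + dVerY * y };
    return mv;
}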
An MV can be derived for each pixel position according to the affine prediction model; each pixel position can then find its corresponding position in the reference frame, and if that position is not an integer pixel position, the value at the sub-pixel position must be obtained by interpolation. The interpolation methods used in video coding standards are usually implemented with finite impulse response (FIR) filters, which is costly. For example, AVS3 uses an 8-tap interpolation filter for the luminance component, with 1/4-pixel sub-pixel precision in normal mode and 1/16-pixel precision in affine mode. To interpolate each sub-pixel point at 1/16-pixel precision, 8 integer pixels in the horizontal direction by 8 in the vertical direction, i.e., 64 integer pixels, are needed. Fig. 3 is a schematic diagram of pixel interpolation: the circular pixel is the sub-pixel point to be obtained, the dark square pixel is the integer pixel position corresponding to that sub-pixel, and the vector between the two is the motion vector of the sub-pixel; the light square pixels are the pixels required to interpolate the circular sub-pixel position. To obtain the value of the sub-pixel position, the pixel values of this 8x8 region of light square pixels (which also includes the dark pixel position) are required.
In conventional translational prediction, the MV of every pixel position of the current block is the same. The concept of sub-blocks may further be introduced, with sub-block sizes such as 4x4 or 8x8. Fig. 4 shows the pixel region required to interpolate a 4x4 block, and fig. 5 shows the pixel region required to interpolate an 8x8 block.
If the MV of every pixel position in a sub-block is the same, the pixel positions in that sub-block can be interpolated together, sharing bandwidth, filters of the same phase, and intermediate values of the interpolation process. If instead each pixel uses its own MV, the bandwidth increases, filters of different phases may be needed, and intermediate values of the interpolation process cannot be shared.
Point-based affine prediction is costly; therefore, to balance performance and cost, affine prediction in VVC and AVS3 is implemented on a sub-block basis. The sub-block sizes in AVS3 are 4x4 and 8x8, while VVC uses a 4x4 sub-block size. Each sub-block has one MV, and the pixel positions within the sub-block share that MV, so all pixel positions inside the sub-block are interpolated uniformly. In this way, the motion compensation complexity of sub-block-based affine prediction is similar to that of other sub-block-based prediction methods.
It can be seen that in the sub-block-based affine prediction method, the pixel positions inside a sub-block share the same MV, where the shared MV is determined by taking the MV at the center of the current sub-block. For sub-blocks with an even number of pixels in the vertical and horizontal directions, such as 4x4 and 8x8, the center of the sub-block falls at a non-integer pixel position. The current standards therefore take an integer pixel position instead: for a 4x4 sub-block, the pixel position (2, 2) relative to the top-left corner; for an 8x8 sub-block, the pixel position (4, 4) relative to the top-left corner.
The affine prediction model can derive the MV of each pixel position from the control points (2 or 3 control points) used by the current block. In sub-block-based affine prediction, the MV calculated at the position described in the previous paragraph is used as the MV of the sub-block. Fig. 6 is a schematic diagram of the motion vector of each sub-block: to derive the motion vector of each sub-block, the motion vector of the center sample of each sub-block is calculated and rounded to 1/16 precision, and motion compensation is then performed. A sketch follows.
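Building on the affine_mv() sketch above, the sub-block MV selection can be illustrated as follows (the rounding rule shown is schematic; the exact rounding rule is standard-specific):

#include <math.h>

/* Sketch: take the affine MV at the sub-block's representative sample,
 * e.g. (2, 2) inside a 4x4 sub-block or (4, 4) inside an 8x8 sub-block,
 * and round it to 1/16-pel units. */
static void subblock_mv_16th(MV mv0, MV mv1, MV mv2, int w, int h,
                             int sb_x, int sb_y, int sb_size, int six_param,
                             int *mv_x_16th, int *mv_y_16th)
{
    int cx = sb_x + sb_size / 2;   /* (2,2) for 4x4, (4,4) for 8x8 */
    int cy = sb_y + sb_size / 2;
    MV mv = affine_mv(mv0, mv1, mv2, w, h, cx, cy, six_param);
    *mv_x_16th = (int)lround(mv.x * 16.0);   /* schematic rounding */
    *mv_y_16th = (int)lround(mv.y * 16.0);
}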
With the development of the technology, a method called prediction refinement with optical flow (PROF) was proposed. This technique can improve the prediction values of sub-block-based affine prediction without increasing bandwidth. After sub-block-based affine prediction is finished, the horizontal and vertical gradients of each predicted pixel are calculated. When PROF calculates gradients in VVC, a 3-tap filter [-1, 0, 1] is used, with the same calculation method as bi-directional optical flow (BDOF). Then, for each pixel position, its motion vector deviation is calculated, i.e., the difference between the motion vector of the current pixel position and the MV used by the whole sub-block. These motion vector deviations can all be calculated from the formula of the affine prediction model. Owing to the structure of that formula, the motion vector deviations at corresponding positions of certain sub-blocks are identical; for those sub-blocks, only one set of motion vector deviations needs to be calculated, and the other sub-blocks can reuse the values directly. For each pixel position, the horizontal and vertical gradients and the motion vector deviation (comprising a horizontal component and a vertical component) are used to calculate a correction value for the prediction value at that position; the corrected prediction value is then obtained by adding this correction value to the original prediction value, i.e., the sub-block-based affine prediction value.
When calculating the horizontal and vertical gradients, the [-1, 0, 1] filter is used; that is, for the current pixel position, the prediction values at the positions one pixel to the left and one pixel to the right are used in the horizontal direction, and the prediction values one pixel above and one pixel below are used in the vertical direction. If the current pixel position lies on the boundary of the current block, some of these positions may lie one pixel beyond the boundary. Those positions are filled with the prediction value at the boundary of the current block so that the gradient calculation can proceed, and no additional prediction values beyond the boundary need to be fetched. Since the gradient calculation only uses the sub-block-based affine prediction values, no additional bandwidth is required. A sketch of the per-pixel refinement is given below.
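The following sketch shows this well-known PROF refinement step (a first-order optical-flow correction). The buffer layout, the real-valued deviation, and the absence of the normalizing shifts used in VVC are simplifications of this illustration:

#include <stdint.h>

/* Sketch: PROF correction of one pixel. 'pred' is the boundary-padded
 * sub-block affine prediction; (dvx, dvy) is the motion vector deviation of
 * this pixel relative to the sub-block MV, taken here as a real number. */
static double prof_refine(const int16_t *pred, int stride, int x, int y,
                          double dvx, double dvy)
{
    const int16_t *p = pred + y * stride + x;
    int gx = p[1] - p[-1];            /* horizontal gradient, filter [-1, 0, 1] */
    int gy = p[stride] - p[-stride];  /* vertical gradient */
    /* first-order correction: dI = gx*dvx + gy*dvy, added to the prediction */
    return p[0] + gx * dvx + gy * dvy;
}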
In the VVC standard text, the MV of each sub-block of the current block and the motion vector deviation of each pixel position within the sub-block are derived from the control-point MVs. The pixel position used by each sub-block in VVC as the sub-block MV is the same, so only one group of per-position motion vector deviations needs to be derived, and the other sub-blocks can reuse it. Further, in the VVC standard text's description of the PROF flow, PROF's calculation of the motion vector deviations is included in the above derivation.
The affine prediction of AVS3 follows the same basic principle as VVC when calculating the MV of a sub-block, but AVS3 treats the top-left sub-block A, the top-right sub-block B, and the bottom-left sub-block C of the current block specially.
The following describes the AVS3 standard text's derivation of the affine motion unit sub-block motion vector array:
If there are 3 motion vectors in the affine control-point motion vector group, the group can be represented as mvsAffine(mv0, mv1, mv2); if there are 2 motion vectors in the group, it can be represented as mvsAffine(mv0, mv1). The affine motion unit sub-block motion vector array can then be derived as follows:
1. calculating variables dHorX, dVerX, dHorY and dVerY:
dHorX=(mv1_x-mv0_x)<<(7-Log(width));
dHorY=(mv1_y-mv0_y)<<(7-Log(width));
if the motion vector group is mvsAffine (mv0, mv1, mv2), then:
dVerX=(mv2_x-mv0_x)<<(7-Log(height));
dVerY=(mv2_y-mv0_y)<<(7-Log(height));
if the motion vector group is mvsAffine (mv0, mv1), then:
dVerX=-dHorY;
dVerY=dHorX;
it should be noted that fig. 7 is a sample position diagram, as shown in fig. 7, (xE, yE) is a position of an upper left corner sample of a luma prediction block of a current prediction unit in a luma sample matrix of a current image, a width and a height of the current prediction unit are width and height, respectively, a width and a height of each sub-block are sub-width and sub-height, respectively, a sub-block where the upper left corner sample of the luma prediction block of the current prediction unit is located is a, a sub-block where the upper right corner sample is located is B, and a sub-block where the lower left corner sample is located is C.
2.1, if the prediction reference mode of the current prediction unit is 'Pred _ List 01' or affinesblocksizeflag is equal to 1 (affinesblocksizeflag is used to indicate the size of the sub-block size), then both subwindoth and subwight are equal to 8, (x, y) are the coordinates of the position of the upper left corner of the sub-block of size 8x8, then a motion vector mvE (mvE _ x, mvE _ y) can be calculated for each 8x8 luma sub-block:
if the current subblock is A, both xPos and yPos are equal to 0;
if the current sub-block is B, xPos is equal to width, and yPos is equal to 0;
if the current subblock is C and there are 3 motion vectors in mvsAffine, xPos is equal to 0 and yPos is equal to height;
otherwise, xPos equals (x-xE) +4, yPos equals (y-yE) + 4;
thus, the motion vector mvE for the current 8x8 sub-block is:
mvE_x=Clip3(-131072,131071,Rounding((mv0_x<<7)+dHorX×xPos+dVerX×yPos,7));
mvE_y=Clip3(-131072,131071,Rounding((mv0_y<<7)+dHorY×xPos+dVerY×yPos,7));
2.2. If the prediction reference mode of the current prediction unit is 'Pred_List0' or 'Pred_List1' and AffineSubblockSizeFlag is equal to 0, then subwidth and subheight are both equal to 4, (x, y) are the coordinates of the top-left corner of a sub-block of size 4x4, and the motion vector mvE (mvE_x, mvE_y) of each 4x4 luma sub-block is calculated:
if the current sub-block is A, xPos and yPos are both equal to 0;
if the current sub-block is B, xPos is equal to width and yPos is equal to 0;
if the current sub-block is C and there are 3 motion vectors in mvsAffine, xPos is equal to 0 and yPos is equal to height;
otherwise, xPos equals (x - xE) + 2 and yPos equals (y - yE) + 2;
Thus, the motion vector mvE of the current 4x4 sub-block is:
mvE_x=Clip3(-131072,131071,Rounding((mv0_x<<7)+dHorX×xPos+dVerX×yPos,7));
mvE_y=Clip3(-131072,131071,Rounding((mv0_y<<7)+dHorY×xPos+dVerY×yPos,7)).
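A compact sketch of the derivation above (a direct transcription of the formulas; the Rounding() shown is assumed to be round-to-nearest with a right shift, and Clip3 is the usual clamping function):

/* Sketch of the AVS3 sub-block MV derivation. All values are in the spec's
 * fixed-point units; xPos/yPos are chosen per sub-block as described above. */
static int clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }
static int rounding(int v, int s) { return (v + (1 << (s - 1))) >> s; } /* assumed */

static void subblock_mv_avs3(int mv0_x, int mv0_y,
                             int dHorX, int dHorY, int dVerX, int dVerY,
                             int xPos, int yPos, int *mvE_x, int *mvE_y)
{
    *mvE_x = clip3(-131072, 131071,
                   rounding((mv0_x << 7) + dHorX * xPos + dVerX * yPos, 7));
    *mvE_y = clip3(-131072, 131071,
                   rounding((mv0_y << 7) + dHorY * xPos + dVerY * yPos, 7));
}

For a 4x4 sub-block at (x, y) with the prediction unit's top-left luma sample at (xE, yE), (xPos, yPos) is (0, 0) for sub-block A, (width, 0) for sub-block B, (0, height) for sub-block C when mvsAffine has 3 motion vectors, and ((x - xE) + 2, (y - yE) + 2) otherwise, with +4 instead of +2 for 8x8 sub-blocks.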
the following is a description of the AVS3 text for affine prediction sample derivation and luma, chroma sample interpolation:
if the position of the top left sample of the luma prediction block of the current prediction unit in the luma sample matrix of the current picture is (xE, yE).
If the prediction reference mode of the current prediction unit is 'PRED _ List 0' and the value of AffiniBanckSizeFlag is 0, mv0E0 is the LO motion vector of 4x4 units with the MvArrayL0 motion vector set at the (xE + x, yE + y) position. The value of the element pred matrix l0[ x ] [ y ] in the luma prediction sample matrix pred matrix xl0 is a sample value in the 1/16-precision luma sample matrix with the reference index RefIdxL0 in the reference image queue 0 (((xE + x) < <4) + mv0E0_ x, ((yE + y) < <4) + mv0E0_ y), and the value of the element pred matrix l0[ x ] [ y ] in the chroma prediction sample matrix pred matrixl0 is a sample value in the 1/32-precision chroma sample matrix with the reference index RefIdxL0 in the reference image queue 0 (((xE +2x) < <4) + MvC _ x, ((yE +2y) < <4) + MvC _ y). Where x1 ═ ((xE +2x) > >3) < <3, y1 ═ ((yE +2y) > >3) < <3, mv1E0 is an LO motion vector of a 4x4 unit where MvArrayL l0 motion vectors are grouped at the (x1, y1) position, mv2E0 is an LO motion vector of a 4x4 unit where MvArrayL l0 motion vectors are grouped at the (x1+4, y1) position, mv3E0 is an LO motion vector of a 4x4 unit where MvArrayL l0 motion vectors are grouped at the (x1, y1+4) position, and mv4E0 is an LO motion vector of a 4x4 unit where MvArrayL l0 motion vectors are grouped at the (x1+4, y1+4) position.
MvC_x=(mv1E0_x+mv2E0_x+mv3E0_x+mv4E0_x+2)>>2
MvC_y=(mv1E0_y+mv2E0_y+mv3E0_y+mv4E0_y+2)>>2
If the prediction reference mode of the current prediction unit is 'Pred_List0' and the value of AffineSubblockSizeFlag is 1, mv0E0 is the L0 motion vector of the 8x8 unit at position (xE + x, yE + y) in MvArrayL0. The value of element predMatrixL0[x][y] in the luma prediction sample matrix predMatrixL0 is the sample value at position (((xE + x) << 4) + mv0E0_x, ((yE + y) << 4) + mv0E0_y) in the 1/16-precision luma sample matrix with reference index RefIdxL0 in reference picture queue 0, and the value of element predMatrixL0[x][y] in the chroma prediction sample matrix predMatrixL0 is the sample value at position (((xE + 2x) << 4) + MvC_x, ((yE + 2y) << 4) + MvC_y) in the 1/32-precision chroma sample matrix with reference index RefIdxL0 in reference picture queue 0. Here MvC_x is equal to mv0E0_x and MvC_y is equal to mv0E0_y.
If the prediction reference mode of the current prediction unit is 'Pred_List1' and the value of AffineSubblockSizeFlag is 0, mv0E1 is the L1 motion vector of the 4x4 unit at position (xE + x, yE + y) in MvArrayL1. The value of element predMatrixL1[x][y] in the luma prediction sample matrix predMatrixL1 is the sample value at position (((xE + x) << 4) + mv0E1_x, ((yE + y) << 4) + mv0E1_y) in the 1/16-precision luma sample matrix with reference index RefIdxL1 in reference picture queue 1, and the value of element predMatrixL1[x][y] in the chroma prediction sample matrix predMatrixL1 is the sample value at position (((xE + 2x) << 4) + MvC_x, ((yE + 2y) << 4) + MvC_y) in the 1/32-precision chroma sample matrix with reference index RefIdxL1 in reference picture queue 1. Here x1 = ((xE + 2x) >> 3) << 3 and y1 = ((yE + 2y) >> 3) << 3; mv1E1 is the L1 motion vector of the 4x4 unit at position (x1, y1) in MvArrayL1, mv2E1 is the L1 motion vector of the 4x4 unit at position (x1 + 4, y1), mv3E1 is the L1 motion vector of the 4x4 unit at position (x1, y1 + 4), and mv4E1 is the L1 motion vector of the 4x4 unit at position (x1 + 4, y1 + 4).
MvC_x=(mv1E1_x+mv2E1_x+mv3E1_x+mv4E1_x+2)>>2
MvC_y=(mv1E1_y+mv2E1_y+mv3E1_y+mv4E1_y+2)>>2
If the prediction reference mode of the current prediction unit is 'Pred_List1' and the value of AffineSubblockSizeFlag is 1, mv0E1 is the L1 motion vector of the 8x8 unit at position (xE + x, yE + y) in MvArrayL1. The value of element predMatrixL1[x][y] in the luma prediction sample matrix predMatrixL1 is the sample value at position (((xE + x) << 4) + mv0E1_x, ((yE + y) << 4) + mv0E1_y) in the 1/16-precision luma sample matrix with reference index RefIdxL1 in reference picture queue 1, and the value of element predMatrixL1[x][y] in the chroma prediction sample matrix predMatrixL1 is the sample value at position (((xE + 2x) << 4) + MvC_x, ((yE + 2y) << 4) + MvC_y) in the 1/32-precision chroma sample matrix with reference index RefIdxL1 in reference picture queue 1. Here MvC_x equals mv0E1_x and MvC_y equals mv0E1_y.
If the prediction reference mode of the current prediction unit is 'Pred_List01', mv0E0 is the L0 motion vector of the 8x8 unit at position (xE + x, yE + y) in MvArrayL0, and mv0E1 is the L1 motion vector of the 8x8 unit at position (xE + x, yE + y) in MvArrayL1. The value of element predMatrixL0[x][y] in the luma prediction sample matrix predMatrixL0 is the sample value at position (((xE + x) << 4) + mv0E0_x, ((yE + y) << 4) + mv0E0_y) in the 1/16-precision luma sample matrix with reference index RefIdxL0 in reference picture queue 0; the value of element predMatrixL0[x][y] in the chroma prediction sample matrix predMatrixL0 is the sample value at position (((xE + 2x) << 4) + MvC0_x, ((yE + 2y) << 4) + MvC0_y) in the 1/32-precision chroma sample matrix with reference index RefIdxL0 in reference picture queue 0; the value of element predMatrixL1[x][y] in the luma prediction sample matrix predMatrixL1 is the sample value at position (((xE + x) << 4) + mv0E1_x, ((yE + y) << 4) + mv0E1_y) in the 1/16-precision luma sample matrix with reference index RefIdxL1 in reference picture queue 1; and the value of element predMatrixL1[x][y] in the chroma prediction sample matrix predMatrixL1 is the sample value at position (((xE + 2x) << 4) + MvC1_x, ((yE + 2y) << 4) + MvC1_y) in the 1/32-precision chroma sample matrix with reference index RefIdxL1 in reference picture queue 1. Here MvC0_x equals mv0E0_x, MvC0_y equals mv0E0_y, MvC1_x equals mv0E1_x, and MvC1_y equals mv0E1_y.
The element values at these positions in the 1/16-precision luma sample matrix and the 1/32-precision chroma sample matrix of the reference picture are obtained by the interpolation methods defined below for affine luma sample interpolation and affine chroma sample interpolation. Integer samples outside the reference picture are replaced with the integer samples (edge or corner samples) within the picture closest to them; that is, motion vectors may point to samples outside the reference picture, as sketched below.
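A minimal sketch of this edge-replication rule (clamping the integer-sample coordinates into the picture before fetching; the function name and layout are illustrative):

#include <stdint.h>

/* Sketch: fetch an integer reference sample; coordinates outside the
 * reference picture are replaced by the nearest edge or corner sample. */
static int16_t fetch_ref(const int16_t *ref, int ref_w, int ref_h,
                         int stride, int x, int y)
{
    if (x < 0) x = 0; else if (x > ref_w - 1) x = ref_w - 1;
    if (y < 0) y = 0; else if (y > ref_h - 1) y = ref_h - 1;
    return ref[y * stride + x];
}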
Specifically, the affine luminance sample interpolation process is as follows:
Fig. 8 is a schematic diagram of the positions of affine-predicted luma samples. As shown in fig. 8, A, B, C, and D are neighboring integer-pixel samples, and dx and dy are the horizontal and vertical distances between the sub-pixel sample a(dx, dy) around integer-pixel sample A and A itself, with dx equal to fx & 15 and dy equal to fy & 15, where (fx, fy) are the coordinates of the sub-pixel sample in the 1/16-precision luma sample matrix. Fig. 9 is a schematic diagram of the positions of affine-predicted integer-pixel samples and sub-pixel samples; the specific positions of the integer pixel A(x,y) and its 255 surrounding sub-pixel samples a(x,y)(dx, dy) are shown in fig. 9.
Specifically, the sample at position a(x,0) (x = 1..15) is obtained by filtering the 8 integer samples nearest the interpolation point in the horizontal direction:
a(x,0) = Clip1((fL[x][0]×A(-3,0) + fL[x][1]×A(-2,0) + fL[x][2]×A(-1,0) + fL[x][3]×A(0,0) + fL[x][4]×A(1,0) + fL[x][5]×A(2,0) + fL[x][6]×A(3,0) + fL[x][7]×A(4,0) + 32) >> 6).
Specifically, the sample at position a(0,y) (y = 1..15) is obtained by filtering the 8 integer samples nearest the interpolation point in the vertical direction:
a(0,y) = Clip1((fL[y][0]×A(0,-3) + fL[y][1]×A(0,-2) + fL[y][2]×A(0,-1) + fL[y][3]×A(0,0) + fL[y][4]×A(0,1) + fL[y][5]×A(0,2) + fL[y][6]×A(0,3) + fL[y][7]×A(0,4) + 32) >> 6).
Specifically, the sample at position a(x,y) (x = 1..15, y = 1..15) is obtained as follows:
a(x,y) = Clip1((fL[y][0]×a'(x,y-3) + fL[y][1]×a'(x,y-2) + fL[y][2]×a'(x,y-1) + fL[y][3]×a'(x,y) + fL[y][4]×a'(x,y+1) + fL[y][5]×a'(x,y+2) + fL[y][6]×a'(x,y+3) + fL[y][7]×a'(x,y+4) + (1 << (19 - BitDepth))) >> (20 - BitDepth)).
where:
a'(x,y) = (fL[x][0]×A(-3,y) + fL[x][1]×A(-2,y) + fL[x][2]×A(-1,y) + fL[x][3]×A(0,y) + fL[x][4]×A(1,y) + fL[x][5]×A(2,y) + fL[x][6]×A(3,y) + fL[x][7]×A(4,y) + ((1 << (BitDepth - 8)) >> 1)) >> (BitDepth - 8).
the luminance interpolation filter coefficients are shown in table 1:
TABLE 1
[Table 1: luma interpolation filter coefficients fL, provided as an image in the original document]
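A sketch of the separable 8-tap luma interpolation defined by the formulas above (fL is the coefficient table from Table 1, not reproduced here; 'A' points at the integer anchor sample A(0,0), and the rounding terms follow the formulas):

#include <stdint.h>

/* Sketch: interpolate the 1/16-pel luma sample a(dx, dy) with dx, dy both in
 * 1..15: horizontal pass first (intermediate a' values), then vertical. */
static int clip1(int v, int bitdepth)
{
    int hi = (1 << bitdepth) - 1;
    return v < 0 ? 0 : (v > hi ? hi : v);
}

static int luma_interp(const int16_t *A, int stride, const int8_t fL[16][8],
                       int dx, int dy, int bitdepth)
{
    int tmp[8], s, j, k;
    /* horizontal pass: a'(dx, j) for rows j = -3 .. 4 around the anchor */
    for (j = -3; j <= 4; j++) {
        const int16_t *row = A + j * stride;
        for (s = 0, k = 0; k < 8; k++)
            s += fL[dx][k] * row[k - 3];
        tmp[j + 3] = (s + ((1 << (bitdepth - 8)) >> 1)) >> (bitdepth - 8);
    }
    /* vertical pass over the intermediate values */
    for (s = 0, k = 0; k < 8; k++)
        s += fL[dy][k] * tmp[k];
    return clip1((s + (1 << (19 - bitdepth))) >> (20 - bitdepth), bitdepth);
}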
Specifically, the affine chroma sample interpolation process is as follows:
Fig. 10 is a schematic diagram of the positions of affine-predicted chroma samples. As shown in fig. 10, A, B, C, and D are neighboring integer-pixel samples, and dx and dy are the horizontal and vertical distances between the sub-pixel sample a(dx, dy) around integer-pixel sample A and A itself, with dx equal to fx & 31 and dy equal to fy & 31, where (fx, fy) are the coordinates of the sub-pixel sample in the 1/32-precision chroma sample matrix. Fig. 11 is a schematic diagram of the positions of affine-predicted integer-pixel samples and sub-pixel samples; the specific positions of the integer pixel A(x,y) and its 1023 surrounding sub-pixel samples a(x,y)(dx, dy) are shown in fig. 11.
Specifically, a sub-pixel point with dx equal to 0 or dy equal to 0 can be interpolated directly from the chroma integer pixels, while a point with dx not equal to 0 and dy not equal to 0 is calculated from the sub-pixels on the integer-pixel row (dy equal to 0):
if (dx == 0) {
    a(x,y)(0,dy) = Clip3(0, (1 << BitDepth) - 1, (fC[dy][0]×A(x,y-1) + fC[dy][1]×A(x,y) + fC[dy][2]×A(x,y+1) + fC[dy][3]×A(x,y+2) + 32) >> 6)
} else if (dy == 0) {
    a(x,y)(dx,0) = Clip3(0, (1 << BitDepth) - 1, (fC[dx][0]×A(x-1,y) + fC[dx][1]×A(x,y) + fC[dx][2]×A(x+1,y) + fC[dx][3]×A(x+2,y) + 32) >> 6)
} else {
    a(x,y)(dx,dy) = Clip3(0, (1 << BitDepth) - 1, (fC[dy][0]×a'(x,y-1)(dx,0) + fC[dy][1]×a'(x,y)(dx,0) + fC[dy][2]×a'(x,y+1)(dx,0) + fC[dy][3]×a'(x,y+2)(dx,0) + (1 << (19 - BitDepth))) >> (20 - BitDepth))
}
where a'(x,y)(dx,0) is the temporary value of a sub-pixel on the integer-pixel row, defined as: a'(x,y)(dx,0) = (fC[dx][0]×A(x-1,y) + fC[dx][1]×A(x,y) + fC[dx][2]×A(x+1,y) + fC[dx][3]×A(x+2,y) + ((1 << (BitDepth - 8)) >> 1)) >> (BitDepth - 8).
The chroma interpolation filter coefficients are shown in table 2:
TABLE 2
[Table 2: chroma interpolation filter coefficients fC, provided as images in the original document]
Fig. 12 is a schematic diagram of the affine prediction flow. As shown in fig. 12, a common affine prediction method may include the following steps:
Step 101: determine the motion vectors of the control points.
Step 102: determine the motion vector of each sub-block according to the control-point motion vectors.
Step 103: predict each sub-block according to its motion vector.
Fig. 13 is a first flowchart of correcting prediction values with PROF. As shown in fig. 13, when PROF is used to improve the prediction values of block-based affine prediction, the method may specifically include the following steps:
Step 101: determine the motion vectors of the control points.
Step 102: determine the motion vector of each sub-block according to the control-point motion vectors.
Step 103: predict each sub-block according to its motion vector.
Step 104: determine the deviation between each position within the sub-block and the sub-block motion vector, according to the control-point motion vectors and the sub-block motion vector.
Step 105: determine the motion vector of the sub-block according to the control-point motion vectors.
Step 106: calculate the horizontal and vertical gradients at each position using the sub-block-based prediction values.
Step 107: calculate the correction value of the prediction value at each position, using the optical flow principle, from the motion vector deviation and the horizontal and vertical gradients at that position.
Step 108: add the correction value to the sub-block-based prediction value at each position to obtain the corrected prediction value.
Fig. 14 is a second flowchart of correcting prediction values with PROF. As shown in fig. 14, when PROF is used to improve the prediction values of block-based affine prediction, the method may alternatively include the following steps:
Step 101: determine the motion vectors of the control points.
Step 109: determine the motion vector of each sub-block, and the deviation between each position within the sub-block and the sub-block motion vector, according to the control-point motion vectors.
Step 103: predict each sub-block according to its motion vector.
Step 106: calculate the horizontal and vertical gradients at each position using the sub-block-based prediction values.
Step 107: calculate the correction value of the prediction value at each position, using the optical flow principle, from the motion vector deviation and the horizontal and vertical gradients at that position.
Step 108: add the correction value to the sub-block-based prediction value at each position to obtain the corrected prediction value.
The prediction refinement PROF, which uses the optical flow principle, can correct sub-block-based affine prediction and thereby improve compression performance. However, the optical flow calculation used by PROF is only effective when the deviation between the motion vector of a pixel position within the sub-block and the sub-block motion vector is very small. Because PROF depends on the horizontal and vertical gradients at the reference position, those gradients cannot truly reflect the gradients between the reference position and the actual position when the actual position is far from the reference position; the method is therefore much less effective when this deviation is large.
Therefore, the conventional method of correcting prediction values with PROF is not rigorous: it cannot be applied well to all scenarios when improving affine prediction, and coding performance still needs to be improved.
To overcome these drawbacks of the prior art, in the embodiments of the present application, after sub-block-based prediction, point-based secondary prediction may be performed on top of the sub-block-based first prediction value for those pixel positions whose motion vectors deviate from the sub-block motion vector, so as to obtain a second prediction value. This inter prediction method is well suited to all scenarios, greatly improves coding performance, and improves coding and decoding efficiency.
It should be understood that the present application provides a video encoding system. Fig. 15 is a schematic block diagram of the video encoding system provided in an embodiment of the present application. As shown in fig. 15, the video encoding system 11 may include: a transform unit 111, a quantization unit 112, a mode selection and encoding control logic unit 113, an intra prediction unit 114, an inter prediction unit 115 (including motion compensation and motion estimation), an inverse quantization unit 116, an inverse transform unit 117, a loop filtering unit 118, an encoding unit 119, and a decoded picture buffer unit 110. For an input original video signal, a video reconstruction block can be obtained by coding tree unit (CTU) partitioning, and the coding mode is determined by the mode selection and encoding control logic unit 113. The residual pixel information obtained after intra or inter prediction is then processed by the transform unit 111 and the quantization unit 112, which transform the residual information from the pixel domain to the transform domain and quantize the resulting transform coefficients to further reduce the bit rate. The intra prediction unit 114 performs intra prediction on the video reconstruction block and, in particular, determines its optimal intra prediction mode (i.e., the target prediction mode). The inter prediction unit 115 performs inter prediction coding of the received video reconstruction block relative to one or more blocks in one or more reference frames to provide temporal prediction information; here, motion estimation is the process of generating the motion vectors that estimate the motion of the block, and motion compensation is then performed based on the motion vectors determined by motion estimation. After determining the inter prediction mode, the inter prediction unit 115 supplies the selected inter prediction data to the encoding unit 119 and also sends the computed motion vector data to the encoding unit 119. Furthermore, the inverse quantization unit 116 and the inverse transform unit 117 are used for reconstruction of the video reconstruction block, rebuilding a residual block in the pixel domain; blocking artifacts are removed from this residual block by the loop filtering unit 118, and the reconstructed residual block is then added to a predictive block in a frame of the decoded picture buffer unit 110 to generate a reconstructed video block. The encoding unit 119 encodes the various coding parameters and the quantized transform coefficients. The decoded picture buffer unit 110 stores reconstructed video blocks for prediction reference; as encoding proceeds, new reconstructed video blocks are generated and stored in the decoded picture buffer unit 110.
Fig. 16 is a schematic block diagram of the video decoding system provided in an embodiment of the present application. As shown in fig. 16, the video decoding system 12 may include: a decoding unit 121, an inverse quantization unit 122, an intra prediction unit 123, a motion compensation unit 124, a loop filtering unit 125, a decoded picture buffer unit 126, and an inverse transform unit 127. After the input video signal is encoded by the video encoding system 11, the code stream of the video signal is output. The code stream is input into the video decoding system 12 and first passes through the decoding unit 121 to obtain the decoded transform coefficients; the transform coefficients are processed by the inverse transform unit 127 and the inverse quantization unit 122 to produce a residual block in the pixel domain. The intra prediction unit 123 may generate prediction data for the current video decoding block based on the determined intra prediction direction and data from previously decoded blocks of the current frame or picture. The motion compensation unit 124 determines prediction information for the video decoding block by parsing motion vectors and other associated syntax elements, and uses that prediction information to generate the predictive block of the video decoding block being decoded. A decoded video block is formed by summing the residual block from the inverse transform unit 127 and the inverse quantization unit 122 with the corresponding predictive block generated by the intra prediction unit 123 or the motion compensation unit 124. The decoded video signal passes through the loop filtering unit 125 to remove blocking artifacts, which improves video quality; the decoded video blocks are then stored in the decoded picture buffer unit 126, which stores reference pictures for subsequent intra prediction or motion compensation as well as for the output of the video signal, yielding the restored original video signal.
The inter prediction method provided in the embodiments of the present application mainly acts on the inter prediction unit 115 of the video coding system 11 and the inter prediction unit of the video decoding system 12, i.e., the motion compensation unit 124; that is, if the video coding system 11 can obtain a better prediction effect through the inter prediction method provided in the embodiments of the present application, the video decoding recovery quality of the video decoding system 12 can be improved accordingly.
Based on this, the technical solution of the present application is further elaborated below with reference to the drawings and the embodiments. Before the detailed description, it should be noted that 'first', 'second', 'third', etc. mentioned throughout the specification are used only to distinguish different features and do not define any priority, order, or magnitude relationship.
It should be noted that, in the present embodiment, an example is described based on the AVS3 standard, and the inter-frame prediction method proposed in the present application may be applied to other coding standard technologies such as VVC, and the present application is not limited to this.
The embodiment of the application provides an inter-frame prediction method, which is applied to video decoding equipment, namely a decoder. The functions performed by the method may be implemented by the first processor in the decoder calling a computer program, which of course may be stored in the first memory, it being understood that the decoder comprises at least the first processor and the first memory.
Further, in an embodiment of the present application, fig. 17 is a first flowchart illustrating an implementation of an inter prediction method, and as shown in fig. 17, the method for a decoder to perform inter prediction may include the following steps:
Step 201, parsing the code stream and obtaining the prediction mode parameter of the current block.
In an embodiment of the present application, a decoder may first parse a binary code stream to obtain the prediction mode parameters of the current block. Wherein the prediction mode parameter may be used to determine a prediction mode used by the current block.
It should be noted that an image to be decoded may be divided into a plurality of image blocks, and an image block to be decoded currently may be referred to as a current block (which may be represented by a CU), and an image block adjacent to the current block may be referred to as a neighboring block; that is, in the image to be decoded, the current block has a neighboring relationship with the neighboring block. Here, each current block may include a first image component, a second image component, and a third image component, that is, the current block represents an image block to be currently subjected to prediction of the first image component, the second image component, or the third image component in an image to be decoded.
Wherein, assuming that the current block performs the first image component prediction, and the first image component is a luminance component, that is, the image component to be predicted is a luminance component, then the current block may also be called a luminance block; alternatively, assuming that the current block performs the second image component prediction, and the second image component is a chroma component, that is, the image component to be predicted is a chroma component, the current block may also be referred to as a chroma block.
Further, in the embodiments of the present application, the prediction mode parameter may indicate not only the prediction mode adopted by the current block but also a parameter related to the prediction mode.
It is understood that, in the embodiments of the present application, the prediction modes may include inter prediction modes, conventional intra prediction modes, and non-conventional intra prediction modes, etc.
That is to say, on the encoder side, the encoder may select the optimal prediction mode and pre-encode the current block with it; in this process, the prediction mode of the current block is determined, and a prediction mode parameter indicating this mode is then determined, so that the corresponding prediction mode parameter is written into the code stream and transmitted by the encoder to the decoder.
Correspondingly, on the decoder side, the decoder can directly acquire the prediction mode parameters of the current block by analyzing the code stream, and determines the prediction mode used by the current block and the related parameters corresponding to the prediction mode according to the prediction mode parameters acquired by analyzing.
Further, in an embodiment of the present application, after parsing to obtain the prediction mode parameter, the decoder may determine whether the current block uses an inter prediction mode based on the prediction mode parameter.
Step 202, when the prediction mode parameter indicates that the inter prediction value of the current block is determined by using the inter prediction mode, determining a first motion vector of a sub-block of the current block; wherein the current block comprises a plurality of sub-blocks.
In an embodiment of the present application, after parsing to obtain the prediction mode parameter, if the parsed prediction mode parameter indicates that the current block determines the inter prediction value of the current block using the inter prediction mode, the decoder may determine the first motion vector of each sub-block of the current block. Wherein a sub-block corresponds to a first motion vector.
It should be noted that, in the embodiment of the present application, the current block is the image block to be decoded in the current frame; the current frame is decoded block by block in a certain order, and the current block is the next image block to be decoded in that order. The current block may have a variety of sizes, such as 16×16, 32×32, or 32×16, where the numbers denote the numbers of rows and columns of pixel points in the current block.
Further, in the embodiment of the present application, the current block may be divided into a plurality of sub-blocks of the same size, where each sub-block is a smaller set of pixels. The size of a sub-block may be 8×8 or 4×4.
For example, in the present application, the size of the current block is 16 × 16, and the current block may be divided into 4 sub-blocks each having a size of 8 × 8.
It can be understood that, in the embodiment of the present application, in the case that the decoder parses the code stream to obtain the prediction mode parameter indicating that the inter prediction value of the current block is determined using the inter prediction mode, the inter prediction method provided in the embodiment of the present application may be continuously used.
In an embodiment of the present application, further, when the prediction mode parameter indicates that the inter prediction value of the current block is determined using the inter prediction mode, the method of the decoder determining the first motion vector of the sub-block of the current block may include the steps of:
step 202a, analyzing the code stream, and obtaining the affine mode parameter and the prediction reference mode of the current block.
Step 202b, when the affine mode parameter indicates that the affine mode is used, determining a control point mode and a sub-block size parameter.
Step 202c, determining a first motion vector according to the prediction reference mode, the control point mode and the sub-block size parameter.
In an embodiment of the application, after the decoder parses the obtained prediction mode parameters, if the parsed prediction mode parameters indicate that the current block determines the inter prediction value of the current block by using the inter prediction mode, the decoder may obtain the affine mode parameters and the prediction reference mode by parsing the code stream.
It should be noted that, in the embodiment of the present application, the affine mode parameter is used to indicate whether to use the affine mode. Specifically, the affine mode parameter may be the affine motion compensation enable flag affine_enable_flag, and the decoder may determine whether to use the affine mode from the value of this parameter.
That is, in the present application, the affine pattern parameter may be a binary variable. If the value of the affine mode parameter is 1, indicating to use the affine mode; and if the value of the affine mode parameter is 0, indicating that the affine mode is not used.
It is understood that in the present application, the decoder parses the code stream, and if the affine mode parameter is not parsed, it can also be understood as indicating that the affine mode is not used.
For example, in the present application, the value of the affine mode parameter may be equal to the value of the affine motion compensation enable flag affine_enable_flag: if the value of affine_enable_flag is '1', affine motion compensation may be used; if the value of affine_enable_flag is '0', affine motion compensation should not be used.
Further, in the embodiment of the present application, if the affine mode parameter obtained by the decoder parsing the code stream indicates that the affine mode is used, the decoder may perform obtaining the control point mode and the sub-block size parameter.
In the embodiment of the present application, the control point mode is used to determine the number of control points. In the affine model, one sub-block may have 2 control points or 3 control points, and accordingly, the control point pattern may be a control point pattern corresponding to 2 control points or a control point pattern corresponding to 3 control points. I.e. the control point mode may comprise a 4-parameter mode and a 6-parameter mode.
It can be understood that, in the embodiment of the present application, for the AVS3 standard, if the current block uses the affine mode, the decoder needs to determine the number of control points in the affine mode of the current block, so as to determine whether to use the 4-parameter (2 control points) mode or the 6-parameter (3 control points) mode.
Further, in the embodiment of the present application, if the affine mode parameter obtained by the decoder parsing the code stream indicates that the affine mode is used, the decoder may further obtain the sub-block size parameter by parsing the code stream.
Specifically, the sub-block size parameter may be determined by the affine prediction sub-block size flag affine_subblock_size_flag; the decoder obtains the sub-block size flag by parsing the code stream and determines the size of the sub-blocks of the current block according to the value of this flag. The size of a sub-block may be 8×8 or 4×4. Specifically, in the present application, the sub-block size flag may be a binary variable: if the value of the sub-block size flag is 1, the sub-block size parameter is 8×8; if the value of the sub-block size flag is 0, the sub-block size parameter is 4×4.
For example, in the present application, the value of the sub-block size flag may be equal to the value of the affine prediction sub-block size flag affine_subblock_size_flag: if the value of affine_subblock_size_flag is '1', the current block is divided into sub-blocks of size 8×8; if the value of affine_subblock_size_flag is '0', the current block is divided into sub-blocks of size 4×4.
It can be understood that, in the present application, when the decoder parses the code stream, if the sub-block size flag is not parsed, it can also be understood that the current block is divided into 4×4 sub-blocks. That is, if affine_subblock_size_flag does not exist in the code stream, the value of the sub-block size flag may be directly set to 0.
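As a minimal illustration of this rule, the following C sketch (the function and parameter names are illustrative, not from the standard text) derives the sub-block size from the parsed flag:

    /* A minimal sketch: derive the sub-block edge length from the parsed
     * affine_subblock_size_flag. 'flag_present' indicates whether the flag
     * exists in the code stream; when absent its value is taken as 0,
     * so the current block is divided into 4x4 sub-blocks. */
    int derive_subblock_size(int flag_present, int affine_subblock_size_flag)
    {
        if (!flag_present)
            affine_subblock_size_flag = 0;
        return affine_subblock_size_flag ? 8 : 4;  /* 1 -> 8x8, 0 -> 4x4 */
    }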
Further, in the embodiment of the present application, after determining the control point mode and the sub-block size parameter, the decoder may further determine the first motion vector of the sub-block in the current block according to the prediction reference mode, the control point mode, and the sub-block size parameter.
Specifically, in the embodiment of the present application, the decoder may determine the control point motion vector group according to the prediction reference mode; a first motion vector for the sub-block may then be determined based on the set of control point motion vectors, the control point pattern, and the sub-block size parameter.
It will be appreciated that in embodiments of the present application, a set of control point motion vectors may be used to determine the motion vectors for the control points.
It should be noted that, in the embodiment of the present application, the decoder may traverse each sub-block in the current block according to the above method and determine the first motion vector of each sub-block by using the control point motion vector group, the control point mode, and the sub-block size parameter, so that a motion vector set can be constructed from the first motion vectors of all sub-blocks.
It is to be understood that, in the embodiment of the present application, the first motion vector of each sub-block of the current block may be included in the motion vector set of the current block.
Further, in the embodiment of the present application, when determining the first motion vector according to the control point motion vector group, the control point mode, and the sub-block size parameter, the decoder may first determine a difference variable according to the control point motion vector group, the control point mode, and the size parameter of the current block; a sub-block position may then be determined based on the prediction mode parameter and the sub-block size parameter; finally, the first motion vector of the sub-block can be determined by using the difference variable and the position of the sub-block, and then a motion vector set of a plurality of sub-blocks of the current block can be obtained.
For example, in the present application, the difference variable may include 4 variables, specifically dHorX, dVerX, dHorY, and dVerY, and when calculating the difference variable, the decoder needs to determine a control point motion vector group first, where the control point motion vector group may characterize the motion vector of the control point.
Specifically, if the control point mode is the 6-parameter mode, i.e., there are 3 control points, the control point motion vector group may be a motion vector group including 3 motion vectors, denoted mvsAffine(mv0, mv1, mv2); if the control point mode is the 4-parameter mode, i.e., there are 2 control points, the control point motion vector group may be a motion vector group including 2 motion vectors, denoted mvsAffine(mv0, mv1).
The decoder then performs the calculation of the difference variable using the set of control point motion vectors:
dHorX=(mv1_x-mv0_x)<<(7-Log2(width));
dHorY=(mv1_y-mv0_y)<<(7-Log2(width));
if the motion vector group is mvsAffine (mv0, mv1, mv2), then:
dVerX=(mv2_x-mv0_x)<<(7-Log2(height));
dVerY=(mv2_y-mv0_y)<<(7-Log2(height));
if the motion vector group is mvsAffine (mv0, mv1), then:
dVerX=-dHorY;
dVerY=dHorX.
the width and height are respectively a width and a height of the current block, that is, a size parameter of the current block, and specifically, the size parameter of the current block may be obtained by a decoder by analyzing a code stream.
Further, in embodiments of the present application, the decoder, after determining the difference variable, may then determine a sub-block position based on the prediction mode parameter and the sub-block size parameter. Specifically, the decoder can determine the size of the sub-block through the block size flag, and can determine which prediction mode is specifically used through the prediction mode parameter, and then can determine the position of the sub-block according to the size of the sub-block and the prediction mode used.
For example, in the present application, if the prediction reference mode of the current block takes the value 2, i.e., the third reference mode 'Pred_List01' is used, or the sub-block size flag takes the value 1, i.e., both the width and the height of a sub-block are equal to 8, and (x, y) are the coordinates of the upper-left corner position of an 8×8 sub-block, the coordinates xPos and yPos of the sub-block position may be determined as follows:
if the sub-block is the control point of the upper left corner of the current block, both xPos and yPos are equal to 0;
if the sub-block is the control point of the upper right corner of the current block, xPos is equal to width and yPos is equal to 0;
if the sub-block is the control point of the lower left corner of the current block and the control point motion vector group is a motion vector group including 3 motion vectors, xPos is equal to 0 and yPos is equal to height;
otherwise, xPos is equal to (x - xE) + 4 and yPos is equal to (y - yE) + 4.
For example, in the present application, if the prediction reference mode of the current block has a value of 0 or 1, i.e., the first reference mode 'Pred _ List 0' or the second reference mode 'Pred _ List 1', and the subblock size flag has a value of 0, i.e., both the width and height of the subblock are equal to 4, (x, y) are coordinates of the upper left corner position of the 4 × 4 subblock, the coordinates xPos and yPos of the subblock position may be determined by:
if the sub-block is the control point of the upper left corner of the current block, both xPos and yPos are equal to 0;
if the sub-block is the control point of the upper right corner of the current block, xPos is equal to width, and yPos is equal to 0;
if the sub-block is a control point in the lower left corner of the current block, and the control point motion vector group can be a motion vector group comprising 3 motion vectors, xPos is equal to 0 and yPos is equal to height;
otherwise, xPos equals (x-xE) +2 and yPos equals (y-yE) + 2.
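For illustration, both position derivations above can be folded into one routine; the following C sketch (types and names are illustrative) returns xPos and yPos for a sub-block:

    /* A sketch of the sub-block position derivation above. (x, y) is the
     * top-left corner of the sub-block in the picture, (xE, yE) the top-left
     * corner of the current block; sub = 8 or 4 is the sub-block edge length.
     * is_top_left/is_top_right/is_bottom_left identify the control-point
     * sub-blocks; three_cps is nonzero when mvsAffine has 3 motion vectors. */
    void derive_subblock_pos(int x, int y, int xE, int yE, int sub,
                             int width, int height,
                             int is_top_left, int is_top_right,
                             int is_bottom_left, int three_cps,
                             int *xPos, int *yPos)
    {
        if (is_top_left) {
            *xPos = 0;      *yPos = 0;
        } else if (is_top_right) {
            *xPos = width;  *yPos = 0;
        } else if (is_bottom_left && three_cps) {
            *xPos = 0;      *yPos = height;
        } else {
            *xPos = (x - xE) + (sub >> 1);   /* +4 for 8x8, +2 for 4x4 */
            *yPos = (y - yE) + (sub >> 1);
        }
    }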
Further, in the embodiment of the present application, after the decoder calculates the positions of the sub-blocks, the first motion vector of each sub-block may be determined based on the sub-block position and the difference variables; finally, the motion vector set of the plurality of sub-blocks of the current block may be constructed by traversing each sub-block of the current block and obtaining its first motion vector.
Illustratively, in the present application, after determining the sub-block position xPos and yPos, the decoder may determine the first motion vector mvE(mvE_x, mvE_y) of the sub-block as follows:
mvE_x=Clip3(-131072,131071,Rounding((mv0_x<<7)+dHorX×xPos+dVerX×yPos,7));
mvE_y=Clip3(-131072,131071,Rounding((mv0_y<<7)+dHorY×xPos+dVerY×yPos,7)).
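A C sketch of this step follows (non-normative; the rounding helper is an assumption of this sketch, as the exact definition of Rounding() is given in the standard text):

    #include <stdlib.h>

    static int clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }

    /* Assumed rounding: round half away from zero by s bits; the normative
     * Rounding() definition may differ slightly. */
    static int rounding(int v, int s)
    {
        int r = (abs(v) + (1 << (s - 1))) >> s;
        return v < 0 ? -r : r;
    }

    /* First motion vector of a sub-block from the difference variables and
     * the sub-block position (xPos, yPos), per the formulas above. */
    void derive_subblock_mv(int mv0_x, int mv0_y,
                            int dHorX, int dHorY, int dVerX, int dVerY,
                            int xPos, int yPos, int *mvE_x, int *mvE_y)
    {
        *mvE_x = clip3(-131072, 131071,
                       rounding((mv0_x << 7) + dHorX * xPos + dVerX * yPos, 7));
        *mvE_y = clip3(-131072, 131071,
                       rounding((mv0_y << 7) + dHorY * xPos + dVerY * yPos, 7));
    }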
It should be noted that, in the present application, when determining the deviation between each position in the sub-block and the motion vector of the sub-block, if the current block uses an affine prediction model, the motion vector of each position in the sub-block can be calculated according to the formula of the affine prediction model, and the deviation is obtained by subtracting the motion vector of the sub-block from it. If the motion vector of every sub-block is taken at the same position inside the sub-block, e.g., a 4×4 block uses the position (2, 2) from its top-left corner and an 8×8 block uses the position (4, 4) from its top-left corner, then, according to the affine model used in current standards including VVC and AVS3, the motion vector deviation at the same relative position is the same for every sub-block. In AVS3, however, the sub-blocks at the upper left corner, the upper right corner, and, in the case of 3 control points, the lower left corner (the A, B, and C positions shown in fig. 7 of the AVS3 text above) use positions different from those used by the other sub-blocks; correspondingly, the motion vector deviations of these sub-blocks are calculated differently from those of the other sub-blocks. Details are given in the embodiments below.
Step 203, determining a first predicted value of the sub-block and a motion vector deviation between each pixel position and the sub-block based on the first motion vector; wherein a pixel position is the position of a pixel point within the sub-block.
In an embodiment of the present application, after determining the first motion vector of each sub-block of the current block, the decoder may determine a first predictor of the sub-block and a motion vector offset between the pixel position and the sub-block, respectively, based on the first motion vector of the sub-block.
It is understood that, in the embodiment of the present application, step 203 may specifically include:
step 203a, determining a first predictor of the sub-block based on the first motion vector.
Step 203b, determining a motion vector deviation between the pixel position and the sub-block based on the first motion vector.
In this application, after determining the first motion vector of each sub-block of the current block, the decoder may perform step 203a before step 203b, perform step 203b before step 203a, or perform step 203a and step 203b simultaneously.
Further, in an embodiment of the present application, the decoder may first determine a sample matrix when determining the first predictor of the sub-block based on the first motion vector; wherein the sample matrix comprises a luma sample matrix and a chroma sample matrix; the first predictor may then be determined according to the prediction reference mode, the sub-block size parameter, the sample matrix, and the set of motion vectors.
It should be noted that, in the embodiment of the present application, when determining the first predicted value according to the prediction reference mode, the sub-block size parameter, the sample matrix, and the motion vector set, the decoder may first determine a target motion vector from the motion vector set according to the prediction reference mode and the sub-block size parameter; then, a prediction sample matrix can be determined by using the reference image queue corresponding to the prediction reference mode, the reference index, the sample matrix, and the target motion vector; wherein the prediction sample matrix includes the first predicted values of a plurality of sub-blocks.
Specifically, in an embodiment of the present application, the sample matrix may include a luma sample matrix and a chroma sample matrix, and accordingly, the prediction sample matrix determined by the decoder may include a luma prediction sample matrix and a chroma prediction sample matrix, wherein the luma prediction sample matrix includes a first luma predictor of the plurality of sub-blocks, the chroma prediction sample matrix includes a first chroma predictor of the plurality of sub-blocks, and the first luma predictor and the first chroma predictor constitute the first predictor of the sub-blocks.
For example, in the present application, it is assumed that the position of the top-left sample of the current block in the luma sample matrix of the current image is (xE, yE). If the prediction reference mode of the current block takes the value 0, i.e., the first reference mode 'PRED_List0' is used, and the sub-block size flag takes the value 0, i.e., the sub-block size parameter is 4×4, the target motion vector mv0E0 is the first motion vector of the 4×4 sub-block at the (xE+x, yE+y) position in the motion vector set of the current block. The value of element PredMatrixL0[x][y] in the luma prediction sample matrix PredMatrixL0 is the sample value at position (((xE+x)<<4)+mv0E0_x, ((yE+y)<<4)+mv0E0_y) in the 1/16-precision luma sample matrix with reference index RefIdxL0 in reference image queue 0, and the value of element PredMatrixL0[x][y] in the chroma prediction sample matrix PredMatrixL0 is the sample value at position (((xE+2×x)<<4)+MvC_x, ((yE+2×y)<<4)+MvC_y) in the 1/32-precision chroma sample matrix with reference index RefIdxL0 in reference image queue 0. Here, x1 = ((xE+2×x)>>3)<<3, y1 = ((yE+2×y)>>3)<<3, mv1E0 is the first motion vector of the 4×4 unit at the (x1, y1) position in the motion vector set of the current block, mv2E0 is the first motion vector of the 4×4 unit at the (x1+4, y1) position, mv3E0 is the first motion vector of the 4×4 unit at the (x1, y1+4) position, and mv4E0 is the first motion vector of the 4×4 unit at the (x1+4, y1+4) position.
Specifically, MvC _ x and MvC _ y may be determined by:
MvC_x=(mv1E0_x+mv2E0_x+mv3E0_x+mv4E0_x+2)>>2
MvC_y=(mv1E0_y+mv2E0_y+mv3E0_y+mv4E0_y+2)>>2
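A C sketch of this chroma motion vector derivation follows (non-normative; the Mv struct is an assumption of this sketch):

    typedef struct { int x, y; } Mv;

    /* MvC is the rounded average of the four 4x4 luma first motion vectors;
     * the '+2' implements rounding before the right shift by 2 (divide by 4). */
    Mv derive_chroma_mv(Mv mv1, Mv mv2, Mv mv3, Mv mv4)
    {
        Mv mvc;
        mvc.x = (mv1.x + mv2.x + mv3.x + mv4.x + 2) >> 2;
        mvc.y = (mv1.y + mv2.y + mv3.y + mv4.y + 2) >> 2;
        return mvc;
    }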
For example, in the present application, assuming that the position of the top-left sample of the current block in the luma sample matrix of the current picture is (xE, yE), if the prediction reference mode of the current block takes the value 0, i.e., the first reference mode 'PRED_List0' is used, and the sub-block size flag takes the value 1, i.e., the sub-block size parameter is 8×8, then the target motion vector mv0E0 is the first motion vector of the 8×8 unit at the (xE+x, yE+y) position in the motion vector set of the current block. The value of element PredMatrixL0[x][y] in the luma prediction sample matrix PredMatrixL0 is the sample value at position (((xE+x)<<4)+mv0E0_x, ((yE+y)<<4)+mv0E0_y) in the 1/16-precision luma sample matrix with reference index RefIdxL0 in reference image queue 0, and the value of element PredMatrixL0[x][y] in the chroma prediction sample matrix PredMatrixL0 is the sample value at position (((xE+2×x)<<4)+MvC_x, ((yE+2×y)<<4)+MvC_y) in the 1/32-precision chroma sample matrix with reference index RefIdxL0 in reference image queue 0. Here, MvC_x equals mv0E0_x and MvC_y equals mv0E0_y.
For example, in the present application, it is assumed that the position of the top-left sample of the current block in the luma sample matrix of the current image is (xE, yE). If the prediction reference mode of the current block takes the value 1, i.e., the second reference mode 'PRED_List1' is used, and the sub-block size flag takes the value 0, i.e., the sub-block size parameter is 4×4, the target motion vector mv0E1 is the first motion vector of the 4×4 unit at the (xE+x, yE+y) position in the motion vector set of the current block. The value of element PredMatrixL1[x][y] in the luma prediction sample matrix PredMatrixL1 is the sample value at position (((xE+x)<<4)+mv0E1_x, ((yE+y)<<4)+mv0E1_y) in the 1/16-precision luma sample matrix with reference index RefIdxL1 in reference image queue 1, and the value of element PredMatrixL1[x][y] in the chroma prediction sample matrix PredMatrixL1 is the sample value at position (((xE+2×x)<<4)+MvC_x, ((yE+2×y)<<4)+MvC_y) in the 1/32-precision chroma sample matrix with reference index RefIdxL1 in reference image queue 1. Here, x1 = ((xE+2×x)>>3)<<3, y1 = ((yE+2×y)>>3)<<3, mv1E1 is the first motion vector of the 4×4 unit at the (x1, y1) position in the motion vector set MvArray, mv2E1 is the first motion vector of the 4×4 unit at the (x1+4, y1) position, mv3E1 is the first motion vector of the 4×4 unit at the (x1, y1+4) position, and mv4E1 is the first motion vector of the 4×4 unit at the (x1+4, y1+4) position.
Specifically, MvC _ x and MvC _ y may be determined by:
MvC_x=(mv1E1_x+mv2E1_x+mv3E1_x+mv4E1_x+2)>>2
MvC_y=(mv1E1_y+mv2E1_y+mv3E1_y+mv4E1_y+2)>>2
for example, in the present application, it is assumed that the position of the top-left sample of the current block in the luma sample matrix of the current image is (xE, yE). If the prediction reference mode of the current block is taken to be 1, i.e., the second reference mode 'PRED _ List 1' is used, and the sub-block size flag is taken to be 1, i.e., the sub-block size parameter is 8 × 8, the target motion vector mv0E1 is the first motion vector of the 8 × 8 unit of the motion vector set of the current block at the (xE + x, yE + y) position. The value of element pred matrix l1[ x ] [ y ] in luma prediction sample matrix pred matrix l1 is the sample value of the position in the 1/16-precision luma sample matrix with reference index ref idxl1 in reference image queue 1 (((xE + x) < <4) + mv0E1_ x, ((yE + y) < <4) + mv0E1_ y), and the value of element pred matrix l1[ x ] [ y ] in chroma prediction sample matrix pred matrix l1 is the sample value of the position in the 1/32-precision chroma sample matrix with reference index ref idxl1 in reference image queue 1 (((xE +2 × x) < <4) + MvC _ x, ((yE +2 × y) <4) + MvC _ y). Where MvC _ x equals mv0E1_ x, MvC _ y equals mv0E 1.
For example, in the present application, it is assumed that the position of the top-left sample of the current block in the luma sample matrix of the current image is (xE, yE). If the prediction reference mode of the current block takes the value 2, i.e., the third reference mode 'PRED_List01' is used, the target motion vector mv0E0 is the first motion vector of the 8×8 unit at the (xE+x, yE+y) position in the motion vector set of the current block, and the target motion vector mv0E1 is the first motion vector of the 8×8 unit at the (x, y) position in the motion vector set of the current block. The value of element PredMatrixL0[x][y] in the luma prediction sample matrix PredMatrixL0 is the sample value at position (((xE+x)<<4)+mv0E0_x, ((yE+y)<<4)+mv0E0_y) in the 1/16-precision luma sample matrix with reference index RefIdxL0 in reference image queue 0; the value of element PredMatrixL0[x][y] in the chroma prediction sample matrix PredMatrixL0 is the sample value at position (((xE+2×x)<<4)+MvC0_x, ((yE+2×y)<<4)+MvC0_y) in the 1/32-precision chroma sample matrix with reference index RefIdxL0 in reference image queue 0; the value of element PredMatrixL1[x][y] in the luma prediction sample matrix PredMatrixL1 is the sample value at position (((xE+x)<<4)+mv0E1_x, ((yE+y)<<4)+mv0E1_y) in the 1/16-precision luma sample matrix with reference index RefIdxL1 in reference image queue 1; and the value of element PredMatrixL1[x][y] in the chroma prediction sample matrix PredMatrixL1 is the sample value at position (((xE+2×x)<<4)+MvC1_x, ((yE+2×y)<<4)+MvC1_y) in the 1/32-precision chroma sample matrix with reference index RefIdxL1 in reference image queue 1. Here, MvC0_x equals mv0E0_x, MvC0_y equals mv0E0_y, MvC1_x equals mv0E1_x, and MvC1_y equals mv0E1_y.
It should be noted that, in the embodiment of the application, the luma sample matrix in the sample matrix may be a 1/16-precision luma sample matrix, and the chroma sample matrix in the sample matrix may be a 1/32-precision chroma sample matrix.
It is understood that in the embodiments of the present application, the reference image queue and the reference index obtained by the decoder by parsing the codestream are different for different prediction reference modes.
Further, in the embodiment of the present application, when the decoder determines the sample matrix, a luminance interpolation filter coefficient and a chrominance interpolation filter coefficient may be obtained first; the luma sample matrix may then be determined based on the luma interpolation filter coefficients, while the chroma sample matrix may be determined based on the chroma interpolation filter coefficients.
Illustratively, in the present application, when determining the luminance sample matrix, the decoder obtains the luminance interpolation filter coefficients as shown in table 1 above, and then obtains the luminance sample matrix by calculation according to the pixel positions and the sample positions as shown in fig. 8 and 9.
Specifically, the sample at position a_(x,0) (x = 1..15) is obtained by filtering the 8 integer samples closest to the interpolation point in the horizontal direction, and its predicted value is obtained as follows:
a_(x,0) = Clip1((fL[x][0]×A_(-3,0) + fL[x][1]×A_(-2,0) + fL[x][2]×A_(-1,0) + fL[x][3]×A_(0,0) + fL[x][4]×A_(1,0) + fL[x][5]×A_(2,0) + fL[x][6]×A_(3,0) + fL[x][7]×A_(4,0) + 32) >> 6).
Specifically, the sample at position a_(0,y) (y = 1..15) is obtained by filtering the 8 integer samples closest to the interpolation point in the vertical direction, and its predicted value is obtained as follows:
a_(0,y) = Clip1((fL[y][0]×A_(0,-3) + fL[y][1]×A_(0,-2) + fL[y][2]×A_(0,-1) + fL[y][3]×A_(0,0) + fL[y][4]×A_(0,1) + fL[y][5]×A_(0,2) + fL[y][6]×A_(0,3) + fL[y][7]×A_(0,4) + 32) >> 6).
Specifically, the predicted values of the samples at positions a_(x,y) (x = 1..15, y = 1..15) are obtained as follows:
a_(x,y) = Clip1((fL[y][0]×a'_(x,y-3) + fL[y][1]×a'_(x,y-2) + fL[y][2]×a'_(x,y-1) + fL[y][3]×a'_(x,y) + fL[y][4]×a'_(x,y+1) + fL[y][5]×a'_(x,y+2) + fL[y][6]×a'_(x,y+3) + fL[y][7]×a'_(x,y+4) + (1<<(19-BitDepth))) >> (20-BitDepth)),
where:
a'_(x,y) = (fL[x][0]×A_(-3,y) + fL[x][1]×A_(-2,y) + fL[x][2]×A_(-1,y) + fL[x][3]×A_(0,y) + fL[x][4]×A_(1,y) + fL[x][5]×A_(2,y) + fL[x][6]×A_(3,y) + fL[x][7]×A_(4,y) + ((1<<(BitDepth-8))>>1)) >> (BitDepth-8).
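For illustration, the horizontal case above can be sketched in C as follows (non-normative; A is assumed to point at the integer sample A_(0,0) within one sample row, so negative indices reach A_(-3,0), and fL is the 8-tap luma interpolation filter of table 1):

    static int clip1(int v, int bit_depth)
    {
        int hi = (1 << bit_depth) - 1;
        return v < 0 ? 0 : (v > hi ? hi : v);
    }

    /* Horizontal luma interpolation at sub-pixel phase x (1..15):
     * the 8 taps span the integer samples A[-3] .. A[4]. */
    int interp_luma_h(const int *A, const int (*fL)[8], int x, int bit_depth)
    {
        int sum = 0;
        for (int k = 0; k < 8; k++)
            sum += fL[x][k] * A[k - 3];
        return clip1((sum + 32) >> 6, bit_depth);
    }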
for example, in the present application, when determining the chroma sample matrix, the decoder may first parse the code stream to obtain the chroma interpolation filter coefficients as shown in table 2, and then calculate to obtain the chroma sample matrix according to the pixel positions and the sample positions as shown in fig. 10 and fig. 11.
Specifically, sub-pixel points with dx equal to 0 or dy equal to 0 can be interpolated directly from chroma integer pixels, while points with dx not equal to 0 and dy not equal to 0 are calculated using the sub-pixels on the integer pixel row (dy equal to 0):
if (dx == 0) {
a_(x,y)(0, dy) = Clip3(0, (1<<BitDepth)-1, (fC[dy][0]×A_(x,y-1) + fC[dy][1]×A_(x,y) + fC[dy][2]×A_(x,y+1) + fC[dy][3]×A_(x,y+2) + 32) >> 6)
} else if (dy == 0) {
a_(x,y)(dx, 0) = Clip3(0, (1<<BitDepth)-1, (fC[dx][0]×A_(x-1,y) + fC[dx][1]×A_(x,y) + fC[dx][2]×A_(x+1,y) + fC[dx][3]×A_(x+2,y) + 32) >> 6)
} else {
a_(x,y)(dx, dy) = Clip3(0, (1<<BitDepth)-1, (fC[dy][0]×a'_(x,y-1)(dx, 0) + fC[dy][1]×a'_(x,y)(dx, 0) + fC[dy][2]×a'_(x,y+1)(dx, 0) + fC[dy][3]×a'_(x,y+2)(dx, 0) + (1<<(19-BitDepth))) >> (20-BitDepth))
}
where a'_(x,y)(dx, 0) is the temporary value of a sub-pixel on the integer pixel row, defined as:
a'_(x,y)(dx, 0) = (fC[dx][0]×A_(x-1,y) + fC[dx][1]×A_(x,y) + fC[dx][2]×A_(x+1,y) + fC[dx][3]×A_(x+2,y) + ((1<<(BitDepth-8))>>1)) >> (BitDepth-8).
Further, in the embodiment of the present application, when determining the motion vector deviation between a pixel position and the sub-block, the decoder may first parse the code stream to obtain a secondary prediction parameter; if the secondary prediction parameter indicates that secondary prediction is used, the decoder may determine the motion vector deviation between the sub-block and each pixel position based on the difference variables.
Specifically, in the embodiment of the present application, when determining the motion vector bias between the sub-block and each pixel position based on the difference variable, the decoder may determine 4 difference variables dHorX, dVerX, dHorY, and dVerY according to the control point motion vector group, the control point mode, and the size parameter of the current block according to the method set forth in step 202, and then further determine the motion vector bias corresponding to each pixel position in the sub-block by using the difference variables.
Illustratively, in the present application, width and height are the width and height of the current block obtained by the decoder, and the width (subwidth) and height (subheight) of a sub-block are determined from the sub-block size parameter. Assuming that (i, j) is the coordinate of any pixel point inside the sub-block, where i ranges from 0 to (subwidth-1) and j ranges from 0 to (subheight-1), the motion vector deviation at each pixel (i, j) inside the 4 different types of sub-blocks can be calculated as follows:
if the sub-block is the control point A in the upper left corner of the current block, the motion vector deviation dMvA[i][j] of the (i, j) pixel is:
dMvA[i][j][0]=dHorX×i+dVerX×j
dMvA[i][j][1]=dHorY×i+dVerY×j;
if the sub-block is the control point B in the upper right corner of the current block, the motion vector deviation dMvB[i][j] of the (i, j) pixel is:
dMvB[i][j][0]=dHorX×(i-subwidth)+dVerX×j
dMvB[i][j][1]=dHorY×(i-subwidth)+dVerY×j;
if the sub-block is the control point C in the lower left corner of the current block and the control point motion vector group is a motion vector group including 3 motion vectors, the motion vector deviation dMvC[i][j] of the (i, j) pixel is:
dMvC[i][j][0]=dHorX×i+dVerX×(j-subheight)
dMvC[i][j][1]=dHorY×i+dVerY×(j-subheight);
otherwise, the motion vector deviation dMvN[i][j] of the (i, j) pixel is:
dMvN[i][j][0]=dHorX×(i-(subwidth>>1))+dVerX×(j-(subheight>>1))
dMvN[i][j][1]=dHorY×(i-(subwidth>>1))+dVerY×(j-(subheight>>1)).
Here, dMvX[i][j][0] denotes the deviation value of the motion vector deviation in the horizontal component, and dMvX[i][j][1] denotes the deviation value in the vertical component, where X is A, B, C, or N.
It is to be understood that, in the embodiment of the present application, after determining the motion vector deviation between the sub-block and each pixel position based on the difference variables, the decoder may use all the motion vector deviations corresponding to all pixel positions in the sub-block to construct the motion vector deviation matrix of the sub-block. It can be seen that the motion vector deviation matrix includes the motion vector deviation between the sub-block and each of its internal pixel points.
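The four cases can be folded into one routine; the following C sketch (the enum and array typing are illustrative choices of this sketch, sub-blocks being at most 8×8) builds the motion vector deviation matrix of one sub-block:

    enum SubblockType { SB_A, SB_B, SB_C, SB_N };  /* top-left, top-right,
                                                      bottom-left, others */

    /* Per-pixel motion vector deviation inside one sub-block, per the four
     * cases above; dmv[j][i][0] is the horizontal and dmv[j][i][1] the
     * vertical deviation of pixel (i, j). */
    void derive_mv_deviation(enum SubblockType t, int subwidth, int subheight,
                             int dHorX, int dHorY, int dVerX, int dVerY,
                             int dmv[8][8][2])
    {
        for (int j = 0; j < subheight; j++) {
            for (int i = 0; i < subwidth; i++) {
                int di = i, dj = j;
                if (t == SB_B) {
                    di = i - subwidth;
                } else if (t == SB_C) {
                    dj = j - subheight;
                } else if (t == SB_N) {
                    di = i - (subwidth >> 1);
                    dj = j - (subheight >> 1);
                }
                dmv[j][i][0] = dHorX * di + dVerX * dj;
                dmv[j][i][1] = dHorY * di + dVerY * dj;
            }
        }
    }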
Further, in the embodiment of the present application, if the secondary prediction parameter obtained by the decoder by parsing the code stream indicates that secondary prediction is not used, the decoder may directly use the first predicted value of the sub-block of the current block obtained in step 203a above as the second predicted value of the sub-block, without performing the processing of steps 204 and 205 below.
Specifically, in embodiments of the present application, if the secondary prediction parameter indicates that secondary prediction is not used, the decoder may determine the second predicted value using the prediction sample matrix; that is, the decoder may determine the first predicted value of the sub-block where the pixel position is located as the second predicted value of that pixel position.
For example, in the present application, if the prediction reference mode of the current block takes the value 0 or 1, i.e., the first reference mode 'PRED_List0' or the second reference mode 'PRED_List1' is used, the first predicted value of the sub-block where the pixel position is located may be selected directly from the prediction sample matrix, which includes one luma prediction sample matrix and two chroma prediction sample matrices, and determined as the inter prediction value of the pixel position, i.e., the second predicted value.
For example, in this application, if the prediction reference mode of the current block takes the value 2, i.e., the third reference mode 'PRED_List01' is used, an averaging operation may be performed on the 2 luma prediction sample matrices (and on the 2 groups of chroma prediction sample matrices, 4 in total) included in the prediction sample matrix to obtain 1 averaged luma prediction sample matrix (and 2 averaged chroma prediction sample matrices); finally, the first predicted value of the sub-block where the pixel position is located is selected from the averaged luma prediction samples (and the 2 averaged chroma prediction samples) and determined as the inter prediction value of the pixel position, i.e., the second predicted value.
Step 204, determining a filter coefficient of the two-dimensional filter according to the motion vector deviation; the two-dimensional filter is used for carrying out quadratic prediction processing according to a preset shape.
In an embodiment of the present application, after determining the first prediction value of the sub-block and the motion vector deviation between the pixel position and the sub-block based on the first motion vector, the decoder may further determine the filter coefficient of the two-dimensional filter according to the motion vector deviation.
It should be noted that, in the embodiment of the present application, the filter coefficient of the two-dimensional filter is related to the motion vector bias corresponding to the pixel position. That is, if the corresponding motion vector deviations are different for different pixel positions, the filter coefficients of the two-dimensional filter used are also different.
It will be appreciated that in embodiments of the present application, the two-dimensional filter is used for quadratic prediction using a plurality of adjacent pixel locations constituting the predetermined shape. The preset shape is a rectangle, a rhombus or any symmetrical shape.
That is, in the present application, the two-dimensional filter for performing secondary prediction is a filter composed of adjacent points constituting a preset shape. The adjacent points constituting the preset shape may include a plurality of points, for example, 9 points. The preset shape may be a symmetrical shape; for example, it may be a rectangle, a rhombus, or any other symmetrical shape.
Illustratively, in the present application, the two-dimensional filter is a rectangular filter; specifically, it is a filter composed of 9 adjacent pixel positions forming a rectangle. Among the 9 pixel positions, the one located at the center is the pixel position that currently requires secondary prediction.
Further, in the embodiment of the present application, when determining the filter coefficient of the two-dimensional filter according to the motion vector deviation, the decoder may first analyze the code stream to obtain a scale parameter, and then may determine the filter coefficient corresponding to the pixel position according to the scale parameter and the motion vector deviation.
It should be noted that, in the embodiment of the present application, the scale parameter may include at least one scale value, and the motion vector deviation includes a horizontal deviation and a vertical deviation; wherein the at least one ratio value is a non-zero real number.
Specifically, in the present application, when the two-dimensional filter performs secondary prediction using 9 adjacent pixel positions that form a rectangle, the pixel position located at the center of the rectangle is the position to be predicted, and the other 8 pixel positions are the adjacent positions to its upper left, upper, upper right, left, right, lower left, lower, and lower right.
Accordingly, in the present application, the decoder may calculate the 9 filter coefficients corresponding to the 9 adjacent pixel positions according to a preset calculation rule, based on the at least one scale value and the motion vector deviation of the position to be predicted.
It should be noted that, in the present application, the preset calculation rule may include a plurality of different calculation manners, such as addition, subtraction, multiplication, and the like. Wherein, for different pixel positions, different calculation modes can be used for calculating the filter coefficient.
It is understood that, in the present application, when the plurality of filter coefficients corresponding to the plurality of pixel positions are obtained according to the different calculation manners in the preset calculation rule, a part of the filter coefficients may be a linear function of the motion vector deviation, i.e., the two are in a linear relationship, and a part may be a quadratic or higher-order function of the motion vector deviation, i.e., the two are in a nonlinear relationship.
That is, in the present application, any one of the plurality of filter coefficients corresponding to a plurality of adjacent pixel positions may be a linear function, a quadratic function, or a high-order function of the motion vector deviation.
Illustratively, in this application, the motion vector deviation of a pixel position is assumed to be (dmv_x, dmv_y); if the coordinates of the target pixel position are (i, j), dmv_x may be expressed as dMvX[i][j][0], i.e., the deviation value of the motion vector deviation in the horizontal component, and dmv_y may be expressed as dMvX[i][j][1], i.e., the deviation value of the motion vector deviation in the vertical component.
Accordingly, table 3 shows the filter coefficients obtained based on the motion vector deviation (dmv_x, dmv_y). As shown in table 3, for the two-dimensional filter, the 9 filter coefficients corresponding to the 9 adjacent pixel positions can be obtained from the motion vector deviation of the pixel position (horizontal deviation dmv_x, vertical deviation dmv_y) and different scale parameters, such as m and n, where the decoder can directly set the filter coefficient of the central position to be predicted to 1.
TABLE 3
Pixel position    Filter coefficient
Upper left        (-dmv_x-dmv_y)×m
Left              -dmv_x×n
Lower left        (-dmv_x+dmv_y)×m
Upper             -dmv_y×n
Center            1
Lower             dmv_y×n
Upper right       (dmv_x-dmv_y)×m
Right             dmv_x×n
Lower right       (dmv_x+dmv_y)×m
Here, the scale parameters m and n are typically decimals or fractions; one possibility is that both m and n are reciprocals of powers of 2, such as 1/2, 1/4, 1/8, and so on. In the table, dmv_x and dmv_y are expressed at their actual scale, i.e., a value of 1 of dmv_x or dmv_y represents a distance of 1 pixel, and dmv_x and dmv_y are decimals or fractions.
It should be noted that, in the embodiment of the present application, in contrast to the commonly used 8-tap filter, for which the motion vectors of the integer pixel position and the sub-pixel position are non-negative in both the horizontal and the vertical direction and lie between 0 and 1 pixel (i.e., dmv_x and dmv_y cannot be negative), the motion vectors of the integer pixel position and the sub-pixel position corresponding to the filter of the present application may be negative in both the horizontal and the vertical direction, i.e., dmv_x and dmv_y may be negative.
For example, in the embodiment of the present application, if the ratio parameter m is 1/16 and n is 1/2, the above table 3 may be represented as the following table 4:
TABLE 4
Pixel position    Filter coefficient
Upper left        (-dmv_x-dmv_y)/16
Left              -dmv_x/2
Lower left        (-dmv_x+dmv_y)/16
Upper             -dmv_y/2
Center            1
Lower             dmv_y/2
Upper right       (dmv_x-dmv_y)/16
Right             dmv_x/2
Lower right       (dmv_x+dmv_y)/16
It will be appreciated that, in embodiments of the present application, in video codec techniques and standards, values are typically scaled up to avoid decimal and floating-point operations, and the computation result is then scaled back down by the appropriate factor to obtain the correct result; a left shift is typically used for scaling up and a right shift for scaling down. Therefore, when performing the secondary prediction with the two-dimensional filter, the following form is used in practice:
Assuming that the motion vector deviation of the pixel position is (dmv_x, dmv_y), shifting it left by shift1 bits yields (dmv_x', dmv_y'); based on table 4 above, the coefficients of the two-dimensional filter can then be represented as table 5 below:
TABLE 5
Pixel position    Filter coefficient
Upper left        -dmv_x'-dmv_y'
Left              -dmv_x'×8
Lower left        -dmv_x'+dmv_y'
Upper             -dmv_y'×8
Center            16<<shift1
Lower             dmv_y'×8
Upper right       dmv_x'-dmv_y'
Right             dmv_x'×8
Lower right       dmv_x'+dmv_y'
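For illustration, the fixed-point coefficients of table 5 can be computed as in the following C sketch (the output ordering is a choice of this sketch; dmv_x and dmv_y here are assumed to already be the left-shifted values dmv_x' and dmv_y'):

    /* Fixed-point coefficients per table 5 (m = 1/16, n = 1/2). Output order:
     * upper left, upper, upper right, left, center, right,
     * lower left, lower, lower right. */
    void filter_coeffs(int dmv_x, int dmv_y, int shift1, int c[9])
    {
        c[0] = -dmv_x - dmv_y;   /* upper left  */
        c[1] = -dmv_y * 8;       /* upper       */
        c[2] =  dmv_x - dmv_y;   /* upper right */
        c[3] = -dmv_x * 8;       /* left        */
        c[4] = 16 << shift1;     /* center      */
        c[5] =  dmv_x * 8;       /* right       */
        c[6] = -dmv_x + dmv_y;   /* lower left  */
        c[7] =  dmv_y * 8;       /* lower       */
        c[8] =  dmv_x + dmv_y;   /* lower right */
    }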
Fig. 18 is a first schematic diagram of a two-dimensional filter. As shown in fig. 18, secondary prediction is performed on the basis of the sub-block-based prediction result: the light squares are the integer pixel positions of the filter, i.e., the positions obtained by sub-block-based prediction; the circle is a sub-pixel position that requires secondary prediction, i.e., the pixel position; the dark squares are the integer pixel positions corresponding to the sub-pixel position, and the 9 integer pixel positions shown in the figure are needed when the sub-pixel position is obtained by interpolation.
Fig. 19 is a second schematic diagram of a two-dimensional filter. As shown in fig. 19, secondary prediction is likewise performed on the basis of the sub-block-based prediction result: the light squares are the integer pixel positions of the filter, i.e., the positions obtained by sub-block-based prediction; the circle is a sub-pixel position that requires secondary prediction, i.e., the pixel position; the dark squares are the integer pixel positions corresponding to the sub-pixel position, and the 13 integer pixel positions shown in the figure are needed when the sub-pixel position is obtained by interpolation.
Step 205, determining a second predicted value of the sub-block based on the filter coefficient and the first predicted value, and determining the second predicted value as the inter prediction value of the sub-block.
In the embodiment of the application, after the decoder determines the filter coefficient of the two-dimensional filter according to the motion vector deviation, the decoder can determine the second predicted value of the sub-block based on the filter coefficient and the first predicted value, so that secondary prediction of the sub-block by combining the motion vector deviation with the pixel position can be realized, and the correction of the first predicted value can be completed.
It is understood that, in the embodiment of the present application, the decoder determines the filter coefficient by using the motion vector deviation corresponding to the pixel position, so that the first prediction value can be corrected by the two-dimensional filter according to the filter coefficient to obtain the corrected second prediction value of the sub-block. It can be seen that the second predicted value is a corrected value based on the first predicted value.
Further, in the embodiment of the present application, when determining the second predicted value of a sub-block based on the filter coefficients and the first predicted values, the decoder may first multiply each filter coefficient by the first predicted value at the corresponding pixel position to obtain a product; after traversing all pixel positions covered by the filter, the products are added to obtain a sum, and the sum is finally normalized, yielding the corrected second predicted value of the sub-block.
In the embodiment of the present application, before the secondary prediction is performed, the first predicted value of the sub-block where a pixel position is located is used as the predicted value of that pixel position before correction; therefore, when filtering with the two-dimensional filter, each filter coefficient may be multiplied by the predicted value of the corresponding pixel position, i.e., the first predicted value, and the products of the pixel positions are accumulated and then normalized.
It is understood that the decoder can perform the normalization in various ways in the present application; for example, the accumulated sum of the products of the filter coefficients and the predicted values of the corresponding pixel positions may be right-shifted by 4+shift1 bits, or (1<<(3+shift1)) may be added to the accumulated sum before right-shifting by 4+shift1 bits.
Therefore, in the present application, after obtaining the motion vector deviation corresponding to the pixel position inside the sub-block, for each sub-block and each pixel position in each sub-block, filtering may be performed by using a two-dimensional filter based on the first predicted value of the motion compensation of the sub-block according to the motion vector deviation, so as to complete the secondary prediction of the sub-block, and obtain a new second predicted value.
Further, in the embodiments of the present application, the two-dimensional filter may be understood as performing quadratic prediction using a plurality of adjacent pixel positions constituting the preset shape. The preset shape can be a rectangle, a rhombus or any symmetrical shape.
Specifically, in the embodiment of the present application, when performing secondary prediction using 9 adjacent pixel positions constituting a rectangle, the two-dimensional filter may first determine a prediction sample matrix of the current block and a motion vector bias set of the sub-block of the current block; wherein the motion vector deviation matrix comprises the motion vector deviations corresponding to all pixel positions; then, based on the 9 adjacent pixel positions forming a rectangle, a secondary predicted sample matrix of the current block is determined by using the predicted sample matrix and the motion vector bias set.
Illustratively, in the present application, the width and height of the current block are width and height, respectively, and the width and height of each sub-block are subwidth and subheight, respectively. As shown in fig. 7, the sub-block containing the top-left sample of the luma prediction sample matrix of the current block is A, the sub-block containing the top-right sample is B, the sub-block containing the bottom-left sample is C, and the sub-blocks at other positions are other sub-blocks.
For each sub-block in the current block, the motion vector disparity matrix for the sub-block can be dMv, then:
1. if the sub-block is A, dMv equals dMvA;
2. if the sub-block is B, dMv equals dMvB;
3. if the sub-block is C and the control point motion vector group mvsAffine of the sub-block has 3 motion vectors, dMv equals dMvC;
4. dMv equals dMvN if the sub-block is other than A, B, C.
Further, assuming that (x, y) is the coordinate of the top-left corner position of the current sub-block, (i, j) is the coordinate of a pixel inside the luma sub-block, i ranges from 0 to (subwidth-1), j ranges from 0 to (subheight-1), the sub-block-based prediction sample matrix is predMatrix, and the prediction sample matrix of the secondary prediction is PredMatrixS, the secondary prediction sample PredMatrixS[x+i][y+j] at position (x+i, y+j) can be calculated as follows:
PredMatrixS[x+i][y+j]=
(UPLEFT(x+i,y+j)×(-dMv[i][j][0]-dMv[i][j][1])+
UP(x+i,y+j)×((-dMv[i][j][1])<<3)+
UPRIGHT(x+i,y+j)×(dMv[i][j][0]-dMv[i][j][1])+
LEFT(x+i,y+j)×((-dMv[i][j][0])<<3)+
CENTER(x+i,y+j)×(1<<15)+
RIGHT(x+i,y+j)×(dMv[i][j][0]<<3)+
DOWNLEFT(x+i,y+j)×(-dMv[i][j][0]+dMv[i][j][1])+
DOWN(x+i,y+j)×(dMv[i][j][1]<<3)+
DOWNRIGHT(x+i,y+j)×(dMv[i][j][0]+dMv[i][j][1])+
(1<<10))>>11
PredMatrixS[x+i][y+j]=Clip3(0,(1<<BitDepth)-1,PredMatrixS[x+i][y+j]).
where:
UPLEFT(x+i, y+j)=predMatrix[x+i-1<0 ? 0 : x+i-1][y+j-1<0 ? 0 : y+j-1]
UP(x+i, y+j)=predMatrix[x+i][y+j-1<0 ? 0 : y+j-1]
UPRIGHT(x+i, y+j)=predMatrix[x+i+1>width-1 ? width-1 : x+i+1][y+j-1<0 ? 0 : y+j-1]
LEFT(x+i, y+j)=predMatrix[x+i-1<0 ? 0 : x+i-1][y+j]
CENTER(x+i, y+j)=predMatrix[x+i][y+j]
RIGHT(x+i, y+j)=predMatrix[x+i+1>width-1 ? width-1 : x+i+1][y+j]
DOWNLEFT(x+i, y+j)=predMatrix[x+i-1<0 ? 0 : x+i-1][y+j+1>height-1 ? height-1 : y+j+1]
DOWN(x+i, y+j)=predMatrix[x+i][y+j+1>height-1 ? height-1 : y+j+1]
DOWNRIGHT(x+i, y+j)=predMatrix[x+i+1>width-1 ? width-1 : x+i+1][y+j+1>height-1 ? height-1 : y+j+1]
Here the ternary expressions clamp out-of-block neighbor coordinates to the block boundary.
It is understood that, in the embodiment of the present application, the CENTER(x+i, y+j) pixel position may be the central position among the 9 adjacent pixel positions constituting the rectangle, and the secondary prediction process may then be performed based on the (x+i, y+j) pixel position and the 8 pixel positions adjacent to it. Specifically, the other 8 pixel positions are UPLEFT (upper left), UP (up), UPRIGHT (upper right), LEFT (left), RIGHT (right), DOWNLEFT (down left), DOWN (down), and DOWNRIGHT (down right).
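To make the filtering concrete, the following is a minimal Python sketch of the 9-point rectangular filter defined by the formula above. It mirrors the formula and its boundary clamping literally; the function and array names (pred, dmv, quad_pred_sample) are illustrative and not part of the standard text.

```python
def clip3(lo, hi, v):
    # Clip3(lo, hi, v) as used in the text above.
    return max(lo, min(hi, v))

def quad_pred_sample(pred, dmv, x, y, i, j, width, height, bit_depth):
    # Secondary prediction of the luma sample at (x+i, y+j).
    # pred: sub-block-based prediction samples, indexed pred[x][y]
    # dmv : dmv[i][j] = (horizontal deviation, vertical deviation)
    def at(px, py):
        # Neighbor fetch with the clamping used by UPLEFT/UP/.../DOWNRIGHT.
        return pred[clip3(0, width - 1, px)][clip3(0, height - 1, py)]

    dx, dy = dmv[i][j]
    cx, cy = x + i, y + j
    s = (at(cx - 1, cy - 1) * (-dx - dy) +          # UPLEFT
         at(cx,     cy - 1) * ((-dy) << 3) +        # UP
         at(cx + 1, cy - 1) * (dx - dy) +           # UPRIGHT
         at(cx - 1, cy    ) * ((-dx) << 3) +        # LEFT
         at(cx,     cy    ) * (1 << 15) +           # CENTER
         at(cx + 1, cy    ) * (dx << 3) +           # RIGHT
         at(cx - 1, cy + 1) * (-dx + dy) +          # DOWNLEFT
         at(cx,     cy + 1) * (dy << 3) +           # DOWN
         at(cx + 1, cy + 1) * (dx + dy) +           # DOWNRIGHT
         (1 << 10)) >> 11
    return clip3(0, (1 << bit_depth) - 1, s)
```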
It should be noted that, in the present application, the calculation of PredMatrixS[x+i][y+j] may use a lower precision. For example, the terms entering each multiplication may be right-shifted by shift3 bits, such as right-shifting dMv[i][j][0] and dMv[i][j][1] by 3 bits; accordingly, 1<<15 becomes 1<<(15-shift3), and "…+(1<<10))>>11" becomes "…+(1<<(10-shift3)))>>(11-shift3)".
For example, the size of the motion vector deviation may be limited to a reasonable range, so that the deviations used above in the horizontal and vertical directions do not exceed 1 pixel, 1/2 pixel, or 1/4 pixel in either the positive or the negative direction.
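A possible way to impose such a limit is sketched below; the fixed-point unit of the deviation and the helper name are assumptions made for illustration.

```python
def clamp_deviation(dx, dy, max_dev):
    # Limit both deviation components to [-max_dev, max_dev]; max_dev would
    # correspond to 1, 1/2, or 1/4 pixel expressed in the fixed-point units
    # of dMv (the exact unit is an assumption of this sketch).
    dx = max(-max_dev, min(max_dev, dx))
    dy = max(-max_dev, min(max_dev, dy))
    return dx, dy
```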
It can be understood that, if the prediction reference mode of the current block is 'Pred_List01', the decoder averages the plurality of prediction sample matrices of each component to obtain the final prediction sample matrix of that component. For example, a new luma prediction sample matrix is obtained by averaging the 2 luma prediction sample matrices.
Further, in the embodiment of the present application, after the prediction sample matrix of the current block is obtained, if the current block has no transform coefficients, the prediction matrix is used as the decoding result of the current block; if the current block has transform coefficients, the transform coefficients may be decoded first, a residual matrix is obtained through inverse transform and inverse quantization, and the residual matrix is added to the prediction matrix to obtain the decoding result.
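As a sketch of this last step (names are illustrative; the clipping of the sum to the sample range is assumed):

```python
def reconstruct(pred, residual, bit_depth):
    # Decoding result: the prediction matrix alone when there are no
    # transform coefficients, otherwise prediction plus the residual matrix
    # obtained through inverse transform and inverse quantization.
    if residual is None:
        return pred
    max_val = (1 << bit_depth) - 1
    return [[max(0, min(max_val, p + r))
             for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residual)]
```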
In summary, with the inter prediction method proposed in steps 201 to 205, after the sub-block-based prediction, point-based secondary prediction is performed, on the basis of the sub-block-based prediction, for the pixel positions whose motion vectors deviate from the motion vector of the sub-block. The point-based secondary prediction uses the information of a plurality of points constituting a preset shape, such as a rectangle or a rhombus, and it uses a two-dimensional filter composed of the adjacent points constituting the preset shape; the adjacent points may be 9 points. For a pixel position, the result of the filtering is the new predicted value of that position.
The inter prediction method provided in the present application is suitable for the case where the motion vector of a pixel position deviates from the motion vector of the sub-block, that is, the decoder performs secondary prediction because the pixel position is not accurately predicted by the sub-block-based prediction alone. If the motion vector of a pixel position does not deviate from the motion vector of the sub-block, the second predicted value obtained with the inter prediction method proposed in the present application should be the same as the first predicted value obtained before applying it. Therefore, in the present application, when there is no deviation between the motion vector of a pixel position and the motion vector of the sub-block, the inter prediction method proposed in the present application may be skipped for that pixel position.
The inter prediction method provided in the present application is applicable to the case where the motion vectors of all pixel positions deviate from the motion vectors of their sub-blocks. This includes the case where the motion vector of each pixel position can be calculated by an affine prediction model but only sub-block-based prediction is performed, that is, the motion vector of every pixel position of the current block can be calculated yet differs from the motion vector of the sub-block containing it. It is also applicable to the case where the motion vectors of not all pixel positions can be calculated from a model: when the motion vectors of some portions inside the current block, such as sub-blocks, differ from those of other portions, and the content is assumed to change continuously at the time instant of the current frame, the motion vectors between adjacent portions with different motion vectors may also change continuously, and after the motion vectors of the relevant pixel positions are recalculated according to a calculation model or the like, these motion vectors deviate from the original ones. That is, the inter prediction method proposed in the present application may be applied to affine prediction or to other scenarios, and the application scenario is not limited.
The two-dimensional filter used by the inter-frame prediction method provided by the application can be a rectangular filter, a diamond filter or a filter with other shapes. Illustratively, a two-dimensional filter is a filter consisting of 9 adjacent pixel locations that form a rectangle. The pixel position at the center of the 9 pixel positions is the pixel position that needs to be predicted currently.
It should be noted that the inter prediction method proposed in the present application may be applied to any image component; in the present embodiment, the secondary prediction scheme is exemplarily applied to the luma component, but it may also be applied to the chroma components, or to any component in other formats. The inter prediction method proposed in the present application may also be applied to any video format, including but not limited to the YUV format and its luma component.
The embodiment provides an inter prediction method, in which a decoder parses the code stream to obtain the prediction mode parameter of the current block; when the prediction mode parameter indicates that the inter prediction value of the current block is determined using the inter prediction mode, determines a first motion vector of a sub-block of the current block, wherein the current block comprises a plurality of sub-blocks; determines, based on the first motion vector, a first predicted value of the sub-block and the motion vector deviation between each pixel position and the sub-block, wherein a pixel position is the position of a pixel point in the sub-block; determines the filter coefficients of a two-dimensional filter according to the motion vector deviation, wherein the two-dimensional filter is used for performing secondary prediction according to a preset shape; and determines a second predicted value of the sub-block based on the filter coefficients and the first predicted value, the second predicted value being determined as the inter prediction value of the sub-block. That is, after the sub-block-based prediction, the inter prediction method proposed in the present application may perform point-based secondary prediction on the first predicted value for the pixel positions whose motion vectors deviate from the motion vector of the sub-block, to obtain the second predicted value. The inter prediction method can be well adapted to all scenarios, greatly improves the coding performance, and improves the coding and decoding efficiency.
Based on the foregoing embodiments, in yet another embodiment of the present application, fig. 20 is a schematic flowchart of an implementation of the inter prediction method. As shown in fig. 20, the method for the decoder to perform inter prediction may include the following steps:
Step 301, predicting the sub-block according to the first motion vector of the sub-block to obtain a first predicted value.
Step 302, determining the motion vector deviation between each position in the sub-block and the sub-block.
Step 303, filtering the first predicted value by using a two-dimensional filter according to the motion vector deviation of each position, to obtain a second predicted value.
That is, the inter prediction method proposed in the present application may, after the sub-block-based prediction, perform point-based secondary prediction for the pixel positions whose motion vectors deviate from the motion vector of the sub-block, on the basis of the sub-block-based prediction, finally completing the correction of the first predicted value and obtaining a new predicted value, that is, the second predicted value.
Specifically, the point-based secondary prediction uses a two-dimensional filter composed of the adjacent points constituting a preset shape; the adjacent points may be 9 points. For a pixel position, the result of the filtering is the new predicted value of that position. The filter coefficients of the two-dimensional filter are determined by the motion vector deviation of each position; the input of the two-dimensional filter is the first predicted value, and its output is the second predicted value.
Further, in the present application, if the current block uses the affine mode, the motion vectors of the control points need to be determined first. Fig. 21 is a schematic flowchart of a third implementation of the inter prediction method; as shown in fig. 21, the method for the decoder to perform inter prediction may include the following steps:
step 304, determining a second motion vector of the control point.
Step 305, determining a first motion vector of the sub-block according to the second motion vector.
Step 301, predicting the sub-block according to the first motion vector of the sub-block to obtain a first predicted value.
Step 302, determining the motion vector deviation between each position in the sub-block and the sub-block.
Step 303, filtering the first predicted value by using a two-dimensional filter according to the motion vector deviation of each position, to obtain a second predicted value.
It can be seen that, in the embodiment of the present application, after the motion vectors of the control points are determined, the sub-block may be predicted using them; and after the deviation between each pixel position in the sub-block and the motion vector of the sub-block is determined, point-based secondary prediction may be performed, on the basis of the sub-block-based prediction, for the pixel positions whose motion vectors deviate from the motion vector of the sub-block, finally completing the correction of the first predicted value and obtaining a new predicted value, that is, the second predicted value.
Further, in the present application, the two steps of determining the first motion vector of the sub-block and determining the deviation of each position within the sub-block from the motion vector of the sub-block may also be performed simultaneously. Fig. 22 is a flowchart illustrating a fourth implementation of the inter prediction method, and as shown in fig. 22, the method for the decoder to perform inter prediction may include the following steps:
step 304, determining a second motion vector of the control point.
Step 306, determining a first motion vector of the sub-block according to the second motion vector, and the motion vector deviation between each position in the sub-block and the sub-block.
Step 301, predicting the sub-block according to the first motion vector of the sub-block to obtain a first predicted value.
Step 303, filtering the first predicted value by using a two-dimensional filter according to the motion vector deviation of each position, to obtain a second predicted value.
It can be seen that, in the embodiment of the present application, after the motion vectors of the control points are determined, the first motion vector of the sub-block and the deviation between each position in the sub-block and the motion vector of the sub-block may be determined at the same time; then, for the pixel positions whose motion vectors deviate from the motion vector of the sub-block, point-based secondary prediction is performed on the basis of the sub-block-based prediction, finally completing the correction of the first predicted value and obtaining a new predicted value, that is, the second predicted value.
It should be noted that, in the embodiment of the present application, because the affine model can explicitly calculate the motion vector of each pixel position, or the deviation between each pixel position in a sub-block and the sub-block motion vector, the inter prediction method provided in the present application can be used to improve affine prediction; of course, it can also be applied to improve other sub-block-based predictions.
Further, the inter-frame prediction method proposed in the present application may be based on the AVS3 standard, and may also be applied to the VVC standard, which is not specifically limited in the present application.
For example, in the present application, the sequence header information of the code stream is parsed to determine whether the secondary prediction method proposed in the present application is used when decoding the sequence. Generally, the control level of the secondary prediction method is the sequence level, in which case the determination is made as described above. If the control level of the secondary prediction method is the picture level, that is, whether the secondary prediction method is used has to be decided for each frame, then when decoding each frame the picture header information of the code stream is parsed to determine whether the technique is used when decoding that frame. If the technique is used for the current sequence or the current frame, the coding units of the current sequence or frame that satisfy the condition use the secondary prediction method proposed in the present application; in the present embodiment, the condition is that the coding unit uses the affine mode, that is, the technique is applied to the coding units of the current sequence or the current frame that use the affine mode. The affine mode is controlled at the sequence level; whether a sequence uses the affine mode requires parsing the sequence header information, and the text describing the sequence header definition is shown in Table 6:
TABLE 6
[Table 6 is reproduced as an image (Figure PCTCN2021106081-APPB-000004) in the original publication.]
The affine motion compensation enable flag affine_enable_flag is a binary variable. A value of '1' indicates that affine motion compensation can be used; a value of '0' indicates that affine motion compensation should not be used. The value of AffineEnableFlag is equal to the value of affine_enable_flag.
Further, if sequence level control is used, a sequence header definition as shown in table 7 below may be added to the text:
TABLE 7
[Table 7 is reproduced as an image (Figure PCTCN2021106081-APPB-000005) in the original publication.]
The secondary prediction enable flag secondary_pred_enable_flag is a binary variable. A value of '1' indicates that secondary prediction can be used; a value of '0' indicates that secondary prediction should not be used. The value of SecondaryPredEnableFlag is equal to the value of secondary_pred_enable_flag.
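For illustration, a minimal parsing sketch of these sequence-level switches follows; the bit-reader helper read_bit and the conditioning of secondary_pred_enable_flag on affine_enable_flag (the Table 8 variant) are assumptions here, since the syntax tables themselves are reproduced as images.

```python
def parse_sequence_level_flags(read_bit):
    # read_bit() -> 0 or 1 is a hypothetical bit-reader helper; a real
    # sequence header carries many other syntax elements between these.
    affine_enable_flag = read_bit()
    if affine_enable_flag == 1:                  # Table 8 variant: the flag
        secondary_pred_enable_flag = read_bit()  # is sent only with affine
    else:
        secondary_pred_enable_flag = 0
    return affine_enable_flag, secondary_pred_enable_flag
```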
Further, if the quadratic prediction method proposed in the present application is only used for the affine mode, the sequence header definition as shown in the following table 8 can also be added to the text:
TABLE 8
[Table 8 is reproduced as images (Figures PCTCN2021106081-APPB-000006 and PCTCN2021106081-APPB-000007) in the original publication.]
Further, in AVS3, the size of the sub-blocks of affine prediction can be controlled at the picture level to be either 4x4 or 8x8; the text describing the inter picture header definition is shown in Table 9 below:
TABLE 9
[Table 9 is reproduced as an image (Figure PCTCN2021106081-APPB-000008) in the original publication.]
The affine prediction sub-block size flag affine_subblock_size_flag is a binary variable. A value of '0' indicates that the minimum size of the affine prediction sub-blocks of the current picture is 4x4; a value of '1' indicates that the minimum size is 8x8. The value of AffineSubblockSizeFlag is equal to the value of affine_subblock_size_flag. If affine_subblock_size_flag is not present in the bitstream, the value of AffineSubblockSizeFlag is equal to 0.
When decoding the current block, whether it uses the affine mode needs to be parsed. In AVS3, if the current block uses the affine mode, it is also necessary to determine whether the 4-parameter (2 control points) or the 6-parameter (3 control points) mode is used, the prediction reference mode, and the motion information of the respective control points. The prediction reference modes include 'Pred_List0', which refers only to reference frame list 0, 'Pred_List1', which refers only to reference frame list 1, and 'Pred_List01', which refers to both reference frame list 0 and reference frame list 1. Whether the 4-parameter or the 6-parameter mode is used, the prediction reference mode, and the motion information of the control points are determined differently in the different modes (direct mode, skip mode, normal mode).
The affine mode flag affine_flag is a binary variable. A value of '1' indicates that the current block uses the affine mode; a value of '0' indicates that it does not. The value of AffineFlag is equal to the value of affine_flag. If affine_flag is not present in the bitstream, the value of AffineFlag is 0.
The affine motion vector index cu_affine_cand_idx is the affine mode index value in skip mode or direct mode. The value of AffineCandIdx is equal to the value of cu_affine_cand_idx. If cu_affine_cand_idx is not present in the bitstream, the value of AffineCandIdx is equal to 0.
The affine adaptive motion vector resolution index affine_amvr_index is used to determine the affine motion vector precision of the coding unit. The value of AffineAmvrIndex is equal to the value of affine_amvr_index. If affine_amvr_index is not present in the bitstream, the value of AffineAmvrIndex is equal to 0.
The affine inter mode L0 motion vector horizontal component difference absolute value mv_diff_x_abs_l0_affine and the affine inter mode L0 motion vector vertical component difference absolute value mv_diff_y_abs_l0_affine are the absolute values of the motion vector difference relative to reference picture list 0. The value of MvDiffXAbsL0Affine is equal to the value of mv_diff_x_abs_l0_affine, and the value of MvDiffYAbsL0Affine is equal to the value of mv_diff_y_abs_l0_affine.
The affine inter mode L0 motion vector horizontal component difference sign value mv_diff_x_sign_l0_affine and the affine inter mode L0 motion vector vertical component difference sign value mv_diff_y_sign_l0_affine are the sign bits of the motion vector difference relative to reference picture list 0. The value of MvDiffXSignL0Affine is equal to the value of mv_diff_x_sign_l0_affine, and the value of MvDiffYSignL0Affine is equal to the value of mv_diff_y_sign_l0_affine. If mv_diff_x_sign_l0_affine or mv_diff_y_sign_l0_affine is not present in the bitstream, the value of MvDiffXSignL0Affine or MvDiffYSignL0Affine is 0. If the value of MvDiffXSignL0Affine is 0, MvDiffXL0Affine is equal to MvDiffXAbsL0Affine; if the value of MvDiffXSignL0Affine is 1, MvDiffXL0Affine is equal to -MvDiffXAbsL0Affine. If the value of MvDiffYSignL0Affine is 0, MvDiffYL0Affine is equal to MvDiffYAbsL0Affine; if the value of MvDiffYSignL0Affine is 1, MvDiffYL0Affine is equal to -MvDiffYAbsL0Affine. The value ranges of MvDiffXL0Affine and MvDiffYL0Affine are -32768 to 32767.
The affine inter mode L1 motion vector horizontal component difference absolute value mv_diff_x_abs_l1_affine and the affine inter mode L1 motion vector vertical component difference absolute value mv_diff_y_abs_l1_affine are the absolute values of the motion vector difference relative to reference picture list 1. The value of MvDiffXAbsL1Affine is equal to the value of mv_diff_x_abs_l1_affine, and the value of MvDiffYAbsL1Affine is equal to the value of mv_diff_y_abs_l1_affine.
The affine inter mode L1 motion vector horizontal component difference sign value mv_diff_x_sign_l1_affine and the affine inter mode L1 motion vector vertical component difference sign value mv_diff_y_sign_l1_affine are the sign bits of the motion vector difference relative to reference picture list 1. The value of MvDiffXSignL1Affine is equal to the value of mv_diff_x_sign_l1_affine, and the value of MvDiffYSignL1Affine is equal to the value of mv_diff_y_sign_l1_affine. If mv_diff_x_sign_l1_affine or mv_diff_y_sign_l1_affine is not present in the bitstream, the value of MvDiffXSignL1Affine or MvDiffYSignL1Affine is 0. If the value of MvDiffXSignL1Affine is 0, MvDiffXL1Affine is equal to MvDiffXAbsL1Affine; if the value of MvDiffXSignL1Affine is 1, MvDiffXL1Affine is equal to -MvDiffXAbsL1Affine. If the value of MvDiffYSignL1Affine is 0, MvDiffYL1Affine is equal to MvDiffYAbsL1Affine; if the value of MvDiffYSignL1Affine is 1, MvDiffYL1Affine is equal to -MvDiffYAbsL1Affine. The value ranges of MvDiffXL1Affine and MvDiffYL1Affine are -32768 to 32767.
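The absolute values and sign bits above combine into signed motion vector difference components in the usual way; a small sketch with illustrative names:

```python
def mvd_component(abs_value, sign_value):
    # Combine an absolute value and a sign bit (1 means negative) into a
    # signed component such as MvDiffXL0Affine; the result must lie in the
    # range [-32768, 32767].
    value = -abs_value if sign_value == 1 else abs_value
    assert -32768 <= value <= 32767
    return value

# e.g. MvDiffXL0Affine = mvd_component(MvDiffXAbsL0Affine, MvDiffXSignL0Affine)
```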
Further, in an embodiment of the present application, if the current block uses the affine mode, and the control point mode of the affine mode, the prediction reference mode, and the sub-block size parameter have been determined, the motion vector of each sub-block can be derived using the affine model.
If the prediction reference mode of the current prediction unit (here, the prediction unit equals the coding block) is 'Pred_List0', mvs_L0 (the affine control point motion vector group corresponding to reference frame list 0) is used as the affine control point motion vector group, and the L0 motion vector set of the current prediction unit (denoted MvArrayL0) is obtained by the derivation method of the affine motion unit sub-block motion vector array. MvArrayL0 consists of the L0 motion vectors of all luma prediction sub-blocks.
If the prediction reference mode of the current prediction unit (here, the prediction unit equals the coding block) is 'Pred_List1', mvs_L1 (the affine control point motion vector group corresponding to reference frame list 1) is used as the affine control point motion vector group, and the L1 motion vector set of the current prediction unit (denoted MvArrayL1) is obtained by the derivation method of the affine motion unit sub-block motion vector array. MvArrayL1 consists of the L1 motion vectors of all luma prediction sub-blocks.
If the prediction reference mode of the current prediction unit (here, the prediction unit equals the coding block) is 'Pred_List01', mvs_L0 and mvs_L1 are used as the affine control point motion vector groups, respectively, and the L0 motion vector set (denoted MvArrayL0) and the L1 motion vector set (denoted MvArrayL1) of the current prediction unit are obtained by the derivation method of the affine motion unit sub-block motion vector array. MvArrayL0 and MvArrayL1 consist of the L0 motion vectors and the L1 motion vectors, respectively, of all luma prediction sub-blocks.
Next, the matrix of chroma motion vectors may be further obtained, and the luma prediction sample matrix and the chroma prediction sample matrix of the affine mode may be obtained through affine luma sample interpolation and affine chroma sample interpolation.
If the current sequence or the current frame does not use the present technique: if the prediction reference mode of the current prediction unit (here, the prediction unit equals the coding block) is 'Pred_List0' or 'Pred_List1', the obtained luma prediction sample matrix and chroma prediction sample matrix are the prediction sample matrices of the current prediction unit; if the prediction reference mode is 'Pred_List01', the 2 obtained luma prediction sample matrices are averaged to obtain the luma prediction sample matrix of the current prediction unit, and, for each chroma component, the 2 obtained groups of chroma prediction sample matrices (2 per group, 4 in total) are averaged to obtain the chroma prediction sample matrices of the current prediction unit.
If the current sequence or the current frame uses the present technique, the pixel motion vector deviation matrices inside the affine motion unit sub-blocks are derived:
if the prediction reference mode of the current prediction unit (here, the prediction unit is equal to the encoding block) is 'Pred _ List 0', mvs _ L0 (the affine control point motion vector group corresponding to the reference frame List 0) is taken as the affine control point motion vector group, and the 4-seed intra block intra motion vector bias matrix motion vector set dMvA _ L0, dMvB _ L0, dMvC _ L0, dMvN _ L0 of L0 of the current prediction unit is obtained by a method of derivation of the affine motion unit sub-block intra pixel motion vector bias matrix. The derived dMvA, dMvB, dMvC, dMvN through the affine motion unit sub-block intra pixel motion vector bias matrix are then taken as dMvA _ L0, dMvB _ L0, dMvC _ L0, dMvN _ L0, respectively.
If the prediction reference mode of the current prediction unit (here, the prediction unit is equal to the coding block) is 'Pred _ List 1', mvs _ L1 (the affine control point motion vector group corresponding to reference frame List 1) is taken as the affine control point motion vector group, and the 4-seed intra block intra motion vector bias matrix motion vector set dMvA _ L1, dMvB _ L1, dMvC _ L1, dMvN _ L1 of L1 of the current prediction unit is obtained by the method of derivation of the affine motion unit sub-block intra pixel motion vector bias matrix. The derived dMvA, dMvB, dMvC, dMvN through the affine motion unit sub-block intra pixel motion vector bias matrix are then taken as dMvA _ L1, dMvB _ L1, dMvC _ L1, dMvN _ L1, respectively.
If the prediction reference mode of the current prediction unit (here, the prediction unit is equal to the coding block) is 'Pred _ List 01', mvs _ L0 and mvs _ L1 are taken as affine control point motion vector groups, respectively, and the 4-seed intra block intra motion vector bias matrix motion vector set dMvA _ L0, dMvB _ L0, dMvC _ L0, dMvN _ L0 and L1 of the current prediction unit are obtained by a method of derivation of the pixel motion vector bias matrix inside the affine motion unit sub-block.
Further, in the present application, the derivation of the motion vector deviation matrix of the pixels inside the sub-block of affine motion unit may be performed according to the following method:
If there are 3 motion vectors in the affine control point motion vector group, the motion vector group is denoted mvsAffine(mv0, mv1, mv2); otherwise (there are 2 motion vectors in the affine control point motion vector group), the motion vector group is denoted mvsAffine(mv0, mv1).
1. Calculating variables dHorX, dVerX, dHorY and dVerY:
dHorX=(mv1_x-mv0_x)<<(7-Log(width))
dHorY=(mv1_y-mv0_y)<<(7-Log(width))
if there are 3 motion vectors in mvsAffine:
dVerX=(mv2_x-mv0_x)<<(7-Log(height))
dVerY=(mv2_y-mv0_y)<<(7-Log(height))
otherwise (2 motion vectors in mvaffine):
dVerX=-dHorY
dVerY=dHorX。
As in fig. 7, it is assumed that the width and height of the current prediction unit are width and height, respectively, and the width and height of each sub-block are subwidth and subheight, respectively. The sub-block containing the top-left sample of the luma prediction block of the current prediction unit is A, the sub-block containing the top-right sample is B, the sub-block containing the bottom-left sample is C, and the sub-blocks at other positions are other sub-blocks.
2. With (i, j) being the coordinate of a pixel inside the luma sub-block, i ranging from 0 to (subwidth-1) and j ranging from 0 to (subheight-1), the motion vector deviation of each pixel position (i, j) inside the 4 kinds of luma prediction sub-blocks is calculated:
2.1, if the current sub-block is A, the motion vector deviation dMvA[i][j] of the (i, j) pixel is:
dMvA[i][j][0]=dHorX×i+dVerX×j
dMvA[i][j][1]=dHorY×i+dVerY×j;
2.2, if the current sub-block is B, the motion vector deviation dMvB[i][j] of the (i, j) pixel is:
dMvB[i][j][0]=dHorX×(i-subwidth)+dVerX×j
dMvB[i][j][1]=dHorY×(i-subwidth)+dVerY×j;
2.3, if the current sub-block is C and there are 3 motion vectors in mvsAffine, the motion vector deviation dMvC[i][j] of the (i, j) pixel is:
dMvC[i][j][0]=dHorX×i+dVerX×(j-subheight)
dMvC[i][j][1]=dHorY×i+dVerY×(j-subheight);
2.4, otherwise, the motion vector deviation dMvN[i][j] of the (i, j) pixel is:
dMvN[i][j][0]=dHorX×(i-(subwidth>>1))+dVerX×(j-(subheight>>1))
dMvN[i][j][1]=dHorY×(i-(subwidth>>1))+dVerY×(j-(subheight>>1)).
where dMvX[i][j][0] represents the value of the horizontal component and dMvX[i][j][1] represents the value of the vertical component, with X being A, B, C, or N.
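Putting steps 1 and 2 together, the following Python sketch derives the four per-pixel deviation matrices; it assumes Log(·) in the text denotes the base-2 logarithm and omits the rounding and clipping details of the standard (names are illustrative).

```python
from math import log2

def pixel_mv_deviations(mvs_affine, width, height, subwidth, subheight):
    # mvs_affine: list of 2 or 3 control-point motion vectors (mv_x, mv_y).
    mv0, mv1 = mvs_affine[0], mvs_affine[1]
    d_hor_x = (mv1[0] - mv0[0]) << (7 - int(log2(width)))
    d_hor_y = (mv1[1] - mv0[1]) << (7 - int(log2(width)))
    if len(mvs_affine) == 3:                  # 6-parameter, 3 control points
        mv2 = mvs_affine[2]
        d_ver_x = (mv2[0] - mv0[0]) << (7 - int(log2(height)))
        d_ver_y = (mv2[1] - mv0[1]) << (7 - int(log2(height)))
    else:                                     # 4-parameter, 2 control points
        d_ver_x, d_ver_y = -d_hor_y, d_hor_x

    def matrix(off_i, off_j):
        # dMv[i][j] = (horizontal component, vertical component).
        return [[(d_hor_x * (i - off_i) + d_ver_x * (j - off_j),
                  d_hor_y * (i - off_i) + d_ver_y * (j - off_j))
                 for j in range(subheight)] for i in range(subwidth)]

    d_mv_a = matrix(0, 0)                     # case 2.1, sub-block A
    d_mv_b = matrix(subwidth, 0)              # case 2.2, sub-block B
    d_mv_c = (matrix(0, subheight)            # case 2.3, sub-block C,
              if len(mvs_affine) == 3 else None)  # 3 control points only
    d_mv_n = matrix(subwidth >> 1, subheight >> 1)  # case 2.4, others
    return d_mv_a, d_mv_b, d_mv_c, d_mv_n
```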
Further, in the present application, after the motion vector deviation matrices of the pixels inside the affine motion unit sub-blocks are derived, for each pixel position, filtering is performed with the two-dimensional filter, according to the motion vector deviation, based on the first predicted value of the motion compensation of the sub-block, to obtain a new second predicted value. In this embodiment, the inter prediction technique is applied to the luma component; however, it may also be applied to the chroma components, or to any component of other formats.
Specifically, in the embodiment of the present application, when performing secondary prediction using the 9 adjacent pixel positions constituting a rectangle, the two-dimensional filter may first determine the prediction sample matrix of the current block and the motion vector deviation matrices of the sub-blocks of the current block, where each motion vector deviation matrix comprises the motion vector deviations corresponding to all pixel positions; then, based on the 9 adjacent pixel positions forming the rectangle, the secondary prediction sample matrix of the current block is determined using the prediction sample matrix and the motion vector deviation matrices.
Illustratively, in the present application, the process of the two-dimensional filter performing quadratic prediction using 9 adjacent pixel positions constituting a rectangle is as follows:
if the prediction reference mode of the current prediction unit (here, the prediction unit is equal to the coding block) is 'Pred _ List 0', the luma prediction sample matrix Pred matrix xl0 is predmatrix sb, and predmatrix s is obtained as a new luma prediction sample matrix predmatrix l0 by a prediction matrix derivation method of affine quadratic prediction.
If the prediction reference mode of the current prediction unit (here, the prediction unit is equal to the coding block) is 'Pred _ List 1', the luma prediction sample matrix predmatrix l1 is predmatrix sb, and predmatrix s is obtained as a new luma prediction sample matrix predmatrix l1 by a prediction matrix derivation method of affine quadratic prediction.
If the prediction reference mode of the current prediction unit (here, the prediction unit is equal to the coding block) is 'Pred _ List 01', the luma prediction sample matrices predmatrix 0 and predmatrix l1 are respectively used as predmatrix sb, and predmatrix s is obtained as new luma prediction sample matrices predmatrix l0 and predmatrix l1 by a method of prediction matrix derivation by affine quadratic prediction.
Further, in the present application, the prediction matrix derivation for affine quadratic prediction can be performed according to the following method:
As in fig. 7, it is assumed that the width and height of the current block are width and height, respectively, and the width and height of each sub-block are subwidth and subheight, respectively. The sub-block containing the top-left sample of the luma prediction sample matrix of the current block is A, the sub-block containing the top-right sample is B, the sub-block containing the bottom-left sample is C, and the sub-blocks at other positions are other sub-blocks.
For each sub-block in the current block, the motion vector deviation matrix of the sub-block is dMv. The following operations are performed:
1. if the sub-block is A, dMv equals dMvA;
2. if the sub-block is B, dMv equals dMvB;
3. if the sub-block is C and there are 3 motion vectors in mvsAffine, dMv equals dMvC;
4. otherwise, dMv equals dMvN.
With (x, y) being the coordinate of the top-left corner of the sub-block, (i, j) the coordinate of a pixel inside the luma sub-block, i ranging from 0 to (subwidth-1) and j ranging from 0 to (subheight-1), the sub-block-based prediction sample matrix being PredMatrixSb and the prediction sample matrix of the secondary prediction being PredMatrixS, the secondary prediction sample PredMatrixS[x+i][y+j] of (x+i, y+j) is calculated as follows:
PredMatrixS[x+i][y+j]=
(UPLEFT(x+i,y+j)×(-dMv[i][j][0]-dMv[i][j][1])+
UP(x+i,y+j)×((-dMv[i][j][1])<<3)+
UPRIGHT(x+i,y+j)×(dMv[i][j][0]-dMv[i][j][1])+
LEFT(x+i,y+j)×((-dMv[i][j][0])<<3)+
CENTER(x+i,y+j)×(1<<15)+
RIGHT(x+i,y+j)×(dMv[i][j][0]<<3)+
DOWNLEFT(x+i,y+j)×(-dMv[i][j][0]+dMv[i][j][1])+
DOWN(x+i,y+j)×(dMv[i][j][1]<<3)+
DOWNRIGHT(x+i,y+j)×(dMv[i][j][0]+dMv[i][j][1])+
(1<<10))>>11
PredMatrixS[x+i][y+j]=Clip3(0,(1<<BitDepth)-1,PredMatrixS[x+i][y+j])。
wherein predMatrix denotes the sub-block-based prediction sample matrix PredMatrixSb, and:
UPLEFT(x+i,y+j)=predMatrix[x+i-1<0?0:x+i-1][y+j-1<0?0:y+j-1]
UP(x+i,y+j)=predMatrix[x+i][y+j-1<0?0:y+j-1]
UPRIGHT(x+i,y+j)=predMatrix[x+i+1>width-1?width-1:x+i+1][y+j-1<0?0:y+j-1]
LEFT(x+i,y+j)=predMatrix[x+i-1<0?0:x+i-1][y+j]
CENTER(x+i,y+j)=predMatrix[x+i][y+j]
RIGHT(x+i,y+j)=predMatrix[x+i+1>width-1?width-1:x+i+1][y+j]
DOWNLEFT(x+i,y+j)=predMatrix[x+i-1<0?0:x+i-1][y+j+1>height-1?height-1:y+j+1]
DOWN(x+i,y+j)=predMatrix[x+i][y+j+1>height-1?height-1:y+j+1]
DOWNRIGHT(x+i,y+j)=predMatrix[x+i+1>width-1?width-1:x+i+1][y+j+1>height-1?height-1:y+j+1].
It is understood that, in the embodiment of the present application, the CENTER(x+i, y+j) pixel position may be the central position among the 9 adjacent pixel positions constituting the rectangle, and the secondary prediction process may then be performed based on the (x+i, y+j) pixel position and the 8 pixel positions adjacent to it, namely UPLEFT (upper left), UP (up), UPRIGHT (upper right), LEFT (left), RIGHT (right), DOWNLEFT (down left), DOWN (down), and DOWNRIGHT (down right).
It should be noted that, in the embodiment of the present application, when filtering each pixel position in the current block, the values of the corresponding pixel positions used by the two-dimensional filter are the predicted values of the above sub-block-based prediction, and the filtered result is the new predicted value obtained by the present technique. Depending on the shape of the two-dimensional filter (e.g., rectangle or rhombus), the pixel positions that the filter needs to use may exceed the boundary of the current block. Fig. 23 is a schematic diagram of the filtering of a boundary pixel position; as shown in fig. 23, taking the 9-point rectangular filter as an example, if a pixel position to be filtered is located on the boundary of the current block, some of the pixel positions used by the filter at this point will exceed the boundary of the current block.
In view of the above problem, the present application proposes a solution: the boundary of the sub-block-based prediction of the current block is expanded so that the pixel positions required by the two-dimensional filter do not exceed the boundary of the expanded block. That is, if a pixel position to be used by the two-dimensional filter does not belong to the current block, the current block may be expanded at its boundaries. Fig. 24 is a schematic diagram of boundary expansion; as shown in fig. 24, taking the 9-point rectangular filter as an example, the sub-block-based prediction of the current block needs to be expanded at the upper, lower, left, and right boundaries, respectively. With a simple expansion method, a pixel expanded on the left copies the pixel value of the left boundary in the same row, a pixel expanded on the right copies the pixel value of the right boundary in the same row, a pixel expanded on the top copies the pixel value of the upper boundary in the same column, a pixel expanded on the bottom copies the pixel value of the lower boundary in the same column, and the four expanded corner pixels copy the pixel values of the corresponding vertices.
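A minimal sketch of this one-sample border expansion follows (the row-major layout and the function name are illustrative):

```python
def expand_by_one(pred):
    # Expand the sub-block-based prediction by one sample on every side so
    # that a 9-point rectangular filter never reads outside the expanded
    # block; edge samples are copied outward and corners copy the vertices.
    h, w = len(pred), len(pred[0])
    return [[pred[max(0, min(h - 1, r - 1))][max(0, min(w - 1, c - 1))]
             for c in range(w + 2)]
            for r in range(h + 2)]
```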
It is understood that this method of ensuring, through expansion, that the pixel positions required by the two-dimensional filter do not exceed the boundary of the expanded block can also be applied to two-dimensional filters of other shapes, which is not specifically limited in the present application.
In view of the above problem, the present application also provides another solution: if the pixel position corresponding to a certain tap of the two-dimensional filter exceeds the boundary of the current block, that pixel position is adjusted to a position within the current block. That is, if a pixel position to be used by the two-dimensional filter does not belong to the current block, a pixel position within the current block may be substituted for it. For example, if the pixel position at the top-left corner of the current block is (0, 0), the width of the current block is width, and its height is height, then the range of the current block is 0 to (width-1) in the horizontal direction and 0 to (height-1) in the vertical direction. If the pixel position that the filter needs to use is (x, y): if x is less than 0, x is set to 0; if x is greater than width-1, x is set to width-1; if y is less than 0, y is set to 0; and if y is greater than height-1, y is set to height-1.
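The position adjustment described above amounts to a simple clamp; for example:

```python
def clamp_position(x, y, width, height):
    # Replace a filter-tap position outside the current block with the
    # nearest position inside it, as described above.
    x = 0 if x < 0 else (width - 1 if x > width - 1 else x)
    y = 0 if y < 0 else (height - 1 if y > height - 1 else y)
    return x, y
```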
It is understood that this method of ensuring that the pixel positions required by the two-dimensional filter stay within the current block, by replacing a pixel position outside the current block with one inside it, may also be applied to two-dimensional filters of other shapes, which is not specifically limited in the present application.
The embodiment of the application provides an inter-frame prediction method, which can perform point-based secondary prediction on a pixel position of a motion vector deviated from a motion vector of a subblock on the basis of a first predicted value based on the subblock after prediction based on the subblock to obtain a second predicted value. The interframe prediction method provided by the application can be well suitable for all scenes, the coding performance is greatly improved, and the coding and decoding efficiency is improved.
The embodiment of the application provides an inter prediction method applied to a video encoding device, that is, an encoder. The functions implemented by the method may be implemented by a second processor in the encoder calling a computer program; the computer program may be stored in a second memory. It can be seen that the encoder comprises at least the second processor and the second memory.
Fig. 25 is a schematic flowchart of an implementation of the inter prediction method. As shown in fig. 25, the method for the encoder to perform inter prediction may include the following steps:
step 401, determining a prediction mode parameter of the current block.
In an embodiment of the present application, an encoder may first determine a prediction mode parameter of a current block. Specifically, the encoder may first determine a prediction mode used by the current block and then determine corresponding prediction mode parameters based on the prediction mode. Wherein the prediction mode parameter may be used to determine a prediction mode used by the current block.
It should be noted that, in the embodiment of the present application, an image to be encoded may be divided into a plurality of image blocks, the image block to be encoded currently may be referred to as a current block, and an image block adjacent to the current block may be referred to as an adjacent block; i.e. in the image to be encoded, the current block has a neighboring relationship with the neighboring block. Here, each current block may include a first image component, a second image component, and a third image component; that is, the current block is an image block to be subjected to prediction of a first image component, a second image component or a third image component in the image to be coded.
Wherein, assuming that the current block performs the first image component prediction, and the first image component is a luminance component, that is, the image component to be predicted is a luminance component, then the current block may also be called a luminance block; alternatively, assuming that the current block performs the second image component prediction, and the second image component is a chroma component, that is, the image component to be predicted is a chroma component, the current block may also be referred to as a chroma block.
It should be noted that, in the embodiment of the present application, the prediction mode parameter indicates the prediction mode adopted by the current block and a parameter related to the prediction mode. Here, for the determination of the prediction mode parameter, a simple decision strategy may be adopted, such as determining according to the magnitude of the distortion value; a complex decision strategy, such as determination based on the result of Rate Distortion Optimization (RDO), may also be adopted, and the embodiment of the present application is not limited in any way. Generally, the prediction mode parameter of the current block may be determined in an RDO manner.
Specifically, in some embodiments, when determining the prediction mode parameter of the current block, the encoder may perform pre-coding processing on the current block by using multiple prediction modes to obtain a rate-distortion cost value corresponding to each prediction mode; and then selecting the minimum rate distortion cost value from the obtained multiple rate distortion cost values, and determining the prediction mode parameters of the current block according to the prediction mode corresponding to the minimum rate distortion cost value.
That is, on the encoder side, the current block may be pre-encoded in a plurality of prediction modes. Here, the plurality of prediction modes generally include an inter prediction mode, conventional intra prediction modes, and non-conventional intra prediction modes; the conventional intra prediction modes may include the Direct-Current (DC) mode, the PLANAR mode, angular modes, and the like; the non-conventional intra prediction modes may include the Matrix-based Intra Prediction (MIP) mode, the Cross-component Linear Model prediction (CCLM) mode, the Intra Block Copy (IBC) mode, the palette (PLT) mode, and the like; and the inter prediction modes may include the ordinary inter prediction mode, the GPM mode, the AWP mode, and the like.
Therefore, after the current block is pre-encoded with the plurality of prediction modes, the rate-distortion cost value corresponding to each prediction mode can be obtained; the minimum rate-distortion cost value is then selected from the obtained values, and the prediction mode corresponding to it determines the prediction mode parameter of the current block. Alternatively, after the current block is pre-encoded with the plurality of prediction modes, the distortion value corresponding to each prediction mode can be obtained; the minimum distortion value is then selected, the prediction mode corresponding to it is determined as the prediction mode used by the current block, and the corresponding prediction mode parameter is set accordingly. In this way, the determined prediction mode parameter is finally used for encoding the current block, so that the prediction residual in this prediction mode can be smaller and the coding efficiency can be improved.
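A sketch of this mode decision loop is given below; precode_cost is a hypothetical helper standing in for the encoder's pre-coding pass that returns the rate-distortion cost of one candidate mode.

```python
def select_prediction_mode(block, candidate_modes, precode_cost):
    # Pre-code the current block with every candidate prediction mode and
    # keep the mode whose rate-distortion cost is minimal.
    best_mode, best_cost = None, float("inf")
    for mode in candidate_modes:
        cost = precode_cost(block, mode)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode
```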
That is to say, on the encoding side, the encoder may select an optimal prediction mode to perform pre-encoding on the current block, and in this process, the prediction mode of the current block may be determined, and then a prediction mode parameter for indicating the prediction mode is determined, so that the corresponding prediction mode parameter is written into the code stream and transmitted to the decoder by the encoder.
Correspondingly, on the decoder side, the decoder can directly acquire the prediction mode parameters of the current block by analyzing the code stream, and determines the prediction mode used by the current block and the related parameters corresponding to the prediction mode according to the prediction mode parameters acquired by analyzing.
Step 402, determining a first motion vector of a sub-block of a current block when a prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode; wherein the current block includes a plurality of sub-blocks.
In an embodiment of the present application, if the prediction mode parameter indicates that the current block determines an inter prediction value of the current block using an inter prediction mode, the encoder may first determine a first motion vector of each sub-block of the current block. Wherein a sub-block corresponds to a first motion vector.
It should be noted that, in the embodiment of the present application, the current block is the image block to be encoded in the current frame; the current frame is encoded sequentially, image block by image block, in a certain order, and the current block is the next image block to be encoded in that order. The current block may have various sizes, such as 16x16, 32x32, or 32x16, where the numbers represent the numbers of rows and columns of pixels in the current block.
Further, in the embodiment of the present application, the current block may be divided into a plurality of sub-blocks, where the size of each sub-block is the same, and the sub-blocks are a set of pixels with a smaller specification. The size of the sub-blocks may be 8 × 8 or 4 × 4.
For example, in the present application, the size of the current block is 16 × 16, and the current block may be divided into 4 sub-blocks each having a size of 8 × 8.
It can be understood that, in the embodiment of the present application, in the case that the encoder determines that the prediction mode parameter indicates that the inter prediction value of the current block is determined using the inter prediction mode, the inter prediction method provided by the embodiment of the present application may be continuously employed.
Further, in an embodiment of the present application, when the prediction mode parameter indicates that the inter prediction value of the current block is determined using the inter prediction mode, the encoder may, in determining the first motion vector of a sub-block of the current block, determine the affine mode parameter and the prediction reference mode of the current block. When the affine mode parameter indicates that the affine mode is used, the control point mode and the sub-block size parameter are determined. Finally, the first motion vector may be determined according to the prediction reference mode, the control point mode, and the sub-block size parameter.
In an embodiment of the present application, after the encoder determines the prediction mode parameter, the encoder may determine the affine mode parameter and the prediction reference mode if the prediction mode parameter indicates that the current block determines the inter prediction value of the current block using the inter prediction mode.
It should be noted that, in the embodiment of the present application, an affine mode parameter is used to indicate whether to use an affine mode. Specifically, the affine mode parameter may be the affine motion compensation enable flag affine_enable_flag, and the encoder may further determine whether to use the affine mode by determining the value of the affine mode parameter.
That is, in the present application, the affine pattern parameter may be a binary variable. If the value of the affine mode parameter is 1, indicating to use the affine mode; if the value of the affine mode parameter is 0, indicating that the affine mode is not used.
For example, in the present application, the value of the affine mode parameter may be equal to the value of the affine motion compensation enable flag affine_enable_flag; if the value of affine_enable_flag is '1', it indicates that affine motion compensation may be used; if the value of affine_enable_flag is '0', it indicates that affine motion compensation should not be used.
Further, in embodiments of the present application, if the affine mode parameters determined by the encoder indicate that the affine mode is used, the encoder may proceed to obtain the control point mode and the sub-block size parameters.
In the embodiment of the present application, the control point mode is used to determine the number of control points. In the affine model, one sub-block may have 2 control points or 3 control points, and accordingly, the control point pattern may be a control point pattern corresponding to 2 control points or a control point pattern corresponding to 3 control points. I.e. the control point mode may comprise a 4-parameter mode and a 6-parameter mode.
It can be understood that, in the embodiment of the present application, for the AVS3 standard, if the current block uses the affine mode, the encoder needs to determine the number of control points in the affine mode of the current block, so as to determine whether to use the 4-parameter (2 control points) mode or the 6-parameter (3 control points) mode.
Further, in embodiments of the present application, if the affine mode parameters determined by the encoder indicate that affine mode is used, the encoder may further determine sub-block size parameters.
In particular, the sub-block size parameter may be characterized by the affine prediction sub-block size flag affine_subblock_size_flag, and the encoder may indicate the sub-block size parameter, that is, the size of the sub-blocks of the current block, by setting the value of this flag. The size of the sub-blocks may be 8x8 or 4x4. Specifically, in the present application, the sub-block size flag may be a binary variable: if its value is 1, the sub-block size parameter is 8x8; if its value is 0, the sub-block size parameter is 4x4.
For example, in the present application, the value of the sub-block size flag may be equal to the value of the affine prediction sub-block size flag affine_subblock_size_flag; if the value of affine_subblock_size_flag is '1', the current block is divided into sub-blocks of size 8x8; if the value of affine_subblock_size_flag is '0', the current block is divided into sub-blocks of size 4x4.
Further, in the embodiment of the present application, after determining the control point mode and the sub-block size parameter, the encoder may further determine the first motion vector of the sub-block in the current block according to the prediction reference mode, the control point mode, and the sub-block size parameter.
Specifically, in the embodiment of the present application, the encoder may first determine the control point motion vector group according to the prediction reference mode; a first motion vector for the sub-block may then be determined based on the set of control point motion vectors, the control point pattern, and the sub-block size parameter.
It will be appreciated that in embodiments of the present application, a set of control point motion vectors may be used to determine the motion vectors for the control points.
It should be noted that, in the embodiment of the present application, the encoder may traverse each sub-block in the current block according to the above method, and determine the first motion vector of each sub-block by using the control point motion vector group, the control point mode, and the sub-block size parameter of each sub-block, so that the motion vector set may be constructed and obtained according to the first motion vector of each sub-block.
It is to be understood that, in the embodiment of the present application, the first motion vector of each sub-block of the current block may be included in the motion vector set of the current block.
Further, in the embodiment of the present application, when determining the first motion vector according to the control point motion vector group, the control point mode, and the sub-block size parameter, the encoder may first determine a difference variable according to the control point motion vector group, the control point mode, and the size parameter of the current block; a sub-block location may then be determined based on the prediction mode parameter and the sub-block size parameter; finally, the first motion vector of the sub-block can be determined by using the difference variable and the position of the sub-block, and then a motion vector set of a plurality of sub-blocks of the current block can be obtained.
It should be noted that, in the present application, when determining the deviation between each position in a sub-block and the motion vector of the sub-block, if the current block uses an affine prediction model, the motion vector of each position in the sub-block can be calculated by the formula of the affine prediction model, and the deviation is obtained by subtracting the motion vector of the sub-block. If the motion vectors of all sub-blocks are taken at the same position inside the sub-block, e.g., the position (2, 2) from the top-left corner for a 4x4 block and the position (4, 4) from the top-left corner for an 8x8 block, then, according to the affine models used in current standards including VVC and AVS3, the motion vector deviation at the same position of every sub-block is the same. However, in AVS3, the sub-blocks at the top-left corner, the top-right corner, and, in the case of 3 control points, the bottom-left corner (the A, B, and C positions shown in fig. 7 of the AVS3 text above) take their motion vectors at positions different from those used by the other blocks; correspondingly, the motion vector deviations calculated for these sub-blocks differ from those of the other blocks. The details are as shown in the embodiment.
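For intuition, the sketch below takes the sub-block motion vector at such a representative position using the difference variables dHorX/dHorY/dVerX/dVerY introduced in the derivation above; the 2^7 scale factor of those variables and the omission of the standard's rounding are assumptions of this sketch.

```python
def subblock_mv(mv0, xc, yc, d_hor_x, d_hor_y, d_ver_x, d_ver_y):
    # Motion vector of a sub-block taken at its representative sample
    # position (xc, yc) inside the current block, i.e. the sub-block's
    # top-left corner plus (2, 2) for a 4x4 sub-block or (4, 4) for an
    # 8x8 one. The difference variables carry a 2^7 scale factor (see
    # their derivation); the rounding used by the standard is omitted.
    mv_x = mv0[0] + ((d_hor_x * xc + d_ver_x * yc) >> 7)
    mv_y = mv0[1] + ((d_hor_y * xc + d_ver_y * yc) >> 7)
    return mv_x, mv_y
```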
Step 403, determining a first predicted value of the sub-block and a motion vector deviation between the pixel position and the sub-block based on the first motion vector; wherein, the pixel position is the position of the pixel point in the sub-block.
In an embodiment of the present application, the encoder may determine the first prediction value of the sub-block and the motion vector offset between the pixel position and the sub-block, respectively, based on the first motion vector of the sub-block after determining the first motion vector of each sub-block of the current block.
It is understood that, in the embodiment of the present application, step 403 may specifically include:
step 403a, determining a first predictor of the sub-block based on the first motion vector.
Step 403b, determining a motion vector deviation between the pixel position and the sub-block based on the first motion vector.
In this application, the order in which the encoder performs steps 403a and 403b is not limited by the inter prediction method provided in this embodiment, that is, in this application, after determining the first motion vector of each sub-block of the current block, the encoder may perform step 403a first and then step 403b, or may perform step 403b first and then step 403a, or may perform step 403a and step 403b simultaneously.
Further, in an embodiment of the present application, the encoder may first determine a sample matrix when determining the first predictor of the sub-block based on the first motion vector; wherein the sample matrix comprises a luma sample matrix and a chroma sample matrix; the first predictor may then be determined according to the prediction reference mode, the sub-block size parameter, the sample matrix, and the set of motion vectors.
It should be noted that, in the embodiment of the present application, when determining the first predictor according to the prediction reference mode, the sub-block size parameter, the sample matrix, and the motion vector set, the encoder may first determine a target motion vector from the motion vector set according to the prediction reference mode and the sub-block size parameter; a prediction sample matrix can then be determined by using the reference image queue and the reference index corresponding to the prediction reference mode, the sample matrix, and the target motion vector; wherein the prediction sample matrix comprises the first prediction values of a plurality of sub-blocks.
Specifically, in an embodiment of the present application, the sample matrix may include a luma sample matrix and a chroma sample matrix, and accordingly, the prediction sample matrix determined by the encoder may include a luma prediction sample matrix and a chroma prediction sample matrix, wherein the luma prediction sample matrix includes a first luma predictor of the plurality of sub-blocks, the chroma prediction sample matrix includes a first chroma predictor of the plurality of sub-blocks, and the first luma predictor and the first chroma predictor constitute the first predictor of the sub-blocks.
It should be noted that, in the embodiment of the application, the luminance sample matrix in the sample matrix may be a 1/16-precision luminance sample matrix, and the chrominance sample matrix in the sample matrix may be a 1/32-precision chrominance sample matrix.
It is understood that in the embodiments of the present application, the reference image queues and reference indexes obtained by the encoder are not the same for different prediction reference modes.
Further, in the embodiment of the present application, when the encoder determines the sample matrix, a luminance interpolation filter coefficient and a chrominance interpolation filter coefficient may be obtained first; the luma sample matrix may then be determined based on the luma interpolation filter coefficients, while the chroma sample matrix may be determined based on the chroma interpolation filter coefficients.
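To make the role of the interpolation filters concrete, the sketch below builds a luma prediction block by separable interpolation at 1/16-sample precision. For brevity it uses a 2-tap bilinear kernel; the standard's actual 8-tap luma (and 4-tap chroma, at 1/32 precision) filter coefficients would be substituted in practice, and the function name and layout are assumptions.

```cpp
#include <vector>

// Sketch: separable fractional-position interpolation for one luma block.
// The caller guarantees that the reference area starting at (x0, y0) is
// valid, including one extra sample to the right and below.
std::vector<int> interpolateLuma(const std::vector<int>& ref, int refStride,
                                 int x0, int y0,        // integer-pel origin
                                 int fracX, int fracY,  // 0..15, 1/16-pel
                                 int w, int h) {
    std::vector<int> tmp((h + 1) * w), out(h * w);
    // Horizontal pass: blend horizontally adjacent samples by fracX/16.
    for (int y = 0; y <= h; ++y)
        for (int x = 0; x < w; ++x) {
            const int* p = &ref[(y0 + y) * refStride + (x0 + x)];
            tmp[y * w + x] = ((16 - fracX) * p[0] + fracX * p[1] + 8) >> 4;
        }
    // Vertical pass: blend vertically adjacent filtered rows by fracY/16.
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            out[y * w + x] = ((16 - fracY) * tmp[y * w + x]
                              + fracY * tmp[(y + 1) * w + x] + 8) >> 4;
    return out;
}
```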
Further, in embodiments of the present application, when determining the motion vector deviation between the pixel position and the sub-block, the encoder may first determine a secondary prediction parameter; if the secondary prediction parameter indicates that secondary prediction is used, the encoder may determine the motion vector deviation between the sub-block and each pixel position based on the difference variable.
It is to be understood that, in the embodiment of the present application, after determining the motion vector deviations between the sub-block and each pixel position based on the difference variable, the encoder may construct the motion vector deviation matrix corresponding to the sub-block from the motion vector deviations of all pixel positions in the sub-block. That is, the motion vector deviation matrix includes the motion vector deviation between the sub-block and each of its internal pixel positions.
Further, in the embodiment of the present application, if the secondary prediction parameter determined by the encoder indicates that secondary prediction is not used, the encoder may directly use the first prediction value of the sub-block of the current block obtained in step 403a above as the second prediction value of the sub-block, without performing the processes of steps 404 and 405 below.
In particular, in embodiments of the present application, if the secondary prediction parameter indicates that secondary prediction is not used, the encoder may determine the second prediction value using the prediction sample matrix. Since the prediction sample matrix includes the first prediction values of the plurality of sub-blocks, the encoder may determine the first prediction value of the sub-block where the pixel position is located as the second prediction value of that pixel position.
For example, in the present application, if the prediction reference mode of the current block takes the value 0 or 1, i.e., the first reference mode 'PRED_List0' or the second reference mode 'PRED_List1' is used, the first predictor of the sub-block where the pixel position is located may be selected directly from the prediction sample matrix, which includes 1 luma prediction sample matrix (and 2 chroma prediction sample matrices), and determined as the inter predictor of the pixel position, i.e., the second predictor.
For example, in this application, if the prediction reference mode of the current block takes the value 2, i.e., the third reference mode 'PRED_List01' is used, an averaging operation may be performed on the 2 luma prediction sample matrices (and the 2 groups of chroma prediction sample matrices, 4 in total) included in the prediction sample matrix to obtain 1 averaged luma prediction sample matrix (and 2 averaged chroma prediction sample matrices); finally, the first predictor of the sub-block where the pixel position is located is selected from the averaged luma prediction sample matrix (or the averaged chroma prediction sample matrices) and determined as the inter predictor of the pixel position, i.e., the second predictor.
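A minimal sketch of the averaging in the 'PRED_List01' case follows; the same routine would be applied to each chroma matrix pair, and round-to-nearest is assumed.

```cpp
// Sketch: average two per-list prediction sample matrices element by element.
void averagePrediction(const int* pred0, const int* pred1, int* out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = (pred0[i] + pred1[i] + 1) >> 1;  // (a + b) / 2, rounded
}
```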
Step 404, determining a filter coefficient of the two-dimensional filter according to the motion vector deviation; the two-dimensional filter is used for carrying out quadratic prediction processing according to a preset shape.
In an embodiment of the present application, after determining the first prediction value of the sub-block and the motion vector deviation between the pixel position and the sub-block based on the first motion vector, respectively, the encoder may further determine the filter coefficient of the two-dimensional filter according to the motion vector deviation.
It should be noted that, in the embodiment of the present application, the filter coefficients of the two-dimensional filter are related to the motion vector deviation corresponding to the pixel position. That is, if the motion vector deviations corresponding to different pixel positions differ, the filter coefficients of the two-dimensional filter used also differ.
Further, in the embodiment of the present application, when determining the filter coefficients of the two-dimensional filter according to the motion vector deviation, the encoder may first determine a scale parameter, and then determine the filter coefficients corresponding to the pixel position according to the scale parameter and the motion vector deviation.
It should be noted that, in the embodiment of the present application, the scale parameter may include at least one scale value, and the motion vector deviation includes a horizontal deviation and a vertical deviation; wherein each scale value is a non-zero real number.
Specifically, in the present application, when the two-dimensional filter performs secondary prediction using 9 adjacent pixel positions that form a rectangle, the pixel position located at the center of the rectangle is the position to be predicted, and the other 8 pixel positions are sequentially the upper-left, upper, upper-right, left, right, lower-left, lower, and lower-right neighbors of the position to be predicted.
Accordingly, in the present application, the encoder may calculate and obtain the 9 filter coefficients corresponding to the 9 adjacent pixel positions according to a preset calculation rule, based on the at least one scale value and the motion vector deviation of the position to be predicted.
It should be noted that, in the present application, the preset calculation rule may include a plurality of different calculation manners, such as addition, subtraction, and multiplication. For different pixel positions, different calculation manners may be used to calculate the filter coefficients.
It is understood that, in the present application, when the plurality of filter coefficients corresponding to the plurality of pixel positions are calculated according to the different calculation manners of the preset calculation rule, some of the filter coefficients may be linear functions of the motion vector deviation (a linear relationship between the two), while others may be quadratic or higher-order functions of the motion vector deviation (a nonlinear relationship between the two).
That is, in the present application, any one of the plurality of filter coefficients corresponding to a plurality of adjacent pixel positions may be a linear function, a quadratic function, or a high-order function of the motion vector deviation.
For example, in the present application, it is assumed that the motion vector deviation of a pixel position is (dmv_x, dmv_y); if the coordinates of the target pixel position are (i, j), dmv_x may be represented as dMvX[i][j][0], i.e., the value of the motion vector deviation in the horizontal component, and dmv_y may be represented as dMvX[i][j][1], i.e., the value of the motion vector deviation in the vertical component.
The scale parameters are typically decimals or fractions; in one possible case, the scale values are all reciprocals of powers of 2, such as 1/2, 1/4, 1/8, and so on. Here, dmv_x and dmv_y are expressed at their actual size, i.e., a value of 1 for dmv_x or dmv_y represents a distance of 1 pixel, so dmv_x and dmv_y are themselves typically decimals or fractions.
It should be noted that, in the embodiment of the present application, for the 8-tap filter in common use today, the motion vector offsets between the integer pixel position and the sub-pixel position are non-negative in both the horizontal and vertical directions and lie between 0 and 1 pixel, i.e., dmv_x and dmv_y cannot be negative. In contrast, for the filter in the present application, the motion vector offsets between the integer pixel position and the sub-pixel position may be negative in both the horizontal and vertical directions, i.e., dmv_x and dmv_y may be negative.
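One possible calculation rule consistent with the description above is sketched below: the coefficient of the position to be predicted is set to 1, the four edge neighbors receive coefficients linear in the deviation, and the four corners receive a bilinear (product) term, so that the 9 coefficients sum to 1. The scale values s1 and s2 and the rule itself are illustrative assumptions, not the standard's definition; signed deviations are handled naturally.

```cpp
// Sketch: derive the 9 coefficients of the rectangular two-dimensional filter
// from the motion vector deviation (dmvX, dmvY) of the position to be
// predicted. c[row][col]: row 0 = upper neighbors, row 2 = lower neighbors.
void deriveCoeffs(double dmvX, double dmvY, double c[3][3]) {
    const double s1 = 0.5, s2 = 0.25;  // example scale values (powers of 2)
    c[1][1] = 1.0;                     // position to be predicted
    c[1][0] = -s1 * dmvX;  c[1][2] = s1 * dmvX;  // left / right (linear)
    c[0][1] = -s1 * dmvY;  c[2][1] = s1 * dmvY;  // upper / lower (linear)
    c[0][0] =  s2 * dmvX * dmvY;  c[2][2] =  s2 * dmvX * dmvY;  // corners
    c[0][2] = -s2 * dmvX * dmvY;  c[2][0] = -s2 * dmvX * dmvY;  // (bilinear)
    // The neighbor terms cancel pairwise, so the coefficients sum to exactly
    // 1, which keeps the subsequent normalization straightforward.
}
```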
In the embodiment of the present application, the two-dimensional filter used for performing quadratic prediction is a filter formed by adjacent points forming a preset shape. The adjacent dots constituting the preset shape may include a plurality of dots, for example, 9 dots.
It is understood that in the embodiments of the present application, the predetermined shape may be a symmetrical shape, for example, the predetermined shape may include a rectangle, a diamond shape, or any other symmetrical shape.
Illustratively, in the present application, the two-dimensional filter is a rectangular filter; specifically, it is a filter composed of 9 adjacent pixel positions that form a rectangle. Of the 9 pixel positions, the one located at the center is the pixel position that currently requires quadratic prediction.
And step 405, determining a second predicted value of the sub-block based on the filter coefficient and the first predicted value, and determining the second predicted value as an inter-frame predicted value of the sub-block.
In the embodiment of the application, after the encoder determines the filter coefficient of the two-dimensional filter according to the motion vector deviation, the encoder can determine the second predicted value of the sub-block based on the filter coefficient and the first predicted value, so that secondary prediction of the sub-block by combining the motion vector deviation with the pixel position can be realized, and the correction of the first predicted value can be completed.
It is understood that, in the embodiment of the present application, the encoder determines the filter coefficient by using the motion vector deviation corresponding to the pixel position, so that the first prediction value can be corrected by the two-dimensional filter according to the filter coefficient to obtain the corrected second prediction value of the sub-block. It can be seen that the second predicted value is a corrected value based on the first predicted value.
Further, in the embodiment of the present application, when determining the second predicted value of the sub-block based on the filter coefficients and the first predicted values, the encoder may first multiply the filter coefficients by the first predicted values at the corresponding pixel positions to obtain product results, add the product results to obtain a summation result, and finally normalize the summation result; by traversing all pixel positions in the sub-block in this way, the corrected second predicted value of the sub-block is obtained.
In the embodiment of the present application, before performing the quadratic prediction, the first predicted value of the sub-block where the pixel position is located is generally used as the predicted value before the correction of the pixel position, and therefore, when performing the filtering by the two-dimensional filter, the filter coefficient may be multiplied by the predicted value of the corresponding pixel position, that is, the first predicted value, and the multiplication results corresponding to each pixel position may be accumulated and then normalized.
It is understood that the encoder may perform normalization in a variety of ways in the present application; for example, the filter coefficients may be multiplied by the predicted values of the corresponding pixel positions and the accumulated result right-shifted by (4 + shift1) bits. Alternatively, the accumulated result may first be added to (1 << (3 + shift1)) and then right-shifted by (4 + shift1) bits.
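The sketch below applies the filter at one pixel position and normalizes exactly as just described: the coefficient/sample products are accumulated, the rounding offset (1 << (3 + shift1)) is added, and the sum is right-shifted by (4 + shift1) bits. It assumes integer filter coefficients pre-scaled so that they sum to (1 << (4 + shift1)), and that the 9 neighboring first prediction values are available (boundary positions would use the extension or replacement handling described in the claims).

```cpp
#include <cstdint>

// Sketch: one application of the 3x3 two-dimensional filter followed by the
// normalization described above.
int filterPixel(const int16_t coeff[3][3],    // integer filter coefficients
                const int* pred, int stride,  // first prediction values
                int x, int y, int shift1) {
    int64_t acc = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            acc += static_cast<int64_t>(coeff[dy + 1][dx + 1])
                   * pred[(y + dy) * stride + (x + dx)];
    acc += int64_t(1) << (3 + shift1);             // round to nearest
    return static_cast<int>(acc >> (4 + shift1));  // second prediction value
}
```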
Therefore, in the present application, after obtaining the motion vector deviation corresponding to the pixel position inside the sub-block, for each sub-block and each pixel position in each sub-block, filtering may be performed by using a two-dimensional filter based on the first predicted value of the motion compensation of the sub-block according to the motion vector deviation, so as to complete the secondary prediction of the sub-block, and obtain a new second predicted value.
Further, in the embodiments of the present application, the two-dimensional filter may be understood as performing quadratic prediction using a plurality of adjacent pixel positions constituting the preset shape. The preset shape can be a rectangle, a rhombus or any symmetrical shape.
Specifically, in the embodiment of the present application, when performing secondary prediction using the 9 adjacent pixel positions that form a rectangle, the encoder may first determine the prediction sample matrix of the current block and the motion vector deviation set of the sub-blocks of the current block, where the motion vector deviation set includes the motion vector deviations corresponding to all pixel positions; it may then determine the secondary prediction sample matrix of the current block from the prediction sample matrix and the motion vector deviation set, based on the 9 adjacent pixel positions forming a rectangle.
For example, the magnitude of the motion vector deviation may be limited to a reasonable range: its horizontal and vertical components, whether positive or negative, do not exceed 1 pixel, 1/2 pixel, or 1/4 pixel.
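Limiting the deviation might look like the following sketch; it assumes deviations stored at 1/16-pel precision, so that a bound of 1/2 pixel corresponds to the value 8.

```cpp
// Sketch: clamp a signed motion vector deviation component to +/- maxAbs.
static inline int clampDeviation(int d, int maxAbs) {
    return d < -maxAbs ? -maxAbs : (d > maxAbs ? maxAbs : d);
}
// e.g. dmvX = clampDeviation(dmvX, 8);  // bound of 1/2 pixel at 1/16-pel
```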
It can be understood that, if the prediction reference mode of the current block is 'PRED_List01', the encoder averages the multiple prediction sample matrices of each component to obtain the final prediction sample matrix of that component. For example, a new luma prediction sample matrix is obtained by averaging the 2 luma prediction sample matrices.
Further, in the embodiment of the present application, after the prediction sample matrix of the current block is obtained, if the current block has no transform coefficients, the prediction matrix is used as the coding result of the current block; if the current block has transform coefficients, the transform coefficients may be coded first, a residual matrix obtained through inverse quantization and inverse transform, and the residual matrix added to the prediction matrix to obtain the coding result.
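A minimal sketch of forming the coding result just described follows (names illustrative; clipping to the sample bit depth is omitted):

```cpp
// Sketch: add the residual recovered by inverse quantization and inverse
// transform to the prediction matrix; with no transform coefficients the
// prediction matrix itself is the result.
void reconstruct(const int* pred, const int* residual /* null if absent */,
                 int* recon, int n) {
    for (int i = 0; i < n; ++i)
        recon[i] = pred[i] + (residual ? residual[i] : 0);
}
```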
In summary, with the inter prediction method proposed in steps 401 to 405, after sub-block-based prediction, point-based secondary prediction is performed on top of the sub-block-based prediction for pixel positions whose motion vectors deviate from the motion vector of the sub-block. The point-based secondary prediction uses the information of a plurality of points forming a preset shape, such as a rectangle or a rhombus, and is carried out with a two-dimensional filter, i.e., a filter composed of adjacent points forming the preset shape; the adjacent points may be, for example, 9 points. For a pixel position, the result of the filtering is the new prediction value of that position.
The present embodiment provides an inter prediction method that, after sub-block-based prediction, may perform point-based secondary prediction on top of the sub-block-based first prediction value for pixel positions whose motion vectors deviate from the motion vector of the sub-block, to obtain a second prediction value. This inter prediction method adapts well to a variety of scenarios, considerably improving coding performance and coding and decoding efficiency.
Based on the above embodiments, in yet another embodiment of the present application, Fig. 26 is a schematic structural diagram of a decoder; as shown in Fig. 26, the decoder 300 according to an embodiment of the present application may include a parsing part 301 and a first determining part 302;
the parsing part 301 is configured to parse the code stream to obtain the prediction mode parameter of the current block;
the first determining part 302 configured to determine a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode; wherein the current block comprises a plurality of sub-blocks; and determining a first predictor of the sub-block and a motion vector offset between the pixel location and the sub-block based on the first motion vector; wherein the pixel position is the position of a pixel point in the sub-block; determining a filter coefficient of a two-dimensional filter according to the motion vector deviation; the two-dimensional filter is used for carrying out quadratic prediction processing according to a preset shape; and determining a second prediction value of the sub-block based on the filter coefficient and the first prediction value, and determining the second prediction value as an inter prediction value of the sub-block.
Fig. 27 is a schematic diagram illustrating a composition structure of the decoder; as shown in Fig. 27, the decoder 300 according to the embodiment of the present application may further include a first processor 303, a first memory 304 storing instructions executable by the first processor 303, a first communication interface 305, and a first bus 306 for connecting the first processor 303, the first memory 304, and the first communication interface 305.
Further, in an embodiment of the present application, the first processor 303 is configured to parse a code stream to obtain a prediction mode parameter of a current block;
determining a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode; wherein the current block comprises a plurality of sub-blocks; determining a first predictor of the sub-block and a motion vector offset between the pixel location and the sub-block based on the first motion vector; wherein the pixel position is the position of a pixel point in the sub-block; determining a filter coefficient of a two-dimensional filter according to the motion vector deviation; the two-dimensional filter is used for carrying out quadratic prediction processing according to a preset shape; determining a second prediction value of the sub-block based on the filter coefficient and the first prediction value, the second prediction value being determined as an inter prediction value of the sub-block.
Based on this understanding, the technical solution of this embodiment, in essence, the part contributing to the prior art, or the whole or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of this embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Embodiments of the present application provide a decoder that, after sub-block-based prediction, may perform point-based secondary prediction on top of the sub-block-based first prediction value for pixel positions whose motion vectors deviate from the motion vector of the sub-block, to obtain a second prediction value. This inter prediction method adapts well to a variety of scenarios, considerably improving coding performance and coding and decoding efficiency.
Fig. 28 is a first schematic structural diagram of an encoder; as shown in Fig. 28, the encoder 400 according to an embodiment of the present application may include a second determining part 401;
the second determining part 401 is configured to: determine a prediction mode parameter of the current block; determine a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, wherein the current block comprises a plurality of sub-blocks; determine, based on the first motion vector, a first prediction value of the sub-block and a motion vector deviation between the pixel position and the sub-block, wherein the pixel position is the position of a pixel point in the sub-block; determine a filter coefficient of a two-dimensional filter according to the motion vector deviation, wherein the two-dimensional filter is configured to perform secondary prediction according to a preset shape; and determine a second prediction value of the sub-block based on the filter coefficient and the first prediction value, the second prediction value being determined as the inter prediction value of the sub-block.
Fig. 29 is a schematic diagram of a second composition structure of the encoder; as shown in Fig. 29, the encoder 400 according to the embodiment of the present application may further include a second processor 402, a second memory 403 storing instructions executable by the second processor 402, a second communication interface 404, and a second bus 405 for connecting the second processor 402, the second memory 403, and the second communication interface 404.
Further, in an embodiment of the present application, the second processor 402 is configured to determine a prediction mode parameter of the current block; determining a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode; wherein the current block comprises a plurality of sub-blocks; determining a first predictor of the sub-block and a motion vector offset between the pixel location and the sub-block based on the first motion vector; wherein the pixel position is the position of a pixel point in the sub-block; determining a filter coefficient of a two-dimensional filter according to the motion vector deviation; wherein the two-dimensional filter is configured to use quadratic prediction according to a preset shape; determining a second prediction value of the sub-block based on the filter coefficient and the first prediction value, the second prediction value being determined as an inter prediction value of the sub-block.
Based on this understanding, the technical solution of this embodiment, in essence, the part contributing to the prior art, or the whole or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of this embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Embodiments of the present application provide an encoder that, after sub-block-based prediction, may perform point-based secondary prediction on top of the sub-block-based first prediction value for pixel positions whose motion vectors deviate from the motion vector of the sub-block, to obtain a second prediction value. This inter prediction method adapts well to a variety of scenarios, considerably improving coding performance and coding and decoding efficiency.
Embodiments of the present application provide a computer-readable storage medium on which a program is stored; when the program is executed by a processor, the method described in the above embodiments is implemented.
Specifically, the program instructions corresponding to an inter prediction method in the present embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive; when the program instructions corresponding to the inter prediction method in the storage medium are read and executed by an electronic device, the method includes the following steps:
analyzing the code stream to obtain the prediction mode parameter of the current block;
determining a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode; wherein the current block comprises a plurality of sub-blocks;
determining a first predictor of the sub-block and a motion vector offset between the pixel location and the sub-block based on the first motion vector; wherein the pixel position is the position of a pixel point in the sub-block;
determining a filter coefficient of a two-dimensional filter according to the motion vector deviation; the two-dimensional filter is used for carrying out quadratic prediction processing according to a preset shape;
determining a second prediction value of the sub-block based on the filter coefficient and the first prediction value, the second prediction value being determined as an inter prediction value of the sub-block.
Specifically, the program instructions corresponding to an inter prediction method in the present embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive; when the program instructions corresponding to the inter prediction method in the storage medium are read and executed by an electronic device, the method includes the following steps:
determining a prediction mode parameter of a current block;
determining a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode; wherein the current block comprises a plurality of sub-blocks;
determining a first predictor of the sub-block and a motion vector offset between the pixel location and the sub-block based on the first motion vector; wherein the pixel position is the position of a pixel point in the sub-block;
determining a filter coefficient of a two-dimensional filter according to the motion vector deviation; wherein the two-dimensional filter is configured to use quadratic prediction according to a preset shape;
determining a second prediction value of the sub-block based on the filter coefficient and the first prediction value, the second prediction value being determined as an inter prediction value of the sub-block.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of implementations of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks in the flowchart and/or block diagram block or blocks.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Industrial applicability
According to the inter-frame prediction method, the encoder, the decoder, and the computer storage medium provided by the embodiments of the present application, the decoder parses the code stream to obtain the prediction mode parameter of the current block; determines a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that the inter prediction value of the current block is determined using the inter prediction mode, wherein the current block comprises a plurality of sub-blocks; determines, based on the first motion vector, a first prediction value of the sub-block and a motion vector deviation between the pixel position and the sub-block, wherein the pixel position is the position of a pixel point in the sub-block; determines the filter coefficients of the two-dimensional filter according to the motion vector deviation, wherein the two-dimensional filter is used for performing secondary prediction processing according to a preset shape; and determines a second prediction value of the sub-block based on the filter coefficients and the first prediction value, the second prediction value being determined as the inter prediction value of the sub-block. The encoder determines the prediction mode parameter of the current block and then performs the same subsequent steps. That is, with the inter prediction method proposed in the present application, after sub-block-based prediction, point-based secondary prediction may be performed on top of the sub-block-based first prediction value for pixel positions whose motion vectors deviate from the motion vector of the sub-block, to obtain a second prediction value. The inter prediction method adapts well to a variety of scenarios, considerably improving coding performance and coding and decoding efficiency.

Claims (61)

  1. An inter-prediction method applied to a decoder, the method comprising:
    analyzing the code stream to obtain the prediction mode parameter of the current block;
    determining a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode; wherein the current block comprises a plurality of sub-blocks;
    determining a first predictor of the sub-block and a motion vector offset between the pixel location and the sub-block based on the first motion vector;
    determining a filter coefficient of a two-dimensional filter according to the motion vector deviation; the two-dimensional filter is used for carrying out quadratic prediction processing according to a preset shape;
    determining a second prediction value of the sub-block based on the filter coefficient and the first prediction value, the second prediction value being determined as an inter prediction value of the sub-block.
  2. The method of claim 1, wherein the determining a first motion vector for a sub-block of the current block comprises:
    analyzing the code stream to obtain affine mode parameters and a prediction reference mode of the current block;
    determining a control point mode and a sub-block size parameter when the affine mode parameter indicates that an affine mode is used;
    and determining the first motion vector according to the prediction reference mode, the control point mode and the sub-block size parameter.
  3. The method of claim 2, wherein the determining the first motion vector according to the prediction reference mode, the control point mode, and the sub-block size parameter comprises:
    determining a control point motion vector group according to the prediction reference mode;
    and determining the first motion vector according to the control point motion vector group, the control point mode and the sub-block size parameter.
  4. The method of claim 3, wherein the method further comprises:
    and traversing each sub-block of the current block, and constructing a motion vector set according to the first motion vector of each sub-block.
  5. The method of claim 4, wherein said determining the first motion vector from the set of control point motion vectors, the control point mode, and the sub-block size parameter comprises:
    determining a difference variable according to the control point motion vector group, the control point mode and the size parameter of the current block;
    determining a subblock position based on the prediction mode parameter and the subblock size parameter;
    determining the first motion vector of the sub-block using the difference variable and the sub-block position.
  6. The method of claim 4, wherein the determining a first predictor of the sub-block based on the first motion vector and a motion vector offset between the pixel location and the sub-block comprises:
    determining a sample matrix; wherein the sample matrix comprises a luma sample matrix and a chroma sample matrix;
    determining the first prediction value according to the prediction reference mode, the sub-block size parameter, the sample matrix, and the motion vector set.
  7. The method of claim 6, wherein the determining a sample matrix comprises:
    obtaining a brightness interpolation filter coefficient and a chroma interpolation filter coefficient;
    determining the luma sample matrix based on the luma interpolation filter coefficients and determining the chroma sample matrix based on the chroma interpolation filter coefficients.
  8. The method of claim 6, wherein the determining the first prediction value according to the prediction reference mode, the sub-block size parameter, the sample matrix, and the set of motion vectors comprises:
    determining a target motion vector from the motion vector set according to the prediction reference mode and the sub-block size parameter;
    determining a prediction sample matrix by using a reference image queue and a reference index corresponding to the prediction reference mode, the sample matrix and the target motion vector; wherein the prediction sample matrix comprises the first prediction values of a plurality of sub-blocks.
  9. The method of claim 2, wherein the determining a first predictor of the sub-block based on the first motion vector and a motion vector offset between the pixel location and the sub-block comprises:
    analyzing the code stream to obtain a secondary prediction parameter;
    determining the motion vector deviation between the sub-block and each pixel position based on a difference variable when the secondary prediction parameter indicates that secondary prediction is used; wherein the difference variable is determined according to a control point motion vector group, the control point mode, and a size parameter of the current block; and the control point motion vector group is determined according to the prediction reference mode of the current block.
  10. The method of claim 8, wherein the method further comprises:
    analyzing the code stream to obtain a secondary prediction parameter;
    determining the first predictor in the prediction sample matrix as the second predictor when the secondary prediction parameters indicate that secondary prediction is not used.
  11. The method of claim 1, wherein the two-dimensional filter is used for quadratic prediction using a plurality of adjacent pixel locations that constitute the preset shape.
  12. The method of claim 11, wherein the preset shape is a rectangle, a diamond, or any symmetrical shape.
  13. The method of claim 12, wherein the determining the filter coefficient of a two-dimensional filter according to the motion vector deviation comprises:
    acquiring a scale parameter;
    and determining the filter coefficient corresponding to the pixel position according to the scale parameter and the motion vector deviation.
  14. The method of claim 13, wherein the scale parameter comprises at least one scale value, and the motion vector deviation comprises a horizontal deviation and a vertical deviation; wherein the at least one scale value is a non-zero real number.
  15. The method according to claim 14, wherein, when the two-dimensional filter performs secondary prediction using 9 adjacent pixel positions constituting a rectangle, the pixel position located at the center of the rectangle is the position to be predicted, and the other 8 pixel positions are sequentially the upper-left, upper, upper-right, left, right, lower-left, lower, and lower-right neighbors of the position to be predicted.
  16. The method of claim 15, wherein the determining the filter coefficient corresponding to the pixel position according to the scale parameter and the motion vector deviation comprises:
    calculating the 9 filter coefficients corresponding to the 9 adjacent pixel positions according to a preset calculation rule, based on the at least one scale value and the motion vector deviation of the position to be predicted.
  17. The method of claim 16, wherein,
    setting the filter coefficient of the position to be predicted to be 1.
  18. The method of claim 16, wherein the determining the filter coefficient corresponding to the pixel position according to the scale parameter and the motion vector deviation comprises:
    shifting the motion vector deviation to obtain a shifted motion vector deviation;
    and calculating the filter coefficient according to a preset calculation rule based on the at least one scale value and the shifted motion vector deviation of the position to be predicted.
  19. The method according to claim 11, wherein any one of the plurality of filter coefficients corresponding to the plurality of adjacent pixel positions is a linear function, a quadratic function, or a higher-order function of the motion vector deviation.
  20. The method of claim 1, wherein the determining a second predictor of the sub-block based on the filter coefficient and the first predictor comprises:
    multiplying the filter coefficient and the first predicted value to obtain a product result corresponding to the pixel position;
    adding the product results of all pixel positions of the sub-blocks to obtain a summation result;
    and carrying out normalization processing on the summation result to obtain the second predicted value.
  21. The method of claim 12, wherein when the two-dimensional filter makes a quadratic prediction using 9 adjacent pixel positions constituting a rectangle,
    determining a prediction sample matrix of the current block and a motion vector deviation set of the sub-blocks of the current block; wherein the motion vector deviation set comprises the motion vector deviations corresponding to all pixel positions;
    and determining a secondary predicted sample matrix of the current block by using the predicted sample matrix and the motion vector deviation set based on the 9 adjacent pixel positions forming the rectangle.
  22. The method of claim 21, wherein,
    if at least one pixel position of the 9 adjacent pixel positions forming the rectangle does not belong to the current block, performing expansion processing on the current block by using the boundary position of the current block.
  23. The method of claim 21, wherein,
    if at least one pixel position of the 9 adjacent pixel positions forming the rectangle does not belong to the current block, replacing the at least one pixel position by the pixel position in the current block.
  24. The method of claim 2, wherein,
    if the value of the affine mode parameter is 1, indicating to use the affine mode;
    and if the value of the affine mode parameter is 0 or the affine mode parameter is not obtained through analysis, indicating not to use the affine mode.
  25. The method of claim 2, wherein the determining a sub-block size parameter comprises:
    analyzing the code stream to obtain a subblock size mark;
    if the value of the subblock size flag is 1, determining that the subblock size parameter is 8×8;
    and if the value of the subblock size flag is 0 or the subblock size flag is not obtained through analysis, determining that the subblock size parameter is 4×4.
  26. The method of claim 2, wherein the control point modes include a 4 parameter mode and a 6 parameter mode.
  27. The method of claim 8, wherein,
    and analyzing the code stream to obtain the reference image queue and the reference index corresponding to the prediction reference mode.
  28. An inter prediction method applied to an encoder, the method comprising:
    determining a prediction mode parameter of a current block;
    determining a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode; wherein the current block comprises a plurality of sub-blocks;
    determining a first predictor of the sub-block and a motion vector offset between the pixel location and the sub-block based on the first motion vector;
    determining a filter coefficient of a two-dimensional filter according to the motion vector deviation; wherein the two-dimensional filter is configured to use quadratic prediction according to a preset shape;
    determining a second prediction value of the sub-block based on the filter coefficient and the first prediction value, the second prediction value being determined as an inter prediction value of the sub-block.
  29. The method of claim 28, wherein the determining the prediction mode parameter for the current block comprises:
    carrying out pre-coding processing on the current block by utilizing multiple prediction modes to obtain rate distortion cost values corresponding to each prediction mode;
    and selecting the minimum rate distortion cost value from the obtained multiple rate distortion cost values, and determining the prediction mode parameters of the current block according to the prediction mode corresponding to the minimum rate distortion cost value.
  30. The method of claim 28, wherein the determining a first motion vector for a sub-block of the current block comprises:
    determining affine mode parameters and a prediction reference mode for the current block;
    determining a control point mode and a sub-block size parameter when the affine mode parameter indicates that an affine mode is used;
    and determining the first motion vector according to the prediction reference mode, the control point mode and the sub-block size parameter.
  31. The method of claim 30, wherein the determining the first motion vector based on the prediction reference mode, the control point mode, and the sub-block size parameter comprises:
    determining a control point motion vector group according to the prediction reference mode;
    and determining the first motion vector according to the control point motion vector group, the control point mode and the sub-block size parameter.
  32. The method of claim 31, wherein the method further comprises:
    and traversing each sub-block of the current block, and constructing a motion vector set according to the first motion vector of each sub-block.
  33. The method of claim 31, wherein said determining the first motion vector from the set of control point motion vectors, the control point mode, and the sub-block size parameter comprises:
    determining a difference variable according to the control point motion vector group, the control point mode and the size parameter of the current block;
    determining a subblock position based on the prediction mode parameter and the subblock size parameter;
    determining the first motion vector of the sub-block using the difference variable and the sub-block position.
  34. The method of claim 31, wherein the determining a first predictor of the sub-block based on the first motion vector and a motion vector offset between the pixel location and the sub-block comprises:
    determining a sample matrix; wherein the sample matrix comprises a luma sample matrix and a chroma sample matrix;
    determining the first prediction value according to the prediction reference mode, the sub-block size parameter, the sample matrix, and the motion vector set.
  35. The method of claim 34, wherein the determining a sample matrix comprises:
    obtaining a brightness interpolation filter coefficient and a chroma interpolation filter coefficient;
    determining the luma sample matrix based on the luma interpolation filter coefficients and determining the chroma sample matrix based on the chroma interpolation filter coefficients.
  36. The method of claim 34, wherein the determining the first prediction value according to the prediction reference mode, the sub-block size parameter, the sample matrix, and the set of motion vectors comprises:
    determining a target motion vector from the motion vector set according to the prediction reference mode and the sub-block size parameter;
    determining a prediction sample matrix by using a reference image queue and a reference index corresponding to the prediction reference mode, the sample matrix and the target motion vector; wherein the prediction sample matrix comprises the first prediction values of a plurality of sub-blocks.
  37. The method of claim 30, wherein the determining a first predictor of the sub-block based on the first motion vector and a motion vector offset between the pixel location and the sub-block comprises:
    determining a secondary prediction parameter;
    determining the motion vector deviation between the sub-block and each pixel position based on a difference variable when the secondary prediction parameter indicates that secondary prediction is used.
  38. The method of claim 36, wherein the method further comprises:
    determining a secondary prediction parameter;
    determining the first predictor in the prediction sample matrix as the second predictor when the secondary prediction parameters indicate that secondary prediction is not used.
  39. The method of claim 28, wherein the two-dimensional filter is used for quadratic prediction using a plurality of adjacent pixel locations that constitute the preset shape.
  40. The method of claim 39, wherein the preset shape is a rectangle, a diamond, or any symmetrical shape.
  41. The method of claim 40, wherein the determining the filter coefficient of a two-dimensional filter according to the motion vector deviation comprises:
    determining a scale parameter;
    and determining the filter coefficient corresponding to the pixel position according to the scale parameter and the motion vector deviation.
  42. The method of claim 41, wherein the scale parameter comprises at least one scale value, and the motion vector deviation comprises a horizontal deviation and a vertical deviation; wherein the at least one scale value is a non-zero real number.
  43. The method of claim 42, wherein, when the two-dimensional filter performs secondary prediction using 9 adjacent pixel positions constituting a rectangle, the pixel position located at the center of the rectangle is the position to be predicted, and the other 8 pixel positions are sequentially the upper-left, upper, upper-right, left, right, lower-left, lower, and lower-right neighbors of the position to be predicted.
  44. The method of claim 43, wherein the determining the filter coefficient corresponding to the pixel position according to the scale parameter and the motion vector deviation comprises:
    calculating the 9 filter coefficients corresponding to the 9 adjacent pixel positions according to a preset calculation rule, based on the at least one scale value and the motion vector deviation of the position to be predicted.
  45. The method of claim 44, wherein,
    setting the filter coefficient of the position to be predicted to be 1.
  46. The method of claim 43, wherein the determining the filter coefficient corresponding to the pixel position according to the scale parameter and the motion vector deviation comprises:
    shifting the motion vector deviation to obtain a shifted motion vector deviation;
    and calculating the filter coefficient according to a preset calculation rule based on the at least one scale value and the shifted motion vector deviation of the position to be predicted.
  47. The method of claim 39, wherein any one of the plurality of filter coefficients corresponding to the plurality of adjacent pixel locations is a linear, quadratic, or high order function of the motion vector bias.
  48. The method of claim 28, wherein the determining a second predictor of the sub-block based on the filter coefficient and the first predictor comprises:
    multiplying the filter coefficient and the first predicted value to obtain a product result corresponding to the pixel position;
    adding the product results of all pixel positions of the sub-blocks to obtain a summation result;
    and carrying out normalization processing on the summation result to obtain the second predicted value.
  49. The method of claim 40, wherein when the two-dimensional filter makes a second prediction using 9 adjacent pixel locations that make up a rectangle,
    determining a prediction sample matrix of the current block and a motion vector deviation set of the sub-blocks of the current block; wherein the motion vector deviation set comprises the motion vector deviations corresponding to all pixel positions;
    and determining a secondary predicted sample matrix of the current block by using the predicted sample matrix and the motion vector deviation set based on the 9 adjacent pixel positions forming the rectangle.
  50. The method of claim 49, wherein,
    if at least one pixel position of the 9 adjacent pixel positions forming the rectangle does not belong to the current block, performing expansion processing on the current block by using the boundary position of the current block.
  51. The method of claim 49, wherein,
    if at least one pixel position of the 9 adjacent pixel positions forming the rectangle does not belong to the current block, replacing the at least one pixel position by the pixel position in the current block.
  52. The method of claim 30, wherein
    the prediction mode parameter, the affine mode parameter, and the prediction reference mode are written into a code stream.
  53. The method of claim 30, wherein,
    if the sub-block size parameter is 8×8, the sub-block size flag is set to 1 and written into a code stream;
    if the sub-block size parameter is 4×4, the sub-block size flag is set to 0 and written into the code stream.
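The flag mapping of claim 53 reduces to a one-bit syntax element; a minimal sketch follows, with an illustrative function name and tuple encoding of the size parameter (neither is from the patent).

```python
def sub_block_size_flag(sub_block_size):
    # Claim 53: 8x8 -> flag 1, 4x4 -> flag 0; the flag is then
    # written into the code stream by the entropy coder.
    if sub_block_size == (8, 8):
        return 1
    if sub_block_size == (4, 4):
        return 0
    raise ValueError("claim 53 only defines 8x8 and 4x4 sub-blocks")
```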
  54. The method of claim 30, wherein the control point mode comprises a 4-parameter mode and a 6-parameter mode.
  55. The method of claim 37 or 38, wherein
    the secondary prediction parameter is written into a code stream.
  56. The method of claim 41, wherein
    the scale parameter is written into a code stream.
  57. A decoder, comprising a parsing part and a first determining part;
    the parsing part is configured to parse a code stream to obtain a prediction mode parameter of a current block;
    the first determining part is configured to: determine a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, wherein the current block comprises a plurality of sub-blocks; determine a first predicted value of the sub-block and a motion vector deviation between a pixel position and the sub-block based on the first motion vector; determine a filter coefficient of a two-dimensional filter according to the motion vector deviation, wherein the two-dimensional filter is configured to perform secondary prediction processing according to a preset shape; and determine a second predicted value of the sub-block based on the filter coefficient and the first predicted value, and determine the second predicted value as an inter prediction value of the sub-block.
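An end-to-end sketch of the per-sub-block flow in claim 57, reusing the helpers sketched after claims 45, 48, and 51 above. All names are illustrative; dmv_field is an assumed per-pixel (dmv_x, dmv_y) map standing in for the motion vector deviation matrix of claim 49, and the claim-50-style padding variant is chosen arbitrarily.

```python
def inter_predict_sub_block(first_pred, dmv_field, c=1.0):
    h, w = len(first_pred), len(first_pred[0])
    padded = pad_block(first_pred)          # boundary handling, claim 50 variant
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dmv_x, dmv_y = dmv_field[y][x]  # deviation at this pixel position
            coeffs = filter_coefficients(dmv_x, dmv_y, c)
            # +1 offsets compensate for the one-sample padding border.
            out[y][x] = second_predict_at(padded, x + 1, y + 1, coeffs)
    return out                              # second predicted values, claim 48
```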
  58. A decoder, comprising a first processor and a first memory storing instructions executable by the first processor, wherein the instructions, when executed by the first processor, implement the method of any one of claims 1 to 27.
  59. An encoder, comprising a second determining part;
    the second determining part is configured to: determine a prediction mode parameter of a current block; determine a first motion vector of a sub-block of the current block when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, wherein the current block comprises a plurality of sub-blocks; determine a first predicted value of the sub-block and a motion vector deviation between a pixel position and the sub-block based on the first motion vector; determine a filter coefficient of a two-dimensional filter according to the motion vector deviation, wherein the two-dimensional filter is configured to perform secondary prediction according to a preset shape; and determine a second predicted value of the sub-block based on the filter coefficient and the first predicted value, and determine the second predicted value as an inter prediction value of the sub-block.
  60. An encoder, comprising a second processor and a second memory storing instructions executable by the second processor, wherein the instructions, when executed by the second processor, implement the method of any one of claims 28 to 56.
  61. A computer storage medium storing a computer program which, when executed by a first processor, implements the method of any one of claims 1 to 27, or, when executed by a second processor, implements the method of any one of claims 28 to 56.
CN202180005841.XA 2020-07-29 2021-07-13 Inter-frame prediction method, encoder, decoder, and computer storage medium Pending CN114556944A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210644545.0A CN114866783A (en) 2020-07-29 2021-07-13 Inter-frame prediction method, encoder, decoder, and computer storage medium

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN202010746227.6A CN114071157A (en) 2020-07-29 2020-07-29 Inter-frame prediction method, encoder, decoder, and computer storage medium
CN2020107462276 2020-07-29
PCT/CN2021/106081 WO2022022278A1 (en) 2020-07-29 2021-07-13 Inter-frame prediction method, encoder, decoder, and computer storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202210644545.0A Division CN114866783A (en) 2020-07-29 2021-07-13 Inter-frame prediction method, encoder, decoder, and computer storage medium

Publications (1)

Publication Number Publication Date
CN114556944A true CN114556944A (en) 2022-05-27

Family

ID=80037497

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202010746227.6A Withdrawn CN114071157A (en) 2020-07-29 2020-07-29 Inter-frame prediction method, encoder, decoder, and computer storage medium
CN202180005841.XA Pending CN114556944A (en) 2020-07-29 2021-07-13 Inter-frame prediction method, encoder, decoder, and computer storage medium
CN202210644545.0A Pending CN114866783A (en) 2020-07-29 2021-07-13 Inter-frame prediction method, encoder, decoder, and computer storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010746227.6A Withdrawn CN114071157A (en) 2020-07-29 2020-07-29 Inter-frame prediction method, encoder, decoder, and computer storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202210644545.0A Pending CN114866783A (en) 2020-07-29 2021-07-13 Inter-frame prediction method, encoder, decoder, and computer storage medium

Country Status (5)

Country Link
CN (3) CN114071157A (en)
MX (1) MX2023000959A (en)
TW (1) TW202211690A (en)
WO (1) WO2022022278A1 (en)
ZA (1) ZA202213227B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115278255B (en) * 2022-09-23 2022-12-20 山东宝德龙健身器材有限公司 Data storage system for safety management of strength instrument

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111656783B (en) * 2018-01-25 2024-03-08 三星电子株式会社 Method and apparatus for video signal processing using sub-block based motion compensation
WO2019160860A1 (en) * 2018-02-14 2019-08-22 Futurewei Technologies, Inc. Adaptive interpolation filter
WO2020003257A1 (en) * 2018-06-29 2020-01-02 Beijing Bytedance Network Technology Co., Ltd. Boundary filtering for sub-block
CN111050168B (en) * 2019-12-27 2021-07-13 浙江大华技术股份有限公司 Affine prediction method and related device thereof
CN112601081B (en) * 2020-12-04 2022-06-24 浙江大华技术股份有限公司 Adaptive partition multi-prediction method and device

Also Published As

Publication number Publication date
CN114866783A (en) 2022-08-05
WO2022022278A1 (en) 2022-02-03
ZA202213227B (en) 2023-09-27
MX2023000959A (en) 2023-02-22
CN114071157A (en) 2022-02-18
TW202211690A (en) 2022-03-16

Similar Documents

Publication Publication Date Title
US20190246102A1 (en) Method and apparatus for video encoding and video decoding based on neural network
JP7391958B2 (en) Video signal encoding/decoding method and equipment used in the method
JP7009632B2 (en) Video coding method based on conversion and its equipment
CN117528111A (en) Image encoding method, image decoding method, and method for transmitting bit stream
KR20180061069A (en) Method and apparatus for filtering
US11595664B2 (en) Affine model-based image encoding/decoding method and device
WO2010001918A1 (en) Image processing device and method, and program
CN113382234B (en) Video signal encoding/decoding method and apparatus for the same
KR20100036284A (en) Moving image encoding device, moving image encoding method, moving image decoding device and moving image decoding method
CN114586366A (en) Inter-frame prediction method, encoder, decoder, and storage medium
CN116055720A (en) Video signal encoding/decoding method and apparatus therefor
CN115315953A (en) Inter-frame prediction method, encoder, decoder and storage medium
WO2022022278A1 (en) Inter-frame prediction method, encoder, decoder, and computer storage medium
WO2022061680A1 (en) Inter-frame prediction method, encoder, decoder, and computer storage medium
TW202332274A (en) Mip for all channels in the case of 4:4:4-chroma format and of single tree
CN116980596A (en) Intra-frame prediction method, encoder, decoder and storage medium
CN114125466A (en) Inter-frame prediction method, encoder, decoder, and computer storage medium
WO2022037344A1 (en) Inter-frame prediction method, encoder, decoder, and computer storage medium
WO2022077495A1 (en) Inter-frame prediction methods, encoder and decoders and computer storage medium
US20230336780A1 (en) Method, apparatus, and storage medium for encoding/decoding feature map for machine vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination