CN114982228A - Inter-frame prediction method, encoder, decoder, and computer storage medium

Inter-frame prediction method, encoder, decoder, and computer storage medium

Info

Publication number: CN114982228A
Application number: CN202080093968.7A
Authority: CN (China)
Prior art keywords: determining, prediction, pixel point, weight, block
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 谢志煌
Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/109: Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/537: Motion estimation other than block-based

Abstract

Embodiments of the present application disclose an inter-frame prediction method, an encoder, a decoder, and a computer storage medium. The decoder parses a code stream to obtain a prediction mode parameter of a current block; when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, it determines a first prediction value of the pixel points in each sub-block using a first motion vector of each sub-block of the current block, and determines a second prediction value of each pixel point using a second motion vector of each pixel point of the current block; it determines a first weight and a second weight of a pixel point in the current block, where the first weight corresponds to the first prediction value and the second weight corresponds to the second prediction value; it determines a prediction value of the pixel point based on the first weight, the second weight, the first prediction value, and the second prediction value; and it determines a third prediction value of the current block according to the prediction values of the pixel points, where the third prediction value is used to determine a reconstructed value of the current block.

Description

Inter-frame prediction method, encoder, decoder, and computer storage medium
Technical Field
The present application relates to the field of video encoding and decoding technologies, and in particular, to an inter-frame prediction method, an encoder, a decoder, and a computer storage medium.
Background
In the field of video encoding and decoding, to balance performance and cost, affine prediction in Versatile Video Coding (VVC) and in China's Audio Video coding Standard (AVS) is generally implemented on a sub-block basis.
Sub-block-based affine prediction is not accurate enough, and therefore needs to be improved by techniques such as interleaved prediction. Although the new sub-block division pattern introduced by interleaved prediction can improve prediction accuracy to a certain extent, interleaved prediction also brings significant complexity and reduces coding and decoding efficiency.
Disclosure of Invention
The application provides an inter-frame prediction method, an encoder, a decoder and a computer storage medium, which can greatly improve the encoding performance, thereby improving the encoding and decoding efficiency.
The technical solutions of the present application are implemented as follows:
in a first aspect, an embodiment of the present application provides an inter prediction method applied to a decoder, where the method includes:
parsing a code stream to obtain a prediction mode parameter of a current block;
when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, determining a first prediction value of the pixel points in each sub-block of the current block using a first motion vector of each sub-block, and determining a second prediction value of each pixel point using a second motion vector of each pixel point of the current block; wherein the current block comprises one or more sub-blocks; the current block comprises one or more pixel points;
for a pixel point in the current block, determining a first weight and a second weight of the pixel point; wherein the first weight corresponds to the first predicted value and the second weight corresponds to the second predicted value;
determining a predicted value of the pixel point based on the first weight, the second weight, the first predicted value and the second predicted value;
determining a third predicted value of the current block according to the predicted value of the pixel point; wherein the third prediction value is used to determine a reconstructed value of the current block.
In a second aspect, an embodiment of the present application provides an inter-frame prediction method applied to an encoder, where the method includes:
determining a prediction mode parameter of a current block;
when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, determining a first prediction value of the pixel points in each sub-block of the current block using a first motion vector of each sub-block, and determining a second prediction value of each pixel point using a second motion vector of each pixel point of the current block; wherein the current block comprises one or more sub-blocks; the current block comprises one or more pixel points;
determining a first weight and a second weight of a pixel point in the current block; wherein the first weight corresponds to the first predicted value and the second weight corresponds to the second predicted value;
determining a predicted value of the pixel point based on the first weight, the second weight, the first predicted value and the second predicted value;
determining a third prediction value of the current block according to the prediction value of the pixel point; wherein the third prediction value is used to determine a residual of the current block.
In a third aspect, an embodiment of the present application provides a decoder, which includes a parsing part and a first determining part;
the parsing part is configured to parse the code stream to obtain the prediction mode parameter of the current block;
the first determining part is configured to, when the prediction mode parameter indicates that the inter prediction value of the current block is determined using an inter prediction mode, determine a first prediction value of the pixel points in each sub-block of the current block using a first motion vector of each sub-block, and determine a second prediction value of each pixel point using a second motion vector of each pixel point of the current block, wherein the current block comprises one or more sub-blocks and one or more pixel points; determine a first weight and a second weight of a pixel point in the current block, wherein the first weight corresponds to the first prediction value and the second weight corresponds to the second prediction value; determine a prediction value of the pixel point based on the first weight, the second weight, the first prediction value and the second prediction value; and determine a third prediction value of the current block according to the prediction value of the pixel point, wherein the third prediction value is used to determine a reconstructed value of the current block.
In a fourth aspect, embodiments of the present application provide a decoder comprising a first processor and a first memory storing instructions executable by the first processor, where the instructions, when executed, implement the inter prediction method described above.
In a fifth aspect, an embodiment of the present application provides an encoder, which includes a second determination portion;
the second determining part is configured to determine a prediction mode parameter of the current block; when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, determine a first prediction value of the pixel points in each sub-block of the current block using a first motion vector of each sub-block, and determine a second prediction value of each pixel point using a second motion vector of each pixel point of the current block, wherein the current block comprises one or more sub-blocks and one or more pixel points; determine a first weight and a second weight of a pixel point in the current block, wherein the first weight corresponds to the first prediction value and the second weight corresponds to the second prediction value; determine a prediction value of the pixel point based on the first weight, the second weight, the first prediction value and the second prediction value; and determine a third prediction value of the current block according to the prediction value of the pixel point, wherein the third prediction value is used to determine a residual of the current block.
In a sixth aspect, embodiments of the present application provide an encoder comprising a second processor and a second memory storing instructions executable by the second processor, where the instructions, when executed, implement the inter prediction method described above.
In a seventh aspect, an embodiment of the present application provides a computer storage medium, where a computer program is stored, and when the computer program is executed by a first processor, the inter-frame prediction method according to the first aspect is implemented, and when the computer program is executed by a second processor, the inter-frame prediction method according to the second aspect is implemented.
According to the inter-frame prediction method, the encoder, the decoder, and the computer storage medium provided above, the decoder parses the code stream to obtain the prediction mode parameter of the current block; when the prediction mode parameter indicates that the inter prediction value of the current block is determined using the inter prediction mode, it determines a first prediction value of the pixel points in each sub-block using a first motion vector of each sub-block of the current block, and determines a second prediction value of each pixel point using a second motion vector of each pixel point of the current block, where the current block comprises one or more sub-blocks and one or more pixel points; it determines a first weight and a second weight of a pixel point in the current block, the first weight corresponding to the first prediction value and the second weight corresponding to the second prediction value; it determines a prediction value of the pixel point based on the first weight, the second weight, the first prediction value, and the second prediction value; and it determines a third prediction value of the current block according to the prediction values of the pixel points, where the third prediction value is used to determine a reconstructed value of the current block. That is to say, with the inter-frame prediction method provided by the present application, one set of prediction values, the first prediction values, is obtained for the current block using sub-block-based prediction, and another set of prediction values, the second prediction values, is obtained using point-based prediction; after the first weight of the sub-block-based prediction and the second weight of the point-based prediction are determined for the same pixel point in the current block, the first and second prediction values are weighted-averaged using the first and second weights to obtain a new prediction value of the current block. This improves the accuracy of inter-frame prediction while reducing computational complexity and bandwidth, thereby greatly improving coding performance and coding and decoding efficiency.
Drawings
FIG. 1 is a schematic diagram of interpolation of a pixel;
FIG. 2 is a first schematic diagram of sub-block interpolation;
FIG. 3 is a second schematic diagram of sub-block interpolation;
FIG. 4 is a schematic diagram of interleaved prediction;
FIG. 5 is a first schematic diagram of the weights of sub-block-based prediction values;
FIG. 6 is a second schematic diagram of the weights of sub-block-based prediction values;
fig. 7 is a block diagram illustrating a video coding system according to an embodiment of the present application;
fig. 8 is a block diagram illustrating a video decoding system according to an embodiment of the present application;
FIG. 9 is a first flowchart illustrating an implementation of an inter-frame prediction method;
FIG. 10 is a first diagram illustrating first weights;
FIG. 11 is a second diagram illustrating the first weights;
FIG. 12 is a flowchart illustrating a second implementation of the inter-frame prediction method;
FIG. 13 is a schematic illustration of unidirectional prediction;
FIG. 14 is a third flowchart illustrating an implementation of the inter prediction method;
FIG. 15 is a diagram illustrating a first example of bi-directional prediction;
FIG. 16 is a flowchart illustrating a fourth implementation of the inter-frame prediction method;
FIG. 17 is a second diagram illustrating bi-directional prediction;
FIG. 18 is a first schematic diagram of interpolation filtering;
FIG. 19 is a second schematic diagram of interpolation filtering;
FIG. 20 is a third schematic diagram of interpolation filtering;
FIG. 21 is a fourth schematic of interpolation filtering;
FIG. 22 is a fifth schematic of interpolation filtering;
FIG. 23 is a sixth schematic of interpolation filtering;
FIG. 24 is a fifth flowchart illustrating an implementation of the inter-frame prediction method;
FIG. 25 is a first block diagram of a decoder;
FIG. 26 is a block diagram of a decoder;
FIG. 27 is a first block diagram of the encoder;
fig. 28 is a schematic diagram of the second constituent structure of the encoder.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant application and are not limiting of the application. It should be noted that, for the convenience of description, only the parts related to the related applications are shown in the drawings.
Currently, mainstream video coding and decoding standards adopt a block-based hybrid coding framework. Each frame in a video image is divided into square Largest Coding Units (LCUs) of the same size (for example, 128x128 or 64x64), and each Largest Coding Unit may be further divided into rectangular Coding Units (CUs) according to rules; a Coding Unit may in turn be divided into smaller Prediction Units (PUs). Specifically, the hybrid coding framework may include modules such as prediction, transform, quantization, entropy coding, and in-loop filtering; the prediction module may include intra prediction and inter prediction, and inter prediction may include motion estimation and motion compensation. Because strong correlation exists between adjacent pixels within one frame of a video image, intra-frame prediction can be used in video coding and decoding to eliminate spatial redundancy between adjacent pixels; likewise, because strong similarity exists between adjacent frames of a video image, inter-frame prediction can be used to eliminate temporal redundancy between adjacent frames, thereby improving coding efficiency. The following describes the present application in detail in terms of inter prediction.
Inter-frame prediction uses an already coded/decoded frame to predict the part to be coded/decoded in the current frame; in a block-based coding/decoding framework, that part is usually a coding unit or a prediction unit. The coding unit or prediction unit to be coded/decoded is collectively referred to herein as the current block. Translational motion is a common and simple motion mode in video, so prediction of translational motion is also a traditional prediction method in video coding and decoding. Translational motion in a video may be understood as a portion of content moving from a location in one frame to a location in another frame over time. A simple unidirectional prediction of translation can be represented by a Motion Vector (MV) between a certain frame and the current frame. Through motion information containing the reference frame and the motion vector, the current block can find a reference block of the same size in the reference frame, and that reference block is taken as the prediction block of the current block. In ideal translational motion, the content of the current block undergoes no deformation, rotation, or change in luminance or color between frames; however, the content in a video does not always conform to this ideal situation. Bi-directional prediction can solve these problems to some extent. Ordinary bi-directional prediction refers to prediction of bi-directional translation: using motion information containing two reference frames and two motion vectors, two reference blocks of the same size as the current block are found in the two reference frames (which may be the same reference frame), and the two reference blocks are used to generate the prediction block of the current block. The generation method may be averaging, weighted averaging, or some other calculation.
In this application, prediction may be considered part of motion compensation; some documents refer to what this application calls prediction as motion compensation, and to affine prediction as affine motion compensation.
Rotation, zooming, warping, and the like are also common changes in video; however, ordinary translational prediction does not handle such changes well, so affine prediction models have been applied in video codecs, such as the affine modes in VVC and AVS3, whose affine prediction models are similar. Under rotation, enlargement, reduction, warping, or deformation, the current block cannot be considered to use the same MV for all points, and thus an MV needs to be derived for each point. The affine prediction model derives the MV of each point by calculation from a small number of parameters. The affine prediction models of VVC and AVS3 both use 2-control-point (4-parameter) and 3-control-point (6-parameter) models. The 2 control points are at the upper-left and upper-right corners of the current block; the 3 control points are at the upper-left, upper-right, and lower-left corners. Since each MV comprises an x-component and a y-component, 2 control points give 4 parameters and 3 control points give 6 parameters.
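For illustration, the 2-control-point (4-parameter) derivation can be sketched as follows. This is the well-known VVC/AVS3-style formula; the structure and function names are illustrative and are not taken from this application.

```cpp
#include <iostream>

// Illustrative sketch of the 4-parameter affine model: derive the MV at pixel
// offset (px, py) inside the current block from the two control-point MVs.
// mv0 is the top-left control-point MV, mv1 the top-right one, W the block
// width. Names and types are illustrative, not from the application.
struct MV { double x, y; };

MV deriveAffineMv4Param(MV mv0, MV mv1, int W, int px, int py) {
    double a = (mv1.x - mv0.x) / W;  // combined scale/rotation terms
    double b = (mv1.y - mv0.y) / W;
    return { a * px - b * py + mv0.x,
             b * px + a * py + mv0.y };
}

int main() {
    // MV at offset (8, 8) of a 16-wide block, from two example control points.
    MV mv = deriveAffineMv4Param({1.0, 0.5}, {1.5, 0.75}, 16, 8, 8);
    std::cout << "(" << mv.x << ", " << mv.y << ")\n";
    return 0;
}
```

The 3-control-point (6-parameter) model adds a third term derived in the same way from the lower-left control point and the block height.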
According to the affine prediction model, an MV can be derived for each pixel point, and each pixel point can then find its corresponding position in a reference frame; if the position is not an integer pixel position, the value of the sub-pixel point is obtained by interpolation. The interpolation methods used in video coding standards are usually implemented with Finite Impulse Response (FIR) filters, and the cost of implementing them is high. For example, AVS3 uses an 8-tap interpolation filter for the luminance component, with 1/4-pel sub-pixel precision in the normal mode and 1/16-pel sub-pixel precision in the affine mode. For each sub-pixel point at 1/16-pel precision, 8 integer pixels in the horizontal direction and 8 in the vertical direction, that is, 64 integer pixels, are needed for interpolation. Fig. 1 is a schematic diagram of pixel interpolation. As shown in Fig. 1, the circular pixel is the sub-pixel point to be obtained, the dark square pixel is the integer-pixel position corresponding to the sub-pixel, and the vector between the two is the motion vector of the sub-pixel; the light square pixels are the pixels required for interpolating the circular sub-pixel point. To obtain the value of the sub-pixel point, the pixel values of the 8x8 light square region (which also includes the dark pixel point) are required for interpolation.
In conventional translational prediction, the MV of each pixel of the current block is the same. The concept of sub-blocks can be further introduced, with sub-block sizes such as 4x4 or 8x8. Fig. 2 is a first schematic diagram of sub-block interpolation and shows the pixel region required for interpolating a 4x4 block. Fig. 3 is a second schematic diagram of sub-block interpolation and shows the pixel region required for interpolating an 8x8 block.
If the MV of every pixel in a sub-block is the same, the pixels in the sub-block can be interpolated together, sharing the bandwidth, using the same-phase filter, and sharing the intermediate values of the interpolation process. However, if each pixel uses its own MV, the bandwidth increases, filters of different phases may be needed, and the intermediate values of the interpolation process may not be shared.
Point-based affine prediction is costly; therefore, to balance performance and cost, affine prediction in VVC and AVS3 is implemented on a sub-block basis. AVS3 uses two sub-block sizes, 4x4 and 8x8, while VVC uses a 4x4 sub-block size. Each sub-block has one MV, and the pixels inside the sub-block share that MV, so all pixel points in the sub-block are interpolated together. In this way, the motion compensation complexity of sub-block-based affine prediction is similar to that of other sub-block-based prediction methods.
Thus, in the sub-block-based affine prediction method, the pixel points inside a sub-block share the same MV, and the shared MV is determined by taking the MV at the center of the current sub-block. For a sub-block with an even number of pixels in the horizontal and vertical directions, such as 4x4 or 8x8, the center actually falls on a non-integer pixel position. The current standard takes an integer pixel position instead: for a 4x4 sub-block, the pixel point at distance (2, 2) from the upper-left corner is taken; for an 8x8 sub-block, the pixel point at distance (4, 4) from the upper-left corner is taken.
The affine prediction model can derive the MV of each pixel point from the control points (2 or 3 control points) used by the current block. In sub-block-based affine prediction, the MV at the position described in the preceding paragraph is calculated and used as the MV of the sub-block. To derive the motion vector of each sub-block, the motion vector of the center sample of each sub-block may be calculated, rounded to 1/16-pel precision, and then used for motion compensation.
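A minimal sketch of this rounding step follows; the (2, 2) and (4, 4) center offsets are as stated above, while the floating-point MV representation is an assumption made for illustration.

```cpp
#include <cmath>

// Illustrative sketch: round a sub-block's center-sample MV to 1/16-pel
// precision before motion compensation. centerMv is the affine-model MV
// evaluated at the (2, 2) offset of a 4x4 sub-block or the (4, 4) offset of
// an 8x8 sub-block, as described above.
struct MV { double x, y; };

MV roundTo16thPel(MV centerMv) {
    return { std::round(centerMv.x * 16.0) / 16.0,
             std::round(centerMv.y * 16.0) / 16.0 };
}
```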
As the technology develops, the affine prediction may also use bidirectional prediction, which may be specifically understood as using two sets of affine model parameters for the current block, where the two sets of affine model parameters may be respectively from different reference frames and may also be from the same reference frame. And performing affine prediction on the two groups of affine model parameters respectively, and then performing average or weighted average on the two groups of affine prediction results to finally obtain the prediction block of the current block.
Sub-block based affine prediction is not accurate enough and there are several techniques proposed to further improve sub-block based affine prediction, one of which is called interleaved prediction. A sub-block dividing mode is added in the interleaving prediction, and the new sub-block dividing mode is different from the original dividing mode. Fig. 4 is a schematic diagram of interleaving prediction, as shown in fig. 4, Pattern0 is an existing partitioning manner, Pattern1 is a newly added partitioning manner of interleaving prediction, and compared with Pattern0, the partitioning of all sub-blocks in Pattern1 is shifted by half the length of the sub-block in the horizontal direction and the vertical direction respectively. This is a typical partitioning method, and other partitioning methods can be used for Pattern0 and Pattern 1. Like Pattern0, each sub-block of Pattern1 can also calculate an MV using an affine model, and then use the MV for sub-block based prediction. Therefore, two prediction values, one from Pattern0 and the other from Pattern1, can be obtained for each point of the current block, so as to constitute the prediction block P0 and the prediction block P1 corresponding to the current block, and further obtain a new prediction block P corresponding to the current block based on P0 and P1.
In general, the closer a pixel is to the sub-block reference position (the position at which the sub-block MV is determined), the higher the prediction accuracy; the farther from the sub-block reference position, the lower the accuracy. In the current standard, for a 4x4 sub-block the pixel position at distance (2, 2) from the upper-left corner is taken as the sub-block reference position, and for an 8x8 sub-block the position at distance (4, 4) is taken; accordingly, the weight of the prediction value may differ for different pixel positions in the sub-block. Fig. 5 is a first schematic diagram of the weights of sub-block-based prediction values and Fig. 6 is a second; as shown in Figs. 5 and 6, the weights differ at different pixel positions. For example, the prediction values at pixel positions (2, 2), (2, 3), (3, 2), and (3, 3) in a 4x4 sub-block have a large weight with value 3, while the prediction values at the other pixel positions have a small weight with value 1; correspondingly, the prediction values at the 16 pixel positions in the middle region of an 8x8 sub-block have weight 3, while the others have weight 1. Further, the prediction values at each pixel position in P0 and P1 may be weighted-averaged according to their corresponding weights: if the weights of a pixel position in P0 and P1 are equal (both 1, or both 3), the values are averaged 1:1; otherwise they are weighted-averaged according to the corresponding weights, e.g., 3:1 or 1:3.
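The position-dependent weighting just described can be written compactly as below; a minimal sketch assuming 8-bit samples and the 3/1 weight values quoted above, with rounding added for illustration.

```cpp
#include <cstdint>

// Illustrative sketch: weighted average of one pixel's two interleaved
// predictions. p0/p1 come from Pattern0/Pattern1; w0/w1 are the
// position-dependent weights (3 near the sub-block reference position,
// 1 elsewhere, per the description above). 8-bit samples assumed.
uint8_t blendInterleaved(uint8_t p0, uint8_t p1, int w0, int w1) {
    // Equal weights give a 1:1 average; otherwise a 3:1 or 1:3 average.
    return static_cast<uint8_t>((p0 * w0 + p1 * w1 + (w0 + w1) / 2)
                                / (w0 + w1));
}
```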
It will be appreciated that the proposed interleaved prediction method can be seen as an improvement applied to unidirectional prediction or to each direction of bidirectional prediction. If interleaved prediction is applied to bidirectional prediction, there is one interleaved prediction for each unidirectional prediction: the prediction values of the original unidirectional prediction and the interleaved prediction are weighted-averaged to obtain a new unidirectional prediction value, and the two new unidirectional prediction values are then averaged (or weighted-averaged) to obtain the final prediction value.
Because interleaved prediction uses different division patterns, the sub-block reference positions of the two patterns are also interleaved; points with low prediction accuracy in one division pattern have higher accuracy in the other, and the weighted average reduces errors, thereby improving compression performance.
However, the interleaved prediction technique has certain disadvantages. Specifically, interleaved prediction uses 2 different sub-block division patterns, and each division pattern needs its own sub-block-based prediction; as a result, compared with the original sub-block-based prediction method alone, interleaved prediction more than doubles the complexity.
On one hand, the amount of calculation increases: whereas the original prediction method uses a uniform sub-block division, interleaved prediction divides into smaller sub-blocks, and with horizontally and vertically separable filters, the same number of output points, and the same filter taps, smaller sub-blocks mean more calculation. Take 1 sub-block of 4x4 versus 4 sub-blocks of 2x2, all using an 8-tap horizontally and vertically separable filter, as an example. The horizontal direction of one 4x4 sub-block requires interpolating 4x(4+7) points and the vertical direction 4x4 points, 60 points in total. The horizontal direction of one 2x2 sub-block requires interpolating 2x(2+7) points and the vertical direction 2x2 points; for 4 such sub-blocks this is 88 points in total. The amount of calculation is more than doubled.
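The 60-point and 88-point figures follow directly from the geometry of a separable filter, as the sketch below reproduces (T is the tap count; the horizontal-then-vertical order matches the text).

```cpp
// Illustrative sketch: interpolated-point count for one W x H sub-block with
// a T-tap horizontally and vertically separable filter. The horizontal pass
// must produce H + T - 1 rows of W points each before the vertical pass can
// produce the W x H output.
int interpPointCount(int W, int H, int T) {
    int horizontal = W * (H + T - 1);
    int vertical   = W * H;
    return horizontal + vertical;
}
// interpPointCount(4, 4, 8) == 60, while four 2x2 sub-blocks need
// 4 * interpPointCount(2, 2, 8) == 88 points, matching the figures above.
```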
On the other hand, the bandwidth increases: each sub-block of Pattern1 has its own MV, which is not necessarily the same as the MVs of the surrounding Pattern0 sub-blocks, and the Pattern1 sub-blocks are misaligned with the Pattern0 sub-blocks around them. How much the bandwidth increases depends on the specific hardware implementation, the control of the interpolation order between sub-blocks, the limits the algorithm places on the MVs, and so on.
In addition, handling small blocks of 2x2, 4x2, and 2x4 brings extra complexity of its own, and the control of the interpolation order between the divided sub-blocks, the storage of intermediate values, the control of the weighting timing, and the like, are also involved.
In summary, the performance of interleaved prediction comes from the "interleaving", but the "interleaving" also brings significant complexity. Although some simplifications of this technique have been proposed, they remain within the "interleaving" framework and cannot effectively remove the complexity it causes; at present the above problems can only be mitigated, not eliminated.
Therefore, although the new sub-block division pattern introduced by interleaved prediction can improve prediction accuracy to a certain extent, interleaved prediction also brings significant complexity and reduces coding and decoding efficiency.
To address these defects in the prior art, in the embodiments of the present application one set of prediction values, the first prediction values, is obtained for the current block using sub-block-based prediction, and another set, the second prediction values, is obtained using point-based prediction. After a first weight for the sub-block-based prediction and a second weight for the point-based prediction are determined for the same pixel point in the current block, the first and second prediction values are weighted-averaged using the first and second weights to obtain a new prediction value of the current block. This improves the accuracy of inter-frame prediction while reducing computational complexity and bandwidth, thereby greatly improving coding performance and coding and decoding efficiency.
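Per pixel, the proposed combination can be sketched as follows; the weight derivation is left abstract here because the embodiments below describe several ways of obtaining the weights, and the 8-bit samples and rounding are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch of the proposed combination: for every pixel of the
// current block, blend the sub-block-based first prediction value P1 with the
// point-based second prediction value P2 using the pixel's first and second
// weights W1 and W2, producing the third prediction value of the block.
std::vector<uint8_t> thirdPrediction(const std::vector<uint8_t>& P1,
                                     const std::vector<uint8_t>& P2,
                                     const std::vector<int>& W1,
                                     const std::vector<int>& W2) {
    std::vector<uint8_t> P3(P1.size());
    for (std::size_t i = 0; i < P1.size(); ++i) {
        int w = W1[i] + W2[i];
        P3[i] = static_cast<uint8_t>(
            (P1[i] * W1[i] + P2[i] * W2[i] + w / 2) / w);
    }
    return P3;
}
```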
It should be understood that the present application provides a video encoding system. Fig. 7 is a schematic block diagram of the video encoding system provided in an embodiment of the present application. As shown in Fig. 7, the video encoding system 11 may include: a transform unit 111, a quantization unit 112, a mode selection and encoding control logic unit 113, an intra prediction unit 114, an inter prediction unit 115 (including motion compensation and motion estimation), an inverse quantization unit 116, an inverse transform unit 117, a loop filtering unit 118, an encoding unit 119, and a decoded picture buffer unit 110. For an input original video signal, a video reconstruction block can be obtained by Coding Tree Unit (CTU) division, and the coding mode is determined by the mode selection and encoding control logic unit 113; the residual pixel information obtained by intra- or inter-frame prediction is then processed by the transform unit 111 and the quantization unit 112, which transform the residual information from the pixel domain to the transform domain and quantize the resulting transform coefficients to further reduce the bit rate. The intra prediction unit 114 is configured to perform intra prediction on the video reconstruction block and to determine its optimal intra prediction mode (i.e., the target prediction mode). The inter prediction unit 115 is configured to perform inter-prediction encoding of the received video reconstruction block relative to one or more blocks in one or more reference frames to provide temporal prediction information; motion estimation is the process of generating motion vectors that estimate the motion of the video reconstruction block, and motion compensation is then performed based on those motion vectors. After determining the inter prediction mode, the inter prediction unit 115 also supplies the selected inter prediction data and the calculated motion vector data to the encoding unit 119. Furthermore, the inverse quantization unit 116 and the inverse transform unit 117 are used to reconstruct the residual block in the pixel domain; blocking artifacts are removed by the loop filtering unit 118, and the reconstructed residual block is then added to a predictive block in a frame of the decoded picture buffer unit 110 to generate a reconstructed video block. The encoding unit 119 encodes the various coding parameters and the quantized transform coefficients. The decoded picture buffer unit 110 stores reconstructed video blocks for prediction reference; as encoding proceeds, new reconstructed video blocks are continuously generated and stored in the decoded picture buffer unit 110.
Fig. 8 is a schematic block diagram of the video decoding system provided in an embodiment of the present application. As shown in Fig. 8, the video decoding system 12 may include: a decoding unit 121, an inverse transform unit 127, an inverse quantization unit 122, an intra prediction unit 123, a motion compensation unit 124, a loop filtering unit 125, and a decoded picture buffer unit 126. After an input video signal is encoded by the video encoding system 11, the code stream of the video signal is output; the code stream is input into the video decoding system 12 and first passes through the decoding unit 121 to obtain the decoded transform coefficients. The transform coefficients are processed by the inverse transform unit 127 and the inverse quantization unit 122 to produce a residual block in the pixel domain. The intra prediction unit 123 may be used to generate prediction data for the current video decoded block based on the determined intra prediction direction and data from previously decoded blocks of the current frame or picture. The motion compensation unit 124 determines prediction information for the video decoded block by parsing motion vectors and other associated syntax elements and uses that information to generate the predictive block of the block being decoded. A decoded video block is formed by summing the residual block from the inverse transform unit 127 and the inverse quantization unit 122 with the corresponding predictive block generated by the intra prediction unit 123 or the motion compensation unit 124. The decoded video signal passes through the loop filtering unit 125 to remove blocking artifacts, improving video quality; the decoded video blocks are then stored in the decoded picture buffer unit 126, which stores reference pictures for subsequent intra prediction or motion compensation and also for output, yielding the restored original video signal.
The inter-frame prediction method provided in the embodiments of the present application mainly acts on the inter prediction unit 115 of the video encoding system 11 and on the inter prediction unit of the video decoding system 12, i.e., the motion compensation unit 124. That is, if the video encoding system 11 can obtain a better prediction effect through the inter-frame prediction method provided in the embodiments of the present application, the video decoding system 12 can correspondingly improve the quality of video decoding and recovery.
Based on this, the technical solution of the present application is further elaborated below with reference to the drawings and the embodiments. Before detailed explanation, it should be noted that "first", "second", "third", and the like, are mentioned throughout the specification only for distinguishing different features, and do not have functions of defining priority, precedence, size relationships, and the like.
It should be noted that, in the present embodiment, an example is described based on the AVS3 standard, and the inter-frame prediction method proposed in the present application may be applied to other coding standard technologies such as VVC, and the present application is not limited to this.
The embodiment of the application provides an inter-frame prediction method which is applied to video decoding equipment, namely a decoder. The functions performed by the method may be implemented by the first processor in the decoder calling a computer program, which of course may be stored in the first memory, it being understood that the decoder comprises at least the first processor and the first memory.
Further, in an embodiment of the present application, fig. 9 is a first flowchart illustrating an implementation of an inter prediction method, and as shown in fig. 9, the method for a decoder to perform inter prediction may include the following steps:
Step 101, parsing the code stream to obtain the prediction mode parameter of the current block.
In an embodiment of the present application, the decoder may first parse the binary code stream, thereby obtaining the prediction mode parameters of the current block. Wherein the prediction mode parameter may be used to determine a prediction mode used by the current block.
It should be noted that, an image to be decoded may be divided into a plurality of image blocks, and the image block to be decoded currently may be referred to as a current block (which may be represented by a CU), and an image block adjacent to the current block may be referred to as an adjacent block; that is, in the image to be decoded, the current block has a neighboring relationship with the neighboring block. Here, each current block may include a first image component, a second image component, and a third image component, that is, the current block represents an image block to be currently subjected to prediction of the first image component, the second image component, or the third image component in an image to be decoded.
Wherein, assuming that the current block performs the first image component prediction, and the first image component is a luminance component, that is, the image component to be predicted is a luminance component, then the current block may also be called a luminance block; alternatively, assuming that the current block performs the second image component prediction, and the second image component is a chroma component, that is, the image component to be predicted is a chroma component, the current block may also be referred to as a chroma block.
Further, in an embodiment of the present application, the prediction mode parameter may indicate not only a prediction mode adopted by the current block but also a parameter related to the prediction mode.
It is understood that, in the embodiments of the present application, the prediction modes may include inter prediction modes, conventional intra prediction modes, and non-conventional intra prediction modes, etc.
That is to say, on the encoding side, the encoder may select an optimal prediction mode to perform pre-encoding on the current block, and in the process, the prediction mode of the current block may be determined, and then the prediction mode parameter for indicating the prediction mode is determined, so that the corresponding prediction mode parameter is written into the code stream and transmitted to the decoder by the encoder.
Correspondingly, on the decoder side, the decoder can directly acquire the prediction mode parameters of the current block by analyzing the code stream, and determines the prediction mode used by the current block and the related parameters corresponding to the prediction mode according to the prediction mode parameters acquired by analysis.
Further, in an embodiment of the present application, after parsing to obtain the prediction mode parameter, the decoder may determine whether the current block uses an inter prediction mode based on the prediction mode parameter.
Step 102, when the prediction mode parameter indicates that the inter prediction value of the current block is determined using the inter prediction mode, determining a first prediction value of the pixel points in each sub-block using a first motion vector of each sub-block of the current block, and determining a second prediction value of each pixel point using a second motion vector of each pixel point of the current block; wherein the current block comprises one or more sub-blocks; the current block includes one or more pixel points.
In an embodiment of the present application, after the decoder obtains the prediction mode parameter through parsing, if the prediction mode parameter obtained through parsing indicates that the current block determines the inter prediction value of the current block using the inter prediction mode, the decoder may determine the first prediction value of the pixel point in each sub-block using the first motion vector of each sub-block of the current block, and at the same time, the decoder may determine the second prediction value of each pixel point using the second motion vector of each pixel point of the current block.
That is, in the present application, the decoder may use prediction based on sub-blocks for the current block to obtain one set of prediction values, i.e., a first prediction value corresponding to a pixel point in each sub-block, and may also use prediction based on points for the current block to obtain another set of prediction values, i.e., a second prediction value corresponding to each pixel point.
It can be understood that, in the present application, the first prediction value is a prediction value of a pixel point in a subblock obtained based on prediction of the subblock, and the second prediction value is a prediction value of a pixel point in a current block obtained based on prediction of a point. It can be seen that, for the same pixel (pixel position) in the current block, the decoder can obtain the corresponding two prediction values in different ways, i.e., a first prediction value obtained by sub-block-based prediction and a second prediction value obtained by point-based prediction.
It is understood that in the present application, the current block may include one or more sub-blocks; the current block may include one or more pixels (samples), each of which corresponds to a pixel position and a pixel value.
For example, in the embodiment of the present application, when determining the first prediction value of the pixel point in each sub-block, the decoder may determine the first motion vector of each sub-block of the current block, and then determine the first prediction value corresponding to the pixel point in the sub-block based on the first motion vector.
That is, in the present application, when the decoder performs prediction based on sub-blocks, the decoder may divide the current block into a plurality of sub-blocks, determine a motion vector for each sub-block, that is, each sub-block corresponds to a first motion vector, and then perform prediction on the sub-blocks by using the first motion vector of each sub-block, so as to obtain first prediction values corresponding to pixel points in the sub-blocks.
For example, in the embodiment of the present application, when determining the second prediction value of each pixel, the decoder may determine the second motion vector of each pixel of the current block, and then determine the second prediction value corresponding to the pixel based on the second motion vector.
That is to say, in the present application, when the decoder performs point-based prediction, a motion vector may be determined for each pixel point, that is, each pixel point corresponds to a second motion vector; the decoder may then predict each pixel point using its second motion vector to obtain the corresponding second prediction value.
It is understood that, in the embodiments of the present application, both the first motion vector of a sub-block of the current block and the second motion vector of a pixel point may be derived using an affine model. Further, the decoder may also determine the first motion vector and the second motion vector in other ways, such as other sub-block-based prediction techniques, for example decoder-side motion vector refinement (DMVR), bi-directional optical flow (BIO, called BDOF in VVC), sub-block-based temporal motion vector prediction (SbTMVP), motion vector angular prediction (MVAP), and so on. The decoder may also determine the second motion vector of a pixel point in a sub-block based on the first motion vector of the sub-block, e.g., determine the second motion vector of any pixel point in one sub-block based on the first motion vectors of two or more sub-blocks.
Further, in embodiments of the present application, the decoder may use different prediction methods for the sub-block based prediction and the point based prediction. For example, the decoder may use different interpolation methods for sub-block based prediction and point based prediction. Specifically, the decoder may interpolate the values of the sub-pixels using different interpolation filters at the time of the sub-block-based prediction and the point-based prediction.
For example, in the present application, the subblock-based prediction may use a relatively complex filter, and the point-based prediction may use a relatively simple filter.
For example, in the present application, a filter with a larger number of taps may be used for subblock-based prediction, and a filter with a smaller number of taps may be used for point-based prediction.
For example, in the present application, the sub-block-based prediction and the point-based prediction may each use a separable two-dimensional filter or a non-separable two-dimensional filter.
It should be noted that, in the embodiments of the present application, the separable filter may be a two-dimensional filter; specifically, it may be a two-dimensional filter that is separable into a horizontal pass and a vertical pass, composed of a one-dimensional filter in the horizontal direction and a one-dimensional filter in the vertical direction.
That is, in the present application, if a separable filter is used, the separable filter can perform filtering processing in the horizontal direction and the vertical direction, respectively.
It can be understood that, in the present application, as a separable filter that can be separated horizontally and vertically, filtering processing may be performed on two-dimensional pixel points in two directions, specifically, the separable filter may perform filtering in one direction (e.g., in a horizontal direction or a vertical direction) to obtain an intermediate value corresponding to the one direction, and then perform filtering in another direction (e.g., in a vertical direction or a horizontal direction) on the intermediate value to obtain a final filtering result.
It should be noted that in the present application, separable filters have been used in various common encoding and decoding scenarios, such as interpolation filtering for inter-frame block-based prediction, interpolation filtering for inter-frame sub-block-based prediction, interpolation filtering for affine sub-block-based prediction, and the like. Further, the separable filter may also be applied to block or sub-block based motion compensation in HEVC, VVC, AVS3, i.e. the motion compensated predictive interpolation filter may comprise a separable two-dimensional filter. For example, the luminance interpolation filter for affine motion compensated prediction in AVS3 may be a separable two-dimensional filter, where the luminance interpolation filter coefficients are shown in table 1:
Table 1: luminance interpolation filter coefficients for affine motion compensated prediction (provided as an image, PCTCN2020121676-APPB-000001, in the original document)
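For concreteness, a horizontally-then-vertically separable interpolation is sketched below. The 8-tap length matches the AVS3 luminance filter mentioned above, but the code is an illustration only: coefficient values, normalization shifts, and clipping are omitted.

```cpp
#include <vector>

// Illustrative sketch of separable 2D interpolation for a W x H block with
// 8-tap filters: filter each row first, keep the intermediate values, then
// filter the columns of those intermediates (the two-pass scheme described
// above). src holds the (H + 7) x (W + 7) reference pixels needed.
std::vector<std::vector<int>> separableInterp(
        const std::vector<std::vector<int>>& src,
        const int hCoef[8], const int vCoef[8], int W, int H) {
    // Horizontal pass: (H + 7) rows of W intermediate values.
    std::vector<std::vector<int>> mid(H + 7, std::vector<int>(W));
    for (int y = 0; y < H + 7; ++y)
        for (int x = 0; x < W; ++x) {
            int acc = 0;
            for (int t = 0; t < 8; ++t) acc += hCoef[t] * src[y][x + t];
            mid[y][x] = acc;
        }
    // Vertical pass on the shared intermediates: the final W x H prediction.
    std::vector<std::vector<int>> dst(H, std::vector<int>(W));
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            int acc = 0;
            for (int t = 0; t < 8; ++t) acc += vCoef[t] * mid[y + t][x];
            dst[y][x] = acc;  // normalization shift omitted for brevity
        }
    return dst;
}
```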
Further, in an embodiment of the present application, in performing sub-block-based prediction, for a sub-block of a current block, a decoder may determine a first filtering parameter according to a first motion vector; then, based on the first filtering parameter, filtering processing may be performed by using a first filter to obtain a first predicted value.
Specifically, in the present application, the first filter may be any one of the following filters: an n-tap interpolation filter, a separable two-dimensional filter, and a non-separable two-dimensional filter; wherein n is any one of the following values: 8, 6, 5, 4, 3, 2.
It should be noted that, in an embodiment of the present application, the first filtering parameter of the first filter may include a filter coefficient of the first filter, and may further include a filter phase of the first filter, and the present application is not limited in particular.
For example, in the present application, if the first filtering parameter is a filter coefficient of the first filter, the decoder may determine the first scale parameter when determining the first filtering parameter according to the first motion vector; then, a first filtering parameter is determined according to the first scale parameter and the first motion vector.
For example, in the present application, if the first filtering parameter is a filter phase of the first filter, the decoder may first determine a mapping table of the first phase and the motion vector when determining the first filtering parameter according to the first motion vector; and then determining a first filtering parameter according to the mapping table of the first phase and the motion vector and the first motion vector.
Further, in the embodiment of the present application, when performing the point-based prediction, for a pixel point of the current block, the decoder may determine a second filtering parameter according to the second motion vector; then, based on the second filtering parameter, the second filter is used for filtering processing to obtain a second predicted value.
Specifically, in the present application, the second filter may be any one of the following filters: an m-tap interpolation filter, a separable two-dimensional filter, and a non-separable two-dimensional filter; wherein m is any one of the following values: 8, 6, 5, 4, 3, 2.
It is to be understood that in the present application, m may be less than or equal to n, i.e., a filter with a higher number of taps may be used for subblock-based prediction, and a filter with a lower number of taps may be used for point-based prediction.
It should be noted that, in the embodiment of the present application, the second filtering parameter of the second filter may include a filter coefficient of the second filter, and may also include a filter phase of the second filter, and the present application is not limited specifically.
For example, in the present application, if the second filtering parameter is a filter coefficient of the second filter, the decoder may determine the second scale parameter when determining the second filtering parameter according to the second motion vector; and then determining a second filtering parameter according to the second proportion parameter and the second motion vector.
For example, in the present application, if the second filtering parameter is a filter phase of the second filter, the decoder may first determine a mapping table of the second phase and the motion vector when determining the second filtering parameter according to the second motion vector; and then determining a second filtering parameter according to the mapping table of the second phase and the motion vector and the second motion vector.
That is, in the embodiments of the present application, the first filter used for sub-block-based prediction and the second filter used for point-based prediction may take different forms; specifically, the first filter coefficients and the second filter coefficients may be determined in different ways. Sub-block-based prediction uses the form in which the phase is determined from the sub-pixel motion vector and the coefficients are looked up from a table, as shown in Table 1 above. Point-based prediction may use either this table-lookup form or the form in which the coefficients are calculated directly from the sub-pixel motion vector.
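As an illustration of the calculated form, the sketch below computes coefficients directly from the fractional motion vector for a 2-tap bilinear filter; the bilinear filter and the 1/16-pel units are assumptions, since the application leaves the exact point-based filter open.

```cpp
#include <array>

// Illustrative sketch: coefficients of an assumed 2-tap bilinear filter
// computed directly from the sub-pixel motion vector, i.e., as a linear
// function of the MV (one of the forms mentioned in the text). mvFrac16 is
// the fractional MV component in 1/16-pel units (0..15).
std::array<int, 2> bilinearCoeffsFromMv(int mvFrac16) {
    return { 16 - mvFrac16, mvFrac16 };
}
```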
For example, in the embodiment of the present application, when determining the first predicted value of the pixel points in each sub-block of the current block, the decoder may also determine a motion vector deviation between each sub-block and the current block, and then determine the first predicted value of the pixel points in each sub-block based on that motion vector deviation.
It should be noted that, in the present application, the first scale parameter or the second scale parameter may include at least one scale value, where each scale value is a non-zero real number.
Further, in the embodiment of the present application, any filter coefficient obtained by the decoder through calculation based on the motion vector may be a linear, quadratic, or higher-order function (polynomial) of the motion vector; the present application is not particularly limited.
That is, in the present application, among the plurality of filter coefficients corresponding to a plurality of pixel points calculated according to different calculation methods in the preset calculation rule, some filter coefficients may be linear functions (polynomials) of the motion vector, while others may be quadratic or higher-order functions (polynomials) of the first motion vector, i.e., nonlinear in the motion vector.
In some scenarios, the first predicted value of the sub-block may be determined without first calculating the first motion vector. For example, bi-directional optical flow (referred to as BIO in AVS and BDOF in VVC) calculates the motion vector deviation between the sub-block and the current block using the bi-directional optical flow, and then, following the optical-flow principle, obtains a new predicted value from the gradient, the motion vector deviation, and the current predicted value.
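As a loose illustration of this optical-flow correction, the following sketch applies a per-pixel update of the form prediction + gradient × motion vector deviation; the function name and the way the gradients and deviation are supplied are assumptions for illustration, not the normative BIO/BDOF process.

```python
# Illustrative sketch of optical-flow-style refinement (not the normative
# BIO/BDOF derivation): each pixel's prediction is corrected using the
# horizontal/vertical gradients and the motion vector deviation
# (dmv_x, dmv_y) between the sub-block and the current prediction.

def refine_with_optical_flow(pred, grad_x, grad_y, dmv_x, dmv_y):
    """pred, grad_x, grad_y: 2-D lists of equal size (assumed inputs)."""
    h, w = len(pred), len(pred[0])
    refined = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Optical-flow principle: new value = current prediction
            # + gradient * motion vector deviation.
            refined[y][x] = (pred[y][x]
                             + grad_x[y][x] * dmv_x
                             + grad_y[y][x] * dmv_y)
    return refined
```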
It should be noted that, in the present application, in some scenarios, calculations such as the calculation of the motion vector deviation and the correction of the predicted value may be performed on a block (a coding block or a prediction block), for example a 4×4 block, as a whole, and may also be processed according to the sub-block concept of the present application.
It should be noted that, in the embodiment of the present application, the current block is the image block to be decoded in the current frame; the current frame is decoded sequentially in a certain order in the form of image blocks, and the current block is the image block to be decoded next in that order. The current block may have various sizes, such as 16×16, 32×32, or 32×16, where the numbers denote the numbers of rows and columns of pixel points in the current block.
Further, in the embodiment of the present application, the current block may be divided into a plurality of sub-blocks, where each sub-block has the same size and is a set of pixel points of smaller specification. The size of a sub-block may be 8×8 or 4×4.
Illustratively, in the present application, a current block of size 16×16 can be divided into 4 sub-blocks each of size 8×8.
It can be understood that, in the embodiment of the present application, when the decoder parses the code stream and obtains a prediction mode parameter indicating that the inter prediction value of the current block is determined using the inter prediction mode, the inter prediction method provided in the embodiment of the present application may then be applied.
Further, in an embodiment of the present application, the decoder may use one sub-block of the current block as a reference block; then, determining a first predicted value of a pixel point in the reference block by using a first motion vector of the reference block, and determining a second predicted value of each pixel point by using a second motion vector of each pixel point of the reference block; wherein the reference block comprises one or more pixel points.
It should be noted that, in the embodiment of the present application, since the current block includes one or more sub-blocks and, at the same time, one or more pixel points, a pixel point belonging to a sub-block can be predicted based on that sub-block and, at the same time, be predicted based on the point itself. That is, when the decoder performs sub-block-based prediction and point-based prediction, the two predictions may share the same reference block, where the reference block is a block composed of the reference pixels required for interpolation filtering. Specifically, for a sub-block, a reference block may be determined such that both the reference block used for the sub-block-based prediction and the reference blocks used for the point-based prediction of each pixel point in the sub-block fall within the determined reference block. Preferably, the sub-block-based prediction and the point-based prediction of the points in the corresponding sub-block are performed simultaneously, or one immediately after the other, so as to avoid repeatedly reading the reference pixels into the buffer.
It is to be understood that, in the present application, if the point-based prediction and the sub-block-based prediction share the same reference block, the decoder can restrict the MVs of the point-based prediction and thereby avoid increasing the bandwidth. For example, if the sub-block-based prediction uses a horizontally and vertically separable 8-tap filter and the point-based prediction uses a horizontally and vertically separable 4-tap filter, then the MV of the pixel at the upper-left corner may exceed the MV of the sub-block by up to 2 pixels to the left or upward without the reference block needed by that pixel exceeding the reference block of the sub-block; likewise, the MV of the pixel at the upper-right corner may exceed the MV of the sub-block by up to 2 pixels to the right or upward without its reference block exceeding that of the sub-block, and, accordingly, the allowable range of motion for the pixels inside the sub-block is even larger.
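The margin in this example can be sketched as follows, assuming horizontally and vertically separable filters with symmetric support; the helper name is an illustrative assumption.

```python
# Sketch of the bandwidth argument above: an n-tap filter reads (n - 1)
# extra reference pixels around a sample. With an 8-tap sub-block filter
# and a 4-tap point filter, a point's MV may deviate from the sub-block
# MV by up to 2 pixels per side before the point's reference block
# leaves the sub-block's reference block.

def max_mv_deviation(subblock_taps: int, point_taps: int) -> int:
    # Extra margin per side contributed by the difference in half-lengths.
    return (subblock_taps - point_taps) // 2

assert max_mv_deviation(8, 4) == 2
```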
Further, the restriction on the MV of a point may be relaxed appropriately, trading performance against complexity, and the size of the reference block may be increased appropriately.
Step 103, for a pixel point in the current block, determining a first weight and a second weight of the pixel point; the first weight corresponds to the first predicted value, and the second weight corresponds to the second predicted value.
In the present application, for a pixel point in a current block, a decoder may determine a first weight and a second weight of the pixel point respectively; the first weight corresponds to the first predicted value, and the second weight corresponds to the second predicted value. Specifically, the first weight and the second weight of the same pixel point may be the same or different.
It should be noted that, in the embodiment of the present application, for a pixel point in a current block, a decoder may obtain a first predicted value and a second predicted value through sub-block-based prediction and point-based prediction, and accordingly, the decoder may also perform weight value setting on the two predicted values, so as to determine a first weight corresponding to the first predicted value and a second weight corresponding to the second predicted value.
It can be understood that, in the present application, for a pixel point in a current block, a first weight is a weight value corresponding to a prediction of a decoder based on a sub-block, and a second weight is a weight value corresponding to a prediction of a decoder based on a point.
Further, in the embodiments of the present application, the decoder may determine the first weight and the second weight respectively in a plurality of different manners, wherein the decoder may determine the first weight and the second weight in the same manner or in different manners.
For example, in the present application, when determining the first weight, corresponding to the sub-block-based prediction, of a pixel point in the current block, the decoder may first determine the target sub-block corresponding to the pixel point, i.e., determine which of the sub-blocks of the current block the pixel point belongs to. It may then determine a first distance between the pixel position of the pixel point and the reference position of the target sub-block, where the reference position is the position used by the first motion vector of the target sub-block. Finally, the decoder may determine the first weight according to the first distance; the first distance is inversely proportional to the first weight, that is, the larger the first distance, i.e., the farther the pixel position is from the reference position, the smaller the first weight corresponding to the pixel point.
That is, in the present application, for sub-block-based prediction, the decoder assigns a greater weight to a pixel position closer to the pixel position used by the first motion vector of the sub-block, and a smaller weight to a pixel position farther from it.
For example, in the present application, when determining the first weight, corresponding to the sub-block-based prediction, of a pixel point in the current block, the decoder may first determine the target sub-block corresponding to the pixel point, i.e., determine which of the sub-blocks of the current block the pixel point belongs to. It may then determine a deviation value between the motion vector of the pixel point and the first motion vector of the target sub-block. Finally, the first weight may be determined according to the deviation value; the deviation value is inversely proportional to the first weight, that is, the smaller the deviation value, i.e., the closer the motion vector of the pixel point is to the first motion vector of the target sub-block, the larger the first weight corresponding to the pixel point.
That is, in the present application, for sub-block-based prediction, the decoder assigns a larger weight when the motion vector of a pixel point is close to the first motion vector of its sub-block, and a smaller weight when the two differ greatly.
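The two derivations above can be sketched as follows; the concrete weight function 1 / (1 + d) is an assumption chosen only to satisfy the stated inverse proportionality, not a value from the application.

```python
# Two illustrative first-weight derivations following the text.

def first_weight_from_distance(px, py, ref_x, ref_y):
    """Weight shrinks as the pixel moves away from the sub-block's
    reference position (the position used by its first motion vector)."""
    d = abs(px - ref_x) + abs(py - ref_y)
    return 1.0 / (1.0 + d)

def first_weight_from_mv_deviation(mv_pixel, mv_subblock):
    """Weight shrinks as the pixel's motion vector deviates from the
    sub-block's first motion vector."""
    dev = abs(mv_pixel[0] - mv_subblock[0]) + abs(mv_pixel[1] - mv_subblock[1])
    return 1.0 / (1.0 + dev)
```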
For example, in the present application, when determining the second weight, corresponding to the point-based prediction, of a pixel point in the current block, the decoder may first determine the sub-pixel and integer pixel in the reference image corresponding to the pixel point; it may then determine a second distance between the sub-pixel and the integer pixel; finally, the second weight can be determined according to the second distance. The second distance is inversely proportional to the second weight, that is, the larger the second distance, i.e., the farther the sub-pixel is from the integer pixel, the smaller the second weight corresponding to the pixel point.
That is, in the present application, for point-based prediction, the decoder assigns a higher weight when the sub-pixel position in the reference image corresponding to the pixel position of the pixel point is closer to an integer pixel position, and a lower weight when it is farther from the integer pixel.
For example, in the present application, when determining the second weight, corresponding to the point-based prediction, of a pixel point in the current block, the decoder may first determine the absolute value of the motion vector of the pixel point; the second weight may then be determined from that absolute value, where the absolute value is inversely proportional to the second weight, i.e., the smaller the absolute value, the larger the corresponding second weight.
That is, in the present application, for point-based prediction, the decoder assigns a higher weight when the absolute value of the motion vector of the pixel point is smaller, and a lower weight when it is larger. For example, the second weight corresponding to a pixel point with a motion vector of (1/4, 1/4) is greater than the second weight corresponding to a pixel point with a motion vector of (1/2, 1/2).
It is to be understood that in the present application, the decoder may set the first weight and the second weight in a different manner, or may set the first weight and the second weight in the same manner.
Further, in the present application, the setting of the first weight for sub-block-based prediction may use a method similar to interleaved prediction; that is, for sub-block-based prediction, the first weight is set larger for a pixel position closer to the pixel position used by the first motion vector of the sub-block and smaller for a pixel position farther from it.
For example, fig. 10 is a first schematic diagram of the first weight. As shown in fig. 10, the first weight corresponding to a pixel position closer to the pixel position used by the first motion vector of the sub-block is set to 4, and the first weight corresponding to a pixel position farther from it is set to 1. The application does not limit the specific values of the first weight.
For example, fig. 11 is a second schematic diagram of the first weight. As shown in fig. 11, the first weight corresponding to a pixel position closer to the pixel position used by the first motion vector of the sub-block is set to 3, and the first weight corresponding to a pixel position farther from it is set to 1. The application does not limit the specific values of the first weight.
Further, in the present application, the second weight for point-based prediction may be set using a method different from that of the first weight for sub-block-based prediction. For example, the decoder may set the second weight larger when the sub-pixel position in the reference image corresponding to the pixel position of the pixel point is closer to an integer pixel, and smaller when it is farther from the integer pixel; or the second weight may be set larger when the motion vector of the pixel point is closer to an integer, and smaller when it is farther from an integer.
Illustratively, in the present application, the second weight of a pixel position whose motion vector is (1/4, 1/4) is greater than the second weight of a pixel position whose motion vector is (1/2, 1/2).
Further, in the present application, if the decoder first determines the position of the second filter, i.e., first determines the integer-pixel motion vector and the sub-pixel motion vector corresponding to the current pixel, then, when setting the second weight for point-based prediction, a pixel position where the absolute value of the sub-pixel motion vector is larger is given a smaller second weight, and a pixel position where the absolute value is smaller is given a larger second weight.
Exemplarily, in the present application, the second weight w may be derived from the sub-pixel motion vector (x, y) as:
w=clip3(0,1,1-abs(x)-abs(y))
where abs denotes the absolute value, and clip3(a, b, c) returns a if c is less than a, b if c is greater than b, and c otherwise. The values are normalized so that 1 for x and y represents one pixel; for computational convenience this expression can be transformed, for example by shifting so as to avoid fractions/decimals.
It is to be understood that, in the present application, the second weight w may be multiplied by a factor, etc., in order to match the first weight used with the sub-block-based predicted value.
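A minimal sketch of this derivation, assuming the normalization described above; the final scaling factor used to match the first weight is an illustrative assumption.

```python
# Direct transcription of the derivation above:
# w = clip3(0, 1, 1 - |x| - |y|) for a sub-pixel motion vector (x, y),
# normalized so that 1 means one pixel.

def clip3(a, b, c):
    return a if c < a else (b if c > b else c)

def second_weight(x: float, y: float, scale: int = 4) -> float:
    w = clip3(0.0, 1.0, 1.0 - abs(x) - abs(y))
    return w * scale  # scaled to be comparable with integer first weights

# A sub-pixel MV of (1/4, 1/4) gets a larger weight than (1/2, 1/2):
assert second_weight(0.25, 0.25) > second_weight(0.5, 0.5)
```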
It can be understood that the inter-frame prediction method provided in the embodiment of the present application does not limit the order in which the decoder performs step 102 and step 103; that is, in the present application, the decoder may perform step 102 before step 103, perform step 103 before step 102, or perform step 102 and step 103 at the same time.
And step 104, determining the predicted value of the pixel point based on the first weight, the second weight, the first predicted value and the second predicted value.
In an embodiment of the present application, after the decoder obtains the first prediction value and the corresponding first weight based on the prediction of the sub-block and obtains the second prediction value and the corresponding second weight based on the prediction of the point, the decoder may obtain the prediction value of a pixel point in the current block by using the first weight, the second weight, the first prediction value and the second prediction value.
It can be understood that, in the embodiment of the present application, after performing the sub-block-based prediction and the point-based prediction, for a pixel point in the current block, after determining the first weight and the second weight respectively corresponding to the pixel point, the decoder may perform weighted average on the first predicted value and the second predicted value by using the first weight and the second weight, so as to obtain the predicted value of the pixel point.
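A minimal sketch of this weighted combination; the fixed-point realization with integer weights and a rounding offset is an assumption for illustration.

```python
# Sketch of step 104: the pixel's prediction is the weighted average of
# the sub-block-based and point-based predictions.

def combine(pred1: int, pred2: int, w1: int, w2: int) -> int:
    total = w1 + w2
    # Add half the weight sum before dividing so the result rounds to nearest.
    return (w1 * pred1 + w2 * pred2 + total // 2) // total

# E.g. first predicted value 100 with weight 3, second 120 with weight 1:
assert combine(100, 120, 3, 1) == 105
```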
Fig. 12 is a schematic view illustrating a second implementation flow of the inter prediction method, as shown in fig. 12, before determining the third predictor of the current block according to the predictors of the pixel points, that is, before step 105, the method for performing inter prediction by the decoder may further include the following steps:
and step 106, determining a predicted value of a pixel point in the current block based on the first predicted value and the second predicted value.
In the embodiment of the present application, after the decoder obtains the first prediction value based on the prediction of the sub-block and obtains the second prediction value based on the prediction of the point, the decoder may obtain the prediction value of a pixel point in the current block by using the first prediction value and the second prediction value.
It can be understood that, in the embodiment of the present application, compared with step 104, the decoder may also directly perform average operation on the first predicted value and the second predicted value of the same pixel point after obtaining the first predicted value and the second predicted value respectively, so as to obtain the predicted value of the pixel point.
That is, in the present application, the decoder may determine the final prediction result by directly using the first prediction value and the second prediction value without performing weighted average operation on the two prediction values.
In the present application, the method proposed in step 106 may be used as one of the cases in step 104, that is, the case where the first weight and the second weight are equal.
Step 105, determining a third predicted value of the current block according to the predicted value of the pixel point; wherein the third prediction value is used to determine a reconstructed value of the current block.
In an embodiment of the application, after determining the prediction value of the pixel point based on the first weight, the second weight, the first prediction value and the second prediction value, the decoder may further determine a third prediction value of the current block according to the prediction value of the pixel point.
It should be noted that, in the present application, the decoder may process all the pixel points in the current block according to the above method and determine the third predicted value of the current block using the predicted values of all the pixel points, or it may process only part of the pixel points in the current block and determine the third predicted value using the predicted values of that part of the pixel points.
It can be understood that, in the embodiment of the present application, the third prediction value is used to determine a reconstructed value of the current block.
Further, in the embodiment of the present application, after traversing all pixel points or a portion of pixel points in the current block, the decoder may add the prediction values of all pixel points or a portion of pixel points of the current block to obtain a summation result, and may perform normalization processing on the addition result, so as to finally obtain a third prediction value of the current block.
Furthermore, in the embodiment of the present application, the inter prediction methods proposed in steps 101 to 106 above can be applied to both unidirectional prediction and bidirectional prediction.
Specifically, fig. 13 is a schematic diagram of unidirectional prediction. As shown in fig. 13, the decoder obtains one group of predicted values, i.e., the first predicted values, for the current block using sub-block-based prediction, thereby generating a sub-block-based prediction block, and obtains another group of predicted values, i.e., the second predicted values, using point-based prediction, thereby generating a point-based prediction block. By performing a weighted average of the two predicted values of the same pixel point, the predicted value of that pixel point can be obtained, and finally the prediction block of the current block is obtained.
Further, in an embodiment of the present application, fig. 14 is a schematic implementation flow diagram of a third implementation flow of an inter prediction method, and as shown in fig. 14, the method for a decoder to perform inter prediction may further include the following steps:
Step 201, parsing the code stream and determining the inter-frame prediction direction.
Step 202, if the inter-frame prediction direction is bidirectional prediction, determining a fourth prediction value of each pixel point in each sub-block by using the first motion vector of each sub-block and determining a fifth prediction value of each pixel point by using the second motion vector of each pixel point based on the first prediction direction; and determining a sixth predicted value of the pixel point in each sub-block by using the first motion vector of each sub-block and determining a seventh predicted value of each pixel point by using the second motion vector of each pixel point based on the second prediction direction.
Step 203, determining a third weight and a fourth weight of a pixel point of the current block based on the first prediction direction; and determining a fifth weight and a sixth weight of the pixel point based on the second prediction direction.
And step 204, obtaining a predicted value of the pixel point corresponding to the first prediction direction based on the third weight, the fourth weight, the fourth predicted value and the fifth predicted value, and obtaining a predicted value of the pixel point corresponding to the second prediction direction based on the fifth weight, the sixth weight, the sixth predicted value and the seventh predicted value.
And step 205, determining a predicted value of the pixel point according to the predicted value corresponding to the first prediction direction and the predicted value corresponding to the second prediction direction.
Correspondingly, fig. 15 is a first schematic diagram of bi-directional prediction. As shown in fig. 15, for the first reference direction and the second reference direction, the decoder may perform sub-block-based prediction and point-based prediction within each unidirectional prediction, obtaining for the same pixel point two predicted values corresponding to that prediction direction, for example the fourth predicted value and the fifth predicted value, and finally obtaining a sub-block-based prediction block and a point-based prediction block in each reference direction. The predicted value of the pixel point in each prediction direction is then obtained by a weighted average, e.g., with the third weight and the fourth weight, so that prediction blocks in the first reference direction and the second reference direction are obtained respectively. Finally, the final prediction result of the pixel point, i.e., the prediction block corresponding to the current block, is calculated from the predicted values of the pixel point in the two prediction directions.
That is to say, in the present application, when the decoder performs bi-directional prediction, for each unidirectional prediction it obtains one group of predicted values for the current block using sub-block-based prediction and, at the same time, another group of predicted values using point-based prediction; the predicted value of each pixel point in that direction is obtained by a weighted average of the two predicted values of the same pixel point. The predicted values of the two unidirectional predictions are then averaged (or weighted-averaged) to finally obtain the bi-directional predicted value.
It should be noted that, in the present application, an averaging or weighted-averaging operation in the inter-frame prediction process may be performed after all the prediction blocks to be averaged or weighted-averaged have been obtained, after all the sub-block-based prediction blocks to be averaged or weighted-averaged have been obtained, or after all the point-based prediction blocks to be averaged or weighted-averaged have been obtained; this application does not limit this.
Further, in an embodiment of the present application, fig. 16 is a schematic view illustrating an implementation flow of an inter prediction method in a fourth embodiment, as shown in fig. 16, the method for a decoder to perform inter prediction may further include the following steps:
Step 201, parsing the code stream and determining the inter-frame prediction direction.
Step 202, if the inter-frame prediction direction is bidirectional prediction, determining a fourth prediction value of each pixel point in each sub-block by using the first motion vector of each sub-block and determining a fifth prediction value of each pixel point by using the second motion vector of each pixel point based on the first prediction direction; and determining a sixth predicted value of the pixel point in each sub-block by using the first motion vector of each sub-block and determining a seventh predicted value of each pixel point by using the second motion vector of each pixel point based on the second prediction direction.
Step 203, determining a third weight and a fourth weight of a pixel point of the current block based on the first prediction direction; and determining a fifth weight and a sixth weight of the pixel point based on the second prediction direction.
And step 206, determining the predicted value of the pixel point based on the third weight, the fourth weight, the fifth weight, the sixth weight, the fourth predicted value, the fifth predicted value, the sixth predicted value and the seventh predicted value.
Correspondingly, fig. 17 is a second schematic diagram of bi-directional prediction. As shown in fig. 17, for the first reference direction and the second reference direction, the decoder may perform sub-block-based prediction and point-based prediction within each unidirectional prediction, obtaining for the same pixel point two predicted values corresponding to that prediction direction, for example the fourth predicted value and the fifth predicted value, and finally obtaining a sub-block-based prediction block and a point-based prediction block in each reference direction. After the two predicted values of each unidirectional prediction have been determined, a weighted average is calculated using the weight corresponding to each predicted value, finally yielding the final prediction result of the pixel point, i.e., the prediction block corresponding to the current block.
That is, in the present application, when the decoder performs bi-directional prediction, for each of the uni-directional predictions, a set of prediction values is obtained for the current block by using prediction based on sub-blocks, and at the same time, another set of prediction values is obtained for the current block by using prediction based on points, and then the four prediction values of the two uni-directional predictions are averaged (or weighted average), so that the prediction values of the bi-directional prediction can be obtained finally.
It can be seen that, for bi-directional prediction, the decoder may determine the respective predicted values of the two prediction directions and then perform an averaging or weighted-averaging operation, or it may skip calculating the predicted value of each prediction direction separately and directly form a weighted average of the sub-block-based prediction block in the first prediction direction, the point-based prediction block in the first prediction direction, the sub-block-based prediction block in the second prediction direction, and the point-based prediction block in the second prediction direction.
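The two orderings can be sketched as follows; with the normalization shown they produce the same result, and all numeric handling here is illustrative.

```python
# Sketch of the two equivalent orderings for bi-directional prediction;
# weights w3..w6 correspond to the third..sixth weights in the text.

def biprediction_two_pass(p4, p5, p6, p7, w3, w4, w5, w6):
    # First combine within each prediction direction, then average.
    dir1 = (w3 * p4 + w4 * p5) / (w3 + w4)
    dir2 = (w5 * p6 + w6 * p7) / (w5 + w6)
    return (dir1 + dir2) / 2

def biprediction_one_pass(p4, p5, p6, p7, w3, w4, w5, w6):
    # Directly weight the four predicted values in a single pass; with
    # this normalization the result equals the two-pass version.
    s1, s2 = w3 + w4, w5 + w6
    return (w3 * p4 + w4 * p5) / (2 * s1) + (w5 * p6 + w6 * p7) / (2 * s2)
```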
It can be understood that, in the present application, the averaging operation may be regarded as a special case of the weighted-averaging operation, and in some scenarios "pixel point" and "pixel position" may be understood as different expressions of the same concept.
Further, the inter-frame prediction method proposed in the present application may be used for only the luminance component, or may be used for the luminance component and the chrominance component, or may be used for some or all of the other formats, such as RGB. The embodiments of the present application take a luminance component as an example for illustration, but are not limited to the luminance component.
That is, the inter prediction method proposed in the present application may be applied to any image component, and in the present embodiment, a prediction scheme is exemplarily used for a luminance component, but may also be applied to a chrominance component, or any other format of components. The inter-frame prediction method proposed in the present application can also be applied to any video format, including but not limited to YUV format, including but not limited to the luma component in YUV format.
The embodiment provides an inter-frame prediction method, which can use prediction based on subblocks for a current block to obtain a group of predicted values, namely a first predicted value, use point-based prediction for the current block to obtain another group of predicted values, namely a second predicted value, and use the first weight and the second weight to perform weighted average on the first predicted value and the second predicted value after determining a first weight based on the prediction of the subblocks and a second weight based on the prediction of the points for the same pixel points in the current block, so as to finally obtain a new predicted value of the current block.
Based on the above embodiment, in the present application, if a horizontally and vertically separable 8-tap filter is used to predict each pixel point individually, each pixel point requires 8 + 1 = 9 interpolated points, whereas with sub-block-based prediction each pixel point in a 4×4 sub-block requires only 3.75 interpolated points on average. It can be seen that the computational complexity of per-pixel prediction is much higher than that of sub-block-based prediction, and the bandwidth also increases greatly. A filter with fewer taps may therefore be used for point-based prediction; for example, the filter for sub-block-based prediction uses 8 taps while the filter for point-based prediction uses 4 taps.
Illustratively, in the present application, the decoder may use a horizontally and vertically separable 4-tap filter when performing point-based prediction. Further, the decoder may directly reuse the 4-tap filter currently commonly used in the chrominance component interpolation process. The AVS3 filter coefficients for affine sub-block chroma interpolation are shown in Table 2:
TABLE 2 (the coefficient values are published as images, PCTCN2020121676-APPB-000002 and -000003, in the original document and are not reproduced here)
For example, in the present application, the decoder may also use a 3×3 filter for point-based prediction, where the 3×3 filter may be a horizontally and vertically non-separable filter or a horizontally and vertically separable 3-tap filter. By comparison, the computational complexity and the bandwidth pressure of point-based prediction with a 3×3 filter are both lower than with a 4-tap filter.
Specifically, in the present application, take a horizontally and vertically separable 3-tap filter as an example for point-based prediction: each pixel point requires 3 intermediate points interpolated in the horizontal direction and 1 point in the vertical direction, 4 interpolated points in total, and since the filter has only 3 taps, only 12 multiplications are needed. By contrast, a sub-block of size 4×4 requires on average 3.75 interpolated points per pixel, i.e., on average 30 multiplications per pixel with an 8-tap filter and 22.5 with a 6-tap filter.
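The arithmetic behind these counts can be sketched as follows; the helper names are illustrative.

```python
# Per-pixel interpolation counts: point-based prediction with a separable
# 3-tap filter versus sub-block-based prediction over a 4x4 block
# (the 3.75-points-per-pixel figure in the text).

def point_mults(taps: int) -> int:
    # taps intermediate points horizontally + 1 vertical pass, taps mults each.
    return (taps + 1) * taps

def subblock_points_per_pixel(w: int, h: int, taps: int) -> float:
    horizontal = (h + taps - 1) * w   # intermediate rows needed by vertical pass
    vertical = w * h
    return (horizontal + vertical) / (w * h)

assert point_mults(3) == 12
assert subblock_points_per_pixel(4, 4, 8) == 3.75   # 3.75 * 8 = 30 mults
```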
Further, in the embodiment of the present application, if a horizontally and vertically separable 3-tap filter is used, the sub-pixel motion vector can take negative values, since the center of the 3-tap filter lies at the position of the center tap. Specifically, when setting the sub-pixel motion vector of the 3-tap filter, it may be set within the interval from -1/2 pixel to 1/2 pixel.
For example, if the motion vector of a pixel is 1/4 pixel, the decoder can center the filter on the pixel at the corresponding position in the reference image, with the sub-pixel motion vector being 1/4. Fig. 18 is a first schematic diagram of interpolation filtering. As shown in fig. 18, pixel 1 is the pixel at the position in the reference image corresponding to the current pixel 2, and the 3 pixel points centered on pixel 1 may be used to interpolate the current pixel 2.
For example, if the motion vector of a pixel is 3/4 pixel, the decoder can center the filter on the pixel to the right, with the sub-pixel motion vector being -1/4. Fig. 19 is a second schematic diagram of interpolation filtering. As shown in fig. 19, pixel 1 is the pixel at the position in the reference image corresponding to the current pixel 2, and pixel 3 is the pixel to its right; the current pixel 2 can then be interpolated using the 3 pixel points centered on pixel 3.
Further, in the embodiment of the present application, if a horizontally and vertically separable 3-tap filter is used, when setting the motion vector of the sub-pixel of the 3-tap filter, the motion vector of the sub-pixel may also be set in the interval of-1 pixel to 1 pixel.
For example, if the motion vector of a pixel is 3/4 pixel, the decoder may center the filter on the pixel at the corresponding position in the reference image, with the sub-pixel motion vector being 3/4 pixel. Fig. 20 is a third schematic diagram of interpolation filtering. As shown in fig. 20, pixel 1 is the pixel at the position in the reference image corresponding to the current pixel 2, and the current pixel 2 may be interpolated using the 3 pixel points centered on pixel 1.
Further, in the embodiment of the present application, when performing prediction based on points, the position of the filter used by each pixel point may be determined first, then the calculation of the motion vector of each sub-pixel is directly performed based on the position of the filter, and finally interpolation filtering is performed according to 3 points used by the filter.
For example, the decoder may determine the pixel point in the reference image located at the same position as the current pixel and use it as the center of the filter corresponding to the current pixel. Specifically, if the motion vector of a pixel is 5/4 pixels, the corresponding sub-pixel motion vector is 5/4 pixels and the integer-pixel motion vector is 0 pixels. Fig. 21 is a fourth schematic diagram of interpolation filtering. As shown in fig. 21, pixel 1 is the pixel at the position in the reference image corresponding to the current pixel 2, and the 3 pixel points centered on pixel 1 may be used to interpolate the current pixel 2. In this scenario, the integer-pixel motion vector is referred to as motion vector 1 and the sub-pixel motion vector as motion vector 2; here motion vector 1 is 0 pixels and motion vector 2 is 5/4 pixels.
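A minimal sketch of this split into motion vector 1 and motion vector 2, covering both the co-located-center scheme of this paragraph and the earlier rounding scheme; the function name is an illustrative assumption.

```python
# Splitting a pixel's motion vector into the integer-pixel part
# (motion vector 1, the filter position) and the sub-pixel part
# (motion vector 2).

def split_mv(mv: float, center_on_colocated: bool = True):
    if center_on_colocated:
        # Center the filter on the co-located reference pixel: motion
        # vector 1 is 0 and motion vector 2 carries the full displacement.
        return 0, mv
    # Alternative: round to the nearest integer pixel so that the
    # sub-pixel part stays within [-1/2, 1/2] (the earlier scheme).
    mv1 = round(mv)
    return mv1, mv - mv1

assert split_mv(5 / 4) == (0, 1.25)
assert split_mv(3 / 4, center_on_colocated=False) == (1, -0.25)
```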
It can be understood that, in the present application, this interpolation filtering method can set the positions of the reference pixels used by the filters of a group of pixel points according to a uniform rule, which makes the implementation simpler and more convenient. For example, the same motion vector 1 may be set for every pixel of the same sub-block, so that the reference pixels used by the filter of each pixel are arranged regularly, similar to the regular arrangement of the reference pixels used by each point in sub-block-based prediction, but with fewer reference pixels for the point-based prediction. Accordingly, uniform reading of the reference pixels can be realized in hardware, and parallel operation in software, such as Single Instruction Multiple Data (SIMD), also benefits, so the bandwidth can be controlled.
Exemplarily, fig. 22 is a fifth schematic diagram of interpolation filtering. As shown in fig. 22, pixel points A, B, C, and D are the reference pixels at the positions in the reference image corresponding to 4 adjacent pixel points in the current block, and pixel points a, b, c, and d represent their actual positions in the reference image. If the sub-pixel motion vector is set to range from -1/2 pixel to 1/2 pixel, then in point-based prediction the horizontal filters of the horizontally and vertically separable 3-tap filters used are the four horizontal blocks in the figure. It can be seen that the filters of pixel point A and pixel point B differ by 1 pixel in the horizontal direction, whereas the filters of pixel point A and pixel point C differ by 2 pixels, and the filters of pixel point D and pixel point C differ by 1 pixel.
Exemplarily, fig. 23 is a sixth schematic diagram of interpolation filtering. As shown in fig. 23, pixel points A, B, C, and D are the reference pixels at the positions in the reference image corresponding to several adjacent pixel points in the current block, and pixel points a, b, c, and d represent their actual positions in the reference image. If the positions of the filters are determined first, then the centers of the horizontal filters of the horizontally and vertically separable 3-tap filters used in point-based prediction are the positions corresponding to those pixel points; that is, the integer-pixel motion vectors (motion vectors 1) are all 0, and it can be seen that the horizontal filters corresponding to the 4 adjacent pixel points also differ by exactly 1 pixel.
Further, in the embodiment of the present application, when determining the positions of the filters of a group of pixel points, i.e., when determining motion vector 1 (the integer-pixel motion vector) of the group, this may be done according to the principle that the motion vectors 2 (the sub-pixel motion vectors) of the group should be dispersed as uniformly as possible around 0. Specifically, after determining the motion vectors 2 of the group of pixel points, these motion vectors 2 may be summed, and the position corresponding to the minimum summation result determined as the filter position of the group; that is, the decoder may take the integer-pixel motion vector for which the sum of the motion vectors 2 of the group is minimal as motion vector 1 of the group.
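A hedged sketch of this group-level choice follows; taking the sum over absolute values as the dispersion measure is an assumption, since the text only says the motion vectors 2 are summed and the minimum is taken.

```python
# Choose the integer motion vector (motion vector 1) for a group of
# pixels so that the resulting sub-pixel motion vectors (motion vector 2)
# cluster around 0.

def choose_group_integer_mv(pixel_mvs, candidates):
    """pixel_mvs: scalar MVs of the group; candidates: integer MV1 options."""
    def dispersion(mv1):
        return sum(abs(mv - mv1) for mv in pixel_mvs)
    return min(candidates, key=dispersion)

# Four pixels whose MVs sit near 1 pixel: choosing MV1 = 1 keeps the
# sub-pixel parts small and roughly centered on 0.
assert choose_group_integer_mv([0.75, 1.0, 1.25, 1.0], [0, 1, 2]) == 1
```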
It can be understood that, in the present application, the group of pixel points in the above method may be the pixel points of a sub-block obtained by the partitioning used for sub-block-based prediction, or may be a finer subdivision of such a sub-block, e.g., subdividing a 4×4 sub-block into 4 groups of 2×2. The groups may also be "interleaved" with the sub-blocks of the sub-block-based prediction; however, since there is no data dependency between the points, intermediate interpolation results are not shared, and the complexity is therefore lower than that of interleaved prediction.
Further, in the present application, the decoder may further determine the second filter for point-based prediction in other various ways. For example, the decoder may determine the second filter parameters of the second filter using a form of a function or polynomial, where the function or polynomial may include a spline function, a piecewise function, or the like. The decoder may further determine the second filter parameters of the second filter using a form of look-up table (mapping table of phase to motion vector). The present application is not particularly limited.
For example, in the present application, if the second filter used in point-based prediction is a 3-tap filter, and the sub-pixel motion vector (motion vector 2) corresponding to the position of the current pixel is (mv_x, mv_y), then the horizontal second filter parameters of the second filter, i.e., the filter coefficients, may be represented as in the following Table 3:
TABLE 3
Pixel point    Filter coefficient
Left           mv_x×mv_x - mv_x/2
Center         1 - 2×mv_x×mv_x
Right          mv_x×mv_x + mv_x/2
The vertical second filter parameters of the second filter, i.e., the filter coefficients, can be expressed as in the following Table 4:
TABLE 4
Pixel point    Filter coefficient
Top            mv_y×mv_y - mv_y/2
Center         1 - 2×mv_y×mv_y
Bottom         mv_y×mv_y + mv_y/2
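Tables 3 and 4 can be transcribed directly as a small routine:

```python
# Direct transcription of Tables 3 and 4: 3-tap filter coefficients
# computed from one component of the sub-pixel motion vector (mv_x for
# the horizontal pass, mv_y for the vertical pass).

def coeffs_3tap(mv: float):
    left_or_top = mv * mv - mv / 2
    center = 1 - 2 * mv * mv
    right_or_bottom = mv * mv + mv / 2
    return left_or_top, center, right_or_bottom

# The three coefficients always sum to 1, and mv = 0 selects the center pixel:
assert sum(coeffs_3tap(0.25)) == 1.0
assert coeffs_3tap(0.0) == (0.0, 1.0, 0.0)
```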
It should be noted that, in the present application, each filter coefficient calculated by the decoder according to the preset calculation rule from the sub-pixel motion vector corresponding to the current pixel may be regarded as a function (polynomial) of mv_x or mv_y. However, since a separable filter has only 3 taps in the horizontal or vertical direction, when the sub-pixel position is farther from the center, the accuracy of filter coefficients obtained with the same calculation rule, i.e., calculated with the same simple function (e.g., a linear or quadratic function), may be poor.
In order to improve the accuracy of the filter coefficients, in the present application, when the filter coefficients are calculated from the sub-pixel motion vector corresponding to the current pixel, a piecewise function may be used instead of the original simple function; that is, at least one filter coefficient, i.e., the second filter coefficient, may be calculated by a piecewise function.
It is to be understood that, in the present application, the piecewise function may also be referred to as a spline function. Illustratively, when the value (absolute value) of mv_x or mv_y is less than (or equal to) a threshold, the corresponding filter coefficient may be derived using one function (polynomial), and when the value (absolute value) of mv_x or mv_y is greater than (or equal to) the threshold, the corresponding filter coefficient may be derived using another function (polynomial); that is, a two-segment piecewise function is used to calculate the filter coefficient.
Further, in the embodiment of the present application, a three-segment or multi-segment segmentation function may also be used to calculate the filter coefficients.
Specifically, in the present application, when calculating different filter coefficients among the plurality of filter coefficients corresponding to a pixel point, the same piecewise function or different piecewise functions may be used; likewise, the thresholds used for mv_x or mv_y when calculating different filter coefficients may or may not all be the same.
Further, in the embodiment of the present application, during the calculation of the filter coefficients, the decoder may also limit the final calculation result using a preset upper limit value and/or a preset lower limit value. For example, when a calculated filter coefficient is greater than or equal to the preset upper limit value, the preset upper limit value is directly used as the corresponding filter coefficient; when a calculated filter coefficient is less than or equal to the preset lower limit value, the preset lower limit value is directly used as the corresponding filter coefficient.
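A minimal sketch combining the piecewise calculation and the limiting described above; the threshold, the two segment polynomials, and the bounds are illustrative assumptions, not values from the application.

```python
# Two-segment piecewise (spline) coefficient formula plus clamping to
# preset upper and lower limit values.

def piecewise_coeff(mv: float, threshold: float = 0.25,
                    lo: float = -0.0625, hi: float = 1.0) -> float:
    if abs(mv) <= threshold:
        c = mv * mv - mv / 2          # one polynomial near the center
    else:
        c = mv * mv - mv / 2 + 0.01   # a different polynomial farther out
    # Limit the final result to the preset lower/upper bounds.
    return max(lo, min(hi, c))
```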
It can be understood that the piecewise function manner or the size limitation manner proposed in the present application may not only be applied to the separable filter, but also be applied to a common two-dimensional filter, that is, a two-dimensional filter that is not separable horizontally or vertically, and specifically, at least one coefficient of the filter may be calculated according to the piecewise function, or the size of the final calculation result may be limited according to a preset upper limit value and a preset lower limit value.
It should be noted that, in the present application, the manner of limiting the size of the filter coefficient may be understood as a more specific piecewise function, and therefore, the decoder may adopt the manner of piecewise function and the manner of limiting the size at the same time when calculating the filter coefficient.
Further, in this application, if the second filtering parameter of the second filter is the filter phase, the decoder may determine the mapping table of the second phase and the motion vector when determining the second filtering parameter, and then determine the filter phase corresponding to the pixel point, that is, the second filtering parameter, according to the mapping table of the second phase and the motion vector and the second motion vector.
It should be noted that, in the present application, the decoder may obtain the mapping table of the first phase and the motion vector or the mapping table of the second phase and the motion vector through training or calculation.
That is to say, in the embodiment of the present application, the mapping table of the first phase and the motion vector, or of the second phase and the motion vector, may be derived through a calculation formula or obtained through direct training. When the mapping table of phase and motion vector is obtained through calculation, it may be derived in a piecewise-function manner, or according to other, more complex formulas. Of course, there may also be filter phases that are not derived from any formula but are trained.
In the embodiment of the present application, if the precision of the motion vector is high, or the precision of the motion vector used in filtering is high, or mv_x or mv_y can take many possible values, or the complexity of calculating the coefficients from the motion vector is low, it is reasonable to determine the filter parameter by calculating the filter coefficients from the motion vector and the scale parameter. However, if the precision of the motion vector is not high, or the precision of the motion vector used in filtering is not high, or mv_x or mv_y has few possible values, or the complexity of calculating the coefficients from the motion vector is high, then it may be preferable to determine the filter parameter as a filter phase, i.e., to organize the separable filter into a number of phases, each being a fixed set of coefficients.
Illustratively, if the motion vector precision when filtering is 1/16-pixel precision, with a maximum of 16/16 pixel and a minimum of -16/16 pixel, then mv_x or mv_y has 33 possible values in total. An example mapping table of phase and motion vector is shown in Table 5. For the horizontal direction, mv corresponds to the horizontal sub-pixel motion vector mv_x of the pixel; coef0 is the coefficient of the pixel adjacent on the left, coef1 is the coefficient of the pixel itself, and coef2 is the coefficient of the pixel adjacent on the right. For the vertical direction, mv corresponds to the vertical sub-pixel motion vector mv_y; coef0 is the coefficient of the pixel adjacent above, coef1 is the coefficient of the pixel itself, and coef2 is the coefficient of the pixel adjacent below.
Further, in an embodiment of the present application, the decoder may determine a precision parameter of the motion vector and determine a reduction parameter based on the precision parameter, so that after filtering with the filter phase, reduction processing and/or right-shift processing can be performed according to the reduction parameter.
It is understood that, in the present application, if the motion vector precision when filtering is 1/16-pixel precision, the filter coefficients are all amplified by a factor of 256; therefore, after the filtering is completed and the filtering result is obtained, the result needs to be reduced by a factor of 256, or shifted right by 8 bits.
Further, in the present application, the reduction or right shift may be performed after each one-dimensional (horizontal or vertical) filtering pass, or once after both horizontal and vertical filtering have finished. If the result is reduced or right-shifted after each one-dimensional pass, the reduction factor or the number of shifted bits needs to be adjusted so that the total reduction factor or total number of shifted bits remains consistent with the amplification factor of the filter coefficients.
TABLE 5
mv coef0 coef1 coef2
-16 256 0 0
-15 240 16 0
-14 224 32 0
-13 208 48 0
-12 192 64 0
-11 176 80 0
-10 160 96 0
-9 144 112 0
-8 128 128 0
-7 105 158 -7
-6 84 184 -12
-5 65 206 -15
-4 48 224 -16
-3 33 238 -15
-2 20 248 -12
-1 9 254 -7
0 0 256 0
1 -7 254 9
2 -12 248 20
3 -15 238 33
4 -16 224 48
5 -15 206 65
6 -12 184 84
7 -7 158 105
8 0 128 128
9 0 112 144
10 0 96 160
11 0 80 176
12 0 64 192
13 0 48 208
14 0 32 224
15 0 16 240
16 0 0 256
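A minimal sketch of applying such a phase table; only a few rows of Table 5 are reproduced, and the rounding offset before the 8-bit right shift is an illustrative assumption.

```python
# Look up the three coefficients for a 1/16-precision sub-pixel MV
# component, filter, then shift right by 8 because the table entries are
# scaled by 256.

PHASE_TABLE = {
    -1: (9, 254, -7),
    0:  (0, 256, 0),
    1:  (-7, 254, 9),
    8:  (0, 128, 128),
}

def filter_horizontal(left: int, center: int, right: int, mv16: int) -> int:
    c0, c1, c2 = PHASE_TABLE[mv16]
    # Rounding offset of 128 before the shift keeps the result unbiased.
    return (c0 * left + c1 * center + c2 * right + 128) >> 8

assert filter_horizontal(100, 100, 100, 0) == 100
assert filter_horizontal(100, 100, 200, 8) == 150
```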
The embodiment provides an inter-frame prediction method, which can use prediction based on subblocks for a current block to obtain a group of predicted values, namely a first predicted value, use point-based prediction for the current block to obtain another group of predicted values, namely a second predicted value, and use the first weight and the second weight to perform weighted average on the first predicted value and the second predicted value after determining a first weight based on the prediction of the subblocks and a second weight based on the prediction of the points for the same pixel points in the current block, so as to finally obtain a new predicted value of the current block.
Fig. 24 is a flowchart illustrating a fifth implementation of the inter prediction method, and as shown in fig. 24, the method for performing inter prediction by the encoder may include the following steps:
Step 401, determining the prediction mode parameter of the current block.
In an embodiment of the present application, an encoder may first determine a prediction mode parameter of a current block. Specifically, the encoder may first determine a prediction mode used by the current block and then determine corresponding prediction mode parameters based on the prediction mode. Wherein the prediction mode parameter may be used to determine a prediction mode used by the current block.
It should be noted that, in the embodiment of the present application, the prediction mode parameter indicates the prediction mode adopted by the current block and parameters related to that mode. For determining the prediction mode parameter, a simple decision strategy may be adopted, such as deciding according to the magnitude of the distortion value, or a complex decision strategy may be adopted, such as deciding based on the result of Rate-Distortion Optimization (RDO); the embodiment of the present application is not limited in this respect. Generally, the prediction mode parameter of the current block may be determined in the RDO manner.
Specifically, in some embodiments, when determining the prediction mode parameter of the current block, the encoder may perform pre-coding processing on the current block by using multiple prediction modes to obtain a rate-distortion cost value corresponding to each prediction mode; and then selecting the minimum rate distortion cost value from the obtained multiple rate distortion cost values, and determining the prediction mode parameters of the current block according to the prediction mode corresponding to the minimum rate distortion cost value.
That is, on the encoder side, the current block may be pre-encoded in a plurality of prediction modes. Here, the plurality of prediction modes generally include inter prediction modes, conventional intra prediction modes, and non-conventional intra prediction modes; the conventional intra prediction modes may include a Direct Current (DC) mode, a PLANAR mode, angular modes, and the like; the non-conventional intra prediction modes may include a Matrix-based Intra Prediction (MIP) mode, a Cross-component Linear Model prediction (CCLM) mode, an Intra Block Copy (IBC) mode, a PLT (palette) mode, and the like; and the inter prediction modes may include an ordinary inter prediction mode, a GPM mode, an AWP mode, and the like.
Therefore, after the current block is pre-encoded with the plurality of prediction modes, a rate-distortion cost value corresponding to each prediction mode can be obtained; the minimum rate-distortion cost value is then selected from the obtained values, and the prediction mode parameter of the current block is determined according to the prediction mode corresponding to that minimum cost. Alternatively, after pre-encoding with the plurality of prediction modes, a distortion value corresponding to each prediction mode can be obtained; the minimum distortion value is then selected, the prediction mode corresponding to it is determined as the prediction mode used by the current block, and the corresponding prediction mode parameter is set accordingly. In this way, the determined prediction mode parameter is finally used to encode the current block; under this prediction mode the prediction residual can be smaller, improving coding efficiency.
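A minimal sketch of this RDO-style decision; the pre_encode callable returning (distortion, bits) and the lambda value are illustrative assumptions.

```python
# Pre-encode the current block with each candidate mode, compute the
# rate-distortion cost J = D + lambda * R, and keep the cheapest mode.

def choose_prediction_mode(block, modes, pre_encode, lam: float):
    best_mode, best_cost = None, float("inf")
    for mode in modes:
        distortion, bits = pre_encode(block, mode)  # assumed: returns (D, R)
        cost = distortion + lam * bits
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode
```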
That is to say, on the encoding side, the encoder may select an optimal prediction mode to perform pre-encoding on the current block, and in this process, the prediction mode of the current block may be determined, and then a prediction mode parameter for indicating the prediction mode is determined, so that the corresponding prediction mode parameter is written into the code stream and transmitted to the decoder by the encoder.
Correspondingly, on the decoder side, the decoder can directly acquire the prediction mode parameters of the current block by analyzing the code stream, and determines the prediction mode used by the current block and the related parameters corresponding to the prediction mode according to the prediction mode parameters acquired by analysis.
Step 402, when the prediction mode parameter indicates that the inter prediction value of the current block is determined by using the inter prediction mode, determining a first prediction value of a pixel point in each sub-block by using a first motion vector of each sub-block of the current block, and determining a second prediction value of each pixel point by using a second motion vector of each pixel point of the current block; wherein the current block comprises one or more sub-blocks; the current block includes one or more pixel points.
In an implementation of the present application, after determining the prediction mode parameter, if the prediction mode parameter indicates that the current block determines the inter prediction value of the current block using the inter prediction mode, the encoder may determine the first prediction value of the pixel point in each sub-block using the first motion vector of each sub-block of the current block, and at the same time, the encoder may determine the second prediction value of each pixel point using the second motion vector of each pixel point of the current block.
That is to say, in the present application, the encoder may use prediction based on sub-blocks for the current block to obtain a set of prediction values, that is, a first prediction value corresponding to a pixel point in each sub-block, and meanwhile, the encoder may use prediction based on points for the current block to obtain another set of prediction values, that is, a second prediction value corresponding to each pixel point.
It can be understood that, in the present application, the first prediction value is a prediction value of a pixel point in a subblock obtained based on prediction of the subblock, and the second prediction value is a prediction value of a pixel point in a current block obtained based on prediction of a point. It can be seen that, for the same pixel (pixel position) in the current block, the encoder can obtain the corresponding two prediction values in different ways, i.e., a first prediction value obtained by sub-block-based prediction and a second prediction value obtained by point-based prediction.
It is to be understood that in the present application, the current block may include one or more sub-blocks; the current block may include one or more pixels (samples), each corresponding to a pixel location and a pixel value.
For example, in an embodiment of the present application, when determining the first prediction value of a pixel point in each sub-block, the encoder may determine the first motion vector of each sub-block of the current block, and then determine the first prediction value corresponding to the pixel point in the sub-block based on the first motion vector.
That is, in this application, when the encoder performs prediction based on sub-blocks, the encoder may divide the current block into sub-blocks, determine a motion vector for each sub-block, that is, each sub-block corresponds to a first motion vector, and then use the first motion vector of each sub-block to perform prediction on the sub-blocks, so as to obtain first prediction values corresponding to pixel points in the sub-blocks.
For example, in the embodiment of the present application, when determining the second prediction value of each pixel point, the encoder may determine the second motion vector of each pixel point of the current block, and then determine the second prediction value corresponding to the pixel point based on the second motion vector.
That is to say, in the present application, when the encoder performs the point-based prediction, a motion vector may be determined for each pixel point, that is, each pixel point corresponds to a second motion vector; the encoder may then predict each pixel point using its second motion vector to obtain the corresponding second prediction value.
It is understood that, in the embodiment of the present application, both the first motion vector of the sub-block of the current block and the second motion vector of the pixel point may be derived by using an affine model. Further, the encoder may also perform the determination of the first motion vector and the second motion vector in other manners, such as other sub-block based prediction techniques. The encoder may also determine a second motion vector for a pixel in a sub-block based on the first motion vectors of the sub-blocks, such as determining a second motion vector for any pixel in one of the sub-blocks based on the first motion vectors of two or more of the sub-blocks.
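For illustration, the following sketch shows how a 4-parameter affine model of the kind mentioned above could derive both kinds of motion vectors; the floating-point arithmetic and function names are assumptions made for clarity, whereas real codecs use integer arithmetic with shifts:

```python
def affine_mv(mv0, mv1, width, x, y):
    """Derive the motion vector at position (x, y) with a 4-parameter affine
    model from the top-left control-point MV mv0 and the top-right
    control-point MV mv1 of a block of the given width."""
    ax = (mv1[0] - mv0[0]) / width
    ay = (mv1[1] - mv0[1]) / width
    return (ax * x - ay * y + mv0[0],
            ay * x + ax * y + mv0[1])

# First motion vector of a sub-block: evaluate at the sub-block centre,
# e.g. affine_mv(mv0, mv1, 16, 2, 2) for the top-left 4x4 sub-block.
# Second motion vector of a pixel point: evaluate at the pixel itself.
```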
Further, in embodiments of the present application, the encoder sub-block based prediction and the point based prediction may use different prediction methods. For example, the encoder may use different interpolation methods for sub-block based prediction and point based prediction. Specifically, in the sub-block-based prediction and the point-based prediction, the encoder may interpolate the values of the sub-pixels using different interpolation filters.
For example, in the present application, the subblock-based prediction may use a relatively complex filter, and the point-based prediction may use a relatively simple filter.
For example, in the present application, a filter with a larger number of taps may be used for subblock-based prediction, and a filter with a smaller number of taps may be used for point-based prediction.
For example, in the present application, the subblock-based prediction and the point-based prediction may use a separable two-dimensional filter or an inseparable two-dimensional filter, respectively.
It should be noted that, in the embodiment of the present application, the separable filter may be a two-dimensional filter; specifically, it may be a two-dimensional filter that is separable in the horizontal and vertical directions, and may be composed of a one-dimensional filter in the horizontal direction and a one-dimensional filter in the vertical direction.
That is, in the present application, if a separable filter is used, the separable filter can perform filtering processing in the horizontal direction and the vertical direction, respectively.
It can be understood that, in the present application, as a separable filter that can be separated horizontally and vertically, filtering processing may be performed on two-dimensional pixel points in two directions, specifically, the separable filter may perform filtering in one direction (e.g., in a horizontal direction or a vertical direction) to obtain an intermediate value corresponding to the one direction, and then perform filtering in another direction (e.g., in a vertical direction or a horizontal direction) on the intermediate value to obtain a final filtering result.
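The two-pass structure described above can be sketched as follows; this is a minimal floating-point illustration assuming normalized tap coefficients, not the integer filtering of an actual codec:

```python
import numpy as np

def separable_interp(ref, h_coef, v_coef):
    """Horizontally-then-vertically separable 2-D interpolation.

    ref: 2-D array of reference pixels (with enough margin for the taps).
    h_coef, v_coef: 1-D tap coefficients, assumed normalized to sum to 1."""
    ref = np.asarray(ref, dtype=float)
    h_coef, v_coef = np.asarray(h_coef), np.asarray(v_coef)
    nh, nv = len(h_coef), len(v_coef)
    rows, cols = ref.shape
    # Pass 1: horizontal filtering produces the intermediate values.
    tmp = np.zeros((rows, cols - nh + 1))
    for i in range(tmp.shape[1]):
        tmp[:, i] = ref[:, i:i + nh] @ h_coef
    # Pass 2: vertical filtering of the intermediate values.
    out = np.zeros((rows - nv + 1, tmp.shape[1]))
    for j in range(out.shape[0]):
        out[j, :] = v_coef @ tmp[j:j + nv, :]
    return out
```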
It should be noted that in the present application, separable filters have been used in various common encoding and decoding scenarios, such as interpolation filtering for inter-frame block-based prediction, interpolation filtering for inter-frame sub-block-based prediction, interpolation filtering for affine sub-block-based prediction, and the like. Further, the separable filter may also be applied to block or sub-block based motion compensation in HEVC, VVC, AVS3, i.e. the motion compensated predictive interpolation filter may comprise a separable two-dimensional filter. For example, the luminance interpolation filter for affine motion compensated prediction in AVS3 may be a separable two-dimensional filter, where the luminance interpolation filter coefficients are as shown in table 1 above.
Further, in an embodiment of the present application, in performing sub-block-based prediction, for a sub-block of the current block, the encoder may determine a first filtering parameter according to the first motion vector; then, based on the first filtering parameter, filtering processing may be performed by using a first filter to obtain a first predicted value.
Specifically, in the present application, the first filter may be any one of the following filters: an n-tap interpolation filter, a separable two-dimensional filter, an inseparable two-dimensional filter; wherein n is any one of the following values: 8,6,5,4,3,2.
It should be noted that, in the embodiment of the present application, the first filtering parameter of the first filter may include a filter coefficient of the first filter, and may also include a filter phase of the first filter, and the present application is not limited specifically.
For example, in the present application, if the first filtering parameter is a filter coefficient of the first filter, the encoder may first determine the first scale parameter when determining the first filtering parameter according to the first motion vector; then, a first filtering parameter is determined according to the first scale parameter and the first motion vector.
For example, in the present application, if the first filtering parameter is a filter phase of the first filter, the encoder may first determine a mapping table of the first phase and the motion vector when determining the first filtering parameter according to the first motion vector; and then determining a first filtering parameter according to the mapping table of the first phase and the motion vector and the first motion vector.
Further, in the embodiment of the present application, when performing the point-based prediction, for a pixel point of the current block, the encoder may determine a second filtering parameter according to the second motion vector; then, based on the second filtering parameter, the second filter is used for filtering processing to obtain a second predicted value.
Specifically, in the present application, the second filter may be any one of the following filters: an m-tap interpolation filter, a separable two-dimensional filter, and an inseparable two-dimensional filter; wherein m is any one of the following values: 8,6,5,4,3,2.
It is to be understood that in the present application, m may be less than or equal to n, i.e., a filter with a higher number of taps may be used for subblock-based prediction, and a filter with a lower number of taps may be used for point-based prediction.
It should be noted that, in the embodiment of the present application, the second filtering parameter of the second filter may include a filter coefficient of the second filter, and may also include a filter phase of the second filter, and the present application is not limited specifically.
For example, in the present application, if the second filtering parameter is a filter coefficient of the second filter, the encoder may first determine the second scale parameter when determining the second filtering parameter according to the second motion vector; and then determine the second filtering parameter according to the second scale parameter and the second motion vector.
For example, in this application, if the second filtering parameter is a filter phase of the second filter, the encoder may first determine a mapping table of the second phase and the motion vector when determining the second filtering parameter according to the second motion vector; and then determining a second filtering parameter according to the mapping table of the second phase and the motion vector and the second motion vector.
That is, in an embodiment of the present application, the first filter of the sub-block-based prediction and the second filter of the point-based prediction may take different forms; specifically, the first filter coefficient of the first filter and the second filter coefficient of the second filter may be determined in different manners. The sub-block-based prediction may determine the phase from the sub-pixel motion vector and look the coefficients up in a table, as shown in Table 1 above. The point-based prediction may likewise determine the phase from the sub-pixel motion vector and look the coefficients up in a table, or may calculate the coefficients directly from the sub-pixel motion vector.
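A hedged sketch of the two coefficient-derivation forms follows. The 4-tap phase table below is illustrative only (the actual sub-block coefficients are those of Table 1 referenced above), and the 2-tap computed form is just one example of calculating coefficients directly from the sub-pixel motion vector:

```python
# Illustrative 4-tap table indexed by quarter-pel phase; the real
# coefficients are those of Table 1 referenced above.
PHASE_TABLE = {
    0: (0, 64, 0, 0),     # integer position
    1: (-4, 54, 16, -2),  # 1/4-pel phase
    2: (-4, 36, 36, -4),  # 1/2-pel phase
    3: (-2, 16, 54, -4),  # 3/4-pel phase
}

def coef_from_table(phase):
    """Sub-block path: determine the phase from the sub-pixel MV, then
    look the coefficients up in the table."""
    return PHASE_TABLE[phase]

def coef_from_mv(frac):
    """Point path (one alternative): compute 2-tap bilinear coefficients
    directly from the fractional part of the MV, frac in [0, 1); here the
    coefficients are a linear function of the motion vector."""
    return (1.0 - frac, frac)
```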
For example, in the embodiment of the present application, when determining the first prediction value of a pixel point in each sub-block of the current block, the encoder may also determine a motion vector deviation between each sub-block and the current block; and then determining a first predicted value of a pixel point in each sub-block based on the motion vector deviation.
It should be noted that, in the present application, the first scale parameter or the second scale parameter may include at least one scale value, where each scale value is a non-zero real number.
Further, in the embodiment of the present application, any filter coefficient the encoder calculates from the motion vector may be a linear function (polynomial), a quadratic function (polynomial), or a higher-order function (polynomial) of the motion vector, and the present application is not particularly limited.

That is, in the present application, among the plurality of filter coefficients calculated for the plurality of pixel points according to different calculation methods under the preset calculation rule, some filter coefficients may be a linear function (polynomial) of the motion vector, while others may be a quadratic function (polynomial) or a higher-order function (polynomial) of the second motion vector, i.e., a nonlinear relationship.
In some scenarios, the prediction value of a sub-block may even be determined without first calculating a motion vector. For example, bi-directional optical flow, referred to as BIO in AVS and BDOF in VVC, calculates the motion vector deviation between a sub-block and the current block using bi-directional optical flow, and then obtains a new prediction value from the gradient, the motion vector deviation, and the current prediction value according to the optical-flow principle.
It should be noted that, in the present application, in some scenarios, calculations such as the motion vector deviation calculation and the prediction value correction calculation may be performed on a block (coding block or prediction block) as a whole, for example on a 4 × 4 block, which may also be treated according to the concept of a sub-block in the present application.
It should be noted that, in the embodiment of the present application, the current block is an image block to be encoded in the current frame; the current frame is encoded sequentially, in a certain order, in the form of image blocks, and the current block is the image block to be encoded next in that order. The current block may have a variety of sizes, such as 16×16, 32×32, or 32×16, where the numbers represent the numbers of rows and columns of pixel points in the current block.

Further, in the embodiment of the present application, the current block may be divided into a plurality of sub-blocks of equal size, where a sub-block is a smaller set of pixel points. The sub-blocks may be 8×8 or 4×4 in size.

For example, in the present application, a current block with a size of 16×16 may be divided into 4 sub-blocks each with a size of 8×8.
It can be understood that, in the embodiment of the present application, in the case that the encoder determines that the prediction mode parameter indicates that the inter prediction value of the current block is determined using the inter prediction mode, the inter prediction method provided by the embodiment of the present application may be continuously employed.
Further, in an embodiment of the present application, the encoder may take one sub-block of the current block as a reference block; then, determining a first predicted value of a pixel point in the reference block by using a first motion vector of the reference block, and determining a second predicted value of each pixel point by using a second motion vector of each pixel point of the reference block; wherein the reference block comprises one or more pixel points.
It should be noted that, in the embodiment of the present application, since the current block includes one or more sub-blocks and, at the same time, one or more pixel points, a pixel point that belongs to a sub-block can be predicted based on that sub-block while also being predicted point by point. That is, when the encoder performs the sub-block-based prediction and the point-based prediction, the two may share the same reference block, where the reference block is a block composed of the reference pixels required for interpolation filtering. Specifically, for a sub-block, a reference block may be determined such that the reference pixels used by the sub-block-based prediction and the reference pixels used by the point-based prediction of each pixel point constituting the sub-block all lie within the determined reference block. Preferably, the sub-block-based prediction and the point-based prediction of the points within the corresponding sub-block may be performed simultaneously, or consecutively, thereby avoiding repeated reading of reference pixels into the buffer.
It is to be understood that in the present application, if the point-based prediction and the sub-block-based prediction can share the same reference block, the encoder can restrict the MV of the point-based prediction and thereby avoid increasing the bandwidth. For example, if the sub-block-based prediction uses a horizontally and vertically separable 8-tap filter and the point-based prediction uses a horizontally and vertically separable 4-tap filter, then as long as the MV of the point at the upper-left corner does not exceed the MV of the sub-block by more than 2 pixels to the left or upward, the reference block needed by that point does not exceed the reference block of the sub-block; likewise, as long as the MV of the point at the upper-right corner does not exceed the MV of the sub-block by more than 2 pixels to the right or upward, the reference block needed by that point does not exceed the reference block of the sub-block. Accordingly, the available range of motion for points in the interior of the sub-block is even larger.
Further, the limit on the MV of a point may be relaxed appropriately to trade off performance against complexity, in which case the size of the reference block may be increased appropriately.
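A minimal sketch of such an MV restriction, assuming the 8-tap/4-tap example above (a 2-pixel margin on each side); the max_dev parameter expresses the relaxation mentioned in the preceding paragraph:

```python
def clamp_point_mv(point_mv, subblock_mv, max_dev=2.0):
    """Restrict a per-point MV to within +/- max_dev pixels of the
    sub-block MV in each component, so the point's reference pixels stay
    inside the sub-block's reference block.  max_dev = 2 matches the
    8-tap / 4-tap example above; relaxing it trades bandwidth (a larger
    reference block) against performance."""
    def clamp(v, lo, hi):
        return max(lo, min(hi, v))
    return (clamp(point_mv[0], subblock_mv[0] - max_dev, subblock_mv[0] + max_dev),
            clamp(point_mv[1], subblock_mv[1] - max_dev, subblock_mv[1] + max_dev))
```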
Step 403, determining a first weight and a second weight of a pixel point in a current block; the first weight corresponds to the first predicted value, and the second weight corresponds to the second predicted value.
In the present application, for a pixel point in a current block, an encoder may determine a first weight and a second weight of the pixel point respectively; the first weight corresponds to the first predicted value, and the second weight corresponds to the second predicted value. Specifically, the first weight and the second weight of the same pixel point may be the same or different.
It should be noted that, in the embodiment of the present application, for a pixel point in a current block, an encoder may obtain a first predicted value and a second predicted value through sub-block-based prediction and point-based prediction, and accordingly, the encoder may also perform weight value setting on the two predicted values, so as to determine a first weight corresponding to the first predicted value and a second weight corresponding to the second predicted value.
It can be understood that, in the present application, for a pixel point in a current block, a first weight is a weight value corresponding to a prediction of an encoder based on a sub-block, and a second weight is a weight value corresponding to a prediction of an encoder based on a point.
Further, in the embodiments of the present application, the encoder may determine the first weight and the second weight respectively in a plurality of different manners, wherein the encoder may determine the first weight and the second weight in the same manner or in different manners.
For example, in the present application, when determining, for a pixel point in the current block, the first weight corresponding to the sub-block-based prediction, the encoder may first determine the target sub-block corresponding to the pixel point, i.e., determine which of the sub-blocks of the current block the pixel point belongs to; then, a first distance between the pixel position of the pixel point and the reference position of the target sub-block can be determined, where the reference position is the position used by the first motion vector of the target sub-block; finally, the encoder may determine the first weight according to the first distance, where the first distance is inversely related to the first weight, that is, the larger the first distance (the farther the pixel position is from the reference position), the smaller the first weight corresponding to the pixel point.
That is, in the present application, for the sub-block-based prediction, the encoder sets a greater weight at pixel positions closer to the position used by the first motion vector of the sub-block, and a lesser weight at pixel positions farther from it.
For example, in the present application, when determining, for a pixel point in the current block, the first weight corresponding to the sub-block-based prediction, the encoder may first determine the target sub-block corresponding to the pixel point, i.e., determine which of the sub-blocks of the current block the pixel point belongs to; then, a deviation value between the motion vector of the pixel point and the first motion vector of the target sub-block can be determined; finally, the first weight may be determined according to the deviation value, where the deviation value is inversely related to the first weight, that is, the smaller the deviation value (the closer the motion vector of the pixel point is to the first motion vector of the target sub-block), the larger the first weight corresponding to the pixel point.
That is, in the present application, for the sub-block-based prediction, the encoder sets a larger weight where the motion vector of the pixel point is close to the first motion vector of the sub-block, and a smaller weight where the difference between the motion vector of the pixel point and the first motion vector of the sub-block is large.
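The source fixes only the monotonic relations above, not the exact formulas; the following sketch therefore shows two illustrative inverse mappings, with the L1 distance and the weight cap w_max as assumptions:

```python
def first_weight_by_distance(px, py, cx, cy, w_max=8):
    """Larger weight the closer pixel (px, py) lies to the reference
    position (cx, cy) used by the sub-block's first motion vector."""
    return max(1, w_max - (abs(px - cx) + abs(py - cy)))  # L1 distance

def first_weight_by_mv_deviation(pixel_mv, subblock_mv, w_max=8):
    """Larger weight the closer the pixel's MV is to the sub-block's
    first motion vector."""
    dev = abs(pixel_mv[0] - subblock_mv[0]) + abs(pixel_mv[1] - subblock_mv[1])
    return max(1, w_max - round(dev))
```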
For example, in the present application, when determining the second weight corresponding to the point-based prediction of a pixel point in the current block, the encoder may first determine the sub-pixel and integer pixel in the reference image corresponding to the pixel point; a second distance between the sub-pixel and the integer pixel may then be determined; finally, the second weight can be determined according to the second distance, where the second distance is inversely related to the second weight, that is, the larger the second distance (the farther the sub-pixel is from the integer pixel), the smaller the second weight corresponding to the pixel point.
That is, in the present application, for the point-based prediction, the encoder sets a higher weight where the sub-pixel position in the reference image corresponding to the pixel position of the pixel point is closer to an integer pixel position, and a lower weight where it is farther from an integer pixel position.
For example, in the present application, the encoder may first determine the absolute value of the motion vector of a pixel in the current block when determining the second weight corresponding to the point-based prediction of the pixel; the second weight may then be further determined in absolute value; wherein the absolute value is inversely proportional to the second weight, i.e. the smaller the absolute value, the larger the corresponding second weight.
That is, in the present application, for the point-based prediction, the encoder sets a higher weight where the absolute value of the motion vector of the pixel point is smaller and a lower weight where it is larger. For example, the second weight corresponding to a pixel point with a motion vector of (1/4, 1/4) is greater than the second weight corresponding to a pixel point with a motion vector of (1/2, 1/2).
It is to be understood that in the present application, the encoder may set the first weight and the second weight in a different manner, or may set the first weight and the second weight in the same manner.
Further, in the present application, the setting of the first weight based on the prediction of the sub-block may use a method similar to the interleaved prediction, that is, the first weight is set to be larger for a pixel position closer to a pixel position used by the first motion vector of the sub-block and smaller for a pixel position farther from the pixel position used by the first motion vector of the sub-block based on the prediction of the sub-block.
Further, in the present application, the setting of the second weight of the point-based prediction may use a setting method different from that of the first weight of the sub-block-based prediction. For example, the encoder may set the second weight to be larger when the sub-pixel position in the reference image corresponding to the pixel position of the pixel point is closer to an integer pixel, and smaller when it is farther from an integer pixel; or the second weight may be set larger when the motion vector of the pixel point is closer to an integer, and smaller when it is farther from an integer.
Illustratively, in the present application, the second weight of a pixel position where the motion vector of the pixel point is (1/4, 1/4) is greater than the second weight of a pixel position where the motion vector of the pixel point is (1/2, 1/2).
Further, in the present application, if the encoder first determines the position for the second filter, that is, first determines the integer-pixel motion vector and the sub-pixel motion vector corresponding to the current pixel, then when setting the second weight of the point-based prediction, a smaller second weight is set for pixel positions where the absolute value of the sub-pixel motion vector is larger, and a larger second weight is set for pixel positions where the absolute value of the sub-pixel motion vector is smaller.
Exemplarily, in the present application, the method for deriving the second weight w from the motion vector (x, y) of the sub-pixel is:
w=clip3(0,1,1-abs(x)-abs(y))
where abs denotes the absolute value and clip3(a, b, c) means that the result is a if c is less than a, b if c is greater than b, and c otherwise. The values here are normalized so that 1 in x and y represents one full pixel; the expression can be transformed for computational convenience, for example by using bit shifts to avoid fractional values.
It is understood that, in the present application, the second weight w may be multiplied by a factor, etc., in order to match it to the first weight of the sub-block-based prediction value.
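A direct transcription of the above derivation, with the matching example from the earlier paragraphs:

```python
def clip3(a, b, c):
    """clip3(a, b, c): a if c < a, b if c > b, otherwise c."""
    return a if c < a else (b if c > b else c)

def second_weight(x, y):
    """w = clip3(0, 1, 1 - |x| - |y|), where (x, y) is the sub-pixel MV
    normalized so that 1 represents one full pixel."""
    return clip3(0.0, 1.0, 1.0 - abs(x) - abs(y))

# second_weight(0.25, 0.25) == 0.5 exceeds second_weight(0.5, 0.5) == 0.0,
# matching the (1/4, 1/4) versus (1/2, 1/2) example above.
```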
It can be understood that the order of the step 402 and the step 403 executed by the encoder is not limited by the inter-frame prediction method provided in the embodiment of the present application, that is, in the present application, the encoder may execute the step 402 first and then execute the step 403, may execute the step 403 first and then execute the step 402, and may execute the step 402 and the step 403 at the same time.
Step 404, determining a predicted value of the pixel point based on the first weight, the second weight, the first predicted value and the second predicted value.
In an embodiment of the present application, after obtaining a first predictor and a corresponding first weight based on prediction of a sub-block and obtaining a second predictor and a corresponding second weight based on prediction of a point, an encoder may obtain a predictor of a pixel point in a current block using the first weight, the second weight, the first predictor and the second predictor.
It can be understood that, in the embodiment of the present application, after performing the sub-block-based prediction and the point-based prediction, for a pixel point in the current block, after determining the first weight and the second weight respectively corresponding to the pixel point, the encoder may perform weighted average on the first predicted value and the second predicted value by using the first weight and the second weight, so as to obtain the predicted value of the pixel point.
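A minimal sketch of the weighted average for one pixel point; the normalization by division is an illustrative choice, since an integer implementation would use shifts and a rounding offset:

```python
def combine_predictions(p1, p2, w1, w2):
    """Weighted average of the sub-block-based prediction value p1 and the
    point-based prediction value p2 for one pixel point.  An integer
    implementation would use shifts and a rounding offset instead of the
    division."""
    return (w1 * p1 + w2 * p2) / (w1 + w2)
```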
Further, in this application, before determining the third predictor of the current block according to the predictors of the pixel points, that is, before step 405, the method for performing inter prediction by the encoder may further include the following steps:
and 406, determining a prediction value of a pixel point in the current block based on the first prediction value and the second prediction value.
In the embodiment of the present application, after the encoder obtains the first prediction value based on the prediction of the sub-block and obtains the second prediction value based on the prediction of the point, the encoder may obtain the prediction value of one pixel point in the current block by using the first prediction value and the second prediction value.
It can be understood that, in the embodiment of the present application, compared to step 404, the encoder may also directly perform average operation on the first predicted value and the second predicted value of the same pixel point after obtaining the first predicted value and the second predicted value respectively, so as to obtain the predicted value of the pixel point.
That is, in the present application, the encoder may determine the final prediction result by directly using the first prediction value and the second prediction value without performing weighted average operation on the two prediction values.
In the present application, the method proposed in step 406 may be used as one of the cases of step 404, that is, the case where the first weight and the second weight are equal.
Step 405, determining a third predicted value of the current block according to the predicted value of the pixel point; wherein the third prediction value is used to determine a residual of the current block.
In an embodiment of the application, after the encoder determines the prediction value of the pixel point based on the first weight, the second weight, the first prediction value and the second prediction value, the encoder may determine the third prediction value of the current block according to the prediction value of the pixel point.
It should be noted that, in the present application, the encoder may perform filtering processing on all the pixel points in the current block according to the above method and determine the third prediction value of the current block using the prediction values of all the pixel points, or may perform filtering processing on only part of the pixel points in the current block and determine the third prediction value using the prediction values of that part of the pixel points.
It can be understood that, in the embodiment of the present application, the third prediction value is used to determine the residual of the current block.
Further, in the embodiment of the present application, after traversing all or part of the pixel points of the current block, the encoder may add the prediction values of those pixel points to obtain a summation result, and may normalize the summation result to finally obtain the third prediction value of the current block.
Further, in the embodiment of the present application, the inter prediction methods proposed in the above steps 401 to 406 may be applied to both unidirectional prediction and bidirectional prediction.
Further, in an embodiment of the present application, the method for inter prediction by an encoder may further include the following steps:
step 501, determining an inter-frame prediction direction.
Step 502, if the inter-frame prediction direction is bidirectional prediction, determining a fourth prediction value of a pixel point in each sub-block by using a first motion vector of each sub-block and determining a fifth prediction value of each pixel point by using a second motion vector of each pixel point based on the first prediction direction; and determining a sixth predicted value of the pixel point in each sub-block by using the first motion vector of each sub-block and determining a seventh predicted value of each pixel point by using the second motion vector of each pixel point based on the second prediction direction.
Step 503, for a pixel point of the current block, determining a third weight and a fourth weight of the pixel point based on the first prediction direction; and determining a fifth weight and a sixth weight of the pixel point based on the second prediction direction.
Step 504, obtaining a predicted value of the pixel point corresponding to the first prediction direction based on the third weight, the fourth weight, the fourth predicted value and the fifth predicted value, and obtaining a predicted value of the pixel point corresponding to the second prediction direction based on the fifth weight, the sixth weight, the sixth predicted value and the seventh predicted value.
Step 505, determining a predicted value of the pixel point according to the predicted value corresponding to the first prediction direction and the predicted value corresponding to the second prediction direction.
That is to say, in the present application, when the encoder performs bi-directional prediction, for each of the two unidirectional predictions, a set of prediction values is obtained for the current block using the sub-block-based prediction and, at the same time, another set of prediction values is obtained using the point-based prediction; the prediction value of a pixel point in that direction is obtained by weighted-averaging the two prediction values of the same pixel point. The prediction values of the two unidirectional predictions are then averaged (or weighted-averaged), finally yielding the bi-directional prediction value.
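A sketch of this two-step combination, following steps 502 to 505 above with floating-point arithmetic for clarity:

```python
def bi_predict_per_direction(p4, p5, w3, w4, p6, p7, w5, w6):
    """Weight within each prediction direction first, then average the two
    directional prediction values (steps 504 and 505 above)."""
    pred_dir1 = (w3 * p4 + w4 * p5) / (w3 + w4)  # first prediction direction
    pred_dir2 = (w5 * p6 + w6 * p7) / (w5 + w6)  # second prediction direction
    return (pred_dir1 + pred_dir2) / 2.0
```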
It should be noted that, in the present application, the averaging operation or the weighted average operation in the inter-frame prediction processing process may be performed after the entire prediction blocks that need to be averaged or weighted averaged are obtained, or may be performed after the prediction blocks of the sub-blocks that need to be averaged or weighted averaged are obtained, or may be performed after the prediction blocks of the points that need to be averaged or weighted averaged are obtained, which is not particularly limited in the present application.
Further, in an embodiment of the present application, the method for inter prediction by an encoder may further include the following steps:
step 501, determining an inter-frame prediction direction.
Step 502, if the inter-frame prediction direction is bidirectional prediction, determining a fourth prediction value of a pixel point in each sub-block by using a first motion vector of each sub-block and determining a fifth prediction value of each pixel point by using a second motion vector of each pixel point based on the first prediction direction; and determining a sixth predicted value of the pixel point in each sub-block by using the first motion vector of each sub-block and determining a seventh predicted value of each pixel point by using the second motion vector of each pixel point based on the second prediction direction.
Step 503, for a pixel point of the current block, determining a third weight and a fourth weight of the pixel point based on the first prediction direction; and determining a fifth weight and a sixth weight of the pixel point based on the second prediction direction.
Step 506, determining a predicted value of the pixel point based on the third weight, the fourth weight, the fifth weight, the sixth weight, the fourth predicted value, the fifth predicted value, the sixth predicted value and the seventh predicted value.
That is to say, in the present application, when the encoder performs bidirectional prediction, for each unidirectional prediction, a set of prediction values is obtained for the current block using the sub-block-based prediction and another set using the point-based prediction; the four prediction values of the two unidirectional predictions are then averaged (or weighted-averaged), finally yielding the bidirectional prediction value.
It can be seen that for bi-directional prediction, the encoder may determine the respective predicted values of the two prediction directions and then perform an averaging or weighted average operation, or may not calculate the predicted value of each prediction direction separately, but directly obtain the weighted average of the first prediction direction sub-block-based prediction block, the first prediction direction point-based prediction block, the second prediction direction sub-block-based prediction block, and the second prediction direction point-based prediction block.
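The single-step alternative can be sketched as follows; the equivalence noted in the comment holds when the per-direction weight sums are equal:

```python
def bi_predict_joint(p4, p5, p6, p7, w3, w4, w5, w6):
    """Single-step alternative (step 506 above): one weighted average over
    all four prediction values.  Equivalent to the two-step form whenever
    the per-direction weight sums w3 + w4 and w5 + w6 are equal."""
    return (w3 * p4 + w4 * p5 + w5 * p6 + w6 * p7) / (w3 + w4 + w5 + w6)
```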
It can be understood that, in the present application, the averaging operation may be regarded as a special case of the weighted averaging operation; likewise, in some scenarios, a pixel point and its pixel position may be understood as different expressions of the same concept.
Further, the inter-frame prediction method proposed in the present application may be applied to only the luminance component, or may be applied to the luminance component and the chrominance component, or may be applied to some or all of the components in any other format, such as RGB. The embodiments of the present application take the luminance component as an example for explanation, but are not limited to the luminance component.
That is, the inter prediction method proposed in the present application may be applied to any image component, and in the present embodiment, a prediction scheme is exemplarily used for a luminance component, but may also be applied to a chrominance component, or any other format of components. The inter-frame prediction method proposed in the present application can also be applied to any video format, including but not limited to YUV format, including but not limited to the luma component in YUV format.
The embodiment provides an inter-frame prediction method which can use sub-block-based prediction for the current block to obtain one set of prediction values, namely the first prediction values, and use point-based prediction for the current block to obtain another set of prediction values, namely the second prediction values; after determining, for the same pixel point in the current block, a first weight for the sub-block-based prediction and a second weight for the point-based prediction, the first prediction value and the second prediction value are weighted-averaged using the first weight and the second weight to finally obtain a new prediction value of the current block.
Based on the above embodiments, in yet another embodiment of the present application, fig. 25 is a schematic diagram of a first constituent structure of a decoder, and as shown in fig. 25, a decoder 300 according to an embodiment of the present application may include a parsing part 301 and a first determining part 302;
the analysis part 301 is configured to analyze the code stream and obtain the prediction mode parameter of the current block;
the first determining part 302 is configured to determine a first prediction value of a pixel point in each sub-block of the current block using a first motion vector of each sub-block and a second prediction value of each pixel point of the current block using a second motion vector of each pixel point when the prediction mode parameter indicates that the inter prediction value of the current block is determined using an inter prediction mode; wherein the current block comprises one or more sub-blocks; the current block comprises one or more pixel points; determine a first weight and a second weight of a pixel point in the current block; wherein the first weight corresponds to the first prediction value and the second weight corresponds to the second prediction value; determine a prediction value of the pixel point based on the first weight, the second weight, the first prediction value and the second prediction value; and determine a third prediction value of the current block according to the prediction value of the pixel point; wherein the third prediction value is used to determine a reconstructed value of the current block.
Fig. 26 is a schematic diagram illustrating a composition structure of a decoder, and as shown in fig. 26, the decoder 300 according to the embodiment of the present application may further include a first processor 303, a first memory 304 storing an executable instruction of the first processor 303, a first communication interface 305, and a first bus 306 for connecting the first processor 303, the first memory 304, and the first communication interface 305.
Further, in an embodiment of the present application, the first processor 303 is configured to parse the code stream to obtain the prediction mode parameter of the current block; when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, determine a first prediction value of a pixel point in each sub-block of the current block using a first motion vector of each sub-block, and determine a second prediction value of each pixel point using a second motion vector of each pixel point of the current block; wherein the current block comprises one or more sub-blocks; the current block comprises one or more pixel points; determine a first weight and a second weight of a pixel point in the current block; wherein the first weight corresponds to the first prediction value and the second weight corresponds to the second prediction value; determine a prediction value of the pixel point based on the first weight, the second weight, the first prediction value and the second prediction value; and determine a third prediction value of the current block according to the prediction value of the pixel point; wherein the third prediction value is used to determine a reconstructed value of the current block.
Fig. 27 is a first schematic view of a component structure of an encoder, and as shown in fig. 27, an encoder 400 according to an embodiment of the present application may include a second determining portion 401;
the second determining part 401 is configured to determine a prediction mode parameter of the current block; when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, determine a first prediction value of a pixel point in each sub-block of the current block using a first motion vector of each sub-block, and determine a second prediction value of each pixel point using a second motion vector of each pixel point of the current block; wherein the current block comprises one or more sub-blocks; the current block comprises one or more pixel points; determine a first weight and a second weight of a pixel point in the current block; wherein the first weight corresponds to the first prediction value and the second weight corresponds to the second prediction value; determine a prediction value of the pixel point based on the first weight, the second weight, the first prediction value and the second prediction value; and determine a third prediction value of the current block according to the prediction value of the pixel point; wherein the third prediction value is used to determine a residual of the current block.
Fig. 28 is a schematic diagram of a composition structure of an encoder, and as shown in fig. 28, the encoder 400 according to the embodiment of the present application may further include a second processor 402, a second memory 403 in which an executable instruction of the second processor 402 is stored, a second communication interface 404, and a second bus 405 for connecting the second processor 402, the second memory 403, and the second communication interface 404.
Further, in an embodiment of the present application, the second processor 402 is configured to determine a prediction mode parameter of the current block; when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, determine a first prediction value of a pixel point in each sub-block of the current block using a first motion vector of each sub-block, and determine a second prediction value of each pixel point using a second motion vector of each pixel point of the current block; wherein the current block comprises one or more sub-blocks; the current block comprises one or more pixel points; for a pixel point in the current block, determine a first weight and a second weight of the pixel point; wherein the first weight corresponds to the first prediction value and the second weight corresponds to the second prediction value; determine a prediction value of the pixel point based on the first weight, the second weight, the first prediction value and the second prediction value; and determine a third prediction value of the current block according to the prediction value of the pixel point; wherein the third prediction value is used to determine a residual of the current block.
The embodiment of the application provides an encoder and a decoder, which can use sub-block-based prediction for the current block to obtain one set of prediction values, namely the first prediction values, and use point-based prediction for the current block to obtain another set of prediction values, namely the second prediction values; after determining, for the same pixel point in the current block, a first weight for the sub-block-based prediction and a second weight for the point-based prediction, the first prediction value and the second prediction value are weighted-averaged using the first weight and the second weight to finally obtain a new prediction value of the current block, thereby improving the accuracy of inter-frame prediction while reducing the complexity of calculation and the bandwidth, which in turn greatly improves the encoding performance and the encoding and decoding efficiency.
Embodiments of the present application provide a computer-readable storage medium on which a program is stored, the program, when executed by a processor, implementing the method described in the above embodiments.
Specifically, the program instructions corresponding to an inter-frame prediction method in this embodiment may be stored on a storage medium such as an optical disc, a hard disc, or a usb disk, and when the program instructions corresponding to an inter-frame prediction method in the storage medium are read or executed by an electronic device, the method includes the following steps:
analyzing the code stream to obtain the prediction mode parameter of the current block;
when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, determining a first prediction value of a pixel point in each sub-block of the current block using a first motion vector of each sub-block, and determining a second prediction value of each pixel point using a second motion vector of each pixel point of the current block; wherein the current block comprises one or more sub-blocks; the current block comprises one or more pixel points;
determining a first weight and a second weight of a pixel point in the current block; wherein the first weight corresponds to the first predicted value and the second weight corresponds to the second predicted value;
determining a predicted value of the pixel point based on the first weight, the second weight, the first predicted value and the second predicted value;
determining a third predicted value of the current block according to the predicted value of the pixel point; wherein the third prediction value is used to determine a reconstructed value of the current block.
Specifically, the program instructions corresponding to an inter-frame prediction method in the present embodiment may be stored on a storage medium such as an optical disc, a hard disc, a usb disk, or the like, and when the program instructions corresponding to an inter-frame prediction method in the storage medium are read or executed by an electronic device, the method includes the following steps:
determining a prediction mode parameter of a current block;
when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, determining a first prediction value of a pixel point in each sub-block of the current block using a first motion vector of each sub-block, and determining a second prediction value of each pixel point using a second motion vector of each pixel point of the current block; wherein the current block comprises one or more sub-blocks; the current block comprises one or more pixel points;
determining a first weight and a second weight of a pixel point in the current block; wherein the first weight corresponds to the first predicted value and the second weight corresponds to the second predicted value;
determining a predicted value of the pixel point based on the first weight, the second weight, the first predicted value and the second predicted value;
determining a third predicted value of the current block according to the predicted value of the pixel point; wherein the third prediction value is used to determine a residual of the current block.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments. Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict. The features disclosed in the several method or apparatus embodiments provided herein may be combined in any combination to arrive at a new method or apparatus embodiment without conflict.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Industrial applicability
According to the inter-frame prediction method, the encoder, the decoder and the computer storage medium provided by the embodiments of the present application, the decoder parses the code stream to obtain the prediction mode parameter of the current block; when the prediction mode parameter indicates that the inter prediction mode is used to determine the inter prediction value of the current block, a first prediction value of a pixel point in each sub-block is determined using a first motion vector of each sub-block of the current block, and a second prediction value of each pixel point is determined using a second motion vector of each pixel point of the current block; wherein the current block comprises one or more sub-blocks; the current block comprises one or more pixel points; a first weight and a second weight of a pixel point in the current block are determined, where the first weight corresponds to the first prediction value and the second weight corresponds to the second prediction value; a prediction value of the pixel point is determined based on the first weight, the second weight, the first prediction value and the second prediction value; and a third prediction value of the current block is determined according to the prediction value of the pixel point, where the third prediction value is used to determine a reconstructed value of the current block. That is to say, the inter-frame prediction method provided by the present application can use sub-block-based prediction for the current block to obtain one set of prediction values, namely the first prediction values, and use point-based prediction for the current block to obtain another set of prediction values, namely the second prediction values; after determining, for the same pixel point in the current block, a first weight for the sub-block-based prediction and a second weight for the point-based prediction, the first prediction value and the second prediction value are weighted-averaged using the first weight and the second weight to finally obtain a new prediction value of the current block, thereby improving the accuracy of inter-frame prediction while reducing the complexity of calculation and the bandwidth, which in turn greatly improves the encoding performance and the encoding and decoding efficiency.

Claims (43)

  1. An inter prediction method applied to a decoder, the method comprising:
    analyzing the code stream to obtain the prediction mode parameter of the current block;
    when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, determining a first predicted value of a pixel point in each sub-block of the current block using a first motion vector of each sub-block, and determining a second predicted value of each pixel point using a second motion vector of each pixel point of the current block; wherein the current block comprises one or more sub-blocks; the current block comprises one or more pixel points;
    determining a first weight and a second weight of a pixel point in the current block; wherein the first weight corresponds to the first predicted value and the second weight corresponds to the second predicted value;
    determining a predicted value of the pixel point based on the first weight, the second weight, the first predicted value and the second predicted value;
    determining a third predicted value of the current block according to the predicted value of the pixel point; wherein the third prediction value is used to determine a reconstructed value of the current block.
  2. The method of claim 1, wherein prior to the determining the third predicted value of the current block based on the predicted value of the pixel point, the method further comprises:
    analyzing the code stream and determining the inter-frame prediction direction;
    if the inter-frame prediction direction is bidirectional prediction, determining a fourth predicted value of a pixel point in each sub-block by using the first motion vector of each sub-block and determining a fifth predicted value of each pixel point by using the second motion vector of each pixel point based on the first prediction direction; determining a sixth predicted value of a pixel point in each sub-block by using the first motion vector of each sub-block and determining a seventh predicted value of each pixel point by using the second motion vector of each pixel point based on a second prediction direction;
    for a pixel point of the current block, determining a third weight and a fourth weight of the pixel point based on the first prediction direction; determining a fifth weight and a sixth weight of the pixel point based on the second prediction direction;
    obtaining a predicted value of the pixel point corresponding to the first prediction direction based on the third weight, the fourth predicted value and the fifth predicted value, and obtaining a predicted value of the pixel point corresponding to the second prediction direction based on the fifth weight, the sixth predicted value and the seventh predicted value;
    and determining the predicted value of the pixel point according to the predicted value corresponding to the first prediction direction and the predicted value corresponding to the second prediction direction.
  3. The method of claim 1, wherein prior to the determining the third predicted value of the current block based on the predicted value of the pixel point, the method further comprises:
    analyzing the code stream and determining the inter-frame prediction direction;
    if the inter-frame prediction direction is bidirectional prediction, determining a fourth predicted value of a pixel point in each sub-block by using the first motion vector of each sub-block and determining a fifth predicted value of each pixel point by using the second motion vector of each pixel point based on the first prediction direction; determining a sixth predicted value of a pixel point in each sub-block by using the first motion vector of each sub-block and determining a seventh predicted value of each pixel point by using the second motion vector of each pixel point based on a second prediction direction;
    for a pixel point of the current block, determining a third weight and a fourth weight of the pixel point based on the first prediction direction; determining a fifth weight and a sixth weight of the pixel point based on the second prediction direction;
    determining a predicted value of the pixel point based on the third weight, the fourth weight, the fifth weight, the sixth weight, the fourth predicted value, the fifth predicted value, the sixth predicted value, and the seventh predicted value.
  4. The method of claim 1, wherein the determining a first predicted value of a pixel point in each sub-block of the current block using a first motion vector of each sub-block comprises:
    for a sub-block of the current block, determining a first filtering parameter according to the first motion vector;
    and performing filtering processing by using a first filter based on the first filtering parameter to obtain the first predicted value.
  5. The method of claim 4, wherein the first filter is any one of the following filters: an n-tap interpolation filter, a separable two-dimensional filter, or a non-separable two-dimensional filter; wherein n is any one of the following values: 8, 6, 5, 4, 3, or 2.
  6. The method of claim 4, wherein the determining a first filtering parameter according to the first motion vector comprises:
    determining a first scale parameter;
    determining the first filtering parameter according to the first scale parameter and the first motion vector; wherein the first filtering parameter is a first order function, a second order function, or a higher order function of the first motion vector.
  7. The method of claim 4, wherein the determining a first filtering parameter according to the first motion vector comprises:
    determining a mapping table between a first phase and the motion vector;
    and determining the first filtering parameter according to the mapping table and the first motion vector.
  8. The method of claim 1, wherein the determining a second predicted value of each pixel point using a second motion vector of each pixel point of the current block comprises:
    for a pixel point of the current block, determining a second filtering parameter according to the second motion vector;
    and performing filtering processing by using a second filter based on the second filtering parameter to obtain the second predicted value.
  9. The method of claim 8, wherein the second filter is any one of the following filters: an m-tap interpolation filter, a separable two-dimensional filter, or a non-separable two-dimensional filter; wherein m is any one of the following values: 8, 6, 5, 4, 3, or 2.
  10. The method of claim 8, wherein the determining a second filtering parameter according to the second motion vector comprises:
    determining a second scale parameter;
    determining the second filtering parameter according to the second scale parameter and the second motion vector; wherein the second filtering parameter is a first order function, a second order function, or a higher order function of the second motion vector.
  11. The method of claim 8, wherein the determining a second filtering parameter according to the second motion vector comprises:
    determining a mapping table between a second phase and the motion vector;
    and determining the second filtering parameter according to the mapping table and the second motion vector.
  12. The method of claim 1, wherein the determining a first weight and a second weight of a pixel point in the current block comprises:
    determining a target sub-block corresponding to the pixel point;
    determining a first distance between the pixel position of the pixel point and the reference position of the target sub-block; the reference position is a position used by the first motion vector of the target sub-block;
    determining the first weight according to the first distance; wherein the first distance is inversely proportional to the first weight.
  13. The method of claim 1, wherein the determining a first weight and a second weight of a pixel point in the current block comprises:
    determining a target sub-block corresponding to the pixel point;
    determining a deviation value between a motion vector of the pixel point and the first motion vector of the target sub-block;
    determining the first weight according to the deviation value; wherein the deviation value is inversely proportional to the first weight.
  14. The method of claim 1, wherein the determining a first weight and a second weight of a pixel point in the current block comprises:
    determining a sub-pixel and an integer pixel corresponding to the pixel point in a reference image;
    determining a second distance between the sub-pixel and the integer pixel;
    determining the second weight according to the second distance; wherein the second distance is inversely proportional to the second weight.
  15. The method of claim 1, wherein the determining a first weight and a second weight of a pixel point in the current block comprises:
    determining an absolute value of the motion vector of the pixel point;
    determining the second weight according to the absolute value; wherein the absolute value is inversely proportional to the second weight.
  16. The method according to any one of claims 4 to 11, wherein before the determining the third predicted value of the current block based on the predicted value of the pixel point, the method further comprises:
    and for a pixel point in the current block, determining a predicted value of the pixel point based on the first predicted value and the second predicted value.
  17. The method of claim 1, wherein the method further comprises:
    taking a sub-block of the current block as a reference block;
    determining a first predicted value of a pixel point in the reference block by using a first motion vector of the reference block, and determining a second predicted value of each pixel point by using a second motion vector of each pixel point of the reference block; wherein the reference block comprises one or more pixel points.
  18. The method of claim 9, wherein if the second filter is a horizontally and vertically separable 3-tap filter, the method further comprises:
    the motion vector of the sub-pixel corresponding to each pixel point is in a range from -1/2 pixel to 1/2 pixel; or,
    the motion vector of the sub-pixel corresponding to each pixel point is in a range from -1 pixel to 1 pixel.
  19. The method of claim 9, wherein if the second filter is a horizontally and vertically separable 3-tap filter, the method further comprises:
    determining a reference pixel point at the same position as each pixel point in the reference image;
    and determining the reference pixel point as the center of the second filter.
  20. An inter prediction method applied to an encoder, the method comprising:
    determining a prediction mode parameter of a current block;
    when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, determining a first predicted value of a pixel point in each sub-block of the current block using a first motion vector of each sub-block, and determining a second predicted value of each pixel point using a second motion vector of each pixel point of the current block; wherein the current block comprises one or more sub-blocks; the current block comprises one or more pixel points;
    determining a first weight and a second weight of a pixel point in the current block; wherein the first weight corresponds to the first predicted value and the second weight corresponds to the second predicted value;
    determining a predicted value of the pixel point based on the first weight, the second weight, the first predicted value and the second predicted value;
    determining a third predicted value of the current block according to the predicted value of the pixel point; wherein the third predicted value is used to determine a residual of the current block.
  21. The method of claim 20, wherein prior to the determining the third predicted value of the current block based on the predicted value of the pixel point, the method further comprises:
    determining an inter-frame prediction direction;
    if the inter-frame prediction direction is bidirectional prediction, determining a fourth predicted value of a pixel point in each sub-block by using the first motion vector of each sub-block and determining a fifth predicted value of each pixel point by using the second motion vector of each pixel point based on the first prediction direction; determining a sixth predicted value of a pixel point in each sub-block by using the first motion vector of each sub-block and determining a seventh predicted value of each pixel point by using the second motion vector of each pixel point based on a second prediction direction;
    for a pixel point of the current block, determining a third weight and a fourth weight of the pixel point based on the first prediction direction; determining a fifth weight and a sixth weight of the pixel point based on the second prediction direction;
    obtaining a predicted value of the pixel point corresponding to the first prediction direction based on the third weight, the fourth predicted value and the fifth predicted value, and obtaining a predicted value of the pixel point corresponding to the second prediction direction based on the fifth weight, the sixth predicted value and the seventh predicted value;
    and determining the predicted value of the pixel point according to the predicted value corresponding to the first prediction direction and the predicted value corresponding to the second prediction direction.
  22. The method of claim 20, wherein prior to the determining the third predicted value of the current block based on the predicted value of the pixel point, the method further comprises:
    determining an inter-frame prediction direction;
    if the inter-frame prediction direction is bidirectional prediction, determining a fourth predicted value of a pixel point in each sub-block by using the first motion vector of each sub-block and determining a fifth predicted value of each pixel point by using the second motion vector of each pixel point based on the first prediction direction; determining a sixth predicted value of a pixel point in each sub-block by using the first motion vector of each sub-block and determining a seventh predicted value of each pixel point by using the second motion vector of each pixel point based on a second prediction direction;
    for a pixel point of the current block, determining a third weight and a fourth weight of the pixel point based on the first prediction direction; determining a fifth weight and a sixth weight of the pixel point based on the second prediction direction;
    determining a predicted value of the pixel point based on the third weight, the fourth weight, the fifth weight, the sixth weight, the fourth predicted value, the fifth predicted value, the sixth predicted value, and the seventh predicted value.
  23. The method of claim 20, wherein the determining a first predicted value of a pixel point in each sub-block of the current block using a first motion vector of each sub-block comprises:
    for a sub-block of the current block, determining a first filtering parameter according to the first motion vector;
    and performing filtering processing by using a first filter based on the first filtering parameter to obtain the first predicted value.
  24. The method of claim 23, wherein the first filter is any one of the following filters: an n-tap interpolation filter, a separable two-dimensional filter, or a non-separable two-dimensional filter; wherein n is any one of the following values: 8, 6, 5, 4, 3, or 2.
  25. The method of claim 23, wherein the determining a first filtering parameter according to the first motion vector comprises:
    determining a first scale parameter;
    determining the first filtering parameter according to the first scale parameter and the first motion vector; wherein the first filtering parameter is a first order function, a second order function, or a higher order function of the first motion vector.
  26. The method of claim 23, wherein the determining a first filtering parameter according to the first motion vector comprises:
    determining a mapping table between a first phase and the motion vector;
    and determining the first filtering parameter according to the mapping table and the first motion vector.
  27. The method of claim 20, wherein the determining a second predicted value of each pixel point using a second motion vector of each pixel point of the current block comprises:
    for a pixel point of the current block, determining a second filtering parameter according to the second motion vector;
    and performing filtering processing by using a second filter based on the second filtering parameter to obtain the second predicted value.
  28. The method of claim 27, wherein the second filter is any one of the following filters: an m-tap interpolation filter, a separable two-dimensional filter, or a non-separable two-dimensional filter; wherein m is any one of the following values: 8, 6, 5, 4, 3, or 2.
  29. The method of claim 27, wherein the determining a second filtering parameter according to the second motion vector comprises:
    determining a second scale parameter;
    determining the second filtering parameter according to the second scale parameter and the second motion vector; wherein the second filtering parameter is a first order function, a second order function, or a higher order function of the second motion vector.
  30. The method of claim 27, wherein the determining a second filtering parameter according to the second motion vector comprises:
    determining a mapping table between a second phase and the motion vector;
    and determining the second filtering parameter according to the mapping table and the second motion vector.
  31. The method of claim 20, wherein the determining a first weight and a second weight of a pixel point in the current block comprises:
    determining a target sub-block corresponding to the pixel point;
    determining a first distance between the pixel position of the pixel point and the reference position of the target sub-block; the reference position is a position used by the first motion vector of the target sub-block;
    determining the first weight according to the first distance; wherein the first distance is inversely proportional to the first weight.
  32. The method of claim 20, wherein the determining a first weight and a second weight of a pixel point in the current block comprises:
    determining a target sub-block corresponding to the pixel point;
    determining a deviation value between a motion vector of the pixel point and the first motion vector of the target sub-block;
    determining the first weight according to the deviation value; wherein the deviation value is inversely proportional to the first weight.
  33. The method of claim 20, wherein the determining a first weight and a second weight of a pixel point in the current block comprises:
    determining a sub-pixel and an integer pixel corresponding to the pixel point in a reference image;
    determining a second distance between the sub-pixel and the integer pixel;
    determining the second weight according to the second distance; wherein the second distance is inversely proportional to the second weight.
  34. The method of claim 20, wherein the determining a first weight and a second weight of a pixel point in the current block comprises:
    determining an absolute value of the motion vector of the pixel point;
    determining the second weight according to the absolute value; wherein the absolute value is inversely proportional to the second weight.
  35. The method according to any one of claims 23 to 30, wherein before the determining the third predicted value of the current block based on the predicted value of the pixel point, the method further comprises:
    and for a pixel point in the current block, determining a predicted value of the pixel point based on the first predicted value and the second predicted value.
  36. The method of claim 20, wherein the method further comprises:
    taking a sub-block of the current block as a reference block;
    determining a first predicted value of a pixel point in the reference block by using a first motion vector of the reference block, and determining a second predicted value of each pixel point by using a second motion vector of each pixel point of the reference block; wherein the reference block comprises one or more pixel points.
  37. The method of claim 28, wherein if the second filter is a horizontally and vertically separable 3-tap filter, the method further comprises:
    the motion vector of the sub-pixel corresponding to each pixel point is in a range from -1/2 pixel to 1/2 pixel; or,
    the motion vector of the sub-pixel corresponding to each pixel point is in a range from -1 pixel to 1 pixel.
  38. The method of claim 28, wherein if the second filter is a horizontally and vertically separable 3-tap filter, the method further comprises:
    determining a reference pixel point at the same position as each pixel point in the reference image;
    and determining the reference pixel point as the center of the second filter.
  39. A decoder, comprising a parsing part and a first determining part;
    the parsing part is configured to analyze the code stream and obtain the prediction mode parameter of the current block;
    the first determining part is configured to: when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, determine a first predicted value of a pixel point in each sub-block of the current block using a first motion vector of each sub-block, and determine a second predicted value of each pixel point using a second motion vector of each pixel point of the current block; wherein the current block comprises one or more sub-blocks; the current block comprises one or more pixel points; determine a first weight and a second weight of a pixel point in the current block; wherein the first weight corresponds to the first predicted value and the second weight corresponds to the second predicted value; determine a predicted value of the pixel point based on the first weight, the second weight, the first predicted value and the second predicted value; and determine a third predicted value of the current block according to the predicted value of the pixel point; wherein the third predicted value is used to determine a reconstructed value of the current block.
  40. A decoder, comprising a first processor and a first memory storing instructions executable by the first processor, wherein the instructions, when executed by the first processor, implement the method of any one of claims 1 to 19.
  41. An encoder, comprising a second determining part;
    the second determining part is configured to: determine a prediction mode parameter of the current block; when the prediction mode parameter indicates that an inter prediction value of the current block is determined using an inter prediction mode, determine a first predicted value of a pixel point in each sub-block of the current block using a first motion vector of each sub-block, and determine a second predicted value of each pixel point using a second motion vector of each pixel point of the current block; wherein the current block comprises one or more sub-blocks; the current block comprises one or more pixel points; for a pixel point in the current block, determine a first weight and a second weight of the pixel point; wherein the first weight corresponds to the first predicted value and the second weight corresponds to the second predicted value; determine a predicted value of the pixel point based on the first weight, the second weight, the first predicted value and the second predicted value; and determine a third predicted value of the current block according to the predicted value of the pixel point; wherein the third predicted value is used to determine a residual of the current block.
  42. An encoder, comprising a second processor and a second memory storing instructions executable by the second processor, wherein the instructions, when executed by the second processor, implement the method of any one of claims 20 to 38.
  43. A computer storage medium storing a computer program which, when executed by a first processor, implements the method of any one of claims 1 to 19, or, when executed by a second processor, implements the method of any one of claims 20 to 38.
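As a reading aid for the weight determinations recited in claims 12 and 14 above (mirrored on the encoder side in claims 31 and 33), the following Python sketch shows one way the stated inverse relationships between distance and weight could be realized. It is illustrative only and forms no part of the claims; the function names, the L1 distance, and the linear fall-off are assumptions, as the claims require only that a larger distance yield a smaller weight.

```python
# Hypothetical sketch of the weight derivations in claims 12 and 14 (and their
# encoder-side counterparts, claims 31 and 33). All names, the L1 distance, and
# the linear fall-off are assumptions for illustration; the claims require only
# that the weight decrease as the corresponding distance grows.

def first_weight_from_distance(px, py, ref_x, ref_y, max_weight=8):
    """Claim 12: the first (sub-block) weight shrinks as the pixel moves away
    from the reference position used by the sub-block's first motion vector."""
    distance = abs(px - ref_x) + abs(py - ref_y)  # assumed L1 distance
    return max(1, max_weight - distance)

def second_weight_from_subpel(frac_x, frac_y, max_weight=8):
    """Claim 14: the second (point-based) weight shrinks as the sub-pixel moves
    away from the nearest integer pixel; frac_x and frac_y are the fractional
    parts of the pixel's motion vector, each in [0, 1)."""
    dx = min(frac_x, 1.0 - frac_x)  # distance to the nearest integer sample
    dy = min(frac_y, 1.0 - frac_y)
    return max(1, round(max_weight * (1.0 - (dx + dy))))
```

In a blend such as the one sketched earlier, the complementary weight can then be taken as max_weight minus the derived weight, so the pair still sums to a fixed total.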
CN202080093968.7A 2020-10-16 2020-10-16 Inter-frame prediction method, encoder, decoder, and computer storage medium Pending CN114982228A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/121676 WO2022077495A1 (en) 2020-10-16 2020-10-16 Inter-frame prediction methods, encoder and decoders and computer storage medium

Publications (1)

Publication Number Publication Date
CN114982228A (en) 2022-08-30

Family ID: 81208900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080093968.7A Pending CN114982228A (en) 2020-10-16 2020-10-16 Inter-frame prediction method, encoder, decoder, and computer storage medium

Country Status (2)

Country Link
CN (1) CN114982228A (en)
WO (1) WO2022077495A1 (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170084055A (en) * 2014-11-06 2017-07-19 삼성전자주식회사 Video encoding method and apparatus, video decoding method and apparatus
WO2016175549A1 (en) * 2015-04-27 2016-11-03 엘지전자 주식회사 Method for processing video signal and device for same
KR20200038947A (en) * 2017-08-22 2020-04-14 파나소닉 인텔렉츄얼 프로퍼티 코포레이션 오브 아메리카 Image encoder, image decoder, image encoding method and image decoding method
CN116582682A (en) * 2017-08-22 2023-08-11 松下电器(美国)知识产权公司 Image encoder, image decoder, and non-transitory computer readable medium
CN111107373B (en) * 2018-10-29 2023-11-03 华为技术有限公司 Inter-frame prediction method based on affine prediction mode and related device
CN110719489B (en) * 2019-09-18 2022-02-18 浙江大华技术股份有限公司 Motion vector correction method, motion vector prediction method, motion vector encoding device, and storage device
CN111669584B (en) * 2020-06-11 2022-10-28 浙江大华技术股份有限公司 Inter-frame prediction filtering method and device and computer readable storage medium

Also Published As

Publication number Publication date
WO2022077495A1 (en) 2022-04-21

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination