WO2020181476A1

WO2020181476A1 - Video image prediction method and device

Info

Publication number: WO2020181476A1
Application number: PCT/CN2019/077726
Authority: WO
Inventors: 陈旭; 郑建铧
Original assignee: 华为技术有限公司
Priority date: 2019-03-11
Filing date: 2019-03-11
Publication date: 2020-09-17
Also published as: CN116600139A; CN113557738A; CN113557738B

Abstract

The present application provides a video image prediction method and device, for use in solving, to a certain extent, the problem in the prior art of low prediction accuracy. In embodiments of the present application, inter-frame prediction is performed by using a merge with motion vector difference (MMVD) mode in combination with a decoder-side motion vector refinement (DMVR) method (i.e., an MMVD-based DMVR method); for a situation that a bidirectional prediction process exists in an MMVD, decoding is performed by combining MVD information after the bidirectional prediction process is optimized; in this way, a matching relationship between a first reference image and a second reference image (i.e., between forward and backward prediction images) can be fully utilized; and compared with the traditional method, the present invention reduces redundancy to a certain extent, thereby relatively improving the prediction accuracy.

Description

Video image prediction method and device

Technical field

This application relates to the field of image coding and decoding technologies, and in particular to video image prediction methods and devices, and corresponding video encoders and video decoders.

Background art

With the development of information technology, high-definition TV, web conferencing, IPTV, 3D TV and other video services are developing rapidly, and video signals have become the most important way for people to obtain information in daily life due to their intuitiveness and efficiency. Because the video signal contains a large amount of data, it needs to take up a lot of transmission bandwidth and storage space. In order to effectively transmit and store video signals, it is necessary to compress and encode video signals. Video compression technology has increasingly become an indispensable key technology in the field of video applications.

The basic principle of video coding and compression is to use the correlation between spatial domain, time domain and codewords to remove redundancy as much as possible. The current popular approach is to use a hybrid video coding framework based on image blocks to implement video coding compression through steps such as prediction (including intra-frame prediction and inter-frame prediction), transformation, quantization, and entropy coding.

Among various video encoding/decoding schemes, motion estimation/motion compensation in inter-frame prediction is a key technology that affects encoding/decoding performance. When the existing inter-frame prediction adopts the merge with motion vector difference (MMVD) method, there is redundancy in the case of bidirectional prediction, which results in low decoding accuracy.

Summary of the invention

The embodiments of the present application provide video image prediction methods, devices, and corresponding encoders and decoders, which can reduce redundancy to a certain extent, improve image prediction accuracy, and thereby improve coding and decoding performance.

In the first aspect, an embodiment of the present application provides a video image prediction method, including:

Determine (or obtain) a first initial motion vector prediction value, a second initial motion vector prediction value, a first motion vector difference, and a second motion vector difference of the current image block to be processed; perform a motion vector correction process (also called a motion vector refinement process, for example decoder-side motion vector refinement (DMVR)) according to the first initial motion vector prediction value and the second initial motion vector prediction value to obtain a first modified motion vector prediction value and a second modified motion vector prediction value; determine a first motion vector predictor according to the first modified motion vector prediction value and the first motion vector difference, and determine a second motion vector predictor according to the second modified motion vector prediction value and the second motion vector difference; and predict the current image block to be processed according to the first motion vector predictor and the second motion vector predictor.
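As a non-limiting illustration, the order of operations in the first aspect (refine first, then apply the motion vector differences) may be sketched as follows. The function names, the tuple representation of a motion vector, and the toy refinement function are assumptions made for illustration only; the sketch also assumes that a motion vector predictor is obtained by adding the motion vector difference to the modified prediction value, as in conventional MMVD.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]  # (horizontal offset, vertical offset), hypothetical representation


def add_mv(a: MV, b: MV) -> MV:
    """Component-wise sum of two motion vectors."""
    return (a[0] + b[0], a[1] + b[1])


def predict_mvs_refine_then_mvd(
    mvp0_init: MV,
    mvp1_init: MV,
    mvd0: MV,
    mvd1: MV,
    refine: Callable[[MV, MV], Tuple[MV, MV]],
) -> Tuple[MV, MV]:
    """First aspect: refine the initial prediction values, then add the MVDs."""
    mvp0_mod, mvp1_mod = refine(mvp0_init, mvp1_init)  # motion vector correction
    # Assumption: predictor = modified prediction value + motion vector difference.
    return add_mv(mvp0_mod, mvd0), add_mv(mvp1_mod, mvd1)


def toy_refine(mv0: MV, mv1: MV) -> Tuple[MV, MV]:
    """Hypothetical stand-in for DMVR: shifts list0 by (+1, 0) and list1 by (-1, 0)."""
    return add_mv(mv0, (1, 0)), add_mv(mv1, (-1, 0))


mv0, mv1 = predict_mvs_refine_then_mvd((4, -2), (-4, 2), (8, 0), (-8, 0), toy_refine)
```

The two returned vectors would then be used for bidirectional prediction of the current block; a real refinement function would search the reference images rather than apply a fixed shift.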

In addition, it should be understood that the image block currently to be processed (referred to as the current block for short) is the image block currently being processed. For example, in the encoding process it is the image block currently being encoded (the encoding block); in the decoding process it is the image block currently being decoded (the decoding block).

In an example, the first initial motion vector prediction value is the initial motion vector prediction value of the first list (i.e., list0), and accordingly the second initial motion vector prediction value is the initial motion vector prediction value of the second list (i.e., list1).

In another example, the first initial motion vector prediction value is the initial motion vector prediction value in the first direction (for example, forward), and correspondingly the second initial motion vector prediction value is the initial motion vector prediction value in the second direction (for example, backward); this application does not limit this.

In addition, it should be noted that the initial motion information of the current image block in the embodiments of the present application may include a motion vector (MV) and reference image indication information; it may include either one or both of them. For example, where the encoder and decoder have agreed on a reference image, the initial motion information may include only the motion vector MV. The reference image indication information indicates which reconstructed image or images the current block uses as the reference image, and the motion vector indicates the position offset of the reference block relative to the current block within the reference image used, generally comprising a horizontal component offset and a vertical component offset. For example, if (x, y) represents the MV, x is the position offset in the horizontal direction and y is the position offset in the vertical direction. Adding the MV to the position of the current block gives the position of its reference block in the reference image. The reference image indication information may include a reference image list and/or a reference image index corresponding to the reference image list. The reference image index identifies, within the specified reference image list (list0 or list1), the reference image corresponding to the motion vector used. An image may be referred to as a frame, and a reference image may be referred to as a reference frame.
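For example, locating the reference block from the current block position and the MV may be sketched as below. The function name and the use of integer pixel units are assumptions for illustration; practical codecs typically store motion vectors at sub-pixel precision.

```python
from typing import Tuple


def reference_block_position(block_x: int, block_y: int, mv: Tuple[int, int]) -> Tuple[int, int]:
    """Return the top-left position of the reference block in the reference image.

    mv = (x, y): x is the horizontal offset and y is the vertical offset,
    both assumed here to be in whole pixels.
    """
    dx, dy = mv
    return (block_x + dx, block_y + dy)


position = reference_block_position(64, 32, (-3, 5))
```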

The initial motion information of the current image block in the embodiments of the present application is initial bidirectional prediction motion information, that is, it includes motion information for the forward and backward prediction directions. Here, the forward and backward prediction directions are the two prediction directions of the bidirectional prediction mode. It can be understood that "forward" and "backward" correspond respectively to reference image list 0 (list0, the first list above) and reference image list 1 (list1, the second list above).

It should be understood that the method in the embodiments of the present application may be executed by an image prediction device, for example a video encoder, a video decoder, or an electronic device with video encoding and decoding functions; specifically, it may be an inter prediction unit in a video encoder or a motion compensation unit in a video decoder.

In an example implementation, performing the motion vector correction process according to the first initial motion vector prediction value and the second initial motion vector prediction value to obtain the first modified motion vector prediction value and the second modified motion vector prediction value may include: performing a motion vector correction process according to the first initial motion vector prediction value to obtain the first modified motion vector prediction value, and performing a motion vector correction process according to the second initial motion vector prediction value to obtain the second modified motion vector prediction value.

The above design can be applied to inter-frame prediction on the encoding side, and can also be applied to inter-frame prediction on the decoding side.

In an implementation manner, the motion vector correction process may be a DMVR process, and in the embodiment of the present application, the two may be replaced with each other. Correspondingly, the first modified motion vector prediction value or the second modified motion vector prediction value may also be referred to as the first refined motion vector prediction value or the second refined motion vector prediction value.

The image prediction method in the embodiments of this application is suitable not only for the merge prediction mode (merge) and/or the advanced motion vector prediction mode (AMVP), but also for other modes in which the motion information of a spatial reference block, a temporal reference block, and/or an inter-view reference block is used to predict the motion information of the current image block, thereby improving coding and decoding performance.

With the solution provided by the embodiments of the present application, when bidirectional prediction is used for the current block, the bidirectional initial motion vector prediction values are first optimized by the motion vector refinement method and then combined with the MVD information for decoding. This makes full use of the matching relationship between the first reference image and the second reference image (that is, between the forward and backward prediction images); compared with the traditional method, redundancy is reduced to a certain extent, so prediction accuracy is relatively improved.

In a possible design, determining the first initial motion vector prediction value, the second initial motion vector prediction value, the first motion vector difference, and the second motion vector difference of the current image block to be processed includes:

When a first flag parsed from the code stream (such as mmvd_flag[x0][y0]) indicates that inter-frame prediction of the current image block to be processed uses the merge with motion vector difference (MMVD) mode, determine the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed.

For example, the first flag may be called mmvd_flag[x0][y0], the name also used in the standard text or code. As an example, when mmvd_flag[x0][y0] takes a first value, it indicates that inter-frame prediction of the current image block to be processed uses the MMVD mode; when mmvd_flag[x0][y0] takes a second value, it indicates that inter-frame prediction of the current image block to be processed does not use the MMVD mode. For example, the first value can be 1 (or true), and the second value can be 0 (or false).

The above design can be applied to the decoding side.

In the above design, inter-frame prediction is performed using the MMVD mode combined with the motion vector refinement method (i.e., an MMVD-based DMVR method). Where MMVD involves bidirectional prediction, the bidirectional prediction is optimized first and then combined with the MVD information for decoding. In this way, the matching relationship between the first reference image and the second reference image (that is, between the forward and backward prediction images) can be fully utilized. Compared with the traditional method, redundancy is reduced to a certain extent, and prediction accuracy is relatively improved.

In a possible design, applied to the encoding side, when it is determined that inter-frame prediction of the current image block to be processed uses the MMVD mode, the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed are determined.

On the encoding side, multiple inter-frame prediction modes may be available; for example, the inter-frame prediction mode with the lowest rate-distortion cost can be selected from among them. When the selected inter-frame prediction mode is MMVD, the solution provided in this application is applied. Alternatively, when the available inter-frame prediction modes include MMVD, the solution provided in this application can be used to determine the prediction block of the current block based on MMVD; that prediction block is then compared, by rate-distortion cost, with the prediction blocks determined by the other inter-frame prediction modes, and the inter-frame prediction mode with the lowest rate-distortion cost is selected.
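As an illustration of selecting the mode with the lowest rate-distortion cost, the following sketch assumes the commonly used Lagrangian form J = D + λ·R, which this application does not itself specify. The mode names, the distortion and rate figures, and the value of λ are hypothetical.

```python
from typing import Dict, Tuple


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Assumed Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def select_mode(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the mode whose (distortion, rate) pair gives the lowest cost."""
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))


# Hypothetical (distortion, rate in bits) measured for each inter prediction mode.
modes = {"merge": (100.0, 10.0), "mmvd": (90.0, 12.0), "amvp": (80.0, 25.0)}
best_mode = select_mode(modes, lam=2.0)
```

With these figures the costs are 120, 114, and 130 respectively, so the MMVD mode would be selected and the solution of this application would then be applied.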

In a possible design, the determining the first initial motion vector prediction value and the second initial motion vector prediction value of the current image block to be processed includes:

According to a candidate index (for example, a base candidate index) parsed from the code stream, determine the corresponding candidate motion information (for example, a base candidate) from the candidate list, where the candidate motion information includes a third motion vector predictor and a fourth motion vector predictor, the third motion vector predictor is used as the first initial motion vector predictor, and the fourth motion vector predictor is used as the second initial motion vector predictor; or, determine that the third motion vector predictor and the fourth motion vector predictor included in the candidate motion information at the first position in the candidate list are the first initial motion vector predictor and the second initial motion vector predictor.

The above design is applied to the decoding side.

In a possible design, when applied to the encoding side, the determining the first initial motion vector prediction value and the second initial motion vector prediction value of the current image block to be processed includes:

According to the rate-distortion cost algorithm, select candidate motion information (for example, a base candidate) from the candidate list, where the candidate motion information includes a third motion vector predictor and a fourth motion vector predictor, the third motion vector predictor is used as the first initial motion vector predictor, and the fourth motion vector predictor is used as the second initial motion vector predictor; or, determine that the third motion vector predictor and the fourth motion vector predictor included in the candidate motion information at the first position in the candidate list are the first initial motion vector predictor and the second initial motion vector predictor.

In a possible design, the executing a motion vector correction process according to the first initial motion vector predicted value and the second initial motion vector predicted value includes:

When the image block to which the candidate motion information belongs and the current image block to be processed belong to different images (for example, the candidate motion information corresponding to the candidate index (or the selected candidate motion information) is the motion information of the T1 pixel position of a temporal neighboring block of the current image block, which is not located in the image where the current image block to be processed is located), perform the motion vector correction process according to the first initial motion vector prediction value and the second initial motion vector prediction value.

In the above design, the motion vector correction process is performed when it is determined that the image block to which the candidate motion information belongs and the current image block to be processed belong to different images. Because the images differ, correction is more likely to find a better motion vector predictor, so redundancy can be reduced through the above design.

The above design can be applied to the encoding side or the decoding side.

In one possible design, it also includes:

When the image block to which the candidate motion information belongs and the current image block to be processed belong to the same image (for example, the candidate motion information corresponding to the candidate index (or the selected candidate motion information) is the motion information of the A0 pixel position of a spatial neighboring block of the current image block, which is located in the image where the current image block to be processed is located), determine a first target motion vector predictor according to the first initial motion vector predictor and the first motion vector difference, and determine a second target motion vector predictor according to the second initial motion vector predictor and the second motion vector difference; the current image block to be processed is then decoded according to the first target motion vector predictor and the second target motion vector predictor.

The above design can be applied to the encoding side as well as the decoding side.

It can be seen that in the above design, when it is determined that the image block to which the candidate motion information belongs and the current image block to be processed belong to the same image, the motion vector correction process is not performed. Because the images are the same, correction is less likely to find a better motion vector predictor, so skipping it can improve resource utilization to a certain extent.
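The two cases above (correction when the candidate comes from a different image, no correction when it comes from the same image) may be sketched as follows. The picture identifiers, function names, and toy refinement function are assumptions for illustration, as is the use of addition to combine a prediction value with a motion vector difference.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]


def add_mv(a: MV, b: MV) -> MV:
    return (a[0] + b[0], a[1] + b[1])


def derive_mvs(
    mvp0: MV, mvp1: MV, mvd0: MV, mvd1: MV,
    candidate_picture: int, current_picture: int,
    refine: Callable[[MV, MV], Tuple[MV, MV]],
) -> Tuple[MV, MV]:
    """Refine only when the candidate belongs to a different image (e.g. temporal)."""
    if candidate_picture != current_picture:
        mvp0, mvp1 = refine(mvp0, mvp1)  # motion vector correction process
    # Same image (e.g. spatial neighbour): the initial values are used directly.
    return add_mv(mvp0, mvd0), add_mv(mvp1, mvd1)


def toy_refine(mv0: MV, mv1: MV) -> Tuple[MV, MV]:
    """Hypothetical stand-in for DMVR with a fixed mirrored shift."""
    return add_mv(mv0, (1, 0)), add_mv(mv1, (-1, 0))


# Candidate from picture 8 while the current block is in picture 16: refined.
temporal = derive_mvs((4, 0), (-4, 0), (2, 0), (-2, 0), 8, 16, toy_refine)
# Candidate from the current picture itself: not refined.
spatial = derive_mvs((4, 0), (-4, 0), (2, 0), (-2, 0), 16, 16, toy_refine)
```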

Determine the corresponding candidate (for example, a base candidate) from the candidate list according to a candidate index (for example, a base candidate index) parsed from the code stream, where the candidate includes first candidate motion information and second candidate motion information, the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to different images, it is determined that the fifth motion vector predictor and the sixth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value (that is, the fifth motion vector predictor is the first initial motion vector prediction value, and the sixth motion vector predictor is the second initial motion vector prediction value); or,

When the image block to which the first candidate motion information or the second candidate motion information belongs belongs to the same image as the current image block to be processed, it is determined that the seventh motion vector predictor and the eighth motion vector predictor are the The first initial motion vector prediction value and the second initial motion vector prediction value (the seventh motion vector prediction value is the first initial motion vector prediction value, and the eighth motion vector prediction value is the second initial motion vector prediction value).

It should be understood that the candidate list is constructed from the motion information (for example, the motion vector MV) of image blocks encoded or decoded before the current image block to be processed. Some of those image blocks were processed with the motion vector correction method provided in this application, and others were processed in the traditional way. Consequently, in some cases the candidate index corresponds only to the second candidate motion information (that is, the original candidate motion information, obtained without correction). In other cases the candidate index corresponds to both the first candidate motion information (the candidate motion information obtained with correction during the earlier encoding or decoding) and the second candidate motion information. The fifth motion vector predictor and the sixth motion vector predictor are modified motion vector predictors, and the seventh motion vector predictor and the eighth motion vector predictor are original motion vector predictors.

It should be understood that the first candidate motion information and the second candidate motion information belong to the same image block (for example, the block at the A0 pixel position); the first candidate motion information has been corrected (by DMVR), and the second candidate motion information has not.

The above design is applied to the decoding side.
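A minimal sketch of this selection between the corrected and the original candidate motion information is given below. The class and field names are hypothetical, and the fallback to the original pair when no corrected pair is stored is an assumption drawn from the remark that some candidates carry only original motion information.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MV = Tuple[int, int]
MVPair = Tuple[MV, MV]


@dataclass
class BaseCandidate:
    picture_id: int                      # image to which the candidate's block belongs
    original: MVPair                     # (seventh, eighth) predictors, uncorrected
    corrected: Optional[MVPair] = None   # (fifth, sixth) predictors, corrected earlier


def select_initial_mvps(cand: BaseCandidate, current_picture_id: int) -> MVPair:
    """Pick the pair used as the first and second initial prediction values."""
    if cand.corrected is not None and cand.picture_id != current_picture_id:
        return cand.corrected   # different image: use the corrected pair
    return cand.original        # same image, or no corrected pair stored


cand = BaseCandidate(picture_id=8, original=((4, 0), (-4, 0)), corrected=((5, 0), (-5, 0)))
from_other_picture = select_initial_mvps(cand, current_picture_id=16)
from_same_picture = select_initial_mvps(cand, current_picture_id=8)
```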

In a possible design, applied to the encoding side, the determining the first initial motion vector predictor and the second initial motion vector predictor of the current image block to be processed includes:

According to the rate-distortion cost algorithm, determine the corresponding candidate (for example, a base candidate) from the candidate list, where the candidate includes first candidate motion information and second candidate motion information, the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to different images, it is determined that the fifth motion vector predictor and the sixth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value (that is, the fifth motion vector predictor is the first initial motion vector prediction value, and the sixth motion vector predictor is the second initial motion vector prediction value); or,

When the image block to which the first candidate motion information or the second candidate motion information belongs belongs to the same image as the current image block to be processed, it is determined that the seventh motion vector predictor and the eighth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value (that is, the seventh motion vector predictor is the first initial motion vector prediction value, and the eighth motion vector predictor is the second initial motion vector prediction value).

In a possible design, the candidate at the first position in the candidate list includes first candidate motion information (modified candidate motion information) and second candidate motion information (original candidate motion information). The first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

The determining the first initial motion vector prediction value and the second initial motion vector prediction value of the current image block to be processed includes:

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to different images, it is determined that the fifth motion vector predictor and the sixth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value;

When the image block to which the first candidate motion information or the second candidate motion information belongs belongs to the same image as the current image block to be processed, it is determined that the seventh motion vector predictor and the eighth motion vector predictor are the The first initial motion vector prediction value and the second initial motion vector prediction value.

Acquiring a first reference prediction block corresponding to the first initial motion vector prediction value, and a second reference prediction block corresponding to the second initial motion vector prediction value;

Determine a first modified reference prediction block according to the first reference prediction block, and determine a second modified reference prediction block according to the second reference prediction block;

Wherein, the difference between the first modified reference prediction block and the second modified reference prediction block is less than or equal to the difference between the first reference prediction block and the second reference prediction block; the first modified reference prediction block is an image block in a first preset area that has the same size as the first reference prediction block, and the first preset area includes the first reference prediction block; the second modified reference prediction block is an image block in a second preset area that has the same size as the second reference prediction block, and the second preset area includes the second reference prediction block; the first modified reference prediction block corresponds to the first modified motion vector predictor, and the second modified reference prediction block corresponds to the second modified motion vector predictor.
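The relationship above may be illustrated with a small exhaustive search. This is a sketch under stated assumptions: the difference between two blocks is measured by the sum of absolute differences (SAD), the offsets applied in the two reference images are mirrored as in conventional DMVR, and the preset area is a square of the given radius around each initial block. None of these choices is fixed by the text above, and all names and sample data are hypothetical. Because the search starts from the initial pair, the selected pair can never have a larger difference than the initial pair.

```python
from typing import List, Tuple

Image = List[List[int]]


def block(img: Image, x: int, y: int, w: int, h: int) -> Image:
    """Extract a w x h block whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in img[y:y + h]]


def sad(a: Image, b: Image) -> int:
    """Sum of absolute differences, the assumed difference measure."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def refine_pair(ref0: Image, ref1: Image, pos0: Tuple[int, int], pos1: Tuple[int, int],
                size: Tuple[int, int], radius: int) -> Tuple[Tuple[int, int], int]:
    """Return (offset, difference) of the best block pair inside the preset areas."""
    w, h = size
    best_offset = (0, 0)
    best_cost = sad(block(ref0, pos0[0], pos0[1], w, h), block(ref1, pos1[0], pos1[1], w, h))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Mirrored offsets (assumption): +offset in ref0, -offset in ref1.
            cost = sad(block(ref0, pos0[0] + dx, pos0[1] + dy, w, h),
                       block(ref1, pos1[0] - dx, pos1[1] - dy, w, h))
            if cost < best_cost:
                best_offset, best_cost = (dx, dy), cost
    return best_offset, best_cost


# Hypothetical 6x6 reference images; the caller keeps all blocks inside the images.
ref0 = [[10 * y + x for x in range(6)] for y in range(6)]
ref1 = [[10 * y + x + 2 for x in range(6)] for y in range(6)]
initial_cost = sad(block(ref0, 2, 2, 2, 2), block(ref1, 2, 2, 2, 2))
offset, cost = refine_pair(ref0, ref1, (2, 2), (2, 2), (2, 2), radius=1)
```

The returned offset, added to the first initial motion vector prediction value and subtracted from the second, would give the two modified prediction values under the mirroring assumption.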

In a possible design, determining a first modified reference prediction block according to the first reference prediction block and determining a second modified reference prediction block according to the second reference prediction block includes:

Performing a motion search according to the first reference prediction block pair to obtain at least one second reference prediction block pair;

Wherein, the first reference prediction block pair includes the first reference prediction block and the second reference prediction block; each second reference prediction block pair includes a third reference prediction block and a fourth reference prediction block, the third reference prediction block is obtained by performing a motion search in the first preset area based on the first reference prediction block, and the fourth reference prediction block is obtained by performing a motion search in the second preset area based on the second reference prediction block;

Determining the difference between the third reference prediction block and the fourth reference prediction block included in each second reference prediction block pair in the at least one second reference prediction block pair;

Determining a reference prediction block pair with the smallest difference among the at least one second reference prediction block pair;

When it is determined that the difference between the third reference prediction block and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference is smaller than the difference between the first reference prediction block and the second reference prediction block, performing a motion search according to the second reference prediction block pair with the smallest difference to obtain at least one third reference prediction block pair;

Wherein, each third reference prediction block pair includes a fifth reference prediction block and a sixth reference prediction block, the fifth reference prediction block is obtained by performing a motion search in the first preset area based on the third reference prediction block included in the second reference prediction block pair with the smallest difference, and the sixth reference prediction block is obtained by performing a motion search in the second preset area based on the fourth reference prediction block included in the second reference prediction block pair with the smallest difference;

It is determined that the fifth reference prediction block included in the third reference prediction block pair with the smallest difference is the first modified reference prediction block, and it is determined that the sixth reference prediction block included in the third reference prediction block pair with the smallest difference is the second modified reference prediction block.

In one possible design, it also includes:

When it is determined that the difference between the third reference prediction block and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference is greater than the difference between the first reference prediction block and the second reference prediction block, it is determined that the first reference prediction block is the first modified reference prediction block, and it is determined that the second reference prediction block is the second modified reference prediction block.

In one possible design, it also includes:

When it is determined that the difference between the fifth reference prediction block and the sixth reference prediction block included in the third reference prediction block pair with the smallest difference is greater than the difference between the third reference prediction block and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference, it is determined that the third reference prediction block included in the second reference prediction block pair with the smallest difference is the first modified reference prediction block, and it is determined that the fourth reference prediction block included in the second reference prediction block pair with the smallest difference is the second modified reference prediction block.

Using the first reference prediction block pair corresponding to the first initial motion vector prediction value and the second reference prediction block pair corresponding to the second initial motion vector prediction value as a basic reference prediction block pair to perform a motion search;

If it is determined that the difference of the reference prediction block pair with the smallest difference among the at least one reference prediction block pair obtained by the motion search is less than the difference of the basic reference prediction block pair, the reference prediction block pair with the smallest difference is updated to the basic reference prediction block pair , Continue to perform motion search based on the updated basic reference prediction block pair;

Wherein, the difference between the reference prediction block pair is the difference between the first reference prediction block and the second reference prediction block included in the reference prediction block pair, and the first reference prediction block included in the reference prediction block pair after search is determined based on the basic reference prediction block. A motion search is performed on the included first reference prediction block in the surrounding preset area, and the surrounding preset area of the first reference prediction block included in the basic reference prediction block is located in the first preset area; The second reference prediction block included in the reference prediction block pair is obtained by performing a motion search on the included second reference prediction block in the surrounding preset area based on the basic reference prediction block. The comparison between the basic reference prediction block and the included second reference prediction block The peripheral preset area is located in the second preset area;

If it is determined that the difference of the basic reference prediction block pair is smaller than the difference of any reference prediction block pair obtained by the motion search, the motion search is stopped, the first reference prediction block included in the basic reference prediction block pair is used as the target reference prediction block, and the second reference prediction block included in the basic reference prediction block pair is used as the target reference prediction block;

If it is determined that the search area of the first reference prediction block included in the basic reference prediction block pair exceeds the first preset area or the search area of the second reference prediction block included in the basic reference prediction block pair exceeds the second preset area, the motion search is stopped.
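The iterative search described above can be illustrated with a minimal Python sketch. It is not the implementation of this application: the sum of absolute differences (SAD) as the difference measure, the four one-sample search steps, the mirrored offsets between the two reference images, and all names (`pair_difference`, `allowed`, `search_pair`, `search_range`) are assumptions made for illustration. Candidates that would leave the preset area are simply skipped, which is one possible way to realize the stop condition.

```python
from typing import List, Tuple

Pos = Tuple[int, int]      # (x, y) top-left corner of a block
Image = List[List[int]]    # 2-D list of sample values


def pair_difference(ref0: Image, ref1: Image, p0: Pos, p1: Pos, w: int, h: int) -> int:
    """SAD between the w x h block at p0 in ref0 and the block at p1 in ref1."""
    return sum(
        abs(ref0[p0[1] + j][p0[0] + i] - ref1[p1[1] + j][p1[0] + i])
        for j in range(h)
        for i in range(w)
    )


def allowed(p: Pos, start: Pos, ref: Image, w: int, h: int, search_range: int) -> bool:
    """True if the block at p stays inside the picture and inside the preset area."""
    inside_picture = 0 <= p[0] <= len(ref[0]) - w and 0 <= p[1] <= len(ref) - h
    inside_area = (abs(p[0] - start[0]) <= search_range
                   and abs(p[1] - start[1]) <= search_range)
    return inside_picture and inside_area


def search_pair(ref0: Image, ref1: Image, start0: Pos, start1: Pos,
                w: int, h: int, search_range: int = 2):
    """Return (pos0, pos1, difference) of the pair at which the search stops."""
    base0, base1 = start0, start1
    base_cost = pair_difference(ref0, ref1, base0, base1, w, h)
    while True:
        best = None
        for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            p0 = (base0[0] + dx, base0[1] + dy)
            p1 = (base1[0] - dx, base1[1] - dy)  # mirrored offset (assumption)
            if not (allowed(p0, start0, ref0, w, h, search_range)
                    and allowed(p1, start1, ref1, w, h, search_range)):
                continue  # candidate would leave the preset area
            cost = pair_difference(ref0, ref1, p0, p1, w, h)
            if best is None or cost < best[0]:
                best = (cost, p0, p1)
        # Stop when no candidate is strictly better than the basic pair.
        if best is None or best[0] >= base_cost:
            return base0, base1, base_cost
        base_cost, base0, base1 = best  # update the basic pair and continue
```

Because the difference strictly decreases on every update, the loop always terminates.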

In the second aspect, an embodiment of the present application provides a video image prediction method, including:

Determine (or obtain) the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the first image block to be processed; determine the first motion vector prediction value according to the first initial motion vector prediction value and the first motion vector difference, and determine the second motion vector prediction value according to the second initial motion vector prediction value and the second motion vector difference (in other words, perform a motion vector correction process according to the first initial motion vector predicted value to obtain a first corrected motion vector predicted value, and perform a motion vector correction process according to the second initial motion vector predicted value to obtain a second corrected motion vector predicted value); perform a motion vector correction process according to the first motion vector predicted value and the second motion vector predicted value to obtain a first modified motion vector predictor and a second modified motion vector predictor; and predict the first image block to be processed according to the first modified motion vector predictor and the second modified motion vector predictor.

According to the solution provided by the embodiments of the present application, the motion vector correction process is performed after the two initial motion vector prediction values are combined with the motion vector differences, and the corrected motion vector prediction values are then used for inter-frame prediction. Compared with the traditional processing method, the prediction accuracy is relatively improved.
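The order of operations in the second aspect (add the motion vector difference first, then refine) can be sketched as follows. This is only an illustration under stated assumptions: motion vectors are integer (x, y) tuples, and the names `add_mv`, `predict_mvs_mvd_first`, and `toy_refine` are hypothetical. `toy_refine` is a stand-in that merely nudges the pair by a mirrored offset; it is not the refinement search of this application.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) components, integer units assumed


def add_mv(a: MV, b: MV) -> MV:
    """Component-wise sum of a motion vector predictor and a motion vector difference."""
    return (a[0] + b[0], a[1] + b[1])


def predict_mvs_mvd_first(init0: MV, init1: MV, mvd0: MV, mvd1: MV,
                          refine: Callable[[MV, MV], Tuple[MV, MV]]) -> Tuple[MV, MV]:
    """Second-aspect order: combine each initial predictor with its MVD, then refine."""
    mvp0 = add_mv(init0, mvd0)  # first motion vector predictor
    mvp1 = add_mv(init1, mvd1)  # second motion vector predictor
    return refine(mvp0, mvp1)   # first and second modified predictors


def toy_refine(mv0: MV, mv1: MV) -> Tuple[MV, MV]:
    """Placeholder refinement (assumption): shift the pair by a mirrored (+1, 0) offset."""
    return (mv0[0] + 1, mv0[1]), (mv1[0] - 1, mv1[1])


print(predict_mvs_mvd_first((4, 0), (-4, 0), (2, 0), (-2, 0), toy_refine))
```

Passing the refinement as a callable keeps the ordering logic separate from whichever search is actually used.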

In a possible design, determining the first initial motion vector prediction value, the second initial motion vector prediction value, the first motion vector difference, and the second motion vector difference of the current image block to be processed includes: when the first flag parsed from the code stream (for example, mmvd_flag[x0][y0]) indicates that the current image block to be processed is inter-predicted using the merge with motion vector difference (MMVD) mode, determining the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed.

In the above design, when MMVD is used for prediction, the solution provided by the embodiment of this application is adopted, and the motion vector correction process is performed based on the two initial motion vector prediction values obtained by the MMVD method. Compared with the traditional method, the accuracy is relatively improved.

The above design can be applied to the decoding side.

In a possible design, applied to the decoding side, determining the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed includes: when it is determined that the merge with motion vector difference (MMVD) mode is used for inter-frame prediction of the current image block to be processed, determining the first initial motion vector prediction value, the second initial motion vector prediction value, the first motion vector difference, and the second motion vector difference of the current image block to be processed.

In a possible design, the determining the first initial motion vector predictor and the second initial motion vector predictor of the current image block to be processed includes: determining the corresponding candidate motion information from the candidate list according to the candidate index parsed from the code stream, where the candidate motion information includes a third motion vector predictor and a fourth motion vector predictor, the third motion vector predictor is used as the first initial motion vector predictor, and the fourth motion vector predictor is used as the second initial motion vector predictor; or, determining that the third motion vector predictor and the fourth motion vector predictor included in the candidate motion information at the first position in the candidate list are the first initial motion vector predicted value and the second initial motion vector predicted value.

The above design can be applied to the decoding side.
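The two decoder-side alternatives (use the parsed candidate index, or take the candidate at the first position of the list) can be sketched as below. The representation of a candidate as a pair of (x, y) tuples and the name `initial_mvps` are assumptions for illustration only.

```python
from typing import List, Optional, Tuple

MV = Tuple[int, int]
Candidate = Tuple[MV, MV]  # (third MV predictor, fourth MV predictor), assumed layout


def initial_mvps(candidate_list: List[Candidate],
                 candidate_index: Optional[int] = None) -> Tuple[MV, MV]:
    """Return (first initial MVP, second initial MVP).

    With a parsed index, the indexed candidate is used; without one, the
    candidate at the first position of the list is used.
    """
    index = 0 if candidate_index is None else candidate_index
    third_mvp, fourth_mvp = candidate_list[index]
    return third_mvp, fourth_mvp


candidates = [((1, 0), (-1, 0)), ((3, 2), (-3, -2))]
print(initial_mvps(candidates, 1))  # indexed candidate
print(initial_mvps(candidates))     # first-position candidate
```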

In a possible design, applied to the encoding side, the determining the first initial motion vector predictor and the second initial motion vector predictor of the current image block to be processed includes: selecting candidate motion information from a candidate list according to a rate-distortion cost algorithm, where the candidate motion information includes a third motion vector predictor and a fourth motion vector predictor, the third motion vector predictor serves as the first initial motion vector predictor, and the fourth motion vector predictor serves as the second initial motion vector predictor; or, determining that the third motion vector predictor and the fourth motion vector predictor included in the candidate motion information at the first position in the candidate list are the first initial motion vector predicted value and the second initial motion vector predicted value.
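As a rough illustration of encoder-side selection by rate-distortion cost, the sketch below picks the candidate that minimizes the commonly used Lagrangian cost J = D + λ·R. The text does not specify the cost function, so this form, the callables `distortion` and `rate_bits`, and the toy numbers are all assumptions.

```python
from typing import Callable, List, Tuple

MV = Tuple[int, int]
Candidate = Tuple[MV, MV]


def select_candidate(candidates: List[Candidate],
                     distortion: Callable[[Candidate], float],
                     rate_bits: Callable[[int], float],
                     lam: float) -> int:
    """Return the index of the candidate with the smallest cost D + lam * R."""
    costs = [distortion(c) + lam * rate_bits(i) for i, c in enumerate(candidates)]
    return min(range(len(candidates)), key=costs.__getitem__)


# Toy inputs (assumptions): distortion looked up per candidate, rate grows with index.
candidates = [((0, 0), (0, 0)), ((2, 0), (-2, 0)), ((4, 0), (-4, 0))]
toy_distortion = {0: 100.0, 2: 60.0, 4: 58.0}
best = select_candidate(candidates,
                        lambda c: toy_distortion[c[0][0]],
                        lambda i: i + 1,
                        lam=4.0)
print(best, candidates[best])
```

In this toy setup the last candidate has the lowest distortion but costs more bits to signal, so the middle candidate wins.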

In a possible design, the executing the motion vector correction process according to the first motion vector predicted value and the second motion vector predicted value includes:

When the image block to which the candidate motion information belongs and the currently to-be-processed image block belong to different images, a motion vector correction process is performed according to the first motion vector predicted value and the second motion vector predicted value.

In one possible design, it also includes:

When the image block to which the candidate motion information belongs and the current image block to be processed belong to the same image, the current image block to be processed is predicted according to the first motion vector prediction value and the second motion vector prediction value.
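The condition in the two designs above (refine only when the candidate comes from a different image than the current block) can be sketched as a simple gate. Identifying images by an integer such as a picture order count is an assumption, as are the names `final_mvs` and `nudge`; the refinement itself is passed in as a callable and is not specified here.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]


def final_mvs(mvp0: MV, mvp1: MV,
              candidate_picture: int, current_picture: int,
              refine: Callable[[MV, MV], Tuple[MV, MV]]) -> Tuple[MV, MV]:
    """Apply the correction process only for candidates taken from another image."""
    if candidate_picture != current_picture:
        return refine(mvp0, mvp1)  # different image: run motion vector correction
    return mvp0, mvp1              # same image: predict with the values as they are


def nudge(mv0: MV, mv1: MV) -> Tuple[MV, MV]:
    """Placeholder refinement (assumption) used only to make the gate visible."""
    return (mv0[0] + 1, mv0[1]), (mv1[0] - 1, mv1[1])


print(final_mvs((6, 0), (-6, 0), candidate_picture=8, current_picture=9, refine=nudge))
print(final_mvs((6, 0), (-6, 0), candidate_picture=9, current_picture=9, refine=nudge))
```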

In a possible design, determining the first initial motion vector predictor and the second initial motion vector predictor of the current image block to be processed includes: determining the corresponding candidate (for example, base candidate) from the candidate list according to a candidate index (for example, base candidate index) parsed from the code stream, where the candidate includes first candidate motion information and second candidate motion information, the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to different images, it is determined that the fifth motion vector predictor and the sixth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value; or,

The above design can be applied to the decoding side.

In a possible design, it is applied to the encoding side to determine the first initial motion vector predictor and the second initial motion vector predictor of the current image block to be processed, including:

The candidate at the first position in the candidate list includes first candidate motion information and second candidate motion information, where the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor. The second candidate motion information includes the seventh motion vector predictor and the eighth motion vector predictor;

Acquiring a first reference prediction block corresponding to the first motion vector prediction value, and a second reference prediction block corresponding to the second motion vector prediction value;

Wherein, the difference between the first modified reference prediction block and the second modified reference prediction block is less than or equal to the difference between the first reference prediction block and the second reference prediction block; the first modified reference prediction block is an image block in a first preset area that has the same size as the first reference prediction block, and the first preset area includes the first reference prediction block; the second modified reference prediction block is an image block in a second preset area that has the same size as the second reference prediction block, and the second preset area includes the second reference prediction block; the first modified reference prediction block corresponds to the first modified motion vector predictor, and the second modified reference prediction block corresponds to the second modified motion vector predictor. The above design can be applied to the encoding side as well as the decoding side.

It is determined that the fifth reference prediction block included in the third reference prediction block pair with the smallest difference is the first modified reference prediction block, and it is determined that the sixth reference prediction block included in the third reference prediction block pair with the smallest difference is the second modified reference prediction block. The above design can be applied to the encoding side as well as the decoding side.

In one possible design, it also includes:

In a third aspect, an embodiment of the present application provides a video image prediction device, including:

A prediction unit, configured to determine the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed;

A correction unit, configured to perform a motion vector correction process according to the first initial motion vector predicted value and the second initial motion vector predicted value to obtain a first corrected motion vector predicted value and a second corrected motion vector predicted value;

The prediction unit is further configured to determine a first motion vector prediction value according to the first modified motion vector prediction value and the first motion vector difference, and determine a second motion vector prediction value according to the second modified motion vector prediction value and the second motion vector difference; and predict the current image block to be processed according to the first motion vector predictor and the second motion vector predictor.

In different application scenarios, the image prediction device is for example applied to a video encoding device (video encoder) or a video decoding device (video decoder).

As an example, the function of the foregoing apparatus may be implemented by an inter prediction unit. The inter prediction unit includes a prediction unit and a correction unit.
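The division of work in the third aspect (correct the initial predictors first, then add the motion vector differences) can be sketched as a small class. This is a hypothetical structure, not the device of this application: the class name `InterPredictionUnit`, its method names, and the integer (x, y) motion vector representation are assumptions, and the refinement search is injected as a callable.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

MV = Tuple[int, int]
Refine = Callable[[MV, MV], Tuple[MV, MV]]


@dataclass
class InterPredictionUnit:
    """Groups the roles of the correction unit and the prediction unit."""
    refine: Refine  # refinement search, supplied by the caller (assumption)

    def correct(self, init0: MV, init1: MV) -> Tuple[MV, MV]:
        """Correction-unit role: refine the two initial predictors."""
        return self.refine(init0, init1)

    def motion_vectors(self, init0: MV, init1: MV, mvd0: MV, mvd1: MV) -> Tuple[MV, MV]:
        """Prediction-unit role: add each MVD to the corresponding modified predictor."""
        mod0, mod1 = self.correct(init0, init1)
        mvp0 = (mod0[0] + mvd0[0], mod0[1] + mvd0[1])
        mvp1 = (mod1[0] + mvd1[0], mod1[1] + mvd1[1])
        return mvp0, mvp1


unit = InterPredictionUnit(refine=lambda a, b: ((a[0] + 1, a[1]), (b[0] - 1, b[1])))
print(unit.motion_vectors((4, 0), (-4, 0), (2, 0), (-2, 0)))
```

With a real, content-dependent refinement, refining before or after adding the motion vector difference generally leads to different search starting points, which is the distinction between the two orderings described in this document.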

In a possible design, in terms of determining the first initial motion vector predicted value, the second initial motion vector predicted value, the first motion vector difference, and the second motion vector difference of the current image block to be processed, the prediction unit is specifically configured to:

When the first identifier parsed from the code stream indicates that the merge with motion vector difference (MMVD) mode is used for inter-frame prediction of the current image block to be processed, determine the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed.

The device provided by this design can be applied to the decoder. The action of parsing the first identifier from the code stream may be performed by the entropy decoding unit in the decoder, and the entropy decoding unit parses the first identifier from the code stream and transmits it to the prediction unit in the image prediction device.

It should be noted that the subsequent design of the device including the parsing action from the code stream is suitable for the decoder.

In a possible design, when the device is applied to an encoder, in terms of determining the first initial motion vector predicted value, the second initial motion vector predicted value, the first motion vector difference, and the second motion vector difference of the current image block to be processed, the prediction unit is specifically configured to:

When it is determined that the merge with motion vector difference (MMVD) mode is used for inter-frame prediction of the current image block to be processed (for example, MMVD is selected from multiple inter-frame prediction modes according to the rate-distortion cost algorithm), determine the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed.

In a possible design, in terms of determining the first initial motion vector prediction value and the second initial motion vector prediction value of the current image block to be processed, the prediction unit is specifically configured to:

Determine the corresponding candidate motion information from the candidate list according to the candidate index parsed from the code stream, where the candidate motion information includes a third motion vector predictor and a fourth motion vector predictor, the third motion vector predictor is used as the first initial motion vector prediction value, and the fourth motion vector predictor is used as the second initial motion vector prediction value.

The device provided by the above design can be applied to a decoder.

In a possible design, the device is applied to an encoder, and in terms of determining the first initial motion vector prediction value and the second initial motion vector prediction value of the current image block to be processed, the prediction unit is specifically configured to:

Determine the corresponding candidate motion information from the candidate list according to the rate-distortion cost algorithm, where the candidate motion information includes a third motion vector predictor and a fourth motion vector predictor, the third motion vector predictor is used as the first initial motion vector predicted value, and the fourth motion vector predictor is used as the second initial motion vector predicted value.

The third motion vector predictor and the fourth motion vector predictor included in the candidate motion information of the first position in the candidate list are determined as the first initial motion vector predictor and the second initial motion vector predictor.

The device provided by the above design can be applied to an encoder or a decoder.

In a possible design, the correction unit is specifically used for:

When the image block to which the candidate motion information belongs and the currently to-be-processed image block belong to different images, a motion vector correction process is performed according to the first initial motion vector predicted value and the second initial motion vector predicted value.

In a possible design, the prediction unit is further configured to: when the image block to which the candidate motion information belongs and the current image block to be processed belong to the same image, determine the first target motion vector predictor according to the first initial motion vector predicted value and the first motion vector difference, and determine the second target motion vector predictor according to the second initial motion vector predictor and the second motion vector difference; and predict the current image block to be processed according to the first target motion vector predictor and the second target motion vector predictor.

A corresponding candidate item is determined from the candidate list according to the candidate index parsed from the code stream, where the candidate item includes first candidate motion information and second candidate motion information, the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

The device provided by the above design can be applied to a decoder.

In a possible design, the device provided by this design is applied to an encoder, and in terms of determining the first initial motion vector prediction value and the second initial motion vector prediction value of the current image block to be processed, the prediction unit is specifically configured to:

According to the rate-distortion cost algorithm, a corresponding candidate item is determined from the candidate list, where the candidate item includes first candidate motion information and second candidate motion information, the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

In a possible design, the candidate at the first position in the candidate list includes first candidate motion information and second candidate motion information, where the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

In terms of determining the first initial motion vector prediction value and the second initial motion vector prediction value of the current image block to be processed, the prediction unit is specifically configured to:

In a possible design, in terms of performing a motion vector correction process according to the first initial motion vector predicted value and the second initial motion vector predicted value, the correction unit is specifically configured to:

In a possible design, in terms of determining a first modified reference prediction block according to the first reference prediction block and determining a second modified reference prediction block according to the second reference prediction block, the correction unit is specifically configured to:

In a possible design, the correction unit is also used for:

In a fourth aspect, an embodiment of the present application provides a video image prediction device, including:

A prediction unit, configured to determine a first initial motion vector prediction value, a second initial motion vector prediction value, a first motion vector difference, and a second motion vector difference of the first image block to be processed;

The correction unit is configured to determine a first motion vector predictor according to the first initial motion vector predictor and the first motion vector difference, and determine a second motion vector predictor according to the second initial motion vector predictor and the second motion vector difference;

The prediction unit is further configured to perform a motion vector correction process according to the first motion vector predicted value and the second motion vector predicted value to obtain the first corrected motion vector predicted value and the second corrected motion vector predicted value; and predict the first image block to be processed according to the first modified motion vector prediction value and the second modified motion vector prediction value.

In different application scenarios, the video image prediction device is applied to, for example, a video encoding device (video encoder) or a video decoding device (video decoder).

The device provided by the above design can be applied to a decoder.

In a possible design, when applied to an encoder, in terms of determining the first initial motion vector prediction value, the second initial motion vector prediction value, the first motion vector difference, and the second motion vector difference of the current image block to be processed, the prediction unit is specifically configured to:

When the merge with motion vector difference (MMVD) mode is selected from multiple inter prediction modes, according to the rate-distortion cost algorithm, for inter prediction of the current image block to be processed, determine the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed.

Determine the corresponding candidate motion information from the candidate list according to the candidate index parsed from the code stream, where the candidate motion information includes a third motion vector predictor and a fourth motion vector predictor, the third motion vector predictor is used as the first initial motion vector prediction value, and the fourth motion vector predictor is used as the second initial motion vector prediction value; or,

The device provided by the above design can be applied to a decoder.

In a possible design, when applied to an encoder, in terms of determining the first initial motion vector prediction value and the second initial motion vector prediction value of the current image block to be processed, the prediction unit is specifically configured to:

Determine the corresponding candidate motion information from the candidate list according to the rate-distortion cost algorithm, where the candidate motion information includes a third motion vector predictor and a fourth motion vector predictor, the third motion vector predictor is used as the first initial motion vector predicted value, and the fourth motion vector predictor is used as the second initial motion vector predicted value; or,

In a possible design, in terms of performing a motion vector correction process according to the first motion vector predicted value and the second motion vector predicted value, the correction unit is specifically configured to:

In a possible design, the prediction unit is also used for:

The device provided by the above design can be applied to a decoder.

The corresponding candidate item is determined from the candidate list according to the rate-distortion cost algorithm, where the candidate item includes first candidate motion information and second candidate motion information, the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

In a possible design, in terms of performing a motion vector correction process according to the first motion vector predicted value and the second motion vector predicted value, the correction unit is specifically configured to:

A fifth aspect of the present application provides an image prediction device, the device includes: a processor and a memory coupled to the processor; the processor is configured to execute the method in the various implementation manners of the first aspect or the second aspect.

A sixth aspect of the present application provides a video encoder. The video encoder is used to encode a current image block to be processed and includes: an inter-frame prediction module, which includes the image prediction device provided by a design applied to the encoder in the third aspect or the fourth aspect, and which is used to predict the predicted value of the pixel value of the current image block to be processed; an entropy coding module, which is used to encode indication information into the code stream, where the indication information is used to indicate the initial motion information of the image block (including the first initial motion vector prediction value and the second initial motion vector prediction value); and a reconstruction module, which is used to reconstruct the image block according to the predicted value of the pixel value of the current image block to be processed.

The seventh aspect of the present application provides a video decoder, which is used to decode image blocks from a code stream, and includes: an entropy decoding module, which is used to decode indication information from the code stream, where the indication information is used to indicate the initial motion information (including the first initial motion vector predictor and the second initial motion vector predictor) of the current image block to be processed; an inter-frame prediction module, which includes the image prediction device provided by a design applied to the decoder in the third aspect or the fourth aspect, and which is used to predict the predicted value of the pixel value of the current image block to be processed; and a reconstruction module, which is used to reconstruct the current image block to be processed based on the predicted value of the pixel value of the current image block to be processed.

In an eighth aspect, an embodiment of the present application provides a device for decoding video data, and the device includes:

Memory and video decoder.

Wherein, the memory is used to store video data in the form of a code stream, and the video data includes one or more image blocks;

In a possible example, the video decoder is used to determine (or obtain) the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed; perform a motion vector correction (or motion vector refinement, such as DMVR) process according to the first initial motion vector predicted value and the second initial motion vector predicted value to obtain the first corrected motion vector predicted value and the second corrected motion vector predicted value (in other words, the motion vector correction process is performed according to the first initial motion vector predictor to obtain the first modified motion vector predictor, and the motion vector correction process is performed according to the second initial motion vector predictor to obtain the second modified motion vector predictor); determine the first motion vector predictor according to the first modified motion vector predictor and the first motion vector difference, and determine the second motion vector predictor according to the second modified motion vector predictor and the second motion vector difference; and predict the current image block to be processed according to the first motion vector prediction value and the second motion vector prediction value.

The video decoder can specifically implement the method corresponding to the design of the decoder described in the first aspect. The video decoder includes any device in the third aspect applied to the design of an inter prediction unit or a decoder.

In another possible example, the video decoder is used to determine (or obtain) the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the first image block to be processed; determine a first motion vector predictor according to the first initial motion vector predictor and the first motion vector difference, and determine a second motion vector predictor according to the second initial motion vector predictor and the second motion vector difference (in other words, the motion vector correction process is performed according to the first initial motion vector prediction value to obtain the first modified motion vector prediction value, and the motion vector correction process is performed according to the second initial motion vector prediction value to obtain the second modified motion vector prediction value); perform a motion vector correction process according to the first motion vector prediction value and the second motion vector prediction value to obtain the first modified motion vector prediction value and the second modified motion vector prediction value; and predict the first image block to be processed according to the first modified motion vector predictor and the second modified motion vector predictor.

The video decoder can specifically implement the method corresponding to the design of the decoder described in the second aspect. The video decoder includes any device in the fourth aspect applied to the design of an inter prediction unit or a decoder.

In a ninth aspect, an embodiment of the present application provides a device for encoding video data, and the device includes:

Memory and video encoder.

In a possible example, the video encoder is used to determine (or obtain) the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed; perform a motion vector correction (or motion vector refinement, such as DMVR) process according to the first initial motion vector predicted value and the second initial motion vector predicted value to obtain the first corrected motion vector predicted value and the second corrected motion vector predicted value (in other words, the motion vector correction process is performed according to the first initial motion vector predictor to obtain the first modified motion vector predictor, and the motion vector correction process is performed according to the second initial motion vector predictor to obtain the second modified motion vector predictor); determine the first motion vector predictor according to the first modified motion vector predictor and the first motion vector difference, and determine the second motion vector predictor according to the second modified motion vector predictor and the second motion vector difference; and predict the current image block to be processed according to the first motion vector prediction value and the second motion vector prediction value.

Exemplarily, the video encoder may implement the method corresponding to the design of the encoder described in the first aspect. The video encoder includes any device in the third aspect applied to the design of the inter prediction unit.

In another possible example, the video encoder is configured to: determine (or obtain) a first initial motion vector predictor, a second initial motion vector predictor, a first motion vector difference, and a second motion vector difference of a first image block to be processed; determine a first motion vector predictor according to the first initial motion vector predictor and the first motion vector difference, and determine a second motion vector predictor according to the second initial motion vector predictor and the second motion vector difference; perform a motion vector correction process according to the first motion vector predictor and the second motion vector predictor to obtain a first modified motion vector predictor and a second modified motion vector predictor (in other words, the motion vector correction process is performed according to the first motion vector predictor to obtain the first modified motion vector predictor, and according to the second motion vector predictor to obtain the second modified motion vector predictor); and predict the first image block to be processed according to the first modified motion vector predictor and the second modified motion vector predictor.

Exemplarily, the video encoder may implement the method corresponding to the design of the encoder described in the second aspect. The video encoder includes any device applied to the design of an inter prediction unit in the fourth aspect.
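The two examples of the ninth aspect differ only in the order of two steps: correcting the predictors and adding the motion vector differences. The following minimal Python sketch illustrates that ordering and nothing more. The function names, the mirrored integer search in `refine_pair`, and the `toy_cost` function are assumptions made for illustration; they are not the correction process defined by this application.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) displacement in integer samples


def add_mv(a: MV, b: MV) -> MV:
    """Component-wise sum of two motion vectors."""
    return (a[0] + b[0], a[1] + b[1])


def refine_pair(mv0: MV, mv1: MV, cost: Callable[[MV, MV], float],
                search_range: int = 1) -> Tuple[MV, MV]:
    """Toy stand-in for a DMVR-style correction (an assumption, not the
    patent's procedure): try mirrored integer offsets (+d on the first
    vector, -d on the second) and keep the pair with the lowest cost."""
    best_cost, best = cost(mv0, mv1), (mv0, mv1)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            cand0 = (mv0[0] + dx, mv0[1] + dy)
            cand1 = (mv1[0] - dx, mv1[1] - dy)
            c = cost(cand0, cand1)
            if c < best_cost:
                best_cost, best = c, (cand0, cand1)
    return best


def refine_then_add(mvp0: MV, mvp1: MV, mvd0: MV, mvd1: MV, cost):
    """First example: correct the initial predictors, then add the MVDs."""
    r0, r1 = refine_pair(mvp0, mvp1, cost)
    return add_mv(r0, mvd0), add_mv(r1, mvd1)


def add_then_refine(mvp0: MV, mvp1: MV, mvd0: MV, mvd1: MV, cost):
    """Second example: add the MVDs first, then correct the results."""
    return refine_pair(add_mv(mvp0, mvd0), add_mv(mvp1, mvd1), cost)


def toy_cost(mv0: MV, mv1: MV) -> int:
    # Hypothetical matching cost: L1 distance to a made-up pair of
    # "true" motions (3, 0) and (-3, 0). A real codec would compare
    # the two reference blocks instead.
    return abs(mv0[0] - 3) + abs(mv0[1]) + abs(mv1[0] + 3) + abs(mv1[1])


mvp0, mvp1 = (2, 0), (-2, 0)   # initial predictors (illustrative values)
mvd0, mvd1 = (4, 0), (-4, 0)   # motion vector differences (illustrative)

order_a = refine_then_add(mvp0, mvp1, mvd0, mvd1, toy_cost)
order_b = add_then_refine(mvp0, mvp1, mvd0, mvd1, toy_cost)
print(order_a, order_b)
```

With these made-up inputs the two orderings yield different final vectors, which is why the two examples are described separately.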

In a tenth aspect, an embodiment of the present application provides an encoding device, including a non-volatile memory and a processor coupled to each other, where the processor calls program code stored in the memory to execute some or all of the steps of the method described in the first aspect or the second aspect as applied to the design of the encoder.

In an eleventh aspect, an embodiment of the present application provides a decoding device, including a non-volatile memory and a processor coupled to each other, where the processor calls program code stored in the memory to execute some or all of the steps of the method described in the first aspect or the second aspect as applied to the design of the decoder.

In a twelfth aspect, an embodiment of the present application provides a computer-readable storage medium that stores program code, where the program code includes instructions for executing some or all of the steps of any method of the first aspect or the second aspect.

In a thirteenth aspect, an embodiment of the present application provides a computer program product which, when run on a computer, causes the computer to execute some or all of the steps of any method of the first aspect or the second aspect.

A fourteenth aspect of the present application provides an electronic device, including the video encoder according to the sixth aspect, the video decoder according to the seventh aspect, or the image prediction device according to the third, fourth, or fifth aspect.

It should be understood that the technical solutions of the third to fourteenth aspects of this application are the same as or similar to those of the first and second aspects, and that the beneficial effects achieved by each aspect and its corresponding feasible implementations are similar; they are not repeated here.

Description of the drawings

FIG. 1A is a block diagram of an example of a video encoding and decoding system 10 used to implement an embodiment of the present application;

FIG. 1B is a block diagram of an example of a video coding system 40 used to implement an embodiment of the present application;

FIG. 2 is a block diagram of an example structure of an encoder 20 used to implement an embodiment of the present application;

FIG. 3 is a block diagram of an example structure of a decoder 30 used to implement an embodiment of the present application;

FIG. 4 is a block diagram of an example of a video decoding device 400 used to implement an embodiment of the present application;

FIG. 5 is a block diagram of another example of an encoding device or a decoding device used to implement an embodiment of the present application;

FIG. 6 is a schematic diagram of candidate blocks in the spatial domain and the time domain used to implement an embodiment of the present application;

FIG. 7A is a schematic diagram of MMVD search points used to implement an embodiment of the present application;

FIG. 7B is a schematic diagram of an MMVD search process used to implement an embodiment of the present application;

FIG. 8 is a schematic flowchart of a video image prediction method used to implement an embodiment of the present application;

FIG. 9 is a schematic diagram of forward and backward reference images used to implement an embodiment of the present application;

FIG. 10A is a schematic diagram of a candidate list used to implement an embodiment of the present application;

FIG. 10B is a schematic diagram of selecting a motion vector of a prediction block of a current block, used to implement an embodiment of the present application;

FIG. 11 is a schematic diagram of a search point used to implement an embodiment of the present application;

FIG. 12 is a schematic diagram of a motion vector refinement process used to implement an embodiment of the present application;

FIG. 13 is a schematic flowchart of another video image prediction method used to implement an embodiment of the present application;

FIG. 14 is a schematic diagram of a motion vector refinement process used to implement an embodiment of the present application;

FIG. 15 is a structural block diagram of a video image prediction device 1500 used to implement an embodiment of the present application.

Detailed description

The embodiments of the present application are described below with reference to the drawings in the embodiments of the present application. In the following description, reference is made to the accompanying drawings, which form a part of the present disclosure and illustrate specific aspects of the embodiments of the present application or specific aspects in which the embodiments may be used. It should be understood that the embodiments of the present application may be used in other aspects, and may include structural or logical changes not depicted in the drawings. Therefore, the following detailed description should not be understood in a restrictive sense, and the scope of this application is defined by the appended claims. For example, it should be understood that content disclosed in conjunction with a described method is equally applicable to a corresponding device or system for executing the method, and vice versa. For example, if one or more specific method steps are described, the corresponding device may include one or more units, such as functional units, to perform the described method steps (for example, one unit that performs the one or more steps, or multiple units, each of which performs one or more of the steps), even if such units are not explicitly described or illustrated in the drawings. Conversely, if a specific device is described based on one or more units, such as functional units, the corresponding method may include one or more steps to perform the functionality of the units (for example, one step that performs the functionality of the one or more units, or multiple steps, each of which performs the functionality of one or more of the units), even if such steps are not explicitly described or illustrated in the drawings.
Further, it should be understood that, unless expressly stated otherwise, the features of the exemplary embodiments and/or aspects described herein can be combined with each other.

The technical solutions involved in the embodiments of this application may not only be applied to existing video coding standards (such as H.264, HEVC, etc.), but may also be applied to future video coding standards (such as H.266). The terminology used in the implementation mode of this application is only used to explain the specific embodiments of this application, and is not intended to limit this application. The following briefly introduces some concepts that may be involved in the embodiments of the present application.

Video coding generally refers to processing a sequence of pictures that form a video or video sequence. In the field of video coding, the terms "picture", "frame", and "image" can be used as synonyms. Video coding as used herein means video encoding or video decoding. Video encoding is performed on the source side and usually includes processing (for example, compressing) the original video picture to reduce the amount of data required to represent it, so that it can be stored and/or transmitted more efficiently. Video decoding is performed on the destination side and usually involves inverse processing relative to the encoder to reconstruct the video picture. The "coding" of video pictures in the embodiments should be understood as the "encoding" or "decoding" of a video sequence. The combination of the encoding part and the decoding part is also called a codec (encoding and decoding).

A video sequence includes a series of pictures, the pictures are further divided into slices, and the slices are divided into blocks. Video coding is performed in units of blocks. In some new video coding standards, the concept of a block is further extended. For example, the H.264 standard has the macroblock (MB), which can be further divided into multiple prediction blocks (partitions) that can be used for predictive coding. The high-efficiency video coding (HEVC) standard adopts basic concepts such as the coding unit (CU), prediction unit (PU), and transform unit (TU), which functionally distinguish several kinds of block units and are described using a new tree-based structure. For example, a CU can be divided into smaller CUs according to a quadtree, and the smaller CUs can be further divided to form a quadtree structure. The CU is the basic unit for dividing and encoding the coded image. The PU and TU have similar tree structures. A PU can correspond to a prediction block and is the basic unit of predictive coding; a CU is further divided into multiple PUs according to the division mode. A TU can correspond to a transform block and is the basic unit for transforming the prediction residual. However, whether CU, PU, or TU, they are all blocks (or image blocks) in nature.

For example, in HEVC, a coding tree unit (CTU) is split into multiple CUs by using a quadtree structure represented as a coding tree. The decision whether to encode a picture region using inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level. Each CU can be further split into one, two, or four PUs according to the PU split type. The same prediction process is applied within a PU, and the relevant information is transmitted to the decoder on a PU basis. After the residual block is obtained by applying the prediction process based on the PU split type, the CU may be divided into transform units (TUs) according to another quadtree structure similar to the coding tree used for the CU. In the latest developments in video compression technology, a quad-tree and binary tree (QTBT) partition structure is used to divide coding blocks. In the QTBT block structure, a CU may have a square or rectangular shape.
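The recursive quadtree division of a CTU into CUs described above can be sketched as follows. This is only an illustration under stated assumptions: `quadtree_split`, the `should_split` callback, and the minimum size of 8 are hypothetical names and values, and a real encoder would typically base the split decision on a rate-distortion criterion rather than on the toy rule used here.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block, in samples


def quadtree_split(x: int, y: int, size: int,
                   should_split: Callable[[int, int, int], bool],
                   min_size: int = 8) -> List[Block]:
    """Return the leaf blocks (CUs) obtained by recursively splitting a
    square block into four equal quadrants while should_split() says so."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        for oy in (0, half):          # top row of quadrants, then bottom row
            for ox in (0, half):      # left quadrant, then right quadrant
                leaves.extend(
                    quadtree_split(x + ox, y + oy, half, should_split, min_size))
        return leaves
    return [(x, y, size)]


# Toy rule (assumption): keep splitting only the top-left corner until 16x16.
def split_top_left(x: int, y: int, size: int) -> bool:
    return x == 0 and y == 0 and size > 16


cus = quadtree_split(0, 0, 64, split_top_left)  # a 64x64 CTU
for cu in cus:
    print(cu)
```

The leaves tile the CTU without overlap, so their areas sum to the CTU area regardless of the split rule.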

In this document, for ease of description and understanding, the image block to be processed in the current image is referred to as the current block or the image block to be processed. For example, in encoding it refers to the block currently being encoded, and in decoding it refers to the block currently being decoded. The decoded image block in the reference image that is used to predict the current block is called a reference block; that is, a reference block is a block that provides a reference signal for the current block, where the reference signal represents the pixel values in the image block. The block in the reference image that provides the prediction signal for the current block may be called a prediction block, where the prediction signal represents the pixel values, sample values, or sample signal in the prediction block. For example, after multiple reference blocks are traversed, the best reference block is found. This best reference block provides the prediction for the current block and is called the prediction block.

In the case of lossless video coding, the original video picture can be reconstructed; that is, the reconstructed video picture has the same quality as the original video picture (assuming no transmission loss or other data loss during storage or transmission). In the case of lossy video coding, further compression is performed, for example by quantization, to reduce the amount of data required to represent the video picture, and the decoder side cannot completely reconstruct the video picture; that is, the quality of the reconstructed video picture is lower than that of the original video picture.

Several video coding standards since H.261 belong to the group of "lossy hybrid video codecs" (that is, they combine spatial and temporal prediction in the sample domain with 2D transform coding for applying quantization in the transform domain). Each picture of a video sequence is usually divided into a set of non-overlapping blocks and is usually coded at the block level. In other words, the encoder side usually processes, that is, encodes, the video at the block (video block) level. For example, it generates a prediction block by spatial (intra-picture) prediction or temporal (inter-picture) prediction, subtracts the prediction block from the current block (the block currently processed or to be processed) to obtain a residual block, and transforms the residual block and quantizes it in the transform domain to reduce the amount of data to be transmitted (compression). The decoder side applies the inverse processing relative to the encoder to the encoded or compressed block to reconstruct the current block for representation. In addition, the encoder duplicates the decoder processing loop, so that the encoder and the decoder generate the same predictions (for example, intra prediction and inter prediction) and/or reconstructions for processing, that is, for encoding subsequent blocks.
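The block-level loop described above (predict, subtract, transform, quantize, then invert on the decoder side) can be illustrated with a small pure-Python sketch. It is a simplified illustration and not the codec of any particular standard: the orthonormal DCT-II, the uniform quantizer with step `qstep`, and the function names `encode_block` and `reconstruct_block` are all assumptions chosen for brevity.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis; rows are basis functions."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def encode_block(cur, pred, qstep: int):
    """Encoder side: residual -> 2D transform -> quantized levels."""
    n = len(cur)
    d = dct_matrix(n)
    resid = [[cur[i][j] - pred[i][j] for j in range(n)] for i in range(n)]
    coeffs = matmul(matmul(d, resid), transpose(d))
    return [[round(c / qstep) for c in row] for row in coeffs]


def reconstruct_block(levels, pred, qstep: int):
    """Decoder side (also run inside the encoder's own loop):
    dequantize -> inverse transform -> add the prediction."""
    n = len(levels)
    d = dct_matrix(n)
    coeffs = [[lv * qstep for lv in row] for row in levels]
    resid = matmul(matmul(transpose(d), coeffs), d)
    return [[pred[i][j] + round(resid[i][j]) for j in range(n)]
            for i in range(n)]


pred = [[100] * 4 for _ in range(4)]          # prediction block
cur = [[104] * 4 for _ in range(4)]           # current block: prediction + 4

exact = reconstruct_block(encode_block(cur, pred, 8), pred, 8)
lossy = reconstruct_block(encode_block(cur, pred, 10), pred, 10)
print(exact[0][0], lossy[0][0])
```

In this toy case a step of 8 happens to divide the only nonzero coefficient exactly, so the block is recovered, whereas a step of 10 rounds it and the reconstruction is off by one sample level, which is the quantization loss that distinguishes lossy from lossless coding.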

The following describes the system architecture to which the embodiments of the present application are applied. FIG. 1A exemplarily shows a schematic block diagram of a video encoding and decoding system 10 applied in an embodiment of the present application. As shown in FIG. 1A, the video encoding and decoding system 10 may include a source device 12 and a destination device 14. The source device 12 generates encoded video data and may therefore be referred to as a video encoding device. The destination device 14 can decode the encoded video data generated by the source device 12 and may therefore be referred to as a video decoding device. Various implementations of the source device 12, the destination device 14, or both may include one or more processors and a memory coupled to the one or more processors. The memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures accessible by a computer, as described herein. The source device 12 and the destination device 14 may include various devices, including desktop computers, mobile computing devices, notebook (for example, laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, computers, televisions, cameras, display devices, digital media players, video game consoles, on-board computers, wireless communication equipment, or the like.

Although FIG. 1A shows the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both devices or the functionality of both, that is, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality. In such embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, separate hardware and/or software, or any combination thereof.

The source device 12 and the destination device 14 may communicate with each other via a link 13, and the destination device 14 may receive encoded video data from the source device 12 via the link 13. Link 13 may include one or more media or devices capable of moving encoded video data from source device 12 to destination device 14. In one example, link 13 may include one or more communication media that enable source device 12 to transmit encoded video data directly to destination device 14 in real time. In this example, the source device 12 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to the destination device 14. The one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the Internet). The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from source device 12 to destination device 14.

The source device 12 includes an encoder 20, and optionally, the source device 12 may also include a picture source 16, a picture preprocessor 18, and a communication interface 22. In a specific implementation form, the encoder 20, the picture source 16, the picture preprocessor 18, and the communication interface 22 may be hardware components in the source device 12, or may be software programs in the source device 12. They are described as follows:

The picture source 16 may include or be any type of picture capture device, for example for capturing real-world pictures, and/or any type of picture or comment generating device (for screen content coding, some text on the screen is also considered part of the picture or image to be encoded), for example a computer graphics processor for generating computer-animated pictures, or any type of device for obtaining and/or providing real-world pictures or computer-animated pictures (for example, screen content or virtual reality (VR) pictures), and/or any combination thereof (for example, augmented reality (AR) pictures). The picture source 16 may be a camera for capturing pictures or a memory for storing pictures. The picture source 16 may also include any type of (internal or external) interface for storing previously captured or generated pictures and/or for acquiring or receiving pictures. When the picture source 16 is a camera, it may be, for example, a local camera or a camera integrated in the source device; when the picture source 16 is a memory, it may be, for example, a local memory or a memory integrated in the source device. When the picture source 16 includes an interface, the interface may be, for example, an external interface for receiving pictures from an external video source. The external video source is, for example, an external picture capture device such as a camera, an external memory, or an external picture generating device such as an external computer graphics processor, computer, or server. The interface may be any type of interface according to any proprietary or standardized interface protocol, such as a wired interface, a wireless interface, or an optical interface.

A picture can be regarded as a two-dimensional array or matrix of picture elements (pixels). The pixels in the array can also be called samples. The number of samples of the array or picture in the horizontal and vertical directions (or axes) defines the size and/or resolution of the picture. To represent color, three color components are usually used; that is, a picture can be represented as or contain three sample arrays. For example, in the RGB format or color space, a picture includes corresponding red, green, and blue sample arrays. In video coding, however, each pixel is usually expressed in a luminance/chrominance format or color space. For example, a picture in the YUV format includes a luminance component indicated by Y (sometimes indicated by L) and two chrominance components indicated by U and V. The luminance (luma) component Y represents brightness or gray-level intensity (for example, the two are the same in a grayscale picture), and the two chrominance (chroma) components U and V represent the chrominance or color information. Correspondingly, a picture in the YUV format includes a luminance sample array of luminance sample values (Y) and two chrominance sample arrays of chrominance values (U and V). A picture in the RGB format can be converted or transformed to the YUV format, and vice versa. This process is also called color transformation or conversion. If the picture is black and white, it may include only the luminance sample array. In the embodiments of the present application, the picture transmitted from the picture source 16 to the picture preprocessor 18 may also be referred to as original picture data 17.
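As an illustration of the color conversion mentioned above, the following minimal Python sketch converts 8-bit RGB samples into one luminance and two chrominance sample arrays. The application does not specify a conversion matrix; the full-range BT.601 weights used here, and the function names `rgb_to_yuv` and `picture_to_planes`, are assumptions for illustration only.

```python
from typing import List, Tuple


def rgb_to_yuv(r: int, g: int, b: int) -> Tuple[int, int, int]:
    """Convert one 8-bit RGB pixel to 8-bit YUV.
    Assumption: full-range BT.601 luma weights, chroma centered at 128."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 128 + (b - y) * 0.5 / (1 - 0.114)   # scaled blue difference
    v = 128 + (r - y) * 0.5 / (1 - 0.299)   # scaled red difference

    def clip(x: float) -> int:
        return max(0, min(255, int(round(x))))

    return clip(y), clip(u), clip(v)


def picture_to_planes(rgb_picture: List[List[Tuple[int, int, int]]]):
    """Split an RGB picture (rows of pixels) into Y, U and V sample arrays."""
    y_plane, u_plane, v_plane = [], [], []
    for row in rgb_picture:
        converted = [rgb_to_yuv(*px) for px in row]
        y_plane.append([c[0] for c in converted])
        u_plane.append([c[1] for c in converted])
        v_plane.append([c[2] for c in converted])
    return y_plane, u_plane, v_plane


# A 1x2 picture: one black pixel and one white pixel.
y_plane, u_plane, v_plane = picture_to_planes([[(0, 0, 0), (255, 255, 255)]])
print(y_plane, u_plane, v_plane)
```

For gray pixels the two chrominance samples sit at the mid value, which matches the statement that a black-and-white picture can be carried by the luminance sample array alone.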

The picture preprocessor 18 is configured to receive the original picture data 17 and perform preprocessing on the original picture data 17 to obtain the preprocessed picture 19 or the preprocessed picture data 19. For example, the pre-processing performed by the picture pre-processor 18 may include trimming, color format conversion (for example, conversion from RGB format to YUV format), toning, or denoising.

The encoder 20 (or video encoder 20) is configured to receive the pre-processed picture data 19 and process it using a relevant prediction mode (such as the prediction modes in the various embodiments herein), thereby providing the encoded picture data 21 (the structural details of the encoder 20 are described further below based on FIG. 2, FIG. 4, or FIG. 5). In some embodiments, the encoder 20 may be used to implement the various embodiments described below, so as to apply the video image prediction method described in this application on the encoding side.

The communication interface 22 can be used to receive the encoded picture data 21 and to transmit it via the link 13 to the destination device 14 or any other device (such as a memory) for storage or direct reconstruction, where the other device may be any device used for decoding or storage. The communication interface 22 may be used, for example, to encapsulate the encoded picture data 21 into a suitable format, such as data packets, for transmission on the link 13.

The destination device 14 includes a decoder 30, and optionally, the destination device 14 may also include a communication interface 28, a picture post processor 32, and a display device 34. They are described as follows:

The communication interface 28 may be used to receive the encoded picture data 21 from the source device 12 or any other source, for example, a storage device, and the storage device is, for example, an encoded picture data storage device. The communication interface 28 can be used to transmit or receive the encoded picture data 21 via the link 13 between the source device 12 and the destination device 14 or via any type of network. The link 13 is, for example, a direct wired or wireless connection. The type of network is, for example, a wired or wireless network or any combination thereof, or any type of private network and public network, or any combination thereof. The communication interface 28 may be used, for example, to decapsulate the data packet transmitted by the communication interface 22 to obtain the encoded picture data 21.
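The encapsulation performed by the communication interface 22 and the decapsulation performed by the communication interface 28 can be pictured with the following Python sketch. The packet layout (a 4-byte sequence number followed by a 2-byte payload length) and the names `encapsulate` and `decapsulate` are purely hypothetical; the application does not define a packet format, and a real system would normally rely on an existing transport protocol.

```python
import struct
from typing import Iterable, List

# Hypothetical header: sequence number (uint32) + payload length (uint16),
# both in network byte order.
HEADER = struct.Struct("!IH")


def encapsulate(encoded: bytes, max_payload: int = 1000) -> List[bytes]:
    """Sender side: cut the encoded picture data into numbered packets."""
    packets = []
    for seq, offset in enumerate(range(0, len(encoded), max_payload)):
        chunk = encoded[offset:offset + max_payload]
        packets.append(HEADER.pack(seq, len(chunk)) + chunk)
    return packets


def decapsulate(packets: Iterable[bytes]) -> bytes:
    """Receiver side: strip headers and reassemble in sequence order."""
    parts = {}
    for packet in packets:
        seq, length = HEADER.unpack_from(packet)
        parts[seq] = packet[HEADER.size:HEADER.size + length]
    return b"".join(parts[seq] for seq in sorted(parts))


encoded_picture_data = bytes(range(256)) * 10        # stand-in bitstream
packets = encapsulate(encoded_picture_data)
restored = decapsulate(reversed(packets))            # arrival order may differ
print(len(packets), restored == encoded_picture_data)
```

The sequence number lets the receiver restore the original order even when packets arrive out of order; loss handling is deliberately left out of this sketch.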

Both the communication interface 28 and the communication interface 22 can be configured as one-way or two-way communication interfaces, and can be used, for example, to send and receive messages to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or to the data transmission, such as the transmission of encoded picture data.

The decoder 30 (or video decoder 30) is configured to receive the encoded picture data 21 and provide decoded picture data 31 or a decoded picture 31 (the structural details of the decoder 30 are described further below based on FIG. 3, FIG. 4, or FIG. 5). In some embodiments, the decoder 30 may be used to implement the various embodiments described below, so as to apply the video image prediction method described in this application on the decoding side.

The picture post processor 32 is configured to perform post-processing on the decoded picture data 31 (also referred to as reconstructed picture data) to obtain post-processed picture data 33. The post-processing performed by the picture post processor 32 may include color format conversion (for example, conversion from YUV format to RGB format), toning, trimming, resampling, or any other processing. The picture post processor 32 can also be used to transmit the post-processed picture data 33 to the display device 34.

The display device 34 is configured to receive the post-processed picture data 33 in order to display the picture to, for example, a user or viewer. The display device 34 may be or include any type of display for presenting the reconstructed picture, for example, an integrated or external display or monitor. For example, the display may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS) display, a digital light processor (DLP), or any other type of display.

It is obvious to those skilled in the art from the description that the existence and (exact) division of the functionality of the different units, or of the functionality of the source device 12 and/or the destination device 14 shown in FIG. 1A, may vary according to the actual device and application. The source device 12 and the destination device 14 may include any of a variety of devices, including any type of handheld or stationary device, for example, a notebook or laptop computer, mobile phone, smart phone, tablet computer, video camera, desktop computer, set-top box, television, camera, in-vehicle device, display device, digital media player, video game console, video streaming device (such as a content service server or content distribution server), broadcast receiver device, or broadcast transmitter device, and may use no operating system or any type of operating system.

Both the encoder 20 and the decoder 30 can be implemented as any of various suitable circuits, for example, one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof. If the technology is partially implemented in software, the device can store the software instructions in a suitable non-transitory computer-readable storage medium and can use one or more processors to execute the instructions in hardware to perform the technology of the present disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, and so on) can be regarded as one or more processors.

In some cases, the video encoding and decoding system 10 shown in FIG. 1A is only an example, and the technology of this application can be applied to video coding settings (for example, video encoding or video decoding) that do not necessarily include any data communication between the encoding and decoding devices. In other instances, data can be retrieved from local storage, streamed over a network, and so on. A video encoding device can encode data and store it to a memory, and/or a video decoding device can retrieve data from a memory and decode it. In some instances, encoding and decoding are performed by devices that do not communicate with each other but only encode data to a memory and/or retrieve data from a memory and decode it.

Referring to FIG. 1B, FIG. 1B is an explanatory diagram of an example of a video coding system 40 including the encoder 20 of FIG. 2 and/or the decoder 30 of FIG. 3 according to an exemplary embodiment. The video coding system 40 can implement a combination of the various technologies in the embodiments of the present application. In the illustrated embodiment, the video coding system 40 may include an imaging device 41, an encoder 20, a decoder 30 (and/or a video encoder/decoder implemented by the logic circuit 47 of the processing circuit 46), an antenna 42, one or more processors 43, one or more memories 44, and/or a display device 45.

As shown in FIG. 1B, the imaging device 41, the antenna 42, the processing circuit 46, the logic circuit 47, the encoder 20, the decoder 30, the processor 43, the memory 44, and/or the display device 45 can communicate with each other. As discussed, although the encoder 20 and the decoder 30 are used to illustrate the video coding system 40, in different examples, the video coding system 40 may include only the encoder 20 or only the decoder 30.

In some examples, the antenna 42 may be used to transmit or receive an encoded bitstream of video data. In addition, in some examples, the display device 45 may be used to present video data. In some examples, the logic circuit 47 may be implemented by the processing circuit 46. The processing circuit 46 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and so on. The video coding system 40 may also include an optional processor 43, which may similarly include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like. In some examples, the logic circuit 47 may be implemented by hardware, such as dedicated hardware for video coding, and the processor 43 may be implemented by general software, an operating system, and the like. In addition, the memory 44 may be any type of memory, such as volatile memory (for example, static random access memory (SRAM) or dynamic random access memory (DRAM)) or non-volatile memory (for example, flash memory). In a non-limiting example, the memory 44 may be implemented by cache memory. In some examples, the logic circuit 47 may access the memory 44 (for example, to implement an image buffer). In other examples, the logic circuit 47 and/or the processing circuit 46 may include memory (for example, a cache) for implementing an image buffer and the like.

In some examples, the encoder 20 implemented by logic circuits may include an image buffer (e.g., implemented by the processing circuit 46 or the memory 44) and a graphics processing unit (e.g., implemented by the processing circuit 46). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include an encoder 20 implemented by a logic circuit 47 to implement various modules discussed with reference to FIG. 2 and/or any other encoder system or subsystem described herein. Logic circuits can be used to perform the various operations discussed herein.

In some examples, the decoder 30 may be implemented by the logic circuit 47 in a similar manner to implement the various modules discussed with reference to the decoder 30 of FIG. 3 and/or any other decoder systems or subsystems described herein. In some examples, the decoder 30 implemented by logic circuits may include an image buffer (for example, implemented by the processing circuit 46 or the memory 44) and a graphics processing unit (for example, implemented by the processing circuit 46). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include a decoder 30 implemented by the logic circuit 47 to implement the various modules discussed with reference to FIG. 3 and/or any other decoder systems or subsystems described herein.

In some examples, the antenna 42 may be used to receive an encoded bitstream of video data. As discussed, the encoded bitstream may include data, indicators, index values, mode selection data, and the like related to the encoded video frames discussed herein, such as data related to coded partitions (for example, transform coefficients or quantized transform coefficients, optional indicators (as discussed), and/or data defining the coded partitions). The video coding system 40 may also include a decoder 30 coupled to the antenna 42 and used to decode the encoded bitstream. The display device 45 is used to present video frames.

It should be understood that, for the example described with reference to the encoder 20 in the embodiments of the present application, the decoder 30 may be used to perform the reverse process. Regarding signaling syntax elements, the decoder 30 can be used to receive and parse such syntax elements, and decode related video data accordingly. In some examples, the encoder 20 may entropy encode the syntax elements into an encoded video bitstream. In such instances, the decoder 30 can parse such syntax elements and decode related video data accordingly.

It should be noted that the video image encoding method described in the embodiments of the present application occurs at the encoder 20, and the video image decoding method described in the embodiments of the present application occurs at the decoder 30. The encoder 20 and the decoder 30 in the embodiments of the present application may be, for example, an encoder/decoder corresponding to video standard protocols such as H.263, H.264, HEVC, MPEG-2, MPEG-4, VP8, and VP9, or to next-generation video standard protocols (such as H.266).

Referring to FIG. 2, FIG. 2 shows a schematic/conceptual block diagram of an example of an encoder 20 for implementing an embodiment of the present application. In the example of FIG. 2, the encoder 20 includes a residual calculation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transform processing unit 212, a reconstruction unit 214, a buffer 216, a loop filter unit 220, a decoded picture buffer (DPB) 230, a prediction processing unit 260, and an entropy coding unit 270. The prediction processing unit 260 may include an inter prediction unit 244, an intra prediction unit 254, and a mode selection unit 262. The inter prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown). The encoder 20 shown in FIG. 2 may also be referred to as a hybrid video encoder or a video encoder according to a hybrid video codec.

For example, the residual calculation unit 204, the transform processing unit 206, the quantization unit 208, the prediction processing unit 260, and the entropy coding unit 270 form the forward signal path of the encoder 20, whereas, for example, the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the decoded picture buffer (DPB) 230, and the prediction processing unit 260 form the backward signal path of the encoder, wherein the backward signal path of the encoder corresponds to the signal path of the decoder (see the decoder 30 in FIG. 3).

The encoder 20 receives, for example through an input 202, a picture 201 or an image block 203 of the picture 201, for example, a picture in a sequence of pictures that forms a video or a video sequence. The image block 203 may also be called the current picture block or the picture block to be encoded, and the picture 201 may be called the current picture or the picture to be encoded (especially in video encoding, to distinguish the current picture from other pictures, for example, previously encoded and/or decoded pictures of the same video sequence, that is, the video sequence that also includes the current picture).

An embodiment of the encoder 20 may include a segmentation unit (not shown in FIG. 2) for segmenting the picture 201 into a plurality of blocks such as the image block 203, usually into a plurality of non-overlapping blocks. The segmentation unit can be used to apply the same block size, and the corresponding grid defining the block size, to all pictures in the video sequence, or to change the block size between pictures or between subsets or groups of pictures, and to divide each picture into the corresponding blocks.

In one example, the prediction processing unit 260 of the encoder 20 may be used to perform any combination of the aforementioned segmentation techniques.

Like the picture 201, the image block 203 is also or can be regarded as a two-dimensional array or matrix of sampling points with sample values, although its size is smaller than that of the picture 201. In other words, the image block 203 may include, for example, one sampling array (for example, a luminance array in the case of a black-and-white picture 201), three sampling arrays (for example, one luminance array and two chrominance arrays in the case of a color picture), or any other number and/or type of arrays depending on the color format applied. The number of sampling points in the horizontal and vertical directions (or axes) of the image block 203 defines the size of the image block 203.

The encoder 20 shown in FIG. 2 is used to encode the picture 201 block by block, for example, to perform encoding and prediction on each image block 203.

The residual calculation unit 204 is configured to calculate the residual block 205 based on the picture image block 203 and the prediction block 265 (other details of the prediction block 265 are provided below), for example, by subtracting the sample values of the prediction block 265 from the sample values of the picture image block 203 sample by sample (pixel by pixel) to obtain the residual block 205 in the sample domain.
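The sample-by-sample subtraction can be illustrated with a minimal sketch. The function name `residual_block` and the sample values are hypothetical and chosen only for illustration; blocks are represented as plain nested lists.

```python
def residual_block(current, prediction):
    """Return the sample-wise difference current - prediction.

    Both arguments are 2-D lists of equal dimensions (hypothetical
    representation of image block 203 and prediction block 265).
    """
    if len(current) != len(prediction) or any(
        len(c_row) != len(p_row) for c_row, p_row in zip(current, prediction)
    ):
        raise ValueError("blocks must have identical dimensions")
    return [
        [c - p for c, p in zip(c_row, p_row)]
        for c_row, p_row in zip(current, prediction)
    ]


# Illustrative 2x2 blocks (made-up sample values).
current = [[52, 55], [61, 59]]
prediction = [[50, 50], [60, 60]]
residual = residual_block(current, prediction)
```

Residual samples can be negative, which is why the residual is kept as signed values rather than as picture samples.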

The transform processing unit 206 is configured to apply a transform such as a discrete cosine transform (DCT) or a discrete sine transform (DST) to the sample values of the residual block 205 to obtain transform coefficients 207 in the transform domain. The transform coefficients 207 may also be referred to as transform residual coefficients, and represent the residual block 205 in the transform domain.

The transform processing unit 206 may be used to apply an integer approximation of the DCT/DST, such as the transform specified for HEVC/H.265. Compared with the orthogonal DCT transform, this integer approximation is usually scaled by a factor. In order to preserve the norm of the residual block processed by the forward and inverse transforms, an additional scaling factor is applied as part of the transform process. The scaling factor is usually selected based on certain constraints, for example, the scaling factor being a power of 2 for the shift operation, the bit depth of the transform coefficients, and the trade-off between accuracy and implementation cost. For example, a specific scaling factor is specified for the inverse transform on the decoder 30 side, for example, by the inverse transform processing unit 312 (and for the corresponding inverse transform on the encoder 20 side, for example, by the inverse transform processing unit 212), and accordingly, a corresponding scaling factor may be specified for the forward transform on the encoder 20 side by the transform processing unit 206.
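The norm-preservation property mentioned above can be made concrete with a floating-point sketch. It uses an orthonormal two-dimensional DCT-II, not the integer approximation used in HEVC/H.265; the function names are hypothetical.

```python
import math


def _scale(k, n):
    # Orthonormal DCT-II scaling: the DC basis uses sqrt(1/n), others sqrt(2/n).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(x):
    n = len(x)
    return [
        _scale(k, n)
        * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct_1d(c):
    n = len(c)
    return [
        sum(
            _scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


def _transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    # Separable transform: rows first, then columns.
    rows = [dct_1d(r) for r in block]
    return _transpose([dct_1d(col) for col in _transpose(rows)])


def idct_2d(coeffs):
    # Inverse in the opposite order: columns first, then rows.
    cols = _transpose([idct_1d(col) for col in _transpose(coeffs)])
    return [idct_1d(r) for r in cols]


def energy(m):
    return sum(v * v for row in m for v in row)
```

Because this transform is orthonormal, `energy(dct_2d(b))` equals `energy(b)` up to rounding error, and `idct_2d(dct_2d(b))` recovers `b`. An integer approximation loses this property unless compensating scaling factors are applied, which is the purpose of the additional scaling described above.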

The quantization unit 208 is used to quantize the transform coefficients 207, for example, by applying scalar quantization or vector quantization, to obtain the quantized transform coefficients 209. The quantized transform coefficients 209 may also be referred to as quantized residual coefficients 209. The quantization process can reduce the bit depth associated with some or all of the transform coefficients 207. For example, n-bit transform coefficients can be rounded down to m-bit transform coefficients during quantization, where n is greater than m. The degree of quantization can be modified by adjusting the quantization parameter (QP). For example, for scalar quantization, different scales can be applied to achieve finer or coarser quantization. A smaller quantization step size corresponds to a finer quantization, and a larger quantization step size corresponds to a coarser quantization. The appropriate quantization step size can be indicated by a quantization parameter (QP). For example, the quantization parameter may be an index into a predefined set of suitable quantization step sizes. For example, a smaller quantization parameter can correspond to fine quantization (smaller quantization step size), and a larger quantization parameter can correspond to coarse quantization (larger quantization step size), or vice versa. The quantization may include division by a quantization step size, and the corresponding inverse quantization, performed for example by the inverse quantization unit 210, may include multiplication by the quantization step size. Embodiments according to some standards such as HEVC may use the quantization parameter to determine the quantization step size. In general, the quantization step size can be calculated based on the quantization parameter using a fixed-point approximation of an equation including division.
Additional scaling factors can be introduced for quantization and inverse quantization to restore the norm of the residual block that may be modified due to the scale used in the fixed-point approximation of the equations for the quantization step size and the quantization parameter. In an example embodiment, the scales of inverse transform and inverse quantization may be combined. Alternatively, a custom quantization table can be used and signaled from the encoder to the decoder in, for example, a bitstream. Quantization is a lossy operation, where the larger the quantization step, the greater the loss.
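The relationship between the quantization parameter, the step size, and the resulting loss can be sketched as follows. The mapping Qstep = 2^((QP - 4) / 6) is the commonly cited floating-point relation for H.264/HEVC and is an assumption here; this text does not state it, and real codecs use a fixed-point approximation. The function names are hypothetical.

```python
def quantization_step(qp):
    # Assumed mapping: the step size doubles every 6 QP values, QP 4 -> 1.0.
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    """Scalar quantization: divide by the step size and round to nearest."""
    step = quantization_step(qp)

    def q(c):
        level = int(abs(c) / step + 0.5)  # round half away from zero
        return level if c >= 0 else -level

    return [[q(c) for c in row] for row in coeffs]


def dequantize(levels, qp):
    """Inverse quantization: multiply each level by the step size."""
    step = quantization_step(qp)
    return [[lv * step for lv in row] for row in levels]


coeffs = [[40.0, -13.0], [7.0, 3.0]]   # made-up transform coefficients
fine = quantize(coeffs, 22)            # step size 8
coarse = quantize(coeffs, 28)          # step size 16
```

With the finer step size only the smallest coefficient is zeroed; with the coarser step size half of the coefficients vanish, so `dequantize` returns values further from the originals.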

The inverse quantization unit 210 is configured to apply the inverse quantization of the quantization unit 208 to the quantized coefficients to obtain the inverse quantized coefficients 211, for example, by applying, based on or using the same quantization step size as the quantization unit 208, the inverse of the quantization scheme applied by the quantization unit 208. The inverse quantized coefficients 211 may also be referred to as the inverse quantized residual coefficients 211, and correspond to the transform coefficients 207, although they are usually not identical to the transform coefficients because of the loss caused by quantization.

The inverse transform processing unit 212 is configured to apply the inverse of the transform applied by the transform processing unit 206, for example, an inverse discrete cosine transform (DCT) or an inverse discrete sine transform (DST), to obtain the inverse transform block 213 in the sample domain. The inverse transform block 213 may also be referred to as an inverse transformed and inverse quantized block 213 or an inverse transformed residual block 213.

The reconstruction unit 214 (for example, the summer 214) is used to add the inverse transform block 213 (that is, the reconstructed residual block 213) to the prediction block 265 to obtain the reconstructed block 215 in the sample domain, for example, by adding the sample values of the reconstructed residual block 213 and the sample values of the prediction block 265.
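A minimal sketch of this summation follows. Clipping the result to the valid sample range is an assumption added for illustration and is not stated in this text; the function name and the `bit_depth` parameter are hypothetical.

```python
def reconstruct_block(residual, prediction, bit_depth=8):
    """Add the reconstructed residual to the prediction, sample by sample.

    Clipping to [0, 2**bit_depth - 1] is an assumption of this sketch.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(r + p, 0), max_val) for r, p in zip(r_row, p_row)]
        for r_row, p_row in zip(residual, prediction)
    ]


recon = reconstruct_block([[2, -5], [10, 0]], [[50, 3], [250, 60]])
```

The same operation applies on the decoder side, where the reconstruction unit 314 adds the reconstructed residual block 313 to the prediction block 365.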

Optionally, the buffer unit 216 (or "buffer" 216 for short) such as the line buffer 216 is used to buffer or store the reconstructed block 215 and the corresponding sample value, for example, for intra prediction. In other embodiments, the encoder can be used to use the unfiltered reconstructed block and/or the corresponding sample value stored in the buffer unit 216 to perform any type of estimation and/or prediction, such as intra-frame prediction.

For example, an embodiment of the encoder 20 may be configured such that the buffer unit 216 is used not only for storing the reconstructed block 215 for intra prediction 254, but also for the loop filter unit 220 (not shown in FIG. 2), and/or such that, for example, the buffer unit 216 and the decoded picture buffer unit 230 form one buffer. Other embodiments may use the filtered block 221 and/or blocks or samples from the decoded picture buffer 230 (neither shown in FIG. 2) as the input or basis for the intra prediction 254.

The loop filter unit 220 (or "loop filter" 220 for short) is used to filter the reconstructed block 215 to obtain the filtered block 221, thereby smoothing pixel transitions or otherwise improving video quality. The loop filter unit 220 is intended to represent one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or other filters, such as a bilateral filter, an adaptive loop filter (ALF), a sharpening or smoothing filter, or a collaborative filter. Although the loop filter unit 220 is shown as an in-loop filter in FIG. 2, in other configurations, the loop filter unit 220 may be implemented as a post-loop filter. The filtered block 221 may also be referred to as a filtered reconstructed block 221. The decoded picture buffer 230 may store the reconstructed coded block after the loop filter unit 220 performs a filtering operation on the reconstructed coded block.

An embodiment of the encoder 20 (correspondingly, the loop filter unit 220) may be used to output loop filter parameters (for example, sample adaptive offset information), for example, directly or after entropy coding by the entropy coding unit 270 or any other entropy coding unit, so that, for example, the decoder 30 can receive and apply the same loop filter parameters for decoding.

The decoded picture buffer (DPB) 230 may be a reference picture memory that stores reference picture data for the encoder 20 to encode video data. The DPB 230 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. The DPB 230 and the buffer 216 may be provided by the same memory device or by separate memory devices. In one example, the decoded picture buffer (DPB) 230 is used to store the filtered block 221. The decoded picture buffer 230 may be further used to store other previously filtered blocks, such as previously reconstructed and filtered blocks 221, of the same current picture or of different pictures such as previously reconstructed pictures, and may provide complete previously reconstructed, that is, decoded, pictures (and corresponding reference blocks and samples) and/or a partially reconstructed current picture (and corresponding reference blocks and samples), for example, for inter prediction. In one example, if the reconstructed block 215 is reconstructed without in-loop filtering, the decoded picture buffer (DPB) 230 is used to store the reconstructed block 215.

The prediction processing unit 260, also called the block prediction processing unit 260, is used to receive or obtain the image block 203 (the current image block 203 of the current picture 201) and reconstructed picture data, such as reference samples of the same (current) picture from the buffer 216 and/or reference picture data 231 of one or more previously decoded pictures from the decoded picture buffer 230, and to process such data for prediction, that is, to provide a prediction block 265, which may be an inter prediction block 245 or an intra prediction block 255.

The mode selection unit 262 may be used to select a prediction mode (for example, intra or inter prediction mode) and/or the corresponding prediction block 245 or 255 used as the prediction block 265 to calculate the residual block 205 and reconstruct the reconstructed block 215.

An embodiment of the mode selection unit 262 can be used to select a prediction mode (for example, from those supported by the prediction processing unit 260) that provides the best match or the minimum residual (a minimum residual means better compression for transmission or storage), or that provides the minimum signaling overhead (a minimum signaling overhead means better compression for transmission or storage), or that considers or balances both. The mode selection unit 262 may be configured to determine a prediction mode based on rate distortion optimization (RDO), that is, to select the prediction mode that provides the minimum rate distortion, or to select a prediction mode whose associated rate distortion at least meets the prediction mode selection criteria.
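Balancing residual against signaling overhead is commonly expressed as minimizing a Lagrangian cost J = D + λ·R, where D is the distortion, R the rate in bits, and λ a trade-off multiplier. The sketch below assumes that formulation, which is not spelled out in this text; the candidate modes and their distortion and rate figures are made up.

```python
def select_mode(candidates, lam):
    """Return the candidate with the lowest cost J = distortion + lam * rate."""
    return min(candidates, key=lambda c: c["distortion"] + lam * c["rate"])


# Hypothetical candidates: distortion (e.g. a sum of squared errors) and rate in bits.
candidates = [
    {"mode": "intra_dc", "distortion": 120, "rate": 10},
    {"mode": "merge", "distortion": 150, "rate": 2},
    {"mode": "inter", "distortion": 90, "rate": 30},
]

high_lambda_choice = select_mode(candidates, lam=5.0)["mode"]   # rate matters more
low_lambda_choice = select_mode(candidates, lam=0.5)["mode"]    # distortion matters more
```

With a large λ the cheap-to-signal merge candidate wins despite its larger residual; with a small λ the candidate with the smallest distortion wins.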

The prediction processing performed by an example of the encoder 20 (for example, by the prediction processing unit 260) and the mode selection performed (for example, by the mode selection unit 262) will be explained in detail below.

As described above, the encoder 20 is used to determine or select the best or optimal prediction mode from a set of (predetermined) prediction modes. The prediction mode set may include, for example, an intra prediction mode and/or an inter prediction mode.

The set of intra prediction modes may include 35 different intra prediction modes, for example, non-directional modes such as DC (or mean) mode and planar mode, or directional modes as defined in H.265, or may include 67 different intra prediction modes, for example, non-directional modes such as DC (or mean) mode and planar mode, or directional modes as defined in H.266 under development.

In a possible implementation, the set of inter prediction modes depends on the available reference pictures (that is, for example, the aforementioned at least partially decoded pictures stored in the DPB 230) and on other inter prediction parameters, for example, on whether the entire reference picture or only a part of the reference picture, such as a search window area surrounding the area of the current block, is used to search for the best matching reference block, and/or on whether pixel interpolation such as half-pixel and/or quarter-pixel interpolation is applied. The set of inter prediction modes may include, for example, skip mode and merge mode. In specific implementation, the inter prediction mode set may include the skip-based merge with motion vector difference (MMVD) mode in the embodiments of the present application, or the merge-based MMVD mode. In one example, the inter prediction unit 244 may be used to perform any combination of the inter prediction techniques described below.

In addition to the above prediction modes, the embodiments of the present application may also apply skip mode and/or direct mode.

The prediction processing unit 260 may be further used to divide the image block 203 into smaller block partitions or sub-blocks, for example, by iteratively using quad-tree (QT) segmentation, binary-tree (BT) segmentation, triple-tree (TT) segmentation, or any combination thereof, and to perform prediction, for example, for each of the block partitions or sub-blocks, where the mode selection includes selecting the tree structure of the segmented image block 203 and selecting the prediction mode applied to each of the block partitions or sub-blocks.
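Iterative quad-tree segmentation can be sketched recursively. The split criterion used here (sample range within a block exceeding a threshold) is a hypothetical stand-in; per the paragraph above, an actual encoder selects the tree structure as part of mode selection. All names are illustrative.

```python
def quadtree_partition(block, x, y, size, min_size, threshold):
    """Return a list of (x, y, size) leaves covering the square at (x, y).

    A square is split into four quadrants while it is larger than min_size
    and its sample range exceeds threshold (hypothetical criterion).
    """
    samples = [block[y + j][x + i] for j in range(size) for i in range(size)]
    if size <= min_size or max(samples) - min(samples) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_partition(block, x + dx, y + dy, half, min_size, threshold)
    return leaves


# 4x4 block whose top-left quadrant is busy and whose other quadrants are flat.
block = [
    [0, 90, 10, 10],
    [90, 0, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
leaves = quadtree_partition(block, 0, 0, 4, min_size=1, threshold=20)
```

Only the detailed quadrant is split further, so the flat area is covered by three 2x2 leaves and the busy area by four 1x1 leaves.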

The inter prediction unit 244 may include a motion estimation (ME) unit (not shown in FIG. 2) and a motion compensation (MC) unit (not shown in FIG. 2). The motion estimation unit is used to receive or obtain the picture image block 203 (the current picture image block 203 of the current picture 201) and the decoded picture 231, or at least one or more previously reconstructed blocks, for example, reconstructed blocks of one or more other/different previously decoded pictures 231, for motion estimation. For example, the video sequence may include the current picture and the previously decoded picture 231, or in other words, the current picture and the previously decoded picture 231 may be part of the picture sequence forming the video sequence, or may form the picture sequence.

For example, the encoder 20 may be used to select a reference block from multiple reference blocks of the same picture or of different pictures among multiple other pictures, and to provide the reference picture and/or the offset (spatial offset) between the position (X, Y coordinates) of the reference block and the position of the current block to the motion estimation unit (not shown in FIG. 2) as an inter prediction parameter. This offset is also called a motion vector (MV).
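Selecting a reference block and deriving the motion vector can be illustrated with an exhaustive integer-sample search inside a search window. The sum of absolute differences (SAD) is used as the matching cost; both the cost measure and the full-search strategy are assumptions of this sketch, and the function names are hypothetical.

```python
def sad(cur, ref, rx, ry):
    """Sum of absolute differences between cur and the ref block at (rx, ry)."""
    return sum(
        abs(cur[j][i] - ref[ry + j][rx + i])
        for j in range(len(cur))
        for i in range(len(cur[0]))
    )


def full_search(cur, ref, bx, by, search_range):
    """Return ((dx, dy), cost) of the best match within +/- search_range.

    (bx, by) is the position of the current block; candidates that fall
    outside the reference picture are skipped.
    """
    h, w = len(cur), len(cur[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + w > len(ref[0]) or ry + h > len(ref):
                continue
            cost = sad(cur, ref, rx, ry)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    if best is None:
        raise ValueError("no candidate position inside the reference picture")
    return best


# 6x6 reference picture with unique sample values; the current block equals
# the reference content one sample to the right of its own position.
ref = [[10 * y + x for x in range(6)] for y in range(6)]
cur = [[23, 24], [33, 34]]
mv, cost = full_search(cur, ref, bx=2, by=2, search_range=2)
```

The returned offset `mv` plays the role of the motion vector described above.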

The motion compensation unit is used to obtain inter prediction parameters, and perform inter prediction based on or using the inter prediction parameters to obtain the inter prediction block 245. The motion compensation performed by the motion compensation unit (not shown in FIG. 2) may include fetching or generating a prediction block based on a motion/block vector determined by motion estimation (interpolation of sub-pixel accuracy may be performed). Interpolation filtering can generate additional pixel samples from known pixel samples, thereby potentially increasing the number of candidate prediction blocks that can be used to encode picture blocks. Once the motion vector for the PU of the current picture block is received, the motion compensation unit 246 can locate the prediction block pointed to by the motion vector in a reference picture list. The motion compensation unit 246 may also generate syntax elements associated with the blocks and video slices for use by the decoder 30 when decoding picture blocks of the video slices.
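Fetching a prediction block with sub-pixel accuracy can be sketched for half-sample motion vectors. Bilinear averaging is used here purely as a simple stand-in for the longer interpolation filters of actual codecs; the function name, the half-sample motion vector units, and the rounding are assumptions of this sketch.

```python
def motion_compensate(ref, bx, by, w, h, mv_x2, mv_y2):
    """Fetch a w x h prediction block for the block at (bx, by).

    (mv_x2, mv_y2) is the motion vector in half-sample units. Half-sample
    positions are produced by rounded bilinear averaging of neighbors.
    """
    pred = []
    for j in range(h):
        row = []
        for i in range(w):
            x2 = 2 * (bx + i) + mv_x2
            y2 = 2 * (by + j) + mv_y2
            x0, y0 = x2 >> 1, y2 >> 1      # integer part
            x1, y1 = x0 + (x2 & 1), y0 + (y2 & 1)  # neighbor if fractional
            if x0 < 0 or y0 < 0 or x1 >= len(ref[0]) or y1 >= len(ref):
                raise ValueError("motion vector points outside the reference picture")
            total = ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]
            row.append((total + 2) >> 2)
        pred.append(row)
    return pred


ref = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
half_pel = motion_compensate(ref, 0, 0, 2, 2, mv_x2=1, mv_y2=0)
full_pel = motion_compensate(ref, 0, 0, 2, 2, mv_x2=2, mv_y2=2)
```

For an integer motion vector the four taps coincide and the reference samples are copied unchanged; for a half-sample vector new samples are generated between the known ones, which is how interpolation enlarges the set of candidate prediction blocks.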

Specifically, the aforementioned inter prediction unit 244 may transmit syntax elements to the entropy coding unit 270, and the syntax elements include inter prediction parameters (for example, indication information of the inter prediction mode selected for prediction of the current block after traversing multiple inter prediction modes). In a possible application scenario, if there is only one inter prediction mode, the inter prediction parameters may not be carried in the syntax elements. In this case, the decoder 30 can directly use the default prediction mode for decoding. It can be understood that the inter prediction unit 244 may be used to perform any combination of inter prediction techniques.

The intra prediction unit 254 is used to obtain, for example, receive, the picture block 203 (the current picture block) and one or more previously reconstructed blocks of the same picture, for example, reconstructed adjacent blocks, for intra estimation. For example, the encoder 20 may be used to select an intra prediction mode from a plurality of (predetermined) intra prediction modes.

The embodiment of the encoder 20 may be used to select an intra prediction mode based on optimization criteria, for example, based on a minimum residual (for example, an intra prediction mode that provides a prediction block 255 most similar to the current picture block 203) or a minimum rate distortion.

The intra prediction unit 254 is further configured to determine the intra prediction block 255 based on the intra prediction parameters of the selected intra prediction mode. In any case, after selecting the intra prediction mode for the block, the intra prediction unit 254 is also used to provide the intra prediction parameters to the entropy coding unit 270, that is, to provide information indicating the selected intra prediction mode for the block. In one example, the intra prediction unit 254 may be used to perform any combination of intra prediction techniques.
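As an illustration of determining an intra prediction block from reconstructed neighbors, the DC (mean) mode named among the non-directional modes can be sketched as follows. The neighbor layout (one row above, one column to the left) and the rounding are assumptions of this sketch; the function name is hypothetical.

```python
def dc_prediction(above, left):
    """Fill a block with the rounded mean of the neighboring samples.

    above: reconstructed samples in the row above the block (defines width).
    left:  reconstructed samples in the column left of the block (defines height).
    """
    n = len(above) + len(left)
    dc = (sum(above) + sum(left) + n // 2) // n  # rounded integer mean
    return [[dc] * len(above) for _ in range(len(left))]


pred = dc_prediction(above=[100, 102, 104, 106], left=[98, 100, 102, 104])
```

Every sample of the resulting 4x4 block has the same value, which makes this mode suitable for smooth regions.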

Specifically, the aforementioned intra prediction unit 254 may transmit syntax elements to the entropy coding unit 270, and the syntax elements include intra prediction parameters (for example, indication information of the intra prediction mode selected for prediction of the current block after traversing multiple intra prediction modes). In a possible application scenario, if there is only one intra prediction mode, the intra prediction parameters may not be carried in the syntax elements. In this case, the decoder 30 can directly use the default prediction mode for decoding.

The entropy coding unit 270 is used to apply an entropy coding algorithm or scheme (for example, a variable length coding (VLC) scheme, a context adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique) to any one or all of the quantized residual coefficients 209, the inter prediction parameters, the intra prediction parameters, and/or the loop filter parameters (or to none of them), to obtain encoded picture data 21 that can be output through an output 272, for example, in the form of an encoded bitstream 21. The encoded bitstream can be transmitted to the video decoder 30, or archived for later transmission or retrieval by the video decoder 30. The entropy coding unit 270 may also be used to entropy encode other syntax elements of the current video slice being encoded.
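As one concrete instance of a variable length coding scheme, the sketch below implements the order-0 unsigned Exp-Golomb code, which assigns shorter codewords to smaller values. The choice of Exp-Golomb is an assumption made for illustration; this text does not name a specific VLC, and the function names are hypothetical. Bits are represented as a string of "0" and "1" characters for readability.

```python
def exp_golomb_encode(value):
    """Order-0 unsigned Exp-Golomb codeword for a non-negative integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    code = bin(value + 1)[2:]             # binary representation of value + 1
    return "0" * (len(code) - 1) + code   # prefix of leading zeros, then the code


def exp_golomb_decode(bits):
    """Decode one codeword from the front of bits; return (value, remaining bits)."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1
    return int(bits[zeros:end], 2) - 1, bits[end:]


stream = "".join(exp_golomb_encode(v) for v in (0, 3, 1))
```

Because no codeword is a prefix of another, the concatenated stream can be parsed back one syntax element at a time without separators.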

Other structural variants of the video encoder 20 can be used to encode video streams. For example, the non-transform-based encoder 20 may directly quantize the residual signal without the transform processing unit 206 for certain blocks or frames. In another embodiment, the encoder 20 may have a quantization unit 208 and an inverse quantization unit 210 combined into a single unit.

Specifically, in this embodiment of the present application, the encoder 20 may be used to implement the video image encoding method described in the following embodiments.

It should be understood that other structural variants of the video encoder 20 can be used to encode the video stream. For example, for some image blocks or image frames, the video encoder 20 may directly quantize the residual signal without processing by the transform processing unit 206, and accordingly without processing by the inverse transform processing unit 212; or, for some image blocks or image frames, the video encoder 20 does not generate residual data, and accordingly no processing by the transform processing unit 206, the quantization unit 208, the inverse quantization unit 210, and the inverse transform processing unit 212 is needed; or, the video encoder 20 may store the reconstructed image block directly as a reference block without processing by the loop filter 220; or, the quantization unit 208 and the inverse quantization unit 210 in the video encoder 20 may be combined together. The loop filter 220 is optional, and for lossless compression coding, the transform processing unit 206, the quantization unit 208, the inverse quantization unit 210, and the inverse transform processing unit 212 are optional. It should be understood that, according to different application scenarios, the inter prediction unit 244 and the intra prediction unit 254 may be selectively activated.

Referring to FIG. 3, FIG. 3 shows a schematic/conceptual block diagram of an example of a decoder 30 for implementing an embodiment of the present application. The video decoder 30 is used to receive, for example, encoded picture data (for example, an encoded bit stream) 21 encoded by the encoder 20 to obtain a decoded picture 231. During the decoding process, video decoder 30 receives video data from video encoder 20, such as an encoded video bitstream and associated syntax elements that represent picture blocks of an encoded video slice.

In the example of FIG. 3, the decoder 30 includes an entropy decoding unit 304, an inverse quantization unit 310, an inverse transform processing unit 312, a reconstruction unit 314 (such as a summer 314), a buffer 316, a loop filter 320, a decoded picture buffer 330, and a prediction processing unit 360. The prediction processing unit 360 may include an inter prediction unit 344, an intra prediction unit 354, and a mode selection unit 362. In some examples, the video decoder 30 may perform a decoding pass that is substantially reciprocal to the encoding pass described with reference to the video encoder 20 of FIG. 2.

The entropy decoding unit 304 is configured to perform entropy decoding on the encoded picture data 21 to obtain, for example, quantized coefficients 309 and/or decoded coding parameters (not shown in FIG. 3), for example, any one or all of (decoded) inter prediction parameters, intra prediction parameters, loop filter parameters, and/or other syntax elements. The entropy decoding unit 304 is further configured to forward the inter prediction parameters, intra prediction parameters, and/or other syntax elements to the prediction processing unit 360. The video decoder 30 may receive syntax elements at the video slice level and/or the video block level.

The inverse quantization unit 310 can be functionally the same as the inverse quantization unit 210, the inverse transform processing unit 312 can be functionally the same as the inverse transform processing unit 212, the reconstruction unit 314 can be functionally the same as the reconstruction unit 214, the buffer 316 can be functionally the same as the buffer 216, the loop filter 320 can be functionally the same as the loop filter 220, and the decoded picture buffer 330 can be functionally the same as the decoded picture buffer 230.

The prediction processing unit 360 may include an inter prediction unit 344 and an intra prediction unit 354. The inter prediction unit 344 may be functionally similar to the inter prediction unit 244, and the intra prediction unit 354 may be functionally similar to the intra prediction unit 254. The prediction processing unit 360 is generally used to perform block prediction and/or obtain a prediction block 365 from the encoded data 21, and to receive or obtain (explicitly or implicitly) prediction-related parameters and/or information about the selected prediction mode, for example, from the entropy decoding unit 304.

When a video slice is coded as an intra-coded (I) slice, the intra prediction unit 354 of the prediction processing unit 360 is used to generate a prediction block 365 for a picture block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or picture. When a video frame is coded as an inter-coded (that is, B or P) slice, the inter prediction unit 344 (for example, a motion compensation unit) of the prediction processing unit 360 is used to generate a prediction block 365 for a video block of the current video slice based on the motion vectors and other syntax elements received from the entropy decoding unit 304. For inter prediction, a prediction block can be generated from one of the reference pictures in a reference picture list. The video decoder 30 may use a default construction technique to construct the reference frame lists, list 0 and list 1, based on the reference pictures stored in the DPB 330.
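The construction of list 0 and list 1 can be sketched under simplifying assumptions. The ordering used here (list 0 prefers earlier pictures, nearest first; list 1 prefers later pictures, nearest first) follows the general approach of HEVC-style default construction but is an assumption of this sketch: this text does not define the default technique, the picture order count (POC) values are hypothetical, and long-term reference pictures are ignored.

```python
def build_reference_lists(current_poc, dpb_pocs):
    """Return (list0, list1) as lists of picture order counts.

    Simplified assumption: pictures are identified by POC only.
    """
    before = sorted((p for p in dpb_pocs if p < current_poc), reverse=True)
    after = sorted(p for p in dpb_pocs if p > current_poc)
    return before + after, after + before


list0, list1 = build_reference_lists(current_poc=4, dpb_pocs=[0, 2, 6, 8])
```

For a bidirectionally predicted block, the first entries of the two lists then typically point to the nearest earlier and the nearest later picture, respectively.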

The prediction processing unit 360 is configured to determine prediction information for the video block of the current video slice by parsing the motion vectors and other syntax elements, and to use the prediction information to generate the prediction block for the current video block being decoded. In an example of the present application, the prediction processing unit 360 uses some of the received syntax elements to determine the prediction mode (for example, intra or inter prediction) used to code the video blocks of the video slice, the inter prediction slice type (for example, B slice, P slice, or GPB slice), construction information for one or more of the reference picture lists for the slice, the motion vector of each inter-coded video block of the slice, the inter prediction status of each inter-coded video block of the slice, and other information, so as to decode the video blocks of the current video slice. In another example of the present disclosure, the syntax elements received by the video decoder 30 from the bitstream include syntax elements in one or more of an adaptive parameter set (APS), a sequence parameter set (SPS), a picture parameter set (PPS), or a slice header.

The inverse quantization unit 310 may be used to inverse quantize (i.e., dequantize) the quantized transform coefficients provided in the bitstream and decoded by the entropy decoding unit 304. The inverse quantization process may include using the quantization parameter calculated by the video encoder 20 for each video block in the video slice to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied.

The inverse transform processing unit 312 is used to apply an inverse transform (for example, an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to transform coefficients so as to generate a residual block in the pixel domain.

The reconstruction unit 314 (for example, the summer 314) is used to add the inverse transform block 313 (that is, the reconstructed residual block 313) to the prediction block 365 to obtain the reconstructed block 315 in the sample domain, for example, by adding the sample values of the reconstructed residual block 313 to the sample values of the prediction block 365.

The loop filter unit 320 (in the coding loop or after the coding loop) is used to filter the reconstructed block 315 to obtain the filtered block 321, so as to smooth pixel transitions or otherwise improve video quality. In one example, the loop filter unit 320 may be used to perform any combination of the filtering techniques described below. The loop filter unit 320 is intended to represent one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or other filters, such as a bilateral filter, an adaptive loop filter (ALF), a sharpening or smoothing filter, or a collaborative filter. Although the loop filter unit 320 is shown as an in-loop filter in FIG. 3, in other configurations the loop filter unit 320 may be implemented as a post-loop filter.

The decoded video block 321 in a given frame or picture is then stored in a decoded picture buffer 330 that stores reference pictures for subsequent motion compensation.

The decoder 30 is used, for example, to output the decoded picture 31 through the output 332 for presentation or viewing by the user.

Other variants of the video decoder 30 can be used to decode the compressed bitstream. For example, the decoder 30 may generate an output video stream without the loop filter unit 320. For example, the non-transform-based decoder 30 may directly inversely quantize the residual signal without the inverse transform processing unit 312 for certain blocks or frames. In another embodiment, the video decoder 30 may have an inverse quantization unit 310 and an inverse transform processing unit 312 combined into a single unit.

Specifically, in the embodiment of the present application, the decoder 30 is used to implement the video image decoding method described in the following embodiments.

It should be understood that other structural variations of the video decoder 30 can be used to decode the encoded video bitstream. For example, the video decoder 30 may generate an output video stream without processing by the filter 320; or, for some image blocks or image frames, the entropy decoding unit 304 of the video decoder 30 does not decode quantized coefficients, and accordingly no processing by the inverse quantization unit 310 and the inverse transform processing unit 312 is needed. The loop filter 320 is optional; and for lossless compression, the inverse quantization unit 310 and the inverse transform processing unit 312 are optional. It should be understood that, depending on the application scenario, the inter prediction unit and the intra prediction unit may be selectively enabled.

It should be understood that in the encoder 20 and the decoder 30 of the present application, the processing result of a given stage may be further processed before being output to the next stage. For example, after stages such as interpolation filtering, motion vector derivation, or loop filtering, operations such as clip or shift are further performed on the processing result of the corresponding stage.

For example, the motion vector of a control point of the current image block derived from the motion vector of an adjacent affine coding block, or the motion vector of a sub-block of the current image block derived in the same way, may undergo further processing, which is not limited in this application. For example, the value range of the motion vector is restricted so that it stays within a certain bit width. Assuming that the allowed bit width of the motion vector is bitDepth, the range of the motion vector is -2^(bitDepth-1) to 2^(bitDepth-1)-1, where the "^" symbol represents exponentiation. If bitDepth is 16, the value range is -32768 to 32767. If bitDepth is 18, the value range is -131072 to 131071. For another example, the values of the motion vectors (for example, the motion vectors MV of the four 4x4 sub-blocks in an 8x8 image block) are restricted so that the maximum difference between the integer parts of the MVs of the four 4x4 sub-blocks does not exceed N pixels, for example, no more than one pixel.
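The bit-width restriction described above can be sketched as follows. This is a minimal illustration, assuming each motion vector is stored as a pair of signed integers; the function names `mv_range` and `clip_mv` are hypothetical and do not come from this application.

```python
def mv_range(bit_depth: int) -> tuple:
    """Return (min, max) of a signed integer that is bit_depth bits wide."""
    return -(1 << (bit_depth - 1)), (1 << (bit_depth - 1)) - 1


def clip_mv(mv: tuple, bit_depth: int = 16) -> tuple:
    """Clamp both components of a motion vector into the allowed range."""
    lo, hi = mv_range(bit_depth)
    return tuple(max(lo, min(hi, component)) for component in mv)


# An out-of-range vector is saturated component by component.
clipped = clip_mv((40000, -40000), bit_depth=16)
```

Clamping (saturation) is only one possible way to keep a vector inside the range; the text above does not say whether clamping or wrap-around is used.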

Refer to FIG. 4, which is a schematic structural diagram of a video coding device 400 (for example, a video encoding device 400 or a video decoding device 400) provided by an embodiment of the present application. The video coding device 400 is suitable for implementing the embodiments described herein. In one embodiment, the video coding device 400 may be a video decoder (for example, the decoder 30 of FIG. 1A) or a video encoder (for example, the encoder 20 of FIG. 1A). In another embodiment, the video coding device 400 may be one or more components of the decoder 30 in FIG. 1A or the encoder 20 in FIG. 1A described above.

The video coding device 400 includes: an ingress port 410 and a receiver unit (Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing data; a transmitter unit (Tx) 440 (or simply transmitter 440) and an egress port 450 for transmitting data; and a memory 460 for storing data. The video coding device 400 may also include optical-to-electrical conversion components and electrical-to-optical (EO) components coupled to the ingress port 410, the receiver unit 420 (or simply receiver 420), the transmitter unit 440, and the egress port 450, serving as the egress or ingress of optical or electrical signals.

The processor 430 is implemented by hardware and software. The processor 430 may be implemented as one or more CPU chips, cores (e.g., multi-core processors), FPGAs, ASICs, and DSPs. The processor 430 communicates with the ingress port 410, the receiver unit 420, the transmitter unit 440, the egress port 450, and the memory 460. The processor 430 includes a coding module 470 (for example, an encoding module 470 or a decoding module 470). The encoding/decoding module 470 implements the embodiments disclosed herein, so as to implement the chroma block prediction method provided in the embodiments of the present application. For example, the encoding/decoding module 470 implements, processes, or provides various coding operations. Therefore, the encoding/decoding module 470 provides a substantial improvement to the functions of the video coding device 400 and affects the transition of the video coding device 400 to different states. Alternatively, the encoding/decoding module 470 is implemented as instructions stored in the memory 460 and executed by the processor 430.

The memory 460 includes one or more magnetic disks, tape drives, and solid-state drives, and can be used as an overflow data storage device, to store programs when these programs are selected for execution, and to store instructions and data read during program execution. The memory 460 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random access memory (SRAM).

Referring to FIG. 5, FIG. 5 is a simplified block diagram of an apparatus 500 that can be used as either or both of the source device 12 and the destination device 14 in FIG. 1A according to an exemplary embodiment. The apparatus 500 can implement the technology of the present application. In other words, FIG. 5 is a schematic block diagram of an implementation of an encoding device or a decoding device (referred to as a coding device 500 for short) according to an embodiment of the application. The coding device 500 may include a processor 510, a memory 530, and a bus system 550. The processor and the memory are connected through the bus system, the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory. The memory of the coding device stores program code, and the processor can call the program code stored in the memory to execute the various video image encoding or decoding methods described in this application, especially the video encoding or decoding methods in the various inter prediction modes or intra prediction modes. To avoid repetition, details are not described here.

In the embodiment of the present application, the processor 510 may be a central processing unit ("CPU" for short), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.

The memory 530 may include a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device can also be used as the memory 530. The memory 530 may include code and data 531 accessed by the processor 510 using the bus 550. The memory 530 may further include an operating system 533 and an application program 535. The application program 535 includes at least one program that allows the processor 510 to execute the video encoding or decoding method described in this application (especially the video image prediction method described in this application). For example, the application program 535 may include applications 1 to N, which further include a video encoding or decoding application (referred to as a video coding application) that executes the video encoding or decoding method described in this application.

In addition to the data bus, the bus system 550 may also include a power bus, a control bus, and a status signal bus. However, for clear description, various buses are marked as the bus system 550 in the figure.

Optionally, the decoding device 500 may further include one or more output devices, such as a display 570. In one example, the display 570 may be a touch-sensitive display that merges the display with a touch-sensitive unit operable to sense touch input. The display 570 may be connected to the processor 510 via the bus 550.

The following describes related technologies used in the inter-frame prediction involved in this application.

1) Merge mode

For the merge mode, a candidate motion list (also referred to as a candidate list) is first constructed based on the motion information of coded blocks that neighbor the current block in the spatial or temporal domain. The candidate motion information with the least rate-distortion cost in the candidate motion list is used as the motion vector predictor (MVP) of the current block, and the index value of the position of the optimal candidate motion information in the candidate motion list (for example, denoted as merge index, the same below) is then passed to the decoding end. The positions of the neighboring blocks and their traversal order are predefined. The rate-distortion cost is calculated by formula (1), where J represents the rate-distortion cost (RD cost), SAD is the sum of absolute differences between the original pixel values and the predicted pixel values obtained through motion estimation using the candidate motion vector predictor, R represents the bit rate, and λ represents the Lagrangian multiplier. The encoding end transmits the index value of the selected motion vector predictor in the candidate motion list to the decoding end. Further, a motion search is performed in the neighborhood centered on the MVP to obtain the actual motion vector of the current block, and the encoding end transmits the difference between the MVP and the actual motion vector (the motion vector difference, i.e., the residual) to the decoding end.

J = SAD + λR    (1)
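As an illustration of how formula (1) can drive candidate selection at the encoding end, the following sketch computes J for each candidate and returns the index with the least cost. It assumes that the predicted samples and the bit cost of each candidate are already available; the names `sad`, `rd_cost`, and `select_best_candidate` and the sample data are hypothetical.

```python
def sad(pred, orig):
    """Sum of absolute differences between predicted and original samples."""
    return sum(abs(p - o) for p, o in zip(pred, orig))


def rd_cost(pred, orig, rate_bits, lam):
    """Rate-distortion cost J = SAD + lambda * R, as in formula (1)."""
    return sad(pred, orig) + lam * rate_bits


def select_best_candidate(candidates, orig, lam):
    """Return the list index of the candidate with the least RD cost.

    candidates: list of (predicted_samples, rate_bits) pairs.
    """
    costs = [rd_cost(pred, orig, bits, lam) for pred, bits in candidates]
    return min(range(len(costs)), key=costs.__getitem__)


orig_block = [10, 20, 30, 40]
cands = [([10, 20, 30, 44], 1), ([10, 20, 30, 40], 6), ([12, 20, 30, 40], 2)]
best_index = select_best_candidate(cands, orig_block, lam=1.0)
```

With λ = 1 the third candidate wins because its small distortion and low rate together give the least J; with λ = 0 only the distortion counts and the second candidate would be chosen.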

The spatial and temporal candidate motion information of the current block is shown in Figure 6. The spatial candidate motion information comes from 5 spatially adjacent blocks (A0, A1, B0, B1, and B2). Referring to Figure 6, if an adjacent block is unavailable (the neighboring block does not exist, the neighboring block is not coded, or the prediction mode adopted by the neighboring block is not an inter prediction mode), the motion information of that neighboring block is not added to the candidate motion list. The temporal candidate motion information of the current block is obtained by scaling the MV of the corresponding block in the reference frame according to the picture order counts (POC) of the reference frame and the current frame. First, it is determined whether the block at position T in the reference frame is available; if not, the block at position C in the reference frame is selected.

The position and traversal order of neighboring blocks in merge mode are also predefined, and the position and traversal order of neighboring blocks may be different in different modes.

As you can see, a list of candidate motions needs to be maintained in the merge mode. Before adding new motion information to the candidate list, it will first check whether the same motion information already exists in the list, and if it does, the motion information will not be added to the list. We call this checking process the pruning of the candidate motion list. List pruning is to prevent the same motion information from appearing in the list and avoid redundant rate-distortion cost calculation.
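The pruning check can be sketched as a duplicate test before insertion. The sketch assumes that each piece of motion information is represented as a comparable tuple (here: list-0 MV, list-0 reference index, list-1 MV, list-1 reference index); this representation and the name `add_with_pruning` are illustrative assumptions.

```python
def add_with_pruning(candidate_list, motion_info):
    """Append motion_info only if identical motion information is not yet listed.

    Returns True if the candidate was added, False if it was pruned.
    """
    if motion_info in candidate_list:
        return False
    candidate_list.append(motion_info)
    return True


merge_list = []
first = ((4, -2), 0, (-4, 2), 0)
second = ((8, 0), 1, (-8, 0), 1)
results = [
    add_with_pruning(merge_list, first),
    add_with_pruning(merge_list, first),   # duplicate, pruned
    add_with_pruning(merge_list, second),
]
```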

2) Merge with motion vector difference (MMVD) method

MMVD makes use of merge candidates. Select one or more candidate motion information from the merge candidate motion list, and then perform motion vector (MV) extended expression based on the candidate motion information. MV expansion expression includes MV starting point, movement step length and movement direction.

Using the existing merge candidate motion list, the selected candidate motion vector is the default merge type (for example, MRG_TYPE_DEFAULT_N). The selected candidate motion vector is the starting point of the MV, in other words, the selected candidate motion vector is used to determine the initial position of the MV. As shown in Table 1, the basic candidate index (Base candidate IDX) indicates which candidate motion vector in the candidate motion list is selected as the optimal candidate motion vector.

Table 1

Base candidate IDX	0	1	2	3
N-th MVP	1st MVP	2nd MVP	3rd MVP	4th MVP

If the number of candidate motion vectors available for selection in the merge candidate motion list is 1, then the Basecandidate IDX may not be determined. Exemplarily, when decoding, the first candidate motion information in the candidate motion list is used as the selected candidate motion information.

The step identifier (Distance IDX) represents the offset distance information of the motion vector. The value of the step size represents the distance from the initial position (for example, the preset distance), and the definition of the preset distance is shown in Table 2.

Table 2

Distance IDX	0	1	2	3	4	5	6	7
Pixel distance	1/4-pel	1/2-pel	1-pel	2-pel	4-pel	8-pel	16-pel	32-pel

The direction IDX (Direction IDX) indicates the direction of the motion vector difference (MVD) based on the initial position. The direction indicator can include four situations in total, see Table 3 for specific definitions.

Table 3

Direction IDX	00	01	10	11
x-axis	+	–	N/A	N/A
y-axis	N/A	N/A	+	–

As shown in Figure 7A, the solid lines indicate the positions in the L0 reference frame and the L1 reference frame pointed to by the bidirectional-prediction motion vector at the motion vector starting point, and the dashed lines indicate the positions pointed to by the motion vector combined with the MVD; the vector difference between the two is the MVD. On the decoding side, the motion vector difference can be determined based on the Distance IDX and the Direction IDX. Referring to Figure 7B, taking the position of the motion vector starting point as the center (dotted dot) as an example, the solid black dots are the positions pointed to by the peripheral offset motion vectors (the motion vector value at the starting point plus the MVD) at one times the distance (shown in Table 2), and the hollow solid-line dots are the positions pointed to by the peripheral offset motion vectors (the motion vector value at the starting point plus the MVD) at twice the distance.

Referring to FIG. 7A and FIG. 7B, the process of determining the predicted pixel value of the current image block according to the MMVD method may include:

First, the MV starting point is determined according to the Base candidate IDX, for example, the hollow dotted dot in the center in Figure 7B and the positions corresponding to the solid lines in Figure 7A, where the solid lines indicate the positions in the L0 reference frame and the L1 reference frame pointed to by the bidirectional-prediction motion vector identified by the Base candidate IDX. Then, the Direction IDX determines in which direction to shift from the MV starting point, and the Distance IDX determines by how many pixels to shift in the direction indicated by the Direction IDX. For example, Direction IDX = 00 and Distance IDX = 2 mean that the motion vector offset by one pixel in the positive x direction is used as the motion vector of the current image block to predict or obtain the predicted pixel values of the current image block. In other words, the motion vector difference (MVD) can be determined based on the Direction IDX and the Distance IDX, and the motion vector predictor identified by the Base candidate IDX is then added to the determined MVD to obtain the motion vector predictor required for decoding.
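The derivation of the MVD from Table 2 and Table 3 can be sketched as two table lookups. The sketch assumes that motion vectors are stored in quarter-pel units (so that the smallest step of Table 2 equals 1); this storage unit and the names `mmvd_offset` and `apply_mmvd` are assumptions made for illustration. It covers a single motion vector only; how the MVD is applied to the second direction in bidirectional prediction is not part of this sketch.

```python
# Table 2 expressed in quarter-pel units: 1/4, 1/2, 1, 2, 4, 8, 16, 32 pel.
DISTANCE_QPEL = [1, 2, 4, 8, 16, 32, 64, 128]

# Table 3: Direction IDX -> (sign on x-axis, sign on y-axis).
DIRECTION_SIGN = {
    0b00: (1, 0),    # +x
    0b01: (-1, 0),   # -x
    0b10: (0, 1),    # +y
    0b11: (0, -1),   # -y
}


def mmvd_offset(distance_idx: int, direction_idx: int) -> tuple:
    """Return the MVD (dx, dy) in quarter-pel units."""
    step = DISTANCE_QPEL[distance_idx]
    sign_x, sign_y = DIRECTION_SIGN[direction_idx]
    return sign_x * step, sign_y * step


def apply_mmvd(base_mv: tuple, distance_idx: int, direction_idx: int) -> tuple:
    """Add the MVD to the starting-point motion vector chosen by the base index."""
    dx, dy = mmvd_offset(distance_idx, direction_idx)
    return base_mv[0] + dx, base_mv[1] + dy


# Direction IDX = 00, Distance IDX = 2: one full pixel (4 quarter-pels) in +x.
example_mvd = mmvd_offset(2, 0b00)
```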

If bidirectional prediction is adopted, the candidate motion information may include a forward motion vector predictor and a backward motion vector predictor. Exemplarily, the forward motion vector predictor and the backward motion vector predictor may be predictors obtained by forward and backward prediction with reference to the two reference frame lists List0 and List1. In addition, the candidate motion information may include the forward and backward motion vector predictors and the picture order counts (POC) corresponding to the forward and backward reference prediction blocks. When the POC corresponding to a reference prediction block is smaller than the POC of the current block, forward prediction is identified; when the POC corresponding to a reference prediction block is greater than the POC of the current block, backward prediction is identified.
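The POC comparison described above can be written as a small helper. The function name `prediction_direction` is hypothetical, and the case of equal POC values is not covered by the text, so the sketch returns `None` for it.

```python
def prediction_direction(ref_poc: int, cur_poc: int):
    """Classify a reference by comparing its POC with the POC of the current picture."""
    if ref_poc < cur_poc:
        return "forward"
    if ref_poc > cur_poc:
        return "backward"
    return None  # equal POC: not addressed in the description


directions = [prediction_direction(4, 8), prediction_direction(12, 8)]
```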

In addition, it should be noted that in this application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, A and/or B can mean: A exists alone, A and B exist at the same time, or B exists alone, where A and B can be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship. "At least one of the following items" or similar expressions refer to any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can each be single or multiple. It should be understood that although the terms first, second, etc. may be used in the embodiments of the present application to describe objects (such as a motion vector predictor, a reference prediction block, etc.), these terms are only used to distinguish the objects from each other.

When the existing inter-frame prediction adopts the MMVD mode, in the case of bidirectional prediction, the encoding and decoding accuracy is low due to insufficient use of the matching relationship between the forward and backward reference prediction blocks.

Based on this, the embodiments of the present application provide a video image prediction method and device. When it is determined that MMVD is used for decoding, if bidirectional prediction is used, a motion vector correction (or motion vector refinement) method can be combined with it. For example, a decoder-side motion vector refinement (DMVR) method is used to correct the two bidirectional motion vector predictors, and the decoding operation is then performed based on the corrected motion vector predictors, thereby improving decoding accuracy. The method and the device are based on the same inventive concept. Since the method and the device solve the problem on similar principles, their implementations can be referred to each other, and repeated details are not described again.

The embodiments of the present application illustrate the following two methods, using the MMVD method combined with the motion vector correction method for prediction:

The first possible implementation manner: After the initial motion vector predictors are determined based on the candidate index, in the case of bidirectional prediction, the initial forward and backward motion vector predictors are corrected based on the motion vector correction method, and the current image block to be processed is decoded based on the motion vector predictors obtained by combining the corrected motion vector predictors with the MVD.

The second possible implementation manner: After the initial motion vector predictor is determined based on the candidate index, in the case of bidirectional prediction, the initial forward and backward motion vector predictor is corrected based on the motion vector correction method, based on the corrected motion The motion vector predictor combined with the vector predictor and MVD decodes the previous block of the image to be processed.

The above two implementation manners provided by the present application will be described in detail below with reference to the accompanying drawings. In an example, the two implementation manners described above may be specifically executed by a video codec device, a video codec, a video codec system, and other devices with a video codec function. The two implementation manners described above can occur both in the encoding process and the decoding process. More specifically, the two implementation manners described above can occur in the inter-frame prediction process during encoding and decoding.

Refer to FIG. 8 for a schematic flowchart of a first possible implementation manner in the video image prediction provided by this application.

S801: Determine the first initial motion vector prediction value, the second initial motion vector prediction value, the first motion vector difference, and the second motion vector difference of the current image block to be processed.

In an example, the first initial motion vector prediction value corresponds to the initial motion vector prediction value in a first direction (for example, forward), and correspondingly, the second initial motion vector prediction value corresponds to the initial motion vector prediction value in a second direction (for example, backward); this application does not limit this. For example, referring to FIG. 9, the current image to which the current image block to be processed belongs in the embodiment of the present application has two reference images, one before and one after it, which are the first reference image (such as the forward reference image) and the second reference image (such as the backward reference image) respectively. That is, the first initial motion vector prediction value may be the initial forward motion vector prediction value in the forward prediction direction, and the second initial motion vector prediction value may be the initial backward motion vector prediction value in the backward prediction direction. The following takes the initial forward motion vector predictor and the initial backward motion vector predictor as examples for description.

S802. Perform a motion vector correction process according to the first initial motion vector predicted value and the second initial motion vector predicted value to obtain a first corrected motion vector predicted value and a second corrected motion vector predicted value.

S803. Determine a first motion vector prediction value according to the first modified motion vector prediction value and the first motion vector difference, and determine a second motion vector prediction value according to the second modified motion vector prediction value and the second motion vector difference.

Exemplarily, when step S803 is implemented, the sum of the first modified motion vector prediction value and the first motion vector difference may be used as the first motion vector prediction value, and the sum of the second modified motion vector prediction value and the second motion vector difference may be used as the second motion vector prediction value.

S804: Predict the current image block to be processed according to the first motion vector prediction value and the second motion vector prediction value.
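The order of steps S802 and S803 (refine first, then add the MVD) can be sketched as follows. This is not an implementation of DMVR: the refinement is passed in as a callable, and `toy_refine` is a hypothetical stand-in that merely shifts the two vectors by a fixed mirrored delta so that the data flow can be followed. All function names are assumptions made for illustration.

```python
def add_mv(a, b):
    """Component-wise sum of two motion vectors."""
    return a[0] + b[0], a[1] + b[1]


def mmvd_with_refinement(init_mvp0, init_mvp1, mvd0, mvd1, refine):
    """Return the two motion vector prediction values used for prediction in S804.

    init_mvp0/init_mvp1 and mvd0/mvd1 are the outputs of S801;
    refine is any callable that maps the two initial predictors to corrected ones.
    """
    # S802: correct the two initial predictors jointly.
    refined0, refined1 = refine(init_mvp0, init_mvp1)
    # S803: add the motion vector differences to the corrected predictors.
    return add_mv(refined0, mvd0), add_mv(refined1, mvd1)


def toy_refine(mv0, mv1):
    """Placeholder refinement (assumption): apply a fixed, mirrored delta.

    A real refinement would search around mv0/mv1 by comparing the two
    reference blocks; that search is outside the scope of this sketch.
    """
    delta = (1, -1)
    return add_mv(mv0, delta), (mv1[0] - delta[0], mv1[1] - delta[1])


final_mvp0, final_mvp1 = mmvd_with_refinement((8, 4), (-8, -4), (4, 0), (-4, 0), toy_refine)
```

Passing the refinement as a parameter keeps the sketch independent of any particular search strategy.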

It should be noted that, in the embodiment of the present application, the current image block to be processed may be the current block or a sub-block obtained by dividing the current block. For example, during encoding or decoding, the image to be processed is divided into 16×16 image blocks for encoding and decoding. When the image prediction method provided in this embodiment of the application is executed, inter-frame prediction can be performed on the 16×16 image blocks. Of course, a 16×16 image block can also be further divided, for example, into 16 4×4 sub-blocks, and the image prediction method provided in the embodiment of the present application is applied to each sub-block to perform inter-frame prediction. When the sub-block approach is adopted, the forward and backward motion vector predictors and the forward and backward motion vector differences of the sub-blocks belonging to the same image block are the same, but the forward and backward motion vector predictors of the sub-blocks after refinement generally differ.

The execution of the solution provided in the embodiments of the application may have a trigger condition. For example, execution of step S801 starts only after it is determined that the current image block to be processed uses the MMVD method for inter-frame prediction; that is, the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed are then determined. When it is determined that the MMVD method is not adopted, other methods can be used for inter-frame prediction.

For example, on the decoding side, before the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed are determined, a first flag (such as mmvd_flag[x0][y0]) is parsed from the code stream. When the first flag indicates that the merge with motion vector difference (MMVD) method is used for inter-frame prediction of the current image block to be processed, the determination of the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed is performed.

In determining the first initial motion vector prediction value, the second initial motion vector prediction value, the first motion vector difference, and the second motion vector difference of the current image block to be processed, various methods can be used, for example, the following Method 1 or Method 2 to obtain.

Method 1:

A candidate list is constructed according to the motion information of neighboring blocks of the current image block to be processed, and one piece of candidate motion information is selected from the candidate list as the predicted motion information of the current image block to be processed. As shown in Figure 6, for example, the motion vector predictor of the neighboring block A0 is selected as the predicted motion information of the current image block. Specifically, the forward motion vector of A0 is used as the forward predicted motion vector of the current block, and the backward motion vector of A0 is used as the backward predicted motion vector of the current block.

It should be understood that the constructed candidate motion vector list may include multiple pieces of candidate motion information, or may include only one. When multiple pieces of candidate motion information are included, the candidate index is included in the code stream, so that during decoding, before the candidate motion information of the current block to be processed is determined, the candidate index is parsed from the code stream, and the corresponding candidate motion information is determined from the candidate list according to the candidate index. The candidate motion information corresponding to the candidate index includes a third motion vector predictor and a fourth motion vector predictor. The third motion vector predictor is used as the first initial motion vector predictor, and the fourth motion vector predictor is used as the second initial motion vector predictor. When the candidate list includes multiple pieces of candidate motion information, during encoding, when candidate motion information is selected for the current image block to be processed, the candidate motion information with the least rate-distortion cost in the candidate motion list can be used as the motion vector predictor of the current block.

When the candidate list includes only one piece of candidate motion information, the candidate index may be omitted from the code stream during encoding, and the third motion vector predictor and the fourth motion vector predictor included in the candidate motion information at the first position in the candidate list are determined to be the first initial motion vector predictor and the second initial motion vector predictor, respectively.

It should be understood that, even when the candidate list includes multiple pieces of candidate motion information, if it is determined during encoding that the candidate motion information at the first position in the candidate list is used as the candidate motion information of the current image block to be processed, the candidate index may also be omitted from the code stream. In that case, when the code stream does not include a candidate index during decoding, the third motion vector predictor and the fourth motion vector predictor included in the candidate motion information at the first position in the candidate list can be directly determined to be the first initial motion vector predictor and the second initial motion vector predictor, respectively.
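As a non-limiting illustration, the selection rule of Method 1 can be sketched in Python as follows. The function name `select_initial_mvps`, the tuple layout of a candidate, and the second candidate's values are assumptions made only for this sketch; they are not syntax or data from this application.

```python
from typing import List, Optional, Tuple

MV = Tuple[int, int]
# One candidate: (third MVP, fourth MVP), i.e. (forward, backward) predictors.
Candidate = Tuple[MV, MV]


def select_initial_mvps(candidates: List[Candidate],
                        parsed_index: Optional[int] = None) -> Candidate:
    """Return (first initial MVP, second initial MVP).

    parsed_index is the candidate index parsed from the code stream, or None
    when no index was signalled; in that case the candidate at the first
    position in the list is used.
    """
    if not candidates:
        raise ValueError("candidate list is empty")
    index = 0 if parsed_index is None else parsed_index
    if not 0 <= index < len(candidates):
        raise ValueError("candidate index out of range")
    return candidates[index]


# Hypothetical candidate list with two entries.
candidates = [((-22, 18), (2, 12)), ((4, -3), (-4, 3))]
first_mvp, second_mvp = select_initial_mvps(candidates)         # no index signalled
first_mvp_b, second_mvp_b = select_initial_mvps(candidates, 1)  # index parsed
```

Passing `None` models the case in which the candidate index is not written to the code stream and the first position is used by default.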

Method 2: A candidate list is constructed based on the motion information of the neighboring blocks of the current image block to be processed, using the MVs of neighboring blocks that have previously been encoded or decoded. A neighboring block may have undergone motion vector refinement processing according to the method provided in this application, or may have been processed according to the conventional technology (without motion vector refinement). In the candidate list, the candidate motion information corresponding to a candidate index may include only the original candidate motion information (candidate motion information obtained without motion vector refinement processing), or may include both the original candidate motion information and the refined candidate motion information. For example, as shown in Figure 10A, L0 represents the first list (list0) and L1 represents the second list (list1). For candidate index 0, (mvL0_A, ref0) and (mvL1_A, ref1) represent the refined candidate motion information, and (mvL0_D, ref0) and (mvL1_D, ref1) represent the original candidate motion information. For candidate index 1, (mvL0_B, ref0) and (mvL1_B, ref1) represent the refined candidate motion information, and (mvL0_C, ref0) and (mvL1_C, ref1) represent the original candidate motion information.

When the candidate list includes multiple candidates, the code stream includes a candidate index. During decoding, before the candidate motion information of the current image block to be processed is determined, the candidate index is parsed from the code stream, and the corresponding candidate is determined from the candidate list according to the candidate index. The candidate includes two pieces of candidate motion information, such as first candidate motion information (refined candidate motion information) and second candidate motion information (non-refined candidate motion information). For example, as shown in FIG. 10A, when the candidate index is 1, the first candidate motion information includes (mvL0_B, ref0) and (mvL1_B, ref1), and the second candidate motion information includes (mvL0_C, ref0) and (mvL1_C, ref1). In one example, the two motion vector predictors included in the refined candidate motion information may be selected as the first initial motion vector predictor and the second initial motion vector predictor. In another example, the two motion vector predictors included in the non-refined candidate motion information may be selected as the first initial motion vector predictor and the second initial motion vector predictor. In yet another example, it may be determined whether the image block to which the candidate belongs and the current image block to be processed belong to different images. When they belong to different images, the two motion vector predictors included in the refined candidate motion information may be selected as the first initial motion vector predictor and the second initial motion vector predictor. When they belong to the same image, the two motion vector predictors included in the non-refined candidate motion information may be selected as the first initial motion vector predictor and the second initial motion vector predictor.

It should be noted that determining whether the image block to which the candidate belongs and the current image block to be processed belong to different images means determining whether the position of the image block (or source image block) to which the candidate (that is, the first candidate motion information or the second candidate motion information) belongs is located in the image where the current image block to be processed is located. For example, referring to Figure 10B, when the candidate motion information (first candidate motion information or second candidate motion information) corresponding to the candidate is motion information from the T1 pixel position of a temporal neighboring block of the current image block to be processed, the T1 pixel position is outside the image where the current image block to be processed is located. When the candidate motion information corresponding to the candidate is motion information from the A0 pixel position of a spatial neighboring block of the current image block to be processed, the A0 pixel position is located in the image where the current image block to be processed is located.

When the candidate list includes only one candidate, the candidate index may be omitted from the code stream during encoding, and the candidate at the first position in the candidate list is determined. The candidate includes two pieces of candidate motion information, such as first candidate motion information (refined, or corrected, candidate motion information) and second candidate motion information (non-refined candidate motion information). In one example, the two motion vector predictors included in the refined candidate motion information may be selected as the first initial motion vector predictor and the second initial motion vector predictor. In another example, the two motion vector predictors included in the non-refined candidate motion information may be selected as the first initial motion vector predictor and the second initial motion vector predictor. In yet another example, it may be determined whether the image block to which the candidate belongs and the current image block to be processed belong to different images. When they belong to different images, the two motion vector predictors included in the refined candidate motion information may be selected as the first initial motion vector predictor and the second initial motion vector predictor. When they belong to the same image, the two motion vector predictors included in the non-refined candidate motion information may be selected as the first initial motion vector predictor and the second initial motion vector predictor.
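As a non-limiting illustration, the third selection rule of Method 2 (refined motion information for a candidate from a different image, non-refined motion information otherwise) can be sketched in Python as follows. The class `CandidateEntry`, its field names, and the numeric values are hypothetical and serve only this sketch.

```python
from dataclasses import dataclass
from typing import Tuple

MV = Tuple[int, int]


@dataclass(frozen=True)
class CandidateEntry:
    refined: Tuple[MV, MV]     # (forward, backward) MVPs after refinement
    original: Tuple[MV, MV]    # (forward, backward) MVPs without refinement
    from_other_picture: bool   # True for a temporal neighbor such as T1


def pick_initial_mvps(entry: CandidateEntry) -> Tuple[MV, MV]:
    """Return (first initial MVP, second initial MVP) for one candidate.

    Refined MVPs are used when the candidate's source block lies in a
    different image than the current block; otherwise the non-refined
    MVPs are used.
    """
    return entry.refined if entry.from_other_picture else entry.original


# Hypothetical values; only the selection logic matters here.
temporal = CandidateEntry(refined=((-20, 18), (0, 12)),
                          original=((-22, 18), (2, 12)),
                          from_other_picture=True)
spatial = CandidateEntry(refined=((5, 1), (-5, -1)),
                         original=((4, 1), (-4, -1)),
                         from_other_picture=False)
mvps_temporal = pick_initial_mvps(temporal)  # refined pair
mvps_spatial = pick_initial_mvps(spatial)    # non-refined pair
```

The first two examples in the text (always refined, or always non-refined) correspond to returning `entry.refined` or `entry.original` unconditionally.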

It should be understood that Method 1 and Method 2 above are only two specific examples of obtaining the predicted motion information of the image block. This application does not limit the method of obtaining the predicted motion information of the image block, and any method that can obtain the predicted motion information of the image block falls within the protection scope of this application.

As an example, Table 4 shows a partial syntax structure for parsing the inter-frame prediction mode used by the current image block to be processed (including parsing the first identifier and the candidate index).

Table 4

In Table 4, mmvd_flag[x0][y0] corresponds to the first identifier. mmvd_merge_flag[x0][y0] in Table 4 may also be called mmvd_merge_idx[x0][y0], and indicates the base candidate index selected from the MMVD candidate motion vector list; mmvd_merge_flag[x0][y0] or mmvd_merge_idx[x0][y0] corresponds to the candidate index mentioned in the embodiments of this application. mmvd_distance_idx[x0][y0] indicates the distance index of the offset from the initial position. mmvd_direction_idx[x0][y0] indicates the direction of the MVD relative to the initial position.

It should be noted that the first motion vector difference and the second motion vector difference mentioned in the embodiment of the present application may be determined according to mmvd_distance_idx[x0][y0] and mmvd_direction_idx[x0][y0].

As an example, step S802 may be subject to a start condition. For example, when the image block to which the candidate motion information selected from the candidate list belongs and the current image block to be processed belong to different images, the motion vector correction process is performed according to the first initial motion vector predictor and the second initial motion vector predictor. When the image block to which the candidate motion information selected from the candidate list belongs and the current image block to be processed belong to the same image, non-refinement processing can be performed directly. Specifically, the first target motion vector predictor is determined according to the first initial motion vector predictor and the first motion vector difference, and the second target motion vector predictor is determined according to the second initial motion vector predictor and the second motion vector difference; the current image block to be processed is then predicted according to the first target motion vector predictor and the second target motion vector predictor.
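As a non-limiting illustration, the start condition can be sketched in Python as follows. The sketch assumes that, in this implementation, the motion vector difference is added after the correction process; the function names and the `refine` callback are hypothetical, and the correction process itself is left abstract.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]
# A refinement step maps a pair of MVPs to a pair of corrected MVPs.
Refiner = Callable[[MV, MV], Tuple[MV, MV]]


def add(mv: MV, mvd: MV) -> MV:
    """Component-wise sum of a motion vector predictor and an MVD."""
    return (mv[0] + mvd[0], mv[1] + mvd[1])


def target_mvps(mvp0: MV, mvp1: MV, mvd0: MV, mvd1: MV,
                candidate_in_other_picture: bool,
                refine: Refiner) -> Tuple[MV, MV]:
    """Return (first target MVP, second target MVP).

    The correction process runs only when the selected candidate comes
    from a different image than the current block (start condition);
    otherwise each MVD is added directly to its initial MVP.
    Assumption of this sketch: the MVD is added after the correction.
    """
    if candidate_in_other_picture:
        mvp0, mvp1 = refine(mvp0, mvp1)
    return add(mvp0, mvd0), add(mvp1, mvd1)


# Same-image candidate: the refiner is never called.
no_refine = target_mvps((4, 1), (-4, -1), (1, 0), (-1, 0), False,
                        lambda a, b: (a, b))
```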

There may be multiple ways to perform the correction process in step S802. The following describes two possible motion vector correction methods as examples. Other possible motion vector correction methods can also be used in the embodiments of this application and are not described in detail here.

The first possible example:

A1: Determine the predicted motion information of the current image block to be processed. The predicted motion information includes the initial forward motion vector predicted value and the initial backward motion vector predicted value.

A2: According to the initial forward motion vector prediction value of the current image block to be processed, the forward reference prediction block of the current block is obtained in the forward reference image by the motion compensation method.

A3: According to the initial backward motion vector predictor of the current image block to be processed, the backward reference prediction block of the current image block to be processed is obtained from the backward reference image by the motion compensation method.

A4: Determine the difference between the forward reference prediction block obtained in A2 and the backward reference prediction block obtained in A3 according to the pixel values of the two blocks.

A5: In the forward reference image described in A2, the forward reference prediction block obtained in A2 is used as the starting point to perform a motion search in integer-pixel or sub-pixel steps, where the sub-pixel may be 1/2 pixel, 1/4 pixel, 1/8 pixel, 1/16 pixel, and so on. In this example, a motion search with an integer-pixel step is performed to obtain at least one forward reference prediction block of the currently decoded block.

As shown in FIG. 11, the (0,0) position is the search starting point. The search is performed at the 8 integer-pixel step search points around the search starting point to obtain 8 forward reference prediction blocks. In the embodiments of the present application, the search method used is not limited, and any search method may be used.

A6: Similar to A5, in the backward reference image described in A3, the backward reference prediction block obtained in A3 is used as the starting point to perform a motion search with an integer-pixel step to obtain 8 backward reference prediction blocks.

A7: For each integer-pixel step search point in A5 and A6, calculate the difference between the corresponding forward reference prediction block and backward reference prediction block to obtain 8 difference values, and determine the smallest difference among these 8 difference values and the difference between the forward and backward reference prediction blocks at the search starting point. The forward and backward reference prediction blocks corresponding to the smallest difference are the best forward reference prediction block and the best backward reference prediction block.
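As a non-limiting illustration, one round of steps A2 to A7 with integer-pixel steps can be sketched in Python as follows. Several points are assumptions of this sketch rather than statements of this application: the backward search offset mirrors the forward offset (as in the Figure 12 example), the difference is measured by the sum of absolute differences, reference images are plain lists of rows, and out-of-range coordinates are clamped to stand in for reference padding. All function names are hypothetical.

```python
from typing import List, Tuple

MV = Tuple[int, int]
Block = List[List[int]]

# The 8 integer-pixel search points around the starting point (0, 0).
OFFSETS = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]


def fetch_block(ref: Block, x: int, y: int, mv: MV, w: int, h: int) -> Block:
    """Integer-pixel motion compensation: copy the w x h block at (x, y) + mv.

    Coordinates are clamped to the image (assumed padding behavior).
    """
    rows, cols = len(ref), len(ref[0])
    return [[ref[min(max(y + mv[1] + j, 0), rows - 1)]
                [min(max(x + mv[0] + i, 0), cols - 1)]
             for i in range(w)]
            for j in range(h)]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute pixel differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def search_step(ref0: Block, ref1: Block, x: int, y: int, w: int, h: int,
                mv0: MV, mv1: MV) -> Tuple[MV, MV, int]:
    """Return (best forward MV, best backward MV, smallest difference)."""
    def pair_cost(c0: MV, c1: MV) -> int:
        return sad(fetch_block(ref0, x, y, c0, w, h),
                   fetch_block(ref1, x, y, c1, w, h))

    best = (mv0, mv1, pair_cost(mv0, mv1))   # A2 to A4: starting pair
    for dx, dy in OFFSETS:                   # A5 and A6: 8 surrounding points
        c0 = (mv0[0] + dx, mv0[1] + dy)
        c1 = (mv1[0] - dx, mv1[1] - dy)      # mirrored offset (assumption)
        cost = pair_cost(c0, c1)
        if cost < best[2]:                   # A7: keep the smallest difference
            best = (c0, c1, cost)
    return best
```

Because the starting pair is evaluated first and replaced only by a strictly smaller difference, the starting motion vectors are kept when no surrounding pair matches better.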

Through the first possible example, the first initial motion vector prediction value corresponds to the first reference prediction block, and the second initial motion vector prediction value corresponds to the second reference prediction block. When performing motion vector refinement processing, a first modified reference prediction block is determined according to the first reference prediction block, and a second modified reference prediction block is determined according to the second reference prediction block.

Since in bidirectional prediction, the first reference prediction block and the second reference prediction block appear in pairs, for the convenience of description, the first reference prediction block and the second reference prediction block are referred to as a first reference prediction block pair.

In the first possible example, the motion vector correction process uses the reference prediction block pair formed by the two reference prediction blocks as the search starting point, and searches the surrounding reference prediction block pairs for the reference prediction block pair with the smallest difference.

Exemplarily, when the first modified reference prediction block is determined according to the first reference prediction block, and the second modified reference prediction block is determined according to the second reference prediction block, it may be implemented in the following manner:

B1, performing a motion search according to the first reference prediction block pair to obtain at least one second reference prediction block pair.

The first reference prediction block pair includes the first reference prediction block and the second reference prediction block. Each second reference prediction block pair includes a third reference prediction block and a fourth reference prediction block, where the third reference prediction block is obtained by performing a motion search in a first preset area based on the first reference prediction block, and the fourth reference prediction block is obtained by performing a motion search in a second preset area based on the second reference prediction block.

Exemplarily, when the motion search is performed according to the first reference prediction block pair in B1, the search may be performed in integer-pixel or sub-pixel steps to obtain at least one second reference prediction block pair.

The sub-pixels can be 1/2 pixels, 1/4 pixels, 1/8 pixels, or 1/16 pixels.

B2. Determine the difference between the third reference prediction block and the fourth reference prediction block included in each second reference prediction block pair in the at least one second reference prediction block pair;

Specifically, when the third reference prediction block is compared with the fourth reference prediction block, the sum of the absolute values of the pixel differences between the two image blocks may be used as the difference value between the third reference prediction block and the fourth reference prediction block. Optionally, the sum of the squares of the pixel differences between the two image blocks may be used as the difference value between the third reference prediction block and the fourth reference prediction block. The method of comparing the difference is not specifically limited.

B3. Determine the reference prediction block pair with the smallest difference among the at least one second reference prediction block pair.

B4. When it is determined that the difference between the third reference prediction block and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference is smaller than the difference between the first reference prediction block and the second reference prediction block, perform a motion search according to the second reference prediction block pair with the smallest difference to obtain at least one third reference prediction block pair;

B5. When it is determined that the difference between the fifth reference prediction block and the sixth reference prediction block included in the third reference prediction block pair with the smallest difference is smaller than the difference between the third reference prediction block and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference, determine the fifth reference prediction block included in the third reference prediction block pair with the smallest difference to be the first modified reference prediction block, and determine the sixth reference prediction block included in the third reference prediction block pair with the smallest difference to be the second modified reference prediction block.

In addition, as an example, when it is determined that the difference between the fifth reference prediction block and the sixth reference prediction block included in the third reference prediction block pair with the smallest difference is smaller than the difference between the third reference prediction block and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference, the motion search may continue with the third reference prediction block pair with the smallest difference as the search starting point, until the number of searches reaches a preset threshold or the searched position exceeds the search area.

As an example, after the reference prediction block pair with the smallest difference among the at least one second reference prediction block pair is determined in B3, if it is determined that the difference between the third reference prediction block and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference is greater than the difference between the first reference prediction block and the second reference prediction block, the first reference prediction block is determined to be the first modified reference prediction block, and the second reference prediction block is determined to be the second modified reference prediction block.

As another example, after the at least one third reference prediction block pair is obtained in B4, if it is determined that the difference between the fifth reference prediction block and the sixth reference prediction block included in the third reference prediction block pair with the smallest difference is greater than the difference between the third reference prediction block and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference, the third reference prediction block included in the second reference prediction block pair with the smallest difference is determined to be the first modified reference prediction block, and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference is determined to be the second modified reference prediction block.
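As a non-limiting illustration, the iterative pair search of B1 to B5, together with the two difference measures mentioned for B2, can be sketched in Python as follows. The sketch assumes mirrored offsets for the two blocks of a pair, stops when the best new pair is not strictly better than the current pair (the case of equal differences is not specified in the text), and receives the difference measure as a callback on motion vector pairs. The names `refine_pair`, `max_rounds`, and `search_range` are hypothetical.

```python
from typing import Callable, List, Tuple

MV = Tuple[int, int]
Block = List[List[int]]

# The 8 integer-pixel search points around the current starting point.
OFFSETS = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]


def sad(a: Block, b: Block) -> int:
    """Sum of the absolute values of the pixel differences."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def ssd(a: Block, b: Block) -> int:
    """Sum of the squares of the pixel differences."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def refine_pair(cost: Callable[[MV, MV], int], mv0: MV, mv1: MV,
                max_rounds: int = 2, search_range: int = 2) -> Tuple[MV, MV, int]:
    """Repeat the pair search until no improvement, the preset number of
    rounds is reached, or every new position leaves the search area."""
    start0 = mv0
    best_cost = cost(mv0, mv1)               # difference of the first pair
    for _ in range(max_rounds):
        found = []
        for dx, dy in OFFSETS:               # B1 / B4: search around the pair
            c0 = (mv0[0] + dx, mv0[1] + dy)
            if max(abs(c0[0] - start0[0]), abs(c0[1] - start0[1])) > search_range:
                continue                     # position exceeds the search area
            c1 = (mv1[0] - dx, mv1[1] - dy)  # mirrored offset (assumption)
            found.append((cost(c0, c1), c0, c1))   # B2: difference per pair
        if not found:
            break
        round_cost, c0, c1 = min(found)      # B3: pair with smallest difference
        if round_cost >= best_cost:
            break                            # keep the previous pair
        best_cost, mv0, mv1 = round_cost, c0, c1
    return mv0, mv1, best_cost
```

In a codec, `cost` would compare the two motion-compensated reference prediction blocks with `sad` or `ssd`; passing it as a callback keeps the sketch independent of how the blocks are fetched.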

The following describes, with reference to an example, the process of MMVD combined with motion vector refinement processing provided by the embodiments of the present application.

As shown in Figure 12, the predicted motion information of the current image block to be processed is obtained. Assume that the forward and backward motion vector prediction values of the current image block to be processed are MV0(-22, 18) and MV1(2, 12), respectively, and that the forward and backward motion vector differences are MVD0(1, 0) and MVD1(-1, 0), respectively.

The forward and backward prediction is performed on the current image block to be processed to obtain the forward prediction block and the backward prediction block of the current image block to be processed.

MV0(-22, 18) and MV1(2, 12) are taken as the reference input of the forward and backward motion vector prediction values, and a first-precision motion search is performed on the forward reference prediction block q0 and the backward reference prediction block h0. For example, the first precision is 1 pixel.

The forward and backward reference prediction blocks q0 and h0 are used as the search starting points for the first-precision motion search, and the difference between the new forward and backward reference prediction blocks obtained at each search point is determined, for example, the differences of the 8 forward and backward reference prediction block pairs around q0 and h0, together with the difference between the forward reference prediction block q0 and the backward reference prediction block h0. Assume that the motion vector prediction values of the forward and backward reference prediction blocks with the smallest difference are (-21, 18) and (1, 12), respectively. The updated search points (-21, 18) and (1, 12) correspond to the forward reference prediction block q1 and the backward reference prediction block h1, respectively, and the first-precision motion search continues. The forward and backward reference prediction blocks q1 and h1 are used as the search starting points for the first-precision motion search, and the difference between the forward and backward reference prediction blocks obtained at each search point is determined, for example, the differences of the 8 forward and backward reference prediction block pairs around q1 and h1, together with the difference between the forward reference prediction block q1 and the backward reference prediction block h1. Assume that the motion vector prediction values of the forward and backward reference prediction blocks with the smallest difference are (-20, 18) and (0, 12), respectively. Then (-20, 18) and (0, 12) correspond to the forward reference prediction block q2 and the backward reference prediction block h2, respectively.

In the embodiments of the present application, the number of first-precision motion searches can be configured, for example, once or twice. Alternatively, a motion search range can be determined, and the search stops when the range is exceeded.

Taking two searches as an example, the motion vector prediction value (-20, 18) of the forward reference prediction block q2 and MVD0(1, 0) are summed to obtain (-19, 18), and the motion vector prediction value (0, 12) of the backward reference prediction block h2 and MVD1(-1, 0) are summed to obtain (-1, 12). The current image block to be processed is therefore predicted based on the forward motion vector predictor (-19, 18) and the backward motion vector predictor (-1, 12). Figure 12 shows only one possible motion search process.

It should be noted that when the first-precision motion search is performed on the forward reference prediction block and the backward reference prediction block, the first precision can be any set precision, for example, it can be 1 pixel precision or 1/2 pixel precision or 1/4 pixel accuracy or 1/8 pixel accuracy, etc.

The second possible example:

C1: Obtain the predicted motion information of the current image block to be processed (including the initial forward motion vector predicted value and the initial backward motion vector predicted value);

C2: According to the initial forward motion vector prediction value of the current image block to be processed, the forward reference prediction block of the current block is obtained in the forward reference image by the motion compensation method.

C3: According to the initial backward motion vector predictor of the current block, the backward reference prediction block of the current image block to be processed is obtained from the backward reference image by the motion compensation method.

C4: According to the pixel value of the forward reference prediction block obtained by C2 and the pixel value of the backward reference prediction block obtained by C3, the pixel value of the template matching block is obtained by a weighting method.

C5: In the forward reference image described in C2, perform a motion search with an integer-pixel step. It should be pointed out that, regardless of whether the search starting point is an integer-pixel position (the starting point may be an integer-pixel position or a sub-pixel position, such as 1/2, 1/4, 1/8, or 1/16 pixel), the motion search is performed with an integer-pixel step to obtain at least one forward reference prediction block of the currently decoded block.

As shown in FIG. 11, the (0,0) position is the search starting point. The search is performed at the 8 integer-pixel step search points around the search starting point to obtain the corresponding prediction blocks. In the embodiments of the present application, the search method used is not limited, and any search method may be used. The matching error between each forward reference prediction block and the template matching block described in C4 is calculated, and the forward motion vector predictor corresponding to the forward reference prediction block with the smallest matching error is selected as the optimal forward motion vector predictor. The matching error can be calculated using the SAD criterion.

C6: Similar to C5, in the backward reference image described in C3, perform a motion search with an integer-pixel step. Regardless of whether the search starting point is an integer-pixel position (the starting point may be an integer-pixel position or a sub-pixel position, such as 1/2, 1/4, 1/8, or 1/16 pixel), the motion search is performed with an integer-pixel step to obtain at least one backward reference prediction block of the currently decoded block. The matching error between each backward reference prediction block and the template matching block described in C4 is calculated, and the backward motion vector predictor corresponding to the backward reference prediction block with the smallest matching error is selected as the optimal backward motion vector predictor.

In this embodiment of the present application, the motion vector refinement process provided in the second possible example above may be performed multiple times to complete image prediction. Exemplarily, taking two searches as an example, after the optimal forward motion vector predictor and the optimal backward motion vector predictor are determined through C1 to C6, the optimal forward motion vector predictor and the optimal backward motion vector predictor are used as the search starting points for a further motion search. Each forward reference prediction block found by the search is compared with the template matching block, and the forward motion vector predictor corresponding to the forward reference prediction block with the smallest matching error is selected as the optimal forward motion vector predictor of the second search. The optimal backward motion vector predictor of the second search is obtained in the same way.

As an example, the template matching block used in multiple searches may remain the same, or may be updated. For example, in the second search, the forward reference prediction block of the current image block to be processed is obtained in the forward reference image by the motion compensation method according to the optimal forward motion vector predictor of the first search, and the backward reference prediction block of the current image block to be processed is obtained in the backward reference image by the motion compensation method according to the optimal backward motion vector predictor of the first search. The pixel values of the forward and backward reference prediction blocks obtained by motion compensation are then weighted to obtain the latest template matching block.
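As a non-limiting illustration, steps C1 to C6 and the repeated search with an optionally updated template can be sketched in Python as follows. Several points are assumptions of this sketch: the weighting in C4 is an equal-weight average with rounding, the search starting point itself is kept as a candidate, reference images are plain lists of rows, and out-of-range coordinates are clamped to stand in for reference padding. All function and parameter names are hypothetical.

```python
from typing import List, Tuple

MV = Tuple[int, int]
Block = List[List[int]]

# The 8 integer-pixel search points around the search starting point.
OFFSETS = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]


def fetch_block(ref: Block, x: int, y: int, mv: MV, w: int, h: int) -> Block:
    """Integer-pixel motion compensation with clamped coordinates (assumed padding)."""
    rows, cols = len(ref), len(ref[0])
    return [[ref[min(max(y + mv[1] + j, 0), rows - 1)]
                [min(max(x + mv[0] + i, 0), cols - 1)]
             for i in range(w)]
            for j in range(h)]


def sad(a: Block, b: Block) -> int:
    """Matching error under the SAD criterion."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def make_template(fwd: Block, bwd: Block) -> Block:
    """C4: template matching block as an equal-weight average (assumption)."""
    return [[(p + q + 1) >> 1 for p, q in zip(rf, rb)] for rf, rb in zip(fwd, bwd)]


def best_mv(ref: Block, x: int, y: int, w: int, h: int,
            start: MV, tmpl: Block) -> MV:
    """C5 / C6: MV whose reference block has the smallest matching error."""
    points = [start] + [(start[0] + dx, start[1] + dy) for dx, dy in OFFSETS]
    return min(points, key=lambda mv: sad(fetch_block(ref, x, y, mv, w, h), tmpl))


def refine_with_template(ref0: Block, ref1: Block, x: int, y: int, w: int, h: int,
                         mv0: MV, mv1: MV, rounds: int = 1,
                         update_template: bool = False) -> Tuple[MV, MV]:
    """Return (optimal forward MVP, optimal backward MVP) after `rounds` searches."""
    def template_at(a: MV, b: MV) -> Block:
        return make_template(fetch_block(ref0, x, y, a, w, h),
                             fetch_block(ref1, x, y, b, w, h))

    tmpl = template_at(mv0, mv1)             # C2 to C4
    for r in range(rounds):
        if update_template and r > 0:
            tmpl = template_at(mv0, mv1)     # latest template matching block
        mv0 = best_mv(ref0, x, y, w, h, mv0, tmpl)   # C5
        mv1 = best_mv(ref1, x, y, w, h, mv1, tmpl)   # C6
    return mv0, mv1
```

Unlike the pair search of the first possible example, the forward and backward searches here are independent of each other: each one is matched against the shared template rather than against the other direction's block.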

Refer to FIG. 13 for a schematic flowchart of a second possible implementation manner in the video image prediction provided by this application. The method shown in FIG. 13 may be executed by a video codec device, a video codec, a video codec system, and other devices with video codec functions. The method shown in FIG. 13 may occur during the encoding process or the decoding process. More specifically, the method shown in FIG. 13 may occur during the inter-frame prediction process during encoding and decoding.

S1301: Determine the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the first image block to be processed.

S1302. Determine a first motion vector prediction value according to the first initial motion vector prediction value and the first motion vector difference, and determine a second motion vector prediction value according to the second initial motion vector prediction value and the second motion vector difference. (In other words, the motion vector correction process is performed according to the first initial motion vector prediction value to obtain the first modified motion vector prediction value, and the motion vector correction process is performed according to the second initial motion vector prediction value to obtain the second modified motion vector prediction value.)

Exemplarily, when step S1302 is implemented, the sum of the first initial motion vector prediction value and the first motion vector difference may be used as the first motion vector prediction value, and the second initial motion vector prediction value and the second motion vector The sum of the differences is used as the second motion vector prediction value.

S1303. Perform a motion vector correction process according to the first motion vector predicted value and the second motion vector predicted value to obtain a first corrected motion vector predicted value and a second corrected motion vector predicted value.

S1304: Predict the first image block to be processed according to the first modified motion vector prediction value and the second modified motion vector prediction value.

In a possible design, before determining the first initial motion vector prediction value, the second initial motion vector prediction value, the first motion vector difference, and the second motion vector difference of the current image block to be processed, the method further includes:

Parsing the first identifier (such as mmvd_flag[x0][y0]) from the code stream. Determining the first initial motion vector prediction value, the second initial motion vector prediction value, the first motion vector difference, and the second motion vector difference of the current image block to be processed then includes: when the first identifier indicates that the merge with motion vector difference (MMVD) mode is used for inter-frame prediction of the current image block to be processed, determining the first initial motion vector prediction value, the second initial motion vector prediction value, the first motion vector difference, and the second motion vector difference of the current image block to be processed.

Various methods can be used to determine the first initial motion vector prediction value, the second initial motion vector prediction value, the first motion vector difference, and the second motion vector difference of the current image block to be processed. For example, Method 1 or Method 2 described above can be used; the details are not repeated here.

Exemplarily, there may be a start condition for executing S1303 in the embodiment of the present application. For example, when the image block to which the candidate motion information selected from the candidate list belongs and the current image block to be processed belong to different images, the motion vector correction process is performed according to the first motion vector prediction value and the second motion vector prediction value. When the image block to which the candidate motion information selected from the candidate list belongs and the current image block to be processed belong to the same image, the current image block to be processed is decoded according to the first motion vector prediction value and the second motion vector prediction value obtained in step S1302.
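The start condition for S1303 can be illustrated with a minimal sketch. The function name `derive_final_mvs`, the integer picture identifiers, and the `refine` callback are hypothetical names introduced here for illustration only; the application does not specify how "belonging to the same image" is represented.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) motion vector components
Refine = Callable[[MV, MV], Tuple[MV, MV]]


def derive_final_mvs(mv_first: MV, mv_second: MV,
                     candidate_picture_id: int, current_picture_id: int,
                     refine: Refine) -> Tuple[MV, MV]:
    """Apply the S1303 correction only when the start condition holds.

    mv_first / mv_second are the values obtained in S1302.
    The picture identifiers are an assumed way of telling whether the
    candidate's image block and the current block belong to the same image.
    """
    if candidate_picture_id != current_picture_id:
        # Different images: run the motion vector correction process.
        return refine(mv_first, mv_second)
    # Same image: use the S1302 values directly.
    return mv_first, mv_second


# Placeholder correction that shifts both vectors by (0, -1); illustration only.
def shift_up(a: MV, b: MV) -> Tuple[MV, MV]:
    return (a[0], a[1] - 1), (b[0], b[1] - 1)


refined = derive_final_mvs((-21, 18), (1, 12), 3, 7, shift_up)
unrefined = derive_final_mvs((-21, 18), (1, 12), 7, 7, shift_up)
```

Passing the correction step in as a callback keeps the gating logic separate from whichever search procedure is actually used.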

In this embodiment of the application, for how to perform the motion vector correction process according to the first motion vector prediction value and the second motion vector prediction value, refer to the descriptions in the first possible example and the second possible example above; the details are not repeated here.

As shown in Figure 14, the predicted motion information of the current image block to be processed is obtained. It is assumed that the forward and backward motion vector prediction values of the current image block to be processed are MV0(-22, 18) and MV1(2, 12), respectively, and that the motion vector differences are MVD0(1, 0) and MVD1(-1, 0).

Sum the forward motion vector predictor MV0(-22, 18) and MVD0(1, 0) to get MV2(-21, 18), and sum the backward motion vector predictor MV1(2, 12) and MVD1(-1, 0) to get MV3(1, 12). The forward reference prediction block corresponding to MV2 is q0, and the backward reference prediction block corresponding to MV3 is h0.
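The arithmetic of this step can be checked with a short sketch. The helper name `add_mv` and the tuple representation of a motion vector are illustrative assumptions, not part of the application.

```python
from typing import Tuple

MV = Tuple[int, int]  # (horizontal, vertical) components


def add_mv(mvp: MV, mvd: MV) -> MV:
    """Component-wise sum of a motion vector predictor and a motion vector difference."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


mv2 = add_mv((-22, 18), (1, 0))   # forward: MV0 + MVD0
mv3 = add_mv((2, 12), (-1, 0))    # backward: MV1 + MVD1
print(mv2, mv3)                   # prints (-21, 18) (1, 12)
```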

Use MV2(-21, 18) and MV3(1, 12) as the reference input of the forward and backward motion vector prediction values, and perform the first-precision motion search on the forward reference prediction block q0 and the backward reference prediction block h0. For example, the first precision is 1 pixel.

The forward and backward reference prediction blocks q0 and h0 are used as the search starting points for the first-precision motion search, and the difference between each new pair of forward and backward reference prediction blocks obtained in the search is determined, for example, the differences of the 8 forward and backward reference prediction block pairs around q0 and h0, as well as the difference between the forward reference prediction block q0 and the backward reference prediction block h0. It is assumed that the motion vector prediction values of the forward and backward reference prediction blocks with the smallest difference are (-21, 17) and (1, 11), respectively. The search points are updated to (-21, 17) and (1, 11), which correspond to the forward reference prediction block q1 and the backward reference prediction block h1, respectively, and the first-precision motion search continues. The forward and backward reference prediction blocks q1 and h1 are then used as the search starting points for the first-precision motion search, and the difference between each pair of forward and backward reference prediction blocks obtained in the search is determined, for example, the differences of the 8 forward and backward reference prediction block pairs around q1 and h1, as well as the difference between the forward reference prediction block q1 and the backward reference prediction block h1. It is assumed that the motion vector prediction values of the forward and backward reference prediction blocks with the smallest difference are (-21, 16) and (1, 10), respectively, which correspond to the forward reference prediction block q2 and the backward reference prediction block h2.

Taking two searches as an example, the current image block to be processed is then predicted based on the forward motion vector predictor (-21, 16) and the backward motion vector predictor (1, 10). Figure 14 shows only one possible motion search process.
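The iterative search in this example can be sketched as follows. This is a minimal sketch under stated assumptions, not the application's implementation: the name `refine_pair` is hypothetical, the same integer offset is applied to both vectors (as the numbers in the example suggest), the search stops when no neighbouring pair has a smaller difference than the current pair, and `pair_cost` stands in for whatever block difference measure is used. The toy cost at the end is invented solely to reproduce the numbers of the example.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]
Pair = Tuple[MV, MV]

# The 8 integer-pixel offsets around the current search point (first precision = 1 pixel).
NEIGHBOURS = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]


def refine_pair(mv_fwd: MV, mv_bwd: MV,
                pair_cost: Callable[[MV, MV], float],
                max_iters: int = 2) -> Pair:
    """Iteratively move to the neighbouring pair with the smallest difference."""
    best: Pair = (mv_fwd, mv_bwd)
    best_cost = pair_cost(*best)
    for _ in range(max_iters):
        # Assumption: the same offset is applied to the forward and backward vectors.
        candidates = [((best[0][0] + dx, best[0][1] + dy),
                       (best[1][0] + dx, best[1][1] + dy)) for dx, dy in NEIGHBOURS]
        cand = min(candidates, key=lambda p: pair_cost(*p))
        cand_cost = pair_cost(*cand)
        if cand_cost >= best_cost:
            break  # no neighbouring pair improves on the current pair
        best, best_cost = cand, cand_cost
    return best


# Toy difference measure (illustration only): smallest at (-21, 16) and (1, 10).
def toy_cost(f: MV, b: MV) -> int:
    return abs(f[0] + 21) + abs(f[1] - 16) + abs(b[0] - 1) + abs(b[1] - 10)


result = refine_pair((-21, 18), (1, 12), toy_cost, max_iters=2)
```

With the toy cost, the first iteration moves to (-21, 17) and (1, 11), and the second to (-21, 16) and (1, 10), matching the two searches described above.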

It should be understood that the solution provided by the embodiment of the present application, in which MMVD is combined with the motion vector refinement process, can be applied to the decoding side and is also applicable to the encoding side. The details on the encoding side are not described here.

The image prediction apparatus according to the embodiment of the present application will be described in detail below with reference to FIG. 15.

FIG. 15 is a schematic block diagram of an image prediction device according to an embodiment of the present application. It should be noted that the image prediction device 1500 is suitable for both inter-frame prediction of decoded video images and inter-frame prediction of encoded video images. It should be understood that the image prediction device 1500 herein may correspond to the inter prediction unit 244 in FIG. 2, or may alternatively correspond to the inter prediction unit 344 in FIG. 3. The image prediction device 1500 may include a prediction unit 1501 and a correction unit 1502.

In one possible implementation:

The prediction unit 1501 is configured to determine the first initial motion vector prediction value, the second initial motion vector prediction value, the first motion vector difference, and the second motion vector difference of the current image block to be processed;

The correction unit 1502 is configured to perform a motion vector correction process according to the first initial motion vector predicted value and the second initial motion vector predicted value to obtain the first corrected motion vector predicted value and the second corrected motion vector predicted value;

The prediction unit 1501 is further configured to determine a first motion vector prediction value according to the first modified motion vector prediction value and the first motion vector difference, and determine a second motion vector prediction value according to the second modified motion vector prediction value and the second motion vector difference; and to predict the current image block to be processed according to the first motion vector prediction value and the second motion vector prediction value.
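The order of operations in this implementation (correction first, then the motion vector differences) can be sketched as follows. The function name `refine_then_add_mvd` and the `refine` callback are hypothetical; the sketch assumes, consistent with the summation used elsewhere in this application, that the final value is the sum of the corrected predictor and the motion vector difference.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]


def refine_then_add_mvd(mvp0: MV, mvp1: MV, mvd0: MV, mvd1: MV,
                        refine: Callable[[MV, MV], Tuple[MV, MV]]) -> Tuple[MV, MV]:
    """Correct the initial predictors first, then add the motion vector differences."""
    r0, r1 = refine(mvp0, mvp1)                   # role of the correction unit
    mv_first = (r0[0] + mvd0[0], r0[1] + mvd0[1])   # role of the prediction unit
    mv_second = (r1[0] + mvd1[0], r1[1] + mvd1[1])
    return mv_first, mv_second


# Placeholder correction that shifts both vectors by (0, -1); illustration only.
def shift_up(a: MV, b: MV) -> Tuple[MV, MV]:
    return (a[0], a[1] - 1), (b[0], b[1] - 1)


final_pair = refine_then_add_mvd((-22, 18), (2, 12), (1, 0), (-1, 0), shift_up)
```

In the other implementation described in this application, the two steps are swapped: the differences are added first and the correction is applied to the sums.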

Exemplarily, when determining the first initial motion vector prediction value, the second initial motion vector prediction value, the first motion vector difference, and the second motion vector difference of the current image block to be processed, the prediction unit 1501 is specifically configured to:

Exemplarily, when determining the first initial motion vector prediction value and the second initial motion vector prediction value of the current image block to be processed, the prediction unit 1501 is specifically configured to: determine the corresponding candidate motion information from the candidate list according to the candidate index parsed from the code stream (or according to the rate-distortion cost algorithm), where the candidate motion information includes a third motion vector predictor and a fourth motion vector predictor; the third motion vector predictor is used as the first initial motion vector prediction value, and the fourth motion vector predictor is used as the second initial motion vector prediction value.

Exemplarily, the prediction unit 1501 is specifically configured to determine the first initial motion vector prediction value and the second initial motion vector prediction value of the current image block to be processed:

Exemplarily, the correction unit 1502 is specifically configured to:

Exemplarily, the prediction unit 1501 is further configured to: when the image block to which the candidate motion information belongs and the current image block to be processed belong to the same image, determine a first target motion vector prediction value according to the first initial motion vector prediction value and the first motion vector difference, and determine a second target motion vector prediction value according to the second initial motion vector prediction value and the second motion vector difference; and predict the current image block to be processed according to the first target motion vector prediction value and the second target motion vector prediction value.

According to the candidate index parsed from the code stream (or according to the rate-distortion cost algorithm), the corresponding candidate is determined from the candidate list. The candidate includes the first candidate motion information and the second candidate motion information. The first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

When the image block to which the first candidate motion information belongs and the current image block to be processed belong to different images, it is determined that the fifth motion vector predictor and the sixth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value; or,

When the image block to which the first candidate motion information belongs and the current image block to be processed belong to the same image, it is determined that the seventh motion vector predictor and the eighth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value.
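The selection between the two pieces of candidate motion information can be sketched as below. The `CandidateMotionInfo` class, its field names, and the integer picture identifier are hypothetical constructs for illustration; the application does not define such a data structure.

```python
from dataclasses import dataclass
from typing import Tuple

MV = Tuple[int, int]


@dataclass(frozen=True)
class CandidateMotionInfo:
    mv_a: MV          # e.g. the fifth (or seventh) motion vector predictor
    mv_b: MV          # e.g. the sixth (or eighth) motion vector predictor
    picture_id: int   # assumed identifier of the image containing the candidate's block


def select_initial_mvps(first: CandidateMotionInfo,
                        second: CandidateMotionInfo,
                        current_picture_id: int) -> Tuple[MV, MV]:
    """Pick the initial predictors according to where the first candidate's block lies."""
    if first.picture_id != current_picture_id:
        # Different images: use the fifth and sixth predictors.
        return first.mv_a, first.mv_b
    # Same image: use the seventh and eighth predictors.
    return second.mv_a, second.mv_b


first_info = CandidateMotionInfo((-22, 18), (2, 12), picture_id=3)
second_info = CandidateMotionInfo((-20, 16), (4, 10), picture_id=7)
from_other_image = select_initial_mvps(first_info, second_info, current_picture_id=7)
```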

Exemplarily, the candidate at the first position in the candidate list includes first candidate motion information and second candidate motion information, wherein the first candidate motion information includes the fifth motion vector predictor and the sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

The prediction unit 1501 is specifically configured to determine the first initial motion vector prediction value and the second initial motion vector prediction value of the current image block to be processed:

When the image block to which the first candidate motion information belongs and the currently to-be-processed image block belong to different images, it is determined that the fifth motion vector predictor and the sixth motion vector predictor are the first initial motion vector predictor and The second initial motion vector prediction value;

Exemplarily, the correction unit 1502 is specifically configured to perform a motion vector correction process according to the first initial motion vector predicted value and the second initial motion vector predicted value:

Exemplarily, when determining a first modified reference prediction block according to the first reference prediction block and determining a second modified reference prediction block according to the second reference prediction block, the correction unit 1502 is specifically configured to:

Exemplarily, the correction unit 1502 is further configured to:

In another possible implementation:

The prediction unit is specifically used for determining the first initial motion vector prediction value, the second initial motion vector prediction value, the first motion vector difference, and the second motion vector difference of the current image block to be processed:

Exemplarily, the prediction unit is specifically configured to determine the first initial motion vector prediction value and the second initial motion vector prediction value of the current image block to be processed:

Determine the corresponding candidate motion information from the candidate list according to the candidate index parsed from the code stream (or according to the rate-distortion cost algorithm). The candidate motion information includes the third motion vector predictor and the fourth motion vector predictor. The third motion vector predicted value is used as the first initial motion vector predicted value, and the fourth motion vector predicted value is used as the second initial motion vector predicted value; or,

Exemplarily, the correction unit is specifically configured to perform a motion vector correction process according to the first motion vector predicted value and the second motion vector predicted value:

Exemplarily, the prediction unit is further used for:

Determine the corresponding candidate from the candidate list according to the candidate index parsed from the code stream (or according to the rate-distortion cost algorithm), the candidate including the first candidate motion information and the second candidate motion information, wherein the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

Exemplarily, the correction unit 1502 is specifically configured to perform a motion vector correction process according to the first motion vector predicted value and the second motion vector predicted value, including:

In the embodiment of the present application, the apparatus 1500 including only the prediction unit 1501 and the correction unit 1502 may correspond to an inter-frame prediction unit, and may be applied to both the encoding end and the decoding end.

Exemplarily, at the decoding end, the positions of the prediction unit 1501 and the correction unit 1502 in FIG. 15 correspond to the position of the inter prediction unit 344 in FIG. 3. In other words, for the specific implementation of the functions of the prediction unit 1501 and the correction unit 1502, refer to the specific details of the inter prediction unit 344 in FIG. 3.

Exemplarily, at the encoding end, the positions of the prediction unit 1501 and the correction unit 1502 in FIG. 15 correspond to the position of the inter prediction unit 244 in FIG. 2. In other words, for the specific implementation of the functions of the prediction unit 1501 and the correction unit 1502, refer to the specific details of the inter prediction unit 244 in FIG. 2.

It should be understood that the foregoing apparatus 1500 may execute the method shown in FIG. 8 or FIG. 13, and the apparatus 1500 may be a video encoding apparatus, a video decoding apparatus, a video encoding and decoding system, or other equipment with a video encoding and decoding function. The apparatus 1500 can be used to perform image prediction during the encoding process, and can also be used to perform image prediction during the decoding process.

For details, refer to the description of the image prediction methods herein. For the sake of brevity, they are not repeated here.

Those skilled in the art can understand that the functions described in conjunction with the various illustrative logical blocks, modules, and algorithm steps disclosed herein can be implemented by hardware, software, firmware, or any combination thereof. If implemented in software, the functions described by various illustrative logical blocks, modules, and steps may be stored or transmitted as one or more instructions or codes on a computer-readable medium, and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium, such as a data storage medium, or a communication medium that includes any medium that facilitates the transfer of a computer program from one place to another (for example, according to a communication protocol). In this manner, computer-readable media may generally correspond to (1) non-transitory tangible computer-readable storage media, or (2) communication media, such as signals or carrier waves. Data storage media can be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, codes, and/or data structures for implementing the techniques described in this application. The computer program product may include a computer-readable medium.

By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage devices, magnetic disk storage devices or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. As used herein, magnetic disks and optical discs include compact discs (CD), laser discs, optical discs, digital versatile discs (DVD), and Blu-ray discs. Disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included in the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSP), general-purpose microprocessors, application-specific integrated circuits (ASIC), field programmable logic arrays (FPGA), or other equivalent integrated or discrete logic circuits. Therefore, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided in dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Moreover, the techniques may be fully implemented in one or more circuits or logic elements.

The techniques of this application can be implemented in a wide variety of devices or apparatuses, including wireless handsets, integrated circuits (ICs), or a set of ICs (for example, chipsets). Various components, modules, or units are described in this application to emphasize the functional aspects of the device for implementing the disclosed technology, but they do not necessarily need to be implemented by different hardware units. In fact, as described above, various units can be combined in a codec hardware unit together with appropriate software and/or firmware, or provided by a collection of interoperable hardware units (including one or more processors as described above).

In the above-mentioned embodiments, the description of each embodiment has its own focus. For a part that is not described in detail in an embodiment, reference may be made to related descriptions of other embodiments.

The above are only exemplary specific implementations of this application, but the protection scope of this application is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed in this application shall be covered within the protection scope of this application. Therefore, the protection scope of this application should be subject to the protection scope of the claims.

Claims

A video image prediction method, characterized in that it comprises:

Determining the first initial motion vector prediction value, the second initial motion vector prediction value, the first motion vector difference, and the second motion vector difference of the current image block to be processed;

Performing a motion vector correction process according to the first initial motion vector predicted value and the second initial motion vector predicted value to obtain the first corrected motion vector predicted value and the second corrected motion vector predicted value;

Determining a first motion vector prediction value based on the first modified motion vector prediction value and the first motion vector difference, and determining a second motion vector prediction value based on the second modified motion vector prediction value and the second motion vector difference;

Predicting the current image block to be processed according to the first motion vector prediction value and the second motion vector prediction value.
The method according to claim 1, wherein the determining the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed comprises:

When the first identifier parsed from the code stream indicates that the merge with motion vector difference (MMVD) mode is adopted for inter-frame prediction of the current image block to be processed, determining the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed.
The method according to claim 1 or 2, wherein the determining the first initial motion vector prediction value and the second initial motion vector prediction value of the currently to-be-processed image block comprises:

Determining the corresponding candidate motion information from the candidate list according to the candidate index parsed from the code stream, wherein the candidate motion information includes a third motion vector predictor and a fourth motion vector predictor, the third motion vector predictor is used as the first initial motion vector prediction value, and the fourth motion vector predictor is used as the second initial motion vector prediction value; or,

The third motion vector predictor and the fourth motion vector predictor included in the candidate motion information of the first position in the candidate list are determined as the first initial motion vector predictor and the second initial motion vector predictor.
The method according to claim 3, wherein the performing a motion vector correction process according to the first initial motion vector predicted value and the second initial motion vector predicted value comprises:

When the image block to which the candidate motion information belongs and the currently to-be-processed image block belong to different images, a motion vector correction process is performed according to the first initial motion vector predicted value and the second initial motion vector predicted value.
The method of claim 3, further comprising:

When the image block to which the candidate motion information belongs and the current image block to be processed belong to the same image, determining a first target motion vector predictor according to the first initial motion vector predictor and the first motion vector difference, and determining a second target motion vector predictor according to the second initial motion vector predictor and the second motion vector difference; and predicting the current image block to be processed according to the first target motion vector predictor and the second target motion vector predictor.
The method according to claim 1 or 2, wherein the determining the first initial motion vector prediction value and the second initial motion vector prediction value of the currently to-be-processed image block comprises:

Determining the corresponding candidate item from the candidate list according to the candidate index parsed from the code stream, wherein the candidate item includes first candidate motion information and second candidate motion information, the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to different images, determining that the fifth motion vector predictor and the sixth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value; or,

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to the same image, determining that the seventh motion vector predictor and the eighth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value.
The method according to claim 1 or 2, wherein the candidate at the first position in the candidate list includes first candidate motion information and second candidate motion information, wherein the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

The determining the first initial motion vector prediction value and the second initial motion vector prediction value of the current image block to be processed includes:

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to different images, determining that the fifth motion vector predictor and the sixth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value;

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to the same image, determining that the seventh motion vector predictor and the eighth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value.
The method according to any one of claims 1-7, wherein the executing a motion vector correction process according to the first initial motion vector predicted value and the second initial motion vector predicted value includes:

Acquiring a first reference prediction block corresponding to the first initial motion vector prediction value, and a second reference prediction block corresponding to the second initial motion vector prediction value;

Determine a first modified reference prediction block according to the first reference prediction block, and determine a second modified reference prediction block according to the second reference prediction block;

Wherein, the difference between the first modified reference prediction block and the second modified reference prediction block is less than or equal to the difference between the first reference prediction block and the second reference prediction block; the first modified reference prediction block is an image block in a first preset area that has the same size as the first reference prediction block, and the first preset area includes the first reference prediction block; the second modified reference prediction block is an image block in a second preset area that has the same size as the second reference prediction block, and the second preset area includes the second reference prediction block; the first modified reference prediction block corresponds to the first modified motion vector predictor, and the second modified reference prediction block corresponds to the second modified motion vector predictor.
The method according to claim 8, wherein determining a first modified reference prediction block according to the first reference prediction block, and determining a second modified reference prediction block according to the second reference prediction block, comprises:

Performing a motion search according to the first reference prediction block pair to obtain at least one second reference prediction block pair;

Wherein, the first reference prediction block pair includes the first reference prediction block and the second reference prediction block; the second reference prediction block pair includes a third reference prediction block and a fourth reference prediction block, the third reference prediction block is obtained by performing a motion search in the first preset area based on the first reference prediction block, and the fourth reference prediction block is obtained by performing a motion search in the second preset area based on the second reference prediction block;

Determining the difference between the third reference prediction block and the fourth reference prediction block included in each second reference prediction block pair in the at least one second reference prediction block pair;

Determining a reference prediction block pair with the smallest difference among the at least one second reference prediction block pair;

When it is determined that the difference between the third reference prediction block and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference is smaller than the difference between the first reference prediction block and the second reference prediction block, performing a motion search according to the second reference prediction block pair with the smallest difference to obtain at least one third reference prediction block pair;

Wherein, the third reference prediction block pair includes a fifth reference prediction block and a sixth reference prediction block, the fifth reference prediction block is obtained by performing a motion search in the first preset area based on the third reference prediction block included in the second reference prediction block pair with the smallest difference, and the sixth reference prediction block is obtained by performing a motion search in the second preset area based on the fourth reference prediction block included in the second reference prediction block pair with the smallest difference;

Determining that the fifth reference prediction block included in the third reference prediction block pair with the smallest difference is the first modified reference prediction block, and determining that the sixth reference prediction block included in the third reference prediction block pair with the smallest difference is the second modified reference prediction block.
The method of claim 9, further comprising:

When it is determined that the difference between the third reference prediction block and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference is greater than the difference between the first reference prediction block and the second reference prediction block, it is determined The first reference prediction block is the first modified reference prediction block, and it is determined that the second reference prediction block is a second modified reference prediction block.
The method of claim 9, further comprising:

When it is determined that the difference between the fifth reference prediction block and the sixth reference prediction block included in the third reference prediction block pair with the smallest difference is greater than the difference between the third reference prediction block and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference, determining that the third reference prediction block included in the second reference prediction block pair with the smallest difference is the first modified reference prediction block, and determining that the fourth reference prediction block included in the second reference prediction block pair with the smallest difference is the second modified reference prediction block.
A video image prediction method, characterized in that it comprises:

Determining the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed;

Determining a first motion vector prediction value according to the first initial motion vector prediction value and the first motion vector difference, and determining a second motion vector prediction value according to the second initial motion vector prediction value and the second motion vector difference;

Performing a motion vector correction process according to the first motion vector predicted value and the second motion vector predicted value to obtain the first corrected motion vector predicted value and the second corrected motion vector predicted value;

Predicting the current image block to be processed according to the first modified motion vector prediction value and the second modified motion vector prediction value.
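The order of operations in this claim (add the motion vector differences first, then refine) can be illustrated with a minimal Python sketch. The names `add_mv`, `mmvd_then_refine`, and `toy_refine` are hypothetical, motion vectors are assumed to be integer sample offsets, and the refinement step is a stand-in supplied by the caller rather than the correction process defined in the claims.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]  # (dx, dy); integer-sample units are an assumption


def add_mv(a: MV, b: MV) -> MV:
    """Component-wise sum of two motion vectors."""
    return (a[0] + b[0], a[1] + b[1])


def mmvd_then_refine(
    init0: MV, init1: MV, mvd0: MV, mvd1: MV,
    refine: Callable[[MV, MV], Tuple[MV, MV]],
) -> Tuple[MV, MV]:
    """Apply each motion vector difference to its initial predictor,
    then hand the resulting pair to a refinement step."""
    mv0 = add_mv(init0, mvd0)  # first motion vector prediction value
    mv1 = add_mv(init1, mvd1)  # second motion vector prediction value
    return refine(mv0, mv1)    # refined pair used for the final prediction


def toy_refine(mv0: MV, mv1: MV) -> Tuple[MV, MV]:
    """Hypothetical placeholder: shifts the pair by a mirrored one-sample offset."""
    return (mv0[0] + 1, mv0[1]), (mv1[0] - 1, mv1[1])


result = mmvd_then_refine((4, 0), (-4, 0), (2, 0), (-2, 0), toy_refine)
```

Passing the refinement step as a parameter keeps the sketch independent of any particular search strategy.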
The method of claim 12, wherein the determining of the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed comprises:

When the first identifier parsed from the code stream indicates that the inter-frame prediction of the current image block to be processed adopts the merge with motion vector difference (MMVD) mode, determining the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed.
The method according to claim 12 or 13, wherein the determining the first initial motion vector prediction value and the second initial motion vector prediction value of the currently to-be-processed image block comprises:

According to the candidate index parsed from the code stream, the corresponding candidate motion information is determined from the candidate list, the candidate motion information includes a third motion vector predictor and a fourth motion vector predictor, the third motion vector predictor is used as the first initial motion vector prediction value, and the fourth motion vector predictor is used as the second initial motion vector prediction value; or,

The third motion vector predictor and the fourth motion vector predictor included in the candidate motion information of the first position in the candidate list are determined as the first initial motion vector predictor and the second initial motion vector predictor.
The method according to claim 14, wherein the performing a motion vector correction process according to the first motion vector predicted value and the second motion vector predicted value comprises:

When the image block to which the candidate motion information belongs and the currently to-be-processed image block belong to different images, a motion vector correction process is performed according to the first motion vector predicted value and the second motion vector predicted value.
The method of claim 14, further comprising:

When the image block to which the candidate motion information belongs and the current image block to be processed belong to the same image, the current image block to be processed is predicted according to the first motion vector prediction value and the second motion vector prediction value.
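The condition in the two preceding claims can be sketched as a small gate in Python. This is an illustration only: `choose_final_mvs` is a hypothetical name, images are assumed to be identified by an integer, and `refine` stands for whatever correction process is used.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]  # (dx, dy); integer-sample units are an assumption


def choose_final_mvs(
    mv0: MV, mv1: MV,
    candidate_image: int, current_image: int,
    refine: Callable[[MV, MV], Tuple[MV, MV]],
) -> Tuple[MV, MV]:
    """Run the correction process only when the candidate motion
    information comes from a block in a different image."""
    if candidate_image != current_image:
        return refine(mv0, mv1)  # candidate from another image: refine
    return mv0, mv1              # candidate from the same image: use as-is
```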
The method according to claim 12 or 13, wherein determining the first initial motion vector prediction value and the second initial motion vector prediction value of the currently to-be-processed image block comprises:

The corresponding candidate item is determined from the candidate list according to the candidate index parsed from the code stream, the candidate item includes first candidate motion information and second candidate motion information, wherein the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to different images, it is determined that the fifth motion vector predictor and the sixth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value; or,

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to the same image, it is determined that the seventh motion vector predictor and the eighth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value.
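The selection rule in this claim can be sketched in Python as follows. The data layout is an assumption made for illustration: `MotionInfo`, `Candidate`, `source_image`, and `select_initial_mvps` are hypothetical names, and images are assumed to be identified by an integer.

```python
from dataclasses import dataclass
from typing import Tuple

MV = Tuple[int, int]  # (dx, dy); integer-sample units are an assumption


@dataclass
class MotionInfo:
    mv0: MV
    mv1: MV


@dataclass
class Candidate:
    first: MotionInfo    # holds the fifth and sixth motion vector predictors
    second: MotionInfo   # holds the seventh and eighth motion vector predictors
    source_image: int    # hypothetical id of the image the candidate block belongs to


def select_initial_mvps(cand: Candidate, current_image: int) -> Tuple[MV, MV]:
    """Pick the first set when the candidate block lies in a different image,
    otherwise the second set."""
    info = cand.first if cand.source_image != current_image else cand.second
    return info.mv0, info.mv1
```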
The method according to claim 12 or 13, wherein the determining the first initial motion vector prediction value and the second initial motion vector prediction value of the currently to-be-processed image block comprises:

The candidate at the first position in the candidate list includes first candidate motion information and second candidate motion information, where the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor. The second candidate motion information includes the seventh motion vector predictor and the eighth motion vector predictor;

When the image block to which the first candidate motion information or the second candidate motion information belongs and the currently to-be-processed image block belong to different images, it is determined that the fifth motion vector predictor and the sixth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value;

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to the same image, it is determined that the seventh motion vector predictor and the eighth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value.
The method according to any one of claims 12-18, wherein the executing a motion vector correction process according to the first motion vector predicted value and the second motion vector predicted value includes:

Acquiring a first reference prediction block corresponding to the first motion vector prediction value, and a second reference prediction block corresponding to the second motion vector prediction value;

Determine a first modified reference prediction block according to the first reference prediction block, and determine a second modified reference prediction block according to the second reference prediction block;

Wherein, the difference between the first modified reference prediction block and the second modified reference prediction block is less than or equal to the difference between the first reference prediction block and the second reference prediction block; the first modified reference prediction block is an image block in a first preset area that has the same size as the first reference prediction block, and the first preset area includes the first reference prediction block; the second modified reference prediction block is an image block in a second preset area that has the same size as the second reference prediction block, and the second preset area includes the second reference prediction block; the first modified reference prediction block corresponds to the first modified motion vector predictor, and the second modified reference prediction block corresponds to the second modified motion vector predictor.
A video image prediction device, characterized by comprising:

A prediction unit, configured to determine the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed;

A correction unit, configured to perform a motion vector correction process according to the first initial motion vector predicted value and the second initial motion vector predicted value to obtain a first corrected motion vector predicted value and a second corrected motion vector predicted value;

The prediction unit is further configured to determine a first motion vector prediction value according to the first modified motion vector prediction value and the first motion vector difference, and determine a second motion vector prediction value according to the second modified motion vector prediction value and the second motion vector difference; and predict the current image block to be processed according to the first motion vector predictor and the second motion vector predictor.
The device according to claim 20, wherein the prediction unit is specifically configured to:

When the first identifier parsed from the code stream indicates that the inter-frame prediction of the current image block to be processed adopts the merge with motion vector difference (MMVD) mode, determine the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed.
The apparatus according to claim 20 or 21, wherein the prediction unit is specifically configured to:

According to the candidate index parsed from the code stream, the corresponding candidate motion information is determined from the candidate list, the candidate motion information includes a third motion vector predictor and a fourth motion vector predictor, the third motion vector predictor is used as the first initial motion vector prediction value, and the fourth motion vector predictor is used as the second initial motion vector prediction value.
The apparatus according to claim 20 or 21, wherein the prediction unit is specifically configured to:

The third motion vector predictor and the fourth motion vector predictor included in the candidate motion information of the first position in the candidate list are determined as the first initial motion vector predictor and the second initial motion vector predictor.
The device according to claim 22 or 23, wherein the correction unit is specifically configured to:

When the image block to which the candidate motion information belongs and the currently to-be-processed image block belong to different images, a motion vector correction process is performed according to the first initial motion vector predicted value and the second initial motion vector predicted value.
The device according to claim 22 or 23, wherein the prediction unit is further configured to: when the image block to which the candidate motion information belongs and the currently to-be-processed image block belong to the same image, determine a first target motion vector predictor according to the first initial motion vector predictor and the first motion vector difference, and determine a second target motion vector predictor according to the second initial motion vector predictor and the second motion vector difference; and predict the current image block to be processed according to the first target motion vector predictor and the second target motion vector predictor.
The apparatus according to claim 20 or 21, wherein the prediction unit is specifically configured to:

The corresponding candidate item is determined from the candidate list according to the candidate index parsed from the code stream, the candidate item includes first candidate motion information and second candidate motion information, wherein the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to different images, it is determined that the fifth motion vector predictor and the sixth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value; or,

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to the same image, it is determined that the seventh motion vector predictor and the eighth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value.
The device according to claim 20 or 21, wherein the candidate at the first position in the candidate list includes first candidate motion information and second candidate motion information, wherein the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

The prediction unit is specifically used for:

When the image block to which the first candidate motion information or the second candidate motion information belongs and the currently to-be-processed image block belong to different images, it is determined that the fifth motion vector predictor and the sixth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value;

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to the same image, it is determined that the seventh motion vector predictor and the eighth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value.
The device according to any one of claims 20-27, wherein the correction unit is specifically configured to:

Acquiring a first reference prediction block corresponding to the first initial motion vector prediction value, and a second reference prediction block corresponding to the second initial motion vector prediction value;

Determine a first modified reference prediction block according to the first reference prediction block, and determine a second modified reference prediction block according to the second reference prediction block;

Wherein, the difference between the first modified reference prediction block and the second modified reference prediction block is less than or equal to the difference between the first reference prediction block and the second reference prediction block; the first modified reference prediction block is an image block in a first preset area that has the same size as the first reference prediction block, and the first preset area includes the first reference prediction block; the second modified reference prediction block is an image block in a second preset area that has the same size as the second reference prediction block, and the second preset area includes the second reference prediction block; the first modified reference prediction block corresponds to the first modified motion vector predictor, and the second modified reference prediction block corresponds to the second modified motion vector predictor.
The device according to claim 28, wherein the correction unit is specifically configured to:

Performing a motion search according to the first reference prediction block pair to obtain at least one second reference prediction block pair;

Wherein, the first reference prediction block pair includes the first reference prediction block and the second reference prediction block; the second reference prediction block pair includes a third reference prediction block and a fourth reference prediction block, the third reference prediction block is obtained by performing a motion search in the first preset area based on the first reference prediction block, and the fourth reference prediction block is obtained by performing a motion search in the second preset area based on the second reference prediction block;

Determining the difference between the third reference prediction block and the fourth reference prediction block included in each second reference prediction block pair in the at least one second reference prediction block pair;

Determining a reference prediction block pair with the smallest difference among the at least one second reference prediction block pair;

When it is determined that the difference between the third reference prediction block and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference is smaller than the difference between the first reference prediction block and the second reference prediction block, performing a motion search according to the second reference prediction block pair with the smallest difference to obtain at least one third reference prediction block pair;

Wherein, the third reference prediction block pair includes a fifth reference prediction block and a sixth reference prediction block, the fifth reference prediction block is obtained by performing a motion search in the first preset area based on the third reference prediction block included in the second reference prediction block pair with the smallest difference, and the sixth reference prediction block is obtained by performing a motion search in the second preset area based on the fourth reference prediction block included in the second reference prediction block pair with the smallest difference;

Determining that the fifth reference prediction block included in the third reference prediction block pair with the smallest difference is the first modified reference prediction block, and determining that the sixth reference prediction block included in the third reference prediction block pair with the smallest difference is the second modified reference prediction block.
The device according to claim 29, wherein the correction unit is further configured to:

When it is determined that the difference between the third reference prediction block and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference is greater than the difference between the first reference prediction block and the second reference prediction block, determine that the first reference prediction block is the first modified reference prediction block, and determine that the second reference prediction block is the second modified reference prediction block.
The device according to claim 30, wherein the correction unit is further configured to:

When it is determined that the difference between the fifth reference prediction block and the sixth reference prediction block included in the third reference prediction block pair with the smallest difference is greater than the difference between the third reference prediction block and the fourth reference prediction block included in the second reference prediction block pair with the smallest difference, determine that the third reference prediction block included in the second reference prediction block pair with the smallest difference is the first modified reference prediction block, and determine that the fourth reference prediction block included in the second reference prediction block pair with the smallest difference is the second modified reference prediction block.
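The two-stage search with fallback described in the three preceding device claims can be sketched in Python. Several points are assumptions not stated in the claims: the "difference" is measured as a sum of absolute differences (SAD), each search step tries four one-sample offsets, the offset applied in the second reference image mirrors the one applied in the first, and a tie with the initial pair keeps the initial pair. All function names are hypothetical, and the caller is assumed to keep blocks at least two samples away from the frame border.

```python
from typing import List, Tuple

Frame = List[List[int]]   # frame[y][x] sample values
Pos = Tuple[int, int]     # (x, y) of a block's top-left sample

OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed one-sample search pattern


def block(frame: Frame, pos: Pos, size: int) -> Frame:
    """Cut a size x size block whose top-left sample is at pos."""
    x, y = pos
    return [row[x:x + size] for row in frame[y:y + size]]


def sad(a: Frame, b: Frame) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_neighbour_pair(ref0: Frame, ref1: Frame, p0: Pos, p1: Pos, size: int):
    """Return (cost, pos0, pos1) of the lowest-SAD pair among mirrored offsets."""
    best = None
    for dx, dy in OFFSETS:
        q0 = (p0[0] + dx, p0[1] + dy)
        q1 = (p1[0] - dx, p1[1] - dy)  # mirrored offset (assumption)
        cost = sad(block(ref0, q0, size), block(ref1, q1, size))
        if best is None or cost < best[0]:
            best = (cost, q0, q1)
    return best


def refine_pair(ref0: Frame, ref1: Frame, p0: Pos, p1: Pos, size: int) -> Tuple[Pos, Pos]:
    """Two search stages; each stage is kept only if it lowers the difference."""
    base = sad(block(ref0, p0, size), block(ref1, p1, size))
    c2, a0, a1 = best_neighbour_pair(ref0, ref1, p0, p1, size)
    if c2 >= base:   # stage 1 did not improve: keep the initial pair
        return p0, p1
    c3, b0, b1 = best_neighbour_pair(ref0, ref1, a0, a1, size)
    if c3 > c2:      # stage 2 is worse: keep the stage-1 pair
        return a0, a1
    return b0, b1    # otherwise use the stage-2 pair
```

Because the search stops after two stages of one-sample steps, the refined blocks in this sketch stay within two samples of the initial blocks, which plays the role of the preset areas.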
A video image prediction device, characterized by comprising:

A prediction unit, configured to determine a first initial motion vector prediction value, a second initial motion vector prediction value, a first motion vector difference, and a second motion vector difference of the current image block to be processed;

The correction unit is configured to determine a first motion vector predictor according to the first initial motion vector predictor and the first motion vector difference, and determine a second motion vector predictor according to the second initial motion vector predictor and the second motion vector difference;

The prediction unit is further configured to perform a motion vector correction process according to the first motion vector predicted value and the second motion vector predicted value to obtain the first corrected motion vector predicted value and the second corrected motion vector predicted value; and predict the current image block to be processed according to the first modified motion vector prediction value and the second modified motion vector prediction value.
The apparatus according to claim 32, wherein the prediction unit is specifically configured to:

When the first identifier parsed from the code stream indicates that the inter-frame prediction of the current image block to be processed adopts the merge with motion vector difference (MMVD) mode, determine the first initial motion vector predictor, the second initial motion vector predictor, the first motion vector difference, and the second motion vector difference of the current image block to be processed.
The device according to claim 32 or 33, wherein the prediction unit is specifically configured to:

According to the candidate index parsed from the code stream, the corresponding candidate motion information is determined from the candidate list, the candidate motion information includes a third motion vector predictor and a fourth motion vector predictor, the third motion vector predictor is used as the first initial motion vector prediction value, and the fourth motion vector predictor is used as the second initial motion vector prediction value; or,

The third motion vector predictor and the fourth motion vector predictor included in the candidate motion information of the first position in the candidate list are determined as the first initial motion vector predictor and the second initial motion vector predictor.
The device according to claim 34, wherein the correction unit is specifically configured to:

When the image block to which the candidate motion information belongs and the currently to-be-processed image block belong to different images, a motion vector correction process is performed according to the first motion vector predicted value and the second motion vector predicted value.
The apparatus according to claim 34, wherein the prediction unit is further configured to:

When the image block to which the candidate motion information belongs and the current image block to be processed belong to the same image, the current image block to be processed is predicted according to the first motion vector prediction value and the second motion vector prediction value.
The device according to claim 32 or 33, wherein the prediction unit is specifically configured to:

The corresponding candidate item is determined from the candidate list according to the candidate index parsed from the code stream, the candidate item includes first candidate motion information and second candidate motion information, wherein the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor, and the second candidate motion information includes a seventh motion vector predictor and an eighth motion vector predictor;

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to different images, it is determined that the fifth motion vector predictor and the sixth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value; or,

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to the same image, it is determined that the seventh motion vector predictor and the eighth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value.
The device according to claim 32 or 33, wherein the prediction unit is specifically configured to:

The candidate at the first position in the candidate list includes first candidate motion information and second candidate motion information, where the first candidate motion information includes a fifth motion vector predictor and a sixth motion vector predictor. The second candidate motion information includes the seventh motion vector predictor and the eighth motion vector predictor;

When the image block to which the first candidate motion information or the second candidate motion information belongs and the currently to-be-processed image block belong to different images, it is determined that the fifth motion vector predictor and the sixth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value;

When the image block to which the first candidate motion information or the second candidate motion information belongs and the current image block to be processed belong to the same image, it is determined that the seventh motion vector predictor and the eighth motion vector predictor are the first initial motion vector prediction value and the second initial motion vector prediction value.
The device according to any one of claims 32-38, wherein the correction unit is specifically configured to:

Acquiring a first reference prediction block corresponding to the first motion vector prediction value, and a second reference prediction block corresponding to the second motion vector prediction value;

Determine a first modified reference prediction block according to the first reference prediction block, and determine a second modified reference prediction block according to the second reference prediction block;

Wherein, the difference between the first modified reference prediction block and the second modified reference prediction block is less than or equal to the difference between the first reference prediction block and the second reference prediction block; the first modified reference prediction block is an image block in a first preset area that has the same size as the first reference prediction block, and the first preset area includes the first reference prediction block; the second modified reference prediction block is an image block in a second preset area that has the same size as the second reference prediction block, and the second preset area includes the second reference prediction block; the first modified reference prediction block corresponds to the first modified motion vector predictor, and the second modified reference prediction block corresponds to the second modified motion vector predictor.
A video encoder, characterized in that the video encoder is used to encode the current image block to be processed, and includes:

The inter-frame prediction module includes the image prediction device according to any one of claims 20, 28-32, and 39, wherein the inter-frame prediction module is used to predict the predicted value of the pixel value of the current image block to be processed;

The entropy coding module is used to encode indication information into a bitstream, the indication information is used to indicate the initial motion information of the current image block to be processed, and the initial motion information includes a first initial motion vector predictor and a second initial motion vector predictor;

The reconstruction module is configured to reconstruct the current image block to be processed based on the predicted value of the pixel value of the current image block to be processed.
A video decoder, characterized in that the video decoder is used to decode a current image block to be processed from a code stream, and includes:

The entropy decoding module is used to decode indication information from the code stream, the indication information is used to indicate the initial motion information of the currently decoded image block, and the initial motion information includes a first initial motion vector predictor and a second initial motion vector predictor;

The inter-frame prediction module includes the image prediction device according to any one of claims 20 to 39, wherein the inter-frame prediction module is configured to predict the predicted value of the pixel value of the current image block to be processed;

The reconstruction module is configured to reconstruct the current image block to be processed based on the predicted value of the pixel value of the current image block to be processed.
A video decoding device, characterized by comprising: a non-volatile memory and a processor coupled with each other, wherein the processor calls program code stored in the memory to execute the method according to any one of claims 1-19.
A video encoding device, characterized by comprising: a non-volatile memory and a processor coupled with each other, wherein the processor calls program code stored in the memory to execute the method according to any one of claims 1, 8-12, and 19.