WO2019154424A1

WO2019154424A1 - Video decoding method, video decoder, and electronic device

Info

Publication number: WO2019154424A1
Application number: PCT/CN2019/074822
Authority: WO
Inventors: 傅佳莉; 赵寅; 高山
Original assignee: 华为技术有限公司
Priority date: 2018-02-12
Filing date: 2019-02-12
Publication date: 2019-08-15
Also published as: CN110166778A

Abstract

The present application provides a video decoding method and a video decoder. The method comprises: parsing motion vector difference (MVD) information of a current image block from a code stream; obtaining a motion vector predictor (MVP) of the current image block; if the MVP is included in a target motion vector set corresponding to a target motion vector resolution (MVR), obtaining a motion vector of the current image block on the basis of the MVP, the MVD information, and the target MVR, wherein the target motion vector set is one of multiple motion vector sets, the multiple motion vector sets comprise a first motion vector set and a second motion vector set, and a first MVR corresponding to the first motion vector set is different from a second MVR corresponding to the second motion vector set; obtaining a prediction block of the current image block on the basis of the motion vector of the current image block, which has the target MVR; and reconstructing the current image block on the basis of the prediction block. Embodiments of the present application help improve motion vector prediction accuracy.

Description

Video decoding method, video decoder, and electronic device

Technical field

The present application relates to the field of video codec technology, and in particular, to a video decoding method, a video decoder, and an electronic device.

Background

Video compression technologies, such as those described in the standards defined by MPEG-2 (Moving Picture Experts Group), MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), ITU-T H.265 High Efficiency Video Coding (HEVC), and H.266, and in the extensions of these standards, allow devices to transmit and receive digital video information efficiently. Typically, an image of a video sequence is divided into image blocks for encoding or decoding.

In video compression technology, image-block-based spatial prediction (intra prediction) and/or temporal prediction (inter prediction) are introduced to reduce or remove redundant information in a video sequence. Inter prediction can use a previously coded image as a reference frame of the current frame, find a matching reference block for the current image block in the current image, use the pixel values of the pixels in the reference block as the predicted values of the pixel values of the pixels in the current image block, and acquire motion information of the current image block. The motion information may include, for example, information indicating the image in which the reference block is located (i.e., reference image information) and position offset information from the current image block to the reference block (i.e., a motion vector, MV). An image may be referred to as a frame, and a reference image may be referred to as a reference frame.

In order to reduce the bit overhead required to transmit motion information, motion information of neighboring locations may be used to predict motion information for the current location. The motion vector in the motion information is differentially encoded and divided into two parts: a motion vector predictor (MVP) and a motion vector difference (MVD). The motion vector predictor may be derived from the motion vectors of temporal and/or spatial neighboring locations; the motion vector predictor is either not encoded or not directly encoded into the code stream, whereas the MVD information to be transmitted is encoded into the code stream. In the decoding process, the decoding end extracts the MVD information from the code stream, derives the motion vector predictor, and calculates the sum of the motion vector predictor and the MVD to obtain the final MV. The closer the derived motion vector predictor is to the final MV, the smaller the MVD information that needs to be transmitted.
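The differential coding described above can be illustrated with a minimal sketch. The `MV` type, the function names, the sample values, and the quarter-pixel unit are assumptions made only for this illustration.

```python
from typing import NamedTuple


class MV(NamedTuple):
    """Motion vector with integer components (unit assumed: 1/4 pixel)."""
    x: int
    y: int


def compute_mvd(mv: MV, mvp: MV) -> MV:
    """Encoding end: only the difference from the predictor is transmitted."""
    return MV(mv.x - mvp.x, mv.y - mvp.y)


def reconstruct_mv(mvp: MV, mvd: MV) -> MV:
    """Decoding end: final MV = derived predictor + parsed difference."""
    return MV(mvp.x + mvd.x, mvp.y + mvd.y)


true_mv = MV(18, -7)   # motion found at the encoding end (illustrative values)
mvp = MV(16, -8)       # predictor derived from neighboring locations
mvd = compute_mvd(true_mv, mvp)
recovered = reconstruct_mv(mvp, mvd)
```

In this example the MVD is small because the predictor is close to the final MV, which is the situation in which the fewest bits need to be transmitted.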

Currently, conventional H.264 and H.265 video encoders and video decoders support transmission of motion vector difference information only at a fixed quarter-pixel precision. However, the motion distances of objects in an actual scene may fall at positions of different pixel precision, so the current practice may result in inaccurate motion vector prediction, that is, a loss of motion vector prediction accuracy.

Summary of the invention

The embodiments of the present application provide a video decoding method, a video decoder, and corresponding electronic devices to improve motion vector prediction accuracy, thereby improving codec performance.

In a first aspect, an embodiment of the present application provides a video decoding method. The method includes: receiving a code stream that carries motion vector difference (MVD) information of a currently decoded image block; parsing the MVD information from the code stream; acquiring a motion vector predictor (MVP) of the currently decoded image block; when the motion vector predictor is included in a target motion vector set corresponding to a target motion vector precision, obtaining a motion vector of the currently decoded image block based on the motion vector predictor, the MVD information, and the target motion vector precision (for example, when the motion vector predictor is included in the target motion vector set, the target motion vector precision corresponding to the target motion vector set is determined, according to the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions, to be the motion vector precision of the motion vector predictor), wherein the motion vector of the currently decoded image block has the target motion vector precision, the target motion vector set is one of a plurality of motion vector sets, the target motion vector precision is one of a plurality of motion vector precisions including a first motion vector precision and a second motion vector precision, the plurality of motion vector sets include a first motion vector set and a second motion vector set, at least one of the first motion vector set and the second motion vector set includes two or more motion vector predictors, and the first motion vector precision corresponding to the first motion vector set is different from the second motion vector precision corresponding to the second motion vector set; obtaining a prediction block of the currently decoded image block (which may also be understood as predicted values of the pixel values of the currently decoded image block) based on the motion vector of the currently decoded image block; and reconstructing the currently decoded image block based on the prediction block.

Furthermore, it should be understood that the current image block (abbreviated as the current block) herein can be understood as the image block currently being processed. For example, in the encoding process, the current image block or the currently encoded image block refers to an encoding block currently being encoded; in the decoding process, the current image block or the currently decoded image block refers to a decoding block currently being decoded.

Furthermore, it should be understood that the block that provides the prediction for the current image block is referred to as a prediction block. The pixel value or sampled value or sampled signal within the prediction block is called a prediction signal.

It should be understood that the execution body of the method of the embodiment of the present application may be a video decoder or an electronic device having a video decoding function.

It can be seen that, in the video decoding method of the embodiment of the present application, after the video decoder obtains the motion vector predictor of the currently decoded image block, it obtains the target motion vector precision corresponding to the target motion vector set to which the motion vector predictor belongs, so that the motion vector precision for the currently decoded image block can be determined adaptively. If the number of motion vector sets is N, the video decoder can support M kinds of motion vector precision, where M is less than or equal to N and M and N are positive integers, which improves motion vector prediction accuracy. Specifically, the embodiment of the present application can adaptively select the motion vector precision. For one or more image blocks corresponding to some video content, using motion vectors with higher pixel precision (for example, 1/8-pixel precision) improves video codec quality relative to using motion vectors with lower pixel precision, and the benefit outweighs the interpolation overhead; for example, the prediction block obtained from a motion vector with high pixel precision (e.g., 1/8 pixel) is closer to the original block of the currently decoded image block, even though interpolating fractional pixel positions incurs some interpolation overhead. For one or more image blocks corresponding to other video content, using motion vectors with lower pixel precision (e.g., integer-pixel precision) instead of motion vectors with higher pixel precision does not reduce video codec quality and avoids the interpolation overhead. Therefore, the video decoding method of the embodiment of the present application improves codec performance as a whole.

With reference to the first aspect, in some implementations of the first aspect, when the motion vector predictor is included in a target motion vector set corresponding to a target motion vector precision, obtaining the motion vector of the currently decoded image block based on the motion vector predictor, the motion vector difference (MVD) information, and the target motion vector precision includes:

Determining that the motion vector predictor is included in the target motion vector set, and determining, according to a correspondence between a plurality of motion vector sets and a plurality of motion vector precisions, that the target motion vector precision corresponding to the target motion vector set is the motion vector precision of the motion vector predictor; and

Calculating a sum of the motion vector predictor and the motion vector difference MVD to obtain the motion vector of the currently decoded image block, wherein the motion vector of the currently decoded image block, the motion vector predictor, and the motion vector difference MVD have the same motion vector precision (that is, all have the target motion vector precision); or,

Scaling (e.g., amplifying) the motion vector difference information based on the target motion vector precision (e.g., based on an index value of the target motion vector precision) to obtain a scaled (e.g., amplified) motion vector difference MVD, and calculating a sum of the motion vector predictor and the scaled motion vector difference MVD to obtain the motion vector of the currently decoded image block, wherein the motion vector of the currently decoded image block, the motion vector predictor, and the scaled motion vector difference MVD have the same motion vector precision (that is, all have the target motion vector precision).
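The scaling branch may be sketched as follows. This is only an illustration under stated assumptions: motion vectors are assumed to be stored in 1/8-pixel units, precisions are assumed to be power-of-two multiples of that unit, the motion vector predictor is assumed to be already aligned to the target precision, and `STORAGE_UNIT`, `mvd_shift`, and `decode_mv` are hypothetical names.

```python
from fractions import Fraction
from typing import Tuple

STORAGE_UNIT = Fraction(1, 8)  # assumption: MVs are stored in 1/8-pixel units


def mvd_shift(target_precision: Fraction) -> int:
    """Left shift that converts an MVD coded in target-precision steps
    into storage units (hypothetical helper)."""
    ratio = target_precision / STORAGE_UNIT
    n = ratio.numerator
    if ratio.denominator != 1 or (n & (n - 1)) != 0:
        raise ValueError("precision must be a power-of-two multiple of the storage unit")
    return n.bit_length() - 1


def decode_mv(mvp: Tuple[int, int], parsed_mvd: Tuple[int, int],
              target_precision: Fraction) -> Tuple[int, int]:
    """Amplify the parsed MVD to the target precision, then add the MVP.
    The MVP is assumed to be aligned to the target precision already."""
    scale = 1 << mvd_shift(target_precision)
    return (mvp[0] + parsed_mvd[0] * scale, mvp[1] + parsed_mvd[1] * scale)


# Integer-pixel target precision: each MVD step is worth 8 storage units.
mv_int = decode_mv((16, -8), (1, -2), Fraction(1))
# 1/8-pixel target precision: the parsed MVD is used as is.
mv_eighth = decode_mv((16, -8), (1, -2), Fraction(1, 8))
```

With a shift of zero the sketch reduces to the first branch, in which the motion vector predictor and the MVD are simply added.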

It should be noted that the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions may be a one-to-one correspondence, or may be an M:N correspondence in which M is not equal to N; for example, the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions may be one-to-many or many-to-one, which is not limited in this embodiment of the present application.
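As a non-limiting illustration, such a correspondence may be held in two lookup tables. The candidate labels, the temporal candidate `T`, the set indices, and the precision values below are assumptions chosen for this sketch; sets 1 and 2 share one precision to show a many-to-one correspondence.

```python
from fractions import Fraction

# Hypothetical grouping of MVP candidates into motion vector sets.
SET_OF_CANDIDATE = {"A1": 0, "B1": 0, "A0": 1, "B0": 1, "T": 2}

# Hypothetical correspondence between sets and precisions (in pixels).
# Sets 1 and 2 map to the same precision: a many-to-one correspondence.
PRECISION_OF_SET = {0: Fraction(1, 8), 1: Fraction(1, 4), 2: Fraction(1, 4)}


def target_precision(candidate: str) -> Fraction:
    """Find the set containing the MVP candidate, then that set's precision."""
    return PRECISION_OF_SET[SET_OF_CANDIDATE[candidate]]


precision_a1 = target_precision("A1")
precision_t = target_precision("T")
```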

It can be seen that, in the embodiment of the present application, after the video decoder parses the MVD information of the currently decoded image block from the code stream, it adaptively derives the target motion vector precision from the motion vector predictor of the currently decoded image block, according to the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions, and scales (for example, amplifies) the parsed MVD information according to the adaptively derived target motion vector precision. The parsed MVD information is thereby restored to the MVD information as it was before the scaling (e.g., reduction) performed at the video encoding end, which ensures that the motion vector difference is scaled accurately. In addition, the scaled MVD information occupies fewer bits than, or the same number of bits as, the original MVD information, thereby improving codec performance.

With reference to the first aspect, in some implementations of the first aspect, a first distance between a first neighboring block, corresponding to a motion vector predictor in the first motion vector set, and the current image block is different from a second distance between a second neighboring block, corresponding to a motion vector predictor in the second motion vector set, and the current image block, where the first neighboring block and the second neighboring block are included in the spatial neighboring blocks and/or temporal neighboring blocks of the current image block. It should be noted that the first neighboring block herein refers generally to the spatial and/or temporal neighboring blocks corresponding to the MVPs in the first motion vector set; as shown in Table-7A, the first neighboring block may refer to the spatial neighboring blocks A1 and B1. Likewise, the second neighboring block herein refers generally to the spatial and/or temporal neighboring blocks corresponding to the MVPs in the second motion vector set; as shown in Table-7A, the second neighboring block may refer to the spatial neighboring blocks A0 and B0.

It should be noted that the first distance herein is, for example, the distance between the pixel position of the upper left corner of the first neighboring block and the pixel position of the upper left corner of the current image block, or the distance between the pixel position of the center point of the first neighboring block and the pixel position of the center point of the current image block. Similarly, the second distance herein is, for example, the distance between the pixel position of the upper left corner of the second neighboring block and the pixel position of the upper left corner of the current image block, or the distance between the pixel position of the center point of the second neighboring block and the pixel position of the center point of the current image block. The application is not limited thereto.
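The two example distance definitions may be sketched as follows. The text does not fix a distance metric, so the Euclidean distance used here is an assumption, as are the `Block` type, the function names, and the sample block positions.

```python
import math
from typing import NamedTuple


class Block(NamedTuple):
    x: int  # column of the upper-left pixel
    y: int  # row of the upper-left pixel
    w: int  # width in pixels
    h: int  # height in pixels


def top_left_distance(a: Block, b: Block) -> float:
    """Distance between the upper-left pixel positions (Euclidean, assumed)."""
    return math.hypot(a.x - b.x, a.y - b.y)


def center_distance(a: Block, b: Block) -> float:
    """Distance between the center point positions (Euclidean, assumed)."""
    return math.hypot((a.x + a.w / 2) - (b.x + b.w / 2),
                      (a.y + a.h / 2) - (b.y + b.h / 2))


current = Block(16, 16, 8, 8)
left = Block(8, 16, 8, 8)          # hypothetical neighbor directly to the left
below_left = Block(8, 24, 8, 8)    # hypothetical neighbor diagonally below-left

d_left = top_left_distance(left, current)
d_below_left = top_left_distance(below_left, current)
```

Under either definition, the diagonal neighbor in this example is farther from the current image block than the neighbor directly to its left.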

It should be noted that the spatial neighboring block herein may include one or more spatial neighboring blocks adjacent to the current image block in the image of the current image block.

It should be noted that the temporal neighboring block herein may include one or more spatial neighboring blocks of the collocated block in the reference image, and/or one or more sub-blocks of the collocated block, where the collocated block is an image block in the reference image that has the same size, shape, and coordinates as the current image block.

With reference to the first aspect, in some implementations of the first aspect, if the first distance between the first neighboring block, corresponding to a motion vector predictor in the first motion vector set, and the currently decoded image block is smaller than the second distance between the second neighboring block, corresponding to a motion vector predictor in the second motion vector set, and the currently decoded image block, the first motion vector precision corresponding to the first motion vector set (e.g., 1/8-pixel precision) is higher than the second motion vector precision corresponding to the second motion vector set (e.g., 1/4-pixel precision); or,

If the first distance between the first neighboring block, corresponding to a motion vector predictor in the first motion vector set, and the currently decoded image block is greater than the second distance between the second neighboring block, corresponding to a motion vector predictor in the second motion vector set, and the currently decoded image block, the first motion vector precision corresponding to the first motion vector set (e.g., 1/4-pixel precision) is lower than the second motion vector precision corresponding to the second motion vector set (e.g., 1/8-pixel precision).

It should be noted that the above description concerns different motion vector sets. When the distance between the neighboring block corresponding to one motion vector predictor and the current image block is equal to the distance between the neighboring block corresponding to another motion vector predictor and the current image block, the two motion vector predictors are usually grouped into the same motion vector set.

In conjunction with the first aspect, in some implementations of the first aspect, the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is preset.

With reference to the first aspect, in some implementations of the first aspect, the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is determined according to a motion vector precision assignment rule, wherein the motion vector precision assignment rule is used to characterize the relationship between the motion vector precision and the distance between the current image block and the neighboring block corresponding to a motion vector predictor included in a motion vector set. The motion vector precision assignment rule can be understood as follows: the closer the neighboring block corresponding to the motion vector predictor is to the currently decoded image block, the higher the motion vector precision; the farther the neighboring block corresponding to the motion vector predictor is from the currently decoded image block, the lower the motion vector precision. Furthermore, the neighboring blocks herein may include, but are not limited to, spatial neighboring blocks and/or temporal neighboring blocks of the currently decoded image block.
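One possible reading of the assignment rule is sketched below. The function name, the candidate labels, and the distance values are hypothetical. Candidates at equal distance are placed in the same set, and, as a further assumption of this sketch, any sets beyond the number of available precisions share the coarsest precision.

```python
from fractions import Fraction
from typing import Dict, List


def assign_precisions(distances: Dict[str, float],
                      precisions: List[Fraction]) -> Dict[str, Fraction]:
    """Map each MVP candidate to a precision: the nearer the neighboring
    block, the finer (numerically smaller) the assigned precision."""
    finest_first = sorted(precisions)           # e.g. 1/8 before 1/4
    levels = sorted(set(distances.values()))    # one set per distinct distance
    result = {}
    for name, dist in distances.items():
        # Assumption: extra sets fall back to the coarsest precision.
        rank = min(levels.index(dist), len(finest_first) - 1)
        result[name] = finest_first[rank]
    return result


mapping = assign_precisions(
    {"A1": 8.0, "B1": 8.0, "A0": 11.3, "B0": 11.3},   # illustrative distances
    [Fraction(1, 4), Fraction(1, 8)],
)
```

Here the candidates from A1 and B1 form the nearer set and receive 1/8-pixel precision, while those from A0 and B0 receive 1/4-pixel precision.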

In conjunction with the first aspect, in some implementations of the first aspect, the code stream further carries a motion vector precision parameter, where the motion vector precision parameter is used to indicate the values of the plurality of motion vector precisions, and the motion vector precision parameter is carried in any one of a sequence parameter set, an image parameter set (PPS), a slice header, or some other hierarchical position of the currently decoded image block. Preferably, the plurality of motion vector precisions herein may be three or more motion vector precisions.

With reference to the first aspect, in some implementations of the first aspect, the code stream further carries a motion vector precision parameter, where the motion vector precision parameter is used to indicate a number of motion vector precisions for the current decoding processing unit (for example, pps_amvr_number) and at least two motion vector precision values corresponding to that number (for example, pps_amvr_set[pps_amvr_number]), wherein the current decoding processing unit includes one or more of a video sequence, an image, a slice, a region partition, a CTU, and a CU.
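A minimal sketch of a container for these already-parsed values is given below. Only the names pps_amvr_number and pps_amvr_set come from the text; the `AmvrParams` class, the use of fractions of a pixel as precision values, and the consistency checks are assumptions, and the bitstream syntax itself is not modeled.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Tuple


@dataclass(frozen=True)
class AmvrParams:
    """Holds the parsed motion vector precision parameter (hypothetical type)."""
    pps_amvr_number: int                 # number of motion vector precisions
    pps_amvr_set: Tuple[Fraction, ...]   # precision values, in pixels (assumed)

    def __post_init__(self) -> None:
        # Assumed reading: at least two values, one per counted precision.
        if self.pps_amvr_number < 2:
            raise ValueError("at least two motion vector precision values are expected")
        if len(self.pps_amvr_set) != self.pps_amvr_number:
            raise ValueError("pps_amvr_set must hold pps_amvr_number entries")


params = AmvrParams(3, (Fraction(1, 8), Fraction(1, 4), Fraction(1)))
```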

It can be seen that, in the decoding method of the embodiment of the present application, the video decoder sets the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions according to the motion vector precision parameter parsed from the code stream and the motion vector precision assignment rule, which helps the video decoder adaptively determine the target motion vector precision for the currently decoded image block (e.g., integer-pixel, 1/2-pixel, 1/4-pixel, 1/8-pixel, or 4-pixel precision).

With reference to the first aspect, in some implementations of the first aspect, the code stream further carries a first identifier, where the first identifier is used to indicate a third motion vector precision corresponding to a third motion vector set; for example, the first identifier may be carried in any one of a sequence parameter set, an image parameter set, or a slice header of the currently decoded image block; or

The code stream further carries a first identifier and a second identifier, where the first identifier is used to indicate a third motion vector precision and the second identifier is used to indicate a third motion vector set; for example, the first identifier and the second identifier may be carried in any one of a sequence parameter set, an image parameter set, or a slice header of the currently decoded image block;

The third motion vector set is the first motion vector set, the second motion vector set, or another motion vector set among the plurality of motion vector sets, and the third motion vector precision is the first motion vector precision, the second motion vector precision, or another motion vector precision among the plurality of motion vector precisions.

It can be seen that, when the video encoder signals the motion vector precision only for an important motion vector set, the decoding method of the embodiment of the present application can still adaptively determine the motion vector precision for the currently decoded image block, and transmitting an identifier that indicates the specific motion vector precision for the important motion vector set also improves encoding and decoding efficiency.
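The effect of the identifiers may be sketched as an override of one entry in the correspondence. The function name, the `preset_set` argument, and the assumption that all other sets keep their existing precisions are hypothetical and serve only to illustrate the two signalling variants.

```python
from fractions import Fraction
from typing import Dict, Optional


def apply_identifiers(correspondence: Dict[int, Fraction],
                      first_identifier: Fraction,
                      second_identifier: Optional[int] = None,
                      preset_set: int = 0) -> Dict[int, Fraction]:
    """Return a copy of the correspondence with one set's precision replaced.

    first_identifier:  the signalled (third) motion vector precision.
    second_identifier: the signalled (third) motion vector set, if present;
                       otherwise a preset set index is assumed.
    """
    target_set = preset_set if second_identifier is None else second_identifier
    if target_set not in correspondence:
        raise KeyError("unknown motion vector set")
    updated = dict(correspondence)
    updated[target_set] = first_identifier   # other sets keep their precision
    return updated


defaults = {0: Fraction(1, 4), 1: Fraction(1, 4)}
only_first = apply_identifiers(defaults, Fraction(1, 8))
first_and_second = apply_identifiers(defaults, Fraction(1), second_identifier=1)
```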

In conjunction with the first aspect, in some implementations of the first aspect, the code stream further carries a third identifier (e.g., a candidate predicted motion vector index), where the third identifier is used to indicate a candidate motion vector predictor (MVP) of the currently decoded image block (e.g., the candidate predicted motion vector index may indicate the position of the selected candidate motion vector predictor in the candidate motion vector prediction list); accordingly, acquiring the motion vector predictor (MVP) of the currently decoded image block includes: determining, according to the third identifier, the candidate motion vector predictor (MVP) of the currently decoded image block from the candidate motion vector prediction list; or

Acquiring the motion vector predictor (MVP) of the currently decoded image block includes: acquiring the motion vector predictor (MVP) of the currently decoded image block by using a bidirectional matching method or a template matching method.
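The first of these two ways of acquiring the MVP, selection by index from the candidate list, may be sketched as follows. The function name, the tuple representation of a motion vector, and the sample candidates are assumptions; the bidirectional matching and template matching methods are not shown.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]


def select_mvp(candidate_list: List[MotionVector], mvp_index: int) -> MotionVector:
    """Pick the candidate at the position indicated by the parsed index."""
    if not 0 <= mvp_index < len(candidate_list):
        raise IndexError("candidate index outside the candidate list")
    return candidate_list[mvp_index]


candidates = [(16, -8), (12, -4)]   # illustrative candidate MVPs
mvp = select_mvp(candidates, 1)
```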

It can be seen that, on the one hand, the video decoding method in the embodiment of the present application is applicable not only to the candidate motion vector predictor list in the advanced motion vector prediction (AMVP) mode, but also to candidate motion vector predictor lists constructed in other modes from the motion vectors of spatial neighboring blocks and/or temporal neighboring blocks, thereby improving codec performance. On the other hand, the video decoding method of the embodiment of the present application can support multiple methods for acquiring motion vector predictors, thereby improving the flexibility of the video decoding method.

A second aspect of the present application provides a video decoder, including: an entropy decoding module, configured to receive a code stream that carries motion vector difference (MVD) information of a currently decoded image block, and to parse the MVD information of the currently decoded image block from the code stream; an inter prediction module, configured to: acquire a motion vector predictor (MVP) of the currently decoded image block; when the motion vector predictor is included in a target motion vector set corresponding to a target motion vector precision, obtain a motion vector of the currently decoded image block based on the motion vector predictor, the MVD information, and the target motion vector precision, wherein the motion vector of the currently decoded image block has the target motion vector precision, the target motion vector set is one of a plurality of motion vector sets, the target motion vector precision is one of a plurality of motion vector precisions including a first motion vector precision and a second motion vector precision, the plurality of motion vector sets include a first motion vector set and a second motion vector set, at least one of the first motion vector set and the second motion vector set includes two or more motion vector predictors, and the first motion vector precision corresponding to the first motion vector set is different from the second motion vector precision corresponding to the second motion vector set; and perform motion compensation based on the motion vector of the currently decoded image block, which has the target motion vector precision, to obtain a prediction block of the currently decoded image block (which may also be understood as predicted values of the pixel values of the currently decoded image block); and a reconstruction module, configured to reconstruct the currently decoded image block based on the prediction block of the currently decoded image block. Here, the operation "when the motion vector predictor is included in a target motion vector set corresponding to a target motion vector precision, obtain a motion vector of the currently decoded image block based on the motion vector predictor, the MVD information, and the target motion vector precision" may be, for example: when the motion vector predictor is included in the target motion vector set, determine, according to a correspondence between the plurality of motion vector sets and the plurality of motion vector precisions, that the target motion vector precision corresponding to the target motion vector set is the motion vector precision of the motion vector predictor; and obtain the motion vector of the currently decoded image block based on the motion vector predictor, the MVD information, and the target motion vector precision.

With reference to the second aspect, in some implementations of the second aspect, in terms of obtaining the motion vector of the currently decoded image block based on the motion vector predictor, the motion vector difference information, and the target motion vector precision when the motion vector predictor is included in a target motion vector set corresponding to the target motion vector precision, the inter prediction module is specifically configured to:

Calculate a sum of the motion vector predictor and the motion vector difference MVD to obtain the motion vector of the currently decoded image block, wherein the motion vector of the currently decoded image block, the motion vector predictor, and the motion vector difference MVD have the same motion vector precision (that is, all have the target motion vector precision); or,

Scale (e.g., amplify) the motion vector difference information based on the target motion vector precision to obtain a scaled (e.g., amplified) motion vector difference MVD, and calculate a sum of the motion vector predictor and the scaled motion vector difference MVD to obtain the motion vector of the currently decoded image block, wherein the motion vector of the currently decoded image block, the motion vector predictor, and the scaled motion vector difference MVD have the same motion vector precision (that is, all have the target motion vector precision).

With reference to the second aspect, in some implementations of the second aspect, the first distance between the first neighboring block, corresponding to a motion vector predictor in the first motion vector set, and the currently decoded image block is different from the second distance between the second neighboring block, corresponding to a motion vector predictor in the second motion vector set, and the currently decoded image block, where the first neighboring block and the second neighboring block are included in the spatial neighboring blocks and/or temporal neighboring blocks of the current image block.

With reference to the second aspect, in some implementations of the second aspect, if the first distance between the first neighboring block, corresponding to a motion vector predictor in the first motion vector set, and the currently decoded image block is smaller than the second distance between the second neighboring block, corresponding to a motion vector predictor in the second motion vector set, and the currently decoded image block, the first motion vector precision corresponding to the first motion vector set is higher than the second motion vector precision corresponding to the second motion vector set; or,

If the first distance between the first neighboring block, corresponding to a motion vector predictor in the first motion vector set, and the currently decoded image block is greater than the second distance between the second neighboring block, corresponding to a motion vector predictor in the second motion vector set, and the currently decoded image block, the first motion vector precision corresponding to the first motion vector set is lower than the second motion vector precision corresponding to the second motion vector set.

In conjunction with the second aspect, in some implementations of the second aspect, the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is preset.

With reference to the second aspect, in some implementations of the second aspect, the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is determined according to a motion vector precision assignment rule, wherein the motion vector precision assignment rule is used to represent the relationship between the motion vector precision and the distance between the current image block and the neighboring block corresponding to a motion vector predictor included in a motion vector set. That is, the farther the neighboring block corresponding to the motion vector predictor included in the motion vector set is from the currently decoded image block, the lower the motion vector precision; the closer the neighboring block corresponding to the motion vector predictor included in the motion vector set is to the currently decoded image block, the higher the motion vector precision.

In conjunction with the second aspect, in some implementations of the second aspect, the code stream further carries a motion vector precision parameter, where the motion vector precision parameter is used to indicate the values of the plurality of motion vector precisions, and the motion vector precision parameter is carried in any one of a sequence parameter set, an image parameter set (PPS), or a slice header of the currently decoded image block;

Correspondingly, the entropy decoding module is further configured to parse the motion vector precision parameter from the code stream.

With reference to the second aspect, in some implementations of the second aspect, the code stream further carries a motion vector precision parameter, where the motion vector precision parameter is used to indicate a number of motion vector precisions for the current decoding processing unit (for example, pps_amvr_number) and at least two motion vector precision values corresponding to that number (for example, pps_amvr_set[pps_amvr_number]), wherein the current decoding processing unit includes one or more of a video sequence, an image, a slice, a region partition, a decoding tree unit (CTU), and a decoding unit (CU).

With reference to the second aspect, in some implementations of the second aspect, the code stream further carries a first identifier, where the first identifier is used to indicate a third motion vector precision corresponding to the third motion vector set; or

The code stream further carries a first identifier and a second identifier, where the first identifier is used to indicate a third motion vector precision, and the second identifier is used to indicate a third motion vector set;

The third motion vector set is the first motion vector set, the second motion vector set, or another motion vector set of the plurality of motion vector sets, and the third motion vector precision is the first motion vector precision, the second motion vector precision, or another motion vector precision of the plurality of motion vector precisions;

Correspondingly, the entropy decoding module is further configured to parse the first identifier from the code stream, or to parse the first identifier and the second identifier from the code stream.

In conjunction with the second aspect, in some implementations of the second aspect, the code stream further carries a third identifier for indicating a candidate motion vector predictor of the currently decoded image block, and the entropy decoding module is further configured to parse the third identifier from the code stream;

In terms of acquiring a motion vector predictor of the current decoded image block, the inter prediction module is specifically configured to:

Determining a candidate motion vector predictor of the currently decoded image block from the candidate motion vector prediction list based on the third identifier; or

Obtaining a motion vector predictor of the currently decoded image block by using a bidirectional matching method or a template matching method.

The embodiment of the present application further provides a video encoding method, a video encoder, and a corresponding electronic device to improve motion vector prediction accuracy, thereby improving encoding performance.

A third aspect of the present application provides a video encoding method, the method comprising: performing a motion estimation process on a current coded image block to obtain a motion vector of the current coded image block; acquiring a motion vector predictor of the current coded image block; when the motion vector predictor is included in a target motion vector set corresponding to a target motion vector precision (for example, when the motion vector predictor is included in the target motion vector set, determining, according to the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions, that the target motion vector precision corresponding to the target motion vector set is the motion vector precision that the motion vector predictor has or corresponds to), obtaining motion vector difference (MVD) information of the current coded image block based on the motion vector of the current coded image block, the motion vector predictor, and the target motion vector precision; wherein the motion vector difference (MVD) of the current coded image block has the target motion vector precision, the target motion vector set is one of a plurality of motion vector sets, the target motion vector precision is one of a plurality of motion vector precisions including a first motion vector precision and a second motion vector precision, the plurality of motion vector sets includes a first motion vector set and a second motion vector set, at least one of the first motion vector set and the second motion vector set includes two or more motion vector predictors, and the first motion vector precision corresponding to the first motion vector set is different from the second motion vector precision corresponding to the second motion vector set; and entropy encoding the motion vector difference information of the current coded image block into the code stream.

It can be seen that, in the video coding method of the embodiment of the present application, after obtaining the motion vector predictor of the current coded image block, the video encoder can adaptively determine, from the target motion vector precision corresponding to the target motion vector set to which the motion vector predictor belongs, an appropriate motion vector precision for the current coded image block. If the number of motion vector sets is N, the video coding method can support M motion vector (MV) precisions, where M is less than or equal to N and M and N are positive integers (for example, M and N are greater than or equal to 3), which improves motion vector prediction precision. Specifically, because the embodiment of the present application can adaptively select the motion vector precision, for one or more image blocks corresponding to certain video content, using motion vectors with higher pixel precision (e.g., 1/8 pixel precision) rather than motion vectors with lower pixel precision improves video codec quality, and the benefit outweighs the interpolation overhead; for example, a prediction block obtained based on a motion vector with high pixel precision (e.g., 1/8 pixel precision) is closer to the original block of the current coded image block, even though interpolating the fractional pixel positions incurs some interpolation overhead. For one or more image blocks corresponding to other video content, motion vectors with lower pixel precision (e.g., integer pixel precision) may be used instead of motion vectors with higher pixel precision. The video coding method of the embodiment of the present application therefore improves the codec performance as a whole.

With reference to the third aspect, in some implementations of the third aspect, when the motion vector predictor is included in the target motion vector set corresponding to the target motion vector precision, obtaining the motion vector difference (MVD) information of the current image block to be encoded based on the motion vector of the current image block to be encoded, the motion vector predictor, and the target motion vector precision includes:

Determining that the motion vector predictor is included in the target motion vector set, and determining, according to the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions, that the target motion vector precision corresponding to the target motion vector set is the motion vector precision that the motion vector predictor has or corresponds to;

Calculating the difference (MVD) between the motion vector of the current coded image block and the motion vector predictor, wherein the motion vector of the current coded image block, the motion vector predictor, and the MVD all have the target motion vector precision; or

Calculating the difference (MVD) between the motion vector of the current coded image block and the motion vector predictor, wherein the motion vector of the current coded image block and the motion vector predictor have the target motion vector precision; and scaling (e.g., scaling down) the calculated motion vector difference based on the target motion vector precision (e.g., an index of the target motion vector precision) to obtain a scaled (e.g., reduced) motion vector difference.
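The second alternative above (compute the difference, then scale it down) can be sketched as follows. This is only a minimal illustration under stated assumptions: motion vector components are stored as integers in a fixed fine unit (for example 1/8 pixel), the target motion vector precision is expressed as a right-shift amount, and the function name `encode_mvd` is hypothetical. None of these representation choices are specified by the present application.

```python
def encode_mvd(mv, mvp, shift):
    """Return the scaled-down MVD for one block.

    mv, mvp: (x, y) integer vectors in a fine storage unit (assumed 1/8 pixel).
    shift:   assumed encoding of the target precision; each step of the target
             precision equals 2**shift storage units.
    """
    step = 1 << shift
    for component in (*mv, *mvp):
        # Both vectors are assumed to be already aligned to the target precision.
        if component % step:
            raise ValueError("vector is not aligned to the target precision")
    # The difference is a multiple of `step`, so the shift loses no information.
    return tuple((m - p) >> shift for m, p in zip(mv, mvp))


scaled_mvd = encode_mvd(mv=(24, -16), mvp=(8, 8), shift=3)
```

Because both vectors are aligned to the target precision, the scaled-down difference can be restored exactly by shifting it back up at the decoder.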

It can be seen that, in the video coding method of the embodiment of the present application, a better balance between coding efficiency and bit overhead can be achieved by adaptively selecting the pixel precision, and the video encoder can adaptively select the target motion vector precision used to encode the MVD information. For example, the reduced motion vector difference occupies fewer bits than, or the same number of bits as, the original motion vector difference; in the former case the bit transmission overhead is reduced, thereby improving the codec performance as a whole.

With reference to the third aspect, in some implementations of the third aspect, a first distance between the current image block and a first neighboring block corresponding to a motion vector predictor in the first motion vector set is different from a second distance between the current image block and a second neighboring block corresponding to a motion vector predictor in the second motion vector set, the first neighboring block and the second neighboring block being included in the spatial neighboring blocks and/or time domain neighboring blocks of the current image block. It should be noted that the first neighboring block herein refers to a spatial and/or time domain neighboring block corresponding to the MVPs in the first motion vector set; as shown in Table-3, the first neighboring block may refer to the spatial neighboring blocks A1 and B1. The second neighboring block herein refers to a spatial and/or time domain neighboring block corresponding to the MVPs in the second motion vector set; as shown in Table-7A, the second neighboring block may refer to the spatial neighboring blocks A0 and B0.

With reference to the third aspect, in some implementations of the third aspect, if the first distance between the current coded image block and the first neighboring block corresponding to the motion vector predictor in the first motion vector set is smaller than the second distance between the current coded image block and the second neighboring block corresponding to the motion vector predictor in the second motion vector set, the first motion vector precision corresponding to the first motion vector set is higher than the second motion vector precision corresponding to the second motion vector set; or,

If the first distance between the current coded image block and the first neighboring block corresponding to the motion vector predictor in the first motion vector set is greater than the second distance between the current coded image block and the second neighboring block corresponding to the motion vector predictor in the second motion vector set, the first motion vector precision corresponding to the first motion vector set is lower than the second motion vector precision corresponding to the second motion vector set.

In conjunction with the third aspect, in some implementations of the third aspect, the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is preset.

With reference to the third aspect, in some implementations of the third aspect, the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is determined according to a motion vector precision assignment rule, wherein the motion vector precision assignment rule is used to characterize the relationship between the magnitude of the motion vector precision and the distance between the current image block and the neighboring block corresponding to a motion vector predictor included in the motion vector set. The motion vector precision assignment rule can be understood as follows: the closer the neighboring block corresponding to the motion vector predictor is to the current coded image block, the higher the motion vector precision; the farther the neighboring block corresponding to the motion vector predictor is from the current coded image block, the lower the motion vector precision. Furthermore, the neighboring blocks herein may include, but are not limited to, a spatial neighboring block and/or a time domain neighboring block of the current coded image block.

In conjunction with the third aspect, in some implementations of the third aspect, the method further comprises:

Encoding a motion vector precision parameter into the code stream, where the motion vector precision parameter is used to indicate the values of the plurality of motion vector precisions, and the motion vector precision parameter is carried in any one of a sequence parameter set, a picture parameter set (PPS), or a slice header of the coded image block.

Encoding a motion vector precision parameter into the code stream, where the motion vector precision parameter is used to indicate a number of motion vector precisions (e.g., pps_amvr_number) for the current encoding processing unit and at least two motion vector precision values (e.g., pps_amvr_set[pps_amvr_number]) corresponding to that number, wherein the current encoding processing unit includes one or more of a video sequence, an image, a slice, a region partition, a coding tree unit (CTU), and a coding unit (CU).

Entropy encoding a first identifier into the code stream, where the first identifier is used to indicate a third motion vector precision corresponding to the third motion vector set; or

Entropy encoding a first identifier and a second identifier into the code stream, where the first identifier is used to indicate a third motion vector precision, and the second identifier is used to indicate a third motion vector set;

It can be seen that, compared with the existing video coding method, the video coding method in the embodiment of the present application only signals the motion vector precision of the important motion vector set, which can further save bit overhead and improve the codec performance of the video image.

With reference to the third aspect, in some implementations of the third aspect, acquiring the motion vector predictor of the current image block to be encoded includes: determining, according to a rate-distortion cost criterion, a candidate motion vector predictor (MVP) for the current image block to be encoded from a candidate motion vector prediction list (candidate MVP list); for example, encoding the current coded image block with the candidate motion vector predictor (MVP) has the minimum rate-distortion cost;

Entropy encoding the motion vector difference (MVD) of the current image block to be encoded into the code stream includes:

Entropy encoding the motion vector difference (MVD) of the current image block to be encoded and a third identifier (e.g., a candidate predicted motion vector index) into the code stream, where the third identifier is used to indicate the candidate motion vector predictor (MVP) of the current coded image block (for example, the candidate predicted motion vector index may indicate the position of the selected candidate motion vector predictor in the candidate motion vector prediction list).
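The selection of a candidate MVP by minimum cost, together with the index that is then signalled as the third identifier, can be sketched as below. The cost function here is a toy stand-in (the sum of absolute MVD components, used as a rough proxy for MVD bit cost); it is an assumption made only for this example and is not the rate-distortion cost used by any particular encoder. The name `select_mvp` is likewise hypothetical.

```python
def select_mvp(candidates, cost):
    """Return (index, candidate) of the candidate with the smallest cost.

    candidates: candidate MVP list, each entry an (x, y) tuple.
    cost:       callable mapping a candidate to a comparable cost value.
    """
    if not candidates:
        raise ValueError("candidate list is empty")
    best_index = min(range(len(candidates)), key=lambda i: cost(candidates[i]))
    return best_index, candidates[best_index]


mv = (5, 3)  # motion vector found by motion estimation (example value)


def toy_cost(mvp):
    # Toy proxy only: a smaller MVD magnitude is assumed cheaper to signal.
    return abs(mv[0] - mvp[0]) + abs(mv[1] - mvp[1])


index, mvp = select_mvp([(0, 0), (4, 4), (8, 0)], toy_cost)
```

The returned index plays the role of the candidate predicted motion vector index; a real encoder would substitute its own rate-distortion measure for `toy_cost`.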

It should be noted that the video coding method in the embodiment of the present application is applicable not only to the candidate motion vector predictor list in advanced motion vector prediction (AMVP) mode, but also to candidate motion vector predictor lists constructed in other modes from the motion vectors of spatial neighboring blocks and/or time domain neighboring blocks, thereby improving codec performance.

A fourth aspect of the present application provides a video encoder, including: an inter prediction module, configured to perform a motion estimation process on a current coded image block to obtain a motion vector of the current coded image block, and to acquire a motion vector predictor of the current coded image block; the inter prediction module is further configured to: when the motion vector predictor is included in a target motion vector set corresponding to a target motion vector precision (for example, when the motion vector predictor is included in the target motion vector set, determine, according to the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions, that the target motion vector precision corresponding to the target motion vector set is the motion vector precision that the motion vector predictor has or corresponds to), obtain motion vector difference (MVD) information of the current coded image block based on the motion vector of the current coded image block, the motion vector predictor, and the target motion vector precision; wherein the motion vector difference (MVD) of the current coded image block has the target motion vector precision, the target motion vector set is one of a plurality of motion vector sets, the target motion vector precision is one of a plurality of motion vector precisions including a first motion vector precision and a second motion vector precision, the plurality of motion vector sets includes a first motion vector set and a second motion vector set, at least one of the first motion vector set and the second motion vector set includes two or more motion vector predictors, and the first motion vector precision corresponding to the first motion vector set is different from the second motion vector precision corresponding to the second motion vector set; and an entropy encoding module, configured to entropy encode the motion vector difference (MVD) information of the current image block to be encoded into the code stream.

With reference to the fourth aspect, in some implementations of the fourth aspect, in terms of obtaining the motion vector difference (MVD) information of the current coded image block based on the motion vector of the current coded image block, the motion vector predictor, and the target motion vector precision, the inter prediction module is specifically configured to:

Determine that the motion vector predictor is included in the target motion vector set, and determine, according to the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions, that the target motion vector precision corresponding to the target motion vector set is the motion vector precision that the motion vector predictor has or corresponds to; calculate the difference (MVD) between the motion vector of the current coded image block and the motion vector predictor, wherein the motion vector of the current coded image block, the motion vector predictor, and the MVD all have the target motion vector precision; or,

Calculate the difference (MVD) between the motion vector of the current coded image block and the motion vector predictor, wherein the motion vector of the current coded image block and the motion vector predictor have the target motion vector precision; and scale (e.g., scale down) the calculated motion vector difference based on the target motion vector precision (e.g., based on an index value of the target motion vector precision) to obtain a scaled (e.g., reduced) motion vector difference.

With reference to the fourth aspect, in some implementations of the fourth aspect, a first distance between the current image block and a first neighboring block corresponding to a motion vector predictor in the first motion vector set is different from a second distance between the current image block and a second neighboring block corresponding to a motion vector predictor in the second motion vector set, the first neighboring block and the second neighboring block being included in the spatial neighboring blocks and/or time domain neighboring blocks of the current image block.

With reference to the fourth aspect, in some implementations of the fourth aspect, if the first distance between the current coded image block and the first neighboring block corresponding to the motion vector predictor in the first motion vector set is smaller than the second distance between the current coded image block and the second neighboring block corresponding to the motion vector predictor in the second motion vector set, the first motion vector precision corresponding to the first motion vector set is higher than the second motion vector precision corresponding to the second motion vector set; or,

In conjunction with the fourth aspect, in some implementations of the fourth aspect, the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is preset.

With reference to the fourth aspect, in some implementations of the fourth aspect, the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is determined according to a motion vector precision assignment rule, wherein the motion vector precision assignment rule is used to characterize the relationship between the magnitude of the motion vector precision and the distance between the current image block and the neighboring block corresponding to a motion vector predictor included in the motion vector set. The motion vector precision assignment rule can be understood as follows: the closer the neighboring block corresponding to the motion vector predictor is to the current coded image block, the higher the motion vector precision; the farther the neighboring block corresponding to the motion vector predictor is from the current coded image block, the lower the motion vector precision. Furthermore, the neighboring blocks herein may include, but are not limited to, a spatial neighboring block and/or a time domain neighboring block of the current coded image block.

In conjunction with the fourth aspect, in some implementations of the fourth aspect, the entropy encoding module is further configured to: entropy encode a motion vector precision parameter into the code stream, where the motion vector precision parameter is used to indicate the values of the plurality of motion vector precisions, the motion vector precision parameter being carried in any one of a sequence parameter set, a picture parameter set (PPS), or a slice header of the coded image block.

In conjunction with the fourth aspect, in some implementations of the fourth aspect, the entropy encoding module is further configured to: entropy encode a motion vector precision parameter into the code stream, where the motion vector precision parameter is used to indicate a number of motion vector precisions (e.g., pps_amvr_number) for the current encoding processing unit and at least two motion vector precision values (e.g., pps_amvr_set[pps_amvr_number]) corresponding to that number, wherein the current encoding processing unit includes one or more of a video sequence, an image, a slice, a region partition, a coding tree unit (CTU), and a coding unit (CU).

In conjunction with the fourth aspect, in some implementations of the fourth aspect, the entropy encoding module is further configured to:

With reference to the fourth aspect, in some implementations of the fourth aspect, in terms of acquiring the motion vector predictor of the current image block to be encoded, the inter prediction module is specifically configured to: determine, according to a rate-distortion cost criterion, a candidate motion vector predictor (MVP) for the current image block to be encoded from a candidate motion vector prediction list (candidate MVP list); for example, encoding the current coded image block with the candidate motion vector predictor (MVP) has the minimum rate-distortion cost;

The entropy encoding module is further configured to entropy encode a third identifier (e.g., a candidate predicted motion vector index) into the code stream, where the third identifier is used to indicate the candidate motion vector predictor (MVP) of the current coded image block (for example, the candidate predicted motion vector index may indicate the position of the selected candidate motion vector predictor in the candidate motion vector prediction list).

A fifth aspect of the present application provides a video decoding method, including: receiving a code stream, where the code stream carries motion vector difference (MVD) information of a currently decoded image block and an index for indicating a candidate motion vector predictor (MVP) of the currently decoded image block; parsing the MVD information and the index from the code stream; determining, based on the index, a candidate motion vector predictor (MVP) for the currently decoded image block from a candidate motion vector predictor list; when the candidate motion vector predictor (MVP) is included in a target motion vector set corresponding to a target motion vector precision (for example, when the motion vector predictor is included in the target motion vector set, determining, according to the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions, that the target motion vector precision corresponding to the target motion vector set is the motion vector precision that the motion vector predictor has or corresponds to), obtaining a motion vector of the currently decoded image block based on the candidate motion vector predictor (MVP), the motion vector difference (MVD) information, and the target motion vector precision; wherein the motion vector of the currently decoded image block has the target motion vector precision, the target motion vector set is one of a plurality of motion vector sets, the target motion vector precision is one of a plurality of motion vector precisions including a first motion vector precision and a second motion vector precision, the plurality of motion vector sets includes a first motion vector set and a second motion vector set, at least one of the first motion vector set and the second motion vector set includes two or more motion vector predictors, and the first motion vector precision corresponding to the first motion vector set is different from the second motion vector precision corresponding to the second motion vector set; performing motion compensation based on the motion vector of the currently decoded image block having the target motion vector precision, to obtain a prediction block of the currently decoded image block (which may also be understood as predicted values of the pixel values of the currently decoded image block); and reconstructing the currently decoded image block based on the prediction block.

It should be noted that the target motion vector set is, for example, a subset of a plurality of motion vector predictors used to construct a candidate motion vector predictor list. For example, the plurality of motion vector sets are a plurality of subsets of the plurality of motion vector predictors used to construct the candidate motion vector predictor list, and the motion vector predictors included in the different motion vector sets are different from each other.
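The way a decoder might derive the target motion vector precision from set membership can be sketched as follows. The grouping (A1 and B1 in the first set, A0 and B0 in the second) and the example precisions (1/8 and 1/4 pixel) follow the examples given elsewhere in this description, but the dictionaries, the function name `target_precision`, and the choice to identify a predictor by the label of its source neighboring block are assumptions made only for this illustration.

```python
from fractions import Fraction

# Disjoint subsets of the predictors used to build the candidate list,
# identified here by the label of the neighboring block they come from.
MV_SETS = {
    "first": {"A1", "B1"},
    "second": {"A0", "B0"},
}

# Preset correspondence between sets and precisions (illustrative values).
SET_PRECISION = {
    "first": Fraction(1, 8),
    "second": Fraction(1, 4),
}


def target_precision(mvp_source):
    """Return (set name, precision) for the set containing the predictor."""
    for name, members in MV_SETS.items():
        if mvp_source in members:
            return name, SET_PRECISION[name]
    raise KeyError(f"{mvp_source!r} does not belong to any motion vector set")


set_name, precision = target_precision("B0")
```

Because the sets are disjoint, each predictor maps to exactly one precision, so no additional precision flag needs to be read for the block in this sketch.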

It can be seen that, in the video decoding method of the embodiment of the present application, after obtaining the candidate motion vector predictor of the current decoded image block, the video decoder can adaptively determine, from the target motion vector precision corresponding to the target motion vector set to which the motion vector predictor belongs, an appropriate motion vector precision for the currently decoded image block. If the number of motion vector sets is N, the video decoder can support M motion vector (MV) precisions, where M is less than or equal to N and M and N are both positive integers, which improves the motion vector prediction precision. Specifically, because the embodiment of the present application can adaptively select the motion vector precision, for one or more image blocks corresponding to certain video content, using motion vectors with higher pixel precision (e.g., 1/8 pixel precision) rather than motion vectors with lower pixel precision improves video codec quality, and the benefit outweighs the interpolation overhead; for example, a prediction block obtained based on a motion vector with high pixel precision (e.g., 1/8 pixel precision) is closer to the original block of the currently decoded image block, even though interpolating the fractional pixel positions incurs some interpolation overhead. For one or more image blocks corresponding to other video content, motion vectors with lower pixel precision (e.g., integer pixel precision) may be used instead of motion vectors with higher pixel precision. The video decoding method of the embodiment of the present application therefore improves the codec performance as a whole.

With reference to the fifth aspect, in some implementations of the fifth aspect, when the candidate motion vector predictor (MVP) is included in the target motion vector set corresponding to the target motion vector precision, obtaining the motion vector of the current image block to be decoded based on the candidate motion vector predictor (MVP), the motion vector difference (MVD) information, and the target motion vector precision includes:

Determining that the candidate motion vector predictor is included in the target motion vector set, and determining, according to the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions, that the target motion vector precision corresponding to the target motion vector set is the motion vector precision that the candidate motion vector predictor has or corresponds to;

Calculating the sum of the candidate motion vector predictor (MVP) and the motion vector difference (MVD) to obtain the motion vector of the current decoded image block, wherein the motion vector of the current decoded image block, the candidate motion vector predictor (MVP), and the motion vector difference (MVD) have the same motion vector precision (that is, all have the target motion vector precision); or,

Scaling (e.g., amplifying) the motion vector difference information based on the target motion vector precision to obtain a scaled (e.g., amplified) motion vector difference (MVD); and calculating the sum of the candidate motion vector predictor (MVP) and the scaled motion vector difference (MVD) to obtain the motion vector of the current decoded image block, wherein the motion vector of the current decoded image block, the candidate motion vector predictor (MVP), and the scaled motion vector difference (MVD) have the same motion vector precision (that is, all have the target motion vector precision).
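The second decoding alternative above (scale the parsed MVD up, then add the predictor) can be sketched as follows. As an assumption for illustration only, motion vector components are integers in a fixed fine storage unit (for example 1/8 pixel), and the target motion vector precision is expressed as a left-shift amount; the function name `decode_mv` is hypothetical.

```python
def decode_mv(mvp, parsed_mvd, shift):
    """Reconstruct the motion vector of the current block.

    mvp:        (x, y) predictor in the fine storage unit (assumed 1/8 pixel),
                already aligned to the target precision.
    parsed_mvd: (x, y) scaled-down MVD as parsed from the code stream.
    shift:      assumed encoding of the target precision; each step of the
                target precision equals 2**shift storage units.
    """
    # Scale the parsed difference back up, then add the predictor.
    return tuple(p + (d << shift) for p, d in zip(mvp, parsed_mvd))


mv = decode_mv(mvp=(8, 8), parsed_mvd=(2, -3), shift=3)
```

With a shift of 0 the function reduces to the first alternative, a plain sum of the predictor and the parsed difference.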

It can be seen that, in this embodiment of the present application, for the scaled motion vector difference (MVD) information of the current decoded image block parsed from the code stream, after determining the candidate motion vector predictor (MVP) of the current decoded image block that corresponds to the index in the candidate motion vector predictor list, the video decoder adaptively derives the target motion vector precision of the MVD information according to the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions, and scales (e.g., amplifies) the parsed MVD information according to the adaptively derived target motion vector precision, so as to restore the parsed scaled MVD information to the MVD information as it was before being scaled (e.g., reduced) at the video encoding end. This ensures the accuracy of the motion vector difference scaling, and because the scaled MVD information occupies fewer bits than, or the same number of bits as, the original MVD information, the codec performance is improved.

With reference to the fifth aspect, in some implementations of the fifth aspect, a first distance between the current decoded image block and a first neighboring block corresponding to a motion vector predictor in the first motion vector set is different from a second distance between the current decoded image block and a second neighboring block corresponding to a motion vector predictor in the second motion vector set, the first neighboring block and the second neighboring block being included in the spatial neighboring blocks and/or time domain neighboring blocks of the current image block.

It should be noted that the spatial neighboring blocks herein may include one or more spatial neighboring blocks that are adjacent to the current image block and located in the same image as the currently decoded image block.

With reference to the fifth aspect, in some implementations of the fifth aspect, if the first distance between the current decoded image block and the first neighboring block corresponding to the motion vector predictor in the first motion vector set is smaller than the second distance between the current decoded image block and the second neighboring block corresponding to the motion vector predictor in the second motion vector set, the first motion vector precision corresponding to the first motion vector set (e.g., 1/8 pixel precision) is higher than the second motion vector precision corresponding to the second motion vector set (e.g., 1/4 pixel precision); or,

In conjunction with the fifth aspect, in some implementations of the fifth aspect, the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is preset.

With reference to the fifth aspect, in some implementations of the fifth aspect, the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is determined according to a motion vector precision assignment rule, wherein the motion vector precision assignment rule characterizes the relationship between, on the one hand, the distance between the currently decoded image block and the neighboring block corresponding to a motion vector predictor included in a motion vector set and, on the other hand, the magnitude of the motion vector precision. The motion vector precision assignment rule can be understood as follows: the closer the neighboring block corresponding to the motion vector predictor is to the current image block, the higher the motion vector precision; the farther the neighboring block corresponding to the motion vector predictor is from the current image block, the lower the motion vector precision. Furthermore, the neighboring blocks herein may include, but are not limited to, spatial neighboring blocks and/or temporal neighboring blocks of the current image block.
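As a minimal sketch of the assignment rule just described, the following assumes that each motion vector set is summarized by a single representative distance and that each precision is expressed as a pixel step size (a smaller step means a higher precision). The function name `assign_precisions` and the set labels are hypothetical.

```python
from fractions import Fraction


def assign_precisions(set_distances, precisions):
    """Map each motion vector set to a precision: nearer sets get finer steps.

    set_distances: dict of set name -> representative distance between the
                   neighboring blocks of that set and the current block
                   (an assumed summary of each set).
    precisions:    iterable of pixel step sizes, smaller = higher precision.
    """
    ordered_sets = sorted(set_distances, key=set_distances.get)
    ordered_steps = sorted(precisions)
    if len(ordered_sets) != len(ordered_steps):
        raise ValueError("need exactly one precision per motion vector set")
    return dict(zip(ordered_sets, ordered_steps))
```

Because encoder and decoder can both evaluate such a rule from information they already share, the resulting correspondence does not itself need to be transmitted.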

With reference to the fifth aspect, in some implementations of the fifth aspect, the code stream further carries a motion vector precision parameter, where the motion vector precision parameter is used to indicate the values of the plurality of motion vector precisions, and the motion vector precision parameter is carried in any one of a sequence parameter set, an image parameter set (PPS), or a slice header, or at some other hierarchical position of the decoded image block. Preferably, the plurality of motion vector precisions herein may comprise three or more motion vector precisions.

In conjunction with the fifth aspect, in some implementations of the fifth aspect, the code stream further carries a motion vector precision parameter, where the motion vector precision parameter is used to indicate the number of motion vector precisions for the current decoding processing unit (for example, pps_amvr_number) and at least two motion vector precision values (for example, pps_amvr_set[pps_amvr_number]) corresponding to that number, wherein the current decoding processing unit includes one or more of a video sequence, an image, a slice, a region partition, a coding tree unit (CTU), and a coding unit (CU).

With reference to the fifth aspect, in some implementations of the fifth aspect, the code stream further carries a first identifier, where the first identifier is used to indicate a third motion vector precision corresponding to the third motion vector set; for example, the first identifier may be carried in any one of a sequence parameter set, an image parameter set, or a slice header of the decoded image block; or

A sixth aspect of the present application provides a video decoder, including: an entropy decoding module, configured to receive a code stream, where the code stream carries motion vector difference (MVD) information of a currently decoded image block and an index used to indicate a candidate motion vector predictor (MVP) of the currently decoded image block, and to parse the MVD information and the index from the code stream; an inter prediction module, configured to: determine, from a candidate motion vector predictor list and based on the index, the candidate motion vector predictor MVP (also referred to as the best candidate motion vector predictor MVP) of the currently decoded image block; when the candidate motion vector predictor MVP is included in the target motion vector set corresponding to the target motion vector precision (for example, when the candidate motion vector predictor is included in the target motion vector set, the target motion vector precision corresponding to the target motion vector set is determined according to the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions, so that the determined motion vector predictor has a corresponding motion vector precision), obtain a motion vector of the currently decoded image block based on the candidate motion vector predictor MVP, the motion vector difference information, and the target motion vector precision, where the motion vector of the currently decoded image block has the target motion vector precision, the target motion vector set is one of a plurality of motion vector sets, the target motion vector precision is one of a plurality of motion vector precisions including a first motion vector precision and a second motion vector precision, the plurality of motion vector sets include the first motion vector set and the second motion vector set, at least one of the first motion vector set and the second motion vector set includes two or more motion vector predictors, and the first motion vector precision corresponding to the first motion vector set is different from the second motion vector precision corresponding to the second motion vector set; and perform motion compensation using the motion vector of the currently decoded image block having the target motion vector precision, to obtain a prediction block of the currently decoded image block (also understood as predicted values of the pixel values of the currently decoded image block); and a reconstruction module, configured to reconstruct the currently decoded image block based on the prediction block.

With reference to the sixth aspect, in some implementations of the sixth aspect, in the aspect of obtaining the motion vector of the currently decoded image block based on the candidate motion vector predictor MVP, the motion vector difference information, and the target motion vector precision when the candidate motion vector predictor MVP is included in the target motion vector set corresponding to the target motion vector precision, the inter prediction module is specifically configured to:

calculate a sum of the candidate motion vector predictor MVP and the motion vector difference MVD information to obtain the motion vector of the currently decoded image block, where the motion vector of the currently decoded image block, the candidate motion vector predictor MVP, and the motion vector difference MVD have the same motion vector precision (that is, all have the target motion vector precision); or,

In conjunction with the sixth aspect, in some implementations of the sixth aspect, a first distance between a first neighboring block corresponding to a motion vector predictor in the first motion vector set and the currently decoded image block is different from a second distance between a second neighboring block corresponding to a motion vector predictor in the second motion vector set and the currently decoded image block, the first neighboring block and the second neighboring block being included in the spatial neighboring blocks and/or temporal neighboring blocks of the current image block.

With reference to the sixth aspect, in some implementations of the sixth aspect, if the first distance between the first neighboring block corresponding to a motion vector predictor in the first motion vector set and the currently decoded image block is smaller than the second distance between the second neighboring block corresponding to a motion vector predictor in the second motion vector set and the currently decoded image block, the first motion vector precision corresponding to the first motion vector set is higher than the second motion vector precision corresponding to the second motion vector set; or,

In conjunction with the sixth aspect, in some implementations of the sixth aspect, the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is preset.

With reference to the sixth aspect, in some implementations of the sixth aspect, the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is determined according to a motion vector precision assignment rule, wherein the motion vector precision assignment rule characterizes the relationship between, on the one hand, the distance between the currently decoded image block and the neighboring block corresponding to a motion vector predictor included in a motion vector set and, on the other hand, the magnitude of the motion vector precision. The motion vector precision assignment rule can be understood as follows: the closer the neighboring block corresponding to the motion vector predictor is to the current image block, the higher the motion vector precision; the farther the neighboring block corresponding to the motion vector predictor is from the current image block, the lower the motion vector precision. Furthermore, the neighboring blocks herein may include, but are not limited to, spatial neighboring blocks and/or temporal neighboring blocks of the current image block.

With reference to the sixth aspect, in some implementations of the sixth aspect, the code stream further carries a motion vector precision parameter, where the motion vector precision parameter is used to indicate the values of the plurality of motion vector precisions, and the motion vector precision parameter is carried in any one of a sequence parameter set, an image parameter set (PPS), or a slice header of the decoded image block;

In conjunction with the sixth aspect, in some implementations of the sixth aspect, the code stream further carries a motion vector precision parameter, where the motion vector precision parameter is used to indicate the number of motion vector precisions for the current decoding processing unit (for example, pps_amvr_number) and at least two motion vector precision values (for example, pps_amvr_set[pps_amvr_number]) corresponding to that number, wherein the current decoding processing unit includes one or more of a video sequence, an image, a slice, a region partition, a coding tree unit (CTU), and a coding unit (CU);

With reference to the sixth aspect, in some implementations of the sixth aspect, the code stream further carries a first identifier, where the first identifier is used to indicate a third motion vector precision corresponding to the third motion vector set; or

A seventh aspect of the present application provides a video decoding apparatus, the apparatus comprising: a processor and a memory coupled to the processor, where the processor is configured to perform the method in the first aspect, the fifth aspect, or the various implementations thereof.

An eighth aspect of the present application provides a video encoding apparatus, the apparatus comprising: a processor and a memory coupled to the processor, where the processor is configured to perform the method in the third aspect or the various implementations thereof.

A ninth aspect of the present application provides a video decoding apparatus including a nonvolatile storage medium and a processor, the nonvolatile storage medium storing an executable program, where the processor and the nonvolatile storage medium are coupled to each other, and the processor executes the executable program to implement the method in the first aspect, the fifth aspect, or the various implementations thereof.

A tenth aspect of the present application provides a video encoding apparatus including a nonvolatile storage medium and a processor, the nonvolatile storage medium storing an executable program, where the processor and the nonvolatile storage medium are coupled to each other, and the processor executes the executable program to implement the method in the third aspect or the various implementations thereof.

An eleventh aspect of the present application provides a computer readable storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the method in the first, third, or fifth aspect described above or in the various implementations thereof.

A twelfth aspect of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first or third or fifth aspects or the various implementations thereof.

A thirteenth aspect of the present application provides an electronic device comprising the video decoder described in the second aspect, the sixth aspect, or the various implementations thereof, or the video encoder described in the fourth aspect or the various implementations thereof.

It should be understood that the beneficial effects obtained by the various aspects and the corresponding implementable design manners are similar and will not be described again.

Description of the drawings

FIG. 1 is a schematic block diagram of a video encoding and decoding system in an embodiment of the present application;

FIG. 2 is a schematic block diagram of a video encoder in an embodiment of the present application;

FIG. 3 is a schematic block diagram of a video decoder in an embodiment of the present application;

FIG. 4A is a schematic diagram of an integer pixel position and a fractional pixel position in an embodiment of the present application;

FIG. 4B is a schematic diagram of another integer pixel position and a fractional pixel position in the embodiment of the present application;

FIG. 5 is a schematic flowchart of a video encoding method according to an embodiment of the present application;

FIG. 6A is a schematic diagram of constructing an AMVP candidate MVP list according to an embodiment of the present application;

FIG. 6B is another schematic diagram of constructing an AMVP candidate MVP list according to an embodiment of the present application;

FIG. 6C is a schematic diagram showing a syntax structure of an image parameter set PPS output by a video encoder according to an embodiment of the present application;

FIG. 6D is a schematic diagram of another syntax structure of an image parameter set PPS output by a video encoder according to an embodiment of the present application;

FIG. 6E is a schematic diagram of still another syntax structure of an image parameter set PPS output by a video encoder according to an embodiment of the present application;

FIG. 7 is a schematic flowchart of a video decoding method according to an embodiment of the present application;

FIG. 8 is a schematic diagram of a template matching method for decoder-side motion vector derivation (DMVD);

FIG. 9 is a schematic diagram of a bidirectional matching method for decoder-side motion vector derivation (DMVD);

FIG. 10 is a schematic flowchart of step 705 in a video decoding method according to an embodiment of the present application;

FIG. 11 is a schematic block diagram of another encoding device or decoding device according to an embodiment of the present application;

FIG. 12 is a schematic block diagram of an electronic device according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be clearly and completely described in the following with reference to the accompanying drawings in the embodiments.

An encoded video stream, or a portion thereof such as a video frame or an image block, may exploit temporal and spatial similarities in the video stream to improve encoding performance. For example, for a current image block of the video stream, motion information may be predicted based on a previously encoded block in the video stream, and the difference (also known as the residual) between the prediction block and the current image block (i.e., the original block) may be identified, so that the current image block is encoded based on the previously encoded block. In this way, only the residual and some parameters used to generate the current image block are included in the digital video output bitstream, rather than the entirety of the current image block. This technique can be called inter prediction.
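The relationship between the original block, the prediction block, and the residual can be sketched as follows. This is a simplified illustration that treats blocks as lists of pixel rows and ignores the transform and quantization that a real codec applies to the residual; the function names are hypothetical.

```python
def compute_residual(original, prediction):
    """Encoder side: residual = original block - prediction block (per pixel)."""
    return [
        [o - p for o, p in zip(orig_row, pred_row)]
        for orig_row, pred_row in zip(original, prediction)
    ]


def reconstruct(prediction, residual):
    """Decoder side: reconstructed block = prediction block + residual."""
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```

When the prediction is good, the residual values are small, which is why transmitting the residual costs fewer bits than transmitting the block itself.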

A motion vector is an important parameter in the inter prediction process; it represents the spatial displacement of a previously coded block relative to the current coded block. Motion estimation methods, such as motion search, can be used to obtain motion vectors. In early inter prediction techniques, bits representing motion vectors were included in the encoded code stream to allow the decoder to reproduce the prediction block and thereby obtain a reconstructed block. To further improve coding efficiency, it was later proposed to use a motion vector predictor (also referred to as a reference motion vector) to differentially encode the motion vector, that is, instead of coding the motion vector as a whole, only the difference between the motion vector and the motion vector predictor is coded. In some cases, the motion vector predictor may be selected from previously used motion vectors, and selecting a previously used motion vector to encode the current motion vector may further reduce the bit overhead included in the encoded video bitstream.
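Differential coding of a motion vector can be sketched as below. Motion vectors are represented as `(x, y)` integer tuples, and the predictor is chosen from a candidate list by the smallest sum of absolute differences; that cost measure and the function names are assumptions made for illustration, not the selection criterion of any particular codec.

```python
def pick_predictor(mv, candidates):
    """Return the index of the candidate MVP that gives the smallest MVD.

    The cost used here (sum of absolute component differences) is an
    illustrative assumption; real encoders typically use a rate estimate.
    """
    def cost(mvp):
        return abs(mv[0] - mvp[0]) + abs(mv[1] - mvp[1])
    return min(range(len(candidates)), key=lambda i: cost(candidates[i]))


def encode_mvd(mv, mvp):
    """Encoder side: only the difference MVD = MV - MVP is written."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(mvp, mvd):
    """Decoder side: MV = MVP + MVD."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

The decoder only needs the index of the chosen candidate and the MVD, provided that it builds the same candidate list as the encoder.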

On the premise that the encoded video code stream (referred to as the code stream or the bit stream) includes motion vector difference (MVD) information at the image block level, the embodiments of the present application describe a technical solution in which a video encoder adaptively selects the motion vector precision for encoding an image block and a video decoder determines the motion vector precision selected by the video encoder for that image block. According to some embodiments of the present application, the video decoder may derive the motion vector precision selected by the video encoder for the current image block without receiving a syntax element indicating the motion vector precision. In accordance with some embodiments of the present application, a video encoder may signal, in the code stream, the motion vector precision selected by the video encoder for a certain motion vector set (e.g., an important motion vector set). According to embodiments of the present application, an adaptive selection between integer pixel precision and different levels of fractional pixel precision may be employed. For example, in the embodiments of the present application, the motion vector used for encoding an image block may be adaptively selected to have integer pixel precision, four-pixel precision, half-pixel precision, quarter-pixel precision, or eighth-pixel precision. The term "eighth pixel" in the embodiments of the present application refers to a precision of one eighth (1/8) of a pixel, that is, a position at one of the following: an integer pixel position (0/8), one eighth of a pixel (1/8), two eighths of a pixel (2/8, also known as one quarter of a pixel), three eighths of a pixel (3/8), four eighths of a pixel (4/8, also known as one half or two quarters of a pixel), five eighths of a pixel (5/8), six eighths of a pixel (6/8, also known as three quarters of a pixel), or seven eighths of a pixel (7/8).
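The eighth-pixel positions listed above can be made concrete with a short sketch. It assumes a motion vector component is stored as an integer count of 1/8 pixel steps; the function names are hypothetical.

```python
from fractions import Fraction


def split_eighth_pel(component):
    """Split a component in 1/8 pixel units into (integer pixels, eighths).

    The shift and mask use floor semantics, so negative values keep a
    fractional part in the range 0..7 (for example, -3/8 = -1 + 5/8).
    """
    return component >> 3, component & 7


def fractional_offset(component):
    """Fractional part as a reduced fraction of a pixel (2/8 becomes 1/4)."""
    return Fraction(component & 7, 8)
```

For example, a component of 18 corresponds to 2 full pixels plus 2/8 of a pixel, which is the same position as 2 and a quarter pixels.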

FIG. 1 is a schematic block diagram of a video encoding and decoding system 10 in an embodiment of the present application. Video encoder 20 in system 10 is configured to adaptively select a target motion vector precision for encoding image blocks (e.g., integer pixel precision, 1/2 pixel precision, 1/4 pixel precision, 1/8 pixel precision, or 4 pixel precision) in accordance with the various method examples of the video encoding process proposed herein. Video decoder 30 in system 10 is configured to determine, according to the various method examples of the video decoding process proposed by the present application, the motion vector precision selected by the video encoder for encoding the image block, thereby improving motion vector prediction accuracy and, in turn, codec performance.

As shown in FIG. 1, system 10 includes source device 12 and destination device 14. Source device 12 produces encoded video data that will be decoded by destination device 14 at a later time. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook computers, tablet computers, set top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" touchpads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like.

Destination device 14 may receive encoded video data to be decoded via communication link 16. Communication link 16 may include any type of media or device capable of moving encoded video data from source device 12 to destination device 14. In one possible implementation, communication link 16 may include communication media that enables source device 12 to transmit encoded video data directly to destination device 14 in real time. The encoded video data can be modulated and transmitted to destination device 14 in accordance with a communication standard (e.g., a wireless communication protocol). Communication media can include any wireless or wired communication medium, such as a radio frequency spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network (e.g., a local area network, a wide area network, or a global network such as the Internet). Communication media can include routers, switches, base stations, or any other equipment that can be used to facilitate communication from source device 12 to destination device 14.

Alternatively, the encoded data may be output from the output interface 22 to a storage device (not shown). Similarly, encoded data can be accessed from a storage device by an input interface. The storage device can include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray Disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In another possible implementation, the storage device may correspond to a file server or another intermediate storage device that may maintain encoded video produced by source device 12. Destination device 14 may access the stored video data from the storage device via streaming or download. The file server can be any type of server capable of storing encoded video data and transmitting this encoded video data to destination device 14. Possible implementations of the file server include a web server, a file transfer protocol server, a network attached storage device, or a local disk unit. Destination device 14 can access the encoded video data via any standard data connection, including an Internet connection. This data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a cable modem), or a combination of both, suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from a storage device can be streaming, downloading, or a combination of both.

The techniques of this application are not necessarily limited to wireless applications or settings. The techniques may be applied to video decoding to support any of a variety of multimedia applications, such as over-the-air broadcast, cable television transmission, satellite television transmission, streaming video transmission (e.g., via the Internet), encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some possible implementations, system 10 is used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the possible implementation of FIG. 1, source device 12 includes video source 18, video encoder 20, and output interface 22. In accordance with an embodiment of the present application, video encoder 20 of source device 12 is configured to support techniques for adaptively selecting the motion vector resolution (also referred to as motion vector precision or pixel precision) used for encoding an image block. In some applications, output interface 22 may include a modulator/demodulator (modem) and/or a transmitter. In source device 12, video source 18 may include sources such as a video capture device (e.g., a camera), a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics data as source video, or a combination of these sources. As a possible implementation, if the video source 18 is a video camera, the source device 12 and the destination device 14 may form a so-called camera phone or video phone. The techniques described in this application are illustratively applicable to video decoding and are applicable to wireless and/or wired applications.

Captured, pre-captured, or computer generated video may be encoded by video encoder 20. The encoded video data can be transmitted directly to the destination device 14 via the output interface 22 of the source device 12. The encoded video data may also (or alternatively) be stored on a storage device (not shown) for later access by the destination device 14 or other device for decoding and/or playback.

Destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In some applications, input interface 28 can include a receiver and/or a modem. The input interface 28 of the destination device 14 receives the encoded video data via the communication link 16. The encoded video data communicated via communication link 16 or provided on the storage device may include various syntax elements generated by video encoder 20 for use by video decoder 30 in decoding the video data (see the detailed description of FIGS. 6C-6E below). These syntax elements can be included with encoded video data that is transmitted over a communication medium, stored on a storage medium, or stored on a file server.

Display device 32 may be integrated with destination device 14 or external to destination device 14. In some possible implementations, destination device 14 can include an integrated display device and is also configured to interface with an external display device. In other possible implementations, the destination device 14 can be a display device. In general, display device 32 displays decoded video data to a user and may include any of a variety of display devices, such as a liquid crystal display, a plasma display, an organic light emitting diode display, or another type of display device.

Video encoder 20 and video decoder 30 may operate in accordance with, for example, the next generation video codec compression standard (H.266) currently under development and may conform to the H.266 Test Model (JEM). Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.265 standard, also referred to as the high efficiency video coding (HEVC) standard, or the ITU-T H.264 standard, or extensions of these standards. The ITU-T H.264 standard is alternatively referred to as MPEG-4 Part 10, also known as advanced video coding (AVC). However, the techniques of this application are not limited to any particular coding standard. Other possible implementations of the video compression standard include MPEG-2 and ITU-T H.263.

Although not shown in FIG. 1, in some aspects video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder and may include a suitable multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle the encoding of both audio and video in a common data stream or in separate data streams. If applicable, in some possible implementations, the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).

Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the techniques are partially implemented in software, the apparatus may store the instructions of the software in a suitable non-transitory computer readable medium and execute the instructions in hardware using one or more processors to perform the techniques of the present application. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.

The present application may illustratively involve video encoder 20 "signaling" particular information to another device, such as video decoder 30. However, it should be understood that video encoder 20 may signal information by associating particular syntax elements with various encoded portions of the video data. That is, video encoder 20 may "signal" the data by storing the particular syntax elements in the header information of the various encoded portions of the video data. In some applications, these syntax elements may be encoded and stored (e.g., stored to a storage system or file server) prior to being received and decoded by video decoder 30. Thus, the term "signaling" may illustratively refer to the communication of syntax or other data used to decode compressed video data, whether this communication occurs in real time, in near real time, or over a span of time, such as may occur when a syntax element is stored to a medium at encoding time and then retrieved by the decoding device at any time after being stored to the medium.

JCT-VC developed the H.265 (HEVC) standard. HEVC standardization is based on an evolution model of a video decoding device called the HEVC Test Model (HM). The latest standard documentation for H.265 is available at http://www.itu.int/rec/T-REC-H.265. The latest version of the standard document is H.265 (12/16), the full text of which is incorporated herein by reference. The HM assumes that the video decoding device has several additional capabilities relative to existing algorithms of ITU-T H.264/AVC. For example, H.264 provides nine intra-prediction coding modes, while HM provides up to 35 intra-prediction coding modes.

In general, the working model of the HM describes that a video frame or image can be divided into a sequence of treeblocks or largest coding units (LCUs), also referred to as CTUs, containing both luminance and chrominance samples. Treeblocks have a similar purpose to macroblocks of the H.264 standard. A stripe contains several consecutive treeblocks in decoding order. A video frame or image can be segmented into one or more stripes. Each treeblock can be split into coding units according to a quadtree. For example, a treeblock that is the root node of a quadtree can be split into four child nodes, and each child node can in turn be a parent node and be split into four other child nodes. The final non-splittable child nodes, which are the leaf nodes of the quadtree, include decoding nodes, such as decoded image blocks. The syntax data associated with the decoded code stream may define the maximum number of times the treeblock can be split, and may also define the minimum size of the decoding node.
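The recursive quadtree splitting of a treeblock can be sketched as follows. The split decision is passed in as a callback because the text does not specify how an encoder decides or how a decoder parses the split flags; `should_split`, `min_size`, and the function name are illustrative assumptions.

```python
def split_quadtree(x, y, size, min_size, should_split):
    """Return the leaf blocks of a quadtree as (x, y, size) tuples.

    A node is split into four equal child nodes while it is larger than
    min_size and the should_split(x, y, size) callback asks for a split.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):          # top row of children, then bottom row
            for dx in (0, half):      # left child, then right child
                leaves.extend(
                    split_quadtree(x + dx, y + dy, half, min_size, should_split)
                )
        return leaves
    return [(x, y, size)]             # non-splittable leaf node
```

For a 64 x 64 treeblock in which only the root and its top-left child are split, the result is four 16 x 16 leaves and three 32 x 32 leaves, which together cover the whole treeblock.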

A coding unit (CU) includes a decoding node and a prediction unit (PU) and a transform unit (TU) associated with the decoding node. The size of the CU corresponds to the size of the decoding node and the shape must be square. The size of the CU may range from 8 x 8 pixels up to a maximum of 64 x 64 pixels or larger. Each CU may contain one or more PUs and one or more TUs. For example, syntax data associated with a CU may describe a situation in which a CU is partitioned into one or more PUs. The split mode may be different between situations where the CU is skipped or encoded by direct mode coding, intra prediction mode coding, or inter prediction mode. The PU can be divided into a shape that is non-square. For example, syntax data associated with a CU may also describe a situation in which a CU is partitioned into one or more TUs according to a quadtree. The shape of the TU can be square or non-square.

The HEVC standard allows for transforms based on TUs, which can be different for different CUs. The TU is typically sized based on the size of the PUs within a given CU defined for the partitioned LCU, although this may not always be the case. The size of the TU is usually the same as or smaller than that of the PU. In some possible implementations, the residual samples corresponding to the CU may be subdivided into smaller units using a quadtree structure called a "residual quadtree" (RQT). The leaf nodes of the RQT can be referred to as TUs. The pixel difference values associated with a TU may be transformed to produce transform coefficients, which may be quantized.

For example, when the PU is intra-mode encoded, the PU may include data related to the intra prediction mode of the PU. When the PU is inter-mode encoded, the PU may include data for determining a motion vector of the PU. For example, the data used to determine the motion vector of the PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (e.g., integer pixel precision, half pixel precision, quarter pixel precision, or eighth pixel precision), the reference image pointed to by the motion vector, and/or the reference image list of the motion vector (e.g., list 0 or list 1).

In accordance with embodiments of the present application, video encoder 20 may adaptively select between motion vectors having integer pixel precision and motion vectors having fractional (e.g., quarter or eighth) pixel precision. According to some embodiments of the present application, video encoder 20 may not need to generate an indication of the pixel precision of the motion vector of an image block for inclusion in the code stream of the encoded video data. Rather, video decoder 30 may derive the motion vector precision using the same or a similar method to that used by video encoder 20. In accordance with further embodiments of the present application, video encoder 20 may include in the code stream one or more syntax elements that video decoder 30 uses to determine the selected motion vector precision. It should be understood that, to calculate the values of fractional pixel positions, video encoder 20 may include a variety of interpolation filters. For a motion vector with fractional pixel precision, an interpolation filter is used to interpolate the prediction block of the current image block indicated by the motion vector. For example, bilinear interpolation can be used to calculate the value of a fractional pixel position. Video encoder 20 is operative to perform a motion search to obtain a motion vector.
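The bilinear interpolation mentioned above as one example of an interpolation filter can be sketched as follows. This is a minimal illustration and not the filter of any particular standard. The function name bilinear_sample, the frac_bits parameter, and the convention that positions are given in units of 1/2**frac_bits pixel are assumptions introduced here. The position is assumed to be non-negative and to lie inside the reference image.

```python
def bilinear_sample(ref, x_q, y_q, frac_bits=2):
    """Sample ref (a list of rows) at (x_q, y_q), given in 1/2**frac_bits pixel units."""
    scale = 1 << frac_bits
    # Split each coordinate into an integer pixel position and a fractional offset.
    x0, fx = x_q >> frac_bits, x_q & (scale - 1)
    y0, fy = y_q >> frac_bits, y_q & (scale - 1)
    # Clamp the right/bottom neighbour so the last row and column stay in bounds.
    x1 = min(x0 + 1, len(ref[0]) - 1)
    y1 = min(y0 + 1, len(ref) - 1)
    # Interpolate horizontally on the two rows, then vertically between them.
    top = ref[y0][x0] * (scale - fx) + ref[y0][x1] * fx
    bottom = ref[y1][x0] * (scale - fx) + ref[y1][x1] * fx
    total = top * (scale - fy) + bottom * fy
    # Normalise by scale**2 with rounding to the nearest integer.
    return (total + (scale * scale) // 2) // (scale * scale)


ref = [[0, 40], [80, 120]]
half_pel = bilinear_sample(ref, 2, 0)     # halfway between samples 0 and 40
centre = bilinear_sample(ref, 2, 2)       # centre of the four samples
```

With frac_bits=2 the positions are in quarter-pixel units, so x_q=2 addresses the half-pixel position between the first two samples of a row.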

In general, TUs are used for the transform and quantization processes. A given CU with one or more PUs may also include one or more transform units (TUs). After prediction, video encoder 20 may calculate residual values corresponding to the PU. The residual values include pixel difference values, which can be transformed into transform coefficients, quantized, and scanned using the TUs to produce serialized transform coefficients for entropy coding.

After intra-predictive or inter-predictive coding using the PUs of a CU, video encoder 20 may calculate residual data for the TUs of the CU. A PU may include pixel data in the spatial domain (also referred to as the pixel domain), and a TU may include coefficients in the transform domain after a transform (e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data. The residual data may correspond to pixel differences between pixels of the unencoded image and prediction values corresponding to the PU. Video encoder 20 may form TUs that include the residual data for the CU, and then transform the TUs to generate transform coefficients for the CU.

After any transform to generate transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization illustratively refers to the process of quantizing the coefficients to possibly reduce the amount of data used to represent the coefficients to provide further compression. The quantization process can reduce the bit depth associated with some or all of the coefficients. For example, the n-bit value can be rounded down to an m-bit value during quantization, where n is greater than m.
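As a simple illustration of the bit-depth reduction described above, the following sketch rounds an n-bit value down to an m-bit value by discarding the low-order bits, and also shows a uniform scalar quantizer with a step size. Both function names and the step parameter are hypothetical; an actual encoder derives the step size from the quantization parameter and may round differently.

```python
def round_down_to_m_bits(value, n, m):
    """Reduce an n-bit non-negative value to m bits by dropping the low (n - m) bits."""
    if not 0 < m < n:
        raise ValueError("requires 0 < m < n")
    if not 0 <= value < (1 << n):
        raise ValueError("value does not fit in n bits")
    return value >> (n - m)


def quantize(coeff, step):
    """Uniform scalar quantization that truncates toward zero (illustrative only)."""
    sign = -1 if coeff < 0 else 1
    return sign * (abs(coeff) // step)


def dequantize(level, step):
    """Inverse of quantize up to the quantization error."""
    return level * step


level = quantize(-37, 8)               # -4
restored = dequantize(level, 8)        # -32, so the quantization error is 5
```

The example makes the lossy nature of the step visible: the dequantized value differs from the original coefficient by less than one step size.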

In some possible implementations, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In other possible implementations, video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector based on context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or other entropy coding methods. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.

The video decoder 30 of the destination device 14 is configured to perform a technical solution similar or corresponding to any or all of the technical solutions of the video encoder 20 of the embodiment of the present application. For example, using the same derivation technique, video decoder 30 may determine what motion vector precision to use to decode the video data without receiving a syntax element indicating motion vector accuracy.

JVET is committed to the development of the H.266 standard. The H.266 standardization process is based on an evolving model of a video decoding device called the H.266 test model. The algorithm description of H.266 is available from http://phenix.int-evry.fr/jvet, and the latest algorithm description is included in JVET-F1001-v2, which is incorporated herein by reference in its entirety. The reference software for the JEM test model is available from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is also incorporated herein by reference in its entirety.

The JEM model further improves the coding structure of video images. Specifically, a block coding structure called "Quad Tree Combined Binary Tree" (QTBT) is introduced. The QTBT structure discards the separate concepts of CU, PU, and TU used in HEVC and supports more flexible CU partition shapes. A CU can be square or rectangular. A CTU is first partitioned by a quadtree, and the leaf nodes of the quadtree are further partitioned by a binary tree. The binary tree has two partitioning modes: symmetric horizontal partitioning and symmetric vertical partitioning. The leaf nodes of the binary tree are called CUs, and a CU in the JEM cannot be further divided during prediction and transform; that is, the CU, PU, and TU in the JEM have the same block size. In the current JEM, the maximum size of the CTU is 256 × 256 luma pixels.

The present application may refer to a PU that the video decoder is currently decoding as a current decoded PU, also referred to as a current decoded image block (or a current decoding unit). The present application may refer to a CU that the video decoder is currently decoding as a current decoded CU, also referred to as a current decoded image block (or a current decoding unit). The present application may refer to a PU that the video encoder is currently encoding as a current encoded PU, also referred to as a current encoded image block (or a current coding unit). The present application may refer to a CU that the video encoder is currently encoding as a current encoded CU, also referred to as a current encoded image block (or a current coding unit). The present application may refer to the image currently being decoded by the video decoder as the current image. It should be understood that the present application is applicable to the case where the PU and the CU have the same size, or where the PU is the CU, and the term "image block" is used uniformly for both. In some specific applications, the term "image block" may also be used herein to refer to a treeblock containing a decoding node as well as PUs and TUs, e.g., an LCU or a CU.

Various method examples of a video encoding process or a video decoding process are described in detail below in the embodiments of the present application. In these examples, the motion vector precision used to encode an image block is selected adaptively, and the video decoder determines the motion vector precision that the video encoder selected for the image block, so as to improve the motion vector prediction accuracy of the image block and thereby the codec performance.

FIG. 2 is a schematic block diagram of video encoder 20 in an embodiment of the present application. Referring also to FIG. 5, video encoder 20 may perform a process of adaptively determining the motion vector resolution (also referred to as motion vector accuracy or pixel precision) used to encode a current image block. In particular, prediction module 40 in video encoder 20, for example inter prediction module 41, may perform the process of adaptively determining the motion vector resolution for encoding the current image block.

For an image block in a video frame, video encoder 20 may perform intra and inter prediction of the image block (e.g., an LCU, a CU, or a PU). The intra prediction process relies on spatial prediction to reduce or remove spatial redundancy of video data within a video frame or image. The inter prediction process relies on temporal prediction to reduce or remove temporal redundancy of video data within adjacent frames or adjacent images of a video sequence. An intra mode (I mode) may refer to any of several spatial-based compression modes; an inter mode, such as unidirectional prediction (P mode) or bidirectional prediction (B mode), may refer to any of several temporal-based compression modes.

As shown in FIG. 2, video encoder 20 receives a current image block within a video frame to be encoded. In the example of FIG. 2, video encoder 20 may include prediction module 40, reference image memory 64, summer 50, transform module 52, quantization module 54, and entropy encoding module 56. Prediction module 40 can include inter prediction module 41 and intra prediction module 46. Inter prediction module 41 may perform an inter prediction process, while intra prediction module 46 may perform an intra prediction process. As an example, inter prediction module 41 may include motion estimation module 42 and motion compensation module 44. To reconstruct the image block, video encoder 20 may also include inverse quantization module 58, inverse transform module 60, and summer 62 (also referred to as reconstruction module 62). In one implementation, video encoder 20 may also include a deblocking filter (not shown in FIG. 2) that filters block boundaries to remove blockiness artifacts from the reconstructed image. The deblocking filter typically filters the output of summer 62 as needed. In addition to the deblocking filter, an additional loop filter (in-loop or post-loop) can also be used. In another implementation, video encoder 20 may also include a video data memory (not shown in FIG. 2) that stores video data to be encoded by video encoder 20. The video data stored in the video data memory can be obtained, for example, from video source 18. Reference image memory 64 (also referred to as decoded image buffer 64) stores reference image data that video encoder 20 uses when encoding video data, for example in an intra or inter coding mode. 
The video data memory and reference image memory 64 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. The video data memory and reference image memory 64 may be provided by the same memory device or by separate memory devices. In various implementations, the video data memory can be deployed on the same chip as the other modules of video encoder 20, or off-chip relative to those modules.

As shown in FIG. 2, during encoding, video encoder 20 receives a video frame or slice to be encoded. The frame or slice may be divided into a plurality of image blocks (for example, an LCU, a CU, or a PU), for example, the division of the image block may be performed according to a quadtree structure of the LCU and the CU.

Prediction module 40 may select one of a plurality of possible encoding modes for the current image block, such as one of a plurality of intra encoding modes or one of a plurality of inter encoding modes, based on encoding quality and cost calculation results (e.g., rate-distortion cost, RD cost). Prediction module 40 may provide the resulting intra-coded block or inter-coded block to summer 50 to generate residual block data, and to summer 62 to reconstruct the encoded image block for use as part of a reference image.

Motion estimation module 42 and motion compensation module 44 within inter prediction module 41 may perform inter prediction encoding of the received current image block relative to one or more prediction blocks in one or more reference images to provide temporal compression. Motion estimation module 42 is operative to determine an inter prediction mode for a video slice based on a predetermined pattern of the video sequence. The predetermined pattern may designate the video slices in the video sequence as P slices, B slices, or GPB slices. Intra prediction module 46 may perform intra prediction encoding of the received image block relative to one or more neighboring blocks in the same frame or slice as the image block to be encoded to provide spatial compression.

Motion estimation module 42 and motion compensation module 44 may be integrated together, but are illustrated separately for conceptual purposes. Motion estimation module 42 is operative to perform a motion estimation process to obtain a motion vector. For example, the motion vector indicates the displacement of the image block currently being coded (referred to as the current coding block) within the current video frame or image relative to a prediction block within a reference image. The prediction block is a block found to closely match the current coding block in terms of the difference between the two blocks, where the difference may be regarded as the accumulation of the pixel difference values at each pair of corresponding positions in the two blocks. The difference is generally calculated according to the SAD (sum of absolute differences) criterion, or according to other criteria such as SATD (sum of absolute transformed differences), MR-SAD (mean-removed sum of absolute differences), or SSD (sum of squared differences). In some possible implementations, video encoder 20 may calculate values of sub-integer pixel positions of a reference image stored in reference image memory 64. For example, video encoder 20 may interpolate values of quarter pixel positions, eighth pixel positions, or other fractional pixel positions of a reference image. Motion estimation module 42 can be configured to perform a motion search with respect to integer pixel positions and fractional pixel positions, and to output motion vectors with integer pixel precision or motion vectors with fractional pixel precision. 
In other words, inter prediction module 41 (e.g., motion estimation module 42) may perform Integer Motion Estimation (IME) on the current image block (e.g., the current CU or the current PU; in some cases the current CU is the current PU) and then perform Fractional Motion Estimation (FME). When inter prediction module 41 (e.g., motion estimation module 42) performs IME on the current image block, it may search one or more reference images for a reference block for the current image block. After finding the reference block for the current image block, inter prediction module 41 (e.g., motion estimation module 42) may generate a motion vector that indicates the displacement between the reference block and the current image block and that has integer pixel precision. When inter prediction module 41 (e.g., motion estimation module 42) performs FME on the current image block, it may refine the motion vector generated by performing IME on the current image block. A motion vector generated by performing FME on the current image block may have fractional pixel precision (e.g., 1/2 pixel precision, 1/4 pixel precision, etc.). After generating the motion vector of the current image block, inter prediction module 41 (e.g., motion compensation module 44) may use the motion vector of the current image block to generate a prediction block for the current image block.
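The SAD criterion and the integer-pixel motion search described above can be sketched as follows. This is a minimal exhaustive search given for illustration only; the function names, the search_range parameter, and the full-search strategy are assumptions introduced here, and practical encoders typically use faster search patterns and add a rate term to the cost.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def get_block(image, x, y, w, h):
    """Extract the w x h block whose top-left sample is at (x, y)."""
    return [row[x:x + w] for row in image[y:y + h]]


def integer_motion_search(cur_block, ref, x, y, search_range):
    """Return ((dx, dy), cost) of the best integer-pixel match around (x, y)."""
    h, w = len(cur_block), len(cur_block[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates that fall outside the reference image.
            if rx < 0 or ry < 0 or rx + w > len(ref[0]) or ry + h > len(ref):
                continue
            cost = sad(cur_block, get_block(ref, rx, ry, w, h))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Reference image with a unique value at every position.
ref = [[r * 6 + c for c in range(6)] for r in range(6)]
# The current block sits at (2, 2) but its content comes from (3, 2) in ref.
cur = get_block(ref, 3, 2, 2, 2)
mv, cost = integer_motion_search(cur, ref, 2, 2, 2)
```

A fractional refinement step would then evaluate interpolated positions around the integer result in the same way, using interpolated reference samples in place of get_block.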

Motion estimation module 42 is configured to calculate a motion vector for the current coded image block in an inter-coded frame by comparing the current coded image block with image blocks of a reference image in reference image memory 64. When the reference image in reference image memory 64 contains values for fractional pixel positions, the motion vector calculated by motion estimation module 42 may point to a fractional pixel position of the reference image. If no values for fractional pixel positions are stored in reference image memory 64, motion estimation module 42 or motion compensation module 44 is also used to calculate the values of the fractional pixel positions of the reference image stored in reference image memory 64, for example by interpolating fractional pixels of the reference image (such as an I frame or a P frame). As an example, a reference image may be selected from a first reference image list (List 0) or a second reference image list (List 1), where each entry in a list (e.g., a reference image index) identifies one or more reference images stored in reference image memory 64. Motion estimation module 42 is operative to transmit or provide the motion vector to motion compensation module 44. In the embodiment of the present application, inter prediction module 41 (e.g., motion estimation module 42) is further configured to send motion vector difference (MVD) information of the current coded image block to entropy encoding module 56, where the motion vector difference information represents the difference between the motion vector generated for the current image block using the IME and/or FME and a motion vector predictor (e.g., a candidate motion vector predictor selected from a candidate motion vector predictor list, such as the best candidate motion vector predictor). 
In some possible implementations, inter prediction module 41 (e.g., motion estimation module 42) may output to entropy encoding module 56 a candidate motion vector predictor index that indicates the position of the candidate motion vector predictor in the candidate motion vector predictor list. This is described in detail below with reference to FIG. 5 and is not repeated here.

Inter prediction module 41 (e.g., motion compensation module 44) is operative to perform a motion compensation process, which includes fetching or generating a prediction block based on the motion vector determined by motion estimation, possibly including interpolation to fractional pixel precision. After receiving the motion vector of the current coded image block, motion compensation module 44 may locate the prediction block to which the motion vector points in one of the reference image lists. Video encoder 20 forms a residual pixel block by subtracting the pixel values of the prediction block from the pixel values of the current coded image block, thereby forming pixel difference values. The pixel difference values form the residual data for the block and may include both luminance and chrominance difference components. Summer 50 represents the one or more components that perform this subtraction. Inter prediction module 41 (e.g., motion estimation module 42 and/or motion compensation module 44) may generate syntax elements for coding processing units at different levels (e.g., sequence level, image level, slice level, region partition level, CTU level, or CU level), such as syntax elements associated with the current image block and the current image for use by video decoder 30 in decoding the current image block of the current image, or syntax elements associated with the current image block and the video slice for use by video decoder 30 in decoding the current image block of the video slice. 
As an example, motion estimation module 42 of video encoder 20 may be operable to provide or transmit a motion vector precision parameter to entropy encoding module 56, so that entropy encoding module 56 entropy encodes the motion vector precision parameter into the code stream. The motion vector precision parameter indicates the values of a plurality of motion vector precisions, and may be carried in any one of a sequence parameter set (SPS), an image parameter set (PPS), or a slice header that applies to the current image block. In a specific example, the motion vector precision parameter may include the number of motion vector precisions for the current encoding processing unit and the values of at least two motion vector precisions corresponding to that number; for example, the number of motion vector precisions is 4, and the corresponding four values are integer pixel precision, half pixel precision, quarter pixel precision, and eighth pixel precision. As another example, motion estimation module 42 of video encoder 20 is also operative to provide or transmit to entropy encoding module 56 indication information that indicates the motion vector precision of a certain important motion vector set. The video encoding process of the embodiment of the present application is described in detail below with reference to FIG. 5 and is not repeated here.

After prediction module 40 generates a prediction block for the original image block currently being encoded via inter prediction or intra prediction, video encoder 20 forms a residual image block by subtracting the prediction block from the original image block currently being encoded. Summer 50 represents the one or more components that perform this subtraction. Transform module 52 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform (e.g., a discrete sine transform, DST), to convert the residual data in the residual image block into residual transform coefficients. Transform module 52 may convert the residual data from the pixel domain to a transform domain (e.g., a frequency domain).
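The subtraction attributed to summer 50 above, together with the corresponding addition performed on the reconstruction path, can be illustrated with the following sketch. The function names are hypothetical, and the clipping to an 8-bit sample range in reconstruct_block is an assumption made for this example. Transform and quantization are deliberately omitted, so the residual here is lossless.

```python
def residual_block(original, prediction):
    """Element-wise difference original - prediction (the role of summer 50)."""
    return [
        [o - p for o, p in zip(orow, prow)]
        for orow, prow in zip(original, prediction)
    ]


def reconstruct_block(prediction, residual, bit_depth=8):
    """Add a residual back onto the prediction and clip to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
        for prow, rrow in zip(prediction, residual)
    ]


original = [[120, 130], [125, 128]]
prediction = [[118, 133], [125, 120]]
residual = residual_block(original, prediction)        # [[2, -3], [0, 8]]
reconstructed = reconstruct_block(prediction, residual)
```

Because no quantization is applied in this sketch, the reconstructed block equals the original block; in a lossy encoder the two would differ by the quantization error.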

Transform module 52 may provide or send the resulting residual transform coefficients to quantization module 54. Quantization module 54 is operative to quantize the residual transform coefficients to further reduce the code rate. The quantization process may reduce the bit depth associated with some or all of the residual transform coefficients. The degree of quantization can be modified by adjusting the quantization parameters. In some possible implementations, quantization module 54 may then perform a scan of the matrix containing the quantized transform coefficients. Alternatively, entropy encoding module 56 may perform a scan.

After quantization, entropy encoding module 56 can be used to entropy encode the quantized residual transform coefficients. For example, entropy encoding module 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or other entropy coding techniques. Entropy encoding module 56 may also be used to entropy encode syntax elements of the video slice or image block currently being encoded. After entropy encoding by entropy encoding module 56, the encoded code stream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30.

In the embodiment of the present application, entropy encoding module 56 is configured to entropy encode the motion vector difference information of the current coded image block into the code stream. Optionally, entropy encoding module 56 is further configured to encode indication information of the motion vector precision of an important motion vector set into the code stream, to indicate the specific pixel precision of the motion vector predictors in the important motion vector set, for example integer pixel precision or a fractional pixel precision such as quarter-pixel precision, eighth-pixel precision, or another fractional pixel precision.

In the embodiment of the present application, video encoder 20 signals the motion vector predictor by, for example, implementing an Advanced Motion Vector Prediction (AMVP) mode or a merge mode. In AMVP mode, video encoder 20 (e.g., inter prediction module 41) builds a list of candidate motion vector predictors based on motion vectors determined from previously encoded image blocks. This is described in detail below with reference to FIG. 6A and FIG. 6B and is not repeated here. Video encoder 20 signals an index into the candidate motion vector predictor list to identify the corresponding candidate motion vector predictor (MVP), and also signals motion vector difference (MVD) information. In an example implementation, entropy encoding module 56 is configured to entropy encode into the code stream an index (i.e., a third identifier) indicating the motion vector predictor (MVP) of the current coded image block, together with the motion vector difference information of the current coded image block. Accordingly, in AMVP mode, video decoder 30 (e.g., inter prediction module 82) builds a list of candidate motion vector predictors based on motion vectors determined from previously decoded image blocks, and determines a candidate motion vector predictor from that list based on the index (i.e., the third identifier). Video decoder 30 then performs inter prediction using the motion vector to obtain a prediction block of the currently decoded image block, where the motion vector is obtained based on the candidate motion vector predictor (MVP) corresponding to the index, the motion vector difference (MVD) information, and the adaptively determined motion vector precision.
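The index-plus-difference signalling of AMVP mode described above can be sketched as follows. This is a simplified illustration that ignores the adaptively determined motion vector precision and assumes that the encoder and the decoder have already built identical candidate lists. The function names and the rule of choosing the candidate that minimises the magnitude of the MVD are assumptions introduced here; an actual encoder would typically choose by rate-distortion cost.

```python
def encode_amvp(mv, candidates):
    """Pick a predictor and return (index, mvd) to be signalled in the code stream."""
    # Hypothetical selection rule: smallest |mvd_x| + |mvd_y|.
    idx = min(
        range(len(candidates)),
        key=lambda i: abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1]),
    )
    mvp = candidates[idx]
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_amvp(idx, mvd, candidates):
    """Rebuild the motion vector from the signalled index and MVD."""
    mvp = candidates[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


# Both sides derive the same list from previously coded blocks.
candidates = [(4, -2), (10, 3)]
idx, mvd = encode_amvp((9, 4), candidates)      # idx 1, mvd (-1, 1)
decoded_mv = decode_amvp(idx, mvd, candidates)  # (9, 4)
```

Only the index and the small difference vector need to be transmitted, which is why a good predictor reduces the bits spent on motion information.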

In the merge mode, video encoder 20 (e.g., inter prediction module 41) builds a candidate list based on previously encoded image blocks, and video decoder 30 builds a candidate list based on previously decoded image blocks. Video encoder 20 signals the index of one of the candidates in the candidate list. In the merge mode, video decoder 30 performs inter prediction using the candidate motion vector corresponding to the index and the reference image index of that candidate motion vector to obtain a prediction block of the image block currently being decoded. In both the AMVP mode and the merge mode, video encoder 20 and video decoder 30 use the same list building technique, so that the list used by video encoder 20 in determining how to encode a block matches the list used by video decoder 30 in determining how to decode the block.

Inverse quantization module 58 and inverse transform module 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference image. Summer 62 adds the reconstructed residual block to the motion-compensated prediction block generated by motion compensation module 44 to produce a reconstructed image block, which is stored in reference image memory 64 as a reference block. The reconstructed image block may be used by motion estimation module 42 and motion compensation module 44 as a reference block to inter-predict blocks in subsequent video frames or images.

It should be understood that other structural variations of video encoder 20 may be used to encode the video stream. For example, for certain image blocks or image frames, video encoder 20 may quantize the residual signal directly without processing by transform module 52 and, accordingly, without processing by inverse transform module 60. Alternatively, for some image blocks or image frames, video encoder 20 does not generate residual data and, accordingly, no processing by transform module 52, quantization module 54, inverse quantization module 58, and inverse transform module 60 is needed. Alternatively, video encoder 20 may store the reconstructed image block directly as a reference block without processing by a filter unit. Alternatively, quantization module 54 and inverse quantization module 58 in video encoder 20 may be merged together; transform module 52 and inverse transform module 60 in video encoder 20 may be merged together; or summer 50 and summer 62 may be merged together. Alternatively, reference image memory 64 may be disposed outside of video encoder 20. Alternatively, video encoder 20 may have a non-hybrid encoder structure in which the image block does not need to be reconstructed, in which case inverse quantization module 58 and inverse transform module 60 may be omitted. Alternatively, in some scenarios, only the inter prediction module 41 corresponding to the inter coding mode is triggered to operate. It should be understood that, for portions not detailed in the above embodiments, reference may be made to the related description of other embodiments (for example, the embodiment shown in FIG. 5).

FIG. 3 is a schematic block diagram of video decoder 30 in an embodiment of the present application. Referring also to FIG. 7, video decoder 30 may perform a process of adaptively determining the motion vector resolution (also referred to as motion vector accuracy or pixel precision) used to decode a current image block. In particular, inter prediction module 82 in video decoder 30 may perform the process of adaptively determining the motion vector resolution for decoding the current image block.

As shown in FIG. 3, video decoder 30 may include an entropy decoding module 80, a prediction module 81, an inverse quantization module 86, an inverse transform module 88, and a reconstruction module 90 (e.g., summer 90). In an example, the prediction module 81 may include an inter prediction module 82 and an intra prediction module 84, which are not limited in this embodiment of the present application.

In one possible implementation, video decoder 30 may also include reference image memory 92. It should be understood that the reference image memory 92 can also be disposed outside of the video decoder 30. In some possible implementations, video decoder 30 may perform an exemplary reciprocal decoding process with respect to the encoding flow described by video encoder 20 from FIG.

During the decoding process, video decoder 30 receives from video encoder 20 an encoded video code stream (referred to as a code stream) that represents the image blocks of an encoded video slice and the associated syntax elements. Video decoder 30 may receive syntax elements at a video sequence level, a video slice level, an image level, and/or an image block level. Entropy decoding module 80 of video decoder 30 entropy decodes the code stream. Entropy decoding module 80 provides or transmits part of the entropy decoded data (e.g., some syntax elements) to prediction module 81. In an example of the embodiment of the present application, the entropy decoded data may include motion vector difference (MVD) information of the currently decoded image block. It should be noted that the MVD information may be the motion vector difference value of the currently decoded image block, or may be a scaled motion vector difference value, for example one obtained by scaling the MVD based on a certain motion vector precision (for example, an index value of a certain motion vector precision); this is described in detail below with reference to FIG. 7 and is not repeated here. Optionally, the entropy decoded data may further include a motion vector precision parameter, which indicates the values of a plurality of motion vector precisions and which may be carried in any one of a sequence parameter set (SPS), an image parameter set (PPS), or a slice header that applies to the image block. In a specific example, the motion vector precision parameter may include the number of motion vector precisions for the current decoding processing unit and the values of at least two motion vector precisions corresponding to that number, such as integer pixel precision, half pixel precision, quarter pixel precision, and/or eighth pixel precision. 
Optionally, the entropy decoded data may further include indication information of the motion vector precision of a certain important motion vector set. Optionally, the entropy decoded data may further include an index (i.e., a third identifier) indicating a candidate motion vector predictor (MVP) of the currently decoded image block.

The inter prediction module 82 in the prediction module 81 may determine motion information of the currently decoded image block based on the entropy decoded data. The motion information includes a motion vector (MV) and a motion vector precision, and optionally may further include reference image indication information; for example, when the encoding end and the decoding end have jointly agreed on the reference image, the motion information need not include the reference image indication information. The reference image indication information indicates which reconstructed image or images are used as the reference image, and the motion vector represents the positional offset of the reference block relative to the current block in the reference image used, generally comprising a horizontal component offset and a vertical component offset. For example, if (x, y) is used to represent the MV, x is the positional offset in the horizontal direction and y is the positional offset in the vertical direction. Adding the MV offset to the position of the current block gives the position of its reference block in the reference image. The reference image indication information may include a reference image list number and a reference image index corresponding to the reference image list. The reference image index identifies the reference image pointed to by the motion vector in the specified reference image list (RefPicList0 or RefPicList1).
In the embodiment of the present application, the inter prediction module 82 is configured to acquire a motion vector predictor (MVP) of the currently decoded image block and, when the motion vector predictor is included in the motion vector set (referred to herein as the target motion vector set) corresponding to a certain motion vector precision (referred to herein as the target motion vector precision), to obtain the motion vector of the currently decoded image block based on the motion vector predictor, the entropy decoded motion vector difference information, and the target motion vector precision. For example, when the motion vector predictor is included in the target motion vector set, the target motion vector precision corresponding to the target motion vector set is determined, according to the correspondence between the multiple motion vector sets and the multiple motion vector precisions, to be the motion vector precision of the motion vector predictor, and the motion vector of the currently decoded image block is obtained based on the motion vector predictor, the entropy decoded motion vector difference information, and the target motion vector precision. The details are described below in conjunction with FIG. 7 and are not repeated here.
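
The set lookup described above can be sketched as follows. This is only an illustrative sketch, not the decoder's actual procedure (which is given with FIG. 7): the set contents and precisions are borrowed from Table 3 later in this description, while the quarter-pixel storage unit, the left shift used to undo a scaled MVD, and all function names are assumptions made for illustration.

```python
from fractions import Fraction

# Set index -> (predictor positions, precision in pixels), values as in Table 3.
SETS = {
    0: ({"A1", "B1"}, Fraction(1, 4)),
    1: ({"A0", "B0"}, Fraction(1, 2)),
    2: ({"B2"}, Fraction(1)),
    3: ({"T"}, Fraction(2)),
}
OTHER_PRECISION = Fraction(4)  # Table 3 row "MVP obtained in other ways"
BASE_UNIT = Fraction(1, 4)     # assumption: vectors are stored in 1/4-pixel units


def target_precision(position):
    """Return the precision of the set that contains this predictor position."""
    for positions, precision in SETS.values():
        if position in positions:
            return precision
    return OTHER_PRECISION


def shift_for(precision):
    """Number of bits separating the target precision from the storage unit."""
    return int(precision / BASE_UNIT).bit_length() - 1


def reconstruct_mv(mvp, scaled_mvd, position):
    """MV = MVP + (scaled MVD << shift); the left shift is an assumed inverse."""
    shift = shift_for(target_precision(position))
    return tuple(p + (d << shift) for p, d in zip(mvp, scaled_mvd))
```

With these assumptions, a predictor taken from position B2 (integer-pixel precision) and a scaled MVD of (4, -1) restore an MVD of (16, -4) in quarter-pixel units before it is added to the predictor.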

In this embodiment, the prediction module 81 is configured to generate a prediction block of the currently decoded image block. Specifically, when the video slice is decoded as an intra-decoded (I) slice, the intra prediction module 84 of the prediction module 81 may generate a prediction block of an image block of the current video slice based on the signaled intra prediction mode and data from a previously decoded image block of the current frame or image. When the video image is decoded as an inter-decoded (e.g., B, P, or GPB) slice, the inter prediction module 82 of the prediction module 81 generates a prediction block of an image block of the current video image based on the entropy decoded data received from the entropy decoding module 80; for example, the inter prediction module 82 uses the determined motion vector to identify the prediction block in a reference image in the reference image memory 92 (also referred to as the decoded image buffer 92).

In the embodiment of the present application, in the AMVP mode, the video decoder 30 (for example, the inter prediction module 82) establishes a candidate motion vector predictor list based on motion vectors determined from previously decoded image blocks, and determines a candidate motion vector predictor from the candidate motion vector predictor list. Video decoder 30 (e.g., inter prediction module 82) performs inter prediction using a motion vector to obtain a prediction block of the image block being decoded, where the motion vector is obtained based on the candidate motion vector predictor (MVP) corresponding to the index (i.e., the third identifier), the entropy decoded motion vector difference (MVD) information, and the adaptively determined motion vector precision. It should be understood that, in the embodiment of the present application, there are other methods for obtaining motion vector predictors, which will be described below in conjunction with FIG. 7, FIG. 8, or FIG.

The inverse quantization module 86 inverse quantizes the quantized transform coefficients decoded by the entropy decoding module 80, ie, dequantizes. The inverse quantization process may include using the quantization parameters calculated by video encoder 20 for each of the video slices to determine the degree of quantization that should be applied and likewise determine the degree of inverse quantization that should be applied. Inverse transform module 88 applies the inverse transform to transform coefficients, such as inverse DCT, inverse integer transform, or a conceptually similar inverse transform process, to generate residual blocks in the pixel domain.

Inter prediction module 82 (also referred to as motion compensation module 82) generates a motion compensated block, which may also involve performing interpolation based on an interpolation filter. Optionally, an identifier of the interpolation filter used for motion estimation with fractional pixel precision may be included in the syntax elements. Inter prediction module 82 may use the interpolation filters used by video encoder 20 during encoding of the image block to calculate interpolated values for the fractional pixels of the reference block. Inter prediction module 82 may determine the interpolation filter used by video encoder 20 based on the received syntax information and use that interpolation filter to generate a prediction block. For example, inter prediction module 82 may use bilinear interpolation to interpolate the values at one-sixteenth pixel positions of the reference block.

After the inter prediction module 82 generates the prediction block for the current image block, the video decoder 30 obtains the reconstructed block, i.e., the decoded image block, by summing the residual block from the inverse transform module 88 with the corresponding prediction block generated by the inter prediction module 82. Summer 90 represents the component that performs this summation operation. A loop filter (either in the decoding loop or after the decoding loop) can also be used to smooth pixel transitions or otherwise improve video quality, if desired. A filter unit (not shown) may represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. In addition, decoded image blocks in a given frame or image may be stored in decoded image buffer 92, which stores reference images for subsequent motion compensation. The reference image memory 92 can be part of a memory that also stores decoded video for later presentation on a display device (e.g., display device 32 of FIG. 1), or can be separate from such memory.
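
As a minimal sketch of the summation performed by summer 90, the prediction and residual blocks can be added sample by sample. Clipping the result to the valid sample range and the default bit depth of 8 are assumptions added here for illustration, and `reconstruct_block` is a hypothetical name.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add residual to prediction sample by sample.

    Clipping to [0, 2**bit_depth - 1] is an assumption, not stated in the text.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```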

It should be understood that other structural variations of video decoder 30 may be used to decode the encoded video code stream. For example, video decoder 30 may generate an output video stream without processing by a filter unit; or, for some image blocks or image frames, entropy decoding module 80 of video decoder 30 does not decode quantized coefficients, and correspondingly processing by inverse quantization module 86 and inverse transform module 88 is not required. For example, the inverse quantization module 86 and the inverse transform module 88 in the video decoder 30 may be merged together; alternatively, the reference image memory 92 may be disposed outside of the video decoder 30; or, in some scenarios, only the inter prediction module 82 corresponding to the inter coding mode is triggered to operate.

It should be understood that, in the parts that are not detailed in the above embodiments, reference may be made to the related description of other embodiments.

FIG. 4A is a schematic diagram of integer pixel positions and fractional pixel positions in an embodiment of the present application. Integer pixel positions, half-pixel positions, and quarter-pixel positions are illustrated in FIG. 4A; it should be understood that pixel positions of other precisions are also possible.

FIG. 4B is a schematic diagram of another integer pixel position and a fractional pixel position in the embodiment of the present application. As shown in FIG. 4B, FIG. 4B illustrates the fractional pixel position of an integer pixel (pel) 100. The integer pixel 100 corresponds to a half pixel position 102A to 102C (half pel 102), a quarter pixel position 104A to 104L (quarter pel 104), and an eighth pixel position 106A to 106AV (one eighth pel 106).

As shown in FIG. 4B, if the motion vector has one-eighth pixel precision, the motion vector can point to any of the integer pixel position 100, the half-pixel positions 102, the quarter-pixel positions 104, or the eighth-pixel positions 106. However, if the motion vector has one-quarter pixel precision, the motion vector may point to any of the integer pixel position 100, the half-pixel positions 102, or the quarter-pixel positions 104, but will not point to the eighth-pixel positions 106. It should be understood that in other examples, other precisions may be used, such as one-sixteenth pixel precision, one-thirty-second pixel precision, or the like.
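
The reachability rule just described can be checked with a short sketch. It assumes offsets are expressed in one-eighth pixel units and that the precision is given by its denominator (1, 2, 4, or 8); the function name and this representation are illustrative only.

```python
def reachable(offset_eighths, precision_denominator):
    """True if an offset in 1/8-pixel units lies on the grid of the given precision.

    precision_denominator is 1 (integer), 2 (half), 4 (quarter) or 8 (eighth).
    """
    step = 8 // precision_denominator  # grid spacing in 1/8-pixel units
    return offset_eighths % step == 0
```

For instance, an offset of 3/8 pixel is reachable at eighth-pixel precision but not at quarter-pixel precision, matching the statement that a quarter-pixel motion vector never points to an eighth-pixel position 106.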

The value of the pixel at integer pixel position 100 can be included in the corresponding reference image. That is, the value of the pixel at integer pixel position 100 generally corresponds to the actual value of the pixel in the reference image. In the motion estimation or motion compensation process, if the motion vector precision of the current motion vector is a fractional pixel precision (for example, one-half pixel precision or one-quarter pixel precision), the pixel values at integer pixel positions of the reference image need to be interpolated using an interpolation filter to obtain the pixel values at the fractional pixel positions, which serve as the values of the prediction block of the current block. The specific interpolation process depends on the interpolation filter used. For example, an adaptive interpolation filter or a fixed interpolation filter can be used to interpolate the values at the half-pixel positions 102, the quarter-pixel positions 104, and the eighth-pixel positions 106 (collectively referred to as fractional pixel positions). In general, the value at a fractional pixel position may be interpolated from one or more neighboring pixels, where the neighboring pixels may be values at adjacent integer pixel positions or at previously determined fractional pixel positions.
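
To make the interpolation step concrete, the following sketch computes a sample at a fractional position by bilinear interpolation from the four surrounding integer pixels. Bilinear filtering is only one possible fixed filter and is used here for brevity; the one-eighth pixel coordinate convention, the edge clamping, and the name `bilinear_sample` are assumptions.

```python
def bilinear_sample(ref, x8, y8):
    """Interpolate ref at position (x8, y8) given in 1/8-pixel units.

    ref is a 2D list of integer-pixel samples; neighbours beyond the
    right or bottom edge are clamped to the edge (an assumption).
    """
    x0, fx = divmod(x8, 8)  # integer part and 1/8-pixel fraction
    y0, fy = divmod(y8, 8)
    x1 = min(x0 + 1, len(ref[0]) - 1)
    y1 = min(y0 + 1, len(ref) - 1)
    top = ref[y0][x0] * (8 - fx) + ref[y0][x1] * fx
    bottom = ref[y1][x0] * (8 - fx) + ref[y1][x1] * fx
    # Weights sum to 64; add 32 so the shift rounds to nearest.
    return (top * (8 - fy) + bottom * fy + 32) >> 6
```

An integer position returns the stored sample unchanged, and the half-pixel position at the center of four samples returns their rounded average.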

The following describes in detail how video codecs (e.g., video encoder 20, video decoder 30) adaptively select the precision of a motion vector, for example between integer pixel precision and fractional pixel precision (e.g., one-eighth pixel precision and one-quarter pixel precision). Video encoder 20 may make this selection for each motion vector, each CU, each LCU, each slice, each frame, each GOP, or other decoding unit. When video encoder 20 selects one-quarter pixel precision for the motion vector, the motion vector may refer to any of the integer pixel position 100, the half-pixel positions 102, or the quarter-pixel positions 104. When video encoder 20 selects one-eighth pixel precision for the motion vector, the motion vector may refer to any of the integer pixel position 100, the half-pixel positions 102, the quarter-pixel positions 104, or the eighth-pixel positions 106.

FIG. 5 is a schematic flowchart of a video encoding method according to an embodiment of the present application. The process 500 shown in FIG. 5 can be performed by a video encoding device, a video encoder (e.g., video encoder 20), or another device having video encoding capabilities. Process 500 is described as a series of steps or operations; it should be understood that process 500 can be performed in various orders and/or concurrently, and is not limited to the order of execution shown in the figure. Assuming that a video data stream having multiple video frames is being processed by a video encoder, the process 500, comprising the following steps, is performed to encode the current image block of the current video frame.

The method shown in FIG. 5 may include steps 501 to 511, and step 501 to step 511 are described in detail below.

Step 501: Perform a motion estimation process on the image block currently being encoded (referred to as the current coded image block, or the current image block for short) to obtain a motion vector of the current coded image block, where the motion vector indicates the displacement of the current coded image block relative to the prediction block in the reference image.

In an example implementation, step 501 can be performed by inter prediction module 41 of video encoder 20, such as motion estimation module 42, where motion estimation module 42 compares the result of encoding the image block using quarter-pixel precision motion vectors with the result of encoding the image block using eighth-pixel precision motion vectors. For example, motion estimation module 42 encodes the image block using one or more quarter-pixel precision motion vectors in a first coding pass and using one or more eighth-pixel precision motion vectors in a second coding pass. Motion estimation module 42 may further use various combinations of one or more quarter-pixel precision motion vectors and one or more eighth-pixel precision motion vectors of the block in a third coding pass. Motion estimation module 42 may calculate a rate-distortion value for each coding pass of the image block and calculate the difference between the rate-distortion values. For example, for different combinations of search starting point (such as the position pointed to by an available motion vector of a spatial neighboring block and/or a temporal neighboring block of the current coded image block) and pixel precision, the motion estimation module 42 performs a motion search in the reference image with the corresponding fractional or integer pixel step to obtain the corresponding prediction blocks. A matching error between each of the prediction blocks and the current coded image block is calculated, and the prediction block with the smallest matching error and its corresponding motion vector are selected as the optimal prediction block and the corresponding optimal motion vector. The matching error can be calculated using the sum of absolute differences (SAD) criterion.
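
The matching-error search can be illustrated with a minimal integer-pixel full search using SAD. This sketch omits fractional-pixel refinement and rate-distortion cost, and the function names, the exhaustive search pattern, and the search window handling are assumptions rather than the encoder's actual strategy.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def extract(frame, x, y, w, h):
    """Cut a w x h block whose top-left corner is at (x, y)."""
    return [row[x:x + w] for row in frame[y:y + h]]


def full_search(cur_block, ref_frame, x0, y0, search_range):
    """Return (best SAD, (dx, dy)) over an integer-pixel search window.

    (x0, y0) is the search starting point in the reference frame;
    candidates falling outside the frame are skipped.
    """
    h, w = len(cur_block), len(cur_block[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or x + w > len(ref_frame[0]) or y + h > len(ref_frame):
                continue
            cost = sad(cur_block, extract(ref_frame, x, y, w, h))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

A fractional-pixel search would apply the same comparison to interpolated reference samples around the best integer-pixel position.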

Step 503: Acquire a motion vector predictor of the current coded image block.

In an exemplary implementation, step 503 may be performed by inter prediction module 41 of video encoder 20, such as motion estimation module 42. It should be understood that in some implementations, steps 501 and 503 need not be performed in a particular order; the motion vector predictor of the current coded image block can be obtained when the motion vector of the current coded image block is obtained.

In step 503, if the inter prediction mode of the current coded image block is the AMVP mode, the AMVP candidate motion vector predictor list (i.e., the AMVP candidate MVP list) of the current coded image block is constructed, and a candidate motion vector predictor (also referred to as the best candidate MVP) for the current coded image block is selected from the candidate motion vector predictor list according to the rate-distortion cost criterion; for example, the selected candidate MVP is the one with the smallest rate-distortion cost for encoding the current coded image block.

Referring also to FIG. 6A, taking the AMVP mode as an example, the candidate MV predictor list of the current image block may include two kinds of motion vector predictors, spatial and temporal. A0-A1 and B0-B2 are spatial neighboring block positions of the current image block (also referred to as spatial candidate reference block positions), and T_TL, T_C, and T_BR are optional temporal neighboring block positions of the current image block (also referred to as temporal candidate reference block positions); the application is not limited thereto. Generally, the MVs of the spatial and temporal neighboring blocks are acquired in sequence from the spatial neighboring block positions and the temporal neighboring block positions as multiple candidate motion vector predictors to form the candidate MV predictor list.

As an example implementation, referring to FIG. 6A, the AMVP candidate MV predictor list generation process is as follows:

a) First, the spatial candidate motion information is obtained. One candidate MV predictor is generated for each of the left side and the upper side of the current image block: the left candidate MV predictor is obtained from A0, A1, scaled A0, and scaled A1, and the upper candidate MV predictor is obtained from B0, B1, B2, scaled B0, scaled B1, and scaled B2;

b) Next, the temporal candidate motion information is obtained, namely the motion information of the bottom-right spatial neighboring block T_BR of the co-located block of the current image block in a neighboring coded picture (also referred to as a reference picture), or the motion information of the center block T_C of the co-located block;

c) The spatial candidate motion information and the temporal candidate motion information are combined, and the first two candidates are retained; if fewer than two candidates are available, zero motion information (0, 0) is added to form the final candidate MV predictor list.
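
Steps a) to c) can be sketched as below. The sketch takes the already derived spatial and temporal candidates as inputs (with `None` marking an unavailable candidate) and does not model the scaling of A0/A1/B0/B1/B2; removing duplicate candidates is an assumption added here, and `build_amvp_list` is a hypothetical name.

```python
def build_amvp_list(spatial, temporal, max_candidates=2):
    """Merge spatial then temporal candidates, keep the first two, pad with (0, 0).

    Each candidate is an (x, y) tuple or None if unavailable.
    Duplicate removal is an assumption, not stated in the text.
    """
    candidates = []
    for mv in list(spatial) + list(temporal):
        if mv is not None and mv not in candidates:
            candidates.append(mv)
    candidates = candidates[:max_candidates]
    while len(candidates) < max_candidates:
        candidates.append((0, 0))  # zero motion information
    return candidates
```

The position of the chosen entry in the returned list plays the role of the MV Idx column in the example tables.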

An example table of AMVP candidate motion vector predictor lists is as follows:

MV Idx	MV prediction value (represented by position)
0	MV_A1
1	MV_B1

Table 1

Alternatively, referring to FIG. 6B, taking the AMVP mode as an example, the candidate MV predictor list of the current image block may include two kinds of motion vector predictors, spatial and temporal. A, L, AL, AR, and BL are spatial neighboring block positions of the current image block (also referred to as spatial candidate reference block positions), and T is an optional temporal neighboring block position of the current image block (also referred to as a temporal candidate reference block position); the application is not limited thereto. Generally, the MVs of the spatial and temporal neighboring blocks are acquired in sequence from the spatial neighboring block positions and the temporal neighboring block position as multiple candidate motion vector predictors to form the candidate MV predictor list. Another example table of an AMVP candidate motion vector predictor list is as follows:

MV Idx	MV prediction value (represented by position)
0	MV_BL
1	MV_T

Table 2

Other methods for establishing a candidate motion vector predictor list or other methods for obtaining a motion vector predictor are described in the prior art and will not be described herein.

Step 505: When the motion vector predictor is included in the target motion vector set corresponding to the target motion vector precision, obtain motion vector difference (MVD) information of the current coded image block based on the motion vector of the current coded image block, the motion vector predictor, and the target motion vector precision. For example, when the motion vector predictor is included in the target motion vector set, the target motion vector precision corresponding to the target motion vector set is determined, according to the correspondence between the multiple motion vector sets and the multiple motion vector precisions, to be the motion vector precision of the motion vector predictor. The MVD of the current coded image block has the target motion vector precision. The target motion vector set is one of multiple motion vector sets, and the target motion vector precision is one of multiple motion vector precisions including a first motion vector precision and a second motion vector precision. The multiple motion vector sets include a first motion vector set and a second motion vector set, at least one of which includes two or more motion vector predictors, and the first motion vector precision corresponding to the first motion vector set is different from the second motion vector precision corresponding to the second motion vector set. Preferably, the motion vector of the current coded image block and the motion vector predictor have the same motion vector precision, namely the target motion vector precision.

In an example implementation, step 505 can be performed by inter prediction module 41 of video encoder 20, such as motion estimation module 42;

In some possible implementations, step 505 can include:

Obtaining motion vector difference (MVD) information of the current coded image block based on the motion vector of the current coded image block, the motion vector predictor, and the target motion vector precision. As one example implementation, the difference MVD between the motion vector of the current coded image block and the motion vector predictor is calculated, where the motion vector of the current coded image block, the motion vector predictor, and the MVD have the same motion vector precision (i.e., each has the target motion vector precision). As another example implementation, the difference MVD between the motion vector of the current coded image block and the motion vector predictor is calculated, where the motion vector of the current coded image block and the motion vector predictor have the same motion vector precision (i.e., each has the target motion vector precision), and the calculated MVD is then scaled (e.g., reduced) based on the target motion vector precision (e.g., an index of the target motion vector precision) to obtain a scaled MVD. For example, Scaled MVD = MVD >> mvrIdx, where >> denotes a right shift, Scaled MVD denotes the scaled motion vector difference, and mvrIdx denotes the index value of the target motion vector precision; assuming MVD = 16 and mvrIdx = 2, Scaled MVD = 16 >> 2 = 4. It should be understood that if mvrIdx = 0 (i.e., a right shift by 0 bits), the scaled MVD equals the original MVD. As an alternative implementation, if mvrIdx = 0, the calculated MVD need not be scaled; if mvrIdx is not equal to 0 (e.g., mvrIdx = 2), the calculated MVD is scaled based on the index of the target motion vector precision (e.g., shifted right by 2 bits) to obtain a scaled MVD, which occupies fewer bits than the original MVD.
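
The scaling formula Scaled MVD = MVD >> mvrIdx can be illustrated with a short sketch. It assumes the motion vector and the predictor are integer tuples already expressed at the target precision in a common storage unit, so the shift discards only zero bits; the function name `scale_mvd` is hypothetical.

```python
def scale_mvd(mv, mvp, mvr_idx):
    """Compute MVD = MV - MVP per component, then right-shift by mvr_idx.

    With mvr_idx == 0 the MVD is returned unchanged.
    """
    mvd = tuple(m - p for m, p in zip(mv, mvp))
    return tuple(d >> mvr_idx for d in mvd)
```

With a horizontal MVD of 16 and mvrIdx = 2, the function reproduces the value 4 from the example above, and the scaled value needs fewer bits to represent than the original.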

In the embodiment of the present application, a first distance, between the current image block and a first neighboring block corresponding to a motion vector predictor in the first motion vector set, is different from a second distance, between the current image block and a second neighboring block corresponding to a motion vector predictor in the second motion vector set of the multiple motion vector sets, where the first neighboring block and the second neighboring block are included in the spatial neighboring blocks and/or the temporal neighboring blocks of the current image block. It should be noted that the first distance here is, for example, the distance from the upper-left pixel position of the first neighboring block to the upper-left pixel position of the current image block, or the distance from the center pixel position of the first neighboring block to the center pixel position of the current image block; the second distance here is, for example, the distance from the upper-left pixel position of the second neighboring block to the upper-left pixel position of the current image block, or the distance from the center pixel position of the second neighboring block to the center pixel position of the current image block; the application is not limited thereto.

In some possible implementations, in the case where the first distance between the current image block and the first neighboring block corresponding to the motion vector predictor in the first motion vector set is smaller than the second distance between the current image block and the second neighboring block corresponding to the motion vector predictor in the second motion vector set, the first motion vector precision corresponding to the first motion vector set is smaller than the second motion vector precision corresponding to the second motion vector set; or

in the case where the first distance between the current image block and the first neighboring block corresponding to the motion vector predictor in the first motion vector set is greater than the second distance between the current image block and the second neighboring block corresponding to the motion vector predictor in the second motion vector set, the first motion vector precision corresponding to the first motion vector set is greater than the second motion vector precision corresponding to the second motion vector set.

It should be understood that the correspondence between the multiple motion vector sets and the multiple motion vector precisions may be preset or determined locally by the video encoder. In the latter case, in one example, the correspondence is determined according to a motion vector precision assignment rule, which expresses the relationship between the distance from the current image block to the neighboring block corresponding to a motion vector predictor in a motion vector set and the magnitude of the motion vector precision. For example, the correspondence may be set according to a preset group of fixed motion vector precision values and the assignment rule "the closer the distance, the higher the motion vector precision; the farther the distance, the lower the motion vector precision".
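
One way to read the assignment rule is sketched below, under stated assumptions: each motion vector set is summarized by a single distance value, the correspondence is one-to-one, and a "higher" precision means a smaller step value (for example, 1/4 is higher precision than 1). The function name and the dictionary representation are illustrative, not part of the described method.

```python
from fractions import Fraction


def assign_precisions(set_distances, precisions):
    """Pair the closest set with the finest precision, and so on.

    set_distances maps a set index to its distance from the current block;
    precisions is the preset group of precision values in pixels.
    """
    ordered_sets = sorted(set_distances, key=set_distances.get)
    ordered_precisions = sorted(precisions)  # smallest step = highest precision
    return dict(zip(ordered_sets, ordered_precisions))
```

Under these assumptions, three sets at increasing distance combined with the precision group {1/4, 1/2, 1} receive 1/4, 1/2, and 1 respectively.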

Table-3 and Table-4 illustrate a one-to-one correspondence between multiple motion vector sets and multiple motion vector precisions. It should be understood that the multiple motion vector sets and the multiple motion vector precisions may also have an M:N correspondence, where M is not equal to N. For example, the correspondence between the multiple motion vector sets and the multiple motion vector precisions may also be one-to-many or many-to-one, as described in detail below.

As shown in Table-3 or Table-4, in the embodiment of the present application, the multiple candidate motion vector predictors used to construct the candidate motion vector predictor list may be divided into multiple motion vector sets according to the distance between each candidate motion vector predictor (i.e., the neighboring block where the candidate motion vector predictor is located) and the current image block. Each motion vector set includes one or more candidate motion vector predictors used to construct the candidate motion vector predictor list, and at least one motion vector set includes multiple candidate motion vector predictors. The encoding end and the decoding end pre-agree on a group of motion vector precisions, for example {1/4, 1/2, 1, 2, 4, 8}, and the encoding end selects the optimal motion vector precision of each motion vector set, for example according to a probability statistics result; alternatively, the motion vector precision corresponding to each motion vector set (each position's MV prediction value) is pre-agreed by the encoding end and the decoding end.

Motion vector set	Motion vector predictor (represented by position)	Motion vector precision
0	MV_A1, MV_B1	1/4
1	MV_A0, MV_B0	1/2
2	MV_B2	1
3	MV_T	2
4	MVP obtained in other ways	4

Table 3

Motion vector set	Motion vector predictor (represented by position)	Motion vector precision
0	MV_A, MV_L	1/4
1	MV_AL	1/2
2	MV_AR, MV_BL	1
3	MV_T	2
4	MVP obtained in other ways	4

Table 4

Step 507: Entropy encode the motion vector difference (MVD) information of the current coded image block into a code stream.

In an example implementation, step 507 may be performed by entropy encoding module 56 of video encoder 20;

In step 507, the MVD information of the current coded image block and the third identifier may be entropy encoded into the code stream, where the third identifier is used to indicate the motion vector predictor (MVP) of the current coded image block; for example, the third identifier is the index number of the candidate MVP (for example, the optimal candidate MVP) selected from the candidate MVP list. It should be noted that the MVD information of the current coded image block here may be the MVD of the current coded image block, or may be a scaled MVD (for example, the reduced MVD).

Optionally, in some feasible implementations, the process 500 may further include entropy encoding a motion vector precision parameter into the code stream, where the motion vector precision parameter is used to indicate the values of the multiple motion vector precisions and is carried in any one of a sequence parameter set (SPS), an image parameter set (PPS), a slice header, or another position of the image block. As a specific example, the motion vector precision parameter may include the number of motion vector precisions for a current encoding processing unit and at least two motion vector precision values corresponding to that number, where the current encoding processing unit includes one or more of a video sequence, an image, a slice, a partition, a CTU, and a CU.

Referring to FIG. 6C, the encoding end (e.g., video encoder 20) uses a rate-distortion optimization (RDO) algorithm to select the optimal set of motion vector precision values for different encoding processing units (e.g., sequence level, image level, slice level, region level, CTU level, or CU level) and writes it into the code stream for the decoding end (e.g., video decoder 30). Taking the image level as an example, if the encoding end obtains, according to RDO, {1/4, 1/2, 1, 2} as the optimal set of motion vector precisions for the current image, this set is written into the PPS header information. The specific syntax structure (picture parameter set RBSP syntax) is shown in FIG. 6C, where pps_amvr_number represents the number of motion vector precisions available in the current image and takes a positive integer value greater than or equal to 1; the value of pps_amvr_number may be determined by the number and positions of the candidate motion vector predictors used to construct the candidate motion vector predictor list. pps_amvr_set[pps_amvr_number] represents the values of the motion vector precisions available in the current image.
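
The relationship between the two syntax elements can be expressed as a small container with consistency checks. This sketch does not reproduce the bit-level syntax of FIG. 6C; the class name `PpsAmvr`, the use of `Fraction` for precision values, and the validation behavior are assumptions for illustration only.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class PpsAmvr:
    """In-memory view of pps_amvr_number and pps_amvr_set (hypothetical container)."""
    pps_amvr_number: int
    pps_amvr_set: list

    def __post_init__(self):
        # The text requires a positive integer greater than or equal to 1.
        if self.pps_amvr_number < 1:
            raise ValueError("pps_amvr_number must be >= 1")
        # One precision value is expected per counted precision.
        if len(self.pps_amvr_set) != self.pps_amvr_number:
            raise ValueError("pps_amvr_set must hold pps_amvr_number values")


# The image-level example from the text: {1/4, 1/2, 1, 2}.
example = PpsAmvr(4, [Fraction(1, 4), Fraction(1, 2), Fraction(1), Fraction(2)])
```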

It should be understood that the encoding end (e.g., video encoder 20) may also omit the motion vector precision parameter from the code stream. In that case, the decoding end (e.g., video decoder 30) may use a set of motion vector precisions pre-agreed with the encoding end, or may even use a pre-agreed motion vector precision for each motion vector set (that is, for the motion vector predictor at each position).

Optionally, in some possible implementations, the process 500 may further include:

Encoding a first identifier into the code stream, where the first identifier is used to indicate the third motion vector precision corresponding to the third motion vector set. In other words, the encoding end and the decoding end may pre-agree on which motion vector set (or which motion vector predictor position) the transmitted third motion vector precision corresponds to;

It should be noted that the third motion vector set herein represents a set of important motion vector predictors, that is, a motion vector set formed by one or more motion vector predictors that have the highest or a relatively high probability of being the best motion vector predictor. The third motion vector precision is the motion vector precision used by the third motion vector set. The third motion vector set may be a default motion vector set, such as the first motion vector set in the multiple motion vector sets (such as motion vector set 0 in Table-5); of course, the third motion vector set may also be the second motion vector set, or another motion vector set different from the first and second motion vector sets. In addition, the third motion vector precision may be the first motion vector precision among the multiple motion vector precisions, the second motion vector precision, or another motion vector precision, which is not limited in this application.

Table-5

Motion vector precision index	Motion vector precision	Binary code
0	1/8	00
1	1/4	01
2	1/2	10
3	1	11

Table-6
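As a minimal illustrative sketch of how the index, precision value, and binary code of Table-6 relate, the following Python fragment stores the table and converts in both directions. The names TABLE_6, encode_precision_index, and decode_precision_index are hypothetical and do not appear in this application; the sketch is not the entropy coding actually used by a codec.

```python
from fractions import Fraction

# Table-6 as given in the text: index -> (precision in pixels, binary code).
TABLE_6 = {
    0: (Fraction(1, 8), "00"),
    1: (Fraction(1, 4), "01"),
    2: (Fraction(1, 2), "10"),
    3: (Fraction(1, 1), "11"),
}


def encode_precision_index(precision):
    """Return the 2-bit code of a precision value listed in Table-6."""
    for _index, (value, code) in TABLE_6.items():
        if value == precision:
            return code
    raise ValueError(f"precision {precision} is not listed in Table-6")


def decode_precision_index(code):
    """Return the precision value for a 2-bit code from Table-6."""
    index = int(code, 2)  # the binary code is the index written in base 2
    return TABLE_6[index][0]
```

For example, encode_precision_index(Fraction(1, 4)) yields the code for index 1, and decode_precision_index("10") yields the 1/2 pixel precision.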

This identification information (e.g., the first identifier) can be used for different codec processing units such as the sequence level, image level, slice level, region level, and image block level (e.g., CTU level, CU level). Correspondingly, the identification information may be carried in a sequence parameter set SPS, an image parameter set PPS, or a slice header of the image block, for example in SPS header information or PPS header information.

When the motion vector precision is signaled at the image level, its identification information (for example, the first identifier, specifically the motion vector precision index shown in Table-6) is written into the image parameter set PPS header information. For the specific syntax structure "Picture parameter set RBSP syntax", see FIG. 6D. A pps_amvr_accuracy value of 0 indicates that, in the encoding and decoding of any image block in the current image, the motion vector precision of a given motion vector set (for example, motion vector set 0) is 1/8; a value of 1 indicates that this precision is 1/4; a value of 2 indicates that this precision is 1/2; and a value of 3 indicates that this precision is 1.

Optionally, as an alternative implementation, the process 500 may further include:

Encoding a first identifier and a second identifier into the code stream, where the first identifier is used to indicate the third motion vector precision and the second identifier is used to indicate the third motion vector set. In other words, which motion vector set the third motion vector precision corresponds to may be transmitted in the code stream;

The identification information (for example, the first identifier and the second identifier) can be used for different codec processing units such as the sequence level, image level, slice level, region level, CTU level, and CU level. Correspondingly, the identification information can be carried in a sequence parameter set, an image parameter set, or a slice header of the image block.

When the motion vector precision is signaled at the image level, the identification information (for example, the first identifier, specifically the motion vector precision index, and the second identifier, specifically the index of the motion vector set) is written into the image parameter set PPS header information. For the specific syntax structure "Picture parameter set RBSP syntax", see FIG. 6E, where pps_amvr_set_idx represents the index of the motion vector set and is a positive integer, that is, there is at least one motion vector set, and pps_amvr_set_accuracy[pps_amvr_set_idx] indicates the motion vector precision value corresponding to the motion vector set pps_amvr_set_idx. A pps_amvr_set_accuracy[pps_amvr_set_idx] value of 0 indicates that, in the encoding and decoding of any image block in the current image, the motion vector precision of the motion vector set specified by pps_amvr_set_idx is 1/8; a value of 1 indicates that this precision is 1/4; a value of 2 indicates that this precision is 1/2; and a value of 3 indicates that this precision is 1.

Further, in order to reconstruct the image block, the process 500 of the embodiment of the present invention may further include:

Step 509: Obtain, according to a motion vector of the current coded image block, a prediction block of a current coded image block, where the motion vector has a target motion vector accuracy;

In an example implementation, step 509 may be performed by inter prediction module 41 of video encoder 20, such as motion compensation module 44;

Motion compensation performed by motion compensation module 44 may involve extracting or generating a prediction block based on a motion vector; for example, motion compensation module 44 may locate the prediction block to which the motion vector points in one of the reference image lists. Motion compensation module 44 may also perform interpolation to fractional pixel precision. Referring to FIG. 4B, the value of the pixel at integer pixel location 100 generally corresponds to the actual value of the pixel in the reference image. If the target motion vector precision is a fractional pixel precision (e.g., one-half pixel precision, one-quarter pixel precision, or one-eighth pixel precision), an interpolation filter must interpolate the pixel values at the integer pixel positions of the reference image to obtain the pixel values at the fractional pixel positions, thereby obtaining the values of the prediction block of the current block.
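The idea of deriving fractional-position samples from integer-position samples can be sketched as follows. This is only an illustration under stated assumptions: it uses simple bilinear weighting rather than the longer interpolation filters that practical codecs typically use, it assumes non-negative positions inside the reference image, and the function name sample_fractional and its parameters are hypothetical.

```python
def sample_fractional(ref, x_units, y_units, units_per_pixel):
    """Bilinearly sample `ref` at a position given in 1/units_per_pixel pixels.

    `ref` is a 2-D list of integer-position pixel values (rows of columns).
    For quarter-pixel precision, units_per_pixel is 4, so x_units=2 means
    half a pixel to the right of column 0.
    """
    x_int, x_frac = divmod(x_units, units_per_pixel)
    y_int, y_frac = divmod(y_units, units_per_pixel)
    # Clamp the right/bottom neighbours at the image border (assumption).
    x_next = min(x_int + 1, len(ref[0]) - 1)
    y_next = min(y_int + 1, len(ref) - 1)
    wx = x_frac / units_per_pixel
    wy = y_frac / units_per_pixel
    top = ref[y_int][x_int] * (1 - wx) + ref[y_int][x_next] * wx
    bottom = ref[y_next][x_int] * (1 - wx) + ref[y_next][x_next] * wx
    return top * (1 - wy) + bottom * wy
```

When the fractional part is zero, the function returns the integer-position pixel unchanged, which mirrors the statement that integer pixel locations correspond to actual values in the reference image.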

Step 511: Perform reconstruction on the current coded image block based on the prediction block.

In an example implementation, step 511 may be performed by a reconstruction module 62 of video encoder 20, such as summer 62.

It should be understood that, for certain image blocks or image frames, video encoder 20 does not generate residual data and accordingly reconstructs the current coded image block based on the prediction block alone; for other image blocks or image frames, video encoder 20 generates residual data (also referred to as a residual block) and accordingly reconstructs the current coded image block based on the prediction block and the residual block.
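The two reconstruction cases can be sketched as below. The function name reconstruct_block is hypothetical, and the clipping of results to the valid sample range for a given bit depth is an assumption added for the illustration; it is not stated in this application.

```python
def reconstruct_block(prediction, residual=None, bit_depth=8):
    """Reconstruct a block from its prediction and an optional residual.

    Without residual data the reconstruction is the prediction block itself;
    with residual data the two are added sample by sample and clipped to
    [0, 2**bit_depth - 1] (clipping is an assumption of this sketch).
    """
    if residual is None:
        return [row[:] for row in prediction]
    max_value = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```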

It can be seen that, in the method of the embodiment of the present application, on the one hand, after the video encoder obtains the motion vector predictor of the current coded image block, it obtains the target motion vector precision corresponding to the target motion vector set to which the motion vector predictor belongs, so that the motion vector precision for the current coded image block is determined adaptively. If the number of motion vector sets is N, the video encoder can support M motion vector (MV) precisions, where M is less than or equal to N, and M and N are both positive integers (for example, M and N are greater than or equal to 3), which improves the motion vector prediction accuracy. Specifically, because the embodiment of the present invention can adaptively select the motion vector precision, for one or more image blocks corresponding to some video content, using motion vectors with higher pixel precision (e.g., 1/8 pixel precision) instead of motion vectors with lower pixel precision improves video codec quality, and the benefit outweighs the interpolation overhead; for example, a prediction block obtained based on a motion vector having high pixel precision (e.g., 1/8 pixel precision) is closer to the original block of the current coded image block, even though interpolating the fractional pixel positions incurs some interpolation overhead. For one or more image blocks corresponding to other video content, using motion vectors with lower pixel precision (e.g., integer pixel precision) instead of motion vectors with higher pixel precision neither reduces the video codec quality nor incurs the interpolation overhead. Therefore, the video coding method in the embodiment of the present application improves the coding and decoding performance as a whole.

On the other hand, the video encoder can adaptively select the target motion vector precision to encode the MVD information. For example, the scaled (e.g., reduced) MVD occupies fewer bits than, or the same number of bits as, the original MVD; in the former case, the bit transmission overhead is also reduced, thereby further improving the codec performance;

In a further aspect, the video encoder of the embodiment of the invention either does not signal any information related to the motion vector precision or signals only the motion vector precision of the important motion vector set, which further saves bit overhead while improving coding efficiency and thus further improves the codec performance.
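The bit saving from a scaled MVD can be illustrated with a small sketch. It assumes that an MVD component is stored in units of the finest precision and is divided by the number of such units per target-precision step, and it uses the length of a signed Exp-Golomb code only as a simple proxy for coding cost; the entropy coding actually applied to the MVD is not specified here. The names scale_mvd and signed_exp_golomb_bits are hypothetical.

```python
def scale_mvd(mvd_units, units_per_step):
    """Scale down an MVD component to units of the target precision.

    `mvd_units` is the MVD in units of the finest precision (for example
    1/4 pixel); `units_per_step` is how many of those units make up one
    step of the target precision (for example 4 for integer precision).
    Requiring exact divisibility is an assumption of this sketch.
    """
    if mvd_units % units_per_step != 0:
        raise ValueError("MVD is not a multiple of the target precision")
    return mvd_units // units_per_step


def signed_exp_golomb_bits(value):
    """Bit length of the signed Exp-Golomb code of an integer."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return 2 * (code_num + 1).bit_length() - 1
```

Under these assumptions, an MVD of 16 quarter-pixel units (4 pixels) coded at integer-pixel precision becomes 4, and the proxy cost drops from signed_exp_golomb_bits(16) to signed_exp_golomb_bits(4); an MVD of 0 costs the same number of bits in both cases.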

FIG. 7 is a schematic flowchart of a video decoding method according to an embodiment of the present application. The process 700 shown in FIG. 7 can be performed by a video decoding device, a video decoder (e.g., video decoder 30), or another device having video decoding capabilities. Process 700 is described as a series of steps or operations; it should be understood that process 700 can be performed in various sequences and/or concurrently, and is not limited to the order of execution shown in FIG. 7. Assuming that a video decoder is being used on a video data stream having multiple video frames, the process 700, which comprises the following steps, is performed to decode the current image block of the current video frame;

The method shown in FIG. 7 includes steps 701 to 709, and steps 701 to 709 are described in detail below.

Step 701: Receive a code stream, where the code stream carries motion vector difference (MVD) information of the image block currently being decoded (referred to as the current decoded image block or the current image block), and parse the MVD information from the code stream;

For example, step 701 can be performed by entropy decoding module 80 of video decoder 30;

In an implementation manner, step 701 can include: receiving a code stream, where the code stream includes motion vector difference (MVD) information of the current decoded image block and a third identifier (e.g., an index number) used to indicate a candidate motion vector predictor (MVP) of the current decoded image block, and parsing the MVD information and the third identifier from the code stream;

The MVD information of the current decoded image block may be the MVD of the currently decoded image block, or may be the scaled processed MVD of the currently decoded image block.

Step 703: Acquire a motion vector predictor MVP of the current decoded image block.

For example, step 703 can be performed by prediction module 81 of video decoder 30, such as inter prediction module 82 (also referred to as motion compensation module 82) in prediction module 81;

In an implementation manner, step 703 may include determining a candidate motion vector predictor of the current decoded image block from the candidate motion vector predictor list based on the third identifier (e.g., an index number). As an example, in the case where the inter prediction mode of the current decoded image block is the AMVP mode, an AMVP candidate MVP list of the current decoded image block is constructed, and a candidate motion vector predictor (also referred to as the best candidate MVP) of the current decoded image block is determined from the AMVP candidate MVP list based on the third identifier (e.g., an index number). Referring to FIG. 6A or FIG. 6B, video decoder 30 may construct the AMVP candidate motion vector predictor list of the current decoded image block, as shown in Table-1, using a method similar to that of video encoder 20. For example, an index number idx=1 indicates that the best candidate MVP is MV_B1.

In another implementation manner, referring to FIG. 8, a decoder-side motion vector derivation (DMVD) technique, in particular a template matching (TM) method, may be used to obtain the motion vector predictor (MVP) of the current decoded image block. The TM method determines a part of the region spatially adjacent to the current image block as a template TMc (for example, a part of the adjacent left region and/or a part of the adjacent upper region) and searches the reference image for the best matching template for it, called TMr, which has the same size and shape as TMc. The motion vector predictor of the current image block is determined according to the positions of the two templates. The displacement between the two templates may be an integer pixel value or a non-integer pixel value, for example, but not limited to, one-half pixel, one-quarter pixel, or one-eighth pixel. For details, refer to the prior art; details are not described herein again.
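A simplified template search of this kind can be sketched as follows. The sketch makes several assumptions that are not taken from this application: the template is a plain rectangle rather than a combination of left and upper regions, the matching cost is the sum of absolute differences (SAD), and only integer-pixel displacements inside a small search range are tested. The names sad, crop, and template_match are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def crop(image, x, y, width, height):
    """Cut a width x height block whose top-left corner is at (x, y)."""
    return [row[x:x + width] for row in image[y:y + height]]


def template_match(reference, template, origin, search_range):
    """Return the displacement (dx, dy) of the best match TMr for TMc.

    `origin` is the top-left position of the template TMc in the current
    image; candidates falling outside the reference image are skipped.
    """
    height, width = len(template), len(template[0])
    origin_x, origin_y = origin
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = origin_x + dx, origin_y + dy
            if x < 0 or y < 0:
                continue
            if x + width > len(reference[0]) or y + height > len(reference):
                continue
            cost = sad(crop(reference, x, y, width, height), template)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best[1]
```

The returned displacement plays the role of the motion vector predictor derived from the positions of the two templates; a fractional-pixel refinement would additionally require interpolated reference samples.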

In another implementation manner, referring to FIG. 9, a decoder-side motion vector derivation (DMVD) technique, in particular a bilateral matching method, is used to obtain the motion vector predictor (MVP) of the current decoded image block. The principle of the bilateral matching method is to assume that the object moves in a straight line, so that the best matching blocks can be found in two different reference frames along the motion trajectory of the current block. For example, given a forward MV0, the corresponding backward MV1 is determined first, where MV0 and MV1 are mirrored. Specifically, in terms of magnitude, MV0 and MV1 are proportional to the time domain intervals (TD0, TD1) between the current image and the two reference images; in terms of direction, MV0 and MV1 point in opposite directions. Two matching blocks are determined in the two reference frames according to MV0 and MV1, respectively, and the positions of the two matching blocks relative to the current block are the predicted values of the MV of the current block. The MV pair with the smallest SAD can be selected as the motion vector predictor that is closest to the motion vector of the current block. For details, refer to the prior art; details are not described herein again.
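The mirroring relation between MV0 and MV1 can be written out as a short sketch. It assumes that TD0 and TD1 are positive time domain intervals and that each MV is an (x, y) pair; the function name mirror_motion_vector is hypothetical.

```python
from fractions import Fraction


def mirror_motion_vector(mv0, td0, td1):
    """Derive the backward MV1 mirrored from the forward MV0.

    The magnitude is scaled by TD1/TD0 and the direction is reversed,
    following the proportionality and opposite-direction properties
    described for bilateral matching.
    """
    scale = Fraction(td1, td0)
    return (-mv0[0] * scale, -mv0[1] * scale)
```

With equal intervals the two vectors are exact mirror images; with TD0 twice as long as TD1, MV1 has half the magnitude of MV0.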

Step 705: When the motion vector predictor is included in the target motion vector set corresponding to the target motion vector precision (for example, when the motion vector predictor is included in the target motion vector set, the target motion vector precision corresponding to the target motion vector set is determined, according to the correspondence between the multiple motion vector sets and the multiple motion vector precisions, to be the motion vector precision of the motion vector predictor), obtain the motion vector of the current decoded image block based on the motion vector predictor, the motion vector difference information, and the target motion vector precision, where the motion vector of the current decoded image block has the target motion vector precision; the target motion vector set is one of multiple motion vector sets; the target motion vector precision is one of multiple motion vector precisions including a first motion vector precision and a second motion vector precision; the multiple motion vector sets include a first motion vector set and a second motion vector set; at least one of the first motion vector set and the second motion vector set includes two or more motion vector predictors; and the first motion vector precision corresponding to the first motion vector set is different from the second motion vector precision corresponding to the second motion vector set;

For example, step 705 can be performed by prediction module 81 of video decoder 30, such as inter prediction module 82 in prediction module 81;

In some possible implementations, referring to FIG. 10, step 705 may include: step 7051 and step 7052, and step 7051 and step 7052 will be described in detail below.

Step 7051: Determine that the motion vector predictor is included in the target motion vector set, and determine, according to the correspondence between the multiple motion vector sets and the multiple motion vector precisions, the target motion vector precision corresponding to the target motion vector set, that is, the motion vector precision corresponding to the motion vector predictor;

A first distance between the current image block and the first neighboring block corresponding to a motion vector predictor in the first motion vector set is different from a second distance between the current image block and the second neighboring block corresponding to a motion vector predictor in the second motion vector set, where the first neighboring block and the second neighboring block are included in the spatial neighboring blocks and/or the time domain neighboring blocks of the current image block.

It should be noted that the first distance herein is, for example, the distance of the pixel position of the upper left corner of the first neighboring block relative to the pixel position of the upper left corner of the current image block, or the distance of the center point pixel position of the first neighboring block relative to the center point pixel position of the current image block; the second distance herein is, for example, the distance of the pixel position of the upper left corner of the second neighboring block relative to the pixel position of the upper left corner of the current image block, or the distance of the center point pixel position of the second neighboring block relative to the center point pixel position of the current image block, but the application is not limited thereto. It should be understood that the terms "first" and "second" herein are used merely for convenience of description.
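The two example distance definitions can be sketched as below. The application does not state which metric is used, so the Euclidean distance here is an assumption, as are the block representation (x, y, width, height) and the function names top_left_distance and center_distance.

```python
import math


def top_left_distance(block_a, block_b):
    """Distance between the upper-left corner pixels of two blocks.

    Each block is a tuple (x, y, width, height) in pixel units.
    """
    return math.hypot(block_a[0] - block_b[0], block_a[1] - block_b[1])


def center_distance(block_a, block_b):
    """Distance between the center points of two blocks."""
    center_ax = block_a[0] + block_a[2] / 2
    center_ay = block_a[1] + block_a[3] / 2
    center_bx = block_b[0] + block_b[2] / 2
    center_by = block_b[1] + block_b[3] / 2
    return math.hypot(center_ax - center_bx, center_ay - center_by)
```

Either function can be used to order neighboring blocks from nearest to farthest relative to the current image block.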

It should be noted that the spatial neighboring blocks herein may include one or more spatial neighboring blocks adjacent to the current image block in the image in which the current image block is located. As shown in FIG. 6A, the spatial neighboring blocks of the current image block include: a spatial neighboring block A0 located at the lower left side of the current image block, a spatial neighboring block A1 located at the left side of the current image block, a spatial neighboring block B0 located at the upper right side of the current image block, a spatial neighboring block B1 located at the upper side of the current image block, and/or a spatial neighboring block B2 located at the upper left side of the current image block.

It should be noted that the time domain neighboring blocks herein may include one or more spatial neighboring blocks adjacent to the collocated block (co-located block) in the reference image, and/or one or more sub-blocks of the collocated block, where the collocated block is an image block of the reference image having the same size, shape, and coordinates as the current image block. The reference image herein refers to a reconstructed image. Specifically, the reference image herein refers to a reference image in one or more reference image lists; for example, it may be the reference image corresponding to a specified reference image index in a specified reference image list, or it may be the reference image in the first position of the default reference image list, which is not limited in this application. It should be noted that a neighboring block of either kind refers to an image block whose motion vector has already been determined (also referred to as an encoded image block or a decoded image block). Referring to FIG. 6A, the time domain neighboring blocks of the current image block may include: the lower-right spatial neighboring block T_BR of the collocated block of the current image block, the center block T_c of the collocated block, and/or the upper-left block T_TL of the collocated block.

In some possible implementations, in the case where the first distance between the current image block and the first neighboring block corresponding to a motion vector predictor in the first motion vector set is smaller than the second distance between the current image block and the second neighboring block corresponding to a motion vector predictor in the second motion vector set, the first motion vector precision corresponding to the first motion vector set is higher than the second motion vector precision corresponding to the second motion vector set; or

In the case where the first distance between the current image block and the first neighboring block corresponding to a motion vector predictor in the first motion vector set is greater than the second distance between the current image block and the second neighboring block corresponding to a motion vector predictor in the second motion vector set, the first motion vector precision corresponding to the first motion vector set is lower than the second motion vector precision corresponding to the second motion vector set.

In this embodiment of the present application, the correspondence between multiple motion vector sets in step 7051 and multiple motion vector precisions is obtained in at least four ways:

The first mode: the correspondence between the multiple motion vector sets and the multiple motion vector precisions may be preset by the decoding end (for example, the video decoder 30), for example, according to a preset rule or in a preset manner. For example, it may be a correspondence set according to a preset group of fixed motion vector precision values and a preset motion vector precision assignment rule "the closer the distance, the higher the motion vector precision; the farther the distance, the lower the motion vector precision".

Referring to FIG. 6A and Table-7A, or referring to FIG. 6B and Table-8A, in the embodiment of the present application, the multiple motion vector sets may be determined according to the distance between the current image block and the location of each candidate motion vector predictor (that is, the neighboring block where the candidate motion vector predictor is located). For example, the multiple candidate motion vector predictors used to construct the candidate motion vector predictor list are divided into multiple motion vector sets, where each motion vector set includes one or more candidate motion vector predictors used to construct the candidate motion vector predictor list, and at least one motion vector set includes multiple candidate motion vector predictors. The encoding end and the decoding end pre-agree on a set of motion vector precisions, for example, {1/4, 1/2, 1, 2, 4, 8} and so on, and the decoding end sets the motion vector precision of each motion vector set according to the preset motion vector precision assignment rule "the closer the distance, the higher the motion vector precision; the farther the distance, the lower the motion vector precision".

Motion vector set	Motion vector predictor (represented by position)	Motion vector precision
0	MV_A1, MV_B1	1/4
1	MV_A0, MV_B0	1/2
2	MV_B2	1
3	MV_T	2

Table-7A
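How a decoder might look up the target motion vector precision from the position of the motion vector predictor can be sketched with the entries of Table-7A. The data layout and the function name target_precision are hypothetical; only the set contents and precision values come from the table.

```python
from fractions import Fraction

# Table-7A: each entry is (positions in the motion vector set, precision).
TABLE_7A = [
    ({"MV_A1", "MV_B1"}, Fraction(1, 4)),
    ({"MV_A0", "MV_B0"}, Fraction(1, 2)),
    ({"MV_B2"}, Fraction(1)),
    ({"MV_T"}, Fraction(2)),
]


def target_precision(mvp_position, table=TABLE_7A):
    """Return (set index, precision) of the set containing the MVP position."""
    for set_index, (positions, precision) in enumerate(table):
        if mvp_position in positions:
            return set_index, precision
    raise KeyError(f"{mvp_position} is not in any motion vector set")
```

For instance, a best candidate MVP taken from position B1 falls in motion vector set 0 and therefore uses 1/4 pixel precision, while the time domain candidate MV_T uses 2 pixel precision.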

Motion vector set	Motion vector predictor (represented by position)	Motion vector precision
0	MV_A, MV_L	1/4
1	MV_AL	1/2
2	MV_AR, MV_BL	1
3	MV_T	2

Table-8A

The second mode: the correspondence between the multiple motion vector sets and the multiple motion vector precisions may be preset by the decoding end (for example, the video decoder 30). For example, the encoding end and the decoding end pre-agree on each motion vector set (that is, on which positions' motion vector predictors belong to each motion vector set) and on the motion vector precision of each set, as shown in Table-7B and Table-8B. For an MVP obtained in another way, the corresponding motion vector precision may also be preset.

Table-7B

Table-8B

The third mode: the correspondence between the multiple motion vector sets and the multiple motion vector precisions may be preset by the decoding end (for example, the video decoder 30), for example, according to a preset rule or in a preset manner, and the encoding end and the decoding end pre-agree on a set of motion vector precisions, for example, {1/4, 1/2, 1, 2, 4, 8} and so on. The third mode differs from the first mode in that the decoding end determines the motion vector precision of each motion vector set according to, for example, a probability statistics result, where the probability statistics result indicates which of the motion vectors is used with the highest probability;

The fourth mode: the correspondence between the multiple motion vector sets and the multiple motion vector precisions may be determined locally by the decoding end (for example, the video decoder 30). In one example, the correspondence is determined according to a motion vector precision assignment rule; specifically, the correspondence is determined based on the motion vector precision assignment rule and a motion vector precision parameter parsed from the code stream. The motion vector precision assignment rule represents the relationship between the motion vector precision value and the distance between the current image block and the neighboring block corresponding to a motion vector predictor included in a motion vector set, in other words, "the closer the distance, the higher the motion vector precision; the farther the distance, the lower the motion vector precision".

Correspondingly, the code stream received in step 701 further carries a motion vector precision parameter, where the motion vector precision parameter is used to indicate the values of the multiple motion vector precisions, and the motion vector precision parameter is carried in any one of a sequence parameter set (SPS), an image parameter set (PPS), a slice header, or another position associated with the image block. As a specific example, the motion vector precision parameter may include the number of motion vector precisions for the current decoding processing unit and at least two motion vector precision values corresponding to that number, where the current decoding processing unit includes one or more of a video sequence, an image, a slice, a region partition, a CTU, and a CU. Preferably, the multiple motion vector precisions herein comprise three or more motion vector precisions.

Referring to FIG. 6C, a set of motion vector precision parameters is carried in the image parameter set PPS header information. In the specific syntax structure "Picture parameter set RBSP syntax", pps_amvr_number represents the number of motion vector precisions available in the current image and is a positive integer greater than or equal to 1; the value of pps_amvr_number may be determined according to the number and positions of the candidate motion vector predictors used to construct the candidate motion vector predictor list. pps_amvr_set[pps_amvr_number] represents the values of the motion vector precisions available in the current image.

Take pps_amvr_number=4, pps_amvr_set[0]=1/4, pps_amvr_set[1]=1/2, pps_amvr_set[2]=1, pps_amvr_set[3]=2 as an example. Taking the number of candidate motion vector predictors into account, the motion vector precision is assigned according to the distance between the current block and the neighboring block where each candidate motion vector predictor is located: the closer the distance, the higher the motion vector precision; the farther the distance, the lower the motion vector precision. The motion vector precision of each motion vector set can therefore be set as shown in Table-8A above.

Take pps_amvr_number=3, pps_amvr_set[0]=1/4, pps_amvr_set[1]=1, pps_amvr_set[2]=4 as an example. Taking the number of candidate motion vector predictors into account, the motion vector precision is assigned according to the distance between the current block and the neighboring block where each candidate motion vector predictor is located: the closer the distance, the higher the motion vector precision; the farther the distance, the lower the motion vector precision. The motion vector precision of each motion vector set can therefore be set as shown in Table-8C.

Motion vector set	Motion vector predictor (represented by position)	Motion vector precision
0	MV_A, MV_L	1/4
1	MV_AL, MV_AR, MV_BL	1
2	MV_T	4

Table-8C
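The assignment rule "the closer the distance, the higher the motion vector precision" can be sketched as pairing motion vector sets, ordered from nearest to farthest, with the signaled precision values ordered from finest to coarsest. The sketch takes the grouping of positions into sets as given, since how positions are grouped for a particular pps_amvr_number is not derived here, and it assumes one precision value per set. The function name assign_precisions is hypothetical.

```python
from fractions import Fraction


def assign_precisions(sets_nearest_first, precisions):
    """Pair each motion vector set with a precision value.

    `sets_nearest_first` lists the sets from nearest to farthest from the
    current block; `precisions` holds the values parsed from the code
    stream (for example pps_amvr_set). A smaller value means a finer,
    that is higher, precision, so the nearest set receives the smallest.
    """
    if len(sets_nearest_first) != len(precisions):
        raise ValueError("expected one precision value per motion vector set")
    return list(zip(sets_nearest_first, sorted(precisions)))
```

Applied to the three sets of Table-8C with the signaled values 1/4, 1, and 4, the pairing reproduces the precision column of that table.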

The fifth mode is different from the foregoing first to fourth modes in that the coding end indicates the motion vector accuracy of a certain motion vector predictor set by signal; on the basis of any of the foregoing modes, the decoder further parses the code. The stream obtains the motion vector precision of a certain motion vector predictor set, thereby obtaining a correspondence between multiple motion vector sets and various motion vector precisions.

It should be noted that the third motion vector set herein represents a set of important motion vector predictor values, that is, a motion vector set formed by one or more motion vector predictors whose probability of being the best motion vector predictor is the largest or larger. The third motion vector accuracy is the motion vector precision used by the third motion vector set. The third motion vector set may be a default set of motion vectors, such as a first motion vector set of the plurality of motion vector sets (the motion vector set 0 in the above table); of course, the third motion The vector precision set may also be a second motion vector set, or may be other motion vector sets different from the first or second motion vector set; further, the third motion vector precision may be the multiple motion vector precision The first motion vector accuracy, or the second motion vector accuracy, or other motion vector accuracy, is not limited in this application.

Correspondingly, in one implementation, the code stream received in step 701 further carries a first identifier indicating the third motion vector accuracy corresponding to the third motion vector set. In other words, the encoder and decoder may agree in advance on which motion vector set (or which position's motion vector predictor) the transmitted third motion vector accuracy corresponds to, so that only the indication of the motion vector accuracy needs to be transmitted in the code stream.

This identification information (e.g., the first identifier) can be used for different decoding processing units, such as the sequence level, image level, slice level, region level, CTU level, and CU level. Accordingly, the identification information can be carried in a sequence parameter set, an image parameter set, a slice header, or another location.

As shown in FIG. 6D, the identification information (for example, the first identifier, specifically the motion vector accuracy index in Table-6) is used at the image level. Accordingly, the identification information is carried in the header information of the image parameter set (PPS), in the syntax structure "Picture parameter set RBSP syntax". The value of pps_amvr_accuracy indicates the motion vector accuracy of a certain (default) motion vector set (for example, the third motion vector set), regardless of which image block in the current image is being decoded: when pps_amvr_accuracy is 0, the motion vector accuracy is 1/8; when pps_amvr_accuracy is 1, the motion vector accuracy is 1/4; when pps_amvr_accuracy is 2, the motion vector accuracy is 1/2; and when pps_amvr_accuracy is 3, the motion vector accuracy is 1.
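
The index-to-accuracy mapping described for pps_amvr_accuracy can be sketched as follows. This is a minimal illustration, not the application's parsing procedure: the names `ACCURACY_BY_INDEX` and `precision_from_pps` are hypothetical, and the sketch assumes the index has already been read from the PPS.

```python
from fractions import Fraction

# Index values 0..3 as described for pps_amvr_accuracy (precision in pixels).
# ACCURACY_BY_INDEX and precision_from_pps are hypothetical names.
ACCURACY_BY_INDEX = {
    0: Fraction(1, 8),
    1: Fraction(1, 4),
    2: Fraction(1, 2),
    3: Fraction(1),
}


def precision_from_pps(pps_amvr_accuracy):
    """Map an already-parsed pps_amvr_accuracy index to a motion vector precision."""
    if pps_amvr_accuracy not in ACCURACY_BY_INDEX:
        raise ValueError(f"unsupported pps_amvr_accuracy: {pps_amvr_accuracy}")
    return ACCURACY_BY_INDEX[pps_amvr_accuracy]
```

Because the value is carried at the image level, the same precision would apply to the agreed motion vector set for every image block of the current image.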

As an alternative implementation, the code stream received in step 701 further carries a first identifier indicating the third motion vector accuracy and a second identifier indicating the third motion vector set. In other words, the encoder and decoder do not agree in advance on which motion vector set (or which position's motion vector predictor) the transmitted third motion vector accuracy corresponds to; instead, the motion vector set to which the third motion vector accuracy corresponds is also transmitted in the code stream.

The identification information (e.g., the first identifier and the second identifier) can be used for different decoding processing units, such as the sequence level, image level, slice level, region level, CTU level, and CU level. Accordingly, the identification information can be carried in a sequence parameter set, an image parameter set, a slice header, or another location.

As shown in FIG. 6E, the identification information (for example, the first identifier, specifically the motion vector precision index in Table-6, and the second identifier, specifically the index of the motion vector set) is used at the image level. Accordingly, the identification information is carried in the header information of the image parameter set (PPS), in the syntax structure "Picture parameter set RBSP syntax". pps_amvr_set_idx represents the index of the motion vector set; its value range is a positive integer, that is, there is at least one motion vector set. pps_amvr_set_accuracy[pps_amvr_set_idx] represents the value of the motion vector precision corresponding to the motion vector set pps_amvr_set_idx. Regardless of which image block in the current image is being decoded, when pps_amvr_set_accuracy[pps_amvr_set_idx] is 0, the motion vector precision of the motion vector set specified by pps_amvr_set_idx is 1/8; when it is 1, the precision is 1/4; when it is 2, the precision is 1/2; and when it is 3, the precision is 1.

For example, after the code stream is parsed, pps_amvr_set_idx=0 and pps_amvr_set_accuracy[pps_amvr_set_idx]=1/8 are obtained. The motion vector precision corresponding to each motion vector set is then as shown in Table-9A and Table-9B:

Table-9A

Motion vector set	Motion vector predictor (represented by position)	Motion vector accuracy
0	MV_A1, MV_B1	1/8
1	MV_A0, MV_B0	1/2
2	MV_B2	1
3	MV_T	2

Table-9B
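
One way to picture how a signaled set index and precision lead to a table such as Table-9B is to overwrite a single entry of an existing correspondence. The sketch below is an assumption-laden illustration: the base correspondence is hypothetical (the application does not state the values held before signaling), the precision is assumed to arrive as the index 0 to 3 described above, and the names `apply_signaled_precision` and `ACCURACY_BY_INDEX` are invented for this example.

```python
from fractions import Fraction

# Precision index -> precision in pixels, as described for the PPS syntax element.
ACCURACY_BY_INDEX = (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), Fraction(1))


def apply_signaled_precision(base, set_idx, accuracy_idx):
    """Return a copy of `base` with the precision of set `set_idx` replaced."""
    if set_idx not in base:
        raise KeyError(f"unknown motion vector set index: {set_idx}")
    updated = dict(base)  # leave the caller's table untouched
    updated[set_idx] = ACCURACY_BY_INDEX[accuracy_idx]
    return updated


# Hypothetical correspondence before signaling (an assumption, not from the source).
base = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(2)}

# Signaled: set index 0 with precision index 0 (that is, 1/8).
table = apply_signaled_precision(base, set_idx=0, accuracy_idx=0)
```

With these assumed starting values, the resulting `table` holds 1/8, 1/2, 1, and 2 for sets 0 to 3, which are the precisions listed in Table-9B.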

Step 7052: Obtain the motion vector of the currently decoded image block based on the motion vector predictor, the motion vector difference information, and the target motion vector precision. In one example implementation, the sum of the motion vector predictor and the motion vector difference (MVD) information is calculated to obtain the motion vector of the currently decoded image block; the motion vector of the currently decoded image block, the motion vector predictor, and the motion vector difference MVD have the same motion vector accuracy (that is, all have the target motion vector accuracy). In another example implementation, the motion vector difference information is scaled (e.g., amplified) based on the target motion vector precision to obtain a scaled (e.g., amplified) motion vector difference MVD, and the sum of the motion vector predictor and the amplified motion vector difference MVD is calculated to obtain the motion vector of the currently decoded image block; the motion vector of the currently decoded image block, the motion vector predictor, and the amplified motion vector difference MVD have the same motion vector accuracy (that is, all have the target motion vector accuracy). One calculation is MV = MVP + (MVD << mvrIdx), where MV represents the motion vector of the currently decoded image block, MVP represents the motion vector predictor of the currently decoded image block, MVD represents the motion vector difference information in the code stream, mvrIdx represents the index value of the target motion vector precision, and << represents a left shift, that is, the amplification. It should be understood that if mvrIdx=0, the amplified MVD (shifted left by 0 bits) is the same as the MVD before amplification; if mvrIdx is not equal to 0 (e.g., mvrIdx=2), the parsed MVD is amplified based on the index of the target motion vector precision (for example, shifted left by 2 bits) to obtain the amplified motion vector difference MVD.
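
The calculation MV = MVP + (MVD << mvrIdx) can be written out per component as a short sketch. The function name `reconstruct_mv` and the representation of vectors as integer (x, y) tuples are assumptions made for illustration; the sketch applies the shift literally as stated in step 7052.

```python
def reconstruct_mv(mvp, mvd, mvr_idx):
    """Compute MV = MVP + (MVD << mvrIdx) for each of the x and y components.

    mvp, mvd: (x, y) tuples of integers; mvr_idx: non-negative shift amount.
    """
    if mvr_idx < 0:
        raise ValueError("mvr_idx must be non-negative")
    return (
        mvp[0] + (mvd[0] << mvr_idx),
        mvp[1] + (mvd[1] << mvr_idx),
    )


# mvrIdx = 0: the MVD is used as parsed.
mv_unscaled = reconstruct_mv((8, -4), (3, -1), 0)  # (11, -5)

# mvrIdx = 2: the MVD is shifted left by 2 bits (multiplied by 4) before the sum.
mv_scaled = reconstruct_mv((8, -4), (3, -1), 2)  # (20, -8)
```

In Python, a left shift of a negative integer multiplies it by the corresponding power of two, so negative MVD components are amplified the same way as positive ones.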

Step 707: Obtain a prediction block of a currently decoded image block based on a motion vector of the current decoded image block having a target motion vector precision.

For example, step 707 can be performed by inter prediction module 82 (e.g., motion compensation module 82) of video decoder 30.

Motion compensation performed by inter prediction module 82 may involve extracting or generating a prediction block based on a motion vector; for example, inter prediction module 82 may locate the prediction block to which the motion vector points in one of the reference image lists. Inter prediction module 82 may also perform interpolation at fractional pixel precision. Referring to FIG. 4B, the value of the pixel at integer pixel location 100 generally corresponds to the actual value of the pixel in the reference image. If the target motion vector accuracy is a fractional pixel precision (for example, one-half pixel precision, one-quarter pixel precision, or one-eighth pixel precision), an interpolation filter needs to be applied to the pixel values at the integer pixel positions of the reference image to obtain the pixel values at the fractional pixel positions, thereby obtaining the values of the prediction block of the current block.
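
The idea of deriving a fractional-position sample from integer-position samples can be illustrated with bilinear interpolation. This is a deliberately simplified stand-in: the application does not specify the interpolation filter here, and standards such as HEVC use longer-tap filters. The function name `sample_at`, the quarter-pixel default, and the clamping at the image border are all assumptions, and coordinates are assumed to be non-negative and inside the reference image.

```python
def sample_at(ref, x_q, y_q, shift=2):
    """Bilinear sample of `ref` at a position given in 1/(1 << shift) pixel units.

    ref: 2D list of integer pixel values; x_q, y_q: non-negative positions.
    """
    scale = 1 << shift
    x0, fx = x_q >> shift, x_q & (scale - 1)  # integer part, fractional part
    y0, fy = y_q >> shift, y_q & (scale - 1)
    x1 = min(x0 + 1, len(ref[0]) - 1)  # clamp the neighbor at the right edge
    y1 = min(y0 + 1, len(ref) - 1)     # clamp the neighbor at the bottom edge
    top = ref[y0][x0] * (scale - fx) + ref[y0][x1] * fx
    bottom = ref[y1][x0] * (scale - fx) + ref[y1][x1] * fx
    total = top * (scale - fy) + bottom * fy
    return (total + (scale * scale >> 1)) // (scale * scale)  # round to nearest


ref = [[10, 20], [30, 40]]
integer_sample = sample_at(ref, 0, 0)   # integer position: 10
half_pel_x = sample_at(ref, 2, 0)       # halfway between 10 and 20: 15
half_pel_xy = sample_at(ref, 2, 2)      # center of the four samples: 25
```

At integer positions the fractional parts are zero and the reference pixel is returned unchanged, which is why only fractional precisions incur interpolation cost.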

Step 709: Perform reconstruction on the current decoded image block based on the prediction block, thereby completing a decoding process of the currently decoded image block.

For example, step 709 can be performed by reconstruction module 90 of video decoder 30, such as summer 90.
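
Reconstruction by a summer can be sketched as a sample-wise addition. The sketch assumes, as is conventional for such decoders but not spelled out in this passage, that the summer adds a decoded residual block to the prediction block and that results are clipped to the valid sample range; the function name `reconstruct_block` and the `bit_depth` parameter are hypothetical.

```python
def reconstruct_block(pred, resid, bit_depth=8):
    """Add a residual block to a prediction block and clip to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


# Sums outside the 8-bit range are clipped to 255 and 0 respectively.
recon = reconstruct_block([[250, 10], [100, 128]], [[10, -20], [5, -8]])
```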

It can be seen that, in the video decoding method of the embodiments of the present application, on the one hand, after obtaining the motion vector predictor of the currently decoded image block, the video decoder obtains the target motion vector accuracy corresponding to the target motion vector set to which the motion vector predictor belongs, and can thus adaptively determine an appropriate motion vector precision for the currently decoded image block. If the number of motion vector sets is N, the video decoder can support M motion vector (MV) precisions, where M is less than or equal to N and M and N are both positive integers (for example, M and N are greater than or equal to 3), which improves motion vector prediction accuracy. Specifically, because the embodiments of the present invention can adaptively select the motion vector accuracy, for one or more image blocks corresponding to certain video content, using a motion vector with higher pixel precision (such as 1/8 pixel precision) instead of a motion vector with lower pixel precision improves video codec quality, and the benefit outweighs the cost of the interpolation overhead: for example, a prediction block obtained from a motion vector with high pixel precision (e.g., 1/8 pixel precision) is closer to the original block of the currently decoded image block, even though interpolating fractional pixel positions incurs some interpolation overhead. For one or more image blocks corresponding to other video content, a motion vector with lower pixel precision (such as integer pixel precision) is used instead of a motion vector with higher pixel precision. Overall, the video decoding method of the embodiments of the present application improves codec performance.

On the other hand, the video decoder can adaptively select the target motion vector precision to decode the MVD information in the code stream, and the MVD information in the code stream occupies fewer bits than, or the same number of bits as, the original MVD. For example, the MVD information is amplified, or restored to the original MVD, according to the target motion vector precision, which saves bit transmission overhead to a certain extent and thereby improves codec performance.

In still another aspect, with the decoding method of the embodiments of the present invention, the motion vector accuracy used for the currently decoded image block can be determined adaptively even when the video encoder signals no information related to motion vector accuracy, or signals only the motion vector accuracy of the important motion vector set. This further improves coding efficiency and saves bit overhead, thereby further improving encoding and decoding performance.

Additionally, the video encoding or decoding method of the embodiments of the present application can be implemented in any electronic device or apparatus that requires encoding and/or decoding of video images.

FIG. 11 is a schematic block diagram of an implementation of a video encoding device or a video decoding device (referred to as decoding device 1100) according to an embodiment of the present application. The decoding device 1100 may include a processor 1110, a memory 1130, and a bus system 1150. The processor and the memory are connected by the bus system; the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory. The memory of the decoding device stores program code, and the processor can invoke the program code stored in the memory to perform the various video encoding or decoding methods described herein, so as to adaptively select a target motion vector accuracy (such as integer pixel precision, 1/2 pixel precision, 1/4 pixel precision, 1/8 pixel precision, or 4 pixel precision) for encoding or decoding the image block. To avoid repetition, the details are not described again here.

In the embodiment of the present application, the processor 1110 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 1130 can include a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device can also be used as the memory 1130. Memory 1130 can include code and data 1131 accessed by processor 1110 using bus 1150. Memory 1130 can further include an operating system 1133 and an application 1135 that includes at least one program that allows processor 1110 to perform the video encoding or decoding methods described herein. For example, the application 1135 can include applications 1 through N, which further include a video encoding or decoding application (referred to as a video coding application) that performs the video encoding or decoding methods described herein.

The bus system 1150 may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus. However, for clarity of description, various buses are labeled as bus system 1150 in the figure.

Optionally, the decoding device 1100 may also include one or more output devices, such as a display 1170. In one example, display 1170 can be a touch-sensitive display that combines a display with a touch-sensing unit operable to sense touch input. Display 1170 can be coupled to processor 1110 via bus 1150.

FIG. 12 is a schematic block diagram of an electronic device 1200 having a video codec function according to an embodiment of the present application, which may incorporate a video encoder or video decoder in accordance with an embodiment of the present invention. The electronic device 1200 can be, for example, a mobile terminal or user equipment of a wireless communication system. It should be understood that embodiments of the present application can be implemented in any electronic device or apparatus that requires encoding and decoding, or encoding, or decoding of video images.

Device 1200 can include a housing for incorporating and protecting the device. Device 1200 can also include a display 1232 in the form of a liquid crystal display. In other embodiments of the invention, the display may use any display technology suitable for displaying images or video. Device 1200 can also include a user input/output interface 1234. In other embodiments of the invention, any suitable data or user interface mechanism may be utilized. For example, the user interface can be implemented as a virtual keyboard or data entry system that is part of a touch-sensitive display, such as a touch screen. The device may include a microphone 1236 or any suitable audio input, which may be a digital or analog signal input. Device 1200 can also include an audio output device, which in an embodiment of the invention can be any of the following: earphone 1238, a speaker, or an analog or digital audio output connection. Device 1200 can also include a battery, and in other embodiments of the invention, the device can be powered by any suitable mobile energy device, such as a solar cell, fuel cell, or clockwork generator. The device may also include an infrared port 1242 for short-range line-of-sight communication with other devices. In other embodiments, device 1200 may also include any suitable short-range communication solution, such as a Bluetooth wireless connection or a USB wired connection.

Device 1200 can include a controller 1256 or processor for controlling device 1200. Controller 1256 may be coupled to memory 1258, which in an embodiment of the invention may store data in the form of image data and audio data, and/or may also store instructions for execution on controller 1256. Controller 1256 can also be coupled to codec circuitry 1254 suitable for encoding and decoding audio and/or video data, or for assisting in encoding and decoding performed by controller 1256.

Apparatus 1200 can also include card reader 1248 and smart card 1246 for providing user information and for providing authentication information for authenticating and authorizing users on the network.

Apparatus 1200 can also include a radio interface circuit 1252 that is coupled to the controller and that is adapted to generate wireless communication signals, for example, for communicating with a cellular communication network, a wireless communication system, or a wireless local area network. Apparatus 1200 can also include an antenna 1244 that is coupled to radio interface circuitry 1252 for transmitting radio frequency signals generated at radio interface circuitry 1252 to other apparatus(s) and for receiving radio frequency signals from other apparatus(s).

In some embodiments of the invention, apparatus 1200 includes a camera capable of recording or detecting individual frames, and codec 1254 or the controller receives and processes these individual frames. In some embodiments of the invention, the device may receive video image data to be processed from another device prior to transmission and/or storage. In some embodiments of the invention, device 1200 may receive images for encoding/decoding via a wireless or wired connection.

The terms "first", "second", and "third" appearing in this application do not imply any order; they are used only to distinguish between subjects in a given description context for ease of understanding, and the subjects so indicated are not necessarily different subjects in all embodiments. "A and/or B" appearing in the present application covers three cases: A alone, B alone, and both A and B.

It should be noted that the explanations of the same steps or the same terms apply equally to the different embodiments. For the sake of brevity, repeated descriptions are omitted as appropriate.

In the above embodiments, the descriptions of the various embodiments have different emphases; for details not described in a given embodiment, refer to the related descriptions of other embodiments.

Those skilled in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps described herein can be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions described by the various illustrative logical blocks, modules, and steps can be stored on or transmitted over a computer readable medium as one or more instructions or code and executed by a hardware-based processing unit. The computer readable medium can comprise a computer readable storage medium, which corresponds to a tangible medium such as a data storage medium, or any communication medium that facilitates transfer of a computer program from one location to another (e.g., according to a communication protocol). In this manner, a computer readable medium may generally correspond to (1) a non-transitory tangible computer readable storage medium, or (2) a communication medium, such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this application. A computer program product can comprise a computer readable medium.

By way of example and not limitation, such computer readable storage media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. However, it should be understood that computer readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer readable media.

The corresponding functions may be performed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Moreover, the techniques may be fully implemented in one or more circuits or logic elements. In one example, the various illustrative logical blocks, units, and modules in video encoder 20 and video decoder 30 may be understood as corresponding circuit devices or logic elements.

The techniques of the present application can be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a group of ICs (e.g., a chipset). Various components, modules, or units are described herein to emphasize functional aspects of apparatuses for performing the disclosed techniques, but they do not necessarily need to be implemented by different hardware units. Indeed, as described above, various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by interoperating hardware units (including one or more processors as described above).

The foregoing is only an exemplary embodiment of the present application, but the scope of protection of the present application is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed by the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be determined by the scope of protection of the claims.

Claims

A video decoding method, comprising:

Receiving a code stream carrying motion vector difference MVD information of a currently decoded image block;

Parsing the motion vector difference information from the code stream;

Obtaining a motion vector predictor MVP of the current decoded image block;

When the motion vector predictor is included in a target motion vector set corresponding to a target motion vector accuracy, obtaining a motion vector of the currently decoded image block based on the motion vector predictor, the motion vector difference information, and the target motion vector accuracy, wherein the motion vector of the currently decoded image block has the target motion vector accuracy, the target motion vector set is one of a plurality of motion vector sets, the target motion vector accuracy is one of a plurality of motion vector precisions including a first motion vector accuracy and a second motion vector accuracy, the plurality of motion vector sets include a first motion vector set and a second motion vector set, at least one of the first motion vector set and the second motion vector set includes two or more motion vector predictors, and the first motion vector accuracy corresponding to the first motion vector set is different from the second motion vector accuracy corresponding to the second motion vector set;

Obtaining a prediction block of the currently decoded image block based on the motion vector of the currently decoded image block;

Reconstructing the currently decoded image block based on the prediction block.
The method according to claim 1, wherein, when the motion vector predictor is included in the target motion vector set corresponding to the target motion vector accuracy, the obtaining a motion vector of the currently decoded image block based on the motion vector predictor, the motion vector difference information, and the target motion vector accuracy comprises:

Determining that the motion vector predictor is included in the target motion vector set, and determining, according to a correspondence between the plurality of motion vector sets and the plurality of motion vector precisions, the target motion vector accuracy corresponding to the target motion vector set as the motion vector accuracy of the motion vector predictor;

Calculating a sum of the motion vector predictor and the motion vector difference information to obtain the motion vector of the currently decoded image block, wherein the motion vector of the currently decoded image block, the motion vector predictor MVP, and the motion vector difference MVD have the target motion vector accuracy; or

Amplifying the motion vector difference information based on the target motion vector accuracy to obtain an amplified motion vector difference MVD, and calculating a sum of the motion vector predictor and the amplified motion vector difference MVD to obtain the motion vector of the currently decoded image block, wherein the motion vector of the currently decoded image block, the motion vector predictor, and the amplified motion vector difference MVD have the target motion vector accuracy.
A method according to claim 1 or 2, wherein

The first distance between the first neighboring block corresponding to the motion vector predictor in the first motion vector set and the current image block is different from the second distance between the second neighboring block corresponding to the motion vector predictor in the second motion vector set and the current image block, wherein the first neighboring block and the second neighboring block are included in spatial neighboring blocks and/or temporal neighboring blocks of the current image block.
A method according to any one of claims 1 to 3, characterized in that

If the first distance between the first neighboring block corresponding to the motion vector predictor in the first motion vector set and the currently decoded image block is smaller than the second distance between the second neighboring block corresponding to the motion vector predictor in the second motion vector set and the currently decoded image block, the first motion vector accuracy corresponding to the first motion vector set is higher than the second motion vector accuracy corresponding to the second motion vector set; or

If the first distance between the first neighboring block corresponding to the motion vector predictor in the first motion vector set and the currently decoded image block is greater than the second distance between the second neighboring block corresponding to the motion vector predictor in the second motion vector set and the currently decoded image block, the first motion vector accuracy corresponding to the first motion vector set is lower than the second motion vector accuracy corresponding to the second motion vector set.
The method according to claim 2, wherein the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is preset.
The method according to claim 2 or 5, wherein the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is determined according to a motion vector precision assignment rule, wherein the motion vector precision assignment rule specifies that: the farther the distance between the neighboring block corresponding to a motion vector predictor included in a motion vector set and the currently decoded image block, the lower the motion vector accuracy; and the closer the distance between the neighboring block corresponding to a motion vector predictor included in a motion vector set and the currently decoded image block, the higher the motion vector accuracy.
The method according to any one of claims 1 to 6, wherein the code stream further carries a motion vector precision parameter, the motion vector precision parameter is used to indicate the values of the plurality of motion vector precisions, and the motion vector precision parameter is carried in any one of a sequence parameter set SPS, an image parameter set PPS, or a slice header of the currently decoded image block.
The method according to any one of claims 1 to 6, wherein the code stream further carries a motion vector precision parameter, and the motion vector precision parameter is used to indicate, for the current decoding processing unit, the number of motion vector precisions and at least two motion vector precision values corresponding to that number, wherein the current decoding processing unit includes one or more of a video sequence, an image, a slice, a region, a coding tree unit CTU, and a coding unit CU.
A method according to any one of claims 1 to 8, wherein

The code stream further carries a first identifier, where the first identifier is used to indicate a third motion vector precision corresponding to the third motion vector set; or

The code stream further carries a first identifier and a second identifier, where the first identifier is used to indicate a third motion vector precision, and the second identifier is used to indicate a third motion vector set;

The third motion vector set is the first motion vector set of the plurality of motion vector sets, or the second motion vector set, or another motion vector set, and the third motion vector precision is the first motion vector accuracy among the plurality of motion vector precisions, or the second motion vector accuracy, or another motion vector accuracy.
The method according to any one of claims 1 to 9, wherein the code stream further carries a third identifier, where the third identifier is used to indicate a candidate motion vector predictor MVP of the currently decoded image block; and the obtaining a motion vector predictor MVP of the currently decoded image block comprises: determining the candidate motion vector predictor MVP of the currently decoded image block from a candidate motion vector predictor list based on the third identifier;

or,

The obtaining a motion vector predictor MVP of the currently decoded image block comprises: obtaining the motion vector predictor MVP of the currently decoded image block by using a bidirectional matching method or a template matching method.
A video decoder, comprising:

An entropy decoding module, configured to receive a code stream, where the code stream carries motion vector difference MVD information of a current decoded image block, and to parse the motion vector difference information of the current decoded image block from the code stream;

An inter prediction module, configured to: obtain a motion vector predictor MVP of the current decoded image block; when the motion vector predictor is included in a target motion vector set corresponding to a target motion vector precision, obtain a motion vector of the current decoded image block based on the motion vector predictor, the motion vector difference information, and the target motion vector precision, wherein the motion vector of the current decoded image block has the target motion vector precision, the target motion vector set is one of a plurality of motion vector sets, the target motion vector precision is one of a plurality of motion vector precisions including a first motion vector precision and a second motion vector precision, the plurality of motion vector sets include a first motion vector set and a second motion vector set, at least one of the first motion vector set and the second motion vector set includes two or more motion vector predictors, and the first motion vector precision corresponding to the first motion vector set is different from the second motion vector precision corresponding to the second motion vector set; and obtain a prediction block of the current decoded image block based on the motion vector of the current decoded image block;

and a reconstruction module, configured to reconstruct the current decoded image block based on the prediction block of the current decoded image block.
The video decoder according to claim 11, wherein, in the aspect of obtaining the motion vector of the current decoded image block based on the motion vector predictor, the motion vector difference information, and the target motion vector precision when the motion vector predictor is included in the target motion vector set corresponding to the target motion vector precision, the inter prediction module is specifically configured to:

determine that the motion vector predictor is included in the target motion vector set, and determine, according to a correspondence between the plurality of motion vector sets and the plurality of motion vector precisions, the target motion vector precision corresponding to the target motion vector set as the motion vector precision of the motion vector predictor; and calculate a sum of the motion vector predictor and the motion vector difference information to obtain the motion vector of the current decoded image block, wherein the motion vector of the current decoded image block, the motion vector predictor MVP, and the motion vector difference MVD have the target motion vector precision; or

amplify the motion vector difference information based on the target motion vector precision to obtain an amplified motion vector difference MVD; and calculate a sum of the motion vector predictor and the amplified motion vector difference MVD to obtain the motion vector of the current decoded image block, wherein the motion vector of the current decoded image block, the motion vector predictor, and the amplified motion vector difference MVD have the target motion vector precision.
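The two alternatives in the preceding claim can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that motion vectors are integer pairs stored in quarter-pel units, that each motion vector set is a frozenset of candidate predictors, and that a precision is represented as an integer step in quarter-pel units (1 for quarter-pel, 4 for integer-pel). The names precision_for_mvp and reconstruct_mv are hypothetical and do not appear in the application.

```python
from typing import Dict, FrozenSet, Tuple

# (horizontal, vertical); assumed here to be stored in quarter-pel units.
MV = Tuple[int, int]


def precision_for_mvp(mvp: MV, set_to_step: Dict[FrozenSet[MV], int]) -> int:
    """Return the step (precision) of the motion vector set that contains mvp."""
    for mv_set, step in set_to_step.items():
        if mvp in mv_set:
            return step
    raise ValueError("MVP is not included in any motion vector set")


def reconstruct_mv(mvp: MV, mvd: MV, step: int, amplify: bool) -> MV:
    """Return MVP + MVD, optionally amplifying the MVD by the target step first.

    amplify=False models the first alternative (the parsed MVD already has the
    target precision); amplify=True models the second (the parsed MVD is scaled
    by the target precision before the addition).
    """
    if amplify:
        mvd = (mvd[0] * step, mvd[1] * step)
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


# Hypothetical correspondence: one set at quarter-pel, one at integer-pel.
near_set = frozenset({(5, 9), (3, -2)})
far_set = frozenset({(16, -8), (8, 4)})
mapping = {near_set: 1, far_set: 4}

step = precision_for_mvp((16, -8), mapping)                # 4
mv = reconstruct_mv((16, -8), (1, -2), step, amplify=True)  # (20, -16)
```

In this sketch the result stays on the integer-pel grid because both the predictor and the amplified difference are multiples of the step, which is the property the claim states for the second alternative.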
The video decoder according to claim 11 or 12, wherein

A first distance between a first neighboring block corresponding to a motion vector predictor in the first motion vector set and the current image block is different from a second distance between a second neighboring block corresponding to a motion vector predictor in the second motion vector set and the current image block, wherein the first neighboring block and the second neighboring block are included in spatial neighboring blocks and/or time domain neighboring blocks of the current image block.
The video decoder according to any one of claims 11 to 13, wherein

If a first distance between a first neighboring block corresponding to a motion vector predictor in the first motion vector set and the current decoded image block is smaller than a second distance between a second neighboring block corresponding to a motion vector predictor in the second motion vector set and the current decoded image block, the first motion vector precision corresponding to the first motion vector set is higher than the second motion vector precision corresponding to the second motion vector set; or

If the first distance between the first neighboring block corresponding to the motion vector predictor in the first motion vector set and the current decoded image block is greater than the second distance between the second neighboring block corresponding to the motion vector predictor in the second motion vector set and the current decoded image block, the first motion vector precision corresponding to the first motion vector set is lower than the second motion vector precision corresponding to the second motion vector set.
The video decoder according to claim 12, wherein the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is preset.
The video decoder according to claim 12 or 15, wherein the correspondence between the plurality of motion vector sets and the plurality of motion vector precisions is determined according to a motion vector precision assignment rule, wherein the motion vector precision assignment rule is: the farther the distance between the neighboring block corresponding to a motion vector predictor included in a motion vector set and the current decoded image block, the lower the motion vector precision of that motion vector set; and the closer the distance between the neighboring block corresponding to the included motion vector predictor and the current decoded image block, the higher the motion vector precision of that motion vector set.
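The assignment rule above can be sketched in Python as follows. This is an illustrative sketch only: it assumes each motion vector set is identified by a label and summarized by a single distance value, and that precisions are given as integer steps where a smaller step means a higher precision. The function name assign_precisions and the labels are hypothetical.

```python
from typing import Dict, List


def assign_precisions(set_distance: Dict[str, float], steps: List[int]) -> Dict[str, int]:
    """Give the nearest set the finest step and the farthest set the coarsest step."""
    if len(set_distance) != len(steps):
        raise ValueError("need exactly one precision per motion vector set")
    # Sort set labels from nearest to farthest neighboring block.
    ordered_sets = sorted(set_distance, key=lambda name: set_distance[name])
    # Sort steps from finest (highest precision) to coarsest (lowest precision).
    ordered_steps = sorted(steps)
    return dict(zip(ordered_sets, ordered_steps))


# Hypothetical distances (in blocks) and steps (in quarter-pel units).
correspondence = assign_precisions({"near": 1, "far": 3, "mid": 2}, [4, 1, 2])
# correspondence == {"near": 1, "mid": 2, "far": 4}
```

Pairing the two sorted sequences guarantees the monotonic relation stated in the rule: a set whose neighboring blocks are farther away never receives a finer step than a closer set.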
The video decoder according to any one of claims 11 to 16, wherein the code stream further carries a motion vector precision parameter, the motion vector precision parameter is used to indicate values of the plurality of motion vector precisions, and the motion vector precision parameter is carried in any one of a sequence parameter set SPS, an image parameter set PPS, or a slice header of the decoded image block;

Correspondingly, the entropy decoding module is further configured to parse the motion vector precision parameter from the code stream.
The video decoder according to any one of claims 11 to 16, wherein the code stream further carries a motion vector precision parameter, the motion vector precision parameter being used to indicate a number of motion vector precisions for a current decoding processing unit and at least two motion vector precision values corresponding to the number of motion vector precisions, wherein the current decoding processing unit includes one or more of a video sequence, an image, a slice, a region partition, a decoding tree unit (CTU), and a decoding unit (CU);

Correspondingly, the entropy decoding module is further configured to parse the motion vector precision parameter from the code stream.
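A motion vector precision parameter of this kind, a count followed by that many precision values, could be read as in the short Python sketch below. The layout is an assumption made for illustration: the claim does not specify a syntax, so the sketch takes a sequence of already entropy-decoded integers in which the first symbol is the number of precisions and the following symbols are the precision values. The function name parse_precision_parameter is hypothetical.

```python
from typing import List, Sequence


def parse_precision_parameter(symbols: Sequence[int]) -> List[int]:
    """Read a count N (at least 2) followed by N precision values."""
    if not symbols:
        raise ValueError("empty parameter")
    count = symbols[0]
    if count < 2:
        raise ValueError("at least two motion vector precisions are required")
    values = list(symbols[1:1 + count])
    if len(values) != count:
        raise ValueError("fewer precision values than the signalled count")
    return values


# Hypothetical decoded symbols: count 3, then three precision values;
# the trailing symbol belongs to later syntax and is ignored.
precisions = parse_precision_parameter([3, 1, 2, 4, 99])
# precisions == [1, 2, 4]
```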
A video decoder according to any of claims 11 to 18, characterized in that

The code stream further carries a first identifier, where the first identifier is used to indicate a third motion vector precision corresponding to the third motion vector set; or

The code stream further carries a first identifier and a second identifier, where the first identifier is used to indicate a third motion vector precision, and the second identifier is used to indicate a third motion vector set;

Correspondingly, the entropy decoding module is further configured to parse the first identifier from the code stream, or to parse the first identifier and the second identifier from the code stream;

The third motion vector set is the first motion vector set of the plurality of motion vector sets, or the second motion vector set, or another motion vector set; the third motion vector precision is the first motion vector precision of the plurality of motion vector precisions, or the second motion vector precision, or another motion vector precision.
The video decoder according to any one of claims 11 to 19, wherein the code stream further carries a third identifier, where the third identifier is used to indicate a candidate motion vector predictor MVP of the current decoded image block;

The entropy decoding module is further configured to parse the third identifier from the code stream;

In the aspect of acquiring the motion vector predictor MVP of the current decoded image block, the inter prediction module is specifically configured to: determine the candidate motion vector predictor MVP of the current decoded image block from a candidate motion vector predictor list based on the third identifier;

or,

In the aspect of acquiring the motion vector predictor MVP of the current decoded image block, the inter prediction module is specifically configured to: acquire the motion vector predictor MVP of the current decoded image block by using a bidirectional matching method or a template matching method.
An electronic device comprising the video decoder of any one of claims 11 to 20.